Monthly Archive for September, 2008

Thoughts about Exceptions

Exceptions and return values

To signal the impossibility of a function to perform its assigned task, two common representations are used: using the return value of the function to signal the failure (using either a variant return type or an output argument to return the actual result of the function if there is one, or throwing an exception.

Return values have several shortcomings that exceptions solve.

  • Return values can be ignored, meaning that the main program flow can continue executing as if a failure did not happen. This can lead to the program silently ignoring the failure of a necessary operation (for instance, noticing that a file could not be saved before quitting, leading to data loss) or to the manipulation of invalid data (for instance, using data that could not be loaded, which creates erroneous results, a hard crash, or a security vulnerability). By using exceptions, the execution flow will be interrupted even if the programmer forgets to check a return value.

    And automating the resolution of “the programmer forgets” issues is a good thing.

  • In many languages, exceptions are the only means of signaling failure in certain language constructs. This includes constructors and operators. Of course, there is another way of signaling failure in a constructor, by using a “failure” state for the object. However, unless the intended semantics of the object include that failure state (as would be the case, for instance, for a read-from-file stream which may encounter many errors beyond the file-not-found failure in the constructor) introducing that state forces all users of the object to test for that failure state before using the object. This is usually not acceptable for common objects, and often leads to the users forgetting or outright skipping the test.

Of course, in a program where failure recovery is not important, or which is small enough to warrant good programming practices from beginning to end, return values can be an acceptable choice. If a program is not used to manipulate critical data (for instance, a video game) and is properly sandboxed (no write access to disk except for select locations), then gracefully exiting with an exception or crashing because of data corruption following an untested return value are more or less equivalent (even if the latter feels somewhat less professional).

A common argument for and against exceptions is the syntactic weight of exception handling. In this regard, having to place a try { … } catch() { … } around every statement that might throw an exception is indeed a lot of overhead.

Did he just say that?

Yes, I did. You might have to place a try-catch block around every statement that might throw. Of course, this is somewhat alleviated by the fact that you can incorporate several statements in the same try-catch block (assuming that they don’t throw the same exceptions or that you don’t need to treat them differently), with the obvious limit that there are not that many statements that can throw in a given function anyway.

But that’s not the real issue with the statement, is it? The typical philosophy of the “good” user of exception is that exceptions should be left to propagate up the call stack through one or more functions until something can handle them. This is a very bad idea.

Exceptions are a means for a function to communicate with its environment. If you look at exceptions with the eyes of a functional programmer, you’ll notice that throwing an exception is nothing more than returning a variant type which can be either a value or an error, with the language support for the semantic sugar that says expressions or statements (other than exception handlers) that involve an exception merely evaluate to that exception themselves. As such, and as often observed by designers, thrown exceptions are part of a function’s signature as much as its arguments or its return value. Java went as far as adding checked exceptions (listing and type-checking exceptions thrown by a function as part of that function’s signature), an attempt which can be praised, though it fell short because of the lack of expressiveness of the associated type system.

The consequence is that if a function is somehow part of the public interface of a class or module, its signature must match its documented semantics. In the same way that one does not expose internal implementation details through a function’s arguments or return type, one should not expose those same details through the exceptions thrown by that function. If you look for a second at the exceptions thrown by the implementation of the function, they are either specific or generic.

A specific exception reveals explicit details about where it was thrown as why, and incorporates that information in its type (so that the appropriate handlers may intercept it). Including those exceptions in the signature of a function would reveal its internal implementation: where one would call a function to get the data received through a socket for a certain user, one would expect an “Unknown user identifier” exception, and not a “Key not found in dictionary” or “Index out of bounds in array” exception. What is relevant here is not the implementation information (a dictionary or array is used, and a key or index was invalid) but the semantic information that such an implementation cannot explain (that the required user was not found). When such exceptions are thrown, it is imperative that they are caught and translated into new exceptions before they leave the function. Of course, this imposes absolutely no constraint on private functions, which may let escape any implementation-related exceptions they wish.

A generic exception reveals no useful information to the type system (and thus does not allow efficient error recovery beyond a “Retry, Abort, Ignore” prompt) and often does not reveal any information to the user either. Consider the typical exception of the form “I failed because {string message describing cause}” or “I received an invalid argument {string message describing cause}” and its typical consequence : a 30-line Java call stack with an exception at the top that cannot be comprehended by the user without access to the code that threw it.

The inheritance-based solution of inheriting a specific exception from a generic one works in that it doesn’t reveal information about the internal implementation, but it does not improve the information given about the failure in terms of the public interface.

The end result is that most public functions will either have to catch-and-throw every exception (translating them along the way) or give up and provide overly generic useless explanations.

Polymorphism

A notable exception is when the function is not aware of the exceptions that a function it calls may throw. The situation where this happens is when (through object or functional polymorphism) the function being called was provided to the function doing the call through means other than compile-time specification (for instance, as an argument, or as a member variable of the object containing the function doing the call).

This is correct behavior: somewhere up the call stack, there is a function which is aware that the caller function was connected to the called function (because it, or a function of the same module, did that connection in the first place), and is thus aware that the called function may throw a certain set of exceptions, which it may handle.

A classic example would be a typical for-each function which calls another function on every object in a collection. The for-each function cannot know, in advance, that the other function opens a file and may throw a “File not found” exception as part of this execution, and thus cannot and should not translate that exception. However, the caller of the for-each function provided it with the function that it knows will attempt to open a file. As such, it can use an exception handler for the “File not found” exception that escaped the for-each function.

Two issues appear here. The first is the fact that most type systems which include exceptions (such as Java’s checked exceptions) cannot express the behavior “I may throw anything my argument throws, plus these additional exceptions of my own” : they need explicit lists, and while inheritance does alleviate this burden, it results in the information loss related to the lack of constraints between types present in the signature.

The second issue is that the intermediary function may contain handlers of its own (because it might do work other than just calling the function in special ways). These handlers may catch the exceptions thrown by the polymorphic function or object, which will confuse both the intermediary function (it expects issues for its own code, not for the called code) and the original function (which expected a failure). Again, through proper application of an exception-handling type system, a function may only catch exceptions which it knows can be thrown, bringing the exception-handling solution closer, in terms of functionality, to using continuations for exception handling.

SQL Generation, Part 1

As a complementary solution to generating PHP views is the generation of PHP models in a Model-View-Controller approach. In this example, the following model is used:

  • A user has a first name, a last name, and a password (stored as a hash).
  • A user may have at most one session, which contains an optional IP and an optional session token.
  • A user has a list of favorites (stored as URLs).

To represent these constraints, the Objective Caml code below is used:

let structure  =
  empty
  >> create_entity "user"
    [ "firstname", "VARCHAR(255) NOT NULL" ;
      "lastname", "VARCHAR(255) NOT NULL" ;
      "passhash", "VARCHAR(64) NOT NULL" ]
  >> add_extension "user" "session"
    [ "ip", "VARCHAR(16) NULL" ;
      "token", "VARCHAR(64) NULL" ]
  >> add_list_field "user" "favorites" "VARCHAR(255)"

Note: here, the >> operator is used to mean x >> f == f(x). An extension defines an entity as being tied to another entity with optional semantics: the user entity may have at most one session entity, and a given session always has exactly one associated user. A list field associates several elementary values of a certain type with a single entity in a classic one-to-many relationship. The above structure is automatically transformed by a compiler function into this SQL code:

CREATE TABLE `user` (
  `id` INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
  `passhash` VARCHAR(64) NOT NULL,
  `lastname` VARCHAR(255) NOT NULL,
  `firstname` VARCHAR(255) NOT NULL
); 

CREATE TABLE `session` (
  `token` VARCHAR(64) NULL,
  `ip` VARCHAR(16) NULL,
  `user` INT NOT NULL UNIQUE REFERENCES `user`(`id`)
); 

CREATE TABLE `user_favorites` (
  `value` VARCHAR(255),
  `user` INT NOT NULL REFERENCES `user`(`id`)
);

Design and rationale

The underlying data structure used by the generator is divided into three main categories:

  • Entities are the core concept, they are uniquely identified by a name. In themselves, they carry no information, except for an identifier which can be used to distinguish two entities of the same type from one another (for instance, differentiating two users using their key). Entities can have properties and can be related to each other by bindings.
  • Properties add a basic piece of information (an integer, a string or something close) to an entity. All entities of a given type have that property.
  • Bindings tie together two types of entities. A binding incorporates the notion of arity, meaning that it may relate zero or one, exactly one, or many entities of a type with zero or one, exactly one, or many entities of another type. A binding may also be normalized : either it is made part of the first entity type (meaning the arity for the other type cannot be “many”), or it is made part of the second type (meaning the arity for the first type cannot be “man”), or it is turned into a standalone relationship which gets its own table.

This structure differs from the typical “table with columns” approach that is prevalent in SQL in particular and in relational algebra in general, though it is perfectly equivalent to the latter : while an entity-property-binding representation can be turned into a table-column representation by making every entity and binding a table, and sticking properties as columns of the entity tables, the reverse transform is possible:

  • A table is an entity.
  • The non-reference columns of a table are properties of that table’s entity.
  • The reference columns of a table are bindings of the appropriate arity.

For the purposes of generating code, however, the entity-property-binding method is superior because it separates properties from bindings : while a property affects only the entity to which it is associated, a binding affects two entities. For instance, the unique identifier for an entity is not generated if that entity is an extension (that is, it is bound to exactly one other entity of another type, and is not subject to any other bindings). Also, because of normalization being a configurable option, it’s possible to decide whether the binding is a relationship or a column.

The entire data structure is kept in a single object, which acts as an associative container where new entities, properties and bindings may be added. The implementation of simple add_entity, add_property, add_binding functions allow the creation of more elaborate data structure generators:

let create_entity e l s =
  s >> add_entity e
    >> add_fields e l

let add_extension e n l s =
  s >> add_entity n
    >> add_binding (`One e) (`Option n) e `B
    >> add_fields n l  

let add_list_field e n k s =
  let n = e ^ "_" ^ n in
  s >> add_entity n
    >> add_fields n ["value",k]
    >> add_binding (`One e) (`Many n) e `B

The `B argument means that the binding should be inserted into the right member of the relationship.

Implementation

A minimally documented implementation is available for download here : files/articles/sql-gen-1.ml

Design patterns

As with PHP views, Objective Caml can now serve as a meta-language, useful for describing processes that lead to the creation of data structures, and in particular of typical database design patterns. For instance, consider a simple rights management system, where a table maps object/user pairs with a set of rights (and a different such table exists for every kind of objects that can have rights). Adding rights management is then done as:

let add_rights users objects rights s =
  let id = objects ^ "_rights" in
  s >> create_entity id
    [ "rights" , "set(" ^ String.concat "," rights ^ ") NOT NULL" ]
    >> add_binding (`Many id) (`One users) "user" `A
    >> add_binding (`Many id) (`One objects) objects `A 

let structure =
  empty
  >> add_entity "user"
  >> add_entity "page"
  >> add_rights "user" "page" [ "read" ; "write" ]

Similarly, one could define a tree with the nodes referencing data from a given entity type:

let add_binary_tree_of data name s =
  s >> add_entity name
    >> add_binding (`Many name) (`One data) "data" `A
    >> add_binding (`One name) (`Option name) "left" `A
    >> add_binding (`One name) (`Option name) "right" `A 

let structure =
  empty
  >> add_entity "node"
  >> add_binary_tree_of "node" "tree"

Further work

Now that the underlying data structure itself can be generated automatically, the operations (either in a high-level language or directly as SQL stored procedures) appears as a logical next step.

Before-After Binding

Decorating method calls

A common feature of aspect-oriented programming is the ability to respond to the occurrence of a certain event by performing an action. A typical application of this is, for instance, to respond to the deletion of a user (something which is handled by a user-management module) by also removing that user’s access rights (this is done by the rights-management module, of which the user-management module is unaware) and that user’s personal documents (done by yet another independent module).

This particular pattern also exists in graphical user interfaces, when certain user input operations trigger events that may be forwarded to any number of listeners.

In this proposed implementation, every decorated method comes in four pieces : a call-binder object which is used to store listeners and forward function calls, a front-office function which is publically available for calling (and which is the one being decorated), a function for accessing the binding object, and a function for actually implementing the function being decorated. For instance:

class example =
object
  val act_bind = new callbind
  method act = act_bind # call () (self # act_impl)
  method act_bind = act_bind
  method private act_impl () = print_endline "Call!" 
end 

let _ =
  let ex = new example in
  ignore (ex # act_bind # before (fun () -> print_endline "Before!")) ;
  ignore (ex # act_bind # after (fun () -> print_endline "After!")) ;
  ex # act

Implementation

The implementation consists in storing a list of functions to be called for both events (before the call and after the call), with the appropriate removal semantics (adding an element returns a unique identifier which can be used to remove the function).

type binding = int 

let add e = function
  | [] -> (0:binding), [0,e]
  | ((n,_)::_) as l -> (n+1), (n+1,e)::l 

let rec rm (i:binding) = function
  | [] -> []
  | (j,f)::t -> if i = j then t else (j,f)::(rm i t) 

class ['a,'b] callbind =
object
  val mutable before = []
  val mutable after = [] 

  method call (arg:'a) (f:'b) =
    List.iter (fun (_,x) -> x arg) before ;
    let result = f () in
    List.iter (fun (_,x) -> x arg) after ;
    result 

  method before f    = let (i,l) = add f before in before <- l ; i
  method after f     = let (i,l) = add f after in after <- l ; i
  method rm_before i = before <- rm i before
  method rm_after i  =  after <- rm i after
end


1170 feed subscribers
(readers who polled a feed this week)