Monthly Archive for September, 2008

Recursive Descent Parsers

Parsing and grammars

A grammar describes how to generate a sequence of tokens that satisfies certain rules. For instance, the grammar for a basic calculator would be:

e = 'integer' [Value]
  | e '+' e   [Add]
  | e '*' e   [Mul]
  | '(' e ')'

Parsing is the process of obtaining a derivation that generates a sequence of tokens from a grammar. For instance, a token sequence such as:

2 + 3 * 4

Can be created from the above grammar using one of the derivations:

Mul (Add (Value 2, Value 3), Value 4)
Add(Value 2, Mul(Value 3, Value 4))

Of course, the grammar used as an example is ambiguous (there are several possible derivations), and in the real world one prefers non-ambigous grammars (so that the result of the parsing is always unique). An example of non-ambiguous grammar for the above is a grammar that takes into account the differences in priority for addition and substraction. So:

base = 'integer' [Value]
     | '(' expr ')' 

add = mul
    | mul '+' add [Add]

mul = base
    | base '*' mul [Mul]

expr = add

Where the first derivation (which was incorrect as far as priorities are concerned) is now impossible, because an Add can only appear inside a Mul if it’s inside brackets.

Recursive Descent Parsers

The simplest way to construct a parser from a grammar is to use a recursive descent parser. Assuming that the input is tokenized (every integer and operator appears as a single string within a list of strings), then a parser that effectively recognizes the expressions above is:

exception Syntax 

type r =
  | Value of int
  | Add of r * r | Mul of r * r

let rec rule_base = function
  | [] -> raise Syntax
  | "(" :: t -> let e, t = rule_expr t in
                ( match t with ")" :: t -> raise Syntax | l -> e, l )
  | i :: t -> try Value (int_of_string i), t with _ -> raise Syntax
and rule_mul t =
  let a, t = rule_base t in
  match t with
    | "*" :: t -> let b, t = rule_mul t in Mul (a,b), t
    | _ -> a, t
and rule_add t =
  let a, t = rule_mul t in
  match t with
    | "+" :: t -> let b, t = rule_add t in Add (a,b), t
    | _ -> a, t
and rule_expr t = rule_add t 

let _ = rule_expr ["2";"+";"3";"*";"4"]

The parser correctly answers Add (Value 2, Mul (Value 3, Value 4)).

Handling Errors

A syntax error appears when the provided sequence does not follow the grammar, and there is therefore no possible derivation for that sequence. Syntax errors have three possible causes : a necessary token was forgotten (consider the sequence (2 + 3 * 4 where a closing brace is missing), a token was inserted where it doesn’t belong (consider the sequence 2 ++ 3 * 4 where a surnumerary plus was added) or a correct token was replacedwth an incorrect token (consider @ + 3 * 4 where the 2 was mistyped into an @). Most of the time, when a token that doesn’t fit is encountered, it’s difficult to determine whether that token is the error, or whether a token was forgotten before it (should 2 ++ 3 be 2 + 3 or 2 + 1 + 3?).

Because of this, most parsers merely indicate the problematic token where a syntax error has happened (this is what Objective Caml does) while others also add the list of tokens that could have been accepted (see for instance PHP’s infamous Paamayim Nekudotayim error message).

The problem is that naive Recursive Descent Parsers are quite bad at handling or detecting syntax errors as soon as alternatives are involved. Consider the grammar built on top of the previous one:

def = 'let' 'n' '=' def 'in' def [Def1]
    | expr                       [Def2]

prog = 'eof'                   [Prog1]
    | 'let' 'n' '=' def prog [Prog2]
    | def ';;' pro           [Prog3]

Consider now the sequence of tokens let n = 10 %, where the final token is obviously an error. The parser will explore the possible derivations in this order:

  • Prog1 fails (’let’ is not ‘eof’).
  • Prog2 runs and reaches ‘prog’:
    • Prog1 fails (’%’ is not ‘eof’)
    • Prog2 fails (’%’ is not ‘let’)
    • Prog3 reaches ‘def’:
      • Def1 fails (’%’ is not ‘let’)
      • Def2 fails (’%’ does not exist in ‘expr’)
    • … so ‘prog’ fails.
  • … so Prog2 fails.
  • Prog3 reaches ‘def’:
    • Def1 fails (’%’ is not ‘in’)
    • Def2 fails (’let’ does not exist in ‘expr’)
  • … so Prog3 fails.

The last failure encountered was that ‘let’ is not a valid initial token, since it cannot generate ‘expr’. Displaying this error is of course nonsensical : the actual error is the final token, not the first one.

The solution is to store the furthest token that generated an error (in terms of distance from the first token in the token stream) along with the list of tokens that it was expected to be. In this case, the token errors would be:

  • Token 0 (’let’) : expected ‘eof’, ‘(’ or ‘value’
  • Token 3 (’10′) : expected ‘let’ or ‘(’
  • Token 4 (’%') : expected ‘eof’, ‘let’, ‘(’, ‘value’ or ‘in’

And so the error provided would be along the line of “Syntax error on token ‘%’ after token ‘10′, expected ‘let’, ‘(’, ‘in’, a value or the end of file”.

Another key idea to keep in mind is the fact that one can remember opening braces, and when a closing brace is expected, signal the position of the opening brace.This would help for a missing ‘=’ after a ‘let’, for instance.

A sample implementation

The implementation can be found here.

Its interface header would look like this:

(** The description of a set of tokens. *)
module type TOKENS =
  sig
    (** The available tokens. *)
    type token 

    (** Errors generated when tokens are expected but
        not found. *)
    type error 

    (** Printing an error *)
    val string_of_error : error -> string 

    (** The end-of-file token. *)
    val eof : token
  end 

(** A recursive descent parser over a set of tokens. *)
module RecursiveDescentParser :
  functor (Tokens : TOKENS) ->
    sig 

(* ------------------------------------------------------ *) 

      (** Description of a failure *)
      type failure 

      (** Show a failure. Internally uses 
          Tokens.string_of_error on involved errors. *)
      val pretty_print : failure -> string 

      (** The exception thrown when a failure occurs. *)
      exception ParseError of failure 

(* ------------------------------------------------------ *)       

      (** An expectation function. Returns true if the token
          was expected *)
      type expect = Tokens.token -> bool 

(* ------------------------------------------------------ *) 

      (** A parsing context. This is what your parsing rules
          will be passing around. *)
      type ctx 

      (** A rule : receives a context, returns whatever was
          extracted from the context along with the new 
          context (includes any possible failures or removed
          tokens. *)
      type 'a rule = ctx -> 'a * ctx 

(* ------------------------------------------------------ *) 

      (** Get the next token in a context. Does not alter
          the context. Return Tokens.eof if no more tokens
          are available. *)
      val next : ctx -> Tokens.token 

      (** Get a context minus its first token. An empty 
          context remains an empty context. The original 
          context is not altered. *)
      val shift : ctx -> ctx 

      (** Create a new context from a list of tokens *)
      val stream_of_list : Tokens.token list -> ctx 

(* ------------------------------------------------------ *) 

      (** Have the context remember a certain failure 
          (to improve reporting) before trying again to 
          parse something else. *)
      val add_failure : failure -> ctx -> ctx 

      (** Raises a [ParseError] exception bound to the
          provided context and error. *)
      val fail : Tokens.error -> ctx -> 'a 

      (** Raises a [ParseError] exception.
          [fail_close err old ctx] works like 
          [fail err ctx], but the error is also bound 
          to a former context [old]. This serves to reference
          the opening brace tied to a missing closing brace,
          or similar constructs. *)
      val fail_close : Tokens.error -> ctx -> ctx -> 'a 

      (** Either the expectation is satisfied by the next
          token, or call [fail]. *)
      val expect : expect * Tokens.error -> ctx -> ctx 

      (** As [expect], but uses [fail_close]. *)
      val expect_close : expect * Tokens.error -> ctx -> ctx -> ctx 

(* ------------------------------------------------------ *) 

      (** Given a rule, creates a parenthesized version which
          [expect]s a first token, then reads the rule, then 
          [expect_close]s a second token, then returns the data
          returned by the rule. *)
      val between :
        expect * Tokens.error -> 'a rule -> expect * Tokens.error -> 'a rule 

      (** Given a list of rules, tries every one of them in
          order and returns the data of the first that did 
          not throw an exception. If all fail, also fails. *)
      val one_of : 'a rule list -> 'a rule 

      (** Reads as many instances of the rule from the input, until
          it fails, and returns the list in order. *)
      val many : 'a rule -> 'a list rule 

    end

The interface could certainly be improved (for instance, monadic manipulation of rules, or access to the internal representation) and so could the implementation (the manyfunction is not tail-recursive yet). Also, another of the classic issues of recursive descent parser (which is left-associative operators) is not solved.

An Introduction to C++ Pointers

This entire text was originally part of a post I made on the GameDev.Net forums, which I’ve reformatted slightly. It exposes what is in my opinion a good explanation of what C++ pointers are and what they can be used for, starting with the elementary concepts required to introduce the hellish semantics that they provide. Of course, this article in itself is no match for starting programming with a language that uses reference semantics, but it’s still a good starting point.

Without any more discussion, here is the explanation:

Values

C++ programs manipulate values. Typical values include integers, such as 42, characters, such as ‘x’, objects of more complex types, such as std::string or std::fstream, and also instances of any class you define in your program.

Variables and References

Values need to be manipulated, which involves giving them a name. A reference is a name given to a value. Creating a reference is done like so:

Type & name = value;

Where value is usually the return value of a function or operator…

std::ostream &o = (std::cout << "Text.");

…but might also be another name given to the same value:

std::ostream &o = std::cout;

Another way to create a reference is as a function argument:

void foo(Type & name)

This will cause the value which was passed as argument to be available under that name in the entire function body. This can be quite useful if you wish to modify that value.

Of course, almost any program needs to create values. In C++, values are always created using operator new (except for literals such as 42 or “Hello”, which simply exist). operator new exists in several versions. The default version generates dynamic values, which exist until delete is called:

int & value = * new int;
value = 10;
std::cout << value;
delete (&value);

This causes performance problems if used too often, because dynamic memory allocation is slow in a non-garbage-collected language like C++. This is why another, faster version exists: placement new creates a new value using existing memory. Of course, the lifetime of the value is then limited by the lifetime of the memory which was used to create it, and it must of course be manually destroyed by calling its destructor:

std::string & str = * new (existing_memory) std::string("Hello");
str += " world!";
std::cout << str;
str.~string();
// existing_memory dies somewhere after this line

The existence of placement new allows the programmer to use memory areas which are faster than the free store used by the default operator new: it may use stack memory, a specially reserved portion of global memory, or a subset of the memory used by another value, by increasing speed of allocation. For instance, using stack memory:

 int & value = * new (alloca(sizeof(int))) int;

Since allocating objects on the stack, in global memory or as a subset of an object is a very repetitive task which involves obtaining the memory, computing the size to be allocated, and then calling the destructor of the object right before the memory is gone, the C++ language provides shortcuts to do this. Take for example stack allocation of a string:

void sayhello()
{
  std::string & str = * new (alloca(sizeof(std::string))) std::string("Hello");
  str += " world!";
  std::cout << str;
  str.~string();
}

I have underlined the repetitive tasks that are related to stack allocation. The shortcut provided by C++ in this situation is as follows:

void sayhello()
{
  auto std::string str("Hello");
  str += " world!";
  std::cout << str;
}

The main differences here are the presence of the auto keyword, the absence of the & in the definition of the reference, and the absence of the str.~string() destructor call at the end of the function. The first two differences cause C++ to allocate a new block of memory on the stack (auto) which will be released when the function returns and initialize it with a new value from the string literal “Hello”. The auto keyword also causes the compiler to generate destructor calls: the object will be automatically destroyed right before the function exits (since that is when the stack allocation is released). In this situation auto is aptly named a storage specifier.

The other storage specifier is static, which uses global memory: defining a local variable as static causes the compiler to allocate some memory for it as part of the global data span (at compile-time) and then initialize the value at that spot in memory the first time the variable’s definition is encountered. Since auto is the default storage specifier, it is generally omitted from the definition, which leads to the typical variable definition that most C and C++ programmers are used to:

std::string str("Hello");
str += " world!";
std::cout << str;

Variables at global scope automatically use global storage.

Member variables of structures or classes use the third type of allocation: they use a bit of reserved memory that was part of the value of which they are a member, and they are initialized there as part of that value’s constructor. Their destructor is automatically called as part of the value’s destructor. Before:

struct super
{
  std::string & str;
  some_memory_buffer;
  super() : str(* new (some_memory_buffer) std::string) {}
  ~super() { str.~string(); }
};

After:

struct super
{
  std::string str;
  // No explicit memory buffer
  // Automatic construction
  // Automatic destruction
};

To summarize: references are names given to values. C++ provides special shortcuts to create a value and bind it to a reference at the same time, with special rules about when the created value is to be destroyed.

Iterators and Sequences

A sequence is, as its name hints at, a sequence of values. It could be, for instance, the sequence 1, 2, 3, 4. Or, it could be the set of opponents in a video game, in an arbitrary order. Or, it could be the sequence of characters being read from std::cin. In short, sequences are the fundamental concept used to represent groups of objects that are to be manipulated together.

A typical representation of sequences is through iterators, which are means of iterating over a sequence (accessing its elements in order). The basic operations required to iterate are the ability to read the “current value”, the ability to move on to the “next value”, and the ability to determine if the end of the sequence was reached and there are no values left. For instance, pseudocode to add together values of a sequence through iteration:

a = 0
while not end-of-sequence(iter)
  a += value(iter)
  iter = next(iter)

Various languages provide various forms of iterators, but the three operations outlined above are always present. For instance, Objective Caml uses (it = []), List.hd it and List.tl it as the end-of-sequence, value and next operations. C++ uses it == end, *it and ++it as the end-of-sequence, value and next operations. So, C++ code to add together the values of a sequence (represented by two iterators of type Iter representing the first iterator of the sequence and the first iterator past the sequence) would be:

template<typename Iter>
int sum(Iter begin, Iter end)
{
  int a = 0;
  while (begin != end)
  {
    a += *begin;
    ++begin;
  }
  return a;
}

The begin-end representation of sequences is standard. The end iterator is the first iterator after the sequence: as such, it has no associated value, and trying to obtain the iterator after end is invalid. So, you cannot *end or ++end. You can, however, compare end with another iterator to see if that other iterator has reached the end of the sequence.

The point of using a past-the-end iterator (one that isn’t in the sequence, but is actually right after it) instead of a last-element iterator (one that corresponds to the last element in the sequence) is twofold. First, it makes code less complex, because the condition “I have reached the past-the-end iterator on this iteration” is easier to evaluate than “I was working on the last iterator on the previous iteration”. Second, it allows representation of empty sequences (since the empty sequence has no last element and as such could not be represented using a last-iterator approach).

An iterator which supports only *it and ++it is called a forward iterator. Some iterators also support –it (which moves to the previous element), making it a bidirectional iterator. Some iterators also support it + n and it – n, allowing the iterator to move an arbitrary number of steps backward or forward in a single jump, which is called a random access iterator. Iterators may in turn be read-only (input iterator), write-only (output iterator) or support both read and write.

Iterators are a very powerful concept: most of the things in C++ which look like sequences can be turned into iterators.

  • The sequence of elements in a vector is represented by vect.begin() and vect.end(), which are random access read-write iterators (or read-only, if the vector is constant).
  • The sequence of elements in a list is represented by list.begin() and list.end(), which are bidirectional read-write iterators (or read-only, if the list is constant).
  • The sequence of objects of type T read from an std::istream such as std::cin is represented by std::istream_iterator<T>(std::cin) and std::istream_iterator<T>() (past-the-end iterator) which are forward input iterators.
  • Adding a sequence of elements at the end of any non-associative standard library container c is represented by std::back_inserter(c), with no end iterator (this could be an infinite sequence) and is a forward output iterator.

The list goes on and on. This system is then combined with iterator-based algorithms, such as std::copy(src_begin, src_end, dest_begin), where src_begin and src_end are at least forward input iterators representing the input sequence to be copied, and dest_begin is at least a forward output iterator representing where the input sequence should be copied to. As such, reading as many integers as possible from standard input and placing them in a vector is as simple as:

std::vector<int> integers;
std::copy( std::istream_iterator<int>(std::cin),
           std::istream_iterator<int>(),
           std::back_inserter(integers) );

Or, we could add together the values read on the standard input using our function above:

std::cout << sum( std::istream_iterator<int>(std::cin),
                  std::istream_iterator<int>() );

Just like we could add together the elements of a vector:

std::cout << sum( integers.begin(), integers.end() );

Pointers

Pointers are a concept originally introduced in C. From a semantic point of view, pointers are random access read-write iterators. As iterators, they represent sequences of contiguous elements of the same type—such sequences are created either using a vector, or using a block allocation approach such as new[] or arrays.

Given a block of N objects of type T, a pointer representing that sequence is of type T*.

Depending on the nature of the block, obtaining the first pointer begin in the sequence varies.

  • T *begin = new T[N]; will directly return the first pointer in the sequence.
  • If a is an array of N objects of type T then T *begin = a; will create the first pointer in the sequence of elements of a.
  • Otherwise, if t (of type T) is the first element of a sequence (as opposed to the first iterator of a sequence), then T *begin = &t; is the first iterator of the sequence.

The past-the-end iterator of a sequence is simply defined as T *end = begin + N;

int values[10] = { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 };
std::cout << sum(values, values + 10);

As with any other iterator, going beyond the bounds of a sequence (before begin or after end) results in undefined behavior, which will usually lead to random misbehavior of your program.

All of the above also holds when N = 1 (which means that a single object is also a one-object sequence, which can as such be manipulated using a pointer).

The main problem with pointers is that this is not everything. If pointers were merely iterators, then people would have no more trouble with them than they would have with the other iterators in the standard library. However, pointers were introduced in the C language as a means to represent several other concepts.

The first of these concepts is reference semantics. You see, in C, there are no references per se: every time you create a variable, it also creates a value. This makes it difficult to handle dynamic memory (because every variable you create already has its own value, so how do you give your dynamic memory a name in order to use it?) and allow functions to modify a variable in another function (because all the arguments to your function are, by default, brand new values that will disappear when the function returns, and you have no access to the values in the calling function). The solution adopted was to decide that, since pointers are read-write iterators, and every value is a one-element sequence by definition, then pointers can be used to give indirect names to values.

For instance:

 /* Solve aX² + bX + c */
int find_solutions(float a, float b, float c, float *x1, float *x2)
{
  float delta = b * b - 4f * a * c;
  float sq;  

  if (delta < 0f)
    return 0;
  sq = sqrt(delta);
  *x1 = (- b - sq) / (2 * a);
  *x2 = (- b + sq) / (2 * a);
  return 2;
}

Important note: the above is a C function which uses idioms not found or advised in C++ code. The C++ equivalent would use references instead of pointers, and would probably return a boost::optional<std::pair<float,float>> anyway.

This function needs to return the number of solutions (zero or two), but it also needs to send back the values of those solutions. In order to do so, it uses a trick which consists in being given two iterators to one-value sequences and modifying the unique value in each sequence. Since a copy of an iterator of a sequence is a new iterator of that same sequence, the iterator allows the function to modify the original sequence—and the values inside that sequence are the values from within the calling function.

int main()
{
  float x1, x2;
  float *px1 = &x1; // An iterator to the one-element sequence 'x1'
  float *px2 = &x2; // An iterator to the one-element sequence 'x2'  

  find_solutions(1f, 0f, -1f, px1, px2);   

  // The function call changed the contents of the sequences
  // iterated by px1 and px2, namely x1 and x2. As such, the
  // values of x1 and x2 have been changed.
}

Of course, C++ provides references, which is why pointers are not used for pass-by-reference in C++ at all (as references provide a far more simple representation of one-element sequences).

Another use of pointers, which remained in C++, is the representation of reseatable references. In C++, a reference is a name given to a value, and the value is bound to the name forever. It is not possible to bind the name to another value (though it is possible, of course, to bind several names to the same value). Reseating a reference means binding it to another value, and is possible in languages such as C# or Java, but not in C++.

However, pointers are merely iterators, and it’s possible to assign an iterator to another (thereby changing the sequence that the iterator is traversing, or the point in the sequence where the iterator is)—after all, this is the entire point of traversing a sequence with a single iterator!

Therefore, when it becomes useful to have a reference to a value, but the ability to change that value is also useful, people sometimes use a pointer (though, in modern C++, there are other types, such as boost::shared_ptr or boost::weak_ptr, which do a far better job). For instance, if you wish to reference a fighter’s target (and fighters can change targets at will):

struct fighter
{
  object *target;
}

As such, a fighter’s target can be changed by changing the pointer so that it corresponds to another value instead of the original one.

A third use of pointers, introduced in C, was optional semantics. An option type allows the possibility for a value to be absent. For instance, one could decide that sqrt returns a float option: either its argument is positive and it returns a float, or it is negative and then it returns nothing.This is quite similary to the option types found in ML languages, for instance Objective Caml’s ‘a option.

Instead of defining an option type distinct from an iterator type (as C++ does for all its iterators), C merged the two together, and this carried over to C++. A pointer can be constructed from the integer constant zero, and thus becomes the null pointer. This is a compile-time effect: the compiler notices that you’re assigning zero to a pointer, and transforms that integer into the actual value of the null pointer (which may be something entirely different from zero).

The null pointer may not be used in any way (increment, decrement, arithmetic, dereferencing), except to be compared for equality with another pointer, to be assigned a new pointer, or to be tested as a boolean condition (a null pointer always evaluates to false, while other pointers evaluate to true).

To summarize: pointers are bidirectional read-write iterators used in C for reference semantics and reseatable reference semantics, as well as a clunky option type.

AJAX Without Javascript

AJAX is an acronym for Asynchronous Javascript And XML. It allows changing parts of a page based on data present on the server without having to reload the entire page.

The technical solution involves some javascript code that is executed by the browser and sends an HTTP request to the server. The server then responds with the appropriate data and the javascript code parses the data (originally in XML, now currently in the JSON serialized format) and displays it where required.

Writing AJAX by hand quickly became old, and instead several javascript libraries provide facilities to perform AJAX queries. One such library (used in this example) is jQuery.

The problem

AJAX still requires a reasonable amount of work both on the server and on the client. Server-side, the programmer has to set an access path so that the AJAX requests are received and handled, and must also handle the sending back of data to the page. Client-side, the programmer still has to specify what data to put where.

Let’s consider a simple problem: a page has two buttons labeled “Increment” and “Reset”, a numerical counter (which is incremented by the increment button and reset by the reset button), and a short sentence which describes how many times the counter has been reset.

Ideally, the page must satisfy that:

  • Clicking the Increment button may only have to reload the two counters (the actual counter, and the increment count).
  • Clicking the Reset button may only have to reload the actual counter, without touching the increment count.
  • There should not be two code paths based on whether the page is being loaded through normal means or altered by clicks.
  • Every button click must save the data on the server so that it sticks around on the next visit or when reloading.

The solution

In a clean MVC fashion, the Model is the $_SESSION variable provided by PHP. The controller is a set of functions that alter and print the model:

session_start();
class Controller
{
  function url()   { return 'http://www.example.com/mypage.php'; }
  function incr()  { $_SESSION['value'] ++; $_SESSION['incr'] ++;}
  function reset() { $_SESSION['value'] = 0; }
  function value() { return (string)$_SESSION['value']; }
  function incrs() { return (string)$_SESSION['incr']; }
}

And the view is a special script that is to be compiled by a special compiler:

<html>
  <head>
<?
$this -> action('js_incr' , 'incr',  array('value', 'incrs'))
      -> action('js_reset', 'reset', array('value'));
?>
    <title>Example page</title>
  </head>
  <body>
<ul>
  <li>
    <button onclick="js_incr({})">Increment</button>
    <button onclick="js_reset({})">Reset</button>
  </li>
  <li>
    The value is <? $this -> span('value'); ?>.
  </li>
  <li>
    The value has been incremented <? $this -> span('incrs'); ?> times.
  </li>
</ul>
  </body>
</html>

The compiler itself is run as follows (of course, it’s also possible to save the compiled code to a file and load it without having to compile it again):

require_once( 'AjaxCompiler.php' );
$compiler = new AjaxCompiler('view.php');
$ajax = $compiler -> getCompiledObject(); 

print $ajax -> react($_POST, new Controller());

The compiler executes the file, keeping everything in HTML format but replacing the function calls with the appropriate data. The action function specifies that a javascript function (named after the first argument) must send all its arguments to a PHP function on the server (named after the second argument) that is a member function of the controller (the controller is the second argument to the $ajax -> react() call) and return that function’s output converted to javascript, then refresh the spans and divs named in its third argument.

Spans and divs are generated with the appropriately named functions of the controller (which may use any means of rendering seen fit) and sent along with the AJAX response to the original query so that they are replaced in the XHTML document. All the binding is done automatically by the compiler, so that the same functions are used for generation both in normal querying mode and in AJAX mode, and yield identical results for normal XHTML.

Work in progress

Many things remain to be done. hierarchical AJAX documents are still not possible, nor are parametric spans and divs (for instance, for editing individual elements of a list of elements). Also, the system has not yet been tested with javascript-enabled data in those spans and divs that are se

SQL Generation, Part 2

Introduction

This is a follow-up of an older article, which examined automatic generation of relational database tables to represent an entity-property-binding data structure. That article concluded by stating that the next step would be the generation of queries.

Deletion, modification and reading queries all have the ability to affect a subset of the available instances of an entity. This article is dedicated to providing subset semantics and a way to compile them to SQL.

Naive solution

In the most naive implementation, subsets would be either singletons (that is, all operations affect a single instance at a time) or the entire set (that is, all operations affect all instances). A simple modification would also allow constructing subsets through union operations from the singletons.

val all : entity -> subset 
val singleton : id -> subset 
val union : id list -> subset

Subsets would then be of the form ” or ‘WHERE id = $1 OR id = $2 …‘, which can be generated in mere seconds from Objective Caml code, and could be easily appended to any SQL query (select, delete, update) to determine the working set for that query.

The advantage of this solution is that it’s very simple to implement. The problem is that it moves all the processing from the database to the host code, which then has to sort through heaps of data (and send huge queries to select a large subset that’s not the entire set) to achieve anything meaningful.

Predicate-based solution

Most queries performed on a database are either based on identifiers (thus possible with singletons) or based on logic predicates. As such, a subset is merely the set of all instances of an entity which satisfy a given predicate (such as having a certain e-mail address, or being born before a certain date, or simply being the sole element of a singleton). This semantic approach is used in SQL (the WHERE clause introduces a predicate) and showcased with a distinct syntax in a recent article.

A basic predicate system involves writing a first-order logical expression (that is, without quantifiers) which applies to a single variable (an instance of the appropriate entity) to support the following operations:

P(x) = atom(x) | P(x) × P(x) | P(x) + P(x) | ¬ P(x)

An atom is any elementary comparison between two expressions involving constants, query parameters, the identifier of the instance and its properties. For instance, an example of atom would be less (property “created”) (sub curDate (days 30)), indicating that the instance is older than 30 days, corresponding to WHERE created < DATEADD(”dd”,-30,CURRENT_DATE()) (your SQL dialect may vary).

In this situation, the subset can be translated into an SQL sentence to be placed after the WHERE, with the rules:

atom     -> compile(atom)
Not(p)   -> "NOT (" tr(p) ")"
And(p,q) -> "(" tr(p) "AND" tr(q) ")"
Or(p,q)  -> "(" tr(p) "OR" tr(q) ")"

A preprocessing operation could be to move all Not operations to atoms using the De Morgan rules, and instead negate only atoms, or even convert the predicate to a normal form.

By encapsulating subset operations in a type-safe, no-leak system, it becomes possible to propagate the construction of SQL queries to helper modules, for instance a module that provides the subset of recent elements  (younger than 30 days), which may then be intersected or unified at will with other subsets to obtain a full query.

Further work

All of the above does not handle bindings (which are the entity-property-binding representation of cross-table searching). The possible semantics, along with the actual implementation of bindings through SQL joins (and the consequences of that implementation on SQL UPDATE and DELETE queries) remains to be seen in a futher article.

Array iterators

Array, pointers and iterators

Consider a typical operation of outputting an array of data to the screen:

const char *users[] = { "Alice", "Bob", "Charlie" }; 

for(int i = 0; i < 3; ++i)
  std::cout << users[i] << "\n";

This can be performed in a more functional way by using algorithms and iterators:

std::copy(users, users + 3,
          std::ostream_operator<const char*>(std::cout, "\n"));

While marginally longer, this code has the advantage that the output iterator can be replaced with any other (for instance, one that appends to a vector). This relies on the fact that arrays decay to pointers when they are used in a pointer context (be aware that arrays are not pointers, however) and that pointers are iterators.

The main issue, however, is the compile-time constant 3 which represents the size of the array. Should the contents of the array change over time, one may forget to update the constant (even if it’s defined as a named constant on the line that immediately follows the array).

Arrays as template parameters

Luckily, it’s possible to solve this problem by using a template function to extract the size of an array (since the size of the array is part of its type). The code and application:

template <size_t N, typename T>
size_t size(T (&)[N]) { return N; } 

std::copy(users, users + size(users),
          std::ostream_operator<const char*>(std::cout, "\n"));

This works by having a template function that has both the type and the size of an array as template parameters. Since the template parameters are inferred (when possible) from the types of the arguments provided to the function template, both the type (which is ignored) and the size are extracted from an array reference. Even better, unlike the brittle sizeof(users)/sizeof(users[0]) construct, this code will throw a clean compile-time error if for some reason users becomes a pointer and the size is not available anymore.

The next step is obviously:

template <size_t N, typename T>
T* end(T (&a)[N]) { return a + N; } 

std::copy(users, end(users),
          std::ostream_operator<const char*>(std::cout, "\n"));

Zend Controllers

The Zend Framework dispatches the provided URL to a controller action which reacts to that URL. The standard dispatch rules can be summarized as follows:

  1. The dispatcher examines the first segment of the URL and matches it against the names of the registered modules. Note that a default module may be used if no other module matches.
  2. The dispatcher then examines the next segment of the URL and matches it against the names of the controllers available in the directory bound to the module. So, if the segment is auth, it will look for a file named AuthController.php. By default, if there is no match, the IndexController.php file may be used (if it exists).
  3. The dispatcher then examines yet another segment of the URL and matches it against the names of the actions available in the controller class. So, if the segment is logout, it will look for a method named logoutAction. By default, if there is no matching, the indexAction may be used (if it exists).
  4. Any remaining parts of the URL are provided to the selected action.

So, basically, the initial configuration of the Zend Dispatcher will generate a set of URL prefixes which are bound to member functions of fresh instances of classes in the file system. This leads to the obvious question: what is that set of prefixes for my website?

In my case, the list would be:

-- module 'admin'
  -- controller 'auth'
    /admin/auth
    /admin/auth/logout
  -- controller 'help'
    /admin/help/page
  -- controller 'index'
    /admin/
  -- controller 'pages'
    /admin/pages
    /admin/pages/page
    /admin/pages/save
    /admin/pages/ajaxmove
    /admin/pages/ajaxrename
    /admin/pages/ajaxdelete
    /admin/pages/ajaxnew
  -- controller 'users'
    /admin/users
    /admin/users/ajaxnew
    /admin/users/ajaxdelete
    /admin/users/ajaxupdate
-- module 'public'
  -- controller 'index'
    /

So, for instance, /admin/pages/page/id/1 would forward /id/1 to the page action of the pages controller of the admin module, whereas /home/about-us would forward /home/about-us to the index action of the index controller of the public module (which is the default module).

The Tool

The above list was automatically generated by a tool. The core of that tool is this:

class _privateControllerExplorer {
  function __construct($modules, $default_module) {
    $this -> _modules = $modules;
    $this -> _default_module = $default_module;
  } 

  function explore($root_path) {
    $modules = array();
    foreach ($this -> _modules as $name => $path)
      $modules[$name] =
        array( "path" => $root_path . $path,
               "controllers" => $this -> explore_module($root_path . $path, $name));
    return $modules;
  } 

  private function explore_module($path, $module) {
    $uri = $module == $this -> _default_module ? '/' : "/$module/";
    $controllers = array();
    foreach (glob("$path/*Controller.php") as $controller) {
        $name = basename($controller);
        $name = str_replace('Controller.php', '', $name);
        $name{0} = strtolower($name{0});
        $controller_uri = $name === 'index' ? $uri : "$uri$name";
        $controllers[$name] =
          array( "path" => $controller,
                 "actions" => $this -> explore_controller($controller, $controller_uri));
      }
    return $controllers;
  } 

  private function explore_controller($path, $uri) {
    preg_match_all('%function[^a-zA-Z0-9_{]*([a-zA-Z0-9_]*)Action%',
                   @file_get_contents($path), $matches);
    $actions = array();
    foreach($matches[1] as $action)
      $actions[$action] = $uri . ($action == 'index' ? '' : '/' . $action);
    return $actions;
  }
} 

function explore_controllers($modules, $default_module, $root) {
  $explorer = new _privateControllerExplorer($modules, $default_module);
  return $explorer -> explore($root);
}

One should call the function defined at the bottom using the arguments:

  • The modules are a hash binding every module name to a directory where the controllers for that module are to be found. This is usually extracted quite easily from the configuration of your FrontController (where controller directories are bound to module names).
  • The default_module is the name of the default module (or null if there is no default module). This can also be found in the configuration of the FrontController.
  • The root is the absolute path to the position from where the controller directories are defined. That is, controller directories are assumed to be relative paths, and must be completed with the root path.

It should be fairly trivial to write a function which extracts that information from the FrontController.

The tool obviously makes a few assumptions. The first assumption is that the standard Zend dispatcher is used (otherwise, the rules for dispatching may not match what the tool is attempting to extract). The second assumption is that every controller file has a fairly simple layout : instead of intrusively interpreting the file in PHP (which may require the inclusion of other files and have side-effects) the tool merely processes it with a regular expression that could definitely use some improvement.

(Meta) language, Part 1

What is this?

This article proposes a simple modeling language with a few syntactic and semantic tricks that allow using the language itself as a metalanguage. The language would be used to generate a persistent data model and several queries over that data model (both to read and to write). The underlying system for representing the data is the entity-property-binding architecture from this article.

Definitions and queries

The system defines entities, properties, bindings and queries. Queries, in turn, will involve subsets (filtering entity instances based on predicates involving their properties and bindings) and operations performed to all instances in these subsets.

Consider a simple data model for a library: an author has a name, a book has an author, a title and a publication date. Subsets which we would like to manipulate (beyond singletons) are the list of books of an author, the list of books with a certain title, the books published between two dates, and the authors with a certain name. Using the appropriate syntax:

entity author
property name[author] : string

entity book
property title[book] : string
property publication[book] : date

binding written_by[book : many book ; author : one author] 

def authorByName n =
  author |? name[_] = n
end

def booksByTitle t =
  book |? title[_] = n
end

def booksBetweenDates f l =
  book |? publication[_] > f |? publication[_] < l
end 

def booksByAuthor a =
  book |? written_by[book : _ ; author : a]
end

The syntax of defining entities, properties and bindings is pretty straightforward : between square brackets, the involved entities are mentioned with the appropriate multiplicity (for bindings) or value type (for properties). Subsets are a little bit more complex: they involve a simplified functional notation.

Every entity, when used in a subset expression, represents the set of all instances of that entity. This set is then filtered using the |? operator : list |? predicate is equivalent to the Objective Caml List.filter predicate list. This involves simplified functional notatioin for making anonymous function definition shorter. Here, name[_] = n is equivalent to (fun x -> name[x] = n). To determine which expressions belong to the anonymous function and which don’t, the rule implies only syntactic context. That is, the anonymous function starts as the placeholder argument _ and grows until it reaches a context where a function would be expected (such as the function side of the operator |? or any other functional operator).

Other functional operators:

x | f is equivalent to f x

x |: f is equivalent to List.map f x

x |! f is equivalent to List.iter f x ; x

exists x: f is equivalent to List.exists f x

forall x: f is equivalent to List.forall f x

In terms of the language, sets of entities are filtered and manipulated this way. The fundamental piece of behavior in this language is therefore chaining a value through several mutators.

A few example of queries (reading the titles and publication dates of books by an author, deleting an author, changing a book title and adding a new author):

def titlesOfBooksByAuthor a =
  booksByAuthor a |: title[_], publication[_]
end

def deleteAuthor a =
  booksByAuthor a |! delete ;
  delete a
end

def setBookTitle t b =
  title[b] <- t
end

def addNewAuthor n =
  new author | name[_] <- n
end

This introduces three new constructs: delete merely deletes its argument (an entity instance) from the data model and returns unit. property[instance] <- value sets the property of the selected instance to the specified value, then returns the instance (to allow chaining (as in the example of the add-new-author query which returns the author after setting its name). new entity creates a fresh instance of the entity, for which all properties must be set before they are read. Constraint violations (binding multiplicity, the fact that properties are used before being initialized) are signaled as runtime errors and there are attempts to detect them at compile time. Queries are transactional, meaning that an exception or runtime error rolls back the entire transaction.

Last but not least, let us introduce modules to handle all the namespacing issues, using the same convention that modules start with a capital letter and non-modules don’t (this is only a convention, not a lexical constraint). The chosen syntax is:

def Library =
module
  entity author
  property name[$.author] : string

  entity book
  property title[$.book] : string
  property publication[$.book] : date

  binding written_by[book : many $.book ; author : one $.author] 

  (* ... *)

end 

By default, the $ variable represents the current module (so I could create a function which takes an argument that has the same name as a module member, and use name to represent the parameter and $.name to represent the module member). Of course, this variable is optional, which means that when a variable name cannot be resolved, a ‘$.‘ is appended to see if it exists as a module member (should this fail, an error is launched.

Metaprogramming

In the current state, all objects are first-class citizens, including modules. This means that functions can be used to manipulate entities, properties, queries, modules, and so on. For instance, assuming that testToken determines if an authorization token is valid:

def addRightsControl func token object =
  if ! testToken token object then throw AccessDenied ;
  func object
end

def safeSetBookTitle title token book =
  addRightsControl (setBookTitle title) token book
end 

An interesting addition here is the pattern keyword. First, let’s consider its usage and semantics, and let’s then examine why it makes sense within the language semantics:

pattern UriIdentified decoratedEntity =

   (* To indicate violation of the uniqueness rule *)
  exception UriNotUnique : string

  (* Associate an URI to every instance of the entity. *)
  property uri[decoratedEntity] : string

  (* Access an element by its URI. *)
  def byUri uri =
    decoratedEntity |? $.uri[_] = uri
  end

  (* Overload operator 'uri[object] <- uri' to enforce uniqueness. *)
  def write($.uri) val object =
    if ! empty ($.getByUri val) then throw ($.UriNotUnique val) ;
    $.uri[object] <- val
  end
end

def Page =
module
  (* Define basic properties of a page *)
  entity page
  property content[$.page] : string

  (* Associate pages to Unique Resource Identifiers *)
  usepattern UriIdentified $.page

  (* Provide an URI-to-content shortcut. *)
  def getContentByUri uri =
    $.getByUri uri |: $.content[_]
  end
end

This example creates a pattern : a means of extending the behavior of a module by adding new entities, properties, bindings and other various definitions to that module. The pattern in the example introduces a new exception, a new property (bound to a specified entity that was passed as an argument), a shortcut for accessing the entity (if any) with an identifier and an overloaded version of the write-operator for the property. This pattern can then be used as part of any module using the usepattern statement.

Simple things for simple people

The language now has a variety of keywords with seemingly distinct functionality. This is not actually the case: all the functionality above can be explained using the same rules revolving around chaining through a sequence of mutators.This is because, fundamentally, statements such as entity, property, usepattern, binding, exception or def are just that: mutators.The rules for these mutators are:

The expression… …returns the original module with an additional…
m entity e
entity
m def name = expr end
field with the expression bound to the name
m property p[...] : …
property
m binding b[...]
binding
m exception ex : …
exception
The expression…
…returns…
module
an empty module (with no members).
m pattern name args = defs end
same as m def name args x = x defs end
m usepattern f same as f m

In particular, a pattern is nothing more than a function which is applied to a module and returns a module, and usepattern is just a synonym for the pipe operator | (except that the lambda argument _ is replaced by $ to the right of an usepattern statement). In general, to the right of any of the keywords above save pattern, the dollar symbol represents the module being modified.

So, in essence, adding a function to a module is not any more complex than:

def addSomeMethod mod f =
  mod def method = f end
end 

(addSomeMethod module { x -> x + 1 }).method 10 (* Outputs 11 *)

An additional detail about definitions is that they overwrite previous definitions with the same name (this is also true for built-in operations, such as writing to a property). However, it’s also possible to later remove a definition, which restores the last available definition.This allows defining private elements in modules, but also in patterns which don’t want to overwrite existing definitions with the same name in other modules.

def example =
module
  def frobnicate x = x + 1 end
  def one x = $.frobnicate x end (* Returns x + 1 *)

  def frobnicate x = x / 2 end
  def two x = $.frobnicate x end (* Returns x / 2 *)
  hide frobnicate

  def three x = $.frobnicate x end (* Returns x + 1 *)
end

Other ideas

This is just a basic overview. The next articles in the series will deal with:

  • Using names as first-class variables (manipulating names, concatenating names, generating fresh names locally).
  • Some example patterns that can be easily accomplished with the system.
  • Reflection facilities for exploring a module (in addition to merely adding things to it).
  • A type / predicate system for checking and for reflection.
  • Deciding between runtime metaprogramming and compile-time metaprogramming.

Do Not Use Inheritance

(… to extend existing functionality)

Inheritance does not support inheritance

Inheritance serves two distinct purposes:

  • Reusing and extending existing functionality (represented by the parent class).
  • Creating a subtype relationship, allowing instances of the new class to be used wherever instances of the parent could.

With the notable exception of protected members, the first property can be equally obtained with composition, and the ultimate objective of protected members usually implies that inheritance will be used primarily to specify new behavior for the existing interface of the parent class, instead of being used to extend the functionality of the parent class.

The second property can be used anywhere an instance of the parent class exists, by replacing that instance with one from the new class. The only exception to this is… inheritance!

That is, if the classes FirstDerived and SecondDerived both inherit from the class Base, the parent instance of every FirstDerived instance will always be an instance of Base, and cannot be set to be an instance of SecondDerived. By contrast, with composition, it’s possible to use an object provided as a constructor argument, which may be an instance of any class that inherits from Base.

When does this matter?

In a well-written program, never. After all, if a class follows a typical is-a constraint for its inheritance graph, then nobody will ever want to use a Dog as the parent instance of a Cat (even though both inherit from Animal).

When it matters, however, is the situation where the is-a constraint is not respected. A recent example was the Digitalus CMS, written in PHP using the Object system (though I would not go as far as saying that its design is Object-Oriented). That CMS had a base Content class representing the underlying database representation of content, which was effectively a monostate, and inherited two classes, Page and ContentPage, from that base class. I will ignore for a short while the naming issues there (calling Page something which inherits from what is ultimately a page manager, as well as using only two words for three classes) and dive into the real problem : while Page is-a Content (the former being an administration-only manager allowing modification, while the latter is a guest-only manager), ContentPage is a page of content, which is precisely the kind of thing that Content manages. And therefore, even though a page-of-content is not a content-manager (and is never used as such), page-of-content inherits from content-manager.

In fact, ContentPage goes as far as smashing around the Liskov Substitution Principle with a chainsaw and a sledgehammer, by actually removing from its interface most of the functions provided by its Content parent. Why? To get a handful of functions from the parent available in the child.

My objective was to extend the behavior of Content by creating a new class, LocalContent, that would allow different websites to exist together inside the same Digitalus CMS instance (sharing the same administration), and thus needed to perform site-page associations as part of that model.

However, to extend the behavior of Content, one cannot use inheritance : pages of content are only compatible with the original content manager ! Since the extended content manager cannot be used in all situations, one cannot simply change the base class of ContentPage (this would break code where a Content is necessary), which implies either code duplication or heavy refactoring of third party code (and the implicit compatibility loss with future updates of the tool). By contrast, had the original author used composition, the simple solution would have been to create an additional constructor (for instance, through default parameters in the original constructor) allowing the user to specify what instance of Content or one of its subclasses the content page should use.

The chosen solution was to trash the ContentPage implementation, and write my own (functionally equivalent) version of it using LocalContent through composition. So much for extensibility.

Conclusion

Using inheritance to incorporate the behaviour of a class, when there is no clear is-a relationship, prevents you from extending the base class and incorporating those extensions in your own (something which may indeed prove useful, since no is-a relationship would preclude the existence of such extensions). By contrast, composition adds only minimal overhead (in terms of code to be written) and keeps the benefit of inheritance-based polymorphism for the class you’re writing.

XSLT as a Templating System

XSL is a programming language (which uses an XML-compatible syntax) used to describe a transform from an XML file to another XML file (though it may also be used to generate non-XML documents with a little bit of hacking).

Consider a simple XML file describing a filesystem tree: every node contains a name and optionally some children.

<dir name="home">
  <dir name="arkadir">
    <file name=".emacs" />
    <file name=".zshrc" />
  </dir>
  <dir name="root">
    <file name="TWiki.tar.gz"/>
  </dir>
</dir>

One would wish to convert this to bit of XHTML such as the following:

<ul><li><p>home</p>
  <ul><li><p>arkadir</p>
    <ul><li><p>.emacs</p></li>
        <li><p>.zshrc</p></li>
    </ul></li>
        <li><p>root</p>
    <ul><li><p>TWiki.tar.gz</p></li>
    </ul></li>
  </ul></li>
</ul>

A stylesheet to perform that transform would be:

<xsl:template match="dir">
  <li>
    <p><xsl:value-of select="@name"/></p>
    <ul><xsl:apply-templates/></ul>
  </li>
</xsl:template> 

<xsl:template match="file">
  <li><p><xsl:value-of select="@name"/></p></li>
</xsl:template> 

<xsl:template match="/">
  <ul><xsl:apply-templates/></ul>
</xsl:template>

Stylesheets work based on templates, where every element in the source tree may be associated with one or more templates. Whenever a template matches the currently explored node, the code for that template is inserted into the output tree, with the special elements in the template being replaced with the appropriate values : xsl:value-of inserts the text value of an attribute or XSL variable, or the text contents of an element, while xsl:apply-template will traverse the child nodes and attempt to match them with templates. The replace-with-result semantics allow the writer to specify how the generated code should be wrapped at every level.

Making it real

To perform any kind of serious work in a programming language, one needs to have access to abstraction functionality: the ability to factor identical code into code blocks (xsl:template), the ability to make abstractions based on parameters (xsl:param), and the ability to separate the code into different files (xsl:import).

An issue appears from the fact that xsl:apply-templates will apply all available templates for the given elements, which makes it difficult to factor out code (for instance, using distinct templates for a given element when generating a quick view and a full view within the same file). The element xsl:for-each can be used to inline that behavior, but it still prevents factoring.

The solution comes from calling templates:

<xsl:template name="long-dir">
  <li>
    <p><xsl:value-of select="@name"/></p>
    <ul>
      <xsl:for-each select="dir">
        <xsl:call-template name="long-dir"/>
      </xsl:for-each>
    </ul>
  </li>
</xsl:template> 

<xsl:template name="short-dir">
  {
  <xsl:value-of select="@name"/>
  <xsl:for-each select="dir">
    <xsl:call-template name="short-dir"/>
  </xsl:for-each>
  }
</xsl:template>

Instead of executing templates based on matching the input elements by XPath, one can select the appropriate elements using xsl:for-each and then use the appropriate template with xsl:call-template. The code above, for instance, uses this property to define two recursive directory printing templates, one which uses lists and the other which uses text-mode with delimiting brackets (and an output of { home { arkadir } { root } } for the example above).

Using helpers

The ability to import and call individual templates makes it possible to create libraries of helpers that can be included at will. For instance, to insert a button that hides/shows an element, one could write the following template:

<!-- A template that inserts a show/hide toggle button.
     Param 'target' : the identifier of the target element.
     Param 'show' : the display style when visible. -->
<xsl:template name="toggle">
  <xsl:param name="target"/>
  <xsl:param name="show"/>
  <button style="width:20px ; height:20px ; float:left">
    <xsl:attribute name="onclick">
      toggle('<xsl:value-of select="$target"/>','<xsl:value-of select="$show"/>');
    </xsl:attribute>
  </button>
</xsl:template> 

<xsl:template match="dir">
  <!-- Generate an unique identifier -->
  <xsl:variable name="id"><xsl:value-of select="generate-id(.)"/></xsl:variable>
  <xsl:variable name="show">block</xsl:variable>
  <li>
    <!-- Insert a toggle button -->
    <xsl:call-template name="toggle">
      <xsl:with-param name="target" select="$id"/>
      <xsl:with-param name="show" select="$show"/>
    </xsl:call-template>
    <p><xsl:value-of select="@name"/></p>
    <ul>
      <!-- Specify the ID of the 'ul' element to bind it
           to the toggle button. -->
      <xsl:attribute name="id">
        <xsl:value-of select="$id"/>
      </xsl:attribute>
      <xsl:apply-templates/>
    </ul>
  </li>
</xsl:template> 

<xsl:template match="file">
  <li><p><xsl:value-of select="@name"/></p></li>
</xsl:template> 

<xsl:template match="/">
  <!-- Insert the script containing the toggle function -->
  <script>
function toggle(id, show)
{
  var target = document.getElementById(id);
  if (target.style.display == 'none')
    target.style.display = show;
  else
    target.style.display = 'none';
}
  </script>
  <ul><xsl:apply-templates/></ul>
</xsl:template>

Assuming that the toggle function (written in javascript) is provided in the resulting file, one can simply bind a new toggle button to any element on the page, and the button will then automatically perform show-hide operations on that element.

Templating system

A templating system is plugged at the output end of a web site. The web site generates print instructions (in the form of function calls, native objects or generated XML) that are then processed by the templating system and turned into an XHTML document that is sent to the user.

Zend views apply this process with native objects : the data required for rendering is inserted into a Zend_View object, and a template is a PHP file which reads data from the view object and wraps it in XHTML.

When using XSL templates, the web site generates an XML file which is then processed by the templates. The process is intrinsically slower than using PHP objects (because one has to generate the XML file, then invoke the processor, as opposed to loading up an object and using the same interpreter to turn it into XHTML), but can be performed on the client side and thus make the server load lighter. As a bonus, since the generated output is made up of only strings, it’s possible to cache the parts that do not change often in a database without requiring complex serialization and parsing setups.

The Mutator Idiom

Imperative Elegance

Imperative programming has some elegant aspects to it. When representing a sequence of operations, as often happens for instance when drawing vectorial elements one by one, the “recipe” feel of imperative code can appear better than manually folding a concatenation operator of a list of operations.

Of course, achieving the same elegance without mutable state is possible. The basic idea is to use a value to represent the state, and create mutator functions which represent the state changes, and which return a new state based on an old one. For instance, in the example below, the state is represented by type op, and all mutator functions have a noticeable -> mutator type (instead of the -> unit type in true imperative functions).

module Graphics :
sig
  type op
  type mutator = op -> op
  val apply       : op -> unit
  val nil         : op
  val draw_rect   : vect -> vect  -> mutator
  val draw_circle : vect -> int   -> mutator
  val draw_line   : vect -> vect  -> mutator
  val draw_pixel  : vect          -> mutator
  val draw        : vect -> op    -> mutator
end

Keeping the state as the last argument of every mutator allows a fairly standard set of manipulations.

Mutator lists

The first possibility is to store all mutators in a list, and to concatenate them together with a folding operation:

let small_cross =
  List.fold_left (fun x f -> f x) nil
  [
    draw_line (-1,-1) ( 1, 1) ;
    draw_line (-1, 1) ( 1,-1)
  ] 

let four_crosses =
  List.fold_left (fun x f -> f x) nil
  [
    draw ( 2, 0) small_cross ;
    draw (-2, 0) small_cross ;
    draw ( 0, 2) small_cross ;
    draw ( 0,-2) small_cross
  ]

Using this approach, one comes closer to the imperative style of specification, while keeping the ability to manipulate the objects freely (for instance, in this case, to reuse the small cross drawing four times as part of a larger one).

Piping operator

A commonly found improvement in functional language is a piping operator, which replaces the (fun x f -> f x) of the above folding operations, and which can then be used in infix position to separate the statements. Explicitly unrolling the folding operation yields:

let (>>) x f = f x  

let small_cross =
  nil
  >> draw_line (-1,-1) ( 1, 1)
  >> draw_line (-1, 1) ( 1,-1) 

let four_crosses =
  nil
  >> draw ( 2, 0) small_cross
  >> draw (-2, 0) small_cross
  >> draw ( 0, 2) small_cross
  >> draw ( 0,-2) small_cross

The piping operator can be found in languages such as F#. Objective Caml does not provide it by default, but it can be rewritten as above.

Control structures

Using the piping operator, one can implement the classic imperative control structures in a purely functional way. The loop will keep the state as an accumulator, and the body of the loop will be a mutator:

let rec _while cond iter init =
  if cond init
  then _while cond iter (iter init)
  else init 

let rec _for s n iter init =
  if s > n
  then init
  else _for (s+1) n iter (iter init s) 

let drawing =
  nil
  >> draw_rectangle (0,0) (10,10)
  >> _for 0 10
     (fun i -> draw_line (0,i) (10-i,0))

As a consequence, every imperative construct can easily be converted to a purely functional approach using mutators and explicit state representation. Even better, the fact that state is explicit allows simple referencing of a previously manipulated state, and makes implementing undo functionality or simply reusing operations much easier.



693 feed subscribers
(readers who polled a feed this week)