Monthly Archive for October, 2008

Flow

Flow is a lightweight C++ inter-thread flow library, built on top of the boost::thread library. It allows dispatch of parts of a complex computation process to one or more additional threads, in a transparent and idiomatic way. It has been written in early 2008 by me (Victor Nicollet), and it is available under the terms of the new BSD license.

The flow library implements thread-safe FIFO queues (flows), and provides facilities for manipulating them that are compatible with the Standard C++ Library.

Downloads

Using this library requires Boost.Threads and Boost.Optional to be installed and available to the program. To use the elements defined in the flow library, one only has to include the header file flow.hpp: since all code is templated, there is no need for separate compiling and linking.

  • flow.hpp : include this file to use any entity in the library.
  • prime.cpp : a naive prime number test using several cores.
  • sort.cpp : a constant-time sort using an auxiliary thread.

Concepts

A flow is a FIFO queue where enqueueing and dequeueing are atomic. The type flows::flow<T> represents a flow of elements of type T.

Flows are manipulated through readers and writers, which are objects used to read from and write to the flow. Reading will remove the oldest element from the flow, while writing will append a new element to the flow, in a FIFO fashion. Attempting to read from an empty flow will block until another thread writes to the flow. Readers are represented by class flows::reader<T>, and writers are represented by class flows::writer<T>.

When all writers of a flow disappear, the flow is closed: all the readers are notified that no more data will be appended to the flow, triggering an end-of-flow state once all the remaining elements have been removed from the flow.

When all readers of a flow disappear, the flow is silent: all the writers are notified that nobody will ever read the data written to the flow, providing opportunities for optimization.

When all readers and writers of a flow disappear, that flow is automatically garbage-collected and is destroyed by the thread that destroyed the last reader or writer (along with all the elements it still contained).

Documentation

class flows::reader<T>

This class represents a reader, which may be associated to a flow (valid reader), or not be associated to any flow (invalid reader). The class itself can be summarized as:

template <typename T>
class reader
{
public:
  typedef T                  value_type;
  typedef boost::optional<T> option_type;
  typedef /* unspecified */  iterator; 

  iterator    begin();
  iterator    end();
              reader();
              reader(const reader&);
  reader&     operator =(const reader&);
  void        close();
  option_type get();
};

The default constructor creates an invalid reader. A valid reader can only be obtained from flows::flow<T>. Assigning to a reader dissociates that reader from its flow (if any), which may lead to that flow being silenced or even destroyed if this was the last reader for that flow. Note that flows::reader<T>::close() is equivalent to assigning a default-constructed reader: the reader becomes invalid and detached from its associated flow (if any).

The flows::reader<T>::get() function behaves as follows:

  • If the reader is invalid, returns no value.
  • If the reader is valid, but the associated flow is closed and empty, returns no value.
  • If the reader is valid, but the associated flow is empty, blocks until a value becomes available or the flow is closed, and tries again.
  • If the reader is valid and the associated flow is not empty, atomically removes the oldest value and returns it.

As a consequence, if two or more readers are associated to the same flow, every element in the flow will be read by at most one reader (and all elements will be either read by a reader or destroyed along with the flow).

flows::reader<T>::begin() and flows::reader<T>::end() are the first and past-the-end iterators of the read sequence of elements. They are input iterators (in the sense defined by the Standard C++Library) which represent the sequence of elements read by a particular reader:

  • Creating an iterator using begin() or incrementing an iterator reads the next value from the flow by calling the flows::reader<T>::get() function on the associated reader. This function call may block until the flow is written to. If it returns a value, then the iterator is bound to that value. Otherwise, the iterator becomes equal to the past-the-end iterator.
  • Dereferencing an iterator returns the value bound to the iterator. This is guaranteed to occur in constant time.

Iterators are invalidated on every assignment to the reader.

An analogy for read iterators is the std::istream_iterator, but the data is read from a flow using flows::reader<T>::get(), instead of being read from a stream using operator>>().

class flows::writer<T>

This class represents a writer, which may be associated to a flow (valid writer), or not be associated to any flow (invalid writer). The class can be summarized as:

template <typename T>
class writer
{
  typedef T                 value_type;
  typedef /* unspecified */ iterator; 

  iterator begin();
           writer();
           writer(const writer&);
  writer&  operator =(const writer&);
  void     close();
  bool     silent();
  void     put(const T&);
};

The default constructor creates an invalid writer. The only way to create a valid writer is to use flows::flow<T>. Assigning to a writer dissociates that writer from its flow (if any), and then binds it to the flow of the writer that was assigned to it (if any). This may lead to the flow becoming closed or even garbage-collected. flows::writer<T>::close() is equivalent to (but more idiomatic than) assigning a default-constructed writer.

The flows::writer<T>::put() function atomically writes a copy of its argument to the flow associated to the writer. If there is no associated flow, the behavior of this function is undefined.

The flows::writer<T>::silent() function returns whether the associated flow is silent or not. If there is no associated flow, the behavior of this function is undefined. This is provided as optimization hints to avoid computing values only to place them in a silent flow. It can be used to notify search threads that the one of them has found what was searched. See the above example on prime numbers for an illustration.

The flows::writer<T>::begin() function returns the output iterator for this writer. This iterator behaves as an std::back_insert_iterator, but calls flows::writer<T>::put() on its associated writer instead of calling insert(). This iterator is invalidated by assignment to the writer.

class flows::flow<T>

This class represents an active flow, and allows creating fresh readers and writers associated to that flow. For that reason, for as long as this instance exists, the underlying flow can be neither closed nor silenced (because a new reader or writer could be obtained from it at any time). Therefore, this instance should be used to prepare a computation, and destroyed before the computation starts.

When the instance is destroyed, the flow itself remains in memory for as long as readers or writers are associated to it. Once all readers and writers are gone, the flow is destroyed immediately, destroying any unread data left.

Note that the library provides several functions that dispatch computations to other threads without requiring the explicit creation of a flow from user code. Ideally, flows::flow<T> should only be used in situations where these functions do not meet the needs of the program, and even then it should only be used with care.

The class can be summarized as:

template <typename T> 
class flow
{
  flow(); 

  const writer<T>& input() const;
  const reader<T>& output() const;
};

As expected, flows::flow<T>::input() returns a writer associated to the flow, and flows::flow<T>::output() returns a reader associated to the flow. Unlike the naming approach of the C++ Standard Library, where an input stream is a reader and an output stream is a writer, the input of a flow is the writer (where the flow gets its input) and the output of a flow is the reader (where the flow sends its input).

Generators

A generator is a separate thread that computes values and outputs them to a flow. The original thread keeps a reader associated to that flow, and may read values from the flow as they become available.

Two functions allow the creation of generators:

template <typename T, typename F>
reader<T> generate(F f); 

template <typename T, typename F, typename I>
reader<T> generate(F f, I b, I e);

The first function creates a new flow and a new thread. Then, in that new thread, it calls f(w) where f is the argument to the generate function, and w is a writer associated to that flow. Simultaneously, it returns a reader associated to the flow in the original thread. The created thread ends when the argument function returns.

The second function creates one new flow, and creates a thread for each iterator i between b and e. Each thread calls f(w,*i), and the function simultaneously returns a reader for the flow. Each created thread ends when its function call returns.

That is, the first function creates a generator, and the second function creates one generator for each iterator in the passed sequence.

Processing

A processor is a thread which reads from a flow, does some computations, and writes to another flow. The most generic way of doing this (by specifying the exact function that reads and writes) is with the flows::process<T>() function:

template<typename T, typename U, typename F>
reader<T> process(const reader<U> &source, F f);

This function creates a new flow and a new thread which runs f(source,dest), where dest is a writer associated to the new flow. It immediately returns a reader associated to the new flow.

A special case of processing is mapping an function to a flow: the processor reads a value, applies a function to it, and writes the result to another flow. Mapping is done using the flows::transform<T>() function:

template <typename T, typename U, typename F>
reader<T> transform(const reader<U> &source, F f, unsigned n = 1);

This function creates n new threads and a single flow, and returns a reader associated to that flow. Every thread reads a value x from the source reader, and writes f(x) to the new flow. Note that if n = 1, the order is preserved, but not if n > 1. The new flow is closed when all the data from the original flow has been processed.

Accumulating

An accumulator is a thread which reads from a flow and applies a binary operation to collapse all the values into a single one, using an initial value for the first application. Once all the data is processed, it writes the final value to an output flow.

Accumulation is performed using the function flows::accumulate<T>() function:

template <typename T, typename U, typename F>
reader<T> accumulate(const reader<U> &source, T initial, F f);

This function creates a flow and thread. The thread performs the operation f(initial,*it) for it between source.begin() and source.end(), then writes initial to the flow. The function returns the reader associated to the flow immediately.

Further work

The implementation could be improved. First, it could be adapted to work with lock-free structures. Second, it could also be adapted to use worker threads instead of spawning new threads for every operation.

The interface could also use some improvement. For instance, it’s still too easy to create deadlocks by forgetting to close a writer in the wrong place. And the code does not interact with the standard library as well as it should.

JXML : Javascript Libraries

Javascript is frequently used to make an HTML web page dynamic. Most modern browsers support Javascript as a scripting language and provide reasonably standard means of binding code to an event (such as clicking on a button), letting code change the content of the page (such as hiding an object) or even interacting with the server (sending a request and retrieving the data). Despite a lot of compatibility and semantic problems (such as the ability to move objects around in the DOM in a sane way), libraries such as jQuery provide further capabilities and make many things simpler.

Yet, there is still much left to be done.

Javascript, for instance, provides no namespacing, leading to collision risks that are usually circumvented by means of a root object that acts as a namespace-like contraption, with the exception that the code within a namespace cannot be spread over several documents. Javascript also provides no standard and reliable means of creating new behavior and data: while it’s possible to create new data by altering the innerHtml of elements in the DOM, and while the closure system allows the creation of new behavior, creating a new element in the DOM and binding some new behavior do it is very complex to do without library support.

While jQuery proposes means of associating behavior to elements programmatically, it allows too much freedom, which leads to inconsistencies in the way in which objects are defined and thus restrict the interoperability of jQuery-based creations.

JXML : a Javascript Description Language

The aim of JXML is to provide a language that provides a single consistent way of defining UI components in Javascript, and allowing them to work in combination with each other. It works by harnessing the raw power of jQuery to allow a large breadth of behavior types while using a strict model-delegate pattern to encapsulate component behavior independently of its display.

JXML is based on XML. Some sample JXML code would be:

<jx:namespace name="example">
  <jx:var name="default_increment">0</jx:var>
  <jx:component name="incrementer">
    <jx:param name="initial" default="{example.default_increment}"/>
    <jx:var name="current">this.param.initial</jx:var>
    <jx:method name="increment">++this.current ; this.change()</jx:method>
    <button label="Increment">
      <on:click>
        this.model.increment()
      </on:click>
    </button>
    <span name="display">
      <on:refresh>
        this.display.text('The value is ' + this.model.current);
      </on:refresh>
    </span>
  </jx:component>
</jx:namespace>

This code defines an incrementor : a component which contains a button and a bit of text displaying a number. When the button is pressed, the displayed number is incremented. Once the above JXML is compiled and the resulting JS file is included, creating the component is done using:

var model = new example.incrementer({initial:10});

Note, however, that this merely creates the model part of the component. All its variables, parameters and methods are available and can be used as-is, for instance writing model.current = 25 or model.increment(). To create the UI delegate, one must specify the position where the delegate should be placed. This is done by passing a function which accepts HTML as its input and places it somewhere in the DOM. For instance, using jQuery:

var view = model.show(function(x){$("div#target").html(x)})

This automatically creates a new UI delegate, inserts it where specified, and decorates it with all the events (such as responding to button clicks). It also returns a view object, which contains a model member (that references the original member) and one member for each named element in the view stored as a jQuery selector. So, for instance, one could write view.model.increment() or even view.display.hide(’slow’).

Creating a view also registers it within the model. Every time the change() method of the model is called, all the UI delegates refresh themselves (be careful with infinite refresh loops).

Last but not least, view.clone(f) is equivalent to view.model.show(f), and is useful when storing views in array components (such as lists).

Generic Perfect Hash

A hash function is a pure function: when called on the same argument twice, it returns the same value. A real-world hash function is also expected to return a useful key: a small value (in terms of memory footprint) with a trivial comparison operator, and such that the probability of collision (two different arguments having the same key) is small.

Such hash functions are useful: they turn a large object that is difficult to store and compare (the argument) into a small one that is easy to store and compare, with the benefit that if the small objects are different, then the large objects are different, and if the small objects are equal then there is a very high probability that the large objects are also equal.

A perfect hash function has a probability of collision of zero: two objects are equal if and only if their keys are equal.

The problem is that generic perfect hash functions cannot exist. If a hash function always creates a 32-bit number, then that hash function can only discriminate between 4 294 967 296 different objects. Meaning that if you manipulate 4 294 967 297 objects, you will have a collision and the function will not be perfect.

Specific perfect hash functions do exist: if the entire set of manipulated objects is used, then a perfect hash function can be found. However, this requires knowledge of the set, which is not always possible.

Approximation

I propose here an approximation of a generic perfect hash function, which uses the Objective Caml garbage collection to make an arbitrary number of values fit within an Objective Caml integer.

The interface to this approximation is as follows:

type 'a handle
val (!!) : 'a handle -> 'a
val make : unit -> 'a -> 'a handle

Here is what happens:

  • ‘make ()’ creates a new perfect hash function that handles values of a certain type. When called on an argument, that hash function returns a handle that is associated to that argument. If two handles are associated to arguments that are structurally equal, then the handles are referentially equal. If two handles are associated to arguments that are not structurally equal, then the handles are not structurally equal either (however, comparison of two handles from distinct hash functions is unspecified).
  • ‘!! handle’ returns the argument to which a handle was associated. Since this is a perfect hash function, there is only exactly one argument associated with each handle.

A simple implementation of the above consists in creating a hash table that maps arguments to handles, and an integer reference. If the argument is in the hash table, the hash function returns the associated handle. Otherwise, the integer reference is incremented, a handle is created from that integer reference (to ensure structural inequality), added to the hash table, and returned.

The problem with that simple implementation is that it is not GC-friendly: even if the values which are hashed disappear from the user code, the hash table will keep them alive until the program ends or the hash function itself is flushed (which, if the function was global, never happens).

Optimization

The proposed optimization is to make the previous implementation GC-friendly by using weak references. The implementation idea is as follows:

  • The hash function stores an array of weak references to handles, and a hash table mapping values to indices in that array.
  • Whenever a major GC iteration occurs, the hash function contents are scanned. If a weak reference is gone, then the corresponding hash table entry disappears.
  • The value associated to a handle is stored outside the handle and in the hash function, so that it remains available once the handle is gone, in order to clean up.

The final implementation is as follows:

type 'a handle =
    { id : int ; owner : 'a handles }
and 'a handles = {
      mutable handles : 'a handle Weak.t ;
      mutable values : 'a option array ;
      hash : ('a,int) Hashtbl.t
    } 

let find_free_index handles =
  let i = ref 0 in
  let s = Array.length handles.values in
  while !i < s && handles.values.(!i) <> None do incr i done ;
  if !i < s then !i
  else begin
    let new_handles = Weak.create (s*2) in
    let new_values = Array.make (s*2) None in
    Weak.blit handles.handles 0 new_handles 0 s ;
    Array.blit handles.values 0 new_values 0 s ;
    handles.handles <- new_handles ;
    handles.values <- new_values ;
    s
  end 

let get handles value =
  match begin
    try Weak.get handles.handles (Hashtbl.find handles.hash value)
    with Not_found -> None
  end with
    | Some handle -> handle
    | None ->
        let handle =
          { id = find_free_index handles ; owner = handles }
        in
        Weak.set handles.handles (handle.id) (Some handle) ;
        handles.values.(handle.id) <- Some value ;
        Hashtbl.add handles.hash value handle.id ;
        handle 

let (!!) handle =
  match handle.owner.values.(handle.id) with
    | Some x -> x
    | None -> assert false 

let clean handles =
  for i = 0 to Array.length handles.values do
    if not (Weak.check handles.handles i) then
      match handles.values.(i) with
        | Some x ->
            Hashtbl.remove handles.hash x ;
            handles.values.(i) <- None
        | None -> ()
  done 

let make_handler size =
  let handles =
    {
      handles = Weak.create size ;
      values = Array.make size None ;
      hash = Hashtbl.create size
    }
  in
  let alarm = ref None in
  let handler = Weak.create 1 in
  Weak.set handler 0 (Some handles) ;
  alarm := Some
    (Gc.create_alarm (fun () ->
      match Weak.get handler 0 with
        | None -> (match !alarm with
            | None -> ()
            | Some alarm -> Gc.delete_alarm alarm)
        | Some handles -> clean handles)) ;
  handles 

let make () =
  let handles = make_handler 128 in
  (fun value -> get handles value)

This code is hereby placed in the public domain, without warranty of any kind.

Premature Optimization

Wisdom of the Ancients

“We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil.”
- Donald Knuth

I remember when Donald Knuth was in France, back in March 2005, to give a lecture at the EHESS about tree lattices. As expected, the author of the Art of Computer Programming, the creator of TeX, and the co-inventer of the Knuth-Morris-Pratt algorithm (algorithm) was tall, smart and eloquent. He is the kind of person I will trust on the subjects related to computer science.

This is not to say, of course, that anything he says (or said a while ago) is to be taken at face value. This includes the above quotation.

Software optimization

Computer programs with identical functionality can run at different speeds. The performance of a program is usually a factor of three key elements: the outright efficiency of the program (performing useless computations, processing the data that does not need to be processed), the adaptation of the program to the underlying hardware (attempting random access on a magnetic band, thrashing the cache, fragmenting the memory) and the memory/speed tradeoff (recomputing data versus storing it in memory).

Furthermore, many of today’s programs perform a large number of distinct tasks interactively. Whether a program takes one millisecond or a hundred milliseconds to compute and display the next screen is not relevant (or even perceptible) to the user. In the internet age, latencies of a full second are acceptable to many, and a web page that loads in 300 milliseconds is considered to be fast by most. In video games, a low frame rate is far less problematic than a wildly varying framerate that alternates between a high and a medium value, because users tends to adapt: the brain can compensate for a consistent slowdown.

That being said, the most fundamental point to be made about optimization is that it has a cost: optimizing a program (either when writing it, or after it has been written) takes time. Time to identify the bottlenecks, time to think about a possible solution to eliminate them, time to implement that solution. And so, given a finite time span, a tradeoff exists between implementing more features and optimizing existing features.

Premature optimization

The 20-80 rule is a classic in optimization situations, basically stating that 80% of the time is spent in 20% of the code. So, if one concentrates on optimizing those 20%, the end result will have better performance characteristics than if the same time was spent optimizing the entire program. The actual figures certainly are different than 80% and 20%, but the basic idea that spending one week optimizing the “Out of memory” message box on a text editor will have a smaller impact on the final product than spending that same week optimizing the rendering of the actual text box or the speed of the autosave feature. Or, in an equivalent fashion, spending six months optimizing the memory usage of a Java application when a $30 RAM purchase yields equivalent results is quite wasteful.

Optimizing a program as it is being written, without any practical hint of where the bottlenecks will lie, makes the optimizations substantially easier: the very design of the program can be adapted on the fly to allow the optimization to fit in perfectly. On the other hand, since the actual impact of the code on performance is not known in advance, there is a high probability that the resulting performance gain is negligible (or, in the case of unpredictable interactions with the cache and the pipeline, even negative).

Optimizing a program after it has been written, on the other hand, yields a high probability of a significant performance gain, because profiling and manual testing can show which parts of the program are the bottleneck. However, when the program can finally be tested and profiled, its design and architecture is already well set into place, and often large portions of code must be refactored, rewritten and tested in order to implement the optimization.

Of course, as with all things, the ideal course is somewhere in the middle. If the developers wait too long to optimize, architectural choices that are too difficult to reverse will simply prevent some optimizations, no matter how interesting, making the cost of performance improvement too high, leading to the release of low-performance software. On the other hand, if developers are too eager to optimize before the code is even written, this leads to often wasted effort and often architectural choices that will reduce generality and extensibility later on. This situation is the essence of premature optimization: before the actual performance characteristics of a program are known, important design and architecture decisions are taken because they are expected to improve performance, even though they have a negative impact on future productivity (by making code harder to write, refactor, extend or even debug).

Why premature optimization is evil

  • Optimizing code as it is written is more difficult, intellectually, than writing code and then optimizing it, because it requires thinking at three levels (interface, implementation and optimization) at the same time. This leads to bugs, because the code expresses the original functionality in a non-obvious way in order to improve execution speed, or even outright sacrifices correctness.
  • Optimization requires specific knowledge about processes, seizing opportunities that arise when two pieces of functionality are used together. When applied improperly, this reduces abstraction and increases coupling, because the two independent pieces must be made aware of each other, and optimization-related data must be shared.
  • The optimization itself must be tested to show that it not only does output the expected result, but that it does so in a way that is indeed efficient as originally intended. Of course, the typical way of testing optimization (by profiling the code before and after it is applied) is impossible when no unoptimized code is available.
  • Quite often, premature optimization leads to marginal improvements and sometimes even have a negative impact on performance.

Not all pre-emptive optimization is premature

However, it’s sometimes possible to know in advance that some optimization will be required. For instance, if a developer has already worked on a similar project, and has already been bitten by performance issues, then it makes sense to suggest an optimization strategy ahead of time in order to avoid the same problems. In this case, assuming that the situation is indeed the same as before, then optimization is not premature.

Another example would be a choice between two possible architectures, each of them with the same potential for extension and usability, but one of which provably does more work than another (for instance by requiring that certain parts of data be recomputed unnecessarily). In that case, choosing the more efficient architecture at no additional cost is a good idea, and not premature optimization.

Lay the foundations for later optimization

Finally, sometimes it appears that a given process might be inefficient. However, optimizing the process requires work and time, and it is not yet certain whether the process is important enough to be optimized: perhaps it only accounts for a very small fraction of the computation time, or perhaps it will be dropped from the project and replaced altogether in the following weeks.

A good idea, in that case, is to slightly alter the design of the process to allow optimization code to be added later on. Knowing a few optimization tricks helps here.

The most frequent example of this approach is scalability. By essence, being able to improve performance by adding hardware is an incredible optimization technique, but it requires code to be built for it from the very beginning.

Another, more specific example is precaching data queried from a database. Consider a page containing various elements. A first query done in the Page module retrieves the list of elements, and then queries are performed for each element in the Element module to retrieve the contents. The consequence, of course, is that the Element module performs a lot of queries instead of doing the sane thing: performing a single query to retrieve all the elements from the page.

An optimization technique in that situation is to use a hint system, to hint the Element module that a certain set of elements will probably be queries soon enough. Then, on the first element request, the Element module queries for the entire set of elements, and merely serves the data from cache afterwards. This, in itself, requires no modification of the interface of either module (as everything can be transmitted through a third module privately), but does require that the data is not modified in the database when precaching is used.

XSL to XHTML

Last month, I have discussed using XSLT to generate XHTML from a fixed XML representation of a page, in this article. However, due to the fickle nature of computers, several practical arise when applying a perfectly good theory.

Verbatim inclusion and XHTML namespaces

Naive implementation of copying XHTML tags from the input XML (for instance, the content of a page as entered by a user) is usually incorrect. XSL does provide a recursive copy process, <xsl:copy-of select=”xpath”/>, but this copy is so good that it will actually copy the namespace of the original nodes. If this namespace is not the usual xhtml namespace, then it will add xmlns=”" attributes to all copied nodes that appear as part of an XHTML document.

The namespace of any XHTML element that has to be copied as XHTML should be the xhtml namespace. This is usually done by defining the xhtml namespace in the root element of the original XML document, for instance with:

<document xmlns:xhtml="http://www.w3.org/1999/xhtml">

And then setting the correct namespace for every element that has to be copied:

<xhtml:div><xhtml:img src="/pic.jpg" alt="The Picture"/></xhtml:div>

This is understandably difficult to enforce, especially when the inner XHTML was input by the user. An alternative is to set a default namespace for a container of XHTML code. For instance:

<content xmlns="http://www.w3.org/1999/xhtml">
  <div><img src="/pic.jpg" alt="The Picture"/></div>
</content>

This allows wrapping the input XHTML without modifying it.

Of course, you still have to be careful to use valid XHTML as an input (otherwise, the XML parser will reject the document).

As a quick bonus snippet, a copy-everything helper, which performs a deep copy of every child (element or text) of the current node:

<xsl:template name="copy_everything">
  <xsl:copy-of select="*|text()"/>
</xsl:template>

Verbatim inclusion and Javascript

The problem with Javascript that is inlined in XHTML code (unlike, say, CSS) is that Javascript uses some characters which must be escaped in XML documents, such as < or &. The solution, of course, is to place the javascript code in a <![CDATA[...]]> section to avoid the issue altogether. This has the benefit of working correctly with an XSL stylesheet which will outpt the javascript code as-is.

Except that, since the XSL stylesheet is generating XHTML, it will escape the characters found in the javascript. No problem so far, since modern browsers have XML parsers which will turn the < and & back to their former selves. However, older browsers often can’t.

The typical solution used for backwards compatibility is to use a final XHTML file that looks like this:

<script type="text/javascript"/>
  /* <![CDATA[*/
  { javascript code }
  /* ]]> */
</script>

The XML parser in modern browsers ignores the javascript comments and generates /**/ { javascript code} /**/, which is valid. The older browsers expect Javascript inside the script tag, so they recognize the comments, and the CDATA tag is interpreted as a comment.

The problem is that applying the XSLT to the document will parse the CDATA segment, which removes the compatibility trick (but it should still work with browsers that have a good XML parser).

A partial solution comes from the ability in XSL to wrap the text content of an element in a CDATA segment. The input document can now contain verbatim script declarations, such as:

<verbatim-script>
  <![CDATA[alert('<hello>');]]>
</verbatim-script>

And the corresponding output will be:

<script type="text/javascript">
  /*<cdata><![CDATA[*/
  alert('<hello>');
  /*]]></cdata>*/
</script>

Which will have the same quote-and-comment structure and properties as the original solution. Implementation:

<xsl:output cdata-section-elements="cdata"/>
<xsl:template match="verbatim-script">
  <script type="text/javascript">
    /*<cdata>*/
    <xsl:value-of select="."/>
    /*</cdata>*/
  </script>
</xsl:template>

The downside with this solution is that there is now an element within the script tag, which is invalid in XHTML and therefore causes validation failures. If you really need to include a bunch of Javascript in your file, then this is probably the solution.

An improved and perfectly valid solution is to simply move off any javascript to external files, which are merely included by the client instead of being inlined. Depending on your dispatch architecture, it could be well possible to have the client look for a javascript by querying the same URL with a certain GET parameter, which triggers the use of a distinct XSL stylesheet that only extracts the script from an XML file and outputs it as a large Javascript file.

(Meta) language, Part 3

Specification attempt

This article attempts to chalk up a specification of the main syntax elements of the (meta)language being designed. The language codename chosen here is Xed.

Xed is an expression language (meaning that all syntax constructs in the language are expressions). It’s also a pure functional language, and although some expressions may appear to behave as side-effects for readability purposes they are ultimately implemented as pure functional operations.

Every expression in the language executes in a context. That context is composed of a table of local variables (with the associated values), including the special values _ and $, a list of persistent entities and relationships and their initial values (as read from persistent storage when the script is executed) and a table of modifications applied to those persistent entities (commited to the persistent storage when the script finishes executing).

Value types

Xed can manipulate several types of objects :

  • Base scalar types (integers, booleans), support the common arithmetic and boolean operators.
  • Base vector types (strings, streams), support size-of, access, shift/push, traversal and concatenation.
  • Base associative types (hashes), support size-of, access, set/unset, traversal and merge. Setting a new binding for a value hides the old one instead of replacing it.
  • Objects, mostly equivalent to hashes except that the types of values that are associated are not restricted to a single type.
  • Variants, equivalent to the polymorphic variant type in Objective Caml.
  • Persistent data types, used when defining relationships.
  • Closures (including patterns).
  • Names.

Syntax and semantics

The syntax, with its associated semantics, of all relevant elements are detailed below.

Object semantics

The most fundamental semantics are those of objects. An object is an association that binds arbitrary values to names. Adding a new binding merely hides the existing binding for that name, so that removing a binding restores the previous one.

  • [obj.empty]
    The empty object : { } is an expression that creates an empty object.
  • [obj.make]
    Create an object by defining its members, using shorthand notation. Uses the form { name = expr; … } to bind an expression to each member. This form is best used when defining objects that contain simple values.
  • [obj.add.short]
    Add members to an object using shorthand notation. Uses the form expr .+ { name = expr ; … } to bind an expression to each member. This form is best used when extending objects that contain simple values.
  • [obj.add.def]
    Add a member to an object using the definition syntax. Uses the form expr def name args = expr end to bind the second expression to the provided name in the object created by the first expression. If arguments are present, defines a closure. In the second expression, the special variable $ refers to the object created by the first expression.
  • [obj.add.rec]
    Add a member to an object using the recursive definition syntax. Uses the form expr def rec name args = expr and name args = expr … end to add all the specified members to the object defined by the first expression. The special variable $ refers to the object being created, and not the old one.
  • [obj.undef]
    Un-define a member of an object. Creates a new object where that member does not exist anymore. If an earlier definition was hidden by the undefined member, that definition becomes available again. Uses the syntax expr undef name, or the abbreviated syntax expr .- name. Causes an error if no such definition exists. This approach is used to implement private members and data hiding.
  • [obj.undef.all]
    Un-define a member of an object completely. Creates a new object where that does not have that member anymore (effectively removes the member and all the members hidden by it). Uses the syntax expr undef-all name.
  • [obj.ext.add]
    Add an object as a member of another object. expr object name = <text> end-object is equivalent to expr def name = { $ = $ } <text> end. This syntax is an extension that is not available by default.
  • [obj.mem]
    Extracts the selected member, using the syntax expr.name. Generates an error if the member does not exist.

Closure semantics

Closures carry with them state information extracted from the context. They do not, however, keep the list of modifications to the persistent state, which is provided to them when they are called and not when they are created. All local variables referring to the above context are carried over.

  • [fun.make]
    Create a new closure, using the syntax { args -> expr }. There is no limit on the number of arguments, but pattern matching is not allowed.
  • [fun.match]
    Create a new closure, using the syntax { pattern -> expr | pattern -> expr }. This function accepts a single argument, but that argument is pattern-matched. Patterns may be either variant type patterns or regular expression patterns, which bind variables that are available in the expressions.
  • [fun.call]
    A function is called on an argument using the syntax expr expr.

Pipe semantics

A lot of operations are performed using pipes, which serve as a shorthand notation for many elementary operations.

  • [pipe.call]
    A pipe can be used to call a function on an argument. For instance, a |> f is rewritten to f a. The exception is if f contains the special variable _, at which point it is rewritten as if it were equivalent to { _ -> f } a. This specificity of the right-hand operator is called the implicit lambda rule.
  • [pipe.map]
    The expression s |: f, where s is a stream or hash, creates a new stream or hash (the same as the input) by applying f to every element of s. This expression is subject to the implicit lambda rule. If the input was a hash, the key-element association is unchanged.
  • [pipe.filter]
    The expression s |? f keeps only the elements of the input stream or hash such that f x is true. This expression is subject to the implicit lambda rule. The key-element association of hashes is kept.
  • [pipe.hash.filter]
    The expression s |# f filters a hash based on its keys instead of its values.

Note that the combination of the pipe.call operation and the fun.math operation allows an operation equivalent to the match-with or switch-case pattern-matching constructs.

Name, Variant, Exception semantics

The universal variant type works on the same basis as Objective Caml polymorphic variants. Exceptions are thrown and caught based on their name, and they may carry associated values. They are ultimately members of the variant type, and so they do not have to be defined.

  • [name.fresh]
    Creates a fresh name, different from all the other names in the program. The syntax is simply fresh.
  • [name.lit]
    Creates a name literal. The syntax is `name`
  • [name.use]
    In any situation where a (syntactical) name is expected and a (semantical) name is available, one can use the approach (name) to instantiate it. For instance, obj undef (name) or `new[(name)]` or even `(name)`.
  • [var.make]
    A new member of a variant type is created with expr::name, where the name can be chosen on the spot (no definition required). This may be used in pattern-matching expressions. Note that in orderto support variants bound to an unit value, a cleaner syntax is :name (equivalent to ()::name).
  • [exn.throw]
    Throws a variant object as an exception. The syntax is merely throw expr. The current execution context is completely lost (with the exception of any values referenced by the thrown expression), meaning in particular that any modifications applied to the persistent data model are thrown away.
  • [exn.catch]
    Catches an exception if it matches a certain variant name, using the syntax try expr catch { pattern -> expr | … }. If an exception is thrown by the first expression, then pattern-matching finds the corresponding expression. If one is found, it’s evaluated in the context of the try-catch expression. Otherwise, the exception is propagated upwards.

Note that the language is fully exception-safe: when an exception is thrown, no trace of the execution of the throwing code is left behind, except whatever is saved by the exception itself. This means that any side-effects are automatically rolled back, removing the need for a “finally” clause. Another important element is the fact that all exceptions must be caught before the escape the script (meaning no script will be interrupted by an unexpected exception).

Pattern semantics

Patterns are functions which map objects to objects. Such functions can be created and called by basic means, but the following syntactic sugar is available:

  • [pattern.ext.def]
    Defines a pattern as a member of an object. The expression is expr pattern name args = <text> end-pattern is equivalent to expr def name args $ = $ <text> end. This syntax is an extension that is not available by default.
  • [pattern.ext.use]
    Uses a pattern of an object. The expression is expr use-pattern name args and is equivalent to expr |> name args. This syntax is an extension that is not available by default.

Iterator Intervals

Implicit Iterator Intervals

The Standard C++ Library is built around the concept of iterators. On the one hand, output iterators can be incremented and assigned values, and on the other hand input iterators traverse a range between a start iterator (usually referred to as the begin iterator) and a past-the-end iterator (the end iterator). All containers follow this convention, either through member functions begin() and end() or through pointer arithmetics based on base pointer and size. Also, several other range concepts (for instance, input from an istream)  also follow this convention.

On the other hand, all standard library algorithms work on iterator ranges, which are passed implicitly by providing the two iterators independently.

The problem with implicit iterator intervals is that they don’t play nice with functions that transform intervals: for instance, a filter function would turn an interval into the elements of that interval which satisfy a certain predicate, or a sub-interval function would restrict iteration to a fixed contiguous sub-interval. In the current situation, since it’s not possible to dispatch the result of a single function call to two arguments (the expected begin and end iterators), the result of filter or sub-interval functions has to be stored as a variable, which implies that the type has to be explicitly spelled out (and it’s often too complex to write down).

Explicit Iterator Intervals

Instead of having algorithms operate on two argument iterators, I propose to combine the two iterators into a single object, called an iterator interval. The type of this iterator interval is unspecified: expression templates are used to avoid unnecessary recourse to runtime polymorphism. The only constraint is that an iterator interval must specify at least the following expressions

T::iterator is an input iterator. Then, given const T t :
t.begin() is the first iterator in the interval.
t.end() is the past-the-end iterator in the interval.

Note that this rules out the standard library containers as intervals: these will return a const_iterator instead of a normal iterator. This is because a container is not an interval, it merely has an interval that represents all its content. The mutability of a container determines the mutability of its members, whereas the mutability of an interval does not affect the mutability of the elements of the represented sequence.

For starters, a few helper functions that create intervals from most things which can have intervals:

namespace interval
{
  namespace detail
  {
    template<typename It>
    class interval_type
    {
      It begin_iterator, end_iterator;
    public:
      interval_type(It begin, It end)
        : begin_iterator(begin), end_iterator(end) {} 

      typedef It iterator; 

      iterator begin() const { return this -> begin_iterator; }
      iterator end() const { return this -> end_iterator; }
    };
  } 

  template<typename T, std::size_t N>
  detail::interval_type<T> of(T (&array)[N])
  {
    return detail::interval_type<T*>(array, array + N);
  } 

  template<typename It>
  detail::interval_type<It> of(It begin, It end)
  {
    return detail::interval_type<It>(begin,end);
  } 

  template<typename T>
  detail::interval_type< std::istream_iterator<T> > of(std::istream &in)
  {
    return detail::interval_type< std::istream_iterator<T> >
      (std::istream_iterator<T>(in),
       std::istream_iterator<T>());
  } 

  template<typename T>
  detail::interval_type< typename T::const_iterator > of(const T &t)
  {
    return detail::interval_type< typename T::const_iterator >
      (t.begin(), t.end());
  } 

  template<typename T>
  detail::interval_type< typename T::iterator > of(T &t)
  {
    return detail::interval_type< typename T::iterator >
      (t.begin(), t.end());
  }
}

So, to create an interval from a container, you can write interval::of(my_vector), to create one from an array you can write interval::of(my_array), and to create one from an input stream you can write interval::of<Type>(my_stream). Once you have obtained an interval, you can start using it.

Algorithms

For the purposes of this article, only two elementary algorithms have been rewritten. One could consider rewriting others as well, using the same principle. The key of rewriting algorithms is to have them take a single iterator interval instead of two iterators.

namespace interval
{
  template<typename I, typename It>
  void copy(const I &i, const It &o)
  {
    std::copy(i.begin(), i.end(), o);
  } 

  template<typename I, typename F>
  void for_each(const I &i, const F &f)
  {
    std::for_each(i.begin(), i.end(), f);
  }
}

So, now, it’s possible to write code such as:

interval::copy(interval::of<int>(std::cin), std::back_inserter(my_vector));

Filtering

The real benefit of using intervals is that now intervals can be transformed. An example transformation is to eliminate the elements in an interval which do not satisfy a certain predicate. This involves defining a new iterator type which wraps around the old one, but which will advance internally so that it always references a value which satisfies the predicate. The obvious limitation is that all iterators become forward iterators (the random access property, if present, cannot be kept).

namespace interval
{
  namespace detail
  {
    template<typename It, typename T>
    class filtered_interval_iterator
    {
      It internal;
      const T *owner; 

      void advance()
      {
        while (this -> internal != this -> owner -> end_iterator
               && ! this -> owner -> filter(* this -> internal))
          ++ this -> internal;
      } 

    public: 

      typedef typename std::iterator_traits<It>::value_type value_type;
      typedef typename std::iterator_traits<It>::reference reference;
      typedef typename std::iterator_traits<It>::pointer pointer;
      typedef typename std::iterator_traits<It>::difference_type difference_type;
      typedef std::input_iterator_tag iterator_category; 

      filtered_interval_iterator(It internal, const T &owner)
        : internal(internal), owner(&owner)
      { this -> advance(); } 

      filtered_interval_iterator &operator++()
      {
        ++ this -> internal;
        this -> advance();
        return *this;
      } 

      filtered_interval_iterator operator++(int)
      {
        filtered_interval_iterator temp(*this);
        ++ *this;
        return temp;
      } 

      reference operator*() const { return *this->internal; } 

      bool operator == (const filtered_interval_iterator &o)
      {
        return this -> internal == o.internal;
      } 

      bool operator != (const filtered_interval_iterator &o)
      {
        return this -> internal != o.internal;
      }
    }; 

    template<typename It, typename F>
    class filtered_interval_type
    {
      F filter;
      It begin_iterator, end_iterator; 

      friend class filtered_interval_iterator<It,filtered_interval_type>; 

    public: 

      filtered_interval_type(It begin, It end, F filter)
        : filter(filter), begin_iterator(begin), end_iterator(end) {} 

      typedef filtered_interval_iterator<It,filtered_interval_type> iterator; 

      iterator begin() const
      {
        return iterator(this -> begin_iterator, *this);
      } 

      iterator end() const
      {
        return iterator(this -> end_iterator, *this);
      }
    };
  } 

  template<typename I, typename F>
  detail::filtered_interval_type<typename I::iterator,F> filter(const I &interval, F flt)
  {
    return detail::filtered_interval_type<typename I::iterator,F>
      (interval.begin(), interval.end(), flt);
  }
}

Filtering takes an interval and a predicate, and filters the interval with the predicate, resulting in a new interval. So, to print to standard output all even numbers read from standard input:

bool even(int x) { return x%2==0; } 

interval::copy( interval::filter( interval::of<int>(std::cin), even ),
                std::ostream_iterator<int>(std::cout, "\n"));

Sub-intervals

Another functionality that can be easily implemented is the ability to select only a subset of an interval. The implementation increments the begin-iterator until it reaches the start distance, then forces any iterator which has moved the total length to equal the past-the-end iterator to mark the end of the interval.

namespace interval
{
  namespace detail
  {
    template<typename It>
    class sub_interval_iterator
    {
      It internal;
      typename std::iterator_traits<It>::difference_type left; 

    public: 

      typedef typename std::iterator_traits<It>::value_type value_type;
      typedef typename std::iterator_traits<It>::reference reference;
      typedef typename std::iterator_traits<It>::pointer pointer;
      typedef typename std::iterator_traits<It>::difference_type difference_type;
      typedef std::input_iterator_tag iterator_category; 

      sub_interval_iterator(It internal, difference_type left)
        : internal(internal), left(left)
      { } 

      sub_interval_iterator &operator++()
      {
        -- this -> left;
        ++ this -> internal;
        return *this;
      } 

      sub_interval_iterator operator++(int)
      {
        sub_interval_iterator temp(*this);
        ++ *this;
        return temp;
      } 

      reference operator*() const { return *this->internal; } 

      bool operator == (const sub_interval_iterator &o)
      {
        return this -> left == 0 && o.left == 0
          || this -> internal == o.internal;
      } 

      bool operator != (const sub_interval_iterator &o)
      {
        return this -> internal != o.internal
          && this -> left != o.left;;
      }
    }; 

    template<typename It>
    class sub_interval_type
    {
    public:
      typedef typename std::iterator_traits<It>::difference_type distance;
    private:
      distance length;
      It begin_iterator, end_iterator;
    public:
      sub_interval_type(It begin, It end, distance start, distance length)
        : length(length), begin_iterator(begin), end_iterator(end)
      {
        while (start > 0 && begin != end) { --start; ++begin; }
        begin_iterator = begin;
      } 

      typedef sub_interval_iterator<It> iterator; 

      iterator begin() const
      {
        return iterator(this -> begin_iterator, this -> length);
      } 

      iterator end() const
      {
        return iterator(this -> end_iterator, 0);
      }
    };
  } 

  template<typename I>
  detail::sub_interval_type<typename I::iterator>
  sub(const I& i, typename I::iterator::difference_type start, typename I::iterator::difference_type length)
  {
    return detail::sub_interval_type<typename I::iterator>(i.begin(), i.end(), start, length);
  }
}

With this function, one can specify the start and length of the sub-interval, just like with a typical sub-string function.

Assignment

How to assign an interval to a container? Containers have constructors and assignment functions that work on two iterators, not a single interval. However, it’s possible to work around this, in part, by using a cast function:

namespace interval
{
  namespace detail
  {
    template<typename I, typename C>
    struct assign_type
    {
      I i;
      assign_type(const I &i) : i(i) {}
      operator C() const { return C(i.begin(), i.end()); }
    };
  } 

  template<typename C, typename I>
  detail::assign_type<I,C> assign(const I &i)
  {
    return detail::assign_type<I,C>(i);
  }
}

The return value of the assignment function above can now be assigned to the appropriate container.

The final example reads even integers from standard input, stores them in a vector, then outputs all the contents of the vector between index 2 and index 12:

bool even(int i) { return i % 2 == 0; } 

int main()
{
  std::vector<int> values = interval::assign< std::vector<int> >
   ( interval::filter ( interval::of<int>(std::cin), even )); 

  interval::copy
   ( interval::sub( interval::of(values), 2, 10 ),
     std::ostream_iterator<int>(std::cout, "\n") );
}

Possible extensions

It is currently not possible to retrieve the original iterator, but this could be done easily (just extend the interval type to handle this) and would be useful (for instance when using std::find across an interval).  Also, iterator types are not conserved: filtering conserves bidirectional iterators and sub-interval extraction conserves random access iterators (plus, on random access iterators, sub-interval extraction is much easier than the above version, which has to handle forward iterators).

PHP-CLI

CLI is the command line interface to the PHP interpreter. Instead of running scripts when viewed from the web, this lets PHP run as a standard shell scripting language.

Let’s assume that:

  • You have a web server running PHP
  • You also have command-line access to that server (for instance, the server is running Linux, and you have SSH access).
  • The command-line interface is installed (or you can install it or have it installed)

Then you might want to consider using a command-line back-end for your dynamic web site instead of using a web back-end.

What, you think, move back to the prehistoric era of command lines and white-on-black terminals when we have AJAX and Flex?

I certainly don’t deny that command line interfaces are ugly (I pride myself of having a colorful zsh prompt, but that’s about as far as things go), and if you expect non-developers to be administrators then a web administration tool is an absolute necessity.

Still, if the site administrators are all developers (or can be quickly taught how to use the command-line) or if the time to design an interface is short, then command-line interfaces have some advantages.

  • No need to secure access to the back-end, since the command-line access to the server is already secure.
  • No need to design and debug a complete HTML view, or write and debug javascript to improve a click-based editor.
  • You have access to all the typical command-line tools: make/sed/cron/grep/awk/sort/…, shell scripting, loading data from files, and all other automation tools that can improve administration speed.
  • The administration tool runs with your user rights, not the web server rights, meaning you can increase the paranoia of the web server rights without restricting the capabilities administration tool.
  • You can even invoke external tools that have no point being part of the web site, which serve development purposes (such as diagnosis tools or optimization tools).

How-to

On your typical debian-based LAMP stack, installing the CLI is as easy as an apt-get install php5-cli (or the equivalent for your distribution). Running a PHP program on the command line is done with php5 script.php — arg1 arg2. Anything the script outputs is printed to the output stream, and the arguments are available as the global variable $argv.

For instance, a sample script that takes a single argument and outputs that argument squared would be written as:

<?php echo $argv[1] * $argv[1]; ?>

Additional details are of course best read from the PHP documentation itself.

Tips and tricks

  • Mind your relative paths. Usually, the administration scripts will want to behave as if they were part of the web server, which means that the appropriate paths must be added to the path. A good idea is to share configuration files between the administration scripts and the web site itself, and choose a root path from where all relative paths should be handled (the script should then cd to that root path before doing anything else).
  • Provide shell-friendly data. This involves placing one row of data per line, and using a common separator (such as a colon) between fields. Non-data lines can be used, but it is best if they are prefixed with a comment symbol, such as # or . If only one piece of data is required, provide it with no other decoration (except whitespace) so that it can be used as a shell variable.
  • Make everything transactional. This is usually a no-brainer, but it’s easy to forget. When a set of operations must be performed, either check beforehand that all of them will be done without error, or use a transaction system to avoid failing in the middle of a database or filesystem modification.
  • Include a help page. You will inevitably forget how to use certain commands. A general help page (display the list of commands with a summary of their usage) and specific help pages for every non-trivial command are always useful later on.
  • Rely on PHP introspection. PHP can determine if an object has a certain member, so scripts which can support several operations (such as create/view/list/edit) can be turned into objects with the appropriate members.
  • Use CRUD if your underlying model allows it. The typical manipulations performed by an administrator are the creation, retrieval,modification and deletion of individual pieces of data from the persisten model. Also be ready to provide complete lists of elements with relevant information, but don’t reimplement wheels such as searching by e-mail (grep) or sorting by age (sort).
  • Log everything. You can write to a file every command that is executed using a script. Incorporate a logging system which prints out the current date, the script arguments, and interesting environment variables (for instance, the current user).
  • Standardize argument retrieval. For instance, if the script user writes makeuser -name foo -pass bar, then the script should be able to access the arguments as array( “name” => “foo”, “pass” => “bar” ) and not as array(”makeuser”, “-name”, “foo”, “-pass”, “bar”).
  • Standardize column printing. For instance, if you need to print data as columns of fixed widths, separated by colons, it should be written as $printer = new Printer(5,10,10,50,10); $printer -> print($id, $created, $label, $content, $hits); and not anything more complex. Don’t mix content and formatting.
  • You’re safe, so take advantage of it. Disable running the scripts as CGI (meaning they can only be run with access to the local machine from the appropriate user groups) and feel free to display any useful data. If your application is directory-driven (it could be using the Zend Framework) apply glob() all over the place to display a clean diagnosis of what the application can use or find.

A little warning

Don’t get overzealous. While it may be interesting to control every single aspect of the web site from the command line, reinventing common wheels (such as database backups or version control) will take more time than it’s worth, and safety must be a constant concern in order to prevent executing any administration script from the web.

Also avoid creating non-elementary operations. These are ultimately best implemented as shell scripts (in the language of your choice, including PHP) that exist outside the application and only interact with the application through the administration tools, thereby saving you the time wasted on reimplementation and security checks.

Operational Semantics

Formal language semantics describe how a language should behave. Most of the time, semantics define the values manipulated by the program, as well as the state of the program, and describe how the various expressions and statements change the state and create values. When designing a virtual machine for a language, translating the formal language semantics into code is usually the easiest way to implement the language. Elementary semantics are quite easy to grasp, so let’s start with an example:

[|literal|] → literal 
[|X + Y|]   plus  ([|X|],[|Y|])
[|X - Y|]minus ([|X|],[|Y|])

These are operational semantics for an elementary language that computes sums and differences of numbers. The color code I am using is bold green for expressions being evaluated, orange for virtual machine operations and black for values. Operational semantics exmplain how to evaluate an expression by evaluating its sub-expressions and then applying an operation to their values. Applying the semantics above to the expression 1 + 2 – 3 yields:

[|1 + 2 - 3|] minus ([|1 + 2|],[|3|])
→ minus (plus ([|1|],[|2|]), 3 )
→ minus (plus ( 1 , 2 ), 3 )
→ minus ( 3 , 3 ) → 0

For Objective Caml readers, some example code equivalent to the above would be:

let rec eval = function
  | `literal x -> x
  | `add(x,y)  -> eval x + eval y
  | `sub(x,y)  -> eval x - eval y 

eval (`sub(`add(`literal 1,`literal 2), `literal 3)

The main point of using operational semantics to describe the formal semantics of a program language is that they are very concise, allowing definitions to take a streamlined form. The main difficulty appears when one attempts to introduce more complex behavior into the language, but this difficulty is solved by a multitude of techniques. I tend to give Objective Caml examples for several reasons. The main reason is that they are concise when expression tree traversal is involved, because of pattern-matching.

Context

Usually, the evaluation of a part of a program is not independent of the rest of the program. While simple computations (such as the above one) are standalone, any expression that involves a name will require context in order to resolve that name. For instance, if a program reads the value of user-defined variable FROBNICATE, it needs to know what that variable represents, and that information is extracted from the context. A basic example of context-based evaluation would be:

[|literal|]K  → literal 
[|variable|]K K[variable]
[|X + Y|]Kplus  ([|X|]K,[|Y|]K)
[|X - Y|]Kminus ([|X|]K,[|Y|]K)

I’m using red font to represent the context. The new expression is the variable, which when evaluated is replaced with whatever value is associated to its name in the evaluation context. So, evaluating TWO + 1 in the context K={TWO:2} yields:

[|TWO + 1|]{TWO:2}plus ([|TWO|]{TWO:2},[|1|]{TWO:2})
→ plus ({TWO:2}[TWO],1)
→ plus (2,1)
→ 3

Example Objective Caml code:

let rec eval k = function
  | `lit x     -> x
  | `var s    -> List.assoc s k
  | `add(x,y) -> eval k x + eval k y
  | `sub(x,y) -> eval k x - eval k y 

eval ["TWO",2] (`add(`var "TWO",`lit 1))

In short, a context is usually a set of bindings between names and values that every variable expression can draw from in order to resolve itself into a value. Of course, an evaluation problem happens whenever a variable does not exist in the context in which it is evaluated. Among the possibilities for resolving that situation is using static analysis to make sure that every variable is defined before being used (this is what compiled languages do), deciding that if a variable is not bound in a context, then its value is null (this is what most scripting languages do), or reporting an error (this is what LaTeX and some other shell scripting languages do).

Once context-reading semantics are in place, it can be interesting to create context-writing semantics. One such example is the let-in binding expression of ML family languages, which adds the following rule to the previous four:

[|let name = X in Y|]K[|Y]]K' where K' = K + {name:[|X|]K}

This expression evaluates its first operand in the current context, then creates a new context by binding the name to that value (and keeping all the bindings of the current context), then evaluates its second operand in that new context and returns its value. Note that the original context is not modified, so other expressions don’t have access to the new binding. For instance:

[|let x = 1 in let y = x + 2 in x + y|]{}[|let y = x+2 in x + y]]K where K = {} + {x:[|1|]{}}
→ [|let y = x+2 in x + y]]{x:1}[|x + y|]K where K = {x:1} + {y:[|x+2|]{x:1}}
→ [|x + y|]K where K = {x:1} + {y:plus ([|x|]{x:1}, [|2|]{x:1})}
→ [|x + y|]K where K = {x:1} + {y:plus ({x:1}[x], 2)}
→ [|x + y|]K where K = {x:1} + {y:plus (1, 2)}
→ [|x + y|]{x:1,y:3}plus ([|x|]{x:1,y:3}, [|y|]{x:1,y:3})
→ plus ({x:1,y:3}[x], {x:1,y:3}[y])
→ plus (1,3)
→ 4

Example Objective Caml code for that evaluator:

let rec eval k = function
  | `lit x      -> x
  | `var s      -> List.assoc s k
  | `add(x,y)   -> eval k x + eval k y
  | `sub(x,y)   -> eval k x - eval k y
  | `def(s,x,y) -> eval ( (s, eval k x) :: k ) y

Stacks and de Bruijn Indices

Comparing strings to extract a variable from a context is often a bad idea, performance-wise. This is sometimes necessary, for instance when semantics allow runtime creation of variable names (such as $$name in PHP). In practice, when compile-time nested variable scopes exist (as outlined above) using de Bruijn indices and a stack instead of an associative container allow an increase in performance.

Whenever a variable is defined, it is pushed onto a stack. When the scope of that variable ends, it is popped off the stack. The de Bruijn index of a variable is the depth of that variable on the stack, which allows retrieving it. The de Bruijn index of any variable can be computed at compile-time through a very simple static analysis which stores a stack of variable names and replaces every name it encounters with the current depth of that name. Since the parser respects the same scoping rules as the interpreter, the correct association is made between variable names and stack depth.

An example of Objective Caml evaluator that uses de Bruijn indices, with implicit popping (but a performance penalty due to non-random access):

let eval k = function
  | `lit x    -> x
  | `var i    -> List.nth k i
  | `add(x,y) -> eval k x + eval k y
  | `sub(x,y) -> eval k x - eval k y
  | `def(x,y) -> eval ( eval k x :: k ) y

An alternative is to use a random-access container for the stack to access values in constant time instead of proportional to their depth. In general, de Bruijn indices are interesting both for writing interpreters and for performing static analysis in code. Ideally, one of the first steps performed by a parser is to transform variable names to de Bruijn indices. An interesting side-effect is that this also provides a clean algorithm to determine whether all variables are defined:

let rec pos x = function
  | [] -> raise Not_found
  | h::t ->
      if x = h then 0
      else 1 + pos x t 

let rec db s = function
  | `lit x      -> `lit x
  | `var n      -> `var(pos n s)
  | `def(n,x,y) -> `def(db s x,db (n::s) y)
  | `add(a,b)   -> `add(db s a,db s b)
  | `sub(a,b)   -> `sub(db s a,db s b)

Functions

Another core functionality of many languages is functions. The behavior of functions is usually classified in three ways. First, the macros are a syntactic shorthand which is expanded at compile-time instead of runtime. So, the macro which adds a number with itself would work as follows:

[|TWICE(X)|]K → [|X+X|]K

The problem, of course, is that the expression is repeated twice, and as such it will be evaluated twice. So, for instance:

[|TWICE(1+2)|]{}[|(1+2)+(1+2)|]{}
plus([|1+2|]{},[|1+2|]{})
→ plus(plus([|1|]{},[|2|]{},plus([|1|]{},[|2|]{})
→ plus(plus(1,2),plus(1,2))
→ plus(3,3)
→ 6

Here, the macro duplicated the subexpression, then the subexpression itself was evaluated once for each member. This happens, for instance, with C preprocessor macros which merely duplicate the code, instead of working with the value of the code. As a result, macros are powerful in terms of expressiveness (because they can rewrite code arbitrarily, even outside of expressions) but not in terms of performance (or, in fact, in terms of semantics when side-effects exist that only have to be executed once). Besides, it’s impossible to have recursive macros, since the expansion would be infinite.

The improved alternative to macros is functions. Unlike macros, which are syntactic, functions perform a similar substitution at runtime. To make things simpler, we’ll consider here a function with a single parameter. The semantic rule for a function call is:

[|F(X)|]K[|B|]K' where K' = {a:[|X|]K} and λa.B = [|F|]K

Now, let’s understand this. To evaluate a function-call expression, one first evaluates the function expression (F), which should result in a function object (λa.B). A function object, here, is a combination of a name (a) called the parameter and an expression (B) called the body.

Once these are known, the argument expression is evaluated (x), and its result is bound to the name of the parameter in a brand new context (K’). The body is then evaluated in that context. So, if we consider a function such as:

[|TWICE|]K → λx.x+x

Then the previous example would now evaluate as follows (for conciseness, I now skip the evaluation of simple arithmetic expressions):

[|TWICE(1+2)|]{}[|B|]K where K = {a:[|1+2|]{}} and λa.B = [|TWICE|]{}[|x+x|]{x:3}
→ 6

Objective Caml example:

let add = function
  | `Int x, `Int y -> `Int (x+y)
  | _              -> raise TypeError 

let sub = function
  | `Int x, `Int y -> `Int (x-y)
  | _              -> raise TypeError 

let rec eval s =
  | `var i     -> List.assoc i s
  | `lit i     -> `Int i
  | `twice     -> `Func ("x", `add (`var "x",`var "x"))
  | `add(a,b)  -> add (eval s a) (eval s b)
  | `sub(a,b)  -> sub (eval s a) (eval s b)
  | `call(f,x) ->
    match eval s f with
      | `Func (a,b) -> eval [a,eval s x] b
      | _           -> raise TypeError

However, in this example, the function name was bound to a value as part of the language semantics. How is it possible to define functions as part of the program? Several possibilities exist.

One approach, taken by many languages (including C and PHP) is to make functions a part of the context. When a function is called, its body is evaluated in a context which is composed of the global context (that is, the set of all the functions and variables that were defined at global scope) and the values of the parameters. The advantages to this are multiple:

  • The function now has access to every function and variable in the global scope. Of course, for single-pass compilation reasons, C requires that the variables and functions are declared before they are used (so that the compiler doesn’t have to backtrack once a definition is found) but both functions and variables can be defined anywhere.
  • Since a function only references its own local data, plus the data in the global scope, one does not have to care about the lifetime of normal values: either the value is global (it remains around for the entire duration of the program) or it is local (it disappears when the function returns).

So, the new rule in that situation is :

[|F(X)|]K[|B|]K' where K' = GLOBALS + {a:[|X|]K} and λa.B = [|F|]K

The problem that this introduces is that the global context has to be filled, and normal expressions cannot be used to fill it because they might contain non-global variables introduces with the let-in construct. Programming languages tend to define a syntax for non-local scope definitions that is different from the normal syntax at local scope. C, for instance, defines functions and globals at file scope, and allows statements and local definitions at function scope. C# and Java allow class definitions at namespace/package scope, method/variable definitions at class scope, and normal statements at method scope. Objective Caml allows global definitions at module scope and normal expressions everywhere else. Some dynamic languages, such as PHP, notoriously work around this issue by allowing runtime extension of the global context from anywhere (so functions can define other global functions).

Closures

Closures are an extension of the previous solution based on a global context. Instead of having all functions use the same global context as a starting point, closures allow each function to keep its own base context. As such, a closure associates a context with every function, something that will be expressed as λa.B{K}. The classic way of creating a closure is to use the context that was available at the point where the closure is created. So, the evaluation rules would be:

[|F(X)|]K[|B|]K" where K" = K' + {a:[|X|]K} and λa.B{K'} = [|F|]K
[|{a -> B}|]K → λa.B{K}

The ability to carry any context is both an advantage and a disadvantage. The advantage is that this adds to the expressive power of the language by allowing the creation of arbitrary functions in arbitrary locations, instead of having to rely solely on parameters and global variables. The disadvantage is that this implies local values can be carried out of their scope by a closure, which implies a garbage collection scheme or restrictions on what closures can do.

Consider a closure which squares a function:

{f -> {x -> f(f(x))}}

And consider the ‘twice’ function from above:

{x -> x + x}

An example of application would be:

[|{f -> {x -> f(f(x))}}({x -> x + x})(3)|]{}[|B|]K" where K" = K' + {a:[|3|]{}} and λa.B{K'} = [|{f -> {x -> f(f(x))}}({x -> x + x})|]{}[|B|]K" where K" = K' + {a:3} and λa.B{K'} = [|B'|]K""
          where K"" = K"' + {a':[|{x -> x + x}|]{}} and λa'.B'{K"'} = [|{f -> {x -> f(f(x))}}|]{}[|B|]K" where K" = K' + {a:3} and λa.B{K'} = [|{x -> f(f(x))}|]{f:λx.x+x{}}[|f(f(x))|]{f:λx.x+x{} ; x:3}[|B|]K" where K" = K' + {a:[|f(x)|]{f:λx.x+x{} ; x:3}} and λa.B{K'} = [|f|]{f:λx.x+x{} ; x:3}[|x+x|]K' where K' = {x:[|B|]K"'}
            where K"' = K" + {a:[|x|]{f:λx.x+x{} ; x:3}} and λa.B{K"} = [|f|]{f:λx.x+x{} ; x:3}[|x+x|]{x:6}
→ 12

A bit complex? Consider this in more simple terms: {f -> {x -> f(f(x))}}(twice) evaluates to {x -> twice(twice(x))}. Then, {x -> twice(twice(x)}(3)} evaluates to twice(twice(3)), which is twice(6), which is 12. The context allows the closure to remember that in the expression f(f(x)), f is twice.

This is, in essence, the nature of lambda-calculus: a language built around functions “λa.B” and applications “f(x)” such that “(λa.B)(x)” replaces all occurences of “a” in “B” by “x”. Here, we have written:

λf.(λx.f(f(x)))(λy.y+y)(3)
→ λx.(λy.y+y)((λy.y+y)(x))(3)
→ (λy.y+y)((λy.y+y)(3))
→ (λy.y+y)(3+3)
→ 6+6
→ 12

For those who are familiar with it, our evaluation technique is equivalent to lambda-calculus with the rightmost redex being reduced first.

Using this approach, it’s possible to use De Bruijn indices with the exact same technique as before: a closure expression merely introduces a new variable (the formal parameter) in its body expression, thereby increasing the stack depth by one. All the variables in the carried context are available using the same depth as if the closure was in fact merely a let-in binding. In fact, De Bruijn index actually originate from De Bruijn’s work on lambda-calculus.

So, another interpretation of a closure is that it is a let-in binding, but which still needs to be provided with the actual expression to be bound. That is: {a -> B} is like let a = (hole) in B, where the hole is filled by calling the resulting expression on an argument.

Recursion

Using the global context as an evaluation context for every function allowed recursive functions without any additional work: when the function wanted to call itself, it could find its own name in the global context (since the execution of the program only starts after all global functions have been defined).

This is not the case for closures, however: a closure does not necessarily have a name (it’s just an anonymous expression), and the only way to give it a name with the current tools is to use a let-in binding expression. This means, however, that the name of the closure is only available after the “in” keyword, whereas the definition itself (which must perform the recursive calls) is present before the “in” keyword.

So, the problem is that in order to perform a recursive call, a closure must be bound as part of the context in which its body is evaluates. Since the solution used by global-evaluation schemes (which is to place the closure in the global context, and evaluate the body in that global context) is not available (the context is defined when the closure itself is defined, which happens before the closure actually exists in order to be given a name). Therefore, the only way to introduce the closure into the context without having to alter the evaluation system is by making the function its own argument, using the following semantics:

[|let rec n = f in e|]K[|e|]K' where K' = K + {a:μ(n,a).B{K"}} and λa.B{K"} = [|f|]K
[|f(x)|]K[|B|]K"' where K"' = K' + {n:μ(n,a).B{K"} ; a:[|x|]K} if μ(n,a).B{K"} = [|f|]K

These semantics add a new type to the virtual machine: a recursive closure. When called on an argument, a recursive closure defines two values within its context: the argument value (as the parameter name) and itself (as the recursive name). Recursive closures are defined by means of the “let-rec-in” construct that turns a normal closure into a recursive closure. This expression requires special De Bruijn rules to be parsed, since it introduces a new variable (the name itself) to the left of the “in” variable, but only within closures.

Multiple, mutual recursion is an extension of this principle, left as an exercise to the reader. Also, the astute reader familiar with fix-point operators might have recognized the relationship between the form of recursive calls (f f x) with the Y fixpoint operator of lambda-calculus.

However, this approach involving an additional value are too complex to implement when a simpler solution is possible: the recursive definition is only required within the body of the closure itself. Therefore, passing the closure to itself should only happen within that body. And since the body is cleanly delimited at parse-time, adding those calls syntactically instead of semantically is very easy. As such, it is merely a question of replacing “let rec f = a in b” with “let f = (let f = {f -> a} in f(f)) in b” where all occurences of the variable “f” in a have been replaced with “f(f)“. Once this syntactic replacement is performed, the previous virtual machine (with no recursive closures) can be used.

State

So far, there have been two ways of communicating data within the expressions. The value descends from the leaves to the roots, while context moves from the roots to the leaves, with possibilities for converting a value into a context element (let-in) or a context element into a value (variables).

Side-effects have the annoying property of appearing in an expression and affecting the nature of another expression without being part of the value of the first expression. This is for instance the case where one closure increments a variable while another closure reads it: the return value of the first closure cannot (and should not, for purposes of encapsulation) be provided to the second closure.

Since neither values (by definition) nor context (which only moves into expressions and never out) can represent side-effects, a new concept must be introduced. This concept is known as state. Unlike both values and context, which always move in a single direction, state moves in both. Every expression, when evaluated, is provided with a state (known as its initial state), and its evaluation yields another state (known as its final state). What the state really is depends on the application: it could merely represent the mutable variables of the program, or it could also represent input-output.

For the purposes of this article, the state will consist here of only mutable variables. The exact representation used here is a dynamic array of values, indexed from zero to infinity. Two expressions are allowed to interact with the context using the following rules:

[|ref x|]K,S → i,S' + {i:v} where i=length(S') and v,S' = [|x|]K,S
[|!r|]K,SS'[i],S' where i,S' = [|r|]K,S
[|r:=x|]K,S → v,S" + {i:v} where i,S' = [|r|]K,S and v,S" = [|x|]K,S'   

The “ref x” expression adds a new value to the state, then returns a reference to that value (in practice, the index of that value within the state). The “!r” expression extracts the current value of the reference “r” from the state. The “r:=x” expression changes the value associated with the reference “r” in the state to the value of “x”.

A common pattern, once state is involved, is to keep in mind that every expression has to read and write the state, meaning that the state will be travelling as an argument and return value everywhere. In practice, it’s possible to represent both values and context using the state (this is how virtual machines programmed in imperative languages work), but for clarity it will not be done here.

The existence of state means that some expressions will be used solely for their impact on the state, and not for their value. This results in the creation of the sequence expression, represented in Objective Caml using a semicolon, and with the semantics:

[|a ; b|]K,S[|b|]K,S' where _,S' = [|a|]K,S   

The Butterfly Specification

Butterflies and Hurricanes

Butterflies and Hurricanes is a track by the band Muse, and a quick reference to the butterfly effect of chaos theory. The gentle wing flaps of a butterfly create small disturbances in the air flow, and these disturbances propagate chaotically and may well cause a hurricane a few months later a thousand miles from there. And they make great music, too.

When a client comes bringing new feature ideas and specifications for the development team to add to a project that’s under pressure, the project manager’s duty to put a stop to feature creep. However, sometimes, the client reminds everyone that a small line in the fine print of a technical specification from a few months earlier dictates a small change in functionality which corresponds to a massive change in implementation. And since it’s not a new feature, but an existing requirement, throwing it away is much harder.

Such specifications, deceivingly small, easy and innocent, reveal themselves to be a real pain once their actual implications are understood. This is the Butterfly Specification: a small, ten-word wing flap in a requirements document creates a hurricane in the code months later, despite nobody being able to see it coming until it’s too late.

A single drop cannot drown you

When a butterfly specification is discovered and the usual name-calling and fist-shaking moments are over, one usually notices that the specification or requirement was nothing really special. That sentence, you’d say, is pretty clear about what had to be done, yet it hasn’t, so there must be poor planning afoot. This is true.

In improvisational theater (think Whose Line Is It Anyway?) good actors know how to create out of thin air a character (complete with tics, accent, tone, ideas and habits) and a story (complete with objectives, obstacles and an ending) while speaking loudly enough to be understood and interacting with other actors. When they forget one of these elements, the performance is lacking and the audience (though uninitiated) can grasp that the quality was lower than expected. As such, the ability of an actor to keep in mind all these elements at the same time is as important as his ability to implement them correctly. An actor with an outstanding accent and tone but without a coherent story will appear worse than an actor who has an acceptable story and a pretty nifty Canadian speech to talk aboot it.

If any of you ever did any improvisational theater, you were probably told about these things on the very first day. Then, you spent the rest of your training that year actually managing to get two at the same time, or three if you were gifted. People who start improvisational theater all begin the same: I can come up with a story, I can keep an accent for a dozen sentences, I can understand what the others are doing, so I should be able to play pretty well! Then they start to play with a moving and deep irish accent, only to notice after twenty seconds of idle and boring blathering that they still don’t have a story going on. So they panickedly start introducing one in a perfectly non-irish voice like a Bostonian with a hangover on the day after St Patrick’s. And the entire public is still wondering why they haven’t noticed their teammate, who had been begging for their attention all along.

The (obvious) bottom line is that people can’t concentrate on several things at the same time without a great amount of talent or experience, and even the most skilled software architects can’t think of everything at every point in time, nor can they convey that huge amount of information to the developers without losing a lot of detail along the way. And a butterfly specification is one of those razor-sharp lost details that a developer steps on bare-footed a few months later.

By looking at every piece of functionality, it’s easy to think that implementing that piece will be easy. Even worse, sometimes the piece of functionality is so obvious (or even common sense) that it’s forgotten as soon as it was read. The real difficulty is remembering all the pieces to be implemented at a given point in the program, which is an order of magnitude harder.

A real-life example I encountered lately was that of a comment system on a web site. The system behaved as expected: the user wrote a comment, and other users could read that comment. Of course, there was a flaw: comments that were too long were clipped to a fixed length without notifying the user (nor would the user naturally read again the comment he just posted), leading to a lot of clipped comments everywhere. Yet, it was common sense that this should not happen in an user interface.

However, the programmer who performed the clipping did not think in terms of users and comments and being notified of the clipping: he thought in terms of storing a string in a database, because that was the low-level action he was implementing. Then, the procedure he wrote for that purpose was used by another programmer, who did think in terms of users and comments, but did not notice that the procedure would clip the string before storing it. Since neither developer had the entire picture in mind, the functionality was incomplete.

The competent programmer is fully aware of the strictly limited size of his own skull; therefore he approaches the programming task in full humility, and among other things he avoids clever tricks like the plague.

- E.W. Dijkstra

Bridging the Gap

There are two main approaches to reducing the impact of butterfly specifications: detect them earlier, or decrease the effort required to implement them.

Let the machines do it for you

Since the problem mostly comes from the ability of a human to manipulate a large array of concepts at the same time, and since computers are notoriously good at doing so, a simple idea is to let the computer take care of such trivial verifications.

If you every climb on board of the A340, you might feel reassured that the automatic pilot software was proven to be error-free by Astrée, a static analysis tool. I wouldn’t trust 130kLOC of C code with my life unless it was written, reviewed and thoroughly tested by tens of thousands of people (and even then, my Ubuntu plays tricks on me often enough), but the fact that it was proven to adhere to the requirements is good enough for me.

Before you reach that level, however, there are many other ways in which software can be used to verify your program. One example is the use of the type system to ensure that no operations are performed which are physically possible but nonsensical (such as adding together a shoe size and a success percentage), or to use classes and functions which encapsulate behavior that is easy to think of but also easy to forget (for instance, using std::string instead of the cstring functions). “Don’t have memory leaks” and “don’t perform nonsensical operations” are two common sense specifications, but other domain-specific type constraints can also be implemented in terms of type systems (for instance, encoding information taint in the type of values to avoid printing out secret data, or sending unquoted input to the database).

Last but not least, unit testing is a way of expressing a certain set of constraints (such as behavior requirements) that are then automatically verified every time something is done.

Use thought processes

In French Improvisational Theater, a coach is responsible for giving a character and story to every player on his team when a new round begins in a match. In order to remember what character elements to provide, coaches usually have a sentence that they use, with holes they fill in. That sentence (in english) is:

“You are [character], you want to [objective] because [reason], you think [ideas] and you [tic].”

For instance:

“You are the lead guitarist of a german metal band, you want to buy new strings for your guitar because you have a concert tonight, you think drug addicts are fun when they walk and you blink violently at people you meet.”

By using this sentence, the coach is certain that no element of the character was forgotten, so that the coached player can run onto the ice rink and play his role without having to spend additional time building a character from scratch.  Even better, by keeping in mind the words in bold, the sentence actually gains momentum and energy as it is being spelled while leaving free thinking time.

When developing, such processes bear the same benefits: while too complex (or too low-level) to be abstracted away as patterns or factored out as functions or classes, processes are automatic enough to avoid forgetting about them, yet thorough enough to avoid forgetting the things they care about.

As a programmer, I keep a permanent analysis process when writing (or reading) code. Every variable that is used is mentally checked, to see if it’s an argument or other piece of input data, or if it was carefully checked and tested before being used. Every database is looked upon with a deeply suspicious eye. Every resource allocated manually in an exception-supporting program is followed to determine if it leaks. Where many programmers think of programming style in terms of putting braces on a certain line and indenting by a certain amount of spaces, the war-worn veterans think of programming style as a paranoid and manic concern for using casts or pointers, side-effects which are not tested for success, or for any code that is not purely idiomatic.



693 feed subscribers
(readers who polled a feed this week)