Monthly Archive for November, 2008

The Window Tax

In 1798, the French Directoire instituted a new tax, called “la taxe sur les portes et fenêtres” (quite literally, the tax on doors and windows). Back then, the young republic did not yet have the means to closely examine the income and possessions of all taxpayers in order to determine how much each of them should pay, yet it wanted those who could afford to pay more to actually pay more. Doors and windows were an obvious sign of wealth that could be easily evaluated by the tax collector without having to enter the house, and since the taxpayers often did not have access to estate ownership titles having the tax be paid by the occupant helped things significantly.

Neon sign tax?Of course, soon after the tax appeared, people reduced the number of windows and doors, sometimes going as far as wall every doors but one, and brick-up all the windows.

A similar thing happened in France in the 1990s: until then, insurance for work accidents was provided equally to all companies, and all companies paid the same flat rate based on the national accident rate. In an effort to make large companies pay for improved safety measures (because they had the funds for it), the insurance rate became deregulated for companies above a certain size, so that a large unsafe company paid more than if it was safe. Small companies still paid a national rate.

Unlike what was expected, most large companies spent no money on safety and instead outsourced dangerous jobs to small companies, thereby improving their own rates. And since small companies often had worse safety than large companies, the overall workplace safety slightly decreased as a consequence of this deregulation.

The same happens in the computer world. When your log software advertises a thousand visitors per day, it simply hides the fact that HTTP has no notion of visitor:

  • A given person may use several computers and IP addresses to visit the same site.
  • A given person may use a proxy cache to read the website without having to connect to it.
  • The same IP address may be shared by several people (by ISP shuffling, on public computers and on LANs).

Similarly, when your download client displays a download speed, it’s quite probably an average over the last few seconds, meaning that if you are suddenly disconnected from the network, the download speed will remain above zero for a short while.

Of course, every user of the HTTP protocol knows, deep down, that HTTP has no notion of visitor and therefore a visitor is just a set of requests from the same IP over a given duration, just like every user of a download client can understand that download speed is an approximation and isn’t really above zero when there is no network connection.

Yet, it’s quite easy to forget.

An example upon which I’ve stumbled recently was a paying web site. Paying users were allowed access to some functionalities of the website, so there was a test performed on every of these pages to see if the user account was associated to valid payment information before displaying the page. Nothing too fancy, a mere if(user.Subscribed) test that didn’t deserve factoring out. Then, one day, marketing suggested adding a temporary test account that could be used for a limited duration to allow using those functionalities.

The obvious problem is that the previous assumption “only paying users can read this page” that was spread all over the system became incorrect. Cheating by changing the definition of user.Subscribed was impossible, because subscribed users have valid payment information but test accounts don’t. So, a new test had to be added to every page: if(user.Subscribed || user.TestAccount). Over a few months, the test became increasingly large as new properties were added for access.

Luckily enough, a sane developer came around and refactored the thing as an if(Functionality.IsAllowed(user)) function which allowed changing the access restrictions for these pages in a single place, and developers from that team learned not to make explicit assumptions within the code, not even on extremely simple tests.

The Window Tax Antipattern

The Window Tax Antipattern can appear when:

  • A given module provides a piece of information, Provided.
  • A rule (that is neither a consequence of the definition of that module nor an absolute truth) describes how another piece of information, Deduced, can be deduced from Provided.
  • Other modules use Deduced by manually deducing it from Provided.

In such a situation, if the rule changes, the deduction becomes incorrect and every module which uses the deduced information will have to be changed as well. As an additional issue, since the deduction is not explicit, finding every module that makes that deduction is often difficult. The solution is to identify the rule, and wrap it in a function that constructs Deduced from Provided (thereby allowing easier refactoring should the rule change).

In our above examples:

  • The provided information is the number of doors and windows, and the rule states that wealth can be deduced from that number.
  • The provided information is the accident rate, and the rule states that the safety of workers can be deduced from that rate.
  • The provided information is whether the user has a paying account, and the rule states that the page can be viewed only by users with paying accounts.

In some cases, the antipattern does not apply. For instance, if the rule is a consequence of the definition of the original module, then the rule cannot change unless the underlying module also changes (which is an acceptable reason for refactoring). So, if the rule is “when you push an element onto a stack, the stack is not empty until you pop it”, then assuming that a stack onto which you just pushed an element is not empty is not a problem (until, of course, either you or another part of the program pops that element).

However, every single business rule (those that can be changed by management, marketing, clients or even laws) is potentially subject to this antipattern unless cleanly wrapped.

In an environment that is heavy on business rules, the Adapter pattern becomes of particular interest. Since every object usually cares about information that is not directly contained in another object business rules crystallize as adapters around objects, providing the information that the current object requires (for instance, “can this user view this page?”) based on the information that the other object and its environment provides (for instance, “is this a test or paying account?”). View with suspicion every direct interaction between two business objects, and never hesitate to insert adapters or explicit rules whenever the used information is not a direct, obvious and inevitable consequence the provided information.

Calendrier de l’Avent

(English version below).

Calendrier de l'Avent

Je vous propose pour les fêtes de fin d’année un calendrier de l’Avent (illustré ci-contre). Vous pouvez d’ailleurs voir une démonstration complète, configurée pour simuler le 14 Décembre 2008, ici. Il s’agit d’un JavaScript gratuit que vous pouvez utiliser à volonté sur vos pages. Il propose entre autres:

  • 25 petites images de chez n.design studio.
  • Une grille de 5×5 jours, répartis aléatoirement. Cliquer sur un jour affiche l’image pour ce jour.
  • On ne peut voir que les images du jour et des jours précédent.
  • Le script se souvient des images déjà ouvertes par un visiteur d’une visite à l’autre.
  • Les jours changent aléatoirement de place (en conservant leur image).

Pour utiliser ce script, il suffit d’ajouter ces trois lignes au code HTML des pages à l’endroit où on veut le faire apparaître:

<div id="calendrierAvent">
  <script type="text/javascript" src="http://www.nicollet.net/files/avent/calendrierAvent.js"></script>
</div>

Vous pouvez l’instaler dans la charte graphique du site, si vous le souhaitez. Référez-vous à la documentation de votre site/blog pour savoir comment insérer du HTML (ou postez un commentaire ici et j’essaierai de vous aider).

English version

This ticket is about an advent calendar widget written in JavaScript using the jQuery library. It can be plugged into any web site quite easily, if you have access to the HTML code of that web page. It features:

  • 25 small images from n.design studio.
  • A 5×5 clickable numbered board. Clicking on a day shows the image for that day.
  • You can only see images for today (or the days before today).
  • The script remembers what images every user has seen from one visit to the next.
  • Images are shuffled around between visits (but the image for every day remains the same).

You can see a demo, configured as if today wereDecember 14th 2008, here. To install it, just add the following to the HTML of the page where you want it to appear:

<div id="calendrierAvent">
  <script type="text/javascript" src="http://www.nicollet.net/files/avent/calendrierAvent.js"></script>
</div>

You may wish to install it as part of a template, for instance. See your blog or website software for how to insert HTML into your pages (or post a comment here and I’ll try to help you out).

Lessons Learned

Let’s face it: functional programming has never been a mainstream trend in the development world. Of course, there’s always the odd contender, such as Objective-Caml-hiring Jane Street Capital, but the vast majority of job listings and lines of code remains written in the mainstream family of languages descended from C: C++, Java, C# and the little brothers, PHP and Javascript. Things might change change with the advent of F# as an official language in the .NET framework, but the change is yet to be seen.

The Dangers of Expressive Freedom

Why are non-initiates so reluctant to try out functional programming languages as a whole? One common feature of Objective Caml, Haskell and LISP is the unusual syntax, completely distinct from the tried-and-true statement-based program. And that is downright scary when you’re not used to it.

The elementary approach of BASIC, COBOL, Fortran, C, C++, Java, C# and PHP is to construct programs by listing statements. A statement is an operation that exists only to perform a certain side-effect, elegantly and reliably terminated by a token such as the now-omnipresent semicolon.

do this;
then do that;

There is a clear and simple mapping between the code and the operations executed at runtime, which is both helpful (debuggers can work statement-by-statement) and reassuring (because the programmer can see what the code does, in the right order). In fact, average C and C++ programmers often become so attuned to this top-to-bottom execution order that they tend to assume left-to-right execution as well, leading to such malformed statements as a[i] = i++; where sequencing rules break any expectations.

Another clear visual cue in C family languages is the block. A block starts with { and ends with } or any similar pair of open-close symbols distinct from those used in expressions (because Expressions Are Not Statements). Blocks are cleanly delimited sets of instructions which further improve the basic readability of code.

Ultimately, the basic syntax follows a simple, easily recognizable and easily understood pattern:

{
  statement;
  statement;
  block-options
  {
    statement;
    statement;
  }
  statement;
  statement;
}

Execution flows from the top to the bottom, every statement depends only on the statements that are above it (and, in some special cases, on the statements in the same block as itself). This is mainly useful because it allows one-pass compilers to work correctly (as there is never any necessity to jump back and forth within the program, merely using a symbol/label/block table to implement the correct addressing and jumps within the output code without actual jumping back while compiling. The consequence is that the language is easier for programmers who are still in their learning stages, or those who don’t want to spend time and effort moving on to the later stages, because the syntax does not get in the way of understanding the code (such a rigid structure does, of course, get in the way of writing code).

Compare this with the free-flowing structure of LISP code:

(atom '(atom atom atom)
  (atom atom) (atom atom)
  (atom
    '(atom atom atom)
    '(atom atom atom)))

In LISP, there are no statements and no blocks. Instead, everything is composed of lists grouped together into trees called S-Expressions. The benefits of this approach in terms of expressiveness are both obvious and widely documented: since LISP provides primitives for manipulating S-Expressions, a piece of LISP code is first and foremost a piece of data which may be manipulated in various ways. Execution of code is one among many other possible manipulations, and consists in turning a complex S-Expression into a simpler one through evaluation.

But then, the clear mapping between the code and the behavior disappears. Sure, there is still a mapping, but it loses the clarity of “statements are executed one after another, in order” and is instead, by design of the language, potentially defined by the user itself. When looking at a piece of LISP code, it’s sometimes hard to determine whether that code is a piece of hard-coded data or a custom program which will be interpreted by an user-defined evaluator (and in the latter case, it does not convey any information about what that custom evaluator would do).

And so, in functional languages, the user cannot deduce runtime behavior from the layout of the code: an additional translation step is required to plug together the abstract syntax tree, determine the direction of data flows based on the functions being used, and finally deduce what will happen. People who are not used to it require a short time to adapt, and beginners (or programmers who are not used to abstract thinking in general) will encounter serious difficulties because of this.

Cross-Language Experience

Of course, the reverse is also true: since understanding a functional program is difficult and requires abstraction skills in order to imagine the runtime behavior of what the code hints at, then anyone who can reliably understand and write functional programs has those abstraction skills, and can apply them to a wide variety of programming problems. Functional programming advocates say these skills also help when working with non-functional programming languages.

Yeah right. I can implement recursion with a stack and loop if I want to.
- Joe Clueless

Precisely, Joe. Functional constructs are often tightly related to the programming language they were defined in. It’s often difficult to apply, say, Haskell Monads to LISP or Objective Caml, so imagine applying them to Java or C. The point is not, of course, to transliterate functional concepts as-is to non-functional languages. Every language has its own constructs and idioms, as well as techniques for improving performance or reducing the amount of code required to implement a given feature, and these carry with difficulty across languages.

On the other hand, whipping out a stack-and-loop solution out of thin air is not something programmers tend to do. And when they do, implementing a recursion-like solution without thinking about it in terms of recursive solving is quite difficult, because recursive solutions impose certain constraints on how the stack is uses within the loop that are quite helpful when deciding what should go into the stack.

For instance, consider a typical recursive structure: the tree. Let’s define a tree and add the elements in the nodes, in Objective Caml:

type tree = Tree of int * (tree list)
let rec sum (Tree (x,children)) = List.fold_left (+) x (List.map sum children)

Now, let’s consider a similar tree structure in C:

typedef struct _Tree
{
  _Tree *first_child, *next_sibling;
  int value;
} Tree;

Looping through all elements in a tree with a stack is not straightforward, because one has to consider what will be pushed on the stack: the nodes to be explored for children? The nodes to be explored for siblings? Something else? When using a recursive approach, one can consider that, for instance, sum(value,child,sibling) = value + sum(child) + sum(sibling). The naive approach is to push every call on the stack and use the associative properties of addition to add everything up. Yielding the solution:

int sum(Node *node)
{
  int total = 0;
  push(node);
  while (node = pop())
  {
    total += node -> value;
    if (node -> first_child) push(node -> first_child);
    if (node -> next_sibling) push(node -> next_sibling);
  }
  return total;
}

A slightly better approach is to use tail recursion by moving the node to its next sibling in-between loop iterations instead of pushing to the stack:

int sum(Tree *node)
{
  int total = 0;
  while (node || node = pop())
  {
    total += node -> value;
    if (node -> first_child) push(node -> first_child);
    node = node -> next_sibling;
  }
  return total;
}

To write this code directly, all that is required is 1° experience of recursive solutions (for instance, from functional programming languages) and 2° knowledge of the usual tricks to implement recursive solutions in C.

Those are classic tree algorithms. You would either learn these in school or pull them out of a book.
- Joe Clueless

Let’s ignore for a moment that the Objective Caml implementation of the “sum” function is two lines long and pretty much described anything there is to know about the algorithm in general (which you can translate to C if you know the elementary translation primitives).

It’s true that anyone with reasonable knowledge of depth-first search could design the unoptimized traversal above (though experience of tail recursion is about the only of thinking up the optimized version, short of a stroke of genius). It’s also true that most courses in C teach how to manipulate trees. But there is more to recursion than only trees: why teach individually every application of recursion, when learning the fundamental principles allows deriving those applications by hand just as fast?

The same pattern applies to dynamic programming: of course, the classic dynamic problems (matrix multiplication, Levenshtein distance) are fairly easy to understand if you set your mind to it, but this does not help in the general case because the algorithms are hand-compiled implementations of a higher-order concept: memoization. For instance, applying memoization to the recursive definition of the Levenshtein distance leads to using a triangular matrix for storing the intermediary values, and a quick analysis of the call tree shows that all values in the matrix will have to be computed eventually, leading to the correct dynamic implementation.

Several other classic concepts in mainstream programming are in fact merely translation tips for existing functional concepts:

  • The Command design pattern corresponds to monads in general and currying mutators in particular.
  • The Abstract Factory design pattern corresponds to currying generators.
  • The Memento design pattern replicates backtracking in languages with mutable values.
  • The Observer design pattern is a function.
  • The Prototype design pattern corresponds to the original when there are no mutable values.

So, several design patterns are the functional equivalent of a for-loop from 1 to N. Because of this, identifying design patterns in code becomes easier when one is thinking in functional terms: even if the program itself is written in a non-functional programming language, functional experience improves the quality of the program by refining its design.

Even then, designing things in a functional language, then translating them, loses clarity and performance.
- Joe Clueless

That’s true. If translating a functional design into an imperative program was easy, then optimizing compilers could reach the performance levels of good C implementations from high-level Haskell code. However, even if it’s not easy, it’s still easier than having to design things by hand from scratch using primitive low-level tools.

Hand-Compiling Functional Code

There’s a thin shared membrane between imperative programming and functional programming. This membrane is related to the representation of mutability in functional languages:

  • Values cannot be changed.
  • One can simulate a changing object by creating a slightly different copy of that object.

However, that is a functional view of that representation. The imperative view would be:

  • You can always undo any modifications you perform.
  • When you change an object, only the current reference you hold sees those modifications.

Of course, you have two ways of achieving this in an imperative program. The first is to use a functional-like approach where backtracking information is stored in objects. The second is to simply avoid any cases where the imperative programming mutable version differs from the functional programming version. That is:

  • Never backtrack. If you have to, store either the original value (on small scales) or a diff (on large scales).
  • Never change an object unless you are the only owner of that object.
  • Never store a reference to an object unless you know it is going to remain constant.

This eliminates a lot of constructs from both imperative programming and functional programming. Of course, this makes it extremely restrictive, which is both good and bad. It’s good, because you now have full functional benefits for a portion of your code: that code does not change the state of anything it doesn’t own (though objects may change themselves, of course, as long as only the caller function owns them), performs well in multi-threaded environments because it requires no locks, and can be tested in isolation simply by providing the right arguments.

It’s bad, because developing programs this way is harder. You lose a lot of benefits from imperative programming (such as the ability to change a value you don’t own, which is an excellent communication device between objects in a mutable environment) yet don’t gain any of the expressive power of functional language because many imperative languages simply don’t have these.

As such, this hybrid programming style is best kept for small self-contained sections of your program, in particular for small utility classes and in bottom-up implementation approaches.

Language Statistics

I recently stumbled upon a study (article) of similarities between languages based on the frequency of two-character sequences within that language. The basic premise is that the words of different languages are written in different ways: the sequence “ch” is more frequent in French, while “sh” is more frequent in English. The probability matrix of all two-character sequences was called the Statistical Language Signature, and was used in that study to build a tree representing the evolution of languages.

Could a similar technique be applied to programming languages? For instance, could an analysis of two-character sequences identify a text as being written in a programming language instead of another?

I have written one such statistical analysis program and extracted the number of occurences in code databases of tens of millions of characters for various languages. These are not probabilities of occurence, but rather the actual numbers, sorted in descending order:

C++ (.[ch]pp) Perl (.pl)
PHP (.php)
OCaml (.ml)
LaTeX (.tex) SH (.sh)
Assembly (.s) XML (.xml) LISP (.el)

Data was gathered by concatenating together files I could find on my hard drive. As such, there is probably a statistical bias (for instance, related to the fact that some OCaml files were generated by Menhir and not hand-written). Also note that white space is mostly ignored (so languages such as make or Python, which are whitespace-sensitive, might yield odd results). Let’s consider a few interesting lines from, say, OCaml and PHP:

     1	  1041163 : a-z a-z
     2	    61686 :   _ a-z
     3	    55004 : a-z   _
     4	    45553 : A-Z a-z
     5	    31807 : 0-9 0-9
     6	    27180 : a-z A-Z 

     8	    20599 : a-z 0-9
     9	    18215 : A-Z A-Z
    10	    16455 : 0-9 a-z
     1	   823427 : a-z a-z
     2	    50294 : A-Z A-Z
     3	    35603 :   $ a-z
     4	    34804 :   -   -
     5	    33476 :   =   =
     6	    31877 : A-Z a-z
     7	    24959 : a-z   (
     8	    18658 : a-z A-Z
     9	    15675 :   _ a-z
    10	    13769 : a-z   _

The first noticeable element is the “$a-z” sequence (rank 3, right) that is particular to PHP (and, to a lesser extent, Perl). Another is the difference between PHP ALLCAPS constants on the one hand (rank 2, right), and the Constructor_name approach used by OCaml (ranks 2, 3, 4 left).

Other points of interest:

  • TeX has the largest number of different two-character sequences (nearly 1200), followed closely by Emacs LISP (1150) and Perl (1130). Assembly has the lowest, with less than 300.
  • SH has a high frequency of “/a-z” and “a-z/” sequences, possibly related to path manipulation.
  • The highest-ranked non-alphanumeric sequence in C++ is “//”, the comment sequence. Also, the increment operator ‘++’ is seven times less frequent than the decrement operator ‘–’.
  • The frequency of empty string literals is unusually and significantly higher in OCaml (perhaps due to typical recursive or folding manipulation).

Feel free to peruse the above as you wish, or use the statistics program to process your own data sets.

Perfect Caching

To keep things simple and manageable, the programmer is forced to assume in most of his code that execution happens sequentially, and that the global persistent state of a web application changes atomically and instantly when the script finishes executing.

This happy sunny world ends as soon as performance considerations appear. A 100-millisecond script execution, which is not quite unheard of, means that a fully sequential web site can only service 10 hits per second on average. If this is all that is required, then this is fine. However, popular websites might have to service more than 20 million hits per day, which amounts to 231 hits per second or about 4 milliseconds per request.

Needless to say, even serving static pages does not bring you close enough to that 4 milliseconds. Wikipedia, with its 187 million hits per day on the English version alone, would have to accept an incoming connection, fetch the appropriate data from the database and respond to the request within 460 microseconds.

When you reach this level of speed requirements, single-machine optimization reaches its limits—not even assembly-level driver optimization by hardware experts could bring out that kind of performance. The only way to improve performance sensibly is to add more hardware. If you have to service 20 million daily hits but can only handle 100-millisecond responses, you need about 23 identical servers (and unless your business model is insane, those million hits are bound to pay for 23 $300/year servers quite easily). In fact, you could even decide that 23 processors (in six quad-core servers) are better.

That, of course, would be in an ideal world where those 23 independent servers can churn out the correct responses full time. However, the original issue was not a hardware issue, but a design issue: the source code assumes that all requests occur one after another. Adding more hardware to handle several requests at once breaks that assumption and leads to unexpected behavior in the program.

The New Bottleneck

So, in order to keep the program well-behaved, the actual persistent state is still stored and updated in a single place, usually referred to as the master database server. The old bottleneck of accepting connections, and responding with correctly crafted documents is gone, but a new bottleneck appears: there is still only one server to handle all requests related to the persistent store.

The issue of delegating persistent storage to several servers is a complex one. The easiest part is delegating read access, but keeping write access on a single server: this creates slave database servers, which are updated with the data from the master server and allow read-only access to lighten the load of read requests on the master. This is not trivial to do, however: first, cross-server transactions do not exist (you cannot read from a slave server then use that value to update the master server, or at least not atomically) so every potential mutator request must either work with the master server alone or accept losing atomicity, and both solutions require changes in code to redirect read requests to appropriate locations or make non-atomic modifications acceptable.

Quick note: it is not necessary to use the same kind of server for the master and for the slave. For instance, you could use a normal database for the master, and a caching system (such as memcache) for the slaves. This way, you get the optimization benefits of a database for concurrent modification handling (in mutators) and the optimization benefits of a cache system for read handling (in accessors).

Then, there’s the question of delegating write access. Sometimes, it’s easy: if the data you need is made of large independent blocks, such that the time of finding them is smaller than the time of altering them (for instance, YouTube videos) you can dispatch them to several servers and use a single (possibly distributed) directory to find them. Or, if you have several largely independent sets of data (such as independent forums, or the “Posted Items” and “Profile Information” on Facebook) that never need to be modified or queried together, you can dispatch them to several servers. Or, again, if you have a simple way of splitting entities based on their properties (for instance, based on whether the identifier is even or odd) you can route your modification requests based on that property.

Doing It

Accessor/Mutator separation : the most fundamental analysis that has to be done is to determine which requests are mutators, and which are accessors. Typical web sites run many more accessors than they run mutators. If you have a static analysis tool that can determine whether a certain code path or controller ever performs a modification, you’re lucky. Otherwise, you can either write your code to handle this at some granularity level, such as functions (my personal convention is to prefix all mutator functions by “do”, for instance doLogin($username,$password) might change the database), modules or URLs (a classic way of doing this is to login as a different database user when inside an accessor URL, one without the rights to perform modification queries); or you can rely on empirical analysis by logging all queries performed by every URL access over a certain period of time. If a certain URL visit frequently results in a mutator query (or has recently done so), then run it as a mutator, and otherwise run it as an accessor (in the odd case where an URL runs as an accessor but attempts a mutator query, re-run it from the very beginning as a mutator).

Immediate/Delayed separation : the next step is to determine how soon a certain modification should be visible to readers. The user is less capricious than the code: a mutator has to see the current state of the application in order to work correctly, but users can usually handle a state that is a few seconds or minutes old without problems. This allows the master server to delay the propagation of some changes for a short while, which can be pretty handy if it’s under a heavy load. These changes will still be available to other mutators, of course, but the readers will see an older version. Some modifications, however, have to become visible instantly: changing one’s password, for instance, should enter into effect immediately, while indexing a new item within a search engine may take a little bit longer to happen before the user notices. The benefits of a query-based cache, such as MemCache, are obvious here: instead of pushing the data from the master server to the slave server, you can have the web server itself invalidate elements of the cache related to the newly updated piece of data without adding any load to the master server unless the data is actually accessed (or the web server could fill the cache with the appropriate value).

Transaction separation : when a certain process has to perform writes, not all reads that occur before those writes have to be part of a transaction. For instance, if visiting a page updates a “this user is currently visiting…” field in the table, then that update is independent of any reads that happen when serving that request (so, for instance, it would make no sense to read the visited page from anything but a slave server or cache). A typical means of separating transactions is to wrap them in functions that perform all transaction-related work by themselves, either as procedures stored on the database or as normal functions in the host language.

Memoization Trees

A given web page (or any HTTP response, for that matter) is composed of three distinct yet interleaved sets of data:

  • Constant data (views, internationalization libraries), which only changes during major events such as software updates.
  • Program state data (databases, filesystem), which changes when the software alters it.
  • External data (RSS feeds, FTP, the current time), which can and will change randomly.

An ideal cache system stores data in such a manner that, when a request hits the cache, only minimal work is required to combine the cached data with external data and obtain the correct response. The ideal piece of software is therefore:

page(args):
  state_data = get_state_data(args);
  external_data = get_external_data(args);
  return frobnicate(state_data,external_data);

At which point, caching mostly amounts to memoization of the state data.

This brings up two problems. The first is that the code seldom looks like the above: usually, the external data is intermingled with normal data at several points in the code, which makes a full caching impossible. But that’s not really a problem—one could insert a caching layer between the database and the program, and just trigger caching based on the initial arguments. The real problem is that this kind of caching is not granular enough, because it caches the entire state for a given set of arguments. If the arguments change by even a single bit, the cache becomes invalid, even though 95% of the entire cached data was still valid for the new set of arguments.

Another way to look at the final document, is to see that it was generated from a tree of operations that start with the three types of data as the leaves, have various pieces of code as the branches, and yield the final page as the root. Displaying a page with different arguments results in a forest of such trees, except that some branches may be shared between the two trees (which branches are shared depends on which arguments were changed).

So, if every branch that depends only on non-external data stores both its result and the arguments which led to this result, displaying a tree will not have to run all the branches that it shares with a previously evaluated tree. Caching policies may then eliminate branches that are examined too infrequently while pushing forward those branches that are evaluated often. Statistical analysis over a long duration might even determine that certain branches deserve to be cached while others don’t, and lead to introduction of caching code at appropriate points.

The net result that, instead of having a single cache layer, there are several layers of varying available size and flush frequencies. In extreme situations where the memory available for caching is infinite, every single part of the program is cached (and is therefore computed only once).

In practice, dividing the view/controller section of a web site into a hierarchy of blocks helps enforce this: cached blocks have their HTML pulled from the cache and displayed, while uncached blocks compute that HTML and add it to the cache. HTML which appears on several pages is cached only once and then reused on all other pages.

Invalidation Cascades

The problem with having several layers of caching is that it makes cache invalidation harder. Relying on classic timeout means that every update takes several timeout lengths to traverse the entire set of layers, which leads to smarter invalidation techniques. The first question to be asked is whether the invalidation should propagate when queries happen, when modifications happen, or as a side operation.

My suggestion is, for every node in the memoization tree (which is actually a directed acyclic graph made up of the merged tree of a forest, welded along their shared branches), to store a list of direct super-nodes that use it. When the data directly below this node disappears or changes, make the node disappear and propagate this disappearance to all its super-nodes based on the list. This can be done instantly (so that all nodes that depended on the changed value disappear) or performed asynchronously by an invalidation process that keeps a list of “to be invalidated” nodes, depending on the needs.

The alternative is to perform the test at run-time, by storing for every node a list of state-level triggers (such as “property Foo of object ID=bar was changed”, possibly with wildcards) that can result in invalidation. This list of triggers can be constructed at memoization time or (even better) at compile time. The memoized value then keeps a set of timestamps corresponding to each trigger, and compares those timestamps with the latest execution of those triggers. The update process then merely traverses the list of triggers and determines which ones should be refreshed based on what the update is really doing.

An Exercise in Terseness

Terse: smoothly elegant, devoid of superfluity

Programming languages have cultures that revolve around them, which are both a cause and a consequence of the idioms and techniques found in these languages. Some languages, such as Perl or Objective Caml, have small communities around them that lead to a reasonable consistency in their cultural aspect: while both value terseness as a sign of good design, Perl hackers rely on the wild tentacled expressiveness of the language while Objective Caml hackers rely on streamlined mathematical representations that the language expresses concisely.

Other languages are larger, often due to their application in so many situations that a single language-minded community cannot contain them. JavaScript ranges from download-free-script online communities with non-portable low-quality code to elegant functional programming to code generated from Java, while C++ ranges from academic numerical computations inspired from earlier proceedings in the C language to modern reliance on standard library generic and functional programming to large-scale object-oriented development.

Tentacled Onion

When I say that the Perl language has a wild tentacled expressiveness, it is more of an admiring stance than a negative analysis. Consider, for instance, the “underscore” special variable $_ : in many situations, if no variable is used to assign the result of an expression, then that result is assigned to $_. For instance:

while(defined($_ = <>)) print $_; 

# Equivalent to : 

while(<>) print;

To the average Perl hacker, this concept is an illustration of a fundamental principle (as the standard documentation puts it, “this may seem like an odd thing to you, but you’ll use the construct in almost every Perl script you write”). In terms of language semantics, the behavior is pretty simple to explain: a program is a collection of operations which read values and output values, where the output of one operation is the input of another. This is true for almost all programming languages (with the exception of some languages, such as Prolog).

Writing a program is the description of what operations to run, and what data goes from which operation to which operation. Concise languages tend to describe operations by naming them, and make data flows implicit by various means. Perl takes a “don’t write what you don’t need to specify” that is shared by several other languages from SH to Ruby: when it’s obvious that a value has to be assigned to something, there’s a default assignment target that will keep the programmer happy most of the time (the $_ special variable). And, in general, if an acceptable default behavior exists, it is used.

Go Forth and Multiply

Forth makes data flow completely implicit, through slightly different means: instead of being stored in variables, data is pushed onto a stack. Every operation therefore reads the top values from the stack, and its output is pushed back on top of the stack when it ends. As such, a Forth program is a sequence of words, and every word is an operation.

This extends to “function” definitions, which don’t have to name their parameters and instead read directly from the stack:

: abs dup 0 < if negate then ;

This definition of “abs” duplicates the top element, pushes zero, then pops the two top elements and pushes a boolean determining whether the bottom was smaller than the top, then pops that boolean and, if true, pops the top and pushes its negation.

Of course, terseness and conciseness leads to shorter code for equivalent functionality, and shorter code usually leads to more difficulties in understanding. For instance, long word definitions are frowned upon in Forth because keeping track of the stack is hard (its implicit manipulation is harder to follow than the less concise but more explicit technique of naming values for later use). The same goes for Perl, where the implicit use of a default behavior requires knowledge of what that default behavior is (and even the most war-worn Perl hacker can be surprised by the behavior of an exceptionally devious and elegant piece of code).

Paths, Expressions and Domains

One area where conciseness does not create difficulties in understanding is domain-specific languages. If a general-purpose language can describe a shopping cart in thirty bytes, there’s a high probability that those thirty bytes will be cleverly written and therefore difficult to understand. This is an elementary cookie-based shopping cart implementation in Perl:

@P=sub{lijs while<CKE>;}ul$<chomp; sub{~"http://.*";print}

Just kidding. On the other hand, a thirty-byte description of a shopping cart in a scripting language designed for creating e-commerce web sites is boringly unsurprising: this is what the language was designed for, and one can expect it to get to the point fast enough. Everyone who uses such a language already knows what a shopping cart is and therefore has no difficulties understanding the code.

XPath is an example of a concise yet understandable language: it has a clean and small domain (identification of nodesets within an XML document) and a syntax that matches. For instance, were I to look for all the ‘li’ elements of an XML document that are inside a ‘div’ named “menu”:

 /descendant::div[attribute::class="menu"]/descendant::li

Or, as a shorter version (using the shorthand syntax of XPath):

//div[class="menu"]//li

The equivalent search without XPath in any general-purpose language (even with a good library for traversing XML) would be much longer than that. The same happens for regular expressions: a complex language for the small domain of matching patterns within text (and possibly replacing them with something else), that aims to be concise above all.

An Exercise in Terseness

I have often been frustrated by the verbosity of XSLT as a tree transform language. Looking at the w3schools tutorials, for example, one can see an example such as:

<?xml version="1.0" encoding="ISO-8859-1"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"> 

<xsl:template match="/">
  <html>
  <body>
    <h2>My CD Collection</h2>
    <table border="1">
      <tr bgcolor="#9acd32">
        <th>Title</th>
        <th>Artist</th>
      </tr>
      <xsl:for-each select="catalog/cd">
      <tr>
        <td><xsl:value-of select="title"/></td>
        <td><xsl:value-of select="artist"/></td>
      </tr>
      </xsl:for-each>
    </table>
  </body>
  </html>
</xsl:template> 

</xsl:stylesheet>

This is, at least in my opinion, a mighty long piece of code for such a small thing. Using a stack-based approach similar to Forth, one could make the above look like:

: ~/
  "My CD Collection" /h2
    "Title" "Artist" . ( /th ) "#9acd32" @bgcolor /tr
    ~catalog/cd { $ ~title # ~artist # . ( /td ) /tr } .
  "1" @border /table
/body /html ;

The semantics of this language use two elements: a stack of output node sets and a stack of input node sets. It still works based on words:

Word Semantics XSLT Equivalent
“foo”
Pushes the text node “foo” on top of the output stack. Raw text
/foo
Pops the top of the output stack and replaces it with an element named “foo” containing all the elements that were in the popped node set. <foo></foo>
@foo
Pops the top of the output stack, then adds an attribute named “foo” containing the text representation of all the elements in the popped node set o the new top of the output stack. foo=
~foo
Pops the top of the input stack and pushes any of its contents that match the XPath “foo”. select=”"
$
Duplicate the top of the input stack. Automatic
.
Concatenate the top two output node sets into one. Automatic
{ }
Pops the top of the input stack, pushes an empty node set onto the output stack. For every node in the popped node set, pushes that node on the input stack and executes the code in brackets, adding an implicit . at the end of that code. xsl:for-each
( ) Same as { }, but pops from the output stack instead. XSLT 2.0 only.
# Pops the top of the input stack and pushes its text value to the top of the output stack. xsl:value-of
: ~foo ; Defines a template that matches the pattern “foo”. xsl:template
$: Pops the top of the input stack, pushes an empty node set onto the output stack. Then, pushes every element of the popped node set that matches an existing template onto the input stack and runs that template, then concatenates the top two output node sets. xsl:apply-templates

This shortens the resulting transform script, while reducing its readability (now, the generated XML is not “obviously present” in the script, but rather implied by the sequence of words that construct it).

If you remember the first example from this earlier article, you could rewrite it as such:

: ~/    $: /ul ;
: ~file ~@name # /p /li ;
: ~dir  $ ~@name # /p $: /ul . /li ;

When you’re careful, you can make out the meaning of this. However, it’s harder than with the original:

<xsl:template match="dir">
  <li>
    <p><xsl:value-of select="@name"/></p>
    <ul><xsl:apply-templates/></ul>
  </li>
</xsl:template> 

<xsl:template match="file">
  <li><p><xsl:value-of select="@name"/></p></li>
</xsl:template> 

<xsl:template match="/">
  <ul><xsl:apply-templates/></ul>
</xsl:template>

Parsing the language is extremely easy: a word is either a string literal or a sequence of non-whitespace characters, a template is defined as : ~foo … ; and a program is a sequence of templates. Of course, extracting the loops and conditionals may require some additional effort, but it remains extremely easy to get done. Writing a virtual machine is equally easy as long as XML extraction and manipulation functions for node sets are available. Even static analysis (of stack depth) is quite simple because of the requirement that every template and loop pushes and pops a fixed number of elements from the stack : a linear-time left-to-right traversal can determine whether the number of elements pushed and popped corresponds to the requirements, and generate code that runs the program. If the program outputs the file directly, an optimization pass can be made with a slightly increased difficulty (involving, mostly, executing the individual templates with slightly different semantics) in order to generate the output file linearly.

Singletons

A singleton is a way of designing a part of an object-oriented system that was used frequently enough to deserve its own name: the Singleton Design Pattern. In practice, a singleton is a class which can only have one instance at any given time, and for which that unique instance is globally accessible.

The purpose of a singleton class is to represent a concept which, by its own fundamental nature, can only exist as a single instance. Sometimes, this happens because there are no technical solutions available to allow more than one instance of that concept, or sometimes because the solutions do exist but are too impractical or difficult to implement.

To determine the need for the Singleton Design Pattern in a practical case, one should then examine the following checklist:

  1. Is there a need for the considered functionality to be grouped in an object?A valid reason for wishing to regroup the functionality as a single object is if you need to manipulate it as an object, for instance to implement an interface and use functions or methods that can be applied to that interface. Another valid reason is the existence of several values related to that functionality which should then be grouped as a single object.

    If there is no reason to group the functionality in an object (no values to represent state, no usage as an object implementing an interface), then it is preferable to use a function, method, or namespace instead.

  2. Is there a need for the considered functionality to be grouped in a class?The alternative here is often initializing a global instance of an existing class to represent the functionality, instead of defining a brand new class for that purpose. For instance, even though the standard output of a program is by definition a one-instance-only concept, many implementations of the C++ std::cout are not singletons, but rather global variables initialized with a specific internal buffer that prints to the standard output (that buffer, in turn, is not a singleton because it provides no global access to its unique instance).

    If the difference in functionality is so wide that you would have to rewrite most of the class, then it is a good idea to create a brand new class. By contrast, if the difference is very small, then one can often get away with using an old class instead.

  3. Is there a technical reason for which the considered functionality does not allow multiple instances of the class?If it is technically possible to create several instances of the class, then the use of a singleton is not warranted: one can simply create a global instance instead.

    Note that it is not enough to decide that more than one instance will not be needed to warrant the use of a singleton! The reason the singleton is used is because the represented concept itself, independently of any external reasons such as how it will be used by the program, does not allow the creation of multiple instances. If the program happens to only create a single instance, then this is a coincidence and not a technical limitation.

    Note that this is only an argument against preventing the creation of multiple instances: the other traits of the singleton pattern (being a class which has a globally accessible instance) are not necessarily discarded if this particular condition does not hold.

  4. Is there a need for the unique instance to be globally accessible?The non-singleton alternative is the one used by std::cout : implement a singleton-like class (only allows a single instance) but use its unique instance locally to either initialize a global instance of another class or as a persistent local variable in a function or method.

    The question to be asked here is whether an observer outside the scope of the module or function you are working in needs to know of the existence of the singleton candidate, or whether the fact that the class extends a public interface is enough.

    If knowledge of the class’ existence is necessary, then a globally accessible instance is almost always warranted: there is no point in knowing about the class if one cannot access its instance, and there is no point in passing the instance of that class around while knowing it is the only one.

    On the contrary, if only the interface is public, then only the module which contains the class needs to know about the class (and provide it to the outside world as that interface).

    This is an argument against making the instance a global variable: the other traits of the singleton pattern (the limitation to a single instance, for instance) still hold even if the instance itself is not global.

If the candidate satisfies the above four conditions, then it can accurately be described by the Singleton Design Pattern.

Monostates

The Monostate Design Pattern is a close relative to the Singleton Design Pattern, in terms of applicability. Almost every concept which can be implemented as a singleton can also be implemented as a monostate.

An image that has no reason to be here. A monostate is a class which can have any number of instances, but where every instance has exactly the same state as all others. In terms of implementation, it is equivalent to a singleton: a monostate can be represented as a proxy class which automatically references the unique instance of the singleton, and the singleton can be used in place of a monostate simply by using the singleton’s global instance access function wherever the code created a new monostate.

From a design point of view, I tend to consider monostates as a globally weaker alternative to singletons. My reasoning is that a programmer makes the basic assumption that the state of a given instance can only be modified through references to that instance. However, a monostate violates this assumption by allowing the state of an instance to be modified through a reference to another instance. This oddity does not exist for a singleton, where only a single instance exists.

Of course, a monostate does have an advantage whenever the represented concept allows modifications of its state outside of a reference to its instance: this happens, for instance, when representing an entity which evolves outside the program (a filesystem, controlled hardware…) and which explicitly violates the above assumption. If that is the case, then it is interesting to use a FooBarView monostate to represent the concept of FooBar so that the name implies that the object itself is only one of many different views over the same autonomous concept.

Therefore, as a rule of thumb, if a hesitation exists between using the Singleton Design Pattern and the Monostate Design Pattern, then one should use a singleton unless by its very definition the represented unique-and-global-state will change outside the application.

Misconceptions

The Singleton Design Pattern carries several misconceptions, which often lead to its misuse in a situation where it is not an accurate representation of the concepts at hand (and thus a need for refactoring when the mismatch between pattern and concept becomes clear) and to the usage of unrelated arguments in defense of its use.

  1. “I will use the Singleton Design Pattern because using global variables is not good object-oriented practice.” This is the single most encountered misuse of the pattern. It often finds its roots in the affirmations that “global variables are evil” (and thus should be avoided) and that “design patterns enforce good design” (and thus should be used). Both affirmations are incorrect, or at least partially erroneous.

    First, global variables are not evil in themselves. What is dangerous is a hard-coded reference to global mutable state : the fact that a piece of code is bound by its very implementation to a bit of mutable state, so that this binding cannot be altered or customized by the user of the code. The hard-coded binding is what makes the code difficult to reuse and thus creates a bad object-oriented practice. However, both global variables and singletons, when accessed directly by the using code, create that same hard-coded binding, and thus share the same issue (which stems, in fact, from the need to have hard-coded access to global mutable state, regardless of how that state is represented).

    Then, design patterns are not a silver bullet which automatically enforces good design. Design patterns are bits of high-level vocabulary which allows one to describe a situation or problem using fewer words, which lets the brain focus on the relevant words instead of the details. Using a design pattern to describe something which does not fit that pattern, however, is akin to calling “a tool which is used to drive nails into planks” a “hammer”: you will be in trouble when you notice the “hammer” was in fact a “nailgun”.

    In short, using a singleton to replace a variable is a misuse of the design pattern, which violates the third prerequisite (the impossibility to have more than one instance).

  2. “I will use a singleton to correct a technical issue related to the use of global variables, even though that issue does not prevent the creation of multiple instances.” The technical issues often cited here are the resolution of the initialization order problem (making sure that a global will be initialized before another variable which needs to use it during its initialization) and the possibility of lazy creation (a global variable might exist in memory for a while even if the program never uses).

    The Singleton Design Pattern can solve these issues. However, the techniques it uses are orthogonal to the techniques used to restrict it to one instance only. Therefore, using a singleton to solve the problem and then removing the one-instance restriction (usually by making the constructor public) will result in a more accurate description, and is thus preferable to both singletons and global variables.

  3. “I will only ever need one instance of this class, therefore I will use a singleton.” This violates the third prerequisite, and possibly also the fourth. Again, a singleton represents the inability for an additional instance to be created. If your object could be safely instantiated twice, but the application you are writing only uses one instance, then adding code to enforce the uniqueness of the instance is both in contradiction with the nature of the concept you are representing (because it supports multiple instances) and completely useless (because the application never attempts to create two instances anyway).

    If the third prerequisite is not satisfied, there are plenty of alternative solutions to use to achieve your goal (the most elementary example being the implementation of a singleton minus the one-instance-only restriction).

  4. “I have made this a singleton, so that I don’t have to pass it to everywhere it is needed.” This is a violation of the first prerequisite. The only reason to turn some functionality into an object is because you wish to group the related state together in a manner which can then be manipulated by functions as a local variable, an argument or a return value.

    If your representation of the concept at hand does not involve any such manipulation, then a far better representation is to use a good old procedural interface for the concept. If your code could be rewritten with minimal to a procedural solution (by replacing the get-instance-and-call-method statements with call-procedure statements), then your code is implicitly procedural: correct object-oriented code is usually much harder to convert to procedural style. Therefore, since your code is procedural, it is both easier and more transparent to represent it using procedural means than by diverting object-oriented means from their intended purpose.

    A correct object-oriented use of the Singleton Design Pattern involves using the singleton class to implement an interface, and then pass the singleton to code which is applied to that interface. And so, not only does a singleton not avoid passing the instance around, but it requires doing so.

Advantages

The main advantage of the singleton is that it is a design pattern: a name which can be instantly associated by any remotely competent software engineer to a the elementary properties of “only one instance” and “globally accessible”. This is an exceedingly important notion: instead of spending precious time writing documentation which painstakingly describes the exact behavior of a given piece of code, being careful not to imply false properties and to be exhaustive in your enumeration of all cases, and also having the user read that entire documentation twice to make sure he understood it correctly, you can simply describe the class’ instantiation policy as “Singleton” and be done with it.

Sometimes, if the intended usage of a public library class is to use a single provided global instance (even if there is no technical reason to limit yourself to a single instance) provided by the library, then I would perhaps go as far as label that class a singleton, in order to forward the point of intended usage despite the violation of prerequisite three.

However, for a class that is to be used internally, and where the users of the code are also its developers, I would not consider mislabeling: conveying a meaning which is incorrect to an user is acceptable, because the user is not expected to know about the implementation (and, for all he knows, perhaps there is a technical limitation on the number of instances), whereas conveying an incorrect meaning to a fellow developer is not.

The special treatment provided to classes in several programming languages make singletons and monostates preferable alternatives to normal procedural modules. For instance, PHP5 provides autoloading support for classes (the file for a class is loaded when that class is used in the source instead of having to be loaded manually beforehand) which makes class-based implementations of modules better. Similarly, since ECMAScript provides no namespace functionality, using a single-instance global-access object as a namespace is a valid and somewhat frequent approach.

Conclusion

A singleton is an excellent tool for describing an unique and globally accessible concept, and there is a simple checklist to determine whether a concept qualifies for representation as a singleton. However, using the singleton design pattern to represent a concept which does not have the two required properties (for instance, by altering the concept by force to make it fit) generally causes more harm than good, by misrepresenting the actual concept being manipulated.

ACID Outside Databases

The ACID acronym stands for Atomicity-Consistency-Isolation-Durability. In the context of database management systems, these terms stand for the following essential properties:

  • Atomicity is the all-or-none stance on state operations: either a modification is completely applied, or it is not applied at all (for instance, if a failure of some kind prevented complete application), and never partially applied.
  • Consistency is the requirement that any constraints placed on the data are respected after an operation is completed. If an operation cannot be completed without violating a constraint, it fails (and if Atomicity is supported, then it is not applied at all).
  • Isolation indicates that any operations performed on the data are performed sequentially (that is, one after another without overlapping), so one operation does not observe the results of another unless that other operation has already been completed. Of course, optimizations are allowed to overlap operations as long as overlapping them maintains the illusion of sequentiality.
  • Durability indicates that once an operation has been completely applied, no failure may undo that operation (aside from utter and irreversible loss of the hard drive storing the data). For instance, if the PayPal database receives money from a payment and responds with “payment received”, then the payment and corresponding funds will never disappear from the database.

Feeling Safe

The basic ideas intersect the classic principles of object-oriented design: an object has a set of possible states, along with a list of messages that it can receive and respond to, based on its current state, by returning values or changing to a new state. In this analogy, a database is an object, its states are defined positively by its internal structure and negatively by the structural constraints, and the messages it receives are read and write queries. The ACID properties are implicit here:

  • A message cannot, by definition, leave an object in-between states: it always leads from an current state to a new state (or keep the same state).
  • An object cannot, by definition, have a state that is not valid, because all object states are designed to be valid (otherwise, they wouldn’t be states).
  • There are no concurrent accesses: messages are sent one at a time and are assumed to perform immediately, right before the next one is issues.
  • There is no notion of data loss: a theoretical object lives forever as a mathematical construct.

One can suspect that this is not because the theory was carefully designed so that it includes ACID properties without explicitly naming them, but rather because humans, when they think of algorithms, automatically assume that ACID properties are true. After all, it is much more complicated to provide solid algorithms using partial execution, inconsistent intermediary states, concurrent modification of shared data, and random loss of data: there are many more possibilities to take into account, and some guarantees (such as durability) are downright impossible.

The real world, however, is not safe. Programmers in exception-enabled imperative languages routinely forget to handle some corner cases where an exception interrupts execution without a rollback, and even without exceptions it is easy to forget part of a rollback in response to a failure. Programmers of all languages encounter human communication issues where the definition of what is consistent varies between two pieces of code treating the same data. The theory of concurrent computations is still in its infancy, and its industrial applications are still restricted to stone age mutex-and-semaphore techniques in most modern languages. And, of course, both hardware and software are subject to a myriad failures that can wipe out data or processes at any time. In order to do real work without infinite brainpower, it becomes necessary to create a protective wrapper between the real world and the implementation. Heroic developers and engineers of untold skill stand between the wrapper and the chaotic unpredictability of our universe, designing redundant hardware, transaction systems, constraint verification engines and optimal locking or lock-free synchronization strategies to implement the wrapper. Then, your run-of-the-mill applications developer can stay within that that protective shell and design new software with improved productivity.

Although the implementation of that wrapper varies, what it does is ACID.

Of course, there is no single legendary wrapper. Hardware provides one small and limited ACID layer limited to certain instructions and certain register and memory states. The operating system then provides an improved set of ACID primitives . The programming language semantics and standard library provide yet another. On top of that, applications provide an increasingly complex set of layers where the atomicity of your PayPal transaction is guaranteed by a web server, which has a transactional database to support its claim, which in turn uses operating system primitives to provide that support, and these in turn rely on the hardware as a fundamental provider of ACID properties. And because everything is built on top of failure-prone silicon mazes, every layer has a decreasing but nonzero chance of failure.

It is therefore extremely important for developers of any even remotely important application to make that application ACID-compliant, otherwise it will lack reliability or plain old user-friendliness.

ACID for code design

Whether a remote SOAP API or the plain interface of a class, ACID requirements are a huge competitive advantage once satisfied. In themselves, they are often easily implemented in terms of code (although they may in some cases create performance issues), but noticing that a certain piece of code may violate ACID requires a lot of concentration and experience.

Every modification of state is, potentially, an ACID violation, because most imperative languages do not support ACID at all (except through specific concurrency-oriented or persistence-oriented libraries). Writing a member function for an arbitrary object does not guarantee that two threads will not run that function at the same time, or that the function’s execution will not be interrupted by internal or external factors without rollback. When changing the contents of a file, the writing process may be interrupted at any time by a vast number of factors.

Transaction-based systems are the skeleton (or pattern) of most solutions to ACID problems: if one defines a transaction as the smallest unit of atomic behavior, moving the object from a state to another instantaneously and permanently, then transactions by default enforce the ACID requirements. The basic philosophy of transactions is to lock first, commit last:

  • When a transaction starts, it should lock every resource it can use. This can be done explicitly (using a lock mechanism) or implicitly (by remembering the values, comparing them right before committing, and doing everything all over again if they differ). This is required to ensure isolation.
  • A transaction does not change anything before it’s ready to commit. Should a transaction fail before committing, it releases its locks, cleans up any temporary work it did, and does nothing else.
  • Once the work is done (and is verified to satisfy the constraints), it is committed using an ACID primitive supported by the wrapper it is built upon, and its locks are released.

For instance, if a script has to replace a file on the server by a file sent by the user, a bad idea would be to open the server-side file for writing, and writing the user file there: any interruption of the script would leave the server-side file in a jumbled state. The correct solution is to write the user file to a different location on the server, then use an atomic “rename” command to replace one server-side file with another. In the general case, the process of “lock, create temporary, verify temporary, replace value with temporary” performs much better than directly modifying objects.

The ideal code contains no function which does not respect ACID (and, in this regard, pure functional code is ideal).

ACID for user interfaces

Your average web site user does not know what ACID is. However, the average web user cares about:

  • Not breaking some part of the website. When, I upload a file to the web site, I obviously expect the web site to remain online and keep on working, but I also expect the file to work perfectly. This is pretty elementary and most get it right. The next step is to ensure that the file itself works. While it works most of the time, in some cases of partially failed uploads (or partially completed profiles, or partially configured accounts, or partially changed passwords) the modified element ends up in an unacceptable state. The next step is to allow rollbacks of anything: a versioning system (or its little brother the “undo” button) are great additions to any user-driven site, and an absolute requirement to many.

    This not require full atomicity: for instance, uploading ten files may be considered as a non-atomic combination of ten atomic uploads. This way, if the upload of the tenth file fails, the nine others are already present and don’t have to be uploaded again.

  • Displaying valid, coherent and up-to-date information. Don’t display my password on someone else’s screen, don’t display my relationship status change for a split second if I perform both “End relationship” and “Don’t publish relationship stories” on Facebook in one go, and don’t display a Wikipedia page as a mix of two major revisions.

    This does not require full isolation: for instance, it doesn’t matter in the vast majority of cases that the “latest threads” section is a millisecond older than the “latest messages” section on a forum.

  • Not losing any information. When I submit data to be included, I want it included, not lost in some limbo. By extension, accidental overwriting should be cancellable and submitted data may be allowed to undergo moderation workflows instead of being committed right away

Although full ACID support is not required in many cases (one of the reasons why there are several transaction isolation levels in database management systems), it never hurts to guarantee ACID first, and optimize queries later.

Video Game Bricks

I don’t believe in game engines. I tend to consider software development as the process of using existing bricks together with glue code to obtain the expect result, whereas engines provide the final result with the holes to be filled in. This is not to say that game engines are useless—countless situations can benefit from them. However, my approach to game development has always been to rely on frameworks and collections of helper and utility classes, perhaps due to my beginnings as a DirectDraw programmer.

Objective Caml games do exist, even though they often lack the backing of a full video game studio. Whether used as a prototyping tool or to develop the final product, I think the functional approach can benefit video game development. To that end, I regularly develop small bricks that can be useful for game development. All of the source code presented here is my own work, placed in the public domain.

Vectors and Transforms

Bidimensional vectors with supporting elementary mathematical operators:

type vector = { x : float; y : float } 

val right : vector
val up : vector
val zero : vector 

val equal : vector -> vector -> bool 

val ( ++ ) : vector -> vector -> vector
val ( -- ) : vector -> vector -> vector
val ( *+ ) : float -> vector -> vector 

val sqlen : vector -> float
val len : vector -> float 

exception ZeroVector
val normalize : vector -> vector 

val dot : vector -> vector -> float

The operators behave as expected. Equality and normalization work with a fixed epsilon of 1e-6. The total memory usage for a vector is 20 bytes (16 for the coordinates, 4 for internal usage), and they are manipulated by reference. Intermediary operations on vectors tend to be inlined, and intermediary values are garbage-collected when present.

Bidimensional transforms and corresponding operations:

type transform 

val identity   : transform
val rotate     : transform -> float -> transform
val scale      : transform -> float -> transform
val translate  : transform -> Vector.vector -> transform 

type transform_description =
    {
      angle    : float;
      scale    : float;
      position : Vector.vector
    } 

val make       : transform_description -> transform
val describe   : transform -> transform_description 

val apply      : transform -> Vector.vector -> Vector.vector
val compose    : transform -> transform -> transform

This type represents similarities. The total memory usage is 36 bytes (32 for the coordinates, 4 for internal usage), and they are manipulated by reference.

Downloads: vector.ml vector.mli transform.ml transform.mli

Indexing

When using functional programming, the aliasing problem makes it impossible to have several data structures directly reference a certain value without having to update all data structures when that value changes. A solution to that problem is to store indirect references instead: the value is stored in a single data structure and bound to a key. The other data structures do not reference the value, but rather “whatever the key is bound to”. Meaning that changing the single data structure holding the value also implicitely updates the value for all other data structures.

An indexing data structure provides exactly that: a way of binding a value to a key and retrieving it with that key. If no backtracks happen, then the following implementation performs all relevant operations in amortized constant time.

type         'a t
type         handle
exception    UnboundHandle 

val empty  : int -> 'a t 

val add    : 'a -> 'a t -> 'a t * handle
val remove : handle -> 'a t -> 'a t
val get    : handle -> 'a t -> 'a 

val map    : ('a -> 'b) -> 'a t -> 'b t
val mapi   : (handle -> 'a -> 'b) -> 'a t -> 'b t
val iter   : ('a -> unit) -> 'a t -> unit
val iteri  : (handle -> 'a -> unit) -> 'a t -> unit
val fold   : (handle -> 'a -> 'b -> 'b) -> 'b -> 'a t -> 'b

Note that “UnboundHandle” is not necessarily raised. If it is, then you know what happened. However, attempting to access a removed handle is, in the general case, undefined, and may return another value.

You can also look for persistent arrays for generic data storage.

Downloads: assoc.ml assoc.mli

Segregation is Good

As a frequent reader of the gamedev.net game development forums, I often notice that most beginner and intermediate developers start designing the architecture of their game by deciding on a base class from which all game objects will inherit, and storing a list of these objects. The justifications given are usually that:

  • It’s Object Oriented.
  • Inheriting from a base class allows code reuse.

The most appealing advantage of the “base class everything inherits from” approach is that it provides the illusion of a solid design foundation (that is essential to the “getting things done” feeling that motivates many of us) without requiring a lot of thought. All that is to be done is to slap together a few functions into a base class (usually plagiarized from one of the many tutorials available on the web) and then start inheriting.

The Issues

The naive approach to the base class solution can be summarized as this:

interface BaseObject
{
  void render();
  void update();
}

In theory, this interface seems to do everything required: after all, every object will have to update its state, and every frame it shall render itself to the screen. In practice, however, this approach leads to two serious issues:

  • Not all objects are both updated and rendered. For instance, a script or a trigger cannot be rendered, but they still have to be regularly updated. The least that could be done here is to separate the rendering side of things from the update side.
  • Sometimes, updating requires access to other objects. This happens for instance when objects collide or when an AI looks for a target. However, interacting with other objects only happens through the base object interface, which provides no information about what the other object is. Even assuming that all objects have a collision hull as part of their public interface (which in itself also does not make any sense, since many objects do not collide with anything), determining that the other object in a collision is a friend or an enemy requires knowledge beyond any acceptable base interface.

While the former can easily be solved by using two base classes instead of one, the latter is much more problematic. It is usually solved through contraptions such as downcasting (which eliminates the benefits of the Liskov Substitution Principle) or the Visitor Design Pattern (which eliminates the benefits of the Open-Closed Principle in most modern static languages).

The benefits

But surely people wouldn’t be using base classes this way if there wasn’t a tangible benefit to using them, would they? Well-known games (including the original Half Life engine, which was mod-friendly enough to give birth to a vast number of mods) have used this approach through the ages.

Purty pictar of waterfall. There is a fundamental rule for interface implementation and class inheritance:

The finality of every interface and every inheritance relationship is to allow code to operate on different types of objects.

The sole benefit of using a base class for game objects is to store those objects in a single container and apply update and render functions over that container. Everything else in a typical game requires using a more specific type through visitation or downcasting.

When, then, is a single container useful? In the case of a game engine, a single container is useful because on the one hand, it’s preferable to have the engine handle the storage, lifetime and processing of game objects and on the other hand, designing a game engine to allow the addition of other containers (one for each type of object, for instance) is difficult.

As such, having everything inherit from a single type makes communications between the engine and the game much easier (only one type of object has to be carried through the interface), even though it might make the development of the actual game harder.

Of course, a container may allow multiple interfaces to exist: a container for renderable objects, a container for updated objects, a container for objects with collision detection, a container for objects with a physical model, a container for objects which are synchronized over the network and so on. However, the original problem remains: when two objects collide, the collision response depends on the exact type of the objects, which exists in game space but not in engine space. The engine is therefore by construction unable to provide the original most derived types back to the game, and so the game must retrieve them by other means (note that visitation is not possible, because it requires to encode the possible most derived types in the visitor, which itself is encoded in the interface of the visited object.

However, beginner and intermediate programmers don’t write engines (or at least, they shouldn’t, because the motive that drives studios to write engines is the necessity to distribute programming tasks across large teams with varying competence, and to keep the code for later projects), they write games. The source code of these games is worked on by only a small handful of programmers, usually only one, which allows a far greater level of control, especially for purposes of refactoring. As such, the necessity of having generic containers disappears, and the main benefit of a single base class with it.

Segregate types

My advice is pretty simple: strive to keep one list per type of object. In your average video game, you would have a list of projectiles flying around, a list of AI-controlled opponents, and a player-controlled character.

Do not let this approach hurt code reuse, however. If both the player character and the AI-controller opponents can take damage from walking in lava, you don’t want to write the damage assignment code twice. Instead, you can have both the AI-controller opponents and the player-controlled character implement the same interface with a single “takeDamage()” function, and apply the damage assignment code to both. You may even share the health/damage handling code between the player and the opponents by aggregating a “health” class, by inheriting from a “Character” class which manages health, or by applying a “WithHealth” mix-in:

class IDamageTaker
{ public: virtual void takeDamage(int); }; 

template <typename T>
class WithHealth : public IDamageTaker, public T
{
  int health;
public:
  void takeDamage(int d)       { health -= d; }
  void healDamage(int d)       { health += d; }
  bool alive()           const { return health > 0; }
}; 

class Vanilla
{ public: virtual ~Vanilla() {} }; 

class PlayerCharacter : public
  WithHealth<
  WithPosition<Vanilla> >
{ /* Player-related data here */ }; 

class AIOpponent : public
  WithHealth<
  WithPosition<
  WithTarget<Vanilla> > >
{ /* Opponent-related data here */ };

Now, the code for health management is not duplicated and the code for assigning damage is not duplicated. The only dupicated code is the traversal of the various lists (since instead of traversing a single list, now several lists must be traversed in order to reach all elements), and that code is not only very small, but also fairly easy to factor out (for instance, by implementing an iterator that traverses several containers in order).

On the other hand, because of the presence of several lists, there is no more need for determining whether a projectile has hit an opponent or the player: the type of the object is known simply because the currently processed list is also known.