Monthly Archive for December, 2008

Javascript Unit Testing

It has recently come to my attention that a new unit testing framework for Javascript had become available : FireUnit is a Firefox extension that integrates with Firebug to allow automatic testing. It includes features such as pretty-printing of test results in the Firebug console and simulation of user-side events.

I’m marginally skeptical about this. I can see where the thing can be useful: you have a standalone data model that should evolve according to certain rules when manipulated in certain ways. In those cases, writing an unit test happens like in any other programming language and everyone is happy.

But, like in other programming languages, whatever is related to user interaction (reading input, displaying things) is inevitably much harder to test:

  • it’s hard to determine whether a result is displayed correctly or if something’s off, it’s even harder to determine whether the result will be displayed correctly in other configurations (including, for instance, other browsers), and animations are hellish to check because of time-dependence,
  • third party frameworks often allow ways of changing states but not of querying them (for instance, jQuery allows setting a movement queue for animation, but querying the current movement queue is harder, and does not allow checking whether an element is shown, hidden or in-between),
  • many elementary errors rely on race conditions (press two buttons in short sequence while the server is busy) because of the lack of synchronization, and these are an order of magnitude harder to test than single-threaded behavior,
  • most of the AJAX interaction has a lot of special cases that take a while to be mocked, because of all the possible server response issues and delays,
  • like many user interfaces, JavaScript code on pages tends to change frequently as designers and usability experts change their minds about particular parts, making unit tests an investment harder to amortize

My opinion on this is: if your JavaScript codebase is mostly made up of libraries and isolated components that have a cleanly described API, you can test the “put this in, get this out” rules documented by the API using unit tests, and you will be happy. On the other hand, if your usage of JavaScript is one-shot dynamization of pages using adapted libraries (and I suspect the vast majority of JavaScript development occurs in these cases) then there isn’t really a lot to be tested: the test cases are very complex and would take a lot of time to be implemented to begin with, and might change the next day anyway.

Welcome Back

My Christmas Gift of 2004, from my parents, was the nicollet.net domain name and some lightweight hosting. The old nicollet.net server (well, actually, small part of a server) served me well over the years, with over 100k visitors:

Total Visitors 107,104
Total Pageviews 1,318,695
Total Hits 1,650,862
Total Bytes Transferred 14.54GB
Average Visitors Per Day 74
Average Pageviews Per Day 911
Average Hits Per Day 1,141
Average Bytes Transferred Per Day 10.30MB
Average Pageviews Per Visitor 12
Average Hits Per Visitor 15
Average Bytes Per Visitor 145,756
Average Length of Visit 177sec

Not really impressive over such a long period, but until August 2008 the site didn’t contain much beyond a few files I needed on other computers, some pictures and documents I wanted to share, and perhaps small snippets about development that I uploaded once every two months.

Then, on August 2008, the blog started. I had recently uploaded a fresh install of Joomla! for managing my files, and since it more-or-less handled blog-like layouts, I tried to write a blog. That was a mistake, of course: the default install of Joomla!, even more so the 1.5 version with a lot less support for external modules and plug-ins. The best free component for comments I could find was Chronocomments, which didn’t work for all my visitors, but did work for many spammers: starting from November, I received more spam on my website than through all my mailboxes combined, totaling around 900 spam comments (and only 8 legitimate comments).

Either way, through some advertising (mainly, by writing useful things on my blog and then linking to my blog from forum threads where those articles would answer the question being asked) I managed to get a reasonable number of visitors (around 60 daily non-bot visitors).

The visitors for december are only counted up to the 22nd (for some reason, the log analysis tool doesn’t want to show more than that.

What now?

As of today (December 30th), I’m moving on to a new server. The old one supported PHP4 and an old version of MySQL, and didn’t have much in the way of freedom (only FTP access was allowed). The new one is a good old dedicated server running Debian Etch 4, on which I’ve installed myself Apache2, PHP5 and MySQL 5. I have also moved from Joomla! to WordPress for my blog management software.

The transfer is not yet complete (I spent five hours yesterday converting all my blog articles from the old format to the new one) and I couldn’t transfer the comments. My e-mail address (firstname@nicollet.net) is down for the time being, because I have to install a new mail server myself and I won’t be able to get around it until the end of the week. I also have to re-upload my teaching material, which will also be done by next week.

The URLs are still a bit wonky (the DNS transfer was completed after WordPress was installed, so the server still calls itself by its old name, r17474.ovh.net) but should be ironed out soon. The URL naming scheme I decided to use in WordPress (nicollet.net/year/month/title) is different from the one in Joomla! (nicollet.net/blog/category/id-title) but I am in the process of setting up the appropriate redirections from the old site to the new one so that no articles are lost.

As far as the RSS feeds, however, I’m afraid they’ll be discontinued (partly because of the silly URL structure in Joomla! and partly because the structure of the new blog version itself has somewhat changed, and now includes articles independent of the current day, such as this one). Make sure you scroll to the bottom of the page and get the new global feed (or just click here).

Happy new year, and good reading!

Member and Non-Member Functions

The C++ language provides two flavors of functions: member functions and non-member functions. A function is something which is used like this:

result = function(arguments);

By contrast, a member function is used like this:

result = object.function(arguments);

Because of this, it’s impossible to use member functions and non-member functions in the same way: where a non-member function can be called on its own (provided that you have the arguments for it), a member function also requires an object on which it is called.

The Window Procedure Problem

This can be quite confusing to beginners, for instance when defining the window procedure in the Win32 API: this requires defining a non-member function with a certain set of arguments, and then providing a function pointer (referencing that function) to the API for registering it with a window. This usually looks like this:

LRESULT CALLBACK MyWndProc(HWND, UINT, WPARAM, LPARAM)
{
  // Implementation
}  

int APIENTRY WinMain(HINSTANCE, HINSTANCE, LPSTR, int)
{
    WNDCLASS wc;  

    wc.lpfnWndProc = (WNDPROC) MyWndProc;  

    // Register the other WNDCLASS properties 

    RegisterClass(&wc);
}

A question that is often asked on beginner forums is, how do I use a member function of my Window C++ class as a window procedure?

This is, of course, impossible. The main reason why it’s impossible is that it doesn’t make any sense: a window procedure, in the Win32 API, is a procedure that will be called on every window of that window class. By contrast, a user-defined Window C++ class tends to have a “one instance = one window” approach to things, so that even if the same function is called for every window, the object on which it is to be called is different, leading to the question of how those objects are specified to the Win32 API.

A New Hope

Of course, the user could add the constraint “one window = one Win32 window class”, which means that a given window procedure will only ever be called for a single window (the question of whether using a window class to create a single window is an abuse of window classes is best left alone). Then, common sense would imply that a member function could be used as a non-member function as long as the object on which it is called is provided:

class MyWindowClass
{
public:
  LRESULT CALLBACK WndProc(HWND, UINT, WPARAM, LPARAM)
  {
    // Implementation
  }
}; 

int APIENTRY WinMain(HINSTANCE, HINSTANCE, LPSTR, int)
{
    MyWindowClass windowInstance;
    WNDCLASS wc;  

    wc.lpfnWndProc = (WNDPROC) windowInstance.WndProc;  

    // Register the other WNDCLASS properties 

    RegisterClass(&wc);
}

The hope here is that C++ will somehow detect that “object.function” can be used in the form “result = object.function(arguments)” and therefore behaves just like a normal function. Of course, this doesn’t work: C++ doesn’t support closures.

Closures

By default, a C++ function pointer (the data type used to manipulate functions beyond simply calling them right away) is just a pointer to a function. This means that it can only point at a finite set of values, which happens to be the set of all available non-member functions (of the appropriate type) within your application. It does not carry any other values.

The problem is that “object.function” is not a non-member function defined within the application: it’s a member function being applied to an object. There is no way for the compiler to detect and define this function at compile-time automatically: you could have any number of objects created and destroyed within your application at runtime, and “object.function” will do a different thing for every one of these objects (because this is how member functions work, after all: behave differently on every object).

In short, there’s no workaround for the fact that “object.function” is the association of a member function (“function”) and a piece of data (“object”), and that no built-in C++ type can store both a function and a piece of data (this is commonly known as a closure, and C++ does not support these).

Of course, since closures are a fairly useful tools, a set of replacement tools have been provided: you still can’t use a normal function pointer (or member function pointer) to represent them, but you can use another variable, or another type.

The most elementary example was known from C. C APIs that let you define callbacks usually let you define callback data as well (unless they are brain-dead, which sometimes happens with C APIs). Callback data is the “data” part of the closure, while the callback itself is the “function” part of the closure. It all starts with a callback function that takes an additional “data” argument:

void myCallback(void *callbackData)
{
  MyObjectType &object = *reinterpret_cast<MyObjectType*>(callbackData);
  object.function();
} 

MyObjectType bobby;
hypotheticalAPI(myCallback,&bobby); 

MyObjectType timmy;
hypotheticalAPI(myCallback,&timmy);

Here, the ‘hypotheticalAPI’ function stores the callback and data together, then calls the callback function with the associated data as its first argument. This means that the “myCallback” function will call either “bobby.function()” or “timmy.function()” depending on which callback is activated.

This is actually the way the Win32 API is intended to work. The difference here is that you don’t pass another argument this way, but instead associate the callback data to the handle of the window. This works more or less like this:

class MyWindowClass
{
public:
  LRESULT CALLBACK WndProc(HWND, UINT, WPARAM, LPARAM)
  {
    // Implementation 
  }
};  

LRESULT CALLBACK MyWindowClassWndProc(HWND h, UINT i, WPARAM w, LPARAM l)
{
  MyWindowClass &object =
    *reinterpret_cast<MyWindowClass*>
      (GetWindowLongPtr(h, DWLP_USER)); 

  return object.WndProc(h,i,w,l);
}  

int APIENTRY WinMain(HINSTANCE, HINSTANCE, LPSTR, int)
{
    MyWindowClass windowInstance;
    WNDCLASS wc;   

    wc.lpfnWndProc = (WNDPROC) MyWindowClassWndProc;   

    // Register the other WNDCLASS properties  

    RegisterClass(&wc);  

    HWND h = CreateWindow( /* args */ ); 

    SetWindowLongPtr(h, DWLP_USER, reinterpret_cast<LONG_PTR>(&windowInstance));
}

The SetWindowLongPtr function is used to associate the address of our windowInstance object to the window handle. The window procedure we register then uses the GetWindowLongPtr function to retrieve the windowInstance object, and calls its member function with the correct arguments.

Functors

Aside from the old C trick, C++ also provides the user with the ability to overload “operator()”, which means the ability to have any object behave like a function. And since objects can carry arbitrary amounts of data, having them behave like closures is a very simple task. The most naive implementation of this happens in the standard library itself: instead of having the various algorithm functions (transform, accumulate etc) take function pointers, these functions take an arbitrary type as the operation to be applied. This way, any object with a correctly implemented “operator()” can be used:

struct multiplyBy
{
  int lhs;
  int operator(int rhs) const { return lhs * rhs; }
  multiplyBy(int lhs) : lhs(lhs);
}; 

std::transform(std::istream_iterator<int>(std::cin),
               std::istream_iterator<int>(),
               multiplyBy(10),
               std::ostream_iterator<int>(std::cout,"\n"));

The implementation is quite simple: instead of defining a function pointer argument, use an arbitrary type parameter:

template<typename F>
int apply_twice(int value, F f)
{
  return f( f(value) );
} 

apply_twice(5, multiplyBy(2));
// Output : 20

However, this requires compile-time resolving of the function: since the type of the functor is a template argument, the same function template cannot be made to accept, at runtime, both a normal “int (*)(int)” function and a “multiplyBy” instance.

Boost.Function

The obvious solution, then, would be to use runtime polymorphism provided by virtual functions: make a base “function” class where “operator()” is virtual, then create a derived class template which, given a functor (or normal function pointer) wraps around that functor so that it can be used wherever the base class can. The boost library actually goes a little bit further in this respect: it provides a clean way of specifying the type of the wrapped function, and handles the memory allocation of the underlying base class, so that the functions become value types:

int apply_twice(int value, boost::function<int(int)> f)
{
  return f( f(value) );
} 

boost::function<int(int)> multiplyBy(int i)
{
  return boost::bind(std::multiplies<int>, i);
} 

apply_twice(5, multiplyBy(2));

Boost.Function also handles member functions out of the box:

struct integer
{
  int value;
  integer(int value) : value(value) {}
  int add(int other) { return value + other; }
}; 

integer i = 3;
boost::function<int(int)> add3 = boost::bind(&integer::add, i); 

apply_twice(5, add3);

The documentation for version 1.37.0 is available here :

http://www.boost.org/doc/libs/1_37_0/doc/html/function.html
http://www.boost.org/doc/libs/1_37_0/libs/bind/bind.html

Was this information useful? Feel free to leave a comment below, or subscribe to my RSS feed!

Output Buffering

A program has an object, which it needs to convert to a string. On the one hand, converting to a string is often more efficient by appending the individual sub-strings to an output channel, because the output channel is optimized precisely for that (it allocates a large memory buffer which is gradually filled) whereas string concatenation cannot reuse the memory space of one operand nor over-allocate memory for it. On the other hand, writing the string to an output channel means there will be no means of retrieving that string, because it has been output and therefore left the program.

This has led, in many languages, to the development of a channel-like object that outputs to a string instead of sending out the data. Java has java.lang.StringBuilder, C++ has std::stringstream and .NET has System.IO.StringWriter. In this regard, PHP takes a quite original route.

PHP has always heavily relied on inline tags to construct output. This means that unlike Java,  C++ or .NET programs, a typical PHP script will be regularly interrupting its normal execution process to include some raw text that is immediately output back to the user:

Hello, <?=$_SERVER['REMOTE_ADDR']?>!

A keyboard under orange light.Where other languages usually include text as string literals (or text loaded from external configuration files) and output these by manually sending them to a stream:

std::cout << "Hello, "
          << getenv("REMOTE_ADDR")
          << "!";

Explicit output means that it’s possible to replace std::cout with a custom string stream (or a polymorphic ostream parameter) to create a string from the output instead of directly sending it to standard output. Implicit output means there’s no “I’m writing this to standard output” indicator to replace. Sure, PHP does support the standard “use literals and print-to-string” option, using the traditional echo, print and printf functions, and it’s possible to write a program without ever using a single line of inline HTML, but it’s not an encouraged practice: quite to the contrary, Zend_View actually relies on inline HTML with its phtml files.

The PHP solution is known as output buffering. Instead of allowing additional output channels in addition to the normal “standard output” global channel, PHP allows replacing the global channel with another. This is done by using ob_start(), and is ended by either ob_get_clean() or ob_end_flush(). The former returns whatever was output since the global channel was replaced as a string, the latter outputs it to the previously active global channel. Both then restore the previously active buffer, which makes it possible to create a stack of temporary replacements of the global output channel.

This has several interesting properties. The first is that now, it is preferable to print out any kind of data either using print/printf or using inline HTML, depending on the nature of what is being output, instead of manually concatenating a string, returning it through several layers of functions, and finally printing it at some point:

<?php 

ob_start();
foreach( $data as $row )
  printf("<li>%s</li>", $row );  
printf("<ul>%s</li>", ob_get_clean());

Output buffering can also be used to capture inline HTML, of course:

<? ob_start(); ?>Hello, <?=$_SERVER['REMOTE_ADDR']?>!<? print md5(ob_get_clean()); ?>

Another interesting consequence is that now, it’s possible to prevent any output from being sent before the headers have all ben generated.This is because, by default, PHP streams data to the user as soon as it becomes available, and the HTTP response headers are, by definition, to be sent before everything else, which means the first time PHP detects an output it will send all the headers registered so far, then ignore any further headers with a warning. By starting the very first script with an ob_start(), all output is buffered until ob_end_flush() is manually called, which allows all headers to be completed beforehand without risk of accidental sending.

Any function which usually prints out data directly (such as fpassthru()) can also be captured by this.

Last but not least, output buffering allows output filtering: an optional argument to ob_start can be a function used to process the data before it’s output: when the output is extracted (sent to the previous output, or as a string) the function is called on the data and the return value is used instead. So, the above example is equivalent to :

<? ob_start('md5'); ?>Hello, <?=$_SERVER['REMOTE_ADDR']?>!

Note that all buffers are flushed in order when the script ends, so there’s no risk of forgetting to flush a buffer and having no output.

More information about output buffering is available in the PHP manual.

And until next time, Joyeux Noël.

Destroy After Use

Exceptions make the control flow of a program quite complex, since any call could possibly create an exception and thus leave the currently executing function. Even with a garbage collector, certain resources (such as files) have to be manually released. Some languages use destructors and RAII (for instance, C++) to handle scope-based release, others use an explicit using(){} or finally {} block to also add a scope to such resources.

None of this exists in OCaml.

It can be rather easily reconstructed using existing language semantics, however:

let scoped user resource clean =
  try
    let result = user resource in
    clean resource ;
    result
  with exn ->
    clean resource ;
    raise exn 

let with_input name f =
  scoped f (open_in_bin name) close_in 

let with_output name f =
  scoped f (open_out_bin name) close_out

Agnosticism

It is a good idea (as far as code extension and maintenance) to separate responsibilities as much as possible, and to restrict use of shared data. Consider a situation where a video game developer is writing an Opponent object: that opponent should be able to shoot a Rocket. However, a rocket object on its own does not have a lot of purpose: it needs to be handled by the model code so that it moves around and explodes when it hits something, and it needs to reference some audio and video resources. The problem is that the opponent object cannot and should not have access to either the model code (quite the opposite: model code has access to the opponent instead) or a resource management entity able of providing rocket-related assets.

Of course, if the objective was to keep code as simple as possible, a quick solution is possible: make the resource management entity and the model code globally accessible, and let the opponent code access them. Even without going as far as globals, one could provide the opponent code with references to them. But the problem of separation is quite annoying, because now the opponent has more powers over the game than it should—if all that is required is to shoot rockets, then the opponent should only be provided with the ability to shoot rockets. This also makes the “shoot rocket” logic independent of the opponent (because it has to somehow be provided by the model  code) and thus more easily reusable.

In a pure object-oriented approach, there’s at least one easy answer: provide the opponent code with an instance implementing all the operations it ever needs to perform, including the “shoot rocket” operation. The opponent then calls the appropriate methods of that instance, and the instance is actually an adapter that forwards the calls to the appropriate sections of the model code.

There are two ways of eliminating the dependency of a module User on a module Used (though they may not be applicable in all cases):

  • Hide the model Used behind an interface IUsed and have User call methods of IUsed.
  • Have Used accept messages as a data type independent of both Used and User, have User return the necessary messages, and forward the returned messages to Used.

Monad-Based View Optimisation

There are two main execution models for web scripts. One model is to have the web server (apache, IIS) start a new instance of the script when a request for that script is found. The script then executes from beginning to end, performing its entire initialization, response and shutdown cycle. Caching may occur at both the bytecode level (to avoid parsing the source file several times) and explicitly in the code (querying a cache system for existing data). The other model is to have the web scripts run as a process in cooperation with the web server.

In this other model, initialization is performed only once, and the data is then kept in memory, which allows more in-memory caching opportunities. For example, views.

A view is a function which reads data and outputs a formatted document based on that data. Of course, language and design considerations might turn the view into an object or module, but the general concept of “data goes in, document comes out” remains constant in all situations. A “user profile” page would be a view for a social networking site, reading user data from the data model and writing it as a document.

A view helper is a function which reads data and outputs a formatted sub-document—the main difference is that instead of being sent to the user directly, the view helper is instead incorporated into a document. That is, view helpers are used to help views render the data. For instance, a “friend” view helper would read the data for a single friend and output it both on the “user profile” and the “view all friends” pages.

A simplistic illustration in Objective Caml (this language choice will be explained later on, although by reading the ‘monad’ word in the title you would have guessed that functional programming would be involved today):

let showFriend(friend) =
  print "<img src='/images/";
  print (User.id friend);
  print ".png'/><span>";
  print (User.name friend);
  print "</span>" 

let showUserPage(user) =
  print "<html><head><title>";
  print (User.name user);
  print "</title></head><body>";
  print "<h2>Friends (";
  print (string_of_int (List.length (User.friends user)));
  print ")</h2><ul>";
  List.iter
    (fun friend ->
      print "<li>";
      showFriend(friend);
      print "</li>")
    (User.friends user);
  print "</ul></body></html>"

This is, of course, very simple. It’s quite probable that a complex page could use hundreds of view helpers to further improve code reuse.

There’s a problem with this approach, however. If the logic is pushed to the extreme, a page will be composed of hundreds of function calls that each append a very small number of characters to the final document. This can reduce program performance because of multiple concatenation options.

This limitation is already present in the simple example above: instead of printing a friend as “<li><img src=’/images/”, “.png’/><span>” and “</span></li>”, it prints the individual “<li>” and “</li>” separated from the rest. The compiler can do nothing about this, because it has no reason to know that print “a”; print “b” is equivalent to print “ab”.

Monads

It’s possible, however, to provide the compiler with this information: monads allow the programmer to represent a function with side-effects as a non-function type, which can then be pre-processed when the script is initialized so that it is always used in an optimized state.

Let’s consider the following situation:

  • A view is something that reads data and outputs a document. It’s not a function, but it can be converted to such a function after an optimization process.
  • The function “show” creates a view which outputs the passed string: show “text” reads data and outputs “text”.
  • The function “call” creates a view which calls a function on the passed data and outputs the result: call func outputs func(data).
  • The operator “++” creates a view from two views. The two views are rendered with the same data one after another.
  • The function “foreach” creates a view which calls a function, then calls another function on every element in the returned data set, then concatenates the resulting views using “++”.

Then, the above code will look like this:

let showFriend =
     show "<img src='/images/"
  ++ call User.id
  ++ show ".png'/><span>"
  ++ call User.name
  ++ show "</span>" 

let showUserPage =
     show "<html><head><title>"
  ++ call User.name
  ++ show "</title></head><body>"
  ++ show "<h2>Friends ("
  ++ call (fun user -> string_of_int (List.length (User.friends user)))
  ++ show ")</h2><ul>"
  ++ foreach User.friends
     (   show "<li>"
      ++ showFriend
      ++ show "</li>" )
  ++ show "</ul></body></html>"

Notice how foreach looks idiomatic and quite similar to typical imperative languages, and how showFriend is used just like any other view. Now, both showFriend and showUserPage are values, not functions, so they are evaluated at initialization time instead of every time they are called. This makes it possible to optimize things as necessary once and for all.

A non-optimized version of the above would simply perform concatenation:

let (++)    f g x = f x ^ g x
let show      s x = s
let call      f x = f x
let foreach l f x = String.concat "" (List.map (l x) f)

An optimized version could use a complex type: a view is a list of things to print, and consecutive “show a constant string” instructions can be collapsed through concatenation at initialization time.

type instr =
  | S of string
  | C of (data -> string)
  | F of (data -> data list) * instr 

let (++) f g =
  List.fold_right (fun instr acc ->
    match instr, acc with
      | S a, S b::t -> S (a^b)::t
      | i  , a      -> i :: a) f g 

let show s = [C s]
let call f = [V f]
let foreach l f = [F (l,f)] 

let rec render v x =
  List.iter (function
    | S s     -> print s
    | C f     -> print (f x)
    | F (l,f) -> List.iter (render f) (l x)) v

Since the “++” concatenation function collapses consecutive constants, the two views above would look like this:

[S "<img src='/images";
 C User.id;
 S ".png'/><span>";
 C User.name;
 S "</span>"]
[S "<html><head><title>";
 C user.name;
 S "</title></head><body><h2>Friends (";
 C (fun user -> string_of_int (List.length (User.friends user));
 S ")</h2><ul>";
 F (User.friends, [S "<li><img src='/images";
                   C User.id;
                   S ".png'/><span>";
                   C User.name;
                   S "</span></li>"]);
 S "</ul></body></html>"]

So, the definition of the friends view has been inlined in the user view, and the constant “<li>” and “</li>” have been collapsed.

Looking for Minima

Annealing is a process to harden metals: by repeatedly heating and freezing metal, atoms within the metallic structure are dislodged and then moved back into place, which moves those atoms that were not part of a regular structure to more correct positions, resulting in a stronger metal piece.

The underlying process is that of energy reduction: an atom in a non-regular position within a crystalline lattice has a high potential energy, but is kept into place because it has reached a local energy minimum (so moving away requires more energy than it has). By imcreasing the available energy, the atom is allowed to move out of the local minimum, and so it ends up in a location with lower energy (with a reasonable probability, though this is by no means certain). By applying this process a high enough number of times, the global minimum can be reached.

Simulated Annealing

In computer science, that process can be applied to minimization techniques: instead of exploring the problem space completely, annealing is simulating by moving around randomly in that space. In order to move towards minima, every move which increases the value can be canceled with a certain probability that increases as time passes.

For example:

let annealing steps energy initial =
  let rec aux n best best_energy current =
    if n = 0 then best else
      let current = random_move current in
      let new_energy = energy current in
      if new_energy < best_energy then
        aux (n-1) current new_energy current
      else if Random.int steps < n then
        aux (n-1) best best_energy best
      else
        aux (n-1) best best_energy current
  in aux steps initial (energy initial) initial

This is by no means the only possible approach: other variations can involve checking whether the move is upwards since the last position (instead of the best position so far) or using a different probability approach for preventing up-moves.

Genetic Computing

A common problem found in simulated annealing and genetic algorithms is the implementation of solutions as programs: since these approaches require the ability to randomly mutate a solution, and a program is usually a fickle and weak construct that can easily die if one of its parts is forcibly removed, representing programs as randomly altered entities is difficult.

One possible solution is to consider stack based-languages: these languages have very light syntax requirements and can be represented mostly as a sequence of words. Besides, the only way for a program in these languages to die is to underflow the stack (which can be easily solved by stating that reading from an empty stack yields a reasonable value, such as zero or one. My concept was to use a language which reads a single integer variable ‘x’ and manipulates a stack ‘s’, and outputs a stack of numbers. These numbers are then multiplied together to yield the final result. The available instructions are:

let operations =
  [|
    "nop", (fun (s,_) -> s);
    "pop", (fun (s,_) -> !s) ;
    "dup", (fun (s,_) -> t s :: s);
    "div", (fun (s,_) -> (t s /. t !s) :: !!s);
    "two", (fun (s,_) -> 2. :: s);
    "mul", (fun (s,_) -> (t s *. t !s) :: !!s);
    "add", (fun (s,_) -> (t s +. t !s) :: !!s);
    "cos", (fun (s,_) -> cos (t s) :: !s);
    "sin", (fun (s,_) -> sin (t s) :: !s);
    "sub", (fun (s,_) -> (t s -. t !s) :: !!s);
    "min", (fun (s,_) -> min (t s) (t !s) :: !!s);
    "max", (fun (s,_) -> max (t s) (t !s) :: !!s);
    "avg", (fun (s,_) -> ((t s +. t !s) *. 0.5) :: !!s);
    "var", (fun (s,x) -> x :: s);
    "bra", (fun (s,_) -> (if t s > 0. then t !s else t !!s) :: !!!s);
    "zer", (fun (s,_) -> 0. :: s);
    "one", (fun (s,_) -> 1. :: s);
    "neg", (fun (s,_) -> -1. :: s);
    "opp", (fun (s,_) -> -. (t s) :: !s);
    "dp2", (fun (s,_) -> t s :: t s :: s);
  |]

I have altered the meaining of ‘!’ to represent the tail of the list (or, if the list is empty, the empty list), and defined ‘t’ as the top of the list (or 0. if the list is empty).

The evaluation algorithm is extremely straightforward:

let run p x =
  List.fold_left ( *. ) 1.
    (Array.fold_left (fun s (_,o) -> o (s,x)) [] p)

I then apply simulated annealing to such programs (using a program size of 10 instructions) for 10000 iterations. The energy function is simply the error between the value computed by the program and the expected value. For instance, with the function ‘fun x -> x’, simulated annealing gives me the program:

cos dup pop nop bra cos dp2 nop var nop

Which can be rewritten as:

let t = cos (if 0 > (cos 0) then 0 else 0) in t *. t *. t *. x

This is a fairly convoluted way of saying “x”. Another example, with the function ‘fun x -> x +. x *. x’:

max bra one dup nop max add var add var

The infix form here would be:

((if max 0 0 > 0 then 0 else 0) + (max 1 1) + x) * x

Which is just ‘(1+x)*x’.

Code Template Parameters

One cannot provide a function pointer as a template argument. This means that cases in which polymorphism is desirable for compile-time reasons (such as using one set of operations when testing and another set of operations when running the code), or where the behavior of code depends on a compile-time element and should be able to evolve in an open-closed fashion, cannot be handled by passing the correct function pointer.

Of course, a common trick in template metaprogramming is to replace entities which cannot be used as parameters (such as non-integer values) with types that have the entities as static members.

template <typename T>
void Frobnicate(int value) { T::Process(value); } 

struct Print
{ static void Process(int value) { std::cout << value; } }; 

struct Ignore
{ static void Process(int) {} }; 

Frobnicate<Print>(10);
Frobnicate<Ignore>(10);

As long as the structure that contains the function has external linkage (that is, it’s not defined inside a function) this will work and inject the correct kind of behavior into the function template at compile-time, possibly also performing any required inlining.

A typical application of this approach is, as mentioned earlier, unit testing: when working with rendering-related or input-related objects, it’s useful to replace their primitives with mocks that can test if the calls performed were those expected. This replacement can be done with runtime polymorphism (using an interface for the primitives, with a real class and a mock class boh implementing it) but leads to unnecessary performance loss (the overhead of runtime polymorphism that is not really required outside of the unit testing scenario). Using compile-time polymorphism based on templates, unit testing works, but the runtime code is still as optimised as if no unit testing ever happened.

Adobe Flash 10 : 3D in your browser

A short while ago, Adobe set loose the newest version of the Flash player, along with the development SDK.

Among the advertised benefits of this new player, there is the ability to use native support for drawing three-dimensional objects. Nothing on the scale of Papervision3D, though: the programming interface is extremely limited in its scope and, although a great step from what Flash supported natively in earlier versions, it’s still a long way from what desktop 3D programmers have grown accustomed to. We can only hope for existing engine developers (or new teams) to harness the low-level primitives of Flash Player 10 into high-level drawing primitives.

Graphics Processing Units

I suspect that many readers, especially those with a Flash/Flex background, are not familiar with the modern graphics pipeline found in the hardware of most reasonably recent computers, so a quick primer on the subject could be helpful.

The CPU in a typical computer is a general-purpose piece of hardware: it can handle 3D as well as it can handle spreadsheets or web browsing. This led to the creation of a new class of software which was specifically designed to perform computations related to 3D: it could only do these computations, but it did them darn fast. So, video games and CAO programs started delegating some processing to this hardware, which slowly evolved over time into our modern GPU hardware.

Early on, the GPU could only handle elementary functions such as projecting 3D triangles on a 2D screen, but quickly gained new features, such as mapping textures on triangles or handling lights (this is what the TnL revolution was: Texture and Lighting the availability of Transform and Lighting in hardware), and more recently the ability to apply an arbitrary amount of operations independently to every pixel drawn on the screen. Today’s GPUs are massive beasts, able to handle thousands of pixels in parallel where a CPU could only handle one or two.

CPU and GPU architectures How does it work?

The CPU collects or creates the data that should be displayed. This is generally composed of 3D data (triangles that are to be drawn) and 2D data (textures to be mapped onto triangles). Most of the time, these are sent to the GPU during an initialization step and stored in the GPU memory. Sometimes, new data is computed at runtime and is sent asynchronously (that is, the GPU receives the new data while drawing the old data, so that no time is wasted).

Once the required data is present on the GPU, the CPU sends rendering instructions. These instructions are usually quite small (“Draw object #3 at position (x,y,z) using textures #5 and #7″) but result in massive amounts of work (Object #3 contains thousands of triangles and fills millions of pixels on the screen) so that the majority of data transfers happen between the GPU local memory and the GPU local processor using high-speed connections on the video card.

The result is that the CPU can send a few dozen cheap rendering instructions, then handle whatever it needs to handle (gathering user input, performing physics simulations, handling the network connections), while the the GPU toils away at running the provided instructions on thousands of polygons and millions of pixels with amazing performance.

A common techique to improve performance is batching. Once the multiprocessors start working on a new instruction, they work fast, but starting a new task takes some time. As such, drawing 600 polygons in one instruction is significantly slower than drawing 300 polygons twice with two instructions. Batching consists in grouping as many polygons as possible in a single batch that can be drawn with a single instruction, and is the backbone of performance improvement.

The exact details of what happens on the GPU has changed over time, with the addition of textures, lighting, instancing, bump-mapping, arbitrary pixel shaders and vertex shaders, 64-bit numbers, integer support, the unified shader model of DirectX 10, and a whole lot of other esoteric names for esoteric features. The core processes delegated to the GPU, however, have not really changed:

  • Vertex operations involve gathering 3D data from various sources (usually, local memory that has been initialized with data from the CPU) and then transforming it according to certain sets of rules, so that it ends up as a 2D shape projected onto the screen.Various additional operations at this stage involve:
    • Lighting: for every point of the 3D shape, the illumination is computed and stored.
    • Instancing: creating many copies of a single piece of 3D data, such as soldiers in an army or trees in a forest.
    • Tessellation: transforming non-polygonal curved surfaces into polygons.
  • Pixel operations involve filling the projected 3D data with on-screen pixels, using predefined rules for deciding what the color will be or whether the pixel is to be drawn.A large number of additional operations at this stage can be performed:
    • Blending: instead of replacing existing color on the screen, a mix is computed to achieve transparency or illumination effects.
    • Textures: data for every pixel is collected from one or more textures.
    • Lighting: using the illumination stored in the 3D shape, as well as other sources (normal maps, bump maps) illumination is computed for every pixel.
    • Depth: if the pixel being drawn is behind the pixel present at the same location on the screen, it’s not drawn.
    • Stencils: using a screen-sized boolean texture, rendering can be limited to only certain areas.

The result of these operations can be displayed or kept in an off-screen buffer for use as a source of 2D data later on.

Two steps in GPU processing

Back to Flash Player 10

Flash Player 10 allows the programmer to control the pixel operations level. That is, the programmer provides the 2D data and the Shapes, and the Flash player performs the required pixel operations and outputs the resulting pixels to a Graphics object. The function to perform this is the raw Graphics.drawTriangles(), or the combination of Graphics.drawGraphicsData() and the GraphicsTrianglePath class.

I see two problems with this:

There is no Depth Buffering

This is the biggest problem, in my opinion: drawing projected data does not handle the depth of pixels at all, which makes it impossible to tell whether a certain triangle stands in front of another. As a consequence, the Graphics.drawTriangles() function assumes that the triangles it is asked to draw are ordered from the furthest to the nearest using the Painter’s Algorithm. Since triangles have no reason to be in the correct order to begin with, the programmer is forced to sort triangles by hand. This has two obvious performance-related consequences:

  • No caching of the polygon data on the GPU is possible: since the order of polygons might change from one frame to the next, the CPU-GPU transfer lines are used to send the polygon data every frame, which decreases performance.Even worse, general batching is impossible: if polygons from two batches are interleaved, then the batches must be split to respect the order of the polygons. This means that instead of sending 600-polygon batches to the GPU like a video game would, the Flash Player will be sending a whole lot of 2-polygon or 4-polygon batches instead.

    Of course, smart programmers might still achieve batching as a custom trick in specific situations, but common rendering techniques which automatically achieve high amounts of batching tend to rely on depth buffering to work, and as such might not be carried over as easily.

  • Instead of being done on the GPU (often in a highly optimized fashion), sorting has to happen on the CPU. Polygon-based sorting is either incorrect (sorting polygon centers might result in overlap problems with polygons that are not facing the eye), or complex (using a BSP tree to determine optimal rendering order). Not to mention the problem of overlapping or crossing polygons in the Painter’s Algorithm which can only be solved by slicing the inconvenient polygons beforehand.

Vertex operations remain on the CPU

In order to know where to render things, one needs to project 3D shapes onto a 2D screen. This task is traditionally performed by the GPU in classic desktop applications, but has to be done on the CPU in Flash 10. This is usually done by constructing a view-and-projection matrix, and then using Matrix3D.transformVector() to transform vectors individually, then manually extracting the x and y coordinates of the resulting vectors and plugging them in an array that is compatible with Graphics.drawTriangles().

Resist the temptation of using Matrix3D.transformVectors() to transform an array of vectors in one shot: while this may result in higher performance, it is not equivalent: if your matrix incorporates projection (which is required for projecting 3D shapes on 2D screens) that projection will not be applied, only the affine transform will be. Besides, you will still have to manually extract the x and y coordinates.

My choice was to use Matrix3D.transformVectors() to apply the entire affine transform, then applying the projection manually using arithmetic operators:

matrix.transformVectors(triangles, transformed); 

for (var i : int = 0; i < vertices; ++i)
{
    var z : Number = transformed[3*i+2];
    projected[2*i]   = ((transformed[3*i] / z) + 1) * width;
    projected[2*i+1] = ((transformed[3*i+1] / z) + 1) * height;
}

This choice was dictated by another fun fact: unlike OpenGL or Direct3D, there is no simple and obvious way of creating a projection matrix in Flash 10. In fact, when I tried to use Matrix3D.rawData to create a projection matrix from its mathematical definition, the program complained that the matrix was not invertible (which is often the point of a projection matrix).

A step forward

Reading my above disappointments, it would be easy to conclude that Flash Player 10 is a worthless technology with only marginal improvements over previous versions and limited lip service to the ressurection of browser-based 3D. This is not the case, and there are massive benefits to Flash Player 10. Even if there is still no batching or GPU-based geometry transforming, and even if the user has to sort polygons by hand, there is a high-performance way to map a texture using texture coordinates, which should definitely improve the performance of several existing 3D engines from the Flash Player 9 era within a matter of weeks. An active engine undergoing continued development efforts, such as Papervision3D, should be able to integrate the new features quite easily.

Besides, Flash 10 is not aimed at rivaling high-end 3D video games (or at least, I hope it wasn’t) but rather to apply simple 3D effects to DisplayObjects, thereby leveraging existing 2D codebases in a fresh new direction. And this, it does quite well: sorting works quite simply (as long as you have no overlaps), transforming is handled internally using matrices, and so on.

Still, a better interface could have been chosen to support general 3D. I could understand that depth buffering cannot work on every possible piece of phone hardware, but sacrificing what is a de facto standard on desktop and laptop computers seems strange. The same goes for vertex and index buffers (sent to the GPU). These can be trivially emulated in software, yet provide tremendous performance benefits by lowering the transfers from CPU to GPU.

In conclusion, don’t expect to write the next Crysis or Half-Life 2 using Flash Player 10, but the technology should be able to equal older 3D game graphics without too much difficulty (Crash Bandicoot, The Sims, Ragnarok Online, Second Life) and enhancing the aspect of existing Flex applications with 3D effects should be a breeze.



1150 feed subscribers
(readers who polled a feed this week)