Tag Archive for 'Architecture'

Shouldn’t Happen…

Design and development is turning the great unknown chaos into tiny bits of controlled functionality with promises about what the result will be, and expectations about what the input should be.

There is an interesting duality between two categories of expectations, depending on whether they are the responsibility of the user, or of the programmer.

User errors are classic mistakes involving incorrect input, such as attempting to load a file that does not have the right format, or visiting a web site that does not exist, or entering an incorrect email address. A program is expected to, at the very least, gracefully handle these situations (because nobody likes errors) and the best programs are actively designed to reduce the possibility of error through appropriate user interface choices.

Programmer errors are the most frequent ones, but most of them are luckily caught by a compiler (or, in the case of the less lucky interpreted languages, the parser). The basic idea is that if you expect a function parameter to be an integer, and you tell your compiler, then static analysis will determine whether you might receive a string argument instead, in which case the universe will collapse (or, more precisely, the build will fail).

Static Analysis

Static analysis can be very smart. It can prove beyond any doubt complex properties about complex software written in obscenely low-level code (such as C with inline assembly). The problem is that working with a static analysis tool can add unusual constraints on the developers themselves: the halting problem dictates that no tool can exactly predict the behavior of every program, so any given tool will either have false negatives (undetected bugs) or false positives (safe code reported as dangerous) and the general trend for static analysis tools is to avoid any false negatives at the cost of false positives.

The quality of a static analysis tool is determined by how hard it is to write code without false positives (usually done by manually coding around the blind spots of the tool).

Static analysis tools have two problems. One, they’re not available for every single language and platform out there. Some of us are still using languages with eval(), throwing Java exception-safety out the window because we find it too constraining, doing without those pesky type systems and generally making a childish fuss about those “warning” thingies. Two, static analysis tools can only check constraints that are described by the developer in some form, such as assertions, preconditions, postconditions, type annotations or some other kind of attribute added to the code.

So, if you forget to “assert” it, nobody is going to check it for you. For instance, no tool is going to warn you that you unwittingly leak a credit card number to a third party.
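As a rough illustration in JavaScript (a language with no static checker to lean on), an expectation only gets verified if someone writes it down. The names `precondition` and `setUsername` below are made up for this sketch and mirror no particular library.

```javascript
// Hypothetical helper: fails loudly when a programmer expectation is broken.
function precondition(condition, message) {
  if (!condition) throw new Error('Precondition failed: ' + message);
}

// Hypothetical function whose expectations are stated explicitly.
function setUsername(user, username) {
  // A caller that breaks these has a bug: this is a programmer error.
  precondition(typeof username === 'string', 'username must be a string');
  precondition(username.length > 0, 'username must not be empty');
  user.name = username;
  return user;
}
```

Anything not listed as a precondition (such as "this value must never be sent to a third party") stays unchecked, which is the point made above.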

The Elephant Statue

In a sense, predicting user errors is the mirror activity of gathering specifications. Both force you to think about all possible situations your software will face, and decide what should happen: maybe you have to display an error, maybe you will have to treat the input in a clever but predictable way, or maybe you will have to rework your process to prevent that situation from happening.

This is akin to creating an elephant statue by starting with a block of stone and carving out everything but the elephant. Deciding what your users can do implicitly defines what your users cannot do. Depending on the situation, you may guide your design with either approach.

Interest(ing) rates

The most common way of investing money is putting it in a savings account. You lend a fixed amount of money to someone, and they pay interest over that money at a predetermined rate. Let’s say you lend 1,000 € at an interest rate of 3%, paid every year: at the end of the year, you would receive 30 € as payment for your lending. You would spend these on fine wine or nice clothes and wait until the next year to get another 30 €, and so on.

Savings accounts work on the basis of simple interest : what you get paid is a linear function of both time and money. Lend for half a year? 3% ÷ 2 = 1.5% Lend for two years? 3% ×2 = 6%

An important thing to bear in mind is that interest is paid at fixed intervals, for instance at the beginning of January. You don’t have to spend those 30 € : you can leave them in the savings account and earn simple interest on them after a year (3% of 30 € is 0.90 €).

Using this strategy, lending for two years is done at a 6.09% rate instead of 6%, because you get interest on interest. This is known as compound interest : what you get paid is an exponential function of time. Lend for two years ? (+3%)² = +6.09% Lend for three years ? (+3%)³ = +9.27%

The mathematical justification is that, with a 3% interest, your total amount of money is multiplied by 1.03 every year:

1,000 + 30 = 1,000 + 3% of 1,000 = 1,000 + 0.03 × 1,000 = 1.03 × 1,000

So, after two years, the amount is multiplied by 1.03 two times, and so on.

1,060.90 = 1.03 × 1,030 = 1.03 × 1.03 × 1,000

In short, percentages have a multiplicative effect.
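The two ways of earning interest can be compared with a small sketch. The function names are mine, and rates are written as fractions (0.03 for 3%), which is an assumption of this example rather than a convention from the text.

```javascript
// Simple interest: the interest is spent every year, so growth is linear in time.
function simpleInterest(principal, rate, years) {
  return principal * (1 + rate * years);
}

// Compound interest: the interest stays on the account, so the total is
// multiplied by (1 + rate) once per year.
function compoundInterest(principal, rate, years) {
  return principal * Math.pow(1 + rate, years);
}
```

With 1,000 € at 3% over two years, the first function gives about 1,060 € and the second about 1,060.90 €, matching the figures above.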

And now, pop quiz : I’ve gained +5% weight over the winter holidays. What percentage of my weight do I have to lose to be back to normal ?

If you answered -5%, you missed the point. Multiplicative effect means the total change of weight would be +5% × -5% = 1.05 × 0.95 = 0.9975 = -0.25%. I would be losing too much weight !

The correct answer was 1 ÷ 1.05 = -4.76%.

Similarly, if the number of graduates of a given school increases by +10% on year one and +25% on year two, the total increase is +37.5% and not +35%.
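Both the pop quiz and the graduates example boil down to multiplying factors. Here is a minimal sketch, with function names of my own choosing and changes expressed in percent (5 means +5%):

```javascript
// Combine successive percentage changes by multiplying their factors.
function combinePercent(changes) {
  var factor = 1;
  for (var i = 0; i < changes.length; i++) {
    factor *= 1 + changes[i] / 100;
  }
  return (factor - 1) * 100;
}

// Percentage change that exactly cancels a previous change.
function undoPercent(change) {
  return (1 / (1 + change / 100) - 1) * 100;
}
```

Up to floating-point rounding, `combinePercent([5, -5])` is -0.25, `undoPercent(5)` is about -4.76 and `combinePercent([10, 25])` is 37.5.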

Duality

This is where mathematicians (and computer scientists) use an interesting little concept called duality. Percentages are numbers that are easy to understand, but hard to combine. We can transform them into something that is a little bit harder to understand, but easier to combine.

The traditional way to transform multiplication into addition is to take logarithms, which works thanks to an interesting property of the exponential function:

exp(a) ×exp(b) = exp(a + b)

So, I wish to find a percentage operator (§) such that:

  • we conserve some values, 0§ = 0% and 100§ = 100%
  • applying A§, then B§, is equivalent to applying (A+B)§

Then this uniquely defines an operator which is called exponential percentage:

A§ = B%  ↔  A = 100 × log(1 + B ÷ 100) ÷ log(2)

Some common values:

   0% =    0§     +100% = +100§     -100% = -∞§        200% = 158.4§
  +1% = +1.4§      +99% = +99.2§      -1% = -1.4§      -99% = -664§
 +10% = +13.7§     +90% = +92.6§     -10% = -15.2§     -90% = -332§
 +25% = +32.2§     +75% = +80.7§     +50% = +58.4§     -50% = -100§


So, if I gained +5§ weight over the holidays, I can lose -5§ weight and be back to where I started, and if a number increases by 10§, then by 25§, it increases by 35§ overall.

And of course, a yearly interest rate of 4.2§ = 3% compounded over ten years is 42§ = 34%.
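The conversion formula given earlier translates directly into code. The names `toExp` and `fromExp` are invented for this sketch:

```javascript
// Normal percentage (B%) to exponential percentage (A§):
// A = 100 × log(1 + B ÷ 100) ÷ log(2)
function toExp(percent) {
  return 100 * Math.log(1 + percent / 100) / Math.LN2;
}

// Exponential percentage (A§) back to a normal percentage (B%).
function fromExp(exp) {
  return 100 * (Math.pow(2, exp / 100) - 1);
}

// Compounding is now a plain sum: +10% then +25%.
var combined = fromExp(toExp(10) + toExp(25));
```

Here `combined` comes out at 37.5 (up to floating-point rounding), the same total increase as in the graduates example.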

No Free Lunch

Normal percentage rules make compounding hard, but it’s reasonably easy to estimate a percentage based on a fraction. Exponential percentage rules make compounding easy, but evaluating a percentage based on real figures is harder.

In practice, compounding happens less often than evaluating, so humans use normal percentage rules. And computers are good at compounding through multiplication, so they don’t need exponentiation.

Duality does have some other uses, though. For instance, there’s the duality between two representations of complex numbers:

a + ib = r exp iθ

The cartesian (a,b) notation makes it easier to add numbers, but multiplication is harder:

(a + ib) + (c + id) = (a+c) + i(b+d)

The polar (r,θ) notation makes it easier to multiply numbers, but addition is harder:

r exp iθ × s exp iφ = (r × s) exp i(θ+φ)
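To make the trade-off concrete, here is a small sketch of both representations. Complex numbers are plain records, `{a, b}` for cartesian and `{r, theta}` for polar; these shapes and the function names are assumptions of the example.

```javascript
// Conversions between the two dual representations.
function toPolar(z) {
  return { r: Math.sqrt(z.a * z.a + z.b * z.b), theta: Math.atan2(z.b, z.a) };
}
function toCartesian(p) {
  return { a: p.r * Math.cos(p.theta), b: p.r * Math.sin(p.theta) };
}

// Addition is trivial in cartesian form.
function addCartesian(z, w) {
  return { a: z.a + w.a, b: z.b + w.b };
}

// Multiplication is trivial in polar form.
function multiplyPolar(p, q) {
  return { r: p.r * q.r, theta: p.theta + q.theta };
}

// i × i, computed by going through the polar side.
var i = toPolar({ a: 0, b: 1 });
var iSquared = toCartesian(multiplyPolar(i, i));
```

Up to rounding, `iSquared` is `{a: -1, b: 0}`.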

For mathematically-oriented computer scientists, duality is a gold mine, because it lets one reduce a complex problem in one area to a simpler problem in another area (whether simpler means faster, as in the case of FFT, or easier to think about).

The Law of DSLs

There’s one common duality that is fundamental in the computer world: the correspondence between data and code. In a fit of narcissism, let me sit wisely atop a tall mountain to announce Nicollet’s Law of Domain Specific Languages:

Any sufficiently complex data processing algorithm is an interpreter for a small domain-specific language, and the data being processed is a program executed by the interpreter.

In some cases, this law only complicates things further. In many cases, however, the different angle it provides leads to many advantages, one of them being to transform a non-programming concept (such as an accounting file format) into a concept programmers are familiar with (a programming language).

A minimalist language design culture is enough to grasp several interesting concepts about executing code, which can be quite handy when processing data:

1. Compile to Bytecode

Interpreters don’t execute a string of characters. They tokenize that string, turn the tokens into an abstract syntax tree representing operations, functions and variables, then turn that syntax tree into a sequence of small, executable operations. That sequence is then fed into a virtual machine (or further compiled to machine code) to perform the actual operations.

If the input data for your algorithm is very complex, you can begin on the other side: what will the algorithm do with the data? Will it be inserting the data into a database? Constructing a data object from bits and pieces? What you are looking for is a set of atomic operations you can apply to generate the result. Implement these operations, then start working on a translation algorithm to turn the input data into such operations.

There are several common and friendly representations for such atomic bytecode:

Instruction lists are executed in order. This is your classic assembler listing, without the jumps. A typical “parse file and insert into database” algorithm would generate such an instruction list, and every instruction would be an INSERT, DELETE or UPDATE. Works best when you can read the data and generate the instructions in the right order: if you cannot get the list in the right order from the start, consider another approach.
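A minimal sketch of the instruction list approach follows. The input format (one "op,id,name" record per line) is invented for the example, and a plain object stands in for the database table.

```javascript
// "Compile" the raw text into a list of atomic instructions.
function compile(text) {
  var instructions = [];
  var lines = text.split('\n');
  for (var i = 0; i < lines.length; i++) {
    var line = lines[i].trim();
    if (line === '') continue; // skip blank lines
    var fields = line.split(',');
    instructions.push({ op: fields[0].toUpperCase(), id: fields[1], name: fields[2] });
  }
  return instructions;
}

// "Virtual machine": run the instructions in order against the table.
function execute(instructions, table) {
  for (var i = 0; i < instructions.length; i++) {
    var ins = instructions[i];
    if (ins.op === 'INSERT' || ins.op === 'UPDATE') table[ins.id] = ins.name;
    else if (ins.op === 'DELETE') delete table[ins.id];
  }
  return table;
}

var table = execute(compile('insert,1,Alice\ninsert,2,Bob\nupdate,1,Carol\ndelete,2'), {});
```

After this runs, `table` holds a single entry mapping `1` to `Carol`.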

Dependency graphs work like makefiles: you have several instruction lists floating around with relationships between them, indicating that one list has to be executed before another. A topological sort of the graph results in a single classic instruction list you can execute. A multi-file import, where some files contain data needed in other files, can be the way to go.
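The sorting step can be sketched as a depth-first topological sort. The shape of `deps` (each name mapped to the names that must run before it) and the file names used below are assumptions made for the example.

```javascript
// Returns the names in an order where every dependency comes first.
function topologicalSort(deps) {
  var order = [];
  var state = {}; // undefined = unseen, 1 = being visited, 2 = done

  function visit(name) {
    if (state[name] === 2) return;
    if (state[name] === 1) throw new Error('Cyclic dependency on ' + name);
    state[name] = 1;
    var before = deps[name] || [];
    for (var i = 0; i < before.length; i++) visit(before[i]);
    state[name] = 2;
    order.push(name);
  }

  for (var name in deps) {
    if (Object.prototype.hasOwnProperty.call(deps, name)) visit(name);
  }
  return order;
}

// Hypothetical multi-file import: orders need products and customers.
var order = topologicalSort({
  orders: ['products', 'customers'],
  products: ['categories'],
  categories: [],
  customers: []
});
```

Each entry of `order` would then be executed as its own instruction list, one after the other.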

Nested scopes are the typical extension to instruction lists: every item in a list can be either an instruction, or another list, possibly tagged with some data. This could be a conditional (if this condition is true, execute this list), a loop (though it is best to avoid these) or a context (a “polygon” scope contains “insert vertex” operations that apply to that polygon). You can even allow variables in a let-in fashion (of which the polygon example above is just a special case) ! Note that nested scopes can be easily represented as XML.
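Here is one possible shape for nested scopes, using the polygon example. The item format (`scope`, `tag`, `body`, `op`) is invented for this sketch; a real design would pick whatever fits its data.

```javascript
// Each item is either an instruction {op, ...} or a scope {scope, tag, body}.
function run(items, context, out) {
  for (var i = 0; i < items.length; i++) {
    var item = items[i];
    if (item.scope === 'polygon') {
      // Context scope: instructions in the body apply to this polygon.
      var polygon = { name: item.tag, vertices: [] };
      out.push(polygon);
      run(item.body, polygon, out);
    } else if (item.scope === 'if') {
      // Conditional scope: the body runs only when the tag is truthy.
      if (item.tag) run(item.body, context, out);
    } else if (item.op === 'insertVertex') {
      if (!context) throw new Error('insertVertex outside of a polygon scope');
      context.vertices.push([item.x, item.y]);
    } else {
      throw new Error('Unknown item');
    }
  }
  return out;
}

var program = [
  { scope: 'polygon', tag: 'triangle', body: [
    { op: 'insertVertex', x: 0, y: 0 },
    { op: 'insertVertex', x: 1, y: 0 },
    { scope: 'if', tag: false, body: [{ op: 'insertVertex', x: 9, y: 9 }] },
    { op: 'insertVertex', x: 0, y: 1 }
  ] }
];
var polygons = run(program, null, []);
```

The result is a single polygon named `triangle` with three vertices, since the conditional scope is skipped.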

2. Static Analysis

A side-effect of compiling to bytecode is that you get to process the entire file before you actually perform the intended operations. This makes a rollback easier if you notice that there’s an error on the last line of the file: if you make sure that no atomic operation in your target language can fail due to bad input (such as incorrect data values), then you can check your input data for correctness without doing anything to your program state.
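The "check everything, then execute" idea might look like the sketch below. The instruction format and the validation rules are assumptions made for the example; the table is a plain object standing in for real program state.

```javascript
var KNOWN_OPS = { INSERT: true, UPDATE: true, DELETE: true };

// Pass 1: inspect every instruction without touching any state.
function validate(instructions) {
  var errors = [];
  for (var i = 0; i < instructions.length; i++) {
    var ins = instructions[i];
    if (!KNOWN_OPS.hasOwnProperty(ins.op)) {
      errors.push('line ' + (i + 1) + ': unknown operation ' + ins.op);
    } else if (typeof ins.id !== 'number') {
      errors.push('line ' + (i + 1) + ': id must be a number');
    }
  }
  return errors;
}

// Pass 2: runs only when pass 1 found nothing, so there is nothing to roll back.
function importAll(instructions, table) {
  var errors = validate(instructions);
  if (errors.length > 0) return errors; // table is left untouched
  for (var i = 0; i < instructions.length; i++) {
    var ins = instructions[i];
    if (ins.op === 'DELETE') delete table[ins.id];
    else table[ins.id] = ins.value;
  }
  return errors;
}
```

An error on the last line is reported before the first instruction has had any effect.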

Even better, if your compilation process is cheap (linearly traverse a file for parsing) and you have heuristics for predicting how much time and resources your individual instructions require, then you can try to accurately predict the needs of the entire process.

Static analysis also means you can optimize. If, for instance, you’re inserting data into a database and need to resolve names or keys frequently (such as “add this item to list #732”), you can easily construct a table of needed keys (that you can get in one query when the processing starts) using the dependency graph approach. You can also optimize resource allocation by using common register allocation techniques: sort your dependency graph to keep as few resources in memory as possible at any given time.

3. Caching

Try to perform most of the processing offline.

For instance, if you frequently “apply” one file to another, such as a nearly-constant “list of categories” file used to resolve the “category” key in a daily object import, you can benefit from compiling the nearly-constant file to an easily loaded, easily applied format.

You see a cached dictionary that maps keys to categories? I see a DSL that allows dictionary literals as part of the language, and a source file that contains a literal mapping keys to categories, with an interpreter that can apply constant propagation to dictionaries.

Another benefit is when applying changes to mission-critical software. Inserting lots of data into a web database can create a heavy load on the server and make the site unavailable to visitors. It might therefore be preferable to pre-compile the imported data into requests through a process that keeps a light load on the server, then run the requests.

Besides, with proper nested scoping, you can slice an import into several transactions. This keeps the lock count low, allows spreading the transactions over time to reduce the load, and lets you resume the import process if, for some reason, it gets interrupted.

Do It Yourself

Unless you’re working in an esoteric field on the bleeding edge of technology, the vast majority of programming problems you face have already been solved many times by many other people, and several of these solutions are readily available on the web or in legacy code libraries you might have access to.

To solve a problem, you can

  • reinvent a particular wheel : the non-factored approach, since you create your own instance of that wheel,  or
  • reuse one of its existing implementations : the factored approach, where several projects benefit from the same piece, including your own.

Both alternatives have costs and benefits that the experienced software engineer is aware of, and these will depend on your exact problem somewhere along the lines of :

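[Figure 1: time spent solving a problem as a function of problem size, with a red curve and a blue curve for the two approaches]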

The time spent solving a problem steadily increases with the size or difficulty of that problem, and is further subject to two important rules.

Non-factored is cheaper for small problems

A factored solution carries some overhead because it is used by several projects with different scopes. The “one click, 200 words” bias happens when non-technical managers hear “leverage an existing solution”, and see a picture of a one-click installer and a 200-word tutorial telling them their particular problem can be solved with two lines of C# code.

HolyGrail grail = new HolyGrail();
grail.doWhatIMean(/* No options here! ^_^ */);

Yeah. Riiiight.

Every one of us has spent days reading up on third party libraries just to decide if they are worth the effort, slaying compatibility dragons to make it talk with the rest of the project, filling hundreds of configuration options that have no relevance whatsoever to the tiny problem at hand, teaching co-workers about the nooks and crannies of that code, and painstakingly wading through less-than-civilized error reporting to solve the obtuse problems that come up on the day before you release.

Even writing your own reusable code is orders of magnitude harder than just jotting down a quick one-shot solution to whatever problem you have. An excessive tendency to build generic code from the very beginning makes your development process look like Dragon Ball Z : you have to power up for fifteen episodes before you can show a splash screen.

This rule is the reason why the red curve stays above the blue curve for small problems.

Factored scales better for large problems

Solving a larger problem involves a larger solution. In a do-it-yourself situation, you have to make the solution larger yourself. When using a factored approach, you already injected an existing large solution into your project, and it only feels small because you’re using a small part of it. With the programming equivalent of flipping a switch, you get to use a larger part.

The solution that involves the most code (the non-factored one, in case you wondered) also involves the most maintenance, documentation and development work. Whether this comes from a thousand-line reinvented wheel or obscene copy-pasting, having a large code base is something you will have to pay for in the long run. You don’t buy code, you rent it.

This rule is the reason why the red curve ends up below the blue curve for sufficiently large problems.

Keeping these two rules in mind, the key to making the right decision is determining where the red and blue curves intersect, and where your project stands. Easier said than done. For instance, what does “problem size” mean, precisely?

Problem size can be, literally, the size of the problem for an obvious metric. A content distribution network like Amazon S3 is a bad choice for 1000 downloads per week, but an obvious solution for 1000 downloads per second.

It could be the number of things in the application that are similar to the one you’re implementing. Sending usage statistics back to your server is a small problem solved with a vanilla HTTP request. If you communicate with the server a lot, you might want to keep the URL and error handling logic together in one place.

Or it could be the number of features. Displaying data in table format takes two nested loops and some HTML. Sorting, filtering, asynchronous sending or editing involves some rather smart Javascript development, or integrating a tool like jqGrid or ExtJS.

Once, Twice, Refactor

The special case of writing your own reusable code has been “solved” by Agile folks who suggest writing a non-reusable version of the code on the first try, and refactoring it to a reusable version the second time it’s needed. This is your third choice : go with the non-factored solution if you are unsure whether the problem is large enough to warrant the factored solution, and change your mind as soon as you gather enough data.

[Figure 2: the same chart with a curve added for the write-once-then-refactor approach]

This is a solution that costs less than the factored approach if the problem is small, and costs less than the non-factored solution if the problem is large, while keeping an acceptable overhead when the problem is somewhere in-between.

Of course, writing your own reusable code means that the cost of switching from the non-factored to the factored version is significantly lower than starting with the factored version from scratch, because you refactor the original solution into a reusable one.

The advantages are not so obvious when moving from one approach to the other involves throwing away all code and installing a third party application. You do get some benefits (at the very least, you know more about the problem than you did at first, and perhaps your first approach served as a useful prototype to further refine your needs) but doing this can hurt a lot.

So, you end up getting hurt if you don’t know what you’re doing. What a surprise.

Javascript signals

Signals operate as a simple way of decoupling dependencies within a project, by allowing caller-callee relationships through an interface that makes both parties anonymous. Assuming a shared signals object is provided, the receiver registers itself on that object:

signals.output = function(text){ alert(text) };

And the sender uses the registered channel to remotely execute that function:

signals.output('Hello');

Signals are the functional equivalent of object-oriented inversion of control, a technique that allows users to configure the behavior of third party code without having to modify it. This is done by removing any explicit dependencies of the third party code on specific behavior units, such as “output a piece of text”, and injecting those dependencies back from the outside as an object or set of objects which hide the actual implementation of those behavior units. Basically, we’re replacing:

function frobnicate(a,b) {
  foo(a);
  bar(b);
  alert('Success');
}

frobnicate(1,2); // Can't prevent the alert box from appearing!

With the slightly longer but easily configured:

function frobnicate(a,b,output) {
  foo(a);
  bar(b);
  output('Success');
}

frobnicate(1,2,function(t){alert(t)}); // Original behavior
frobnicate(1,2,function(){}); // Muted function
frobnicate(1,2,function(t){console.debug(t)}); // To firebug console

Since a given piece of code might depend on several distinct behavior units, I use a record to transmit all that behavior as a single argument. This results in the classic “configure my library with your options object” that can be found, among other places, in jQuery.
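As a sketch of that record, here is a variant of `frobnicate` that receives all its behavior units in one argument. The `log` unit and the addition standing in for `foo(a); bar(b);` are inventions of this example.

```javascript
// All behavior units travel together in a single record.
function frobnicate(a, b, behavior) {
  var sum = a + b; // stands in for foo(a); bar(b);
  behavior.output('Success');
  behavior.log('frobnicate(' + a + ',' + b + ') = ' + sum);
  return sum;
}

// One possible configuration: muted output, log kept in memory.
var messages = [];
var quiet = {
  output: function () {},
  log: function (text) { messages.push(text); }
};
frobnicate(1, 2, quiet);
```

Swapping `quiet` for another record changes where the output and the log go without touching `frobnicate` itself.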

This simple approach causes a small number of difficulties:

  • If I want to use a slightly different version of a signals object for another part of the program, I have to manually create a copy of the object and change the copy (basically the equivalent of a pure functional object mutation).
  • In some situations, I might want to handle several callbacks for a single signal. The current approach only lets me define a single function for a given signal.
  • Some functions of the signal set (such as sending a form through AJAX) might rely on other functions of the signal set (display an error message) to handle their own behavior unit dependencies, and I would like those functions to automatically have access to the signal set they belong to, dynamically.

This leads me to a subtly different implementation of signals:

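var signals = (function(){
  // Base constructor: every object remembers the constructor that built it
  var s = function() { this._c = s; };
  // Returns a copy of this object with a multi-receiver channel named c
  s.prototype.channel = function(c) {
    var h = [],
        s = function() { for (var k in h) if (h[k]) h[k].apply(this,arguments); };
    // bind returns a handle that can later be passed to unbind
    s.bind = function(f) { h.push(f); return h.length-1; };
    s.unbind = function(k) { h[k] = null; };
    return this.set(c,s);
  };
  // Returns a copy of this object where n is v; the original is left untouched
  s.prototype.set = function(n,v){
    var i = function(){ this._c = i; };
    i.prototype = new this._c();
    i.prototype[n] = v;
    return new i();
  };
  return s;
})();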

This small class encapsulates pure functional mutation semantics by means of its set function:

var signals = new signals();
var initial = signals.set('xxx',100);
var final = initial.set('xxx',200);
console.log(initial.xxx + ' ' + final.xxx); // Outputs '100 200'

This small piece of behavior is in itself quite helpful, but it gets better: if a function is added to the object, it remains there but is always executed within the context of the current object and therefore has access to its actual values.

var signals = (new signals()).set('show',function(){console.log(this.xxx)});
var initial = signals.set('xxx',100);
var final = initial.set('xxx',200);
initial.show(); // Displays 100
final.show(); // Displays 200

Last but not least, it’s possible to create a full communication channel that can be connected to several receivers and forwards its arguments to all receivers. All receivers are called with the signals object as their context, which lets them access it and behave accordingly.

var unreadMessages = 0;
var signals = (new signals()).channel('setUnread');

// Update the number of unread messages, notify user if they have
// new messages.
signals.setUnread.bind(function(unread){
  if(unreadMessages < unread) this.notice('You have new messages!');
  unreadMessages = unread;
});

// Update all places that display the number of unread messages
signals.setUnread.bind(function(unread){
  $('.unread').html('Messages'+(unread > 0 ? ' ('+unread+')' : ''));
});

// When at page scope, notices are printed by growling
var global = signals.set('notice',growl);
global.setUnread(10);

// When inside a smaller scope, such as a component, display notices in
// a dedicated location
var local = signals.set('notice',function(arg){$display.html(arg)});
local.setUnread(15);

Last Minute Skin

Right now, we render our page layout on the server, thus wasting precious bandwidth sending the same header, footer and menus all over again every single time. AJAX techniques have evolved to reload only the inner part of every page, but they require clever URL manipulation or ‘back’, ‘refresh’ and bookmarks won’t work, and they impose strong constraints on page layout and on the way the server responds to requests.

Why not do it the other way around? Have every page include the same layout-generating JavaScript file (kept in the browser cache for optimum performance) ! This is the idea behind the last-minute-skin pattern.
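A very rough sketch of such a layout script is given below. The markup, the element ids and the function name `skin` are all made up, and rewriting `document.body.innerHTML` is only one crude way of applying the result.

```javascript
// Shared layout script: the server only sends the inner content of each page.
function skin(contentHtml) {
  return '<div id="header">My Site</div>' +
         '<div id="menu"><a href="/">Home</a></div>' +
         '<div id="content">' + contentHtml + '</div>' +
         '<div id="footer">Footer</div>';
}

// In a browser, wrap whatever the server sent; elsewhere this does nothing.
if (typeof document !== 'undefined' && document.body) {
  document.body.innerHTML = skin(document.body.innerHTML);
}
```

Every page keeps its own URL, so ‘back’, ‘refresh’ and bookmarks behave as usual.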

PHP Type Checking

PHP does not enforce types at compile-time (if only because there isn’t a compile time) and runtime checking only happens at the leaves of your source code tree, when you use a PHP function and that function notices one of its arguments is incorrect.

There are of course ways of introducing additional type safety into PHP code, both through development practices and through hints. For instance, you can hard-code checks into function prologues:

function SetUsername($username, $usr_id)
{
  assert (is_string($username));
  assert (is_int($usr_id));
  // ...
}

And, if using class types, you can also use the type hint mechanism in PHP 5 to get automatic warnings:

function FitToWindow(Image $img, Window $window)
{
  // ...
}

There remains the issue of member variables, which are modified and read in many different places. This means a “check the object is in a valid state” function is a useful addition to a class, to be used as a validity check during development to catch any errors as soon as they occur.

I sometimes use the following for my checks:

class Type
{
 public static function Is($value, $type)
 {
   if (func_num_args() > 2) {
     $args = func_get_args();
     array_shift($args);
     return self::Is($value, $args);
   }

   if (is_string($type))
     return self::Is($value, array_filter(explode(' ', $type)));

   if (empty($type))
     return true;

   $first = array_shift($type);

   if ($first == 'null')
     return $value === null || self::Is($value, $type);

   if ($first == 'array') {
     if (!is_array($value))
       return false;
     $next = 0;
     foreach ($value as $key => $val) {
       if ($key != $next++)
         return false;
       if (!self::Is($val, $type))
         return false;
     }
     return true;
   }

   if ($first == 'time')
     return is_int($value) && $value >= 0;

   if ($first == 'hash') {
     if (!is_array($value))
       return false;
     foreach ($value as $val)
       if (!self::Is($val, $type))
         return false;
     return true;
   }

   if (is_callable($first))
     return call_user_func($first, $value) && self::Is($value, $type);

   if (is_callable('is_' . $first))
     return call_user_func('is_' . $first, $value) && self::Is($value, $type);

   if (class_exists($first))
     return($value instanceof $first);

   return false;
 }

 public function checkTypes()
 {
   self::check($this);
 }

 public static function check($obj)
 {
   $class = get_class($obj);
   foreach (get_class_vars($class) as $var => $value)
     if ($var[0] != '_')
       if (!self::Is($obj->$var, $value))
         throw new Exception("Type error: `$class::$var` is not of type `$value`");
  }
}

The typical use is to define a new class, then assign a default value to all type-checked variables: that default value is a type string (or array) that is parsed and verified by the check functions. For instance:

class User
{
  var $id = 'int';
  var $name = 'null string';
  var $media = 'array Media';
  var $friends = 'positive int';
  var $_hash;
}

This would check that the identifier is an integer, that the name is a string or null, that media is an array of instances of the Media class, and that friends is an integer such that is_positive($obj->friends) returns true (assuming you define that function somewhere). The hash variable is unchecked because it starts with an underscore. This has some advantages:

  • Type expressions are shorter than the corresponding assert statements.
  • They go deeper as far as checks go (for instance, arrays also check that all members are of a certain type).
  • They document the code, by explaining in the class definition what the types of the variables are, as opposed to staying in a function.
  • They help with automated testing by allowing the creation of classes with arbitrary values of the chosen type.

This also has disadvantages:

  • This prevents setting an actual default value for the variables.
  • It introduces an artificial naming convention for variables starting with or without underscores.
  • Type-checking arrays or large structures takes time.
  • It’s not detected by documentation generators.
  • Does not play well with private variables.

You’re not a person

WEEK 1

In this application, every person belongs to exactly one team.

WEEK 4

We need to manage external contractors. We could use the “person” object.

WEEK 5

Hey, we need to assign a team to every person. Let’s create an “external” team.

WEEK 127

Did you see that newspaper article about our company? They say we have an average of 30 people on every team. Do we even have 30-people teams?

Names are short. They can only convey a very limited amount of information. Even worse, that information tends to be different from its meaning in standard English: by declaring in week one that every person belongs to a team, the project designers separated the Application::Person (always in a team) from the English::Person (might be in zero, one or more teams). By week four, this separation vanished from the minds of most of the team. A developer noticed that “English::Contractor is-a English::Person” and mistakenly translated it to “Application::Contractor is-a Application::Person”.

This was the first mistake. Why didn’t he notice?

A positive property is what you can do with a thing. With the Person object, you can store a name, login, password and phone number! This is exactly what you needed! Those positive properties you need that the object doesn’t provide, you can always add them through inheritance or composition, and that’s still less work than implementing everything or having to refactor the code. A negative property is what you cannot do with an object no matter how hard you try. With the Person object, you cannot remain on your own without a team! But our brains are biased to look for positive properties first, and passively ignore negative properties until it’s too late. Positive properties are about the solution solving the problem. Negative properties are about the solution not being applicable.

The second mistake was, by far, the worst. So they finally noticed that negative property that blasted all their model away. And they went on with it, patching the issue by altering the meaning of Application::Team. It originally represented a project team within the company; it then represented a named group of people that could be a project team or the group of external contractors. This is refactoring: no matter how you look at it, you change the behavior of an object and let it propagate throughout the project, so you had better be careful about where it propagates! In this case, they weren’t careful about propagating the change of meaning to the documentation and user interaction part of the project, which mistakenly kept the old meaning of Application::Team. This led to a naive PR team issuing a statement that included the “external” group as if it were a project team.

It’s always helpful to have an anal-retentive person in a group, preferably in a position of authority that lets them veto such changes, and who is vigilant enough to spot that “external” team early on in the design.

The real mistake was allowing a negative property to slip into the design. Negative properties hinder reuse, by definition. Sure, allowing a person to belong to zero-one-many teams is hard on every piece of code that must work on teams, because the writers have to remember to check whether the person has a team in the first place. But it has to be done. Doing it may even bring to light some issues in the original requirements (“So what happens when a person changes teams between the moment team bonuses are computed and the moment they are paid out?”) that would become annoying later on.

Best Practices

There are hundreds of things that can go wrong even in the simplest situations. I’ve already explained why the real value of a domain expert is precisely to identify in advance everything that could go wrong with a project, so that it can be avoided.

Consider a comment form on a website. Nothing too fancy: the user fills in the “Name”, “Website (optional)” and “Comment” areas on a form, clicks the “Submit” button, and the page reloads with the comment on the page. No login required, no AJAX, no special effects. There are many things that can go wrong with this setup, and will go wrong if left in the hands of an inexperienced developer. They can be inconvenient, annoying or outright dangerous.

For example,

  • Double-posting. When the submit button is clicked, the form sends a request to the server with the comment to be added. The server responds with the new list of comments. The user clicks the “refresh” button while on that page, or navigates to another page and presses the “back” button. This causes the browser to send the request again, so the comment appears twice in the comment list. If using POST, this is slightly less dangerous: the user might get an annoying “Submit again?” window instead of double-posting.
  • SQL Injection. It is highly probable that the comments will be stored in an SQL-accessible database. If the code constructing the SQL query is not properly written, an appropriately chosen value for the comment fields can result in nasty things happening to the database.
  • Cross-Site Request Forgery. Suppose that posting the form creates a GET request like:
    http://yourdomain/postcomment?name={name}&text={text}

    Knowing this, I can include an image tag in a forum, with a source attribute that matches the posting of a spam comment on your website. Every visitor of that forum page will send that request automatically (browsers auto-fetch images by default) and spam your comment list.

  • Script Injection. The text entered by users must be displayed back to the visitors. If that text is not escaped before being output, a malicious attacker can submit a comment containing a dangerous script like:
    document.location = "http://www.youtube.com/watch?v=f2b1D5w82yU";
  • Encoding Issues. What happens if the page is encoded in UTF-8 but I send you ISO-8859-1 text? Conversely, what happens if the page is encoded in ISO-8859-1 and I copy-paste my comment from Microsoft Word? For that matter, what is the encoding of the database? What is the encoding of your string literals?
  • No Validation. User forgets to enter a name or a comment. No server-side check is made to determine whether the posted comment is valid and you get a mix of ugly empty comments and/or server error messages.
  • Lossy Validation. You have to prevent people from posting with no name or no comment body. This means errors will be displayed on the page and, if the detection of such errors happens on the server after the initial post, it’s easy to forget to display back the text the user entered in the first place. “Sorry, you forgot to enter a name so I’ve thrown your ten-line comment away” [#]
  • Does not work in Internet Explorer. There are many possible causes for it, such as respecting W3C specifications.
  • Legal Issues. If a malicious commenter uses your page as a soapbox for illegal activities, some countries will hold you responsible. For instance, in France, you can be condemned if anonymous posters engage in holocaust denial on your website.
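
To make the script injection item concrete, here is a small sketch of output escaping in JavaScript. The `escapeHtml` name and the example.com URL are assumptions made for illustration; most template engines and frameworks ship an equivalent function, which is usually the better choice.

```javascript
// Replace the five characters that are significant in HTML text and
// attribute values with their entity equivalents.
function escapeHtml(text) {
  const entities = {
    "&": "&amp;",
    "<": "&lt;",
    ">": "&gt;",
    '"': "&quot;",
    "'": "&#39;",
  };
  return String(text).replace(/[&<>"']/g, (ch) => entities[ch]);
}

// A hostile comment is displayed as inert text instead of being executed.
const comment = '<script>document.location = "http://example.com/";</script>';
console.log(escapeHtml(comment));
```

Escaping happens at output time, so the database keeps the text exactly as the user typed it.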

That’s nine, just thinking about the obvious problems that would happen if following the simplest approach to this, and I have seen many of them happen in three situations: novice programmers (such as interns), freelancers and low-wage programmers. The worst offender is by far the code written in naive PHP, which has the peculiarity of “the simplest thing” being almost always “the incorrect thing” as well.

Still, if you can’t let an intern write a simple user comments page, what are you going to let interns do?

All of the above issues are easy to correct once you know about them. Always send data as POST, check the referrer, convert everything to UTF-8, validate your data, use prepared statements instead of inline SQL, respond with a 303 redirect to a GET page, include the posted data and any errors in the session and display them back in the form if present, take all your dynamically generated text through an HTML escaping function, add “type=submit” to buttons, and add a quick moderation tool to hide unwanted messages quickly.
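
As an illustration of the “validate, but give the data back” step, here is a sketch of a validator that returns the submitted values together with the errors. The field names follow the comment form described earlier; `validateComment`, its return shape and the error messages are assumptions.

```javascript
// Normalize a possibly missing field to a trimmed string.
function clean(value) {
  return value === undefined || value === null ? "" : String(value).trim();
}

// Hypothetical validator: returns the cleaned values together with any
// errors, so the form can be redisplayed without losing what was typed.
function validateComment(input) {
  const values = {
    name: clean(input.name),
    website: clean(input.website), // optional, so never an error
    text: clean(input.text),
  };
  const errors = {};
  if (values.name === "") errors.name = "Please enter a name.";
  if (values.text === "") errors.text = "Please enter a comment.";
  return { ok: Object.keys(errors).length === 0, values, errors };
}

const result = validateComment({ name: "  ", text: "A ten-line comment..." });
console.log(result.ok);          // false
console.log(result.errors.name); // Please enter a name.
console.log(result.values.text); // A ten-line comment...
```

When `ok` is false, both `values` and `errors` would go into the session before the redirect, and the form would read them back to pre-fill its fields.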

Knowing about the issues and acting to prevent them is the hard part, which is why every project should have at least one experienced developer who knows about the errors. Or use a framework that prevents such errors from happening in the first place (then again, if the documentation for Zend_Form has a “user refreshes page, double-posts by mistake” error, who can we trust?)

Although it has been taken over by marketing folks, there are still good things to be said about “best practices”. The basic idea is to have a set of practices available for the less experienced developers to follow. Such practices are usually very simple to understand and follow (never display data in a POST controller, never change the model significantly in a GET controller), reasonably simple to verify automatically (assert that no output happened as part of a POST controller response) and have the immediate effect of preventing a classic mistake (no re-post on a page refresh).
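
The POST practice and its automatic check can be sketched as follows. The request, response and model shapes are hypothetical and not tied to any particular framework; only the rule itself (change the model, answer with a 303, output nothing) comes from the text above.

```javascript
// Hypothetical POST controller: it changes the model, then redirects.
// It never renders the comment list itself.
function postCommentController(request, model) {
  model.comments.push({ name: request.body.name, text: request.body.text });
  return { status: 303, headers: { Location: "/comments" }, body: "" };
}

// Automatic check of the practice: a POST response must redirect and
// must not carry any output of its own.
function assertPostPractice(response) {
  if (response.status !== 303) {
    throw new Error("POST controller must answer with a 303 redirect");
  }
  if (!response.headers || !response.headers.Location) {
    throw new Error("POST controller must set a Location header");
  }
  if (response.body !== "") {
    throw new Error("POST controller must not display data");
  }
}

const model = { comments: [] };
const response = postCommentController(
  { body: { name: "Alice", text: "Hello" } },
  model
);
assertPostPractice(response);       // passes silently
console.log(model.comments.length); // 1
```

Because the browser ends up on a GET page, refreshing or pressing “back” repeats only the harmless GET request.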

I’m a big proponent of enforcing good code through practices first, and then code-based contraptions if developers insist on ignoring them. The problem with going for the contraptions first is you have to explain how to use the contraptions anyway, and people will be tempted to move around the contraptions and still write bad code.

If your code is reviewed by a compiler or an automatic code analysis tool, you can learn how to game the system. This results in code that does not trigger the alarms, while still being bad. Compare with having your code reviewed by a live person, who is experienced and anal-retentive about respecting practices and makes it horribly clear that if you don’t follow them, you will be forced to follow them, in your free time, before you can commit your code. Such reviews leave no room for wiggling and, as long as the judgment of the reviewer is fair, will actually motivate the team to respect the standards.


[#] Viadeo actually did even worse things to me (“Sorry, I forgot to tell you that you were only allowed 255 characters in this box, so I’ve deleted everything for you so you can try again. Oh, and don’t try the back button of your browser, I have also deleted your input on the previous page.”) so I suspect it has been written by Java rookies with close oversight by non-technical management.

JSOS : JS-PHP mapping

I’ve been working on Javascript-PHP remote call mapping techniques. A set of PHP classes with static functions are selected as a public interface and automatically exported so that the Javascript can call them. Usually, this kind of mapping involves three difficult points:

  • Detecting what functions are exported and building the appropriate JS code.
  • Detecting what function the JS is calling.
  • Transforming data between JS and PHP.

I solve these with glob(), __autoload and json_encode (respectively) in a little prototype I called JSOS (JavaScript Or Something). jQuery on the client side provides me with synchronous AJAX queries. The code (PHP with echoed Javascript) looks like this, and is placed in a controller:

<?php
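 // Load Foo.inc.php the first time class Foo is used. Note: __autoload()
 // was deprecated in PHP 7.2 and removed in PHP 8.0; on current versions,
 // register the same function with spl_autoload_register() instead.
 function __autoload($classname)
 {
   require_once($classname . '.inc.php');
 }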

 if (!isset($_POST['cls'])) {
   echo '<html><head><title>JSOS</title>';
   echo '<script type="text/javascript" src="jquery.js"></script>';
   echo '<script type="text/javascript">var jsos={};';
   echo '$.ajaxSetup({async:false,timeout:5000});';
   echo 'jsos.$=function(c,f,a){';
   echo 'var o={exception:"Server disconnect"},';
   echo 't={cls:c,func:f};';
   echo 'for(var i in a)t[""+i]=a[i];$.post(".",t,';
   echo 'function(d){o=d},"json");if("exception" in o)throw o.exception;';
   echo 'return o.result};';

   foreach (glob('*.inc.php') as $file)
   {
     $class = str_replace('.inc.php', '', $file);
     echo 'jsos.' . strtolower($class) . '={};';
     $text = file_get_contents($file);

     preg_match_all('/public static function ([A-Za-z_0-9]+)\(([^)]*)\)/',
                    $text, $func);

     foreach ($func[1] as $id => $value) {
       echo 'jsos.'.strtolower($class).'.'.strtolower($value).'=function(';

       $args = explode(',', $func[2][$id]);

       foreach (array_keys($args) as $key) {
         $args[$key] = str_replace(array(' ', '$'), '', $args[$key]);
       }

       $args = array_filter($args);

       echo implode(',', $args).'){return jsos.$';
       echo '("'.$class.'","'.strtolower($value).'",['.implode(',',$args).'])};';
     }
   }

   echo '</script></head><body></body></html>';
   exit;
 }

 $args = array();
 foreach ($_POST as $key => $val) {
   if ($key === 'cls')
     $cls = $val;
   elseif ($key === 'func')
     $func = $val;
   else
     $args[(int)$key] = $val;
 }

 ksort($args);

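 // Error replies go through json_encode() too: hand-written
 // {exception:"..."} has an unquoted key and is not valid JSON.
 if (!class_exists($cls)) {
   echo json_encode(array('exception' => 'No such package!'));
   exit;
 }

 if (!method_exists($cls, $func)) {
   echo json_encode(array('exception' => 'No such method!'));
   exit;
 }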

 try {
   $result = call_user_func_array(array($cls, $func), $args);
   echo json_encode(compact('result'));
   exit;
 }
 catch (Exception $e) {
   $exception = "$e";
   echo json_encode(compact('exception'));
   exit;
 }
 

All files in the current directory with a ‘.inc.php’ extension are assumed to contain a similarly-named class, and all public static functions of that class are exported. For example, suppose Test.inc.php contains the following definition:

class Test
{
  public static function Run($a) { return "$a$a"; }
}

Then, to call the Test::Run function from the above page, one would simply type in Javascript:

jsos.test.run('Hello');

And indeed, with only five lines of code, the result of a client request is computed on the server and displayed on the client again, without issues. Here’s the result running in Firebug:

[Screenshot: jsos]

Further work should include better error handling: right now, if the server encounters an error or outputs non-JSON data, the call dies with a generic “Server disconnect” exception. Reporting the actual error would make debugging easier than having to wade through the Net tab of Firebug.
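
For reference, the stub that the controller echoes for Test::Run has the shape shown below. To keep the sketch runnable without a server, `jsos.$` is replaced here by a local stand-in that emulates Test::Run; the `fakeServer` object is an assumption made for illustration, whereas the real `jsos.$` performs a synchronous `$.post`.

```javascript
// Stand-in for the real jsos.$ transport. It emulates the PHP method
// Test::Run locally and throws the same kind of string exceptions.
const jsos = {};
jsos.$ = function (cls, func, args) {
  const fakeServer = { Test: { run: (a) => "" + a + a } };
  if (!(cls in fakeServer)) throw "No such package!";
  if (!(func in fakeServer[cls])) throw "No such method!";
  return fakeServer[cls][func](...args);
};

// Shape of the stub generated for "public static function Run($a)":
// class and method names are lowercased on the JavaScript side, and the
// arguments are forwarded as an array.
jsos.test = {};
jsos.test.run = function (a) {
  return jsos.$("Test", "run", [a]);
};

console.log(jsos.test.run("Hello")); // HelloHello
```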

Easier Unit Tests

Automated unit testing has three main advantages:

  1. It forces you to express in detail what you want the unit to do (so that a computer may then test it by checking the results).
  2. For a unit to be testable, its results should be checked as “correct” or “incorrect” on their own or with minimal context. This makes code easier to reuse.
  3. Since it’s automatic, you do not need to test manually every time you make a change: the tests will run hourly or nightly (or for every build) and tell you what went wrong.

The cost, however, is that unit testing code must be written for every unit. That code needs to create an object, manipulate it, then check its return values for validity. And test-driven-development advocates insist that such code should be written before the actual unit (make it compile, make it fail, make it pass).

I do understand their position: if you’re thinking of very specific corner cases that you might forget about while writing the unit, you might as well write tests for these corner cases first. But what about the simple, core functionality of the units? If a programmer sets their mind to testing a unit, they can usually look at the value and decide “it’s correct” or “it’s wrong”, and that process is orders of magnitude faster than writing an assertion that checks whether the value is correct. We’re not talking about a complex ten-argument dozen-property function on a deep-inherited object here, it’s more of a “this string goes in, that string goes out” concept.

An example would be a “slugify” function: plug an arbitrary string into it, and a cleaned up lowercase hyphen-separated string comes out. As long as you are mindful of corner cases (weird characters, encodings, empty strings and so on) testing this function manually is quite simple. You certainly lose the benefit of point 1 (no more expressing the details of what the function should do in advance of writing it) but you can keep the benefits of 2 (whether automatic or manual, tests make code reusable) and you can even get the benefits of 3 by saving your code as a unit test.
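
Here is one possible `slugify`, as a sketch in JavaScript. The exact rules (strip accents, collapse everything that is not a letter or digit into one hyphen) are an assumption; the original function is not shown in this post.

```javascript
// One possible slugify: lowercase, strip accents, and collapse every run
// of non-alphanumeric characters into a single hyphen.
function slugify(text) {
  return String(text)
    .normalize("NFD")                // split accented letters into base + mark
    .replace(/[\u0300-\u036f]/g, "") // drop the combining marks
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-")     // anything else becomes a hyphen
    .replace(/^-+|-+$/g, "");        // no leading or trailing hyphen
}

// The kind of checks one would type into a console:
console.log(slugify("Hello, World!"));     // hello-world
console.log(slugify("  Crème Brûlée  "));  // creme-brulee
console.log(slugify(""));                  // (empty string)
```

Each of these lines is a “this string goes in, that string goes out” check, which is what makes the function easy to test by eye.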

This is what happens in the console-like scaffold I have been working on:

[Screenshot: autotest]

You type in code in the text box. Clicking “Run” (or pressing TAB RET) sends the code through AJAX to the server and retrieves the var_dumped result (as well as a highlighted version of the code). The mindful programmer can use this console to preemptively test and debug code. The excellent programmer will even adapt their code to make such testing easier, and being able to use any kind of code in the barren context of this console means that code can be used anywhere.

The real hit there is that little Add Cog icon from the FamFamFam Silk icon set. It’s truly beautiful and adds a touch of antialiased pixel art to an otherwise bland rounded-edges Web 2.0 console. And what it does is even better: when clicking that icon, the corresponding piece of PHP is automatically added to the database of unit tests, as a test that runs the code on the green background and asserts that the output of the code is identical to the text on the grey background.

This makes the “for every bug you find, write a unit test that finds that bug” insanely easy to do: find the bug using the console, correct the code, check with the console that the bug is dealt with, and then add the final “bug corrected” console run to the unit test database. The work spent debugging a program from this console is never lost, it all ends up being an automated test with one single click.
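
The “record a console run as a test” idea can be sketched in a few lines. The console described here stores PHP snippets and their var_dumped output in a database; this JavaScript sketch uses an in-memory array and `JSON.stringify` instead, and every name in it is hypothetical.

```javascript
// Hypothetical in-memory test database: each entry keeps the code that
// was run and the output that was accepted as correct.
const recordedTests = [];

// "Add" button: run the code once and store its output as the expectation.
function recordTest(label, code) {
  recordedTests.push({ label, code, expected: JSON.stringify(code()) });
}

// Test run: execute every recorded snippet again and compare outputs.
// Returns the labels of the tests whose output has changed.
function runRecordedTests() {
  return recordedTests
    .filter((test) => JSON.stringify(test.code()) !== test.expected)
    .map((test) => test.label);
}

let double = (s) => s + s;
recordTest("double repeats its input", () => double("ab"));

console.log(runRecordedTests()); // []
double = (s) => s;               // introduce a regression
console.log(runRecordedTests()); // [ 'double repeats its input' ]
```

The expectation is whatever the code returned at the moment the programmer judged it correct, so the assertion is never written by hand.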

This also makes writing unit tests much more developer-friendly: the console lets you test your code as you write it, without having to write controllers or views or whatever other contraption you need to display the results. An excellent choice for testing-averse developers.