Archive for the 'Dynamic' Category

Reusable CSS

Woe unto CSS, for it provides no refactoring-friendly tools! The CSS beast has neither functions nor variables, and its definition of inheritance is perverted beyond words. Pain and suffering await those who hope to keep their CSS from one project to the next, or even share the CSS between pages on a single website!

Consider two simple pages: the home page has a small navigation bar (selected by #navig) at the top of the screen, while the catalog page has a larger navigation bar (still selected by #navig). Each page includes a different layout.css stylesheet, so everything’s fine. Except that now, anything defined in one layout has to be copied over by hand to the other layouts if you want to reuse it. Ouch.

Does that example sound extreme? It certainly is! But the danger of page-specific stylesheets remains: if you won’t be stepping on your own toes with something as trivial as #navig, perhaps .book will mean two different things on two different pages?

Rule Zero : Keep all your CSS Together

This might seem a bit harsh, especially if you have truckloads of CSS floating around and don’t want to slow down the initial loading time of your page, or the time spent resolving collisions. However,

  • This rule will make it easier to factor out common bits of CSS, leading to an overall smaller set of stylesheets.
  • The number of HTTP requests matters as much as the bandwidth, so delivering all your CSS as a single, minified, gzipped blob is often a good performance idea.
  • The entire point is to make it harder to create page-specific rules, so that you don’t make a rule page-specific by mistake, and strive to make most of your rules page-independent.

I usually place all of my CSS in correctly named files in a directory on my server, then have the server generate a single, all.css master file that @imports the other stylesheets by path. This means Firebug’s CSS browser will correctly identify the source file for any given rule. When the code moves to a production server, the auto-generated master file becomes a pre-generated/minified/gzipped resource, and can even be moved to a CDN for improved performance.
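Here is a minimal sketch of the development-time master file. It assumes all stylesheets sit in a single directory served under one URL prefix. `build_master_css()` is a hypothetical name, and the production step (minifying, gzipping, pushing to a CDN) is left out.

```php
<?php
// Hypothetical helper: build a development-time all.css that @imports
// every stylesheet in $dir, so Firebug still reports the original file
// for each rule. $urlPrefix is the public URL of that directory.
function build_master_css($dir, $urlPrefix)
{
  $files = glob(rtrim($dir, '/') . '/*.css') ?: array();
  sort($files); // deterministic import order
  $lines = array();
  foreach ($files as $file) {
    $name = basename($file);
    if ($name === 'all.css')
      continue; // never import the master file into itself
    $lines[] = '@import url("' . $urlPrefix . $name . '");';
  }
  return implode("\n", $lines) . "\n";
}

// Serving it from PHP during development, for instance:
//   header('Content-Type: text/css');
//   echo build_master_css(__DIR__ . '/css', '/css/');
```

Because the output contains nothing but @import lines, it satisfies the CSS rule that imports must come before any other rules.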

On the other hand, keeping all your code in one place will only help you see collisions; it will not actually help you solve them.

Fortunately, we can look to other languages for tips and tricks on how to make code easier to reuse. The fundamental observation is that you cannot use something if you don’t give it a name. One would expect CSS identifiers and classes to serve that purpose, and indeed they do in simple cases:

a.important { font-weight: bold }

Now you have the important «function», which you «call» on an anchor element to make it appear important. Bam! Instant reusability. Using an identifier instead of a class still allows reuse on distinct pages, but restricts reuse within a single page.

Rule One : Document your «Functions»

You cannot reuse code if you cannot find it, and even if you don’t forget about it, someone else on the team might be completely unaware that it exists. So you should somehow document that the important class exists. My personal, PHP-friendly preference is to have a “Css” class with all those nice classes available:

class Css
{
  /* a.important : make a link important */
  const IMPORTANT = "important";
}

Then, you can reuse them when you see fit to do so:

Click <a href="<?=$url?>" class="<?=Css::IMPORTANT?>">here</a>

That’s just personal preference—any way of documenting your CSS classes is fine as long as it’s somewhere everyone can see it. In fact, I have a nice set of PHP helpers lying around to bind jQuery UI CSS effects to my code, thereby documenting what jQuery UI can do without having to dive into the stylesheets every single time.
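Once the classes are constants of a single PHP class, the documentation can also be listed automatically. The sketch below assumes the `Css` class shown above and uses PHP's Reflection API. `css_catalog()` is a hypothetical helper, not part of any library.

```php
<?php
// The documented class from above (one constant per reusable CSS class).
class Css
{
  /* a.important : make a link important */
  const IMPORTANT = "important";
}

// Hypothetical helper: list every documented CSS class so that nobody has
// to dig through the stylesheets to find out what already exists.
function css_catalog($className)
{
  $reflection = new ReflectionClass($className);
  return $reflection->getConstants(); // constant name => CSS class name
}

foreach (css_catalog('Css') as $const => $cssClass)
  echo "Css::$const => .$cssClass\n";
```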

The real problem appears when you have more than one «argument». A typical example is the list of links with a “selected” link: the graphical effect applies to the list, to the elements of that list, and to the content of those elements, which leads to several rules selecting different elements.

ul#navig { margin: 0 ; padding: 0}
ul#navig li { list-style-type: none }
ul#navig li.selected a { font-weight: bold ; color: black }

This kind of structure cannot be documented simply by stating that the ul#navig element is going to become a pretty list, because without the li.selected there is no «pretty» worth mentioning.

I document this as follows:

/*
  <ul id="navig">
    <li><a>Item</a></li>
    <li class="selected"><a>Item</a></li>
    <li><a>Item</a></li>
  </ul>
*/
ul#navig { margin: 0 ; padding: 0}
ul#navig li { list-style-type: none }
ul#navig li.selected a { font-weight: bold ; color: black }

Why not document it in the PHP code, then? IMO, a CSS designer can be expected to write a quick const FOO = "bar"; line in PHP, but not an HTML helper that turns an array of links into pretty list HTML. CSS designers write the CSS (with documented HTML), and PHP developers turn that into HTML helpers.
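This is the kind of helper a PHP developer might derive from the documented ul#navig markup. `navig_list()` and its label-to-URL array argument are hypothetical names, chosen only for illustration.

```php
<?php
// Hypothetical HTML helper for the documented ul#navig structure:
// turns an array of label => URL pairs into the markup the CSS expects,
// marking the entry whose label equals $selected with class="selected".
function navig_list(array $links, $selected = null)
{
  $html = '<ul id="navig">';
  foreach ($links as $label => $url) {
    $class = ($label === $selected) ? ' class="selected"' : '';
    $html .= '<li' . $class . '><a href="'
           . htmlspecialchars($url, ENT_QUOTES) . '">'
           . htmlspecialchars($label, ENT_QUOTES) . '</a></li>';
  }
  return $html . '</ul>';
}

echo navig_list(array('Home' => '/', 'Catalog' => '/catalog'), 'Catalog'), "\n";
```

The designer never touches this function. If the documented HTML changes, only the helper has to follow.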

</acronym soup>

Another important element of code reuse is the notion of encapsulation, and in particular the existence of “private data” that is part of the program, but can only be accessed by some parts.

There is no such thing with CSS. There are two reasons for this. The main reason is that being sloppy with selectors is commonplace:

/*
  <div id="userList">
    <ul class="users">
      ...
    </ul>
    <a>New</a> |
    <a>Edit</a> |
    <a>Delete</a>
  </div>
*/
#userList a { color: #FF9900 ; text-decoration: none }
#userList a:hover { text-decoration: underline }

The three links in the user list component («new», «edit» and «delete») will appear in orange without underlining, as expected and documented. The unexpected and non-documented consequence of this code is that all links within the list of users will be orange without underlining as well.

Rule Two : Only Select what you Need to Select

The typical consequence of sloppy selectors is that «insert component A into component B» operations utterly destroy the formatting of component A. The typical designer reaction to such graphicalypse is «Darn, component B destroyed some property of component A, so let’s add some rules to component A to reverse the damage!»

Bad idea. It makes the code longer, and only hides the actual problem (along with any symptoms that only appear in specific cases). The real solution is to make sure selectors only select what they need to select.

One way of doing so is to use the «>» child combinator, as it restricts the selection to direct children of the initially selected element. This would work:

#userList > a { color: #FF9900 ; text-decoration: none }
#userList > a:hover { text-decoration: underline }

Of course, it wouldn’t work in IE6, but who cares about IE6 anymore?

The general approach is to use specific classes for those elements that must be affected:

/*
  <div id="userList">
    <ul>
      ...
    </ul>
    <a class="userList-link">New</a> |
    <a class="userList-link">Edit</a> |
    <a class="userList-link">Delete</a>
  </div>
*/
a.userList-link { color: #FF9900 ; text-decoration: none }
a.userList-link:hover { text-decoration: underline }

If anyone uses that userList-link class in their code (and your naming conventions were clean enough), they had it coming.

Rule Three : Choose Proper Naming Conventions

It is quite important to remain consistent in your naming practices, especially since you now need to identify, for any given identifier and/or class:

  • If it represents a «function» (#userList), or if it helps select a specific «argument» (.userList-link).
  • In the latter situation, what function the argument corresponds to (so that you can look for its definition).

My preference is to use camelCase names (classes or identifiers) for functions, and camelCase-camelCase names for arguments, where the first half is the name of the function. The CSS would then be gathered in a camelCase.css stylesheet named after the function, with a documentation of the expected HTML at the top, hence making it much easier to find and reuse.
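With that convention, finding where a class is defined becomes mechanical. This minimal sketch assumes one stylesheet per «function», named after it. `css_name_info()` is a hypothetical helper.

```php
<?php
// Hypothetical helper for the camelCase / camelCase-camelCase convention:
// "userList" is a «function», "userList-link" is an «argument» of it, and
// both are defined in userList.css.
function css_name_info($name)
{
  $parts = explode('-', $name, 2);
  return array(
    'function'   => $parts[0],
    'isArgument' => count($parts) === 2,
    'stylesheet' => $parts[0] . '.css',
  );
}

$info = css_name_info('userList-link');
echo $info['stylesheet'], "\n"; // userList.css
```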

Now that you have access to functions, you will probably want to use them to implement reusable «components» — standalone pieces of HTML and CSS that represent atoms of information.

At some point, you will have to make components interact (if only to respect each other on the page layout). All of this will be hell if component A uses normal block layout rules, component B is floating to the left and component C is positioned absolutely.

Rule Four : a Component Should only Care about its Inner Layout

As soon as a component starts to care about outer layout concepts such as margin, position, floating or clearing, you will be in a world of pain. This is because such concepts depend on where the component appears, and as such are not easy to reuse.

I split my CSS code into components and bones:

  • Components. These are reusable atoms. They do not care about their outer layout at all, so they never specify anything like margin, position, floating, clearing, display mode or anything that might cause them to interact differently with their surroundings on the page.

    They may specify a width and height if they wish, but it is discouraged (a component that can adapt to any geometry is easier to use). They can specify anything they want in terms of border, padding, font, color, background, and any other inner properties they need.

  • Bones. These are elements found inside the components that handle the layout of the component contents themselves. They can and should make appropriate assumptions about what bones can be found within a component and how they should interact to result in the layout you need to see.

A nice finishing touch is to make the component overflow : hidden, because the last thing you need is a component’s skeleton sticking out from its skin and interacting with other elements.

I repeat: never allow the contents of a component to stick out of that component!

In particular, if you have a component with floating elements inside, make sure you add a clearer element at the bottom of the component to have it resize with its contents.

In practice, I assume every function argument to be a bone, and every function to be a component. The situations where a function acts as a bone are so rare, and the results so difficult to reuse (so you’ve added a float:left to an element, where are you going to put it?), that I don’t really take them into account. The Component-Bone approach tends to solve almost everything elegantly, as long as you’re clever about where a component begins and a bone ends.

For instance, if you’re laying out a list of comments for a blog, you are probably going to have a «comment list» component with «comment» bones that are laid out on top of one another with appropriate margins, borders and paddings. The contents of every «comment» bone will be a «comment» component, with bones representing the picture, name, date and comment body, laid out cleanly within that component.

Whether the .commentList-comment is placed on the same element as .comment is something you can decide for yourself. What is essential is that, in order for the comment style to be reusable independently of the comment list style, all outer layout information should be in .commentList-comment, not in .comment.

Good.

Now, before I finish, do you remember when I said earlier that component A could be mangled by component B for two different reasons? The second reason happens to be inheritance. Everyone knows inheritance is bad for reuse. Right?

What happens is that, if you define a font size, color or family in a given element, then all descendants of that element will get the same font size, color and family (unless some CSS rule changes them). That’s inheritance: the value of the property in the child element is inherited from the parent element.

Rule Five: Only Change Inheritable Properties on your own Content

It’s impossible to define the entire list of inheritable properties at the root of every single component in your web site, however convenient that may be. Keeping everything in sync is very difficult, if not impossible. It is far easier, by comparison, to restrict such changes to only those areas of a component where the content is closely controlled and guaranteed not to contain any other components.

I believe there are basically three kinds of areas in any given page that are actually worth being paid attention to. These are:

  • Layout areas. These are those component-in-component-in-component places where touching an inheritable property can get you killed, or at least annoyed.
  • Text areas. Those contain no components, but they might still contain paragraphs, links, headings, images in a typical «rich text editor» fashion. If you change one property (such as the color of text), be ready to change all the related properties (the color of links) to keep a consistent appearance.
  • Line areas. These contain a short bit of text without any other tags. You don’t have to worry about changing properties here.

Every component should document, for every piece of content that should be filled from outside the component, whether it is a layout, text or line area. For instance:

/*
  <div class="comment">
    <span class="comment-author">...</span>
    <div class="comment-contents"><p>...</p></div>
    <div class="comment-reply">
      ...
    </div>
  </div>
*/

Here, a span (can only contain inline elements or text) represents a line area, a div-with-paragraph represents a text area (may contain several paragraphs, of course) and a normal div represents a layout area. This tells me, for instance, «don’t even think about putting a component in the comment contents, or I’ll clobber their stylesheet beyond recognition.»

Depending on the kind of web site you are building, other kinds of areas may be useful to you, such as forms.

That DOM removal thing, again

Earlier this month, I pondered what looked like a bug in JavaScript/DOM/jQuery: removing an element from the DOM with jQuery (either manually with remove() or by setting the html() of its parent to something else) kept most of the data bound to the element around, but removed all event handlers from it. You could then re-insert the element, but its event handlers would be lost.

I then gathered from several sources, such as Stack Overflow, that this is a jQuery issue (or rather, feature) and not a JavaScript one.

The underlying cause is explained by Douglas Crockford:

When a DOM object contains a reference to a JavaScript object (such an event handling function), and when that JavaScript object contains a reference to that DOM object, then a cyclic structure is formed. This is not in itself a problem. At such time as there are no other references to the DOM object and the event handler, then the garbage collector (an automatic memory resource manager) will reclaim them both, allowing their space to be reallocated. The JavaScript garbage collector understands about cycles and is not confused by them. Unfortunately, IE’s DOM is not managed by JScript. It has its own memory manager that does not understand about cycles and so gets very confused. As a result, when cycles occur, memory reclamation does not occur.

A common solution to this problem is to remove the cycles when the element is removed from the DOM. Since a major source of cycles in your average jQuery program is the presence of event handlers, removing the event handlers when an element is removed from the DOM solves the problem most of the time.

With the release of jQuery 1.4, the new documentation for .remove() makes mention of this fact:

In addition to the elements themselves, all bound events and jQuery data associated with the elements are removed.

The documentation for .html() still makes no mention of this. If you want to remove an element and keep all the goodies you bound to it, jQuery 1.4 provides you with .detach():

The .detach() method is the same as .remove(), except that .detach() keeps all jQuery data associated with the removed elements. This method is useful when removed elements are to be reinserted into the DOM at a later time.

chain()

Like many other languages, PHP is home to method chaining, a pattern that allows writing several mutators on the same object without having to name it more than once. A typical example can be found in the Zend Framework for configuration of e-mails, among other things :

$mail = new Zend_Mail();
$mail -> setBodyText('This is the text of the mail.')
      -> setFrom('somebody@example.com', 'Some Sender')
      -> addTo('somebody_else@example.com', 'Some Recipient')
      -> setSubject('TestSubject');

This is a very simple trick that is accomplished by having every mutator return the object itself.
However, before PHP 5.4 the syntax rules forbade calling a member function directly on the result of a new-expression, so you always needed a two-step sequence: initialize the object, then call its chain of mutators.

Of course, a simple solution is to use a function:

 function chain($obj) { return $obj; }

 $mail = chain(new Zend_Mail())
   -> setBodyText('This is the text of the mail.')
   -> setFrom('somebody@example.com', 'Some Sender')
   -> addTo('somebody_else@example.com', 'Some Recipient')
   -> setSubject('TestSubject');
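Here is a self-contained sketch of both points, using a hypothetical `Message` class in place of Zend_Mail. Each mutator returns `$this`, and `chain()` works around the new-expression restriction. Since PHP 5.4, a parenthesized new-expression can also be dereferenced directly, so `chain()` is mainly useful on older versions.

```php
<?php
// Hypothetical class whose mutators each return $this, which is all
// method chaining needs.
class Message
{
  public $subject = '';
  public $to = array();

  public function setSubject($subject) { $this->subject = $subject; return $this; }
  public function addTo($address)      { $this->to[] = $address;    return $this; }
}

function chain($obj) { return $obj; }

// chain() turns the new-expression into a function call, which can be chained.
$a = chain(new Message())->setSubject('Hello')->addTo('a@example.com');

// PHP 5.4 and later also accept a parenthesized new-expression directly.
$b = (new Message())->setSubject('Hello')->addTo('a@example.com');
```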

In a similar vein, there’s the matter of using the method chaining pattern on objects that were not designed for that. This is where a quick wrapper can come in handy:

 // Define the appropriate class and function
 class WithWrapper
 {
   public $value;
   public function __construct($obj) {
     $this -> value = $obj;
   }
   public function __call($name, $args) {
     assert (count($args) === 1);
     $this -> value -> $name = $args[0];
     return $this;
   }
 }

 function with($obj) {
   return new WithWrapper($obj);
 }

 // A typical record class
 class Person
 {
   var $age;
   var $firstName;
   var $lastName;
   var $married;
 }

 // Create entry for Jane
 $jane = with(new Person())
   -> age(24)
   -> firstName("Jane")
   -> lastName("Smith")
   -> married(false)
   -> value;

 // Jane gets married
 with($jane)
   -> lastName("Brown")
   -> married(true);

This is starting to look like Visual Basic’s With statement.

Left to the reader

PHP best practices have been moving steadily towards putting all functions inside classes, if only to provide namespacing. The good news is that you have no more namespace collision issues (well, unless you join together two projects with different conventions), and the bad news is that your function names are starting to get quite long.

<?php echo Framework_Html::Escape($username); ?>

Escaping strings to be output in HTML documents is a very common operation in PHP websites. Is avoiding the risk of a name collision worth giving up a shorter approach, like:

<?=esc($username)?>

I am a proponent of turning very common operations into short functions with appropriate “smart” behavior. For instance:

  • esc(string $string) returns a Framework_Html instance representing the string escaped with htmlspecialchars.
  • esc(Framework_Html $html) returns its argument as-is, so you don’t have to care about whether a given string has already been escaped or not.
  • esc($format, $a, $b, $c...) returns a Framework_Html instance representing the unescaped string sprintf($format, esc($a), esc($b), esc($c)), useful to avoid repeated escaping in, say, <a href="%s">%s</a>.
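Here is a minimal sketch of the three `esc()` forms. The text only names `Framework_Html`, so the class below is a hypothetical stand-in that simply wraps an already-escaped string. Using `ENT_QUOTES` is also an assumption about how quotes should be handled.

```php
<?php
// Hypothetical value class marking a string as already-escaped HTML.
class Framework_Html
{
  private $html;
  public function __construct($html) { $this->html = $html; }
  public function __toString()       { return $this->html; }
}

// esc($string)            -> escaped Framework_Html
// esc(Framework_Html $h)  -> $h unchanged (no double escaping)
// esc($format, $a, ...)   -> sprintf($format, esc($a), ...), format left as-is
function esc($value)
{
  if (func_num_args() > 1) {
    $args = func_get_args();
    $format = array_shift($args);
    // vsprintf() turns each Framework_Html back into a string via __toString().
    return new Framework_Html(vsprintf($format, array_map('esc', $args)));
  }
  if ($value instanceof Framework_Html)
    return $value;
  return new Framework_Html(htmlspecialchars((string)$value, ENT_QUOTES));
}
```

In a template, `<?=esc('<a href="%s">%s</a>', $url, $label)?>` then trusts the format string while escaping `$url` and `$label` exactly once.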

In a similar vein:

  • func(callback $call) returns its argument (after checking that is_callable($call) is true). This serves as a piece of documentation to tell that something is a function.
  • func(object $obj, string $func) returns a callback representing the member function $func of object $obj.
  • func(string $class, string $func) returns a callback representing the static member function $func of class $class.
  • func(string $args, string $body) acts as a shorter alias for create_function.
  • func(string $body) acts as an alias for create_function('$_',"return $body;"), in those cases you need a very short lambda expression.
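Here is a sketch of the first three `func()` forms, which only need `is_callable()`. The two string-body forms are deliberately left out: PHP 8 removed `create_function()`, and closures now fill that role. The function and the demo class `Greeter` are hypothetical.

```php
<?php
// Hypothetical func() covering the three callback forms listed above:
//   func($callback)          -> $callback, after checking it is callable
//   func($object, 'method')  -> array($object, 'method')
//   func('Class', 'method')  -> array('Class', 'method') (static method)
// Instead of create_function(), write a closure: function ($_) { return $_ * 2; }
function func($target, $method = null)
{
  $callback = ($method === null) ? $target : array($target, $method);
  if (!is_callable($callback))
    throw new InvalidArgumentException('func(): argument is not callable');
  return $callback;
}

// Small demo class for the object and static forms.
class Greeter
{
  public static function hi($name) { return "hi $name"; }
  public function bye($name)       { return "bye $name"; }
}

echo call_user_func(func('Greeter', 'hi'), 'Jane'), "\n"; // hi Jane
```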

And of course, there’s the jslog() and is() functions discussed earlier on the blog.

I think there would be a small handful of functions, maybe 8 or 10, that would be used so often on a given project that everyone would have to know about them anyway—so, you might as well keep them out of any class.

PHP Type Checking

PHP does not enforce types at compile time (if only because there isn’t a compile time), and runtime checking only happens at the leaves of your source code tree, when you use a PHP function and that function notices one of its arguments is incorrect.

There are of course ways of introducing additional type safety into PHP code, both through development practices and through hints. For instance, you can hard-code checks into function prologues:

function SetUsername($username, $usr_id)
{
  assert (is_string($username));
  assert (is_int($usr_id));
  // ...
}

And, if using class types, you can also use the type hint mechanism in PHP 5 to get an automatic error when the wrong type is passed:

function FitToWindow(Image $img, Window $window)
{
  // ...
}
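This post predates PHP 7. If the code can target PHP 7 or later (an assumption), scalar type declarations move the checks from the `SetUsername()` prologue into its signature. This sketch keeps the default coercive mode. In that mode `'42'` is still accepted for an `int`, but a non-numeric string raises a `TypeError`. Adding `declare(strict_types=1)` would reject `'42'` as well.

```php
<?php
// Since PHP 7, scalar types can be declared too, so the asserts shown
// earlier become part of the signature.
function SetUsername(string $username, int $usr_id)
{
  return "$usr_id:$username";
}

try {
  SetUsername('jane', 'not a number'); // cannot be coerced to int
} catch (TypeError $e) {
  echo "Rejected: ", $e->getMessage(), "\n";
}
```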

There remains the issue of member variables, which are modified and read in many different places. This means a “check the object is in a valid state” function is a useful addition to a class, to be used as a validity check during development to catch any errors as soon as they occur.

I sometimes use the following for my checks:

class Type
{
 public static function Is($value, $type)
 {
   if (func_num_args() > 2) {
     $args = func_get_args();
     array_shift($args);
     return self::Is($value, $args);
   }

   if (is_string($type))
     return self::Is($value, array_filter(explode(' ', $type)));

   if (empty($type))
     return true;

   $first = array_shift($type);

   if ($first == 'null')
     return $value === null || self::Is($value, $type);

   if ($first == 'array') {
     if (!is_array($value))
       return false;
     $next = 0;
     foreach ($value as $key => $val) {
       if ($key != $next++)
         return false;
       if (!self::Is($val, $type))
         return false;
     }
     return true;
   }

   if ($first == 'time')
     return is_int($value) && $value >= 0;

   if ($first == 'hash') {
     if (!is_array($value))
       return false;
     foreach ($value as $val)
       if (!self::Is($val, $type))
         return false;
     return true;
   }

   if (is_callable($first))
     return call_user_func($first, $value) && self::Is($value, $type);

   if (is_callable('is_' . $first))
     return call_user_func('is_' . $first, $value) && self::Is($value, $type);

   if (class_exists($first))
     return($value instanceof $first);

   return false;
 }

 public function checkTypes()
 {
   self::check($this);
 }

 public static function check($obj)
 {
   $class = get_class($obj);
   foreach (get_class_vars($class) as $var => $value)
     if ($var[0] != '_')
       if (!self::Is($obj->$var, $value))
         throw new Exception("Type error: `$class::$var` is not of type `$value`");
  }
}

The typical use is to define a new class, then assign a default value to all type-checked variables: that default value is a type string (or array) that is parsed and verified by the check functions. For instance:

class User
{
  var $id = 'int';
  var $name = 'null string';
  var $media = 'array Media';
  var $friends = 'positive int';
  var $_hash;
}

This would check that the identifier is an integer, that the name is a string or null, that media is an array of instances of the Media class, and that friends is an integer such that is_positive($obj->friends) returns true (assuming you define that function somewhere). The hash variable is unchecked because it starts with an underscore. This has some advantages:

  • Type expressions are shorter than the corresponding assert statements.
  • They go deeper as far as checks go (for instance, arrays also check that all members are of a certain type).
  • They document the code, by explaining in the class definition what the types of the variables are, as opposed to staying in a function.
  • They help with automated testing by allowing the creation of classes with arbitrary values of the chosen type.

This also has disadvantages:

  • This prevents setting an actual default value for the variables.
  • It introduces an artificial naming convention for variables starting with or without underscores.
  • Type-checking arrays or large structures takes time.
  • It’s not detected by documentation generators.
  • It does not play well with private variables.

Find the bug!

This code summarizes some text by removing anything past 255 characters. It’s done before inserting things into a database, so that the brutal cut of a VARCHAR(255) doesn’t leave a truncated string but rather three nice dots.

function summarize($text)
{
  assert (is_string($text));
  if (strlen($text) < 255) return $text;
  return substr($text,0,253).'...';
}

Yet, however simple it might be, this function contains a bug. One day, your database will return a weird error when a user tries to save the data. If you’re lucky, it will happen to a lone user who will complain and move on. If you’re unlucky, it will happen during a million-line import and crash everything, or it will display strange things on the screen of every user around.

Can you find why?


From PHP to Firebug

I often encounter problems with the typical approach of using var_dump (or Zend_Debug::Dump) to trace through PHP code:

  • Does not work on a page that has to redirect to another page.
  • Results are difficult to see if the page is queried through AJAX.
  • The crash may happen so utterly that no data is actually output, or it’s well hidden, or otherwise destroyed by output buffering mishaps.
  • I have to look around the page to find it, and it also destroys my page layout.

The other possibility commonly used is to use error_log or log or printing to stderr or printing to a file, all of which rely on access to the server or setting up a way of displaying the data and filtering through it to see only relevant information.

I could bring in some logging facility from a framework (such as Symfony or Zend, which has a nice one), but I’d rather not add overweight dependencies: the Zend_Log_Writer_Firebug setup is kind of scary if you’re not already using enough bits of Zend to make it worth it.

So, I set out to design a system that would make things simpler. It works by sending ‘console.log()’ instructions to Firebug to display whatever it needs to display, and keeping things in the session until they can be displayed.

The code:

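 function jslog()
 {
   if (defined('NO_JSLOG'))
     return;

   $args = func_get_args();
   if (empty($args)) {
     // Output-buffering callback (a closure, since PHP 8 removed
     // create_function): inject the queued console.log() calls just
     // before </head>, then empty the queue.
     return function ($x) {
       if (strpos($x, "</head>") > 0 && isset($_SESSION["jslog"])) {
         $s = "<script type=\"text/javascript\">";
         $e = "</script></head>";
         $x = str_replace("</head>", $s.$_SESSION["jslog"].$e, $x);
         $_SESSION["jslog"] = "";
       }
       return $x;
     };
   }

   // Queue one console.log() call whose arguments are JSON-encoded.
   if (!isset($_SESSION['jslog']))
     $_SESSION['jslog'] = '';
   $_SESSION['jslog'] .=
     'console.log('.implode(',',array_map('json_encode',$args)).');';
 }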

Nothing overly complex. Initialization happens as a simple ob_start(jslog()), and everything can be disabled with a well-timed define('NO_JSLOG','') if you don’t have Firebug running. Typical usage includes being echo-like:

jslog("Hello");

Being var_dump-like:

jslog($_POST);

Being printf-like:

jslog('Earned $%.2f today', $dollars);

Being the best of both worlds:

jslog('My session is %o and my post is %o', $_SESSION, $_POST);

Displayed objects are converted to JSON and sent to Firebug, where they can be explored with the nifty DOM explorer tab that is so much easier to use than looking at var_dumped data.

JSOS : JS-PHP mapping

I’ve been working on Javascript-PHP remote call mapping techniques. A set of PHP classes with static functions are selected as a public interface and automatically exported so that the Javascript can call them. Usually, this kind of mapping involves three difficult points:

  • Detecting what functions are exported and building the appropriate JS code.
  • Detecting what function the JS is calling.
  • Transforming data between JS and PHP.

I solve these with glob(), __autoload and json_encode (respectively) in a little prototype I called JSOS (JavaScript Or Something). jQuery on the client side provides me with synchronous AJAX queries. The code (PHP with echoed Javascript) looks like this, and is placed in a controller:

<?php
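 // Load class Foo from Foo.inc.php on demand. spl_autoload_register()
 // replaces __autoload(), which PHP 8 removed; the name check keeps a
 // posted 'cls' value from pulling in an arbitrary file.
 spl_autoload_register(function ($classname) {
   if (preg_match('/^[A-Za-z_][A-Za-z_0-9]*$/', $classname)
       && is_file($classname . '.inc.php'))
     require_once($classname . '.inc.php');
 });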

 if (!isset($_POST['cls'])) {
   echo '<html><head><title>JSOS</title>';
   echo '<script type="text/javascript" src="jquery.js"></script>';
   echo '<script type="text/javascript">var jsos={};';
   echo '$.ajaxSetup({async:false,timeout:5000});';
   echo 'jsos.$=function(c,f,a){';
   echo 'var o={exception:"Server disconnect"},';
   echo 't={cls:c,func:f};';
   echo 'for(var i in a)t[""+i]=a[i];$.post(".",t,';
   echo 'function(d){o=d},"json");if("exception" in o)throw o.exception;';
   echo 'return o.result};';

   foreach (glob('*.inc.php') as $file)
   {
     $class = str_replace('.inc.php', '', $file);
     echo 'jsos.' . strtolower($class) . '={};';
     $text = file_get_contents($file);

     preg_match_all('/public static function ([A-Za-z_0-9]+)\(([^)]*)\)/',
                    $text, $func);

     foreach ($func[1] as $id => $value) {
       echo 'jsos.'.strtolower($class).'.'.strtolower($value).'=function(';

       $args = explode(',', $func[2][$id]);

       foreach (array_keys($args) as $key) {
         $args[$key] = str_replace(array(' ', '$'), '', $args[$key]);
       }

       $args = array_filter($args);

       echo implode(',', $args).'){return jsos.$';
       echo '("'.$class.'","'.strtolower($value).'",['.implode(',',$args).'])};';
     }
   }

   echo '</script></head><body></body></html>';
   exit;
 }

 $args = array();
 $cls = '';
 $func = '';
 foreach ($_POST as $key => $val) {
   if ($key === 'cls')
     $cls = $val;
   elseif ($key === 'func')
     $func = $val;
   else
     $args[(int)$key] = $val;
 }

 ksort($args);

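 // Answer with valid JSON so the client sees these errors instead of
 // the generic "Server disconnect".
 if (!class_exists($cls)) {
   echo json_encode(array('exception' => 'No such package!'));
   exit;
 }

 // is_callable() also rejects methods that cannot be called from here.
 if (!is_callable(array($cls, $func))) {
   echo json_encode(array('exception' => 'No such method!'));
   exit;
 }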

 try {
   $result = call_user_func_array(array($cls, $func), $args);
   echo json_encode(compact('result'));
   exit;
 }
 catch (Exception $e) {
   $exception = "$e";
   echo json_encode(compact('exception'));
   exit;
 }
 

All files in the current directory with a ‘.inc.php’ extension are assumed to contain a similarly-named class, and all public static functions of that class are exported. For example, suppose Test.inc.php contains the following definition:

class Test
{
  public static function Run($a) { return "$a$a"; }
}

Then, to call the Test::Run function from the above page, one would simply type in Javascript:

jsos.test.run('Hello');

And indeed, with only five lines of code, the result to a client request is computed on the server and displayed on the client again, without issues. Here’s the result running in Firebug:

[Screenshot: jsos running in Firebug]

Further work should include proper error handling, which would make debugging easier than wading through the Net tab of Firebug: right now, if the server encounters an error or outputs non-JSON data, the call dies with a “Server disconnect” exception.
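The controller's regular expression only finds methods written exactly as `public static function name(...)`. A signature written as `static public function`, or one with a default value such as `array()`, is missed or garbled. Here is a sketch of the same stub generation using PHP's Reflection API. It assumes the class is already loaded, for example through the autoloader. `jsos_stubs()` is a hypothetical name.

```php
<?php
// Hypothetical alternative to the preg_match_all() scan: ask Reflection for
// the public static methods of a loaded class and emit the same
// jsos.<class>.<method>=function(args){...} stubs as the controller does.
function jsos_stubs($class)
{
  $prefix = 'jsos.' . strtolower($class);
  $js = $prefix . '={};';
  $reflection = new ReflectionClass($class);
  foreach ($reflection->getMethods(ReflectionMethod::IS_PUBLIC) as $method) {
    if (!$method->isStatic())
      continue; // only public static methods are exported
    $params = array();
    foreach ($method->getParameters() as $param)
      $params[] = $param->getName();
    $args = implode(',', $params);
    $name = strtolower($method->getName());
    $js .= $prefix . '.' . $name . '=function(' . $args . '){return jsos.$("'
         . $class . '","' . $name . '",[' . $args . '])};';
  }
  return $js;
}

// The Test class from above, defined inline so the sketch runs on its own.
class Test
{
  public static function Run($a) { return "$a$a"; }
}

echo jsos_stubs('Test'), "\n";
```

The trade-off is that Reflection has to load every exported class to build the page, whereas the regular expression only reads the files.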

Easier Unit Tests

Automated unit testing has three main advantages:

  1. It forces you to express in detail what you want the unit to do (so that a computer may then test it by checking the results).
  2. For a unit to be testable, its results should be checked as “correct” or “incorrect” on their own or with minimal context. This makes code easier to reuse.
  3. Since it’s automatic, you do not need to test manually every time you make a change: the tests will run hourly or nightly (or for every build) and tell you what went wrong.

The cost, however, is that unit testing code must be written for every unit. That code needs to create an object, manipulate it, then check its return values for validity. And test-driven development advocates insist that such code should be written before the actual unit (make it compile, make it fail, make it pass).

I do understand their position: if you’re thinking of very specific corner cases that you might forget about while writing the unit, you might as well write tests for these corner cases first. But what about the simple, core functionality of the units? If programmers set their mind to testing a unit, they can usually look at a value and decide “it’s correct” or “it’s wrong”, and that process is orders of magnitude faster than writing an assertion that checks whether the value is correct. We’re not talking about a complex ten-argument function on a deeply inherited object with a dozen properties here; it’s more of a “this string goes in, that string goes out” concept.

An example would be a “slugify” function: plug an arbitrary string into it, and a cleaned-up, lowercase, hyphen-separated string comes out. As long as you are mindful of corner cases (weird characters, encodings, empty strings and so on), testing this function manually is quite simple. You certainly lose the benefit of point 1 (no more expressing the details of what the function should do before writing it), but you keep the benefits of point 2 (whether automatic or manual, tests make code reusable), and you can even get the benefits of point 3 by saving your manual test as a unit test.
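Here is one possible slugify, sketched in JavaScript for illustration. The exact rules (which characters count as separators, how accents are handled) are assumptions, and they are precisely what you would eyeball in a console:

```javascript
// A minimal slugify sketch (one possible definition, not a standard one):
// strip accents, lowercase, collapse every run of non-alphanumeric
// characters into a single hyphen, and trim hyphens at both ends.
function slugify(input) {
  return input
    .normalize('NFKD')                // split "é" into "e" + combining accent
    .replace(/[\u0300-\u036f]/g, '')  // drop the combining accents
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, '-')      // any other run of characters: one hyphen
    .replace(/^-+|-+$/g, '');         // no leading or trailing hyphens
}

console.log(slugify('Crème Brûlée, 2nd edition!')); // creme-brulee-2nd-edition
console.log(slugify('')); // prints an empty line
```

Characters without an ASCII decomposition, such as ‘ß’, become separators under these rules (“Straße” gives “stra-e”); whether that is acceptable is exactly the kind of corner case worth checking by hand before saving the run as a test.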

This is what happens in the console-like scaffold I have been working on:

[Screenshot: the autotest console]

You type code into the text box. Clicking “Run” (or pressing TAB RET) sends the code through AJAX to the server and retrieves the var_dumped result (as well as a highlighted version of the code). The mindful programmer can use this console to preemptively test and debug code. The excellent programmer will even adapt their code to make such testing easier: code that works in the barren context of this console can be used anywhere.

The real hit here is that little Add Cog icon from the FamFamFam Silk icon set. It’s truly beautiful and adds a touch of antialiased pixel art to an otherwise bland rounded-edges Web 2.0 console. And what it does is even better: when you click that icon, the corresponding piece of PHP is automatically added to the database of unit tests, as a test that runs the code on the green background and asserts that its output is identical to the text on the grey background.
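What the icon does amounts to a recorded-output (“golden”) test. The console’s actual implementation is not shown; the following is a minimal JavaScript sketch of the idea, in which the code under test is a function rather than a PHP snippet and JSON.stringify stands in for var_dump. All names are hypothetical:

```javascript
// Hypothetical sketch of "save this console run as a unit test".
const recordedTests = [];

// Stand-in for var_dump: any stable, human-readable dump will do.
function dump(value) {
  return String(JSON.stringify(value));
}

// Clicking the icon: store the code together with the output it produced.
function recordConsoleRun(name, code) {
  recordedTests.push({ name, code, expected: dump(code()) });
}

// Re-run every recorded test; return those whose output has changed.
function runRecordedTests() {
  const failures = [];
  for (const test of recordedTests) {
    let actual;
    try {
      actual = dump(test.code());
    } catch (error) {
      actual = `threw ${error.message}`;
    }
    if (actual !== test.expected) {
      failures.push({ name: test.name, expected: test.expected, actual });
    }
  }
  return failures;
}

let greeting = (s) => s + s;
recordConsoleRun('doubling', () => greeting('Hello'));
console.log(runRecordedTests()); // [] (recorded output still matches)
greeting = (s) => s;             // a later change breaks the behaviour...
console.log(runRecordedTests()); // ...and 'doubling' is reported as failed
```

The test asserts nothing beyond “the output is what it was when I looked at it”, which is exactly the judgment the programmer already made in the console.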

This makes the “for every bug you find, write a unit test that finds that bug” rule insanely easy to follow: find the bug using the console, correct the code, check with the console that the bug is gone, and then add the final “bug corrected” console run to the unit test database. The work spent debugging a program from this console is never lost; it all ends up as an automated test with a single click.

This also makes writing unit tests much more developer-friendly: the console lets you test your code as you write it, without having to write controllers or views or whatever other contraption you need to display the results. An excellent choice for testing-averse developers.

Keep Your URLs Together

I have worked with many novice Agile developers, and many of them tend to make the same mistake we all made while developing web sites. They are writing some kind of functionality, they need to display some information or post some data back to the server, so they have to make up a new URL on the spot.

Being Agile, they don’t have a detailed specification telling them what the URL should be. And they’re in the middle of writing something quite complex, so they can’t dedicate much brain power to making a proper choice. The end result is a hardcoded URL that they will need to change later on.

The problem here is that when a URL changes, everyone has to check their own files for uses of that URL and correct them. Yes, it would be possible to add a permanent redirect (and it often is a good idea on a live website, so that search engines such as Google keep their references), but redirects do not play well with POST requests, and what would be the point if the site has not gone live yet? So incorrect URLs get forgotten in the middle of files, and it takes a fair amount of crawler-log examination to find and replace them.

My usual practice is to keep a central list of all URLs. Since I tend to work with an __autoload strategy, I just create a Url class and use its members to return properly HTML-escaped URLs: <?=htmlspecialchars(URLROOT.'/account/confirm/'.urlencode($id))?> becomes the cleaner <?=Url::ConfirmAccount($id)?>, and the actual URL is hardcoded within the Url class as:

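class Url
{
  const ROOT = 'http://mydomain.com';

  // Link to the confirmation page of account $id.
  static function ConfirmAccount($id)
  {
    assert(is_int($id));
    return self::Local('account', 'confirm', $id);
  }

  // Build an HTML-escaped local URL from path segments; any array
  // argument is turned into GET parameters appended to the URL.
  private static function Local()
  {
    $url = self::ROOT;
    $get = '';

    foreach (func_get_args() as $segment) {
      if (is_array($segment)) {
        foreach ($segment as $getkey => $getval)
          $get .= ($get === '' ? '?' : '&')
               .  urlencode($getkey)
               .  '=' . urlencode($getval);
      } else {
        // rawurlencode() encodes spaces as %20, as a path segment needs;
        // urlencode()'s '+' only means a space inside a query string.
        $url .= '/' . rawurlencode($segment);
      }
    }

    return htmlspecialchars($url . $get);
  }
}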

This way, the URL-encoding of the segments and the final escaping of any HTML special characters that could remain within the URL are performed automatically by the function. Any associative arrays found in the argument list are converted to GET parameters, which are also properly encoded and appended to the URL. Using the URL in a non-HTML context, such as a text document or a Location: header, requires reversing the entity encoding first (for instance with htmlspecialchars_decode()), but this situation should be rare enough.
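For comparison, here is a rough JavaScript transliteration of Url::Local and Url::ConfirmAccount, for instance for client-side code that also builds links. The function names are hypothetical, ROOT is the same placeholder as in the PHP example, and escapeHtml only approximates htmlspecialchars with quotes escaped:

```javascript
// Placeholder root, hardcoded as in the PHP example.
const ROOT = 'http://mydomain.com';

// Approximation of htmlspecialchars(): escape &, <, >, " and '.
function escapeHtml(text) {
  return text
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#039;');
}

// Path segments are percent-encoded one by one; plain objects become
// GET parameters; the finished URL is HTML-escaped before being returned.
function localUrl(...parts) {
  let url = ROOT;
  const query = [];
  for (const part of parts) {
    if (part !== null && typeof part === 'object') {
      for (const [key, value] of Object.entries(part)) {
        query.push(encodeURIComponent(key) + '=' + encodeURIComponent(value));
      }
    } else {
      url += '/' + encodeURIComponent(part);
    }
  }
  return escapeHtml(query.length ? url + '?' + query.join('&') : url);
}

// Counterpart of Url::ConfirmAccount, including the integer check.
function confirmAccountUrl(id) {
  if (!Number.isInteger(id)) throw new TypeError('id must be an integer');
  return localUrl('account', 'confirm', id);
}

console.log(localUrl('search', { q: 'a b', page: 2 }));
// http://mydomain.com/search?q=a%20b&amp;page=2
```

As in the PHP version, the returned string is already HTML-escaped: convenient in templates, but it must be decoded before being used in a Location: header.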

It would of course be better to derive the root URL from the requested domain name rather than hardcoding it in the ROOT constant. I have not done so here in order to keep the example short.

The benefits of this approach are many:

  • Specifying the URL of the account confirmation page is no longer done by a random page; it’s done by the Url class. The random page merely has to state that it wants to link to the account confirmation page. If the account confirmation URL changes (such as /account-confirm instead of /account/confirm), all modifications happen in a single place.
  • The programmer who uses the URL does not need to remember the format used to provide the data: if a URL is built from several arguments, those arguments can be named, documented and checked by the PHP code.
  • Everything within the URL is properly escaped before it is returned: the output of a URL function is always a properly formatted URL with all special characters encoded as HTML entities. This way, no invalid URLs will ever appear within the generated HTML.

Of course, for this to work, functions of the Url class should never be called with constant arguments: that would be akin to hardcoding those addresses. While the other benefits would remain, changing the meaning of these arguments would have the same rippling effects throughout the code. So, whenever you need to call a function with constant arguments, create a new function whose name explains what the URL-with-constant-arguments is. For instance, “ConfirmAccount(0)” might be described as “ConfirmRootAccount()”, thereby shielding you from a change in what a root account is.
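A minimal sketch of that advice, in JavaScript for illustration. ROOT_ACCOUNT_ID and both function names are hypothetical, and HTML escaping is omitted for brevity; the point is that the magic 0 appears in exactly one named place, so callers never write it themselves:

```javascript
// Hypothetical: the one place that knows which id the root account has.
const ROOT_ACCOUNT_ID = 0;

function confirmAccount(id) {
  if (!Number.isInteger(id)) throw new TypeError('id must be an integer');
  return 'http://mydomain.com/account/confirm/' + encodeURIComponent(id);
}

// Callers ask for "the root account's confirmation page", not for account 0;
// if the root account's id ever changes, only ROOT_ACCOUNT_ID is edited.
function confirmRootAccount() {
  return confirmAccount(ROOT_ACCOUNT_ID);
}

console.log(confirmRootAccount()); // http://mydomain.com/account/confirm/0
```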