Monthly Archive for September, 2009

chain()

Like many other languages, PHP is home to method chaining, a pattern that allows calling several mutators on the same object without having to name it more than once. A typical example can be found in the Zend Framework, for the configuration of e-mails among other things:

$mail = new Zend_Mail();
$mail -> setBodyText('This is the text of the mail.')
      -> setFrom('somebody@example.com', 'Some Sender')
      -> addTo('somebody_else@example.com', 'Some Recipient')
      -> setSubject('TestSubject');

This is a very simple trick that is accomplished by having every mutator return the object itself.
However, the PHP syntax rules (up to PHP 5.3) forbid calling a member function directly on the result of a new-expression, so you always need a two-step sequence: initialize the object, then call its chain of mutators.

Of course, a simple solution is to use a function:

 function chain($obj) { return $obj; }

 $mail = chain(new Zend_Mail())
   -> setBodyText('This is the text of the mail.')
   -> setFrom('somebody@example.com', 'Some Sender')
   -> addTo('somebody_else@example.com', 'Some Recipient')
   -> setSubject('TestSubject');
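For readers without the Zend Framework at hand, the same trick can be tried with a stand-in class. This is only a sketch: `Message` is a hypothetical class invented for the example, not part of any library, and its two setters simply return `$this`.

```php
<?php
// Hypothetical stand-in for a fluent class such as Zend_Mail.
class Message
{
    public $from = null;
    public $subject = null;

    public function setFrom($from) {
        $this->from = $from;
        return $this; // returning $this is what makes chaining possible
    }

    public function setSubject($subject) {
        $this->subject = $subject;
        return $this;
    }
}

// Identity function: turns the new-expression into an ordinary call result.
function chain($obj) { return $obj; }

$message = chain(new Message())
    ->setFrom('somebody@example.com')
    ->setSubject('TestSubject');

// On PHP 5.4 and later, the parenthesized form works without a helper.
$other = (new Message())->setSubject('TestSubject');
```

The helper adds nothing at runtime; it only exists to give the parser a function call to hang the first arrow on.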

In a similar vein, there’s the matter of using the method chaining pattern on objects that were not designed for that. This is where a quick wrapper can come in handy:

 // Define the appropriate class and function
 class WithWrapper
 {
   public $value;
   public function __construct($obj) {
     $this -> value = $obj;
   }
   public function __call($name, $args) {
     assert (count($args) === 1);
     $this -> value -> $name = $args[0];
     return $this;
   }
 }

 function with($obj) {
   return new WithWrapper($obj);
 }

 // A typical record class
 class Person
 {
   public $age;
   public $firstName;
   public $lastName;
   public $married;
 }

 // Create entry for Jane
 $jane = with(new Person())
   -> age(24)
   -> firstName("Jane")
   -> lastName("Smith")
   -> married(false)
   -> value;

 // Jane gets married
 with($jane)
   -> lastName("Brown")
   -> married(true);

This is starting to look like Visual Basic.

Filling in the Holes

As the technical lead on a software project, I get to interact on a daily basis with stakeholders that are technologically impaired. They think in high-level, end user terms like « I need a comment system » or « send the user an e-mail notification » and they expect things to happen without having to delve into the boring techno-babblish details of how it’s done. The rationale is that deciding what happens is a stakeholder job and deciding how it happens is a developer job.

Of course, no matter how hard you try to separate the two, developers are sometimes going to decide what happens, because no stakeholder can spare the time needed to walk the development team through the bloody details of every single feature. Given the productivity gains from recent advances in development tools and the social skills of the average programmer, I think it’s fair to say that an in-depth description of a feature takes about as long as the implementation of that same feature.

This is why all projects follow the same steps regardless of the methodology used:

  1. A stakeholder makes some general statement about a feature, such as user comments being available on certain items.
  2. The development team writes what they feel is the best implementation of that feature in the context of that project, filling in the missing details as they go.
  3. The stakeholder sees the results and points out what details did not match their mental model of the requested features.

This introduces several dangers: there’s the budget issue when missing details turn out to be costlier than originally envisioned, and there’s the mismatch issue that makes the customer unhappy. A good project manager should strive to reduce both. How?

Gathering more requirements is a classic strategy in waterfall models. The basic reasoning is that the more details you manage to gather about the product to be implemented, the lower the chances of a surprise requirement blowing your budget away and the higher the chances of meeting the customer’s expectations. The downside is that this step takes time, which in turn uses up the budget and delays the release.

Also be careful when deciding on a budget after having gathered all the requirements. Everyone changes their mind sooner or later, and any requirement change, no matter how small, should prompt a critical analysis of the budget: adding « just one link » is indeed a tiny change, but the overhead involved (change the internal documentation, determine the impact on other features, tell the developer, write the code, test the changes) adds up much faster than you think.

Fast iterations involving the stakeholders are the Scrum approach, shared with many agile methodologies. It deals with budget issues by developing the simplest possible implementation that matches the requirements. It also provides the customer with feedback on the estimated implementation time through planning poker before every iteration, so that requirements can be changed on the fly if sacrificing a small feature can significantly reduce costs.

Short-iteration agile projects build customer dissatisfaction into the development process: if you don’t like the existing implementation, you can ask for a change and get it done on the next iteration. It also lets the developers decide based on technical considerations (what’s easier to implement) as opposed to high-level decisions (how the stakeholder wants a feature to behave), which lets them work faster and do what they are skilled at.

The downside to these approaches is that stakeholder involvement cannot be taken for granted, and even when stakeholders are involved, it’s not uncommon for customers to disagree among themselves. Also, an agile process does not help if there’s a fixed deadline and a fixed set of 1.0 features, and the customer expects these to be done in time.

Having developers with common sense helps a lot: all the people working on the project should be able to tell ahead of time whether a given solution is going to be unacceptable, and dismiss it if so. This avoids implementing useless solutions, or forwarding a useless solution to a customer for validation.

The obvious corollary is that a developer should write bug-free code without having to be told to write bug-free code. Commits that contain segmentation faults, access violations, unhandled exceptions, blank pages, broken links or performance bottlenecks should be investigated to determine why a mistake was made and what steps should be taken to avoid repeating it.

What other techniques have you come up with for reducing communication-related risks?

Last Minute Skin

Right now, we render our page layout on the server, thus wasting precious bandwidth sending the same header, footer and menus all over again every single time. AJAX techniques have evolved to reload only the inner part of every page, but they require clever URL manipulation or ‘back’, ‘refresh’ and bookmarks won’t work, and they impose strong constraints on page layout and on the way the server responds to requests.

Why not do it the other way around? Have every page include the same layout-generating JavaScript file (kept in the browser cache for optimum performance)! This is the idea behind the last-minute-skin pattern.

Now we know why…

…the stock market crashed.

Left to the reader

PHP best practices have been moving steadily towards putting all functions inside classes, if only to provide namespacing. The good news is that you have no more namespace collision issues (well, unless you join together two projects with different conventions), and the bad news is that your function names are starting to get quite long.

<?php echo Framework_Html::Escape($username); ?>

Escaping strings to be output in HTML documents is quite a common operation in PHP websites. Is the risk of a name collision really worth giving up on a shorter approach, like:

<?=esc($username)?>

I am a proponent of turning very common operations into short functions with appropriate “smart” behavior. For instance:

  • esc(string $string) returns a Framework_Html instance representing the string escaped with htmlspecialchars.
  • esc(Framework_Html $html) returns its argument as-is, so you don’t have to care about whether a given string has already been escaped or not.
  • esc($format, $a, $b, $c...) returns a Framework_Html instance representing the string sprintf($format, esc($a), esc($b), esc($c)), where the format itself is left unescaped. This is useful to avoid repeated escaping in, say, <a href="%s">%s</a> .
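The three behaviors listed above could be sketched roughly as follows. This is an illustration under assumptions, not an existing implementation: the `Framework_Html` class is reduced to a hypothetical wrapper around an already-escaped string, the charset is assumed to be UTF-8, and the variadic signature requires PHP 5.6 or later.

```php
<?php
// Hypothetical marker class: wraps a string that is already safe to output as HTML.
class Framework_Html
{
    private $html;

    public function __construct($html) {
        $this->html = $html;
    }

    public function __toString() {
        return $this->html;
    }
}

function esc($value, ...$args) {
    // Already escaped: return as-is, so escaping twice is harmless.
    if ($value instanceof Framework_Html) {
        return $value;
    }
    // Single argument: a plain string that must be escaped.
    if (count($args) === 0) {
        return new Framework_Html(htmlspecialchars((string) $value, ENT_QUOTES, 'UTF-8'));
    }
    // Several arguments: $value is a trusted format string, only the arguments are escaped.
    $escaped = array_map(function ($arg) { return (string) esc($arg); }, $args);
    return new Framework_Html(vsprintf($value, $escaped));
}

$name = 'Tom & "Jerry"';
echo esc('<a href="%s">%s</a>', '/users?id=1&tab=2', $name), "\n";
// prints: <a href="/users?id=1&amp;tab=2">Tom &amp; &quot;Jerry&quot;</a>
```

Because the result is an object rather than a string, passing it through esc() a second time is a no-op, which is the whole point of the second form.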

In a similar vein:

  • func(callback $call) returns its argument (after checking that is_callable($call) is true). This serves as a piece of documentation to tell that something is a function.
  • func(object $obj, string $func) returns a callback representing the member function $func of object $obj.
  • func(string $class, string $func) returns a callback representing the static member function $func of class $class.
  • func(string $args, string $body) acts as a shorter alias for create_function.
  • func(string $body) acts as an alias for create_function('$_',"return $body;"), for those cases where you need a very short lambda expression.
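As a rough sketch, the first three forms of func() can be written on top of PHP's array callbacks. The create_function() aliases are left out on purpose: create_function() was deprecated in PHP 7.2 and removed in PHP 8.0, and two-string calls would also need a rule to tell a class name from an argument list. The `Greeter` class is hypothetical and exists only to exercise the example.

```php
<?php
// Sketch of the first three func() forms; the create_function() aliases are omitted.
function func($first, $second = null) {
    if ($second === null) {
        // func(callback $call): check it and hand it back unchanged.
        if (!is_callable($first)) {
            throw new InvalidArgumentException('func() expects a callable');
        }
        return $first;
    }
    // func(object $obj, string $func) and func(string $class, string $func)
    // are both expressed as PHP's two-element array callback.
    $callback = array($first, $second);
    if (!is_callable($callback)) {
        throw new InvalidArgumentException('func() could not resolve the method');
    }
    return $callback;
}

// Hypothetical class used only to exercise the sketch.
class Greeter
{
    public $greeting = 'Hello';
    public function greet($name) { return $this->greeting . ', ' . $name; }
    public static function shout($name) { return strtoupper($name) . '!'; }
}

echo call_user_func(func('strtoupper'), 'jane'), "\n";           // JANE
echo call_user_func(func(new Greeter(), 'greet'), 'Jane'), "\n"; // Hello, Jane
echo call_user_func(func('Greeter', 'shout'), 'jane'), "\n";     // JANE!
```

Failing early with an exception means a typo in a method name shows up where the callback is created rather than where it is eventually called.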

And of course, there are the jslog() and is() functions discussed earlier on the blog.

I think there would be a small handful of functions, maybe 8 or 10, that would be used so often on a given project that everyone would have to know about them anyway—so, you might as well keep them out of any class.

Dashes vs Underscores

When you optimize your website for search engines, you have to take every little facet into account. Every character in a URL is a weapon for getting a better ranking than your competitors.

Which leads to quite silly bikeshed conversations.

I have heard that when part of a URL, foo-bar is considered by Google to be a single word, while foo_bar is considered to be two words. I have also heard that foo-bar is treated as two words and foo_bar is treated as one. And I have also heard that both foo-bar and foo_bar are treated equally as two words. The variety of dates on the available resources (anywhere from 2001 to 2009) makes it even harder, as I suspect Google has been evolving their algorithms on the subject over the last eight years.

Ironically, a search for “dashes vs underscores” reveals (in the top five ranks) websites that use either underscores or dashes as separators, further adding to the confusion. What is true (and easily verified) is that when part of a search query, foo-bar is treated as two words and foo_bar is treated as one word.

It’s important to notice, however, that search engines don’t exist in a vacuum. They have to take into account whatever is the most prevalent way of presenting information. And it appears, from the many websites that use the “dashes” convention, that the “dashes as two words, underscores as one word” side of the debate has won. WordPress? Dashes. Magento? Dashes. Amazon? Dashes. Google’s own Blogger? Dashes.

So, even if the “dashes as two words, underscores as one word” side was wrong to begin with, it has become so prevalent today that it would be foolish for Google not to change their algorithm in the face of such unambiguous adoption of a word separation convention.

Besides, underscores look ugly :)