Archive for the 'Dynamic' Category

Six Comment System Tips

If you’re trying to set up a comment system for your web site, there are a few tips I want to share with you:

  1. At first, moderate everything. The spam bots will find you faster than the real users will, and despite genuine advances in anti-spam technology, some spam will slip through. Once you get a decent number of readers who comment on your post, disable comment moderation.
  2. Let people provide their web site address. Not only does this motivate people with blogs to comment on a high-traffic web site (because it brings them traffic), but this makes your web site a good place for other readers to find interesting links.
  3. If you ask people for their e-mail address, do not spam them—a thank you e-mail is usually fine, but unexpected mail from a web site breeds anger and hatred.
  4. Never ask for e-mail validation before you let people post comments. 99% will give up, and by the time the validation e-mail arrives in their inbox, 99% of those remaining will have lost their interest in writing a comment.
  5. If you don’t have too many comments, respond to as many as you can. What you need is a conversation and a feeling of mutual respect and empathy; otherwise, people won’t come back. If they provide a blog link, go read that blog and comment there as well.
  6. Don’t use captcha validation. If you do, make sure people only have to enter it once (as opposed to once every time they write a new comment).

Any tips you wish to share? Please tell me about them in the (snicker) comments below.

Five Bad Reasons for using Magento

If you’ve been paying any attention at all to the e-Commerce universe the last few months, you’ve heard about Magento. It’s easy to find resources online explaining in great detail why Varien is a metaphorical messiah and Magento is the second coming and you should start using it right now.

I’m not saying that « Magento sucks. » There are plenty of good reasons for using Magento. It’s reliable, backed by an active community and used by many people. There are also bad reasons for choosing it, and if you don’t know better, you will end up being disappointed.

Here are the five bad reasons I hear the most:

1. Magento is « fully customizable » by dummies

Magento is as customizable as any other open source solution : you can code away any issues you have. If you can code, that is. Sure, there’s a fair amount of customization you can achieve without ever leaving the Magento back-office (sometimes at the cost of learning XML), but unless you learn how to code or spend money on it, you can easily reach a hard limit. Don’t choose Magento because you think you’ll be able to do anything you want.

The best way to use Magento is still to pay for someone to customize it for you, and stick to the basic functionality.

2. Magento is a complete e-Commerce package

Magento is just a piece of software. This means that, once installed, you will need to do the marketing yourself, which is hard if you’re not used to internet marketing and don’t have an existing high-traffic web site to rely on. You will have to host your web site (and make backups). And you will have to do any administrative tasks related to storing user information too, such as registering with government agencies.

If you don’t want to do any of that, try looking at Selling on Amazon instead.

3. Magento has been used by [large corporation]

The large corporation does not succeed because it used Magento. It succeeds because it can spend money and hire talent to leverage Magento appropriately. There’s work involved in creating a successful e-Commerce site, so make sure you can take whatever steps are necessary to create one with your tool.

Besides, almost every tool has been used by one corporation or another, including homebrew solutions like Amazon’s Obidos. To say that something has been used by a large corporation only means it’s somewhat useful, not necessarily that it’s the best thing around.

4. Magento is free

Oh, please. Magento is cheap, but certainly not free. Even assuming that you have the skills to set up and customize Magento on your own, doing so still takes time. Plus, you need hosting, accounting, logistics, shipping. And selling stuff online involves more work than just plugging products into a web site and waiting for customers to come! Setting up an e-Commerce operation is an investment, no matter how you look at it.

Or, as Jason Cohen has phrased it quite admirably:

Open-source is free like puppies are free. You don’t write a check to get it, but you have to support it for life. Your employee’s time is not free. Working around bugs is not free. Having nothing but the Web of Lies Internet to rely on for tech support is not free.

5. Magento is a complete, standalone product

This sounds like a good idea in theory — a completely standalone solution that can be used by everyone and handles everything: buying, storing, marketing, advertising, selling, invoicing, shipping… until you need to make it talk to other software. If you’re not lucky enough to use a big-name piece of software that has Magento connectors available, the application that handles your inventory or your accounting or your web site will not be connected to your e-Commerce web site.

So, you will either have to pay for a connector to be written, or copy over all the data by hand.

Did you choose Magento for a bad reason? Or did you ever give up on Magento for a bad reason only to find out later that you should have stayed the course? Make sure you mention it in the comments!

Related Posts

  • Hacking Magento : a peek at various common security vulnerabilities and whether Magento is subject to them
  • Jamin-Puech : a Magento-based web site I worked on
  • Seul avec l’Open Source [fr] : why open source is only half the solution, and why your own efforts will make a difference

Autoloading : be friendly to intruders

Back in the old days of PHP 4, every script started with a shopping list of include statements for other files.

PHP 5 brought along the __autoload function, and people were overjoyed. Since most programmers already had some kind of mental rule that said « class Foo is defined in Foo.php, » PHP let those programmers write down the rule and then followed it when looking for classes that had not been defined yet. A simple example would be:

function __autoload($classname)
{
  @include "$classname.php";
}

The classic PHP 5 architecture moved from « write a shopping list at the top of every script » to « include the file that defines __autoload » and even « redirect all requests to a single index.php file that defines __autoload and dispatches the requests. »

And the tutorialosphere went wild. People everywhere discovered the power of autoloading and expounded on the usage of __autoload as the next step in PHP evolution. A Bing search for __autoload (or google) will bring up many such one-page tutorials that discuss the benefits of that function for the sake of wide adoption.

But meanwhile, __autoload’s little sister spl_autoload_register remained unknown, despite a major difference:

If there must be multiple autoload functions, spl_autoload_register() allows for this. It effectively creates a queue of autoload functions, and runs through each of them in the order they are defined. By contrast, __autoload() may only be defined once.

With __autoload, your code breaks if you ever need to interact with code that uses its own autoloading approach. While you can usually turn __autoload into spl_autoload_register in a few key presses, you might not have sufficient control over the code to make that change.

Case in point: Joomla! is a well-known content management system (often said to be the third of the Drupal-Wordpress-Joomla! triumvirate of PHP CMS solutions). Since version 1.5, it uses __autoload. It looks like this:

function __autoload($class)
{
  if(JLoader::load($class)) {
    return true;
  }
  return false;
}

If you need to make Joomla! and the Zend Framework talk to each other, you need to include Zend Framework files by hand because you can’t add Zend_Loader on top of __autoload.  While it would be possible to change Joomla! to use spl_autoload_register instead of __autoload, this change will probably be overwritten by the next update you download.

In short, if you write code that will be used by people who do not own it (in the sense that they can change it without annoying side-effects), you need to use spl_autoload_register() instead of __autoload().

In the case of Joomla!, a simple patch would be to remove the __autoload() function definition and replace it with:

spl_autoload_register(array('JLoader','load'));

(In fact, there has already been one such suggestion made there).

Related posts

  • PHP Autoloading : yes, I made that mistake once, too
  • Pervasive code : an unusual class-to-file mapping in JITBrain
  • Singletons : having a single autoloader carries the typical issues of one-instance-only entities

MySQL (Un)Maintenance Trick

Not so long ago, I discussed the puzzling fact that in JavaScript, if(x) is not equivalent to if(x == true). Today, I stumbled upon a similar occurrence in MySQL.

The Problem

Consider the following table, containing arbitrary text with an «alive» boolean flag:

CREATE TABLE tested (
  txt CHAR(32) NOT NULL,
  alive BOOLEAN NOT NULL,
  PRIMARY KEY(txt),
  KEY(alive)
);

INSERT INTO tested (txt,alive) VALUES
( MD5(1), FALSE ),
( MD5(2), FALSE ),
( MD5(3), FALSE ),
( MD5(4), TRUE ),
( MD5(5), TRUE );

I want to display all the lines that are marked as alive, sorted by their text field. What is the difference between these two requests?

SELECT txt FROM tested WHERE alive ORDER BY txt;
SELECT txt FROM tested WHERE alive = TRUE ORDER BY txt;

And the answer is… both queries will return the same result set! But let’s EXPLAIN them, just in case.

query                 type   possible_keys   key     rows
WHERE alive           ALL    -               -       5
WHERE alive = TRUE    ref    alive           alive   2

The first query will scan through the entire table, whereas the second query will use the index to only run through lines that are still alive. If your table consists of 99% dead elements, the first query will be a hundred times slower than the second one!

The Reason

The fundamental reason for this behavior can be found in the MySQL documentation:

These types are synonyms for TINYINT(1). A value of zero is considered false. Nonzero values are considered true:

However, the values TRUE and FALSE are merely aliases for 1 and 0, respectively, as shown here:

In short, that boolean column does not hold a boolean value at all, but an integer. This means it can contain values that are neither TRUE nor FALSE, such as 2. Such a value would be returned by the first query, but not the second, so the query optimizer is not allowed to turn the first one into the second one. And «this column evaluates to true in a boolean context» is not easily expressed as a key constraint, whereas «this column equals one» is the textbook definition of a key constraint. This explains why the second query is faster.

It also means that the second query might start behaving incorrectly if a non-TRUE, non-FALSE value finds its way into that column.

The Solution

The good news is that NOT foo is mathematically equivalent to foo = FALSE, so that the constraint can be easily rewritten by turning the «alive» property into a «dead» property. Both queries become equivalent, so the second query is a faster yet functionally identical alternative:

CREATE TABLE tested (
  txt CHAR(32) NOT NULL,
  dead BOOLEAN NOT NULL,
  PRIMARY KEY(txt),
  KEY(dead)
);

INSERT INTO tested (txt,dead) VALUES
( MD5(1), TRUE ),
( MD5(2), TRUE ),
( MD5(3), TRUE ),
( MD5(4), FALSE ),
( MD5(5), FALSE );

SELECT txt FROM tested WHERE dead = FALSE ORDER BY txt;

Back/Refresh/Bookmark

A feature shared by almost any web search engine in existence is the ability to return to the list of results if the result you check out wasn’t what you hoped it would be. Even better, you actually return to the same page in the search results that you were on in the first place, so you can just resume your search.

Sounds obvious?

It is, when you’re using a technology straight out of the 90s such as plain vanilla HTTP. The original problem that led to the design of HTTP was that you had many resources publicly available on the web, but there was no simple way to tell your friends where the latest naughty lady bitmaps, er, important accounting files were. Thus were born the Uniform Resource Locators, known nowadays as URLs, which obviously helped locate resources in a way that was uniform across servers. Of course, for this to work, reaching out for the same URL twice should bring back the same piece of data both times (or, at the very least, a piece of data that is similar within reasonable expectations, such as being the “latest version” of the same data).

This is what HTTP GET does: locate and fetch a resource from the tubes, bringing back the same resource every single time.

The GET method means retrieve whatever information (in the form of an entity) is identified by the Request-URI.

This is why your web browser can implement back, refresh and bookmark but your SSH client can’t. And this is also why HTTP POST does not play nice with back, refresh and bookmark (it involves posting data, which may have an impact if done more than once, such as double-posting on a forum).

The Good

Google et alii play into this game: every search result page is a resource, which can be located through a URL that contains all the information about that page. Of course, the resource is generated on the fly on the server, but to the user (and the browser) it looks as if a static resource was present on the internet, so pressing the back button brings you back to the same URL, and this yields the expected data.

Note that many web browsers will also remember some other things about the visited page, such as the scroll position and any text you entered in the fields. This is important, because this lets you refresh a page without losing the data you entered.

The Bad

Of course, it’s hard to think in terms of resources-each-bound-to-locators when you’re not dealing with resources. And besides, the “user gets form, user posts form, user is redirected to new page” mechanism can be quite limiting when you’re trying to be clever. And it all looks so different from how desktop applications work!

I am regularly brought to the brink of intellectual suicide when faced with “web applications” that maul the very spirit of HTTP with spiked baseball bats covered in hot pepper oil. No back, no refresh, no bookmark… These designers hope (and in this, they are quite correct) that as long as their application looks and feels like an application, there will be no affordances that would cause the user to use these forbidden buttons.

A list of elements is the most ubiquitous example in modern computing: clicking on an element opens a detailed view of that element. The “application” way of doing this is opening a new window to display the details, so that the user may close the window or cancel their way out of it to go back to the list. No sane user will try to use the back feature to return to the list, because that’s not how the desktop taught us to behave.

This can be imitated on the web with a new browser window, or with a modal window using JavaScript and (possibly) AJAX.

On the other hand, if the application replaces the list with the detailed view, then it’s using the “web” way of doing things, and the “show me the previous screen” reflex of pressing back will kick in really soon. Too bad pressing back breaks the application. And pressing forward again to return to a reasonable state won’t work either.

I will not delve into the sheer stupidity of moving against affordances that are so deeply ingrained in the average internet user for design reasons (and even doing so for technical reasons is pretty bad).

The Ugly

And then, there’s the matter of AJAX. The good thing about AJAX is that it lets you dynamically update only a part of your page, without having to refresh the rest. The bad thing about AJAX is that it lets you dynamically update only a part of your page, without having to refresh the rest.

That’s because every single thing you did on your AJAX page will be lost the moment your user presses back or refresh. If your lists are managed through AJAX, then the user will navigate to page 3, then click on the link, which will bring them to another page, then press back, which will (as expected) bring them back to the page with the list… that was reset to page 1, because there’s no way for the browser to know that the complex javascript state that kept your list on page 3 had to be remembered in the history.

For a real-life example, try this link (from the ExtJS library), click around a bit, and then refresh — you’re back to step zero. Since everything happens on a single page in a new tab/window, there’s no back button to be used.

The good news is that there are tricks available for handling back/refresh/bookmark in JavaScript. Another example would be this one, from ExtJS again, which handles the back part almost correctly (still no luck on refresh/bookmark), or this example based on the excellent jQuery Address plugin, which manages to do all three properly.

The hash : a hack on top of a hack

Back in the days when documents could be quite large in order to be handled offline as a single file, people had to come up with a way of navigating through them. The adopted solution was to use anchors. Somewhere in your file, you would add a named anchor, such as <a name="myAnchor">. Then, appending #myAnchor to the URL of the document would scroll the anchor into view (well, the anchor was invisible, but you get my meaning).

So, you could link to a part of a document from the document itself by using the URL-with-hash as the target of a link. And you could link to it from other pages, which let you send your readers to the appropriate location on the referenced document.

It became fairly standard for web browsers to implement this system so that:

  • Clicking on an anchor (or changing the anchor part of the URL) would not reload the document if it was already being displayed.
  • The anchor was taken into account by back/refresh/bookmark.
  • JavaScript would have read-write access to the anchor part of the URL (usually as window.location.hash).

This, in turn, provides several ways of solving the back/refresh/bookmark in heavy JavaScript situations.

The first one, used by the ExtJS example above, is to alter the hash every single time you change the application state in a relevant way (what is relevant is left for the developer to decide), and simultaneously push information about the state in some kind of storage.

Then, as the user presses forward and back, the hash-changing will keep the user on the same page and let the JavaScript read from the storage whatever state needs to be applied to elements and restore that state accordingly. The problem is that, when the page is refreshed (or bookmarked and loaded later), everything but the hash will disappear in a puff of garbage-collected smoke, so that no memory of the stored states will remain.

The second solution is to store those states in the hash, so that they will be available even if you refresh, or bookmark the link, or give it to someone else.  This is what the jQuery Address example does, by storing the name of the selected tab in the hash, and then reading that name from the hash to activate the relevant tab whenever back or refresh happen.
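As a rough illustration of that second solution, here is a minimal sketch of a serialization pair that turns a flat state object into a hash string and back. The function names (serializeState, parseState) and the key=value&key=value format are my own assumptions for the example, not what jQuery Address does internally.

```javascript
// Turn a flat state object into a hash-storable string, e.g. "page=3&tab=2".
// Keys are sorted so that equal states always produce equal hashes.
function serializeState(state) {
  return Object.keys(state)
    .sort()
    .map(function (key) {
      return encodeURIComponent(key) + '=' + encodeURIComponent(String(state[key]));
    })
    .join('&');
}

// Parse a hash (with or without the leading "#") back into an object.
// All values come back as strings.
function parseState(hash) {
  var state = {};
  var text = hash.replace(/^#/, '');
  if (text === '') return state;
  text.split('&').forEach(function (pair) {
    var idx = pair.indexOf('=');
    var key = idx === -1 ? pair : pair.slice(0, idx);
    var value = idx === -1 ? '' : pair.slice(idx + 1);
    state[decodeURIComponent(key)] = decodeURIComponent(value);
  });
  return state;
}

// Browser wiring, skipped when there is no window (for instance in Node).
if (typeof window !== 'undefined') {
  window.addEventListener('hashchange', function () {
    // A real application would apply this state to its components here.
    console.log('state read from hash:', parseState(window.location.hash));
  });
}
```

Because the whole state lives in the hash, the same parseState call works on back, on refresh and when someone opens a bookmarked link.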

Deep Linking

Quick note: this solution is sometimes referred to as deep linking. In fact, deep linking represents the very ability to open any page in a web site by entering its URL (so, if you find this article in a Google search, then Google is by definition deep linking into my site so you can access the article directly). As explained above, deep linking is a natural consequence of how HTTP works:

The conclusion is that any attempt to forbid the practice of “Deep Linking” is based on a misunderstanding of the technology [...]

The entire point of those hash-based tricks is to enable (or restore) deep linking in heavy JavaScript web sites. Quite ironic, given that not so long ago we were trying to forbid it.

The Middle Path : a hash behind the hash

Of course, between those two solutions, there’s the middle way: store some fundamental information in the hash, and keep the rest in a separate storage area. This way, back will be fully functional, refresh will still manage to keep enough relevant information around to be useful, and the hash will be small enough so that it remains below allowed size limits.

This third strategy is quite simple to manage by starting from the second approach, keeping a global hash table to store the detailed state, and adding the key of the current detailed state to the summarized state.

This is basically saying, store the current (detailed) state of components A and B in the hash table with key 0x3ffc, then store 0x3ffc in the hash along with the current (summarized) state of component C. When that hash is reactivated because of back, the detailed state will be available based on the key, and when reactivated because of refresh, the summarized state will be restored and the unavailable detailed state will be replaced by defaults.
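Here is a small sketch of that middle path, under a few assumptions of mine: the detailed states live in a plain in-memory object (so they vanish on refresh, which is the point), the keys are simple hexadecimal counters, and the hash format key=...&c=... is made up for the example.

```javascript
// In-memory table of detailed states. It does not survive a refresh.
var detailedStates = {};
var nextKey = 1;

// Store the detailed state under a fresh key and return the hash string
// that carries the key along with the summarized state.
function saveState(summary, detail) {
  var key = (nextKey++).toString(16);
  detailedStates[key] = detail;
  return 'key=' + key + '&c=' + encodeURIComponent(summary);
}

// Read a hash back. The summary always comes from the hash itself; the
// detail comes from the table if the key is still known (back/forward),
// and from the defaults otherwise (refresh, bookmark).
function restoreState(hash, defaults) {
  var match = /^#?key=([0-9a-f]+)&c=(.*)$/.exec(hash);
  if (!match) return { summary: null, detail: defaults };
  var known = Object.prototype.hasOwnProperty.call(detailedStates, match[1]);
  return {
    summary: decodeURIComponent(match[2]),
    detail: known ? detailedStates[match[1]] : defaults
  };
}
```

The hash stays short no matter how large the detailed state grows, since only the key travels in the URL.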

Model Zero : Keep Stuff Around

The simplest situation for this kind of thing is when developers absolutely need to keep some part of the page on the user’s computer. It could be, for instance, a media player that has to remain present on the page even as the user browses the web site. Refreshing the page would drop the media player, which would cause the music to stop, so the page must stay, and content must be updated on the fly.

Of course, you don’t want to have AJAX-related link problems: Google does not follow JavaScript links, so if the entire link tree of your website is implemented in JavaScript, only your home page will be known to Google.

A classic design technique found on Deezer is to keep two URLs for every page. One URL contains the hash, the other doesn’t. By default, links use the non-hash URL, and as such can be followed by Google. If JavaScript is enabled, links automatically start using the hash URL instead.

For example, the non-hash URL for a search for “nicollet” on Deezer is:

http://www.deezer.com/en/music/result/all/nicollet

The hash URL for that same search is:

http://www.deezer.com/en/#music/result/all/nicollet

Both display the same page, so the algorithm for displaying the hash URL is basically, “load all the data for the non-hash URL and use it, while keeping the few elements that have to stay around, such as the music player”. Quite simple, and it does not detract a lot from the typical way of designing web pages in a non-AJAX world.
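To make the two-URL idea concrete, here is a minimal sketch of the conversion between the two forms. The function names and the choice of "/en/" as the root page are assumptions taken from the Deezer example above, not Deezer's actual code.

```javascript
// Convert "/en/music/result/all/nicollet" into "/en/#music/result/all/nicollet".
// `root` is the single page that hosts the elements that must stay around.
function toHashUrl(path, root) {
  if (path.indexOf(root) !== 0) return path; // not under the root: leave it alone
  return root + '#' + path.slice(root.length);
}

// Inverse: which plain URL should be loaded for a given hash?
function toPlainUrl(hash, root) {
  return root + hash.replace(/^#/, '');
}

// Browser wiring, skipped when there is no document (for instance in Node):
// links are written with plain URLs in the HTML, and only switch to hash
// URLs when JavaScript actually runs.
if (typeof document !== 'undefined') {
  var links = document.querySelectorAll('a[href]');
  for (var i = 0; i < links.length; i++) {
    var href = links[i].getAttribute('href');
    links[i].setAttribute('href', toHashUrl(href, '/en/'));
  }
}
```

Crawlers that ignore JavaScript keep seeing the plain links, while regular visitors stay on the root page and only the hash changes.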

Sure, Deezer is flash-based, but equivalent JavaScript techniques are used by Facebook when they need to keep a chat box around, and (if you need a public JS-based example anyone can visit) by Jukebo.fr.

This is actually what the jQuery Address plugin was designed to do: run through all the links on a given page and turn their no-hash URL targets into hash-based URL targets, then provide the user with a “hash was changed” event that they can respond to by doing whatever they feel should be done (such as loading bits of data from the server for the new URL).

As a whole, Model Zero is fairly simple to implement and use, and requires only minimal support from the server, such as controllers that display partial views instead of complete views if called with a certain parameter.

Model One : Components on a Page

… starring Samuel L. Jackson. This is the simplest model that does not involve simply mirroring server-side pages on the client, and it usually happens when a normal page contains some kind of JavaScript component, such as jQuery UI tabs that are expected to be refreshed appropriately, or an ExtJS grid with complex state.

The assumption here is that you don’t care about Google following the links in your grid (jQuery UI tabs decay to a nice HTML markup when JS is disabled, so they would create no problems), but you do care about your user using back/refresh/bookmark on the page.

The basic idea is that every single component on the page has to be connected to the history handler (be it Ext.History, $.address or anything else) so that they notify it of their state changes, and are notified in return of external address changes.

Depending on which component you are trying to connect to which history handler, things will be more or less difficult. In the rather common situation where you wrote your own component, it will have to provide a “my state changed” event (triggered by your component whenever user interaction causes a change of state) and a “give me a new state” function (called by the history handler whenever the state changes). Somewhere along the way (probably in your component) you will need a serialization-deserialization utility to turn the complex state of your component into a hash-storable string and back.

For additional safety, make sure that setting a state that is already the current state does not trigger the state change callback: depending on your history handler implementation, it might cause an infinite loop, and it ain’t pretty.
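A home-made component following that contract could look like the sketch below. PagerComponent and its method names are hypothetical; the point is only to show the “my state changed” callback, the “give me a new state” function, and the guard against re-announcing the current state.

```javascript
// A tiny pager component. onStateChanged is the "my state changed" event:
// it receives the serialized state whenever user interaction changes it.
function PagerComponent(onStateChanged) {
  this.page = 1;
  this.onStateChanged = onStateChanged;
}

// Serialization: the whole state is the page number, stored as a string.
PagerComponent.prototype.serialize = function () {
  return String(this.page);
};

// Called on user interaction. Going to the current page is a no-op, so the
// callback never fires for a state that is already the current one.
PagerComponent.prototype.goTo = function (page) {
  if (page === this.page) return;
  this.page = page;
  this.onStateChanged(this.serialize());
};

// The "give me a new state" function, called by the history handler.
// It updates the component silently: firing the callback here is what
// could start the infinite loop.
PagerComponent.prototype.setState = function (text) {
  var page = parseInt(text, 10);
  this.page = isNaN(page) ? 1 : page;
};
```

The history handler calls setState on back and forward, and the component calls onStateChanged on clicks; neither direction echoes back into the other.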

A quick-and-dirty example to connect jQuery UI tabs to the jQuery address plugin (this should not work as is, and is intended merely as an illustration):

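$(function(){

  // Address changed from outside (back, forward, bookmark): select that tab.
  $.address.externalChange(function(){
    var i = $.address.value().substr(1) || 0;
    $('#tabs').tabs('select', i);
  });

  // Tab selected by the user: store the selected tab in the address.
  $('#tabs').tabs({
    select : function(event){
      $.address.value(
        $('#tabs').tabs('option','selected')
      );
    }
  });
});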

The first part extracts the state from the hash and selects a tab, the second part reads the currently selected tab and stores it in the hash. It relies on the tabs not firing a “select” event if the selected tab was already active, and on externalChange not being triggered by setting the value from the code.

Note that the second part does not compute the new hash value from the event itself, but rather from global data: this is because, if you had two components on the same page, changing any of the two would have to store the new state of both components in the URL. In that situation, you would have a single “compute the hash based on the state of both components” function triggered by the on-change events of both components.
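For the two-component situation, a sketch of that single shared function might look like this. It assumes (my convention, not any library's) that each component exposes getState() returning a string and calls its own onChange property after a user-driven change; writeHash stands for whatever your history handler uses to set the hash.

```javascript
// Build one hash string from the current state of every component.
// Names are sorted so the result does not depend on declaration order.
function computeHash(components) {
  return Object.keys(components)
    .sort()
    .map(function (name) {
      return encodeURIComponent(name) + '=' +
             encodeURIComponent(components[name].getState());
    })
    .join('&');
}

// Give every component the same change handler: whichever one changes,
// the hash is recomputed from the state of all of them.
function connectAll(components, writeHash) {
  function update() {
    writeHash(computeHash(components));
  }
  Object.keys(components).forEach(function (name) {
    components[name].onChange = update;
  });
}
```

Whichever component fires, the hash always describes the whole page, so a change in the grid never erases the state of the tabs.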

Model One-Dot-Five : what if I use both of the above?

What happens if you load your pages using model zero, and then you have components on these pages that use model one?

This is not very difficult, but it requires some cooperation on the part of model zero, which “owns” the hash for practical reasons. The first possibility is to have model zero use only one channel of your history management tool—for instance, if your tool allows you to specify query strings, you can store your state as ZERO?COMPONENT=STATE&COMPONENT=STATE without difficulties.

The second possibility is to have model zero provide an interface that lets components from the loaded pages register and control parts of what model zero will save to the hash through a modification to its serialization process (again, the query string seems like an obvious choice).

The problem is that the first possibility will not work…

First: since you always remain on the same page, components that disappear from the page do not remove the events you registered with the history manager to handle them. It is the responsibility of model zero (which knows when a page content change happens) to eliminate any callbacks registered by components from model one. Otherwise, on every URL change, all callbacks will be processed, which reduces performance, causes memory leaks and might, in really nasty situations, cause a component on a page you visited five minutes ago to maul the URL of the page you’re visiting right now.

Also remember that the hash-was-changed event will trigger an AJAX reload of the page contents, so the new components will by definition not be present on the page when the event is triggered!

So, model zero reloading takes a little more effort now:

  1. Detect that a reload is required.
  2. Unregister all hash change event handlers registered from child components.
  3. Query the content, insert it into the page. It may contain components, which will then register themselves with the hash change event.
  4. Propagate the hash change event to the newly created components.

Since it now needs to unregister child handlers and propagate the hash change after loading is done, model zero needs tighter control over how children are connected to the history handler. I believe the only clean solution here is to make model zero a global object that wraps around the history handler, so that child components only need to interact with the global object.

An interesting interface for this part of the global object would be:

zero.onChange(callback)
Registers a callback to be called whenever the state of the child components needs to be modified (because of a URL change). Child components, when they initialize themselves, register themselves this way to listen to URL changes. All callbacks registered this way are discarded when the contents of the page change.

zero.setState(dictionary)
Changes the query string part of the URL to take into account changes that happened to child components. This does not trigger the change event.

Being global means components initialized in <script> tags in the incoming, server-sent HTML can still reach out for it and register themselves.
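Putting the interface and the reload steps together, a bare-bones version of the global object could be sketched as follows. Everything beyond onChange and setState is my own assumption: createZero, notify and reload are hypothetical names, writeQuery stands for whatever writes the dictionary to the hash without firing a change event, and loadContent is assumed to insert the new content synchronously.

```javascript
function createZero(writeQuery) {
  var callbacks = [];

  return {
    // Child components register here when they initialize themselves.
    onChange: function (callback) {
      callbacks.push(callback);
    },

    // Child components report their new state here. No callback is fired.
    setState: function (dictionary) {
      writeQuery(dictionary);
    },

    // The URL changed but the page content stays: just tell the children.
    notify: function (state) {
      callbacks.slice().forEach(function (callback) {
        callback(state);
      });
    },

    // The URL changed and new content is needed (steps 2 to 4 above).
    // A real AJAX load would run the last step from its completion callback.
    reload: function (loadContent, state) {
      callbacks = [];      // drop the handlers of the old content
      loadContent();       // new components may call onChange in here
      this.notify(state);  // propagate the change to the newcomers
    }
  };
}
```

In a page, this would be created once (something like var zero = createZero(...)) before any content is loaded, so that scripts arriving with the content can find it.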

Model Two : recursive boxing

When you look at the previous model, you notice that model one components inside a model zero page is a flat structure only waiting to be made recursive. The key to this is to create a component that acts like a model zero page by loading its content from the server and passing down some information to its own child components.

Model Three : client controllers

This model gives up any kind of non-JavaScript usage of the web site. A classic example is Gmail (and, just like Gmail, you can provide an alternate static HTML version of the web site with greatly reduced features). This approach basically leaves everything in the hands of the JavaScript programmer by letting them define the application as a stateful component, with the hash in the URL being a serialized version of the component state (or the relevant parts thereof). It’s also the hardest to work with, since there’s basically no high-level architecture, though you may implement a domain-specific one like Gmail did.

Why I Gave Up on the Zend Framework

The Zend Framework is a really nifty thing. Really, it is. The amount of functionality that you get merely by installing it is extremely exciting: internationalization, forms, an MVC layout for your program, a cute class loader, a database abstraction layer, a templating engine, a request dispatcher, mail-sending functions, pretty debugging “dump” functions… and there are so many people working on it and using it that basically all the bugs left in there are shallow. It has been a staple dependency of many of my projects for quite a while now, and still is.

Zend Framework is actually available for your projects in two flavors, «use what you need» and «obey the hive mind», with a continuous spectrum in-between these two extremes.

We are Zend. Resistance is futile.

The «use what you need» approach leaves the maintenance programmers with a warm and fuzzy feeling. All you have to do is dump all the framework files somewhere in your include path, include the files for the bits you want to use, and just call the functions. The framework takes care of recursively including the appropriate dependencies for you and carefully avoids treading on any toes by prefixing everything with «Zend».

In fact, if you use Zend_Loader, you can skip the include-source-files step completely (except for Zend/Loader.php, obviously), and since auto-loading is backward-compatible with loading files manually, it’s also a good step towards a well-deserved refactoring.

So, if you need to send multi-part mail, with HTML-and-text content, in UTF-8 format, you can just use Zend_Mail and everything will work fine regardless of the rest of the code base. There are dozens of such small features (for PDF generation, LDAP, access control, localization, and so on).

There is virtually no excuse for not using a plug-in class from the Zend Framework in your application if it solves the problem you’re having. Besides, since the files are not included until you need them, the worst that could happen is that you’re having some PHP code taking up a few megabytes of disk storage for nothing. So I have a lib/Zend directory on all my projects, just in case I need something.

Obey the Hive Mind

While many pieces of Zend are independent of each other, there’s a central functionality core that’s designed to act well together. There are many examples:

  • it’s easier to use Zend_Dispatcher and Zend_Controller together.
  • it’s easier to render a Zend_View if you’re also using Zend_Controller.
  • it’s easier to turn a Zend_Form into HTML if you’re using Zend_View.
  • it’s easier to set up a “login already in use” validator with Zend_Form if you have a field in a Zend_Db_Table to connect it to.
  • it’s easier to translate Zend_Form error messages with Zend_Translate (and Zend_Registry).

Sure, it’s usually possible to take advantage of 99% of the functionality without having to add new dependencies, but there’s always that tiny voice in the back of your head, nagging that you could get that additional 1% so easily if you just gave in.

Giving in means, of course, going all the way to Bootstrap heaven: now your project is laid out along the lines of the ideal Zend Framework template, your files cleanly stashed in their folders with a cosmic Feng Shui feeling to it all, and the Zend approach to MVC pervades your every HTTP request.

This isn’t so bad: actually, such an approach has some huge selling points for shops that write lots of small projects, such as the ability to get 20% of your basic functionality up and running in days, the ability to hire any Zend-certified developer and not have to educate them about the framework, and you don’t need them lousy architects on your team.

I’ve had some trouble with the Zend way before, though. There are some bits of functionality that I won’t touch with a ten-foot pole, such as Zend_View, Zend_Controller or Zend_Db_Table, because the havoc they wreak in the situations I find myself in outweighs the benefit.

Documentation

My main issue is that I find Zend quite lacking on the documentation side.

«But the Zend Framework is possibly the most documented there is!» you say, before trailing off in a rant about how the “FM” should be “R” and the “FW” should be “S”.

You’re probably right. But I don’t really care about that documentation. I’m talking about project documentation—to know what happens in code written by my team.

«What does Zend have to do with that? Document your code, you lazy slob!»

Humans are lazy, and I would argue that laziness is actually an essential quality of a good programmer. I can require that documentation be written, but I expect it to be missing, inaccurate or monosyllabic. Things like that happen when you’re rushing out a bug patch at 3:00 am. And even if I could ensure that documentation is written and kept up to date, I’d rather have my code be self-documented—not only does it take less time, but it’s harder to get inaccurate self-documentation and you can even get the language to check things for you.

It’s the difference between documenting the parameter type as a @param MyClass $obj in a comment and documenting it as a MyClass $obj type hint in the function signature.
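As a minimal sketch of that difference (MyClass and both function names are hypothetical, made up for illustration), the first function below only promises its parameter type in a comment, while the second lets PHP enforce it:

```php
<?php
// Hypothetical class, used only to illustrate the two styles.
class MyClass
{
    public $label = 'example';
}

/**
 * Type documented in a comment only: nothing stops a caller
 * from passing a string, so the function has to defend itself.
 *
 * @param MyClass $obj
 */
function describeLoosely($obj)
{
    return is_object($obj) ? $obj->label : 'not an object';
}

// Type documented in the signature: PHP rejects anything that is
// not a MyClass before the function body even runs.
function describeStrictly(MyClass $obj)
{
    return $obj->label;
}
```

On PHP 7 and later, calling describeStrictly('oops') raises a TypeError, so the mistake shows up at the call site instead of somewhere deep inside the function.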

Look at the average .phtml template, and you’ll see something like this:

<div>
  ...
  <a href="<?php echo $this->getUrl() ?>"><?php
    echo $this->escape($this->user->name)
  ?></a>
  ...
  <?php echo $this->partial('preferences.phtml', $this->pref); ?>
  ...
</div>

Half the point of a view in the MVC approach is that I should be able to easily reuse that view from any controller, or even from within another view. Of course, Zend lets me do this very easily:

$view = new Zend_View();
$view->xxx = yyy; // Fill in members
echo $view->render('template.phtml'); // render() returns the markup as a string

The second line, the one that fills in the members, is of course where trouble begins. Since Zend_View fields are by definition dynamic, there’s no way to get auto-completion to help you find what they should be. Nor can you look at a list of these fields in a class definition or function definition, because there is none. You have to read the template file and find out for yourself what values are used by the template and what their types should be. Oh, and if the template passes some of that data to other templates, you have to read those templates too, because they might use specific information. And you have to look at view helpers too, because they might be accessing view elements behind your back.

Your best bet is to look at an existing controller that uses the view, and hope that you don’t stray too far from what that controller is doing. You never know: a certain member might be expected to be present if another has a certain value (this never happened with the first controller, but it happens in yours), there’s no compiler checking that all values are being provided appropriately, and runtime testing doesn’t reveal such special cases on the first try.

And they say Zend_View is an object-oriented approach to rendering…

The most important aspect of Zend_View templating is that it is object oriented. You may use absolutely any value type in a template: arrays, scalars, objects and even PHP resources. There is no intermediary tag system between you and the full power of PHP. Part of this OOP approach is that all templates are effectively executed within the variable scope of the current Zend_View instance. To explain this consider the following template.

That’s not what object-oriented means. OOP means if two views behave differently, then they should be instances of different classes, instead of injecting arbitrary code and data into a single class and spitting in the face of encapsulation.

The bottom line is that reusing Zend_View templates is a pain in the derrière unless you take special steps about it (steps that you wouldn’t need with a standard class-with-members).
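For comparison, here is a rough sketch of what a «standard class-with-members» view could look like. The names View_Account::renderSimple and Account come from later in this post, but the body, the constructor and the $url member are assumptions made for the example, not the actual implementation:

```php
<?php
// Hypothetical data class: the members are declared, so an editor
// (and a reader) can see exactly what an account contains.
class Account
{
    public $firstname;
    public $url;

    public function __construct($firstname, $url)
    {
        $this->firstname = $firstname;
        $this->url = $url;
    }
}

// A view as a plain class: the signature states what the template needs,
// and PHP refuses anything that is not an Account.
class View_Account
{
    public static function renderSimple(Account $account)
    {
        return sprintf(
            '<a href="%s">%s</a>',
            htmlspecialchars($account->url),
            htmlspecialchars($account->firstname)
        );
    }
}

echo View_Account::renderSimple(new Account('Jane & Co', '/user/123'));
```

Everything the view depends on is visible in one function signature, which is the part Zend_View cannot offer.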

What’s in that row?

This is further compounded by the way Zend_Db works: an ORM that generates SQL from a sequence of PHP calls, and then turns the result into a list of Zend_Db_Table_Row objects. Which leads to the question of what fields can be found in a given row, and that question is hard to answer.

A typical application will follow a rule along the lines of «every table row is, by definition, a row of a table, so you just peek at the table definition and you know that each column is mapped to a field,» and that is a fine rule to follow, because then the only issue is you can’t type-hint the row based on the table, so you can’t make sure a given argument is always a row from “account”.

But following that rule is hard. In addition to those 80% plain old CRUD cases where you’re working with a single table at once, you’ll have those 20% that use joins where you need data from both tables (never mind the pain of doing that in PHP). Then you end up with a row that breaks the rule, so you keep it in tightly enclosed areas of your application, until it gets too frustrating not being able to use a view-that-renders-accounts on a record-that-contains-accounts-and-sessions, and the next thing you remember is that you don’t know if a given view expects an account or an account-and-session.

And the language can’t help you.

Auto-complete me

Nor can your editor, for that matter, since auto-completing $row-> requires knowledge that your editor simply cannot have (the list of columns defined when you configured your Zend_Db_Table).

I really do enjoy it when my code editor helps eliminate some of the tedium of writing code. In fact, I’m quite ready to make a small additional effort tagging my members, arguments and functions with some type information just so that writing code can be easier.

My editor is Eclipse PDT. It has several nice features that I use extensively.

The first is, of course, its ability to suggest members of classes and objects. Having well-defined classes to represent your data means that Eclipse can use the type hints you leave around to determine that $account is of class Account, so that it has a $firstname member. That’s:

  1. one less round-trip to the database documentation
  2. zero chances of typing $account->firstName by mistake
  3. being told immediately if $account has entirely different members (because it’s another type)

Since Zend_Db_Table_Row and Zend_View actually go out of their way to make sure that you can have arbitrary data in there based on runtime considerations, getting this functionality out of them is impossible.

The other nice feature I use a lot is the ability to control-click a class or function to see its definition. This lets me navigate around the code in seconds instead of having to open the project file explorer, expand several layers of directories usually far from each other, and spend precious brain power translating a class/member naming scheme into file naming schemes.

Finding a file is a job for the editor, not for the programmer.

My view helpers look like this:

View_Account::renderSimple($account);

Clicking on that function name brings up the file and scrolls it down to where it matters. Took me less than a second. Zend View Helpers look like this:

$this->renderSimpleAccount($account);

I dare anyone to navigate to the definition of that helper in less than a second. [EDIT: apparently I shouldn't dare people on the internets :) ]

What about links? The typical approach to generating a link to a different part of a site, with the Zend Framework, is to spell out its controller and action:

<a href="<?php echo $this->url(array(
  'controller' => 'user',
  'action' => 'edit',
  'id' => '123'
));?>">click me!</a>

Now you have to click on every single link on your website to make sure it is correct, and you will still manage to forget one, and the end user will click on the link that’s spelled out as «edti». And even if you do get it right, you still have to navigate to the appropriate controller class, open it up and scroll down to the right action.

My urls look like this:

<a href="<?=Action_User_Edit::url(123)?>">click me!</a>

Since every one of my actions is a class (as opposed to a function in a controller class), they get to have members, and one of these members is a static url() function that:

  • lets me ctrl-click through to the action itself
  • has PHP check that my link is correct (or else, die with a class-not-found answer)
  • documents the expected URL arguments as function arguments
  • even lets me find out who links to a certain controller, in case I have to move it
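A minimal sketch of such an action class might look like the following. Only the name Action_User_Edit and its static url() function appear above; the URL scheme '/user/edit/<id>' is an assumption made for the example:

```php
<?php
// Hypothetical action class. A real one would also carry the code
// that handles the request; only the link-building part is sketched.
class Action_User_Edit
{
    // The function arguments document which URL arguments the action expects.
    public static function url($id)
    {
        return '/user/edit/' . rawurlencode((string) $id);
    }
}

echo Action_User_Edit::url(123); // prints /user/edit/123
```

A misspelled class name such as Action_User_Edti::url(123) fails loudly with a class-not-found error the first time the page is rendered, instead of quietly producing a dead link.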

The bottom line…

…is that I don’t use Zend_View, Zend_Controller or Zend_Db in my projects. I need my code to be self-documenting, and there’s nothing self-documenting about Zend_View or Zend_Db. I need my code to be easy to navigate through and simple enough for my editor to handle, and the fully dynamic behavior of Zend_View and Zend_Db prevents that.

Your needs might be different. Are they?

Reusable CSS

Woe unto CSS, for it provides no refactoring-friendly tools! The CSS beast has neither functions nor variables, and its definition of inheritance is perverted beyond words. Pain and suffering await those who hope to keep their CSS from one project to the next, or even share the CSS between pages on a single website!

Consider two simple pages: the home page has a small navigation bar (selected by #navig) at the top of the screen, while the catalog page has a larger navigation bar (still selected by #navig). Each page includes a different layout.css stylesheet, so everything’s fine. Except that now, anything defined in one layout has to be copied over by hand to the other layouts if you want to reuse it. Ouch.

Does that example sound extreme? It certainly is! But the danger of page-specific stylesheets remains: even if you won’t be stepping on your own toes with something as trivial as #navig, perhaps .book will mean two different things on two different pages?

Rule Zero : Keep all your CSS Together

This might seem a bit harsh, especially if you have truckloads of CSS floating around and don’t want to slow down the initial loading time of your page or spend time resolving collisions. However,

  • This rule will make it easier to factor out common bits of CSS, leading to an overall smaller set of stylesheets.
  • The number of HTTP requests matters as much as the bandwidth, so delivering all your CSS as a single, minified, gzipped blob is often a good performance idea.
  • The entire point is to make it harder to create page-specific rules, so that you don’t make a rule page-specific by mistake, and strive to make most of your rules page-independent.

I usually place all of my CSS in correctly named files in a directory on my server, then have the server generate a single, all.css master file that @imports the other stylesheets by path. This means Firebug’s CSS browser will correctly identify the source file for any given rule. When the code moves to a production server, the auto-generated master file becomes a pre-generated/minified/gzipped resource, and can even be moved to a CDN for improved performance.
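The generation step for the development version of all.css could be as small as the sketch below. The function name buildMasterCss and the '/css/' URL prefix are assumptions, and the list of files is passed in explicitly because import order matters to the cascade:

```php
<?php
// Builds the body of a master stylesheet that @imports each file by path.
// The order of $files is preserved: when two rules tie, the later one wins.
function buildMasterCss(array $files, $urlPrefix = '/css/')
{
    $lines = array();
    foreach ($files as $file) {
        $lines[] = sprintf('@import url("%s%s");', $urlPrefix, basename($file));
    }
    return implode("\n", $lines) . "\n";
}

// The list could equally come from a configuration file or a directory scan.
echo buildMasterCss(array('navig.css', 'comment.css'));
```

Because each rule still lives in its own file on the development server, the browser's CSS inspector keeps pointing at the right source file.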

On the other hand, keeping all your code in one place will only help you see collisions, it will not actually help you solve them.

Fortunately, we can look to other languages for tips and tricks on how to make code easier to reuse. The fundamental observation is that you cannot use something if you don’t give it a name. One would expect CSS identifiers and classes to serve the same function, and indeed it does work in simple cases:

a.important { font-weight: bold }

Now, you have the important «function», that you «call» on an anchor element to make it appear important. Bam! Instant reusability. Using an identifier instead of a class still allows reuse on distinct pages, but restricts reuse within a single page.

Rule One : Document your «Functions»

You cannot reuse code if you cannot find it, and even if you don’t forget about it, someone else on the team might be completely unaware that it even exists. So you should somehow document that the important class exists. My personal, PHP-friendly preference is to have a “Css” class with all those nice classes available:

class Css
{
  /* a.important : make a link important */
  const IMPORTANT = "important";
}

Then, you can reuse them when you see fit to do so:

Click <a href="<?=$url?>" class="<?=Css::IMPORTANT?>">here</a>

That’s just personal preference—any way of documenting your CSS classes is fine as long as it’s somewhere everyone can see it. In fact, I have a nice set of PHP helpers lying around to bind jQuery UI CSS effects to my code, thereby documenting what jQuery UI can do without having to dive into the stylesheets every single time.

The real problem appears when you have more than one «argument». A typical example is the list of links with a “selected” link: the graphical effect applies to the list, to the elements of that list, and to the content of those elements, which leads to several rules selecting different elements.

ul#navig { margin: 0 ; padding: 0}
ul#navig li { list-style-type: none }
ul#navig li.selected a { font-weight: bold ; color: black }

This kind of structure cannot be documented simply by stating that the ul#navig element is going to become a pretty list, because without the li.selected in there, there will be no «pretty» worth mentioning.

I document this as follows:

/*
  <ul id="navig">
    <li><a>Item</a></li>
    <li class="selected"><a>Item</a></li>
    <li><a>Item</a></li>
  </ul>
*/
ul#navig { margin: 0 ; padding: 0}
ul#navig li { list-style-type: none }
ul#navig li.selected a { font-weight: bold ; color: black }

Why not document it in the PHP code, then? IMO, you can expect a CSS designer to write a quick const FOO = "bar"; line in PHP, but not an HTML helper that turns an array of links into pretty list HTML. CSS designers write the CSS (with documented HTML) and PHP developers turn that into HTML helpers.
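On the PHP side, a helper for the documented #navig structure could be sketched like this. The function name renderNavig and the shape of the $links array are assumptions made for the example:

```php
<?php
// Hypothetical helper: turns an array of links into the HTML documented
// at the top of the #navig stylesheet.
// Each link is array('url' => ..., 'label' => ...); $selected is the
// index of the current link, or null if none is selected.
function renderNavig(array $links, $selected = null)
{
    $html = '<ul id="navig">';
    foreach ($links as $i => $link) {
        $class = ($i === $selected) ? ' class="selected"' : '';
        $html .= sprintf(
            '<li%s><a href="%s">%s</a></li>',
            $class,
            htmlspecialchars($link['url']),
            htmlspecialchars($link['label'])
        );
    }
    return $html . '</ul>';
}
```

The designer's comment block describes the HTML contract, and the helper is the single place on the PHP side that has to honor it.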

</acronym soup>

Another important element of code reuse is the notion of encapsulation, and in particular the existence of “private data” that is part of the program, but can only be accessed by some parts.

There is no such thing with CSS. There are two reasons for this. The main reason is that being sloppy with selectors is commonplace:

/*
  <div id="userList">
    <ul class="users">
      ...
    </ul>
    <a>New</a> |
    <a>Edit</a> |
    <a>Delete</a>
  </div>
*/
#userList a { color: #FF9900 ; text-decoration: none }
#userList a:hover { text-decoration: underline }

The three links in the user list component («new», «edit» and «delete») will appear in orange without underlining, as expected and documented. The unexpected and non-documented consequence of this code is that all links within the list of users will be orange without underlining as well.

Rule Two : Only Select what you Need to Select

The typical consequence of sloppy selectors is that «insert component A into component B» operations utterly destroy the formatting of component A. The typical designer reaction to such graphicalypse is «Darn, component B destroyed some property of component A, so let’s add some rules to component A to reverse the damage!»

Bad idea. It makes the code longer, and only hides the actual problem (along with any symptoms that only appear in specific cases). The real solution is to make sure selectors only select what they need to select.

One way of doing so is to use the «>» selector, as it restricts the selection to only children of the initially selected element. This would work:

#userList > a { color: #FF9900 ; text-decoration: none }
#userList > a:hover { text-decoration: underline }

Of course, it wouldn’t work in IE6, but who cares about IE6 anymore?

The general approach is to use specific classes for those elements that must be affected:

/*
  <div id="userList">
    <ul>
      ...
    </ul>
    <a class="userList-link">New</a> |
    <a class="userList-link">Edit</a> |
    <a class="userList-link">Delete</a>
  </div>
*/
a.userList-link { color: #FF9900 ; text-decoration: none }
a.userList-link:hover { text-decoration: underline }

If anyone uses that userList-link class in their code (and your naming conventions were clean enough), they had it coming.

Rule Three : Choose Proper Naming Conventions

It is quite important to remain consistent in your naming practices, especially since you now need to identify, for any given identifier and/or class:

  • If it represents a «function» (#userList), or if it helps select a specific «argument» (.userList-link).
  • In the latter situation, what function the argument corresponds to (so that you can look for its definition).

My preference is to use camelCase names (classes or identifiers) for functions, and camelCase-camelCase names for arguments, where the first half is the name of the function. The CSS would then be gathered in a camelCase.css stylesheet named after the function, with a documentation of the expected HTML at the top, hence making it much easier to find and reuse.

Now that you have access to functions, you will probably want to use them to implement reusable «components» — standalone pieces of HTML and CSS that represent atoms of information.

At some point, you will have to make components interact (if only to respect each other on the page layout). All of this will be hell if component A uses normal block layout rules, component B is floating to the left and component C is positioned absolutely.

Rule Four : a Component Should only Care about its Inner Layout

As soon as a component starts to care about outer layout concepts such as margin, position, floating or clearing, you will be in a world of pain. This is because such concepts depend on where the component appears, and as such are not easy to reuse.

I split my CSS code into components and bones:

  • Components. These are reusable atoms. They do not care about their outer layout at all, so they never specify anything like margin, position, floating, clearing, display mode or anything that might cause them to interact differently with their surroundings on the page.

    They may specify a width and height if they wish, but it is discouraged (a component that can adapt to any geometry is easier to use). They can specify anything they want in terms of border, padding, font, color, background, and any inner properties they need.

  • Bones. These are elements found inside the components that handle the layout of the component contents themselves. They can and should make appropriate assumptions about what bones can be found within a component and how they should interact to result in the layout you need to see.

A nice finishing touch is to make the component overflow : hidden, because the last thing you need is a component’s skeleton sticking out from its skin and interacting with other elements.

I repeat: never allow the contents of a component to stick out of that component!

In particular, if you have a component with floating elements inside, make sure you add a clearer element at the bottom of the component to have it resize with its contents.

In practice, I assume every function argument to be a bone, and every function to be a component. The situations where a function acts as a bone are so rare, and the results so difficult to reuse (so you’ve added a float:left to an element, where are you going to put it?), that I don’t really take them into account. The Component-Bone approach tends to solve almost everything elegantly, as long as you’re clever about where a component begins and a bone ends.

For instance, if you’re laying out a list of comments for a blog, you are probably going to have a «comment list» component with «comment» bones that are laid out on top of one another with appropriate margins, borders and paddings. The contents of every «comment» bone will be a «comment» component, with bones representing the picture, name, date and comment body, laid out cleanly within that component.

Whether the .commentList-comment is placed on the same element as .comment is something you can decide for yourself. What is essential is that, in order for the comment style to be reusable independently of the comment list style, all outer layout information should be in .commentList-comment, not in .comment.

Good.

Now, before I finish, do you remember when I said earlier that component B could be mangled by component A for two different reasons? The second reason happens to be inheritance. Everyone knows inheritance is bad for reuse. Right?

What happens is that, if you define a font size, color or family in a given element, then all descendants of that element will get the same font size, color and family (unless some CSS rule changes them). That’s inheritance: the value of the property in the child element is inherited from the parent element.

Rule Five: Only Change Inheritable Properties on your own Content

It’s impossible to define the entire list of inheritable properties at the root of every single component in your web site, however convenient it may be. Keeping everything in sync is very difficult, if not impossible. It is far easier, by comparison, to restrict such changes to only those areas of a component where the content is closely controlled and guaranteed not to contain any other components.

I believe there are basically three kinds of areas in any given page that are actually worth being paid attention to. These are:

  • Layout areas. These are those component-in-component-in-component places where touching an inheritable property can get you killed (well, annoyed).
  • Text areas. Those contain no components, but they might still contain paragraphs, links, headings, images in a typical «rich text editor» fashion. If you change one property (such as the color of text), be ready to change all the related properties (the color of links) to keep a consistent appearance.
  • Line areas. These contain a short bit of text without any other tags. You don’t have to worry about changing properties here.

Every component should document, for every piece of content that should be filled from outside the component, whether it is a layout, text or line area. For instance:

/*
  <div class="comment">
    <span class="comment-author">...</span>
    <div class="comment-contents"><p>...</p></div>
    <div class="comment-reply">
      ...
    </div>
  </div>
*/

Here, a span (can only contain inline elements or text) represents a line area, a div-with-paragraph represents a text area (may contain several paragraphs, of course) and a normal div represents a layout area. This tells me, for instance, «don’t even think about putting a component in the comment contents, or I’ll clobber their stylesheet beyond recognition.»

Depending on the kind of web site you are building, other kinds of areas may be useful to you, such as forms.

That DOM removal thing, again

Earlier this month, I pondered what looked like a bug in JavaScript/DOM/jQuery: removing an element from the DOM with jQuery (either manually with remove() or by setting the html() of its parent to something else) kept most of the data bound to the element around, but removed all event handlers from it. You could then re-insert the element, but its event handlers would be lost.

I then gathered from several sources, such as Stack Overflow, that this is a jQuery issue (or rather, feature) and not a JavaScript one.

The underlying cause is explained by Douglas Crockford:

When a DOM object contains a reference to a JavaScript object (such an event handling function), and when that JavaScript object contains a reference to that DOM object, then a cyclic structure is formed. This is not in itself a problem. At such time as there are no other references to the DOM object and the event handler, then the garbage collector (an automatic memory resource manager) will reclaim them both, allowing their space to be reallocated. The JavaScript garbage collector understands about cycles and is not confused by them. Unfortunately, IE’s DOM is not managed by JScript. It has its own memory manager that does not understand about cycles and so gets very confused. As a result, when cycles occur, memory reclamation does not occur.

A common solution to this problem is to remove the cycles when the element is removed from the DOM. Since a major source of cycles in your average jQuery program is the presence of event handlers, removing the event handlers when an element is removed from the DOM solves the problem most of the time.

With the release of jQuery 1.4, the new documentation for .remove() makes mention of this fact:

In addition to the elements themselves, all bound events and jQuery data associated with the elements are removed.

The documentation for .html() still makes no mention of this. If you want to remove an element and keep all the goodies you bound to it, jQuery 1.4 provides you with .detach():

The .detach() method is the same as .remove(), except that .detach() keeps all jQuery data associated with the removed elements. This method is useful when removed elements are to be reinserted into the DOM at a later time.

chain()

Like many other languages, PHP is home to method chaining, a pattern that allows writing several mutators on the same object without having to name it more than once. A typical example can be found in the Zend Framework for configuration of e-mails, among other things :

$mail = new Zend_Mail();
$mail -> setBodyText('This is the text of the mail.')
      -> setFrom('somebody@example.com', 'Some Sender')
      -> addTo('somebody_else@example.com', 'Some Recipient')
      -> setSubject('TestSubject');

This is a very simple trick that is accomplished by having every mutator return the object itself.
However, the PHP syntax rules forbid calling a member function on the result of a new-expression, so that you always require a two-step sequence: initialize the object, then call its chain of mutators. (PHP 5.4 and later lift this restriction, provided the new-expression is wrapped in parentheses.)

Of course, a simple solution is to use a function:

 function chain($obj) { return $obj; }

 $mail = chain(new Zend_Mail())
   -> setBodyText('This is the text of the mail.')
   -> setFrom('somebody@example.com', 'Some Sender')
   -> addTo('somebody_else@example.com', 'Some Recipient')
   -> setSubject('TestSubject');

In a similar vein, there’s the matter of using the method chaining pattern on objects that were not designed for that. This is where a quick wrapper can come in handy:

 // Define the appropriate class and function
 class WithWrapper
 {
   public $value;
   public function __construct($obj) {
     $this -> value = $obj;
   }
   public function __call($name, $args) {
     assert (count($args) === 1);
     $this -> value -> $name = $args[0];
     return $this;
   }
 }

 function with($obj) {
   return new WithWrapper($obj);
 }

 // A typical record class
 class Person
 {
   public $age;
   public $firstName;
   public $lastName;
   public $married;
 }

 // Create entry for Jane
 $jane = with(new Person())
   -> age(24)
   -> firstName("Jane")
   -> lastName("Smith")
   -> married(false)
   -> value;

 // Jane gets married
 with($jane)
   -> lastName("Brown")
   -> married(true);

This is starting to look like Visual Basic

Left to the reader

PHP best practices have been moving steadily towards putting all functions inside classes, if only to provide namespacing. The good news is that you have no more namespace collision issues (well, unless you join together two projects with different conventions), and the bad news is that your function names are starting to get quite long.

<?php echo Framework_Html::Escape($username); ?>

Escaping strings to be output in HTML documents is a quite common behavior in PHP websites. Is the risk of a name collision worth giving up on a shorter approach, like:

<?=esc($username)?>

I am a proponent of turning very common operations into short functions with appropriate “smart” behavior. For instance:

  • esc(string $string) returns a Framework_Html instance representing the string escaped with htmlspecialchars.
  • esc(Framework_Html $html) returns its argument as-is, so you don’t have to care about whether a given string has already been escaped or not.
  • esc($format, $a, $b, $c...) returns a Framework_Html instance representing the unescaped string sprintf($format, esc($a), esc($b), esc($c)), useful to avoid repeated escaping in, say, <a href="%s">%s</a> .
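The three behaviors above can be sketched as follows. The Framework_Html class here is a minimal stand-in written for this example (a wrapper that marks a string as already safe for HTML output), not the real class from any framework, and the variadic signature assumes PHP 5.6 or later:

```php
<?php
// Minimal stand-in for Framework_Html: wraps a string that is
// already safe to print inside an HTML document.
class Framework_Html
{
    private $html;

    public function __construct($html)
    {
        $this->html = $html;
    }

    public function __toString()
    {
        return $this->html;
    }
}

function esc($value, ...$args)
{
    if ($value instanceof Framework_Html) {
        return $value; // already escaped: return as-is
    }
    if (count($args) === 0) {
        return new Framework_Html(htmlspecialchars((string) $value));
    }
    // Format form: $value is trusted markup, every argument is escaped.
    $escaped = array_map(function ($arg) {
        return (string) esc($arg);
    }, $args);
    return new Framework_Html(vsprintf($value, $escaped));
}
```

Since esc() returns its argument untouched when it is already a Framework_Html, calling it twice on the same value does no harm, which is the whole point of the “smart” behavior.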

In a similar vein:

  • func(callback $call) returns its argument (after checking that is_callable($call) is true). This serves as a piece of documentation to tell that something is a function.
  • func(object $obj, string $func) returns a callback representing the member function $func of object $obj.
  • func(string $class, string $func) returns a callback representing the static member function $func of class $class.
  • func(string $args, string $body) acts as a shorter alias for create_function.
  • func(string $body) acts as an alias for create_function('$_',"return $body;"), in those cases you need a very short lambda expression.
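A sketch of the first three forms of func() could look like this. The Greeter class is hypothetical and only there for the usage lines. The two create_function aliases are left out on purpose: create_function was deprecated in PHP 7.2 and removed in PHP 8.0, and a two-string call would be ambiguous with the class-and-method form anyway:

```php
<?php
// func($callback)          : checks and returns the callback
// func($object, 'method')  : callback for a member function
// func('Class', 'method')  : callback for a static member function
function func($target, $method = null)
{
    $callback = ($method === null) ? $target : array($target, $method);
    if (!is_callable($callback)) {
        throw new InvalidArgumentException('func(): argument is not callable');
    }
    return $callback;
}

// Hypothetical class used only to demonstrate the three forms.
class Greeter
{
    public function hello($name)
    {
        return "Hello, $name";
    }

    public static function shout($name)
    {
        return strtoupper($name) . '!';
    }
}

echo call_user_func(func('Greeter', 'shout'), 'jane'); // prints JANE!
```

The returned value is an ordinary PHP callback, so it can be handed to call_user_func(), array_map() and friends without further ceremony.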

And of course, there are the jslog() and is() functions discussed earlier on the blog.

I think there would be a small handful of functions, maybe 8 or 10, that would be used so often on a given project that everyone would have to know about them anyway—so, you might as well keep them out of any class.


