Monthly Archive for October, 2009

Heterogeneity

John is a fairly adept PHP developer. He is familiar with object-oriented features from PHP 5, has experimented with some PHP 6 features, and is quite skilled at bending the Zend or Symfony frameworks to his will.

But John is not really an SQL expert—sure, he might have written some simple queries and he can fight his way around a normalized database, but he’d rather use a mapping layer on the PHP side. He is no fan of JavaScript either, although he can sometimes hack together a quick solution based on his limited knowledge and online tutorials. And John is in trouble, because web development is ultimately a heterogeneous environment where you have to know three languages to get things going.

There have been many efforts to help out programmers like John by eliminating as many languages as possible from the process. Database mapping tools provide a protective layer that shields SQL-averse programmers from the unfathomable Lovecraftian horror of INNER JOIN. Ready-made components encapsulate clever JavaScript so that server-side developers don’t have to muck about with the demeaning task of keeping browsers in line.

I’ve had the pleasure of working on both sides of the fence. Some of my projects were beautifully streamlined 98% PHP – 1% JS – 1% SQL works of art where the various pieces of non-PHP code were carefully hidden away from the prying eyes and trembling hands of PHP developers. Others had a complete architecture designed for each of the three languages, with team members that specialized in certain areas only, and strong conventions on how data had to cross the borders. These were not toy projects, but rather large websites that had to bear the brunt of thousands of visits.

The bottom line is that when you’re running a website with the intent to get money out of it, you want as many daily hits as possible, and so the software must be able to handle all of them smoothly. If you are writing your own web software, the burden of optimizing that software is yours as well. This involves identifying bottlenecks and reimplementing them to do less work, so that you will eventually need:

  1. Developers that are familiar enough with the software and any third party elements involved.
  2. Profiling tools that help identify what parts of the software take the most time.
  3. A software model that is flexible enough to allow reimplementing critical pieces.

It is generally observed that [weasel words] the layers of PHP/C#/Java code stacked to hide away the SQL/JS/CSS/HTML underneath will decrease the performance of the software. Databases are queried with SQL and web pages are presented in JS/CSS/HTML regardless of what one-language programmers would like to believe, so the layers end up generating that code themselves, often with hilarious results.

A classic example would be server-side code for displaying a list of objects (displayed here as PHP):

$user_id = Controller::getCurrentUser();
$user    = UserFactory::getById($user_id);
$friends = $user->getFriendsList();

foreach ($friends as $friend_id) {
  $friend = UserFactory::getById($friend_id);
  View::renderUser($friend);
}

This is an actual excerpt from a piece of code I wrote, with only slight rewording of certain components. A naive implementation would result in a first query reading from the database the data for the current user (with a list of 200 friends), then 200 more queries reading the individual users from the friend list. This results in a slow-loading page, a dead database and an unhappy customer (believe me, I’ve tried). The PHP-only programmer answers with a blank stare, because the code is properly written and well-encapsulated.

Now, here’s the million dollar question: can your mapping layer be configured so that the above code can get all the data in one, two or three queries?

The project that code comes from relied on Zend_Db for database work, which could hardly be called anything but naive. The optimization approach was to place a caching layer between the user factory and the database, and configure that layer with rules such as “if the developer calls getFriendsList, the next time UserFactory::getById is called, precache the data for all the users returned in the list of friends”. This meant that only two queries were made, which happened to save the day on that particular project.
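The precaching rule can be sketched roughly as follows. This is only an illustration, written in JavaScript rather than PHP so that it runs without a database: the db object, its selectByIds method, the cache object and its hint method are all hypothetical stand-ins, not the project's actual code or any part of Zend_Db.

```javascript
// Hypothetical in-memory "database" that counts the queries it receives.
var db = {
  queries: 0,
  rows: {
    1: { id: 1, name: 'John', friends: [2, 3, 4] },
    2: { id: 2, name: 'Alice', friends: [] },
    3: { id: 3, name: 'Bob', friends: [] },
    4: { id: 4, name: 'Carol', friends: [] }
  },
  // One call stands for one SQL query (think: WHERE id IN (...)).
  selectByIds: function (ids) {
    this.queries++;
    var rows = this.rows;
    return ids.map(function (id) { return rows[id]; });
  }
};

// Hypothetical caching layer sitting between the factory and the database.
var cache = {
  loaded: {},   // users already fetched, keyed by id
  pending: [],  // ids to fetch together on the next cache miss
  hint: function (ids) { this.pending = this.pending.concat(ids); },
  get: function (id) {
    if (!this.loaded[id]) {
      // On a miss, fetch the requested id plus every hinted id in one query.
      var loaded = this.loaded;
      var ids = [id].concat(this.pending).filter(function (x, i, all) {
        return !loaded[x] && all.indexOf(x) === i;
      });
      this.pending = [];
      db.selectByIds(ids).forEach(function (row) { loaded[row.id] = row; });
    }
    return this.loaded[id];
  }
};

var UserFactory = {
  getById: function (id) {
    var row = cache.get(id);
    return {
      id: row.id,
      name: row.name,
      // The rule: asking for the friends list hints that those users come next.
      getFriendsList: function () {
        cache.hint(row.friends);
        return row.friends;
      }
    };
  }
};

// Same shape as the page code: one user, then each friend in a loop.
var user = UserFactory.getById(1);
var names = user.getFriendsList().map(function (friendId) {
  return UserFactory.getById(friendId).name;
});
```

The calling code keeps its one-object-at-a-time shape. Only the layer underneath changes how many times the database is hit: once for the current user and once for the whole friends list.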

Still, my point is not whether your favourite ORM can achieve the same performance as hand-written SQL code. Some of them certainly can.

My point is that to write software that has database interaction as a bottleneck, you need programmers that understand the database interaction layer thoroughly. Whether that layer is a PHP/C#/Java ORM or plain old SQL requests is irrelevant—without knowledge of how data is pulled from the database, there will be no way to prevent or eliminate bottlenecks reliably.

The ORM system Foo can eliminate the need for SQL experts, but it creates the need for Foo experts instead. What is important, then, is whether it’s easier to find Foo experts or SQL experts.

Gremlin : jQuery Growl

I have uploaded Gremlin, a simple jQuery-based Growl system, for elegant page-wide notification needs. Check it out, it’s free and built to be simple.

Javascript signals

Signals operate as a simple way of decoupling dependencies within a project, by allowing caller-callee relationships through an interface that makes both parties anonymous. Assuming a shared signals object is provided, the receiver registers itself on that object:

signals.output = function(text){ alert(text) };

And the sender uses the registered channel to remotely execute that function:

signals.output('Hello');

Signals are the functional equivalent of object-oriented inversion of control, a technique that allows users to configure the behavior of third party code without having to modify it. This is done by removing any explicit dependencies of the third party code on specific behavior units, such as “output a piece of text”, and injecting those dependencies back from the outside as an object or set of objects which hide the actual implementation of those behavior units. Basically, we’re replacing:

function frobnicate(a,b) {
  foo(a);
  bar(b);
  alert('Success');
}

frobnicate(1,2); // Can't prevent the alert box from appearing!

With the slightly longer but easily configured:

function frobnicate(a,b,output) {
  foo(a);
  bar(b);
  output('Success');
}

frobnicate(1,2,function(t){alert(t)}); // Original behavior
frobnicate(1,2,function(){}); // Muted function
frobnicate(1,2,function(t){console.debug(t)}); // To firebug console

Since a given piece of code might depend on several distinct behavior units, I use a record to transmit all that behavior as a single argument. This results in the classic “configure my library with your options object” that can be found, among other places, in jQuery.
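A minimal sketch of that record, assuming two behavior units named output and log (the log unit and the quiet object are made up for illustration, and the foo and bar calls from the earlier snippet are left out so the example runs on its own):

```javascript
// Every behavior unit the function depends on arrives in one record.
function frobnicate(a, b, signals) {
  signals.log('frobnicating ' + a + ' and ' + b);
  signals.output('Success');
}

// A configuration that collects output in an array and mutes logging.
var shown = [];
var quiet = {
  output: function (text) { shown.push(text); },
  log: function () {}
};

frobnicate(1, 2, quiet);
```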

This simple approach causes a small number of difficulties:

  • If I want to use a slightly different version of a signals object for another part of the program, I have to manually create a copy of the object and change the copy (basically the equivalent of a pure functional object mutation).
  • In some situations, I might want to handle several callbacks for a single signal. The current approach only lets me define a single function for a given signal.
  • Some functions of the signal set (such as sending a form through AJAX) might rely on other functions of the signal set (display an error message) to handle their own behavior unit dependencies, and I would like those functions to automatically have access to the signal set they belong to, dynamically.
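To make the first difficulty concrete, here is roughly what the manual copy looks like with a plain record. The copyWith helper and both signal records are hypothetical names used only for this sketch.

```javascript
// Manual, non-destructive "mutation": copy every field, then override one.
function copyWith(record, name, value) {
  var copy = {};
  for (var key in record) {
    if (Object.prototype.hasOwnProperty.call(record, key)) {
      copy[key] = record[key];
    }
  }
  copy[name] = value;
  return copy;
}

var pageSignals = { output: function (t) { return 'page: ' + t; } };
var widgetSignals = copyWith(pageSignals, 'output', function (t) {
  return 'widget: ' + t;
});
```

The original record is left untouched, but every call site that wants a variant has to remember to go through the copy step.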

This leads me to a subtly different implementation of signals:

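var signals = (function(){
  // Root constructor; _c remembers which constructor built the instance.
  var s = function() { this._c = s; };
  // Adds a multi-receiver channel named c and returns the new signals object.
  s.prototype.channel = function(c) {
    var h = [],
        s = function() { for (var k = 0; k < h.length; k++) if (h[k]) h[k].apply(this,arguments); };
    // bind returns a handle that can later be passed to unbind.
    s.bind = function(f) { h.push(f); return h.length-1; };
    s.unbind = function(k) { h[k] = null; };
    return this.set(c,s);
  };
  // Returns a new object inheriting from this one, with field n set to v.
  s.prototype.set = function(n,v){
    var i = function(){ this._c = i; };
    i.prototype = new this._c();
    i.prototype[n] = v;
    return new i();
  };
  return s;
})();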

This small class encapsulates pure functional mutation semantics by means of its set function:

var signals = new signals();
var initial = signals.set('xxx',100);
var final = initial.set('xxx',200);
console.log(initial.xxx + ' ' + final.xxx); // Outputs '100 200'

This small piece of behavior is in itself quite helpful, but it gets better: if a function is added to the object, it remains there but is always executed within the context of the current object and therefore has access to its actual values.

var signals = (new signals()).set('show',function(){console.log(this.xxx)});
var initial = signals.set('xxx',100);
var final = initial.set('xxx',200);
initial.show(); // Displays 100
final.show(); // Displays 200
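The reason both behaviors work is prototype delegation: each new object inherits from the previous one and only overrides a single field. A stripped-down sketch of the same idea, using Object.create instead of the constructor juggling above (withValue, root, first and second are names made up for this illustration):

```javascript
// Returns a new object that delegates to `base` and overrides one field.
function withValue(base, name, value) {
  var derived = Object.create(base);
  derived[name] = value;
  return derived;
}

var root = withValue({}, 'show', function () { return this.xxx; });
var first = withValue(root, 'xxx', 100);
var second = withValue(first, 'xxx', 200);

// `show` is found on root through the prototype chain, but `this` is the
// object the call was made on, so each object reports its own value.
var shownByFirst = first.show();
var shownBySecond = second.show();
```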

Last but not least, it’s possible to create a full communication channel that can be connected to several receivers and forwards its arguments to all receivers. All receivers are called with the signals object as their context, which lets them access it and behave accordingly.

var unreadMessages = 0;
var signals = (new signals()).channel('setUnread');

// Update the number of unread messages, notify user if they have
// new messages.
signals.setUnread.bind(function(unread){
  if(unreadMessages < unread) this.notice('You have new messages!');
  unreadMessages = unread;
});

// Update all places that display the number of unread messages
signals.setUnread.bind(function(unread){
  $('.unread').html('Messages'+(unread > 0 ? ' ('+unread+')' : ''));
});

// When at page scope, notices are printed by growling
var global = signals.set('notice',growl);
global.setUnread(10);

// When inside a smaller scope, such as a component, display notices in
// a dedicated location
var local = signals.set('notice',function(arg){$display.html(arg)});
local.setUnread(15);
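The example above needs jQuery and a growl function to run. The channel mechanism on its own can be tried with a standalone sketch like the following, where makeChannel, ctx and the ping channel are hypothetical names and the handle returned by bind is assumed to be what unbind expects:

```javascript
// A standalone channel: calling it forwards the arguments to every receiver.
function makeChannel() {
  var handlers = [];
  var channel = function () {
    for (var k = 0; k < handlers.length; k++) {
      if (handlers[k]) handlers[k].apply(this, arguments);
    }
  };
  // bind returns a handle; unbind takes that handle back.
  channel.bind = function (f) { handlers.push(f); return handlers.length - 1; };
  channel.unbind = function (k) { handlers[k] = null; };
  return channel;
}

var seen = [];
var ctx = { name: 'ctx', ping: makeChannel() };

// Receivers run with the owning object as their context.
var first = ctx.ping.bind(function (x) { seen.push('a' + x + this.name); });
ctx.ping.bind(function (x) { seen.push('b' + x); });

ctx.ping(1);            // both receivers run
ctx.ping.unbind(first); // drop the first receiver
ctx.ping(2);            // only the second receiver runs
```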

Smart Spamming

I found an interesting comment on my website today, for the article on last-minute-skinning of a page in HTML from some Javascript. It looks pretty sane:

CT — October 5, 2009 at 22:15

Interesting stuff. I don’t relish the idea of taking the vile HTML our designers produce and creating the skin files. Nice proof of concept though – I’ll have to keep an eye out for an excuse to use it ; )

This comment, while completely adequate and relevant to the article, is spam. How do I know? First, the provided website is a classic credit-rating-improvement web portal. But should I prevent people who work in the credit spam industry from posting relevant comments on my articles? Well, there are other comments on that article, too, such as:

Tom Milsom — September 8, 2009 at 11:41

Interesting stuff. I don’t relish the idea of taking the vile HTML our designers produce and creating the skin files. Nice proof of concept though – I’ll have to keep an eye out for an excuse to use it ; )

So, it looks like the spam-bot found an earlier comment on the article, copied it verbatim, and posted it with a different link. This would ensure that, if the spam domain is fresh enough not to register as such, the Akismet spam detector would let the comment go through unscathed based on its content alone. And as a human, if I did not pay attention to the author’s website while reviewing comments, I would let it go through as well because the comment would look sane. I don’t remember comments from one month ago, and I guess many people don’t.
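A machine has no such memory problem. As a rough sketch of the kind of check that would have flagged this comment, assuming comments are plain objects with author and body fields (both the field names and the functions below are made up for illustration, and this is not how Akismet works):

```javascript
// Normalize a body so that case or whitespace changes do not hide a copy.
function normalize(body) {
  return body.toLowerCase().replace(/\s+/g, ' ').trim();
}

// Returns the earlier comment with the same body, or null if there is none.
function findCopiedComment(existing, incoming) {
  var key = normalize(incoming.body);
  for (var i = 0; i < existing.length; i++) {
    if (normalize(existing[i].body) === key) return existing[i];
  }
  return null;
}

var earlier = [
  { author: 'Tom Milsom', body: 'Interesting stuff. Nice proof of concept though.' }
];
var copied = { author: 'CT', body: 'Interesting  stuff. Nice proof of concept though. ' };
var fresh = { author: 'Someone', body: 'I tried this on a real project last week.' };

var matchForCopied = findCopiedComment(earlier, copied);
var matchForFresh = findCopiedComment(earlier, fresh);
```

A match would only be a reason to look at the comment more closely, since a real reader can also double-post by accident.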

Everyone enjoys advertising if they are looking for, or otherwise interested in, the product being advertised. I discovered Cushy CMS because it ran an ad on The Daily WTF, and I am quite happy with the discovery because I was looking for such a product. And nobody enjoys advertising for products they don’t need—I don’t give a cheese about US credit ratings. I have limited space on my screen that I’d rather not fill up with advertising about things I do not need, and my time is even more precious than that.

This spam comment blurs the line between spam comments that are irrelevant to the discussion and point to websites irrelevant to the readers, and ham comments that are relevant to the discussion and point to websites that are relevant to the readers (by virtue of usually being run by the author of the comment and thus sharing at least some elements).

Suppose that tomorrow, someone posts an original and interesting comment on one of my articles, yet links it to a credit rating website. Should I accept the comment as such, block it, or publish it without the link?

One of the main reasons why people comment on the blogs of other people is to improve their visibility on the internet. If I post a comment on a well-known blog, hundreds and thousands of people will browse over that comment, a small percentage of these will find my writing worthy enough to follow the link and end up on my blog, and an even smaller percentage will become regulars, posting comments and subscribing to my feeds. Which is good, of course, because the more comments I get on my blog, the more interesting it becomes.

This means that commenting is often quite similar to advertising one’s own blog or website. People allow commercial advertising on their blogs (ad banners and such) to get money in return, and they allow personal blog/website advertising on their blogs to get comments in return. So, I guess if an irrelevant website was linked to by a genuinely interesting comment, I would publish that comment (of course, restrictions do apply: I would not allow all websites, just like I would not allow all ad banners).

I like the blogs with good comment advertising—where I can browse the comments and find links to interesting websites.


