Author Archive for Victor Nicollet

Team Naming

Names. We programmers see more names in a single session than a phone directory editor will see in their entire career, yet we prove worse at finding names than a fift

Naming is a two-way approach: the name must accurately convey what the thing is, and the name should be easily guessed for that thing. The two sides of the equation are not always of equal importance: guessing the name of a local variable is less useful than guessing the name of a class in a library.

Humans always use context to understand what names mean, in order to disambiguate the many possible meanings of a name. For instance, ‘window’ could refer to the ubiquitous user interface concept or it could refer to the glass-paned house building block. A sentence like “I open a window” needs a minimum level of context to disambiguate between the two interpretations.

On the other hand, the information must not be made redundant either. Take a class named “OpponentTimer” defined within an “Opponent” namespace: it’s fairly obvious that the timer is related to an opponent both within the namespace (you’re dealing with opponents, so the timer should have something to do with it) and outside the namespace (as it’s being referred to as “Opponent.Timer” or something like that). The same goes for file paths, such as ‘/scripts/invaderScript.py’, which could just as well have been named ‘/scripts/invader.py’ with no loss of information due to the context.

This is what I used to think about this issue :

One thing I have noticed time and time again is that the vast majority of people I work with (or see on the internet, for that matter) are very bad at finding names. So bad, in fact, that I can usually propose better names within seconds of reading them for the first time. At least they agree that the new names are better.

The reason is, in retrospect, quite obvious : two brains are better than one, especially when it comes to looking at things in different contexts to determine if there are any ambiguities. These programmers must have been thinking the same thing when looking at my code.

By now, you should have noticed that “team naming” refers to “working as a team to name things” as opposed to “naming a team”—lack of context does tend to create such misunderstandings :)

So, that would be why pair programming with at least one -ansi -pedantic -Wall programmer in the team tends to create code that is much cleaner than one-programmer code written by either participant.

Short of acquiring some sort of split personality, there’s no easy way to achieve that alone : no matter how hard you try, your brain can only hold one context at a time. Some programmers might be able to switch contexts faster than others when they think about it, but you generally don’t switch contexts when naming a variable. Maybe we should?

Even then, noticing an ambiguity involves thinking about two contexts where the name has different meanings. Merely having two contexts in mind (or minds, when working as a team) doesn’t mean you actually found two incompatible contexts. You have to think about all the contexts in which the element can be used. The good news is, all of these are nested and you can reach them by removing information progressively from the innermost context that you have in mind. If your code was laid out correctly, these should match scopes, classes and namespaces/packages.

Global side-effects

Functional programming languages do not allow global state—they don’t really allow any kind of state, but at least local state can be simulated by function arguments. The only way to handle “global” state would be to pass the global variable as an additional parameter (and additional return value) to every function. And it wouldn’t play nice with exceptions either, so every exception would have to carry the values of all global variables.

Besides, global state has the annoying habit of creating hidden dependencies between program parts, which lead to coupling that is hard to break and code that is not re-entrant.

On the other hand, global variables have the clear benefit of making code shorter by making many dependencies implicit. Why pass around data as an argument or a member variable if making it global works just as well?

Some kind of middle ground here would be self-propagating implicit function arguments and return values.

Such arguments would have to be declared at global scope, since they have to be globally accessible. On the other hand, since they are merely arguments, they have no value until a function is called with one, which means the declaration would probably end up looking like:

channel total : int

This would declare an implicitly propagated integer argument named “total”. Then, functions could be made to read and write to that channel implicitly as if it were a normal variable.

let rec sum = function
  | [] ->
      ()
  | head :: tail ->
      total <- total + head ;
      sum tail

Since the presence of the channel can be determined statically (it happens at name resolution time), it can also be made a part of the function’s type signature, which has the double benefit of allowing compile-time checking of channel usage and letting the programmer know which channels must be provided for a function to work:

sum : int list [total] -> unit

The presence of a list of channels before a function means the function must be called in an environment where the channel is available. This means either the environment’s type is inferred to contain that channel, or a conflict appears and the channel must be explicitly defined:

let a = 0 in
bind a as total in
  sum [ 1 ; 2 ; 3 ] ;
  print_int a ;
  sum [ 4 ; 5 ; 6 ] ;
  print_int a

The bind instruction creates an environment where the named variable or expression is automatically updated in the local scope (as if the principles of assignment I explored last week were applied to it). That is, every time a function call writes to the channel, the modification is propagated back to the bound variable as soon as the function returns (and the channel always reflects the value of the variable). So the example above would print 6 and 21.

Channel implementation is fairly straightforward : every function that accesses a channel takes an implicit final argument representing the input value of that channel (if the channel is read) and returns an implicit value (if the channel is written to). That value is then locally bound, in the calling function, to either a similar constructor that propagates the use of the channel, or an explicit bind operation for the channel.
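
To make the implementation strategy concrete, here is a rough sketch of what the desugared code could look like in plain OCaml. It assumes a single int channel, threaded as the last argument and returned next to the normal result; the function demo is a hypothetical stand-in for the bind example above, with the printing replaced by returned values.

```ocaml
(* Hand-desugared version of the channel-based [sum]: the [total]
   channel becomes an extra final argument and an extra return value. *)
let rec sum (l : int list) (total : int) : unit * int =
  match l with
  | [] -> ((), total)
  | head :: tail ->
      let total = total + head in (* total <- total + head *)
      sum tail total

(* Hand-desugared version of [bind a as total in ...]: after every call
   the returned channel value is rebound to [a]. *)
let demo () : int * int =
  let a = 0 in
  let (), a = sum [ 1; 2; 3 ] a in
  let first = a in
  let (), a = sum [ 4; 5; 6 ] a in
  (first, a)

let () =
  let first, second = demo () in
  Printf.printf "%d %d\n" first second
```

Running this prints 6 and 21, matching the values predicted for the bind example.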

Local side-effects

In pure functional languages, variables cannot be modified. In order to perform operations that imperative programmers achieve with variable modification, functional programmers perform re-assignment:

// Imperative
x = sqrt(x);

(* Functional *)
let x = sqrt x in ...

And in the case of a loop, they represent the loop body as a function that is called with different arguments, and the modified value is propagated through the calls as an argument:

// Imperative
x = 0;
for (i = 0; i < 10; ++i)
  x += i;

(* Functional *)
let x = 0 in
let rec loop i x =
  if i = 10 then x else loop (i+1) (x+i)
in
let x = loop 0 x in ...

So, can local modification of variables be allowed in a pure functional context through simple rewriting rules that turn the imperative constructs into their functional counterparts? Yes, although there are some limits.

Simple assignments

We know that there might be assignments within every expression (an assignment is an expression of the form x ← <V> ; <E>). The tactic used here is to turn every expression <E> into an expression of the form:

let [E'] in x1 ← v1 ; x2 ← v2 ... ; e

such that no sub-expression within the “let …” contains an assignment (and is therefore a plain old pure functional language expression) and only variable names appear after the “in …” . This rewriting tactic is applied recursively : atomic expressions (those without sub-expressions) are trivially written as such, which leaves only the question of expressions with sub-expressions that have already been recursively turned into the above format.

The key is to turn op(<A>, <B>, <C>, <D>) into:

let [A'] in
let xa* = va* in
let [B'] in
let xb* = vb* in
let [C'] in
let xc* = vc* in
let [D'] in
let e = op(a,b,c,d) in
xa1 ← va1 ; xa2 ← va2 ; ... xb1 ← vb1 ; ... ; e

The order of the sub-expressions is fixed (which means an order of evaluation has to be specified for every operation). Every expression computes all its values (including the assigned ones) and the assignment is simulated using redefinition of the variable so that subsequent sub-expressions can “see” the modified variable. The actual assignments are then pushed to the end of the complete expression so that the recursive rewriting rule will see them from the superexpression above.

Note that expressions which do not always evaluate all sub-expressions cannot be expressed as above. Fortunately, all such expressions can be rewritten as a conditional and expressions that evaluate all sub-expressions, and conditionals are handled below.

Note that a lambda expression is considered to be an atomic expression here, so no propagation of assignment occurs from within the anonymous function to the surrounding context! This means (as expected) that the assignments are a purely local construct that cannot cross function barriers, so I simply remove them when they reach the top level to obtain a normal pure functional expression.

This performs the transform:

(* Imperativeish *)
let x = 3.14 in
x ← sqrt x ; print_float x

(* Functional *)
let x = 3.14 in
let v = sqrt x in
let x = v in
print_float x
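
A slightly larger instance may help: an addition whose two operands both assign to x. The sketch below is the rewriting done by hand in plain OCaml, assuming left-to-right evaluation of the operands; the names va, vb, a, b and e follow the scheme above, and result is a hypothetical name for the whole expression.

```ocaml
(* Imperativeish source (hypothetical syntax):
     let x = 1 in (x <- x + 1 ; x) + (x <- x * 2 ; x)           *)
let result =
  let x = 1 in
  (* sub-expression A: compute the assigned value, then redefine x *)
  let va = x + 1 in
  let x = va in
  let a = x in
  (* sub-expression B sees the x redefined by A *)
  let vb = x * 2 in
  let x = vb in
  let b = x in
  (* the operation itself, applied to the two pure results *)
  let e = a + b in
  (* top level reached: the pending assignments to x are dropped *)
  e

let () = Printf.printf "%d\n" result
```

Here a is 2 and b is 4, so the expression evaluates to 6.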

Loops and conditionals

Loops and conditionals are special cases of block-based expressions. A block is a language construct that looks like a lambda expression (it gathers all values from the surrounding scope) and may be executed zero, one or several times. The main difference is that a block cannot be saved for later execution: it is always executed at a specified time. In short, a block is a beta-redex. Since we have the guarantee that the block is executed before the current context resumes, we can let it alter the state of the current context.

For every block, I select a set of variables that the block may alter (although it does not necessarily do so). The block itself is syntactically an expression, so I can rewrite its internal assignments as above by moving them all to the top level of the expression. Then, I turn the block into a closure which takes as arguments the aforementioned set of variables, and returns a pair that contains the result of the expression and the final values (after assignment) of the set of variables. In short:

(* Imperativeish *)
{
  a ← 1 ;
  b ← b + 2 ;
  a + b
}

(* Functional *)
fun (a,b,c) ->
  let a = 1 in
  let b = b + 2 in
  a + b, (a,b,c)

I can add completely unused variables (like “c” above) to the set of variables simply because another branch of the construct (usually a conditional) may use that variable as well, and I need both blocks to be functions of the same type.

Then, by transforming any blocks into functions, a conditional follows the rewriting rule :

(* Imperativeish *)
let r = if cond then A else B in ...

(* Functional *)
let r, (a,b,c) = if cond then A(a,b,c) else B(a,b,c) in ...

Loops work in the same way :

(* Imperativeish *)
while c do
  A
done ; ...

(* Functional *)
let rec loop (a,b,c) =
  if c then
    let _, (a,b,c) = A (a,b,c) in
      loop (a,b,c)
  else (a,b,c)
in let (a,b,c) = loop (a,b,c) in ...
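
As a concrete instance, consider summing the integers below 10 with a while loop that assigns to both i and x. The following is a hand-rewritten sketch in plain OCaml; body and loop are hypothetical names, and the loop condition is written as an expression over the threaded state (i < 10) rather than as a single variable c.

```ocaml
(* Imperativeish source (hypothetical syntax):
     while i < 10 do x <- x + i ; i <- i + 1 done                *)

(* The block: takes the variables it may alter, returns its own
   result (unit) paired with their final values. *)
let body (i, x) =
  let x = x + i in
  let i = i + 1 in
  ((), (i, x))

(* The loop: checks the condition on the current state and threads
   the state through each execution of the block. *)
let rec loop (i, x) =
  if i < 10 then
    let (), (i, x) = body (i, x) in
    loop (i, x)
  else (i, x)

let final_i, final_x = loop (0, 0)

let () = Printf.printf "i = %d, x = %d\n" final_i final_x
```

The loop ends with i at 10 and x at 45, just as the imperative version would.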

Records

All of the above only handles assignment to variables. What about assigning to records?

It is of course impossible to alter a record held by someone else. However, if the record is stored in a local variable, then it is possible to change the local variable to take this into account.

The rewriting rule is quite simple, and recursively turns a complex assignment (assigning to a record field) into a less complex one:

x.label ← y   becomes   x ← { x with label = y }
var           remains the same
anything else causes an error

So, this rule would perform the following transform:

(* Imperativeish *)
x.owner.details.name ← "boris" ; ...

(* Functional *)
let x =
  { x with owner =
    { x.owner with details =
      { x.owner.details with name = "boris"} } }
in ...
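
For readers who want to run it, here is a self-contained version of that transform. The record types and the initial values ("anna", 30, 7, "crate") are made up for the example; only the nested { ... with ... } expression comes from the rule above.

```ocaml
(* Hypothetical record types, just deep enough for the example. *)
type details = { name : string; age : int }
type owner = { details : details; id : int }
type item = { owner : owner; label : string }

let x =
  { owner = { details = { name = "anna"; age = 30 }; id = 7 };
    label = "crate" }

(* x.owner.details.name <- "boris", rewritten as a rebinding of x.
   Every field that is not on the path to [name] is copied unchanged. *)
let x =
  { x with owner =
      { x.owner with details =
          { x.owner.details with name = "boris" } } }

let () = print_endline x.owner.details.name
```

Only the three records on the path to the modified field are rebuilt; everything else is shared with the old value.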

The same approach can be applied to most other assignment operations (array, string, hash table).

Expect the Unexpected

When looking at a function declaration, there are several levels of abstraction one can use to describe what that function does.

The actual action of that function is what really happens. This includes any bugs the function may contain and any undocumented behavior that is subject to change in later versions.

The documented action of the function is what the author of the function intended to do with that function. This includes a complete description of what the function should reasonably be expected to do, what conditions may trigger an error, and what external factors may affect the outcome.

The expected action of the function is what the user of the function expects the function to do. This is the action that matters most of the time, since there are often many users for every function.

In an ideal world, all three actions would be identical: the author implemented the function to do exactly what was documented and the documentation covers all behavior and explicitly marks all unspecified elements, the user has read the documentation and understands it completely.

In the real world, those actions are all different. The difference between the actual action and the documented action is either a bug (the function does not behave as documented) or the documentation being too vague and leaving things implicitly unspecified. The difference between the expected action and the documented action happens because the user has not read, or understood, all the nuances of the function’s behavior as described in the documentation.

Breaking the Mental Model

The classic example of the latter difference in understanding is the strtolower function:

When we convert the string “integer” to upper and lower case in the Turkish locale, we get some strange characters back:

"INTEGER".ToLower() = "ınteger"
"integer".ToUpper() = "İNTEGER"

The user is not aware that strtolower depends on the current locale, because their mental model of the strtolower function turns every uppercase letter of the occidental latin alphabet into its corresponding lowercase letter in that same alphabet. Of course, this is not what happens, and there is no way of “getting” this fact straight without thoroughly reading and remembering the entire documentation of the strtolower function.

The best we can do, as function authors, is to make it woefully obvious to users of that function when they misunderstand the function.

But, you say, the only way to detect most non-trivial function misuses is through complete testing, and it’s quite probable that the user will not think of the test cases that would break their mental model!

This is correct, and this is precisely why I said misunderstand and not misuse. Determining whether or not a function is used correctly is something that the user can do quite easily once they get a correct mental model of that function, so we’ll let them do exactly that. The point here is to make the function as hard to use as possible when you don’t understand it completely.

Consider the strtolower function. If you don’t understand that locale can affect the operation performed by that function, then you are going to get things wrong. A nice way to ensure you understand this is to make the locale a mandatory argument of the function. By telling the user “you need to specify a locale before using this function” you are breaking the mental model of any user that expected the function to be locale-independent, and that is a good thing.
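
A minimal sketch of what such a signature could look like in OCaml, using a labeled argument that has no default. The locale type and the to_lower function are hypothetical, handle ASCII input only, and know about a single Turkish rule (capital I becomes dotless ı, emitted as UTF-8); a real implementation would delegate to a proper Unicode library.

```ocaml
(* Hypothetical locale type: only the two cases needed for the example. *)
type locale = Invariant | Turkish

(* Lowercase an ASCII string. The [~locale] label is not optional, so
   every call site has to say which rules it wants. Output is UTF-8. *)
let to_lower ~(locale : locale) (s : string) : string =
  let buf = Buffer.create (String.length s) in
  String.iter
    (fun c ->
      match locale, c with
      | Turkish, 'I' -> Buffer.add_string buf "\xc4\xb1" (* dotless i, U+0131 *)
      | _, c -> Buffer.add_char buf (Char.lowercase_ascii c))
    s;
  Buffer.contents buf

let () =
  print_endline (to_lower ~locale:Invariant "INTEGER");
  print_endline (to_lower ~locale:Turkish "INTEGER")
```

Writing to_lower "INTEGER" without a locale yields a function that is still waiting for its locale rather than a string, so the omission shows up as a type error at the point where the result is used.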

Exceptional Situations

There is an interesting gradient of mental-model-breaking in the handling of exceptional situations:

Handling Method                             Always   When fails
No handling (ASM, C++ undefined behavior)   No       No
Return codes (C APIs)                       Weak     Weak
Exceptions                                  Weak     Strong
Java Exceptions                             Medium   Strong
Type System                                 Strong   N/A

Here, I’m discussing the ability of a given handling method to break an incorrect mental model in two situations : “always” means whenever the function is used, “when fails” means whenever the function is used incorrectly in a fashion that interrupts the normal course of execution.

When the function is used, the existence of exceptional situations is mentioned as weak (only in the documentation), medium (compiler error that is not very specific) or strong (specific, reliable compiler error). When a failure occurs, the result is weak (depends on user action) or strong (independent of user action).

As such, using the type system appears to be the strongest means of describing the existence of exceptional situations. How?

In a functional language, every function returns a result. There is no point in computing a result unless that result is used, which means every function result is used somewhere in the code. As such, having functions that may encounter errors return an “Error or Success” type forces the user of the function to handle the possibility of an error before they get the result.

This is precisely how Objective Caml avoids the very possibility of a “null reference” runtime error : the option type has to be explicitly turned into a value, which means that pattern matching must be used and therefore the null case has to be handled as well:

let frobnicate option =
  match option with
    | Some value -> work_with value
    | None -> work_without_value ()
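
The same idea extends from option to an explicit “Error or Success” type. Below is a minimal sketch using the result type from OCaml’s standard library; parse_port and describe are hypothetical names invented for the illustration, and the error messages are arbitrary.

```ocaml
(* A hypothetical parser for a TCP port number that can fail in two ways. *)
let parse_port (s : string) : (int, string) result =
  match int_of_string_opt s with
  | None -> Error ("not a number: " ^ s)
  | Some n when n < 1 || n > 65535 -> Error ("out of range: " ^ s)
  | Some n -> Ok n

(* The caller cannot reach the int without going through the match,
   and the compiler warns if the Error case is left out. *)
let describe (s : string) : string =
  match parse_port s with
  | Ok port -> "port " ^ string_of_int port
  | Error msg -> "error: " ^ msg

let () =
  List.iter
    (fun s -> print_endline (describe s))
    [ "8080"; "http"; "70000" ]
```

Because the return type is (int, string) result and not int, the existence of the failure cases is visible every time the function is used, not only when it fails.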

Dealing with Programmers

The problem is that programmers are humans and humans are lazy. Nobody wants to spend additional time designing the type of a function just to prevent misunderstanding of that function (unless it’s an API, of course) and nobody wants to have to type an additional argument to a function.

In fact, the entire convention over configuration philosophy relies on the idea that programmers should have to make as few decisions as possible. But adding default values for every argument is dangerous if programmers are not aware that those arguments exist—choosing a sane default value implies that such a value exists and is the one most programmers have in their own limited mental models for that behavior.

And if no consensus exists, using a default value is impossible: a programmer would expect strtolower to work in the current locale by default, while another would expect strtolower to work in an invariant locale by default. Choosing a default locale means that one of these two programmers is wrong and leads to bugs. It certainly is the programmer’s fault for not reading the documentation properly, but one could argue that a successful library is one that produces great results even in the hands of less competent programmers.

Do You Care?

As I mentioned earlier, I use different e-mail addresses for every website that asks me for one. These look like victor-{website}@nicollet.net and are all redirected to the same inbox until I decide I get too much spam from them. In other news, I recently gave one such address to The Motley Fool (a financial information website) and it predictably ended up being the number one source of spam in my inbox. Get cancer and die, Fool.

Non-technical people have asked me whether such an address (namely, one that contains a hyphen) is valid. The answer is that of course, a dash is a valid character in an address (just like _, + and $ for instance) and therefore every sane MTA around the globe should be able to deliver things to my address.

Apparently, Yahoo! does not agree:

Darn you, Yahoo!, now I have to reconfigure the internet.

So, what just happened here? Yahoo! does not want me to enter an invalid alternate e-mail and therefore sets up an invalid e-mail detector. And a false positive happens.

I hate false positives. Being allergic to some kinds of pollen, I have experienced the devastating effects of false positives in my own immune system. Someone (or something) is trying to be smart, but they are not, and it happens in a way that is obvious and frustrating. That this verification is utterly useless only adds more to the frustration.

What is Yahoo! trying to do here? I can see three possible explanations :

Trying to be smart

Maybe a pointy-haired boss thought “everyone validates fields” and asked for all fields to be validated even when it wasn’t necessary. Maybe a developer thought “validating all fields is a clever challenge”. Maybe the underlying libraries include a “mail verification” routine that was programmed by an intern. Either way, the bottom line when you have an opportunity to be smart is, you better be really smart, or you’ll end up hurting yourself. There is no such thing as “pretty clever” when your code has to serve millions of people.

Making sure every account has a valid e-mail

Nobody trusts free e-mail in the business world. Posting anything even remotely related to business from a hotmail or yahoo address screams “amateur” unless you’re in an industry where merely having an address is unusual. The exception here would be gmail, which merely screams “my company can’t afford a domain”, but then again all our base is belong to google.

So it should be no surprise that providers of free e-mail would require at least some reassurance that the person creating the account is real. For instance, that they already have an e-mail address (never mind the possibility of confirming account A with account B and vice versa, leaving no trace of my actual identity).

But a mere syntactic verification is useless. I could write mickey1@mouse.com and then increment the “1” until I ended up with a unique address that the system would accept. All you have done is delay the evil scammer for a few minutes, but the scammer doesn’t care because that’s just what his job is. But in the meantime, you got the syntax check wrong and hindered legitimate users that have other things to do with their time than changing their e-mail address so that they can get a Yahoo! account.

To weed out scammers and invalid addresses, it is necessary to send an e-mail to that address and have the user click on a confirm link. That is the one and only way to tell if an e-mail address is valid.

But once you start doing this level of verification, it suddenly becomes quite useless to do any other verification: you already have 0% false positives and 0% false negatives, so adding another test can only increase the probability of a false positive, with no other benefit. Just accept the address as-is and start the verification workflow.

Making sure the user did not mistype their e-mail

I tend to read lists of e-mail addresses as part of my job, and the typical foobar@qux;com is a staple of French keyboards (‘.’ is ‘shift’ + ‘;’). Needless to say, if a user mistypes their password recovery e-mail, they’re in for a world of pain.

However, the correct approach to this issue is to provide a helpful warning, not an error message. Not only do you eliminate the risk of false positives in your regular expressions ever negatively affecting a user’s experience (like mine) but you can afford voluntarily introducing false positives that correspond to common mistakes but are not necessarily mistakes, thus making the feature even more helpful.

Instead of a nasty “Invalid E-Mail Address” message that begs the question “Who are you to decide that my e-mail address, hosted on my e-mail server and my domain, is invalid?”, a simple “You may have mistyped your address” warning that does not prevent submitting the form would be most welcome.
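
The difference shows up in the type of the check. A rough sketch in OCaml: the hypothetical email_warning function returns an optional warning and has no way to reject anything, so the form is submitted either way. The two heuristics and the sample addresses are illustrative only, not a real address grammar.

```ocaml
(* Returns [Some warning] for input that looks like a common typo and
   [None] otherwise. It never rejects: the caller submits the form
   either way and merely displays the warning, if any. *)
let email_warning (address : string) : string option =
  if not (String.contains address '@') then
    Some "You may have mistyped your address: it has no @ sign."
  else if String.contains address ';' then
    Some "You may have mistyped your address: did you mean '.' instead of ';'?"
  else None

let () =
  List.iter
    (fun address ->
      match email_warning address with
      | None -> Printf.printf "%s: accepted\n" address
      | Some w -> Printf.printf "%s: accepted, with a warning: %s\n" address w)
    [ "someone-shop@example.org"; "foobar@qux;com" ]
```

Since a false positive costs the user nothing more than reading a sentence, the heuristics can afford to be aggressive about common typos.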

I can still remember the good old days when my computer asked me “Are You Sure?” whenever I tried to do something smart. Now, it just tells me “You Can’t Do That”, without the HAL 9000 voice.
Don’t believe me? Think how many lines of code you need to kill the operating system now, versus how many you needed in the good old days—the worst I managed was outside-allocated-memory access with CUDA.

I would argue that enterprise workflow systems push the “You Can’t Do That” logic to its final conclusion: anything out of the ordinary needs moderator intervention (if it is possible at all). This is both harder to program (as you have to clearly express what is ordinary) and harder to use in a pinch where something unusual must be done for the greater good. By contrast, a few permissive systems do exist : if what you’re trying to do can be undone then you are always allowed to do it, and a moderator is then notified about it and may choose to reverse your operation. Of course, some things cannot be undone (viewing or showing restricted information to someone, sending an e-mail to someone, and so on) and therefore require ex ante approval, but most tasks in a computer system are reversible.

Once you taste the pleasure of a “do first, be moderated later” system, it’s hard to go back to “your post will be online once it’s moderated”. Think about what Wikipedia would look like if it applied ex ante moderation…

So, unless you’re facing a critical situation, always give your users the benefit of doubt and perhaps a warning…

Empty Lists

We have all written this code before :

<ul>
  <?php foreach ($list as $element):?>
    <li><?=htmlspecialchars($element)?></li>
  <?php endforeach; ?>
</ul>

What happens when the list is empty? What is generated is an empty UL element :

<ul></ul>

This would be perfectly fine, if it wasn’t completely wrong. Quoth the XHTML DTDs (any of them) :

<!ELEMENT ul (li)+>

There must always be at least one list item in a list (what kind of insanity would have led to preventing empty lists from existing is beyond me, although I’m certain they must have had a good reason), which means a document will not validate if it contains the aforementioned empty UL element. This is also the case for HTML 4, though HTML 5 does currently allow empty lists.

So, to circumvent the empty list case, the code becomes:

<?php if (count($list) > 0): ?>
  <ul>
    <?php foreach ($list as $element): ?>
      <li><?=htmlspecialchars($element)?></li>
    <?php endforeach; ?>
  </ul>
<?php endif; ?>

While it might be possible to abstract these details away behind a function that prints a list of elements, the ultimate point of such an abstraction would be to free the developer’s mind of the issue of empty lists not being allowed in XHTML. And such a thing would be ill advised : since the correct behavior is to remove the empty list from the document, the developer should be aware that no UL element will be generated for an empty list, especially since this has implications on the CSS side (which has to accommodate the absence of the list) and the JavaScript side (which has to create the element if it doesn’t exist before adding elements to it).

An important quality of any developer is their ability to identify and handle any corner cases of their domain. An important quality of any domain is to have as few corner cases as possible.

Semantic & Symbolic

Our brain interprets variables in two different fashions : semantic and symbolic. Semantic understanding means understanding the meaning of the variable’s name, where long and detailed names like “newCustomers” convey information in plain English and shorter names like “i” and “x” convey information through conventions. Symbolic understanding relies on recognizing the shape of the variable in several locations in code, and deducing from there its actual meaning—in itself, the name is not relevant, your brain just goes “hey, that’s the same variable”. Of course, there is some amount of semantic recognition to symbolic variables, usually because we understand the variable as being a symbol.

Mathematics make much use of symbolic recognition. After all, mathematicians do not write f(number) = 10 × number, they write f(x) = 10x and while one-letter variables do have some amount of semantics associated to them (i,j,k,m,n are integers, x,y,z are reals, p is a prime number, q and r are rationals, f,g,h are functions, t is often seen as time, d is a divisor, P is a predicate or a polynomial) this minimal amount of information is ridiculous when compared to the huge amounts of purely symbolic information one gathers from the use of the letter.

Symbolic recognition works better when the expression is small. In descending order of readability when you’re familiar with the language,

Mathematical notation:

ƒ : A → ∃n∈A. 2|n

Objective Caml:

let f(a) = List.exists (fun n -> n mod 2 = 0) a

C++:

bool f(const std::vector<int> &a) {
  for(std::vector<int>::const_iterator it = a.begin(); it != a.end(); ++it)
    if (*it % 2 == 0) return true;
  return false;
}

I guess there are two things one can learn from this.

First, in a terse language, symbolic recognition works better, which in turn means the programs can be even more terse while retaining their understandability.

Second, don’t bother with long variable names in a two-line function if all the information present in the name can be readily and easily deduced from the two lines in the function.

JavaScript Component Tutorial

Earlier this year, I ranted about how the graceful degradation model of jQuery made it hard to create complex components. Also, while working with a team on JavaScript components, I had to review all my previous takes on JavaScript architecture in order to build conventions that an entire team can follow.

Namespacing

To avoid collisions with other libraries, I create an object that uses a name I own. Any namespace name strategy is possible here, from Java-like netNicolletCheese (if you have a common project name) to just cheese (if you have a project name that’s fairly unique). Then, any code I write goes into that namespace. I may further add sub-namespaces if I have a lot of code. Either way, you have to make sure the namespace exists before adding things to it, thus I add at the top of every file:

if(!('netNicolletCheese' in this)) this.netNicolletCheese = {};

The basic idea is that since I don’t know what order my files will be defined in, I have to define the namespace in every single one, while avoiding redefinition. This way, I can include files on an if-needed basis or stick them all together and remove any occurrences of the namespace line except the first.

Then, everything is defined as members of that object. Executing any kind of code in library files is forbidden: only function and object definitions are allowed.

Components

A component is a class that contains data and renders itself somewhere on the page. This is different from the jQuery model of graceful degradation that assumes the rendered data is already present on the page and merely changes its layout. Use with caution, since this loses many benefits of graceful degradation like accessibility or search engine friendliness.

A component is always created as follows:

instance = new namespace.component(selector, data, options);
  • It’s always assigned to a variable. It’s a global variable if it’s defined at global scope (obviously, this may only happen in the code on a page, not in library code), and a public member variable of another object if it’s defined within an object. There are no free-floating components, every single one must be accessible from global scope as this makes command-line debugging way easier, and keeps the structure easier to see.
  • It has a first argument, which is a selector (in the jQuery sense). It will be fed to $(…) in order to get the target elements of the component (usually a single one). The typical behavior of a component is to generate some HTML from its internal state and call $(selector).html(…) to display the HTML. The selector is evaluated when the constructor is called, which means you may have to wrap the object initialization in a $(document).ready(…) to wait for the DOM to be instantiated. It also means adding any elements matching the selector later on won’t have any effect on the component.
  • It has a second argument, which is the data used to initialize the component. For instance, if the component is intended to display a list of elements, the data argument would be that list in JSON notation. This makes it easy to generate that data on the server side using one of the many JSON generators, while also making the component easy to instantiate on the client side programmatically.
  • It has an optional third argument, which represents the options that one may provide the component with (such as width, height, speed, effects, and so on). If it’s not part of the main data argument, it’s part of the options. The options are a classic JS record.
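
As a rough sketch of how the data and options arguments might be told apart, assume a hypothetical list component with made-up option names (width, speed) and made-up defaults. Rendering is left out so the snippet runs without a page. Object.assign is used for the merge here, though jQuery’s $.extend would do the same job.

```javascript
var namespace = {};

// Hypothetical component: `list`, `width` and `speed` are illustrative names.
namespace.list = function (selector, data, options) {
  // The data argument is the content itself: here, the list of elements.
  this.items = data;

  // The options argument tweaks presentation; missing keys fall back to defaults.
  this.options = Object.assign({ width: 200, speed: 'fast' }, options);

  // A real component would now call this.render(selector).
};

// The data could come straight from a server-side JSON generator.
var instance = new namespace.list('#groceries', ['milk', 'eggs'], { width: 320 });

// The third argument is optional: every option keeps its default value.
var plain = new namespace.list('#todo', ['write code']);
```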

Component Initialization

The component is instantiated either when the document is ready, by placing the initialization code in the appropriate event, such as :

var page = {};
$(function(){ page.instance = new namespace.component(selector, data, options) });

Or it can be instantiated inside another component at an appropriate time.

The constructor itself consists of two distinct operations :

  • Set up any member variables representing the internal object state, using the data argument and options argument.
  • Render the object so that it appears on the page, using the rendering function, and passing the selector to it:
    this.render(selector);

    Note that a component may be created without a target selector, simply by using an empty array as the selector. It will remain unrendered until its render function is manually called with a valid selector as its argument.

Component Rendering

The render function is called during initialization. It’s also called whenever the entire component needs to be redrawn. Some components are small, and are redrawn every time, while other components may choose to only redraw parts of their contents and may therefore use other rendering functions for those parts. The rendering function reliably performs up to six operations:

  • It initializes the target, if it was provided. This lets the calling code change the rendering target dynamically.
    if (typeof(selector) != "undefined") this.$target = $(selector);

    This is generally useful when a component contains other components : a full rendering of the container means the target DOM elements of the inner components have been destroyed and created anew, and the container must therefore notify the inner components that they have a new target to render to.

    Note that the name of the target is always the same: for any component, component.$target is the current target of the component.

  • It optionally determines whether there is a target to begin with, to avoid unnecessary work. This usually takes the form :
    if (this.$target.get().length == 0) return;

    In the case where a component is inside a container, the container will create the component before rendering itself (to make things simpler, rendering assumes all sub-components already exist), and therefore provide an empty array as the selector.

  • It generates the full HTML for the component as a string.
  • It inserts the HTML into the DOM, replacing anything that previously existed. This usually happens as:
    this.$target.html(theGeneratedHtml);
  • It changes the rendering target of any sub-components and tells them to render themselves. This is usually written by extracting the correct targets from its own target and converting them to an array of DOM elements:
    this.subComponent.render(this.$target.find('.subComponent').get());
  • It sets up any relevant events on the generated DOM. For instance, if the generated HTML contains a button, the button’s click event may be set to an event handler:
    this.$target.find('button').click(this.onButtonClick)
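
The first four steps, including the empty-array trick for creating a component without a target, can be sketched as follows. To make it runnable without a browser, the snippet replaces jQuery with a tiny stand-in $ that only knows get() and html(), and uses plain objects as fake elements. The stand-in and the Label component are both my assumptions for illustration. Sub-components and events are left out.

```javascript
// Stand-in for jQuery (an assumption, so the sketch runs without a browser):
// wraps an array of fake elements and supports only get() and html().
function $(elements) {
  return {
    get: function () { return elements; },
    html: function (markup) {
      elements.forEach(function (element) { element.innerHTML = markup; });
      return this;
    }
  };
}

// Hypothetical component that displays a piece of text.
function Label(selector, text) {
  this.text = text;
  this.render(selector);
}

Label.prototype.render = function (selector) {
  // Change the target, if one was provided
  if (typeof selector != "undefined") this.$target = $(selector);

  // Early-out if there is nothing to render to
  if (this.$target.get().length == 0) return;

  // Generate the HTML and insert it
  this.$target.html('<span>' + this.text + '</span>');
};

// Fake elements: plain objects with an innerHTML property.
var first = { innerHTML: '' };
var second = { innerHTML: '' };

// Created with an empty array, the label stays unrendered.
var label = new Label([], 'hello');

// Rendering with a target displays it there.
label.render([first]);

// Changing the state and re-rendering without arguments reuses that target.
label.text = 'goodbye';
label.render();

// Passing a new target moves the label; the old element is left as it was.
label.render([second]);
```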

Component Event Handlers

It would be easy to define the “on button click” event simply as follows:

namespace.component.prototype.onButtonClick = function()
{ this.data.frobnicate(); }

But that wouldn’t work with jQuery, since jQuery re-binds ‘this’ to the element that triggered the event before calling the handler. In this case, ‘this’ would be the button DOM element instead of our component. This is bad.

The solution is to create an anonymous function that forwards the call to the appropriate member function:

this.$target.find('button').click(function(){this.onButtonClick()})

Whoops. ‘this’ doesn’t follow lexical scoping, which means this code still has the same problem. However, this can be solved quite easily:

var self = this;
this.$target.find('button').click(function(){self.onButtonClick()})
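
The whole problem can be reproduced without a page. The sketch below assumes a hypothetical FakeButton that, like jQuery, calls its handlers with ‘this’ set to the element, and a hypothetical Widget component. It also shows Function.prototype.bind as an alternative to the self variable, in environments that provide it.

```javascript
// Stand-in for a button (an assumption, not jQuery): click(handler) registers
// a handler, click() fires them all with `this` set to the button itself.
function FakeButton() { this.handlers = []; }
FakeButton.prototype.click = function (handler) {
  if (handler) { this.handlers.push(handler); return this; }
  for (var i = 0; i < this.handlers.length; i++) this.handlers[i].call(this);
  return this;
};

// Hypothetical component that counts clicks.
function Widget() { this.clicks = 0; }
Widget.prototype.onButtonClick = function () { this.clicks++; };

var widget = new Widget();

// 1. Passing the member function directly: `this` is the button,
//    so the widget never hears about the click.
var direct = new FakeButton();
direct.click(widget.onButtonClick);
direct.click();
var afterDirect = widget.clicks;

// 2. Capturing the component in a variable and forwarding the call.
var wrapped = new FakeButton();
var self = widget;
wrapped.click(function () { self.onButtonClick(); });
wrapped.click();
var afterWrapped = widget.clicks;

// 3. Alternative: bind() returns a function whose `this` is fixed.
var bound = new FakeButton();
bound.click(widget.onButtonClick.bind(widget));
bound.click();
var afterBound = widget.clicks;
```

Only the second and third buttons reach the widget. The first one increments a clicks property on the button instead.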

A short example

We can write a short incrementer: a button with a number that increases every time the button is pressed.

// Create the namespace if it doesn't exist
if (!('netNicollet' in this)) this.netNicollet = {};

// The constructor for our component
netNicollet.counter = function(selector, initial)
{
  // Set up data members (only one)
  this.value = initial;

  // Render the component
  this.render(selector);
}

// The rendering function
netNicollet.counter.prototype.render = function(selector)
{
  // Change the target (if applicable)
  if (typeof selector != "undefined")
    this.$target = $(selector);

  // Early-out if no target
  if (this.$target.get().length == 0)
    return;

  // Generate the HTML
  var html = '<div>' +  this.value + '</div>'
    + '<button type="button">Increment</button>';

  // Insert the HTML into the DOM
  this.$target.html(html);

  // Set up the events
  var self = this;
  this.$target.find('button')
    .click(function(){self.increment()});
}

// The increment operation
netNicollet.counter.prototype.increment = function()
{
  // Change the state
  this.value++;

  // Update the graphics
  this.render();
}

// Create the counter once the document is ready.
var counter;
$(function(){ counter = new netNicollet.counter('body', 1337); });

Bored CSS

I had some free time on my lunch hour today, so I decided to answer a plea for help on the GameDev.net forums.

I am absolutely horrible with CSS. I need something to launch my site with (server is not available ATM)

If you can make this page look presentable i’ll be very happy. Please build on each other work so instead of msging me paste a link in the thread so everyone can see what you done and hopefully make it better.

It took me a few minutes to review the page structure, think of a classic left-right page structure, and write the corresponding CSS. I didn’t have a lot of time, so I couldn’t make the stylesheet fully portable (for instance, the rounded corners only work in Firefox, and some CSS selectors seem to confuse Internet Explorer) thus illustrating the classic conundrum that designing the stylesheet is 99% of the work, and making the stylesheet work across all browsers is the remaining 99% of the work.

And then there’s the 99% of adding jQuery to the page ;)

You can check out the redesigned page, or look at this screenshot to see what it looks like in my Firefox:

PHP Autoloading

Like C, PHP initially started out as an “every file defines functions and variables and classes” language where using an entity assumed that it had already been defined (which, in practice, meant that the file it was defined in had already been included).

This led to several issues :

  • It was hard to find out what file contained what function. It was certainly possible to namespace functions based on the file name, but it required more effort than the amateur team workforce was capable of, and it made function names so much longer.
  • It was easy to mess things up when doing dynamic loading, because one could mistakenly load a dangerous or private file.
  • When serializing classes, one would have to determine where the class was defined when reloading the serialized data, so that the class definition could be loaded again.
  • Every time a class or function was used, the developer would have to check that the corresponding definition file was loaded as well. This led to loading many files that were not necessary just in case they would be used. Since PHP is not compiled, this meant parsing the files and populating the global scope with unnecessary entities.

Which is why autoloading was introduced.

The mechanism behind autoloading is simple : if at any point during the execution of a program the script uses a class that is not defined, the __autoload function is called with the name of that class as an argument. That function is then allowed to load a file or evaluate a script string in order to define that class.

The function obviously determines, using the class name, what source file defines that class, and loads it just in time for the class to be used. This solves all of the above issues in one strike:

  • There’s usually a clean convention for mapping class names to files. For instance, the Zend convention is that class Foo_Bar_Qux is defined in Foo/Bar/Qux.php within the include path. And if you don’t follow the convention, the code doesn’t work (of course, there’s still the issue of writing the code on Windows and then running into Linux case sensitivity).
  • Using Zend_Loader (or writing your own sane __autoload function) you can restrict dynamic loading to a single directory.
  • __autoload also triggers while deserializing.
  • Developers don’t need to include anything : every used class is included, and no class is included unless it’s used.
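
The naming convention and the directory restriction mentioned above boil down to a small string transformation. The sketch below is written in JavaScript only to match the other examples on this page (a real __autoload would of course be PHP). classToPath is a hypothetical helper, and the exact set of characters it accepts is my assumption rather than Zend’s rule.

```javascript
// Hypothetical helper: maps a class name to a relative file path, following
// the Foo_Bar_Qux -> Foo/Bar/Qux.php convention. Returns null for any name
// that could escape the class directory.
function classToPath(className) {
  // Assumed rule: underscore-separated segments made of letters and digits,
  // each starting with a letter. This rejects '.', '/', '\' and empty parts.
  var valid = /^[A-Za-z][A-Za-z0-9]*(_[A-Za-z][A-Za-z0-9]*)*$/;
  if (!valid.test(className)) return null;

  // Each underscore becomes a directory separator.
  return className.split('_').join('/') + '.php';
}

var regular = classToPath('Foo_Bar_Qux');
var single = classToPath('Invader');
var dangerous = classToPath('../private/config');
```

Because the path is derived from a validated class name and nothing else, a loader built this way only ever reads files below the directory it prepends to the result.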

There is of course a slight performance penalty as the loader has to process the class name to find out what file to load, but bytecode caches work around this issue quite well when performance is important.


