void User::DeleteAccount()
{
    // You cannot delete accounts. It's illegal, we must keep a trace of every
    // account on the system. Use User::DisableAccount() instead.
    // Why is there a DeleteAccount() function then? Because once upon a
    // time, when there was no DeleteAccount() function, a smartass thought
    // "hey, they forgot to write a DeleteAccount()" and promptly wrote it
    // himself.
    // So, this function remained here as a warning for you: you obviously
    // didn't get the "you cannot delete accounts" memo, because you came
    // looking for a function to do just that. Do not try to delete
    // accounts, my friend. Do not stray from the righteous path. Disable
    // the accounts instead.
    assert (false);
}
Monthly Archive for July, 2009
I have worked with many novice Agile developers, and many of them tend to make the same mistake we all made while developing web sites. They are writing some kind of functionality, and they need to display some information or post back some data to the server, so they have to make up a new URL on the spot.
Being Agile, they don’t have an existing detailed specification to tell them what the URL should be. And they’re in the middle of writing something that’s quite complex, so they can’t dedicate too much brain power to making a proper choice. The end result is a hardcoded URL that they will need to change later on.
The problem here is that when a URL changes, everyone has to check their own files for uses of that URL and correct them. Yes, it would be possible to add a permanent redirect (and it often is a good idea on a live website so that search engine references can be kept) but redirects do not play nice with POST requests, and what would be the point if the site has not gone live yet? So, people forget incorrect URLs in the middle of their files, and it takes a fair amount of examining crawler logs to find and replace them.
My usual practice is to have a central list of all URLs. Since I tend to work with an __autoload strategy, I just create a Url class and use members of that class to return properly HTML-formatted URLs: <?=htmlspecialchars(URLROOT.'/account/confirm/'.urlencode($id))?> becomes the cleaner <?=Url::ConfirmAccount($id)?>, and the actual address is hardcoded within the Url class as:
class Url
{
    const ROOT = 'http://mydomain.com';
    static function ConfirmAccount($id)
    {
        assert(is_int($id));
        return self::Local('account','confirm',$id);
    }
    private static function Local()
    {
        $url = self::ROOT;
        $get = '';
        foreach (func_get_args() as $segment)
            if (is_array($segment))
                foreach ($segment as $getkey => $getval)
                    $get .= ($get === '' ? '?' : '&')
                          . urlencode($getkey)
                          . '=' . urlencode($getval);
            else
                $url .= '/' . urlencode($segment);
        return htmlspecialchars($url.$get);
    }
}
This way, the URL-encoding of the segments and the final escaping of any HTML special characters that remain within the URL are performed by the function automatically. Any associative arrays found in the argument list are converted to GET arguments that are also properly formatted and appended to the URL. Using the URL in a non-HTML environment, such as a text document or a Location: header, requires reversing the entity encoding beforehand, but this situation should be rare enough.
It would of course be proper to construct the ROOT constant from the requested domain name rather than hard-coding it. I have not done it here in order to keep the example short.
The benefits of this approach are many:
- Specifying the URL of the account confirmation page is not done by a random page anymore, it’s done by the Url class. The random page merely has to state that it wants to link to the account confirmation page. In case of a change of the account confirmation URL (such as /account-confirm instead of /account/confirm) all modifications will occur in a single place.
- The programmer who uses the URL does not need to remember the format used to provide the data: if a URL can be built from several arguments, those arguments can be named, documented and checked by the PHP code.
- Everything within the URL is properly escaped before it is returned: the output of a URL function is always a properly formatted URL with all special characters encoded as HTML entities. This way, no invalid URLs will ever appear within the code.
Of course, in order to work, functions of the Url class should never be called with constant arguments: that would be akin to hardcoding those addresses. While the other benefits remain, changing the meaning of these arguments would have the same rippling effects over code. So, whenever you need to call a function with constant arguments, create a new function that explains what the url-with-constant-arguments is. For instance, “ConfirmAccount(0)” might be described as “ConfirmRootAccount()”, thereby shielding you from a change in the meaning of what a root account is.
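To make the idea concrete outside of PHP, here is a minimal Python sketch of the same central-URL approach. The names `local`, `confirm_account` and `confirm_root_account` are illustrative translations of the methods discussed above, not an existing API, and the root is hard-coded just as in the PHP example.

```python
from html import escape
from urllib.parse import quote, urlencode

ROOT = "http://mydomain.com"  # hard-coded to keep the sketch short


def local(*segments):
    """Build an HTML-escaped URL from path segments and optional dicts.

    Plain values become URL-encoded path segments; dicts become GET arguments.
    """
    url = ROOT
    query = {}
    for segment in segments:
        if isinstance(segment, dict):
            query.update(segment)
        else:
            url += "/" + quote(str(segment), safe="")
    if query:
        url += "?" + urlencode(query)
    return escape(url)


def confirm_account(account_id):
    """The only place that knows what the confirmation URL looks like."""
    assert isinstance(account_id, int)
    return local("account", "confirm", account_id)


def confirm_root_account():
    """Names the constant-argument case instead of scattering a literal 0."""
    return confirm_account(0)
```

Pages call `confirm_account(user_id)` and never spell out the path themselves, so a change from /account/confirm to /account-confirm touches one function only.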
While reading a WordPress stylesheet recently, I stumbled upon an interesting way to nest CSS selectors and their associated rules. To illustrate, here’s an example list stylesheet from Listamatic:
ul#navlist {
  margin: 0 0 0 30px;
  padding: 0;
  width: 12.5%;
}
#navlist li {
  list-style-type: none;
  background-color: #191970;
  color: #daa520;
  border: .2em solid #daa520;
  font-weight: 600;
  text-align: center;
  padding: .3em;
  margin-bottom: .1em;
}
#navlist li a {
  color: #daa520;
  text-decoration: none;
  display: block;
}
#navlist li a:hover {
  background-color: #faebd7;
  color: #191970;
}
And here’s the reindented version:
ul#navlist {
  margin: 0 0 0 30px;
  padding: 0;
  width: 12.5%;
}
  #navlist li {
    list-style-type: none;
    background-color: #191970;
    color: #daa520;
    border: .2em solid #daa520;
    font-weight: 600;
    text-align: center;
    padding: .3em;
    margin-bottom: .1em;
  }
    #navlist li a {
      color: #daa520;
      text-decoration: none;
      display: block;
    }
      #navlist li a:hover {
        background-color: #faebd7;
        color: #191970;
      }
The idea is to combine the common prefixes in selectors as branches of a tree, and then indent each node of the tree by its depth in the tree (and place it right below its parent node or elder sibling). From what I gather, the position of the braces varies depending on personal styles, but the basic indentation rules apply as such.
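The indentation rule can be sketched mechanically. The Python snippet below is only an illustration under two assumptions of mine: a leading element name before an id (as in "ul#navlist") is ignored when comparing selectors, and a selector counts as a child of an earlier one when it extends it at a space, colon, dot, ">" or "[" boundary. The function name `indent_rules` is hypothetical.

```python
def _key(selector):
    # Assumption: "ul#navlist" and "#navlist" name the same tree node.
    position = selector.find("#")
    return selector[position:] if position >= 0 else selector


def _is_parent(parent, child):
    # The child must extend the parent at a selector boundary.
    return (
        child.startswith(parent)
        and len(child) > len(parent)
        and child[len(parent)] in " :.>["
    )


def indent_rules(rules, unit="  "):
    """Indent each (selector, body) rule by its depth in the prefix tree."""
    depths = {}
    lines = []
    for selector, body in rules:
        key = _key(selector)
        parents = [d for k, d in depths.items() if _is_parent(k, key)]
        depth = max(parents) + 1 if parents else 0
        depths[key] = depth
        lines.append(f"{unit * depth}{selector} {{ {body} }}")
    return "\n".join(lines)
```

Feeding it the four selectors of the example, in order, yields depths 0, 1, 2 and 3, which matches the tree described above.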
I’ve already ranted about my document scanner suite. I have recently updated it to add new features.
The basic workflow goes like this:
- You run the “scan” command. This usually happens by clicking the desktop icon for the launcher, but you can also run it on a command line.
- The program prompts you for a document name. Aside from being different from any existing document name (to avoid accidental overwriting) you are free to choose any valid file name.
- The program starts scanning pages. Every time a page is scanned, a preview is shown and the user can accept or try again. Every time a page is accepted, the user is allowed to scan another page or stop scanning.
- Every scanned page is saved to TIFF on the fly. Once all pages have been retrieved, they are converted to PNM, then to DJVU. This conversion step takes around two minutes per page on my computer. Then, all DJVU files are bundled together as a single file.
- The bundled DJVU is stored both locally and on a backup server through FTP.
Once the manual scan-preview-confirm process has ended, the lengthy compression and upload stage starts, but is completely non-interactive. It is therefore possible to start scanning another document (or do something else) while it finishes.
I have also reduced the resolution from 300 dpi to 150 dpi, as it remains quite readable. This has resulted in a reduction in file size from around 8MiB PNG files to 2MiB TIFF files, which are in turn compressed to 1MiB DJVU files. My current library of scanned pages (mostly administrative documents, reports and contracts) weighs in at around 150MiB instead of the previous 1.1GiB.
Below is a scan of Papier d’Arménie made by my delightful assistant:
The Objective Caml source code for running this little baby follows below:
exception CommandFailed of int

(* Echo a shell command, run it, and raise CommandFailed on a non-zero exit. *)
let run command =
  print_endline command ;
  let result = Sys.command command in
  if result <> 0 then raise (CommandFailed result)

let ask request =
  print_endline ( "# " ^ request ) ;
  read_line ()

let tmp ext = Filename.temp_file "" ext

let say format = Printf.printf ("# " ^^ format)

(* Scan a page, display the result, ask if the user wants to keep it
   (tries again until it gets the scan right) and returns the filename
   where the successful scan was saved. *)
let rec scan_to_tiff () =
  let file = tmp ".tiff" in
  run ("scanimage -l 0 -t 0 -x 215 -y 297 --brightness -22 "
       ^ "--contrast 22 --resolution 150 --progress --mode Gray "
       ^ "--format=tiff > " ^ file) ;
  run ("display " ^ file) ;
  if ask "keep this page? [Yn]" <> "n" then file
  else scan_to_tiff ()

(* Scan individual pages (using scan_to_tiff) until the user decides
   to stop. If an individual scan fails due to system errors, allows
   retrying. Returns the list of all filenames the user agreed with. *)
let rec scan_list_to_tiff () =
  try
    let file = scan_to_tiff () in
    if ask "scan another page? [Yn]" <> "n" then file :: scan_list_to_tiff ()
    else [file]
  with CommandFailed i ->
    say "command failed with exit code %d\n" i ;
    if ask "try again? [Yn]" <> "n" then scan_list_to_tiff ()
    else []

(* Turn individual image into djvu image. Returns djvu filename
   if successful. *)
let rec tiff_to_djvu file =
  let pnm = tmp ".ppm" in
  let djvu = tmp ".djvu" in
  run ( "convert " ^ file ^ " " ^ pnm ) ;
  run ( "cpaldjvu " ^ pnm ^ " " ^ djvu ) ;
  djvu

(* Turn a set of images into individual djvu pages. Allow skipping
   or retrying on error during the conversion process. *)
let rec tiff_list_to_djvu_list = function
  | [] -> []
  | file :: list ->
    try tiff_to_djvu file :: tiff_list_to_djvu_list list
    with CommandFailed i ->
      say "command failed with exit code %d\n" i ;
      if ask "try again? [Yn]" <> "n" then tiff_list_to_djvu_list (file :: list)
      else tiff_list_to_djvu_list list

(* Turn a list of individual djvu files into a bundled djvu file. *)
let rec make_djvu_bundle file list =
  try
    if list = [] then false
    else if List.tl list = [] then
      ( run ( "cp " ^ List.hd list ^ " " ^ file ) ; true )
    else
      ( run ( "djvm " ^ file ^ " " ^ String.concat " " list) ; true )
  with CommandFailed i ->
    say "command failed with exit code %d\n" i ;
    if ask "try again? [Yn]" <> "n" then make_djvu_bundle file list
    else ( say "scan aborted" ; false )

(* Choose a name for the output djvu file *)
let rec choose_djvu_filename () =
  let path = "/home/arkadir/docs/" in
  let name = ask "document name (extension will be added automatically) ?" in
  (* Reject empty names and names that contain a directory part. *)
  if name = "" || name <> Filename.basename name then
    ( say "incorrect filename" ; choose_djvu_filename () )
  else if Sys.file_exists (Filename.concat path (name ^ ".djvu")) then
    ( say "file already exists" ; choose_djvu_filename () )
  else Filename.concat path (name ^ ".djvu")

(* Upload a file to an ftp server. *)
let rec upload_file file =
  try
    run ( "ncftpput -f /home/arkadir/docs/ftp.cfg /home/www/blog/docs " ^ file )
  with CommandFailed i ->
    say "command failed with exit code %d\n" i ;
    if ask "try again? [Yn]" <> "n" then upload_file file
    else say "upload aborted"

(* Complete process *)
let _ =
  let name = choose_djvu_filename () in
  let files = tiff_list_to_djvu_list (scan_list_to_tiff ()) in
  if make_djvu_bundle name files then upload_file name
This requires the classic DjVuLibre utilities to be installed (cpaldjvu and djvm), as well as ImageMagick (convert) and ncftp (ncftpput). Scanning happens with SANE (scanimage). Some files are also uploaded to my web server, where I use “convert -thumbnail” to create thumbnails from DJVU files.
Names. We programmers see more names in a single session than a phone directory editor will see in their entire career, yet we prove worse at finding names than a fift
Naming is a two-way approach: the name must accurately convey what the thing is, and the name should be easily guessed for that thing. The two sides of the equation are not always of equal importance: guessing the name of a local variable is less useful than guessing the name of a class in a library.
Humans always use context to understand what names mean, in order to disambiguate the many possible meanings of a name. For instance, ‘window’ could refer to the ubiquitous user interface concept or it could refer to the glass-paned house building block. A sentence like “I open a window” needs a minimum level of context to disambiguate between the two interpretations.
On the other hand, the information must not be made redundant either. Consider a class named “OpponentTimer” defined within an “Opponent” namespace: it’s fairly obvious that the timer is related to an opponent both within the namespace (you’re dealing with opponents, so the timer should have something to do with it) and outside the namespace (as it’s being referred to as “Opponent.Timer” or something like that). The same goes for file paths, such as ‘/scripts/invaderScript.py’, which could just as well have been named ‘/scripts/invader.py’ with no loss of information due to the context.
This is what I used to think about this issue:
One thing I have noticed time and time again is that the vast majority of people I work with (or see on the internet, for that matter) are very bad at finding names. So bad, in fact, that I can usually propose better names within seconds of reading them for the first time. At least they agree that the new names are better.
The reason is, in retrospect, quite obvious: two brains are better than one, especially when it comes to looking at things in different contexts to determine if there are any ambiguities. These programmers must have been thinking the same thing when looking at my code.
By now, you should have noticed that “team naming” refers to “working as a team to name things” as opposed to “naming a team”: lack of context does tend to create such misunderstandings.
So, that would be why pair programming with at least one -ansi -pedantic -Wall programmer in the team tends to create code that is much cleaner than one-programmer code written by either participant.
Short of acquiring some sort of split personality, there’s no easy way to achieve that alone: no matter how hard you try, your brain can only hold one context at a time. Some programmers might be able to switch contexts faster than others when they think about it, but you generally don’t switch contexts when naming a variable. Maybe we should?
Even then, noticing an ambiguity involves thinking about two contexts where the name has different meanings. Merely having two contexts in mind (or minds, when working as a team) doesn’t mean you actually found two incompatible contexts. You have to think about all the contexts in which the element can be used. The good news is, all of these are nested and you can reach them by removing information progressively from the innermost context that you have in mind. If your code was laid out correctly, these should match scopes, classes and namespaces/packages.
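As a rough illustration of walking outward through nested contexts, the following Python sketch lists how a name reads from each enclosing scope and flags enclosing names that the identifier repeats. Both helper names and the "Game" package are hypothetical, and the substring test is a deliberately crude stand-in for human judgment.

```python
def names_by_context(path):
    """List how the last element of `path` reads from each enclosing context.

    The first entry is the innermost view (bare name); each following entry
    adds one more enclosing scope.
    """
    return [".".join(path[i:]) for i in range(len(path) - 1, -1, -1)]


def redundant_parts(path):
    """Return the enclosing scope names that the final identifier repeats."""
    leaf = path[-1].lower()
    return [scope for scope in path[:-1] if scope.lower() in leaf]
```

For the path `["Game", "Opponent", "OpponentTimer"]`, the views are "OpponentTimer", "Opponent.OpponentTimer" and "Game.Opponent.OpponentTimer"; the second one makes the redundancy hard to miss.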
Functional programming languages do not allow global state—they don’t really allow any kind of state, but at least local state can be simulated by function arguments. The only way to handle “global” state would be to pass the global variable as an additional parameter (and additional return value) to every function. And it wouldn’t play nice with exceptions either, so every exception would have to carry the values of all global variables.
Besides, global state has the annoying habit of creating hidden dependencies between program parts, which leads to coupling that is hard to break and code that is not re-entrant.
On the other hand, global variables have the clear benefit of making code shorter by making many dependencies implicit. Why pass around data as an argument or a member variable if making it global works just as well?
Some kind of middle ground here would be self-propagating implicit function arguments and return values.
Such arguments would have to be declared at global scope, since they have to be globally accessible. On the other hand, since they are merely arguments, they have no value until a function is called with one, which means the declaration would probably end up looking like:
channel total : int
This would declare an implicitly propagated integer argument named “total”. Then, functions could be made to read and write to that channel implicitly as if it were a normal variable.
let rec sum = function
  | [] ->
    ()
  | head :: tail ->
    total <- total + head ;
    sum tail
Since the presence of the channel can be determined statically (it happens at name resolution time) it can also be made a part of the function’s type signature, which has the double benefit of allowing compile-time checking of channel usage and letting the programmer know which channels must be provided for a function to work:
sum : int list [total] -> unit
The presence of a list of channels before a function means the function must be called in an environment where the channel is available. This means either the environment’s type is inferred to contain that channel, or a conflict appears and the channel must be explicitly defined:
let a = 0 in
bind a as total in
sum [ 1 ; 2 ; 3 ] ;
print_int a ;
sum [ 4 ; 5 ; 6 ] ;
print_int a
The bind instruction creates an environment where the named variable or expression is automatically updated in the local scope (as if the principles of assignment I explored last week were applied to it). That is, every time a function call writes to the channel, the modification is propagated back to the bound variable as soon as the function returns (and the channel always reflects the value of the variable). So the example above would print 6 and 21.
Channel implementation is fairly straightforward: every function that accesses a channel takes an implicit final argument representing the input value of that channel (if the channel is read) and returns an implicit value (if the channel is written to). That value is then locally bound, in the calling function, to either a similar constructor that propagates the use of the channel, or an explicit bind operation for the channel.
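Written out by hand, the desugared form looks like the Python sketch below: the channel becomes an explicit extra argument and an extra return value. The names `sum_into` and `demo` are mine, and the sketch only mimics what a compiler for the hypothetical channel syntax might generate.

```python
def sum_into(values, total):
    """`sum` with the `total` channel made explicit.

    Returns (result, total): the ordinary result is unit (None here) and the
    second component is the value written to the channel.
    """
    if not values:
        return None, total
    head, *tail = values
    return sum_into(tail, total + head)


def demo():
    """Mirror of the `bind a as total` example: rebind `a` after each call."""
    a = 0
    _, a = sum_into([1, 2, 3], a)
    first = a
    _, a = sum_into([4, 5, 6], a)
    second = a
    return first, second
```

Calling `demo()` returns the pair (6, 21), the two values the bound example prints.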
In pure functional languages, variables cannot be modified. In order to perform operations that imperative programmers achieve with variable modification, functional programmers perform re-assignment:
// Imperative
x = sqrt(x);
(* Functional *)
let x = sqrt x in ...
And in the case of a loop, they represent the loop body as a function that is called with different arguments, and the modified value is propagated through the calls as an argument:
// Imperative
x = 0
for (i = 0; i < 10; ++i) x += i;
(* Functional *)
let x = 0 in
let rec loop i x =
  if i = 10 then x
  else loop (i+1) (x+i)
in
let x = loop 0 x in ...
So, can local modification of variables be allowed in a pure functional context through simple rewriting rules that turn the imperative constructs into their functional counterparts? Yes, although there are some limits.
Simple assignments
We know that there might be assignments within every expression (an assignment is an expression of the form x ← <V> ; <E>). The tactic used here is to turn every expression <E> into an expression of the form:
let [E'] in x1 ← v1 ; x2 ← v2 ... ; e
such that no sub-expression within the “let …” contains an assignment (and is therefore a plain old pure functional language expression) and only variable names appear after the “in …”. This rewriting tactic is applied recursively: atomic expressions (those without sub-expressions) are trivially written as such, which leaves only the question of expressions with sub-expressions that have already been recursively turned into the above format.
The key is to turn op(<A>, <B>, <C>, <D>) into:
let [A'] in
let xa* = va* in
let [B'] in
let xb* = vb* in
let [C'] in
let xc* = vc* in
let [D'] in
let e = op(a,b,c,d) in
xa1 ← va1 ; xa2 ← va2 ; ... xb1 ← vb1 ; ... ; e
The order of the sub-expressions is fixed (which means an order of evaluation has to be specified for every operation). Every expression computes all its values (including the assigned ones) and the assignment is simulated using redefinition of the variable so that subsequent sub-expressions can “see” the modified variable. The actual assignments are then pushed to the end of the complete expression so that the recursive rewriting rule will see them from the superexpression above.
Note that expressions which do not always evaluate all sub-expressions cannot be expressed as above. Fortunately, all such expressions can be rewritten as a conditional combined with expressions that evaluate all their sub-expressions, and conditionals are handled separately below.
Note that a lambda expression is considered to be an atomic expression here, so no propagation of assignment occurs from within the anonymous function to the surrounding context! This means (as expected) that the assignments are a purely local construct that cannot cross function barriers, so I simply remove them when they reach the top level to obtain a normal pure functional expression.
This performs the transform:
(* Imperativeish *)
let x = 3.14 in
x ← sqrt x ;
print_float x
(* Functional *)
let x = 3.14 in
let v = sqrt x in
let x = v in
print_float x
Loops and conditionals
Loops and conditionals are special cases of block-based expressions. A block is a language construct that looks like a lambda expression (it gathers all values from the surrounding scope) and may be executed zero, one or several times. The main difference is that a block cannot be saved for later execution: it is always executed at a specified time. In short, a block is a beta-redex. Since we have the guarantee that the block is executed before the current context resumes, we can let it alter the state of the current context.
For every block, I select a set of variables that the block may alter (although it does not necessarily do so). The block itself is syntactically an expression, so I can rewrite its internal assignments as above by moving them all to the top level of the expression. Then, I turn the block into a closure which takes as arguments the aforementioned set of variables, and returns a pair that contains the result of the expression and the final values (after assignment) of the set of variables. In short:
(* Imperativeish *)
{
  a ← 1 ;
  b ← b + 2 ;
  a + b
}
(* Functional *)
fun (a,b,c) ->
  let a = 1 in
  let b = b + 2 in
  a + b, (a,b,c)
I can add completely unused variables (like “c” above) to the set of variables simply because another branch of the construct (usually a conditional) may use that variable as well, and I need both blocks to be functions of the same type.
Then, by transforming any blocks into functions, a conditional follows the rewriting rule:
(* Imperativeish *)
let r = if cond then A else B in ...
(* Functional *)
let r, (a,b,c) =
  if cond then A(a,b,c) else B(a,b,c)
in ...
Loops work in the same way:
(* Imperativeish *)
while c do
  A
done ; ...
(* Functional *)
let rec loop (a,b,c) =
  if c then
    let _, (a,b,c) = A (a,b,c) in
    loop (a,b,c)
  else (a,b,c)
in let (a,b,c) = loop (a,b,c) in ...
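The loop rule can be tried out in any language with tuples. Here is a small Python sketch under my own choice of state: a counter `i`, an accumulator `total` and a continuation flag `keep_going` that plays the role of the loop condition. The block A becomes the pure function `body`, which returns a pair of (block result, new state).

```python
def body(state):
    """Block A as a pure function: takes the state, returns (result, new state)."""
    i, total, _ = state
    next_i = i + 1
    return None, (next_i, total + i, next_i < 10)


def loop(state):
    """`while keep_going do A done`, rewritten as recursion over the state tuple."""
    _, _, keep_going = state
    if keep_going:
        _, state = body(state)
        return loop(state)
    return state
```

Starting from `(0, 0, True)`, the loop adds 0 through 9 and stops with the state `(10, 45, False)`; no variable is ever modified in place.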
Records
All of the above only handles assignment to variables. What about assigning to records?
It is of course impossible to alter a record held by someone else. However, if the record is stored in a local variable, then it is possible to change the local variable to take this into account.
The rewriting rule is quite simple: it recursively turns a complex assignment (assigning to a record field) into a simpler one:
x.label ← y becomes x ← { x with label = y }
var remains the same
anything else causes an error
So, this rule would perform the following transform:
(* Imperativeish *)
x.owner.details.name ← "boris" ; ...
(* Functional *)
let x =
  { x with owner =
    { x.owner with details =
      { x.owner.details with name = "boris" } } }
in ...
The same approach can be applied to most other assignment operations (array, string, hash table).
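For comparison, the same nested functional update can be sketched in Python with frozen dataclasses and `dataclasses.replace`. The record types and the `set_path` helper are hypothetical; the recursion follows the rule above, turning an assignment to `x.a.b.c` into an update of `x.a`, and so on down to the innermost field.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Details:
    name: str


@dataclass(frozen=True)
class Owner:
    details: Details


@dataclass(frozen=True)
class Item:
    owner: Owner


def set_path(record, path, value):
    """Return a copy of `record` where the field reached by `path` is `value`.

    `x.label <- y` becomes `x <- replace(x, label=y)`, applied recursively.
    """
    if not path:
        return value
    label, *rest = path
    updated = set_path(getattr(record, label), rest, value)
    return replace(record, **{label: updated})
```

The original record is left untouched, so only the holder of the local variable sees the change.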
When looking at a function declaration, there are several levels of abstraction one can use to describe what that function does.
The actual action of that function is what really happens. This includes any bugs the function may contain and any undocumented behavior that is subject to change in later versions.
The documented action of the function is what the author of the function intended to do with that function. This includes a complete description of what the function should reasonably be expected to do, what conditions may trigger an error, and what external factors may affect the outcome.
The expected action of the function is what the user of the function expects the function to do. This is the action that matters most of the time, since there are often many users for every function.
In an ideal world, all three actions would be identical: the author implemented the function to do exactly what was documented and the documentation covers all behavior and explicitly marks all unspecified elements, the user has read the documentation and understands it completely.
In the real world, those actions are all different. The difference between the actual action and the documented action is either a bug (the function does not behave as documented) or the documentation being too vague and leaving things implicitly unspecified. The difference between the expected action and the documented action happens because the user has not read, or understood, all the nuances of the function’s behavior as described in the documentation.
Breaking the Mental Model
The classic example of the latter difference in understanding is the strtolower function:
When we convert the string “integer” to upper and lower case in the Turkish locale, we get some strange characters back:
"INTEGER".ToLower() = "ınteger"
"integer".ToUpper() = "İNTEGER"
The user is not aware that strtolower depends on the current locale, because their mental model of the strtolower function turns every uppercase letter of the occidental latin alphabet into its corresponding lowercase letter in that same alphabet. Of course, this is not what happens, and there is no way of “getting” this fact straight without thoroughly reading and remembering the entire documentation of the strtolower function.
The best we can do, as function authors, is to make it woefully obvious to users of that function when they misunderstand the function.
But, you say, the only way to detect most non-trivial function misuses is through complete testing, and it’s quite probable that the user will not think of the test cases that would break their mental model!
This is correct, and this is precisely why I said misunderstand and not misuse. Determining whether or not a function is used correctly is something that the user can do quite easily once they get a correct mental model of that function, so we’ll let them do exactly that. The point here is to make the function as hard to use as possible when you don’t understand it completely.
Consider the strtolower function. If you don’t understand that locale can affect the operation performed by that function, then you are going to get things wrong. A nice way to ensure you understand this is to make the locale a mandatory argument of the function. By telling the user “you need to specify a locale before using this function” you are breaking the mental model of any user that expected the function to be locale-independent, and that is a good thing.
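A minimal Python sketch of such a signature is shown below. The function `to_lower` is hypothetical, and the Turkic branch only covers the dotted/dotless I pair from the example, not full locale-aware case mapping; the point is the keyword-only `locale` parameter with no default.

```python
def to_lower(text, *, locale):
    """Lowercase `text`. `locale` is keyword-only and deliberately has no default."""
    if locale in ("tr", "az"):
        # Turkic rule: "I" lowers to dotless "ı", and dotted "İ" lowers to "i".
        text = text.replace("I", "\u0131").replace("\u0130", "i")
    return text.lower()
```

Calling `to_lower("INTEGER")` without a locale fails immediately with a TypeError, which is exactly the moment where a locale-independent mental model gets broken.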
Exceptional Situations
There is an interesting gradient of mental-model-breaking in the handling of exceptional situations:
| Handling Method | Always | When fails |
| No handling (ASM, C++ undefined behavior) | No | No |
| Return codes (C APIs) | Weak | Weak |
| Exceptions | Weak | Strong |
| Java Exceptions | Medium | Strong |
| Type System | Strong | N/A |
Here, I’m discussing the ability of a given handling method to break an incorrect mental model in two situations: “always” means whenever the function is used, “when fails” means whenever the function is used incorrectly in a fashion that interrupts the normal course of execution.
When the function is used, the existence of exceptional situations is mentioned as weak (only in the documentation), medium (compiler error that is not very specific) or strong (specific, reliable compiler error). When a failure occurs, the result is weak (depends on user action) or strong (independent of user action).
As such, using the type system appears to be the strongest means of describing the existence of exceptional situations. How?
In a functional language, every function returns a result. There is no point in computing a result unless that result is used, which means every function result is used somewhere in the code. As such, having functions that may encounter errors return an “Error or Success” type forces the user of the function to handle the possibility of an error before they get the result.
This is precisely how Objective Caml avoids the very possibility of a “null reference” runtime error: the option type has to be explicitly turned into a value, which means that pattern matching must be used and therefore the null case has to be handled as well:
let frobnicate option =
  match option with
  | Some value -> work_with value
  | None -> work_without_value ()
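The same “Error or Success” shape can be approximated in Python, with the caveat that nothing is enforced at runtime: it is a static checker such as mypy that would complain about reading `.value` before ruling out the failure case. All names here (`Success`, `Failure`, `parse_port`, `describe`) are illustrative.

```python
from dataclasses import dataclass
from typing import Generic, TypeVar, Union

T = TypeVar("T")


@dataclass(frozen=True)
class Success(Generic[T]):
    value: T


@dataclass(frozen=True)
class Failure:
    reason: str


Result = Union[Success[int], Failure]


def parse_port(text: str) -> Result:
    """Return the port wrapped in Success, or a Failure explaining the problem."""
    if text.isdigit() and 0 < int(text) < 65536:
        return Success(int(text))
    return Failure(f"not a valid port: {text!r}")


def describe(result: Result) -> str:
    """The caller has to pick a branch before it can touch the value."""
    if isinstance(result, Success):
        return f"listening on {result.value}"
    return f"error: {result.reason}"
```

The caller cannot get at the port number without first going through the check that separates the two cases.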
Dealing with Programmers
The problem is that programmers are humans and humans are lazy. Nobody wants to spend additional time designing the type of a function just to prevent misunderstanding of that function (unless it’s an API, of course) and nobody wants to have to type an additional argument to a function.
In fact, the entire convention over configuration philosophy relies on the idea that programmers should have to make as few decisions as possible. But adding default values for every argument is dangerous if programmers are not aware that those arguments exist—choosing a sane default value implies that such a value exists and is the one most programmers have in their own limited mental models for that behavior.
And if no consensus exists, using a default value is impossible: a programmer would expect strtolower to work in the current locale by default, while another would expect strtolower to work in an invariant locale by default. Choosing a default locale means that one of these two programmers is wrong and leads to bugs. It certainly is the programmer’s fault for not reading the documentation properly, but one could argue that a successful library is one that produces great results even in the hands of less competent programmers.
As I mentioned earlier, I use different e-mail addresses for every website that asks me for one. These look like victor-{website}@nicollet.net and are all redirected to the same inbox until I decide I get too much spam from them. In other news, I recently gave one such address to The Motley Fool (a financial information website) and it predictably ended up being the number one source of spam in my inbox. Get cancer and die, Fool.
Non-technical people have asked me whether such an address (namely, one that contains a hyphen) is valid. The answer is that of course, a dash is a valid character in an address (just like _, + and $ for instance) and therefore every sane MTA around the globe should be able to deliver things to my address.
Apparently, Yahoo! does not agree:
So, what just happened here? Yahoo! does not want me to enter an invalid alternate e-mail and therefore sets up an invalid e-mail detector. And a false positive happens.
I hate false positives. Being allergic to some kinds of pollen, I have experienced the devastating effects of false positives in my own immune system. Someone (or something) is trying to be smart, but they are not, and it happens in a way that is obvious and frustrating. That this verification is utterly useless only adds more to the frustration.
What is Yahoo! trying to do here? I can see three possible explanations:
Trying to be smart
Maybe a pointy-haired boss thought “everyone validates fields” and asked for all fields to be validated even when it wasn’t necessary. Maybe a developer thought “validating all fields is a clever challenge”. Maybe the underlying libraries include a “mail verification” function that was programmed by an intern. Either way, the bottom line when you have an opportunity to be smart is, you better be really smart, or you’ll end up hurting yourself. There is no such thing as “pretty clever” when your code has to serve millions of people.
Making sure every account has a valid e-mail
Nobody trusts free e-mail in the business world. Posting anything even remotely related to business from a hotmail or yahoo address screams “amateur” unless you’re in an industry where merely having an address is unusual. The exception here would be gmail, which merely screams “my company can’t afford a domain”, but then again all our base is belong to google.
So it should be no surprise that providers of free e-mail would require at least some reassurance that the person creating the account is real. For instance, that the person already has an e-mail address (never mind the possibility of confirming account A with account B and vice versa, leaving no trace of my actual identity).
But a mere syntactic verification is useless. I could write mickey1@mouse.com and then increment the “1” until I ended up with a unique address that the system would accept. All you have done is delay the evil scammer for a few minutes, but the scammer doesn’t care because that’s just what his job is. But in the meantime, you got the syntax check wrong and hindered legitimate users who have other things to do with their time than changing their e-mail address so that they can get a Yahoo! account.
To weed out scammers and invalid addresses, it is necessary to send an e-mail to that address and have the user click on a confirm link. That is the one and only way to tell if an e-mail address is valid.
But once you start doing this level of verification, it suddenly becomes quite useless to do any other verification: you already have 0% false positives and 0% false negatives, so adding another test can only increase the probability of a false positive, with no other benefit. Just accept the address as-is and start the verification workflow.
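As a minimal sketch of that verification workflow, assuming PHP 7.4 or later: the class name, the in-memory storage and the method names below are all hypothetical, and a real site would keep pending confirmations in a database and actually send the e-mail containing the link. The point it illustrates is that the address is accepted exactly as typed, with no syntax check at all.

```php
<?php
// Hypothetical in-memory store of pending confirmations (illustration only).
final class PendingConfirmations
{
    /** @var array<string, string> SHA-256 of the token => e-mail address */
    private array $pending = [];

    // Accept the address as typed and return the random token to embed
    // in the confirmation link that gets sent to that address.
    public function start(string $email): string
    {
        $token = bin2hex(random_bytes(16)); // 32 hexadecimal characters
        $this->pending[hash('sha256', $token)] = $email;
        return $token;
    }

    // Called when the user clicks the link. Returns the confirmed address,
    // or null if the token is unknown or has already been used.
    public function confirm(string $token): ?string
    {
        $key = hash('sha256', $token);
        if (!isset($this->pending[$key])) {
            return null;
        }
        $email = $this->pending[$key];
        unset($this->pending[$key]); // a token can only be used once
        return $email;
    }
}
```

Only an address that can actually receive the token ever gets confirmed, which is the whole test.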
Making sure the user did not mistype their e-mail
I tend to read lists of e-mail addresses as part of my job, and the typical foobar@qux;com is a staple of French keyboards (‘.’ is ‘shift’ + ‘;’). Needless to say, if a user mistypes their password recovery e-mail, they’re in for a world of pain.
However, the correct approach to this issue is to provide a helpful warning, not an error message. Not only do you eliminate the risk of false positives in your regular expressions ever negatively affecting a user’s experience (like mine) but you can afford to deliberately introduce false positives that correspond to common mistakes but are not necessarily mistakes, thus making the feature even more helpful.
Instead of a nasty “Invalid E-Mail Address” message that begs the question “Who are you to decide that my e-mail address, hosted on my e-mail server and my domain, is invalid?”, a simple “You may have mistyped your address” warning that does not prevent submitting the form would be most welcome.
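A rough sketch of such a non-blocking check, in PHP: the function name and the particular heuristics are assumptions chosen for illustration, not a complete list of common typos. It returns warnings to display next to the field, and the form is accepted regardless of what it returns.

```php
<?php
// Hypothetical helper: returns human-readable warnings about likely typos.
// An empty array means nothing looked suspicious. The caller displays the
// warnings but never rejects the address because of them.
function emailTypoWarnings(string $email): array
{
    $warnings = [];
    $at = strrpos($email, '@');
    if ($at === false) {
        $warnings[] = 'The address does not contain an "@".';
        return $warnings;
    }
    // Everything after the last "@" is treated as the domain.
    $domain = (string) substr($email, $at + 1);
    if (strpos($domain, ';') !== false || strpos($domain, ',') !== false) {
        $warnings[] = 'The domain contains ";" or ",": did you mean "."?';
    }
    if (strpos($domain, '.') === false) {
        $warnings[] = 'The domain has no dot: is the ending (such as ".com") missing?';
    }
    return $warnings;
}
```

With these heuristics, foobar@qux;com triggers warnings while a hyphenated address on an ordinary domain triggers none, and either one can still be submitted.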
I can still remember the good old days when my computer asked me “Are You Sure?” whenever I tried to do something smart. Now, it just tells me “You Can’t Do That”, without the HAL 9000 voice.
Don’t believe me? Think how many lines of code you need to kill the operating system now, versus how many you needed in the good old days: the worst I managed was an access outside allocated memory with CUDA.
I would argue that enterprise workflow systems push the “You Can’t Do That” logic to its final conclusion: anything out of the ordinary needs moderator intervention (if it is possible at all). This is both harder to program (as you have to clearly express what is ordinary) and harder to use in a pinch where something unusual must be done for the greater good. By contrast, a few permissive systems do exist : if what you’re trying to do can be undone then you are always allowed to do it, and a moderator is then notified about it and may choose to reverse your operation. Of course, some things cannot be undone (viewing or showing restricted information to someone, sending an e-mail to someone, and so on) and therefore require ex ante approval, but most tasks in a computer system are reversible.
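The permissive policy can be sketched in a few lines of PHP (7.4 or later). Every name here is hypothetical, and the sketch assumes that an operation counts as reversible exactly when the caller supplies an undo callback: reversible operations run at once and the moderator is told afterwards, everything else waits for approval.

```php
<?php
// Hypothetical "do first, be moderated later" workflow (illustration only).
final class PermissiveWorkflow
{
    /** @var callable[] undo callbacks, indexed by operation id */
    private array $undo = [];

    /** @var string[] messages waiting for the moderator */
    public array $moderatorInbox = [];

    // Runs a reversible operation immediately and returns its id.
    // An operation without an undo callback is NOT run: null is returned
    // and the moderator is asked for approval instead.
    public function attempt(string $label, callable $do, ?callable $undo = null): ?int
    {
        if ($undo === null) {
            $this->moderatorInbox[] = "Approval needed: $label";
            return null;
        }
        $do();
        $this->undo[] = $undo;
        $id = array_key_last($this->undo);
        $this->moderatorInbox[] = "Done (operation $id): $label";
        return $id;
    }

    // The moderator reverses an operation after the fact.
    // Returns false if the id is unknown or was already reversed.
    public function reverse(int $id): bool
    {
        if (!isset($this->undo[$id])) {
            return false;
        }
        ($this->undo[$id])();
        unset($this->undo[$id]);
        return true;
    }
}
```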
Once you taste the pleasure of a “do first, be moderated later” system, it’s hard to go back to “your post will be online once it’s moderated”. Think about what Wikipedia would look like if it applied ex ante moderation…
So, unless you’re facing a critical situation, always give your users the benefit of the doubt and perhaps a warning…
We have all written this code before :
<ul>
<?php foreach ($list as $element):?>
<li><?=htmlspecialchars($element)?></li>
<?php endforeach; ?>
</ul>
What happens when the list is empty? What is generated is an empty UL element :
<ul></ul>
This would be perfectly fine, if it wasn’t completely wrong. Quoth the XHTML DTDs (any of them) :
<!ELEMENT ul (li)+>
There must always be at least one list item in a list (what kind of insanity would have led to preventing empty lists from existing is beyond me, although I’m certain they must have had a good reason), which means a document will not validate if it contains the aforementioned empty UL element. This is also the case for HTML 4, though HTML 5 does currently allow empty lists.
So, to circumvent the empty list case, the code becomes:
<?php if (count($list) > 0): ?>
<ul>
<?php foreach ($list as $element): ?>
<li><?=htmlspecialchars($element)?></li>
<?php endforeach; ?>
</ul>
<?php endif; ?>
While it might be possible to abstract these details away behind a function that prints a list of elements, the ultimate point of such an abstraction would be to free the developer’s mind of the issue of empty lists not being allowed in XHTML. And such a thing would be ill advised : since the correct behavior is to remove the empty list from the document, the developer should be aware that no UL element will be generated for an empty list, especially since this has implications on the CSS side (which has to accommodate the absence of the list) and the JavaScript side (which has to create the element if it doesn’t exist before adding elements to it).
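If one did write such a function anyway, a sketch that at least keeps the corner case visible might look like this. The name renderListOrNothing is hypothetical, and the items are assumed to be plain strings; the name is there to remind the caller that an empty list produces no UL element at all.

```php
<?php
// Hypothetical helper: returns the UL markup, or an empty string when
// there are no items, because XHTML does not allow an empty UL element.
function renderListOrNothing(array $items): string
{
    if (count($items) === 0) {
        return ''; // no element at all: CSS and JavaScript must cope with that
    }
    $html = "<ul>\n";
    foreach ($items as $item) {
        $html .= '<li>' . htmlspecialchars($item) . "</li>\n";
    }
    return $html . "</ul>\n";
}
```

The caller still has to know about the missing element, which is exactly why the helper does not make the problem go away.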
An important quality of any developer is their ability to identify and handle any corner cases of their domain. An important quality of any domain is to have as few corner cases as possible.

Hi. I'm Victor Nicollet,