<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Nicollet.Net &#187; Architecture</title>
	<atom:link href="http://www.nicollet.net/toroidal/architecture/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.nicollet.net</link>
	<description>Everyone Loves Me</description>
	<lastBuildDate>Mon, 23 Jan 2012 16:55:59 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=</generator>
		<item>
		<title>OCaml Submodule Pattern</title>
		<link>http://www.nicollet.net/2012/01/ocaml-submodule-pattern/</link>
		<comments>http://www.nicollet.net/2012/01/ocaml-submodule-pattern/#comments</comments>
		<pubDate>Mon, 23 Jan 2012 16:55:59 +0000</pubDate>
		<dc:creator>Victor Nicollet</dc:creator>
				<category><![CDATA[Functional]]></category>
		<category><![CDATA[Architecture]]></category>
		<category><![CDATA[Design Patterns]]></category>
		<category><![CDATA[Objective Caml]]></category>

		<guid isPermaLink="false">http://www.nicollet.net/?p=2660</guid>
		<description><![CDATA[My code is quite large for an OCaml project. The main RunOrg repository alone contains 46212 lines of OCaml code (plus an additional 5631 lines of OCaml mli files), and then there&#8217;s the web framework code and the independent plugins code. It is Better™ to have many short files than a few long ones. [...]]]></description>
			<content:encoded><![CDATA[<p>My code is quite large for an OCaml project. The main RunOrg repository alone contains 46212 lines of OCaml code (plus an additional 5631 lines of OCaml mli files) — and then, there&#8217;s the web framework code and the independent plugins code.</p>
<p>It is Better™ to have many short files than a few long ones. One reason is incremental compilation with <em>ocamlbuild</em> : the smaller your files are, the smaller the percentage of code that must be recompiled when you make a small change. Another reason is that files provide a natural delineation of code that makes it slightly easier to reason about.</p>
<p>The very process of splitting a large file into smaller files is also an excellent way to clean up the code. Every split is an opportunity to move some code to a more generic location — why have a <code>CMember_importParser</code> module when all of its functionality could fit into an <code>OzCsv</code> plugin module ? Even when no such generic solution exists, cutting through the jungle that a 2000-line module contains helps clean up dependencies, identify shared functionality and imagine better ways to design code.</p>
<p>Still, when cutting up code this way, the problem of encapsulation remains. If code that relates to pictures (an upload module, a transform module, a download module, an access rights module) is split across several files, it is desirable to let each file access functions and values from the other files that would not otherwise be exposed to modules unrelated to picture processing. For instance, a <code>get_download_link</code> function should be available throughout all picture-related modules, but the rest of the application should use the <code>get_download_link_for_user</code> function that checks whether the user is allowed to download the file.</p>
<p>In order to achieve several nested levels of encapsulation required to work with modules this way, I have come up with a convention :</p>
<ul>
<li>A module name (and thus, a file name) is composed of segments written in camelCase and separated by underscores. For instance, <code>CEntity_view_grid</code> is a module name containing segments <code>CEntity</code>, <code>view</code> and <code>grid</code>.</li>
<li>Modules with only one segment are public. Any other module may include, open or otherwise reference them with no limitations beyond what the module signature says. So, <code>CEntity</code> may access <code>MGroup</code> freely.</li>
<li>Modules with N &gt; 1 segments are private. They may only be accessed by modules which share the first N-1 segments. So, <code>CEntity_view</code> is available to modules <code>CEntity</code> and <code>CEntity_edit</code> but not <code>CPicture</code>.</li>
<li>A module with N segments may export any module with N+1 segments it can access, possibly under a more restrictive signature. For instance, <code>CEntity_view</code> is available to all other modules as <code>CEntity.View</code>.</li>
</ul>
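<p>The access rules above can be checked mechanically. Here is a minimal sketch, assuming module names are handled as plain strings split on underscores; <code>may_access</code> and its helpers are hypothetical names, not part of any existing tool :</p>

```ocaml
(* Split a module name such as "CEntity_view_grid" into its segments. *)
let segments name = String.split_on_char '_' name

(* First [n] elements of a list (or the whole list if it is shorter). *)
let rec take n = function
  | x :: rest when n > 0 -> x :: take (n - 1) rest
  | _ -> []

(* [may_access ~from ~target] is true when module [from] is allowed to
   reference module [target] under the convention:
   - a target with a single segment is public;
   - a target with N > 1 segments is only visible to modules that share
     its first N-1 segments. *)
let may_access ~from ~target =
  let target_segments = segments target in
  let n = List.length target_segments in
  if n <= 1 then true
  else
    let from_segments = segments from in
    List.length from_segments >= n - 1
    && take (n - 1) from_segments = take (n - 1) target_segments
```

<p>With this definition, <code>may_access ~from:"CEntity_edit" ~target:"CEntity_view"</code> holds, while <code>may_access ~from:"CPicture" ~target:"CEntity_view"</code> does not.</p>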
<p>To make these rules easier to respect, private module dependencies are made explicit by adding a list of module aliases at the top of each file. My <code>cEntity_view.ml</code> file starts with :</p>
<pre style="padding-left: 30px;"><code>module Sidebar     = CEntity_sidebar
module Unavailable = CEntity_unavailable
module Edit        = CEntity_edit
module Info        = CEntity_view_info
module Directory   = CEntity_view_directory
module Grid        = CEntity_view_grid
module Wall        = CEntity_view_wall
</code></pre>
<p>It is forbidden to use a private module without going through such an alias, and it is forbidden to define such an alias anywhere except at the top of the file. This makes it extremely easy to determine whether private access rules are respected.</p>
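<p>The last rule of the convention (exporting a private module under a more restrictive signature) can be illustrated within a single file. In this sketch, nested modules stand in for the separate files, and the functions <code>cache_key</code> and <code>render</code> are made up for the example :</p>

```ocaml
(* Stand-in for cEntity_view.ml: private, visible to CEntity_* modules only. *)
module CEntity_view = struct
  (* Helper that sibling modules may use, but that the rest of the
     application should not see. *)
  let cache_key id = "entity-view-" ^ string_of_int id

  let render id = "Entity #" ^ string_of_int id ^ " [" ^ cache_key id ^ "]"
end

(* Stand-in for cEntity.ml: the public, one-segment module. *)
module CEntity = struct
  (* Re-export the private module under a signature that hides cache_key. *)
  module View : sig
    val render : int -> string
  end = CEntity_view
end

(* Code outside the CEntity family goes through the public path. *)
let rendered = CEntity.View.render 42
```

<p>Outside code can call <code>CEntity.View.render</code>, but a reference to <code>CEntity.View.cache_key</code> would be rejected by the compiler.</p>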
<p>The rule of thumb for splitting files (in my particular coding style) is :</p>
<ul>
<li>Code for separate layers (model, view, controller&#8230;) goes into separate public modules.</li>
<li>For complex code (such as complex rules in model or controller code), consider splitting files larger than 200 lines.</li>
<li>For simple code (such as HTML template or JSON serialization definitions), there is no splitting limit except for factoring out common behavior.</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.nicollet.net/2012/01/ocaml-submodule-pattern/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Frameworks, Libraries, Conventions</title>
		<link>http://www.nicollet.net/2012/01/frameworks-libraries-conventions/</link>
		<comments>http://www.nicollet.net/2012/01/frameworks-libraries-conventions/#comments</comments>
		<pubDate>Thu, 05 Jan 2012 19:12:25 +0000</pubDate>
		<dc:creator>Victor Nicollet</dc:creator>
				<category><![CDATA[Dynamic]]></category>
		<category><![CDATA[Architecture]]></category>
		<category><![CDATA[PHP]]></category>
		<category><![CDATA[Zend]]></category>

		<guid isPermaLink="false">http://www.nicollet.net/?p=2649</guid>
		<description><![CDATA[Funkatron came up with the MicroPHP Manifesto : I am a PHP developer I am not a Zend Framework or Symfony or CakePHP developer I think PHP is complicated enough I like building small things I like building small things with simple purposes I like to make things that solve problems I like building small [...]]]></description>
			<content:encoded><![CDATA[<p>Funkatron came up with the <a href="http://funkatron.com/posts/the-microphp-manifesto.html" target="_blank">MicroPHP Manifesto</a> :</p>
<blockquote><p><strong>I am a PHP developer</strong></p>
<ul>
<li>I am not a Zend Framework or Symfony or CakePHP developer</li>
<li>I think PHP is complicated enough</li>
</ul>
<p><strong>I like building small things</strong></p>
<ul>
<li>I like building small things with simple purposes</li>
<li>I like to make things that solve problems</li>
<li>I like building small things that work together to solve larger problems</li>
</ul>
<p><strong>I want less code, not more</strong></p>
<ul>
<li>I want to write less code, not more</li>
<li>I want to manage less code, not more</li>
<li>I want to support less code, not more</li>
<li>I need to justify every piece of code I add to a project</li>
</ul>
<p><strong>I like simple, readable code</strong></p>
<ul>
<li>I want to write code that is easily understood</li>
<li>I want code that is easily verifiable</li>
</ul>
</blockquote>
<p>Unsurprisingly, a large swath of the community did not take it well, for similar reasons to <a href="http://www.nicollet.net/2010/03/why-i-gave-up-on-the-zend-framework/" target="_blank">my earlier piece against Zend Framework</a> : deviation from the commonly accepted norm.</p>
<p>I have come a long way since I wrote that article, and I must have been walking in circles, because I actually ended up where I originally began : why do we call these things <em>frameworks</em> ?</p>
<p>Zend, Symfony, CakePHP — as well as Node.js, Rails, Django, Ocsigen &#8230; — actually contribute three different things to projects that use them.</p>
<h4>Libraries</h4>
<p>A library provides <em>functionality</em> used for solving <em>general problems</em> in a flexible, <em>standalone</em> manner. <code>Zend_Mail</code> is a classic example of the library aspect of Zend Framework: you can plug it into your application and start sending e-mail. The interface you would use is uncluttered by details that are not directly related to sending e-mail.</p>
<p>The core qualities of a library are its power (how many different aspects of a problem does it let me solve — attachments, rich text, bouncing, MIME handling&#8230;) and the clarity of its interface. <strong>What problems can you solve, and how fast can you solve them?</strong></p>
<h4>Conventions</h4>
<p>When you hear «conventions» you immediately think of opening brace positions and variable naming rules. It&#8217;s about more than that.</p>
<p>The Model-View-Controller separation is an example of convention: it has been decided that under no circumstances should HTML rendering occur in Model code, no HTTP or session handling should happen in View code, and no SQL queries happen in Controller code.</p>
<p>Good conventions are designed to let the developers assume interesting properties about the code without having to actually read it. A convention like «no global variables» means I never have to care about global state in my code, ever. A convention like «view code must respect the law of Demeter» means all the data used by the view is right where it is being initialized.</p>
<p>They are also designed to make reuse and interoperability easier by reducing the number of ways in which a possible interface can be implemented. A convention could say the values are passed by assigning them to members post-construction and <strong>not</strong> as constructor arguments, so you have one less point of contention between the object that is initialized and the object that does the initialization.</p>
<p>Last but not least, conventions are usually based on experience of things that could go wrong if certain behavior is allowed. A typical example is the requirement to escape all strings as they are being output — eliminating any ambiguities as to whether the string has already been escaped elsewhere and should be output as-is: it has not.</p>
<p>Zend comes with a variety of useful conventions enforced through the interface of its tools: <em>this</em> is how you use a view, <em>this</em> is how you define a view helper that should be available from within any view, <em>this</em> is how you bind a piece of code to a URL, and so on. I happen to disagree with many of those conventions myself, because I believe they solve the wrong problems, but they are certainly better than having no conventions at all.</p>
<p>For reference, my PHP conventions are described in <a href="http://www.nicollet.net/ohm-least-resistance/" target="_blank">the user manual for Ohm</a>.</p>
<h4>Framework</h4>
<p>A framework goes a step further than mere conventions: it provides super-conventions designed to be respected by plugin authors. The point is that if plugin A and plugin B respect the set of conventions provided by the framework, then they can be used together in the same application.</p>
<p>Consider a practical example : a plugin that implements a CAPTCHA field in a form, and a plugin that displays and submits a form through AJAX. On a bad day, it goes like this :</p>
<ol>
<li>When an error occurs, the server-side AJAX-form plugin sends out a small piece of JSON containing the fields that have errors, along with the error messages. A small client-side script applies these.</li>
<li>However, the CAPTCHA plugin expected the image to be reloaded when an error occurs.  It may either keep the same image and target word — defeating the purpose of a CAPTCHA — or change the target word without knowing that the image could not be changed.</li>
<li>You then need to post on StackOverflow hoping for a solution, search online for a patch to either plugin that could make it work as expected, or try to read the code of either plugin in order to create the patch yourself.</li>
</ol>
<p>Had the framework provided a clean notion of « this field must be refreshed on every attempt » as part of its form interface, both plugins would have used it: the CAPTCHA plugin would have marked its field as such, and the AJAX plugin would have implemented a special case for such fields.</p>
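<p>Such a shared notion can be very small. The sketch below is written in OCaml rather than PHP and describes no actual framework: the <code>field</code> type, its <code>refresh_on_retry</code> flag and the function <code>fields_to_refresh</code> are all hypothetical.</p>

```ocaml
(* Vocabulary provided by the (hypothetical) framework: every form field
   states whether it must be re-rendered after a failed submission. *)
type field = { name : string; refresh_on_retry : bool }

(* CAPTCHA plugin: its field must change on every attempt. *)
let captcha_field = { name = "captcha"; refresh_on_retry = true }

(* An ordinary field keeps its content between attempts. *)
let email_field = { name = "email"; refresh_on_retry = false }

(* AJAX form plugin: besides the error messages, the response lists the
   fields that the client-side script has to reload. *)
let fields_to_refresh form =
  List.filter_map
    (fun f -> if f.refresh_on_retry then Some f.name else None)
    form
```

<p>Neither plugin knows about the other; both only rely on the flag defined by the framework.</p>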
<p>As such, the purpose of a framework is to provide a clean, unambiguous and extensive <strong>vocabulary</strong> that all the plugins should be able to speak, and that is designed to cover as many real-world situations as possible.</p>
<p>Zend Framework and Symfony in particular do an absolutely great job of this. When you can have a pager component push its data to the page through a progressive enhancement component, and log its performance to FirePHP when a user authentication component determines that the viewing user is a developer, and all of it works by plugging square pegs into square holes, you know there has been a lot of great work going on under the hood.</p>
<h4>Back to the point</h4>
<p>Using a framework is all fun and games until you need to disagree with it. You need to unplug what does not work, and plug your own implementation in its place. The more complex the vocabulary, the harder it will be to write new code: frameworks make it easy to connect existing components, at the cost of having to deal with more concepts when actually implementing new things.</p>
<p>What it boils down to, in the end, is whether you expect to be reusing a lot of third party components, or to write a lot of your own code. In the latter case, MicroPHP — and lean environments that do not have a heavy framework side to them — is actually an improvement over trying to fit a six-inch wooden square peg into a mini-USB port.</p>
<p>The exception to this is, of course, being so familiar with a particular framework that you immediately know what changes you need to make without fighting against third party code.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.nicollet.net/2012/01/frameworks-libraries-conventions/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Two-way bindings</title>
		<link>http://www.nicollet.net/2011/12/two-way-bindings/</link>
		<comments>http://www.nicollet.net/2011/12/two-way-bindings/#comments</comments>
		<pubDate>Thu, 29 Dec 2011 18:42:05 +0000</pubDate>
		<dc:creator>Victor Nicollet</dc:creator>
				<category><![CDATA[Functional]]></category>
		<category><![CDATA[Architecture]]></category>
		<category><![CDATA[Objective Caml]]></category>

		<guid isPermaLink="false">http://www.nicollet.net/?p=2640</guid>
		<description><![CDATA[Quick : how would you design a function that opens a file handle that auto-closes whenever execution leaves a certain scope, even if an exception happens ? The C++ solution is quite straightforward : have a destructor that closes the file handle, and create the handle as an auto variable: std::ofstream out = std::ofstream("out.txt"); out [...]]]></description>
			<content:encoded><![CDATA[<p>Quick : how would you design a function that opens a file handle that auto-closes whenever execution leaves a certain scope, even if an exception happens ?</p>
<p>The C++ solution is quite straightforward : have a <em>destructor</em> that closes the file handle, and create the handle as an auto variable:</p>
<pre style="padding-left: 30px;">std::ofstream out("out.txt");
out &lt;&lt; whatever();
// file handle is always closed by the language</pre>
<p>On the other hand, OCaml does not have destructors, but can simulate the RAII spirit <a href="http://thelema.github.com/AAA-batteries/hdoc/BatFile.html#VALwith_file_in" target="_blank">using a closure</a>: you provide a function that will be called with the file handle as its argument, and the file handle will be destroyed when the function returns or raises an exception.</p>
<pre style="padding-left: 30px;">BatFile.with_file_out "out.txt" begin fun out -&gt;
  BatIO.nwrite out (whatever ())
  (* the file handle is always closed by with_file_out *)
end</pre>
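<p>Such a helper is short to write. The following is a minimal sketch using only the standard library (<code>Fun.protect</code>, available since OCaml 4.08); it is not the Batteries implementation, which may differ :</p>

```ocaml
(* [with_file_out path f] opens [path] for writing, passes the channel to
   [f], and closes the channel whether [f] returns normally or raises.
   close_out_noerr is used so that the cleanup itself never raises; a real
   implementation may prefer to report errors that occur while flushing. *)
let with_file_out path f =
  let out = open_out path in
  Fun.protect
    ~finally:(fun () -> close_out_noerr out)
    (fun () -> f out)

(* Usage: write to a temporary file, the channel is closed on return. *)
let example_path = Filename.temp_file "two_way" ".txt"
let () = with_file_out example_path (fun out -> output_string out "hello")
```

<p>The caller never sees an open channel outside the callback, which is the property the C++ destructor provides.</p>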
<p>This is a special case of a more general principle.</p>
<h4>Two-way binding</h4>
<p>The standard <code>let</code> keyword performs a one-way binding: bind value to variable <em>then</em> evaluate expression. Two-way binding adds a post-processing step : when you&#8217;re done with the expression, do something else. Such a behavior has important consequences for writing concise and readable code.</p>
<p>In my OCaml code, two-way binding is performed with the <code>let!</code> keyword, which is preprocessed as follows :</p>
<pre style="padding-left: 30px;">let! pattern = value in expression
(* Is translated to *)
value (fun pattern -&gt; expression)</pre>
<p>For instance, the above file manipulation script would be written as:</p>
<pre style="padding-left: 30px;">let! out = BatFile.with_file_out "out.txt" in
BatIO.nwrite out (whatever ())</pre>
<p>This syntax expresses the actual intent of the code better than the anonymous callback syntax did: <em>bind the file handle to this variable, but don&#8217;t forget the post-processing steps</em>.</p>
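<p>On OCaml 4.08 and later, a similar effect can be obtained without a preprocessor, using the binding operators built into the language. The sketch below is an alternative to the <code>let!</code> extension, not the extension itself; <code>with_resource</code> and <code>record</code> are hypothetical helpers used to make the order of events visible :</p>

```ocaml
(* Log of events, most recent first. *)
let log = ref []
let record msg = log := msg :: !log

(* Hypothetical scoped resource: records when it is opened and closed. *)
let with_resource name f =
  record ("open " ^ name);
  Fun.protect
    ~finally:(fun () -> record ("close " ^ name))
    (fun () -> f name)

(* [let* p = v in e] is rewritten by the compiler to [( let* ) v (fun p -> e)],
   so this definition gives the same translation as let!. *)
let ( let* ) value body = value body

let result =
  let* r = with_resource "out.txt" in
  record ("use " ^ r);
  String.length r

(* Events in chronological order. *)
let events = List.rev !log
```

<p>The body runs between the opening and the closing step, exactly as with the anonymous callback.</p>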
<p>Here are a few more examples of situations that may be improved by this syntax :</p>
<h4>Events and reactive programming</h4>
<p>Reactive programs can be constructed either using the typical &#8220;<em>register this function to be called whenever this value changes or this event happens</em>&#8221; semantics, or  by using binding semantics instead:</p>
<pre style="padding-left: 30px;">let () =
  let! user = User.on_change (#last_login) in
  if user # notify_login then
    Mail.send (user # email)
      ("Someone has logged in to your account at " ^ datetime (user # last_login))</pre>
<p>The underlying signature of <code>User.on_change</code> (which registers a listener callback and returns unit) remains the same.</p>
<h4>Retry semantics</h4>
<p>CouchDB implements transactions with retry semantics: you read a document, compute some changes and try saving them back, and if the document was changed by someone else in the meantime, you will have to try again. It makes sense for the code inside the transaction to be 1° idempotent and 2° wrapped away in a function that 3° takes the latest version of the document as an argument :</p>
<pre style="padding-left: 30px;">let set_title article_id new_title =
  let! article = Database.transaction article_id in
  Database.write article_id { article with title = new_title }</pre>
<p>In such a design, the write function would throw a specific exception if a collision occurs, and the transaction function would intercept that exception and try again until the transaction succeeded or a maximum number of retries happened.</p>
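<p>Here is a minimal sketch of that design, with an in-memory hash table standing in for the database. The <code>Conflict</code> exception, the <code>rev</code> field and the <code>max_retries</code> parameter are assumptions made for the example, not an actual CouchDB binding :</p>

```ocaml
exception Conflict

type article = { title : string; rev : int }

(* In-memory stand-in for the database. *)
let store : (string, article) Hashtbl.t = Hashtbl.create 16

(* Save [doc] only if it is based on the stored revision; otherwise
   someone else has changed the document and the write is rejected. *)
let write id doc =
  let current = Hashtbl.find store id in
  if current.rev = doc.rev
  then Hashtbl.replace store id { doc with rev = doc.rev + 1 }
  else raise Conflict

(* Run [body] on the latest version of the document, starting over when
   a write collides, up to [max_retries] extra attempts. *)
let transaction ?(max_retries = 5) id body =
  let rec attempt n =
    let doc = Hashtbl.find store id in
    try body doc
    with Conflict when n < max_retries -> attempt (n + 1)
  in
  attempt 0

(* Same function as above, written without the let! syntax. *)
let set_title article_id new_title =
  transaction article_id (fun article ->
    write article_id { article with title = new_title })
```

<p>Because the body may run several times, it should only depend on the document it receives, which is why idempotence matters here.</p>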
<h4>Monads</h4>
<p>Value binding in monads benefits from having a syntax that actually looks like binding. With the option monad, one can turn this :</p>
<pre style="padding-left: 30px;">match Files.get file_id with None -&gt; None | Some file -&gt;
  match file # owner with None -&gt; None | Some user_id -&gt;
    match Users.get user_id with None -&gt; None | Some user -&gt;
      Some (user # name)</pre>
<p>Into a more straightforward version :</p>
<pre style="padding-left: 30px;">let  open BatOption.Monad in
let! file    = bind $ Files.get file_id in
let! user_id = bind $ file # owner in
let! user    = bind $ Users.get user_id in
return (user # name)</pre>
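<p>After preprocessing, the <code>let!</code> chain is nothing more than nested calls to <code>bind</code>. The sketch below spells this out with hand-written <code>bind</code> and <code>return</code> functions instead of <code>BatOption.Monad</code>, and with association lists standing in for <code>Files.get</code> and <code>Users.get</code> (the data is invented for the example) :</p>

```ocaml
(* Option monad, written by hand. *)
let bind opt f = match opt with None -> None | Some x -> f x
let return x = Some x

(* File id -> optional owner id. *)
let files = [ (1, Some 10); (2, None) ]

(* User id -> user name. *)
let users = [ (10, "alice") ]

(* Each let! line becomes one call to bind whose last argument is a
   function covering the rest of the computation. *)
let owner_name file_id =
  bind (List.assoc_opt file_id files) (fun owner ->
  bind owner (fun user_id ->
  bind (List.assoc_opt user_id users) (fun name ->
  return name)))
```

<p>Any step that yields <code>None</code> short-circuits the remaining steps, just like the nested <code>match</code> version.</p>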
<p>Also, one can deal with Lwt threads almost as well as with the Lwt-specific syntax extension:</p>
<pre style="padding-left: 30px;">open Lwt
open Lwt_io

let process_lines channel process =
  let rec loop () =
    let! line_opt = bind $ read_line_opt channel in
    match line_opt with
      | None -&gt; return ()
      | Some line -&gt; loop () &lt;&amp;&gt; process line
  in
  loop ()</pre>
<h4>Being Silly</h4>
<pre style="padding-left: 30px;">let fold init list f = List.fold_left (fun acc x -&gt; f (acc,x)) init list
let map list f       = List.map f list

let probabilities odds =
  let sum =
    let! accumulator, odd = fold 0. odds in
    accumulator +. float_of_int odd
  in
  let! odd = map odds in
  float_of_int odd /. sum</pre>
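<p>For readers without the preprocessor at hand, this is what the silly example looks like once every <code>let!</code> has been rewritten into a callback, following the translation rule given earlier :</p>

```ocaml
(* Same helpers as above: the callback comes last so that let! can bind it. *)
let fold init list f = List.fold_left (fun acc x -> f (acc, x)) init list
let map list f = List.map f list

(* Plain OCaml equivalent of the let! version. *)
let probabilities odds =
  let sum =
    fold 0. odds (fun (accumulator, odd) -> accumulator +. float_of_int odd)
  in
  map odds (fun odd -> float_of_int odd /. sum)
```

<p>Each element of the result is the corresponding odd divided by the sum of all odds.</p>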
<h4>The Syntax Extension</h4>
<p>In case you don&#8217;t know how to create it, this is the preprocessor file for this syntax extension :</p>
<pre style="padding-left: 30px;">open Camlp4.PreCast
open Syntax

EXTEND Gram
 GLOBAL: expr;

 expr: LEVEL "top"
 [
   [ "let"; "!"; p = patt ; "=" ; e = expr ; "in" ; e' = expr -&gt;
     &lt;:expr&lt; (($e$) (fun $p$ -&gt; $e'$)) &gt;&gt; ]
 ] ;

END;</pre>
<p>By the way, I find that this extension has a significant advantage over the Lwt extension &#8211; it is readily compatible with syntax highlighting in most editors.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.nicollet.net/2011/12/two-way-bindings/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>Comment Branches</title>
		<link>http://www.nicollet.net/2011/11/comment-branches/</link>
		<comments>http://www.nicollet.net/2011/11/comment-branches/#comments</comments>
		<pubDate>Thu, 17 Nov 2011 18:42:36 +0000</pubDate>
		<dc:creator>Victor Nicollet</dc:creator>
				<category><![CDATA[Functional]]></category>
		<category><![CDATA[Architecture]]></category>
		<category><![CDATA[Design Patterns]]></category>
		<category><![CDATA[Productivity]]></category>

		<guid isPermaLink="false">http://www.nicollet.net/?p=2619</guid>
		<description><![CDATA[Your development job is making changes in your software. Writing, testing and debugging those changes takes some time. If your job is anywhere as hectic as mine, you will have to fix and deploy urgent patches, even when your application code is in a half-written, half-debugged state because of the feature of the month. This is [...]]]></description>
			<content:encoded><![CDATA[<p><img class="aligncenter size-full wp-image-2620" title="branches" src="http://www.nicollet.net/wp-content/uploads/2011/11/branches.png" alt="" width="675" height="100" />Your development job is making changes in your software. Writing, testing and debugging those changes takes some time.</p>
<p>If your job is anywhere as hectic as mine, you will have to fix and deploy urgent patches, even when your application code is in a half-written, half-debugged state because of <em>the feature of the month</em>.</p>
<p>This is what <em>branches</em> are for. You keep two versions of the code, one of which is called the <strong>trunk</strong> and is always ready for deployment, and another which holds the changes that you are working on.</p>
<p>When your feature is done, you <em>merge</em> the two versions together. You want to keep the merge operation painless. To do so, you have several kinds of branches available.</p>
<p>The <strong>repository branch</strong> is built into your SourceSafe/subversion/git/whatever. It creates two independent copies, and you need to migrate changes from the trunk to every branch out there as soon as possible, or the merge will make you wish for a sweet and merciful death.</p>
<p>By the way, changeset-oriented tools (like git or mercurial) make this easier, while revision-oriented tools (like subversion) make it harder.</p>
<p>The <strong>feature branch</strong> is done using programming logic. The code you deploy to production supports the new feature, but it is turned off for everyone except yourself. This technique is great for adding features, but inefficient when changing existing ones.</p>
<p>A side effect of the feature branch is that you can stress-test new code by rolling it out to increasing numbers of users progressively.</p>
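<p>A feature branch of this kind boils down to a single test around the new code path. Here is a minimal sketch; the <code>developers</code> list, the <code>rollout_percent</code> setting and the bucketing by hash are assumptions made for the example :</p>

```ocaml
(* Users who always see the new feature. *)
let developers = [ "victor" ]

(* Percentage of regular users who see the new feature (0 to 100). *)
let rollout_percent = ref 0

(* Stable bucket between 0 and 99 for each user: a user who has the
   feature keeps it when the percentage increases. *)
let bucket user_id = Hashtbl.hash user_id mod 100

let feature_enabled user_id =
  List.mem user_id developers || bucket user_id < !rollout_percent
```

<p>Production code then branches on <code>feature_enabled</code>, and raising <code>rollout_percent</code> step by step exposes the new code to more users.</p>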
<p>The <strong>comment branch</strong> is an odd gambit. It involves ripping out an entire module and replacing it with another that has a <em>different</em> interface. This requires large amounts of re-wiring all over the code base, which will take hours or days before the result can be compiled, let alone <em>tested</em>.</p>
<p>Use a comment structure such as this one:</p>
<pre style="padding-left: 30px;"><span style="color: #008000;">/*[*/</span> old code <span style="color: #008000;">/*|* new code *]*/</span></pre>
<p>It is trivial to build a text-replacement macro that turns the above into the code below and back:</p>
<pre style="padding-left: 30px;"><span style="color: #008000;">/*[* old code *|*/</span> new code <span style="color: #008000;">/*]*/</span></pre>
<p>Use the macro to switch between development mode (when you write new code and desperately try to get it to compile) and fix mode (when you edit the old code and deploy it). For consistency, always commit the <em>old</em> version to the repository.</p>
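<p>The macro can live in your editor, but it can also be a small script. Below is a minimal sketch in OCaml of the two text replacements, assuming the markers appear exactly as written above and that a file is entirely in one mode; <code>replace_all</code>, <code>to_new</code> and <code>to_old</code> are hypothetical names :</p>

```ocaml
(* Replace every occurrence of [sub] (assumed non-empty) in [s] with [by]. *)
let replace_all ~sub ~by s =
  let n = String.length sub and len = String.length s in
  let buf = Buffer.create len in
  let rec go i =
    if i > len - n then Buffer.add_string buf (String.sub s i (len - i))
    else if String.sub s i n = sub then (Buffer.add_string buf by; go (i + n))
    else (Buffer.add_char buf s.[i]; go (i + 1))
  in
  go 0;
  Buffer.contents buf

(* Fix mode -> development mode: comment out old code, enable new code. *)
let to_new s =
  s
  |> replace_all ~sub:"/*[*/" ~by:"/*[*"
  |> replace_all ~sub:"/*|*" ~by:"*|*/"
  |> replace_all ~sub:"*]*/" ~by:"/*]*/"

(* Development mode -> fix mode: the same steps, undone in reverse order. *)
let to_old s =
  s
  |> replace_all ~sub:"/*]*/" ~by:"*]*/"
  |> replace_all ~sub:"*|*/" ~by:"/*|*"
  |> replace_all ~sub:"/*[*" ~by:"/*[*/"
```

<p>Each function assumes its input is in the opposite mode: applying <code>to_old</code> to a file that is already in fix mode would corrupt the opening marker.</p>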
<p>Why use <strong>comment branches</strong> instead of <strong>repository branches</strong> ? Maybe your source control tool sucks at branches. I use Subversion. Yes, I know. Legacy, pain and unlikely hopes of a brighter future.</p>
<p>When a trunk change occurs in a part that has been erased or reworked in the branch, that change <em>will</em> cause a conflict that <em>will</em> require manual intervention. Even with git or mercurial. When a branch makes a large number of small changes sprinkled over a large codebase that routinely receives many small updates, repository branches turn into a merge minefield.</p>
<p>Does your branch involve a small number of well-defined files ?</p>
<p>Then you should use <strong>repository branches</strong>, because conflicts will only happen in those files, and will usually be easy to fix.</p>
<p>Does your branch involve many changes in many files everywhere in the project ?</p>
<p>Then use <strong>comment branches</strong>.</p>
<p>Last and possibly least, there is the <strong>TODO-branch</strong>. This involves non-breaking, purely cosmetic changes. 25% of my project uses this syntax for historical reasons:</p>
<pre style="padding-left: 30px;">Table.get id |-&gt; function
   | None       -&gt; return 0
   | Some value -&gt; return value.count</pre>
<p>Then, a convention change happened, and this is used instead:</p>
<pre style="padding-left: 30px;">let! value_opt = breathe (Table.get id) in
match value_opt with  
   | None       -&gt; return 0
   | Some value -&gt; return value.count</pre>
<p>Then, another convention change happened, and this should be used instead:</p>
<pre style="padding-left: 30px;">let! value = breathe_req_or (return 0) (Table.get id) in
return value.count</pre>
<p>And then, there&#8217;s the current version:</p>
<pre style="padding-left: 30px;">let! value = breathe_req_or (return 0) $ Table.get id in
return value.count</pre>
<p>Whenever I change coding conventions, I do not spend the time to reformat the tens of thousands of lines of code in my application. That would be wasteful. Instead, every time a piece of code is refactored, it is refactored to the most recent style.</p>
<p>The same happens when using an old and a new version of a given API. My code uses two libraries for handling HTML forms, uses both JavaScript and CoffeeScript, and contains a variety of similar two-hammers-one-nail situations.</p>
<p>These are, for all practical purposes, branches. They are work that is being performed for long durations. The benefit of TODO-branches is that code in the middle of such changes is still compatible with the trunk. It all happens in the head of the developer, who remembers what changes should be done the next time a piece of code is rewritten.</p>
<p><small>Article Image &copy; Dominic Alves &mdash; <a href="http://www.flickr.com/photos/dominicspics/422131893/">Flickr</a></small></p>
]]></content:encoded>
			<wfw:commentRss>http://www.nicollet.net/2011/11/comment-branches/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Ozone Templating</title>
		<link>http://www.nicollet.net/2011/10/ozone-templating/</link>
		<comments>http://www.nicollet.net/2011/10/ozone-templating/#comments</comments>
		<pubDate>Tue, 18 Oct 2011 14:59:37 +0000</pubDate>
		<dc:creator>Victor Nicollet</dc:creator>
				<category><![CDATA[Functional]]></category>
		<category><![CDATA[Architecture]]></category>
		<category><![CDATA[HTML]]></category>
		<category><![CDATA[Objective Caml]]></category>
		<category><![CDATA[RunOrg]]></category>
		<category><![CDATA[Template]]></category>

		<guid isPermaLink="false">http://www.nicollet.net/?p=2588</guid>
		<description><![CDATA[There have been recurring requests about an in-depth explanation of how Ozone — our in-house OCaml web framework — handles HTML templates. So, here it is. A template is usually understood by everyone to be « HTML with holes » that is filled using values from the application itself. It is, in a sense, a DSL that [...]]]></description>
			<content:encoded><![CDATA[<p><img class="aligncenter size-full wp-image-2596" title="mountains" src="http://www.nicollet.net/wp-content/uploads/2011/10/mountains.png" alt="" width="675" height="100" /></p>
<p>There have been recurring requests about an in-depth explanation of how Ozone — our in-house OCaml web framework — handles HTML templates. So, here it is.</p>
<p>A template is usually understood to be « HTML with holes » that are filled using values from the application itself. It is, in a sense, a DSL that is restricted to describing how HTML should be built.</p>
<p>Here is an example of a template Ozone could use, stored in file <em>users.htm</em> :</p>
<pre style="padding-left: 30px;"><code>&lt;<span style="color: #003366;">h1</span>&gt;<span style="color: #ff6600;"><strong>{t:users.title}</strong></span>&lt;/<span style="color: #003366;">h1</span>&gt;
&lt;<span style="color: #003366;">ul</span> <span style="color: #008000;">class</span>=<span style="color: #ff0000;">"userlist"</span>&gt;
  {{<strong><span style="color: #ff6600;">list</span></strong>:
    &lt;<span style="color: #003366;">li</span> <span style="color: #008000;">id</span>=<span style="color: #ff0000;">"<strong><span style="color: #ff6600;">{v:id}</span></strong>"</span>&gt;
      &lt;<span style="color: #003366;">img </span><span style="color: #008000;">src</span>=<span style="color: #ff0000;">"<strong><span style="color: #ff6600;">{v:img}</span></strong>"</span>/&gt;
<span style="font-family: monospace;">      &lt;</span><span style="color: #003366;">a</span><span style="font-family: monospace;"> </span><span style="color: #008000;">href</span><span style="font-family: monospace;">=</span><span style="color: #ff0000;">"<strong><span style="color: #ff6600;">{v:url}</span></strong>"</span><span style="font-family: monospace;">&gt;</span><strong><span style="color: #ff6600;">{v:name}</span></strong><span style="font-family: monospace;">&lt;/</span><span style="color: #003366;">a</span><span style="font-family: monospace;">&gt;
    &lt;/</span><span style="color: #003366;">li</span><span style="font-family: monospace;">&gt;
</span><span style="font-family: monospace;">  }}</span><span style="font-family: monospace;">
</span><span style="font-family: monospace;">&lt;/</span><span style="color: #003366;">ul</span><span style="font-family: monospace;">&gt;
</span>
&lt;<span style="color: #003366;">script</span> <span style="color: #008000;">type</span>=<span style="color: #ff0000;">"text/coffeescript"</span> <span style="color: #008000;">params</span>=<span style="color: #ff0000;">"<strong><span style="color: #ff6600;">save_url</span></strong>"</span>&gt;
list = @$.find <span style="color: #ff0000;">'ul'</span>

save = =&gt;
  ids = [];
  list.children(<span style="color: #ff0000;">'li'</span>).each -&gt;
    ids.push $(@).attr 'id'
  @ajax save_url, ids

list.children(<span style="color: #ff0000;">'li'</span>).sortable
  change: save
&lt;/<span style="color: #003366;">script</span>&gt;

&lt;<span style="color: #003366;">style</span> <span style="color: #008000;">type</span>=<span style="color: #ff0000;">"less"</span>&gt;
<span style="color: #008000;">.userlist</span> {
  <span style="color: #003366;">list-style-type</span>: none;
  <span style="color: #008000;">li img</span> {
    <span style="color: #003366;">float</span>: left;
    <span style="color: #003366;">margin-right</span>: 5px;
    <span style="color: #003366;">width</span>: 50px;
    <span style="color: #003366;">height</span>: 50px;
  }
}
&lt;/<span style="color: #003366;">style</span>&gt;  </code></pre>
<p>The template for a sortable list of users contains three things:</p>
<ul>
<li>A piece of HTML, which is the actual « HTML with holes » to be filled later. The holes are marked in orange.</li>
<li>A piece of CoffeeScript, which will be extracted from the template file, compiled to JavaScript and appended to a site-wide JavaScript file. It will be replaced, in the template, by a hole that will call the extracted JavaScript with additional parameters provided by the application (in orange).</li>
<li>A piece of LESS CSS, which is compiled to CSS and appended to a site-wide CSS file.</li>
</ul>
<p>These are not sections — they can appear in any order as long as the elements and attributes are respected so the pre-build tool can identify and extract the CoffeeScript and CSS bits.</p>
<p>Let&#8217;s examine each of them in order.</p>
<h4>The HTML Template</h4>
<p>This is the meat of the template. In order to improve application performance, loading the templates is a multi-step operation that involves intermediary storage formats.</p>
<p>The <em>first</em> step consists of reading in all the necessary templates, parsing them to check that no variables are undefined, and storing them as a JSON blob in the underlying CouchDB database. This is a manually triggered operation that happens whenever we modify the templates (it&#8217;s part of our deployment procedure). This step may also involve a bit of cleanup, such as removing semantically irrelevant spaces from the HTML (this cannot be done earlier, because some templates are plaintext instead of HTML, and only the application knows which is which).</p>
<p>The <em>second</em> step happens whenever a new instance of our application begins: maybe it died and needed to restart, maybe Apache decided it needed another worker process to handle a surplus of requests, or maybe we added a new server to our web farm. The startup process of our application server does not read anything from the disk. Instead, it reads in all the template data from the database, along with all the other bits of configuration: internationalization strings, third party API keys, feature branch triggers, and so on. Then, it compiles every template down to optimized closure-based opcodes for a hole-filling virtual machine.</p>
<p>The <em>third</em> step happens whenever a bit of HTML needs to be rendered. The application provides the hole-filling virtual machine with a data object and a « writing stream » which is either the HTTP response stream or a JSON serializer stream, depending on whether the request is normal HTTP or AJAX. This is an extremely fast operation where no parsing or checks are performed.</p>
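<p>To make the second and third steps more concrete, here is a minimal sketch of what « compiling to closures » could look like. It is not Ozone&#8217;s actual implementation: the <code>piece</code> and <code>instr</code> types and the <code>compile</code> and <code>render</code> functions are hypothetical names, and the writing stream is simplified to a <code>Buffer.t</code>.</p>
<pre style="padding-left: 30px;">(* Hypothetical sketch of a closure-based "hole-filling" template.
   A compiled template is a list of instructions; each instruction is a
   closure that writes to the output buffer using the data object. *)
type 'a instr = Buffer.t -&gt; 'a -&gt; unit

(* A parsed template: literal text interleaved with named holes. *)
type piece = Text of string | Hole of string

(* Compile once, at startup: every hole is resolved to its closure here,
   so rendering never performs a lookup or a check. *)
let compile (mapping : (string * ('a -&gt; string)) list) (pieces : piece list)
    : 'a instr list =
  List.map
    (function
      | Text s -&gt; (fun buf _ -&gt; Buffer.add_string buf s)
      | Hole name -&gt;
          (* Raises Not_found if the hole has no binding. *)
          let f = List.assoc name mapping in
          (fun buf data -&gt; Buffer.add_string buf (f data)))
    pieces

(* Render many times: just run the closures in order. *)
let render (program : 'a instr list) (data : 'a) : string =
  let buf = Buffer.create 64 in
  List.iter (fun instr -&gt; instr buf data) program;
  Buffer.contents buf

let () =
  let program =
    compile
      [ "name", (fun (n, _) -&gt; n); "url", (fun (_, u) -&gt; u) ]
      [ Text "&lt;a href=\""; Hole "url"; Text "\"&gt;"; Hole "name"; Text "&lt;/a&gt;" ]
  in
  print_endline (render program ("Alice", "/users/alice"))</pre>
<p>In this sketch all lookups happen once, in <code>compile</code>; <code>render</code> only runs the closures in order, which is the property that makes the third step cheap.</p>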
<p>On the application side, loading a template involves three things:</p>
<ul>
<li>Declaring the type of the data object expected by the template.</li>
<li>Declaring the source file of the template (as a function of the language).</li>
<li>Declaring the hole-to-value mapping to be used.</li>
</ul>
<p>Here&#8217;s that loading code for the above template file:</p>
<pre style="padding-left: 30px;"><span style="color: #003366;">module </span>User = Loader.Html(<span style="color: #003366;">struct</span>
  <span style="color: #003366;">type </span>t = &lt;
    id   : Id.t ;
    url  : string ;
    img  : string option ;
    name : string
  &gt; ;;
  <span style="color: #003366;">let </span>source  _ = <span style="color: #ff0000;">"users/list"</span>
  <span style="color: #003366;">let </span>mapping _ = [
    <span style="color: #ff0000;">"id"</span>,   Mk.esc (<span style="color: #003366;">fun </span>x -&gt; Id.to_string (x # id)) ;
    <span style="color: #ff0000;">"url"</span>,  Mk.esc (<span style="color: #003366;">fun </span>x -&gt; x # url) ;
    <span style="color: #ff0000;">"img"</span>,  Mk.esc (<span style="color: #003366;">fun </span>x -&gt; BatOption.default img404 (x # img)) ;
    <span style="color: #ff0000;">"name"</span>, Mk.esc (<span style="color: #003366;">fun </span>x -&gt; x # name)
  ]
<span style="color: #003366;">end</span>)

<span style="color: #003366;">module </span>UserList = Loader.Html(struct
  <span style="color: #003366;">type </span>t = &lt;
    users : User.t list
  &gt; ;;
  <span style="color: #003366;">let </span>source  _ = <span style="color: #ff0000;">"users"</span>
  <span style="color: #003366;">let </span>mapping lang = [
    <span style="color: #ff0000;">"list"</span>, Mk.list (<span style="color: #003366;">fun </span>x -&gt; x # users) (User.template lang)
  ]
<span style="color: #003366;">end</span>)</pre>
<p>One <em>view</em> is defined and loaded for every independent piece of HTML in the template. Here, there is a User view which represents the list item for a single user, repeated zero, one or more times; and there is the UserList view representing the wrapper in which those list items will be placed.</p>
<p>The <code>{v:foobar}</code> syntax defines a variable hole. The corresponding view MUST define a mapping for that variable, or an error will occur at deployment time.</p>
<p>The <code>{{foobar: }}</code> syntax is a variant: in addition to declaring a variable hole, it also defines a sub-view from the enclosed HTML, which can be loaded using <code>template/foobar</code> as the path (here, <code>users/list</code>).</p>
<p>The <code>{t:foobar}</code> syntax defines a translation hole. The template engine will automatically load the corresponding term from the internationalization dictionary used to render the template.</p>
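<p>The deployment-time check on variable holes can be sketched in a few lines. This is only an illustration under simplifying assumptions: it handles <code>{v:name}</code> holes only, and <code>variable_holes</code> and <code>undefined_holes</code> are hypothetical names, not part of Ozone.</p>
<pre style="padding-left: 30px;">(* Hypothetical sketch: find every {v:name} hole in a template and
   report the ones that have no entry in the view's mapping. *)

(* Collect the names of all {v:...} holes, in order of appearance. *)
let variable_holes (src : string) : string list =
  let len = String.length src in
  let rec scan i acc =
    if i + 3 &gt; len then List.rev acc
    else if String.sub src i 3 = "{v:" then
      match String.index_from_opt src (i + 3) '}' with
      | Some j -&gt; scan (j + 1) (String.sub src (i + 3) (j - i - 3) :: acc)
      | None -&gt; List.rev acc (* unterminated hole: stop scanning *)
    else scan (i + 1) acc
  in
  scan 0 []

(* Names used by the template but absent from the mapping. *)
let undefined_holes (src : string) (mapping : (string * 'a) list) =
  List.filter (fun name -&gt; not (List.mem_assoc name mapping)) (variable_holes src)

let () =
  let src = "&lt;li id=\"{v:id}\"&gt;&lt;a href=\"{v:url}\"&gt;{v:name}&lt;/a&gt;&lt;/li&gt;" in
  (* The mapping deliberately forgets "name". *)
  let mapping = [ "id", (); "url", () ] in
  List.iter (Printf.printf "undefined variable: %s\n") (undefined_holes src mapping)</pre>
<p>A real check would fail the deployment when the returned list is not empty, rather than just printing it.</p>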
<p>The <code>Mk.esc</code> and <code>Mk.list</code> are binding instructions which are used to compile the template to a virtual machine. The common binding instructions are:</p>
<ul>
<li><code>Mk.esc f</code> applies <code>f</code> to the data object, which returns a string. That string is then HTML-escaped and output.</li>
<li><code>Mk.str f</code> is the same as above, but the string is not HTML-escaped.</li>
<li><code>Mk.i18n f</code> is the same as above, but the string is translated as an internationalization term.</li>
<li><code>Mk.list f t</code> applies <code>f</code> to the data object, which returns a list of data objects compatible with template <code>t</code>. That template is then used to render those data objects in order.</li>
<li><code>Mk.list_or f t e</code> is the same as above, but if the returned list is empty, it instead uses template <code>e</code> to draw a « list is empty » message.</li>
<li><code>Mk.sub f t</code> applies <code>f</code> to the data object, which returns a single object compatible with template <code>t</code>. That template is then used to render the object.</li>
<li><code>Mk.sub_or f t e</code> is the same as above, but <code>f</code> returns an optional type. If it is missing, then template <code>e</code> is used to render an « object is missing » message.</li>
<li><code>Mk.text f</code> provides <code>f</code> with the current writing stream and internationalization object, so that it may directly write HTML to the output. This is how most rendering helpers such as « render a currency amount » are used.</li>
<li><code>Mk.box f</code> is the same as above, but the writing stream supports the addition of arbitrary javascript code to be executed by the client as part of rendering the template. This is how javascript-dependent rendering helpers such as « render a datepicker » are used.</li>
</ul>
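<p>To give an idea of how such binding instructions can be built, here is a simplified sketch of four of them. The types are assumptions made for the example (a binding is reduced to a function writing to a <code>Buffer.t</code>, and the « empty » template of <code>list_or</code> is assumed to receive the parent data object); the real <code>Mk</code> module is not shown in this article and certainly differs.</p>
<pre style="padding-left: 30px;">(* Hypothetical type: a binding writes to a buffer from a data object. *)
type 'a binding = Buffer.t -&gt; 'a -&gt; unit

(* Minimal HTML escaping of the five significant characters. *)
let escape_html (s : string) : string =
  let buf = Buffer.create (String.length s) in
  String.iter
    (function
      | '&amp;' -&gt; Buffer.add_string buf "&amp;amp;"
      | '&lt;' -&gt; Buffer.add_string buf "&amp;lt;"
      | '&gt;' -&gt; Buffer.add_string buf "&amp;gt;"
      | '"' -&gt; Buffer.add_string buf "&amp;quot;"
      | '\'' -&gt; Buffer.add_string buf "&amp;#39;"
      | c -&gt; Buffer.add_char buf c)
    s;
  Buffer.contents buf

module Mk = struct
  (* Output [f data] after HTML-escaping it. *)
  let esc (f : 'a -&gt; string) : 'a binding =
    fun buf data -&gt; Buffer.add_string buf (escape_html (f data))

  (* Output [f data] verbatim. *)
  let str (f : 'a -&gt; string) : 'a binding =
    fun buf data -&gt; Buffer.add_string buf (f data)

  (* Render every element of [f data] with the item binding [t]. *)
  let list (f : 'a -&gt; 'b list) (t : 'b binding) : 'a binding =
    fun buf data -&gt; List.iter (t buf) (f data)

  (* Same, but fall back to [e] when the list is empty. *)
  let list_or (f : 'a -&gt; 'b list) (t : 'b binding) (e : 'a binding) : 'a binding =
    fun buf data -&gt;
      match f data with
      | [] -&gt; e buf data
      | items -&gt; List.iter (t buf) items
end

let () =
  let item = Mk.esc (fun name -&gt; name) in
  let buf = Buffer.create 32 in
  Mk.list (fun names -&gt; names) item buf [ "Tom &amp; Jerry"; "&lt;b&gt;" ];
  print_endline (Buffer.contents buf)</pre>
<p>Because each instruction is typed over the data object, passing a function that returns the wrong type to <code>esc</code> or <code>list</code> is rejected by the compiler.</p>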
<p>The data type is defined in the view itself, either explicitly (as I did above for the sake of clarity) or by using an existing type from your application: if the application already had a user module with the appropriate data type, I could have used that type instead.</p>
<p>By specifying views in this way, the data required to render a template is made available to the compiler for type-checking, and missing bindings are detected during deployment (usually to a local test server). This has made template-related errors exceedingly rare — once the HTML is done, it becomes extremely hard to use it wrong.</p>
<p>Although this feature is not currently in use, the virtual machine semantics also allow compiling it down to JavaScript. This would allow us to send the rendering code to the client as a one-time cost, and send a much smaller data package through AJAX whenever something new needs to be rendered.</p>
<h4>The CoffeeScript Layer</h4>
<p>We use CoffeeScript because it&#8217;s more elegant and shorter than plain JavaScript, and because its compile-to-JavaScript step lets us detect syntax errors at deployment time. Yes, compile- and deployment-time are my favorite buzzwords, because I enjoy the feeling of safety that they bring.</p>
<p>As mentioned above, the actual CoffeeScript is removed from the template in a pre-processing step, and replaced with a hole that says « call JavaScript function #33 now » that happens to define a list of parameters matching the params attribute of the original script element.</p>
<p>So, starting with the script element from the example above:</p>
<pre style="padding-left: 30px;"><code>&lt;<span style="color: #003366;">script</span> <span style="color: #008000;">type</span>=<span style="color: #ff0000;">"text/coffeescript"</span> <span style="color: #008000;">params</span>=<span style="color: #ff0000;">"<span style="color: #ff6600;"><strong>save_url</strong></span>"</span>&gt;
list = @$.find <span style="color: #ff0000;">'ul'</span>

save = =&gt;
  ids = [];
  list.children(<span style="color: #ff0000;">'li'</span>).each -&gt;
    ids.push $(@).attr 'id'
  @ajax save_url, ids

list.children(<span style="color: #ff0000;">'li'</span>).sortable
  change: save
&lt;/<span style="color: #003366;">script</span>&gt;</code></pre>
<p>If this is the 33rd script tag encountered by the preprocessor, then it would append the following to the complete CoffeeScript file:</p>
<pre style="padding-left: 30px;">@j33 = (save_url) -&gt;
  list = @$.find <span style="color: #ff0000;">'ul'</span>

  save = =&gt;
    ids = [];
    list.children(<span style="color: #ff0000;">'li'</span>).each -&gt;
      ids.push $(@).attr 'id'
    @ajax save_url, ids

  list.children(<span style="color: #ff0000;">'li'</span>).sortable
    change: save</pre>
<p>And it would be replaced in the template file with this:</p>
<pre style="padding-left: 30px;">{j:j33:save_url}</pre>
<p>This syntax (which can be used manually, although it should be avoided) is a JavaScript hole: it runs the specified function and provides by-name values for the arguments. The parser would notice that we are declaring an HTML view instead of a JS/HTML view and complain about it, so we would have to go back and re-define it:</p>
<pre style="padding-left: 30px;"><span style="color: #003366;">module </span>UserList = Loader.JsHtml(struct
  <span style="color: #003366;">type </span>t = &lt;
    users : User.t list ;
    save_url : string
  &gt; ;;
  <span style="color: #003366;">let </span>source  _ = <span style="color: #ff0000;">"users"</span>
  <span style="color: #003366;">let </span>mapping lang = [
    <span style="color: #ff0000;">"list"</span>, Mk.list (<span style="color: #003366;">fun </span>x -&gt; x # users) (User.template lang)
  ]
  <span style="color: #003366;">let </span>script  _ = [
    <span style="color: #ff0000;">"save_url"</span>, (<span style="color: #003366;">fun </span>x -&gt; Json_type.String (x # save_url))
  ]
<span style="color: #003366;">end</span>)</pre>
<p>I have used <code>Loader.JsHtml</code> instead of <code>Loader.Html</code>, and defined a secondary mapping that is specific to JavaScript parameters, and which uses the data object to return JSON values.</p>
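<p>The extraction step described earlier (wrap the script body in a numbered function, leave a JavaScript hole behind) can be sketched as follows. The function name <code>extract_script</code> and the way several parameters are joined inside the hole are assumptions; only the single-parameter form <code>{j:j33:save_url}</code> appears above.</p>
<pre style="padding-left: 30px;">(* Hypothetical sketch of the extraction step for one script element.
   [index] is the position of the script among all those encountered. *)

(* Indent every non-empty line of [body] by two spaces. *)
let indent (body : string) : string =
  String.split_on_char '\n' body
  |&gt; List.map (fun line -&gt; if line = "" then line else "  " ^ line)
  |&gt; String.concat "\n"

(* Returns the CoffeeScript to append to the site-wide file, and the
   JavaScript hole that replaces the script element in the template. *)
let extract_script ~(index : int) ~(params : string list) (body : string)
    : string * string =
  let name = "j" ^ string_of_int index in
  let coffee =
    "@" ^ name ^ " = (" ^ String.concat ", " params ^ ") -&gt;\n"
    ^ indent body ^ "\n"
  in
  (* Assumption: extra parameters are appended with the same ':' separator. *)
  let hole = String.concat ":" (("{j:" ^ name) :: params) ^ "}" in
  (coffee, hole)

let () =
  let coffee, hole =
    extract_script ~index:33 ~params:[ "save_url" ] "list = @$.find 'ul'"
  in
  print_string coffee;
  print_endline hole</pre>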
<p>How is the JavaScript called? Well, it really depends on how your JavaScript library handles it. On non-AJAX HTTP, Ozone will try to inject all JavaScript calls in a script element at the end of the HTML body. In AJAX mode, Ozone allows you to render a template to a JSON object representing both the HTML and the JavaScript together, and it is the responsibility of the code that made the AJAX request to receive that object, place the HTML wherever applicable, and then &#8220;run the JavaScript&#8221;.</p>
<p>By convention, the JavaScript is called using a <em>client context</em> as its <code>this</code> value. The client context is an object which may contain whatever the caller finds interesting to place there, along with a variable named <code>$</code> which should be a jQuery selection containing the root element of the previously rendered HTML. Hence, <code>@$.find 'ul'</code> would select the list in the rendered HTML, instead of all the lists on the page.</p>
<h4>The LESS CSS Layer</h4>
<p>This is the least interesting of all three layers. The LESS CSS code is extracted, appended to a single file, and compiled to CSS (which, again, is a useful deployment-time syntax check). The point of this feature is simply to let the designer place element-specific CSS next to the element, instead of having it exist in an external file and cause trouble with asset garbage collection (can I remove this rule or is it still used anywhere?). External files still exist, though, for CSS rules that are not limited to a single template.</p>
<h4>Bonus : the triple hash</h4>
<p>How do I define some code that should be called when a button is clicked? Defining it directly in the onclick attribute is ugly, hard to read and does not let the application provide parameters, so what else can I do?</p>
<p>The solution is to use an intermediary global object that happens to be the same for the entire file: a pattern that stores any template-related JS in a global variable named after <code>__FILE__</code>!</p>
<p>Yes, it is a hack, but it&#8217;s a simple and useful one.</p>
<p>The only difference is that <code>__FILE__</code> is spelled <code>###</code>.</p>
<pre style="padding-left: 30px;"><code>&lt;<span style="color: #003366;">button </span><span style="color: #008000;">type</span>=<span style="color: #ff0000;">"button"</span> <span style="color: #008000;">onclick</span>=<span style="color: #ff0000;">"###.frobnicate()"</span>&gt;<span style="color: #ff6600;"><strong>{t:frobnicate}</strong></span>&lt;/<span style="color: #003366;">button</span>&gt;

&lt;<span style="color: #003366;">script </span><span style="color: #008000;">type</span>=<span style="color: #ff0000;">"text/coffeescript"</span> <span style="color: #008000;">params</span>=<span style="color: #ff0000;">"<span style="color: #ff6600;"><strong>message</strong></span>"</span>&gt;
###.frobnicate = -&gt;
  alert message
&lt;/<span style="color: #003366;">script</span>&gt; </code></pre>
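<p>How the pre-build tool expands <code>###</code> is not detailed here. One plausible sketch, assuming the global is named after the sanitized file path (the naming scheme and both function names are invented for the example):</p>
<pre style="padding-left: 30px;">(* Replace every occurrence of "###" in [src] with [name]. *)
let replace_triple_hash ~(name : string) (src : string) : string =
  let len = String.length src in
  let buf = Buffer.create len in
  let rec go i =
    if i &gt;= len then ()
    else if i + 3 &lt;= len &amp;&amp; String.sub src i 3 = "###" then begin
      Buffer.add_string buf name;
      go (i + 3)
    end else begin
      Buffer.add_char buf src.[i];
      go (i + 1)
    end
  in
  go 0;
  Buffer.contents buf

(* Assumed naming scheme: "users/list.htm" becomes "__users_list_htm". *)
let global_of_file (path : string) : string =
  "__" ^ String.map
    (fun c -&gt;
      match c with
      | 'a' .. 'z' | 'A' .. 'Z' | '0' .. '9' -&gt; c
      | _ -&gt; '_')
    path

let () =
  let name = global_of_file "users.htm" in
  print_endline (replace_triple_hash ~name "###.frobnicate = -&gt; alert message")</pre>
<p>The same substitution would apply to both the <code>onclick</code> attribute and the script body, and the global object itself would have to be created before any template code runs.</p>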
<p><small>Article Image &copy; gdbg12 &mdash; <a href="http://www.flickr.com/photos/78168499@N00/408879493/">Flickr</a></small></p>
]]></content:encoded>
			<wfw:commentRss>http://www.nicollet.net/2011/10/ozone-templating/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Node.js is Aquarius</title>
		<link>http://www.nicollet.net/2011/10/node-js-is-aquarius/</link>
		<comments>http://www.nicollet.net/2011/10/node-js-is-aquarius/#comments</comments>
		<pubDate>Mon, 03 Oct 2011 08:07:15 +0000</pubDate>
		<dc:creator>Victor Nicollet</dc:creator>
				<category><![CDATA[Dynamic]]></category>
		<category><![CDATA[Architecture]]></category>
		<category><![CDATA[Node.js]]></category>
		<category><![CDATA[Web]]></category>

		<guid isPermaLink="false">http://www.nicollet.net/?p=2548</guid>
		<description><![CDATA[@Ted Dziuba : your article on Node.js being cancer has brought many angry nerds with pitchforks to your door. You do make good points, and the best opinion is not one that everyone blindly agrees with, but one that gets everyone thinking — hopefully before they speak. Scalability I, too, would take issue with a [...]]]></description>
			<content:encoded><![CDATA[<p><img class="aligncenter size-full wp-image-2549" title="ecluse" src="http://www.nicollet.net/wp-content/uploads/2011/10/ecluse.png" alt="" width="675" height="100" /></p>
<p>@<a href="http://teddziuba.com/2011/10/node-js-is-cancer.html" target="_blank">Ted Dziuba</a> : your article on Node.js being cancer has brought many angry nerds with pitchforks to your door. You do make good points, and the best opinion is not one that everyone blindly agrees with, but one that gets everyone thinking — hopefully before they speak.</p>
<h3>Scalability</h3>
<p>I, too, would take issue with a statement like «Node.js is scalable because it is non-blocking» though not the same issue as you took. Being <em>non-blocking</em> does not help with <em>scalability</em> at all. Scalability is about how easily your system administrator can add a new machine to your web farm to soak up a heavier load than usual, and it&#8217;s all about two things:</p>
<ul>
<li><strong>Can you run multiple copies of your software in parallel</strong>? In-application sharing of data makes this harder. Some Java servers store the entire relevant state in application memory, so scaling is impossible. PHP stores session files on the disk by default, so scaling is only possible with server affinity (the same user always gets sent to the same server). A clean server with no in-application data sharing is easily duplicated, regardless of the language.</li>
<li><strong>Is there a shared resource with sequential access</strong>? If you run a hundred thousand web servers, but all of them have to read-write the same physical drive, then your application will be no faster than that read-write speed. If you access a database that involves heavy locking, then your application will be no faster than the locking sequence can allow.</li>
</ul>
<p>Neither of these is in any way improved or even affected by non-blocking semantics.</p>
<p>Node.js improves <em>performance</em> when serving multiple concurrent requests. It makes it no easier to scale, but it helps delay the point where scaling becomes necessary.</p>
<p>The typical explanation of how this happens is that if serving a request uses 10ms of processing things on the server («Work») and 10ms of waiting for database requests to complete («Wait»), then the ideal web server should be able to complete one request every 10ms, instead of one every 20ms, by overlapping the processing time of one request with the database wait time of another. This is a pretty nice and simple idea, which is why everyone has been doing it for ages. The main difference is how it is done.</p>
<p>What the traditional UNIX world did is pop enough processes (that is the Unix answer to every problem, <a href="http://en.wikipedia.org/wiki/Fork_bomb#Defusing" target="_blank">including having too many processes around</a>). If your Work-time is 10ms and your Wait-time is 40ms, then by allowing up to five processes (one working while the other four wait) you are effectively recycling all the wait-time in a high concurrent load situation. This is why every CGI- or FastCGI-enabled web server in existence provides a configuration entry for the number of concurrent child processes.</p>
<p>Node.js does the same. With that same Wait/Work ratio of 40/10, Node.js will be serving five concurrent requests at the same time and no more, because it cannot create processing time out of thin air.</p>
<p>What Node.js brings to the table is an architecture that performs, at the server level, what the traditional UNIX world did at the kernel level: scheduling. Whether this approach is significantly faster than a properly configured FastCGI setup is still a matter of debate, and I believe the answer here is simply that, as long as the Wait/Work time ratio does not push the number of concurrent processes higher than what the available memory allows, there will be no significant difference between FastCGI and Node.js in terms of blocking.</p>
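<p>The Work/Wait arithmetic can be checked with a few lines of code. This is a deliberately simple model, and its assumptions are mine: a single CPU, no scheduling overhead, and every request needing exactly <code>work</code> milliseconds of processing followed by <code>wait</code> milliseconds of waiting. Under that model the CPU is busy for <code>n × work</code> out of every <code>work + wait</code> milliseconds, capped at 100%.</p>
<pre style="padding-left: 30px;">(* Simplified model (an assumption): one CPU, no scheduling overhead.
   Each request needs [work] ms of CPU, then [wait] ms of idle waiting. *)

(* Fraction of time the CPU is busy with [n] concurrent workers. *)
let utilization ~(work : float) ~(wait : float) (n : int) : float =
  Float.min 1.0 (float_of_int n *. work /. (work +. wait))

(* Smallest number of workers that keeps the CPU fully busy. *)
let saturating_workers ~(work : float) ~(wait : float) : int =
  int_of_float (Float.ceil ((work +. wait) /. work))

let () =
  for n = 1 to 6 do
    Printf.printf "%d workers: CPU busy %.0f%% of the time\n" n
      (100. *. utilization ~work:10. ~wait:40. n)
  done;
  Printf.printf "saturation at %d workers\n"
    (saturating_workers ~work:10. ~wait:40.)</pre>
<p>Whether the workers are processes scheduled by the kernel or callbacks scheduled by Node.js does not change this bound, only the cost of reaching it.</p>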
<h4>The UNIX Way</h4>
<p>I once agreed with your stated opinion on the matter, but I got better. Here&#8217;s the thing: today, being an HTTP server is no more of a «responsibility» than reading from STDIN and writing to STDOUT. Make no mistake: being a production, internet-facing HTTP server <em>is</em> a responsibility, but that is not what Node.js is (or should be) trying to achieve.</p>
<p>Consider this: the production, internet-facing HTTP server must communicate with the actual application using one protocol or another. CGI is one such protocol, FastCGI is another, and HTTP is yet another. The fact that the same protocol is used for serving requests over the internet is not a problem; it is actually a benefit, because communicating through HTTP is a solved problem with a clean API in almost every single language out there.</p>
<p>There is now something I would jokingly call «The REST Way» which follows in the tracks of the UNIX Way in a cloudy fashion: small applications performing one task (dispatching internet requests, constructing responses, persistent storage, caching), running on any number of servers in any number of locations, and connected to each other through HTTP requests. In an nginx-Node.js-CouchDB stack, nginx is the dispatcher, Node.js constructs responses, and CouchDB provides persistent storage, and everyone «speaks» HTTP in the same way that Unix processes «speak» STDIN/STDOUT.</p>
<p><small>Article image &copy; Patrick Janicek &mdash; <a href="http://www.flickr.com/photos/marsupilami92/5943144941/">Flickr</a></small></p>
]]></content:encoded>
			<wfw:commentRss>http://www.nicollet.net/2011/10/node-js-is-aquarius/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Having a Strong Opinion</title>
		<link>http://www.nicollet.net/2011/09/having-a-strong-opinion/</link>
		<comments>http://www.nicollet.net/2011/09/having-a-strong-opinion/#comments</comments>
		<pubDate>Thu, 01 Sep 2011 12:14:21 +0000</pubDate>
		<dc:creator>Victor Nicollet</dc:creator>
				<category><![CDATA[Imperative]]></category>
		<category><![CDATA[Architecture]]></category>
		<category><![CDATA[Psychology]]></category>
		<category><![CDATA[Strategy]]></category>

		<guid isPermaLink="false">http://www.nicollet.net/?p=2534</guid>
		<description><![CDATA[Many blogs about technical hiring will at one point state something about buzzwords and programmer flexibility. One of the original trendsetters, Joel Spolsky, said: The recruiters-who-use-grep, by the way, are ridiculed here, and for good reason. I have never met anyone who can do Scheme, Haskell, and C pointers who can&#8217;t pick up Java in [...]]]></description>
			<content:encoded><![CDATA[<p><img class="aligncenter size-full wp-image-2535" title="sunset" src="http://www.nicollet.net/wp-content/uploads/2011/09/sunset.png" alt="" width="675" height="100" /></p>
<p>Many blogs about technical hiring will at one point state something about buzzwords and programmer flexibility. One of the original trendsetters, <a href="http://www.joelonsoftware.com/articles/ThePerilsofJavaSchools.html" target="_blank">Joel Spolsky</a>, said:</p>
<blockquote><p>The recruiters-who-use-grep, by the way, are ridiculed here, and for  good reason. I have never met anyone who can do Scheme, Haskell, and C  pointers who can&#8217;t pick up Java in two days, and create better Java code  than people with five years of experience in Java, but try explaining  that to the average HR drone.</p></blockquote>
<p>And this is not only a point about elite languages like Scheme-Haskell-C versus mundane languages like Java-PHP-whatever: flexibility, the ability to switch languages and to adapt to new interfaces and libraries, is almost always presented as a prerequisite to being competent. John can perform miracles with PHP but cannot easily learn Ruby? Then John is not a competent programmer, he is just a competent <em>PHP</em> programmer.</p>
<p>Maybe there is some truth to this characterization. Maybe there is indeed something about good programmers that lets them shine in a language-independent way, with languages as mere details of their day-to-day miracles. But I am vaguely uncomfortable with that notion. And not for personal reasons — my current language of choice is one of those elite functional languages that would hypothetically place me at the apex of the competence food chain.</p>
<p>I believe the critical element of programming competence is not <em>ability</em> but <em>passion</em>. What makes you a good programmer is how much you care about software development. Does John have a nine-to-five PHP programming job and hardly touch the computer outside of work, or does he do small projects on the side, or contribute to Open Source PHP software, or answer technical PHP questions on Stack Overflow, or perform any other number of PHP-related activities that do not have professional rewards as their main objective? Does he unconsciously try to <em>do the right thing</em> in his code, even though it will be harder than writing a dirty hack to make his boss happy?</p>
<p>I have seen people, many of them high-ranking academics, with the intellectual firepower to outgun me in any programming-related endeavor, but a striking lack of passion that left their applications crippled, hideous and unreliable. And I have no doubts that, had they cared about those things, they could have done better.</p>
<p>I have seen people, many of them students, with a genuine passion for software development, who would spend their free time hacking together video games or dynamic websites or clever hacks, who would notice after a while that their abilities were stagnating and, unable to improve, would give up programming rather than live with the frustration of writing software unworthy of their expectations.</p>
<p>And when you care about programming, you tend to have strong opinions about how it should be done.</p>
<p>Some of these opinions are trivial. My hair stands on end whenever I have to read badly formatted code (I don&#8217;t care about the opening-brace-position flame wars, any convention is fine by me as long as it is consistently followed), and the authors often wonder why I would care about such a silly thing. I have a strong opinion about what code should look like, and I dislike working with people who do not share that opinion.</p>
<p>Yes, I am one of those Scheme-Haskell-C elite programmers, and I can pick up Java in a few days and outperform experienced Java-only programmers. I have done it several times in the past. And every single time I did so, I felt dirty and miserable, because Java goes against several of my opinions about what software development should be like.</p>
<p>In fact, I am not really surprised about the popular success of Python and Ruby on Rails, not in terms of how many projects are written, but in terms of how outspoken the technical advocates are. This is because those two have something that appeals to people who can become passionate about them: a clean core philosophy you can agree or disagree with.</p>
<p>Python zealots flock around the <a href="http://www.python.org/dev/peps/pep-0020/" target="_blank">Zen of Python</a>:</p>
<blockquote><p>Beautiful is better than ugly.<br />
Explicit is better than implicit.<br />
Simple is better than complex.<br />
Complex is better than complicated.<br />
Flat is better than nested.<br />
Sparse is better than dense.<br />
Readability counts.<br />
Special cases aren&#8217;t special enough to break the rules.<br />
Although practicality beats purity.<br />
Errors should never pass silently.<br />
Unless explicitly silenced.<br />
In the face of ambiguity, refuse the temptation to guess.<br />
There should be one&#8211; and preferably only one &#8211;obvious way to do it.<br />
Although that way may not be obvious at first unless you&#8217;re Dutch.<br />
Now is better than never.<br />
Although never is often better than *right* now.<br />
If the implementation is hard to explain, it&#8217;s a bad idea.<br />
If the implementation is easy to explain, it may be a good idea.<br />
Namespaces are one honking great idea &#8212; let&#8217;s do more of those!</p></blockquote>
<p>Ruby on Rails fanboys have a similar set of core beliefs, the <a href="http://guides.rubyonrails.org/getting_started.html#what-is-rails" target="_blank">Rails Way</a>:</p>
<blockquote><p>DRY – “Don’t Repeat Yourself” – suggests that writing the same code over and over again is a bad thing.<br />
Convention Over Configuration – means that Rails makes assumptions about what you want to do and how you’re going to do it, rather than requiring you to specify every little thing through endless configuration files.<br />
REST is the best pattern for web applications – organizing your application around resources and standard HTTP verbs is the fastest way to go.</p></blockquote>
<p>So, if you happen to wholeheartedly agree with the Ruby on Rails way, then by using it you are certain to find both a technical environment in which you can feel happy, and a community that shares your strong opinions about software development. Is it any wonder, then, that people <em>passionate</em> about the RoR values would flock to RoR and, inevitably, start advocating its use?</p>
<p>Going a little bit further, if you are hiring for your software company, would you rather hire someone with weak opinions on most topics because they are «flexible» or someone with strong opinions that match the strong opinions of your company? Given the choice, I would certainly hire the latter.</p>
<p>I have my own «core philosophy» that I apply to the way I write my own code. Its points are, in order of decreasing importance:</p>
<ol>
<li>It is better to <strong>have a correct program with few features</strong>, than a buggy program with many features.<br />
<small>If possible, take the time to design your code and your interface so that errors cannot happen. If not, explicitly detect and display all errors as they happen. If possible, have a programming language and a programming style that can eliminate by design many errors, rather than a programming language or programming style that improves productivity at the cost of having more errors.<br />
</small></li>
<li>It is better to <strong>prove the correctness of a program</strong>, than to test for the existence of bugs.<br />
<small>Tests cannot prove that the software is correct; they can only prove the existence of bugs. A proven program contains no bugs, so there is no need to worry about having enough code coverage and enough test cases. This is a special case of &#8220;fail early&#8221;: better to fail at the compilation stage than to fail during tests or at runtime.</small></li>
<li>It is better to <strong>accept that code will have to be rewritten</strong>, than to future-proof a complex design.<br />
<small>Future-proof code will likely be larger, and contain more untested pieces, than normal code. This increases the probability of bugs, without completely eliminating the possibility of a completely unforeseen design change that still involves a rewrite. Preparing your code for a rewrite, by splitting it up into clean independent self-documenting modules and creating automated correctness checks for these, is the best way to make it flexible.<br />
</small></li>
<li>It is better to <strong>enforce data constraints through types</strong>, than to enforce them through code.<br />
<small>Attempting to store data that violates the constraints fails earlier if the type cannot represent that data, especially in a statically typed language. Doing things this way might take longer than just keeping a flexible data type and performing the constraint checks in the code, but the odds of it being correct are higher.<br />
</small></li>
<li>It is better to <strong>have the computer do work for you</strong>, than for you to do that work yourself.<br />
<small>Why write trivial unit tests when you can harness the type system to perform those checks? Why define or configure things by hand when your framework could define or configure them for you?</small></li>
<li>It is better to <strong>rewrite your code using new concepts</strong>, than to insist on using existing but ill-adapted concepts.<br />
<small>Concepts improve productivity and readability, and by design will prevent some kinds of incorrect usage, but only as long as they match what the software is expected to be doing. Otherwise, at best they will be a useless weight and at worst will have to be tediously worked around to achieve anything. The size of the refactoring is no obstacle: if half the application needs to be adapted to the new concept, then so be it.<br />
</small></li>
<li>It is better to <strong>repeat yourself from time to time</strong>, than to introduce too many concepts.<br />
<small>Any repetition can be eliminated by adding a new abstraction through refactoring. That abstraction is usually a mere application of an existing pattern or concept, but might sometimes give flesh to a new concept. While that concept arguably already existed in the non-refactored code, it is easier to understand uncommon concepts by looking at their repeated code, than to give them a sufficiently understandable name.</small></li>
</ol>
<p>I don&#8217;t know. Maybe someone might agree with me one day.<br />
<small>Article image © Timo Newton-Syms — <a href="http://www.flickr.com/photos/timo_w2s/6021716943/in/photostream/">Flickr</a></small></p>
]]></content:encoded>
			<wfw:commentRss>http://www.nicollet.net/2011/09/having-a-strong-opinion/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Coping With Inconsistent Databases</title>
		<link>http://www.nicollet.net/2011/08/coping-with-inconsistent-databases/</link>
		<comments>http://www.nicollet.net/2011/08/coping-with-inconsistent-databases/#comments</comments>
		<pubDate>Fri, 05 Aug 2011 21:21:17 +0000</pubDate>
		<dc:creator>Victor Nicollet</dc:creator>
				<category><![CDATA[Dynamic]]></category>
		<category><![CDATA[Architecture]]></category>
		<category><![CDATA[CouchDB]]></category>

		<guid isPermaLink="false">http://www.nicollet.net/?p=2470</guid>
		<description><![CDATA[In my earlier article about the benefits of NoSQL, I discussed eventually consistent databases. These are databases where « write A ; read A » can return an outdated or missing value, but « write A ; wait ; read A » will always return the correct value if you wait long enough. Dealing with [...]]]></description>
			<content:encoded><![CDATA[<p><img class="aligncenter size-full wp-image-2486" title="clock" src="http://www.nicollet.net/wp-content/uploads/2011/08/clock.png" alt="" width="675" height="100" /></p>
<p>In <a href="http://www.nicollet.net/2011/07/nosql-is-a-premature-optimization/" target="_blank">my earlier article about the benefits of NoSQL</a>, I discussed eventually consistent databases. These are databases where « write A ; read A » can return an outdated or missing value, but « write A ; wait ; read A » will always return the correct value if you wait long enough. Dealing with eventual consistency can lead to bugs, because there are many pitfalls caused by race conditions. It&#8217;s impossible for anyone to avoid race conditions by reading the code and thinking very hard about it. Instead, the code must be written using patterns and <em>mental tools</em> that by their very design prevent race conditions from happening. My point was that most programmers that only had experience with the absolute-consistency SQL world do not have the mental tools necessary to avoid those pitfalls. Not because they are incapable of it, but because they never had the training or the experience to acquire these mental tools.</p>
<p>Today, an anonymous coward shared a few thoughts on the topic :</p>
<blockquote><p>They do not have the mental tools required to work with eventual consistency?<br />
The only mental tool I’ve seen is disregard for the issue.<br />
Waiting eagerly on another post discussing those “mental tools”.</p></blockquote>
<p>He/she is right, what <em>are</em> those mental tools anyway ?</p>
<p>First, let me state the obvious again : eventually consistent databases almost never remain inconsistent long enough for users to notice and, even if they do notice, they usually don&#8217;t care — through the prevalence of cache-powered websites, our users are used to seeing stale data every so often and know to hit the refresh button to deal with it. Aside from a few critical edge cases like online payment processing, <strong>the problem with eventual consistency is not the user</strong>.</p>
<p>The problem is that software makes decisions based on available data and, if the available data is wrong, then the outcome is wrong. This decision-making process will turn a one-nanosecond inconsistency into a permanent error if you are unlucky, and the entire point of this article is how to prevent this from happening. Need an example?</p>
<h3>Event-Based vs State-Based</h3>
<p>Let&#8217;s say I&#8217;m writing a badge module similar to the one used on <a href="http://stackoverflow.com/badges" target="_blank">Stack Overflow</a>. Here are the specifications:</p>
<blockquote><p>The user can publish articles. Their 10th article will bear a bronze badge, their 50th will bear a silver badge, and their 100th article will bear a golden badge.</p></blockquote>
<p>One way I can write this module is to intercept the «publish article» event and add my own bit of logic to it: if there are nine other articles, award the bronze badge. This is an event-based approach, because it performs some changes when an event happens. This way of doing things is almost universally followed in the SQL world, but it does not work in NoSQL environments that lack absolute consistency.</p>
<p><strong>What&#8217;s the problem?</strong> One user, Bob, tries to cheat the system by publishing nine articles, then publishing articles X and Y in quick succession, hoping to get bronze badges for both. The behavior we want is that X should have the bronze badge and Y should not.</p>
<ul>
<li>If absolute consistency is guaranteed, then Y will be published when the database already knows that X has been published, it will be the 11th and thus will not receive the badge.</li>
<li>If only eventual consistency is guaranteed, then Y might be published before the existence of X has been acknowledged : both articles would receive badges.</li>
</ul>
<p>The alternative is to use a state-based architecture where «On EVENT apply CHANGE» is replaced by «If STATE-A then STATE-B» : instead of «On publishing the tenth article, award badge» the system uses «If this is the tenth article, then it has the bronze badge.» Where an event-based solution would apply the CHANGE and move on, the state-based solution instead examines STATE-A whenever someone asks for STATE-B and applies the rule every single time.</p>
<p>Going back to Bob&#8217;s problem : if you ask a few nanoseconds after both articles are published «Does article Y have the bronze badge?» then the answer will still be «Yes» because eventual consistency takes a short while to set in. But if you ask the same question a few seconds later, then article Y will be correctly known as being the 11th article and the answer will be «No».</p>
<p>An application that is entirely based on state-based rules can work with an eventually consistent database without ever having permanent errors — by definition, any errors would only last as long as the underlying inconsistencies remain. In practice, from my experience with CouchDB, all temporary errors are gone after a couple of seconds in the very worst case, and they&#8217;re usually gone before that.</p>
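<p>To make the difference concrete, here is a minimal Python sketch of Bob&#8217;s scenario. It is an illustration under stated assumptions, not code from any real badge module: the function names, the list-based «database» and the way a stale read is simulated are all hypothetical.</p>

```python
BRONZE_RANK = 10  # the 10th article bears the bronze badge

def on_publish(visible_count, badges, article):
    """Event-based: decide once, from the count visible when the event fires."""
    if visible_count == BRONZE_RANK - 1:
        badges.add(article)

def has_bronze_badge(visible_articles, article):
    """State-based: recompute the answer from the visible state on every call."""
    return visible_articles.index(article) + 1 == BRONZE_RANK

nine = ["article-%d" % i for i in range(1, 10)]

# Event-based: X and Y arrive in quick succession and both handlers read
# a stale count of nine, so both badges are written and never revisited.
badges = set()
on_publish(len(nine), badges, "X")
on_publish(len(nine), badges, "Y")

# State-based: a stale view briefly gives the wrong answer for Y...
stale_view = nine + ["Y"]
# ...but the converged view corrects it without any repair code.
converged_view = nine + ["X", "Y"]
```

<p>The event-based handler leaves both X and Y in <code>badges</code> forever, while <code>has_bronze_badge</code> is only wrong for as long as it is handed the stale view.</p>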
<p>But state-based rules do mean that whenever the application needs to know STATE-B, it must read STATE-A and apply the rule again. Does this mean that I will have to count the articles (a potentially costly operation) whenever I need to know if a given article has the bronze badge? This is pure insanity!</p>
<h3>State-Based Caches</h3>
<p>The NoSQL answer is «Cache it!»</p>
<p>In fact, I will go even further: a NoSQL-friendly architecture eliminates several downsides of caching while keeping all the performance benefits, in ways that no event-based SQL solution can.</p>
<ul>
<li>Staleness of cached data is not an issue: the software is already designed to deal with eventual consistency and a cache is just another kind of eventually consistent data source. Unlike traditional software that relies on absolute consistency, NoSQL-friendly applications can make business decisions based on cached data without any risk.</li>
<li>Dependencies between STATE-A and STATE-B are usually first-class citizens of the application source code, so when a state change happens it&#8217;s easy to follow the threads and invalidate all the dependencies. The application can rely on invalidation instead of timeouts to keep the cache up-to-date.</li>
<li>Most NoSQL solutions already provide some level of caching. For instance, counting the number of published articles in CouchDB is <a href="http://wiki.apache.org/couchdb/Introduction_to_CouchDB_views#Reduced_Value_Sizes" target="_blank">a constant-time cached operation</a>, and the database keeps the cache up-to-date without developer intervention. In fact, manual caching is almost never a requirement for simple rules in CouchDB — and even then, the database provides a &#8220;last changes&#8221; real-time feed that the developer can use to make cache management easier.</li>
</ul>
<p>It is interesting to note that several common patterns in SQL event-based applications are in fact poor implementations of a caching strategy for a state-based rule. An upvote/downvote system such as the one <a href="http://www.reddit.com/" target="_blank">Reddit</a> uses involves storing both the number of votes in the <em>item</em> table, and the individual votes in a <em>user-comment</em> association table — the former is used to quickly determine the current score of an item, while the latter is used to prevent people from voting several times. The state-based query implemented here is :</p>
<p style="padding-left: 30px;"><code>SELECT SUM(score) FROM votes WHERE item_id = ?<br />
</code></p>
<p>However, the naive event-based solution is to intercept &#8220;upvote&#8221; and &#8220;downvote&#8221; events and perform this query instead:</p>
<p style="padding-left: 30px;"><code>UPDATE item SET score = score + 1 WHERE item_id = ?</code></p>
<p>This is done in the hopes that the sequence of +1s and -1s will remain equivalent to the original state-based query, which is only the case if upvotes and downvotes are the only events that affect the votes table. If, say, banning a user account retroactively deletes all the associated votes, it would take another ad hoc query to keep the cache correct. Maybe something like this:</p>
<p style="padding-left: 30px;"><code>UPDATE item JOIN votes ON votes.item_id = item.item_id<br />
SET item.score = item.score - votes.score<br />
WHERE votes.user_id = ?</code></p>
<p>This is because of a fundamental difference between event-based and state-based designs : if your value actually depends on the state, then it takes one state-based piece of code to compute it, but it takes one event-based piece of code<em> for each possible event that could ever affect it</em>.</p>
<p>And even then, you still have to write the state-based update code because you will need to run it to rebuild the cache whenever something goes wrong.</p>
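<p>The drift between the two approaches can be sketched in a few lines of Python. This is a toy model with hypothetical names (<code>on_vote</code>, <code>on_ban</code>, <code>cached_score</code>), where the ban handler deliberately forgets to adjust the cached score:</p>

```python
def state_based_score(votes, item_id):
    """Equivalent of SELECT SUM(score) FROM votes WHERE item_id = ?"""
    return sum(s for (user, item), s in votes.items() if item == item_id)

votes = {}          # (user_id, item_id) -> +1 or -1
cached_score = {}   # item_id -> score, maintained by event handlers

def on_vote(user_id, item_id, value):
    """Event handler: records the vote and adjusts the cached score."""
    votes[(user_id, item_id)] = value
    cached_score[item_id] = cached_score.get(item_id, 0) + value

def on_ban(user_id):
    """Event handler that deletes the votes but forgets cached_score."""
    for key in [k for k in votes if k[0] == user_id]:
        del votes[key]

on_vote("alice", "post-1", +1)
on_vote("bob", "post-1", +1)
on_ban("bob")

# The event-maintained cache has drifted; the state-based rule has not.
drifted = cached_score["post-1"]                # still counts bob's vote
correct = state_based_score(votes, "post-1")    # recomputed from the votes
cached_score["post-1"] = correct                # rebuilding uses the same rule
```

<p>Every new event type needs its own handler to keep <code>cached_score</code> honest, whereas <code>state_based_score</code> is written once and doubles as the rebuild procedure.</p>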
<h3>Typical State-Based Architecture</h3>
<p>There are three kinds of rules in any application :</p>
<ul>
<li>State-based rules : when this value is X, that value is F(X). Most <em>indirect</em> consequences of user input are here.</li>
<li>Event-based input rules : when this event happens in the real world, do X. This could be caused by user input, or when communicating with a third party API.</li>
<li>Event-based output rules : when this happens in the application, perform X in the real world. The classic example is sending an e-mail, but this covers <em>pushing</em> any kind of data to anyone outside your application.</li>
</ul>
<p><strong>State-based rules</strong> can be handled natively.</p>
<p><strong>Input rules</strong> are usually handled by performing an <em>atomic, non-conflicting</em> write to the database whenever the event happens — it should be done in such a way that no conflict can happen after the event has passed. One solution is to simply create a new document with a unique identifier every time an event happens: unique identifiers prevent conflicts, and you can then rely on state-based rules to aggregate a sequence of events into a more coherent current state. In my current project, every notification received from PayPal is appended to a database, and a state-based rule aggregates those notifications into a pending-failed-successful state for every transaction. As an added bonus this solution also provides a history (the list of related events) and the possibility to <em>cancel</em> events by deleting the corresponding document in the same way that one can revert a Wikipedia article to a previous version by removing the corresponding diffs.</p>
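<p>A rough Python sketch of this append-then-aggregate approach follows. The dictionary stands in for the database, and the status names (<code>created</code>, <code>completed</code>, <code>failed</code>) and their precedence are assumptions made for the example, not the actual PayPal notification vocabulary:</p>

```python
import uuid

def record_notification(store, transaction_id, status):
    """Input rule: every event becomes a new document with a fresh identifier."""
    doc_id = uuid.uuid4().hex
    store[doc_id] = {"transaction": transaction_id, "status": status}
    return doc_id

def transaction_state(store, transaction_id):
    """State-based rule: aggregate all notifications for one transaction."""
    statuses = {doc["status"] for doc in store.values()
                if doc["transaction"] == transaction_id}
    if "failed" in statuses:
        return "failed"
    if "completed" in statuses:
        return "successful"
    return "pending"   # nothing conclusive has been received yet

store = {}
record_notification(store, "txn-1", "created")
first = transaction_state(store, "txn-1")

done_id = record_notification(store, "txn-1", "completed")
second = transaction_state(store, "txn-1")

del store[done_id]     # cancelling an event is just deleting its document
third = transaction_state(store, "txn-1")
```

<p>Because no document is ever updated, two notifications arriving at the same time cannot conflict; the state goes from pending to successful and back to pending purely as a function of which documents exist.</p>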
<p>Another solution for handling input rules is useful when the user <em>sets</em> a value — what matters to the user is the resulting value, not the operation that resulted in that value. If setting this value can be done by an <em>atomic, non-conflicting</em> update, then do so. Keep in mind that if you use CouchDB master-master replication, then updates are <em>not</em> non-conflicting !</p>
<p><strong>Output rules</strong> are trickier. If you are lucky, your output rule is in fact tied to an input event such as «When you click this button, I will ask PayPal for your money» and this can in fact be handled as a normal input rule that just happens to query a third party API for more input data.</p>
<p>Application-initiated output events involve creating an entry that represents the outgoing event before it happens, with a timestamp of the moment the event should happen, appropriately set some time into the future. That entry is then managed by standard state-based rules that can alter it or disable it as part of the corresponding source data eventually becoming consistent. The delay should be calculated to ensure that the database does become consistent, and a delay of a few minutes is not a problem because the action was not initiated by the user. Once the delay expires, the application reads back the entry and performs the output action if it is still appropriate.</p>
<p>Back to Bob&#8217;s articles : let&#8217;s say the specifications require that I send Bob a congratulatory e-mail whenever an article gets a badge. Because he cheated, the state-based rule determines mistakenly that Bob&#8217;s articles X and Y both received a bronze badge, so it creates two entries in the «congratulatory e-mail» section, both set one minute into the future.</p>
<p>The trick here is that the identifiers of those entries are something along the lines of &#8220;Bronze-Badge-Y&#8221; so that applying the state-based rule several times merely updates the same entry instead of creating a new one every time. After a few seconds, the eventual consistency catches up with Bob and article Y loses its bronze badge status. The rule-based system detects that the &#8220;Bronze-Badge-Y&#8221; entry needs to be updated and marks it as «do not send».</p>
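<p>The delayed-output mechanism might look like the following Python sketch. The entry layout (<code>send_at</code>, <code>send</code>, <code>sent</code>), the sixty-second delay and the function names are hypothetical; only the «Bronze-Badge-Y» identifier scheme comes from the description above:</p>

```python
def apply_badge_rule(outbox, article, has_badge, now, delay=60):
    """State-based rule: safe to run any number of times for the same article."""
    entry_id = "Bronze-Badge-%s" % article       # deterministic identifier
    entry = outbox.get(entry_id)
    if entry is None:
        if has_badge:
            outbox[entry_id] = {"send_at": now + delay, "send": True, "sent": False}
    elif not entry["sent"]:
        entry["send"] = has_badge                # update in place, never duplicate

def due_emails(outbox, now):
    """Output rule: act only on entries whose delay has expired."""
    due = [entry_id for entry_id, entry in sorted(outbox.items())
           if entry["send"] and not entry["sent"] and entry["send_at"] <= now]
    for entry_id in due:
        outbox[entry_id]["sent"] = True
    return due

outbox = {}
apply_badge_rule(outbox, "X", True, now=0)
apply_badge_rule(outbox, "Y", True, now=0)    # stale data: Y looks like the 10th
apply_badge_rule(outbox, "Y", True, now=1)    # re-applying creates no new entry
apply_badge_rule(outbox, "Y", False, now=5)   # consistency caught up with Bob

first_run = due_emails(outbox, now=60)
second_run = due_emails(outbox, now=61)
```

<p>Only the e-mail for X goes out, and running the sender a second time sends nothing, because the entry records that it has already been handled.</p>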
<h3>User Uncertainty</h3>
<p>Earlier, I skimmed over the fact that users don&#8217;t care about eventual consistency. There&#8217;s one exception to this rule — when you&#8217;re asking users to make a decision based on data you are showing them, you cannot afford to go wrong.</p>
<p>If you ask your user whether they wish to pay $100, and you bill them $101 instead because the price changed in the database while the user was reading the confirmation form, then you have a problem.</p>
<p>This problem, however, is not specific to the NoSQL eventual consistency world. In fact, the average SQL application has the same problem: it&#8217;s impossible to start a transaction, show the user a confirmation form, and only end the transaction when the user confirms. Transactions do not work that way. Instead, both SQL and NoSQL solutions must resort to a conflict detection strategy: when the user confirms, check whether the user&#8217;s decision is still compatible with the application state and if it isn&#8217;t, show them an error message — «Sorry, the price just went up to $101, do you still want to go on?»</p>
<p>It is possible to detect conflict using state-based rules in an eventually consistent database: entry A, created when the user confirmed the payment, states that $100 should be billed, but entry B created a few seconds before entry A states that the price is now $101. The problem is that it might take a short while for entries A and B to be processed together, but we need to show a confirmation page straight away&#8230;</p>
<p>You have two possibilities here. The first is the most obvious one: have the user wait until the eventual consistency kicks in and you can genuinely confirm their purchase; you may optimise your NoSQL usage to make that delay shorter, such as by avoiding master-master replication on that particular database.</p>
<p>The second possibility, for which I have a personal preference, is to provide an answer straight away, but reserve the right to deny that decision later. This means that in 99% of the cases, there is no conflict and the user does not have to wait. In 99% of the remaining cases, the user waited long enough on the confirmation page that the conflict is detected straight away. It really takes a stroke of bad luck for the user&#8217;s decision to happen precisely as the situation changes, so having to cancel in those specific cases is acceptable, especially since your state-based architecture can handle the cancellation quite well.</p>
<p>This is no different than having to cancel an e-commerce order because the ordered item was lost at the warehouse — the computer said yes, but reality said no.</p>
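<p>As a small Python sketch of this «answer now, reserve the right to cancel» strategy (the function names and the order layout are assumptions made for illustration):</p>

```python
def confirm(agreed_price, visible_price):
    """Answers immediately from whatever price is visible right now."""
    status = "accepted" if visible_price == agreed_price else "conflict"
    return {"agreed_price": agreed_price, "status": status}

def review(order, converged_price):
    """State-based rule re-applied once the database has converged."""
    if order["status"] == "accepted" and converged_price != order["agreed_price"]:
        order["status"] = "cancelled"
    return order

# Usual case: nothing changed, the user never waits.
usual = review(confirm(100, visible_price=100), converged_price=100)

# The price change is already visible at confirmation time: show the error.
caught_early = confirm(100, visible_price=101)

# Stroke of bad luck: the change only becomes visible after we said yes.
unlucky = review(confirm(100, visible_price=100), converged_price=101)
```

<p>Only the last case ends in a cancellation, and it is handled by the same kind of state-based rule as everything else.</p>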
<h3>TL ; DR</h3>
<ol>
<li>An UPDATE is <em>permanently</em> inconsistent if it was based on <em>temporarily</em> inconsistent data.</li>
<li>The result of a CREATE is never <em>permanently</em> inconsistent.<br />
So, don&#8217;t UPDATE objects, CREATE object <em>modifications</em>.</li>
<li>To get the latest version of an object, apply a map-reduce algorithm to the modifications.</li>
<li>You should cache data, but the cache must be recalculated whenever the underlying data changes.</li>
<li>Some UPDATEs are in fact hidden cache refreshes. Use a normal cache instead.</li>
<li>When affecting the outside world, wait for the eventual consistency to kick in before you act.</li>
<li>Conflicts can affect users, but only rarely. Plan your UI accordingly.</li>
</ol>
<p><small>Article Image &copy; Chris Dlugosz &mdash; <a href="http://www.flickr.com/photos/chrisdlugosz/4324706280/">Flickr</a></small></p>
]]></content:encoded>
			<wfw:commentRss>http://www.nicollet.net/2011/08/coping-with-inconsistent-databases/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>NoSQL Is A Premature Optimization</title>
		<link>http://www.nicollet.net/2011/07/nosql-is-a-premature-optimization/</link>
		<comments>http://www.nicollet.net/2011/07/nosql-is-a-premature-optimization/#comments</comments>
		<pubDate>Sat, 23 Jul 2011 12:32:15 +0000</pubDate>
		<dc:creator>Victor Nicollet</dc:creator>
				<category><![CDATA[Dynamic]]></category>
		<category><![CDATA[Architecture]]></category>
		<category><![CDATA[CouchDB]]></category>
		<category><![CDATA[NoSQL]]></category>
		<category><![CDATA[Productivity]]></category>

		<guid isPermaLink="false">http://www.nicollet.net/?p=2441</guid>
		<description><![CDATA[Or so Bob Warfield writes. I happen to agree with the title — optimization using NoSQL means using a server cluster to split the load and scale up, and such an optimization is premature unless you are already having the millions of visits it takes to feel growing pains. If I start off on a [...]]]></description>
			<content:encoded><![CDATA[<p><img class="aligncenter size-full wp-image-2442" title="bulb" src="http://www.nicollet.net/wp-content/uploads/2011/07/bulb.png" alt="" width="675" height="100" /></p>
<p><a href="http://smoothspan.wordpress.com/2011/07/22/nosql-is-a-premature-optimization/" target="_blank">Or so Bob Warfield writes</a>. I happen to agree with the title — optimization using NoSQL means using a server cluster to split the load and scale up, and such an optimization is premature unless you are already having the millions of visits it takes to feel growing pains. If I start off on a new project and decide «<em>I&#8217;m going to use NoSQL so that it will scale when my project will have millions of users</em>» then I am prematurely assuming that my initial NoSQL strategy will fit the actual million-user scenario that will come up years from now. In fact, the bottleneck will probably be in a feature I didn&#8217;t even think of yet, and making it work will probably involve changes in the persistence model. But Bob Warfield goes further than the premature optimization argument:</p>
<blockquote><p><strong>Point 2:  There is no particular advantage to NoSQL until you  reach scales that require it.  In fact it is the opposite, given Point  1.</strong></p>
<p>It’s harder to use.  You wind up having to do more in your  application layer to make up for what Relational does that NoSQL can’t  that you may rely on.  Take consistency, for example.  As Anand says in  his video, “Non-relational systems are not consistent.  Some, like  Cassandra, will heal the data.  Some will not.  If yours doesn’t, you  will spend a lot of time writing consistency checkers to deal with it.”   This is just one of many issues involved with being productive with  NoSQL.</p></blockquote>
<p>My current SaaS project pivoted from MySQL to CouchDB nearly at the beginning, certainly before we had any customers or any features worth showing. My greatest fear when settling on CouchDB was that I would have to <em>work around</em> the NoSQL lack of transactions, joins, consistency or whatever else you expect from a database system.</p>
<p>I was sorely mistaken, and so is Bob Warfield.</p>
<p>Even though NoSQL fails to solve many of the <em>low-level</em> problems that SQL eats for breakfast, this does not make it incapable of solving the same <em>high-level</em> problems as traditional relational strategies. You just need to understand how to do it, in the same way that you had to understand relational algebra, joins, indexes and transactions before doing anything worthwhile with SQL. Coming to NoSQL and expecting to solve your problems with those same strategies that you used in the relational world is as silly as using a hammer to drive screws in.</p>
<p>For instance, CouchDB has no global consistency, only eventual consistency — there is an <em>inconsistency window</em> where state spread across multiple documents can be inconsistent. This will make any relational programmer scream bloody murder. And yes, if you absolutely and positively need to have that state stay consistent, then you will need some application-side code to do it, and it will ruin your productivity.</p>
<p>But most applications don&#8217;t <em>need</em> global consistency: in fact, an inconsistency window of a few seconds is acceptable in most situations. It is the programmers who need global consistency, because they do not have the mental tools required to work with eventual consistency. But once you get the hang of it, there is no working around, no overhead, no additional steps or checks required to make your application work. It is a different route, but not a longer one.</p>
<p>In addition to the above, from my experience, <strong>there are clear and significant benefits to using CouchDB over MySQL that are not related to scalability or performance</strong>. These benefits may well be useless to your specific situation, but they do exist.</p>
<h3>1. Schema changes are painless and non-locking</h3>
<p><a href="http://highscalability.com/blog/2010/5/17/7-lessons-learned-while-building-reddit-to-270-million-page.html" target="_blank">This (lesson 3)</a> is what brought me to NoSQL in the first place.</p>
<p>CouchDB does not implement a schema in the way an SQL product rigidly delineates tables, columns and relationships. Of course, it would be foolish to actually have no schema concept at all, so there is a dedicated schema layer in our application architecture that describes what the CouchDB &#8220;tables&#8221; look like, in terms of serialization and deserialization. Schema changes are therefore a simple change to the deserialization process, which needs to be able to read the old data format.</p>
<p>For simple changes, such as adding a field with a constant value, no work is required as the deserialization layer can fill in the missing field on the fly. For complex changes that involve application-provided data, such as adding a &#8220;file size&#8221; field that needs to be initialized with the actual file size, there is a clear benefit to having the application itself perform the schema change, as opposed to application-independent ALTER scripts.</p>
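<p>Here is what such a deserialization step could look like, sketched in Python rather than in our actual schema layer. The field names (<code>visibility</code>, <code>size</code>, <code>path</code>) and the <code>measure_size</code> callback are hypothetical:</p>

```python
def read_file_document(raw, measure_size):
    """Deserializes a stored document, upgrading old formats on the fly."""
    doc = dict(raw)                          # leave the stored document untouched
    doc.setdefault("visibility", "private")  # simple change: constant default
    if "size" not in doc:                    # complex change: needs application data
        doc["size"] = measure_size(doc["path"])
    return doc

old = {"path": "/uploads/report.pdf"}        # written before both fields existed
new = {"path": "/uploads/logo.png", "visibility": "public", "size": 2048}
sizes = {"/uploads/report.pdf": 51200}       # stands in for the real file store

upgraded = read_file_document(old, sizes.__getitem__)
unchanged = read_file_document(new, sizes.__getitem__)
```

<p>Old and new documents coexist in the database, and nothing is locked or rewritten until the application decides to save the upgraded version.</p>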
<h3>2. Document contents can be dynamic</h3>
<p>This was the actual reason we settled on CouchDB: our application lets users add their own custom fields to objects, and then filter/sort based on these fields. This requires almost no programming effort (aside, of course, from the user interface involved in doing so) and is nearly as efficient as using static programmer-provided fields.</p>
<p>I have had in the past some experience with managing arbitrary fields on a SQL platform, mostly when I was working with open source e-commerce platform Magento. Dynamic fields involve some significant boilerplate (such as entity-attribute-value tables) and clever tricks to perform filtering efficiently.</p>
<h3>3. The application-database impedance is lower</h3>
<p>A typical SQL schema contains two kinds of relationships: natural relationships such as «<em>an article has an author</em>» between two entities that can and will usually be queried independently, and accidental relationships such as «<em>an article has several tags</em>» that are only present because SQL cannot store the tags in the post table. As such, extracting a post from an SQL database counter-intuitively requires one query to grab the post itself, and another query to grab its tags.</p>
<p>CouchDB does away with accidental relationships completely by storing JSON documents. While this might allow a performance gain in some cases, the main benefit is that object <em>composition</em> as described by the programmer in the application code is persisted intuitively, without jumping through the intellectual hoops typical in relational storage.</p>
<h3>4. An identifier-centric application architecture is possible</h3>
<p>What does it mean to be identifier-centric or object-centric? A function to get the full URL of an article, in an <em>object-centric</em> application, is a function that takes an article object as an argument (or possibly a member function of the article object) and returns the article&#8217;s full URL. In an identifier-centric application, it would be a function that takes an article identifier as an argument (or possibly a member function of the article identifier class) and returns the full URL.</p>
<p>Identifier-centric architectures have major design benefits over object-centric ones, with clear consequences in terms of productivity and correctness, but have a major performance problem as the <em>same</em> data is read from the database several times unless some very complex caching strategies are applied — that data might be read using a quite complex SQL query that is hard to keep in cache correctly.</p>
<p>From my experience, the vast majority of queries in a CouchDB application will either query a document by its identifier, or query a view for several key-identifier-document pairs. In short, most of the data manipulated by the application can be easily traced back to an identifier without any specific design effort. And get-document-by-id requests are far easier to cache and optimize than arbitrary SELECT requests, both at the application level (we have a temporary cache that lasts the lifetime of the HTTP request) and with key-value caches like Memcache.</p>
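<p>A request-scoped cache of this kind fits in a few lines. The Python sketch below is only an illustration: <code>RequestCache</code>, the <code>fetch</code> callback and the document fields are made-up names, and a real application would fetch from CouchDB instead of a dictionary.</p>

```python
class RequestCache:
    """Get-document-by-id cache meant to live for a single HTTP request."""

    def __init__(self, fetch):
        self._fetch = fetch   # function: document id -> document
        self._docs = {}

    def get(self, doc_id):
        if doc_id not in self._docs:
            self._docs[doc_id] = self._fetch(doc_id)
        return self._docs[doc_id]

# Identifier-centric functions: they take an id, never a loaded object.
def article_url(cache, article_id):
    return "/articles/" + cache.get(article_id)["slug"]

def article_title(cache, article_id):
    return cache.get(article_id)["title"]

database = {"article-42": {"slug": "agile-code-in-ocaml",
                           "title": "Agile Code in OCaml"}}
reads = []                    # records every actual database read

def fetch(doc_id):
    reads.append(doc_id)
    return database[doc_id]

cache = RequestCache(fetch)
url = article_url(cache, "article-42")
title = article_title(cache, "article-42")
```

<p>Both functions ask for the same identifier, but the database is only read once, which is what makes passing identifiers around affordable.</p>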
<p>This may sound like a performance argument, but it isn&#8217;t, or at least not in the traditional «<em>NoSQL is faster than SQL</em>» sense. It just means that using NoSQL makes an identifier-centric architecture <em>acceptable</em> in terms of performance.</p>
<p><small>Article image © Satoru Kikuchi — <a href="http://www.flickr.com/photos/satoru_kikuchi/4461605065/">Flickr</a></small></p>
]]></content:encoded>
			<wfw:commentRss>http://www.nicollet.net/2011/07/nosql-is-a-premature-optimization/feed/</wfw:commentRss>
		<slash:comments>6</slash:comments>
		</item>
		<item>
		<title>Agile Code in OCaml</title>
		<link>http://www.nicollet.net/2011/07/agile-code-in-ocaml/</link>
		<comments>http://www.nicollet.net/2011/07/agile-code-in-ocaml/#comments</comments>
		<pubDate>Thu, 07 Jul 2011 08:40:22 +0000</pubDate>
		<dc:creator>Victor Nicollet</dc:creator>
				<category><![CDATA[Functional]]></category>
		<category><![CDATA[Architecture]]></category>
		<category><![CDATA[Objective Caml]]></category>
		<category><![CDATA[Productivity]]></category>
		<category><![CDATA[RunOrg]]></category>
		<category><![CDATA[Start-Up]]></category>

		<guid isPermaLink="false">http://www.nicollet.net/?p=2432</guid>
		<description><![CDATA[First, a quick bit of background: I&#8217;m working on RunOrg [fr], a start-up that provides communities with their own online private social networks à la facebook. The technology stack is Linux-Apache-CouchDB-OCaml, and this has some implications that I will discuss below. Facebook has it easy in terms of user management: an user starts existing on [...]]]></description>
			<content:encoded><![CDATA[<p><img class="aligncenter size-full wp-image-2433" title="sparta" src="http://www.nicollet.net/wp-content/uploads/2011/07/sparta.png" alt="" width="675" height="100" /></p>
<p>First, a quick bit of background: I&#8217;m working on <a href="http://runorg.com" target="_blank">RunOrg</a> [fr], a start-up that provides communities with their own online private social networks <em>à la facebook</em>. The technology stack is Linux-Apache-CouchDB-OCaml, and this has some implications that I will discuss below.</p>
<p>Facebook has it easy in terms of user management: a user starts existing on their platform the instant they sign up, at which point they fill in their first name and last name, and these are displayed to anyone who is allowed to see any hint of the user&#8217;s existence. So, making the first name and last name mandatory is quite acceptable.</p>
<p>At RunOrg, we cannot do this for several reasons:</p>
<ul>
<li>User profiles may be created by communities as part of the membership management toolbox: we have to rely on user A to provide data about user B, and user A usually relies on an email-only source (such as newsletter or mailing-list registrations) where no first name or last name is available.</li>
<li>A given user may be part of several independent communities, and may choose to manage their identity separately for each one: appear as John Doe in an innocuous community they trust and John Censored in a more critical community.</li>
<li>We also allow users to keep control over whether a community is allowed to publish their name on the internet (as part of the online directory, or as comments on public articles).</li>
</ul>
<p>Our needs for advanced privacy controls involve a more complex management of both what data is available and how we display it. The good news is, it&#8217;s certainly possible to handle all of this elegantly in terms of implementation. The bad news — <strong>we didn&#8217;t plan everything ahead</strong>.</p>
<p>It was only a few days ago that our customer requirements sessions brought up the issue of email-only sources: community managers were frustrated by the fact that our mass import functionality <em>required</em> first names and last names. The problem is, in almost every single programming language out there, making a required field become optional is a very dangerous endeavor because the development team must audit the entire code base to identify which parts of the code assume that the field is required, and describe what should happen when the field is <em>null</em>.</p>
<p>In your average PHP project, try making the user name optional and I can assure you that sentences like «You have been invited by  to this event» will appear. Someone failed to audit the who-invited-you-to-events code. At least with Java or C# you will get a Null Reference Exception of some kind that will show up in the logs and give you the opportunity to hunt down the mistake.</p>
<p>The good news is that our implementation language, OCaml, does not allow <em>null</em> values. Instead, optional values are handled using a different value <em>type</em>, known as <code>'a option</code>, which changes everything. An optional value simply cannot be accessed in the same way as a non-optional value. Trying to do so anyway will cause a type error that is picked up by the compiler, so a programmer can rely on these errors to quickly identify all locations in the code that assume the value to be present.</p>
<p>I&#8217;ll say it again: in OCaml, a field being optional or mandatory is an assumption that is built into the <em>type</em> of that field, so changing the assumption involves changing the type and breaks all code that does not match the new assumption. Applying breaking changes to an OCaml code base is usually as simple as following a trail of compiler errors.</p>
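<p>A minimal sketch of what this looks like in practice. The <code>user</code> record and the <code>greeting</code> function are hypothetical names chosen for illustration, not taken from the RunOrg code base:</p>

```ocaml
(* Hypothetical user record, for illustration only. *)
type user = {
  email : string;
  firstname : string option;  (* was a plain string while the field was mandatory *)
}

(* Code written for the mandatory field, such as
     let greeting u = "Hello, " ^ u.firstname
   is now rejected by the compiler, because u.firstname is a
   string option and not a string. Both cases must be handled. *)
let greeting u =
  match u.firstname with
  | Some name -> "Hello, " ^ name
  | None -> "Hello!"
```

<p>Changing the field type from <code>string</code> to <code>string option</code> is what turns every forgotten use of the field into a compile-time error.</p>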
<p>So, that&#8217;s what we did. We already knew that the behavior we wanted was to construct the &#8220;display name&#8221; of users like this:</p>
<ul>
<li>If either the firstname or the lastname is present, use it (if both are present, use firstname-whitespace-lastname).</li>
<li>If neither is present, then the private display name (visible only to the user themselves on their profile page and in the e-mails they receive) should be their e-mail address and the public display name (visible to everyone else) should be the username part of their e-mail address (so <strong>john.doe@gmail.com</strong> is shown as <strong>john.doe@&#8230;</strong>).</li>
</ul>
<p>First, we defined two functions that compute the private and public display names based on the first name, last name and e-mail of the user. Then, the compiler error trail led us to all locations where a change was required, where we quickly identified whether the public or private name was to be displayed and replaced the existing code with our new display name functions. In total, a full audit of 40kLOC was done in less than an hour and I have <em>proof</em> that any code that uses the user name now handles the case where the user name is not provided.</p>
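<p>The two display name functions described above could look roughly like the sketch below. The function names, the labeled arguments and the helper <code>full_name</code> are assumptions made for illustration, and three ASCII dots stand in for the ellipsis character:</p>

```ocaml
(* Name built from whichever of first/last name is present, if any. *)
let full_name firstname lastname =
  match firstname, lastname with
  | Some f, Some l -> Some (f ^ " " ^ l)
  | Some n, None | None, Some n -> Some n
  | None, None -> None

(* Shown only to the user themselves: fall back to the full e-mail. *)
let private_display_name ~firstname ~lastname ~email =
  match full_name firstname lastname with
  | Some name -> name
  | None -> email

(* Shown to everyone else: fall back to the e-mail with its domain hidden. *)
let public_display_name ~firstname ~lastname ~email =
  match full_name firstname lastname with
  | Some name -> name
  | None ->
    match String.index_opt email '@' with
    | Some i -> String.sub email 0 i ^ "@..."
    | None -> email
```

<p>Because the first name and last name arrive as <code>string option</code> values, neither function can be written without deciding what happens when they are missing.</p>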
<h4>The Rules</h4>
<p>When working on any OCaml project, and especially on RunOrg, I follow these few rules:</p>
<ul>
<li><strong>Any assumption must cause a compiler error when broken</strong>. Either the code determines <em>on the spot</em> that the assumption is true, or I use the type system to <em>prove that another part</em> of the code already did. This rule took a massive toll on my early productivity, and I attributed it to an inherent cost of making compiler-enforced assumptions, but the real reason was that I was still pretty new at it: the elementary assumption enforcement from my smaller projects was too crude for the rich functionality of RunOrg, and it took me six months to refactor my early approach into an elegant and streamlined strategy of encoding assumptions into types.</li>
<li><strong>Don&#8217;t work around the compiler or cheat with semantics</strong>. The initial reaction to a system that complains about every little change you make is to try and work around it by using more generic types or storing information where it does not belong. For instance, an easy solution to the optional name conundrum would have been to store <strong>john.doe@&#8230;</strong> as the name, but doing so would have been semantically incorrect (that&#8217;s a placeholder, not a name) and would have polluted the database &#8220;name&#8221; field with things that are not names and that <em>will be treated differently from names at some point in the future</em>.</li>
<li><strong>Don&#8217;t accept mediocre code or patterns</strong>. Sometimes, design choices in the interface of module A will lead to ugly code in modules B, C and D because an unforeseen usage pattern happens to apply 95% of the time and the interface of module A was not designed with that usage pattern in mind. No amount of cleanup or refactoring in modules B, C and D will solve the problem: the only solution is to go back to the design of module A and <em>change the interface</em> even if it means that two hundred client modules will break. Keeping my code clean, elegant and short is worth wading through two hundred modules.</li>
<li><strong>Perform lazy payments on your technical debt</strong>. I can propagate new design changes through my entire code base in one coding session, but this doesn&#8217;t mean I should. Instead, I keep a mental todo-list of all the changes that need to be applied, and apply all of them at once, locally, whenever I have to rework a given piece of code for any reason. While it may seem that such a todo-list is hard to keep and I will inevitably forget parts of it, remember that those design changes came around in order to solve the problem of ugly or mediocre code: by noticing that the code is ugly, I am reminded of the strategies that I set up in order to clean it up.</li>
<li><strong>But be eager with small payments</strong>. If it&#8217;s a matter of moving a few functions around or refactoring a small piece of code, I do it as soon as I am done writing or rewriting it. Cleaning up little odd bits in a mostly clean code base is extremely rewarding.</li>
<li><strong>Discover code by trying changes out</strong>. If the assumptions are correctly laid out, then the easiest way to determine the implications of a change (whether it will work and how long it will take) is simply to try it out. Following the compiler error trail will quickly reveal how many things are impacted by the change, as well as any unforeseen massive consequences. If it turns out that the change is too far-reaching, I just roll back my edits.</li>
<li><strong>Keep interface patterns to a minimum</strong>. The basic idea behind having few different interfaces implemented by many parts of the system is usually expected to be «code is easier to reuse» but I disagree. Yes, that is a frequent benefit, but certainly not the most essential. Having few different interfaces means that most of my code can be described using a small <em>vocabulary</em> of interface patterns, and that looking at some code immediately reveals the pattern being used there. It also means that any design changes can be expressed in terms of pattern changes, and can be applied almost blindly to all locations where that pattern was used. Last but not least, by using a simple shared vocabulary for large sections of the application, I make it easier to recognize patterns in the more chaotic sections based on how they interact with the cleaner code. It&#8217;s easier to determine that two sentences have the same meaning if they share some words.</li>
<li><strong>Love your code</strong>. In the RunOrg code base, priority 3 is making sure the code is well-designed, clean and free of technical debt, priority 2 is adding new features, priority 1 is making sure there are no bugs, and the drop-everything-you-do-and-work-on-this priority zero is that <em>I should never hate working on the software</em>. Motivation is paramount to keeping the code clean, feature-rich and bug-free, and even to working on the start-up in the first place, so anything that might make me question my dedication to the project or cause me pain while working on it <em>must</em> and <em>will</em> be corrected as soon as possible, regardless of other priorities.</li>
</ul>
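<p>As an illustration of the first rule, one common way to have the type system prove that another part of the code already checked an assumption is an abstract type with a validating constructor. The <code>Email</code> module and <code>send_invitation</code> below are hypothetical names, and the validation is deliberately naive; this is a sketch of the idea, not RunOrg code:</p>

```ocaml
(* Hypothetical module: a value of type Email.t can only be obtained
   through Email.of_string, so holding one proves it was validated. *)
module Email : sig
  type t
  val of_string : string -> t option
  val to_string : t -> string
end = struct
  type t = string
  (* Deliberately naive check, for illustration: require an at-sign. *)
  let of_string s = if String.contains s '@' then Some s else None
  let to_string s = s
end

(* No re-validation here: the argument type guarantees it already happened.
   Passing a raw string instead of an Email.t is a compile-time error. *)
let send_invitation (address : Email.t) =
  "Sending invitation to " ^ Email.to_string address
```

<p>The signature hides the fact that <code>Email.t</code> is a string, so the only way for client code to build one is to go through the check.</p>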
<p>I&#8217;m pretty certain that all of the rules are important, but I do believe the last one is an absolute prerequisite.</p>
<p><small>Article image © Ergonomik — <a href="http://www.flickr.com/photos/psyarch/3841401884/">Flickr</a></small></p>
]]></content:encoded>
			<wfw:commentRss>http://www.nicollet.net/2011/07/agile-code-in-ocaml/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
	</channel>
</rss>

