<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Nicollet.Net &#187; Functional</title>
	<atom:link href="http://www.nicollet.net/chiasma/functional/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.nicollet.net</link>
	<description>Everyone Loves Me</description>
	<lastBuildDate>Mon, 23 Jan 2012 16:55:59 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=</generator>
		<item>
		<title>OCaml Submodule Pattern</title>
		<link>http://www.nicollet.net/2012/01/ocaml-submodule-pattern/</link>
		<comments>http://www.nicollet.net/2012/01/ocaml-submodule-pattern/#comments</comments>
		<pubDate>Mon, 23 Jan 2012 16:55:59 +0000</pubDate>
		<dc:creator>Victor Nicollet</dc:creator>
				<category><![CDATA[Functional]]></category>
		<category><![CDATA[Architecture]]></category>
		<category><![CDATA[Design Patterns]]></category>
		<category><![CDATA[Objective Caml]]></category>

		<guid isPermaLink="false">http://www.nicollet.net/?p=2660</guid>
		<description><![CDATA[My code is quite large for an OCaml project. The main RunOrg repository alone contains 46212 lines of OCaml code (plus an additional 5631 lines of OCaml mli files), and then there&#8217;s the web framework code and the independent plugins code. It is Better™ to have many short files than a few long ones. [...]]]></description>
			<content:encoded><![CDATA[<p>My code is quite large for an OCaml project. The main RunOrg repository alone contains 46212 lines of OCaml code (plus an additional 5631 lines of OCaml mli files) — and then, there&#8217;s the web framework code and the independent plugins code.</p>
<p>It is Better™ to have many short files than a few long ones. One reason is incremental compilation with <em>ocamlbuild</em> : the smaller your files are, the smaller the percentage of code that must be recompiled when you make a small change. Another reason is that files provide a natural delineation of code that makes it slightly easier to reason about.</p>
<p>The very process of splitting a large file into smaller files is also an excellent way to clean up the code. Every split is an opportunity to move some code to a more generic location — why have a <code>CMember_importParser</code> module when all of its functionality could fit into an <code>OzCsv</code> plugin module ? Even when no such generic solution exists, cutting through the jungle that a 2000-line module contains helps clean up dependencies, identify shared functionality and imagine better ways to design code.</p>
<p>Still, when cutting up code this way, the problem of encapsulation remains. If code that relates to pictures (an upload module, a transform module, a download module, an access rights module) is split across several files, it is desirable to let each file access functions and values from the other files, even though these would not otherwise be exposed to modules unrelated to picture processing. For instance, a <code>get_download_link</code> function should be available throughout all picture-related modules, but the rest of the application should use the <code>get_download_link_for_user</code> function that checks whether the user is allowed to download the file.</p>
<p>In order to achieve several nested levels of encapsulation required to work with modules this way, I have come up with a convention :</p>
<ul>
<li>A module name (and thus, a file name) is composed of segments written in camelCase and separated by underscores. For instance, <code>CEntity_view_grid</code> is a module name containing segments <code>CEntity</code>, <code>view</code> and <code>grid</code>.</li>
<li>Modules with only one segment are public. Any other module may include, open or otherwise reference them with no limitations beyond what the module signature says. So, <code>CEntity</code> may access <code>MGroup</code> freely.</li>
<li>Modules with N &gt; 1 segments are private. They may only be accessed by modules which share the first N-1 segments. So, <code>CEntity_view</code> is available to modules <code>CEntity</code> and <code>CEntity_edit</code> but not <code>CPicture</code>.</li>
<li>A module with N segments may export any module with N+1 segments it can access, possibly under a more restrictive signature. For instance, <code>CEntity</code> may export <code>CEntity_view</code>, which then becomes available to all other modules as <code>CEntity.View</code>.</li>
</ul>
<p>To make these rules easier to respect, private module dependencies are made explicit by adding a list of module aliases at the top of each file. The top of my <code>cEntity_view.ml</code> file starts with :</p>
<pre style="padding-left: 30px;"><code>module Sidebar     = CEntity_sidebar
module Unavailable = CEntity_unavailable
module Edit        = CEntity_edit
module Info        = CEntity_view_info
module Directory   = CEntity_view_directory
module Grid        = CEntity_view_grid
module Wall        = CEntity_view_wall
</code></pre>
<p>It is forbidden to use a private module without going through such an alias, and it is forbidden to define such an alias anywhere except at the top of the file. This makes it extremely easy to determine whether private access rules are respected.</p>
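<p>As a rough illustration of the export rule and the alias convention, here is a single-file sketch in which nested modules stand in for the files <code>cEntity_view.ml</code> and <code>cEntity.ml</code>. The functions <code>title_of</code> and <code>render</code> and the signature are invented for the example and are not taken from the RunOrg code.</p>

```ocaml
(* Stand-in for the private file cEntity_view.ml (hypothetical contents). *)
module CEntity_view = struct
  let title_of id = "entity #" ^ string_of_int id   (* internal helper *)
  let render id = "[" ^ title_of id ^ "]"
end

(* Stand-in for the public file cEntity.ml. *)
module CEntity : sig
  module View : sig
    val render : int -> string   (* title_of is hidden by this signature *)
  end
end = struct
  module View = CEntity_view     (* the alias, declared at the top of the file *)
end
```

<p>Outside modules would then call <code>CEntity.View.render</code>, and never mention <code>CEntity_view</code> directly.</p>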
<p>The rule of thumb for splitting files (in my particular coding style) is :</p>
<ul>
<li>Code for separate layers (model, view, controller&#8230;) goes into separate public modules.</li>
<li>For complex code (such as complex rules in model or controller code), consider splitting files larger than 200 lines.</li>
<li>For simple code (such as HTML template or JSON serialization definitions), there is no splitting limit except for factoring out common behavior.</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://www.nicollet.net/2012/01/ocaml-submodule-pattern/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Two-way bindings</title>
		<link>http://www.nicollet.net/2011/12/two-way-bindings/</link>
		<comments>http://www.nicollet.net/2011/12/two-way-bindings/#comments</comments>
		<pubDate>Thu, 29 Dec 2011 18:42:05 +0000</pubDate>
		<dc:creator>Victor Nicollet</dc:creator>
				<category><![CDATA[Functional]]></category>
		<category><![CDATA[Architecture]]></category>
		<category><![CDATA[Objective Caml]]></category>

		<guid isPermaLink="false">http://www.nicollet.net/?p=2640</guid>
		<description><![CDATA[Quick : how would you design a function that opens a file handle that auto-closes whenever execution leaves a certain scope, even if an exception happens ? The C++ solution is quite straightforward : have a destructor that closes the file handle, and create the handle as an auto variable: std::ofstream out = std::ofstream("out.txt"); out [...]]]></description>
			<content:encoded><![CDATA[<p>Quick : how would you design a function that opens a file handle that auto-closes whenever execution leaves a certain scope, even if an exception happens ?</p>
<p>The C++ solution is quite straightforward : have a <em>destructor</em> that closes the file handle, and create the handle as an auto variable:</p>
<pre style="padding-left: 30px;">std::ofstream out("out.txt");
out &lt;&lt; whatever();
// the destructor closes the file handle when out leaves scope, even on exception</pre>
<p>On the other hand, OCaml does not have destructors, but can simulate the RAII spirit <a href="http://thelema.github.com/AAA-batteries/hdoc/BatFile.html#VALwith_file_in" target="_blank">using a closure</a>: you provide a function that will be called with the file handle as its argument, and the file handle will be destroyed when the function returns or raises an exception.</p>
<pre style="padding-left: 30px;">BatFile.with_file_out "out.txt" begin fun out -&gt;
  BatIO.nwrite out (whatever ())
  (* the file handle is always closed by with_file_out *)
end</pre>
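<p>For readers without Batteries, a function in the same spirit can be sketched with the standard library alone. This is a minimal stand-in written for this example, not the actual <code>BatFile</code> implementation.</p>

```ocaml
(* Open the file, hand the channel to f, and close it on every exit path. *)
let with_file_out path f =
  let channel = open_out path in
  let result =
    try f channel
    with error ->
      close_out_noerr channel;   (* close, then let the exception escape *)
      raise error
  in
  close_out channel;
  result
```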
<p>This is a special case of a more general principle.</p>
<h4>Two-way binding</h4>
<p>The standard <code>let</code> keyword performs a one-way binding: bind value to variable <em>then</em> evaluate expression. Two-way binding adds a post-processing step : when you&#8217;re done with the expression, do something else. Such a behavior has important consequences for writing concise and readable code.</p>
<p>In my OCaml code, two-way binding is performed with the keyword <code>let!</code>, which is preprocessed as follows :</p>
<pre style="padding-left: 30px;">let! pattern = value in expression
(* Is translated to *)
value (fun pattern -&gt; expression)</pre>
<p>For instance, the above file manipulation script would be written as:</p>
<pre style="padding-left: 30px;">let! out = BatFile.with_file_out "out.txt" in
BatIO.nwrite out (whatever ())</pre>
<p>This syntax expresses the actual intent of the code better than the anonymous callback syntax did: <em>bind the file handle to this variable, but don&#8217;t forget the post-processing steps</em>.</p>
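<p>The same translation can be approximated without a preprocessor, assuming OCaml 4.08 or later, where user-defined binding operators such as <code>let*</code> are available. The sketch below uses <code>let*</code> as a stand-in for <code>let!</code>, and a hand-written <code>with_file_out</code> in place of the Batteries one; <code>write_greeting</code> is a made-up example function.</p>

```ocaml
(* let* pattern = value in expression   becomes   value (fun pattern -> expression) *)
let ( let* ) value body = value body

(* Standard-library stand-in for BatFile.with_file_out. *)
let with_file_out path f =
  let channel = open_out path in
  let result =
    try f channel
    with error -> close_out_noerr channel; raise error
  in
  close_out channel;
  result

let write_greeting path =
  let* out = with_file_out path in
  output_string out "hello"
```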
<p>Here are a few more examples of situations that may be improved by this syntax :</p>
<h4>Events and reactive programming</h4>
<p>Reactive programs can be constructed either using the typical &#8220;<em>register this function to be called whenever this value changes or this event happens</em>&#8221; semantics, or by using binding semantics instead:</p>
<pre style="padding-left: 30px;">let () =
  let! user = User.on_change (#last_login) in
  if user # notify_login then
    Mail.send (user # email)
      ("Someone has logged in to your account at " ^ datetime (user # last_login))</pre>
<p>The underlying signature of <code>User.on_change</code> (which registers a listener callback and returns unit) remains the same.</p>
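<p>To make the point about the signature concrete, here is a small self-contained sketch. The event source (<code>on_login</code>, <code>fire_login</code>) is hypothetical, and <code>let*</code> (OCaml 4.08 or later) again stands in for <code>let!</code>. The listener registration function is an ordinary callback-taking function that returns unit.</p>

```ocaml
let ( let* ) value body = value body

(* Hypothetical event source: listeners are called with the user name. *)
let listeners : (string -> unit) list ref = ref []
let on_login callback = listeners := callback :: !listeners
let fire_login user = List.iter (fun listener -> listener user) !listeners

let notifications : string list ref = ref []

(* Registers the rest of the block as the listener; nothing runs yet. *)
let () =
  let* user = on_login in
  notifications := ("login by " ^ user) :: !notifications
```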
<h4>Retry semantics</h4>
<p>CouchDB implements transactions with retry semantics: you read a document, compute some changes and try saving them back, and if the document was changed by someone else in the meantime, you will have to try again. It makes sense for the code inside the transaction to be 1° idempotent and 2° wrapped away in a function that 3° takes the latest version of the document as an argument :</p>
<pre style="padding-left: 30px;">let set_title article_id new_title =
  let! article = Database.transaction article_id in
  Database.write article_id { article with title = new_title }</pre>
<p>In such a design, the write function would throw a specific exception if a collision occurs, and the transaction function would intercept that exception and try again until the transaction succeeded or a maximum number of retries happened.</p>
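<p>A minimal sketch of such a design follows, under loud assumptions : the &#8220;database&#8221; is a single in-memory document with a revision number, <code>Collision</code>, <code>write</code> and <code>transaction</code> are invented for the example, and <code>let*</code> (OCaml 4.08 or later) stands in for <code>let!</code>. It is not how a CouchDB driver is actually written.</p>

```ocaml
exception Collision

let ( let* ) value body = value body

(* Hypothetical in-memory "database" holding one revisioned document. *)
type article = { rev : int; title : string }
let store = ref { rev = 0; title = "draft" }

(* Save only if nobody changed the document since it was read. *)
let write ~expected_rev article =
  if !store.rev = expected_rev
  then store := { article with rev = expected_rev + 1 }
  else raise Collision

(* Run body on the latest version, retrying when a Collision is raised. *)
let transaction ~max_retries body =
  let rec attempt remaining =
    try body !store
    with Collision when remaining > 1 -> attempt (remaining - 1)
  in
  attempt max_retries

let set_title new_title =
  let* article = transaction ~max_retries:3 in
  write ~expected_rev:article.rev { article with title = new_title }
```

<p>Once the retries are exhausted, the last <code>Collision</code> escapes to the caller.</p>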
<h4>Monads</h4>
<p>Value binding in monads benefits from having a syntax that actually looks like binding. With the option monad, one can turn this :</p>
<pre style="padding-left: 30px;">match Files.get file_id with None -&gt; None | Some file -&gt;
  match file # owner with None -&gt; None | Some user_id -&gt;
    match Users.get user_id with None -&gt; None | Some user -&gt;
      Some (user # name)</pre>
<p>Into a more straightforward version :</p>
<pre style="padding-left: 30px;">let  open BatOption.Monad in
let! file    = bind $ Files.get file_id in
let! user_id = bind $ file # owner in
let! user    = bind $ Users.get user_id in
return (user # name)</pre>
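<p>For comparison, the same chain can be sketched with the standard library only, assuming OCaml 4.08 or later (for <code>Option.bind</code> and binding operators). The <code>files</code> and <code>users</code> tables are made-up data standing in for <code>Files.get</code> and <code>Users.get</code>.</p>

```ocaml
let ( let* ) = Option.bind

(* Hypothetical data: file id -> optional owner id, owner id -> name. *)
let files = [ (1, Some "u1"); (2, None) ]
let users = [ ("u1", "Alice") ]

let owner_name file_id =
  let* owner = List.assoc_opt file_id files in   (* is there such a file ? *)
  let* user_id = owner in                        (* does it have an owner ? *)
  let* name = List.assoc_opt user_id users in    (* is the owner known ? *)
  Some name
```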
<p>Also, one can deal with Lwt threads almost as well as the Lwt-specific syntax extension:</p>
<pre style="padding-left: 30px;">open Lwt
open Lwt_io

let process_lines channel process =
  let rec loop () =
    let! line_opt = bind $ read_line_opt channel in
    match line_opt with
      | None -&gt; return ()
      | Some line -&gt; loop () &lt;&amp;&gt; process line
  in
  loop ()</pre>
<h4>Being Silly</h4>
<pre style="padding-left: 30px;">let fold init list f = List.fold_left (fun acc x -&gt; f (acc,x)) init list
let map list f       = List.map f list

let probabilities odds =
  let sum =
    let! accumulator, odd = fold 0. odds in
    accumulator +. float_of_int odd
  in
  let! odd = map odds in
  float_of_int odd /. sum</pre>
<h4>The Syntax Extension</h4>
<p>In case you don&#8217;t know how to create it, this is the preprocessor file for this syntax extension :</p>
<pre style="padding-left: 30px;">open Camlp4.PreCast
open Syntax

EXTEND Gram
 GLOBAL: expr;

 expr: LEVEL "top"
 [
   [ "let"; "!"; p = patt ; "=" ; e = expr ; "in" ; e' = expr -&gt;
     &lt;:expr&lt; (($e$) (fun $p$ -&gt; $e'$)) &gt;&gt; ]
 ] ;

END;</pre>
<p>By the way, I find that this extension has a significant advantage over the Lwt extension &#8211; it is readily compatible with syntax highlighting in most editors.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.nicollet.net/2011/12/two-way-bindings/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>Basic Patterns for Everyday Programming</title>
		<link>http://www.nicollet.net/2011/11/basic-patterns-for-everyday-programming/</link>
		<comments>http://www.nicollet.net/2011/11/basic-patterns-for-everyday-programming/#comments</comments>
		<pubDate>Wed, 23 Nov 2011 15:48:19 +0000</pubDate>
		<dc:creator>Victor Nicollet</dc:creator>
				<category><![CDATA[Functional]]></category>
		<category><![CDATA[Algorithms]]></category>
		<category><![CDATA[Design Patterns]]></category>
		<category><![CDATA[Objective Caml]]></category>

		<guid isPermaLink="false">http://www.nicollet.net/?p=2624</guid>
		<description><![CDATA[Lakshan Perera provides a list of basic patterns for everyday programming, illustrated in Javascript and Ruby. I thought it would be interesting to provide an OCaml illustration as well, and perhaps a handful of additional patterns. Verify object&#8217;s availability before calling its methods or properties In many languages, there is a possibility for [...]]]></description>
			<content:encoded><![CDATA[<p><img class="aligncenter size-full wp-image-2635" title="pattern" src="http://www.nicollet.net/wp-content/uploads/2011/11/pattern.png" alt="" width="675" height="100" /></p>
<p><a href="http://laktek.com/2011/11/23/basic-patterns-for-everyday-programming/" target="_blank">Lakshan Perera</a> provides a list of basic patterns for everyday programming, illustrated in Javascript and Ruby. I thought it would be interesting to provide an OCaml illustration as well, and perhaps a handful of additional patterns.</p>
<h4>Verify object&#8217;s availability before calling its methods or properties</h4>
<p>In many languages, there is a possibility for objects to be missing — whether this is represented as <code>NULL</code>, <code>null</code>, <code>nil</code> or <code>None</code>. Regardless of the language, it is important to keep in mind at all times whether a given value is optional or not. If it is not optional, then you may assert so using a type declaration if your language supports it :</p>
<pre style="padding-left: 30px;"><strong>method</strong> name  : string <span style="color: #008000;">(* not optional *)</span>
<strong>method</strong> email : string option <span style="color: #008000;">(* optional *)</span></pre>
<p>In other languages, a runtime assert will do, for instance in PHP :</p>
<pre style="padding-left: 30px;"><strong>assert</strong> (isset($this-&gt;name))</pre>
<p>OCaml will not require you to explicitly declare all values as optional or mandatory. Instead, it will deduce that information from the way you use each value. For instance, since <code>json_of_string</code> expects a non-optional string argument, you can simply write :</p>
<pre style="padding-left: 30px;"><strong>let</strong> parsed_content = json_of_string json <span style="color: #008000;">(* json is not optional *)</span></pre>
<p>If <code>json</code> is an optional string, this will create a compilation error unless you explicitly define what should happen when the string is missing. The most common approach is to decide that the result should be missing too :</p>
<pre style="padding-left: 30px;"><strong>let</strong> parsed_content = <strong>match</strong> json <strong>with</strong>
  | <span style="color: #008080;">None</span>   -&gt; <span style="color: #008080;">None</span>
  | <span style="color: #008080;">Some</span> s -&gt; <span style="color: #008080;">Some</span> (json_of_string s) <span style="color: #008000;">(* by definition, s is not optional *)</span></pre>
<p>If using the Batteries library (which, by the way, you should), you can express this more easily :</p>
<pre style="padding-left: 30px;"><strong>let</strong> parsed_content = <span style="color: #008080;">BatOption</span>.map json_of_string json</pre>
<h4>Set a default value with assignments</h4>
<p>This advice is almost identical to the previous one : if you need to construct a non-optional value, but only have access to an optional value, you will have to provide a default value. Using standard OCaml :</p>
<pre style="padding-left: 30px;"><strong>let</strong> role = <strong>match</strong> person.role <strong>with</strong>
  | <span style="color: #008080;">None</span>      -&gt; <span style="color: #008080;">`Guest</span>
  | <span style="color: #008080;">Some</span> role -&gt; role</pre>
<p>Using Batteries, there is an almost equivalent version :</p>
<pre style="padding-left: 30px;"><strong>let</strong> role = <span style="color: #008080;">BatOption</span>.default <span style="color: #008080;">`Guest</span> person.role</pre>
<p>This version is only almost equivalent, because it will evaluate the default value even if it is not required. Let&#8217;s assume that the default value is itself provided by a complex computation, such as a database access :</p>
<pre style="padding-left: 30px;"><strong>let</strong> role = <strong>match</strong> person.role <strong>with</strong>
  | <span style="color: #008080;">None</span>      -&gt; readfrom database <span style="color: #008000;">(* only executed if person.role is missing *)</span>
  | <span style="color: #008080;">Some</span> role -&gt; role 

<strong>let</strong> role = <span style="color: #008080;">BatOption</span>.default
  (readfrom database) <span style="color: #008000;">(* Always executed, even if unnecessary *)</span>
  person.role</pre>
<p>Recent versions of the Batteries library provide a delayed variant for that, which takes the default as a function :</p>
<pre style="padding-left: 30px;"><strong>let</strong> role = <span style="color: #008080;">BatOption</span>.default_delayed
  (<strong>fun</strong> () -&gt; readfrom database) <span style="color: #008000;">(* only executed if person.role is missing *)</span>
  person.role</pre>
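<p>The difference between an eager and a delayed default can be observed with a counter. The sketch below assumes OCaml 4.08 or later for <code>Option.value</code>; <code>read_default_role</code> is a hypothetical stand-in for the database access, and <code>default_delayed</code> is written by hand here rather than taken from a library.</p>

```ocaml
let lookups = ref 0

(* Stand-in for an expensive database access. *)
let read_default_role () = incr lookups; "guest"

(* The default is a function, called only when the option is None. *)
let default_delayed thunk = function
  | Some value -> value
  | None -> thunk ()

(* Eager: the default is computed although the role is present. *)
let eager = Option.value (Some "admin") ~default:(read_default_role ())

(* Delayed: no lookup happens when the role is present. *)
let delayed = default_delayed read_default_role (Some "admin")
let missing = default_delayed read_default_role None
```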
<h4>Checking whether a variable is equal to any of the given values</h4>
<p>If the values have a legitimate reason to be strings, integers or other types with unlimited numbers of values, then « is in list » predicates are the preferred choice :</p>
<pre style="padding-left: 30px;"><strong>if </strong><span style="color: #008080;">List</span>.mem current_day [<span style="color: #ff0000;">"Monday"</span>;<span style="color: #ff0000;">"Wednesday"</span>;<span style="color: #ff0000;">"Friday"</span>] <strong>then</strong>
  <span style="color: #008000;">(* do something *) </span></pre>
<p>Days are a rather bad example, because they are better represented as a variant (the type checker then catches <em>Mornday</em> written instead of <em>Monday</em>). If you have already defined the type of weekdays, then :</p>
<pre style="padding-left: 30px;"><strong>if</strong> <span style="color: #008080;">List</span>.mem (current_day : weekday) [<span style="color: #008080;">`Monday</span>;<span style="color: #008080;">`Wednesday</span>;<span style="color: #008080;">`Friday</span>] <strong>then</strong>
  <span style="color: #008000;">(* do something *)</span></pre>
<p>If this is a one-shot check, or if you would rather not define a weekday type yet, you should instead go for an exhaustive pattern-matching :</p>
<pre style="padding-left: 30px;"> <strong>match</strong> current_day <strong>with</strong>
  | <span style="color: #008080;">`Monday</span> | <span style="color: #008080;">`Wednesday</span> | <span style="color: #008080;">`Friday</span> -&gt;
    <span style="color: #008000;">(* do something *)</span>
  | <span style="color: #008080;">`Tuesday</span> | <span style="color: #008080;">`Thursday</span> | <span style="color: #008080;">`Saturday</span> | <span style="color: #008080;">`Sunday</span> -&gt; ()</pre>
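<p>For a version that can be compiled on its own, here is a sketch using a regular variant type instead of the polymorphic variants above. The name <code>is_gym_day</code> is made up for the example. A misspelled constructor is rejected at compile time, and forgetting a day triggers a non-exhaustive match warning.</p>

```ocaml
type weekday =
  | Monday | Tuesday | Wednesday | Thursday | Friday | Saturday | Sunday

(* Every constructor is listed, so adding a day later forces a decision here. *)
let is_gym_day = function
  | Monday | Wednesday | Friday -> true
  | Tuesday | Thursday | Saturday | Sunday -> false
```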
<h4>Extract complex or repeated logic into functions</h4>
<p>This is a fairly fundamental concept, but it consists of two distinct parts. There is a separation and naming part (you pull out a piece of code and give it a name, which helps understand what it does and how it relates to the rest of the program) and there is an extraction and reuse part (the piece of code is pulled out into a more globally accessible location and parametrized, so that it may be used in other places).</p>
<p>With the above example, simple separation-and-naming would be :</p>
<pre style="padding-left: 30px;"><strong>let </strong>is_discount_day = <strong>match </strong>current_day <strong>with</strong>
 | <span style="color: #008080;">`Monday</span> | <span style="color: #008080;">`Wednesday</span> | <span style="color: #008080;">`Friday</span> -&gt; current_date &gt; <span style="color: #ff0000;">20</span>
 | <span style="color: #008080;">`Tuesday</span> | <span style="color: #008080;">`Thursday</span> | <span style="color: #008080;">`Saturday</span> | <span style="color: #008080;">`Sunday</span> -&gt; <span style="color: #ff0000;">false</span>
<strong>in</strong>

<strong>if </strong>is_discount_day <strong>then </strong>
  <span style="color: #008000;">(* do something *)</span></pre>
<p>The variable is defined in the same scope it is used in, and it assumes that <code>current_day</code> and <code>current_date</code> values have been defined previously in that scope. Extraction-and-reuse would go further :</p>
<pre style="padding-left: 30px;"><strong>type </strong>weekday =
  [ <span style="color: #008080;">`Monday</span> | <span style="color: #008080;">`Tuesday</span> | <span style="color: #008080;">`Wednesday</span>
  | <span style="color: #008080;">`Thursday</span> | <span style="color: #008080;">`Friday</span> | <span style="color: #008080;">`Saturday</span> | <span style="color: #008080;">`Sunday</span> ]
<strong>
let </strong>is_discount_day (day:weekday) date =
<span style="color: #008080;">  List</span>.mem day [<span style="color: #008080;">`Monday</span>;<span style="color: #008080;">`Wednesday</span>;<span style="color: #008080;">`Friday</span>] <strong>&amp;&amp;</strong> date &gt; <span style="color: #ff0000;">20</span>

...

  <strong>if </strong>is_discount_day current_day current_date <strong>then </strong>
    <span style="color: #008000;">(* do something *)</span></pre>
<p>Now, <code>is_discount_day</code> is a global function available from everywhere in the code, and it uses the provided parameters to determine whether this is indeed a discount day.</p>
<h4>Memoize the results of repeated function calls</h4>
<p>OCaml has several ways to perform memoization. One of them is lazy evaluation :</p>
<pre style="padding-left: 30px;"><strong>val</strong> discount_day = <strong>lazy</strong> (is_discount_day current_day current_date)
<strong>method</strong> discount_day = <span style="color: #008080;">Lazy</span>.force discount_day</pre>
<p>The lazy expression will only be evaluated the first time the <code>Lazy.force</code> function is called on it.</p>
<p>Note that if the current day or current date can change, then the memoization actually <em>breaks</em> things !</p>
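<p>The once-only evaluation can be checked with a counter. This is a small standalone sketch; the counter and the constant result are placeholders for a real <code>is_discount_day</code> call.</p>

```ocaml
let evaluations = ref 0

(* The body runs the first time the value is forced, and never again. *)
let discount_day = lazy (incr evaluations; true)

let first = Lazy.force discount_day    (* evaluates the body *)
let second = Lazy.force discount_day   (* reuses the stored result *)
```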
<p>Memoization is also helpful when dealing with a function that requires arguments, in which case a different result will be provided for each argument set. A common solution is to use a hash table to store these :</p>
<pre style="padding-left: 30px;"><strong>let</strong> fibonacci =
  <strong>let</strong> memo = <span style="color: #008080;">Hashtbl</span>.create <span style="color: #ff0000;">100</span> <strong>in</strong>
  <strong>let rec</strong> fib n =
    <strong>try</strong> <span style="color: #008080;">Hashtbl</span>.find memo n <strong>with </strong><span style="color: #008080;">Not_found</span> -&gt;
      <strong>let</strong> result = fib (n-<span style="color: #ff0000;">1</span>) + fib (n-<span style="color: #ff0000;">2</span>) <strong>in</strong>
      <span style="color: #008080;">Hashtbl</span>.add memo n result ; result
  <strong>in</strong> fib</pre>
<p>This works fine for short-lived functions — don&#8217;t do this for global functions that might stick around for a long time, because the memoization hash table will grow and its contents will never be garbage-collected. If you really have to, use a <em>weak</em> hash table, such as Batteries&#8217; <code>BatInnerWeaktbl</code>, so that the garbage collector may reclaim the memoized values when it runs out of memory.</p>
<p>Also, don&#8217;t overdo memoization : it only works when arguments are reliably passed more than once <em>and</em> the time to compute the value is significantly larger than the time to retrieve and store it <em>and</em> it is worth the memory usage <em>and </em>the function has no side-effects.</p>
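<p>For non-recursive functions, the hash table approach can be wrapped in a generic helper. The sketch below assumes OCaml 4.05 or later for <code>Hashtbl.find_opt</code>; <code>memoize</code>, <code>slow_square</code> and the call counter are illustrative names, and the table is never emptied, so the caveats above apply in full.</p>

```ocaml
(* Wrap f so that each distinct argument is computed at most once. *)
let memoize f =
  let table = Hashtbl.create 16 in
  fun x ->
    match Hashtbl.find_opt table x with
    | Some result -> result
    | None ->
      let result = f x in
      Hashtbl.add table x result;
      result

let calls = ref 0
let slow_square n = incr calls; n * n   (* counts how often it really runs *)
let fast_square = memoize slow_square
```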
<h4>Use the seven list manipulation primitives</h4>
<p>Almost any processing on collections of items can be expressed in terms of seven fundamental patterns. Recognizing those patterns can help improve the clarity of both the code and the underlying algorithm.</p>
<p><strong>1. Map</strong> transforms a list into another, item by item, in linear time. Use a map operation when all you need is a one-to-one transformation. The line below extracts three recipes from the database using their identifier.</p>
<pre style="padding-left: 30px;"><strong>let</strong> recipes = <span style="color: #008080;">List</span>.map from_database [ <span style="color: #ff0000;">"omelet"</span> ; <span style="color: #ff0000;">"cheeseburger"</span> ; <span style="color: #ff0000;">"risotto"</span> ]</pre>
<p><strong>2. Reduce</strong> transforms a list of values into a single value by repeatedly applying a function that combines together two values into one. The typical example is a fold, which uses a function to combine each list element, in turn, with an accumulator. It can be used to extract the sum of values in a list, for example :</p>
<pre style="padding-left: 30px;"><strong>let</strong> total = <span style="color: #008080;">List</span>.fold_left (+) <span style="color: #ff0000;">0</span> [ <span style="color: #ff0000;">5</span> ; <span style="color: #ff0000;">6</span> ; <span style="color: #ff0000;">3</span> ; <span style="color: #ff0000;">8</span> ; <span style="color: #ff0000;">9</span> ; <span style="color: #ff0000;">0</span> ; <span style="color: #ff0000;">7</span> ; <span style="color: #ff0000;">6</span> ]</pre>
<p>This transform allows a preliminary map step which transforms the values inside the list into values that can be combined. For instance, to find the age of the oldest person in a list of people :</p>
<pre style="padding-left: 30px;"><strong>let</strong> oldest = <span style="color: #008080;">List</span>.fold_left (<strong>fun</strong> acc person -&gt; max acc person.age) <span style="color: #ff0000;">0</span> people</pre>
<p><strong>3. Extract </strong>works like a map, but the transformation function returns zero, one or more results for each call. All the results are included in the final list. The most elementary implementation is literally to have a map (that transforms a list into a list of lists) followed by a concatenation (that transforms a list of lists into a list). For instance, to get all the ingredients involved in a list of recipes :</p>
<pre style="padding-left: 30px;"><strong>let </strong>ingredients =
  <span style="color: #008080;">List</span>.concat (<span style="color: #008080;">List</span>.map (<strong>fun</strong> recipe -&gt; recipe.ingredients) recipes)</pre>
<p><strong>4. Filter</strong> is a special case of Extract where the transform may not return more than one result. It may still return none, so its result is simply an optional type. For instance, to extract the list of all recipes that have a wine recommendation along with their recommended wine :</p>
<pre style="padding-left: 30px;"><strong>let</strong> wines = <span style="color: #008080;">BatList</span>.filter_map
  (<strong>fun</strong> recipe -&gt; <span style="color: #008080;">BatOption</span>.map (<strong>fun</strong> wine -&gt; recipe,wine) recipe.wine) recipes</pre>
<p>The OCaml language also provides a standard <code>List.filter</code> function which keeps values for which a property is true. For instance, to get the list of recipes that have a wine recommendation :</p>
<pre style="padding-left: 30px;"><strong>let</strong> have_wines = <span style="color: #008080;">List</span>.filter (<strong>fun</strong> recipe -&gt; recipe.wine &lt;&gt; <span style="color: #008080;">None</span>) recipes</pre>
<p>This approach is weaker — the wines in the resulting list are still treated as optional by the type system, so you will need bogus pattern matching for a case that never happens (no wine) to extract the actual wine values. A <code>filter_map</code> lets you encode the filter property in the type of the result, which makes using the filtered list easier.</p>
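<p>The contrast between the two approaches shows up in the result types. The sketch below uses the standard library (<code>List.filter_map</code>, <code>Option.map</code> and <code>Option.is_some</code>, all assumed available from OCaml 4.08) in place of Batteries, with a made-up <code>recipe</code> record and sample data.</p>

```ocaml
type recipe = { name : string; wine : string option }

let recipes =
  [ { name = "risotto"; wine = Some "pinot grigio" };
    { name = "omelet"; wine = None } ]

(* (string * string) list : the wine is no longer optional. *)
let with_wine =
  List.filter_map
    (fun r -> Option.map (fun wine -> (r.name, wine)) r.wine)
    recipes

(* recipe list : each element still carries a string option. *)
let have_wine = List.filter (fun r -> Option.is_some r.wine) recipes
```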
<p><strong>5. Sort</strong> unsurprisingly sorts the list. The canonical sort — using the canonical order relationship — is a theoretical curiosity, and in practice most sorts use a <em>projection function</em> p such that A &lt; B iff p(A) &lt; p(B). This is best illustrated in SQL, in the form of the ORDER BY &lt;projection&gt; statement. Two useful helper functions :</p>
<pre style="padding-left: 30px;"><strong>let</strong> project compare p a b = compare (p a) (p b)
<strong>let</strong> descending compare a b = compare b a</pre>
<p>For instance, to sort the list of recipes based on how long each of them takes :</p>
<pre style="padding-left: 30px;"><strong>let</strong> by_duration =
  <span style="color: #008080;">List</span>.sort (project compare (<strong>fun</strong> recipe -&gt; recipe.duration)) recipes</pre>
<p><strong>6. Group</strong> works like Sort, but further regroups « equal » items together by returning a list of lists. It works using a comparison function that is usually based on a projection function. For instance, to get three lists containing one-star, two-star and three-star recipes :</p>
<pre style="padding-left: 30px;"><strong>let</strong> by_stars =
  <span style="color: #008080;">BatList</span>.group (project compare (<strong>fun</strong> recipe -&gt; recipe.stars)) recipes</pre>
<p>There is a special case when the projection function returns booleans, which is known as a <em>partition</em>. For instance, to extract recipes that are desserts and recipes that are not :</p>
<pre style="padding-left: 30px;"><strong>let</strong> desserts, non_desserts =
<span style="color: #008080;">  BatList</span>.partition (<strong>fun</strong> recipe -&gt; recipe.is_dessert) recipes</pre>
<p><strong>7. Search</strong> extracts one element from the list (if possible) based on a certain condition or property. Elementary searches are « first element » and « last element. » More complex searches : finding a value by key in a list of key-value pairs using <code>List.assoc</code> or using a predicate using <code>List.find</code>. The heavy-duty search tool is <code>BatList.find_map</code>, used below to find a recipe that is recommended by at least one person, and the recommending person :</p>
<pre style="padding-left: 30px;"><strong>let </strong>recipe, recommender = <span style="color: #008080;">BatList</span>.find_map
  (<strong>fun</strong> recipe -&gt; <span style="color: #008080;">BatOption</span>.map (<strong>fun </strong>person -&gt; recipe, person) recipe.recommended)
  recipes</pre>
<p>These seven patterns can be used in conjunction to perform almost any algorithm on collections, sequences or lists. For a more complex example, assume we need the ten ingredients that appear the most often in desserts. We would filter recipes by desserts, extract their ingredients, group them by name, sort the sub-lists by list length and take the name of the first element of the first ten sub-lists :</p>
<pre style="padding-left: 30px;"><strong>open </strong><span style="color: #008080;">BatPervasives</span> <span style="color: #008000;">(* for operator |&gt; *)</span>
<strong>open </strong><span style="color: #008080;">BatList
</span>
<strong>let</strong> ten_best_ingredients recipes = recipes
  |&gt; filter (<strong>fun</strong> r -&gt; r.is_dessert)
  |&gt; map (<strong>fun</strong> r -&gt; r.ingredients) |&gt; concat
  |&gt; group (project compare (<strong>fun</strong> i -&gt; i.name))
  |&gt; sort (descending (project compare length))
  |&gt; take <span style="color: #ff0000;">10</span>
  |&gt; map hd
  |&gt; map (<strong>fun</strong> i -&gt; i.name)</pre>
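<p>To experiment with this pipeline without installing Batteries, the sketch below rebuilds it on the standard library. Everything it defines is an assumption made for the example: the record types, and the small stand-ins for <code>project</code>, <code>descending</code>, <code>group</code> and <code>take</code>, which are written from their described behaviour rather than copied from any library.</p>
<pre style="padding-left: 30px;">type ingredient = { name : string }
type recipe = { is_dessert : bool; ingredients : ingredient list }

(* Compare two values through a projection, and reverse a comparison. *)
let project cmp f a b = cmp (f a) (f b)
let descending cmp a b = cmp b a

(* Sort the list, then gather runs of equal elements into sub-lists. *)
let group cmp l =
  List.fold_right
    (fun x acc -&gt;
       match acc with
       | (y :: _ as g) :: rest when cmp x y = 0 -&gt; (x :: g) :: rest
       | _ -&gt; [x] :: acc)
    (List.sort cmp l) []

(* Keep at most the first n elements. *)
let rec take n = function
  | x :: rest when n &gt; 0 -&gt; x :: take (n - 1) rest
  | _ -&gt; []

let best_ingredients n recipes =
  recipes
  |&gt; List.filter (fun r -&gt; r.is_dessert)
  |&gt; List.map (fun r -&gt; r.ingredients) |&gt; List.concat
  |&gt; group (project compare (fun i -&gt; i.name))
  |&gt; List.sort (descending (project compare List.length))
  |&gt; take n
  |&gt; List.map List.hd
  |&gt; List.map (fun i -&gt; i.name)</pre>
<p>Taking <code>List.hd</code> of each group is safe here because <code>group</code> never produces an empty sub-list.</p>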
<p><small>Article Image &copy; brewbooks &mdash; <a href="http://www.flickr.com/photos/brewbooks/3203211847/">Flickr</a></small></p>
]]></content:encoded>
			<wfw:commentRss>http://www.nicollet.net/2011/11/basic-patterns-for-everyday-programming/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Comment Branches</title>
		<link>http://www.nicollet.net/2011/11/comment-branches/</link>
		<comments>http://www.nicollet.net/2011/11/comment-branches/#comments</comments>
		<pubDate>Thu, 17 Nov 2011 18:42:36 +0000</pubDate>
		<dc:creator>Victor Nicollet</dc:creator>
				<category><![CDATA[Functional]]></category>
		<category><![CDATA[Architecture]]></category>
		<category><![CDATA[Design Patterns]]></category>
		<category><![CDATA[Productivity]]></category>

		<guid isPermaLink="false">http://www.nicollet.net/?p=2619</guid>
		<description><![CDATA[Your development job is making changes in your software. Writing, testing and debugging those changes takes some time. If your job is anywhere as hectic as mine, you will have to fix and deploy urgent patches, even when your application code is in a half-written, half-debugged state because of the feature of the month. This is [...]]]></description>
			<content:encoded><![CDATA[<p><img class="aligncenter size-full wp-image-2620" title="branches" src="http://www.nicollet.net/wp-content/uploads/2011/11/branches.png" alt="" width="675" height="100" />Your development job is making changes in your software. Writing, testing and debugging those changes takes some time.</p>
<p>If your job is anywhere as hectic as mine, you will have to fix and deploy urgent patches, even when your application code is in a half-written, half-debugged state because of <em>the feature of the month</em>.</p>
<p>This is what <em>branches</em> are for. You keep two versions of the code, one of which is called the <strong>trunk </strong>and is always ready for deployment, and another which holds the changes that you are working on.</p>
<p>When your feature is done, you <em>merge</em> the two versions together. You want to keep the merge operation painless. To do so, you have several kinds of branches available.</p>
<p>The <strong>repository branch</strong> is built into your SourceSafe/subversion/git/whatever. It creates two independent copies, and you need to migrate changes from the trunk to every branch out there as soon as possible, or the merge will make you wish for a sweet and merciful death.</p>
<p>By the way, changeset-oriented tools (like git or mercurial) make this easier, while revision-oriented tools (like subversion) make it harder.</p>
<p>The <strong>feature branch</strong> is done using programming logic. The code you deploy to production supports the new feature, but it is turned off for everyone except yourself. This technique is great for adding features, but inefficient when changing existing ones.</p>
<p>A side effect of the feature branch is that you can stress-test new code by rolling it out to increasing numbers of users progressively.</p>
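<p>As a rough illustration, a feature branch with progressive rollout can be as small as the sketch below. The function names, the allow-list and the percentage rule are all hypothetical; this is one possible way to write the switch, not a description of any particular code base.</p>
<pre style="padding-left: 30px;">(* Hypothetical rollout rule: a user sees the feature when listed in allow,
   or when the hash of their id falls below the rollout percentage. *)
let feature_enabled ~allow ~percent user_id =
  List.mem user_id allow || Hashtbl.hash user_id mod 100 &lt; percent

let render_dashboard ~allow ~percent user_id =
  if feature_enabled ~allow ~percent user_id
  then "new dashboard"   (* code path still under development *)
  else "old dashboard"   (* code path everyone else keeps seeing *)</pre>
<p>Starting with <code>~percent:0</code> and an allow-list containing only yourself reproduces the « turned off for everyone except yourself » setup; raising the percentage then rolls the feature out to more users.</p>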
<p>The <strong>comment branch</strong> is an odd gambit. It involves ripping out an entire module and replacing it with another that has a <em>different</em> interface. This requires large amounts of re-wiring all over the code base, and it can take hours or days before the result even compiles, let alone gets <em>tested</em>.</p>
<p>Use a comment structure such as this one:</p>
<pre style="padding-left: 30px;"><span style="color: #008000;">/*[*/</span> old code <span style="color: #008000;">/*|* new code *]*/</span></pre>
<p>It is trivial to build a text-replacement macro that turns the above into the code below and back:</p>
<pre style="padding-left: 30px;"><span style="color: #008000;">/*[* old code *|*/</span> new code <span style="color: #008000;">/*]*/</span></pre>
<p>Use the macro to switch between development mode (when you write new code and desperately try to get it to compile) and fix mode (when you edit the old code and deploy it). For consistency, always commit the <em>old </em>version to the repository.</p>
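<p>An editor macro is the natural home for this switch, but the substitution itself is easy to sketch in code. The version below is an assumption about how such a macro could work: three plain string replacements in each direction, applied to a whole file's contents. The function names are made up for the example.</p>
<pre style="padding-left: 30px;">(* Replace every occurrence of the non-empty string sub in s with by. *)
let replace_all ~sub ~by s =
  let n = String.length s and m = String.length sub in
  let buf = Buffer.create n in
  let rec go i =
    if i &gt; n - m then Buffer.add_string buf (String.sub s i (n - i))
    else if String.sub s i m = sub then begin
      Buffer.add_string buf by; go (i + m)
    end else begin
      Buffer.add_char buf s.[i]; go (i + 1)
    end
  in
  go 0;
  Buffer.contents buf

(* Fix mode to development mode: comment out the old code, enable the new. *)
let to_new s =
  s |&gt; replace_all ~sub:"/*[*/" ~by:"/*[*"
    |&gt; replace_all ~sub:"/*|*"  ~by:"*|*/"
    |&gt; replace_all ~sub:"*]*/"  ~by:"/*]*/"

(* Development mode back to fix mode: the same substitutions, reversed. *)
let to_old s =
  s |&gt; replace_all ~sub:"/*]*/" ~by:"*]*/"
    |&gt; replace_all ~sub:"*|*/"  ~by:"/*|*"
    |&gt; replace_all ~sub:"/*[*"  ~by:"/*[*/"</pre>
<p>Neither function checks which mode the text is currently in, so applying <code>to_old</code> to text that is already in fix mode would corrupt the markers. A real macro would want to guard against that.</p>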
<p>Why use <strong>comment branches</strong> instead of <strong>repository branches</strong> ? Maybe your source control tool sucks at branches. I use Subversion. Yes, I know. Legacy, pain and unlikely hopes of a brighter future.</p>
<p>When a trunk change occurs in a part that has been erased or reworked in the branch, that change <em>will</em> cause a conflict that <em>will</em> require manual intervention. Even with git or mercurial. For a large number of small changes sprinkled over a large codebase that routinely receives many small updates, repository branches turn into a merge minefield.</p>
<p>Does your branch involve a small number of well-defined files ?</p>
<p>Then you should use <strong>repository branches</strong>, because conflicts will only happen in those files, and will usually be easy to fix.</p>
<p>Does your branch involve many changes in many files everywhere in the project ?</p>
<p>Then use <strong>comment branches</strong>.</p>
<p>Last and possibly least, there is the <strong>TODO-branch</strong>. This involves non-breaking, purely cosmetic changes. 25% of my project uses this syntax for historical reasons:</p>
<pre style="padding-left: 30px;">Table.get id |-&gt; function
   | None       -&gt; return 0
   | Some value -&gt; return value.count</pre>
<p>Then, a convention change happened, and this is used instead:</p>
<pre style="padding-left: 30px;">let! value_opt = breathe (Table.get id) in
match value_opt with  
   | None       -&gt; return 0
   | Some value -&gt; return value.count</pre>
<p>Then, another convention change happened, and this should be used instead:</p>
<pre style="padding-left: 30px;">let! value = breathe_req_or (return 0) (Table.get id) in
return value.count</pre>
<p>And then, there&#8217;s the current version:</p>
<pre style="padding-left: 30px;">let! value = breathe_req_or (return 0) $ Table.get id in
return value.count</pre>
<p>Whenever I change coding conventions, I do not spend the time to reformat the tens of thousands of lines of code in my application. That would be wasteful. Instead, every time a piece of code is refactored, it is refactored to the most recent style.</p>
<p>The same happens when using an old and a new version of a given API. My code uses two libraries for handling HTML forms, uses both JavaScript and CoffeeScript, and contains a variety of similar two-hammers-one-nail situations.</p>
<p>These are, for all practical purposes, branches. They are work that is being performed for long durations. The benefit of TODO-branches is that code in the middle of such changes is still compatible with the trunk. It all happens in the head of the developer, who remembers what changes should be done the next time a piece of code is rewritten.</p>
<p><small>Article Image &copy; Dominic Alves &mdash; <a href="http://www.flickr.com/photos/dominicspics/422131893/">Flickr</a></small></p>
]]></content:encoded>
			<wfw:commentRss>http://www.nicollet.net/2011/11/comment-branches/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Annealing Constraints</title>
		<link>http://www.nicollet.net/2011/10/annealing-constraints/</link>
		<comments>http://www.nicollet.net/2011/10/annealing-constraints/#comments</comments>
		<pubDate>Fri, 28 Oct 2011 17:46:28 +0000</pubDate>
		<dc:creator>Victor Nicollet</dc:creator>
				<category><![CDATA[Functional]]></category>
		<category><![CDATA[Algorithms]]></category>
		<category><![CDATA[RunOrg]]></category>

		<guid isPermaLink="false">http://www.nicollet.net/?p=2598</guid>
		<description><![CDATA[Of all the existing constraint-solving algorithms out there, my favourite is simulated annealing. To use it, you need a mutation function that randomly and slightly alters a solution, and a fitness function that lets you compare two solutions. Based on these, the algorithm is simple : Start with an arbitrary solution. Mutate the solution : [...]]]></description>
			<content:encoded><![CDATA[<p><img class="aligncenter size-full wp-image-2599" title="annealing" src="http://www.nicollet.net/wp-content/uploads/2011/10/annealing.png" alt="" width="675" height="100" />Of all the existing constraint-solving algorithms out there, my favourite is simulated annealing. To use it, you need a mutation function that randomly and slightly alters a solution, and a fitness function that lets you compare two solutions. Based on these, the algorithm is simple :</p>
<ol>
<li>Start with an arbitrary solution.</li>
<li>Mutate the solution : if the mutated solution is better than the current one in terms of fitness, keep it and repeat this step.</li>
<li>If you discard too many mutated solutions in step 2, remember the current solution and mutate it.</li>
<li>When time runs out, return the best solution found so far.</li>
</ol>
<p>Step 2 is a fairly standard ascent towards a local maximum : try a random similar solution and keep it if it improves fitness. Step 3 detects that the algorithm is stuck in a local maximum, so it shakes the solution out of there and starts again from that new position.</p>
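<p>The four steps can be sketched as a short generic function. This is a minimal reading of the list above, not the code that runs on RunOrg: the parameter names, the <code>patience</code> threshold for « too many » discarded mutations, and the iteration count standing in for « time runs out » are all assumptions.</p>
<pre style="padding-left: 30px;">(* Hill-climb with mutate; after patience rejections in a row, remember the
   current solution if it is the best seen so far, then shake it loose. *)
let anneal ~mutate ~fitness ~patience ~iterations init =
  let best = ref init and current = ref init and rejected = ref 0 in
  let remember () =
    if fitness !current &gt; fitness !best then best := !current in
  for _i = 1 to iterations do
    let candidate = mutate !current in
    if fitness candidate &gt; fitness !current then begin
      (* Step 2: the mutation improves fitness, keep it. *)
      current := candidate;
      rejected := 0
    end else begin
      incr rejected;
      if !rejected &gt;= patience then begin
        (* Step 3: stuck on a local maximum, remember it and move on. *)
        remember ();
        current := candidate;
        rejected := 0
      end
    end
  done;
  (* Step 4: time is up, return the best solution found so far. *)
  remember ();
  !best

(* Toy problem: look for the integer closest to 7 using random unit steps. *)
let solution =
  anneal
    ~mutate:(fun x -&gt; if Random.bool () then x + 1 else x - 1)
    ~fitness:(fun x -&gt; - abs (x - 7))
    ~patience:5 ~iterations:1000 0</pre>
<p>The fitness function is called several times per iteration here; if it is expensive, caching the fitness of the current and best solutions would be an easy improvement.</p>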
<p>What is interesting about this solution is that it is exceedingly easy to implement, easy to adapt to a problem (as long as mutation and fitness are possible), and is reasonably efficient for most small problems.</p>
<p>I used it today to solve an original problem on RunOrg. We have a history of recently connected users, and we wanted to display a simple metric along the lines of « 10 users connected in the last hour » or « 11 users connected this week » as a short summary on the administrators&#8217; dashboard. The problem, of course, is to pick a reasonable duration based on the current data ! If one user connected seven days ago, and ten users connected a few minutes ago, then both « 10 users connected in the last hour » and « 11 users connected this week » are correct descriptions of the situation, but the former is intuitively better than the latter. <em>Intuitively</em>, we should use the shortest duration with the largest number of users.</p>
<p>And intuition is a bad thing, because it is hard to turn into an algorithm.</p>
<p>In the end, I settled for a handful of periods of time : the last 5 minutes, the last 1 to 6 hours, today, the last 2 or 3 days, and this week. Each of these periods of time contains a certain number of user connections (which is fairly easy to determine based on the connection history) and receives a score computed by multiplying the number of user connections with a magic constant that is tied to that period. So, if the magic constant for « the last hour » is 0.76 and that for « this week » is 0.03, then 10 × 0.76 = 7.6 and 11 × 0.03 = 0.33 meaning that the « 10 users connected in the last hour » version will be preferred.</p>
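<p>The selection rule itself fits in a few lines. In this sketch the period labels and the association-list representation are assumptions; the two constants and the two counts are the ones from the worked example above.</p>
<pre style="padding-left: 30px;">(* Magic constants per period; the two values are the ones quoted above. *)
let weights = [ "1h", 0.76; "week", 0.03 ]

(* Pick the (label, score) pair with the highest count × constant. *)
let best_period weights counts =
  List.fold_left
    (fun best (label, count) -&gt;
       let score = float_of_int count *. List.assoc label weights in
       match best with
       | Some (_, s) when s &gt;= score -&gt; best
       | _ -&gt; Some (label, score))
    None counts

(* 10 × 0.76 = 7.6 beats 11 × 0.03 = 0.33, so the last hour is chosen. *)
let choice = best_period weights [ "1h", 10; "week", 11 ]</pre>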
<p>Then, I came up with several examples of connection histories, and manually picked the <em>intuitive</em> duration that should be chosen to display them. This turned into the <strong>fitness</strong> function : any set of magic numbers would have to agree with as many of those examples as possible, the more the better. <strong>Mutation</strong>, on the other hand, consisted in changing one of the magic constants at random.</p>
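<p>Under the same assumptions about data representation (association lists keyed by period label), the fitness and mutation functions could look like the sketch below. The <code>example</code> type and the choice of drawing new constants uniformly between 0 and 1 are guesses made for illustration.</p>
<pre style="padding-left: 30px;">(* An example pairs per-period counts with the label a human would pick. *)
type example = { counts : (string * int) list; expected : string }

(* The label whose count × constant is highest, if there is any period. *)
let pick weights counts =
  let score (label, n) = float_of_int n *. List.assoc label weights in
  let better a b = if score b &gt; score a then b else a in
  match counts with
  | [] -&gt; None
  | first :: rest -&gt; Some (fst (List.fold_left better first rest))

(* Fitness: fraction of examples on which the weights agree with intuition. *)
let fitness examples weights =
  let ok e = pick weights e.counts = Some e.expected in
  float_of_int (List.length (List.filter ok examples))
  /. float_of_int (List.length examples)

(* Mutation: replace one randomly chosen constant with a fresh random value. *)
let mutate weights =
  let target = Random.int (List.length weights) in
  List.mapi
    (fun i (label, w) -&gt;
       if i = target then (label, Random.float 1.0) else (label, w))
    weights</pre>
<p>A fitness of 1.0 means every hand-picked example is satisfied. The function assumes a non-empty list of examples.</p>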
<p>Within 2000 iterations and a few seconds, the annealing algorithm came up with a solution that satisfied 94% of my constraints — and I am not even certain whether 100% could be satisfied, since I might have left a few contradictory intuitions in there. The final output is:</p>
<pre style="padding-left: 30px;">== Fitness: 94.44%
5min : 1
1h : 0.761091036722064
2h : 0.42295812451487896
3h : 0.26899204507851066
4h : 0.20923702855563558
5h : 0.14858146007043194
6h : 0.14846461447836096
today : 0.1477566411417444
2days : 0.07381054881381345
3days : 0.04202811963812667
week : 0.033268828300871064</pre>
<p>And so, these are the magic numbers that our algorithm uses. If it makes wrong decisions, we can always run the annealing again with new constraints, and feed the magic numbers back in.</p>
<p><small>Article image © Detlef Schobert — <a href="http://www.flickr.com/photos/detlefschobert/2505032252/">Flickr</a> </small></p>
]]></content:encoded>
			<wfw:commentRss>http://www.nicollet.net/2011/10/annealing-constraints/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Ozone Templating</title>
		<link>http://www.nicollet.net/2011/10/ozone-templating/</link>
		<comments>http://www.nicollet.net/2011/10/ozone-templating/#comments</comments>
		<pubDate>Tue, 18 Oct 2011 14:59:37 +0000</pubDate>
		<dc:creator>Victor Nicollet</dc:creator>
				<category><![CDATA[Functional]]></category>
		<category><![CDATA[Architecture]]></category>
		<category><![CDATA[HTML]]></category>
		<category><![CDATA[Objective Caml]]></category>
		<category><![CDATA[RunOrg]]></category>
		<category><![CDATA[Template]]></category>

		<guid isPermaLink="false">http://www.nicollet.net/?p=2588</guid>
		<description><![CDATA[There have been recurring requests about an in-depth explanation of how Ozone — our in-house OCaml web framework — handles HTML templates. So, here it is. A template is usually understood by everyone to be « HTML with holes » that is filled using values from the application itself. It is, in a sense, a DSL that [...]]]></description>
			<content:encoded><![CDATA[<p><img class="aligncenter size-full wp-image-2596" title="mountains" src="http://www.nicollet.net/wp-content/uploads/2011/10/mountains.png" alt="" width="675" height="100" /></p>
<p>There have been recurring requests about an in-depth explanation of how Ozone — our in-house OCaml web framework — handles HTML templates. So, here it is.</p>
<p>A template is usually understood by everyone to be « HTML with holes » that is filled using values from the application itself. It is, in a sense, a DSL that is restricted to describing how HTML should be built.</p>
<p>Here is an example of template Ozone could use, stored in file <em>users.htm</em> :</p>
<pre style="padding-left: 30px;"><code>&lt;<span style="color: #003366;">h1</span>&gt;<span style="color: #ff6600;"><strong>{t:users.title}</strong></span>&lt;/<span style="color: #003366;">h1</span>&gt;
&lt;<span style="color: #003366;">ul</span> <span style="color: #008000;">class</span>=<span style="color: #ff0000;">"userlist"</span>&gt;
  {{<strong><span style="color: #ff6600;">list</span></strong>:
    &lt;<span style="color: #003366;">li</span> <span style="color: #008000;">id</span>=<span style="color: #ff0000;">"<strong><span style="color: #ff6600;">{v:id}</span></strong>"</span>&gt;
      &lt;<span style="color: #003366;">img </span><span style="color: #008000;">src</span>=<span style="color: #ff0000;">"<strong><span style="color: #ff6600;">{v:img}</span></strong>"</span>/&gt;
<span style="font-family: monospace;">      &lt;</span><span style="color: #003366;">a</span><span style="font-family: monospace;"> </span><span style="color: #008000;">href</span><span style="font-family: monospace;">=</span><span style="color: #ff0000;">"<strong><span style="color: #ff6600;">{v:url}</span></strong>"</span><span style="font-family: monospace;">&gt;</span><strong><span style="color: #ff6600;">{v:name}</span></strong><span style="font-family: monospace;">&lt;/</span><span style="color: #003366;">a</span><span style="font-family: monospace;">&gt;
    &lt;/</span><span style="color: #003366;">li</span><span style="font-family: monospace;">&gt;
</span><span style="font-family: monospace;">  }}</span><span style="font-family: monospace;">
</span><span style="font-family: monospace;">&lt;/</span><span style="color: #003366;">ul</span><span style="font-family: monospace;">&gt;
</span>
&lt;<span style="color: #003366;">script</span> <span style="color: #008000;">type</span>=<span style="color: #ff0000;">"text/coffeescript"</span> <span style="color: #008000;">params</span>=<span style="color: #ff0000;">"<strong><span style="color: #ff6600;">save_url</span></strong>"</span>&gt;
list = @$.find <span style="color: #ff0000;">'ul'</span>

save = =&gt;
  ids = [];
  list.children(<span style="color: #ff0000;">'li'</span>).each -&gt;
    ids.push $(@).attr 'id'
  @ajax save_url, ids

list.children(<span style="color: #ff0000;">'li'</span>).sortable
  change: save
&lt;/<span style="color: #003366;">script</span><span style="font-family: monospace;">&gt;
</span>
&lt;<span style="color: #003366;">style</span> <span style="color: #008000;">type</span>=<span style="color: #ff0000;">"less"</span>&gt;
<span style="color: #008000;">.userlist</span> {
  <span style="color: #003366;">list-style-type</span>: none;
  <span style="color: #008000;">li img</span> {
    <span style="color: #003366;">float</span>: left;
    <span style="color: #003366;">margin-right</span>: 5px;
    <span style="color: #003366;">width</span>: 50px;
    <span style="color: #003366;">height</span>: 50px;
  }
}
&lt;/<span style="color: #003366;">style</span>&gt;  </code></pre>
<p>The template for a sortable list of users contains three things:</p>
<ul>
<li>A piece of HTML, which is the actual « HTML with holes » to be filled later. The holes are marked in orange.</li>
<li>A piece of CoffeeScript, which will be extracted from the template file, compiled to javascript and appended to a site-wide javascript file. It will be replaced, in the template, by a hole that will call the extracted javascript with additional parameters provided by the application (in orange).</li>
<li>A piece of LESS CSS, which is compiled to CSS and appended to a site-wide CSS file.</li>
</ul>
<p>These are not sections — they can appear in any order as long as the elements and attributes are respected so the pre-build tool can identify and extract the CoffeeScript and CSS bits.</p>
<p>Let&#8217;s examine each of them in order.</p>
<h4>The HTML Template</h4>
<p>This is the meat of the template. In order to improve application performance, loading the templates is a multi-step operation that involves intermediary storage formats.</p>
<p>The <em>first</em> step consists in reading in all the necessary templates, parsing them to determine that no variables are undefined, and storing them as a JSON blob in the underlying CouchDB database. This is a manually triggered operation that happens whenever we modify the templates (it&#8217;s part of our deployment procedure). This step may also involve a bit of cleanup, such as removing semantically irrelevant spaces from the HTML (this cannot be done earlier, because some templates are plaintext instead of HTML, and only the application knows which is which).</p>
<p>The <em>second </em>step happens whenever a new instance of our application begins — maybe it died and needed to restart, maybe Apache decided it needed another worker process to handle a surplus of requests, or maybe we added a new server to our web farm. The startup process of our application server does not read anything from the disk — instead, it will read in all the template data from the database, along with all the other bits of configuration: internationalization strings, third party API keys, feature branch triggers, and so on. Then, it will compile every template down to optimized closure-based opcodes for a hole-filling virtual machine.</p>
<p>The <em>third </em>step happens whenever a bit of HTML needs to be rendered. The application provides the hole-filling virtual machine with a data object and a « writing stream » which is either the HTTP response stream or a JSON serializer stream, depending on whether the request is normal HTTP or AJAX. This is an extremely fast operation where no parsing or checks are performed.</p>
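<p>To make the « compile once, render many times » idea concrete, here is a toy version of the second and third steps. It is only a sketch of the general technique: the <code>chunk</code> type, the <code>compile</code> function and the use of a <code>Buffer</code> as writing stream are assumptions, and Ozone&#8217;s real opcodes are certainly richer.</p>
<pre style="padding-left: 30px;">(* A parsed template: literal text interleaved with named holes. *)
type chunk = Text of string | Hole of string

(* Compile once: resolve every hole to its closure, failing early on a
   missing binding, so that rendering performs no lookup or check at all. *)
let compile mapping chunks =
  let step = function
    | Text s -&gt; (fun _ buf -&gt; Buffer.add_string buf s)
    | Hole name -&gt;
      match List.assoc_opt name mapping with
      | Some f -&gt; (fun data buf -&gt; Buffer.add_string buf (f data))
      | None -&gt; failwith ("unbound template variable: " ^ name)
  in
  let steps = List.map step chunks in
  fun data buf -&gt; List.iter (fun run -&gt; run data buf) steps

(* Hypothetical data object for the example. *)
type user = { name : string; url : string }

let render_user =
  compile
    [ "name", (fun u -&gt; u.name); "url", (fun u -&gt; u.url) ]
    [ Text "&lt;a href=\""; Hole "url"; Text "\"&gt;"; Hole "name"; Text "&lt;/a&gt;" ]

let html =
  let buf = Buffer.create 64 in
  render_user { name = "Ada"; url = "/u/1" } buf;
  Buffer.contents buf</pre>
<p>All the name lookups happen inside <code>compile</code>; the function it returns only walks a list of closures, which is what keeps the rendering step cheap.</p>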
<p>On the application side, loading a template involves three things:</p>
<ul>
<li>Declaring the type of the data object expected by the template.</li>
<li>Declaring the source file for the template (as a function of the language).</li>
<li>Declaring the hole-to-value mapping to be used.</li>
</ul>
<p>Here&#8217;s that loading code for the above template file:</p>
<pre style="padding-left: 30px;"><span style="color: #003366;">module </span>User = Loader.Html(<span style="color: #003366;">struct</span>
  <span style="color: #003366;">type </span>t = &lt;
    id   : Id.t ;
    url  : string ;
    img  : string option ;
    name : string
  &gt; ;;
  <span style="color: #003366;">let </span>source  _ = <span style="color: #ff0000;">"users/list"</span>
  <span style="color: #003366;">let </span>mapping _ = [
    <span style="color: #ff0000;">"id"</span>,   Mk.esc (<span style="color: #003366;">fun </span>x -&gt; Id.to_string (x # id)) ;
    <span style="color: #ff0000;">"url"</span>,  Mk.esc (<span style="color: #003366;">fun </span>x -&gt; x # url) ;
    <span style="color: #ff0000;">"img"</span>,  Mk.esc (<span style="color: #003366;">fun </span>x -&gt; BatOption.default img404 (x # img)) ;
    <span style="color: #ff0000;">"name"</span>, Mk.esc (<span style="color: #003366;">fun </span>x -&gt; x # name)
  ]
<span style="color: #003366;">end</span>)

<span style="color: #003366;">module </span>UserList = Loader.Html(struct
  <span style="color: #003366;">type </span>t = &lt;
    users : User.t list
  &gt; ;;
  <span style="color: #003366;">let </span>source  _ = <span style="color: #ff0000;">"users"</span>
  <span style="color: #003366;">let </span>mapping lang = [
    <span style="color: #ff0000;">"list"</span>, Mk.list (<span style="color: #003366;">fun </span>x -&gt; x # users) (User.template lang)
  ]
<span style="color: #003366;">end</span>)</pre>
<p>One <em>view </em>is defined and loaded for every independent piece of HTML in the template. Here, there is a User view which represents the list item for a single user, repeated zero, one or more times ; and there is the UserList view representing the wrapper in which those list items will be placed.</p>
<p>The <code>{v:foobar}</code> syntax defines a variable hole. The corresponding view MUST define a mapping for that variable, or an error will occur at deployment time.</p>
<p>The <code>{{foobar: }}</code> syntax is a variant: in addition to declaring a variable hole, it also defines a sub-view, which can be loaded using <code>template/foobar</code> as the path.</p>
<p>The <code>{t:foobar}</code> syntax defines a translation hole. The template engine will automatically load the corresponding term from the internationalization dictionary used to render the template.</p>
<p>The <code>Mk.esc</code> and <code>Mk.list</code> are binding instructions which are used to compile the template to a virtual machine. The common binding instructions are:</p>
<ul>
<li><code>Mk.esc f</code> applies <code>f</code> to the data object, which returns a string. That string is then HTML-escaped and output.</li>
<li><code>Mk.str f</code> is the same as above, but the string is not HTML-escaped.</li>
<li><code>Mk.i18n f</code> is the same as above, but the string is translated as an internationalization term.</li>
<li><code>Mk.list f t</code> applies <code>f</code> to the data object, which returns a list of data objects compatible with template <code>t</code>. That template is then used to render those data objects in order.</li>
<li><code>Mk.list_or f t e</code> is the same as above, but if the returned list is empty, it instead uses template <code>e</code> to draw a « list is empty » message.</li>
<li><code>Mk.sub f t</code> applies <code>f</code> to the data object, which returns a single object compatible with template <code>t</code>. That template is then used to render the object.</li>
<li><code>Mk.sub_or f t e</code> is the same as above, but <code>f</code> returns an optional type. If it is missing, then template <code>e</code> is used to render an « object is missing » message.</li>
<li><code>Mk.text f</code> provides <code>f</code> with the current writing stream and internationalization object, so that it may directly write HTML to the output. This is how most rendering helpers such as « render a currency amount » are used.</li>
<li><code>Mk.box f</code> is the same as above, but the writing stream supports the addition of arbitrary javascript code to be executed by the client as part of rendering the template. This is how javascript-dependent rendering helpers such as « render a datepicker » are used.</li>
</ul>
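<p>To give a feel for how such binding instructions compose, here is a sketch of four of them as plain closures. It is a guess at the general shape, not Ozone&#8217;s implementation: the module is deliberately named <code>Mk_sketch</code>, a <code>Buffer</code> stands in for the writing stream, and the « list is empty » template is assumed to receive the same data object as the list itself.</p>
<pre style="padding-left: 30px;">(* A renderer writes the HTML for one data object into a buffer. *)
type 'a renderer = 'a -&gt; Buffer.t -&gt; unit

let html_escape s =
  let buf = Buffer.create (String.length s) in
  String.iter (function
    | '&lt;' -&gt; Buffer.add_string buf "&amp;lt;"
    | '&gt;' -&gt; Buffer.add_string buf "&amp;gt;"
    | '&amp;' -&gt; Buffer.add_string buf "&amp;amp;"
    | '"' -&gt; Buffer.add_string buf "&amp;quot;"
    | c -&gt; Buffer.add_char buf c) s;
  Buffer.contents buf

module Mk_sketch = struct
  (* Output the string as-is. *)
  let str f : 'a renderer = fun x buf -&gt; Buffer.add_string buf (f x)
  (* Output the string after HTML-escaping it. *)
  let esc f : 'a renderer = str (fun x -&gt; html_escape (f x))
  (* Render each element of the list with template t, in order. *)
  let list f (t : 'b renderer) : 'a renderer =
    fun x buf -&gt; List.iter (fun item -&gt; t item buf) (f x)
  (* Same, but fall back to template e when the list is empty. *)
  let list_or f t (e : 'a renderer) : 'a renderer =
    fun x buf -&gt; if f x = [] then e x buf else list f t x buf
end

(* Usage: escape each name, render the list, fall back when it is empty. *)
let item = Mk_sketch.esc (fun (name : string) -&gt; name)
let empty = Mk_sketch.str (fun (_ : string list) -&gt; "nobody")
let users = Mk_sketch.list_or (fun (names : string list) -&gt; names) item empty

let render names =
  let buf = Buffer.create 64 in
  users names buf;
  Buffer.contents buf</pre>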
<p>The data type is defined in the view itself, either explicitly (as I did above for the sake of clarity) or by using an existing type from your application: if the application already had a user module with the appropriate data type, I could have used that type instead.</p>
<p>By specifying views in this way, the data required to render a template is made available to the compiler for type-checking, and missing bindings are detected during deployment (usually to a local test server). This has made template-related errors exceedingly rare — once the HTML is done, it becomes extremely hard to use it wrong.</p>
<p>Although this feature is not currently in use, the virtual machine semantics also allow compiling it down to JavaScript. This would allow us to send the rendering code to the client as a one-time cost, and send a much smaller data package through AJAX whenever something new needs to be rendered.</p>
<h4>The CoffeeScript Layer</h4>
<p>We use CoffeeScript because it&#8217;s more elegant, shorter, and includes a compiling-to-javascript step that lets us detect syntax errors at deployment time. Yes, compile- and deployment-time are my favorite buzzwords, because I enjoy the feeling of safety that they bring.</p>
<p>As mentioned above, the actual CoffeeScript is removed from the template in a pre-processing step, and replaced with a hole that says « call JavaScript function #33 now » that happens to define a list of parameters matching the params attribute of the original script element.</p>
<p>So, starting with the script element from the example above:</p>
<pre style="padding-left: 30px;"><code>&lt;<span style="color: #003366;">script</span> <span style="color: #008000;">type</span>=<span style="color: #ff0000;">"text/coffeescript"</span> <span style="color: #008000;">params</span>=<span style="color: #ff0000;">"<span style="color: #ff6600;"><strong>save_url</strong></span>"</span>&gt;
list = @$.find <span style="color: #ff0000;">'ul'</span>

save = =&gt;
  ids = [];
  list.children(<span style="color: #ff0000;">'li'</span>).each -&gt;
    ids.push $(@).attr 'id'
  @ajax save_url, ids

list.children(<span style="color: #ff0000;">'li'</span>).sortable
  change: save
&lt;/<span style="color: #003366;">script</span>&gt;</code></pre>
<p>If this is the 33rd script tag encountered by the preprocessor, then it would append the following to the complete CoffeeScript file:</p>
<pre style="padding-left: 30px;">@j33 = (save_url) -&gt;
  list = @$.find <span style="color: #ff0000;">'ul'</span>

  save = =&gt;
    ids = [];
    list.children(<span style="color: #ff0000;">'li'</span>).each -&gt;
      ids.push $(@).attr 'id'
    @ajax save_url, ids

  list.children(<span style="color: #ff0000;">'li'</span>).sortable
    change: save</pre>
<p>And it would be replaced in the template file with this:</p>
<pre style="padding-left: 30px;">{j:j33:save_url}</pre>
<p>This syntax (which can be used manually, although it should be avoided) is a JavaScript hole: it runs the specified function and provides by-name values for the arguments. The parser would notice that we are declaring an HTML view instead of a JS/HTML view and complain about it, so we would have to go back and re-define it:</p>
<pre style="padding-left: 30px;"><span style="color: #003366;">module </span>UserList = Loader.JsHtml(struct
  <span style="color: #003366;">type </span>t = &lt;
    users : User.t list ;
    save_url : string
  &gt; ;;
  <span style="color: #003366;">let </span>source  _ = <span style="color: #ff0000;">"users"</span>
  <span style="color: #003366;">let </span>mapping lang = [
    <span style="color: #ff0000;">"list"</span>, Mk.list (<span style="color: #003366;">fun </span>x -&gt; x # users) (User.template lang)
  ]
  <span style="color: #003366;">let </span>script  _ = [
    <span style="color: #ff0000;">"save_url"</span>, (<span style="color: #003366;">fun </span>x -&gt; Json_type.String (x # save_url))
  ]
<span style="color: #003366;">end</span>)</pre>
<p>I have used <code>Loader.JsHtml</code> instead of <code>Loader.Html</code>, and defined a secondary mapping that is specific to JavaScript parameters, and which uses the data object to return JSON values.</p>
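<p>The extraction step described above (wrap the script body in a numbered function, leave a hole behind) is simple enough to sketch. The function names below are made up, and the way several parameters are separated inside the hole is a guess, since the example only shows a single parameter.</p>
<pre style="padding-left: 30px;">(* Indent every non-empty line of body by two spaces. *)
let indent body =
  String.split_on_char '\n' body
  |&gt; List.map (fun line -&gt; if line = "" then line else "  " ^ line)
  |&gt; String.concat "\n"

(* Turn the script body number index into a named CoffeeScript function. *)
let wrap_script ~index ~params body =
  "@j" ^ string_of_int index ^ " = (" ^ String.concat ", " params ^ ") -&gt;\n"
  ^ indent body ^ "\n"

(* The hole left in the template in place of the script element.
   Assumption: several parameters would be separated by colons. *)
let script_hole ~index ~params =
  "{j:j" ^ string_of_int index ^ ":" ^ String.concat ":" params ^ "}"</pre>
<p>With <code>~index:33</code> and <code>~params:["save_url"]</code>, <code>wrap_script</code> produces a definition that starts with <code>@j33 = (save_url) -&gt;</code> and <code>script_hole</code> yields <code>{j:j33:save_url}</code>.</p>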
<p>How is the JavaScript called? Well, it really depends on how your JavaScript library handles it. On non-AJAX HTTP, Ozone will try to inject all JavaScript calls in a script element at the end of the HTML body. In AJAX mode, Ozone allows you to render a template to a JSON object representing both the HTML and the JavaScript together, and it is the responsibility of the code that made the AJAX request to receive that object, place the HTML wherever applicable, and then &#8220;run the JavaScript&#8221;.</p>
<p>By convention, the JavaScript is called using a <em>client context</em> as its <code>this</code> value. The client context is an object which may contain whatever the caller finds interesting to place there, along with a variable named <code>$</code> which should be a jQuery selection containing the root element of the previously rendered HTML. Hence, <code>@$.find 'ul'</code> would select the list in the rendered HTML, instead of all the lists on the page.</p>
<h4>The LESS CSS Layer</h4>
<p>This is the least interesting of all three layers. The LESS CSS code is extracted, appended to a single file, and compiled to CSS (which, again, is a useful deployment-time syntax check). The point of this feature is simply to let the designer place element-specific CSS next to the element, instead of having it exist in an external file and cause trouble with asset garbage collection (can I remove this rule or is it still used anywhere?). External files still exist, though, for CSS rules that are not limited to a single template.</p>
<h4>Bonus : the triple hash</h4>
<p>How do I define some code that should be called when a button is clicked? Defining it directly in the onclick method is ugly, hard to read and does not let the application provide parameters, so what else can I do?</p>
<p>The solution is to use an intermediary global object that happens to be the same for the entire file — a pattern that stores any template-related JS in a global variable named after <code>__FILE__</code> !</p>
<p>Yes, it is a hack, but it&#8217;s a simple and useful one.</p>
<p>The only difference is that <code>__FILE__</code> is spelled <code>###</code>.</p>
<pre style="padding-left: 30px;"><code>&lt;<span style="color: #003366;">button </span><span style="color: #008000;">type</span>=<span style="color: #ff0000;">"button"</span> <span style="color: #008000;">onclick</span>=<span style="color: #ff0000;">"###.frobnicate()"</span>&gt;<span style="color: #ff6600;"><strong>{t:frobnicate}</strong></span>&lt;/<span style="color: #003366;">button</span>&gt;

&lt;<span style="color: #003366;">script </span><span style="color: #008000;">type</span>=<span style="color: #ff0000;">"text/coffeescript"</span> <span style="color: #008000;">params</span>=<span style="color: #ff0000;">"<span style="color: #ff6600;"><strong>message</strong></span>"</span>&gt;
###.frobnicate = -&gt;
  alert message
&lt;/<span style="color: #003366;">script</span>&gt; </code></pre>
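<p>As a rough sketch of how such a substitution could work, a preprocessor only needs a plain text replacement. The helper name <code>expandTripleHash</code> and the way the global name is derived from the file name are assumptions made for illustration, not the actual implementation:</p>
<pre style="padding-left: 30px;"><code>// Hypothetical helper: derive a global name from the template file name,
// then replace every "###" in the template source with that name.
function expandTripleHash(source, fileName) {
  // Keep only characters that are valid in a JavaScript identifier.
  var globalName = "__tpl_" + fileName.replace(/[^A-Za-z0-9_]/g, "_");
  return source.split("###").join(globalName);
}

var expanded = expandTripleHash("###.frobnicate()", "views/button.html");</code></pre>
<p>The global object itself would still have to be created once per file before any handler is attached to it.</p>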
<p><small>Article Image &copy; gdbg12 &mdash; <a href="http://www.flickr.com/photos/78168499@N00/408879493/">Flickr</a></small></p>
]]></content:encoded>
			<wfw:commentRss>http://www.nicollet.net/2011/10/ozone-templating/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Let&#8217;s Whine About Google Dart</title>
		<link>http://www.nicollet.net/2011/10/lets-whine-about-google-dart/</link>
		<comments>http://www.nicollet.net/2011/10/lets-whine-about-google-dart/#comments</comments>
		<pubDate>Wed, 12 Oct 2011 14:19:30 +0000</pubDate>
		<dc:creator>Victor Nicollet</dc:creator>
				<category><![CDATA[Functional]]></category>
		<category><![CDATA[Dart]]></category>
		<category><![CDATA[Javascript]]></category>
		<category><![CDATA[Language Design]]></category>

		<guid isPermaLink="false">http://www.nicollet.net/?p=2583</guid>
		<description><![CDATA[What&#8217;s wrong about 17259 lines of code ? asks Andrea Giammarchi, following the publication of a gist that shows an 18-line Hello-World application written in Dart, the new Google-sponsored browser-based language, and the corresponding 17259 lines of JavaScript that the Dart source code would be compiled to. Yes, 17259 lines of code is quite terrifying, [...]]]></description>
			<content:encoded><![CDATA[<p><img class="aligncenter size-full wp-image-2584" title="melee" src="http://www.nicollet.net/wp-content/uploads/2011/10/melee.png" alt="" width="675" height="100" /></p>
<p><a href="http://webreflection.blogspot.com/2011/10/what-is-wrong-about-17259-lines-of-code.html" target="_blank">What&#8217;s wrong about 17259 lines of code ?</a> asks Andrea Giammarchi, following the publication of a <a href="https://gist.github.com/1277224" target="_blank">gist</a> that shows an 18-line Hello-World application written in Dart, the new Google-sponsored browser-based language, and the corresponding 17259 lines of JavaScript that the Dart source code would be compiled to.</p>
<p>Yes, 17259 lines of code is quite terrifying, and reading through the code does make one ponder what those engineers at Google were smoking and how they can ever expect such a language to overtake JavaScript — or even manage to be used in the first place.</p>
<p>And then, there&#8217;s a loud WHOOSH: the sound of a point flying so high over the collective heads of Dart critics that it will probably collide with some garbage in low earth orbit.</p>
<p>The point is that <strong>this is version 0.01 of a language designed for optimization through static analysis</strong>.</p>
<p>I&#8217;ll be the first to admit that there is no way for a Hello-World application to execute all those 17259 lines. This is known as dead code: pieces of functionality which might be useful to some applications but not to others, which are included because it&#8217;s easier for the developer, but which will never run.</p>
<p>I can understand why Andrea is upset. Dead code is a big no-no in JavaScript, because it has to be downloaded and parsed by every user, which costs bandwidth and slows down everything. No, wait. Let me correct that last statement. Dead code is a big no-no in JavaScript <em>because the JavaScript language makes it impossible to have automatic dead code elimination</em>.</p>
<p>In JavaScript, you can have a global function named <code>foobar</code> which is not even once referenced in the entire code base, and it would still be unsafe to remove it because at some point the code fetches a <code>funcname</code> variable through AJAX and executes <code>window[funcname]()</code> — and it just so happens that the server returns <code>"foobar"</code> on the night of the full moon. In short, there&#8217;s no way in hell an automated tool can remove a single line of code from JavaScript without breaking at least one possible program. That&#8217;s just the way JavaScript is — objects are key-value containers with string keys.</p>
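<p>To make the problem concrete, here is a minimal sketch. The <code>globals</code> object stands in for <code>window</code> so that it runs outside a browser, and <code>callByName</code> is a name I made up for the example:</p>
<pre style="padding-left: 30px;"><code>// A stand-in for the browser's global object, so the sketch runs anywhere.
var globals = {
  foobar: function () { return "foobar was called"; }
};

// The function name arrives as data, for instance from an AJAX response,
// so no tool reading the source can tell which property will be used.
function callByName(funcname) {
  return globals[funcname]();
}

var result = callByName("foo" + "bar");</code></pre>
<p>Removing <code>foobar</code> because no call site mentions it by name would break the last line.</p>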
<p>Of course, you could decide to follow rules that prevent such things from happening. There&#8217;s at least one convention, for instance, which is used by minifiers to try and be a little more than glorified whitespace removers: variable renaming. If you have a local variable in JavaScript, you can rename it and all its occurrences and the change will never be visible outside the scope it was defined in <em>because the variable itself is not visible outside</em>. This lets you rename <code>aQuiteLongVariableName</code> to <code>g</code>, which makes your code lighter. And it breaks as soon as one of your functions uses <em>eval</em> on an arbitrary string which is provided by the server and references <code>aQuiteLongVariableName</code>. So, the convention is « do not use <em>eval</em> » but it&#8217;s still a convention, not a language specification, so renaming has to be opt-in and clearly document its requirements.</p>
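<p>A small sketch of that failure, with the renaming done by hand (the function names <code>original</code> and <code>minified</code> are only there for the example):</p>
<pre style="padding-left: 30px;"><code>// Original function: the evaluated string refers to the local variable.
function original(codeFromServer) {
  var aQuiteLongVariableName = 42;
  return eval(codeFromServer);
}

// The same function after the local variable was renamed to "g".
function minified(codeFromServer) {
  var g = 42;
  return eval(codeFromServer);
}

var snippet = "aQuiteLongVariableName + 1";
var before = original(snippet);   // the variable is in scope
var after;
try {
  after = minified(snippet);
} catch (e) {
  after = e.name;                 // the variable no longer exists
}</code></pre>
<p>The first call evaluates to 43, while the second one ends up with a <code>ReferenceError</code>.</p>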
<p>Dart was designed not to have these issues, so it does not matter if version 0.01 of the Dart-to-JavaScript compiler fails to perform dead code removal: it can be implemented, and Google engineers would indeed be fools not to implement it before the 1.0 release.</p>
<p>There are also questions to be asked about the actual contents of those 17259 lines: when they <em>are</em> used, will they make sense?</p>
<p>One of the first things that I notice is, as Andrea mentioned, the various global binding functions such as:</p>
<pre style="padding-left: 30px;"><code>function $bind1_0(fn, thisObj, scope) {
  return function() {
    return fn.call(thisObj, scope);
  }
}</code></pre>
<p>This is actually an extremely important optimization, but not related to speed — it&#8217;s about memory. To understand why, look at this code:</p>
<pre style="padding-left: 30px;"><code>var div = document.createElement("div");
var message = "Hello";
div.onclick = function(){
  alert (message)
};</code></pre>
<p>The anonymous function here is a closure: it must carry the variable <code>message</code> with it in order to access it when it is called. In any sane language, the interpreter would store a reference to that variable in the closure, and be done with it. JavaScript cannot (again, because of <em>eval</em>) so it has to store the entire scope that the function was defined in, which includes both <code>message</code> and <code>div</code>. In some versions of IE, having such a circular reference between a DOM element and a JavaScript value creates a memory leak which leads to massive performance degradation, but even in better browsers than IE, the function will keep this <code>div</code> alive for longer than is really necessary, thus consuming memory for nothing.</p>
<p>Dart has the ability to perform this scope analysis and determine that only the message needs to be included. The truckload of bind functions helps implement this by utterly eliminating the JavaScript closure over the enclosing scope:</p>
<pre style="padding-left: 30px;"><code>function onclick(message) {
  alert(message);
}

var div = document.createElement("div");
var message = "Hello";
div.onclick = $bind1_0(onclick,div,message);</code></pre>
<p>Another criticism from Andrea was the massive amount of wrappers around elementary functions, such as setting an array element to a value or adding two values together, which should bring JavaScript-from-Dart programs to a screeching halt because of all the involved overhead.</p>
<p>They won&#8217;t. Trust me. This isn&#8217;t the first time I have seen such wrappers, and I know what they are and why they exist. They are generic adapters — fallback operations to be used when the static analysis step cannot determine precisely the types of the objects involved.</p>
<p>Quick. Translate the following code from Dart to JavaScript by hand in an optimal manner:</p>
<pre style="padding-left: 30px;"><code>swap(a,i,j) {
  var t = a[i];
  a[i] = a[j];
  a[j] = t;
}</code></pre>
<p>Chances are, you answered something like this:</p>
<pre style="padding-left: 30px;"><code>function swap(a,i,j) {
  var t = a[i];
  a[i] = a[j];
  a[j] = t;
}</code></pre>
<p>You would be wrong. You are under the mistaken impression that, because it uses the []-syntax, the Dart function works with arrays when in fact it works with any object that defines that access operator.</p>
<p>So, on your second attempt, you would answer something like this:</p>
<pre style="padding-left: 30px;"><code>function swap(a,i,j) {
  var t = ugly_getter.call(a,i);
  ugly_setter.call(a,i,ugly_getter.call(a,j));
  ugly_setter.call(a,j,t);
}</code></pre>
<p><strong>And you would be wrong again</strong>. The key property of Dart (as well as other languages which encounter the same problem: OCaml, C#, C++, Java&#8230; anything with generic polymorphism actually) is that it allows static analysis. Static analysis lets you determine what the types of those variables might be. Is the swap function never called on anything other than an array? Then only define version one. Is it called on many things, but most often arrays? Define both functions, and use the specific array-compatible one whenever you are certain an array will be passed. Is it never called? Remove it&#8230;</p>
<p>This kind of analysis is reasonably simple to perform, and has the benefit of being able to create both highly-optimized code for primitive types and short generic code for abstract types. And you can even help the compiler by annotating types.</p>
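<p>To illustrate, here is what a compiler could emit once it knows which call sites always receive arrays. This is a hand-written sketch, not actual Dart output, and the <code>get</code> and <code>set</code> method names are hypothetical stand-ins for the index operators:</p>
<pre style="padding-left: 30px;"><code>// Specialized version: valid when the argument is known to be an array.
function swapArray(a, i, j) {
  var t = a[i];
  a[i] = a[j];
  a[j] = t;
}

// Generic version: works with any object exposing get and set methods.
function swapGeneric(a, i, j) {
  var t = a.get(i);
  a.set(i, a.get(j));
  a.set(j, t);
}

// A call site where analysis proved the argument is an array.
var numbers = [1, 2, 3];
swapArray(numbers, 0, 2);

// A call site where the argument is some other indexable object.
var store = { x: "left", y: "right" };
var wrapper = {
  get: function (key) { return store[key]; },
  set: function (key, value) { store[key] = value; }
};
swapGeneric(wrapper, "x", "y");</code></pre>
<p>Only the second call site pays for the method calls; the first one compiles down to plain array accesses.</p>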
<p>Again, it does not matter whether Dart 0.0.1 actually performs these optimizations. What matters is that they are both possible and easy given the language design, and I expect them to be present in release 1.0.</p>
<p>I am by no means a Google zealot or secret Dart admirer. In fact, from what I have seen so far, I would sooner <em>loathe</em> Dart than admire it: a bastard type-checking that relies on warnings? A bland Java-like syntax for brace weenies? No actual support for immutable data structures? Blech. Give me <span style="text-decoration: line-through;">JavaScript</span> CoffeeScript any day. I&#8217;ll switch to Dart when it takes over the world.</p>
<p>But the current crapstorm is just ridiculous. Come on, the guys at Google are pushing out a 0.0.1 proof-of-concept and you&#8217;re complaining that their optimizer isn&#8217;t acceptable? Please. <em>I&#8217;m not even sure there&#8217;s an optimizer yet</em>. From the language specs, it&#8217;s fairly obvious that many optimization avenues unavailable to JavaScript will be available to Dart; the only question is whether release 1.0 will have them or not. Wait and see: criticizing the optimization of a 0.0.1 compiler is just silly.</p>
<p><small>Article Image &copy; Erich Ferdinantd &mdash; <a href="http://www.flickr.com/photos/erix/5876052552/">Flickr</a></small></p>
]]></content:encoded>
			<wfw:commentRss>http://www.nicollet.net/2011/10/lets-whine-about-google-dart/feed/</wfw:commentRss>
		<slash:comments>30</slash:comments>
		</item>
		<item>
		<title>Functional Programming</title>
		<link>http://www.nicollet.net/2011/10/functional-programming/</link>
		<comments>http://www.nicollet.net/2011/10/functional-programming/#comments</comments>
		<pubDate>Thu, 06 Oct 2011 11:01:12 +0000</pubDate>
		<dc:creator>Victor Nicollet</dc:creator>
				<category><![CDATA[Functional]]></category>
		<category><![CDATA[Algorithms]]></category>
		<category><![CDATA[Javascript]]></category>
		<category><![CDATA[Language Design]]></category>
		<category><![CDATA[Learning]]></category>
		<category><![CDATA[Objective Caml]]></category>

		<guid isPermaLink="false">http://www.nicollet.net/?p=2561</guid>
		<description><![CDATA[Yesterday evening, I was present at Paris Hackers, a meetup for Hacker News enthusiasts — and a «hacker» in this context is not someone who breaks into software systems, but someone who is passionate beyond words about how software works. But the definition is vague, and that vagueness is part of my surprise. We had a [...]]]></description>
			<content:encoded><![CDATA[<p><img class="aligncenter size-full wp-image-2562" title="hackers" src="http://www.nicollet.net/wp-content/uploads/2011/10/hackers.png" alt="" width="675" height="100" /></p>
<p>Yesterday evening, I was present at <a href="http://parishackers.org/">Paris Hackers</a>, a meetup for <a href="http://news.ycombinator.com/" target="_blank">Hacker News</a> enthusiasts — and a «hacker» in this context is not someone who breaks into software systems, but someone who is passionate beyond words about how software works. But the definition is vague, and that vagueness is part of my surprise.</p>
<p>We had a little chat on the topic of recent buzzwords — concepts that people were interested in, but not everyone understood. One of them was Functional Programming. The speaker asked who knew enough about it to explain to the others, and I naturally raised my hand: I was taught functional programming formally, I taught it myself, I implemented a functional programming language, and I am using functional programming professionally on a daily basis.</p>
<p>My surprise was that I was the only one. I had sort of assumed that a majority of «hackers» would know what functional programming is. I will have to revise that definition.</p>
<h3>« Having Functions »</h3>
<p>Functional programming itself has a definition problem: depending on whom you ask, you can get two different definitions of functional programming. One definition is what I would call <em>pure</em> functional programming, and the other is better summarized as « having functions ».</p>
<p>Of course all languages have something they call «functions» but I strongly believe that there is a list of minimum requirements before you are allowed to say that.</p>
<ol>
<li>You should be able to define a FOOBAR anywhere. At global scope. Inside a class. Inside another function. Inside an expression. It can be an anonymous FOOBAR, or it can be given a name, depending on the situation.</li>
<li>You should be able to store a FOOBAR in a variable, pass it as an argument to a function, and return it from a function. If your language is strongly typed, it should have an easily described type as well.</li>
</ol>
<p>Replace FOOBAR with «integer» or «string» and you will notice that without these requirements one can hardly say a language «supports integers» or «supports strings» — what if you could not store integers in variables, or write string literals as part of expressions?</p>
<p>These requirements are called «having FOOBARs as first-class citizens» by the way. In C for instance, functions and arrays are second-class citizens, and integers are first-class citizens. In JavaScript, functions are first-class citizens.</p>
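<p>As a quick illustration of both requirements in JavaScript (all the names below are made up for the example):</p>
<pre style="padding-left: 30px;"><code>// Defined inside an expression and stored in a variable.
var timesTwo = function (x) { return x * 2; };

// Passed as an argument to another function.
function applyTwice(f, x) { return f(f(x)); }

// Defined inside a function and returned from it.
function adder(n) {
  return function (x) { return x + n; };
}

var eight = applyTwice(timesTwo, 2);
var addFive = adder(5);
var twelve = addFive(7);</code></pre>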
<p>Allowing functions helps make a language a lot more expressive. Let&#8217;s think of a quick example: performing an HTTP request and displaying the result in a window. In pseudocode that looks shamefully like JavaScript code, it would look like this:</p>
<pre style="padding-left: 30px;"><code><span style="color: #000080;">function </span>display(url) {
  <span style="color: #000080;">var </span>page = http(url);
  <span style="color: #000080;">return new</span> Window(page);
}</code></pre>
<p>This function has an obvious limitation: if it takes five seconds to load the data through HTTP, then the program will freeze for five seconds and only then display the window. A better solution would be to make the request asynchronous: a window in a «loading» state is returned immediately, and a background process takes care of the HTTP requests and fills the window with the contents once they have been received.</p>
<p>A clean way of encapsulating this would be to let the <code>http</code> function handle the asynchronous behavior. The calling code would only need to provide a description of what needs to be done once the contents have been received. There are two ways of doing this in an object-oriented world.</p>
<p><strong>Object-oriented solution #1</strong>: inherit from an http request class.</p>
<pre style="padding-left: 30px;"><code><span style="color: #000080;">class </span>GetPageRequest <span style="color: #000080;">extends </span>HttpRequest {

  <span style="color: #008000;">// We need to know what window to put the contents in</span>
  <span style="color: #000080;">function </span>GetPageRequest(window,url) {
    <span style="color: #000080;">super</span>(url);
    <span style="color: #000080;">this</span>.window = window;
  } 

  <span style="color: #008000;">// Override this function to react to the data being received</span>
  <span style="color: #000080;">function </span>onSuccess(page) {
    <span style="color: #000080;">this</span>.window.setContents(page);
  }
}

<span style="color: #000080;">function </span>display(url) {
  <span style="color: #000080;">var </span>window = <span style="color: #000080;">new </span>Window(loading);
  <span style="color: #000080;">var </span>request = <span style="color: #000080;">new </span>GetPageRequest(window,url);
  request.runAsynchronous();
  <span style="color: #000080;">return </span>window;
}</code></pre>
<p><strong>Object-oriented solution #2</strong>: implement an «HTTP success event handler» interface.</p>
<pre style="padding-left: 30px;"><code><span style="color: #000080;">class </span>OnReceivePage <span style="color: #000080;">implements </span>IHttpRequestSuccessHandler {

  <span style="color: #008000;">// What window do we need to fill ?</span>
  <span style="color: #000080;">function </span>OnReceivePage(window) {
    <span style="color: #000080;">this</span>.window = window;
  }

  <span style="color: #008000;">// Handle success (as previous example)</span>
  <span style="color: #000080;">function </span>onSuccess(page) {
    <span style="color: #000080;">this</span>.window.setContents(page);
  }
}

<span style="color: #000080;">function </span>display(url) {
  <span style="color: #000080;">var </span>window = <span style="color: #000080;">new </span>Window(loading);
  http(url, <span style="color: #000080;">new </span>OnReceivePage(window));
  <span style="color: #000080;">return </span>window;
}</code></pre>
<p>Both solutions are quite long, because object-oriented solutions involve a certain amount of syntactic overhead. In both cases, the only code that is actually relevant to our work is the contents of the <code>onSuccess</code> function. What if the language allowed us to define that function directly, and pass it to the HTTP request without any syntactic fuss?</p>
<p><strong>Functional solution #1</strong>: local function definition.</p>
<pre><code><span style="color: #000080;">function </span>display(url) {
  <span style="color: #000080;">var </span>window = <span style="color: #000080;">new </span>Window(loading);
  <span style="color: #000080;">function </span>onSuccess(page) { window.setContents(page); }
  http(url, onSuccess);
  <span style="color: #000080;">return </span>window;
}</code></pre>
<p>In fact, why bother with giving the function a name? Its contents are obvious enough, so it would be shorter and cleaner to just pass it to the HTTP request without any name.</p>
<p><strong>Functional solution #2</strong>: anonymous function definition, or «lambda»</p>
<pre><code><span style="color: #000080;">function </span>display(url) {
  <span style="color: #000080;">var </span>window = <span style="color: #000080;">new </span>Window(loading);
  http(url,<span style="color: #000080;">function</span>(page) { window.setContents(page); });
  <span style="color: #000080;">return </span>window;
}</code></pre>
<p>The latter example is syntactically correct JavaScript code, by the way. As you can see, the code is both shorter and more readable than the class-based version. In general, using inheritance or interface implementation to define a single method is always significantly more wasteful than using an anonymous function.</p>
<p>There&#8217;s a third requirement that I left unvoiced. Look back at the previous examples again. Notice how the class-based examples have to store the window object explicitly, but the functional versions use the variable directly? This is known as «closures»: when defining a function inside a function, the defined function actually carries with it all the variables that were present when it was defined. <strong>Functional languages always have closures</strong>, because any non-trivial operation you might wish to place in an anonymous function will involve accessing data that was present near the definition.</p>
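<p>A minimal sketch of a closure at work, using a hypothetical <code>makeCounter</code> function: the inner function keeps using <code>count</code> long after the outer function has returned.</p>
<pre style="padding-left: 30px;"><code>function makeCounter() {
  var count = 0;                  // lives on after makeCounter returns
  return function () {
    count = count + 1;
    return count;
  };
}

var next = makeCounter();
var other = makeCounter();        // has its own, separate count
var first = next();
var second = next();
var independent = other();</code></pre>
<p>Each call to <code>makeCounter</code> creates a fresh variable, so the two counters do not interfere with each other.</p>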
<p>In summary: functional programming involves the ability to manipulate functions as you would manipulate any other data type (define anywhere, store, pass, return&#8230;) and allows writing more concise programs whenever you need to pass, store or return some behavior to be executed later.</p>
<p>Examples of functional languages:</p>
<ul>
<li>JavaScript</li>
<li>Lisp (including Common Lisp, Scheme &#8230;)</li>
<li>ML (including SML, OCaml, F# &#8230;)</li>
<li>Haskell</li>
<li>Ruby</li>
<li>C#</li>
</ul>
<p>Examples of near-functional languages:</p>
<ul>
<li>PHP has only supported lambdas and closures since 5.3, and its closure syntax is suboptimal.</li>
<li>Python has 99% support; the only issue is that lambdas are limited to a single expression.</li>
<li>XSLT 2.0 has no anonymous functions, but is otherwise functional. I kid you not.</li>
</ul>
<p>Examples of non-functional languages:</p>
<ul>
<li>Java</li>
<li>C</li>
<li>C++</li>
</ul>
<h3>Pure Functional Programming</h3>
<p>This is the other definition of «Functional Programming» that you can hear, and the confusion is understandable: languages that match definition #2 have long been the only examples of definition #1 as well.</p>
<p>The relevant concept is purity: can the impact of a function on the rest of the program be described only in terms of its <em>return value</em>, and can the impact of the rest of the program on the function be described only in terms of its <em>arguments</em>?</p>
<p>Consider the signature of an interface method:</p>
<pre style="padding-left: 30px;"><code>public int frobnicate(Foo f);</code></pre>
<p>There are many ways in which that function can affect the rest of the program:</p>
<ul>
<li>It could call a global function to perform changes in the global state, such as altering a singleton, setting a global variable, printing data to the screen, sending data over a socket&#8230;</li>
<li>It could change member variables of the object it is called on.</li>
<li>It could call methods on its argument.</li>
<li><strong>It could return a new value that is then used by the calling code.</strong></li>
</ul>
<p>There are also many ways in which the rest of the program could affect the behavior of the function:</p>
<ul>
<li>It could depend on global variables to decide what it should do.</li>
<li>It could read global state (user input, files, sockets&#8230;)</li>
<li><strong>It could access the data of the object it is called on.</strong></li>
<li><strong>It could access the data of its argument.</strong></li>
</ul>
<p>The entries marked in bold can be deduced from the method signature: having a return type means you want to return something, being non-static means you wish to read the state of the object you are called on, having an argument means you want to read data from that argument. The other entries are possible, but not readily apparent in the signature.</p>
<p>Pure functional programming is the principle of least surprise: sure, those entries that are not in bold are possible, but it would be surprising if they happened, so they are in fact not allowed. <strong>A pure function can only read data from its arguments and can only write data by returning it</strong>.</p>
<p>Also, pure functions imply that data is immutable — if functions are not allowed to change anything, then they are not allowed to change data structures, so the data structures are by definition immutable.</p>
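<p>Here is a small sketch of the difference, with two hypothetical functions that look alike from the outside:</p>
<pre style="padding-left: 30px;"><code>var total = 0;

// Impure: reads and writes a variable that is not in its signature.
function addToTotal(x) {
  total = total + x;
  return total;
}

// Pure: the result depends only on the arguments, and nothing else changes.
function add(a, b) {
  return a + b;
}

var impure1 = addToTotal(5);
var impure2 = addToTotal(5);      // same argument, different result
var pure1 = add(5, 5);
var pure2 = add(5, 5);            // same arguments, same result</code></pre>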
<p>By the way, pure functional programming is all about what you cannot do — so you could in theory write functional programming in <em>any</em> language just by following coding conventions to that end. In practice, immutability is quite hard to respect if there is no syntax for making it easier — you spend a lot of time creating manual copies of objects in order to change a single field.</p>
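<p>For instance, in a language without dedicated syntax, changing one field without mutating the original could look like the sketch below. The helper name <code>withField</code> is mine, and the copy is shallow:</p>
<pre style="padding-left: 30px;"><code>// Returns a copy of obj where one field has a new value; obj is untouched.
function withField(obj, field, value) {
  var copy = {};
  Object.keys(obj).forEach(function (key) {
    copy[key] = obj[key];
  });
  copy[field] = value;
  return copy;
}

var alice = { name: "Alice", age: 30 };
var older = withField(alice, "age", 31);</code></pre>
<p>Every single-field update pays for a full copy of the object, which is exactly the kind of busywork that dedicated syntax removes.</p>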
<p>Working with pure functions and immutable data structures involves both benefits and trade-offs when compared to non-pure programming. Think of it as a slider which goes between «0% Pure» and «100% Pure» and you pick one value for your entire program. Many algorithms and concepts get easier to understand, change and manipulate as the slider moves towards the 100% end, but there are some concepts that are intrinsically non-pure and actually get harder and harder to think about as your slider moves away from the 0% end. For instance, working with a persistent data store is an intrinsically mutable endeavor, and it&#8217;s hard (but possible) to think about it in pure terms.</p>
<p>I find that a slider value of 95% in a language that makes such a value possible is the optimal place to be — the 5% includes writing code that is truly non-pure (such as database manipulation, responding to HTTP requests&#8230;) as well as code that is non-pure on the inside but is actually pure from the outside (such as memoization and caching).</p>
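<p>Memoization is a good example of that second category. In the sketch below, the hypothetical <code>memoize</code> helper mutates a cache internally, yet callers always get the same result for the same argument. It assumes a one-argument function whose argument is a primitive value, and the <code>calls</code> counter exists only to make the caching visible:</p>
<pre style="padding-left: 30px;"><code>// Wraps a one-argument function with a cache. The cache is mutable state,
// but callers cannot observe it: results are unchanged.
function memoize(f) {
  var cache = {};
  return function (x) {
    var key = String(x);
    if (!Object.prototype.hasOwnProperty.call(cache, key)) {
      cache[key] = f(x);
    }
    return cache[key];
  };
}

var calls = 0;
function slowSquare(x) {
  calls = calls + 1;              // instrumentation for the example only
  return x * x;
}

var fastSquare = memoize(slowSquare);
var a1 = fastSquare(12);
var a2 = fastSquare(12);          // served from the cache</code></pre>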
<p><small>Article Image &copy; Tim Olson &mdash; <a href="http://www.flickr.com/photos/timmyo/4629980461/">Flickr</a></small></p>
]]></content:encoded>
			<wfw:commentRss>http://www.nicollet.net/2011/10/functional-programming/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Using OCaml from CouchDB views</title>
		<link>http://www.nicollet.net/2011/08/using-ocaml-from-couchdb-views/</link>
		<comments>http://www.nicollet.net/2011/08/using-ocaml-from-couchdb-views/#comments</comments>
		<pubDate>Sun, 07 Aug 2011 15:27:46 +0000</pubDate>
		<dc:creator>Victor Nicollet</dc:creator>
				<category><![CDATA[Functional]]></category>
		<category><![CDATA[CouchDB]]></category>
		<category><![CDATA[Objective Caml]]></category>

		<guid isPermaLink="false">http://www.nicollet.net/?p=2488</guid>
		<description><![CDATA[What follows is directly taken from my latest GitHub project, which provides an adapter for transforming OCaml applications into CouchDB view servers. The programmer writes an OCaml application that exports one or more map and reduce functions using the API found in module CouchAdapter, and creates a CouchDB design document that specifies the application path [...]]]></description>
			<content:encoded><![CDATA[<p><img class="aligncenter size-full wp-image-2489" title="plug" src="http://www.nicollet.net/wp-content/uploads/2011/08/plug.png" alt="" width="675" height="100" /></p>
<p>What follows is directly taken from <a href="https://github.com/VictorNicollet/CouchDB-OCaml-Adapter" target="_blank">my latest GitHub project</a>, which provides an  adapter for transforming OCaml applications into CouchDB view servers.  The programmer writes an OCaml application that exports one or more map  and reduce functions using the API found in module <code>CouchAdapter</code>,  and creates a CouchDB design document that specifies the application  path and the name of the exported functions. The adapter server then  receives evaluation requests from CouchDB and passes them to the  application, and returns the result back to CouchDB.</p>
<p>The objective of this project is <em>not</em> to support writing OCaml code directly into views! The OCaml code <em>should</em> follow the standard build procedure, the only exception being that the <code>CouchAdapter</code> API is used to export that code and make it available to the adapter server.</p>
<h3>Requirements and setup</h3>
<p>This adapter uses <a href="http://martin.jambon.free.fr/json-wheel.html"><strong>json-wheel</strong></a> for representing JSON values, and the build process requires <a href="http://brion.inria.fr/gallium/index.php/Ocamlbuild">OCamlBuild</a>. There are no other direct dependencies.  Building the adapter server <code>runServer</code> is fairly straightforward: <code>make byte</code> or <code>make native</code> generates <code>runServer.byte</code> or <code>runServer.native</code> respectively. Move the resulting application to an appropriate location  on your system and allow CouchDB to execute it. My suggestion is:</p>
<pre style="padding-left: 30px;"><code>cp runServer.native /usr/bin/couch-ml-adapter
chmod a+x /usr/bin/couch-ml-adapter
</code></pre>
<p>I will be assuming this convention for the rest of this manual. Once  the server is built and installed, you need to configure CouchDB to  actually use that adapter to execute OCaml views. Edit the <code>local.ini</code> configuration file of your CouchDB server (usually found in <code>/etc/couchdb/local.ini</code>) and add the following lines:</p>
<pre style="padding-left: 30px;"><code>[query_servers]
ocaml=/usr/bin/couch-ml-adapter
</code></pre>
<p>Depending on your configuration, there might already be a <code>[query_servers]</code> section. If that is the case, add the second line to that section. If you have trouble configuring your query servers, <a href="http://wiki.apache.org/couchdb/View_server#The_View_Server">read the CouchDB documentation</a>.</p>
<p>Errors that happen while executing the adapter will appear in the CouchDB logs (usually found in <code>/var/log/couchdb/couch.log</code>).</p>
<h3>Architecture</h3>
<h4>Query Servers</h4>
<p>The CouchDB server usually evaluates map and reduce functions only  when a design document containing those functions is queried by a  client, by following this process:</p>
<ul>
<li>If the query server configured in <code>local.ini</code> is not already running, start it.</li>
<li>Send various instructions to the query server&#8217;s STDIN, such as &#8220;apply the map function F to document D&#8221;</li>
<li>Read the results on the query server&#8217;s STDOUT.</li>
</ul>
<h4>The Adapter Server</h4>
<p>The adapter server provided by this project is one such query server.  When it must apply a function to a document, it does the following:</p>
<ul>
<li>Determine which application provides the function.</li>
<li>If the application is not already running, start it.</li>
<li>Send the request to the application&#8217;s STDIN, read the answer on its STDOUT.</li>
<li>If the application responds with results, send these back to CouchDB.</li>
</ul>
<p>In short, the overall architecture looks like this:</p>
<pre><code>+---------+         +------------------------+
|         | &lt;-----&gt; |  Haskell Query Server  |
|         |         +------------------------+
|         |
|         |         +------------------------+
| CouchDB | &lt;-----&gt; | Brainfuck Query Server |
|         |         +------------------------+
|         |
|         |         +------------------------+
|         | &lt;-----&gt; |                        | &lt;-----&gt; [ Application /home/nicollet/test ]
+---------+         |  OCaml Adapter Server  |
                    |                        | &lt;-----&gt; [ Application /usr/bin/foo ]
                    +------------------------+
</code></pre>
<p>The programmer should therefore write an application which reads the  adapter requests on STDIN, runs the requested functions on the provided  documents, and sends the results back on STDOUT. All the boilerplate  involved is handled by the <code>CouchAdapter</code> module, so that the actual development process you will be following is:</p>
<ul>
<li>Include any modules you might need to use in your view.</li>
<li>Define the map or reduce function as an OCaml function.</li>
<li>Register that function as being exported with <code>CouchAdapter.export_map</code> and <code>CouchAdapter.export_reduce</code>.</li>
<li>Call <code>CouchAdapter.export()</code></li>
</ul>
<h4>Importing From CouchDB</h4>
<p>CouchDB references map and reduce functions in design documents, using the following syntax:</p>
<pre style="padding-left: 30px;"><code>{ "_id" : "_design/..."  ,
  "language" : "...",
  "views" : {
    "foobar" : { "map" : ... },
    "quxbaz" : { "map" : ... , "reduce" : ... }
  }
}
</code></pre>
<p>In order to use the OCaml adapter, one must first set the language property to <code>"ocaml"</code>. Then, to reference the function <code>"extract_foo"</code> defined in application <code>/usr/bin/foo</code>, one would write:</p>
<pre style="padding-left: 30px;"><code>"views" : {
  "foobar" : { "map" : ["/usr/bin/foo", 1, "extract_foo"] }
}
</code></pre>
<p>The same syntax applies for reduce functions as well. The three components of the definition are <strong>1-</strong> the absolute path to the application that exports the function (this is how the adapter server knows what application to run), <strong>2-</strong> a version number discussed in the next section and <strong>3-</strong> the name under which the function is exported from that application.</p>
<h4>Function versions</h4>
<p>For performance reasons, once an application or query server has been started, it is never shut down. This only causes problems when a new version of the code needs to be deployed. The adapter server provides a versioning system which automatically detects that a function is out of date and restarts the application that exports it.</p>
<p>A CouchDB design document requests a function that is <em>at least</em> a certain version. For instance, <code>["/usr/bin/foo", 42, "extract_foo"]</code> indicates that the adapter server should find version 42 <em>or greater</em> of the function <code>"extract_foo"</code> exported by application <code>/usr/bin/foo</code>. If that application is currently running <em>and the function is either missing or older than version 42</em>, then the application is shut down and started anew in a completely transparent fashion.</p>
<p>Note that if rebooting the application <em>still</em> fails to  provide an appropriate version of the function, the adapter server will  report an error, which CouchDB will propagate to the client. This makes  all the views inside the design document unavailable until an  appropriate version of the application is deployed.</p>
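<p>The version rule described above boils down to a small decision function. The sketch below is a model of that rule, not the adapter's code: the type <code>decision</code>, the function <code>decide</code> and its labels are hypothetical names chosen for the illustration.</p>

```ocaml
(* What the adapter server does with a request, in this simplified model. *)
type decision =
  | Use      (* the running application exports a recent enough version *)
  | Restart  (* the function is missing or too old: restart, then check again *)
  | Fail     (* even a freshly started application cannot satisfy the request *)

(* [exported] is the version currently exported under the requested name,
   or [None] if the function is missing. [restarted] tells whether the
   application was already restarted while handling this request. *)
let decide ~required ~exported ~restarted =
  match exported with
  | Some version when version >= required -> Use
  | _ -> if restarted then Fail else Restart
```

<p>With <code>~required:42</code>, an application exporting version 42 or 43 is used as is, one exporting version 41 (or nothing) is restarted once, and a second failure is reported as an error.</p>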
<p>Failing to manage function versions <em>both in CouchDB and in the application</em> can lead to data inconsistencies, as different documents are processed  by different versions of the same function. Only a global version change  which prompts a full refresh of the view and reloads the application  can ensure data consistency in the face of code changes.</p>
<h3>Creating a map function</h3>
<p>A map function must follow the signature <code>json -&gt; (json * json) list</code>: the argument is the entire document being processed, and the output is a list of <code>key, value</code> pairs being output by the map function.</p>
<p>For example, suppose you already have a <code>User</code> module in your application, which is used among other things for reading and writing users to the CouchDB database:</p>
<pre style="padding-left: 30px;"><code>type t = {
  active  : bool ;
  name    : string ;
  email   : string ;
  picture : string
}

let of_json = (* ... *)
let to_json = (* ... *)
</code></pre>
<p>Then you can rely on that module to define a map function with the above signature, and export it using the <code>CouchAdapter</code> module:</p>
<pre style="padding-left: 30px;"><code>open Json_type

let user_by_email json =
  try let user = User.of_json json in
      [ String user.User.email , Null ]
  with _ -&gt; []

let () =             

  CouchAdapter.export_map
    ~name:"user_by_email"
    ~version:1
    ~body:user_by_email ;

  CouchAdapter.export ()
</code></pre>
<p>Should you decide to update the view code, make sure that you also increment the version number:</p>
<pre style="padding-left: 30px;"><code>open Json_type

let user_by_email json =
  try let user = User.of_json json in
      if user.User.active then [ String user.User.email , Null ]
      else []
  with _ -&gt; []

let () =             

  CouchAdapter.export_map
    ~name:"user_by_email"
    ~version:2
    ~body:user_by_email ;

  CouchAdapter.export ()
</code></pre>
<h3>Creating a reduce function</h3>
<p>There is no distinction made between reduce and rereduce. While this causes a slight loss in functionality, it also makes writing reduce functions less arduous given the OCaml type system. The signature of reduce functions is simply <code>json list -&gt; json</code>.</p>
<p>For example, let&#8217;s assume that an <code>Article</code> module is already defined in your main application:</p>
<pre style="padding-left: 30px;"><code>type t = {
  title : string ;
  html  : string ;
  tags  : string list
}

let of_json = (* ... *)
let to_json = (* ... *)
</code></pre>
<p>We now define a map function and a reduce function that counts how many articles are published for every tag.</p>
<pre style="padding-left: 30px;"><code>open Json_type

(* Emits one (tag, 1) pair per tag; documents that are not articles emit nothing *)
let by_tag_map json =
  try let article = Article.of_json json in
      List.map (fun tag -&gt; String tag , Int 1) article.Article.tags
  with _ -&gt; []

(* Sums the integer values; anything that is not an integer is ignored *)
let by_tag_reduce values =
  Int (List.fold_left (fun acc -&gt; function Int i -&gt; acc + i | _ -&gt; acc) 0 values)

let () =
  CouchAdapter.export_map "by_tag-map" 1 by_tag_map ;
  CouchAdapter.export_reduce "by_tag-reduce" 1 by_tag_reduce ;
  CouchAdapter.export ()
</code></pre>
<p>And the CouchDB design document is as follows:</p>
<pre style="padding-left: 30px;"><code>{ "_id" : "_design/article",
  "language" : "ocaml",
  "views" : {
    "by_tag" : { "map"    : ["/path/to/app", 1, "by_tag-map"    ],
                 "reduce" : ["/path/to/app", 1, "by_tag-reduce" ] }
  }
}
</code></pre>
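<p>Since no distinction is made between reduce and rereduce, a reduce function has to accept its own output as input. The sketch below checks this for a summing reduce, and shows a counting reduce that gets it wrong. The <code>json</code> type is a minimal stand-in defined only to keep the example self-contained, and the names <code>sum</code> and <code>count</code> are hypothetical.</p>

```ocaml
(* Minimal stand-in for the JSON type, so that the sketch is self-contained. *)
type json = Null | Int of int | String of string

(* Summing reduce: non-integer values are ignored. *)
let sum values =
  Int (List.fold_left (fun acc -> function Int i -> acc + i | _ -> acc) 0 values)

(* Counting reduce: correct on mapped values, wrong when used as rereduce. *)
let count values = Int (List.length values)

let () =
  let batch_a = [ Int 1; Int 1 ] and batch_b = [ Int 1; Int 1; Int 1 ] in
  (* Reducing everything at once and reducing partial results agree for sum. *)
  assert (sum (batch_a @ batch_b) = sum [ sum batch_a; sum batch_b ]);
  (* They disagree for count: five values, but only two partial results. *)
  assert (count (batch_a @ batch_b) <> count [ count batch_a; count batch_b ])
```

<p>A function like <code>count</code> is the kind of functionality that is lost without a separate rereduce; emitting <code>Int 1</code> from the map function and summing is the usual way around it.</p>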
<p><small>Article image © Miriam Rossignoli — <a href="http://www.flickr.com/photos/bgo1/3301647540/">Flickr</a></small></p>
]]></content:encoded>
			<wfw:commentRss>http://www.nicollet.net/2011/08/using-ocaml-from-couchdb-views/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Short OCaml Diffs</title>
		<link>http://www.nicollet.net/2011/07/short-ocaml-diffs/</link>
		<comments>http://www.nicollet.net/2011/07/short-ocaml-diffs/#comments</comments>
		<pubDate>Fri, 29 Jul 2011 20:49:16 +0000</pubDate>
		<dc:creator>Victor Nicollet</dc:creator>
				<category><![CDATA[Functional]]></category>
		<category><![CDATA[Algorithms]]></category>
		<category><![CDATA[CouchDB]]></category>
		<category><![CDATA[Objective Caml]]></category>

		<guid isPermaLink="false">http://www.nicollet.net/?p=2463</guid>
		<description><![CDATA[I&#8217;m working on some wiki functionality right now, and one of the basic properties of a wiki is that history is kept for all pages. And this should be done elegantly, without actually keeping a copy of every version, especially since many changes are actually just small fixes. This is usually done by keeping a [...]]]></description>
			<content:encoded><![CDATA[<p><img class="aligncenter size-full wp-image-2464" title="bond" src="http://www.nicollet.net/wp-content/uploads/2011/07/bond.png" alt="" width="675" height="99" /></p>
<p>I&#8217;m working on some wiki functionality right now, and one of the basic properties of a wiki is that history is kept for all pages. And this should be done elegantly, without actually keeping a copy of every version, especially since many changes are actually just small fixes. This is usually done by keeping a <em>diff</em> — a small piece of data that stores the changes needed to get from version A to version B. Computing an arbitrary diff is a fairly easy task, but computing a diff that is as small as possible is actually challenging, because there are many ways in which a diff can be created and only a few of these are optimal.</p>
<p>The standard approach to computing diffs is to find the longest common subsequence — a non-contiguous sequence of characters that is found in both versions, such that one may turn version A into the sequence only by removing characters, and then turn that sequence into version B only by adding characters. The list of additions and removals is what is commonly referred to as a <em>diff</em>, because this is how the UNIX program <code>diff</code> works.</p>
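<p>As a reminder of what this approach computes, here is a minimal sketch of the textbook dynamic-programming algorithm for the length of the longest common subsequence, working on characters. It is not how <code>diff</code> itself is implemented; the function name <code>lcs_length</code> is made up for the illustration.</p>

```ocaml
(* Length of the longest common subsequence of [a] and [b].
   Textbook dynamic programming: time and memory are both O(m * n). *)
let lcs_length a b =
  let m = String.length a and n = String.length b in
  (* table.(i).(j) is the LCS length of the first i characters of [a]
     and the first j characters of [b]. *)
  let table = Array.make_matrix (m + 1) (n + 1) 0 in
  for i = 1 to m do
    for j = 1 to n do
      table.(i).(j) <-
        if a.[i - 1] = b.[j - 1] then table.(i - 1).(j - 1) + 1
        else max table.(i - 1).(j) table.(i).(j - 1)
    done
  done;
  table.(m).(n)
```

<p>Every character outside the common subsequence is either removed from the first string or added from the second, so the diff contains <code>m + n - 2 * lcs_length a b</code> single-character operations.</p>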
<p>I am not happy with this solution. For one, it works based on lines, which is fairly acceptable for source code that is cleanly split into many lines, but the average wiki page is not as clean. Instead, every paragraph tends to stay on its own line and is only word-wrapped for convenience. Using a line-based diff tool on such data would cause the removal and addition of entire paragraphs, which is certainly not optimal if the change was a minor typo fix on a single word. And since the <em>diff</em> algorithm itself is quadratic, it cannot be coaxed to work on characters instead of lines: with around a hundred characters per line, the input becomes a hundred times longer and the run ten thousand times slower, so a one-millisecond line-based run would take ten seconds on characters!</p>
<p>Another shortfall of the longest common subsequence approach is that it fails to detect sentence or paragraph swaps, yet these happen quite frequently in the rewriting stages of a wiki page. Indeed, regardless of how clever the diff algorithm is, the diff <em>format</em> itself provides no primitives for moving data around: only &#8220;add this&#8221; and &#8220;remove that&#8221; are supported.</p>
<p>I have designed and implemented a custom diff algorithm. Instead of working based on &#8220;add&#8221; and &#8220;remove&#8221; primitives, it carries a data segment where all the new content is stored, and works with &#8220;blit from old&#8221; and &#8220;blit from data segment&#8221; instructions to construct the new version from the old. Its format is optimized to be stored as JSON, and the diff application algorithm is simple enough to be handled by a short piece of JavaScript if necessary. <a href="https://github.com/VictorNicollet/MiniDiffs" target="_blank">It&#8217;s on GitHub, by the way</a>.</p>
<p>For instance, consider the sentences &#8220;<code>The quick brown fox</code>&#8221; and &#8220;<code>Brown foxes are quick</code>&#8221;, and how the former can be transformed into the latter.</p>
<p>A longest common subsequence approach (on characters, assuming this is sane) would determine that the LCS is the eight characters &#8220;<code>rown fox</code>&#8221;, and so the diff would be written as &#8220;<code><del datetime="2011-07-29T20:13:42+00:00">The quick b</del><ins datetime="2011-07-29T20:13:42+00:00">B</ins>rown fox<ins datetime="2011-07-29T20:13:42+00:00">es are quick</ins></code>&#8221;, which is the shortest diff you can get using the LCS approach. Note that the word &#8220;<code>quick</code>&#8221; is removed in one place and added back in another.</p>
<p>My algorithm instead determines that contiguous substrings &#8220;<code>rown fox</code>&#8221; and &#8220;<code>e quick</code>&#8221; are present in both sentences, so the diff would be written as &#8220;<code><ins datetime="2011-07-29T20:13:42+00:00">B</ins>rown fox<ins datetime="2011-07-29T20:13:42+00:00">es ar</ins>e quick</code>&#8221;. The corresponding JSON is quite short, too:</p>
<pre style="padding-left: 30px;">["Bes ar",1,[10,8],5,[-12,7]]</pre>
<p>The first element is the data segment (the six new characters), single integers are blit-from-data-segment (length) instructions, and integer pairs are blit-from-old (offset,length) instructions.</p>
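<p>Applying such a diff takes only a few lines. The sketch below is written from the example alone, not from the MiniDiffs source: it assumes that the offset of a blit-from-old instruction is relative to the current length of the output, which is the reading under which the example produces the expected sentence. The type <code>instruction</code> and the function <code>apply</code> are hypothetical names, and the JSON is replaced by a plain OCaml value to avoid depending on a JSON library.</p>

```ocaml
(* One instruction of the diff. In the JSON form, a bare integer stands
   for [From_data] and an [offset, length] pair stands for [From_old]. *)
type instruction =
  | From_data of int       (* copy the next [length] characters of the data segment *)
  | From_old of int * int  (* copy [length] characters of the old text *)

(* ASSUMPTION: the offset of [From_old] is relative to the current length
   of the output. This is inferred from the example above. *)
let apply old (data, instructions) =
  let out = Buffer.create (String.length old) in
  let data_pos = ref 0 in
  List.iter
    (function
      | From_data length ->
        Buffer.add_string out (String.sub data !data_pos length);
        data_pos := !data_pos + length
      | From_old (offset, length) ->
        let start = Buffer.length out + offset in
        Buffer.add_string out (String.sub old start length))
    instructions;
  Buffer.contents out

let () =
  (* The diff from the example: ["Bes ar",1,[10,8],5,[-12,7]] *)
  let diff =
    ("Bes ar", [ From_data 1; From_old (10, 8); From_data 5; From_old (-12, 7) ])
  in
  assert (apply "The quick brown fox" diff = "Brown foxes are quick")
```

<p>Under that reading, <code>[10,8]</code> is applied when the output holds one character, so it copies eight characters starting at position 11 of the old text, and <code>[-12,7]</code> is applied when the output holds fourteen characters, so it copies seven characters starting at position 2.</p>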
<p>The theoretical complexity of the algorithm is O(m+n). The actual implementation on GitHub is worst-case O(mn), because I used lists instead of hash tables at one point, but it should behave as O(m+n) on any input written in a human language, because such input almost guarantees that those lists will contain zero or one elements in most situations.</p>
<p>The algorithm itself relies on statistical properties of character pairs to quickly match up sequences from both strings, such as by finding out that the <code>ox</code> pair only appears once in each sentence, or that the <code>Br</code> pair only appears in the new sentence. On medium-sized texts (up to 50,000 characters), there are several character pairs that are unique or missing, which helps the algorithm guess correctly where each piece came from. Once the source of the pieces is identified, generating the diff is extremely simple.</p>
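<p>The counting step behind this idea can be sketched as follows. This is only the statistics part, written for illustration and not taken from the GitHub implementation; the names <code>pair_counts</code>, <code>occurrences</code> and <code>anchors</code> are hypothetical.</p>

```ocaml
(* Count how many times each pair of adjacent characters occurs in [text]. *)
let pair_counts text =
  let counts = Hashtbl.create 64 in
  for i = 0 to String.length text - 2 do
    let pair = String.sub text i 2 in
    let seen = try Hashtbl.find counts pair with Not_found -> 0 in
    Hashtbl.replace counts pair (seen + 1)
  done;
  counts

(* Number of occurrences of [pair] according to [counts], 0 if absent. *)
let occurrences counts pair =
  try Hashtbl.find counts pair with Not_found -> 0

(* Pairs that occur exactly once in both texts: unambiguous matching points. *)
let anchors old_text new_text =
  let old_counts = pair_counts old_text in
  let new_counts = pair_counts new_text in
  Hashtbl.fold
    (fun pair count acc ->
      if count = 1 && occurrences new_counts pair = 1 then pair :: acc else acc)
    old_counts []
  |> List.sort compare

let () =
  let old_text = "The quick brown fox" and new_text = "Brown foxes are quick" in
  (* "ox" occurs once on each side, "Br" only in the new sentence. *)
  assert (List.mem "ox" (anchors old_text new_text));
  assert (occurrences (pair_counts old_text) "Br" = 0);
  assert (occurrences (pair_counts new_text) "Br" = 1)
```

<p>A pair that is unique on both sides pins down where a piece of the new text came from, while a pair that is missing from the old text marks content that has to go into the data segment.</p>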
]]></content:encoded>
			<wfw:commentRss>http://www.nicollet.net/2011/07/short-ocaml-diffs/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
	</channel>
</rss>

