Tag Archive for 'Algorithms'

Basic Patterns for Everyday Programming

Lakshen Perera provides a list of basic patterns for everyday programming, illustrated in Javascript and Ruby. I thought it would be interesting to provide an OCaml illustration as well, and perhaps a handful of additional patterns as well.

Verify object’s availability before calling its methods or properties

In many languages, there is a possibility for objects to be missing — whether this is represented as NULL, null, nil or None. Regardless of the language, it is important to keep in mind at all times whether a given value is optional or not. If it is not optional, then you may assert so using a type declaration if your language supports it :

method name  : string (* not optional *)
method email : string option (* optional *)

In other languages, a runtime assert will do, for instance in PHP :

assert (isset($this->name))

OCaml will not require you to explicitly declare all values as optional or mandatory. Instead, it will deduce that information from the way you use each value. For instance, since json_of_string expects a non-optional string argument, you can simply write :

let parsed_content = json_of_string json (* json is not optional *)

If json is an optional string, this will create a compilation error unless you explicitly define what should happen when the string is missing. The most common approach is to decide that the result should be missing too :

let parsed_content = match json with
  | None   -> None
  | Some s -> Some (json_of_string s) (* by definition, s is not optional *)

If using the Batteries library (which, by the way, you should), you can express this more easily :

let parsed_content = BatOption.map json_of_string json

Set a default value with assignments

This advice is almost identical to the previous one : if you need to construct a non-optional value, but only have access to an optional value, you will have to provide a default value. Using standard OCaml :

let role = match person.role with
  | None      -> `Guest
  | Some role -> role

Using Batteries, there is an almost equivalent version :

let role = BatOption.default `Guest person.role

This version is almost equivalent, because it will evaluate the default value even if it is not required. Let’s assume that the default value is itself provided by a complex computation, such as a database access :

let role = match person.role with
  | None      -> readfrom database (* only executed if person.role is missing *)
  | Some role -> role 

let role = BatOption.default
  (readfrom database) (* Always executed, even if unnecessary *)
  role

The Batteries library provides an alternate function for that :

let role = BatOption.map_default
  readfrom database (* only executed if person.role is missing *)
  role

Checking whether a variable equals to any of the given values

If the values have a legitimate reason to be strings, integers or other types with unlimited numbers of values, then « is in list » predicates are the preferred choice :

if List.mem current_day ["Monday";"Wednesday";"Friday"] then
  (* do something *) 

Days are a quite bad example, because these are better represented as a variant (which then type-checks whether you have written Mornday instead of Monday). If you have already defined the type of weekdays, then :

if List.mem (current_day : weekday) [`Monday;`Wednesday;`Friday] then
  (* do something *)

If this is a one-shot check, or if you would rather not define a weekday type yet, you should instead go for an exhaustive pattern-matching :

 match current_day with
  | `Monday | `Wednesday | `Friday ->
    (* do something *)
  | `Tuesday | `Thursday | `Saturday | `Sunday -> ()

Extract complex or repeated logic into functions

This is a fairly fundamental concept — but it consists in two distinct parts. There is a separation and naming part (you pull out a piece of code and give it a name, which helps understand what it does and how it relates to the rest of the program) and there is an extraction and reuse part (the piece of code is pulled out into a more globally accessible location and parametrized, so that it may be used in other places).

With the above example, simple separation-and-naming would be :

let is_discount_day = match current_day with
 | `Monday | `Wednesday | `Friday -> current_date > 20
 | `Tuesday | `Thursday | `Saturday | `Sunday -> false
in

if is_discount_day then 
  (* do something *)

The variable is defined in the same scope it is use in, and it assumes that current_day and current_date values have been defined previously in that scope. Extraction-and-reuse would go further :

type weekday =
  [ `Monday | `Tuesday | `Wednesday
  | `Thursday | `Friday | `Saturday | `Sunday ]

let is_discount_day (day:weekday) date =
  List.mem day [`Monday;`Wednesday;`Friday] && date > 20

...

  if is_discount_day current_day current_date then 
    (* do something *)

Now, is-discount-day is a global function available from everywhere in the code, and it uses the provided parameters to determine whether this is indeed a discount day.

Memoize the results of repeated function calls

OCaml has several ways to perform memoization. One of them is lazy evaluation :

val discount_day = lazy (is_discount_day current_day current_date)
method discount_day = Lazy.force discount_day

The lazy expression will only be evaluated the first time the Lazy.force function is called on it.

Note that if the current day or current date can change, then the memoization actually breaks things !

Memoization is also helpful when dealing with a function that requires arguments, in which case a different result will be provided for each argument set. A common solution is to use a hash table to store these :

let fibonacci =
  let memo = Hashtbl.create 100 in
  let rec fib n =
    try Hashtbl.find memo n with Not_found ->
      let result = fib (n-1) + fib (n-2) in
      Hashtbl.add memo n result ; result
  in fib

This works fine for short-lived functions — don’t do this for global functions that might stick around for a long time, because the memoization hash table will grow and its contents will never be garbage-collected. If you really have to, use a weak hash table, such as Batteries’ BatInnerWeaktbl, so that the garbage collector may reclaim the memoized values when it runs out of memory.

Also don’ t overdo memoization — it only works when arguments are reliably passed more than once and the time to compute the value is significantly larger than the time to retrieve and store it and it is worth the memory usage and the function has no side-effects.

Use the seven list manipulation primitives

Almost any processing on collections of items can be expressed in terms of seven fundamental patterns. Recognizing those patterns can help improve the clarity of both the code and the underlying algorithm.

1. Map transforms a list into another, item by item, in linear time. Use a map operation when all you need is a one-to-one transformation. The line below extracts three recipes from the database using their identifier.

let recipes = List.map from_database [ "omelet" ; "cheeseburger" ; "risotto" ]

2. Reduce transforms a list of values into a single value by repeatedly applying a function that combines together two values into one. The typical example is a fold, which uses a function to combine each list element, in turn, with an accumulator. It can be used to extract the sum of values in a list, for example :

let total = List.fold_left (+) 0 [ 5 ; 6 ; 3 ; 8 ; 9 ; 0 ; 7 ; 6 ]

This transform allows a preliminary map step which transform the values inside the list into values that can be combined. For instance, to find the age of the oldest person in a list of people :

let oldest = List.fold_left (fun acc person -> max age person.age) 0 people

3. Extract works like a map, but the transformation function returns zero, one or more results for each call. All the results are included in the final list. The most elementary implementation is literally to have a map (that transforms a list into a list of lists) followed by a concatenation (that transforms a list of lists into a list). For instance, to get all the ingredients involved in a list of recipes :

let ingredients =
  List.concat (List.map (fun recipe -> recipe.ingredients) recipes)

4. Filter is a subset of Extract where the transform may not return more than one result — but it may still return none, so its result is simply an optional type. For instance, to extract the list of all recipes that have a wine recommandation along with their recommended wine :

let wines = BatList.filter_map
  (fun recipe -> BatOption.map (fun wine -> recipe,wine) recipe.wine) recipes

The OCaml language also provides a standard List.filter function which keeps values for which a property is true. For instance, to get the list of recipes that have a wine recommendation :

let have_wines = List.filter (fun recipe -> recipe.wine <> None) recipes

This approach is weaker — the wines in the resulting list are still treated as optional by the type system, so you will need bogus pattern matching for a case that never happens (no wine) to extract the actual wine values. A filter_map lets you encode the filter property in the type of the result, which makes using the filtered list easier.

5. Sort unsurprisingly sorts the list. The canonical sort — using the canonical order relationship — is a theoretical curiosity, and in practice most sorts use a projection function p such that A < B iff p(A) < p(B). This is best illustrated in SQL, in the form of the ORDER BY <projection> statement. Two useful helper functions :

let project compare p a b = compare (p a) (p b)
let descending compare a b = compare b a

For instance, to sort the list of recipes based on how long each of them takes :

let by_duration =
  List.sort (project compare (fun recipe -> recipe.duration)) recipes

6. Group works like Sort, but further regroups « equal » items together by returning a list of lists. It works using a comparison function that is usually based on a projection function. For instance, to get three lists containing one-star, two-star and three-star recipes :

let by_stars =
  BatList.group (project compare (fun recipe -> recipe.stars)) recipes

There is a special case when the projection function returns booleans, which is known as a partition. For instance, to extract recipes that are desserts and recipes that are not :

let desserts, non_desserts =
  BatList.partition (fun recipe -> recipe.is_dessert) recipes

7. Search extracts one element from the list (if possible) based on a certain condition or property. Elementary searches are « first element » and « last element. » More complex searches : finding a value by key in a list of key-value pairs using List.assoc or using a predicate using List.find. The heavy-duty search tool is BatList.find_map, used below to find a recipe that is recommended by at least one person, and the recommending person :

let recipe, recommender = BatList.find_map
  (fun recipe -> BatOption.map (fun person -> recipe, person) recipe.recommended)
  recipes

These seven patterns can be used in conjunction to perform almost any algorithm on collections, sequences or lists. For a more complex example, assume we need the ten ingredients that appear the most often in desserts. We would filter recipes by desserts, extract their ingredients, group them by name, sort the sub-lists by list length and take the name of the first element of the first ten sub-lists :

open BatPervasives (* for operator |> *)
open BatList

let ten_best_ingredients recipes = recipes
  |> filter (fun r -> r.dessert)
  |> map (fun r -> r.ingredients) |> concat
  |> group (project compare (fun i -> i.name))
  |> sort (descending (project compare length))
  |> take 10
  |> map hd
  |> map (fun i -> i.name)

Article © brewbooks — Flickr

Annealing Constraints

Of all the existing constraint-solving algorithms out there, my favourite is simulated annealing. To use it, you need a mutation function that randomly and slightly alters a solution, and a fitness function that lets you compare two solutions. Based on these, the algorithm is simple :

  1. Start with an arbitrary solution.
  2. Mutate the solution : if the mutated solution is better than the current one in terms of fitness, keep it and repeat this step.
  3. If you discard too many mutated solutions in step 2, remember the current solution and mutate it.
  4. When time runs out, return the best solution found so far.

Step 2 is a fairly standard ascent towards a local maximum : try a random similar solution and keep it if it improves fitness. Step 3 detects that the algorithm is stuck in a local maximum, so it shakes the solution out of there and starts again from that new position.

What is interesting about this solution is that it is exceedingly easy to implement, easy to adapt to a problem (as long as mutation and fitness are possible), and is reasonably efficient for most small problems.

I used it today to solve an original problem on RunOrg. We have a history of recently connected users, and we wanted to display a simple metric along the lines of « 10 users connected in the last hour » or « 11 users connected this week » as a short summary on the administrators’ dashboard. The problem, of course, is to pick a reasonable duration based on the current data ! If one user connected seven days ago, and ten users connected a few minutes ago, then both « 10 users connected in the last hour » and « 11 users connected this week » are correct descriptions of the situation, but the former is intuitively better than the latter. Intuitively, we should use the shortest duration with the largest amount of users.

And intuition is a bad thing, because it is hard to turn into an algorithm.

In the end, I settled for a handful of periods of time : the last 5 minutes, the last 1 to 6 hours, today, the last 2 or 3 days, and this week. Each of these periods of time contains a certain number of user connections (which is fairly easy to determine based on the connection history) and receives a score computed by multiplying the number of user connections with a magic constant that is tied to that period. So, if the magic constant for « the last hour » is 0.76 and that for « this week » is 0.03, then 10 × 0.76 = 7.6 and 11 × 0.03 = 0.33 meaning that the « 10 users connected in the last hour » version will be preferred.

Then, I came up with several examples of connection histories, and manually picked the intuitive duration that should be chosen to display them. This turned into the fitness function : any set of magic numbers would have to agree with as many of those examples as possible, the more the better. Mutation, on the other hand, consisted in changing one of the magic constants at random.

Within 2000 iterations and a few seconds, the annealing algorithm came up with a solution that satisfied 94% of my constraints — and I am not even certain whether 100% could be satisfied, since I might have left a few contradictory intuitions in there. The final output is:

== Fitness: 94.44%
5min : 1
1h : 0.761091036722064
2h : 0.42295812451487896
3h : 0.26899204507851066
4h : 0.20923702855563558
5h : 0.14858146007043194
6h : 0.14846461447836096
today : 0.1477566411417444
2days : 0.07381054881381345
3days : 0.04202811963812667
week : 0.033268828300871064

And so, these are the magic numbers that our algorithm uses. If it makes wrong decisions, we can always run the annealing again with new constraints, and feed the magic numbers back in.

Article image © Detlef Schobert — Flickr

Pascal’s Triangle

Grant Muller writes that Algorithms can Really Beat You:

I stared at the pyramid of numbers scribbled hastily in blue dry erase. The only thing clear to me in that moment was that I wouldn’t have this figured out in five minutes. Or even ten.

“Any number in the pyramid, or in this case tree array, is the sum of the two numbers above it,” the young man continued, “write a function that gets the value of any x, y coordinate in this structure.”

Sitting quietly in the middle of that article was an old friend of mine, Pascal’s Triangle, that I encounter every so often when doing nerdy things with equations.

The first rows in Pascal’s Triangle are 1, 11, 121, 1331 and 14641 — the most observant will notice that those are the values of 110, 111, 112, 113 and 114. In fact, if you consider the values of the sixth row, 1-5-10-10-5-1, they correspond to the number 161051, which is 115.

And the reason for this is the way in which Pascal’s Triangle is constructed: take a row, duplicate it, shift one copy to the left by one, and add them together.

  1 1           1 2 1         1 3 3 1
+   1 1       +   1 2 1     +   1 3 3 1
= 1 2 1       = 1 3 3 1     = 1 4 6 4 1

This is the same as saying that «any number is the sum of the numbers above it», or T(x,y) = T(x-1,y-1) + T(x,y-1). As an amusing side-effect of this construction method, the sum of all numbers in a given row is always a power of two — by adding two copies of the previous row together, you are multiplying that sum by two every time. Remember this tidbit, it will come in handy later on.

What does this have to do with the powers of 11 ? Well, to multiply by 11, you would multiply by 10 (which is easier: shift all the digits to the left) and add the original number. So, multiplication by 11 works in exactly the same way as the Pascal’s Triangle. Which is why the numbers are the same.

Another place where Pascal’s Triangle appears is binomial coefficients:

(a + b)2 = 1 a2 + 2 ab + 1 b2
(a + b)3 = 1 a3 + 3 a2b + 3 ab2 + 1 b3
(a + b)4 = 1 a4 + 4 a3b + 6 a2b2 + 4 ab3 + 1 b4

To expand (a + b)n, all you need to know is the values on row n of the Triangle. And the reason for this is exactly the same as the one for 11 (and in fact, 11 is a special case of the binomial formula where a=10 and b=1).

But that’s not all. Let’s discuss something entirely different for a second — the fun thing about math is precisely that you can start exploring entirely different venues and end up where you started anyway. So, suppose you have a N-bit integer, and you wish to count how many such integers have exactly M bits set to 1. This is number is known as «N choose M», and is fairly frequently found in mathematics. How do you do determine that number?

One way, the combinatorics approach, is to determine an algorithm that generates such a value, and count how many independent possible choices that algorithm had. My algorithm in this situation is «pick one bit at random, set it to 1, repeat M times»

For the first step, there are N possible bits to choose from.

For the next step, there are N-1 possible bits, and so on until only N-M bits are left. So, there are N × (N-1) × (N-2) × (N -3) … (N-M+1) possible runs for this algorithm. This can be written more concisely using factorials : N! / (N-M)!

But wait! This algorithm actually generates each set of bits several times: once for every possible order of selection of those bits — which is known to be M! for a set of M bits. So, the actual number of distinct runs of my algorithm is N! / ( M! × (N-M)! )

This is something we learn in high school in France, or at least we did back when I was still in high school.

The other way to determine this number is to solve that problem recursively: let’s say that there are C(N,M) different N-bit integer with M bits set to 1. For each such integer one of two possible cases happens:

  • It consists of a 0 followed by an (N-1)-bit integer with M bits set to 1. There are C(N-1,M) such integers.
  • It consists of a 1 followed by an (N-1)-bit integer with (M-1) bits set to1. There are C(N-1,M-1) such integers.

So, recursively, C(N,M) = C(N-1,M) + (N-1,M-1), with the edge cases being C(N,N) = C(N,0) = 1 (there’s only one integer with all-zeroes, and only one integer with all-ones). This is the same formula as Pascal’s Triangle !

T(x,y) = C(N,M) where y = N and x = M

So, here’s the pseudocode for a function that correctly computes T(x,y) far faster than the recursive version suggested by Grant, by using the factorial-based definition we found for C(N,M) :

function T(x,y) {
  var num = 1, denom = 1, start = y-x;
  for (i = 1; i <= x; ++i) {
    num = num * (start + i);
    denom = denom * i;
  }
  return num / denom;
}

Last question: what is C(N,0) + C(N,1) + … C(N,N) ? It is merely counting the number of N-bit integers that have any combination of bits set to 1. We know there are 2N integers. Remember when I told you the sum of the elements in a row of the Triangle was a power of two?

Yes, maths are fun like that.

Article Image © PhilosophyGeek — Flickr

Functional Programming

Yesterday evening, I was present at Paris Hackers, a meetup for Hacker News enthusiasts — and a «hacker» in this context is not someone who breaks into software systems, but someone who is passionate beyond words about how software works. But the definition is vague, and that vagueness is part of my surprise.

We had a little chat on the topic of recent buzzwords — concepts that people were interested in, but not everyone understood. One of them was Functional Programming. The speaker asked who knew enough about it to explain to the others, and I naturally raised my hand: I was taught functional programming formally, I taught it myself, I implemented a functional programming language, and I am using functional programming professionally on a daily basis.

My surprise was that I was the only one. I had sort of assumed that a majority of «hackers» would know what functional programming is. I will have to revise that definition.

« Having Functions »

Functional programming itself has a definition problem: depending on whom you ask, you can get two different definitions on functional programming. One definition is what I would call pure functional programming, and the other is better summarized as « having functions »

Of course all languages have something they call «functions» but I strongly believe that there is a list of minimum requirements before you are allowed to say that.

  1. You should be able to define a FOOBAR anywhere. At global scope. Inside a class. Inside another function. Inside an expression. It can be an anonymous FOOBAR, or it can be given a name, depending on the situation.
  2. You should be able to store a FOOBAR in a variable, pass it as an argument to a function, and return it from a function. If your language is strongly typed, it should have an easily described type as well.

Replace FOOBAR with «integer» or «string» and you will notice that without these requirements one can hardly say a language «supports integers» or «supports strings» — what if you could not store integers in variables, or write string literals as part of expressions?

These requirements are called «having FOOBARs as first-class citizens» by the way. In C for instance, functions and arrays are second-class citizens, and integers are first-class citizens. In JavaScript, functions are a first-class citizen.

Allowing functions helps make a language a lot more expressive. Let’s think of a quick example: performing an HTTP request and displaying the result in a window. In pseudocode that looks shamefully like JavaScript code, it would look like this:

function display(url) {
  var page = http(url);
  return new Window(page);
}

This function has an obvious limitation: if it takes five seconds to load the data through HTTP, then the program will freeze for five seconds and only then display the window. A better solution would be to make the request asynchronous: a window in a «loading» state is returned immediately, and a background process takes care of the HTTP requests and fills the window with the contents once they have been received.

A clean way of encapsulating this would be to let the http function handle the asynchronous behavior. The calling code would only need to provide a description of what needs to be done once the contents have been received. There are two ways of doing this in an object-oriented world.

Object-oriented solution #1: inherit from an http request class.

class GetPageRequest extends HttpRequest {

  // We need to know what window to put the contents in
  function GetPage(window,url) {
    super(url);
    this.window = window;
  } 

  // Override this function to react to the data being received
  function onSuccess(page) {
    this.window.setContents(page);
  }
}

function display(url) {
  var window = new Window(loading);
  var request = new GetPageRequest(window,url);
  request.runAsynchronous();
  return window;
}

Object-oriented solution #2: implement an «HTTP success event handler» interface.

class OnReceivePage implements IHttpRequestSuccessHandler {

  // What window do we need to fill ?
  function OnReceivePage(window) {
    this.window = window;
  }

  // Handle success (as previous example)
  function onSuccess(page) {
    this.window.setContents(page);
  }
}

function display(url) {
  var window = new Window(loading);
  http(url, new OnReceivePage(window));
  return window;
}

Both solutions are quite long, because object-oriented solutions involve a certain amount of syntactic overhead. In both cases, the only code that is actually relevant to our work is the contents of the onSuccess function. What if the language allowed us to define that function directly, and passing it to the HTTP request without any syntactic fuss?

Functional solution #1: local function definition.

function display(url) {
  var window = new Window(loading);
  function onSuccess(page) { window.setContents(page); }
  http(url, onSuccess);
  return window;
}

In fact, why bother with giving the function a name? Its contents are obvious enough, so it would be shorter and cleaner to just passing it to the HTTP request without any name.

Functional solution #2: anonymous function definition, or «lambda»

function display(url) {
  var window = new Window(loading);
  http(url,function(page) { window.setContents(page); });
  return window;
}

The latter example is syntactically correct JavaScript code, by the way. As you can see, the code is both shorter and more readable than the class-based version. In general, using inheritance or interface implementation to define a single method is always significantly more wasteful that using an anonymous function.

There’s a third requiremend that I left unvoiced. Look back at the previous examples again. Notice how the class-based examples have to store the window object explicitly, but the functional versions use the variable directly? This is known as «closures» : when defining a function inside a function, the defined function actually carries with it all the variables that were present when it was defined. Functional languages always have closures, because any non-trivial operation you might wish to place in an anonymous function will involve accessing data that was present near the definition.

In summary: functional programming involves the ability to manipulate functions as you would manipulate any other data type (define anywhere, store, pass, return…) and allows writing more concise programs whenever you need to pass, store or return some behavior to be executed later.

Examples of functional languages:

  • JavaScript
  • Lisp (including Common Lisp, Scheme …)
  • ML (including SML, OCaml, F# …)
  • Haskell
  • Ruby
  • C#

Examples of near-functional languages:

  • PHP only supports lambdas and closures since 5.3, and closure syntax is suboptimal.
  • Python has 99% support, the only issue is that only one-line lambdas are allowed.
  • XSLT 2.0 has no anonymous functions, but is otherwise functional. I kid you not.

Examples of non-functional languages:

  • Java
  • C
  • C++

Pure Functional Programming

This is the other definition of «Functional Programming» that you can hear, and the confusion is understandable: languages that match definition #2 have long been the only examples of definition #1 as well.

The relevant concept is purity: can the impact of a function on the rest of the program be described only in terms of its return type, and can the impact of the rest of the program on the function be described only in terms of its arguments?

Consider the signature of an interface method:

public int frobnicate(Foo f);

There are many ways in which that function can affect the rest of the program:

  • It could call a global function to perform changes in the global state, such as altering a singleton, setting a global variable, printing data to the screen, sending data over a socket…
  • It could change member variables of the object it is called on.
  • It could call methods on its argument.
  • It could return a new value that is then used by the calling code.

There are also many ways in which the rest of the program could affect the behavior of the function:

  • It could depend on global variables to decide what it should do.
  • It could read global state (user input, files, sockets…)
  • It could access the data of the object it is called on.
  • It could access the data of its argument.

The entries marked in bold can be deduced from the method signature: having a return type means you want to return something, being non-static means you wish to read the state of the object you are called on, having an argument means you want to read data from that argument. The other entries are possible, but not readily apparent in the signature.

Pure functional programming is the principle of least surprise: sure, those entries that are not in bold are possible, but it would be surprising if they happened, so they are in fact not allowed. A pure function can only read data from its arguments and can only write data by returning it.

Also, pure functions imply that data is immutable — if functions are not allowed to change anything, then they are not allowed to change data structures, so the data structures are by definition immutable.

By the way, pure functional programming is all about what you cannot do — so you could in theory write functional programming in any language just by following coding conventions to that end. In practice, immutability is quite hard to respect if there is no syntax for making it easier — you spend a lot of time creating manual copies of objects in order to change a single field.

Working with pure functions and immutable data structures involves both benefits and trade-offs when compared to non-pure programing. Think of it as a slider which goes between «0% Pure» and «100% Pure» and you pick one value for your entire program. Many algorithms and concepts get easier to understand, change and manipulate as the slider moves towards the 100% end, but there are some concepts that are intrinsically non-pure and actually get harder and harder to think about as your slider moves away from the 0% end. For instance, working with a persistent data store is an intrinsically mutable endeavor, and it’s hard (but possible) to think about it in pure terms.

I find that a slider value of 95% in a language that makes such a value possible is the optimal place to be — the 5% includes writing code that is truly non-pure (such as database manipulation, responding to HTTP requests…) as well as code that is non-pure on the inside but is actually pure from the outside (such as memoization and caching).

Article Image © Tim Olson — Flickr

RS Latches


One of my oldest memories harkens back to 1989 — I was a four years old boy in communist Romania. My mother has a PhD in biochemistry, and her father and brother both had a PhD in physics, so it might seem quite unsurprising that I followed in their rational and scientific steps, but it would be hard to put the proverbial finger on how, precisely, that happened. This is what that specific memory is about.

It all started with my frequent stays at my grandparents’ house, a conventional second-floor flat in Bucharest, where my grandmother and grandfather took turns babysitting me while my parents were out hammer-and-sickling (well, laboratory-and-architecturing) the socialist ideal. My grandfather was intent on turning me into a bright scientific mind, so he asked my uncle to build me a computer. Nothing fancy, mind you : it had no operating system worth mentioning, no hard drive, and depended on magnetic tapes and 8″ floppies to actually work. My grandfather showed me a standard BASIC interpreter, and had me type in a program that displayed, line by line, an analog wall clock.

Fascinated, I started spending my days on that thing writing BASIC code, learning how to build increasingly complex programs, and starting my career as a programmer. Or so had my grandfather hoped. A few seconds after discovering computer programming, I discovered computer games, which were a far more fascinating novelty than displaying wall clocks on a screen. I promptly forgot about the BASIC interpreter and for nine long years, I assumed that programming computers was some sort of esoteric arcane art, invisible to the uninitiated, and remained entirely uninterested in it.

Instead, my breakthrough as a rational mind was triggered by the close proximity of two unlikely catalysts : one was a frequent diet of mecha anime (the quality of the Romanian dubbing was hilarious), and the other was a cupboard drawer full of bits and pieces that could, to a child’s mind, be spare parts out of which a robot could be built : wheels, small electrical engines, cables, batteries, microprocessors, light bulbs… as a matter of fact, once I had the basic understanding of how a batteries + cables + engine setup worked, I did build small uninteresting contraptions.

I became intensely interested in building a robot myself. Obviously, I did not have the technical skills or the willpower to follow through with this plan, but I would listen, entranced, to any explanations I would receive that were related to robot-building. My grandfather was writing a book on semiconductors at the time, so he showed me a large digital circuit and let me understand that this is what the brain of a robot would look like — only quite larger !

And so he started teaching me the elementary principles of digital circuits. We spent little time discussing the underlying technical details, and instead concentrated on the simpler abstractions of logical gates. This time, I was fascinated. I would grab a pen and paper and draw circuits for fun. The meaning of AND, OR, NOT, NAND, NOR and XOR are imprinted in my brain ever since I was five years old… there are few concepts remaining in my brain that have been there for so long.

My circuits behaved like mathematical functions : I would mentally set the input bits, and the output bits would contain the result, so I could build the truth tables that mapped input combinations to output combinations. I experimented with basic arithmetic functions, learning binary as I went. And then one day my grandfather showed me a circuit that I could draw from memory to this day : the RS latch.

The fun thing about the RS latch is that it cannot have a truth table. It is an entirely different beast from arithmetic “write input bits, read output bits” circuits in that it has a state. If you write 1 to the S input bit, the output becomes 1 and remains 1 until you write 1 to the R bit, at which point the output becomes 0, and so on. To say things differently, the S input bit stands for “Set” and sets the output to 1, and the R input bit stands for “Reset” and sets the output to 0. When both R and S are set to 0, then the latch output bit is equal to whatever it was previously set to.

The RS latch is a versatile little thing. It can be used to implement RAM (one of the first things I did with it), or it can be chained to create an incrementing counter.

After a while, my interest in digital circuits faded, but the effects it had on my character were permanent : I now had experience applying abstract rules to abstract concepts, and it was fun.

Article image © Cornelia Kopp — Flickr

Short OCaml Diffs

I’m working on some wiki functionality right now, and one of the basic properties of a wiki is that history is kept for all pages. And this should be done elegantly, without actually keeping a copy of every version, especially since many changes are actually just small fixes. This is usually done by keeping a diff — a small piece of data that stores the changes needed to get from version A to version B. Computing an arbitrary diff is a fairly easy task, but computing a diff that is as small as possible is actually challenging, because there are many ways in which a diff can be created and only a few of these are optimal.

The standard approach to computing diffs is to find the longest common subsequence — a non-contiguous sequence of characters that is found in both versions, such that one may turn version A into the sequence only by removing characters, and then turn that sequence into version B only by adding characters. The list of additions and removals is what is commonly referred to as a diff, because this is how the UNIX program diff works.

I am not happy with this solution. For one, it works based on lines, which is fairly acceptable for source code that is cleanly split into many lines, but the average wiki page is not as clean. Instead, every paragraph tends to stay on its own line and is only word-wrapped for convenience. Using a line-based diff tool on such data would cause the removal and addition of entire paragraphs, which is certainly not optimal if the change was a minor typo fix on a single word. And since the diff algorithm itself is quadratic, it cannot be coaxed to work on characters instead of lines : a one-millisecond line-based run would take ten seconds on characters !

Another shortfall of the longest common subsequence approach is that it fails to detect sentence or paragraph swaps, yet these happen quite frequently in the rewriting stages of a wiki page. Indeed, regardless of how clever the diff algorithm is, the diff format itself provides no primitives for moving data around, only “add this” and “remove that” are supported.

I have designed and implemented a custom diff algorithm. Instead of working based on “add” and “remove” primitives, it carries a data segment where all the new content is stored, and works with “blit from old” and “blit from data segment” instructions to construct the new version from the old. Its format is optimized to be stored as JSON, and the diff application algorithm is simple enough to be handled by a short piece of JavaScript if necessary. It’s on GitHub, by the way.

For instance, consider the sentences “The quick brown fox” and “Brown foxes are quick“, and how the former can be transformed into the latter.

A longest common subsequence approach (on characters, assuming this is sane) would determine that the LCS is the seven characters “e quick“, and so it would be written as “ThBrown foxes are quick brown fox” which is indeed the shortest diff you can get using the LCS approach.

My algorithm instead determines that contiguous substrings “rown fox” and “e quick” are present in both sentences, so the diff would be written as “Brown foxes are quick“. The corresponding JSON is quite short, too:

["Bes ar",1,[10,8],5,[-12,7]]

The first element is the data segment (the five new characters), single integers are blit-from-data-segment (length) instructions, and integer pairs are blit-from-old (offset,length) instructions.

The theoretical algorithm complexity is O(m+n), the actual implementation on GitHub is worst-case O(mn) because I used lists instead of hash tables at one point, but should actually be O(m+n) on any input that happens to be written in a human language because working on a human language almost guarantees that those lists will contain zero or one elements in most situations.

The algorithm itself relies on statistical properties of character pairs to quickly match up sequences from both strings, such as by finding out that the ox pair only appears once in each sentence, or that the Br pair only appears in the new sentence. On medium-sized texts (up to 50,000 characters), there are several character pairs that are unique or missing, which helps the algorithm guess correctly where each piece came from. Once the source of the pieces is identified, generating the diff is extremely simple.

A Decentralized Dating Protocol

I’ve been wondering about how a decentralized dating protocol could be implemented.

Here’s the idea : Alice would love to date Bob, and Bob would also love to date Alice, but both of them are too shy to actually ask the other out. The good news is, Alice and Bob are the protagonists of almost every single cryptography protocol out there, and they naturally have asymmetric encryption keys. So, they resort to using the MAP (Mutual Attraction Protocol) to reveal whether they are interested in each other, but only if the feeling is indeed mutual.

An elementary implementation of the protocol could behave as such:

  1. Bob wishes to tell Alice about his feelings. He concatenates a fixed ILOVEYOU prefix and random suffix RAND, encrypts it with Alice’s public key APub, then with his private key BPriv, This creates the cipher BPriv(APub(ILOVEYOU || RAND)).
  2. He then publishes the cipher to a public location in a way that can be easily found by anyone looking for it — such as a database indexed by the public key of the uploader.
  3. Alice wishes to know whether Bob has feelings about her. She scours the public databases for any items matching Bob’s public key, and stumbles upon the aforementioned BPriv(APub(ILOVEYOU || RAND)) cipher.
  4. Shen then decrypts it with Bob’s public key, and uses her own private key to decrypt the result. She ends up with ILOVEYOU || RAND and determines, based on the presence of the initial ILOVEYOU that there’s indeed a love interest around.

The protocol is of course symmetrical: when you wish to express your interest, you should both publish it to the public database and look through that database for a reciprocal interest.

This version of MAP has the following interesting properties:

  • Only Bob can publish Bob’s feelings on the public database, because doing so requires encrypting them with Bob’s private key.
  • Only Alice can determine whether a given entry on the public database is about her, because doing so involves her own private key.

However, it assumes that Alice will only search for Bob’s message if she is herself interested in him. This is not necessarily true : Alice might be running some software that automatically searches through the public database for people who like her. This clearly violates the constraint that the feelings should only be revealed if they are indeed mutual — that, before Alice can determine whether Bob posted an “I’d date Alice” message, she must post an “I’d date Bob” message herself for Bob to read.

The question I am stuck on is, can these constraints be implemented without involving a trusted third party ?

Article image © Nattu — Flickr

Software Patents – Why, Why Not

Intellectual property exists for a reason: protecting creators from the theft of their creations. Not in the sense that the idea is stolen — idea theft would imply that the original owner would not have access to the idea anymore — but rather that the opportunity to monetize that idea is stolen or destroyed by someone with the industrial and commercial firepower to completely take over the market. Georges Méliès created the 1902 motion picture Le Voyage Dans La Lune, which was a special effects masterpiece at the time, but was denied the opportunity to monetize it in the United States because of piracy and eventually went bankrupt. The question asked by intellectual property supporters is, would people invest time and money in creating new motion pictures if they knew the same thing could happen to them? Copyright is intended to ensure that, if someone runs with the author’s work, the author can shut them down or make them pay.

The same reasoning exists behind patents: inventors spend time and money on the arduous process of elaborating and perfecting new machines or processes. Then, he starts selling the invention, and inevitably, other people and companies will notice. People and companies that did not participate in the elaboration, that have a better manufacturing infrastructure, a better sales network, and are not weighed down by the cost of researching the invention in the first place, so they can easily manufacture and sell the invention, outmatch the inventor on the market due to lower prices and larger quantities, and keep the profit to themselves. But if the inventor holds a patent, he can force the rogue manufacturers to pay him for his invention.

Perhaps the single greatest example of modern patents is the pharmaceutical industry. The cost to creating a new drug is tremendous, not because thinking of new molecules is a strenuous activity, but because of the many clinical trials required to turn tens of candidate molecules into a single FDA-approved doctor-prescribed drug. But once the drug exists, competitors can find out its composition at a fraction of the original cost (after all, the composition is published as part of the approval process), manufacture it, and sell it at a price that does not need to pay for the original research and development. Were they allowed to do so, one would expect the pharmaceutical innovation to freeze in a prisoner’s dilemma as stealing the drugs of others is far more profitable than inventing your own.

Patents work in these traditional industries for two reasons:

  • Inventing the patented mechanism or creation is a costly, arduous process, but creating a competing product based on the invention is comparatively cheap. This means that without patents, copying is more profitable than inventing.
  • Two people inventing the exact same thing without hearing from one another is extremely unlikely, either because the inventions are so esoteric and unusual that no two people would reasonably think of the same thing, or because the inventors are part of a community of experts that regularly hear of what the others are working on.

Software patents protect algorithms — processes executed by computers to transform data and interact with humans and other computers. The problem with patenting software is that the two reasons above are the exception rather than the rule.

Creating competing software products is hard. Assuming that Microsoft invents a wonderful new algorithm and includes it in Windows 8 without patenting it, simply reusing that algorithm does not make the creation of a serious Windows 8 competitor any easier. The modern software industry is so complex and multi-faceted that the reuse of any algorithm, no matter how clever or innovative, would not contribute significantly to the creation of a competing product. Copying over a single algorithm does not grant your competitors access to your network of compatible software, your corporate partnerships, your user base, your software development infrastructure, your reputation, your existing maintenance contracts, or any of the many other ways in which the average software company turns an algorithm into money.

And there’s more. The job of software programmers is to create new algorithms, and there are quite a lot of software programmers around. Hundreds of thousands of software programmers. This means that any algorithm which happens to be a slight modification or adaptation of an existing known algorithm, or the result of a deterministic problem-solving strategy, has already been invented at least a dozen times, currently sleep in a legacy project somewhere, and will be re-invented another dozen times in the coming years.

The possibility that someone might steal their work does not deter these programmers from inventing algorithms.

There is no prisoner’s dilemma in software development where everyone waits for the others to invent new things so they can copy them. We have a thriving start-up community bent on inventing new things that almost never patents anything at all. We have an industry that focuses on the quality and nature of the data it collects rather than the ever-changing processes that are applied to that data. The software industry in general does not need patents to keep the innovation ball rolling.

This, in itself, would be no issue — if software companies want to give money to the USPTO for nothing, let them be.

The problem is that software patents are actually hurting innovation. There is a strong tendency for the USPTO to grant patents for algorithms that are not in any way esoteric or unusual, or even new. As a software programmer, I invent new algorithms as part of my daily showering routine ; there is a significant probability that those algorithms are patented either because they are so obvious that a patent-happy company already thought of them, or because they are a special case of one of the surprisingly broad “any computing device, any storage mechanism, any input mechanism” patents being granted recently. The fact that I independently developed that algorithm through logical reasoning, simple modifications of an existing public domain approach, or even genius, does not absolve me from respecting the patent. Sure, I could weasel my way out of the patent by tweaking the algorithm, but doing so is hard for two reasons that are completely unrelated to the underlying technical principles :

  • The only way for me to know what patent I must work around is to look through all existing patents for those that might be applicable to my situation. This means I need to research existing patents as part of my daily shower routine, which is highly inefficient and wastes a lot of water.
  • Even if I was able to identify the applicable patents, I would certainly be able to create algorithms that still did the job without infringement, but I need to hire a lawyer to determine whether there is indeed no infringement.

All of this leads to a massive increase in the cost of developing new software, on the scale of having to check for copyright infringement every time you write an e-mail message. Where software innovation used to be a single engineer working for a few days on a project (which, let’s admit it, is fairly cheap), it now involves much costlier patent lawyers even in the case when no infringement actually happens. Needless to say, most start-ups don’t bother with the lawyers and live at constant risk of discovering that one of the algorithms they created on their own has been patented by someone else.

And all of this happens even when the patent is obviously invalid due to prior art, because it’s usually cheaper to spend a few days working around the patent than to risk going to court over it.

I will not even go into how the monopolies induced by the patent system hurt the users themselves. There have been plenty of discussions on that already.

But there is a flip side to all of this. Patents are not only about inventing new algorithms and processes, they also pay for the cost of validating them. Some algorithms are not solutions to black-and-white, solved-or-not-solved problems where one can prove on the white board that the invention always produces the correct output. Some problems are so complex that any candidate algorithm must be tested against huge amounts of data, and the testing is not cheap. It might be run against data that the inventor has spent a lot of time collecting from the real world, it might be run against data that the inventor had to buy at a steep price, or it might be tested on real-life humans or customers with the risks and costs this involves. And even if the tests themselves are free, there’s the possibility that one needs to try out hundreds of different algorithms or parameters before an acceptable solution is found.

This is almost the same problem as the pharmaceutical industry, where thinking up new molecules is cheap and running the clinical tests is the real cost. The only difference is that algorithm validation is not a mandatory or even obvious process. The various pieces of user interface in Apple, Facebook or Google products might seem utterly obvious when you see them, but each of them passed grueling internal testing processes and survived thrilling tournaments of feature comparisons before ever going live, and these processes and tournaments cost a lot more than the developers thinking about user interfaces during their morning shower.

I suppose the real question that needs to be asked is, why do we have a patent system in the first place? Is it because innovation cannot exist without patents (which is clearly not the case as far as the software industry is concerned) or is it because we somehow feel that pioneers deserve some kind of reward for being the first, even though they were often not the only ones?

Article image © Stuart Heath — Flickr

AJAX Tiles Game

I started programming life by writing an AI for a simple game called RoboWar (back when it was a shareware game on Macintosh, in the pre-OS X days). Make sure you click the “Edit” buttons at the bottom of that page to view the kind of code I had to deal with. I then moved on to HyperTalk and, when I finally got my hands on a PC, started a ten-year spree in C++.

Recently, I started toying with a similar idea: to create a game where people could write an AI that would play for them, with simple and easily modeled rules, and a really easy way to share your creations with your programmer friends (because just adding your AI to a long list of other AIs is kind of boring and not very social).

So, I wrote a simple game in JavaScript. You feed it one URL for each player, and the game connects to that URL and asks for the next move. So, your AI is one URL, which makes it easy to share. Not to mention that you can probably write that AI in any language you wish, provided it can respond to HTTP requests and read JSON arrays of characters, and you can resort to a lot of fancy stuff (cloud computing, learning from experience, genetic algorithms) that would otherwise not be available in a game like RoboWar. The game board is readily available for you to observe while the game is played. It looks like this:

The rules to the game are fairly simple: pick white tiles to paint them with your color, pick tiles you already own to nuke them (blows up that tile and the four adjacent tiles as well), and connect a large group of tiles you own to a small group of tiles your opponent owns to conquer them. The player with the most tiles when the board is covered, wins. Play it here. I’ve already programmed four different AIs:

  • http://tiles.nicollet.net/ai/?random plays randomly.
  • http://tiles.nicollet.net/ai/?expand creates one single group of tiles and expands it as fast as possible.
  • http://tiles.nicollet.net/ai/?fearful nukes its own tiles when they might become connected to enemy tiles.
  • http://tiles.nicollet.net/ai/?contain attempts to contain the enemy in a prison of nuked tiles.

None of these are particularly good, and a human player can certainly beat them quite easily, but they’re a good challenge for your own AI. If you wish to participate without starting from scratch, you can fork the sample AI on GitHub to create your own (it’s PHP code). Also, the GitHub README contains all the technical details about what data your AI URL receives and what data it should respond with. And, if you’re curious, the game rules are cleanly described in the JavaScript source code.

I’m looking forward to hearing from you. Please share your AIs in the comments below, if you please: I’m really interested in what you might come up with.

99 OCaml Problems

I find that when learning the basics of a new functional language, or introducing oneself to functional programming in general, the 99 Lisp Problems are a good start. You don’t have to run through all of them, but how fast you can solve them says a lot about your fluency in the language, and has many things to teach even to otherwise experienced programmers.

I’ve started committing solutions to those 99 problems in the Objective Caml language to GitHub ; the repository is:

https://github.com/VictorNicollet/99-Problems-OCaml

If you’re interested in starting OCaml programming, have read the manual, and are looking for exercises to get up to speed, consider spending a little time trying to come up with the solutions on your own, using my solutions as a reference. And if you’re an OCaml expert, feel free to propose corrections and improvements : this is what GitHub is for.

Other solutions for OCaml do exist, for instance from Christian Kissig, which is currently more complete than mine. Note that we don’t share the same philosophy as far as writing the code goes, so you can probably look at both sides of the issues. For example, on the “find the last element of a list” problem, Christian throws an exception if no last element exists, while I return None, because I hate exceptions.

Enjoy.



1150 feed subscribers
(readers who polled a feed this week)