Monthly Archive for January, 2012

OCaml Submodule Pattern

My code is quite large for an OCaml project. The main RunOrg repository alone contains 46212 lines of OCaml code (plus an additional 5631 lines of OCaml mli files) — and then, there’s the web framework code and the independent plugins code.

It is Better™ to have many short files than a few long ones. One reason is incremental compilation with ocamlbuild : the smaller your files are, the smaller the percentage of code that must be recompiled when you make a small change. Another reason is that files provide a natural delineation of code that makes it slightly easier to reason about.

The very process of splitting a large file into smaller files is also an excellent way to clean up the code. Every split is an opportunity to move some code to a more generic location — why have a CMember_importParser module when all of its functionality could fit into an OzCsv plugin module ? Even when no such generic solution exists, cutting through the jungle that a 2000-line module contains helps clean up dependencies, identify shared functionality and imagine better ways to design code.

Still, when cutting up code this way, the problem of encapsulation remains. If code that relates to pictures (an upload module, a transform module, a download module, an access rights module) is split across several files, it is desirable to let each file access functions and values from the other files that would not otherwise be shown to modules unrelated to picture processing. For instance, a get_download_link function should be available throughout all picture-related modules, but the rest of the application should use the get_download_link_for_user function that checks whether the user is allowed to download the file.

In order to achieve the several nested levels of encapsulation required to work with modules this way, I have come up with a convention :

  • A module name (and thus, a file name) is composed of segments written in camelCase and separated by underscores. For instance, CEntity_view_grid is a module name containing segments CEntity, view and grid.
  • Modules with only one segment are public. Any other module may include, open or otherwise reference them with no limitations beyond what the module signature says. So, CEntity may access MGroup freely.
  • Modules with N > 1 segments are private. They may only be accessed by modules which share the first N-1 segments. So, CEntity_view is available to modules CEntity and CEntity_edit but not CPicture.
  • A module with N segments may export any module with N+1 segments it can access, possibly under a more restrictive signature. For instance, CEntity can export CEntity_view, which makes it available to all other modules as CEntity.View.
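
As a rough sketch of the last rule, here is how a public module could re-export a private one under a narrower signature. It reuses the picture example from earlier; the module names CPicture_download and CPicture, the link format and the allowed predicate are all assumptions made up for illustration, and nested modules stand in for what would be separate files.

```ocaml
(* Stand-in for a file cPicture_download.ml (hypothetical name):
   private to the CPicture family of modules. *)
module CPicture_download = struct
  (* Unchecked link: meant for picture-related modules only.
     The URL format is invented for this example. *)
  let get_download_link picture_id = "/picture/" ^ picture_id ^ "/download"

  (* Checked link: None when the user may not download the file.
     [allowed] is a hypothetical access-rights predicate. *)
  let get_download_link_for_user ~allowed user picture_id =
    if allowed user picture_id then Some (get_download_link picture_id)
    else None
end

(* Stand-in for cPicture.ml: public, and re-exports the private module
   under a signature that hides the unchecked function. *)
module CPicture = struct
  module Download : sig
    val get_download_link_for_user :
      allowed:(string -> string -> bool) -> string -> string -> string option
  end = CPicture_download
end
```

With this in place, code outside the picture modules can call CPicture.Download.get_download_link_for_user, while CPicture.Download.get_download_link does not type-check because the signature does not mention it.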

To make these rules easier to respect, private module dependencies are made explicit by adding a list of module aliases at the top of each file. The top of my cEntity_view.ml file starts with :

module Sidebar     = CEntity_sidebar
module Unavailable = CEntity_unavailable
module Edit        = CEntity_edit
module Info        = CEntity_view_info
module Directory   = CEntity_view_directory
module Grid        = CEntity_view_grid
module Wall        = CEntity_view_wall

It is forbidden to use a private module without going through such an alias, and it is forbidden to define such an alias anywhere except at the top of the file. This makes it extremely easy to determine whether private access rules are respected.
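
Since the aliases list every private dependency of a file, checking the access rule comes down to comparing module names. The predicate below is a minimal sketch of that comparison, assuming names are split on underscores as described above; may_access and its helpers are hypothetical names, not a tool that ships with the project.

```ocaml
(* Split a module name such as "CEntity_view_grid" into its segments. *)
let segments name = String.split_on_char '_' name

(* [take n l] returns the first [n] elements of [l] (all of [l] if shorter). *)
let rec take n = function
  | x :: rest when n > 0 -> x :: take (n - 1) rest
  | _ -> []

(* [is_prefix p l] is true when list [p] is a prefix of list [l]. *)
let rec is_prefix p l =
  match p, l with
  | [], _ -> true
  | x :: p', y :: l' -> x = y && is_prefix p' l'
  | _ :: _, [] -> false

(* A module with one segment is public. A module with N > 1 segments may
   only be accessed by modules that share its first N-1 segments. *)
let may_access ~from ~target =
  let t = segments target in
  let n = List.length t in
  n = 1 || is_prefix (take (n - 1) t) (segments from)
```

For example, may_access ~from:"CEntity_edit" ~target:"CEntity_view" holds, while may_access ~from:"CPicture" ~target:"CEntity_view" does not.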

The rule of thumb for splitting files (in my particular coding style) is :

  • Code for separate layers (model, view, controller…) goes into separate public modules.
  • For complex code (such as complex rules in model or controller code), consider splitting files larger than 200 lines.
  • For simple code (such as HTML template or JSON serialization definitions), there is no splitting limit except for factoring out common behavior.

freedom.txt

Francis suggested the freedom.txt idea in early January. It’s catching on. I think this is a good idea, although I do not agree with the wording of the message, so here’s mine.

You might not understand what I am rambling about in the two sentences above. What is going on?

Until recently, humans were fairly similar to each other in terms of capabilities. If an individual decided to annoy, harass or harm others, their impact would be limited to what they could do on their own before being stopped, or they would have to convince enough people to help them. When you were a large organized group of people, you only had to care about other large organized groups of people.

Even with the advent of modern technology up to the 1980s, when two people in a stealth bomber could mash tens of thousands to a pulp by pressing a button, this was still the consequence of an industrial infrastructure allowed by a large organized group that willingly granted those two people such power.

Guns are a special case. An individual with a gun can do more damage than average, faster and without retaliation. What happened next is hardly surprising: some people decided to fight fire with fire and buy their own guns as a deterrent, while others decided that gun ownership — or rather, the increase in destructive power provided by guns — should be heavily regulated.

And then, computing and the internet happened. Owning an internet-connected general-purpose device is quite affordable these days.

And owning an internet-connected computer increases the capabilities of individuals by several orders of magnitude, as far as processing data and communicating with others is concerned. While computers are not as lethal as guns (tech support calls excluded), the increase in lethality provided by guns is much smaller than the increase in processing power and communication reach provided by computers. This is literally the first time that we, as a species, have to deal with some individuals having enough power to harm, disrupt or topple large organized groups.

A handful of people can harness the power of computing to send billions of spam messages, bring down entire web sites or networks or nuclear plants, make elite KGB spies green with envy, collect personal information about thousands of people, tell millions of people about a new law that mainstream media are silent about, or illegally distribute content without compensating its creators.

Is it any surprise that corporations, organizations and countries are fighting to regulate such incredible power?

You may not run software on iWhatever devices without consent from Apple. In France, you are legally responsible when third parties commit crimes using your IP address. Various laws around the world including SOPA, PIPA and ACTA aim to provide counter-measures to illegal distribution of content. Germany and the UK are getting anal-retentive about what cookies you are allowed to send to your users.

And yet, there are immense benefits still to be reaped from a free, open, uncensored internet, and I am certain they far outweigh most of the costs involved. To willingly throw away those benefits in order to maintain existing business models and political habits strikes me as a very bad idea indeed, and one we should fight against.

Another alarming development is that the general public, lawmakers included, are woefully incompetent when it comes to computers and the internet. They do not understand why it works, and so they do not understand why the actions they take might prevent it from working, or what negative consequences those actions might have. Laws are drafted and voted on without asking any experts for input, even when experts are quite outspoken against them.

And we, the experts, are the cause. We are dealing with people who thrive on communicating with the public — we are fairly good when it comes to communication, but only with each other. Is it any wonder that no one listens to us? We are so familiar with the details, nooks and crannies of our high-tech world that we fail to explain, in simple terms, why non-experts should care about these issues.

Why did it take something as critical as SOPA and PIPA to get us moving? Shouldn’t we be the ones leading the conversation on computers and the internet, instead of mumbling in our collective niche beards when clueless members of our parliaments speak of «series of tubes»? Could it be that we are so used to wielding awesome individual powers of communication, that we have forgotten how to team up to make our voices heard?

Frameworks, Libraries, Conventions

Funkatron came up with the MicroPHP Manifesto :

I am a PHP developer

  • I am not a Zend Framework or Symfony or CakePHP developer
  • I think PHP is complicated enough

I like building small things

  • I like building small things with simple purposes
  • I like to make things that solve problems
  • I like building small things that work together to solve larger problems

I want less code, not more

  • I want to write less code, not more
  • I want to manage less code, not more
  • I want to support less code, not more
  • I need to justify every piece of code I add to a project

I like simple, readable code

  • I want to write code that is easily understood
  • I want code that is easily verifiable

Unsurprisingly, a large swath of the community did not take it well, for reasons similar to the reaction to my earlier piece against Zend Framework : deviation from the commonly accepted norm.

I have come a long way since I wrote that article, and I must have been walking in circles, because I actually ended up where I originally began : why do we call these things frameworks ?

Zend, Symfony, CakePHP — as well as Node.js, Rails, Django, Ocsigen … — actually contribute three different things to projects that use them.

Libraries

A library provides functionality used for solving general problems in a flexible, standalone manner. Zend_Mail is a classic example of the library aspect of Zend Framework: you can plug it into your application and start sending e-mail. The interface you would use is uncluttered by details that are not directly related to sending e-mail.

The core qualities of a library are its power (how many different aspects of a problem does it let me solve — attachments, rich text, bouncing, MIME handling…) and the clarity of its interface. What problems can you solve, and how fast can you solve them?

Conventions

When you hear «conventions» you immediately think of opening brace positions and variable naming rules. It’s about more than that.

The Model-View-Controller separation is an example of convention: it has been decided that under no circumstances should HTML rendering occur in Model code, no HTTP or session handling should happen in View code, and no SQL queries should happen in Controller code.

Good conventions are designed to let the developers assume interesting properties about the code without having to actually read it. A convention like «no global variables» means I never have to care about global state in my code, ever. A convention like «view code must respect the law of Demeter» means all the data used by the view is right where it is being initialized.

They are also designed to make reuse and interoperability easier by reducing the number of ways in which a possible interface can be implemented. A convention could say the values are passed by assigning them to members post-construction and not as constructor arguments, so you have one less point of contention between the object that is initialized and the object that does the initialization.

Last but not least, conventions are usually based on experience of things that could go wrong if certain behavior is allowed. A typical example is the requirement to escape all strings as they are being output — eliminating any ambiguities as to whether the string has already been escaped elsewhere and should be output as-is: it has not.
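
To make the escaping convention concrete, here is a minimal sketch written in OCaml rather than PHP. The names escape_html and output_text are assumptions chosen for this example, and the set of escaped characters is a common minimal choice rather than anything prescribed by a particular framework.

```ocaml
(* Escape the characters that are significant in HTML text and attributes. *)
let escape_html s =
  let b = Buffer.create (String.length s) in
  String.iter
    (function
      | '&' -> Buffer.add_string b "&amp;"
      | '<' -> Buffer.add_string b "&lt;"
      | '>' -> Buffer.add_string b "&gt;"
      | '"' -> Buffer.add_string b "&quot;"
      | '\'' -> Buffer.add_string b "&#39;"
      | c -> Buffer.add_char b c)
    s;
  Buffer.contents b

(* By convention, the only way to write a string into the page:
   it always escapes, so callers never pre-escape anything. *)
let output_text page s = Buffer.add_string page (escape_html s)
```

Because every string goes through output_text, nobody has to track whether a given value was escaped earlier: by convention it never was.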

Zend comes with a variety of useful conventions enforced through the interface of its tools: this is how you use a view, this is how you define a view helper that should be available from within any view, this is how you bind a piece of code to a URL, and so on. I happen to disagree with many of those conventions myself, because I believe they solve the wrong problems, but they are certainly better than a project with no conventions.

For reference, my PHP conventions are described in the user manual for Ohm.

Framework

A framework goes a step further than mere conventions: it provides super-conventions designed to be respected by plugin authors. The point is that if plugin A and plugin B respect the set of conventions provided by the framework, then they can be used together in the same application.

Consider a practical example : a plugin that implements a CAPTCHA field in a form, and a plugin that displays and submits a form through AJAX. On a bad day, it goes like this :

  1. When an error occurs, the server-side AJAX-form plugin sends out a small piece of JSON containing the fields that have errors, along with the error messages. A small client-side script applies these.
  2. However, the CAPTCHA plugin expected the image to be reloaded when an error occurs. It may either keep the same image and target word (defeating the purpose of a CAPTCHA) or change the target word without knowing that the image could not be changed.
  3. You then need to post on StackOverflow hoping for a solution, search online for a patch to either plugin that could make it work as expected, or read the code of either plugin in order to write the patch yourself.

Had the framework provided a clean notion of « this field must be refreshed on every attempt » as part of its form interface, both plugins would have used it: the CAPTCHA plugin would have marked its field as such, and the AJAX plugin would have implemented a special case for such fields.
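
Here is a rough sketch of what such a shared notion could look like, written in OCaml rather than PHP. Every type and function name below is hypothetical and does not correspond to any real framework; the point is only that the AJAX side never needs to know what a CAPTCHA is.

```ocaml
(* Hypothetical form vocabulary provided by the framework. *)
type field = {
  name : string;
  refresh_on_every_attempt : bool;  (* true for, e.g., a CAPTCHA image *)
}

(* What the AJAX plugin sends back for one field after a failed submission. *)
type update = {
  field : string;
  error : string option;
  refresh : bool;
}

(* CAPTCHA plugin side: declares its field using the shared vocabulary. *)
let captcha_field = { name = "captcha"; refresh_on_every_attempt = true }

(* AJAX plugin side: report every field that has an error, plus every
   field flagged for refresh, in the order the form declares them. *)
let error_response fields errors =
  List.filter_map
    (fun f ->
      let error = List.assoc_opt f.name errors in
      if error <> None || f.refresh_on_every_attempt then
        Some { field = f.name; error; refresh = f.refresh_on_every_attempt }
      else None)
    fields
```

When the e-mail field fails validation, the response then also contains an entry for the CAPTCHA field with refresh set to true, so the client-side script knows it must reload the image.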

As such, the purpose of a framework is to provide a clean, unambiguous and extensive vocabulary that all the plugins should be able to speak, and that is designed to cover as many real-world situations as possible.

Zend Framework and Symfony in particular do an absolutely great job of this. When you can have a pager component push its data to the page through a progressive enhancement component, and log its performance to FirePHP when a user authentication component determines that the viewing user is a developer, and all of it works by plugging square pegs into square holes, you know there has been a lot of great work going on under the hood.

Back to the point

Using a framework is all fun and games until you need to disagree with it. You need to pull out what does not work, and plug your own implementation in its place. The more complex the vocabulary, the harder it will be to write new code: frameworks make it easy to connect existing components, at the cost of having to deal with more concepts when actually implementing new things.

What it boils down to, in the end, is whether you expect to be reusing a lot of third party components, or to write a lot of your own code. In the latter case, MicroPHP — and lean environments that do not have a heavy framework side to them — is actually an improvement over trying to fit a six-inch wooden square peg into a mini-USB port.

The exception to this is, of course, being so familiar with a particular framework that you immediately know what changes you need to make without fighting against third party code.


