Tag Archive for 'Productivity'

Comment Branches

Your job as a developer is to make changes to your software. Writing, testing and debugging those changes takes time.

If your job is anywhere near as hectic as mine, you will have to fix and deploy urgent patches, even when your application code is in a half-written, half-debugged state because of the feature of the month.

This is what branches are for. You keep two versions of the code, one of which is called the trunk and is always ready for deployment, and another which holds the changes that you are working on.

When your feature is done, you merge the two versions together. You want to keep the merge operation painless. To do so, you have several kinds of branches available.

The repository branch is built into SourceSafe, Subversion, git or whatever your tool is. It creates two independent copies of the code, and you need to migrate changes from the trunk to every branch out there as soon as possible, or the merge will make you wish for a sweet and merciful death.

By the way, changeset-oriented tools (like git or mercurial) make this easier, while revision-oriented tools (like Subversion) make it harder.

The feature branch is implemented in the program logic itself. The code you deploy to production supports the new feature, but the feature is turned off for everyone except yourself. This technique is great for adding features, but inefficient when changing existing ones.

A side effect of the feature branch is that you can stress-test new code by rolling it out to increasing numbers of users progressively.
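A feature branch and its progressive rollout could look like the following minimal OCaml sketch. The `developers` list, the `rollout_percent` setting and the hash-based bucketing are all hypothetical choices made for illustration; a real project would probably read them from configuration.

```ocaml
(* Hypothetical identifier for the current user. *)
type user_id = string

(* Users who always see the new feature (here: the developer). *)
let developers : user_id list = [ "me" ]

(* Share of other users, in percent, who get the new feature.
   Raise it step by step to roll the feature out progressively. *)
let rollout_percent = ref 0

(* Deterministic bucket in [0, 99], so a given user gets the same
   answer on every request. Hashtbl.hash is never negative. *)
let bucket (id : user_id) = Hashtbl.hash id mod 100

let new_feature_enabled (id : user_id) =
  List.mem id developers || bucket id < !rollout_percent

(* Both code paths ship to production; the flag picks one. *)
let render_page (id : user_id) =
  if new_feature_enabled id then "new page" else "old page"
```

Because the bucket depends only on the user identifier, raising `rollout_percent` only ever adds users to the new code path; nobody flips back and forth between versions.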

The comment branch is an odd gambit. It involves ripping out an entire module and replacing it with another that has a different interface. This requires large amounts of rewiring all over the code base, and the rewiring will take hours or days before it can even be compiled, let alone tested.

Use a comment structure such as this one:

/*[*/ old code /*|* new code *]*/

It is trivial to build a text-replacement macro that turns the above into the code below and back:

/*[* old code *|*/ new code /*]*/

Use the macro to switch between development mode (when you write new code and desperately try to get it to compile) and fix mode (when you edit the old code and deploy it). For consistency, always commit the old version to the repository.
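The text-replacement macro really can be small. Here is one possible version as an OCaml sketch; it assumes the three markers appear exactly as written above and nowhere else in the source, and the names `replace_tokens`, `to_dev` and `to_fix` are hypothetical.

```ocaml
(* True when [sub] occurs in [s] starting at index [i]. *)
let matches_at s i sub =
  let n = String.length sub in
  i + n <= String.length s && String.sub s i n = sub

(* Scan [s] left to right. At each position, apply the first rule whose
   source token matches there; otherwise copy one character. *)
let replace_tokens rules s =
  let len = String.length s in
  let buf = Buffer.create len in
  let rec go i =
    if i < len then
      match List.find_opt (fun (from, _) -> matches_at s i from) rules with
      | Some (from, into) ->
        Buffer.add_string buf into;
        go (i + String.length from)
      | None ->
        Buffer.add_char buf s.[i];
        go (i + 1)
  in
  go 0;
  Buffer.contents buf

(* Fix mode (old code compiled) to development mode (new code compiled). *)
let to_dev =
  replace_tokens [ ("/*[*/", "/*[*"); ("/*|*", "*|*/"); ("*]*/", "/*]*/") ]

(* Development mode back to fix mode. The closing marker is tried first,
   so its leading slash is consumed as part of the whole marker. *)
let to_fix =
  replace_tokens [ ("/*]*/", "*]*/"); ("*|*/", "/*|*"); ("/*[*", "/*[*/") ]
```

Applying `to_dev` and then `to_fix` returns the original text, so switching modes back and forth is lossless.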

Why use comment branches instead of repository branches? Maybe your source control tool sucks at branches. I use Subversion. Yes, I know. Legacy, pain and unlikely hopes of a brighter future.

When a trunk change touches a part of the code that has been erased or reworked in the branch, it causes a conflict that requires manual intervention, even with git or mercurial. When the branch makes many small changes sprinkled over a large codebase that also routinely receives many small trunk updates, repository branches turn into a merge minefield.

Does your branch involve a small number of well-defined files?

Then you should use repository branches, because conflicts will only happen in those files, and will usually be easy to fix.

Does your branch involve many changes in many files everywhere in the project?

Then use comment branches.

Last and possibly least, there is the TODO-branch. This involves non-breaking, purely cosmetic changes. 25% of my project uses this syntax for historical reasons:

Table.get id |-> function
   | None       -> return 0
   | Some value -> return value.count

Then, a convention change happened, and this is used instead:

let! value_opt = breathe (Table.get id) in
match value_opt with  
   | None       -> return 0
   | Some value -> return value.count

Then, another convention change happened, and this should be used instead:

let! value = breathe_req_or (return 0) (Table.get id) in
return value.count

And then, there’s the current version:

let! value = breathe_req_or (return 0) $ Table.get id in
return value.count

Whenever I change coding conventions, I do not spend the time to reformat the tens of thousands of lines of code in my application. That would be wasteful. Instead, every time a piece of code is refactored, it is refactored to the most recent style.

The same happens when using an old and a new version of a given API. My code uses two libraries for handling HTML forms, uses both JavaScript and CoffeeScript, and has a variety of similar two-hammers-one-nail situations.

These are, for all practical purposes, branches: long-running pieces of work. The benefit of TODO-branches is that code in the middle of such changes is still compatible with the trunk. It all happens in the head of the developer, who remembers what changes should be made the next time a piece of code is rewritten.

Article Image © Dominic Alves — Flickr

NoSQL Is A Premature Optimization

Or so Bob Warfield writes. I happen to agree with the title: optimization using NoSQL means using a server cluster to split the load and scale up, and such an optimization is premature unless you already have the millions of visits it takes to feel growing pains. If I start off on a new project and decide «I’m going to use NoSQL so that it will scale when my project has millions of users», then I am prematurely assuming that my initial NoSQL strategy will fit the actual million-user scenario that will come up years from now. In fact, the bottleneck will probably be in a feature I haven’t even thought of yet, and making it work will probably involve changes in the persistence model. But Bob Warfield goes further than the premature optimization argument:

Point 2:  There is no particular advantage to NoSQL until you reach scales that require it.  In fact it is the opposite, given Point 1.

It’s harder to use.  You wind up having to do more in your application layer to make up for what Relational does that NoSQL can’t that you may rely on.  Take consistency, for example.  As Anand says in his video, “Non-relational systems are not consistent.  Some, like Cassandra, will heal the data.  Some will not.  If yours doesn’t, you will spend a lot of time writing consistency checkers to deal with it.”  This is just one of many issues involved with being productive with NoSQL.

My current SaaS project pivoted from MySQL to CouchDB nearly at the beginning, certainly before we had any customers or any features worth showing. My greatest fear when settling on CouchDB was that I would have to work around NoSQL’s lack of transactions, joins, consistency or whatever else you expect from a database system.

I was sorely mistaken, and so is Bob Warfield.

Even though NoSQL fails to solve many of the low-level problems that SQL eats for breakfast, this does not make it incapable of solving the same high-level problems as traditional relational strategies. You just need to understand how to do it, in the same way that you had to understand relational algebra, joins, indexes and transactions before doing anything worthwhile with SQL. Coming to NoSQL and expecting to solve your problems with the same strategies you used in the relational world is as silly as using a hammer to drive in screws.

For instance, CouchDB has no global consistency, only eventual consistency — there is an inconsistency window where state spread across multiple documents can be inconsistent. This will make any relational programmer scream bloody murder. And yes, if you absolutely and positively need to have that state stay consistent, then you will need some application-side code to do it, and it will ruin your productivity.

But most applications don’t need global consistency; in fact, an inconsistency window of a few seconds is acceptable in most situations. It is the programmers who need global consistency, because they lack the mental tools required to work with eventual consistency. But once you get the hang of it, there are no workarounds, no overhead, no additional steps or checks required to make your application work. It is a different route, but not a longer one.

In addition to the above, from my experience, there are clear and significant benefits to using CouchDB over MySQL that are not related to scalability or performance. These benefits may well be useless to your specific situation, but they do exist.

1. Schema changes are painless and non-locking

This (lesson 3) is what brought me to NoSQL in the first place.

CouchDB does not implement a schema in the way an SQL product rigidly delineates tables, columns and relationships. Of course, it would be foolish to have no schema concept at all, so there is a dedicated schema layer in our application architecture that describes what the CouchDB “tables” look like, in terms of serialization and deserialization. A schema change is therefore simply a change to the deserialization process, which must remain able to read the old data format.

For simple changes, such as adding a field with a constant value, no work is required as the deserialization layer can fill in the missing field on the fly. For complex changes that involve application-provided data, such as adding a “file size” field that needs to be initialized with the actual file size, there is a clear benefit to having the application itself perform the schema change, as opposed to application-independent ALTER scripts.
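A deserialization layer that handles both kinds of schema change could be sketched in OCaml as follows. The `json` type is a hypothetical stand-in for whatever JSON library the project uses, and the `file` record, its `version` and `size` fields and the `size_of` callback are illustrative assumptions, not the author's actual schema layer.

```ocaml
(* Hypothetical stand-in for a parsed CouchDB document. *)
type json = Num of int | Text of string

type doc = (string * json) list

(* Current in-application schema of a hypothetical file document. *)
type file = {
  name : string;
  version : int;  (* added later, with a constant default *)
  size : int;     (* added later, initialized from application data *)
}

let field key (d : doc) = List.assoc_opt key d

(* Reads both the old and the new formats. [size_of] is a hypothetical
   callback asking the application for the actual file size. *)
let file_of_doc ~(size_of : string -> int) (d : doc) : file option =
  match field "name" d with
  | Some (Text name) ->
    let version =
      match field "version" d with
      | Some (Num v) -> v
      | _ -> 1 (* simple change: fill in a constant on the fly *)
    in
    let size =
      match field "size" d with
      | Some (Num s) -> s
      | _ -> size_of name (* complex change: the application computes it *)
    in
    Some { name; version; size }
  | _ -> None

(* Writing always uses the newest format, so old documents are
   upgraded whenever they are saved again. *)
let doc_of_file (f : file) : doc =
  [ ("name", Text f.name); ("version", Num f.version); ("size", Num f.size) ]
```

No ALTER script runs here: old documents are upgraded in memory when read, and on disk the next time they are written.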

2. Document contents can be dynamic

This was the actual reason we settled on CouchDB: our application lets users add their own custom fields to objects, and then filter/sort based on these fields. This requires almost no programming effort (aside, of course, from the user interface involved in doing so) and is nearly as efficient as using static programmer-provided fields.

In the past, I had some experience with managing arbitrary fields on an SQL platform, mostly when I was working with the open source e-commerce platform Magento. Dynamic fields involve significant boilerplate (such as entity-attribute-value tables) and clever tricks to filter efficiently.
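By contrast, a document can simply carry its user-defined fields next to the fixed ones. The OCaml sketch below only illustrates that data shape with an in-memory filter and sort; in CouchDB itself the filtering would be done by a view, and the `contact` record and `filter_and_sort` function are hypothetical.

```ocaml
(* A hypothetical contact document: one fixed field plus user-defined
   fields stored side by side, as a JSON document allows. *)
type contact = {
  email : string;
  custom : (string * string) list; (* field name -> value *)
}

let custom_field name c = List.assoc_opt name c.custom

(* Keep contacts whose custom field [field] equals [value], then sort
   them on another custom field. *)
let filter_and_sort ~where:(field, value) ~sort_by contacts =
  contacts
  |> List.filter (fun c -> custom_field field c = Some value)
  |> List.sort (fun a b ->
         compare (custom_field sort_by a) (custom_field sort_by b))
```

Nothing in this code knows the field names in advance, which is what makes user-created fields cheap compared to an entity-attribute-value layout.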

3. The application-database impedance is lower

A typical SQL schema contains two kinds of relationships: natural relationships such as «an article has an author» between two entities that can and usually will be queried independently, and accidental relationships such as «an article has several tags» that only exist because SQL cannot store the tags in the article table. As a result, extracting an article from an SQL database counter-intuitively requires one query to grab the article itself and another query to grab its tags.

CouchDB does away with accidental relationships completely by storing JSON documents. While this might improve performance in some cases, the main benefit is that object composition, as described by the programmer in the application code, is persisted intuitively, without jumping through the intellectual hoops typical of relational storage.
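In OCaml terms, the article and its tags are one value, and a document store can persist that value as one document. The sketch below uses a `Hashtbl` as a hypothetical stand-in for CouchDB; in SQL the `tags` field would instead need its own table keyed by article identifier and a second query or a join to read it back.

```ocaml
(* The article as the application code describes it. *)
type article = {
  title : string;
  author_id : string; (* natural relationship: a reference by identifier *)
  tags : string list; (* accidental relationship in SQL: embedded here *)
}

(* Hypothetical in-memory stand-in for the document store. Each article,
   tags included, is saved and loaded as a single document. *)
let store : (string, article) Hashtbl.t = Hashtbl.create 16

let save id (a : article) = Hashtbl.replace store id a
let load id : article option = Hashtbl.find_opt store id
```

The author stays a separate entity referenced by identifier, while the tags travel with the article.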

4. An identifier-centric application architecture is possible

What does it mean to be identifier-centric or object-centric? A function to get the full URL of an article, in an object-centric application, is a function that takes an article object as an argument (or possibly a member function of the article object) and returns the article’s full URL. In an identifier-centric application, it would be a function that takes an article identifier as an argument (or possibly a member function of the article identifier class) and returns the full URL.
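The difference can be sketched in a few lines of OCaml. Everything here is hypothetical: the `article` record, the abstract `Article_id` module, the stubbed `fetch` function and the base URL.

```ocaml
type article = { slug : string; section : string }

(* Abstract identifier: callers cannot confuse it with any other string. *)
module Article_id : sig
  type t
  val of_string : string -> t
  val to_string : t -> string
end = struct
  type t = string
  let of_string s = s
  let to_string s = s
end

(* Hypothetical stand-in for a database read. *)
let fetch (id : Article_id.t) : article =
  { slug = Article_id.to_string id; section = "news" }

let base = "https://example.com/"

(* Object-centric: the caller must already hold the article object. *)
let url_of_article (a : article) = base ^ a.section ^ "/" ^ a.slug

(* Identifier-centric: the caller only needs the identifier, and the
   function loads whatever data it depends on by itself. *)
let url_of_id (id : Article_id.t) = url_of_article (fetch id)
```

Callers of `url_of_id` do not need to know which fields the URL depends on, but every call performs a read, which is the performance problem described in the next paragraph.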

Identifier-centric architectures have major design benefits over object-centric ones, with clear consequences in terms of productivity and correctness, but have a major performance problem as the same data is read from the database several times unless some very complex caching strategies are applied — that data might be read using a quite complex SQL query that is hard to keep in cache correctly.

From my experience, the vast majority of queries in a CouchDB application will either query a document by its identifier, or query a view for several key-identifier-document pairs. In short, most of the data manipulated by the application can be easily traced back to an identifier without any specific design effort. And get-document-by-id requests are far easier to cache and optimize than arbitrary SELECT requests, both at the application level (we have a temporary cache that lasts the lifetime of the HTTP request) and with key-value caches like Memcache.
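A cache that lives for the duration of one HTTP request can be as simple as a hash table keyed by document identifier. This OCaml sketch is an assumption about how such a cache might look, not the author's implementation; `read_from_database` is a hypothetical stub that counts its calls so the effect of caching is visible.

```ocaml
(* Hypothetical raw read, standing in for a CouchDB get-by-id request. *)
let reads = ref 0

let read_from_database (id : string) : string =
  incr reads;
  "{\"_id\":\"" ^ id ^ "\"}"

(* One cache per HTTP request: create it when the request arrives and
   drop it when the response is sent, so it never serves stale data
   across requests. *)
type request_cache = (string, string) Hashtbl.t

let new_request_cache () : request_cache = Hashtbl.create 64

let get_by_id (cache : request_cache) (id : string) =
  match Hashtbl.find_opt cache id with
  | Some doc -> doc
  | None ->
    let doc = read_from_database id in
    Hashtbl.add cache id doc;
    doc
```

Because the key is just the identifier, there is no question of which query produced the cached data, unlike with arbitrary SELECT results.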

This may sound like a performance argument, but it isn’t, or at least not in the traditional «NoSQL is faster than SQL» sense. It just means that using NoSQL makes an identifier-centric architecture acceptable in terms of performance.

Article image © Satoru Kikuchi — Flickr

The Art of Development Time Estimates (Part 1)

Writing software takes time, and time is money both in terms of programmer wages and in terms of delayed releases. It makes sense to try and predict ahead of time how long a given feature would take, in order to make an informed decision about whether it should be attempted, reduced or eliminated. If your job is to predict durations, make sure you understand whether you are expected to provide a back-of-the-envelope approximation — with the implication that it could be wrong by an order of magnitude in both directions — or if you’re going for a guarantee — this feature will cost no more than X days of work, unless something really catastrophic occurs, which is what a paying customer wants to know.

If your co-workers ever start using your approximations to define milestones, prepare deadlines and discuss delays, you have not insisted enough on the fact that it was an approximation. When anyone asks me for an on-the-fly estimate, I provide a lower and an upper bound, in the form “this will take somewhere between 2 and 10 days”. This is an outrageously wide range, but it fairly reflects how wrong my on-the-fly estimates can be, and it deters anyone from simply adding up the estimates to come up with a deadline. Yes, some people have tried converting “between 2 and 10 days” to “around 5 days”, but I gave them the evil eye every single time. If anyone needs to turn an approximation into a guarantee, there is no sane reason to use anything but the upper bound.

What Could Go Wrong?

There’s a fairly common mistake to be made with the upper bounds, and I’ve made it myself quite a lot: not being pessimistic enough. We’re lazy humans, so being optimistic is natural: we come up with a few tasks that need to be done, slap a reasonable duration on each one, and add them up. The upper bound is then pulled out of a top hat as being two to three times higher than the lower bound, because that feels right. Quite to the contrary, the upper bound must be calculated by actively looking for those things that can go wrong. Newbie programmers can usually provide a fairly accurate optimistic estimate because knowing what needs to be done is a prerequisite of being a programmer at all, but the pessimistic estimate requires knowledge of what can go wrong, which by definition is an esoteric list of accidents gathered from experience rather than rational forethought:

  • The feature involves changing some code that is unusually brittle or unstable, so time will be needed to either pay the technical debt up front and bring that code back to acceptable quality levels, or soak up the cost of hunting for bugs after the code has been changed. This is the most frequent issue I encounter when dealing with changes to existing software, because not all code is of equal quality regardless of how much effort you put into it.
  • A library does exist, but preliminary analysis failed to observe that it only supports 95% of the required feature set, so additional time is necessary to obtain the missing 5%. Several internet-facing modules in one of my recent projects use a standard library for doing HTTP requests, but I discovered late during development that said library did not support HTTPS, which prompted me to include a second library, and incur technical debt related to having two overlapping libraries in the same project.
  • The library does fulfill all requirements, but happens to contain an obscure bug that prevents the feature from working as expected, so more time is spent trying to work around the bug and get the library authors to fix it. This is especially nasty when no replacement is possible, such as with bugs in a database server.
  • The code works as written, but QA testing reveals massive performance issues on typical user input, and time is required to correct the issue. On an older project, I used a jQuery plugin for handling five-star ratings, with a single rating component costing 300 milliseconds in initialization — nothing noticeable on our test pages where only one component was used, but it brought the page load time to an unacceptable three seconds because users created feedback polls with dozens of such components.
  • The programmer who implemented the first half of the feature is ill, on vacation, fired, fighting fires on another project, demotivated, stuck in the snow or otherwise unavailable. Another developer is brought in and needs to spend some time getting familiar with the half-completed code (and getting that uncommitted code from the unavailable developer’s laptop was, in itself, a delay).
  • The programmer who implemented the feature delivered an incomplete buggy product several days late.
  • The programmer misunderstood the requirements and implemented the wrong feature.
  • An unforeseen edge case is detected that has severe consequences on the application architecture. For instance, a given server-side process was assumed to be synchronous but is discovered to be asynchronous with latencies of several minutes on high server load. This makes the original plans for a five-second loading page obsolete, and calls for a costlier, asynchronous “we’ll start working on this and notify you when we’re done” user interface strategy instead.

The list goes on. Think of it as a shopping list you can go through when coming up with a pessimistic upper bound — start with the lower bound and add possible accidents.

Announcing a “between 2 and 10 days” range out of the blue can sound ridiculous, but it is much less so when it’s actually backed by a list of potential problems. In my experience, eight days spent working around library issues, obscure edge cases and performance problems is actually pretty normal if these problems do come up.
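The arithmetic behind such a range is simple. In this OCaml sketch, the task list, the chosen accidents and their costs in days are hypothetical numbers picked to match the 2-to-10-day example.

```ocaml
(* Hypothetical planned tasks for a feature, in days. *)
let tasks = [ ("write the code", 1); ("test and deploy", 1) ]

(* Hypothetical accidents picked from the checklist above, with the
   extra days each one would cost if it happened. *)
let accidents =
  [ ("library misses part of the feature set", 3);
    ("unforeseen edge case", 3);
    ("performance problem found in QA", 2) ]

let total items = List.fold_left (fun acc (_, days) -> acc + days) 0 items

(* Lower bound: everything goes according to plan.
   Upper bound: the lower bound plus every accident on the list. *)
let lower = total tasks
let upper = lower + total accidents

let () = Printf.printf "between %d and %d days\n" lower upper
(* prints: between 2 and 10 days *)
```

Each day above the lower bound is attached to a named accident, which is what makes the wide range defensible.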

Stay tuned for the next issue, where I will discuss how to work with your team and your stakeholders to lower those estimates.

Article image © Alexandre Pereira Flickr

Agile Code in OCaml

First, a quick bit of background: I’m working on RunOrg [fr], a start-up that provides communities with their own online private social networks à la Facebook. The technology stack is Linux-Apache-CouchDB-OCaml, and this has some implications that I will discuss below.

Facebook has it easy in terms of user management: a user starts existing on their platform the instant they sign up, at which point they fill in their first name and last name, and these are displayed to anyone who is allowed to see any hint of the user’s existence. So, making the first name and last name mandatory is quite acceptable.

At RunOrg, we cannot do this for several reasons:

  • User profiles may be created by communities as part of the membership management toolbox: we have to rely on user A to provide data about user B, and user A usually relies on an email-only source (such as newsletter or mailing-list registrations) where no first name or last name is available.
  • A given user may be part of several independent communities, and may choose to manage their identity separately for each one: appear as John Doe in an innocuous community they trust and John Censored in a more critical community.
  • We also allow users to keep control over whether a community is allowed to publish their name on the internet (as part of the online directory, or as comments on public articles).

Our needs for advanced privacy controls involve a more complex management of both what data is available and how we display it. The good news is, it’s certainly possible to handle all of this elegantly in terms of implementation. The bad news — we didn’t plan everything ahead.

It was only a few days ago that our customer requirements sessions brought up the issue of email-only sources: community managers were frustrated by the fact that our mass import functionality required first names and last names. The problem is, in almost every single programming language out there, making a required field optional is a very dangerous endeavor, because the development team must audit the entire code base to identify which parts of the code assume that the field is required, and decide what should happen when the field is null.

In your average PHP project, try making the user name optional and I can assure you that sentences like «You have been invited by  to this event» will appear. Someone failed to audit the who-invited-you-to-events code. At least with Java or C# you will get a Null Reference Exception of some kind that will show up in the logs and give you the opportunity to hunt down the mistake.

The good news is that our implementation language, OCaml, does not allow null values. Instead, optional values are handled using a different type, known as 'a option, which changes everything. An optional value simply cannot be accessed in the same way as a non-optional value. Trying to do so anyway causes a type error that is picked up by the compiler, so a programmer can rely on these errors to quickly identify all locations in the code that assume the value to be present.

I’ll say it again: in OCaml, a field being optional or mandatory is an assumption that is built into the type of that field, so changing the assumption involves changing the type and breaks all code that does not match the new assumption. Applying breaking changes to an OCaml code base is usually as simple as following a trail of compiler errors.
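Here is what one step of that trail might look like, as a minimal sketch with a hypothetical `user` record and `greeting` function: once the field type changes from `string` to `string option`, the old code no longer type-checks, and the fix must say what happens in the `None` case.

```ocaml
(* The user record after the change: the first name used to be a plain
   [string] and is now a [string option]. *)
type user = { firstname : string option; email : string }

(* This definition compiled before the change and is now a type error,
   because (^) expects a [string] but gets a [string option]:

     let greeting u = "Hello " ^ u.firstname

   The compiler reports it, along with every similar location, and
   each one is rewritten to handle the missing value explicitly. *)
let greeting (u : user) =
  match u.firstname with
  | Some name -> "Hello " ^ name
  | None -> "Hello"
```

Forgetting the `None` branch would in turn trigger a non-exhaustive match warning, so the compiler keeps guiding the audit.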

So, that’s what we did. We already knew that the behavior we wanted was to construct the “display name” of users like this:

  • If either the firstname or the lastname are present, use them (if both are present, use firstname-whitespace-lastname).
  • If neither is present, then the private display name (visible only to the user themselves on their profile page and in the e-mails they receive) should be their e-mail address and the public display name (visible to everyone else) should be the username part of their e-mail address (so john.doe@gmail.com is shown as john.doe@…)

First, we defined two functions that compute the private and public display names based on the first name, last name and e-mail of the user. Then, the compiler error trail led us to all locations where a change was required, where we quickly identified whether the public or private name was to be displayed and replaced the existing code with our new display name functions. In total, a full audit of 40kLOC was done in less than an hour and I have proof that any code that uses the user name now handles the case where the user name is not provided.
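The two display-name functions could look something like this OCaml sketch, which follows the rules listed above. The `user` record and the function names are hypothetical; RunOrg's actual code is not shown in the source.

```ocaml
(* Hypothetical user record: both name parts are now optional. *)
type user = {
  firstname : string option;
  lastname : string option;
  email : string;
}

(* Whichever name parts are present, joined by a space. *)
let full_name u =
  match u.firstname, u.lastname with
  | Some f, Some l -> Some (f ^ " " ^ l)
  | Some n, None | None, Some n -> Some n
  | None, None -> None

(* Shown only to the user: falls back to the full e-mail address. *)
let private_display_name u =
  match full_name u with
  | Some name -> name
  | None -> u.email

(* Shown to everyone else: falls back to the part before the at sign. *)
let public_display_name u =
  match full_name u with
  | Some name -> name
  | None ->
    let local =
      match String.index_opt u.email '@' with
      | Some i -> String.sub u.email 0 i
      | None -> u.email (* not expected for a valid address *)
    in
    local ^ "@…"
```

Every former use of the raw name then becomes a call to one of these two functions, chosen by who is going to see the result.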

The Rules

When working on any OCaml project, and especially on RunOrg, I follow these few rules:

  • Any assumption must cause a compiler error when broken. Either the code determines on the spot that the assumption is true, or I use the type system to prove that another part of the code already did. This rule took a massive toll on my early productivity, and I attributed it to an inherent cost of making compiler-enforced assumptions, but the real reason was that I was still pretty new at it: the elementary assumption enforcement from my smaller projects was too crude for the richness of RunOrg’s functionality, and it took me six months to refactor my early approach into an elegant and streamlined strategy of encoding assumptions into types.
  • Don’t work around the compiler or cheat with semantics. The initial reaction to a system that complains about every little change you make is to try and work around it by using more generic types or storing information where it does not belong. For instance, an easy solution to the optional name conundrum would have been to store john.doe@… as the name, but doing so would have been semantically incorrect (that’s a placeholder, not a name) and would have polluted the database “name” field with things that are not names and that will be treated differently from names at some point in the future.
  • Don’t accept mediocre code or patterns. Sometimes, design choices in the interface of module A will lead to ugly code in modules B, C and D because an unforeseen usage pattern happens to apply 95% of the time and the interface of module A was not designed with that usage pattern in mind. No amount of cleanup or refactoring in modules B, C and D will solve the problem; the only solution is to go back to the design of module A and change the interface, even if it means that two hundred client modules will break. Keeping my code clean, elegant and short is worth wading through two hundred modules.
  • Perform lazy payments on your technical debt. I could propagate new design changes through my entire code base in one coding session, but this doesn’t mean I should. Instead, I keep a mental todo-list of all the changes that need to be applied, and apply all of them at once, locally, whenever I have to rework a given piece of code for any reason. While it may seem that such a todo-list is hard to keep and that I will inevitably forget parts of it, remember that those design changes came about in order to solve the problem of ugly or mediocre code: by noticing that the code is ugly, I am reminded of the strategies I set up to clean it up.
  • But be eager with small payments. If it’s a matter of moving a few functions around or refactoring a small piece of code, I do it as soon as I am done writing or rewriting it. Cleaning up little odd bits in a mostly clean code base is extremely rewarding.
  • Discover code by trying changes out. If the assumptions are correctly laid out, then the easiest way to determine the implications of a change (whether it will work and how long it will take) is simply to try it out. Following the compiler error trail will quickly reveal how many things are impacted by the change, as well as any unforeseen massive consequences. If it turns out that the change has too much impact, I just roll back my edits.
  • Keep interface patterns to a minimum. The usual justification for having a few interfaces implemented by many parts of the system is «code is easier to reuse», but I disagree. Yes, reuse is a frequent benefit, but certainly not the most essential one. Having few different interfaces means that most of my code can be described using a small vocabulary of interface patterns, and that looking at some code immediately reveals the pattern being used there. It also means that any design change can be expressed in terms of pattern changes, and can be applied almost blindly to all locations where that pattern was used. Last but not least, by using a simple shared vocabulary for large sections of the application, I make it easier to recognize patterns in the more chaotic sections based on how they interact with the cleaner code. It’s easier to determine that two sentences have the same meaning if they share some words.
  • Love your code. In the RunOrg code base, priority 3 is making sure the code is well-designed, clean and free of technical debt, priority 2 is adding new features, priority 1 is making sure there are no bugs, and the drop-everything-you-do-and-work-on-this priority zero is that I should never hate working on the software. Motivation is paramount to keeping the code clean, feature-rich and bug-free, and even to working on the start-up in the first place, so anything that might make me question my dedication to the project or cause me pain while working on it must and will be corrected as soon as possible, regardless of other priorities.
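One common OCaml way to let the type system prove that another part of the code already checked an assumption (the first rule above) is an abstract type whose only constructor performs the check. The sketch below shows this generic technique with a hypothetical `Non_empty` module; it is not necessarily how RunOrg encodes its assumptions.

```ocaml
(* Outside this module, a [Non_empty.t] can only be obtained through
   [check], so holding one is proof that the check already ran. *)
module Non_empty : sig
  type t
  val check : string -> t option (* determine the assumption on the spot *)
  val to_string : t -> string
end = struct
  type t = string
  let check s = if s = "" then None else Some s
  let to_string s = s
end

(* Relies on the assumption without re-checking it: indexing at 0 is
   safe because the argument type guarantees a non-empty string. *)
let initial (name : Non_empty.t) = (Non_empty.to_string name).[0]
```

Passing a plain `string` to `initial` is a compile-time error, so the assumption cannot be silently broken by a caller.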

I’m pretty certain that all of the rules are important, but I do believe the last one is an absolute prerequisite.

Article image © Ergonomik — Flickr

Dealing With Huge Projects

Right now, I’m the only developer working on RunOrg, which happens to be a 45k-line project written in OCaml. According to a common rule of thumb that OCaml code is about three times terser than Java, this is equivalent to managing a 135k-line project in Java. Alone.

OCaml shares a problem with many dynamic languages: it’s very expressive, but there is no general consensus on what architectural best practices should be, so there are literally dozens of different ways a given feature might be implemented that cannot be distinguished on anything but taste. This leads to a variety of unique design choices throughout the application which, despite working well with each other, force programmers to «discover» new architectures every time.

In the end, I believe that the philosophy of using the best tool for every job can easily be taken to painful extremes if you are not careful. You encounter a new problem, pick an unusual but well-adapted solution, and it makes perfect sense to you, so you move on. Months later, you come back and the solution does not make sense anymore because you have forgotten a small detail about how it works or why it was done this way, and you have to hunt that small detail down by reading the code. I’ve pretty much solved this anti-pattern, so I’ll come back to it later.

The main point I’m making here is that for every project, there is an ideal mudball of code that happens to implement everything perfectly and without bugs, all in a single gigantic file, and you cannot write this mudball. No human can manage anything mudballish past a few hundred lines, because you cannot wrap your mortal mind around the possibility that every line might interact with every other line in the project… so, as an architect, you slice up the mudball into more manageable bits that you politely call «modules» in order to reduce the number of things any given line might interact with. We reduce the amount of information we need to cope with by adding big «you don’t need to think about this» signs everywhere (and making sure the signs don’t lie, obviously).

And on a small scale, this works, because you only have a dozen modules and it’s enough to fit in your short-term memory. RunOrg currently has 260 high-level modules, and several times that amount in sub-modules. No UML design, no matter how comprehensive, can make all those modules fit in my mind at once. I must find some «you don’t need to think about this» signs before I can move on.

There are two main ways of slicing a given project into modules: horizontal and vertical.

Vertical slices happen when there are dependencies, and modules look like layers stacked on top of each other, with each layer being allowed to access the layers below. The RunOrg project architecture actually starts with a clean set of vertical slices: the controller layer deals with HTTP actions by using the view layer and the model layer below it, but the view layer cannot access the controller layer, and the model layer cannot access either.

Horizontal slices happen when there are absolutely no dependencies, and modules look like books cleanly arranged next to each other on a shelf. This usually happens when those modules represent the same concept for different purposes. In the RunOrg project, the controller layer is divided into many action modules, with each of these modules handling the HTTP requests for a limited part of the application. For instance, there’s a Login module in charge of handling HTTP requests related to logging in, and a File module in charge of handling HTTP requests related to uploading files. The concept is the same (handle HTTP requests) but the purpose is different (logging in, uploading files). And there is no need for either module to know about the existence of the other.
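Both kinds of slices can be sketched in a single OCaml file. Within a file, a module can only refer to modules defined before it (without `module rec`), and separate compilation units cannot depend on each other cyclically, so the language itself enforces the vertical layering. The module contents below are hypothetical stubs.

```ocaml
module Model = struct
  (* Bottom layer: data access (a stub here). *)
  let find_user id = "user-" ^ string_of_int id
end

module View = struct
  (* Middle layer: cannot refer to Controller, which is defined later. *)
  let page title = "<h1>" ^ title ^ "</h1>"
end

module Controller = struct
  (* Top layer: uses both layers below. Its action modules are
     horizontal slices that never refer to each other. *)
  module Login = struct
    let handle_get () = View.page (Model.find_user 1)
  end

  module File = struct
    let handle_upload name = View.page ("stored " ^ name)
  end
end
```

Deleting `Controller.File` would not affect `Controller.Login` at all, which is exactly what makes them horizontal slices.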

Knowing whether slices are vertical or horizontal immediately tells the programmer which dependencies should be considered for that slice. And it is all recursive: the Login module of the controller layer is further divided into a Login_common bottom layer for common definitions, the root Login top layer for binding everything together, and an intermediary layer of horizontal Login_form, Login_signup, Login_lost slices dedicated to the various independent aspects of logging in. The naming convention helps identify the pattern used.
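As a rough illustration, the Login layering described above might look like this in OCaml. This is a hypothetical sketch, not RunOrg code: the module names come from the text, but the credentials type, the handlers and the path-based dispatch are assumptions made for the example.

```ocaml
(* Bottom layer: definitions shared by every Login slice. *)
module Login_common = struct
  type credentials = { email : string; password : string }

  let normalize_email e = String.lowercase_ascii (String.trim e)
end

(* Middle layer: horizontal slices. Each one may use Login_common,
   but by convention none of them refers to its siblings. *)
module Login_form = struct
  let handle (c : Login_common.credentials) =
    "login attempt for " ^ Login_common.normalize_email c.Login_common.email
end

module Login_signup = struct
  let handle (c : Login_common.credentials) =
    "signup for " ^ Login_common.normalize_email c.Login_common.email
end

module Login_lost = struct
  let handle email = "password reset for " ^ Login_common.normalize_email email
end

(* Top layer: binds the slices together by dispatching on the request path. *)
module Login = struct
  let dispatch path (c : Login_common.credentials) =
    match path with
    | "/login" -> Some (Login_form.handle c)
    | "/signup" -> Some (Login_signup.handle c)
    | "/lost" -> Some (Login_lost.handle c.Login_common.email)
    | _ -> None
end
```

Within a single file, definition order does part of the enforcement: a module can only refer to modules defined before it, so Login_common can never reach up into Login. Keeping the middle slices away from one another is still left to the convention.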

In practice, the slices do not necessarily map to actual namespaces or modules because, especially at very low levels, segregating them at that granularity would be too verbose. For instance, while the controller layer may appear to be made up entirely of horizontal slices, this is not the case: the actions (functions that respond to HTTP requests) are indeed independent horizontal slices, but the layer also contains helpers (functions that provide common functionality to actions) that follow a vertical layering, and a given module will usually mix actions and helpers indiscriminately.

What matters here is that the patterns in use let you easily determine what kind of slice you are dealing with. A pattern is a named convention (action, action helper, view template, table) that the relevant pieces of code respect in terms of:

  • Location: where is it within the module and file hierarchy, and in relation to other constructs within the same module?
  • Structure: what does the code look like? Which parts of the pattern are expected to change, and which parts should always stay the same?
  • Type: what is the signature of the module, class or function defined by the code?
  • Name: is there a common suffix, or a convention for naming entities that follow the pattern?

These are guidelines: a pattern should usually have at least one of these, and the more the better, but you don’t have to implement all four if doing so is counter-productive. Also, a pattern should define a dependency rule: it is generally understood that two pieces of code that follow the same pattern have a relationship dictated by that pattern, and that relationship is usually a horizontal slice.
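One way to make the Type and Name parts of a pattern checkable is an OCaml module signature. The sketch below is hypothetical: the ACTION signature, the _action suffix and the string-list request format are assumptions chosen for illustration, not a description of Ozone.

```ocaml
(* Hypothetical "action" pattern. Type: every action satisfies ACTION.
   Name: every action module ends in "_action". *)
module type ACTION = sig
  val path : string                  (* Location: the URL this action owns *)
  val handle : string list -> string (* Type: request parameters -> page *)
end

module Hello_action : ACTION = struct
  let path = "/hello"

  let handle params =
    match params with
    | [] -> "Hello, stranger"
    | name :: _ -> "Hello, " ^ name
end

module Bye_action : ACTION = struct
  let path = "/bye"
  let handle _params = "Goodbye"
end

(* Because both follow the same pattern, they can be registered uniformly. *)
let actions : (module ACTION) list = [ (module Hello_action); (module Bye_action) ]

let route path params =
  List.find_map
    (fun (module A : ACTION) -> if A.path = path then Some (A.handle params) else None)
    actions
```

Neither action knows the other exists; only the router at the end sees them side by side, which is the horizontal relationship the paragraph describes.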

The important thing about patterns is that they are not an external influence on your project. If you limit yourself to the patterns dictated by the Gang of Four book, or by the framework you are using, you will miss out on the many patterns that emerge naturally within your application. Instead, it is essential to identify, as often as possible, the patterns that appear in your code, clean them up by giving them a name and conventions of location/structure/type/name, and apply them wherever necessary. This makes your code easier for a programmer to recognize, because there are only a handful of fairly generic concepts to learn (the patterns), and everything else can be understood by finding out which patterns it uses. Even better, familiarity with a pattern places a «you don’t need to think about this» sign on the parts of its structure that never change.

And now I have cleverly returned to my previous point: the inevitable conflict between the «use the best tool» rule and the «use the same pattern» rule.

Using the best tool creates the risk of writing code that is very difficult to understand later on, because there are too many special cases. Using the same pattern everywhere causes problems when the pattern is ill suited to the problem being solved, such that it creates code that is too long, too repetitive, or too unsafe.

In my day-to-day routine, I follow the «use the same pattern» rule until it becomes too painful. Then I change the pattern to make it less painful to use, and propagate the change to every place where it is used, which is only possible because I did use the same pattern everywhere.

Obviously, I don’t have a pattern for everything. So, whenever I encounter a problem for the first time, I go with the best tool instead. Once that kind of problem is solved several times, a pattern will emerge and some refactoring will happen.

Article image © holycalamity — Flickr

Rewrite Your Code

Writing code relies on four kinds of decisions:

  • What algorithm can implement this feature?
  • How is that algorithm best written in that specific language?
  • What platform quirks and subtle edge cases must be accounted for?
  • How does this code fit in with the rest of the application?

Regardless of team experience or preliminary analysis, some of these decisions will be incorrect. Maybe the algorithm failed to take into account the unusual distribution of real-world data; maybe there was a better way to write it; maybe there’s a subtle bug that will not be discovered for weeks; maybe an opportunity for code reuse was not identified during the design phase… or maybe the customer requirements that the feature was based on did not actually match the customer’s needs.

Such bad decisions get in the way of users, but they also hinder developers, who have to regularly work around existing bad decisions, which in turn causes more bad decisions to be made in recurrent “lesser of two evils” situations.

It is a good idea to go back on your bad decisions and make new ones instead. They will not necessarily be good, but at least they will address some of the problems with the old ones.

Don’t try to go back on everything at once. Most of the time, the shortcomings of a decision can be identified in hindsight; change too many things at once and that hindsight is lost. In particular, throwing away non-trivial portions of code (anything beyond a single function) in order to rewrite them from scratch is quite risky, especially since it might also discard good decisions that would be hard to recover.

Don’t make your code difficult to change. Going back on your decisions will involve rewriting code. Lots of it. So far, most of the code in the RunOrg project has been rewritten at least three times. Make sure your language, frameworks, libraries and unit tests all work together to make it easy to change specific parts of your code when you revise a decision. The worst situation a project can be in is code freeze: changing code is forbidden because it’s too risky and might break something. If you suspect that your project might be heading that way, immediately drop everything you are doing and bring your project back to an acceptable state; if you are not allowed to do so, make sure you send out a warning to anyone who might need to know.

Don’t make too many decisions. This is usually spelled out as YAGNI: You Ain’t Gonna Need It. If there is currently no need for a given feature, beyond the fact that it should remain possible in the future, then don’t implement it. Implementing it would involve making many decisions about how it should work, and the lack of a practical application increases the odds that those decisions are wrong.

Don’t be afraid to go back on huge decisions. A few weeks ago, an initial decision we had made on the RunOrg project turned out to have huge performance implications. I was faced with two choices: keep that decision, and manually optimize the locations where performance suffered the most (which involved handling caching and batching by hand); or go back on that decision, re-architect the entire database access system and propagate those changes throughout literally half the project, in order to allow automatic caching and batch construction in ways that manual optimization never could. The rewrite took me four days, with some aftershocks felt several days afterwards (strangely enough, changing 20k lines of code resulted in only four fairly obvious bugs).

What does your decision-making process or pipeline look like? What does your decision postmortem and reversal process look like? How often do you go back on your decisions?

Article image © Barb Crawford – Flickr

Verification Bandwidth

We’re all in the software business, so you already know this to be true: software doesn’t work. Think about it: when you release your current project, it can be (1) on time, (2) full-featured and (3) bug-free. Pick any two.

There are many technical solutions for delivering software without bugs. Architectures. Design methodologies. Frameworks. Programming languages. For a while, I was fairly convinced that Objective Caml was the ultimate solution to software bugs. These solutions don’t work.

Let’s take a simple example: a web site. Users can send private messages to each other, and each message has a web address so that I can send it to the recipient in an e-mail: «You have received a message, click here to view it». Then a web developer writes the code for that address: grab the message identifier from the URL, ask the database for the message contents, mash them up with some HTML and send them to the viewer. That’s a bug right there. What about messages that don’t exist?

If you’re on your average web platform, the server will spit out an error message along the lines of «Silly human, this value is NULL», and it will obviously happen during your demo to your investors. Bad tech start-up, no funding cookie. The good news is that with Objective Caml, values cannot be NULL: nullable references are a billion-dollar mistake that the creators of ML wisely avoided. Instead, you will get a compiler error about using a can-be-null type where a cannot-be-null type was expected. The buggy code will never reach production, the investors will not see anything out of the ordinary and you will get your funding.
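A minimal sketch of the point above, assuming an in-memory list in place of the database; message, find_message and render are hypothetical names. The lookup returns message option, so passing its result straight to render does not compile, and the «no such message» case has to be written out.

```ocaml
(* Hypothetical message record and in-memory "table" standing in for the database. *)
type message = { id : int; body : string }

let messages = [ { id = 1; body = "Hi Alice!" } ]

(* The lookup admits that the message may not exist: it returns [message option]. *)
let find_message id : message option = List.find_opt (fun m -> m.id = id) messages

let render (m : message) = "<p>" ^ m.body ^ "</p>"

(* [render (find_message id)] would be rejected: [message option] is not [message].
   Both cases must be handled explicitly. *)
let view id =
  match find_message id with
  | Some m -> render m
  | None -> "<p>No such message.</p>"
```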

But there’s still a bug. At no point did the code check that the viewer was indeed allowed to view that private message. That’s not a bug Objective Caml or any other automated tool on earth can detect for you. Unless you explain it to them; but if you forget to check this, you will probably also forget to teach the tool to detect that you forgot to check this. Foiled by Occam’s Razor once more. This is a human problem: if no human in the entire development process thought «we really need to make sure private messages really are private», then you can be certain that no automated tool will think of it for them. Until a mischievous user finds out and you get sued.
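What «explaining it to them» can look like in OCaml is sketched below, under assumptions: the Readable module, the recipient field and the viewer-equals-recipient rule are all hypothetical. The abstract type makes the check impossible to skip at each call site, but, as the paragraph says, a human still had to decide that the rule exists and write check in the first place.

```ocaml
(* Hypothetical private message. *)
type message = { recipient : string; body : string }

(* Outside this module, [Readable.t] is abstract: the only way to get one
   is through [check], so the body cannot be read without the permission test. *)
module Readable : sig
  type t
  val check : viewer:string -> message -> t option
  val body : t -> string
end = struct
  type t = message
  let check ~viewer m = if m.recipient = viewer then Some m else None
  let body m = m.body
end

let view ~viewer m =
  match Readable.check ~viewer m with
  | Some r -> "<p>" ^ Readable.body r ^ "</p>"
  | None -> "<p>Access denied.</p>"
```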

Our limited mental capabilities mean that every single human project since we started sharpening sticks and throwing rocks at each other has followed the exact same structure: baby steps interspersed with verification sessions that keep the entire project on course. To fall back on a classic analogy, how do we build bridges? The architect comes up with a general plan: it’s going to be this kind of bridge, going from here to there, built using these materials. Then all kinds of verifications happen: technical (are these materials going to hold?), functional (is it wide enough for a two-lane road?), organizational (can we fund this?). The plans are adjusted to take any new elements into account, and the cycle continues until the bridge is done. And then it turns out no one thought about oscillation frequencies in bridges, and you get the Tacoma Narrows Bridge bug.

Any project is going to evolve under the effect of two distinct forces: implementation and verification. Both ingredients are necessary for success. Not enough implementation — not enough code, not enough plans, not enough features — is usually quite obvious: just count the features and you know you’re not done yet. Not enough verification, on the other hand, is a lot harder to detect because by definition you would need more of it to find out that you need more. As a project lead, this is an extremely important metric to manage: verification bandwidth — the amount of pre-implementation constraints and post-implementation feedback that is collected and applied to the project — will make the difference between a quality product and a dud.

And we already have a lot of tools for doing just that. Specifications aim to crystallize a lot of pre-implementation requirements into document form, which makes it easier to apply to the project than if they were just random comments floating around in collective memory or e-mails. Well-written annotations in a specification document can be a gold mine during implementation. And when bugs are detected after the implementation, bug tracking tools help bridge the gap between testers and implementers.

But there’s more to this than just good specifications and good bug tracking.

Agile folks recognize that while pre-implementation requirements are useful, post-implementation feedback is a much more valuable source of information. The various flavors of Agile development all share one goal: making it as easy as possible to collect post-implementation feedback and apply it to the project. Weekly Scrum meetings, hands-on demos to stakeholders and continuous user testing with short cycles are all ways of improving collection; frequent refactoring, high-quality code and evolving designs are all ways of improving the team’s ability to incorporate feedback into the project.

In fact, the shorter the feedback cycle, the better. This is the general idea behind automated testing: why let frail flesh-and-blood humans handle the testing if a computer can do it for you? Static type systems eliminate the need for a tester to painstakingly traverse every page of your PHP site looking for broken links and null variable errors. Automated unit tests and regression tests let you refactor your entire application without having a human tester look at even one screen. As for what the automated tests cannot find, rely on code reviews or human testers to identify those issues, then retro-fit your automated test suite to detect them. You’re trading a small bit of implementation for what amounts to a large verification payoff.
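As a small, hypothetical example of retro-fitting the suite: suppose a tester once found that malformed message URLs broke the page. The parser and the test list below are illustrations only, and plain assertions stand in for whatever test framework a project actually uses.

```ocaml
(* Hypothetical helper: extract the message id from a path like "/message/42".
   Returns None instead of raising on malformed input. *)
let parse_message_id path =
  match String.split_on_char '/' path with
  | [ ""; "message"; id ] -> int_of_string_opt id
  | _ -> None

(* Regression tests: each entry pins down an issue a human found once,
   so the suite catches it automatically if it ever comes back. *)
let regression_tests =
  [
    ("well-formed id", fun () -> parse_message_id "/message/42" = Some 42);
    ("non-numeric id", fun () -> parse_message_id "/message/abc" = None);
    ("missing id", fun () -> parse_message_id "/message/" = None);
  ]

(* Names of the tests that currently fail. *)
let failures () =
  List.filter_map (fun (name, test) -> if test () then None else Some name) regression_tests
```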

And self-feedback is just as important. Experienced developers who can identify problems in code on their own, before it’s even written, are perhaps the single most important source of software quality. Developers who understand the problem domain and apply common sense efficiently are certainly a huge asset as well.

In any given project, the sources of verification information are the following:

  • Developers – costly to acquire, but the most efficient kind there is
  • Compilers and static analysis tools
  • Automated tests
  • Stakeholder feedback
  • Dedicated testers – especially if they can communicate with developers directly
  • Written specifications
  • End user feedback

Try looking at your current project in terms of verification bandwidths: what are your primary sources of feedback? What are your bottlenecks? How can you improve?

Objective Caml Web Programming

The core RunOrg¹ application clocks in at about 30K lines of Objective Caml code, with around 2K being added every week. If you factor in our use of CouchDB, all of this might strike you as an odd choice of technologies, based on esoteric, hopeful fantasies instead of cold pragmatic considerations. It isn’t, despite what others might say:

OCaml: You know yourself to be fast, smart, and extremely reliable. However, you look kind of funny and nobody really wants to talk to you. You spend most of your time sitting in a public library glaring at people, occasionally yelling “NOBODY HERE APPRECIATES MY GENIUS!” and getting kicked out.

Two years ago, I discussed the topic of using Objective Caml for web programming:

What would happen if a compact web framework were proposed? One that, in addition to borrowing existing useful concepts from other languages, also added some OCaml-specific features to the mix. Functional modules would be an interesting addition, so would be the type system and pure functional programming applied to transactions, and monadic optimization at initialization time would also be quite interesting.

Eliom

Let’s get this out of the way first. I have been continuously peeking at Ocsigen – Eliom (a web server and assorted web framework) ever since it was mentioned in a comment, and some aspects of it resonated with me while others really did not. In many ways, it served as a showcase of how the peculiarities of Objective Caml can impact the development of a web project, and helped me decide whether these were appropriate or not. This evolved into my own rendition of a web framework, Ozone, connected to an Apache server through OcamlNet2-powered FastCGI.

There were many reasons for avoiding Ocsigen – Eliom, though I do not believe any of them to be universally true. The main reason was described in Guillaume Yziquel’s comment on that article:

Somehow, even a Ruby on Rails app is a state machine. Perhaps a “better state machine”, but a state machine nonetheless, in the sense that incoming requests interact with each other by modifying the internal data.

With Ocsigen / Eliom, it’s completely different: it’s a “safely” multithreaded, compiled, application. And that makes all the difference.

Based on my experience with Ocsigen – Eliom, I fully agree with this assertion, but consider it a liability in my situation. Our business plans call for a number of users that cannot safely be expected to all run out of a single server, even a multi-threaded one, for both scaling and redundancy reasons. At some point, the only communication bridge between two requests will be the database back-end, and I need my web framework to accept that and make sure that my one-server code will gracefully scale up to a multi-server setup.

On a more philosophical level, I agree that «On [the] server side, somehow, the “state machine” paradigm has been a hindrance», but HTTP being what it is, this is a basic truth that will not go away. Eliom builds an abstraction on top of it that will keep springing leaks whenever the disconnected nature of HTTP surfaces. This is what ASP.NET and countless other technologies tried to do, and they have all made falling back to HTTP harder when the situation did eventually call for it.

Ozone is also a compiled application, but it has one thread and no sessions: scaling happens by launching more instances of the application, which transparently supports adding or removing servers, while “session data” is stored in a combination of client-side state, database storage and HMAC proof tokens in the URLs. While this ascetic approach cuts me off from the sheer sexy of what Eliom allows, the tradeoff is a fairly convenient set of scalability guarantees. But if you can afford all that Eliom sexy, then I have no issue with that.
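The «HMAC proof tokens in the URLs» idea can be sketched with nothing but the standard library. This is an assumption-laden illustration, not Ozone’s implementation: the URL scheme and the secret are hypothetical, and HMAC-MD5 is used only because OCaml’s Digest module is MD5 (a new design would more likely use a SHA-2 based HMAC from a cryptography library). Any instance that knows the secret can verify a token on its own, so no session has to be shared between servers.

```ocaml
(* HMAC-MD5 as defined in RFC 2104, built on Digest (MD5, 64-byte blocks). *)
let block_size = 64

let hmac_md5 ~key msg =
  (* Keys longer than a block are hashed first, then zero-padded to a block. *)
  let key = if String.length key > block_size then Digest.string key else key in
  let key = key ^ String.make (block_size - String.length key) '\000' in
  let pad c = String.map (fun k -> Char.chr (Char.code k lxor c)) key in
  Digest.string (pad 0x5c ^ Digest.string (pad 0x36 ^ msg))

(* Hypothetical scheme: "/message/<id>?proof=<hex HMAC of id>". *)
let secret = "server-side secret" (* assumption: shared by every instance *)

let proof id = Digest.to_hex (hmac_md5 ~key:secret id)

(* Compare every byte instead of stopping at the first difference. *)
let constant_time_equal a b =
  String.length a = String.length b
  && (let diff = ref 0 in
      String.iteri (fun i c -> diff := !diff lor (Char.code c lxor Char.code b.[i])) a;
      !diff = 0)

let verify id candidate = constant_time_equal (proof id) candidate
```

As a sanity check, RFC 2104’s second test vector (key "Jefe", message "what do ya want for nothing?") should give the hex digest 750c783e6ab0b503eaa86e310a5db738.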

Benefits of OCaml

This is why I use Objective Caml, in no particular order.

  • It’s fast out of the box — OCaml is on par with C performance as long as you don’t stray too far into sub-optimal areas (such as naive string concatenation). I can write any kind of code and be assured that it will not be the bottleneck, because database access and HTTP are a lot slower: right now, the average HTTP request takes about 80ms, with about 60ms for the actual HTTP transfer, 18ms for database latency, and 2ms for all of the Apache-FastCGI-Ozone sequence when compiled without optimizations.
  • It’s a compiled application. This one is mostly aimed at my PHP friends: in PHP, every request starts a new execution from scratch, which makes several designs impossible or impractical, such as event-based programming. That would require B to register as a listener to A’s event, which means B would have to be identified as a potential listener and loaded for every request, even if that request never triggers the event. Once initialized, a given Ozone instance can respond to tens of thousands of requests, which makes it worthwhile to run a lot of pre-processing and pre-caching operations during initialization.
  • It’s safe. I use a programming style that relies on avoiding exceptions, never using wildcards, defining many new types for almost everything, and writing pure functional code. This eliminates entire realms of bugs: using the wrong variable, forgetting to call a function or catch an exception, being surprised by a sneaky side-effect or doing things in the wrong order… About half the bugs I caught using unit tests don’t exist in OCaml (null reference exceptions, anyone?) and the other half is eliminated by my programming style, so I don’t write unit tests anymore (well, I do write an automated “test” every time I find a bug, but it’s usually as simple as adding a type annotation). This also lets me routinely refactor literally half the application every other week without causing any bugs.
  • It’s concise. Most of the features I write take a mere hundred lines, and most of that code stems from my obsessive need to be explicit. Because OCaml is a functional language, you can define a brand new anonymous function on the spot and throw it into another function that is returned by yet another function which is then given to yet yet another function, all of it implicitly type-checked without having to define a single IAcceptsBoxObserver interface or LeafBoxObserver implementation.
  • It has a fast compiler. Building those 30KLOC from scratch takes less than a minute — the average incremental build takes one or two seconds. Whenever I have any doubts about what I’m writing, I can just ask the compiler — Hey, did I forget anything about this function call? Why yes, master, you forgot to check that the user was indeed allowed to reply to that message.
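The «never using wildcards» habit from the list above is easiest to see on a small variant type. In this hypothetical sketch, every match names each constructor, so adding a new one (say, Archived) makes the compiler report every function that does not handle it yet, through warning 8 (non-exhaustive match), which can be turned into an error with -w @8. A `| _ -> false` branch would silently absorb the new constructor instead.

```ocaml
(* Hypothetical message states. *)
type status = Draft | Sent | Read | Deleted

(* Every constructor is listed explicitly: no wildcard to hide a new case. *)
let label = function
  | Draft -> "draft"
  | Sent -> "sent"
  | Read -> "read"
  | Deleted -> "deleted"

(* Even the "everything else" branch names its constructors. *)
let can_edit = function
  | Draft -> true
  | Sent | Read | Deleted -> false
```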

The most essential feature is complete compile-time safety. As a web programmer, I have to be careful about hundreds of small details: can this text be translated into another language? Is this user allowed to do what they just did? Did that object disappear from the database while you were editing it? Does that URL really correspond to an actual page? Did you remember to check for script injection in that piece of HTML? Is this GET parameter available at this point in the code? Is this object available or locked by another user? Did I forget anything else? It’s impossible for a human brain to think about all these things while at the same time creating an elegant design, refactoring a piece of code or writing a new feature. I can use the flexible OCaml type system to check for all these details through appropriate design of the Ozone API, which turns the development process into a two-step game: (1) write the simplest code that works, (2) listen to the compiler’s suggestions for making it fail-proof. It’s a game that I’m becoming fairly fond of, and it lets me concentrate on the very core of what I’m trying to do.
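As one example of building a check into the API design, here is a hypothetical Html module (an assumption for illustration, not Ozone’s actual API) whose type is abstract: outside code can only build Html.t values through text, which escapes its argument, or tag, so passing a raw user string where HTML is expected becomes a compile-time type error rather than a script-injection hole.

```ocaml
module Html : sig
  type t
  val text : string -> t            (* escapes user-provided text *)
  val tag : string -> t list -> t   (* tag names are assumed to come from code *)
  val to_string : t -> string
end = struct
  type t = string

  let escape s =
    let b = Buffer.create (String.length s) in
    String.iter
      (function
        | '<' -> Buffer.add_string b "&lt;"
        | '>' -> Buffer.add_string b "&gt;"
        | '&' -> Buffer.add_string b "&amp;"
        | '"' -> Buffer.add_string b "&quot;"
        | '\'' -> Buffer.add_string b "&#39;"
        | c -> Buffer.add_char b c)
      s;
    Buffer.contents b

  let text s = escape s
  let tag name children = "<" ^ name ^ ">" ^ String.concat "" children ^ "</" ^ name ^ ">"
  let to_string h = h
end
```

With this signature, `Html.tag "p" ["<b>hi</b>"]` is rejected by the compiler, because a string is not an Html.t.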

Disadvantages of OCaml

It’s not a happy fun place. Quite the contrary: the language comes with a set of annoying quirks and flaws that do make things harder. Before you jump in, you should know what to expect.

  • Type-safety has a price. If the type system cannot express a certain thing, then you can’t do it. There are a few fairly complex examples where this has caused me trouble, in areas such as optional function arguments, module meta-programming, JSON serialization or dynamic database-driven data structures. Workarounds exist, but they’re only workarounds. Another side-effect is that type inference can make it hard for inexperienced developers to find an error, especially if you do a lot of strange type wizardry. Not to mention the silly yet annoying “this expression has type foo but is used here with type foo” error.
  • Lack of tools and libraries. Being a non-mainstream language, OCaml has no heavily tweaked and highly evolved tools (think about the wealth of tools available for C# or Java development), which gives development a certain clunky feel. Besides, many libraries that are taken for granted in the mainstream world are missing or undocumented: try connecting to the Facebook API and you’ll notice that not only is there no Facebook SDK in OCaml, but there is also no documented way of using HTTPS. The same goes for Amazon S3 and MD5-based HMACs, by the way. And iconv functionality. And removing the X-Mailer header from e-mail you send. The list goes on.
  • It’s not object-oriented. You can use classes and mutable objects, and it’s a viable implementation strategy, but it also carries many of the typical issues encountered in mainstream programming languages, and it lacks the conciseness of functional approaches (defining a class and instantiating an object is bound to be longer than a lambda). If you’re not in the right mindset for using the language, you will miss out on a lot of the benefits.
  • It’s not popular. It is a disadvantage, just not a technical one. As a programmer I couldn’t care less about the popularity of my language because, you know, COBOL was very popular once. As a hiring manager, I am aware that using a non-popular language will make hiring developers harder. As a start-up founder, I know that this reduces my chances of selling my company because esoteric technologies are a risk to potential buyers.

There are also many tiny quirks in the language that I hope will eventually be fixed. For instance, there’s no shorthand notation for the ubiquitous (fun x -> x # member). There’s also the lack of C#-like properties, with a pure functional twist:

class ['a] cell (init : 'a) = object
  val x = init

  method get_x    = x
  method set_x x' = {< x = x' >}
end

And, of course, there are a lot of things going on with the option type that BatOption just isn’t up to expressing concisely. The P4 preprocessor could handle these situations fairly reasonably, but I would feel more comfortable if they were built into the language (and into syntax highlighting tools).

In conclusion, OCaml + CouchDB provide our team with the flexibility required to build new features frequently without being afraid of subtle bugs or regressions, and to regularly refactor our code into a more amenable mess. It is a level of compiler-provided safety, surgical refactoring and bug detection that would be simply unavailable with C# and Java (and hopeless with PHP, Python or Ruby).

¹ RunOrg is my Start-Up ; we provide an online tool that helps associations, unions, organizations and communities manage their members, contacts, activities, events, knowledge and online presence.

Work ≠ Progress

I did a lot of work today. Mostly, I tracked down and eliminated a nasty little problem related to our @runorg.com email addresses and our DNS records.

DNS is the directory system which determines which particular computer handles the requests to a given domain name. So, if you’re looking for holy-grail.runorg.com, a DNS entry mentions that it points to the machine known on the internet as 188.165.231.88, which happens to be our main production server.

The MX records are used when you’re looking for the mailboxes for that domain. This is because usually, you don’t want your web server to handle your e-mail: it’s handled elsewhere, such as another company server, or maybe gmail. So, you can specify a main DNS entry for your domain and then use the MX record to point to another server specifically for e-mail.

Finally, a CNAME (canonical name) record declares that a name is an alias for another, canonical name. We don’t want our main web site to be available both on http://runorg.com and http://www.runorg.com, because it’s confusing and bad for search engine ranking. So, I added a CNAME record stating that runorg.com is an alias for www.runorg.com.

What I did not take into account (or even know) was that a CNAME record takes precedence over every other record for the same name: a name that has a CNAME is not supposed to have any other records, MX included. So, when someone sent an e-mail to foobar@runorg.com, the domain would be canonicalized and the lookup would proceed as if the address were foobar@www.runorg.com. Since there was no MX record for the latter, the e-mail would then disappear into the void. Our own tools and newsletters apparently ignored the CNAME when sending e-mail, so those messages arrived correctly.
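The failure mode can be modelled in a few lines. This is a toy, not a resolver: the zone is a plain list, mx.mail-provider.example is a made-up mail host, and the only behavior modelled is that a name carrying a CNAME is answered with that CNAME whatever record type was asked for. A real sending server that finds no MX record falls back to the domain’s address record (RFC 5321), which here would have pointed at the web server rather than at a mailbox.

```ocaml
type record = A of string | Mx of string | Cname of string

let records zone name =
  List.filter_map (fun (n, r) -> if n = name then Some r else None) zone

(* Find the mail host for [name]. A CNAME is followed before anything else,
   so an MX stored next to it is never consulted. *)
let mail_host zone name =
  let rec go hops name =
    if hops > 8 then None (* guard against CNAME loops *)
    else
      let rs = records zone name in
      match List.find_map (function Cname t -> Some t | A _ | Mx _ -> None) rs with
      | Some target -> go (hops + 1) target
      | None -> List.find_map (function Mx h -> Some h | A _ | Cname _ -> None) rs
  in
  go 0 name

(* The broken setup: a CNAME at the apex shadows the apex MX. *)
let broken =
  [ ("runorg.com", Cname "www.runorg.com");
    ("runorg.com", Mx "mx.mail-provider.example");
    ("www.runorg.com", A "188.165.231.88") ]

(* A working setup: the apex keeps its A and MX records, and www is the alias. *)
let fixed =
  [ ("runorg.com", A "188.165.231.88");
    ("runorg.com", Mx "mx.mail-provider.example");
    ("www.runorg.com", Cname "runorg.com") ]
```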

So, my entire day was spent hunting down an obscure, unpredictable and not-quite-documented error in my DNS records. It was necessary work and it certainly kept me busy, but it wasn’t progress.

Our team has a looming deadline: the delivery of the first version of our software. That’s when we move from an “implement all the stuff we need before we can deliver” strategy to an “improve or add features to the existing product” strategy (which is an entirely different mechanism). Progress is what brings us closer to that transition: while dealing with the DNS issue was necessary, it did not move me an inch closer to delivering version 1.0.

What is the single largest difference between working as an employee for another firm and working on your own Start-Up? Before I started, I would have guessed it would be the work hours (I now work week-ends quite often), the commute (I work at home because we’re too small to need offices), the freedom (I’m literally my own boss) or the lack of money (no comment). Now it’s pretty obvious that the single greatest difference is that I now emphasize progress more than I emphasize work.

In my previous jobs, there was a fixed set of objectives which had to be accomplished, so I would just come to work every day and chip away at the monolith of work to be done, and since it all had to be done anyway, I could do it in any order I wished. Since I’ve started working on my Start-Up, I find myself increasingly questioning the very objectives I’m trying to accomplish: is this going to let me ship sooner, or not? The freedom of choosing (and discarding) my objectives myself comes with the responsibility of making the right choices.

That’s a question I never asked myself before.

When you think about it, there are many things that are work but not progress. Some are done because it feels easier to do them sooner rather than later. Others are done because, let’s face it, sometimes you have low morale and a neat exciting feature comes up that you’d rather implement even though it’s purely gratuitous (I added a CSV export feature recently that is not necessary in any way, and I know my definition of exciting is weird but bear with me). Others stem from the necessary shame of delivering a half-baked product, but bear in mind that:

If you are not embarrassed by the first version of your product, you’ve launched too late.
- Reid Hoffman, LinkedIn founder

Delivering a huge product with a small under-funded team is ultimately a find-the-shortest-path endeavour. Choose your next objective based on that.

Know Thyself

What do you enjoy about this project? What do you utterly hate about it? What’s the long-term motivation that keeps you working on it: fame, money, passion, something else?

What are the day-to-day activities that you love doing? What would you rather not do? Do these match your actual skills, or are you unskilled in what you love and hate what you’re skilled in?

When and where are you the most productive? Are you a regular worker, or do you have infrequent bursts of productivity?

I’m glad you took the time to know all of these tidbits about yourself. It really helps.

Have you told the people you work with? Have they told you what makes them tick? If not, how do you know you’re working with them and not against them?


