
First, a quick bit of background: I’m working on RunOrg [fr], a start-up that provides communities with their own online private social networks à la facebook. The technology stack is Linux-Apache-CouchDB-OCaml, and this has some implications that I will discuss below.
Facebook has it easy in terms of user management: an user starts existing on their platform the instant they sign up, at which point they fill in their first name and last name, and these are displayed to anyone who is allowed to see any hint of the user’s existence. So, making the first name and last name mandatory are quite acceptable.
At RunOrg, we cannot do this for several reasons:
- User profiles may be created by communities as part of the membership management toolbox: we have to rely on user A to provide data about user B, and user A usually relies on an email-only source (such as newsletter or mailing-list registrations) where no first name or last name is available.
- A given user may be part of several independent communities, and may choose to manage their identity separately for each one: appear as John Doe in an innocuous community they trust and John Censored in a more critical community.
- We also allow users to keep control over whether a community is allowed to publish their name on the internet (as part of the online directory, or as comments on public articles).
Our needs for advanced privacy controls involve a more complex management of both what data is available and how we display it. The good news is, it’s certainly possible to handle all of this elegantly in terms of implementation. The bad news — we didn’t plan everything ahead.
It was only a few days ago that our customer requirements sessions brought up the issue of email-only sources: community managers were frustrated by the fact that our mass import functionality required first names and last names. The problem is, in almost every single programming language out there, making a required field become optional is a very dangerous endeavor because the development team must audit the entire code base to identify which parts of the code assume that the field is required, and describe what should happen when the field is null.
In your average PHP project, try making the user name optional and I can assure you that sentences like «You have been invited by to this event» will appear. Someone failed to audit the who-invited-you-to-events code. At least with Java or C# you will get a Null Reference Exception of some kind that will show up in the logs and give you the opportunity to hunt down the mistake.
The good news is that our implementation language OCaml, does not allow null values. Instead, optional values are handled using a different value type, known as 'a option, which changes everything. An optional value simply cannot be accessed in the same way as a non-optional value. Trying to do so anyway will cause a type error that is picked up by the compiler, so a programmer can rely on these errors to quickly identify all locations the code that assume the value to be present.
I’ll say it again: in OCaml, a field being optional or mandatory is an assumption that is build into the type of that field, so changing the assumption involves changing the type and breaks all code that does not match the new assumption. Applying breaking changes to an OCaml code base is usually as simple as following a trail of compiler errors.
So, that’s what we did. We already knew that the behavior we wanted was to construct the “display name” of users like this:
- If either the firstname or the lastname are present, use them (if both are present, use firstname-whitespace-lastname).
- If none were present, then the private display name (visible only to the user themselves on their profile page and in the e-mails they receive) should be their e-mail address and the public display name (visible to everyone else) should be the username part of their e-mail address (so john.doe@gmail.com is shown as john.doe@…)
First, we defined two functions that compute the private and public display names based on the first name, last name and e-mail of the user. Then, the compiler error trail led us to all locations where a change was required, where we quickly identified whether the public or private name was to be displayed and replaced the existing code with our new display name functions. In total, a full audit of 40kLOC was done in less than an hour and I have proof that any code that uses the user name now handles the case where the user name is not provided.
The Rules
When working on any OCaml project, and especially on RunOrg, I follow these few rules:
- Any assumption must cause a compiler error when broken. Either the code determines on the spot that the assumption is true, or I use the type system to prove that another part of the code already did. This rule took a massive toll on my early productivity, and I attributed it to an inherent cost of making compiler-enforced assumptions, but the real reason was that I was still pretty new at it — the elementary assumption enforcement from my smaller projects was too crude for the needs of the richness of RunOrg functionality, and it took me six months to refactor my early approach into an elegant and streamlined strategy of encoding assumptions into types.
- Don’t work around the compiler or cheat with semantics. The initial reaction to a system that complains about every little change you make is to try and work around it by using more generic types or storing information where it does not belong. For instance, an easy solution to the optional name conundrum would have been to store john.doe@… as the name, but doing so would have been semantically incorrect (that’s a placeholder, not a name) and would have polluted the database “name” field with things that are not names and that will be treated differently from names at some point in the future.
- Don’t accept mediocre code or patterns. Sometimes, design choices in the interface of module A will lead to ugly code in modules B, C and D because an unforeseen usage pattern happens to apply 95% of the time and the interface of module A was not designed with that usage pattern in mind. No amount of cleanup or refactoring in modules B, C and D will solve the problem, the only solution is to go back to the design of module A and change the interface even if it means that two hundred client modules will break. Keeping my code clean, elegant and short is worth wading through two hundred modules.
- Perform lazy payments on your technical debt. I can propagate new design changes through your entire code base in one coding session, but this doesn’t mean I should. Instead, I keep a mental todo-list of all the changes that need to be applied, and apply all of them at once, locally, whenever I have to rework a given piece of code for any reason. While it may seem that such a todo-list is hard to keep and I will inevitably forget parts of it, remember that those design changes came around in order to solve the problem of ugly or mediocre code — by noticing that the code is ugly, I am reminded of the strategies that I set up in order to clean it up.
- But be eager with small payments. If it’s a matter of moving a few functions around or refactoring a small piece of code, I do it as soon as I am done writing or rewriting it. Cleaning up little odd bits in a mostly clean code base is extremely rewarding.
- Discover code by trying changes out. If the assumptions are correctly laid out, then the easiest way to determine the implications of a change — whether it will work and how long it will take — is simply to try it out. Following the compiler error trail will quickly reveal how many things are impacted by the change, as well as any unforeseen massive consequences. If it turns out that the change is too impacting, I just roll back my edits.
- Keep interface patterns to a minimum. The basic idea behind having few different interfaces implemented by many parts of the system is usually expected to be «code is easier to reuse» but I disagree. Yes, that is a frequent benefit, but certainly not the most essential. Having few different interfaces means that most of my code can be described using a small vocabulary of interface patterns, and that looking at some code immediately reveals the pattern being used there. It also means that any design changes can be expressed in term of pattern changes, and can be applied almost blindly to all locations where that pattern was used. Last but not least, by using a simple shared vocabulary for large sections of the applications, I make it easier to recognize patterns in the more chaotic sections based on how they interact with the cleaner code. It’s easier to determine that two sentences have the same meaning if they share some words.
- Love your code. In the RunOrg code base, priority 3 is making sure the code is well-designed, clean and free of technical debt, priority 2 is adding new features, priority 1 is making sure there are no bugs, and the drop-everything-you-do-and-work-on-this priority zero is that I should never hate working on the software. Motivation is paramount to keeping the code clean, feature-rich and bug-free, and even to working on the start-up in the first place, so anything that might make me question my dedication to the project or cause me pain while working on it must and will be corrected as soon as possible, regardless of other priorities.
I’m pretty certain that all of the rules are important, but I do believe the last one is an absolute prerequisite.
Article image © Ergonomik — Flickr
Hi. I'm Victor Nicollet,
Recent Comments