Configuration files are omnipresent in modern software. They usually appear as editable text files that a process loads (more or less dynamically) while it is running. Many design choices separate good configuration files from bad ones.
Location
A very important part of configuration is where the configuration files can be found. Good configuration setups store their files in locations as conventional as possible:
- The root directory of the program, with an explicit name (usually containing “config”). This is usually a Linux convention, but a lot of PHP projects tend to follow it.
- An “etc” directory within the root directory of the program. Alternatively, you may store the system-wide configuration elements in the absolute “/etc/program” path, as long as an alternate target is provided.
- If the program is aimed at Linux users, per-user configuration is usually expected within the home directory, prefixed by a dot (either a directory, such as “.emacs.d”, or a single file, such as “.muttrc”).
- On Windows, the registry is advisable for user programs (store global configuration as global keys if allowed to, and store per-user configuration as per-user keys).
- In Java, “foobar.properties” is usually expected to be in the same path as the class “foobar” for which it was defined.
Hiding configuration files in other locations is possible, but I would advise against it—it forces administrators, developers and installers to hunt for the files (either on their file system, or through documentation).
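As a rough illustration of the conventions listed above, here is a minimal Python sketch that looks for a configuration file in a few conventional places, most specific first. The file names (“.foobarrc”, “foobar.conf” and so on) and the search order are assumptions made for the example, not a standard.

```python
from pathlib import Path
from typing import Iterable, List, Optional


def candidate_paths(program: str, root: Path, home: Path) -> List[Path]:
    """Return conventional locations, most specific first (names are hypothetical)."""
    return [
        home / f".{program}rc",                      # per-user dot file, like .muttrc
        home / f".{program}.d" / "config",           # per-user dot directory, like .emacs.d
        root / f"{program}.config",                  # explicit name in the program root
        root / "etc" / f"{program}.conf",            # "etc" directory inside the program root
        Path("/etc") / program / f"{program}.conf",  # system-wide location
    ]


def find_config(paths: Iterable[Path]) -> Optional[Path]:
    """Return the first candidate that exists as a regular file, or None."""
    for path in paths:
        if path.is_file():
            return path
    return None
```

A caller would typically pass `Path.home()` as `home` and the installation directory as `root`, and report the full list of candidates when nothing is found, so that nobody has to hunt for the file.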
It is also possible to store configuration information in the database—this raises the question of where configuration stops and runtime data starts, especially in modern systems like WordPress or Magento that allow heavy-duty reconfiguration of the system through a back-office HTTP interface. This can also be quite annoying at times: for instance, if your development model calls for many developers writing and testing code on their own machines with a shared database for all developers, a system like Magento won’t do because it stores local information (such as the access URL) in the database.
Timing
Another important element is the time when the configuration file is taken into account by the system. There are, mostly, three ways of dealing with configuration files:
- Load at initialization time. When the system or program boots, it reads and parses the configuration file. The advantage of this system is that if the configuration file is broken, the system won’t start, so it’s quite probable that the system administrator will be on hand to correct things as required. Besides, it’s also quite easy to implement. The downside is that you have to restart the system to take changes into account.
- Load on demand. The file is loaded at initialization time, but can be reloaded on demand (the classic way on a Debian box is to run “/etc/init.d/program reload”, for instance). Should an error appear in the file, the reload fails with an error but the old configuration is kept, so the program keeps running while an administrator corrects the error. This approach is harder to get right, especially since it requires a communication channel to signal the change in configuration to the program.
- Load on every request. This is the case for configuration files that affect the behavior of a frequently occurring action (such as receiving an HTTP request for an Apache server). Whenever the action is performed, the configuration file is reloaded. The advantage of this solution is that there is no manual reloading to be done. The downside is that the configuration will not be tested until the action is performed, which might happen a while after the administrator left (of course, in a perfect world people would test any modifications they make on a live system before leaving).
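The load-on-demand behavior described above can be sketched as follows. This is only an illustration: the JSON format, the `ReloadableConfig` class and its method names are assumptions made for the example.

```python
import json
from pathlib import Path


class ReloadableConfig:
    """Holds the active configuration and reloads it only on request."""

    def __init__(self, path):
        self.path = Path(path)
        # Load at initialization: a broken file stops the program from starting.
        self.values = self._parse()

    def _parse(self):
        with self.path.open(encoding="utf-8") as handle:
            data = json.load(handle)
        if not isinstance(data, dict):
            raise ValueError("configuration must be a JSON object")
        return data

    def reload(self):
        """Try to reload; on failure keep the old values and return False."""
        try:
            new_values = self._parse()
        except (OSError, ValueError) as error:  # JSON errors are ValueErrors
            print(f"reload failed, keeping old configuration: {error}")
            return False
        self.values = new_values
        return True
```

The communication channel is left out here. On a Unix system one common choice is a signal handler, for instance `signal.signal(signal.SIGHUP, lambda signum, frame: config.reload())`, which is what a “reload” init script usually ends up triggering.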
Let’s not forget the issue of synchronization: does a configuration file reflect the current configuration of a running program? In the first two cases, not necessarily: an administrator could forget to restart or reload the system, leaving a running system that uses the old configuration next to a configuration file that contains untested modifications. Rebooting such a system is certain death. Some administrators go as far as rebooting a system once per night, since the ability to come back online quickly and correctly is taken as a sign of good health of the system.
My personal advice on this is to use load on demand. It improves performance over load on every request (there is no need to reload the configuration each time) as well as safety (you immediately know if something went wrong). To compensate for the synchronization issue, periodically check whether the currently enabled configuration is older than the configuration file itself, and issue a warning if it is. Keep a backup of the current configuration somewhere in case you need to reboot.
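The periodic check can be as simple as comparing the file's modification time with the time at which the running configuration was loaded. A minimal sketch, assuming the program recorded that load time as a Unix timestamp (the function names are hypothetical):

```python
import logging
import os


def config_is_stale(path, loaded_at):
    """True if the file was modified after the running configuration was loaded."""
    return os.path.getmtime(path) > loaded_at


def warn_if_stale(path, loaded_at):
    """Log a warning when the file on disk is newer than what is running."""
    stale = config_is_stale(path, loaded_at)
    if stale:
        logging.warning(
            "%s changed since the running configuration was loaded", path
        )
    return stale
```

Calling `warn_if_stale` from a timer or from an existing housekeeping loop is enough. It only detects that the file changed, not whether the change is valid.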
Another solution for the synchronization issue is to combine editing and reloading. This is what crontab does: it edits the file containing the jobs, checks for validity, and then signals the cron daemon to trigger a reload. The approach is also found in online administration tools (which only commit a modification once it has been validated, and do not save uncommitted modifications at all). Requiring changes to go through an editor, however, reduces interoperability, as it prevents non-human users (such as IDEs, installers, administration consoles or other third-party programs) from modifying the configuration.
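Here is a minimal sketch of the validate-then-commit idea, written as a function rather than an interactive editor. The name `commit_config` and the use of JSON parsing as the validity check are assumptions made for the example; this is not how crontab itself is implemented.

```python
import json
import os
import tempfile


def commit_config(path, new_text, validate=json.loads):
    """Validate new_text, then atomically replace the file at path."""
    validate(new_text)  # raises on invalid content, so nothing is written
    directory = os.path.dirname(os.path.abspath(path))
    # Write to a temporary file in the same directory, then swap it in.
    fd, temp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            handle.write(new_text)
        os.replace(temp_path, path)  # readers see the old or the new file
    except BaseException:
        os.unlink(temp_path)
        raise
```

After a successful commit the caller would signal the running program to reload. Exposing the same step as a function or a command-line tool is one way to keep it usable by installers and other programs.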
Syntax
There are usually three main ways to design a configuration file itself.
- Creating your own language. This one is extremely frequent in the UNIX world. It has the benefit of allowing maximum expressiveness, as the developer can tailor a configuration language to stick as closely to the problem domain as possible. The downside is that it requires administrators to learn yet another scripting language. As a Linux sysadmin, I routinely have to deal with a lot of languages, such as:
  - the general-purpose awk and sed
  - makefiles
  - crontabs
  - shell configuration files (.zshrc, .bashrc)
  - php.ini
  - Apache host definitions, .htaccess and httpd.conf
  - the sudoers file
  - .qmail
  - .emacs.d
  - .muttrc
  - .flrnrc
  - /etc/fstab
  - /etc/passwd (arguably, I can read it faster than I query it with the appropriate programs)
  Besides, don’t really expect your IDE or text editor to provide verification and syntax highlighting for your more obscure scripts.
- Using XML, with an appropriate schema or DTD. While fairly verbose and often ill-suited to configuration tasks, XML is a standard. This means that it’s easy to find validation tools (including, of course, editing it with your IDE and knowing ahead of time if errors are present), parsing tools (so that you don’t have to write a loading module yourself) and transformation tools in the form of XSLT. Of course, not everything can be validated, and not every configuration file has a corresponding DTD: sometimes, developers use XML for the “easy to parse” benefit and don’t really care about who will be writing the configuration.
- Using a programming language. This works fine with dynamic languages that can interpret themselves on the fly, and is most often encountered in PHP (the Mantis bug tracker uses this approach, and the configuration file, written in PHP, is merely a set of assignments to global variables). This is extremely efficient: no parser is required at all, IDE support already exists, the format is much more flexible because the configuration can call whatever APIs you expose to it, and you don’t have to learn another language either. The downside is that there’s a lot of freedom allowed, meaning that a meddling user could break the system if they were allowed to provide a configuration file. So no userland configuration can use this approach, on pain of death, without heavy sandboxing.
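To make the third option concrete, here is a minimal sketch in Python rather than PHP. The `load_python_config` helper and the setting names are hypothetical, and the code deliberately performs no sandboxing, which is exactly the danger described above.

```python
def load_python_config(source, defaults=None):
    """Execute configuration source and return the names it defined.

    WARNING: this runs arbitrary code. Only use it on trusted files.
    """
    namespace = dict(defaults or {})
    exec(compile(source, "<config>", "exec"), namespace)
    # Drop dunder entries such as __builtins__ that exec adds.
    return {k: v for k, v in namespace.items() if not k.startswith("__")}


# A configuration "file" that is merely a set of assignments.
EXAMPLE = """
db_host = "localhost"
db_port = 5432
cache_seconds = 60 * 15   # expressions come for free
"""

settings = load_python_config(EXAMPLE, defaults={"debug": False})
```

No parser had to be written, and the file can compute values, but nothing stops it from importing modules or deleting files either.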
This choice is yours, as I have no advice to give here: XML can be nice but is overly verbose, custom formats always have to be learned, and using the programming language itself can be dangerous because it’s Turing-complete.
Can one imagine a replacement for XML? A language with Turing-complete verification rules, a nicer syntax, and even easier parsing? Or even an improvement on XML schemas that could perform more in-depth checks? I think this deserves some more thought.
Hi. I'm Victor Nicollet,