Daily Archive for August 22nd, 2008

C++ Warts

Warts, in programming jargon, are any pieces of information appended to a symbol’s name in addition to the core semantic name of that symbol, usually in a highly abbreviated and standardized fashion. Examples of wart-bearing names would be lpszName (where the wart is lpsz), CUserSocket (C), g_project_config (g_), author_t (_t), and next_ (the trailing underscore). In themselves, warts are just a way of conveying additional information about a symbol, usually a property such as its kind (class, template, interface), type (long pointer to a string), scope (global or member) or some more specific information (encoding, being a row or column index, and so on).

Warts fall into three major categories:

  • Useful: these convey relevant information that would not be easily conveyed otherwise.
  • Useless: these convey information that can be readily gathered from the name or the usage.
  • Useless and dangerous: these convey information at the cost of negative side-effects on code quality.

Those that are Useful

To be useful, a wart must be small enough to avoid cluttering the name (most people put the clutter limit at one character), summarize a piece of information that is not obvious from the context of use (and cannot be made obvious by using the language), and provide enough understanding that it can significantly influence the plans of the programmer.

The canonical example of a useful wart is the “interface” wart, which consists of prepending an uppercase I (for interface) to the name of an interface (or, in C++, an abstract class which serves as an interface). The message conveyed is that the symbol represents a type which provides no functionality, and merely defines a list of methods that must be provided by any subtype. There is a big difference between a function with a signature such as:

void Print(const Formatter &fmt)

And a function with a signature like:

void Print(const IFormatter &fmt)

The first function brings up the idea that an instance of the codebase-provided Formatter class must be created, customized either through constructor arguments or through member functions, and then passed to the function. The second function brings up the idea that one must instead use an existing implementation of the formatter, or perhaps create a new one. Of course, the wart loses some of its usefulness when reading code in an editor, because an IntelliSense-like tool can easily convey the same information. However, in documentation summaries or samples, or when working on the whiteboard, it remains useful.
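To make the second case concrete, here is a minimal sketch of what the I wart promises. Everything beyond the name IFormatter is an assumption made for illustration: the Format method, the HexFormatter implementation, and the Render function (which returns the text instead of printing it, so that the result can be checked).

```cpp
#include <sstream>
#include <string>

// Hypothetical interface: no data and no behavior, only a contract.
class IFormatter {
public:
    virtual ~IFormatter() = default;
    virtual std::string Format(int value) const = 0;
};

// One possible implementation, invented for this example.
class HexFormatter : public IFormatter {
public:
    std::string Format(int value) const override {
        std::ostringstream out;
        out << "0x" << std::hex << value;
        return out.str();
    }
};

// The caller only knows the interface, never the concrete class.
std::string Render(const IFormatter &fmt, int value) {
    return fmt.Format(value);
}
```

A caller cannot write IFormatter fmt; the wart tells them so before the compiler does, and sends them looking for something like HexFormatter instead.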

Those that are useless, but benign

The leading contenders here would be the C prefix (indicating that a type is a class), the T prefix (indicating that a symbol is a template), the _t suffix (indicating that a symbol is a type), the g_ and m_ prefixes (indicating respectively that a symbol is a global or member variable). The latter is sometimes found as a trailing underscore.

All of this information either can be clearly inferred from the context in any reasonably well-written program (not only should all programs be well-written, but programs which use warts generally use them in order to write the program well), or is not actually relevant.

The starting point would be the C prefix. Indicating that a symbol is a class type can serve two purposes: differentiating it from non-class types, and differentiating it from non-types. The first distinction is pointless: the only situation when a class behaves differently from another type in a way that actually matters is when looking at PODs (and even then, classes can be PODs as well) and code that relies on that distinction is extremely rare.

As for labeling types to differentiate them from variables, this is a good idea in C, where namespaces do not exist and thus collisions must be avoided between type names and variable names. In C++, however, most serious projects place their types into namespaces, and the global namespace can be used as a last resort if necessary. Furthermore, even though it can be easily solved, ambiguity between type names and other symbols arises only rarely anyway.

The T prefix on templates may appear to be a good idea in the same way as the I prefix. However, a template is nothing more than an ad hoc meta-function which uses ‘<>’ instead of ‘()’ for its parameter list. As such, there is no more need to differentiate templates from non-templates than there is to differentiate functions from non-functions: the template parameter list is almost always present on a template (removing any ambiguity), and the only case where it is not present is when passing the template as an argument to another template, a situation both too rare and too complex to be resolved without knowing what the other template is doing.
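For readers who have never met that rare case, here is a small sketch of a template passed as an argument to another template. MakeFilled is a hypothetical helper invented for this example; it assumes the container has a (count, value) constructor, as std::vector does.

```cpp
#include <cstddef>
#include <vector>

// Container is itself a template, received without its parameter list.
template <template <typename...> class Container, typename T>
Container<T> MakeFilled(std::size_t count, const T &value) {
    // Assumes Container<T> can be built from a count and a value.
    return Container<T>(count, value);
}
```

A call such as MakeFilled<std::vector>(3, 7) names std::vector with no ‘<>’ after it. Whether a T wart would help there depends entirely on what MakeFilled does with its argument, which is the point made above.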

The g_ prefix for global variables does provide information that is relevant and important. However, it can be easily gathered from the context in a well-written program, simply by namespacing it: a namespaced variable is always obviously distinct from a local or member variable. The advantage of adding the namespace prefix is that the ownership of the global variable is more clear (since the namespace will reflect which part of the program the global belongs to, if any), and the prefix can be ignored easily in a context where it is obvious that a global variable is being manipulated. Also, using global variables in C++ is a typical recipe for disaster (in particular because of order-of-initialization problems) and is usually replaced by a global function returning a reference to an internally managed global object instead, making this point more of a historical anecdote.
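The replacement mentioned above, a function returning a reference to an internally managed object, can be sketched as follows. The project namespace, the Config type and its fields are assumptions made up for the example.

```cpp
#include <string>

namespace project {

// Hypothetical settings type; the fields are placeholders.
struct Config {
    std::string name = "default";
    int verbosity = 0;
};

// The object is a function-local static, so it is constructed on the
// first call rather than at some unspecified point during startup.
inline Config &config() {
    static Config instance;
    return instance;
}

}  // namespace project
```

Code then reads project::config().verbosity, where the namespace and the call syntax already say everything a g_ prefix would have said.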

The m_ prefix (or underscore suffix) for member variables is also useless: the vast majority of uses of a member variable happen in a context where its nature is evident from the syntax (such as object.member, object->member or constructor() : member(data)), and there a wart adds nothing. The situation where ambiguity can arise is when accessing a member variable in a member function. Two possibilities exist here: either the function is small enough that the variable, being neither a local variable nor a function parameter, is necessarily a member, or the function is large enough to warrant an explicit marker. This can be done by prepending this-> to the variable name, which serves the same purpose as a wart, but has the advantage of being removable when unnecessary.
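As an illustration of both situations, here is a sketch using a hypothetical Counter class (the class and its members are invented for this example). The members carry no wart; this-> appears only where a reader might otherwise hesitate.

```cpp
class Counter {
public:
    // In the initializer list, the syntax alone says which name is the member.
    explicit Counter(int step) : step(step), total(0) {}

    // With locals and parameters in play, this-> marks the members.
    int Add(int count) {
        int added = count * this->step;
        this->total += added;
        return this->total;
    }

    // Short enough that total can only be a member.
    int Total() const { return total; }

private:
    int step;
    int total;
};
```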

Those that are useless and dangerous

Worse than being useless is being dangerous. Most useless warts carry redundant (and thus useless) information by using up only one or two characters. Some warts, however, can carry incorrect information, use up too much space, or even prevent code from being portable.

Incorrect information appears mostly with warts that carry type information. During the lifetime of a program, it’s possible for the type of a variable to change to accommodate program extensions or bug corrections. Expecting the programmer doing the modification to also propagate the name change is a good idea in theory, but hopeless in practice. Inevitably, out of laziness, honest mistake or looming deadlines, the type wart will become obsolete, and will lead a maintenance programmer into thinking that iLength is an integer when in fact it is a floating-point number.
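The iLength scenario might look like the sketch below. Only the name iLength comes from the text; the Segment struct and the two helper functions are hypothetical, and the history described in the comments is assumed for the sake of the example.

```cpp
// Assumed history: the field started out as int and was later changed
// to double, but nobody renamed it.
struct Segment {
    double iLength;  // the 'i' wart no longer tells the truth
};

// What a maintainer who trusts the wart might write: the value is
// copied into an int and the fractional part is silently dropped.
inline int HalfLengthTrustingWart(const Segment &s) {
    int length = s.iLength;
    return length / 2;
}

// What the code should do once the real type is known.
inline double HalfLength(const Segment &s) {
    return s.iLength / 2;
}
```

For a segment of length 7.5, the first function yields 3 and the second 3.75. Nothing is reported at run time; at best the compiler warns about the narrowing conversion if asked to.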

Using too much space becomes a real difficulty with type warts: either class instances get useless warts such as ‘o’ or ‘obj’ to indicate that they are objects (a rather pointless fact in object-oriented software), or they get more detailed ones. This is to say nothing of the typical long-pointer warts so widespread in the Win32 C API (meet the lplptstrUsername internationalization-friendly pointer to a username). While in itself this is just a readability issue of making variable names longer, these warts often lead programmers to shorten the non-wart part of the name to keep line lengths manageable, leading to lplptstrUsrnm or worse.

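Last but not least, there is the case of the prefix underscore: when an identifier begins with an underscore followed by a capital letter, that identifier is reserved by the implementation. The same goes for any identifier containing a double underscore, and for any identifier beginning with an underscore in the global namespace. This means that code using such names can break when changing compilers or compiler versions.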

Conclusion

While warts are not always bad, many are completely useless, requiring additional uniformization effort for no real gain besides, perhaps, a very subjective aesthetic pleasure and a warm and fuzzy feeling for the instigator. Some are actually dangerous to use despite their initial appeal, because redundant information, when maintained by humans, seldom stays consistent in the face of changes.


