An Exercise in Terseness

Terse: smoothly elegant, devoid of superfluity

Programming languages have cultures that revolve around them, which are both a cause and a consequence of the idioms and techniques found in these languages. Some languages, such as Perl or Objective Caml, have small communities around them that lead to a reasonable consistency in their cultural aspect: while both value terseness as a sign of good design, Perl hackers rely on the wild tentacled expressiveness of the language while Objective Caml hackers rely on streamlined mathematical representations that the language expresses concisely.

Other languages are larger, often due to their application in so many situations that a single language-minded community cannot contain them. JavaScript ranges from download-free-script online communities with non-portable low-quality code to elegant functional programming to code generated from Java, while C++ ranges from academic numerical computations inspired from earlier proceedings in the C language to modern reliance on standard library generic and functional programming to large-scale object-oriented development.

Tentacled Onion

When I say that the Perl language has a wild tentacled expressiveness, it is more of an admiring stance than a negative analysis. Consider, for instance, the “underscore” special variable $_ : in many situations, if no variable is used to assign the result of an expression, then that result is assigned to $_. For instance:

while(defined($_ = <>)) print $_; 

# Equivalent to : 

while(<>) print;

To the average Perl hacker, this concept is an illustration of a fundamental principle (as the standard documentation puts it, “this may seem like an odd thing to you, but you’ll use the construct in almost every Perl script you write”). In terms of language semantics, the behavior is pretty simple to explain: a program is a collection of operations which read values and output values, where the output of one operation is the input of another. This is true for almost all programming languages (with the exception of some languages, such as Prolog).

Writing a program is the description of what operations to run, and what data goes from which operation to which operation. Concise languages tend to describe operations by naming them, and make data flows implicit by various means. Perl takes a “don’t write what you don’t need to specify” that is shared by several other languages from SH to Ruby: when it’s obvious that a value has to be assigned to something, there’s a default assignment target that will keep the programmer happy most of the time (the $_ special variable). And, in general, if an acceptable default behavior exists, it is used.

Go Forth and Multiply

Forth makes data flow completely implicit, through slightly different means: instead of being stored in variables, data is pushed onto a stack. Every operation therefore reads the top values from the stack, and its output is pushed back on top of the stack when it ends. As such, a Forth program is a sequence of words, and every word is an operation.

This extends to “function” definitions, which don’t have to name their parameters and instead read directly from the stack:

: abs dup 0 < if negate then ;

This definition of “abs” duplicates the top element, pushes zero, then pops the two top elements and pushes a boolean determining whether the bottom was smaller than the top, then pops that boolean and, if true, pops the top and pushes its negation.

Of course, terseness and conciseness leads to shorter code for equivalent functionality, and shorter code usually leads to more difficulties in understanding. For instance, long word definitions are frowned upon in Forth because keeping track of the stack is hard (its implicit manipulation is harder to follow than the less concise but more explicit technique of naming values for later use). The same goes for Perl, where the implicit use of a default behavior requires knowledge of what that default behavior is (and even the most war-worn Perl hacker can be surprised by the behavior of an exceptionally devious and elegant piece of code).

Paths, Expressions and Domains

One area where conciseness does not create difficulties in understanding is domain-specific languages. If a general-purpose language can describe a shopping cart in thirty bytes, there’s a high probability that those thirty bytes will be cleverly written and therefore difficult to understand. This is an elementary cookie-based shopping cart implementation in Perl:

@P=sub{lijs while<CKE>;}ul$<chomp; sub{~"http://.*";print}

Just kidding. On the other hand, a thirty-byte description of a shopping cart in a scripting language designed for creating e-commerce web sites is boringly unsurprising: this is what the language was designed for, and one can expect it to get to the point fast enough. Everyone who uses such a language already knows what a shopping cart is and therefore has no difficulties understanding the code.

XPath is an example of a concise yet understandable language: it has a clean and small domain (identification of nodesets within an XML document) and a syntax that matches. For instance, were I to look for all the ‘li’ elements of an XML document that are inside a ‘div’ named “menu”:

 /descendant::div[attribute::class="menu"]/descendant::li

Or, as a shorter version (using the shorthand syntax of XPath):

//div[class="menu"]//li

The equivalent search without XPath in any general-purpose language (even with a good library for traversing XML) would be much longer than that. The same happens for regular expressions: a complex language for the small domain of matching patterns within text (and possibly replacing them with something else), that aims to be concise above all.

An Exercise in Terseness

I have often been frustrated by the verbosity of XSLT as a tree transform language. Looking at the w3schools tutorials, for example, one can see an example such as:

<?xml version="1.0" encoding="ISO-8859-1"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"> 

<xsl:template match="/">
  <html>
  <body>
    <h2>My CD Collection</h2>
    <table border="1">
      <tr bgcolor="#9acd32">
        <th>Title</th>
        <th>Artist</th>
      </tr>
      <xsl:for-each select="catalog/cd">
      <tr>
        <td><xsl:value-of select="title"/></td>
        <td><xsl:value-of select="artist"/></td>
      </tr>
      </xsl:for-each>
    </table>
  </body>
  </html>
</xsl:template> 

</xsl:stylesheet>

This is, at least in my opinion, a mighty long piece of code for such a small thing. Using a stack-based approach similar to Forth, one could make the above look like:

: ~/
  "My CD Collection" /h2
    "Title" "Artist" . ( /th ) "#9acd32" @bgcolor /tr
    ~catalog/cd { $ ~title # ~artist # . ( /td ) /tr } .
  "1" @border /table
/body /html ;

The semantics of this language use two elements: a stack of output node sets and a stack of input node sets. It still works based on words:

Word Semantics XSLT Equivalent
“foo”
Pushes the text node “foo” on top of the output stack. Raw text
/foo
Pops the top of the output stack and replaces it with an element named “foo” containing all the elements that were in the popped node set. <foo></foo>
@foo
Pops the top of the output stack, then adds an attribute named “foo” containing the text representation of all the elements in the popped node set o the new top of the output stack. foo=
~foo
Pops the top of the input stack and pushes any of its contents that match the XPath “foo”. select=”"
$
Duplicate the top of the input stack. Automatic
.
Concatenate the top two output node sets into one. Automatic
{ }
Pops the top of the input stack, pushes an empty node set onto the output stack. For every node in the popped node set, pushes that node on the input stack and executes the code in brackets, adding an implicit . at the end of that code. xsl:for-each
( ) Same as { }, but pops from the output stack instead. XSLT 2.0 only.
# Pops the top of the input stack and pushes its text value to the top of the output stack. xsl:value-of
: ~foo ; Defines a template that matches the pattern “foo”. xsl:template
$: Pops the top of the input stack, pushes an empty node set onto the output stack. Then, pushes every element of the popped node set that matches an existing template onto the input stack and runs that template, then concatenates the top two output node sets. xsl:apply-templates

This shortens the resulting transform script, while reducing its readability (now, the generated XML is not “obviously present” in the script, but rather implied by the sequence of words that construct it).

If you remember the first example from this earlier article, you could rewrite it as such:

: ~/    $: /ul ;
: ~file ~@name # /p /li ;
: ~dir  $ ~@name # /p $: /ul . /li ;

When you’re careful, you can make out the meaning of this. However, it’s harder than with the original:

<xsl:template match="dir">
  <li>
    <p><xsl:value-of select="@name"/></p>
    <ul><xsl:apply-templates/></ul>
  </li>
</xsl:template> 

<xsl:template match="file">
  <li><p><xsl:value-of select="@name"/></p></li>
</xsl:template> 

<xsl:template match="/">
  <ul><xsl:apply-templates/></ul>
</xsl:template>

Parsing the language is extremely easy: a word is either a string literal or a sequence of non-whitespace characters, a template is defined as : ~foo … ; and a program is a sequence of templates. Of course, extracting the loops and conditionals may require some additional effort, but it remains extremely easy to get done. Writing a virtual machine is equally easy as long as XML extraction and manipulation functions for node sets are available. Even static analysis (of stack depth) is quite simple because of the requirement that every template and loop pushes and pops a fixed number of elements from the stack : a linear-time left-to-right traversal can determine whether the number of elements pushed and popped corresponds to the requirements, and generate code that runs the program. If the program outputs the file directly, an optimization pass can be made with a slightly increased difficulty (involving, mostly, executing the individual templates with slightly different semantics) in order to generate the output file linearly.

2 Responses to “An Exercise in Terseness”


  1. Hey all!

    My name is James and I’m new here :) . So far this is an incredible resource for information and I have spent a ton of time reading. Look forward to hearing from you!

Leave a Reply

Your email address will not be published. Required fields are marked *

*

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>



1150 feed subscribers
(readers who polled a feed this week)