Monthly Archive for November, 2008

MVC in Web Development

Model-View-Controller is an architectural pattern that originated in 1979 in the smalltalk language (paper) for non-web purposes, and set foot timidly but with increasing appeal around 2001 (paper, requires registration). Its primary objective is the separation of concerns:

  • Regardless of how it is accessed by the users, the state of the application should be represented both as data (database or file storage, in-memory representation for performance) and as constraints on its functional behavior (keeping the state consistent, reacting correctly to events). This set of concerns is represented by the model.
  • The program will receive requests from users. These requests will usually follow a medium-specific definition, such as an HTTP GET or POST request, Java RMI or command-line execution of a program, and must be parsed and translated before they can be applied to the underlying model.This set of concerns is represented by the controller.
  • The program will display data back to the users. This representation will usually follow a medium-specific definition, either human-readable or computer-readable, such as HTML, XML, JSON, AMF, CSV, PDF or some other format. The data to be displayed, extracted or computed from the model and the request received by the controller, is then translated and wrapped in renderer-specific details. This set of concerns is represented by the view.

At its core, MVC is nothing more than a set of classification rules which answers the question “where should the code for this functionality go?” with a simple high-level answer. As a consequence, a lot of different flavors exist, each with their own (often slightly distinct) approach, and no universally accepted definition of the One True MVC.

A quick example of minimalistic Model-View-Controller in PHP:

// Controller 
$target = $_GET['target']; 

// Model
session_start();
$stats = array();
if (isset($target)) {
  $stats = array( 'hits' => ++ $SESSION[$target]['hits'], 'last' => $SESSION[$target]['last'] );
  $SESSION[$target]['last'] = date();
} 

// View
echo json_encode($stats);

This example receives a GET HTTP request containing a target. It then computes the number of times that target was queried from within the same session, along with the last time it was queried (if any), and updates the data in memory. Finally, it returns the extracted data through HTTP, encoded as JSON. This would be usable, for instance, as an AJAX helper to determine how many times a certain piece of content (article, video) has been viewed recently.

Large MVC frameworks have evolved. One such example, which was discussed on this blog a few months ago, is the Zend framework, which provides a Controller system (which comprises a Front Controller, responsible for dispatching the request to an user-implemented Action Controller that matches the request URI) and a View system (which is mostly a way of using PHP as a templating system by use of phtml files), along with utilities such as Zend_Db to help the implementation of models.

Another MVC framework I could recently get my hands on is the Alfresco Web Scripts system, which uses the filesystem (and a description XML file) for parsing and dispatching HTTP requests to a set of javascript controllers that can access and manipulate the underlying Alfresco data model, with the output being sent in various formats including XHTML formatted with FreeMarker.

As a whole, the Web Development world earns particular benefits from MVC due to the very nature of the HTTP protocol: every web application reacts in the same “dispatch and parse request, apply modifications and extract data, send back formatted response” archetypal sequence. As such, MVC is an excellent means of separating these three steps into cleanly defined blocks.

Common errors in MVC design

The absence of a precise definition of MVC makes it possible to create a seemingly conforming implementation and label it as MVC, have it go unnoticed by the community at large, but cause issues down the road because the implementation does not, in fact, support the usual benefits of MVC.

A frequently encountered situation is the Anemic Domain Model antipattern, where storage of the application data remains in the model, but the entire verification and behavior code is moved to the controller instead. For instance, the model stores and retrieves a plain text password value, and the various controllers related to authentication, registration and password reset all perform a hash on the password before storing it or comparing it.

In this example, one major advantage of the Model-View-Controller is lost: adding a new controller must now respect the invariants required by the other controllers (the new controller would have to manipulate passwords by using the same hash as the other controllers). By contrast, if the invariant had been made part of the model, then it would have been impossible for a new controller to forget about the invariant.

Consider the common situation where the model in Model-View-Controller is a simple database wrapper around several tables. In such a setup, the views and controllers cannot and should not assume any additional constraints or invariants on top of the constraints and invariants guaranteed by the table structure. If your controller assumes that a VARCHAR(16) database column contains at least four alphanumeric characters because another controller put it there, then there is a high probability that this assumption will break at some annoying point in the next few months. A VARCHAR(16) database column contains sixteen random characters of the database’s current encoding. If you wish to enforce an additional invariant (such as being four or more alphanumeric characters), add that invariant to the model (even if this involves adding more model code than originally envisioned).

The anemic domain model problem appears whenever the separation of concerns (the original objective of the MVC approach) is not correctly performed: the concern of enforcing invariants of data is not cleanly encapsulated by the model and is instead left to float freely in both the model and the controller.

Similarly, failure to correctly separate concerns between the view and model or between the view and controller lead to similar issues. For instance, if a view finds itself computing renderer-agnostic data, it’s quite possible that this computation belongs in the model instead, and keeping it in the view will inevitable lead to reimplementation once the same data will be displayed with a different renderer (such as a web service API).

MVC as a recursive pattern

Resist the temptation to create one controller, one model and one view for your program. In several cases, it can be interesting to have several controllers, models and views. It can be interesting to implement part of a model as a complete model-view-controller triad.

For instance, I would point to the analysis by Morales et al, which in an RIA model (where the client keeps its own Model-View-Controller triple) defines the server-side data manipulation as being a model: in order to support querying the server-side model by several means (local API, REST API, JSON/AJAX/AMF, HTTP-HTML and so on) the actual data model must in turn be separated from a server-side view and a server-side controller. Then, the client-side model interacts with the server-side model through the server-side view and controller, as illustrated here.

However, when considering the situation from the eyes of the client code, the server in itself is just part of the implementation of the client-side model: a remote entity which supports read/write queries through an appropriate protocol, which essentially amounts to a model.

Another situation is when a model is composed of sub-models. For instance, when a shopping cart aggregates several products, it can be interesting to make the product model distinct from the shopping cart model. In turn, the shopping cart view will own a product view that represents the product within a shopping cart, which may different from the product view that represents a product on its own.

As such, the shopping cart page contains a top-level controller and view, which recursively manipulate internal custom controller and view for the product model. The product model is independent and has its own top-level controllers and views.

The conclusion is that an MVC architecture should not be seen as a fixed high-level structure. Instead, controllers and views may use sub-controllers and sub-views that operate on other models (benefitting from the fact that a given model can have several controllers and views associated to it).

Persistent Arrays

Imperative style programming has a performance advantage over pure functional programming. This is related to the problem of aliasing, which is a type of functionality that imperative programming supports but pure functional programming does not.

Consider a button: it can be pressed or not. Consider two data structures within the program which both reference the button, which is initially not pressed. Assuming that someone presses the button, how do the changes propagate?

In the imperative version, the button is mutable. Thus, it’s possible to change its state from “not pressed” to “pressed”, and both data structures will be aware of the changes. In the pure functional version, the button is not pressed, and will stay so forever. The only solution is to replace it with a copy that is pressed. However, the two data structures themselves are also immutable, and will still reference the old version (that is not pressed), which then requires the code to replace them in turn with copies that reference the new version. This propagation goes up the entire data structure up to the context of the current function.

Arrays

The most common example of imperative construct that is has no functional equivalent is the array. An array is a mutable container which supports read and write access to elements identified by consecutive integers. All access happens in constant time.

Is it possible to create an immutable version of an array? Mostly no, with small sprinkles of yes. The main difficulty comes from the fact that once a modification is applied to an immutable array, a new array is created and exists independently of the old one. The obvious and naive implementation, which consists in creating a real in-memory copy, is of course suboptimal because it operates in linear time (in the number of elements) for write-access in order to achieve constant time read-access and also wastes a lot of memory. The other naive implementation, which consists in storing the modifications applied to the array as a linked list, achieves linear time (in the number of modifications) for read-access and constant time write-access.

Even when looking into trickier algorithms, attempting to implement an array using only the classic immutable data types (such as tuples, lists and trees) results in non-constant access time for either read or write operations (sometimes both). The typical implementation has logarithmic read-write access, which is good, but still not good enough.

Persistent Arrays

It’s possible, however, to come quite close to constant-time by giving up on implementing the array using pure functional methods. An example of optimisation is Baker’s algorithm. The Objective Caml version of the algorithm (seen below) is inspired by the implementation of Conchon and Fillitre.

The underlying idea is that it is enough for the array to appear immutable, so that its actual implementation may use mutable concepts including the original mutable arrays. The original implementation starts by representing the immutable array as a mutable array plus a list of modifications. This results in write-access that takes constant time and read-access which takes linear time in the number of modifications. Then, Baker optimizes the read process by using rerooting : instead of representing the current array as the original array plus a list of modifications applied to it, the rerooting algorithm represents the original array as the current array plus a list of modifications applied to it. The rerooting process consists in reversing the list of modifications while applying them in order to the array (which takes linear time in the number of modifications) but once it has been done, it does not have to be repeated until a new change happens, so that if the array is used as a normal mutable array (no backtracking) then the reading happens in amortized constant time.

The data structure in itself follows a basic pattern. Every immutable array is either an array node (contains a mutable array) or a diff node (contains a modification as a value-index pair, and a reference to the original immutable array). The nodes are mutable (so that a diff node may become an array node and vice versa). Altering the array creates a diff node referencing the original. Reading from the array reroots the array (so that the manipulated node becomes an array node) and reads from the mutable array thus obtained. Rerooting is a recursive process which, on a diff node, reroots the referenced array, then swaps the current node and the older node (applying the modification represented by the diff, and then replacing that modification with its inverse).

The actual application also supports extracting sub-arrays and representing swaps as a single operation instead of two diffs:

type 'a t =
    { length : int
    ; offset : int
    ; mutable data : 'a node } 

and 'a node =
  | Array of 'a array
  | Diff of int * 'a * 'a t
  | Swap of int * int * 'a t
  | Same of 'a t 

let rec cache a =
  match a.data with
    | Array _       -> ()
    | Same b ->
        cache b ;
        begin match b.data with
          | Array arr ->
              b.data <- Same a ;
              a.data <- Array arr
          | _ -> assert false
        end
    | Swap (i,j,b) ->
        cache b ;
        begin match b.data with
          | Array arr ->
              let x = arr.(i) in arr.(i) <- arr.(j) ; arr.(j) <- x ;
              b.data <- Swap (i,j,a) ;
              a.data <- Array arr
          | _ -> assert false
        end
    | Diff (i,x,b) ->
        cache b ;
        begin match b.data with
          | Array arr    ->
              let x' = arr.(i) in arr.(i) <- x ;
              b.data <- Diff (i,x',b) ;
              a.data <- Array arr
          | _ -> assert false
        end 

let impl a =
  match a.data with
    | Array arr -> arr
    | _         ->
        cache a;
        match a.data with
          | Array arr -> arr
          | _         -> assert false 

let array a =
  { length = length a
  ; offset = 0
  ; data   = Array a } 

let set a i x =
  { length = a.length
  ; offset = a.offset
  ; data   = Diff (i+a.offset,x,a) } 

let sub a o l =
  { length = l
  ; offset = a.offset + o
  ; data   = Same a } 

let swap a i j =
  { length = a.length
  ; offset = a.offset
  ; data   = Swap (i+a.offset,j+a.offset,a) } 

let get a i   = (impl a).(i + a.offset)

Download the implementation here: persistArray.ml, persistArray.mli.These files are available in the public domain (without any guarantee) and provide a reimplementation of the Array module using persistent arrays.



1170 feed subscribers
(readers who polled a feed this week)