Lately, I have heard people refer to Object-Relational Mapping as being equivalent to the advent of compilers in the grand scheme of things, abstracting away relational semantics (which, like assembly, was the domain of expert hackers with overclocked brains and guru-like status that understood it) behind a nice and clean object interface that normal programmers could use. Think of the productivity gains!
And yet, a lot of non-technical people whose jobs include data mining will learn SQL at some point in their career. This includes many people in academia (you have to process that empirical data), librarians and various industry managers. And, from my experience, they have an easier time learning it than the average programmer—who is used to seeing the programming world in an imperative fashion.
The easiest way to explain SQL queries is to think in terms of merge and filter operations. The first step is to merge all the tables you need together. This happens as a cartesian product operation: if you merge two tables, you get a table that contains, as its lines, all pairs consisting in one line from the first table and one line from the second table. So, if you merge five tables with 100 records each, you get a nasty result set containing ten billion records. This is where filtering comes in: of all the records, you only need those who satisfy certain properties. One classic example is performing a join: you only keep the a record if its sub-records share the same key value, thereby turning your ten billion records back into a hundred. And, of course, the mathematical properties of relational algebra allow the database system to reorder certain merge and filter operations, so that you don’t first compute all ten billion rows and only then start filtering them.
As such, SQL is a very practical language: ‘FROM’ and ‘JOIN’ merge tables, ‘WHERE’ and ‘ON’ filter records, and the remaining ‘SELECT’, ‘ORDER BY’, ‘GROUP BY’ and ‘LIMIT’ further format the results returned to the user.
Besides, most database systems also provide features for defining functions and procedures within the database to run almost arbitrary code. This makes it possible to write an entire application that does nothing but parse user input and forward database output back to the user. The database would then take care of the business logic, data storage, and output formatting. Of course, having the database output formatted HTML based on its contents would be crazy. Or would it?
Layered Architectures
The average website is layered. There’s the dispatch layer, which receives the HTTP request and dispatches it to the appropriate controller based on the domain and URI. Then, there’s the controller layer, which parses the user input and initiates the business logic and uses view components to send data back to the user. Underneath, there’s the model, containing both business data and business logic. And below the model is the database itself.
Different architectures give each layer a different weight. Some may have heavy controllers that perform most of the logic, and very thing model and storage layers. Others may have a very thin controller, and rely on the model for everything save the most elementary output formatting.
What if we completely eliminated the model? All the business logic would be stored in the database as stored procedures and stored functions, and all the model would have to do is provide simple points of access to the procedures themselves: format arguments, run the stored procedure as a query, return the result set or success/failure. Oh, and of course, decide whether the query should be sent to the master database or to a slave.
There are a few advantages of doing things this way:
- Performance benefits. Most of the time, the processing that has to be done by the system between the first request and the last request of a given session is very light (or, if it isn’t, proper architecture would demand that it be done asynchronously anyway). By performing everything on the database server, one wastes some time by having the database perform some processing that could have been done by the web script itself, but eliminates all the overhead related to serializing and sending data and the lock overhead for keeping a session locked too long, and allows the database to optimize queries ahead of time based on hints given when declaring the stored procedures and functions.
- Abstraction benefits. When you need to make a database and a program communicate, you just have to create an abstraction that can serve as an interface between the two. If you decide that all operations on the database have to go through stored procedures, then you will end up with a flurry of stored procedures that completely hide the actual storage approach used by the database.
- Interoperability. Any software that can connect to your database system can use the interface. This means that you can write asynchronous processing units in other languages that work with the database to perform costly computations (such as indexing). It also means that, if your software is intended to work on client computers, your client can develop extensions against the database.
There are, of course, downsides as well:
- You need SQL developers and SQL architects. By definition, storing your model in the database means that you will have a relational design as opposed to an object-oriented design. Relational design is by no means inferior, but there are fewer people fluent enough with it than there are object-oriented designers, if only because it’s not encountered as often.
- You might end up using non-portable SQL. By keeping your SQL simple (and hidden behind a Hibernate dialect) you guarantee that your system can be re-deployed to another database system if you need to improve performance, adjust to cost issues or adapt to the architecture of your customer.
- It’s harder to deploy. Stored procedures are installed on the database by running SQL queries that create them. This is an order of magnitude harder than just zipping up a JAR or packing an assembly and uploading it to the server, and two orders of magnitude harder than just copying your PHP source directory to the server. Of course, there are tools that allow editing stored procedures on the server on-the-fly, but they often lose you the benefit of source control.
Choose wisely.
Hi. I'm Victor Nicollet,
Recent Comments