Introduction
This is a follow-up of an older article, which examined automatic generation of relational database tables to represent an entity-property-binding data structure. That article concluded by stating that the next step would be the generation of queries.
Deletion, modification and reading queries all have the ability to affect a subset of the available instances of an entity. This article is dedicated to providing subset semantics and a way to compile them to SQL.
Naive solution
In the most naive implementation, subsets would be either singletons (that is, all operations affect a single instance at a time) or the entire set (that is, all operations affect all instances). A simple modification would also allow constructing subsets through union operations from the singletons.
val all : entity -> subset val singleton : id -> subset val union : id list -> subset
Subsets would then be of the form ” or ‘WHERE id = $1 OR id = $2 …‘, which can be generated in mere seconds from Objective Caml code, and could be easily appended to any SQL query (select, delete, update) to determine the working set for that query.
The advantage of this solution is that it’s very simple to implement. The problem is that it moves all the processing from the database to the host code, which then has to sort through heaps of data (and send huge queries to select a large subset that’s not the entire set) to achieve anything meaningful.
Predicate-based solution
Most queries performed on a database are either based on identifiers (thus possible with singletons) or based on logic predicates. As such, a subset is merely the set of all instances of an entity which satisfy a given predicate (such as having a certain e-mail address, or being born before a certain date, or simply being the sole element of a singleton). This semantic approach is used in SQL (the WHERE clause introduces a predicate) and showcased with a distinct syntax in a recent article.
A basic predicate system involves writing a first-order logical expression (that is, without quantifiers) which applies to a single variable (an instance of the appropriate entity) to support the following operations:
P(x) = atom(x) | P(x) × P(x) | P(x) + P(x) | ¬ P(x)
An atom is any elementary comparison between two expressions involving constants, query parameters, the identifier of the instance and its properties. For instance, an example of atom would be less (property “created”) (sub curDate (days 30)), indicating that the instance is older than 30 days, corresponding to WHERE created < DATEADD(“dd”,-30,CURRENT_DATE()) (your SQL dialect may vary).
In this situation, the subset can be translated into an SQL sentence to be placed after the WHERE, with the rules:
atom -> compile(atom) Not(p) -> "NOT (" tr(p) ")" And(p,q) -> "(" tr(p) "AND" tr(q) ")" Or(p,q) -> "(" tr(p) "OR" tr(q) ")"
A preprocessing operation could be to move all Not operations to atoms using the De Morgan rules, and instead negate only atoms, or even convert the predicate to a normal form.
By encapsulating subset operations in a type-safe, no-leak system, it becomes possible to propagate the construction of SQL queries to helper modules, for instance a module that provides the subset of recent elements (younger than 30 days), which may then be intersected or unified at will with other subsets to obtain a full query.
Further work
All of the above does not handle bindings (which are the entity-property-binding representation of cross-table searching). The possible semantics, along with the actual implementation of bindings through SQL joins (and the consequences of that implementation on SQL UPDATE and DELETE queries) remains to be seen in a futher article.
Hi. I'm Victor Nicollet,
0 Responses to “SQL Generation, Part 2”