Not so long ago, I discussed the puzzling fact that in JavaScript, if(x) is not equivalent to if(x == true). Today, I stumbled upon a similar occurence in MySQL.
The Problem
Consider the following table, containing arbitrary text with an «alive» boolean flag:
CREATE TABLE tested ( txt CHAR(32) NOT NULL, alive BOOLEAN NOT NULL, PRIMARY KEY(txt), KEY(alive) ); INSERT INTO tested (txt,alive) VALUES ( MD5(1), FALSE ), ( MD5(2), FALSE ), ( MD5(3), FALSE ), ( MD5(4), TRUE ), ( MD5(5), TRUE );
I want to display all the lines that are marked as alive, sorted by their text field. What is the difference between these two requests?
SELECT txt FROM tested WHERE alive ORDER BY txt;
SELECT txt FROM tested WHERE alive = TRUE ORDER BY txt;
And the answer is… both queries will return the same result set! But let’s EXPLAIN them, just in case.
| type | possible keys | key | rows | |
WHERE alive |
ALL | 5 | ||
WHERE alive = TRUE |
ref | alive |
alive |
2 |
The first query will scan through the entire table, whereas the second query will use the index to only run through lines that are still alive. If your table consists of 99% dead elements, the first query will be a hundred times slower than the second one!
The Reason
The fundamental reason for this behavior can be found in the MySQL documentation:
These types are synonyms for
TINYINT(1). A value of zero is considered false. Nonzero values are considered true:However, the values
TRUEandFALSEare merely aliases for1and0, respectively, as shown here:
In short, that boolean column is not actually a boolean value, but actually an integer. This means it can contain values that are neither TRUE nor FALSE, such as 2. Such a value would be returned by the first query, but not the second, so the query optimizer is not allowed to turn the first one into the second one. And «this column evaluates to true in a boolean context» is not easily expressed as a key constraint, whereas «this column equals one» is the textbook definition of a key constraint. This explains why the second query is faster.
It also means that the second query might start behaving incorrectly if a non-TRUE, non-FALSE value finds its way into that column.
The Solution
The good news is that NOT foo is mathematically equivalent to foo = FALSE, so that the constraint can be easily rewritten by turning the «alive» property into a «dead» property. Both queries become equivalent, so the second query is a faster yet functionally identical alternative:
CREATE TABLE tested ( txt CHAR(32) NOT NULL, dead BOOLEAN NOT NULL, PRIMARY KEY(txt), KEY(dead) ); INSERT INTO tested (txt,dead) VALUES ( MD5(1), TRUE ), ( MD5(2), TRUE ), ( MD5(3), TRUE ), ( MD5(4), FALSE ), ( MD5(5), FALSE ); SELECT txt FROM tested WHERE dead = FALSE ORDER BY txt;
Hi. I'm Victor Nicollet,
1 Responses to “MySQL (Un)Maintenance Trick”