MySQL (Un)Maintenance Trick

Not so long ago, I discussed the puzzling fact that in JavaScript, if(x) is not equivalent to if(x == true). Today, I stumbled upon a similar occurence in MySQL.

The Problem

Consider the following table, containing arbitrary text with an «alive» boolean flag:

CREATE TABLE tested (
  txt CHAR(32) NOT NULL,
  alive BOOLEAN NOT NULL,
  PRIMARY KEY(txt),
  KEY(alive)
);

INSERT INTO tested (txt,alive) VALUES
( MD5(1), FALSE ),
( MD5(2), FALSE ),
( MD5(3), FALSE ),
( MD5(4), TRUE ),
( MD5(5), TRUE );

I want to display all the lines that are marked as alive, sorted by their text field. What is the difference between these two requests?

SELECT txt FROM tested WHERE alive ORDER BY txt;
SELECT txt FROM tested WHERE alive = TRUE ORDER BY txt;

And the answer is… both queries will return the same result set! But let’s EXPLAIN them, just in case.

type possible keys key rows
WHERE alive ALL 5
WHERE alive = TRUE ref alive alive 2

The first query will scan through the entire table, whereas the second query will use the index to only run through lines that are still alive. If your table consists of 99% dead elements, the first query will be a hundred times slower than the second one!

The Reason

The fundamental reason for this behavior can be found in the MySQL documentation:

These types are synonyms for TINYINT(1). A value of zero is considered false. Nonzero values are considered true:

However, the values TRUE and FALSE are merely aliases for 1 and 0, respectively, as shown here:

In short, that boolean column is not actually a boolean value, but actually an integer. This means it can contain values that are neither TRUE nor FALSE, such as 2. Such a value would be returned by the first query, but not the second, so the query optimizer is not allowed to turn the first one into the second one. And «this column evaluates to true in a boolean context» is not easily expressed as a key constraint, whereas «this column equals one» is the textbook definition of a key constraint. This explains why the second query is faster.

It also means that the second query might start behaving incorrectly if a non-TRUE, non-FALSE value finds its way into that column.

The Solution

The good news is that NOT foo is mathematically equivalent to foo = FALSE, so that the constraint can be easily rewritten by turning the «alive» property into a «dead» property. Both queries become equivalent, so the second query is a faster yet functionally identical alternative:

CREATE TABLE tested (
  txt CHAR(32) NOT NULL,
  dead BOOLEAN NOT NULL,
  PRIMARY KEY(txt),
  KEY(dead)
);

INSERT INTO tested (txt,dead) VALUES
( MD5(1), TRUE ),
( MD5(2), TRUE ),
( MD5(3), TRUE ),
( MD5(4), FALSE ),
( MD5(5), FALSE );

SELECT txt FROM tested WHERE dead = FALSE ORDER BY txt;

1 Responses to “MySQL (Un)Maintenance Trick”


  1. No Comments
  1. 1 MySQL (Un)Maintenance Trick at Nicollet.Net | mysql

Leave a Reply

Your email address will not be published. Required fields are marked *

*

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>



1150 feed subscribers
(readers who polled a feed this week)