|
written by Tomas M. 149 days ago From time to time, I need to select a random row from MySQL table. The official documentation suggests SELECT * FROM table1 ORDER BY RAND() LIMIT 1, but this is very dangerous. It can take ages to select a random row from a table with millions of records. The reason can be noticed after running the same query with 'EXPLAIN' keyword at the beginning: Using temporary; Using filesort. It's the worst situation which could happen.
I'm always looking for as best optimalization as possible, so I had to come up with something faster. And here is the result:
SELECT @r:=RAND(); SELECT @i:= (SELECT MIN(id) FROM table1 WHERE id>= (SELECT (@r*(SELECT MAX(id) FROM table1)))); SELECT * FROM table1 WHERE id=@i; As you can see, there are 3 queries needed instead of one, but the time to find out the particular random row is reduced to the smallest possible value, because there is no more needed to create any temporary tables or sort them. After EXPLAINing the queries again, you'll find out that most of the statements were optimized away entirely, and the final SELECT uses a const reference, thus returns immediately too.
Benchmark shows that my server can run 10 millions (10^7) of these queries in 1 second, while it can execute only 20 (!!!) queries using ORDER BY RAND(). This seems too good to be true, so perhaps I made an error somewhere in the benchmarking ... but in all cases, you can be 100% sure the new solution is way faster.
It has one drawback though. If the table has 'holes' in the primary key column (for example when some of the rows are deleted from the table and the given ID value is no longer used), the random row is not so precisely random. That can be solved by creating a map table, but I personally don't need this since the distribution of 'holes' in my tables is pretty random too, so the result is sufficient.
[1] Hugh Johnson wrote 147 days ago:
Wow man I didn't expect to come here and get a great sql tip. Thanks, that is indeed useful. | |