[rt-users] *very* slow 'OR' searches
Dan Riley
dsr at mail.lns.cornell.edu
Sun Apr 3 13:45:34 EDT 2005
Cast: RT 3.4.1, perl 5.8.3, DBIx::SearchBuilder 1.22, MySQL 4.1.10a.
Setting: a test database with 50 tickets
Searching for "Subject LIKE 'kerberos'" takes a few hundreths of a
second; same for "Content LIKE 'slip'". Searching for "Subject LIKE
'kerberos' OR Content LIKE 'slip'" takes 6 seconds (hundreds of times
longer than the component queries); with a realistic number of slips
it takes effectively forever (hours). The query produced is:
SELECT DISTINCT main.*
FROM Tickets main, Transactions Transactions_1, Attachments Attachments_2
WHERE
((Transactions_1.ObjectType = 'RT::Ticket')) AND
((main.EffectiveId = main.id)) AND
((main.Status != 'deleted')) AND
((main.Type = 'ticket')) AND
((main.Subject LIKE '%kerberos%') OR
((Attachments_2.Content LIKE '%slip%') AND
(Attachments_2.TransactionId = Transactions_1.id) AND
(main.id = Transactions_1.ObjectId) ) )
ORDER BY main.id ASC LIMIT 50;
Feeding that into "explain" says, in part (trimmed to 72 columns):
+----------------+------+-----------------------+---------------+------+
| table | type | possible_keys | key | rows |
+----------------+------+-----------------------+---------------+------+
| main | ALL | PRIMARY | NULL | 51 |
| Transactions_1 | ref | PRIMARY,Transactions1 | Transactions1 | 210 |
| Attachments_2 | ALL | Attachments2 | NULL | 91 |
+----------------+------+-----------------------+---------------+------+
which is not good. For the equivalent query:
SELECT DISTINCT main.* FROM Tickets main
JOIN Transactions Transactions_1 ON (main.id = Transactions_1.ObjectId)
JOIN Attachments Attachments_2
ON (Attachments_2.TransactionId = Transactions_1.id)
WHERE
(Transactions_1.ObjectType = 'RT::Ticket') AND
(main.EffectiveId = main.id) AND
(main.Status != 'deleted') AND
(main.Type = 'ticket') AND
((main.Subject LIKE '%kerberos%') OR
(Attachments_2.Content LIKE '%slip%'))
ORDER BY main.id ASC LIMIT 50;
explain is happier:
+----------------+------+-----------------------+---------------+------+
| table | type | possible_keys | key | rows |
+----------------+------+-----------------------+---------------+------+
| main | ALL | PRIMARY | NULL | 51 |
| Transactions_1 | ref | PRIMARY,Transactions1 | Transactions1 | 2 |
| Attachments_2 | ref | Attachments2 | Attachments2 | 1 |
+----------------+------+-----------------------+---------------+------+
and the query runs in a few hundreths of a second.
I find it somewhat worrisome that it is so easy to stumble across a
simple search that takes essentially forever. Presumably this happens
because neither the query builder nor the MySQL "optimizer" notices
that the first term of the query doesn't depend on Attachments or
Transactions, and so misses the opportunity to move the restrictions
from the second term into JOIN clauses. In general this is probably a
hard problem to solve, but couldn't the query builder do better on the
simple "apples OR oranges" search?
More information about the rt-users
mailing list