[rt-users] mysql & sphinx

Alex Vandiver alexmv at bestpractical.com
Sun Aug 28 20:38:53 EDT 2011


On Sun, 2011-08-28 at 13:49 +0200, Arkadiusz Miskiewicz wrote:
> I'm going to setup full text search with mysql 5.5, sphinxse 2.1 and sphinxd 
> 0.9.9.
> [snip]
> This means that sphinx will never ever return new matching tickets that are 
> above max_matches :-/ Would be acceptable if it use max_matches counting down 
> from latest one but this doc suggests it's count from first one.

That is not quite what it means.  To make the limitation clearer, assume
there are 100,000 tickets in the database, and only five tickets contain
the word "target" (once each): ticket ids 3, 44, 555, 6666, and 77777.
If max_matches is set to 5, and the search is for "Content LIKE
'target'", all five tickets will be returned.  If max_matches is set to
3, only tickets 3, 44, and 555 will be returned.  That is, max_matches
need not be set to 100,000 to return tickets with ids that high; it
should be set comfortably higher than the number of occurrences of the
words you expect to be searching for[*].
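
As a rough illustration of that cap, here is a toy Python model. It is not
Sphinx or RT code: the function name is made up, and the lowest-id-first
ordering is an assumption taken from the example above.

```python
def capped_matches(matching_ids, max_matches):
    """Return at most max_matches ids, lowest id first (assumed ordering)."""
    return sorted(matching_ids)[:max_matches]


# The five tickets that contain the word "target".
matching = [3, 44, 555, 6666, 77777]

print(capped_matches(matching, 5))  # all five tickets fit under the cap
print(capped_matches(matching, 3))  # only the first three survive
```

The cap applies to the number of matches, not to the ticket ids, which is
why ticket 77777 is reachable with max_matches set to 5.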

Given the same scenario, but with a query of "Content LIKE 'target' AND
Status = 'Open'" and only ticket 77777 in the Open status, a max_matches
of 5 would suffice to return that one result.  A max_matches of 3 would
return no results, as Sphinx would return only three results to RT (3,
44, and 555) which would then be filtered to only open tickets, which is
a null set.
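
The two-step behaviour can be sketched in Python as follows. This is a
simplified model, not RT's implementation: the function name, the
lowest-id-first ordering, and the use of a plain set for Open tickets are
all assumptions made for illustration.

```python
def search(matching_ids, open_ids, max_matches):
    """Toy model: cap the full-text matches first, then filter by status."""
    # Step 1: the full-text engine returns at most max_matches ids.
    from_fulltext = sorted(matching_ids)[:max_matches]
    # Step 2: the remaining criteria are applied to the capped list only.
    return [tid for tid in from_fulltext if tid in open_ids]


matching = [3, 44, 555, 6666, 77777]  # tickets containing "target"
open_tickets = {77777}                # only this one is Open

print(search(matching, open_tickets, 5))  # ticket 77777 is found
print(search(matching, open_tickets, 3))  # empty: 77777 was cut off in step 1
```

Because the status filter runs after the cap, a ticket dropped in step 1
can never be recovered in step 2.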

Does that help to clarify the limitation?  To be sure, it is still an
irritating limitation, and one that I wish we could work around somehow.
Unfortunately, short of pushing more of the search parameters down into
sphinx, which would be a rather complicated piece of work, I see little
way around it.
 - Alex

[*] To complicate matters, this is technically the number of
_attachments_ matching the full-text criteria, not the number of
_tickets_.  That is, if ticket 3 contained 500 emails, each of which
contained the word "target", then (contrary to the above example)
max_matches would need to be 501 in order for the results to contain
more than just ticket id 3.
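
A toy Python model of the attachment-level cap, under the same caveats: it
is not Sphinx or RT code, the function name is hypothetical, and each list
entry simply stands for one matching attachment, labelled with the id of
the ticket it belongs to and assumed to arrive in ticket order.

```python
def tickets_from_attachments(attachment_ticket_ids, max_matches):
    """Cap the matching attachments, then collapse them to distinct tickets."""
    capped = attachment_ticket_ids[:max_matches]
    return sorted(set(capped))


# Ticket 3 has 500 matching emails; the other four tickets have one each.
attachments = [3] * 500 + [44, 555, 6666, 77777]

print(tickets_from_attachments(attachments, 500))  # only ticket 3
print(tickets_from_attachments(attachments, 501))  # tickets 3 and 44
```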



