[rt-users] Search results anomaly
Kenneth Marshall
ktm at rice.edu
Mon May 17 17:05:08 EDT 2010
Hi Jeff,
Yes, that sounds like the problem. What happens if you change
the SQL query to:
LOG: duration: 35.815 ms statement: EXECUTE<unnamed> [PREPARE:
SELECT
DISTINCT main.* FROM Tickets main
JOIN Transactions Transactions_1
ON ( Transactions_1.ObjectId = main.id )
JOIN Attachments Attachments_2
ON ( Attachments_2.TransactionId = Transactions_1.id )
WHERE (Transactions_1.ObjectType = 'RT::Ticket')
AND (main.Status != 'deleted')
AND ( ( Attachments_2.Content LIKE '%members.linode%' ) AND main.Owner = '66')
AND (main.Type = 'ticket')
AND (main.EffectiveId = main.id)
ORDER BY main.id ASC ]
Using LIKE instead of ILIKE. It looks like there is no uppercase
period which is why it fails. It looks like you will need to use
a compound check for "members" and "linode". You might also give
the fulltext indexing support a try since the case is basically
not included in the indexes so does not matter.
At least it is doing the right thing, not what you want. :)
Regards,
Ken
On Mon, May 17, 2010 at 04:31:57PM -0400, Jeff Blaine wrote:
> * The tickets failing to match "members.linode", but matching
> "linode" have an HTML attachment containing the following:
>
> > <META ... content="text/html; charset=windows-1252">
>
> * The handful of tickets that match "members.linode" just so
> happen to have that troublesome attachment quoted in a
> reply (as text, not HTML).
>
> So, that sounds like the problem. I wish that translated
> to an obvious solution for me, but it doesn't :)
>
> On 5/17/2010 3:50 PM, Kenneth Marshall wrote:
>> Hi Jeff,
>>
>> Here is a link to some PostgreSQL documentation on setting
>> locales:
>>
>> http://www.postgresql.org/docs/8.4/static/locale.html
>>
>> A crude example could be a locale that "sorted" a-i the
>> same and o-z the same, and nothing for h. Then the following
>>
>> ahb
>> bha
>>
>> would be equivalent to:
>>
>> ab
>> ba
>>
>> Or you could have a set of chars where the upper case
>> for one letter (c) did not exist. Then you could not match
>>
>> upper(abc)
>>
>> using lower(upper(abc)) == abc since it would produce
>> ab == abc and fail. I think that you may be seeing something
>> like that with your dot character.
>>
>> Regards,
>> Ken
>>
>> On Mon, May 17, 2010 at 03:30:27PM -0400, Jeff Blaine wrote:
>>> On 5/17/2010 3:26 PM, Kenneth Marshall wrote:
>>>> Hi Jeff,
>>>>
>>>> Those are just what I would have expected. I would take one
>>>> of the results from the second set of 77 and figure out why
>>>> it is not matching the first query. It does look more and more
>>>> like a collation problem. What is the collation of an attachment
>>>> that matches one query but not the other?
>>>
>>> Thanks for the reply.
>>>
>>> I'll gladly provide any info I can, but you're talking over
>>> my head re: collation. Can you provide some more info? I'm
>>> not a DB person.
>>>
>>>> Ken
>>>>
>>>> On Mon, May 17, 2010 at 03:20:00PM -0400, Jeff Blaine wrote:
>>>>> Here are the transactions as logged. FWIW, I find it curious
>>>>> that all of these search failures for us so far are with
>>>>> search terms that have a '.' in the term...
>>>>>
>>>>> LOG: duration: 35.815 ms statement: EXECUTE<unnamed> [PREPARE:
>>>>> SELECT
>>>>> DISTINCT main.* FROM Tickets main
>>>>> JOIN Transactions Transactions_1
>>>>> ON ( Transactions_1.ObjectId = main.id )
>>>>> JOIN Attachments Attachments_2
>>>>> ON ( Attachments_2.TransactionId = Transactions_1.id )
>>>>> WHERE (Transactions_1.ObjectType = 'RT::Ticket')
>>>>> AND (main.Status != 'deleted')
>>>>> AND ( ( Attachments_2.Content ILIKE '%members.linode%' ) AND
>>>>> main.Owner = '66')
>>>>> AND (main.Type = 'ticket')
>>>>> AND (main.EffectiveId = main.id)
>>>>> ORDER BY main.id ASC ]
>>>>>
>>>>> LOG: duration: 38.362 ms statement: EXECUTE<unnamed> [PREPARE:
>>>>> SELECT
>>>>> DISTINCT main.* FROM Tickets main
>>>>> JOIN Transactions Transactions_1
>>>>> ON ( Transactions_1.ObjectId = main.id )
>>>>> JOIN Attachments Attachments_2
>>>>> ON ( Attachments_2.TransactionId = Transactions_1.id )
>>>>> WHERE (Transactions_1.ObjectType = 'RT::Ticket')
>>>>> AND (main.Status != 'deleted')
>>>>> AND ( ( Attachments_2.Content ILIKE '%linode%' ) AND
>>>>> main.Owner =
>>>>> '66')
>>>>> AND (main.Type = 'ticket')
>>>>> AND (main.EffectiveId = main.id)
>>>>> ORDER BY main.id ASC ]
>>>>>
>>>>> On 5/17/2010 3:00 PM, Kenneth Marshall wrote:
>>>>>> And what queries actually hit the database backend for these
>>>>>> two? If it is the same SQL query, then the problem is somewhere
>>>>>> else. If the query differs, try running the query manually to
>>>>>> see why it is giving unexpected results. Again as others have
>>>>>> mentioned, collation/encoding could cause these types of differences.
>>>>>>
>>>>>> Cheers,
>>>>>> Ken
>>>>>>
>>>>>> On Mon, May 17, 2010 at 02:48:49PM -0400, Jeff Blaine wrote:
>>>>>>> Another example of this failing:
>>>>>>>
>>>>>>> rt list -t ticket "Content like members.linode AND Owner = jsmith"
>>>>>>>
>>>>>>> 14 results
>>>>>>>
>>>>>>> rt list -t ticket "Content like linode AND Owner = jsmith"
>>>>>>>
>>>>>>> 77 results, all of which have "members.linode" in the content.
>>>>>>>
>>>>>>> On 4/26/2010 5:32 PM, Jeff Blaine wrote:
>>>>>>>> On 4/26/2010 5:04 PM, Kenneth Marshall wrote:
>>>>>>>>> I am not certain what to do. For myself, I would turn on
>>>>>>>>> SQL statement logging within the database. Set:
>>>>>>>>>
>>>>>>>>> log_min_duration_statement = 0
>>>>>>>>>
>>>>>>>>> in your postgresql.conf for the database and run both
>>>>>>>>> queries in RT. Then turn it back done and pick apart the
>>>>>>>>> query results to see if the problem is a logic problem or
>>>>>>>>> possibly an encoding issue which results in "equal" items
>>>>>>>>> not showing as "equal". Also, it looks like you are just
>>>>>>>>
>>>>>>>> These were sequential in the log as a result of my simple
>>>>>>>> "Content matches foo.com" query via the web GUI. One is
>>>>>>>> a COUNT and the other is for getting data, apparently.
>>>>>>>>
>>>>>>>> I formatted them here for easier reading.
>>>>>>>>
>>>>>>>> LOG: duration: 101.115 ms statement: EXECUTE<unnamed> [PREPARE:
>>>>>>>> SELECT COUNT(DISTINCT main.id) FROM Tickets main
>>>>>>>> JOIN Transactions Transactions_1 ON ( Transactions_1.ObjectId =
>>>>>>>> main.id
>>>>>>>> )
>>>>>>>> JOIN Attachments Attachments_2 ON ( Attachments_2.TransactionId =
>>>>>>>> Transactions_1.id )
>>>>>>>> WHERE (Transactions_1.ObjectType = 'RT::Ticket')
>>>>>>>> AND (main.Status != 'deleted')
>>>>>>>> AND ( ( Attachments_2.Content ILIKE '%foo.com%' ) )
>>>>>>>> AND (main.Type = 'ticket')
>>>>>>>> AND (main.EffectiveId = main.id)
>>>>>>>> ]
>>>>>>>>
>>>>>>>> LOG: duration: 105.350 ms statement: EXECUTE<unnamed> [PREPARE:
>>>>>>>> SELECT DISTINCT main.* FROM Tickets main
>>>>>>>> JOIN Transactions Transactions_1 ON ( Transactions_1.ObjectId =
>>>>>>>> main.id
>>>>>>>> )
>>>>>>>> JOIN Attachments Attachments_2 ON ( Attachments_2.TransactionId =
>>>>>>>> Transactions_1.id )
>>>>>>>> WHERE (Transactions_1.ObjectType = 'RT::Ticket')
>>>>>>>> AND (main.Status != 'deleted')
>>>>>>>> AND ( ( Attachments_2.Content ILIKE '%foo.com%' ) )
>>>>>>>> AND (main.Type = 'ticket')
>>>>>>>> AND (main.EffectiveId = main.id)
>>>>>>>> ORDER BY main.id ASC LIMIT 50
>>>>>>>> ]
>>>>>>>>
>>>>>>>> What "both queries" are you referring to? The RT CLI and the
>>>>>>>> RT web GUI both fail the same way with my 1 test query. I'm
>>>>>>>> confused by what you mean there.
>>>>>>>>
>>>>>>>> Thanks for the ideas, Ken
>>>>>>>>
>>>>>>>>> starting with such a low ticket number, but if you plan
>>>>>>>>> on using full content searches for any large-ish amount
>>>>>>>>> of data you should really plan on using the full-text index
>>>>>>>>> support for PostgreSQL/RT that is described in the wiki.
>>>>>>>>> Speaking as one whose database was brought to its knees by
>>>>>>>>> some creative data mining. :)
>>>>>>>>>
>>>>>>>>> Good luck and let us know what you find to be the cause.
>>>>>>>>> Regards,
>>>>>>>>>
>>>>>>>>> Ken
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Mon, Apr 26, 2010 at 04:54:45PM -0400, Jeff Blaine wrote:
>>>>>>>>>> On 4/26/2010 4:52 PM, Kenneth Marshall wrote:
>>>>>>>>>>> Jeff,
>>>>>>>>>>>
>>>>>>>>>>> Are you using the Full-text index support from the wiki?
>>>>>>>>>>
>>>>>>>>>> Nope.
>>>>>>>>>>
>>>>>>>>>> Just RT 3.8.7 + RTFM 2.4.2. No add-ons/tweaks from the wiki.
>>>>>>>>>>
>>>>>>>>>>> There were some index bugs that may require you to re-index
>>>>>>>>>>> to fix, specifically some rows were not reported correctly
>>>>>>>>>>> as valid matches. I believe that the PostgreSQL release
>>>>>>>>>>> notes mentioned that need. Maybe that is your problem. We
>>>>>>>>>>> are using RT-3.8.5 and PostgreSQL 8.4.2 here with the full-text
>>>>>>>>>>> support without an issue.
>>>>>>>>>>>
>>>>>>>>>>> Regards,
>>>>>>>>>>> Ken
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Mon, Apr 26, 2010 at 03:00:39PM -0400, Jeff Blaine wrote:
>>>>>>>>>>>> On 4/26/2010 2:19 PM, Kenneth Marshall wrote:
>>>>>>>>>>>>> Well, that knocks out the ACL issue. Do you think that your
>>>>>>>>>>>>> Mason cache is confused? Maybe stop RT, clear the cache, and
>>>>>>>>>>>>> restart RT to see if that helps. What DB backend are you using
>>>>>>>>>>>>> and which version of RT are you running?
>>>>>>>>>>>>
>>>>>>>>>>>> RT 3.8.7
>>>>>>>>>>>> PostgreSQL as it comes with RHELv5 + updates from yum
>>>>>>>>>>>>
>>>>>>>>>>>> Clearing the Mason cache didn't help :|
>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>> Ken
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Mon, Apr 26, 2010 at 12:47:22PM -0400, Jeff Blaine wrote:
>>>>>>>>>>>>>> On 4/26/2010 12:29 PM, Raed El-Hames wrote:
>>>>>>>>>>>>>>> Jeff;
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Does your CLI user have permissions on the queue that ticket
>>>>>>>>>>>>>>> 39
>>>>>>>>>>>>>>> is
>>>>>>>>>>>>>>> in??
>>>>>>>>>>>>>>> login to the web interface with the same cli user and see if
>>>>>>>>>>>>>>> you
>>>>>>>>>>>>>>> can
>>>>>>>>>>>>>>> view the ticket.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Yes, it does.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Again, however, this is not really a report about an anomaly
>>>>>>>>>>>>>> in
>>>>>>>>>>>>>> the RT CLI.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> The incorrect search results are returned via a web GUI search
>>>>>>>>>>>>>> of "Content matches foo.com"
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Here, maybe this makes it more clear, showing the same problem
>>>>>>>>>>>>>> when using the RT CLI:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> [root at rtsrv1 etc]# /apps/rt/bin/rt list "Content like foo.com"
>>>>>>>>>>>>>> Query:Content like 'foo.com'
>>>>>>>>>>>>>> Ticket Owner Queue Age Told Status Requestor Subject
>>>>>>>>>>>>>> --------------------------------------------------------------------------------
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> 23 mbs Incid 1 wk resolv enVision@ alert
>>>>>>>>>>>>>> -NICAlert-Secur
>>>>>>>>>>>>>> [root at rtsrv1 etc]#
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> [root at rtsrv1 etc]# /apps/rt/bin/rt show 39 | grep foo.com
>>>>>>>>>>>>>> foo.com blah blah... 1 line... not including in this email
>>>>>>>>>>>>>> [root at rtsrv1 etc]#
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> [root at rtsrv1 etc]# /apps/rt/bin/rt show 23 | grep foo.com
>>>>>>>>>>>>>> foo.com blah blah... not including in this email
>>>>>>>>>>>>>> foo.com matching lines 66 more times... not including in this
>>>>>>>>>>>>>> email
>>>>>>>>>>>>>> [root at rtsrv1 etc]#
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Regards;
>>>>>>>>>>>>>>> Roy
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Jeff Blaine wrote:
>>>>>>>>>>>>>>>> On 4/26/2010 11:50 AM, Kenneth Marshall wrote:
>>>>>>>>>>>>>>>>> Hi Jeff,
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> There is nothing here that indicates a problem. It looks
>>>>>>>>>>>>>>>>> like an apples vs. oranges comparison by the time you
>>>>>>>>>>>>>>>>> include
>>>>>>>>>>>>>>>>> the actual parameters of the search from the web interface
>>>>>>>>>>>>>>>>> and the rt commandline interface and possible privilege and
>>>>>>>>>>>>>>>>> ACL differences. You can use DB query logging to figure out
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I think my original post is being misinterpreted. The 'rt'
>>>>>>>>>>>>>>>> CLI commands aren't doing a search. They're just showing
>>>>>>>>>>>>>>>> this list's readers that 'foo.com' does show up in each of
>>>>>>>>>>>>>>>> the tickets when doing a simple 'rt show<ticket>'. It's
>>>>>>>>>>>>>>>> not a comparison of "CLI search vs. web search".
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> what SQL is being used in the web search or the commandline
>>>>>>>>>>>>>>>>> rt and compare the output piece-wise to put yourself at
>>>>>>>>>>>>>>>>> ease.
>>>>>>>>>>>>>>>>> Maybe look at the individual components of each of the two
>>>>>>>>>>>>>>>>> tickets, as well.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> When viewing the tickets using 'Full headers" and then
>>>>>>>>>>>>>>>> "Ctrl-F" to examine every instance of 'foo.com' in each
>>>>>>>>>>>>>>>> ticket
>>>>>>>>>>>>>>>> shows that both tickets have the 'foo.com' in text/html
>>>>>>>>>>>>>>>> parts
>>>>>>>>>>>>>>>> (and only there).
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Ticket 23 has 67 of those parts and is returned when RT
>>>>>>>>>>>>>>>> searching
>>>>>>>>>>>>>>>> for 'foo.com'
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Ticket 39 has 1 of those parts and is not returned when RT
>>>>>>>>>>>>>>>> searching
>>>>>>>>>>>>>>>> for 'foo.com'
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> By "DB query logging" do you mean Set($StatementLog,
>>>>>>>>>>>>>>>> "DEBUG");
>>>>>>>>>>>>>>>> or something?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Thanks for the reply, Ken
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Jeff
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>>>>>> Ken
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Mon, Apr 26, 2010 at 11:21:45AM -0400, Jeff Blaine
>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>> Does anyone have any suggestions for how to go about
>>>>>>>>>>>>>>>>>> figuring out what is wrong here?
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On 4/22/2010 2:09 PM, Jeff Blaine wrote:
>>>>>>>>>>>>>>>>>>> RT 3.8.7
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> A search for 'Content matches foo.com' is returning some
>>>>>>>>>>>>>>>>>>> tickets
>>>>>>>>>>>>>>>>>>> and missing others that clearly have foo.com in the
>>>>>>>>>>>>>>>>>>> Content.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> [root at rtsrv1 bin]# ./rt show 39 | grep foo.com | wc -l
>>>>>>>>>>>>>>>>>>>> 1
>>>>>>>>>>>>>>>>>>>> [root at rtsrv1 bin]#
>>>>>>>>>>>>>>>>>>>> [root at rtsrv1 bin]# ./rt show 23 | grep foo.com | wc -l
>>>>>>>>>>>>>>>>>>>> 67
>>>>>>>>>>>>>>>>>>>> [root at rtsrv1 bin]#
>>>>>>>>>>>>>>>>>>> 23 shows up in the web search results.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> 39 does not.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Any ideas?
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Discover RT's hidden secrets with RT Essentials from
>>>>>>>>>>>>>>>>>>> O'Reilly
>>>>>>>>>>>>>>>>>>> Media.
>>>>>>>>>>>>>>>>>>> Buy a copy at http://rtbook.bestpractical.com
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Discover RT's hidden secrets with RT Essentials from
>>>>>>>>>>>>>>>>>> O'Reilly
>>>>>>>>>>>>>>>>>> Media.
>>>>>>>>>>>>>>>>>> Buy a copy at http://rtbook.bestpractical.com
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Discover RT's hidden secrets with RT Essentials from
>>>>>>>>>>>>>>>> O'Reilly
>>>>>>>>>>>>>>>> Media.
>>>>>>>>>>>>>>>> Buy a copy at http://rtbook.bestpractical.com
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>> Discover RT's hidden secrets with RT Essentials from O'Reilly Media.
>>>>>>>> Buy a copy at http://rtbook.bestpractical.com
>>>>>>>>
>>>>>>>
>>>>>>> Discover RT's hidden secrets with RT Essentials from O'Reilly Media.
>>>>>>> Buy a copy at http://rtbook.bestpractical.com
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
More information about the rt-users
mailing list