[rt-users] Search results anomaly
Jeff Blaine
jblaine at kickflop.net
Mon May 17 16:31:57 EDT 2010
* The tickets failing to match "members.linode", but matching
"linode" have an HTML attachment containing the following:
> <META ... content="text/html; charset=windows-1252">
* The handful of tickets that match "members.linode" just so
happen to have that troublesome attachment quoted in a
reply (as text, not HTML).
So, that sounds like the problem. I wish that translated
to an obvious solution for me, but it doesn't :)
On 5/17/2010 3:50 PM, Kenneth Marshall wrote:
> Hi Jeff,
>
> Here is a link to some PostgreSQL documentation on setting
> locales:
>
> http://www.postgresql.org/docs/8.4/static/locale.html
>
> A crude example could be a locale that "sorted" a-i the
> same and o-z the same, and nothing for h. Then the following
>
> ahb
> bha
>
> would be equivalent to:
>
> ab
> ba
>
> Or you could have a set of chars where the upper case
> for one letter (c) did not exist. Then you could not match
>
> upper(abc)
>
> using lower(upper(abc)) == abc since it would produce
> ab == abc and fail. I think that you may be seeing something
> like that with your dot character.
>
> Regards,
> Ken
>
> On Mon, May 17, 2010 at 03:30:27PM -0400, Jeff Blaine wrote:
>> On 5/17/2010 3:26 PM, Kenneth Marshall wrote:
>>> Hi Jeff,
>>>
>>> Those are just what I would have expected. I would take one
>>> of the results from the second set of 77 and figure out why
>>> it is not matching the first query. It does look more and more
>>> like a collation problem. What is the collation of an attachment
>>> that matches one query but not the other?
>>
>> Thanks for the reply.
>>
>> I'll gladly provide any info I can, but you're talking over
>> my head re: collation. Can you provide some more info? I'm
>> not a DB person.
>>
>>> Ken
>>>
>>> On Mon, May 17, 2010 at 03:20:00PM -0400, Jeff Blaine wrote:
>>>> Here are the transactions as logged. FWIW, I find it curious
>>>> that all of these search failures for us so far are with
>>>> search terms that have a '.' in the term...
>>>>
>>>> LOG: duration: 35.815 ms statement: EXECUTE<unnamed> [PREPARE: SELECT
>>>> DISTINCT main.* FROM Tickets main
>>>> JOIN Transactions Transactions_1
>>>> ON ( Transactions_1.ObjectId = main.id )
>>>> JOIN Attachments Attachments_2
>>>> ON ( Attachments_2.TransactionId = Transactions_1.id )
>>>> WHERE (Transactions_1.ObjectType = 'RT::Ticket')
>>>> AND (main.Status != 'deleted')
>>>> AND ( ( Attachments_2.Content ILIKE '%members.linode%' ) AND
>>>> main.Owner = '66')
>>>> AND (main.Type = 'ticket')
>>>> AND (main.EffectiveId = main.id)
>>>> ORDER BY main.id ASC ]
>>>>
>>>> LOG: duration: 38.362 ms statement: EXECUTE<unnamed> [PREPARE: SELECT
>>>> DISTINCT main.* FROM Tickets main
>>>> JOIN Transactions Transactions_1
>>>> ON ( Transactions_1.ObjectId = main.id )
>>>> JOIN Attachments Attachments_2
>>>> ON ( Attachments_2.TransactionId = Transactions_1.id )
>>>> WHERE (Transactions_1.ObjectType = 'RT::Ticket')
>>>> AND (main.Status != 'deleted')
>>>> AND ( ( Attachments_2.Content ILIKE '%linode%' ) AND main.Owner =
>>>> '66')
>>>> AND (main.Type = 'ticket')
>>>> AND (main.EffectiveId = main.id)
>>>> ORDER BY main.id ASC ]
>>>>
>>>> On 5/17/2010 3:00 PM, Kenneth Marshall wrote:
>>>>> And what queries actually hit the database backend for these
>>>>> two? If it is the same SQL query, then the problem is somewhere
>>>>> else. If the query differs, try running the query manually to
>>>>> see why it is giving unexpected results. Again as others have
>>>>> mentioned, collation/encoding could cause these types of differences.
>>>>>
>>>>> Cheers,
>>>>> Ken
>>>>>
>>>>> On Mon, May 17, 2010 at 02:48:49PM -0400, Jeff Blaine wrote:
>>>>>> Another example of this failing:
>>>>>>
>>>>>> rt list -t ticket "Content like members.linode AND Owner = jsmith"
>>>>>>
>>>>>> 14 results
>>>>>>
>>>>>> rt list -t ticket "Content like linode AND Owner = jsmith"
>>>>>>
>>>>>> 77 results, all of which have "members.linode" in the content.
>>>>>>
>>>>>> On 4/26/2010 5:32 PM, Jeff Blaine wrote:
>>>>>>> On 4/26/2010 5:04 PM, Kenneth Marshall wrote:
>>>>>>>> I am not certain what to do. For myself, I would turn on
>>>>>>>> SQL statement logging within the database. Set:
>>>>>>>>
>>>>>>>> log_min_duration_statement = 0
>>>>>>>>
>>>>>>>> in your postgresql.conf for the database and run both
>>>>>>>> queries in RT. Then turn it back done and pick apart the
>>>>>>>> query results to see if the problem is a logic problem or
>>>>>>>> possibly an encoding issue which results in "equal" items
>>>>>>>> not showing as "equal". Also, it looks like you are just
>>>>>>>
>>>>>>> These were sequential in the log as a result of my simple
>>>>>>> "Content matches foo.com" query via the web GUI. One is
>>>>>>> a COUNT and the other is for getting data, apparently.
>>>>>>>
>>>>>>> I formatted them here for easier reading.
>>>>>>>
>>>>>>> LOG: duration: 101.115 ms statement: EXECUTE<unnamed> [PREPARE:
>>>>>>> SELECT COUNT(DISTINCT main.id) FROM Tickets main
>>>>>>> JOIN Transactions Transactions_1 ON ( Transactions_1.ObjectId =
>>>>>>> main.id
>>>>>>> )
>>>>>>> JOIN Attachments Attachments_2 ON ( Attachments_2.TransactionId =
>>>>>>> Transactions_1.id )
>>>>>>> WHERE (Transactions_1.ObjectType = 'RT::Ticket')
>>>>>>> AND (main.Status != 'deleted')
>>>>>>> AND ( ( Attachments_2.Content ILIKE '%foo.com%' ) )
>>>>>>> AND (main.Type = 'ticket')
>>>>>>> AND (main.EffectiveId = main.id)
>>>>>>> ]
>>>>>>>
>>>>>>> LOG: duration: 105.350 ms statement: EXECUTE<unnamed> [PREPARE:
>>>>>>> SELECT DISTINCT main.* FROM Tickets main
>>>>>>> JOIN Transactions Transactions_1 ON ( Transactions_1.ObjectId =
>>>>>>> main.id
>>>>>>> )
>>>>>>> JOIN Attachments Attachments_2 ON ( Attachments_2.TransactionId =
>>>>>>> Transactions_1.id )
>>>>>>> WHERE (Transactions_1.ObjectType = 'RT::Ticket')
>>>>>>> AND (main.Status != 'deleted')
>>>>>>> AND ( ( Attachments_2.Content ILIKE '%foo.com%' ) )
>>>>>>> AND (main.Type = 'ticket')
>>>>>>> AND (main.EffectiveId = main.id)
>>>>>>> ORDER BY main.id ASC LIMIT 50
>>>>>>> ]
>>>>>>>
>>>>>>> What "both queries" are you referring to? The RT CLI and the
>>>>>>> RT web GUI both fail the same way with my 1 test query. I'm
>>>>>>> confused by what you mean there.
>>>>>>>
>>>>>>> Thanks for the ideas, Ken
>>>>>>>
>>>>>>>> starting with such a low ticket number, but if you plan
>>>>>>>> on using full content searches for any large-ish amount
>>>>>>>> of data you should really plan on using the full-text index
>>>>>>>> support for PostgreSQL/RT that is described in the wiki.
>>>>>>>> Speaking as one whose database was brought to its knees by
>>>>>>>> some creative data mining. :)
>>>>>>>>
>>>>>>>> Good luck and let us know what you find to be the cause.
>>>>>>>> Regards,
>>>>>>>>
>>>>>>>> Ken
>>>>>>>>
>>>>>>>>
>>>>>>>> On Mon, Apr 26, 2010 at 04:54:45PM -0400, Jeff Blaine wrote:
>>>>>>>>> On 4/26/2010 4:52 PM, Kenneth Marshall wrote:
>>>>>>>>>> Jeff,
>>>>>>>>>>
>>>>>>>>>> Are you using the Full-text index support from the wiki?
>>>>>>>>>
>>>>>>>>> Nope.
>>>>>>>>>
>>>>>>>>> Just RT 3.8.7 + RTFM 2.4.2. No add-ons/tweaks from the wiki.
>>>>>>>>>
>>>>>>>>>> There were some index bugs that may require you to re-index
>>>>>>>>>> to fix, specifically some rows were not reported correctly
>>>>>>>>>> as valid matches. I believe that the PostgreSQL release
>>>>>>>>>> notes mentioned that need. Maybe that is your problem. We
>>>>>>>>>> are using RT-3.8.5 and PostgreSQL 8.4.2 here with the full-text
>>>>>>>>>> support without an issue.
>>>>>>>>>>
>>>>>>>>>> Regards,
>>>>>>>>>> Ken
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Mon, Apr 26, 2010 at 03:00:39PM -0400, Jeff Blaine wrote:
>>>>>>>>>>> On 4/26/2010 2:19 PM, Kenneth Marshall wrote:
>>>>>>>>>>>> Well, that knocks out the ACL issue. Do you think that your
>>>>>>>>>>>> Mason cache is confused? Maybe stop RT, clear the cache, and
>>>>>>>>>>>> restart RT to see if that helps. What DB backend are you using
>>>>>>>>>>>> and which version of RT are you running?
>>>>>>>>>>>
>>>>>>>>>>> RT 3.8.7
>>>>>>>>>>> PostgreSQL as it comes with RHELv5 + updates from yum
>>>>>>>>>>>
>>>>>>>>>>> Clearing the Mason cache didn't help :|
>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Cheers,
>>>>>>>>>>>> Ken
>>>>>>>>>>>>
>>>>>>>>>>>> On Mon, Apr 26, 2010 at 12:47:22PM -0400, Jeff Blaine wrote:
>>>>>>>>>>>>> On 4/26/2010 12:29 PM, Raed El-Hames wrote:
>>>>>>>>>>>>>> Jeff;
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Does your CLI user have permissions on the queue that ticket 39
>>>>>>>>>>>>>> is
>>>>>>>>>>>>>> in??
>>>>>>>>>>>>>> login to the web interface with the same cli user and see if
>>>>>>>>>>>>>> you
>>>>>>>>>>>>>> can
>>>>>>>>>>>>>> view the ticket.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Yes, it does.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Again, however, this is not really a report about an anomaly in
>>>>>>>>>>>>> the RT CLI.
>>>>>>>>>>>>>
>>>>>>>>>>>>> The incorrect search results are returned via a web GUI search
>>>>>>>>>>>>> of "Content matches foo.com"
>>>>>>>>>>>>>
>>>>>>>>>>>>> Here, maybe this makes it more clear, showing the same problem
>>>>>>>>>>>>> when using the RT CLI:
>>>>>>>>>>>>>
>>>>>>>>>>>>> [root at rtsrv1 etc]# /apps/rt/bin/rt list "Content like foo.com"
>>>>>>>>>>>>> Query:Content like 'foo.com'
>>>>>>>>>>>>> Ticket Owner Queue Age Told Status Requestor Subject
>>>>>>>>>>>>> --------------------------------------------------------------------------------
>>>>>>>>>>>>>
>>>>>>>>>>>>> 23 mbs Incid 1 wk resolv enVision@ alert
>>>>>>>>>>>>> -NICAlert-Secur
>>>>>>>>>>>>> [root at rtsrv1 etc]#
>>>>>>>>>>>>>
>>>>>>>>>>>>> [root at rtsrv1 etc]# /apps/rt/bin/rt show 39 | grep foo.com
>>>>>>>>>>>>> foo.com blah blah... 1 line... not including in this email
>>>>>>>>>>>>> [root at rtsrv1 etc]#
>>>>>>>>>>>>>
>>>>>>>>>>>>> [root at rtsrv1 etc]# /apps/rt/bin/rt show 23 | grep foo.com
>>>>>>>>>>>>> foo.com blah blah... not including in this email
>>>>>>>>>>>>> foo.com matching lines 66 more times... not including in this
>>>>>>>>>>>>> email
>>>>>>>>>>>>> [root at rtsrv1 etc]#
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Regards;
>>>>>>>>>>>>>> Roy
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Jeff Blaine wrote:
>>>>>>>>>>>>>>> On 4/26/2010 11:50 AM, Kenneth Marshall wrote:
>>>>>>>>>>>>>>>> Hi Jeff,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> There is nothing here that indicates a problem. It looks
>>>>>>>>>>>>>>>> like an apples vs. oranges comparison by the time you include
>>>>>>>>>>>>>>>> the actual parameters of the search from the web interface
>>>>>>>>>>>>>>>> and the rt commandline interface and possible privilege and
>>>>>>>>>>>>>>>> ACL differences. You can use DB query logging to figure out
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I think my original post is being misinterpreted. The 'rt'
>>>>>>>>>>>>>>> CLI commands aren't doing a search. They're just showing
>>>>>>>>>>>>>>> this list's readers that 'foo.com' does show up in each of
>>>>>>>>>>>>>>> the tickets when doing a simple 'rt show<ticket>'. It's
>>>>>>>>>>>>>>> not a comparison of "CLI search vs. web search".
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> what SQL is being used in the web search or the commandline
>>>>>>>>>>>>>>>> rt and compare the output piece-wise to put yourself at ease.
>>>>>>>>>>>>>>>> Maybe look at the individual components of each of the two
>>>>>>>>>>>>>>>> tickets, as well.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> When viewing the tickets using 'Full headers" and then
>>>>>>>>>>>>>>> "Ctrl-F" to examine every instance of 'foo.com' in each ticket
>>>>>>>>>>>>>>> shows that both tickets have the 'foo.com' in text/html parts
>>>>>>>>>>>>>>> (and only there).
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Ticket 23 has 67 of those parts and is returned when RT
>>>>>>>>>>>>>>> searching
>>>>>>>>>>>>>>> for 'foo.com'
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Ticket 39 has 1 of those parts and is not returned when RT
>>>>>>>>>>>>>>> searching
>>>>>>>>>>>>>>> for 'foo.com'
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> By "DB query logging" do you mean Set($StatementLog, "DEBUG");
>>>>>>>>>>>>>>> or something?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thanks for the reply, Ken
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Jeff
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>>>>> Ken
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Mon, Apr 26, 2010 at 11:21:45AM -0400, Jeff Blaine wrote:
>>>>>>>>>>>>>>>>> Does anyone have any suggestions for how to go about
>>>>>>>>>>>>>>>>> figuring out what is wrong here?
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On 4/22/2010 2:09 PM, Jeff Blaine wrote:
>>>>>>>>>>>>>>>>>> RT 3.8.7
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> A search for 'Content matches foo.com' is returning some
>>>>>>>>>>>>>>>>>> tickets
>>>>>>>>>>>>>>>>>> and missing others that clearly have foo.com in the
>>>>>>>>>>>>>>>>>> Content.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> [root at rtsrv1 bin]# ./rt show 39 | grep foo.com | wc -l
>>>>>>>>>>>>>>>>>>> 1
>>>>>>>>>>>>>>>>>>> [root at rtsrv1 bin]#
>>>>>>>>>>>>>>>>>>> [root at rtsrv1 bin]# ./rt show 23 | grep foo.com | wc -l
>>>>>>>>>>>>>>>>>>> 67
>>>>>>>>>>>>>>>>>>> [root at rtsrv1 bin]#
>>>>>>>>>>>>>>>>>> 23 shows up in the web search results.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> 39 does not.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Any ideas?
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Discover RT's hidden secrets with RT Essentials from
>>>>>>>>>>>>>>>>>> O'Reilly
>>>>>>>>>>>>>>>>>> Media.
>>>>>>>>>>>>>>>>>> Buy a copy at http://rtbook.bestpractical.com
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Discover RT's hidden secrets with RT Essentials from
>>>>>>>>>>>>>>>>> O'Reilly
>>>>>>>>>>>>>>>>> Media.
>>>>>>>>>>>>>>>>> Buy a copy at http://rtbook.bestpractical.com
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Discover RT's hidden secrets with RT Essentials from O'Reilly
>>>>>>>>>>>>>>> Media.
>>>>>>>>>>>>>>> Buy a copy at http://rtbook.bestpractical.com
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>> Discover RT's hidden secrets with RT Essentials from O'Reilly Media.
>>>>>>> Buy a copy at http://rtbook.bestpractical.com
>>>>>>>
>>>>>>
>>>>>> Discover RT's hidden secrets with RT Essentials from O'Reilly Media.
>>>>>> Buy a copy at http://rtbook.bestpractical.com
>>>>>>
>>>>>
>>>>
>>>
>>
>
More information about the rt-users
mailing list