[rt-users] Search results anomaly
Kenneth Marshall
ktm at rice.edu
Mon May 17 15:50:48 EDT 2010
Hi Jeff,
Here is a link to some PostgreSQL documentation on setting
locales:
http://www.postgresql.org/docs/8.4/static/locale.html
A crude example could be a locale that "sorted" a-i the
same and o-z the same, and nothing for h. Then the following
ahb
bha
would be equivalent to:
ab
ba
Or you could have a set of chars where the upper case
for one letter (c) did not exist. Then you could not match
upper(abc)
using lower(upper(abc)) == abc since it would produce
ab == abc and fail. I think that you may be seeing something
like that with your dot character.
Regards,
Ken
On Mon, May 17, 2010 at 03:30:27PM -0400, Jeff Blaine wrote:
> On 5/17/2010 3:26 PM, Kenneth Marshall wrote:
>> Hi Jeff,
>>
>> Those are just what I would have expected. I would take one
>> of the results from the second set of 77 and figure out why
>> it is not matching the first query. It does look more and more
>> like a collation problem. What is the collation of an attachment
>> that matches one query but not the other?
>
> Thanks for the reply.
>
> I'll gladly provide any info I can, but you're talking over
> my head re: collation. Can you provide some more info? I'm
> not a DB person.
>
>> Ken
>>
>> On Mon, May 17, 2010 at 03:20:00PM -0400, Jeff Blaine wrote:
>>> Here are the transactions as logged. FWIW, I find it curious
>>> that all of these search failures for us so far are with
>>> search terms that have a '.' in the term...
>>>
>>> LOG: duration: 35.815 ms statement: EXECUTE<unnamed> [PREPARE: SELECT
>>> DISTINCT main.* FROM Tickets main
>>> JOIN Transactions Transactions_1
>>> ON ( Transactions_1.ObjectId = main.id )
>>> JOIN Attachments Attachments_2
>>> ON ( Attachments_2.TransactionId = Transactions_1.id )
>>> WHERE (Transactions_1.ObjectType = 'RT::Ticket')
>>> AND (main.Status != 'deleted')
>>> AND ( ( Attachments_2.Content ILIKE '%members.linode%' ) AND
>>> main.Owner = '66')
>>> AND (main.Type = 'ticket')
>>> AND (main.EffectiveId = main.id)
>>> ORDER BY main.id ASC ]
>>>
>>> LOG: duration: 38.362 ms statement: EXECUTE<unnamed> [PREPARE: SELECT
>>> DISTINCT main.* FROM Tickets main
>>> JOIN Transactions Transactions_1
>>> ON ( Transactions_1.ObjectId = main.id )
>>> JOIN Attachments Attachments_2
>>> ON ( Attachments_2.TransactionId = Transactions_1.id )
>>> WHERE (Transactions_1.ObjectType = 'RT::Ticket')
>>> AND (main.Status != 'deleted')
>>> AND ( ( Attachments_2.Content ILIKE '%linode%' ) AND main.Owner =
>>> '66')
>>> AND (main.Type = 'ticket')
>>> AND (main.EffectiveId = main.id)
>>> ORDER BY main.id ASC ]
>>>
>>> On 5/17/2010 3:00 PM, Kenneth Marshall wrote:
>>>> And what queries actually hit the database backend for these
>>>> two? If it is the same SQL query, then the problem is somewhere
>>>> else. If the query differs, try running the query manually to
>>>> see why it is giving unexpected results. Again as others have
>>>> mentioned, collation/encoding could cause these types of differences.
>>>>
>>>> Cheers,
>>>> Ken
>>>>
>>>> On Mon, May 17, 2010 at 02:48:49PM -0400, Jeff Blaine wrote:
>>>>> Another example of this failing:
>>>>>
>>>>> rt list -t ticket "Content like members.linode AND Owner = jsmith"
>>>>>
>>>>> 14 results
>>>>>
>>>>> rt list -t ticket "Content like linode AND Owner = jsmith"
>>>>>
>>>>> 77 results, all of which have "members.linode" in the content.
>>>>>
>>>>> On 4/26/2010 5:32 PM, Jeff Blaine wrote:
>>>>>> On 4/26/2010 5:04 PM, Kenneth Marshall wrote:
>>>>>>> I am not certain what to do. For myself, I would turn on
>>>>>>> SQL statement logging within the database. Set:
>>>>>>>
>>>>>>> log_min_duration_statement = 0
>>>>>>>
>>>>>>> in your postgresql.conf for the database and run both
>>>>>>> queries in RT. Then turn it back done and pick apart the
>>>>>>> query results to see if the problem is a logic problem or
>>>>>>> possibly an encoding issue which results in "equal" items
>>>>>>> not showing as "equal". Also, it looks like you are just
>>>>>>
>>>>>> These were sequential in the log as a result of my simple
>>>>>> "Content matches foo.com" query via the web GUI. One is
>>>>>> a COUNT and the other is for getting data, apparently.
>>>>>>
>>>>>> I formatted them here for easier reading.
>>>>>>
>>>>>> LOG: duration: 101.115 ms statement: EXECUTE<unnamed> [PREPARE:
>>>>>> SELECT COUNT(DISTINCT main.id) FROM Tickets main
>>>>>> JOIN Transactions Transactions_1 ON ( Transactions_1.ObjectId =
>>>>>> main.id
>>>>>> )
>>>>>> JOIN Attachments Attachments_2 ON ( Attachments_2.TransactionId =
>>>>>> Transactions_1.id )
>>>>>> WHERE (Transactions_1.ObjectType = 'RT::Ticket')
>>>>>> AND (main.Status != 'deleted')
>>>>>> AND ( ( Attachments_2.Content ILIKE '%foo.com%' ) )
>>>>>> AND (main.Type = 'ticket')
>>>>>> AND (main.EffectiveId = main.id)
>>>>>> ]
>>>>>>
>>>>>> LOG: duration: 105.350 ms statement: EXECUTE<unnamed> [PREPARE:
>>>>>> SELECT DISTINCT main.* FROM Tickets main
>>>>>> JOIN Transactions Transactions_1 ON ( Transactions_1.ObjectId =
>>>>>> main.id
>>>>>> )
>>>>>> JOIN Attachments Attachments_2 ON ( Attachments_2.TransactionId =
>>>>>> Transactions_1.id )
>>>>>> WHERE (Transactions_1.ObjectType = 'RT::Ticket')
>>>>>> AND (main.Status != 'deleted')
>>>>>> AND ( ( Attachments_2.Content ILIKE '%foo.com%' ) )
>>>>>> AND (main.Type = 'ticket')
>>>>>> AND (main.EffectiveId = main.id)
>>>>>> ORDER BY main.id ASC LIMIT 50
>>>>>> ]
>>>>>>
>>>>>> What "both queries" are you referring to? The RT CLI and the
>>>>>> RT web GUI both fail the same way with my 1 test query. I'm
>>>>>> confused by what you mean there.
>>>>>>
>>>>>> Thanks for the ideas, Ken
>>>>>>
>>>>>>> starting with such a low ticket number, but if you plan
>>>>>>> on using full content searches for any large-ish amount
>>>>>>> of data you should really plan on using the full-text index
>>>>>>> support for PostgreSQL/RT that is described in the wiki.
>>>>>>> Speaking as one whose database was brought to its knees by
>>>>>>> some creative data mining. :)
>>>>>>>
>>>>>>> Good luck and let us know what you find to be the cause.
>>>>>>> Regards,
>>>>>>>
>>>>>>> Ken
>>>>>>>
>>>>>>>
>>>>>>> On Mon, Apr 26, 2010 at 04:54:45PM -0400, Jeff Blaine wrote:
>>>>>>>> On 4/26/2010 4:52 PM, Kenneth Marshall wrote:
>>>>>>>>> Jeff,
>>>>>>>>>
>>>>>>>>> Are you using the Full-text index support from the wiki?
>>>>>>>>
>>>>>>>> Nope.
>>>>>>>>
>>>>>>>> Just RT 3.8.7 + RTFM 2.4.2. No add-ons/tweaks from the wiki.
>>>>>>>>
>>>>>>>>> There were some index bugs that may require you to re-index
>>>>>>>>> to fix, specifically some rows were not reported correctly
>>>>>>>>> as valid matches. I believe that the PostgreSQL release
>>>>>>>>> notes mentioned that need. Maybe that is your problem. We
>>>>>>>>> are using RT-3.8.5 and PostgreSQL 8.4.2 here with the full-text
>>>>>>>>> support without an issue.
>>>>>>>>>
>>>>>>>>> Regards,
>>>>>>>>> Ken
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Mon, Apr 26, 2010 at 03:00:39PM -0400, Jeff Blaine wrote:
>>>>>>>>>> On 4/26/2010 2:19 PM, Kenneth Marshall wrote:
>>>>>>>>>>> Well, that knocks out the ACL issue. Do you think that your
>>>>>>>>>>> Mason cache is confused? Maybe stop RT, clear the cache, and
>>>>>>>>>>> restart RT to see if that helps. What DB backend are you using
>>>>>>>>>>> and which version of RT are you running?
>>>>>>>>>>
>>>>>>>>>> RT 3.8.7
>>>>>>>>>> PostgreSQL as it comes with RHELv5 + updates from yum
>>>>>>>>>>
>>>>>>>>>> Clearing the Mason cache didn't help :|
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Cheers,
>>>>>>>>>>> Ken
>>>>>>>>>>>
>>>>>>>>>>> On Mon, Apr 26, 2010 at 12:47:22PM -0400, Jeff Blaine wrote:
>>>>>>>>>>>> On 4/26/2010 12:29 PM, Raed El-Hames wrote:
>>>>>>>>>>>>> Jeff;
>>>>>>>>>>>>>
>>>>>>>>>>>>> Does your CLI user have permissions on the queue that ticket 39
>>>>>>>>>>>>> is
>>>>>>>>>>>>> in??
>>>>>>>>>>>>> login to the web interface with the same cli user and see if
>>>>>>>>>>>>> you
>>>>>>>>>>>>> can
>>>>>>>>>>>>> view the ticket.
>>>>>>>>>>>>
>>>>>>>>>>>> Yes, it does.
>>>>>>>>>>>>
>>>>>>>>>>>> Again, however, this is not really a report about an anomaly in
>>>>>>>>>>>> the RT CLI.
>>>>>>>>>>>>
>>>>>>>>>>>> The incorrect search results are returned via a web GUI search
>>>>>>>>>>>> of "Content matches foo.com"
>>>>>>>>>>>>
>>>>>>>>>>>> Here, maybe this makes it more clear, showing the same problem
>>>>>>>>>>>> when using the RT CLI:
>>>>>>>>>>>>
>>>>>>>>>>>> [root at rtsrv1 etc]# /apps/rt/bin/rt list "Content like foo.com"
>>>>>>>>>>>> Query:Content like 'foo.com'
>>>>>>>>>>>> Ticket Owner Queue Age Told Status Requestor Subject
>>>>>>>>>>>> --------------------------------------------------------------------------------
>>>>>>>>>>>>
>>>>>>>>>>>> 23 mbs Incid 1 wk resolv enVision@ alert
>>>>>>>>>>>> -NICAlert-Secur
>>>>>>>>>>>> [root at rtsrv1 etc]#
>>>>>>>>>>>>
>>>>>>>>>>>> [root at rtsrv1 etc]# /apps/rt/bin/rt show 39 | grep foo.com
>>>>>>>>>>>> foo.com blah blah... 1 line... not including in this email
>>>>>>>>>>>> [root at rtsrv1 etc]#
>>>>>>>>>>>>
>>>>>>>>>>>> [root at rtsrv1 etc]# /apps/rt/bin/rt show 23 | grep foo.com
>>>>>>>>>>>> foo.com blah blah... not including in this email
>>>>>>>>>>>> foo.com matching lines 66 more times... not including in this
>>>>>>>>>>>> email
>>>>>>>>>>>> [root at rtsrv1 etc]#
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>> Regards;
>>>>>>>>>>>>> Roy
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Jeff Blaine wrote:
>>>>>>>>>>>>>> On 4/26/2010 11:50 AM, Kenneth Marshall wrote:
>>>>>>>>>>>>>>> Hi Jeff,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> There is nothing here that indicates a problem. It looks
>>>>>>>>>>>>>>> like an apples vs. oranges comparison by the time you include
>>>>>>>>>>>>>>> the actual parameters of the search from the web interface
>>>>>>>>>>>>>>> and the rt commandline interface and possible privilege and
>>>>>>>>>>>>>>> ACL differences. You can use DB query logging to figure out
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I think my original post is being misinterpreted. The 'rt'
>>>>>>>>>>>>>> CLI commands aren't doing a search. They're just showing
>>>>>>>>>>>>>> this list's readers that 'foo.com' does show up in each of
>>>>>>>>>>>>>> the tickets when doing a simple 'rt show<ticket>'. It's
>>>>>>>>>>>>>> not a comparison of "CLI search vs. web search".
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> what SQL is being used in the web search or the commandline
>>>>>>>>>>>>>>> rt and compare the output piece-wise to put yourself at ease.
>>>>>>>>>>>>>>> Maybe look at the individual components of each of the two
>>>>>>>>>>>>>>> tickets, as well.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> When viewing the tickets using 'Full headers" and then
>>>>>>>>>>>>>> "Ctrl-F" to examine every instance of 'foo.com' in each ticket
>>>>>>>>>>>>>> shows that both tickets have the 'foo.com' in text/html parts
>>>>>>>>>>>>>> (and only there).
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Ticket 23 has 67 of those parts and is returned when RT
>>>>>>>>>>>>>> searching
>>>>>>>>>>>>>> for 'foo.com'
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Ticket 39 has 1 of those parts and is not returned when RT
>>>>>>>>>>>>>> searching
>>>>>>>>>>>>>> for 'foo.com'
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> By "DB query logging" do you mean Set($StatementLog, "DEBUG");
>>>>>>>>>>>>>> or something?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks for the reply, Ken
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Jeff
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>>>> Ken
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Mon, Apr 26, 2010 at 11:21:45AM -0400, Jeff Blaine wrote:
>>>>>>>>>>>>>>>> Does anyone have any suggestions for how to go about
>>>>>>>>>>>>>>>> figuring out what is wrong here?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On 4/22/2010 2:09 PM, Jeff Blaine wrote:
>>>>>>>>>>>>>>>>> RT 3.8.7
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> A search for 'Content matches foo.com' is returning some
>>>>>>>>>>>>>>>>> tickets
>>>>>>>>>>>>>>>>> and missing others that clearly have foo.com in the
>>>>>>>>>>>>>>>>> Content.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> [root at rtsrv1 bin]# ./rt show 39 | grep foo.com | wc -l
>>>>>>>>>>>>>>>>>> 1
>>>>>>>>>>>>>>>>>> [root at rtsrv1 bin]#
>>>>>>>>>>>>>>>>>> [root at rtsrv1 bin]# ./rt show 23 | grep foo.com | wc -l
>>>>>>>>>>>>>>>>>> 67
>>>>>>>>>>>>>>>>>> [root at rtsrv1 bin]#
>>>>>>>>>>>>>>>>> 23 shows up in the web search results.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> 39 does not.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Any ideas?
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Discover RT's hidden secrets with RT Essentials from
>>>>>>>>>>>>>>>>> O'Reilly
>>>>>>>>>>>>>>>>> Media.
>>>>>>>>>>>>>>>>> Buy a copy at http://rtbook.bestpractical.com
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Discover RT's hidden secrets with RT Essentials from
>>>>>>>>>>>>>>>> O'Reilly
>>>>>>>>>>>>>>>> Media.
>>>>>>>>>>>>>>>> Buy a copy at http://rtbook.bestpractical.com
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Discover RT's hidden secrets with RT Essentials from O'Reilly
>>>>>>>>>>>>>> Media.
>>>>>>>>>>>>>> Buy a copy at http://rtbook.bestpractical.com
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>> Discover RT's hidden secrets with RT Essentials from O'Reilly Media.
>>>>>> Buy a copy at http://rtbook.bestpractical.com
>>>>>>
>>>>>
>>>>> Discover RT's hidden secrets with RT Essentials from O'Reilly Media.
>>>>> Buy a copy at http://rtbook.bestpractical.com
>>>>>
>>>>
>>>
>>
>
More information about the rt-users
mailing list