[Rt-devel] Searches on attachments.content using Oracle

Joop van de Wege JoopvandeWege at mococo.nl
Sun Aug 22 09:33:43 EDT 2004


Hi All,

We're using Oracle9i and I noticed that SearchBuilder is not optimised
to use Oracle Context searches when searching for a specific keyword in
the content of a email/attachment. The current situation where the
select looks something like: select * from  attachments where content
like 'sendmail%' does work but it is taking ages to come up with a
result.
I have added the following index to RT3.2.1:
CREATE INDEX cnt ON attachments(content) INDEXTYPE IS ctxsys.context;
Beware that if your attachments table is rather large that it is going
to take a while for the index to build. Ours is around 300MB.
Also have a look at the Oracle documentation to see where you can cut
down on creation and maintanance time of this index.
http://download-west.oracle.com/docs/cd/B10501_01/text.920/a96518/csql.htm#19446
http://download-west.oracle.com/docs/cd/B10501_01/text.920/a96518/toc.htm
Also needed as I said is regular updating of the index. I use a Job for
that, SQL follows and adjust to your needs, jobs runs at 06:00AM every
day.:

DECLARE
  X NUMBER;
BEGIN
  SYS.DBMS_JOB.SUBMIT
    ( job       => X 
     ,what      => 'DECLARE
   stmt VARCHAR2(200);
BEGIN
   stmt := ''ALTER INDEX cnt REBUILD ONLINE PARAMETERS('' || ''''''sync MEMORY 40M'''''' || '')'';
   EXECUTE IMMEDIATE stmt;
END;
'
     ,next_date => TO_DATE('22-08-2004 06:00:00','dd/mm/yyyy hh24:mi:ss')
     ,INTERVAL  => 'TRUNC(SYSDATE+1)+6/24'
     ,no_parse  => TRUE
    );
END;
/

And most importantly the modification to Searchbuilder.pm (1.01)


--- SearchBuilder.pm.orig	2004-06-28 04:39:00.000000000 +0200
+++ SearchBuilder.pm	2004-08-04 13:14:00.000000000 +0200
@@ -792,9 +792,20 @@
         }
 
     }
 
-    my $clause = "($QualifiedField $args{'OPERATOR'} $args{'VALUE'})";
+    my $clause;
+    if ($QualifiedField eq 'lower(Attachments_2.Content)' and $args{'OPERATOR'} eq 'LIKE') {
+	$clause = '(contains(Attachments_2.Content,' . $args{'VALUE'} . ') >0)';
+        }
+    elsif ($QualifiedField eq 'lower(Attachments_2.Content)' and $args{'OPERATOR'} eq 'NOT LIKE') {
+	$clause = '(not contains(Attachments_2.Content,' . $args{'VALUE'} . ') >0)';
+        }
+    else {
+        $clause = "($QualifiedField $args{'OPERATOR'} $args{'VALUE'})"; 
+    }
+
+#my $clause = "($QualifiedField $args{'OPERATOR'} $args{'VALUE'})";
 
     # Juju because this should come _AFTER_ the EA
     my $prefix = "";
     if ( $self->{_open_parens}{$Clause} ) { 


To put things into perspective:
select * from attachments where contains(content, 'sendmail')>0 
12 rows in ~150msec, that is cached by the database, first time ~25sec.

select * from attachments where content like '%sendmail%'
12 rows in 2:23 first time, ~40 sec every next invocation.

That is just testing this query but it pushes so much data into Oracle
caches that it invalidates a lot of indices causing problems for other
queries. Example: after running the ... like '%sendmail%' the index
caches are almost clear because running my version will now take
~500msec for first invocation and ~150msec on subsequent invocations.
YMMV.


Anyone using Oracle is invited to have a go at it using a test
installation and comment/update this so that once it is stable it can be
incorporated into SearchBuilder and RT.


Thanks Jesse for RT3.2.1. I really, really like it.

Joop

- 
Joop van de Wege <JoopvandeWege at mococo.nl>



More information about the Rt-devel mailing list