[Rt-devel] Searches on attachments.content using Oracle

Jesse Vincent jesse at bestpractical.com
Sun Aug 22 15:11:52 EDT 2004


Oh wow. Thanks!  This is great!
  I'm only in the netherlands for the next 12 hours. if I were here for  
longer I'd show up to buy you a beer.

	Jesse


On Aug 22, 2004, at 3:33 PM, Joop van de Wege wrote:

> Hi All,
>
> We're using Oracle9i and I noticed that SearchBuilder is not optimised
> to use Oracle Context searches when searching for a specific keyword in
> the content of a email/attachment. The current situation where the
> select looks something like: select * from  attachments where content
> like 'sendmail%' does work but it is taking ages to come up with a
> result.
> I have added the following index to RT3.2.1:
> CREATE INDEX cnt ON attachments(content) INDEXTYPE IS ctxsys.context;
> Beware that if your attachments table is rather large that it is going
> to take a while for the index to build. Ours is around 300MB.
> Also have a look at the Oracle documentation to see where you can cut
> down on creation and maintanance time of this index.
> http://download-west.oracle.com/docs/cd/B10501_01/text.920/a96518/ 
> csql.htm#19446
> http://download-west.oracle.com/docs/cd/B10501_01/text.920/a96518/ 
> toc.htm
> Also needed as I said is regular updating of the index. I use a Job for
> that, SQL follows and adjust to your needs, jobs runs at 06:00AM every
> day.:
>
> DECLARE
>   X NUMBER;
> BEGIN
>   SYS.DBMS_JOB.SUBMIT
>     ( job       => X
>      ,what      => 'DECLARE
>    stmt VARCHAR2(200);
> BEGIN
>    stmt := ''ALTER INDEX cnt REBUILD ONLINE PARAMETERS('' ||  
> ''''''sync MEMORY 40M'''''' || '')'';
>    EXECUTE IMMEDIATE stmt;
> END;
> '
>      ,next_date => TO_DATE('22-08-2004 06:00:00','dd/mm/yyyy  
> hh24:mi:ss')
>      ,INTERVAL  => 'TRUNC(SYSDATE+1)+6/24'
>      ,no_parse  => TRUE
>     );
> END;
> /
>
> And most importantly the modification to Searchbuilder.pm (1.01)
>
>
> --- SearchBuilder.pm.orig	2004-06-28 04:39:00.000000000 +0200
> +++ SearchBuilder.pm	2004-08-04 13:14:00.000000000 +0200
> @@ -792,9 +792,20 @@
>          }
>
>      }
>
> -    my $clause = "($QualifiedField $args{'OPERATOR'} $args{'VALUE'})";
> +    my $clause;
> +    if ($QualifiedField eq 'lower(Attachments_2.Content)' and  
> $args{'OPERATOR'} eq 'LIKE') {
> +	$clause = '(contains(Attachments_2.Content,' . $args{'VALUE'} . ')  
> >0)';
> +        }
> +    elsif ($QualifiedField eq 'lower(Attachments_2.Content)' and  
> $args{'OPERATOR'} eq 'NOT LIKE') {
> +	$clause = '(not contains(Attachments_2.Content,' . $args{'VALUE'} .  
> ') >0)';
> +        }
> +    else {
> +        $clause = "($QualifiedField $args{'OPERATOR'}  
> $args{'VALUE'})";
> +    }
> +
> +#my $clause = "($QualifiedField $args{'OPERATOR'} $args{'VALUE'})";
>
>      # Juju because this should come _AFTER_ the EA
>      my $prefix = "";
>      if ( $self->{_open_parens}{$Clause} ) {
>
>
> To put things into perspective:
> select * from attachments where contains(content, 'sendmail')>0
> 12 rows in ~150msec, that is cached by the database, first time ~25sec.
>
> select * from attachments where content like '%sendmail%'
> 12 rows in 2:23 first time, ~40 sec every next invocation.
>
> That is just testing this query but it pushes so much data into Oracle
> caches that it invalidates a lot of indices causing problems for other
> queries. Example: after running the ... like '%sendmail%' the index
> caches are almost clear because running my version will now take
> ~500msec for first invocation and ~150msec on subsequent invocations.
> YMMV.
>
>
> Anyone using Oracle is invited to have a go at it using a test
> installation and comment/update this so that once it is stable it can  
> be
> incorporated into SearchBuilder and RT.
>
>
> Thanks Jesse for RT3.2.1. I really, really like it.
>
> Joop
>
> -
> Joop van de Wege <JoopvandeWege at mococo.nl>
>
> _______________________________________________
> Rt-devel mailing list
> Rt-devel at lists.bestpractical.com
> http://lists.bestpractical.com/cgi-bin/mailman/listinfo/rt-devel
>



More information about the Rt-devel mailing list