[Rt-devel] Searches on attachments.content using Oracle
Jesse Vincent
jesse at bestpractical.com
Sun Aug 22 15:11:52 EDT 2004
Oh wow. Thanks! This is great!
I'm only in the netherlands for the next 12 hours. if I were here for
longer I'd show up to buy you a beer.
Jesse
On Aug 22, 2004, at 3:33 PM, Joop van de Wege wrote:
> Hi All,
>
> We're using Oracle9i and I noticed that SearchBuilder is not optimised
> to use Oracle Context searches when searching for a specific keyword in
> the content of a email/attachment. The current situation where the
> select looks something like: select * from attachments where content
> like 'sendmail%' does work but it is taking ages to come up with a
> result.
> I have added the following index to RT3.2.1:
> CREATE INDEX cnt ON attachments(content) INDEXTYPE IS ctxsys.context;
> Beware that if your attachments table is rather large that it is going
> to take a while for the index to build. Ours is around 300MB.
> Also have a look at the Oracle documentation to see where you can cut
> down on creation and maintanance time of this index.
> http://download-west.oracle.com/docs/cd/B10501_01/text.920/a96518/
> csql.htm#19446
> http://download-west.oracle.com/docs/cd/B10501_01/text.920/a96518/
> toc.htm
> Also needed as I said is regular updating of the index. I use a Job for
> that, SQL follows and adjust to your needs, jobs runs at 06:00AM every
> day.:
>
> DECLARE
> X NUMBER;
> BEGIN
> SYS.DBMS_JOB.SUBMIT
> ( job => X
> ,what => 'DECLARE
> stmt VARCHAR2(200);
> BEGIN
> stmt := ''ALTER INDEX cnt REBUILD ONLINE PARAMETERS('' ||
> ''''''sync MEMORY 40M'''''' || '')'';
> EXECUTE IMMEDIATE stmt;
> END;
> '
> ,next_date => TO_DATE('22-08-2004 06:00:00','dd/mm/yyyy
> hh24:mi:ss')
> ,INTERVAL => 'TRUNC(SYSDATE+1)+6/24'
> ,no_parse => TRUE
> );
> END;
> /
>
> And most importantly the modification to Searchbuilder.pm (1.01)
>
>
> --- SearchBuilder.pm.orig 2004-06-28 04:39:00.000000000 +0200
> +++ SearchBuilder.pm 2004-08-04 13:14:00.000000000 +0200
> @@ -792,9 +792,20 @@
> }
>
> }
>
> - my $clause = "($QualifiedField $args{'OPERATOR'} $args{'VALUE'})";
> + my $clause;
> + if ($QualifiedField eq 'lower(Attachments_2.Content)' and
> $args{'OPERATOR'} eq 'LIKE') {
> + $clause = '(contains(Attachments_2.Content,' . $args{'VALUE'} . ')
> >0)';
> + }
> + elsif ($QualifiedField eq 'lower(Attachments_2.Content)' and
> $args{'OPERATOR'} eq 'NOT LIKE') {
> + $clause = '(not contains(Attachments_2.Content,' . $args{'VALUE'} .
> ') >0)';
> + }
> + else {
> + $clause = "($QualifiedField $args{'OPERATOR'}
> $args{'VALUE'})";
> + }
> +
> +#my $clause = "($QualifiedField $args{'OPERATOR'} $args{'VALUE'})";
>
> # Juju because this should come _AFTER_ the EA
> my $prefix = "";
> if ( $self->{_open_parens}{$Clause} ) {
>
>
> To put things into perspective:
> select * from attachments where contains(content, 'sendmail')>0
> 12 rows in ~150msec, that is cached by the database, first time ~25sec.
>
> select * from attachments where content like '%sendmail%'
> 12 rows in 2:23 first time, ~40 sec every next invocation.
>
> That is just testing this query but it pushes so much data into Oracle
> caches that it invalidates a lot of indices causing problems for other
> queries. Example: after running the ... like '%sendmail%' the index
> caches are almost clear because running my version will now take
> ~500msec for first invocation and ~150msec on subsequent invocations.
> YMMV.
>
>
> Anyone using Oracle is invited to have a go at it using a test
> installation and comment/update this so that once it is stable it can
> be
> incorporated into SearchBuilder and RT.
>
>
> Thanks Jesse for RT3.2.1. I really, really like it.
>
> Joop
>
> -
> Joop van de Wege <JoopvandeWege at mococo.nl>
>
> _______________________________________________
> Rt-devel mailing list
> Rt-devel at lists.bestpractical.com
> http://lists.bestpractical.com/cgi-bin/mailman/listinfo/rt-devel
>
More information about the Rt-devel
mailing list