[Rt-devel] Searches on attachments.content using Oracle
Joop van de Wege
JoopvandeWege at mococo.nl
Sun Aug 22 09:33:43 EDT 2004
Hi All,
We're using Oracle9i and I noticed that SearchBuilder is not optimised
to use Oracle Context searches when searching for a specific keyword in
the content of a email/attachment. The current situation where the
select looks something like: select * from attachments where content
like 'sendmail%' does work but it is taking ages to come up with a
result.
I have added the following index to RT3.2.1:
CREATE INDEX cnt ON attachments(content) INDEXTYPE IS ctxsys.context;
Beware that if your attachments table is rather large that it is going
to take a while for the index to build. Ours is around 300MB.
Also have a look at the Oracle documentation to see where you can cut
down on creation and maintanance time of this index.
http://download-west.oracle.com/docs/cd/B10501_01/text.920/a96518/csql.htm#19446
http://download-west.oracle.com/docs/cd/B10501_01/text.920/a96518/toc.htm
Also needed as I said is regular updating of the index. I use a Job for
that, SQL follows and adjust to your needs, jobs runs at 06:00AM every
day.:
DECLARE
X NUMBER;
BEGIN
SYS.DBMS_JOB.SUBMIT
( job => X
,what => 'DECLARE
stmt VARCHAR2(200);
BEGIN
stmt := ''ALTER INDEX cnt REBUILD ONLINE PARAMETERS('' || ''''''sync MEMORY 40M'''''' || '')'';
EXECUTE IMMEDIATE stmt;
END;
'
,next_date => TO_DATE('22-08-2004 06:00:00','dd/mm/yyyy hh24:mi:ss')
,INTERVAL => 'TRUNC(SYSDATE+1)+6/24'
,no_parse => TRUE
);
END;
/
And most importantly the modification to Searchbuilder.pm (1.01)
--- SearchBuilder.pm.orig 2004-06-28 04:39:00.000000000 +0200
+++ SearchBuilder.pm 2004-08-04 13:14:00.000000000 +0200
@@ -792,9 +792,20 @@
}
}
- my $clause = "($QualifiedField $args{'OPERATOR'} $args{'VALUE'})";
+ my $clause;
+ if ($QualifiedField eq 'lower(Attachments_2.Content)' and $args{'OPERATOR'} eq 'LIKE') {
+ $clause = '(contains(Attachments_2.Content,' . $args{'VALUE'} . ') >0)';
+ }
+ elsif ($QualifiedField eq 'lower(Attachments_2.Content)' and $args{'OPERATOR'} eq 'NOT LIKE') {
+ $clause = '(not contains(Attachments_2.Content,' . $args{'VALUE'} . ') >0)';
+ }
+ else {
+ $clause = "($QualifiedField $args{'OPERATOR'} $args{'VALUE'})";
+ }
+
+#my $clause = "($QualifiedField $args{'OPERATOR'} $args{'VALUE'})";
# Juju because this should come _AFTER_ the EA
my $prefix = "";
if ( $self->{_open_parens}{$Clause} ) {
To put things into perspective:
select * from attachments where contains(content, 'sendmail')>0
12 rows in ~150msec, that is cached by the database, first time ~25sec.
select * from attachments where content like '%sendmail%'
12 rows in 2:23 first time, ~40 sec every next invocation.
That is just testing this query but it pushes so much data into Oracle
caches that it invalidates a lot of indices causing problems for other
queries. Example: after running the ... like '%sendmail%' the index
caches are almost clear because running my version will now take
~500msec for first invocation and ~150msec on subsequent invocations.
YMMV.
Anyone using Oracle is invited to have a go at it using a test
installation and comment/update this so that once it is stable it can be
incorporated into SearchBuilder and RT.
Thanks Jesse for RT3.2.1. I really, really like it.
Joop
-
Joop van de Wege <JoopvandeWege at mococo.nl>
More information about the Rt-devel
mailing list