[Rt-commit] rt branch, 4.0/mysql-fts, updated. rt-4.0.0-218-gfa5dffc

Alex Vandiver alexmv at bestpractical.com
Thu May 5 21:03:01 EDT 2011


The branch, 4.0/mysql-fts has been updated
       via  fa5dffcb4e2ce943e7506cd4bc56c608725554a9 (commit)
       via  1b66c9baa565369eb1bfb5c13d0e2d3a26407949 (commit)
       via  d16cafcd37bce065739c57fe641c5030fed106cf (commit)
       via  1d665c448c8eb8625c27f8da8bb1a71103d712f2 (commit)
       via  6657efc19964327dfa3a586732de60da55dbfc90 (commit)
      from  a3fc4ec3c8992098473ec5ad4a6cd3ad6f35a4ad (commit)

Summary of changes:
 docs/full_text_indexing.pod     |  159 +++++++++++++++++++++++++++++++++++++++
 etc/RT_Config.pm.in             |   26 +------
 sbin/rt-fulltext-indexer.in     |   91 +++++++---------------
 sbin/rt-setup-fulltext-index.in |   67 +++++------------
 4 files changed, 208 insertions(+), 135 deletions(-)
 create mode 100644 docs/full_text_indexing.pod

- Log -----------------------------------------------------------------
commit 6657efc19964327dfa3a586732de60da55dbfc90
Author: Alex Vandiver <alexmv at bestpractical.com>
Date:   Thu May 5 20:47:54 2011 -0400

    Move the DontSearchBinaryAttachments config to a flag on index create
    
    As the option only affected Oracle indexing, and only during index
    creation, move it to a flag on index creation instead of a runtime
    configuration.

diff --git a/etc/RT_Config.pm.in b/etc/RT_Config.pm.in
index b152325..37290d8 100755
--- a/etc/RT_Config.pm.in
+++ b/etc/RT_Config.pm.in
@@ -1208,20 +1208,6 @@ If C<$DontSearchFileAttachments> is set to 1, then uploaded files
 
 Set($DontSearchFileAttachments, undef);
 
-=item C<$DontSearchBinaryAttachments>
-
-This refers to F<sbin/rt-setup-fulltext-index>.
-
-By default, text attachments are always indexed, and a known set of
-binary attachments are always skipped.
-
-If C<$DontSearchBinaryAttachments> is set to 1, then unrecognized binary
-data types will not be indexed.
-
-=cut
-
-Set($DontSearchBinaryAttachments, undef);
-
 =item C<$OnlySearchActiveTicketsInSimpleSearch>
 
 When query in simple search doesn't have status info, use this to only
diff --git a/sbin/rt-setup-fulltext-index.in b/sbin/rt-setup-fulltext-index.in
index 3c0edbf..adcf0d0 100644
--- a/sbin/rt-setup-fulltext-index.in
+++ b/sbin/rt-setup-fulltext-index.in
@@ -117,6 +117,7 @@ GetOptions(
     'h|help!'        => \$OPT{'help'},
     'ask!'           => \$OPT{'ask'},
     'dry-run!'       => \$OPT{'dryrun'},
+    'binary!'        => \$OPT{'binary'},
 
     'dba=s'          => \$DB{'admin'},
     'dba-password=s' => \$DB{'admin_password'},
@@ -503,7 +504,7 @@ sub ora_create_format_column {
                 WHEN fname IS NOT NULL THEN 'ignore'
         };
     }
-    my $binary = $RT::DontSearchBinaryAttachments? 'ignore' : 'binary';
+    my $binary = $OPT{'binary'} ? 'binary' : 'ignore';
     $detect_format .= qq{
                 WHEN type = 'text' THEN 'text'
                 WHEN type = 'text/rtf' THEN '$binary'
@@ -717,6 +718,11 @@ Creates an Oracle CONTEXT index on the Content column in the Attachments
 table.  It also creates several preferences, functions and triggers to
 support this index.
 
+The Oracle index determines which content-types it will index, at
+creation time.  By default, unknown content-types are ignored; to
+instead index unknown content-types as binary data, pass the C<--binary>
+flag when the index is created.
+
 CONTEXT indexes needperiodic synchronization after any updates; either
 use F<sbin/rt-fulltext-indexer> via cron, or read its documentation for
 alternatives.

commit 1d665c448c8eb8625c27f8da8bb1a71103d712f2
Author: Alex Vandiver <alexmv at bestpractical.com>
Date:   Thu May 5 20:49:52 2011 -0400

    Add a --all flag to iterate and index all new txns

diff --git a/sbin/rt-fulltext-indexer.in b/sbin/rt-fulltext-indexer.in
index 758b14a..7b2dcab 100644
--- a/sbin/rt-fulltext-indexer.in
+++ b/sbin/rt-fulltext-indexer.in
@@ -92,15 +92,17 @@ if ( $db_type eq 'Pg' ) {
     %OPT = (
         %OPT,
         limit  => 0,
+        all    => 0,
     );
-    push @OPT_LIST, 'limit=i';
+    push @OPT_LIST, 'limit=i', 'all!';
 }
 elsif ( $db_type eq 'mysql' ) {
     %OPT = (
         %OPT,
         limit  => 0,
+        all    => 0,
     );
-    push @OPT_LIST, 'limit=i';
+    push @OPT_LIST, 'limit=i', 'all!';
 }
 elsif ( $db_type eq 'Oracle' ) {
     %OPT = (
@@ -151,6 +153,7 @@ if ( $db_type eq 'Oracle' ) {
 
 my @types = qw(text html);
 foreach my $type ( @types ) {
+  REDO:
     my $attachments = attachments($type);
     $attachments->Limit(
         FIELD => 'id',
@@ -171,6 +174,7 @@ foreach my $type ( @types ) {
     }
     finalize( $type, $attachments ) if $found;
     clean( $type );
+    goto REDO if $OPT{'all'} and $attachments->Count == ($OPT{'limit'} || 100)
 }
 
 sub attachments {
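
The C<--all> behaviour added in this hunk can be modelled in a few lines of shell. This is a minimal sketch, not RT code: the `pending` backlog and the arithmetic standing in for one indexer pass are assumptions made for illustration. Only the stopping rule (repeat while a pass returned exactly `limit` rows, with 100 as the fallback) comes from the diff.

```shell
#!/bin/sh
# Toy model of the REDO loop: keep indexing while each pass fills its batch.
pending=250   # assumed backlog size, for illustration only
limit=100     # the fallback the diff uses when --limit is 0
passes=0

while :; do
    # One "pass": take up to $limit items off the backlog.
    if [ "$pending" -lt "$limit" ]; then
        batch=$pending
    else
        batch=$limit
    fi
    pending=$((pending - batch))
    passes=$((passes + 1))

    # Mirrors `goto REDO if ... Count == limit`: a short batch means done.
    if [ "$batch" -ne "$limit" ]; then
        break
    fi
done

echo "passes=$passes pending=$pending"
```

Under this model, a backlog that is an exact multiple of the batch size costs one extra, empty pass before the loop stops, which appears to match the condition in the diff.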

commit d16cafcd37bce065739c57fe641c5030fed106cf
Author: Alex Vandiver <alexmv at bestpractical.com>
Date:   Thu May 5 20:50:26 2011 -0400

    Make direct sql via `indexer` the approved way of updating the Sphinx index
    
    The (untested, unfinished) xmlpipe2 output is left as a starting point
    for future improvement, but protected by an --xmlpipe2 flag.

diff --git a/sbin/rt-fulltext-indexer.in b/sbin/rt-fulltext-indexer.in
index 7b2dcab..59b3b21 100644
--- a/sbin/rt-fulltext-indexer.in
+++ b/sbin/rt-fulltext-indexer.in
@@ -99,10 +99,11 @@ if ( $db_type eq 'Pg' ) {
 elsif ( $db_type eq 'mysql' ) {
     %OPT = (
         %OPT,
-        limit  => 0,
-        all    => 0,
+        limit    => 0,
+        all      => 0,
+        xmlpipe2 => 0,
     );
-    push @OPT_LIST, 'limit=i', 'all!';
+    push @OPT_LIST, 'limit=i', 'all!', 'xmlpipe2!';
 }
 elsif ( $db_type eq 'Oracle' ) {
     %OPT = (
@@ -149,6 +150,18 @@ if ( $db_type eq 'Oracle' ) {
         $index, $OPT{'memory'}
     );
     exit;
+} elsif ( $db_type eq 'mysql' ) {
+    unless ($OPT{'xmlpipe2'}) {
+        print STDERR <<EOT;
+
+Updates to the external Sphinx index are done via running the sphinx
+`indexer` tool:
+
+    indexer rt
+
+EOT
+        exit 1;
+    }
 }
 
 my @types = qw(text html);

commit 1b66c9baa565369eb1bfb5c13d0e2d3a26407949
Author: Alex Vandiver <alexmv at bestpractical.com>
Date:   Thu May 5 20:56:01 2011 -0400

    Remove unreachable "verbose" code

diff --git a/sbin/rt-fulltext-indexer.in b/sbin/rt-fulltext-indexer.in
index 59b3b21..5bdba64 100644
--- a/sbin/rt-fulltext-indexer.in
+++ b/sbin/rt-fulltext-indexer.in
@@ -426,10 +426,9 @@ sub goto_specific {
 
 
 # helper functions
-sub verbose  { print @_, "\n" if $OPT{verbose} || $OPT{verbose}; 1 }
 sub debug    { print @_, "\n" if $OPT{debug}; 1 }
-sub error    { $RT::Logger->error(_(@_)); verbose(@_); 1 }
-sub warning  { $RT::Logger->warn(_(@_)); verbose(@_); 1 }
+sub error    { $RT::Logger->error(_(@_)); 1 }
+sub warning  { $RT::Logger->warn(_(@_)); 1 }
 
 =head1 NAME
 

commit fa5dffcb4e2ce943e7506cd4bc56c608725554a9
Author: Alex Vandiver <alexmv at bestpractical.com>
Date:   Thu May 5 20:59:15 2011 -0400

    Centralize full-text searching documentation into one place

diff --git a/docs/full_text_indexing.pod b/docs/full_text_indexing.pod
new file mode 100644
index 0000000..88d88ef
--- /dev/null
+++ b/docs/full_text_indexing.pod
@@ -0,0 +1,159 @@
+=head1 NAME
+
+Full text indexing in RT
+
+=head1 POSTGRES
+
+=head2 Creating and configuring the index
+
+Postgres 8.3 and above support full-text searching natively; to set up
+the required C<ts_vector> column, and create either a C<GiN> or C<GiST>
+index on it, run:
+
+    sbin/rt-setup-fulltext-index
+
+If you have a non-standard database administrator username or password,
+you may need to pass the C<--dba> or C<--dba-password> options:
+
+    sbin/rt-setup-fulltext-index --dba postgres --dba-password secret
+
+This will also output an appropriate C<%FullTextSearch> configuration to
+add to your F<RT_SiteConfig.pm>; you will need to restart your webserver
+after making these changes.  However, the index will also need to be
+filled before it can be used.  To update the index initially, run:
+
+    sbin/rt-fulltext-indexer --all
+
+This will tokenize and index all existing attachments in your database;
+it may take quite a while if your database already has a large number of
+tickets in it.
+
+=head2 Updating the index
+
+To keep the index up-to-date, you will need to run:
+
+    sbin/rt-fulltext-indexer
+
+...at regular intervals.  By default, this will only tokenize up to 100
+attachments at a time; you can adjust this batch upwards by passing
+C<--limit 500>, for instance.  Larger batch sizes will take longer and
+consume more memory.  Take care that multiple instances of
+C<rt-fulltext-indexer> do not run at the same time, which can happen if
+the interval between runs is too short for the given batch size.
+
+
+=head1 MYSQL
+
+MySQL does not support full-text indexing natively.  However, it does
+integrate with the external Sphinx engine, available from
+L<http://sphinxsearch.com>.  Unfortunately, Sphinx integration (using
+SphinxSE) does require that you recompile MySQL from source.  Most
+distribution-provided packages for MySQL do not include SphinxSE
+integration, merely the external Sphinx tools; these are not sufficient
+for RT's needs.
+
+=head2 Compiling MySQL and SphinxSE
+
+SphinxSE requires MySQL 5.0 or 5.1; later versions of MySQL have not
+been tested at this time.  Sphinx version 2.0.1 has been tested to work,
+but version 0.9.9 may work as well.  Compilation and installation
+instructions for MySQL with SphinxSE can be found at
+L<http://sphinxsearch.com/docs/current.html#sphinxse-installing>.
+
+=head2 Creating and configuring the index
+
+Once MySQL has been recompiled with SphinxSE, and Sphinx itself is
+installed, you may create the required SphinxSE communication table via:
+
+    sbin/rt-setup-fulltext-index
+
+If you have a non-standard database administrator username or password,
+you may need to pass the C<--dba> or C<--dba-password> options:
+
+    sbin/rt-setup-fulltext-index --dba root --dba-password secret
+
+This will also provide you with the appropriate C<%FullTextSearch>
+configuration to add to your F<RT_SiteConfig.pm>; you will need to
+restart your webserver after making these changes.  It will also print a
+sample Sphinx configuration, which should be placed in
+F</etc/sphinx.conf>, or equivalent.
+
+=head2 Updating the index
+
+To fill the index, you will need to run the C<indexer> command-line tool
+provided by Sphinx:
+
+    indexer rt
+
+This command should also be run at regular intervals in order to pick up
+new and updated attachments from RT's database.  Failure to do so will
+result in stale data.
+
+=head2 Caveats
+
+Sphinx only returns a finite number of matches to any query; this number
+is controlled by C<max_matches> in F</etc/sphinx.conf> and
+C<%FullTextSearch>'s C<MaxMatches> in F<RT_SiteConfig.pm>, which must be
+kept in sync.  The default, set during C<rt-setup-fulltext-index>, is
+10000.  This limit may lead to false negatives in search results if the
+maximum number of matches is reached but the results returned do not
+match RT's other criteria.
+
+Take, for example, the instance where C<max_matches> is set to 3,
+and tickets 1, 2, 3, 4, and 5 contain the string "target", but only
+ticket 5 is in status "Open".  A search for C<Content LIKE 'target' AND
+Status = 'Open'> may return no results, despite ticket 5 matching those
+criteria, as Sphinx will only return tickets 1, 2, and 3 as possible
+matches.
+
+
+=head1 ORACLE
+
+=head2 Creating and configuring the index
+
+Oracle supports full-text indexing natively; to configure your Oracle
+database for full-text searching, run:
+
+    sbin/rt-setup-fulltext-index
+
+If you have a non-standard database administrator username or password,
+you may need to pass the C<--dba> or C<--dba-password> options:
+
+    sbin/rt-setup-fulltext-index --dba sysdba --dba-password secret
+
+This will create an Oracle CONTEXT index on the Content column in the
+Attachments table, as well as several preferences, functions and
+triggers to support this index.  The script will also output an
+appropriate C<%FullTextSearch> configuration to add to your
+F<RT_SiteConfig.pm>.
+
+The Oracle index determines which content types it will index, at
+creation time.  By default, unknown content-types are ignored; to
+instead index unknown content-types as binary data, pass the C<--binary>
+flag to C<rt-setup-fulltext-index>.
+
+=head2 Updating the index
+
+To update the index, you will need to run the following at regular
+intervals:
+
+    sbin/rt-fulltext-indexer
+
+This, in effect, simply runs:
+
+    begin
+    ctx_ddl.sync_index('rt_fts_index', '2M');
+    end;
+
+The amount of memory used for the sync can be controlled with the
+C<--memory> option:
+
+    rt-fulltext-indexer --memory 10M
+
+Instead of being run via C<cron>, this may be run via a
+DBMS_JOB; read the B<Managing DML Operations for a CONTEXT Index>
+chapter of Oracle's B<Text Application Developer's Guide> for details on
+how to keep the index optimized, perform garbage collection, and other
+tasks.
+
+=cut
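
The false-negative caveat in the MYSQL section of the new document can be reproduced with toy data. The sketch below is only an illustration in shell and assumes nothing about how RT or Sphinx are implemented: the ticket lists are the ones from the example in the text, and `max_matches` is set to 3 purely to make the effect visible (the documented default is 10000).

```shell
#!/bin/sh
# Toy data from the Caveats example: tickets 1-5 all contain "target",
# but only ticket 5 is open.
content_matches="1 2 3 4 5"
open_tickets="5"
max_matches=3   # deliberately tiny, for illustration only

# Step 1: the full-text engine hands back at most max_matches candidates.
candidates=$(printf '%s\n' $content_matches | head -n "$max_matches")

# Step 2: the remaining criteria are applied only to those candidates.
results=""
for id in $candidates; do
    for open in $open_tickets; do
        if [ "$id" = "$open" ]; then
            results="$results $id"
        fi
    done
done

echo "candidates:" $candidates
echo "results:${results:- (none)}"
```

Ticket 5 never reaches the second step, so the combined search comes back empty even though it satisfies both conditions.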
diff --git a/etc/RT_Config.pm.in b/etc/RT_Config.pm.in
index 37290d8..5fb562e 100755
--- a/etc/RT_Config.pm.in
+++ b/etc/RT_Config.pm.in
@@ -1173,14 +1173,8 @@ Set($DefaultSelfServiceSearchResultFormat, qq{
 Full text search (FTS) without database indexing is a very slow
 operation, and is thus disabled by default.
 
-To enable and configure database support for full-text indexing, run
-F<sbin/rt-setup-fulltext-index> and follow the prompts it presents.
-This script will create the necessary indexes and inform you how to
-set the %FullTextSearch configuration.
-
-You will also need to update the index by running
-F<sbin/rt-fulltext-indexer> to run at regular intervals; how frequently
-depends entirely on your workload.
+Before setting C<Indexed> to 1, read F<docs/full_text_indexing.pod> for
+the full details of FTS on your particular database.
 
 It is possible to enable FTS without database indexing support, simply
 by setting the C<Enable> key to 1, while leaving C<Indexed> set to 0.
@@ -1192,8 +1186,6 @@ cause severe performance problems.
 Set(%FullTextSearch,
     Enable  => 0,
     Indexed => 0,
-#    Table   => 'AttachmentsIndex',
-#    Column  => 'ftsindex',
 );
 
 
diff --git a/sbin/rt-fulltext-indexer.in b/sbin/rt-fulltext-indexer.in
index 5bdba64..19b1592 100644
--- a/sbin/rt-fulltext-indexer.in
+++ b/sbin/rt-fulltext-indexer.in
@@ -436,65 +436,14 @@ rt-fulltext-indexer - Indexer for full text search
 
 =head1 DESCRIPTION
 
-This is a helper script to keep full text indexes in sync with data.  It
-relies on indexes having been properly configured by
-F<sbin/rt-setup-fulltext-index> first.  This script should be run
-occasionally via C<cron>.
+This is a helper script to keep full text indexes in sync with data.
+Read F<docs/full_text_indexing.pod> for complete details on how and when
+to run it.
 
-=head1 ORACLE
+=head1 AUTHOR
 
-=head2 USAGE
-
-    rt-fulltext-indexer --help
-    rt-fulltext-indexer
-    rt-fulltext-indexer --memory 10M
-
-=head2 DESCRIPTION
-
-Basicly, this script simply runs the following query:
-
-    begin
-    ctx_ddl.sync_index('rt_fts_index', '2M');
-    end;
-
-The ammount of memory used for the sync can be controlled with --memory option.
-
-There is way to do all of this by setting up a DBMS_JOB; read the
-"Managing DML Operations for a CONTEXT Index" chapter from Oracle's
-"Text Application Developer's Guide" for more info on the topic, and
-details how to keep the index optimized, perform garbage collection, and
-other tasks.
-
-=head1 PG
-
-=head2 USAGE
-
-    rt-fulltext-indexer --help
-    rt-fulltext-indexer
-    rt-fulltext-indexer --limit 100
-
-=head2 DESCRIPTION
-
-This script finds attachments that should be indexed and stores them in
-the index.  You can use the --limit option to specify how many
-attachments to process at once; defaults to 100.
-
-=head1 MYSQL
-
-=head2 USAGE
-
-    rt-fulltext-indexer --help
-    rt-fulltext-indexer
-    rt-fulltext-indexer --limit 100
-
-=head2 DESCRIPTION
-
-This script finds attachments that should be indexed and prints an
-xmlpipe2 document stream that can processed by sphinx; the output should
-be piped into. For more details, see the sphinx reference manual.
-
-You can use the --limit option to specify how many attachments to
-process at once; defaults to 100.
+Ruslan Zakirov E<lt>ruz at bestpractical.comE<gt>,
+Alex Vandiver E<lt>alexmv at bestpractical.comE<gt>
 
 =cut
 
diff --git a/sbin/rt-setup-fulltext-index.in b/sbin/rt-setup-fulltext-index.in
index adcf0d0..d22099e 100644
--- a/sbin/rt-setup-fulltext-index.in
+++ b/sbin/rt-setup-fulltext-index.in
@@ -651,10 +651,7 @@ sub show_help {
     my $error = shift;
     RT::Interface::CLI->ShowHelp(
         ExitValue => $error,
-        Sections => $error
-            ? 'NAME|'. uc($DB{'type'}) .'/USAGE'
-            : 'NAME|DESCRIPTION|'. uc($DB{'type'})
-        ,
+        Sections => 'NAME|DESCRIPTION',
     );
 }
 
@@ -699,63 +696,31 @@ rt-setup-fulltext-index - Create indexes for full text search
 
 =head1 DESCRIPTION
 
-Full text indexes are very database specific; this script sets up
-indexing for Oracle, Pg and mysql; specifics for each are below.  After
-creating the indexes, it will print a short section that should be
-inserted into your your RT_SiteConfig.pm to enable the index. You will
-need to restart the web-server after making those changes.
+This script creates the appropriate tables, columns, functions, and / or
+views necessary for full-text searching for your database type.  It will
+drop any existing indexes in the process.
 
-=head1 ORACLE
+Please read F<docs/full_text_indexing.pod> for complete documentation on
+full-text indexing for your database type.
 
-=head2 USAGE
+If you have a non-standard database administrator user or password, you
+may use the C<--dba> and C<--dba-password> parameters to set them
+explicitly:
 
-    rt-setup-fulltext-index --help
     rt-setup-fulltext-index --dba sysdba --dba-password 'secret'
 
-=head2 DESCRIPTION
-
-Creates an Oracle CONTEXT index on the Content column in the Attachments
-table.  It also creates several preferences, functions and triggers to
-support this index.
+To test what will happen without running any DDL, pass the C<--dry-run>
+flag.
 
 The Oracle index determines which content-types it will index, at
 creation time.  By default, unknown content-types are ignored; to
 instead index unknown content-types as binary data, pass the C<--binary>
 flag when the index is created.
 
-CONTEXT indexes needperiodic synchronization after any updates; either
-use F<sbin/rt-fulltext-indexer> via cron, or read its documentation for
-alternatives.
-
-=head1 PG
-
-=head2 USAGE
-
-    rt-setup-fulltext-index --help
-    rt-setup-fulltext-index --dba postgres --dba-password 'secret'
-
-=head2 DESCRIPTION
-
-Creates an additional column to store a ts_vector, and then creates
-either a GiN or GiST index on it.  Use F<sbin/rt-fulltext-indexer> via
-cron to keep the index in sync.
-
-=head1 MYSQL
-
-=head2 USAGE
-
-    rt-setup-fulltext-index --help
-    rt-setup-fulltext-index --dba root
-
-=head2 DESCRIPTION
-
-Full text search in mysql is implemented through the Sphinx storage
-engine (SphinxSE), which your mysql must be compiled with support
-for. Use F<sbin/rt-fulltext-indexer> via cron to keep the index in sync.
-
 =head1 AUTHOR
 
-Ruslan Zakirov E<lt>ruz at bestpractical.comE<gt>
+Ruslan Zakirov E<lt>ruz at bestpractical.comE<gt>,
+Alex Vandiver E<lt>alexmv at bestpractical.comE<gt>
 
 =cut
 

-----------------------------------------------------------------------
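
The new documentation warns against overlapping runs of rt-fulltext-indexer when it is scheduled too frequently. One common way to guard a cron job against that is a lock directory, sketched below. This is an assumption-laden sketch rather than anything shipped with RT: the `run_locked` helper, the lock path, the install path, and the exit code 75 are all hypothetical choices.

```shell
#!/bin/sh
# run_locked LOCKDIR COMMAND [ARGS...]
# Runs COMMAND only if LOCKDIR can be created. mkdir either creates the
# directory or fails, so two overlapping invocations cannot both proceed.
run_locked() {
    lockdir=$1
    shift
    if ! mkdir "$lockdir" 2>/dev/null; then
        echo "previous run still active, skipping" >&2
        return 75   # hypothetical "try again later" status
    fi
    "$@"
    status=$?
    rmdir "$lockdir"
    return "$status"
}

# Example use from cron (both paths are assumptions):
# run_locked /tmp/rt-fulltext-indexer.lock /opt/rt4/sbin/rt-fulltext-indexer --limit 500
```

If the wrapped command is killed before `rmdir` runs, the lock directory is left behind and has to be removed by hand; a production wrapper would want to handle that case.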

