[Rt-commit] rt branch, 4.0/mysql-fts, updated. rt-4.0.0-218-gfa5dffc
Alex Vandiver
alexmv at bestpractical.com
Thu May 5 21:03:01 EDT 2011
The branch, 4.0/mysql-fts has been updated
via fa5dffcb4e2ce943e7506cd4bc56c608725554a9 (commit)
via 1b66c9baa565369eb1bfb5c13d0e2d3a26407949 (commit)
via d16cafcd37bce065739c57fe641c5030fed106cf (commit)
via 1d665c448c8eb8625c27f8da8bb1a71103d712f2 (commit)
via 6657efc19964327dfa3a586732de60da55dbfc90 (commit)
from a3fc4ec3c8992098473ec5ad4a6cd3ad6f35a4ad (commit)
Summary of changes:
docs/full_text_indexing.pod | 159 +++++++++++++++++++++++++++++++++++++++
etc/RT_Config.pm.in | 26 +------
sbin/rt-fulltext-indexer.in | 91 +++++++---------------
sbin/rt-setup-fulltext-index.in | 67 +++++------------
4 files changed, 208 insertions(+), 135 deletions(-)
create mode 100644 docs/full_text_indexing.pod
- Log -----------------------------------------------------------------
commit 6657efc19964327dfa3a586732de60da55dbfc90
Author: Alex Vandiver <alexmv at bestpractical.com>
Date: Thu May 5 20:47:54 2011 -0400
Move the DontSearchBinaryAttachments config to a flag on index create
As the option only affected Oracle indexing, and only during index
creation, move it to a flag on index creation instead of a runtime
configuration.
diff --git a/etc/RT_Config.pm.in b/etc/RT_Config.pm.in
index b152325..37290d8 100755
--- a/etc/RT_Config.pm.in
+++ b/etc/RT_Config.pm.in
@@ -1208,20 +1208,6 @@ If C<$DontSearchFileAttachments> is set to 1, then uploaded files
Set($DontSearchFileAttachments, undef);
-=item C<$DontSearchBinaryAttachments>
-
-This refers to F<sbin/rt-setup-fulltext-index>.
-
-By default, text attachments are always indexed, and a known set of
-binary attachments are always skipped.
-
-If C<$DontSearchBinaryAttachments> is set to 1, then unrecognized binary
-data types will not be indexed.
-
-=cut
-
-Set($DontSearchBinaryAttachments, undef);
-
=item C<$OnlySearchActiveTicketsInSimpleSearch>
When query in simple search doesn't have status info, use this to only
diff --git a/sbin/rt-setup-fulltext-index.in b/sbin/rt-setup-fulltext-index.in
index 3c0edbf..adcf0d0 100644
--- a/sbin/rt-setup-fulltext-index.in
+++ b/sbin/rt-setup-fulltext-index.in
@@ -117,6 +117,7 @@ GetOptions(
'h|help!' => \$OPT{'help'},
'ask!' => \$OPT{'ask'},
'dry-run!' => \$OPT{'dryrun'},
+ 'binary!' => \$OPT{'binary'},
'dba=s' => \$DB{'admin'},
'dba-password=s' => \$DB{'admin_password'},
@@ -503,7 +504,7 @@ sub ora_create_format_column {
WHEN fname IS NOT NULL THEN 'ignore'
};
}
- my $binary = $RT::DontSearchBinaryAttachments? 'ignore' : 'binary';
+ my $binary = $OPT{'binary'} ? 'binary' : 'ignore';
$detect_format .= qq{
WHEN type = 'text' THEN 'text'
WHEN type = 'text/rtf' THEN '$binary'
@@ -717,6 +718,11 @@ Creates an Oracle CONTEXT index on the Content column in the Attachments
table. It also creates several preferences, functions and triggers to
support this index.
+The Oracle index determines which content-types it will index, at
+creation time. By default, unknown content-types are ignored; to
+instead index unknown content-types as binary data, pass the C<--binary>
+flag when the index is created.
+
CONTEXT indexes needperiodic synchronization after any updates; either
use F<sbin/rt-fulltext-indexer> via cron, or read its documentation for
alternatives.
commit 1d665c448c8eb8625c27f8da8bb1a71103d712f2
Author: Alex Vandiver <alexmv at bestpractical.com>
Date: Thu May 5 20:49:52 2011 -0400
Add a --all flag to iterate and index all new txns
diff --git a/sbin/rt-fulltext-indexer.in b/sbin/rt-fulltext-indexer.in
index 758b14a..7b2dcab 100644
--- a/sbin/rt-fulltext-indexer.in
+++ b/sbin/rt-fulltext-indexer.in
@@ -92,15 +92,17 @@ if ( $db_type eq 'Pg' ) {
%OPT = (
%OPT,
limit => 0,
+ all => 0,
);
- push @OPT_LIST, 'limit=i';
+ push @OPT_LIST, 'limit=i', 'all!';
}
elsif ( $db_type eq 'mysql' ) {
%OPT = (
%OPT,
limit => 0,
+ all => 0,
);
- push @OPT_LIST, 'limit=i';
+ push @OPT_LIST, 'limit=i', 'all!';
}
elsif ( $db_type eq 'Oracle' ) {
%OPT = (
@@ -151,6 +153,7 @@ if ( $db_type eq 'Oracle' ) {
my @types = qw(text html);
foreach my $type ( @types ) {
+ REDO:
my $attachments = attachments($type);
$attachments->Limit(
FIELD => 'id',
@@ -171,6 +174,7 @@ foreach my $type ( @types ) {
}
finalize( $type, $attachments ) if $found;
clean( $type );
+ goto REDO if $OPT{'all'} and $attachments->Count == ($OPT{'limit'} || 100);
}
sub attachments {
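Note on the REDO loop above: with --all, the indexer keeps fetching batches for as long as the previous batch came back full. The stopping rule can be sketched in shell as below. This is only a toy model: process_batch and the PENDING counter are hypothetical stand-ins for one indexer pass and for the real attachment query, and neither name comes from RT.

```shell
#!/bin/sh
# Toy model of the "repeat while the last batch was full" rule behind --all.
# PENDING is a made-up count of items still waiting to be indexed.
PENDING=${PENDING:-250}

# Hypothetical stand-in for one indexer pass: handles up to $1 items and
# records how many it handled in HANDLED.
process_batch() {
    batch_limit=$1
    if [ "$PENDING" -lt "$batch_limit" ]; then
        HANDLED=$PENDING
    else
        HANDLED=$batch_limit
    fi
    PENDING=$((PENDING - HANDLED))
}

# Runs passes until one comes back short, then prints the number of passes.
index_all() {
    limit=${1:-0}
    [ "$limit" -gt 0 ] || limit=100    # mirrors ($OPT{'limit'} || 100)
    passes=0
    while :; do
        process_batch "$limit"
        passes=$((passes + 1))
        # A short batch means nothing is left; a full batch means "maybe more".
        [ "$HANDLED" -eq "$limit" ] || break
    done
    echo "$passes"
}

# Example: PENDING=250; index_all 100
```

With 250 pending items and a limit of 100, the sketch makes three passes (100, 100, 50). With exactly 200 pending it also makes three, because a full second batch cannot tell the loop that nothing is left, so the final pass simply finds no work.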
commit d16cafcd37bce065739c57fe641c5030fed106cf
Author: Alex Vandiver <alexmv at bestpractical.com>
Date: Thu May 5 20:50:26 2011 -0400
Make direct sql via `indexer` the approved way of updating the Sphinx index
The (untested, unfinished) xmlpipe2 output is left as a starting point
for future improvement, but protected by an --xmlpipe2 flag.
diff --git a/sbin/rt-fulltext-indexer.in b/sbin/rt-fulltext-indexer.in
index 7b2dcab..59b3b21 100644
--- a/sbin/rt-fulltext-indexer.in
+++ b/sbin/rt-fulltext-indexer.in
@@ -99,10 +99,11 @@ if ( $db_type eq 'Pg' ) {
elsif ( $db_type eq 'mysql' ) {
%OPT = (
%OPT,
- limit => 0,
- all => 0,
+ limit => 0,
+ all => 0,
+ xmlpipe2 => 0,
);
- push @OPT_LIST, 'limit=i', 'all!';
+ push @OPT_LIST, 'limit=i', 'all!', 'xmlpipe2!';
}
elsif ( $db_type eq 'Oracle' ) {
%OPT = (
@@ -149,6 +150,18 @@ if ( $db_type eq 'Oracle' ) {
$index, $OPT{'memory'}
);
exit;
+} elsif ( $db_type eq 'mysql' ) {
+ unless ($OPT{'xmlpipe2'}) {
+ print STDERR <<EOT;
+
+Updates to the external Sphinx index are done by running the Sphinx
+`indexer` tool:
+
+ indexer rt
+
+EOT
+ exit 1;
+ }
}
my @types = qw(text html);
commit 1b66c9baa565369eb1bfb5c13d0e2d3a26407949
Author: Alex Vandiver <alexmv at bestpractical.com>
Date: Thu May 5 20:56:01 2011 -0400
Remove unreachable "verbose" code
diff --git a/sbin/rt-fulltext-indexer.in b/sbin/rt-fulltext-indexer.in
index 59b3b21..5bdba64 100644
--- a/sbin/rt-fulltext-indexer.in
+++ b/sbin/rt-fulltext-indexer.in
@@ -426,10 +426,9 @@ sub goto_specific {
# helper functions
-sub verbose { print @_, "\n" if $OPT{verbose} || $OPT{verbose}; 1 }
sub debug { print @_, "\n" if $OPT{debug}; 1 }
-sub error { $RT::Logger->error(_(@_)); verbose(@_); 1 }
-sub warning { $RT::Logger->warn(_(@_)); verbose(@_); 1 }
+sub error { $RT::Logger->error(_(@_)); 1 }
+sub warning { $RT::Logger->warn(_(@_)); 1 }
=head1 NAME
commit fa5dffcb4e2ce943e7506cd4bc56c608725554a9
Author: Alex Vandiver <alexmv at bestpractical.com>
Date: Thu May 5 20:59:15 2011 -0400
Centralize full-text searching documentation into one place
diff --git a/docs/full_text_indexing.pod b/docs/full_text_indexing.pod
new file mode 100644
index 0000000..88d88ef
--- /dev/null
+++ b/docs/full_text_indexing.pod
@@ -0,0 +1,159 @@
+=head1 NAME
+
+Full text indexing in RT
+
+=head1 POSTGRES
+
+=head2 Creating and configuring the index
+
+Postgres 8.3 and above support full-text searching natively; to set up
+the required C<ts_vector> column, and create either a C<GiN> or C<GiST>
+index on it, run:
+
+ sbin/rt-setup-fulltext-index
+
+If you have a non-standard database administrator username or password,
+you may need to pass the C<--dba> or C<--dba-password> options:
+
+ sbin/rt-setup-fulltext-index --dba postgres --dba-password secret
+
+This will also output an appropriate C<%FullTextSearch> configuration to
+add to your F<RT_SiteConfig.pm>; you will need to restart your webserver
+after making these changes. However, the index will also need to be
+filled before it can be used. To update the index initially, run:
+
+ sbin/rt-fulltext-indexer --all
+
+This will tokenize and index all existing attachments in your database;
+it may take quite a while if your database already has a large number of
+tickets in it.
+
+=head2 Updating the index
+
+To keep the index up-to-date, you will need to run:
+
+ sbin/rt-fulltext-indexer
+
+...at regular intervals. By default, this will only tokenize up to 100
+attachments at a time; you can adjust this batch size upwards by passing
+C<--limit 500>, for instance. Larger batch sizes will take longer and
+consume more memory. If the interval between runs is too short for the
+chosen batch size, runs can overlap; take care that multiple instances
+of C<rt-fulltext-indexer> never run at the same time.
+
+
+=head1 MYSQL
+
+MySQL does not support full-text indexing natively. However, it does
+integrate with the external Sphinx engine, available from
+L<http://sphinxsearch.com>. Unfortunately, Sphinx integration (using
+SphinxSE) does require that you recompile MySQL from source. Most
+distribution-provided packages for MySQL do not include SphinxSE
+integration, merely the external Sphinx tools; these are not sufficient
+for RT's needs.
+
+=head2 Compiling MySQL and SphinxSE
+
+SphinxSE requires MySQL 5.0 or 5.1; later versions of MySQL have not
+been tested at this time. Sphinx version 2.0.1 has been tested to work,
+but version 0.9.9 may work as well. Compilation and installation
+instructions for MySQL with SphinxSE can be found at
+L<http://sphinxsearch.com/docs/current.html#sphinxse-installing>.
+
+=head2 Creating and configuring the index
+
+Once MySQL has been recompiled with SphinxSE, and Sphinx itself is
+installed, you may create the required SphinxSE communication table via:
+
+ sbin/rt-setup-fulltext-index
+
+If you have a non-standard database administrator username or password,
+you may need to pass the C<--dba> or C<--dba-password> options:
+
+ sbin/rt-setup-fulltext-index --dba root --dba-password secret
+
+This will also provide you with the appropriate C<%FullTextSearch>
+configuration to add to your F<RT_SiteConfig.pm>; you will need to
+restart your webserver after making these changes. It will also print a
+sample Sphinx configuration, which should be placed in
+F</etc/sphinx.conf>, or equivalent.
+
+=head2 Updating the index
+
+To fill the index, you will need to run the C<indexer> command-line tool
+provided by Sphinx:
+
+ indexer rt
+
+This command should also be run at regular intervals in order to pick up
+new and updated attachments from RT's database. Failure to do so will
+result in stale data.
+
+=head2 Caveats
+
+Sphinx only returns a finite number of matches to any query; this number
+is controlled by C<max_matches> in F</etc/sphinx.conf> and
+C<%FullTextSearch>'s C<MaxMatches> in F<RT_SiteConfig.pm>, which must be
+kept in sync. The default, set during C<rt-setup-fulltext-index>, is
+10000. This limit may lead to false negatives in search results if the
+maximum number of matches is reached but the results returned do not
+match RT's other criteria.
+
+Take, for example, the case where C<max_matches> is set to three,
+and tickets 1, 2, 3, 4, and 5 contain the string "target", but only
+ticket 5 is in status "Open". A search for C<Content LIKE 'target' AND
+Status = 'Open'> may return no results, despite ticket 5 matching those
+criteria, as Sphinx will only return tickets 1, 2, and 3 as possible
+matches.
+
+
+=head1 ORACLE
+
+=head2 Creating and configuring the index
+
+Oracle supports full-text indexing natively; to configure your Oracle
+database for full-text searching, run:
+
+ sbin/rt-setup-fulltext-index
+
+If you have a non-standard database administrator username or password,
+you may need to pass the C<--dba> or C<--dba-password> options:
+
+ sbin/rt-setup-fulltext-index --dba sysdba --dba-password secret
+
+This will create an Oracle CONTEXT index on the Content column in the
+Attachments table, as well as several preferences, functions and
+triggers to support this index. The script will also output an
+appropriate C<%FullTextSearch> configuration to add to your
+F<RT_SiteConfig.pm>.
+
+The Oracle index determines which content types it will index, at
+creation time. By default, unknown content-types are ignored; to
+instead index unknown content-types as binary data, pass the C<--binary>
+flag to C<rt-setup-fulltext-index>.
+
+=head2 Updating the index
+
+To update the index, you will need to run the following at regular
+intervals:
+
+ sbin/rt-fulltext-indexer
+
+This, in effect, simply runs:
+
+ begin
+ ctx_ddl.sync_index('rt_fts_index', '2M');
+ end;
+
+The amount of memory used for the sync can be controlled with the
+C<--memory> option:
+
+ rt-fulltext-indexer --memory 10M
+
+Rather than being run via C<cron>, this may instead be run via a
+DBMS_JOB; read the B<Managing DML Operations for a CONTEXT Index>
+chapter of Oracle's B<Text Application Developer's Guide> for details
+on how to keep the index optimized, perform garbage collection, and
+carry out other tasks.
+
+=cut
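Note on the MYSQL "Caveats" section above: the false negative it describes can be reproduced with a toy model that involves neither Sphinx nor RT. The ticket numbers and the "Open" status come from the documentation's own example; the search function and its variable names are illustrative assumptions only.

```shell
#!/bin/sh
# Toy model of the max_matches caveat; no Sphinx or RT is involved.
content_matches="1 2 3 4 5"   # tickets whose content contains "target"
open_tickets="5"              # tickets in status "Open"

# search MAX: take at most MAX content matches (what Sphinx hands back),
# then apply the remaining criterion (Status = 'Open') to those only.
search() {
    max_matches=$1
    seen=0
    result=""
    for id in $content_matches; do
        # Candidates beyond max_matches are never seen by the status filter.
        [ "$seen" -lt "$max_matches" ] || break
        seen=$((seen + 1))
        for open in $open_tickets; do
            if [ "$id" = "$open" ]; then
                result="$result $id"
            fi
        done
    done
    echo "tickets:$result"
}

# Example: search 3; search 10000
```

With a cap of 3 the model finds no tickets, even though ticket 5 satisfies both conditions; with the default cap of 10000 it finds ticket 5. Raising the cap only moves the threshold, which is why the setting has to match on both the Sphinx and the RT side.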
diff --git a/etc/RT_Config.pm.in b/etc/RT_Config.pm.in
index 37290d8..5fb562e 100755
--- a/etc/RT_Config.pm.in
+++ b/etc/RT_Config.pm.in
@@ -1173,14 +1173,8 @@ Set($DefaultSelfServiceSearchResultFormat, qq{
Full text search (FTS) without database indexing is a very slow
operation, and is thus disabled by default.
-To enable and configure database support for full-text indexing, run
-F<sbin/rt-setup-fulltext-index> and follow the prompts it presents.
-This script will create the necessary indexes and inform you how to
-set the %FullTextSearch configuration.
-
-You will also need to update the index by running
-F<sbin/rt-fulltext-indexer> to run at regular intervals; how frequently
-depends entirely on your workload.
+Before setting C<Indexed> to 1, read F<docs/full_text_indexing.pod> for
+the full details of FTS on your particular database.
It is possible to enable FTS without database indexing support, simply
by setting the C<Enable> key to 1, while leaving C<Indexed> set to 0.
@@ -1192,8 +1186,6 @@ cause severe performance problems.
Set(%FullTextSearch,
Enable => 0,
Indexed => 0,
-# Table => 'AttachmentsIndex',
-# Column => 'ftsindex',
);
diff --git a/sbin/rt-fulltext-indexer.in b/sbin/rt-fulltext-indexer.in
index 5bdba64..19b1592 100644
--- a/sbin/rt-fulltext-indexer.in
+++ b/sbin/rt-fulltext-indexer.in
@@ -436,65 +436,14 @@ rt-fulltext-indexer - Indexer for full text search
=head1 DESCRIPTION
-This is a helper script to keep full text indexes in sync with data. It
-relies on indexes having been properly configured by
-F<sbin/rt-setup-fulltext-index> first. This script should be run
-occasionally via C<cron>.
+This is a helper script to keep full text indexes in sync with data.
+Read F<docs/full_text_indexing.pod> for complete details on how and when
+to run it.
-=head1 ORACLE
+=head1 AUTHOR
-=head2 USAGE
-
- rt-fulltext-indexer --help
- rt-fulltext-indexer
- rt-fulltext-indexer --memory 10M
-
-=head2 DESCRIPTION
-
-Basicly, this script simply runs the following query:
-
- begin
- ctx_ddl.sync_index('rt_fts_index', '2M');
- end;
-
-The ammount of memory used for the sync can be controlled with --memory option.
-
-There is way to do all of this by setting up a DBMS_JOB; read the
-"Managing DML Operations for a CONTEXT Index" chapter from Oracle's
-"Text Application Developer's Guide" for more info on the topic, and
-details how to keep the index optimized, perform garbage collection, and
-other tasks.
-
-=head1 PG
-
-=head2 USAGE
-
- rt-fulltext-indexer --help
- rt-fulltext-indexer
- rt-fulltext-indexer --limit 100
-
-=head2 DESCRIPTION
-
-This script finds attachments that should be indexed and stores them in
-the index. You can use the --limit option to specify how many
-attachments to process at once; defaults to 100.
-
-=head1 MYSQL
-
-=head2 USAGE
-
- rt-fulltext-indexer --help
- rt-fulltext-indexer
- rt-fulltext-indexer --limit 100
-
-=head2 DESCRIPTION
-
-This script finds attachments that should be indexed and prints an
-xmlpipe2 document stream that can processed by sphinx; the output should
-be piped into. For more details, see the sphinx reference manual.
-
-You can use the --limit option to specify how many attachments to
-process at once; defaults to 100.
+Ruslan Zakirov E<lt>ruz at bestpractical.comE<gt>,
+Alex Vandiver E<lt>alexmv at bestpractical.comE<gt>
=cut
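Note on running the indexer from cron: the new documentation warns that multiple instances of rt-fulltext-indexer must not run at the same time. One way to guard against that is a small lock wrapper, sketched below. This is a minimal sketch and not part of RT: the LOCKDIR path, the INDEXER path, and the run_locked helper are all assumptions to adapt to your installation, and a crashed run would leave a stale lock directory that has to be removed by hand.

```shell
#!/bin/sh
# Sketch of a cron wrapper that refuses to start a second indexer run.
# LOCKDIR and INDEXER are illustrative defaults, not RT settings.
LOCKDIR=${LOCKDIR:-/tmp/rt-fulltext-indexer.lock}
INDEXER=${INDEXER:-sbin/rt-fulltext-indexer}

# run_locked COMMAND [ARGS...]: run COMMAND only if no other run holds the lock.
run_locked() {
    # mkdir is atomic: only one caller can create the directory.
    if ! mkdir "$LOCKDIR" 2>/dev/null; then
        echo "indexer already running, skipping this run" >&2
        return 75    # EX_TEMPFAIL from sysexits.h: try again later
    fi
    "$@"
    status=$?
    rmdir "$LOCKDIR"
    return "$status"
}

# Typical use from cron: run_locked "$INDEXER" --limit 500
```

The wrapper passes the command's own exit status through, so cron still reports a failing indexer run.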
diff --git a/sbin/rt-setup-fulltext-index.in b/sbin/rt-setup-fulltext-index.in
index adcf0d0..d22099e 100644
--- a/sbin/rt-setup-fulltext-index.in
+++ b/sbin/rt-setup-fulltext-index.in
@@ -651,10 +651,7 @@ sub show_help {
my $error = shift;
RT::Interface::CLI->ShowHelp(
ExitValue => $error,
- Sections => $error
- ? 'NAME|'. uc($DB{'type'}) .'/USAGE'
- : 'NAME|DESCRIPTION|'. uc($DB{'type'})
- ,
+ Sections => 'NAME|DESCRIPTION',
);
}
@@ -699,63 +696,31 @@ rt-setup-fulltext-index - Create indexes for full text search
=head1 DESCRIPTION
-Full text indexes are very database specific; this script sets up
-indexing for Oracle, Pg and mysql; specifics for each are below. After
-creating the indexes, it will print a short section that should be
-inserted into your your RT_SiteConfig.pm to enable the index. You will
-need to restart the web-server after making those changes.
+This script creates the appropriate tables, columns, functions, and/or
+views necessary for full-text searching for your database type. It will
+drop any existing indexes in the process.
-=head1 ORACLE
+Please read F<docs/full_text_indexing.pod> for complete documentation on
+full-text indexing for your database type.
-=head2 USAGE
+If you have a non-standard database administrator user or password, you
+may use the C<--dba> and C<--dba-password> parameters to set them
+explicitly:
- rt-setup-fulltext-index --help
rt-setup-fulltext-index --dba sysdba --dba-password 'secret'
-=head2 DESCRIPTION
-
-Creates an Oracle CONTEXT index on the Content column in the Attachments
-table. It also creates several preferences, functions and triggers to
-support this index.
+To test what will happen without running any DDL, pass the C<--dry-run>
+flag.
The Oracle index determines which content-types it will index, at
creation time. By default, unknown content-types are ignored; to
instead index unknown content-types as binary data, pass the C<--binary>
flag when the index is created.
-CONTEXT indexes needperiodic synchronization after any updates; either
-use F<sbin/rt-fulltext-indexer> via cron, or read its documentation for
-alternatives.
-
-=head1 PG
-
-=head2 USAGE
-
- rt-setup-fulltext-index --help
- rt-setup-fulltext-index --dba postgres --dba-password 'secret'
-
-=head2 DESCRIPTION
-
-Creates an additional column to store a ts_vector, and then creates
-either a GiN or GiST index on it. Use F<sbin/rt-fulltext-indexer> via
-cron to keep the index in sync.
-
-=head1 MYSQL
-
-=head2 USAGE
-
- rt-setup-fulltext-index --help
- rt-setup-fulltext-index --dba root
-
-=head2 DESCRIPTION
-
-Full text search in mysql is implemented through the Sphinx storage
-engine (SphinxSE), which your mysql must be compiled with support
-for. Use F<sbin/rt-fulltext-indexer> via cron to keep the index in sync.
-
=head1 AUTHOR
-Ruslan Zakirov E<lt>ruz at bestpractical.comE<gt>
+Ruslan Zakirov E<lt>ruz at bestpractical.comE<gt>,
+Alex Vandiver E<lt>alexmv at bestpractical.comE<gt>
=cut
-----------------------------------------------------------------------