[Rt-commit] rt branch, 4.0/pg-fts-invalid-character, created. rt-4.0.5-62-g81df7e2
Alex Vandiver
alexmv at bestpractical.com
Wed Feb 15 15:08:43 EST 2012
The branch, 4.0/pg-fts-invalid-character has been created
at 81df7e2d07c35834b670e0e41adf677cd15affb5 (commit)
- Log -----------------------------------------------------------------
commit 12b0fded547c53c79db4f5a2e2f049b5f397d387
Author: Alex Vandiver <alexmv at bestpractical.com>
Date: Wed Feb 15 15:01:05 2012 -0500
With the Pg FTS, catch and skip attachments which contain invalid UTF8 bytes
diff --git a/sbin/rt-fulltext-indexer.in b/sbin/rt-fulltext-indexer.in
index 7e31cac..652fde0 100644
--- a/sbin/rt-fulltext-indexer.in
+++ b/sbin/rt-fulltext-indexer.in
@@ -371,6 +371,8 @@ sub process_pg {
unless ( $status ) {
if ($dbh->errstr =~ /string is too long for tsvector/) {
warn "Attachment @{[$attachment->id]} not indexed, as it contains too many unique words to be indexed";
+ } elsif ($dbh->errstr =~ /invalid byte sequence/) {
+ warn "Attachment @{[$attachment->id]} cannot be indexed, as it contains invalid UTF8 bytes";
} else {
die "error: ". $dbh->errstr;
}
commit 19721b8012776f5ae523e27f07b6dac06ad1dded
Author: Alex Vandiver <alexmv at bestpractical.com>
Date: Wed Feb 15 15:03:38 2012 -0500
Strengthen wording about our ability (or lack thereof) to FTS index on Pg
diff --git a/sbin/rt-fulltext-indexer.in b/sbin/rt-fulltext-indexer.in
index 652fde0..d978586 100644
--- a/sbin/rt-fulltext-indexer.in
+++ b/sbin/rt-fulltext-indexer.in
@@ -370,7 +370,7 @@ sub process_pg {
my $status = eval { $dbh->do( $query, undef, $$text, $attachment->id ) };
unless ( $status ) {
if ($dbh->errstr =~ /string is too long for tsvector/) {
- warn "Attachment @{[$attachment->id]} not indexed, as it contains too many unique words to be indexed";
+ warn "Attachment @{[$attachment->id]} cannot be indexed, as it contains too many unique words";
} elsif ($dbh->errstr =~ /invalid byte sequence/) {
warn "Attachment @{[$attachment->id]} cannot be indexed, as it contains invalid UTF8 bytes";
} else {
commit 81df7e2d07c35834b670e0e41adf677cd15affb5
Author: Alex Vandiver <alexmv at bestpractical.com>
Date: Wed Feb 15 15:03:45 2012 -0500
If we fail to index on Pg, ensure that we continue indexing past that point
Previously, failure to index (because of invalid bytes, or too-long
content) left the content index NULL. As our check for where to resume
indexing is based on rows where the index IS NOT NULL, this could lead
to a pessimal condition where a large number of failures to index in a
row would prevent forward progress of the indexer.
diff --git a/sbin/rt-fulltext-indexer.in b/sbin/rt-fulltext-indexer.in
index d978586..407afe0 100644
--- a/sbin/rt-fulltext-indexer.in
+++ b/sbin/rt-fulltext-indexer.in
@@ -376,6 +376,11 @@ sub process_pg {
} else {
die "error: ". $dbh->errstr;
}
+
+ # Insert an empty tsvector, so we count this row as "indexed"
+ # for purposes of knowing where to pick up
+ eval { $dbh->do( $query, undef, "", $attachment->id ) }
+ or die "Failed to insert empty tsvector: " . $dbh->errstr;
}
}
-----------------------------------------------------------------------
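The message of the last commit (81df7e2) describes how rows left NULL after a failed index attempt can stall the indexer. A minimal sketch of that reasoning follows, using an in-memory hash instead of the Attachments table. It assumes the resume point is the highest id whose index IS NOT NULL and that each run handles a fixed-size batch; the names make_table, indexable, last_indexed and run_once are hypothetical and do not appear in rt-fulltext-indexer.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the Attachments table: id => index value.
# undef plays the role of SQL NULL.
sub make_table { return { map { $_ => undef } 1 .. 6 } }

# Hypothetical failure rule: attachments 1..3 can never be indexed
# (think invalid UTF8 bytes or too many unique words).
sub indexable { my ($id) = @_; return $id > 3 }

# Assumed resume check: highest id whose index IS NOT NULL, or 0 if none.
sub last_indexed {
    my ($table) = @_;
    my $max = 0;
    for my $id ( keys %$table ) {
        $max = $id if defined $table->{$id} && $id > $max;
    }
    return $max;
}

# One indexer run over at most $limit rows past the resume point.
# With $mark_failures set, a failed row gets an empty value, mirroring
# the empty tsvector inserted by the commit.
sub run_once {
    my ( $table, $limit, $mark_failures ) = @_;
    my $start = last_indexed($table);
    my @batch = grep { $_ > $start } sort { $a <=> $b } keys %$table;
    splice @batch, $limit if @batch > $limit;
    for my $id (@batch) {
        if ( indexable($id) ) {
            $table->{$id} = "tsvector for $id";
        } elsif ($mark_failures) {
            $table->{$id} = "";    # counts as "indexed" when resuming
        }
    }
    return last_indexed($table);
}

# Old behaviour: a full batch of failures never moves the resume point.
my $old = make_table();
run_once( $old, 3, 0 ) for 1 .. 5;
print "without marker: resume point ", last_indexed($old), "\n";

# New behaviour: failed rows are marked, so the next run moves on.
my $new = make_table();
run_once( $new, 3, 1 ) for 1 .. 2;
print "with marker: resume point ", last_indexed($new), "\n";
```

With a batch size of 3 and three consecutive unindexable rows, the unmarked table stays at resume point 0 however many runs are made, while the marked table reaches 6 after two runs.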