[Rt-commit] rt branch, 4.0/decode-entities-before-indexing, created. rt-4.0.22-21-g9ced882
Kevin Falcone
falcone at bestpractical.com
Mon Dec 15 14:54:48 EST 2014
The branch, 4.0/decode-entities-before-indexing has been created
at 9ced88246f8ff368e0877c454fb70b81b654d673 (commit)
- Log -----------------------------------------------------------------
commit 9ced88246f8ff368e0877c454fb70b81b654d673
Author: Kevin Falcone <falcone at bestpractical.com>
Date: Thu Nov 20 16:58:22 2014 -0500
Our rich text editor produces entities for indexed characters
Postgres is completely capable of searching for cherché but the rich
text editor produced cherch&eacute; and so we handed that to Pg and it
stored 'cherch' which won't match later.
This causes us to pass the actual word to Pg to be tokenized and stored
improving the odds that a later search will find it.
Since RT generates text/plain outgoing mail parts and those parts
generally contained the decoded word, this may have been hiding this
failure (the search would find the outgoing mail message rather than
the reply itself, but would still match the ticket).
diff --git a/sbin/rt-fulltext-indexer.in b/sbin/rt-fulltext-indexer.in
index b90d8da..1045a92 100644
--- a/sbin/rt-fulltext-indexer.in
+++ b/sbin/rt-fulltext-indexer.in
@@ -435,7 +435,10 @@ sub extract_html {
     my $attachment = shift;
     my $text = $attachment->Content;
     return undef unless defined $text && length($text);
-# TODO: html -> text
+# the rich text editor generates html entities for characters
+# but Pg doesn't index them, so decode to something it can index.
+    require HTML::Entities;
+    HTML::Entities::decode_entities($text);
     return \$text;
 }
-----------------------------------------------------------------------
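For illustration only (not part of the commit): a minimal sketch of the effect of the decode step, assuming HTML::Entities is installed. The helper name decode_for_index and the sample string are hypothetical and do not appear in RT. As in the patch, only entities are decoded; any HTML tags are left in place.

```perl
use strict;
use warnings;
use HTML::Entities ();

# decode_for_index is a hypothetical helper for this sketch, not an RT function.
sub decode_for_index {
    my $text = shift;
    return undef unless defined $text && length($text);
    # Called in void context, decode_entities() rewrites its argument in place.
    HTML::Entities::decode_entities($text);
    return $text;
}

# Made-up sample of what a rich text editor might emit.
my $decoded = decode_for_index('<p>cherch&eacute;</p>');

# The entity becomes the single character U+00E9; the tags are untouched.
print $decoded eq "<p>cherch\x{e9}</p>" ? "decoded\n" : "unchanged\n";
```

With the entity decoded, the indexer hands the whole word to Pg rather than a fragment that stops at the ampersand.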