[Rt-commit] rt branch, 4.0/fulltext-search, updated. rt-4.0.0-236-gd99fb38

Mon May 16 19:29:15 EDT 2011

The branch, 4.0/fulltext-search has been updated
       via  d99fb385b2b46191d749b48be83f21c7cbf2fd7f (commit)
      from  3f5c3b21406eec4ac9f468edff351521801ca1b3 (commit)

Summary of changes:
 docs/full_text_indexing.pod |   11 +++++++++++
 1 files changed, 11 insertions(+), 0 deletions(-)

- Log -----------------------------------------------------------------
commit d99fb385b2b46191d749b48be83f21c7cbf2fd7f
Author: Alex Vandiver <alexmv at bestpractical.com>
Date:   Mon May 16 19:28:53 2011 -0400

    Note unicode limitations of full-text searching solutions

diff --git a/docs/full_text_indexing.pod b/docs/full_text_indexing.pod
index 88d88ef..9fad5a5 100644
--- a/docs/full_text_indexing.pod
+++ b/docs/full_text_indexing.pod
@@ -2,6 +2,17 @@
 
 Full text indexing in RT
 
+=head1 LIMITATIONS
+
+While all of the below solutions can search for Unicode characters, they
+are not otherwise Unicode aware, and do no case folding, normalization,
+or the like.  That is, a string that contains C<U+0065 LATIN SMALL
+LETTER E> followed by C<U+0301 COMBINING ACUTE ACCENT> will not match a
+search for C<U+00E9 LATIN SMALL LETTER E WITH ACUTE>.  They also only
+know how to tokenize C<latin-1>-ish languages where words are separated
+by whitespace or similar characters; as such, support for searching for
+Japanese and Chinese content is extremely limited.
+
 =head1 POSTGRES
 
 =head2 Creating and configuring the index

-----------------------------------------------------------------------
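
The limitation described in the new LIMITATIONS section can be reproduced
outside of any database. The following is a minimal sketch in Python, not
RT code and not what any of the indexing backends do internally; the helper
names naive_match, normalized_match and whitespace_tokens are hypothetical
and exist only for illustration. It compares a plain substring match, which
sees U+0065 U+0301 and U+00E9 as different strings, with a match that first
applies case folding and NFC normalization, and it shows why splitting on
whitespace yields little for text that has no spaces between words.

```python
import unicodedata

# The two spellings named in the commit: "e" plus a combining acute
# accent, and the single precomposed code point.
DECOMPOSED = "\u0065\u0301"   # U+0065 followed by U+0301
PRECOMPOSED = "\u00e9"        # U+00E9


def naive_match(needle, haystack):
    """Code-point-for-code-point substring search, with no Unicode awareness."""
    return needle in haystack


def normalized_match(needle, haystack):
    """Substring search after case folding and NFC normalization (illustrative only)."""
    def canon(s):
        return unicodedata.normalize("NFC", s.casefold())
    return canon(needle) in canon(haystack)


def whitespace_tokens(text):
    """Split on whitespace, as a tokenizer for space-separated languages might."""
    return text.split()


if __name__ == "__main__":
    document = "Caf" + DECOMPOSED + " menu"
    print(naive_match(PRECOMPOSED, document))       # the plain search misses
    print(normalized_match(PRECOMPOSED, document))  # the normalized search hits
    print(len(whitespace_tokens("full text search")))
    print(len(whitespace_tokens("\u5168\u6587\u691c\u7d22")))  # no spaces, one token
```

Normalizing both the indexed text and the query before they reach the index
is one possible workaround, assuming the application controls both sides; the
commit itself only documents the limitation and does not propose a fix.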