[Rt-commit] rt branch, 4.4-trunk, updated. rt-4.4.0-48-g54d006d

Shawn Moore shawn at bestpractical.com
Tue Feb 16 21:40:00 EST 2016


The branch, 4.4-trunk has been updated
       via  54d006d1c90ac41a810bbaaa792e8ee572bafbe7 (commit)
       via  0c23344cb410f36f14f5c9bcdb7a781d8377eda7 (commit)
       via  5dc7c8200391d08d10a47a5e99b8efa1504dfa82 (commit)
       via  d32e97febe6e8c2fe012f2ddce75ca862f3a29f0 (commit)
      from  36d6891523fec77f8b509d3a3290823acfbfb96d (commit)

Summary of changes:
 lib/RT/Tickets.pm           | 17 ++++++++++++-----
 sbin/rt-fulltext-indexer.in | 16 ++++++++++++++--
 2 files changed, 26 insertions(+), 7 deletions(-)

- Log -----------------------------------------------------------------
commit d32e97febe6e8c2fe012f2ddce75ca862f3a29f0
Author: Christian Loos <cloos at netcologne.de>
Date:   Thu Oct 15 12:53:15 2015 +0200

    don't index EmailRecord transaction attachments
    
    EmailRecord and CommentEmailRecord transaction attachments contain
    redundant content as this attachment consists of the transaction
    content (Create, Correspond or Comment) and the template text.
    
    For example, with the default RT configuration and queue AdminCcs,
    creating a ticket results in a Create transaction and two EmailRecord
    transactions (one for the Requestor autoreply and one for the queue
    AdminCcs), so the valuable information in the Create transaction
    attachment is indexed three times.
    
    Tests show that not indexing the EmailRecord and CommentEmailRecord
    transaction attachments cuts the initial indexing time by about 30%,
    indexed rows by 35%, index data file (.MYD) size by 27% and index
    file (.MYI) size by 38%.
    
    To limit the transaction Type, I chose NOT IN because it lists 2 types
    (EmailRecord, CommentEmailRecord) instead of 3 (Correspond, Comment,
    Create). Also, if a new transaction type that stores attachments is ever
    introduced, or has already been introduced in a local customisation, it
    will automatically be indexed.
    
    Test details (RT 4.2.12, MySQL 5.5.44, 1,534,314 plain/html
    attachments):
    
    before:
    time /opt/rt4/sbin/rt-setup-fulltext-index --index-type mysql --table AttachmentsIndex
    34m58.295s
    
    mysql -BNe 'SELECT COUNT(*) FROM rt4.AttachmentsIndex'
    1534314
    
    du -h /var/lib/mysql/rt4/AttachmentsIndex.MY*
    1.5G    /var/lib/mysql/rt4/AttachmentsIndex.MYD
    782M    /var/lib/mysql/rt4/AttachmentsIndex.MYI
    
    after:
    time /opt/rt4/sbin/rt-setup-fulltext-index --index-type mysql --table AttachmentsIndex
    24m4.712s
    
    mysql -BNe 'SELECT COUNT(*) FROM rt4.AttachmentsIndex'
    1000218
    
    du -h /var/lib/mysql/rt4/AttachmentsIndex.MY*
    1.1G    /var/lib/mysql/rt4/AttachmentsIndex.MYD
    483M    /var/lib/mysql/rt4/AttachmentsIndex.MYI
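
The percentages quoted in the commit message can be rechecked from the raw before/after figures. The following is a minimal sketch, assuming a POSIX shell with awk; the helper names `pct_drop` and `to_seconds` are hypothetical and not part of RT.

```shell
#!/bin/sh
# pct_drop BEFORE AFTER: print the percentage reduction from BEFORE to AFTER,
# rounded to one decimal place. (Hypothetical helper, not part of RT.)
pct_drop() {
    awk -v before="$1" -v after="$2" \
        'BEGIN { printf "%.1f\n", (before - after) / before * 100 }'
}

# to_seconds MINUTES SECONDS: convert the minutes and seconds reported by
# `time` into a single number of seconds. (Hypothetical helper.)
to_seconds() {
    awk -v m="$1" -v s="$2" 'BEGIN { printf "%.3f\n", m * 60 + s }'
}

# Figures copied from the "before" and "after" runs in the commit message.
echo "index time: $(pct_drop "$(to_seconds 34 58.295)" "$(to_seconds 24 4.712)")%"
echo "rows:       $(pct_drop 1534314 1000218)%"
echo "MYD size:   $(pct_drop 1.5 1.1)%"   # both sizes in G, as printed by du -h
echo "MYI size:   $(pct_drop 782 483)%"   # both sizes in M, as printed by du -h
```

The file sizes come from `du -h`, which rounds its output, so the two size reductions are only approximate; the results should land close to the rounded percentages given in the commit message.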

diff --git a/sbin/rt-fulltext-indexer.in b/sbin/rt-fulltext-indexer.in
index bd55adb..b7aa550 100644
--- a/sbin/rt-fulltext-indexer.in
+++ b/sbin/rt-fulltext-indexer.in
@@ -167,6 +167,18 @@ sub attachment_loop {
     {
         # Indexes all text/plain and text/html attachments
         my $attachments = RT::Attachments->new( RT->SystemUser );
+        my $txn_alias = $attachments->Join(
+            ALIAS1 => 'main',
+            FIELD1 => 'TransactionId',
+            TABLE2 => 'Transactions',
+            FIELD2 => 'id',
+        );
+        $attachments->Limit(
+            ALIAS    => $txn_alias,
+            FIELD    => 'Type',
+            OPERATOR => 'NOT IN',
+            VALUE    => ['EmailRecord', 'CommentEmailRecord'],
+        );
         $attachments->Limit(
             FIELD    => 'ContentType',
             OPERATOR => 'IN',

commit 5dc7c8200391d08d10a47a5e99b8efa1504dfa82
Author: Christian Loos <cloos at netcologne.de>
Date:   Fri Oct 23 15:29:48 2015 +0200

    don't search EmailRecord transaction attachments
    
    This keeps search results consistent between indexed and non-indexed
    searches. See also the previous commit.

diff --git a/lib/RT/Tickets.pm b/lib/RT/Tickets.pm
index c641cd2..4e50790 100644
--- a/lib/RT/Tickets.pm
+++ b/lib/RT/Tickets.pm
@@ -976,12 +976,19 @@ sub _TransContentLimit {
         }
     } else {
         $self->Limit(
+            ALIAS    => $txn_alias,
+            FIELD    => 'Type',
+            OPERATOR => 'NOT IN',
+            VALUE    => ['EmailRecord', 'CommentEmailRecord'],
+        );
+        $self->Limit(
             %rest,
-            ALIAS         => $self->{_sql_trattachalias},
-            FIELD         => $field,
-            OPERATOR      => $op,
-            VALUE         => $value,
-            CASESENSITIVE => 0,
+            ENTRYAGGREGATOR => 'AND',
+            ALIAS           => $self->{_sql_trattachalias},
+            FIELD           => $field,
+            OPERATOR        => $op,
+            VALUE           => $value,
+            CASESENSITIVE   => 0,
         );
     }
     if ( RT->Config->Get('DontSearchFileAttachments') ) {

commit 0c23344cb410f36f14f5c9bcdb7a781d8377eda7
Author: Christian Loos <cloos at netcologne.de>
Date:   Fri Oct 23 15:57:18 2015 +0200

    also index the attachment subject
    
    When we indexed the EmailRecord transactions, we had the benefit of
    indexing the attachment's subject.
    To maintain this benefit now that we no longer index the EmailRecord
    transactions, add the attachment subject to the indexed content.
    See also 8450f0a9f233d6a761ac22dbdf14926abc54d7fa.

diff --git a/sbin/rt-fulltext-indexer.in b/sbin/rt-fulltext-indexer.in
index b7aa550..0e0bc6a 100644
--- a/sbin/rt-fulltext-indexer.in
+++ b/sbin/rt-fulltext-indexer.in
@@ -217,7 +217,7 @@ sub process_bulk_insert {
             debug("Found attachment #". $a->id );
             my $text = $a->Content // "";
             HTML::Entities::decode_entities($text) if $a->ContentType eq "text/html";
-            push @insert, $text, $a->id;
+            push @insert, join("\n", $a->Subject // "", $text), $a->id;
             $found++;
         }
         return unless $found;
@@ -322,7 +322,7 @@ sub process_pg_update {
             my $text = $a->Content // "";
             HTML::Entities::decode_entities($text) if $a->ContentType eq "text/html";
 
-            push @insert, [$text, $a->id];
+            push @insert, [join("\n", $a->Subject // "", $text), $a->id];
         }
 
         # Try in one database transaction; if it fails, we roll it back

commit 54d006d1c90ac41a810bbaaa792e8ee572bafbe7
Merge: 36d6891 0c23344
Author: Shawn M Moore <shawn at bestpractical.com>
Date:   Tue Feb 16 21:39:56 2016 -0500

    Merge branch '4.2/fts-indexer-improvements' into 4.4-trunk


-----------------------------------------------------------------------

