[Rt-commit] rt branch, 4.4-trunk, updated. rt-4.4.0-48-g54d006d
Shawn Moore
shawn at bestpractical.com
Tue Feb 16 21:40:00 EST 2016
The branch, 4.4-trunk has been updated
via 54d006d1c90ac41a810bbaaa792e8ee572bafbe7 (commit)
via 0c23344cb410f36f14f5c9bcdb7a781d8377eda7 (commit)
via 5dc7c8200391d08d10a47a5e99b8efa1504dfa82 (commit)
via d32e97febe6e8c2fe012f2ddce75ca862f3a29f0 (commit)
from 36d6891523fec77f8b509d3a3290823acfbfb96d (commit)
Summary of changes:
lib/RT/Tickets.pm | 17 ++++++++++++-----
sbin/rt-fulltext-indexer.in | 16 ++++++++++++++--
2 files changed, 26 insertions(+), 7 deletions(-)
- Log -----------------------------------------------------------------
commit d32e97febe6e8c2fe012f2ddce75ca862f3a29f0
Author: Christian Loos <cloos at netcologne.de>
Date: Thu Oct 15 12:53:15 2015 +0200
don't index EmailRecord transaction attachments
EmailRecord and CommentEmailRecord transaction attachments contain
redundant content as this attachment consists of the transaction
content (Create, Correspond or Comment) and the template text.
For example, with the default RT configuration and queue AdminCcs,
creating a ticket results in a Create transaction and two EmailRecord
transactions (one for the Requestor autoreply and one for the queue
AdminCcs).
So the valuable information in the create transaction attachment is
indexed three times.
Tests show that not indexing the EmailRecord and CommentEmailRecord
transaction attachments drops first index time by 30%, indexed rows by
35%, the index table's data file size by 27% and its index file size by 38%.
To limit the transaction Type, I chose NOT IN because it lists 2 types
(EmailRecord, CommentEmailRecord) instead of 3 (Correspond, Comment, Create).
Also, if a new transaction type that stores attachments is ever introduced,
or has already been introduced in a local customisation, that type will
automatically be indexed.
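The trade-off between the two filter styles can be sketched in plain Perl. This is only an illustration of the reasoning above, not RT code: the subs `indexed_with_not_in` and `indexed_with_in` and the transaction type `LocalNote` are hypothetical names invented for the example.

```perl
use strict;
use warnings;

# Hypothetical stand-ins for the two possible SQL filters; not RT code.
my %excluded = map { $_ => 1 } qw(EmailRecord CommentEmailRecord);
my %included = map { $_ => 1 } qw(Correspond Comment Create);

# "LocalNote" is an invented type standing in for a local customisation.
my @types = qw(Create Correspond Comment EmailRecord CommentEmailRecord LocalNote);

# Exclusion list (NOT IN): anything not explicitly excluded is indexed.
sub indexed_with_not_in { return grep { !$excluded{$_} } @_ }

# Inclusion list (IN): only explicitly listed types are indexed.
sub indexed_with_in { return grep { $included{$_} } @_ }

print "NOT IN: ", join(", ", indexed_with_not_in(@types)), "\n";
print "IN:     ", join(", ", indexed_with_in(@types)), "\n";
```

With the exclusion list the invented `LocalNote` type is kept without any further change, while the inclusion list silently drops it.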
Test details (RT 4.2.12, MySQL 5.5.44, 1,534,314 plain/html
attachments):
before:
time /opt/rt4/sbin/rt-setup-fulltext-index --index-type mysql --table AttachmentsIndex
34m58.295s
mysql -BNe 'SELECT COUNT(*) FROM rt4.AttachmentsIndex'
1534314
du -h /var/lib/mysql/rt4/AttachmentsIndex.MY*
1.5G /var/lib/mysql/rt4/AttachmentsIndex.MYD
782M /var/lib/mysql/rt4/AttachmentsIndex.MYI
after:
time /opt/rt4/sbin/rt-setup-fulltext-index --index-type mysql --table AttachmentsIndex
24m4.712s
mysql -BNe 'SELECT COUNT(*) FROM rt4.AttachmentsIndex'
1000218
du -h /var/lib/mysql/rt4/AttachmentsIndex.MY*
1.1G /var/lib/mysql/rt4/AttachmentsIndex.MYD
483M /var/lib/mysql/rt4/AttachmentsIndex.MYI
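As a cross-check, the percentages quoted in the message can be recomputed from the before/after figures. The sketch below is not part of the commit; the helper names `reduction_pct` and `to_seconds` are hypothetical, and the file sizes are the rounded values printed by `du -h`, so the results are approximate.

```perl
use strict;
use warnings;

# Percentage reduction going from $before to $after.
sub reduction_pct {
    my ($before, $after) = @_;
    return 100 * ($before - $after) / $before;
}

# Convert a `time`-style string such as "34m58.295s" to seconds.
sub to_seconds {
    my ($t) = @_;
    my ($min, $sec) = $t =~ /^(\d+)m([\d.]+)s$/ or die "bad time: $t";
    return $min * 60 + $sec;
}

# Figures copied from the test details; sizes are rounded by du -h.
my @measurements = (
    [ 'index time',     to_seconds('34m58.295s'), to_seconds('24m4.712s') ],
    [ 'indexed rows',   1534314,                  1000218 ],
    [ 'MYD size (GB)',  1.5,                      1.1 ],
    [ 'MYI size (MB)',  782,                      483 ],
);

printf "%-14s %.1f%%\n", $_->[0], reduction_pct($_->[1], $_->[2])
    for @measurements;
```

This gives reductions of roughly 31%, 35%, 27% and 38%, in line with the figures stated in the message.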
diff --git a/sbin/rt-fulltext-indexer.in b/sbin/rt-fulltext-indexer.in
index bd55adb..b7aa550 100644
--- a/sbin/rt-fulltext-indexer.in
+++ b/sbin/rt-fulltext-indexer.in
@@ -167,6 +167,18 @@ sub attachment_loop {
{
# Indexes all text/plain and text/html attachments
my $attachments = RT::Attachments->new( RT->SystemUser );
+ my $txn_alias = $attachments->Join(
+ ALIAS1 => 'main',
+ FIELD1 => 'TransactionId',
+ TABLE2 => 'Transactions',
+ FIELD2 => 'id',
+ );
+ $attachments->Limit(
+ ALIAS => $txn_alias,
+ FIELD => 'Type',
+ OPERATOR => 'NOT IN',
+ VALUE => ['EmailRecord', 'CommentEmailRecord'],
+ );
$attachments->Limit(
FIELD => 'ContentType',
OPERATOR => 'IN',
commit 5dc7c8200391d08d10a47a5e99b8efa1504dfa82
Author: Christian Loos <cloos at netcologne.de>
Date: Fri Oct 23 15:29:48 2015 +0200
don't search EmailRecord transaction attachments
This keeps search results consistent between indexed and non-indexed searches.
See also previous commit.
diff --git a/lib/RT/Tickets.pm b/lib/RT/Tickets.pm
index c641cd2..4e50790 100644
--- a/lib/RT/Tickets.pm
+++ b/lib/RT/Tickets.pm
@@ -976,12 +976,19 @@ sub _TransContentLimit {
}
} else {
$self->Limit(
+ ALIAS => $txn_alias,
+ FIELD => 'Type',
+ OPERATOR => 'NOT IN',
+ VALUE => ['EmailRecord', 'CommentEmailRecord'],
+ );
+ $self->Limit(
%rest,
- ALIAS => $self->{_sql_trattachalias},
- FIELD => $field,
- OPERATOR => $op,
- VALUE => $value,
- CASESENSITIVE => 0,
+ ENTRYAGGREGATOR => 'AND',
+ ALIAS => $self->{_sql_trattachalias},
+ FIELD => $field,
+ OPERATOR => $op,
+ VALUE => $value,
+ CASESENSITIVE => 0,
);
}
if ( RT->Config->Get('DontSearchFileAttachments') ) {
commit 0c23344cb410f36f14f5c9bcdb7a781d8377eda7
Author: Christian Loos <cloos at netcologne.de>
Date: Fri Oct 23 15:57:18 2015 +0200
also index the attachment subject
When we indexed the EmailRecord transactions, we had the benefit of
indexing the attachment's subject.
To maintain this benefit now that we no longer index the EmailRecord
transactions, add the attachment subject to the indexed content.
See also 8450f0a9f233d6a761ac22dbdf14926abc54d7fa.
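A minimal sketch of what the indexed text looks like once the subject is prepended. The sub `indexed_text` is a hypothetical helper that mirrors the `join` expression in the diff below; it takes plain strings rather than an RT::Attachment object, and the sample subject and body are made up.

```perl
use strict;
use warnings;

# Hypothetical helper mirroring join("\n", $a->Subject // "", $text):
# the subject becomes the first line of the indexed content.
sub indexed_text {
    my ($subject, $content) = @_;
    return join("\n", $subject // "", $content // "");
}

# Attachment with a subject: subject and body are both searchable.
print indexed_text("Printer broken", "It will not print."), "\n";

# Attachment without a subject: an undefined subject becomes an empty
# first line, so the body is indexed with a leading newline.
print indexed_text(undef, "It will not print."), "\n";
```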
diff --git a/sbin/rt-fulltext-indexer.in b/sbin/rt-fulltext-indexer.in
index b7aa550..0e0bc6a 100644
--- a/sbin/rt-fulltext-indexer.in
+++ b/sbin/rt-fulltext-indexer.in
@@ -217,7 +217,7 @@ sub process_bulk_insert {
debug("Found attachment #". $a->id );
my $text = $a->Content // "";
HTML::Entities::decode_entities($text) if $a->ContentType eq "text/html";
- push @insert, $text, $a->id;
+ push @insert, join("\n", $a->Subject // "", $text), $a->id;
$found++;
}
return unless $found;
@@ -322,7 +322,7 @@ sub process_pg_update {
my $text = $a->Content // "";
HTML::Entities::decode_entities($text) if $a->ContentType eq "text/html";
- push @insert, [$text, $a->id];
+ push @insert, [join("\n", $a->Subject // "", $text), $a->id];
}
# Try in one database transaction; if it fails, we roll it back
commit 54d006d1c90ac41a810bbaaa792e8ee572bafbe7
Merge: 36d6891 0c23344
Author: Shawn M Moore <shawn at bestpractical.com>
Date: Tue Feb 16 21:39:56 2016 -0500
Merge branch '4.2/fts-indexer-improvements' into 4.4-trunk
-----------------------------------------------------------------------