[rt-users] RT 3.8.9 + PostgreSQL 8.4.7 - missing/bad indexes

Thu Apr 14 09:26:33 EDT 2011

Hi,
because of horrible performance of spreadsheet export of tickets I found
some badly created or missing indexes. I didn't investigate if other
RDBMS schemas has the same problem.
To discover problems I rewrote Results.tsv into command-line shape and
run it with DBI_TRACE=2 finally. 

For every row of Tickets the following additional problematic queries are
executed:

########################################################################

SELECT  * FROM Groups WHERE Instance = ? AND LOWER(Domain) = LOWER(?) AND LOWER(Type) = LOWER(?)

e.g. one my bind params: 106431 'RT::Ticket-Role' 'Requestor'

rt=# EXPLAIN ANALYZE SELECT * FROM Groups WHERE Instance = 106431 AND LOWER(Domain) = LOWER('RT::Ticket-Role') AND LOWER(Type) = LOWER('Requestor');
							     QUERY PLAN                                                              
-------------------------------------------------------------------------------------------------------------------------------------
 Seq Scan on groups  (cost=0.00..12925.34 rows=1 width=66) (actual time=64.672..64.904 rows=1 loops=1)
   Filter: ((instance = 106431) AND (lower((domain)::text) = 'rt::ticket-role'::text) AND (lower((type)::text) = 'requestor'::text))
 Total runtime: 64.936 ms
(3 rows)

Existing indexes:
CREATE UNIQUE INDEX Groups1 ON Groups (Domain,Instance,Type,id, Name);
CREATE INDEX Groups2 On Groups (Type, Instance, Domain);

Both indexes above are case sensitive. Either problematic query should be
case sensitive (without lower()) or a new index with lower() on fields should
be created:

CREATE INDEX Groups_zito1 ON Groups (lower(Type), Instance, lower(Domain));

rt=# EXPLAIN ANALYZE SELECT * FROM Groups WHERE Instance = 106431 AND LOWER(Domain) = LOWER('RT::Ticket-Role') AND LOWER(Type) = LOWER('Requestor');
                                                               QUERY PLAN                                                                
-----------------------------------------------------------------------------------------------------------------------------------------
 Index Scan using groups_zito1 on groups  (cost=0.01..8.38 rows=1 width=66) (actual time=0.084..0.084 rows=1 loops=1)
   Index Cond: ((lower((type)::text) = 'requestor'::text) AND (instance = 106431) AND (lower((domain)::text) = 'rt::ticket-role'::text))
 Total runtime: 0.108 ms
(3 rows)

########################################################################

SELECT main.* FROM ObjectCustomFieldValues main  WHERE (main.Disabled = '0') AND (main.ObjectType = 'RT::Ticket') AND (main.ObjectId = '106431')

rt=# EXPLAIN ANALYZE SELECT main.* FROM ObjectCustomFieldValues main  WHERE (main.Disabled = '0') AND (main.ObjectType = 'RT::Ticket') AND (main.ObjectId = '106431');
                                                           QUERY PLAN                                                           
--------------------------------------------------------------------------------------------------------------------------------
 Seq Scan on objectcustomfieldvalues main  (cost=0.00..12298.75 rows=3 width=457) (actual time=142.497..142.887 rows=3 loops=1)
   Filter: ((disabled = 0) AND ((objecttype)::text = 'RT::Ticket'::text) AND (objectid = 106431))
 Total runtime: 142.924 ms
(3 rows)

Existing indexes:
CREATE INDEX ObjectCustomFieldValues1 ON ObjectCustomFieldValues (CustomField,ObjectType,ObjectId,Content);
CREATE INDEX ObjectCustomFieldValues2 ON ObjectCustomFieldValues (CustomField,ObjectType,ObjectId);

The above query selects based on fields ObjectType and ObjectId,
unfortunately ObjectCustomFieldValues2 contains additional field
CustomField. To speedup the search index with good selectivity
with only field ObjectId can suffice.

CREATE INDEX ObjectCustomFieldValues_zito1 ON ObjectCustomFieldValues (ObjectId);

rt=# EXPLAIN ANALYZE SELECT main.* FROM ObjectCustomFieldValues main  WHERE (main.Disabled = '0') AND (main.ObjectType = 'RT::Ticket') AND (main.ObjectId = '106431');
                                                                          QUERY PLAN                                                                          
--------------------------------------------------------------------------------------------------------------------------------------------------------------
 Index Scan using objectcustomfieldvalues_zito1 on objectcustomfieldvalues main  (cost=0.00..8.58 rows=3 width=457) (actual time=0.059..0.063 rows=3 loops=1)
   Index Cond: (objectid = 106431)
   Filter: ((disabled = 0) AND ((objecttype)::text = 'RT::Ticket'::text))
 Total runtime: 0.090 ms
(4 rows)

or index with two corresponding fields:
CREATE INDEX ObjectCustomFieldValues_zito2 ON ObjectCustomFieldValues (ObjectType, ObjectId);
My planner uses always objectcustomfieldvalues_zito1, because I have
only one ObjectType value:

rt=# select distinct objecttype from objectcustomfieldvalues;
 objecttype 
------------
 RT::Ticket
(1 row)

########################################################################

And performance benefit?

Original state, without a new indexes:

zito at rt2:~/migration$ time ./rt-search2 /opt/RT >/dev/null 2>&1

real    3m12.235s
user    0m11.713s
sys     0m0.576s

After creation of the indexes:
CREATE INDEX Groups_zito1 ON Groups (lower(Type), Instance, lower(Domain));
CREATE INDEX ObjectCustomFieldValues_zito1 ON ObjectCustomFieldValues (ObjectId);

zito at rt2:~/migration$ time ./rt-search2 /opt/RT >/dev/null 2>&1

real    0m14.171s
user    0m9.545s
sys     0m0.456s

That is speed up by a factor of circa 14 in my case.

A testing script rt-search2 attached (rewrite of Results.tsv). A TSQL
query at the start of the script should be changed to something
appropriate of course.

Can be new indexes added into schema.Pg dist file?
Thanks
Best Regards
-- 
Zito
-------------- next part --------------
#!/usr/bin/perl
use warnings;
no warnings 'once';
use strict;
use Data::Dumper;

my $Query = <<'EOF';
Status = 'resolved' AND LastUpdated > '1.1.2009' AND (  Owner != 'mvas' OR Owner != 'behounek' ) AND Created > '1.1.2006' AND Queue = 'support RZP' AND Content LIKE 'uir'
EOF
my $OrderBy = 'id';
my $Order = 'ASC';

MAIN: {
    $| = 1;
    my $prefix = $ARGV[0];
    unshift @INC, ("$prefix/local/lib",  "$prefix/lib");

    require RT;
    RT::LoadConfig();
    RT::Init();

    require RT::Tickets;

    my $Tickets = RT::Tickets->new( $RT::SystemUser );
    $Tickets->FromSQL( $Query );
    if ( $OrderBy =~ /\|/ ) {
	# Multiple Sorts
	my @OrderBy = split /\|/, $OrderBy;
	my @Order   = split /\|/, $Order;
	$Tickets->OrderByCols(
	    map { { FIELD => $OrderBy[$_], ORDER => $Order[$_] } }
	    ( 0 .. $#OrderBy )
	);
    }
    else {
	$Tickets->OrderBy( FIELD => $OrderBy, ORDER => $Order );
    }

    my %cf_id_to_name;
    my %cf_name_to_pos;
    {
	my $cfs = RT::SQL::PossibleCustomFields(
	    Query => $Query, CurrentUser => $RT::SystemUser,
	);
	while ( my $cf = $cfs->Next ) {
	    my $name = $cf->Name;
	    $cf_id_to_name{ $cf->id } = $name;
	    next if $cf_name_to_pos{ $name };

	    $cf_name_to_pos{ $name } = 
		(sort { $b <=> $a } values %cf_name_to_pos)[0] + 1;
	}
    }

    my @attrs = qw(
	id QueueObj->Name Subject Status
	TimeEstimated TimeWorked TimeLeft
	Priority FinalPriority
	OwnerObj->Name 
	Requestors->MemberEmailAddressesAsString
	Cc->MemberEmailAddressesAsString
	AdminCc->MemberEmailAddressesAsString
	DueObj->ISO ToldObj->ISO CreatedObj->ISO
	ResolvedObj->ISO LastUpdatedObj->ISO LastUpdatedByObj->Name
    );

    {
	my @header;
	foreach my $attr (@attrs) {
	    my $label = $attr;
	    $label =~ s'Obj-.(?:AsString|Name|ISO)''g;
	    $label =~ s'-\>MemberEmailAddressesAsString''g;
	    push @header, $label;
	}

	$_ += @header - 1 foreach values %cf_name_to_pos;

	foreach my $name ( sort { $cf_name_to_pos{$a} <=> $cf_name_to_pos{$b} } keys %cf_name_to_pos ) {
	    push @header, "CF-". $name;
	}
	print join("\t", @header);
	print "\n";
    }

    my $i = 0;
    while ( my $Ticket = $Tickets->Next()) {
	my @row;
	foreach my $attr (@attrs) {
	    my $value;
	    if ($attr =~ /(.*)->ISO$/ and $Ticket->$1->Unix <= 0) {
		$value = '';
	    } else {
		my $method = '$Ticket->'.$attr.'()';
		$method =~ s/->ISO\(\)$/->ISO( Timezone => 'user' )/;
		$value = eval $method;
		if ($@) {die "Failed to find $attr - ". $@}; 
	    }
	    push @row, $value;
	}

	my $values = $Ticket->CustomFieldValues;
	$values->OrderByCols; # don't sort them
	while (my $value = $values->Next) {
	    my $pos = $cf_name_to_pos{ $cf_id_to_name{ $value->CustomField } };
	    next unless $pos;

	    $row[$pos] = '' unless defined $row[$pos];
	    $row[$pos] .= ', ' if $row[$pos];
	    $row[$pos] .= $value->Content;
	}

	# remove tabs from all field values, they screw up the tsv
	for (@row) {
	    $_ = '' unless defined;
	    $_ =~ s/(?:\n|\r)//g;
	    $_ =~ s{\t}{    }g;
	}

	print join("\t", at row);
	print "\n";
    }
};