[rt-users] Slow queries building list of privileged users (Postgres)

Kenneth Marshall ktm at rice.edu
Fri Nov 20 09:02:46 EST 2009


On Fri, Nov 20, 2009 at 11:41:50AM +0000, Dominic Hargreaves wrote:
> I'm migrating from an RT 2 install to an RT 3.8 install with around
> 170 privileged users (and around 90,000 total users).
> 
> I've done some initial testing with RT 3.8.6 and have observed
> that building the list of privileged users (in the "create new
> ticket" (ticket owner), "display ticket" (reminder owner) and
> "ticket search" (ticket owner) pages) takes around 20-30 seconds to run.
> 
> It performs two queries which are logged by my slow query logger:
> 
> LOG:  duration: 30570.089 ms  statement: SELECT DISTINCT main.* FROM Users main CROSS JOIN ACL ACL_4 JOIN Principals Principals_1  ON ( Principals_1.id = main.id ) JOIN CachedGroupMembers CachedGroupMembers_2  ON ( CachedGroupMembers_2.MemberId = Principals_1.id ) JOIN Groups Groups_3  ON ( Groups_3.id = CachedGroupMembers_2.GroupId )  WHERE (Principals_1.Disabled = '0') AND (ACL_4.PrincipalType = Groups_3.Type) AND (Principals_1.id != '1') AND (Principals_1.PrincipalType = 'User') AND (ACL_4.RightName = 'OwnTicket') AND (Groups_3.Domain = 'RT::System-Role') AND ((ACL_4.ObjectType = 'RT::Queue' AND ACL_4.ObjectId = 3) OR (ACL_4.ObjectType = 'RT::System'))  ORDER BY main.Name ASC 
> 
> LOG:  duration: 824.692 ms  statement: SELECT DISTINCT main.* FROM Users main CROSS JOIN ACL ACL_2 JOIN Principals Principals_1  ON ( Principals_1.id = main.id ) JOIN CachedGroupMembers CachedGroupMembers_3  ON ( CachedGroupMembers_3.MemberId = Principals_1.id )  WHERE (Principals_1.Disabled = '0') AND (ACL_2.PrincipalId = CachedGroupMembers_3.GroupId) AND (Principals_1.id != '1') AND (ACL_2.PrincipalType = 'Group') AND (Principals_1.PrincipalType = 'User') AND (ACL_2.RightName = 'OwnTicket') AND ((ACL_2.ObjectType = 'RT::Queue' AND ACL_2.ObjectId   = 3) OR (ACL_2.ObjectType = 'RT::System'))  ORDER BY main.Name ASC 
> 
> The first, on our database, returns no entries at all, whilst the
> second returns the expected number of entries (the list of privileged
> users).
> 
> Concentrating on this very slow one and asking the query planner
> with EXPLAIN ANALYZE:
> 
>  Unique  (cost=147226.19..147269.59 rows=496 width=3382) (actual time=30888.166..30888.166 rows=0 loops=1)
>    ->  Sort  (cost=147226.19..147227.43 rows=496 width=3382) (actual time=30888.164..30888.164 rows=0 loops=1)
>          Sort Key: main.name, main.id, main.password, main.comments, main.signature, main.emailaddress, main.freeformcontactinfo, main.organization, main.realname, main.nickname, main.lang, main.emailencoding, main.webencoding, main.externalcontactinfoid, main.contactinfosystem, main.externalauthid, main.authsystem, main.gecos, main.homephone, main.workphone, main.mobilephone, main.pagerphone, main.address1, main.address2, main.city, main.state, main.zip, main.country, main.timezone, main.pgpkey, main.creator, main.created, main.lastupdatedby, main.lastupdated
>          Sort Method:  quicksort  Memory: 25kB
>          ->  Hash Join  (cost=76985.91..147203.98 rows=496 width=3382) (actual time=30888.088..30888.088 rows=0 loops=1)
>                Hash Cond: ((groups_3.type)::text = (acl_4.principaltype)::text)
>                ->  Hash Join  (cost=76933.22..147046.33 rows=3077 width=3388) (actual time=30887.657..30887.657 rows=0 loops=1)
>                      Hash Cond: (cachedgroupmembers_2.groupid = groups_3.id)
>                      ->  Hash Join  (cost=57544.03..126361.67 rows=126470 width=3386) (actual time=1509.645..29963.426 rows=1278161 loops=1)
>                            Hash Cond: (cachedgroupmembers_2.memberid = main.id)
>                            ->  Seq Scan on cachedgroupmembers cachedgroupmembers_2  (cost=0.00..54962.02 rows=3357602 width=8) (actual time=0.026..2268.758 rows=3357455 loops=1)
>                            ->  Hash  (cost=57500.57..57500.57 rows=3477 width=3386) (actual time=1509.465..1509.465 rows=92315 loops=1)
>                                  ->  Hash Join  (cost=52857.90..57500.57 rows=3477 width=3386) (actual time=993.708..1360.215 rows=92315 loops=1)
>                                        Hash Cond: (main.id = principals_1.id)
>                                        ->  Seq Scan on users main  (cost=0.00..3223.16 rows=92316 width=3382) (actual time=0.022..75.080 rows=92316 loops=1)
>                                        ->  Hash  (cost=51835.42..51835.42 rows=81798 width=4) (actual time=993.584..993.584 rows=92316 loops=1)
>                                              ->  Seq Scan on principals principals_1  (cost=0.00..51835.42 rows=81798 width=4) (actual time=0.036..914.611 rows=92316 loops=1)
>                                                    Filter: ((id <> 1) AND (disabled = 0) AND ((principaltype)::text = 'User'::text))
>                      ->  Hash  (cost=18759.68..18759.68 rows=50361 width=10) (actual time=0.092..0.092 rows=4 loops=1)
>                            ->  Bitmap Heap Scan on groups groups_3  (cost=1907.17..18759.68 rows=50361 width=10) (actual time=0.078..0.081 rows=4 loops=1)
>                                  Recheck Cond: ((domain)::text = 'RT::System-Role'::text)
>                                  ->  Bitmap Index Scan on groups1  (cost=0.00..1894.58 rows=50361 width=0) (actual time=0.059..0.059 rows=4 loops=1)
>                                        Index Cond: ((domain)::text = 'RT::System-Role'::text)
>                ->  Hash  (cost=52.39..52.39 rows=24 width=6) (actual time=0.401..0.401 rows=97 loops=1)
>                      ->  Bitmap Heap Scan on acl acl_4  (cost=8.77..52.39 rows=24 width=6) (actual time=0.228..0.303 rows=97 loops=1)
>                            Recheck Cond: ((((rightname)::text = 'OwnTicket'::text) AND ((objecttype)::text = 'RT::Queue'::text) AND (objectid = 3)) OR (((rightname)::text = 'OwnTicket'::text) AND ((objecttype)::text = 'RT::System'::text)))
>                            ->  BitmapOr  (cost=8.77..8.77 rows=24 width=0) (actual time=0.205..0.205 rows=0 loops=1)
>                                  ->  Bitmap Index Scan on acl1  (cost=0.00..4.30 rows=4 width=0) (actual time=0.091..0.091 rows=2 loops=1)
>                                        Index Cond: (((rightname)::text = 'OwnTicket'::text) AND ((objecttype)::text = 'RT::Queue'::text) AND (objectid = 3))
>                                  ->  Bitmap Index Scan on acl1  (cost=0.00..4.45 rows=20 width=0) (actual time=0.113..0.113 rows=95 loops=1)
>                                        Index Cond: (((rightname)::text = 'OwnTicket'::text) AND ((objecttype)::text = 'RT::System'::text))
>  Total runtime: 30892.368 ms
> 
> Here most of the time is spent in the hash join for:
> 
> Hash Cond: (cachedgroupmembers_2.memberid = main.id)
> 
> and this join eliminates all the remaining rows.
> 
> Clearly for us, this first join is useless, but can anyone comment on
> whether this is usual, and why it's so slow for us? 
> 
> We're running an essentially identical database configuration with
> another instance of 3.8.6, which has around 10 privileged users and so
> unsurprisingly doesn't have this problem, and the query planner is
> different here:
> 
>  Unique  (cost=638.22..638.31 rows=1 width=1086) (actual time=0.355..0.355 rows=0 loops=1)
>    ->  Sort  (cost=638.22..638.22 rows=1 width=1086) (actual time=0.353..0.353 rows=0 loops=1)
>          Sort Key: main.name, main.id, main.password, main.comments, main.signature, main.emailaddress, main.freeformcontactinfo, main.organization, main.realname, main.nickname, main.lang, main.emailencoding, main.webencoding, main.externalcontactinfoid, main.contactinfosystem, main.externalauthid, main.authsystem, main.gecos, main.homephone, main.workphone, main.mobilephone, main.pagerphone, main.address1, main.address2, main.city, main.state, main.zip, main.country, main.timezone, main.pgpkey, main.creator, main.created, main.lastupdatedby, main.lastupdated
>          Sort Method:  quicksort  Memory: 25kB
>          ->  Nested Loop  (cost=0.00..638.21 rows=1 width=1086) (actual time=0.300..0.300 rows=0 loops=1)
>                ->  Nested Loop  (cost=0.00..637.87 rows=1 width=8) (actual time=0.299..0.299 rows=0 loops=1)
>                      ->  Nested Loop  (cost=0.00..635.76 rows=6 width=4) (actual time=0.298..0.298 rows=0 loops=1)
>                            ->  Nested Loop  (cost=0.00..608.38 rows=3 width=4) (actual time=0.296..0.296 rows=0 loops=1)
>                                  ->  Seq Scan on acl acl_4  (cost=0.00..16.40 rows=2 width=6) (actual time=0.031..0.219 rows=3 loops=1)
>                                        Filter: (((rightname)::text = 'OwnTicket'::text) AND ((((objecttype)::text = 'RT::Queue'::text) AND (objectid = 3)) OR ((objecttype)::text = 'RT::System'::text)))
>                                  ->  Index Scan using groups1 on groups groups_3  (cost=0.00..294.15 rows=147 width=10) (actual time=0.024..0.024 rows=0 loops=3)
>                                        Index Cond: (((groups_3.domain)::text = 'RT::System-Role'::text) AND ((groups_3.type)::text = (acl_4.principaltype)::text))
>                            ->  Index Scan using cachedgroupmembers3 on cachedgroupmembers cachedgroupmembers_2  (cost=0.00..9.06 rows=5 width=8) (never executed)
>                                  Index Cond: (cachedgroupmembers_2.groupid = groups_3.id)
>                      ->  Index Scan using principals_pkey on principals principals_1  (cost=0.00..0.34 rows=1 width=4) (never executed)
>                            Index Cond: (principals_1.id = cachedgroupmembers_2.memberid)
>                            Filter: ((principals_1.id <> 1) AND (principals_1.disabled = 0::smallint) AND ((principals_1.principaltype)::text = 'User'::text))
>                ->  Index Scan using users_pkey on users main  (cost=0.00..0.33 rows=1 width=1086) (never executed)
>                      Index Cond: (main.id = principals_1.id)
>  Total runtime: 0.602 ms
> 
> cachedgroupmembers and users have the same indexes on both systems
> (I'm not sure whether that's relevant).
> 
> Both systems are Debian lenny, RT 3.8.6, Postgres 8.3.
> 
> If anyone has any other advice about running RT 3.8 on postgres
> I'd be interested too - I've added a couple of extra indexes:
> 
> CREATE INDEX Groups3 ON Groups (LOWER(Domain), LOWER(Type));
> CREATE INDEX users5 ON users (LOWER(emailaddress));
> 
> which are missing from the default installation which have
> cut down some other common slow queries (the former
> is already in an RT ticket:
> 
> http://rt3.fsck.com/Ticket/Display.html?id=13056
> 
> )
> 
> Thanks,
> Dominic.
> 
Hi Dominic,

First, do you have $UseSQLForACLChecks set? I know that that
is a new option and there may still be performance tuning that
needs to be done to have it work well. We run RT 3.8.5 on a
PostgreSQL 8.4.1 database with 25K users and about 400 privileged
users and we do not see a performance problem. Would you mind
posting your postgres.conf changes from the default values as
well as the indexes you have defined for the tables involved.
Also, what is your statistics target for your tables?

Regards,
Ken



More information about the rt-users mailing list