[rt-users] ShowCustomFields extremely slow/Ticket loading slow...

Thu May 15 20:22:59 EDT 2014

Alex Vandiver wrote:
> On Thu, 2014-01-30 at 10:57 +0100, Michelle Sullivan wrote:
>   
>> Jan 28 02:11:09 to Jan 28 02:18:31  ... a full 7 minutes of being in
>> ShowCustomFields there are just 100's (like more than 1900 of them) of
>> statements like this: [snip]
>>
>> .. and no other queries...  The start being immediately after: [snip]
>>
>> Which is a column_info() call in DBD::Pg
>>
>> Which is called from Fields() in DBIx::Searchbuilder ..
>>
>> Which I assume is being called in ShowCustomFields somewhere...
>>     
>
> Correct.  It should also only run once per process, and be cached
> thereafter.
>
>   
>> The problem is network latency between the front ends and the backend
>> DB...  Running this on the same host returns in under a second.
>>     
>
> Yes.  These queries aren't a notable performance hit on most servers.
> At worst case, they may take a second or two -- they certainly don't
> take upwards of _five minutes_.  The latency is, as you've diagnosed,
> almost certainly the root cause of your problems.
>
>   
>> My frontends are on completely different networks (and indeed different
>> data centers) from the database servers for security (mainly DDoS type)
>> reasons
>>
>> [snip discussion of optimization of column_info()]
>>     
>
> RT simply isn't designed to operate in an environment where every query
> has tens to hundreds of ms of latency.  Attempts to optimize this
> particular set of queries will merely cause the limiting factor to
> become some other set.  While we're absolutely interested in patches
> that help decrease the number of queries that RT runs (as you've noted,
> there are several queries run multiple times), I expect you will find
> this to be a frustrating game of whack-a-mole.  If at all possible,
> altering your network topology to remove the latency is most probably a
> more straightforward way to have a performant RT instance.
>  

In a follow up to this and my other post about slow ticket loading.. 
The 20ms round trip time is not the issue here - it's just exaggerating
the issue.

I've built a custom port on FreeBSD to install pgpool-II-3.3.3 and am
using memcacheq and many of the RT queries (such as ACL checks) are
being cached.. however the bulk of the queries are not being cached at
all, they seem to be constantly changing - and even reloading of the
same ticket is taking minutes.

The problem is something in showcustomfields.. hundreds (sometimes
thousands) of queries like this:

May 15 18:59:59 vortex pgpool[31750]: DB node id: 0 backend pid: 15008
statement: Execute: SELECT consrc FROM pg_catalog.pg_constraint WHERE
contype = 'c' AND conrelid = 11404 AND conkey = '{8}'
May 15 18:59:59 vortex pgpool[31750]: DB node id: 0 backend pid: 15008
statement: Parse: SELECT consrc FROM pg_catalog.pg_constraint WHERE
contype = 'c' AND conrelid = 11404 AND conkey = '{9}'
May 15 18:59:59 vortex pgpool[31750]: DB node id: 0 backend pid: 15008
statement: B message
May 15 18:59:59 vortex pgpool[31750]: DB node id: 0 backend pid: 15008
statement: D message
May 15 18:59:59 vortex pgpool[31750]: DB node id: 0 backend pid: 15008
statement: Execute: SELECT consrc FROM pg_catalog.pg_constraint WHERE
contype = 'c' AND conrelid = 11404 AND conkey = '{9}'
May 15 18:59:59 vortex pgpool[31750]: DB node id: 0 backend pid: 15008
statement: Parse: SELECT consrc FROM pg_catalog.pg_constraint WHERE
contype = 'c' AND conrelid = 11408 AND conkey = '{1}'
May 15 18:59:59 vortex pgpool[31750]: DB node id: 0 backend pid: 15008
statement: B message
May 15 18:59:59 vortex pgpool[31750]: DB node id: 0 backend pid: 15008
statement: D message
May 15 18:59:59 vortex pgpool[31750]: DB node id: 0 backend pid: 15008
statement: Execute: SELECT consrc FROM pg_catalog.pg_constraint WHERE
contype = 'c' AND conrelid = 11408 AND conkey = '{1}'
May 15 18:59:59 vortex pgpool[31750]: DB node id: 0 backend pid: 15008
statement: Parse: SELECT consrc FROM pg_catalog.pg_constraint WHERE
contype = 'c' AND conrelid = 11408 AND conkey = '{2}'
May 15 18:59:59 vortex pgpool[31750]: DB node id: 0 backend pid: 15008
statement: B message
May 15 18:59:59 vortex pgpool[31750]: DB node id: 0 backend pid: 15008
statement: D message
May 15 18:59:59 vortex pgpool[31750]: DB node id: 0 backend pid: 15008
statement: Execute: SELECT consrc FROM pg_catalog.pg_constraint WHERE
contype = 'c' AND conrelid = 11408 AND conkey = '{2}'
May 15 18:59:59 vortex pgpool[31750]: DB node id: 0 backend pid: 15008
statement: Parse: SELECT consrc FROM pg_catalog.pg_constraint WHERE
contype = 'c' AND conrelid = 11408 AND conkey = '{3}'

Everytime 'conrelid' changes and/or 'conkey' changes - so far, not even
reloading the same ticket results in not a single cache entry... out of
hundreds of thousands...

However I know caching is working because I see things like:

May 15 18:57:29 vortex pgpool[31752]: query result fetched from cache.
statement: SELECT  * FROM Tickets WHERE id = $1
000000010000000636363130343100010000
May 15 18:57:29 vortex pgpool[31752]: DB node id: 0 backend pid: 14906
statement: Parse: SELECT  * FROM Queues WHERE id = $1
May 15 18:57:29 vortex pgpool[31752]: DB node id: 0 backend pid: 14906
statement: B message
May 15 18:57:29 vortex pgpool[31752]: DB node id: 0 backend pid: 14906
statement: D message
May 15 18:57:29 vortex pgpool[31752]: query result fetched from cache.
statement: SELECT  * FROM Queues WHERE id = $1 0000000100000002313900010000
May 15 18:57:29 vortex pgpool[31752]: DB node id: 0 backend pid: 14906
statement: Parse: SELECT ACL.id, ACL.ObjectType, ACL.ObjectId FROM ACL,
Principals, CachedGroupMembers WHERE Principals.id = ACL.PrincipalId AND
Principals.PrincipalType = 'Group' AND Principals.Disabled = 0 AND
CachedGroupMembers.GroupId  = ACL.PrincipalId AND
CachedGroupMembers.GroupId  = Principals.id AND
CachedGroupMembers.MemberId = 1079074 AND CachedGroupMembers.Disabled =
0  AND ((ACL.ObjectType = 'RT::System' AND ACL.ObjectId = 1) OR
(ACL.ObjectType = 'RT::Queue' AND ACL.ObjectId = 19) OR (ACL.ObjectType
= 'RT::Ticket' AND ACL.ObjectId = 661041)) AND (ACL.RightName =
'SuperUser' OR ACL.RightName = 'ShowOutgoingEmail')  LIMIT 1
May 15 18:57:29 vortex pgpool[31752]: DB node id: 0 backend pid: 14906
statement: B message
May 15 18:57:29 vortex pgpool[31752]: DB node id: 0 backend pid: 14906
statement: D message
May 15 18:57:29 vortex pgpool[31752]: query result fetched from cache.
statement: SELECT ACL.id, ACL.ObjectType, ACL.ObjectId FROM ACL,
Principals, CachedGroupMembers WHERE Principals.id = ACL.PrincipalId AND
Principals.PrincipalType = 'Group' AND Principals.Disabled = 0 AND
CachedGroupMembers.GroupId  = ACL.PrincipalId AND
CachedGroupMembers.GroupId  = Principals.id AND
CachedGroupMembers.MemberId = 1079074 AND CachedGroupMembers.Disabled =
0  AND ((ACL.ObjectType = 'RT::System' AND ACL.ObjectId = 1) OR
(ACL.ObjectType = 'RT::Queue' AND ACL.ObjectId = 19) OR (ACL.ObjectType
= 'RT::Ticket' AND ACL.ObjectId = 661041)) AND (ACL.RightName =
'SuperUser' OR ACL.RightName = 'ShowOutgoingEmail')  LIMIT 1
0000000000010000
May 15 18:57:29 vortex pgpool[31752]: DB node id: 0 backend pid: 14906
statement: Parse: SELECT  * FROM Groups WHERE LOWER(Domain) = LOWER($1)
AND LOWER(Type) = LOWER($2)
May 15 18:57:29 vortex pgpool[31752]: DB node id: 0 backend pid: 14906
statement: B message
May 15 18:57:29 vortex pgpool[31752]: DB node id: 0 backend pid: 14906
statement: D message
May 15 18:57:29 vortex pgpool[31752]: query result fetched from cache.
statement: SELECT  * FROM Groups WHERE LOWER(Domain) = LOWER($1) AND
LOWER(Type) = LOWER($2)
000000020000000E53797374656D496E7465726E616C0000000C556E70726976696C6567656400010000

This could well be something underneath (DBD::Pg or DBIx::Searchbuilder
or even the pgsql.so shared library) - however it's only RT that's doing
it - I use DBD::Pg and the same version of the libraries to the same
servers for my main website where every page is served by the same
version of perl under the same version of mod_perl2 (not using Mason or
DBIx::* on my site though).. I don't use pgpool caching on my site
either and see no such performance problems... so it's something that is
being called that either shouldn't be or it's very very broken. 
Pointers would be good - even on the same network (even if that is
possible for me) it would be a performance hit - the queries take 2ms
average - but doing 1000 of them is going to give a 2s delay even if the
server was on the same host, already connected and had no (as in zero)
lag to the DB.. and that's without the processing overhead to process
hundreds of useless queries per second with no results for every page
load...

Regards,

-- 
Michelle Sullivan
http://www.mhix.org/