[Rt-devel] Huge number of database queries

Tue Nov 20 19:39:07 EST 2007

On Mon, 2007-11-19 at 07:18 -0600, Jesse Vincent wrote:
> *nod* FWIW, enabling "log_statement" on 8.1 didn't do that for me.
> 

PostgreSQL logging has many options, and many of them are
interdependent. I mentioned log_min_duration_statement=0 because that's
an easy way to make sure that the statements are logged. 

> > 
> > I'm more worried about the stability of performance than anything else.
> > With so many network round-trips to the database, I'm worried that any
> > minor slowdown would make RT unavailable. We don't notice any serious
> > problem right now, because our network ping time is ~0.2 ms, and our
> > processor usage on the database server is also low. Also, we have
> > essentially an empty database now, so what will the performance be like
> > as it grows? Will the number of queries per page load change as the
> > dataset changes?
> 
> 
> It shouldn't change significantly, but if you want to stress-test, the
> results can help us to improve RT.

We'll certainly provide our results.

> > It also just makes it harder to administer. If I turn on query logging
> > to diagnose a problem, the signal to noise ratio makes it almost
> > useless.
> 
> I think that a little experience about what to grep out will go a long
> way toward that. We've had a fair amount of experience and luck tuning
> RT on Postgres with those logs.
> 

Ok.

> >>> It's hard for me to imagine that we need 44 normal SELECT
> >>> statements to view a ticket.
> >> Can you pull out what they are? 
> > 
> > I have to scrub the data before putting it in a public place (yeah, I
> > know there's nothing sensitive, but I still have to look to be sure), so
> > this is just a little more detail to the summary:
> > 
> > 016  parse
> >   001  "SELECT  * FROM Users WHERE LOWER(Gecos) = LOWER($1)"
> >   002  "SELECT  * FROM Users WHERE LOWER(Gecos) = LOWER($1)"
> 
> LOWER(Gecos)? Really? Can you pull out the values there? I wouldn't
> expect to see that in normal operation unless you're playing external
> authenntication tricks.

We set WebExternalAuth and we unset WebFallbackToInternalAuth. I don't
think we do any tricks aside from that. We use a .htaccess file to
authenticate through LDAP.

> >   001  "SELECT  * FROM Transactions WHERE id = $1"
> 
> This seems a little surprsiing. That should get pulled in from a bigger
> prefetch of all txns related to that ticket
> 
> >   009  other
> > 018  bind
> > 018  execute
> > 016  deallocate
> 
> I talked about those in my last mail. I'd love to see a patch to
> SearchBuilder to reuse prepared statements, but I'm not sure it's a
> high-priority target.
> 

I'll have to look into it to see how many statements can be effectively
reused.

> > 049  select
> >   005  ping test
> >   011  "SELECT ... from ACL, Groups, Principals, CachedGroupMembers ..."
> 
> 11 ACL checks seems slightly high but not insanely high, since RT is
> checking queue, ticket and custom field ACLs. Though you should pay
> attention to Ruslan's recent mail about refactoring the ACL code to be a
> bit smarter and cut down on the number of queries.

Ok.

> >   004  "SELECT main.* FROM ( SELECT main.id FROM CustomFields ..."
> 
> Your custom fields.
> 

"customfield" and "customfieldvalues" are both empty tables

> >   004  "SELECT DISTINCT main.* FROM Users main JOIN Principals ..."
> 
> Requestors, Ccs, AdminCcs and Owners.
> 
> >   006  "SELECT main.* FROM Tickets main  WHERE ..."
> 
> Unsure
> 
> >   006  "SELECT main.* FROM Links main  WHERE ..."
> 
> Six different kinds of links on tickets
> 
> >   002  "...Users main CROSS JOIN ACL..."
> >   011  other
> 
> What falls into "other"?
> 

I'll send the logs to you off-list.

> *nod* Improving query count has generally taken a back-seat to improving
> overall query _performance_, since quite often there are one or two
> queries in that set that dwarf the run time of all the rest. That said,

Good performance and also stable/scalable performance that doesn't
drastically change when the conditions change are the main goals for me.
I'm not implying that RT does that currently, but that's where I'm
coming from.

> I'd love to make fewer queries and get the same results ;)  Would you
> maybe be up for helping to improve things? Setting up a global
> prepared-query cache seems like it might be a good, well-contained
> project that would have a pretty direct impact on what's made your
> spidey-sense tingle.

I've been looking into it as I learn RT. The deadline for deploying RT
is approaching, but afterward when RT is in maintenance I may be able to
contribute (employer permitting).

> There's also interesting work to be done with caching and invalidating
> caches of collection search results, possibly using a tool like MemCache
> that I'd be happy to ramble about if somebody is interested in hacking
> on it.

That's always an interesting subject, but I'm cautious about introducing
more caching because it's a challenge to get it right. I'd hate to give
up real stability for hypothetical performance ;)

Regards,
	Jeff Davis