[Rt-devel] Postgres performance issues and some "odd" ways to get IDs...

Mon Aug 14 11:24:30 EDT 2006

Hey All,

Preface: I'm a complete newb and l4m3r when it comes to DBs... however I 
do have some very good friends who work as DB programmers... I am also 
designing a new DB and learning lots... I am also running RT with around 
12m rows...originally on MySQL, now on Postgres....

Performance on MySQL was ok until we broken the 10m row mark now it 
sucks big time...

Performance on Postgres .. not hot but very consistent and usable...

Performance on Postgres a week later (even with performing VACUUM FULL 
ANALYSE) .. sucks - not as bad as MySQL but not good... seems there are 
a number of improvements that can be made ... the first of which I 
noticed when I remember the golden rule a colleague drummed into me when 
he saw one of my tables like:

                                         Table "public.users"
        Column         |            Type             |                   
Modifiers
-----------------------+-----------------------------+------------------------------------------------
 id                    | integer                     | not null default 
nextval('users_id_seq'::text)

.. he said "don't do it, at best it's bad practice, at worst it creates 
race conditions"... when you try to do things like get the last insert 
id..... so when I was trying to get to he bottom of the doggyness in the 
RT DB here I noticed a SELECT following each INSERT and that it was 
taking average around 15 seconds and upto 45 seconds and seemed to be 
the only slow queries... needless to say I was surprised when I saw what 
it was doing (eg: SELECT id from Transactions where oid = <blah>;) ... 
further inspection lead me to DBIx::SearchBuilder::Handle::Pg::Insert() 
... and the following:

sub Insert {
    my $self = shift;
    my $table = shift;

    my $sth = $self->SUPER::Insert($table, @_ );

    unless ($sth) {
            return ($sth);
    }

    #Lets get the id of that row we just inserted
    my $oid = $sth->{'pg_oid_status'};
    my $sql = "SELECT id FROM $table WHERE oid = ?";
    my @row = $self->FetchResult($sql, $oid);
    # TODO: Propagate Class::ReturnValue up here.
    unless ($row[0]) {
            print STDERR "Can't find $table.id  for OID $oid";
            return(undef);
    }
    $self->{'id'} = $row[0];

    return ($self->{'id'});
}

.. is there any reason why you're not SELECTing the next val from the 
sequence and passing that to the insert and returning it .... that way 
you could remove the whole reliance on the OIDs in Postgres and improve 
large DB performance (substantially in my case)....?

I figure as you are naming the sequences as <tablename>_id_seq, 
hardcoding it is as bad as hardcoding column the SELECT id FROM 
$table.... and it seems to be the only place that RT relies on OIDs....

Something like

SELECT nextval AS id from nextval('$table_id_seq'::text);

will get the correct value of id _before_ the insert - discard if the 
insert fails ... Postgres has a very large number for sequences I doubt 
you'll run out.... Of course this code would be Pg specific as MySQL 
still doesn't support sequences properly .. but that's another story.

Comments/flames welcome...

Regards,

Mat

PS: It's 1:15am here after 6 hours of database performance debugging, 
apologies if the above offends, it is not meant to.