[rt-users] PostgreSQL full-text index default type: GiST or GIN

Alex Vandiver alex at chmrr.net
Fri Mar 18 03:21:13 EDT 2016


On Thu, 17 Mar 2016 17:47:50 -0500
Matt Zagrabelny <mzagrabe at d.umn.edu> wrote:
> > I am curious about the benchmarks you used to make this determination?
> > In particular, the GIN fastscan option can dramatically improve search
> > performance in versions 9.4 and above. Here is a nice discussion:
> >
> > http://blog.pgaddict.com/posts/performance-since-postgresql-7-4-to-9-4-fulltext
> >
> > In addition, GIN indexes are much smaller in PostgreSQL 9.4 and above.
> > So I am curious about the data that motivated the change to GiST.

You are quite correct; GIN indexes are superior in virtually every
way for our use case.  And the documentation is unfortunately in error;
the default switched in the other direction, from GiST to GIN. See
https://github.com/bestpractical/rt/commit/e103f6da for the actual
behavior change.

This mistake is entirely my fault, and my only excuse for getting it
backwards is that the documentation commit was written several months
after the code was. Apologies!

> Here is a correspondence between myself and a BP engineer you  may find helpful:
> [snip]

That correspondence is out of date; at the time, GiST was the
default, based on no particular research.  The change to default to
GIN was made late 2014/early 2015 after I conducted additional research
on real-world data -- as well as the Postgres documentation, which is
quite clear that static data should use GIN.

You can see the results of that research at
https://chmrr.net/fts-charts/pg.html and
https://chmrr.net/fts-charts/query.html#pg

Those are from performing speed analysis on rebuilding the index on a
clone of Best Practical's own ticketing system, which was taken as
being a representative real-life sample of data.  It was run on
Postgres 9.3, and resulted in the following branch:
https://github.com/bestpractical/rt/compare/1d1ffe44...7c48294a
The last ~7 commits are the most Postgres-specific.

The end result is an indexing engine which, in its default
configuration, indexes data an order of magnitude faster, as well as
provides results an order of magnitude faster.  Wins all around!

I'm happy to supply a patch to fix the documentation, but I suspect
Shawn or someone at BPS will be faster to simply fix it directly. :)
 - Alex



More information about the rt-users mailing list