[rt-users] RT 2.0.14 hangs on certain queries

Michael Bilow mike at bilow.com
Sun Oct 12 03:49:16 EDT 2003


Here is some follow-up on my message below.

In trying to diagnose this problem, we put our old RT test server back
into operation.  This is a separate server which runs Debian "woody"
(stable) with an older version of the Debian "request-tracker"
package, 2.0.13-4.  We used this system to test RT before rolling it out
into production, where it was subsequently upgraded to 2.0.14-2.  The
older server is running Apache 1.3.26, mod_perl 1.26, HTML::Mason 1.04,
and Perl 5.6.1.

The RT installations on both the old and new servers access exactly the
same database and the same live data, which is critically important to
understanding the significance of this test.

On the old server, both the command line interface and web UI work fine,
just as they did until Friday on the new server.  Searches using
constraints that fail on the new server (such as "--limit-requestor")
complete normally on the old server.  This rules out, in my opinion, any
possibility of database corruption or other problems with the database itself.

-- Mike


On 2003-10-11 at 23:25 -0400, Michael Bilow wrote:

> We have a Debian "sarge" (testing) system that was working for some time,
> using the Debian package "request-tracker" 2.0.14-2 that has been orphaned
> to make way for RT3.  We did not upgrade to RT3 mainly because the system
> has been working for us and we were concerned about the migration path for
> the existing database.  Can I install RT3 and have it share access to the
> same database used for RT2, or is there a one-way migration required?
> 
> We were primarily using the web interface with Apache 1.3.27, mod_perl
> 1.27, HTML::Mason 1.21, and perl 5.8.0.  This was a working system, but
> since we are running on the Debian testing distribution, it is possible
> that the package manager swapped something subtle that we didn't notice.
> 
> The only slightly unusual thing about our installation is that we access
> the database over the network.  As explained, this has been working for
> some time as well.  The database server machine is running the Debian
> "woody" (stable) distribution, with package "postgresql" 7.2.1-2woody2.
> 
> Yesterday, some queries from the web interface started hanging.  For
> example, if a user tries to log in with the wrong password, the web site
> (correctly) returns an error immediately after consulting the database.  
> However, if the user logs in with the correct password, the web site hangs
> and the server shows the relevant Apache instance pegged at nearly 100%
> CPU utilization.  To stop the runaway instance, we need to kill and
> restart Apache.
> 
> The "rt-mailgate" tool continues to work, and users continue to be able to
> submit tickets and get responses via e-mail.
> 
> Looking into this further, I discovered that the "rt" command line utility
> hangs on some types of queries and not others.  For example, a relatively
> complicated query with a bunch of constraints like this succeeds:
> 
> 	./rt --limit-owner=mikebw --limit-status=new --limit-status=open
> 	   --limit-last-update=20031009- --summary
> 
> However, adding a constraint for requestor causes a hang until killed with
> Ctrl-C.  In fact, even a constraint for only the requestor hangs:
> 
> 	./rt '--limit-requestor=mike at bilow.com' --summary
> 
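
One way to narrow down which constraints trigger the hang, without sitting
at the terminal waiting to hit Ctrl-C, is to run each query under a timeout.
The following is only a sketch: the helper names `hangs` and `probe` and the
30-second limit are made up for illustration, and it assumes the "rt" tool
sits in the current directory as in the commands above.

```python
import subprocess


def hangs(cmd, timeout_s=30.0):
    """Return True if cmd is still running after timeout_s seconds."""
    try:
        subprocess.run(
            cmd,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run() kills the child before re-raising, so nothing
        # is left spinning on the CPU afterwards.
        return True
    return False


def probe(constraints, base=("./rt",), tail=("--summary",), timeout_s=30.0):
    """Run one query per constraint; map constraint -> True if it hung."""
    return {c: hangs([*base, c, *tail], timeout_s) for c in constraints}
```

Calling probe(["--limit-owner=mikebw", "--limit-requestor=mike at bilow.com"])
would return a dict marking each constraint as hung or not.  A timeout that
is too short will misreport a slow but working query as a hang, so the limit
should be well above the normal response time.
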
> My inference is that the web site hangs after logging in because one of
> the regions that is displayed to a user on their RT home is the list of
> tickets for which they are the requestor.
> 
> The fact that the "rt" command line tool hangs, combined with the fact that
> the web site works in some cases, indicates to me that the problem is
> somewhere inside the Perl code and not in the Apache-specific stuff such as
> mod_perl and Mason, but I suppose I could be wrong about this.  Regardless,
> since I have a failure at the command line, I can use tools such as strace
> to examine what is going on.
> 
> Using strace to compare a request that does not hang with one that does,
> I see that both open the log files in the standard place used by the
> Debian package, /var/log/request-tracker/.  Oddly, in both cases, the log
> files are zero-length.  The last shared activity between the non-hanging
> and hanging instances is a pair of user lookups for the owner constraint
> on both queries:
> 
>  rt_sigaction(SIGPIPE, {SIG_IGN}, {SIG_DFL}, 8) = 0
>  send(3, "QSELECT  * FROM Users WHERE lower(Name) = \'mikebw\'\0", 51,
> 0) = 51
>  rt_sigaction(SIGPIPE, {SIG_DFL}, {SIG_IGN}, 8) = 0
>  select(4, [3], [], [3], NULL)           = 1 (in [3])
>  recv(3,
> "Pblank\0T\0\"id\0\0\0\0\27\0\4\377\377\377\377name\0\0\0\4\23\377\377\
> 0\0\0|password\0\0\0\4\23\377\377\0\0\0,comments\0\0\0\0\31\377\377\377\377\377\
> 377signature\0\0\0\0\31\377\377\377\377\377\377emailaddress\0\0\0\4\23\377\377\0
> \0\0|freeformc"..., 16384, 0) = 951
>  rt_sigaction(SIGPIPE, {SIG_IGN}, {SIG_DFL}, 8) = 0
>  send(3, "QSELECT  * FROM Users WHERE id = \'4\'\0", 37, 0) = 37
>  rt_sigaction(SIGPIPE, {SIG_DFL}, {SIG_IGN}, 8) = 0
>  select(4, [3], [], [3], NULL)           = 1 (in [3])
>  recv(3,
> "Pblank\0T\0\"id\0\0\0\0\27\0\4\377\377\377\377name\0\0\0\4\23\377\377\
> 0\0\0|password\0\0\0\4\23\377\377\0\0\0,comments\0\0\0\0\31\377\377\377\377\377\
> 377signature\0\0\0\0\31\377\377\377\377\377\377emailaddress\0\0\0\4\23\377\377\0
> \0\0|freeformc"..., 16384, 0) = 951
> 
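
The send() payloads in the trace look like Query messages from version 2 of
the PostgreSQL frontend/backend protocol, which a 7.2-era client library
would speak: a 'Q' byte, the SQL text, and a terminating NUL.  Assuming that
reading is right, a small helper (the name `query_text` is hypothetical)
recovers the SQL from a payload, which makes a long strace log easier to
skim:

```python
def query_text(payload: bytes):
    """Return the SQL carried by a 'Q' message, or None for anything else.

    Assumes the layout: one 'Q' byte, the SQL text, one trailing NUL byte.
    """
    if len(payload) >= 2 and payload[:1] == b"Q" and payload[-1:] == b"\0":
        return payload[1:-1].decode("latin-1")
    return None


# Payload copied from the second send() in the trace above.
payload = b"QSELECT  * FROM Users WHERE id = '4'\0"
print(len(payload), query_text(payload))
```

The payload length printed here can be checked against the byte count that
strace reports for the same send() call.
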
> Following this, both instances write the header to stdout:
> 
>  brk(0x87f3000)                          = 0x87f3000
>  brk(0)                                  = 0x87f3000
>  write(1, "  id  Stat    Queue                                   Subject
>   Requestor \n", 81) = 81
> 
> But then the non-hanging version begins sending and receiving database
> queries (the leading "-" on each line comes from "diff -u" used for this
> comparison of the strace logs):
> 
> -rt_sigaction(SIGPIPE, {SIG_IGN}, {SIG_DFL}, 8) = 0
> -send(3, "QSELECT main.* FROM Tickets main   WHERE ((main.EffectiveId =
> main.id)) AND ((main.Owner = \'4\')) AND ((main.Status = \'new\')OR(ma"..., 151,
> 0) = 151
> -rt_sigaction(SIGPIPE, {SIG_DFL}, {SIG_IGN}, 8) = 0
> -select(4, [3], [], [3], NULL)           = 1 (in [3])
> -recv(3,
> "Pblank\0T\0\30id\0\0\0\0\27\0\4\377\377\377\377effectiveid\0\0\0\0\27\
> 0\4\377\377\377\377queue\0\0\0\0\27\0\4\377\377\377\377type\0\0\0\4\23\377\377\0
> \0\0\24issuestatement\0\0\0\0\27\0\4\377\377\377\377resolution\0\0\0\0\27\0\4
> \377\377\377\377own"..., 16384, 0) = 1448
> -select(4, [3], [], [3], NULL)           = 1 (in [3])
> -recv(3, "49:05+00\0\0\0\0321970-01-01 00:00:00+00\0\0\0\0322003-07-15
>  14:58:29+00\0\0\0\0322003-07-16 14:30:24+00\0\0\0\0322003-08-13
>  07:01:25+00\0\0\0\0051\0\0\0\0322003-08"..., 16271, 0) = 742
> 
> By comparison, the hanging version is, well, hung: nothing further is
> emitted into the strace log and it just sits there until Ctrl-C is hit.
> 
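
The comparison above was done with "diff -u" on the two strace logs.  A
rough equivalent, which reports the last line the two logs share and
whatever each log contains after that point, could look like the sketch
below.  The function name is illustrative only, and real logs would likely
need addresses and byte counts normalized before the lines compare equal.

```python
import difflib


def last_shared_and_tails(ok_lines, hung_lines):
    """Find the final run of lines common to both logs.

    Returns (last shared line or None, rest of ok log, rest of hung log).
    """
    matcher = difflib.SequenceMatcher(None, ok_lines, hung_lines, autojunk=False)
    # get_matching_blocks() ends with a zero-size sentinel; drop it.
    blocks = [m for m in matcher.get_matching_blocks() if m.size > 0]
    if not blocks:
        return None, list(ok_lines), list(hung_lines)
    last = blocks[-1]
    shared_line = ok_lines[last.a + last.size - 1]
    ok_tail = list(ok_lines[last.a + last.size:])
    hung_tail = list(hung_lines[last.b + last.size:])
    return shared_line, ok_tail, hung_tail
```

With one log per run (for example, written with strace's -o option and read
in with readlines()), an empty tail on the hanging side and a non-empty tail
on the working side matches the behavior described above: the hung process
simply stops making system calls after the header is written.
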
> We are kind of up the creek here without being able to use the web UI,
> even if we could live without ever searching by requestor.  Replying to
> tickets using the command line interface and a Unix editor is less than
> ideal.  (When doing that, is there any way to quote in the reply?)  I am
> hoping that someone very familiar with the code will recognize this
> problem as relatively simple and point me in the right direction for a
> patch.  As explained, the Debian RT2 package stopped updating at 2.0.14
> because the package maintainers switched focus to RT3, so upgrading to
> 2.0.15 would require pulling the system out from under package management,
> which I am keen to avoid.
> 
> -- Mike
> 



