No subject

Thu Oct 13 11:38:50 EDT 2022

browser - as far as I can see (the trace is a bit trimmed down), fd 3 is the
socket to the client, and apache has gone into a read() on it and then timed out
(via SIGALRM), probably after the default 300-second inactivity timer set by the
Timeout directive in httpd.conf.
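To make the "read, then ALARM" pattern in the trace concrete, here is a minimal Python sketch of a blocking read guarded by alarm()/SIGALRM. It only models the effect and is not apache's code. The names read_with_alarm and ReadTimeout are made up for illustration, and the 1-second timeout stands in for apache's `Timeout 300`. It is Unix-only and has to run in the main thread.

```python
import signal
import socket


class ReadTimeout(Exception):
    """Raised when the alarm fires before the read completes (hypothetical name)."""


def _on_alarm(signum, frame):
    # Raising from the handler aborts the blocked recv(); Python only
    # retries an interrupted system call if the handler returns normally.
    raise ReadTimeout(f"no client data before the alarm (signal {signum})")


def read_with_alarm(sock: socket.socket, nbytes: int, timeout_s: int) -> bytes:
    """Block in recv() like a server waiting on its client, but give up
    after timeout_s whole seconds via SIGALRM (models the Timeout directive)."""
    previous = signal.signal(signal.SIGALRM, _on_alarm)
    signal.alarm(timeout_s)  # arm a one-shot timer
    try:
        return sock.recv(nbytes)
    finally:
        signal.alarm(0)  # disarm whatever is left of the timer
        signal.signal(signal.SIGALRM, previous)


if __name__ == "__main__":
    server_side, client_side = socket.socketpair()
    try:
        read_with_alarm(server_side, 1024, 1)  # client_side never writes
    except ReadTimeout as exc:
        print("timed out:", exc)
```

If the client never sends anything, the process simply sits in recv() until the alarm goes off, which matches what the trace shows on fd 3.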

(The segfault can probably be ignored. I see it a lot with mod_perl when apache
is handling a signal or shutting a child down, e.g. after an alarm or a 'child
termination' as load drops, often while unloading DBI and other XS modules. It
happens after all user activity has finished, so I don't let it bother me.)

I saw something strikingly similar with our RT install last week: a user
complained that they couldn't access RT (display tickets etc.). Other users were
fine (including myself).

Our setup is probably a bit different from most: we access our intranet (and
RT) through a special kind of reverse proxy with SSL on both sides, which also
stores the RT session cookies in the proxy (our product!), so I could trace the
activity both inside and outside the web server. We also have plenty of other
apps on our intranet servers to test against.

I cleared the Apache::Session data for the RT users (we use RT with apache
authentication) and restarted various things, but the basic problem remained:
the user couldn't get the display screen up. strace on the intranet server and
on the proxy showed the same underlying cause - RT/apache was waiting for more
browser input (our proxy agreed that more browser input was expected), but the
browser never sent it. The problem only occurred on 'POST' pages (including
ticket display from 'GoTo Ticket' etc.), not on GET pages.
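One common way a server ends up "waiting for more browser input" on a POST but never on a GET is the request body. The server reads as many bytes as the Content-Length header announced, so if the browser sends fewer, the read just waits. This is an assumption about the mechanism, not a diagnosis of what RT or mod_perl actually does. The function name read_post_body is hypothetical, and the socket timeout stands in for apache's Timeout directive.

```python
import socket


def read_post_body(sock: socket.socket, content_length: int, timeout_s: float) -> bytes:
    """Read exactly content_length bytes of request body (hypothetical helper).

    The server trusts Content-Length, so if the browser sends fewer bytes
    than it announced, recv() keeps waiting until the timeout fires.
    """
    sock.settimeout(timeout_s)  # per-recv() timeout, standing in for Timeout
    chunks = []
    remaining = content_length
    while remaining > 0:
        # Raises socket.timeout if nothing arrives within timeout_s.
        chunk = sock.recv(remaining)
        if not chunk:
            raise ConnectionError("client closed the connection mid-body")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


if __name__ == "__main__":
    server_side, browser_side = socket.socketpair()
    browser_side.sendall(b"id=42")  # body is 5 bytes, but we claim 10 below
    try:
        read_post_body(server_side, 10, 0.5)
    except socket.timeout:
        print("server was still waiting for the rest of the body")
```

A GET has no body to read, which would fit the symptom only showing up on POST pages.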

POSTs to other intranet applications on the same server worked fine (for both
large and small content), and other users were also fine. I tried logging in as the
affected user from my browser, and that was also fine! The RT and security
gateway session managers were ruled out, as was the user's database info.

The user was using Win2K with Netscape and Mozilla through a local Solaris
firewall. I've sometimes seen firewalls with over-zealous ICMP restrictions
(which break path-MTU discovery) break connections when large packets are used
(e.g. ssh works, but scp breaks!), but that didn't seem to be the problem here.
We use SSL for all traffic, so it's not a proxy or httpd caching problem.

I got the user to try IE (5.5, I think) on the same machine, and they got in OK!
Doh! (They forgot later this week and used Mozilla again - same problem, so it's
not intermittent.) I also tested as this user from my own Netscape and Mozilla
browsers, except I'm on Linux, and that was fine too.

Anyway, I think there's some quite specific browser-dependent behaviour in RT
and/or mod_perl (or even mod_ssl?) that makes httpd expect more input when none
is coming, so httpd sits in a read that never completes until the global
timeout fires. Because the Apache::Session is locked, other RT requests from
that user (e.g. after pressing stop/reload) also block until the 5-minute
timeout releases the session lock.
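The knock-on effect on stop/reload can be modelled with an ordinary lock. The first request holds the user's session lock while stuck in its read, so any later request for the same session waits behind it. This sketch uses a plain threading.Lock as a hypothetical stand-in for Apache::Session's locking, whose real lock classes work differently. It only shows the blocking pattern, with sub-second times in place of the 5-minute timeout.

```python
import threading
import time
from typing import Optional

# Hypothetical stand-in for the per-session lock Apache::Session holds.
session_lock = threading.Lock()


def stuck_request(hold_seconds: float, started: threading.Event) -> None:
    """First request: takes the session lock, then blocks (the hung read)."""
    with session_lock:
        started.set()
        time.sleep(hold_seconds)  # models waiting for input that never comes


def later_request(wait_seconds: float) -> Optional[float]:
    """Later request (e.g. a reload) for the same session.

    Returns how long it waited for the lock, or None if it gave up first.
    """
    t0 = time.monotonic()
    if not session_lock.acquire(timeout=wait_seconds):
        return None
    try:
        return time.monotonic() - t0
    finally:
        session_lock.release()


if __name__ == "__main__":
    started = threading.Event()
    t = threading.Thread(target=stuck_request, args=(0.6, started))
    t.start()
    started.wait()
    print("short wait:", later_request(0.1))
    print("long wait:", later_request(3.0))
    t.join()
```

Under normal scheduling the short wait gives up while the first request still holds the lock. The long wait only succeeds once the first request lets go, which is what the timeout does for the real session lock.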

This looks very much like Calvin's problem (especially as he's also using SSL -
check the trace!), and Jesse asked about the browser type breakdown, so I'm
hoping he has a candidate for the cause!

This is on RT 2.0.7, which we've had installed for quite a while, without
previous incident.

Cris