[rt-users] rt2.0.9 times out/segfaults

Bob Apthorpe arclight at jump.net
Sun Jan 6 10:20:57 EST 2002


Hi,

This is the part where I sheepishly admit that I'm guessing at how C and/or Unix works. The process reminds me 
of reconstructing an airplane crash by spreading all the pieces out in a big warehouse and trying to figure out why the plane 
suddenly decided not to be airworthy anymore.

On 6 Jan 2002, at 12:31, Calvin Browne wrote:

> > 
> > Hi,
> > 
> > You might try running Apache under 'strace' ('truss' under Solaris) 
> > and repeating whatever causes the segfaults. I've needed to do this 
> > to resolve mysterious Apache problems before.
> > 
> > -- Bob
> 
> Thanks for the info.
[snip] 
> and the strace output for
> 
> [Sun Jan  6 11:21:27 2002] [notice] child pid 6187 exit signal Segmentation fault (11)
> 
> (ummm - its pretty long).
> 
> Any ideas from anyone out there?
> 
> --Calvin

I've clipped some of the output and annotated it as best I can. I'm still not sure what's going on, but I think I can see a timeout 
on a socket read. I'm not sure what's causing the segfault; it looks like process 6187 is just handling the signals it gets. (Note: 
Astute readers will spot my lingering FORTRAN mindset; by "unit" I mean file descriptor.)

> [pid  6179] <... fork resumed> )        = 6187
[... watch as fork is resumed ... ]
[ ... resume opening ssl_mutex.6178 on unit 3; does this really close, or is it still hanging around? ]
> [pid  6187] open("/etc/httpd/logs/ssl_mutex.6178", O_WRONLY <unfinished ...>
> [pid  6187] <... open resumed> )        = 3
> [pid  6187] fcntl64(0x3, 0, 0xf, 0x400168e4 <unfinished ...>
> [pid  6187] <... fcntl64 resumed> )     = 25
> [pid  6187] close(3 <unfinished ...>
> [pid  6187] <... close resumed> )       = 0
[ ... resume some more ... ]
[ resume a socket connection (this is now on unit 3, I think). ]
> [pid  6187] select(19, [17 18], NULL, NULL, NULL <unfinished ...>
> [pid  6187] <... select resumed> )      = 1 (in [17])
[ yuck! what's this "192.16192.168.34"? Typo? ]
> [pid  6187] accept(17, {sin_family=AF_INET, sin_port=htons(32909), sin_addr=inet_addr("192.16192.168.34")}}, [16]) = 3
> [pid  6187] semop(1540098, 0x819fb56, 1) = 0
> [pid  6187] rt_sigaction(SIGUSR1, {SIG_IGN}, {0x80c1cac, [], SA_INTERRUPT|0x4000000}, 8) = 0
> [pid  6187] getsockname(3, {sin_family=AF_INET, sin_port=htons(443), sin_addr=inet_addr("192.168.247.6")}}, [16]) = 0
> [pid  6187] setsockopt(3, SOL_TCP, TCP_NODELAY, [1], 4) = 0
[ ... DBD has trouble loading Postgres libraries (probably not a big deal); same for nonexistent Kerberos libs ]
[ Apache::Session does its magic and some SQL queries occur ... ]
[ Here we see the Storable module getting autoloaded ]
> [pid  6187] stat64("/usr/lib/perl5/site_perl/5.6.0/i386-linux/auto/Storable/thaw.al", {st_mode=S_IFREG|0444, st_size=640, ...}) = 0
> [pid  6187] getgroups32(0x20, 0xbffff230) = 1
> [pid  6187] rt_sigprocmask(SIG_BLOCK, NULL, [], 8) = 0
> [pid  6187] open("/usr/lib/perl5/site_perl/5.6.0/i386-linux/auto/Storable/thaw.al", O_RDONLY) = 9
> [pid  6187] fstat64(9, {st_mode=S_IFREG|0444, st_size=640, ...}) = 0
> [pid  6187] mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x403d2000
> [pid  6187] read(9, "# NOTE: Derived from blib/lib/St"..., 4096) = 640
> [pid  6187] read(9, "", 4096)           = 0
> [pid  6187] close(9)                    = 0
[ memory is freed? ]
> [pid  6187] munmap(0x403d2000, 4096)    = 0
[ what is all this brk() stuff? results of munmap()? ]
> [pid  6187] brk(0x8d35000)              = 0x8d35000
> [pid  6187] brk(0x8d36000)              = 0x8d36000
> [pid  6187] brk(0x8d37000)              = 0x8d37000
> [pid  6187] brk(0x8d38000)              = 0x8d38000
> [pid  6187] brk(0x8d39000)              = 0x8d39000
> [pid  6187] brk(0x8d3a000)              = 0x8d3a000
> [pid  6187] brk(0x8d3b000)              = 0x8d3b000
> [pid  6187] brk(0x8d3c000)              = 0x8d3c000
> [pid  6187] brk(0x8d3e000)              = 0x8d3e000
> [pid  6187] brk(0x8d3f000)              = 0x8d3f000
[ get file info on "/opt/rt2/WebRT/html/Ticket/Display.html" ]
> [pid  6187] stat64("/opt/rt2/WebRT/html/Ticket/Display.html", {st_mode=S_IFREG|0755, st_size=4226, ...}) = 0
> [pid  6187] brk(0x8d50000)              = 0x8d50000
> [pid  6187] brk(0x8d61000)              = 0x8d61000
> [pid  6187] brk(0x8d72000)              = 0x8d72000
> [pid  6187] brk(0x8d73000)              = 0x8d73000
> [pid  6187] brk(0x8d74000)              = 0x8d74000
> [pid  6187] brk(0x8d75000)              = 0x8d75000
> [pid  6187] brk(0x8d76000)              = 0x8d76000
> [pid  6187] brk(0x8d77000)              = 0x8d77000
> [pid  6187] brk(0x8d78000)              = 0x8d78000
> [pid  6187] brk(0x8d79000)              = 0x8d79000
> [pid  6187] brk(0x8d7a000)              = 0x8d7a000
> [pid  6187] brk(0x8d7b000)              = 0x8d7b000
> [pid  6187] brk(0x8d7c000)              = 0x8d7c000
> [pid  6187] brk(0x8d7d000)              = 0x8d7d000
> [pid  6187] read(3,  <unfinished ...>
[ process 6179 sends a timeout signal (SIGALRM) to process 6187. Read timeout on the socket on unit 3? ]
> [pid  6179] kill(6187, SIGALRM)         = 0
> [pid  6187] <... read resumed> 0x8cc3388, 18437) = ? ERESTARTSYS (To be restarted)
> [pid  6187] --- SIGALRM (Alarm clock) ---
[ 6187 catches the signal, flushes output to unit 3 (open socket) and closes it ]
> [pid  6187] write(3, "\25\3\0\0\22nJ\261z$\223:A!\316\370L\2=\234\31\242\0", 23) = 23
> [pid  6187] close(3)                    = 0
[ okay, what's this? Does 6187 catch the SIGALRM, close up shop but fail to exit gracefully? ]
> [pid  6187] sigreturn()                 = ? (mask now [])
[ segfault on the socket close? the chdir() to "/etc/httpd"? ]
> [pid  6187] --- SIGSEGV (Segmentation fault) ---
> [pid  6187] chdir("/etc/httpd")         = 0
> [pid  6187] rt_sigaction(SIGSEGV, {SIG_DFL}, {SIG_DFL}, 8) = 0
> [pid  6187] getpid()                    = 6187
> [pid  6187] kill(6187, SIGSEGV)         = 0
> [pid  6187] sigreturn()                 = ? (mask now [])
[ Let us observe a moment of silence for process 6187 ]
> [pid  6187] --- SIGSEGV (Segmentation fault) ---
> [pid  6179] <... wait4 resumed> [WIFSIGNALED(s) && WTERMSIG(s) == SIGSEGV], WNOHANG, NULL) = 6187
[ ... and life goes on for 6179 ... ]
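
The last few calls from 6187 (chdir, rt_sigaction to SIG_DFL, getpid, kill on itself) look to me like a signal handler that 
catches the first SIGSEGV, restores the default action and re-sends the signal so the process dies with the right status. If 
that reading is right, the real fault happens before the chdir(), not because of it. Below is a minimal Python sketch of that 
pattern. It is not Apache's actual code: the names on_segv and run_child and the chdir("/") target are made up for 
illustration, and the first SIGSEGV is simulated with kill() rather than a real bad memory access.

```python
import signal
import subprocess
import sys
import textwrap

# Child program: install a SIGSEGV handler that does a little cleanup,
# restores the default action, and re-sends the signal to itself.
CHILD = textwrap.dedent("""
    import os, signal

    def on_segv(signum, frame):
        os.chdir("/")                          # stand-in for chdir("/etc/httpd")
        signal.signal(signum, signal.SIG_DFL)  # like rt_sigaction(SIGSEGV, {SIG_DFL}, ...)
        os.kill(os.getpid(), signum)           # like kill(getpid(), SIGSEGV)

    signal.signal(signal.SIGSEGV, on_segv)
    os.kill(os.getpid(), signal.SIGSEGV)       # simulate the first fault
""")


def run_child():
    """Run the child and return its exit status as the parent sees it."""
    return subprocess.run([sys.executable, "-c", CHILD]).returncode


if __name__ == "__main__":
    rc = run_child()
    # subprocess reports death-by-signal as a negative return code.
    print("child terminated by signal", -rc)
```

Run under strace, the child in this sketch should show the same shape as the tail of the trace: a first SIGSEGV, the 
handler's calls, then a second SIGSEGV that actually kills it, with the parent seeing a process terminated by SIGSEGV.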

My big question is where the SIGALRM is coming from within process 6179. Also, I'm curious about the 
'inet_addr("192.16192.168.34")' call -- this looks gruesome unless my mail has been getting mangled in transit.
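
To make the timeout part concrete, here is a small Python sketch of what the trace appears to show: a child blocked in 
read() on a descriptor, and a parent that gives up waiting and sends it SIGALRM. Why 6179 decides to send the signal is 
exactly what I don't know, so the fixed delay here is an assumption, as are the names Timeout, _on_alarm and 
child_read_until_alarm and the exit code 7.

```python
import os
import signal
import time


class Timeout(Exception):
    """Raised from the SIGALRM handler to abandon a blocked read."""


def _on_alarm(signum, frame):
    raise Timeout()


def child_read_until_alarm(delay=0.5):
    """Fork a child that blocks in read(); the parent interrupts it with SIGALRM.

    Returns the child's exit code: 7 if the read was abandoned because of
    the alarm, 0 if the read returned normally.
    """
    r, w = os.pipe()
    # Install the handler before forking so the child inherits it.
    old = signal.signal(signal.SIGALRM, _on_alarm)
    pid = os.fork()
    if pid == 0:
        code = 0
        try:
            os.read(r, 18437)   # nobody writes, so this blocks like read(3, ...)
        except Timeout:
            code = 7            # the handler fired while we were waiting
        os._exit(code)

    # Parent: restore its own handler, wait a bit, then time the child out.
    signal.signal(signal.SIGALRM, old)
    time.sleep(delay)
    os.kill(pid, signal.SIGALRM)    # like kill(6187, SIGALRM)
    _, status = os.waitpid(pid, 0)
    os.close(r)
    os.close(w)
    return os.WEXITSTATUS(status) if os.WIFEXITED(status) else None


if __name__ == "__main__":
    print("child exit code:", child_read_until_alarm())
```

In this sketch the child exits cleanly after the alarm. In Calvin's trace the child instead writes to the socket, closes 
it, and then segfaults, so whatever goes wrong seems to happen in or just after the timeout handling.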

hth,

-- Bob




