When one of my systems rebooted this morning, I found monit (4.8.1, backported Debian package) spinning. I attached to it with gdb and did a backtrace; it had only the poll thread (this system's running kernel 2.6.16, but under Xen 3.0.2 with Debian sarge, so TLS is disabled and we're back on LinuxThreads) and one other, which was spinning in what gdb reports as glibc's mallopt, called from malloc. This is symptomatic of a double-free bug, and given that there's only one real thread it appears to be in the startup code.
Even aside from the missing symbols, I find the stack trace somewhat suspect; I can't see how signal would end up calling glob. If it happens again I guess I'll try out a debug build. Unfortunately it was fine when I restarted monit, and the server doesn't get rebooted very often at all, so I may not get much further info.
Anything else I should try?
Cheers,
Will
address@hidden:~$ sudo gdb -p 24683
GNU gdb 6.3-debian
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "i386-linux".
Attaching to process 24683
Using host libthread_db library "/lib/libthread_db.so.1".
warning: could not load vsyscall page because no executable was specified
warning: try using the "file" command first
Reading symbols from /usr/sbin/monit...(no debugging symbols found)...done.
Reading symbols from /lib/libpthread.so.0...(no debugging symbols found)...done.
[Thread debugging using libthread_db enabled]
[New Thread 16384 (LWP 24683)]
[New Thread 32769 (LWP 24684)]
Loaded symbols for /lib/libpthread.so.0
Reading symbols from /lib/libcrypt.so.1...(no debugging symbols found)...done.
Loaded symbols for /lib/libcrypt.so.1
Reading symbols from /lib/libresolv.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib/libresolv.so.2
Reading symbols from /lib/libnsl.so.1...(no debugging symbols found)...done.
Loaded symbols for /lib/libnsl.so.1
Reading symbols from /usr/lib/i686/cmov/libssl.so.0.9.7...(no debugging symbols found)...done.
Loaded symbols for /usr/lib/i686/cmov/libssl.so.0.9.7
Reading symbols from /usr/lib/i686/cmov/libcrypto.so.0.9.7...(no debugging symbols found)...done.
Loaded symbols for /usr/lib/i686/cmov/libcrypto.so.0.9.7
Reading symbols from /lib/libc.so.6...(no debugging symbols found)...done.
Loaded symbols for /lib/libc.so.6
Reading symbols from /lib/ld-linux.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib/ld-linux.so.2
Reading symbols from /lib/libdl.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib/libdl.so.2
Reading symbols from /lib/libnss_compat.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib/libnss_compat.so.2
Reading symbols from /lib/libnss_nis.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib/libnss_nis.so.2
Reading symbols from /lib/libnss_files.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib/libnss_files.so.2
Reading symbols from /lib/libnss_dns.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib/libnss_dns.so.2
0x3ad0f51e in mallopt () from /lib/libc.so.6
(gdb) info threads
  2 Thread 32769 (LWP 24684) 0x3ad6ba5a in poll () from /lib/libc.so.6
  1 Thread 16384 (LWP 24683) 0x3ad0f51e in mallopt () from /lib/libc.so.6
(gdb) thread 1
[Switching to thread 1 (Thread 16384 (LWP 24683))]#0 0x3ad0f51e in mallopt () from /lib/libc.so.6
(gdb) bt
#0 0x3ad0f51e in mallopt () from /lib/libc.so.6
#1 0x3ad0ed4b in mallopt () from /lib/libc.so.6
#2 0x3ad0df33 in malloc () from /lib/libc.so.6
#3 0x3ad40296 in opendir () from /lib/libc.so.6
#4 0x3ad48457 in glob_pattern_p () from /lib/libc.so.6
#5 0x3ad476ad in glob () from /lib/libc.so.6
#6 0x080704f7 in signal ()
#7 0x080709d1 in signal ()
#8 0x0806d061 in signal ()
#9 0x0806f7c8 in signal ()
#10 0x08051d6e in ?? ()
#11 0x08093380 in ?? ()
#12 0xafc78c83 in ?? ()
#13 0xafc77818 in ?? ()
#14 0x08051d60 in ?? ()
#15 0x3adccea0 in errno () from /lib/libc.so.6
#16 0xafc7780c in ?? ()
#17 0xafc77818 in ?? ()
#18 0x08071d0b in signal ()
#19 0x3acb2e36 in __libc_start_main () from /lib/libc.so.6
#20 0x0804b3a1 in ?? ()
(gdb) thread 2
[Switching to thread 2 (Thread 32769 (LWP 24684))]#0 0x3ad6ba5a in poll () from /lib/libc.so.6
(gdb) bt
#0 0x3ad6ba5a in poll () from /lib/libc.so.6
#1 0x3aaccb50 in __pthread_manager () from /lib/libpthread.so.0
#2 0x3ad748aa in clone () from /lib/libc.so.6
(gdb) c
Continuing.
Program received signal SIGINT, Interrupt.
[Switching to Thread 16384 (LWP 24683)]
0x3ad0f578 in mallopt () from /lib/libc.so.6
(gdb) bt
#0 0x3ad0f578 in mallopt () from /lib/libc.so.6
#1 0x3ad0ed4b in mallopt () from /lib/libc.so.6
#2 0x3ad0df33 in malloc () from /lib/libc.so.6
#3 0x3ad40296 in opendir () from /lib/libc.so.6
#4 0x3ad48457 in glob_pattern_p () from /lib/libc.so.6
#5 0x3ad476ad in glob () from /lib/libc.so.6
#6 0x080704f7 in signal ()
#7 0x080709d1 in signal ()
#8 0x0806d061 in signal ()
#9 0x0806f7c8 in signal ()
#10 0x08051d6e in ?? ()
#11 0x08093380 in ?? ()
#12 0xafc78c83 in ?? ()
#13 0xafc77818 in ?? ()
#14 0x08051d60 in ?? ()
#15 0x3adccea0 in errno () from /lib/libc.so.6
#16 0xafc7780c in ?? ()
#17 0xafc77818 in ?? ()
#18 0x08071d0b in signal ()
#19 0x3acb2e36 in __libc_start_main () from /lib/libc.so.6
#20 0x0804b3a1 in ?? ()
(gdb) c
Continuing.
Program received signal SIGINT, Interrupt.
0x3ad0f4f4 in mallopt () from /lib/libc.so.6
(gdb) bt
#0 0x3ad0f4f4 in mallopt () from /lib/libc.so.6
#1 0x3ad0ed4b in mallopt () from /lib/libc.so.6
#2 0x3ad0df33 in malloc () from /lib/libc.so.6
#3 0x3ad40296 in opendir () from /lib/libc.so.6
#4 0x3ad48457 in glob_pattern_p () from /lib/libc.so.6
#5 0x3ad476ad in glob () from /lib/libc.so.6
#6 0x080704f7 in signal ()
#7 0x080709d1 in signal ()
#8 0x0806d061 in signal ()
#9 0x0806f7c8 in signal ()
#10 0x08051d6e in ?? ()
#11 0x08093380 in ?? ()
#12 0xafc78c83 in ?? ()
#13 0xafc77818 in ?? ()
#14 0x08051d60 in ?? ()
#15 0x3adccea0 in errno () from /lib/libc.so.6
#16 0xafc7780c in ?? ()
#17 0xafc77818 in ?? ()
#18 0x08071d0b in signal ()
#19 0x3acb2e36 in __libc_start_main () from /lib/libc.so.6
#20 0x0804b3a1 in ?? ()
(gdb) quit
The program is running. Quit anyway (and detach it)? (y or n) y
Detaching from program: /usr/sbin/monit, process 24683