Debugging a Segfault
There's a brief page on cfengine segfaults at http://www.cfengine.org/confdir/segfault.html, but it doesn't describe "post facto" debugging. Since it is rarely feasible to run production services under a debugger, a corefile is usually all we are left with. Let me emphasise this first:
DO NOT EMAIL COREFILES TO THE MAILING LIST!
A corefile is not helpful to anyone except you, because nobody else has access to the binary that generated the corefile.
How to get a useful core file
First, if you're using a package manager (such as RPM or pkgadd), make sure the build process a) preserves the '-g' flag while compiling; and b) does not use strip(1) on the compiled binaries. On the most recent Red Hat-based distributions, the debugging info is stripped from the main binaries but saved in a separate package named packagename-debuginfo-version.arch.rpm; install that package so that gdb can find the symbols.
Next, make sure your shell is configured to allow core files to be created. Core files are often disabled for security purposes, but in our case we want them to be produced. This can be done for a single shell session, without altering the system values that were previously set.
In almost all shells, the ulimit command controls whether core files will be generated. Use the command
ulimit -c unlimited
(which sets the core file maximum size), then use the command
ulimit -a
to verify that core files will indeed be generated. Start cfservd or cfagent directly from this shell; if you use a startup script, the ulimit value may not be inherited into its environment.
Note that the ulimit command is usually a shell built-in, but is sometimes a system binary (if there is no shell built-in). Note, too, that the setting is inherited by child processes: if a shell changes the values set by ulimit, every shell and program started from it afterwards will have the new value.
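If the daemon has to be started from a script, one option is to raise the limit inside that script, just before the daemon is launched. The following is only a sketch: the function name enable_core_dumps is made up for illustration, and the /usr/sbin/cfservd path in the final comment is an assumption about where your binary lives.

```shell
#!/bin/sh
# Raise the soft core-size limit for this shell and everything it starts.
# If the hard limit is lower than "unlimited", fall back to the hard limit.
enable_core_dumps() {
    ulimit -c unlimited 2>/dev/null || ulimit -c "$(ulimit -H -c)" 2>/dev/null
    # Print the limit that is now in effect ("unlimited" or a block count).
    ulimit -c
}

# Call the function directly (not inside $(...)), otherwise the limit
# would only change in a subshell and be lost.
enable_core_dumps

# The daemon would then be started from this same shell, for example:
# exec /usr/sbin/cfservd
```

If the printed value is 0, the hard limit forbids core files and has to be raised by root before this will help.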
Your core file will be generated in the cwd (or current working directory) of the binary - you can use a tool like lsof(8) to find out what that is, or look at the /proc entry for the pid of your binary, if your operating system has a procfs filesystem.
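On systems with a Linux-style procfs, the lookup can be done with readlink(1). This is a small sketch under that assumption; the function name proc_cwd is hypothetical, and here it is pointed at the current shell's own pid purely as a demonstration.

```shell
#!/bin/sh
# Print the current working directory of a running process.
# Assumes a Linux-style /proc where /proc/PID/cwd is a symlink.
proc_cwd() {
    readlink "/proc/$1/cwd"
}

# Demonstration: show the cwd of this shell itself.
# For a daemon, pass its pid instead of $$.
proc_cwd "$$"
```

Reading the cwd of a process owned by another user generally requires root.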
To test that a core file will be generated the way you expect, you can send your process a SEGV signal with kill -SEGV followed by its pid. You should end up with a nice core file, suitable for running gdb on.
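To rehearse this without touching a production daemon, the same signal can be sent to a throwaway process. The sketch below uses sleep(1) as a stand-in for cfservd (an assumption made only for the demonstration) and checks the exit status the shell reports, which is 128 plus the signal number (11 for SIGSEGV).

```shell
#!/bin/sh
# Start a disposable background process to stand in for the daemon.
sleep 60 &
pid=$!

# Send it SIGSEGV, exactly as you would to the real process.
kill -SEGV "$pid"

# Collect the exit status; a process killed by signal 11 reports 128 + 11.
status=0
wait "$pid" || status=$?
echo "exit status: $status"
```

Whether a core file actually appears, and under what name, still depends on the ulimit setting described above and on how the operating system is configured to name core files.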
What to do once you have a corefile
Invoke gdb on your corefile with the '-c' flag:
[ Copyright snipped ]
gdb -c /path/to/corefile
Program terminated with signal 11, Segmentation fault.
#0 0xb73853b8 in ?? ()
(gdb)
At the (gdb) prompt, load the binary which generated the core with the 'file' command:
(gdb) file /usr/sbin/cfservd
Reading symbols from /usr/sbin/cfservd...Reading symbols from /usr/lib/debug//usr/sbin/cfservd.debug...done.
Using host libthread_db library "/lib/tls/libthread_db.so.1".
done.
Now use the 'bt' (backtrace) command to display what the program was doing when it crashed:
(gdb) bt
#0 0x08064976 in DeleteItemGeneral (list=0x80924b0, string=0x405f2af0 "10.10.16.209", type=regexComplete) at item-ext.c:707
#1 0x4707521c in ?? ()
#2 0x08064baf in DeleteItemMatching (list=0x80924b0, string=0x405f2af0 "10.10.16.209") at item-ext.c:767
#3 0x08050a28 in DeleteConn (conn=0x1) at cfservd.c:3268
#4 0x0804c6fb in HandleConnection (conn=0x40508b50) at cfservd.c:1129
#5 0x401d32b6 in ?? ()
The above should be enough to pinpoint the problem, but optionally the 'bt full' command can give more information.
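The interactive steps above can also be collapsed into a single non-interactive command, which is convenient for capturing the backtrace to a file. This is a sketch, not the only way to do it: the function name core_backtrace is made up, and it relies on gdb's -batch and -ex options and on gdb accepting the binary and the core file as its two positional arguments.

```shell
#!/bin/sh
# Print a backtrace from a core file without an interactive gdb session.
# Usage: core_backtrace /path/to/binary /path/to/corefile
core_backtrace() {
    binary=$1
    core=$2
    # Refuse to run gdb if either file is missing or unreadable.
    if [ ! -r "$binary" ] || [ ! -r "$core" ]; then
        echo "usage: core_backtrace BINARY COREFILE (both must be readable)" >&2
        return 2
    fi
    # -batch makes gdb exit after running its commands;
    # -ex runs a single gdb command, here the same 'bt' as above.
    gdb -batch -ex 'bt' "$binary" "$core"
}
```

For example, core_backtrace /usr/sbin/cfservd /path/to/corefile > backtrace.txt would save the output for mailing; replace 'bt' with 'bt full' in the function if more detail is wanted.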
Mail that output to the list. And, as Mark says: are you running the latest version?