Less known Solaris features: About crashes and cores - Part 4: Crashdump analysis for beginners
Okay, now that you have all these crash and core dumps, it would be nice to do something useful with them. I will show you just a few basic tricks to get some insight into the state of a system at the moment it wrote a crash dump.
Basic analysis of a crash dump with mdb
First we load the dump into the
A useful piece of information is the backtrace. It helps you find out what triggered the crash dump. In this case it's easy. It was the
But it would be nice to know more about the state of the system at the moment of the crash. For example, we can print out the process table of the system like we would do it with
We can even look up which files or sockets were open at the moment of the crash dump. For example, we want to know the open files of the ssh daemon. To get this information, we have to use the address of the process from the process table (the eighth column) and extend it with
And here we look into the open files of the
As the crash dump contains all the pages of the kernel (or more, if you configure it that way), you have a frozen state of your system in which you can investigate everything you want.
And to get back to my security example: with the crash dump and mdb you can gather really interesting information. For example, you can see that an ssh connection was open at the time of the crash dump.
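If you want to repeat these steps for more than one dump, you can feed the dcmds to mdb on standard input instead of typing them at the prompt. The following is just a minimal sketch in Python and not part of the original walkthrough. The file names unix.0 and vmcore.0 are the usual savecore defaults and are an assumption here, the helper names build_mdb_script and run_mdb are made up for this example, and the process address is a placeholder that you would copy from the ::ps output.

```python
import shutil
import subprocess


def build_mdb_script(pfiles_addr=None):
    """Return the dcmds to feed to mdb, one per line.

    ::status prints a summary of the dump, $C the backtrace and ::ps the
    process table. pfiles_addr is the process address taken from ::ps.
    """
    dcmds = ["::status", "$C", "::ps"]
    if pfiles_addr is not None:
        dcmds.append(f"{pfiles_addr}::pfiles")
    return "\n".join(dcmds) + "\n"


def run_mdb(unix_file, vmcore_file, script):
    """Pipe the script into mdb; return None when mdb is not installed."""
    if shutil.which("mdb") is None:
        return None
    result = subprocess.run(
        ["mdb", unix_file, vmcore_file],
        input=script,
        capture_output=True,
        text=True,
        check=False,
    )
    return result.stdout


if __name__ == "__main__":
    # Assumed file names; adjust them to whatever savecore wrote on your system.
    script = build_mdb_script()
    output = run_mdb("unix.0", "vmcore.0", script)
    # Without mdb on the machine, just show what would have been sent.
    print(output if output is not None else script)
```

Run it once without an address, pick the address of the interesting process from the process table, and then call build_mdb_script again with that address to get the open files as well.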
An example from the field
You can do it like the pros and look at the source code and the crash dump side by side to find the root cause of an error. Or like some colleagues at the Sun Mission Critical Support Center (it wouldn't surprise me if they found the error by laying their hands on a system).
For everyone else, there is a simpler way to analyse your crash dump, so that you have at least a bit more information to search for in a bug database.
I will use a crash I analysed a long time ago to show you the trick. Okay, you have to start a debugger. I used mdb in this example:
A prompt appears. Just type in $C to get a stack trace.
Okay, now start at the beginning of the trace and strip all lines that belong to the operating system's error-handling infrastructure. vpanic() generates the panic. The second line is useless for our purposes, too. The next two lines with segmap are generated by the error, but they are not the root cause. The interesting line is
With this name you can go to Sunsolve or bugs.opensolaris.org. Et voilà: System panic due to recursive mutex_enter in snf_smap_desbfree trying to re-aquire Tx mutex. When you type this error into the PatchFinder, you will find a patch fixing this bug: 124255-03
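The stripping of the infrastructure lines can also be sketched in a few lines of Python. This is only an illustration of the idea, under some loud assumptions: the sample trace is synthetic (the addresses, offsets, arguments and the names segmap_x, segmap_y and caller_frame are placeholders, only vpanic, the segmap prefix and snf_smap_desbfree come from the text above), and the list of prefixes to skip is something you would have to extend yourself, for example for the unnamed second line of the real trace.

```python
import re

# Prefixes of frames that only report the failure. "vpanic" and "segmap" are
# taken from the walkthrough; everything else would have to be added by hand.
NOISE_PREFIXES = ("vpanic", "segmap")

# One line of $C output: a frame address, then function+offset(arguments).
FRAME_RE = re.compile(r"^\s*\S+\s+([A-Za-z_][\w.]*)")


def function_names(trace):
    """Return the function name of every frame line, top of stack first."""
    names = []
    for line in trace.splitlines():
        match = FRAME_RE.match(line)
        if match:
            names.append(match.group(1))
    return names


def first_interesting_frame(trace, noise=NOISE_PREFIXES):
    """Skip the leading frames that match a noise prefix, return the next one."""
    for name in function_names(trace):
        if not name.startswith(tuple(noise)):
            return name
    return None


# Synthetic trace for illustration only, not the output of a real dump.
SAMPLE_TRACE = """\
0000000000000001 vpanic(0, 0)
0000000000000002 segmap_x+0x10(0, 0, 0)
0000000000000003 segmap_y+0x20(0, 0, 0)
0000000000000004 snf_smap_desbfree+0x30(0)
0000000000000005 caller_frame+0x40(0, 0)
"""

if __name__ == "__main__":
    print(first_interesting_frame(SAMPLE_TRACE))
```

The name it returns is the one you would type into the bug database search.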
- It's good practice to know mdb. It's very useful when compiling open source software, in case your compiled code dumps core and you don't know why. Core files are not just there to be deleted.
- An error report with a stack trace is more useful than an error report that just says "The system panicked when I did this".