About crashes and cores

No software is free of errors; this is a basic law of computing. And when, by some strange stroke of luck, there is no bug in the software, your hardware has bugs. And when there are no bugs in the hardware, cosmic rays flip bits. Thus an operating system needs mechanisms to stop a process or the complete kernel at once, without allowing the system to write anything back to disk and thereby make the corrupted state permanent. This tutorial covers the most important concepts surrounding the last life signs of a system or an application.

A plea for the panic

The panic isn’t the bug; it’s the reaction of the system to a bug. Many people think of panics as the result of instability, something bad like the bogey man. But panics and crash dumps are your friends. Whenever the system detects an inconsistency in its structures, it does the best thing it can do: protect your data. And the best way to do this is to give the system a fresh start: don’t try to modify the data on disk, and write some status information to a special device to enable later analysis of the problem. The concepts of panic and crash dump were developed to give the admin exactly such tools.

A good example: Imagine a problem in UFS; a bit has flipped, and the operating environment detects an inconsistency in its data structures. You can’t keep working with this error; it would unpredictably alter the data on your disk. You can’t shut down the system with a normal reboot either, as flushing the filesystems would alter the data on disk. The only way out is to stop everything and restart the system: panic the machine and write a crash dump.

Furthermore: Some people look at core and crash dumps, think of their analysis as an arcane art, and see them as a waste of disk space. But it’s really easy to get some basic data and hints out of these large heaps of data.

Difference between Crash Dumps and Cores

Many people use these words synonymously ("The system panicked and wrote a core dump"). Every now and then I do this as well. But it isn’t correct; the scope of the dump is quite different:

crash dump

A crash dump is a dump of the memory of the complete kernel.

core dump

A core dump is a dump of the memory of a single process.
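To make this concrete: on Unix systems a process dumps core when it is terminated by a signal whose default disposition is "core", such as SIGABRT. The following sketch (illustrative and portable, not Solaris-specific; the helper name is my own) forks a child that calls abort() and inspects how it terminated:

```python
# Minimal sketch: a process dumps core when it is killed by a signal
# whose default disposition is "core", e.g. SIGABRT raised by abort(3).
# Whether a core file is actually written depends on the core dump
# configuration (coreadm on Solaris, ulimit -c elsewhere).
import os
import signal

def run_child_that_aborts():
    """Fork a child that calls abort(); report how it terminated."""
    pid = os.fork()
    if pid == 0:
        os.abort()                          # child dies with SIGABRT
    _, status = os.waitpid(pid, 0)
    return {
        "signaled": os.WIFSIGNALED(status),
        "signal": os.WTERMSIG(status),
        # may be False if the core size limit forbids writing a core
        "core_written": os.WCOREDUMP(status),
    }

if __name__ == "__main__":
    print(run_child_that_aborts())
```

Note that the parent can only see *that* a core was written, not where; the location is governed by the core dump configuration discussed below.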

Forcing dumps

Okay, dumps are not only a consequence of errors; you can force the generation of both kinds. This is really useful when you want to freeze the current state of the system or of an application for further examination.

Forcing a core dump

Let’s assume you want a core dump of a process running on your system:

# ps -ef  | grep "bash" | grep "jmoekamp"
jmoekamp   681   675   0 20:59:39 pts/1       0:00 bash

Okay, now we can trigger the core dump using the process ID:

# gcore 681
gcore: core.681 dumped

Okay, but the kicker is that the process still runs afterwards, so you can get a core dump of your process for analysis without interrupting it.

# ps -ef  | grep "bash" | grep "jmoekamp"
jmoekamp   681   675   0 20:59:39 pts/1       0:00 bash

Neat, isn’t it? Now you can use mdb to analyse it, for example to print the stack backtrace:

# mdb core.681
Loading modules: [ libc.so.1 ld.so.1 ]
> $c
libc.so.1`__waitid+0x15(0, 2a9, 8047ca0, 83)
libc.so.1`waitpid+0x63(2a9, 8047d4c, 80)
waitjob+0x51(8077098)
postjob+0xcd(2a9, 1)
execute+0x77d(80771c4, 0, 0)
exfile+0x170(0)
main+0x4d2(1, 8047e48, 8047e50)
_start+0x7a(1, 8047eec, 0, 8047ef0, 8047efe, 8047f0f)

Forcing a crash dump

Okay, you can force a crash dump, too. It’s quite easy: you trigger it with the uadmin command.

bash-3.2# uadmin 5 0

panic[cpu0]/thread=db47700: forced crash dump initiated at user request

d50a2f4c genunix:kadmin+10c (5, 0, 0, db325400)
d50a2f84 genunix:uadmin+8e (5, 0, 0, d50a2fac, )

syncing file systems... 2 1 done
dumping to /dev/dsk/c0d0s1, offset 108593152, content kernel
100% done: 31255 pages dumped, compression ratio 5.31, dump succeeded
Press any key to reboot.

Why should you do something like that? There are several reasons, for example when you want to stop a system right at this moment. There is an effect in clusters called "split brain". It happens when both systems believe they are the surviving one because they’ve lost the cluster interconnect. Sun Cluster prevents this situation with something called a quorum. In a high-availability situation the nodes of a cluster try to get this quorum, and whoever gets it runs the service. But you have to ensure that the other nodes don’t even try to write anything to the disks. The simplest method: panic the machine.

Another use case would be the detection of a security breach. Let’s assume your developer accidentally integrated a security hole as large as the Rhine into a web application, and now someone else owns your machine. The wrong reaction would be to switch the system off or trigger a normal reboot: both would lose the memory contents, and perhaps the attacker has integrated a tool into the shutdown procedure to erase the logs. A better possibility: trigger a crash dump. You keep the contents of memory, and you can analyse it for traces of the attacker.

Controlling the behaviour of the dump facilities

Solaris has mechanisms to control the behaviour of dumps. These mechanisms are different for crash and core dumps.

Crash dumps

You can configure what a crash dump contains, where it is stored, and what happens with it after the next boot. You control this behaviour with the dumpadm command. When you use this command without any further options, it prints the current state.

# dumpadm
      Dump content: kernel pages
       Dump device: /dev/dsk/c0d0s1 (swap)
Savecore directory: /var/crash/incubator
  Savecore enabled: yes

This is the default setting: a crash dump contains only the memory pages of the kernel and uses /dev/dsk/c0d0s1 (the swap device) to store the crash dump in the case of a kernel panic. savecore is a special process that runs at the next boot of the system. If there is a crash dump on the dump device, it copies the dump to the configured directory to preserve it for analysis before the device is used for swapping again.

Let’s change the behaviour. First we configure that the complete memory is saved to the crash dump in case of a panic. This is easy:

# dumpadm -c all
      Dump content: all pages
       Dump device: /dev/dsk/c0d0s1 (swap)
Savecore directory: /var/crash/incubator
  Savecore enabled: yes

Okay, now let’s change the location for the crash dump. The current name is an artefact of my original VM image, called incubator; to get a new test machine I clone this image. I want to use the directory /var/crash/theoden for this purpose.

# mkdir /var/crash/theoden
# chmod 700 /var/crash/theoden
# dumpadm -s /var/crash/theoden
      Dump content: all pages
       Dump device: /dev/dsk/c0d0s1 (swap)
Savecore directory: /var/crash/theoden
  Savecore enabled: yes

Now the system will use the new directory to store crash dumps. Setting the permissions of the directory to 700 is important: a crash dump may contain sensitive information, so it could be dangerous to make it readable by anyone other than root.

Core dumps

A similar facility exists for core dumps. You can control their behaviour with the coreadm command. As with dumpadm, you get the current configuration by using coreadm without any options.

# coreadm
     global core file pattern: 
     global core file content: default
       init core file pattern: core
       init core file content: default
            global core dumps: disabled
       per-process core dumps: enabled
      global setid core dumps: disabled
 per-process setid core dumps: disabled
     global core dump logging: disabled

This program has more options than dumpadm. I won’t go through all of them, just some important ones.

From my point of view the file name patterns are the most interesting ones. They control where core dumps are stored. The default is to store a core dump in the working directory of the process, but this may lead to core dumps dispersed all over the filesystem.

With coreadm you can configure a central location for all your core dumps.

# coreadm -i /var/core/core.%n.%f.%u.%p
# coreadm -u

With -i you tell coreadm to set the location for per-process core dumps. The parameter of this option is the file name pattern for new core dumps, and you can use variables in it: for example, %n is translated to the machine name, %f to the name of the executable, %u to the effective user ID of the process, and %p to the process ID. coreadm -u forces an immediate reload of the configuration; otherwise the setting would only become active at the next boot or the next refresh of the coreadm service. Okay, let’s try our configuration.

# ps -ef | grep "bash" | grep "jmoekamp"
jmoekamp   681   675   0 20:59:39 pts/1       0:00 bash

Now we trigger a core dump for a running process.

# gcore -p 681
gcore: /var/core/core.theoden.bash.100.681 dumped

As you can see, the core dump isn’t written to the current working directory of the process; it’s written to the configured location.
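To illustrate how such a pattern is expanded (this is a re-implementation for demonstration only, not coreadm’s actual code), the following sketch substitutes the %n, %f, %u and %p tokens the same way the example above shows:

```python
# Illustrative sketch (not the coreadm implementation): expanding a
# core file name pattern such as /var/core/core.%n.%f.%u.%p. Token
# meanings follow coreadm(1M): %n machine name, %f executable name,
# %u effective user ID, %p process ID; %% yields a literal %.
def expand_core_pattern(pattern, node, exec_name, euid, pid):
    substitutions = {
        "n": node,
        "f": exec_name,
        "u": str(euid),
        "p": str(pid),
        "%": "%",
    }
    out, i = [], 0
    while i < len(pattern):
        if pattern[i] == "%" and i + 1 < len(pattern):
            # unknown tokens are passed through unchanged
            out.append(substitutions.get(pattern[i + 1], "%" + pattern[i + 1]))
            i += 2
        else:
            out.append(pattern[i])
            i += 1
    return "".join(out)

if __name__ == "__main__":
    print(expand_core_pattern("/var/core/core.%n.%f.%u.%p",
                              "theoden", "bash", 100, 681))
    # → /var/core/core.theoden.bash.100.681, as in the gcore output above
```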

Core dump configuration for the normal user

The two configurations described so far are global, so you can change them only with root privileges. But a normal user can manipulate the core dump configuration as well, albeit only for his or her own processes.

Let’s log in as a normal user. Now we check one of our processes for its coreadm configuration:

$ ps -ef | grep "jmoekamp"
jmoekamp   712   670   0 01:27:38 pts/1       0:00 -sh
jmoekamp   669   666   0 22:29:15 ?           0:00 /usr/lib/ssh/sshd
jmoekamp   670   669   0 22:29:16 pts/1       0:00 -sh
jmoekamp   713   712   0 01:27:38 pts/1       0:00 ps -ef
$ coreadm 669
669:    /var/core/core.%n.%f.%u.%p      default

Now let’s check a process owned by root.

$ ps -ef | grep "cron"
jmoekamp   716   670   0 01:28:13 pts/1       0:00 grep cron
    root   322     1   0 22:25:24 ?           0:00 /usr/sbin/cron
$ coreadm 322
322: Not owner

The system denies access to this information. Now we change the setting for process 669 from the first example. It’s quite simple:

$ coreadm -p /export/home/jmoekamp/cores/core.%n.%f.%u.%p 669
$ coreadm 669
669:    /export/home/jmoekamp/cores/core.%n.%f.%u.%p    default

The per-process core file name pattern is inherited by future child processes of the affected process.

Why should you set a custom path and file name for an application or a user? There are several reasons: for example, to ensure that the directory for the cores has the correct permissions when a process starts to dump core, or to separate the cores of certain applications into different locations.
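Besides coreadm there is a much older per-process control that exists on any Unix: the core file size resource limit (ulimit -c in the shell, RLIMIT_CORE programmatically). A soft limit of 0 suppresses core dumps for a process and its children. A hedged sketch using Python’s resource module:

```python
# Sketch: the classic per-process knob independent of coreadm is the
# core file size limit (RLIMIT_CORE, "ulimit -c" in the shell). A soft
# limit of 0 suppresses core dumps for this process and its children.
import resource

def suppress_core_dumps():
    """Set the soft core size limit to 0, keeping the hard limit."""
    _soft, hard = resource.getrlimit(resource.RLIMIT_CORE)
    resource.setrlimit(resource.RLIMIT_CORE, (0, hard))
    return resource.getrlimit(resource.RLIMIT_CORE)

if __name__ == "__main__":
    print(suppress_core_dumps())   # (0, <unchanged hard limit>)
```

The inverse, raising the soft limit back up to the hard limit, is how you re-enable core dumps in an environment where they were disabled.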

Crashdump analysis for beginners

Basic analysis of a crash dump with mdb

Okay, now that you have all these crash and core dumps, it would be nice to do something useful with them. I’ll show you some basic tricks to get insight into the state of a system at the moment it wrote a crash dump.

First we load the dump into mdb:

# mdb unix.0 vmcore.0  
Loading modules: [ unix genunix specfs cpu.generic uppc scsi_vhci ufs ip hook neti sctp arp usba nca lofs zfs random nsctl sdbc rdc sppp ]
>

Solaris has an in-memory buffer for console messages. When the system writes a crash dump, these messages obviously end up in the crash dump as well. With the ::msgbuf command of mdb you can read this message buffer.

> ::msgbuf
MESSAGE                                                               
SunOS Release 5.11 Version snv_84 32-bit
Copyright 1983-2008 Sun Microsystems, Inc.  All rights reserved.
Use is subject to license terms.
features: 10474df<cpuid,sse3,sse2,sse,sep,cx8,mmx,cmov,pge,mtrr,msr,tsc,lgpg>
mem = 331388K (0x1439f000)
root nexus = i86pc
pseudo0 at root
pseudo0 is /pseudo
[...]
devinfo0 is /pseudo/devinfo@0

panic[cpu0]/thread=db3aea00: 
forced crash dump initiated at user request


d5efcf4c genunix:kadmin+10c (5, 0, 0, db5c8a98)
d5efcf84 genunix:uadmin+8e (5, 0, 0, d5efcfac, )

syncing file systems...
 done
dumping to /dev/dsk/c0d0s1, offset 108593152, content: all
>

So it’s really easy to get the last messages of a dying system with mdb from the crash dump alone.

Another nice piece of information is the backtrace. It helps you find out what triggered the crash dump. In this case it’s easy: it was the uadmin syscall.

> $c   
vpanic(fea6388c)
kadmin+0x10c(5, 0, 0, db39e550)
uadmin+0x8e()
sys_sysenter+0x106()

But it would be nice to know more about the state of the system at the moment of the crash. For example, we can print the process table of the system, just as ps would:

> ::ps
S    PID   PPID   PGID    SID    UID      FLAGS     ADDR NAME
R      0      0      0      0      0 0x00000001 fec1d3d0 sched
[...]
R    586      1    586    586      0 0x42000000 d55f58a8 sshd
R    545      1    545    545      0 0x42000000 d5601230 fmd
R    559      1    559    559      0 0x42000000 d55fb128 syslogd
[...]
R    533    494    494    494      0 0x4a014000 d55f19c0 ttymon

We can even look up which files or sockets were open at the moment of the crash dump. For example, we want to know the open files of the SSH daemon. To get this information, we take the address of the process from the process table (the eighth column) and append ::pfiles:

> d55f58a8::pfiles
FD   TYPE    VNODE INFO
   0  CHR d597d540 /devices/pseudo/mm@0:null 
   1  CHR d597d540 /devices/pseudo/mm@0:null 
   2  CHR d597d540 /devices/pseudo/mm@0:null 
   3 SOCK db688300 socket: AF_INET6 :: 22 

And here we look at the open files of the syslog process.

> d55fb128::pfiles
FD   TYPE    VNODE INFO
   0  DIR d5082a80 / 
   1  DIR d5082a80 / 
   2  DIR d5082a80 / 
   3 DOOR d699b300 /var/run/name_service_door [door to 'nscd' (proc=d5604890)]
   4  CHR db522cc0 /devices/pseudo/sysmsg@0:sysmsg 
   5  REG db643840 /var/adm/messages 
   6  REG db6839c0 /var/log/syslog 
   7  CHR db522840 /devices/pseudo/log@0:log 
   8 DOOR db6eb300 [door to 'syslogd' (proc=d55fb128)]

As the crash dump contains all the pages of the kernel (or more, if you configured it that way), you have a frozen state of your system in which you can investigate everything you want.

And to get back to my security example: with the crash dump and mdb you can gather really interesting information. For example, you can see that an SSH connection was open at the time of the crash dump.

> ::netstat
TCPv4    St   Local Address        Remote Address       Stack       Zone
db35f980  0   10.211.55.200.22        10.211.55.2.53811    0    0
[...]

A practical usecase

You can do it like the pros and look at source code and crash dump side by side to find the root cause of an error. Or like some colleagues at the Sun Mission Critical Support Center, who wouldn’t surprise me if they found the error just by laying their hands on a system.

For everyone else, there is a simpler way to analyse a crash dump, so that you at least have a little more information to search for in a bug database.

I will use a crash I analysed a long time ago to show you the trick. First you have to start a debugger; I used mdb in this example:

bash-3.00# mdb -k unix.4 vmcore.4
Loading modules: [ unix krtld genunix specfs dtrace cpu.AuthenticAMD.15 uppc pcplusmp ufs md ip sctp usba fcp fctl nca lofs cpc fcip random crypto zfs logindmux ptm sppp nfs ipc ]

A prompt appears; just type $C to get a stack trace.

> $C
fffffe80000b9650 vpanic()
fffffe80000b9670 0xfffffffffb840459()
fffffe80000b96e0 segmap_unlock+0xe5()
fffffe80000b97a0 segmap_fault+0x2db()
fffffe80000b97c0 snf_smap_desbfree+0x76()
fffffe80000b97e0 dblk_lastfree_desb+0x17()
fffffe80000b9800 dblk_decref+0x66()
fffffe80000b9830 freeb+0x7b()
fffffe80000b99b0 tcp_rput_data+0x1986()
fffffe80000b99d0 tcp_input+0x38()
fffffe80000b9a10 squeue_enter_chain+0x16e()
fffffe80000b9ac0 ip_input+0x18c()
fffffe80000b9b50 i_dls_link_ether_rx+0x153()
fffffe80000b9b80 mac_rx+0x46()
fffffe80000b9bd0 bge_receive+0x98()
fffffe80000b9c10 bge_intr+0xaf()
fffffe80000b9c60 av_dispatch_autovect+0x78()
fffffe80000b9c70 intr_thread+0x50()

Okay, now start at the top of the trace and strip all lines that belong to the operating system’s error-handling infrastructure. vpanic() generates the panic, and the second line is useless for our purposes, too. The next two lines with segmap are generated by the error but are not the root cause. The interesting line is snf_smap_desbfree.

With this name you can go to Sunsolve or bugs.opensolaris.org. Et voilà: "System panic due to recursive mutex_enter in snf_smap_desbfree trying to re-aquire Tx mutex". When you type this bug into the PatchFinder, you will find a patch that fixes it: 124255-03.

Two hints:

  • It’s good practice to know mdb. It’s very useful when compiling open source software, in case your compiled code dumps core and you don’t know why. Core files are not just there to be deleted.

  • Error reports with a stack trace are more useful than reports that just say "The system panicked when I did this".

Conclusion

The Solaris Operating Environment has several features to help the user or the support engineer when something goes wrong. Crash and core dumps are an invaluable resource for finding the root cause of a problem. Don’t throw them away without looking at them.

Do you want to learn more?

Documentation

coreadm(1M)

dumpadm(1M)

uadmin(1M)

mdb(1)

Solaris Modular Debugger Guide

Books

Solaris Internals: Solaris 10 and Open Solaris Kernel Architecture - Richard McDougall and Jim Mauro

Solaris Performance and Tools: DTrace and MDB Techniques for Solaris 10 and OpenSolaris - Richard McDougall , Jim Mauro and Brendan Gregg