LKSF: Getting kernel statistics

Whenever I’m at a customer for a performance analysis gig, the customer asks me when I will start the DTrace magic that many people expect when you tell them you work in the Solaris and performance area. The truth, however, is that you don’t use DTrace at the beginning. You perhaps use it last, because Solaris has a rich set of features that deliver nice statistical information without even touching DTrace, like all the *stat tools. One core component for collecting such information is the kstat facility. kstat is, so to say, the central way to get statistical information from the kernel, and a number of *stat tools use the kstat facility to get their data.

You can ask the kstat facility for a lot of information, and the tool to do so is called kstat. With a lot of information I really mean “a lot”, as in

root@solaris:~# kstat -p | wc -l
   27351

If you want statistical information about ZFS, you can get it with

kstat -p -m zfs

You want the target size of the ZFS ARC (the c statistic) every second:

kstat -p zfs:0:arcstats:c 1


The filtering mechanism of kstat is really helpful. The keys follow this rule:

module:instance:name:statistic

So zfs:0:arcstats:c is module “zfs”, instance 0 of the module, name “arcstats” and statistic “c”. You can either pass this to kstat as one full string like before, or use command line options instead.
kstat -p -m zfs -i 0 -n arcstats -s c will yield the same result as kstat -p zfs:0:arcstats:c.
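To make the key structure a bit more tangible, here is a minimal sketch that splits the parseable output back into its four parts. It assumes that kstat -p separates key and value with a tab; split_kstat_key is just a name I made up for this illustration, not part of Solaris.

```shell
# Hypothetical helper: label the parts of "module:instance:name:statistic<TAB>value".
# Assumption: key and value in `kstat -p` output are separated by a tab.
split_kstat_key() {
    awk -F'\t' '{
        n = split($1, k, ":")
        if (n < 4) next                  # skip lines that are not full four-part keys
        stat = k[4]
        # Rejoin anything after the fourth field in case the statistic contains ":"
        for (i = 5; i <= n; i++) stat = stat ":" k[i]
        printf "module=%s instance=%s name=%s statistic=%s value=%s\n", k[1], k[2], k[3], stat, $2
    }'
}

# Intended usage on a Solaris system:
#   kstat -p zfs:0:arcstats:c | split_kstat_key
```

The function only reads from standard input, so you can try it with saved kstat output on any machine that has awk.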

If you omit a value, a wildcard is assumed. So when you want to see the write statistic of each instance of the sd module, you can express it as kstat -p -m sd -s writes or kstat -p sd:::writes.
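Because a wildcard query returns one line per instance, you may want a single number across all of them. The following is only a sketch: sum_kstat_values is a hypothetical helper name, and it again assumes tab-separated key and value in the kstat -p output.

```shell
# Hypothetical helper: add up the numeric values of all kstat -p lines on stdin.
# Assumption: key and value are separated by a tab; non-numeric values are ignored.
sum_kstat_values() {
    awk -F'\t' '
        $2 ~ /^[0-9]+$/ { total += $2 }     # only count plain unsigned integers
        END { printf "%.0f\n", total }      # %.0f avoids integer limits in some awks
    '
}

# Intended usage on a Solaris system:
#   kstat -p sd:::writes | sum_kstat_values
```

Keep in mind that such counters are cumulative, so the sum is a total since the counters started, not a rate.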

And there were quite a few occasions where a customer of mine found a configuration error in almost no time, simply by looking for non-zero values in kstat -p -s "*fail*" followed by a short search in MOS (namely, some time ago, when hitting the segkp fails). But caution: not all non-zero values indicate a problem. Sometimes they are just the way things work (something fails to get memory, but the failure triggers getting more memory, so the code can try again and succeed).
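Looking for the non-zero values by eye gets tedious with that much output, so a small filter helps. This is a sketch under the same assumption as before (tab between key and value); nonzero_kstats is a name I invented, and the counter name in the comment is made up for illustration.

```shell
# Hypothetical helper: keep only kstat -p lines whose value is a non-zero integer.
# Assumption: key and value are separated by a tab.
nonzero_kstats() {
    awk -F'\t' '$2 ~ /^[0-9]+$/ && $2 + 0 != 0'
}

# Intended usage on a Solaris system:
#   kstat -p -s "*fail*" | nonzero_kstats
# A made-up line such as "mod:0:example:alloc_fail<TAB>3" would be printed,
# while the same line with a value of 0 would be dropped.
```

The result is a list of candidates to look up, not a list of problems.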

So the kstat facility and the kstat command are really useful for getting information about the system when the *stat tools don’t give you the answer you want, or when you want to see the raw data.