Written by J. Moellenkamp
on May 10, 2010

Meet the stats - today: mpstat

In this installment of the “Meet the Stats” series I want to talk about mpstat. In my opinion, mpstat is one of the most useful tools for finding out what your processors are really doing.

Using mpstat

Let’s execute mpstat on a system. I used my fileserver for this on a Saturday morning. It’s a system with four cores, so mpstat reports four lines per interval.

```
$ mpstat 1
CPU minf mjf xcal  intr ithr  csw icsw migr smtx  srw syscl  usr sys  wt idl
  0    4   0   28   679  154  573    3   12    8    0   767    1   2   0  96
  1    4   0   22   504  145  485    2    9    4    0   661    1   2   0  97
  2    4   0   30   579   81  425    3   12    6    0   519    1   2   0  97
  3    5   0   25   505  250  517    3   12    5    0   758    1   3   0  96
CPU minf mjf xcal  intr ithr  csw icsw migr smtx  srw syscl  usr sys  wt idl
  0    0   0    9   567  182  372    0    2    0    0   338    0   1   0  99
  1    0   0   21   454  174  468    1    1    2    0   468    1   1   0  98
  2    0   0   12   480   15  304    1    4    1    0   249    2   1   0  97
  3   27   0   15   157   68  147    1    2    0    0   422    0   1   0  99
jmoekamp@hivemind:~$
```
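If you want to work with these numbers in a script, a minimal Python sketch for parsing such output could look like the following. The function name `parse_mpstat` is made up for this example, and the sample text is simply the first interval from the output above plus the trailing shell prompt.

```python
# Sketch only: parse_mpstat is a hypothetical helper, not part of any tool.
SAMPLE = """\
CPU minf mjf xcal  intr ithr  csw icsw migr smtx  srw syscl  usr sys  wt idl
  0    4   0   28   679  154  573    3   12    8    0   767    1   2   0  96
  1    4   0   22   504  145  485    2    9    4    0   661    1   2   0  97
  2    4   0   30   579   81  425    3   12    6    0   519    1   2   0  97
  3    5   0   25   505  250  517    3   12    5    0   758    1   3   0  96
jmoekamp@hivemind:~$
"""


def parse_mpstat(text):
    """Return one dict per CPU line, keyed by the column names of the header."""
    rows = []
    header = None
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "CPU":
            header = fields  # every interval starts with a header line
            continue
        if header is None or len(fields) != len(header):
            continue  # skip shell prompts and anything else that is not data
        rows.append(dict(zip(header, (int(f) for f in fields))))
    return rows


rows = parse_mpstat(SAMPLE)
for row in rows:
    print(row["CPU"], row["intr"], row["csw"], row["idl"])
```

Keying each value by its column name keeps the later analysis independent of the column order.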

Internals

You need some knowledge about the inner workings of an operating system to really understand the output of this command, but a basic understanding is relatively easy to reach.

All those operations are normal occurrences in a Solaris system. You can't say "oh ... I have too many events of this kind" just by looking at these numbers, because the observed pattern may be perfectly normal for your load. So it's very reasonable to run mpstat from time to time under load just to get a baseline. Debugging in the event of a major fsckup is much easier with such historical data, because otherwise you may chase a pattern that looks pathological but is just the way things go in your application.
Forget about the wt column. It's the "wait time" column, but it isn't computed anymore, it's simply set to zero. The reason for keeping this column is the binary compatibility guarantee. You can't leave it out, because one column less could break programs, and you can't fill it with a dash, as programs may expect a number here.
When not otherwise stated, the numbers are "events per second". Exceptions are the last four columns, which are percentages, and obviously the first one, which is the CPU id.
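The baseline idea can be sketched in a few lines of Python. Almost everything here is an assumption for illustration: the function name `deviations`, the factor of 3, the floor of 50 events per second and the "current" sample, which is invented. Only the baseline row is taken from CPU 0 of the sample output above.

```python
# Columns that report events per second (everything except CPU and the percentages).
RATE_COLUMNS = ("minf", "mjf", "xcal", "intr", "ithr", "csw",
                "icsw", "migr", "smtx", "srw", "syscl")


def deviations(baseline, current, factor=3.0, floor=50):
    """Return the rate columns that are more than `factor` times the baseline.

    Rates below `floor` events per second are ignored to avoid flagging noise.
    Both thresholds are arbitrary example values, not recommendations.
    """
    flagged = {}
    for column in RATE_COLUMNS:
        base, now = baseline[column], current[column]
        if now >= floor and now > factor * max(base, 1):
            flagged[column] = (base, now)
    return flagged


# Baseline: CPU 0 from the first interval of the sample output.
baseline = {"minf": 4, "mjf": 0, "xcal": 28, "intr": 679, "ithr": 154, "csw": 573,
            "icsw": 3, "migr": 12, "smtx": 8, "srw": 0, "syscl": 767}

# Invented "bad day" sample: identical except for a jump in smtx.
current = dict(baseline, smtx=900)

print(deviations(baseline, current))
```

The point of the comparison is that a value only becomes interesting relative to what the same system normally shows under the same load.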

A recommended reading

In the following description I sacrificed complete correctness for understandability, as I simplified some of the dependencies. To understand the full implications of all the numbers presented by all the *stat commands you should start to gather some knowledge about the internals of Solaris. There is an excellent book about it. It’s “Solaris Internals: Solaris 10 and OpenSolaris Kernel Architecture” written by Richard McDougall and Jim Mauro. The ISBN of this great book is 0-13-148209-2.

DTrace examples

The DTrace examples are from the standard cheat sheet I use at customer sites. However they aren’t mine, I’ve gathered them from Prefetch.net’s DTrace Cookbook.

The numbers and their meaning

sys, usr and idl have obvious meanings: they tell you the percentage of time the system spends in kernelland, in userland or idling. When you really want to know something about the load on your system, look at these values and forget about the load average. The other values are a little more difficult to explain. Basically you can divide columns 2-11 into four groups: the virtual memory part (columns 2 and 3), the interrupt part (columns 4-6), the scheduling part (columns 7-9) and the locks part (columns 10 and 11). Column 12, syscl, counts system calls and isn't covered here.
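As a small illustration, the grouping can be written down as a Python mapping. The assignment of columns to groups follows the section headings of this article; the names `GROUPS`, `by_group` and `busy_percent` are made up for the sketch, and the row is CPU 0 from the first interval of the sample output.

```python
# Column groups following the sections of this article (syscl is left out).
GROUPS = {
    "virtual memory": ("minf", "mjf"),
    "interrupts": ("xcal", "intr", "ithr"),
    "scheduling": ("csw", "icsw", "migr"),
    "locks": ("smtx", "srw"),
}

# CPU 0 from the first interval of the sample output.
row = {"CPU": 0, "minf": 4, "mjf": 0, "xcal": 28, "intr": 679, "ithr": 154,
       "csw": 573, "icsw": 3, "migr": 12, "smtx": 8, "srw": 0, "syscl": 767,
       "usr": 1, "sys": 2, "wt": 0, "idl": 96}


def by_group(row):
    """Regroup one mpstat line into the four parts discussed in the text."""
    return {group: {column: row[column] for column in columns}
            for group, columns in GROUPS.items()}


def busy_percent(row):
    """Percentage of time the CPU was not idle: userland plus kernelland."""
    return row["usr"] + row["sys"]


for group, values in by_group(row).items():
    print(group, values)
print("busy:", busy_percent(row), "%")
```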

The virtual memory part

To understand the meaning of the two columns regarding the virtual memory subsystem you should have some knowledge about the concept of virtual memory. But I will try to give you some insight into this part without getting overly complex. First, it’s important to know that memory is organized in pages. Those pages are chunks of memory. The possible page sizes are hardware dependent, but let’s assume that we have pages with a size of 8 kilobytes. As you may know, a modern operating system doesn’t allocate real memory when your application requests memory. Instead it allocates virtual memory. When you access a memory page for the first time, a page fault occurs. This page fault leads to the mapping of a physical memory page to the virtual memory page. The mapping is done by adding an entry to the Hash Page Table. And here minor and major page faults differ:

minf:
When the memory subsystem doesn't find a mapping in the Hash Page Table, but knows that a page with the same content is on the list of free pages, a minor fault occurs. The page is just inserted into the Hash Page Table and the system works with the data already in memory. You can measure which applications create minor page faults with a short dtrace one-liner:
```
# dtrace -n 'vminfo:::as_fault{@execs[execname]=count()}'
dtrace: description 'vminfo:::as_fault' matched 1 probe
^C

  dtrace                                                          104
jmoekamp@hivemind:~#
```
mjf:
A major fault has much more severe consequences. It occurs when there is no mapping to a physical page in the Hash Page Table and the content of the page was migrated to the swap space. Obviously, getting it back from there takes some time.
```
# dtrace -n 'vminfo:::maj_fault{@execs[execname] = count() }'
dtrace: description 'vminfo:::maj_fault' matched 1 probe
^C
```

Obviously major faults have a bigger impact on system performance than minor faults, as the latter don’t need to access a rotating rust device aka hard disk. However, even minor faults can have a significant impact on performance. But that’s enough material for an article of its own or an evening with the book mentioned above.
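The difference between the two fault types can be shown with a toy model in Python. This is a deliberately crude sketch and not how the Solaris VM system works: the class `ToyMemory` and its fields are invented, a page is just a number, and the only distinction made is whether the content of an unmapped page has to be fetched from swap (major) or not (minor).

```python
from enum import Enum


class Fault(Enum):
    NONE = "no fault"
    MINOR = "minor fault"
    MAJOR = "major fault"


class ToyMemory:
    """Toy model of demand paging with a set of mapped and swapped-out pages."""

    def __init__(self, swapped_out=()):
        self.mapped = set()                  # pages that already have a mapping
        self.swapped_out = set(swapped_out)  # pages whose content lives in swap
        self.counts = {Fault.MINOR: 0, Fault.MAJOR: 0}

    def touch(self, page):
        """Access a page and report which kind of fault the access caused."""
        if page in self.mapped:
            return Fault.NONE
        if page in self.swapped_out:
            self.swapped_out.remove(page)    # content has to come from disk
            kind = Fault.MAJOR
        else:
            kind = Fault.MINOR               # only the mapping was missing
        self.mapped.add(page)
        self.counts[kind] += 1
        return kind


memory = ToyMemory(swapped_out={7})
results = [memory.touch(page) for page in (1, 1, 7, 7, 2)]
print([result.value for result in results])
print({kind.value: count for kind, count in memory.counts.items()})
```

The second access to a page never faults in this model, which mirrors the statement that the fault happens when a page is accessed for the first time.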

The interrupt part

xcal:
xcal, or cross calls, are a special kind of interrupt. Whenever a processor needs another processor to do something for it, a so-called cross call is issued. There are several reasons to issue cross calls, like updating certain tables on other processors.

```
# dtrace -n 'sysinfo:::xcalls{@execs[execname] = count();}'
dtrace: description 'sysinfo:::xcalls' matched 1 probe
^C

  firefox-bin                                                       6
  thunderbird-bin                                                   6
  VBoxHeadless                                                      9
  dtrace                                                           24
  pageout                                                        1660
  sched                                                          1834
jmoekamp@hivemind:~#
```

intr:
An interrupt preempts the current work on the processor and forces it to execute the code needed to handle the interrupt, for example to trigger the processing of incoming network packets. To get some insight into which drivers generate interrupts, it's interesting to use the intrstat command:

```
jmoekamp@hivemind:~# intrstat
      device |      cpu0 %tim      cpu1 %tim      cpu2 %tim      cpu3 %tim
-------------+------------------------------------------------------------
[...]
    e1000g#0 |         0  0,0         0  0,0         0  0,0         0  0,0
    e1000g#1 |         0  0,0         0  0,0         0  0,0         0  0,0
      ehci#0 |         0  0,0         0  0,0         0  0,0         0  0,0
      ehci#1 |         0  0,0         0  0,0         0  0,0         0  0,0
   hci1394#0 |         0  0,0         0  0,0       123  0,0         0  0,0
[...]
   pci-ide#0 |         0  0,0         0  0,0       247  0,3         0  0,0
       rge#0 |         0  0,0         0  0,0         0  0,0        11  0,0
```

ithr:
ithr or "interrupts as threads" refers to a special mechanism to handle those interrupts. Many interrupts are handled in threads that are triggered by an interrupt. This column counts the interrupts handled by such threads.

Interrupts are important for the operation of the system, however they interrupt (they are called “interrupts” for a reason) the application running on this processor. Thus a high number of interrupts can lead to a situation where they significantly slow down the application. There are some tricks to reduce this interruption. For example you can confine the interrupts to a subset of all processors by declaring most of the CPUs as “non-interrupt”.

The scheduling part

csw:
Context switches take place when a currently running thread doesn't have anything to compute on the processor, for example because it waits for data from the disk. The thread gives the processor back to the scheduler and a different thread is scheduled on the processor. As the other thread has, for example, a totally different set of register contents, the OS has to switch from the context of the old thread to that of the new one. This is called a context switch. Obviously there is a performance penalty bound to this event, as the switching takes some time. When you want to know which processes cause the context switches, the sysinfo:::pswitch probe helps you:
```
# dtrace -n 'sysinfo:::pswitch{@execs[execname] = count(); }'
dtrace: description 'sysinfo:::pswitch' matched 3 probes
^C

  fmd                                                               1
[...]
  VBoxHeadless                                                   2054
  sched                                                          9657
jmoekamp@hivemind:~#
```

icsw:
Involuntary context switches are the forced variant of a context switch. Whenever a thread has consumed its time slice or a higher-priority thread is ready for execution, an involuntary context switch is done. It just forces the running thread off the processor.

```
# dtrace -n 'sysinfo::preempt:inv_swtch{@execs[execname] = count();}'
dtrace: description 'sysinfo::preempt:inv_swtch' matched 1 probe
^C

  VBoxHeadless                                                      1
  VBoxSVC                                                           1
  gam_server                                                        1
  thunderbird-bin                                                   2
  firefox-bin                                                       3
  gnome-netstatus-                                                  3
```

Obviously a large number of involuntary context switches should be avoided.
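To judge whether the number is "large", it helps to put icsw in relation to csw. The following Python sketch does that for CPU 0 of the first sample interval. It rests on the assumption that csw counts all context switches and icsw is the involuntary subset of them; the function name `context_switch_breakdown` is made up.

```python
def context_switch_breakdown(csw, icsw):
    """Split context switches into voluntary ones and the involuntary share.

    Assumes icsw is a subset of csw, so voluntary = csw - icsw.
    """
    if icsw > csw:
        raise ValueError("icsw larger than csw contradicts the assumption")
    voluntary = csw - icsw
    involuntary_share = icsw / csw if csw else 0.0
    return voluntary, involuntary_share


# CPU 0, first interval of the sample output: csw=573, icsw=3.
voluntary, share = context_switch_breakdown(csw=573, icsw=3)
print(f"voluntary: {voluntary}, involuntary share: {share:.2%}")
```

On this mostly idle fileserver nearly all switches are voluntary, which fits a system whose threads mainly wait for something to do.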

migr:
A "thread migration" is counted when a thread is scheduled on a different processor than the one it ran on last time. This can have a big performance impact, as the caches of the new processor aren't warmed up for the thread. This leads to more cache misses and thus to more accesses to the slower main memory instead of the caches.

```
# dtrace -n ' sched:::off-cpu{self->cpu = cpu;}

sched:::on-cpu /self->cpu != cpu/
{
    printf("%s migrated from cpu %d to cpu %d\n",execname,self->cpu,cpu);
    self->cpu = 0;
}'
dtrace: description ' sched:::off-cpu
' matched 6 probes
^C
CPU     ID                    FUNCTION:NAME
  2  10067                    resume:on-cpu firefox-bin migrated from cpu 0 to cpu 2
  2  10067                    resume:on-cpu thunderbird-bin migrated from cpu 0 to cpu 2
  2  10067                    resume:on-cpu nskernd migrated from cpu 0 to cpu 2
  2  10067                    resume:on-cpu nskernd migrated from cpu 0 to cpu 2
  2  10067                    resume:on-cpu VBoxHeadless migrated from cpu 0 to cpu 2
  2  10067                    resume:on-cpu sched migrated from cpu 0 to cpu 2
  2  10067                    resume:on-cpu gnome-netstatus- migrated from cpu 0 to cpu 2
  2  10067                    resume:on-cpu gnome-netstatus- migrated from cpu 0 to cpu 2
```
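The bookkeeping behind such a migration check can be sketched in Python. The function name `find_migrations` and the event list are invented for illustration. One detail is worth noting: an unassigned thread-local variable in DTrace reads as zero, so the script above cannot tell "never seen before" from "last ran on CPU 0" and may report a thread's first appearance as a migration from cpu 0. The sketch keeps "never seen" as a separate state.

```python
def find_migrations(events):
    """Return (thread, from_cpu, to_cpu) for every change of CPU.

    `events` is a sequence of (thread, cpu) pairs in the order in which
    the threads were put on a CPU.
    """
    last_cpu = {}    # thread -> CPU it ran on last; a missing key means "never seen"
    migrations = []
    for thread, cpu in events:
        previous = last_cpu.get(thread)
        if previous is not None and previous != cpu:
            migrations.append((thread, previous, cpu))
        last_cpu[thread] = cpu
    return migrations


# Invented scheduling events, not measured data.
events = [("firefox-bin", 0), ("VBoxHeadless", 2), ("firefox-bin", 0),
          ("firefox-bin", 2), ("VBoxHeadless", 2), ("VBoxHeadless", 1)]

for thread, source, target in find_migrations(events):
    print(f"{thread} migrated from cpu {source} to cpu {target}")
```

The first appearance of VBoxHeadless on CPU 2 is not counted here, because nothing is known about where it ran before.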

The locks part

smtx:
smtx or "spins on mutexes" reports how often the code flow on the processor wasn't able to acquire a mutex lock on the first try. Mutex is shorthand for "Mutual Exclusion". A mutex lock gives the thread owning it exclusive read and write access to the data it protects.
```
# dtrace -n 'lockstat:::adaptive-spin, lockstat:::adaptive-block
> {
>     @execs[execname,probename] = count(); 
> }'
dtrace: description 'lockstat:::adaptive-spin, lockstat:::adaptive-block
' matched 2 probes
^C

  gnome-netstatus-                                    adaptive-spin                                                     2
  sched                                               adaptive-block                                                    5
  zpool-datapool                                      adaptive-block                                                    5
  VBoxHeadless                                        adaptive-spin                                                    21
  zpool-datapool                                      adaptive-spin                                                    78
  sched                                               adaptive-spin                                                   262
#
```
The number of spins is an interesting number because of the nature of spin locks. Imagine this lock as the lock on a lavatory door where the "Busy/Vacant" indicator is out of order. You have two ways to get into the lavatory: you can rattle at the door every few seconds to check if it's still locked, or you can leave, do something else, wait a few minutes and then check again. The spin lock is the equivalent of this very annoying person rattling at the door. Being annoying is no problem in a computer, but rechecking again and again uses clock cycles you could put to better use. However it has a big advantage: you get the lavatory immediately when it's free, and you don't have to do a context switch from making a telephone call to going to the lavatory. Of course reality is a little bit more complex than the lavatory door, but it should give you the picture.

It's the same with spin locks: there is a tight loop that tries to acquire the lock again and again until it gets the lock. While spinning, the thread stays on the CPU until its time quantum is used up or a thread with a higher priority preempts it. So counting the "spins on mutexes" events is a good indication of how often your computer rattles at the lavatory door and burns CPU cycles while doing so. Furthermore it's a good indication of highly contended locks in the code path you are using, as it gets more probable that a thread has to spin when many threads want to use the same code path synchronized by this lock. However this is just a vastly simplified description. The handling of locks in Solaris is worth an article of its own, too, or another evening with the book already mentioned.
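The two ways of waiting at the lavatory door can be combined: rattle for a while, then go away and wait. The probe names adaptive-spin and adaptive-block in the one-liner above hint at such a combination. Here is a minimal Python sketch of the idea using the standard `threading.Lock`. It is only an illustration of the principle, not the Solaris implementation; the function name `acquire_adaptive` and the parameters `max_spins` and `block_timeout` are made up.

```python
import threading


def acquire_adaptive(lock, max_spins=1000, block_timeout=1.0):
    """Try to take `lock` by spinning first and by blocking afterwards.

    Returns (acquired, spins, blocked), where `spins` is the number of
    failed non-blocking attempts.
    """
    spins = 0
    while spins < max_spins:
        if lock.acquire(blocking=False):    # rattle at the door
            return True, spins, False
        spins += 1                          # burned some cycles for nothing
    # Give up spinning: sleep until the lock is released or the timeout expires.
    acquired = lock.acquire(timeout=block_timeout)
    return acquired, spins, True


lock = threading.Lock()
print(acquire_adaptive(lock))   # lock is free: taken on the first try
# The lock is now held, so the next attempt spins and then blocks in vain.
print(acquire_adaptive(lock, max_spins=5, block_timeout=0.01))
lock.release()
```

The `spins` counter plays the role of the smtx column here: it only grows when the lock was not acquired on the first try.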
srw:
srw or "spins on reader/writer locks" counts the number of spins on reader/writer locks. rwlocks are another kind of lock in Solaris. They allow either one writer or many readers to hold the lock, but not both at the same time.
When you want to know which processes are responsible for the srw events, a dtrace one-liner can help you:
```
# dtrace -n 'lockstat:::rw-block
{
    @execs[execname] = count(); 
}'
dtrace: description 'lockstat:::rw-block
' matched 1 probe
^C

  VBoxHeadless                                                      8
#
```

Do you want to learn more?

man pages
docs.sun.com: intrstat
docs.sun.com: mpstat
Misc
Prefetch.net: DTrace Cookbook
