segkp revisited

I wrote about this a while ago, but in light of recent questions from customers I want to reiterate it: with the massive compute power of a T4 or T5, I'm seeing higher and higher process counts on customer systems lately. When you are running quite a number of zones, your applications use a lot of threads, your system reports messages like “cannot fork: Resource temporarily unavailable” in all your zones in parallel, and you are not running Solaris 11.1, you should do the following checks. They apply to a system whose /etc/system contains no segkp-related changes. The numbers in this example come from a real-life case, obfuscated by rounding. First, check the number of threads:

# prstat -cmL 1 1| grep lwps
Total: 2539 processes, 56489 lwps, load averages: 13.11, 14.71, 13.33
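If you want to watch this figure from a script, the LWP count can be pulled out of the "Total:" line. This is a minimal sketch: the function name lwp_count is made up for illustration, and it assumes the summary line has exactly the layout shown above.

```shell
# Hypothetical helper: reads prstat output on stdin and prints the LWP count
# from the summary line ("Total: N processes, M lwps, load averages: ...").
lwp_count() {
    awk '$1 == "Total:" && $5 ~ /^lwps/ { print $4 }'
}

# On a live Solaris system (not run here):
#   prstat -cmL 1 1 | lwp_count
```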

If the number of threads is close to 64000, check the usage of the segkp memory.

$ kstat -p | grep "segkp"  | grep ":mem_"
vmem:170:segkp:mem_import 0
vmem:170:segkp:mem_inuse 2100000000
vmem:170:segkp:mem_total 2140000000
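To put a number on "close", you can express mem_inuse as a percentage of mem_total. The following is only a sketch: segkp_usage_pct is a hypothetical helper name, and it assumes the kstat -p lines look like the ones above (statistic name, whitespace, value).

```shell
# Hypothetical helper: reads `kstat -p` output on stdin and prints
# segkp mem_inuse as a whole-number percentage of mem_total.
segkp_usage_pct() {
    awk '
        $1 ~ /:segkp:mem_inuse$/ { inuse = $2 }
        $1 ~ /:segkp:mem_total$/ { total = $2 }
        END {
            # Fail instead of dividing by zero when no segkp lines were seen.
            if (total > 0) printf "%d\n", (inuse * 100) / total
            else exit 1
        }
    '
}

# On a live Solaris system (not run here):
#   kstat -p | segkp_usage_pct
```

With the rounded figures from this example the helper prints 98, so this system has almost exhausted its segkp area.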

If mem_inuse is close to mem_total, check whether segkp allocations have failed (shortcut: just look for alloc_fail in the kstat output).

unix:0:segkp_32768:alloc_fail 2492500
vmem:170:segkp:fail 2492500
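What matters is whether this counter is still growing, so it helps to take two samples and compare them. A minimal sketch of that comparison follows; the helper names segkp_fail_count and segkp_fail_trend are invented for this example, and the 60-second interval in the comment is just a suggestion.

```shell
# Hypothetical helper: reads `kstat -p` output on stdin and prints the
# value of the segkp vmem "fail" counter.
segkp_fail_count() {
    awk '$1 ~ /^vmem:[0-9]+:segkp:fail$/ { print $2 }'
}

# Hypothetical helper: compares two samples of the counter.
#   $1 = earlier value, $2 = later value
segkp_fail_trend() {
    if [ "$2" -gt "$1" ]; then
        echo "increasing by $(( $2 - $1 ))"
    else
        echo "stable"
    fi
}

# On a live Solaris system (not run here):
#   a=$(kstat -p | segkp_fail_count); sleep 60
#   b=$(kstat -p | segkp_fail_count)
#   segkp_fail_trend "$a" "$b"
```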

If this number is significantly larger than 0 and still increasing when you measure twice, say, a minute apart, you have to enlarge the segkp area. You can do this by putting the following line into /etc/system and rebooting the system.

set segkpsize=0x80000
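The value of segkpsize is given in pages, not bytes, so a quick bit of arithmetic shows what this setting buys you. The sketch below rests on two assumptions of mine: a base page size of 8192 bytes, as on SPARC systems like the T4 and T5, and one 32768-byte segkp allocation per thread, which I infer from the segkp_32768 cache name above. The helper names are made up for illustration.

```shell
# Hypothetical helper: bytes covered by a segkpsize setting.
#   $1 = segkpsize in pages (hex or decimal), $2 = base page size in bytes
segkp_bytes() {
    echo $(( $1 * $2 ))
}

# Hypothetical helper: how many allocations of a given size fit into the area.
#   $3 = bytes per allocation (32768 is an assumption taken from the
#        segkp_32768 cache name, not a documented per-thread cost)
segkp_slots() {
    echo $(( $1 * $2 / $3 ))
}

# Example (8192 is the assumed SPARC base page size; check with `pagesize`):
#   segkp_bytes 0x80000 8192
#   segkp_slots 0x80000 8192 32768
```

Under these assumptions 0x80000 pages come to 4294967296 bytes (4 GB) and room for 131072 allocations of 32 KB. That is twice what a 2 GB area holds (65536), which fits the threshold of roughly 64000 threads used in the first check.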

The explanation for this issue, and why it isn’t a problem in Solaris 11.1, is in “Oh my god, it’s full of threads … and out of memory”. Before you ask: the default in Solaris 10 and 11 was chosen because segkp is allocated at startup in the configured size. With more normal thread counts you don’t need such a large segkp, and the memory is better used for other things. That said, Solaris 11.1 has changed that: the size is now calculated at system start on the basis of a number of parameters, which I’ve described in the linked older article.