(originally published on 17.06.2018, reviewed/rewritten on 05.04.2025, tested on Oracle Solaris 11.4 SRU 79)
 

Many Solaris admins are aware of the Fault Management Architecture (FMA) in Solaris. However, I haven't often seen people make a habit of regularly checking the output of fmadm list to look after the faults detected by Solaris. A PAM module has been integrated into Solaris that gives you a message at login, hinting that looking into the information of the FMA may not be the dumbest idea.

It's already in the default PAM configuration:

jmoekamp@testbed:~$ grep -i "session" /etc/pam.d/other
# Default definition for Session management
# Used when service name is not explicitly mentioned for session management
session definitive      pam_user_policy.so.1
session required        pam_unix_session.so.1
session optional        pam_fm_notify.so.1

However, by default the module does nothing. You won't get the message reminding you to look after the FMA:

joergmoellenkamp@Mac ~ % ssh jmoekamp@192.168.39.122
(jmoekamp@192.168.39.122) Password: 
Last login: Thu Apr  3 05:09:23 2025 from 192.168.39.121
Oracle Solaris 11.4.79.189.2                     Assembled March 2025
jmoekamp@testbed:~$ 

The reason this isn't active for everyone by default is simple: perhaps not all users allowed to log into the system should see this kind of information. To get it at login, a user needs an additional authorization called solaris.fm.read. You can assign this authorization to a user via usermod:

root@testbed:~# usermod -A +solaris.fm.read jmoekamp
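
If you want to double-check that the authorization really landed on the account, something like the following sketch may help. It assumes that auths prints the authorizations of a user as a comma-separated list; has_fm_read is a hypothetical helper name of my own, not a Solaris command. It only finds the literal entry, so a wildcard grant such as solaris.* would not be detected by it.

```shell
# Sketch: check whether a comma-separated authorization list (assumed to be
# the format printed by `auths <user>`) contains solaris.fm.read literally.
# has_fm_read is a hypothetical helper name, not part of Solaris.
has_fm_read() {
    # Split the list on commas, drop blanks and tabs, then look for an
    # exact, fixed-string match of the whole entry.
    tr ',' '\n' | tr -d ' \t' | grep -qxF 'solaris.fm.read'
}

# Intended use on the Solaris host (not executed here):
#   auths jmoekamp | has_fm_read && echo "jmoekamp has solaris.fm.read"
```

The function returns the exit status of grep, so it can be used directly in an if or with && as shown in the comment.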

The next time you log in as jmoekamp, you will see a small but useful addition to the output:

joergmoellenkamp@Mac ~ % ssh jmoekamp@192.168.39.122
(jmoekamp@192.168.39.122) Password: 
Last login: Thu Apr  3 05:11:43 2025 from 192.168.39.121
NOTE: system has 2 active diagnoses; run 'fmadm list' for details.
Oracle Solaris 11.4.79.189.2                     Assembled March 2025
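
If you capture the login output somewhere, for example in a test of your own, the number of active diagnoses can be pulled out of the NOTE line. This is only a sketch based on the exact wording shown above; fm_note_count is a hypothetical helper name, and I'm assuming the line always starts with "NOTE: system has" followed by the number.

```shell
# Sketch: extract the diagnosis count from captured login output.
# fm_note_count is a hypothetical helper, not a Solaris command. It reads
# text on stdin and prints the number from the first matching NOTE line;
# it prints nothing when no such line is present.
fm_note_count() {
    sed -n 's/^NOTE: system has \([0-9][0-9]*\) active diagnos.*/\1/p' | head -n 1
}

# Example (not executed here):
#   fm_note_count < captured_login_output.txt
```

An empty result simply means the note was not in the output, either because there are no active diagnoses or because the user lacks the solaris.fm.read authorization.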
Mastodon · 4 comments
DrScriptt @drscriptt@oldbytes.space
@c0t0d0s0 @SolarisDiaspora I saw this and skimmed the article.

I think that it’s a very clever use of PAM.

But I feel like multiple things have failed for this to be the first notification.

Why hasn’t the *LOM sent an alert email / SNMP trap?

Why hasn’t periodic monitoring of fault management as part of overall monitoring alerted?

I mean for belt and suspenders, pam_fm_notify is suspenders. But where’s the belt?

All of that being said, I’m considering installing pam_fm_notify as an additional level of notification.
↩ @drscriptt@oldbytes.space
@drscriptt @SolarisDiaspora FMA is not just hardware; in this case it's SMF services not running. Of course this is somewhat belt and suspenders. Proper monitoring should have reported this. Furthermore: when you see this message, you know in an instant that you are logging into a system with problems, even when SMF has reported the problems by other means.
DrScriptt @drscriptt@oldbytes.space
↩ @c0t0d0s0
@c0t0d0s0 @SolarisDiaspora I thought FMA would trigger the system attention / service needed light for software too. Am I misremembering?
WooShell @WooShell@chaosfurs.social
@c0t0d0s0 that reminds me of something I needed an expert for a few days ago.
Do you know a way to disable the fault propagation from LDOMs to the CDOM and/or ILOM? I can only find articles where Oracle announced the feature had been added, but no info how to configure/disable it.
This is quite annoying, as e.g. a user coredumping in an LDOM would be a "can be checked tomorrow" thing, while a fault reported in the CDOM is usually a "Call Woo right now!" issue.
Written by

Joerg Moellenkamp

Grey-haired, sometimes grey-bearded Windows dismissing Unix guy.