Solaris Features: Service Management Facility - Part 2: The foundations of SMF

The additional capabilities of SMF come at a price: SMF has to know more about your services. Most of the new components of SMF exist to provide these capabilities. So we have to define some foundations before doing some practical stuff.

Service and Service Instance

First we start with the service and the service instance. This difference is important. The service is the generic definition of how a service is started. The service instance is the exact configuration of a service. A good example is a webserver. The service defines the basic methods to start or stop an Apache daemon; the service instance contains the information on how a specific configuration should work (which port to listen on, the location of the config file). A service can be defined to allow just one instance, or it can allow multiple instances. But: a service doesn't have to be a long-running process. Even a script that simply exits after executing (e.g. some commands to do network tuning) is a special kind of service in the sense of SMF.

Milestone

A milestone is somewhat similar to the old notion of a runlevel. With milestones you can group certain services. Thus you don't have to list each service when configuring dependencies; you can use a matching milestone containing all the needed services. Furthermore you can force the system to boot to a certain milestone. For example, booting a system into single-user mode is implemented by defining a single-user milestone. When booting into single-user mode, the system just starts the services of this milestone. The milestone itself is implemented as a special kind of service. It's an anchor point for dependencies and a simplification for the admin. Furthermore some of the milestones, including single-user, multi-user and multi-user-server, contain methods to execute the legacy scripts in rc*.d.
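The idea that a milestone is just a service whose dependencies pull in a group of other services can be sketched in a few lines of shell. This is only a model: the service names, the dependency list and the function names are made up for illustration, and it says nothing about how SMF implements this internally.

```shell
#!/bin/sh
# Hypothetical model, not SMF code: a milestone is treated as a service
# that only has dependencies. Names and dependencies are made up.
# One line per service: name, then the services it depends on.
SERVICES='single-user filesystem
multi-user single-user network
filesystem
network
ssh network'

in_list() {
    # $1 = name, $2 = space-separated list of names
    case " $2 " in
        *" $1 "*) return 0 ;;
        *) return 1 ;;
    esac
}

services_for_milestone() {
    # $1 = milestone to boot to, $2 = service list in the format above
    wanted=$1
    changed=yes
    # Repeat until a full sweep adds no further service.
    while [ "$changed" = yes ]; do
        changed=no
        while read -r name deps; do
            # Only follow the dependencies of services we already want.
            in_list "$name" "$wanted" || continue
            for dep in $deps; do
                if ! in_list "$dep" "$wanted"; then
                    wanted="$wanted $dep"
                    changed=yes
                fi
            done
        done <<EOF
$2
EOF
    done
    printf '%s\n' "$wanted"
}

services_for_milestone single-user "$SERVICES"
services_for_milestone multi-user "$SERVICES"
```

In this toy list, ssh is part of neither milestone, so booting to one of them would leave it alone.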

Fault Management Resource Identifier

Every service and every service instance has a unique name in the system to designate it precisely. This name is called the Fault Management Resource Identifier (FMRI). For example, the SSH server is called:

svc:/network/ssh:default

The FMRI is divided by the colons into three parts. The first part designates the resource as a service. The second part designates the service. The last part designates the service instance. In natural language: it's the default configuration of the ssh daemon. But why is this identifier called Fault Management Resource Identifier? Fault Management is another important feature in Solaris. The FM has the job to react automatically to failure events. The failure of a service isn't much different from the failure of a hard disk or a memory chip. You can detect it and you can react to it. So Fault Management is tightly integrated into the service management.
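To make the three parts concrete, here is a minimal shell sketch that takes such an identifier apart with plain parameter expansion. The function names are made up for illustration and are not part of SMF, and the sketch assumes the full form with all three parts present.

```shell
#!/bin/sh
# Hypothetical helpers, not part of SMF: split a full identifier of the
# form scheme:/service/name:instance into its three parts.

fmri_scheme() {
    # Everything before the first colon, e.g. "svc".
    printf '%s\n' "${1%%:*}"
}

fmri_service() {
    # Strip the "scheme:/" prefix, then the ":instance" suffix.
    rest=${1#*:/}
    printf '%s\n' "${rest%:*}"
}

fmri_instance() {
    # Everything after the last colon, e.g. "default".
    printf '%s\n' "${1##*:}"
}

fmri=svc:/network/ssh:default
echo "scheme:   $(fmri_scheme "$fmri")"
echo "service:  $(fmri_service "$fmri")"
echo "instance: $(fmri_instance "$fmri")"
```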

Service Model

As I mentioned before, not all services are equal, and they have different requirements for starting them. Thus the Service Management Facility knows different service models:

Transient service

The simplest service model is transient. You can view it as a script that gets executed while starting the system without leaving a long-lived server process. You use it for scripts that tune or configure things on your system. A good example is the script to configure core dumping via coreadm. A recommendation at this point: don't use the transient model to convert your old startup scripts. Although it is possible, you lose all the advantages of SMF. In this case it would be easier to use the integrated methods to run legacy init.d scripts.

Standalone model

The next service model is the standalone model. The inner workings of this model are really simple: whenever the forked process exits, SMF will start it again.
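The restart rule of this model can be pictured as a simple loop. The following is only a sketch of the rule itself, with a made-up function name and an artificial limit on the number of starts so that the demo terminates; it is not how SMF is implemented.

```shell
#!/bin/sh
# Hypothetical sketch of the restart rule described above, not SMF code.

supervise() {
    # $1 = maximum number of starts (only so that this demo terminates),
    # remaining arguments = the command to run in the foreground
    max_starts=$1
    shift
    starts=0
    while [ "$starts" -lt "$max_starts" ]; do
        starts=$((starts + 1))
        echo "start #$starts: $*"
        # Whatever the exit status is, the loop starts the command again.
        "$@" || true
    done
}

supervise 3 true
```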

Contract service

The standard model for services is contract. This model uses a special facility of the Solaris Operating Environment to monitor the processes.

A short digression: Contracts

Have you ever wondered about the /system/contract filesystem? It's the most obvious sign of the contracts. The contract model is based on a kernel-level construct to manage the relationships between a process and other kernel-managed resources. Such resources are processor sets, memory, devices and, most important for SMF, other processes. Process contracts describe the relation between a process and its child processes. The contract subsystem generates events that are available to other processes via listeners. Possible events are:

Event    Description
empty    the last process in the contract has exited
process exit    a process in the process contract has exited
core    a member process dumped core
signal    a member process received a fatal signal from outside the contract
hwerr    a member process has a fatal hardware error


Your system already uses these contracts. Let's have a look at sendmail.

# ptree -c `pgrep sendmail`
[process contract 1]
1 /sbin/init
[process contract 4]
7 /lib/svc/bin/svc.startd
[process contract 107]
792 /usr/lib/sendmail -bd -q15m -C /etc/mail/local.cf
794 /usr/lib/sendmail -Ac -q15m

With the -c option ptree prints the contract IDs of the processes. In our example, the sendmail processes run under contract ID 107. With ctstat we can look up the contents of this contract:

# ctstat -vi 107
CTID ZONEID TYPE STATE HOLDER EVENTS QTIME NTIME
107 0 process owned 7 0 - -
cookie: 0x20
informative event set: none
critical event set: hwerr empty
fatal event set: none
parameter set: inherit regent
member processes: 792 794
inherited contracts: none

Contract 107 runs in the global zone. It's a process contract, and it was created by process number 7 (svc.startd). There haven't been any events so far. The contract subsystem should only raise critical events when a process terminates due to a hardware error or when no processes are left. At the moment there are two processes under the control of the contract subsystem (both processes of the sendmail daemon). Let's play around with the contracts:

# ptree -c `pgrep sendmail`
[process contract 1]
1 /sbin/init
[process contract 4]
7 /lib/svc/bin/svc.startd
[process contract 99]
705 /usr/lib/sendmail -bd -q15m -C /etc/mail/local.cf
707 /usr/lib/sendmail -Ac -q15m

You can listen to the events with ctwatch:

# ctwatch 99
CTID EVID CRIT ACK CTTYPE SUMMARY

Okay, open a second terminal window to your system and kill both sendmail processes:

# kill 705 707

After we submit the kill, the contract subsystem reacts and sends an event saying that there are no processes left in the contract.

# ctwatch 99
CTID EVID CRIT ACK CTTYPE SUMMARY
99 25 crit no process contract empty

Besides ctwatch, there was another listener to the event: SMF. Let's look for the sendmail processes again.

# ptree -c `pgrep sendmail`
[process contract 1]
1 /sbin/init
[process contract 4]
7 /lib/svc/bin/svc.startd
[process contract 103]
776 /usr/lib/sendmail -bd -q15m -C /etc/mail/local.cf
777 /usr/lib/sendmail -Ac -q15m

Et voilà, two new sendmail processes with different process IDs and a different process contract ID. SMF has done its job by restarting sendmail. To summarize things: SMF uses contracts to monitor the processes of a service. Based on these events, SMF can take action. By default, SMF stops and restarts a service when any member of the contract dumps core, gets a signal or dies due to a hardware failure. Additionally, SMF does the same when there's no member process left in the contract.
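The default reaction summarized above can be written down as a tiny decision function. This is a hypothetical model of the rule as stated in this article, not svc.startd code; the function name is made up, and "exit" stands for the "process exit" event from the event table.

```shell
#!/bin/sh
# Hypothetical model of the default reaction described above; this is
# not svc.startd code.

reaction_for_event() {
    case "$1" in
        core|signal|hwerr|empty)
            # A member dumped core, got a fatal signal, hit a hardware
            # error, or the contract has no member left: restart.
            echo restart ;;
        *)
            # Anything else (e.g. one member exiting while others keep
            # running) is ignored in this sketch.
            echo ignore ;;
    esac
}

for event in empty core signal hwerr exit; do
    echo "$event -> $(reaction_for_event "$event")"
done
```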

Service State

Fault management brings us to the next important definition. Every service instance has a service state. This state describes a point in the lifecycle of the service:


Service state    Description
degraded    The service runs, but the startup didn't fully succeed, so the service has only limited capabilities
disabled    The service was disabled by the admin, so SMF doesn't attempt to start it
online    The service is enabled and the bring-up of the service was successful
offline    The service is enabled, but it hasn't been started so far because its dependencies are not fulfilled
maintenance    The service didn't start properly and exited with an error code other than 0, for example because of typos in config files
legacy_run    A special service state, used by SMF for services under the control of the restarter for legacy init.d scripts


Each service under the control of SMF has a service state throughout its whole lifetime on the system.
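How the most common states relate to each other can be sketched as a small function. This is a simplified, hypothetical model based only on the descriptions in the state table above: the function name and its three inputs are made up, and the degraded and legacy_run states are left out.

```shell
#!/bin/sh
# Simplified, hypothetical model of the state table above; it is not SMF
# code and leaves out the degraded and legacy_run states.

service_state() {
    # $1 = enabled? (yes/no), $2 = dependencies fulfilled? (yes/no),
    # $3 = exit code of the start method
    if [ "$1" != yes ]; then
        echo disabled
    elif [ "$2" != yes ]; then
        echo offline
    elif [ "$3" -ne 0 ]; then
        echo maintenance
    else
        echo online
    fi
}

service_state no  yes 0
service_state yes no  0
service_state yes yes 1
service_state yes yes 0
```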

Service Configuration Repository

All the configuration of the services is stored in the Service Configuration Repository. It's the central database for the services. This database is backed up and snapshotted on a regular basis. So it's easy to fall back to a known working state of the repository (after you or a fellow admin FOOBARed the service configuration).

Dependencies

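The most important feature of SMF is its knowledge about dependencies. In SMF you can define two kinds of dependency for a service: you can state which services this service depends on, and you can state which services depend on this service.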

This second way to define a dependency has a big advantage. Let's assume you have a new service. You want to start it before another service. But you don't want to change that other service itself (perhaps you need your new service only in one special configuration and the normal installation doesn't need it … perhaps it's the authentication daemon for a hyper-special networking connection ;)). By defining that the other service depends on your service, you don't have to change the other one. I will show you how to look up the dependencies in the practical part of this tutorial.
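The difference between the two directions can be shown with a little shell sketch. The notation below is invented for this example and is not SMF manifest syntax, and the service names are made up. Both kinds of declaration are written inside the definition of the first service and end up as ordinary dependency edges.

```shell
#!/bin/sh
# Hypothetical notation, not SMF manifest syntax. Each input line is one
# declaration made inside the definition of the first service:
#   "A requires B"   -> A declares that it depends on B
#   "A dependent B"  -> A declares that B depends on A (B stays untouched)
# Output: one "X depends-on Y" edge per declaration.

dependency_edges() {
    while read -r svc kind other; do
        case "$kind" in
            requires)  echo "$svc depends-on $other" ;;
            dependent) echo "$other depends-on $svc" ;;
        esac
    done
}

# The made-up auth-daemon service asks to be started before the made-up
# special-link service without editing special-link itself.
dependency_edges <<EOF
auth-daemon requires network
auth-daemon dependent special-link
EOF
```

The second declaration produces the edge "special-link depends-on auth-daemon", although nothing in the definition of special-link was changed.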

Master Restarter Daemon and Delegated Restarter

Okay, now you have all the data. But you need someone to do something with it. For this task you have the SMF Master Restarter Daemon. This daemon reads the Service Configuration Repository and acts accordingly. It starts a service when all its dependencies are fulfilled. By this simple rule all services will be started in the process of booting until there are no enabled services left in the offline state. But not all processes are controlled by the Master Restarter. The Master Restarter can delegate this task to other restarters, which are thus called SMF Delegated Restarter Daemons.
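The simple rule "start a service when all its dependencies are fulfilled" can be sketched as a loop that sweeps over the offline services until nothing more can be started. Everything here is an assumption made for illustration: the service names, the list format and the function names are made up, and the real restarter certainly doesn't work on a text list.

```shell
#!/bin/sh
# Hypothetical model of the rule described above, not svc.startd code.
# Service names and dependencies are made up for illustration.
# One line per enabled service: name, then the services it depends on.
SERVICES='webserver network filesystem ssh
ssh network filesystem
network
filesystem'

is_online() {
    # $1 = service name, $2 = space-separated list of online services
    case " $2 " in
        *" $1 "*) return 0 ;;
        *) return 1 ;;
    esac
}

start_order() {
    # $1 = service list as above; prints the services in start order.
    online=''
    progress=yes
    # Keep sweeping until one full sweep starts nothing new.
    while [ "$progress" = yes ]; do
        progress=no
        while read -r name deps; do
            [ -n "$name" ] || continue
            is_online "$name" "$online" && continue
            ready=yes
            for dep in $deps; do
                is_online "$dep" "$online" || ready=no
            done
            if [ "$ready" = yes ]; then
                online="$online $name"
                progress=yes
            fi
        done <<EOF
$1
EOF
    done
    printf '%s\n' "${online# }"
}

start_order "$SERVICES"
```

A service whose dependency never comes online is simply never started in this model, which corresponds to staying in the offline state.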

Delegated Restarter for inetd services

The most obvious example of such a delegated restarter is inetd, the daemon that starts network daemons only on demand. One important effect of this is a change in the behaviour of inetd: inetd.conf isn't used to control inetd anymore. The Solaris services which were formerly configured using this file are now configured via SMF. So you don't edit inetd.conf to disable or enable an inetd service. You use the same commands as for all other services.

Enough theory

Enough theory, let's do some practical stuff …