Friday, October 02, 2015

Notifications from SMF (and FMA)

In illumos, it's possible to set the system up so that notifications are sent whenever anything happens to an SMF service.

Unfortunately, illumos has essentially no documentation for this, although the Solaris documentation on the subject should be accurate.

You can see the current notification state by running svcs -n:

Notification parameters for FMA Events
    Event: problem-diagnosed
        Notification Type: smtp
            Active: true
            reply-to: root@localhost
            to: root@localhost

        Notification Type: snmp
            Active: true

        Notification Type: syslog
            Active: true

    Event: problem-repaired
        Notification Type: snmp
            Active: true

    Event: problem-resolved
        Notification Type: snmp
            Active: true

The thing to realize here is in that first line: these are the notifications sent by FMA, not SMF. The two are related, of course: if an SMF service fails and ends up in maintenance, then an FMA event is generated, and notifications are sent according to the above scheme.

(By the way, the configuration for this comes from the svc:/system/fm/notify-params:default service, which you can see the source for here. And you'll see that it matches what I've shown above.)

Whether you actually receive the notifications is another matter. If you have syslogd running, which is normal, then you'll see the syslog messages ending up in the log files. Getting the email or SNMP notifications relies on additional services. These are

service/fault-management/smtp-notify
service/fault-management/snmp-notify

and if these are installed and enabled, they'll send the notifications out.
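
A quick way to check whether those delivery daemons are actually running is to ask svcs for their state. This is a minimal sketch: the two FMRIs are my assumption of how the packages above register themselves in SMF, so confirm them with svcs -a on your own system first.

```shell
#!/bin/sh
# Report the SMF state of the notification delivery daemons.
# ASSUMPTION: the FMRIs below are a guess at what the smtp-notify and
# snmp-notify packages deliver; verify with `svcs -a | grep notify`.

notify_state() {
    # Prints the state of the given FMRI, "not installed" if svcs
    # does not know it, or a marker if svcs itself is missing.
    if command -v svcs >/dev/null 2>&1; then
        state=$(svcs -H -o state "$1" 2>/dev/null) || state=
        echo "${state:-not installed}"
    else
        echo "unknown (svcs not found)"
    fi
}

for fmri in svc:/system/fm/smtp-notify:default \
            svc:/system/fm/snmp-notify:default; do
    echo "$fmri: $(notify_state "$fmri")"
done
```

Anything other than online means that channel will not deliver, even though svcs -n shows it as active.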

You can also set up notifications inside SMF itself. There's a decent intro available for this feature, although you should note that illumos doesn't currently have any of the man pages referred to. This functionality uses the listnotify, setnotify, and delnotify subcommands to svccfg. The one thing that isn't often covered is the relationship between the SMF and the FMA notifications - it's important to understand that both exist, in a strangely mingled state, with some non-obvious overlap.

You can see the global SMF notifications with

/usr/sbin/svccfg listnotify -g

This will come back with nothing by default, so the only thing you'll see is the FMA notifications. To get SMF to email you if any service goes offline, use

/usr/sbin/svccfg setnotify -g to-offline mailto:admin@example.com

and you can set this up at a per-service level with

/usr/sbin/svccfg -s apache22 setnotify to-offline mailto:webadmin@example.com
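
If you have more than a couple of these to set up, a small wrapper keeps the global and per-service forms consistent. The sketch below is illustrative only: notify_on and run are hypothetical helper names, the addresses are placeholders, and the comma-separated transition list follows the Solaris documentation, so treat it as an assumption on illumos. By default it only prints the commands; set DRY_RUN=0 to execute them.

```shell
#!/bin/sh
# Sketch: build svccfg setnotify commands for global or per-service use.
# Prints the commands unless DRY_RUN=0 is set in the environment.
DRY_RUN=${DRY_RUN:-1}

run() {
    # Echo the command in dry-run mode, otherwise execute it.
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# notify_on SERVICE TRANSITIONS ADDRESS
# Pass -g as SERVICE to set the global (all services) notification.
notify_on() {
    if [ "$1" = "-g" ]; then
        run /usr/sbin/svccfg setnotify -g "$2" "mailto:$3"
    else
        run /usr/sbin/svccfg -s "$1" setnotify "$2" "mailto:$3"
    fi
}

# Global: mail the admin when any service drops into maintenance.
notify_on -g to-maintenance admin@example.com
# Per service: ASSUMPTION that a comma-separated transition set is
# accepted, as described in the Solaris svccfg documentation.
notify_on apache22 to-offline,to-maintenance webadmin@example.com
```

Running svccfg listnotify again afterwards (with -g, or with -s and the service name) should show what was actually recorded.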

Now, the SMF notifications can be configured in a very granular manner: you can turn them on and off by service, you can control exactly which state transitions you're interested in, and you can route individual notifications to different destinations. When it comes to the FMA notifications, all you have is a big hammer. It's all or nothing, and you can't be selective about where notifications end up (beyond the smtp vs snmp vs syslog channels).

This is unfortunate, because SMF isn't the only source of telemetry that gets fed into FMA. In particular, the system hardware and ZFS will generate FMA events if there's a problem. If you want to get notifications from FMA if there's a problem with ZFS, then you're also going to get notified if an SMF service breaks. In a development environment, this might happen quite a lot.

Perhaps the best compromise I've come up with is to have FMA notifications disabled in a non-global zone, and configure SMF notifications explicitly there. Then, just have FMA notifications in the global zone. This assumes you have nothing but applications in zones, and all the non-SMF events will get caught in the global zone.
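
As a rough sketch of that compromise, the following would be run inside a non-global zone. It assumes that svccfg delnotify accepts an FMA event class in the same way setnotify does in the Solaris documentation, which I have not confirmed on illumos; zone_notify_plan and run are hypothetical helper names and the address is a placeholder. It prints the commands by default so you can review them; set DRY_RUN=0 to apply.

```shell
#!/bin/sh
# Sketch: inside a non-global zone, drop the FMA notification settings
# and rely on an explicit SMF notification instead.
# Prints the commands unless DRY_RUN=0 is set in the environment.
DRY_RUN=${DRY_RUN:-1}

run() {
    # Echo the command in dry-run mode, otherwise execute it.
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# zone_notify_plan ADDRESS
zone_notify_plan() {
    # ASSUMPTION: delnotify takes an FMA event class, per the Solaris docs.
    # The three classes are the ones shown by svcs -n earlier in this post.
    for class in problem-diagnosed problem-repaired problem-resolved; do
        run /usr/sbin/svccfg delnotify "$class"
    done
    # Explicit SMF notification for anything landing in maintenance.
    run /usr/sbin/svccfg setnotify -g to-maintenance "mailto:$1"
}

zone_notify_plan appadmin@example.com
```

The global zone is left alone, so hardware and ZFS events still go out through the default FMA settings there.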
