A commenter on Hacker News asked what I think about -9. In my opinion, its widespread use is a plague similar to that of the -f switch. And this is pretty easy to explain (I’m simplifying things a bit).
-9 is shorthand for SIGKILL (signal number 9). When you send a SIGKILL to a process, the process is terminated immediately. You can’t catch this signal and you can’t ignore it. A kill with -9 sends this SIGKILL to a process. A kill without -9 sends a SIGTERM to the process. By default it terminates the process just like SIGKILL does. However, a process is allowed to catch SIGTERM in order to execute a signal handler … or to simply ignore it.
A signal handler is nothing more than a code path that is executed when the process receives a signal. So when you kill a process with a normal kill, you give the process the chance to clean up after itself: to make files consistent, to roll back changes if the process isn’t using a transactional mechanism when changing data, to delete temporary files … and so on … It’s good style to write such signal handlers, and in many programming languages it’s pretty easy. For example, in Perl:
When you send a -9 to a process, you take this chance away from it. It’s killed instantly … even if it has just started to modify your files, fscking up your data in order to put it into a new form, even if it has created dozens of temporary files that are filling up /tmp. Things like that …
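To make the difference tangible, here is a small Python experiment, a sketch under assumptions rather than anything from a real application. The child program in `CHILD` and the helper `scratch_left_behind` are invented for this illustration. The child creates a scratch file and removes it in its SIGTERM handler, and the parent checks whether the file is still there after sending a signal:

```python
import os
import shutil
import signal
import subprocess
import sys
import tempfile

# Hypothetical child program: creates a scratch file, removes it on SIGTERM.
CHILD = """
import os, signal, sys, time

path = sys.argv[1]

def cleanup(signum, frame):
    os.remove(path)
    sys.exit(0)

signal.signal(signal.SIGTERM, cleanup)
open(path, "w").close()
print("ready", flush=True)
while True:
    time.sleep(0.1)
"""


def scratch_left_behind(sig):
    """Start the child, send it `sig`, report whether its file survived."""
    workdir = tempfile.mkdtemp()
    path = os.path.join(workdir, "scratch")
    child = subprocess.Popen(
        [sys.executable, "-c", CHILD, path],
        stdout=subprocess.PIPE,
        text=True,
    )
    try:
        child.stdout.readline()  # block until the handler is installed
        child.send_signal(sig)
        child.wait(timeout=10)
        return os.path.exists(path)  # evaluated before the finally block
    finally:
        child.stdout.close()
        shutil.rmtree(workdir, ignore_errors=True)
```

On a Unix-like system `scratch_left_behind(signal.SIGTERM)` returns `False`, because the handler ran and removed the file, while `scratch_left_behind(signal.SIGKILL)` returns `True`: the child never got the chance to clean up.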
Killing a process with -9 is the last resort. However, I see people using it too often and too early. A second after the normal kill is sent, a pgrep for the process follows. Still there? Down comes the sword of -9.
When a process doesn’t disappear immediately after you send the SIGTERM, it may just be busy following your order to terminate and is cleaning things up. When your application depends on scarce resources during cleanup (for example IOPS on your rotating rust), the cleanup may take a while.
The implicit question with any process that doesn’t react to a normal kill via SIGTERM is why it doesn’t react to the signal. Just sending a -9 when a normal kill didn’t work is like saying “I don’t care”. Watching the process with truss or strace to see what the heck it is doing after getting the SIGTERM is a good first step. Perhaps you see some cleanup work and know that you just have to wait a little bit longer. Writing a core dump of the process with gcore is often a good second step, to save evidence for later research into why the process didn’t react.
And then … and only then … a kill -9 may be appropriate.
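The order of steps argued for here can be written down as a short Python sketch. It assumes the process was started by the script itself as a `subprocess.Popen`. The function `stop_process`, its parameters `grace` and `extra`, and the `diagnose` hook are hypothetical names. The hook is the place where you would look at the process with truss or strace and save a core with gcore, and the sketch deliberately does not try to automate that part:

```python
import signal
import subprocess
import sys


def stop_process(proc, grace=30.0, diagnose=None, extra=30.0):
    """Stop `proc` (a subprocess.Popen), using SIGKILL only as a last resort.

    Returns a label naming the step that ended the process.
    """
    proc.send_signal(signal.SIGTERM)  # the normal kill
    try:
        proc.wait(timeout=grace)  # give the cleanup time to finish
        return "SIGTERM"
    except subprocess.TimeoutExpired:
        pass

    if diagnose is not None:
        # Hypothetical hook: find out why the process is still there and
        # save evidence before anything destructive happens.
        diagnose(proc.pid)

    try:
        proc.wait(timeout=extra)  # perhaps it was just slow
        return "SIGTERM after diagnosis"
    except subprocess.TimeoutExpired:
        proc.kill()  # SIGKILL, and only now
        proc.wait()
        return "SIGKILL"
```

The default timeouts are arbitrary placeholders. How long a cleanup may reasonably take depends entirely on the application and the storage underneath it.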