Disclaimer: The individual owning this blog works for Oracle in Germany. The opinions expressed here are his own, are not necessarily reviewed in advance by anyone but the individual author, and neither Oracle nor any other party necessarily agrees with them.
About tuning
Friday, September 23. 2011
Recently I was doing some work on tuning systems. There is something I really hate about this area of computing: tuning scripts. You find them easily on Google, and I find them on customer systems quite often.
Simply said: I hate them. The reasons are simple. For example, I recently found a system with a network tuning script dating back to 2003 or so. The problem: it was meant to increase some of the settings. However, many of them were already higher in the default configuration of current Solaris 10 versions, so the tuning script essentially reduced the parameters and thus reduced performance.
Furthermore, tuning is a lot about understanding things. Understanding how things work together, on a systemic and on an architectural level. How an application loads all the other components. Just dropping a script downloaded from a website found via Google into /etc/init.d is not about understanding things. You have to carefully consider the impact of each change from the default. You have to check for each setting whether it has been overtaken by the years. You have to recheck it with every major update of your environment. You have to recheck it with each new technology you use in your system.
Network tuning scripts dating back to a time when 100 MBit/s was normal and 1 GBit/s was fast aren't necessarily up to the task in a time when 10 GBit/s is fast and InfiniBand IPoIB networks deliver even more. You had to turn different knobs in a time when CPU time was precious: you tuned for minimum CPU utilization. CPU isn't a large factor today; you tune to minimize latency or to maximize throughput. You have to know what you want to aim for, because minimum latency and maximum throughput are often mutually exclusive. Do you want an extreme or a target in between? Just using a script to tune something doesn't lead you through all the thought needed to make really good tuning decisions.
There are some basic rules from my point of view:
kill -9
Tuesday, September 20. 2011
A commenter at Hacker News asked what I think about -9. In my opinion, its widespread use is a plague similar to the -f switch. And this is pretty easy to explain (I'm simplifying things a bit).
-9 is shorthand for SIGKILL. When you send a SIGKILL to a process, the process is terminated immediately. You can't catch this signal, you can't ignore it. A kill with -9 sends this SIGKILL to a process. A kill without -9 sends a SIGTERM to the process. By default it terminates the process just like SIGKILL. However, a process is allowed to catch it in order to execute a signal handler … or to just ignore it. A signal handler is nothing more than a code path that is executed when the process receives a signal.
So when you kill a process with a normal kill, you give the process the chance to clean up behind itself: to make files consistent, to roll back changes in case the process isn't using some transactional mechanism when changing data, to delete temporary files … and so on ... It's good style to write such signal handlers, and in many programming languages, Perl for example, it's pretty easy.
When you send a -9 to a process, you take this chance away from the process. It's killed instantly … even if it has just started to modify your files, fscking up your data in order to put it into a new form, even when it has created dozens of temporary files filling up /tmp. Things like that …
Killing a process with -9 is the last resort. However, I see people using it too often and too early. A second after the normal kill is sent, a pgrep for the process follows. Still there, and the sword of -9 comes down. When a process doesn't disappear immediately after you send the SIGTERM, it may just be busy following your order to terminate itself and is cleaning things up. When your application depends on scarce resources for cleaning up (for example IOPS on your rotating rust), the cleanup may take a while.
The implicit question with any process that doesn't react to a normal kill via SIGTERM is why it doesn't react to the signal. Just sending a -9 when a normal kill didn't work is like saying "I do not care".
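To make the difference between the two signals concrete, here is a minimal sketch in Python (not the author's original example). It starts a hypothetical child process that creates a temporary file and installs a SIGTERM handler to remove it, then stops the child once with SIGTERM and once with SIGKILL. The names `CHILD`, `stop` and `work.tmp`, as well as the five-second grace period, are assumptions made purely for illustration, and the sketch assumes a POSIX system.

```python
import os
import signal
import subprocess
import sys
import tempfile
import textwrap

# Hypothetical child program: creates a temporary file and removes it
# again when SIGTERM arrives. SIGKILL never reaches this handler.
CHILD = textwrap.dedent("""
    import os, signal, sys, time

    tmp = sys.argv[1]
    open(tmp, "w").close()            # stands in for work in progress

    def cleanup(signum, frame):
        os.remove(tmp)                # clean up behind ourselves ...
        sys.exit(0)                   # ... then leave voluntarily

    signal.signal(signal.SIGTERM, cleanup)
    print("ready", flush=True)        # tell the parent the handler is installed
    while True:
        time.sleep(0.1)
""")


def stop(sig, grace=5.0):
    """Start the child, send `sig`, return (exit status, temp file left behind?)."""
    with tempfile.TemporaryDirectory() as workdir:
        tmp = os.path.join(workdir, "work.tmp")
        with subprocess.Popen([sys.executable, "-c", CHILD, tmp],
                              stdout=subprocess.PIPE, text=True) as child:
            child.stdout.readline()   # block until the child reports "ready"
            child.send_signal(sig)
            try:
                # Be patient: the cleanup may take a while.
                status = child.wait(timeout=grace)
            except subprocess.TimeoutExpired:
                child.kill()          # SIGKILL only after the grace period
                status = child.wait()
        return status, os.path.exists(tmp)


if __name__ == "__main__":
    print("SIGTERM:", stop(signal.SIGTERM))
    print("SIGKILL:", stop(signal.SIGKILL))
```

With SIGTERM the child should exit with status 0 and leave no file behind; with SIGKILL the handler never runs, so the status is the negative signal number and the temporary file is still there. The `grace` timeout is the point of the whole exercise: wait first, escalate only afterwards.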
Monitoring the process with truss or strace to see what the heck it is doing after getting the SIGTERM is a good first step. Perhaps you see some cleanup work and know that you just have to wait a little bit longer. Writing a core dump of the process with gcore is often a good second step, to save evidence for later research into why the process didn't react. And then … and only then … a kill -9 may be feasible. In short:
-f
Tuesday, September 20. 2011
I'm following a discussion at the moment in which someone has wreaked some havoc on his data. This discussion inspired me to write this:
-f. The force switch. Personally, I believe -f should be protected by a key that you only get when you can explain the whole subsystem that has such a switch and the reason why you need -f.
Using -f can be legitimate. However, only do it when you know the following 7 things:
This work is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 Germany License.