Disclaimer: The individual owning this blog works for Oracle in Germany. The opinions expressed here are his own, are not necessarily reviewed in advance by anyone but the individual author, and neither Oracle nor any other party necessarily agrees with them.
Damned axe ...
Sunday, October 30. 2011
Damned axe … is it really that hard to believe that an automatic system in Debian makes a dumb decision based on dumb assumptions (classic BIBO: bullshit in, bullshit out)? It has nothing to do with a distribution butchered by a hosting provider or an error between the keyboard and the chair. I would consider it a bug if it delivered a different result given the input to the system and the ruleset codified in it.
By default, an init.d script in Debian is inserted into the startup sequence by insserv. If you have ever wondered about the comment section at the beginning of the init.d scripts in Debian … it is the input insserv uses to build the dependency graph. This information is used to put the init.d script at the right place in the startup sequence. A number of scripts depend only on the existence of the local filesystem. These run with sequence number 01 and thus first. One of the services started first is syslog, and as you may have noticed in the head of the ssh init.d script, ssh depends on it. So it gets a higher sequence number and is thus started later.

But wait, what has happened to 02 and 03? Each of them is used by a single service. Why do they get this special treatment? It's pretty simple. Both services are flagged as interactive, as they could ask for some user interaction. In this case both might ask for the password of a key. They have to start early to be sure that nothing else can grab and block the tty, so that both scripts are able to show the password dialog. Keep in mind that we are in a state where no getty is running. However, they can't start any earlier, as both rely on syslog. Essentially they are started as soon as possible and before other scripts with the same dependencies. And that's basically a part of the problem. It essentially leads to the situation that apache2 was started before ssh.

Another reader pointed out to me that lighttpd starts after ssh. First, this is only of cursory interest in a situation where the webserver in question is apache. Second, the dependencies are different: the init.d script for lighty has no # X-Interactive: hint, and it has a conditional dependency (obey it when it's installed and configured to run, otherwise ignore it) on fam (file alteration monitor) that the ssh script doesn't have. And fam relies on portmap.
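To make the ordering described above a bit more tangible, here is a minimal sketch in Python. It is not insserv's actual algorithm, just a simplified model under my own assumptions: every service gets a depth from its dependencies, interactive services at a given depth each get a sequence number of their own, and all other services at that depth share the next number. The service named other-interactive is a hypothetical stand-in for the second interactive service, and optional dependencies on services that are not installed are simply ignored.

```python
from dataclasses import dataclass


@dataclass
class Service:
    name: str
    requires: tuple = ()       # names of services that must start first
    interactive: bool = False  # models the "# X-Interactive:" hint


def assign_sequence(services):
    """Return {service name: sequence number} for a simplified startup model.

    Assumes the dependency graph has no cycles.
    """
    by_name = {s.name: s for s in services}
    depth_cache = {}

    def depth(name):
        # Depth 1 = no installed dependencies; otherwise one more than the
        # deepest dependency. Dependencies that are not installed are ignored.
        if name not in depth_cache:
            deps = [d for d in by_name[name].requires if d in by_name]
            depth_cache[name] = 1 + max((depth(d) for d in deps), default=0)
        return depth_cache[name]

    numbers = {}
    next_number = 1
    deepest = max(depth(s.name) for s in services)
    for level in range(1, deepest + 1):
        at_level = sorted(s.name for s in services if depth(s.name) == level)
        # Interactive services go first and each one gets its own number.
        for name in at_level:
            if by_name[name].interactive:
                numbers[name] = next_number
                next_number += 1
        # All remaining services at this depth share a single number.
        others = [n for n in at_level if not by_name[n].interactive]
        if others:
            for name in others:
                numbers[name] = next_number
            next_number += 1
    return numbers


if __name__ == "__main__":
    example = [
        Service("syslog"),
        Service("portmap"),
        Service("apache2", ("syslog",), interactive=True),
        Service("other-interactive", ("syslog",), interactive=True),
        Service("ssh", ("syslog",)),
        Service("fam", ("portmap",)),
        Service("lighttpd", ("syslog", "fam")),
    ]
    for name, number in sorted(assign_sequence(example).items(), key=lambda kv: kv[1]):
        print(f"{number:02d} {name}")
```

In this model apache2 ends up before ssh purely because of the interactive flag, while lighttpd lands after ssh because of the longer chain via fam and portmap.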
Thus the dependency chain is a little bit longer, so it's no wonder that you see lighty starting after ssh. I won't comment on what I'm thinking about such a … a … a … solution ...
It's facepalm time ...
Thursday, October 27. 2011
Surely you've noticed that my blog was down for a few days, and with it all services on the system. The problem that led to this situation was a really dumb one. Perhaps this article is more a story about not thinking about a failure mode just because it's not a problem under your preferred operating system (or to be exact: it was a problem before Solaris 10, but it was solved afterwards). And most of all it's a story about being totally problem-blind in the first moment.
Perhaps I should explain first that c0t0d0s0.org isn't run on Solaris; it uses this-other-unixoid-operating-system in a well-known non-commercial variant. That's the dirty secret of c0t0d0s0.org. There is no technical reason for it, but webserving and mail can be done by any operating system, and thus I used the operating system with ubiquitous availability at almost all providers of dedicated servers. I'm able to migrate the server from one dedicated server provider to another within 2 hours, including moving the data, and I did this three times in the past (from 1&1 to Hetzner, and two times within Hetzner). This has saved quite significant money so far (and that's the basic reason why I don't want donations; when you do donate, I would donate this money to kiva.org).

Hetzner has reasonably priced dedicated servers and I had no problems in the past. However, they have one important shortfall: no serial console in the standard product. When you need a console, you have to make a support call and they connect one. As you need it really seldom, that's okay. As I found out later: With this serial console I would have recognized the problem within a minute and fixed it in a second. However, the console was exactly the thing I didn't have at my disposal at this moment, so it was a lot harder to find out what had happened.

However, I wanted my server back as soon as possible (for personal reasons I was only able to start the recovery in the evening, and as I have a job to do, I could only do the further work in the evenings as well), and thus I just reimaged the server after keeping a copy of the logfiles. I have a quite extensive backup regimen with very regular rsyncs and database replication to my server at home, thus I knew I would perhaps lose just an hour or some minutes of data, and that was okay for me. Additionally, I was able to mount the disks of the non-working installation and to copy the delta of mails between the last backup and the last mail in the queue to my backup.
What had happened: At 10:something my server provider had a large power outage. The UPS didn't take over as planned and thus a lot of servers rebooted. One of them was mine. Damned … but that's the basic reason why I'm a fan of proper enterprise architecture and not of some singular availability features, no matter what marketing tells you. Real availability is hard work and often expensive. But: When you really bet your business on IT, you need an architecture that is even capable of covering for a UPS that proves to be not so uninterruptible. The availability feature UPS may fail (and did fail in my case), but a proper enterprise architecture keeps your service up and running. Even more important: With a proper enterprise architecture you don't need the feature UPS for availability reasons at all, because your service can survive the outage of some parts. Perhaps you want the UPS for other reasons like "don't want the hassle of bringing up all the systems again". But you don't need it with such an architecture to keep your business running. With a properly planned enterprise architecture with servers at two separate sites on different power grids you may forget about the UPS, because a UPS won't help you with a prolonged power outage anyway, for example because of a region-wide blackout. Such an outage may well take out the connectivity too, as it's not that improbable that your local carrier has the same power problem.

Okay: After a while my system worked again and thus I had time to find out what had happened. I knew that the system was still reacting to pings, thus I knew the kernel of this-other-unixoid-operating-system in a well-known non-commercial variant was working. Looking into the logfiles I saw a complete bootup of the kernel, and some of the daemons were starting up ... like acpid for example. However, I couldn't log in via SSH. There were no signs of an ssh daemon startup in the logfiles. The apache was in a half-reacting state.
Port 80 was open but it didn't react to HTTP commands. Out of this I concluded: The kernel and the boot configuration are okay. The bring-up of the services is at least working partially, because otherwise it wouldn't start services at all. And as it reacted on the network, there must have been at least a working boot of the services mandated by rcS.d, as otherwise there would be no networking. The problem must be in apache, which is frozen halfway. And for some other reason ssh isn't started at all. There must have been a major fsckup in the startup of the services.

As I had no console, as explained before, I needed to conclude from the leftovers what had happened. And now I was a little bit puzzled. 5-6 years ago I would have recognized this problem in an instant (because sometimes I've produced ... well ... suboptimal startup scripts) … but today it took a while, because I hadn't fallen prey to such a problem for 5-6 years. It took me a lot more thought to work out what might have happened. When you do one operating system for a living and one as a hobby, you tend to project your mindset of one onto the other, and you don't do justice to this other OS.

As you all may know, Solaris ditched init.d with Solaris 10 in order to introduce SMF (not to forget the equally important features like the contract filesystem and the Fault Management Architecture). One of the nice properties of SMF is that services that aren't interdependent are started in parallel without waiting for one another. This has two advantages: First, the system can start up much faster. Second, a service that is not able to start up can't block the startup of the rest (short of services needed by all others). The init.d concept is different. All services are started in sequence. The sequence is numerical and then alphabetical. That is of course slower, but more important … depending on the way you write your script, a script or binary that hangs or waits for user interaction can block the startup.
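The difference between the two startup models can be sketched in a few lines of Python. This is only a toy model under my own assumptions, not how init or SMF are implemented: the script names and sequence numbers are hypothetical, and a "hanging" service is simply one that never finishes starting.

```python
def start_sequentially(scripts, hanging):
    """init.d style: run (sequence number, name) pairs in numerical, then
    alphabetical order. The first hanging script blocks everything after it."""
    started = []
    for _number, name in sorted(scripts):
        if name in hanging:
            break  # the boot sequence never gets past this script
        started.append(name)
    return started


def start_by_dependency(requires, hanging):
    """Dependency style: a service starts as soon as everything it requires
    is up, so a hanging service only blocks the services that depend on it."""
    started = set()
    progress = True
    while progress:
        progress = False
        for name, deps in requires.items():
            if name in started or name in hanging:
                continue
            if all(dep in started for dep in deps):
                started.add(name)
                progress = True
    return sorted(started)


if __name__ == "__main__":
    # Hypothetical example: apache2 waits forever for a key password.
    print(start_sequentially([(2, "acpid"), (2, "apache2"), (2, "ssh")], {"apache2"}))
    print(start_by_dependency({"acpid": [], "apache2": [], "ssh": []}, {"apache2"}))
```

In the sequential model everything sorted after the hanging script is lost, while in the dependency model only services that actually require the hanging one stay down.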
The variant of this-other-unixoid-operating-system is using init.d. And it's quite easy to block a service, for example by integrating a new SSL key and certificate. My key had a password and apache was asking for this password in order to start up. Exactly this happened. Acpi started up because it was started before Apache (guess what: ACpi is before APache in alphabetical order, and way before Ssh). This is the basic reason why you strip the password from your key. Guess what I did last week: I put a new key and certificate on my server and I forgot to strip the password from it. And in my version of this-other-unixoid-operating-system the ssh daemon is started after the apache daemon. When Apache waits for something, you won't get SSH. Damned … it's facepalm time.

Basically I fell prey to a beginner's error because I'm working with an operating system that reacts totally differently in such situations. On Solaris such a situation just doesn't matter at all … you get at least your ssh login, and the system non-availability is just a service non-availability you can fix within a second. However, given the init.d system of this-other-unixoid-operating-system, the outcome was somewhat more problematic. However: What a dumb error on my side ... However: The reinstallation wasn't that bad ... the system could use a reinstallation because of some tests and experiments. So it was worth the work in the evenings. And on the other side: Who had the glorious idea to start apache before ssh?

That said, this-other-unixoid-operating-system in newer variants has a different startup mechanism (upstart) to start up all the services. My heating control is running on a beagleboard-XM at the moment, using a really current variant of this-other-unixoid-operating-system released just a few days ago. It uses this new startup mechanism. And it's just my unimportant personal preference that doesn't matter: But I don't like it. And I have a lot of reasons for it.
It looks by far too much designed for desktop needs. However, my dislike would require an article I won't write in this blog nowadays. But as I wrote: That's my personal opinion that doesn't matter. However, it's really important that this-other-unixoid-operating-system gets away from the old init.d mechanism to something more current. I think in 2011 every operating system deserves something more functional, something better than init.d … init.d is simple and well understood, however it creates classes of problems that are unnecessary today. Especially: in order to keep die-hard Solaris admins from falling prey to such a beginner's error, because such problems are part of their distant past. And now I will start to cut holes for my eyes into the brown paper bag for my head.
On Amazon, too ....
Friday, October 7. 2011
With a link to apple.com, too. Directly on the front page ...
Hungarian sort
Wednesday, October 5. 2011
BTW ... the Hungarian folk dancers explain other sort algorithms as well: merge-sort, insert-sort, bubble-sort, shell-sort and select-sort
Dance the quicksort
Wednesday, October 5. 2011
I think with a few words as an introduction and this video everybody should understand Quicksort. They just dance the algorithm:
Nice example for the power of boot environments
Tuesday, October 4. 2011
Here is a nice example of the power of boot environments. Boot environments are something like snapshots of your operating system installation made writeable. As you may already assume, they are based on ZFS snapshots and the clone functionality. This is possible because ZFS is used as the root filesystem.
So: Please don't try this at home. When you try it, don't try it on any Solaris 11 Express installation of any value. But don't try it. I don't want to hear any story that you've deleted your ERP system by accident because you used the wrong terminal window. Leave that to trained professional stunt admins with the right equipment (Solaris 11 Express).

Assume you have a system configured with all your applications, and everything is running fine. So you think it would be nice to have something like a frozen state of this situation. No problem. A single command will do the trick. When you reboot your system, you will see it as a new entry in the grub menu. Okay, but boot into the old environment starting with "Oracle Solaris ..." first by selecting it in the grub menu (it should already be selected, unless you have used beadm activate already).

Now I will drop the atomic bomb on your installation. Essentially we've just nuked the installation. After a moment the system should just freeze. Reset the system and boot again via grub into the boot environment starting with "Oracle Solaris ...". Okay ... on a normal system this would send you to the tapes. With Solaris 11: Reset the system. Boot into the boot environment "rescuenet" by selecting it in grub. Tada!

Just creating a boot environment with a single command after a config change may save your butt later ... and btw ... this even works in zones ... they know the concept of boot environments, too.
Posted by Joerg Moellenkamp in English, Solaris, Sun/Oracle at 20:06
eSTEP blog
Tuesday, October 4. 2011
You may have noticed that there are no product announcements for Oracle products on this page, even though there are now a lot of announcements that I have been waiting for for a long time. And I will keep it this way. So in case you want information about announcements, you have to look for them at other locations. I want to draw your attention to a blog by my colleagues working in the eSTEP (EMEA Systems Technology Enablement Program) program. They have just started it. So I would like to ask for your kind attention for: "The official eSTEP blog"
How to activate IPoIB Connected mode in Solaris 10 Update 9
Monday, October 3. 2011
Just a short hint: The What's New document of Solaris 10 Update 9 states that support for IPoIB Connected Mode has been added in this release. However, you have to search a bit to find information on how to activate it. The necessary step is documented in the manpage for the ibd driver. Let's assume you have two instances of the ibd driver running (ibd0 and ibd1). In this case you have to change one line at the end of the /kernel/drv/ibd.conf file to enable_rc=1,1; and reload the ibd driver or reboot the system. After that your ibd devices should show an MTU size of 65520 bytes instead of 2044.

PS: The process for Solaris 11 is better, as you just use dladm for it. However, connected mode is the default there anyway. In Solaris 10 unreliable datagram was kept as the default, as one of the rules in Solaris is that you have to opt in to such changes between updates.
About tuning
Friday, September 23. 2011
Recently I was doing some work in regard to tuning systems. There is something I really hate about this topic of computing: tuning scripts. You find them on Google easily, and I find them on systems at customers quite often.
Simply said: I hate them. The reasons for it are simple. For example, recently I found a system with a network tuning script dating back to 2003 or so. The problem: It was meant to increase some of the settings. However, many of them were already higher in the default config of current Solaris 10 versions, thus the tuning script essentially reduced the parameters and thus reduced the performance.

Furthermore: Tuning is a lot about understanding things. Understanding how things work together, on a systemic, on an architectural level. How an application loads all the rest of the components. Just dropping a script downloaded from a website found via Google into /etc/init.d is not about understanding things. You have to carefully consider the impact of each change from the default. You have to check for each setting whether it hasn't already been overtaken by the years. You have to recheck it with every major update of your environment. You have to recheck it with each new technology you are using in your system. Network tuning scripts dating back to a time when 100 MBit/s was normal and 1 GBit/s was fast aren't necessarily up to the task in a time when 10 GBit/s is fast and Infiniband IPoIB networks deliver even more.

You had to turn different knobs in a time when CPU time was precious. You tuned for minimum CPU utilization. CPU isn't a large factor today; you tune to minimize latency or to maximize throughput. You have to know what you want to aim for, because minimum latency and maximum throughput are often mutually exclusive. Do you want an extreme or a target in between? Just using a script to tune something doesn't lead you through all the thoughts needed to make really good tuning decisions. There are some basic rules from my point of view:
kill -9
Tuesday, September 20. 2011
A commenter at Hacker News asked what I think about -9. In my opinion: Its widespread use is a plague similar to the -f switch. And this is pretty easy to explain (I'm simplifying things a bit).
-9 is a shorthand for SIGKILL. When you send a SIGKILL to a process, the process is terminated immediately. You can't catch this signal, you can't ignore it. A kill with -9 sends this SIGKILL to a process. A kill without -9 sends a SIGTERM to the process. By default it terminates the process like SIGKILL. However, a process is allowed to catch it in order to execute a signal handler … or just to ignore it. A signal handler is nothing more than a code path that is executed when the process receives a signal. So when you kill a process with a normal kill, you give the process the chance to clean up behind itself, to make files consistent, to roll back changes in case the process isn't using some transactional mechanism when changing data, to delete temporary files … and so on ... It's good style to write such signal handlers, and in many programming languages it's pretty easy, for example in Perl.

When you send a -9 to a process, you take this chance away from the process. It's killed instantly … even if it has just started to modify your files, fscking up your data while putting it into a new form, even when it has created dozens of temporary files filling up /tmp. Things like that …

Killing a process with -9 is the last resort. However, I see people using it too often and too early. A second after the normal kill is sent, a pgrep on the process follows. Still there, and the sword of -9 comes down. When a process doesn't disappear immediately after you've sent the SIGTERM, it may just be busy following your order to terminate itself and is cleaning things up. When your application depends on precious resources for cleaning up (for example IOPS on your rotating rust), the process of cleaning up may take a while. The implicit question with any process that doesn't react to a normal kill via SIGTERM is why it doesn't react to the signal. Just sending a -9 when a normal kill didn't work is like saying "I do not care".
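As an illustration of the kind of signal handler meant above, here is a minimal sketch in Python rather than Perl. The class name GracefulWorker and the scratch file are hypothetical; the only point is that SIGTERM is caught, the work loop is allowed to finish its current step, and the temporary file is removed before the process ends. A SIGKILL would give none of this code a chance to run.

```python
import os
import signal
import tempfile


class GracefulWorker:
    """Does work in small steps and cleans up when it receives SIGTERM."""

    def __init__(self):
        self.stop_requested = False
        # A scratch file standing in for the temporary data a real job creates.
        fd, self.scratch_path = tempfile.mkstemp(prefix="worker-")
        os.close(fd)
        # Must be called from the main thread of the process.
        signal.signal(signal.SIGTERM, self._on_sigterm)

    def _on_sigterm(self, signum, frame):
        # Keep the handler tiny: only record that we were asked to stop.
        self.stop_requested = True

    def cleanup(self):
        # Remove the scratch file so nothing is left behind in the temp dir.
        if os.path.exists(self.scratch_path):
            os.remove(self.scratch_path)

    def run(self, step):
        """Call step() repeatedly until SIGTERM arrives, then clean up."""
        try:
            while not self.stop_requested:
                step()
        finally:
            self.cleanup()
```

The handler itself only sets a flag; the actual cleanup happens in the normal control flow, which keeps the signal handler short and avoids doing file operations in the middle of an arbitrary step.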
Monitoring the process with truss or strace to see what the heck it is doing after getting the SIGTERM is a good first step. Perhaps you see some cleanup work and know that you just have to wait a little bit longer. Writing a core dump of the process with gcore is often a good second step, to save evidence for future research into why the process didn't react. And then … and only then … a kill -9 may be feasible. In short:
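As a rough sketch of that order of operations (SIGTERM first, patience, SIGKILL only as the last resort), here is a small Python helper. The function name stop_gracefully and the default grace period are my own assumptions; the looking-at-it-with-truss and gcore steps are deliberately not automated, because they need a human who is curious about why the process doesn't react.

```python
import subprocess


def stop_gracefully(proc, grace_seconds=30.0):
    """Ask a child process to terminate and give it time to clean up.

    Returns "terminated" if the process exited within the grace period after
    SIGTERM, or "killed" if SIGKILL was needed in the end.
    """
    proc.terminate()  # sends SIGTERM on POSIX systems
    try:
        proc.wait(timeout=grace_seconds)
        return "terminated"
    except subprocess.TimeoutExpired:
        # This is the point where you would rather look at the process
        # (truss/strace, gcore) before reaching for the last resort.
        proc.kill()  # sends SIGKILL; the process gets no chance to clean up
        proc.wait()
        return "killed"
```

A grace period of a second is exactly the impatience criticized above; pick it according to how long the cleanup of your application can reasonably take.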
-f
Tuesday, September 20. 2011
I'm following a discussion at the moment where someone has wreaked some havoc on his data. This discussion inspired me to write this:
-f. The force switch. Personally I believe -f should be protected by a key that you only get when you can explain the whole subsystem that has such a switch and the reason why you need -f.
-f is feasible. However, just do it when you know the 7 following things:
Froscon 2011
Sunday, August 21. 2011
Yesterday I gave my deduplication talk at Froscon 2011. I think it was acceptable, and lecture room 3 was really full. To be honest: I didn't expect such an audience at that time. So a big thank you to all who attended the talk.
My talk started at 10:00, and thanks to apron parking, a rental car pickup in what felt like the middle of nowhere in Düsseldorf and a larger traffic jam near Cologne, I arrived at 09:58 … with a presentation at 10:00 and the urgent need to visit the restroom before a disaster happens. I had just a short time at Froscon … I had a date with my house, which still needs work and doesn't accept it when I'm saying "No … not this weekend". And thus I was back in the rental car at 11:45 and back in Lueneburg shortly after 15:00, ten hours after leaving home. 3:30 h for 430 km … not that bad for an Opel Corsa. However, there is something I've realized: I'm really missing standing in front of customers and technically interested people and trying to pass on my enthusiasm for technology. I'm really missing being on the road.
Hunting red herrings
Monday, August 15. 2011
Sometimes you "know" the problem from the first moment. But sometimes your gut feeling points to a fix that is perceived as a large change, so you have to find the smoking gun, the undeniable proof for your hypothesis.
This is the story of such a search. It started with a telephone call from a colleague, who had got my name from another colleague. An Oracle database was running on a Solaris system, with the datafiles and logs located on a Veritas File System. The customer saw massive delays (in the range of hundreds of seconds) when executing certain commands. One of the commands was "truncate table".

A hypothesis - but the proof?

And in the beginning it started with a red herring. In this case the thread is trying to execute an operation on a semaphore, but it isn't able to do so. However, the semtimedop is timebombed: when the timeout is reached without it being able to operate on the semaphore, it terminates with error 11 (EAGAIN). All the timeouts were consistent with the waiting time seen from the perspective of the SQL commands. Obviously the customer and the other involved parties were tempted to see this as the problem, but already suspected that this might be just the harbinger of bad news. And after a short look into the truss files, I was pretty sure that they were right with their doubts. It was just the harbinger of bad news.

After a short amount of research I suspected that we were talking about a locking problem here. There was just one problem: vxfs. First, I have seldom worked with it, thus it's not really my center of expertise. One point that diverted the attention of the customer from the locking stuff is a small but important difference: The customer knew that Oracle likes Direct I/O. With UFS, "Direct I/O" does a little bit more than just making the I/O direct by disabling buffering. It also removes the inode r/w lock mandated by POSIX rules. The customer knew that about UFS Direct I/O and thus activated Direct I/O on vxfs.
And thus I found lines like
/oracle/importantdatabase/oradata1 on /dev/vx/dsk/importantdatabase/oradata1 read/write/setuid/devices/mincache=direct/convosync=direct/delaylog/largefiles/ioerror=mwdisable/mntlock=VCS/dev=51836b0 on Thu Mar 17 20:14:11 2011
However, I still suspected a lock contention problem, and I had a reason for it: Direct I/O in vxfs isn't the same as in UFS. In vxfs, Direct I/O is really just the direct part. It doesn't enable concurrent I/O (I'll explain that moniker later) to a file. The removal of the inode r/w lock isn't part of the feature. You have to use either Quick I/O (QIO) or the ODM module for vxfs. As neither feature was activated, that was the moment where I told the customer "Hey, choose the ODM module for vxfs or QIO, activate it and the problem should go away". Both remove that lock contention and thus are of big help in getting better Oracle performance when using vxfs. Just to prevent a misunderstanding: ODM (Oracle Disk Management) is an API of Oracle, not of Veritas. Oracle's DNFS (direct NFS) is implemented via an ODM module as well. The problem: You used to pay extra for both with vxfs, neither of them is really cheap, and before making the change the customer wanted to know that I was right with my diagnosis (according to the release notes, ODM and QIO are now part of the SF except in basic).

I wrote of two problems but have described just one so far. The second: Normally, inode rwlock contention problems are quite easy to find. But not in this case. vxfs is different from UFS in a multitude of ways. It doesn't use the locking primitives of Solaris but has its own instead. And thus all values reported by my preferred diagnosis tools were pretty useless. Damned … how should you find problems when your instruments can't show them? Without instrumentation, troubleshooting is just guesswork and experience.
At this point a question on a mail alias (it's great to have people on internal aliases who have forgotten more about Solaris than I know) and some research via Google yielded the same result within a few minutes: vxfs`vx_rwsleep_rec_lock is the function waiting on/implementing the POSIX inode rw lock. Now I was back in the game and able to use all the nice things of the operating system I prefer.

Digging in the dirt

I asked the customer to put a dtrace script into a script that is executed at the moment of the wait. The result was interesting, as it clearly showed a peak of 307 events in the range of 34359738368 nanoseconds (34.36 seconds) to 68719476735 nanoseconds (68.72 seconds). This was especially interesting as the same dtrace script didn't show such a peak during times when the system ran flawlessly.

Okay ... well ... next step ... which parts of the system were executing this vxfs`vx_rwsleep_rec_lock function? I could have used dtrace for this task as well, but I wanted some additional insight in one step. Thus I used a nice little command of the modular debugger in Solaris:
# echo "::threadlist -v" | mdb -k
The output is quite long on a loaded Solaris system. It prints several lines for each thread. I hate multi-line output when searching for patterns. There is nothing better than two monitors, a terminal stretched across both and the two jelly-like grep implementations on the front side of your skull. But this works best if one event is in just one line. So I did some grep/sed-fu on it. Each thread is now in a single line. Yeah … perhaps there is a more elegant way to do this, but that was the first that came to my mind.

Just a quick check: At the moment of the hang, 1008 processes were in vxfs`vx_rwsleep_rec_lock. That was interesting. Even more interesting was the list of commands that had threads in the mentioned function.
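A side note on the odd-looking range quoted above for the dtrace result: the boundaries are simply powers of two, which is how a power-of-two histogram such as the one produced by DTrace's quantize() groups values. A few lines of Python reproduce the bucket and the rounded seconds; the helper name quantize_bucket is my own and not part of any DTrace tooling.

```python
def quantize_bucket(value_ns):
    """Return (low, high) of the power-of-two bucket containing value_ns."""
    if value_ns <= 0:
        raise ValueError("only positive durations are handled in this sketch")
    low = 1 << (value_ns.bit_length() - 1)  # largest power of two <= value
    high = (low << 1) - 1                   # last value before the next bucket
    return low, high


if __name__ == "__main__":
    low, high = quantize_bucket(40_000_000_000)  # a wait of 40 seconds
    print(low, high)
    print(round(low / 1e9, 2), round(high / 1e9, 2))
```

So every wait between roughly 34 and 69 seconds lands in this one bucket; the histogram tells you the order of magnitude, not the exact duration.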
It's column 10 in the threadlist in its concatenated form. When you dig further down into the large heap of data: Of all the threads belonging to ora_dbwriter3_importantdatabase, just seven weren't in the vx_rwsleep_rec_lock function. At that moment I thought: That isn't a smoking gun, that's a smoking howitzer.

An attempt to explain

Most threads executing this function are part of the database writers. When you think about it, that's not so astonishing, especially when you think about the nature of an rwlock. First: There is an rwlock for each inode in a filesystem. Its function: Multiple readers can get the lock and so they can read concurrently from the file, but just one writer is able to hold it and thus to write into the file. Equally important: You can't write to the file as long as one or more readers are in the codepath protected by the rwlock for this file, and no one can read from the file as long as there is a writer in the protected codepath.

In really basic rwlock implementations this can lead to writer starvation, as it's hard for the writer to get the lock, because all readers have to relinquish the rwlock and no new readers should start before the writer can get the lock. For this reason, the Solaris threads implementation tends to favour writers over readers. However, when you have many writers, it may take a long time before the backlog of writes has been worked through. Blindly preferring writers is not a solution either, because then readers would starve, which is even more problematic, because reads are always synchronous by nature. As I wrote elsewhere: While a system can choose the time of a physical write to some extent, it can't choose the time of a read. A function won't execute as long as the data isn't available. But that's out of scope of this article. For the capability to write and read in parallel to a file, the name concurrent I/O was coined. I just wrote that it can take a moment before the backlog of writes has been executed.
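The rules of such an rwlock can be written down as a tiny state model. The following Python sketch is not the Solaris or vxfs implementation, just a non-blocking toy under my own assumptions: methods return True or False instead of putting a thread to sleep, and a queued writer holds back new readers to model the writer preference mentioned above.

```python
class RWLockModel:
    """Non-blocking model of a per-inode reader/writer lock."""

    def __init__(self):
        self.readers = 0           # number of readers currently inside
        self.writer_active = False
        self.writers_waiting = 0   # writers that have announced themselves

    def try_read(self):
        # New readers are refused while a writer is active or waiting.
        if self.writer_active or self.writers_waiting:
            return False
        self.readers += 1
        return True

    def release_read(self):
        self.readers -= 1

    def queue_writer(self):
        # A writer announces that it wants the lock (writer preference).
        self.writers_waiting += 1

    def try_write(self):
        # Only one writer, and only when no reader is inside.
        if self.writer_active or self.readers:
            return False
        self.writer_active = True
        if self.writers_waiting:
            self.writers_waiting -= 1
        return True

    def release_write(self):
        self.writer_active = False


if __name__ == "__main__":
    lock = RWLockModel()
    print(lock.try_read(), lock.try_read())  # two readers at the same time
    print(lock.try_write())                  # the writer has to wait
```

Because there is exactly one such lock per file, every write to the file goes through this single gate, no matter which block of the file it touches.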
In this case it was even worse: The inode r/w lock adds insult to injury, because it basically limits you to a single write I/O operation at a time per file, no matter how many HBAs or how many disks you have in your system. And now you've made a while out of a moment. This holds even when the changes in the file are totally unrelated, e.g. changing a block belonging to the user table stored in it and another block in the article database, or reading a block containing the customer database into the SGA while writing the new salary for the promoted assistant. You can't do this in parallel due to the inode rw lock. And with many updates in your workload it's not that astonishing that an increasing number of database writer threads start twiddling their thumbs while waiting for their turn to write to the file.

You may ask yourself why the heck there is such a mechanism. The r/w lock is mandatory in order to be POSIX compliant. You need it to ensure write ordering and consistent reads when updates occur in parallel to reads. Obviously you really want such a protection when working with files. However, especially with databases, a file is just a container for a large heap of things. Independent things. And then things are different. For this reason there were some developments in the database realm to get rid of the inode rwlock and put this mechanism elsewhere. Oracle allows you to use a raw disk, so it has to do the consistent read and write ordering stuff anyway, and as it's aware of the inner structure of the heaps of data, it can do it with a much finer granularity than just per inode and thus per file. The inode r/w lock is just a bottleneck without any use in this case. For this reason Direct I/O of UFS, for example, offers a mode that removes the lock. It's not that those write ordering and consistency protections are gone.
They are just in a layer that knows more about the structure inside the file and thus can do a better job of it. vxfs knows similar mechanisms. QIO and ODM don't have such inode-wise locking. They work differently from UFS direct I/O, but as an earlier chancellor of the Federal Republic of Germany said: Outcome matters.

One question was still open: Why was this problem reproducible by a "TRUNCATE TABLE" command? That's pretty easy, however you have to dig deep into the internals of Oracle. When Oracle executes a TRUNCATE TABLE command, it checkpoints the database. In such a situation it writes all dirty blocks from the SGA into the database datafiles. This must be done for recovery purposes. Such checkpointing may trigger a storm of writes via the database writer, especially when you have an SGA with a lot of dirty blocks. The checkpoint has to complete before the TRUNCATE TABLE executes. And so we are at another red herring at the end: It's not the TRUNCATE TABLE command that was slow ... it's the checkpoint occurring before it. You can check this pretty easily: When a "TRUNCATE TABLE" takes too long for your taste, trigger a checkpoint manually and do the TRUNCATE TABLE directly afterwards. TRUNCATE TABLE still does a checkpoint, but as you've already cleaned the dirty buffers out of the SGA, it doesn't have to do much writing. It should run much faster now.

Conclusion

At the end I had to tell the customer that in essence everything works as designed. It would be a bug if the system acted just a little bit differently. However, that's seldom the answer a customer wants. So: The solution for the issue? It's as old as it's easy. Get rid of the inode rwlock. Get concurrent I/O: either by using raw disks, by using ASM, by using UFS, or by using ODM or QIO for vxfs. I can just reiterate something I've already said: When you put your Oracle database files into a filesystem, you want to use direct I/O and concurrent I/O!
This work is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 Germany License