Some thoughts about IBM's Workload Partitions
Yesterday I ranted about IBM's marketing habits. But kudos to IBM: they have excellent documentation, and they even document the weaknesses of their technologies, so you find very interesting documents about their products. Not that they haven't withdrawn Redbooks in the past, like the one that documented the overhead of mpars, but mostly the Redbooks are a good source of realistic information, and thanks to them it's easy to clear up some of the fuss around some IBM technologies. In this article I want to talk about WPARs. Many people think that this is a cheap rip-off of Solaris Zones. But many of them also think that Live WPAR mobility is a cool feature. When you really read the public documentation, though, much of the fuss appears a little overblown. Let's have a look at Redbook 247431, “Introduction to Workload Partition Management in IBM AIX”:
- The application in a WPAR has to write into a filesystem.
Page 32: “All files that needs to be written by the application must be hosted on an NFS filesystem”
That means: no large applications that need direct raw access to disk, like Oracle, on WPARs, and the speed of your application is limited to the speed of NFS.
- WPARs work with checkpoint/restore. That means: when you want to migrate your application, you have to freeze the WPAR and then checkpoint it. The checkpoint state file is written via NFS to the shared storage. When you restart the WPAR on a different system, the process loads the checkpoint state file from the NFS server and starts the WPAR. The application doesn't run while the snapshot is taken or revived:
“The chkptwpar command captures a snapshot of all tasks executing within one WPAR. It first interrupts all processes so they reach a quiescence point, then stores a copy of the processes context in a state file.”
Does that still sound unproblematic to you? The problem is in the details. Think about an application that allocates 8 GB of memory (your Java application, for example), and take into consideration that most datacenters run on Gigabit Ethernet. Let's assume 80 MB/s per Gigabit Ethernet interface via NFS. Writing the state file would thus take about 100 seconds. Reading it on the target system: another 100 seconds. IBM states:
“The only visible effect for a user of the application is a slightly longer response time while the application is migrating.”
Well, I don't know your users, but more than three minutes of application downtime isn't a slightly longer response time.
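To make the back-of-the-envelope calculation above easy to replay with your own numbers, here is a minimal sketch in Python. The function name `transfer_seconds` is made up for this illustration, the 8 GB and 80 MB/s figures are the assumptions from the text (not measurements), and 1 GB is counted as 1000 MB. It only models the time to write and read the state file over NFS, nothing else the checkpoint or restart may cost.

```python
def transfer_seconds(state_gb: float, throughput_mb_s: float) -> float:
    """Seconds needed to move a state file of state_gb GB at throughput_mb_s MB/s.

    Assumes 1 GB = 1000 MB and that the NFS link is the only bottleneck.
    """
    return state_gb * 1000 / throughput_mb_s


# Assumed figures from the text: 8 GB of process memory, 80 MB/s over Gigabit Ethernet.
write_s = transfer_seconds(8, 80)   # checkpoint: write the state file to the NFS server
read_s = transfer_seconds(8, 80)    # restart: read the state file on the target system
total_s = write_s + read_s          # time during which the application does not run

minutes, seconds = divmod(int(total_s), 60)
print(f"write: {write_s:.0f} s, read: {read_s:.0f} s, "
      f"total: {total_s:.0f} s ({minutes} min {seconds} s)")
```

Plug in a smaller heap or a faster link and the pause shrinks accordingly, but with these assumptions it stays in the range of minutes, not in the range of a slightly longer response time.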
Okay, perhaps you can live with all of this. If you still believe in it, you should read the manual for the Workload Partition Manager. The part about migration compatibility, beginning around page 27, is especially interesting: there are compatibility modes like “inbound/outbound compatible”. An example:
Compatibility testing shows that a WPAR can be relocated from the departure system to the arrival system, but it cannot be relocated back from the arrival system to the departure system.
The first time I heard of it, I thought it was a joke, but it's in IBM's own documentation. It's perfectly possible that you can migrate a WPAR to another system, but not back. If I understand it correctly, all systems have to be patched to exactly the same level. When the patch level is lower on the arrival machine than on the departure machine, you can't migrate to that system. When the patch level is higher on the arrival machine than on the departure system, you can't migrate back to your old system. And the libc has to be the same on both systems in any case. Whoa ... what bullshit.
When I look at the WPAR solution, it looks like a really bad kludge. It's good for the bullet-point list wars. I'm sure that IBM sales reps will run around with WPARs and say “Sun cannot do that”, and maybe some customers who don't like Sun will use it as an argument to their management to explain why they didn't buy Sun. But honestly, I don't think you can really use it for anything practical.
PS: I'm not an AIX expert. If I've got something wrong, please correct me!