Dissecting a TCO study

There are lies, obvious nonsense, and this TCO study. Or, to put it differently: sometimes a study has an obvious bias, perhaps introduced by the people who ordered it. But let’s start at the beginning. A few days ago I got a TCO study named “The Business Value of Novell SUSE Linux Enterprise Server on HP ProLiant x86 Servers vs. Sun Solaris 10 on Sun Servers”. You can find it at various locations via Google … no need to provide a direct link to this pile of … well … I don’t know what it is. I hope that after reading my article you will have an idea where to put this “study”. Just to make it clear: I don’t want to discuss the HP servers or HP (albeit I feel as naughty about recently buying a ProCurve switch as other people feel about downloading a porno video ;) ). I just want to show that this TCO study is total nonsense. I simply have the strong feeling that this “study” was ordered by HP. Otherwise many of its statements are hard for me to explain.
So … let’s dissect this “study”. I will refer to pages and parts of the document and write down my thoughts on each:

  • in general: Why do they use the T5440 instead of the X4600/X4640 to compare SLES to Solaris 10? Would be a more obvious comparison...
  • in general: this document makes the most basic error: it writes "Savings with SLES" whereas "Savings by migrating from SPARC to x86 and from Solaris to SLES" would be more correct. The paper wants to give the impression that the cost savings are a property of SLES, but in reality they are a property of the OS, the hardware and a vast number of errors in the paper, as I will explain later on. In the end it's a classic technique in such migration studies, like comparing the costs of an 8-year-old Sun U450 with the costs of a brand-new Nehalem server.
  • Page 7: The matching of the business case scenario to the hardware is somewhat strange. They talk about using the T5440 and the DL785, but they use just half of the cores in each system. The "study" lacks an explanation for this, so I have to speculate:
    • They want to ensure that one system can handle the additional load of the other system in case of a failure. But in this case two two-node clusters would be a more reasonable choice, as you don't need the high-end systems of the respective system classes.
    • Another explanation can be derived from a statement on page 1. They write at the beginning of the "study":
      "A customer case study was developed for the migration of SAP running in 4 virtual partitions from Sun Solaris to a comparable HP or Sun solution"
      Thus the mentioned 16 (Sun) and 24 (HP) active cores may be just for a single SAP instance, not for all 4 instances. But this mandates the use of virtualisation, which has quite an impact on performance when you use Linux, as you have to use Xen or VMware. With containers you don't have a significant impact. Thus you can have serious doubts about the comparative sizing. I just want to remind you of the case where a Sun system running Solaris 10 with Containers yielded 36% better performance than SLES on VMware running on a roughly identical Fujitsu system.
    • Perhaps the second point explains the large headroom included in this sizing. A T5440 has roughly twice the performance of the E6900, given a thread-rich environment. Let's assume that the DL785G6 has the same headroom. With that much room to grow, the virtualisation overhead isn't that important. But you definitely have less room to grow in the HP/SLES configuration, as you need the virtualisation layer for consolidation. And this is somewhat diametrically opposed to the agility they brag about later in the document, as headroom equals agility in the face of a sudden surge in demand.
    • Page 9, first table: Looks like they never really worked with Sun equipment ;). Otherwise they would know that the OS support is included in the HW support ...
    • Page 9, first table: The HP OpenView licenses make a large dent in the software budget. For monitoring you could use cheaper alternatives, like Nagios. And it's not an advantage of SLES or the HP server. It's just an advantage of using the special pricing of OpenView for HP servers.
    • Page 9: A large part of the cost difference between the two solutions is explained by the arbitrary addition of JES suites to the software stack. You don't need JES for SAP. I think even Alinean doesn't know why they included JES, as there isn't a use case for JES in this scenario. They wrote:
      The "Other Software" cost is to Identity Manager or Java Composite Application Plattform Suite to Sun
      I would like to know why the SLES solution doesn't need an identity management or SOA system, and why there is an "or" between the two. Either you need both or you need neither. Over the 5-year perspective this simple trick of Alinean's adds $250,000 to the Sun solution, whereas the HP solution doesn't contain this item. $250,000 of the $339,000 consists of nothing but this arbitrary addition.
    • Page 11, last paragraph, to page 12, first paragraph: They calculate the power and cooling with the numbers they found on the power supplies or in the datasheet. This is one of the oldest tricks in comparative benchmarking. I didn't think anyone still used it. The T5440 doesn't need 2640 watts, just as the HP server doesn't need 2400 watts. When you put the data into the power calculator for the T5440, you get a power consumption of 1346 watts at 100% (4 procs at 1.6 GHz with 128 GB, 2 disks and 4 PCI cards) instead of the 2640 watts used by Alinean. The calculator puts the heat output at 4559 BTU/hr instead of 8973. BTW: It looks like they used the real BTU number for the HP system, whereas they used the maximum BTU number for the Sun configuration. Of course this error leads to better facility cost numbers for HP.
    • Page 10: The bold statement department. Alinean states that you need fewer personnel to administer SLES on x86, but they deliver no proof points for this. They state:
      By utilizing an industry standard based server platform, the resources to manage the environment is found to 28% less
      Can someone explain this to me? It's not that they normalized different salary levels, assuming that a Solaris admin is more expensive than a Linux admin and expressing that as a money-based full-time equivalent (FTE). The FTE is time-based, and the different salary levels are covered by using different "average burdened salary" figures.
      Why do they think that you need 0.031 FTE for storage management on Solaris, but 0.023 on Linux? Why do they think that you need 0.037 FTE for software deployment on Solaris, but just 0.017 on Linux? Why do they think that you need 0.018 FTE for application management on Solaris, but just 0.008 FTE on Linux? I simply have no idea, as it doesn't match my experience. That's just my personal opinion, of course.
      The HP management tools can't be the answer either, because they wrote on the same page of their "study" that they didn't include their effects. As there are no additional costs in the "Other Software" cell for HP, it can't be some other special software either.
      Finally I want to point out something strange in the reasoning. I'll do it by asking a question: "Would you really fire your experienced employees just to save a few bucks and hire some Linux specialists?" An SAP specialist once said to me, "In the beginning there was the customization" ... so every SAP installation may be similar, but each is different enough to justify spending some money on experience. In most cases savings due to lower "average burdened salaries" are just nonsense, as you would work with the same employees you already trust. And you don't lower their salaries ... that's a motivation thing ...
    • Page 11, last paragraph, to page 12, first paragraph: Alinean thinks that both solutions take the same amount of space in your datacenter. That may even be correct when you look at it from the right perspective, the 2D perspective. But a datacenter isn't a two-dimensional entity, it's a three-dimensional thing. You can stack the equipment in the rack. So we have to talk about rack height, not just floor space. And in this case the calculation looks quite different. A T5440 takes 4 rack units, the DL785G6 takes 7. So it's 4 rack units vs. 14 on the HP side. Alinean seemingly assumes that you don't put any other components into the rack.
    • Page 13, second table: Well, I don't know where to start. Okay ... :
      • Install Time Security: Plain standard Solaris provides Secure by Default. With JumpStart and the Solaris Security Toolkit you can implement an automatic network-based installation with automatic hardening and minimization.
      • Security Containment: "No" for Solaris? What do you think Zones are? They are even certified for security deployments, as they are the foundation of the Trusted Extensions of Solaris.
      • Secure Resource Partitions: "No" for Solaris? What does Alinean think Zones with Resource Management are?
      • CC compartments: Well ... zones are certified as part of the Trusted Extensions.
      • Audit filtering: This "No" for Solaris is strange ... long-time readers of my blog know that you can fine-tune what Solaris logs via the auditing subsystem.
      Given all these errors you can have serious doubts about their insight into Solaris, and so I have some doubts about their conclusion that there is no cost difference for security.
    • Page 12: The change costs forget an important point. The proposed migration is a dual major change: you migrate from SPARC to x86 *and* you migrate from Solaris to Linux. Assuming that testing two such large changes costs no more than testing when you have just changed the hardware is dangerous.
      Furthermore you should talk about the costs of increased risk in such a dual-change situation. From a risk perspective it would be more efficient to use Solaris x86, as it's much more similar to Solaris/SPARC than Linux/x86 is. Where are the costs for porting operational procedures from Solaris to Linux? Where are the costs for training your employees to administer Linux instead of Solaris? Where is the time for gathering experience with the new OS instead of using the experience already gathered with Solaris?
    • Page 15: This point is from the "bold statement" department. Alinean doesn't explain why it's more agile to use Linux/x86 than Solaris/SPARC. Agility is first and foremost a feature of your organisation, not of your technology. Alinean doesn't explain why it's faster to update SAP on Linux than on Solaris.
    • Page 16: Why does Alinean think that a customer can react to market changes in a more timely manner because of the operating system or the hardware? That's a property of the application, and that is pretty much the same on both systems. I don't see any viable reasoning for this besides marketing bullshit.
    • Some readers wrote in the comment section about other strange points in this case study:
      • Paul points out in this comment that the wattage for the Sun equipment was computed for a fully redundant power supply configuration (2+2 at 100V has an input rating of 2629 watts), whereas the 2400 watts for HP is the input rating of a non-redundant 3-power-supply configuration, as stated in the quick spec sheets.
      • Given that the HP system seems to have non-redundant power supplies, I'm really interested in the rationale behind Alinean's decision to assume that both systems have roughly the same availability (as they state on page 13).
      • As Christian correctly observes in his comment, Alinean talks about downtimes due to worms and trojans. Trojans for Solaris? Sure. By the way, this isn't a developer's desktop, this is the SAP system of a company. It should be guarded by well-defined change management processes. When you execute files on your production SAP system without testing them, you have problems that can't be solved by hardware or software.
      Well ... okay ... maybe you really think there are reasons to migrate to the DL785 with SLES ... but don't do it because you have read this "study". It's biased, it's full of errors, it's full of cloudy statements. But to see the positive in it: Looking at this whitepaper, there doesn't seem to be a TCO savings story in migrating from Sun/SPARC to HP/x86. Dear Alinean ... nice of you to deliver proof of that for us ;)
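
For anyone who wants to redo the arithmetic behind the power and staffing points above, here is a minimal Python sketch. It is only a back-of-the-envelope check, not Alinean's model: it assumes the usual conversion of roughly 3.412 BTU/hr per watt, the function and variable names are my own illustration, and the inputs are just the figures quoted in this article.

```python
# Sanity checks on figures quoted in this article (illustrative names, not from the study).

# Assumption: standard conversion factor, 1 watt is about 3.412 BTU/hr.
WATT_TO_BTU_PER_HOUR = 3.412


def watts_to_btu_per_hour(watts: float) -> float:
    """Convert an electrical power draw in watts to a heat load in BTU/hr."""
    return watts * WATT_TO_BTU_PER_HOUR


def relative_reduction(baseline: float, other: float) -> float:
    """Return how much smaller `other` is than `baseline`, as a fraction of `baseline`."""
    return (baseline - other) / baseline


# Power figures in watts, as quoted in the article.
POWER_WATTS = {
    "T5440, figure used by Alinean": 2640,
    "T5440, power calculator at 100%": 1346,
    "DL785, figure used by Alinean": 2400,
}

# (Solaris FTE, Linux FTE) per task, as quoted from page 10 of the study.
FTE_BY_TASK = {
    "storage management": (0.031, 0.023),
    "software deployment": (0.037, 0.017),
    "application management": (0.018, 0.008),
}

if __name__ == "__main__":
    for label, watts in POWER_WATTS.items():
        print(f"{label}: {watts} W is about {watts_to_btu_per_hour(watts):.0f} BTU/hr")
    for task, (solaris_fte, linux_fte) in FTE_BY_TASK.items():
        reduction = relative_reduction(solaris_fte, linux_fte)
        print(f"{task}: {reduction:.0%} fewer FTE claimed for Linux")
```

With this conversion factor, 2640 watts comes to about 9008 BTU/hr and 1346 watts to about 4593 BTU/hr. That is close to, but not exactly, the 8973 and 4559 quoted above, presumably because those numbers rest on slightly different inputs or rounding. The three FTE pairs work out to reductions of roughly 26%, 54% and 56%, which is worth keeping in mind next to the 28% overall figure, even though these are only three of the tasks in the table.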