The individual owning this blog works for Oracle in Germany. The opinions expressed here are his own, are not necessarily reviewed in advance by anyone but the individual author, and neither Oracle nor any other party necessarily agrees with them.
Sunday, September 20. 2009
As my colleague Ingo said a while ago: "It doesn't matter how fast you stir your meal while cooking when you have to wait for your sweetheart to find the wine. You can't eat a minute earlier". And i found another example for this: the newly published TPC-H 3 TB benchmark for the IBM p595, the biggest system in the Power series.
But first i will go back into the past: In 2007 (in IT that's halfway to infinity) Sun benchmarked the E25K with 72 sockets at 1.8 GHz, two cores per socket and 256 GBytes of memory. This benchmark yielded 114,713.7 QphH.
2 years, 224 GBytes main memory and 3.2 GHz later the biggest machine available from IBM is just 34 percent faster (to be exact: 154,115.8 QphH) than the biggest machine available from Sun from two years ago ... based on a design from the beginning of this century.
By the way: Just in case you wonder about the memory configuration - 512 GB is the maximum configuration that can run at 667 MHz. Anything more downclocks the memory. Using 4 TB of memory would downclock it to 400 MHz.
Perhaps the challenge gets a little clearer when we stay in the same shop: Almost 4 years ago IBM published a result for the p5-595. With 1.9 GHz Power5 CPUs this system yielded 100,512.30 QphH. 4 years of development, 256 GBytes more memory and 2.6 times the frequency give you just about 50% more performance. Nice, but not that impressive, especially given the effort put into this CPU with regard to clock cycles.
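The comparisons above boil down to two ratios per pair of results: how much the throughput grew and how much the clock grew. Here is a minimal Python sketch using only the figures quoted in this post (5.0 GHz for the p595 follows from "1.8 GHz plus 3.2 GHz"; the helper name `ratio_summary` is made up for illustration):

```python
# QphH results and clock rates as quoted in the text above.
results = {
    "Sun E25K": {"qphh": 114713.7, "ghz": 1.8},
    "IBM p5-595": {"qphh": 100512.3, "ghz": 1.9},
    "IBM p595": {"qphh": 154115.8, "ghz": 5.0},
}


def ratio_summary(old, new):
    """Return (throughput ratio, clock ratio) of system `new` versus `old`."""
    perf = results[new]["qphh"] / results[old]["qphh"]
    clock = results[new]["ghz"] / results[old]["ghz"]
    return perf, clock


for old in ("Sun E25K", "IBM p5-595"):
    perf, clock = ratio_summary(old, "IBM p595")
    print(f"{old} -> IBM p595: {perf:.2f}x QphH at {clock:.2f}x clock")
```

This only compares headline numbers. It says nothing about core counts, memory or storage, which differ between the systems.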
And that's perhaps the most important takeaway at this point: Ingo was right. Just increasing the frequency isn't sufficient. It gets clearer with every published benchmark. You have to do more or something different. Perhaps the future of large SMP systems lies in more efficient communication between the components, more real bandwidth (not all bandwidths added up into one number for marketing purposes) and less latency between the CPUs and between CPU and memory.
But i have a request to our benchmarking team: I would like to see a TPC-H result for the M9000 at 3 TB. Given the existing results for the M9000, putting some knowledge into the equation and doing some extrapolation, i don't think that we have to fear this competition. But that's just an educated guess.
TPC-H for p6-based 595 withdrawn
Do you remember my comment regarding the TPC-H benchmark of the p595? Well, this result has been withdrawn for unknown reasons as stated by the TPC-H page for such results. What a pity ... i had found something new to talk about, but now i'm not allowed t
Tracked: Oct 01, 18:09
Power 5 and Power 6 are two different designs. Trying to indicate that there must be some "problem" with an SMP system because performance does not increase linearly with GHz is therefore pure nonsense (I am not sure why you resort to such speculation when you obviously know better). The 50% increase in core performance is on par with other benchmarks (TPC-C for example).
The Power 5 does more per cycle than both Power 6 and the UltraSPARC in the E25K. It is as simple as that.
The only thing I agree with is that the M9000 should be put up for benchmarking. One should not have to guess when it comes to Sun's high-end servers.
Okay ... let's assume you can do 154,115 things at 5.0 GHz and 100,512 things at 1.9 GHz. So you could do 30,823 things per GHz with Power6 and 52,901 things per GHz with Power5. Suggesting that this is just the processor would lead to the conclusion that Power6 lost half of its efficiency on the way to 5 GHz. Sorry, i don't like the Power CPUs, but they are not that bad.
Additionally, SPECint2006 hints in another direction. Let's assume that SPECint2006 is a good benchmark for testing core performance. The Bull Escala systems with a 2.1 GHz P5+ yielded 12.8 (http://www.spec.org/cpu2006/results/res2007q1/cpu2006-20070219-00534.html) whereas the Power6 yielded 24.9. So let's assume the p5/2.2 core can do 12.8 things; this leads to 6.01 things per Gigahertz. The p6/5.0 core can do 24.8 things, thus it can do 5 things per Gigahertz.
Yes, the p6 core is slower per cycle (e.g. because OOE was removed from it), but it didn't lose half of its performance on the way. Thus the core can't be the explanation for the point that an increase in frequency by a factor of 2.6 yields just 50% more performance. It must be something between the core and the data. I can only assume that Power6 has problems getting all its horsepower onto the street, and that's my point: It's a nice example of why increasing the clock rate isn't a solution. Perhaps components other than the CPU should move into focus, like more effective connections to the data.
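The "things per GHz" argument can be written down in a few lines. This is a rough sketch, not a performance model. It uses the first-quoted SPECint2006 figures (12.8 at 2.1 GHz, 24.9 at 5.0 GHz), so the results differ slightly from the rounded 6.01 and 5 mentioned above. The helper `per_ghz` is a made-up name.

```python
def per_ghz(score, ghz):
    """Benchmark score divided by the clock rate in GHz."""
    return score / ghz


# TPC-H (QphH) per GHz for the whole system, figures from the post.
tpch_p5 = per_ghz(100512.3, 1.9)
tpch_p6 = per_ghz(154115.8, 5.0)

# SPECint2006 per GHz for a single core, figures from this comment.
spec_p5 = per_ghz(12.8, 2.1)
spec_p6 = per_ghz(24.9, 5.0)

# Fraction of per-GHz throughput lost when going from Power5 to Power6.
tpch_loss = 1 - tpch_p6 / tpch_p5
spec_loss = 1 - spec_p6 / spec_p5

print(f"TPC-H per GHz:   p5 {tpch_p5:.0f}, p6 {tpch_p6:.0f}, loss {tpch_loss:.0%}")
print(f"SPECint per GHz: p5 {spec_p5:.2f}, p6 {spec_p6:.2f}, loss {spec_loss:.0%}")
```

The gap between the two loss figures is the part of the argument that matters: the per-GHz drop at system level is much larger than the per-GHz drop measured on the core alone.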
You did not mention the generation step at all in your blog post. In fact the reader was led to believe that the only difference was the clock speed. Now that is corrected. Good, because that fact is very important.
The INT/FP core performance increase between the two generations seems to be in the 50-70% range. So a 50% increase for TPC-H is reasonable.
Btw. do you still hold on to the claim that the Power 7 will be 50% slower per core than the Power 6? I have tried to find other commentators saying the same but with no success; most state that the core performance will increase. But then you would have full support from the poet Henrik Ibsen: "The minority is always right".
1. Sorry, but what do you not understand about p5-595 designating Power5 and p595 designating Power6? Furthermore i wrote for a knowledgeable audience. I assume that people know that 1.9 GHz was a Power5 and 5 GHz is a Power6. But if you want to hide behind this misunderstanding to explain your comment, you're welcome ...
2. It looks like you still don't get it. When you have to speed up your proc by a factor of 2.6 to yield a 50% increase in performance, and considering that 1 GHz in Power6 yields only 17% less SPECint than in Power5, you could come to the conclusion that the next steps of IBM should not aim at more cores or more frequency. They should aim at bringing more of the performance available in the core to the outside. 50% less TPC-H performance per GHz due to 17% less SPECint per GHz doesn't sound reasonable to me.
I'm just considering integer performance here, because database workloads are mainly integer and it's even forbidden in Germany to use floating point in commercial workloads like general ledger or something like that.
3. Based on the data available at the moment when i wrote that article, i would come to the same conclusion. Just to say it again: That was just information provided by IBM themselves and some really basic math, including constraints that IBM mentioned as well. I thought you understood that at that time.
The slides of Hotchips 21 suggest that the single core will be a little bit faster in integer workloads. That's fine, but that's contrary to the statements made by IBM before. We will have to wait to see what part of this speedup is owed to the large cache and what the performance of the naked core is in real-world workloads. We will see that when there are official products. I'm sure Power7 will be a cool proc, but IBM will have to change their marketing message.
You can not put the blame on IBM for this one.
1. You were the only one concluding that core performance would be halved from Power 6 to Power 7. If IBM marketing did a bad job there should be a lot of other people saying the same. Show me them, will you?
2. The reason you came to this conclusion was that you selectively picked bits of information from different sources and combined them with creative logic. I tried to tell you this. I mean, by your logic the 4, 6 and 8-core Power7 would have the same total performance. Who on earth would produce something like that? But you were too blinded by the opportunity to take a cheap stab at IBM. Many slow cores would be following Sun, right?
3. One should expect some kind of moderation when you speculate and put out highly controversial statements. But you did not. You were absolutely sure. I quote:
"Obviously a single core of the Power7 has just half the performance of the Power6 core."
It is OK to be wrong, but the real problem here is that this MO is similar to other blog posts of yours. You are obviously very emotional when commenting on the competition.
No ... i'm not emotional about the competition, i'm just emotional about having to explain stuff again and again and then again. A last try. I'm shortening the argument a little bit, as we know today that Power7 is an 8 core design.
Information available at the moment of writing the article http://www.c0t0d0s0.org/permalink/About-Power7.html:
0. p6 is a 2 core design, p7 an 8 core design.
1. p7 delivers 2-3 times the performance of p6 using the same power (Source: http://www-03.ibm.com/systems/resources/systems_power_news_20090721_annc.pdf)
2. p7 fits in the same systems as p6 (at least p595 and p570) (SoD Power7 Upgrade from IBM)
3. Upgrades are done by swapping system controllers and processor books (same document)
3a. There are no upgrades to cooling and power supplies (conclusion out of information 3)
4. The p6 and the p7 boards have to be in the same thermal envelope (conclusion out of information 3a)
5. Let's assume the performance of p6 is 1 and the performance of p7 is 2 (engineering conclusion out of information 1)
5a. Let's assume the performance of p6 is 1 and the performance of p7 is 3 (marketing conclusion out of information 1)
6. 1 performance unit / 2 cores = 0.5, 2 performance units / 8 cores = 0.25
6a. 1 performance unit / 2 cores = 0.5, 3 performance units / 8 cores = 0.375
7. The single core of Power7 seems to have less performance than the single core of Power6 (conclusion out of information 6 or 6a)
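Steps 5 to 7 are nothing more than a division. The sketch below spells it out with chip performance normalised to p6 = 1, as in the list. The constant and function names are made up for illustration, and the 2x and 3x factors are the assumptions from steps 5 and 5a, not measurements.

```python
P6_CORES = 2          # step 0
P7_CORES = 8          # step 0
P6_CHIP_PERF = 1.0    # normalised, steps 5 and 5a


def per_core(chip_perf, cores):
    """Chip-level performance spread evenly over all cores."""
    return chip_perf / cores


p6_core = per_core(P6_CHIP_PERF, P6_CORES)

# Step 5 assumes p7 = 2x p6, step 5a assumes p7 = 3x p6.
for label, p7_chip_perf in (("engineering (2x)", 2.0), ("marketing (3x)", 3.0)):
    p7_core = per_core(p7_chip_perf, P7_CORES)
    print(f"{label}: p6 core {p6_core}, p7 core {p7_core}, "
          f"ratio {p7_core / p6_core}")
```

Dividing evenly over the cores is itself an assumption: it ignores caches, SMT and clock differences, which is exactly what the rest of this discussion is about.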
We've gone through all this in the discussion about http://www.c0t0d0s0.org/permalink/About-Power7.html. I wondered why you stopped commenting there. As i wrote there: "As i wrote you before: It's basic math. Of course it's based on assumptions as there is no reasonable benchmark available for the p7. But my assumptions aren't based on wishful thinking, they are based on some not totally unreasonable considerations based on the information available." Given the same information, i would get to the same conclusions again.
Today we know that IBM claimed at HotChips that a core in Power7 is a little bit faster than a core in Power6 in integer. But we have to wait to see how much of this is owed to the naked core, how much is owed to the huge cache, and what's left of it considering some of the characteristics of the cache (different latencies depending on the placement of the cached element). I will just wait for SPECint and SPECint rate results to draw further conclusions.
Furthermore i don't understand the reasoning you use in point 2: Of course they wouldn't have the same performance. A 4-core p7 would roughly have the same performance as the 2 core p6, the 6 core p7 50% more performance and the 8 core 100% more performance than the p6 when you look at the CPU level.
There is one area where the p7 will surely shine: FP, because of the doubling of the FP units. But FP isn't that common in commercial workloads, as i explained before. This sweet spot has its source in the fact that p7 is IBM's answer in the DARPA HPC project.
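The same per-core figure gives the numbers for the 4, 6 and 8 core variants. This is a small sketch under two loudly stated assumptions: the 8-core p7 has exactly twice the chip performance of the 2-core p6, and performance scales linearly with the number of cores. The names are made up for illustration.

```python
P6_CHIP_PERF = 1.0                 # 2-core p6, normalised
P7_PER_CORE = 2.0 / 8              # assumption: 8-core p7 = 2x p6


def p7_chip_perf(cores):
    """Chip-level performance of a p7 variant, assuming linear scaling."""
    return cores * P7_PER_CORE


for cores in (4, 6, 8):
    relative = p7_chip_perf(cores) / P6_CHIP_PERF
    print(f"{cores}-core p7: {relative:.2f}x the 2-core p6")
```

With the 3x assumption instead, replace `2.0 / 8` by `3.0 / 8`; the 4-core variant then comes out at 1.5 times the p6.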
OK, so you sit there and employ your basic math skills and you come to
1. A conclusion that is inherently absurd.
2. A conclusion that is not shared by a single individual on the planet.
That would stop any sensible person in his tracks. Big red flag. Alarms would go off. Start to question his reasoning. Maybe wonder if picking small bits of information from different sources was a good idea. But not you. It was just too tempting to point a finger at IBM. Your agenda is clearly hurting your thinking.
And caught red handed you blame everybody else. If 999 out of 1000 people reach their destination by reading a map but the last one gets lost what would you say? That the last person can't read a map or that the map is wrong?
Here is the absurd conclusion. Quote from you:
"A 4-core p7 would roughly have the same performance of the 2 core p6"
I guess you think IBM customers must be really stupid, but who would pay a lot of money for an upgrade to a CPU with twice the cores and the same total performance? Customers who would like to additionally pay twice the Oracle license? Customers who really want their single-threaded batch jobs to take twice the time? Customers that are clinically insane?
The 2-3 performance gain was probably for the upgrade from the 2 core P6 to the 4 core P7. You should note that I said "probably" and not "obviously" which is the scientifically correct thing to do when one speculates based on vague sources on a product that is not yet released.
Let's assume for a moment i'm emotional about this stuff and i'm just waiting to point fingers to IBM all the day long. I have an excuse ... i'm working for Sun. Everybody knows that.
But i'm somewhat puzzled about the point, that leads to your massive interest to prove that i'm wrong about something. You are investing a lot of time for this. Why?
I'm really interested about your agenda. And given the insults you are throwing in my direction i'm thinking you are in the emotional corner for quite a time.
I'm really interested in why you make this investment. I've looked up your IP address and that was an astonishing experience. I really thought you worked for IBM. No, i won't disclose it here. Pure altruism? Too much time? I'm really curious about your incentive to warm up that Power7 core topic after some weeks ...
In the end: The discussion started out intellectually interesting, but it decayed into a major annoyance.
I'm not afraid of being the only one writing that the single-thread performance of Power7 may be lower than that of Power6. Of course i can be wrong. That's okay for me. But i drew my conclusion from officially available IBM documents, and my considerations aren't that far off. Can i prove them? No. Can you prove that i'm wrong? Again no. My arguments are based on some basic math. Your arguments are based on "it would be dumb to do so, and you are thinking the people are stupid". That sounds quite emotional to me. And it sounds like an uncertain foundation to me. Especially considering the paradigm shift in CPU design.
To get back to the technological part: It looks like you are pretty convinced of your view of the world. I've recognized this tendency in the discussion. To be honest ... i find this position quite arrogant. Man ... please get off your high horse ... you could look at the IT business from a multitude of views.
I don't think that those people are stupid. Not at all. I'm just thinking that those people will understand something that you didn't understand. Sun calls this concept Workload Computing. I'm sure that similar concepts exist at other vendors, just with a different name: There is no single grand unifying technology that allows you to process all your computing needs. IBM knows the same: They offer Opterons, they offer Xeons, they offer Power, they offer Cell. They use the technology where it fits best and use Linux, for example, as an umbrella to unify all these technologies. Sun has Xeon, Opteron, SPARC T, SPARC64 and our umbrella is called Solaris.
There is nothing inherently bad about the fact that a Power7 core could be slower than a Power6 core. As i wrote before: I don't think of Power7 as the single unitary Power design for the future. I think it will be more a dual headed approach: Power6 with the job to address the single-thread performance and Power7 with the task to address the market of overall performance. Use whatever fits best. You have a batch job? Use Power6 at 5 GHz ... perhaps 6 in the future. You have a workload with many threads, for example DB2 or Websphere (IBM's own pricing model). Use the P7!
Everybody goes this way ... Intel will do 8 cores with Hyperthreading in Beckton, Sun does 8 cores with 8 threads with N2 (i assume you've read the announcement about N3) and AMD will do 12 cores without Hyperthreading in Magny-Cours. They all sacrifice single-thread performance to have more performance on the chip level.
I can think of several reasons to use a Power7 with four cores that is as fast as a Power6. Power consumption, for example. When Power7 has 2-3 times the performance of Power6 at the same power, a Power7 at the same performance as a Power6 would suck a lot less power out of the wall than a Power6. Given that you have a matching workload, this can be quite an incentive to do so.
Anyway, i'm thinking that Power7 wasn't designed for maximum performance in batch loads; it was designed to compete in a world of workloads with more and more parallel tasks: like HPC, like virtualisation (you should know it, as you use lpar in your name), like transaction processing.
Furthermore: I don't know if you have already had the opportunity to read the HotChips presentation. If you haven't, it doesn't make sense to discuss further. But a few additional points.
The Power7 is a beast. There is no way to manufacture this chip at low cost. 567 square millimeters are a lot of real estate. That's almost double the size of Power6. I see a lot of incentives for IBM to sell, and for customers to buy, a much cheaper downrated P7 when it fits the workload. IBM can increase chip yield, customers get cheaper chips.
I don't think that the upgrades for pSeries 595 or 570 will just contain scaled-down variants. I think the scaled-down frequency of 4 GHz and the smaller chip manufacturing process are responsible for the fact that these chips still fit in those systems. Furthermore the press release of IBM from July 21, 2009 states "A Power 595 upgrade can be accomplished during planned downtime by simply replacing the processors, memory and system controllers with new POWER7 components within the existing system frame. POWER7 processors will offer two to three times the performance of POWER6 using the same amount of energy (and will be available in four, six and eight-core varieties)." Sorry ... i'm reading it like this: I can upgrade my 595 by plugging in 4, 6 and 8 core Power7 and i'm getting up to 2-3 times the performance in my system. There is nothing in there stating that you can just use 4 core varieties.
And now comes the killer for your view of the world. I even think that when you go down to the performance of a single executed thread, there may be even less than half the performance, judging from the Hotchips presentation: The Power7 core is an SMT4 core whereas the Power6 core is an SMT2 core. I feel in my bones that the claim in the Hotchips 21 presentation that a p7 core is a little bit faster than a p6 core is based on two points: a well-engineered workload leveraging all the execution units, in conjunction with the turbo mode (the Hotchips 21 presentation states that the Power7 is capable of overclocking in case the thermal budget leaves some room). But that's speculation. I won't go for the kind of FUD IBM used when it tried to convince people that Niagara runs at 300 MHz per thread. That's not my style.
I suggest we settle this dispute now ... we have to wait until benchmark results come in to get to a final conclusion, although i've stopped seeing the intellectual fun in discussing with you. We can warm up this discussion again when Power7 is available.
But feel free to turn words around in my mouth trying to prove that i'm having an agenda of pointing with the finger on IBM wherever i find a breach. You are welcome ...
There is a TPC-C benchmark published on 08/17/10.
It beats P6 as 6 to 10.
Per core P7 is less powerful, but overall it is ok.
But in terms of software licensing per core (i.e. Oracle EE) P7 is not a choice.