Disclaimer: The individual owning this blog works at Sun Microsystems GmbH in Germany, a subsidiary of Oracle. The opinions expressed here are his own, are not necessarily reviewed in advance by anyone but the individual author, and neither Oracle nor any other party necessarily agrees with them.
The new TPC-H benchmark from IBM
Sunday, September 20. 2009
Comments
Power 5 and Power 6 are two different designs. Trying to suggest that there must be some "problem" with an SMP system because performance does not increase linearly with GHz is therefore pure nonsense (I am not sure why you resort to such speculation when you obviously know better). The 50% increase in core performance is on par with other benchmarks (TPC-C, for example).
The Power 5 does more per cycle than both the Power 6 and the UltraSPARC in the E25K. It is as simple as that. The only thing I agree with is that the M9000 should be put up for benchmarking. One should not have to guess when it comes to Sun's high-end servers.
Okay ... let's assume you can do 154,115 things at 5.0 GHz and 100,512 things at 1.9 GHz. So you could do 30,823 things per GHz with Power6 and 52,901 things per GHz with Power5. Suggesting that this is just the processor would lead to the conclusion that Power6 lost roughly half of its efficiency on the way to 5 GHz. Sorry, I don't like the Power CPUs, but they are not that bad.
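The per-GHz arithmetic in this comment can be reproduced in a few lines of Python. This is only a sketch of the calculation: the throughput figures and clock speeds are the ones quoted in the comment, while the helper name `per_ghz` and the dictionary layout are made up for illustration.

```python
# Throughput figures and clock speeds as quoted in the comment.
results = {
    "Power6": {"throughput": 154_115, "ghz": 5.0},
    "Power5": {"throughput": 100_512, "ghz": 1.9},
}


def per_ghz(throughput, ghz):
    """Throughput normalised by clock speed (hypothetical helper)."""
    return throughput / ghz


normalised = {
    name: per_ghz(r["throughput"], r["ghz"]) for name, r in results.items()
}

for name, value in normalised.items():
    print(f"{name}: {value:.0f} things per GHz")

# Fraction of per-GHz throughput lost going from Power5 to Power6.
loss = 1 - normalised["Power6"] / normalised["Power5"]
print(f"Per-GHz loss: {loss:.0%}")
```

Dividing by the clock speed says nothing about where the loss happens; it only shows how far apart the two systems are once frequency is taken out of the picture.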
Additionally, SPECint2006 hints in another direction. Let's assume that SPECint2006 is a good benchmark for testing core performance. The Bull Escala systems with a 2.1 GHz P5+ yielded 12.8 (http://www.spec.org/cpu2006/results/res2007q1/cpu2006-20070219-00534.html) whereas the Power6 yielded 24.9. So let's assume the p5/2.2 core can do 12.8 things; this leads to 6.01 things per gigahertz. The p6/5.0 core can do 24.8 things, thus it can do 5 things per gigahertz. Yes, the p6 core is slower per cycle (e.g. because OOE was removed from it), but it didn't lose half of its performance on the way. Thus the core can't explain why a clock increase by a factor of 2.6 yields just 50% more performance. It must be something between the core and the data. I can only assume that Power6 has problems getting all its horsepower onto the street, and that's my point: it's a nice example of why increasing the clock frequency isn't a solution. Perhaps components other than the CPU should move into focus, like more effective connections to the data.
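The same normalisation can be applied to the SPECint2006 figures. The sketch below uses the scores and clock speeds as first quoted in the comment (12.8 at 2.1 GHz, 24.9 at 5.0 GHz); the comment itself is not fully consistent about them (2.1 vs 2.2 GHz, 24.9 vs 24.8), so the output should be read as approximate. All variable names are illustrative.

```python
# SPECint2006 scores and clock speeds as first quoted in the comment.
# The comment is not fully consistent (2.1 vs 2.2 GHz, 24.9 vs 24.8),
# so treat the results as approximate.
p5_score, p5_ghz = 12.8, 2.1
p6_score, p6_ghz = 24.9, 5.0

p5_per_ghz = p5_score / p5_ghz
p6_per_ghz = p6_score / p6_ghz

# Relative drop in SPECint per GHz from the P5+ to the Power6.
specint_drop = 1 - p6_per_ghz / p5_per_ghz

# Clock ratio between the two TPC-H systems under discussion (5.0 vs 1.9 GHz).
clock_factor = 5.0 / 1.9

print(f"p5: {p5_per_ghz:.2f} per GHz, p6: {p6_per_ghz:.2f} per GHz")
print(f"SPECint per-GHz drop: {specint_drop:.0%}")
print(f"Clock factor: {clock_factor:.1f}x")
```

With these inputs the per-GHz drop in SPECint comes out well below one half, which is the gap the comment is pointing at.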
You did not mention the generation step at all in your blog post. In fact, the reader was led to believe that the only difference was the clock speed. Now that is corrected. Good, because that fact is very important.
The INT/FP core performance increase between the two generations seems to be in the 50-70% range. So a 50% increase for TPC-H is reasonable. By the way, do you still hold on to the claim that the Power 7 will be 50% slower per core than the Power 6? I have tried to find other commentators saying the same, but with no success; most state that the core performance will increase. But then you would have the full support of the poet Henrik Ibsen: "The minority is always right".
1. Sorry, but what do you not understand about p5-595 designating Power5 and p595 designating Power6? Furthermore, I wrote for a knowledgeable audience. I assume that people know that 1.9 GHz was a Power5 and 5 GHz is a Power6. But if you want to hide behind this misunderstanding to explain your comment, you're welcome ...
2. It looks like you still don't get it. When you have to speed up your processor by a factor of 2.6 to get a 50% increase in performance, and considering that 1 GHz of Power6 yields only 17% less SPECint than 1 GHz of Power5, you could come to the conclusion that IBM's next steps should not be towards more cores or more frequency. They should be towards bringing more of the performance available in the core to the outside. 50% less TPC-H performance per GHz due to 17% less SPECint per GHz doesn't sound reasonable to me. I'm only considering integer performance here, because database workloads are mainly integer, and it's even forbidden in Germany to use floating point in commercial workloads like general ledger or something like that.
3. Based on the data available at the moment when I wrote that article, I would come to the same conclusion. Just to say it again: that was just information provided by IBM themselves and some really basic math, including constraints that, again, IBM mentioned. I thought you understood that at the time. The slides from Hot Chips 21 suggest that the single core will be a little bit faster in integer workloads. That's fine, but that's contrary to the statements made by IBM before. We will have to wait to see what part of this speed-up is owed to the large cache and what the performance of the naked core is in real-world workloads. We will see that when there are official products. I'm sure Power7 will be a cool processor, but IBM will have to change their marketing message.
You cannot put the blame on IBM for this one.
1. You were the only one concluding that core performance would be halved from Power 6 to Power 7. If IBM marketing did a bad job, there should be a lot of other people saying the same. Show me them, will you?
2. The reason you came to this conclusion was that you selectively picked bits of information from different sources and combined them with creative logic. I tried to tell you this. I mean, by your logic the 4-, 6- and 8-core Power7 would have the same total performance. Who on earth would produce something like that? But you were too blinded by the opportunity to take a cheap stab at IBM. Many slow cores would be following Sun, right?
3. One should expect some kind of moderation when you speculate and put out highly controversial statements. But you did not. You were absolutely sure. I quote: "Obviously a single core of the Power7 has just half the performance of the Power6 core." It is OK to be wrong, but the real problem here is that this MO is similar to other blog posts of yours. You are obviously very emotional when commenting on the competition.
No ... I'm not emotional about the competition, I'm just emotional about having to explain stuff again and again and then again. A last try. I'm shortening the argument a little bit, as we know today that Power7 is an 8-core design.
Information available at the moment of writing the article http://www.c0t0d0s0.org/permalink/About-Power7.html:
0. p6 is a 2-core design, p7 an 8-core design.
1. p7 delivers 2-3 times the performance of p6 while using the same power (Source: http://www-03.ibm.com/systems/resources/systems_power_news_20090721_annc.pdf)
2. p7 fits in the same systems as p6 (at least p595 and p570) (SoD Power7 Upgrade from IBM)
3. Upgrades are done by swapping system controllers and processor books (same document)
3a. There are no upgrades to cooling and power supplies (conclusion from information 3)
4. The p6 and the p7 boards have to be in the same thermal envelope (conclusion from information 3a)
5. Let's assume the performance of p6 is 1 and the performance of p7 is 2 (engineering conclusion from information 1)
5a. Let's assume the performance of p6 is 1 and the performance of p7 is 3 (marketing conclusion from information 1)
6. 1 performance unit / 2 cores = 0.5, 2 performance units / 8 cores = 0.25
6a. 1 performance unit / 2 cores = 0.5, 3 performance units / 8 cores = 0.375
7. The single core of Power7 seems to have less performance than the single core of Power6 (conclusion from information 6 or 6a)
We've gone through all this in the discussion about http://www.c0t0d0s0.org/permalink/About-Power7.html. I wondered why you stopped commenting there. As I wrote there: "As i wrote you before: It's basic math. Of course it's based on assumptions as there is no reasonable benchmark available for the p7. But my assumption aren't based on wishful thinking, they are based on some not totally unreasonable considerations based on the information available." Given the same information, I would reach the same conclusions again. Today we know that IBM claimed at Hot Chips that a core in Power7 is a little bit faster than a core in Power6 in integer.
But we have to wait to see how much of this is owed to the naked core, how much is owed to the huge cache, and what's left of it considering some of the characteristics of the cache (different latencies depending on the placement of the cached element). I will just wait for SPECint and SPECint rate results before drawing further conclusions. Furthermore, I don't understand the reasoning you use in point 2: of course they wouldn't have the same performance. A 4-core p7 would have roughly the same performance as the 2-core p6, the 6-core p7 50% more performance and the 8-core p7 100% more performance than the p6 when you look at the CPU level. There will be one area where the p7 will surely shine, and that's FP, because of the doubling of the FP units. But FP isn't that common in commercial workloads, as I explained before. This sweet spot has its source in the fact that p7 is IBM's answer in the DARPA HPC project.
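The "basic math" of this comment (points 5 to 7, plus the 4-, 6- and 8-core comparison) can be written out as a short Python sketch. It only restates the comment's own assumptions: a 2-core Power6 chip counts as 1 performance unit, an 8-core Power7 chip as 2 or 3 units, and chip performance is assumed to scale linearly with the number of cores. The names `per_core`, `scenarios` and `variants` are invented for illustration.

```python
P6_CORES = 2
P6_PERF = 1.0  # baseline: a 2-core Power6 chip = 1 performance unit
P7_CORES = 8


def per_core(chip_perf, cores):
    """Chip-level performance divided evenly over its cores."""
    return chip_perf / cores


p6_core = per_core(P6_PERF, P6_CORES)

# 8-core Power7 at 2x (engineering reading) or 3x (marketing reading).
scenarios = {"2x": 2.0, "3x": 3.0}
p7_core = {label: per_core(perf, P7_CORES) for label, perf in scenarios.items()}

# Chip-level performance of the 4-, 6- and 8-core variants under the 2x
# reading, assuming linear scaling with core count.
variants = {cores: cores * p7_core["2x"] for cores in (4, 6, 8)}

print(f"Power6 per core: {p6_core}")
for label, value in p7_core.items():
    print(f"Power7 per core ({label}): {value}")
for cores, perf in variants.items():
    print(f"{cores}-core Power7 vs 2-core Power6: {perf / P6_PERF:.2f}x")
```

Everything here stands or falls with the two assumptions named above; the sketch makes no claim about what real Power7 parts deliver.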
OK, so you sit there and employ your basic math skills and you come to
1. A conclusion that is inherently absurd, and
2. A conclusion that is not shared by a single individual on the planet.
That would stop any sensible person in his tracks. Big red flag. Alarms would go off. He would start to question his reasoning. Maybe wonder if picking small bits of information from different sources was a good idea. But not you. It was just too tempting to point a finger at IBM. Your agenda is clearly hurting your thinking. And caught red-handed, you blame everybody else. If 999 out of 1000 people reach their destination by reading a map but the last one gets lost, what would you say? That the last person can't read a map or that the map is wrong? Here is the absurd conclusion. Quote from you: "A 4-core p7 would roughly have the same performance of the 2 core p6". I guess you think IBM customers must be really stupid, but who would pay a lot of money for an upgrade to a CPU with twice the cores and the same total performance? Customers who would like to additionally pay twice the Oracle license? Customers who really want their single-threaded batch jobs to take twice the time? Customers who are clinically insane? The 2-3x performance gain was probably for the upgrade from the 2-core P6 to the 4-core P7. You should note that I said "probably" and not "obviously", which is the scientifically correct thing to do when one speculates based on vague sources about a product that is not yet released.
Let's assume for a moment that I'm emotional about this stuff and that I'm just waiting to point fingers at IBM all day long. I have an excuse ... I'm working for Sun. Everybody knows that.
But I'm somewhat puzzled about what drives your massive interest in proving that I'm wrong about something. You are investing a lot of time in this. Why? I'm really interested in your agenda. And given the insults you are throwing in my direction, I think you have been in the emotional corner for quite some time. I'm really interested in why you make this investment. I looked up your IP address and that was an astonishing experience. I had really thought you worked for IBM. No, I won't disclose it here. Pure altruism? Too much time? I'm really curious about your incentive to warm up that Power7 core topic after some weeks ...
In the end: the discussion started out intellectually interesting, but it decayed into a major annoyance. I'm not afraid of being the only one writing that the single-thread performance of Power7 may be lower than that of Power6. Of course I can be wrong. That's okay for me. But I drew my conclusion from officially available IBM documents and my considerations aren't that far off. Can I prove them? No. Can you prove that I'm wrong? Again, no. My arguments are based on some basic math. Your arguments are based on "it would be dumb to do so, and you are thinking the people are stupid". That sounds quite emotional to me. And it sounds like an uncertain foundation to me, especially considering the paradigm shift in CPU design.
To get back to the technological part: it looks like you are pretty convinced of your view of the world. I've recognized this tendency in the discussion. To be honest ... I find this position quite arrogant. Man ... please get off your high horse ... you can look at the IT business from a multitude of views. I don't think that those people are stupid. Not at all. I just think that those people will understand something that you didn't understand. Sun calls this concept Workload Computing.
I'm sure that similar concepts exist at other vendors, just under a different name: there is no single grand unifying technology that allows you to process all your computing needs. IBM knows the same: they offer Opterons, they offer Xeons, they offer Power, they offer Cell. They use the technology where it fits best and use Linux, for example, as an umbrella to unify all these technologies. Sun has Xeon, Opteron, SPARC T and SPARC64, and our umbrella is called Solaris. There is nothing inherently bad about the fact that a Power7 core could be slower than a Power6 core. As I wrote before: I don't think of Power7 as the single unitary Power design for the future. I think it will be more of a dual-headed approach: Power6 with the job of addressing single-thread performance and Power7 with the task of addressing the market for overall performance. Use whatever fits best. You have a batch job? Use Power6 at 5 GHz ... perhaps 6 in the future. You have a workload with many threads, for example DB2 or WebSphere (IBM's own pricing model)? Use the P7! Everybody goes this way ... Intel will do 8 cores with Hyper-Threading in Beckton, Sun does 8 cores with 8 threads with N2 (I assume you've read the announcement about N3) and AMD will do 12 cores without Hyper-Threading in Magny-Cours. They all sacrifice single-thread performance to get more performance at the chip level. I can think of several reasons to use a Power7 with four cores that is as fast as a Power6. Power consumption, for example. When Power7 has 2-3 times the performance of Power6 at the same power, a Power7 at the same performance as a Power6 would draw a lot less power from the wall than a Power6. Given that you have a matching workload, this can be quite an incentive to do so.
Anyway, I think that Power7 wasn't designed for maximum performance in batch loads; it was designed to compete in a world of workloads with more and more parallel tasks: like HPC, like virtualisation (you should know it, as you use lpar in your name), like transaction processing. Furthermore: I don't know if you have already had the opportunity to read the Hot Chips presentation. If you haven't, it doesn't make sense to discuss further. But a few additional points. The Power7 is a beast. There is no way to manufacture this chip at low cost. 567 square millimeters are a lot of real estate. That's almost double the size of Power6. I see a lot of incentives for IBM and for customers to sell and to buy a much cheaper downrated P7 when it fits the workload. IBM can increase chip yield, customers get cheaper chips. I don't think that the upgrades for pSeries 595 or 570 will just contain scaled-down variants. I think the scaled-down frequency of 4 GHz and the smaller chip manufacturing process are the reasons why these chips still fit in those systems. Furthermore, the IBM press release of July 21, 2009 states: "A Power 595 upgrade can be accomplished during planned downtime by simply replacing the processors, memory and system controllers with new POWER7 components within the existing system frame. POWER7 processors will offer two to three times the performance of POWER6 using the same amount of energy (and will be available in four, six and eight-core varieties)." Sorry ... I'm reading it like this: I can upgrade my 595 by plugging in 4-, 6- and 8-core Power7 and I'm getting up to 2-3 times the performance in my system. There is nothing in there stating that you can just use 4-core varieties. And now comes the killer for your view of the world.
I even think that when you go down to the performance of a single executed thread, there may be even less than half the performance, looking at the Hot Chips presentation: the Power7 core is an SMT4 processor whereas the Power6 core is an SMT2 core. I feel in my bones that the claim in the Hot Chips 21 presentation that the p7 single core is a little bit faster than the p6 is based on two points: a well-engineered workload leveraging all the execution units, in conjunction with the turbo mode (the Hot Chips 21 presentation states that the Power7 is capable of overclocking in case the thermal budget leaves some room). But that's speculation. I won't opt for the kind of IBM FUD that tried to convince people that Niagara runs at 300 MHz per thread.
I suggest we settle this dispute now ... we have to wait until benchmark results are coming in to reach a final conclusion, although I no longer see the intellectual fun in discussing this with you. We can warm up this discussion when Power7 is available. But feel free to twist my words in trying to prove that I have an agenda of pointing the finger at IBM wherever I find a breach. You are welcome ...
This work is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 Germany License
Do you remember my comment regarding the TPC-H benchmark of the p595? Well, this result has been withdrawn for unknown reasons, as stated by the TPC-H page for such results. What a pity ... I had found something new to talk about, but now I'm not allowed t
Tracked: Oct 01, 18:09