The individual owning this blog works for Oracle in Germany. The opinions expressed here are his own, are not necessarily reviewed in advance by anyone but the individual author, and neither Oracle nor any other party necessarily agrees with them.
Tuesday, October 6. 2009
Nice to see that I'm not the only one who thinks that IBM will run into the same challenges as Sun when it comes to massively multicore processors. It is great to see this position somewhat confirmed by someone who isn't known to be friendly to Sun and is said to be firmly on the blue side.
BlueToTheBone writes in his column "The Four Hundred" about Moore's Law and the Performance Wall:
Well, with the Power7 chips coming next year, IBM has to get the multithreading fixed in DB2 for i or get a whole lot of excuses ready for why customers buy more cores and threads, running at lower clock speeds, and don't see performance go up.
But he raises an even more interesting point, one that isn't really known territory for an open systems guy like me. Besides this "bytecode-compiled is slower than iron-compiled" stuff (which hasn't been true since the invention of just-in-time compilers), he has a very valid point. Much of the software is really old, and it wasn't written for environments with hundreds of cores. We learned the hard way that there is a vast number of pitfalls, even in the open systems world, which has been thinking multithreaded/multiprocess for quite some time now. Now software developed in RPG and COBOL (with many lines of code originating from a time when many of us weren't even a glint in our parents' eyes) lands with Power7 in an environment that no longer delivers on the promise of ever-increasing single-thread performance. BlueToTheBone comes to a conclusion similar to the one I reached a while ago: perhaps many applications will stay on Power6 while newer developments move to Power7:
. It might even mean putting off a move to Power7 iron and sticking with Power6 or Power6+ boxes as you dig through your code and see how parallelization can and cannot be used to make your applications run faster as well as offer more capacity to support more work.
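The worry in both quotes can be made concrete with Amdahl's law. The following Python sketch is only an illustration: the parallel fractions, the core count and the clock ratio are made-up numbers, not measurements of Power6 or Power7, and per-clock performance is assumed to stay the same.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Ideal speedup of a workload on `cores` cores according to Amdahl's law."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


def relative_throughput(parallel_fraction: float, cores: int, clock_ratio: float) -> float:
    """Throughput relative to a single core of the older chip.

    clock_ratio is new clock / old clock (hypothetical, e.g. 0.8 for a chip
    that is clocked 20% lower than its predecessor).
    """
    return clock_ratio * amdahl_speedup(parallel_fraction, cores)


if __name__ == "__main__":
    # Hypothetical workloads: fully serial, half parallel, almost fully parallel.
    for p in (0.0, 0.5, 0.95):
        print(f"parallel fraction {p:.2f}: {relative_throughput(p, 8, 0.8):.2f}x")
```

With these assumed numbers a purely single-threaded job ends up at 0.8x of its old speed no matter how many cores are added, which is exactly the situation old RPG and COBOL code could find itself in.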
I hope that when you said "confirmed by someone isn't known to be friendly to Sun", you weren't referring to me.
Both IBM's Power and Sun's SPARC will lose major market share to 8- to 32-way Nehalem-EX systems, that is for certain. It has nothing to do with technology; as Jeff Bonwick correctly points out, it has everything to do with volume economics, thanks to Intel's manufacturing arm.
In case you don't know already, I worship Sun's software engineering. If anything, Sun should really take a hint.
The hint is, although Sun's software engineering has shifted to scale out architecture and open source, Sun's hardware business still hasn't realized that using commodity components really means you have to have commodity value too.
This applies not only to Sun's Open Network Servers but also to Sun's Open Storage Servers. The thing is: no one will pay double or triple the market price for a system that Dell and HP can offer too.
I would say this: Sun should go to Dell's server website, price out a system, and see whether it has anything within 25% of what Dell can offer. It is revealing.
Right now, I would pay a 25% premium for Sun gear. But I wouldn't pay 100% premium for Sun gear.
Another issue that has really bugged me recently is how biased the Open Storage hardware design is toward AMD. Given the current market situation, where DDR3 and DDR2 have reached price parity on the spot market and the Nehalem architecture clearly beats the Opterons, I am amazed that Sun revamped the 7400 series systems to use 6-core Istanbuls when 2-socket Nehalem-EPs offer far better value and performance. It is not as if Sun doesn't have Nehalem designs that could be used for Open Storage.
If I were Sun (trust me, I want Sun to thrive so that they can invest more in ZFS), I would already have moved the entire lineup of Open Storage systems to the Nehalem architecture. Call it the 8000 series.
Open Storage 8110: Single Socket Lynnfield Xeon X3450 with 24GB DDR3 ram, same chassis as 7110, with Intel X25-E as ZIL+L2ARCs
Open Storage 8210: same as 7210, with dual Nehalem-EPs
Open Storage 8310...you get the idea
Open Storage 8410 will be 2 Socket Nehalem-EX with 128GB ram
Open Storage 8510: 4 Socket Nehalem-EX with 256GB ram
Open Storage 8610 should be 8 Socket EX with 512GB ram.
All I ask is that Sun deliver those systems at a 25% premium over HP/Dell gear, and Sun will be in a much better financial position than it is now. Volume begets volume, and it will amortize Sun's employees' salaries far better.
1. No ... I was not referring to you ... why should I refer to someone who seems to get a sales commission on Nehalems?
2. Oh no ... not this "Sun is soooooo expensive" thing. Random system - X4170 Config 4 (2 Intel Xeon X5570, 2.93 GHz, 12 GB (6 x 2 GB) DDR3-1066, 584 GB (4 x 146 GB) 10000 rpm 2.5-Inch SAS Disks, 1 DVD+/-RW, 4 x 10/100/1000 Ethernet, 3 PCIe 2. 1 x 100-240 V AC), 24/7 HW Support $ 9,041.67 (http://www.c0t0d0s0.org/uploads/sunpricing.png )... okay ... let's go to Dell : PowerEdge 610 $8,930 (http://www.c0t0d0s0.org/uploads/dellpricing.png). In my world that is a difference of roughly one percent.
3. When you compare the prices of barebone Dell/HP systems to Sun Unified Storage, you forget that those systems are appliances. For example, you get Storage Analytics and other features. Good luck programming that on your own without spending more than the cost of the filer.
4. Why stick with AMD? There are no quad-socket configurations (so you couldn't build a 7410 with Nehalems today). A fully populated Nehalem system downclocks the memory to 800 MHz, so the advantages of the Nehalem memory controller can't be used. And furthermore: Istanbul is here, you can buy it ... Nehalem-EX is still somewhere in the labs and fabs of Intel. No one can buy it.
One piece of advice at the end: please stop being such a Nehalem-EX trumpeter ... it gets boring and annoying the way you tout it as the solution for everything. I'm sure a 32-way Nehalem-EX will have some interesting performance characteristics, but the performance in non-embarrassingly-parallel applications will be the proof point. And nobody knows yet how the system behaves in such tasks.
Depressing, you can't tell the difference between a friend and a foe.
I trumpet Nehalem because it is the best x86 architecture in the world. Yeah, it downclocks to triple-channel DDR3-800, so you get 6 channels of DDR3-800 on a 2S system. Compare that to Istanbul's 8 channels of downclocked DDR2-533 on a 4S system. Are you sure that Istanbul has more bandwidth?
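As a back-of-the-envelope check of that question, here is a small Python sketch. It uses only the channel counts and memory speeds quoted above and assumes standard 64-bit DDR channels; the result is a theoretical peak, not a measured figure, and says nothing about latency or NUMA effects.

```python
BYTES_PER_TRANSFER = 8  # assumption: one 64-bit DDR channel moves 8 bytes per transfer


def peak_bandwidth_gb_s(channels: int, mega_transfers_per_s: int) -> float:
    """Theoretical peak memory bandwidth of a whole system in GB/s (decimal)."""
    return channels * mega_transfers_per_s * BYTES_PER_TRANSFER / 1000.0


# Figures as quoted in the comment: 2S Nehalem vs. 4S Istanbul, both downclocked.
nehalem_2s = peak_bandwidth_gb_s(channels=6, mega_transfers_per_s=800)
istanbul_4s = peak_bandwidth_gb_s(channels=8, mega_transfers_per_s=533)

if __name__ == "__main__":
    print(f"2S Nehalem, 6 x DDR3-800:  {nehalem_2s:.1f} GB/s")
    print(f"4S Istanbul, 8 x DDR2-533: {istanbul_4s:.1f} GB/s")
```

On paper the two-socket box comes out slightly ahead, although both numbers are upper bounds that no real workload reaches.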
I have said a long time ago, Sun's choice of hardware has always been about how to get the most non-commodity parts so Sun can double stack the premiums.
BTW, Dell R610 with 2x X5570 and 48GB ram is $7043. Sun X4170 with 2x X5570 and 48GB ram is $9344. Sun premium over Dell in this case is 33% or so. This is actually not bad.
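For reference, the two price comparisons in this thread work out as follows. This is just a sketch of the arithmetic with the list prices quoted here and in the earlier X4170 Config 4 / PowerEdge 610 comparison; the function name is made up for illustration.

```python
def premium_percent(price: float, reference_price: float) -> float:
    """How much more `price` is than `reference_price`, in percent."""
    return (price - reference_price) / reference_price * 100.0


# 12 GB configurations quoted earlier in the thread (Sun X4170 vs. Dell).
small_config = premium_percent(9041.67, 8930.00)
# 48 GB configurations quoted in this comment (Sun X4170 vs. Dell R610).
large_config = premium_percent(9344.00, 7043.00)

if __name__ == "__main__":
    print(f"12 GB config: {small_config:.2f}% premium")
    print(f"48 GB config: {large_config:.2f}% premium")
```

So both sides are right about their own numbers: the premium depends heavily on which configuration gets priced.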
What is bad is the list price for a pair of dual Socket 7410s listing for 170K on Sun's website. Yeah, 2x HA dual socket systems plus a JBOD with 7200rpm drives for 170K list. Compare that to a pair of Dell R710s with a pair of MD1000s. That's the comparison I am making. Your list price on Open Storage system is a joke. All it does is turn people who are prospective buyers into Dell and HP customers.
Remember, blue to the bone is your enemy, I am not. Although I do question whether you should call him "blue to the bones" in the first place.
And one more thing:
People actually want to buy X4170 based Open Storage system.
I just configured a X4170 with Dual L5520 with 72GB ram, and it came out $8100. So a pair of those would be 16K. Plus a J4400 JBOD at $3700. Plus 22x 1TB WD RE3s = 3K, Plus 2x Intel X25-E 32GB $700. The actual price for a HA pair of X4170 with 72GB ram with a JBOD full of SATA drives should be around $22-$25K or so, that is far from your asking price. Even if you add another $10K for a pair of Stec ZeusIOPs instead.
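Adding up the parts listed above gives a quick sanity check of that estimate. The sketch below uses only the prices from this comment; the dictionary keys are just labels, and no labor, support or software cost is included.

```python
# Prices in USD as quoted in the comment above.
parts = {
    "2x X4170, dual L5520, 72GB ram": 2 * 8100,
    "J4400 JBOD": 3700,
    "22x 1TB WD RE3": 3000,
    "2x Intel X25-E 32GB": 700,
}

total = sum(parts.values())
# Optional upgrade mentioned above: roughly $10K more for a pair of Stec ZeusIOPs.
total_with_zeusiops = total + 10_000

if __name__ == "__main__":
    for name, price in parts.items():
        print(f"{name:<32} ${price:>7,}")
    print(f"{'Total':<32} ${total:>7,}")
    print(f"{'Total with ZeusIOPs':<32} ${total_with_zeusiops:>7,}")
```

The sum lands inside the $22-$25K range given above.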
The problem is: you can't buy a X4170 based Open Storage system. Whoops.
So by now, you know that I am versed in Sun/Dell/HP gear. I really hope that one day, Sun will wake up from its day dreams and realize that the kind of Sparc premiums Sun enjoyed in the 90s is no longer possible in today's market conditions. This is actually from someone who wants Sun to do well.
>People actually want to buy X4170 based Open Storage system.
Why and who? People are interested in performance for their tasks, not the chip inside.
The 4 Socket, 4 Core AMD Based 7410 system has enough Ooompfh for the most demanding tasks.
You have no idea what you are talking about. Of course people care about the chip inside. If you can get more performance from 2S Nehalems vs 2S Istanbuls, at lower power and lower price, PCI-e 2.0 bandwidth from dual IOH36Ds, and more memory bandwidth/channels from DDR3 of course 2S Nehalem is a better choice. Of course, Sun has no interest in doing so, why give customers more CPU power and memory capacity, when you can give them 8GB of ram, and no SSDs like the 7110 right?
Sun's business interest is here to sell 4x $2600 Istanbul CPUs instead of 2x $530 L5520s, or 2x $1300 X5570s, even though in most storage situations, the CPUs just sit there idling. Besides the max ram difference of 72GB vs 128GB, actual CPU performance differential is hardly noticeable.
Don't think that your potential customers are retards. That's all I have to say.
>Don't think that your potential customers are retards. That's all I have to say.
I am a customer, and I am the least interested in the CPU that is in the 7410.
In fact, compared to the rest of the system, the CPU is almost the cheapest part. Not to say the system is expensive (the opposite!), but as a customer, I'm buying a system.
Sun could put MOS 6510 CPUs in the system. If the system is up to the task and has a nice price tag, I will buy.
Interesting. I wonder which configuration you bought?
The Sun Hybrid Storage Pool has one weakness: your workload must fit entirely in the SSD L2ARCs. If your workload exceeds the L2ARC size, performance takes a nosedive. That's why Brendan Gregg's benchmarks all either fit in memory or fit in the SSDs.
I wonder how big your workload is and what your 7410 config looks like?
Mine is R710+L5520s+72GB ram + Dual MD1000(2x X25-Es+13x 1TB RE3s) + 2xMSA50s filled with X25-M SSDs as caches.
There are so many more factors, like number of disks, hot/warm/cold data, pool configuration.
Do you know why our NetApp Filers took a nosedive? Because the filesystem metadata (!) did not fit into NVRAM.
Yeah, NetApp is going through some serious issues right now. ZFS is many orders of magnitude superior to NetApp. Sun really has something going in storage.
Still what is your workload size and configuration? I am curious. My workload is about 2TB. Pretty much all in caches right now, but since L2ARC isn't damn persistent, I have to leave the server on at all times, sucking power.
As to the HSP weakness, Sun wants people to believe that they are selling a 10TB storage pool with 16K write IOPS and 30K+ read IOPS. Really, the real magic is in the RAM+SSD caching. Once you blow through that, you are as screwed as a turkey. So no, you don't get to 10TB of storage before performance degradation starts. (I am talking about 8K and 16K DB workloads.)
My config is an order of magnitude larger, and I will blog about it later...
Don't know who at Sun sells it like that, IMHO it was always clear to me that with 1TB SATA disks (and with any other) you have a hard limit for IOPS.
The huge amount of cache available helps relieve the pressure to buy expensive 15k disks.
BTW: Interesting PDF related to this topic: http://www.caiss.org/docs/DinnerSeminar/TheStorageChasm20090205.pdf
Basically confirms that SSD/Flash as Cache+SATA is the way to go for the next few years...
Order of magnitude larger? Unless you work with large files, where the streaming performance of the hard drives matters, it is all about SSD small file caching.
I don't think Sun supports more than 600GB of SSD caching (6x100GB) so far.
Still, love to hear your config and workload.
A cluster of the gear supports 1.2 TB of SSD.
Where is your blog Mika, if you don't mind telling?
I think your discussion with Mika and all your other comments deserve a longer reply. Obviously Sun doesn't think that all the people out there are retards. I'm a presales engineer in my daytime job. This blog is just my job in the evenings or in low-load moments. The problem with your comments is that you seem to think Sun consists of nothing but greedy retards. And with this you imply that many of Sun's customers are just dumb retards buying overpriced stuff. One problem with intelligent people of a certain age is that they believe they understand the world and think of everyone else as morons. But those morons have several reasons to be morons.
Obviously you can go to your favorite hardware dealer, buy some disks and save some money. I found it really strange that, when we discussed HP/Dell prices and after I had shown that they are almost the same as Sun's, you put Newegg parts into the HP/Dell systems. Of course you can do this in a Sun server as well. And you get the same answer from Sun support as from Dell support or HP support: "Sorry, we can't help you."
But are they really the same? A while ago I wrote something about these $600 hard disks. There are several reasons why hard disks are that expensive when you purchase them from a tier-1 vendor, and a bunch of greedy vendors isn't one of them. In a market as competitive as the server market, greed is punished. And please compare the price of Dell/HP servers with free-market disks with the price of vendor-delivered disks in a Sun system. For example: how do you know that the firmware of your disks is reasonably bug-free? Perhaps you know it from experience, or you are willing to take the risk. But other people think differently. They want a throat to choke in case of a bug. Because Sun/Dell/HP don't want to be choked, they check and test the disks in qualification testing, and only after that testing does a disk get the label "It's okay for enterprise use. Feel free to choke our throat when you find a bug that has slipped through all the tests".
Many people feel that hard disks, memory and things like that are commodity components. But they aren't. We just believe it because we get poor to medium quality components very cheaply at the next el-cheapo shack. But quality hardware with vendor support, an assured capability to deliver and so on isn't cheap.
It's pretty much the same with car insurance. In Germany car insurance is relatively cheap. When you drive for some years without an accident, even a high-powered model is dirt cheap to insure. One of the reasons is the dense network of technical inspections. Every two years the German driver fears for his car and for all the bills needed to get the car through the inspection. So we have only a relatively low number of accidents due to technical defects, and car insurance prices drop. Other countries have different systems: high insurance prices, but you are able to get a car through the inspection that would shock the living daylights out of every German TUEV engineer.
The first system tries to prevent the damage. The second one pays for the damage. Now take these two different mindsets and apply them to IT. A good example is your comment thread with Mika. Given the email addresses of you both, it's not too far-fetched an assumption that you both have two separate mindsets.
As I wrote before, citing Marten Mickos: there are customers with money but no time, and there are customers with time but no money. Of course you can create a cheaper version of an S7000 with commodity parts. But others like to buy a package: with all the knowledge, with all the risk mitigation, with all the stuff you need in an enterprise environment. They don't give a sh...t which proc runs in this system, they just want a running package.
There is a problem with the marketing of the S7000. It consists of components that look exactly like our servers, and people think: where is the difference? Why do I pay more for the S7000 than for the sum of its components? The reason is simple. It's a system, not a bunch of components. It's tested in its entirety. It's developed to be an appliance with no access to the OS.
I have more than one customer who said: I don't want your hardware, but I want your software. These people recognized that the S7000 isn't a cheap offer compared with the DIY approach, but they recognized as well that it would take a long time to replicate all the development that went into the S7000. But it doesn't work that way. It's a system. And to be honest: the price is calculated so that both are sold in conjunction. Many people are used to getting their software cheaply, but someone has to develop Shadow Migration, someone has to develop the user interface, someone has to develop all the small and big additions to plain Solaris that are embedded in the firmware of the system. And those developments have to be funded. How do you fund such developments? By selling products. It's that simple. Of course, when you just need OpenSolaris and ZFS, just use it: pick your favorite hardware vendor and run OpenSolaris or Solaris 10. Perhaps you even buy a support contract. Given that you use L2ARC, I assume you use OpenSolaris 2009.06. Or do you need no support contract? And then you have to pay people like Brendan to get the most out of the system. BTW: Brendan wrote in his benchmarks that his tests were done to max out the CPUs in order to showcase the maximum performance of the gear. He made that very clear. Of course your performance will be lower. But interestingly, we have many, many customers whose hot data fits in memory and whose warm data fits on the SSDs. Of course there are always workloads that are larger or that have a pattern killing the SSD advantages. But for 80% it's a reasonable fit.
Maybe you are right about the performance advantage of Nehalem, although I consider Opteron a very balanced platform, perfect for a task like file serving. Additionally: what you see at the moment is the classic game of leapfrogging. At one moment a certain CPU is faster than the other, but this changes. The problem: when you have a product in the market, you can't just jump to another platform. You have to test everything again, you have to tune everything again. You don't jump from one system to another. That would make the product more expensive, as you would have to support several different systems.
It's perfectly possible that Sun Engineering will switch over to Nehalem as soon as all requirements are fulfilled by this proc (e.g. quad-sockets) and the performance characteristics are fully understood.
BTW: Interesting that you are citing Anandtech. Just have a look at the recent article there and its conclusion at http://it.anandtech.com/IT/showdoc.aspx?i=3653&p=10: "So who wins? Intel's dual socket, AMD's dual socket, or AMD's quad socket platform? The answer is that it depends on your performance/RAM ratio. The more performance you require per GB, the more interesting the dual Nehalem platform gets. The more RAM you need to obtain a certain level of performance, the more interesting the AMD quad platform gets. " As you may have noticed, those S7000 systems are usually maxed out on the memory side.
By the way ... on one side you tout that the Nehalem EX is the faster and better proc, but on the other side you wrote that the CPU is unimportant.
I want to come to a conclusion and end this topic from my side, as I want to proceed with writing new articles: you should take into consideration that other customers have other views on the same problem and come to other conclusions. The fact that the S7000 is selling well is evidence that our conclusions about the market can't be that wrong. Enough said from my side.
That was a long post!
1. I agree with you that there is value in Sun's software. You have to understand that people who really understand ZFS's superiority typically are intelligent enough to see that Sun is ripping them off with retarded hardware configurations. Even Apple's margins for Macs are typically 50%. Intel's margins are usually 40-45%. You simply cannot expect to charge 170K for a pair of two-socket systems with a JBOD. Trust me, less is more in this case.
2. Look at the recent IB S7000 benchmark posted. 30% CPU utilization to fill Infiniband network. That confirms my comment earlier that mostly the CPUs will sit there idling. Another observation from the IB test is that you are bound by PCI-e 1.0. So you have a few choices: go Nehalem with Dual IOH36D like your X4275, or revamp 7410 again with AMD Fiorano chipsets, which could be costly considering that it is really intended for G34 products.
3. I source most of my gear from Dell VARs. I can't tell you which one, but I can tell you that I do get R710+72GB ram+L5520s for roughly $5K or so, with 3 year Dell warranty. I also source Dell certified MD1000/MD1120 with 10K SAS drives for under $5K or so. That's roughly half of what a Sun X4170 costs on Sun's website. Now if Sun has VARs that give 50% off list prices, you should let people know. Better yet, just list it on the Sun website.
4. I don't want to go into Anandtech for now, but the author of Istanbul articles is pretty much in bed with John Fruehe. As if comparing 24 cores against Intel's 8 cores is a fair fight in the first place. Performance per watt per dollar favors Nehalem designs by roughly 400% or so.
5. Nehalem-EX is a better processor than Istanbul for sure. And it is better than MCM Mangy Cours too. CPU performance isn't important, but number of DIMMs support and number of memory channels is. Nehalem-EX uses Scalable memory buffers so it wouldn't downclock the memory and you get 64 DIMMs over 16 channel DDR3-1333 vs 4S Istanbul's 32 DIMMs over 8 channel of DDR2-533. That's a big difference. That means you can use 4GB DIMMs instead of 8GB DIMMs to get to 256GB, and get more bandwidth too.
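The DIMM arithmetic behind point 5 can be sketched in a few lines of Python. The slot and channel counts are the figures quoted above, taken at face value rather than checked against any spec sheet, and the helper names are made up for illustration.

```python
def dimm_size_gb(total_gb: int, dimm_slots: int) -> int:
    """DIMM size needed to reach total_gb when every slot is populated."""
    if total_gb % dimm_slots != 0:
        raise ValueError("total capacity does not divide evenly across the slots")
    return total_gb // dimm_slots


def dimms_per_channel(dimm_slots: int, channels: int) -> int:
    """How many DIMMs share one memory channel."""
    return dimm_slots // channels


# Figures as quoted in point 5 for a 256GB system.
nehalem_ex_dimm = dimm_size_gb(256, dimm_slots=64)   # 64 DIMMs over 16 channels
istanbul_4s_dimm = dimm_size_gb(256, dimm_slots=32)  # 32 DIMMs over 8 channels

if __name__ == "__main__":
    print(f"Nehalem-EX:  {nehalem_ex_dimm}GB DIMMs, "
          f"{dimms_per_channel(64, 16)} per channel")
    print(f"4S Istanbul: {istanbul_4s_dimm}GB DIMMs, "
          f"{dimms_per_channel(32, 8)} per channel")
```

Smaller DIMMs for the same capacity matter for the price because, at the time of this discussion, 8GB DIMMs cost considerably more per GB than 4GB ones.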
So far the S7000 is a bold idea from Sun. I like it. But I do have a lot of design criticisms for it.
A) Migration to Nehalem-EP is overdue.
B) Migration to SAS 6Gbps LSI 2108ROC controller is overdue for the L2ARC SSDs
C) Migration to Intel X25-E is overdue.
D) A better Istanbul to choose if you don't want to use Nehalems yet, is the 8425HE/2425HE or even the Shanghais with HT3. Since the CPU utilization is usually below 25% in real world workloads, it is better to go low power processors.
I am not affiliated with Intel at all. I will keep on watching ZFS developments.
Sorry, but it isn't true that the CPU is sitting idle.
1. While some cores may be sitting idle at a given point in time, the rest of the CPU is put under heavy load, like the I/O subsystem, the interconnects between the CPUs and so on.
2. You have to remember that an S7000 isn't your father's storage system. You want cores in your storage system, and you want many of them: You want them to do iSCSI, you want them to do NFS, you want them to do CIFS, you want them to do Infiniband. You want them to do checksum calculations, you want them to do encryption, you want them to do deduplication, you want them to do a lot of tasks. And you want them to do this all at the same moment.
If your workload fits in ram, like Brendan Gregg's recent pumpage of 7410 Istanbul revamp, then it is just a ram benchmark, so CPU and interconnect performance matters.
If your workload fits in L2ARC, then CPU performance matters a little less.
If your workload doesn't fit in L2ARC (2x larger than the L2ARC, for example), then you are fundamentally bottlenecked by the hard drives' random IO.
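The three cases above can be written down as a tiny decision function. This is a rough Python sketch, not how ZFS actually decides anything: it assumes the working set is served from the fastest tier it fits into, treats RAM and L2ARC capacity as simply additive, and ignores the RAM that L2ARC bookkeeping itself consumes. The sizes in the example are hypothetical.

```python
def limiting_tier(working_set_gb: float, ram_gb: float, l2arc_gb: float) -> str:
    """Name the tier that bounds random-read performance for a working set."""
    if working_set_gb <= ram_gb:
        return "RAM"          # effectively a RAM benchmark; CPU and interconnect matter
    if working_set_gb <= ram_gb + l2arc_gb:
        return "L2ARC SSDs"   # SSD latency dominates; CPU matters a little less
    return "hard disks"       # bottlenecked by the random IO of the spinning disks


if __name__ == "__main__":
    # Hypothetical box with 128GB of RAM and 600GB of L2ARC.
    for size_gb in (100, 500, 2000):
        print(f"{size_gb:>5}GB working set -> limited by {limiting_tier(size_gb, 128, 600)}")
```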
I dare you or Brendan Gregg to benchmark a X4170 with dual L5520s or X5570s against the Istanbul 7410 (even the 4S one; even the recent Anandtech article points out that 4 sockets of Istanbuls only edge out the X5570 by roughly 20% or so).
Disclaimer : I am a bit blue also!
> "Besides of this "bytecode-compiled is slower than iron-compiled" stuff (which isn't true since the invention of Just-in-time-compilers)"
I can help with that (I hope) : TPM is referring to AS/400 (iSeries, System i, or whatever the marketing droids think of next) cfr. http://en.wikipedia.org/wiki/AS/400
It was preceded by http://en.wikipedia.org/wiki/IBM_System/38 and that page explains a bit the mystery of MI or TIMI microcode.
Completely different beast from AIX.
Imagine being very young and looking at this machine, which had 48-bit addressability while mainframes were still 24-bit, was fully object oriented and used bytecode to insulate programs from the hardware (sort of a Java VM, n'est-ce pas), with an integrated database (you upgraded the OS? your relational database is up-leveled automatically: try that with anything else out there)
"the System/38 was commercially available in August 1979."
IBM Rochester HAD an advance over everyone (including the rest of IBM ) for decades.
That's enough for one box.