Thursday, July 24. 2008
Oracle and CMT are often a natural fit. Whenever you have many parallel requests and latency isn't a key performance indicator, you should give it a try. But sometimes there are loads that should scale well on CMT systems and yet don't. In most cases there are quirks in the SQL statements that make the code single- or few-threaded. Glenn Fawcett summarized some great tips for Oracle and CMT in a series of blog articles to overcome such problems.
(via: Stefan Hinker)
Saturday, July 12. 2008
Lawrence Spracklen wrote a really interesting article about cryptography done in software on CMT. This sounds counterintuitive at first, as cryptography is considered a computationally intensive task and thus a task for fast superscalar cores. But according to Lawrence's article this is an implementation issue. Play to the strengths of the CMT architecture, and the result is a little different: As a result, as the number of strands is increased, performance scales almost linearly. Indeed, for Niagara, per-core Kasumi performance is around 8 times the performance of a single strand, and per-chip Kasumi performance is close to 64X single-strand performance. Indeed, single-core Kasumi performance is around 1.3X the performance of a single-core of a 3GHz Xeon processor.
Sunday, July 6. 2008
This is the English translation of an article I wrote in response to a strange article in another German blog. While answering it, I found a case of "History Repeating".
In the current installment of the Prozessorgeflüster (a recurring CPU technology column in the German computer magazine c't) Andreas Stiller discusses the first facts about the new manycore processor "Larrabee" from Intel. Stiller states in this article that the processor was expected to ship with 16 to 24 cores, but will ship with 32 cores when it appears in 2009. To the surprise of many experts, the cores are well known. They are nothing more than Pentium P54C cores. Well, the P54C was announced in 1994. This would be similar to a Niagara T1 built on the basis of the SuperSPARC II, which was announced in 1994 as well.
Another interesting fact reported in the article: The Larrabee xPU will use roughly 300 watts. That's much, much more than an UltraSPARC T1 CPU. I know, Larrabee is a GPU at the start, but do you really believe that a manycore GPU with x86-compatible instructions will stay on the graphics card for a long time?
There is a small irony in this story. Sun learned many things about designing multicores when it developed the MAJC-5200 CPU and used it on the XVR1000 and XVR4000. The XVR4000 was the graphics card that didn't use something as mundane as a PCI bus; you plugged it directly onto the Fireplane interconnect of the V880 in place of a CPU. Sun learned so much about multicore that some of the findings will reappear only in future incarnations like the Rock CPU (speculative multithreading, for example).
The funny (and "history repeating") part: Intel starts to use its first manycore design on a graphics card as well. Like we did ... in 2002.
Wednesday, June 25. 2008
There is an interesting rumour about Niagara 3 at The Register: Sun's Niagara 3 will have 16-cores and 16 threads per core. Sun Microsystems looks poised to lead the "mainstream" multi-core race for at least a couple more years. By late 2009, the server maker should deliver a third major revision of its Niagara processor which will have 16 cores and an astonishing 16 threads per core, The Register has learned. As usual I don't comment on the level of truth in such rumours, but just two things: First, contrary to the opinion in the comments ... we don't kill off Rock. Period. Second: Ashlee doesn't know half
Tuesday, April 15. 2008
As usual Paul brings an interesting perspective into the discussion: On the other hand.. the way the processors are coupled - done by replacing the T2’s on board 10Gbyte facility - demonstrated that Sun can now produce highly customized versions of the core CPU set and suggests what I believe may be a unique performance opportunity for this product line. Taking into consideration the additional transistor budget that upcoming process technologies will provide, it should be feasible to integrate other interconnect technologies as well, for example Infiniband on die (just think about the latency advantages of an Infiniband port directly connected to the crossbar), or additional support circuits for special tasks. I assume there is only one limit ... the pin count of the processor ... at a certain point you can't get all the interfaces out of the chip in an economically feasible manner.
Tuesday, February 12. 2008
Octave Orgeron did a really good review of the T5120 server. Octave closes his findings with: The T5120 and the T5220 offer many unique and exciting features that set it apart from the competition. The UltraSPARC-T2 processor with 8 cores, 64 threads, 10Gb Ethernet, PCI-E, and cryptographic features are revolutionary in the computing industry. It is amazing to think that not long ago, it would have taken a much larger and more expensive solution to equal the features and benefits of these servers.
Thursday, December 20. 2007
Wednesday, October 10. 2007
This was the only presentation I left early. There was nothing new in it for people who try to stay informed, and the presentation had room for improvement (to put it politely). Nevertheless I can't write about the contents of this presentation, as it was marked as "Sun internal" only. Enough said....
Thursday, September 27. 2007
Rick Hetherington was in Munich today ... he gave a good presentation on the future of CMT SPARC and what we will see in the near and the middle term. Many people are of the opinion that SPARC will meet its end soon, especially after TI's announcement that it will stop investing in new process technologies. The direct opposite is true, more than ever. I was quite impressed. Unfortunately it's not up to me to disclose things.
Friday, August 24. 2007

We are rapidly ramping up UltraSPARC T2. Here you see: 12 E10000k, only a little bit short of a quarter terabit per second of network bandwidth, 0.75 terabytes per second of memory bandwidth, 96 cores, 192 integer pipelines, 96 crypto accelerators, 96 floating point units, 768 threads, 768 virtual processors. Imagine these processors in some nifty boxes with a nice silver finish, 1 or 2 rack units high. Consuming only 1200 watts for the processors. Imagine the load these babies can handle. Five years ago, you had to fill a complete datacenter for similar performance. Now it's a small tray of CPUs (12 or 24 rack units, respectively). Simply mind-boggling. But it gets even better: In the near future you will have this sheer power in only three machines.
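The totals above can be re-derived from per-chip figures. The following is only a rough sketch: the per-chip numbers for cores, threads, 10 Gigabit Ethernet ports and memory bandwidth (about 60 GB/s) come from other posts on this page, while the two integer pipelines per core and the 100 watts per chip are assumptions inferred by dividing the totals by 12 and 96. The names `tray_totals` and the constants are made up for this illustration.

```python
# Per-chip figures for one UltraSPARC T2 as quoted in the posts on this page.
CHIPS = 12
CORES_PER_CHIP = 8
THREADS_PER_CORE = 8            # 8 cores x 8 threads = 64 threads per chip
TEN_GBE_PORTS_PER_CHIP = 2      # two 10 Gigabit Ethernet interfaces
MEM_BANDWIDTH_GB_S = 60         # approximate, rounded per-chip figure

# Assumptions inferred from the totals, not stated per chip in the posts.
INT_PIPELINES_PER_CORE = 2      # 192 pipelines / 96 cores
WATTS_PER_CHIP = 100            # 1200 W / 12 chips


def tray_totals(chips=CHIPS):
    """Aggregate the per-chip figures over a tray of `chips` processors."""
    cores = chips * CORES_PER_CHIP
    return {
        "cores": cores,
        "threads": cores * THREADS_PER_CORE,
        "integer_pipelines": cores * INT_PIPELINES_PER_CORE,
        "fpus": cores,                     # one FPU per core
        "crypto_units": cores,             # one crypto accelerator per core
        "network_gbit_s": chips * TEN_GBE_PORTS_PER_CHIP * 10,
        "memory_tb_s": chips * MEM_BANDWIDTH_GB_S / 1000,
        "watts": chips * WATTS_PER_CHIP,
    }


if __name__ == "__main__":
    for name, value in tray_totals().items():
        print(f"{name}: {value}")
```

With the rounded 60 GB/s per chip the memory bandwidth comes out at 0.72 TB/s rather than 0.75 TB/s, and the network total of 240 Gbit/s is the "little bit short of a quarter terabit".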
Thursday, August 23. 2007
I have to admit quite openly that I no longer read c't. Many years ago the magazine somehow lost the last of its appeal, and in times of world-spanning, wire-based porn distribution there are, as a by-product, many other websites that deliver information much faster. One of the few bright spots of c't was, at times, the column "Prozessorgeflüster" by Andreas Stiller, even though he has rarely had anything positive to say about Sun. With "Von Wasserfällen und Ökowellen" (Of Waterfalls and Eco Waves) that finally seems to have come to an end. Let me quote my favourite sentence: All of that is history now, because with the UltraSPARC T2 (codename Niagara2) Sun once again has a really hot, home-made iron in the solar fire, one that, at least as a single chip, sends everyone else in the multithreaded world into darkness - and you may safely count Intel's "cheated" quad-core with two dice in one package among them.
Tuesday, August 14. 2007
PC Games reports on Niagara2 in "Schnellste CPU: Nicht von Intel oder AMD" (Fastest CPU: Not from Intel or AMD): The results of the professional SPEC rate benchmark, which also takes scaling across multiple cores into account, show a large lead over the competition from Intel, AMD and also IBM's Power6. I wouldn't have thought that praise would ever come from that direction.
Friday, August 10. 2007
Lawrence Spracklen gives a detailed overview of the crypto performance of the Niagara II chip. Alexandre Chartre describes in his blog the usage of Linux on an UltraSPARC T2 processor.
Wednesday, August 8. 2007
What did we announce yesterday? In my opinion, the most important chip for Sun. It's even more important than Rock. When you look around, the loads that depend on single-threaded performance are decreasing, and the whole industry is turning in the direction of "more cores" (even IBM talks about this in the Power7 timeframe). Rock will be important for the highest end. But the future of general purpose computing begins right here, right now.
So, why is Niagara 2 so important? Niagara 1 had its weak points for general purpose computing. We decided to design a processor with such weaknesses because we designed it with a certain workload in mind: the Internet, a niche according to IBM and HP, but hey ... it's a really big niche. Niagara 2 was designed as a general purpose CPU. The problem of the single FPU? Addressed ... the N2 has 8 of them. The issue of SSL-centric crypto circuits? Addressed ... they were replaced by full-fledged crypto accelerators. You get 64 threads, you get two 10 Gigabit Ethernet interfaces with chipsets specifically designed for multicore processing, as they implement the processing tasks in a similarly multithreaded way. You get eight lanes of PCI Express: 2 Gigabyte/s in and 2 Gigabyte/s out, solely for attaching storage, as you already have two really big fat pipes of Ethernet directly connected to the inner crossbar of the chip. And 60 GB/s worth of memory bandwidth.
The limitation of only being able to run in single-socket systems will be solved soon by Victoria Falls. Imagine a system with two or four of these processors, running an operating system really capable of handling such a large number of computing threads, because four processors with 64 threads each means scheduling your processes on 256 hardware processors. Not an easy task.
Now many people will say: Each thread will only run at 1.4 gigahertz. This is half as fast as a modern x86 CPU. But is this really a factor? Maybe you can do more cycles in a second, but what's the gain when you spend most of those cycles waiting for memory? A T1/T2 simply switches to a thread that isn't starved for memory. This is one of the reasons why a 1.4 GHz CPU can be faster than a 4.7 GHz CPU in SPECfp rate and SPECint rate. It's like my sister and me playing racing games on her PC. I lost every time in spite of having the faster car. The problem: I lost traction in every curve of the race track and had to reaccelerate, while my sister was able to take the curves without any problems. And: 1.4 GHz is just the beginning.
IBM wants to play its virtualisation card by saying: "But we have better virtualisation, we can do more than 64 LDOMs". But the decision for this limit was a deliberate one: Whenever you want to switch to another virtual machine, you have to save the current register sets and restore the stored register contents of the next VM into them. That takes a vast number of clock cycles. Now: A 64-thread processor has 64 register sets. By limiting the number of domains to the number of threads - I hope you've already got the point - you get an important advantage: you switch from one VM to the next in a clock cycle. Neat, isn't it? In the end, it's not important to be able to partition the CPU into the smallest fragments. It's important that the performance doesn't evaporate in the VM layer. Or didn't you ask yourself why there's no benchmark result for a virtualized system in IBM's large benchmark portfolio, or why the VMware licence prohibits benchmarks? LDOMs virtualisation is virtualisation done right.
Now that the N2 no longer has the problem of the single FPU, which prevented the N1 from being a general purpose CPU, the game changes again, 18 months after we did it with the N1 for the first time. With Niagara II you get a processor suitable for almost all computing tasks. And as Jonathan said: You ain't seen nothing yet. But you see the future of computing today. And really soon in your datacenter.
PS: Maybe some statements are really bold ones. But if you had read what I've read in the past, if you knew what I know, if you had seen what I saw, you would be just as enthusiastic.
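The latency-hiding argument can be put into a toy model. This is only an idealized sketch, not a description of the real T1/T2 pipeline: every thread alternates between a burst of compute cycles and a memory stall, the core always finds a ready thread if one exists, and caches, multiple pipelines and switch overhead are ignored. The 20 compute cycles per burst and the 100 ns memory latency are hypothetical numbers chosen for illustration, and the function names are made up.

```python
def core_utilization(threads, compute_cycles, stall_cycles):
    """Fraction of cycles in which the core does useful work.

    Idealized model: each thread computes for `compute_cycles`, then waits
    `stall_cycles` for memory; other threads fill the gap if they are ready.
    """
    return min(1.0, threads * compute_cycles / (compute_cycles + stall_cycles))


def useful_mcycles_per_second(clock_mhz, threads, compute_cycles, mem_latency_ns):
    """Useful (non-stalled) cycles per second, in millions, for one core."""
    # Memory latency is fixed in nanoseconds, so a faster clock
    # spends more cycles waiting for the same memory access.
    stall_cycles = mem_latency_ns * clock_mhz / 1000
    return clock_mhz * core_utilization(threads, compute_cycles, stall_cycles)


if __name__ == "__main__":
    # Hypothetical workload: 20 cycles of work per memory access, 100 ns latency.
    cmt = useful_mcycles_per_second(1400, threads=8, compute_cycles=20, mem_latency_ns=100)
    fast = useful_mcycles_per_second(4700, threads=1, compute_cycles=20, mem_latency_ns=100)
    print(f"1.4 GHz core, 8 threads: {cmt:.0f} million useful cycles/s")
    print(f"4.7 GHz core, 1 thread:  {fast:.0f} million useful cycles/s")
```

In this model the fast single-threaded core spends most of its cycles stalled, while the slower core with eight threads stays busy. A workload that fits in cache (a small stall term) tips the balance back towards the higher clock rate.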
Wednesday, August 8. 2007
SearchDatacenter cites Nathan Brookwood in Sun Microsystems unveils Niagara UltraSparc T2 chip: Nathan Brookwood, the founder and principal analyst for research firm Insight 64, a consulting firm based in Saratoga, Calif., said he's impressed with the upgrade. [..] "All in all, it's a very impressive chip in terms of its absolute performance and its performance per watt."
And I'm really interested in the discussion at osnews.com. This portal was pro-Sun in the last few years.