Latency matters (Friday, September 4, 2009)
Comments
Introducing the FAWN
http://www.cs.cmu.edu/~fawnproj/ (A Fast Array of Wimpy Nodes). My point being that FAWN uses flash memory locally: "FAWN is a fast, scalable, and power-efficient cluster architecture for data-intensive computing. Our prototype FAWN cluster links together a large number of tiny nodes built using embedded processors and small amounts (2--16GB) of flash memory into an ensemble capable of handling 1300 queries per second per node, while consuming fewer than 4 watts of power per node."
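A back-of-the-envelope check of the quoted figures (just a sketch using the numbers from the FAWN description above):

    # Back-of-the-envelope check of the quoted FAWN figures:
    # 1300 queries per second per node at fewer than 4 watts per node.
    fawn_qps = 1300      # queries/s per node (quoted above)
    fawn_watts = 4.0     # upper bound on power per node (quoted above)

    queries_per_joule = fawn_qps / fawn_watts
    print(f"FAWN node: at least {queries_per_joule:.0f} queries per joule")
    # -> at least 325 queries per joule: the whole point of putting small
    #    amounts of flash right next to many wimpy, low-power CPUs.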
I really love to see huge SSD disks with tons of cache and fast interfaces, but let's think about this for a few seconds:
SAS is specified for operation over electrical cables with a probable maximum distance of about 10 meters at 3 Gbps or 5 meters at 6 Gbps; SATA is only specified for 1 meter. I would really love to see disks with IB connectors, too, but 12-15 W for a single QDR port is far too much for a single disk. Perhaps a complete system with tons of SSDs in 1U or 2U. And it is still an open question whether FCoIB would be easier to implement than FCoE; probably neither of them will ever make it. Conclusion: if it's not SAS winning the race, it will be something completely different.
A 32 GB Fusion-io card with a fiber port, a supercap, and the logic to implement guaranteed atomic writes to a connected "buddy" card. Maybe dual ports to do three cards (1 hop to each).
#3 on 2009-09-06 21:02
And there you have the latency again: because the write has to be transmitted to the other card, written to the flash, and the successful write has to be confirmed to the other card ... it's easier to use this approach with a shared sZIL disk. Sun does this in our S7000 line ...
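To illustrate the point with a rough sketch (all latency figures below are made-up assumptions, not measurements of any real card):

    # Sketch of the commit path described above, with illustrative
    # (assumed, not measured) latencies in microseconds.
    local_flash_write = 50.0    # assumed: program the local flash
    transmit_to_buddy = 10.0    # assumed: ship the write to the buddy card
    buddy_flash_write = 50.0    # assumed: program the buddy's flash
    ack_from_buddy    = 10.0    # assumed: confirmation back to the first card

    # The local and the buddy write can overlap, but the commit can only be
    # acknowledged once the slower of the two paths has finished.
    mirrored_commit = max(local_flash_write,
                          transmit_to_buddy + buddy_flash_write + ack_from_buddy)

    print(f"local only     : {1e6 / local_flash_write:,.0f} sync commits/s at QD=1")
    print(f"buddy-mirrored : {1e6 / mirrored_commit:,.0f} sync commits/s at QD=1")
    # With these made-up numbers the mirror costs roughly 30% of the commit
    # rate, which is why a shared sZIL device can be the simpler design.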
So in a situation where you have a zpool on SAN and your L2ARC on local SSD, is there a way to fail that zpool over to another host?
I don't think there is at the moment, since the L2ARC (or 'cache' in "zpool add" terminology) is still part of the pool. Does the HAStoragePlus agent in Sun Cluster handle this? Would be fantastic if it did, but I believe the answer is "no" at present. Fancy filing an RFE for me?
#4 on 2009-09-06 22:58
Putting the storage as close to the CPU as possible surely is a great concept, but it gets increasingly difficult with rising space demands. Once your file system math gets into the terabyte range, there's no way around the big boxes with tons of disks inside.
Figuring out the speed gain from using SSDs in these boxes instead of rotating rust is a wholly different equation, as now the transfer speed of the storage medium approaches the transfer speed of the cache modules. Your whole math also (deliberately?) excludes several other factors that reduce the total available IOPS, like the latency in the HBA, processing overhead in the file system driver, overhead in the multipath manager, CPU bus wait times etc. I'm currently deep into research to optimize how many of the IOPS that the storage box can deliver really end up at the application. This is especially a problem with the current many-but-weak-cores architectures like the Niagara: on an old V240 I have no problem saturating two 4G fibre links, while on a T5220 I don't have the slightest chance to achieve the same unless I spread my tests over two or three dozen threads.
1. No, I didn't forget it. I assumed that all that stuff is included in the 6.67 microseconds. My assumption was that the complete combination of file system, server, storage, cable and so on delivers 150,000 IOPS, as the article just looked into the additional latency of the switches.
2. On the multi-petabyte point you are right, but that's the idea behind L2ARC and sZIL: having the accelerating stuff in or near the server, and the data in the big box ...
3. It's obvious that you need more threads to saturate the lines. That's the point of CMT. The single thread is slower, but you have many of them ...
4. The M-Class has no problem saturating your lines.
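To make the arithmetic behind points 1 and 3 explicit (a rough sketch; the 6.67 µs / 150,000 IOPS figure is the assumption from the post, while the other per-I/O latencies and thread counts are purely illustrative):

    # Latency vs. IOPS for a single outstanding request, plus the
    # concurrency (Little's law) needed to reach a target rate anyway.
    service_time_s = 6.67e-6          # 6.67 microseconds per I/O (assumed in the post)
    print(f"one request at a time: ~{1 / service_time_s:,.0f} IOPS")   # ~150,000

    # Little's law: requests in flight = throughput x latency.
    # The longer each I/O takes end to end, the more outstanding requests
    # (threads) you need for the same IOPS -- point 3 above in numbers.
    target_iops = 150_000
    for per_io_latency_us in (6.67, 20, 100, 200):   # purely illustrative values
        in_flight = target_iops * per_io_latency_us * 1e-6
        print(f"{per_io_latency_us:>6} us per I/O -> "
              f"~{in_flight:.0f} requests in flight for {target_iops:,} IOPS")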
My position is that I think SAN solves most storage needs. It is not a religion. I just ask the question: "Can this server function in a SAN?" If yes, that is good, since then I don't have to make a specialized setup for that server. If not, so be it, and let that server be an exception. A rule has exceptions, no problem. Same thing with virtualization, where some servers should not be virtualized. Same with DR, where there may be a discussion of whether one application should take care of its own DR instead of letting the disk system do it; maybe Data Guard is the way to go for an Oracle DB.
I guess there are servers which need millions upon millions of IOPS. I have been to a few data centres and haven't seen one yet, but I reckon they exist somewhere out there. My estimate is that out of every 1000 servers, less than one would require the kind of special solution that you describe. From my perspective a SAN that is beneficial for 99.9% of your server park is a huge success, and it makes SAN the rule. But you are turning this upside down. You make a rule out of a few rare exceptions. From a series of blog posts you take a borderline case (imaginary or not) and generalize. This special server (out of a thousand) is so IO hungry that it needs a large number of SSDs a few centimetres from the CPU. By your logic this exception will make SANs disappear in a few years. That is why I think your reasoning is agenda-driven. That is also probably why I haven't seen you take the same stance when it comes to other technologies, like LAN for example. A core-edge LAN design will introduce even larger latencies between servers. Why don't you argue that all servers that communicate with each other should reside in the same box, and that data centre LANs will disappear? Why are you not a big fan of the mainframe?
I cannot say for certain whether your calculations are correct, since I am not an expert on the lower levels of FC. If you have for example 150 kIOPS on each of three components (HBA, SAN and disk system) and you add the latencies of each one, you get an overall throughput of 50 kIOPS. I suspect that simultaneous requests from a highly threaded application, together with buffer credits and queue depth, will put the real throughput somewhere in between 50k and 150k IOPS. If you go to the movies and the ticket counter can process 10 ppl/minute, the popcorn kiosk 15 ppl/minute and the entrance to the theatre 20 ppl/minute, the overall throughput will be dictated by the slowest link in the chain, that is 10 ppl/minute, not 4.6 ppl/minute if you add all the latencies. And remember that with a SAN you can have parallel queues by bundling HBAs (up to 32 I think for PowerPath) and just multiply SAN performance. But I will try to find out what is correct for FC latencies, or maybe an FC expert can advise us in this discussion. I'll tip a few people.
OK. So SAN has a limit today and that limit is pushed further and further as technology evolves. With SSDs we will see increasing performance on HBAs, switches and disk systems. SSDs are certainly a wake-up call for the vendors, but they will surely adapt. So say I have a special server in my data centre that needs an abnormal amount of IOPS, and I would like to provide that within the confines of SAN and centralized storage, with today's technology. Then I wouldn't put it on the outside of a core-edge design with switch and director. Most of the servers are fine there, but for my big UNIX box I want more. I could plug it directly into the core director. Or I could dedicate a few ports on the USP-V (out of a maximum of 224) and skip the switches entirely. The SAN can now produce the maximum IOPS of the HBA, and you can have as many pipes as you want. My question for you: what percentage of servers do you think will fail to have their IOPS requirements met by this solution? It would be interesting if you came up with an estimate here, because then it would be a better discussion. If you agree with me that we are talking ballpark 0.1% of the servers, we could skip the debate on whether SAN is on the rise or withering away.
#6 on 2009-09-07 12:07
I'm in a training at the moment ... so just a short comment. Of course your components have their IOPS count ... but the time in between the components is longer.
Or with the cinema example: let's assume the ticket counter has a capacity of 10 ppl/minute, the restrooms have 10 ppl/minute, the popcorn counter has 10 ppl/minute and the entrance to the cinema has 10 ppl/minute. Of course all the stations have their capacity. Now think about the time you need from your car to the chair in the cinema when the restrooms aren't on the same floor and you have to go up three floors and back down again. Then reconsider the time from car to chair if the restroom is directly left of the ticket counter. The next part doesn't quite match the real world: the IOPS rate available to the application is the rate at which you "fill the cinema" when the next guest can only buy a ticket once the last guest sits in the chair (perhaps the ticket sales are done by the same person as the popcorn sales, the cleaning of the restroom after each person and the showing of your seat). Each station is still capable of processing 10 ppl/minute, but you can't fill your cinema at that speed. Of course it's different when you have more employees. I don't think that SAN will be obsolete, I just think it will be augmented by intelligent concepts to get around the inherent challenges of a network between the components.
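Putting both cinema calculations side by side (a sketch; the ppl/minute figures are the ones used in the two comments, while the three-minute detour to the far-away restroom is an assumed number purely for illustration):

    # Cinema example: pipelined throughput vs. one-guest-at-a-time latency.

    # First version: different capacities, guests move through in parallel,
    # so the slowest station dictates throughput.
    rates = [10, 15, 20]                     # ticket, popcorn, entrance (ppl/min)
    print(f"pipelined  : {min(rates)} ppl/min (bottleneck only)")
    print(f"one-by-one : {1 / sum(1 / r for r in rates):.1f} ppl/min (times add up)")

    # Second version: every station does 10 ppl/min (6 seconds each), only one
    # guest is in the building at a time, and the restroom is three floors up
    # (assumed 3 minutes of walking -- an illustrative number, not from the post).
    station_minutes = [1 / 10] * 4           # ticket, restroom, popcorn, entrance
    walk_minutes = 3.0                       # assumed detour
    print(f"restroom next door : {1 / sum(station_minutes):.1f} guests/min")
    print(f"restroom upstairs  : {1 / (sum(station_minutes) + walk_minutes):.2f} guests/min")
    # Every station still handles 10 ppl/min, yet the cinema fills far more
    # slowly: with serialized requests, the time *between* stations dominates.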
How much latency does an LSI SAS Expander chip (are there any other SAS multiplexers out there?) add?
#7 on 2009-09-07 15:04
1. For my calculations this is irrelevant, as I assumed that SAS expanders in the storage box, for example, are included in the 6.67 microseconds.
2. There are other vendors of SAS expanders, like PMC-Sierra.
I understand you bundled the SAS expander calc in with the 6.67 µs, but... say you have an SSD with 0.18 ms expected, and the expander has a 0.18 ms latency. Then the expander latency halves the expected IOPS, so it's really important.
I don't know what the latency is... I was googling around and found this page, probably because of Olli's comment... oh well, I'll keep searching. I can tell you this: in my test setup, the expander is halving the expected IOPS, but I don't know for sure whether this is expected for this equipment or not (LSI).
#7.1.1 on 2009-09-30 21:13
I assumed that SAS expanders are present in the direct-attached as well as in the SAN-attached version, so both would be hit equally by the SAS expanders. The calculation was set up to show the impact of DAS compared to SAN.
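The halving effect from the comment above is simple queue-depth-1 arithmetic (a sketch using the commenter's 0.18 ms figures; the expander latency is the commenter's hypothetical, not a datasheet value):

    # Effect of an in-path SAS expander on per-device IOPS at queue depth 1,
    # using the 0.18 ms figures from the comment above.
    ssd_ms = 0.18        # expected SSD service time (commenter's figure)
    expander_ms = 0.18   # expander latency (the commenter's hypothetical)

    print(f"SSD alone      : {1000 / ssd_ms:,.0f} IOPS")                   # ~5,556
    print(f"SSD + expander : {1000 / (ssd_ms + expander_ms):,.0f} IOPS")   # ~2,778
    # The extra hop halves the queue-depth-1 IOPS -- but since the expander sits
    # in both the DAS and the SAN path, it drops out of a DAS-vs-SAN comparison,
    # which is the point of the reply above.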