Some perspective on this DIY storage server mentioned at Storagemojo
Thursday, September 3, 2009
Comments
Interesting comments. I have a better idea now of why Sun charges as much as they do for disk drives for my x2270 ($600+/TB). Presumably they are selling me an enterprise-level drive. Still, I would like to be able to buy the empty disk carrier so I could put my own (desktop-quality) drive into it. Alas, there is no part number for that in the catalog.
Thanks for posting. -cwl
There are other niceties: have you ever tried to get a firmware update for a desktop hard disk (aside from big bugs like the recent one at Seagate)? Or been through the rather long qualification process ...
"If this expensive equipment fails 1% less often over the course of a year that's 90 hours less downtime." - Me (I know it's not exact, but it gets the point across)
Precisely why I educate my clients on the importance of spending the right amount of money on equipment. Just because Cisco is too expensive doesn't mean we should use Netgear.
#3 on 2009-09-03 22:44
Stop charging 3x the amount for the drives and we'll consider Sun. We tried to get the J4200 and it wasn't possible to buy the array with the trays and no drives.
We bought WD RE4 drives and another manufacturer's drive array because we couldn't justify paying such a ridiculous premium on drives. We bought the server from Sun with minimal disks and OEM drive carriers.
#4 on 2009-09-03 23:25
We have been buying WD enterprise drives for a few years now. It's best to buy OEM drive carriers and enterprise drives on your own.
#5 on 2009-09-03 23:27
They have emulated a tape backup solution with their box. It's cheap, it does mostly writes, and a read once in a while.
Why would you want to use anything but a simple and cost-effective hardware solution in this case? Get a desktop motherboard, a few gigs of RAM, and a bunch of home-type disks. No fancy stuff needed, only cheap, raw storage. A disk fails? Replace it. A power supply goes? You'll be back online with that box once the part is replaced. I don't think their service is geared towards mission-critical situations, just cost-effective backup.
Of course; this is the reason why I wrote that this is optimal hardware for their needs. This is the advantage when you can build your own hardware based purely on your own needs and move many of the tasks from the hardware into the software. Then such a reduced-to-the-max approach is a sensible one. Otherwise other concepts seem to be better ways to go.
It would be really nice if Sun offered the bare empty X4500 chassis and let people build their own with their own choice of disks and motherboard, if people don't need all the super speed and reliability of the X4500 but still want the density.
#8 on 2009-09-04 02:28
Fantastic! Great post highlighting -some- (I'm sure there are loads more) of the issues. Some people around here seem to think the Backblaze is the greatest thing. I'm busily trying to kill the "hype".
#9 on 2009-09-04 03:10
And then when one of the disks fails, we get a support call logged, and after spending the time and resources diagnosing the fault, Sun Support has to say "I'm sorry, your third-party disk has failed, we can't help you". Real good for customer satisfaction.
Yes, I know you wouldn't make that support call, but many would. Alan.
I think it depends on usage.
For home, I rolled my own server with 3 ZFS 1 TB mirror pairs plus netatalk and iSCSI. But for enterprise we use Thumpers and EMC SANs. If you look at the graph of petabyte costs per vendor, Sun are about right. The bottom line for our clients is that EMC is too expensive for most of them, so we use Thumpers. EMC just costs too much - far too much.
88 comments on Hacker News so far, not bad.
http://news.ycombinator.com/item?id=803136
Well, of course you get more with Sun. But is it worth 10 times the price? I think not. In the end we compare price and reliability. The Backblaze solution works 99.9(99?)% of the time. Is that remaining fraction really worth paying 10 times more?
#13 on 2009-09-04 11:52
Well, this hardware doesn't have an availability anywhere near 99.9% (let alone 99.999%). Five nines translates into 26.3 seconds of outage per month. I don't think you can even swap a disk in that time. Factor into the equation that there is no redundant power supply (there are two power supplies in the box, but they are not redundant). All in all, this system should have a really mediocre availability.
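For reference, a rough sketch (assuming an average month of 30.44 days) of how those downtime budgets work out:

    # Downtime budgets for various availability levels (assumes a 30.44-day month)
    hours_per_month = 30.44 * 24

    for label, availability in [("99.9% (three nines)", 0.999),
                                ("99.99% (four nines)", 0.9999),
                                ("99.999% (five nines)", 0.99999)]:
        downtime_seconds = hours_per_month * 3600 * (1 - availability)
        print(f"{label}: {downtime_seconds:,.1f} seconds of allowed outage per month")

Five nines comes out at about 26.3 seconds per month, three nines at roughly 44 minutes.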
But there is a reason why this works for them: their application logic keeps the data available. And perhaps they can reach 99.9% for the complete solution. But this is not the general case. Most often we talk about applications that need a better availability, because they assume a data store with a reasonable availability of its own. As I wrote before, it's about your application: the more intelligence you have in your application, the less intelligence you need in your storage to keep data available.

By the way: the price calculations are somewhat skewed. To keep a decent availability they have to keep the data threefold, in my opinion. Thus you get 333 TB out of your 1 PB. Now use a system where the single unit is more available. Then it may be sufficient to keep the data on just two systems to get the same availability, so you get 500 TB out of your 1 PB. Factor in that they should use other drives (doubling the cost) and that you have to include the price of development, testing et al., and the price isn't that different. I didn't include this thought in the article, because you can argue too much about the interpretations, and I only know the details from the Backblaze blog post.

At the end: it's perfect hardware for their solution, but don't take for granted that this is a solution sufficient even for similar cases. And for general-purpose use with standard applications: forget it ... without ZFS I wouldn't even think about using these 10^14 bit error rate disks ...
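A small sketch of that replication-overhead arithmetic; the per-raw-TB prices are placeholders for the comparison, not real quotes:

    # Usable capacity out of 1 PB raw for different replication factors.
    # Relative prices per raw TB are made-up placeholders, not real quotes.
    raw_tb = 1000

    for label, copies, price_per_raw_tb in [("cheap pods, data kept threefold", 3, 1.0),
                                            ("sturdier systems, data kept twofold", 2, 2.0)]:
        usable_tb = raw_tb / copies
        relative_cost_per_usable_tb = price_per_raw_tb * raw_tb / usable_tb
        print(f"{label}: {usable_tb:.0f} TB usable, "
              f"relative cost per usable TB: {relative_cost_per_usable_tb:.1f}")

With three copies you get about 333 TB usable out of 1 PB, with two copies 500 TB, which is why the cost-per-usable-TB gap narrows.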
It would be nice if Sun built a system nearer to the Backblaze specs for backup purposes.
It will not happen, since accountants will not understand the difference between enterprise 24/7 online storage and a backup device, which may have a service downtime of a few days. If the software supports power-switching the disks, even the desktop drives will be OK. There are a lot of applications that can be handled on cheap storage, but you have to explain to a buyer that a barn is not a fully automated high-rack storage area. You can store things in both, and the barn wins on price.
Sorry, but desktop drives wouldn't be OK, given that a 1 PB system isn't used by you alone. You have to work with rather short idle timeouts. And you should take into consideration that a desktop drive is designed for 10,000 start/stop cycles per year.

That budget translates into one start/stop every 52 minutes. Further, keep in mind that you end up waking many disks at once due to striping, plus nasty challenges like partial-stripe writes. And then consider that you don't have one user, you have several, perhaps thousands. So to get a disk sleeping for at least 6,360 hours in a year (8,760 minus the 2,400 power-on hours specified for the 7200.11) you have to set rather short sleep timeouts, but that bites large chunks out of your budget of 10,000 start/stops. The problem: those disks will still work reasonably well when used out of specification ... at the beginning ... but the problem will haunt you when your disks get a little older. Of course you could preventively replace your disks, but that confirms the old saying "buying cheap is buying twice". Better to buy a disk that is capable of running 24/7 right from the start.
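A rough sketch of that budget arithmetic, using the spec figures quoted above:

    # How the Barracuda 7200.11 spec figures quoted above translate into operating limits.
    hours_per_year = 8760
    power_on_hours_budget = 2400      # specified power-on hours per year
    start_stop_budget = 10_000        # specified start/stop cycles per year

    minutes_per_cycle = hours_per_year * 60 / start_stop_budget
    required_sleep_hours = hours_per_year - power_on_hours_budget

    print(f"One start/stop every {minutes_per_cycle:.1f} minutes exhausts the cycle budget")
    print(f"The drive has to sleep {required_sleep_hours} hours per year to stay within "
          f"the power-on-hours budget")

That is roughly one cycle every 52 minutes, and 6,360 hours of required sleep per year: the two budgets pull against each other.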
Given the rise of the SOHO NAS, HD manufacturers are slowly introducing intermediate-level HDs, like the WD Caviar RE3, built for 24/7 use with an MTBF of 1.2 million hours ... at a decent price.
What's missing now is an intermediate-level storage solution. Not everyone needs 99.999%. Lots of IT departments are ready to accept a somewhat higher failure rate for a 5x price decrease. Sun can ignore this market, or embrace it while there's still room for a big player.
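For scale, a rough sketch of what a 1.2-million-hour MTBF corresponds to as an annualized failure rate, assuming 24/7 operation and the usual exponential approximation:

    import math

    # Convert an MTBF figure into an approximate annualized failure rate (AFR),
    # assuming 24/7 operation and an exponential failure model.
    mtbf_hours = 1_200_000    # WD RE3 data-sheet figure quoted above
    hours_per_year = 8760

    afr = 1 - math.exp(-hours_per_year / mtbf_hours)
    print(f"MTBF of {mtbf_hours:,} hours -> AFR of roughly {afr * 100:.2f}% per drive per year")

That works out to roughly 0.7% per drive per year under the stated assumptions.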
#17 on 2009-09-04 14:31
But what if the power supply goes during a write (highly likely)? Doesn't the absence of cache and battery on the controllers, combined with different controller I/O speeds, make data corruption a real possibility? If you lose a power supply, you may be down much longer than you think. I would assume they are backing up or replicating with their application solution as mentioned, but even then, bringing back that much data can take a while.
Seems like this is right on the money. A good solution for their needs and they should also be applauded for sharing, but for others there are some failure and performance points that need to be considered.
#18 on 2009-09-04 14:40
The argument about start/stop cycles is OK, but only if the system is used as an enterprise server. For backups, there is a low number of accessors at any one time. And a private home-page server has a lot of pages with nearly no access. Small companies will have access patterns that allow the drives to sleep the whole night.
If the system is not used for storing seldom-used files, using enterprise hard drives is of course recommended. I think there is a market for these systems as archive storage. How much storage do you need to justify the expense of two tape drives and a robot? I think this will be in the 100 TB range, which is a lot of backup and archive for a 100-to-500-person shop.
10,000 start/stop cycles?
Aren't desktop drives usually the ones with more start/stop cycles than a typical server disk? And only 2,400 power-on hours - that would be great for energy-consumption considerations, but I really don't see a 24/7, 8,760-hours-a-year application as that big of a problem for desktop disks. The fact that it is not specified is no reason to assume it doesn't actually work and perform well. I really don't see the desktop disks failing so often that you'd have to continuously swap in new ones.
#20 on 2009-09-04 22:50
Sorry, but please look at page 23 (section 2.11.1, Annualized Failure Rate (AFR) and Mean Time Between Failures (MTBF)) of the Seagate Barracuda 7200.11 product manual at http://www.seagate.com/staticfiles/support/disc/manuals/desktop/Barracuda%207200.11/100507013e.pdf.
This document specifies 2,400 power-on hours. I assume that has something to do with the bearings and wear on electronic components while powered. Furthermore, this document specifies 10,000 start/stop cycles. Obviously you are right that desktop drives are specced for more starts and stops. When you look at the specification for the Cheetah 15K.6 (http://www.seagate.com/staticfiles/support/disc/manuals/enterprise/cheetah/15K.6/FC/100465943a.pdf) you will see that all those reliability calculations are done on the basis of a whopping 250 start/stops per year, but unlike the desktop drive it's specced for 8,760 power-on hours per year, so you don't have to shut it down just to stay within the power-on-hours envelope.

When a device has an annualized failure rate of 0.37% at 2,400 hours, it rises to 1.25% when used out of spec with regard to power-on hours. Mix in a non-desktop load, perhaps a few degrees above the specified 25-degree ambient temperature, and you have all the factors needed to shorten the life expectancy of your drives. Now take into consideration that these people don't have just one drive; they seem to have hundreds of them ....

The problem: even a desktop drive can be used under an enterprise load ... for a while ... like constantly driving an engine in the red zone ... it will work for a while ... but you significantly shorten the life of the device. And that brings us to another problem: when disks are bought at the same time and used under a similar load, you can expect them to die at a similar time. Now you've additionally accelerated the wear on the disks through out-of-spec usage ...

And by the way, there are lots of anecdotes of people who, at the choice of the financial controller, used SATA disks in their ghetto RAID where they used FC drives before, put the same database load on it, and saw the drives dying like flies after a while. I can only assume that Backblaze took this into consideration and plans to replace disks early to benefit from the increased data density of newer disks, so they don't have to buy new racks and systems, just new disks.
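A rough sketch of what those AFR figures mean across a fleet; the fleet size is a made-up example, not a Backblaze figure:

    # Expected drive failures per year for a fleet at different AFRs.
    # The fleet size is a made-up example, not a Backblaze figure.
    fleet_size = 450    # e.g. ten 45-drive pods

    for label, afr in [("within spec (0.37% AFR)", 0.0037),
                       ("out of spec, 24/7 power-on (1.25% AFR)", 0.0125)]:
        expected_failures = fleet_size * afr
        print(f"{label}: roughly {expected_failures:.1f} expected failures per year")

Same hardware, but the expected number of replacements per year more than triples once the drives are run outside their power-on-hours spec.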
c0t0d0s0 is perfectly right:
Backblaze is selling BACKUP services. BACKUP is the last line of defense. In case your data is lost, you will want to recover from your BACKUP. But this Backblaze stuff is no BACKUP solution. BACKUP is done on TAPE, not on DISKS. This is why the industry invented tape changers for backup purposes: because a tape is much more reliable than a drive. Anyone heard about silent bit corruption? No? RTFM. I would never back up my data in a datacenter like the one Backblaze runs. Never! Oh, there's one exception: in case I never need my data back, I would trust Backblaze. But on the other hand, in that case it would be cheapest to back up all data to /dev/null. Cheap and fast.
I think Micheal is off the mark.
1. Tape media fails. It does not offer the fast random access of its magnetic HDD counterparts. We stored data in the 1980s on three different tapes and put them in different locations. When we needed to recover an old program, all three failed.
2. I am not sure what BB does exactly, but I would imagine that if they were conservative, they would have a copy offsite too, and weekly run a hash algorithm on each stored file to isolate any degradation. I believe the software mitigates any issues here.
3. Bit errors will occur. They can happen in RAM before the write and in numerous other ways. Again, application logic using big hashes can mitigate much of this and provide the highest reliability.
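A minimal sketch of the kind of hash-based integrity sweep described in point 2, assuming a simple JSON manifest of SHA-256 digests; the paths and manifest layout are made-up examples, not anything Backblaze has documented:

    import hashlib
    import json
    import pathlib

    # Minimal integrity sweep: recompute each file's SHA-256 and compare it with a
    # stored manifest. Paths and manifest layout are made-up examples.
    store = pathlib.Path("/srv/backup")
    manifest = json.loads((store / "manifest.json").read_text())  # {"relative/path": "hexdigest"}

    for rel_path, expected in manifest.items():
        digest = hashlib.sha256((store / rel_path).read_bytes()).hexdigest()
        if digest != expected:
            print(f"DEGRADED: {rel_path} expected {expected[:12]}... got {digest[:12]}...")

Run periodically, a sweep like this flags silently corrupted files so they can be re-fetched from another replica before the good copies are gone too.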
Jones,
1st: Yes, tapes can fail, but that is less likely than data loss on hard drives. Did you ever hear of "silent data corruption"? No? Then please read this: 2nd: I think they don't have an offsite copy. 3rd: Yes, bit errors will occur. Using application logic to avoid bit corruption addresses only one part of the problem. But using CRAP (like Backblaze does) makes it all worse.
#25 on 2009-10-13 10:15
Link to silent data corruption:
http://raidinc.com/pdf/Silent%20Data%20Corruption%20Whitepaper.pdf
#26 on 2009-10-13 10:27
I read about the Backblaze and found this blog 20 minutes later.
96 TB is definitely too much for me, but I'm thinking about building two storage boxes, one for backup and the other for my ESX farm, using NFS. After reading your post I came to the conclusion that I could easily upgrade the Backblaze design by modifying 3 major elements:
SATA controller: 3ware 9650SE-12ML - 8-channel
File system: ZFS with OpenSolaris 2009.06 - RAID-Z2
Hard drives: 8 or 12 disks (still deciding between the Seagate Barracuda ES.2 ST31000340NS and the Western Digital Caviar Black WD1001FALS)
My major problem is finding an enclosure to install these 8 or 12 HDs! What do you think?
#27 on 2009-12-10 07:53