Disclaimer: The individual owning this blog works at Sun Microsystems GmbH in Germany, a subsidiary of Oracle. The opinions expressed here are his own, are not necessarily reviewed in advance by anyone but the individual author, and neither Oracle nor any other party necessarily agrees with them.
MAID, ZFS and some further thoughts ...
Wednesday, August 27. 2008

Comments
I'm not particularly convinced by MAID either. From the little I've looked at it, they try to keep the discs alive by doing daily spin-ups (which must also waste power compared to tapes).
I think the easiest way to emulate MAID would be with SAM. Treat each disk as a tape and that's pretty much it. All you have to do is have a controller that'll turn off power to the drive when idling. Daily spin-ups could just be accessing a file from each drive. Just thinking - for now, I'll stick with the tape solution.
There is another aspect of MAID, when it is done properly, which is the design of a physical enclosure for the disks that constrains the maximum proportion of disks spun up at any one time. This allows higher packing density but more importantly reduces the peak power rating and therefore data centre power and cooling that needs provisioning to the disk system. Of course you could do this with ZFS and some custom tin...
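The provisioning argument above can be sketched as simple arithmetic. This is only a rough illustration: the 48-disk count matches an X4500, but the per-disk wattages and the 25% cap are assumed numbers, not figures from any vendor, and `peak_power_watts` is a hypothetical helper.

```python
def peak_power_watts(n_disks, active_watts, standby_watts, max_active_fraction):
    """Worst-case power draw when the enclosure caps the share of spinning disks."""
    max_active = int(n_disks * max_active_fraction)   # disks allowed to spin at once
    idle = n_disks - max_active                       # the rest stay spun down
    return max_active * active_watts + idle * standby_watts

# Assumed figures: 10 W per spinning disk, 1 W per spun-down disk.
uncapped = peak_power_watts(48, 10.0, 1.0, 1.0)    # every disk may spin
capped = peak_power_watts(48, 10.0, 1.0, 0.25)     # at most a quarter may spin
print(uncapped, capped)
```

Under these assumptions the enclosure would need power and cooling for 156 W of disk load instead of 480 W, which is the point about provisioning for the peak rather than the average.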
On the reliability front, your intuitive understanding of what you are seeing with devices that fail on power-up can be quite misleading. We need to distinguish between when a failure occurs and when it is revealed. In the case of disks, which are complex, mixed electronic and mechanical systems, a component can wear out or go out of tolerance, and this may not become visible until the next spin-up. Of course, this wear-out or other failure trigger event has occurred regardless of whether you have spun the disk down and up again and revealed the failure. In many ways a regular spin-down and spin-up is actually helping you by identifying the disks that have quietly suffered an event of which you are blissfully unaware.

Most of the classes of disk used in these systems (including the cheap SATA disks in the X4500) are actually rated for 9-5, Monday to Friday use and are expected to go through many power up / down cycles. Their MTBF is properly quoted in running hours (250,000 or so for those in the X4500), so the less time they are actually spun up, the longer they are going to last in elapsed real time, thus reducing the component failure rate in the array.

Another issue that we need to deal with is data degrading on the disk surface, which is normally kept in check either by data read / write operations or by background scans run by the RAID logic when the array is at low load or idle. As the individual disks read data, they come across clusters which require re-read attempts because they are failing and losing the data. The data is frequently recovered within the disk by re-reads, but it may also be recovered from parity or a mirror by the RAID logic. Either way, the quicker we find a decaying cluster, the better our chance of recovering from it without data loss, marking it as bad and putting the data somewhere else on the platter. This is the counterpoint to the disks being more reliable in real time (as opposed to power-on time) in MAID, and it requires that a balance be struck.
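The running-hours argument can be made concrete with a small calculation. This is a simplified sketch: it assumes failures accumulate only while the disk is spun up and ignores any wear from the spin cycles themselves. The 25% duty cycle is an assumed example value, and `elapsed_mtbf_hours` is a hypothetical helper.

```python
HOURS_PER_YEAR = 24 * 365

def elapsed_mtbf_hours(rated_mtbf_hours, duty_cycle):
    """Scale an MTBF quoted in running hours to elapsed wall-clock hours.

    duty_cycle is the fraction of wall-clock time the disk is spun up.
    """
    if not 0 < duty_cycle <= 1:
        raise ValueError("duty_cycle must be in (0, 1]")
    return rated_mtbf_hours / duty_cycle

always_on = elapsed_mtbf_hours(250_000, 1.0)     # disk never spins down
mostly_idle = elapsed_mtbf_hours(250_000, 0.25)  # spun up a quarter of the time
print(always_on / HOURS_PER_YEAR, mostly_idle / HOURS_PER_YEAR)
```

With a 250,000-hour rating, a disk spun up a quarter of the time would, under this model, have an elapsed-time MTBF of 1,000,000 hours. The scrubbing concern in the same comment pulls the other way, since every scan adds running hours.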
There is no reason why a properly managed MAID system cannot be at least as reliable as tape, and it can also be multi-site. Further, disk capacity is growing much faster than tape capacity, and cost is going the same way. As MAID systems can compete well with tape on in-use energy consumption and yet provide what is essentially live random access to the data, there are many arguments in favour of the technology. I would be very interested to see MAID implemented properly, using the far stronger data protection features that ZFS achieves by merging the FS and RAID logic. This could then be applied to truly commodity disk trays to provide a very cheap, yet effective, MAID solution.
Having been in the storage industry for the past 18 years, I understand the idea that turning a device on and off could cause problems in the old days. With the use of power management in notebook computers, however, the ability for a disk drive to be powered on and off several times a day or even an hour has been built into the devices for years. The notebook industry has helped in several areas of mass storage:
- Small profile drives
- Power management
- Increased density within the small form factor

MAID fits into the green datacenter philosophy and should be used as a TIER 3 (nearline / lightly used) or TIER 4 (archive / rarely used) device. However, the cost of Copan (the only real MAID provider) is high and, personally, I think not justified. I am going to try the solution shown in your write-up above.
If you have a problem with the actual power on-off cycle, why not conserve energy by pulling a CPU-style stunt: slow down the drive, since lower RPMs need less energy and produce less heat, and once disk use goes over a certain threshold, speed up the RPMs again.
But yes, here in Africa there is a power issue in data centres, so something that would conserve peak and sustained power consumption would be "good (TM)"
With the advent of SSDs in the datacenter we will see slower drives in the datacenter anyway. You use 15k drives primarily for IOPS reasons (perpendicular recording gave us high density and thus a high data rate). When you don't need the IOPS because of a hybrid storage pool, you can use 4500 rpm drives again ... what a silence in front of a storage array
From the first day we saw the X4500 and thought about the possibilities, we felt the need to spin down the disks. For us and our customers, using the Thumper as a disk archive for SAM with a MAID function has been our goal over the last month. Now we are some days away from announcing software that is able to spin down and up disk groups defined by zpool or manual configs.
On access, a zpool is up in around 15 seconds, and you can save roughly 50% of the idle (!) power consumption.
Is this a modified block allocator that keeps as few disks busy as possible, or just a tool to spin down disks?
We don't modify anything.
We just let the disks spin down and up through a new service. This service helps to configure disks to spin down after a defined idle time. The more interesting and difficult part was spinning up the disks in parallel to ensure quick access. Without this, the needed disks come up sequentially and your I/Os run into timeouts, and under some circumstances your server would hang. Further, we developed tools around it to monitor disk status, to import zpools and so on. A Solaris service was necessary, too, and a small but nice Java tool to configure the services is also ready.
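Why parallel spin-up matters can be shown with a toy sketch. This is not the tool described in the comment, whose implementation is not public here: `spin_up` is a hypothetical stand-in that merely sleeps, the device names are illustrative, and the 0.05 s delay stands in for a real spin-up time of several seconds.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def spin_up(disk, delay=0.05):
    """Hypothetical stand-in: pretend to wake one disk and block until it is ready."""
    time.sleep(delay)
    return disk

def spin_up_sequential(disks, delay=0.05):
    """Wake disks one after another; total wait grows with the number of disks."""
    return [spin_up(d, delay) for d in disks]

def spin_up_parallel(disks, delay=0.05):
    """Wake all disks at once; total wait is roughly one spin-up time."""
    if not disks:
        return []
    with ThreadPoolExecutor(max_workers=len(disks)) as pool:
        return list(pool.map(lambda d: spin_up(d, delay), disks))

disks = [f"c0t{i}d0" for i in range(8)]  # illustrative device names

start = time.perf_counter()
spin_up_sequential(disks)
sequential_s = time.perf_counter() - start

start = time.perf_counter()
spin_up_parallel(disks)
parallel_s = time.perf_counter() - start

print(f"sequential: {sequential_s:.2f}s, parallel: {parallel_s:.2f}s")
```

In the sequential case the wait for the last disk is the sum of all spin-up times, which is how pending I/Os can run into timeouts. A real implementation would also have to respect the power budget of the enclosure, which limits how many disks can spin up at the same moment.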
Tobia, anything you can share with us on how to implement spinning down in ZFS?
This work is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 Germany License
I wrote some days ago about the need for a different ZFS block allocator to enable the usage of a ZFS pool for a MAID. After thinking a little bit longer about it, I wrote a change request for it. The change request "67422636 - Disk activation base
Tracked: Sep 02, 11:39
We in the storage and server fields (not to mention Apple fans) have been watching ZFS with keen interest for a while. It holds great promise as the next-generation UNIX filesystem foundation, and Joerg Moellenkamp over at c0t0d0s0.com has been coverin...
Tracked: Jun 13, 14:55