Disclaimer: The individual owning this blog works for Oracle in Germany. The opinions expressed here are his own, are not necessarily reviewed in advance by anyone but the individual author, and neither Oracle nor any other party necessarily agrees with them.
MAID, ZFS and some further thoughts ...
Wednesday, August 27. 2008

Comments
I'm not particularly convinced by MAID either. From the little I've looked at it, they try to keep the discs alive by doing daily spin-ups (which must also waste power compared to tapes).
I think the easiest way to emulate MAID would be with SAM: treat each disk as a tape and that's pretty much it. All you need is a controller that turns off power to a drive when it is idling. Daily spin-ups could just be a matter of accessing a file from each drive. Just thinking - for now, I'll stick with the tape solution.
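A minimal sketch of that daily spin-up idea, assuming a Solaris-like system where each archive disk carries its own filesystem; the mount points, probe file and schedule are illustrative assumptions, not part of any actual SAM setup:

    #!/bin/sh
    # daily-spinup.sh - access one file on every archive disk so each drive
    # spins up once a day and impending failures are noticed early.
    # MOUNTS is a placeholder list of the per-disk filesystems used as "tapes".
    MOUNTS="/archive/disk0 /archive/disk1 /archive/disk2"
    for m in $MOUNTS; do
        # the probe file just has to live on that disk; create it once beforehand.
        # note: if the file is still in the page cache, the read may not reach the platter.
        if ! dd if="$m/.spinup-probe" of=/dev/null bs=1k count=1 2>/dev/null; then
            echo "spin-up read failed on $m" | logger -p daemon.warning
        fi
    done

Scheduled once a day from cron, e.g. "0 4 * * * /usr/local/bin/daily-spinup.sh", this roughly reproduces what a MAID controller does with its periodic disk checks.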
There is another aspect of MAID, when it is done properly, which is the design of a physical enclosure for the disks that constrains the maximum proportion of disks spun up at any one time. This allows higher packing density but, more importantly, reduces the peak power rating and therefore the data centre power and cooling that has to be provisioned for the disk system. Of course you could do this with ZFS and some custom tin...
On the reliability front, your intuitive understanding of what you are seeing with devices that fail on power-up can be quite misleading. We need to distinguish between when a failure occurs and when it is revealed. In the case of disks, which are complex, mixed electronic and mechanical systems, a component can wear out or go out of tolerance and this may not become visible until the next spin-up. Of course, this wear-out or other failure trigger event has occurred regardless of whether you have spun the disk down and up again and revealed the failure. In many ways a regular spin-down and spin-up is actually helping you by identifying the disks that have quietly suffered an event of which you are blissfully unaware.

Most of the classes of disk used in these systems (including the cheap SATA disks in the X4500) are actually rated for 9-to-5, Monday-to-Friday use and are expected to go through many power-up/down cycles. Their MTBF is properly quoted in running hours (250,000 or so for those in the X4500), so the less time they are actually spun up, the longer they are going to last in elapsed real time, thus reducing the component failure rate in the array.

Another issue that we need to deal with is the data degrading on the disk surface, which is normally kept in check either by data read/write operations or by background scans by the RAID logic when the array is at low load or idle. As the individual disks read data they come across clusters which require re-read attempts because they are failing and losing the data. This is frequently recovered within the disk by re-reads, but may also be recovered from a parity or mirror copy by the RAID logic. Either way, the quicker we find a decaying cluster, the better our chance of recovering from it without data loss, marking it as bad and putting the data somewhere else on the platter. This is the counterpoint to the disks being more reliable in real time (as opposed to power-on time) in MAID, and it requires that a balance be struck.

There is no reason why a properly managed MAID system cannot be at least as reliable as tape, and it can also be multi-site. Further, disk capacity is growing much faster than tape capacity, and cost is going the same way. As MAID systems can compete well with tape on in-use energy consumption and yet provide what is essentially live random access to the data, there are many arguments in favour of the technology. I would be very interested to see MAID implemented properly, using the far stronger data protection features that ZFS achieves by merging the FS and RAID logic. This could then be applied to truly commodity disk trays to provide a very cheap, yet effective, MAID solution.
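In ZFS terms, the background scan described above maps onto pool scrubbing, which reads every allocated block, verifies it against its checksum and repairs it from redundancy where possible. A minimal sketch, assuming a pool named "archive" (the name and schedule are illustrative, not from the discussion above):

    # kick off a scrub: ZFS reads every allocated block and verifies its checksum
    zpool scrub archive
    # check progress, plus any repaired or unrecoverable errors
    zpool status archive

Run from cron (say, weekly), this gives the "find decaying clusters early" behaviour the commenter wants; the open question for MAID is how to schedule such scans without keeping the disks spun up all the time.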
Having been in the storage industry for the past 18 years, I understand the idea that turning devices on and off could cause problems in the old days. With the power management used in notebook computers, however, the ability for a disk drive to be powered on and off several times a day, or even an hour, has been built into the devices for years. The notebook industry has helped in several areas of mass storage:

- Small-profile drives
- Power management
- Increased density within the small form factor

MAID fits into the green datacenter philosophy and should be used for tier 3 (nearline / lightly used) or tier 4 (archive / rarely used) data. However, the cost of Copan (the only real MAID provider) is high and personally I think not justified. I am going to try the solution shown in your write-up above.
#3 on 2008-09-07 01:35
If you have a problem with the actual power on/off cycle, why not conserve energy by doing a CPU-style stunt: slow the drive down, since lower RPMs need less energy and produce less heat, and once disk use goes over a certain threshold, speed the RPMs up again.
But yes, here in Africa there is a power issue in data centres, so something that would conserve peak and sustained power consumption would be "good (TM)".
#4 on 2008-09-10 15:22
With the advent of SSDs in the datacenter we will see slower drives anyway. You use 15k drives primarily for IOPS reasons (perpendicular recording already gave us high density and thus high data rates). When you don't need the IOPS because of a hybrid storage pool, you can use 4500 rpm drives again ... what a silence in front of a storage array.
From the first day we saw the X4500 and thought about the possibilities, we felt the need to spin down the disks. For us and our customers, using the Thumper as a disk archive for SAM with a MAID function has been our goal over the last months. Now we are some days away from announcing software which is able to spin down and up disk groups defined by zpool or by manual configs.
On access, a zpool is up again in around 15 seconds, and you can save roughly 50% of the idle (!) power consumption.
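For readers who want to experiment with plain Solaris device power management rather than a packaged product, a minimal sketch of the idle spin-down side; the device path and threshold below are placeholders, and ZFS's own housekeeping I/O may keep the disks awake unless the pool is otherwise idle:

    # /etc/power.conf excerpt (illustrative): let a disk spin down after 30 minutes idle.
    # The physical path is a placeholder; find the real one with: ls -l /dev/dsk/c1t2d0s0
    autopm              enable
    device-thresholds   /pci@0,0/pci11ab,11ab@1/disk@2,0   30m

After editing, pmconfig re-reads /etc/power.conf and applies the thresholds. Whether the disks then actually stay spun down depends on the pool seeing no I/O, which is exactly the block-allocator question the original post raised.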
Is this a modified block allocator that keeps as few disks as possible busy, or just a tool to spin down disks?
We don't modify anything.
We just let the disks spin down and up via a new service. This service helps to configure disks to spin down after a defined idle time. The more interesting and difficult part was spinning the disks up in parallel to ensure quick access. Without this, the needed disks come up sequentially, your I/Os run into timeouts, and under some circumstances your server would hang. Further, we developed tools around it to monitor disk status, to import zpools and so on. A Solaris service was necessary too... and a little but nice Java tool to configure the services is also ready.
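The parallel spin-up idea can be illustrated with a small sketch that has nothing to do with the commenter's actual tool: it reads one block from every member device of a pool concurrently before the pool is touched, so all drives start their spindles at the same time instead of one after another. The pool name and the device-name pattern are assumptions:

    #!/bin/sh
    # wake_pool.sh - spin up all disks backing a zpool in parallel (illustrative sketch)
    POOL=${1:-archive}
    # pull the cXtYdZ device names out of 'zpool status'; adjust the pattern for WWN-style names
    for dev in `zpool status "$POOL" | nawk '$1 ~ /^c[0-9]+t[0-9]+d[0-9]+/ {print $1}'`; do
        # a single-block read is enough to start the spindle; run the reads in the background
        dd if=/dev/rdsk/${dev}s0 of=/dev/null bs=512 count=1 2>/dev/null &
    done
    # wait until every drive has answered before letting real I/O hit the pool
    wait

Doing the reads concurrently is what keeps the worst-case wake-up time close to a single drive's spin-up time rather than the sum over all drives.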
Tobia, anything you can share with us on how to implement spinning down in ZFS?
#6.2 on 2009-07-21 14:28
I wrote some days ago about the need for a different ZFS block allocator to enable the usage of a ZFS pool for a MAID. After thinking a little bit longer about it, I wrote a change request for it. The change request "67422636 - Disk activation base
Tracked: Sep 02, 11:39
We in the storage and server fields (not to mention Apple fans) have been watching ZFS with keen interest for a while. It holds great promise as the next-generation UNIX filesystem foundation, and Joerg Moellenkamp over at c0t0d0s0.com has been coverin...
Tracked: Jun 13, 14:55