Disclaimer: The individual owning this blog works for Oracle in Germany. The opinions expressed here are his own, are not necessarily reviewed in advance by anyone but the individual author, and neither Oracle nor any other party necessarily agrees with them.
Some thoughts about deduplication
Tuesday, February 3, 2009

Comments
I can imagine several situations where "duplicating the deduplication" would still be a great feature and save a lot of space.
When I look at our file servers, even keeping each block twice instead of once per file would be a saving. I am not talking about a deduplicated RAID, just something like the ditto blocks feature of ZFS. In many applications it might not be worth the computing overhead, but I would be very glad to have it on at least some of our boxes.
You make some good points - especially about the 16-bit checksum used by NetApp. That gives better guidance than anything else I have seen on when it is really bad to do dedup.
Every once in a while I hear the argument that disk (or RAM) is cheap, so just add more. That works well until you have already added as much as fits, or you started out with the (economically) most.

Suppose I have a T2000 with 4 x 73 GB drives that runs Solaris 10. I've taken the leap and use ZFS for / and zone roots. Having gotten tired of being surprised by the poorly packaged app that "needs" to put a symlink in /usr/lib, I make use of zfs clones to provision 20 full root zones. After all, each takes hardly any space. My starting point for disk usage looks like:

/ 4 GB
swap 16 GB
dump 16 GB
/zones/$zone 1 GB (50 MB each for app install, etc.)

The second pair of disks is used to store application data. The first pair of 73 GB drives is about half full - that's not bad.

Along comes a patch cycle. I create a new boot environment with Live Upgrade, download 10_Recommended.zip (1078.54 MB today) to /var/tmp and extract it (another 2 GB?). Let's pretend that about half of the patches (in space) actually apply - which would seem to take up another 13 GB. In reality, each patch creates undo.Z in two places in the global zone (/var/sadm/pkg/$pkg/save/$patch/undo.Z, /var/sadm/pkg/$pkg/save/pspool/$pkg/save/$patch/undo.Z) and one in each non-global zone (/var/sadm/pkg/$pkg/save/$patch/undo.Z). Assuming average compression of 50% for the undo.Z files, we are now at:

/ 7 GB (original 4 GB + 1 GB .zip + 2 GB unzipped)
swap 16 GB
dump 16 GB
/zones/$zone 1 GB
alt / 2 GB (free clone, 1 GB patches, 2 x 500 MB undo.Z)
alt /zones/$zone 30 GB (free clone, 20 x (1 GB patches, 500 MB undo.Z))

Oops, that is roughly 70 GB - things are looking pretty tight.
If real-time dedup were used, it would look like:

/ 6 GB (original 4 GB + 1 GB .zip + 2 GB unzipped - dupes in unzipped compared to /)
swap 16 GB
dump 16 GB
/zones/$zone 1 GB
alt / 600 MB (free clone, new files are dupes of unzip in /var/tmp, /var/sadm/pkg has lots of package metadata changes, 2 x 500 MB undo.Z deduped to 1 x 500 MB)
alt /zones/$zone 2 GB (free clone, new files are dupes, undo.Z are dupes, metadata changes)

That brings us to about 42 GB - which should remain relatively steady through subsequent patch cycles, especially if undo.Z files from earlier patch cycles are removed. The "add more disk" approach would have involved buying a pair of 146 GB drives because the root pool doesn't do striping. The same exercise done for LDoms can start to make the space available in a maxed out T5240 look tight.
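For readers who want to check the totals, here is a minimal sketch that simply adds up the per-item figures quoted in the comment above. The dictionary names and the `total_gb` helper are made up for illustration; the numbers are the commenter's estimates, not measurements.

```python
# All figures are in GB and copied from the comment's two scenarios.
# "zones" and "alt zones" are the totals across all 20 zones.
after_patch = {"/": 7, "swap": 16, "dump": 16, "zones": 1, "alt /": 2, "alt zones": 30}
with_dedup = {"/": 6, "swap": 16, "dump": 16, "zones": 1, "alt /": 0.6, "alt zones": 2}


def total_gb(usage):
    """Sum the space used by every item in one scenario."""
    return sum(usage.values())


print("after patching, no dedup:", total_gb(after_patch), "GB")
print("after patching, with dedup:", round(total_gb(with_dedup), 1), "GB")
print("saved:", round(total_gb(after_patch) - total_gb(with_dedup), 1), "GB")
```

The undeduplicated items sum to 72 GB, which is in line with the "70 GB" in the comment, and the deduplicated items sum to 41.6 GB, matching the "about 42 GB".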
Obviously the best approach to deduplication is to prevent duplication in the first place, as you did with zone cloning.
I understand your argument, but I believe there is a very valid counter-argument.
If I am using zfs to mirror disks, I should have all the data safety I need. To achieve consolidation goals, and likely efficient use of the ZFS ARC, having fewer copies of duplicate data is better. I think you would get better use of disk space and have lower risk of data loss through dedupe + mirror + copies=2.
Yes, a mirror + copies=2 would give you back the redundancy to ensure that you don't have to go to the tapes. I just have some problems with dedup in conjunction with technologies like RAID-4/5.
I have some doubt that people really think about dedup in storage arrays when you can get a fivefold capacity by using 2 TB 7200 rpm disks plus SSD instead of 400 GB SAS disks. I think dedup is going through a classic hype cycle. Many of the current implementations aren't really ready for prime time and often look just like "RfP tick-off features". Done right, dedup may be a really interesting feature. For cache efficiency it may be useful to do cache dedup alone.
I just wonder if 'block level' deduplication is really relevant. What about 'file level' deduplication?
Comparing file sizes first, then checksums, and finally the files byte-by-byte could solve the performance and complexity issues. We have customers with 'strange' users who prefer to copy the "C:\Documents and settings" directory to the server every week, to "feel" safe... every week a fresh copy of 5 GB... occupying about 250 GB a year for 99% exact copies... file deduplication could save a lot of GB. I have seen customers working with CAD software who use a 10 GB library (mechanical component drawings) that they have to save with every project to be sure they will be able to open the project in the future. In this case, file deduplication, or maybe I should call it "auto reverse snapshot", is a killer feature.
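The three-stage comparison described above (size, then checksum, then byte-by-byte) could be sketched roughly like this. It is only an illustration: the function names are invented, SHA-256 is an arbitrary choice of checksum, and a real implementation would replace duplicates with links or references rather than just reporting them.

```python
import filecmp
import hashlib
import os
from collections import defaultdict


def sha256_of(path, chunk=1 << 20):
    """Checksum a file in chunks so large files need not fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk), b""):
            h.update(block)
    return h.hexdigest()


def find_duplicate_files(root):
    """Return lists of paths under root whose contents are identical."""
    # Stage 1: only files of equal size can be duplicates (cheap metadata check).
    by_size = defaultdict(list)
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            if os.path.isfile(path) and not os.path.islink(path):
                by_size[os.path.getsize(path)].append(path)

    groups = []
    for paths in by_size.values():
        if len(paths) < 2:
            continue
        # Stage 2: checksum only the files that share a size.
        by_hash = defaultdict(list)
        for p in paths:
            by_hash[sha256_of(p)].append(p)
        # Stage 3: confirm byte-by-byte, so a checksum collision cannot merge files.
        for candidates in by_hash.values():
            confirmed = []
            for p in candidates:
                for group in confirmed:
                    if filecmp.cmp(group[0], p, shallow=False):
                        group.append(p)
                        break
                else:
                    confirmed.append([p])
            groups.extend(sorted(g) for g in confirmed if len(g) > 1)
    return groups
```

In the weekly "Documents and settings" scenario, almost every file would be caught at stage 1 or 2 on the first pass, and the byte-by-byte check only runs on files that already look identical.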
Block level is most useful for things like storing disk images. Imagine a storage device or primary LDom that has disk images (i.e. big files with UFS file systems on them). Each file will most certainly be unique but there is likely to be lots of duplicate blocks.
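To illustrate why unique files can still contain many duplicate blocks, here is a toy sketch that counts total versus unique fixed-size blocks across several in-memory "images". The function name, the 4 KB block size, and the synthetic data are assumptions for the example only.

```python
import hashlib


def block_dedup_counts(images, block_size=4096):
    """Return (total_blocks, unique_blocks) across all images."""
    seen = set()
    total = 0
    for data in images:
        for offset in range(0, len(data), block_size):
            block = data[offset:offset + block_size]
            seen.add(hashlib.sha256(block).digest())
            total += 1
    return total, len(seen)


# Two synthetic images: eight shared blocks (think: the same OS install)
# plus one block that differs per image (think: per-guest configuration).
base = b"".join(bytes([i]) * 4096 for i in range(8))
image_a = base + b"A" * 4096
image_b = base + b"B" * 4096

total, unique = block_dedup_counts([image_a, image_b])
print(f"{total} blocks written, {unique} unique")
```

The two images differ as files, so file-level deduplication saves nothing, yet only 10 of the 18 blocks are unique.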
Arguably a 256-bit checksum is suboptimal because it means you need to load and compare 4 long longs. With a 64-bit checksum, the likelihood of a collision is about .00000000000000000005. This means that if you are doing a steady 1 million writes per second, you will need to do a full block comparison once every 634,000 years. Give or take. I bet 32-bit would be a good compromise. At a million writes per second you should expect a full block comparison roughly once every 72 minutes. Assuming chained hashing, a 16 TB pool with 32-bit checksums and an average 4 KB block size would have an average chain length of 1.
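The back-of-the-envelope numbers in this comment can be reproduced with a few lines. This sketch uses the comment's own simplified model, where a given write matches a given stored checksum with probability 2^-bits; it ignores birthday effects and how full the pool is. The function names are invented for the example.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def seconds_between_matches(bits, writes_per_second=1_000_000):
    """Expected time between chance checksum matches in the simplified model."""
    return (2 ** bits) / writes_per_second


def average_chain_length(pool_bytes, block_bytes, bits):
    """Blocks in the pool divided by the number of possible checksum values."""
    return (pool_bytes // block_bytes) / (2 ** bits)


years_64 = seconds_between_matches(64) / SECONDS_PER_YEAR
minutes_32 = seconds_between_matches(32) / 60
chain_32 = average_chain_length(16 * 2**40, 4 * 2**10, 32)

print(f"64-bit: one match every {years_64:,.0f} years")
print(f"32-bit: one match every {minutes_32:.1f} minutes")
print(f"16 TB pool, 4 KB blocks, 32-bit checksum: chain length {chain_32}")
```

With the exact 2^-64 the 64-bit case comes out near 585,000 years; rounding the probability to 5e-20, as the comment does, gives the quoted 634,000. The 32-bit case works out to about 72 minutes under this model. Note that a chain length of 1 means nearly every write to a full 16 TB pool already finds a candidate block with the same 32-bit checksum.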
The problem ... The 256-bit checksums are already there to ensure data integrity. Why not use them? You have more processor cycles to spare than IOPS. I don't see it as a problem.
And 16 TB is just 8 hard disks today; even a medium-sized array will easily have 1 PB in the near future. You may have to check for duplicates anyway: an extremely low collision probability doesn't mean there is no collision. On the other hand, when the hash is long enough, it is more likely that the data will be corrupted by bit rot than by a dedup collision. But I don't see that at 32 bits.
Sorry, but your explanation is plain wrong: NetApp uses the checksum to find candidates for deduplication, but before two blocks are assumed to be identical they are compared bit-by-bit, so there is no possibility of corruption because of hash collisions.
This is a very important difference to stream deduplication (like in tape applications) where you have no chance to see or check an already deduplicated block again.
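The "checksum finds candidates, full comparison decides" scheme described in this comment can be sketched as a toy in-memory block store. This is not NetApp's implementation; the class name is invented, and a CRC32 truncated to 16 bits merely stands in for a deliberately weak fingerprint.

```python
import zlib


class VerifyingDedupStore:
    """Toy block store: weak checksum picks candidates, full compare confirms."""

    def __init__(self):
        self.blocks = []          # physically stored blocks
        self.index = {}           # 16-bit fingerprint -> list of block ids
        self.full_compares = 0    # how often a full comparison was needed

    def write(self, data: bytes) -> int:
        """Store data (or reuse an identical block) and return its block id."""
        key = zlib.crc32(data) & 0xFFFF  # assumed stand-in for a weak checksum
        for block_id in self.index.get(key, []):
            self.full_compares += 1
            if self.blocks[block_id] == data:  # byte-for-byte verification
                return block_id                # true duplicate: share the block
        # No verified match: a fingerprint collision just costs a new block.
        self.blocks.append(data)
        block_id = len(self.blocks) - 1
        self.index.setdefault(key, []).append(block_id)
        return block_id


store = VerifyingDedupStore()
# 70,000 distinct blocks exceed the 65,536 possible fingerprints, so some
# fingerprints must collide, yet every distinct block keeps its own id.
ids = [store.write(i.to_bytes(4, "big") * 16) for i in range(70_000)]
print(len(store.blocks), "blocks stored,", store.full_compares, "full comparisons")
```

Because the comparison happens before blocks are shared, a weak checksum only costs extra reads and never merges different data. A stream deduplicator that cannot re-read the earlier block has no such safety net.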
This work is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 Germany License.
Personally I think that data deduplication is vastly overrated, as you need a really special data set to get the reductions promised by the vendors. At other points it is even dangerous: Sometimes these duplicates are the last line of defence between the
Tracked: Jul 11, 20:56