Somewhat paranoid.
There was some discussion about possible attack vectors against checksum-only deduplication. However, I don't think the proposed vectors are really feasible. The first attack tries to cause data corruption by fooling deduplication into using a carefully crafted block with the same checksum but different data. Another, less discussed attack vector is gaining knowledge of a block: you write a carefully crafted block to fool dedup into showing you a different block than the one you have actually written, as the checksum-only variant didn't store your block but just created a pointer to data written beforehand. Yes, it's conceivable that you can cancel out a write by creating a block with the same checksum and thus cause data corruption. But when the probability of an accidental collision is already that low, how would you quantify the probability of:
- knowing the dedup hash of a block before the block is written
- finding a block (let's call it the hash-collision injection block) with the same dedup hash
- having enough time between learning the dedup hash of the future write and the actual write to find this hash-collision injection block
- injecting the hash-collision injection block into the attacked system before the block you want to cancel out is actually written (as being second doesn't help you in such an attack against deduplication)
- furthermore, the point of injection has to be in the same pool, and it has to be in a dataset with dedup activated
Sorry, but I would rather bet my money on bit rot hitting the same block on both rotating-rust mirrors of a dataset to corrupt data, unless there have been advances in precognition technology that I'm not aware of. I could think of an attack that writes some blocks to a device and hopes that a future write will yield the same checksum. But that sounds to me like mobsters driving you out to the desert and, instead of shooting you, hoping for a meteorite to hit you directly in the head.
More feasible than an attack to cancel out a write is an attack to gain knowledge of a file by writing a block that yields the same dedup checksum as an already written block with interesting information, as checksum-only dedup would deduplicate away the collision-provoking block and deliver the block of the original file instead. But "more feasible" doesn't mean realistically feasible, for a simple reason: you have to think about the three vectors for executing such a knowledge escalation attack:
- You have to know the content of a block to find the matching knowledge escalation block, but then you wouldn't have to create such a block to gain the knowledge, as you already have it. The only interesting information you could gain here is the fact that the block hasn't changed since you learned its contents.
- Or you have to know the dedup checksum of a block with interesting data, but how would you learn this checksum without having either privileged access to the system or access to the hard disk? Either way, you could read the block of interest directly without the hassle of creating a knowledge escalation deduplication collision block.
- Or you brute-force the deduplication collision by trying blocks that result in each and every checksum, comparing the content you have written in your collision block with the content that can actually be read back from the device after deduplication. Well, good luck. Hope you get some interesting results before the general proton decay of the universe gets you.
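
To make the brute-force vector a bit more tangible, here is a minimal Python sketch of a toy dedup table. Everything in it is an assumption made for illustration: the names `ToyDedupStore`, `weak_checksum` and `find_collision` are hypothetical, the checksum is SHA-256 cut down to 16 bits purely so that the collision search finishes at all, and the `verify` mode is an assumed byte-comparing counterpart to the checksum-only variant, not something taken from a real implementation.

```python
import hashlib

# Assumption for illustration only: keep just 16 bits of SHA-256 so that a
# collision can be found in a fraction of a second.
HASH_BYTES = 2


def weak_checksum(block: bytes) -> bytes:
    """Hypothetical dedup checksum: SHA-256 truncated to HASH_BYTES bytes."""
    return hashlib.sha256(block).digest()[:HASH_BYTES]


class ToyDedupStore:
    """Toy model of a dedup table; not how any real filesystem lays out data."""

    def __init__(self, verify: bool = False):
        self.verify = verify
        self.table = {}  # checksum -> list of distinct stored blocks

    def write(self, block: bytes):
        """Store a block and return a reference (checksum, index) to it."""
        key = weak_checksum(block)
        stored = self.table.setdefault(key, [])
        if not stored:
            stored.append(block)          # first writer wins the slot
            return (key, 0)
        if not self.verify:
            return (key, 0)               # checksum-only: trust the checksum
        for index, existing in enumerate(stored):
            if existing == block:         # verify: byte-for-byte comparison
                return (key, index)
        stored.append(block)              # real collision: keep both blocks
        return (key, len(stored) - 1)

    def read(self, ref) -> bytes:
        key, index = ref
        return self.table[key][index]


def find_collision(target: bytes) -> bytes:
    """Brute-force a different block with the same truncated checksum."""
    wanted = weak_checksum(target)
    counter = 0
    while True:
        candidate = b"attacker-%d" % counter
        if candidate != target and weak_checksum(candidate) == wanted:
            return candidate
        counter += 1


secret = b"victim block with interesting data"
forged = find_collision(secret)

# Checksum-only: the forged write is deduplicated against the victim's block,
# so reading it back hands out the victim's data.
plain = ToyDedupStore(verify=False)
plain.write(secret)
leaked = plain.read(plain.write(forged))

# With the assumed verify mode the mismatch is noticed and both blocks are kept.
checked = ToyDedupStore(verify=True)
checked.write(secret)
safe = checked.read(checked.write(forged))
```

With 16 bits the search needs roughly 65,000 attempts on average, and every additional checksum bit doubles that. Assuming a full 256-bit checksum, you are looking at about 2^256 (roughly 1.2 x 10^77) attempts; even at a generously assumed 10^12 attempts per second, that is on the order of 10^57 years. And note that the toy attacker cheats: `find_collision` is handed the victim's block, which is exactly the knowledge a real attacker is missing.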