Observations on memory reliability

Robin Harris points to an interesting study about DRAM failures in his blog StorageMojo (BTW: Robin’s blog is really a great read). In his article he points to the paper “DRAM Errors in the Wild: A Large-Scale Field Study”, written by Bianca Schroeder (University of Toronto), Eduardo Pinheiro and Wolf-Dietrich Weber (both from Google). Some of the numbers are really terrifying: 4.15% unrecoverable errors for one of the platforms is much more than I had thought, and I’m somewhat conservative in how far I trust hardware. Furthermore, hard errors (as in “bit permanently flipped, put it in the trash bin”) are a vastly more common cause of errors than most people think. As a side note: after the discussion about DRAM prices in the M3000 I got some flak about memory prices at different quality levels, and many comments along the lines of “a DIMM never failed in my home PC”. But given that Google is said to use cheaper hardware, and given the number of errors (especially the unrecoverable ones), there may be a point behind Sun’s fixation on memory quality.