Somewhat stable Solid State

A reader thinks it makes no sense to use the STEC SSDs and that we should switch to the Intel X25-E drives. That sounds reasonable at first, as the X25-E is much cheaper than the STECs. But as usual the devil is in the detail. So why do many people still use the more expensive STECs? Do they have too much money? Are they morons? No, not at all. I will tell you a dirty little secret: when you take it really seriously, the X25-E isn’t enterprise-ready. At least not in its default setting.

The -E stands for extreme, not for enterprise

I hear people crying “What BS, it’s SLC!” And many media outlets tout them as enterprise SSDs. Yes, but that’s just half of the story. Those SSDs have a cache. They need it to enable fast writes: 64 MB, or 32 in the case of the X25-M G2. The problem: it isn’t protected against power failure on either device. No cap, no battery. If you don’t believe me, just look at the PCB photos available all over the net. As soon as the power fails, the cache loses its content. But the drive may already have told the OS that it wrote the data to the non-volatile media. That’s really a problem. The mysqlperformanceblog describes this in a blog entry:

Now to test durability I do plug off power from SSD card and check how many transactions are really stored - and there is second bumper - I do not see several last N commited transactions.

I find this a little bit frightening. This shouldn’t happen with the settings of the MySQL server. Someone in the chain from OS to disk has lied. As far as I understand it, the Linux fsync() doesn’t flush the drive cache. That’s okay when you protect the caches with some energy storage, but not without such means of data protection. You have to disable the drive caches to ensure that the data is really on disk, and as that article suggests, the performance was then just outright horrible.

ZFS doesn’t have that problem because it flushes the cache after every write to the ZIL. It circumvents the problem of the write cache by effectively disabling it through the frequent flushing. But that has quite a performance impact.

Of course, for many use cases it’s good enough, and the risk of losing some transactions seems to be acceptable. For an L2ARC it’s acceptable, as the data is redundant and you can afford to lose integrity. The STECs are used for a different reason there: more capacity (100 GB instead of 64 GB). I will come back to this a little bit later. But on a filer you are using for your Oracle DB or your mail server, for example, you can’t do that. For any reasonable work needing synchronous writes you need to switch off write caching on the X25-E or use a file system with frequent cache flushing.

But that has two impacts. First, the latency of sync writes increases vastly. You have to take into consideration that those 3.3k IOPS are specified for an enabled write cache. Second, and this is speculation: the write cache is also used to reduce the wear of the disk, so switching it off may reduce the usable lifetime of your SSD. As far as I understand the mechanism to reduce write amplification, it needs to cache the incoming blocks to collect some of them before writing them.

That doesn’t make the X25-E an unusable device for the enterprise. Switch off the write cache, and you don’t have these problems. You just have to set your expectations accordingly. An X25-E is still vastly faster than a high-capacity SATA drive, even with the write cache switched off. But for any reasonable work with a higher write load you want a device with a protected write cache. And at the moment the X25-E isn’t there.
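To make the chain from application to disk a bit more tangible, here is a minimal sketch of what a "durable" write looks like from the application side. The function name and the log file name are made up for illustration. Note what the sketch cannot show: when fsync() returns, the application only knows that the OS has handed the data to the device. Whether a volatile drive cache has really been emptied depends on the OS, the file system and the drive, which is exactly the problem described above.

```python
import os
import tempfile


def durable_append(path, payload):
    """Append payload to path and return only after fsync() has returned."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o600)
    try:
        written = 0
        while written < len(payload):
            # os.write() may write fewer bytes than requested.
            written += os.write(fd, payload[written:])
        # Ask the OS to push the file's data to the storage device.
        # Whether the device then empties its own volatile cache is
        # outside the application's control.
        os.fsync(fd)
    finally:
        os.close(fd)


with tempfile.TemporaryDirectory() as tmp:
    log_path = os.path.join(tmp, "commit.log")  # hypothetical file name
    durable_append(log_path, b"txn 1 committed\n")
    durable_append(log_path, b"txn 2 committed\n")
    with open(log_path, "rb") as f:
        stored = f.read()
```

A database reports a transaction as committed after a step like this. If the drive acknowledged the write out of an unprotected cache, a power failure can still eat it.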

MLC SSD for L2ARC?

There was another suggestion: using the X25-M for L2ARC, because it is significantly larger than the -E type. The devil is in the detail again. The datasheet of the X25-M Gen2 specifies on page 11 (section 3.5.4) that you can expect a lifetime of 5 years when you write 20 GBytes per day at typical client workloads. At first this sounds like a reasonable number, but 20 GBytes are 20971520 KBytes. Spread over the 86400 seconds of a day, that’s just 242 KBytes per second. 20 GBytes per day are just 35.64 Terabytes over the lifetime of the device. Now you have to remember the L2ARC mechanism. It writes soon-to-be-evicted pages from both ends of the ARC into the L2ARC. Thus on any reasonably loaded system there should be more write load than just 242 KBytes per second. For an X25-E the situation looks a little bit different. The data sheet specifies in “3.5.4 Write Endurance”:
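For those who want to check the X25-M numbers, here is a small sketch of the arithmetic. It assumes binary units (1 GByte = 1024 × 1024 KBytes) and 365-day years, which is what the figures above imply. The function names are mine, not from any datasheet.

```python
KIB_PER_GIB = 1024 * 1024
SECONDS_PER_DAY = 24 * 60 * 60


def sustained_kib_per_s(gib_per_day):
    """Average write rate if the daily budget is spread evenly over a day."""
    return gib_per_day * KIB_PER_GIB / SECONDS_PER_DAY


def lifetime_tib(gib_per_day, years, days_per_year=365):
    """Total data written over the specified lifetime, in TiB."""
    return gib_per_day * days_per_year * years / 1024


x25m_rate = sustained_kib_per_s(20)   # 20 GBytes per day from the datasheet
x25m_total = lifetime_tib(20, 5)      # 5 years of lifetime

print(f"{x25m_rate:.1f} KiB/s")   # about 242.7
print(f"{x25m_total:.2f} TiB")    # about 35.64
```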

32 GB drive supports 1 petabyte of lifetime random writes and 64 GB drive supports 2 petabyte of lifetime random writes.

2 Petabytes are 2 199 023 255 552 Kilobytes. 5 years are 157 784 630 seconds. By the power of mathematics, I have the division: roughly 13 MBytes per second. And that’s more up to the task than 242 KBytes per second. Of course you can find corner use cases where such a disk can reasonably work. But any really loaded system should wear out MLC devices quite quickly. And more often than not, an S7000 is a really loaded system. MLC devices are specified for client usage. Perhaps you could even build a decent home file server with them. But for enterprise usage? No, almost never.
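The same check for the X25-E, again just as a sketch. It assumes binary units and a year of 31 556 926 seconds, which gives the 157 784 630 seconds used above. The names are made up for illustration.

```python
KIB_PER_PIB = 1024 ** 4
SECONDS_PER_YEAR = 31_556_926  # 5 years = 157 784 630 seconds


def endurance_rate_mib_per_s(lifetime_pib, years):
    """Average write rate that uses up the rated endurance in `years`."""
    lifetime_kib = lifetime_pib * KIB_PER_PIB
    seconds = years * SECONDS_PER_YEAR
    return lifetime_kib / seconds / 1024


x25e_64gb = endurance_rate_mib_per_s(2, 5)  # 64 GB drive, 2 petabytes
x25e_32gb = endurance_rate_mib_per_s(1, 5)  # 32 GB drive, 1 petabyte

print(f"64 GB: {x25e_64gb:.1f} MiB/s")  # about 13.6
print(f"32 GB: {x25e_32gb:.1f} MiB/s")  # about 6.8
```

Keep in mind that the X25-E figure is specified for random writes while the X25-M figure is for typical client workloads, so the two rates are not measured the same way. The gap is still large enough to make the point.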

Enterprise

Enterprise flash drives like the ones used in the S7000 series have a capacitor or a battery to protect the cache, so they can simply ignore the commands to sync the caches. The capacitor- or battery-buffered cache counts as non-volatile storage. Obviously they are much faster this way. It’s one of the reasons why an STEC delivers roughly 4 times the number of sync write operations per second of an Intel X25-E. This advantage isn’t cheap, but it’s worth it. You get what you pay for. That’s also the reason why the Sun F20 and F5100 flash storage products contain a capacitor to ensure the integrity of the data. Furthermore, those STEC SSDs are available as dual-ported devices. That’s interesting for multipathing your ZIL devices and for clustering. They can be put into the system without an interposer card, and you don’t need the SATA Tunneling Protocol to get your data to the server. That’s a problem with the X25-E: it’s a SATA disk and thus only connectable via a single-ported interface and only accessible via STP.

Conclusion

To end this article: maybe a second-generation X25-E will contain those capacitors (or something different) and will be usable in an enterprise environment, but at the moment those devices need special care that severely hits performance. And as long as it’s that way, there are good reasons to use the STEC drives in the S7000 series. And just to answer that reader: no, a swap to the X25-E isn’t overdue. There are several good reasons not to use it in an enterprise environment. Let’s just wait until an X25-E is up to the task of enterprise at full speed. Of course you can cut corners everywhere. You can save money by doing so. And maybe it’s up to your task. That’s fine. Do it this way. But most enterprise customers have different requirements. And the costs of a STEC are negligible compared to the costs of restoring and recovering a corrupt database, losing an important mail to the write cache in an SSD or corrupting a file on your NFS server.