Friday, August 28. 2009
While commenting on a tweet by @alecmuffet, a question arose in my head: why do so many people talk about SAN boot, and why do people so seldom ask about NFS boot?
Aside from my opinion that systems should be able to boot without outside help (debugging is much easier when you have an OS on your metal), SAN boot looks to me like combining the worst of all worlds: you depend on centralized infrastructure, you have to put HBAs (at least two) into every server, you have to provide an additional fabric, and encryption is still an unsolved problem.
On the other side you have NFS: deduplication and cloning are a non-problem when you configure your boot environment in a clever way, and encryption is available by means of IPsec. Caching to take load off the network is possible with CacheFS, for example. Or think about combining NFS boot with the snapshot/clone capability of ZFS to clone your boot environments - a rough sketch of that idea is shown below.
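To make the last idea a bit more concrete, here is a minimal sketch, assuming one ZFS filesystem per boot environment cloned from a golden image (the pool and dataset names are made up for illustration; a real netboot setup obviously needs more than these three commands):

  # snapshot the golden boot environment (dataset names are hypothetical)
  zfs snapshot rpool/netboot/golden@v1
  # create a writable clone for one client - this costs almost no additional space
  zfs clone rpool/netboot/golden@v1 rpool/netboot/node01
  # export the clone over NFS so the client can mount it as its root filesystem
  zfs set sharenfs=on rpool/netboot/node01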
So why is everybody looking in the direction of SAN boot instead of NFS boot when they look at centralized boot storage? The issue of needing an additional fabric is well known, so solutions are being developed, but instead of simply going in the direction of NFS, the industry is running in the direction of blocks over IP or blocks over Ethernet. Strange world ...
Sunday, March 1. 2009
Tom Haynes wrote a great article about pNFS: How does a npool compare to a zpool?. In this article he plays with the concept of npools as storage pools consisting of several servers, instead of zpools as storage pools consisting of the disks in a single server. A really interesting idea.
The article led me to another thought: technology in IT is a thing of waves and cycles. At first there were file-level storage networks with the first versions of NFS, then there were block-level SANs like FC. And within this cycle there were waves ... albeit with predominant technologies (there was a time when everybody got laughed at for the idea of using NFS, but that was largely a problem introduced by a certain sub-standard open-source NFS implementation), and everybody wanted block devices for everything. Now there is the problem that the number of block devices often outnumbers the servers by an order of magnitude, leading to management problems like fragmented storage. This led to additional complexity with thin provisioning, which doesn't work well with the space allocation patterns of data. And the whole idea of storage virtualisation is just an added layer of complexity to solve another complexity that doesn't go away; it's just a hidden skeleton in the closet.
Perhaps pNFS is the next wave of file-level storage in the datacenter. There are several good reasons for it. With the virtualisation of desktops and servers there isn't a physical limit preventing people from generating small servers for each and every task. When you try to do this, the number of block devices simply gets outrageous. In discussions with customers there is a trend towards providing storage to virtualisation layers in the form of files in a shared filesystem served by NFS, even when there is a performance penalty, simply because it's easier to manage. The concept of mirrored stripes in pNFS may even render the whole effort of making the storage itself highly available superfluous, as pNFS in conjunction with the metadata server may solve many of these problems.
Of course this puts a big load on the storage servers. But with pNFS you have really interesting possibilities to spread all the load generated by the virtual disks over smaller systems: like building stripes out of whole servers, or like having real shared access instead of having to fiddle around with many configurations while moving a LUN from one server to another.
This shows another basic concept of IT: any given problem in HPC will appear in commercial IT not much later, and any given technology in HPC will appear in commercial IT not much later. pNFS and similar developments in cluster-based data storage were created for storing large amounts of data for HPC. But virtualisation, for example, shares many of its problems with large compute clusters. Thus anybody in commercial IT should have a close look at HPC. Solutions and problems found in that area will help or hit them two or three years later. Better to be prepared and informed.
pNFS is definitely a technology from HPC that will help you solve your commercial IT problems in the future.
Monday, February 2. 2009
Bryan Cantrill wrote an excellent article about SPECsfs: Eulogy for a benchmark. The article describes the shortcomings of SPECsfs and how they lead to benchmarketing configurations (like a 228-disk short-stroking setup in a NetApp benchmark): "Be it due to incompetence or malice, SPEC's descent into a disk benchmark while masquerading as a system benchmark does worse than simply mislead the customer, it actively encourages the wrong engineering decisions. In particular, as long as SPEC SFS is thought to be the canonical metric of NFS performance, there is little incentive to add cache to NAS heads. (If SPEC SFS isn't going to use it, why bother?) The engineering decisions made by the NAS market leaders reflect this thinking, as they continue to peddle grossly undersized DRAM configurations -- like NetApp's top-of-the-line FAS6080 and its meager maximum of 32GB of DRAM per head! (By contrast, our Sun Storage 7410 has up to 128GB of DRAM -- and for a fraction of the price, I hasten to add.) And it is of no surprise that none of the entrenched players conceived of the hybrid storage pool; SPEC SFS does little to reward cache, so why focus on it? (Aside from the fact that it delivers much faster systems, of course!)" A must-read article! His introductory sentence "I come to bury SPEC SFS, not to praise it." has to be taken seriously.
Tuesday, December 16. 2008
Brendan Gregg published a follow-on article to his 250,000 IOPS article. In "Up to 2 Gbytes/sec NFS" he shows how far he can drive the Sun Storage 7410 in a single-node configuration. I like his approach of presenting a benchmark-special result first, showing its pitfalls afterwards, and then giving a more realistic number. As in the last test, he factored out the hard disks by using a working set that fits within main memory (100 GB on a 128 GB system).
He was able to get up to 2 GByte/s from a 7410 with two 10 GBit/s interfaces and 20 clients (with some upside potential, as the CPUs weren't fully loaded). In a more realistic test he was able to yield a little more than 1 GByte/s over a single 10 GBit/s interface: "That's 1.07 Gbytes/sec outbound. This includes the network headers, so the NFS payload throughput will be a little less. As a sanity check, we can see from the first screenshot x-axis that the test ran from 03:47:40 to about 03:48:30. We know that 50 Gbytes of total payload was moved over NFS (the shares were mounted before the run, so no client caching), so if this took 50 seconds - our average payload throughput would be about 1 Gbyte/sec. This fits."
"10 GbE should peak at about 1.164 Gbyte/sec (converting gigabits to gibibytes) per direction, so this test reaching 1.07 Gbytes/sec outbound is a 92% utilization for the 7410's 10 GbE interface. Each of the 10 client's 1 GbE interface would be equally busy. This is a great result for such a simple test - everything is doing what it is supposed to." An excellent result!
Thursday, December 4. 2008
Brendan Gregg did an interesting benchmark with our new storage systems to show what a 7410 single-head can deliver: a quarter million NFS IOPS. This number was generated with a benchmarking configuration - a 1 byte blocksize. That isn't really realistic - it's just for benchmarking. But he delivers a more meaningful result in his text, too: "I've reached over 145,000 NFSv3 read ops/sec - and this is not the maximum the 7410 can do (I'll need to use a second 10 GigE port to take this further). The latency does increase as more threads queue up, here it is plotted as a heat map with latency on the y-axis (the darker the pixel - the more I/Os were at that latency for that second.) At our peak (which has been selected by the vertical line), most of the I/Os were faster than 55 us (0.055 milliseconds) - which can be seen in the numbers in the list on the left."
Friday, August 22. 2008
With the integration of PSARC 2007/347 NFS/RDMA, Solaris got a new implementation of NFS over RDMA: "This integration brings the NFS/RDMA Solaris Implementation up to date with the latest IETF Drafts. Specifically, it removes support for the initial NFS/RDMA implementation (version 0) and adds support for rpcrdma [1] and nfsdirect [2] (version 1)."
Monday, August 18. 2008
Yet another tutorial finalized - this time about one of the really hidden features of Solaris: CacheFS. CacheFS is something similar to a caching proxy, but this proxy doesn't cache web pages, it caches files from another filesystem. I divided the tutorial into 8 parts.
I hope this tutorial gives you some new insights into Solaris. Have fun trying out the stuff in the tutorial.
Monday, August 18. 2008
CacheFS is one of those features even some experienced admins aren't aware of. But as soon as they try it, most of them can't live without it. You should give it a try.
Continue reading "Less known Solaris Features: CacheFS - Part 8: Conclusion"
Monday, August 18. 2008
CacheFS is one of those hidden gems in Solaris; it's so hidden that it has been in sustaining mode since 1999. That means only bugfixes are made for this feature, and no new features have found their way into this component since that day. In recent days there was some discussion about declaring End-of-Feature status for CacheFS, which will lead to the announcement of its removal. After a few days of discussion the ARC decided in favour of the removal.
Continue reading "Less known Solaris Features: CacheFS - Part 7: The future of CacheFS"
Monday, August 18. 2008
Let's take an example: let's assume you have 50 webservers that serve static content or that serve and execute .php files. You may have used scp or rsync to synchronize all these servers.
With CacheFS the workflow is a little bit different: you simply put all your files on a fileserver. But you have noticed that there is a large load on this system. To reduce the load on the fileserver, you use your new knowledge and create a cache on every server. After that you see a lot of requests for directory and file metadata in the documents directory. You know that this directory is provisioned with changes only at 3 o'clock in the morning. Thus you write a little script that checks for the file please_check_me and starts cfsadmin -s all on the client if it exists, so that the cache checks for a newer version of the files at the next access.
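Such a trigger script could look like the following minimal sketch (the path of the CacheFS mount, and thus of please_check_me, is an assumption, and the mount has to use the demandconst option described in part 5):

  #!/bin/sh
  # /files is assumed to be the CacheFS mount of the documents directory
  if [ -f /files/please_check_me ]; then
      # request a consistency check for all CacheFS mounts on this client
      cfsadmin -s all
  fi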
Monday, August 18. 2008
Let's assume you share a filesystem with static content (for example, the copy of a CD-ROM) or a filesystem that changes on a regular schedule (for example, at midnight every day). In that case it would put unnecessary load on the network to check the consistency over and over again.
CacheFS has a special mode of operation for such a situation. It's called on-demand consistency checking, and it does exactly what the name says: it only checks the consistency of files in the cache when you tell the system to do so.
Continue reading "Less known Solaris Features: CacheFS - Part 5: On-demand consistency checking"
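For the impatient, a rough sketch of what this looks like on the client (the cache directory /var/cachefs/cache1 and the mount point /files are assumptions; theoden:/export/files is the share from the basic example):

  # mount the share through CacheFS, with consistency checks only on demand
  mount -F cachefs -o backfstype=nfs,cachedir=/var/cachefs/cache1,demandconst \
      theoden:/export/files /files
  # later, trigger a consistency check by hand (or from a script or cron job)
  cfsadmin -s /files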
Monday, August 18. 2008
Okay, we have a working CacheFS mount, but where and how does the system actually cache the stuff? Let's have a look at the cache.
Continue reading "Less known Solaris Features: CacheFS - Part 4: The Cache"
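If you just want a quick peek before reading the whole part, these two commands are a starting point (/var/cachefs/cache1 is the cache directory assumed in the sketches above):

  # list the filesystems stored in the cache, together with their cache IDs
  cfsadmin -l /var/cachefs/cache1
  # see how much disk space the cache currently occupies
  du -sk /var/cachefs/cache1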
Monday, August 18. 2008
Okay ... using CacheFS is really easy. Let's assume you have a fileserver called theoden. We use the directory /export/files as the directory shared by NFS. The client in our example is gandalf.
Continue reading "Less known Solaris Features: CacheFS - Part 3: A basic example"
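To give you an idea of the shape of the example, here is a rough sketch (the cache directory /var/cachefs/cache1 and the client-side mount point /files are assumptions):

  # on the fileserver theoden: share the directory via NFS
  share -F nfs -o ro /export/files
  # on the client gandalf: create a cache, then mount the share through it
  cfsadmin -c /var/cachefs/cache1
  mount -F cachefs -o backfstype=nfs,cachedir=/var/cachefs/cache1 \
      theoden:/export/files /files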
Monday, August 18. 2008
The history of CacheFS
Sun didn't introduce this feature for webservers. Long ago, admins didn't want to manage dozens of operating system installations. Instead they wanted to store all that data on a central fileserver (you know ... the network is the computer). Thus netbooting Solaris and SunOS was invented. But there was a problem: swap over the network was a really bad idea in those days (it was a bad idea in 10 MBit/s times, and it's still a bad idea in 10 GBit/s times). Thus the diskless systems got a disk for local swap. But there was another problem: all the users started to work at 9 o'clock ... they switched on their workstations ... and the load on the fileserver and the network got higher and higher. They had a local disk ... so, local installation again? No ... the central installation had its advantages. Thus the idea of CacheFS was born.
CacheFS is a really old feature of Solaris/SunOS. Its first implementation dates back to the year 1991. I really think you can call this feature matured.
CacheFS in theory
The mechanism of CacheFS is pretty simple. As I told you before, CacheFS is somewhat similar to a caching web proxy: CacheFS is a proxy to the original filesystem and caches files on their way through it. The basic idea is to cache remote files locally on a hard disk, so you can deliver them without using the network when you access them a second time.
Of course CacheFS has to handle changes to the original files. So CacheFS checks the metadata of a file before delivering the copy: if the metadata has changed, CacheFS loads the original file from the server; if it hasn't changed, it delivers the copy from the cache.
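You can watch this behaviour on a client with cachefsstat, which reports the cache hit rate and the consistency-check counters for a CacheFS mount point (/files is just the mount point assumed in the sketches above):

  # show cache hit rate and consistency checks for the CacheFS mount at /files
  cachefsstat /files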
CacheFS isn't just usable for NFS; you can use it just as well for caching optical media like CDs or DVDs.
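A rough sketch of the optical-media case, assuming an HSFS (ISO 9660) filesystem on the CD device /dev/dsk/c0t6d0s0 (the device name, cache directory and mount point are made up):

  # cache a CD-ROM: the back filesystem is HSFS instead of NFS
  mount -F cachefs -o backfstype=hsfs,cachedir=/var/cachefs/cache1,ro \
      /dev/dsk/c0t6d0s0 /cdrom-cache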
Monday, August 18. 2008
There is a hidden gem in the Solaris Operating Environment called CacheFS, and it solves a task many admins solve with scripts.
Imagine the following situation: you have a central fileserver and, let's say, 40 webservers. All these webservers deliver static content, and this content is stored on the hard disks of the fileserver. Later you recognize that your fileserver is heavily loaded by the webservers. Hard disks are cheap, so most admins will start to use a recursive rcp or an rsync to put a copy of the data onto the webservers' disks.
Well ... Solaris gives you a tool to solve this problem without scripting, without cron-based jobs, just by using NFSv3 and this hidden gem: CacheFS. CacheFS is a really nifty tool. It does exactly what the name says: it's a filesystem that caches data of another filesystem. You have to think of it like a layered cake: you mount a CacheFS filesystem with a parameter that tells CacheFS to mount another filesystem in the background.