Wednesday, May 21. 2008
There is a nice walkthrough for configuring OpenSolaris as a CIFS fileserver for home use at the Sun Developer Network. It's really simple: Developer Recipes: Setting Up an OpenSolaris NAS Box. And NFS is just a zfs set sharenfs=on examplepool/examplefilesystem away.
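As a minimal sketch (using the placeholder pool and filesystem names from above):

  zfs create examplepool/examplefilesystem
  zfs set sharenfs=on examplepool/examplefilesystem
  zfs get sharenfs examplepool/examplefilesystem    # check that the share is active

The sharenfs property is inherited, so filesystems created below examplepool/examplefilesystem get shared automatically as well.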
Friday, March 28. 2008
Interesting article about NFS at EnterpriseStorageForum - "The Future of NFS Arrives": I believe that NFSv4.1 is going to make a number of file systems look very bad.
Sunday, February 17. 2008
History
The ever recurring question to me at customer sites relatively new to Solaris is: "Okay, on Linux I had my home directories at /home. Why are they at /export/home on Solaris?" This is an old hat for seasoned admins, but I get this question quite often. Well, the answer is relatively simple, and it comes from the time when we started to use NIS and NFS. It has something to do with our slogan "The network is the computer", because it is about directories distributed across the network. Okay, we have to go 20 years into the past.
Continue reading "Less known Solaris Features: /export/home? /home? autofs?"
Tuesday, January 29. 2008
What is high-end enterprise storage? Those huge, bad-ass, high-performance storage arrays the size of multiple racks? In the end: not much more than a SAN in a box with an intelligent storage controller in front of it. Manufacturers like Hitachi or EMC make a large wad of money out of these devices. But: I don't believe that storage boxes of this size will still be a reasonable market five years from now. The storage vendors will be the next group of manufacturers to run into problems with the commoditization of computer hardware.
I have been thinking about this idea for the last few days, after reading about the integration of the SCSI Remote DMA protocol over Infiniband into Solaris. Sun works hard to integrate more and more storage features into Solaris. A new SCSI target framework, more and more features for ZFS, CIFS, NFS ... you name it.
After a while an idea came to my mind: let's take some racks full of X4500s (with SATA disks that have the mechanical specifications of high-end FC disks), perhaps a next-generation one with a large amount of RAM for caching and two quad-cores for encryption, compression, background scrubbing of the data and remote replication. Substitute some disks with solid state disks for the separate ZIL and the L2ARC to speed up ZFS. Build an Infiniband network between them. Take an UltraSPARC T2 or a Victoria Falls-based system or two as storage controllers for serving iSCSI, FC target, CIFS, NFS ... . Add a clever management software to control this configuration from a single point and to distribute the storage usage intelligently over all boxes. In the end you get something with the size and the performance of a high-end storage box, but at a much smaller price, as you use commodity components and commodity servers.
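Just to illustrate the ZFS side of that idea, a hypothetical sketch for one of those boxes (the device names are made up, and the log/cache vdev syntax assumes an OpenSolaris build with separate intent log and L2ARC support):

  # data pool out of the SATA disks
  zpool create tank raidz2 c1t0d0 c1t1d0 c1t2d0 c1t3d0 c1t4d0 c1t5d0
  # separate ZIL on mirrored SSDs, L2ARC on another SSD
  zpool add tank log mirror c2t0d0 c2t1d0
  zpool add tank cache c2t2d0
  # compression on the box, serve it over NFS
  zfs set compression=on tank
  zfs set sharenfs=on tank

The missing piece is exactly the management layer mentioned above that does this over dozens of boxes from a single point.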
Okay, there are some missing pieces, and many components will have to go through some optimization before they can compete with high-end storage. But this will definitely come. And if Sun doesn't do it, someone else will.
Thursday, December 6. 2007
Yet another important step for Solaris with regard to file serving. The code for VSCAN was integrated into Solaris yesterday. With VSCAN you can integrate an ICAP-compliant virus scanner into Solaris to check files delivered by NFS or CIFS. Neat stuff ...
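From the admin side this should boil down to something like the following sketch (the filesystem name is just an example, and the ICAP scan engine itself is configured separately with vscanadm; see its man page for the exact parameters):

  svcadm enable vscan           # start the vscan service
  zfs set vscan=on tank/shares  # enable on-access scanning for this filesystem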
Saturday, October 27. 2007
Yesterday was an important day for OpenSolaris. Something found its way into the kernel that I already knew about but wasn't allowed to talk about (at least not without an NDA saying "If you talk about this, we will send a support technician to spell a dozen mighty curses in your datacenter. And, yeah, we know where you park your car ...").
Alan Wright announced the putback of the code changes resulting from PSARC 2006/715, or, for mortals like me: the in-kernel CIFS service. This doesn't just mean the integration of mounting CIFS shares; that's a different project. With these changes CIFS becomes a first-class citizen besides NFS. OpenSolaris build 77 will give you a kernel-integrated CIFS server.
And as a CIFS server alone is only half the answer to the question, there will also be the integration of NDMP for backups and ICAP for integrating virus scanning software. So stay tuned: although the page isn't activated yet, you will soon find more information at the CIFS community at opensolaris.org. Meanwhile you should look at Bob Porass' "Fish -n- CIFS" for more information.
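Based on what has been published so far, using the new server should look roughly like this sketch (build 77 or later, workgroup mode; the share and filesystem names are made up):

  svcadm enable -r smb/server                # start the in-kernel CIFS server
  smbadm join -w WORKGROUP                   # join a workgroup (a domain join works with -u)
  zfs set sharesmb=name=myshare tank/shares  # publish the filesystem as a CIFS share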
PS: I can only assume that NTAP saw something like this coming when they filed their "all in" lawsuit ...
Thursday, October 25. 2007
I wrote in my articles about the CEC that Jonathan looked really pissed about the patent lawsuit at a certain court in Texas (in my opinion, this says enough about the claims). And my inkling about Jonathan's opinion wasn't wrong: he is really pissed. Just have a look at his latest blog entry - "ZFS Puts Net App Viability at Risk?". I think this blog entry and the pre-announcement of a reciprocal lawsuit are something like the last call to Network Appliance to return to sane behaviour before it gets nasty. And nasty means really nasty: And to be clear, once again, we have no interest whatever in suing NetApps - we didn't before this case, and we don't now. But given the impracticality of what they're seeking as resolution, to take back an innovation that helps their customers as much as ours, we have no choice but to respond in court.
So later this week, we're going to use our defensive portfolio to respond to Network Appliance, filing a comprehensive reciprocal suit. As a part of this suit, we are requesting a permanent injunction to remove all of their filer products from the marketplace, and are examining the original NFS license - on which Network Appliance was started. [...]
In addition to seeking the removal of their products from the marketplace, we will be going after sizable monetary damages. And I am committing that Sun will donate half of those proceeds to the leading institutions promoting free software and patent reform (in specific, The Software Freedom Law Center and the Peer to Patent initiative), and to the legal defense of free software innovators.

I really hope that Network Appliance will step back from their lawsuit. We need both companies. Only healthy competition will drive innovation.
Wednesday, August 15. 2007
I ranted yesterday about the marketing habits of IBM. But kudos to IBM: they have excellent documentation, and they even document the weaknesses of their technologies. So you find very interesting documents about their technology. Not that they haven't withdrawn Redbooks in the past, like the one that documented the overhead of mpars, but mostly they are a good source of realistic information, and thanks to the Redbooks it's easy to clear up some of the fuss around some IBM technologies. In this article I want to talk about WPARs. Many people think that this is a cheap rip-off of Solaris Zones. But many of them also think that the live WPAR mobility is a cool feature. Yet when you really read the public documentation, much of the fuss appears a little bit overblown. Let's have a look into Redbook 247431, "Introduction to Workload Partition Management in IBM AIX":
- The application in a WPAR has to write into an NFS filesystem.
Page 32: "All files that need to be written by the application must be hosted on an NFS filesystem." That means: no large applications that need direct raw access to disks, like Oracle, on WPARs, and the speed of your application is limited by NFS.
- WPARs work with checkpoint/restore. That means: when you want to migrate your application, you have to freeze the WPAR. Then you have to checkpoint it. The checkpoint state file is written via NFS to the shared storage. When you restart the WPAR on a different system, the process loads the checkpoint state file from the NFS server and starts the WPAR. The application doesn't run while the snapshot is taken or revived:
The chkptwpar command captures a snapshot of all tasks executing within one WPAR. It first interrupts all processes so they reach a quiescence point, then stores a copy of the processes' context in a state file. Still sound unproblematic to you? The problem is in a detail. Think about an application that allocates 8 GB of memory (your Java application, for example), and take into consideration that most datacenters run on Gigabit Ethernet. Let's assume 80 MB/s per Gigabit Ethernet interface via NFS. Thus writing the state file would take 100 seconds. Reading it on the target system: another 100 seconds. IBM states: "The only visible effect for a user of the application is a slightly longer response time while the application is migrating." Well, I don't know your users, but more than three minutes of application downtime isn't a slightly longer response time.
Okay, perhaps you can live with all this stuff and still believe in it; then you should read the manual for the Workload Partition Manager. The part about migration compatibility beginning around page 27 is especially interesting: there are compatibility modes like "inbound/outbound compatible". An example: Outbound Compatible
Compatibility testing shows that a WPAR can be relocated from the departure system to the arrival system, but it cannot be relocated back from the arrival system to the departure system. The first time I heard of this, I thought it was a joke, but it's documented in IBM's own documentation. It's perfectly possible that you can migrate a WPAR to another system, but not back. If I understand it correctly, all systems have to be patched to exactly the same level. When the patchlevel is lower on the arrival machine than on the departure machine, you can't migrate to that system. When the patchlevel is higher on the arrival machine than on the departure system, you can't migrate back to your old system. And the libc has to be the same on both systems in any case. Whoa ... what bullshit ...
When I look at the WPAR solution, it looks like a really bad kludge. It's good for the bullet-point list wars. I'm sure that IBM sales reps will run around with WPARs and claim "Sun cannot do that", and maybe some customers who don't like Sun will use it as an argument to their management to explain why they didn't buy Sun.
But honestly, I don't think that you can really use it for anything practical.
PS: I'm not an AIX expert. If I've got something wrong, please correct me!
Thursday, July 26. 2007
I reported about the separate ZIL a few days ago. The problem with the described NVRAM PCI card is that you can't do a cluster failover with such a device. How do you want to fail over the separate log when the log is on a card in the failed server? Sun had a product called Prestoserve that was used to accelerate NFS and databases. It was static RAM with a battery. It was great for benchmarks, but suffered from the same cluster problem.
Thus you should use an external device that can fail over with the rest of your storage. The obvious choice would be a RAM-based solid state disk (SSD). But these are quite expensive: you need the RAM, you need a hard disk to keep the data persistent when power fails, and you need a rechargeable battery or a capacitor that is able to power the SSD until all data is written to the hard disk.
A flash-based SSD would be a more sensible choice, as flash is non-volatile memory by nature. Such a disk costs you approximately $400. But most people think "Oh no, wear will destroy it within a few days". Experiences with el-cheapo CF cards underline this assumption.
But let's calculate with the specifications of a leading-brand flash disk. Let's assume: a 32 GB flash-based SSD is specified for 2,000,000 write cycles. We have a sustained stream of 40 MB per second (a conservative assumption). The wear leveling is perfect (perhaps supported by a separate ZIL algorithm that treats the flash SSD as a cyclic buffer). Okay, a little math:
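A back-of-the-envelope version with these numbers: 32 GB / 40 MB/s is roughly 800 seconds to write the whole disk once, and 2,000,000 write cycles x 800 seconds is roughly 1.6 billion seconds, or about 50 years.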

So this flash SSD wouldn't fail from wear within the usable life of the storage and the server, even when you write 40 MB to it every second. I'm sure that a flash disk doesn't run for such a long time, but that is not a wear problem; it's the problem that modern electronics don't have the build quality of former times.
Based on these considerations, a flash SSD would be an interesting choice for the separate ZIL. Or at least: wear isn't a reason for not using a flash SSD.
PS: There is one point I'm not perfectly sure about, but I interpret the 2 million write cycles as the ability to erase and write the full disk 2 million times.
Thursday, July 5. 2007
Some interesting observations about how the way NFS handles shares scales when you have thousands of them: The Management of NFS Performance With Solaris ZFS.
Tuesday, June 26. 2007
pNFS is a way to build massively scaling NFS clusters by distributing the files over several servers. It will be standardized in NFSv4.1 and it will be part of a future Solaris. But if you already want to try it, you should look here. You will find some tarballs there to BFU-upgrade your Solaris installation for your own experiments.
PS: You will find a good overview of pNFS at StorageMojo.
Tuesday, June 19. 2007
After the LSPP EAL4+ evaluation of RHEL5 you should read this article to get an in-depth view of the matter. The article "Comparing the Multilevel Security Policies of the Solaris Trusted Extensions and Red Hat Enterprise Linux Systems" ends with:
RHEL5 LSPP and Trusted Extensions have taken different design approaches to meet the same CC profiles. However, while these systems might meet the same criteria, it is important to consider the functionality that is included in the systems submitted for LSPP evaluation. Trusted Extensions includes several features, such as multilevel NFS and a multilevel windowing system, that are designed to meet the security data flow requirements specified by LSPP and to be usable in real-world environments and by real-world customers. Comparable features are either not available or have been excluded from the RHEL5 LSPP evaluation target.

It's quite interesting how different two systems with the same EAL level and the same protection profile can be.
Wednesday, October 4. 2006
Well, day three ... the last breakout sessions ... let's start the day with "Building multipetabyte architectures with Thumper". Not really much going on; most people still seem to be in some kind of early-morning pseudo-stasis.
The talk started by explaining why one would, should and must store data by the petabyte in the first place. Research institutes in particular have to store huge amounts of data today. When you consider that the highest-resolution brain scans now consume 1.5 terabytes of storage per image, those terabytes add up pretty quickly.
One solution for building something like that with Thumpers is the use of Lustre. Lustre is a cluster filesystem that has been designed to handle very large numbers of files and very high bandwidths. The compelling thing about Lustre is that it lets you do without a SAN while still accessing the storage from the clients in a shared fashion. For that, an Object Storage Server is installed on each server. The communication between these servers and the client then runs over Ethernet, Myrinet or Infiniband. This also lets you scale storage horizontally. Unfortunately, it is only available for Linux at the moment. The maker of Lustre really needs to follow up there.
Some interesting remarks:
- you cannot saturate the I/O system of the X4500 with 8 Gigabit Ethernet interfaces
- my blog was referenced
Tuesday, October 3. 2006
The keynotes by David Yen and the EVP of the software practice just now were quite good.
Both gave an overview of how they see things. Fairly typical high-level presentations, quite good but mostly of internal interest. One very interesting remark, though: if we don't manage to turn ZFS into an interesting product, or rather an enabler for further business, somebody else will. And that would be pretty embarrassing, especially since it would be the second time something like that has happened to us. After all, NFS was not invented by NetApp.
Right now a keynote by the EVPs of the service practice is running. Let's see how it turns out. For a rather technology-oriented person, service is usually more of a "need-to-have" topic.