CacheFS
Introduction
There is a hidden gem in the Solaris Operating Environment that solves a task many admins solve with scripts. Imagine the following situation: you have a central fileserver and, let’s say, 40 webservers. All of these webservers deliver static content, and this content is stored on the hard disk of the fileserver. Later you notice that your fileserver is heavily loaded by the webservers. Hard disks are cheap, so most admins will start to use a recursive rcp or an rsync to put a copy of the data onto the webservers’ disks.
Well... Solaris gives you a tool to solve this problem without scripting and without cron-based jobs, just by using NFSv3 and this hidden gem: CacheFS. CacheFS is a really nifty tool. It does exactly what the name says: it’s a filesystem that caches data from another filesystem. Think of it as a layered cake: you mount a CacheFS filesystem with a parameter that tells CacheFS to mount another filesystem in the background.
History of the feature
Sun didn’t introduce this feature for webservers. Long ago, admins didn’t want to manage dozens of operating system installations. Instead, they wanted to store all this data on a central fileserver (you know... the network is the computer). Thus net-booting Solaris and SunOS was invented. But there was a problem: swap via the network was a really bad idea in those days (it was a bad idea at 10 MBit/s and it’s still a bad idea at 10 GBit/s). Thus the diskless systems got a disk for local swap. But there was another problem: all the users started to work at 9 o’clock... they switched on their workstations... and the load on the fileserver and the network got higher and higher. They already had a local disk... so, local installation again? No... the central installation had its advantages. Thus the idea of CacheFS was born.
CacheFS is a really old feature of Solaris/SunOS. Its first implementation dates back to the year 1991. I really think you can call this feature mature ;)
CacheFS in theory
The mechanism of CacheFS is pretty simple: CacheFS works somewhat like a caching web proxy. It acts as a proxy to the original filesystem and caches files on their way through. The basic idea is to cache remote files locally on a hard disk, so that the second time you access them they can be delivered without using the network.
Of course, CacheFS has to handle changes to the original files, so it checks the file’s metadata before delivering the cached copy. If the metadata has changed, CacheFS loads the original file from the server; if it hasn’t, it delivers the copy from the cache.
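The idea can be pictured with a toy shell sketch. This is purely illustrative: the cache directory, the fetch function, and the mtime comparison are made-up stand-ins for the real metadata check, not the CacheFS implementation.

```shell
#!/bin/sh
# Toy model of the CacheFS read path: deliver the local copy unless the
# original has changed. CACHEDIR and fetch() are illustrative names.
CACHEDIR=${CACHEDIR:-/tmp/toycache}
mkdir -p "$CACHEDIR"

fetch() {
    src="$1"
    cached="$CACHEDIR/$(basename "$src")"
    # "-nt" compares modification times -- a stand-in for the metadata
    # check CacheFS performs against the back file system.
    if [ ! -f "$cached" ] || [ "$src" -nt "$cached" ]; then
        cp "$src" "$cached"   # miss: fetch from the "server"
        echo miss
    else
        echo hit              # hit: serve the local copy
    fi
}
```

Calling fetch twice on an unchanged file yields one miss and then a hit; touching the source file forces a refresh on the next call.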
CacheFS isn’t just usable for NFS; you can use it to cache optical media such as CDs or DVDs as well.
A basic example
Okay... using CacheFS is really easy. Let’s assume that you have a fileserver called theoden. We use the directory /export/files as the directory shared via NFS. The client in our example is gandalf.
Preparations
Let’s create an NFS share first. This is easy: just share a directory on a Solaris server. We log in to theoden and execute the following commands with root privileges.
[root@theoden:/]# mkdir /export/files
[root@theoden:/]# share -o rw /export/files
[root@theoden:/]# share
- /export/files rw ""
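A share created this way is gone after a reboot. On Solaris the same share command can be placed in /etc/dfs/dfstab so it is offered automatically at boot; a sketch of the entry (the file path is the standard Solaris location):

```shell
# /etc/dfs/dfstab -- share commands listed here are executed at boot
# (and whenever the shareall command runs)
share -F nfs -o rw /export/files
```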
Okay, of course it would be nice to have some files to play around with in this directory. I will use some files of the Solaris Environment.
[root@theoden:/]# cd /export/files
[root@theoden:/export/files]# cp -R /usr/share/doc/pcre/html/* .
Let’s do a quick test and see if we can mount the directory:
[root@gandalf:/]# mkdir /files
[root@gandalf:/]# mount theoden:/export/files /files
[root@gandalf:/]# umount /files
Now you should be able to access the /export/files directory on theoden by accessing /files on gandalf. There should be no error messages.
Okay, first we have to create the location for our cache. Let’s assume we want to place it at /var/cachefs/caches/cache1. We create only the directories above the cache directory; you don’t create the last part of the directory structure manually.
[root@gandalf:/]# mkdir -p /var/cachefs/caches
This directory will be the place where we store our caches for CacheFS. After this step we have to create the cache for the CacheFS.
[root@gandalf:/files]# cfsadmin -c -o maxblocks=60,minblocks=40,threshblocks=50 /var/cachefs/caches/cache1
The directory cache1 is created automatically by the command. If the directory already exists, the command will quit and do nothing.
With this command you have created the cache and specified some basic parameters that control its behavior. Citing the manpage of cfsadmin:
- maxblocks: Maximum amount of storage space that CacheFS can use, expressed as a percentage of the total number of blocks in the front file system.
- minblocks: Minimum amount of storage space, expressed as a percentage of the total number of blocks in the front file system, that CacheFS is always allowed to use without limitation by its internal control mechanisms.
- threshblocks: A percentage of the total blocks in the front file system beyond which CacheFS cannot claim resources once its block usage has reached the level specified by minblocks.
Each of these parameters can be tuned to prevent CacheFS from eating away all of the storage available in a filesystem, a behavior that was quite common in early versions of this feature.
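To make the percentages concrete, here is a small sketch of the arithmetic for a hypothetical front file system of 100000 blocks (the block count is made up; the percentages are the ones used in the cfsadmin call above):

```shell
# Hypothetical front file system size; the percentages match the
# cfsadmin example (maxblocks=60, minblocks=40, threshblocks=50).
FRONT_BLOCKS=100000
MAX_CACHE_BLOCKS=$(( FRONT_BLOCKS * 60 / 100 ))    # hard upper limit
MIN_CACHE_BLOCKS=$(( FRONT_BLOCKS * 40 / 100 ))    # always allowed
THRESH_BLOCKS=$(( FRONT_BLOCKS * 50 / 100 ))       # claim limit past minblocks
echo "cache may use up to $MAX_CACHE_BLOCKS blocks,"
echo "may always use $MIN_CACHE_BLOCKS, threshold at $THRESH_BLOCKS"
```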
Mounting a filesystem via CacheFS
We have to mount the original filesystem now.
[root@gandalf:/files]# mkdir -p /var/cachefs/backpaths/files
[root@gandalf:/files]# mount -o vers=3 theoden:/export/files /var/cachefs/backpaths/files
You may notice the parameter that sets the NFS version to 3. This is necessary because CacheFS isn’t supported with NFSv4; you can only use it with NFSv3 and below. The reason for this limitation lies in the different way NFSv4 handles inodes.
Okay, now we mount the cache filesystem at the old location:
[root@gandalf:/files]# mount -F cachefs -o backfstype=nfs,backpath=/var/cachefs/backpaths/files,cachedir=/var/cachefs/caches/cache1 theoden:/export/files /files
The options of the mount command control some basic parameters of the mount:
- backfstype: specifies what type of filesystem is proxied by the CacheFS filesystem
- backpath: specifies where this proxied filesystem is currently mounted
- cachedir: specifies the cache directory for this instance of the cache; multiple CacheFS mounts can use the same cache
From now on every access to the /files directory will be cached by CacheFS. Let’s have a quick look into the /etc/mnttab. There are two important mounts for us:
[root@gandalf:/etc]# cat mnttab
[...]
theoden:/export/files /var/cachefs/backpaths/files nfs vers=3,xattr,dev=4f80001 1219049560
/var/cachefs/backpaths/files /files cachefs backfstype=nfs,backpath=/var/cachefs/backpaths/files,cachedir=/var/cachefs/caches/cache1,dev=4fc0001 1219049688
The first mount is our back file system; it’s a normal NFS mountpoint. The second mount is a special one: it’s the consequence of the mount with the -F cachefs option.
Statistics about the cache
While using it, you will see the cache structure at /var/cachefs/caches/cache1 filling up with files; I will explain some of this structure in the next section. But how efficient is the cache? Solaris provides a command to gather statistics about it. With cachefsstat you can print data such as the hit rate and the absolute numbers of cache hits and misses:
[root@gandalf:/files]# /usr/bin/cachefsstat
/files
cache hit rate: 60% (3 hits, 2 misses)
consistency checks: 7 (7 pass, 0 fail)
modifies: 0
garbage collection: 0
[root@gandalf:/files]#
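The hit rate reported above is simply hits divided by total accesses; a quick sketch of the arithmetic for the sample output:

```shell
# Reproduce the hit-rate figure from the cachefsstat sample output
# (3 hits, 2 misses).
HITS=3
MISSES=2
HITRATE=$(( 100 * HITS / (HITS + MISSES) ))
echo "cache hit rate: ${HITRATE}%"
```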
The cache
Okay, we have a working CacheFS mount but how and where is the stuff cached by the system? Let’s have a look at the cache:
[root@gandalf:/var/cachefs/caches/cache1]# ls -l
total 6
drwxrwxrwx 5 root root 512 Aug 18 10:54 0000000000044e30
drwx------ 2 root root 512 Aug 11 08:11 lost+found
lrwxrwxrwx 1 root root 16 Aug 11 08:18 theoden:_export_files:_files -> 0000000000044e30
To ensure that multiple caches sharing a single cache directory don’t mix up their data, they are separated at this level. First a uniquely named directory is generated, then a friendlier name is symlinked to it. It’s pretty obvious how this name is generated: theoden:_export_files:_files translates easily to theoden:/export/files mounted at /files.
Let’s assume we’ve used the cache for another filesystem (e.g. /export/binaries on theoden mounted to /binaries):
[root@gandalf:/var/cachefs/caches/cache1]# ls -l
total 10
drwxrwxrwx 5 root root 512 Aug 18 10:54 0000000000044e30
drwxrwxrwx 3 root root 512 Aug 18 11:18 0000000000044e41
drwx------ 2 root root 512 Aug 11 08:11 lost+found
lrwxrwxrwx 1 root root 16 Aug 18 11:18 theoden:_export_binaries:_binaries -> 0000000000044e41
lrwxrwxrwx 1 root root 16 Aug 11 08:18 theoden:_export_files:_files -> 0000000000044e30
With this mechanism, the caches are separated in their respective directories... no mixing up.
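The friendly link names appear to follow a simple pattern, which a small sketch can illustrate. Note that cache_link_name is a made-up helper based on observing the listings above, not a documented interface:

```shell
# Apparent naming scheme of the symlinks in the cache directory:
# join "server:/remote/path" and "/mountpoint" with ':' and replace
# every '/' with '_'.
cache_link_name() {
    server_path="$1"    # e.g. theoden:/export/files
    mountpoint="$2"     # e.g. /files
    printf '%s:%s\n' "$server_path" "$mountpoint" | tr '/' '_'
}
```

For example, cache_link_name "theoden:/export/files" "/files" produces theoden:_export_files:_files, matching the listing.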
When we dig a little deeper into these directories, we see an additional layer of directories. This is necessary to prevent a situation where a single directory contains so many files that lookups slow down.
[root@gandalf:/var/cachefs/caches/cache1/0000000000044e30/0000000000044e00]# ls -l
total 62
-rw-rw-rw- 1 root root 0 Aug 18 10:54 0000000000044e66
-rw-rw-rw- 1 root root 1683 Aug 11 08:24 0000000000044eaa
-rw-rw-rw- 1 root root 29417 Aug 11 08:22 0000000000044eba
When you examine these files, you will see that they are just a copy of the original files:
[root@gandalf:/var/cachefs/caches/cache1/0000000000044e30/0000000000044e00]# cat 0000000000044eaa
[...]
This page is part of the PCRE HTML documentation. It was generated automatically
from the original man page. If there is any nonsense in it, please consult the
man page, in case the conversion went wrong.
[...]
[root@gandalf:/var/cachefs/caches/cache1/0000000000044e30/0000000000044e00]#
The "Cache" of CacheFS is a pretty simple structure.
On-demand consistency checking with CacheFS
Let’s assume you share a filesystem with static content (for example, a copy of a CD-ROM) or a filesystem that changes on a regular schedule (for example, at midnight every day). It would place unnecessary load on the network to check the consistency every time a file is accessed.
CacheFS knows a special mode of operation for such situations. It’s called on-demand consistency checking, and it does exactly what the name says: it only checks the consistency of files in the cache when you tell the system to do so.
I will demonstrate this with an example:
Let’s assume we stay with the normal mode of operation from the example before. We create a file on the fileserver.
[root@theoden:/export/files]# date >> test_with_consistency_check
[root@theoden:/export/files]# cat test_with_consistency_check
Tue Aug 12 14:59:54 CEST 2008
When we go to the NFS client and access the directory, this new file is visible instantaneously. And when we access it, we see the content of the file.
[root@gandalf:/files]# cat test_with_consistency_check
Tue Aug 12 14:59:54 CEST 2008
Now we go back to the server, and append additional data to the file:
[root@theoden:/export/files]# date >> test_with_consistency_check
[root@theoden:/export/files]# cat test_with_consistency_check
Tue Aug 12 14:59:54 CEST 2008
Tue Aug 12 15:00:11 CEST 2008
Obviously, you will see this change on the client:
[root@gandalf:/files]# cat test_with_consistency_check
Tue Aug 12 14:59:54 CEST 2008
Tue Aug 12 15:00:11 CEST 2008
Now we unmount it, and remount it:
[root@gandalf:/files]# cd /
[root@gandalf:/]# umount /files
[root@gandalf:/]# mount -F cachefs -o backfstype=nfs,backpath=/var/cachefs/backpaths/files,cachedir=/var/cachefs/caches/cache1,demandconst theoden:/export/files /files
You may have noticed the demandconst option. This option changes everything. Let’s assume you created another file on the NFS server:
[root@theoden:/export/files]# date >> test_with_ondemand_consistency_check
[root@theoden:/export/files]# cat test_with_ondemand_consistency_check
Tue Aug 12 15:00:57 CEST 2008
Back on the NFS client you will not even see this file:
[root@gandalf:/files]# ls
index.html pcre_refcount.html
[...]
pcre_info.html pcretest.html
pcre_maketables.html test_with_consistency_check
You have to trigger a consistency check. This is quite easy.
[root@gandalf:/files]# cfsadmin -s all
[root@gandalf:/files]# ls
index.html pcre_study.html
[..]
pcre_info.html test_with_consistency_check
pcre_maketables.html test_with_ondemand_consistency_check
pcre_refcount.html
Okay, now we can look into the file.
[root@gandalf:/files]# cat test_with_ondemand_consistency_check
Tue Aug 12 15:00:57 CEST 2008
Now we append a new line to the file by executing the following commands on the NFS server:
[root@theoden:/export/files]# date >> test_with_ondemand_consistency_check
[root@theoden:/export/files]# cat test_with_ondemand_consistency_check
Tue Aug 12 15:00:57 CEST 2008
Tue Aug 12 15:02:03 CEST 2008
When we check this file on our NFS client, we still see the cached version.
[root@gandalf:/files]# cat test_with_ondemand_consistency_check
Tue Aug 12 15:00:57 CEST 2008
[root@gandalf:/files]# cfsadmin -s all
Now we can look into the file again, and you will see the new version of the file.
[root@gandalf:/files]# cat test_with_ondemand_consistency_check
Tue Aug 12 15:00:57 CEST 2008
Tue Aug 12 15:02:03 CEST 2008
Okay, it’s pretty obvious this isn’t a feature for a filesystem that changes constantly and rapidly. But it’s really useful in situations where you have control over the changes. As long as a file is cached, the file server won’t see a single access for it, so these accesses don’t add to the load of the server.
There is an important fact here: this doesn’t tell CacheFS to check the files right at that moment. It just tells CacheFS to check them at the next access to the file, so you don’t get a consistency-check storm.
A practical use case
Let’s take an example: assume you have 50 webservers that serve static content or that serve and execute .php files. You may have used scp or rsync to synchronize all of these servers.
With CacheFS the workflow is a little different: you simply put all of your files on a fileserver. But you notice that there is a large load on this system. To reduce it, you use your new knowledge to create a cache on every webserver. After this you see a lot of requests for directory and file metadata in the documents directory. You know that this directory is provisioned with changes only at 3 o’clock in the morning. Thus you write a little script that checks for the file please_check_me and, if it exists, starts cfsadmin -s all on the client, causing it to check at the next access whether there is a newer version of a file.
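A minimal sketch of such a trigger script, run from cron on each client. The please_check_me convention comes from the text; check_flag is a made-up name and the scheduling is left out:

```shell
#!/bin/sh
# Hypothetical trigger script for the workflow above: if the provisioning
# run left a please_check_me flag on the shared filesystem, ask CacheFS
# to revalidate cached files at their next access.
check_flag() {
    mountpoint="$1"    # e.g. /files from the earlier examples
    if [ -f "$mountpoint/please_check_me" ]; then
        # Mark all demandconst CacheFS mounts for a consistency check.
        cfsadmin -s all
    fi
}
```

Run it as, say, check_flag /files every few minutes; without the flag file it does nothing, so the fileserver sees no extra load.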
The CacheFS feature in future Solaris Development
CacheFS is one of the hidden gems in Solaris. In fact it’s so hidden that it has been in sustaining mode since 1999, which means that only bugfixes are made for this component, but no new features will find their way into it. Recently there was some discussion about declaring End-of-Feature status for CacheFS, which would lead to the announcement of its removal. While this isn’t a problem for Solaris 10, I strongly disagree with the idea of removing this part of Solaris as long as there is no other boot-persistent, non-main-memory caching available for Solaris.
Conclusion
CacheFS is one of those features that even some experienced admins aren’t aware of. But as soon as they try it, most of them can’t live without it. You should give it a try.
Do you want to learn more?
Manuals
System Administration Guide: Devices and File Systems - Using The CacheFS File System[^5]
man pages on docs.sun.com: cfsadmin[^6]