As usual, this technology has its own jargon, so I will start by defining the most important terms.
The circle of life
Before defining the jargon, it's important to understand that every file under the control of SamFS follows a certain lifecycle. You create or modify it, and the system archives it. After a certain time without access, the system removes it from expensive storage, provided it has copies on cheaper storage. When you access it again, it is fetched from the cheaper storage and delivered to you. When you delete it, it has to be removed from all your media. This cycle repeats until the file is deleted.
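The lifecycle can be read as a small state machine. The following Python sketch is only an illustration of that idea: the state names and event names are invented for this example and are not SamFS terms or APIs.

```python
from enum import Enum


class State(Enum):
    ONLINE = "on disk, not yet archived"
    ARCHIVED = "on disk, copies exist on archive media"
    RELEASED = "only the stub is on disk"
    DELETED = "deleted"


# Allowed transitions: (current state, event) -> next state.
# The event names are hypothetical labels for the steps in the text.
TRANSITIONS = {
    (State.ONLINE, "modify"): State.ONLINE,
    (State.ONLINE, "archive"): State.ARCHIVED,
    (State.ARCHIVED, "modify"): State.ONLINE,      # needs archiving again
    (State.ARCHIVED, "release"): State.RELEASED,
    (State.RELEASED, "access"): State.ARCHIVED,    # data is fetched back
    (State.ONLINE, "delete"): State.DELETED,
    (State.ARCHIVED, "delete"): State.DELETED,
    (State.RELEASED, "delete"): State.DELETED,
}


def step(state, event):
    """Return the next state, or raise if the event is not allowed."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"{event!r} is not possible in state {state.name}") from None


def run(events, state=State.ONLINE):
    """Apply a sequence of events and return the final state."""
    for event in events:
        state = step(state, event)
    return state
```

Note that there is no transition from ONLINE to RELEASED: in this model a file can only leave the disk once it has copies elsewhere, which matches the description above.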
Policies
Although every file goes through the cycle described above, the exact life doesn't have to be the same for every file. SamFS uses the concept of policies to describe how it should handle a file: how many copies it should make, and on which media. The most difficult task in configuring SamFS is finding the most adequate policy. You need experience for it, but it's something you can easily learn on the job.
Archiving
The first step is archiving. Let's assume you've created a file. The data gets stored in the SamFS filesystem. But you've defined a policy saying that you want two copies on tape media. The process that does this job is called the archiver, and the procedure itself is called archiving. Archiving writes your files to the desired media. The metadata of each file is augmented with the positions of its copies. SamFS can create up to 4 copies of a file. Important to know: SamFS doesn't wait with archiving until it needs space on the cache media. It archives files with the next run of the archiver (for example, every 5 minutes).
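To make the idea of an archiver run concrete, here is a toy model in Python. It is a sketch under assumptions, not how SamFS is implemented: the `File` class, the `archiver_run` function, and the media names are all made up for illustration. Only the limit of 4 copies and the idea of recording positions in the metadata come from the text.

```python
from dataclasses import dataclass, field

MAX_COPIES = 4  # the text states that SamFS can create up to 4 copies


@dataclass
class File:
    name: str
    # Archive positions as (medium, position) pairs; part of the metadata.
    copies: list = field(default_factory=list)


def archiver_run(files, policy_media, next_position):
    """One archiver pass: write every copy the policy asks for and that is missing.

    policy_media  -- list of media names, one per desired copy (hypothetical)
    next_position -- dict mapping a medium to its next free position
    """
    if len(policy_media) > MAX_COPIES:
        raise ValueError("a policy cannot ask for more than 4 copies")
    for f in files:
        already_on = {medium for medium, _ in f.copies}
        for medium in policy_media:
            if medium in already_on:
                continue  # this copy was written by an earlier run
            position = next_position.get(medium, 0)
            f.copies.append((medium, position))  # record where the copy lives
            next_position[medium] = position + 1


files = [File("report.txt"), File("data.bin")]
positions = {}
archiver_run(files, ["tape_A", "tape_B"], positions)
archiver_run(files, ["tape_A", "tape_B"], positions)  # second run finds nothing to do
```

The second call changes nothing, which mirrors the periodic nature of the archiver: each run only picks up the work that is still outstanding.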
Releasing
Let's assume your filesystem is 90% full and you need some space to work. Without SamFS you would move data around manually. SamFS works similarly and differently at the same time. The archiver has already written your data to other places, so releasing is the process of deleting the data from your filesystem. But it doesn't delete all of it: it keeps a stub in the filesystem. The metadata (filename, ACL, ownership, permissions, and the start of the file) stays on disk, so you won't see a difference. You can walk around your directories and you will see all your files. The difference is that the data itself isn't in the filesystem anymore and thus doesn't consume space in it.
Staging
After a long time (the file has already been released), you want to access the data. You go into the filesystem and open the file. SamFS intercepts this call and automatically fetches the data from the archive media. In the meantime, reads from this file block, and so does the process accessing the data. SamFS uses information from the metadata to find the media.
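Releasing and staging are easiest to see together. The following Python sketch models a cached file whose content can be dropped and fetched back on the next read. Everything in it (the `CachedFile` class, the dict standing in for archive media) is a hypothetical simplification, and the blocking behaviour is only hinted at in a comment.

```python
class CachedFile:
    """A file in the disk cache with a copy on archive media (toy model)."""

    def __init__(self, name, data, archive):
        self.name = name            # metadata: stays on disk after release
        self.size = len(data)       # metadata: stays on disk after release
        self.data = data            # file content held in the disk cache
        self.archive = archive      # dict standing in for archive media
        self.archive[name] = data   # assume the archiver already made a copy

    @property
    def online(self):
        """True while the content is present in the disk cache."""
        return self.data is not None

    def release(self):
        """Drop the content from the cache; the metadata (stub) remains."""
        if self.name not in self.archive:
            raise RuntimeError("refusing to release a file without an archive copy")
        self.data = None

    def read(self):
        """Return the content, staging it from the archive first if needed."""
        if not self.online:
            # In the real system the reading process blocks here until the
            # data has been fetched from the archive media.
            self.data = self.archive[self.name]
        return self.data


archive = {}
f = CachedFile("notes.txt", b"hello", archive)
f.release()
```

After `release()`, the name and size are still available, but the content is gone from the cache. Calling `f.read()` brings it back transparently, which is the behaviour the user sees as a slow first access.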
Recycling
The end of a file's lifetime is its deletion. That's easy on disks, but you can't delete a single file from tape efficiently. So SamFS uses a different method: the data on the tape is just marked as invalid and the stub gets deleted, but the data stays on tape. Over time, more and more data may get deleted this way. The tape may end up like Swiss cheese, where only a small part of it holds valid data. This wastes tape, and access gets slower and slower. Recycling solves this with a single trick: the remaining active data gets a special marker, and the next time the archiver runs, this data gets archived again. Now there is no valid data left on the tape. You can erase it by writing a new label to it and use it for new data again. This process is called recycling.
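The recycling trick can be sketched in a few lines of Python. This is a simplified model with invented names: a tape is a list of entries, the threshold `min_valid_fraction` is an arbitrary example value and not a SamFS default, and the re-archiving itself is reduced to putting file names on a queue.

```python
def recycle(tape, rearchive_queue, min_valid_fraction=0.5):
    """Recycle a tape if too few of its entries are still valid.

    tape               -- list of (filename, is_valid) entries in tape order
    rearchive_queue    -- list that collects files for the next archiver run
    min_valid_fraction -- hypothetical threshold, counted in entries, not bytes
    Returns True if the tape was emptied and can be relabeled.
    """
    if not tape:
        return False
    valid = [name for name, is_valid in tape if is_valid]
    if len(valid) / len(tape) >= min_valid_fraction:
        return False  # still well used, leave it alone
    # Mark the remaining valid files so the next archiver run rewrites them.
    rearchive_queue.extend(valid)
    # Toy shortcut: we treat the re-archiving as done and clear the tape,
    # which stands for writing a new label to it.
    tape.clear()
    return True


tape = [("a", False), ("b", True), ("c", False), ("d", False)]
queue = []
recycled = recycle(tape, queue)
```

In this example only one of four entries is still valid, so the tape is recycled and file `b` ends up on the queue for re-archiving.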
The circle of life
With this jargon in place, we can draw a picture of these processes.

Once a file is newly written or updated, it gets archived. Depending on the combination of policies, usage, and caching strategy, it may be released and staged again and again. At the end, the tape holding the data is recycled.
Watermarks
Watermarks are an additional but very important concept in SamFS. The cache is much smaller than the filesystem. Nevertheless, you have to provide space for new and updated data. So SamFS implements two important watermarks. When the cache fills up to the high watermark, the system automatically starts to release the least recently used files that have a minimum number of copies on archive media. This process stops when the low watermark is reached. This ensures that you always have at least a certain amount of free capacity to store new or updated data in the filesystem.
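The watermark logic can be written down as a short Python sketch. It is an illustration only: the function name, the dict layout of a file, and the values 80 and 60 are assumptions made for this example, not SamFS defaults. Percentages are kept as integers to avoid floating-point surprises at the boundaries.

```python
def release_until_low(files, capacity, high_pct=80, low_pct=60, min_copies=1):
    """Release least recently used files once usage exceeds the high watermark.

    files      -- list of dicts with 'name', 'size', 'last_access', 'copies'
    capacity   -- cache size, in the same unit as the file sizes
    high_pct   -- high watermark in percent (example value)
    low_pct    -- low watermark in percent (example value)
    min_copies -- archive copies a file needs before it may be released
    Returns the names of the released files, in release order.
    """
    used = sum(f["size"] for f in files)
    if used * 100 <= high_pct * capacity:
        return []  # at or below the high watermark: nothing to do
    # Only files with enough archive copies are candidates, oldest access first.
    candidates = sorted(
        (f for f in files if f["copies"] >= min_copies),
        key=lambda f: f["last_access"],
    )
    released = []
    for f in candidates:
        if used * 100 <= low_pct * capacity:
            break  # low watermark reached, stop releasing
        used -= f["size"]
        released.append(f["name"])
    return released


files = [
    {"name": "old", "size": 20, "last_access": 1, "copies": 2},
    {"name": "unarchived", "size": 20, "last_access": 2, "copies": 0},
    {"name": "mid", "size": 30, "last_access": 3, "copies": 2},
    {"name": "new", "size": 25, "last_access": 4, "copies": 2},
]
released = release_until_low(files, capacity=100)
```

Here the cache is 95% full, so releasing starts. The file without archive copies is skipped even though it is older than `mid`, and `new` survives because usage has already dropped below the low watermark.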
The SamFS filesystem: Archive media
When used in conjunction with the archiver/stager/releaser construct, the SamFS filesystem itself isn't much more than a cache for all the data you store in it. It is not the size of the SamFS filesystem that determines how much data you can store; the amount of archive media is the limit. For example, with a 1 GB disk cache and 10 petabytes of T10000 tapes, you can store up to 5 petabytes of data. Why 5 petabytes? Well, it's a best practice to store two copies of every file on your system, just in case a tape gets lost or damaged.
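The capacity arithmetic is simple enough to check in a few lines of Python. The helper name `usable_capacity` and the decimal unit conversion are assumptions for this example.

```python
def usable_capacity(archive_capacity_tb, copies_per_file):
    """User data that fits when every file is stored copies_per_file times."""
    if copies_per_file < 1:
        raise ValueError("at least one archive copy is required")
    return archive_capacity_tb / copies_per_file


PB = 1000  # terabytes per petabyte (decimal units assumed)
two_copies = usable_capacity(10 * PB, 2)   # 10 PB of tape, two copies per file
four_copies = usable_capacity(10 * PB, 4)  # same tapes with the maximum of 4 copies
```

With two copies per file, 10 petabytes of tape hold 5 petabytes of user data. The size of the disk cache does not appear in the calculation at all.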
Archive media can be of different kinds:
- disk drives
- other SamFS servers
- tapes (with or without autoloading)
- magneto-optical disks (with or without autoloading)
The media don't even have to be within reach of an autoloader. SamFS knows the concept of offline archive media, for example tapes in a safe. When you try to access data on offline media, the accessing process blocks and the admin is notified to move their a.. and put the tape into a drive.