I promised to publish a walkthrough of my ZFS demonstration at the CeBIT 2009 booth ... it's the stuff Ingo Frobenius called magic. Well, it isn't really magic, but it is perhaps impressive when you demo it at high speed. For people used to the virtues of ZFS this speed seems normal, but you have to consider that most people know it otherwise: vastly slower, vastly less integrated and vastly more uncomfortable.
So ... what was my demo case for ZFS at the CeBIT? As I had just one disk in my CeBIT system, I used the trick of using files as devices. So I had to create the file devices first.
# mkfile 128m /testfile1
# mkfile 128m /testfile2
# mkfile 128m /testfile3
# mkfile 128m /testfile4
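The four mkfile calls above can also be wrapped in a small loop. This is only a sketch: make_backing_files is a hypothetical helper name, and the dd fallback for systems without mkfile is my addition, not part of the original demo.

```shell
# Sketch: create N equally sized backing files for a file-backed test pool.
# make_backing_files is a hypothetical helper, not part of ZFS or Solaris.
make_backing_files() {
    dir=$1; count=$2; size_mb=$3
    i=1
    while [ "$i" -le "$count" ]; do
        file="${dir%/}/testfile$i"      # strip a trailing slash so "/" works too
        if command -v mkfile >/dev/null 2>&1; then
            mkfile "${size_mb}m" "$file" || return 1    # as used in the demo
        else
            # Assumed fallback where mkfile is missing: write zeros with dd.
            dd if=/dev/zero of="$file" bs=1048576 count="$size_mb" 2>/dev/null || return 1
        fi
        i=$((i + 1))
    done
}

# Usage matching the demo (as root): make_backing_files / 4 128
```

Files of this size are only good for demos, but that is exactly the point here: no spare disks are needed to show a pool.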
Okay, now I create my test pool.
# zpool create tp mirror /testfile1 /testfile2
To show how mighty zpool create is, I showed the mount table afterwards to present the already mounted filesystem.
# mount
[..]
/tp on tp read/write/setuid/devices/nonbmand/exec/xattr/atime/dev=2d90013 on Fri Mar 20 12:58:24 2009
Afterwards I created some filesystems. As I'm a child of the early 80s, I use Muppet Show names most of the time:
# zfs create tp/statler
# zfs create tp/gonzo
# zfs create tp/waldorf
# zfs create tp/kermit
ZFS filesystem creation is so fast that I show the mount table again to prove to the audience that I've really created filesystems.
# mount
[..]
/tp/statler on tp/statler read/write/setuid/devices/nonbmand/exec/xattr/atime/dev=2d90017 on Fri Mar 20 12:59:34 2009
/tp/gonzo on tp/gonzo read/write/setuid/devices/nonbmand/exec/xattr/atime/dev=2d90018 on Fri Mar 20 12:59:37 2009
/tp/waldorf on tp/waldorf read/write/setuid/devices/nonbmand/exec/xattr/atime/dev=2d90019 on Fri Mar 20 12:59:40 2009
/tp/kermit on tp/kermit read/write/setuid/devices/nonbmand/exec/xattr/atime/dev=2d9001a on Fri Mar 20 12:59:42 2009
The concept of the storage pool was new to many people at the CeBIT booth, so I told them to observe the third column, the available space.
# zfs list | grep "tp/"
tp/gonzo 18K 90.8M 18K /tp/gonzo
tp/kermit 18K 90.8M 18K /tp/kermit
tp/statler 18K 90.8M 18K /tp/statler
tp/waldorf 18K 90.8M 18K /tp/waldorf
90.8M free. Now let's create a file in one of the filesystems.
# mkfile 10m /tp/gonzo/testfile
Okay, yet another look at the filesystems.
# zfs list | grep "tp/"
tp/gonzo 5.27M 85.6M 5.27M /tp/gonzo
tp/kermit 18K 85.6M 18K /tp/kermit
tp/statler 18K 85.6M 18K /tp/statler
tp/waldorf 18K 85.6M 18K /tp/waldorf
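The point about the third column can be checked mechanically as well. The following sketch reads a listing like the one above on stdin and prints the common available space, or fails if the values differ. same_avail is a hypothetical helper name; the sample input is the listing from the demo.

```shell
# Sketch: confirm that every dataset in a `zfs list` listing reports the same AVAIL.
# same_avail is a hypothetical helper; it reads the listing on stdin.
same_avail() {
    awk '
        NR == 1      { avail = $3 }      # remember AVAIL of the first dataset
        $3 != avail  { differs = 1 }     # any other value breaks the pattern
        END {
            if (NR == 0 || differs) exit 1
            print avail
        }
    '
}

# The listing from the demo, pasted in as sample input:
same_avail <<'EOF'
tp/gonzo 5.27M 85.6M 5.27M /tp/gonzo
tp/kermit 18K 85.6M 18K /tp/kermit
tp/statler 18K 85.6M 18K /tp/statler
tp/waldorf 18K 85.6M 18K /tp/waldorf
EOF
```

With the sample input this prints 85.6M. On a live system the input could come from something like zfs list -H -o name,used,avail,refer,mountpoint -r tp, assuming no quotas or reservations are set on the individual filesystems.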
As all four filesystems share the same pool, all of them show the same reduced amount of available storage. Okay, now let's extend the pool. First a short look at the current configuration.
# zpool status tp
pool: tp
state: ONLINE
scrub: none requested
config:
NAME STATE READ WRITE CKSUM
tp ONLINE 0 0 0
mirror ONLINE 0 0 0
/testfile1 ONLINE 0 0 0
/testfile2 ONLINE 0 0 0
errors: No known data errors
We have a mirror of two devices. Okay, let's add the other two file devices as a second mirror.
# zpool add tp mirror /testfile3 /testfile4
Let's have another look at our pool structure.
# zpool status tp
pool: tp
state: ONLINE
scrub: none requested
config:
NAME STATE READ WRITE CKSUM
tp ONLINE 0 0 0
mirror ONLINE 0 0 0
/testfile1 ONLINE 0 0 0
/testfile2 ONLINE 0 0 0
mirror ONLINE 0 0 0
/testfile3 ONLINE 0 0 0
/testfile4 ONLINE 0 0 0
errors: No known data errors
We now have a stripe of two mirrors. And when you look at the filesystems, all filesystems of the pool have the same increased amount of storage.
# zfs list | grep "tp/"
tp/gonzo 20.0M 194M 20.0M /tp/gonzo
tp/kermit 18K 194M 18K /tp/kermit
tp/statler 18K 194M 18K /tp/statler
tp/waldorf 18K 194M 18K /tp/waldorf
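As a rough cross-check of these numbers: each mirror contributes the capacity of one of its devices, and the stripe adds the mirrors up. The sketch below only computes this nominal figure; nominal_capacity_mb is a hypothetical helper, and the 128 MB device size is taken from the mkfile calls at the beginning.

```shell
# Sketch: nominal capacity of a pool built from equally sized mirror vdevs.
# nominal_capacity_mb is a hypothetical helper, not a ZFS command.
nominal_capacity_mb() {
    mirrors=$1; device_mb=$2
    # Each mirror contributes the size of one device; the stripe adds them up.
    echo $((mirrors * device_mb))
}

echo "one mirror:  $(nominal_capacity_mb 1 128) MB"
echo "two mirrors: $(nominal_capacity_mb 2 128) MB"
```

This gives 128 MB and 256 MB. The figures reported by zfs list are lower than that (90.8M before, 194M plus the used space afterwards), presumably because ZFS keeps part of such tiny devices for its own bookkeeping.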
Let's play around with filesystem snapshots. I've used the example of working in your home directory.
# cd /tp/gonzo
# touch monday
# touch tuesday
Okay, it would be nice to protect your work against mishaps. Let's take a snapshot.
# zfs snapshot tp/gonzo@tuesdayevening
The storyline of my demo repeats this for a while:
# touch wednesday
# zfs snapshot tp/gonzo@wednesdayevening
# touch thursday
# zfs snapshot tp/gonzo@thursdayevening
# rm monday
# touch friday
# zfs snapshot tp/gonzo@fridayevening
It's Saturday and the boss needs the results of Monday.
# ls -l
total 4
-rw-r--r-- 1 root root 0 Mar 20 13:11 friday
-rw-r--r-- 1 root root 0 Mar 20 13:10 thursday
-rw-r--r-- 1 root root 0 Mar 20 13:10 tuesday
-rw-r--r-- 1 root root 0 Mar 20 13:10 wednesday
Fsck ... you've deleted it. But you can use the snapshots:
# cd .zfs
# cd snapshot
# cd tuesdayevening/
# ls -l
total 40979
-rw-r--r-- 1 root root 0 Mar 20 13:10 monday
-rw------T 1 root root 20971520 Mar 20 13:05 testfile
-rw-r--r-- 1 root root 0 Mar 20 13:10 tuesday
# cp monday /tp/gonzo/monday
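The restore above follows a fixed path pattern: filesystem root, then .zfs/snapshot, then the snapshot name, then the file. A minimal sketch of a helper built on that pattern could look like this; restore_from_snapshot is a hypothetical name, and the helper relies on nothing but the directory layout shown in the transcript.

```shell
# Sketch: copy one file back from a named snapshot into the live filesystem.
# restore_from_snapshot is a hypothetical helper; it only relies on the
# .zfs/snapshot/<name> directory layout shown in the walkthrough.
restore_from_snapshot() {
    fs_root=$1; snap=$2; name=$3
    src="$fs_root/.zfs/snapshot/$snap/$name"
    if [ ! -e "$src" ]; then
        echo "not in snapshot $snap: $name" >&2
        return 1
    fi
    cp -p "$src" "$fs_root/$name"    # -p keeps the original timestamps and mode
}

# Usage matching the demo: restore_from_snapshot /tp/gonzo tuesdayevening monday
```

Because a snapshot is just a read-only directory tree, a plain cp is enough; no ZFS command is involved in the restore itself.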
You can just go to the .zfs directory in the root of your filesystem and access a snapshot with its name as a directory name. Okay, most people are really impressed by now, but we can do more than that. We can do the same for raw devices. At first I showed them the creation of sparse-provisioned volumes with the storyline "Imagine you have a colleague telling you that he needs a raw volume as large as 5 gigabytes, but you know he needs only 128 megabytes. How do you give him 5 gigabytes without giving him 5 gigabytes worth of hard disks?" So I create such a volume.
# zfs create -V 5g -s tp/ufsvolume
# zfs list | grep "tp/"
tp/gonzo 20.1M 194M 19K /tp/gonzo
tp/kermit 18K 194M 18K /tp/kermit
tp/statler 18K 194M 18K /tp/statler
tp/ufsvolume 16K 194M 16K -
tp/waldorf 18K 194M 18K /tp/waldorf
It's really a device. Just look at the device path:
# ls -l /dev/zvol/dsk/tp/ufsvolume
lrwxrwxrwx 1 root root 35 Mar 20 13:13 /dev/zvol/dsk/tp/ufsvolume -> ../../../../devices/pseudo/zfs@0:6c
Let's format it with UFS just as an example. You could export it via iSCSI and format it with NTFS as well.
# newfs /dev/zvol/dsk/tp/ufsvolume
newfs: construct a new file system /dev/zvol/rdsk/tp/ufsvolume: (y/n)? y
Warning: 2082 sector(s) in last cylinder unallocated
/dev/zvol/rdsk/tp/ufsvolume: 10485726 sectors in 1707 cylinders of 48 tracks, 128 sectors
5120.0MB in 107 cyl groups (16 c/g, 48.00MB/g, 5824 i/g)
super-block backups (for fsck -F ufs -o b=#) at:
32, 98464, 196896, 295328, 393760, 492192, 590624, 689056, 787488, 885920,
9539744, 9638176, 9736608, 9835040, 9933472, 10031904, 10130336, 10228768,
10327200, 10425632
#
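The numbers in the newfs output can be re-derived with a bit of shell arithmetic, which shows that newfs really sees a 5 gigabyte device even though the pool is far smaller. This is just a sketch; the 512-byte sector size is an assumption, the geometry figures are copied from the output above.

```shell
# Sketch: re-derive the sector count that newfs reported for the 5 GB volume.
volume_mb=5120                            # zfs create -V 5g, shown as 5120.0MB
sectors_per_mb=$((1024 * 1024 / 512))     # assumption: 512-byte sectors
raw_sectors=$((volume_mb * sectors_per_mb))

# Geometry figures copied from the newfs output:
cylinders=1707; tracks=48; sectors_per_track=128; unallocated=2082
usable_sectors=$((cylinders * tracks * sectors_per_track - unallocated))

echo "raw volume:     $raw_sectors sectors"
echo "newfs geometry: $usable_sectors sectors"
```

The second figure is 10485726, exactly what newfs printed; the raw volume comes out at 10485760 sectors, so the two differ by only 34 sectors.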
We need some mountpoints.
# mkdir /mountpoint1
# mkdir /mountpoint2
# mkdir /mountpoint3
I initially mount the filesystem and create a timestamp file in it.
# mount /dev/zvol/dsk/tp/ufsvolume /mountpoint1
# date >> /mountpoint1/timestamp
# cat /mountpoint1/timestamp
Fri Mar 20 13:19:04 CET 2009
I unmount it, take a snapshot of it, remount it and create another timestamp file just to show that it's still writeable.
# umount /mountpoint1
# zfs snapshot tp/ufsvolume@template
# mount /dev/zvol/dsk/tp/ufsvolume /mountpoint1
# date >> /mountpoint1/timestamp2
Now let's have a short look at the contents of our UFS filesystem.
# ls -l /mountpoint1/
total 20
drwx------ 2 root root 8192 Mar 20 13:16 lost+found
-rw-r--r-- 1 root root 29 Mar 20 13:19 timestamp
-rw-r--r-- 1 root root 29 Mar 20 13:21 timestamp2
There are two timestamp files in it, as expected. Now we mount our snapshot. As snapshots are read-only by definition, we can only mount it read-only.
# mount -o ro /dev/zvol/dsk/tp/ufsvolume@template /mountpoint2
# ls -l /mountpoint2
total 18
drwx------ 2 root root 8192 Mar 20 13:16 lost+found
-rw-r--r-- 1 root root 29 Mar 20 13:19 timestamp
We can look in and see the version at the time of the snapshot ... but now we want to have a writeable version of the filesystem. We have to clone the snapshot. No problem.
# zfs clone tp/ufsvolume@template tp/workingvolume
# mount /dev/zvol/dsk/tp/workingvolume /mountpoint3
# cd /mountpoint3
# ls -l
total 20
drwx------ 2 root root 8192 Mar 20 13:16 lost+found
-rw-r--r-- 1 root root 29 Mar 20 13:19 timestamp
# mkfile 1k testfile
Initially it has the same content as our snapshot. But when we create an additional file, the filesystems start to differ. The nice thing: the cloned filesystem just takes the storage needed for the modifications, not for a complete copy.
# ls -l
total 20
drwx------ 2 root root 8192 Mar 20 13:16 lost+found
-rw------T 1 root root 1024 Mar 20 13:26 testfile
-rw-r--r-- 1 root root 29 Mar 20 13:19 timestamp
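The snapshot and clone steps for the volume can be collected into a small script for rehearsing the demo. This is a sketch under assumptions: DRY_RUN, run and clone_volume are hypothetical names of my own, and the UFS filesystem on the source volume is assumed to be unmounted before the snapshot, as in the walkthrough. With DRY_RUN=1, the default here, the commands are only printed.

```shell
# Sketch: replay the snapshot/clone steps for the volume from the walkthrough.
# DRY_RUN, run() and clone_volume() are hypothetical conveniences.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ $*"      # only show what would be executed
    else
        "$@"
    fi
}

clone_volume() {
    pool=$1; vol=$2; snap=$3; clone=$4; mnt=$5
    run zfs snapshot "$pool/$vol@$snap" &&
    run zfs clone "$pool/$vol@$snap" "$pool/$clone" &&
    run mount "/dev/zvol/dsk/$pool/$clone" "$mnt"
}

clone_volume tp ufsvolume template workingvolume /mountpoint3
```

Running it with DRY_RUN=0 as root would execute the same three commands that were typed by hand above.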
This was my ZFS CeBIT showcase. For many people it was a really impressive show.