How to remove a top-level vdev from a ZFS pool
Over the years I have given many, many presentations. Whenever I talked with customers afterwards about what they would like to see in ZFS, there was one feature that was always mentioned: removing devices. While it was no problem, for example, to remove a member disk of a mirror, you couldn’t remove a top-level vdev; you weren’t able to remove a mirror from a stripe of mirrors. With Solaris 11.4 we finally have a feature that allows you to do exactly this. It’s really easy to use, so if I only wanted to show the feature itself, this would be a rather short entry. However, I would like to shed some light on the mechanism behind it.
Preparing an example
Let’s assume we have three devices and we have created a striped pool out of them.
root@batou:/# zpool create testpool c1t2d0 c1t3d0 c1t4d0
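If you want to double-check the layout at this point, a plain zpool status testpool should show all three disks as independent top-level vdevs, roughly like this (output trimmed, not taken from the original run):
root@batou:/# zpool status testpool
[...]
  NAME      STATE     READ WRITE CKSUM
  testpool  ONLINE       0     0     0
    c1t2d0  ONLINE       0     0     0
    c1t3d0  ONLINE       0     0     0
    c1t4d0  ONLINE       0     0     0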
We create some files in it:
root@batou:/# cd testpool
root@batou:/testpool# mkfile 1g test1 test2 test3 test4 test5 test6
Let’s now check the structure of the pool. For this I’m using the zdb -L command. The output is much longer than what is shown here.
root@batou:/testpool# zdb -L testpool
[...]
    name: 'testpool'
    [...]
    hostname: 'batou'
    vdev_children: 3
    [...]
        children[0]:
            guid: 1209395020087258815
            id: 0
            type: 'disk'
            path: '/dev/dsk/c1t2d0s0'
            devid: 'id1,sd@SATA_____VBOX_HARDDISK____VB96a218f1-27200143/a'
            phys_path: '/pci@0,0/pci8086,2829@d/disk@2,0:a'
            [...]
        children[1]:
            guid: 5622741003370822611
            id: 1
            type: 'disk'
            path: '/dev/dsk/c1t3d0s0'
            devid: 'id1,sd@SATA_____VBOX_HARDDISK____VB9cc00131-8b8a0295/a'
            phys_path: '/pci@0,0/pci8086,2829@d/disk@3,0:a'
            [...]
        children[2]:
            guid: 12149574521403767327
            id: 2
            type: 'disk'
            path: '/dev/dsk/c1t4d0s0'
            devid: 'id1,sd@SATA_____VBOX_HARDDISK____VB5b29f40e-f9bc48b9/a'
            phys_path: '/pci@0,0/pci8086,2829@d/disk@4,0:a'
[...]
                          capacity     operations    bandwidth    ---- errors ----
description               used  avail   read  write   read  write  read write cksum
testpool                 5.85G  41.8G    745      0  80.9M      0     0     0     0
  /dev/dsk/c1t2d0s0      1.95G  13.9G    244      0  26.6M      0     0     0     0
  /dev/dsk/c1t3d0s0      1.95G  13.9G    255      0  27.1M      0     0     0     0
  /dev/dsk/c1t4d0s0      1.95G  13.9G    245      0  27.2M      0     0     0     0
[...]
We have 6 gigabytes of data on three devices, so roughly 2 gigabytes per device. Before you ask: I honestly don’t know why zdb -L shows no writes here; I will check this.
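As an aside: if you want to see the read and write counters while the pool is actually busy, zpool iostat is probably the handier tool anyway. Something like the following (my own suggestion, not part of the original run) prints per-vdev statistics every five seconds until you interrupt it with Ctrl-C:
root@batou:/# zpool iostat -v testpool 5
But now let’s remove one of the top-level vdevs.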
Removing the device
The removal process is really simple to trigger via the remove subcommand of zpool:
root@batou:/# zpool remove testpool c1t4d0
The device you want to remove then goes into the REMOVING state, which you can see in the zpool status output:
  NAME      STATE     READ WRITE CKSUM
  testpool  ONLINE       0     0     0
    c1t2d0  ONLINE       0     0     0
    c1t3d0  ONLINE       0     0     0
    c1t4d0  REMOVING     0     0     0
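The evacuation of the data runs in the background. A crude way to wait for it to finish (my own sketch, not from the original post) is to poll zpool status until the REMOVING state disappears:
root@batou:/# while zpool status testpool | grep REMOVING > /dev/null; do sleep 10; done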
After a while the device will disappear from the pool.
  NAME      STATE     READ WRITE CKSUM
  testpool  ONLINE       0     0     0
    c1t2d0  ONLINE       0     0     0
    c1t3d0  ONLINE       0     0     0
In case you want to remove a top-level vdev that is a mirror, you have to use the name of the top-level vdev. Let’s assume a pool consisting of two mirrors:
  NAME        STATE     READ WRITE CKSUM
  testpool    ONLINE       0     0     0
    mirror-0  ONLINE       0     0     0
      c1t2d0  ONLINE       0     0     0
      c1t3d0  ONLINE       0     0     0
    mirror-1  ONLINE       0     0     0
      c1t4d0  ONLINE       0     0     0
      c1t5d0  ONLINE       0     0     0
To remove the top-level vdev you have to address it by its name, in this case mirror-0:
root@sol114s1:~# zpool remove testpool mirror-0
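Once the evacuation has finished, the pool should, analogous to the stripe example above, end up looking roughly like this (my expectation, not output taken from the original run):
  NAME        STATE     READ WRITE CKSUM
  testpool    ONLINE       0     0     0
    mirror-1  ONLINE       0     0     0
      c1t4d0  ONLINE       0     0     0
      c1t5d0  ONLINE       0     0     0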
Behind the curtain
So how did Oracle Solaris do this? Well, it’s quite simple: the data isn’t really reorganized. The pool still has three devices after the change, you just don’t see the third one. When you check with zdb -L testpool, you will see that the third device has changed to:
        children[2]:
            guid: 14641473971126587410
            id: 2
            type: 'pseudo'
            path: '$VDEV-9DA81B2EED2E2E37'
            phys_path: 'testpool/$VDEV-9DA81B2EED2E2E37'
            removing: 1
The third device has been substituted by a virtual device, and this virtual device resides on the disks remaining in the pool. You can see this quite nicely in the output of zdb:
description               used  avail   read  write   read  write  read write cksum
testpool                 6.03G  25.7G  3.55K      0  3.87M      0     0     0     0
  /dev/dsk/c1t2d0s0      3.02G  12.9G  1.77K      0  1.92M      0     0     0     0
  /dev/dsk/c1t3d0s0      3.02G  12.9G  1.76K      0  1.91M      0     0     0     0
  $VDEV-9DA81B2EED2E2E37 2.00G  13.9G     20      0  28.9K      0     0     0     0
There is still a third device in it with 2 GB worth of data, but more interestingly, the remaining devices have now taken over the data, as indicated by the increased used column for both of them. As long as the data isn’t changed, it will stay on this virtual device. Please note that the system isn’t simply reserving the full size of the removed vdev on the remaining disks; it only uses the space actually needed for the data.
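You can cross-check this without zdb: a plain zpool list testpool (not part of the original walkthrough) should show that the pool’s total size has shrunk by the removed disk, while the allocated space still roughly matches the 6 gigabytes of test files, so nothing near the full size of the removed vdev is reserved on the remaining disks.
root@batou:/# zpool list testpool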
Let’s now delete everything in the pool by issuing rm /testpool/*:
                          capacity     operations    bandwidth    ---- errors ----
description               used  avail   read  write   read  write  read write cksum
testpool                  499K  31.7G    460      0  2.85M      0     0     0     0
  /dev/dsk/c1t2d0s0       316K  15.9G    203      0  1.51M      0     0     0     0
  /dev/dsk/c1t3d0s0       184K  15.9G    248      0  1.22M      0     0     0     0
  $VDEV-9DA81B2EED2E2E37 6.50K  15.9G      9      0   119K      0     0     0     0
The space consumption has been reduced significantly. Let’s now recreate our data files.
root@batou:/testpool# mkfile 1g test1 test2 test3 test4 test5 test6
After this you will see the following in the zdb -L output:
                          capacity     operations    bandwidth    ---- errors ----
description               used  avail   read  write   read  write  read write cksum
testpool                 6.00G  25.7G  2.54K      0   194M      0     0     0     0
  /dev/dsk/c1t2d0s0      3.00G  12.9G  1.31K      0  96.7M      0     0     0     0
  /dev/dsk/c1t3d0s0      3.00G  12.9G  1.22K      0  97.7M      0     0     0     0
  $VDEV-9DA81B2EED2E2E37 6.50K  15.9G      9      0   119K      0     0     0     0
The virtual device isn’t used for new writes; however, all reads for the removed disk are now serviced by the virtual device, which means, by proxy, by the remaining disks. But the virtual device doesn’t get any new data, so over time, as you change the data in your pool, the virtual device will no longer be used. Of course, if the data is static and you never change it, it won’t be migrated off the virtual device.
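If you don’t want to wait for normal churn, one way to push the data off the virtual device (my own sketch, not something shown in the original walkthrough; the snapshot and dataset names are just examples) is to rewrite it explicitly, for instance by sending it into a fresh dataset and deleting the old copies and the snapshot afterwards:
root@batou:/# zfs snapshot testpool@evacuate
root@batou:/# zfs send testpool@evacuate | zfs receive testpool/rewritten
The received copy is written from scratch, so it lands on the real disks and not on the virtual device.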
When you add a new device, it won’t substitute for the virtual device that is acting as the third device:
root@batou:~# zpool add testpool c1t4d0
You will see a pool with four devices instead.
                          capacity     operations    bandwidth    ---- errors ----
description               used  avail   read  write   read  write  read write cksum
testpool                 6.00G  41.6G  1.55K      0  4.07M      0     0     0     0
  /dev/dsk/c1t2d0s0      3.00G  12.9G    771      0  1.64M      0     0     0     0
  /dev/dsk/c1t3d0s0      3.00G  12.9G    774      0  1.62M      0     0     0     0
  $VDEV-9DA81B2EED2E2E37 6.50K  15.9G      9      0   119K      0     0     0     0
  /dev/dsk/c1t4d0s0      21.0K  15.9G     38      0   708K      0     0     0     0
Conclusion
After quite some time, ZFS finally has the ability to remove top-level vdevs. I think this will take care of a lot of questions in presentations from now on.