A little change of queues

An overwhelming number of ZFS installations work with just a bunch of disks, perhaps in a JBOD or in the server itself. However, there are installations that use disk arrays with RAID controllers. Some of those installations even use just a single LUN. I don't think that this is a good idea (for example because without redundancy ZFS can only detect corruption, but not repair it), but that's a different story I don't want to discuss here.

There is a slight change in the default parameters of ZFS in Update 9. It's related to the parameter zfs:zfs_vdev_max_pending. This parameter controls how many I/O requests can be pending per vdev. For example, when you have 100 disks visible from your OS and a zfs:zfs_vdev_max_pending of 2, you have at most 200 requests outstanding. When you have 100 disks hidden behind your storage controller showing up as just a single LUN, you will have, as you can guess, at most 2 pending requests. You may think that you could increase the queue depth without end, but as usual this is a tradeoff game and not that easy: deeper queues may increase the latency of individual commands. Experience showed that a certain queue depth delivered the best performance on most installations. However, the installed landscape changes and sometimes you have to adjust things. Exactly this happened a while ago in OpenSolaris, and it seems that this change has now moved into Solaris. The default for zfs:zfs_vdev_max_pending is 10 at the moment. You can check this:

# echo zfs_vdev_max_pending::print | mdb -kw
0xa
#
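
By the way, if you don't want to convert the hexadecimal output in your head, mdb can also print the value directly in decimal when you use the /D format instead of ::print. This is just a small convenience and the exact output layout may differ a little between releases, but it looks roughly like this:

# echo zfs_vdev_max_pending/D | mdb -k
zfs_vdev_max_pending:
zfs_vdev_max_pending:   10
#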

0xa in decimal is 10, and this is a wise choice for most implementations out there. But it was different in older versions. I checked it on U7, and I asked my Twitter/Facebook contacts to do a quick check on U8, as I was too lazy to install it:

# echo zfs_vdev_max_pending::print | mdb -kw
0x23
#

0x23 in decimal is 35, and 35 was the default up to Update 8 of Solaris 10. So essentially the queues are less deep than before. For JBODs this is most often a good thing, as each vdev and thus each LUN has its own queue of 10 pending I/Os. For a single LUN hiding many disks, it sometimes isn't. So how do you change it back to the old value?
You can change it dynamically:

# echo zfs_vdev_max_pending/W0t35 | mdb -kw
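
If you want to be sure that the write really took effect, just read the variable back with the print command from above (a plain -k is sufficient here, as we are only reading); 0x23 is the hexadecimal representation of 35:

# echo zfs_vdev_max_pending::print | mdb -k
0x23
#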

To make this change boot-persistent, you have to add a line to /etc/system:

set zfs:zfs_vdev_max_pending = 35 
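
Keep in mind that /etc/system is only read at boot time, so this entry takes effect with the next reboot; for an immediate change use the mdb command shown above. If you like to document why you deviate from the default, you can put a comment in front of it (lines starting with an asterisk are comments in /etc/system); the wording is of course just a suggestion:

* single LUN hides a large number of disks, restore the pre-U9 queue depth of 35
set zfs:zfs_vdev_max_pending = 35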

Sometimes an even higher value may be indicated when a very large number of disks sits behind your controller forming a single LUN.
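
Just as an illustration (the value 70 is completely arbitrary and not a recommendation, you really have to measure the effect on your own workload): trying such a higher value dynamically works exactly like above, with 0t telling mdb that the number is decimal:

# echo zfs_vdev_max_pending/W0t70 | mdb -kw
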
How do you know if this decreased queue depth is a problem for you at all? The command iostat will help you:

jmoekamp@hivemind:~$ iostat -xdn
                    extended device statistics
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
    6,3    1,9  525,9   31,2  0,1  0,0   16,4    6,0   2   3 c3d0
   17,1    1,0 1676,0    8,0  0,2  0,1   11,4    4,8   4   4 c3d1
    6,4    1,9  525,8   31,2  0,1  0,0   14,1    4,8   2   2 c4d0
   17,1    1,0 1675,9    8,0  0,2  0,1   12,9    4,7   4   4 c4d1
    0,0    0,0    0,0    0,0  0,0  0,0    0,0    0,0   0   0 gsdbc
jmoekamp@hivemind:~$

If you see the actv column at or near the value of zfs:zfs_vdev_max_pending, it's worth a try. Otherwise it isn't.
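
If you want to watch this under load instead of just looking at the averages since boot (and the first report of iostat shows exactly those since-boot averages), let iostat repeat its output in intervals and keep an eye on actv; the 5-second interval here is of course just an example:

jmoekamp@hivemind:~$ iostat -xdn 5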