Less known Solaris features - IP Multipathing (Part 6): New IPMP

When you want to try the new IPMP, you need a fairly recent build of OpenSolaris: New IPMP was integrated for the first time in build 107. At first: If you already have a working IPMP configuration, you can simply reuse this config. However, it yields a different-looking, but functionally equivalent result compared to your system with Classic IPMP. This is made possible by some automagic functions in New IPMP, one example being the implicit creation of the IPMP interface (with the name of the IPMP group) when there isn't already an IPMP interface in the group. However, explicit creation should be preferred, as you can choose a better name for your IPMP interface and the dependencies are much more obvious. As I wrote before, the new IPMP doesn't switch logical interfaces from one interface to the other. Instead, there is a special kind of interface for this task: a virtual interface. It looks like a real interface, but there is no hardware behind it. So … at first we have to configure this interface:

jmoekamp@hivemind:/etc# ifconfig production0 ipmp hivemind-prod up

With this command you've configured the IPMP interface. You can use any name for it you want; it just has to begin with a letter and end in a number. I have chosen the name production0 for this tutorial. Now let's look at the interface:

jmoekamp@hivemind:~# ifconfig production0
production0: flags=8011000803<UP,BROADCAST,MULTICAST,IPv4,FAILED,IPMP> mtu 68 index 6
	inet 192.168.178.200 netmask ffffff00 broadcast 192.168.178.255
	groupname production0

As you see, it looks pretty much like a normal network interface, with some specialities: For the moment, it's in the state FAILED. There are no network interfaces configured into the group yet, thus you can't connect anywhere over this interface. The interface is already configured with the data address. (Additional data addresses are configured as logical interfaces on top of this virtual interface; you won't configure additional virtual IPMP interfaces.) The data address will never move away from there. At the end you see the name of the IPMP group. The default behavior sets the name of the IPMP group and the name of the IPMP interface to the same value. Okay, now we have to assign some physical interfaces to it. This is the moment where we have to make a decision: Do we want to use IPMP with probes or without probes? As I've explained before, it's important to know at this point which failure scenarios you want to cover with your configuration. You need to know it now, as the configuration is slightly different.
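
By the way: additional data addresses would be configured in exactly this way. A quick sketch of how it would look; 192.168.178.210 is just an assumed example address and not part of our setup:

jmoekamp@hivemind:/etc# ifconfig production0 addif 192.168.178.210/24 up
Created new logical interface production0:1

The second data address lands on the logical interface production0:1 and is managed by IPMP exactly like the first one.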

Link-based failure detection

I want to explain the configuration of the link based failure detection first not only because it’s easier, but to show you the problems of link based failure detection, too.

Configuration

As explained before, link-based failure detection just snoops on certain events of the networking card, like a lost link. Thus we just have to configure the interfaces that we want to protect against a link failure into the IPMP group; we don't have to configure any IP addresses on the member interfaces of the IPMP group. Okay, at first we plumb the interfaces we want to use in our IPMP group:

jmoekamp@hivemind:/etc# ifconfig e1000g0 plumb
jmoekamp@hivemind:/etc# ifconfig e1000g1 plumb
jmoekamp@hivemind:/etc# ifconfig rge0 plumb

Okay, now we add the three member interfaces into the IPMP group:

jmoekamp@hivemind:/etc# ifconfig e1000g0 -failover group production0 up
jmoekamp@hivemind:/etc# ifconfig e1000g1 -failover group production0 up
jmoekamp@hivemind:/etc# ifconfig rge0 -failover group production0 up

As you may have noticed, we really didn't specify an IP address or a hostname. With link-based failure detection you don't need one. The IP address of the group is located on the IPMP interface we've defined a few moments ago. But let's have a look at the ifconfig statements. There are two parameters you may not know: -failover and group.

Let’s look at one of the interfaces:

rge0: flags=9040843<UP,BROADCAST,RUNNING,MULTICAST,DEPRECATED,IPv4,NOFAILOVER> mtu 1500 index 5
        inet 0.0.0.0 netmask ff000000 broadcast 0.255.255.255
        groupname production0

We find the consequences of both ifconfig parameters: The NOFAILOVER flag is obviously the result of -failover, and groupname production0 is the result of the group production0 statement. But there is another flag that is important in the realm of IPMP: the DEPRECATED flag. The DEPRECATED flag has a very simple meaning: Don't use this IP interface. When an interface has this flag, its IP address won't be used to send out data (of course there are exceptions, like an application specifically binding to the interface; please look into the man page for further information). As those IP addresses are just for test purposes, you don't want them to appear in packets to the outside world.
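
You can cross-check where the data address lives with the address mode of ipmpstat. Take the following output as a sketch; the interfaces listed under INBOUND and OUTBOUND depend on which members are active at that moment:

jmoekamp@hivemind:~# ipmpstat -a
ADDRESS                   STATE  GROUP       INBOUND     OUTBOUND
hivemind-prod             up     production0 e1000g0     e1000g0 e1000g1 rge0

Inbound traffic for a data address arrives via one member interface, while outbound packets are spread over all active members.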

Playing around

Now we need to interact with the hardware, as we will fail network connections manually. Or to say it differently: We will pull some cables. But before we do this, we look at the initial status of our IPMP configuration. The new IPMP model improved the monitoring capabilities of its state by introducing a command for this task. It's called ipmpstat.

jmoekamp@hivemind:~# ipmpstat -i
INTERFACE   ACTIVE  GROUP       FLAGS     LINK      PROBE     STATE
rge0        yes     production0 -------   up        disabled  ok
e1000g1     yes     production0 -------   up        disabled  ok
e1000g0     yes     production0 --mb---   up        disabled  ok

Just to give you a brief tour through the output of the command: The first column reports the name of the interface, the next one the state of the interface from the perspective of IPMP. The third column tells you which IPMP group is assigned to this interface. The next columns give us some more in-depth information about the interface. The fourth column is a multipurpose column reporting a number of flags; in the output above, --mb--- tells us that the interface e1000g0 was chosen for sending and receiving multicast and broadcast data. The other interfaces don't have a special state, so there are just dashes in their FLAGS field. The fifth column shows the link state. The sixth column reveals that probing is disabled (or, to be more exact, that there is no probing, as we haven't configured it so far). The last column details the state of the interface; in this example it is ok, so the interface is used in the IPMP group. Okay, now pull the cable from the e1000g0 interface. It's Cable 1 in the last figure. The system automatically switches to e1000g1 as the active interface.

jmoekamp@hivemind:~# ipmpstat -i
INTERFACE   ACTIVE  GROUP       FLAGS     LINK      PROBE     STATE
rge0        yes     production0 -------   up        disabled  ok
e1000g1     yes     production0 --mb---   up        disabled  ok
e1000g0     no      production0 -------   down      disabled  failed

As you can see, the failure has been detected on the e1000g0 interface. The link is down, thus the interface is no longer active. Okay, let's repair it: Put the cable back into the port of the e1000g0 interface. After a moment, the link is up again. in.mpathd becomes aware of the RUNNING flag on the interface and assumes that the network connection got repaired, so the state of the interface is set to ok and the interface is reactivated.

jmoekamp@hivemind:~# ipmpstat -i
INTERFACE   ACTIVE  GROUP       FLAGS     LINK      PROBE     STATE
rge0        yes     production0 -------   up        disabled  ok
e1000g1     yes     production0 --mb---   up        disabled  ok
e1000g0     yes     production0 -------   up        disabled  ok
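
By the way: when your machine sits in a remote datacenter and there is nobody around to pull cables, you can trigger a comparable switch administratively with the if_mpadm command. A sketch (an offlined interface is flagged OFFLINE instead of failed, but the traffic moves to the remaining members just the same):

jmoekamp@hivemind:~# if_mpadm -d e1000g0
jmoekamp@hivemind:~# if_mpadm -r e1000g0

The -d switch detaches the interface from service, -r reattaches it.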

The problem with link-based failure detection

Just in case you've played around with the ethernet cables: ensure that IPMP chooses an interface connected via Switch A as the active interface, by pulling Cable 3 from Switch B for a moment. When you check with ipmpstat -i, the mb flags have to be assigned to the interface e1000g0 or e1000g1. As I wrote before, there are failure modes link-based failure detection can't detect. Now let's introduce such a fault. To do so, just remove Cable 4 between Switch A and Switch B.

jmoekamp@hivemind:~# ipmpstat -i
INTERFACE   ACTIVE  GROUP       FLAGS     LINK      PROBE     STATE
rge0        yes     production0 -------   up        disabled  ok
e1000g1     yes     production0 --mb---   up        disabled  ok
e1000g0     yes     production0 -------   up        disabled  ok

As there is still a link on Cables 1 and 2, everything is fine from the perspective of IPMP. It doesn't switch to the connection via rge0, which is now the only working connection to the outside world. IPMP is simply not aware of the fact that Switch A was separated from the IP link 192.168.178.0/24 by the removal of Cable 4.
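
You can see this blackout for yourself. Assuming 192.168.178.1 is the address of a router that is now only reachable via Switch B (adapt this to your network), a ping times out although ipmpstat still reports everything as ok:

jmoekamp@hivemind:~# ping 192.168.178.1
no answer from 192.168.178.1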

Probe-based failure detection

Probe-based failure detection has some additional capabilities. At first, it has all the capabilities of link-based detection: It switches over to another network card as soon as the card loses the link. But additionally, it checks the availability of the connection by pinging other IP addresses, the so-called target systems. When the system doesn't get a reply to these ICMP messages, the interface is assumed to be in a failure state and isn't used anymore; in.mpathd switches the data addresses to other interfaces. So how do you configure probe-based IPMP?

Configuration

Okay, at first we revert to the original state of the system. This is easy, we just have to unplumb the interfaces. In my example I'm unplumbing all interfaces. You could reuse the production0 interface, but I'm including it here just in case you've started reading this tutorial at this paragraph (in that case, the first three commands will fail, but the explicitly defined IPMP interface is removed). It's important that you unplumb the member interfaces of the group before you unplumb the IPMP interface, otherwise you get an error message:

jmoekamp@hivemind:/etc# ifconfig e1000g0 unplumb
jmoekamp@hivemind:/etc# ifconfig e1000g1 unplumb
jmoekamp@hivemind:/etc# ifconfig rge0 unplumb
jmoekamp@hivemind:/etc# ifconfig production0 unplumb

Okay, now all the interfaces are gone. Let's recreate the IPMP interface.

jmoekamp@hivemind:/etc# ifconfig production0 ipmp hivemind-prod up

We can check the successful creation of the IPMP interface by using the ipmpstat command.

jmoekamp@hivemind:/etc# ipmpstat -g
GROUP       GROUPNAME   STATE     FDT       INTERFACES
production0 production0 failed    --        --

At the start there is no interface configured into the IPMP group. So let's start to fill the group with some life.

jmoekamp@hivemind:/etc# ifconfig e1000g0 plumb hivemind-prod-e1000g0 -failover group production0 up

There is an important difference: this ifconfig statement contains a hostname, whose IP address is assigned to the physical interface. This automatically configures IPMP to use probe-based failure detection. The idea behind the -failover setting gets clearer now: Obviously the test address of an interface should not be failed over by IPMP, it should stay on its physical interface. As the interface has the NOFAILOVER flag, the complete interface, including its IP address, is exempted from any failover. Let's check the IPMP group again:

jmoekamp@hivemind:/etc# ipmpstat -g
GROUP       GROUPNAME   STATE     FDT       INTERFACES
production0 production0 ok        10.00s    e1000g0

There is now an interface in the group. Of course an IPMP group with just one interface doesn't really make sense, so we will configure a second interface into the group in a moment. But first, you may have recognized the FDT column. FDT stands for "Failure Detection Time". Why does this number get its own column? Due to the dynamic nature of the failure detection time, the FDT may be different for every group. With this column you can check the current FDT.
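
The default of 10 seconds is set in the configuration file of in.mpathd, /etc/default/mpathd; the value is specified in milliseconds. A sketch of how you would check and change it (5000 is just an example value):

jmoekamp@hivemind:/etc# grep FAILURE_DETECTION_TIME /etc/default/mpathd
FAILURE_DETECTION_TIME=10000

After editing the file (for example to FAILURE_DETECTION_TIME=5000), tell in.mpathd to reread its configuration:

jmoekamp@hivemind:/etc# pkill -HUP in.mpathd

Okay, back to our group. Let's configure the second interface: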

jmoekamp@hivemind:/etc# ifconfig e1000g1 plumb hivemind-prod-e1000g1 -failover group production0 up

Let’s check again.

jmoekamp@hivemind:/etc# ipmpstat -g
GROUP       GROUPNAME   STATE     FDT       INTERFACES
production0 production0 ok        10.00s    e1000g1 e1000g0

Now we add the third interface that is connected to the default gateway just via Switch B.

jmoekamp@hivemind:/etc# ifconfig rge0 plumb hivemind-prod-rge0 -failover group production0 up

Let’s check again.

jmoekamp@hivemind:/etc# ipmpstat -g
GROUP       GROUPNAME   STATE     FDT       INTERFACES
production0 production0 ok        10.00s    rge0 e1000g1 e1000g0

All three interfaces are in the IPMP group now. And that's all … we've just activated failure detection and failover with these four commands. Really simple, isn't it?
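
Before we start pulling cables again, it's worth knowing whom in.mpathd actually probes. It selects the targets on its own: the default router on the subnet or, when there is none, some hosts on the local network. You can inspect the current choice with the target mode of ipmpstat, and you can pin the targets explicitly with static host routes. A sketch, assuming 192.168.178.1 is a system you want to probe:

jmoekamp@hivemind:~# ipmpstat -t
jmoekamp@hivemind:~# route add -host 192.168.178.1 192.168.178.1 -static

The first command lists the probe targets per interface; the static host route tells in.mpathd to use exactly this system as a probe target.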

Playing around

I hope you still have the hardware configuration in place that I used to show the problems of link-based failure detection. In case you haven't, please recreate the configuration we've used there. At first we do a simple test: We simply unplug a cable from the system. In my case I removed Cable 1:

jmoekamp@hivemind:~# ipmpstat -i
INTERFACE   ACTIVE  GROUP       FLAGS     LINK      PROBE     STATE
rge0        yes     production0 --mb---   up        ok        ok
e1000g1     yes     production0 -------   up        ok        ok
e1000g0     no      production0 -------   down      failed    failed

The system reacts immediately, as the link-based failure detection is still active even when you use the probe-based mechanism. You can observe this in the ipmpstat output by monitoring the LINK column: it's down at the moment, and obviously the probes can't reach their targets either. The state is assumed as failed. Now plug the cable back into the system:

jmoekamp@hivemind:~# ipmpstat -i
INTERFACE   ACTIVE  GROUP       FLAGS     LINK      PROBE     STATE
rge0        yes     production0 --mb---   up        ok        ok
e1000g1     yes     production0 -------   up        ok        ok
e1000g0     no      production0 -------   up        failed    failed

The link is back, but the interface is still failed. IPMP works as designed here: the probing of the interface with ICMP messages still considers this interface as down. As we now have two mechanisms to check the availability of the interface, both have to confirm the repair. IPMP doesn't consider an interface as repaired when just one ICMP probe gets through; it waits until 20 ICMP probes have been correctly answered by the target systems. By probing at repair time instead of just relying on the link, you prevent an interface from being considered ok when an unconfigured switch brings the link back online but the switch configuration doesn't allow the server to connect anywhere (because of a missing VLAN configuration, for example).
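
While you wait for the repair, you can watch the probes live in the probe mode of ipmpstat; it prints one line per probe with the target and the measured round-trip times (interrupt it with Ctrl-C):

jmoekamp@hivemind:~# ipmpstat -p

As soon as enough probes have been answered, ipmpstat -i reports the interface as ok again: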

jmoekamp@hivemind:~# ipmpstat -i
INTERFACE   ACTIVE  GROUP       FLAGS     LINK      PROBE     STATE
rge0        yes     production0 --mb---   up        ok        ok
e1000g1     yes     production0 -------   up        ok        ok
e1000g0     yes     production0 -------   up        ok        ok
jmoekamp@hivemind:~#

As soon as the probing of the interface is successful, it brings the interface back to the OK state and everything is fine. Now we get to a more interesting use case of probe-based failure detection. Let’s assume we’ve repaired everything and all is fine. You should see a situation similar to this one in your ipmpstat output:

jmoekamp@hivemind:~# ipmpstat -i
INTERFACE   ACTIVE  GROUP       FLAGS     LINK      PROBE     STATE
rge0        yes     production0 -------   up        ok        ok
e1000g1     yes     production0 -------   up        ok        ok
e1000g0     yes     production0 --mb---   up        ok        ok

Now unplug Cable 4, the cable between Switch A and Switch B. At first nothing happens, but a few seconds later IPMP switches the IP addresses to rge0 and sets the state of the other interfaces to failed.

jmoekamp@hivemind:~# ipmpstat -i
INTERFACE   ACTIVE  GROUP       FLAGS     LINK      PROBE     STATE
rge0        yes     production0 --mb---   up        ok        ok
e1000g1     no      production0 -------   up        failed    failed
e1000g0     no      production0 -------   up        failed    failed

When you look at the output of ipmpstat, you will notice that the link is still up, but the probes have failed; thus the interfaces were set into the state failed. When you plug Cable 4 back into the switches, nothing will happen at first. You have to wait until the probing mechanism reports that the ICMP messages were correctly answered by the target systems.

jmoekamp@hivemind:~# ipmpstat -i
INTERFACE   ACTIVE  GROUP       FLAGS     LINK      PROBE     STATE
rge0        yes     production0 --mb---   up        ok        ok
e1000g1     no      production0 -------   up        failed    failed
e1000g0     no      production0 -------   up        failed    failed

After a few seconds, ipmpstat should report that everything is well again.

jmoekamp@hivemind:~# ipmpstat -i
INTERFACE   ACTIVE  GROUP       FLAGS     LINK      PROBE     STATE
rge0        yes     production0 --mb---   up        ok        ok
e1000g1     yes     production0 -------   up        ok        ok
e1000g0     yes     production0 -------   up        ok        ok

Making the configuration boot persistent

As you have surely noticed, all this configuration took place with the ifconfig command. This configuration is lost when you reboot the system. But there is already an entity that configures the interfaces at system start, and it uses the hostname.* files. Thus we can use these files for IPMP as well.

Boot-persistent link-based configuration

Okay, to recreate our link-based IPMP configuration in a boot-persistent way, we need to fill the hostname.* files with the following statements:

jmoekamp@hivemind:/etc# echo "ipmp group production0 hivemind-prod up" > /etc/hostname.production0
jmoekamp@hivemind:/etc# echo "group production0 -failover up" > /etc/hostname.e1000g0
jmoekamp@hivemind:/etc# echo "group production0 -failover up" > /etc/hostname.e1000g1
jmoekamp@hivemind:/etc# echo "group production0 -failover up" > /etc/hostname.rge0

We reboot the system now to ensure that we did everything correctly. When the system has booted up, we will check if we made an error.

jmoekamp@hivemind:~$ ipmpstat -g
GROUP       GROUPNAME   STATE     FDT       INTERFACES
production0 production0 ok        --        rge0 e1000g1 e1000g0
jmoekamp@hivemind:~$ ipmpstat -i
INTERFACE   ACTIVE  GROUP       FLAGS     LINK      PROBE     STATE
rge0        yes     production0 -------   up        disabled  ok
e1000g1     yes     production0 -------   up        disabled  ok
e1000g0     yes     production0 --mb---   up        disabled  ok

Looks good. Now let’s look into the list of interfaces.

jmoekamp@hivemind:~$ ifconfig -a
lo0: flags=2001000849<UP,LOOPBACK,RUNNING,MULTICAST,IPv4,VIRTUAL> mtu 8232 index 1
	inet 127.0.0.1 netmask ff000000 
production0: flags=8001000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4,IPMP> mtu 1500 index 2
	inet 192.168.178.200 netmask ffffff00 broadcast 192.168.178.255
	groupname production0
e1000g0: flags=9040843<UP,BROADCAST,RUNNING,MULTICAST,DEPRECATED,IPv4,NOFAILOVER> mtu 1500 index 3
	inet 0.0.0.0 netmask ff000000 broadcast 0.255.255.255
	groupname production0
e1000g1: flags=9040843<UP,BROADCAST,RUNNING,MULTICAST,DEPRECATED,IPv4,NOFAILOVER> mtu 1500 index 4
	inet 0.0.0.0 netmask ff000000 broadcast 0.255.255.255
	groupname production0
rge0: flags=9040843<UP,BROADCAST,RUNNING,MULTICAST,DEPRECATED,IPv4,NOFAILOVER> mtu 1500 index 5
	inet 0.0.0.0 netmask ff000000 broadcast 0.255.255.255
	groupname production0
lo0: flags=2002000849<UP,LOOPBACK,RUNNING,MULTICAST,IPv6,VIRTUAL> mtu 8252 index 1
	inet6 ::1/128 
jmoekamp@hivemind:~$ 

IPMP configured and boot-persistent? Check.

Boot-persistent probe-based configuration

We can do the same for the probe-based IPMP:

jmoekamp@hivemind:/etc# echo "ipmp group production0 hivemind-prod up" > /etc/hostname.production0
jmoekamp@hivemind:/etc# echo "group production0 -failover hivemind-prod-e1000g0 up" > /etc/hostname.e1000g0
jmoekamp@hivemind:/etc# echo "group production0 -failover hivemind-prod-e1000g1 up" > /etc/hostname.e1000g1
jmoekamp@hivemind:/etc# echo "group production0 -failover hivemind-prod-rge0 up" > /etc/hostname.rge0

Reboot the system and login afterwards to check the list of interfaces.

jmoekamp@hivemind:~$ ifconfig -a
lo0: flags=2001000849<UP,LOOPBACK,RUNNING,MULTICAST,IPv4,VIRTUAL> mtu 8232 index 1
	inet 127.0.0.1 netmask ff000000 
production0: flags=8001000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4,IPMP> mtu 1500 index 2
	inet 192.168.178.200 netmask ffffff00 broadcast 192.168.178.255
	groupname production0
e1000g0: flags=9040843<UP,BROADCAST,RUNNING,MULTICAST,DEPRECATED,IPv4,NOFAILOVER> mtu 1500 index 3
	inet 192.168.178.201 netmask ffffff00 broadcast 192.168.178.255
	groupname production0
e1000g1: flags=9040843<UP,BROADCAST,RUNNING,MULTICAST,DEPRECATED,IPv4,NOFAILOVER> mtu 1500 index 4
	inet 192.168.178.202 netmask ffffff00 broadcast 192.168.178.255
	groupname production0
rge0: flags=9040843<UP,BROADCAST,RUNNING,MULTICAST,DEPRECATED,IPv4,NOFAILOVER> mtu 1500 index 5
	inet 192.168.178.203 netmask ffffff00 broadcast 192.168.178.255
	groupname production0
lo0: flags=2002000849<UP,LOOPBACK,RUNNING,MULTICAST,IPv6,VIRTUAL> mtu 8252 index 1
	inet6 ::1/128 

Let’s check the configuration via ipmpstat, too:

jmoekamp@hivemind:~$ ipmpstat -i
INTERFACE   ACTIVE  GROUP       FLAGS     LINK      PROBE     STATE
rge0        yes     production0 --mb---   up        ok        ok
e1000g1     yes     production0 -------   up        ok        ok
e1000g0     yes     production0 -------   up        ok        ok
jmoekamp@hivemind:~$ 

Everything is fine.
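
One last sketch: in case you added a second data address earlier, you can make it boot-persistent as well. The hostname.production0 file may contain multiple lines, each handed to ifconfig in turn (192.168.178.210 is still just the assumed example address):

jmoekamp@hivemind:/etc# cat /etc/hostname.production0
ipmp group production0 hivemind-prod up
addif 192.168.178.210/24 up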