Remote Mirroring with the Availability Suite

Introduction

Solaris was designed with commercial customers in mind. Thus this operating environment has some capabilities that are of little use to the SOHO user, but absolutely essential for enterprise users. One of these capabilities is the remote replication of disk volumes. Imagine the following situation: You have a database and a filesystem for a central application of your company. The filesystem stores binary objects (for example images). The application is so important for your company that you plan to build a replicated site. And now your problems start.

Replicating a database is fairly easy. Most databases have some functionality to do a master/replica replication. Filesystems are a little bit harder. Of course you can use rsync, but what about the data written between the last rsync run and the failure of your main system? And how do you keep the database and the filesystem consistent?

The Solaris operating environment has a feature to solve this problem. It's called Availability Suite (AVS for short). It's a rather old feature, but I would call it a matured one. The first versions of the tool weren't really enterprise-ready, which earned the feature some rude nicknames, but that's long ago ...

AVS was designed to give the operating environment the capability to replicate a volume to another site independently of the way it's used. Thus it's irrelevant whether the volume is used by a filesystem or as a raw device for a database. You can even use it to give ZFS the capability to do synchronous replication (a feature missing today).

AVS has a point-in-time copy feature (something similar to snapshots) as well, but this tutorial will concentrate on the remote mirror capability.

Some words about Sun StorageTek Availability Suite first: We open-sourced the product quite a while ago. While it's a commercial product for Solaris, we've integrated it into Solaris Express Community Edition and Developer Edition.

Implementation of the replication

The integration of the replication into Solaris is relatively simple. The replication is done by a filter driver in front of the storage device drivers. You can think of it like this: Data destined for the harddisk is intercepted by this driver before it reaches the disk, and is handled according to the configuration. The driver can write the data directly to an equivalent driver on a different system, or to a queue for later transmission. In addition, the data continues down the normal path to the local harddisk.

Wording

Some definitions are important to understand the following text:

primary host/primary volume:

The volume or host which acts as the source of the replication

secondary host/secondary volume:

The volume or host which acts as the target of the replication

bitmap volume:

Each primary and secondary volume has a so-called bitmap volume. This bitmap volume is used to record which parts of the primary or secondary volume have changed while replication has failed or was deliberately stopped.

Synchronous Replication

AVS supports two modes of replication. The first mode is synchronous replication. From the standpoint of ensuring that master and replica are identical volumes, this is the best method. A system call writing to a replicated volume doesn't complete until the replica has confirmed that the data was written to the secondary volume.

In this mode of operation it's ensured that all data committed to disk is on both volumes. This is important for databases, for example. But it has a disadvantage as well: A single write takes much longer. You have to factor in the round-trip time on the network and the amount of time needed to write the data on the replicated disk.

Asynchronous Replication

For this reason, asynchronous replication was introduced. You can use it in environments with less strict requirements. In this mode the write system call returns when the data has been written to the local volume and to a per-volume queue. From this queue the data is sent to the secondary host. When the secondary host has written the data to disk, it acknowledges this and the primary host deletes the data from the queue.

This method has the advantage of introducing much less latency to the writes of the system. Especially for long-range replication this is a desired behavior. But it comes at a price: In the case of a failure, a committed write from an application may still reside in the replication queue, not yet transmitted to the secondary host.

Choosing the correct mode

Choosing the correct mode is difficult. You have to make a trade-off between performance and integrity. You can speed up synchronous replication with faster networks and fewer hops between both hosts, but as soon as you leave your campus this can be a costly endeavor. You should think about one point: Is the need for the "up to the last committed transaction" guarantee a real or an assumed need?

My experience from my last seven years at Sun: As soon as you show people the price tag of "99.999%" availability, the requirements get more differentiated. Or to bring it into the context of remote mirroring: Before ordering that extremely fast and extremely expensive leased line, you should talk with the stakeholders about whether they really need synchronous replication.

Synchronization

Okay, the replication takes care of keeping the primary volume and the replica identical while the software runs. But how do you sync both volumes when starting the replication, or later on, when the replication was interrupted? The process that solves this is called synchronization.

AVS Remote Mirror knows four modes of replication:

Full replication:

You do at least one full replication with every volume: it's the first one. The full replication copies all data to the secondary volume.

Update replication:

When the volume is in logging mode, the changes to the primary volume are recorded on the bitmap volume. Thus with the update replication you can transmit only the changed blocks to the secondary volume.

Full reverse replication:

This is the other way round. Let's assume you've done a failover to your remote site, and you've worked on the replicated volume for some time. Now you want to switch back to your normal datacenter. You have to transport the changes made in the meantime back to your primary site as well. Thus there is a replication mode called reverse replication. The full reverse replication copies all data from the secondary volume to the primary volume.

Update reverse replication:

The secondary volume has a bitmap volume as well. Thus you can do an update replication from the secondary to the primary volume as well.

Okay, but which mode of replication should you choose? For the first replication it's easy ... full replication. After this, there is a simple rule of thumb: Whenever in doubt about the integrity of the target volume, use the full replication.

Logging

Finally there is another important term in this technology: logging. This has nothing to do with writing log messages about the daemons of AVS. Logging is a special mode of operation. This mode is entered when the replication is interrupted. In this case the changes to the primary and secondary volumes are recorded in the bitmap volumes. It's important to note that logging doesn't record the change itself; it stores only the information that a part of the volume has changed. Logging makes the resynchronization of volumes after a disaster more efficient, as you only have to resync the changed parts of a volume, as explained for the mechanism of update replication before.

Prerequisites for this tutorial

In this tutorial we need to work with two systems. I will use theoden and gandalf again.

10.211.55.200 theoden
10.211.55.201 gandalf

Layout of the disks

Both systems have a boot disk and two disks for data. The data disks have a size of 512 MB. In my example these disks are c1d0 and c1d1. I've partitioned each disk in the same manner:

# prtvtoc /dev/rdsk/c1d0s2
* /dev/rdsk/c1d0s2 partition map
[..]
*                          First     Sector    Last
* Partition  Tag  Flags    Sector     Count    Sector  Mount Directory
       0      0    00       6144     65536     71679
       1      0    00      71680    968704   1040383
       2      5    01          0   1042432   1042431
       8      1    01          0      2048      2047
       9      9    00       2048      4096      6143

It’s important that the primary and secondary partitions and their respective bitmap partitions are equal in size. Furthermore: Don’t use cylinder 0 for partitions under the control of AVS. This cylinder may contain administrative information from other components of the systems. Replication of this information may lead to data loss.

Size for the bitmap volume

In my example I've chosen a 32 MB bitmap partition for a 512 MB data partition. This is far larger than necessary for this use case.

You can calculate the size for the bitmap volume as follows: \(Size_\text{Bitmap volume in kB}=1+(Size_\text{Data volume in GB}\times 4)\) Let's assume a 10 GB volume for data: \(Size_\text{Bitmap volume in kB}=1+(10\times 4)\) \(Size_\text{Bitmap volume in kB}=41\ \text{kB}\)
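
To avoid doing this by hand, the rule of thumb can be wrapped into a tiny shell function. This is a sketch of my own; the helper name is not part of the AVS tooling:

```shell
# Rule of thumb from above: bitmap size in kB = 1 + (data volume size in GB * 4).
# The function name is my own invention, not an AVS command.
bitmap_size_kb() {
  data_gb=$1
  echo $((1 + data_gb * 4))
}

bitmap_size_kb 10   # a 10 GB data volume needs a 41 kB bitmap volume
```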

Usage of the devices in our example

The first partition c1dXs0 is used as the bitmap volume. The second partition c1dXs1 is used as the primary volume on the source of the replication, or as the secondary volume on the target of the replication.

Setting up a synchronous replication

Okay, first we have to enable AVS on both hosts. We activate it on theoden:

[root@theoden:~]$ dscfgadm -e

This command may ask for approval to create the config database when you run it for the first time. Answer this question with y. After this we switch to gandalf to do the same.

[root@gandalf:~]$ dscfgadm -e

Now we can establish the replication. We login at theoden first, and configure this replication.

[root@theoden:~]$ sndradm -e theoden /dev/rdsk/c1d0s1 /dev/rdsk/c1d0s0 gandalf /dev/rdsk/c1d0s1 /dev/rdsk/c1d0s0 ip sync
Enable Remote Mirror? (Y/N) [N]: y

What have we configured? We told AVS to replicate the content of /dev/rdsk/c1d0s1 on theoden to /dev/rdsk/c1d0s1 on gandalf. AVS uses the /dev/rdsk/c1d0s0 volume on each system as the bitmap volume for this replication. At the end of this command we configure that the replication uses IP and that it's a synchronous replication.
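
The argument order of sndradm -e is easy to get wrong, so here is its general shape as a sketch. The placeholder variables are my own; only the final echoed command line corresponds to the actual syntax used above:

```shell
# sndradm -e <phost> <pdev> <pbitmap> <shost> <sdev> <sbitmap> <protocol> <mode>
PHOST=theoden; PDEV=/dev/rdsk/c1d0s1; PBMP=/dev/rdsk/c1d0s0   # primary side
SHOST=gandalf; SDEV=/dev/rdsk/c1d0s1; SBMP=/dev/rdsk/c1d0s0   # secondary side
MODE=sync                                                     # or: async

echo "sndradm -e $PHOST $PDEV $PBMP $SHOST $SDEV $SBMP ip $MODE"
```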

Okay, but we have to configure it on the targeted system of the replication as well:

[root@gandalf:~]$ sndradm -e theoden /dev/rdsk/c1d0s1 /dev/rdsk/c1d0s0 gandalf /dev/rdsk/c1d0s1 /dev/rdsk/c1d0s0 ip sync
Enable Remote Mirror? (Y/N) [N]: y

We repeat the same command we used on theoden on gandalf as well. Forgetting to do this step is one of the most frequent errors in regard to setting up a remote mirror.

An interesting command in regard of AVS remote mirror is the dsstat command. It shows the status and some statistic data about your replication.

[root@theoden:~]$ dsstat -m sndr
name              t  s    pct role    kps   tps  svt
dev/rdsk/c1d0s1   P  L 100.00  net      0     0    0
dev/rdsk/c1d0s0                bmp      0     0    0

The 100.00 doesn't stand for "100% of the replication is completed". It stands for "100% of the replication still to do". We have to start the replication manually. Okay, more formally the meaning of this column is "percentage of the volume in need of syncing". And as we've freshly configured this replication, it's obvious that the complete volume needs synchronisation.

Two other columns are important, too: the t and the s column. The t column designates the volume type and the s column the status of the volume. In this case we've observed the primary volume, and it's in logging mode: it records changes, but doesn't replicate them to the secondary volume right now.

Okay, so let’s start the synchronisation:

[root@theoden:~]$  sndradm -m gandalf:/dev/rdsk/c1d0s1
Overwrite secondary with primary? (Y/N) [N]: y

We can look up the progress of the sync with the dsstat command again:

[root@theoden:~]$  dsstat -m sndr
name              t  s    pct role    kps   tps  svt
dev/rdsk/c1d0s1   P SY  97.39  net    Inf     0 -NaN
dev/rdsk/c1d0s0                bmp    Inf     0 -NaN
[root@theoden:~]$  dsstat -m sndr
name              t  s    pct role    kps   tps  svt
dev/rdsk/c1d0s1   P SY  94.78  net    Inf     0 -NaN
dev/rdsk/c1d0s0                bmp    Inf     0 -NaN
[...]
[root@theoden:~]$  dsstat -m sndr
name              t  s    pct role    kps   tps  svt
dev/rdsk/c1d0s1   P SY   3.33  net    Inf     0 -NaN
dev/rdsk/c1d0s0                bmp    Inf     0 -NaN
[root@theoden:~]$  dsstat -m sndr
name              t  s    pct role    kps   tps  svt
dev/rdsk/c1d0s1   P  R   0.00  net      0     0    0
dev/rdsk/c1d0s0                bmp      0     0    0

When we start the synchronization the status of the volume switches to SY for synchronizing. After a while the sync is complete. The status switches again, this time to R for replicating. From this moment all changes to the primary volume will be replicated to the secondary one.

Now let’s play around with our new replication set by using the primary volume. Create a filesystem on it for example, mount it and play around with it:

[root@theoden:~]$ newfs /dev/dsk/c1d0s1
newfs: construct a new file system /dev/rdsk/c1d0s1: (y/n)? y
/dev/rdsk/c1d0s1:       968704 sectors in 473 cylinders of 64 tracks, 32 sectors
        473.0MB in 30 cyl groups (16 c/g, 16.00MB/g, 7680 i/g)
super-block backups (for fsck -F ufs -o b=#) at:
 32, 32832, 65632, 98432, 131232, 164032, 196832, 229632, 262432, 295232,
 656032, 688832, 721632, 754432, 787232, 820032, 852832, 885632, 918432, 951232
[root@theoden:~]$ mount /dev/dsk/c1d0s1 /mnt
[root@theoden:~]$ cd /mnt
[root@theoden:~]$ touch test
[root@theoden:~]$ cp /var/log/* .
[root@theoden:~]$ mkfile 1k test2

Okay, in a few moments I will show you that all changes really get to the other side.

Testing the replication

One of the most essential tasks when configuring a disaster recovery mechanism is training the procedure. So let's test the switch to our remote datacenter. We will simulate a failure now. This will show that the data really gets replicated to a different system ;)

Disaster test

Okay, at first we leave a timestamp in our replicated filesystem, just to test this feature. I assume that it's still mounted on the primary host.

[root@theoden:~]$ cd /mnt
[root@theoden:~]$ ls 
aculog      blah2       lastlog     messages    sulog       utmpx
blah        ds.log      lost+found  spellhist   test        wtmpx
[root@theoden:~]$ date > timetest
[root@theoden:~]$ cat timetest
Sat Mar 29 19:28:51 CET 2008
[root@theoden:~]$ cd /
[root@theoden:~]$ umount /mnt

Please keep the timestamp in mind. Now we switch both mirrors into logging mode. As an alternative you could disconnect the network cable; this has the same effect. Whenever the network link between the two hosts is unavailable, both volumes will be set to logging mode. As I use virtual servers, I can't disconnect a network cable, thus I can't do it this way. Okay ...

[root@theoden:~]$ sndradm -l
Put Remote Mirror into logging mode? (Y/N) [N]: y

When you look at the status of the replication on theoden, you will see the logging state again.

[root@theoden:~]$ dsstat
name              t  s    pct role   ckps   dkps   tps  svt
dev/rdsk/c1d0s1   P  L   0.00  net      -      0     0    0
dev/rdsk/c1d0s0                bmp      0      0     0    0

On gandalf it’s the same.

[root@gandalf:~]$ dsstat
name              t  s    pct role   ckps   dkps   tps  svt
dev/rdsk/c1d0s1   S  L   0.00  net      -      0     0    0
dev/rdsk/c1d0s0                bmp      0      0     0    0

Okay, now we mount the secondary volume. Please keep in mind that we don't mount the volume via the network or via a dual-ported SAN. It's an independent storage device on a different system.

[root@gandalf:~]$ mount /dev/dsk/c1d0s1 /mnt
[root@gandalf:~]$ cd /mnt
[root@gandalf:~]$ ls -l
total 7854
-rw-------   1 root     root           0 Mar 29 16:43 aculog
[..]
-rw-r--r--   1 root     root          29 Mar 29 19:28 timetest
-rw-r--r--   1 root     root        2232 Mar 29 16:43 utmpx
-rw-r--r--   1 root     root       43152 Mar 29 16:43 wtmpx

Okay, there is a file called timetest. Let’s look for the data in the file.

[root@gandalf:~]$ cat timetest
Sat Mar 29 19:28:51 CET 2008

The file and its content got replicated to the secondary volume instantaneously. Okay, now let's switch back to the primary host, but we create another file with a timestamp before doing that.

[root@gandalf:~]$ date > timetest2
[root@gandalf:~]$ cat timetest2
Sat Mar 29 19:29:10 CET 2008
[root@gandalf:~]$ cd /
[root@gandalf:~]$ umount /mnt

Okay, we changed the secondary volume by adding this file, so we have to sync our primary volume. Thus we do an update reverse synchronisation:

[root@theoden:~]$ sndradm -u -r
Refresh primary with secondary? (Y/N) [N]: y
[root@theoden:~]$ dsstat
name              t  s    pct role   ckps   dkps   tps  svt
dev/rdsk/c1d0s1   P  R   0.00  net      -      0     0    0
dev/rdsk/c1d0s0                bmp      0      0     0    0

This has two consequences: The changes to the secondary volume are transmitted to the primary volume (as we use the update sync, we transmit just these changes), and the replication is started again. Okay, but let's check for our second timestamp file. We mount the filesystem by using the primary volume.

[root@theoden:~]$ mount /dev/dsk/c1d0s1 /mnt
[root@theoden:~]$ cd /mnt
[root@theoden:~]$ ls -l
total 7856
-rw-------   1 root     root           0 Mar 29 16:43 aculog
[...]
-rw-r--r--   1 root     root          29 Mar 29 19:28 timetest
-rw-r--r--   1 root     root          29 Mar 29 19:32 timetest2
[...]
[root@theoden:~]$ cat timetest2
Sat Mar 29 19:29:10 CET 2008

Et voila, you find two files beginning with timetest, and the second one contains the new timestamp we wrote to the filesystem while using the secondary volume on the secondary host. Neat, isn't it?

Asynchronous replication and replication groups

A new scenario: Okay, the filesystem gets replicated now. Let's assume that we use /dev/rdsk/c1d0s1 for a database. The filesystem and the database partition are used by the same application, and it's important that the metadata in the database and the binary objects stay in sync even when you've switched over to the remote site, albeit it's acceptable to lose the last few transactions, as both sites are 1000 km away from each other and synchronous replication is not an option.

The problem

When you use synchronous replication, all is well. But let's assume you've chosen asynchronous replication. Under these circumstances a situation can occur where one queue is processed faster than another. Thus the on-disk state of each volume may be consistent in itself, but the volumes may reflect states at different points in time, leaving the application's data model inconsistent.

Such behavior is problematic when you have a database volume and a filesystem volume working together for an application, but the results can be catastrophic when you use a database split over several volumes.

The solution to this problem is a mechanism that keeps the writes to a group of volumes in order for the complete group. Thus such inconsistencies can't occur.

Replication Group

To solve such problems, AVS supports a concept called Replication Group. Adding volumes to a replication group has some implications:

  • All administrative operations on this group are atomic. Thus when you change to logging mode or start a replication, this happens on all volumes in the group.

  • The writes to any of the primary volumes will be replicated in the same order to the secondary volumes. The scope of this ordering is the complete group, not the single volume.

  • Normally every replication relation has its own queue and its own queue flusher daemon, so multiple volumes can flush their queues in parallel to increase performance. In the case of a replication group, all I/O operations are routed through a single queue. This may reduce the performance.

How to set up a replication group?

Okay, first we log in to theoden, our primary host in this example. We have to add the existing replication to the replication group and configure another replication relation directly in the correct group. I will create a replication group called importantapp.

[root@theoden:~]$ sndradm -R g importantapp gandalf:/dev/rdsk/c1d0s1
Perform Remote Mirror reconfiguration? (Y/N) [N]: y

We've added the group property to the existing mirror; now we create the new mirror directly in the correct group:

[root@theoden:~]$ sndradm -e theoden /dev/rdsk/c1d1s1 /dev/rdsk/c1d1s0 gandalf /dev/rdsk/c1d1s1 /dev/rdsk/c1d1s0 ip sync g importantapp
Enable Remote Mirror? (Y/N) [N]: y

With sndradm -P you can look up the exact configuration of your replication sets:

[root@theoden:~]$ sndradm -P
/dev/rdsk/c1d0s1        ->      gandalf:/dev/rdsk/c1d0s1
autosync: off, max q writes: 4096, max q fbas: 16384, async threads: 2, mode: sync, group: importantapp, state: syncing
/dev/rdsk/c1d1s1        ->      gandalf:/dev/rdsk/c1d1s1
autosync: off, max q writes: 4096, max q fbas: 16384, async threads: 2, mode: sync, group: importantapp, state: syncing

Okay, both are in the same group. As before, we have to perform this configuration on both hosts, so we repeat the same steps on the other host as well:

[root@gandalf:~]$ sndradm -e theoden /dev/rdsk/c1d1s1 /dev/rdsk/c1d1s0 gandalf /dev/rdsk/c1d1s1 /dev/rdsk/c1d1s0 ip sync g importantapp
Enable Remote Mirror? (Y/N) [N]: y
[root@gandalf:~]$ sndradm -R g importantapp gandalf:/dev/rdsk/c1d0s1
Perform Remote Mirror reconfiguration? (Y/N) [N]: y

Now we start the replication of both volumes. We can do this in a single step by using the name of the group.

[root@theoden:~]$ sndradm -m -g importantapp
Overwrite secondary with primary? (Y/N) [N]: y

Voila, both volumes are in synchronizing mode:

[root@theoden:~]$ dsstat
name              t  s    pct role   ckps   dkps   tps  svt
dev/rdsk/c1d0s1   P SY  89.37  net      -    Inf     0 -NaN
dev/rdsk/c1d0s0                bmp      0     28     0 -NaN
dev/rdsk/c1d1s1   P SY  88.02  net      -    Inf     0 -NaN
dev/rdsk/c1d1s0                bmp      0     28     0 -NaN

Two minutes later the synchronization has completed, and we now have a fully operational replication group:

[root@theoden:~]$  dsstat
name              t  s    pct role   ckps   dkps   tps  svt
dev/rdsk/c1d0s1   P  R   0.00  net      -      0     0    0
dev/rdsk/c1d0s0                bmp      0      0     0    0
dev/rdsk/c1d1s1   P  R   0.00  net      -      0     0    0
dev/rdsk/c1d1s0                bmp      0      0     0    0

Now both volumes are in replicating mode. Really easy; it's just done by adding a group to the replication relations.

Deleting the replication configuration

In the last parts of this tutorial I've explained how to set up a replication relation. But it's important to know how to deactivate and delete the replication as well. It's quite easy to delete a replication. First we look up the existing replication configuration.

[root@gandalf:~]$ sndradm -P
/dev/rdsk/c1d1s1        <-      theoden:/dev/rdsk/c1d1s1
autosync: off, max q writes: 4096, max q fbas: 16384, async threads: 2, mode: sync, state: logging

Okay, we can use the local or remote volume as a name to choose the configuration to be deleted:

[root@gandalf:~]$ sndradm -d theoden:/dev/rdsk/c1d1s1
Disable Remote Mirror? (Y/N) [N]: y

Now we can lookup the configuration again.

[root@gandalf:~]$ sndradm -P

As you see, the configuration is gone. But you have to do the same on the other host, so log in as root there:

[root@theoden:~]$ sndradm -P         
/dev/rdsk/c1d1s1        ->      gandalf:/dev/rdsk/c1d1s1
autosync: off, max q writes: 4096, max q fbas: 16384, async threads: 2, mode: sync, state: logging
[root@theoden:~]$ sndradm -d gandalf:/dev/rdsk/c1d1s1
Disable Remote Mirror? (Y/N) [N]: y
[root@theoden:~]$ sndradm -P
[root@theoden:~]$

Truck based replication

Andrew S. Tanenbaum said: "Never underestimate the bandwidth of a truck full of tapes hurtling down the highway." This sounds counterintuitive at first, but when you start to think about it, it's really obvious.

The math behind the phrase

Let's assume that you have two datacenters, a thousand kilometers apart from each other. You have to transport 48 terabytes of storage. We will calculate with the harddisk marketing system: 48,000,000 megabytes. Okay ... now we assume that we have a 155 Mbit/s leased ATM line between the locations, and that we can transfer 15.5 megabytes per second over this line under perfect circumstances. Under perfect circumstances we can transfer the amount of data in 3,096,774 seconds. Thus you would need about 35 days to transmit the 48 terabytes. Now assume a station wagon with two Thumpers in the trunk (real admins don't use USB sticks, they use the X4500 for their data transportation needs), driving at 100 kilometers per hour. The data would reach the datacenter within 10 hours. Enough time to copy the data to the transport Thumpers before departure and from the Thumpers to the final storage array after arrival.
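
A quick back-of-the-envelope check of these numbers. The figures are taken from the text above; awk is used here only for the floating-point division:

```shell
# 48 TB in "marketing" megabytes over ~15.5 MB/s, the best case on a 155 Mbit/s line.
seconds=$(awk 'BEGIN { print int(48000000 / 15.5) }')
days=$(awk 'BEGIN { print int(48000000 / 15.5 / 86400) }')
echo "$seconds seconds over the wire, i.e. about $days days, vs. ~10 hours by car"
```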

Truck based replication with AVS

AVS Remote Mirror supports a procedure for exactly this method. Okay, let's assume you want to migrate a server to a new one. But this new server is 1000 km away. You have multiple terabytes of storage, and albeit your line to the new datacenter is good enough for the updates, a full sync would take longer than the universe will exist because of proton decay.

AVS Remote Mirror can be configured in a way that relies on a special condition of the primary and secondary volumes: The disks are already synchronized before the replication starts, for example by copying the data with dd to the new storage directly, or via a transport medium like tapes. When you configure AVS Remote Mirror in this way, you don't need the initial full sync.

On our old server

To play around, we create at first a new filesystem:

[root@theoden:~]$ newfs /dev/rdsk/c1d1s1 
newfs: construct a new file system /dev/rdsk/c1d1s1: (y/n)? y
/dev/rdsk/c1d1s1:       968704 sectors in 473 cylinders of 64 tracks, 32 sectors
        473.0MB in 30 cyl groups (16 c/g, 16.00MB/g, 7680 i/g)
super-block backups (for fsck -F ufs -o b=#) at:
 32, 32832, 65632, 98432, 131232, 164032, 196832, 229632, 262432, 295232,
 656032, 688832, 721632, 754432, 787232, 820032, 852832, 885632, 918432, 951232

Now mount it, play around with it and put a timestamp in a file.

[root@theoden:~]$ mount /dev/dsk/c1d1s1 /mnt
[root@theoden:~]$ touch /mnt/test1
[root@theoden:~]$ mkfile 1k /mnt/test2
[root@theoden:~]$ mkfile 1k /mnt/test3
[root@theoden:~]$ mkfile 1k /mnt/test4
[root@theoden:~]$ date >> /mnt/test5

Okay, now unmount it again.

[root@theoden:~]$ umount /mnt

Now we can generate a backup of this filesystem. You have to make an image of the volume; making a tar or cpio file backup isn't sufficient.

[root@theoden:~]$ dd if=/dev/rdsk/c1d1s1 | gzip > 2migrate.gz
968704+0 records in
968704+0 records out
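
One prudent extra step that isn't part of the procedure above: checksum the image before and after the transport, so you can be sure the secondary volume really starts from an identical block image. A sketch with sha256sum (an assumption on my side; a stand-in file is used instead of the real image):

```shell
# Stand-in for the real image; in the tutorial this would be 2migrate.gz.
printf 'pretend this is the dd image' > /tmp/2migrate.gz

# Record the checksum on the old server and ship it along with the image ...
sha256sum /tmp/2migrate.gz | awk '{ print $1 }' > /tmp/2migrate.sha256

# ... then compare on the new server before writing the image to the raw device.
[ "$(sha256sum /tmp/2migrate.gz | awk '{ print $1 }')" = "$(cat /tmp/2migrate.sha256)" ] \
  && echo "image intact"
```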

Okay, now activate the replication on the primary volume. Don't activate it on the secondary one! The important difference from a normal replication is the -E switch. When you use this switch, the system assumes that the primary and secondary volumes are already identical.

[root@theoden:~]$ sndradm -E theoden /dev/rdsk/c1d1s1 /dev/rdsk/c1d1s0 gandalf /dev/rdsk/c1d1s1 /dev/rdsk/c1d1s0 ip sync
Enable Remote Mirror? (Y/N) [N]: y

Okay, we've used the -E switch to circumvent the need for a full synchronisation. When you look at the status of the volume, you will see it in the "logging" state:

[root@theoden:~]$ dsstat
name              t  s    pct role   ckps   dkps   tps  svt
dev/rdsk/c1d1s1   P  L   0.00  net      -      0     0    0
dev/rdsk/c1d1s0                bmp      0      0     0    0

This means that you can still make changes to the volume.

[root@theoden:~]$ mount /dev/dsk/c1d1s1 /mnt
[root@theoden:~]$ cat /mnt/test5
Mon Mar 31 14:57:04 CEST 2008
[root@theoden:~]$ date >> /mnt/test6
[root@theoden:~]$ cat /mnt/test6
Mon Mar 31 15:46:03 CEST 2008

Now we transmit our image of the primary volume to the new system. In my case it's scp, but for huge amounts of data, sending the truck with tapes would be more sensible.

[root@theoden:~]$ scp 2migrate.gz jmoekamp@gandalf:/export/home/jmoekamp/2migrate.gz
Password: 
2migrate.gz           100% |*****************************|  1792 KB    00:00

On our new server

Okay, when the transmission is completed, we write the image to the raw device of the secondary volume:

[root@gandalf:~]$ cat 2migrate.gz | gunzip | dd of=/dev/rdsk/c1d1s1
968704+0 records in
968704+0 records out

Okay, now we configure the replication on the secondary host:

[root@gandalf:~]$ sndradm -E theoden /dev/rdsk/c1d1s1 /dev/rdsk/c1d1s0 gandalf /dev/rdsk/c1d1s1 /dev/rdsk/c1d1s0 ip sync

A short look into the status of replication:

[root@gandalf:~]$ dsstat
name              t  s    pct role   ckps   dkps   tps  svt
dev/rdsk/c1d1s1   S  L   0.00  net      -      0     0    0
dev/rdsk/c1d1s0                bmp      0      0     0    0

Okay, our primary and secondary volumes are still in logging mode. How do we get them out of it? In our first example we did a full synchronisation; this time we only need an update synchronisation. So log in as root on the primary host and initiate such an update sync. This is the moment where you have to stop working on the primary volume.

[root@theoden:~]$ sndradm -u
Refresh secondary with primary? (Y/N) [N]: y

After this step all changes we did after creating the image from our primary volume will be synced to the secondary volume.

Testing the migration

Well ... let's test it. Do you remember that we created /mnt/test6 after making the dd image? Okay, first we put the replication into logging mode again. So log in as root on our secondary host.

[root@gandalf:~]$ sndradm -l
Put Remote Mirror into logging mode? (Y/N) [N]: y

Now we mount the secondary volume:

[root@gandalf:~]$ mount /dev/dsk/c1d1s1 /mnt
[root@gandalf:~]$ cd /mnt
[root@gandalf:~]$ ls
lost+found  test2       test4       test6
test1       test3       test5

By the virtue of the update synchronisation, the file test6 appeared on the secondary volume. Let's have a look at /mnt/test6:

[root@gandalf:~]$ cat test6
Mon Mar 31 15:46:03 CEST 2008

Cool, isn’t it ?

Conclusion

How can you use this feature? Some use cases are really obvious. It's a natural match for disaster recovery. The Sun Cluster Geographic Edition even supports this kind of remote mirror out of the box, to do cluster failover over wider distances than just a campus. But it's usable for other jobs as well, for example for migrations to a new datacenter, when you have to transport a large amount of data over long distances without a time window for a longer service interruption.

Do you want to learn more?

Documentation

Sun StorageTek Availability Suite 4.0 Software Installation and Configuration Guide[^35]

Sun StorageTek Availability Suite 4.0 Remote Mirror Software Administration Guide[^36]

misc. Links

OpenSolaris Project: Sun StorageTek Availability Suite[^37]