The Deliverer's Knowledge: Sun Cluster Step by Step - How to add an additional node to an existing single-node cluster
(Foreword from Joerg: The next installment of "The Deliverer's Knowledge" is a pretty advanced topic: Sun Cluster. This article is a cooperation: Heiko wrote the technical side of the tutorial, I expanded his notes with a few comments. So typos are my fault, not his.)
Installing a single-node cluster is drop-dead simple when you just follow the user manual. But now you've played with it for a while and it got a little bit boring. You want to see a resource switching from one node to another. So how do you get an additional node into an existing single-node cluster and expand it into a two-node cluster?
Environment
This example makes some assumptions about the environment:
OS: Solaris 10
SC: 3.2
Cluster name: t-clust
Existing node: node1
Metaset: sc-set
Resource group: sc-rg
Resources: stor-rs, lh-rs
New node: node2
NICs for interconnect: ce0, ce1
IPMP groups: node1-group@1, node2-group@2
Requirements
Furthermore there are some tasks you have to do before you start the configuration. It's important that you wire the cluster interconnect before you begin.
Afterwards you should check whether the operating system on both nodes is patched to the same level. It's good practice to have identical systems right from the start.
A new member for the cluster
Okay, first look up the version and the patch level on both nodes. They should be equal. You can check this on both nodes after installing the cluster packages.
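For example, you could compare the output of these commands on both nodes (scinstall -pv prints the Sun Cluster release and package versions):

node1# cat /etc/release
node1# uname -v
node1# scinstall -pv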
Configuring the cluster interconnect
There are some components a single-node cluster doesn’t need. You don’t need a cluster interconnect when you have just one node. Obvious, isn’t it? So let’s start by configuring the interconnect on node1:
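On node1 this boils down to adding the two transport adapters from the table above, for example:

node1# clinterconnect add node1:ce0
node1# clinterconnect add node1:ce1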
In case you have a switched interconnect, you also have to configure the so-called junctions (the interconnect switches) and the cables between adapters and switches.
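The switch names below (switch1 and switch2) are just placeholders for your own junction names; the cables connect each adapter to its switch:

node1# clinterconnect add switch1
node1# clinterconnect add switch2
node1# clinterconnect add node1:ce0,switch1
node1# clinterconnect add node1:ce1,switch2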
Now you’ve configured the interconnect, but you have to enable it before you can use it:
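If the paths are still disabled, something like this should do the trick:

node1# clinterconnect enable node1:ce0
node1# clinterconnect enable node1:ce1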
Okay, let’s check it:
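The status subcommand of clinterconnect shows the state of the configured paths:

node1# clinterconnect status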
The configuration is ready, but we have to tell the cluster that node2 is allowed to join. That's easy:
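It's a single claccess command on node1:

node1# claccess allow -h node2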
To check for successful completion you can just look into the file /etc/cluster/ccr/infrastructure.
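node2 should show up in the authorization list after the claccess command, so a simple grep is enough:

node1# grep node2 /etc/cluster/ccr/infrastructure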
Now you have to log in to your new cluster node node2. You start the configuration with all the necessary data on the command line, so the installation won't ask you interactively. Afterwards restart the cluster node.
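A sketch of such a non-interactive scinstall call on node2: -C names the cluster, -N the sponsoring node; the switch names switch1 and switch2 are placeholders again, and the -A/-m options have to match your actual wiring.

node2# scinstall -i -C t-clust -N node1 \
       -A trtype=dlpi,name=ce0 -A trtype=dlpi,name=ce1 \
       -m endpoint=:ce0,endpoint=switch1 \
       -m endpoint=:ce1,endpoint=switch2
node2# init 6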
During the boot you will see several new cluster messages. Afterwards, log in as root and check the cluster configuration.
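The clnode command gives you a quick overview of the node states; both nodes should show up as online:

node2# clnode status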
Quorum devices
Now we have to create a quorum device. For many people the quorum device is a somewhat difficult concept, but it has an important role in Sun Cluster. It prevents two partitioning effects ("cluster amnesia" and "split brain") that could lead to data corruption due to applications accessing the same data at the same time. You can find a description of these effects in the Sun Cluster Concepts Guide.
You have to know that when the cluster is partitioned, there is a vote about "Who is the operational cluster?". The rule is simple: every member of the cluster gets one vote. The cluster partition that has more votes is the operational cluster. That's easy for a three-node cluster. Whenever a partitioning occurs, you can be sure that there is one partition with two nodes and one partition with one node. Thus the vote goes 2:1 and you have a winner. But what is the situation with a split two-node cluster? Both partitions have a single node, so both have one vote - 1:1. You need a tie breaker, and the quorum device has exactly this role. In essence the quorum device is something like a flag: whoever gets it first has won and is the operational half of the cluster.
The practical side is a little bit more complex, as the vote count is configurable. You need to configure it in some situations to ensure that a cluster can start even when just one node comes up. Remember that a cluster can only come up when it's operational, and an operational cluster needs the majority of votes.
Let's assume you have a three-node cluster, so the total vote count is three. Now assume that two of your nodes have failed. The surviving node can't get the majority of votes, as it has just one vote. So you can configure a quorum device that has two votes. Your cluster then has a total vote count of five. A single member still has a vote count of one. When this member gets the quorum device, its cluster partition has a vote count of three and thus the majority.
Okay, let’s check the current quorum configuration:
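clquorum shows you the vote counts:

node1# clquorum status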
As you see ... an even total vote count. This cluster would only start if all nodes were available. First we have to choose a device for the quorum.
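cldevice lists the DID devices the cluster knows about; pick a disk that is shared between both nodes:

node1# cldevice list -v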
In our case we will use the device d6:
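Adding it as a quorum device is a one-liner:

node1# clquorum add d6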
So let’s check our quorum configuration again.
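The same clquorum command as before:

node1# clquorum status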
Nice, we have an odd number of total votes; in the case of a split, the node that gets the quorum disk is the operational one.
Final steps
But we are not at the end of the configuration yet. First we have to tell the Solaris Volume Manager that node2 is allowed to use the metaset sc-set, and thus to access the data on it.
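This is done by adding node2 as a host to the metaset:

node1# metaset -s sc-set -a -h node2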
Now you have to tell the cluster about the changes on the resource side. First you tell the resource lh-rs that it has an additional interface on the second node. Second, you add the new node to the resource group sc-rg.
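A sketch of both steps, with the IPMP group names taken from the environment table above (depending on your resource type version you may have to add the node to the resource group before updating the interface list):

node1# clresource set -p NetIfList=node1-group@1,node2-group@2 lh-rs
node1# clresourcegroup add-node -n node2 sc-rg

Afterwards you can switch the resource group to node2 and watch your logical host and storage follow it to the new node.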