Restore Failed WiDirect in Cluster

In some deployments, multiple WiDirect units run side by side so that service continues if one unit fails. This document describes the steps to restore the settings on a failed unit in such a cluster.

Many of the steps below require root access to the WiDirect. Run this command first to obtain root access:

	su -

Overview

Configure Hostname

It is important that the hostname is set correctly on both WiDirects. In version 2.3 and above, the hostname can be set on the network page.

The examples below use f1.awi6.net as the hostname of the new server, with IP address 10.8.9.123.
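
Before continuing, it can help to confirm that the system reports the expected name. The sketch below compares the example hostname from this document with the output of `uname -n`; the `hostname_matches` helper is a hypothetical name introduced for illustration only.

```shell
# Hypothetical helper: succeed when the expected and reported names are identical.
hostname_matches() {
    [ "$1" = "$2" ]
}

# Example hostname from this document; replace it with the name of your unit.
EXPECTED="f1.awi6.net"
ACTUAL="$(uname -n)"

if hostname_matches "$EXPECTED" "$ACTUAL"; then
    echo "hostname OK: $ACTUAL"
else
    echo "hostname mismatch: expected $EXPECTED, system reports $ACTUAL"
fi
```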

Install Packages

Several packages must be installed to configure WiDirect failover. First, open a new repository file:

emacs /etc/yum.repos.d/clusterlabs.repo

Add this text to the text file:

[clusterlabs]
name=High Availability/Clustering server technologies (epel-5)
baseurl=http://www.clusterlabs.org/rpm/epel-5
type=rpm-md
gpgcheck=0
enabled=1

Save the file and run these commands:

wget https://allcitywireless.com/failover/epel-release-5-4.noarch.rpm
rpm -i epel-release-5-4.noarch.rpm
yum remove awicp_reloaders
yum install awicp_reloaders_ha drbd83 kmod-drbd83* heartbeat pacemaker ipmitool

After installing the packages, reboot the WiDirect:

reboot

Create Firewall Rules

Several ports must be opened for the services to work properly. TCP ports 7788 through 7799 are used by the shared drive, and UDP port 694 is used by the process monitoring services. Add these lines to the top portion of the iptables rules file (on this platform, typically /etc/sysconfig/iptables):

-A INPUT -i eth0 -p tcp -m tcp --dport 7788:7799 --tcp-flags SYN,RST,ACK SYN -j ACCEPT
-A INPUT -i eth0 -p udp -m udp --dport 694 -j ACCEPT

[Figure: FailoverIptables.jpg]
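
To check that both rules were saved, the rules file can be searched for the two port specifications. This is a minimal sketch: the path /etc/sysconfig/iptables is an assumption (override it with the `RULES_FILE` variable), and `has_failover_rules` is a hypothetical helper name.

```shell
# Hypothetical helper: succeed when the given rules file contains both
# the DRBD port range (TCP 7788:7799) and the heartbeat port (UDP 694).
has_failover_rules() {
    grep -q -- '--dport 7788:7799' "$1" && grep -q -- '--dport 694 ' "$1"
}

# Assumed location of the saved rules; adjust if your unit differs.
RULES_FILE="${RULES_FILE:-/etc/sysconfig/iptables}"

if [ ! -r "$RULES_FILE" ]; then
    echo "cannot read $RULES_FILE"
elif has_failover_rules "$RULES_FILE"; then
    echo "failover ports are open in $RULES_FILE"
else
    echo "failover rules are missing from $RULES_FILE"
fi
```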

Configure Local Services

Run these commands on each WiDirect:

service iptables restart
service mysqld stop
chkconfig mysqld off
service dhcpd stop
chkconfig dhcpd off
service httpd stop
chkconfig httpd off
rm -rf /etc/rc3.d/*awicp*
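
After these commands, mysqld, dhcpd, and httpd should no longer start at boot. The sketch below inspects the output of `chkconfig --list` for each service; `is_disabled_line` is a hypothetical helper, and the check assumes the usual `N:on` / `N:off` runlevel columns in that output.

```shell
# Hypothetical helper: succeed when a `chkconfig --list <service>` line
# shows no runlevel switched on.
is_disabled_line() {
    case "$1" in
        *:on*) return 1 ;;
        *)     return 0 ;;
    esac
}

for svc in mysqld dhcpd httpd; do
    if ! command -v chkconfig >/dev/null 2>&1; then
        echo "chkconfig not found; skipping $svc"
        continue
    fi
    line="$(chkconfig --list "$svc" 2>/dev/null || true)"
    if [ -n "$line" ] && is_disabled_line "$line"; then
        echo "$svc: disabled at boot"
    else
        echo "$svc: still enabled, or not registered with chkconfig"
    fi
done
```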

Create Shared Drive

The two WiDirects share a storage volume for the data that must be common to both. At least 2 GB of empty space is available on the hard drive for this shared drive, and some models may have more. Check with AllCity Wireless support staff for more information. Run these commands on both WiDirects to create the logical volume and open the drive configuration file:

	lvm
	lvcreate --size 2G -n LogVol02 VolGroup00
	exit
	emacs /etc/drbd.conf

Copy the /etc/drbd.conf file from the operational WiDirect to the WiDirect that you are restoring.
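
DRBD matches the `on <hostname>` sections of its configuration against the name reported by `uname -n`, so the copied file should contain a section for the unit being restored. The sketch below checks for one, assuming the configuration uses the usual `on <hostname> { ... }` layout; `conf_has_host` is a hypothetical helper name.

```shell
# Hypothetical helper: succeed when the configuration file contains an
# "on <host>" section header for the given hostname.
conf_has_host() {
    awk -v h="$2" '
        $1 == "on" && ($2 == h || $2 == h "{") { found = 1 }
        END { exit !found }
    ' "$1"
}

CONF="${CONF:-/etc/drbd.conf}"
HOST="$(uname -n)"

if [ ! -r "$CONF" ]; then
    echo "cannot read $CONF"
elif conf_has_host "$CONF" "$HOST"; then
    echo "$CONF has a section for $HOST"
else
    echo "$CONF has no 'on $HOST' section; check the hostname and the copied file"
fi
```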

After the configuration file is saved, the next step is to create the drive metadata. Run these commands on the new WiDirect:

	drbdadm create-md drbd0
	drbdadm up drbd0
	mkdir /shared
	service mysqld stop
	mkdir /root/AWICP/license
	chmod -R a+rw /root/AWICP/license
	cp /root/AWICP/etc/awicp.serial /root/AWICP/license
	reboot

One more step is required on the secondary WiDirect: move the existing data locations aside and replace them with links to the shared drive:

mv /var/lib/mysql /var/lib/mysql.backup
ln -s /shared/mysql /var/lib/mysql
mv /root/AWICP/www/portal/branding /root/AWICP/www/portal/branding.backup
ln -s /shared/branding /root/AWICP/www/portal/branding
mv /root/AWICP/etc /root/AWICP/etc.backup
ln -s /shared/etc /root/AWICP/etc
mv /root/AWICP/logs /root/AWICP/logs.backup
ln -s /shared/logs /root/AWICP/logs
mv /root/AWICP/monitor-data /root/AWICP/monitor-data.backup
ln -s /shared/monitor-data /root/AWICP/monitor-data
mv /root/AWICP/db /root/AWICP/db.backup
ln -s /shared/db /root/AWICP/db
mv /etc/dhcpd.conf /etc/dhcpd.conf.backup
ln -s /shared/etc/dhcpd.conf /etc/dhcpd.conf
mv /var/lib/dhcpd /var/lib/dhcpd.backup
ln -s /shared/dhcpd /var/lib/dhcpd
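
Because a mistyped path here is easy to miss, it may be worth confirming that every location is now a link into /shared. The following sketch walks the paths listed above; `points_to_shared` is a hypothetical helper name.

```shell
# Hypothetical helper: succeed when the path is a symbolic link whose
# target lies under /shared.
points_to_shared() {
    target="$(readlink "$1" 2>/dev/null)" || return 1
    case "$target" in
        /shared/*) return 0 ;;
        *)         return 1 ;;
    esac
}

for path in \
    /var/lib/mysql \
    /root/AWICP/www/portal/branding \
    /root/AWICP/etc \
    /root/AWICP/logs \
    /root/AWICP/monitor-data \
    /root/AWICP/db \
    /etc/dhcpd.conf \
    /var/lib/dhcpd
do
    if points_to_shared "$path"; then
        echo "OK      $path -> $(readlink "$path")"
    else
        echo "MISSING $path is not a link into /shared"
    fi
done
```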

After running those commands, the two drives synchronize for a period of time. Run the status command to check progress:

service drbd status

The status command indicates whether each drive is inconsistent or up to date. Initially the secondary drive is listed as inconsistent, and the output shows the percentage synced between the two devices.
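
The sync percentage can also be read directly from /proc/drbd, which DRBD 8.3 provides while the module is loaded. This is a sketch under the assumption that the progress line has the form `sync'ed: 12.5% (...)`; `sync_percent` is a hypothetical helper name.

```shell
# Hypothetical helper: print the first sync percentage found in a
# DRBD status file, or nothing if no sync is reported.
sync_percent() {
    sed -n "s/.*sync'ed:[[:space:]]*\([0-9.]*\)%.*/\1/p" "$1" | head -n 1
}

STATUS_FILE="${STATUS_FILE:-/proc/drbd}"

if [ -r "$STATUS_FILE" ]; then
    pct="$(sync_percent "$STATUS_FILE")"
    if [ -n "$pct" ]; then
        echo "sync in progress: ${pct}% complete"
    else
        echo "no sync in progress reported"
    fi
else
    echo "$STATUS_FILE is not readable; is the drbd module loaded?"
fi
```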

View and Change Status of Shared Disk Drive

The drbd service manages the shared drive between the two boxes. To view the current status of the shared drive, run this command:

service drbd status

The commands in this section show how to manually change which WiDirect is primary and which is secondary. They are for reference only and do not normally need to be run; the following sections describe how to use the heartbeat service to manage the roles automatically. To change the current primary box to secondary, run these commands:

/root/AWICP/bin/widirect_stop_all.sh
service mysqld stop
service httpd stop
umount /shared
drbdadm secondary drbd0

The other box can then be made the primary server:

drbdadm primary drbd0
mount /dev/drbd0 /shared
service mysqld start
service httpd start
/root/AWICP/bin/widirect_start_all.sh
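
Before switching roles by hand, it is useful to know which role the local unit currently holds. In DRBD 8.3, `drbdadm role drbd0` prints the local and peer roles as `Local/Peer`; the sketch below extracts the local half. `local_role` is a hypothetical helper name.

```shell
# Hypothetical helper: print the local role from "Local/Peer" output,
# for example "Primary" from "Primary/Secondary".
local_role() {
    printf '%s\n' "${1%%/*}"
}

if command -v drbdadm >/dev/null 2>&1; then
    roles="$(drbdadm role drbd0 2>/dev/null || true)"
    echo "this unit is currently: $(local_role "$roles")"
else
    echo "drbdadm not found on this system"
fi
```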

Configure Services for Failover

Copy the /etc/ha.d folder from the existing WiDirect to the new WiDirect with the scp command. Then configure the remaining services:

service drbd stop
chkconfig drbd off
chkconfig heartbeat on
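
After the copy, a quick check of the folder can catch a partial transfer. The sketch below assumes the standard heartbeat layout, in which /etc/ha.d holds ha.cf and authkeys, and that heartbeat expects authkeys to be readable by root only (mode 600). `mode_is_600` is a hypothetical helper name.

```shell
# Hypothetical helper: succeed when the file's permission bits are exactly 600.
mode_is_600() {
    [ "$(stat -c %a "$1" 2>/dev/null)" = "600" ]
}

HA_DIR="${HA_DIR:-/etc/ha.d}"

for f in ha.cf authkeys; do
    if [ -e "$HA_DIR/$f" ]; then
        echo "found   $HA_DIR/$f"
    else
        echo "missing $HA_DIR/$f"
    fi
done

if [ -e "$HA_DIR/authkeys" ] && ! mode_is_600 "$HA_DIR/authkeys"; then
    echo "authkeys should be mode 600; run: chmod 600 $HA_DIR/authkeys"
fi
```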

Further Configuration

It is also recommended that you modify the system check page to show the status of the heartbeat processes. Run this command to edit the file:

emacs /root/AWICP/config-helpers/statusCheck.pl

Look for the line that shows "my $showFailoverStatus=0;" and change the 0 to a 1.
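
The same change can be made without an editor. This sketch keeps a backup copy and assumes the line appears exactly as quoted above, with no extra spaces around the equals sign; `enable_failover_status` is a hypothetical helper name.

```shell
# Hypothetical helper: back up the file, then switch the flag from 0 to 1.
enable_failover_status() {
    cp -p "$1" "$1.bak" &&
    sed -i 's/my \$showFailoverStatus=0;/my $showFailoverStatus=1;/' "$1"
}

STATUS_CHECK="${STATUS_CHECK:-/root/AWICP/config-helpers/statusCheck.pl}"

if [ -w "$STATUS_CHECK" ]; then
    enable_failover_status "$STATUS_CHECK"
    # Show the resulting line so the change can be confirmed by eye.
    grep -n 'showFailoverStatus' "$STATUS_CHECK" || true
else
    echo "$STATUS_CHECK is not writable; run as root on the WiDirect"
fi
```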