Restore Failed WiDirect in Cluster
In some scenarios it is advisable to run multiple WiDirect units side by side so that service continues if one of them fails. This document describes the steps to restore the settings on a failed unit.
Many of the steps below will require root access to the WiDirect. This command can be run initially to obtain root access:
It is important for hostnames to be properly set on both WiDirects. The hostname can be set on the network page in version 2.3 and above.
The examples below use f1.awi6.net as the hostname of the new server. f1.awi6.net has the IP address 10.8.9.123.
A number of packages are required to be installed to configure WiDirect failover. Run this command first:
Add this text to the text file:
[clusterlabs]
name=High Availability/Clustering server technologies (epel-5)
baseurl=http://www.clusterlabs.org/rpm/epel-5
type=rpm-md
gpgcheck=0
enabled=1
Save the file and run these commands:
wget https://allcitywireless.com/failover/epel-release-5-4.noarch.rpm
rpm -i epel-release-5-4.noarch.rpm
yum remove awicp_reloaders
yum install awicp_reloaders_ha drbd83 kmod-drbd83* heartbeat pacemaker ipmitool
After installing the packages run the reboot command:
Create Firewall Rules
A number of ports need to be opened for the services to work properly. TCP ports 7788 through 7799 need to be opened for the shared drive functionality to work. UDP port 694 must be opened for the process monitoring services to work. Add these lines to the top portion of the iptables file:
-A INPUT -i eth0 -p tcp -m tcp --dport 7788:7799 --tcp-flags SYN,RST,ACK SYN -j ACCEPT
-A INPUT -i eth0 -p udp -m udp --dport 694 -j ACCEPT
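As a quick check after editing, the sketch below tests whether a rules file contains both lines. The function name has_failover_rules is hypothetical, and the path in the usage note (/etc/sysconfig/iptables, the usual location on Red Hat style systems) is an assumption, since this document does not name the file.

```shell
# Hypothetical helper: succeed only if FILE contains both failover rules.
# grep -F matches the text literally; -e keeps the leading "-A" from
# being parsed as a grep option.
has_failover_rules() {
    file=$1
    grep -F -q -e '-A INPUT -i eth0 -p tcp -m tcp --dport 7788:7799 --tcp-flags SYN,RST,ACK SYN -j ACCEPT' "$file" &&
    grep -F -q -e '-A INPUT -i eth0 -p udp -m udp --dport 694 -j ACCEPT' "$file"
}
```

For example, has_failover_rules /etc/sysconfig/iptables || echo "failover rules missing" prints a warning when either line is absent (the path is assumed, as noted above).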
Configure Local Services
These commands should be run on each box.
service iptables restart
service mysqld stop
chkconfig mysqld off
service dhcpd stop
chkconfig dhcpd off
service httpd stop
chkconfig httpd off
rm -rf /etc/rc3.d/*awicp*
Both WiDirects use a shared drive for the data they hold in common. At least 2 GB of empty space is available on the hard drive for the shared drive, and some models may have more. Check with AllCity Wireless support staff for more information. Run these commands on both WiDirects to create the partitions:
lvm
lvcreate --size 2G -n LogVol02 VolGroup00
exit
emacs /etc/drbd.conf
Copy the /etc/drbd.conf file from the operational WiDirect to the WiDirect that you are restoring.
After the configuration file is saved the next step is to create the drive metadata. These commands need to be run on the new WiDirect:
drbdadm create-md drbd0
drbdadm up drbd0
mkdir /shared
service mysqld stop
mkdir /root/AWICP/license
chmod -R a+rw /root/AWICP/license
cp /root/AWICP/etc/awicp.serial /root/AWICP/license
reboot
One more step is required to modify the file locations on the secondary WiDirect:
mv /var/lib/mysql /var/lib/mysql.backup
ln -s /shared/mysql /var/lib/mysql
mv /root/AWICP/www/portal/branding /root/AWICP/www/portal/branding.backup
ln -s /shared/branding /root/AWICP/www/portal/branding
mv /root/AWICP/etc /root/AWICP/etc.backup
ln -s /shared/etc /root/AWICP/etc
mv /root/AWICP/logs /root/AWICP/logs.backup
ln -s /shared/logs /root/AWICP/logs
mv /root/AWICP/monitor-data /root/AWICP/monitor-data.backup
ln -s /shared/monitor-data /root/AWICP/monitor-data
mv /root/AWICP/db /root/AWICP/db.backup
ln -s /shared/db /root/AWICP/db
mv /etc/dhcpd.conf /etc/dhcpd.conf.backup
ln -s /shared/etc/dhcpd.conf /etc/dhcpd.conf
mv /var/lib/dhcpd /var/lib/dhcpd.backup
ln -s /shared/dhcpd /var/lib/dhcpd
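Because a mistyped path here leaves a service reading from the wrong place, it can help to confirm each link afterwards. The following is a minimal sketch only; check_link and check_shared_links are hypothetical helpers, not part of the WiDirect tooling, and the path list simply mirrors the commands in this step.

```shell
# Hypothetical helper: succeed if LINK is a symlink pointing at TARGET.
check_link() {
    link=$1
    target=$2
    [ -L "$link" ] && [ "$(readlink "$link")" = "$target" ]
}

# Report every relinked path from this step that does not point at its
# expected location under /shared. Returns non-zero if any is wrong.
check_shared_links() {
    rc=0
    while read -r link target; do
        check_link "$link" "$target" || { echo "wrong or missing: $link"; rc=1; }
    done <<EOF
/var/lib/mysql /shared/mysql
/root/AWICP/www/portal/branding /shared/branding
/root/AWICP/etc /shared/etc
/root/AWICP/logs /shared/logs
/root/AWICP/monitor-data /shared/monitor-data
/root/AWICP/db /shared/db
/etc/dhcpd.conf /shared/etc/dhcpd.conf
/var/lib/dhcpd /shared/dhcpd
EOF
    return $rc
}
```

Running check_shared_links on the secondary WiDirect prints one line per path that still needs attention and prints nothing when all eight links are in place.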
After the metadata and file-location commands have been run, the two drives will sync for a period. Run the status command to check progress:
service drbd status
The status command will indicate whether the drives are inconsistent or up to date. Initially the secondary drive will be listed as inconsistent, and the status command will show the percent synced between the two devices.
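To wait on the sync from a script rather than reading the output by eye, the disk state can be pulled out of the status text. This sketch assumes the output contains a ds:local/peer field, as in the /proc/drbd format of DRBD 8.3; the function names are hypothetical.

```shell
# Extract the first disk-state field ("local/peer") from DRBD status
# text on stdin, e.g. "Inconsistent/UpToDate".
drbd_disk_state() {
    sed -n 's/.*ds:\([A-Za-z]*\/[A-Za-z]*\).*/\1/p' | head -n 1
}

# Succeed only when both the local and the peer disk report UpToDate.
drbd_in_sync() {
    [ "$(drbd_disk_state)" = "UpToDate/UpToDate" ]
}
```

For example, drbd_in_sync < /proc/drbd && echo "shared drive is in sync" reports once the secondary drive is no longer inconsistent.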
The drbd service manages the shared drive between the two boxes. To view the current status of the shared drive you can run the command below:
service drbd status
The commands in this section describe how to manually change which WiDirect is the primary and which is the secondary. They are for reference only and do not normally need to be run. The following sections describe how to use the heartbeat service to manage this automatically. To change the current primary box to secondary, run these commands:
/root/AWICP/bin/widirect_stop_all.sh
service mysqld stop
service httpd stop
umount /shared
drbdadm secondary drbd0
The other box can then be made the primary server:
drbdadm primary drbd0
mount /dev/drbd0 /shared
service mysqld start
service httpd start
/root/AWICP/bin/widirect_start_all.sh
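The order of these commands matters: services must stop before the drive is unmounted and demoted, and the drive must be promoted and mounted before services start. One way to keep the order fixed is to wrap each sequence in a function that stops at the first failure. This is a sketch only; run, demote_node, promote_node, and the DRY_RUN variable are hypothetical names introduced for illustration.

```shell
# Run a command, or only print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Make the current primary box secondary; stops at the first failure.
demote_node() {
    run /root/AWICP/bin/widirect_stop_all.sh &&
    run service mysqld stop &&
    run service httpd stop &&
    run umount /shared &&
    run drbdadm secondary drbd0
}

# Make this box the primary server; stops at the first failure.
promote_node() {
    run drbdadm primary drbd0 &&
    run mount /dev/drbd0 /shared &&
    run service mysqld start &&
    run service httpd start &&
    run /root/AWICP/bin/widirect_start_all.sh
}
```

Setting DRY_RUN=1 before calling either function prints the commands without running them, which is a safe way to review the sequence on a live box.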
Configure Services for Failover
From the existing WiDirect copy the /etc/ha.d folder to the new WiDirect with the scp command. You also need to configure some of the other services:
service drbd stop
chkconfig drbd off
chkconfig heartbeat on
It is also recommended that you modify the system check page to show the status of the heartbeat processes. Run this command to edit the file:
Look for the line that shows "my $showFailoverStatus=0;" and change the 0 to a 1.
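The same change can be made without opening an editor. The sketch below is an assumption-laden alternative: enable_failover_status is a hypothetical helper, and because the path of the system check file is not stated here, it must be passed in as an argument.

```shell
# Hypothetical helper: flip "my $showFailoverStatus=0;" to "=1;" in FILE.
# The edit goes through a temporary file and is copied back with cat so
# the original file keeps its permissions.
enable_failover_status() {
    file=$1
    tmp=$(mktemp) || return 1
    status=1
    if sed 's/\(my \$showFailoverStatus=\)0;/\11;/' "$file" > "$tmp"; then
        cat "$tmp" > "$file"
        status=$?
    fi
    rm -f "$tmp"
    return $status
}
```

Lines that do not contain the setting are left unchanged, so running the function a second time has no further effect.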