Pacemaker

Recovering from full cluster shutdown

If all of the nodes in your cluster have been taken down at the same time, you must re-initialize the Galera replication state. In effect, this is identical to bootstrapping the cluster.

Start by manually bringing up the cluster IP on one of your nodes:

ip address add 192.168.122.99/24 dev eth1 label eth1:galera
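If you repeat this step, for example after an aborted recovery attempt, the address may already be configured. A minimal sketch of a guard follows. The helper name has_address is hypothetical and not part of the original walkthrough; the address and device are the ones used above.

```shell
# Hypothetical helper: succeeds if the given IPv4 address ($1) is
# configured on the given network device ($2).
has_address() {
    # -o prints one line per address, -4 restricts output to IPv4.
    ip -o -4 address show dev "$2" | grep -qF " $1/"
}

# Usage: only add the cluster IP if it is not there yet.
#   has_address 192.168.122.99 eth1 || \
#       ip address add 192.168.122.99/24 dev eth1 label eth1:galera
```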

Re-initialize the Galera cluster:

mysqld --wsrep_cluster_address=gcomm:// &

Note the empty gcomm:// address, which tells this node to form a new cluster rather than join an existing one.
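To confirm that the node has formed a new cluster, you can query Galera's wsrep status variables. The following is a sketch only: the helper name galera_status is hypothetical, and it assumes the mysql client can log in without further options (for example through ~/.my.cnf).

```shell
# Hypothetical helper: prints the value of one Galera status variable.
# Assumes the mysql client can authenticate without extra options.
galera_status() {
    mysql --batch --skip-column-names \
        -e "SHOW STATUS LIKE '$1'" | awk '{ print $2 }'
}

# Usage on the freshly bootstrapped node:
#   galera_status wsrep_cluster_size
#   galera_status wsrep_ready
```

On a node that has just been bootstrapped on its own, wsrep_cluster_size is expected to be 1 until the other nodes rejoin.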

Dealing with node failure

If an entire node happens to get killed, and that node currently does not hold the Galera IP (192.168.122.99 in our example), then the other nodes simply continue to function normally, and you can connect to and use them without interruption. In the example below, alice has left the cluster:

Testing resource recovery

If MySQL happens to die in your cluster, Pacemaker will automatically recover the service in place. To test this, select any node on your cluster and send the mysqld process a KILL signal:

killall -KILL mysqld

Then, monitor your cluster status with crm_mon -rf. After a few seconds, you should see one of your p_mysql clones entering the FAILED state:
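If you prefer to check recovery from a script instead of watching crm_mon, a sketch like the following polls until the process is back. The helper name wait_for_process and the default timeout are assumptions made for illustration; how long recovery takes depends on your monitor interval.

```shell
# Hypothetical helper: polls once per second until a process with the
# exact name $1 exists, or gives up after $2 seconds (default 60).
wait_for_process() {
    name=$1
    timeout=${2:-60}
    elapsed=0
    while ! pgrep -x "$name" >/dev/null 2>&1; do
        if [ "$elapsed" -ge "$timeout" ]; then
            echo "$name did not come back within ${timeout}s" >&2
            return 1
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
    echo "$name is running again"
}

# Usage, right after the killall above:
#   wait_for_process mysqld 120
```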

Adding MySQL/Galera resources to Pacemaker

Once you have one instance of Galera running, and it is running on the same node that holds the temporarily-configured cluster IP (192.168.122.99 in our example), you can add your resources to the Pacemaker cluster configuration.

Create a temporary file, such as /tmp/galera.crm, with the following contents:

Starting Pacemaker

Once Corosync is running, you can start the Pacemaker cluster resource manager on all cluster nodes:

service pacemaker start
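Rather than logging in to each node in turn, you could run the command from one machine. This is a sketch under the assumption that passwordless root SSH to every node is available; the helper name start_pacemaker_on is hypothetical.

```shell
# Hypothetical helper: starts Pacemaker on every node given as an argument.
# Assumes passwordless root SSH to each node.
start_pacemaker_on() {
    for node in "$@"; do
        echo "Starting Pacemaker on $node"
        # Stop at the first node on which the start fails.
        ssh "root@$node" service pacemaker start || return 1
    done
}

# Usage with the node names from this walkthrough:
#   start_pacemaker_on alice bob charlie
```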

Once cluster startup is completed, you should see output similar to the following when invoking the crm_mon utility:

MySQL/Galera in Pacemaker High Availability Clusters

In this walkthrough, you will create a Pacemaker-managed MySQL/Galera cluster. It assumes that you are running on a Debian 6.0 (squeeze) box, but the concepts should apply equally to other platforms with minimal modifications.

It also assumes that your Galera cluster will consist of three nodes, named alice, bob and charlie. Furthermore, all cluster nodes can resolve each other's hostnames.

Pacemaker and the recent GitHub service interruption

It never fails. Someone manages to break their Pacemaker cluster, and Henrik starts preaching his usual sermon of why Pacemaker is terrible and why you should never-ever use it. And when that someone is GitHub, which we all know, use and love, then that sermon gets a bit of excess attention. Let's take a quick look at the facts.

Maintenance in active Pacemaker clusters

In a Pacemaker cluster, as in a standalone system, operators must complete maintenance tasks such as software upgrades and configuration changes. Here's what you need to keep Pacemaker's built-in monitoring features from creating unwanted side effects.
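One common way to do this is Pacemaker's maintenance-mode cluster property, which stops the cluster from monitoring and managing resources while it is set. The wrapper below is a sketch, not necessarily the approach the article describes; the helper name with_maintenance_mode is hypothetical, and it assumes the crm shell is installed.

```shell
# Hypothetical helper: runs a command with cluster-wide maintenance mode
# enabled, then disables maintenance mode again.
with_maintenance_mode() {
    crm configure property maintenance-mode=true || return 1
    "$@"
    status=$?
    # Re-enable cluster management even if the command failed.
    crm configure property maintenance-mode=false
    return "$status"
}

# Usage:
#   with_maintenance_mode apt-get install mysql-server
```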

Exetel relies on Ask The Expert Now™

Australian communications company Exetel relies on hastexo's remote professional service expertise through our unique Ask The Expert Now™ service offering.

Vaimo trusts hastexo for High Availability

Swedish e-commerce specialist Vaimo turns to hastexo for expert consultancy.
