MariaDB Galera Cluster Maintenance and Recovery

Introduction

This document covers how to perform system maintenance while the MariaDB database is in active production, and how to recover from power outages or network failures.

Environment

SCM (Scyld Cloud Manager) currently leverages the Kolla OpenStack project which packages the OpenStack services into Docker containers.

The MariaDB database uses Galera to run a database cluster across the three OpenStack controller systems. The cluster provides high availability as well as scalability. The three instances of the database run inside Docker containers. Containerization changes how the user interacts with the database, since it differs from running the database directly on bare metal.

The best method of accessing the database is to install the MariaDB client on the ops system, which has access to the private OpenStack API network. Several scripts are provided to interrogate the status of the OpenStack cluster as well as the database.

Note: The scripts referenced in this document are shipped with SCM and are installed in the directory /opt/scm/scripts/.
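For example, on an ops system running a Red Hat family distribution, the client can be installed with something along these lines (the exact package name varies by distribution and release):

# yum install -y mariadb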

Checking Galera Cluster Status

The script galeraStatus.sh will report the current status of the database cluster.
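If you need the same information without the script, it comes from Galera's wsrep status variables. A minimal equivalent query, assuming the MariaDB root credentials and database address used in your deployment (the address below is a placeholder), would be:

# mysql -h 10.10.3.254 -u root -p -e "SHOW GLOBAL STATUS LIKE 'wsrep_%'"

This returns all of the wsrep variables, including the six highlighted below.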

Since each of the OpenStack controller systems has an individual copy of the database for redundancy, the Galera software ensures that the three copies of the database are kept in sync with each other.

To protect the integrity of the data, Galera requires a quorum, or majority, of controller instances to agree on the last update to the database. In the SCM case there are three controllers, so a minimum of two must agree on the last committed transaction. This means that at least two instances of the database must be running and communicating in order to commit any update.

Without quorum, Galera will not allow updates, which keeps the instances synchronized and avoids a condition called split-brain.

Running the galeraStatus.sh script on a healthy cluster will produce a report similar to the following:

Variable_name              Value
wsrep_last_committed       3832840
wsrep_local_state_comment  Synced
wsrep_incoming_addresses   10.10.3.5:3306,10.10.3.1:3306,10.10.3.3:3306
wsrep_cluster_size         3
wsrep_cluster_status       Primary
wsrep_ready                ON

All the fields in this report are important.

  • wsrep_last_committed: An ever-increasing transaction ID.
  • wsrep_local_state_comment: The state of this node in the cluster. Synced is the healthy state. Possible other states: Joining, Waiting on SST, Joined, Synced, or Donor.
  • wsrep_incoming_addresses: The connected servers in the cluster; healthy is all three.
  • wsrep_cluster_size: Number of servers in the cluster; again, healthy is 3.
  • wsrep_cluster_status: Primary means this node is part of the quorum and accepting updates; anything else is bad.
  • wsrep_ready: ON means the node is communicating with an active cluster; OFF means it is not accepting queries.

Further details can be found on the Galera website at: http://galeracluster.com/documentation-webpages/monitoringthecluster.html

Normal System Maintenance

Since the Galera-wrapped MariaDB database can continue in production with a quorum of instances (2 out of 3), maintenance can be performed without downtime. However, the maintenance must be done carefully to preserve quorum throughout.

The recommended way to temporarily remove a controller system from the cluster is to first stop the mariadb container on that controller, then stop the docker service on the same controller.

Manually stopping the docker service before a system shutdown will reduce recovery time when the system reboots. Relying on the shutdown command to stop all the docker containers may result in an unclean shutdown of the containers due to a systemd timeout.

In the smallest possible cluster scenario, there are three systems which each play the roles of controller, VM host, and Ceph data host. This scenario currently only exists in non-production development clusters.

Note: If you do not have Ceph OSDs running on the controller, skip the second command below.

# docker stop mariadb
# docker exec -t ceph_mon ceph osd set noout # only when osd is on this controller!!
# systemctl stop docker
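If the noout flag was set, it can be confirmed from one of the other controllers before proceeding, for example:

# docker exec -t ceph_mon ceph osd dump | grep flags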

A check of the Galera status with one mariadb database shut down will resemble the following report:

Variable_name              Value
wsrep_last_committed       3829914
wsrep_local_state_comment  Synced
wsrep_incoming_addresses   10.10.3.1:3306,10.10.3.3:3306
wsrep_cluster_size         2
wsrep_cluster_status       Primary
wsrep_ready                ON

Maintenance can now be performed on the controller with docker stopped, and the system can then be rebooted.

Restarting a Controller

Once the controller is booted back into service, docker will automatically start the containers, except for the mariadb container, which was manually stopped. Verify that all docker containers, other than mariadb, have started and have a status of Up using the command:

# docker ps -a
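To list only the containers that are not running, which at this point should be just mariadb, a status filter can be used:

# docker ps -a --filter status=exited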

If the noout flag was previously set in Ceph, unset that flag now using the command:

# docker exec -t ceph_mon ceph osd unset noout

Verify that the OpenStack services and RabbitMQ have all restarted and connected successfully by running the /opt/scm/scripts/OS_Status.sh script.

After all the OpenStack services report an Up status, start the mariadb container on the controller you just rebooted.

# docker start mariadb

Monitor the status of the database cluster as the database on the rebooted node synchronizes with the other two nodes in the cluster.

It may take several minutes for the synchronization to begin, so monitor the database for a few minutes before concluding that it is not going to start.
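A convenient way to monitor from the ops system is to re-run the status script on an interval, for example:

# watch -n 10 /opt/scm/scripts/galeraStatus.sh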

You need to proceed with caution here: when the third instance connects to the two running instances, the cluster size immediately jumps to 3, before Galera discovers that the new instance needs to be updated.

The state during synchronization will be either Joining: receiving State Transfer or Donor/Sync, depending on which mode of synchronization Galera decides to use: transferring only the deltas or the entire database.

Variable_name              Value
wsrep_last_committed       3842876
wsrep_local_state_comment  Joining: receiving State Transfer
wsrep_incoming_addresses   10.10.3.1:3306,10.10.3.5:3306,10.10.3.3:3306
wsrep_cluster_size         3
wsrep_cluster_status       Primary
wsrep_ready                OFF

After the synchronization completes the results will be similar to the following:

Variable_name              Value
wsrep_last_committed       3868364
wsrep_local_state_comment  Synced
wsrep_incoming_addresses   10.10.3.1:3306,10.10.3.5:3306,10.10.3.3:3306
wsrep_cluster_size         3
wsrep_cluster_status       Primary
wsrep_ready                ON

To be safe, give the database some time, and check that all the OpenStack services are up and running before starting on the next controller. This may be a good time for a coffee break; check the status again when you return.

Carefully restarting the controllers one at a time will allow you to maintain the controllers without experiencing downtime.

Recovery from Failures

Network Failure or Power Loss

After a network failure or power loss, check the status of the database by running the galeraStatus.sh script.

If the database did not recover, the report may resemble the following:

Variable_name              Value
wsrep_last_committed       3885809
wsrep_local_state_comment  Initialized
wsrep_incoming_addresses   10.10.3.5:3306
wsrep_cluster_size         1
wsrep_cluster_status       non-Primary
wsrep_ready                OFF

To recover, run the kolla-ansible playbook to start up the database. The playbook will interrogate each instance of the database to find the latest committed transaction ID, then bootstrap the cluster from that instance.

Before running the playbook, stop the database containers on all three controllers using the command:

# docker stop mariadb
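If the ops system has passwordless ssh access to the controllers, this can be done in one pass (the hostnames below are placeholders for your controller names):

# for h in ctrl1 ctrl2 ctrl3; do ssh $h docker stop mariadb; done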

From the ops system, run the kolla-ansible playbook to restart the database.

# kolla-ansible -i inventory/<clustername> mariadb_recovery

The mariadb_recovery playbook is not always successful, but it does discover which controller has the most up-to-date version of the database, so make note of which controller the playbook chooses to start first as it runs.
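If you want to verify the choice yourself, each instance records its last committed sequence number in its grastate.dat file; the controller with the highest seqno holds the most recent data (a seqno of -1 means the node stopped uncleanly and did not record one):

# cat /var/lib/docker/volumes/mariadb/_data/grastate.dat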

Worst Case - mariadb_recovery Failed

On the controller with the most up-to-date version of the database, edit /etc/kolla/mariadb/galera.cnf.

Change the wsrep_cluster_address to the following value, saving the original value by commenting out the line. An empty gcomm:// address tells Galera to bootstrap a new cluster from this node rather than join an existing one.

Change:

wsrep_cluster_address = gcomm://10.11.0.1:4567,10.11.0.8:4567,10.11.0.9:4567

to:

#wsrep_cluster_address = gcomm://10.11.0.1:4567,10.11.0.8:4567,10.11.0.9:4567
wsrep_cluster_address = gcomm://

Then edit /var/lib/docker/volumes/mariadb/_data/grastate.dat.

Change the value of safe_to_bootstrap from 0 to 1, which marks this node's data as safe to bootstrap the cluster from.

safe_to_bootstrap: 0

to:

safe_to_bootstrap: 1
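The same edit can be made non-interactively, for example:

# sed -i 's/^safe_to_bootstrap: 0/safe_to_bootstrap: 1/' /var/lib/docker/volumes/mariadb/_data/grastate.dat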

Next, start mariadb on just this one controller:

# docker start mariadb

On this controller, in a separate window, monitor the mariadb startup log:

# tail -f /var/lib/docker/volumes/kolla_logs/_data/mariadb/mariadb.log

If there are no obvious errors in the log and the database starts running, cancel the tail monitoring with CTRL-C and switch to the ops management window to monitor the database with the galeraStatus.sh script.

Variable_name              Value
wsrep_last_committed       140856471
wsrep_local_state_comment  Synced
wsrep_incoming_addresses   10.11.0.9:3306
wsrep_cluster_size         1
wsrep_cluster_status       Primary
wsrep_ready                ON

Once you see the cluster state of Synced, size of 1, status of Primary and wsrep_ready ON, start the mariadb container on the second controller.

Again, monitor the startup; you will see the data sync over to this node:

Variable_name              Value
wsrep_last_committed       140856195
wsrep_local_state_comment  Joining: receiving State Transfer
wsrep_incoming_addresses   10.11.0.9:3306,10.11.0.8:3306
wsrep_cluster_size         2
wsrep_cluster_status       Primary
wsrep_ready                OFF

Rerun the galeraStatus.sh script until you see the following:

Variable_name              Value
wsrep_last_committed       140857617
wsrep_local_state_comment  Synced
wsrep_incoming_addresses   10.11.0.9:3306,10.11.0.8:3306
wsrep_cluster_size         2
wsrep_cluster_status       Primary
wsrep_ready                ON

Finally, start mariadb on the last controller and monitor.

Variable_name              Value
wsrep_last_committed       140856155
wsrep_local_state_comment  Joining: receiving State Transfer
wsrep_incoming_addresses   10.11.0.1:3306,10.11.0.9:3306,10.11.0.8:3306
wsrep_cluster_size         3
wsrep_cluster_status       Primary
wsrep_ready                OFF

You should see the state change to Synced, size of 3, status of Primary and wsrep_ready of ON.

Variable_name              Value
wsrep_last_committed       140858943
wsrep_local_state_comment  Synced
wsrep_incoming_addresses   10.11.0.1:3306,10.11.0.9:3306,10.11.0.8:3306
wsrep_cluster_size         3
wsrep_cluster_status       Primary
wsrep_ready                ON

The database is recovered!

To clean up the controller where you started, you will want to change the settings back to their original values.

On the controller where you started:

# docker stop mariadb

Again, edit the file /etc/kolla/mariadb/galera.cnf, this time restoring the original wsrep_cluster_address: uncomment the line that was saved earlier and remove the empty gcomm:// line. Then start the container:

# docker start mariadb

Check galeraStatus.sh to verify that the controller rejoins the cluster successfully.

When One Instance Will Not Start

In rare cases, one of the database instances will not start, and you will see errors in the /var/lib/docker/volumes/kolla_logs/_data/mariadb/mariadb.log logfile.

If the other two instances are working and synced, you can quickly recover the corrupted instance by letting Galera run a full sync of the database to replace the corrupted data. This is done by first stopping the mariadb container that is stuck in a restarting state, then removing the database files, and finally starting the mariadb container again.

# docker stop mariadb
# rm -rf /var/lib/docker/volumes/mariadb/_data/
# docker start mariadb
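While the full state transfer runs, its progress can be followed in the mariadb log on this node, at the same log location used earlier:

# tail -f /var/lib/docker/volumes/kolla_logs/_data/mariadb/mariadb.log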

Verify that the database transfer completed by listing the directory:

# ls -lh /var/lib/docker/volumes/mariadb/_data/
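The total size should be roughly comparable to the data directory on the healthy nodes, which can be checked on each controller with:

# du -sh /var/lib/docker/volumes/mariadb/_data/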

Also verify that the cluster is again healthy with the galeraStatus.sh script.