High Availability in OpenStack

An update on high-availability development during the OpenStack Folsom development cycle. This presentation was delivered August 30, 2012 in San Diego, California. It was part of the inaugural CloudOpen conference hosted by the Linux Foundation.

Following up on his earlier talks at OpenStack Summit and OSCON, Florian summarizes the high-availability features OpenStack gained during the Folsom development cycle.

Florian's full presentation is available below.

Comments

Has HA been achieved in Folsom yet?

It seems Folsom does not support HA out of the box. Moreover, what is the OpenStack community's long-term vision for tackling that problem? At the moment, I see only two SPOFs, MySQL and RabbitMQ, which in theory can be handled by the combination of Pacemaker + Corosync + DRBD (though recently there has been some fear and noise).

Any thoughts or comments?

Shuo

HA in Folsom

The OpenStack HA Guide is here. In OpenStack there are actually several single points of failure, with varying degrees of HA support:

  • MySQL. Can be fixed trivially with DRBD and Pacemaker, as the HA Guide illustrates. In future OpenStack releases we expect this to be gradually replaced with Galera-based configurations with client-side failover, once the ORM layer supports that.
  • RabbitMQ. Again, there's a trivial fix with DRBD+Pacemaker. The long-term goal is probably to move away from brokered message queuing altogether, which ØMQ makes possible. There are some pending issues with ØMQ and Quantum at the moment, though.
  • Several OpenStack API services, all of which are locally stateless and can easily be put under Pacemaker management. I hope to find the time to work this into the HA Guide soon.
  • Cinder, which with its default LVM/iSCSI implementation needs a little extra work in order to be fully HA capable.
  • Quantum, where there is presently no HA for the L3 and DHCP agents (quantum-server, however, is easy to fix up for HA). We're expecting this to be addressed in Grizzly with multi-host networking support in Quantum.

In summary, it's not perfect yet, but we're getting there.
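
To make the "client-side failover" idea from the MySQL item a bit more concrete, here is a minimal Python sketch of the general pattern: the client holds a list of equivalent database nodes and moves on to the next one when a connection attempt fails. This is only an illustration, not how OpenStack or its ORM layer implements it. The host names, `connect_with_failover`, and `fake_connect` are all hypothetical; `fake_connect` stands in for a real database driver so the example runs without a cluster.

```python
from typing import Callable, Optional, Sequence, Tuple, Type, TypeVar

T = TypeVar("T")


def connect_with_failover(
    hosts: Sequence[str],
    connect: Callable[[str], T],
    retriable: Tuple[Type[BaseException], ...] = (ConnectionError, TimeoutError),
) -> T:
    """Try each host in order and return the first successful connection."""
    last_error: Optional[BaseException] = None
    for host in hosts:
        try:
            return connect(host)
        except retriable as exc:
            # Remember the failure and fall through to the next node.
            last_error = exc
    raise ConnectionError(f"all nodes failed: {list(hosts)}") from last_error


# Hypothetical stand-in for a real driver's connect() call (assumption:
# the first node is down, the others are reachable).
def fake_connect(host: str) -> str:
    if host == "db1.example.com":
        raise ConnectionError(f"{host} is down")
    return f"connected to {host}"


# Hypothetical node list for a three-node multi-master cluster.
GALERA_NODES = ["db1.example.com", "db2.example.com", "db3.example.com"]

result = connect_with_failover(GALERA_NODES, fake_connect)
print(result)  # connected to db2.example.com
```

The point of the pattern is that, with every node accepting writes, failover can live in the client rather than in a cluster manager moving a single active instance around. A real deployment would also need to handle connections that drop mid-transaction, which this sketch does not attempt.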
