Why PingOne's heart doesn't skip a beat: MySQL Galera
PingOne runs on two separate data-layer technologies for customer-facing applications. Without this configuration we simply could not claim true geographic fault tolerance or high availability.
One is MySQL Galera and the other is Cassandra. Both are geographically distributed, write-anywhere/read-anywhere technologies. Our data layers are spread across a hybrid cloud architecture made up of two private data centers and one public. The reason for this setup? Scalability, flexibility and high availability.
PingOne can autonomously handle failures in the stack, ranging from a single subsystem to an entire data center.
Today, I'm going to lay out our MySQL Galera architecture. I will dissect other parts of the stack in later articles, so stick with me.
As so many people say but so few practice, choosing the right technology for the problem can be the difference between success and failure. Galera is not the right MySQL solution for every use case, nor is it perfect, but it has some advantages we found attractive.
The Pain
High availability and fault tolerance come at a cost, but once you’ve paid it your enterprise application is a lot better off. Our MySQL Galera implementation demanded a massive investment of time and effort at every level of our stack.
Among other things, WAN replication forced new programming considerations on developers and new hurdles on performance engineers; monitoring and reporting engines had to be built to simplify troubleshooting; database queries were rewritten to boost performance; test engineers sacrificed sleep; and databases were reconfigured. By no means was this a small endeavor.
The Benefits
Hurricane Sandy unfortunately battered the East Coast, but it couldn't bring us down even though PingOne lost a data center. Thanks to our distributed architecture, the majority of PingOne customers knew Sandy only as a video clip on the nightly news.
As one Ping employee put it, “Yup...Still Up”. I think there were even a few songs written about it that were wildly popular here.
The Architecture
Our MySQL Galera implementation contains all configuration data for PingOne. Various subsystems contribute information to, or retrieve it from, this geographically dispersed relational database.
Functional requirements for the RDBMS were fairly stringent. Security around authentication and authorization was paramount. If at all possible, we wanted reads and writes from anywhere. Why not, right? We have a small dataset and our writes aren't enormous, so let's take advantage of it. Data needed to be consistent across the architecture. The cluster had to survive the failure of a data center. Write performance didn't need to be in the nanosecond realm, but it did need to be relatively fast, with "fast" judged holistically across our entire stack: is PingOne within web standards and meeting customer needs?
The MySQL Galera architecture was designed as a six-node cluster distributed across three geographically dispersed data centers. That equates to two servers per data center, allowing a single server in any data center to go down without triggering a data center failover event.
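For the curious, here is roughly what the Galera portion of a node's my.cnf looks like for a cluster shaped like this. This is a minimal sketch with placeholder hostnames, names, and paths, not our production configuration:

```ini
[mysqld]
binlog_format            = ROW        # Galera replicates row-based events only
default_storage_engine   = InnoDB     # only InnoDB tables are replicated
innodb_autoinc_lock_mode = 2          # interleaved auto-increment, required by Galera

wsrep_provider        = /usr/lib64/galera/libgalera_smm.so
wsrep_cluster_name    = example_cluster
# two nodes in each of the three data centers
wsrep_cluster_address = gcomm://dc1-db1,dc1-db2,dc2-db1,dc2-db2,dc3-db1,dc3-db2
wsrep_node_address    = dc1-db1
wsrep_sst_method      = xtrabackup    # non-blocking state transfer for a joining node
wsrep_slave_threads   = 8             # parallel appliers help hide WAN latency
```

Every node carries the full dataset, so any of the six can serve reads or accept writes.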
The largest performance impact, as many would guess, is network latency and thus WAN replication. It's the bane of distributed databases. Galera uses certification-based replication (a.k.a. certification-based conflict resolution), which requires far less overhead and thus performs better over the WAN than a traditional two-phase commit. This does, however, come with a limitation.
The single largest hurdle with certification-based replication, and thus Galera, is deadlocks, known by the PingOne team as WANlocks. Why do we call them WANlocks? Because these aren’t traditional deadlocks, even though the exception generated is a deadlock.
Traditional deadlocks occur when Process 1 holds a lock on TableA and wants a lock on TableB, while Process 2 holds a lock on TableB and wants a lock on TableA.
WANlocks are the result of optimistic locking during replication across the cluster. This can happen in a LAN environment too; it's simply more pronounced in a WAN model. Optimistic locking is inherent in certification-based replication. More on this from my friends over at Severalnines.
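From the application's point of view, a WANlock surfaces at commit time as an ordinary MySQL deadlock error (1213, ER_LOCK_DEADLOCK), even though no lock-ordering conflict exists locally. The generic remedy is to catch it and retry. Here is a minimal sketch in Python; the table, column, and connection details are illustrative, not our actual schema:

```python
import time

import mysql.connector
from mysql.connector import errorcode


def increment_login_count(cnx, account_id, retries=3):
    """Apply a write that may fail certification on another node.

    Galera reports a certification failure as a normal MySQL deadlock
    (error 1213), so the handling is the same as for any deadlock:
    roll back, wait briefly, and try again.
    """
    for attempt in range(retries):
        try:
            cur = cnx.cursor()
            cur.execute(
                "UPDATE accounts SET login_count = login_count + 1 "
                "WHERE id = %s",
                (account_id,),
            )
            cnx.commit()
            return
        except mysql.connector.Error as err:
            cnx.rollback()
            if err.errno != errorcode.ER_LOCK_DEADLOCK:
                raise                             # not a WANlock; bubble it up
            time.sleep(0.05 * (attempt + 1))      # brief backoff before retrying
    raise RuntimeError("write failed certification %d times" % retries)
```

Retrying only papers over the symptom, though; the real win is preventing the conflicts in the first place.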
The question is: how did we get around this? There are several options, including the following.
The most common is directing all writes at a single MySQL server. It's often recommended because it's dead simple to implement: just redirect all writes to one location and you're done. Set and forget. However, you're now back to scaling vertically; you have to add server resources to both the primary and the secondary machines designated for writes. If your servers are large and you have plenty of headroom, this is a perfectly viable solution.
The second option is partitioning writes. It wouldn't be a bad idea if PingOne had more than a couple of databases to distribute writes across.
Last is eliminating hot tables, whether through application changes, schema changes, or quite possibly both. Yes, it can hurt the brain a little. Yes, it takes more time to get straightened out. Yes, we chose it. Why? Because once it's done, you have eliminated a major roadblock to being able to write anywhere.
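To give a flavor of what that kind of change looks like, consider a hypothetical hot row that every node increments on every login. The fix is to stop contending on a shared row at all. The table names below are made up for illustration and are not PingOne's schema:

```python
# Before: every node updates the same row, so concurrent writes on
# different nodes routinely fail certification (a WANlock).
HOT_WRITE = "UPDATE org_stats SET login_count = login_count + 1 WHERE org_id = %s"

# After: each write lands in its own row, so writes on different nodes
# never modify the same data and certification conflicts disappear.
COOL_WRITE = "INSERT INTO org_logins (org_id, node_name, logged_at) VALUES (%s, %s, NOW())"
COOL_READ  = "SELECT COUNT(*) FROM org_logins WHERE org_id = %s"
```

It is more work up front, both in the schema and in the code that reads it, but it is exactly the kind of change that lets every node accept writes safely.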
The benefit of certification-based replication, and thus optimistic locking, is less overhead on replication. Combined with Galera’s parallel replication, this allows for very fast synchronization, especially over the WAN. It was probably one of the largest factors in our design choice.
As a result of the PingOne team’s effort and our current configuration, we have achieved 5,500 randomized reads/writes per second, or 330,000 per minute, at 17% writes and 83% reads in stress testing. That write-to-read ratio is fairly indicative of our typical workload.
Next post I'll delve a little bit into our Cassandra architecture.
About the Author: Michael Ward is a Site Reliability Engineer for Ping Identity, specializing in high availability solutions and system architecture. mward@pingidentity.com