By late 2013, our ACL service, Hades, needed a revamp. Sokrati had had a great year and the number of clients and agencies we were working with had exponentially increased. This web service handles authentication of apps and users and helps us manage all sorts of mappings (users to clients and agencies, clients to agencies, apps to app contexts, etc.). This might sound like a typical ACL service and in many ways, it is, but the data we manage via this service is highly sensitive and we couldn’t hand it over to a third party tool to manage it for us, so we built it ourselves.
This blog will cover how we made Hades highly scalable by overhauling the service and using materialized views with the help of Flexviews.
As the number of clients, users and agencies increased, scaling up to ensure the same quick response times became a priority. Since this service was written quite early in Sokrati’s life, we wanted to ensure that handling 50000+ clients would be as effortless as the initial 10 clients. So, we got a new set of SLAs and got to work.
A service with 14 servlets serving a variety of requests ranging from authentication and authorization to managing user client and user agency mappings and serving app context requests and app-ip authentication. Hades relied on Solr for fast retrieval of data with most calls bypassing the DB and hitting Solr instead.
A typical normalized database.
Various Options Considered:
- Only a mysql based flattened schema: Getting rid of Solr entirely, de-normalizing for optimization.
- Only Solr: Getting rid of mysql and storing all our data in Solr alone. Ruled out as we had unresolved concerns about Solr’s consistency checks and transaction management capabilities + fail-over, replication and all that good stuff.
- Flattened indexes in Solr only: Leaving the DB schema as is and flattening (de-normalizing) Solr’s indexes. Ruled out as it takes too long to process joins and then import the data. After benchmarking this option, we weren’t even close to meeting our SLAs. Delta imports surprisingly took even longer because the complexity of the joins went up.
New Hades Success Metrics:
- Authentication (500 QPS) – Response Time: 40 ms
- Get Calls (100 QPS) – Response Time: 500 ms
- Inserts (50 QPS) – Response Time: 500 ms
- Failure Rate Overall – Less than 0.5%
- A complete overhaul of the service: Getting rid of code that works is never clear-cut, but in this case it had to be done. We went over every call and drastically cut down on service calls that triggered multiple DB calls using a mix of smarter querying and a very handy project called flexviews – more on that later.
- The database: After pouring over the service, we realized that there are a couple of key tables – specifically User Client and User Agency mappings (Agencies have clients. Users are either mapped to an agency and so to its clients or directly mapped to a client) – on which joins resulted in big hits to performance.
DENORMALIZATION! The research went something like this:
- OK, what are the overheads? …Storage – flattening user-agency-client mappings is going to need thousands of additional rows.
- Storage?! HAH! Next overhead …DB calls are going to exponentially increase with the Solr buffer being removed.
- OK, let’s benchmark. More importantly, how are we going to de-normalize? How are we going to maintain the flattened tables? …Materialized Views!
- MySQL does not support materialized views …Let’s build it – TRIGGERS and PROCEDURES!
- Concerns: How will we ensure data consistency for the flattened schema? How are already existing rows going to be flattened? How are we going to handle the huge number of inserts on the D.B. for changes in mappings and new additions? How many triggers and procedures are we going to write? What about the hit to insert times if you have triggers flattening your data on inserts? Should we even use triggers, why not write a script that flattens inserted data periodically and take the hit on not having real time updates?
As you can see, this got complicated very quickly. That’s when we came across a project on github called Flexviews.
Flexviews to the rescue:
According to the project’s github page – “Flexviews is a materialized views implementation for MySQL. It includes a simple API that is used to create materialized views and to refresh them.”
This sounded perfect. We quickly downloaded and set it up but ran into a few issues immediately. Typically, at this point, you give up and move on to the next option as there’s no guarantee of fixes being rolled out to match your project’s deadline. But Greenlion was fantastic here. Every single issue we faced was speedily patched or, in some cases where it was our mistake, prompt guidance was given.
I’m not going to explain how flexviews works, you can (and should) check it out for yourself here.
Current flattened design:
We’ve got one materialized view using flexviews called user agency clients. This view holds all user -> agency -> client mappings and mostly eliminates the need for joins.
We’ve got a mysql event that runs every two minutes and refreshes this view. I’m happy to report that we’ve faced exactly zero issues with flexviews for the past three months since we’ve deployed it on production.
The current state of Hades:
Hades is running great with all SLAs being met. After benchmarking the new service, we expect our scaling issues to be over. This is especially good timing as Sokrati is expected to scale up to 27000 clients by the end of 2015.
Some reading material:
3 part article explaining how to setup and use flexviews:
Flexviews on github:
We’ve also written a puppet module that we’d be happy to share if there is any interest.