At Hudl we use RabbitMQ to keep certain operations from impacting normal web requests: we smooth out spikes in write traffic and isolate long-running or CPU-intensive tasks. We first started looking at queueing these operations after one coach almost brought down our entire site.
The whole thing started one weekday afternoon. Hudl offers coaches the ability to upload Excel spreadsheets with their roster; we parse it and import that roster data. Coaches love it, we love it, all is well. One afternoon a coach, without realizing it, uploaded a million-row spreadsheet. That brought our Excel parsing SDK to its knees, which in turn brought that web server to its knees. No sweat, at the time we were running nine web servers. Problem was, the coach was impatient (I would be too) so he kept trying it again and again. One by one our web servers were falling over. It was a tense afternoon.
We’ve since moved our Excel parsing, and lots of other functions, over to a separate cluster of servers and use a separate queuing service to communicate between clusters. Using a separate queue allows us to decouple the two layers of our system. If queue consumers are running slow, it doesn’t impact the web servers.
After looking around at other queuing technologies, we eventually settled on RabbitMQ. It’s got a solid C# driver, is blazing fast, offers a nice management dashboard, and is super flexible. With flexibility naturally comes a learning curve; RabbitMQ is no different. I want to talk through some of the choices we made and lessons learned.
RabbitMQ: Fast & Flexible
RabbitMQ offers many options to balance performance and durability, and it has good documentation, too. For Hudl, we standardized on a single set of options and used that across all of our queues. We err on the side of durability. It’s worked great for us: we get the durability we want and still see performance enough to easily handle our load, which is typically not more than a few thousand messages per second. Here is the configuration we use at Hudl:
- Queues are durable to keep us safe when RabbitMQ crashes.
- The queue files live on EBS volumes. This keeps data safe if the server dies because we can re-attach the volume to a new server. Provisioned IOPS drives are critical to maintain performance. Normal EBS volumes have inconsistent performance.
- Delivery-Mode=2. The call from the web server to enqueue a message doesn’t return until the message is received and “persisted”. Because we use durable queues, that means written to disk.
- We set Prefetch to 50 (via IModel.BasicQos(0, prefetchCount, false)). There is no “best setting” for all systems, but the default of 0 (unlimited) is probably not ideal. A prefetch of 50 means a consumer has at most 50 unacknowledged messages in flight at once; as it acknowledges them, RabbitMQ delivers more. You’ll see better throughput this way. Go too high and one server could hog all the messages. Too low and you’ll make more round-trips to RabbitMQ than you need.
- We explicitly acknowledge messages when they are finished processing. This protects us when a queue consumer crashes in the middle of processing a message. RabbitMQ will automatically re-deliver the message to another consumer. Important note: Messages can be delivered or partially-processed more than once. Consumers must be written in a way that takes this into account.
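The settings above map onto the .NET client roughly like this. This is a sketch, not our exact code: it assumes an open IModel named channel, and the queue name is illustrative.

```csharp
using RabbitMQ.Client;

// Declare a durable queue so it survives a broker restart.
// ("example-queue" is an illustrative name.)
channel.QueueDeclare(queue: "example-queue",
                     durable: true,
                     exclusive: false,
                     autoDelete: false,
                     arguments: null);

// Prefetch: at most 50 unacknowledged messages per consumer.
channel.BasicQos(prefetchSize: 0, prefetchCount: 50, global: false);

// Delivery-Mode=2: mark messages persistent so a durable queue
// writes them to disk.
var props = channel.CreateBasicProperties();
props.DeliveryMode = 2;
```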
On To The Code!
Our back-end code is written in C#. RabbitMQ has a great .NET client library. Some lessons we’ve learned:
- Use a single IConnection per server (see the accepted answer on this question for a good explanation). All traffic to and from RabbitMQ can flow across that single persistent connection.
- Don’t share IModels (“channels” in the RabbitMQ documentation) across threads.
- We’ve invested a lot of time into our RabbitMQ code base. If we were starting today we’d use EasyNetQ, a wrapper around the RabbitMQ client that greatly simplifies the most common usage patterns.
Let’s look at some simplified example code. First, we’ll add some code to initialize our singleton connection. We run this once per instance.
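A minimal sketch of that initialization, assuming the classic RabbitMQ.Client API; the class name, hostname, and credentials are placeholders, not our production values.

```csharp
using RabbitMQ.Client;

public static class RabbitConnection
{
    private static readonly object Sync = new object();
    private static IConnection _connection;

    // One IConnection per server process. All traffic flows over this
    // single persistent connection; channels (IModels) are created
    // from it as needed and never shared across threads.
    public static IConnection Instance
    {
        get
        {
            lock (Sync)
            {
                if (_connection == null || !_connection.IsOpen)
                {
                    var factory = new ConnectionFactory
                    {
                        HostName = "rabbitmq.example.internal", // placeholder
                        UserName = "guest",                     // placeholder
                        Password = "guest"                      // placeholder
                    };
                    _connection = factory.CreateConnection();
                }
                return _connection;
            }
        }
    }
}
```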
Then we need a producer that pushes messages into a queue.
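A producer might look like the sketch below. The queue name is hypothetical, and RabbitConnection stands in for whatever holds the per-server singleton connection.

```csharp
using System.Text;
using RabbitMQ.Client;

public class RosterImportProducer
{
    private const string QueueName = "roster-imports"; // illustrative name

    public void Publish(string messageJson)
    {
        // Channels are cheap; create one per operation (or per thread).
        // Never share an IModel across threads.
        using (IModel channel = RabbitConnection.Instance.CreateModel())
        {
            channel.QueueDeclare(QueueName, durable: true, exclusive: false,
                                 autoDelete: false, arguments: null);

            var props = channel.CreateBasicProperties();
            props.DeliveryMode = 2; // persistent: written to disk in a durable queue

            channel.BasicPublish(exchange: "",          // default exchange routes by queue name
                                 routingKey: QueueName,
                                 basicProperties: props,
                                 body: Encoding.UTF8.GetBytes(messageJson));
        }
    }
}
```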
Finally, the consumer. Consumers pull messages off the queue one at a time, process each message, and then tell Rabbit “I got this”. Rabbit then discards the message. If something goes wrong and we don’t acknowledge the message, Rabbit will eventually re-deliver it. It’s important that consumers are implemented with this in mind: messages can be processed more than once.
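A consumer following that pattern could be sketched like this, again against the classic client API (where the message body is a byte array); ProcessMessage is a hypothetical handler and must be idempotent, since redelivery is possible.

```csharp
using System;
using System.Text;
using RabbitMQ.Client;
using RabbitMQ.Client.Events;

public class RosterImportConsumer
{
    private const string QueueName = "roster-imports"; // illustrative name

    public void Start()
    {
        IModel channel = RabbitConnection.Instance.CreateModel();
        channel.BasicQos(0, 50, false); // prefetch: at most 50 unacked messages

        var consumer = new EventingBasicConsumer(channel);
        consumer.Received += (sender, args) =>
        {
            try
            {
                var message = Encoding.UTF8.GetString(args.Body);
                ProcessMessage(message); // hypothetical handler; must be idempotent
                channel.BasicAck(args.DeliveryTag, multiple: false); // "I got this"
            }
            catch (Exception)
            {
                // No ack sent: RabbitMQ will eventually re-deliver the message.
            }
        };

        // Second argument is noAck: false — we acknowledge explicitly
        // only after processing succeeds.
        channel.BasicConsume(QueueName, false, consumer);
    }

    private void ProcessMessage(string message) { /* work happens here */ }
}
```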
An official WCF driver exists as well but, at the time we began using RabbitMQ, it needed some work before being production-ready. We made some pretty major modifications before running RabbitMQ over WCF at Hudl.
Wrapping It Up
RabbitMQ and C# work well together. We’ve been using it in production for three years now and are still happy with the choice. It’s easy to get up and running with just the basic functionality, but it’s also flexible if/when you want to use more complex routing scenarios. RabbitMQ is also very fast; we regularly push 1,500 messages/sec and Rabbit barely breaks a sweat. If you want to move some logic off of your web servers, it’s a great choice for doing so in a durable and performant way.