We’ve adopted some naming conventions. They make it easier to understand how our servers are related to each other and who can communicate with whom on what ports, and they let us filter precisely via the API or from the management console. As you grow beyond tens and then hundreds of servers, standards like these will keep your engineering teams sane.
Running sites with high availability is a foregone conclusion for most businesses. There are plenty of blog posts and articles out there talking about “nines”, but few really describe how to actually measure availability. Five nines of what, exactly? How do you measure continuous uptime of a website that serves discrete HTTP requests? Here’s how we measure server-side availability, both overall and for our individual microservices on hudl.com.
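One common way to answer "nines of what?" for a site that serves discrete requests is to count requests rather than minutes: availability is the share of requests that did not fail server-side. The sketch below assumes that definition and treats any 5xx response as a failure. The `RequestCounts` type, the function names, and the sample service names are hypothetical illustrations, not the measurement described in the post.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class RequestCounts:
    total: int          # all requests handled in the measurement window
    server_errors: int  # requests answered with a 5xx status (assumed failure signal)


def availability(counts: RequestCounts) -> float:
    """Fraction of requests that did not fail server-side."""
    if counts.total == 0:
        return 1.0  # assumption: a window with no traffic counts as fully available
    return (counts.total - counts.server_errors) / counts.total


def overall(per_service: Dict[str, RequestCounts]) -> RequestCounts:
    """Combine per-service counts into one site-wide count."""
    return RequestCounts(
        total=sum(c.total for c in per_service.values()),
        server_errors=sum(c.server_errors for c in per_service.values()),
    )


if __name__ == "__main__":
    # Hypothetical service names and counts, for illustration only.
    services = {
        "users": RequestCounts(total=900_000, server_errors=90),
        "video": RequestCounts(total=100_000, server_errors=910),
    }
    for name, counts in services.items():
        print(f"{name}: {availability(counts):.4%}")
    print(f"overall: {availability(overall(services)):.4%}")
```

Weighting by request count means a busy, healthy service can mask a small, unhealthy one in the overall number, which is one reason to track each service separately as well.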
In August we migrated our core user data (around 5.5MM user records) from SQL Server to MongoDB. We moved the data during the daytime while still taking full production traffic, maintaining nearly 100% availability for reads and writes during the course of the migration. Our CPO fittingly described it as akin to “swapping out a couple of the plane’s engines while it’s flying at 10,000 feet.” I’d like to share our approach to the migration and some of the code we used to do it.
We took time to optimize our EC2 instance types. By finding the maximum load a server could handle, we were able to run a quarter as many app servers. Our hourly spend dropped by 50%. On top of the huge cost savings, we also saw a 2x improvement in response times! This came about by moving to a newer instance family.
One part of Hudl I frequently have to explain to people outside the company is the structure of our product team. Fellow developers at other companies, friends I graduated with, and plenty of people in between want to know how Hudl works—and as it turns out, there’s a lot to talk about. We’re constantly evolving and learning more about how to keep our heads on straight, and as we do, we want to get the lessons learned on the table.
Speed, innovation, and creativity… the key components of creative genius. Skunkworks is specifically designed to unleash the creative wrath of our product team. At Hudl, we use Skunkworks to explore new technologies and tools that make us better at what we do.
At Hudl, we like to move quickly. We are constantly fixing issues, building new features, and improving the experience for our coaches and athletes. We put a lot of thought into how we work and dedicate a lot of time to making sure we are working as efficiently as we can. So, when we began to run into major bottlenecks in our deployment process, we realized we needed a major change. We came up with a plan to break our monolithic application into smaller components, and thus The Multiverse was born.
As a company, we understand that one of our key competitive edges is moving quickly. We develop and ship new features continuously. Before we started moving toward our microapplication architecture, we were deploying our monolithic application ten times a day. Even though the number of Monolith deployments is trending down as we break it out into multiple, smaller applications, we still rely on our deployment infrastructure to deliver multiple payloads daily.
An autoscaling farm of AWS EC2 instances sits behind our front-facing web application, working on heavy, long-running tasks like video transcoding, thumbnail generation, and computer vision processing. It’s a battle-tested combination of queues, worker instances, and an orchestration service called Lifeguard that easily hammers through thousands of these CPU-bound jobs per minute.
Limiting yourself to one area of expertise introduces blind spots to your code. Learning several technologies across multiple layers in the stack makes you a better programmer, reduces bugs, and increases your value.
Logging in your application is super important. Early on it’ll be fine to peruse your logs manually. As traffic increases you’ll soon have a need to aggregate all that data into one place so it can be easily searched. At Hudl, we chose Splunk. There are a lot of competing products, but it works well for us.
Either way, my advice is clear: Log. All the things. You won’t regret it.
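Logs are easier to search once aggregated if each line carries its details as `key=value` pairs, a shape that Splunk and similar tools can generally pick up as fields. Here is a minimal sketch using Python's standard `logging` module. The `kv` helper, the logger name, and the field names are hypothetical, not taken from Hudl's code.

```python
import logging


def kv(**fields) -> str:
    """Render fields as space-separated key=value pairs, quoting values that need it."""
    parts = []
    for key, value in fields.items():
        text = str(value)
        if " " in text or '"' in text:
            # Quote values containing spaces or quotes so the pair stays parseable.
            text = '"' + text.replace('"', '\\"') + '"'
        parts.append(f"{key}={text}")
    return " ".join(parts)


if __name__ == "__main__":
    logging.basicConfig(
        level=logging.INFO,
        format="%(asctime)s level=%(levelname)s logger=%(name)s %(message)s",
    )
    log = logging.getLogger("app.requests")  # hypothetical logger name
    log.info(kv(event="request_finished", path="/home", status=200, duration_ms=37))
    log.warning(kv(event="upload_failed", reason="connection reset by peer"))
```

Keeping field names consistent across services matters more than the exact format, since searches and dashboards end up keyed on those names.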