This post is a case-study about how we fixed some Redshift performance issues we started running in to as we had more and more people using it. A lot of the information I present is documented in Redshift’s best practices, but some of it isn’t easily findable or wasn’t obvious to us when we were just getting started. I hope that reading this post will save you some time if you are just getting started with Redshift and want to avoid some of the pitfalls that we ran into!
Over the last year, the Data Engineering squad has been building a data warehouse called Fulla. Recently, the squad rethought our entire data warehouse stack. We’ve now released Fulla v2 and Hudlies are querying data like never before giving us a better understanding of our customers and our product.
At Hudl, each squad on the product team takes two weeks each year to help out the coach relations team in an ongoing rotation known as Firesquads. This year, for Firesquads, the data science squad built a Naive Bayes classifier to automate the task of categorizing emails.
Hudl stores petabytes of video. In that video there are a lot of awesome plays. Figuring out which plays are the most interesting and sifting through the uninteresting footage is a huge challenge. To solve this problem, we leveraged deep learning, Amazon Mechanical Turk, and crowd noise. The result: basketball highlights!