Imagine this. There are ten seconds left on the game clock. Your favorite team is down by a touchdown but has driven the ball all the way to the 2-yard line. With no timeouts remaining, this is the final play for either team. To add to the tension, you have money on the line: your friend bet you $100 that your team would lose. Your quarterback takes the snap, steps up in the pocket, finds his target, and… the TV goes black. Your little brother accidentally tripped over the power cord.
So what happened? Did your team win? Are you up a hundred or down a hundred?
For most people in this situation, the answers to those questions are little more than guesses. Thinkful graduate Albert Troszczynski isn’t most people, however. Using data science, Albert can predict whether or not a team will score a touchdown on a given play with an astonishing 96% accuracy.
Creating a nearly perfect model
Like many data scientists, Albert started with a research question; in his case, two research questions:
- What is the most computationally efficient model for predicting touchdowns before the ball is snapped?
- What factors influence the likelihood of scoring?
Next, he found a data set on Kaggle that contained the information he needed to build his model. This data set featured details about every regular season play for every team over the past 10 years. In total, he analyzed 407,688 plays and more than 100 different features of those plays.
In the NFL, teams have a maximum of 40 seconds between plays. In some situations, however, particularly toward the end of a close game, the time between snaps is far shorter. So when Albert began building his model, he paid close attention not only to its accuracy but also to how quickly it could run.
The result: using logistic regression, Albert's model correctly predicts whether or not a touchdown will be scored on a given play 96% of the time, in under half a second.
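For readers curious about what "logistic regression" means here, the sketch below fits a tiny logistic regression by batch gradient descent in pure Python. It is not Albert's model: his feature set, code, and training setup are not described in this story. The two features (yards to the goal and yards to a first down, scaled to roughly 0 to 1), the eight plays, and the function names `fit_logistic` and `predict_proba` are all hypothetical, chosen only to show how the model turns play features into a touchdown probability.

```python
import math

def sigmoid(z: float) -> float:
    # Numerically stable logistic function: maps any real number into (0, 1).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def fit_logistic(X, y, lr=0.5, epochs=2000):
    """Fit weights w and bias b by batch gradient descent on mean log-loss.
    X: list of feature lists; y: list of 0/1 labels (1 = touchdown)."""
    n_features = len(X[0])
    w = [0.0] * n_features
    b = 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi in zip(X, y):
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b)
            err = p - yi  # derivative of log-loss with respect to the linear score
            for j in range(n_features):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict_proba(w, b, x):
    # Estimated probability that a play with features x ends in a touchdown.
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

# Hypothetical toy plays: [yards to goal / 100, yards to first down / 10].
X = [[0.02, 0.2], [0.05, 0.5], [0.01, 0.1], [0.03, 0.3],
     [0.60, 1.0], [0.75, 0.8], [0.40, 0.5], [0.85, 1.0]]
y = [1, 1, 1, 0, 0, 0, 0, 0]
w, b = fit_logistic(X, y)
```

Calling `predict_proba(w, b, [0.02, 0.2])` for a snap from the 2-yard line should return a higher probability than a snap from midfield. Scoring a single play is just a dot product and a sigmoid, which helps explain why a logistic regression can make a prediction well within the time between snaps. In practice you would more likely use a library implementation such as scikit-learn's `LogisticRegression` than hand-rolled gradient descent.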
A deeper look at the model shows which features are the best predictors. Unsurprisingly, data points like distance to the goal line and yards needed for a first down are among the most predictive. What Albert didn't expect: the New Orleans offense has been so prolific that simply being the New Orleans Saints is one of the top 10 predictors of whether or not a touchdown will be scored!
The model also shows how the down (1st, 2nd, 3rd, or 4th) and the time remaining in a game correlate with scoring touchdowns. Teams are most likely to score touchdowns on 3rd and 4th downs and during the 2nd and 4th quarters.
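A simpler way to see how down and quarter relate to scoring, without any model, is to compute the touchdown rate for each down and each quarter. The sketch below does this in pure Python. The field names `down`, `qtr`, and `touchdown` and the six plays are hypothetical placeholders, not the actual Kaggle data set, and a raw rate like this is a descriptive view, not the model's coefficients.

```python
from collections import defaultdict

def touchdown_rate_by(plays, key):
    """Return {value of `key`: touchdowns / plays} over the given play records."""
    counts = defaultdict(lambda: [0, 0])  # value -> [touchdowns, plays]
    for play in plays:
        bucket = counts[play[key]]
        bucket[0] += play["touchdown"]
        bucket[1] += 1
    return {value: tds / n for value, (tds, n) in counts.items()}

# Hypothetical play records; touchdown is 1 if the play scored, else 0.
plays = [
    {"down": 1, "qtr": 1, "touchdown": 0},
    {"down": 2, "qtr": 2, "touchdown": 0},
    {"down": 3, "qtr": 2, "touchdown": 1},
    {"down": 4, "qtr": 4, "touchdown": 1},
    {"down": 1, "qtr": 4, "touchdown": 0},
    {"down": 3, "qtr": 3, "touchdown": 0},
]
by_down = touchdown_rate_by(plays, "down")
by_quarter = touchdown_rate_by(plays, "qtr")
```

On this toy data, `by_down` is `{1: 0.0, 2: 0.0, 3: 0.5, 4: 1.0}`. With real play-by-play data, the same grouping would show whether the pattern described above holds before any model is fit.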
The Most Efficient and Consistent Offenses
Thinkful is now in a number of cities with NFL teams, including Los Angeles and Washington, DC, so we try to stay neutral when it comes to picking sides. However, we're more than happy to let data science tell us which teams to pay attention to. For example, Albert's model provides new insight into which teams have the most efficient offenses (measured by touchdowns per play) and which have the most consistent offenses (measured by the distribution of touchdowns per play across all games). In other words, more efficient offenses need fewer plays to score a touchdown, while more consistent offenses tend to need a similar number of plays to score in each game.
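The two measures can be sketched in a few lines of Python. Efficiency is total touchdowns divided by total plays. For consistency, the story only says "the distribution of touchdowns per play across all games", so summarizing that distribution with its population standard deviation is an assumption made here (lower spread means more consistent). The `offense_metrics` function, the `(team, game_id, touchdown)` record shape, and the toy teams "A" and "B" are hypothetical.

```python
from collections import defaultdict
from statistics import pstdev

def offense_metrics(plays):
    """plays: iterable of (team, game_id, touchdown) tuples, touchdown in {0, 1}.
    Returns {team: (efficiency, spread)} where efficiency is touchdowns per play
    over all plays and spread is the population standard deviation of each
    game's touchdowns per play (lower spread = more consistent offense)."""
    per_game = defaultdict(lambda: [0, 0])  # (team, game) -> [touchdowns, plays]
    for team, game, td in plays:
        record = per_game[(team, game)]
        record[0] += td
        record[1] += 1

    totals = defaultdict(lambda: [0, 0])  # team -> [touchdowns, plays]
    game_rates = defaultdict(list)        # team -> touchdowns per play, per game
    for (team, _game), (tds, n) in per_game.items():
        totals[team][0] += tds
        totals[team][1] += n
        game_rates[team].append(tds / n)

    return {team: (tds / n, pstdev(game_rates[team]))
            for team, (tds, n) in totals.items()}

# Hypothetical toy data: team A scores once every 4 plays in both games;
# team B scores on 1 of 2 plays in game 1 and 0 of 4 plays in game 2.
plays = (
    [("A", 1, 1)] + [("A", 1, 0)] * 3
    + [("A", 2, 1)] + [("A", 2, 0)] * 3
    + [("B", 1, 1), ("B", 1, 0)]
    + [("B", 2, 0)] * 4
)
metrics = offense_metrics(plays)
```

Here team A is both more efficient (2 touchdowns in 8 plays) and more consistent (the same rate in each game) than team B, whose per-game rates swing from 0.5 to 0.0.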
You can see how your team stacks up in the chart below:
Share Our Story
On behalf of all NFL fans at Thinkful, we’d love for you to share this story to help celebrate the 2018-2019 season.
If you’d like to cite this story in your digital publication or blog, please mention that this project was conducted by Albert Troszczynski, a student in Thinkful’s data science bootcamp.
If you'd like to chat with Albert about his analysis, or with any of Thinkful's data science experts, please email Adam Levenson.