Imagine this. There are ten seconds left on the game clock. Your favorite team is currently down, but they've just driven the ball all the way to the 2-yard line. With no timeouts remaining, this is the final play for either team. Oh yeah, and you've got $250 on the line. Your quarterback takes the snap, steps up in the pocket, finds his target, and…
For most people in this situation, predicting whether they'll walk away with money in their pocket is no better than a random guess. Thinkful graduate Albert Troszczynski isn’t most people, however. Using the power of data science, Albert can predict whether or not a team will score a touchdown with an astonishing 96% accuracy.
Creating a nearly perfect model
Like many data scientists, Albert started off with a research question, or rather two of them:
- What is the most computationally efficient model for predicting touchdowns before the ball is snapped?
- What factors influence the likelihood of scoring?
Next, he found a data set on Kaggle that contained the information he needed to build his model. This data set featured details about every regular season play for every team over the past 10 years. In total, he analyzed 407,688 plays and more than 100 different features of those plays.
In the NFL, teams have a maximum of 40 seconds between plays. In some situations, however, particularly towards the end of a close game, the time between snaps is far shorter. Therefore, when Albert began to build his model, he paid close attention not only to the accuracy of the model but also to the speed at which it could run.
The result: using logistic regression, Albert's model accurately predicts whether or not a touchdown will be scored on a given play 96% of the time, and it does so in under half a second.
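For readers curious what a logistic regression of this kind looks like, here is a minimal sketch in plain Python. It is not Albert's code: the two features, the synthetic plays, the made-up scoring rule, and the helper names (`train_logistic`, `predict_proba`, `make_play`) are all assumptions for illustration, whereas the real model drew on more than 100 features from the Kaggle data set.

```python
import math
import random
import time


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def predict_proba(w, b, x):
    """Estimated probability that a play with features x ends in a touchdown."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def train_logistic(X, y, lr=0.1, epochs=200):
    """Fit weights and bias by batch gradient descent on the log loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = predict_proba(w, b, xi) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


# Synthetic plays (an assumption, NOT the Kaggle data): scoring gets less
# likely as the distance to the goal line grows.
random.seed(0)


def make_play():
    yards_to_goal = random.randint(1, 99)
    yards_to_first = random.randint(1, 10)
    p_td = sigmoid(2.0 - 0.15 * yards_to_goal)  # made-up scoring rule
    scored = 1 if random.random() < p_td else 0
    # Scale both features to roughly [0, 1] so gradient descent behaves.
    return [yards_to_goal / 100.0, yards_to_first / 10.0], scored


plays = [make_play() for _ in range(1000)]
X = [features for features, _ in plays]
y = [scored for _, scored in plays]
w, b = train_logistic(X, y)

# Time two pre-snap predictions: one from the 2-yard line, one from 80 yards out.
start = time.perf_counter()
p_close = predict_proba(w, b, [0.02, 0.5])
p_far = predict_proba(w, b, [0.80, 0.5])
elapsed = time.perf_counter() - start
print(f"2-yard line: {p_close:.3f}, 80 yards out: {p_far:.3f}, {elapsed:.6f}s")
```

Once the weights are fitted, scoring a play takes a handful of multiplications and one sigmoid, which is one reason a model like this can answer well within the time between snaps.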
A deeper dive into the model helps us understand which features are the best predictors. Unsurprisingly, data points like distance to the goal and yardage from a first down are among the most predictive. What Albert didn't expect: the New Orleans offense has been so prolific that simply being the New Orleans Saints is one of the top 10 predictors of whether or not a touchdown will be scored!
Additionally, the model shows us how the down number (whether it's 1st, 2nd, 3rd or 4th) and the amount of time left in a game correlate with scoring touchdowns. Teams are most likely to score touchdowns on 3rd and 4th downs and during the 2nd and 4th quarters.
The Most Efficient and Consistent Offenses
Thinkful is now in a number of cities with NFL teams, including Los Angeles and Washington, DC, so we try to stay neutral when it comes to picking sides. However, we're more than happy to let data science tell us which teams to pay attention to. For example, Albert's model provides new insights into which teams have the most efficient offenses (measured by touchdowns per play) and which have the most consistent offenses (measured by the distribution of touchdowns per play across all games). In other words, more efficient offenses need fewer plays to score a touchdown, while more consistent offenses tend to need a similar number of plays to score a touchdown in each game.
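As a rough illustration of those two measures, the sketch below computes touchdowns per play for each team and, as a stand-in for "distribution," the standard deviation of that rate from game to game. The team names, the numbers, and the `offense_summary` helper are all hypothetical, and using the standard deviation is an assumption, since the article doesn't say how the distribution was summarized.

```python
from collections import defaultdict
from statistics import pstdev

# (team, game_id, plays_run, touchdowns): made-up numbers for illustration
games = [
    ("Team A", 1, 60, 3),
    ("Team A", 2, 60, 3),
    ("Team A", 3, 60, 3),
    ("Team B", 1, 50, 5),
    ("Team B", 2, 50, 0),
    ("Team B", 3, 50, 4),
]


def offense_summary(rows):
    """Return per-team efficiency (touchdowns/play overall) and spread
    (standard deviation of the per-game touchdowns/play rate).
    Assumes every game has at least one play."""
    totals = defaultdict(lambda: [0, 0])  # team -> [plays, touchdowns]
    per_game = defaultdict(list)          # team -> per-game TD rates
    for team, _game_id, plays, tds in rows:
        totals[team][0] += plays
        totals[team][1] += tds
        per_game[team].append(tds / plays)
    return {
        team: {
            "efficiency": tds / plays,
            "spread": pstdev(per_game[team]),
        }
        for team, (plays, tds) in totals.items()
    }


summary = offense_summary(games)
for team, stats in summary.items():
    print(team, f"efficiency={stats['efficiency']:.3f}", f"spread={stats['spread']:.3f}")
```

In this toy data, Team B is the more efficient offense because it scores more touchdowns per play overall, while Team A is the more consistent one because its rate is the same in every game.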
You can see how your team stacks up in the chart below:
Find out more about Thinkful's Data Science Immersion and Data Science Flex courses.
On behalf of all NFL fans at Thinkful, have an excellent Super Bowl LIV!