Financial Victory? An analysis of NFL wins from contract data.

14 Oct 2019

I’ve always been a Green Bay Packers fan, which may be a bit surprising given that I’m currently in Chicago as a part of the fall cohort of the Metis Data Science Bootcamp. But I am a Cheesehead born and raised. My family may not agree on everything, but we can agree when it comes to our NFL fandom.

Why look at contract data? My initial interest in this question comes from looking at the quarterback position. Since the 2010 season, the Super Bowl winning quarterbacks were Aaron Rodgers, Eli Manning, Joe Flacco, Russell Wilson, Tom Brady (three times), Peyton Manning, and Nick Foles. Of these, Joe Flacco and Russell Wilson were on their rookie contracts, while Nick Foles was a backup for a starter on a rookie contract.

I found this situation interesting. Considering also runners up in the Super Bowl, three additional quarterbacks on a rookie contract also appear. Furthermore, the 2018 NFL MVP Patrick Mahomes is still on his rookie contract. These contracts are notably cheaper than contracts given to the same players in future contracts.

This motivates my analysis. Does having additional cap space to spend on other position groups help a team win? More generally, what can linear regression say from spending patterns in relation to regular season wins?

To get started I got really well acquainted with BeautifulSoup, a library for working with HTML/XML files. With this, I scraped relevant contract data off of Sportrac.com. This data is somewhat limited, not by availability but by relevance, given that the collective bargaining agreement between the NFL and the NFL Players Association fundamentally determines the relevance of contract financials.

Once i had contract data, I then went to Wikipedia and acquired the remainder of my data, now came the hard part! This data is not at all suited to the techniques I have at hand, and as an NFL fan, it’s not the most direct measure of a teams success to know how they paid their punter!

Regardless, there is still important modeling considerations to entertain. For this project at Metis I was restricted to forms of linear regression (with regularization). Fighting overfitting became my main difficulty, as the dataset was quite small (~200 rows). I decided to use a LASSO regression (a linear regression with an L1 penalty on coefficients). What does that mean? Not only did it try to match my training data, but furthermore this algorithm punishes itself for the sum of the absolute values of the coefficients. By adding a contraint, the ability to overfit is reduced.

In the end, this model was not the greatest for betting at Vegas! My model on my testing set had a Root Mean Square Error of ~2.8. That is it’s generally around 3 games off on a 16 game season.