High-Dimensional Space

Some of my otherwise most ardent supporters recently called out the model for its somewhat unconventional and seemingly inconsistent attitude toward the Dolphins of Miami.  Last week, the fish traveled to Dallas as 20-something-point underdogs and the model told us to go wide-open throttle.  This week, we’ve got Miami at home against LAC and the model wants us to bet AGAINST the little tadpoles.  What the fuck is going on?

The most obvious explanation would be that LAC is significantly stronger than Dallas.  This claim has merit, at least based on the variables the model takes as inputs: LAC has the highest preseason DVOA of the three teams at 16.1%, while DAL sits at 5.1% and MIA at -30.6%.  DAL has performed much better in its opening weeks, however, heading into its MIA game with a 50.5% total DVOA, while LAC enters this week at a much lower -19.5%.  I put more stock in preseason DVOA this early in the season, so I didn’t see a grave inconsistency here, but the size of the edges admittedly looks odd.

It’s easy to visualize models that only have 2 dimensions, X and y; you throw up a scatterplot, look where the points fall, and you have a good visual understanding of the model.  Even with 3 dimensions, you can look at funny 3-D plots and get a feel for your data.  But DomModel2019 computes across 8 dimensions, and that’s a little tougher to visualize.

In order to get a feel for how the model is handling Miami, let’s hold a few variables constant.  The relationship we’re most interested in is how changes in the spread affect Miami’s cover probability.  If we hold the DVOAs for MIA and LAC constant, we can fiddle with the spread and the points total for any given week to see how the model would predict MIA’s success.
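A mesh like the ones below can be produced by sweeping a grid of spreads and totals while feeding the model the same fixed DVOAs every time. The sketch below assumes the model is exposed as a function that takes a dict of features and returns a cover probability; `toy_predict`, its coefficients, and all feature names are hypothetical stand-ins, not DomModel2019 itself. Spread is expressed as the points the underdog receives.

```python
import math
from itertools import product


def cover_prob_mesh(predict, fixed, spreads, totals):
    """Evaluate predict() on every (spread, total) pair, holding `fixed` constant.

    predict: callable taking a feature dict and returning a cover probability.
    fixed:   features held constant for the whole sweep (e.g. both teams' DVOAs).
    Returns a dict mapping (spread, total) -> cover probability.
    """
    mesh = {}
    for spread, total in product(spreads, totals):
        features = dict(fixed, spread=spread, total=total)
        mesh[(spread, total)] = predict(features)
    return mesh


def toy_predict(f):
    # Hypothetical stand-in for the real model: a simple logistic curve.
    # The coefficients are invented for illustration only.
    z = (0.02 * f["spread"]
         - 0.01 * (f["total"] - 45)
         + 0.5 * (f["dog_preseason_dvoa"] - f["fav_preseason_dvoa"]))
    return 1 / (1 + math.exp(-z))


# Preseason DVOAs from the post: MIA -30.6% (underdog), LAC 16.1% (favorite)
fixed = {"dog_preseason_dvoa": -0.306, "fav_preseason_dvoa": 0.161}
spreads = [3, 7, 10, 14, 17, 21]
totals = [38, 42, 46]
mesh = cover_prob_mesh(toy_predict, fixed, spreads, totals)

# Print one row of cover probabilities per points total
for total in totals:
    print(total, [round(mesh[(s, total)], 3) for s in spreads])
```

A toy logistic curve like this can only move in one direction along each axis; the real model's surface can twist, which is exactly what the plots show.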

LAC @ MIA

Think of this as a sort of “mesh,” a non-linear surface that twists itself around.  That non-linearity shows how sophisticated machine learning models can be, and how difficult their output can be to interpret; and this is only across 2 of the model’s 8 dimensions.

According to the model, no combination of spread and o/u would let MIA cover a majority of the time.  More surprising, Miami’s cover probability as the underdog actually decreased as its spread increased.  That’s not impossible to reconcile: we might conclude that a larger spread indicates a much stronger opponent, making it harder for Miami to cover.  But it’s the opposite of the relationship I anticipated.  I would have expected the number of points Miami receives to correlate strongly and positively with its cover probability.  Curious, I decided to run the same mesh for the Dallas game.

MIA @ DAL

The different combination of DVOAs yields fantastically different results.  Here, I get the relationship I expect, although there is a sort of “twist” where the relationship changes direction.
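One way to pin down a “twist” like this is to take a single slice of the mesh (one fixed points total) and find the spreads where the slope of cover probability changes sign. A minimal sketch; `direction_changes` is a hypothetical helper, and the example numbers are made up for illustration, not model output.

```python
def direction_changes(points):
    """Return the spreads at which the slope of cover probability flips sign.

    points: list of (spread, cover_prob) pairs, e.g. one slice of a mesh
            taken at a fixed total. Flat segments are skipped.
    """
    pts = sorted(points)
    changes = []
    prev_sign = 0
    for (s0, p0), (s1, p1) in zip(pts, pts[1:]):
        diff = p1 - p0
        sign = (diff > 0) - (diff < 0)  # +1 rising, -1 falling, 0 flat
        if sign == 0:
            continue
        if prev_sign and sign != prev_sign:
            changes.append(s0)  # record the spread where the new direction begins
        prev_sign = sign
    return changes


# Made-up slice: rises until a 10-point spread, then falls
print(direction_changes([(3, 0.40), (7, 0.45), (10, 0.48), (14, 0.46), (17, 0.44)]))  # [10]
```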

I thought about these “meshes” for a long time.  The probabilities on the Dallas game seemed counterintuitively high, and the relationship for LAC this week just didn’t feel right.

I made a key error when I put the model into production.  The model was tuned on 80% of my NFL data, with the other 20% held out completely for testing.  This was done with good intentions: we need to make sure our evaluation of the model is clean, and that information from the test set does not “leak” into training and inflate our measured performance through overfitting.  But that 20% can also contain key information, especially for games at the “edge” of the model, where sample sizes are smaller: games like MIA’s, with uncommonly high spreads.
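To see why the held-out games matter most at the edges, consider a deliberately simple, hypothetical model (not DomModel2019): a smoothed underdog cover rate per 7-point spread bucket, with a single hyperparameter `alpha`. If the only big-spread games happen to land in the test split, the model fit on the training split has never seen that bucket. Refitting on every game while keeping `alpha` fixed, rather than re-tuning it, brings those games back in. The function names and toy data are assumptions for illustration.

```python
from collections import defaultdict


def fit_bucket_model(games, alpha):
    """Toy model: smoothed underdog cover rate per 7-point spread bucket.

    games: iterable of (spread, covered) pairs, covered being 0 or 1.
    alpha: hyperparameter; pseudo-games that pull each bucket toward 50%.
    """
    counts = defaultdict(lambda: [0, 0])  # bucket -> [covers, games]
    for spread, covered in games:
        bucket = int(abs(spread) // 7)  # 0: under 7, 1: 7 to <14, 2: 14 to <21, ...
        counts[bucket][0] += covered
        counts[bucket][1] += 1
    return {b: (c + 0.5 * alpha) / (n + alpha) for b, (c, n) in counts.items()}


def refit_on_all(train, test, alpha):
    # alpha was chosen using the training data only and is NOT re-tuned here
    return fit_bucket_model(list(train) + list(test), alpha)


train = [(3, 1), (3, 0), (10, 1)]
test = [(17, 1), (17, 0), (17.5, 1)]  # the only big-spread games were held out
alpha = 2

print(fit_bucket_model(train, alpha).get(2))  # None: no 14+ point games seen
print(refit_on_all(train, test, alpha)[2])  # 0.6
```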

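I decided to re-train the model on the entire dataset using the same hyperparameters, rather than re-tuning them, so the held-out games played no part in choosing the hyperparameters and the risk of overfitting stays low.  This changed the mesh for LAC substantially: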

LAC @ MIA (updated model)

And for DAL mildly:

MIA @ DAL (updated model)

Most importantly, the retrained model produced much more realistic edges for this week (note that lines have shifted in some games):

| team_away | team_home | cover_team | cover_spread | cover_prob | cover_edge |
|---|---|---|---|---|---|
| Los Angeles Chargers | Miami Dolphins | LAC | -17.0 | 60.25% | 15.03% |
| Dallas Cowboys | New Orleans Saints | DAL | -3.0 | 57.91% | 10.56% |
| Jacksonville Jaguars | Denver Broncos | JAX | +3.0 | 53.63% | 2.38% |
| Minnesota Vikings | Chicago Bears | MIN | +2.5 | 52.81% | 0.82% |
| Washington Redskins | New York Giants | WAS | +3.0 | 52.35% | -0.07% |
| New England Patriots | Buffalo Bills | BUF | +7.0 | 52.06% | -0.61% |
| Tennessee Titans | Atlanta Falcons | TEN | +5.0 | 51.79% | -1.13% |
| Philadelphia Eagles | Green Bay Packers | PHI | +4.5 | 51.76% | -1.18% |
| Oakland Raiders | Indianapolis Colts | OAK | +7.0 | 51.65% | -1.39% |
| Cleveland Browns | Baltimore Ravens | CLE | +5.5 | 51.49% | -1.69% |
| Carolina Panthers | Houston Texans | CAR | +4.0 | 51.34% | -2.0% |
| Tampa Bay Buccaneers | Los Angeles Rams | LAR | -10.0 | 51.07% | -2.49% |
| Cincinnati Bengals | Pittsburgh Steelers | CIN | +4.5 | 50.97% | -2.69% |
| Seattle Seahawks | Arizona Cardinals | SEA | -4.0 | 50.63% | -3.34% |
| Kansas City Chiefs | Detroit Lions | KC | -7.0 | 50.0% | -4.55% |
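The post doesn’t spell out how cover_edge is calculated, but the values in the table are consistent, to within rounding, with expected profit per unit staked at standard -110 juice: for example, 0.6025 * (100 / 110) - 0.3975 ≈ 0.150. A minimal sketch under that assumption; `cover_edge` here is a hypothetical helper that simply reuses the column name.

```python
def cover_edge(cover_prob, american_odds=-110):
    """Expected profit per 1 unit staked on a bet that wins with cover_prob.

    american_odds: price of the bet. -110 (standard spread juice) is an
    assumption, since the post does not state the odds used.
    """
    if american_odds < 0:
        win_profit = 100 / -american_odds  # e.g. -110 -> win about 0.909 per 1 staked
    else:
        win_profit = american_odds / 100  # e.g. +120 -> win 1.2 per 1 staked
    # Win win_profit with probability p, lose the 1-unit stake otherwise
    return cover_prob * win_profit - (1 - cover_prob)


for prob in (0.6025, 0.5791, 0.5000):
    print(f"{prob:.2%} -> edge {cover_edge(prob):+.2%}")
```

At -110 the breakeven cover probability is 110/210, about 52.38%, which is why rows with cover_prob just under 52.4% show edges slightly below zero. The sketch gives +15.02% for the LAC row against the table’s 15.03%, presumably because the displayed probabilities are rounded.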

If you take nothing else from this post, take this: the model’s outputs are incredibly complex.  For the LAC game, a statement like “larger spreads are better for the dog, up to a point” would hold true.  For the DAL game, the exact opposite statement holds, and the difference between the two games lies in the variables we held constant: the DVOAs of the respective teams.
