Purdue-Michigan and Introduction to Football Analytics
So you've listened to the podcast and you've read the Predicto, but you're still not sure how Michigan can be favored by 21 points when Vegas has the spread at 10, and you probably need an MS to understand all that analytic stuff anyway, but it sure would be nice to be able to get your feet wet before that zlionsfan guy pushes you into the deep end in November, right? (Not that it's a good time now, since pretty much every sport is going on at once, never mind classwork for those of you who have it.)
Besides, you know analytics isn't just a guy in a basement cranking through Lotus 1-2-3 spreadsheets and running a fantasy baseball league by hand. Purdue has a Sports Analytics Club (BTW, some of those members seem to be pretty good at trivia), and if you don't mind hopping the fence at IUPUI, you can even get an MS in Sports Analytics now. (25 years too late for me, unfortunately.) So it fits pretty well with the Purdue fan base, but that doesn't mean you can just dive right in and start talking about Adjusted Line Yards and PPP+ yet ... what can you do to prepare for Homecoming?
Good news! I have just the thing for you. If you need to, think of this as CS/STAT 24250, Introduction to Football Analytics. Prerequisites for this class are a reasonable knowledge of basic football stats, an open mind, and refreshments of your choice - no CS or STATS background required unless you're going to do your own systems, in which case you want the 500-level class down the hall.
There are no pictures. Ask again and I'll assign you homework in Flash.
Once you start looking, you'll find any number of sites that have valuable data to page through, but for this class, we'll lean heavily on a couple of sites in particular: Football Outsiders, which does a lot of NFL stuff but also a reasonable amount of NCAA stats, and Football Study Hall, which focuses specifically on NCAA football and has a ton of good stuff from FO contributor Bill Connelly.
Disclaimer: I spent nearly a decade doing volunteer charting for Football Outsiders. Feel free to discount any or all of this info and use other sites as you prefer; there is no deadline to drop this class and doing so will not affect your GPA.
The purpose of analytics is to help explain the story in front of us, not to replace basic stats (in most cases; replacing player wins in any team sport would be a favor to all fans) or to override what you see yourself. Ideally, you'll use all three things together to get a better understanding of what happens on the field ... but even so, keep in mind that analytics focuses on probabilities. A system that predicts a 17-point Michigan win isn't saying that the Wolverines will win by 17; it's saying that based on the data it has, with matchups between teams like Purdue and Michigan, a 17-point UM win is the median outcome, but other factors can affect the expected margin of victory. Most notably, coaching changes tend to have unpredictable effects, mostly because you can't really separate a coach's ability from that of his assistants and his team: we can tell that Hazell really struggled and Brohm is off to a great start, but we can't really quantify how much of 2016 was on Hazell and how much of 2017 is credited to Brohm. (tl;dr: don't email a stats person saying "your system doesn't work because Howard beat UNLV"; your email will go into a folder marked "that's not what we're talking about here" and you will get no reply.)
One way that analytics can be used is to find things that correlate with winning; this can help us identify teams that are going through significant changes from the previous season, perhaps like the Good Guys are now. Connelly's done some of that work for us: go and read his article about the Five Factors in college football.
OK. Explosiveness, efficiency, drive-finishing, field position, and turnovers. It makes sense that these are things that winning teams will do more often ... but also that they are just reasonably predictive, rather than absolutely predictive. JakeTroch looked at games from the second half of the NCAA season over a ten-year period and found that even when one team comes into a game with all five factors on its side, that team wins less than 80% of the time. Again, this makes sense: an excellent coach can create a game plan that can take down a better team, conditions can create an environment that favors a weaker team, etc. etc.
Still, 79.1% is a pretty good number. So we'll keep that in mind when looking at Purdue and Michigan while we figure out why it's likely that this won't be a turning point for the program. Let's start by looking at each of the Five Factors, first with a look at overall numbers, then a deeper dive into individual components.
National average success rate: 40.0%
When Purdue has the ball: Purdue 50.0% (18th), Michigan 26.9% (7th)
When Michigan has the ball: Michigan 36.8% (108th), Purdue 38.6% (60th)
Efficiency, or success rate, is roughly the percentage of the time that you get at least the number of yards you need on a play to set yourself up for a first down (or to actually get one, if it's third down). The average I-A team (all stats here are for I-A teams; there are currently 130 of them) does this on 40% of their plays. You can think of this as the "bend" portion of the "bend but don't break" defensive metaphor: that kind of defense will bend to allow a higher success rate, but won't break to allow an explosive play (which is our next section).
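If it helps to see the bookkeeping, here's a minimal sketch of how a success rate could be tallied. The thresholds (50% of the needed yards on first down, 70% on second, 100% on third and fourth) are an assumption based on the commonly cited college definition, not something spelled out above, and the sample plays are made up for illustration.

```python
# Share of the yards-to-go a play must gain to count as a success.
# ASSUMPTION: commonly cited college thresholds; not stated in this article.
REQUIRED_SHARE = {1: 0.5, 2: 0.7, 3: 1.0, 4: 1.0}


def is_successful(down, distance, yards_gained):
    """Return True if the play gained the required share of the yards to go."""
    return yards_gained >= REQUIRED_SHARE[down] * distance


def success_rate(plays):
    """plays: iterable of (down, distance, yards_gained) tuples."""
    plays = list(plays)
    if not plays:
        return 0.0
    return sum(is_successful(*play) for play in plays) / len(plays)


# Made-up plays, for illustration only.
sample_drive = [
    (1, 10, 6),  # needs 5.0 yards, gained 6: success
    (2, 4, 2),   # needs 2.8 yards, gained 2: failure
    (3, 2, 3),   # needs 2.0 yards, gained 3: success
    (1, 10, 3),  # needs 5.0 yards, gained 3: failure
    (2, 7, 5),   # needs 4.9 yards, gained 5: success
]

print(f"Success rate: {success_rate(sample_drive):.1%}")
```

Three of the five sample plays clear the bar, so this toy drive is well above the 40% national average.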
Overall, the numbers look like what you'd expect: Purdue's been moving the ball efficiently, successful on half of their plays, while Michigan's defense has limited opponents to just over half of Purdue's success rate. Going the other direction, Michigan's offense has struggled to move the ball effectively, while Purdue's defense has not been great at forcing second- or third-and-long. But what if we narrow this down further?
National rushing average: 41.6%
When Purdue runs the ball: Purdue 52.3% (15th), Michigan 33.0% (20th)
When Michigan runs the ball: Michigan 38.1% (98th), Purdue 32.8% (18th)
National passing average: 40.0%
When Purdue throws the ball: Purdue 48.6% (31st), Michigan 20.7% (6th)
When Michigan throws the ball: Michigan 34.9% (99th), Purdue 44.1% (93rd)
Of the four matchups, one leans heavily in one direction: when Michigan runs the ball. The Wolverines don't get consistent yards on the ground, and Purdue doesn't give them up. Understanding that there may be some scheme things that skew the data this way (for example, if Missouri's passing attack was so bad that Purdue didn't have to worry about it), one thing to watch for Saturday is what Michigan will do to change things up on the ground, since these numbers suggest that they'll continue to find themselves in long-yardage situations if they run the ball. Won't they?
National average: 1.17 PPP
When Purdue has the ball: Purdue 1.27 (37th), Michigan 1.17 (69th)
When Michigan has the ball: Michigan 1.38 (17th), Purdue 1.09 (55th)
PPP is Equivalent Points Per Play; in other words, not actually points divided by plays, but the value you got on the plays, which in this case are only successful plays (i.e. the ones that go into Success Rate). Connelly describes how changing to this model - looking at only PPP on successful plays, instead of all plays - suggests that big plays in general don't have nearly the impact we think they do. (That thinking usually comes from observer bias - we remember the big, high-impact plays because they're high-impact plays, not because they're big. Do you remember a 45-yard gain on a drive that ended in 0 points? Probably not. And if that drive ended in 7 points instead, it was likely from efficiency - getting first downs to continue the drive - rather than simply from explosiveness.) So this is the "break" component; how often does an opponent break a big play on your D?
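To make the "successful plays only" idea concrete, here's a small sketch comparing the two averages. The equivalent-point values are hypothetical numbers picked for illustration; the real ones come from Connelly's model, which isn't reproduced here.

```python
def ppp_on_successful_plays(plays):
    """Average equivalent points over successful plays only.

    plays: iterable of (was_successful, equivalent_points) pairs.
    """
    values = [points for was_successful, points in plays if was_successful]
    return sum(values) / len(values) if values else 0.0


def ppp_on_all_plays(plays):
    """Average equivalent points over every play, successful or not."""
    plays = list(plays)
    return sum(points for _, points in plays) / len(plays) if plays else 0.0


# HYPOTHETICAL equivalent-point values, for illustration only.
sample_plays = [
    (True, 0.5),
    (False, 0.0),
    (True, 2.5),    # the one big play
    (False, -0.25),
    (True, 0.75),
    (False, 0.0),
]

print(f"PPP on successful plays: {ppp_on_successful_plays(sample_plays):.2f}")
print(f"PPP on all plays:        {ppp_on_all_plays(sample_plays):.2f}")
```

The first number answers "when you succeed, how big is it?", which is the explosiveness question; how often you succeed is already covered by success rate.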
Both offenses are pretty good at getting points - or putting themselves in good position to score - off successful plays; both defenses struggle a little bit at containing successful plays.
National rushing average: 0.90 PPP
When Purdue runs the ball: Purdue 0.99 PPP (39th), Michigan 0.87 PPP (62nd)
When Michigan runs the ball: Michigan 1.10 PPP (16th), Purdue 0.80 PPP (47th)
National passing average: 1.48 PPP
When Purdue throws the ball: Purdue 1.45 PPP (65th), Michigan 1.60 PPP (97th)
When Michigan throws the ball: Michigan 1.80 PPP (20th), Purdue 1.30 PPP (46th)
Running the ball seems similar to overall: both teams can break long runs on occasion, and both teams give up long runs a little more than you'd want, but still are better than average about it. Michigan's prone to giving up big plays in the passing game, but the Boilers, sadly, aren't as explosive as you'd expect (CATCH THE BALL) ... which matches what you saw against Missouri. Three TD drives of 75+ yards with a long time of possession is the epitome of success rate; explosiveness is about breaking the big play long rather than getting caught from behind. Big plays are still useful, obviously, both from an emotional standpoint (getting the team and the crowd fired up) and from a situational standpoint (because now it's first and 10 again).
National average starting field position: 29.7 (just short of your own 30)
When Purdue takes possession: Purdue 29.5 (72nd), Michigan 26.6 (33rd)
When Michigan takes possession: Michigan 30.2 (64th), Purdue 29.8 (87th)
To no one's great surprise, the Boilers tend to start a bit deeper in their own end and tend to give up the ball a little closer to their own end zone, but as you can see, in both cases, they're effectively average. Michigan is much better at forcing opponents deep and a little better at getting the ball in good position. This is kind of a combination of all units - offense, defense, and special teams - so it'd be nice to dig into special-teams numbers a bit, since we have a lot on the offense and defense above.
When Purdue punts: Purdue success rate 75.0% (31st), Michigan return success rate 37.5% (91st)
When Michigan punts: Michigan 60.0% (64th), Purdue 25.0% (102nd)
When Purdue kicks off: Purdue 92.9% (28th), Michigan 0.0% (115th)
When Michigan kicks off: Michigan 93.3% (26th), Purdue 0.0% (115th)
Both teams have a ton of touchbacks on kickoffs; neither team has a successful kickoff return. Purdue's done a better job of punting; Michigan's done a better job of returning. It's fair to put most of the field position difference on the offense and defense.
National average: 4.32 points per trip inside opponents' 40
When Purdue has the ball: Purdue 5.35 (22nd), Michigan 3.00 (20th)
When Michigan has the ball: Michigan 3.40 (118th), Purdue 3.38 (27th)
No surprises here if you've been following both teams. Purdue has had newfound success across midfield, while Michigan has run into some unfamiliar problems there. Michigan's defense does a great job of slowing opponents to a crawl (or at least a FG attempt), while Purdue's stolen 3 TDs so far this season inside the 5 ... which is probably unsustainable. (See the Turnovers section below.)
Purdue has a clear edge here: even if the Wolverines drive consistently, if those red zone issues crop up, they could be enough to make a blowout into a close game ... or a close game into a real surprise. On the other hand, Michigan's defense is significantly better than Louisville's, so Purdue's success may not be as great when adjusting for opponent strength.
Purdue turnover margin: expected +1.31 (47th), actual +3 (23rd), turnover luck +2.82 PPG
Michigan turnover margin: expected +3.22 (12th), actual +1 (48th), turnover luck -3.70 PPG
A key to this section is remembering that in general, forcing fumbles is a teachable skill, but recovering fumbles is generally luck. The expected TO margin is calculated from splitting fumble recoveries 50/50 and adjusting interceptions to match the average national ratio between INTs and deflections (around 22%). You can see that there's about 6.5 points per game of turnover luck heading Purdue's way - that's a sizable advantage, and can help to explain why the computers see this as a bigger Michigan win than the carbon-based lifeforms do.
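For the curious, the turnover luck figures can be approximated from the expected and actual margins. This is a sketch, not the official formula: the 5 points per turnover and the 3 games played are assumptions, chosen because together they roughly reproduce the PPG numbers quoted above.

```python
POINTS_PER_TURNOVER = 5.0  # ASSUMPTION: not stated in the article
GAMES_PLAYED = 3           # ASSUMPTION: both teams have played three games


def turnover_luck_ppg(actual_margin, expected_margin,
                      games=GAMES_PLAYED,
                      points_per_turnover=POINTS_PER_TURNOVER):
    """Points per game gained (or lost) from turnovers beyond expectation."""
    return (actual_margin - expected_margin) * points_per_turnover / games


purdue = turnover_luck_ppg(actual_margin=3, expected_margin=1.31)
michigan = turnover_luck_ppg(actual_margin=1, expected_margin=3.22)

print(f"Purdue turnover luck:   {purdue:+.2f} PPG")
print(f"Michigan turnover luck: {michigan:+.2f} PPG")
print(f"Gap in Purdue's favor:  {purdue - michigan:.2f} PPG")
```

The gap between the two teams is where the "about 6.5 points per game" comes from.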
Purdue's recovered all 6 opponent fumbles this year, which gives them +3 over expected. The Boilers have 2 picks on the year, but 12 PDs; if such a thing were possible, they'd be short about half an interception or so (for scale, an average team with 24 PDs would have 5 INTs or so). Purdue's recovered 2 of their 3 fumbles, which is another +0.5; opponents have 4 picks and 13 PDs, which is about one more pick than you'd expect (so a -1). Since that doesn't quite add up right, there's something I'm missing in my explanation, but you get the idea.
Michigan is down 1 on fumbles (lost 3 of their 5, opponents lost 3 of their 7), a little short on their own INTs (3 picks, 15 PDs), and has had way too many opponent INTs (2 picks and just 3 PDs). If a Michigan fumble bounces back to the offense, or if an easy pick skips harmlessly off a Purdue defender's hands, that'll be turnover luck evening out.
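Here's the Purdue arithmetic done without rounding, as a rough sketch. It assumes expected interceptions are simply 22% of passes defensed and that every fumble is a coin flip, which is one reading of the description above and may not match the real calculation exactly.

```python
INT_PER_PD = 0.22  # national INT-to-deflection ratio quoted in the text


def fumble_luck(total_fumbles, recovered_by_team):
    """Recoveries above the 50/50 expectation for one set of fumbles."""
    return recovered_by_team - 0.5 * total_fumbles


def interception_luck(interceptions, passes_defensed):
    """Interceptions above what the PD count would suggest.

    ASSUMPTION: expected INTs = INT_PER_PD * passes_defensed.
    """
    return interceptions - INT_PER_PD * passes_defensed


# Purdue's numbers from the paragraphs above.
components = {
    "opponent fumbles recovered (6 of 6)": fumble_luck(6, 6),
    "own fumbles recovered (2 of 3)": fumble_luck(3, 2),
    "own INTs (2 on 12 PDs)": interception_luck(2, 12),
    # Extra picks thrown to the opponent count against Purdue.
    "opponent INTs (4 on 13 PDs)": -interception_luck(4, 13),
}

for label, value in components.items():
    print(f"{label}: {value:+.2f}")

total_luck = sum(components.values())
print(f"Total turnover luck: {total_luck:+.2f} turnovers")
```

Under those assumptions the pieces come to roughly +1.7 turnovers, which is close to the gap between Purdue's actual (+3) and expected (+1.31) margins.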
You knew it was coming. Sure, Purdue seems to have a statistical edge ... but we know Missouri and Ohio look pretty brutal, and let's be honest, Louisville's been kicked around by a couple of teams. So how do we account for that?
This is where the magic of modern computing comes in. These days, you can weight results based on who your opponents were. As the season plays out, we get a better idea of the quality of each team, so opponent adjustments become more understandable. (You don't think Minnesota is as good as Wisconsin, do you?) That also means that early-season ratings are usually based on some combination of past results and current results, sometimes with additional factors that correlate with expected results.
So for this data, we'll hop over to Football Outsiders and look at Connelly's S&P+.
Note: in sports statistics, + generally means that a statistic is adjusted for opponents or otherwise normalized so that direct comparisons are more valid.
Purdue: 71st overall, 66th offense, 75th defense, 76th special teams
Michigan: 11th overall, 49th offense, 2nd defense, 44th special teams
Now that seems like a clearer picture. Yes, Michigan's offense struggled, but they played one game against another probably-elite defense (Florida) and another against a defense that gambled by bringing run pressure in hopes that they wouldn't get repeatedly run over (Air Force). Purdue's played one mediocre defense (Ohio) and one terrible defense (Missouri).
S&P+ is based on play-by-play and drive data; the other component to FO's F/+ ratings is Brian Fremeau's FEI ratings, which are based on drive efficiency. This early in the season, Fremeau doesn't split them out into offense, defense, and special teams.
Purdue: 93rd overall, SOS 60th
Michigan: 11th overall, SOS 30th
Preseason projections are 57% of these numbers, which explains why Purdue is in Illinois territory and Michigan is about where you'd expect. This actually passes the eye test pretty well: nobody expected Brohm to get things going this quickly, and yet, here we are. The Boilers head into an early Homecoming as significant underdogs. They're probably going to lose big ... except if they're better than we expected, in which case ... well, if rating systems were 100% accurate, nobody would take bets on games, right?
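One way to picture that 57% is as a simple weighted average of the preseason projection and what's happened on the field so far. That's an assumption about how the blend works, not Fremeau's published method, and the ratings below are hypothetical.

```python
PRESEASON_WEIGHT = 0.57  # share of the rating that is preseason projection


def blended_rating(preseason, in_season, preseason_weight=PRESEASON_WEIGHT):
    """Weighted average of a preseason projection and an in-season rating.

    ASSUMPTION: a plain linear blend; the real system may differ.
    """
    return preseason_weight * preseason + (1 - preseason_weight) * in_season


# HYPOTHETICAL team: projected well below average, playing well above it.
rating = blended_rating(preseason=-10.0, in_season=10.0)
print(f"Blended rating: {rating:+.1f}")
```

Even with three good weeks on the books, a team like that still rates below average, which is the Purdue situation in a nutshell.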
S&P+ likes Michigan -17.7 (84.6%). The combined F/+ projection says -22.1 (89.9%), and the adjusted S&P+ projection says -32.3 (96.9%). That's ... well, that adjusted projection is a new feature this year. I suspect it needs some tweaking. (I hope it does.) Last week's article is here, but you can find a link to the Google sheet with Week 4 picks in it.
As I've mentioned elsewhere, Massey has Michigan as a 91% or -21.5 favorite, whichever way you want to look at it. Sagarin's normal system sees the Wolverines as somewhere between -11 and -16.5, but the new feature he's testing says -35.5. (Did I mention no system is right all the time?)
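If you're wondering how a point spread turns into a win percentage, here's a minimal sketch. It assumes the final margin is normally distributed around the projection; the standard deviation of 17.3 points is an assumption picked because it roughly reproduces the S&P+ and F/+ percentages quoted above, not a published parameter.

```python
import math

MARGIN_SD = 17.3  # ASSUMPTION: chosen to roughly match the quoted percentages


def win_probability(projected_margin, sd=MARGIN_SD):
    """Chance the favorite wins, given its projected margin of victory.

    Uses the normal CDF: P(actual margin > 0) when the actual margin is
    normally distributed around projected_margin with spread sd.
    """
    z = projected_margin / sd
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


projections = [("S&P+", 17.7), ("F/+", 22.1), ("adjusted S&P+", 32.3)]
for label, margin in projections:
    print(f"{label}: Michigan by {margin} -> {win_probability(margin):.1%}")
```

Massey's 91% at -21.5 implies a slightly tighter spread of outcomes, so no single value fits every system. Either way, even the most lopsided projection leaves Purdue a few chances in a hundred.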
Despite all that, remember that pretty much everyone knows that Purdue's much better off under Brohm, but no one knows exactly how much better they are. We won't necessarily find out Saturday, either ... but we might find out that predictions still haven't caught up with the Boilers.