Skeptical Football: A Tale Of Two Quarterbacks Struggling Against The Buffalo Bills

There’s a bit of a schism in sports fandom. On one side there are those who want more and more statistical analysis (Hi, everybody!); on the other there are those who think stats are overused and blanch at how sabermetrics and analytics have changed what it means to be a good fan.

But I have a theory about this latter group: In general, they’re not really anti-stats. Virtually every argument about sports on TV or online is made using stats of one sort or another.¹ A typical exchange between talking heads includes one guy emphasizing one set of stats (“He throws a lot of touchdowns!”), which is then countered by another (“But he throws too many interceptions!”). Almost no sports fans are truly anti-stats; they’re just anti-complicated, hard-to-understand stats.

And to some extent, they’re right. Over-reliance on advanced metrics can miss the forest for the trees, and vice versa. But, ideally, good stats aren’t meant to eradicate classic storylines or debates but to lend context to them (and, hopefully, to shed new light on difficult questions along the way). As usual, let me illustrate with an example using Peyton Manning and Aaron Rodgers.

The Denver Broncos and Green Bay Packers played the Buffalo Bills in Weeks 14 and 15 of this season, respectively. In both games, the MVP-candidate QBs “struggled” statistically. This shouldn’t be a total surprise: Despite having faced Manning, Rodgers and Tom Brady, Buffalo has had arguably the best defense in the NFL this year (judging by expected points denied per play).

But Rodgers’s and Manning’s stats seemed particularly bad. Each threw two interceptions and no TDs, for fewer than 200 yards. Manning’s 51-game TD streak ended, and Rodgers threw just his fourth and fifth INTs of the season.

The media wasn’t kind to either quarterback, but much of it was particularly brutal to Manning. Here’s the Colorado NBC affiliate: “Denver wins despite Manning’s worst game as a Bronco.” Meanwhile, a number of stories about Green Bay’s loss emphasized Rodgers’s lack of interceptions this year or the fact that his receivers dropped or tipped some key passes.

But not all no-TD, two-INT, 180-yard games are created equal. For example, Manning’s two interceptions were pretty “good” as far as interceptions go: The first was 42 yards downfield (which is practically a punt), and the other was 18 yards downfield on a third-and-12, with the Broncos up 21-3. In general, it’s a bad idea to judge a QB on a small number of passes in a game his team led wire to wire.

Besides, touchdowns and interceptions can be fickle. For example, a significant part of a QB’s apparent efficiency can come down to whether his team likes to run or pass on first-and-goal from the 1-yard line. But a QB often has just as much of an effect on his team’s ability to run the ball as he does on its ability to throw it. (If all teams played optimally, game theory suggests he should affect them about equally, because opposing defenses should adapt to a stronger passing game by devoting more resources to it.)

With some exceptions, it generally makes more sense to judge a QB by the outcomes of his team’s offensive drives. From this perspective, the difference between Manning vs. Buffalo and Rodgers vs. Buffalo was pretty stark. Here are the outcomes of each player’s drives by situation:

Denver started its game against Buffalo with a punt, then scored TDs on three of its next five drives (and another of those five ended in field goal range after Jacob Tamme fumbled a completed catch). With Denver up 18 points in the second half, its offense stalled, particularly as it tried to run more. But even counting those possessions, 10 (non-end-of-game) drives were turned into three TDs and one field goal. This may have been a bit of an off day for Peyton Manning, but that’s a good day for most QBs. Denver’s 2.18 points per drive was only slightly below its season average of 2.33, and was better than the season averages of 24 teams in 2014. Green Bay’s offense, on the other hand, started out cold (punting on three of its first four drives) and basically stayed that way, ultimately scoring only 13 points on 13 drives.
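Points per drive is simple to compute once each possession’s result is tallied. The sketch below is a minimal illustration, not the column’s actual pipeline: it assumes every touchdown is worth 7 (extra point made), every field goal 3 and every other result 0, and it drops possessions flagged as end-of-game. The drive list is invented for illustration and is not either team’s real drive chart.

```python
# Assumed point values per drive result: TDs count the extra point (7),
# and safeties, two-point tries and defensive scores are ignored.
POINTS = {"TD": 7, "FG": 3}


def points_per_drive(drives):
    """Average points per drive, skipping drives flagged as end-of-game.

    `drives` is a list of (outcome, end_of_game) tuples, e.g. ("TD", False).
    Any outcome not in POINTS (punt, turnover, downs) scores zero.
    """
    counted = [outcome for outcome, end_of_game in drives if not end_of_game]
    if not counted:
        return 0.0
    total = sum(POINTS.get(outcome, 0) for outcome in counted)
    return total / len(counted)


# A made-up game, not either team's actual drives.
example = [
    ("PUNT", False),
    ("TD", False),
    ("FG", False),
    ("FUMBLE", False),
    ("TD", False),
    ("KNEEL", True),  # end-of-game possession, excluded from the average
]
print(round(points_per_drive(example), 2))  # 17 points over 5 counted drives
```

The hypothetical `POINTS` table is the place to handle missed extra points or two-point conversions if a more exact count is needed.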

The point here isn’t to knock Rodgers or Green Bay. The Rodgers-led offense still leads the league with 2.7 points per drive this year, and with his TD/INT ratio (so beloved by media everywhere) still a league-best 7/1, Rodgers is still probably the MVP frontrunner. But we should understand the limitations of first-order stats that people are shouting about, and how they can be deceptive. What context do they include, and what do they ignore?

Chart of the week

The Seattle Seahawks’ defense has its own deceptive stats. The defending champions are in an odd spot. If the playoffs started today, the 10-4 Seahawks would play a wildcard game on the road against the 6-8 New Orleans Saints. And depending on how the next two weeks go, they could easily end up as the top seed in the NFC, or out of the playoffs entirely.

Two weeks ago, I introduced some “scoring curves,” and showed how Seattle’s defense (with the team 8-4 at the time) flirted with league average in many situations (such as when its opponent has a long way to go for a touchdown). Many readers expressed skepticism, particularly because Seattle has the best defense in the NFL by the old “yards allowed” metric, and is among the league leaders in points allowed per game (as well as yards per play against).

I partially agree: I find it very unlikely that Seattle’s defense is average or below average. And I’m tempted to go further and say that it’s unlikely this defense is much worse than last year’s squad. But the stats show the defense has had a pretty huge regression to the mean in measurable defensive outcomes.

To show just how much these kinds of things vary from season to season, I’ve plotted each team’s expected points added per play on offense vs. expected points denied per play on defense, and then shown how this year’s iterations compare with last year’s:

Seattle has had a pretty big decline on the defensive side, but this is to be expected: Last year’s results were a big outlier, and outliers tend to regress toward the mean. For example, Denver’s incredible 2013 offense declined similarly. Both remain in the top tier on their respective sides of the ball, but are much closer to the pack than they were last year.
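Regression to the mean falls out of a very simple model. The sketch below assumes, purely for illustration, that each team’s observed per-play efficiency is its true skill plus independent season-to-season luck of equal spread; nothing here is fitted to NFL data, and `regression_demo` and its parameters are hypothetical. It picks the top 1 percent of simulated teams in season 1 and reports how those same teams do in season 2.

```python
import random
import statistics


def regression_demo(n_teams=10_000, skill_sd=1.0, noise_sd=1.0, top_frac=0.01, seed=538):
    """Simulate two seasons where observed result = true skill + independent luck.

    Returns (season-1 average of the top teams, season-2 average of the same teams).
    """
    rng = random.Random(seed)
    skills = [rng.gauss(0.0, skill_sd) for _ in range(n_teams)]
    season1 = [s + rng.gauss(0.0, noise_sd) for s in skills]
    season2 = [s + rng.gauss(0.0, noise_sd) for s in skills]
    # Select the season-1 outliers, then follow the same teams into season 2.
    k = max(1, int(n_teams * top_frac))
    top = sorted(range(n_teams), key=lambda i: season1[i], reverse=True)[:k]
    return (statistics.mean(season1[i] for i in top),
            statistics.mean(season2[i] for i in top))


first, second = regression_demo()
print(f"top teams, season 1: {first:.2f}; same teams, season 2: {second:.2f}")
```

With skill and luck equally noisy, the season-1 standouts should land roughly halfway back toward the mean in season 2; shrinking `noise_sd` relative to `skill_sd` shrinks the regression.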

Once again, the context here is important, and this time for both sides of the advanced-stats debate: Simply looking at basic defensive stats and saying that everything is fine with the Seahawks’ D misses a dramatic decline. But simply looking at the magnitude of the decline without considering the context would overstate its importance.

Twitter question of the week

@skepticalsports NFL q: if a kicker was 95% accurate from X distance, at what X is he the #1 pick in daft? MVP? I say ~70. Thoughts?

— Matt Glassman (@MattGlassman312) December 5, 2014

Like many counterfactuals, this is not an easy question to answer definitively, since having a kicker who is automatic from long range might have all kinds of ripple effects on the game that we can’t really foresee.

(Although unlike many counterfactuals, it’s not a completely crazy idea: A kicker who can usually nail it from 70 yards seems ridiculous to us now, but NFL kickers have steadily gotten better for at least 80 years, and they haven’t slowed down yet. In the 1960s, kickers made 13 of 129 kicks (10.1 percent) from 50 yards or longer. In the past five years alone, NFL kickers have made 422 of 675 such attempts, or 62.5 percent. Since 2010, kickers have even made seven of 31 tries from 60 yards or longer (22.6 percent).)
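The percentages in that parenthetical are plain made-over-attempted ratios. A quick check in Python, using only the counts quoted in the paragraph (`make_rate` is just an illustrative helper name):

```python
def make_rate(made, attempts):
    """Field-goal percentage, rounded to one decimal place."""
    return round(100 * made / attempts, 1)


# Counts quoted in the paragraph: (made, attempted).
samples = {
    "1960s, from 50 yards": (13, 129),
    "past five years, from 50 yards": (422, 675),
    "since 2010, from 60 yards": (7, 31),
}
for label, (made, attempts) in samples.items():
    print(f"{label}: {made}/{attempts} = {make_rate(made, attempts)}%")
```

Rounded to one decimal, 7 of 31 works out to 22.6 percent.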

If we simply replaced all of a kicker’s misses with makes, an “automatic” kicker wouldn’t be worth much more than the worst kicker in the league. There’d be a few salvaged points here and there, but nothing major (kickers these days just don’t miss that often).

But the real fun starts when we think about how a team would use a truly “automatic” kicker differently.

To simplify the question, let’s assume the kicker makes 100 percent of his kicks instead of 95 percent (call him “RoboKicker”). Using ESPN’s expected points model, we can identify all situations where a team would definitely want to attempt a field goal on fourth down if it knew it could automatically earn three points. A made kick is actually worth slightly less than that because the kicking team has to give up possession whether it makes the kick or not, but we’ll charitably give it full credit.² So if a team is in RoboKicker’s range, it should want to attempt a field goal any time it’s fourth down and the expected value of its possession is less than three points. The value it gains from having that option is the difference between the two, and the kicker’s total value added is the sum of all those differences.
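The valuation rule in that paragraph can be sketched in a few lines. This is a minimal illustration under stated assumptions, not ESPN’s model or the column’s code: the fourth-down situations and their expected-point values below are invented, `FourthDown`, `kick_distance` and `robokicker_value` are hypothetical names, and field-goal distance is approximated with the common rule of thumb of line of scrimmage plus 17 yards. For each fourth down inside RoboKicker’s range, the team gains max(0, 3 − EP); summing those gains and dividing by games played gives expected points added per game.

```python
from dataclasses import dataclass


@dataclass
class FourthDown:
    yards_to_goal: int       # line of scrimmage, in yards from the opponent's goal line
    expected_points: float   # EP of the possession if the team does NOT kick


def kick_distance(yards_to_goal):
    """Approximate FG distance (assumption): 10 yards of end zone plus ~7 to the hold spot."""
    return yards_to_goal + 17


def robokicker_value(plays, auto_range, games, fg_value=3.0):
    """Expected points added per game by a kicker who never misses within auto_range.

    Each fourth down in range adds max(0, fg_value - expected_points): the value of
    having a sure fg_value points available. Crediting the full 3 points (ignoring
    the change of possession) follows the charitable simplification in the text.
    """
    gained = 0.0
    for play in plays:
        if kick_distance(play.yards_to_goal) <= auto_range:
            gained += max(0.0, fg_value - play.expected_points)
    return gained / games


# Invented fourth downs for illustration only; real inputs would come from an
# expected points model applied to play-by-play data.
plays = [
    FourthDown(yards_to_goal=1, expected_points=3.5),   # 18-yard try; EP already above 3
    FourthDown(yards_to_goal=30, expected_points=1.8),  # 47-yard try
    FourthDown(yards_to_goal=45, expected_points=0.9),  # 62-yard try
    FourthDown(yards_to_goal=60, expected_points=0.2),  # 77-yard try
]
for auto_range in (50, 65, 80):
    print(auto_range, round(robokicker_value(plays, auto_range, games=1), 2))
```

Sweeping `auto_range` over a grid of distances with real inputs is what would trace out a value-by-range curve; with these toy plays the value grows from 1.2 to 3.3 to 6.1 points as the range extends from 50 to 65 to 80 yards.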

This plot shows how much RoboKicker would be worth for an average team (since 2006) in expected points added per game, based on his range:

This assumes the kicker would be just a normal kicker from distances beyond his automatic range, though if he were automatic from 50 yards he would probably be pretty good from 60,³ which would carry additional value. But this is a fair first-order guess.

The second wrinkle in @MattGlassman312’s question is the bit about RoboKicker being a No. 1 pick or an MVP. To answer that, we first have to estimate how valuable a No. 1 pick or an MVP is.

Let’s use Peyton Manning as our stand-in for “best player in the league,” which helps us answer at least the spirit of the question. When Manning was injured, the Indianapolis Colts’ average margin of victory dropped by 14.6 points per game (though this may have been in part because they were tanking so that they could draft Andrew Luck). And when Manning joined Denver, the Broncos’ average MOV rose by 17.1 points per game. But let’s assume that those years were outliers and that a typical MVP is worth about 10 points per game. To surpass that, RoboKicker would need to be able to hit from around 80 yards. (I confess, this is further out than I would have guessed.) Then, considering that even No. 1 picks have only about a 50 percent to 60 percent shot of ever making a Pro Bowl (much less of being MVP), I’d say being automatic from 50 to 60 yards would probably be enough to be worth the top pick in the draft most years.
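Once a value-by-range curve exists, the MVP comparison is a break-even search: find the shortest automatic range whose per-game value exceeds the roughly 10 points assumed for a typical MVP. A minimal sketch follows; the curve values are placeholders chosen only to exercise the function, not numbers read from the column’s chart, and `break_even_range` is a hypothetical helper.

```python
def break_even_range(value_by_range, threshold):
    """Shortest automatic range (yards) whose per-game value exceeds threshold, or None."""
    for yards in sorted(value_by_range):
        if value_by_range[yards] > threshold:
            return yards
    return None


# Placeholder curve (yards -> expected points added per game). These are NOT
# the chart's values; they only illustrate how the search works.
toy_curve = {50: 1.0, 60: 3.0, 70: 6.0, 80: 10.5, 90: 15.0}
mvp_points_per_game = 10.0  # the column's working assumption for a typical MVP

print(break_even_range(toy_curve, mvp_points_per_game))
```

Sorting the keys means the curve can be supplied in any order; a threshold the curve never reaches returns `None`.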

The Hacker Gods read FiveThirtyEight (or just love Andrew Luck)

Last week’s games had a few outcomes consistent with this column’s most frequently asserted stereotypes. Most intriguingly, we saw win curve standout and two-time Gunslinger of the Week winner Andrew Luck⁴ digging his own hole by throwing an early pick-6 that put the Colts down 7-0, and then climbing out of it to come back and win against the Houston Texans. This follows a similar Week 14 victory against the Cleveland Browns, when Luck was down 14 points in the second half after an early pick-6 (and a third-quarter fumble-6).

If you’ve been reading Skeptical Football, you’ll know I’m generally pro-interception (at least for certain kinds), mainly as an indirect indicator of taking good risks. Normally, a quarterback will lose the games in which he throws interceptions. But so far in his young career, Luck seems to have an uncanny talent for winning and throwing INTs in the same game. So, naturally, that got me wondering how these results compare with Peyton Manning’s and those of all other quarterbacks (since 2006):

Luck shows a propensity for winning similar to his predecessor’s in Indianapolis, regardless of scenario. But the big caveat is that interceptions are often a result of losing as well as a cause of it. Generally this is because QBs make rational risk adjustments that lead to more interceptions when they’re behind.⁵ So to isolate the situations we’re most interested in, I limited the comparison to the number of interceptions thrown while the QB’s team was trailing (including only games in which the QB’s team trailed at some point):

This is, of course, a small sample for Luck: He has two wins in the six games in which he threw two trailing INTs, and two wins in the five games in which he threw three. But those four wins in 11 games (a 36.4 percent win rate) already outnumber Manning’s. Since 2006, Manning has just three wins in 24 games (12.5 percent) in which he threw two or more trailing interceptions, and all QBs since 2006 have only 56 wins in 1,025 such games (5.5 percent).
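Those win rates are simple ratios, but with samples this small it helps to attach a rough error bar. The sketch below recomputes the three rates from the counts in the paragraph and adds a normal-approximation binomial standard error; the helper names are illustrative, and the standard error is a swapped-in back-of-the-envelope check rather than part of the column’s analysis.

```python
import math


def win_pct(wins, games):
    """Win percentage, rounded to one decimal place."""
    return round(100 * wins / games, 1)


def std_error(wins, games):
    """Rough binomial standard error of a win rate (normal approximation)."""
    p = wins / games
    return math.sqrt(p * (1 - p) / games)


# Games with two or more trailing interceptions: (wins, games), as counted in the text.
groups = {
    "Andrew Luck": (4, 11),
    "Peyton Manning (since 2006)": (3, 24),
    "All QBs (since 2006)": (56, 1025),
}
for name, (wins, games) in groups.items():
    print(f"{name}: {win_pct(wins, games)}% +/- {100 * std_error(wins, games):.1f} pts")
```

Luck’s 4-of-11 rate carries a standard error of roughly 14.5 percentage points, which is why the small-sample caveat matters; the all-QB rate, built on more than 1,000 games, is pinned down to within about a point.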

Naturally, this relates back to my gunslinger hypothesis (that a quarterback can throw too few interceptions as well as too many). Andrew Luck is an example of someone who throws more interceptions than usual when his team is down, but wins more often. Overall, Luck has thrown one or more INTs in 55.9 percent of the games in which he trailed (19 of 34), and he has won 52.9 percent of those games (18 of 34). Other QBs have thrown one or more INTs in 49.3 percent of games in which they trailed, winning only 42.3 percent.

You can continue like this for more drastic circumstances (which are more likely to require heavy risk-taking): Of the 19 games in which Luck threw at least one trailing INT, he threw at least two in 57.9 percent (11 of 19) and won 36.8 percent (7 of 19). Other QBs have thrown an additional INT in 38.0 percent of such games and won only 16.3 percent (439 of 2,697).⁶
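The nested percentages in these paragraphs follow one pattern: among games with at least k trailing interceptions, what share escalated to at least k + 1, and what share were won. The sketch below encodes Luck’s levels using only counts given in the text and footnotes (34 trailing games, then 19, 11 and 5 games at one, two and three-plus trailing INTs, with 18, 7, 4 and 2 wins) and rebuilds the quoted percentages; `escalation_table` is a hypothetical helper.

```python
# Luck's games in which his team trailed, as nested levels:
# (at least k trailing INTs, games at that level, wins in those games).
luck_levels = [(0, 34, 18), (1, 19, 7), (2, 11, 4), (3, 5, 2)]


def escalation_table(levels):
    """For each level k with a next level, return (k, % escalating to k + 1, win %)."""
    rows = []
    for (k, games, wins), (_, next_games, _) in zip(levels, levels[1:]):
        escalated = round(100 * next_games / games, 1)
        won = round(100 * wins / games, 1)
        rows.append((k, escalated, won))
    return rows


for k, escalated, won in escalation_table(luck_levels):
    print(f"{k}+ trailing INTs: reached {k + 1}+ in {escalated}%, won {won}%")
```

The deepest level only supplies the denominator for the step before it, so the table has one fewer row than there are levels.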

In other words, Andrew Luck is to gunslinging what Aaron Rodgers is to gunholstering.⁷

Bonus chart of the week

After making the “team movement between 2013 and 2014” chart earlier, I thought it would be interesting to see how each team’s offensive and defensive performance has varied over the past five years. For this chart, I plotted expected points added per drive on offense and expected points denied per drive on defense for each of the last five years, and then connected them so you can see how each team has changed. Some teams have much tighter “shot groups” (Cleveland, New England) than others (Chicago, New York Giants), but I’ll leave you to look for yourself:

Reminder: If you tweet questions to me @skepticalsports, there is a non-zero chance that I’ll answer them here.

Charts by Reuben Fischer-Baum.

Footnotes

1. My wife, who is not a sports fan herself, describes “Pardon The Interruption” as “a bunch of guys shouting numbers at each other until a bell rings.”
2. The actual value is probably somewhere around 2.6 points, but I think the charitable number is appropriate since the kicker is likely to be at least moderately more valuable strategically.
3. Though if he were actually a robot, this may not be the case, as he would probably make about the same kick every time.
4. He won in Week 1 and again in Week 14. You don’t remember?
5. There is also a smaller opposite effect, which is that QBs sometimes throw slightly more interceptions than expected in games they’re winning by wide margins, presumably because teams start playing a basic offensive set in blowouts rather than taking the extraordinary risk-avoidance measures they do to protect smaller leads. (Weird things happen in the NFL.)
6. And, if you need more: Of the 11 games in which Luck threw 2+ trailing INTs, he threw 3+ in 45.5 percent (5 of 11) and won 36.4 percent (4 of 11). Other QBs threw an additional INT in 30.3 percent of such games, and won only 5.5 percent.
7. However, for all that sound and fury about Luck, the actual Week 15 gunslinger winner was Mark Sanchez, who had two trailing interceptions for Philadelphia (in the third and fourth quarters), yet managed to take the lead (albeit briefly) in a game where the Eagles once trailed 21-0.
