ABC News
MLB’s Hit-Tracking Tool Misses A Lot Of Hits

The introduction of Statcast has marked the beginning of a new era in baseball, at least from a stathead’s perspective. The revolutionary new tracking system calculates metrics such as exit velocity and launch angle, which have already provided us with new insights on baseball’s inner workings. But while Statcast is so far surpassing the wildest dreams of sabermetricians, the tracking system remains a work in progress, with gaps in its powers of observation.

The system itself is a technical marvel. Combining a Doppler radar (which tracks high-speed objects, like the ball) with a camera tracking array (which tracks low-speed objects, like the players), Statcast integrates these two sources of information to monitor the position and velocity of every object and person on the field. This setup generates a greater volume of data in a single game than was collected in all of MLB’s previous history combined.1

But that data isn’t always easy to analyze. Front office analysts I spoke with said that Statcast’s radars frequently lose track of batted balls on atypical trajectories, such as those with extremely high (popup) or low (chopper) angles. In 2015, Statcast failed to provide data on 13.4 percent of all batted balls; it has gotten a bit better as time has progressed, with the share of missing balls dropping to 12.5 percent in the first half of 2016 and to only 11.2 percent since July.

Without a complete track of the batted ball, the computers must extrapolate, and sometimes they fail to report any data on the trajectory or give implausible readings (exit velocities of zero, or improbable home run distances). They can also spit out velocity readings that are just plain inaccurate. These kinds of errors require extensive manual checking and correction for use by front offices, but for public use, such ambiguous batted balls are sometimes discarded.

While we can’t measure how often readings are simply inaccurate, we can examine the missing batted balls by cross-referencing data from Statcast with pitch-by-pitch data from PitchInfo. That way we can at least see what kinds of balls in play Statcast is most likely to miss entirely.2
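As a rough illustration of that cross-referencing step, the sketch below counts, for each batted-ball type, the share of balls that appear in a complete pitch-by-pitch log but have no Statcast reading. The function name, the `(ball_id, batted_ball_type)` record layout and the idea of matching on a shared ID are all assumptions made for the example; they are not the actual Statcast or PitchInfo data formats.

```python
from collections import defaultdict


def missing_rate_by_type(all_batted_balls, tracked_ids):
    """Return the fraction of untracked batted balls for each batted-ball type.

    all_batted_balls: iterable of (ball_id, batted_ball_type) pairs from a
        complete pitch-by-pitch log (hypothetical layout).
    tracked_ids: set of ball_ids that have a Statcast reading (hypothetical).
    """
    totals = defaultdict(int)   # every batted ball of each type
    missing = defaultdict(int)  # batted balls of each type with no tracking data
    for ball_id, bb_type in all_batted_balls:
        totals[bb_type] += 1
        if ball_id not in tracked_ids:
            missing[bb_type] += 1
    return {bb_type: missing[bb_type] / totals[bb_type] for bb_type in totals}


# Toy data, invented purely for illustration.
toy_log = [(1, "popup"), (2, "popup"), (3, "line_drive"), (4, "line_drive")]
toy_tracked = {2, 3, 4}
toy_rates = missing_rate_by_type(toy_log, toy_tracked)
```

Grouping the same counts by ballpark or by hitter instead of by batted-ball type would give the park-level and player-level missing rates discussed later in the piece.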


By far the most-missed category of batted balls is popups. That makes sense, since popups tend to leave the bat at extremely high angles, which are difficult for the radar to track. Groundballs have the same problem on the lower end, compounded by the fact that they often produce bounces in the dirt, which can confuse the system.

The system does the best with intermediate angles, such as line drives and fly balls, which show the least missing data. But performance has shifted on even these balls over the course of Statcast’s lifespan. In the first half of 2015, 5.1 percent of line drives were missed, which decreased to only 3.5 percent in the second half and 1.5 percent in 2016.3 In the case of more intermediate angles, the system can fail when the exit velocity is exceptionally high or low, both of which befuddle the tracking system.4

There are also serious differences between the implementation of the system at different ballparks. They range from around 7 percent of batted balls missing at Progressive Field in Cleveland and Citi Field in New York (the latter being one of MLB’s Statcast pilot stadiums) to 21.7 percent going missing in Arizona. Five ballparks are missing data for more than 15 percent of batted balls; seven are missing it for less than 10 percent.5 Those five ballparks with the most missing data show an average exit velocity among tracked balls in 2016 about half a mile per hour faster than the overall MLB average, suggesting that they are mostly missing low-velocity balls.

Statcast’s tracking problems can affect how we evaluate players. Players with a predisposition towards popups, for example, tend to have those batted balls excised from their records, along with their correspondingly low exit velocities. What remains is an incomplete and potentially misleading subset of their exit velocities. The fraction of tracked batted balls for each player in MLB (with a minimum of 1,000 pitches seen) varies by as much as a factor of 4, with line drive sluggers such as Joey Votto seeing the best tracking (only 5.3 percent of his batted balls are untracked) while groundball-heavy slap hitters such as Scooter Gennett lose more than 19 percent of their batted balls. Three of the top 20 hitters in missing data play for the Arizona Diamondbacks, which is unsurprising since the Arizona stadium fails to track just over a fifth of all balls in play.

With so much missing data, it’s impossible to calculate players’ true average exit velocities. However, we can take an educated guess by imputing the exit velocities of their missing batted balls.6 Most players tend to lose exit velocity when you do this, which makes sense because low-exit-velocity hits like grounders and popups tend to make up the majority of the missing data. For some hitters, the difference between their tracked and imputed exit velocities can exceed a mile per hour.
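A minimal sketch of this kind of imputation, following the description in the footnotes (an untracked ball gets the average exit velocity of tracked balls with the same batted-ball type and result), might look like the following. The dictionary keys, function names and toy numbers are assumptions made for the example, and skipping balls whose group has no tracked examples is a simplification of my own, not necessarily what the original analysis did.

```python
from collections import defaultdict
from statistics import mean


def group_means(league_balls):
    """Average tracked exit velocity for each (type, result) group.

    league_balls: list of dicts with hypothetical keys "type", "result" and
        "exit_velo" (None when the ball was not tracked).
    """
    groups = defaultdict(list)
    for ball in league_balls:
        if ball["exit_velo"] is not None:
            groups[(ball["type"], ball["result"])].append(ball["exit_velo"])
    return {key: mean(values) for key, values in groups.items()}


def imputed_average(player_balls, means):
    """Player's average exit velocity after filling in untracked balls."""
    velocities = []
    for ball in player_balls:
        velo = ball["exit_velo"]
        if velo is None:
            # Fill with the league average for the same type and result.
            velo = means.get((ball["type"], ball["result"]))
        if velo is not None:  # assumption: skip balls with no usable group mean
            velocities.append(velo)
    return mean(velocities)


# Toy data, invented purely for illustration.
league = [
    {"type": "line_drive", "result": "single", "exit_velo": 95.0},
    {"type": "line_drive", "result": "single", "exit_velo": 93.0},
    {"type": "popup", "result": "out", "exit_velo": 70.0},
    {"type": "popup", "result": "out", "exit_velo": 74.0},
]
player = [
    {"type": "line_drive", "result": "single", "exit_velo": 100.0},
    {"type": "popup", "result": "out", "exit_velo": None},  # untracked popup
]
toy_imputed = imputed_average(player, group_means(league))
```

In the toy case the hitter's only tracked ball is a hard line drive, so filling in the untracked popup with a typical popup velocity pulls his average down, which is the same direction of adjustment most real hitters see.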

AVERAGE EXIT VELOCITY (MPH)

NAME               TRACKED  IMPUTED  DIFFERENCE  TRACKED RANK  IMPUTED RANK
Eric Hosmer           94.3     92.3         2.0             5            13
Pedro Alvarez         94.2     92.4         1.8             6            12
Jake Lamb             93.4     91.7         1.7            14            25
Matt Holliday         94.8     93.1         1.7             3             4
Paul Goldschmidt      92.5     91.1         1.5            26            35
Jose Bautista         93.5     92.1         1.4            13            17
Nelson Cruz           95.5     94.1         1.4             1             1
Ryan Braun            91.1     89.7         1.4            67            91
Giancarlo Stanton     95.2     93.9         1.4             2             2
Mark Trumbo           94.7     93.3         1.4             4             3

Source: Baseball Savant, PitchInfo

Kansas City Royals first baseman Eric Hosmer takes the lead, with about two miles per hour between his “true” and measured exit velocities. By measured exit velocity alone, Hosmer would be the fifth-best in the league, with the four players above him performing at an average weighted Runs Created Plus, a measure of overall offensive productivity, of 123. After you adjust for his missing data, however, Hosmer falls to 13th in the league in exit velocity, which is closer to his mediocre offensive production (he’s hitting .275/.333/.433 on the year for a wRC+ of 104).7

On the other end of the spectrum, very few players see their exit velocities rise when you incorporate missing data. Billy Burns, the Royals’ defensive specialist center fielder, was last in the league in average exit velocity, and he gains the most (+0.39 mph) by taking into account untracked balls, which still leaves him last in the league in imputed exit velocity. Burns just can’t catch a break, with a woeful 51 wRC+ to match that exit velocity.

While Burns and Hosmer are the exceptions, Statcast is missing a substantial portion of the batted balls in MLB. It’s also making significant strides forward, with the tracking rate improving consistently this year relative to last. Still, it pays to be aware of the limitations in MLB’s novel technology and how they impact the way we view players like Hosmer in terms of the newest metrics.

Footnotes

  1. Measured in terms of the storage space required to hold this data.

  2. I used LOESS to smooth the fraction of missing data over time. Batted ball classifications are provided by stringers and so are less precise than Statcast’s launch angle measurements but also more complete.

  3. The change in line-drive tracking over the course of 2015 may shed light on one of the remaining concerns about the juiced ball hypothesis. If more low-velocity line drives were tracked in the latter half of 2015 and forward, it may explain why tracked exit velocities appeared to stay flat for line drives but increase for other kinds of hits. If that were the case, capturing more of the not-so-valuable low-velocity line drives would sandbag the value of tracked line drives, causing them to appear to decrease in speed over the course of the year, while untracked line drives would appear to become more valuable. In fact, the average run value of an untracked line drive increased three times as much over the course of 2015 as the same value for an untracked fly ball, potentially indicating that more low-velocity batted balls were tracked later in the year.

  4. The standard deviation of linear weight value for untracked batted balls in both the fly ball and line drive categories is larger than it is for tracked batted balls, which supports the idea that they are exceptionally high- or low-velocity contacts.

  5. The discrepancy between parks in missing data is highly statistically significant in a logistic regression.

  6. To do the imputation, I used the type of batted ball and the result of the batted ball. So, for example, if a player hit an untracked line drive single, I gave it the average exit velocity of other line drive singles.

  7. Generally, imputed exit velocities correlate slightly better (at r=.43) with weighted On-Base Average (wOBA) than raw averages (at r=.41) for all players, although the difference shrinks for players with more batted balls.

Rob Arthur is a former baseball columnist for FiveThirtyEight. He also wrote about crime.
