Serena Williams And The Difference Between All-Time Great And Greatest Of All Time

As of this writing, Serena Williams, 33, is around even money to win the U.S. Open. A victory in Flushing, New York, would be her 22nd major championship — tying her with Steffi Graf for the most Grand Slam titles in the Open era.¹

Williams has enjoyed as big a gap between herself and the field as anyone in modern tennis history: At the moment she has more than double the ranking points of No. 2 Simona Halep. And with the assistance of a time machine, most top tennis players today could probably (though not definitely) beat most top players in history. So obviously Williams is the Greatest Of All Time, right?

Not so fast.

To look deeper into Williams’s career, and to compare her to her current and historical rivals, we employed a tool many FiveThirtyEight readers will be familiar with: Elo ratings. A favorite around here for the NFL, NBA and soccer, Elo ratings are a good device for making the kind of historical comparisons we like to make in sports. Elo definitely puts Williams in the top tier of female tennis players, but it tells a slightly more muted story than other measures. In particular: While Williams has been great, and has been doing unprecedented things for a player of her age, the relative weakness of the tier of players beneath her undermines her GOAT claim.

The idea that Serena has benefited from “weaker competition” is pretty conventional, and certainly debatable, but Elo gives us a useful way to examine exactly what that idea means and what it implies.

For the unfamiliar, Elo is a rating method originally developed for chess, but eminently suitable for tennis. It’s very simple: Two players enter a match with Elo ratings based on their prior results. Elo uses their ratings to predict their head-to-head outcome, and then updates those ratings depending on the outcome.² It’s not without limitations: Elo makes every head-to-head prediction based solely on the two players’ ratings, which, in turn, are only affected by previous match results. This means that a lot of information (such as injuries, subsequent performances of past opponents, and aging) is completely ignored. But Elo’s ruthlessly Bayesian design — adjusting ratings based on the prior likelihoods of outcomes — makes it surprisingly accurate and extremely flexible. Match by match, it builds a pyramid of greatness, with players moving closer to the top with every win and closer to the bottom with every loss. It comes in a number of flavors, and for this article, we’ve cooked up our own version, empirically tailored to tennis’s particular dynamics. (For our methodology, see the footnotes.)³
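The predict-then-update loop described above fits in a few lines. What follows is a minimal sketch, not FiveThirtyEight’s implementation: it assumes the conventional chess logistic curve on a 400-point scale (the article doesn’t state its scale) and a fixed, made-up multiplier `k = 32` in place of the variable multiplier described in footnote 3. The function names are hypothetical.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Predicted probability that player A beats player B.

    Assumes the standard chess-style logistic curve on a 400-point scale;
    the article does not say which scale its tennis ratings use.
    """
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update(rating_a: float, rating_b: float, a_won: bool, k: float = 32.0) -> tuple[float, float]:
    """Return both players' ratings after one match.

    A's rating moves by k * (actual - expected); in this fixed-k sketch B
    moves by the same amount in the opposite direction. k = 32 is an
    illustrative assumption, not a value from the article.
    """
    expected_a = expected_score(rating_a, rating_b)
    actual_a = 1.0 if a_won else 0.0
    delta = k * (actual_a - expected_a)
    return rating_a + delta, rating_b - delta


# Two newcomers both start at 1500 (see footnote 2), so the match is a coin flip.
print(expected_score(1500, 1500))            # 0.5
print(update(1500, 1500, a_won=True))        # (1516.0, 1484.0)
# A 400-point favorite is expected to win about 91 percent of the time.
print(round(expected_score(1900, 1500), 3))  # 0.909
```

A win over an equal opponent moves each rating by half of `k`; beating a heavy underdog moves it far less, because the win was already expected.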

Our method includes Grand Slams and tour events, treating matches at each level equally. A Grand Slam title counts much more with historians and fans, but women play best-of-three sets at majors and on tour, making results comparable. Throwing out non-major matches would mean omitting most of our data, including many of the meetings between top players. (And in any case, we calculated Grand Slam-only Elo and it was less kind to Williams than the one we’re using.)

The first thing that stands out: Although this has been the most productive period of Williams’s career as far as Grand Slams go, and she appears to be playing better than ever in all events, Elo still thinks Williams peaked around 2003 (when today’s next-highest-ranked American, Madison Keys, was 8 years old).

Williams’s Elo ratings provide a fever chart of the ups and downs of her career. By age 17, she’d won her first major, and at 20 she fulfilled her childhood goal of becoming the top-ranked player in the world. By age 21, she’d won four straight majors, all of them by beating her sister Venus in the finals. Serena looked GOAT-bound. Then came years of missed opportunities. She pursued off-court interests vigorously enough that former greats such as Chris Evert questioned her commitment to the sport. Williams also missed time for reasons outside her control. Months after completing her first Serena Slam, she suffered a knee injury. She and her family grieved after the horror of her sister Yetunde’s murder.

Williams won just one major title between her 22nd and 25th birthdays, normally prime years in women’s tennis. Her next five years were more productive, with six major titles bringing her to a total of 13. But at age 30, Williams was well off the pace set by Margaret Court (21 majors at age 30, 24 in her career) and Graf (she won all 22 of her career majors by age 30). Williams was coming back from health problems that included a lacerated tendon and pulmonary embolism in 2011. At that point, even equaling Martina Navratilova and Evert at 18 majors would take the greatest post-30 career in Open-era women’s tennis history (no one before Williams had won more than four after her 30th birthday).

The rest is history.

Yet, despite eight more major wins, including four in a row (and counting), Williams’s upward swing has not propelled her to new ratings heights. Not only does Elo suggest she was better in 2002 and 2003, it suggests Justine Henin was better in 2007 (after reaching seven major finals and winning four of them in a three-year span).

It isn’t rare for the top player in women’s tennis to dominate. Isolate the very best players over time, and they look like they are taking turns at the top, handing off the baton every few years and leaving little room for lesser rivals to shine.

What’s amazing about Serena’s position on this chart is that she gets a second peak. But neither of her peaks is as high as those of Navratilova, Graf and Monica Seles.

So what’s going on?

Navratilova, Graf and Seles each reigned over women’s tennis while competing against more elite players. When Navratilova won six majors in two years, she had to beat Evert in four of the finals. Graf and Seles competed with each other and Navratilova, as well as merely great players like Arantxa Sánchez Vicario (four majors) and Jennifer Capriati (three majors).⁴ Amid so much greatness, Seles dominated in a stunning period from 1991 through April 1993, winning seven of the eight majors she entered. Her meteoric rise, steeper than that of any other great, was halted when she was stabbed by a Graf fan in 1993. Seles returned 28 months later but never reached No. 1 again.

Williams also faced tough competition early in her career, though no rival as great as Graf or peak Seles. The last time Williams won four straight majors (in her first Serena Slam, more than a decade ago), she was competing against Venus, who was herself in her prime; fellow Americans Lindsay Davenport and Capriati; five-time major champ Martina Hingis; and Belgian rivals Henin and Kim Clijsters.

As Williams returned from health problems, her younger rivals were retiring or struggling. Henin and Clijsters left the sport, came back, and left again for good (we think), playing no majors between them after age 29. Hingis retired at age 22, and, while she has since un-retired twice, she has played little singles. Li Na retired last year at age 32. Maria Sharapova has struggled with her own injuries — one of which caused her to withdraw from the U.S. Open on Sunday — as has Victoria Azarenka. Petra Kvitova has looked unbeatable at two Wimbledons in the last four years, but at other Grand Slam events during that time she has had as many first-round exits as major semifinals (two). The power vacuum beneath Williams has been such that three women — Dinara Safina, Caroline Wozniacki and Jelena Jankovic — spent more than two years combined at No. 1 between 2008 and 2012 without winning a single major. (Jankovic and Wozniacki are still trying.)

In other words, women’s tennis got weak at the top — aside from Williams herself. Now just two active players have won more than two Grand Slam titles: Venus Williams (seven majors), who is still competitive but finished the last four years outside the top 10; and Sharapova (five), who has lost her last 17 matches against Serena Williams. Serena Williams’s top rivals aren’t weak just because she keeps beating them; often they lose to players ranked beneath them. It’s possible that none of the players born in the decade after Williams is an all-time great.

It may sound bizarre to suggest that Williams’s opposition is weak, considering that modern athletes tend to get better and better. For most sports, in a strict time-traveling scenario today’s middling pros would likely beat our heroes of yesteryear, and we don’t have any reason to think this is less true in tennis — if anything, with population increases and worldwide popularity, the talent pool for tennis is probably larger than ever. Yet we engage in cross-era comparisons all the time, and some greats have it easier than others. So what to do?

The somewhat recursive resolution offered by Elo is fairly intuitive: The tier underneath you is “strong” if it consistently beats the tier beneath it, which is strong if it consistently beats the next-lowest tier, and so on. And the tier beneath Williams has been in free fall.

We can’t directly compare players of today with players of, say, the 1980s, but Elo allows us to compare them indirectly, through common opponents. It builds comparisons, matchup by matchup. Williams herself is a bridge between eras. And her results now compared to then — along with everyone else’s matches against both eras — feed into Elo, which tells us that Williams has been getting better, but the relative strength of other top players has been getting worse. In fact, the gap between Williams and the next tier is so big right now that the most she can gain — about 5 Elo points for a win against Sharapova — is dwarfed by the 15 points she drops any time she loses. For example, Williams’s loss in the semifinals of an event in Canada to No. 20 Belinda Bencic earlier this month essentially wiped out all the ratings points she had gained at Wimbledon. (If that sounds crazy, think of it this way: Williams’s rating is so good that Elo considers her more likely to win Wimbledon than to lose one match to someone like Bencic.)
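The lopsided reward structure follows from the update rule: a win earns K × (1 − p) and a loss costs K × p, where p is the predicted win probability. The sketch below is illustrative only. Every rating in it is invented, it assumes the same 400-point logistic curve as chess, and it uses a hypothetical K of 20. It also shows how winning a seven-round event can be likelier than losing a single match: the product of seven high win probabilities can still exceed one small upset probability.

```python
import math

K = 20.0  # hypothetical multiplier for a veteran with a long record


def expected_score(r_a: float, r_b: float) -> float:
    """Assumed chess-style logistic win probability (400-point scale)."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))


def gain_if_win(r_player: float, r_opp: float, k: float = K) -> float:
    """Points earned for a win: k * (1 - expected)."""
    return k * (1.0 - expected_score(r_player, r_opp))


def loss_if_lose(r_player: float, r_opp: float, k: float = K) -> float:
    """Points given up for a loss: k * expected."""
    return k * expected_score(r_player, r_opp)


# All ratings below are invented for illustration.
favorite = 2400.0
draw = [1900.0, 1950.0, 2000.0, 2050.0, 2100.0, 2150.0, 2200.0]  # seven rounds, toughest last
underdog = 2000.0

# Ratings are frozen at their pre-event values to keep the arithmetic simple;
# a real system would update after every round.
title_gain = sum(gain_if_win(favorite, r) for r in draw)
upset_loss = loss_if_lose(favorite, underdog)
p_title = math.prod(expected_score(favorite, r) for r in draw)
p_upset = 1.0 - expected_score(favorite, underdog)

print(f"best single-match gain (vs. 2200): {gain_if_win(favorite, 2200):.1f}")
print(f"points for sweeping all 7 rounds:  {title_gain:.1f}")
print(f"points lost in one upset:          {upset_loss:.1f}")
print(f"P(win all 7 rounds) = {p_title:.2f}, P(upset) = {p_upset:.2f}")
```

With these made-up numbers, the best single win is worth under 5 points, sweeping the event earns roughly what one upset costs (about 18 points either way), and the seven-round sweep is still about four times likelier than the upset.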

There’s an inclination to think that it’s unfair to dock an athlete for playing in a weaker era, because it’s not her fault when she was born, but Elo sidesteps this problem. If your opponents are weak, it just expects you to win more often. Thus, even a period of staggering dominance may look, according to Elo, like treading water if that dominance was to be expected.
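“Treading water” has a precise meaning here: if a player wins exactly as often as Elo predicts, her expected rating change is zero, however lopsided her record. The hypothetical sketch below assumes the 400-point logistic curve, a made-up K of 20, invented ratings, and ratings held fixed across the batch of matches. It compares the same 20-2 record against two fields of different strength.

```python
def expected_score(r_a: float, r_b: float) -> float:
    """Assumed chess-style logistic win probability (400-point scale)."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))


def net_change(r_player: float, r_opp: float, wins: int, losses: int, k: float = 20.0) -> float:
    """Net rating change over a batch of matches against equally rated opponents.

    Both ratings are held fixed during the batch (a simplification), and
    k = 20 is a hypothetical multiplier.
    """
    p = expected_score(r_player, r_opp)
    return k * (wins * (1.0 - p) - losses * p)


# Against opponents 400 points weaker, Elo expects a 10-in-11 win rate,
# so a 20-2 record is exactly par and the rating does not move.
print(f"{net_change(2400, 2000, wins=20, losses=2):+.1f}")
# The identical 20-2 record against opponents only 200 points weaker is a big gain.
print(f"{net_change(2400, 2200, wins=20, losses=2):+.1f}")
```

The record is identical in both cases; only the strength of the field, and therefore Elo’s expectation, differs.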

Is it possible that Williams is so much better than everyone else that Elo just can’t capture her current brilliance? Yes — if she never lost. But while Williams is 48-2 this year, those two losses are meaningful. Also, if she really were significantly better than her rating, she’d have dominated her opponents in all 48 wins. But she has been forced to a third set 17 times. We calculated different versions of Elo using each set as its own contest. Then we did the same with each game. If Williams really were blowing out opponents, she’d look better in these measures relative to other all-time greats. She looked worse. She is winning because she is better than everyone else, and she has even been pulling out wins that looked like potential losses. But she hasn’t been winning enough against this opposition to suggest that her already extremely high rating should be higher.
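To see how three-set wins can drag down a set-based rating, here is a hypothetical sketch that applies one Elo update per set rather than per match. It is an assumed simplification, not the article’s set-level system: it reuses the 400-point logistic curve, a made-up K of 10 per set, and invented ratings.

```python
def expected_score(r_a: float, r_b: float) -> float:
    """Assumed chess-style logistic win probability (400-point scale)."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))


def update_by_sets(r_a: float, r_b: float, set_winners: str, k: float = 10.0) -> tuple[float, float]:
    """Treat every set as its own Elo contest.

    set_winners lists who took each set in order, e.g. "AA" for a
    straight-sets win by A or "ABA" for a three-set win. k = 10 per set
    is a hypothetical value.
    """
    for winner in set_winners:
        p_a = expected_score(r_a, r_b)
        delta = k * ((1.0 if winner == "A" else 0.0) - p_a)
        r_a, r_b = r_a + delta, r_b - delta
    return r_a, r_b


favorite, opponent = 2400.0, 2100.0  # invented ratings
straight_sets, _ = update_by_sets(favorite, opponent, "AA")
three_sets, _ = update_by_sets(favorite, opponent, "ABA")
print(f"straight-sets win: {straight_sets - favorite:+.1f}")
print(f"three-set win:     {three_sets - favorite:+.1f}")
```

Under this toy rule a heavy favorite who drops a set loses rating even while winning the match, because the lost set costs far more than each won set earns.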

Thus, even as Williams has outpaced Navratilova in majors after turning 30 (eight to three), Williams only approached Navratilova’s same-age Elo rating at 32 and only recently overtook her.

Serena and Venus Williams had remarkable tennis origins — from being trained by their parents on cracked concrete courts in Compton to becoming the best pair of sporting siblings and the best African-American tennis players of all time. Yet their tennis ascent was remarkably conventional, at least as far as the careers of extraordinarily talented young women go. Grouping them with other greats by age shows how they slightly lagged Graf’s meteoric rise as a teen — itself outdone by Seles’s teenage run. Their early-20s dip was also conventional. And Venus Williams’s continued decline after 30 also matches the usual career arc of past greats.

Now, Serena Williams stands out among her fellow greats for more than her legendary origin story: She is getting better as she gets older. Her two years since turning 32 have been better than any of her previous ones besides her peak years from ages 21 to 23. She is defying tennis aging trends, even those of legends.

Her late peak is both larger and later than anyone’s. And there’s a flip side to Elo’s stubbornness about strength of competition: Because facing weaker competition won’t help you improve your standing, a marked improvement in standing is unlikely to result from weakening competition. In other words, Serena’s resurgence is real.

No athlete’s dominance is permanent, and it feels particularly vulnerable when the athlete is in uncharted territory, sustaining her superiority at an age when most of her historical peers have retired. But Williams is so far ahead of her competition that even if she started to decline at a rate of 100 points per year,⁵ and all of her rivals stayed in place, she would remain the best player for about two more years, and remain in the top 10 for at least another four.
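The projection above is simple arithmetic. With rivals frozen in place, a leader who declines D points per year stays on top for gap ÷ D years, so “about two more years” at 100 points per year implies a lead over No. 2 on the order of 200 points. The sketch below uses invented ratings (the article doesn’t publish the gaps), and its function names are hypothetical.

```python
# Hypothetical ratings only; the article does not publish the underlying numbers.
LEADER = 2400.0
RIVALS = [2200.0, 2150.0, 2100.0, 2050.0, 2020.0, 1990.0,
          1980.0, 1960.0, 1950.0, 1940.0, 1930.0, 1920.0]


def years_on_top(gap_points: float, decline_per_year: float) -> float:
    """Years until a steadily declining leader is caught by a rival who stays put."""
    return gap_points / decline_per_year


def projected_rank(leader: float, rivals: list[float], decline_per_year: float, years: float) -> int:
    """Leader's rank after `years` of linear decline, with every rival frozen.

    Ties are resolved in the leader's favor.
    """
    rating = leader - decline_per_year * years
    return 1 + sum(1 for r in rivals if r > rating)


print(years_on_top(LEADER - max(RIVALS), 100.0))  # 2.0
for year in range(5):
    print(year, projected_rank(LEADER, RIVALS, 100.0, year))
```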

Could a new rival emerge? Sure, it’s possible. But Sharapova, No. 2 to Williams in Elo among active players, has shown signs of decline at age 28. Azarenka and Kvitova, younger Williams rivals with two major titles each, are 26 and 25, respectively, and might have already peaked. Williams has stayed so great for so long that she looks likely to stay at the top of the game when rivals a decade younger have already started to fade. And all that winning only makes it more fun for Williams to stick around.

“I feel great,” Williams said at her press conference after winning Wimbledon. “I definitely don’t feel old. I think in life I’m still pretty young.”

While she may be unlikely to ever take the top spot in the women’s tennis hierarchy — at least as far as recursive Bayesian ratings algorithms are concerned — she’s still young enough to accomplish a lot more in the game, and old enough to deserve extra credit for it.

CORRECTION (Aug. 31, 11:56 a.m.): An earlier version of the chart in this article titled “Serena Williams’s Late-Career Renaissance” incorrectly displayed several trend lines. We have corrected and updated the chart.

Also, an earlier version of this article misstated the age at which Serena Williams became the No. 1 player in the world. She first reached the No. 1 ranking at age 20, not age 17.

CORRECTION (Sept. 14, 6:18 p.m.): An earlier version of a footnote in this article incorrectly stated the formula for the multiplier used in our tennis Elo rating system. The multiplier is determined by a function in the form K/((matches in player’s dataset+offset)^shape), not K/((games in player’s dataset+offset)^shape).

Footnotes

¹ The period since April 1968 when professionals have been allowed to compete at the Grand Slam tournaments.
² Players who don’t already have a rating start with 1500 Elo points, but that’s entirely aesthetic; everyone starts with the same rating, which may as well be zero.
³ In this case, there were two main choices to make. First, what level of granularity to cover. That is, do we treat each game in a tennis match as its own “match” (thus increasing our sample size, but measuring something very different from overall wins)? Or do we treat a match win as a win, regardless of whether it’s 6-0, 6-0; or 6-4, 3-6, 6-7(7), 7-6(3), 70-68? Or do we go set by set? This tradeoff always exists, even in other settings like football or baseball, where margin of victory and run differential are commonly used. The question is always what we gain in prediction strength by using a less accurate but more detailed metric than wins. For this system, we tested and optimized all three, and found that any predictive gains from using sets or games instead of matches were extremely small. While those versions of our system may yet prove to have their uses, we’ve stuck with a match-based system for most of this analysis.

The second choice is how to update ratings after a match. All Elo systems take the difference between the number of wins a player earned and the number of wins expected, and then do something with that number to determine the appropriate adjustment to the player’s rating. The crudest thing to do is to multiply that difference by some constant number, K, where K is chosen empirically to match the context. In chess, a common K for new players is 40, meaning that for every 10 percentage points by which a player outperformed expectation, she would gain 4 Elo points per game. For FiveThirtyEight’s NBA Elo, we used a K of 20. Chess uses a K function that depends on the number of games a player has played. This is the approach taken by other Elo variants such as Glicko and Stephenson, which use additional parameters as well.

We tried and tested all these methods extensively but ultimately settled on our own variant (which substantially outperformed the alternatives) in which the multiplier is determined by a function in the form K/((matches in player’s dataset+offset)^shape). K is a constant multiplier much like that used by other systems, offset is a small adjustment to keep new players from shooting up or down too much, and shape tells us what shape the curve should take (essentially, the larger the number, the more stable the ratings for players with lots of matches). With this structure in place, we simply had to test to see which parameters performed best over our data set. The values we settled on are a K of 250, offset of 5, and shape of 0.4.

This is an empirical approximation of what this adjustment curve should look like, and it can likely be improved with a more accurate function or by incorporating more inputs than match count and performance, but we tried to keep it as simple as possible to avoid the potential for overfitting. (The data we used is parsed from Jeff Sackmann’s GitHub page and totals more than 250,000 tour-level matches.)
⁴ The fields were also loaded with perpetual finalist/semifinalist types like Gabriela Sabatini, Jana Novotná and Mary Joe Fernández.
⁵ This would be faster than most declines Elo has documented, but it’s hard to say what’s reasonable because most players retire shortly after any steeper decline.
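The two update rules described in footnote 3 can be sketched side by side. This is an illustrative reading of the footnote, not FiveThirtyEight’s code: the formula and the values K = 250, offset = 5 and shape = 0.4 come from the footnote, but the function names are hypothetical, and treating `matches_played` as the matches already on a player’s record before the current one is an assumption.

```python
def fixed_k_change(k: float, actual: float, expected: float) -> float:
    """Classic Elo adjustment: a constant K times (actual - expected)."""
    return k * (actual - expected)


def multiplier(matches_played: int, k: float = 250.0, offset: float = 5.0, shape: float = 0.4) -> float:
    """Footnote 3's variable multiplier: K / (matches + offset) ** shape.

    matches_played is assumed to count the matches already in the player's
    record before the current one.
    """
    return k / (matches_played + offset) ** shape


# Footnote 3's chess example: K = 40, a result 10 percentage points above expectation.
print(round(fixed_k_change(40, 0.6, 0.5), 6))  # 4.0

# The multiplier shrinks as a player's record grows, so veterans' ratings move less.
for n in (0, 10, 100, 1000):
    print(n, round(multiplier(n), 1))
```

The multiplier falls from roughly 131 for a newcomer to under 16 after 1,000 matches. If, as the wording “matches in player’s dataset” suggests, each player carries her own multiplier, an upset would move a newcomer’s rating much more than a veteran’s, so this variant would not be zero-sum the way the fixed-K rule is.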