Why The Best Supreme Court Predictor In The World Is Some Random Guy In Queens

Editor’s Note: This article on predicting decisions by the Supreme Court first ran on Nov. 17, 2014. Last week, the website FantasySCOTUS correctly predicted the outcome — and the vote count — of the court’s decisions on Obamacare subsidies and gay marriage. It was incorrect, however, in predicting the court’s fair housing decision.

On June 28, 2012, opponents of the Affordable Care Act celebrated on the steps of the Supreme Court. They had just learned that the law’s individual mandate had been struck down. Both CNN and Fox News broke the news. The celebrants chanted, “The Constitution won, the Constitution won.” Across the street from the court, Rep. Jean Schmidt, a staunch opponent of the law, fell into joyful histrionics.

But of course the individual mandate was not struck down. While it was found invalid under the Commerce Clause, it was upheld under Congress’s taxing power. Both the revelers and the news networks would learn that minutes later.

This story epitomizes two Supreme Court truths: The business of the court matters a lot to a lot of people, and it is extremely complex.

We’re quite good at predicting election outcomes. But if a Supreme Court opinion is tough to decipher when held in one’s hands, can we ever hope to predict decisions before they happen? The issues are complex and diverse, and justices have unique and evolving ideologies, outlooks and interpretations — and they don’t provide polling data.

Still, many political scientists, legal scholars, practitioners, physicists, engineers and hobbyists are trying.

Supreme Court predictors have emptied their toolboxes to tackle this problem. They’ve used statistical models, machine learning, expert opinion and — recently — the wisdom of the crowd.

The job got started in earnest about 10 years ago. Theodore Ruger, a law professor at the University of Pennsylvania — along with Pauline Kim (a legal scholar), Andrew Martin and Kevin Quinn (political scientists) — undertook a seminal study, the results of which were published in 2004. They studied the Supreme Court “prospectively.” Rather than looking back, as most scholarship had, they looked forward.

The authors employed two parallel methods to predict the 2002-2003 Supreme Court term. Ruger and Kim polled expert law professors for their predictions. Meanwhile, Martin and Quinn, using data from 10 years of the Rehnquist court, ran statistical, “classification tree” prediction models. These “trees” are basically vote-predicting flow charts, fitted with data. (Predictions were made, importantly, before oral arguments. More on that later.)

This is their classification tree for Justice Ruth Bader Ginsburg:

Ruger, Theodore W., et al. “The Supreme Court Forecasting Project,” Columbia Law Review (2004): 1150-1210.
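
The tree idea is simple enough to sketch in code. Below is a minimal, hypothetical version in Python using scikit-learn; the features (circuit of origin, issue area, the ideological direction of the lower-court ruling) are stand-ins for the kinds of case facts the study considered, not its actual variables or data.

# A minimal sketch of a vote-predicting classification tree, using
# scikit-learn. The features and training rows below are hypothetical
# stand-ins, not the study's actual variables or data.
from sklearn.tree import DecisionTreeClassifier

# Each case is encoded as [circuit, issue_area, lower_court_liberal],
# using small integer codes. The label is 1 if the justice voted to
# reverse the lower court, 0 if she voted to affirm.
past_cases = [
    [2, 1, 1],  # 2nd Circuit, economic issue, liberal ruling below
    [9, 3, 1],  # 9th Circuit, civil rights issue, liberal ruling below
    [5, 3, 0],  # 5th Circuit, civil rights issue, conservative ruling below
    [9, 1, 0],  # 9th Circuit, economic issue, conservative ruling below
]
past_votes = [0, 1, 0, 1]

tree = DecisionTreeClassifier(max_depth=3).fit(past_cases, past_votes)

# Predict the justice's vote in a new case.
print("reverse" if tree.predict([[9, 3, 1]])[0] else "affirm")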

The interdisciplinary nature of their work is key. Political scientists tend to view the court in terms of a left-right ideological spectrum, as they would Congress. Legal scholars may be more interested in precedent, the circuit court a case came from, and so on. Both views are important.

“Ideological preferences matter, but that’s not all that matters,” Ruger told me. “You have a host of things like institutional setting, and who the parties are, and, yes, even what the law says — the content or the specificity of the legal rules or the precedent.”

Ruger highlighted another reason Supreme Court cases are so difficult to predict: selection bias.

“They’re cases where the law is unusually vague or indeterminate,” Ruger said. “The Supreme Court takes the hardest cases, where the law is the least clear.”

In this instance, the statistical model beat the experts. The Martin-Quinn classification trees scored a 75 percent success rate, while Ruger and Kim’s legal scholars clocked in at about 59 percent. Ruger told me, however, that he’s sure that lawyers who have argued before the Supreme Court would have done better than the legal academics.

Josh Blackman, a professor at South Texas College of Law, has recently taken up the predicting mantle. Blackman and his partners, Michael Bommarito and Daniel Martin Katz, have built a second-generation computer model, called {Marshall} after Chief Justice John Marshall. While it may be slightly less accurate than the Martin-Quinn model, it’s far more robust. It’s built on data going back more than 60 years, and it doesn’t rely on a given composition of the court, as the Martin-Quinn model did. The model goes beyond Martin and Quinn’s classification trees, employing what the authors call a “random forest approach,” essentially a large ensemble of such trees whose predictions are pooled. They can even “time travel,” testing their model on historical cases using only the information that would have been available at the time. It correctly predicts 70 percent of cases since 1953. And it’s public — the source code for {Marshall} is available on GitHub.
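
In spirit, the approach looks something like the sketch below: train a random forest only on cases decided before the term being predicted, then score it on that term, which is what keeps the “time travel” honest. The code is a hypothetical illustration, not the actual {Marshall} implementation (that lives on GitHub).

# A sketch of {Marshall}-style backtesting: train a random forest on
# every case decided before a given term, then score it on that term.
# The data and feature encoding are hypothetical placeholders.
from sklearn.ensemble import RandomForestClassifier

def backtest(cases, outcomes, terms, target_term):
    """Train on cases from terms before target_term; test on target_term."""
    train = [i for i, t in enumerate(terms) if t < target_term]
    test = [i for i, t in enumerate(terms) if t == target_term]
    forest = RandomForestClassifier(n_estimators=500, random_state=0)
    forest.fit([cases[i] for i in train], [outcomes[i] for i in train])
    predictions = forest.predict([cases[i] for i in test])
    hits = sum(p == outcomes[i] for p, i in zip(predictions, test))
    return hits / len(test)

# e.g. accuracy_1990 = backtest(case_features, case_outcomes, term_years, 1990)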

While the model has impressed many, Blackman still believes in human reasoning. “We expect the humans to win, they’re better,” he told me. “This is not like ‘The Terminator’ where machines will rise.”

Blackman also launched the website FantasySCOTUS in 2009 — as a joke. Like fantasy sports, human players log on, pick justices to vote this way or that, and score points once the decisions come down. To Blackman’s surprise, it took off, and thousands of people now participate.

The real promise of FantasySCOTUS isn’t entertainment, but prediction. Like the Iowa Electronic Markets, or the sadly defunct Intrade, FantasySCOTUS can harness the wisdom of the crowd — incentivizing its participants to make accurate predictions, and then observing those predictions in the aggregate. This year there is a $10,000 first prize, put up by the media firm Thomson Reuters.
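
Aggregating the crowd can be as simple as weighting each player’s pick by his or her track record. Here’s one toy way to do it in Python; the players, accuracies and picks are invented, and FantasySCOTUS’s actual scoring may work differently.

# A toy sketch of accuracy-weighted crowd aggregation. The players,
# track records and picks are invented for illustration.
def crowd_forecast(picks, track_record):
    """picks: player -> 'affirm'/'reverse'; track_record: player -> past hit rate."""
    score = sum(track_record[p] if v == "reverse" else -track_record[p]
                for p, v in picks.items())
    return "reverse" if score > 0 else "affirm"

picks = {"Melech": "reverse", "player2": "affirm", "player3": "reverse"}
track_record = {"Melech": 0.80, "player2": 0.60, "player3": 0.65}
print(crowd_forecast(picks, track_record))  # -> reverse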

“I had no idea whether it’d be accurate or not — I did this entirely on a whim. And then by the end of the year I found out that this is actually pretty good,” Blackman said. The serious FantasySCOTUS players generate predictions that crack 70 percent accuracy.

Blackman is excited to find out what sort of cases the humans are good at predicting, and what sort the machines are good at. With that information, he can begin to craft an “ensemble” prediction, using the best of both worlds.
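
An ensemble, in this sense, is just a rule for blending the two signals. A toy version, with made-up numbers, might weight the crowd and the machine by their historical accuracy on a given type of case:

# A toy ensemble: blend the crowd's and the model's probabilities of
# reversal, weighting whichever source has been more accurate on this
# type of case. All numbers are invented for illustration.
def ensemble(p_crowd, p_model, crowd_accuracy, model_accuracy):
    w = crowd_accuracy / (crowd_accuracy + model_accuracy)  # crowd weight
    return w * p_crowd + (1 - w) * p_model

# On a case type where humans have historically done better:
p = ensemble(p_crowd=0.70, p_model=0.55, crowd_accuracy=0.80, model_accuracy=0.70)
print(f"P(reverse) = {p:.2f}")  # -> P(reverse) = 0.63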

So there are the scholars and the machines and the crowd. Composing the crowd are the hobbyists — the intrepid, rugged individualists of the predicting world.

Jacob Berlove, 30, of Queens, is the best human Supreme Court predictor in the world. Actually, forget the qualifier. He’s the best Supreme Court predictor in the world. He won FantasySCOTUS three years running. He correctly predicts cases more than 80 percent of the time. He plays under the name “Melech” — “king” in Hebrew.

Berlove has no formal legal training. Nor does he use statistical analyses to aid his predictions. He got interested in the Supreme Court in elementary school, reading his local paper, the Cincinnati Enquirer. In high school, he stumbled upon a constitutional law textbook.

“I read through huge chunks of it and I had a great time,” he told me. “I learned a lot over that weekend.”

Berlove has a prodigious memory for justices’ past decisions and opinions, and relies heavily on their colloquies in oral arguments. When we spoke, he had strong feelings about certain justices’ oratorical styles and how they affected his predictions.

Some justices are easy to predict. “I really appreciate Justice Scalia’s candor,” he said. “In oral arguments, 90 percent of the time he makes it very clear what he is thinking.”

Some are not. “To some extent, Justice Thomas might be the hardest, because he never speaks in oral arguments, ever.”1 That fact is mitigated, though, by Thomas’s rather predictable ideology. Justices Kennedy and Breyer can be tricky, too. Kennedy doesn’t tip his hand too much in oral arguments. And Breyer, Berlove says, plays coy.

“He expresses this deep-seated, what I would argue is a phony humility at oral arguments. ‘No, I really don’t know. This is a difficult question. I have to think about it. It’s very close.’ And then all of a sudden he writes the opinion and he makes it seem like it was never a question in the first place. I find that to be very annoying.”

I told Ruger about Berlove. He said it made a certain amount of sense that the best Supreme Court predictor in the world should be some random guy in Queens.

“It’s possible that too much thinking or knowledge about the law could hurt you. If you make your career writing law review articles, like we do, you come up with your own normative baggage and your own preconceptions,” Ruger said. “We can’t be as dispassionate as this guy.”

Berlove also noted that, for now, the best humans retain the edge over the machines. “There’ll probably be a few top-notch players up there who can do better” than the computer model, he said. But he added, “With time, they might be able to do what they did to Garry Kasparov, or what they did to Ken Jennings,” referring to IBM’s Deep Blue and Watson supercomputers.

Linda Greenhouse, the former New York Times Supreme Court reporter, now at Yale, is also a notable human predictor. In a paper responding to the work of Ruger, Kim, Martin and Quinn, she lays out an eloquent critique of the computer models. She also describes how “the Supreme Court press corps often indulges in predictions … usually made during the walk down the stairs from the courtroom following an oral argument session.”

She goes on to estimate her success rate at around 75 percent, and probably higher in practice, since the cases she covered tended to be closely fought and thus less predictable. She, too, stresses the importance of oral arguments to prediction.

So how about some actual predictions? Here they are, as I write, for the Supreme Court’s October sitting — from the FantasySCOTUS crowd, super-predictor Berlove and the {Marshall} algorithm. The predicted votes shown are those for and against affirming the lower-court decision, respectively. (You can find a useful summary of the cases at SCOTUSblog.)

[Table: Predicted votes for and against affirming in the October sitting, from the FantasySCOTUS crowd, Jacob Berlove and the {Marshall} algorithm]

The disagreement among the predictions for the October cases is remarkable. In fact, for only one case — Holt v. Hobbs — do all three predictors agree on the outcome. This case, about the right of an inmate to grow a beard in accordance with his religious beliefs, is also remarkable because the inmate filed a handwritten petition to the Supreme Court. All three predictors guess he will win.

There is considerably more agreement between the crowd and Berlove for the cases being heard in the court’s November sitting. (The {Marshall} predictions are being run as I write.) Again, here are the predicted votes for and against affirming:

[Table: Predicted votes for and against affirming in the November sitting, from the FantasySCOTUS crowd and Jacob Berlove]

In two of this term’s high-stakes cases, tackling racial gerrymandering — Alabama Democratic Conference v. Alabama and Alabama Legislative Black Caucus v. Alabama — oral argument was heard Wednesday. (My colleagues at the Brennan Center for Justice at NYU Law, where I am a fellow, filed an amicus brief in the latter case.) As I write, the crowd predicts close decisions in both cases. The predictions may shift — and improve — after predictors analyze the oral arguments’ tea leaves.

Last week, the court agreed to hear another challenge to the Affordable Care Act, in a case called King v. Burwell, which will almost certainly be the most-watched of the term. It is not yet scheduled for argument, and it remains to be seen how the predictors will come down on the case. And, of course, it’ll be summer before we know how the justices will.

Machine predictors are bound to improve. Blackman hopes to host a machine competition next year, encouraging others to work off {Marshall} and develop their own, better algorithms that can beat its 70 percent accuracy. And in the long run, the real impact of this work may not have much at all to do with the Supreme Court. With an accurate algorithm in hand, predictions could be generated for the tens of thousands of cases argued every year in lower courts.

“There are roughly 80 cases argued in the Supreme Court every year. That’s a drop in the bucket,” Blackman said.

This could have dramatic, efficiency-boosting effects. Lawyers could better decide when to go to trial, when to settle and how to settle, for example. But there’s a long way to go.

Maybe the real difficulty is as simple as Ruger put it: “The justices are intelligent human beings and, as such, are very complex human beings.”

Footnotes

  1. In January 2013, Thomas broke a nearly seven-year silence.

Oliver Roeder was a senior writer for FiveThirtyEight. He holds a Ph.D. in economics from the University of Texas at Austin, where he studied game theory and political competition.
