Computers Are Learning How To Treat Illnesses By Playing Poker And Atari

You’ve got to know when to hold ’em. Know when to fold ’em. And know when to walk away because the game you’re playing has been rendered trivial by an advanced algorithm.

Poker has been solved. Michael Bowling, a computer science professor at the University of Alberta — along with co-authors Neil Burch, Michael Johanson and Oskari Tammelin — published findings to that effect earlier this month in the journal Science. For a specific poker game — heads-up limit hold ’em — a computer algorithm is now indistinguishable from perfect.

What’s more, the program — dubbed Cepheus1 — is self-taught. Over two months, it played trillions of hands against itself. It learned what worked, what didn’t, and it improved. The game is now “solved” in the sense that you could play poker against Cepheus all day, every day for a lifetime and not be able to distinguish the program from the Platonic ideal of a poker player. Don’t believe it? You can play with Cepheus online.
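To give a rough sense of how that self-teaching works: Cepheus builds on a technique called counterfactual regret minimization, which repeatedly asks how much better each alternative action would have done than the one actually played, then nudges future play toward the less-regretted choices. The Python sketch below applies that regret-matching idea to rock-paper-scissors rather than poker (a toy stand-in, not Cepheus itself); run it and both players' averaged strategies drift toward the unexploitable mix of one-third each.

```python
# A toy sketch of regret-matching self-play, the core idea behind the
# counterfactual-regret-minimization family of algorithms Cepheus builds on.
# It plays rock-paper-scissors (not poker!) against itself; the *average*
# strategies converge toward the game's equilibrium of 1/3 for each action.
import random

ACTIONS = 3  # rock, paper, scissors
# PAYOFF[a][b] = payoff to player 0 when player 0 plays a and player 1 plays b
PAYOFF = [[ 0, -1,  1],
          [ 1,  0, -1],
          [-1,  1,  0]]

def strategy_from_regrets(regrets):
    """Play each action in proportion to its positive accumulated regret."""
    positive = [max(r, 0.0) for r in regrets]
    total = sum(positive)
    if total > 0:
        return [p / total for p in positive]
    return [1.0 / ACTIONS] * ACTIONS  # no positive regret yet: play uniformly

def train(iterations=200_000):
    regrets = [[0.0] * ACTIONS, [0.0] * ACTIONS]
    strategy_sums = [[0.0] * ACTIONS, [0.0] * ACTIONS]
    for _ in range(iterations):
        strats = [strategy_from_regrets(regrets[p]) for p in (0, 1)]
        acts = [random.choices(range(ACTIONS), weights=strats[p])[0] for p in (0, 1)]
        for p in (0, 1):
            opp = acts[1 - p]
            # Payoff the player actually received...
            got = PAYOFF[acts[p]][opp] if p == 0 else -PAYOFF[opp][acts[p]]
            for a in range(ACTIONS):
                # ...versus what each alternative action would have earned,
                # holding the opponent's play fixed. The gap is the "regret."
                alt = PAYOFF[a][opp] if p == 0 else -PAYOFF[opp][a]
                regrets[p][a] += alt - got
            for a in range(ACTIONS):
                strategy_sums[p][a] += strats[p][a]
    # The average strategy over all iterations is what converges, not the last one.
    return [[s / sum(sums) for s in sums] for sums in strategy_sums]

if __name__ == "__main__":
    for p, strategy in enumerate(train()):
        print(f"player {p}: {[round(x, 3) for x in strategy]}")  # roughly [0.333, 0.333, 0.333]
```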

And while this on its own is a major victory for computer science and game theory, the real impact may be much greater. Algorithms — including the ones that have mastered checkers and gotten really good at “Jeopardy!” — are being put to use in genomics labs, hospitals and airport security.

But first, let’s appreciate this result for its immediate impact. Cepheus is a notch on the belt of a legion of computer scientists, game theorists and hobbyists. These researchers, in a large and diverse set of projects, try to solve games — to truly master them with math, programming and technology. To solve a game, roughly speaking, is to be able to play it perfectly.2 It’s slow going. Tic-tac-toe and Connect Four are solved, sure. But solving chess or go remains a distant dream.
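What does it mean, in practice, to play perfectly? For a game as tiny as tic-tac-toe you can simply search the entire tree. The Python sketch below is a bare-bones illustration of that idea, not how Connect Four or checkers were actually cracked (those took far cleverer search and, for checkers, enormous endgame databases): it labels every position as a win, loss or draw under best play, and confirms that tic-tac-toe, played perfectly by both sides, is a draw.

```python
# A minimal sketch of "solving" a tiny perfect-information game: exhaustive
# minimax over tic-tac-toe. Every reachable position gets a value under
# perfect play; the empty board comes out as a draw (value 0).
from functools import lru_cache

LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),   # rows
         (0, 3, 6), (1, 4, 7), (2, 5, 8),   # columns
         (0, 4, 8), (2, 4, 6)]              # diagonals

def winner(board):
    for a, b, c in LINES:
        if board[a] != "." and board[a] == board[b] == board[c]:
            return board[a]
    return None

@lru_cache(maxsize=None)
def solve(board, player):
    """Value of `board` to `player` (whose turn it is): +1 win, 0 draw, -1 loss."""
    w = winner(board)
    if w is not None:
        return 1 if w == player else -1
    if "." not in board:
        return 0  # board full, nobody won: draw
    opponent = "O" if player == "X" else "X"
    best = -2
    for i, cell in enumerate(board):
        if cell == ".":
            child = board[:i] + player + board[i + 1:]
            # Whatever the position is worth to the opponent is its negative to us.
            best = max(best, -solve(child, opponent))
    return best

if __name__ == "__main__":
    print(solve("." * 9, "X"))  # prints 0: perfect play from the start is a draw
```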

Bowling’s computers had been really good at poker for a while. He helped start the Annual Computer Poker Competition and has entered his programs in every event, winning more than half of them. And his algorithms have prevailed in matches against human heads-up limit hold ’em specialists.

“From that point on, computers had exceeded human play at heads-up limit,” Bowling told me by phone. “Not that everyone in the community accepted this. Poker players have a lot of ego.”

But now, one of the computers is perfect. Bowling says heads-up limit hold ’em is the first imperfect information game — a game where some information, like an opponent’s cards, is hidden — to be solved.3

Some researchers may be driven to solve games, but that doesn’t always mean they excel at playing them. I asked Bowling if he himself is a strong poker player.

“I am human,” he said, reassuringly. “And I have played the game of poker. But to say I’m a poker player goes way too far.” Last year he played only about 100 hands, just to see “if the buttons worked” on Cepheus.

Imperfect information makes a game tricky to solve. An algorithm has to take into account more possibilities. In a game like checkers, you can prune the game tree with your play, leaving fewer possibilities for an algorithm to consider. Open with one move and you don’t have to worry about calculating all those situations in which you didn’t open with that move. In poker, you don’t have as much control over the state of the game.

“We can’t just say, ‘I don’t want a jack of spades to be flipped over on the flop, because I haven’t figured out what to do when that happens,’” said Bowling.

The following table shows the complexity of a number of popular games. This is measured by the size of their “state space” — in other words, how many total positions are possible in the game. I’ve also indicated whether or not the game is solved.4 (Games in the table are their two-player versions, where applicable.)

GAME                 SIZE (10^X)   SOLVED?
Tic-tac-toe          3             Yes
Nine Men’s Morris    10            Yes
Go (5×5)             11            Yes
Connect Four         13            Yes
Limit hold ’em       14            Yes
Checkers             20            Yes
Backgammon           20            No
Chinese checkers     23            No
Othello              28            No
Go (9×9)             38            No
Xiangqi              48            No
Chess                50            No
Shogi                71            No
Go (13×13)           79            No
Stratego             115           No
No limit hold ’em    140           No
Go (19×19)           171           No

For context, the number of atoms in the universe is about 10^80.

“What determines whether games are easy or difficult to solve? One very clear answer is how big the game is,” Bowling said.

Solvers have been making their way down the list. Tic-tac-toe is trivially solvable. Connect Four was first solved in 1988. Nine Men’s Morris in 1993. Go, on a 5×5 board, was solved in 2002, and checkers in 2007. As measured by state-space size, checkers is the most complex game yet solved.

The holy grails of this broad game-solving project are chess and go. By all accounts, their solution is unlikely in our lifetimes.

In 1965, Hans-Joachim Bremermann wrote that the “speed, memory, and processing capacity of any possible future computer equipment are limited by certain physical barriers: the light barrier, the quantum barrier, and the thermodynamical barrier. These limitations imply, for example, that no computer, however constructed, will ever be able to examine the entire tree of possible move sequences of the game of chess.”

Fifty years later, Jonathan Schaeffer agrees. “Chess cannot be solved with modern technology. We need a breakthrough,” he told me. He pointed to quantum computing as a potential, but distant, example.

Schaeffer is the man who solved checkers. He too is a computer scientist at the University of Alberta.

Much like Bowling with poker, Schaeffer doesn’t play checkers. (He is, however, a good chess player, and had one of the strongest chess programs around, called Phoenix, before Deep Blue came on the scene.) Nevertheless, in 1994, a program he built became the world checkers champion. Schaeffer’s checkers program is called Chinook.5 Thirteen years later, also in the pages of Science, he announced that the game had been solved.

Schaeffer started the poker research group in 1995, before Bowling was even a graduate student, and was behind Polaris, Cepheus’s forebear.

“I think it’s a fabulous piece of work,” he told me, referring to Cepheus, although he admitted he was biased. “It’s taken an algorithm that, quite frankly, I didn’t think had the ability to solve the game of poker. And they kept finding new ideas to take this algorithm and stretch it further and further, until solving the game of poker became something they could do with a lot of computers in a relatively short amount of time. They’ve pushed the boundaries of what was possible.”

And while the poker and checkers projects are distinct — the tools used to solve incomplete and complete information games are vastly different — both Bowling and Schaeffer emphasized to me their desire to generalize their algorithms. Bowling has a project analyzing the sport of curling, for example, and Schaeffer has entertained the idea of trying to solve backgammon.

And both emphasized the potential for their research to extend beyond games.

“Ideally, we’d love to have an A.I. system that can sit down at any decision-making task and figure out how to make good decisions. That may be too far, so let’s take a smaller subset, a step along the way,” said Bowling.

Enter Atari. A gaming challenge of a much different kind, this is a chance for the algorithms to really stretch their binary legs.

Bowling’s Cepheus is self-taught, and this has powerful implications. His next project is to develop an A.I. that can figure out how to play any Atari 2600 game by interacting with it. Here’s a demo video from the project — computers playing computer games.

This project is called The Arcade Learning Environment: “A new challenge problem, platform, and experimental methodology for empirically assessing agents designed for general competency.” Some of Bowling’s Ph.D. students have gone on to work for DeepMind, a secretive startup acquired by Google last year. Bowling recalls Larry Page, in a talk he gave, pointing specifically to the students’ arcade learning work. DeepMind was founded by neuroscientist Demis Hassabis, who has been called “probably the best games player in history.”
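To make the challenge concrete, here is roughly what interacting with an Atari 2600 game looks like through the Arcade Learning Environment's present-day Python bindings (the ale-py package). The ROM path is a placeholder and the "agent" below just presses buttons at random; a real learning agent would replace that random choice with something that improves from the pixels and the score, the only things a generally competent player gets to see.

```python
# A minimal sketch of an agent talking to the Arcade Learning Environment
# via its Python bindings (ale-py). The ROM path is a placeholder and the
# "policy" is random button-mashing; the point is the loop an agent lives in:
# observe the screen, pick an action, collect the reward.
import random
from ale_py import ALEInterface

ale = ALEInterface()
ale.setInt("random_seed", 123)
ale.loadROM("roms/breakout.bin")      # placeholder path to an Atari 2600 ROM

actions = ale.getLegalActionSet()     # the joystick/button inputs this game accepts
total_reward = 0
while not ale.game_over():
    screen = ale.getScreenRGB()       # raw pixels: all a general agent gets to see
    action = random.choice(actions)   # a learning agent would choose based on `screen`
    total_reward += ale.act(action)   # take the action, bank the score change
print("episode score:", total_reward)
```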

But both Bowling and Schaeffer emphasized that, in the end, their work is not really about games at all.

“We’re not doing research into games. I’m not here doing work that will allow humans to play better checkers,” said Schaeffer. “We’re interested in finding ways to make computers perform tasks that you normally think humans should be doing.” Games are just the test bed.

“We do have computer programs that are capable of doing very specific tasks very, very well. But that doesn’t say a lot for what humans are good at, which is doing a very general space,” said Bowling.

Perhaps, rather than interacting with age-old board games or dusty video games, these algorithms can interact with other things — biological or medical information, say.

Databases used by Schaeffer’s checkers program were hundreds of gigabytes in size — very large indeed in 1994 — and had to be compressed to be useful. So he figured out how. Two years later, Schaeffer founded a company called BioTools. Why? The company did analysis of the human genome, to better understand DNA and proteins. The DNA sequencing generated massive amounts of data. Data that had to be compressed to be useful.

Bowling’s poker work has spilled over into diabetes treatment. Rather than finding the optimal strategy in a poker hand, a similar algorithm can provide optimal treatment recommendations for adjusting insulin injection formulas. The “game,” in this case, is dealing with the worst-case responses of a patient to a given treatment. “We have a proof-of-concept paper that shows that robustness according to a certain class of risk measures can be optimized” by solving this carefully constructed game, Bowling explained.
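Stripped of the poker machinery, the worst-case logic is simple enough to fit in a few lines. The toy sketch below uses entirely invented treatment options and numbers (nothing from Bowling's paper): it picks the insulin adjustment whose worst plausible patient response is least bad, a tiny max-min "game against nature." The hard part in the real work is doing this over rich models of patient response and risk, at a scale where poker-style solvers earn their keep.

```python
# A toy illustration of worst-case (max-min) treatment choice, with made-up
# options and scores. Higher scores are better; the robust choice is the
# treatment whose *worst* plausible response is least bad.

# OUTCOMES[treatment][patient response] = invented outcome score
OUTCOMES = {
    "keep dose":      {"typical day": 0.70, "sensitive day": 0.60, "resistant day": 0.55},
    "small increase": {"typical day": 0.85, "sensitive day": 0.30, "resistant day": 0.65},
    "small decrease": {"typical day": 0.50, "sensitive day": 0.75, "resistant day": 0.45},
}

def robust_choice(outcomes):
    """Pick the option that maximizes the minimum (worst-case) outcome."""
    return max(outcomes, key=lambda t: min(outcomes[t].values()))

if __name__ == "__main__":
    choice = robust_choice(OUTCOMES)
    print(choice, min(OUTCOMES[choice].values()))  # "keep dose", worst case 0.55
```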

He also has hope for security applications, like at an airport. “We’re starting to look at the area of game theory for security, where you are trying to find a defender strategy for protecting strategic infrastructure from an attacker trying to exploit your policy. Game theoretic solutions have already been deployed by Milind [Tambe]’s group [at USC], but this result shows our ability to scale to problems of unprecedented size, which can enable new problems to be tackled, or more complex models of existing situations,” Bowling said in an email.
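The underlying object there is, again, a game small enough to write down. The toy below uses invented numbers (two sites, one patrol) and a small linear program, via SciPy, to compute the defender's unexploitable mixed patrol schedule: the randomization an attacker gains nothing by scouting. Deployed systems model vastly more structure, but the core computation looks like this.

```python
# A toy security game with invented payoffs: the defender patrols one of two
# sites, the attacker hits one, and we solve for the patrol mix that an
# attacker cannot exploit, via a small zero-sum-game linear program.
from scipy.optimize import linprog

# PAYOFF[i][j] = defender's payoff if she patrols site i and the attacker hits site j
PAYOFF = [[ 1.0, -2.0],   # patrolling site 0: catch an attack there, miss one at site 1
          [-1.0,  1.0]]   # patrolling site 1: miss site 0, catch site 1

# Variables: [p0, p1, v] = patrol probabilities and the guaranteed value v.
# Maximize v subject to: for every target j, p0*PAYOFF[0][j] + p1*PAYOFF[1][j] >= v.
c = [0.0, 0.0, -1.0]                                   # linprog minimizes, so minimize -v
A_ub = [[-PAYOFF[0][j], -PAYOFF[1][j], 1.0] for j in range(2)]
b_ub = [0.0, 0.0]
A_eq = [[1.0, 1.0, 0.0]]                               # probabilities sum to 1
b_eq = [1.0]
bounds = [(0, 1), (0, 1), (None, None)]

res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
p0, p1, v = res.x
print(f"patrol site 0 with probability {p0:.2f}, site 1 with {p1:.2f}; guaranteed value {v:.2f}")
```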

And, of course, there’s the famous case of Watson. The IBM supercomputer famously defeated Ken Jennings and Brad Rutter at the game show “Jeopardy!” It’s now being put to use to improve the treatment of cancer and PTSD.

Importantly, the applications aren’t predictable. Schaeffer had never dreamed that his checkers work would be used this way, he said, but to him it’s a powerful argument for discovery-based research. And it’s a powerful argument that games are far from trivial.

To Schaeffer, all this research reveals something else important. Researchers attack human games players with a full-on technological onslaught: millions of dollars of funding, teams of dozens of experts, cutting-edge computing power, months or years of research. And, if they’re lucky, they eke out a slight edge in a game.

“It’s a real compliment to the human ability to compute, to reason, to think,” he said.

Footnotes

  1. A forerunner to the Cepheus poker program was called Polaris — the North Star. Cepheus is also a constellation in the northern sky. Gamma Cephei, a star in Cepheus, will succeed Polaris as the northern pole star in about 3,200 years.

  2. That doesn’t mean that computers are bad at unsolved games. Chess isn’t solved, but computers are very good at it. Ditto Scrabble. Perhaps a comic will help illustrate the distinction.

  3. Excluding certain trivial games invented by academics for the express purpose of being solved.

  4. There are some subtle variations in the definition of “solved.” A game can be “strongly solved,” meaning that an algorithm can provide perfect play from any possible position that can be realized in the game. A game can also be “weakly solved,” meaning an algorithm can guarantee a win, or for some games just a draw, from the beginning of the game. Also, Xiangqi is also known as Chinese chess. Shogi is also known as Japanese chess. Competitive go is played on the largest, 19×19, board.

  5. In Great Britain, checkers is called draughts, pronounced “drafts.” In Canada, where Schaeffer lives, there are Foehn winds called chinooks. Another name for winds is drafts. Get it?

Oliver Roeder was a senior writer for FiveThirtyEight. He holds a Ph.D. in economics from the University of Texas at Austin, where he studied game theory and political competition.
