Once again, a leading Artificial Intelligence (AI) developer has pitted its machine intelligence against a team of the best human competitors in a public tournament. The difference this time was that the game required more than memory, learning, and logic. Winning required intuition and knowing when and how to bluff an opponent. In January 2017, Carnegie Mellon University’s (CMU) AI program was at the tables in the Rivers Casino in Pittsburgh, Pa., to face off against four of the world’s best professional poker players. The 20-day match began on January 11, and over those three weeks they played 120,000 hands of Heads-up, No-limit Texas Hold ’em, a particularly difficult version of poker.
Billed as “Brains vs. Artificial Intelligence: Upping the Ante,” the contest was actually a rematch. In the first Brains vs. AI match, in 2015, CMU’s bot, then called Claudico, had lost, although the academics characterized the humans’ winning margin as statistically insignificant.
The developers, Professor Tuomas Sandholm and his Ph.D. student Noam Brown, went back home, refined their program, and returned to play two of the original poker professionals plus two others: Jason Les, Dong Kim, Jimmy Chou, and Daniel McAulay, all among the best poker players in the world. CMU’s program was renamed Libratus (Latin for balanced).
In the first few days of the rematch, one of the players commented on how the bot they were playing seemed to be different. It was. Nine days in, Libratus had built a substantial lead of $459,154 in chips, and Jimmy Chou offered the following observation: “The most surprising thing is its ability to adjust, its ability to learn every day and get better. It’s been taxing on us to try to find weaknesses, especially after it’s been able to adjust to us. It’s like a tougher version of us.”
After 20 grueling days that ran from 11 a.m. to 8 p.m., Libratus had won, amassing a total of $1,766,250 in chips. Professor Sandholm noted for the press: “This is a landmark step for AI. This is the first time that AI has been able to beat the best humans at Heads-up, No-limit Texas Hold ’em. More generally, this shows that the best AI’s ability to do strategic reasoning under imperfect information has surpassed that of the best humans.”
This isn’t the first time an AI program, backed by intensive computing power, has defeated humans at one of their more sophisticated games. Three contests come to mind immediately: the chess and Go tournaments and the humbling of the two best Jeopardy players.
1997 – IBM’s Deep Blue defeats chess grandmaster Garry Kasparov.
2011 – IBM’s Watson defeats Jeopardy all-time champions Ken Jennings and Brad Rutter.
2016 – Google’s AlphaGo defeats South Korean Go champion Lee Sedol.
But poker is a different kind of game, and that difference is precisely what has attracted AI researchers. Before the match, Sandholm noted, “If we are able to show that the best AI has surpassed the quality of the best humans in strategical thinking under imperfect information that would have tremendous implications.”
With chess you have prescribed moves, prescribed values for the pieces, established patterns of defense and offense, and the task of thinking ahead, which comes down to searching a tree of possible moves, something Deep Blue is very good at. Go is similar, but demands even deeper searching.
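Deep Blue’s style of look-ahead can be sketched with the classic minimax procedure. The toy game tree and leaf scores below are invented for illustration and have nothing to do with Deep Blue’s actual evaluation function:

```python
def minimax(node, maximizing):
    """Return the best achievable score from `node`, assuming optimal play.

    A node is either a leaf score (int) or a list of child nodes. The
    maximizing player picks the highest-valued branch; the minimizing
    opponent picks the lowest.
    """
    if isinstance(node, int):          # leaf: a position's evaluation score
        return node
    values = [minimax(child, not maximizing) for child in node]
    return max(values) if maximizing else min(values)

# A tiny two-ply tree: the maximizer chooses a branch, then the
# minimizer replies; the leaves are position scores.
tree = [[3, 12], [2, 4], [14, 1]]
print(minimax(tree, True))   # prints 3: the branch whose worst case is best
```

Deep Blue layered pruning and a hand-tuned evaluation function over billions of positions per second, but the principle is this same recursive best-worst-case reasoning.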
With poker, there’s not only the element of chance and hidden information (you don’t all get the same “pieces” to play), but, more importantly, there’s also the critical strategy of bluffing, not just in the current hand but over the course of many hands. Deciding on your bluffs requires intelligent guesses (intuiting) about your opponent’s response. One of the poker players, Chou, commented on the opposition: “I think it’s a really good step for AI in general if it beats us. This is an incredibly difficult game to solve.”
According to the developers Sandholm and Brown, Libratus essentially built itself from just the rules of the game. Brown explains, “We [gave] the AI a description of the game. We don’t tell it how to play. It develops a strategy completely independently from human play, and it can be very different from the way humans play the game.”
Libratus wasn’t given a catalog of actual past games to analyze, as AlphaGo was when Google’s DeepMind developers were tutoring that program. It simply played trillions of games against itself, using an AI technique called reinforcement learning, and developed its own unique, machine-style play.
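The self-play idea can be sketched with regret matching, the simplest member of the counterfactual-regret family of algorithms that underlies modern poker bots. As a stand-in for poker, the sketch below uses rock-paper-scissors; the payoff table, seed regret, and iteration count are illustrative assumptions, not details of Libratus:

```python
# payoff[a][b]: payoff to a player choosing action a against action b
# (actions 0, 1, 2 = rock, paper, scissors)
PAYOFF = [[0, -1, 1],
          [1, 0, -1],
          [-1, 1, 0]]

def strategy_from(regrets):
    """Play each action in proportion to its positive accumulated regret."""
    pos = [max(r, 0.0) for r in regrets]
    total = sum(pos)
    return [p / total for p in pos] if total > 0 else [1 / 3] * 3

def self_play(iterations=20000):
    """Learn a strategy purely by playing against oneself, given only the rules."""
    regrets = [1.0, 0.0, 0.0]           # small asymmetric seed to kick off learning
    strategy_sum = [0.0, 0.0, 0.0]
    for _ in range(iterations):
        strat = strategy_from(regrets)
        strategy_sum = [s + p for s, p in zip(strategy_sum, strat)]
        # expected payoff of each pure action against our own current strategy
        action_values = [sum(p * PAYOFF[a][b] for b, p in enumerate(strat))
                         for a in range(3)]
        expected = sum(p * v for p, v in zip(strat, action_values))
        # regret: how much better each action would have done than our mix
        regrets = [r + v - expected for r, v in zip(regrets, action_values)]
    total = sum(strategy_sum)
    return [s / total for s in strategy_sum]    # time-averaged strategy

print(self_play())   # approaches (1/3, 1/3, 1/3), the unexploitable mix
```

Nobody tells the program that the balanced answer is to randomize equally; that emerges from self-play alone, which is the sense in which Libratus built itself from just the rules.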
During the day, the human players were alert to patterns in the machine’s play, because that’s how they play other humans. What they weren’t aware of was that each evening Brown ran a second program for Libratus, an algorithm that searched for those very same patterns and then removed them. A vulnerability the human players might have exploited thus became a bewildering advantage for the machine. Andrew Ng, one of the founders of Google’s AI lab, explained to Cade Metz of wired.com, “Poker has been one of the hardest games for AI to crack, because you see only partial information about the game state. There is no single optimal move. Instead, an AI player has to randomize its actions so as to make opponents uncertain when it is bluffing.”
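Ng’s point about randomization can be made concrete with a toy mixed strategy. The 0.7 strength threshold and the 30 percent bluffing frequency below are arbitrary, hypothetical numbers for illustration, not values reported about Libratus:

```python
import random

rng = random.Random(0)   # seeded so the sketch is reproducible

def act(hand_strength, bluff_prob=0.3):
    """Always bet strong hands; bet weak hands (a bluff) at a fixed frequency."""
    if hand_strength >= 0.7:            # strong hand: bet for value
        return "bet"
    return "bet" if rng.random() < bluff_prob else "check"

# Over many weak hands the agent bets roughly 30% of the time, so any
# single bet carries no reliable signal about hand strength.
weak_actions = [act(0.2) for _ in range(10000)]
print(weak_actions.count("bet") / len(weak_actions))
```

Because the action is drawn from a probability distribution rather than a fixed rule, an opponent who logs the bot’s bets cannot separate value bets from bluffs, which is exactly the uncertainty Ng describes.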
This new skill set, combining intuition and bluffing, will no doubt be covered in the final reports that Sandholm and Brown are working on now, and the techniques behind it have already entered the lexicon of machine intelligence. A number of commentators have conjectured that other programs might use the Libratus algorithms to become better negotiators, financial investors, and cyber mediators.