CMU and Italian artificial intelligence (AI) researchers use correlated equilibrium for a novel algorithm.
In a recent artificial intelligence (AI) machine learning breakthrough in game theory, researchers solve a persistent problem with real-world applicability—for the better. At last month’s 34th Conference on Neural Information Processing Systems (NeurIPS 2020), researchers from Politecnico di Milano and Carnegie Mellon University presented a new AI algorithm that enables a more nuanced approach to solving game theory problems—a solution that has potential real-world impact in economics, industry, policymaking, and science.
Game theory is the science of strategy that was pioneered by Princeton mathematician John Von Neumann (1903-1957) with the 1928 publication of his theory of parlor games titled “Zur Theorie der Gesellschaftsspiele.” It is a math-based approach to modeling behavior.
Mention “game theory” and the concept of Nash equilibrium, one of the most important decision-making theorems in game theory, may come to the minds of economists, mathematicians, scientists, computer programmers, CEOs, policymakers, entrepreneurs, financial analysts, and the tech-savvy. American mathematician John Forbes Nath Jr. (1928-2015) is a recipient of the Abel Prize, the Nobel Memorial Prize in Economic Sciences, and the John von Neumann Theory Prize among many other awards and distinctions.
The Nash equilibrium puts forth the concept that the optimal outcome of a game is when there is no incentive for players to deviate from their initial strategy after considering an opponent’s choice—that an equilibrium exists if given a finite number of players and moves.
In data science, Nash equilibrium assumes fully decentralized interaction among players, therefore it is a distribution on the uncorrelated strategy space.
To illustrate this concept, consider a hypothetical game where two children are presented with a choice between two strategies, “get-cookie” or “lose-cookie.” Choose the strategy of get-cookie and receive a delicious chocolate chip cookie. Choose the strategy lose-cookie and forgo a tasty treat. It logically follows that both children pursue to receive a chocolate chip cookie and select the strategy of get-cookie. Even if either child revealed hers or his strategy to the other—it does not impact the other child’s behavior. In other words, there are not incentives for the children to deviate from their initial strategy of get-cookie—just ask any parent of sweet-toothed children.
Standard game theory assumes homo economicus (economic man), a view that people act as rational, self-interested agents who seek outcomes that maximize utility. This is not a popular viewpoint among sociologists, psychologists, anthropologists, and behavioral economists as there are many factors that may impact decisions such as imperfect information, changes in preference, individual social preferences (fairness, altruism, reciprocity, etc.), and other reasons.
A common game theory that illustrates the Nash equilibrium is the prisoners’ dilemma, where prosecutors lacking evidence to convict, offer two prisoners with imperfect information (each is held in separate solitary confinement with no ability to communicate with the other), the choice to either testify against the other, or remain silent. If neither confess, both will serve a year in prison. If one betrays the other, and the other remains silent, the betrayer is set completely free while the other serves 10 years in prison. If both betray the other, each serves five years in prison. The Nash Equilibrium in the prisoners’ dilemma is for both prisoners to betray the other.
In complex real-world situations where everything is not so arbitrarily black and white, it is not uncommon to find shortfalls to the Nash equilibrium. In these scenarios, the concept of correlated equilibrium (CE) put forth in 1974 by mathematician Robert J. Aumann, recipient of the Sveriges Riksbank Prize in Economic Sciences in Memory of Alfred Nobel 2005, offers a more general and flexible approach.
“A correlated strategy is a general distribution over joint action profiles and it is customarily modeled via a trusted external mediator that draws an action profile from this distribution, and privately recommends to each player her component,” wrote the research team of Andrea Celli, Alberto Marchesi, Gabriele Farina, and Nicola Gatti.
The researchers cite many potential weaknesses in the Nash equilibrium concept where correlated equilibrium may mitigate. Continuing the prior example, imagine that the children’s mom, a trusted external mediator, provides a recommendation to each child privately during the game.
“Most of the work in the multi-agent reinforcement learning community either studies fully competitive settings, where agents play selfishly to reach a Nash equilibrium, or fully cooperative scenarios in which agents have the exact same goals,” the researchers explained in the study. “Our work could enable techniques that are in-between these two extremes: agents have arbitrary objectives, but coordinate their actions towards an equilibrium with some desired properties.”
Extensive-form correlated equilibrium (EFCE) broadens Aumann’s strategic-form correlated equilibrium; recommendations to the players are revealed incrementally to a player when reaching possible new information sets, decision points, at the moment when the move can be acted upon. If the player deviates from the recommended action at any decision point, the player does not get recommendations in the future.
To model extensive-form correlated equilibrium, the researchers created an algorithm called Internal Counterfactual Regret Minimization (ICFR) that minimizes trigger agents regrets through decomposition of the regrets at each information set. In other words, the algorithm is a way to minimize laminar subtree regrets.
The researcher evaluated the convergence of their algorithm using standard benchmark games from four multiplayer games, Kuhn poker, Leduc poker, Goofspiel, and Battleship.
“We show that it is possible to orchestrate the learning procedure so that, for each information set, employing one regret minimizer per round does not compromise the overall convergence of the algorithm,” wrote the researchers.
The researchers from Politecnico di Milano and Carnegie Mellon University were among the winners of the prestigious NeurIPS 2020 Best Paper Awards.
“This paper shows the existence of such regret-minimizing algorithms that converge to CE in a much larger class of games: namely, the extensive-form (or tree-form) games,” wrote the NeurIPS 2020 Program Chairs in the conference blog. “This result solves a long-standing open problem at the interface of game theory, computer science, and economics and can have substantial impact on games that involve a mediator, for example, on efficient traffic routing via navigation apps.”
“We provided some empirical evidences that ICFR computes equilibria which attain a social welfare ‘not too far’ from the optimal one,” reported the researchers. “This could have an arguably positive societal impact when applied to real economic problems.”
Copyright © 2021 Cami Rosso. All rights reserved.
A version of this article first appeared on Psychology Today.