Large Scale Machine Learning and Other Animals: GraphLab plays poker?

This week I am visiting CMU. I always enjoy hearing about new projects and see if I can help them utilize GraphLab for speeding up the solution, and allowing solutions for larger problems than where previously possible.

One of the interesting projects I heard about is from Xi Chen from the machine learning dept. and Qihang Lin from Tepper school of business, who are working on finding the best strategies for winning in poker games. And here is how Qihang describes the problem:

"Strictly speaking, in a poker game problem, we try to solve the Nash equilibrium which is the best strategy for each player. This means we are guiding people to win the poker game. According to the definition of a Nash equilibrium, if any one deviate from the equilibrium stategy, he will definitely lose money. However, poker game is a zero-sum game, which means the total money of all players are fixed. Therefore, if one player loss money by not playing equilibirum strategy, the other player will win the money from him. So, in order to beat the opponent in a poker game, we just need to compute the Nash equilibrium and play according to it.

There are many poker games with different rules. We are only interested in the family of poker games which consists of rounds of new cards dealing and betting, for example, Texas Hold'em Poker. This kind of games is also called sequential game with ordered signals in game theory. Because the size of the problem, no one can compute the exact Nash equilibrium for Texas HOld'em. The best strategy is found by Andrew Gilpin who was a PhD student in CMU advised by Prof.Tuomas Sandholm.
Here are the papers: http://www.cs.cmu.edu/~sandholm/gs3.aaai07.pdf
http://www.cs.cmu.edu/~gilpin/papers/iterated.aaai08.pdf

The first paper talks about how to reduce the size the game by shrink the states. The second paper talks about a gradient-based algorithm for solve the Nash equilibrium.

Gilpin and Sandholm also implement the Texas Hold'em player software. It is called GS3.

You can also play by yourself with their software in here

The main idea of their papers is to first formulate this problem as a saddle point problem, minmax x^TAy, and apply the algorithm by Nesterov (2005), ''Excessive gap technique in nonsmooth convex minimization''. The advantage of this algorithm is that, it only uses matrix vector multiplication, therefore it can handle the problem with a huge size. The disadvantage is that the algorithm need O(1/e) number of iterations to find a solution which is e-optimal. This is a slow convergence rate such that the error e of the solution found by Gilpin is usually very large. Therefore, the solution they implmemnt in GS3 is not the perfect strategy. There is a hope we can win their money by using the perfect Nash equilibrium strategy.

We try to use interior point method to solve the saddle point problem, minmax x^TAy, becasue to find a e-optimal solution, interior point method needs only O(log(1/e)) number of iterations. This is a much faster convergence rate, which can guarantee us a more accurate solution than GS3. However, the challeging is how to make interior point method scalable to the poker game which has a large size. We hope you linear system solve can save us.

The size I shown you is the size of a poker game called Rhode Island Poker. No body play it in practice. But people in academic invent this game because it is similar to Texas Hold'em but has a smaller size. They can test and compare their strategies on this smaller game.

Let me explain the structure and the size of the matrix A. A is called payoff matrix. The game happens as follows. A private card is dealt to each player, and they start betting money. Betting of money is a smaller game which has 9 possible outcomes. For example, player one bet 5 dollar. player two raise the bet to 10 dollar, and then player one follows the raise by adding 5 more dollars to his own bet. Then the betting round finishes. This is just one of the 9 possible procedures of the betting round. After the betting round, a public card is dealt in the middle of the table and it is shared by the two player. Following, the same betting round repeats, which has 9 possible outcomes also. In the end of a Rhode Island poker, each player should have one private card and there are two public cards in the table and the players bet three time totally.

Each of column of A corresponds to a scenario path observed by player one.
Each of row of A corresponds to a scenario path observed by player two. So we just LABEL the columns and rows by the scenario path. Apparently, the scenario observed by the two players are different because of private card. For exmaple, player one get a A and player two get a K as private cards. Then they start the betting round and finish the betting round by outcome 1 for instance. Then the public card is dealt which is Q. And then they bet and finish betting by outcome 2. Then the last public card is dealt which is J and they bet and finish betting round by outcome 3. In this example, the scenario path for first player is A1Q2J3 and the one for player two is K1Q2J3. This game is finished and the first player get the money equal to the number in the A1Q2J3 column and K1Q2J3 row of matrix A. Therefore, you can see the size of the problem is exactly the number of possible scenario of the game observe by each player. A will be a matrix with 52*9*51*9*50*9 rows and columns.

Now, let's see the sparsity structure of matrix A. The senorita path observed by the two players can only be different in the first lable, which is the private card. See example, A1Q2J3 and K1Q2J3. Therefore, the matrix A must be block diagonal. For example, the element of A in column A1Q2J3 and row K1K2J3 must be zero, because the second public must be the same for both player. A1Q2J3 and K1K2J3 can not be the scenario observed by the two players at the same time. Therefore, there should be 9*52*9*51*9 blocks. Inside each of the block, the column is labeled by &***** and row is labeled by $***** with ***** is the shared segment of the label, but & and $ are different. & and $ could have 52 possible choice. Therefore, each block should have a size of 52 by 52. Each block has a size of 52*9 by 52*9 because I re-sort the order of columns and rows of A in order to make it compatible with the block structure of the constraint matrix E and F. After the resorting, we group some of the blocks, such that we have 52*9*51*9 blocks but the size of each block is increased to 52*9.

Overall, it is going to be very exciting to try and solve this problem in GraphLab.
Using back of the envelope calculation it seems that each Newton step has a matrix of 25,000,000,000 non-zeros..
I will post more once we make more progress.

Large Scale Machine Learning and Other Animals

Tuesday, November 8, 2011

GraphLab plays poker?

No comments:

Post a Comment

Labels

GraphLab Users Google Group

pagerank

google analytics

syntax