Reversi, Two Player Board Game
This was a semester-long project for my Artificial Intelligence class, which I took in the Spring semester of 2015. The project required everyone in the class to choose a game of "perfect" information and implement it in a programming language of their choice. I chose Reversi (marketed as Othello by Mattel) and wrote all the components in Java.
My first objective for this project was creating a playable version of the game, with all of its rules, for two human players. I ran into many challenges along the way, finding that many of the basic rules of the game were deceptively difficult to implement, specifically, validating a move.
This is a very basic example of a valid move, particularly because it's the first move of the game. As the game becomes more complex and moves begin to influence the board in more than one direction, verifying the validity of moves becomes more difficult.
In general, the algorithm checks all eight directions around the placed piece to determine whether there is at least one run of the opponent's pieces followed by one of the current player's own pieces. A single piece can flank pieces in multiple directions at once, and such moves can completely change the outcome of the game.
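The eight-direction check can be sketched like so, assuming an 8x8 `char` board where 'X' and 'O' are the two players' pieces and '.' is an empty square (the representation and names here are illustrative, not the project's actual code):

```java
public class MoveValidator {
    // The eight compass directions around a square.
    static final int[][] DIRECTIONS = {
        {-1,-1}, {-1,0}, {-1,1}, {0,-1}, {0,1}, {1,-1}, {1,0}, {1,1}
    };

    // A move is valid if the square is empty and, in at least one direction,
    // a run of one or more opponent pieces is terminated by one of the
    // mover's own pieces.
    public static boolean isValidMove(char[][] board, int row, int col, char player) {
        if (board[row][col] != '.') return false;
        char opponent = (player == 'X') ? 'O' : 'X';
        for (int[] d : DIRECTIONS) {
            int r = row + d[0], c = col + d[1];
            boolean sawOpponent = false;
            while (r >= 0 && r < 8 && c >= 0 && c < 8 && board[r][c] == opponent) {
                sawOpponent = true;
                r += d[0];
                c += d[1];
            }
            // The run of opponent pieces must end on the mover's own piece.
            if (sawOpponent && r >= 0 && r < 8 && c >= 0 && c < 8 && board[r][c] == player) {
                return true;
            }
        }
        return false;
    }
}
```

On the standard opening position, this correctly admits the four first moves for the dark player and rejects everything else.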
For instance, this move occurs in the GIF of the game at the top of the page, between two AI opponents. The "0" player captures a corner (often considered the best position in Reversi), causing a huge point swing and effectively winning the game for "0" in that moment. There is nothing the "X" player can do to get back into the game.
Two Kinds of Artificial Intelligence
The core of this project was learning how to write two AIs that approach the game very differently: one that searches and one that learns. The first of the two that I worked on was the searching AI; it was recommended to start there, since it would either surface any bugs present in the code or reveal rules the game was missing.
Minimax and Alpha-Beta
The algorithm that we used for our searching Artificial Intelligence was Minimax enhanced with Alpha-beta pruning. As I began working on the algorithm, I noticed that when Alpha-beta played, the game would run indefinitely. After analyzing Alpha-beta's moves, I discovered that there are other conditions that can end the game besides the board being completely filled. If a player has no valid moves (i.e. no possible pieces to flip), that player must pass. If both players must pass, i.e. there are no possible moves for either player, then the game ends. After a slew of other bug fixes, Alpha-beta was ready to play against and made quick work of me!
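That end-game condition can be sketched as follows, again on an illustrative 8x8 'X'/'O' `char` board rather than the project's actual classes:

```java
public class EndGameCheck {
    static final int[][] DIRS = {
        {-1,-1}, {-1,0}, {-1,1}, {0,-1}, {0,1}, {1,-1}, {1,0}, {1,1}
    };

    static boolean inBounds(int r, int c) {
        return r >= 0 && r < 8 && c >= 0 && c < 8;
    }

    // Same flanking rule as move validation: an empty square that traps a
    // run of opponent pieces against one of the player's own pieces.
    static boolean isValid(char[][] b, int row, int col, char p) {
        if (b[row][col] != '.') return false;
        char opp = (p == 'X') ? 'O' : 'X';
        for (int[] d : DIRS) {
            int r = row + d[0], c = col + d[1];
            boolean saw = false;
            while (inBounds(r, c) && b[r][c] == opp) { saw = true; r += d[0]; c += d[1]; }
            if (saw && inBounds(r, c) && b[r][c] == p) return true;
        }
        return false;
    }

    static boolean hasAnyMove(char[][] b, char p) {
        for (int r = 0; r < 8; r++)
            for (int c = 0; c < 8; c++)
                if (isValid(b, r, c, p)) return true;
        return false;
    }

    // A player with no valid move must pass; if neither player has a valid
    // move, the game ends even though the board may not be full.
    static boolean isGameOver(char[][] b) {
        return !hasAnyMove(b, 'X') && !hasAnyMove(b, 'O');
    }
}
```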
For Alpha-beta to understand the state of the game, a value function is needed; it signifies which moves are more valuable than others. In simple terms: play the move that nets the most points. Below is the simple value function I used for Alpha-beta. The Maximizing player wants their score to be highly positive (a large number minus a small number), and the Minimizing player wants the opposite. This follows the logic of Reversi closely: every time a player captures pieces, their score increases and the opponent's decreases.
Reward = Current Score of Max Player - Current Score of Min Player
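As a sketch, this value function amounts to a simple piece count over the board (the 'X'/'O' `char` representation is illustrative, not the original code):

```java
public class Evaluation {
    // Reward = Max player's piece count minus Min player's piece count.
    // A positive value favors the Maximizing player, a negative value
    // favors the Minimizing player.
    static int reward(char[][] board, char maxPlayer, char minPlayer) {
        int maxScore = 0, minScore = 0;
        for (char[] row : board)
            for (char square : row) {
                if (square == maxPlayer) maxScore++;
                else if (square == minPlayer) minScore++;
            }
        return maxScore - minScore;
    }
}
```

Minimax with Alpha-beta then simply maximizes (or minimizes) this reward at the leaves of its search tree.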
I learned more than I ever imagined from Alpha-beta; it showed me rules and conditions that I didn't know existed, as well as valuable positions and strategies. Given that, I approached creating my learning AI with confidence, feeling that my new knowledge of the game would be immensely helpful. My goal at the time was to create a learning AI that could beat Alpha-beta using Alpha-beta's strengths, which had been revealed to me.
My algorithm of choice ended up being a modified version of both State Action Reward State Action (SARSA) and Q-Learning, two very similar reinforcement learning algorithms that learn by generating a set of weights from features that describe the board.
These features have no intrinsic value beyond what they describe about the board. If the condition for a feature is met, it is considered "true" or "1": if a feature describes a corner position on the board and a piece is there, then that feature is true. Using the SARSA/Q-Learning update equation, the reinforcement learner generates weights for the features currently active on the board, and the weighted sum of the active features serves as the Q value representing the state.
The features I used were (29 per player):
- 4 for the corners of the Reversi Board (1 feature per corner)
- 24 from the edges of the Reversi Board (1 per non-corner position on each edge; 6 such positions per edge, 4 edges)
- 1 for the score, this is similar to the value function for Alpha-beta (it's considered active if the current player's score is higher than their opponent's)
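A sketch of that feature extraction and the resulting Q value, assuming the same illustrative 'X'/'O' `char` board; the exact ordering of the 29 features here is my choice for the example, not necessarily the original layout:

```java
public class LinearQ {
    static final int[] CORNER_ROWS = {0, 0, 7, 7};
    static final int[] CORNER_COLS = {0, 7, 0, 7};

    // Builds the 29-element feature vector for one player:
    // 4 corners + 24 non-corner edge squares + 1 score feature.
    static double[] features(char[][] b, char player, char opponent) {
        double[] f = new double[29];
        int i = 0;
        // 4 corner features, active when the player occupies that corner.
        for (int k = 0; k < 4; k++)
            f[i++] = (b[CORNER_ROWS[k]][CORNER_COLS[k]] == player) ? 1 : 0;
        // 24 edge features: the 6 non-corner squares on each of the 4 edges.
        for (int c = 1; c <= 6; c++) f[i++] = (b[0][c] == player) ? 1 : 0; // top
        for (int c = 1; c <= 6; c++) f[i++] = (b[7][c] == player) ? 1 : 0; // bottom
        for (int r = 1; r <= 6; r++) f[i++] = (b[r][0] == player) ? 1 : 0; // left
        for (int r = 1; r <= 6; r++) f[i++] = (b[r][7] == player) ? 1 : 0; // right
        // 1 score feature, active when the player leads on piece count.
        int mine = 0, theirs = 0;
        for (char[] row : b)
            for (char sq : row) {
                if (sq == player) mine++;
                else if (sq == opponent) theirs++;
            }
        f[i] = (mine > theirs) ? 1 : 0;
        return f;
    }

    // Q(s) is the dot product of the learned weights with the feature vector.
    static double qValue(double[] weights, double[] feats) {
        double q = 0;
        for (int i = 0; i < weights.length; i++) q += weights[i] * feats[i];
        return q;
    }
}
```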
During the learning process, I initially had the Reinforcement Learner play itself for 100,000 games, which surprisingly took only 25 minutes. After those 100,000 games I noticed that the weights were growing too large, so I lowered α, the learning rate, to 0.0001, which slowed the growth of the weights.
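The weight update in this SARSA/Q-Learning style can be sketched as a generic linear temporal-difference step (the variable names and the discount factor gamma here are assumptions of the sketch, not the original code). With a small learning rate such as α = 0.0001, each update only nudges the weights of features that were active:

```java
public class WeightUpdate {
    // One temporal-difference update over the linear weights.
    // tdError compares what we predicted (qCurrent) against what we
    // observed (reward plus the discounted value of the next state).
    static void update(double[] w, double[] feats,
                       double reward, double qNext, double qCurrent,
                       double alpha, double gamma) {
        double tdError = reward + gamma * qNext - qCurrent;
        for (int i = 0; i < w.length; i++)
            w[i] += alpha * tdError * feats[i]; // inactive features (0) don't move
    }
}
```

Lowering alpha shrinks each step, which is why the weights stopped ballooning once it was reduced.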
With those weights, my Reinforcement Learner can beat Alpha-beta, and it can beat me. Although I am not very good, I have tied it a couple of times... It sometimes plays poorly and makes seemingly bad moves, but then somehow makes a comeback.
Surprise! The winner in the GIF is the Reinforcement Learner!
In the end, corners and edges became naturally valuable to both AIs. The AIs demonstrated a great deal more about the game than I could have ever imagined.
There is certainly more to be done with such a multifaceted project, including more training for the Reinforcement Learner, creating a GUI, and even recreating this project in a new programming language.