Thursday, January 21, 2016

Did I mention that enumeration is a slow process?

With the level of success I had with enumerating all possible games, I thought it would be easy to enumerate all possible game states, even if I did it in a less than efficient manner. Three days later, and I don't think that I'm even halfway done. The basic scheme was to take each game and deconstruct it into a series of game states. Each game state would be checked against a list of previously attained game states, and added to the list if it wasn't found. Since the process of creating those states only took about an hour, I figured that it would take maybe 8-9 hours to generate the game states. After all, each game can yield at most 9 states.

So why is it taking so long? For one thing, I forgot to account for the time spent reading through the list of attained game states. Every candidate state means opening the file, scanning it line by line, and closing it again. Right now, I'm only up to around 3000 game states, so I've clearly underestimated that cost. I'm willing to bet that reading off of the SD card isn't particularly efficient, so all of that opening and closing of the file and reading the data over and over again is just chewing up all sorts of time.
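If I ever do rewrite it, the fix is probably to keep the already-seen states in memory instead of re-scanning the file for every candidate. As a rough sketch (hypothetical code, not what's running now, and using plain move lists rather than the file format above), a Python set of tuples makes each membership check effectively instant:

```python
# Hypothetical sketch: deduplicate game states in memory with a set
# instead of re-reading TTTGameStates.txt for every candidate state.
def play_prefix(moves, depth):
    """Board after the first `depth` moves (1 = X, 2 = O)."""
    board = [0] * 9
    for turn, pos in enumerate(moves[:depth], start=1):
        board[pos] = 1 if turn % 2 == 1 else 2
    return board

def collect_states(games):
    seen = set()      # tuples are hashable, so membership is O(1) on average
    ordered = []
    for moves in games:
        for depth in range(1, len(moves) + 1):
            state = tuple(play_prefix(moves, depth))
            if state not in seen:
                seen.add(state)
                ordered.append(state)
    return ordered

# Example: two games that share the same opening move
states = collect_states([[0, 1, 3], [0, 2]])
```

The output file would then only need to be written once at the end, instead of being reopened for every new state.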

Nonetheless, I've included the code for completeness. I don't really have time to write a new program tonight or the next several nights, so I'm just going to let it continue to run for the next few days and see if it reaches the end. If I were to create a new program, I would do the enumeration differently. I think I would go back to the original enumeration program. Every time a move is added, it would have to be a new position. But if I remove a move, I would be returning to a previously created position. I think I've even got that type of language written into the program. It's simple enough that I could probably do it tonight if I really wanted to. But I don't.

Addendum - After thinking about it for a moment longer, I realized that the above won't quite work. There will be lots of repetitions. I'll have to do a more careful enumeration, perhaps ordered by the number of moves. But then I have to worry about positions that are impossible to reach. Or do I? I'm not sure, but I think that impossible-to-reach positions really just mean positions where both X and O have a 3 in a row. It's possible for X or O to get a double-win with a single move, but there will never be a situation where both win: turns alternate between X and O, and whichever player completes a 3 in a row first ends the game before the other player can finish one.
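To check that hunch, here's a small sketch (separate from the programs in these posts) that brute-forces all 3^9 candidate boards and keeps only the reachable ones. Besides ruling out boards where both X and O have a 3 in a row, it also has to enforce the mark counts that alternating turns imply:

```python
from itertools import product

LINES = [(0,1,2), (3,4,5), (6,7,8), (0,3,6), (1,4,7), (2,5,8), (0,4,8), (2,4,6)]

def wins(board, player):
    return any(all(board[i] == player for i in line) for line in LINES)

def reachable(board):
    x, o = board.count(1), board.count(2)
    if o > x or x > o + 1:        # players alternate, X goes first
        return False
    xw, ow = wins(board, 1), wins(board, 2)
    if xw and ow:                 # both players winning is impossible
        return False
    if xw and x != o + 1:         # X's win ends the game on X's move
        return False
    if ow and x != o:             # O's win ends the game on O's move
        return False
    return True

valid = [b for b in product((0, 1, 2), repeat=9) if reachable(b)]
print(len(valid))   # 5478 of the 3**9 = 19683 candidates survive
```

So the reachable states are a small fraction of the naive 19683, which bodes well for indexing them directly.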

## Tic-Tac-Toe Game State Generator
##
## This program takes all possible games of Tic-Tac-Toe and generates a list
## of all possible game states. This will be the foundation of the game tree
## that will be used to analyze the game for machine learning.
##
## The program will take in the ordered games created by the TTT_Enumerator
## program and use this to generate mid-game states. This will be a
## non-redundant system, so that as new states are generated they are checked
## against the existing states. This means that each game state will be listed
## exactly once.

##### Generate game board given moves list
def playGame(moves):
    game = [0]*9
    for turn in range(1,10):
        if moves[turn] > -1:
            if turn % 2 == 0:
                player = 2
            else:
                player = 1
            game[moves[turn]] = player
    return game

##### Check if the game already exists
def alreadyExists(thisGame, debug):
    checkFile = open("TTTGameStates.txt", "r")
    for line in checkFile:
        gameAsStr = line.split()
        game = [0]*9
        for i in range(0,9):
            game[i] = int(gameAsStr[i])
        if debug == 1:
            print "Checking game match"
            print "game = {}".format(game)
            print "thisGame = {}".format(thisGame)

        ## Lists compare element-by-element, so this replaces
        ## nine separate position checks
        if game == thisGame:
            if debug == 1:
                print "Match!\n\n"
            checkFile.close()
            return 1
        if debug == 1:
            print "Not a match"
    checkFile.close()
    if debug == 1:
        print "No matches at all"
    return 0

##### Check for win
def checkWin(game):
    winner = 0
    # X win?
    if game[0]*game[1]*game[2] == 1 \
      or game[3]*game[4]*game[5] == 1 \
      or game[6]*game[7]*game[8] == 1 \
      or game[0]*game[3]*game[6] == 1 \
      or game[1]*game[4]*game[7] == 1 \
      or game[2]*game[5]*game[8] == 1 \
      or game[0]*game[4]*game[8] == 1 \
      or game[2]*game[4]*game[6] == 1:
        winner = 1
    elif game[0]*game[1]*game[2] == 8 \
      or game[3]*game[4]*game[5] == 8 \
      or game[6]*game[7]*game[8] == 8 \
      or game[0]*game[3]*game[6] == 8 \
      or game[1]*game[4]*game[7] == 8 \
      or game[2]*game[5]*game[8] == 8 \
      or game[0]*game[4]*game[8] == 8 \
      or game[2]*game[4]*game[6] == 8:
        winner = 2
    elif game[0]*game[1]*game[2]*game[3]*game[4]*game[5]*game[6]*game[7]*game[8] > 0:
        winner = -1
    return winner

##### Main Program
debug = 0
inputFile = open("TTTGames.txt", "r")

checkFile = open("TTTGameStates.txt", "w")
checkFile.write("0 0 0 0 0 0 0 0 0 0\n")
checkFile.close()

for line in inputFile:
    ## Get move data
    movesAsStr = line.split()
    moves = [0, -1, -1, -1, -1, -1, -1, -1, -1, -1]
    for i in range(1,10):
        moves[i] = int(movesAsStr[i])

    ## Check each step of the game to see if it already exists in the list
    stepMoves = [0, -1, -1, -1, -1, -1, -1, -1, -1, -1]
    for depth in range(1,10):
        if moves[depth] > -1:
            stepMoves[depth] = moves[depth]
            thisGame = playGame(stepMoves)
            if debug == 1:
                print "stepMoves = {}".format(stepMoves)
                print "thisGame = {}\n\n".format(thisGame)

            ## If the game does not exist, add it to the list
            if alreadyExists(thisGame, debug) == 0:
                thisGame.append(checkWin(thisGame))
                if debug == 1:
                    print "Adding thisGame"
                print "thisGame = {}".format(thisGame)
                if debug == 1:
                    print "\n"
                addToFile = open("TTTGameStates.txt", "a")
                for item in thisGame:
                    addToFile.write(str(item))
                    addToFile.write(" ")
                addToFile.write("\n")
                addToFile.close()

Monday, January 18, 2016

Enumeration is a slow process

After some contemplation, I figured out how to do a complete enumeration of all games. At least, it seems to cycle through them in a reasonable manner. I didn't really think about it until I started running the program, but there are a lot of potential TTT games. The upper limit on the number of games is 9! = 362880. Some of those strings of moves would never happen because the game would have been won before that point, so the number is probably smaller. I'm not sure how much smaller, but I'll find out when the enumeration is completed. Fortunately, this number isn't so large that it's unreasonable to do it. I think if I were to do it again, I'd have to do a bit more with checking the game states rather than looking at the moves. There are fewer than 3^9 = 19683 potential game states to enumerate (9 positions that are either blank, X, or O, and many of those aren't valid). But I'm already down this path far enough that I'll keep going with it and then try the other technique later.

I had set the program to run overnight, but something happened because the program was not running in the morning, and the data appeared to have stopped somewhere. I think I have an idea of what happened, and it's related to shifting the first move from position 0 to position 1. I made some edits and I think I solved the problem. I won't know until it finishes, which will probably be in another couple hours.

Here's the code, with comments at the top:


Addendum: I ran the simulator again, but this time through a PuTTY connection, and it went significantly faster. I think it was done in less than an hour. I didn't time it, but it might have been as little as 30 minutes. I also needed to make a couple of other minor adjustments to the code to handle the program exit properly. The final output has 255168 lines, which corresponds to the number of TTT games according to this website. This gives me a great deal of confidence that everything functioned correctly.
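Out of curiosity, there's an independent way to arrive at the same number: a short recursive search (a sketch separate from the program below) that counts every move sequence which stops as soon as someone wins or the board fills up.

```python
# A compact recursive check of the 255168 figure: count every move
# sequence that ends immediately upon a win or a full board.
LINES = [(0,1,2), (3,4,5), (6,7,8), (0,3,6), (1,4,7), (2,5,8), (0,4,8), (2,4,6)]

def winner(board):
    for a, b, c in LINES:
        if board[a] != 0 and board[a] == board[b] == board[c]:
            return board[a]
    return 0

def count_games(board, turn):
    if winner(board) != 0 or 0 not in board:
        return 1                      # game over: someone won, or a draw
    total = 0
    for pos in range(9):
        if board[pos] == 0:
            board[pos] = turn         # play here...
            total += count_games(board, 3 - turn)
            board[pos] = 0            # ...then backtrack
    return total

print(count_games([0] * 9, 1))        # 255168
```

Getting 255168 both ways is a nice sanity check on the enumerator.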

## Tic-Tac-Toe Game enumerator
##
## This program systematically cycles through all possible games of TTT
## and exports those games to a file that can be used to generate a set
## of games for the machine learning project.
##
## Each game consists of 10 numbers. The first number is -1, 1, or 2 and
## indicates the winner:
##    * -1 = Draw
##    * 1 = X wins
##    * 2 = O wins
## The remaining 9 numbers indicate the moves that are made and the order
## in which they were made. Unused moves (if someone wins before the 9th move)
## will be indicated by a -1.
##
## The game positions are as follows:
##
##  0 | 1 | 2
## ---+---+---
##  3 | 4 | 5
## ---+---+---
##  6 | 7 | 8
##
## In addition to the games, there is also a game state array which keeps track
## of how the board actually looks. This is a 9 element list consisting of 0s, 1s,
## or 2s, depending on what positions have been used.
##
## On each available move, the current player will play in the lowest available
## position as long as that move has not already been played. After each move, the
## game will be checked for a winner. If there is no winner and there are still
## available moves, the play will continue. Otherwise, the game will be output to
## a file.
##
## After a game is output, the last play will be removed and the search for new
## moves will continue with the next position higher than the one that was removed.
## If there are no plays that meet this constraint, then the next move is removed
## and the process is repeated.

##### Check for win
def checkWin(game, debug):
    winner = 0
    # X win?
    if game[0]*game[1]*game[2] == 1 \
      or game[3]*game[4]*game[5] == 1 \
      or game[6]*game[7]*game[8] == 1 \
      or game[0]*game[3]*game[6] == 1 \
      or game[1]*game[4]*game[7] == 1 \
      or game[2]*game[5]*game[8] == 1 \
      or game[0]*game[4]*game[8] == 1 \
      or game[2]*game[4]*game[6] == 1:
        winner = 1
        if debug == 1:
            print "X wins!"
    elif game[0]*game[1]*game[2] == 8 \
      or game[3]*game[4]*game[5] == 8 \
      or game[6]*game[7]*game[8] == 8 \
      or game[0]*game[3]*game[6] == 8 \
      or game[1]*game[4]*game[7] == 8 \
      or game[2]*game[5]*game[8] == 8 \
      or game[0]*game[4]*game[8] == 8 \
      or game[2]*game[4]*game[6] == 8:
        winner = 2
        if debug == 1:
            print "O wins!"
    elif game[0]*game[1]*game[2]*game[3]*game[4]*game[5]*game[6]*game[7]*game[8] > 0:
        winner = -1
        if debug == 1:
            print "Draw!"
    else:
        if debug == 1:
            print "No winner yet."
    return winner

##### Display Game State
def dispGame(game):
    gameChar = []
    for i in range(0,9):
        if game[i] == 0:
            gameChar.append(" ")
        elif game[i] == 1:
            gameChar.append("X")
        elif game[i] == 2:
            gameChar.append("O")
        else:
            gameChar.append("Something bad happened!")
    print
    print " {} | {} | {}".format(gameChar[0], gameChar[1], gameChar[2])
    print "---+---+---"
    print " {} | {} | {}".format(gameChar[3], gameChar[4], gameChar[5])
    print "---+---+---"
    print " {} | {} | {}".format(gameChar[6], gameChar[7], gameChar[8])
    print

##### Main Program
game = [0]*9
move = [-1]*9
numMoves = 0
winner = 0
testMove = 0
turn = 1
removed = -1
debug = 0

while numMoves > -1 or removed > -1:
    ## Determine whether the last loop was additive or subtractive
    ##     * removed == -1 means additive (or starting)
    ##     * removed > -1 means subtractive
    ## Establish starting point for the search for the next move
    if debug == 1:
        print "Starting main loop\n"
        print "numMoves = {}".format(numMoves)
        print "removed = {}".format(removed)
        print "move = {}".format(move)
        dispGame(game)

    if removed == -1:
        ## Start checking for valid moves from 0
        lowStart = 0
    else:
        ## Start checking for valid moves from 1 more than the move
        ## that was removed
        lowStart = removed + 1
    testMove = lowStart

    ## Whose turn is it?
    if numMoves % 2 == 1:
        turn = 2
    else:
        turn = 1

    ## Start searching for a valid move
    while testMove <= 8:
        ## Is position testMove available?
        ## If so, move there. If not, increase testMove
        if game[testMove] == 0:
            game[testMove] = turn
            move[numMoves] = testMove
            numMoves += 1
            removed = -1
            testMove = 10
        else:
            testMove += 1

    ## If no valid play was found, remove the last move
    if testMove == 9:
        numMoves -= 1
        game[move[numMoves]] = 0
        removed = move[numMoves]
        move[numMoves] = -1

    ## Test whether someone has won
    ## If so, create a new line of output and then remove the last move
    winner = checkWin(game, debug)
    if winner != 0:
        if debug == 1:
            dispGame(game)
        print "move = {}".format(move)
        f = open("TTTGames.txt", "a")
        f.write(str(winner))
        f.write(" ")
        for item in move:
            f.write(str(item))
            f.write(" ")
        f.write("\n")
        f.close()
        winner = 0
        numMoves -= 1
        game[move[numMoves]] = 0
        removed = move[numMoves]
        move[numMoves] = -1

Sunday, January 17, 2016

Reinforcement Learning for Tic-Tac-Toe

This post is purely theoretical. I want to try to frame the idea of reinforcement learning for TTT. The setup here is based on Lecture 16 of the CS 229 Machine Learning Course: https://www.youtube.com/watch?v=RtxI449ZjSc.

As an important element of this programming, I'm going to always make the opponent play randomly against whatever strategy is put forward by the learning algorithm. This makes the calculation of future states easier, and possibly something that can be solved analytically.

First, we define the Markov Decision Process (MDP). This is a 5-tuple containing the following information:
  • S = Set of States -- These are all of the valid configurations of games.
  • A = Set of Actions -- This would be the set of all valid plays at any given time. Since I'm only playing one player, this does not consider what the opponent's (random) moves might be. The example of the robot in the video was set up that the moves were fixed directions that could always be invoked. But I don't see any reason why this would need to be the case based on the formulas that are worked out.
  • P_{s_a} = State Transition Probabilities -- For any given decision, there is a probabilistic outcome, and this gives the probabilities given that you are in state s and take action a. (Note: For deterministic situations, we just set the probability to be 1 for the event that happens and 0 for everything else.) In the TTT game, the outcomes are probabilistic based on which of the available moves the (random) opponent makes.
  • gamma = Discount Factor -- Since these are finite length games, I can set this to be 1 without any problems. The reason that this number is less than 1 in the video is because it accounts for there being infinitely many steps each with finite value, which leads to nonconvergent sums.
  • R = Reward function -- This is +1 when I win and -1 when I either lose or draw. This is an interesting point in the programming, as it might be that setting the reward to be 0 for a draw may result in a different strategy. But I won't know until I get there. For now, I want to play to win.
I will need a couple definitions and formulas here to describe the calculations that are about to happen:
  • V^{pi}(s) = The expected payoff of playing strategy pi starting from position s. (Note: Since the starting position is an empty board, one might think that one would only need to have this for that position. However, the calculations are done in an iterative/recursive manner, and so this value will need to be calculated at all states.)
  • Formula for V^{pi}(s): V^{pi}(s) = R(s) + (gamma) * \sum_{s'} P_{s pi(s)} (s') V^{pi}(s') -- These are known as Bellman's equations. For TTT, we have gamma = 1.
Bellman's equations give us a linear system of equations that we could actually solve analytically. For the TTT game, we can think of the possible full boards as picking the locations of 5 Xs out of the 9 positions and letting the rest be Os. Of course, many games end before that point, so this really gives an upper bound on the number of full boards as C(9,5) = 9!/(5! * 4!) = 126. So this should turn out to be an explicit calculation that I should be able to do. I think this will be my first project. The challenge seems to be indexing the possible game states in a useful way. I need to ensure that play does not go beyond the point where someone has won. There also seem to be some problems with the precise order of play. For example, if after three moves the board is XOX across the top row, there are two distinct ways that could have come about based on the order in which X made his moves, but this game state has the same value either way. (I think this may be solved by coming up with a useful indexing of the games.)
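As a concrete (if tiny) illustration of solving Bellman's equations as a linear system, here's a sketch on a made-up three-state chain, not the TTT state space. With the policy fixed, V = R + gamma * P V rearranges to (I - gamma * P) V = R, which plain Gaussian elimination handles directly:

```python
# Toy illustration (not the TTT state space): a three-state chain
# s0 -> s1 -> s2, with s2 terminal.  Under the fixed policy "advance",
# Bellman's equations V = R + gamma * P V become the linear system
# (I - gamma * P) V = R, solved here by naive Gaussian elimination.
gamma = 1.0
R = [0.5, 0.0, 1.0]
P = [[0, 1, 0],     # from s0 the policy moves to s1
     [0, 0, 1],     # from s1 the policy moves to s2
     [0, 0, 0]]     # s2 is terminal: no further value flows in

n = 3
A = [[(1.0 if i == j else 0.0) - gamma * P[i][j] for j in range(n)]
     for i in range(n)]
V = R[:]
for col in range(n):                      # forward elimination
    pivot = A[col][col]
    for row in range(col + 1, n):
        factor = A[row][col] / pivot
        for j in range(col, n):
            A[row][j] -= factor * A[col][j]
        V[row] -= factor * V[col]
for row in range(n - 1, -1, -1):          # back substitution
    s = sum(A[row][j] * V[j] for j in range(row + 1, n))
    V[row] = (V[row] - s) / A[row][row]

print(V)   # V(s0) = 1.5, V(s1) = 1.0, V(s2) = 1.0
```

For the full TTT system, numpy.linalg.solve would do the same job on the larger matrix without hand-rolled elimination.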

So this works given a particular strategy pi, which will be initialized as being random play. The next challenge comes at the learning stage. The goal is to find V^* = max_{pi} V^{pi}, which is the optimal value function, and then to find the strategy pi^* that achieves that. There's another Bellman equation for V^*: V^*(s) = R(s) + (gamma) * max_a \sum_{s'} P_{s_a}(s') V^*(s'). The strategy pi^* can be obtained (after V^* is known) by using pi^*(s) = argmax_{a} \sum_{s'} P_{s_a}(s') V^*(s').

Since the formula for V^*(s) is recursive, this is no longer something that can be calculated in a straightforward manner. This leads to the two algorithms that attempt to approximate it: Value Iteration (VI) and Policy Iteration (PI). Here's the VI algorithm:
  • Initialize V(s) = 0 for all s.
  • Repeatedly update using V(s) := R(s) + (gamma) * max_a \sum_{s'} P_{s_a}(s') V(s') -- This can be done synchronously (update all at once) or asynchronously (update V(s) for one s at a time). I want to try to program both.
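Here's what the synchronous update looks like on a made-up three-state chain (again a toy sketch, not the TTT state space): states 0 and 1 can either advance toward terminal state 2, which carries reward 1, or stay put, and gamma < 1 keeps the "stay" loops from blowing up.

```python
# Synchronous value iteration on a toy three-state chain (not TTT).
gamma = 0.9
R = [0.0, 0.0, 1.0]
TERMINAL = {2}
def successor(s, a):
    return s + 1 if a == "advance" else s    # deterministic transitions

V = [0.0, 0.0, 0.0]                          # initialize V(s) = 0 for all s
for sweep in range(100):
    newV = []
    for s in range(3):
        if s in TERMINAL:
            newV.append(R[s])
        else:
            newV.append(R[s] + gamma * max(V[successor(s, a)]
                                           for a in ("advance", "stay")))
    V = newV                                 # synchronous: replace all at once

print(V)   # converges to V* = [0.81, 0.9, 1.0]
```

The asynchronous variant would simply write each updated V(s) back into the same list as it goes, instead of building newV.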
And here's the algorithm for Policy Iteration:
  • Initialize pi randomly.
  • Repeat:
    • V(s) := V^{pi}(s)
    • pi(s) := argmax_a \sum_{s'} P_{s_a}(s') V(s')
This one seems a bit more difficult to program because I need to actually solve Bellman's equations to get V^{pi}. This means solving the large system of equations that I alluded to above. But in theory, that's it.
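On the same made-up three-state chain as above (a toy sketch, not TTT), policy iteration looks like this; here the policy evaluation step just sweeps the fixed-policy Bellman update to convergence, standing in for solving the linear system exactly.

```python
# Policy iteration on a toy three-state chain (not the TTT state space).
gamma = 0.9
R = [0.0, 0.0, 1.0]
TERMINAL = {2}
ACTIONS = ("advance", "stay")
def step(s, a):
    return s + 1 if a == "advance" else s

pi = {0: "stay", 1: "stay"}                  # deliberately bad initial policy
for _ in range(10):                          # a few improvement rounds suffice
    # Policy evaluation: V := V^pi (iterated to convergence)
    V = [0.0, 0.0, 0.0]
    for _ in range(200):
        V = [R[s] if s in TERMINAL
             else R[s] + gamma * V[step(s, pi[s])] for s in range(3)]
    # Policy improvement: greedy one-step lookahead against V
    pi = {s: max(ACTIONS, key=lambda a: V[step(s, a)])
          for s in range(3) if s not in TERMINAL}

print(pi)   # both non-terminal states learn to "advance"
```

Swapping the evaluation loop for an exact linear solve is precisely the "solve the large system of equations" step mentioned above.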

Here's my plan of attack:
  • Step 1: Devise an indexing system for all possible games and game states so that I can cycle through them in some useful way.
  • Step 2: Implement synchronous VI
  • Step 3: Implement asynchronous VI (this should be easy after having done synchronous VI)
  • Step 4: Learn how to solve large systems of equations numerically. (There might exist something in numpy that does this already. I don't know.)
  • Step 5: Implement PI
And this would take me to the first landing point for machine learning TTT. One "flaw" of this that I can see is that Step 1 is actually where all the real work lies. After telling the program exactly what's going to happen, it can figure out how to play to win. But eventually I want to do something where I don't enumerate all of the possibilities first. I'll have to think about how I can create some sort of game simulator. And then maybe set two different programs to play against each other and learn against each other. But that's a couple steps down the line.

This isn't going to work...

I sat down today to try to get the Raspberry Pi up and running, and it's not going as I had hoped.

First, I needed to go through the process of getting the wifi to work. Fortunately, I found a website that had instructions that worked perfectly on the first try: http://www.modmypi.com/blog/how-to-set-up-the-ralink-rt5370-wifi-dongle-on-raspian.

Then, I needed to get the VNC Server files: https://www.raspberrypi.org/forums/viewtopic.php?t=123457&p=830506. One significant change here (that took a while to figure out) was that when I try to log in from my laptop, I need to use port 5901 because the Jessie distribution automatically boots up in a GUI, and that takes away the default port. That took about an hour to figure out.

I tried several times to download the packages using both pip and pip3, but it didn't work. Fortunately, the Python Machine Learning book also mentioned a different Python distribution called Anaconda. So I went to the webpage and downloaded that one: https://www.continuum.io/downloads. That didn't install properly either; it turns out that the Raspberry Pi CPU architecture can't handle the standard package. Fortunately, there's a package made specifically for the Raspberry Pi: https://www.continuum.io/blog/developer/anaconda-raspberry-pi. But unfortunately, it was only built for Python 2.7 and there are no plans to support Python 3 (https://www.raspberrypi.org/forums/viewtopic.php?f=32&t=90544). After looking around some more, it looks like I won't be able to install the various packages onto the Raspberry Pi.

So where does that leave me? I can try to go through this process using my work computer, but then it wouldn't be as much of a Raspberry Pi project. So I'm going to do the harder thing, which is to build as many of the bits and pieces from scratch as I can while I'm learning how this sort of stuff works. This is an excellent way to run into a dead end, especially as it pertains to data visualization, which may be way beyond my ability to learn and implement properly. But I won't really know until I try. So I'm going to spend a day or two rethinking my strategy and my approach. I can also watch the last few lectures from the Stanford CS 229 class and wrap my mind around reinforcement learning.

Monday, January 11, 2016

Installing Python 3.4 wasn't enough, so it's back to Jessie

As I tried to get started on installing some packages for Python 3.4, I learned that there's a program called "pip" that I needed to install. So I did. Or at least, I thought I did. I ended up doing

sudo apt-get install python-pip
which installed pip, but for Python 2. So I had to do it a second time.
sudo apt-get install python3-pip
In all, I probably spent about an hour from the time I discovered I needed pip to the time I finished installing it. After installing it, I then tried to install the various packages as described in the Machine Learning book: NumPy, SciPy, scikit-learn, matplotlib, and pandas.

When I tried to install NumPy, it downloaded the files and just sat there for a full hour with nothing. So I cancelled out of it and tried to upgrade pip3 (there was a warning that suggested I do this). The suggested upgrade command didn't do anything useful. Things were running very slowly, so I rebooted the computer. There might have been some other background processes that were stuck in a loop. It still took a while for me to figure it out, but the upgrade command I needed wasn't the one the program told me, but rather
sudo pip3 install --upgrade pip
After getting pip3 to upgrade, I tried to install NumPy again. No luck. So I'm going to download Jessie and go from there. I don't have anything worth saving on the current system. I found some instructions for getting into the system using VNC, so I should be able to restore things to a similar condition. But since I don't have a microSD card reader at home, I'm going to have to install that at work.
Here's the link to the installation instructions: https://www.raspberrypi.org/forums/viewtopic.php?t=123457&p=830506

Sunday, January 10, 2016

Python 3.4 and Jessie

I found another resource for learning about machine learning using Python called "Python Machine Learning" by Sebastian Raschka. As it turns out, the publisher is currently having a sale, and I bought this 450-page ebook for only $5. I consider that a big win.

One of the issues that I immediately ran into was that the book uses code for Python 3.4.3 or later, and I only have Python 3.2. After trying various forms of apt-get to download it, I found some instructions on the internet that suggested upgrading Raspbian to Jessie, which would also come with Python 3.4 installed. I almost did that, but kept on looking because I didn't want my first step to be installing a brand new system. It turns out that there is no package for it, but I can download and compile it myself. Here's the webpage that I used to get the instructions: http://depado.markdownblog.com/2015-03-12-short-tutorial-raspbian-python3-4-rpi-gpio

This turned out to be a pretty long process. Fortunately, I had a book I could read and some videos I could watch while it was happening. The installation took about an hour and 15 minutes. But I didn't have any time to do anything from there because it had gotten somewhat late.

Friday, January 8, 2016

Basic Tic-Tac-Toe

The first step in teaching a computer to play TTT is to program a basic TTT game. The idea is that I want to start off with randomly generated games from which the algorithm will learn. This was a relatively simple thing to do, though I had to spend a bit of time reminding myself how to do simple things like initializing arrays and printing to files. In the end, I put all the pieces back together and created a program that plays TTT at random and exports a file recording the winner of the game followed by the sequence of moves. I didn't want to just output the final board because I wanted the data to be as robust as possible; I can always process the sequences of moves to generate the final board if I want to.

The game board is set up with the 9 positions labeled 0 through 8, starting from the top left and working left-to-right, top-to-bottom. Player 1 is X and Player 2 is O. The game state is initialized with 0s and those values are changed as players make their moves. I have one command to display the current game state and another one to check for winners.

And that's really all there was to it. The code is below. Now I need to start thinking about how this information actually gets processed by a machine learning algorithm and see what I can make it do (or not make it do).
## Standard Tic Tac Toe game
##
## Positions
##
##  0 | 1 | 2
## ---+---+---
##  3 | 4 | 5
## ---+---+---
##  6 | 7 | 8
##
## Game state
##  - 9 element array
##  - 0 = empty
##  - 1 = X
##  - 2 = O

import random

##### Check for win
def checkWin(game):
    winner = 0
    # X win?
    if game[0]*game[1]*game[2] == 1 \
      or game[3]*game[4]*game[5] == 1 \
      or game[6]*game[7]*game[8] == 1 \
      or game[0]*game[3]*game[6] == 1 \
      or game[1]*game[4]*game[7] == 1 \
      or game[2]*game[5]*game[8] == 1 \
      or game[0]*game[4]*game[8] == 1 \
      or game[2]*game[4]*game[6] == 1:
        winner = 1
        print "X wins!"
    elif game[0]*game[1]*game[2] == 8 \
      or game[3]*game[4]*game[5] == 8 \
      or game[6]*game[7]*game[8] == 8 \
      or game[0]*game[3]*game[6] == 8 \
      or game[1]*game[4]*game[7] == 8 \
      or game[2]*game[5]*game[8] == 8 \
      or game[0]*game[4]*game[8] == 8 \
      or game[2]*game[4]*game[6] == 8:
        winner = 2
        print "O wins!"
    elif game[0]*game[1]*game[2]*game[3]*game[4]*game[5]*game[6]*game[7]*game[8] > 0:
        winner = -1
        print "Draw!"
    else:
        print "No winner yet."
    return winner

##### Display Game State
def dispGame(game):
    gameChar = []
    for i in range(0,9):
        if game[i] == 0:
            gameChar.append(" ")
        elif game[i] == 1:
            gameChar.append("X")
        elif game[i] == 2:
            gameChar.append("O")
        else:
            gameChar.append("Something bad happened!")
    print
    print " {} | {} | {}".format(gameChar[0], gameChar[1], gameChar[2])
    print "---+---+---"
    print " {} | {} | {}".format(gameChar[3], gameChar[4], gameChar[5])
    print "---+---+---"
    print " {} | {} | {}".format(gameChar[6], gameChar[7], gameChar[8])
    print

##### Main Program
game = [0]*9
move = []
turn = 1
winner = 0
dispGame(game)
while winner == 0:
    print "Player {}'s turn".format(turn)
    validMove = 0
    while validMove == 0:
        testMove = random.randint(0,8)
        if game[testMove] == 0:
            game[testMove] = turn
            validMove = turn
            move.append(testMove)
    if turn == 1:
        turn = 2
    else:
        turn = 1
    winner = checkWin(game)
    dispGame(game)
move.insert(0, winner)
f = open("TTTGames.txt", "a")
for item in move:
    f.write(str(item))
    f.write(" ")
f.write("\n")
f.close()

Monday, January 4, 2016

Project: Machine Learning - Tic-Tac-Toe

Almost a year ago, I had intended to do some sort of dot tracking game. But then I got busy with other types of things and never got back around to it. That's okay. I have a completely different type of learning project. I want to learn about machine learning.

I happened to come across Stanford's CS 229 (Machine Learning) videos on YouTube by Andrew Ng. Here's the link to the first video: https://www.youtube.com/watch?v=UzxYlbK2c7E. Here is the course webpage with notes: http://cs229.stanford.edu. (There's also a Coursera version of the class, but it looks different and seems to be coming in at a lower level of difficulty. I'm going to hope that my background is enough to figure this out after just watching the lectures.)

This is going to be a far more technical skill to learn than making colored dots on a screen, so it will be a bit more of a journey. But I think the end outcome from learning this will be something pretty great if I can get there. I'm going to try to get a computer to learn Tic-Tac-Toe. But I don't want to program a strategy for it to follow. I want it to use data from randomly generated games and see if it can't learn how to play well from that information. I need to take the time to think through the details carefully, as watching the videos at 2x speed (as I've been doing) is great for getting general ideas but not so good for setting things up. I would like to do this with both supervised and unsupervised learning, and I'll need to program my own Tic-Tac-Toe game to generate the data for the computer to try to learn from.

The long term goal of this is that I'd like to try to apply this type of machine learning to other games and maybe even something like horse racing. But that's pretty far out into the future. Right now, I just want to think about Tic-Tac-Toe.

What's the plan? I don't have one. Yet. But I have the components of one.
  • Build a Tic-Tac-Toe Game that can play itself by making random moves and can log the games that it plays. How exactly this will look is an open question. This will probably need to be tinkered with several times depending on the model that's being created and the way that the log needs to be formatted to make it work with the learning algorithm.
  • Create a supervised learning model. I barely know what that means, but I think in the space of all possible games (approximately 9! games), there are winners and losers (say, if you are the first player). I can imagine the games being embedded into a 9 dimensional space and that we're trying to find ways to estimate when we'll have a winner and when we'll have a loser, and we can use that estimation to make future moves. I think that the section on generative learning algorithms is what I should be looking at, but it's entirely possible that I have no idea what I'm talking about. I may have to backtrack and do some of the problem sets to understand what I'm saying.
I don't think that an unsupervised learning model will be successful here, as I'm not looking to understand or find patterns in an abstract collection of data. I want to win. But maybe unsupervised learning will help in the horse racing to find ways of looking at the information that's useful. Of course, I probably have no idea what I'm saying, but that's okay. At least, it's okay for now. At some point, I would like to be able to write competent statements about this stuff.