Sunday, November 20, 2016

Updated Algorithm Finder

The conversion took a little bit less time than I thought it would. However, I did discover a bug that's related to how Python stores list information. When you make a list of lists, depending on how those lists were constructed it's possible to inadvertently change a variable without realizing. Here's a question that was posted to StackOverflow: Python list by value not by reference. This was basically my issue.

The general situation was that I had to use two different variables to hold cube information for the search. One was supposed to be fixed (the cube configuration being solved) and one was supposed to be part of the search (perform some moves and then compare the result to the database of known solved positions). However, every time I tried to change the position on the second cube, the first one would change.

My solution to this problem was to rewrite the code for making the copy of the cube. I didn't use the deepcopy program from the link. Instead, I just created a function that would return the values of the cube and used that to copy. But even then, I had some problems. Here's the original attempt:
def copyCube( master ):
    return master

newCube = copyCube( oldCube )
This didn't work. When I made changes to newCube, it still impacted the old one. So I had to create an intermediate variable that held just the values and did not actually create a reference to the original.
def copyCube( master ):
    tempCube = resetCube()
    for i in range(6):
        for ii in range(9):
            tempCube[i][ii] = master[i][ii]
    return tempCube

newCube = copyCube( oldCube )
I'll have to think about fine-tuning this to see how efficient I can get it. Right now, the program is completely functional. It even has a save feature for cube positions so that they can be loaded instead of needing to type them in. There's definitely space to grow in that direction for flexibility. I also added a randomizer.

On the Raspberry Pi, I get about 15,000 checked positions per second. There's a chance of improving that by multithreading, but that's another task for another day. I first want to make the base code as efficient as possible. I made a couple other changes that might support that, which includes organizing the library in order of move length so that I can check from the middle after I've already verified that the position wasn't already closer to the beginning. I haven't implemented this, but that will give me a few percent improvement. I also need to do some timing checks to see if I can't make other parts of the algorithm more efficient. But at least for now, I've got the minimum I've needed to get a program that can search for solution algorithms. It's slow, but it works.
import pickle
import time
import random

'''
Terminology:
A "command" is one basic twist of the cube.
A "move" is some move around a single axis of the cube.
A "sequence" is an ordered collection of unflattened moves.
'''

# Global Variables
'''
sqColor[face, position] -- The current state of the cube

 Faces             Positions
  0                  0 1 2
1 2 3 4              3 4 5
  5                  6 7 8

0 = Blue, 1 = Red, 2 = Yellow, 3 = Orange, 4 = White, 5 = Green, 6 = Gray

Solved position:

       0 0 0
       0 0 0
       0 0 0

1 1 1  2 2 2  3 3 3  4 4 4
1 1 1  2 2 2  3 3 3  4 4 4
1 1 1  2 2 2  3 3 3  4 4 4

       5 5 5
       5 5 5
       5 5 5
'''
sqColor = [ [ i for ii in range(9) ] for i in range(6) ]

'''
Axis moves -- xAxis = FB / yAxis = LR / zAxis = UD
'''
xAxis = [["F"], ["F2"], ["F'"], ["B"], ["B2"], ["B'"],
         ["F", "B"], ["F", "B2"], ["F", "B'"],
         ["F2", "B"], ["F2", "B2"], ["F2", "B'"],                               
         ["F'", "B"], ["F'", "B2"], ["F'", "B'"] ]
yAxis = [["L"], ["L2"], ["L'"], ["R"], ["R2"], ["R'"],
         ["L", "R"], ["L", "R2"], ["L", "R'"],
         ["L2", "R"], ["L2", "R2"], ["L2", "R'"],                               
         ["L'", "R"], ["L'", "R2"], ["L'", "R'"] ]
zAxis = [["U"], ["U2"], ["U'"], ["D"], ["D2"], ["D'"],
         ["U", "D"], ["U", "D2"], ["U", "D'"],
         ["U2", "D"], ["U2", "D2"], ["U2", "D'"],                               
         ["U'", "D"], ["U'", "D2"], ["U'", "D'"] ]

def resetCube():
    return [ [ i for ii in range(9) ] for i in range(6) ]

def copyCube( master ):
    tempCube = resetCube()
    for i in range(6):
        for ii in range(9):
            tempCube[i][ii] = master[i][ii]
    return tempCube


'''
premute -- perform some move on the cube
'''

def permute(comm):
    if comm == "U":
        swap4([0,0], [0,2], [0,8], [0,6])
        swap4([0,1], [0,5], [0,7], [0,3])
        swap4([4,0], [3,0], [2,0], [1,0])
        swap4([4,1], [3,1], [2,1], [1,1])
        swap4([4,2], [3,2], [2,2], [1,2])
    elif comm == "U2":
        swap2([0,0], [0,8])
        swap2([0,2], [0,6])
        swap2([0,1], [0,7])
        swap2([0,3], [0,5])
        swap2([1,0], [3,0])
        swap2([1,1], [3,1])
        swap2([1,2], [3,2])
        swap2([2,0], [4,0])
        swap2([2,1], [4,1])
        swap2([2,2], [4,2])
    elif comm == "U'":
        swap4([0,0], [0,6], [0,8], [0,2])
        swap4([0,1], [0,3], [0,7], [0,5])
        swap4([1,0], [2,0], [3,0], [4,0])
        swap4([1,1], [2,1], [3,1], [4,1])
        swap4([1,2], [2,2], [3,2], [4,2])
    elif comm == "L":
        swap4([1,0], [1,2], [1,8], [1,6])
        swap4([1,1], [1,5], [1,7], [1,3])
        swap4([0,0], [2,0], [5,0], [4,8])
        swap4([0,3], [2,3], [5,3], [4,5])
        swap4([0,6], [2,6], [5,6], [4,2])
    elif comm == "L2":
        swap2([1,0], [1,8])
        swap2([1,2], [1,6])
        swap2([1,1], [1,7])
        swap2([1,3], [1,5])
        swap2([2,0], [4,8])
        swap2([2,3], [4,5])
        swap2([2,6], [4,2])
        swap2([0,0], [5,0])
        swap2([0,3], [5,3])
        swap2([0,6], [5,6])
    elif comm == "L'":
        swap4([1,0], [1,6], [1,8], [1,2])
        swap4([1,1], [1,3], [1,7], [1,5])
        swap4([5,0], [2,0], [0,0], [4,8])
        swap4([5,3], [2,3], [0,3], [4,5])
        swap4([5,6], [2,6], [0,6], [4,2])
    elif comm == "F":
        swap4([2,0], [2,2], [2,8], [2,6])
        swap4([2,1], [2,5], [2,7], [2,3])
        swap4([0,6], [3,0], [5,2], [1,8])
        swap4([0,7], [3,3], [5,1], [1,5])
        swap4([0,8], [3,6], [5,0], [1,2])
    elif comm == "F2":
        swap2([2,0], [2,8])
        swap2([2,2], [2,6])
        swap2([2,1], [2,7])
        swap2([2,3], [2,5])
        swap2([0,6], [5,2])
        swap2([0,7], [5,1])
        swap2([0,8], [5,0])
        swap2([1,8], [3,0])
        swap2([1,5], [3,3])
        swap2([1,2], [3,6])
    elif comm == "F'":
        swap4([2,0], [2,6], [2,8], [2,2])
        swap4([2,1], [2,3], [2,7], [2,5])
        swap4([0,6], [1,8], [5,2], [3,0])
        swap4([0,7], [1,5], [5,1], [3,3])
        swap4([0,8], [1,2], [5,0], [3,6])
    elif comm == "R":
        swap4([3,0], [3,2], [3,8], [3,6])
        swap4([3,1], [3,5], [3,7], [3,3])
        swap4([0,8], [4,0], [5,8], [2,8])
        swap4([0,5], [4,3], [5,5], [2,5])
        swap4([0,2], [4,6], [5,2], [2,2])
    elif comm == "R2":
        swap2([3,0], [3,8])
        swap2([3,2], [3,6])
        swap2([3,1], [3,7])
        swap2([3,3], [3,5])
        swap2([2,2], [4,6])
        swap2([2,5], [4,3])
        swap2([2,8], [4,0])
        swap2([0,8], [5,8])
        swap2([0,5], [5,5])
        swap2([0,2], [5,2])
    elif comm == "R'":
        swap4([3,0], [3,6], [3,8], [3,2])
        swap4([3,1], [3,3], [3,7], [3,5])
        swap4([0,8], [2,8], [5,8], [4,0])
        swap4([0,5], [2,5], [5,5], [4,3])
        swap4([0,2], [2,2], [5,2], [4,6])
    elif comm == "B":
        swap4([4,0], [4,2], [4,8], [4,6])
        swap4([4,1], [4,5], [4,7], [4,3])
        swap4([0,2], [1,0], [5,6], [3,8])
        swap4([0,1], [1,3], [5,7], [3,5])
        swap4([0,0], [1,6], [5,8], [3,2])
    elif comm == "B2":
        swap2([4,0], [4,8])
        swap2([4,2], [4,6])
        swap2([4,1], [4,7])
        swap2([4,3], [4,5])
        swap2([0,2], [5,6])
        swap2([0,1], [5,7])
        swap2([0,0], [5,8])
        swap2([1,0], [3,8])
        swap2([1,3], [3,5])
        swap2([1,6], [3,2])
    elif comm == "B'":
        swap4([4,0], [4,6], [4,8], [4,2])
        swap4([4,1], [4,3], [4,7], [4,5])
        swap4([0,2], [3,8], [5,6], [1,0])
        swap4([0,1], [3,5], [5,7], [1,3])
        swap4([0,0], [3,2], [5,8], [1,6])
    elif comm == "D":
        swap4([5,0], [5,2], [5,8], [5,6])
        swap4([5,1], [5,5], [5,7], [5,3])
        swap4([1,6], [2,6], [3,6], [4,6])
        swap4([1,7], [2,7], [3,7], [4,7])
        swap4([1,8], [2,8], [3,8], [4,8])
    elif comm == "D2":
        swap2([5,0], [5,8])
        swap2([5,2], [5,6])
        swap2([5,1], [5,7])
        swap2([5,3], [5,5])
        swap2([1,6], [3,6])
        swap2([1,7], [3,7])
        swap2([1,8], [3,8])
        swap2([2,6], [4,6])
        swap2([2,7], [4,7])
        swap2([2,8], [4,8])
    elif comm == "D'":
        swap4([5,0], [5,6], [5,8], [5,2])
        swap4([5,1], [5,3], [5,7], [5,5])
        swap4([1,6], [4,6], [3,6], [2,6])
        swap4([1,7], [4,7], [3,7], [2,7])
        swap4([1,8], [4,8], [3,8], [2,8])

def swap4(p0, p1, p2, p3):
    temp = sqColor[p0[0]][p0[1]]
    sqColor[p0[0]][p0[1]] = sqColor[p3[0]][p3[1]]
    sqColor[p3[0]][p3[1]] = sqColor[p2[0]][p2[1]]
    sqColor[p2[0]][p2[1]] = sqColor[p1[0]][p1[1]]
    sqColor[p1[0]][p1[1]] = temp

def swap2(p0, p1):
    temp = sqColor[p0[0]][p0[1]]
    sqColor[p0[0]][p0[1]] = sqColor[p1[0]][p1[1]]
    sqColor[p1[0]][p1[1]] = temp

'''
Apply a sequence - The sequence is not flattened
'''

def applySeq(seq):
    commList = flatten(seq)
    for comm in commList:
        permute(comm)

def flatten(seq):
    return [ item for sublist in seq for item in sublist ]

'''
Compare cube: test vs target
returns 1 if matching, 0 if not
ignores color 6 (gray)
'''

def compCube( test, target ):
    for i in range(6):
        for ii in [0, 1, 2, 3, 5, 6, 7, 8]:
            if test[i][ii] < 6:
                if test[i][ii] != target[i][ii]:
                    return 0
    return 1

def dbCompCube( test, cubePosList ):
    for i in range( len(cubePosList) ):
        if i % 10000 == 0:
            print(i, 'of', len(cubePosList))
        if compCube( test, cubePosList[i] ) == 1:
            return i
    return -1


'''
Find the next sequence
Generates the "next" sequence when listed pseudo-lexicographically

A
AA
AAA
AAB
AAC
AB
ABA
ABB
ABC

Except that the letters are moves taken from the lists xAxis, yAxis, and zAxis
The search will also not pick two moves from the same axis back-to-back
Need maxLen > 1
'''

def nextSeq(seq, seqLen):
    if len(seq) == 0:
        seq = [ xAxis[0] ]
    else:
        done = 0
        goBack = 0
        while done == 0:
            lastMove = seq[-1]
            if len(seq) < seqLen and goBack == 0:
                if lastMove in xAxis:
                    seq.append( yAxis[0] )
                else:
                    seq.append( xAxis[0] )
                done = 1
            elif lastMove in xAxis:
                if xAxis.index( lastMove) < len( xAxis ) - 1:
                    seq[-1] = xAxis[ xAxis.index( lastMove) + 1]
                else:
                    if len( seq ) > 1:
                        if seq[-2] in yAxis:
                            seq[-1] = zAxis[0]
                        else:
                            seq[-1] = yAxis[0]
                    else:
                        seq = [ yAxis[0] ]
                done = 1
            elif lastMove in yAxis:
                if yAxis.index( lastMove) < len( yAxis ) - 1:
                    seq[-1] = yAxis[ yAxis.index( lastMove) + 1]
                    done = 1
                else:
                    if len( seq ) > 1:
                        if seq[-2] in xAxis:
                            seq[-1] = zAxis[0]
                            done = 1
                        else:
                            seq = seq[:-1]
                            goBack = 1
                    else:
                        seq = [ zAxis[0] ]
                        done = 1
            elif lastMove in zAxis:
                if zAxis.index( lastMove) < len( zAxis ) - 1:
                    seq[-1] = zAxis[ zAxis.index( lastMove) + 1]
                    done = 1
                else:
                    seq = seq[:-1]
                    goBack = 1
            if seq == [ ] and goBack == 1:
                done = 1
    return seq

'''
Invert move list -- this is for building solution sequences for positions
'''
def invertMoves( flatSeq ):
    inverted = []
    for i in range(len(flatSeq)):
        j = i+1
        if flatSeq[-j] == "F":
            inverted.append("F'")
        elif flatSeq[-j] == "F'":
            inverted.append("F")
        elif flatSeq[-j] == "B":
            inverted.append("B'")
        elif flatSeq[-j] == "B'":
            inverted.append("B")
        elif flatSeq[-j] == "R":
            inverted.append("R'")
        elif flatSeq[-j] == "R'":
            inverted.append("R")
        elif flatSeq[-j] == "L":
            inverted.append("L'")
        elif flatSeq[-j] == "L'":
            inverted.append("L")
        elif flatSeq[-j] == "U":
            inverted.append("U'")
        elif flatSeq[-j] == "U'":
            inverted.append("U")
        elif flatSeq[-j] == "D":
            inverted.append("D'")
        elif flatSeq[-j] == "D'":
            inverted.append("D")
        else:
            inverted.append( flatSeq[-j] )
    return inverted

'''
Display cube
'''
def dispCube(cube):
    print()
    print("       ", cube[0][0], cube[0][1], cube[0][2])
    print("       ", cube[0][3], cube[0][4], cube[0][5])
    print("       ", cube[0][6], cube[0][7], cube[0][8])
    print()
    print(cube[1][0], cube[1][1], cube[1][2], " ",
          cube[2][0], cube[2][1], cube[2][2], " ",
          cube[3][0], cube[3][1], cube[3][2], " ",
          cube[4][0], cube[4][1], cube[4][2] )
    print(cube[1][3], cube[1][4], cube[1][5], " ",
          cube[2][3], cube[2][4], cube[2][5], " ",
          cube[3][3], cube[3][4], cube[3][5], " ",
          cube[4][3], cube[4][4], cube[4][5] )
    print(cube[1][6], cube[1][7], cube[1][8], " ",
          cube[2][6], cube[2][7], cube[2][8], " ",
          cube[3][6], cube[3][7], cube[3][8], " ",
          cube[4][6], cube[4][7], cube[4][8] )
    print()
    print("       ", cube[5][0], cube[5][1], cube[5][2])
    print("       ", cube[5][3], cube[5][4], cube[5][5])
    print("       ", cube[5][6], cube[5][7], cube[5][8])
    print()

def dispGuide():
    print(''' Faces       Positions     Colors: 0 = Blue, 1 = Red, 2 = Yellow
  0            0 1 2               3 = Orange, 4 = White, 5 = Green
1 2 3 4        3 4 5               6 = Gray (ignore)
  5            6 7 8
''')

def go():
    exec(open("./SpeedCubeV2.py").read())

# BEGIN MAIN PROGRAM!

menu = " "
sqColor = resetCube()
testCube = resetCube()
while menu != '9':
    print('''
    (1) Create library pickle
    (2) Load library pickle
    (3) Set Cube
    (4) Solve Cube
    (7) Load Cube
    (8) Save Cube
    (9) Quit
    ''')
    menu = input('Select Menu Option: ')
    if menu == '1':
        maxLen = input('Select Maximum Length: ')
        if maxLen.isdigit():
            curSeq = []
            seqList = [ curSeq ]
            cubePosList = [ sqColor ]
            bookmark = [ 0 ]
            maxLen = int(maxLen)
            for i in range(maxLen):
                bookmark.append( len( seqList) )
                seqLen = i+1
                done = 0
                while done == 0:
                    if len(seqList) % 1000 == 0:
                        print(len(seqList), curSeq)
                    curSeq = nextSeq( curSeq, seqLen )
                    sqColor = resetCube()
                    if len( curSeq ) == seqLen:
                        applySeq( curSeq )
                        cubePosList.append( sqColor )
                        seqList.append( invertMoves( flatten( curSeq ) ) )
                    elif curSeq == []:
                        done = 1
            pickle_out = open("seqList.pickle",'wb')
            pickle.dump(seqList, pickle_out)
            pickle_out.close()
            pickle_out = open("cubePosList.pickle",'wb')
            pickle.dump(cubePosList, pickle_out)
            pickle_out.close()
            pickle_out = open("bookmark.pickle",'wb')
            pickle.dump(bookmark, pickle_out)
            pickle_out.close()

    elif menu == '2':
        pickle_in = open("seqList.pickle",'rb')
        seqList = pickle.load(pickle_in)
        pickle_in.close()
        pickle_in = open("cubePosList.pickle",'rb')
        cubePosList = pickle.load(pickle_in)
        pickle_in.close()
        pickle_in = open("bookmark.pickle",'rb')
        bookmark = pickle.load(pickle_in)
        pickle_in.close()

    elif menu == '3':
        done = 0
        while done == 0:
            dispCube(sqColor)
            dispGuide()
            face = input("Select face (0-5), F for full entry, R to randomize, S to save, Q to quit: ")
            if face == 'Q':
                done = 1
            elif face == 'S':
                testCube = copyCube(sqColor)
                done = 1
            elif face == 'F':
                sqColor = [ ['X' for i in range(9)] for ii in range(6)]
                for i in range(6):
                    for ii in range(9):
                        complete = 0
                        while complete ==  0:
                            dispCube(sqColor)
                            dispGuide()
                            color = input("Location "
                                          + str(i) + ", " + str(ii) + ": ")
                            if color in ['0', '1', '2', '3', '4', '5', '6']:
                                sqColor[i][ii] = int(color)
                                complete = 1
            elif face == 'R':
                randSeq = []
                for i in range(20):
                    randAxis = random.randint(0,2)
                    if randAxis == 0:
                        randMove = random.randint(0,len(xAxis)-1)
                        randSeq.append( xAxis[randMove] )
                    elif randAxis == 1:
                        randMove = random.randint(0,len(yAxis)-1)
                        randSeq.append( yAxis[randMove] )
                    else:
                        randMove = random.randint(0,len(zAxis)-1)
                        randSeq.append( zAxis[randMove] )
                applySeq( randSeq )
            elif face in ['0', '1', '2', '3', '4', '5']:
                face = int(face)
                position = input("Select position (0-8): ")
                if position in ['0', '1', '2', '3', '4', '5', '6', '7', '8']:
                    position = int(position)
                    color = input("Select color (0-6, 6 = ignore): ")
                    if color in ['0', '1', '2', '3', '4', '5', '6']:
                        sqColor[face][position] = int(color)

    elif menu == '4':
        print("Begin search in database")
        dispCube( testCube )
        solIndex = dbCompCube( testCube, cubePosList )
        if solIndex > -1:
            print("Solution index:", solIndex)
            print("Solution:", seqList[solIndex])
        else:
            print("Position not found in database.")
            for searchDepth in range(1,15):
                print("Searching at depth", searchDepth)
                testSeq = [ ]
                done = 0
                count = 1
                while done == 0:
                    testSeq = nextSeq(testSeq, searchDepth)
                    if len( testSeq ) > 0:
                        print("Count:", count)
                        print("Trying", testSeq)
                        sqColor = copyCube( testCube )
                        applySeq( testSeq )
                        dispCube( sqColor )
                        count = count + 1
                        solIndex = dbCompCube( sqColor, cubePosList )
                        if solIndex > -1:
                            print("Search sequence:", testSeq)
                            print("Solution index:", solIndex)
                            print("Solution:", seqList[solIndex])
                            done = 1
                    else:
                        done = 1
                if solIndex > -1:
                    break
    elif menu == '7':
        pickle_in = open("testCube.pickle",'rb')
        testCube = pickle.load(pickle_in)
        pickle_in.close()
    elif menu == '8':
        pickle_out = open("testCube.pickle",'wb')
        pickle.dump(testCube, pickle_out)
        pickle_out.close()

Saturday, November 19, 2016

Creating the move list and searchable database

I created the algorithm to generate the searchable list of positions with "known" solutions. It took a little bumbling around to get it right, but it wasn't conceptually too difficult. It's just thinking through how I would manually create the list step by step. When do I add a new item, how do I cycle to the next item in the list, and then how do I backtrack after I've exhausted the list.

But as I'm thinking about this, I'm somewhat worried that I'm going to need to rewrite the program a bit. I think some of the pieces have become too spread out and disorganized for me to go to the next step (the solving step). Also, while I have definitely liked the graphical representation as I've been double checking the various permutations corresponding to the moves, Pygame just takes so much time to load and running the program in IDLE instead of in a terminal just feels so much slower.

So I'm about to embark on a rewriting of the program. I've built all of the bits and pieces, but I need to pull it apart and reorganize it a bit so that I can call the things I want to call when I want to call them. I've got the ability to run long processes and then dump the results into a file (pickling), which means that I don't need to rerun those processes again and again.

So the way I see it, I need to create the following functions:
  • Reset the cube
  • Perform a sequence of moves to the cube
  • Compare the current state of the cube with one from a list, including the ability to ignore gray pieces
  • Find the "next" sequence to check given a maximum move depth and the current sequence.
It may take a while for me to get there, so here's the code in its current form:
import pygame
import pickle
import time

# Global Variables
sGap = 5
bGap = 10
boxSize = 30

def posCalc(bS, sG, bG):
    return bS*boxSize + sG*sGap + bG*bGap

# Define drawing data

sqPos = [ [ [posCalc(3,3,1), posCalc(0,1,0)],
            [posCalc(4,4,1), posCalc(0,1,0)],
            [posCalc(5,5,1), posCalc(0,1,0)],
            [posCalc(3,3,1), posCalc(1,2,0)],
            [posCalc(4,4,1), posCalc(1,2,0)],
            [posCalc(5,5,1), posCalc(1,2,0)],
            [posCalc(3,3,1), posCalc(2,3,0)],
            [posCalc(4,4,1), posCalc(2,3,0)],
            [posCalc(5,5,1), posCalc(2,3,0)] ],
          [ [posCalc(0,1,0), posCalc(3,3,1)],
            [posCalc(1,2,0), posCalc(3,3,1)],
            [posCalc(2,3,0), posCalc(3,3,1)],
            [posCalc(0,1,0), posCalc(4,4,1)],
            [posCalc(1,2,0), posCalc(4,4,1)],
            [posCalc(2,3,0), posCalc(4,4,1)],
            [posCalc(0,1,0), posCalc(5,5,1)],
            [posCalc(1,2,0), posCalc(5,5,1)],
            [posCalc(2,3,0), posCalc(5,5,1)] ],
          [ [posCalc(3,3,1), posCalc(3,3,1)],
            [posCalc(4,4,1), posCalc(3,3,1)],
            [posCalc(5,5,1), posCalc(3,3,1)],
            [posCalc(3,3,1), posCalc(4,4,1)],
            [posCalc(4,4,1), posCalc(4,4,1)],
            [posCalc(5,5,1), posCalc(4,4,1)],
            [posCalc(3,3,1), posCalc(5,5,1)],
            [posCalc(4,4,1), posCalc(5,5,1)],
            [posCalc(5,5,1), posCalc(5,5,1)] ],
          [ [posCalc(6,5,2), posCalc(3,3,1)],
            [posCalc(7,6,2), posCalc(3,3,1)],
            [posCalc(8,7,2), posCalc(3,3,1)],
            [posCalc(6,5,2), posCalc(4,4,1)],
            [posCalc(7,6,2), posCalc(4,4,1)],
            [posCalc(8,7,2), posCalc(4,4,1)],
            [posCalc(6,5,2), posCalc(5,5,1)],
            [posCalc(7,6,2), posCalc(5,5,1)],
            [posCalc(8,7,2), posCalc(5,5,1)] ],
          [ [posCalc(9,7,3), posCalc(3,3,1)],
            [posCalc(10,8,3), posCalc(3,3,1)],
            [posCalc(11,9,3), posCalc(3,3,1)],
            [posCalc(9,7,3), posCalc(4,4,1)],
            [posCalc(10,8,3), posCalc(4,4,1)],
            [posCalc(11,9,3), posCalc(4,4,1)],
            [posCalc(9,7,3), posCalc(5,5,1)],
            [posCalc(10,8,3), posCalc(5,5,1)],
            [posCalc(11,9,3), posCalc(5,5,1)] ],
          [ [posCalc(3,3,1), posCalc(6,5,2)],
            [posCalc(4,4,1), posCalc(6,5,2)],
            [posCalc(5,5,1), posCalc(6,5,2)],
            [posCalc(3,3,1), posCalc(7,6,2)],
            [posCalc(4,4,1), posCalc(7,6,2)],
            [posCalc(5,5,1), posCalc(7,6,2)],
            [posCalc(3,3,1), posCalc(8,7,2)],
            [posCalc(4,4,1), posCalc(8,7,2)],
            [posCalc(5,5,1), posCalc(8,7,2)] ]
        ]

sqColor = [ [ i for ii in range(9) ] for i in range(6) ]

drawColor = [ pygame.Color(0,0,255),
              pygame.Color(255,0,0),
              pygame.Color(255,255,0),
              pygame.Color(255,100,0),
              pygame.Color(255,255,255),
              pygame.Color(0,255,0),
              pygame.Color(100,100,100) ]

# Initialize
pygame.init()

window1Width = posCalc(12,10,3)
window1Height = posCalc(9,8,2)
window1 = pygame.display.set_mode((window1Width,window1Height))

# Functions

def menu():
    menuSelect = 0
    while menuSelect == 0:
        print('''
Main Menu:
(1) Free Play
(2) Set up Cube
(6) Generate Position Lookup
(7) Generate Solution Lookup Move List
(8) Reset Cube
(9) Quit
''')
        comm = input('Select Menu Option: ')
        if comm in ['1', '2', '6', '7', '8', '9']:
            menuSelect = 1
    return int(comm)

def drawIt1(window, sqPos, sqColor, drawColor):
    for i in range(6):
        for ii in range(9):
            pygame.draw.rect(window,
                             drawColor[sqColor[i][ii]],
                             ( sqPos[i][ii][0], sqPos[i][ii][1], boxSize, boxSize ),
                             0)
    pygame.display.update()
    return

def playCube(sqColor):
    commands = ["U", "U2", "U'",
                "L", "L2", "L'",
                "F", "F2", "F'",
                "R", "R2", "R'",
                "B", "B2", "B'",
                "D", "D2", "D'"]
    commList = "U, U2, U', L, L2, L', F, F2, F', R, R2, R', B, B2, B', D, D2, D'"
    comm = ""
    while comm != "Q":
        print("Execute which command?")
        print(commList)
        comm = input('Enter a squence of moves separated by spaces or Q to quit: ' )
        validComm = 1
        commSeq = comm.split()
        for move in commSeq:
            if move not in commList:
                validComm = 0
        if validComm == 1:
            for move in commSeq:
                sqColor = permute(move, sqColor)
        elif validComm == 0 and comm != "Q":
            print("Invalid move sequence.")
        drawIt1(window1, sqPos, sqColor, drawColor)
    return

def permute(comm, sqColor):
    if comm == "U":
        swap4(sqColor, [0,0], [0,2], [0,8], [0,6])
        swap4(sqColor, [0,1], [0,5], [0,7], [0,3])
        swap4(sqColor, [4,0], [3,0], [2,0], [1,0])
        swap4(sqColor, [4,1], [3,1], [2,1], [1,1])
        swap4(sqColor, [4,2], [3,2], [2,2], [1,2])
    elif comm == "U2":
        swap2(sqColor, [0,0], [0,8])
        swap2(sqColor, [0,2], [0,6])
        swap2(sqColor, [0,1], [0,7])
        swap2(sqColor, [0,3], [0,5])
        swap2(sqColor, [1,0], [3,0])
        swap2(sqColor, [1,1], [3,1])
        swap2(sqColor, [1,2], [3,2])
        swap2(sqColor, [2,0], [4,0])
        swap2(sqColor, [2,1], [4,1])
        swap2(sqColor, [2,2], [4,2])
    elif comm == "U'":
        swap4(sqColor, [0,0], [0,6], [0,8], [0,2])
        swap4(sqColor, [0,1], [0,3], [0,7], [0,5])
        swap4(sqColor, [1,0], [2,0], [3,0], [4,0])
        swap4(sqColor, [1,1], [2,1], [3,1], [4,1])
        swap4(sqColor, [1,2], [2,2], [3,2], [4,2])
    elif comm == "L":
        swap4(sqColor, [1,0], [1,2], [1,8], [1,6])
        swap4(sqColor, [1,1], [1,5], [1,7], [1,3])
        swap4(sqColor, [0,0], [2,0], [5,0], [4,8])
        swap4(sqColor, [0,3], [2,3], [5,3], [4,5])
        swap4(sqColor, [0,6], [2,6], [5,6], [4,2])
    elif comm == "L2":
        swap2(sqColor, [1,0], [1,8])
        swap2(sqColor, [1,2], [1,6])
        swap2(sqColor, [1,1], [1,7])
        swap2(sqColor, [1,3], [1,5])
        swap2(sqColor, [2,0], [4,8])
        swap2(sqColor, [2,3], [4,5])
        swap2(sqColor, [2,6], [4,2])
        swap2(sqColor, [0,0], [5,0])
        swap2(sqColor, [0,3], [5,3])
        swap2(sqColor, [0,6], [5,6])
    elif comm == "L'":
        swap4(sqColor, [1,0], [1,6], [1,8], [1,2])
        swap4(sqColor, [1,1], [1,3], [1,7], [1,5])
        swap4(sqColor, [5,0], [2,0], [0,0], [4,8])
        swap4(sqColor, [5,3], [2,3], [0,3], [4,5])
        swap4(sqColor, [5,6], [2,6], [0,6], [4,2])
    elif comm == "F":
        swap4(sqColor, [2,0], [2,2], [2,8], [2,6])
        swap4(sqColor, [2,1], [2,5], [2,7], [2,3])
        swap4(sqColor, [0,6], [3,0], [5,2], [1,8])
        swap4(sqColor, [0,7], [3,3], [5,1], [1,5])
        swap4(sqColor, [0,8], [3,6], [5,0], [1,2])
    elif comm == "F2":
        swap2(sqColor, [2,0], [2,8])
        swap2(sqColor, [2,2], [2,6])
        swap2(sqColor, [2,1], [2,7])
        swap2(sqColor, [2,3], [2,5])
        swap2(sqColor, [0,6], [5,2])
        swap2(sqColor, [0,7], [5,1])
        swap2(sqColor, [0,8], [5,0])
        swap2(sqColor, [1,8], [3,0])
        swap2(sqColor, [1,5], [3,3])
        swap2(sqColor, [1,2], [3,6])
    elif comm == "F'":
        swap4(sqColor, [2,0], [2,6], [2,8], [2,2])
        swap4(sqColor, [2,1], [2,3], [2,7], [2,5])
        swap4(sqColor, [0,6], [1,8], [5,2], [3,0])
        swap4(sqColor, [0,7], [1,5], [5,1], [3,3])
        swap4(sqColor, [0,8], [1,2], [5,0], [3,6])
    elif comm == "R":
        swap4(sqColor, [3,0], [3,2], [3,8], [3,6])
        swap4(sqColor, [3,1], [3,5], [3,7], [3,3])
        swap4(sqColor, [0,8], [4,0], [5,8], [2,8])
        swap4(sqColor, [0,5], [4,3], [5,5], [2,5])
        swap4(sqColor, [0,2], [4,6], [5,2], [2,2])
    elif comm == "R2":
        swap2(sqColor, [3,0], [3,8])
        swap2(sqColor, [3,2], [3,6])
        swap2(sqColor, [3,1], [3,7])
        swap2(sqColor, [3,3], [3,5])
        swap2(sqColor, [2,2], [4,6])
        swap2(sqColor, [2,5], [4,3])
        swap2(sqColor, [2,8], [4,0])
        swap2(sqColor, [0,8], [5,8])
        swap2(sqColor, [0,5], [5,5])
        swap2(sqColor, [0,2], [5,2])
    elif comm == "R'":
        swap4(sqColor, [3,0], [3,6], [3,8], [3,2])
        swap4(sqColor, [3,1], [3,3], [3,7], [3,5])
        swap4(sqColor, [0,8], [2,8], [5,8], [4,0])
        swap4(sqColor, [0,5], [2,5], [5,5], [4,3])
        swap4(sqColor, [0,2], [2,2], [5,2], [4,6])
    elif comm == "B":
        swap4(sqColor, [4,0], [4,2], [4,8], [4,6])
        swap4(sqColor, [4,1], [4,5], [4,7], [4,3])
        swap4(sqColor, [0,2], [1,0], [5,6], [3,8])
        swap4(sqColor, [0,1], [1,3], [5,7], [3,5])
        swap4(sqColor, [0,0], [1,6], [5,8], [3,2])
    elif comm == "B2":
        swap2(sqColor, [4,0], [4,8])
        swap2(sqColor, [4,2], [4,6])
        swap2(sqColor, [4,1], [4,7])
        swap2(sqColor, [4,3], [4,5])
        swap2(sqColor, [0,2], [5,6])
        swap2(sqColor, [0,1], [5,7])
        swap2(sqColor, [0,0], [5,8])
        swap2(sqColor, [1,0], [3,8])
        swap2(sqColor, [1,3], [3,5])
        swap2(sqColor, [1,6], [3,2])
    elif comm == "B'":
        swap4(sqColor, [4,0], [4,6], [4,8], [4,2])
        swap4(sqColor, [4,1], [4,3], [4,7], [4,5])
        swap4(sqColor, [0,2], [3,8], [5,6], [1,0])
        swap4(sqColor, [0,1], [3,5], [5,7], [1,3])
        swap4(sqColor, [0,0], [3,2], [5,8], [1,6])
    elif comm == "D":
        swap4(sqColor, [5,0], [5,2], [5,8], [5,6])
        swap4(sqColor, [5,1], [5,5], [5,7], [5,3])
        swap4(sqColor, [1,6], [2,6], [3,6], [4,6])
        swap4(sqColor, [1,7], [2,7], [3,7], [4,7])
        swap4(sqColor, [1,8], [2,8], [3,8], [4,8])
    elif comm == "D2":
        swap2(sqColor, [5,0], [5,8])
        swap2(sqColor, [5,2], [5,6])
        swap2(sqColor, [5,1], [5,7])
        swap2(sqColor, [5,3], [5,5])
        swap2(sqColor, [1,6], [3,6])
        swap2(sqColor, [1,7], [3,7])
        swap2(sqColor, [1,8], [3,8])
        swap2(sqColor, [2,6], [4,6])
        swap2(sqColor, [2,7], [4,7])
        swap2(sqColor, [2,8], [4,8])
    elif comm == "D'":
        swap4(sqColor, [5,0], [5,6], [5,8], [5,2])
        swap4(sqColor, [5,1], [5,3], [5,7], [5,5])
        swap4(sqColor, [1,6], [4,6], [3,6], [2,6])
        swap4(sqColor, [1,7], [4,7], [3,7], [2,7])
        swap4(sqColor, [1,8], [4,8], [3,8], [2,8])

    return sqColor

def swap4(sqColor, p0, p1, p2, p3):
    temp = sqColor[p0[0]][p0[1]]
    sqColor[p0[0]][p0[1]] = sqColor[p3[0]][p3[1]]
    sqColor[p3[0]][p3[1]] = sqColor[p2[0]][p2[1]]
    sqColor[p2[0]][p2[1]] = sqColor[p1[0]][p1[1]]
    sqColor[p1[0]][p1[1]] = temp
    return sqColor

def swap2(sqColor, p0, p1):
    temp = sqColor[p0[0]][p0[1]]
    sqColor[p0[0]][p0[1]] = sqColor[p1[0]][p1[1]]
    sqColor[p1[0]][p1[1]] = temp
    return sqColor

def setCube(sqColor):
    done = 0
    print('''
 Faces             Positions
  0                  0 1 2
1 2 3 4              3 4 5
  5                  6 7 8

0 = Blue, 1 = Red, 2 = Yellow, 3 = Orange, 4 = White, 5 = Green, 6 = Gray

Use Face = -1 to quit
''')
    face = input('Enter a face to change: ')
    if face in ['0', '1', '2', '3', '4', '5']:
        pos = input('Enter a position to chance: ')
        if pos in ['0', '1', '2', '3', '4', '5', '6', '7', '8']:
            color = input('Enter the color: ')
            if color in ['0', '1', '2', '3', '4', '5', '6']:
                face = int(face)
                pos = int(pos)
                color = int(color)
                sqColor[face][pos] = color
    elif face == '-1':
        done = 1
    return done, sqColor

def buildLib():
    xAxis = [["F"], ["F2"], ["F'"], ["B"], ["B2"], ["B'"],
             ["F", "B"], ["F", "B2"], ["F", "B'"],
             ["F2", "B"], ["F2", "B2"], ["F2", "B'"],                           
             ["F'", "B"], ["F'", "B2"], ["F'", "B'"] ]
    yAxis = [["L"], ["L2"], ["L'"], ["R"], ["R2"], ["R'"],
             ["L", "R"], ["L", "R2"], ["L", "R'"],
             ["L2", "R"], ["L2", "R2"], ["L2", "R'"],                           
             ["L'", "R"], ["L'", "R2"], ["L'", "R'"] ]
    zAxis = [["U"], ["U2"], ["U'"], ["D"], ["D2"], ["D'"],
             ["U", "D"], ["U", "D2"], ["U", "D'"],
             ["U2", "D"], ["U2", "D2"], ["U2", "D'"],                           
             ["U'", "D"], ["U'", "D2"], ["U'", "D'"] ]

    maxDepth = 3

    moveList = [ [move] for move in xAxis+yAxis+zAxis]

    for firstMove in xAxis+yAxis+zAxis:
        thisComm = [ firstMove ]
        done = 0
        goBack = 0
        while done == 0:
            if len( moveList ) % 1000 == 0:
                print(len(moveList))
            if len( thisComm ) < maxDepth and goBack == 0:
                if thisComm[-1] in xAxis:
                    thisComm.append( yAxis[0] )
                else:
                    thisComm.append( xAxis[0] )
            else:
                goBack = 0
                lastMove = thisComm[-1]
                if lastMove in xAxis:
                    if xAxis.index( lastMove ) < len( xAxis ) - 1:
                        thisComm[-1] = xAxis[ xAxis.index( lastMove ) + 1]
                    else:
                        if thisComm[-2] in yAxis:
                            thisComm[-1] = zAxis[0]
                        else:
                            thisComm[-1] = yAxis[0]
                elif lastMove in yAxis:
                    if yAxis.index( lastMove ) < len( yAxis ) - 1:
                        thisComm[-1] = yAxis[ yAxis.index( lastMove ) + 1]
                    else:
                        if thisComm[-1] in xAxis:
                            thisComm[-1] = zAxis[0]
                        else:
                            if thisComm[-2] in xAxis:
                                thisComm[-1] = zAxis[0]
                            else:
                                thisComm = thisComm[:-1]
                                goBack = 1
                elif lastMove in zAxis:
                    if zAxis.index( lastMove ) < len( xAxis) - 1:
                        thisComm[-1] = zAxis[ zAxis.index( lastMove ) + 1]
                    else:
                        thisComm = thisComm[:-1]
                        goBack = 1
            if len( thisComm ) == 1:
                done = 1
            if goBack == 0 and done == 0:
                moveList.append( thisComm[:] )
    flatList = []
    for move in moveList:
        flatList.append( [ item for sublist in move for item in sublist ] )
    pickle_out = open("flatList.pickle", 'wb')
    pickle.dump(flatList, pickle_out)
    pickle_out.close()
    return

def buildPosLib():
    pickle_in = open("flatList.pickle",'rb')
    flatList = pickle.load(pickle_in)
    pickle_in.close()

    posList = [ [ [ i for ii in range(9) ] for i in range(6) ] ]
    solList = [ [] ]

    for moveSeq in flatList:
        if len(solList) % 1000 == 0:
            print(len(solList))
        sqColor = [ [ i for ii in range(9) ] for i in range(6) ]
        for move in moveSeq:
            sqColor = permute(move, sqColor)
        posList.append( [ sqColor ] )
        solList.append( invertMoves( moveSeq ) )

    pickle_out = open("posList.pickle",'wb')
    pickle.dump(posList, pickle_out)
    pickle_out.close()
    pickle_out = open("solList.pickle",'wb')
    pickle.dump(solList, pickle_out)
    pickle_out.close()


def invertMoves( seq ):
    inverted = []
    for i in range(len(seq)):
        j = i+1
        if seq[-j] == "F":
            inverted.append("F'")
        elif seq[-j] == "F'":
            inverted.append("F")
        elif seq[-j] == "B":
            inverted.append("B'")
        elif seq[-j] == "B'":
            inverted.append("B")
        elif seq[-j] == "R":
            inverted.append("R'")
        elif seq[-j] == "R'":
            inverted.append("R")
        elif seq[-j] == "L":
            inverted.append("L'")
        elif seq[-j] == "L'":
            inverted.append("L")
        elif seq[-j] == "U":
            inverted.append("U'")
        elif seq[-j] == "U'":
            inverted.append("U")
        elif seq[-j] == "D":
            inverted.append("D'")
        elif seq[-j] == "D'":
            inverted.append("D")
        else:
            inverted.append( seq[-j] )
    return inverted


### Main program

comm = 0
while comm != 9:
    comm = menu()
    if comm == 1:
        drawIt1(window1, sqPos, sqColor, drawColor)
        playCube(sqColor)
    elif comm == 2:
        finished = 0
        while finished == 0:
            drawIt1(window1, sqPos, sqColor, drawColor)
            finished, sqColor = setCube(sqColor)
        drawIt1(window1, sqPos, sqColor, drawColor)
    elif comm == 6:
        buildPosLib()
    elif comm == 7:
        buildLib()
    elif comm == 8:
        sqColor = [ [ i for ii in range(9) ] for i in range(6) ]
    elif comm == 9:
        quit()

Wednesday, November 16, 2016

"Playable" Cube

This is just a minor update on the Rubik's Cube program. I've programmed all of the moves as well as the ability to define the cube colors to be whatever you want them to be. This process includes a "gray" color which will be used as a "I don't care" color in the search algorithm. Speaking of the search algorithm, I have decided how I'm going to set it up.

There are 18 moves that are hard-coded into the program (3 moves per face * 6 faces). The most straight-forward search method (test each one, then test all two consecutive moves, then test all three consecutive moves) grows exponentially. To search a 4 move depth would require 18^4, which is just over 100,000 attempts. But many of these moves would be duplicates. For example, the pattern UU would have been searched at depth 1 by U2. There are also a lot of "dumb" moves like UU', which accomplishes nothing.

So my first idea is that I'm not going to search using the 18 standard moves, but instead I'm going to think of each move as being rotation around a particular pole. If you imagine the z-axis going up the U and D faces and consider all of the distinct moves that use only those faces, there are a total of 15 different things that can be done: U, U2, U', DU, DU2, DU', D2U, D2U2, D2U', D'U, D'U2, D'U', D, D2, D'. If the next move is along one of the other two axes, it will be impossible for me to have a move that's undone. I also know that if I do something along the z-axis twice in a row, I'm going to get something redundant. So my search tree will have 45*30*30*... total algorithms to check. But I'm guaranteed not to have anything too silly in the search.

One other issue that I'm concerned about is the time it will take to search. The way this is set up, I'm searching for exactly one arrangement (the solved position). It is possible for me to decrease the search time by expanding the space of winning arrangements. Instead of looking for the solved position, I could search for any position that is within one move of being solved. In other words, teach the program to recognize the almost-solved positions as well as the solved positions. Then if it's only one move away, it doesn't need to finish searching the entire tree at that depth before finding the solution. If I do this for all 3-deep positions from being solved, I go from one winning position to 45*30*30 = 40,500 winning positions.

In order to do this, I need to basically start from a solved cube and then apply all 3-deep moves to it. Since I'll be searching cases where certain pieces might not matter, I'll probably need to include the 1-deep and 2-deep moves in the list. But I'll work on that some other day. Here's the code as it stands.
import pygame

# Global Variables
sGap = 5
bGap = 10
boxSize = 30

def posCalc(bS, sG, bG):
    return bS*boxSize + sG*sGap + bG*bGap

# Define drawing data

sqPos = [ [ [posCalc(3,3,1), posCalc(0,1,0)],
            [posCalc(4,4,1), posCalc(0,1,0)],
            [posCalc(5,5,1), posCalc(0,1,0)],
            [posCalc(3,3,1), posCalc(1,2,0)],
            [posCalc(4,4,1), posCalc(1,2,0)],
            [posCalc(5,5,1), posCalc(1,2,0)],
            [posCalc(3,3,1), posCalc(2,3,0)],
            [posCalc(4,4,1), posCalc(2,3,0)],
            [posCalc(5,5,1), posCalc(2,3,0)] ],
          [ [posCalc(0,1,0), posCalc(3,3,1)],
            [posCalc(1,2,0), posCalc(3,3,1)],
            [posCalc(2,3,0), posCalc(3,3,1)],
            [posCalc(0,1,0), posCalc(4,4,1)],
            [posCalc(1,2,0), posCalc(4,4,1)],
            [posCalc(2,3,0), posCalc(4,4,1)],
            [posCalc(0,1,0), posCalc(5,5,1)],
            [posCalc(1,2,0), posCalc(5,5,1)],
            [posCalc(2,3,0), posCalc(5,5,1)] ],
          [ [posCalc(3,3,1), posCalc(3,3,1)],
            [posCalc(4,4,1), posCalc(3,3,1)],
            [posCalc(5,5,1), posCalc(3,3,1)],
            [posCalc(3,3,1), posCalc(4,4,1)],
            [posCalc(4,4,1), posCalc(4,4,1)],
            [posCalc(5,5,1), posCalc(4,4,1)],
            [posCalc(3,3,1), posCalc(5,5,1)],
            [posCalc(4,4,1), posCalc(5,5,1)],
            [posCalc(5,5,1), posCalc(5,5,1)] ],
          [ [posCalc(6,5,2), posCalc(3,3,1)],
            [posCalc(7,6,2), posCalc(3,3,1)],
            [posCalc(8,7,2), posCalc(3,3,1)],
            [posCalc(6,5,2), posCalc(4,4,1)],
            [posCalc(7,6,2), posCalc(4,4,1)],
            [posCalc(8,7,2), posCalc(4,4,1)],
            [posCalc(6,5,2), posCalc(5,5,1)],
            [posCalc(7,6,2), posCalc(5,5,1)],
            [posCalc(8,7,2), posCalc(5,5,1)] ],
          [ [posCalc(9,7,3), posCalc(3,3,1)],
            [posCalc(10,8,3), posCalc(3,3,1)],
            [posCalc(11,9,3), posCalc(3,3,1)],
            [posCalc(9,7,3), posCalc(4,4,1)],
            [posCalc(10,8,3), posCalc(4,4,1)],
            [posCalc(11,9,3), posCalc(4,4,1)],
            [posCalc(9,7,3), posCalc(5,5,1)],
            [posCalc(10,8,3), posCalc(5,5,1)],
            [posCalc(11,9,3), posCalc(5,5,1)] ],
          [ [posCalc(3,3,1), posCalc(6,5,2)],
            [posCalc(4,4,1), posCalc(6,5,2)],
            [posCalc(5,5,1), posCalc(6,5,2)],
            [posCalc(3,3,1), posCalc(7,6,2)],
            [posCalc(4,4,1), posCalc(7,6,2)],
            [posCalc(5,5,1), posCalc(7,6,2)],
            [posCalc(3,3,1), posCalc(8,7,2)],
            [posCalc(4,4,1), posCalc(8,7,2)],
            [posCalc(5,5,1), posCalc(8,7,2)] ]
        ]

sqColor = [ [ i for ii in range(9) ] for i in range(6) ]

drawColor = [ pygame.Color(0,0,255),
              pygame.Color(255,0,0),
              pygame.Color(255,255,0),
              pygame.Color(255,100,0),
              pygame.Color(255,255,255),
              pygame.Color(0,255,0),
              pygame.Color(100,100,100) ]

# Initialize
pygame.init()

window1Width = posCalc(12,10,3)
window1Height = posCalc(9,8,2)
window1 = pygame.display.set_mode((window1Width,window1Height))

# Functions

def menu():
    menuSelect = 0
    while menuSelect == 0:
        print('''
Main Menu:
(1) Free Play
(2) Set up Cube
(8) Reset Cube
(9) Quit
''')
        comm = input('Select Menu Option: ')
        if comm in ['1', '2', '8', '9']:
            menuSelect = 1
    return int(comm)

def drawIt1(window, sqPos, sqColor, drawColor):
    for i in range(6):
        for ii in range(9):
            pygame.draw.rect(window,
                             drawColor[sqColor[i][ii]],
                             ( sqPos[i][ii][0], sqPos[i][ii][1], boxSize, boxSize ),
                             0)
    pygame.display.update()
    return

def playCube(sqColor):
    commands = ["U", "U2", "U'",
                "L", "L2", "L'",
                "F", "F2", "F'",
                "R", "R2", "R'",
                "B", "B2", "B'",
                "D", "D2", "D'"]
    commList = "U, U2, U', L, L2, L', F, F2, F', R, R2, R', B, B2, B', D, D2, D'"
    comm = ""
    while comm != "Q":
        print("Execute which command?")
        print(commList)
        comm = input('Select a move or Q to quit: ' )
        if comm in commands:
            sqColor = permute(comm, sqColor)
        drawIt1(window1, sqPos, sqColor, drawColor)

def permute(comm, sqColor):
    if comm == "U":
        swap4(sqColor, [0,0], [0,2], [0,8], [0,6])
        swap4(sqColor, [0,1], [0,5], [0,7], [0,3])
        swap4(sqColor, [4,0], [3,0], [2,0], [1,0])
        swap4(sqColor, [4,1], [3,1], [2,1], [1,1])
        swap4(sqColor, [4,2], [3,2], [2,2], [1,2])
    elif comm == "U2":
        swap2(sqColor, [0,0], [0,8])
        swap2(sqColor, [0,2], [0,6])
        swap2(sqColor, [0,1], [0,7])
        swap2(sqColor, [0,3], [0,5])
        swap2(sqColor, [1,0], [3,0])
        swap2(sqColor, [1,1], [3,1])
        swap2(sqColor, [1,2], [3,2])
        swap2(sqColor, [2,0], [4,0])
        swap2(sqColor, [2,1], [4,1])
        swap2(sqColor, [2,2], [4,2])
    elif comm == "U'":
        swap4(sqColor, [0,0], [0,6], [0,8], [0,2])
        swap4(sqColor, [0,1], [0,3], [0,7], [0,5])
        swap4(sqColor, [1,0], [2,0], [3,0], [4,0])
        swap4(sqColor, [1,1], [2,1], [3,1], [4,1])
        swap4(sqColor, [1,2], [2,2], [3,2], [4,2])
    elif comm == "L":
        swap4(sqColor, [1,0], [1,2], [1,8], [1,6])
        swap4(sqColor, [1,1], [1,5], [1,7], [1,3])
        swap4(sqColor, [0,0], [2,0], [5,0], [4,8])
        swap4(sqColor, [0,3], [2,3], [5,3], [4,5])
        swap4(sqColor, [0,6], [2,6], [5,6], [4,2])
    elif comm == "L2":
        swap2(sqColor, [1,0], [1,8])
        swap2(sqColor, [1,2], [1,6])
        swap2(sqColor, [1,1], [1,7])
        swap2(sqColor, [1,3], [1,5])
        swap2(sqColor, [2,0], [4,8])
        swap2(sqColor, [2,3], [4,5])
        swap2(sqColor, [2,6], [4,2])
        swap2(sqColor, [0,0], [5,0])
        swap2(sqColor, [0,3], [5,3])
        swap2(sqColor, [0,6], [5,6])
    elif comm == "L'":
        swap4(sqColor, [1,0], [1,6], [1,8], [1,2])
        swap4(sqColor, [1,1], [1,3], [1,7], [1,5])
        swap4(sqColor, [5,0], [2,0], [0,0], [4,8])
        swap4(sqColor, [5,3], [2,3], [0,3], [4,5])
        swap4(sqColor, [5,6], [2,6], [0,6], [4,2])
    elif comm == "F":
        swap4(sqColor, [2,0], [2,2], [2,8], [2,6])
        swap4(sqColor, [2,1], [2,5], [2,7], [2,3])
        swap4(sqColor, [0,6], [3,0], [5,2], [1,8])
        swap4(sqColor, [0,7], [3,3], [5,1], [1,5])
        swap4(sqColor, [0,8], [3,6], [5,0], [1,2])
    elif comm == "F2":
        swap2(sqColor, [2,0], [2,8])
        swap2(sqColor, [2,2], [2,6])
        swap2(sqColor, [2,1], [2,7])
        swap2(sqColor, [2,3], [2,5])
        swap2(sqColor, [0,6], [5,2])
        swap2(sqColor, [0,7], [5,1])
        swap2(sqColor, [0,8], [5,0])
        swap2(sqColor, [1,8], [3,0])
        swap2(sqColor, [1,5], [3,3])
        swap2(sqColor, [1,2], [3,6])
    elif comm == "F'":
        swap4(sqColor, [2,0], [2,6], [2,8], [2,2])
        swap4(sqColor, [2,1], [2,3], [2,7], [2,5])
        swap4(sqColor, [0,6], [1,8], [5,2], [3,0])
        swap4(sqColor, [0,7], [1,5], [5,1], [3,3])
        swap4(sqColor, [0,8], [1,2], [5,0], [3,6])
    elif comm == "R":
        swap4(sqColor, [3,0], [3,2], [3,8], [3,6])
        swap4(sqColor, [3,1], [3,5], [3,7], [3,3])
        swap4(sqColor, [0,8], [4,0], [5,8], [2,8])
        swap4(sqColor, [0,5], [4,3], [5,5], [2,5])
        swap4(sqColor, [0,2], [4,6], [5,2], [2,2])
    elif comm == "R2":
        swap2(sqColor, [3,0], [3,8])
        swap2(sqColor, [3,2], [3,6])
        swap2(sqColor, [3,1], [3,7])
        swap2(sqColor, [3,3], [3,5])
        swap2(sqColor, [2,2], [4,6])
        swap2(sqColor, [2,5], [4,3])
        swap2(sqColor, [2,8], [4,0])
        swap2(sqColor, [0,8], [5,8])
        swap2(sqColor, [0,5], [5,5])
        swap2(sqColor, [0,2], [5,2])
    elif comm == "R'":
        swap4(sqColor, [3,0], [3,6], [3,8], [3,2])
        swap4(sqColor, [3,1], [3,3], [3,7], [3,5])
        swap4(sqColor, [0,8], [2,8], [5,8], [4,0])
        swap4(sqColor, [0,5], [2,5], [5,5], [4,3])
        swap4(sqColor, [0,2], [2,2], [5,2], [4,6])
    elif comm == "B":
        swap4(sqColor, [4,0], [4,2], [4,8], [4,6])
        swap4(sqColor, [4,1], [4,5], [4,7], [4,3])
        swap4(sqColor, [0,2], [1,0], [5,6], [3,8])
        swap4(sqColor, [0,1], [1,3], [5,7], [3,5])
        swap4(sqColor, [0,0], [1,6], [5,8], [3,2])
    elif comm == "B2":
        swap2(sqColor, [4,0], [4,8])
        swap2(sqColor, [4,2], [4,6])
        swap2(sqColor, [4,1], [4,7])
        swap2(sqColor, [4,3], [4,5])
        swap2(sqColor, [0,2], [5,6])
        swap2(sqColor, [0,1], [5,7])
        swap2(sqColor, [0,0], [5,8])
        swap2(sqColor, [1,0], [3,8])
        swap2(sqColor, [1,3], [3,5])
        swap2(sqColor, [1,6], [3,2])
    elif comm == "B'":
        swap4(sqColor, [4,0], [4,6], [4,8], [4,2])
        swap4(sqColor, [4,1], [4,3], [4,7], [4,5])
        swap4(sqColor, [0,2], [3,8], [5,6], [1,0])
        swap4(sqColor, [0,1], [3,5], [5,7], [1,3])
        swap4(sqColor, [0,0], [3,2], [5,8], [1,6])
    elif comm == "D":
        swap4(sqColor, [5,0], [5,2], [5,8], [5,6])
        swap4(sqColor, [5,1], [5,5], [5,7], [5,3])
        swap4(sqColor, [1,6], [2,6], [3,6], [4,6])
        swap4(sqColor, [1,7], [2,7], [3,7], [4,7])
        swap4(sqColor, [1,8], [2,8], [3,8], [4,8])
    elif comm == "D2":
        swap2(sqColor, [5,0], [5,8])
        swap2(sqColor, [5,2], [5,6])
        swap2(sqColor, [5,1], [5,7])
        swap2(sqColor, [5,3], [5,5])
        swap2(sqColor, [1,6], [3,6])
        swap2(sqColor, [1,7], [3,7])
        swap2(sqColor, [1,8], [3,8])
        swap2(sqColor, [2,6], [4,6])
        swap2(sqColor, [2,7], [4,7])
        swap2(sqColor, [2,8], [4,8])
    elif comm == "D'":
        swap4(sqColor, [5,0], [5,6], [5,8], [5,2])
        swap4(sqColor, [5,1], [5,3], [5,7], [5,5])
        swap4(sqColor, [1,6], [4,6], [3,6], [2,6])
        swap4(sqColor, [1,7], [4,7], [3,7], [2,7])
        swap4(sqColor, [1,8], [4,8], [3,8], [2,8])

    return sqColor

def swap4(sqColor, p0, p1, p2, p3):
    temp = sqColor[p0[0]][p0[1]]
    sqColor[p0[0]][p0[1]] = sqColor[p3[0]][p3[1]]
    sqColor[p3[0]][p3[1]] = sqColor[p2[0]][p2[1]]
    sqColor[p2[0]][p2[1]] = sqColor[p1[0]][p1[1]]
    sqColor[p1[0]][p1[1]] = temp
    return sqColor

def swap2(sqColor, p0, p1):
    temp = sqColor[p0[0]][p0[1]]
    sqColor[p0[0]][p0[1]] = sqColor[p1[0]][p1[1]]
    sqColor[p1[0]][p1[1]] = temp
    return sqColor

def setCube(sqColor):
    done = 0
    print('''
 Faces             Positions
  0                  0 1 2
1 2 3 4              3 4 5
  5                  6 7 8

0 = Blue, 1 = Red, 2 = Yellow, 3 = Orange, 4 = White, 5 = Green, 6 = Gray

Use Face = -1 to quit
''')
    face = input('Enter a face to change: ')
    if face in ['0', '1', '2', '3', '4', '5']:
        pos = input('Enter a position to chance: ')
        if pos in ['0', '1', '2', '3', '4', '5', '6', '7', '8']:
            color = input('Enter the color: ')
            if color in ['0', '1', '2', '3', '4', '5', '6']:
                face = int(face)
                pos = int(pos)
                color = int(color)
                sqColor[face][pos] = color
    elif face == '-1':
        done = 1
    return done, sqColor

### Main program

comm = 0
while comm != 9:
    comm = menu()
    print(comm)
    if comm == 1:
        print('Free Play!!!')
        drawIt1(window1, sqPos, sqColor, drawColor)
        playCube(sqColor)
    elif comm == 2:
        finished = 0
        while finished == 0:
            drawIt1(window1, sqPos, sqColor, drawColor)
            finished, sqColor = setCube(sqColor)
    elif comm == 8:
        sqColor = [ [ i for ii in range(9) ] for i in range(6) ]
    elif comm == 9:
        quit()

Monday, November 7, 2016

Displaying the Unfolded Cube

There isn't that much to this first step. All I wanted to do was to be able to display an unfolded cube. But because I didn't want to have to calculate the positions of everything by hand, I wrote a function that let me compute the locations of the squares based on three constants: the small gap size (between squares on the same face), the large gap size (between the faces), and the box size (the actual boxes). After having done all of this, I think I probably could have written a more general position calculator, but I'm not going to go back and rewrite this in order to do that. Also, I think this is the first time I've used global variables. My understanding is that if you define a variable before you define functions, you get a global variable. That seems to be what happens, anyway.

I've defined the colors of the faces to match the physical cube that I have (with blue on top... for now). The locations are given by a two dimensional array. The first dimension gives the face (0 = U, 1 = L, 2 = F, 3 = R, 4 = B, 5 = D) and the second dimension gives the position (0-8 starting from the top left and working across the rows and down the columns). I also added a 7th color (gray) for when I'm searching for moves that permute certain pieces but that do not care what happens to certain other pieces.

The next task will be to program in the permutations obtained from the 18 basic movements. I thought about just programming in the 6 clockwise rotations and then having the others generated by repeating these, but I suspect that I'll be checking millions of positions, so that would just make things inefficient. When it comes time to writing the search algorithm, I may need to also write in a "don't be dumb" check that doesn't let me do two moves on the same face in a row. But I'll get to that one probably in the next few days.

I've set myself up to have two windows. One to show the unfolded cube and the other to show a 3D cube. I actually want to use the second window to display move sequences with a visual aid to help me get it right. It's not necessary, but it just seemed like a fun thing to try. That will be done towards the end of the project.

Here's the existing code:
import pygame

# Global Variables
sGap = 5
bGap = 10
boxSize = 30

# Functions

def posCalc(bS, sG, bG):
    return bS*boxSize + sG*sGap + bG*bGap

def drawIt1(window, sqPos, sqColor, drawColor):
    for i in range(6):
        for ii in range(9):
            pygame.draw.rect(window,
                             drawColor[sqColor[i][ii]],
                             ( sqPos[i][ii][0], sqPos[i][ii][1], boxSize, boxSize ),
                             0)
    pygame.display.update()
    return


# Define drawing data

sqPos = [ [ [posCalc(3,3,1), posCalc(0,1,0)],
            [posCalc(4,4,1), posCalc(0,1,0)],
            [posCalc(5,5,1), posCalc(0,1,0)],
            [posCalc(3,3,1), posCalc(1,2,0)],
            [posCalc(4,4,1), posCalc(1,2,0)],
            [posCalc(5,5,1), posCalc(1,2,0)],
            [posCalc(3,3,1), posCalc(2,3,0)],
            [posCalc(4,4,1), posCalc(2,3,0)],
            [posCalc(5,5,1), posCalc(2,3,0)] ],
          [ [posCalc(0,1,0), posCalc(3,3,1)],
            [posCalc(1,2,0), posCalc(3,3,1)],
            [posCalc(2,3,0), posCalc(3,3,1)],
            [posCalc(0,1,0), posCalc(4,4,1)],
            [posCalc(1,2,0), posCalc(4,4,1)],
            [posCalc(2,3,0), posCalc(4,4,1)],
            [posCalc(0,1,0), posCalc(5,5,1)],
            [posCalc(1,2,0), posCalc(5,5,1)],
            [posCalc(2,3,0), posCalc(5,5,1)] ],
          [ [posCalc(3,3,1), posCalc(3,3,1)],
            [posCalc(4,4,1), posCalc(3,3,1)],
            [posCalc(5,5,1), posCalc(3,3,1)],
            [posCalc(3,3,1), posCalc(4,4,1)],
            [posCalc(4,4,1), posCalc(4,4,1)],
            [posCalc(5,5,1), posCalc(4,4,1)],
            [posCalc(3,3,1), posCalc(5,5,1)],
            [posCalc(4,4,1), posCalc(5,5,1)],
            [posCalc(5,5,1), posCalc(5,5,1)] ],
          [ [posCalc(6,5,2), posCalc(3,3,1)],
            [posCalc(7,6,2), posCalc(3,3,1)],
            [posCalc(8,7,2), posCalc(3,3,1)],
            [posCalc(6,5,2), posCalc(4,4,1)],
            [posCalc(7,6,2), posCalc(4,4,1)],
            [posCalc(8,7,2), posCalc(4,4,1)],
            [posCalc(6,5,2), posCalc(5,5,1)],
            [posCalc(7,6,2), posCalc(5,5,1)],
            [posCalc(8,7,2), posCalc(5,5,1)] ],
          [ [posCalc(9,7,3), posCalc(3,3,1)],
            [posCalc(10,8,3), posCalc(3,3,1)],
            [posCalc(11,9,3), posCalc(3,3,1)],
            [posCalc(9,7,3), posCalc(4,4,1)],
            [posCalc(10,8,3), posCalc(4,4,1)],
            [posCalc(11,9,3), posCalc(4,4,1)],
            [posCalc(9,7,3), posCalc(5,5,1)],
            [posCalc(10,8,3), posCalc(5,5,1)],
            [posCalc(11,9,3), posCalc(5,5,1)] ],
          [ [posCalc(3,3,1), posCalc(6,5,2)],
            [posCalc(4,4,1), posCalc(6,5,2)],
            [posCalc(5,5,1), posCalc(6,5,2)],
            [posCalc(3,3,1), posCalc(7,6,2)],
            [posCalc(4,4,1), posCalc(7,6,2)],
            [posCalc(5,5,1), posCalc(7,6,2)],
            [posCalc(3,3,1), posCalc(8,7,2)],
            [posCalc(4,4,1), posCalc(8,7,2)],
            [posCalc(5,5,1), posCalc(8,7,2)] ]
        ]

sqColor = [ [ i for ii in range(9) ] for i in range(6) ]

drawColor = [ pygame.Color(0,0,255),
              pygame.Color(255,0,0),
              pygame.Color(255,255,0),
              pygame.Color(255,140,0),
              pygame.Color(255,255,255),
              pygame.Color(0,255,0),
              pygame.Color(100,100,100) ]

### Main program
# Initialize
pygame.init()

window1Width = posCalc(12,10,3)
window1Height = posCalc(9,8,2)
window1 = pygame.display.set_mode((window1Width,window1Height))
drawIt1(window1, sqPos, sqColor, drawColor)

Sunday, November 6, 2016

Speed Cubing Project

It's been a long time since I've last pulled out the Raspberry Pi to play with it. I've been watching a lot of videos about Python from PythonProgramming.net, which has been extremely enlightening (though I haven't actually done anything with it). For a short period of time I worked on some programming stuff to try to get some horse racing data, but I just don't have the free time to put into these projects as I would like.

However, I think I found a project that's at an appropriate level for the time that I have. Over the summer, I discovered that my friend's son got interested in speed cubing. I played around with the Rubik's cube a whole lot when I was in high school, and so this discovery piqued my interest a bit. Then this semester I discovered that I had a student that was interested in speed cubing. And I think that put things over the top. So I bought myself a proper speed cube, which arrived in the mail today.

There are basically two aspects of speed cubing. One is just the raw memorization of algorithms to move the pieces to certain places, and the other is the physical technique of manipulating the cube. So my idea is to create a program that's both an algorithm search program as well as an algorithm optimizer. It's true that there are lots of algorithms published on the internet to do all sorts of things. I suspect that those are fairly optimal with regards to the number of moves it takes to execute them. But they may not be optimal with regards to the actual physical manipulation of the cube. For example, RUR' is a very fluid motion, but BUB' is not. I actually think it would be faster to physically rotate the cube and then do RUR' even though the latter is four movements.

So I envision building out this program in steps. I'm going to first build the algorithm search engine, and then I'm going modify that engine so that it takes into account some data about the speed of movements. Or perhaps in the process of finding algorithms, I may discover that there are some moves that I simply need to practice. For example, repeating RUR'U' is a common way to just fiddle around with the cube. And by doing this over and over again, there's a bit of muscle memory that's created to make patterns such as RUR' become more fluid and more automatic. So I might have to just create patterns like that to play with.

This project is somewhat reminiscent of a project I did back in high school. I was frustrated with a permutation game called Triple Cross, and so I programmed a computer to execute a search for an algorithm to solve it. For that game, there were really only four basic motions, so it was a simple tree which made writing the search algorithm easy. The Rubik's Cube has 18 motions, so there will be a much slower search process. However, since computers are significantly faster than they were back in 1998, it's possible that this won't really be an issue. As I'm thinking about potential pitfalls along the way, I'm wondering whether I'll need to make a library of permutations to speed up the search process.

I'm also thinking about developing a display module to show the moves just to get myself used to thinking about the moves using the standard language and notation for speed cubing. If I'm holding a cube in front of me, it's far from natural for me to manipulate it by just reading a string of letters. That display module will probably just come from making drawings using PyGame. It shouldn't be that hard.

Friday, March 18, 2016

Perceptron Learning Algorithm

I haven't given up on the TTT project. But I've decided to take some time to play with a few other things and come back to the TTT later.

I've been reading through "Python Machine Learning" by Sebastian Raschka. There are lots of code examples that are provided, but in order for me to ensure that I'm understanding things, I'm trying to rewrite pieces of code on my own. The rewritten code will almost certainly be less efficient, but that's not really the point. I want to understand this stuff.

Chapter 2 shows a programming example of the perceptron learning algorithm. I've rewritten that algorithm using lists instead of arrays. This is partly that desire to make sure I understand (lists give me a natural way to perform calculations in a sequential manner, allowing me to break it down step-by-step), and partly because I'm just a little bit lazy and am not immediately interested in learning the syntax of arrays in numpy. I'll have to get there eventually, but right now I'm more interested in the learning algorithms. Besides, if I'm ever in search of an application, I'd probably just use pre-existing code.

The perceptron algorithm is a classification algorithm. It works with data that is linearly separable. Imagine a line on the plane, and one side has only Xs and the other has only Os. The perceptron algorithm seeks to generate that line (or rather, a line that can also serve as the boundary between the data points). The algorithm generates a vector w whose length is one longer than the dimension of the space. To predict whether the outcome is an X or an O, the dot product of w and a vector made of 1 followed by the coordinates of the point is calculated. The prediction is based on whether the result of that dot product is positive or negative.

The algorithm is written in two pieces. One command is called "predict" and this takes the weight vector and a location, and it outputs either a 1 or a -1 depending on the value of the dot product. The other piece is the actual learning algorithm. If I did this correctly, I've actually found an error in the book. The book claims that the algorithm should update the weights simultaneously, but based on what I programmed, it doesn't do that. There may be some technicality in how it runs as the author wrote it, but I'm not interested in pursuing that too deeply.

The next step for me is to rewrite the program that graphically displays this information. The visualization piece is important for me to learn. As a side note, I did not program this on the Raspberry Pi. However, it should run just fine because it's only using lists. I'm not so sure what will happen when I get to the visualization piece, but we'll see what happens when I get there.

def PerceptronFit(eta, n_iter, X, y):
    '''
    X is a list where each list element is a list containing the values of a particular sample
    -- This is the input data
    y is a list
    -- This list represents the correct classifications
    '''
 
    # Make an empty list that has one more slot than the number of features
    # This list holds the weights
    w = [0.0] * (len(X[0]) + 1)
 
    # Make an empty list that will track the number of errors
    errors = []
 
    # Make an empty list that will retain the weights for each iteration
    w_iter = []
 
    # Begin learning
    for iteration in range(n_iter):
        n_errors = 0
        cur_w = w[:]
        # changing cur_w to just using w will change the code to match the book's output
        for xi, target in zip(X, y):
            update = eta * (target - predict(cur_w, xi))
            w[0] += update
            for n in range(1,len(w)):
                w[n] += update * xi[n-1]
            if update != 0:
                n_errors += 1
        errors.append(n_errors)
        w_iter.append(w[:])
    return w_iter, errors
 
# Function to calculate the prediction {0,1} based on the weight and the input
def predict(w, xi):
    net_input = w[0]
    for n in range(1, len(w)):
        net_input += w[n] * xi[n-1]
    if net_input >= 0.0:
        return 1
    else:
        return -1

Monday, February 15, 2016

Synchronous Value Iteration is a success for Tic-Tac-Toe

It took a while, but I finally got the programming bugs worked out to make VI a success. I made a few mistakes in the setup, from not properly understanding what happens in Python when you set lists equal to each other, to making a conceptual error in the evaluation of the values of various positions. However, none of that matters now because those problems have all been worked out.

Here are some of the highlights of the results:
  • It works! Or at least, it mostly works. It successfully figured out to block every play that's one away from winning. I've played out some sample games below, but I've also spot-checked a few other positions to see if the computer would make the "obvious" decision.
  • The assumptions that it makes is that the opposing player will pick randomly. This means that it did not learn any opponent-specific information. There are some interesting consequences to this, such as the second player making a mistake if you open with an edge move.
  • The "best" decision is not what's worst for the opposing player. The machine learning algorithm wasn't constructed with a two player game in mind. There are situations that the first player analyzes in a certain way because of the assumption that the opponent will play randomly. It does not make a decision based on what move puts the opponent in the worst possible position.
  • The program resolves both player's actions, not just the first player. This was accomplished by having the second player minimize the value and the first player maximize the value.
  • I also set draws to have a value of zero. I had originally wanted to set draws to be -1 to punish the first player for not winning. But when I found myself solving both players at the same time, it seemed more sensible to do this.
  • The values converged at the end of the 4th epoch. Just as a side note, the 0th epoch is where V(s) = 0 for every state s.
Here are a few games played using the calculated policy.
  • X opens on the corner, O plays the center, X picks an adjacent edge, and the game plays itself to a draw. (This is how every game would play if the computer just plays itself.)
  • X opens on an edge, O plays an opposite corner (this is a mistake -- playing the center forces a draw), X plays the corner on same side as both the first X and the O, which can be played to a win for X.
  • X opens in the center, O takes a corner, and X picks up the edge next to the O. After O blocks, X takes the other edge next to the first O, and this ends in a draw.
Playing to do maximal harm has X opening in the center (and playing to a draw). If X opens on the corner, O's maximal harm move is the center (and all moves at the next level are equally harmful).

Lastly, here's the code. It's built off of values that were created by the position enumeration program. Once again, it's not thoroughly commented.
### Load States and Rewards
def LoadSR():
    S = ['000000000']
    R = [0]

    inputFile = open("TTTValidGames.txt","r")

    # For each line of input, convert it to a state string and extract
    # a reward
    for line in inputFile:
        game = line.split()
        S.append(game[0]+game[1]+game[2] \
                 +game[3]+game[4]+game[5]\
                 +game[6]+game[7]+game[8])
        if game[9] == "1":
            R.append(1.0)
        elif game[9] == "2":
            R.append(-1.0)
        elif game[9] == "-1":
            R.append(0.0)
        else:
            R.append(0.0)
    inputFile.close()
    return S, R

### Determine whose turn it is
def WhoseTurn(state):
    CountMoves = 0
    for i in range(0,len(state)):
        if state[i] != '0':
            CountMoves += 1
    if CountMoves % 2 == 0:
        return '1'
    else:
        return '2'


### Calculate OutStates and Transitions
def GetOutStatesAndMoves(S):
    OutStates = []
    OutMoves = []
    for state in S:
        game = StateToInt(state)
        XWin = CheckXWin(game)
        OWin = CheckOWin(game)
        if XWin == 1 or OWin == 2 or XWin == -1:
            OutStates.append([])
            OutMoves.append([])
        else:
            MovesFromHere = []
            PositionsFromHere = []
            player = WhoseTurn(state)

            for i in range(0, len(state)):
                if state[i] == '0':
                    tempstate = ""
                    for j in range(0,i):
                        tempstate += state[j]
                    tempstate += player
                    for j in range(i+1, len(state)):
                        tempstate += state[j]
                    if ValidPosition(tempstate) == 1:
                        MovesFromHere.append(i)
                        PositionsFromHere.append(tempstate)
            OutStates.append(PositionsFromHere)
            OutMoves.append(MovesFromHere)
    return OutStates, OutMoves

### Calculate transition probabilities
def GetTransitionProbs(Moves):
    Probs = []
    for MoveList in Moves:
        if len(MoveList) > 0:
            ThisProb = [1.0/(len(MoveList))] * len(MoveList)
            Probs.append(ThisProb)
        else:
            Probs.append([])
    return Probs

### Convert state string to numerical array
def StateToInt(state):
    game = []
    for i in range(0, len(state)):
        game.append(int(state[i]))
    return game

### Check if a position is valid
def ValidPosition(InState):
    game = StateToInt(InState)
    XWin = CheckXWin(game)
    OWin = CheckOWin(game)
    if XWin == 1 and OWin == 2:
        return 0
    return 1

# Check if X wins
def CheckXWin(game):
    winner = 0
    if game[0]*game[1]*game[2] == 1 \
      or game[3]*game[4]*game[5] == 1 \
      or game[6]*game[7]*game[8] == 1 \
      or game[0]*game[3]*game[6] == 1 \
      or game[1]*game[4]*game[7] == 1 \
      or game[2]*game[5]*game[8] == 1 \
      or game[0]*game[4]*game[8] == 1 \
      or game[2]*game[4]*game[6] == 1:
        winner = 1
    elif game[0]*game[1]*game[2]*game[3]*game[4]*game[5]*game[6]*game[7]*game[8] > 0:
        winner = -1
    return winner

# Check if O wins
def CheckOWin(game):
    winner = 0
    if game[0]*game[1]*game[2] == 8 \
      or game[3]*game[4]*game[5] == 8 \
      or game[6]*game[7]*game[8] == 8 \
      or game[0]*game[3]*game[6] == 8 \
      or game[1]*game[4]*game[7] == 8 \
      or game[2]*game[5]*game[8] == 8 \
      or game[0]*game[4]*game[8] == 8 \
      or game[2]*game[4]*game[6] == 8:
        winner = 2
    elif game[0]*game[1]*game[2]*game[3]*game[4]*game[5]*game[6]*game[7]*game[8] > 0:
        winner = -1
    return winner

### Main Program
max_epochs = 20
gamma = 0.9

S, R = LoadSR()
OutStateList, MoveList = GetOutStatesAndMoves(S)
TProb = GetTransitionProbs(MoveList)

# Begin Learning
VOld = [0.0] * len(S)
VNew = [0.0] * len(S)
P = [-2] * len(S)

for epoch in range(0,max_epochs):
    for state in S:
        stateIndex = S.index(state)
        if len(OutStateList[stateIndex]) == 0:
            VNew[stateIndex] = R[stateIndex]
        else:
            comp = []
            if WhoseTurn(S[stateIndex]) == '1':
                for secondState in OutStateList[stateIndex]:
                    secondIndex = S.index(secondState)
                    if len(OutStateList[secondIndex]) == 0:
                        comp.append(R[secondIndex])
                    else:
                        tempValue = 0.0
                        for i in range(0, len(OutStateList[secondIndex])):
                            tempIndex = S.index(OutStateList[secondIndex][i])
                            tempValue += TProb[secondIndex][i] * VOld[tempIndex]
                        comp.append(tempValue)
                VNew[stateIndex] = R[stateIndex] + gamma * max(comp)
                P[stateIndex] = MoveList[stateIndex][comp.index(max(comp))]
            else:
                for secondState in OutStateList[stateIndex]:
                    secondIndex = S.index(secondState)
                    if len(OutStateList[secondIndex]) == 0:
                        comp.append(R[secondIndex])
                    else:
                        tempValue = 0.0
                        for i in range(0, len(OutStateList[secondIndex])):
                            tempIndex = S.index(OutStateList[secondIndex][i])
                            tempValue += TProb[secondIndex][i] * VOld[tempIndex]
                        comp.append(tempValue)
                VNew[stateIndex] = R[stateIndex] + gamma * min(comp)
                P[stateIndex] = MoveList[stateIndex][comp.index(min(comp))]
#        print epoch, stateIndex, state, WhoseTurn(S[stateIndex]), VOld[stateIndex], VNew[stateIndex], P[stateIndex]
    VOld = VNew[:]
    print epoch

    OutFile = open("SynchVI"+str(epoch)+".csv", "w")
    OutFile.write("state, P, V\n")
    for i in range(0, len(S)):
        OutFile.write(S[i])
        OutFile.write(", ")
        OutFile.write(str(P[i]))
        OutFile.write(", ")
        OutFile.write(str(VNew[i]))
        OutFile.write("\n")
    OutFile.close()

Friday, February 5, 2016

TTT Enumeration Process Complete

I got busy and distracted for a while, but I think it gave my brain the time it needed to process things and find a more efficient method for enumerating the valid game states. Unfortunately, I didn't comment this code as I went, and now that it's done I don't want to go back and comment it. But this seemed to work, and the basic idea was to mimic the behavior of nested for-next loops. I created a function to increment any list because that allowed me to apply the same code to both the X positions and the O positions. (For some reason, when I was working on the code, I ended up calling the players X and Y, not X and O. I don't know why.)

There ended up being 5889 valid game states. There are some redundancies in this list based on either rotational or reflective symmetry, but I'm okay with that. There are enough game states to make this a little bit interesting. I'll have to think about how I'm going to handle that much data, especially as I start getting into calculations. I'm going to have to think about that one for a while.

Here's the (uncommented) code:

maxNum = 8

def nextInd(myList, big):
    length = len(myList)
    done = 0
    depth = 0
    while done == 0:
        if depth == length:
            done = -1
        elif myList[length-1-depth] < big - depth:
            myList[length-1-depth] += 1
            for i in range(0, depth):
                myList[length-depth+i] = myList[length-depth-1] + i + 1
            done = 1
        else:
            depth += 1
    return myList, done

def init(length):
    myList = [0] * length
    for i in range(0,length):
        myList[i] = i
    return myList

def CheckXWin(game):
    winner = 0
    if game[0]*game[1]*game[2] == 1 \
      or game[3]*game[4]*game[5] == 1 \
      or game[6]*game[7]*game[8] == 1 \
      or game[0]*game[3]*game[6] == 1 \
      or game[1]*game[4]*game[7] == 1 \
      or game[2]*game[5]*game[8] == 1 \
      or game[0]*game[4]*game[8] == 1 \
      or game[2]*game[4]*game[6] == 1:
        winner = 1
    elif game[0]*game[1]*game[2]*game[3]*game[4]*game[5]*game[6]*game[7]*game[8] > 0:
        winner = -1
    return winner

def CheckYWin(game):
    winner = 0
    if game[0]*game[1]*game[2] == 8 \
      or game[3]*game[4]*game[5] == 8 \
      or game[6]*game[7]*game[8] == 8 \
      or game[0]*game[3]*game[6] == 8 \
      or game[1]*game[4]*game[7] == 8 \
      or game[2]*game[5]*game[8] == 8 \
      or game[0]*game[4]*game[8] == 8 \
      or game[2]*game[4]*game[6] == 8:
        winner = 2
    elif game[0]*game[1]*game[2]*game[3]*game[4]*game[5]*game[6]*game[7]*game[8] > 0:
        winner = -1
    return winner

def dispGame(game):
    gameChar = []
    for i in range(0,9):
        if game[i] == 0:
            gameChar.append(" ")
        elif game[i] == 1:
            gameChar.append("X")
        elif game[i] == 2:
            gameChar.append("O")
        else:
            gameChar.append("Something bad happened!")
    print
    print " {} | {} | {}".format(gameChar[0], gameChar[1], gameChar[2])
    print "---+---+---"
    print " {} | {} | {}".format(gameChar[3], gameChar[4], gameChar[5])
    print "---+---+---"
    print " {} | {} | {}".format(gameChar[6], gameChar[7], gameChar[8])
    print

for numX in range(1,6):
    for numY in range(numX - 1, numX + 1):
        if numY < 5:
            XPos = init(numX)
            YPos = [0] * numY
            XLoop = 1
            while XLoop == 1:
                remaining = init(9)
                for i in XPos:
                    remaining.remove(i)
                YList = init(numY)
                for i in range(0,numY):
                    YPos[i] = remaining[YList[i]]
                YLoop = 1
                while YLoop == 1:
                    game = [0]*9
                    XWin = 0
                    YWin = 0
                    for i in XPos:
                        game[i] = 1
                    for i in YPos:
                        game[i] = 2
                    XWin = CheckXWin(game)
                    YWin = CheckYWin(game)
                    print "XPos = {}, YPos = {}, XWin = {}, YWin = {}".format(XPos, YPos, XWin, YWin)
                    if not (XWin == 1 and YWin == 2):
                        if XWin == 1:
                            winner = 1
                        elif YWin == 2:
                            winner = 2
                        elif (XWin == -1 and YWin == -1):
                            winner = -1
                        else:
                            winner = 0
                        print "Saving... winner = {}".format(winner)
                        f = open("TTTValidGames.txt", "a")
                        for item in game:
                            f.write(str(item))
                            f.write(" ")
                        f.write(str(winner))
                        f.write("\n")
                        f.close()

                    YList, YLoop = nextInd(YList, len(remaining)-1)
                    if YLoop == 1:
                        for i in range(0,numY):
                            YPos[i] = remaining[YList[i]]
                XPos, XLoop = nextInd(XPos, maxNum)

Thursday, January 21, 2016

Did I mention that enumeration is a slow process?

With the level of success I had with enumerating all possible games, I thought it would be easy to enumerate all possible game states, even if I did it in a less than efficient manner. Three days later, and I don't think that I'm even halfway done. The basic scheme was to take each game and deconstruct it into a series of game states. Each game state would be checked against a list of previously attained game states, and added to the list if it wasn't found. Since the process of creating those states only took about an hour, I figured that it would take maybe 8-9 hours to generate the game states. After all, each game can yield at most 9 states.

So why is it taking so long? For one thing, I forgot to take into account the time of reading through the list of attained game states. But right now, I'm only up to around 3000 game states. What I think has happened is that I have underestimated the time of reading the file. I am willing to bet that reading off the of SD card isn't particularly efficient, so that all of the opening and closing of the file and reading the data over and over again is just chewing up all sorts of time.

Nonetheless, I've included the code for completeness. I don't really have time to write a new program tonight or the next several nights, so I'm just going to let it continue to run for the next few days and see if it reaches the end. If I were to create a new program, I would do the enumeration differently. I think I would go back to the original enumeration program. Every time a move is added, it would have to be a new position. But if I remove a move, I would be returning to a previously created position. I think I've even got that type of language written into the program. It's simple enough that I could probably do it tonight if I really wanted to. But I don't.

Addendum - After thinking about it for a moment longer, I realized that the above won't quite work. There will be lots of repetitions. I'll have to do a more careful enumeration perhaps ordered by the number of moves. But then I have to worry about positions that are impossible to reach. Or do I? I'm not sure, but I think that impossible to reach positions really just mean positions where both X and O have a 3 in a row. It's possible for X or O to get a double-win with a single move. But there will never be a situation where both win. (Also, I'll always alternate turns between X and O, so there will be at most 5 Xs, which cannot make two distinct TTTs.)

## Tic-Tac-Toe Game State Generator
##
## This program takes all possible games of Tic-Tac-Toe and generates a list
## of all possible game states. This will be the foundation of the game tree
## that will be used to analyze the game for machine learning.
##
## The program will take in the ordered games created by the TTT_Enumerator
## program and use this to generate mid-game states. This will be a
## non-redundant system, so that as new states are generated they are checked
## against the existing states. This means that each game state will be listed
## exactly once.

##### Generate game board given moves list
def playGame(moves):
    game = [0]*9
    for turn in range(1,10):
        if moves[turn] > -1:
            if turn % 2 == 0:
                player = 2
            else:
                player = 1
            game[moves[turn]] = player
    return game

##### Check if the game already exists
def alreadyExists(thisGame, debug):
    checkFile = open("TTTGameStates.txt", "r")
    for line in checkFile:
        gameAsStr = line.split()
        game = [0]*9
        for i in range(0,9):
            game[i] = int(gameAsStr[i])
            if debug == 1:
                print "Checking game match"
                print "game = {}".format(game)
                print "thisGame = {}".format(thisGame)

        if game[0] == thisGame[0] \
          and game[1] == thisGame[1] \
          and game[2] == thisGame[2] \
          and game[3] == thisGame[3] \
          and game[4] == thisGame[4] \
          and game[5] == thisGame[5] \
          and game[6] == thisGame[6] \
          and game[7] == thisGame[7] \
          and game[8] == thisGame[8]:
            if debug == 1:
                print "Match!\n\n"
            checkFile.close()
            return 1
        if debug == 1:
            print "Not a match"
    checkFile.close()
    if debug == 1:
        print "No matches at all"
    return 0

##### Check for win
def checkWin(game):
    winner = 0
    # X win?
    if game[0]*game[1]*game[2] == 1 \
      or game[3]*game[4]*game[5] == 1 \
      or game[6]*game[7]*game[8] == 1 \
      or game[0]*game[3]*game[6] == 1 \
      or game[1]*game[4]*game[7] == 1 \
      or game[2]*game[5]*game[8] == 1 \
      or game[0]*game[4]*game[8] == 1 \
      or game[2]*game[4]*game[6] == 1:
        winner = 1
    elif game[0]*game[1]*game[2] == 8 \
      or game[3]*game[4]*game[5] == 8 \
      or game[6]*game[7]*game[8] == 8 \
      or game[0]*game[3]*game[6] == 8 \
      or game[1]*game[4]*game[7] == 8 \
      or game[2]*game[5]*game[8] == 8 \
      or game[0]*game[4]*game[8] == 8 \
      or game[2]*game[4]*game[6] == 8:
        winner = 2
    elif game[0]*game[1]*game[2]*game[3]*game[4]*game[5]*game[6]*game[7]*game[8] > 0:
        winner = -1
    return winner

##### Main Program
debug = 0
inputFile = open("TTTGames.txt", "r")

checkFile = open("TTTGameStates.txt", "w")
checkFile.write("0 0 0 0 0 0 0 0 0 0\n")
checkFile.close()

for line in inputFile:
    ## Get move data
    movesAsStr = line.split()
    moves = [0, -1, -1, -1, -1, -1, -1, -1, -1, -1]
    for i in range(1,10):
        moves[i] = int(movesAsStr[i])

    ## Check each step of the game to see if it already exists in the list
    stepMoves = [0, -1, -1, -1, -1, -1, -1, -1 ,-1 ,-1]
    for depth in range(1,10):
        if moves[depth] > -1:
            stepMoves[depth] = moves[depth]
            thisGame = playGame(stepMoves)
            if debug == 1:
                print "stepMoves = {}".format(stepMoves)
                print "thisGame = {}\n\n".format(thisGame)

            ## If the game does not exist, add it to the list
            if alreadyExists(thisGame, debug) == 0:
                thisGame.append(checkWin(thisGame))
                if debug == 1:
                    print "Adding thisGame"
                print "thisGame = {}".format(thisGame)
                if debug == 1:
                    print "\n"
                addToFile = open("TTTGameStates.txt", "a")
                for item in thisGame:
                    addToFile.write(str(item))
                    addToFile.write(" ")
                addToFile.write("\n")
                addToFile.close()

Monday, January 18, 2016

Enumeration is a slow process

After some contemplation, I figured out how to do a complete enumeration of all games. At least, it seems to cycle through them in a reasonable manner. I didn't really think about it until I started running the program, but there are a lot of potential TTT games. The upper limit on the number of games is 9! = 362880. Some of those strings of moves would never happen because the game would have been won before that point, so the number is probably smaller. I'm not sure how much smaller, but I'll find out when the enumeration is completed. Fortunately, this number isn't so large that it's unreasonable to do it. I think if I were to do it again, I'd have to do a bit more with checking the game states rather than looking at the moves. There are fewer than 3^9 = 19683 potential game states to enumerate (9 positions that are either blank, X, or O, and many of those aren't valid). But I'm already down this path far enough that I'll keep going with it and then try the other technique later.

I had set the program to run overnight, but something happened because the program was not running in the morning, and the data appeared to have stopped somewhere. I think I have an idea of what happened, and it's related to shifting the first move from position 0 to position 1. I made some edits and I think I solved the problem. I won't know until it finishes, which will probably be in another couple hours.

Here's the code, with comments at the top:


Addendum: I ran the simulator again, but this time through a PuTTy connection, and it went significantly faster. I think it was done in less than an hour. I didn't time it, but it might have been as little as 30 minutes. I also needed to make a couple other minor adjustments to the code to handle the program exit properly. The final output has 255168 lines, which corresponds to the number of TTT games according to this website. This gives me a great deal of confidence that everything functioned correctly.

## Tic-Tac-Toe Game enumerator
##
## This program systematically cycles through all possible games of TTT
## and exports those games to a file that can be used to generate a set
## of games for the machine learning project.
##
## Each game consists of a 10 numbers. The first number is 0, 1, or 2 and
## indicates the winner:
##    * -1 = Draw
##    * 1 = X wins
##    * 2 = O wins
## The remaining 9 numbers indicate the moves that are made and the order
## in which they were made. Unused moves (if someone wins before the 9th move)
## will be indicated by a -1.
##
## The game positions are as follows:
##
##  0 | 1 | 2
## -------+---
##  3 | 4 | 5
## ---+---+---
##  6 | 7 | 8
##
## In addition to the games, there is also a game state array which keeps track
## of how the board actually looks. This is a 9 element list consisting of 0s, 1s,
## or 2s, depending on what positions have been used.
##
## On each available move, the current player will play in the lowest available
## position as long as that move has not already been played. After each move, the
## game will be checked for a winner. If there is no winner and there are still
## available moves, the play will continue. Otherwise, the game will be output to
## a file.
##
## After a game is output, the last play will be removed and the search for new
## moves will continue with the next position higher than the one that was removed.
## If there are no plays that meet this constraint, then the next move is removed
## and the process is repeated.

##### Check for win
def checkWin(game, debug):
    winner = 0
    # X win?
    if game[0]*game[1]*game[2] == 1 \
      or game[3]*game[4]*game[5] == 1 \
      or game[6]*game[7]*game[8] == 1 \
      or game[0]*game[3]*game[6] == 1 \
      or game[1]*game[4]*game[7] == 1 \
      or game[2]*game[5]*game[8] == 1 \
      or game[0]*game[4]*game[8] == 1 \
      or game[2]*game[4]*game[6] == 1:
        winner = 1
        if debug == 1:
            print "X wins!"
    elif game[0]*game[1]*game[2] == 8 \
      or game[3]*game[4]*game[5] == 8 \
      or game[6]*game[7]*game[8] == 8 \
      or game[0]*game[3]*game[6] == 8 \
      or game[1]*game[4]*game[7] == 8 \
      or game[2]*game[5]*game[8] == 8 \
      or game[0]*game[4]*game[8] == 8 \
      or game[2]*game[4]*game[6] == 8:
        winner = 2
        if debug == 1:
            print "O wins!"
    elif game[0]*game[1]*game[2]*game[3]*game[4]*game[5]*game[6]*game[7]*game[8] > 0:
        winner = -1
        if debug == 1:
            print "Draw!"
    else:
        if debug == 1:
            print "No winner yet."
    return winner

##### Display Game State
def dispGame(game):
    gameChar = []
    for i in range(0,9):
        if game[i] == 0:
            gameChar.append(" ")
        elif game[i] == 1:
            gameChar.append("X")
        elif game[i] == 2:
            gameChar.append("O")
        else:
            gameChar.append("Something bad happened!")
    print
    print " {} | {} | {}".format(gameChar[0], gameChar[1], gameChar[2])
    print "---+---+---"
    print " {} | {} | {}".format(gameChar[3], gameChar[4], gameChar[5])
    print "---+---+---"
    print " {} | {} | {}".format(gameChar[6], gameChar[7], gameChar[8])
    print

##### Main Program
game = [0]*9
move = [-1]*9
numMoves = 0
winner = 0
testMove = 0
turn = 1
removed = -1
debug = 0

while numMoves > -1 or removed > -1:
    ## Determine whether the last loop was additive or subtractive
    ##     * removed == 0 means additive (or starting)
    ##     * removed > 0 means subtractive
    ## Establish starting point for the search for the next move
    if debug == 1:
        print "Starting main loop\n"
        print "numMoves = {}".format(numMoves)
        print "removed = {}".format(removed)
        print "move = {}".format(move)
        dispGame(game)

    if removed == -1:
        ## Start checking for valid moves from 0
        lowStart = 0
    else:
        ## Start checking for valid moves from 1 more than the move
        ## that was removed
        lowStart = removed + 1
    testMove = lowStart

    ## Whose turn is it?
    if numMoves % 2 == 1:
        turn = 2
    else:
        turn = 1

    ## Start searching for a valid move
    while testMove <= 8:
        ## Is position testMove available?
        ## If so, move there. If not, increase testMove
        if game[testMove] == 0:
            game[testMove] = turn
            move[numMoves] = testMove
            numMoves += 1
            removed = -1
            testMove = 10
        else:
            testMove += 1

    ## If no valid play was found, remove the last move
    if testMove == 9:
        numMoves -= 1
        game[move[numMoves]] = 0
        removed = move[numMoves]
        move[numMoves] = -1

    ## Test whether someone has won
    ## If so, create a new line of output and then remove the last move
    winner = checkWin(game, debug)
    if winner != 0:
        if debug == 1:
            dispGame(game)
        print "move = {}".format(move)
        f = open("TTTGames.txt", "a")
        f.write(str(winner))
        f.write(" ")
        for item in move:
            f.write(str(item))
            f.write(" ")
        f.write("\n")
        f.close()
        winner = 0
        numMoves -= 1
        game[move[numMoves]] = 0
        removed = move[numMoves]
        move[numMoves] = -1

Sunday, January 17, 2016

Reinforcement Learning for Tic-Tac-Toe

This post is purely theoretical. I want to try to frame the idea of reinforcement learning for TTT. The setup here is based on Lecture 16 of the CS 229 Machine Learning Course: https://www.youtube.com/watch?v=RtxI449ZjSc.

As an important element of this programming, I'm going to always make the opponent play randomly against whatever strategy is put forward by the learning algorithm. This makes the calculation of future states easier, and possibly something that can be solved analytically.

First, we define the Markov Decision Process (MDP). This is a 5-tuple containing the following information:
  • S = Set of States -- These are all of the valid configurations of games.
  • A = Set of Actions -- This would be the set of all valid plays at any given time. Since I'm only playing one player, this does not consider what the opponent's (random) moves might be. The example of the robot in the video was set up that the moves were fixed directions that could always be invoked. But I don't see any reason why this would need to be the case based on the formulas that are worked out.
  • P_{s_a} = State Transition Probabilities -- For any given decision, there is a probabilistic outcome and this gives the probabilities given that you are in state s and take action a. (Note: For deterministic situations, we just set the probability to be 1 for the event that happens and 0 for everything else.) In the TTT game, the outcomes are probabilistic based on the number of available moves for the opponent that are available.
  • gamma = Discount Factor -- Since these are finite length games, I can set this to be 1 without any problems. The reason that this number is less than 1 in the video is because it accounts for there being infinitely many steps each with finite value, which leads to nonconvergent sums.
  • R = Reward function -- This is +1 when I win and -1 when I either lose or draw. This is an interesting point in the programming, as it might be that setting the reward to be 0 for a draw may result in a different strategy. But I won't know until I get there. For now, I want to play to win.
I will need a couple definitions and formulas here to describe the calculations that are about to happen:
  • V^{pi}(s) = The expected payoff of playing strategy pi starting from position s. (Note: Since the starting position is an empty board, one might think that one would only need to have this for that position. However, the calculations are done in an iterative/recursive manner, and so this value will need to be calculated at all states.
  • Formula for V^{pi}(s): V^{pi}(s) = R(s) + (gamma) * \sum_{s'} P_{s pi(s)} (s') V^{pi}(s') -- These are known as Bellman's equations. For TTT, we have gamma = 1.
Bellman's equations give us a linear system of equations that we could actually solve analytically. For the TTT game, we can think of the possible game states as picking the locations of 5 Xs out of the 9 positions and letting the rest be Os. Of course, many games end before that point, so this really creates an upper bound on the number of games as C(9,5) = 9!/(5! * 4!) = 126 games. So this should turn out to be an explicit calculation that I should be able to do. I think this will be my first project. The challenge seems to be indexing the possible game states in a useful way. I need to ensure that play does not go beyond the point that someone has won. There also seems to be some problems with the precise order of play. For example, if after three moves the board is XOX across the top row, there are two distinct ways that could have come about based on which order X made his move, but this game state has the same value either way. (I think this may be solved by coming up with a useful indexing of the games.)

So this works given a particular strategy pi, which will be initialized as being random play. The next challenge comes at the learning stage. The goal is to find V^* = max_{pi} V^{pi}, which is the optimal value function, and then to find the strategy pi^* that achieves that. There's another Bellman equation for V^*: V^*(s) = R(s) + (gamma) * max_a \sum_{s'} P_{s_a}(s') V^*(s'). The strategy pi^* can be obtained (after V^* is known) by using pi^*(s) = argmax_{a} \sum_{s'} P_{s_a}(s') V^*(s').

Since the formula for V^*(s) is recursive, this is no longer something that can be calculated in a straightforward manner. This leads to the two algorithms that attempt to approximate it: Value Iteration (VI) and Policy Iteration (PI). Here's the VI algorithm:
  • Initialize V(s) = 0 for all s.
  • Repeatedly update using V(s) := R(s) + (gamma) * max_a \sum_{s'} P_{s_a}(s') V(s') -- This can be done synchronously (update all at once) or asynchronously (update V(s) for one s at a time). I want to try to program both.
And here's the algorithm for Policy Iteration:
  • Initialize pi randomly.
  • Repeat:
    • V(s) := V^{pi}
    • pi(s) := argmax_a \sum_{s'} P_{s_a}(s') V(s')
This one seems a bit more difficult to program because I need to actually solve Bellman's equations to get V^{pi}. This means solving the large system of equations that I alluded to above. But in theory, that's it.

Here's my plan of attack:
  • Step 1: Devise an indexing system for all possible games and game states so that I can cycle through them in some useful way.
  • Step 2: Implement synchronous VI
  • Step 3: Implement asynchronous VI (this should be easy after having done synchronous VI)
  • Step 4: Learn how to solve large systems of equations numerically. (There might exist something in numpy that does this already. I don't know.)
  • Step 5: Implement PI
And this would take me to the first landing point for machine learning TTT. One "flaw" of this that I can see is that Step 1 is actually where all the real work lies. After telling the program exactly what's going to happen, it can figure out how to play to win. But eventually I want to do something where I don't enumerate all of the possibilities first. I'll have to think about how I can create some sort of game simulator. And then maybe set two different programs to play against each other and learn against each other. But that's a couple steps down the line.