ML with Python - Rock Paper Scissors

Hello everyone!!

I’m having trouble doing this challenge.

1- As for the data to feed the model, I thought it should be sequential and paired: the ‘player’s’ move together with the opponent’s move (because the opponent’s actions are, in part, reactions to the ‘player’s’ actions). However, I didn’t see anyone doing that. Should I keep considering this approach?

2- I don’t know what to use. I thought about using an RNN (LSTM) or a CNN, but it looks like they won’t work. I saw people talking about Hidden Markov Models. Are they the most suitable for this problem? Do you have another suggestion?

Thanks!

Challenge: Rock Paper Scissors

The problem is that you don’t have a data set in advance to split into training and testing sets, so a neural network model that requires training would be difficult to implement. It’s further complicated by the fact that the computer players use algorithms that react to their opponents: even if you played against them a lot to generate training data, how you play against them would affect how they play, which would affect the training data.

So, a Markov chain algorithm is a good choice. I’m sure there are several algorithms that use past actions to predict future ones, so you could research those. There are also different modifications that can be applied to a Markov chain.
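As a minimal sketch of the idea (the names and the single-move window here are my own choices, not a prescribed solution), a first-order Markov chain just counts what the opponent played after each previous move and predicts the most frequent follow-up:

from collections import defaultdict

# transitions[last_move][next_move] = how often next_move followed last_move
transitions = defaultdict(lambda: defaultdict(int))

def record(last_move, next_move):
    transitions[last_move][next_move] += 1

def predict(last_move):
    follow_ups = transitions[last_move]
    if not follow_ups:
        return "R"  # no data yet; any default works
    # the opponent's most common follow-up to last_move
    return max(follow_ups, key=follow_ups.get)

Longer keys (the opponent’s last two or three moves) work the same way, just with more states to count.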

There used to be an online RPS competition where different bots played and were ranked, and there is some information about algorithms there. I’ve posted links to it here before in similar threads.


Thank you so much for your reply @jeremy.a.gray!
Yeah. I was thinking about playing about 300 games at random and then using the gathered data for training, shaped something like a (2 x 10 or 11) image with stride = 1. I thought it would be interesting, but even if it’s possible, I don’t have the knowledge for it right now.

I’m going to read a few things about Markov chains to get a better understanding of how to implement that. And I’m looking through your replies to find those links.

Again, thank you so much for your reply. It meant a lot. I’ve been stuck on this for some days and was really down.

I spent some time thinking along these lines, but since the other bots use algorithms that react to what you’ve played in the past, if you train at random, you would have to keep playing at random to get the same behaviour from the bot. If you used the training to play to win, the bot would adapt to what you were playing and your training wouldn’t be helpful.

Now if you really wanted to use a neural network to solve this problem, I think you would need to design a model that could discern the algorithm the bot was playing instead of just deciding what the bot would play given the current history. This seems like it could be doable, but it would be much harder than a Markov chain or a genetic algorithm.

P.S. Abbey is a Markov chain player if you want an example.


Yeah. It wouldn’t work. I hadn’t thought about that.

Yeah. I saw it in a post you made on another topic. I copied and modified it to see if it works. I’ve gotten a 58.707% win rate so far.

First, I added another play to the history (2 => 3), but it wasn’t enough. So I thought about using the “2-plays data” at the beginning and the “3-plays data” after a few plays. That improved the win rate, but by less than 1%.
Then I tried breaking the pattern a few times, which improved the win rate, but (again) by less than 1%.

I also tried resetting the data when len(opponent_history) == 200, which worked against Abbey but was terrible against Kris.
I think I’m going to focus on that and see if I can make it work against both of them.

Thanks @jeremy.a.gray!

My code so far

import itertools

def fresh_counts(length):
    # One zeroed counter for every R/P/S sequence of the given length.
    return [{"".join(seq): 0
             for seq in itertools.product("RPS", repeat=length)}]

opponent_history = ["R"]
play_order = fresh_counts(3)   # counts of the opponent's three-play sequences
play_order2 = fresh_counts(2)  # counts of the opponent's two-play sequences

def player(prev_play):
    global opponent_history
    global play_order
    global play_order2

    # An empty prev_play means a new game is starting: reset everything.
    if not prev_play:
        prev_play = "R"
        opponent_history = ["R"]
        play_order = fresh_counts(3)
        play_order2 = fresh_counts(2)
    opponent_history.append(prev_play)

    # Count the opponent's two and three most recent plays.
    last_two = "".join(opponent_history[-2:])
    last_three = "".join(opponent_history[-3:])
    if len(last_two) == 2:
        play_order2[0][last_two] += 1
    if len(last_three) == 3:
        play_order[0][last_three] += 1

    # Every move the opponent could make next, given the recent history.
    potential_plays = [
        prev_play + "P",
        prev_play + "R",
        prev_play + "S",
        last_two + "P",
        last_two + "R",
        last_two + "S"
    ]

    sub_order = {
        k: play_order[0][k]
        for k in potential_plays if k in play_order[0]
    }

    sub_order2 = {
        k: play_order2[0][k]
        for k in potential_plays if k in play_order2[0]
    }

    ideal_response = {'P': 'S', 'R': 'P', 'S': 'R'}

    # Alternate between following and breaking the pattern as the count of
    # the most-seen three-play sequence grows; fall back to the two-play
    # counts early on, when there is little three-play data.
    best_count = max(sub_order.values())
    if best_count > 60:
        prediction = max(sub_order, key=sub_order.get)[-1:]
    elif best_count > 58:
        prediction = min(sub_order, key=sub_order.get)[-1:]
    elif best_count > 55:
        prediction = max(sub_order, key=sub_order.get)[-1:]
    elif best_count > 53:
        prediction = min(sub_order, key=sub_order.get)[-1:]
    elif best_count > 5:
        prediction = max(sub_order, key=sub_order.get)[-1:]
    else:
        prediction = max(sub_order2, key=sub_order2.get)[-1:]

    return ideal_response[prediction]

I made it! I don’t think it was a fair win, but I did it!
I cleared both play_order dictionaries twice.
One thing I thought of today was to print the results after every 100 games, to see where I wasn’t winning.
I don’t know if it was fair. Supposedly, it shouldn’t be done like that. Did I cheat? Maybe. After a while, I might try to do it again without cheating.
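In case it helps anyone, here is a minimal sketch of that kind of progress check (the results list and report function are hypothetical bookkeeping of my own, not part of the challenge’s test code):

results = []  # append "W", "L", or "T" after each round

def report(game_number):
    # print a running win rate every 100 games
    if game_number % 100 == 0 and results:
        wins = results.count("W")
        print(f"game {game_number}: win rate {wins / len(results):.1%}")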

Thanks for everything @jeremy.a.gray!!

import itertools

def fresh_counts(length):
    # One zeroed counter for every R/P/S sequence of the given length.
    return [{"".join(seq): 0
             for seq in itertools.product("RPS", repeat=length)}]

opponent_history = ["R"]
play_order = fresh_counts(3)   # counts of the opponent's three-play sequences
play_order2 = fresh_counts(2)  # counts of the opponent's two-play sequences
n = 0  # total rounds played across all games

def player(prev_play):
    global opponent_history
    global play_order
    global play_order2
    global n

    n += 1
    # An empty prev_play means a new game is starting: reset everything.
    if not prev_play:
        prev_play = "R"
        opponent_history = ["R"]
        play_order = fresh_counts(3)
        play_order2 = fresh_counts(2)

    # Clear the counts twice during the run so stale data stops
    # driving the predictions.
    if n == 401:
        play_order = fresh_counts(3)
        play_order2 = fresh_counts(2)

    if n == 802:
        play_order = fresh_counts(3)
        play_order2 = fresh_counts(2)

    opponent_history.append(prev_play)

    # Count the opponent's two and three most recent plays.
    last_two = "".join(opponent_history[-2:])
    last_three = "".join(opponent_history[-3:])
    if len(last_two) == 2:
        play_order2[0][last_two] += 1
    if len(last_three) == 3:
        play_order[0][last_three] += 1

    # Every move the opponent could make next, given the recent history.
    potential_plays = [
        prev_play + "P",
        prev_play + "R",
        prev_play + "S",
        last_two + "P",
        last_two + "R",
        last_two + "S"
    ]

    sub_order = {
        k: play_order[0][k]
        for k in potential_plays if k in play_order[0]
    }

    sub_order2 = {
        k: play_order2[0][k]
        for k in potential_plays if k in play_order2[0]
    }

    ideal_response = {'P': 'S', 'R': 'P', 'S': 'R'}

    # Same strategy switching as before, keyed on the count of the
    # most-seen three-play sequence.
    best_count = max(sub_order.values())
    if best_count > 60:
        prediction = max(sub_order, key=sub_order.get)[-1:]
    elif best_count > 58:
        prediction = min(sub_order, key=sub_order.get)[-1:]
    elif best_count > 55:
        prediction = max(sub_order, key=sub_order.get)[-1:]
    elif best_count > 53:
        prediction = min(sub_order, key=sub_order.get)[-1:]
    elif best_count > 5:
        prediction = max(sub_order, key=sub_order.get)[-1:]
    else:
        prediction = max(sub_order2, key=sub_order2.get)[-1:]

    return ideal_response[prediction]

Clearing the history is not cheating, it’s just recognizing that you’re playing different players. I cleared mine between each player. Playing a straight Markov chain against Abbey is a bit dicey; using a length of 5 my bot can usually win (~61%). With a small change in perspective, you can beat her at >80% with a length 2 chain. All you’re really doing is tinkering with how much data is used by the Markov chain. Many algorithms build in some forgetfulness to keep the data current and winnow out past data.
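One common way to build in that forgetfulness (a sketch under my own assumptions; the 0.9 factor is arbitrary) is to shrink every count a little before adding each new observation, so older plays fade out of the statistics instead of being cleared all at once:

DECAY = 0.9  # closer to 1 forgets more slowly

def update(counts, pattern):
    # fade all past evidence, then count the newly observed pattern
    for key in counts:
        counts[key] *= DECAY
    counts[pattern] += 1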

I’ve got several different bots that win this and use far dirtier tricks. It’s fun to see the computer lose 99% of the time.


Thanks. Yeah. I was a little concerned it would limit my learning, but I’m exploring the possibilities, so I don’t think it will hurt. For example, if I try each of the things you mentioned, I will learn a lot.
Anyway, it’s just the insecurity that comes from having little experience and skill.

Thank you very very much! For all the answers!! I’m extremely grateful!! :smiley:
