Machine Learning with Python Projects - Rock Paper Scissors

Tell us what’s happening:

Hi! I have a question about winning at least 60% of the games against the “abbey” bot. I’m currently using reinforcement learning (Q-learning) with a Q-table. Sometimes I reach a 60% win rate against “abbey”, but other times I only get around 50%. I’ve tried adjusting ALPHA, GAMMA, etc. I thought about using STEPS, but they don’t mimic the way “abbey” plays if I store the moves in “opponent_history”. Could you help me?
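
For reference, here is a minimal sketch of the tabular Q-learning update that the code below applies, written without NumPy so the arithmetic is easy to follow. The helper name `q_update` and the sample rows are hypothetical and only illustrate the formula. The default `alpha` and `gamma` are the values from the post.

```python
def q_update(q_row, action, reward, next_row, alpha=0.2, gamma=0.67):
    """Return the new Q-value for `action` after one tabular Q-learning step."""
    best_next = max(next_row)               # value of the best action in the next state
    td_target = reward + gamma * best_next  # reward plus discounted future value
    return q_row[action] + alpha * (td_target - q_row[action])


# Hypothetical example: a win (reward +1) from an all-zero table.
current_row = [0.0, 0.0, 0.0]
next_row = [0.0, 0.0, 0.0]
new_value = q_update(current_row, action=1, reward=1, next_row=next_row)
print(new_value)
```

With an all-zero table, a single win moves the estimate from 0 toward 1 by a fraction ALPHA, so a small ALPHA needs many games before the table reflects a pattern.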

Your code so far

import numpy as np

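# Learning rate, discount factor, and initial exploration rate.
ALPHA, GAMMA, EPSILON = 0.2, 0.67, 0.9
# 40 states = opponent-move histories of length 0 to 3 (1 + 3 + 9 + 27).
STATES, ACTIONS = 40, 3
Q = np.zeros((STATES, ACTIONS))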

states, state, guess, state_idx = [], '', None, None

def update_state(state, move, window_size=3):
    state += move
    if len(state) > window_size:
        state = state[1:]
    return state

def get_idx_state(state, states):
    if state not in states:
        states.append(state)
    return states.index(state)

def player(prev_play, opponent_history=[]):
    # Only names that are reassigned inside the function need to be declared global.
    global Q, EPSILON, state, state_idx, guess, states

    options = ['R', 'P', 'S']

    # An empty prev_play marks the first round of a match: reset all learned state.
    if prev_play == '':
        Q = np.zeros((STATES, ACTIONS))
        guess, EPSILON, states, state, state_idx = None, 0.9, [], '', None

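    if guess:
        # Reward for the previous round: +1 for a win, -1 for a loss, 0 for a tie.
        if (guess == "P" and prev_play == "R") or (
                guess == "R" and prev_play == "S") or (
                guess == "S" and prev_play == "P"):
            reward = 1
        elif (guess == "R" and prev_play == "P") or (
                guess == "S" and prev_play == "R") or (
                guess == "P" and prev_play == "S"):
            reward = -1
        else:
            reward = 0

        guess_action = options.index(guess)

        # The state is a sliding window over the opponent's most recent moves.
        next_state = update_state(state, prev_play)
        next_state_idx = get_idx_state(next_state, states)

        # Tabular Q-learning update toward reward + discounted best next value.
        td_target = reward + GAMMA * np.max(Q[next_state_idx, :])
        Q[state_idx, guess_action] += ALPHA * (td_target - Q[state_idx, guess_action])

        state = next_state

        if reward == 1:
            # Explore less after each win; clamp so EPSILON never goes negative.
            EPSILON = max(0.0, EPSILON - 0.00985)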

    state_idx = get_idx_state(state, states)

    # Epsilon-greedy: explore with probability EPSILON, otherwise pick the best known action.
    if np.random.uniform(0, 1) < EPSILON:
        action = np.random.choice(ACTIONS)
    else:
        action = np.argmax(Q[state_idx, :])

    guess = options[action]

    return guess
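
Because the result swings between roughly 50% and 60% from run to run, it may help to measure the spread over several matches instead of judging a single one. The following is a rough sketch under stated assumptions: `win_rate`, `summarize`, `random_player`, and `always_rock` are hypothetical names, the two bots are stand-ins and not the project's “abbey”, and ties are excluded from the denominator, which is an assumption about how the rate should be counted.

```python
import random

# Maps each move to the move that beats it.
BEATS = {"R": "P", "P": "S", "S": "R"}


def win_rate(player_fn, opponent_fn, num_games=1000):
    """Play one match and return wins / (wins + losses) for player_fn."""
    wins = losses = 0
    player_prev = opponent_prev = ""  # empty string signals the first round
    for _ in range(num_games):
        player_move = player_fn(opponent_prev)
        opponent_move = opponent_fn(player_prev)
        if player_move == BEATS[opponent_move]:
            wins += 1
        elif opponent_move == BEATS[player_move]:
            losses += 1
        player_prev, opponent_prev = player_move, opponent_move
    decided = wins + losses
    return wins / decided if decided else 0.0


def summarize(player_fn, opponent_fn, matches=5, num_games=1000):
    """Return (min, mean, max) win rate over several independent matches."""
    rates = [win_rate(player_fn, opponent_fn, num_games) for _ in range(matches)]
    return min(rates), sum(rates) / len(rates), max(rates)


# Stand-in bots for illustration only.
def random_player(prev_play):
    return random.choice("RPS")


def always_rock(prev_play):
    return "R"


low, mean, high = summarize(random_player, always_rock)
print(f"min={low:.2f} mean={mean:.2f} max={high:.2f}")
```

Passing the `player` function from the post as `player_fn` should work with this harness, since it resets itself whenever `prev_play` is empty. A wide gap between the minimum and maximum would suggest the variation comes from the random exploration and not from the hyperparameters alone.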

Challenge Information:

Machine Learning with Python Projects - Rock Paper Scissors