Tell us what’s happening:
Hi! I have a question about winning at least 60% of the time against the bot “abbey.” I’m currently using reinforcement learning (Q-learning with a Q-table). Sometimes I reach a 60% win rate against “abbey,” but other times I only get around 50%. I’ve tried adjusting alpha, gamma, etc. I thought about using STEPS, but they don’t mimic the way the “abbey” bot plays if I store the moves in “opponent_history.” Do you think you could help me?
Your code so far
import numpy as np

# Learning rate, discount factor, and initial exploration rate
ALPHA, GAMMA, EPSILON = 0.2, 0.67, 0.9
# 40 states = every opponent-move string of length 0..3 (1 + 3 + 9 + 27)
STATES, ACTIONS = 40, 3
Q = np.zeros((STATES, ACTIONS))
states, state, guess, state_idx = [], '', None, None

# Each key beats its value
BEATS = {'R': 'S', 'P': 'R', 'S': 'P'}


def update_state(state, move, window_size=3):
    # Append the latest move and keep only the last `window_size` moves
    state += move
    if len(state) > window_size:
        state = state[1:]
    return state


def get_idx_state(state, states):
    # Assign each new state string the next free row of the Q-table
    if state not in states:
        states.append(state)
    return states.index(state)


def player(prev_play, opponent_history=[]):
    global Q, EPSILON, state, state_idx, guess, states
    options = ['R', 'P', 'S']

    # An empty prev_play marks the start of a new match: reset everything
    if prev_play == '':
        Q = np.zeros((STATES, ACTIONS))
        guess, EPSILON, states, state, state_idx = None, 0.9, [], '', None

    if guess:
        # Reward for the previous round: +1 win, -1 loss, 0 tie
        if BEATS[guess] == prev_play:
            reward = 1
        elif BEATS.get(prev_play) == guess:
            reward = -1
        else:
            reward = 0

        guess_action = options.index(guess)
        next_state = update_state(state, prev_play)
        next_state_idx = get_idx_state(next_state, states)

        # Q-learning update for the (state, action) pair played last round
        td_target = reward + GAMMA * np.max(Q[next_state_idx, :])
        Q[state_idx, guess_action] += ALPHA * (td_target - Q[state_idx, guess_action])
        state = next_state

        # Explore less after each win, without letting EPSILON go negative
        if reward == 1:
            EPSILON = max(0.0, EPSILON - 0.00985)

    state_idx = get_idx_state(state, states)

    # Epsilon-greedy action selection
    if np.random.uniform(0, 1) < EPSILON:
        action = np.random.choice(ACTIONS)
    else:
        action = np.argmax(Q[state_idx, :])

    guess = options[action]
    return guess
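
For reference, the Q-table line in the code is the usual tabular Q-learning update. Below is a minimal standalone sketch of that single step, so it can be checked by hand outside the game loop. The helper name `q_update` and the demo table `Q_demo` are made up for illustration and are not part of the challenge files; the default `alpha` and `gamma` simply reuse the values from the code.

```python
import numpy as np


def q_update(Q, s, a, reward, s_next, alpha=0.2, gamma=0.67):
    """Apply one tabular Q-learning update in place and return the new value.

    Hypothetical helper: it mirrors the update line in player().
    """
    # Target = immediate reward + discounted best value of the next state
    td_target = reward + gamma * np.max(Q[s_next])
    # Move the current estimate a fraction alpha toward the target
    Q[s, a] += alpha * (td_target - Q[s, a])
    return Q[s, a]


# One winning round from state 0 with action 1, landing in state 2
Q_demo = np.zeros((40, 3))
new_value = q_update(Q_demo, s=0, a=1, reward=1, s_next=2)
print(new_value)
```

Starting from an all-zero table, a single win moves the entry from 0 toward the target of 1 by a factor of alpha, so only that one cell changes.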
Your browser information:
User Agent is: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/134.0.0.0 Safari/537.36
Challenge Information:
Machine Learning with Python Projects - Rock Paper Scissors