I am on the Rock Paper Scissors challenge currently for the Machine Learning with Python course. It is the final challenge I need to get my certificate. I am almost done but can’t seem to get a 60% win rate on two of the bots and I am not sure why. Would anyone mind taking a look at my code and giving me feedback?
rock_paper_scissors - Replit
The key is to increase the complexity of the Q-table. A 3x3 Q-table essentially makes its play solely by tracking the opponent's last play, something like "if the opponent's last play was rock, which response has, by experience, the greatest chance to win?". To outsmart bots that base their play on the opponent's last play ('kris') or the opponent's last two plays ('abbey'), the Q-table needs to track longer combinations of the play history (the opponent's last five plays, or the last two plays of both sides, for example). That means increasing the number of states (rows) in the Q-table.
On the other hand, since the program is required to play 1,000 games against each bot, there is a constraint on the size of the Q-table. If the Q-table were 729 (3**6) x 3, it would have 2,187 cells, which could never all be filled by random plays within 1,000 rounds.
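To make the trade-off concrete, here is a small sketch of how the cell count grows with the amount of history tracked (the helper name `q_table_cells` is mine, not from the challenge code):

```python
# Rough sketch: how the number of Q-table cells grows with the amount of
# history encoded in each state. Each play has 3 options (R, P, S).
def q_table_cells(history_length, actions=3):
    """Cells in a Q-table whose states encode `history_length` past plays."""
    return (actions ** history_length) * actions

for h in range(1, 7):
    print(h, q_table_cells(h))
```

Tracking one past play gives 9 cells, two gives 27, but six gives 2,187, already more cells than rounds in a 1,000-game match.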
Ok, I see. I tried making my Q-table more complex as you suggested, adding states for the opponent's last two plays, but my win rates actually got worse. I think I implemented it correctly, but I'm not entirely sure. As another approach, I was thinking of having a function that predicts the opponent's next move from its history and then passing that value into Q_learn(). Do you think that could also work?
I have studied your code, and I think its logic recreates the Q-table every time the "player" function is called, simulating 10 episodes of play for every state. It never really learns the behaviour of the bots.
I think for the RPS game, Q-learning should be used this way: say you want to create a Q-table keyed on the opponent's last two plays. First create a 9 by 3 matrix of zeros, then after each round of play, update the value of the corresponding cell according to the result of that round. The values of the Q-table should be kept between calls of the "player" function. If the opponent's plays follow some pattern, after a number of rounds the Q-table will show, given the opponent's last two plays, which play by the player has the biggest chance to win.
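A minimal sketch of that idea: a Q-table kept at module level so it persists between calls, keyed on the opponent's last two plays, with a simple reward-based update and epsilon-greedy move selection. The helper names (`reward`, `update_q`, `choose_move`) and the learning-rate values are my own assumptions, not the challenge's API:

```python
import random

MOVES = ["R", "P", "S"]
BEATS = {"R": "P", "P": "S", "S": "R"}  # BEATS[x] is the move that beats x

# Persistent Q-table: 9 states (opponent's last two plays) x 3 actions.
# Defined at module level so it survives across calls of the player function.
Q = {s1 + s2: {m: 0.0 for m in MOVES} for s1 in MOVES for s2 in MOVES}

def reward(my_move, opp_move):
    """+1 for a win, 0 for a tie, -1 for a loss."""
    if my_move == opp_move:
        return 0
    return 1 if BEATS[opp_move] == my_move else -1

def update_q(state, action, r, alpha=0.1):
    # Nudge the stored value toward the observed reward.
    Q[state][action] += alpha * (r - Q[state][action])

def choose_move(state, epsilon=0.1):
    # Epsilon-greedy: mostly exploit the table, occasionally explore.
    if random.random() < epsilon:
        return random.choice(MOVES)
    return max(Q[state], key=Q[state].get)
```

After each round you would call `update_q` with the state you acted from, the move you made, and the result, then use `choose_move` on the new state for the next round.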
And when we use the formula for updating the Q-table, we should take note of what "current state" and "next state" actually refer to. If the opponent's last three plays were R, then S, then P, we may be inclined to think the current state is "SP", but in the formula it is actually the "next state", and "RS" is the "current state".
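In code, that state bookkeeping with the standard Q-learning update looks roughly like this (a sketch under my own naming, assuming the usual update Q(s,a) += alpha * (r + gamma * max Q(s',·) - Q(s,a))):

```python
MOVES = ["R", "P", "S"]
# 9 states (opponent's last two plays) x 3 actions, initialised to zero.
Q = {s1 + s2: {m: 0.0 for m in MOVES} for s1 in MOVES for s2 in MOVES}

def q_update(Q, state, action, r, next_state, alpha=0.1, gamma=0.9):
    """Standard Q-learning update:
    Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))"""
    best_next = max(Q[next_state].values())
    Q[state][action] += alpha * (r + gamma * best_next - Q[state][action])

# The opponent played R, then S, then P. When updating for the round where
# the opponent revealed P, the *current* state is "RS" (what we saw before
# acting) and the *next* state is "SP".
q_update(Q, state="RS", action="P", r=1, next_state="SP")
```

Mixing up the two states makes the table learn the wrong associations, which would show up exactly as a worse win rate.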
On the other hand, since the playing strategies of three of the four bots are based on their opponent's plays, I think it's also necessary to track the previous plays of the "player" itself and include them in the states of the Q-table.
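One simple way to do that is to build each state key from the last play of both sides, which still keeps the table small (the `make_state` helper is my own illustration):

```python
MOVES = ["R", "P", "S"]

def make_state(opp_last, my_last):
    """Combine both sides' previous plays into a single state key."""
    return opp_last + my_last

# States drawn from the last play of BOTH sides: 9 states, 27 cells total,
# easily fillable within 1,000 rounds.
Q = {make_state(o, m): {a: 0.0 for a in MOVES} for o in MOVES for m in MOVES}
print(len(Q))  # number of states: 9
```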
Oh ok I see. I’ll see what I can come up with.
Thanks for the help!