Machine Learning - Rock-Paper-Scissors question

Hi, I’m having trouble figuring out where to start with this one.
Given how the AI works, I can rather easily write a function that counters all 4 bots 100% of the time - but only if they play in a specific order for 1000 turns each (technically I could even use the code to just hardcode the 4000 responses) - and that’s obviously not the goal.
So my issue comes down to what my goal actually is and what kind of model would be appropriate, because I am not dealing with static data and predictions, but with 4 different functions that change their output based on my input.

What bothers me is which constraints I should assume for the ML approach.
Should I treat both the order of the bots and the number of turns each one plays as random? Then maybe an LSTM would fit, though could it even handle the change in strategy?
Or are they fixed? Then I could go with Reinforcement Learning and basically let it create the 4000 ideal responses mentioned before.
In theory I could also go with a basic ML model, give it 4k inputs with no hidden layers (or a non-dense 4-node hidden layer for each bot) and let it optimize that.

None of these options feels really good, though.
I would be happy for any kind of advice on how to approach this challenge with ML rather than just coding the counter myself.

Coding the counter players is actually quite instructive, especially for the one using the short Markov chain. There are multiple ways to determine which bot your program is playing against every time, regardless of order.

Most of the RPS machine learning searches on Google lead to the myriad copies of some tutorial about recognizing human hands making the RPS signs to pit a computer against a human. If you dig, though, you can find information related to the old RPS contest that used to run online, and you can investigate some of the strategies used there. Most of them involved a Markov chain, either alone or in combination with other algorithms, together with some forgetting and some randomness.
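To give an idea of what that combination can look like, here is a minimal sketch in Python. It is not taken from any contest entry, and the class name `MarkovPlayer` and the parameters `order`, `decay` and `explore` are made up for illustration. It assumes moves are the strings "R", "P" and "S", and it only conditions on the opponent's own last few moves. Against bots that react to your input you would probably want to include your own moves in the state as well.

```python
import random
from collections import defaultdict

# For each move (key), the move that beats it (value).
BEATS = {"R": "P", "P": "S", "S": "R"}


class MarkovPlayer:
    """Hypothetical RPS player: Markov chain + forgetting + randomness."""

    def __init__(self, order=2, decay=0.9, explore=0.0, seed=None):
        self.order = order      # how many past opponent moves form a state
        self.decay = decay      # forgetting factor applied to old counts
        self.explore = explore  # probability of playing a random move
        self.rng = random.Random(seed)
        self.history = []       # opponent moves seen so far
        # state (tuple of moves) -> weighted count of the move that followed
        self.counts = defaultdict(lambda: {"R": 0.0, "P": 0.0, "S": 0.0})

    def observe(self, opponent_move):
        """Record the opponent's move and update the transition counts."""
        if len(self.history) >= self.order:
            state = tuple(self.history[-self.order:])
            row = self.counts[state]
            for move in row:
                row[move] *= self.decay  # fade out older observations
            row[opponent_move] += 1.0
        self.history.append(opponent_move)

    def next_move(self):
        """Predict the opponent's next move and return its counter."""
        too_early = len(self.history) < self.order
        if too_early or self.rng.random() < self.explore:
            return self.rng.choice("RPS")
        state = tuple(self.history[-self.order:])
        row = self.counts.get(state)  # .get avoids creating empty rows
        if row is None:
            return self.rng.choice("RPS")  # state never seen before
        predicted = max(row, key=row.get)
        return BEATS[predicted]
```

Each round you call `next_move()` to pick your play and then `observe()` with what the opponent actually played. The decay is what lets the table adapt when the opponent (or the bot you are facing) changes, and a nonzero `explore` makes the player itself harder to predict at the cost of some wins.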

The neural net methods are not really well suited to this type of problem, at least not easily. I imagine there is some way to do it by building a model and feeding it the bot and player data from hundreds or thousands of games, but I never attempted it. My advice is to use old-school machine learning and investigate a Markov chain algorithm, even though it’s lots of fun to watch the counter-bots beat the bots >90% of the time…

Ah, I see!
Well, I guess I have to look into Markov models. The videos only mentioned the hidden Markov model, and that didn’t seem suited. But I guess there is more to it, and having a model to start with is most helpful - so thanks ^^