Harder for humans, but easy to make a really strong AI for this. Even overcounting because of illegal board states (multiple winners) and not even bothering to eliminate symmetries, there are at most 2 * 3^9 = 39366 board states.
There are cycles in the board state graph, although they are of a very specific form (the only kind of cycle that exists is for board B with O and X alternating turns). So it is probably possible to make a completely deterministic and optimal algorithm for this probabilistic game, but it does sound complicated. You can't naively apply expectiminimax.
However after marking the winning board states as 0 or 1 respectively if O or X wins I would expect value iteration to very quickly converge to an optimal strategy here.
WRT to computing an exact solution, something something markov chains, transition matrices, eigenvalues. I think it is tractable
Usually those are for additive/linear systems, the problem with game theoretic graphs like these is that you alternate between max and min nodes, so the system is highly nonlinear.
You're right.
I'll work on the simpler problem of :) / :( first. I think that can be done with just minimax
And then maybe win chance for each possible state of a purely random game
If it were just solely :) / :( then it is a freshman's exercise in expectiminimax.
it turns out you don't need anything more than minimax for the general case Here's my solution https://github.com/pvillano/probabalistic-tic-tac-toe
I think this fails to take into account that your opponent can also roll 'meh', making it your turn again.
I just did: https://louisabraham.github.io/articles/probabilistic-tic-ta...
I think this is doable. Say we assign a win rate W(S) to each board state S, and let W(S, A) denote the win rate after taking action A from state S. Since the transition is probabilistic, we can write:
W(S, A) = P(good) * (1 - W(S_good)) + P(bad) * (1 - W(S_bad)) + (1 - P(good) - P(bad)) * (1 - W(S))
And obvisouly:
W(S) = max(W(S, A), foreach A in Actions)
max() is troublesome, but we can replace it with a >= sign:
W(S) >= (W(S, A), forall A in Actions)
And then we can expand W(S, A) and move W(S) in each of the inequalities to the left hand side. After all that we will have a bunch of linear inequalities which can be optimized with linear programming. I think the objective would just be:
<del>maximize</del> minimize W(empty_board)
There is an easier way to solve each recursion! I just wrote a blog post on it: https://louisabraham.github.io/articles/probabilistic-tic-ta...
I think you have an error in the equation defining V(s).
You have component n_c * V(s) for the 'nothing happened' case, but I don't think that's correct. If you rolled that nothing happens the turn still passes to your opponent, so I think it should be n_c * V'(s).
oh RIGHT. Gotta fix it
Yep, this is what I ended up doing as well! With how the game generate boards, the player that goes first always have a ~5% advantage. Since players switch hands each around they should have 50% win rate if both play optimally.
In practice, playing against author's AI I barely get ~60% win rate (small caveat, I count ties as 0.5 to both players). What about yours?
Edit: nvm I saw you did the same with ties.
Turns out linear programming is not fast... Takes about 90 minutes to find the optimal solution for any board configuration.
I think you will find it extremely difficult to do better than simply checking the probability that each square gives you a spot times the number of victory paths it opens up minus the probability that it gives your opponent a spot times the number of victory paths it opens up for them. Add another clause for paths closed if you want.
Since chance is involved, you will basically never want to do anything but the greediest highest value next action. Sometimes more than half the board has net value of 0 or less which makes them very easy to ignore.
Wouldn't you also have to take into account the probability of the following moves also being successful and giving you a win?
No because the odds are symmetric for every slot. If you have two in a row; and the third slot has higher odds that it goes to the non-roller, you should just... not roll in it.
The timing of when it gets rolled won't matter. The need to urgently consider blocking off other routes to victory will be embedded in the scoring described above.
Since passing is not an option, you can't ignore a net value of 0 or less, because all options might have a net value of 0 or less.
Sure. But there's still no conceivable situation where it is advantageous to pursue such an option while a net positive option exists. Ergo, it is easy to ignore.
IDK if it's just me, but I went 6-0. Is something wrong with the computer player logic?
What happens when you play against yourself?
100% winrate again
The OP mentioned that the AI they programmed is using a simple heuristic.
1/64 chance.
What happens if you play 6 more?