Monte-Carlo graph search from first principles

modeless
7 replies
14h13m

Also, as far as the name "Monte-Carlo Tree Search" itself, readers might note that there is nothing "Monte-Carlo" in the above algorithm - that it's completely deterministic!

MCTS as commonly implemented is deterministic? How strange! I assumed there was randomness in the sampling.

kadoban
3 replies
13h36m

Originally MCTS did have randomness, yes; I think the article mentions it. The randomness came in the form of random playouts at the end of the search, used just to evaluate positions. In current similar projects this has been replaced by NN evaluations, which are higher quality (as you can probably imagine, just playing random moves to see who wins is not a great evaluation; it was just the best strategy known at the time).

So the Monte-Carlo part turned out to be inessential (and suboptimal) in what is now still called MCTS, making the name a bit unfortunate.
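
For anyone curious, a minimal sketch of what that original playout-based evaluation looks like (the game interface here, with copy/is_terminal/legal_moves/play/result_for, is hypothetical):

    import random

    def playout_value(state, player):
        # Play uniformly random moves to the end of the game and report
        # the result from `player`'s perspective (e.g. 1 win, 0 loss).
        state = state.copy()
        while not state.is_terminal():
            state.play(random.choice(state.legal_moves()))
        return state.result_for(player)

    def evaluate(state, player, n_playouts=100):
        # Average many noisy playouts; modern engines replace this whole
        # function with a single neural-net evaluation of `state`.
        return sum(playout_value(state, player)
                   for _ in range(n_playouts)) / n_playouts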

jacoblambda
1 replies
13h7m

In essence the NN evaluations are acting as a PRNG with values heavily biased by an established heuristic.

So while it's very much a specialization of MCTS and probably should go by a different name, one could argue it's still using Monte Carlo methods.

pavlov
0 replies
9h19m

Sounds like the dice are loaded in Monte Carlo.

mzl
0 replies
11h8m

Small side note: standard MCTS is often implemented with a heuristic guiding the moves in the playouts rather than purely random moves, as that gives more interesting information.
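
Something like this, with heuristic_weight standing in for whatever cheap domain knowledge you have (a sketch, not any particular engine's policy):

    import random

    def playout_move(state):
        # Sample moves in proportion to a cheap heuristic score rather
        # than uniformly; `heuristic_weight` is hypothetical and
        # domain-specific.
        moves = state.legal_moves()
        weights = [heuristic_weight(state, m) for m in moves]
        return random.choices(moves, weights=weights, k=1)[0]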

jacoblambda
1 replies
13h19m

It's technically a different algorithm just under the same "Monte Carlo" name.

However, something interesting to note is that since most applications of Monte Carlo methods rely on PRNGs instead of true RNGs, they are technically deterministic (given the same PRNG seed, you should always get the same result for a given input).
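
E.g. in Python:

    import random

    rng_a = random.Random(42)
    rng_b = random.Random(42)
    # Same seed, same sequence: a "random" search driven by rng_a is
    # reproduced exactly by rng_b, hence deterministic end to end.
    assert [rng_a.random() for _ in range(5)] == \
           [rng_b.random() for _ in range(5)]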

So what this algorithm does, instead of using a normal PRNG plus a separate heuristic, is query the neural network. This works because the neural net is a heuristic over a massive search space, which ends up acting like a very poor PRNG heavily biased towards specific results by the training of the neural net; that, in turn, looks like a PRNG with a set of heuristics applied.

The important thing to note is that this is a specialisation of MCTS, and as such shouldn't be expected to work for all use cases.

pvitz
0 replies
8h47m

I can't speak for this use case, but in many other places one isn't even using a PRNG, but rather quasi-Monte Carlo techniques with no randomness at all. Monte Carlo doesn't need randomness, just good coverage of the search space.
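
A minimal sketch of the idea: estimating pi from a deterministic 2-D Halton low-discrepancy sequence, with no randomness anywhere:

    def halton(i, base):
        # i-th element of the deterministic Halton/van der Corput sequence.
        result, f = 0.0, 1.0
        while i > 0:
            f /= base
            result += f * (i % base)
            i //= base
        return result

    # Points in bases 2 and 3 cover the unit square evenly.
    n = 100000
    inside = sum(halton(i, 2) ** 2 + halton(i, 3) ** 2 <= 1.0
                 for i in range(1, n + 1))
    print(4 * inside / n)  # converges to pi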

westurner
0 replies
13h53m

If there's randomness, is there convergence and after how much (CPU, RAM, GPU, TPU, QPU) resource-time?

fabmilo
3 replies
18h27m

I believe this kind of graph exploration is what we need to progress reasoning in AI; plain LLMs will fail. The link has tons of good references, including Zobrist hashing (https://en.wikipedia.org/wiki/Zobrist_hashing) for game tables. We need to find a good hashing scheme for language-based state descriptions so that graph exploration doesn't explode computationally. Other good reads on tree search are Thinking Fast and Slow (https://arxiv.org/abs/1705.08439) and Teaching Large Language Models to Reason with Reinforcement Learning (https://arxiv.org/abs/2403.04642), which compares the MCTS approach to other current RL strategies.
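
Zobrist hashing is simple enough to sketch in a few lines; here for a hypothetical chess-like board given as a dict of square index to piece index:

    import random

    random.seed(0)  # fixed tables, so hashes are stable across runs
    N_SQUARES, N_PIECES = 64, 12
    ZOBRIST = [[random.getrandbits(64) for _ in range(N_PIECES)]
               for _ in range(N_SQUARES)]

    def board_hash(board):
        h = 0
        for square, piece in board.items():
            h ^= ZOBRIST[square][piece]
        return h

    def update_hash(h, piece, from_sq, to_sq, captured=None):
        # The key property: the hash updates incrementally. Moving a
        # piece is a few XORs, not a full rehash of the position.
        h ^= ZOBRIST[from_sq][piece] ^ ZOBRIST[to_sq][piece]
        if captured is not None:
            h ^= ZOBRIST[to_sq][captured]
        return h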

vjerancrnjak
0 replies
9h42m

This looks too low level.

What might be a step forward is joint learning of the state representation with the search algorithm: the search algorithm explores the NN representation of the state, for which you can get the cost.

https://sites.google.com/view/genie-2024/

Genie from DeepMind is a good demonstration of modeling a discrete state. The NN learns a very complex representation with collision detection and actions. Instead of decoding that state into pixels, search could probably be done directly on that state.

Of course, this architecture could be very different.

anonymous-panda
0 replies
13h21m

Would it be impossible to marry the two somehow? It’s hard for me to believe that the brain only uses a single technique for everything and likely has many different tools in its toolbox with a selector on top to know how to leverage each appropriately.

andrewdb
0 replies
4h45m

An over-simplified approach that would be interesting to explore:

1. Given a collection of logical arguments, figure out how to assign a hash to each argument.

2. Build a Merkle tree representing the arguments' hashes, nesting according to first principles.

3. If an argument is challenged successfully, the argument's hash changes, rendering descendant argument hashes invalid. (A rough sketch below.)
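
A rough sketch of steps 1-3, where an argument's hash covers its text plus the hashes of the premises it builds on (everything here is made up for illustration):

    import hashlib

    def argument_hash(text, premise_hashes=()):
        # Hashing the premises' hashes along with the text means that
        # challenging a premise changes every dependent argument's hash.
        h = hashlib.sha256(text.encode())
        for p in sorted(premise_hashes):
            h.update(p.encode())
        return h.hexdigest()

    axiom = argument_hash("All men are mortal")
    premise = argument_hash("Socrates is a man")
    conclusion = argument_hash("Socrates is mortal", [axiom, premise])

    # A successful challenge yields a new hash that no longer matches what
    # `conclusion` was built from, flagging dependent arguments for review.
    revised = argument_hash("Socrates is a man (revised)")
    assert revised != premise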

behnamoh
3 replies
18h23m

A bit of introduction about this would be nice.

bionhoward
2 replies
17h43m

When we make game-playing AI (which is all AI, depending on your analogy comfort), one of the most promising techniques is tree search, which ranks moves based on evaluations of their descendant positions. In games where you can reach the same state in many ways, a lot of memory can be wasted re-recording the same state node on different branches.

This article is a nice exploration of an approach called graph search, which essentially trades compute for memory by doing extra work (hashing the game states) to check whether a node has already been visited. That saves us from re-recording nodes we already saw, and consequently converts the trees (free of cycles) into directed acyclic graphs.

This forces some tinkering with the tree search to get correct results; specifically, it demands a focus on edges (actions or moves), rather than vertices (states), as the unit of optimization. It's a well-written technical essay in literate programming, by someone who understands their subject.
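
A minimal sketch of that hashing trick, with the statistics (N visits, W total value) kept per edge rather than per vertex, roughly in the spirit of the article (the zobrist_hash method on the state is assumed):

    class Node:
        def __init__(self):
            # move -> {"N": visits, "W": total value}; Q(edge) = W / N
            # is the action value the search optimizes over.
            self.edges = {}

    table = {}  # state hash -> Node, i.e. the transposition table

    def get_or_create(state):
        # Look the state up by hash and reuse the node if any other
        # line of play already reached this position.
        h = state.zobrist_hash()
        if h not in table:
            table[h] = Node()
        return table[h]

    def backup(node, move, value):
        e = node.edges.setdefault(move, {"N": 0, "W": 0.0})
        e["N"] += 1
        e["W"] += value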

TrueDuality
0 replies
4h51m

I would argue that this saves both compute and memory for any significant search. A traditional tree search, which doesn't treat identical states as one revisitable node, still requires you to store every node that you visit in memory to track the N, Q, and child relations.

Applying the algorithm adds a hash associated with each node and a counter on each edge. In exchange, you're de-duplicating identical states in your search space, saving the memory of duplicate nodes, but more importantly saving compute by not requiring independent searches of the same state space. Whether the trade-off actually saves memory depends on how likely duplicate states are in your search space.

Otherwise your summary is absolutely spot on.

3abiton
0 replies
9h34m

This was an amazing high level recap with context about the background and importance of the technique!

bingbingbing777
1 replies
17h0m

The URL is literally within the KataGo repo.

pixelpoet
0 replies
15h47m

The HN URL preview for me "literally" says "(github.com/lightvector)", with no upfront mention of KataGo. sigh

dooglius
1 replies
4h18m

My chess experience is not that high, but it's enough that I'm skeptical of the claim that the same position would be duplicated in a search tree often enough for it to matter. I'd be interested to see an actual measure of this with Leela Zero. (And that's before considering the threefold repetition and fifty-move rules, which, when included in the state, make repeated states much less likely.)

tel
0 replies
3h29m

Ko is very common in Go, though. You can't validly repeat the board position (a simplified way of handling the ko rules), but if tree search failed to evaluate ko positions correctly, it'd be easy to construct situations where the AI would make bad moves.

rphln
0 replies
19h41m

Somehow the paper they mention completely flew under my radar when I was researching MCTS. Surely it's gonna be a lot of fun to give this modification a spin at the next opportunity.