HN comments for: Monte-Carlo graph search from first principles

modeless

7 replies

14h13m

2024-03-11 04:12:23 UTC

Also, as far as the name "Monte-Carlo Tree Search" itself, readers might note that there is nothing "Monte-Carlo" in the above algorithm - that it's completely deterministic!

MCTS as commonly implemented is deterministic? How strange! I assumed there was randomness in the sampling.

kadoban

3 replies

13h36m

2024-03-11 04:48:52 UTC

Originally MCTS did have randomness, yes. I think the article mentions it, but this came in in the form of playouts at the end, used just to evaluate positions. This has been replaced in current similar projects by NN evaluations, which are higher quality (as you can probably imagine, just playing random moves to see who wins is not very good, it was just the best strategy known at the time).

So the monte-carlo turned out to be an unessential (and suboptimal) part of what is now still called MCTS, making the name a bit unfortunate.

jacoblambda

1 replies

13h7m

2024-03-11 05:18:41 UTC

In essence the NN evaluations are acting as a PRNG with values heavily biased by an established heuristic.

So while it's very much a specialization of MCTS and probably should go by a different name, one could argue it's still using monte carlo methods.

pavlov

0 replies

9h19m

2024-03-11 09:06:09 UTC

Sounds like the dice are loaded in Monte Carlo.

mzl

0 replies

11h8m

2024-03-11 07:17:14 UTC

Small side note, standard MCTS is often implemented with heuristic guiding the moves in the playouts and not just random moves, as that gives more interesting information.

jacoblambda

1 replies

13h19m

2024-03-11 05:06:38 UTC

It's technically a different algorithm just under the same "monte carlo" name.

However something interesting of note is that since most applications of monte carlo methods rely on PRNGs instead of true RNGs, they are technically deterministic (as given the same PRNG seed you should always get the same result for a given input).

So what this algorithm is doing instead of using a normal PRNG and a separate heuristic, is to instead query the neural network. This works because the neural net is an heuristic over a massive search space which ends up acting like a very poor PRNG that is heavily biased towards specific results based on the training of the neutal net, which in turn ends up looking like a PRNG with a set of heuristics applied.

The important thing to note is that this is a specialisation of MCTS and as such shouldn't technically work for all use cases.

pvitz

0 replies

8h47m

2024-03-11 09:38:15 UTC

I can't speak for this use case, but in many other places, one isn't even using a PRNG, but rather Quasi-Monte Carlo techniques with no randomness at all. Monte Carlo doesn't need randomness, but a good coverage of the search space.

westurner

0 replies

13h53m

2024-03-11 04:32:11 UTC

If there's randomness, is there convergence and after how much (CPU, RAM, GPU, TPU, QPU) resource-time?

fabmilo

3 replies

18h27m

2024-03-10 23:58:34 UTC

I believe this kind of graph exploration is what we need to progress reasoning in AI. Plain LLMS will fail. The link has tons of good references, including the Zobrist hashing https://en.wikipedia.org/wiki/Zobrist_hashing for game tables. We need to find a good hashing for language based state description so that graph exploration doesn't explode computationally. Another good read for Tree Search is Thinking Fast and Slow: https://arxiv.org/abs/1705.08439 and Teaching Large Language Models to Reason with Reinforcement Learning: https://arxiv.org/abs/2403.04642 comparing the MCTS approach to other current RL strategies.

vjerancrnjak

0 replies

9h42m

2024-03-11 08:43:04 UTC

This looks too low level.

What might be a step forward is a joint learning of the state representation with the search algorithm. Search algorithm explores the NN representation of the state for which you can get the cost.

https://sites.google.com/view/genie-2024/

Genie from DeepMind is a good demonstration where discrete state is being modeled. NN learns a very complex representation with collision detection and actions. Instead of decoding that state into pixels, search could probably be done directly on that state.

Of course, this architecture could be very different.

anonymous-panda

0 replies

13h21m

2024-03-11 05:04:36 UTC

Would it be impossible to marry the two somehow? It’s hard for me to believe that the brain only uses a single technique for everything and likely has many different tools in its toolbox with a selector on top to know how to leverage each appropriately.

andrewdb

0 replies

4h45m

2024-03-11 13:40:31 UTC

An over-simplified approach that would be interesting to explore:

1. Given a collection of logical arguments, figure out how to assign a hash to each argument.

2. Build a Merkle tree representing the argument's hashes, nesting according to first principles.

3. If an argument is challenged successfully, the argument's hash changes, rendering descendent argument hashes invalid.

behnamoh

3 replies

18h23m

2024-03-11 00:02:32 UTC

A bit introduction about this would be nice.

bionhoward

2 replies

17h43m

2024-03-11 00:41:54 UTC

When we make game-playing AI (which is all AI, depending on your analogy comfort), one of the most promising techniques is Tree Search, which ranks moves based on the descendant moves. In games where you could reach the same state in many ways, much memory might be wasted to re-record the same state node on different branches.

This article is a nice exploration of an approach called Graph Search, which essentially trades compute for memory by doing extra compute work (hashing the game states) to check to see if the nodes are already visited. That saves us from re-recording nodes we already saw, and consequently converts trees (free of cycles) into directed acyclic graphs.

This forces some tinkering with the tree search to get correct results, specifically it demands a focus more on edges (actions or moves) as the unit of optimization, rather than on vertices (states). It’s a well written technical essay in literate programming written by someone who understands their subject.

TrueDuality

0 replies

4h51m

2024-03-11 13:34:11 UTC

I would argue that this saves both compute and memory for any significant search. In a traditional tree-search without treating states as identical and revistable still requires you to store every node that you visit in memory to track the N, Q, and child relations.

Applying the algorithm adds a hash associated with each node, and a counter on each edge. On the other side you're de-duplicating identical states in your search space, saving the memory of duplicate nodes but more importantly saving compute by not requiring independent searches of the same state space. Whether this trade off actually saves memory will be determined based on how likely duplicate states are in your search space.

Otherwise your summary is absolutely spot on.

3abiton

0 replies

9h34m

2024-03-11 08:51:43 UTC

This was an amazing high level recap with context about the background and importance of the technique!

pixelpoet

2 replies

18h39m

2024-03-10 23:46:06 UTC

Immediately recognise the author in the HN url as the genius behind KataGo: https://github.com/lightvector/KataGo

His posts on https://www.reddit.com/r/cbaduk/ are consistently excellent.

bingbingbing777

1 replies

17h0m

2024-03-11 01:25:08 UTC

The URL is literally within the KataGo repo.

pixelpoet

0 replies

15h47m

2024-03-11 02:38:43 UTC

The HN URL preview for me "literally" says "(github.com/lightvector)", with no upfront mention of KataGo. sigh

dooglius

1 replies

4h18m

2024-03-11 14:07:15 UTC

My chess experience is not that high, but enough that I'm skeptical of the claim that the same position would be duplicated in a search tree enough for it to matter. I'd be interested to see an actual measure of this with Leela Zero. (And I'm not even considering the threefold repetition and fifty-move rules, which when considered in the state make a repetition much less likely).

tel

0 replies

3h29m

2024-03-11 14:56:16 UTC

Ko is very common in Go, though. You can't validly repeat the board position (simplified way of handling Ko rules) but if tree search failed to evaluate ko positions correctly it'd be easy to construct situations where the AI would make bad moves.

rphln

0 replies

19h41m

2024-03-10 22:44:04 UTC

Somehow the paper they mention completely flew under my radar when I was researching MCTS. Surely it's gonna be a lot of fun to give this modification a spin on my next opportunity.