If I read their paper right, this is legit work (much more legit than DeepMind's AI math paper last month, which was falsely advertised as solving an open math research problem), but it's still pretty striking how far the structure of it is from the usual idea of automated reasoning/intelligence.
A transformer is trained on millions of elementary geometry theorems and used as a brute search for a proof, which, because of the elementary geometry context, necessarily has an elementary structure and can easily be judged symbolically as true or false. When the brute search fails, an extra geometric construction is randomly added (like adding a midpoint of a certain line) to see if brute search using that extra raw material might work. [edit: as corrected by Imnimo, I got this backwards - the brute search is just pure brute search; the transformer is used to predict which extra geometric construction to add]
Also (not mentioned in the blog post) the actual problem statements had to be modified/adapted, e.g. the actual problem statement "Let AH1, BH2 and CH3 be the altitudes of a triangle ABC. The incircle W of triangle ABC touches the sides BC, CA and AB at T1, T2 and T3, respectively. Consider the symmetric images of the lines H1H2, H2H3, and H3H1 with respect to the lines T1T2, T2T3, and T3T1. Prove that these images form a triangle whose vertices lie on W." had to be changed to "Let ABC be a triangle. Define point I such that AI is the bisector of angle BAC and CI is the bisector of angle ACB. Define point T1 as the foot of I on line BC. Define T2 as the foot of I on line AC. Define point T3 as the foot of I on line AB. Define point H1 as the foot of A on line BC. Define point H2 as the foot of B on line AC. Define point H3 as the foot of C on line AB. Define point X1 as the intersection of circles (T1,H1) and (T2,H1). Define point X2 as the intersection of circles (T1,H2) and (T2,H2). Define point Y2 as the intersection of circles (T2,H2) and (T3,H2). Define point Y3 as the intersection of circles (T2,H3) and (T3,H3). Define point Z as the intersection of lines X1X2 and Y2Y3. Prove that T1I=IZ."
How so? Reasoning is fundamentally a search problem.
The process you described is exactly the process humans use: make a guess about what's useful, try to work out the details mechanically, and if you get stuck, make another guess, and so on. So the process is like searching through a tree.
People figured out this process back in 1955 (and made a working prototype which can prove theorems): https://en.wikipedia.org/wiki/Logic_Theorist but it all hinges on using good 'heuristics'. Neural networks are relevant here as they can extract heuristics from data.
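The guess-then-grind loop described above can be sketched as backtracking tree search. This is a toy illustration, not anything from the paper; the names (`prove`, `deduce`, `guess_candidates`) and the interface are all hypothetical:

```python
def prove(state, goal, guess_candidates, deduce, depth=0, max_depth=5):
    """Backtracking search: work out details mechanically (deduce), and
    when stuck, try a heuristic guess and recurse. Hypothetical toy API."""
    state = deduce(state)                # grind out mechanical consequences
    if goal(state):
        return state                     # proof found
    if depth == max_depth:
        return None                      # give up on this branch
    for guess in guess_candidates(state):  # heuristic ordering is what the
        result = prove(state | {guess},    # neural net could provide
                       goal, guess_candidates, deduce,
                       depth + 1, max_depth)
        if result is not None:
            return result                # a guess worked
    return None                          # all guesses failed: backtrack
```

The quality of `guess_candidates` is exactly the "good heuristics" problem the Logic Theorist ran into; a neural net learned from data is one way to supply it.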
What do you think is "the usual idea of automated reasoning"? Some magic device which can solve any problem using a single linear pass?
Well, what's different is that humans invent new abstractions along the way such as complex numbers and Fourier transforms.
Not a lot of humans do
When a human has done the same thing many times they tend to try to generalize and take shortcuts. And make tools. Perhaps I missed something but I haven't seen a neural net do that.
Is that very different from the distillation and amplification process that happens during training, where the neural net learns to predict in one step what initially required several steps of iterated execution?
IMHO, yes. It's not an (internal) invention occurring as a result of insight into the process - it's an (external) human training one model on the output of another model.
Every single human has abstractions that are unique to them. Your world model isn’t the same as mine.
It’s just that usually these abstractions are fuzzy and hard to formalize, so they aren’t shared. It doesn’t mean that they don’t exist.
This is laughable... Neural networks also have basically unknowable abstractions encoded in their weights. There was some work not long ago that taught an ANN to do modular arithmetic and found that the learned weights were effectively computing Fourier transforms...
True, but we're talking about Olympiad level math skills here.
A neural network is nothing but a heap of new abstractions from data.
I think the intuition here is that an “intelligent” system would lean much more heavily on the heuristics than on the search, like humans. It’s hard to quantify in this case, but certainly in the case of chess, when engines were about as good as best human players, they were doing orders of magnitude larger search than humans. Which made them feel more like chess _engines_ than chess AI. AlphaZero certainly made a step in the right direction, but it’s still far from how humans play.
The comment above mentions "brute force" incorrectly. It's fundamentally impossible to brute force googol combinations...
The key insight in this whole thread is that AlphaGeometry only works because the search field is not a googol of combinations. So it doesn't really generalize to many other fields of math. We shouldn't expect an AlphaCategoryTheory or AlphaNumberTheory anytime soon.
"Brute force" = going through all combinations indiscriminately.
Using heuristics, beam search, etc, = "not brute force".
Calling something smart "brute force" is wrong.
There's already a number of papers demonstrating use of LLMs for math in general:
"Autoformalization with Large Language Models" https://arxiv.org/abs/2205.12615
"Large language models can write informal proofs, translate them into formal ones, and achieve SoTA performance in proving competition-level maths problems!" https://twitter.com/AlbertQJiang/status/1584877475502301184
"Magnushammer: A Transformer-based Approach to Premise Selection" https://twitter.com/arankomatsuzaki/status/16336564683345879...
I find it rather sad that the reaction to amazing research is "but it's just..." and "it might not work on...". Like, maybe appreciate it a bit and consider the possibility that something similar to it might work?
AlphaGeometry works, in simplified terms, in two stages:
1. Heuristically add a candidate construction using an NN (a transformer)
2. Brute force search through all possible deductions from the current set of constructions using a symbolic solver
If you don't find a solution after step 2, repeat. There may be some backtracking involved to try a different set of constructions as well.
This approach only works because, in geometry, the set of possible deductions from a given construction is actually quite small.
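If I've understood those two stages right, the loop might be sketched like this. This is my own paraphrase, not the paper's algorithm; `propose_construction` stands in for the transformer and `deductive_closure` for the symbolic engine, and both names are hypothetical:

```python
def deductive_closure(facts, rules):
    """Stage 2: exhaust all deductions by forward chaining to a fixed
    point. Each rule maps a fact set to the facts it can derive."""
    changed = True
    while changed:
        changed = False
        for rule in rules:
            new = rule(facts) - facts
            if new:
                facts = facts | new
                changed = True
    return facts

def alpha_geometry_loop(facts, rules, goal, propose_construction, max_rounds=10):
    """Alternate exhaustive symbolic deduction (stage 2) with an
    NN-proposed auxiliary construction (stage 1) until the goal closes."""
    for _ in range(max_rounds):
        facts = deductive_closure(facts, rules)
        if goal in facts:
            return facts                               # proof found
        facts = facts | {propose_construction(facts)}  # e.g. "add a midpoint"
    return None
```

The point about geometry being special shows up in `deductive_closure`: the fixed-point iteration only terminates quickly because the set of derivable facts per construction is small.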
Also, note that this approach overall is essentially an optimization, not an amazing new capability. Replacing step 1 with a random construction still solves 10 problems on average in the given time, compared to 30 with the new approach. The existing algorithm, relying mostly on brute force search, is probably able to solve all of the geometry problems if given, say, 10 times as much time as the olympiad students (so not some absurd amount of time).
The search space here is not so large. Read this paper, a key part of the present work: https://doi.org/10.1023/A:1006171315513
You mean like a calculator?
A calculator that can solve any problem human could, and more? Sure.
how did you edit your comment after it got visible to others?
You can edit a comment for about an hour if I remember correctly, regardless of whether it's visible. You can't delete it after someone responds to it, though.
I don't think this is quite right. The brute force search is performed by a symbolic solver, not the transformer. When it runs out of new deductions, the transformer is asked to suggest possible extra constructions (not randomly added).
Thanks for the correction, it looks like you're right.
That's exactly how I was taught geometry in school, and I hated it with all my guts. Only after making it into the math department of a university did I learn to do it properly and to enjoy it.