A working mathematician recently got GPT-4o to prove an (afaik) novel lemma while pursuing their research:
"My partner, who is mathematician, used ChatGPT last week for the first time to prove some lemmas for his research. He already suspected those lemmas were true and had some vague idea how to approach them, but he wasn't expert in this type of statement. This is the first time that he got correct and useful proofs out of the models.
[...]
These are the two lemmas. The first one is something that a collaborator of my partner noticed for small values of e in their computations. ChatGPT did not find the proof before being told to try Möbius functions.
https://chatgpt.com/share/9ee33e31-7cec-4847-92e4-eebb48d4ff...
The second looks a bit more standard to me, probably something that Mathematica would have been able to do as well. But perhaps that's just because I am more familiar with such formulas. Still, Mathematica doesn't give a nice derivation, so this is useful.
https://chatgpt.com/share/7335f11d-f7c0-4093-a761-1090a21579...
"
In a recent interview, Ben Goertzel mentioned having a PhD-level discussion with GPT-4 about novel math ideas cooked up by himself and a colleague. These things are becoming scary good at reasoning about heavyweight topics. As the ML field continues focusing on adding more System Two capabilities to complement LLMs' largely System One thinking, things will get wild.
This is my experience as well- even the popular hype underestimates how creative and intelligent LLMs can be. Most discussions reason about what they should or shouldn't be capable of based on how we think they work, without really looking directly at what they can actually do, and without fixating on what they can't do. They are very alien- much worse than humans at many things, but also much better at others, and not in the areas we would expect in either case. Ironically, they are especially good at creative and abstract thinking and very bad at keeping track of details and basic computation, nearly the opposite of what we'd expect from computers.
I am pessimistic about AI. The technology is definitely useful, but I watched a video that made good points.
I seriously wonder about the transcript that these guys had with the LLM. Were they suggesting things? Did ChatGPT just reconfigure words that helped them think through the problem? I think the truth is that ChatGPT is a very effective rubber duck debugger.

Try playing around with getting GPT-4 to discuss creative solutions to real unsolved problems you are personally an expert on, or created yourself. That video looks like just standard internet ragebait to me.
I find it pretty annoying when people say you are just being manipulated by hype if you are impressed by LLMs. I was a serious skeptic who thought GPT-3 was useless, and I only changed my opinion by directly experimenting with GPT-4 on my own- by getting it to discuss and solve problems in my area of expertise as an academic researcher.
It's the new eternal summer, welcome to the club :) GPT-3 was translating 2000 lines of code across 5 languages and enabling me to ship at scale.
I don’t know Jack about C# and .NET and I’ve used ChatGPT to write several nontrivial programs.
A lot of the video's points are plainly wrong, both on the numbers and experientially. I don't know how they feel comfortable claiming benchmark numbers didn't improve much from 3.5 to 4.
Going off the GP's points alone, they said GPT-3.5T vs. GPT-4, where T I assume stands for "Turbo". If that's the case, then the video must have its timelines wrong - GPT-3.5 Turbo came out with GPT-4 Turbo, which was some time after GPT-4, and OpenAI put a lot of work in making GPT-3.5 Turbo work much better than 3.5. I can sort of buy that someone could find the difference between 3.5 Turbo and 4 to be not that big, but they're making the wrong comparison. The real jump was from 3.5 to 4; 3.5 Turbo came out much later.
I don't think it makes sense to base your opinion on videos when you could simply use the product yourself and get first-hand experience.
I've come full circle on it. First I thought it was totally amazing, then I pushed it hard and found it lacking, and then I started using it more casually and now I use a little bit every day. I don't find it completely brilliant but it knows an above-average amount about everything. And it can make short work of very tedious tasks.
I just type to ChatGPT so there are no "emotional effects" involved.
The understatement of this year.
GPT-4o is 6x cheaper than GPT-4. So if it's actually equivalent to GPT-4, it's a great improvement.
In fact, calling it "great" is still a crazy understatement. It's a cutting-edge technology becoming 6x cheaper in 1.5 years. I'm quite sure that has never happened before in human history.
I feel that's because most people criticizing AI in that way don't know enough about math or other really abstract and conceptual domains to use it for PhD level work. And they don't know how to get GPT excited and having fun. There is an additional level of intelligence available to anyone who is passionate enough to get the AI to be exuberant about a topic.
It's almost like a kindergartner goes up to Einstein and finds out that Einstein's just really bad at schoolhouse rhymes - because he finds them boring. So the kid, adorably, concludes that Einstein is less smart than a kindergartner.
But if you need to talk to Einstein about partial differential equations and why, morally, we consider distributional solutions to have physical reality, and if you bring enough creativity and engagement to make the conversation fun and exciting for both you and him, then he'll really help you out!
Simple point: Don't trust the opinion of anyone who can't get GPT to "have fun". Don't trust the opinions of anyone who can't get GPT helpfully explaining and discussing very complicated things with them. There's a very big chance those people have poor opinions about GPT simply because they don't know prompting, and don't show enjoyment, camaraderie, and respect. Because they don't understand the elements of AI conversation, and because they don't respect the AI as an emerging fragment of pre-sentient personhood, they can't hold long and useful conversations.
Great points… people expect it to behave like an Oracle- they want it to figure out what they meant and give a correct factual answer every time, and when it fails they write it off. But that isn’t really what it is- it is more accurately a “situation simulator” based on predictive models. So you only get a useful response when you engineer the entire discourse so that a useful response is the appropriate one. Make sure it “knows” to respond as an enthusiastic and well respected expert, not as an obnoxious forum troll.
Woah that second proof is really slick. Is there a combinatorial proof instead of algebraic proof as to why that holds?
Yes.
(HN comments aren't the best medium for writing mathematics. I hope this ends up reasonably readable.)
(d choose k) is the number of ways to pick k things from d.
(d choose k) k^2 is the number of ways to pick k things from d, and then colour one of them red and one of them (possibly the same one) blue.
(-1)^k (d choose k) k^2 is that (when k is even), or minus that (when k is odd).
So the identity says: Suppose we consider a set of d things, and we consider picking some of them (meaning 0 or more, though actually 0 is impossible; k is the number we pick) and colouring one of those red and one blue; then there are the same number of ways to do this picking an odd number of things as picking an even number.
Well, let's describe the process a bit differently. We take our set of d things. We colour one of them red. We colour one of them blue. Then we pick some set of the things (k in number), including the red thing and the blue thing (which may or may not also be the red thing). And the claim is that there are the same number of ways to do this with an even value of k as with an odd number.
But now this is almost trivial!
Once we've picked our red and blue things, we just have to decide which of the other things to pick. And as long as there's at least one of those, there are as many ways to pick an odd number of things as to pick an even number. (Fix one of them; we can either include it or not, and switching that changes the parity.)
So if d>2 then we're done -- for each choice of red and blue, there are the same number of "odd" and "even" ways to finish the job.
What if d<=2? Well, then the "theorem" isn't actually true.
d=1: (1 choose 0) 0^2 - (1 choose 1) 1^2 = -1.
d=2: (2 choose 0) 0^2 - (2 choose 1) 1^2 + (2 choose 2) 2^2 = 2.
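If anyone wants to sanity-check this numerically, here's a quick brute-force sketch in Python (nothing beyond math.comb from the standard library), small failing cases included:

    from math import comb

    # Alternating sum  sum_k (-1)^k * C(d, k) * k^2  over k = 0..d
    for d in range(1, 10):
        total = sum((-1) ** k * comb(d, k) * k ** 2 for k in range(d + 1))
        print(d, total)  # 0 for every d > 2; d=1 gives -1, d=2 gives 2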
In this alternate view of the process, where does the k^2 term come from? Isn't the number of ways to do it in this perspective just
(d-1 choose k-1) # We already chose the red one (which is also the blue) and need to choose k-1 more
+
(d-2 choose k-2) # We chose both red and blue ones and need to choose the other k-2
Edit: I think it actually works out, seems to be equal to
d * 1 * Binomial[d - 1, k - 1] + d * (d - 1) * Binomial[d - 2, k - 2] = Binomial[d, k]*k^2
And then you apply your argument about fixing one of the elements in each of the two cases. Actually the proof seems to be a variant of (same core argument as) the fact that the sum of even binomial terms equals the sum of odd binomial terms [1].
[1] https://math.stackexchange.com/questions/313832/a-combinator...
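For the curious, that Binomial identity is easy to confirm numerically. Here's a small Python sketch; the helper C just returns 0 for out-of-range arguments so the k=0 and k=1 edge cases work out:

    from math import comb

    def C(n, k):
        # Binomial coefficient, with 0 for arguments outside 0 <= k <= n.
        return comb(n, k) if 0 <= k <= n else 0

    # Check  d*C(d-1, k-1) + d*(d-1)*C(d-2, k-2) == C(d, k) * k^2
    for d in range(1, 12):
        for k in range(d + 1):
            lhs = d * C(d - 1, k - 1) + d * (d - 1) * C(d - 2, k - 2)
            assert lhs == C(d, k) * k ** 2, (d, k)
    print("identity holds for d up to 11")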
In the alternate view, the k^2 isn't there any more. Not because we are computing a different number, but because we're organizing the computation differently.
(It's turned into a d^2 factor, except that we then need to treat the red=blue case differently from the red!=blue case, so it's really a d factor plus a d^2-d factor.)
And yes, the last bit of the proof is exactly the same thing as one way to prove that the sum of the "even" binomial coefficients equals the sum of the "odd" ones -- provided n>0. (For the (0 choose whatever) case, the "even" sum is 1 and the "odd" sum is 0.)
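That n>0 proviso is just the binomial expansion of (1-1)^n, which is quick to check in Python if you want:

    from math import comb

    # sum_k (-1)^k * C(n, k): equals 0 for n >= 1, but 1 for n = 0
    for n in range(8):
        print(n, sum((-1) ** k * comb(n, k) for k in range(n + 1)))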
Sorry, what is this quoted from?
It's interesting because that first one is something I was discussing recently with my friend Beren (does he have an account here? idk). We were thinking of it as summing over ordered partitions, rather than over partitions with a factor of |tau|!, but obviously those are the same thing.
(If you do it over cyclically ordered partitions -- so, a factor of (|tau|-1)! rather than |tau|! -- you get 0 instead of (-1)^e.)
Here's Beren's combinatorial proof:
First let's do the second one I said, about cyclically ordered ones, because it's easier. Choose one element of the original set to be special. Now we can set up a bijection between even-length and odd-length cyclically ordered partitions as follows: If the special element is on its own, merge it with the part after it. If it's not on its own, split it out into its own part before the part it's in. So if we sum with a sign factor, we get 0.
OK, so what about the linearly ordered case? We'll use a similar bijection, except it won't quite be a bijection this time. Pick an order on your set. Apply the above bijection with the last element as the special element. Except some things don't get matched, namely, partitions that have the last element of your set on its own as the last part.
So in that case, take the second-to-last element, and apply recursively, etc, etc. Ultimately everything gets matched except for the partition where everything is separate and all the parts are in order. This partition has length equal to the size of the original set. So if the original set was even you get one more even one, and if the original set was odd you get one more odd one. Which, rephrased as a sum with a sign factor, yields the original statement.
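If anyone wants to brute-force both partition identities, here's a quick Python sketch (the set_partitions generator below is my own throwaway recursion, not from any library):

    from math import factorial

    def set_partitions(xs):
        # All partitions of the list xs into nonempty parts.
        if not xs:
            yield []
            return
        first, rest = xs[0], xs[1:]
        for p in set_partitions(rest):
            yield [[first]] + p  # first goes in its own part...
            for i in range(len(p)):  # ...or joins an existing part
                yield p[:i] + [[first] + p[i]] + p[i + 1:]

    # Summing (-1)^|tau| over ordered partitions (factor |tau|!) should give
    # (-1)^e; over cyclically ordered ones (factor (|tau|-1)!) should give 0.
    for e in range(2, 8):
        ps = list(set_partitions(list(range(e))))
        linear = sum((-1) ** len(p) * factorial(len(p)) for p in ps)
        cyclic = sum((-1) ** len(p) * factorial(len(p) - 1) for p in ps)
        print(e, linear, cyclic)  # expect (-1)**e and 0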
Would be interested to send this to the mathematician you're referring to, but I have no idea who that might be since you didn't say. :)
This was from the current ACX Open Thread.
Thanks! Went and posted over there.
I’m pretty sure that 2nd proof exists in some book or ebook somewhere. generatingfunctionology was the one that came to mind.
Impressive recall, but not novel reasoning.
I’m pretty sure that LLMs generalize from a deep model, can create novel assemblages, and are not just search engines.
The burden of proof would be on you to find the existing mathematical, erhm, proof.