A working mathematician recently got GPT-4o to prove an (afaik) novel lemma while pursuing their research:
"My partner, who is mathematician, used ChatGPT last week for the first time to prove some lemmas for his research. He already suspected those lemmas were true and had some vague idea how to approach them, but he wasn't expert in this type of statement. This is the first time that he got correct and useful proofs out of the models.
[...]
These are the two lemmas. The first one is something that a collaborator of my partner noticed for small values of e in their computations. ChatGPT did not find the proof before being told to try Möbius functions.
https://chatgpt.com/share/9ee33e31-7cec-4847-92e4-eebb48d4ff...
The second looks a bit more standard to me, probably something that Mathematica would have been able to do as well. But perhaps that's just because I am more familiar with such formulas. Still, Mathematica doesn't give a nice derivation, so this is useful.
https://chatgpt.com/share/7335f11d-f7c0-4093-a761-1090a21579...
"
In a recent interview, Ben Goertzel mentioned having a PhD-level discussion with GPT-4 about novel math ideas cooked up by himself and a colleague. These things are becoming scary good at reasoning about heavyweight topics. As the ML field continues focusing on adding more System Two capabilities to complement LLMs' largely System One thinking, things will get wild.
This is my experience as well- even the popular hype underestimates how creative and intelligent LLMs can be. Most discussions reason about what they should or shouldn't be capable of based on how we think they work, without really looking directly at what they can actually do, and without fixating on what they can't do. They are very alien- much worse than humans at many things, but also much better at others, and not in the areas we would expect in either case. Ironically, they are especially good at creative and abstract thinking and very bad at keeping track of details and basic computation, nearly the opposite of what we'd expect from computers.
I am pessimistic about AI. The technology is definitely useful, but I watched a video that made good points.
I seriously wonder about the transcript that these guys had with the LLM. Were they suggesting things? Did ChatGPT just reconfigure words that helped them think through the problem? I think the truth is that ChatGPT is a very effective rubber duck debugger.

Try playing around with getting GPT-4 to discuss creative solutions to real unsolved problems you are personally an expert on, or created yourself. That video looks like just standard internet ragebait to me.
I find it pretty annoying when people say you are just being manipulated by hype if you are impressed by LLMs. I was a serious skeptic who thought GPT-3 was useless, and I only changed my opinion by directly experimenting with GPT-4 on my own- by getting it to discuss and solve problems in my area of expertise as an academic researcher.
It's the new eternal summer, welcome to the club :) GPT-3 was translating 2000 lines of code across 5 languages and enabling me to ship at scale.
I don’t know Jack about C# and .NET and I’ve used ChatGPT to write several nontrivial programs.
A lot of the video's points are plainly wrong, both on the numbers and experientially. I don't know how they feel comfortable claiming benchmark numbers didn't improve much from 3.5 to 4.
Going off the GP's points alone, they said GPT-3.5T vs. GPT-4, where T I assume stands for "Turbo". If that's the case, then the video must have its timelines wrong - GPT-3.5 Turbo came out with GPT-4 Turbo, which was some time after GPT-4, and OpenAI put a lot of work in making GPT-3.5 Turbo work much better than 3.5. I can sort of buy that someone could find the difference between 3.5 Turbo and 4 to be not that big, but they're making the wrong comparison. The real jump was from 3.5 to 4; 3.5 Turbo came out much later.
I don't think it makes sense to base your opinion on videos when you could simply use the product yourself and get first-hand experience.
I've come full circle on it. First I thought it was totally amazing, then I pushed it hard and found it lacking, and then I started using it more casually and now I use a little bit every day. I don't find it completely brilliant but it knows an above-average amount about everything. And it can make short work of very tedious tasks.
I just type to ChatGPT so there are no "emotional effects" involved.
The understatement of this year.
GPT-4o is 6x cheaper than GPT-4. So if it's actually equivalent to GPT-4, it's a great improvement.
In fact, calling it "great" is still a crazy understatement. It's a cutting-edge technology becoming 6x cheaper in 1.5 years. I'm quite sure that has never happened before in human history.
I feel that's because most people criticizing AI in that way don't know enough about math or other really abstract and conceptual domains to use it for PhD level work. And they don't know how to get GPT excited and having fun. There is an additional level of intelligence available to anyone who is passionate enough to get the AI to be exuberant about a topic.
It's almost like a kindergartner goes up to Einstein and finds out that Einstein's just really bad at schoolhouse rhymes - because he finds them boring. So the kid, adorably, concludes that Einstein is less smart than a kindergartner.
But if you need to talk to Einstein about partial differential equations and why, morally, we consider distributional solutions to have physical reality, and if you bring enough creativity and engagement to make the conversation fun and exciting for both you and him, then he'll really help you out!
Simple point: Don't trust the opinion of anyone who can't get GPT to "have fun". Don't trust the opinions of anyone who can't get GPT helpfully explaining and discussing very complicated things with them. There's a very big chance those people have poor opinions about GPT simply because they don't know prompting, and don't show enjoyment, camaraderie, and respect. Because they don't understand the elements of AI conversation, and because they don't respect the AI as an emerging fragment of pre-sentient personhood, they can't hold long and useful conversations.
Great points… people expect it to behave like an Oracle- they want it to figure out what they meant and give a correct factual answer every time, and when it fails they write it off. But that isn’t really what it is- it is more accurately a “situation simulator” based on predictive models. So you only get a useful response when you engineer the entire discourse so that a useful response is the appropriate one. Make sure it “knows” to respond as an enthusiastic and well respected expert, not as an obnoxious forum troll.
Woah that second proof is really slick. Is there a combinatorial proof instead of algebraic proof as to why that holds?
Yes.
(HN comments aren't the best medium for writing mathematics. I hope this ends up reasonably readable.)
(d choose k) is the number of ways to pick k things from d.
(d choose k) k^2 is the number of ways to pick k things from d, and then colour one of them red and one of them (possibly the same one) blue.
(-1)^k (d choose k) k^2 is that (when k is even), or minus that (when k is odd).
So the identity says: Suppose we consider a set of d things, and we consider picking some of them (meaning 0 or more, though actually 0 is impossible; k is the number we pick) and colouring one of those red and one blue; then there are the same number of ways to do this picking an odd number of things as picking an even number.
Well, let's describe the process a bit differently. We take our set of d things. We colour one of them red. We colour one of them blue. Then we pick some set of the things (k in number), including the red thing and the blue thing (which may or may not also be the red thing). And the claim is that there are the same number of ways to do this with an even value of k as with an odd number.
But now this is almost trivial!
Once we've picked our red and blue things, we just have to decide which of the other things to pick. And as long as there's at least one of those, there are as many ways to pick an odd number of things as to pick an even number. (Fix one of them; we can either include it or not, and switching that changes the parity.)
So if d>2 then we're done -- for each choice of red and blue, there are the same number of "odd" and "even" ways to finish the job.
What if d<=2? Well, then the "theorem" isn't actually true.
d=1: (1 choose 0) 0^2 - (1 choose 1) 1^2 = -1.
d=2: (2 choose 0) 0^2 - (2 choose 1) 1^2 + (2 choose 2) 2^2 = 2.
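If anyone wants to sanity-check this numerically, here's a quick brute-force sketch in Python (nothing beyond math.comb from the standard library), small failing cases included:

    from math import comb

    # Alternating sum  sum_k (-1)^k * C(d, k) * k^2  over k = 0..d
    for d in range(1, 10):
        total = sum((-1) ** k * comb(d, k) * k ** 2 for k in range(d + 1))
        print(d, total)  # 0 for every d > 2; d=1 gives -1, d=2 gives 2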
In this alternate view of the process, where does the k^2 term come from? Isn't the number of ways to do it in this perspective just
(d-1 choose k-1) # We already chose the red one (which is also the blue) and need to choose k-1 more
+
(d-2 choose k-2) # We chose both red and blue ones and need to choose the other k-2
Edit: I think it actually works out, seems to be equal to
d * 1 * Binomial[d - 1, k - 1] + d * (d - 1) * Binomial[d - 2, k - 2] = Binomial[d, k]*k^2
And then you apply your argument about fixing one of the elements in each of the two cases. Actually the proof seems to be a variant of (same core argument as) the fact that the sum of even binomial terms equals the sum of odd binomial terms [1].
[1] https://math.stackexchange.com/questions/313832/a-combinator...
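For the curious, that Binomial identity is easy to confirm numerically. Here's a small Python sketch; the helper C just returns 0 for out-of-range arguments so the k=0 and k=1 edge cases work out:

    from math import comb

    def C(n, k):
        # Binomial coefficient, with 0 for arguments outside 0 <= k <= n.
        return comb(n, k) if 0 <= k <= n else 0

    # Check  d*C(d-1, k-1) + d*(d-1)*C(d-2, k-2) == C(d, k) * k^2
    for d in range(1, 12):
        for k in range(d + 1):
            lhs = d * C(d - 1, k - 1) + d * (d - 1) * C(d - 2, k - 2)
            assert lhs == C(d, k) * k ** 2, (d, k)
    print("identity holds for d up to 11")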
In the alternate view, the k^2 isn't there any more. Not because we are computing a different number, but because we're organizing the computation differently.
(It's turned into a d^2 factor, except that we then need to treat the red=blue case differently from the red!=blue case, so it's really a d factor plus a d^2-d factor.)
And yes, the last bit of the proof is exactly the same thing as one way to prove that the sum of the "even" binomial coefficients equals the sum of the "odd" ones -- provided n>0. (For the (0 choose whatever) case, the "even" sum is 1 and the "odd" sum is 0.)
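That n>0 proviso is just the binomial expansion of (1-1)^n, which is quick to check in Python if you want:

    from math import comb

    # sum_k (-1)^k * C(n, k): equals 0 for n >= 1, but 1 for n = 0
    for n in range(8):
        print(n, sum((-1) ** k * comb(n, k) for k in range(n + 1)))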
Sorry, what is this quoted from?
It's interesting because that first one is something I was discussing recently with my friend Beren (does he have an account here? idk). We were thinking of it as summing over ordered partitions, rather than over partitions with a factor of |tau|!, but obviously those are the same thing.
(If you do it over cyclically ordered partitions -- so, a factor of (|tau|-1)! rather than |tau|! -- you get 0 instead of (-1)^e.)
Here's Beren's combinatorial proof:
First let's do the second one I said, about cyclically ordered ones, because it's easier. Choose one element of the original set to be special. Now we can set up a bijection between even-length and odd-length cyclically ordered partitions as follows: If the special element is on its own, merge it with the part after it. If it's not on its own, split it out into its own part before the part it's in. So if we sum with a sign factor, we get 0.
OK, so what about the linearly ordered case? We'll use a similar bijection, except it won't quite be a bijection this time. Pick an order on your set. Apply the above bijection with the last element as the special element. Except some things don't get matched, namely, partitions that have the last element of your set on its own as the last part.
So in that case, take the second-to-last element, and apply recursively, etc, etc. Ultimately everything gets matched except for the partition where everything is separate and all the parts are in order. This partition has length equal to the size of the original set. So if the original set was even you get one more even one, and if the original set was odd you get one more odd one. Which, rephrased as a sum with a sign factor, yields the original statement.
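If anyone wants to brute-force both partition identities, here's a quick Python sketch (the set_partitions generator below is my own throwaway recursion, not from any library):

    from math import factorial

    def set_partitions(xs):
        # All partitions of the list xs into nonempty parts.
        if not xs:
            yield []
            return
        first, rest = xs[0], xs[1:]
        for p in set_partitions(rest):
            yield [[first]] + p  # first goes in its own part...
            for i in range(len(p)):  # ...or joins an existing part
                yield p[:i] + [[first] + p[i]] + p[i + 1:]

    # Summing (-1)^|tau| over ordered partitions (factor |tau|!) should give
    # (-1)^e; over cyclically ordered ones (factor (|tau|-1)!) should give 0.
    for e in range(2, 8):
        ps = list(set_partitions(list(range(e))))
        linear = sum((-1) ** len(p) * factorial(len(p)) for p in ps)
        cyclic = sum((-1) ** len(p) * factorial(len(p) - 1) for p in ps)
        print(e, linear, cyclic)  # expect (-1)**e and 0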
Would be interested to send this to the mathematician you're referring to, but I have no idea who that might be since you didn't say. :)
This was from the current ACX Open Thread.
Thanks! Went and posted over there.
I’m pretty sure that 2nd proof exists in some book or ebook somewhere. generatingfunctionology was the one that came to mind.
Impressive recall, but not novel reasoning.
I’m pretty sure that LLMs generalize from a deep model, can create novel assemblages, and are not just search engines.
The burden of proof would be on you to find the existing mathematical, erhm, proof.