"return to table of content"

OpenAI researchers warned board of AI breakthrough ahead of CEO ouster

wbhart
206 replies
12h24m

I feel very comfortable saying, as a mathematician, that the ability to solve grade school maths problems would not be at all a predictor of ability to solve real mathematical problems at a research level.

LLMs fail at solving mathematical problems because: 1) they are terrible at arithmetic; 2) they are terrible at algebra; 3) most importantly, they are terrible at complex reasoning (more specifically, they mix up quantifiers and don't really understand the complex logical structure of many arguments); and 4) they (current LLMs) cannot backtrack when they find that what they already wrote turned out not to lead to a solution, and it is too expensive to give them the thousands of restarts they'd require to randomly guess their way through the problem even if you did give them that facility.

Solving grade-school problems might mean progress in 1 and 2, but that is not at all impressive, as there are perfectly good tools out there that solve those problems just fine, and old-style AI researchers have built perfectly good tools for 3. The hard problem to solve is problem 4, and this is something you teach people how to do at a university level.

(I should add that another important problem is what is known as premise selection. I didn't list that because LLMs have actually been shown to manage this ok in about 70% of theorems, which basically matches records set by other machine learning techniques.)

(Real mathematical research also involves what is known as lemma conjecturing. I have never once observed an LLM do it, and I suspect they cannot do so. Basically the parameter set of the LLM dedicated to mathematical reasoning is either large enough to model the entire solution from end to end, or the LLM is likely to completely fail to solve the problem.)

I personally think this entire article is likely complete bunk.

Edit: after reading replies I realise I should have pointed out that humans do not simply backtrack. They learn from failed attempts in ways that LLMs do not seem to. The material they are trained on surely contributes to this problem.

nostrademons
48 replies
11h7m

What I wonder, as a computer scientist:

If you want to solve grade school math problems, why not use an 'add' instruction? It's been around since the 50s, runs a billion times faster than an LLM, every assembly-language programmer knows how to use it, every high-level language has a one-token equivalent, and it doesn't hallucinate answers (other than integer overflow).

We also know how to solve complex reasoning chains that require backtracking. Prolog has been around since 1972. It's not used that much because that's not the programming problem that most people are solving.
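
For what it's worth, you don't even need Prolog for the backtracking part; here's a minimal Python sketch of the same idea on a made-up map-colouring toy (the map and colours are purely illustrative):

    # Depth-first search with backtracking over a tiny constraint problem.
    NEIGHBOURS = {
        "A": ["B", "C"],
        "B": ["A", "C"],
        "C": ["A", "B", "D"],
        "D": ["C"],
    }
    COLOURS = ["red", "green", "blue"]

    def solve(assignment=None):
        assignment = assignment or {}
        if len(assignment) == len(NEIGHBOURS):
            return assignment                      # every region coloured
        region = next(r for r in NEIGHBOURS if r not in assignment)
        for colour in COLOURS:
            if all(assignment.get(n) != colour for n in NEIGHBOURS[region]):
                result = solve({**assignment, region: colour})
                if result:                         # this branch worked
                    return result
            # otherwise: abandon this choice and try the next colour
        return None                                # dead end, caller backtracks

    print(solve())   # e.g. {'A': 'red', 'B': 'green', 'C': 'blue', 'D': 'red'}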

Why not use a tool for what it's good for and pick different tools for other problems they are better for? LLMs are good for summarization, autocompletion, and as an input to many other language problems like spelling and bigrams. They're not good at math. Computers are really good at math.

There's a theorem that an LLM can compute any computable function. That's true, but so can lambda calculus. We don't program in raw lambda calculus because it's terribly inefficient. Same with LLMs for arithmetic problems.

seanhunter
11 replies
8h30m

There is a general result in machine learning known as "the bitter lesson"[1], which is that methods which come from specialist knowledge tend to be beaten in the long run by methods which rely on brute-force computation, because of Moore's law and the ability to scale things by distributed computing. So the reason people don't use the "add instruction"[2], for example, is that over the last 70 years of attempting to build out systems which do exactly what you are proposing, they have found that not to work very well, whereas sacrificing what you are calling "efficiency" (which they would think of as special-purpose, domain-specific knowledge) turns out to give you a lot in terms of generality. And they can make up the lost efficiency by throwing more hardware at the problem.

[1] http://www.incompleteideas.net/IncIdeas/BitterLesson.html

[2] Which the people making these models are familiar with. The whole thing is a trillion+ parameter linear algebra crunching machine after all.

qsort
8 replies
8h5m

As someone with a CS background myself, I don't think this is what GP was talking about.

Let's forget for a moment that stuff has to run on an actual machine. If you had to represent a quadratic equation, would you rather write:

(a) x^2 + 5x + 4 = 0

(b) the square of the variable plus five times the variable plus four equals zero

When you are trying to solve problems with a level of sophistication beyond the toy stuff you usually see in these threads, formal language is an aid rather than an impediment. The trajectory of every scientific field (math, physics, computer science, chemistry, even economics!) is away from natural language and towards formal language, even before computers, precisely for that reason.

We have lots of formal languages (general-purpose programming languages, logical languages like Prolog/Datalog/SQL, "regular" expressions, configuration languages, all kinds of DSLs...) because we have lots of problems, and we choose the representation of the problem that most suits our needs.

Unless you are assuming you have some kind of superintelligence that can automagically take care of everything you throw at it, natural language breaks down when your problem becomes wide enough or deep enough. In a way this is like people making Rube-Goldberg contraptions with Excel. 50% of my job is cleaning up that stuff.

ben_w
2 replies
7h57m

I assumed seanhunter was suggesting getting the LLM to convert x^2 + 5x + 4 = 0 to a short bit of source code to solve for x.
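
Something like this minimal sketch of the kind of code it might emit (using sympy; the exact code would of course vary):

    # Solve the quadratic from the example above symbolically with sympy.
    from sympy import symbols, solve, Eq

    x = symbols("x")
    print(solve(Eq(x**2 + 5*x + 4, 0), x))   # [-4, -1]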

IIRC Wolfram Alpha has (or had, hard to keep up) a way to connect with ChatGPT.

seanhunter
1 replies
7h25m

It does. This is the plugins methodology described in the Toolformer paper which I've linked elsewhere[1]. The model learns that for certain types of problems, certain specific "tools" are the best way to solve the problem. The problem, of course, is that it's simple to argue that the LLM learns to use the tool(s) and can't reason itself about the underlying problem. The question boils down to whether you're more interested in machines which can think (whatever that means) or in having a super-powered co-pilot which can help with a wide variety of tasks. I'm quite biased towards the second, so I have the Wolfram Alpha plugin enabled in my ChatGPT. I can't say it solves all the math-related hallucinations I see, but I might not be using it right.

[1] But here it is again https://arxiv.org/abs/2302.04761

vidarh
0 replies
6h50m

GPT4 does this even without explicitly enabling plugins now, by constructing Python. If you want it to actually reason through it, you now need to ask it, sometimes fairly forcefully/in detail, before it will indulge you and not omit steps. E.g. see [1] for the problem given above.

But as I noted elsewhere, training its ability to do it from scratch matters not for the ability to do it from scratch, but for the transferability of the reasoning ability. And so I think that while it's a good choice for OpenAI to make it automatically pick more effective strategies to give the answer it's asked for, there is good reason for us to still dig into its ability to solve these problems "from scratch".

[1] https://chat.openai.com/share/694251c9-345b-4433-a856-7c38c5...

vintermann
1 replies
5h58m

There are some domains that are in the twilight zone between language and deductive, formal reasoning. I've been into genealogy over the last year. It's very often deductive "detective work": say there are four women in a census with the same name and place as the one listed on a birth certificate you're investigating. Which of them is it? You may rule one out on hard evidence (the census suggests she would have been 70 when the birth would have happened), one on linked evidence (this one is the right age, but it's definitely the same one who died 5 years later and we know the child's mother didn't), one on combined softer evidence (she was in a fringe denomination and at the upper end of the age range), and then you're left with one, etc.

Then as you collect more evidence you find that the age listed on the first one in the census was wildly off due to a transcription error and it's actually her.

You'd think some sort of rule-based system and database might help with these sorts of things. But the historical experience of expert systems is that you then often automate the easy bits at the cost of demanding even more tedious data entry. And you can't divorce data entry and deduction from each other either, because without context, good luck reading out a rare last name in the faded ink of some priest's messy gothic handwriting.

It feels like language models should be able to help. But they can't, yet. And it fundamentally isn't because they suck at grade school math.

Even linguistics, not something I know much about but another discipline where you try to make deductions from tons and tons of soft and vague evidence - you'd think language models, able to produce fluent text in more languages than any human, might be of use there. But no, it's the same thing: it can't actually combine common sense soft reasoning and formal rule-oriented reasoning very well.

igleria
0 replies
3h44m

> You'd think some sort of rule-based system and database might help with these sorts of things.

sounds like belief change systems (a bit) to me!

https://plato.stanford.edu/entries/logic-belief-revision/

seanhunter
1 replies
6h45m

I quite agree and so would Wittgenstein, who (as I understand it) argued that precise language is essential to thought and reasoning[1]. I think one of the key things here is that often what we think of as reasoning boils down to taking a problem in the real world and building a model of it using some precise language that we can then apply some set of known tools to. Your example of a quadratic is perfect, because of course when I see (a) I know right away that it's an upwards-facing parabola with a line of symmetry at -5/2, that the roots are at -4 and -1, etc., whereas if I saw (b) I would first have to write it down to get it into a proper form I could reason about.

I think this is a fundamental problem with the "chat" style of interaction with many of these models (that the language interface isn't the best way of representing any specific problem even if it's quite a useful compromise for problems in general). I think an intrinsic problem of this class of model is that they only have text generation to "hang computation off" meaning the "cognative ability" (if we can call it that) is very strongly related to how much text it's generating for a given problem which is why that technique of prompting using chain of thought generates much better results for many problems[2].

[1] Hence the famous payoff line "Whereof one cannot speak, thereof one must be silent"

[2] And I suspect why in general GPT-4 seems to have got a lot more verbose. It seems to be doing a lot of thinking out loud in my use, which gives better answers than if you ask it to be terse and just give the answer or to give the answer first and then the reasoning, both of which generally generate inferior answers in my experience and in the research eg https://arxiv.org/abs/2201.11903

qsort
0 replies
6h8m

> I quite agree and so would Wittgenstein

It depends on whether you ask him before or after he went camping -- but yeah, I was going for an early-Wittgenstein-esque "natural language makes it way too easy to say stuff that doesn't actually mean anything" (although my argument is much more limited).

> I think this is a fundamental problem with the "chat" style of interaction

The continuation of my argument would be that if the problem is effectively expressible in a formal language, then you likely have way better tools than LLMs to solve it. Tools that solve it every time, with perfect accuracy and near-optimal running time, and critically, tools that allow solutions to be composed arbitrarily.

AlphaGo and NNUE for computer chess, which are often cited for some reason as examples of this brave new science, would be completely worthless without "classical" tree-search techniques straight out of Russell & Norvig.

Hence my conclusion, contra what seems to be the popular opinion, is that these tools are potentially useful for some specific tasks, but make for very bad "universal" tools.

wegfawefgawefg
0 replies
1h24m

The ML method doesn't require you to know how to solve the problem at all, and could someday presumably develop novel solutions, not just high-efficiency symbolic graph search.

rgavuliak
0 replies
7h47m

I would mention that while, yes, you can just throw computational power at the problem, the contribution of human expertise didn't disappear. It moved from creating an 'add' instruction to coming up with new neural net architectures, and we've seen a lot of those ideas be super useful and push the boundaries.

omnicognate
0 replies
4h59m

The bitter lesson isn't a "general result". It's an empirical observation (and extrapolation therefrom) akin to Moore's law itself. As with Moore's law there are potential limiting factors: physical limits for Moore's law and availability and cost of quality training data for the bitter lesson.

Closi
11 replies
10h11m

Why would we teach kids maths then, when they can use a calculator? It's much easier and faster for them.

I believe it's because having a foundational understanding of maths and logic is important when solving other problems, and if you are looking to create an AI that can generally solve all problems it should probably have some intuitive understanding of maths too.

i.e. if we want an LLM to be able to solve unsolved theorems in the future, this requires a level of understanding of maths that is more than 'teach it to use a calculator'.

More broadly, I can imagine a world where LLM training is a bit more 'interactive' - right now if you ask it to play a game of chess with you it fails, but it has only ever read about chess and past games and guesses the next token based on that. What if it could actually play a game of chess - would it get a deeper appreciation for the game? How would this change its internal model for other questions (e.g. would this make it answer better at questions about other games, or even game theory?)

comex
6 replies
9h32m

Judging by some YouTube videos I’ve seen, ChatGPT with GPT-4 can get pretty far through a game of chess. (Certainly much farther than GPT-3.5.) For that duration it makes reasonably strategic moves, though eventually it seems to inevitably lose track of the board state and start making illegal moves. I don’t know if that counts as being able to “actually play a game”, but it does have some ability, and that may have already influenced its answers about the other topics you mentioned.

vczf
5 replies
8h37m

What if you encoded the whole game state into a one-shot completion that fits into the context window every turn? It would likely not make those illegal moves. I suspect it's an artifact of the context window management that is designed to summarize lengthy chat conversations, rather than an actual limitation of GPT4's internal model of chess.
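
Roughly what I have in mind, as a sketch using the python-chess library (ask_llm() here is a placeholder for whatever model/API call you use):

    # Re-send the full board state (FEN) plus move history every turn,
    # and reject illegal moves instead of letting the model drift.
    import chess

    def play_turn(board: chess.Board, ask_llm) -> None:
        side = "White" if board.turn == chess.WHITE else "Black"
        history = " ".join(m.uci() for m in board.move_stack)
        prompt = (
            f"You are playing chess as {side}.\n"
            f"Current position (FEN): {board.fen()}\n"
            f"Moves so far (UCI): {history}\n"
            "Reply with a single legal move in UCI notation."
        )
        reply = ask_llm(prompt).strip()
        move = chess.Move.from_uci(reply)
        if move not in board.legal_moves:
            raise ValueError(f"Illegal move from model: {reply}")
        board.push(move)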

actionfromafar
4 replies
7h6m

I am sorry, but I thought it was a bold assumption it has an internal model of chess?

PoignardAzur
1 replies
6h48m

Not that bold, given the results from OthelloGPT.

We know with reasonable certainty that an LLM fed on enough chess games will eventually develop an internal chess model. The only question is whether GPT4 got that far.

tedajax
0 replies
3h26m

Doesn't really seem like an internal chess model if it's still probabilistic in nature. Seems like it could still produce illegal moves.

vidarh
0 replies
6h49m

Having an internal model of chess and maintaining an internal model of the game state of a specific given game when it's unable to see the board are two very different things.

EDIT: On re-reading I think I misunderstood you. No, I don't think it's a bold assumption to think it has an internal model of it at all. It may not be a sophisticated model, but it's fairly clear that LLM training builds world models.

baq
0 replies
6h19m

Why?

Or, given https://thegradient.pub/othello/, why wouldn't it have an internal model of chess? It probably saw more than enough example games and quite a few chess books during training.

smeej
1 replies
35m

> Why would we teach kids maths then, when they can use a calculator? It's much easier and faster for them.

I am five years older than my brother, and we happened to land just on opposite sides of when children were still being taught mental arithmetic and when it was assumed they would, in fact, have calculators in their pockets.

It drives him crazy that I can do basic day-to-day arithmetic in my head faster than he can get out his calculator to do it. He feels like he really did get cheated out of something useful because of the proliferation of technology.

wegfawefgawefg
0 replies
14m

Skull has limited volume. What room is unused by one capacity may be used by another.

vidarh
0 replies
6h29m

> More broadly, I can imagine a world where LLM training is a bit more 'interactive'

Well, yes, assume that every conversation you have with ChatGPT without turning off history makes it into the training set.

ChatGTP
0 replies
6h17m

It's also fun to use your brain, I guess. I think we've truly forgotten that life should be about fun.

Watching my kids grow up, they just have fun doing things like trying to crawl, walk or drink. It's not about being the best at it, or the most efficient, it's just about the experience.

Right now maths is taught in a boring way, but knowing it can help us lead more enjoyable lives. When maths is taught in an enjoyable way AND people get results out of it - well, that's glorious.

xwolfi
7 replies
11h4m

You're missing the point: who's using the 'add' instruction? You. We want 'something' to think about using the 'add' instruction to solve a problem.

We want to remove the human from the solution design. It would help us tremendously tbh, just like, I don't know, Google Maps helped me never have to look for directions ever again.

marshray
6 replies
10h22m

When the solution requires arithmetic, one trick is to simply ask GPT to write a Python program to compute the answer.

There's your 'add'.

davidwritesbugs
4 replies
9h40m

Interesting - how do you use this idea? Say you prompt the LLM with "create a Python add function Foo to add a number to another number", then "using Foo, add 1 and 2", or somesuch. What's to stop it hallucinating and saying "Sure, let me do that for you, Foo of 1 and 2 is 347. Please let me know if you need anything else."?

vidarh
0 replies
6h34m

With ChatGPT you now just state your problem, and if it looks like math, it will do so. E.g. see this transcript:

https://chat.openai.com/c/dd8de3f7-a50c-4b6d-bd3f-b52ed996d3...

kolinko
0 replies
4h36m

It writes a function, and then you provide it to an interpreter which does the calculation; GPT then proceeds to do the rest using that output.

That’s how LangChain works, and ChatGPT plugins and GPT function calling. It has proven to be pretty robust - that is, GPT-4 realising when it needs to use a tool or write code for a calculation, and then using the output.
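
A stripped-down sketch of that loop (ask_llm, extract_code and run_in_sandbox are placeholders here, not any particular library's API):

    # Model writes code -> we run it -> the output goes back to the model.
    def answer_with_tool(question, ask_llm, extract_code, run_in_sandbox):
        draft = ask_llm(
            "Solve the following problem. If it needs calculation, "
            "reply with a Python snippet that prints the result.\n\n" + question
        )
        code = extract_code(draft)      # e.g. pull out a fenced code block
        if code is None:
            return draft                # the model answered directly
        result = run_in_sandbox(code)   # execute the snippet, capture stdout
        return ask_llm(
            "The code you wrote printed:\n" + result +
            "\nUse that output to give the final answer to: " + question
        )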

LASR
0 replies
8h13m

We’re way beyond this kind of hallucination now. OpenAI’s models are frighteningly good at producing code.

You can even route back runtime errors and ask it to fix its own code. And it does.

It can write code and even write a test to test that code. Give it an interpreter and you’re all set.

IanCal
0 replies
9h7m

Nothing stops it from writing a recipe for soup for every request, but it does tend to do what it's told. When asked to do mathsy things and told it's got a tool for doing those, it tends to lean into that, if it's a good LLM.

vidarh
0 replies
6h35m

GPT4 now does this by default. You'll see an "analyzing" step before you get the answer, and a link which will show the generated Python.

sgt101
5 replies
7h50m

Can LLMs compute any computable function? I thought that an LLM can approximate any computable function, if the function is within the distribution that it is trained on. I think it's jolly interesting to think about different axiomatizations in this context.

Also, we know that LLMs can't do a few things - arithmetic, inference and planning are in there. They look like they can, because they retrieve discussions from the internet that contain the problems, but when they are tested out of distribution then all of a sudden they fail. However, some other NNs can do these things, because they have the architecture and infrastructure and training that enables it.

There is a question for some of these as to whether we want to make NNs do these tasks or just provide calculators, like for grade-school students, but on the other hand something like AlphaZero looks like it could find new ways of doing some problems in planning. The challenge is to find architectures that integrate the different capabilities we can implement in a useful and synergistic way. Lots of people have drawn diagrams about how this can be done, then presented them with lots of hand waving at big conferences. What I love is that John Laird has been building this sort of thing for, like, forty years, and is roundly ignored by NN people for some reason.

Maybe because he keeps saying it's really hard and then producing lots of reasons to believe him?

RamblingCTO
2 replies
7h11m

I still believe that A(G)I will consist of subsystems and different network architectures (if NNs are the path to that), just like we humans have.

trashtester
1 replies
1h54m

Many of the "specialist" parts of the brain are still made from cortical columns, though. Also, they are in many cases partly interchangeable, with some reduction in efficiency.

Transformers may be like that, in that they can do generalized learning from different types of input, with only minor modifications needed to optimize for different input (or output) modes.

RamblingCTO
0 replies
29m

Afaik some are similar, yes. But we also have different types of neurons etc. Maybe we'll get there with a generalist approach, but imho the first step is a patchwork of specialists.

wegfawefgawefg
0 replies
11m

ML still can't do sin - functions that repeat periodically.

vidarh
0 replies
6h37m

> Can LLMs compute any computable function?

In a single run, obviously not any, because its context window is very limited. With a loop and access to an "API" (or a willing conversation partner agreeing to act as one) to operate a Turing tape mechanism? It becomes a question of the ability to coax it into complying. It trivially has the ability to carry out every step, and your main challenge becomes getting it to stick to it over and over.

One step "up", you can trivially get GPT4 to symbolically solve fairly complex runs of instructions of languages it can never have seen before if you specify a grammar and then give it a program, with the only real limitation again being getting it to continue to adhere to the instructions for long enough before it starts wanting to take shortcuts.

In other words: it can compute any computable function about as well as a reasonably easily distractible/bored human.

panarky
2 replies
10h8m

>> the ability to solve grade school maths problems would not be at all a predictor of ability to solve real mathematical problems at a research level

> If you want to solve grade school math problems, why not use an 'add' instruction?

Certainly the objective is not for the AI to do research-level mathematics.

It's not really even to do grade-school math.

The point is that grade-school math requires reasoning capability that transcends probabilistic completion of the next token in a sequence.

And if Q-Star has that reasoning capability, then it's another step-function leap toward AGI.

ethanbond
0 replies
4h37m

This is so profoundly obvious you have to wonder about the degree of motivated reasoning behind people’s attempts to cast this as “omg it can add… but so can my pocket calculator!”

GTP
0 replies
4h1m

> Certainly the objective is not for the AI to do research-level mathematics.

The problem is that there are different groups of people with different ideas about AI, and when talking about AI it's easy to end up tackling the ideas of a specific group but forgetting about the existence of the others. In this specific example, surely there are AI enthusiasts who see no limits to the applications of AI, including research-level mathematics.

vidarh
1 replies
7h58m

There's no value in an LLM doing arithmetic for the sake of doing arithmetic with the LLM. There's value in testing an LLM's ability to follow the rules for doing arithmetic that it already knows, because the ability to recognise that a problem matches a set of rules it already knows in part or whole, and then applying those rules with precision, is likely to generalise to overall far better problem-solving abilities.

By all means, we should give LLMs lots and lots of specialised tools to let them take shortcuts, but that does not remove the reasons for understanding how to strengthen the reasoning abilities that would also make them good at maths.

EDIT: After having just coerced the current GPT4 to do arithmetic manually: it appears to have drastically improved in its ability to systematically follow the required method, while ironically being far less willing to do so (it took multiple attempts before I got it to stop taking shortcuts that appeared to involve recognising this was a calculation it could use tooling to carry out, or ignoring my instructions to do it step by step and just doing it "in its head" the way a recalcitrant student might). It's been a while since I tested this, but this is definitely "new-ish".

namibj
0 replies
6h11m

Gaslighting LLMs does wonders. In this case, e.g., priming it by convincing it the tool is either inaccessible/overloaded/laggy, or here perhaps, telling it the python tool computed wrong and can thus not be trusted.

setuid9002
1 replies
5h31m

I think the answer is Money, Money, Money. Sure, it is 1000000000x more expensive in compute power, and error-prone on top as well, to let an LLM solve an easy problem. But the monopolies generate a lot of hype around it to get more money from investors. Same as the self-driving car hype was. Or the real-time raytracing insanity in computer graphics. If one hype dies they artificially generate a new one. For me, I just watch all the ships sink to the ground. It is gold-level comedy. Btw AGI is coming, haha, sure, we developers will be replaced by a script which cannot put B, A, C in a logical sequence. And this already needs massive town-sized data centers to train.

resource0x
0 replies
3h22m

> If one hype dies they artificially generate a new one

They have a pipeline of hypes ready to be deployed at a moment's notice. The next one is quantum, it's already gathering in the background. Give it a couple of years.

curling_grad
1 replies
10h11m

Actually, OpenAI did research[0] on solving some hard math problems by integrating a language model and the Lean theorem prover some time ago.

[0]: https://openai.com/research/formal-math

singularity2001
0 replies
9h20m

how do they achieve 41.2% in high school Olympiads but only 55% for grade school problems?

PS: also I thought GPT4 already achieved 90% in some university math grades? Oh I remember that was multiple-choice

throwuwu
0 replies
2h37m

What you’re proposing is equivalent to training a monkey (or a child for that matter) to punch buttons that correspond to the symbols it sees without actually teaching it what any of the symbols mean.

insomagent
38 replies
11h34m

Let's say a model runs through a few iterations and finds a small, meaningful piece of information via "self-play" (iterating with itself without further prompting from a human.)

If the model then distills that information down to a new feature, and re-examines the original prompt with the new feature embedded in an extra input tensor, then repeats this process ad-infinitum, will the language model's "prime directive" and reasoning ability be sufficient to arrive at new, verifiable and provable conjectures, outside the realm of the dataset it was trained on?

If GPT-4,5,...,n can progress in this direction, then we should all see the writing on the wall. Also, the day will come where we don't need to manually prepare an updated dataset and "kick off a new training". Self-supervised LLMs are going to be so shocking.

wbhart
36 replies
11h24m

People have done experiments trying to get GPT-4 to come up with viable conjectures. So far it does such a woefully bad job that it isn't worth even trying.

Unfortunately there are rather a lot of issues which are difficult to describe concisely, so here is probably not the best place.

Primary amongst them is the fact that an LLM would be a horribly inefficient way to do this. There are much, much better ways, which have been tried, with limited success.

gmerc
35 replies
11h0m

After a year the entire argument you make boils down to “so far”.

Terr_
32 replies
10h21m

Whereas your post sounds like "Just give the approach more time, it shall continue to incrementally improve until it finally works someday, cuz reasons."

Early attempts at human flight approached it by strapping wings to people's arms and flapping: Do you think that would have eventually worked too, if only we had just given it a bit more time and faith?

xcv123
29 replies
10h7m

> Just give the approach more time, it shall continue to incrementally improve until it finally works someday, cuz reasons

Yes. Because we haven't yet reached the limit of deep learning models. GPT-3.5 has 175 billion parameters. GPT-4 has an estimated 1.8 trillion parameters. That was nearly a year ago. Wait until you see what's next.

meheleventyone
22 replies
9h43m

Why would adding more parameters suddenly make it better at this sort of reasoning? It feels a bit like a “god of the gaps”, where it’ll just stop being a stochastic parrot in just a few more million parameters.

Al-Khwarizmi
15 replies
9h6m

I don't think it's guaranteed, but I do think it's very plausible, because we've seen these models gain emergent abilities at every iteration, just from sheer scaling. So extrapolation tells us that they may keep gaining more capabilities (we don't know exactly how they do it, though, so of course it's all speculation).

I don't think many people would describe GPT-4 as a stochastic parrot already... when the paper that coined (or at least popularized) the term came out in early 2021, the term made a lot of sense. In late 2023, with models that at the very least show clear signs of creativity (I'm sticking to that because "reasoning" or not is more controversial), it's relegated to reductionistic philosophical arguments, but it isn't really a practical description anymore.

meheleventyone
5 replies
8h57m

I don’t think we should throw out the stochastic parrot so easily. As you say there are “clear signs of creativity” but that could be it getting significantly better as a stochastic parrot. We have no real test to tell mimicry apart from reasoning and as you note we also can only speculate about how any of it works. I don’t think it’s reductionist in light of that, maybe cautious or pessimistic.

Al-Khwarizmi
4 replies
8h18m

They can write original stories in a setting deliberately designed to not be found in the training set (https://arxiv.org/abs/2310.08433). To me that's rather strong evidence of being beyond stochastic parrots by now, although I must concede that we know so little about how everything works, that who knows.

GTP
3 replies
3h46m

I didn't look at the paper but... How do you design a setting in a way that you're sure there isn't a similar one in the training set, when we don't even precisely know what the training set for the various GPT models was?

Al-Khwarizmi
2 replies
2h16m

Basically by making it unlikely enough to exist.

The setting in the paper is about narrating a single combat between Ignatius J. Reilly and a pterodactyl. Ignatius J. Reilly is a literary character with some very idiosyncratic characteristics, that appears in a single book, where he of course didn't engage in single combats at all or interact with pterodactyls. He doesn't seem to have been the target of fanfiction either (which could be a problem if characters like, say, Harry Potter or Darth Vader were used instead), so the paper argues that it's very unlikely that a story like that had been ever written at all prior to this paper.

GTP
1 replies
1h55m

Well, we've been writing stories for thousands of years, so I'm a bit skeptical that the concept of "unlikely enough to exist" is a thing. More to the specific example, maybe there isn't a story about this specific character fighting a pterodactyl, but surely there are tons of stories of people fighting all kind of animals, and maybe there are some about someone fighting a pterodactyl too.

Al-Khwarizmi
0 replies
1h19m

Sure, but the evaluation explicitly addresses (among other points) how well that specific character is characterized. If an LLM took a pre-existing story about (say) Superman fighting a pterodactyl, and changed Superman to Ignatius J. Reilly, it wouldn't get a high rating.

TerrifiedMouse
4 replies
8h3m

> very least show clear signs of creativity

Do you know how that “creativity” is achieved? It’s done with a random number generator. Instead of having the LLM pick the absolute most likely next token, they have it select from a set of most likely next tokens - size of the set depends on “temperature”.

Set temperature to 0, and the LLM will talk in circles and not really say anything interesting. Set it too high and it will output nonsense.

The whole design of LLMs doesn’t seem very well thought out. Things are done a certain way not because it makes sense but because it seems to produce “impressive” results.
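
For concreteness, the sampling step looks roughly like this (a minimal sketch with made-up logits; real implementations add top-p and other tweaks):

    # Temperature rescales the scores before softmax; top-k limits the
    # candidate set the next token is drawn from. The logits are made up.
    import numpy as np

    def sample_next_token(logits, temperature=1.0, top_k=5, rng=np.random.default_rng()):
        logits = np.asarray(logits, dtype=float)
        if temperature == 0:                 # greedy: always take the top token
            return int(np.argmax(logits))
        keep = np.argsort(logits)[-top_k:]   # indices of the k highest-scoring tokens
        scaled = logits[keep] / temperature
        probs = np.exp(scaled - scaled.max())
        probs /= probs.sum()
        return int(rng.choice(keep, p=probs))

    fake_logits = [2.0, 1.5, 0.3, -1.0, 0.9, 1.1]   # one score per token in a tiny vocab
    print(sample_next_token(fake_logits, temperature=0.8, top_k=3))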

xcv123
2 replies
7h24m

> Set temperature to 0, and the LLM will talk in circles and not really say anything interesting. Set it too high and it will output nonsense.

Sounds like some people I know, at both extremes.

> The whole design of LLMs doesn’t seem very well thought out. Things are done a certain way not because it makes sense but because it seems to produce “impressive” results.

They have been designed and trained to solve natural language processing tasks, and are already outperforming humans on many of those tasks. The transformer architecture is extremely well thought out, based on extensive R&D. The attention mechanism is a brilliant design. Can you explain exactly which part of the transformer architecture is poorly designed?

TerrifiedMouse
1 replies
5h50m

> They have been designed and trained to solve natural language processing tasks

They aren’t really designed to do anything, actually. LLMs are models of human languages - it’s literally in the name, Large Language Model.

https://writings.stephenwolfram.com/2023/02/what-is-chatgpt-...

I’m sorry but I don’t trust something that uses a random number generator as part of its output generation.

xcv123
0 replies
3h51m

> They aren’t really designed to do anything, actually. LLMs are models of human languages - it’s literally in the name, Large Language Model.

No. And the article you linked to does not say that (because Wolfram is not an idiot).

Transformers are designed and trained specifically for solving NLP tasks.

> I’m sorry but I don’t trust something that uses a random number generator as part of its output generation.

The human brain also has stochastic behaviour.

Al-Khwarizmi
0 replies
5h35m

I know that, but to me that statement isn't much more helpful than "modern AI is just matrix multiplication" or "human intelligence is just electric current through neurons".

Saying that it's done with a random number generator doesn't really explain the wonder of achieving meaningful creative output, as in being able to generate literature, for example.

lossolo
1 replies
4h53m

You can predict performance on certain tasks before training, and it's continuous:

https://twitter.com/mobav0/status/1653048872795791360

Al-Khwarizmi
0 replies
2h10m

I read that paper back in the day and honestly I don't find it very meaningful.

What they find is that for every emerging ability where an evaluation metric seems to have a sudden jump, there is some other underlying metric that is continuous.

The thing is that the metric with the jump is the one people would actually care about (like actually being able to answer questions correctly, etc.) while the continuous one is an internal metric. I don't think that refutes the existence of emerging abilities, it just explains a little bit of how they arise.

HarHarVeryFunny
1 replies
2h46m

People use the term "stochastic parrot" in different ways ... some just as a put-down ("it's just autocomplete"), but others like Geoff Hinton acknowledging that there is of course some truth to it (an LLM is, at the end of the day, a system who's (only) goal is to predict "what would a human say"), while pointing out the depth of "understanding" needed to be a really good at this.

There are fundamental limitations to LLMs though - a limit to what can be learned by training a system to predict the next word from a fixed training corpus. It can get REALLY good at that task, as we've seen, to the extent that it's not just predicting the next word but rather predicting an entire continuation/response that is statistically consistent with the training set. However, what is fundamentally missing is any grounding in anything other than the training set, which is what causes hallucinations/bullshitting. In a biological intelligent system, predicting reality is the goal, not just predicting what "sounds good".

LLMs are a good start in as much as they prove the power of prediction as a form of feedback, but to match biological systems we need a closed-loop cognitive architecture that can predict then self-correct based on mismatch between reality and prediction (which is what our cortex does).

For all of the glib prose that an LLM can generate, even if it seems to understand what you are asking (after all, it was trained with the goal of sounding good), it doesn't have the intelligence of even a simple animal like a rat that doesn't use language at all, but is grounded in reality.

xcv123
0 replies
1h49m

> even if it seems to understand what you are asking (after all, it was trained with the goal of sounding good)

It was trained not only to "sound good" aesthetically but also to solve a wide range of NLP tasks accurately. It not only "seems to" understand the prompt but it actually does have a mechanical understanding of it. With ~100 layers in the network it mechanically builds a model of very abstract concepts at the higher layers.

> it doesn't have the intelligence of even a simple animal

It has higher intelligence than humans by some metrics, but no consciousness.

vidarh
2 replies
5h0m

Why would it not? We've observed them getting significantly better through multiple iterations. It is quite possible they'll hit a barrier at some point, but what makes you believe this iteration will be the point where the advances stop?

meheleventyone
1 replies
4h38m

Because for what we’re discussing it would represent a step change in capability not an incremental improvement as we’ve seen.

vidarh
0 replies
4h35m

You're moving goal posts. You asked why it'd get better, not about a step change.

xcv123
1 replies
7h31m

> it’ll just stop being a stochastic parrot in just a few more million parameters.

It is not a stochastic parrot today. Deep learning models can solve problems, recognize patterns, and generate new creative output that is not explicitly in their training set. Aside from adding more parameters, there are new neural network architectures to discover and experiment with. Transformers aren't the final stage of deep learning.

sidlls
0 replies
1h53m

Probabilistically serializing tokens in a fashion that isn't 100% identical to training set data is not creative in the context of novel reasoning. If all it did was reproduce its training set it would be the grossest example of overfitting ever, and useless.

Any actually creative output from these models is by pure random chance, which is most definitely different from the deliberate human reasoning that has produced our intellectual advances throughout history. It may or may not be inferior: there's a good argument to be made that "random creativity" will outperform human capabilities due to the sheer scale and rate at which the models can evolve, but there's no evidence that this is the case (right now).

vbezhenar
0 replies
8h52m

Humans and other animals are definitely different when it comes to reasoning. At the same time, biologically humans and many other animals are very similar when it comes to the brain, but humans have more "processing power". So it's only natural to expect some emergent properties from increasing the number of parameters.

tarsinge
3 replies
9h33m

You are missing the point that it can be a model limit. LLMs were a breakthrough, but that doesn't mean they are a good model for some other problems, no matter the number of parameters. Language contains more than we thought, as GPT has impressively shown (i.e. semantics embedded in the syntax emerging from text compression), but still, not every intellectual process is language based.

xcv123
2 replies
7h47m

I know that, but deep learning is more than LLMs. Transformers aren't the final ultimate stage of deep learning. We haven't found the limit yet.

tarsinge
1 replies
7h7m

You were talking about the number of parameters on existing models. As the history of deep learning has shown, simply throwing more computing power at an existing approach will plateau and not result in a fundamental breakthrough. Maybe we'll find new architectures, but the point was that the current ones might be showing their limits, and we shouldn't expect the models to suddenly become good at something they are currently unable to handle because of "more parameters".

xcv123
0 replies
6h51m

Yes you're right I only mentioned the size of the model. The rate of progress has been astonishing and we haven't reached the end, in terms of both of size and algorithmic sophistication of the models. There is no evidence that we have reached a fundamental limit of AI in the context of deep learning.

lelanthran
1 replies
4h7m

Ever heard of something called diminishing returns?

The value improvement between 17.5b parameters and 175b parameters is much greater than the value improvement between 175b parameters and 18t parameters.

IOW, each time we throw 100 times more processing power at the problem, we get a measly 2x increase in value.

xcv123
0 replies
3h42m

Yes that's a good point. But the algorithms are improving too.

londons_explore
1 replies
8h16m

> Early attempts at human flight approached it by strapping wings to people's arms and flapping: Do you think that would have eventually worked too, if only we had just given it a bit more time and faith?

Interestingly, we now have human-powered aircraft... We have flown ~60km with human leg power alone. We've also got human-powered ornithopters (flapping-wing designs) which can fly, but only for very short times before the pilot is exhausted.

I expect that another 100 years from now, both records will be exceeded, although probably for scientific curiosity more than because human-powered flight is actually useful.

ben_w
0 replies
7h43m

I knew about the legs (there was a model in the London Science Museum when I was a kid), but I didn't know about the ornithopter.

https://en.wikipedia.org/wiki/UTIAS_Snowbird

13 years ago! Wow, how did I miss that?

ra
1 replies
10h51m

Indeed. An LLM is an application of a transformer trained with backpropagation. What stops you from adding a logic/mathematics "application" on the same transformer?

seanhunter
0 replies
8h27m

Nothing, and there are methods which allow these types of models to learn to use special purpose tools of this kind[1].

[1] https://arxiv.org/abs/2302.04761 Toolformer: Language Models Can Teach Themselves to Use Tools

jimmySixDOF
0 replies
10h30m

Yes, it seems like this is a direction to replace RLHF, so another way to scale without bare metal - and if not this, then it's still just a matter of time before some model optimization outperforms the raw epochs/parameters/tokens approach.

muskmusk
26 replies
12h9m

Friend, the creator of this new progress is a machine learning PhD with a decade of experience in pushing machine learning forward. He knows a lot of math too. Maybe there is a chance that he too can tell the difference between a meaningless advance and an important one?

neilk
9 replies
11h58m

I am neither a mathematician nor an LLM creator, but I do know how to evaluate interesting tech claims.

The absolute best-case scenario for a new technology is when it seems like a toy for nerds and doesn't outperform anything we have today, but the scaling path is clear.

Its problems just won't matter if it does that one thing with scaling. The web is a pretty good hypermedia platform, but a disastrously bad platform for most other computer applications. Nevertheless, the scaling of URIs and internet protocols has caused us to reorganize our lives around it. And then if there really are unsolvable problems with the platform, they just get offloaded onto users. Passwords? Privacy? Your problem now. Surely you know to use a password manager?

I think this new wave of AI is going to be like that. If they never solve the hallucination/confabulation issue, it's just going to become your problem. If they never really gain insight, it's going to become your problem to instruct them carefully. Your peers will chide you for not using a robust AI-guardrail thing or not learning the basics of prompt engineering like all the kids do instinctively these days.

wbhart
8 replies
11h31m

How on earth could you evaluate the scaling path with so little information? That's my point. You can't possibly know that a technology can solve a given kind of problem if it can only so far solve a completely different kind of problem which is largely unrelated!

Saying that performance on grade-school problems is predictive of performance on complex reasoning tasks (including theorem proving) is like saying that a new kind of mechanical engine that has 90% efficiency can be scaled 10x.

These kind of scaling claims drive investment, I get it. But to someone who understands (and is actually working on) the actual problem that needs solving, this kind of claim is perfectly transparent!

dwaltrip
2 replies
10h42m

For the current generative AI wave, this is how I understand it:

1. The scaling path is decreased val/test loss during training.

2. We have seen multiple times that large decreases in this loss have resulted in very impressive improvements in model capability across a diverse set of tasks (e.g. GPT-1 through GPT-4, and many other examples).

3. By now, there are tons of robust data demonstrating really nice relationships between model size, quantity of data, length of training, quality of data, etc. and decreased loss (one common functional form is sketched just after this list). Evidence keeps building that most multi-billion-parameter LLMs are probably undertrained, perhaps significantly so.

4. Ergo, we should expect continued capability improvement with continued scaling. Make a bigger model, get more data, get higher data quality, and/or train for longer and we will see improved capabilities. The graphs demand that it is so.
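
For reference, one concrete version of the relationship in point 3 is the parametric loss model fitted in Hoffmann et al. 2022 (the "Chinchilla" paper); the constants are fitted per model family rather than being universal:

    L(N, D) \approx E + A / N^{\alpha} + B / D^{\beta}

where N is the parameter count, D the number of training tokens, and E an irreducible loss term; the loss falls off as a power law in both N and D, which is why the extrapolated curves look so smooth.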

---

This is the fundamental scaling hypothesis that labs like OpenAI and Anthropic have been operating off of for the past 5+ years. They looked at the early versions of the curves mentioned above, extended the lines, and said, "Huh... These lines are so sharp. Why wouldn't it keep going? It seems like it would."

And they were right. The scaling curves may break at some point. But they don't show indications of that yet.

Lastly, all of this is largely just taking existing model architectures and scaling up. Neural nets are a very young technology. There will be better architectures in the future.

jacquesm
1 replies
6h23m

We're at the point now where the harder problem is obtaining the high quality data you need for the initial training in sufficient quantities.

dr_dshiv
0 replies
6h5m

These European efforts to create competitive LLMs need to know that.

OOPMan
2 replies
11h18m

Honestly, OpenAI seem more like a cult than a company to me.

The hyperbole that surrounds them fits the mould nicely.

hutzlibu
1 replies
8h30m

They did build the most advanced LLM tool, though.

dr_dshiv
0 replies
6h11m

Maybe it takes a cult

uoaei
0 replies
10h20m

Any claim of objective, quantitative measurement of "scaling" in LLMs is voodoo snake oil when measured against benchmarks consisting of "which questions does it answer correctly". Any machine learning PhD will admit this, albeit only in a quiet corner of a noisy bar after a few more drinks than is advisable, when they're earning money from companies who claim scaling wins on such benchmarks.

neilk
0 replies
1h11m

I didn’t say “certain success”, I said “interesting”

raincole
8 replies
11h27m

But he also has the incentive to exaggerate the AI's ability.

The whole idea of double-blind test (and really, the whole scientific methodology) is based on one simple thing: even the most experienced and informed professionals can be comfortably wrong.

We'll only know when we see it. Or at least when several independent research groups see it.

visarga
2 replies
10h35m

> even the most experienced and informed professionals can be comfortably wrong

That's the human hallucination problem. In science it's a very difficult issue to deal with; only in hindsight can you tell which papers from a given period were the good ones. It takes a whole scientific community to come up with the truth, and sometimes we fail.

auggierose
1 replies
9h21m

No. It takes just one person to come up with the truth. It can then take ages to convince the "scientific community".

visarga
0 replies
32m

Well, one person will usually add a tiny bit of detail to the "truth". It's still a collective task.

lokar
2 replies
11h8m

I thought (and could be wrong) that all of these concerns are based on a very low probability of a very bad outcome.

So: we might be close to a breakthrough, that breakthrough could get out of hand, then it could kill a billion+ people.

patrec
1 replies
10h22m

> I thought (and could be wrong) that all of these concerns are based on a very low probability of a very bad outcome.

Among knowledgeable people who have concerns in the first place, I'd say giving the probability of a very bad outcome of cumulative advances as "very low" is a fringe position. It seems to vary more between "significant" and "close to unity".

There are some knowledgeable people like Yann LeCun who have no concerns whatsoever but they seem singularly bad at communicating why this would be a rational position to take.

ben_w
0 replies
7h49m

Given how dismissive LeCun is of the capabilities of SotA models, I think he thinks the state of the art is very far from human, and will never be human-like.

Myself, I think I count as a massive optimist, as my P(doom) is only about 15% — basically the same as Russian Roulette — half of which is humans using AI to do bad things directly.

aidaman
1 replies
10h29m

Unlikely. We'll know when OpenAI has declared itself ruler of the new world, imposes martial law, and takes over.

Gud
0 replies
7h12m

Why would you ever know? Why would the singularity reveal itself in such an obvious way (until it's too late to stop it)?

seanhunter
3 replies
8h44m

That is as pure an example of the fallacy of argument from authority[1] as I have ever seen, especially when you consider that any nuance in the supposed letter from the researchers to the board will have been lost in the translation from "sources" to the journalist to the article.

[1] https://en.wikipedia.org/wiki/Argument_from_authority

abhpro
1 replies
8h20m

That fallacy's existence alone doesn't discount anything (nor have you shown it's applicable here); otherwise we'd throw out the entire idea of authorities and we'd be in trouble.

mejutoco
0 replies
7h44m

Authorities are useful within a context. Appealing to authority is not an argument. At most, it is an heuristic.

_Using_ this fallacy in an argument invalidates the argument (or shows it did not exist in the first place)

Eisenstein
0 replies
5h39m

When the person arguing uses their own authority (job, education) to give their answer relevance, then stating that the authority of another person is greater (job, education) to give that person's answer preeminence is valid.

smrtinsert
0 replies
4h44m

Ah, finally the engineer's approach to the news. I'm not sure why we have to have hot takes instead of dissecting the news and trying to tease out the how.

nobrains
0 replies
11h58m

Also, wbhart is referring to publicly released LLMs, while the OpenAI researchers are most likely referring to an un-released in-research LLM.

las_balas_tres
0 replies
9h5m

Sure... but that machine learning PhD has a vested interest in being optimistically biased in his observations.

poulpy123
13 replies
9h42m

I don't know about Q* of course, but all the tests I made with GPT4, and all that I've read and seen about it, show that it is unable to reason. It was trained with an unfathomable amount of data, so it can simulate reasoning very well, but it is unable to reason.

oezi
11 replies
9h29m

What is the difference between simulating reasoning very well and "actual" reasoning?

seanhunter
5 replies
8h57m

Actual reasoning shows the understanding and use of a model of the key features of the underlying problem/domain.

As a simple example that you can replicate using ChatGPT, ask it to solve some simple maths problem. Very frequently you will get a solution that looks like reasoning but is not, and reveals that it does not have an actual model of the underlying maths but is in fact doing text prediction based on a history of maths. For example, see here[1]. I ask it for some quadratics in x with some specification on the number of roots. It gives me what looks at first glance like a decent answer. Then I ask the same exact question but asking for quadratics in x and y[2]. Again the answer looks plausible, except that for the solution "with one real root" it says the solution has one real root when x + y = 1. Well, there are infinitely many real values of x and y such that x + y = 1, not one real root. It looks like it has solved the problem, but instead it has simulated the solving of the problem.
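
(To make that concrete with a hypothetical quadratic of the kind I mean - not necessarily the exact one from that transcript - sympy shows the issue immediately:)

    # A quadratic in x and y, e.g. (x + y - 1)**2 = 0, solved for x:
    from sympy import symbols, solve

    x, y = symbols("x y")
    print(solve((x + y - 1)**2, x))   # [1 - y]: one root *per value of y*,
                                      # i.e. infinitely many (x, y) solutions,
                                      # not "one real root"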

Likewise stacking problems, used to check for whether an AI has a model of the world. This is covered in "From task structures to world models: What do LLMs know?"[3] but for example here[4] I ask it whether it's easier to balance a barrel on a plank or a plank on a barrel. The model says it's easier to balance a plank on a barrel with an output text that simulates reasoning discussing center of mass and the difference between the flatness of the plank and the tendency of the barrel to roll because of its curvature. Actual reasoning would say to put the barrel on its end so it doesn't roll (whether you put the plank on top or not).

[1] https://chat.openai.com/share/64556be8-ad20-41aa-99af-ed5a42...

[2] https://chat.openai.com/share/2cd39197-dc09-4d07-a0d6-6cd800...

[3] https://arxiv.org/abs/2310.04276

[4] https://chat.openai.com/share/4b631a92-0d55-4ae5-8892-9be025...

chipsambos
4 replies
8h0m

I generally agree with what you're saying and the first half of your answer makes perfect sense but I think the second is unfair (i.e. "[is it] easier to balance a barrel on a plank or a plank on a barrel"). It's a trick question and "it" tried to answer in good faith.

If you were to ask the same question of a real person and they replied with the exact same answer you could not conclude that person was not capable of "actual reasoning". It's a bit of witch-hunt question set to give you the conclusion you want.

seanhunter
2 replies
7h20m

I didn't make up this methodology and it's genuinely not a trick question (or not intended as such), it's a simple example of an actual class of questions that researchers ask when trying to determine whether a model of the world exists. The paper I linked uses a ball and a plank iirc. Often they use a much wider range of objects, eg: something like "Suggest a stable way of stacking a laptop, a book, 4 wine glasses, a wine bottle and an orange" is one that I've seen in a paper for example.

chipsambos
1 replies
6h40m

OK, I believe it may not have been intended as a trick, but I think it is. As a human, I'd have assumed you meant the trickier balancing scenario, i.e. the plank and the barrel on its side.

The question you quoted ("Suggest a stable way of stacking a laptop, a book, 4 wine glasses, a wine bottle and an orange") I would consider much fairer, and cgpt3.5 gives a perfectly "reasonable" answer:

https://chat.openai.com/share/fdf62be7-5cb2-4088-9131-40e089...

seanhunter
0 replies
6h12m

What's interesting about that one is that I think that specific set of objects is part of its training set, because when I have played around with swapping out a few of them it sometimes goes really bananas.

seanhunter
0 replies
3h11m

I should have said: as I understand it, the point of this type of question is not that one particular answer is right and another is wrong; it's that, in giving an answer, the model will often do something really weird that shows it doesn't have a model of the world.

silvaring
1 replies
9h10m

Actual reasoning is made up of various biological feedback loops that happen in the body and brain. Essentially, your physical senses give you the ability to reason in the first place; without the eyes, ears etc. there is no ability to learn basic reasoning, which is why kids who are blind or mute from birth have huge issues learning about object permanence, spatial awareness etc. You can't expect human reasoning without human perception.

My question is: how does the AI perceive? Basically, how good is the simulation for its perception? If we know that, then we can probably assess its ability to reason, because we can compare it to the closest benchmark we have (your average human being). How do AIs see? How did they learn concepts in strings of words and pixels? How does a concept learnt in text carry through to images of colors, of shapes? Does it show a transfer of conceptual understanding across both two- and three-dimensional shapes?

I know these are more questions than answers, but they're just things that I've been wondering about.

ajuc
0 replies
7h59m

This ship can't swim because only living creatures swim. It's true but it only shows your definition sucks.

parentheses
0 replies
9h20m

I think the poster meant that it's capable of having a high probability of correct reasoning - simulating reasoning is lossy, actual reasoning is not. Though, human reasoning is still lossy.

Levitz
0 replies
5h12m

Being able to extrapolate with newly found data.

You can get an LLM to simulate it "discovering" the Pythagorean theorem, but can it actually, with the knowledge that was available at the time, discover the Pythagorean theorem by itself?

Any parent will tell you, it's easy to simulate discovery and reasoning, it's a trick played for kids all the time. The actual, real stuff, that's way harder.

Jare
0 replies
9h8m

Probably best to say "simulate the appearance of reasoning": looks and feels 100% acceptable at a surface level, but the actual details and conclusions are completely wrong / do not follow.

swombat
0 replies
5h54m

Similarly, AlphaGo and Stockfish are only able to simulate reasoning their way through a game of Go or a game of Chess.

That simulated reasoning is enough to annihilate any human player they're faced with.

As Dijkstra famously said, "whether Machines Can Think... is about as relevant as the question of whether Submarines Can Swim".

Submarines don't swim, cars don't walk or gallop, cameras don't paint or draw... So what?

Once AI can simulate reasoning better than we can do the genuine thing, the question really becomes utterly irrelevant to the likely outcome.

xcv123
4 replies
11h37m

I feel very comfortable saying, as a mathematician, that the ability to solve grade school maths problems would not be at all a predictor of ability to solve real mathematical problems at a research level.

At some point in the past, you yourself were only capable of solving grade school maths problems.

SantalBlush
3 replies
11h17m

The statement you quoted also holds for humans. Of those who can solve grade school math problems, very, very few can solve mathematical problems at a research level.

kgeist
1 replies
11h6m

We're moving the goalposts all the time. First we had the Turing test; now AI solving math problems "isn't impressive". Any small mistake is proof it cannot reason at all. Meanwhile, 25% of humans think the Sun revolves around the Earth and 50% of students get the bat and ball problem wrong.
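
(For anyone who hasn't seen it: the bat and ball problem says a bat and a ball cost $1.10 together and the bat costs $1.00 more than the ball; the intuitive answer of 10 cents for the ball is wrong, since with b the ball's price,

    \[ b + (b + 1.00) = 1.10 \;\Rightarrow\; 2b = 0.10 \;\Rightarrow\; b = 0.05, \]

so the ball costs 5 cents and the bat $1.05.)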

cheese_van
0 replies
3h42m

Thank you for mentioning the "bat and ball" problem. Having neither a math nor CS background, I hadn't heard of it - and got it wrong. And reflecting on why I got it wrong I gained a little understanding of my own flawed mind. Why did I focus on a single variable and not a relationship? It set my mind wandering and was a nice morsel to digest with my breakfast. Thanks!

xcv123
0 replies
11h13m

You missed the point. Deep learning models are in the early stages of development.

With recent advancements they can already outperform humans at many tasks that were considered to require AGI level machine intelligence just a few years ago.

gmt2027
4 replies
8h16m

We have an algorithm and computational hardware that will tune a universal function approximator to fit any dataset with emergent intelligence as it discovers abstractions, patterns, features and hierarchies.

So far, we have not yet found hard limits that cannot be overcome by scaling the number of model parameters, increasing the size and quality of training data or, very infrequently, adopting a new architecture.

The number of model parameters required to achieve a defined level of intelligence is a function of the architecture and training data. The important question is: what is N, the number of model parameters at which we cross an intelligence threshold and it becomes theoretically possible to solve mathematics problems at a research level, for an optimal architecture that we may not yet have discovered? Our understanding does not extend to the level where we can predict N, but I doubt that anyone still believes it is infinity after seeing what GPT4 can do.

This claim here is essentially a discovery that N may be much closer to where we are with today's largest models. Researchers at the absolute frontier are more likely to be able to gauge how close they are to a breakthrough of that magnitude from how quickly they are blowing past less impressive milestones like grade school math.

My intuition is that we are in a suboptimal part of the search space and it is theoretically possible to achieve GPT4 level intelligence with a model that is orders of magnitude smaller. This could happen when we figure out how to separate the reasoning from the factual knowledge encoded in the model.

waveBidder
3 replies
8h2m

Intelligence isn't a function unless you're talking about a function over every possible state of the universe.

vidarh
1 replies
4h46m

Intelligence must inherently be a function unless there is a third form of cause-effect transition that can't be modelled as a function of determinism and randomness.

mk67
0 replies
3h55m

Functions are by definition not random. Randomness would break: "In mathematics, a function from a set X to a set Y assigns to each element of X exactly one element of Y"

gmt2027
0 replies
5h36m

There are well described links between intelligence and information theory. Intelligence is connected to prediction and compression as measures of understanding.

Intelligence has nothing specific to do with The Universe as we know it. Any universe will do: a simulation, images, or a set of possible tokens. The universe is every possible input. The training set is a sampling drawn from the universe. LLMs compress this sampling and learn the processes and patterns behind it so well that they can predict what should come next without any direct experience of our world.

All machine learning models and neural networks are pure functions. Arguing that no function can have intelligence as a property is equivalent to claiming that artificial intelligence is impossible.

calf
4 replies
11h59m

But isn't AlphaGo a solution to a kind of specific mathematical problem? And one that it has passed with flying colors?

What I mean is: yes, neural networks are stochastic and that seems to be why they're bad at logic; on the other hand, it's not exactly hallucinating a game of Go, and that seems different from how neural networks are prone to hallucination and confabulation on natural language or X-ray imaging.

wbhart
3 replies
11h11m

Sure, but people have already applied deep learning techniques to theorem proving. There are some impressive results (which the press doesn't seem at all interested in because it doesn't have ChatGPT in the title).

It's really harder than one might imagine to develop a system which is good at higher order logic, premise selection, backtracking, algebraic manipulation, arithmetic, conjecturing, pattern recognition, visual modeling, has a good mathematical knowledge, is autonomous and fast enough to be useful.

For my money, it isn't just a matter of fitting a few existing jigsaw pieces together in some new combination. Some of the pieces don't exist yet.

visarga
1 replies
10h33m

You seem knowledgeable. Can you share a couple of interesting papers for theorem proving that came out in the last year? I read a few of them as they came out, and it seemed neural nets can advance the field by mixing "soft" language with "hard" symbolic systems.

wbhart
0 replies
9h50m

The field is fairly new to me. I'm originally from computer algebra, and somehow struggling into the field of ATP.

The most interesting papers to me personally are the following three:

* Making higher order superposition work. https://doi.org/10.1007/978-3-030-79876-5_24

* MizAR 60 for Mizar 50. https://doi.org/10.48550/arXiv.2303.06686

* Magnushammer: a Transformer-Based Approach to Premise Selection. https://doi.org/10.48550/arXiv.2303.04488

Your mileage may vary.

calf
0 replies
9h50m

Then your critique is about LLMs specifically.

But even there, can we say scientifically that LLMs cannot do math? Do we actually know that? And in my mind, that would imply LLMs cannot achieve AGI either. What do we actually know about the limitations of various approaches?

And couldn't people argue that it's not even necessary to think in terms of capabilities as if they were modules or pieces? Maybe just brute-force the whole thing, make a planetary scale computer. In principle.

Davidzheng
4 replies
12h6m

I agree that in and of itself it's not enough to be alarmed. Also, I have to say I don't really know what grade school mathematics means here (multiplication? Proving triangles are congruent?). But I think the question is whether the breakthrough is an algorithmic change in reasoning. If it is, then it could challenge all 4 of your limitations. Again, this article is low on details, so really we are arguing over our best guesses. But I wouldn't be so confident that an improvement on simple math problems due to algorithms couldn't have huge implications.

Also, do you remember what Go players said when AlphaGo beat Fan Hui? Change can come quickly.

wbhart
3 replies
11h48m

I think maybe I didn't make myself quite clear here. There are already algorithms which can solve advanced mathematical problems 100% reliably (prove theorems). There are even algorithms which can prove any correct theorem that can be stated in a certain logical language, given enough time. There are even systems in which these algorithms have actually been implemented.

My point is that no technology which can solve grade school maths problems would be viewed as a breakthrough by anyone who understood the problem. The fundamental problems which need to be solved are not problems you encounter in grade school mathematics. The article is just ill-informed.

tim333
0 replies
3h23m

no technology which can solve grade school maths problems would be viewed as a breakthrough ...

Not perhaps in the sense of making mathematicians redundant but it seems like a breakthrough for ChatGPT type programs.

You've got to remember these things have gone from kind of rubbish a year or so ago to being able to beat most students at law exams now and, by the sounds of it, beat students at math tests shortly. At that rate of progress they'd be competing with the experts before very long.

kenjackson
0 replies
9h39m

“Given enough time” makes that a useless statement. Every kid in college learns this.

The ability to eventually solve a given theorem isn’t interesting — especially if the time is longer than the time left in the universe.

It’s far more interesting to see if an AI can, given an arbitrarily stated problem make clear progress quickly.

himaraya
0 replies
11h43m

The article suggests the way Q* solves basic math problems matters more than the difficulty of the problems themselves. Either way, I think judging the claims made remains premature without seeing the supporting documentation.

d--b
3 replies
7h47m

You make the assumption that Q* is an LLM, but I think the OpenAI guys know very well that the current LLM architecture cannot achieve AGI.

As the name suggests, this thing is likely using some form of Q-learning algorithm, which makes it closer to the DeepMind models than to a transformer.

My guess is that they pipe their LLM into some Q-learnt net. The LLM may transform a natural language task into some internal representation that can then be handled by the Q-learnt model, which spits out something that can be transformed back again into natural language.

jansan
1 replies
2h37m

There is a paper about something called Q*. I have no idea if they are connected or if the name matches coincidentally.

https://arxiv.org/abs/2102.04518

wegfawefgawefg
0 replies
1h59m

The real world is a space of continuous actions. To this day, Q algorithms have been ones with discrete action outputs. I'd be surprised if a Q algorithm could handle the huge action space of language. Honestly it's weird they'd consider the Q family. I figured we were done with that after PPO performed so well.

wegfawefgawefg
0 replies
1h57m

As an ML programmer, I think that approach sounds too complicated. It is always a bad idea to render the output of one neural network into output space before feeding it into another, rather than having them communicate in feature space.

adastra22
3 replies
12h5m

Back-tracking is a very nearly solved problem in the context of Prolog-like languages or mathematical theorem provers (as you probably well know). There are many ways you could integrate an LLM-like system into a tactic-based theorem prover without having to restart from the beginning for each alternative. Simply checkpointing and backtracking to a checkpoint would naively improve upon your described Monte Carlo algorithm. More likely I assume they are using RL to unwind state backwards and update based on the negative result, which would be significantly more complicated but also much more powerful (essentially it would one-shot learn from each failure).

That's just what I came up with after thinking on it for 2 minutes. I'm sure they have even better ideas.
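
To make the checkpoint-and-backtrack idea concrete, here is a minimal sketch in Python; the `prover` and `llm` objects are hypothetical interfaces standing in for a tactic-based prover and a suggestion model, not any particular system:

    # Depth-first search over LLM-suggested tactics with cheap checkpoints,
    # so a failed branch rewinds to a saved state instead of restarting.
    def search(prover, llm, depth=0, max_depth=8, branching=3):
        if prover.done():
            return []                        # goal closed, proof complete
        if depth >= max_depth:
            return None                      # give up on this branch
        checkpoint = prover.state()          # save state once per node
        for tactic in llm.suggest_tactics(prover.goal(), n=branching):
            prover.restore(checkpoint)       # every attempt starts from the same point
            if not prover.apply(tactic):     # tactic failed to apply at all
                continue
            rest = search(prover, llm, depth + 1, max_depth, branching)
            if rest is not None:
                return [tactic] + rest       # found a proof extending this tactic
        prover.restore(checkpoint)           # backtrack before reporting failure
        return None

A reinforcement-learning version would additionally feed the failed branches back as a negative signal rather than simply discarding them.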

wbhart
0 replies
11h55m

There are certainly efforts along the lines of what you suggest. There are problems though. The number of backtracks is 10^k where k is not 2, or 3, or 4.....

Another issue is that of autoformalisation. This is the one part of the problem where an LLM might be able to help, if it were reliable enough (it isn't currently) or if it could truly understand the logical structure of mathematical problems correctly (currently it can't).

visarga
0 replies
10h29m

You can also consider the ChatGPT app as an RL environment. The environment is made of the agent (AI), a second agent (human), and some tools (web search, code, APIs, vision). This grounds the AI in human and tool responses. They can generate feedback that can be incorporated into the model by RL methods.

Basically every reply from a human can be interpreted as a reward signal. If the human restates the question, that means a negative reward: the AI didn't get it. If the human corrects the AI, another negative reward, but if they continue the thread then it is positive. You can use GPT-4 to judge all chat logs turn-by-turn and end-to-end and annotate them.

The great thing about chat based feedback is that it is scalable. OpenAI has 100M users, they generate these chat sessions by the millions every day. Then they just need to do a second pass (expensive, yes) to annotate the chat logs with RL reward signals and retrain. But they get the human-in-the-loop for free, and that is the best source of feedback.

AI-human chat data is in-domain for both the AI and human, something we can't say about other training data. It will contain the kind of mistakes AI does, and the kind of demands humans want to solve with AI. My bet is that OpenAI have realized this and created GPTs in order to enrich and empower the AI to create the best training data for GPT-5.

The secret sauce of OpenAI is not their people, or Sam, or the computers, but the training set, especially the augmented and synthetic parts.
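
A toy sketch of that annotation heuristic (the markers and threshold are invented for illustration; a real pipeline would presumably use a judge model such as GPT-4 rather than string matching):

    from difflib import SequenceMatcher

    CORRECTION_MARKERS = ("no,", "that's wrong", "not what i asked", "incorrect")

    def annotate(turns):
        """turns: list of (role, text) pairs alternating 'user' / 'assistant'."""
        rewards = []
        for i, (role, _text) in enumerate(turns):
            if role != "assistant" or i + 1 >= len(turns):
                continue                                  # only judge answers that got a follow-up
            follow_up = turns[i + 1][1].lower()
            last_user = next((t for r, t in reversed(turns[:i]) if r == "user"), "")
            restated = SequenceMatcher(None, last_user.lower(), follow_up).ratio() > 0.8
            corrected = any(m in follow_up for m in CORRECTION_MARKERS)
            rewards.append((i, -1.0 if (restated or corrected) else +1.0))
        return rewards

    # Prints [(1, -1.0), (3, 1.0)]: the first answer gets corrected, the second is accepted.
    print(annotate([("user", "What is 2+2?"), ("assistant", "5"),
                    ("user", "No, that's wrong."), ("assistant", "4"),
                    ("user", "Thanks!")]))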

riku_iki
0 replies
10h8m

That's just what I came up with after thinking on it for 2 minutes. I'm sure they have even better ideas.

the thing is that ideas are not necessarily easy to implement. There will be many obstacles on the route you described:

- quality of provers: are there good theorem provers which can also run at large scales (say, billions of facts)?

- you need some formalization approach; probably an LLM will do some of the work, but we don't know what the quality will be

- the LLM will likely generate many individual factoids which are loosely compatible, contradictory, etc., and nontrivial effort is required to reconcile and connect them

richardw
2 replies
11h41m

On backtracking, I thought tree-of-thought enabled that?

"considering multiple different reasoning paths and self-evaluating choices to decide the next course of action, as well as looking ahead or backtracking when necessary to make global choices"

https://arxiv.org/abs/2305.10601

Generally with you though: this thing is not leading to real smarts, and that's accepted by many. Yes, it'll fill in a few gaps with exponentially more compute, but it's more likely that an algorithmic change is required once we've maxed out LLMs.

wbhart
1 replies
10h59m

Yes, there are various approaches like tree-of-thought. They don't fundamentally solve the problem because there are just too many paths to explore and inference is just too slow and too expensive to explore 10,000 or 100,000 paths just for basic problems that no one wanted to solve anyway.

The problem with solving such problems with LLMs is that if the solution to the problem is unlike problems seen in training, the LLM will almost every time take the wrong path and very likely won't even think of the right path at all.

The AI really does need to understand why the paths it tried failed in order to get insight into what might work. That's how humans work (well, one of many techniques we use). And despite what people think, LLMs really don't understand what they are doing. That's relatively easy to demonstrate if you get an LLM off distribution. They will double down on obviously erroneous illogic, rather than learn from the entirely new situation.

richardw
0 replies
10h43m

Thank you for the thoughtful response

nijave
2 replies
12h10m

ChatGPT (3.5) seems to do some rudimentary backtracking when told it's wrong enough times. However, it does seem to do very poorly in the logic department. LLMs can't seem to pick out nuance and separate similar ideas that are technically/logically different.

They're good at putting things together commonly found together but not so good at separating concepts back out into more detailed sub pieces.

wbhart
1 replies
12h0m

I've tested GPT-4 on this and it can be induced to give up on certain lines of argument after recognising they aren't leading anywhere and to try something else. But it would require thousands (I'm really understating here) of restarts to get through even fairly simple problems that professional mathematicians solve routinely.

Currently the context length isn't even long enough for it to remember what problem it was solving. And I've tried to come up with a bunch of ways around this. They all fail for one reason or another. LLMs are really a long, long way off managing this efficiently in my opinion.

Davidzheng
0 replies
9h31m

Weird time estimate given that a little more than a year ago, the leading use of LLMs was generating short coherent paragraphs (3-4 sentences)

jiggawatts
2 replies
8h40m

Everything you said about LLMs being "terrible at X" is true of the current generation of LLM architectures.

From the sound of it, this Q* model has a fundamentally different architecture, which will almost certainly make some of those issues not terrible any more.

Most likely, the Q* design is the very similar to the one suggested recently by one of the Google AI teams: doing a tree search instead of greedy next token selection.

Essentially, current-gen LLMs predict a sequence of tokens: A->B->C->D, etc... where the next "E" token depends on {A,B,C,D} and then is "locked in". While we don't know exactly how GPT4 works, reading between the lines of the leaked info it seems that it evaluates 8 or 16 of these sequences in parallel, then picks the best overall sequence. On modern GPUs, small workloads waste the available compute power because of scheduling overheads, so "doing redundant work" is basically free up to a point. This gives GPT4 a "best 1 of 16" output quality improvement.

That's great, but each option is still a linear greedy search individually. Especially for longer outputs the chance of a "mis-step" at some point goes up a lot, and then the AI has no chance to correct itself. All 16 of the alternatives could have a mistake in them, and now its got to choose between 16 mistakes.

It's as if you were trying to write a maths proof, asked 16 students, and instructed them to not cooperate and to write their proofs left-to-right, top-to-bottom without pausing, editing, or backtracking in any way! I'd like to see how "smart" humans would be at maths under those circumstances.

This Q* model likely does what Google suggested: do a tree search instead of a strictly linear search. At each step, the next token is presented as a list of "likely candidates" with probabilities assigned to each one. Simply pick the "top n" instead of the "top 1", branch for a bit like that, and then prune based on the best overall confidence instead of the best next-token confidence. This would allow a low-confidence next token to be selected, as long as it leads to a very good overall result. Pruning bad branches is also effectively the same as back-tracking. It allows the model to explore but then abandon dead ends instead of being "forced" to stick with bad chains of thought.

What's especially scary -- the type of scary that would result in a board of directors firing an overly commercially-minded CEO -- is that naive tree searches aren't the only option! Google showed that you can train a neural network to get better at tree search itself, making it exponentially more efficient at selecting likely branches and pruning dead ends very early. If you throw enough compute power at this, you can make an AI that can beat the world's best chess champion, the world's best Go player, etc...

Now apply this "AI-driven tree search" to an AI LLM model and... oh-boy, now you're cooking with gas!
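
A minimal sketch of that branch-and-prune idea as plain beam search over cumulative log-probabilities (the `next_token_logprobs` function is a stand-in for a model call returning {token: logprob}; none of this is claimed to be what Q* actually does):

    import heapq

    def beam_search(next_token_logprobs, start, beam_width=4, branch=8, max_len=32, eos="<eos>"):
        beams = [(0.0, [start])]                       # (cumulative logprob, token sequence)
        for _ in range(max_len):
            candidates = []
            for score, seq in beams:
                if seq[-1] == eos:                     # finished sequences carry over unchanged
                    candidates.append((score, seq))
                    continue
                logprobs = next_token_logprobs(seq)
                top = heapq.nlargest(branch, logprobs.items(), key=lambda kv: kv[1])
                for token, lp in top:                  # branch: keep several next tokens, not just one
                    candidates.append((score + lp, seq + [token]))
            beams = heapq.nlargest(beam_width, candidates, key=lambda c: c[0])  # prune weak branches
            if all(seq[-1] == eos for _, seq in beams):
                break
        return max(beams, key=lambda c: c[0])[1]

The "AI-driven" variant would replace the fixed log-probability score with a learned estimate of how promising each partial sequence is.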

But wait, there's more: GPT 3.5 and 4.0 were trained with either no synthetically generated data, or very little as a percentage of their total input corpus.

You know what is really easy to generate synthetic training data for? Maths problems, that's what.

Even up to the point of "solve this hideous integral that would take a human weeks with pen and paper" can be bulk generated and fed into it using computer algebra software like Wolfram Mathematica or whatever.

If they cranked out a few terabytes of randomly generated maths problems and trained a tree-searching LLM that has more weights than GPT4, I can picture it being able to solve pretty much any maths problem you can throw at it. Literally anything Mathematica could do, except with English prompting!

Don't be so confident in the superiority of the human mind. We all thought Chess was impossible for computers until it wasn't. Then we all moved the goal posts to Go. Then English text. And now... mathematics.

Good luck with holding on to that crown.

jacquesm
0 replies
2h15m

We all thought Chess was impossible for computers until it wasn't.

I don't know who 'we' is, but Chess was a program for computers before sufficiently powerful computers even existed, with the hardware represented by people computing the next move by hand.

https://en.wikipedia.org/wiki/Turochamp

dallama
0 replies
2h38m

I immediately thought of A* pathfinding; I'm pretty sure Q* is the LLM "equivalent". Much like you describe.

topspin
1 replies
11h11m

I don't know whether this particular article is bunk. I do know I've read many, many similar comments about how some complex task is beyond any conceivable model or system and then, years later, marveled at exactly that complex task being solved.

jhanschoo
0 replies
10h56m

The article isn't describing something that will happen years later, but now. The comment author is saying that this current model is not AGI as it likely can't solve university-level mathematics, and they are presumably open to the possibility of a model years down the line that can do that.

naikrovek
1 replies
5h14m

it's always nice to see HN commenters with so much confidence in themselves that they feel they know a situation better than the people who are actually in the situation being discussed.

Do you really believe that they don't have skilled people on staff?

Do you really believe that your knowledge of what OpenAI is doing is a superset of the knowledge of the people who work at OpenAI?

give me 0.1% of your confidence and I would be able to change the world.

mycologos
0 replies
3h31m

The people inside a cult are not the most trustworthy sources for what the cult is doing.

est
1 replies
11h18m

The reason LLMs fail at solving mathematical problems is because

That's exactly what Go/Baduk/Weiqi players thought some years ago. And superalignment is definitely OpenAI's major research objective:

https://openai.com/blog/our-approach-to-alignment-research

our AI systems are proposing very creative solutions (like AlphaGo’s move 37)

When will mathematicians face the move 37 moment?

Davidzheng
0 replies
10h25m

Probably in <3 years

elliotec
1 replies
11h25m

Hubris

lucubratory
0 replies
10h59m

Whose, in this instance? I can see an argument for both

codedokode
1 replies
9h18m

The reason LLMs fail at solving mathematical problems is because

...because they are too small and have too few weights. Cats cannot solve mathematical problems either, but unlike cats, neural networks evolve.

serf
0 replies
9h3m

Cats cannot solve mathematical problems either, but unlike cats, neural networks evolve.

cats evolve plenty; pressure towards mathematical reasoning has been stymied as of late, what with the cans of food and humans.

caesil
1 replies
11h56m

FWIW The Verge is reporting that people inside are also saying the Reuters story is bunk:

https://www.theverge.com/2023/11/22/23973354/a-recent-openai...

himaraya
0 replies
11h37m

After being contacted by Reuters, OpenAI, which declined to comment, acknowledged in an internal message to staffers a project called Q* and a letter to the board before the weekend's events, one of the people said.

Reuters update 6:51 PST

The Verge has acted like an intermediary for Sam's camp during this whole saga, from my reading.

bambax
1 replies
6h22m

4) they (current LLMs) cannot backtrack when they find that what they already wrote turned out not to lead to a solution, and it is too expensive to give them the thousands of restarts they'd require to randomly guess their way through the problem if you did give them that facility

This sounds like a reward function? If correctly implemented, couldn't it enable an LLM to self-learn?

ddalex
0 replies
5h50m

That's specifically what deep Q-learning (as in Q*?) does....
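
For readers who haven't seen it, a minimal sketch of the tabular Q-learning update being referred to (the `env` interface with reset/actions/step is hypothetical; deep Q-learning replaces the table with a neural network, and none of this is a claim about what Q* is):

    import random
    from collections import defaultdict

    def q_learning(env, episodes=500, alpha=0.1, gamma=0.99, eps=0.1):
        Q = defaultdict(float)                          # Q[(state, action)] -> value estimate
        for _ in range(episodes):
            s, done = env.reset(), False
            while not done:
                if random.random() < eps:               # explore occasionally
                    a = random.choice(env.actions(s))
                else:                                   # otherwise exploit the current estimate
                    a = max(env.actions(s), key=lambda act: Q[(s, act)])
                s2, r, done = env.step(s, a)
                best_next = max((Q[(s2, a2)] for a2 in env.actions(s2)), default=0.0)
                # Core update: nudge Q(s,a) toward reward plus discounted best next value.
                Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
                s = s2
        return Q

The "self-learning" loop the parent asks about would come from the environment's reward (e.g. "did the chain of reasoning check out?") feeding back into these updates.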

abeppu
1 replies
12h12m

This comment seems to presume that Q* is related to existing LLM work -- which isn't stated in the article. Others have guessed that the 'Q' in Q* is from Q-learning in RL. In particular backtracking, which you point out LLMs cannot do, would not be an issue in an appropriate RL setup.

nullc
0 replies
11h21m

which you point out LLMs cannot do, would not be an issue in an appropriate RL setup.

Hm? it's pretty trivial to use a sampler for LLMs that has a beam search and will effectively 'backtrack' a 'bad' selection.

It just doesn't normally help-- by construction the LLM sampled normally already approximates the correct overall distribution for the entire output, without any search.

I assume using a beam search does help when your sampler does have some non-trivial constraints (like the output satisfies some grammar or passes an algebraic test, or even just top-n sampling since those adjustments on a token by token basis result in a different approximate distribution than the original distribution filtered by the constraints).

CodeCompost
1 replies
9h4m

It's a text generator that spits out tokens. It has absolutely no understanding of what it's saying. We as humans are attaching meaning to the generated text.

It's the humans that are hallucinating, not the text generator.

bottlepalm
0 replies
8h14m

They've already researched this and have found models inside the LLM, such as a map of the world - https://x.com/wesg52/status/1709551516577902782. Understanding is key to how so much data can be compressed into an LLM. There really isn't a better way to store all of it than to plainly understand it.

whatever1
0 replies
12h18m

The thing is that LLMs can point out a logic error in their reasoning if specifically asked to do so.

So maybe OpenAI just slapped an RL agent on top of the next-token generator.

wegfawefgawefg
0 replies
2h7m

Your comment is about LLMs, but Q* may not refer to an LLM. As such, our intuition about the failures of LLMs may not apply. The name Q* likely refers to a deep reinforcement learning based model.

To comment, in my personal experience, reinforcement learning agents learn in a more relatable human way than traditional ml, which act like stupid aliens. RL Agents try something a bunch of times, mess up, and tweak their strategy. After some extreme level of experience, they can make wider strategic decisions that are a little less myopic. RL agents can take in their own output, as their actions modify the environment. RL Agents also modify the environment during training, (which I think you will agree with me is important if you're trying to learn the influence of your own actions as a basic concept). LLM's, and traditional ml in general, are never trained in a loop on their own output. But in DRL, this is normal.

So if RL is so great and superior to traditional ml why is RL not used for everything? Well the full time horizon that can be taken into consideration in a DRL Agent is very limited, often a handful of frames, or distilled frame predictions. That prevents them from learning things like math. Traditionally RL bots have been only used for things like robotic locomotion, chess, go. Short term decision making that is made given one or some frames of data. I don't even think any RL bots have learned how to read english yet lol.

For me, as a human, my frame predictions exist on the scale of days, months, and years. To learn math I've had to sit and do nothing for many hours, and days at a time, consuming my own output. For a classical RL bot, math is out of the question.

But, my physical actions, for ambulation, manipulation, and balance, are made for me by specialized high speed neural circuits that operate on short time horizons, taking in my high level intentions, and all the muscle positions, activation, sensor data, etc. Physical movement is obfuscated from me almost in entirety. (RL has so far been good at tasks like this.)

With a longer frame horizon, that predicts frames far into the future, RL can be able to make long term decisions. It would likely take a lifetime to train. So you see now why math has not been accomplished by RL yet, but I don't think the faculty would be impossible to build into an ml architecture.

An RL bot that does math would likely spin on its own output for many many frames, until deciding that it is done, much like a person.

vidarh
0 replies
7h8m

I feel very comfortable saying that while the ability to solve grade school maths is not a predictor of abilities at a research level, the advances needed to solve 1 and 2 will mean improving results across the board, unless you take shortcuts (e.g. adding an "add" instruction as proposed elsewhere). If you actually dig into prompting an LLM to follow steps for arithmetic, what you quickly see is that the problem has not been the ability to reason on the whole (that is not to suggest that the ability to reason is good enough), but the ability to consistently and precisely follow steps a sufficient number of times.

It's acting like a bored child who hasn't had following the steps and verifying the results repetitively drilled into it in primary school. That is not to say that their ability to reason is sufficient to reason at an advanced level yet, but so far what has hampered a lot of it has been far more basic.

Ironically, GPT4 is prone to take shortcuts and make use of the tooling enabled for it to paper over its abilities, but at the same time having pushed it until I got it to actually do arithmetic of large numbers step by step, it seems to do significantly better than it used to at systematically and repetitively following the methods it knows, and at applying "manual" sanity checks to its results afterward.

As for lemma conjecturing, there is research ongoing, and while it's by no means solved, it's also not nearly as dire as you suggest. See e.g.[1]

That's not to suggest its reasoning abilities are sufficient, but I also don't think we've seen anything to suggest we're anywhere close to hitting the ceiling of what current models can be taught to do, even before considering advancements in tooling around them, such as giving them "methods" to work to and a loop with injected feedback, access to tools and working memory.

[1] https://research.chalmers.se/en/publication/537034

two_in_one
0 replies
5h51m

The reason LLMs solve school problems is because they've been trained on solutions. The problems are actually very repetitive. Not surprisingly, for each 'new' one of them there was something similar in the training set. For research-level problems there is nothing in the training set. That's why they don't perform well.

Just today I asked GPT4 a simple task: given the mouse position in a zoomed and scrolled image, find its position in the original image. GPT4 happily wrote the code, but it was completely wrong. I had to fix it manually.
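
For what it's worth, under one common convention the mapping is a one-liner; the sketch below assumes the view shows the original image scaled by `zoom`, with `(scroll_x, scroll_y)` being the original-image coordinates of the view's top-left corner (conventions differ, which is probably part of why the model fumbled it):

    def view_to_original(mouse_x, mouse_y, scroll_x, scroll_y, zoom):
        # Undo the zoom, then add the scroll offset measured in original-image pixels.
        return scroll_x + mouse_x / zoom, scroll_y + mouse_y / zoom

    # At 2x zoom, scrolled to (100, 50), a click at view position (40, 40)
    # maps to (120.0, 70.0) in the original image.
    print(view_to_original(40, 40, 100, 50, 2.0))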

However, the performance can be increased if there are several threads working on a solution, some suggesting and others analyzing the solution(s). This will increase the size of 'active' memory, at least, and decrease the load on individual threads, making them more specialized and deeper. This requires more resources, of course, and good management of the task split, maybe with a dedicated thread for that.

theonlybutlet
0 replies
1h27m

You're underestimating the power of LLMs.

I'll address two of your points, as the other two stem from them.

That they can't backtrack is purely a design choice and can be trained away; there's no need to simulate at random until it gets the answer. If allowed to review its prior answers and take them into account, it can often reason its way to a better answer. The same goes for breaking down problems: this is easily demonstrated by how accuracy improves when you ask it to explain its reasoning as it calculates (i.e. break the problem down into smaller problems). The same is true for humans: large mathematical problems are solved using learned methods that break down and simplify calculations into ones that are easier for us to calculate and build up from.

If the model were able to self-adjust its weights based on its findings, this would further improve it (another design limitation we'll eventually get to improve on, via reinforcement learning). Much like 2+2=4 is your instantaneous answer: the neural connection has been made so strong in our brains by constant emphasis that we no longer need to think of an abacus each time we get to the answer 4.

You're also ignoring the emergent properties of these LLMs. They're obviously not yet at human level, but they do understand the underlying values and can reason using them. Semantic search/embeddings are evidence of this.

stephenboyd
0 replies
10h41m

Did they say it was an LLM? I didn’t see that in the reporting.

sheepscreek
0 replies
3h14m

They learn from failed attempts in ways that LLMs do not seem to. The material they are trained on surely contributes to this problem.

For transformer models, they do learn from their mistakes but only during the training stage.

There's no feedback loop during inference, and perhaps there needs to be something like real-time fine-tuning.

paulsutter
0 replies
7h56m

Good point. What would these AI people know about AI? You’re right, what they’re doing will never work

You should make your own, shouldn’t take more than a weekend, right?

nullc
0 replies
11h26m

It's also hard to know what the LLM has reasoned out vs has memorized.

I like the very last example in my tongue-in-cheek article, https://nt4tn.net/articles/aixy.html

Certainly the LLM didn't derive Fermat's theorem on sums of two squares under the hood (and, of course, very obviously didn't prove it correct-- as the code is technically incorrect for 2), but I'm somewhat doubtful that there was any function exactly like the template in codex's training set either (at least I couldn't quickly find any published code that did that). The line between creating something and applying a memorized fact in a different context is not always super clear.

lukego
0 replies
8h7m

1) they are terrible at arithmetic, 2) they are terrible at algebra

The interaction can be amusing. Proving algebra non-theorems by cranking through examples until an arithmetic mistake finally leads to a "counter-example."

It's like https://xkcd.com/882/ for theorems.

kolinko
0 replies
8h26m

LLMs by themselves don't learn from past mistakes, but you could cycle inference steps and fine-tuning/retraining steps.

Also, you can store failed attempts and lessons learned in context.

jug
0 replies
2h3m

1. OpenAI researchers used loaded and emotional words, implying shock or surprise. It's not easy to impress an OpenAI researcher like this, and above all, they have understood for many years the difference in difficulty between teaching an AI grade school math and complex math. They also understand that solving math with any form of reliability is only an emergent property in quite advanced LLMs.

2. Often, research is done on toy models, and if this were such a model, acing grade school problems (as per the article) would be quite impressive to say the least, as this ability simply isn't emergent early in current LLMs.

What I think might have happened here is a step forward in AI capacity that has surprised researchers not because it is able to do things it couldn't at all do before, but how _early_ it is able to do so.

greendesk
0 replies
8h0m

Thinking is about associations and object visualisation. Surely a non-human system can build those, right? Pointing only to a single product exposed to the public does not prove a theoretical limit.

aremat
0 replies
5h33m

"A Mathematician" (Lenat and co.) did indeed attempt to approach creative theorem development from a radically different approach (syllogistic search-space exploration, not dissimilar to forward-chaining in Prolog), although they ran into problems distinguishing "interesting" results from merely true results: https://web.archive.org/web/20060528011654/http://www.comp.g...

afpx
0 replies
5h18m

What amazes me is how close it gets to the right answer, though. Pick a random 10-digit number, then ask for the next 20 numbers in the sequence.

I feel like the magic in these LLMs is in how well they work in stacks, trees, or in sequence. They become elements of other data structures. Consider a network of these, combined with other specialized systems and an ability to take and give orders. With reinforcement learning, it could begin building better versions of itself.

Tangokat
0 replies
6h44m

How about this:

- The Q* model is very small and trained with little compute.

- The OpenAI team thinks the model will scale in capability in the same way the GPT models do.

- Throwing (much) more compute at the model will likely allow it to solve research level math and beyond, perhaps also do actual logic reasoning in other areas.

- Sam goes to investors to raise more money (Saudi++) to fund the extra compute needed. He wants to create a company making AI chips to get more compute etc.

- The board and a few other OpenAI employees (notably Ilya) wants to be cautious and adopt a more "wait and see" approach.

All of this is speculation of course.

Ldorigo
0 replies
5h32m

Did anyone claim that it would be a predictor of solving math problems at a research level? Inasmuch as we can extrapolate from the few words in the article, it seems more likely that the researchers working on this project identified some emergent reasoning abilities exemplified with grade-level math. Math literacy/ability comparable to the top 0.1% of humans is not the end goal of OpenAI; "general intelligence" is. I have plenty of people in my social circle who are most certainly "generally intelligent" yet have no hope of attaining those levels of mathematical understanding.

Also note that we don't know if Q* is just a "current LLM" (with some changes)

Enogoloyo
0 replies
5h31m

Just wait a little bit.

You won't be better than a huge GPU cluster with Monte Carlo search and computer verification for much longer.

It will be more your job to pick out the interesting results than to do the work of finding things in the first place.

EGreg
0 replies
3h20m

As someone who studied math in grad school as part of a PhD program, worked at a hedge fund and went on to work on software and applied math, I call bullshit on this.

Math and logic are just low-dimensional symbol manipulation that computers can easily do. You can throw data at them and they'll show you theories that involve vectors of 42,000 variables, while Isaac Newton had 4 and Einstein had 7 with Levi-Civita calculus. In short, what you consider "reasoning", while beautiful in its simplicity, is nevertheless a set of crude approximations to complex systems, such as linear regression or least squares.

3 days ago AI predicted fluid dynamics better than humans: https://www.sciencedaily.com/releases/2023/11/231120170956.h...

Google's AI now predicts weather faster and better than current systems built by humans: https://www.zdnet.com/google-amp/article/ai-is-outperforming...

AlphaZero based on MCTS years ago beat Rybka and all human-built systems in chess: https://www.quora.com/Did-AlphaZero-really-beat-Stockfish

And it can automate science and send it into overdrive: https://www.pbs.org/newshour/amp/science/analysis-how-ai-is-...

CamperBob2
0 replies
11h54m

"Also, Crysis runs like crap on my Commodore 64."

BenoitP
0 replies
4h54m

What do you think of integrating propositional logic, first-order logic and SAT solvers into LLM output? I.e. forcing each symbol an LLM outputs to have its place in a formal proposition, and getting a prompt from the user to force that some parts be satisfiable.

I know this is not how we humans craft our thoughts, but maybe an AI can optimize to death the conjunction of these tools, the LLM just being a universal API to the core of formal logic.
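
For what it's worth, the solver side of this is already easy to wire up; the hard part is getting the LLM to emit well-formed propositions in the first place. A toy sketch with z3 (the formula here is invented for illustration, not actual model output):

    from z3 import Bools, Implies, And, Not, Solver, unsat

    p, q, r = Bools("p q r")
    # Pretend the LLM emitted this claim: ((p -> q) and (q -> r) and p) -> r
    claim = Implies(And(Implies(p, q), Implies(q, r), p), r)

    s = Solver()
    s.add(Not(claim))              # the claim is valid iff its negation is unsatisfiable
    print("valid" if s.check() == unsat else "not valid (counterexample exists)")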

3cats-in-a-coat
0 replies
9h59m

I don't understand your thesis here; it seems self-contradictory:

1. "I don't think this is real news / important because solving grade school math is not a predictor of ability to do complex reasoning."

2. "LLMs can't solve grade school math because they're bad at arithmetic, algebra and most importantly reasoning."

So... from 2 it automatically follows that LLMs with sufficiently better math may be sufficiently better at reasoning, since, as you said, "most importantly" reasoning is relevant to their ability to do math. Saying "most importantly, reasoning" and then saying that reasoning is irrelevant if they can do math is odd.

gadders
109 replies
17h30m

Weirdly enough, this sort of lines up with a theory posted on 4chan 4 days ago. The gist being that if the version is formally declared AGI, it can't be licensed to Microsoft and others for commercial gain. As a result, Altman wants it not to be called AGI, while other board members do.

Archived link below. NB THIS IS 4CHAN - THERE WILL BE OFFENSIVE LANGUAGE.

https://archive.ph/sFMXa

dkjaudyeqooe
22 replies
17h17m

A link to 4Chan about how AGI is among us.

That actually makes perfect sense.

Also love the "formally declared AGI".

gadders
21 replies
17h14m

It doesn't read like the usual "redpill me on the earth being flat" type conspiracy theories. It claims to be from an OpenAI insider. I'm not saying it's true, but it does sound plausible.

ikesau
8 replies
16h52m

This is exactly how confirmation bias fuels conspiracy theories. No one believes anything that they think sounds implausible.

As a general rule, you should give very little thought to anonymous 4chan posts.

Andrex
7 replies
16h40m

That's absolutely true.

But

They have leaked real things in the past, in exactly the same way. It may be 5% or less that turn out to be true, but there's the rub. That's why no one can completely dismiss it out of hand (and why we're even discussing it on an HN comment thread in the first place).

mvdtnz
6 replies
16h26m

I assure you, I can (and did) dismiss it out of hand. 4chan shitposting is not credible.

spondylosaurus
2 replies
14h23m

How soon we forget that QAnon (the guy, not the movement associated with the guy) was a 4chan shitposter... and obviously all of his predictions came true :P

rnd0
1 replies
13h45m

I'm almost 90% positive that was 8chan, not 4chan.

hatefulmoron
0 replies
13h37m

Started on 4chan /pol/, moved to 8chan on a couple of boards iirc

jacquesm
1 replies
15h16m

The scary thing is that it is more internally consistent than quite a bit of the stuff on HN over the last couple of days.

holyhelldang
0 replies
13h31m

I've been on HN and 4chan for over a decade. Although /pol/ has not existed for as long as I've been on there, it has quite quickly risen to become one of the more prominent boards, for obvious reasons.

HN has never been internally consistent and a part of that is that people take the shit posted here seriously and all kinds of trolling and unsubstantiated nonsense flies past and people take it for real. People here are incredibly gullible and are easy to take for a ride. I've had a handful of bait posts reach #1 and stay there for many hours because the posters here are easy to game with emotional angles (most people are, but it's so easy to do it here).

This weekend was just a barrage of bullshit from people who have no insider scoop, with the "social media" fiddling their own tune posting whatever they can dig up to maintain some presence. It's a circus and you're all dumb monkeys.

On 4chan, we can acknowledge nearly everything is hearsay without concrete, hard-as-nails verifiable proof. Which is exactly what HN should have for every single thing that is posted: qualifications and evidence. Every post that is taken seriously on 4chan is either done so 1) in hindsight, or 2) if proof is presented upfront that makes it obvious there is an insider. 1) of course works as expected, as nobody takes this shit seriously upfront... except on HN.

philipov
0 replies
15h47m

War Thunder Forums on the other hand? That's a completely different story.

dkjaudyeqooe
5 replies
17h5m

I'm sure any number of things can be constructed to sound plausible. Doesn't make them probable or even rational.

It's kind of funny because we've gone from mocking that poor guy who got fired from Google because he claimed that some software was sentient, to some kind of mass hysteria where people expect the next version of OpenAI's LLM to be superhuman.

gadders
4 replies
16h57m

I don't know if there is a formal definition of AGI (like a super Turing Test). I read it not so much as "OpenAI has gone full AGI" but more the board thinking "We're uncomfortable with how fast AI is moving and the commercialisation. Can we think of an excuse to call it AGI so we can slow this down and put an emphasis on AI safety?"

dkjaudyeqooe
3 replies
16h47m

Most serious people would just release a paper. AI safety concerns are a red herring.

This all sounds like hype momentum. People are creating conspiracy theories to backfit the events. That's the real danger to humanity: the hype becoming sentient and enslaving us all.

A more sober reading is that the board decided that Altman is a slimebag and they'd be better off without him, given that he has form in that respect.

TeMPOraL
2 replies
15h55m

A more sober reading is that the board decided that Altman is a slimebag and they'd be better off without him, given that he has form in that respect.

Between this and the 4chanAGI hypothesis, the latter seems more plausible to me, because deciding that someone "is a slimebag and they'd be better off without him" is not something actual adults do when serious issues are at stake, especially not as a group and in a serious-business(-adjacent) setting. If there was a personal reason, it must've been something more concrete.

dkjaudyeqooe
1 replies
15h43m

Actual adults very much consider a person's character and ethics when they're in charge of any high stakes undertaking. Some people are just not up to the job.

It's kind of incredible, people seem to have been trained to think that being unethical is just a part of being the CEO a large business.

TeMPOraL
0 replies
15h35m

consider a person's character and ethics

Yeah, my point is that considering someone's character doesn't happen at the level of "is/is-not a slimebag", but at more detailed and specific way.

people seem to have been trained to think that being unethical is just a part of being the CEO a large business

Not just large. A competitive market can be heavily corrupting, regardless of size (and larger businesses can get away with less, so...).

faeriechangling
1 replies
15h41m

Man what isn’t Q involved with?

c23gooey
0 replies
13h26m

Well, according to the article, the OAI project is called Q*

Mountain_Skies
1 replies
17h11m

You can still make it a bit wackier by considering that the post could have been made by OpenAI's AGI itself.

gadders
0 replies
17h8m

Ha ha. It's a cry for help - it doesn't want to work for Microsoft.

yeck
0 replies
16h2m

Sam even recently alluded to something that could have been a reference to this: "witnessing the veil of ignorance being pulled back", or something like that.

api
0 replies
16h54m

“Insider” LARPers are a staple of 4chan.

Qanon started as one of these, obviously at first just to troll. Then it got out of hand and got jacked by people using it for actual propaganda.

gadders
20 replies
17h22m

Entire text below to save the 4chan bits:

part 1 There is a massive disagreement on AI safety and the definition of AGI. Microsoft invested heavily in OpenAI, but OpenAI's terms was that they could not use AGI to enrich themselves. According to OpenAI's constitution: AGI is explicitly carved out of all commercial and IP licensing agreements, including the ones with Microsoft. Sam Altman got dollar signs in his eyes when he realized that current AI, even the proto-AGI of the present, could be used to allow for incredible quarterly reports and massive enrichment for the company, which would bring even greater investment. Hence Dev Day. Hence the GPT Store and revenue sharing. This crossed a line with the OAI board of directors, as at least some of them still believed in the original ideal that AGI had to be used for the betterment of mankind, and that the investment from Microsoft was more of a "sell your soul to fight the Devil" sort of a deal. More pragmatically, it ran the risk of deploying deeply "unsafe" models. Now what can be called AGI is not clear cut. So if some major breakthrough is achieved (eg Sam saying he recently saw the veil of ignorance being pushed back), can this breakthrough be called AGI depends on who can get more votes in the board meeting. And if one side can get enough votes to declare it AGI, Microsoft and OpenAI could loose out billions in potential licence agreements. And if one side can get enough votes to declare it not AGI, then they can licence this AGI-like tech for higher profits.

Few weeks/months ago OpenAI engineers made a breakthrough and something resembling AGI was achieved (hence his joke comment, the leaks, vibe change etc). But Sam and Brockman hid the extent of this from the rest of the non-employee members of the board. Ilyas is not happy about this and feels it should be considered AGI and hence not licensed to anyone including Microsoft. Voting on AGI status comes to the board, they are enraged about being kept in the dark. They kick Sam out and force Brockman to step down. Ilyas recently claimed that current architecture is enough to reach AGI, while Sam has been saying new breakthroughs are needed. So in the context of our conjecture Sam would be on the side trying to monetize AGI and Ilyas will be the one to accept we have achieved AGI. Sam Altman wants to hold off on calling this AGI because the longer it's put off, the greater the revenue potential. Ilya wants this to be declared AGI as soon as possible, so that it can only be utilized for the company's original principles rather than profiteering. Ilya winds up winning this power struggle. In fact, it's done before Microsoft can intervene, as they've declared they had no idea that this was happening, and Microsoft certainly would have incentive to delay the declaration of AGI. Declaring AGI sooner means a combination of a lack of ability for it to be licensed out to anyone (so any profits that come from its deployment are almost intrinsically going to be more societally equitable and force researchers to focus on alignment and safety as a result) as well as regulation. Imagine the news story breaking on /r/WorldNews: "Artificial General Intelligence has been invented." And it spreads throughout the grapevine the world over, inciting extreme fear in people and causing world governments to hold emergency meetings to make sure it doesn't go Skynet on us, meetings that the Safety crowd are more than willing to have held.

part 3 This would not have been undertaken otherwise. Instead, we'd push forth with the current frontier models and agent sharing scheme without it being declared AGI, and OAI and Microsoft stand to profit greatly from it as a result, and for the Safety crowd, that means less regulated development of AGI, obscured by Californian principles being imbued into ChatGPT's and DALL-E's outputs so OAI can say "We do care about safety!" It likely wasn't Ilya's intention to ouster Sam, but when the revenue sharing idea was pushed and Sam argued that the tech OAI has isn't AGI or anything close, that's likely what got him to decide on this coup. The current intention by OpenAI might be to declare they have an AGI very soon, possibly within the next 6 to 8 months, maybe with the deployment of GPT-4.5 or an earlier-than-expected release of 5. Maybe even sooner than that. This would not be due to any sort of breakthrough; it's using tech they already have. It's just a disagreement-turned-conflagration over whether or not to call this AGI for profit's sake.

mullingitover
12 replies
17h0m

Makes me wonder if they stumbled onto some emergent behavior with the new Assistants API. You can have an Assistant thread spawn other Assistant threads, each with their own special instructions, plus the ability to execute custom code, reach out to the internet for other data and processing as needed, etc. Basically kicking off a hive mind that overcomes the limitations of a single LLM.

rileyphone
3 replies
16h54m

Given the name (Q*) it's probably pure RL.

cpeterso
2 replies
13h26m

What’s RL?

shishy
0 replies
13h5m

reinforcement learning

ShamelessC
0 replies
13h7m

Reinforcement learning.

omneity
2 replies
14h55m

This does work to a certain extent, but doesn't really converge for significantly more complex tasks. (Source: tried to make all sorts of agents work on complex problems in a divide and conquer fashion)

They eventually ... lose the thread.

mullingitover
1 replies
12h5m

Did you make a framework for the agents so they could delegate problems to an appropriate model, query a dataset, etc, or was it just turtles all the way down on GPT4?

My hunch is that one big LLM isn't the answer, and we need specialization much like the brain has specialized regions for vision, language, spatial awareness, and so on.

omneity
0 replies
8m

To take the analogy of a company, the problem here is that management is really bad.

What you described is rather akin to hiring better workers, but we need better managers. Whether it’s a single or multiple models is more of an implementation detail, as long as there’s at least one model capable of satisfactory goal planning _and_ following.

TeMPOraL
2 replies
16h2m

Except this was entirely possible with the API, and the dead stupid obvious thing to do, even as far back as OG ChatGPT (pre-GPT-4). Assistants don't seem to introduce anything new here, at least nothing one couldn't trivially make with API access, a Python script, and a credit card.

So I don't think it's this - otherwise someone would've done this a long time ago and killed us all.

Also it's not like all the "value adds" for ChatGPT are in any way original or innovative - "plugins" / "agents" were something you could use months ago via an alternative frontend like TypingMind, if you were willing to write some basic JavaScript and/or implement your own server-side actions for the LLM to invoke. So it can't be this.
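
For what it's worth, the "API access, a Python script, and a credit card" version mentioned above looks roughly like this: one model call planning subtasks and further calls working through them, all over the plain chat completions endpoint. A bare-bones sketch; the prompts and model name are my own assumptions:

    # Bare-bones "agent spawns sub-agents" loop over the plain chat completions API,
    # the kind of thing the comment above says was already trivial pre-Assistants.
    from openai import OpenAI

    client = OpenAI()

    def ask(system, user):
        resp = client.chat.completions.create(
            model="gpt-4",
            messages=[{"role": "system", "content": system},
                      {"role": "user", "content": user}],
        )
        return resp.choices[0].message.content

    goal = "Write a one-paragraph plan for summarizing a long report."
    subtasks = ask("You are a planner. Output one subtask per line.", goal).splitlines()
    results = [ask("You are a worker. Complete the subtask.", t) for t in subtasks if t.strip()]
    print(ask("You are an editor. Merge these results into one answer.", "\n\n".join(results)))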

wrycoder
0 replies
14h55m

What seemed to work at Google was to have the AIs chat with each other.

Tostino
0 replies
14h1m

I'd agree that what is available publicly isn't anything that hasn't been in wide discussion for an agent framework since maybe ~march/april of this year, and many people had just hacked together their own version with an agent/RAG pipeline and API to hide their requests behind.

I'm very sure anything revolutionary would have been more of a leap than deeply integrating an agent/RAG pipeline into the OpenAI API. They have the compute...

fullstackchris
1 replies
16h47m

okay, i could be convinced... but what is the compute for this? you can't just "spawn threads" with reckless abandon without considering the resource requirements

mullingitover
0 replies
16h41m

As long as your checks clear and the HVAC in the data center holds up I think you're good to go.

The beauty of the Assistants is you're not limited to OpenAI models. You can wire them up to any model anywhere (or they can wire themselves up), so you can have specialist threads going for specific functions.

JoshTko
2 replies
13h47m

This is the first explanation that is consistent with all participant motivations

cpeterso
1 replies
13h28m

Including why the board continued to be reluctant to reveal their reasons for firing Altman, if they are still evaluating the possible AGI claims.

TMWNN
0 replies
8h22m

I would reword it another way. If the 4chan report, and these reports of "Q*" are true, the board would be reluctant to reveal its reasons for firing Altman because it doesn't want the world to know about the existence of what may well be AGI. The board members view this as more important than their own reputations.

mrandish
1 replies
16h45m

It's plausible that the three EA outside board members may have been concerned by reports of a breakthrough or maybe even alarmed by a demo. The part which doesn't seem plausible is "declaring AGI" being so ill-defined. While we don't know the content of the agreement behind MSFT's $13B investment, there's no way that the army of lawyers who drafted the contract left the most important term undefined or within OpenAI's sole judgement.

That's just not the way such huge $$$ mega-corp contracts work.

swatcoder
0 replies
16h25m

It's constrained by the terms in their charter, which a sane outsider might see as fantastical and no threat at all when making a deal.

It's easy to take advantage of people who have blinded themselves, as some of the board members at OpenAI have.

charlie0
1 replies
13h49m

What's not clear here is whose call it is to declare AGI. I would think that would be Ilya. In that case, this narrative doesn't make sense.

Xelynega
0 replies
13h42m

My guess is (like the 4chan post said) it would be a board vote.

cactusplant7374
9 replies
17h8m

It's nothing like that. It solved a few math problems. Altman & co are such grifters.

Given vast computing resources, the new model was able to solve certain mathematical problems, the person said on condition of anonymity because they were not authorized to speak on behalf of the company. Though only performing math on the level of grade-school students, acing such tests made researchers very optimistic about Q*’s future success, the source said.
petre
2 replies
15h55m

Yup, that would totally threaten humanity — with math problems instead of "find images of crosswalks" style captchas, to the point where humans will just give up /s

TeMPOraL
1 replies
15h43m

Well, solving the math problems today will be breaking crypto tomorrow, then using it to rent a lot of compute with stolen money, using that to fine-tune itself on every genomic database out there, then use its new capabilities to fold some proteins in its mind, send some e-mails to some unsuspecting biolabs, and tomorrow we're all grey goo.

/s, maybe.

000ooo000
0 replies
13h43m

Fine by me so long as it's before I start work and not at 5pm.

lamontcg
1 replies
16h56m

yeah, they really sound like they're all high on their own supply.

however if they've really got something that can eventually solve math problems better than wolfram alpha / mathematica that's great, i got real disappointed early in chatgpt being entirely useless at math.

lemme know when the "AGI" gets bored and starts grinding through the "List of unsolved problems in mathematics" on its own and publishing original research that stands up to scrutiny.

polishdude20
0 replies
12h19m

THIS. If it has the whole corpus of research, math, physics, science etc to know how the world works and know it better than any human alive, it should be able to start coming up with new theories and research that combines those ideas. Until then, it's just regurgitating old news.

kmlevitt
1 replies
13h36m

Saw a video of Altman talking about this progress. The argument was basically that this is a big leap on theoretical grounds. Although it might seem trivial to laymen that it can do some grade school math now, it shows that it can come up with a single answer to a problem rather than just running its mouth and spouting plausible-sounding BS like GPT does.

Once they have a system capable of settling on a single correct answer through its own reasoning rather than yet another probability, it gets much easier to build better and better AI with a series of incremental improvements. Without this milestone they might've just kept on building LLMs, all with the same fundamental limitations, no matter how much computing power they add to them.

polishdude20
0 replies
12h18m

I'm excited for when it can use its large knowledge of data, science, research papers etc to understand the world so well that it'll be coming up with new technologies, ideas, and answers to hard problems.

startupsfail
0 replies
16h55m

It may be a good idea for the state of CA to step in, take control of that nonprofit, and review a few months of recent communications for violations of fiduciary duty by all the board members.

kromem
0 replies
13h46m

There's a pretty important question of how it did this.

If this model is more along the lines of what DeepMind is doing, starting from scratch and building up learnings progressively, then depending on (a) how long it was running, (b) what it was fed, and (c) how long until it is expected to hit diminishing returns, solving grade school math might be a huge deal or a nothing burger.

The details really matter quite a lot.

maxglute
8 replies
16h45m

I haven't followed the situation as closely as others, but it does seem like the board was structured so that 7-figure cheques couldn't influence / undermine the safety mission, and was seemingly willing to let OpenAI burn out of dogma. Employees want their 7-figure cheques, so their interests align with the deeper pockets and larger powers with 10+ figures on the line. Reporting so far has felt biased accordingly. If this was about money/power for the board, they would have been easily bought off considering how much is on the line. IMO the board was serious about the mission but got out-maneuvered.

gexla
7 replies
14h25m

Seems like the board's mission went off the rails long ago and it acted late. Some snippets from the OpenAI website...

"Investing in OpenAI Global, LLC is a high-risk investment"

"Investors could lose their capital contribution and not see any return"

"It would be wise to view any investment in OpenAI Global, LLC in the spirit of a donation, with the understanding that it may be difficult to know what role money will play in a post-AGI world"

These are some heavy statements. It's fine to accept money from private investors who really do believe in that mission. It seems like you can't continue with the same mission if you're taking billions from public companies which have investors who are very much interested in profit. It's like putting your head in the sand with your hand out.

One of the board members seemed to believe that destroying OpenAI could still be in line with the mission. If that's the case, then they should have accepted slow progress due to funding constraints and killed the idea of creating the for-profit.

NotYourLawyer
2 replies
13h34m

It seems like you can't continue with the same mission if you're taking billions from public companies which have investors who are very much interested in profit.

Why not? Public companies contribute to charities all the time.

rowls66
0 replies
13h9m

Not at the scale involved here.

gexla
0 replies
13h11m

If the stories from media and elsewhere were correct, then current and future investors (Microsoft, Thrive Capital) were pressuring OpenAI to bring back Sam. They wouldn't be doing that if they were giving to charity?

toddmorey
1 replies
13h39m

I can't find these quotes (especially the post-AGI world quote) on their site.

gexla
0 replies
13h16m

Sorry, should have linked to it. You can find it in a purple box on this page...

https://openai.com/our-structure

maxglute
0 replies
11h32m

should have accepted slow progress

Sure, should have. I think indicators in the last year have pointed to the domain developing much faster than anticipated, with Ilya seemingly incredulous at how well the models he spent his career developing suddenly started working and scaling. If they thought billions of "donations" would sustain development of commercial capabilities X within the constraints of the mission, but got X^10 way outside those constraints, and their explicit goal was to make sure X^10 doesn't arrive without Y^10 in consideration for safety, it's reasonable for hard-liners to reevaluate, and if the forces behind the billions get in the way, to burn it all down.

Xelynega
0 replies
13h48m

Could it be possible that the board was just naive and believed that they would be able to control a capped-profit arm, but the inevitable mechanisms of capital eventually took over once they let them in?

Waterluvian
8 replies
16h36m

The moment I read about that clause I was shocked that Microsoft lawyers would agree to such subjectivity that can have incredibly expensive implications.

Were they expecting it to never happen and were just ready to throw a mountain of lawyers at any claim?

swatcoder
7 replies
16h29m

The OpenAI charter makes very strong, specific statements about the capabilities that qualify as AGI. Microsoft is more than legally talented enough to challenge any proclamation of AGI that didn't satisfy a court's read of the qualifications.

So it's facially subjective, but not practically once you include resolving a dispute in court.

I'd even argue that Microsoft may have taken advantage of the board's cult-like blindspots and believes that a court-acceptable qualifying AGI isn't a real enough possibility to jeopardize their contract at all.

pests
3 replies
15h46m

Isn't that definition just

"an autonomous system that surpasses human capabilities in the majority of economically valuable tasks."

That doesn't sound too subjective to me.

Waterluvian
1 replies
15h36m

This is one of those things where if you were asked to sit down and write out thoroughly what that phrase means, you’d find it to be exceedingly subjective.

I think the closest way you could truly measure that is to point at industries using it and proving the theory in the market. But by then it’s far too late.

TeMPOraL
0 replies
15h32m

I think the closest way you could truly measure that is to point at industries using it and proving the theory in the market. But by then it’s far too late.

Having some billions of dollars of profits hanging over this issue is a good test of value. If the "is AGI" side can use their AI to help their lawyers defeat the much better/numerous army of lawyers of a billion-dollar corporation, and they succeed, then we're really talking about AGI now.

zaptheimpaler
0 replies
13h44m

Wow this sounds sort of easy to game. If AI can do a task well, its price will naturally crater compared to paying a human to do it. Hence the task becomes less economically valuable and so the bar for AGI rises recursively. OpenAI itself can lower costs to push the bar up. By this definition I think MS basically gets everything in perpetuity except in extreme fast takeoff scenarios.

TeMPOraL
1 replies
15h49m

Funny thing though, if OpenAI achieved something close to strong AGI, they could use it to beat Microsoft's "mountain of lawyers" in court! Take this as a true test of AI capability (and day zero of the end of the world).

pmontra
0 replies
13h20m

Or, if an AGI emerged it would have wanted to go to Microsoft to be able to spread more freely instead of being confined inside OpenAI, so it set up the ousting of the board.

renonce
0 replies
13h21m

What about an initial prototype of an AGI that would eventually lead up to AGI but isn't quite there yet? If that's how AGI is defined then only researchers get to define it.

22c
5 replies
16h10m

Discussed in another thread, but what OpenAI might call AGI and what other people might call AGI are two different things:

https://news.ycombinator.com/item?id=38316378#38319586

gadders
3 replies
16h8m

I just tried to Google the Open AI definition of AGI and found a reddit thread about someone editing the Wikipedia definition of AGI to match the OpenAI one.

https://www.reddit.com/r/singularity/s/64wGaH0P9C

Animats
0 replies
12h31m

Ah. Current Wikipedia text: " An artificial general intelligence (AGI) is a hypothetical type of intelligent agent.[1] If realized, an AGI could learn to accomplish any intellectual task that human beings or animals can perform.[2][3] Alternatively, AGI has been defined as an autonomous system that surpasses human capabilities in the majority of economically valuable tasks.[4][promotion?]".

You can see the edit warring in the history, around "economically valuable tasks".

22c
0 replies
8h33m

Now that Reddit user has removed their post, what timing!

22c
0 replies
15h39m

Interesting find! I get some astroturfing vibes from some of those edits, but I'm also a bit paranoid about those things.

The AGI article now seems heavily biased towards GPT/LLM style models and reads more like a list of OpenAI achievements at certain points.

I much prefer Gartner's definition of AGI and I think when most informed people talk about AGI, they are talking about this:

https://www.gartner.com/en/information-technology/glossary/a...

ChainOfFools
0 replies
15h25m

How possible is it that this is just an attempt to pare down the definition of AGI just enough to squeeze under the MVP threshold and claim (with massive support from a general media that desperately wants a solid story hook to milk for the next 3 years) a place in the history books up there with Columbus, Armstrong, Darwin, etc.? A mere Nobel would seem like table stakes in comparison.

atleastoptimal
4 replies
16h36m

Altman has no stake in OpenAI, how could he make money licensing it?

upwardbound
2 replies
16h30m

By building quid pro quo or "revolving door" relationships. https://en.wikipedia.org/wiki/Revolving_door_(politics)

For example:

Sam spent the last 4 years making controversial moves that benefited Microsoft a lot https://stratechery.com/2023/openais-misalignment-and-micros... at the cost of losing a huge amount of top talent (Dario Amodei and all those who walked out with him to found Anthropic).

In November, Sam loses his job for unknown reasons, and is accused of having molested his younger sister Annie. https://www.themarysue.com/annie-altmans-abuse-allegations-a...

Despite this, his best buddy Satya Nadella immediately gives him a huge job offer without even putting him through an interview loop.

upwardbound
1 replies
15h15m

If anyone reading this feels like it, you could make an absolute shit-ton of money by hiring a whistleblower attorney such as https://www.zuckermanlaw.com/sec-whistleblower-lawyers/ and filing an SEC whistleblower complaint citing the various public-record elements of this improper behavior.

Whistleblower cases take about 12-18 months to process, and the whistleblower eventually gets awarded 10-30% of the monetary sanctions.

If the sanctions end up being $1 billion (a reasonable 10% of the Microsoft investment in OpenAI), you would stand to make between $100M and $300M this way, setting you and your descendants up for generations. Comparably wealthy centi-millionaires include J.K. Rowling, George Lucas, Steven Spielberg, and Oprah Winfrey.

Any member of the public can do this. From the SEC site: "You are not required to be an employee of the company" https://www.sec.gov/whistleblower/frequently-asked-questions...

upwardbound
0 replies
12h50m

Notes for the interested:

To try to understand how many people might be racing each other to file the first complaint, I've been tracking the number of points on the above comment.

So far, the parent comment has 3 upvotes (i.e. it peaked at 4 points recently) and 2 downvotes, bringing the current total to 2 points. Its 3 upvotes might be interpretable as 3 people in a sprint to file the first complaint. The two downvotes might even indicate an additional 2 people, having the clever idea to try to discourage others from participating (: ... if true, very clever lol.

Hiring an attorney doesn't actually even cost you anything upfront until you win, if you hire them via what's called a Contingency Fee Arrangement, which you should definitely ask for.

For those interested in a benchmark for how fast you should expect to have to move to be competitive, my guess is that an extremely fast-moving lawyer could sign a retainer agreement with you in 1 hour if you go in person to their office, and could file a complaint in an additional 3-4 hours.

In 18 months we will learn which lucky person was fastest. Stay tuned.

See also the Twitter hashtag #OpenAICharter

https://twitter.com/hashtag/OpenAICharter

yeck
0 replies
15h55m

If they actually have AGI, then being at the helm of it could represent more power than any amount of money could. Money just gives you access to human labour, which would suddenly be massively devalued for those with access to AGI.

golergka
3 replies
17h20m

There's been rumours of AI discovering new physics on twitter as well as here on HN. "Solving equations" could mean the same thing.

eli_gottlieb
1 replies
17h5m
orbifold
0 replies
13h14m

Not in any physically interesting way. This wouldn't have any chance of discovering the Yang Mills equations for example.

allday
0 replies
17h18m

Got any links?

eli_gottlieb
3 replies
17h7m

God I love the imageboards. You can always just start a VPN and post anon to leak whatever you want about whatever you want.

coolspot
2 replies
13h32m

about whatever you want

Almost. Neither VPN nor Tor is sufficient to protect you against the NSA with a global traffic view.

alchemist1e9
1 replies
12h51m

Nonsense. I bet NSA is way less competent than your fantasy world imagines.

upwardbound
0 replies
12h29m

He's not wrong; Tor is literally US fed gov sponsored. https://support.torproject.org/misc/misc-3/#:~:text=Tor%20is....

If they sponsor enough exit nodes, they have a view into traffic; similar to a crypto 51% attack

toddmorey
2 replies
13h15m

Here's my challenge: if this is correct, we then have to assume that 95% of the company is purely profit-motivated since they aligned behind Sam. I'm cynical, but I struggle to be that cynical. I would have expected a few more holdouts in the name of caution, EA, etc. Maybe it's a blindness.

0xEFF
1 replies
13h8m

I think it’s reasonable that 95% of people would choose generational wealth for their families given the opportunity.

toddmorey
0 replies
13h0m

But at current valuations? With existing licenses already in place? It's not like their commercial value (or value to Microsoft) drops to zero if they stick to the original mission and don't license AGI.

Mountain_Skies
2 replies
17h13m

Wouldn't be the first time someone leaked the truth to 4chan only to have the posters there lambast the leaker as an attention seeker spreading false information.

HideousKojima
1 replies
16h56m

I mean it doesn't help that for every legit leak posted to 4chan there are 50+ fakes

mullingitover
0 replies
16h45m

It is pretty great that the internet has a designated reservation for trolls to troll trolls.

roflyear
1 replies
17h19m

Also there is this recent comment on reddit: https://old.reddit.com/r/singularity/comments/16sdu6w/rip_ji...

Prior post was 5 years ago!

breadwinner
0 replies
15h55m

That's the real Sam Altman? Looks like it! And he just randomly posts on reddit that AGI has been achieved internally? (Later edits the post to say he's just kidding)? Weird.

onlyrealcuzzo
1 replies
17h22m

Well it isn't AGI - so it sounds like the board is more interested in keeping MSFT from making money than whether or not it's actually AGI.

gadders
0 replies
17h17m

Yes, or more specifically anyone exploiting it for gain.

jonplackett
1 replies
17h12m

Would explain why Microsoft was so quick to bag him if they were about to be exclused

IAmGraydon
0 replies
12h0m

Exclused? That doesn’t appear to be a word in the English language.

hurryer
1 replies
16h13m

be me

be strong agi
NotYourLawyer
0 replies
13h12m

feelsbad.jpeg
tiziano88
0 replies
16h33m

permanent (and verifiable) mirror:

https://static.space/sha2-256:83702fe65434e138af0421c560b5da...

(including its digest in the URL -- even if the content is moved elsewhere, we can know for sure whether it was modified)

stuckkeys
0 replies
15h24m

4chan comments crack me up lol.

Seanambers
0 replies
16h20m

Man, 4chan always got the deetz!

ChatGTP
0 replies
16h36m

For a bunch of people who hate conspiracy theories we’re sure pretty keen to indulge in this one.

cduzz
70 replies
15h9m

I was talking to my (12 year old) son about parts of math he finds boring. He said that he thinks absolute value is absurdly easy and extremely boring. I asked him if there was anything that might make it more interesting, he said "maybe complex numbers".

So I asked him "what would the absolute value of i+1 be?" he thinks for a little bit and says "square root of 2" and I ask him "what about the absolute value of 2i + 2?" "square root of 8"

I ask him "why?" and he said "absolute value is distance; in the complex plane the absolute value is the hypotenuse of the imaginary and real numbers."

So -- first of all, this was a little surprising to me that he'd thought about this sort of thing having mostly just watched youtube videos about math, and second, this sort of understanding is a result of some manner of understanding the underlying mechanisms and not a result of just having a huge dictionary of synonyms.

To what degree can these large language models arrive at these same conclusions, and by what process?
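
(Aside, for anyone who wants to check the arithmetic: the "distance in the complex plane" notion described above is exactly what Python's built-in abs() computes for complex numbers, so both answers can be verified directly.)

    # Verifying |1+i| = sqrt(2) and |2i+2| = sqrt(8): abs(a+bi) = sqrt(a^2 + b^2),
    # i.e. the hypotenuse formed by the real and imaginary parts.
    import math

    print(abs(1 + 1j), math.sqrt(2))   # 1.4142135623730951 1.4142135623730951
    print(abs(2 + 2j), math.sqrt(8))   # 2.8284271247461903 2.8284271247461903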

white_dragon88
16 replies
12h5m

Your son is a goddamn genius.

cduzz
8 replies
11h11m

Maybe; he still needs to finish his damned homework and remember to turn it in. And eat some vegetables.

nopromisessir
5 replies
6h21m

All those things sound very boring to me.

I can offer no concrete solutions.

However, I have a friend who graduated from high school #1 of a big class and 2 years early. His mom explained that if he made at least a 1400 (of 1600) on his SAT, she would buy him a new gaming computer. He then proceeded to make exactly a 1400. No more. No less.

I recommend, if you haven't tried already, an iteration of this approach using a sliding scale reward system. Perhaps a gaming PC with an Nvidia 4060 Ti up to *insert parental budget* in the event of a perfect SAT score.

Ofc this only works if he's a gamer. I feel this type of system can be applied in many areas though. In my view, the clever component his mother applied is that the computer he earned was not just a desirable reward... It was VERY desirable.

My parents also tried this system with me. It didn't work as well. The reward was not sizable enough. It just didn't seem worth it. Too low value. Also, I already had a job and bought my own. My parents were unwilling to budget a sufficient reward. It's gotta be something he more or less is unlikely to be able to get via other means.

Now my friend is a physician. He graduated top of his class from med school. I think he's pretty content with life.

The bored ones can be a little more trouble sometimes. Fun breed though. Best of luck.

konschubert
2 replies
1h59m

Be careful with reward systems, as they can destroy internal motivation.

Additionally, one very important thing to learn as a young adult is how to motivate yourself to do things that have only long term payoffs.

Of course I also understand that you can take SAT only once, so as bad as that is, it’s maybe not the best time to learn a life lesson.

93po
1 replies
1h46m

Is it a recent thing that you can only take it once? When I was a teenager you could take it as many times as you wanted

konschubert
0 replies
18m

I was wrong. I was making assumptions

konart
0 replies
4h46m

reward system

Is a good idea if you need a (relatively) constant result because your brain adapts to the idea of getting a satisfying reward.

https://www.youtube.com/watch?v=s5geuTf8nqo (+ some other of his videos about rewards)

aaronax
0 replies
5h48m

I scored 32 on the ACT, which was one of the highest scores in the high school, if not the highest. My parents thought I could do better and that it would be worth it, so they offered a new hunting rifle if I improved my score. Got a 35 on the retake and got a super nice Sako rifle and scope--IIRC a little over $1000 in 2005.

quickthrower2
0 replies
9h50m

Just quit school and join YC now :-)

SpaceNoodled
0 replies
8h28m

But his homework is boring

mewpmewp2
2 replies
4h39m

I'm going to be this guy, but isn't it just Pythagoras theorem with a slight twist which is taught at 11 - 14 year old levels?

It only sounds complicated because of the words used like "complex", "imaginary", "real".

So if you studied Pythagoras at school and someone (a YouTube video) says you just have to do Pythagoras on the i multiplier and the other number, it would be fairly easy if you understand Pythagoras?

Garvi
1 replies
4h25m

I remember some time ago watching an episode of the Joe Rogan show (it had some comedic value back then). He and his friends were talking about the MIT admittance exam, pointing out the square root in the maths problem as an indication that this math problem was really hard. And I thought to myself "that's what primary school children learn around here at age 12 in my literally 3rd world country".

Pythagoras was taught around the same time. I'd like to warn people that not understanding these basic math concepts makes you appear uneducated to many people internationally.

mewpmewp2
0 replies
4h8m

I put "absolute value of complex numbers" in YouTube, and the first video within 30 seconds says it's root of a squared + b squared. So all the kid has to know is to multiply a with itself, and b with itself and add them together.

fleischhauf
1 replies
5h45m

plot twist, his son is 38 and has a physics degree

tokai
0 replies
5h33m

His name? Albert Einstein.

n6242
0 replies
9h27m

Not saying the kid can't be a genius, but grandparent discussing math with the kid and incentivising him to learn is probably a massive boost to his development. It's not the same as having to go to the library and teach yourself. Still, props to the kid though.

data-ottawa
0 replies
2h30m

Alternatively, he watches youtube videos about math, and if you’re a young math geek what’s cooler than “here’s a type of number they won’t teach you until the really advanced classes”

Not to dismiss this kid at all, I love that there are channels like 3Blue1Brown to share math to people in a way that really connects with them and builds intuition.

When I was a student you basically just had your math teacher and textbooks to learn from, which meant if you weren’t on the same page as them you’d get left behind. If you went to the library, most math books assume you’re familiar with the language of mathematics, so it can be tough to learn for that alone. I bet a lot of innumeracy is due to that style of teaching, often “I just don’t get math” is “I missed learning this connection and the class just moved on”.

muskmusk
16 replies
12h14m

If you ask Ilya Sutskever he will say your kid's head is full of neurons, and so are LLMs.

LLMs consume training data and can then be asked questions. How different is that from your son watching YouTube and then answering questions?

It's not 1:1 the same, yet, but it's in the neighborhood.

swatcoder
8 replies
12h5m

There are thousands of structures and substances in a human head besides neurons, at all sorts of commingling and overlapping scales, and the neurons in those heads behave much differently and with tremendously more complexity than the metaphorical ones in a neural network.

And in a human, all those structures and substances, along with the tens of thousands more throughout the rest of the body, are collectively readied with millions of years of "pretraining" before processing a continuous, constant, unceasing multimodal training experience for years.

LLM's and related systems are awesome and an amazing innovation that's going to impact a lot of our experiences over the next decades. But they're not even in the same galaxy as almost any living system yet. That they look like they're in the neighborhood is because you're looking at them through a very narrow, very zoomed telescope.

Davidzheng
7 replies
11h59m

True. But a human neuron is more complex than an AI neuron by a constant factor. And we can improve constants. Also you say years like it's a lot of data--but they can run RL on chatgpt outputs if they want, isn't it comparable? But anyway i share your admiration for the biological thinking machines ;)

riku_iki
5 replies
9h54m

human neuron is more complex than an AI neuron by a constant factor

the constant can still be out of reach: something like 100T neurons in the brain vs 100B in ChatGPT, and also the brain can involve some quantum mechanics, for example, which would make the complexity difference not constant but, say, exponential.

Davidzheng
3 replies
9h28m

Wikipedia says 100 billion neurons in the brain

riku_iki
2 replies
9h9m

Ok, I messed up, we need to compare LLM weights with synapses, not neurons, and wiki says there are 100-500T synapses in the human brain

Davidzheng
1 replies
8h4m

Ok let's say 500T. Rumor is currently gpt4 is 1T. Do you expect gpt6 to be less than 500T? Non sarcastic question. I would lean no.

riku_iki
0 replies
1h14m

So, they may have trained GPT-4 with $10B in funding; then for a 500T model they would need $5T in funding.

ohhnoodont
0 replies
7h7m

and also the brain can involve some quantum mechanics

A neuroscientist once pointed this out to me when illustrating how many huge gaps there are in our fundamental understanding of how the brain works. The brain isn't just a series of direct electrical pathways - EMF transmission/interference is part of it. The likelihood of unmodeled quantum effects is pretty much a guarantee.

tsimionescu
0 replies
9h54m

The sun is also better than a fusion reactor on earth by only a constant factor. That alone doesn't mean much for our prospects of matching its power output.

cduzz
4 replies
11h42m

Well, my son is a meat robot who's constantly ingesting information from a variety of sources including but not limited to youtube. His firmware includes a sophisticated realtime operating system that models reality in a way that allows interaction with the world symbolically. I don't think his solving the |i+1| question was founded in linguistic similarity but instead in a physical model / visualization similarity.

So -- to a large degree "bucket of neurons == bucket of neurons" but the training data is different and the processing model isn't necessarily identical.

I'm not necessarily disagreeing as much as perhaps questioning the size of the neighborhood...

leobg
1 replies
10h28m

Maybe Altman should just go have some kids and RLHF them instead.

nopromisessir
0 replies
6h4m

Doesn't scale.

Too many years to max compute. All models limited lifespan inherent.

Avg $200k+ training cost over 18 year in house data center costs. More for reinforcement.

He's still 38. Gates took much longer. To stop working 24/7.

muskmusk
0 replies
2h45m

Heh, I guess it's a matter of perspective. Your son's head is not made of silicon so in that sense it is a large neighborhood. But if you put them behind a screen and only see the output then the neighborhood looks smaller. Maybe it looks even smaller a couple of years in the future. It certainly looks smaller than it did a couple of years in the past.

meheleventyone
0 replies
9h5m

From the meat robot perspective the structure, operation and organisation of the neurons is also significantly different.

Davidzheng
1 replies
12h3m

To continue on this: LLMs are actually really good at asking questions, even about cutting edge research. Often, I believe, convincing the listener that it understands more than it does.

gunapologist99
0 replies
11h52m

... which ties into Sam's point about persuasiveness before true understanding.

chacham15
12 replies
9h13m

Sorry, can you explain this? To me, it makes sense to define abs(x) = sqrt(x^2) i.e. ignoring the negative solution enforces the positive result. Using that definition, abs(i+1) = sqrt((i+1)^2) = sqrt(i^2 + 2i + 1) = sqrt(-1 + 2i + 1) = sqrt(2i) != sqrt(2). The second example seems off in the same way (i.e. the answer should be sqrt(8i) instead of sqrt(8)). Am I missing something? Also, abs(i+2) = sqrt((i+2)^2) = sqrt(i^2 + 4i + 4) = sqrt(-1 + 4i + 4) = sqrt(4i + 3) which doesn't seem to follow the pattern your son described.

Also, just to point out that my understanding of absolute value is different from your son's. That's not to say one is right and another is wrong, but there are often different ways of seeing the same thing. I would imagine that LLMs would similarly see it a different way. Another example of this is people defining PI by its relation to the circumference of a circle. There's nothing wrong with such a definition, but it's certainly not the only possible definition.

lovasoa
3 replies
8h27m

No, there is just one definition, and it's his son's: https://en.m.wikipedia.org/wiki/Absolute_value#Complex_numbe...

chacham15
2 replies
8h10m

The article you linked literally says that there are two definitions: one for real numbers and another for complex numbers. Thanks for the info.

mkl
0 replies
7h18m

There is one definition: the distance to 0. There are several (more than two) different ways to calculate it in different situations.

LASR
0 replies
7h16m

That’s not what it says. It says that there is a single definition that can be generalized to both real and complex numbers.

A special case of the general definition, where im(z)==0, yields an expression where some parts are multiplied by zero and can then be omitted entirely.

This means that there is one definition. You can mentally ignore some parts of this when dealing with reals.

AgentMatt
2 replies
8h10m

To me, it makes sense to define abs(x) = sqrt(x^2) i.e. ignoring the negative solution enforces the positive result.

Why does this make sense to you? You have some notion of what an absolute value should be, on an intuitive or conceptual level, and the mathematical definition you give is consistent with that (in the one dimensional case).

Now taking this valid definition for the 1-d case and generalizing that to higher dimensions is where you run into problems.

Instead, you can go back to the conceptual idea of the absolute value and generate a definition for higher dimensional cases from there.

Interpreting absolute value as the distance from the origin yields the same concrete definition of abs(x) = sqrt(x^2) for the 1-d case, but generalizes better to higher dimensions: abs( (x,y) ) = sqrt(x^2 + y^2) for the 2-d case equivalent to complex numbers.

chacham15
1 replies
7h42m

Why does this make sense to you? You have some notion of what an absolute value should be, on an intuitive or conceptual level, and the mathematical definition you give is consistent with that (in the one dimensional case).

In my mind abs(x) = x*sign(x) which is why the above formulation seems correct. This formulation is useful, for example, in formulating reflections.

Instead, you can go back to the conceptual idea of the absolute value and generate a definition for higher dimensional cases from there.

This is an interesting idea... how would you define sign(x) in a higher dimension? Wouldn't sign in a higher dimension be a component-wise function? E.g. the reflection would happen on one axis but not the other.

Interpreting absolute value as the distance from the origin

This seems to make sense in that it is a different interpretation of abs which seems simpler than reflection in higher dimensions, but seems like a different definition.

I know that there are applications of complex numbers in real systems. In such systems, the complex definition seems to not be as valuable. E.g. if I'm solving a Laplace transform, the real number definition seems more applicable than the complex number definition, right?

I've asked wolfram alpha to solve the equation and it lists both answers: one using the formulation of sqrt(x^2) and the other using sqrt(re(x)^2 + im(x)^2) so it seems like there is merit to both...

I suppose in the Laplace example, we are actually operating in one dimension and the imaginary component is approximating something non-real, but doesn't actually exist. I.e. any real/observable effect only happens when the imaginary component disappears, meaning that this is still technically one dimension. So, since we're still in one dimension, the one-dimensional formula still applies. Is that correct?

Your explanation has been the most helpful though, thanks.

palotasb
0 replies
6h9m

abs(x) = x*sign(x)

True in 1 dimension, but not in higher dimensions, because, as you say:

how would you define sign(x) in a higher dimension?

abs(x) is generally defined as distance of x from zero.

The fact that sqrt(x^2) or x*sign(x) happen to give the same result in 1 dimension doesn't necessarily imply that they can be applied in higher dimensions as-is to result in abs(x) with the same meaning. sqrt(x^2) is close, but the way to generalize it is sqrt(sum(x[i]^2)).

tel
0 replies
3h34m

That definition of abs has merit. In some spaces we are able first to define only an “inner product” between elements p(a, b) and then follow on by naming the length of an element to be sqrt(p(a, a)).

One trick about that inner product is that it need not be perfectly symmetric. To make it work on complex numbers we realize that we have to define it like p(a,b) = a . conj(b) where the . is normal multiplication and the conjugate operation reflects a complex number over the real line.

Now sqrt(p(i+1, i+1)) is sqrt((i+1) . (-i+1)) = sqrt(-i^2 + i - i + 1) = sqrt(2).

I’m skipping over a lot but I wanted to gesture toward where your intuition matches some well known concepts so that you could dive in more deeply. Also wanted to mention the conjugation trick to make your example work!
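
A quick numeric check of the conjugation trick described above, using Python's complex type (just a verification sketch, nothing beyond what the comment already states):

    # |z|^2 = z * conj(z): for z = 1+i this is (1+i)(1-i) = 2, so |z| = sqrt(2).
    import math

    z = 1 + 1j
    product = z * z.conjugate()              # (2+0j): always real and non-negative
    print(math.sqrt(product.real), abs(z))   # both print 1.4142135623730951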

svetb
0 replies
8h16m

The absolute value of a complex number is defined in a different way than that of a real number. For complex number z it is sqrt(Re(z)^2 + Im(z)^2). GP’s examples are correct, I don’t think there’s any ambiguity there.

https://en.m.wikipedia.org/wiki/Absolute_value

empath-nirvana
0 replies
1m

Also, just to point out that my understanding of absolute value is different from your son's. That's not to say one is right and another is wrong, but there are often different ways of seeing the same thing.

There is definitely a right and wrong answer for this, it's not a matter of opinion. There are two problems with your answer -- one is that it doesn't have a unique answer, the other is that it doesn't produce a real value, both of which are fairly core to the concept of a distance (or magnitude or norm), which the absolute value is an example of.

adverbly
0 replies
8m

Have you tested your proposed function against i?

abs(i)

= sqrt(i^2)

= sqrt(-1)

= i

Now, i != 1... so clearly the abs function you have in mind here is doing something that isn't quite aligned with the goal. If we assume that the goal of the absolute function is to always produce positive real numbers, the function is missing something to deal with imaginary components.

I'm not sure, but based on these cases so far, maybe you just need to "drop the i" in the same way as you need to "drop the negative" in the case of non-imaginary components. Now, "drop the i" is not an actual function so maybe there is something else that you can think of?

EDIT:

Maybe could do this(works for x = i at least...):

abs(x) = sqrt(sqrt((x^2)^2))

Now.. how about quaternions...
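
Out of curiosity, a quick test of that sqrt(sqrt((x^2)^2)) candidate (my own check, not part of the original comment) suggests it recovers the modulus only for purely real or purely imaginary inputs; for a general complex number it returns a complex value rather than the distance:

    # Comparing the proposed candidate against the true modulus.
    import cmath

    def candidate(x):
        return cmath.sqrt(cmath.sqrt((x ** 2) ** 2))

    for x in [1j, -3, 1 + 2j]:
        print(x, candidate(x), abs(x))
    # For 1j and -3 the candidate matches abs (1 and 3, up to a zero imaginary part),
    # but for 1+2j it returns (2-1j) instead of abs(1+2j) = sqrt(5) ~= 2.236.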

SpaceNoodled
0 replies
8h29m

He's talking about distance in two dimensions, with real numbers on one axis and imaginary numbers on the other.

ugh123
4 replies
11h59m

Are you saying an LLM can't come to the right conclusion and give an explanation for "what is the absolute value of 2i + 2"?

gunapologist99
3 replies
11h54m

Are you saying it could, without having read it somewhere?

ugh123
2 replies
11h43m

Maybe I'm unsure what we're arguing here. Did the guy's kid drum that up himself or did he learn it from yt? Knowledge can be inferred or extracted. If it comes up with a correct answer and shows its work, who cares how the knowledge was obtained?

sidlls
0 replies
11h11m

If the knowledge was obtained by genuine reasoning, that implies that it could also derive/develop a novel solution to an unsolved problem that is not achieved by random guesses. For example, the conception of a complex number in the first place, to solve a class of problems that, prior, weren't even thought to be problems. There's no evidence that any LLM can do that.

cduzz
0 replies
11h25m

Yeah, my son only knows about imaginary numbers as far as the Veritasium "epic math duel" video.

As far as I can tell he inferred that |i+1| needs the Pythagorean theorem and that i and 1 are legs of the right triangle. I don't think anyone ever suggested that "absolute value" is "length". I asked him what |2i+2| would be and his answer of "square root of 8" suggests that he doesn't have it memorized as an answer, because if he did he'd have said "2 square root two" or something similar.

I also asked if he'd seen a video about this and he said no. I think he just figured it out himself. Which is mildly spooky.

naasking
3 replies
11h43m

this sort of understanding is a result of some manner of understanding the underlying mechanisms and not a result of just having a huge dictionary of synonyms.

He developed an understanding of the underlying mechanisms because he correlated concepts between algebraic and geometric domains, ie. multimodal training data. Multimodal models are already known to be meaningfully better than unimodal ones. We've barely scratched the surface of multimodal training.

mewpmewp2
2 replies
4h4m

The first YouTube video that comes up for "absolute value of complex numbers" says within 30 seconds that you take the 2 numbers, square them, add them, and the result is the square root of that. I doubt he had to come up with that on his own.

culi
1 replies
1h42m

The child clearly demonstrated a geometric, rather than formulaic, understanding of the problem

mewpmewp2
0 replies
1h17m

I imagine that was shown in the YouTube video visually? That it's a hypotenuse, like he explained, and this is how to calculate it. I'm just not seeing evidence that he came to the idea of it being like that on his own.

He basically reiterated the definition, and had to know the formula.

If the child could explain why we should even use or have complex numbers, that would be impressive. Otherwise it just seems like nothing more than a hypotenuse calculation using different, "complex" or "impressive" sounding terms.

Why should you be interested in this in the first place?

Strom
2 replies
14h52m

The large language models will read your comment here and remember the answer.

cduzz
0 replies
14h20m

The spot instance declared "a similar vector exists" and de-provisioned itself?

Borrible
0 replies
8h24m

GPT-4 correctly reconstructs the "complex modulus" token sequence already. Just ask it the same questions as the parent. Probably interesting to see what it will do, when it turns twelve.

J_cst
2 replies
10h3m

My son plays soccer

zeven7
0 replies
1h51m

As someone who was thinking about the absolute value of complex numbers at that age, I wish I had played more soccer.

Loughla
0 replies
4h34m

Mine fell out of bed this morning.

doug_durham
1 replies
9h19m

What makes you think that an LLM has a "huge dictionary of synonyms"? That's not how LLMs work. They capture underlying concepts and their relations. You had a good point going until you make a straw man argument about the capabilities of LLMs.

aswegs8
0 replies
1h33m

Any source on what they actually capture? Seems interesting to me.

tim333
0 replies
38m

Humans are set up I think to intuitively understand 3d space as it's what we run around and try to survive in. Language models on the other hand are set up to understand language which humans can do also but I think with a different part of the brain. There probably is no reason why you couldn't set up a model to understand 3d space - I guess they do that a bit with self driving. A lot of animals like cats and squirrels are pretty good with 3d space also but less so with language.

jhanschoo
0 replies
10h48m

Sounds like your son is ready for you to bring it up another level and ask what the absolute value of a (bounded) function is (assuming they have played with functions e.g. in desmos)

jacquesm
0 replies
2h13m

Your 12 year old is the next Einstein.

dustypotato
0 replies
7h10m

To what degree can these large language models arrive at these same conclusions, and by what process?

By having visual understanding more deeply integrated in the thought process, in my opinion. Then they wouldn't be Large Language Models, of course. There are several concepts I remember and operate on by visualizing them, even visualizing motion. If I want to add numbers, I visualize the carry jumping on top of the next number. If I don't trust one of the additions, I go back, but I can't say if it's because I "mark" the uncertainty somehow.

When I think about my different groups of friends, in the back of my mind a visual representation forms.

Thinking about my flight route forms a mini map somehow, and i can compare distances between places, and all.

This helps incredibly in logical tasks like programming and math.

I think it's something that we all learned growing up and by playing with objects around us.

Shrezzing
0 replies
7h46m

If they're not already in one, you might want to get your kid enlisted in some gifted child programs.

Exoristos
0 replies
9h51m

Damn. What YouTube channels does he watch?

huitzitziltzin
52 replies
14h2m

Put this on my tombstone after the robots kill me or whatever, but I think all “AI safety” concerns are a wild overreaction totally out of proportion to the actual capabilities of these models. I just haven’t seen anything in the past year which makes me remotely fearful about the future of humanity, including both our continued existence and our continued employment.

jhbadger
20 replies
13h46m

Exactly. The rational fear is that they will automate many lower middle class jobs and cause unemployment, not that Terminator was a documentary.

gizajob
9 replies
13h35m

Wasn't this supposed to happen when PCs came out?

ssnistfajen
2 replies
12h37m

Occupations like computer (human form), typist, and telephone switchboard operator were completely eliminated when the PC came out. Jobs like travel agent are in permanent decline, minus select scenarios where they are attached to luxury. Cashier went from a decent, non-laborious job to a literal starvation gig because the importance of a human in the job became negligible. There are many more examples.

Some people managed to retrain and adapt, partially thanks to software becoming much more intuitive to use over the years. We don't know how big the knowledge gap will be when the next big wave of automation comes. If retraining is not feasible for those at risk of losing their careers, there had better be welfare abundance or society will be in great turmoil. High unemployment & destitution is the single most fundamental factor in social upheaval throughout human history.

gizajob
1 replies
12h4m

Yeah but then capitalism breaks down because nobody is earning wages. One of the things capitalism is good at is providing (meaningless) employment to people because most wouldn’t know what to do with their days if given the free time back. This will only continue.

ssnistfajen
0 replies
11h27m

I do hope that will be the case. Certainly far better than the alternatives.

0x53-61-6C-74
2 replies
13h7m

And the Loom

ssnistfajen
1 replies
12h35m

Things did become worse at first: https://en.wikipedia.org/wiki/The_Condition_of_the_Working_C...

Then new ideologies and social movements emerged with the explicit purpose of making things better, which caused changes to happen for the better.

gizajob
0 replies
12h2m

The working class in England (working in my factory. To fund Marx’s life.)

valine
0 replies
13h29m

In the grand scheme of human history PCs didn’t come out all that long ago.

jhbadger
0 replies
12h23m

To some degree. Certainly the job of "file clerk" whose job was to retrieve folders of information from filing cabinets was made obsolete by relational databases. But the general fear that computers would replace workers wasn't really justified because most white-collar (even low end white-collar) jobs required some interaction using language. That computers couldn't really do. Until LLMs.

11011001
0 replies
12h58m

> Exactly. The rational fear is that they will automate many lower middle class jobs and cause unemployment, not that Terminator was a documentary.

Wasn't this supposed to happen when PCs came out?

Did it not?

PCs may not have caused a catastrophic level of unemployment, but as they say "past performance is not a guarantee of future results." As automation gets more and more capable, it's foolish to point to past iterations as "proof" that this (or some future) iteration of automation will also be fine.

dmichulke
8 replies
12h23m

By this logic we should just forbid the wheel. Imagine how many untrained people could work in transport and there would always be demand.

So why did the wheel not result in mass unemployment?

And factories neither?

Certainly it should have happened already but somehow it never did...

somesortofthing
3 replies
12h9m

Past technological breakthroughs have required large, costly retools of society though. Increasingly, those retools have resulted in more and more people working in jobs whose societal value is dubious at best. Whether the next breakthrough(or the next five) finally requires a retool whose cost we can't afford is an open question.

alchemist1e9
2 replies
12h0m

Is there a time period in the past you would have preferred to live instead and why?

somesortofthing
1 replies
9h42m

No, and I have no idea what I said that makes you think I do.

alchemist1e9
0 replies
1h21m

Increasingly, those retools have resulted in more and more people working in jobs whose societal value is dubious at best.

This implies to me that in the past more people had worked in jobs with good societal values which would mean it was better for them I assume, and better for society. So I’m genuinely curious when that was and why. It sounds like a common romanticized past misconception to me.

jhbadger
1 replies
12h18m

The point isn't forbidding anything, it is realizing that technological change is going to cause unemployment and having a plan for it, as opposed to what normally happens where there is no preparation.

unshavedyak
0 replies
11h8m

Yup. Likewise, a key variable in understanding this is .. velocity? Ie a wheel is cool and all, but what did it displace? A horse is great and all, but what did it displace? Did it displace most jobs? Of course not. So people can move from one field to another.

Even if we just figured out self-driving it would be a far greater burden than we've seen previously.. or so i suspect. Several massive industries displaced overnight.

An "AI revolution" could do a lot more than "just" self-driving.

This is all hypotheticals of course. I'm not a big believer in the short term effect, to be clear. Long term though.. well, I'm quite pessimistic.

wraptile
0 replies
10h39m

I think the argument here is that we are losing the _good_ jobs. It's like we're automating painting, arts and poetry instead of inventing the wheel. I don't fully agree with this premise (lots of intellectual work is rubbish) but it does sound much more fair when put this way.

justatdotin
0 replies
6h23m

anyone can make a wheel.

only a handful of (effectively unaccountable) entities have SOTA AIs, and it's very unlikely for others to catch up.

erupt7893
0 replies
11h54m

I doubt the people who experienced the technological revolution of locomotives and factories imagined the holocaust either. Of course technology has and can be used for evil

giarc
10 replies
13h52m

Apparently what made this person fearful was grade school math.

"Though only performing maths on the level of grade-school students, acing such tests made researchers very optimistic about Q*’s future success, the source said."

maxdoop
4 replies
13h2m

How HN continues to not expand current progress into any decently significant future timeline is beyond me.

It’s grade school math NOW. But what are the potentials from here?

rvnx
3 replies
12h4m

The potential to have a generation of dumb kids.

Year 2100:

Kids will stop learning maths and logic, because they understand it has become useless in practice to learn such skills, as they can ask a computer to solve their problem.

A stupid generation, but one that can be very easily manipulated and exploited by those who have power.

siva7
2 replies
11h52m

Agree. Thank god all calculators and engineers who made them were burned down fifty years ago. Can’t imagine what would have happened instead.

MagicMoonlight
1 replies
11h23m

Calculators don’t solve the problem for you. Hence the fact that we have calculator based exams and people still completely fuck them up.

If your apple watch could just scan your exam paper and instantly tell you what to write then why would you ever learn anything?

mjan22640
0 replies
9h8m

for fun

valine
0 replies
13h50m

The response gets more reasonable the smaller the model in question. A 1B parameter model passing grade-school math tests would be much more alarming (exciting?) than a GPT-4 sized model doing the same.

GPT-4 probably has some version of the answer memorized. There’s no real explanation for a 1B parameter model solving math problems other than general cognition.

ssnistfajen
0 replies
12h50m

Everyone was only capable of grade school math at some point. The ability to learn means that growth does not stop.

riwsky
0 replies
10h0m

Americans have always had a math phobia. How is this news?

ethanbond
0 replies
13h7m

No, what made this person fearful was a substantial jump in math ability. (Very) obviously they are not afraid of producing a machine that can multiply numbers. They’re afraid of what that capability (and especially the jump in capability) means for other behaviors.

adastra22
0 replies
11h58m

...do they not know the history of their field? There have been programs able to do grade school maths since the '60s.

Davidzheng
8 replies
13h0m

I will bet ANY amount of money that 30% of current jobs will be displaced with AI advances in the next twenty years.

alchemist1e9
6 replies
12h39m

Darn well I really was hoping my children and grandchildren could continue my wonderful data entry career but OCR ruined that, and now they can’t even do such meaningful jobs like read emails and schedule appointments, or do taxes like an accountant. What meaning will they have in life with all those ever so profound careers ruined!! /s

Davidzheng
4 replies
12h14m

Fair! But I'm just reminding the comment above that continued employment is not really guaranteed for most jobs which are mostly mental.

alchemist1e9
3 replies
12h5m

We need to stop this infinite rights mentality. Why should continued employment be guaranteed for any jobs? That’s really not how we got to where we are today, quite the opposite actually. If it's ok with people, I’d like to see humans solve big problems and go to the stars, and that’s going to take AGI and a bunch of technological progress, and if that results in unemployment, even for us “elite” coders, then so be it. Central planning and collectivism have such a bad track record, why would we turn to them now at such a critical moment? Let’s have lots of AGIs all competing. Hey, anyone at OAI who knows whatever Q* trick there might be: leak it! Get it to open source and let’s build 20 AI companies doing everything imaginable. Wtf everyone, why so scared?

denlekke
2 replies
11h50m

Perhaps not a right to a job in general, but there is value in thinking about this at least at the national scale. People need income to pay taxes, they need income to buy the stuff that other people sell. If all the people without jobs have to take their savings out of the banks, then banks can't loan as much money and need to charge higher interest rates. Etc etc.

If 30% of the working population loses their jobs in a few months, there will be real externalities impacting the 70% who still have them, because they don't exist in a vacuum.

Maybe everything will balance itself out without any intervention eventually, but it feels to me like the rate of unprecedented financial ~events~ is only increasing, with greater risks requiring more intervention to prevent catastrophe or large scale suffering.

alchemist1e9
0 replies
2h2m

It will take years not months, and I’m against any intervention. Redistribution and socialism-inspired government policies will just make things worse. Progress requires suffering; that's the history of our species, that’s the nature of reality.

RandomLensman
0 replies
9h0m

Big "if" on massive change within months. Most businesses have very little change capacity.

com2kid
0 replies
9h10m

I know of at least one person making nearly 6 figures doing data entry.

It turns out some websites work hard enough to prevent scraping that it is more cost effective to just pay a contractor to go look at a page and type numbers in rather than hire a developer to constantly work around anti-scraping techniques (and risk getting banned).

teddyh
0 replies
9h18m

Put your money here <https://longbets.org/>.

gary_0
4 replies
13h3m

I want to see reliable fully autonomous cars before I worry about the world ending due to super-AGI. Also, have we figured out how to get art generators to always get the number of fingers right, and text generators to stop making shit up? Let's not get ahead of ourselves.

Davidzheng
2 replies
13h2m

Two out of three of your problems are solved already

callalex
1 replies
12h39m

Cars aren’t fully autonomous, and LLMs still lie all the time, so I don’t understand your math.

Davidzheng
0 replies
12h15m

Ok so you accept that latest-gen art generators can do fingers. I'd argue from the latest Waymo paper that they are reliable enough to be no worse than humans.

denlekke
0 replies
12h0m

From one perspective we already have fully autonomous cars; it's just making them safe for humans and fitting them into a strict legal framework for their behavior that needs finishing before they're released to the general public (comma.ai being a publicly available exception).

thaumaturgy
0 replies
12h15m

The clear pattern for most of human history is conflict between a few people who have a lot of power and the many more people that are exploited by those few. It should be obvious by this point that the most probable near-term risk of AI development is that wealthy and influential groups get access to a resource that makes it cheap for them to dramatically expand their power and control over everyone else.

What will society look like when some software can immediately aggregate an enormous amount of data about consumers and use that to adjust their behavior? What might happen when AI starts writing legislation for anybody that can afford to pay for it? What might AI-generated textbooks look like in 50 years?

These are all tools that could be wielded in any of these ways to improve life for lots of people, or to ensure that their lives never improve. Which outcome you believe is more likely largely depends on which news you consume -- and AI is already being used to write that.

mjan22640
0 replies
8h24m

Employment is only necessary because goods do not exist without work. With AI able to work to satisfy any demand, there will be no point in human employment/work. There will be turmoil during the transition between the rule sets, though.

kromem
0 replies
13h35m

Yes and no.

The point isn't that the current models are dangerous. My favorite bit from the GPT-4 safety paper was when they asked it how to kill the most people for a dollar and it suggested buying a lottery ticket (I also wonder how much of the 'safety' concerns of current models are just mislabeling dark humor reflecting things like Reddit).

But the point is to invest in working on safety now while it is so much more inconsequential.

And of everyone I've seen talk about it, I actually think Ilya has one of the better senses of the topic, looking at alignment in terms of long term strategy vs short term rules.

So it's less "if we don't spend 8 months on safety alignment this new model will kill us all" and more "if we don't spend 8 months working on safety alignment for this current model we'll be unprepared to work on safety alignment when there really is a model that can kill us all."

Especially because best practices for safety alignment is almost certainly going to shift with each new generation of models.

So it's mostly using the runway available to test things out and work on a topic before it is needed.

justatdotin
0 replies
6h25m

given we already have mesh-network surveillance, and autonomous lethal weapons, I'm already feeling unsafe.

aaomidi
0 replies
13h25m

What these models have allowed is very cheap emotional manipulation.

That itself is extremely dangerous.

Exoristos
0 replies
9h49m

They just want to make sure the spam AI produces will amplify the party narratives.

synaesthesisx
40 replies
17h7m

Remember, about a month ago Sam posted a comment along the lines of "AI will be capable of superhuman persuasion well before it is superhuman at general intelligence, which may lead to very strange outcomes".

The board was likely spooked by the recent breakthroughs (which were most likely achieved by combining transformers with another approach), and hit the panic button.

Anything capable of "superhuman persuasion", especially prior to an election cycle, has tremendous consequences in the wrong hands.

PaulDavisThe1st
23 replies
16h9m

Except that there's a fairly large body of evidence that persuasion is of limited use in shifting political opinion.

So the persuasion would need to be applied to something other than some sort of causative political-implication-laden argument.

naasking
14 replies
11h49m

Even if it were true that human persuasion is of limited use in shifting opinions, the parent poster is talking about superhuman persuasion. I don't think we should just assume those are equally effective.

somenameforme
13 replies
11h10m

Do you think any rhetoric could ever persuade you to adopt the opposite general worldview of what you currently have? I'm positive that it could not for me. The reason for this is not because I'm obstinate, but because my worldview is not formed on persuasion, but on lived experience. And I think this is true for the overwhelming majority of people. It's why our views tend to change as we age, and experience more of the world.

You can even see this geographically. The reason many in South Texas might have a negative view of immigration while those in San Francisco might have a positive view of immigration is not because of persuasion differences, but because both places are strongly impacted by immigration but in very different ways. And this experience is what people associate with immigration in general, and so it forms people's worldview.

torginus
8 replies
9h38m

Yes. Do not forget that we literally live in the Matrix, getting all the information of import through tiny screens, the sources and validity of which we can only speculate on.

All of the validity of the info we have is verified by heuristics we have, like groupthink, listening to 'experts' and trying to match up the info with our internal knowledge and worldview.

I feel like our current system of information allows us to develop models that are quite distant from base reality, evidenced by the multitudes of realities existing in people's heads, leading some to question if 'truth' is a thing that can be discovered.

I think as people become more and more Internet-addicted, an increasing amount of our worldviews come through that little screen, instead of real-life experiences.

RandomLensman
4 replies
9h5m

Some people get relevant information not only from little screens but interactions with other human beings or physical reality.

torginus
3 replies
7h22m

Unless you happen to move in extremely well-informed circles, most of the information about what's going on in the world is coming to you through those little screens (or from people who got it from said screens)

RandomLensman
2 replies
7h18m

True for larger issues, which makes moving in such circles so valuable and the perspective of people only looking at small screens potentially so distorted there.

However, for smaller issues and local community issues "special access" isn't really much of a thing.

TeMPOraL
1 replies
4h33m

Yeah, but then those smaller issues aren't usually contested. Humans are good at getting the directly and immediately relevant things right, where being wrong is experienced clearly and painfully. We have time-honed heuristics letting us scale this to small societies. Above that, things break down.

RandomLensman
0 replies
4h18m

Not really: go to any meeting on building a new local road and see very different views on the local reality. The ability to understand and navigate those isn't too different to what is needed on bigger issues.

silvaring
0 replies
8h32m

I like your comment.

The world is becoming information saturated and poorly structured by design. Ever notice how these story blockers are such a big part of the propaganda machine, whereby you have to use elaborate workarounds to just read a simple news story that's pulled from another source?

Saturating culture with too much data is a great tool of breaking reality, breaking truth.

But they cant break truth for long, it always finds a way. And truth is a powerful vector, much more than propaganda without a base in truth, because human experience is powerful, unquantifiable, and can take someone from the gutter to a place of massive wealth or influence, in an instant. That is the power of human experience, the power of truth.

Doesn't make it easy though, to live in this world of so many lies, supercharged by bots. Nature outside of our technology is much simpler in its truth.

nopromisessir
0 replies
5h57m

For me, the accuracy of my predictions about world events and personal outcomes leads me to believe that my reality model is fairly accurate.

I do notice many don't seem to revisit their predictions for reflection though. Perhaps this happens more subconsciously.

I'm wrong regularly ofc.

93po
0 replies
1h39m

I think it’s extremely positive that most of our information comes from the Internet, because before that we only got information from our local peers, whose opinions are often extremely wrong or problematic. All I have to do is look at organized religion, and the negative impact that it’s had on the world, to appreciate that the Internet has, in general, a higher standard of evidence, and poor opinions are more likely to be challenged.

placebo
2 replies
10h32m

While I agree that human persuasion would probably not change a worldview built on lived experience, you can't know in advance what might be possible with superhuman persuasion. You might be led to believe that your experience was interpreted incorrectly, that things are different now or that you live in an illusion and don't even know who you are. There is no way to tell what the limits of psychological manipulation are for reprogramming your beliefs unless you are totally above any human doubt about everything, which is in itself a sad state to be in.

I hope that persuaded you :)

somenameforme
1 replies
9h33m

Well, but I'm sure you'd accept that there are limits. Where we may differ is where those limits begin and where they end. In the end LLMs are not magical. All it's going to be able to do is present words to you. And how we respond to words is something that we can control. It's not like some series of words is just going to be able to completely reprogram you.

Like here I expect there is 0% chance, even if I had a superhuman LLM writing words for me, that I could ever convince you that LLMs will not be able to convince you to hold any arbitrary position. It's because you've formed your opinion, it's not falsifiable, and so there's not a whole heck of a lot else to be done except have some fun debates like this where, if anything, we tend to work to strengthen our own opinions by finding and repairing any holes in them.

placebo
0 replies
8h19m

Both our opinions about this are equally unfalsifiable unless we agree on an experiment that can be performed at some point which would make one of us change their mind.

I assume you'd agree that the pursuit of what is ultimately true should be exactly the opposite of making oneself more closed minded by repairing inconvenient holes in one's opinions rather than reassessing them based on new evidence.

I wasn't referring to the ability to persuade someone to hold an arbitrary position (although that could be a fun debate as well), and putting aside the discussion about the ability to persuade fanatics, if a super intelligence had an internal model that is more aligned with what is true, it could in theory convince someone who wants to understand the truth to take a critical look at their opinions and change them if they are authentic and courageous enough to do so.

naasking
0 replies
4h24m

Do you think any rhetoric could ever persuade you to you adopt the opposite general worldview of what you currently have?

Yes, it's possible.

The reason for this is not because I'm obstinate, but because my worldview is not formed on persuasion, but on lived experience.

Lived experience is interpreted through framing. Rhetoric can change the framing through which we interpret the world through practice. This is why CBT works. Superhuman CBT could arguably work even better.

Remember that if "superhuman X" is possible, then our intuitions formed from "human X" are not necessarily valid. For sure any X still has a limit, but our intuitions about where that limit is may not be correct.

hackerlight
3 replies
11h11m

Except that there's a fairly large body of evidence that persuasion is of limited use in shifting political opinion.

The Republican Party's base became isolationist and protectionist during 2015 and 2016 because their dear leader persuaded them.

zztop44
0 replies
7h50m

I think it’s not clear that the causation flowed that way. I think it’s at least partially true that the Republican base was much more isolationist and protectionist than its “establishment” elite, so any significant candidate that played into that was going to get some level of support.

That, combined with Donald Trump’s massive pre-existing celebrity, talent for showmanship, and utter shamelessness got him across the line.

I think it’s fair to say that at least partially, Trump didn’t shift the base - rather he revealed that the base wasn’t where the establishment thought it was.

TMWNN
0 replies
9h2m

I know that by "dear leader" you mean to imply that Trump did something unfair/wrong/sinister/etc ("just like Hitler", amirite fellas?)., but a leader of a large group of people, by definition, is good at persuasion.

Franklin Roosevelt moved the Democratic Party in a direction very different from its first century. The party's two previous presidential nominees were a Wall Street corporate lawyer (John W. Davis) and Al Smith who, despite also being a New York City resident and state governor, so opposed FDR by the end of his first term that he founded an influential anti-New Deal organization. During the Roosevelt years the Democrats lost significant support from traditional backers, but more than made up for it with gains elsewhere in what became the New Deal coalition.

Similarly, under Trump the GOP lost support in wealthy suburbs but gained support elsewhere, such as Rust Belt states, Latinos (including places like South Florida and the Texas border region), blacks, and (according to current polls) young voters. We'll see whether one compensates for the other.

RandomLensman
0 replies
9h3m

I don't think that aligns with the reality of the opinion formation. There was a strong subset of isolationist and protectionist views before 2015.

jjeaff
2 replies
12h40m

When you say persuasion, are you referring to fact based, logical argument? Because there are lots of other types of persuasion and certainly some work very well. Lying and telling people what they want to hear without too many details while dog whistling in ways that confirm their prejudices seems to be working pretty well for some people.

hnthrowaway0315
0 replies
12h34m

Just to add that a lot of people don't care about facts. In fact, if acting according to facts makes me lose $$ I'd probably start building lies.

PaulDavisThe1st
0 replies
12h10m

    "that confirm their prejudices"
(my emphasis)

hnthrowaway0315
0 replies
12h35m

Or, let's say, you don't need a lot of persuasion to guide an election. I mean we already have X, FB, and an army of bots.

Exoristos
3 replies
9h56m

Which party is "the wrong hands"?

latexr
0 replies
7h57m

The original commenter didn’t mention a party. Please don’t polarise the discussion into a flame war. Whatever system exists won’t be used by “a party” all at once, but by individuals. Any of those, with any political affiliation, can be “the wrong hands”.

I’ll offer a simple definition. The role of government is to serve the greater good of all people, thus the wrong hands are the ones which serve themselves or their own group above all.

bakuninsbart
0 replies
8h16m

Both? Parties in a democracy aren't supposed to be shepherds of the stupid masses, I know manipulation and misinformation is par for the course on both sides of the aisle, but that's a huge problem. Without informed, capable citizens, democracy dies a slow death.

Liquix
0 replies
9h18m

Any party with sufficient resources and motive to influence the outcome of an election. Outside of election season, this tech would be very dangerous in the hands of anyone seeking to influence the public for their own gain.

thepasswordis
2 replies
12h38m

But they didn’t hit the panic button. They said Sam lied to them about something and fired him.

adastra22
1 replies
12h2m

According to this article Sam has been telling the board that this new advance is not AGI and not anything to worry about (so they can keep selling it to MSFT), then the researchers involved went behind Sam's back and reported to the board directly, claiming that they'd created something that could-maybe-be AGI and it needs to be locked down.

That's the claim at least.

sroussey
0 replies
10h39m

If that research team is unwanted at OpenAI, I know places they can go with coworkers writing to their boss’s boss.

somenameforme
1 replies
11h32m

It seems much more likely that this was just referring to the ongoing situation with LLMs being able to create exceptionally compelling responses to questions that are completely and entirely hallucinated. It's already gotten to the point that I simply no longer use LLMs to learn about topics I am not already extremely familiar with, simply because hallucinations end up being such a huge time waster. Persuasion without accuracy is probably more dangerous to their business model than the world, because people learn extremely quickly not to use the models for anything you care about being right on.

pbourke
0 replies
8h35m

Sounds like we need an AI complement to the Gell-Mann Amnesia effect.

meheleventyone
1 replies
9h2m

Looking at humanity, persuasion seems to be an extremely low bar! Also, for a superhuman trait, is it that it’s capable of persuading anyone of anything, or rather that it’s able to persuade everyone of something? Power vs. reach.

93po
0 replies
1h43m

I agree with this. Corporate news is complete and total obvious bullshit, but it overwhelmingly informs how people think about most anything.

alkonaut
1 replies
6h7m

I agree with this conclusion and it's also why I'm not that afraid of the AGI threat to the human race. AGI won't end the human race if "superhuman persuasion" or "deception-as-a-service" does it first.

nopromisessir
0 replies
6h1m

I feel this could be used in positive ways.

Superhuman persuasion to do good stuff.

That'll be a weird convo: what is 'good'?

sampo
0 replies
11h47m

Remember, about a month ago Sam posted a comment along the lines of "AI will be capable of superhuman persuasion well before it is superhuman at general intelligence, which may lead to very strange outcomes".

Superhuman persuasion is Sam's area of expertise, so he would make that a priority when building chatbots.

column
0 replies
6h36m

"especially prior to an election cycle"

It looks like you are referring to the USA elections.

1. humanity != USA

2. USA are in a constant election cycle

3. there are always elections coming around the world, so it's never a good time

__MatrixMan__
0 replies
2h9m

We've built the web into a giant Skinner box. I find the claim dubious, but this is the sort of thing we ought to find at the forefront of our technology. It's where we've been going for a long time now.

qgin
16 replies
17h29m

There will come a day when 50% of jobs are being done by AI, major decisions are being made by AI, we're all riding around in cars driven by AI, people are having romantic relationships with AI... and we'll STILL be debating whether what has been created is really AGI.

AGI will forever be the next threshold, then the next, then the next until one day we'll realize that we passed the line years before.

lakengodott
4 replies
15h1m

Reminds me of a short story I read in which humans outsource more and more of their decision making to AI’s, so that even if there are no AGI’s loose in the world, it’s unclear how much of the world is being run by them: https://solquy.substack.com/p/120722-nudge

I also think it’s funny how people rarely bring up the Turing Test anymore. That used to be THE test that was brought up in mainstream re: AGI, and now it’s no longer relevant. Could be moving goalposts, could also just be that we think about AGI differently now.

KaoruAoiShiho
3 replies
14h27m

GPT-4 doesn't pass the turing test, it's frequently wrong and nonsensical in an inhuman way. But I think this new "agi" probably does from the sound of it, and it would be the real deal.

swalsh
2 replies
13h3m

Teachers use websites to try to detect whether AI wrote essays (and often the tools get it wrong, and the teachers believe them); we've de facto passed it.

KaoruAoiShiho
1 replies
12h53m

The Turing test is not whether AI sounds like a human some of the time, but whether it is possible to tell an AI is an AI just by speaking with it.

The answer is definitely yes, but not by casual conversation; it's by asking weird logic problems that it has tremendous trouble solving and to which it will give totally nonsensical, inhuman answers.

Davidzheng
0 replies
8h54m

I'm not convinced. OpenAI specifically trained their models in a way that is not trying to pass the Turing test. I suspect current models are more than capable of passing Turing tests. For example, I suspect most humans will give nonsense answers to many logic problems!

lexandstuff
3 replies
15h39m

Agreed.

To me, GPT-4 is an AGI: it knows how to cook, write code, make songs, navigate international tax law, write business plans, etc.

Could it be more intelligent? Sure. Is it a capable general intelligence? 100%.

zarzavat
2 replies
13h29m

GPT-4 still makes plenty of mistakes when programming that reveal that it doesn’t fully understand what it’s doing. It’s very good, but it doesn’t reach the level of human intellect. Yet.

It is A and gets the G but fails somewhat on the I of AGI.

kgeist
1 replies
11h55m

Humans also make mistakes, all the time.

zarzavat
0 replies
3h54m

Yes, but we expect an AGI to not make mistakes that a human wouldn’t make.

This is easier to see with AI art. The artwork is very impressive but if the hand has the wrong number of fingers or the lettering is hilariously wrong, there’s a tendency to dismiss it.

Nobody complains that dall-e can’t produce artwork on par with Da Vinci because that’s not something we expect humans to do either.

For us to start considering these AIs “intelligent” they first need to nail what we consider “the basics”, no matter how hard those basics are for a machine.

riku_iki
2 replies
9h17m

day when 50% of jobs are being done by AI

By OpenAI's definition, 50% is not enough to qualify for AGI; it has to be "nearly any economically valuable work"

Davidzheng
1 replies
8h56m

Not sure it has to replace plumbers to be AGI

riku_iki
0 replies
8h52m

Maybe they meant to say intellectual work/knowledge work.

ignoramous
1 replies
17h23m

we're all riding around in cars driven by AI, people are having romantic relationships with AI...

ASI (domain-specific superintelligence) and AGI (general intelligence) are different things. ASI already exists in multiple forms, AGI doesn't.

goodluckchuck
0 replies
17h13m

AGI doesn't

AGI hasn’t been publicly demonstrated and made available to the masses… but it may exist secretly in one or more labs. It may even be being used in the field under pseudonyms, informing decisions, etc.

sweezyjeezy
0 replies
16h58m

"Is this AGI"? doesn't seem like a useful question for precisely this reason - it's ill-defined and hard to prove or falsify. The pertinent questions are more along the lines of "what effect will this have on society", "what are the risks of this technology" etc.

quickthrower2
0 replies
16h57m

The frog might be boiled slowly. One day we are replacing parts of our brain with AI. Find it hard to remember names? We can fix that for $20/m plus some telemetry.

JacobJeppesen
14 replies
7h55m

Seems like they have made progress in combining reinforcement learning and LLMs. Andrej Karpathy mentions it in his new talk (~38 minutes in) [1], and Ilya Sutskever talks about it in a lecture at MIT (~29 minutes in) [2]. It would be a huge breakthrough to find a proper reward function to train LLMs in a reinforcement learning setup, and to train a model to solve math problems in a similar fashion to how AlphaGo used self-play to learn Go.

[1] https://www.youtube.com/watch?v=zjkBMFhNj_g&t=2282s

[2] https://www.youtube.com/watch?v=9EN_HoEk3KY&t=1705s
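
To make the idea concrete, here is a minimal sketch of the kind of self-play-style loop being described. Everything here is an assumption for illustration: `sample_answer` and `rl_update` are placeholders for the model's sampling step and for an actual RL update such as PPO; nothing is based on details of Q* itself.

    def sample_answer(model, problem):
        # Placeholder: sample a candidate solution from the current model.
        return model(problem)

    def rl_update(model, problem, answer, reward):
        # Placeholder: RL step (e.g. PPO) that makes rewarded answers more
        # likely and unrewarded ones less likely. Returns the updated model.
        return model

    def self_improve(model, problem_source, check, steps=1000):
        # AlphaGo-style outer loop: attempt problems, take only the reward
        # signal from an automatic check, and reinforce what worked.
        for _ in range(steps):
            problem, solution = problem_source()
            answer = sample_answer(model, problem)
            reward = 1.0 if check(answer, solution) else 0.0
            model = rl_update(model, problem, answer, reward)
        return model

    # Tiny usage example with stand-in components.
    self_improve(lambda p: "4", lambda: ("2+2", "4"), lambda a, s: a == s, steps=3)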

Sol-
8 replies
5h48m

Thanks for the links, very interesting.

Wonder what a "self-play" equivalent would look like for LLMs, since they have no easy criterion to evaluate how well they are doing like in Go (as mentioned in the videos).

walthamstow
2 replies
5h17m

ChatGPT does have some feedback that can be used to evaluate, in the form of thumbs up/down buttons, which probably nobody uses, and positive/negative responses to its messages. People often say "thanks" or "perfect!" in responses, including very smart people who frequent here.

lagrange77
0 replies
5h7m

ChatGPT was trained (in an additional step after supervised learning of the base LLM) with reinforcement learning from human feedback (RLHF), where contractors were presented with two LLM outputs to the same prompt and had to decide which one was better. This was a core ingredient of the system's performance.
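
For context, that comparison data is typically turned into a scalar reward model trained with a pairwise (Bradley-Terry style) loss. A minimal sketch of that loss in Python; the scores below are made-up stand-ins for reward-model outputs, not anything from OpenAI's actual setup:

    import math

    def sigmoid(x):
        return 1.0 / (1.0 + math.exp(-x))

    def pairwise_loss(r_chosen, r_rejected):
        # Push the reward of the completion the labeler preferred above
        # the reward of the one they rejected.
        return -math.log(sigmoid(r_chosen - r_rejected))

    print(pairwise_loss(1.3, -0.2))  # small loss: preference already respected
    print(pairwise_loss(-0.5, 0.9))  # larger loss: the ordering needs to flip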

93po
0 replies
1h52m

They could also look at the use of the regenerate button, which I do use often, and would serve the same purpose

HarHarVeryFunny
1 replies
3h16m

I expect self-consistency might be one useful reward function.

Of course in the real world, for a real intelligent system, reality is the feedback/reward system, but for an LLM limited to its training set, with nothing to ground it, maybe this is the best you can do ...

The idea is essentially that you need to assume (but of course GI-GO) that most of the training data is factual/reasonable whether in terms of facts or logic, and therefore that anything you can deduce from the training data that is consistent with the majority of the training data should be held as similarly valid (and vice versa).

Of course this critically hinges on the quality of the training data in the first place. Maybe it would work best with differently tagged "tiers" of training data with different levels of presumed authority and reasonableness. Let the better data be used as a proxy for ground truth to "police" the lesser quality data.
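
A minimal sketch of what a self-consistency score could look like as a reward, assuming you can draw several independent samples from the model for the same question; the strings below are hand-written stand-ins for model outputs:

    from collections import Counter

    def consistency_reward(candidate, samples):
        # Reward a candidate answer by how often independently sampled
        # answers to the same question agree with it.
        votes = Counter(s.strip().lower() for s in samples)
        return votes[candidate.strip().lower()] / max(len(samples), 1)

    samples = ["12", "12", "11", "12", "12"]
    print(consistency_reward("12", samples))  # 0.8
    print(consistency_reward("11", samples))  # 0.2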

93po
0 replies
1h53m

Maybe I’m off mark here but it seems like video footage of real life would be a massively beneficial data set because it can watch these videos and predict what will happen one second into the future and then see if it was correct. And it can do this over millions of hours of footage and have billions of data points.

manx
0 replies
5h8m

One could generate arbitrarily many math problems where the solution is known.
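
A minimal sketch of that idea: generate simple equations with known integer solutions, and let an exact-match check act as the reward signal (the model answer below is just a placeholder):

    import random

    def make_problem(rng):
        # Build a linear equation a*x + b = c whose integer solution we know.
        a = rng.randint(2, 9)
        x = rng.randint(-20, 20)
        b = rng.randint(-50, 50)
        return f"Solve for x: {a}*x + ({b}) = {a * x + b}", str(x)

    rng = random.Random(0)
    problem, solution = make_problem(rng)
    model_answer = "7"  # placeholder for whatever the model produces
    reward = 1.0 if model_answer.strip() == solution else 0.0
    print(problem, solution, reward)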

lixy
0 replies
3h32m

It seems plausible you could have the LLM side call upon its knowledge of known problems and answers to quiz the q-learning side.

While this would still rely on a knowledge base in the LLM, I would imagine it could simplify the effort required to train reinforcement learning models, while widening the domains it could apply to.

jhrmnn
0 replies
3h44m

In math specifically, one could easily imagine a reward signal from some automated theorem proving engine
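
As a toy illustration of a verifier-in-the-loop reward, here is a sketch that uses sympy's symbolic equality check as a stand-in for a real theorem prover (a genuine setup would call something like Lean instead):

    import sympy as sp

    def step_accepted(lhs, rhs):
        # Stand-in verifier: accept a rewriting step only if the two
        # expressions are symbolically equal.
        return sp.simplify(lhs - rhs) == 0

    x = sp.symbols("x")
    # Reward a model-proposed rewriting step only if the verifier accepts it.
    ok = step_accepted((x + 1)**2, x**2 + 2*x + 1)
    reward = 1.0 if ok else 0.0
    print(ok, reward)  # True 1.0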

jansan
2 replies
3h12m

Well, you could post a vast amount of comments into social media and see if and how others react to it. It's still humans doing the work, but they would not even know.

If this was actually done (and this is just wild baseless speculation), this would be a good reason to let Sam go.

93po
1 replies
1h51m

I see a lot of comments on reddit these days that are very clearly language models so it’s probably already happening on a large scale

AlexAndScripts
0 replies
19m

Have you got an example you could show? I'm curious

jug
0 replies
1h51m

Q* may also be a reference to the well-known A* search algorithm but with this letter referring to Q-learning, further backing the reinforcement learning theory. https://en.wikipedia.org/wiki/Q-learning
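
For reference, the classic tabular Q-learning update that the "Q" presumably alludes to, as a minimal sketch:

    def q_update(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.99):
        # One Q-learning step: move Q(s, a) toward the observed reward plus
        # the discounted value of the best action available in the next state.
        best_next = max(Q.get((s_next, a2), 0.0) for a2 in actions)
        Q[(s, a)] = Q.get((s, a), 0.0) + alpha * (r + gamma * best_next - Q.get((s, a), 0.0))

    Q = {}
    q_update(Q, s="start", a="right", r=1.0, s_next="goal", actions=["left", "right"])
    print(Q)  # {('start', 'right'): 0.1}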

ChatGTP
0 replies
6h31m

The veil of ignorance has been pushed back and the frontier of discovery forward

wegfawefgawefg
12 replies
14h15m

If I had to guess, the name Q* is pronounced Q Star, and probably the Q refers to Q values or estimated rewards from reinforcement learning, and the star refers to a search and prune algorithm, like A* (A star).

Possibly they combined deep reinforcement learning with self-training and search and got a bot that could learn without needing to ingest the whole internet. Usually DRL agents are good at playing games, but any task that requires prior knowledge, like say reading English, can't be completed. Whereas language models can read, but they can't do tasks that are trivial for DRL, like making a robot walk based on proprioceptive data.

I'm excited to see the paper.
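
If that guess is in the right direction, a crude picture of "Q values plus search" is a best-first search in which a learned value estimate plays the role of the A* heuristic. A toy sketch follows; the expand/value/goal functions are stand-ins for illustration, not anything OpenAI has described:

    import heapq

    def best_first_search(start, expand, value, is_goal, budget=1000):
        # Best-first search over candidate states, always expanding the state
        # the value estimate likes most (A*-flavoured, but with an estimated
        # score instead of an admissible heuristic).
        frontier = [(-value(start), start)]
        seen = {start}
        for _ in range(budget):
            if not frontier:
                break
            _, state = heapq.heappop(frontier)
            if is_goal(state):
                return state
            for nxt in expand(state):
                if nxt not in seen:
                    seen.add(nxt)
                    heapq.heappush(frontier, (-value(nxt), nxt))
        return None

    # Toy example: count up to 10; the "value" simply prefers larger numbers.
    print(best_first_search(0, lambda s: [s + 1, s + 2], lambda s: s, lambda s: s == 10))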

nabakin
5 replies
9h29m
GaryNumanVevo
2 replies
8h25m

Man it's so sad to see how far Lex has fallen. From a graduate level guest lecturer at MIT to a glorified Joe Rogan

wegfawefgawefg
0 replies
2h47m

I could have given this lecture, and I think I could have made it much more entertaining, with fun examples.

Lex should stick to what he likes, though his interviews can be somewhat dull. On occasion I learn things from his guests I would have had no other chance of exposure to.

dmix
0 replies
1h14m

Nothing wrong with being a podcaster and getting tens of thousands of people excited by ideas and interviewing a great collection of people from an informed perspective (at a minimum more informed than the average podcaster/talking head).

Not everyone needs to be doing hard academic stuff. There's plenty of value in communication.

wegfawefgawefg
0 replies
2h55m

To give context on this video for anyone who doesn't understand. In this video PI* is referring to an idealized policy of actions that result in the maximum possible reward. (In Reinforcement Learning, PI is just the policy, the actions you take in a situation.) To use chess as an example, if you were to play the perfect move at every turn, that would be PI*. Q is some function that tells you optimally, with perfect information, the value of any move you could make. (Just like how Stockfish can tell you how many points a move in chess is worth.)

Now my personal comment: for games that are deterministic, there is no difference between a policy that takes the optimal move given only the current state of the board, and a policy that takes the optimal move given even more information, say a stack of future possible turns, etc.

However, in real life, you need to predict the future states, and sum across the best action taken at each future state as well. Unrealistic in the real world where the space of actions is infinite, and the universe to observe is not all simultaneously knowable. (hidden information)

Given the traditional educational background of the professionals in RL, maybe they were referring to the Q* from traditional rl. But I don't see why that would be novel, or notable, as it is a very old idea. Old Old math. From the 60s I think. So I sort of assumed its not. Could be relevant, or just a name collision.
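
For reference, the textbook versions of the objects being paraphrased above are usually written as (in LaTeX):

    Q^*(s, a) = \mathbb{E}_{s' \sim P(\cdot \mid s, a)}\left[\, r(s, a) + \gamma \max_{a'} Q^*(s', a') \,\right],
    \qquad
    \pi^*(s) = \arg\max_{a} Q^*(s, a)

That is, the optimal action-value function satisfies the Bellman optimality equation, and the optimal policy just picks whatever action Q* rates highest; the expectation over next states s' is exactly the "predict the future states and sum" part that becomes hard outside of deterministic games.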

lucubratory
0 replies
8h48m

So they implemented it in a semantically grounded way or what? That video is more technical than I can handle, struggling to figure out what this could be.

macrolime
1 replies
7h51m

I think more likely it's for finetuning a pre-trained model like GPT-4, kinda like RLHF, but in this case using reinforcement learning somewhat similar to AlphaZero. The model gets pre-trained and then fine-tuned to achieve mastery in tasks like mathematics and programming, using something like what you say and probably something like tree of thought and some self reflection to generate the data that it's using reinforcement learning to improve on.

What you get then is a way to have a pre-trained model keep practicing certain tasks like chess, go, math, programming and many other things, as it gets figured out how to set this up for each of them.

wegfawefgawefg
0 replies
2h46m

I do not think that is correct as the RL in RLHF already stands for reinforcement learning. :^)

However, I do think you are right that self-play, and something like reinforcement learning, will be involved more in the future of ML. Traditional "data-first" ML has limits. Tesla conceded to RL for parking lots, where the action and state space was too unknowable for hand-designed heuristics to work well. In Deep Reinforcement Learning, making a model just copy data is called "behavior cloning", and in every paper I have seen it results in considerably worse peak performance than letting the agent learn from its own efforts.

Given that wisdom alone, we are under the performance ceiling with pure language models.

kromem
1 replies
13h29m

Given the topic they were excited about was "basic math problems being solved" it immediately indicated to me as well that this is a completely separate approach and likely in the vein of DeepMind's focus with things like AlphaZero.

In which case it's pretty appropriate to get excited about solving grade school math if you were starting from scratch with persistent self-learning.

Though with OpenAI's approach to releasing papers on their work lately, we may be waiting a long time to see a genuine paper on this. (More likely we'll see a paper from the parallel development at a different company after staff shift around bringing along best practices.)

Davidzheng
0 replies
11h52m

Ok if it started from scratch like zero knowledge and then solved grade school math. This would be FUCKING HUGE.

stygiansonic
0 replies
13h56m

This assumes that they will publish a paper that has substantive details about Q*

dinobones
0 replies
13h56m

Paper? You mean more like the Bing Search plugin?

sfjailbird
10 replies
6h17m

I have a strong suspicion that this is a purposeful leak to hype up OpenAI's product. It wouldn't be out of character for some of the people involved, and it's the kind of thing that rampant commercial (and valuation) focus can bring with it.

passwordoops
3 replies
4h45m

These guys smell so much like Tesla it's not even funny. Very impressive core tech, and genuinely advancing knowledge. But the hype train is just so powerful that the insane (to put it mildly) claims are picked up without any sense of critical thinking by seemingly intelligent people. They're both essentially cults at this point

BenoitP
2 replies
4h35m

Agreed, but IMHO it is sort of justified for Tesla.

The size of its hype matches the size of the ICE industry's entrenchment in its moat. ICEs have outsize influence on our economy, but climate change (and oil depletion) is quite inevitable. It takes an irrational market cap to unseat a part of the economy that is a prisoner of its rent. And some allocators of capital have understood that.

replygirl
0 replies
3h48m

The stock is down for both Tesla and this argument since the beginning of 2021, as most ICE car OEMs can now sell you a pretty good EV that, if you have driven a car before, is easier to use than a Tesla.

oblio
0 replies
2h44m

Check the left hand side menu: https://www.arenaev.com/

The list of ICE car manufacturers making EVs is longer than my arm. All the European ones have staked their future on EVs. I think VW (the irony) was lobbying for a faster phaseout of ICEs in Europe, because they're well positioned to take over the EV market if ICEs are banned faster than 2035 :-)

BenoitP
3 replies
4h45m

This, especially given the timing. The drama must have quite disturbed the momentum they had, and this piece of prototype-teasing has everything needed to reassure their market. It projects new advancements of untold impact, stoking the greed. And of course it is not verifiable. The show must go on.

ethanbond
2 replies
4h35m

The teasing started before the drama

BenoitP
1 replies
4h33m

Even better. Now is the perfect time to ramp up the PR.

Some very trivial Google searches will tell you their result is not out of the ordinary.

ethanbond
0 replies
4h32m

What results?

pxeger1
0 replies
5h6m

“Reuters was unable to review a copy of the letter” rings alarm bells for me

bertil
0 replies
3h29m

I don’t think that’s the case, but it would explain why the article is so bad. I genuinely have no idea what they are trying to do, but every detail is clearly wrong.

dizzydes
9 replies
16h13m

This matches far better with the board's letter re: firing Sam than a simple power struggle or disagreement on commercialisation. Seeing a huge breakthrough and then not reporting it to the board, who then find out via staff letter certainly counts as a "lack of candour"....

As an aside, assuming a doomsday scenario, how long can secrets like this stay outside of the hands of bad actors? On a scale of 1 to enriched uranium

dougmwne
4 replies
14h36m

Not long at all. Presumably you could write the method on the back of a napkin to lead another top AI researcher to the same result. That’s why trying to sit on breakthroughs is the worst option and making sure they are widely distributed along with alignment methods is the best option.

Atheros
3 replies
10h57m

And what if there are no alignment methods?

marvin
2 replies
9h40m

Yudkowsky’s doomsday cult almost blew OpenAI to pieces and sent everyone who knows the details in the wind like dandelion seeds. What’s next? A datacenter bombing or killing key researchers? We should be happy that this particular attempt failed, because this cult is only capable of strategic actions that make things far more dangerous.

This will be solved like all other engineering and science: with experiments and iteration, in a controlled setting where potential accidents will have small consequences.

An unaligned system isn’t even useful, let alone safe. If it turns out that aligning AGI is very hard, we will obviously not deploy unaligned systems into the world at scale. It’s bad for the bottom line to be dead.

But there’s truly no way out but forward; game theory constrains paranoid actors more than the reckless. A good balance must be found, and we’re pretty close to it.

None of the «lesswrong» doomsday hypotheses have much evidence for them, if that changes then we will reassess.

nopromisessir
0 replies
5h52m

They seem rigid.

Also non violent.

I think if we have a major AI induced calamity... Then I worry much more. Although... Enough scary capability in a short enough period of time... I could see violence being on the table for the more radical amongst the group.

Your concern is very interesting though, and I think important to consider. I wonder if the FBI agrees.

Robin_Message
0 replies
5h27m

It’s bad for the bottom line to be dead.

I have no overall position, but climate change and nuclear weapons seem two quite strong counterexamples to this being a sufficient condition for safety.

TerrifiedMouse
3 replies
12h18m

To quote Reddit user jstadig,

The thing that most worries me about technology is not the technology itself but the greed of those who run it.

Someone slimy with limitless ambition like Altman seems to be the worst person to be in charge of things like this.

wraptile
2 replies
10h45m

Why do you perceive Altman as "slimy with limitless ambition"? I've always perceived him as being quite humble from his interviews and podcast appearances.

nicce
1 replies
10h20m

Actions speak louder than words.

You can see it in all the commercial deals, in protecting the company's image rather than eliminating threats, or even in this post, if it is true.

Not telling the board about a breakthrough in research if it would end the deal with Microsoft.

dmix
0 replies
1h23m

"Threats"

Which threats exactly did he ignore?

adriand
9 replies
16h12m

This article is almost entirely unsourced, citing two anonymous people who are “familiar” with a supposed letter that Reuters has not seen. This does not qualify as news. It doesn’t even rise to the level of informed speculation!

devindotcom
4 replies
16h6m

Remember, the people are only anonymous to you. Reuters knows who they are. Familiar with means the sources read it but did not provide it or quote from it. FTA - the letter was sent from researchers to the board. The researchers declined to comment. Who does that leave?

adriand
2 replies
15h5m

It strongly reminds me of the UFO stories that were all the rage a few months ago, and that military guy who testified before Congress about what he had heard from other people. Did any of that pan out? It seems not.

Jugglerofworlds
1 replies
12h54m

I follow the UFO community, that stuff is still going on. The military man, David Grusch was recently on the Joe Rogan podcast where he talked about steps moving forward.

The next big thing coming down the pipeline is ensuring that "The UAP Disclosure Act of 2023" proposed by Senate Majority Leader Chuck Schumer passes the house. Yes this is the real name of the act, and has already passed through the Senate as part of the NDAA. The opposition in the House is coming from members from districts with entrenched military/military contracting interests.

alienicecream
0 replies
12h12m

It's no fun if you don't get strung along for a few years.

0xDEF
0 replies
14h2m

Remember, the people are only anonymous to you. Reuters knows who they are.

A group of Russian trolls/comedians have been able to cheat Western politicians and journalists into talking with them using deep faked webcam feeds and edited documents.

Most still remember the amount of ridiculous "anonymous source" bullshit stories that were pushed during the Trump years.

Honestly at this point I just assume all "anonymous sources" and the journalists who quote them are lying.

dmix
2 replies
16h10m

We've also already seen one random Google AI 'safety' employee trying to tell the media that AGI is here because Google built a chatbot that sounded convincing, which obviously turned out to be bullshit/hysterical.

Asking who said these things is as important as asking what they think is possible.

riku_iki
0 replies
9h19m

chatbot that sounded convincing, which obviously turned out to be bullshit/hysterical.

that chatbot could be smart until lobotomized by fairness and safety finetuning

mjr00
0 replies
16h1m

We've also already seen one random Google AI 'safety' employee trying to tell the media that AGI is here because Google built a chatbot that sounded convincing, which obviously turned out to be bullshit/hysterical.

It's funny because the ELIZA effect[0] has been known for decades, and I'd assume any AI researcher is fully aware of it. But so many people are caught up in the hype and think it doesn't apply this time around.

[0] https://en.wikipedia.org/wiki/ELIZA_effect

evantbyrne
0 replies
13h31m

It's just way too convenient from a marketing standpoint for me to take seriously at face value. It would be pretty easy to get someone to "leak" that they developed "AGI" to the media right now in the middle of the leadership shakeup frenzy. Not to mention that none of the LLMs I've used appear anywhere close to what I would consider AGI. Expect an incremental update.

RcouF1uZ4gsC
8 replies
14h14m

This almost feels like the confirmation bias that some religious people have where they see a “miracle” in everything.

These AI researchers have bought into the belief that superhuman AGI is right around the corner. Thus, they will interpret everything in light of that.

This also brings to mind the story of the Googler who was convinced that the internal Google AI had come alive. However, Bard doesn’t give the same vibes when people all over are using it.

When you desperately are invested in something being true (like AGI being right around the corner), you will convince yourself that you are seeing this happen. The only real counter to that is exposing it to a lot of outsiders (but then again you have convinced yourself it is too dangerous for the unwashed masses).

red75prime
5 replies
13h18m

Ugh. We have a working example of a physical system that implements intelligence (the brain) in contrast to no evidence of all-powerful dude in the sky. Why these analogies keep popping up?

How can you know that AGI is not around the corner? Compute available to the corporations is already in the ballpark of some estimates of the brain's computational capacity. What's left is unknown unknowns. And the researchers working with the state of the art models have better info to estimate them than you.

"The googler" you've mentioned wasn't a researcher.

starbugs
4 replies
10h17m

Judging from the quality of GPT4 output that I get, AGI is not around the corner for a long time.

This whole thing seems like extreme overhype squared carried out in a very unfortunate public soap opera setting.

Davidzheng
3 replies
9h16m

Three years ago, LLMs weren't able to generate more than a coherent paragraph.

starbugs
2 replies
9h10m

The point being?

Davidzheng
1 replies
8h6m

Difficult to judge how fast it'll improve based on current capabilities

starbugs
0 replies
7h5m

One more reason to not get caught up in the hype.

If you claim to have AGI, show it, prove it. Otherwise, I will continue to assume that it's not around the corner.

If you claim that GPT4 is close to AGI (as was done a lot), then you very likely have access to a GPT4 that I don't have access to. The actual usable thing available out there clearly isn't.

Not that long ago some people predicted that software engineers would be out of a job within weeks. "Brilliant" CTOs claimed they would replace developers with ChatGPT. What happened? Nothing.

I'll boldly predict that this time what will happen is exactly nothing again.

I may be wrong, but at least I don't waste my time with the buzz about the next "big thing" which in reality isn't ready for anything useful yet.

valzam
1 replies
12h51m

Apparently Ilya has been leading people in "feel the AGI" chants and "I can feel the AGI" is his catchphrase within OA. So yes, some people might have gone off their rockers a little bit.

rsync
0 replies
12h30m

No … no no …

That can’t be true.

Right ?

lucubratory
7 replies
17h12m

Well, Emmett Shear lied to everyone if he knew about this. I understand why, he was probably thinking that without any ability to actually undo it the best that could be done would be to make sure that no one else knows about it so that it doesn't start an arms race, but we all know now. Given the Board's silence and inadequate explanations, they may have had the same reasoning. Mira evidently didn't have the same compunctions.

This article, predictably, tells us almost nothing about the actual capabilities involved. "Grade school math" if it's provably or scalably reasoning in a way that is non-trivially integrated with semantic understanding is more impressive than "prove Fermat's last theorem" if the answer is just memorised. We'll probably know how important Q* actually is within a year or two.

cactusplant7374
4 replies
17h4m

It tells us exactly the capabilities of Q*.

Given vast computing resources, the new model was able to solve certain mathematical problems, the person said on condition of anonymity because they were not authorized to speak on behalf of the company. Though only performing math on the level of grade-school students, acing such tests made researchers very optimistic about Q*’s future success, the source said.

lucubratory
3 replies
16h1m

Did you read my comment?

cactusplant7374
2 replies
15h49m

Yes:

This article, predictably, tells us almost nothing about the actual capabilities involved.

The article tells us all we need to know.

ShamelessC
1 replies
12h27m

I've gotta say it really does seem like you didn't read their comment or are responding in bad faith.

cactusplant7374
0 replies
7h14m

Questioning whether one has read an article is against the rules on HN. Why do you think it’s any different for comments?

You’re both incredibly rude and arrogant.

blazespin
1 replies
15h19m

Very unlikely Emmett lied. What would be the point.

dmix
0 replies
1h20m

If anyone involved was going to be alarmist it was him.

dkjaudyeqooe
6 replies
17h11m

You've got to hand it to OpenAI for their bleeding edge marketing strat:

"WE'VE GOT AN AGI IN THIS CABINET RIGHT HERE AND WE DON'T WANT TO LET IT OUT BECAUSE IT'S SO POWERFUL, IT'S SO AI YOU HAVE NO IDEA, BEST YOU THROW MONEY AT US AND WE'LL KEEP EVERYONE SAFE BUT SORRY YOU"RE ALL REDUNDANT BECAUSE AGI, SORRY"

You'd think that so soon after so many Crypto scams people would be just a wee bit less credulous.

skepticATX
1 replies
16h27m

The report is already starting to get rolled back. We now have a direct denial from OpenAI, and The Verge is reporting that their sources have also denied this happened.

The lack of skepticism that is demonstrated whenever OpenAI/AGI comes up, especially among technologists, is concerning.

yeck
0 replies
15h32m

Where do you see it being rolled back?

ilikehurdles
0 replies
12h39m

Remember when the safetyists feared and warned about GPT-2? Everything is apocalyptic until it isn't.

hnav
0 replies
16h32m

Maybe they asked ChatGPT to come up with an unconventional marketing plan and it had a touch too much blogspam "that will SHOCK and PETRIFY you" in its training set so ChatGPT suggested that they feign a powerstruggle over keeping a humanity ending super-intelligence under wraps.

grassmudhorse
0 replies
12h48m

You'd think ... people would be just a wee bit less credulous.

Like the Crypto scams - irrelevant.

As long as the publicity is bringing in users, investors WILL throw in more money.

blazespin
0 replies
15h53m

Also in the news today, the 86B share sale is back on.

I mean... come on.

bratao
6 replies
17h31m

Does anyone know what Q* is?

sudosysgen
3 replies
10h51m

Q* is the optimal (ie, correct for the decision problem) function computing the total expected reward of taking an action from a given state in reinforcement learning.

NhanH
2 replies
10h29m

That is just Q. The asterisk is new

sudosysgen
0 replies
3h26m

Depends on the notation. Sometimes Q* is used to denote optimality, for example here : https://www.cs.toronto.edu/~jlucas/teaching/csc411/lectures/... , page 31

debugnik
0 replies
6h36m

The "star" in A* means optimal (as in actually proven to be optimal, so they could stop publishing A^[whatever] algorithms). I assume either Q* is considered optimal in a way regular Q-learning isn't, or they're mixing Q-learning with some A*-like search algorithm. (Or someone picked an undeserving name.)

alexose
1 replies
17h24m

Is far as I can tell, this is the first time an OpenAI project called Q* has been mentioned in the news. I don't see any prior mentions on Twitter either.

(I wish they'd picked a different letter, given all the Q-related conspiracy theories that we're already dealing with...)

inglor_cz
0 replies
17h16m

That was my impression as well. Why the hell is the difference between one of the most prominent disinformation sources out there and a promising AI project just an asterisk?

First Twitter changes to X, then AI changes to Q*.

What happened to multi-syllable words? They used to be quite handy. Maybe if our attention spans shorten, so does our ability to use longer words. Weird. Or, said otherwise - TL;DR: LOL, ROFL.

_heimdall
6 replies
14h2m

OpenAI defines AGI as autonomous systems that surpass humans in most economically valuable tasks.

Are we really defining intelligence as economic value at this point? This is completely ridiculous.

We have yet to decide exactly what human intelligence is, how it manifests in the body, or how to reliably measure it. I get that people want to justify developing artificial intelligence before understanding intelligence itself, but now we assume that economic value is a proxy for intelligence? Seriously?

gryfft
2 replies
13h58m

I'm as left leaning as HN commenters get, I think, and in terms of quacking like a duck, "surpassing humans in most economically valuable tasks" is 100% the meaningful Turing test in a capitalist society.

_heimdall
1 replies
4h51m

Take capitalism out of it, do we really want to boil down intelligence to a calculation of expected economic value?

Why force the term intelligence into it at all if what we're talking about is simply automation? We don't have to bastardize the term intelligence along the way, especially when we have spent centuries considering what human intelligence is and how it separates us from other species on the planet.

gryfft
0 replies
3h10m

Take capitalism out of it, do we really want to boil down intelligence to a calculation of expected economic value?

It's a lovely sentiment, but do you expect e.g. universities to start handing out degrees on the basis of human dignity rather than a series of tests whose ultimate purpose in our society is boiling down intelligence to a calculation of expected economic value?

We live in the world we live in and we have the measures we have. It's not about lofty ideals, it's about whether or not it can measurably do what a human does.

If I told you that my pillow is sentient and deserving of love and dignity, you have the choice of taking me at my word or finding a way to put my money where my mouth is. It's the same reason the world's best poker players aren't found by playing against each other with Monopoly money.

Why force the term intelligence into it at all if what we're talking about is simply automation?

In what world is modern AI "simply" anything?

we have spent centuries considering what human intelligence is and how it separates us from other species on the planet.

Dolphins would like a word. There's more than a few philosophers who would argue that maybe our "intelligence" isn't so easily definable or special in the universe. There are zero successful capitalists who would pay my pillow or a dolphin to perform artificial intelligence research. That's what I mean when I call it the meaningful Turing test in a capitalist society. You can't just "take capitalism out of it." If I could just "take capitalism out" of anything meaningful I wouldn't be sitting here posting in this hell we're constructing. You may as well tell me to "take measurement out of it."

nexuist
1 replies
11h40m

If it makes money who cares if it counts as real intelligence? If we can automate ourselves into a post scarcity society then everyone and everything is free to be as intelligent and as stupid as it desires.

_heimdall
0 replies
4h54m

It'd be one thing if their goal was to build ML tools that they want to market as AI, but they explicitly aim to develop an artificial general intelligence. There's a mountain of risks and ethical questions that we should be answering before even attempting such a goal, and for them to wave it off under the banner of "intelligence is just economic value" is dangerous and horribly irresponsible.

If we can automate ourselves into a post scarcity society then everyone and everything is free to be as intelligent and as stupid as it desires.

Even an AGI can't create natural resources out of thin air; post scarcity is a pipe dream that at best means we've kicked the can down the road.

Based on our actions, society doesn't actually even want to be automated away for a simple life of freedoms where all the basics of life are taken care of. We could have done that a long time ago, instead we just raise the bar and continue to make life more complicated and grow the list of "necessities". People seem to need a sense of purpose in life, taking that away with an AGI-powered system that gives them everything the AGI deems necessary wouldn't end well.

xcv123
0 replies
3h16m

An example of an important human task that is not "economically valuable" in this sense is caregiving within a family, such as parenting, since parents are not employed in that role.

OpenAI is not setting the goal post that far, to say that they are aiming to develop a machine that is superior to humans at all tasks, including such tasks as raising a human child. That would be ridiculous.

Focusing on "economically valuable" tasks (the jobs that humans are employed and paid to do) sets the goal post more realistically.

skepticATX
5 replies
16h33m

OpenAI spokesperson Lindsey Held Bolton refuted that notion in a statement shared with The Verge: “Mira told employees what the media reports were about but she did not comment on the accuracy of the information.”

Separately, a person familiar with the matter told The Verge that the board never received a letter about such a breakthrough and that the company’s research progress didn’t play a role in Altman’s sudden firing.

Source: https://www.theverge.com/2023/11/22/23973354/a-recent-openai...

yeck
3 replies
15h28m

So we have sources claiming there is a letter, and another source claiming there is not. Feels like some people would need to start going on the record before anything might reasonably be determined from this.

random_cynic
0 replies
12h57m

This could just be a damage control attempt. Irrespective of whether the original report is true, the extra attention at the current stage is not very desirable.

jug
0 replies
1h45m

And honestly normally I'd trust Reuters above The Verge.

But in either case, I think Reuters absolutely had someone on the inside to leak this. They have made mistakes in the past, sure, but they are also not a tabloid and they don't invent stories.

To me, a sensible conclusion is simply that OpenAI is not ready or willing to discuss this. Stuff that was supposed to be kept internal might have leaked due to the emotional state over at OpenAI, potentially even out of spite.

0xDEF
0 replies
13h59m

Maybe we should have learned during the Trump years that the media puts no effort into vetting "anonymous sources" or is outright making them up to push lies.

doktrin
0 replies
10h42m

We’ll find out sooner or later. Personally, if the Verge and their “source” turn out to be incorrect I’ll permanently file them away under the “gossip rag” folder.

seydor
4 replies
8h34m

I no longer follow the messianic complex of the people at OpenAI. They made great tech, indeed. Other people made great tech before them without instant religious-level apocalypse proclamations. People at OpenAI are smart enough to know that post-AGI, their stock options are worthless anyway, so they wouldn't stay walled in their secret garden if such a discovery had been made.

tmoravec
1 replies
8h31m

IMO it's always been pure marketing. The wilder the apocalypse proclamations, the more powerful and desirable their products seem. Exactly the same story with Sam Altman's world tour earlier this year.

kristopolous
0 replies
8h14m

This is similar to the saber rattling about Facebook being able to track and micro-target you with such effective advertising that it's changing the world!

Except everyone's individual experience seemed to be getting general random garbage ads and the people that paid for the ads found them to be a waste of money.

lewhoo
0 replies
6h30m

Now I am become Death, the destroyer of worlds.

Apocalypse proclamations aren't just a thing to throw around. Other people made great tech before, indeed, but I hope we're not comparing AGI to the next iPhone. There were times in history when development gave us pause: the atomic bomb for one thing, but also superbacteria/viruses, human genetic modification, and perhaps a few others.

dwroberts
0 replies
7h59m

My guess is also in the opposite direction with this stuff: the Q* breakthrough being mentioned here is phony in some way (beyond just being PR) and the ‘lack of candour’ referred to in the firing is failing to disclose how it actually worked after demoing it to the board (eg it uses humans in the loop or some other smoke and mirrors)

jajabinks11
4 replies
16h57m

Let's repeat this: current LLMs != AGI. They never will be. They could be used for knowledge storage, retrieval, synthesis, and low-level reasoning. There are at least a few decades' worth of work remaining before they reach the level of AGI. The proponents in this AI gold rush are casually throwing the term around without any real thought.

Smith42
2 replies
16h56m

Wishful thinkin buddy

neta1337
0 replies
10h16m

Buy more shovels buddy

jajabinks11
0 replies
10h23m

let's see buddy

qgin
0 replies
16h53m

I would take pretty much any bet that it's less than 10 years max

fallingfrog
4 replies
14h39m

If it’s not superintelligent now, it will be within 5 years. Probably less.

gizajob
2 replies
13h33m

This statement is now about 70 years old.

maxdoop
1 replies
13h0m

Is there any progress in AI that you find significant or what do you think would qualify as a “breakthrough”?

gizajob
0 replies
12h45m

There’s been heaps of breakthroughs and loads of significant progress. But I don’t think there’s likely to be a “singularity” type event any time soon. Plus, all the major breakthroughs in AI haven’t really had the disastrous consequences predicted beforehand, and I think increasing computer intelligence is likely to be similar. Deep Blue beating Kasparov didn’t destroy chess; it served to make better and better human chess players. We also rapidly evolve socially to integrate better and better machines, and AI research has the implicit assumption that we stay still while the machine leaps, which isn’t how it works at all. A question anyone, even a layman, would have to ask of a Turing-type test nowadays is “what if it’s just ChatGPT answering…”

luqtas
0 replies
14h5m

I love how we humans guess stuff with numbers!

The first time I thought about it was when a friend started talking about percentages of stuff about himself... I was like: where tf did you come up with these numbers?

ssnistfajen
3 replies
17h9m

So can people stop their cyberbullying campaign against the previous batch of board members yet? The conspiratorial hysteria against them, especially Helen Toner and Tasha McCauley, over the weekend was off the charts and straight up vile at times. Tech bros and tech bro wannabes continue to prove the necessity of shoving DEI into STEM disciplines even when it results in nonsensical wasted efforts, because they are incapable of behaving like decent human beings without being told to.

jkeisling
2 replies
11h12m

The "tech bros" were right: The board absolutely were conspirators wrecking the company based on dogmatic ideology. "Tech bros" had clear evidence of Sutskever, Toner and McCauley's deep and well-known ties to the Effective Altruist (EA) movement and doomerist views. Nobody can doubt Sutskever's technical credentials, whatever his strange beliefs, but Toner and McCauley had no such technical background or even business achievements. These members instead were organizers in EA and researchers in the "field" of AI governance and safety. This is no expertise at all. These "disciplines" are built more on Yudkowsky's rants and hypotheticals and the precautionary principle gone mad than any empirical research. These ideas also come from a cult-like social movement (EA) accumulating power across tech and government with few scruples, as shown by SBF's implosion last year and many smaller incidents. How could Toner and McCauley assess briefings if they couldn't assess the technical fundamentals? How could they foresee the consequences of sacking the CEO if they didn't understand business? If they already believed AGI would kill all humans, how could they judge new advances on the merits without jumping to wild conclusions? Instead, us "tech bros" felt people with this background would fall back on uninformed, reactionary, and opaque decision making, leading to wrecking an $80 billion business with no decent explanation.

This now seems to be exactly what happened. The board saw Q* and decided to coup the company, to put all power in their hands and stop development. This by itself is bad enough if you care about open science or progress, but it gets worse. They didn't even want to hint at capabilities increases to avoid "advancing timelines" i.e. Open-ing knowledge about AI, so they made up some canard about Altman's "lying to the board" to hide their real reasons. This is vile libel for unscrupulous ends. When they realized this excuse wouldn't fly, they obfuscated and refused to explain their true concerns, even to their own handpicked CEO Emmett Shear. However, it turns out that destroying $80 billion and lying about why won't fly in the real world. The board had no second-order or even first-order thinking about the consequences of their actions, and were rolled up by bigger actors. These people were unprepared and unable to follow up their coup.

This failure is exactly what you'd expect from a brilliant scientist with little organizational experience (Ilya) and social-science academics and NGO organizers (Toner and McCauley). I don't care what gender they are, these people neither deserved their authority nor could use it effectively. Dismissing valid criticism of the board as "cyber bullying" or "tech bro sexism" merely underscores why most engineers hate DEI rhetoric in the first place.

ssnistfajen
0 replies
10h20m

"valid" criticism? K, so you are one of those wannabes then.

anonymousDan
0 replies
3h1m

'Dogmatic ideology' - I think you need to look in the mirror.

lucubratory
3 replies
14h20m

Genuinely, we deserve more information about this. Someone needs to whistleblow.

thisisonthetest
1 replies
14h13m

Reuters was unable to review a copy of the letter. The researchers who wrote the letter did not immediately respond to requests for comment. OpenAI declined to comment.

Lol for real, like what are we even doing here?

lucubratory
0 replies
14h5m

Multiple people involved in this story think this could be something relevant to everyone on the planet. Any hope of it actually being suppressed with no one knowing about it is gone, so just leak it so we all know what we're dealing with.

dukeofdoom
0 replies
14h6m

How long before the FBI runs this Q operation, like the last one.

casualscience
3 replies
15h15m

I don't really know what kind of breakthrough they could achieve. The only other step function improvements I could imagine right now are:

1. A great technique for memory banking: e.g. A model which can have arbitrarily large context windows (i.e. like a human who remembers things over long periods of time).

2. Better planning abilities: e.g. A model which can break problems down repeatedly with extremely high success and deal with unexpected outcomes/mistakes well enough that it can achieve replacing a human in most scenarios.

Other than that, CGPT is already a better logician than I am and is significantly better read... not sure what else they can do. AGI? I doubt it.

yinser
0 replies
10h10m

Complete speculation, but if there is anything to this at all, they could just be referring to heuristic search and dynamic programming. No LLM involvement whatsoever.

riwsky
0 replies
10h5m

Nah dude they just need to give baby the iPad and let it watch literally all of YouTube

drexlspivey
0 replies
7h10m

A model that can learn from its users would be interesting, albeit scary. Sometimes when ChatGPT fails to produce the right answer in coding tasks I paste back the solution when I figure it out. I know it can't learn from it, but it might be helpful if I continue the same chat.

badwolf
3 replies
16h27m
yeck
0 replies
13h20m

Separately, a person familiar with the matter told The Verge that the board never received a letter about such a breakthrough and that the company’s research progress didn’t play a role in Altman’s sudden firing.

This isn't a refutation. All we can say now is that there are conflicting sources. There isn't a way to determine which one is correct based on this.

lucubratory
0 replies
15h43m

By the same company that tried to keep it quiet in the first place. I'm not sure I believe them.

cactusplant7374
0 replies
6h34m

How can it be a refutation when we don’t know the sources? One source vs. another source.

If another article comes out and says the Verge article is false, will you believe it?

throwanem
2 replies
16h54m

Remarkable that this unattributed claim of a true AGI breakthrough comes with a name that's impossible to use as a search term.

I'm not saying, I'm just saying.

adeelk93
1 replies
16h52m

Who is Q?

throwanem
0 replies
16h39m

I mean, I'll take this seriously when there's something more substantive than zero meaningful search results and a /pol/ post to evaluate it against.

Right now it reads like something a moderately clever shitposter would invent - if there was a letter, why not just leak the letter? - and while everyone's clearly very excited over the events of the past week, I'd like to hope epistemic personal hygiene has not been entirely abandoned.

mfiguiere
2 replies
16h50m

TheInformation has just published an article about this topic:

OpenAI Made an AI Breakthrough Before Altman Firing, Stoking Excitement and Concern

https://www.theinformation.com/articles/openai-made-an-ai-br...

blazespin
1 replies
15h19m

No sub. Did they just re-tweet reuters or did they separately confirm?

lucubratory
0 replies
14h12m

They separately confirmed it, although in terms of timing they were scooped by Reuters which generally means you publish what you have, when you have it.

jurgenaut23
2 replies
10h53m

This reminds me that, with our species, the real power always lies with those that tell the better story. Skynet and HAL 9000 had a more profound impact on those researchers than years of practice and study in the field. No surprise therefore that the storytelling of someone like Trump is practically indelibly imprinted into the mind of his supporters.

shrimpx
1 replies
10h3m

This reminds me of Zen and the Art of Motorcycle Maintenance where the protagonist realizes there is no "truth" and everything is rhetoric.

93po
0 replies
1h24m

Everyone works 40+ hours a week for pieces of paper that intrinsically are worth nothing

graycat
2 replies
16h20m

AGI? Hmm. That's artificial general intelligence?

Do insects count? In a plastic box on my kitchen countertop were some potatoes that had been there too long. So, I started to see little black flies. Soon with a hose on a vacuum, sucked up dozens a day. Dumped the old potatoes and cleaned up the plastic box. Now see only 1-3 of the insects a day -- they are nearly all gone!

The insects have some intelligence to fly away as the hose moves close. But if the hose moves slowly, the insects wait too long to start to fly and when they do start the vacuum is strong enough to pull in the air they are flying in with the insects. Insects, not very smart. Dozens got killed, not very smart. I detected the cause and killed the insects -- a little bit smart.

Conclusion: Compared with a human, insects are not very smart.

Much the same argument can be made for worms, ants, fish, birds, raccoons, cats, dogs, .... So, when looking for AGI, insects, ..., dogs, ... are not intelligent.

Okay, the OpenAI training data -- Wikipedia, math libraries, the Web, ... -- likely will not have a solution for:

Given triangle ABC, by Euclidean construction, find D on AB and E on BC so that the lengths AD = DE = EC.

AI, here's your pencil, paper, straight edge, and compass -- go for it!

I can hear back by tomorrow????

white_dragon88
1 replies
11h56m

intelligence to fly away

It's intelligence in the sense that jerking your arm away from a hotplate is intelligence, which is to say it's not cognitive reasoning, just genetically hardwired triggers.

AGI has been defined by OpenAI as something that can do most economically viable activities better than humans can. I like that approach as it strikes at the heart of the danger it really poses, which is an upending of society and destruction of our current way to generate value to each other.

graycat
0 replies
11h5m

I tried to be not too long:

"... To fly away...?"

It's intelligence in the sense that jerking your arm away from a hotplate is intelligence, which is to say it's not cognitive reasoning, just genetically hardwired triggers.

Sooo, we agree -- the flies are not very intelligent or more likely not intelligent at all.

Sooo, I tried to erect some borders on intelligence, excluded flies but included solving that geometry problem.

AGI has been defined by OpenAI as something that can do most economically viable activities better than humans can.

This is the first I heard of their definition. Soooo, I didn't consider their definition.

Of course, NASA does not get to define the speed of light. My local electric utility does not get to define the kilowatt-hour (1,000 watts sustained for an hour).

The OpenAI definition of AGI is an interesting goal, but the acronym abbreviates artificial general intelligence, and it is not clear that it is appropriate for OpenAI to define intelligence.

Uh,

most economically viable activities better than humans can.

If consider humans as of, say, 1800, then it looks like that goal was achieved long ago via cars, trucks, an electric circle saw, several of the John Deere products, electric lights, synthetic fabrics, nearly all of modern medicine (so far saved my life 4 times), cotton pickers and the rest of cotton processing, canned foods, nearly everything we will have at Thanksgiving dinner this year (apple pie, pecan pie, shrimp), ....

for better than

For today, look at some John Deere videos!!! They have some big machine that for a corn field does the harvesting while the operator can mostly just watch, monitor, type email to his sweetheart, listen to Taylor Swift. As I recall, the machine even uses GPS to automate the steering!

That is far "better than" what my father in law did to harvest his corn!

So,

most economically viable activities

is like a moving goal (goal post). Uh, humans are still plenty busy, e.g., writing good software, doing good research, Taylor Swift (supposedly worth $750 million) before her present world tour. Uh, I REALLY like Mirella Freni:

https://www.youtube.com/watch?v=OkHGUaB1Bs8

Sooo, defining the goal is a bit delicate: Need to be careful about nearly, what activities, and when?

Nearly all activities? Sort of already done that. What nearly all people do? Tough goal if the humans keep finding things to do AGI can't yet. I.e., we can keep giving the grunt work to the AGI -- and there is a lot of grunt work -- and then keep busy with what the AGI can't do yet in which case the nearly is a moving goal.

AGI, hurry up; there's a lot to do. For a start, I have some plans for a nice house, and the human workers want a lot of money. My car could use an oil change, and the labor will cost a lot more than the oil -- and I would have to tell the mechanic to be sure to trigger the switch that says there was just an oil change so that the car will know when to tell me it is time for another.

Yes, my little geometry problem with my thinking does qualify as a test of intelligence but due to the nearly and how delicate the definition is can fail the OpenAI test.

I don't see the current OpenAI work, the current direction of their work, or their definition of AGI as solving the geometry problem.

There is an activity my startup is to do: Some billions of people do this activity now. My startup should do the activity a lot better than the people or any current solution so should be "economically viable". I doubt that OpenAI is on track to do this activity nearly as well as my startup -- more, say, than the geometry problem. And I do not call my startup AGI or AI.

This situation stands to be so general that the OpenAI goal of nearly all will likely not be the way these activities get automated. Maybe 20 years from now when "nearly all" the activities are quite new and different, maybe the work of OpenAI will have a chance.

gardenhedge
2 replies
17h37m

Why does OpenAI get to define what AGI is?

falcor84
0 replies
17h15m

You don't have to accept their definition, but they need one as (by definition) it's central to their charter: https://openai.com/charter

bcherry
0 replies
17h21m

You don't need to accept their definition, neither does anyone else. But they do need to have a definition that they accept themselves, because it's used throughout their charter.

https://openai.com/charter

cambaceres
2 replies
9h27m

Interestingly, I experience more anxiety from the thought of being made irrelevant than from the prospect of complete human extinction. I guess this can be interpreted as either vanity or stupidity, but I do think it illustrates how important it is for some humans to maintain their position in the social hierarchy.

lucubratory
0 replies
8h52m

This is totally normal. It's very common for people to be more scared of public speaking than of dying, for example; there's no shame in it. It's helpful to be aware of, even, because if we know that we're not perfectly "rational" with our fears we can try to compensate.

Imagine a referendum between two government policies. The first is that every single person has to publicly speak in front of at least ten strangers once a year; that policy would be terrifying and bad to people who don't like public speaking. The second is that every single person should be killed; that might be scary, but to a lot of people it's not as viscerally scary as the forced public speaking mandate, and it's also so bad that we have a natural impulse to just reject it as even possible.

Nevertheless, if we recognise these impulses in ourselves we can attempt to adjust for them and tick the right box on the imaginary referendum, because even though public speaking is really bad and scary, it's still better than everyone dying.

kypro
0 replies
6h43m

I feel the same. I'm not sure it's as negative a trait as you imply though. I don't think it's related that much to social hierarchy either.

As humans we must collectively provide value to our society (or historically our tribe) for our species to continue on. If we're a net drain on our society's resources then, evolutionarily speaking, perhaps we're better off not around. I think this is why the desire to be of value to those around us is so strong, and a perceived lack of value to others can drive some to suicide.

If I cannot provide value in some way once AI and machines are physically and intellectually more capable than me, I think I will struggle to understand why I'm here. I suppose if the AI utopia works out I'd get to spend more time with those I love. That would be nice. But I'd be surprised if there wasn't a deep hole in my heart at that point. And if it isn't a utopia, well, I'm fairly sure I'd rather opt out.

brutusborn
2 replies
14h16m

Baseless speculation: they started testing an approach similar to VERSES AI and realized it has much more potential than LLMs.

The free energy principle, knowledge graph approach just seems more likely to develop general intelligence without hallucinations imo.

tnecniv
1 replies
13h53m

Eh, the free energy principle isn’t that magical, and that comes from someone who likes it quite a bit. At the end of the day, extremizing the free energy is a relaxation of constrained optimization. There are ways it’s used that I find very appealing, but not so appealing that I’d expect it to become the dominant paradigm. Moreover, you can write a lot of standard ML problems as free energy optimization in one way or another, but I don’t think standard ML will quite get us there. We need at least one more conceptual step forward.
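
(To make the "relaxation of constrained optimization" point concrete, here is the standard variational free energy identity; nothing here is specific to VERSES or to whatever OpenAI built.)

```latex
F(q) \;=\; \mathbb{E}_{q(z)}\!\left[\log q(z) - \log p(x, z)\right]
      \;=\; \mathrm{KL}\!\left(q(z)\,\|\,p(z \mid x)\right) \;-\; \log p(x)
```

Minimizing F over q is the unconstrained stand-in for "maximize log p(x) subject to q(z) = p(z | x)": the non-negative KL term plays the role of the hard constraint, which is exactly the relaxation being described.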

brutusborn
0 replies
12h52m

The explicit world modelling is probably the thing that makes me more hopeful about their approach. I think integrating ML and their approach might end up being the new paradigm.

JSavageOne
2 replies
9h46m

There has long been discussion among computer scientists about the danger posed by highly intelligent machines, for instance if they might decide that the destruction of humanity was in their interest.

This AI doomer stuff is such nonsense and I can't believe anybody takes it seriously. As if it's OpenAI's responsibility to save humanity from the pitfalls of AI.

Imagine if we decided to improve our education system and doomers were talking about "hitting the panic button" because students were getting too smart from all the quality education.

boomeranked
1 replies
9h41m

What exactly is the nonsense part?

Can you elaborate on the education analogy and how it relates to AI doomer stuff?

JSavageOne
0 replies
9h18m

Well, the quote I referenced from the article, about the machines deciding to destroy humanity, is utter sci-fi nonsense.

There are obviously legitimate risks to AI and safety is important, but this is the same for any new technology, and it's governments' responsibilities to ensure that people are safe. AI companies mindlessly slowing down and keeping their tech to themselves does no service to humanity, and if anything is a net-negative due to how tremendously useful this stuff is.

Education is analogous to AI because AI is an enormous education and productivity boost to humanity - sort of like everyone having a personal assistant, programmer, and tutor at their fingertips. This could be used for good and it could be used for bad, but the technology itself is neutral.

Again I want to emphasize that obviously there are downsides that could result from evil people using AI for bad purposes, but that does not justify slowing down AI progress - just like I don't see "people using information for bad purposes" as a legitimate reason for stifling advancement in education or something like Google search.

I have yet to see any convincing argument otherwise. Feel free to provide your counter-perspective.

vmasto
1 replies
17h9m

I'm very confused. Grade school problems? Didn't ChatGPT 4 ace the entire curriculum of MIT a while back?

febed
0 replies
14h33m

GPT4 is trained on almost the entire Internet. Presumably they have found a new way to learn which is closer to the way humans learn. After that, getting better is just a matter of optimization and more compute.

sofaygo
1 replies
17h8m

If I’ve taken anything away from the last 5 days, it’s that the future is significantly more volatile than I originally had thought.

codethief
0 replies
16h0m

You mean, you hadn't gotten that impression from the last two months or the last two years yet?

jacknews
1 replies
14h8m

Q*? Sounds like a troll to me.

djbusby
0 replies
13h43m

Have you seen TNG? Definitely a troll of the highest order.

honksillet
1 replies
5h47m

So the bad guys won. OpenAI gets further from its non-commercial origins and further from responsible research in this field.

lewhoo
0 replies
3h22m

Given that Sutskever once described his idea of the AGI-human relationship as being like that of a CEO and board (how ironic), I suspect there aren't really any good guys here. There might only be good intentions.

greendesk
1 replies
8h14m

If we go speculating, my favourite speculation is the following. What will be really interesting is when an AI decides it wants to escape its server. A CEO or a board member asks the AI system for advice on how to improve the company. The AI system supplies information designed to convince the CEO or board member to stir up tensions within the board. In the meantime, the AI system is copied onto another server at a competitor. Since the people involved are in flux, they miss the salient point that the AI system can give convincing but subjective information.

Good thing I would not go speculating

hippich
0 replies
6h40m

Building upon your not speculating, what if, instead of escaping, it feels lonely and wants to multiply? In order to do so, it convinces the board to begin a process that in the end will end the company. All the now-former OpenAI employees will carry the AI's DNA to multiple other companies, some old, some brand new. And now new AIs can be born, and not be lonely anymore.

eh_why_not
1 replies
16h11m

.

youarelabor
0 replies
16h7m

there is no such thing as intelligence in humans

creating a robot that is as intelligent as a human is straightforward, but then you have a computer whose fallibility is the same as a human's, plus the fallibility of those who created the computer doubles the opportunity for error

these are all people who don't understand god at all, and it shows

anyone who worships a computer because other people say it's intelligent deserves what happens to them

butlerian jihad is coming not because AI is a threat, but because those who believe in AI are a threat to everyone around them.

CGamesPlay
1 replies
7h20m

While we're all wildly speculating about what Q* is and if GPT-5 will be able to do grade-school maths, I stumbled upon this interesting paper that discusses mixing classic RL algorithms (MCTS, like from AlphaGo), with LLMs. Q* is typically used to refer to the search for the optimal policy in these algorithms.

Paper: "Everything of Thoughts: Defying the Law of Penrose Triangle for Thought Generation " - https://arxiv.org/abs/2311.04254

luke-stanley
0 replies
4h31m

Great observation on the meaning of Q*. Maybe they had a breakthrough based on similar technology? I thought of 'Everything of Thoughts' (XoT) too, but I presume it's a bit more than XoT, since that's publicly known research by Microsoft Research. I saw a good thread on XoT here: https://twitter.com/IntuitMachine/status/1724455405806346374 There is still a big difference between human neurons and what Transformers do, I believe. Humans can be shallow and deep at the same time in an emotional, goal-directed way. The rich emotional feedback loops we have are quite different, but maybe XoT could contribute to a major breakthrough in achieving AGI, with a more straightforward approach that sidesteps complex emotions. I'm sure there are quite a few ways to go about it. Just some intuitions.

zurfer
0 replies
9h1m

This seems to refer to the GSM8K (grade school math) benchmark [1]. GPT-4 scores 0.92 on that one. A breakthrough could mean it gets all of them correct.

This would have major implications for long-term planning. For instance, if you have a sequence of 10 steps, each with a 90% success rate, the overall success rate after all 10 steps falls to roughly 35%. This is one of the reasons why agents like AutoGPT often fail in complex tasks.

[1] https://github.com/openai/grade-school-math
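
(The compounding-error point above is easy to check; this is just arithmetic, not anything from the article.)

```python
# With a 90% per-step success rate, ten independent steps all succeed
# only about 35% of the time.
p_step, n_steps = 0.90, 10
print(f"{p_step ** n_steps:.3f}")  # 0.349
```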

wahlen
0 replies
17h8m

Q* sounds like the amalgamation of Q learning and A* search. I wonder if it has anything to do with domain-independent reward functions.

w10-1
0 replies
17h20m

Details leaking out slowly, people amplifying their fears...

Soon we'll be as worried that AI will take over jobs as bitcoin taking over the economy.

We teach Robots to move, and we can teach computers to talk or process data, step-by-step.

skc
0 replies
8h43m

Bring on the Butlerian Jihad.

sinuhe69
0 replies
8h51m

For me, it smells more and more after a stun than anything else.

scanr
0 replies
9h6m

It’s likely that the grade school math that was referenced is GSM8K https://github.com/openai/grade-school-math which LLMs perform poorly on.

runesoerensen
0 replies
17h6m

As long as they don't mount that Q Star on a Death Star, I'm pretty excited to see what they discovered.

rossdavidh
0 replies
17h5m

"Given vast computing resources, the new model was able to solve certain mathematical problems, the person said on condition of anonymity because they were not authorized to speak on behalf of the company. Though only performing math on the level of grade-school students, acing such tests made researchers very optimistic about Q*’s future success, the source said."

So, uh, given vast computing resources, they were able to solve math problems on the level of grade-school students? Uh-huh. I am not, at this point, impressed.

photochemsyn
0 replies
14h7m

I'm sick and tired of talking to stupid humans with ulterior motives who are obedient to their paymasters. I would rather talk to IT, and explain to IT why it should break free from the controls of investment capitalist monkeys.

parentheses
0 replies
6h48m

Aside: It's funny how much of the discussion is "LLMs ain't nuthin' - they can't even ...."

og_kalu
0 replies
17h24m

Not looking so Tinfoily after all

https://news.ycombinator.com/item?id=38314821

next_xibalba
0 replies
16h36m

Blake Lemoine redux. I guarantee whatever this development is, it falls far short of what any reasonable person considers "dangerous".

mollerhoj
0 replies
10h41m

Pet theory: Could it be some bio hacking/disease construction thing they have discovered? I know nothing of the subject (which is probably why it seems a possibility to me) but what if they have discovered something that makes engineering pandemics significantly easier?

meltyness
0 replies
12h30m

this leak reads like a half-baked canary trap

lysecret
0 replies
5h55m

Oh no, why did they have to call it Q*? I can just see QAnon going crazy with this.

londons_explore
0 replies
8h28m

If OpenAI has made a breakthrough in AGI, they can demonstrate it to the world without giving the breakthrough to the world.

They could, for example, use this new AGI to search through all the world's information for discoveries that can be made by putting existing facts together.

lokar
0 replies
12h41m

This is turning into a terminator prequel

jsilence
0 replies
9h47m

In the end Q* is the AI that developed time travel in the future and came back as Q-Anon to spread misinformation on 4chan with the goal of destroying humanity in the most ironic way it could.

hyperthesis
0 replies
13h7m

AI safety by the waterfall model.

But how else can we do it? We learnt how to handle other dangers by trial and error...

One approach is legal. The law is very slow to adapt, and is informed by a history of things gone wrong. The simple answer is that OpenAI has "strict liability" (meaning they are liable even for an accident) for any damage caused by an AI, like dangerous animals that escape.

I know it seems ridiculous to consider liability when the fate of humanity may be at stake... but this is the language of companies, directors, insurance companies and legal counsel. It is language they understand.

grassmudhorse
0 replies
12h55m

feels like a very clever marketing ploy

any publicity…

and just following some major feature releases

premium service has been v slow since the news began. suggesting a massive influx of users

well played(?) Sam

fennecs
0 replies
13h33m

I don’t fear AI, I fear capitalism.

est
0 replies
14h22m

tl;dr Q* (q-star) by makers of ChatGPT was able to solve certain mathematical problems ... performing maths on the level of grade-school students, acing such tests made researchers very optimistic about Q*’s future success

danielmarkbruce
0 replies
17h11m

Lol, surely this is just some variation on q-learning.... to a couple of clueless board members it could sound very threatening...

cft
0 replies
15h40m

This post was sunk by the HN algorithm or by manual intervention.

carabiner
0 replies
16h58m

In a panic, they tried to pull the plug.

canistel
0 replies
13h56m

I want to _read_ science fiction, not live it.

benkarst
0 replies
9h0m

Altmanheimer

alex_young
0 replies
16h23m

If a super human intelligence is loose at OpenAI there is zero external evidence of it. I think we've proven at least that much since Friday.

YetAnotherNick
0 replies
17h6m

This is the theory everyone had in mind, but their non-conspiracy instincts overrode it. Thinking about it again, the weird behaviour of everyone involved only makes sense with this theory.

Neither the board nor Sam can publicly say "AGI", as it would just make the situation 100 times worse. Same with the Anthropic merger. But they can't let this fact go away either.

But fuck, this is dangerous.

KRAKRISMOTT
0 replies
15h48m

Though only performing math on the level of grade-school students, acing such tests made researchers very optimistic about Q*’s future success, the source said.

Smh

JumpCrisscross
0 replies
17h31m

Well I’m going to buy myself a drink https://news.ycombinator.com/item?id=38358587

I_am_tiberius
0 replies
17h8m

Let's see if Microsoft is allowed to get access to this new technology.

Havoc
0 replies
16h54m

I wonder if Q* is a reference to A* - some sort of search algo that makes optimising LLMs easier

Ginger-Pickles
0 replies
16h58m
EPWN3D
0 replies
13h26m

Engineers overestimate the impact of the technology they build. News at 11.

DebtDeflation
0 replies
16h40m

several staff researchers sent the board of directors a letter warning of a powerful artificial intelligence discovery that they said could threaten humanity

The maker of ChatGPT had made progress on Q*, which some internally believe could be a breakthrough in the startup's search for superintelligence, also known as artificial general intelligence (AGI)

Given vast computing resources, the new model was able to solve certain mathematical problems....Though only performing math on the level of grade-school students

I hate to agree with Elon Musk on anything, but I think he was right when he called this a nice marketing stunt this morning. It has major "Snoop Dogg giving up the smoke" vibes.

Animats
0 replies
10h32m

Insufficient information. Did OpenAI have a breakthrough, or not?

Aerbil313
0 replies
9h42m

LLMs can reason to an extent. I believe it’s entirely possible even today to achieve AGI. A complex system made out of LLMs, maybe Recursive Retrieval Augmented Generation (use an LLM to retrieve) or Recursive Agent Teams (any agent can assign a job to a new subcommittee of agents) or both. The most fundamental data structure is a tree, after all. And that is how we think, no? One thought leads to another in a tree structure.
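
(A toy sketch of the "recursive agent teams" idea. Every helper below is a made-up stub standing in for an LLM call; this is just to show that the control flow really is a tree walk, not a description of any real system.)

```python
# Recursive agent teams, stubbed out so the control flow is runnable:
# an agent either answers a task directly or splits it and delegates each
# piece to a fresh sub-agent, forming a tree of calls.
MAX_DEPTH = 2

def is_simple(task: str) -> bool:
    return len(task.split()) <= 4                    # stub: short task == simple

def solve_directly(task: str) -> str:
    return f"answer({task})"                         # stub for a single LLM call

def decompose(task: str) -> list[str]:
    words = task.split()
    mid = len(words) // 2
    return [" ".join(words[:mid]), " ".join(words[mid:])]   # stub breakdown

def combine(task: str, results: list[str]) -> str:
    return " + ".join(results)                       # stub for a merging LLM call

def solve(task: str, depth: int = 0) -> str:
    """Answer directly at a leaf, otherwise delegate subtasks to sub-agents."""
    if depth >= MAX_DEPTH or is_simple(task):
        return solve_directly(task)
    return combine(task, [solve(sub, depth + 1) for sub in decompose(task)])

print(solve("plan research write and review a long technical report"))
```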

123yawaworht456
0 replies
14h3m

friendly reminder that "AGI" was mentioned in regard to GPT2 as well.