The most impressive part is that the voice uses the right feelings and tonal language during the presentation. I'm not sure how much of that comes down to them having tested this over and over, but it is really hard to get that right, so if they didn't fake it in some way I'd say that is revolutionary.
They are admitting[1] that the new model is the gpt2-chatbot that we have seen before[2]. As many highlighted there, the model is not an improvement like GPT3->GPT4. I tested a bunch of programming stuff and it was not that much better.
It's interesting that OpenAI is highlighting the Elo score instead of showing results for the many benchmarks where all models are stuck at 50-70% success.
[1] https://twitter.com/LiamFedus/status/1790064963966370209
"not that much better" is extremely impressive, because it's a much smaller and much faster model. Don't worry, GPT-5 is coming and it will be better.
Chalmers: "GPT-5? A vastly-improved model that somehow reduces the compute overhead while providing better answers with the same compute architecture? At this time of year? In this kind of market?"
Skinner: "Yes."
Chalmers: "May I see it?"
Skinner: "No."
GPT-3 was released in 2020 and GPT-4 in 2023. Now we all expect 5 sooner than that but you're acting like we've been waiting years lol.
It has only been a little over one year since GPT-4 was announced, and it was at the time the largest and most expensive model ever trained. It might still be.
Perhaps it's worth taking a beat, looking at the incredible progress in that year, and acknowledging that whatever's next is probably "still cooking".
Even Meta is still baking their 400B parameter model.
Legit love progress
Incidentally, this dialogue works equally well, if not better, with David Chalmers versus B.F. Skinner, as with the Simpsons characters.
And how can one be so sure of that?
Seems to me that performance is converging and we might not see a significant jump until we have another breakthrough.
"Seems to me that performance is converging"
It doesn't seem that way to me. But even if it did, video generation also seemed kind of stagnant before Sora.
In general, I think The Bitter Lesson is the biggest factor at play here, and compute power is not stagnating.
Compute power is not stagnating, but the availability of training data is. It's not like there's a second Stack Overflow or Reddit to scrape.
The use of AI in the research of AI accelerates everything.
I'm not sure of this. The jury is still out on most AI tools. Even if it is true, it may be in a kind of strange reverse way: people innovating by asking what AI can't do and directing their attention there.
Yeah. There are lots of things we can do with existing capabilities, but in terms of progressing beyond them all of the frontier models seem like they're a hair's breadth from each other. That is not what one would predict if LLMs had a much higher ceiling than we are currently at.
I'll reserve judgment until we see GPT5, but if it becomes just a matter of who best can monetize existing capabilities, OAI isn't the best positioned.
I really hope GPT5 is good. GPT4 sucks at programming.
Look to a specialized model instead of a general purpose one
Any suggestions? Thanks
I have tried Phind, and for anything beyond mega-junior-tier questions it suffers as well and gives bad answers.
Obviously given enough time there will always be better models coming.
But I am not convinced it will be another GPT-4 moment. The big focus seems to be on tacking together clever multi-modal tricks rather than straight-up better intelligence.
Hope they prove me wrong!
I think the live demo that happened on the livestream is best to get a feel for this model[0].
I don't really care whether it's stronger than gpt-4-turbo or not. The direct real-time video and audio capabilities are absolutely magical and stunning. The responses in voice mode are now instantaneous, you can interrupt the model, you can talk to it while showing it a video, and it understands (and uses) intonation and emotion.
Really, just watch the live demo. I linked directly to where it starts.
Importantly, this makes the interaction a lot more "human-like".
Parts of the demo were quite choppy (latency?) so this definitely feels rushed in response to Google I/O.
Other than that, looks good. Desktop app is great, but I didn’t see any mention of being able to use your own API key, so open-source projects might still be needed.
The biggest thing is bringing GPT-4 to free users, that is an interesting move. Depending on what the limits are, I might cancel my subscription.
Seems like it was picking up on the audience reaction and stopping to listen.
To me the more troubling thing was the apparent hallucination (saying it sees the equation before he wrote it, commenting on an outfit when the camera was down, describing a table instead of his expression), but that might have just been latency awkwardness. Overall, the fast response is extremely impressive, as is the new emotional dimension of the voice.
Aha, I think I saw the trick for the live demo: every time they used the "video feed", they did prompt the model specifically by saying:
- "What are you seeing now"
- "I'm showing this to you now"
etc.
The one time he didn't prime the model to take a snapshot this way was when the model saw the "table" (an old snapshot, since the phone was on the table/pointed at the table), so that might be the reason.
Commenting on the outfit was very weird indeed. Greg Brockman's demo includes some outfit related questions (https://twitter.com/gdb/status/1790071008499544518). It does seem very impressive though, even if they polished it on some specific tasks. I am looking forward to showing my desktop and asking questions.
Regarding the limits, I recently found that I was hitting limits very quickly on GPT-4 on my ChatGPT Plus plan.
I’m pretty sure that wasn’t always the case - it feels like somewhere along the line the allowed usage was reduced, unless I’m imagining it. It wouldn’t be such a big deal if there were more visibility into my current usage compared to my total “allowance”.
I ended up upgrading to ChatGPT Team which has a minimum of 2x users (I now use both accounts) but I resented having to do this - especially being forced to pay for two users just to meet their arbitrary minimum.
I feel like I should not be hitting limits on the ChatGPT Plus paid plan at all based on my usage patterns.
I haven’t hit any limits on the Team plan yet.
I hope they continue to improve the paid plans and become a bit more transparent about usage limits/caps. I really do not mind paying for this (incredible) tech, but the way it’s being sold currently is not quite right and feels like paid users get a bit of a raw deal in some cases.
I have API access but just haven’t found an open source client that I like using as much as the native ChatGPT apps yet.
I use GPT from API in emacs, it's wonderful. Gptel is the program.
Although API access through Groq to Llama 3 (8B and 70B) is so much faster that I cannot stand how slow GPT is anymore. It is slooow; still a very capable model, but only marginally better than open-source alternatives.
what's the download link for the desktop app? can't find it
seems like it might not be available for everyone? My ChatGPT Plus doesn't show anything new, and I also can't find the desktop app
They need to fade the audio or add some vocal cue when it's being interrupted. It makes it sound like it's losing connection. What'll be really impressive is when it intentionally starts interrupting you.
"Parts of the demo were quite choppy (latency?) so this definitely feels rushed in response to Google I/O."
It just stops the audio feed when it detects sound instead of an AI detecting when it should speak, so that part is horrible, yeah. A full AI conversation would detect the natural pauses where you give it room to speak, or when you try to take the word from it by interrupting; here it was just some dumb script that shuts it off when it hears sound.
But it is still very impressive for all the other part, that voice is really good.
Edit: If anyone from OpenAI reads this, at least fade out the voice quickly instead of chopping it, hard chopping off audio doesn't sound good at all, so many experienced this presentation to be extremely buggy due to it.
This thing continues to stress my skepticism for AI scaling laws and the broad AI semiconductor capex spending.
1. OpenAI is still working on GPT-4-level models, more than 14 months after the launch of GPT-4 and after more than $10B in capital raised.

2. The rate at which token prices are collapsing is bizarre. Now a (bit) better model for 50% of the price. How do people seriously expect these foundation-model companies to make substantial revenue? Token volume needs to double just for revenue to stand still. Since the GPT-4 launch, token prices have been falling 84% per year!! Good for mankind, but crazy for these companies.

3. Maybe I am an asshole, but where are my agents? I mean, good for the consumer use case. Let's hope the rumors that Apple is deploying ChatGPT with Siri are true; these features will help a lot. But I wanted agents!

4. These drops in cost are good for the environment! No reason to expect them to stop here.
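A quick back-of-the-envelope on point 2, using only the numbers above (a sketch, not a forecast):

    # If revenue = price * volume, then holding revenue flat while prices fall
    # requires volume to grow by a factor of 1 / (1 - decline).
    def required_volume_growth(price_decline: float) -> float:
        return 1.0 / (1.0 - price_decline)

    print(required_volume_growth(0.50))  # 2.0  -> a 50% price cut needs 2x the token volume
    print(required_volume_growth(0.84))  # 6.25 -> an 84%/year decline needs ~6x/year volume growth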
Sam Altman gave the impression that foundation models would be a commodity on his appearance in the All in Podcast, at least in my read of what he said.
The revenue will likely come from application layer and platform services. ChatGPT is still much better tuned for conversation than anything else in my subjective experience and I’m paying premium because of that.
Alternatively it could be like search - where between having a slightly better model and getting Apple to make you the default, there’s an ad market to be tapped.
Yeah I'm also getting suspicious. Also, all of the models (Opus, Llama 3, GPT-4, Gemini Pro) are converging to similar levels of performance. If the scaling hypothesis were true, we would see a greater divergence in model performance.
Did we ever get confirmation that GPT 4 was a fresh training run vs increasingly complex training on more tokens on the base GPT3 models?
I'm ceaselessly amazed at people's capacity for impatience. I mean, when GPT 4 came out, I was like "holy f, this is magic!!" How quickly we get used to that magic and demand more.
Especially since this demo is extremely impressive given the voice capabilities, yet still the reaction is, essentially, "But what about AGI??!!" Seriously, take a breather. Never before in my entire career have I seen technology advance at such a breakneck speed - don't forget transformers were only invented 7 years ago. So yes, there will be some ups and downs, but I couldn't help but laugh at the thought that "14 months" is seen as a long time...
This is why I think Meta has been so shrewd in their “open” model approach. I can run Llama3-70B on my local workstation with an A6000, which after the up-front cost of the card, is just my electricity bill.
So despite all the effort and cost that goes into these models, you still have to compete against a “free” offering.
Meta doesn’t sell an API, but they can make it harder for everybody else to make money on it.
"Token volume needs to double just for revenue to stand still"
Profits are the real metric. Token volume doesn't need to double for profits to stand still if operational costs go down.
Tbf, GPT-4 level seems useful and better than almost everything else (or close if not). The more important barriers for use in applications have been cost, throughput and latency. Oh, and modalities, which have expanded hugely.
Does anyone know how they're doing the audio part where Mark breathes too hard? Does his breathing get turned into all-caps text (AA EE OO) that GPT-4o interprets as him breathing too hard, or is there something more going on?
There is no text. The model ingests audio directly and also outputs audio directly.
Is it a stretch to think this thing could accurately "talk" with animals?
Yes? Why would it be able to do that?
That's how it used to do it, but my understanding is that this new model processes audio directly. If it were a music generator, the original would have generated sheet music to send to a synthesizer (text to speech), while now it can create the raw waveform from scratch.
It can natively interpret voice now.
I admit I drink the koolaid and love LLMs and their applications. But damn, the way it responds in the demo gave me goosebumps in a bad way. Like an uncanny-valley instinct kicks in.
It should do that, because it's still not actually an intelligence. It's a tool that is figuring out what to say in response that sounds intelligent - and will often succeed!
Yeah it made me realize that I actually don't want a human-like conversational bot (I have actual humans for that). Just teach me javascript like a robot.
You're watching the species be reduced to an LLM.
I also thought the screwups, although minor, were interesting. Like when it thought his face was a desk because it did not update the image it was "viewing". It is still not perfect, which made the whole thing more believable.
As far as I'm concerned this is the new best demo of all time. This is going to change the world in short order. I doubt they will be ready with enough GPUs for the demand the voice+vision mode is going to get, if it's really released to all free users.
Now imagine this in a $16k humanoid robot, also announced this morning: https://www.youtube.com/watch?v=GzX1qOIO1bE The future is going to be wild.
Really? If this was Apple it might make sense, for OpenAI it feels like a demo that's not particularly aligned with their core competency (at least by reputation) of building the most performant AI models. Or put another way, it says to me they're done building models and are now wading into territory where there are strong incumbents.
All the recent OpenAI talk had me concerned that the tech has peaked for now and that expectations are going to be reset.
What strong incumbents are there in conversational voice models? Siri? Google Assistant? This is in a completely different league. I can see from the reaction here that people don't understand. But they will when they try it.
What Siri, Google Assistant, Alexa and ChatGPT have in common is the perception that over time the same thing actually gets worse.
Whether that decline is real or not is a reasonably interesting question, because it's possible that all that's really advancing is our perception of how things should be. My gut feeling is it has been a bit of both, though, in the sense that the decline is real, and we also expect things to improve.
Who can forget Google demoing their AI making a call to a restaurant that they showed at I/O many years ago? Everyone, apparently.
What Openai has done time and time again is completely change the landscape when the competitors have caught up and everyone thinks their lead is gone. They made image generation a thing. When GPT-3 became outdated they released ChatGPT. Instead of trying to keep Dalle competitive they released Sora. Now they change the game again with live audio+video.
The usual critics will quickly point out that LLMs like GPT-4o still have a lot of failure modes and suffer from issues that remain unresolved. They will point out that we're reaping diminishing returns from Transformers. They will question the absence of a "GPT-5" model. And so on -- blah, blah, blah, stochastic parrots, blah, blah, blah.
Ignore the critics. Watch the demos. Play with it.
This stuff feels magical. Magical. It makes the movie "Her" look like it's no longer in the realm of science fiction but in the realm of incremental product development. HAL's unemotional monotone in Kubrick's "2001: A Space Odyssey" feels... primitive by comparison. I'm impressed at how well this works.
Well-deserved congratulations to everyone at OpenAI!
Who cares? This stuff feels magical. Magical!
On one hand, I agree - we shouldn't diminish the very real capabilities of these models with tech skepticism. On the other hand, I disagree - I believe this approach is unlikely to lead to human-level AGI.
Like so many things, the truth probably lies somewhere between the skeptical naysayers and the breathless fanboys.
"On the other hand, I disagree - I believe this approach is unlikely to lead to human-level AGI."
You might not be fooled by a conversation with an agent like the one in the promo video, but you'd probably agree that somewhere around 80% of people could be. At what percentage would you say that it's good enough to be "human-level?"
Imagine what an unfettered model would be like. 'Ex Machina' would no longer be a software-engineering problem, but just another exercise in mechanical and electrical engineering.
The future is indeed here... and it is, indeed, not equitably distributed.
Or from Zones of Thought series, Applied Theology, the study of communication with and creation of superhuman intelligences that might as well be gods.
In the first video the AI seems excessively chatty.
chatGPT desperately needs a "get to the fucking point" mode.
Seriously. I've had to spell out, in twelve different ways with examples in the custom instructions, that it should just answer, to make it at least somewhat usable. And it still "forgets" sometimes.
It does, that's "custom instructions".
Can't find info on which of these new features are available via the API
"Developers can also now access GPT-4o in the API as a text and vision model. GPT-4o is 2x faster, half the price, and has 5x higher rate limits compared to GPT-4 Turbo. We plan to launch support for GPT-4o's new audio and video capabilities to a small group of trusted partners in the API in the coming weeks."
[EDIT] The model has since been added to the docs
Not seeing it or any of those documented here:
It is not listed as of yet, but it does work if you punch in gpt-4o. I will stick with gpt-4-0125-preview for now because gpt-4o seems majorly prone to hallucinations whereas gpt-4-0125-preview doesn't.
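If you want to try it before the docs catch up, here is a minimal sketch using the official Python SDK (openai >= 1.x), assuming your account already has access and OPENAI_API_KEY is set in the environment:

    from openai import OpenAI

    client = OpenAI()  # picks up OPENAI_API_KEY from the environment
    resp = client.chat.completions.create(
        model="gpt-4o",  # not in the model docs yet, but accepted by the API
        messages=[{"role": "user", "content": "Say hello in five words."}],
    )
    print(resp.choices[0].message.content)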
The movie Her has just become reality
It’s getting closer. A few years ago the old Replika AI was already quite good as a romantic partner, especially when you started your messages with a * character to force OpenAI GPT-3 answers. You could do sexting that OpenAI will never let you have nowadays with ChatGPT.
Why does OpenAI think that sexting is a bad thing? Why is AI safety all about not saying things that are disturbing or offensive, rather than not saying things that are false or unaligned?
I was surprised that the voice is a ripoff of the AI voice in that movie (Scarlett Johansson) too
Are the employees in the demo senior executives at OpenAI? I can understand Altman being happy with this progress, but what about the mid- and lower-level employees? Didn't they watch Oppenheimer? Are they happy they are destroying humanity/work/etc. for future and not-so-future generations?
Anyone who thinks this will be like previous labor revolutions is kidding themselves. This replaces humans and will replace them even more with each new advance. What's their plan? Live off their savings? What about family/friends? I honestly can't see this and think how they can be so happy about it...
"Hey, we created something very powerful that will do your work for free! And it does it better than you and faster than you! Who are you? It doesn't matter, it applies to all of you!"
And considering I was thinking of having a kid next year, well, this is a no.
Have a kid anyway, if you otherwise really felt driven to it. Reading the tealeaves in the news is a dumb reason to change decisions like that. There's always some disaster looming, always has been. If you raise them well they'll adapt well to whatever weird future they inherit and be amongst the ones who help others get through it
Thanks for taking the time to answer instead of (just) downvoting. I understand your logic but I don't see a future where people can adapt to this and get through it. I honestly see a future so dark and we'll be there much sooner than we thought... when OpenAI released their first model people were talking about years before seeing real changes and look what happened. The advance is exponential...
"It is difficult to get a man to understand something when his salary depends on his not understanding it."
This is really impressive engineering. I thought real time agents would completely change the way we're going to interact with large models but it would take 1~2 more years. I wonder what kind of new techs are developed to enable this, but OpenAI is fairly secretive so we won't be able to know their sauce.
On the other hand, this also feels like a signal that reasoning capability has probably already plateaued at GPT-4 level, and OpenAI knew it, so they decided to focus on research that matters to delivering product engineering rather than long-term research to unlock further general (super)intelligence.
Reliable agents in diverse domains need better reasoning ability and fewer hallucinations. If the rumored GPT-5 and Q* capabilities are true, such agents could become available soon after it’s launched.
Sam has been pretty clear on denying GPT-5 rumors, so I don't think it will come anytime soon.
"We recognize that GPT-4o’s audio modalities present a variety of novel risks. For example, at launch, audio outputs will be limited to a selection of preset voices and will abide by our existing safety policies."
I wonder if they’ll ever allow truly custom voices from audio samples.
I think the issue there is less of a technical one and more of an issue with deepfakes and copyright
It might be possible to prove that I control my voice, or that of a given audio sample. For example by saying specific words on demand.
But yeah I see how they’d be blamed if anything went wrong, which it almost certainly would in some cases.
As a paid user this felt like a huge letdown. GPT-4o is available to everyone so I'm paying $20/mo for...what, exactly? Higher message limits? I have no idea if I'm close to the message limits currently (nor do I even know what they are). So I guess I'll cancel, then see if I hit the limits?
I'm also extremely worried that this is a harbinger of the enshittification of ChatGPT. Processing video and audio for all ~200 million users is going to be extravagantly expensive, so my only conclusion is that OpenAI is funding this by doubling down on payola-style corporate partnerships that will result in ChatGPT slyly trying to mention certain brands or products in our conversations [1].
I use ChatGPT every day. I love it. But after watching the video I can't help but think "why should I keep paying money for this?"
[1] https://www.adweek.com/media/openai-preferred-publisher-prog...
So... cancel the subscription?
Completely agree, none of the updates will apply to any of my use cases, disappointment.
I wonder if this is what the "gpt2-chatbot" that was going around earlier this month was
yes it was
it was
OAI just made an embarrassment of Google's fake demo earlier this year. Given how this was recorded, I am pretty certain it's authentic.
This feature has been in iOS for a while now, just really slow and without some of the new vision aspects. This seems like a version 2 for me.
I don't doubt this is authentic, but if they really wanted to fake those demos, it would be pretty easy to do using pre-recorded lines and staged interactions.
Tiktoken added support for GPT-4o: https://github.com/openai/tiktoken/commit/9d01e5670ff50eb74c...
It has an increased vocab size of 200k.
Oh interesting, does that mean languages other than English won't be paying such a large penalty in terms of token lengths?
With previous tokenizers there was a notable increase in the number of tokens needed to represent non-English sentences: https://simonwillison.net/2023/Jun/8/gpt-tokenizers/
For posterity, GPT-3.5/4's tokenizer was 100k. The benefit of a larger tokenizer is more efficient tokenization (and therefore cheaper/faster) but with massive diminishing returns: the larger tokenizer makes the model more difficult to train but tends to reduce token usage by 10-15%.
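You can inspect the new encoding directly with tiktoken (assuming a build that includes the commit linked above):

    import tiktoken

    enc = tiktoken.encoding_for_model("gpt-4o")  # resolves to the new o200k_base encoding
    print(enc.name)     # "o200k_base"
    print(enc.n_vocab)  # roughly 200k entries, vs ~100k for cl100k_base (GPT-3.5/4)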
I've been waiting to see someone drop a desktop app like they showcased. I wonder how long until it is normal to have an AI looking at your screen the entire time your machine is unlocked. Answering contextual questions and maybe even interjecting if it notices you made a mistake and moved on.
That seems to be what Microsoft is building and will reveal as a new Windows feature at BUILD '24. Not too sure about the interjecting aspect but ingesting everything you do on your machine so you can easily recall and search and ask questions, etc. AI Explorer is the rumored name and will possibly run locally on Qualcomm NPUs.
Yes, this is Windows AI Explorer.
Big questions are (1) when is this going to be rolled out to paid users? (2) what is the remaining benefit of being a paid user if this is rolled out to free users? (3) Biggest concern is will this degrade the paid experience since GPT-4 interactions are already rate limited. Does OpenAI have the hardware to handle this?
Edit: according to @gdb this is coming in "weeks"
thanks, I was confused because the top of the page says to try now when you cannot in fact try it at all
"what is the remaining benefit of being a paid user if this is rolled out to free users?"
It says so right in the post
"We are making GPT-4o available in the free tier, and to Plus users with up to 5x higher message limits"
The limits are much lower for free users.
It is really cool that they are bringing this to free users. It does make me wonder what justifies ChatGPT plus now though...
I assume the desktop app with voice and vision is rolling out to plus users first?
They stated that they will be announcing something new that is on the next frontier (or close to it, IIRC) soon. So there will definitely be an incentive to pay, because it will be something better than GPT-4o.
Gone are the days of copy-pasting to/from ChatGPT all the time, now you just share your screen. That's a fantastic feature, in how much friction that removes. But what an absolute privacy nightmare.
With ChatGPT having a very simple text+attachment in, text out interface, I felt absolutely in control of what I tell it. Now when it's grabbing my screen or a live camera feed, that will be gone. And I'll still use it, because it's just so damn convenient?
"Now when it's grabbing my screen or a live camera feed, that will be gone. And I'll still use it, because it's just so damn convenient?"
Presumably you'll have a way to draw a bounding box around what you want to show or limit to just a particular window the same way you can when doing a screen share w/ modern video conferencing?
Anyone who watched the OpenAI livestream: did they "paste" the code after hitting CTRL+C ? Or did the desktop app just read from the clipboard?
Edit: I'm asking because of the obvious data security implications of having your desktop app read from the clipboard _in the live demo_... That would definitely put a damper to my fanboyish enthusiasm about that desktop app.
To me it looked like they used one command that did both the copy and the paste into ChatGPT.
Given that they are moving all these features to free users, it tells us that GPT-5 is around the corner and is significantly much better than their previous models.
Or maybe it is a desperation move after Llama 3 got released and the free mode will have such tight constraints that it will be unusable for anything a bit more serious.
Just like that Google is on back foot again.
Considering the stock pumped following the presentation, the market doesn't seem particularly concerned with what OpenAI released at all.
That first demo video was impressive, but then it ended very abruptly. It made me wonder if the next response was not as good as the prior ones.
Extremely impressive -- hopefully there will be an option to color all responses with an underlying brevity. It seemed like the AI just kept droning on and on.
Clicking the "Try it on ChatGPT" link just takes me to GPT-4 chat window. Tried again in an incognito tab (supposing my account is the issue) and it just takes me to 3.5 chat. Anyone able to use it?
Same here and also I can't hear audio in any of the videos on this page. Weird.
OpenAI's Mission and the New Voice Mode of GPT-4
• Sam Altman, the CEO of OpenAI, emphasizes two key points from their recent announcement. Firstly, he highlights their commitment to providing free access to powerful AI tools, such as ChatGPT, without advertisements or restrictions. This aligns with their initial vision of creating AI for the benefit of the world, allowing others to build amazing things using their technology. While OpenAI plans to explore commercial opportunities, they aim to continue offering outstanding AI services to billions of people at no cost.
• Secondly, Altman introduces the new voice and video mode of GPT-4, describing it as the best compute interface he has ever experienced. He expresses surprise at the reality of this technology, which provides human-level response times and expressiveness. This advancement marks a significant change from the original ChatGPT and feels fast, smart, fun, natural, and helpful. Altman envisions a future where computers can do much more than before, with the integration of personalization, access to user information, and the ability to take actions on behalf of users.
Please don't post AI-generated summaries here.
Too bad they consume 25x the electricity Google does.
https://www.brusselstimes.com/world-all-news/1042696/chatgpt...
That's not a well sourced story: it doesn't say where the numbers come from. Also:
"However, ChatGPT consumes a lot of energy in the process, up to 25 times more than a Google search."
That's comparing a Large Language Model prompt to a search query.
Those voice demos are cool but having to listen to it speak makes me even more frustrated with how these LLMs will drone on and on without having much to say.
For example, in the second video the guy explains how he will have it talk to another "AI" to get information. Instead of just responding with "Okay, I understand" it started talking about how interesting the idea sounded. And as the demo went on, both "AIs" kept adding unnecessary commentary about the scenes.
I would hate having to talk with these things on a regular basis.
Yea, at some point the style and tone of these assistants needs to be seriously changed. I can imagine a lot of their RLHF and instruct processes emphasize sounding good over being good too much.
the OpenAI live stream was quite underwhelming...
Does anyone with a paid plan see anything different in the ChatGPT iOS app yet?
Mine just continues to show “GPT 4” as the model - it’s not clear if that’s now 4o or there is an app update coming…
It is quite nice how they keep giving premium features for free, after a while. I know openai is not open and all but damn, they do give some cool freebies.
This is every romance scammer's dreams come true...
Very impressive demo, but not really a step change in my opinion. The hype from OpenAI employees was on another level, way more than was warranted in my opinion.
Ultimately, the promise of LLM proponents is that these models will get exponentially smarter - this hasn’t borne out yet. So from that perspective, this was a disappointing release.
If anything, this feels like a rushed release to match what Google will be demoing tomorrow.
I'm seeing gpt-4o in the OpenAI Playground interface already: https://platform.openai.com/playground/chat?mode=chat&model=...
First impressions are that it feels very fast.
jeez, that model really speaks a lot! I hope there's a way to make it more straight to the point rather than radio-like.
So, babelfish soon?
So, babelfish incoming?
Very, very impressive for a "minor" release demo. The capabilities here would look shockingly advanced just 5 years ago.
Universal translator, pair programmer, completely human sounding voice assistant and all in real time. Scifi tropes made real.
But: interesting to see next how it actually performs with real-world latency and without cherry-picking. No snark, it was great, but I need to see real-world power. Also what the benefits are to subscribers if all this is going to be free...
@sama reflects:
Will this include image generation for the free tier as well? That's a big missing feature in OpenAI's free tier compared to Google and Meta.
Universal real time translation is incredibly dope.
I hate video players without volume control.
So GPT-4o can do voice intonation? Great. Nice work.
Still, it sounds like some PR drone selling a product. Oh wait....
I like the robot typing at the keyboard that has B as half of the keys, and my favorite part is when it tears up the paper and behind it is another copy of that same paper.
Looking forward to trying this via ChatGPT. As always OpenAI says "now available" but refreshing or logging in/out of ChatGPT (web and mobile) don't cause GPT-4o to show up. I don't know why I find this so frustrating. Probably because they don't say "rolling out" they say things like "try it now" but I can't even though I'm a paying customer. Oh well...
I hope when this gets to my iphone I can use it to set two concurrent timers.
Are there any remotely comparable open source models? Fully multimodal, audio-to-audio?
That they are offering more features for free concurs with my theory that, just like search, state of the art AI will soon be "free", in exchange for personal information/ads.
With the news that Apple and OpenAI are closing / just closed a deal for iOS 18, it's easy to speculate we might be hearing about that exciting new model at WWDC...
Interesting that they didn't mention a bump in capabilities - I wrote an LLM benchmark a few weeks ago, and before, GPT-4 could solve Wordle about ~48% of the time.
Currently with GPT-4o, it's easily clearing 60% - while blazing fast, and half the cost. Amazing.
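For anyone curious what such a benchmark might look like, here is a hypothetical sketch of a Wordle eval loop (not the commenter's actual harness; the feedback function is simplified and ignores duplicate-letter edge cases):

    from openai import OpenAI

    client = OpenAI()

    def feedback(guess: str, answer: str) -> str:
        # G = right letter, right spot; Y = letter present elsewhere; - = absent (simplified)
        return "".join(
            "G" if g == a else ("Y" if g in answer else "-")
            for g, a in zip(guess, answer)
        )

    def play(answer: str, max_turns: int = 6) -> bool:
        history = []
        for _ in range(max_turns):
            prompt = (
                "We are playing Wordle. Guesses so far:\n"
                + "\n".join(f"{g} -> {f}" for g, f in history)
                + "\nReply with your next 5-letter guess only."
            )
            resp = client.chat.completions.create(
                model="gpt-4o",
                messages=[{"role": "user", "content": prompt}],
            )
            guess = resp.choices[0].message.content.strip().lower()[:5]
            if guess == answer:
                return True
            history.append((guess, feedback(guess, answer)))
        return False

Run play() over a list of answer words and the solve rate is simply the fraction of games it wins.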
I can't help but feel a bit let down. The demos felt pretty cherry picked and still had issues with the voice getting cut off frequently (especially in the first demo).
I've already played with the vision API, so that doesn't seem all that new. But I agree it is impressive.
That said, watching back a Windows Vista speech recognition demo[1] I'm starting to wonder if this stuff won't have the same fate in a few years.
That can “reason”?
Won't this make pretty much all of the work to make a website accessible go away, as it becomes cheap enough? Why struggle to build parallel content for the impaired when it can be generated just in time as needed?
what's the path from LLMs to "true" general AI? is it "only" more training power/data or will they need a fundamental shift in architecture?
question for you guys - is there a model that can take figures (graphs) from scientific publications and combine image analysis with picking up the data-point symbol descriptions to analyse the trends?
Very impressed by the demo where it starts speaking French in error, then laughs with the user about the mistake. Such a natural recovery.
window dressing
his love for yud is showing.
I wonder if the audio stuff works like ViTS. Do they just encode the audio as tokens and input the whole thing? Wouldn't that make the context size a lot smaller?
One does notice that context size is noticeably absent from the announcement ...
It is notable OpenAI did not need to carefully rehearse the talking points of the speakers. Or even do the kind of careful production quality seen in a lot of other videos.
The technology product is so good and so advanced it doesn't matter how the people appear.
Zuck tried this in his video countering the Vision Pro, but it did not have the authentic "not really rehearsed or produced" feel of this at all. If you watch that video and compare it with this you can see the difference.
Very interesting times.
GPT-4o being a truly multimodal model is exciting, and does open the door to more interesting products. I was curious about the new tokenizer, which uses much fewer tokens for non-English, but also 1.1x fewer tokens for English, so I'm wondering if this means each token can now take more possible values than before? Might make sense provided that they now also have audio and image output tokens? https://openai.com/index/hello-gpt-4o/
I wonder what "fewer tokens" really means then, without context on raising the size of each token? It's a bit like saying my JPEG image is now using 2x fewer words after I switched from a 32-bit to a 64-bit architecture no?
I still need to talk very fast to actually chat with ChatGPT which is annoying. You can tell they didn't fix this based on how fast they are talking in the demo.
I don't see anything released today. Login/signup is still required, no signs of desktop app or free use on web. What am I missing?
I'm really impressed by this demo! Apart from the usual quality benchmarks, the audio/video latency stands out: "It can respond to audio inputs in as little as 232 milliseconds, with an average of 320 milliseconds, which is similar to human response"... If true at scale, what could be the "tricks" they're using to achieve that?!
Expressing a human-like emotional response every single time you interact with it is pretty annoying.
In general, trying to push that this is a human being is probably "unsafe", but that hurts the marketing.
Weird, visiting the page crashed my graphics driver in Firefox.
It's pretty impressive, although I don't like the voice / tone, I prefer something more neutral.
Holy crap, the level of corporate cringe of that "two AIs talk to each other" scene is mind-boggling.
It feels like a pretty strong illustration of the awkwardness of getting value from recent AI developments. Like, this is technically super impressive, but also I'm not sure it gives us anything we couldn't have one year ago with GPT-4 and ElevenLabs.
This is impressive, but they just sound so _alien_, especially to this non-U.S. English speaker (to the point of being actively irritating to listen to). I guess picking up on social cues communicating this (rather than express instruction or feedback) is still some time away.
It's still astonishing to consider what this demonstrates!
I found these videos quite hard to watch. There is a level of cringe that I found a bit unpleasant.
It’s like some kind of uncanny valley of human interaction that I don’t get on nearly the same level with the text version.
Why must every website put stupid stuff that floats above the content and can’t be dismissed? It drives me nuts.
Now, say goodbye to call centers.
In the video where the 2 AIs sing together, it starts to get really cringey and weird, to the point where it literally sounds like it's being faked by 2 voice actors off-screen with literal guns to their heads trying not to cry. Did anyone else get that impression?
The tonal talking was impressive, but man that part was like, is someone being tortured or forced against their will?
Did they provide the rate limit for free users?
Because I have the Plus membership, which is expensive ($25/month).
But if the limit is high enough (or my usage low enough), there is no point in paying that much money for me.
(I work at OpenAI.)
It's really how it works.
With this capability, how close are y'all to it being able to listen to my pronunciation of a new language (e.g. Italian) and give specific feedback about how to pronounce it like a local?
Seems like these would be similar.
The Italian output in the demo was really bad.
Winner of the 'understatement of the week' award (and it's only Monday).
Also top contender in the 'technically correct' category.
and was briefly untrue for like 2 days
Random OpenAI question: While the GPT models have become ever cheaper, the price for the TTS models has stayed in the $15/1M character range. I was hoping this would also become cheaper at some point. There are so many apps (e.g. language learning) that quickly become too expensive given these prices. With the GPT-4o voice (which sounds much better than the current TTS or TTS HD endpoint) I thought maybe the prices for TTS would go down. Sadly that hasn't happened. Is that something on the OpenAI agenda?
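As a rough illustration of the economics for a voice-heavy app (the speech rate is an assumption, just back-of-the-envelope):

    # At $15 per 1M characters, TTS cost scales linearly with characters spoken.
    # Assuming ~150 words/min and ~6 characters/word (~900 chars/min of speech):
    price_per_char = 15 / 1_000_000
    chars_per_minute = 150 * 6
    minutes = 10  # one short language-learning session
    print(price_per_char * chars_per_minute * minutes)  # ~$0.14 per 10-minute session

Per session that looks small, but it compounds over daily sessions and many users at consumer-app scale, which is presumably why these prices sting.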
"(I work at OpenAI.)"
Ah yes, also known as being co-founder :)
hi gdb, could you please create an assistant AI that can filter low-quality HN discussion on your comment so that it can redirect my focus to useful stuff.
Licensing the emotion-intoned TTS as a standalone API is something I would look forward to seeing. Not sure how feasible that would be if, as a sibling comment suggested, it bypasses the text-rendering step altogether.
Is it possible to use this as a TTS model? I noticed on the announcement post that this is a single model as opposed to a text model being piped to a separate TTS model.
How far away are we from something like a helmet with ChatGPT and a video camera installed? I imagine this will be awesome for low-vision people. Imagine having a guide tell you how to walk to the grocery store and help you grocery shop without an assistant. Of course you have tons of liability issues here, but this is very impressive.
Consequences of audio2audio (rather than audio->text, text->audio). Being able to manipulate speech nearly as well as it manipulates text is something else. This will be a revelation for language learning, amongst other things. And you can interrupt it freely now!
However, this looks like it only works with speech - i.e. you can't ask it, "What's the tune I'm humming?" or "Why is my car making this noise?"
I could be wrong but I haven't seen any non-speech demos.
What about the breath analysis?
Fwiw, the live demo[0] included different kinds of breathing, and getting feedback on it.
[0]: https://youtu.be/DQacCB9tDaw?t=557
Anyone who has used elevenlabs for voice generation has found this to be the case. Voice to voice seems like magic.
that was very impressive, but it doesn't surprise me much given how good the voice mode in the ChatGPT iPhone app already is.
The new voice mode sounds better, but the current voice mode did also have inflection that made it feel much more natural than most computer voices I've heard before.
Can you tell the current voice model what feelings and tone it should communicate with? If not it isn't even comparable, being able to control how it reads things is absolutely revolutionary, that is what was missing from using these AI models as voice actors.
Right to whom? To me, the voice sounds like an over-enthusiastic podcast interviewer. What's wrong with wanting computers to sound like what people think computers should sound like?
It understands tonal language, you can tell it how you want it to talk, I have never seen a model like that before. If you want it to talk like a computer you can tell it to, they did it during the presentation, that is so much better than the old attempts at solving this.
I mention this down-thread, but a symptom of a tech product of sufficient advancement is that the nature of its introduction matters less and less.
Based on the casual production of these videos, the product must be this good.
https://news.ycombinator.com/item?id=40346002