GPT-4o

Jensson
21 replies
55m

The most impressive part is that the voice uses the right feelings and tonal language during the presentation. I'm not sure how much of that comes from them having tested this over and over, but it is really hard to get right, so if they didn't fake it in some way I'd say that is revolutionary.

gdb
10 replies
51m

(I work at OpenAI.)

It's really how it works.

jamestimmins
1 replies
38m

With this capability, how close are y'all to it being able to listen to my pronunciation of a new language (e.g. Italian) and give specific feedback about how to pronounce it like a local?

Seems like these would be similar.

taytus
0 replies
19m

The Italian output in the demo was really bad.

baq
1 replies
23m

(I work at OpenAI.)

Winner of the 'understatement of the week' award (and it's only Monday).

Also top contender in the 'technically correct' category.

swyx
0 replies
22m

and was briefly untrue for like 2 days

terhechte
0 replies
21m

Random OpenAI question: while the GPT models have become ever cheaper, the price for the TTS models has stayed in the $15 per 1M characters range. I was hoping this would also become cheaper at some point. There are so many apps (e.g. language learning) that quickly become too expensive at these prices. With the GPT-4o voice (which sounds much better than the current TTS or TTS HD endpoints) I thought maybe the prices for TTS would go down. Sadly that hasn't happened. Is that something on the OpenAI agenda?
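
For a sense of what that price point means for something like a language-learning app, here is a rough back-of-envelope sketch. Only the $15 per 1M characters figure comes from the comment above; the speaking rate and session length are illustrative assumptions.

    # Rough cost of TTS narration at the quoted $15 per 1M characters.
    # The speaking rate and session length are assumptions, not OpenAI numbers.
    PRICE_PER_MILLION_CHARS = 15.00   # USD, as quoted above
    CHARS_PER_MINUTE_SPOKEN = 750     # assumption: ~150 wpm * ~5 chars per word

    def tts_cost_usd(minutes_of_speech: float) -> float:
        chars = minutes_of_speech * CHARS_PER_MINUTE_SPOKEN
        return chars / 1_000_000 * PRICE_PER_MILLION_CHARS

    # A 30-minute lesson every day for a month:
    print(f"~${30 * tts_cost_usd(30):.2f}/month per user")  # roughly $10/month

Roughly ten dollars of TTS per active user per month is hard to absorb in a consumer app priced at a similar level.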

skottenborg
0 replies
22m

"(I work at OpenAI.)"

Ah yes, also known as being co-founder :)

passion__desire
0 replies
20m

Hi gdb, could you please create an assistant AI that can filter low-quality HN discussion on your comment so that it can redirect my focus to the useful stuff.

mttpgn
0 replies
23m

Licensing the emotion-intoned TTS as a standalone API is something I would look forward to seeing. Not sure how feasible that would be if, as a sibling comment suggested, it bypasses the text-rendering step altogether.

bjtitus
0 replies
18m

Is it possible to use this as a TTS model? I noticed on the announcement post that this is a single model as opposed to a text model being piped to a separate TTS model.

999900000999
0 replies
19m

How far are we from something like a helmet with ChatGPT and a video camera installed? I imagine this will be awesome for low-vision people. Imagine having a guide tell you how to walk to the grocery store and help you grocery shop without an assistant. Of course there are tons of liability issues here, but this is very impressive.

og_kalu
4 replies
51m

The most impressive part is that the voice uses the right feelings and tonal language during the presentation.

Consequences of audio-to-audio (rather than audio>text, text>audio). Being able to manipulate speech nearly as well as it manipulates text is something else. This will be a revelation for language learning, among other things. And you can interrupt it freely now!

pants2
2 replies
29m

However, this looks like it only works with speech - i.e. you can't ask it, "What's the tune I'm humming?" or "Why is my car making this noise?"

I could be wrong but I haven't seen any non-speech demos.

throwaway11460
0 replies
26m

What about the breath analysis?

cube2222
0 replies
27m

Fwiw, the live demo[0] included different kinds of breathing, and getting feedback on it.

[0]: https://youtu.be/DQacCB9tDaw?t=557

jcims
0 replies
27m

Anyone who has used elevenlabs for voice generation has found this to be the case. Voice to voice seems like magic.

simonw
1 replies
31m

That was very impressive, but it doesn't surprise me much given how good the voice mode in the ChatGPT iPhone app already is.

The new voice mode sounds better, but the current voice mode did also have inflection that made it feel much more natural than most computer voices I've heard before.

Jensson
0 replies
22m

Can you tell the current voice model what feelings and tone it should communicate with? If not, it isn't even comparable. Being able to control how it reads things is absolutely revolutionary; that is what was missing for using these AI models as voice actors.

newzisforsukas
1 replies
22m

Right to whom? To me, the voice sounds like an over-enthusiastic podcast interviewer. What's wrong with wanting computers to sound like what people think computers should sound like?

Jensson
0 replies
18m

It understands tonal language and you can tell it how you want it to talk; I have never seen a model like that before. If you want it to talk like a computer you can tell it to, as they did during the presentation. That is so much better than the old attempts at solving this.

bredren
0 replies
37m

I mention this down thread, but a symptom of a tech product of sufficient advancement is that the nature of its introduction matters less and less.

Based on the casual production of these videos, the product must be this good.

https://news.ycombinator.com/item?id=40346002

msoad
18 replies
48m

They are admitting[1] that the new model is the gpt2-chatbot that we have seen before[2]. As many highlighted there, the model is not an improvement like GPT3->GPT4. I tested a bunch of programming stuff and it was not that much better.

It's interesting that OpenAI is highlighting the Elo score instead of showing results for the many benchmarks on which all models are stuck at 50-70% success.

[1] https://twitter.com/LiamFedus/status/1790064963966370209

[2] https://news.ycombinator.com/item?id=40199715

modeless
16 replies
46m

"not that much better" is extremely impressive, because it's a much smaller and much faster model. Don't worry, GPT-5 is coming and it will be better.

talldayo
5 replies
40m

Chalmers: "GPT-5? A vastly-improved model that somehow reduces the compute overhead while providing better answers with the same compute architecture? At this time of year? In this kind of market?"

Skinner: "Yes."

Chalmers: "May I see it?"

Skinner: "No."

og_kalu
1 replies
28m

GPT-3 was released in 2020 and GPT-4 in 2023. Now we all expect 5 sooner than that but you're acting like we've been waiting years lol.

skepticATX
0 replies
22m

[delayed]

AaronFriel
1 replies
33m

It has only been a little over one year since GPT-4 was announced, and it was at the time the largest and most expensive model ever trained. It might still be.

Perhaps it's worth taking a beat and looking at the incredible progress in that year, and acknowledge that whatever's next is probably "still cooking".

Even Meta is still baking their 400B parameter model.

bamboozled
0 replies
22m

Legit love progress

pwdisswordfishc
0 replies
33m

Incidentally, this dialogue works equally well, if not better, with David Chalmers versus B.F. Skinner, as with the Simpsons characters.

mupuff1234
5 replies
42m

And how can one be so sure of that?

Seems to me that performance is converging and we might not see a significant jump until we have another breakthrough.

diego_sandoval
1 replies
32m

Seems to me that performance is converging

It doesn't seem that way to me. But even if it did, video generation also seemed kind of stagnant before Sora.

In general, I think The Bitter Lesson is the biggest factor at play here, and compute power is not stagnating.

drawnwren
0 replies
26m

Compute power is not stagnating, but the availability of training data is. It's not like there's a second Stack Overflow or Reddit to scrape.

aantix
1 replies
27m

The use of AI in the research of AI accelerates everything.

thefaux
0 replies
20m

I'm not sure of this. The jury is still out on most AI tools. Even if it is true, it may be in a kind of strange reverse way: people innovating by asking what AI can't do and directing their attention there.

scarmig
0 replies
36m

Yeah. There are lots of things we can do with existing capabilities, but in terms of progressing beyond them all of the frontier models seem like they're a hair's breadth from each other. That is not what one would predict if LLMs had a much higher ceiling than we are currently at.

I'll reserve judgment until we see GPT5, but if it becomes just a matter of who best can monetize existing capabilities, OAI isn't the best positioned.

moomoo11
2 replies
29m

I really hope GPT5 is good. GPT4 sucks at programming.

verdverm
1 replies
26m

Look to a specialized model instead of a general purpose one

moomoo11
0 replies
20m

Any suggestions? Thanks

I have tried Phind, and for anything beyond mega-junior-tier questions it suffers as well and gives bad answers.

TIPSIO
0 replies
43m

Obviously given enough time there will always be better models coming.

But I am not convinced it will be another GPT-4 moment. It seems like a big focus on tacking together clever multi-modal tricks vs. straight-up better intelligence.

Hope they prove me wrong!

cube2222
0 replies
28m

I think the live demo that happened on the livestream is best to get a feel for this model[0].

I don't really care whether it's stronger than gpt-4-turbo or not. The direct real-time video and audio capabilities are absolutely magical and stunning. The responses in voice mode are now instantaneous, you can interrupt the model, you can talk to it while showing it a video, and it understands (and uses) intonation and emotion.

Really, just watch the live demo. I linked directly to where it starts.

Importantly, this makes the interaction a lot more "human-like".

[0]: https://youtu.be/DQacCB9tDaw?t=557

skilled
9 replies
55m

Parts of the demo were quite choppy (latency?) so this definitely feels rushed in response to Google I/O.

Other than that, looks good. The desktop app is great, but I didn’t see any mention of being able to use your own API key, so open-source projects might still be needed.

The biggest thing is bringing GPT-4 to free users, that is an interesting move. Depending on what the limits are, I might cancel my subscription.

Jordan-117
2 replies
50m

Seems like it was picking up on the audience reaction and stopping to listen.

To me the more troubling thing was the apparent hallucination (saying it sees the equation before he wrote it, commenting on an outfit when the camera was down, describing a table instead of his expression), but that might have just been latency awkwardness. Overall, the fast response is extremely impressive, as is the new emotional dimension of the voice.

sebastiennight
0 replies
46m

Aha, I think I saw the trick for the live demo: every time they used the "video feed", they did prompt the model specifically by saying:

- "What are you seeing now"

- "I'm showing this to you now"

etc.

The one time he didn't prime the model to take a snapshot this way was the time the model saw the "table" (an old snapshot, since the phone was on the table/pointed at the table), so that might be the reason.

ayhanfuat
0 replies
43m

Commenting on the outfit was very weird indeed. Greg Brockman's demo includes some outfit related questions (https://twitter.com/gdb/status/1790071008499544518). It does seem very impressive though, even if they polished it on some specific tasks. I am looking forward to showing my desktop and asking questions.

tailspin2019
1 replies
46m

Regarding the limits, I recently found that I was hitting limits very quickly on GPT-4 on my ChatGPT Plus plan.

I’m pretty sure that wasn’t always the case - it feels like somewhere along the line the allowed usage was reduced, unless I’m imagining it. It wouldn’t be such a big deal if there was more visibility into my current usage compared to my total “allowance”.

I ended up upgrading to ChatGPT Team which has a minimum of 2x users (I now use both accounts) but I resented having to do this - especially being forced to pay for two users just to meet their arbitrary minimum.

I feel like I should not be hitting limits on the ChatGPT Plus paid plan at all based on my usage patterns.

I haven’t hit any limits on the Team plan yet.

I hope they continue to improve the paid plans and become a bit more transparent about usage limits/caps. I really do not mind paying for this (incredible) tech, but the way it’s being sold currently is not quite right and feels like paid users get a bit of a raw deal in some cases.

I have API access but just haven’t found an open source client that I like using as much as the native ChatGPT apps yet.

emporas
0 replies
38m

I use GPT from the API in Emacs, it's wonderful. Gptel is the package.

Although API access through Groq to Llama 3 (8B and 70B) is so much faster that I cannot stand how slow GPT is anymore. It is slow; still a very capable model, but only marginally better than open-source alternatives.
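
For anyone who wants to try the Groq speed outside Emacs, here's a minimal sketch using the openai Python SDK pointed at Groq's OpenAI-compatible endpoint. The base URL and model id are the ones Groq documents at the time of writing; verify them before relying on this.

    # Minimal sketch: Llama 3 70B on Groq via its OpenAI-compatible API.
    # Base URL and model id per Groq's docs at the time of writing.
    from openai import OpenAI

    client = OpenAI(
        api_key="YOUR_GROQ_API_KEY",                # placeholder
        base_url="https://api.groq.com/openai/v1",  # Groq's OpenAI-compatible endpoint
    )

    resp = client.chat.completions.create(
        model="llama3-70b-8192",
        messages=[{"role": "user", "content": "Summarize what tail recursion is in two sentences."}],
    )
    print(resp.choices[0].message.content)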

dharma1
1 replies
35m

what's the download link for the desktop app? can't find it

mpeg
0 replies
24m

Seems like it might not be available for everyone? My ChatGPT Plus doesn't show anything new, and I also can't find the desktop app.

russdill
0 replies
35m

They need to fade the audio or add some vocal cue when it's being interrupted. It makes it sound like it's losing connection. What'll be really impressive is when it intentionally starts interrupting you.

Jensson
0 replies
44m

Parts of the demo were quite choppy (latency?) so this definitely feels rushed in response to Google I/O.

It just stops the audio feed when it detects sound, instead of an AI detecting when it should speak, so that part is horrible, yeah. A full AI conversation would detect the natural pauses where you give it room to speak, or when you try to take the floor from it by interrupting; here it was just some dumb script that shuts it off when it hears sound.

But it is still very impressive in all the other parts; that voice is really good.

Edit: If anyone from OpenAI reads this, at least fade out the voice quickly instead of chopping it. Hard-chopping audio doesn't sound good at all; many experienced this presentation as extremely buggy because of it.

MP_1729
7 replies
39m

This thing continues to reinforce my skepticism about AI scaling laws and the broad AI semiconductor capex spending.

1. OpenAI is still working on GPT-4-level models, more than 14 months after the launch of GPT-4 and after more than $10B in capital raised.

2. The rate at which token prices are collapsing is bizarre. Now a (bit) better model for 50% of the price. How do people seriously expect these foundation model companies to make substantial revenue? Token volume needs to double just for revenue to stand still. Since the GPT-4 launch, token prices have been falling 84% per year!! Good for mankind, but crazy for these companies.

3. Maybe I am an asshole, but where are my agents? I mean, good for the consumer use case. Let's hope the rumors that Apple is deploying ChatGPT with Siri are true; these features will help a lot. But I wanted agents!

4. These drops in cost are good for the environment! No reason to expect them to stop here.

spacebanana7
0 replies
22m

Sam Altman gave the impression that foundation models would be a commodity on his appearance in the All in Podcast, at least in my read of what he said.

The revenue will likely come from application layer and platform services. ChatGPT is still much better tuned for conversation than anything else in my subjective experience and I’m paying premium because of that.

Alternatively it could be like search - where between having a slightly better model and getting Apple to make you the default, there’s an ad market to be tapped.

ldjkfkdsjnv
0 replies
29m

Yeah, I'm also getting suspicious. Also, all of the models (Opus, Llama 3, GPT-4, Gemini Pro) are converging to similar levels of performance. If the scaling hypothesis were true, we would see a greater divergence in model performance.

htrp
0 replies
33m

Did we ever get confirmation whether GPT-4 was a fresh training run vs. increasingly complex training on more tokens on top of the base GPT-3 models?

hn_throwaway_99
0 replies
21m

I'm ceaselessly amazed at people's capacity for impatience. I mean, when GPT 4 came out, I was like "holy f, this is magic!!" How quickly we get used to that magic and demand more.

Especially since this demo is extremely impressive given the voice capabilities, yet still the reaction is, essentially, "But what about AGI??!!" Seriously, take a breather. Never before in my entire career have I seen technology advance at such a breakneck speed - don't forget transformers were only invented 7 years ago. So yes, there will be some ups and downs, but I couldn't help but laugh at the thought that "14 months" is seen as a long time...

hehdhdjehehegwv
0 replies
28m

This is why I think Meta has been so shrewd in their “open” model approach. I can run Llama3-70B on my local workstation with an A6000, which, after the up-front cost of the card, is just my electricity bill.

So despite all the effort and cost that goes into these models, you still have to compete against a “free” offering.

Meta doesn’t sell an API, but they can make it harder for everybody else to make money on it.

adtac
0 replies
19m

Token volume needs to double just for revenue to stand still

Profits are the real metric. Token volume doesn't need to double for profits to stand still if operational costs go down.

IanCal
0 replies
20m

Tbf, GPT-4 level seems useful and better than almost everything else (or close if not). The more important barriers for use in applications have been cost, throughput, and latency. Oh, and modalities, which have expanded hugely.

christianqchung
5 replies
51m

Does anyone know how they're doing the audio part where Mark breathes too hard? Does his breathing get turned into all-caps text (AA EE OO) that GPT-4o then interprets as him breathing too hard, or is there something more going on?

modeless
2 replies
47m

There is no text. The model ingests audio directly and also outputs audio directly.

dclowd9901
1 replies
34m

Is it a stretch to think this thing could accurately "talk" with animals?

jamilton
0 replies
21m

Yes? Why would it be able to do that?

Jordan-117
0 replies
48m

That's how it used to do it, but my understanding is that this new model processes audio directly. If it were a music generator, the original would have generated sheet music to send to a synthesizer (text to speech), while now it can create the raw waveform from scratch.

GalaxyNova
0 replies
50m

It can natively interpret voice now.

syntaxing
4 replies
53m

I admit I drink the koolaid and love LLMs and their applications. But damn, the way it responds in the demo gave me goosebumps in a bad way. Like an uncanny valley instinct kicks in.

wslack
0 replies
21m

It should do that, because it's still not actually an intelligence. It's a tool that is figuring out what to say so that the response sounds intelligent - and it will often succeed!

bbconn
0 replies
22m

Yeah it made me realize that I actually don't want a human-like conversational bot (I have actual humans for that). Just teach me javascript like a robot.

_Parfait_
0 replies
43m

You're watching the species be reduced to an LLM.

TheSockStealer
0 replies
24m

I also thought the screwups, although minor, were interesting. Like when it thought his face was a desk because it did not update the image it was "viewing". It is still not perfect, which made the whole thing more believable.

modeless
4 replies
55m

As far as I'm concerned this is the new best demo of all time. This is going to change the world in short order. I doubt they will be ready with enough GPUs for the demand the voice+vision mode is going to get, if it's really released to all free users.

Now imagine this in a $16k humanoid robot, also announced this morning: https://www.youtube.com/watch?v=GzX1qOIO1bE The future is going to be wild.

andy99
3 replies
49m

Really? If this was Apple it might make sense; for OpenAI it feels like a demo that's not particularly aligned with their core competency (at least by reputation) of building the most performant AI models. Or put another way, it says to me they're done building models and are now wading into territory where there are strong incumbents.

All the recent OpenAI talk had me concerned that the tech has peaked for now and that expectations are going to be reset.

modeless
1 replies
37m

What strong incumbents are there in conversational voice models? Siri? Google Assistant? This is in a completely different league. I can see from the reaction here that people don't understand. But they will when they try it.

fidotron
0 replies
24m

What Siri, Google Assistant, Alexa and ChatGPT have in common is the perception that over time the same thing actually gets worse.

Whether that's real or not is a reasonably interesting question, because it's possible that all that happens as these things progress is that our perception of how they should be advances. My gut feeling is it has been a bit of both, in the sense that the decline is real, and we also expect things to improve.

Who can forget Google demoing their AI making a call to a restaurant that they showed at I/O many years ago? Everyone, apparently.

golol
0 replies
29m

What OpenAI has done time and time again is completely change the landscape when the competitors have caught up and everyone thinks their lead is gone. They made image generation a thing. When GPT-3 became outdated they released ChatGPT. Instead of trying to keep DALL-E competitive they released Sora. Now they change the game again with live audio+video.

cs702
4 replies
36m

The usual critics will quickly point out that LLMs like GPT-4o still have a lot of failure modes and suffer from issues that remain unresolved. They will point out that we're reaping diminishing returns from Transformers. They will question the absence of a "GPT-5" model. And so on -- blah, blah, blah, stochastic parrots, blah, blah, blah.

Ignore the critics. Watch the demos. Play with it.

This stuff feels magical. Magical. It makes the movie "Her" look like it's no longer in the realm of science fiction but in the realm of incremental product development. HAL's unemotional monotone in Kubrick's "2001: A Space Odyssey" feels... primitive by comparison. I'm impressed at how well this works.

Well-deserved congratulations to everyone at OpenAI!

aftbit
1 replies
29m

Who cares? This stuff feels magical. Magical!

On one hand, I agree - we shouldn't diminish the very real capabilities of these models with tech skepticism. On the other hand, I disagree - I believe this approach is unlikely to lead to human-level AGI.

Like so many things, the truth probably lies somewhere between the skeptical naysayers and the breathless fanboys.

CamperBob2
0 replies
27m

On the other hand, I disagree - I believe this approach is unlikely to lead to human-level AGI.

You might not be fooled by a conversation with an agent like the one in the promo video, but you'd probably agree that somewhere around 80% of people could be. At what percentage would you say that it's good enough to be "human-level?"

CamperBob2
1 replies
31m

Imagine what an unfettered model would be like. 'Ex Machina' would no longer be a software-engineering problem, but just another exercise in mechanical and electrical engineering.

The future is indeed here... and it is, indeed, not equitably distributed.

aftbit
0 replies
29m

Or, from the Zones of Thought series, Applied Theology: the study of communication with, and creation of, superhuman intelligences that might as well be gods.

yumraj
3 replies
46m

In the first video the AI seems excessively chatty.

hipadev23
2 replies
35m

chatGPT desperately needs a "get to the fucking point" mode.

tomashubelbauer
0 replies
27m

Seriously. I've had to spell out in twelve different ways, with examples, in the custom instructions that it should just answer, to make it at least somewhat usable. And it still "forgets" sometimes.

chatcode
0 replies
21m

It does, that's "custom instructions".
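
If you're hitting the API rather than ChatGPT, the rough equivalent of custom instructions is a system message. A minimal sketch; the instruction wording is just an example to tune to taste.

    # Minimal sketch: enforcing terse answers with a system message.
    # Assumes OPENAI_API_KEY is set in the environment; the prompt text is an example.
    from openai import OpenAI

    client = OpenAI()

    resp = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system",
             "content": "Answer in at most three sentences. No preamble, no recap."},
            {"role": "user",
             "content": "How do I undo the last git commit but keep the changes?"},
        ],
    )
    print(resp.choices[0].message.content)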

ralusek
3 replies
55m

Can't find info on which of these new features are available via the API.

tazu
2 replies
54m

Developers can also now access GPT-4o in the API as a text and vision model. GPT-4o is 2x faster, half the price, and has 5x higher rate limits compared to GPT-4 Turbo. We plan to launch support for GPT-4o's new audio and video capabilities to a small group of trusted partners in the API in the coming weeks.

OutOfHere
0 replies
30m

It is not listed as of yet, but it does work if you punch in gpt-4o. I will stick with gpt-4-0125-preview for now because gpt-4o seems majorly prone to hallucinations, whereas gpt-4-0125-preview isn't.
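
For anyone else trying this: "punching in gpt-4o" just means passing it as the model id. A minimal sketch with the openai Python SDK; the prompt is arbitrary.

    # Minimal sketch: calling gpt-4o by model id directly.
    # Swap in "gpt-4-0125-preview" to compare, as the comment above suggests.
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    resp = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": "Say hello in one short sentence."}],
    )
    print(resp.choices[0].message.content)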

hubraumhugo
3 replies
53m

The movie Her has just become reality

speedgoose
1 replies
46m

It’s getting closer. A few years ago the old Replika AI was already quite good as a romantic partner, especially when you started your messages with a * character to force OpenAI GPT-3 answers. You could do sexting that OpenAI will never let you have nowadays with ChatGPT.

aftbit
0 replies
27m

Why does OpenAI think that sexting is a bad thing? Why is AI safety all about not saying things that are disturbing or offensive, rather than not saying things that are false or unaligned?

volleygman180
0 replies
28m

I was surprised that the voice is a ripoff of the AI voice in that movie (Scarlett Johansson) too

101008
3 replies
40m

Are the employees in the demo senior leadership at OpenAI? I can understand Altman being happy with this progress, but what about the mid/lower-level employees? Didn't they watch Oppenheimer? Are they happy they are destroying humanity/work/etc. for future and not-so-future generations?

Anyone who thinks this will be like the previous work revolutions is kidding themselves. This replaces humans, and will replace them even more with each new advance. What's their plan? Live off their savings? What about family/friends? I honestly can't see this and understand how they can be so happy about it...

"Hey, we created something very powerful that will do your work for free! And it does it better than you and faster than you! Who are you? It doesn't matter, it applies to all of you!"

And considering I was thinking of having a kid next year, well, this is a no.

galdosdi
1 replies
34m

Have a kid anyway, if you otherwise really felt driven to it. Reading the tea leaves in the news is a dumb reason to change decisions like that. There's always some disaster looming; there always has been. If you raise them well they'll adapt to whatever weird future they inherit and be among the ones who help others get through it.

101008
0 replies
31m

Thanks for taking the time to answer instead of (just) downvoting. I understand your logic but I don't see a future where people can adapt to this and get through it. I honestly see a future so dark and we'll be there much sooner than we thought... when OpenAI released their first model people were talking about years before seeing real changes and look what happened. The advance is exponential...

nice_byte
0 replies
28m

"It is difficult to get a man to understand something when his salary depends on his not understanding it."

summerlight
2 replies
43m

This is really impressive engineering. I thought real-time agents would completely change the way we interact with large models, but that it would take 1-2 more years. I wonder what kind of new techniques were developed to enable this, but OpenAI is fairly secretive, so we won't get to know their secret sauce.

On the other hand, this also feels like a signal that reasoning capability has probably already plateaued at the GPT-4 level and OpenAI knows it, so they decided to focus on research that matters to delivering products rather than long-term research to unlock further general (super)intelligence.

nopinsight
1 replies
33m

Reliable agents in diverse domains need better reasoning ability and fewer hallucinations. If the rumored GPT-5 and Q* capabilities are true, such agents could become available soon after it’s launched.

summerlight
0 replies
20m

Sam has been pretty clear on denying GPT-5 rumors, so I don't think it will come anytime soon.

spacebanana7
2 replies
51m

We recognize that GPT-4o’s audio modalities present a variety of novel risks

For example, at launch, audio outputs will be limited to a selection of preset voices and will abide by our existing safety policies.

I wonder if they’ll ever allow truly custom voices from audio samples.

dkasper
1 replies
47m

I think the issue there is less of a technical one and more of an issue with deepfakes and copyright

spacebanana7
0 replies
42m

It might be possible to prove that I control my voice, or that of a given audio sample. For example by saying specific words on demand.

But yeah I see how they’d be blamed if anything went wrong, which it almost certainly would in some cases.

levocardia
2 replies
35m

As a paid user this felt like a huge letdown. GPT-4o is available to everyone so I'm paying $20/mo for...what, exactly? Higher message limits? I have no idea if I'm close to the message limits currently (nor do I even know what they are). So I guess I'll cancel, then see if I hit the limits?

I'm also extremely worried that this is a harbinger of the enshittification of ChatGPT. Processing video and audio for all ~200 million users is going to be extravagantly expensive, so my only conclusion is that OpenAI is funding this by doubling down on payola-style corporate partnerships that will result in ChatGPT slyly trying to mention certain brands or products in our conversations [1].

I use ChatGPT every day. I love it. But after watching the video I can't help but think "why should I keep paying money for this?"

[1] https://www.adweek.com/media/openai-preferred-publisher-prog...

muttantt
0 replies
34m

So... cancel the subscription?

CodeCrusader
0 replies
30m

Completely agree. None of the updates apply to any of my use cases; a disappointment.

csjh
2 replies
46m

I wonder if this is what the "gpt2-chatbot" that was going around earlier this month was

lambdaba
0 replies
41m

yes it was

AndyNemmity
0 replies
32m

it was

bearjaws
2 replies
36m

OAI just made an embarrassment of Google's fake demo earlier this year. Given how this was recorded, I am pretty certain it's authentic.

hehdhdjehehegwv
0 replies
24m

This feature has been in the iOS app for a while now, just really slow and without some of the new vision aspects. This seems like a version 2 to me.

CivBase
0 replies
25m

I don't doubt this is authentic, but if they really wanted to fake those demos, it would be pretty easy to do using pre-recorded lines and staged interactions.

simonw
0 replies
28m

Oh interesting, does that mean languages other than English won't be paying such a large penalty in terms of token lengths?

With previous tokenizers there was a notable increase in the number of tokens needed to represent non-English sentences: https://simonwillison.net/2023/Jun/8/gpt-tokenizers/

minimaxir
0 replies
53m

For posterity, GPT-3.5/4's tokenizer had a ~100k vocabulary. The benefit of a larger tokenizer vocabulary is more efficient tokenization (and therefore cheaper/faster), but with massive diminishing returns: the larger vocabulary makes the model more difficult to train but tends to reduce token usage by 10-15%.
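
If you want to see the difference concretely, tiktoken ships both vocabularies; a quick sketch (the encoding names are the ones tiktoken uses for GPT-4 and GPT-4o at the time of writing, and the sample strings are arbitrary):

    # Compare token counts under the GPT-4 and GPT-4o tokenizers.
    # cl100k_base ~100k vocab (GPT-3.5/4); o200k_base ~200k vocab (GPT-4o).
    import tiktoken

    old_enc = tiktoken.get_encoding("cl100k_base")
    new_enc = tiktoken.get_encoding("o200k_base")

    samples = {
        "English": "The quick brown fox jumps over the lazy dog.",
        "Japanese": "こんにちは、世界。",
        "Hindi": "मेरा नाम ग्रेग है।",
    }

    for lang, text in samples.items():
        print(f"{lang}: {len(old_enc.encode(text))} -> {len(new_enc.encode(text))} tokens")

The gap is much larger for the non-English samples, which is the penalty discussed up-thread.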

MisterBiggs
2 replies
42m

I've been waiting to see someone drop a desktop app like they showcased. I wonder how long until it is normal to have an AI looking at your screen the entire time your machine is unlocked. Answering contextual questions and maybe even interjecting if it notices you made a mistake and moved on.

doomroot13
1 replies
36m

That seems to be what Microsoft is building and will reveal as a new Windows feature at BUILD '24. Not too sure about the interjecting aspect but ingesting everything you do on your machine so you can easily recall and search and ask questions, etc. AI Explorer is the rumored name and will possibly run locally on Qualcomm NPUs.

ukuina
0 replies
30m

Yes, this is Windows AI Explorer.

Jimmc414
2 replies
48m

Big questions are (1) when is this going to be rolled out to paid users? (2) what is the remaining benefit of being a paid user if this is rolled out to free users? (3) Biggest concern is will this degrade the paid experience since GPT-4 interactions are already rate limited. Does OpenAI have the hardware to handle this?

Edit: according to @gdb this is coming in "weeks"

https://twitter.com/gdb/status/1790074041614717210

onemiketwelve
0 replies
41m

thanks, I was confused because the top of the page says to try now when you cannot in fact try it at all

Tenoke
0 replies
26m

what is the remaining benefit of being a paid user if this is rolled out to free users?

It says so right in the post

We are making GPT-4o available in the free tier, and to Plus users with up to 5x higher message limits

The limits are much lower for free users.

GalaxyNova
2 replies
57m

It is really cool that they are bringing this to free users. It does make me wonder what justifies ChatGPT plus now though...

pantsforbirds
0 replies
48m

I assume the desktop app with voice and vision is rolling out to plus users first?

InfiniteVortex
0 replies
52m

They stated that they will be announcing something new that is on the next frontier (or close to it, IIRC) soon, so there will definitely be an incentive to pay, because it will be something better than GPT-4o.

w-m
1 replies
47m

Gone are the days of copy-pasting to/from ChatGPT all the time, now you just share your screen. That's a fantastic feature, in how much friction that removes. But what an absolute privacy nightmare.

With ChatGPT having a very simple text+attachment in, text out interface, I felt absolutely in control of what I tell it. Now when it's grabbing my screen or a live camera feed, that will be gone. And I'll still use it, because it's just so damn convenient?

baby_souffle
0 replies
31m

Now when it's grabbing my screen or a live camera feed, that will be gone. And I'll still use it, because it's just so damn convenient?

Presumably you'll have a way to draw a bounding box around what you want to show or limit to just a particular window the same way you can when doing a screen share w/ modern video conferencing?

sebastiennight
1 replies
44m

Anyone who watched the OpenAI livestream: did they "paste" the code after hitting CTRL+C ? Or did the desktop app just read from the clipboard?

Edit: I'm asking because of the obvious data security implications of having your desktop app read from the clipboard _in the live demo_... That would definitely put a damper on my fanboyish enthusiasm about that desktop app.

golol
0 replies
27m

To me it looked like they used one command that did both the copy and the paste into ChatGPT.

rvz
1 replies
52m

Given that they are moving all these features to free users, it tells us that GPT-5 is around the corner and is significantly better than their previous models.

margorczynski
0 replies
37m

Or maybe it is a desperation move after Llama 3 got released and the free mode will have such tight constraints that it will be unusable for anything a bit more serious.

peppertree
1 replies
45m

Just like that Google is on back foot again.

tempsy
0 replies
39m

Considering the stock pumped following the presentation, the market doesn't seem particularly concerned with what OpenAI released at all.

karaterobot
1 replies
39m

That first demo video was impressive, but then it ended very abruptly. It made me wonder if the next response was not as good as the prior ones.

dclowd9901
0 replies
29m

Extremely impressive -- hopefully there will be an option to color all responses with an underlying brevity. It seemed like the AI just kept droning on and on.

causal
1 replies
49m

Clicking the "Try it on ChatGPT" link just takes me to GPT-4 chat window. Tried again in an incognito tab (supposing my account is the issue) and it just takes me to 3.5 chat. Anyone able to use it?

101008
0 replies
46m

Same here and also I can't hear audio in any of the videos on this page. Weird.

brainer
1 replies
29m

OpenAI's Mission and the New Voice Mode of GPT-4

• Sam Altman, the CEO of OpenAI, emphasizes two key points from their recent announcement. Firstly, he highlights their commitment to providing free access to powerful AI tools, such as ChatGPT, without advertisements or restrictions. This aligns with their initial vision of creating AI for the benefit of the world, allowing others to build amazing things using their technology. While OpenAI plans to explore commercial opportunities, they aim to continue offering outstanding AI services to billions of people at no cost.

• Secondly, Altman introduces the new voice and video mode of GPT-4, describing it as the best compute interface he has ever experienced. He expresses surprise at the reality of this technology, which provides human-level response times and expressiveness. This advancement marks a significant change from the original ChatGPT and feels fast, smart, fun, natural, and helpful. Altman envisions a future where computers can do much more than before, with the integration of personalization, access to user information, and the ability to take actions on behalf of users.

https://blog.samaltman.com/gpt-4o

simonw
0 replies
20m

Please don't post AI-generated summaries here.

simonw
0 replies
23m

That's not a well sourced story: it doesn't say where the numbers come from. Also:

"However, ChatGPT consumes a lot of energy in the process, up to 25 times more than a Google search."

That's comparing a Large Language Model prompt to a search query.

CivBase
1 replies
34m

Those voice demos are cool but having to listen to it speak makes me even more frustrated with how these LLMs will drone on and on without having much to say.

For example, in the second video the guy explains how he will have it talk to another "AI" to get information. Instead of just responding with "Okay, I understand", it started talking about how interesting the idea sounded. And as the demo went on, both "AIs" kept adding unnecessary commentary about the scenes.

I would hate having to talk with these things on a regular basis.

golol
0 replies
24m

Yeah, at some point the style and tone of these assistants needs to seriously change. I can imagine a lot of their RLHF and instruct processes emphasize sounding good vs. being good too much.

taytus
0 replies
22m

the OpenAI live stream was quite underwhelming...

tailspin2019
0 replies
33m

Does anyone with a paid plan see anything different in the ChatGPT iOS app yet?

Mine just continues to show “GPT 4” as the model - it’s not clear if that’s now 4o or there is an app update coming…

sourcecodeplz
0 replies
52m

It is quite nice how they keep giving premium features for free, after a while. I know openai is not open and all but damn, they do give some cool freebies.

sn_master
0 replies
43m

This is every romance scammer's dreams come true...

skepticATX
0 replies
42m

Very impressive demo, but not really a step change in my opinion. The hype from OpenAI employees was on another level, way more than was warranted in my opinion.

Ultimately, the promise of LLM proponents is that these models will get exponentially smarter - this hasn't been borne out yet. So from that perspective, this was a disappointing release.

If anything, this feels like a rushed release to match what Google will be demoing tomorrow.

pachico
0 replies
50m

jeez, that model really speaks a lot! I hope there's a way to make it more straight to the point rather than radio-like.

mickg10
0 replies
22m

So, babelfish soon?

mellosouls
0 replies
47m

Very, very impressive for a "minor" release demo. The capabilities here would have looked shockingly advanced just 5 years ago.

Universal translator, pair programmer, completely human sounding voice assistant and all in real time. Scifi tropes made real.

But: interesting to see next how it actually performs IRL, with real latency and without cherry-picking. No snark, it was great, but we need to see real-world performance. Also, what the benefits are for subscribers if all this is going to be free...

lxgr
0 replies
48m

Will this include image generation for the free tier as well? That's a big missing feature in OpenAI's free tier compared to Google and Meta.

lagt_t
0 replies
50m

Universal real time translation is incredibly dope.

I hate video players without volume control.

krunck
0 replies
34m

So GPT-4o can do voice intonation? Great. Nice work.

Still, it sounds like some PR drone selling a product. Oh wait....

jrflowers
0 replies
25m

I like the robot typing at the keyboard that has B as half of the keys and my favorite part is when it tears up the paper and behind it is another copy of that same paper

joshstrange
0 replies
39m

Looking forward to trying this via ChatGPT. As always OpenAI says "now available" but refreshing or logging in/out of ChatGPT (web and mobile) don't cause GPT-4o to show up. I don't know why I find this so frustrating. Probably because they don't say "rolling out" they say things like "try it now" but I can't even though I'm a paying customer. Oh well...

jawiggins
0 replies
47m

I hope when this gets to my iphone I can use it to set two concurrent timers.

ilaksh
0 replies
31m

Are there any remotely comparable open source models? Fully multimodal, audio-to-audio?

hu3
0 replies
25m

That they are offering more features for free fits my theory that, just like search, state-of-the-art AI will soon be "free", in exchange for personal information/ads.

hmmmhmmmhmmm
0 replies
52m

With the news that Apple and OpenAI are closing / just closed a deal for iOS 18, it's easy to speculate we might be hearing about that exciting new model at WWDC...

gallerdude
0 replies
26m

Interesting that they didn't mention a bump in capabilities - I wrote an LLM benchmark a few weeks ago, and before, GPT-4 could solve Wordle about ~48% of the time.

Currently with GPT-4o, it's easily clearing 60% - while blazing fast, and half the cost. Amazing.

dom96
0 replies
26m

I can't help but feel a bit let down. The demos felt pretty cherry picked and still had issues with the voice getting cut off frequently (especially in the first demo).

I've already played with the vision API, so that doesn't seem all that new. But I agree it is impressive.

That said, watching back a Windows Vista speech recognition demo[1] I'm starting to wonder if this stuff won't have the same fate in a few years.

1 - https://www.youtube.com/watch?v=VMk8J8DElvA

dkga
0 replies
42m

That can “reason”?

delichon
0 replies
30m

Won't this make pretty much all of the work to make a website accessible go away, as it becomes cheap enough? Why struggle to build parallel content for the impaired when it can be generated just in time as needed?

deegles
0 replies
29m

what's the path from LLMs to "true" general AI? is it "only" more training power/data or will they need a fundamental shift in architecture?

dbcooper
0 replies
34m

Question for you guys: is there a model that can take figures (graphs) from scientific publications and combine image analysis with picking up the data-point symbol descriptions to analyse the trends?

crindy
0 replies
51m

Very impressed by the demo where it starts speaking French in error, then laughs with the user about the mistake. Such a natural recovery.

catchnear4321
0 replies
17m

window dressing

his love for yud is showing.

candiodari
0 replies
37m

I wonder if the audio stuff works like VITS. Do they just encode the audio as tokens and input the whole thing? Wouldn't that make the effective context size a lot smaller?

One does notice that context size is noticeably absent from the announcement ...

bredren
0 replies
42m

It is notable OpenAI did not need to carefully rehearse the talking points of the speakers. Or even do the kind of careful production quality seen in a lot of other videos.

The technology product is so good and so advanced it doesn't matter how the people appear.

Zuck tried this in his video countering the Vision Pro, but it did not have the authentic "not really rehearsed or produced" feel of this at all. If you watch that video and compare it with this one you can see the difference.

Very interesting times.

blixt
0 replies
19m

GPT-4o being a truly multimodal model is exciting; it does open the door to more interesting products. I was curious about the new tokenizer, which uses far fewer tokens for non-English, but also 1.1x fewer tokens for English, so I'm wondering if this means each token can now take on more possible values than before? That might make sense given that they now also have audio and image output tokens? https://openai.com/index/hello-gpt-4o/

I wonder what "fewer tokens" really means then, without context on the size of each token being raised. It's a bit like saying my JPEG image is now using 2x fewer words after I switched from a 32-bit to a 64-bit architecture, no?
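
One way to square "fewer tokens" with a bigger vocabulary is bits per token: doubling the vocabulary adds roughly one bit of capacity per token, so a ~1.1x shorter sequence can still carry about the same information. A rough sketch, using approximate round-number vocabulary sizes rather than exact counts:

    # Rough information-capacity comparison of the two tokenizer vocabularies.
    # Vocabulary sizes are approximate round numbers, not exact counts.
    import math

    bits_old = math.log2(100_000)   # ~16.6 bits per token (cl100k-sized vocab)
    bits_new = math.log2(200_000)   # ~17.6 bits per token (o200k-sized vocab)

    # If English now needs ~1.1x fewer tokens, total bits change roughly by:
    ratio = (bits_new / 1.1) / bits_old
    print(f"{bits_old:.1f} vs {bits_new:.1f} bits/token; total-bits ratio ~{ratio:.2f}")

So by this rough measure the per-sequence information is nearly unchanged; the win is mostly in cost and latency, which are billed per token.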

banjoe
0 replies
27m

I still need to talk very fast to actually chat with ChatGPT which is annoying. You can tell they didn't fix this based on how fast they are talking in the demo.

aw4y
0 replies
40m

I don't see anything released today. Login/signup is still required, no signs of desktop app or free use on web. What am I missing?

alvaroir
0 replies
20m

I'm really impressed by this demo! Apart from the usual quality benchmarks, I'm really impressed by the latency for audio/video: "It can respond to audio inputs in as little as 232 milliseconds, with an average of 320 milliseconds, which is similar to human response"... If true at scale, what could be the "tricks" they're using to achieve that?!

altcognito
0 replies
41m

Expressing a human-like emotional response every single time you interact with it is pretty annoying.

In general, trying to push that this is a human being is probably "unsafe", but that hurts the marketing.

TrueDuality
0 replies
49m

Weird, visiting the page crashed my graphics driver in Firefox.

Thaxll
0 replies
19m

It's pretty impressive, although I don't like the voice / tone, I prefer something more neutral.

PoignardAzur
0 replies
52m

Holy crap, the level of corporate cringe of that "two AIs talk to each other" scene is mind-boggling.

It feels like a pretty strong illustration of the awkwardness of getting value from recent AI developments. Like, this is technically super impressive, but also I'm not sure it gives us anything we couldn't have had one year ago with GPT-4 and ElevenLabs.

OliverM
0 replies
47m

This is impressive, but they just sound so _alien_, especially to this non-U.S. English speaker (to the point of being actively irritating to listen to). I guess picking up on social cues communicating this (rather than express instruction or feedback) is still some time away.

It's still astonishing to consider what this demonstrates!

Negitivefrags
0 replies
29m

I found these videos quite hard to watch. There is a level of cringe that I found a bit unpleasant.

It’s like some kind of uncanny valley of human interaction that I don’t get on nearly the same level with the text version.

MBCook
0 replies
43m

Why must every website put stupid stuff that floats above the content and can’t be dismissed? It drives me nuts.

DataDaemon
0 replies
34m

Now, say goodbye to call centers.

CosmicShadow
0 replies
22m

In the video where the 2 AIs sing together, it starts to get really cringey and weird, to the point where it literally sounds like it's being faked by 2 voice actors off-screen with literal guns to their heads trying not to cry. Did anyone else get that impression?

The tonal talking was impressive, but man that part was like, is someone being tortured or forced against their will?

BoumTAC
0 replies
52m

Did they provide the rate limit for free users?

Because I have the Plus membership, which is expensive ($25/month).

But if the limit is high enough (or my usage low enough), there is no point in paying that much money for me.