Veo

xianshou
56 replies
1d

To quote Twitter/X, "I wonder what OpenAI will release tomorrow and Google will release a waitlist for."

GPT-4o: out

Veo: waitlist

Admittedly this is impressive and the direct comp would be Sora, which isn't out, but sometimes the caricature is very close to the truth.

skepticATX
31 replies
1d

OpenAI hardly released gpt-4o. The demo yesterday was clearly a rushed response to I/O. It’s quite possible that Google will ship multi-modality features faster than OpenAI will.

Difwif
26 replies
23h37m

What do you mean? Everyone has access to the gpt-4o model right now through ChatGPT and the API. Sure we don't have voice-to-voice but we have a lot more than what Google has promised.

hbn
16 replies
23h30m

How do I get access? I just checked my app and the Premium upgrade says it will unlock GPT-3.5 and GPT-4, so I assume my version is still the old one.

All my apps are updated in the App Store too.

theresistor
3 replies
23h7m

To add a counterbalance, I just checked in the app and on the website on a non-paid account, and I too do NOT have GPT-4o.

hbn
2 replies
23h4m

Everyone who says they have access in the replies to my comment seems to be a paid user. So maybe it's only rolling out to them first.

htrp
1 replies
22h25m

I expect they have to offer the paid users something.

hbn
0 replies
22h22m

Paid users get like a 5x higher rate limit iirc

TecoAndJix
2 replies
22h38m

I have a paid account and can do voice to voice on the iOS app as of last night.

hbn
1 replies
22h22m

The realtime one they showed yesterday or the one that's existed forever where it's just a voice-to-text input and TTS output taking turns?

TecoAndJix
0 replies
20h24m

I feel silly now. I downloaded the app after the announcement (I'm a desktop user) and it looked identical to the one they show in the sarcasm video. When I asked it, I was told it was not the new feature announced yesterday. Still a lot of fun!

Edit - it does list the new model in my app at least

DeRock
2 replies
23h23m

I just checked, there was an iOS app update available and it enabled it. I'd check again if there's a new update (version 1.2024.129). Or you could use the website.

hbn
1 replies
23h20m

I'm on the same version and don't see anything different

The website also only has a toggle for 3.5 and 4 with the Plus upgrade. Not sure if it's because I'm in Canada?

croes
0 replies
23h14m

I use their website and it's one of the three models to choose from if you're on a Plus subscription.

sib
1 replies
21h22m

In the App Store there's a new build of the iOS app as of about 3 hours ago (call it about 11am US Pacific time). It includes the GPT-4o model (at least it shows it for me).

hbn
0 replies
20h51m

Are you a paid user?

mr_mitm
0 replies
23h23m

I have premium access and I can select 4o in the dropdown menu on Android

electriclove
0 replies
23h0m

My (paid) app has it but no voice chat yet

cush
0 replies
23h12m

I have the 4o model. On premium. No voice yet

bagels
0 replies
23h25m

I have a paid account, and I didn't have to do anything to use the new model.

theresistor
3 replies
23h5m

It is not available on my (free) account in either the app or the website. So no, everyone does not have access to it.

satvikpendem
2 replies
22h14m

It's for paid users for now, not free. I have ChatGPT Pro and I can use the new model.

nialv7
1 replies
21h25m

I am a free user and I have 4o. I think it is just a gradual roll out.

satvikpendem
0 replies
13h52m

That sounds about right; it just seemed that everyone who replied above with access to the new model was a paid user.

ben_w
2 replies
22h59m

API yes, ChatGPT no (at least not for all users); I've got my own web interface for the API so I can play with the model (for all of $0.045 of API fees), but most people can't be bothered with that and will only get 4o when it rolls out as far as their specific ChatGPT account.
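
For anyone curious about the direct API route ben_w describes, it takes only a few lines. A minimal sketch using the official OpenAI Python SDK, assuming the openai package is installed and an OPENAI_API_KEY environment variable is set; the prompt string is a made-up example:

    # Minimal sketch of calling gpt-4o directly via the API.
    # Assumes the `openai` package is installed and OPENAI_API_KEY is set.
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": "Say hello in five words."}],
    )
    print(response.choices[0].message.content)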

mike_hearn
0 replies
22h47m

I have a regular ChatGPT Pro account and I have GPT-4o.

The bigger issue is that 4o without the multi-modal, new speech capabilities or desktop app isn't that different to GPT-4. And those things aren't yet launched.

ghshephard
0 replies
22h47m

Was running just fine on my ChatGPT client on iOS - full 2-way voice comms by 3:00 PM yesterday. Application was already updated.

baobabKoodaa
0 replies
21h7m

I don't have access to gpt-4o via ChatGPT

TecoAndJix
0 replies
22h38m

Posted further down the thread - I have a paid account and can do voice to voice on the iOS app as of last night.

localfirst
0 replies
22h9m

listen all these guys out here attacking Google and making outlandish/false claims

look at their linkedin pages, that will tell you why they are desperate

(hint: they bought OpenAI bags on the secondary market)

juice_bus
0 replies
23h39m

Which of these products that Google is releasing can you trust to even be around in a year or two? I'm certainly done trusting Google with new products.

buildbot
0 replies
23h39m

Without doing anything, I have access to GPT-4o in chatgpt and the api already (on a personal account, not related to work). Maybe I’m just super lucky, but it’s certainly not vaporware.

JeremyNT
0 replies
23h41m

Yeah, I think at this point it's "not if, but when", and the gap to parity is just going to keep shrinking (until/unless there's some kind of copyright/legislative barrier implemented that favors one or the other).

"We have no moat" swings both ways.

dom96
7 replies
23h50m

GPT-4o: out

Is it? I can't use it yet at least

drawnwren
4 replies
23h49m

It is. I've got it already, but I'm a bit of a gpt4 power user. I hit my rate limit biweekly or so and run up close to it every day. I'd bet maybe they prioritized people that were costing them money.

phyalow
1 replies
23h20m

Really? You have the new model, sure, I have it too, but afaik nobody has the new ultra fast and variable voice + video chat on mobile.

drawnwren
0 replies
20h46m

The original question asked if anyone had GPT-4o. You're asking a different question.

kaibee
1 replies
23h31m

It might be just by sign-up order. I signed up for pro basically as soon as I could, but I never hit limits, and only really use it once or twice a day, sometimes not at all.

drawnwren
0 replies
20h45m

Interestingly, when I use Cloudflare's Warp DNS I don't have access to it. So, it might have something to do w/ region as well?

w-m
0 replies
23h37m

GPT-4o is available for me on ChatGPT, with the text+attachment input (as a Plus user from Germany). It's crazy fast. The voice for the audio conversation in the app is still the old one, and doesn't let you interrupt it.

lordswork
0 replies
23h49m

I am also wondering how to use it...

nextworddev
4 replies
23h58m

Except gpt-4o with audio and video inputs isn’t actually out

adamtaylor_13
3 replies
23h57m

I was using it yesterday in the mobile app. Unless they just slapped the new UI on an older model.

hakanensari
0 replies
23h55m

It’s no longer there, I think?

Workaccount2
0 replies
23h52m

They just released the text chat model which still uses the same old audio interface as 4. The new audio/video chat stuff is not out yet (unless you are a very lucky early beta user).

Cyph0n
0 replies
23h53m

I think they (partially?) rolled it back. I tried out the voice input yesterday, but it’s missing from the app today.

martinesko36
1 replies
1d

This is Google for the last 5+ I/Os. They just release waitlists and demos that are leapfrogged by the time they're available to all (and shut down a few years later).

htrp
0 replies
22h19m

Cite sources?

rvz
0 replies
1d

Sora is the closest comparison to Veo, and neither is out.

It's been three months and Sora still isn't even close to being released and available.

Essentially, Google has already caught up to OpenAI with their recent responses, and it's clear that there are private OpenAI investors pushing the nonsense that Google is struggling to compete.

rvnx
0 replies
1d

"This tool isn’t available in your country yet"

resource_waste
0 replies
1d

Google Press: This is the greatest AI model yet.

Users: Lol, it won't even tell me how to draw a picture of a human because it's inappropriate.

Google flipped like a switch a few years ago. Instead of going for product quality, it seems they went full Apple marketing and now try to control the narrative on top social media.

I keep thinking: "well, it's Google, they will be the best, right?" No. I'm at the point of giving up on Google; they are not as powerful as I once thought... Hmm, seems like a good time to get into lobbying and marketing...

qwertox
0 replies
22h13m

GPT-4o: out

I don't know what's wrong with GPT-4o, but the answers I'm getting are much worse than before yesterday. It constantly repeats the entire content required to provide a seemingly "full" answer, and when it passes me the same, slightly modified Python code for the fifth time, even after it has become irrelevant to the current conversation, it really gets on my nerves.

I had such well-tuned custom instructions, which worked beautifully, and now it's as if it is ignoring most of them.

It's causing me frustration and really wasting my time when I have to wait for the unnecessarily long answers to finish.

mrkramer
0 replies
23h30m

Google is scared of what every new model can produce, they don't want drama but they always end up in some kind of media drama.

modeless
0 replies
1d

To be fair, all the voice stuff OpenAI demoed isn't released yet either.

jsheard
0 replies
1d

Then again Veo is in the same category as Sora, which isn't released either, 3 months after the reveal.

dyauspitr
0 replies
23h12m

I haven’t been able to try out 4o. The voice chat continuously says there’s too much traffic and I don’t even see a button to turn on the camera

Tenoke
0 replies
23h24m

I can't even join the waitlist from Europe while 4o is fully available here.

typpo
46 replies
23h19m

The amount of negativity in these comments is astounding. Congrats to the teams at Google on what they have built, and hoping for more competition and progress in this space.

rvz
11 replies
22h29m

You have to give Google credit, as they went against the OpenAI fanatics, the Google doomsday crowd, and some of the permanent critics (who won't disclose that they invested in OpenAI's secondary share sale) who believe that Google can't keep up.

In fact, they already did. What OpenAI announced was nothing that Google could not do already.

The top comments suggesting that Google was falling behind on Sora vs. Veo, given that both are still unavailable to use, weren't even a point worth making in the first place; just typical HN nonsense.

JumpCrisscross
5 replies
22h22m

What OpenAI announced was nothing that Google could not do already

I don’t think I’ve seen serious criticism of Google’s abilities. Apple didn’t release anything that Xerox or IBM couldn’t have done. The difference is that Xerox and IBM didn’t.

Google’s problem has always been in product follow through. In this case, I fault them for having the sole action item be a buried waitlist request and two new brands (Veo and VideoFX) for one unreleased product.

KorematsuFredt
2 replies
21h40m

Google’s problem has always been in product follow through.

Google is large enough to not care about small opportunities. It ends up focusing on bigger opportunities that only it can execute well. Google's ability to shut down products that don't work is an insult to users but a very good corporate strategy, and they deserve kudos for that.

Now, coming back to the "follow through". Google Search, Gmail, Chrome, Android, Photos, Drive, Cloud etc. all are excellent examples of Google's long term commitment to the product and constantly making things better and keeping them relevant for the market. Many companies like Yahoo! had a head start but could not keep up with their mail service.

Sure, it has shut down many small products, but that is because they were unlikely to turn into bigger opportunities. Google often integrated the best aspects of those products into its other well-established products; for example, Google Trips and Google Shopping both became part of Search.

troupo
0 replies
20h53m

Google is large enough to not care about small opportunities. It ends up focusing on bigger opportunities

that result in shittier products overall. For example, just a few months ago they cut 17 features from Google Assistant because they couldn't monetize them, sorry, because these were "small opportunities": https://techcrunch.com/2024/01/11/google-is-removing-17-unde...

all are excellent examples of Google's long term commitment to the product and constantly making things better and keeping them relevant for the market.

And here's a long list of excellent examples of Google killing products right and left because small opportunities or something: https://killedbygoogle.com/

And don't get me started on the whole Hangouts/Meet/Alo/Duo/whatever fiasco

Sure it has shut down many small products but that is because they were unlikely to turn into bigger opportunities.

Translation: because they couldn't find ways to monetize the last cent out of them

---

Edit: don't forget: The absolute vast majority of Google's money comes from selling ads. There's nothing else it is capable of doing at any significant scale. The only reason it doesn't "chase small opportunities" is because Google doesn't know how. There are a few smaller cash cows that it can keep chugging along, but they are dwarfed by the single driving force that mars everything at Google: the need to sell more and more ads and monetize the shit out of everything.

falcor84
0 replies
21h8m

coming back to the "follow through". Google Search, Gmail, Chrome, Android, Photos, Drive, Cloud etc. all are excellent examples of Google's long term commitment

Do you have any examples of something they launched in the last decade?

sangnoir
0 replies
22h10m

I don’t think I’ve seen serious criticism of Google’s abilities

Serious or not, that criticism existed on HN - and still does. I've seen many comments claiming Google has "fallen behind" on AI, sometimes with the insinuation that Google won't ever catch up due to OpenAI's apparently insurmountable lead.

aprilthird2021
0 replies
22h6m

I saw it here alone. A lot of people simply have no idea the level of research ability and skill Google, the inventor of the Transformer, has.

localfirst
2 replies
22h15m

Don't forget OpenAI edited their "AI generated" Sora videos, while Google did not here.

Where did Sora get all its training videos from again, and why won't the executives answer a simple yes/no question: "Did you scrape YouTube to train Sora?"

Google's attorneys want to know.

scarmig
0 replies
21h36m

Google does not care to start a war where every company has to form explicit legal agreements with every other company to scrape their data. Maybe if they got really desperate, but right now they have no reason to be.

TwentyPosts
0 replies
20h58m

Don't forget OpenAI edited their "AI generated" Sora videos, while Google did not here.

Wait, really? Could you point to proof for this? I'm very curious where this is coming from

septic-liqueur
0 replies
21h10m

I have no doubt about Google's capabilities in AI; my doubt lies in the productization part. I don't think they can produce something that will not be a complete mess.

CSMastermind
0 replies
20h52m

In fact, they already did.

In terms of software that's actually been released Google is still at best in third place when it comes to AI products.

I don't care what they can demo, I care what they've shipped. So far the only thing they've shipped for Veo is a waitlist.

localfirst
10 replies
22h31m

We have to take into account that this community (a good chunk have stakes in YC and a lot to gain from secondary shares in OpenAI) and platform is going to favor its own, and be aware that Sam Altman is the golden boy of YC's founder, after all.

So of course you are going to see snarky comments and straight-up denial of the competition. We saw that yesterday in the comments, with GPT-4o released in anticipation of the Gemini 2.0 (basically GPT-5) announcement expected today at Google I/O.

I'm SORA to say Veo looks much more polished, without jank.

Big congratulations to Google and their excellent AI team for not editing their AI generated videos like Sora's were.

mrbungie
2 replies
20h45m

The amount of copium in this response is astounding.

Yes, there is a noticeable negative response from HN towards Google, and there always has been, especially regarding their weird product management practices and incentives. Google hasn't launched any notable (and still surviving; Stadia being a sad example of this) consumer product or service in the last 10 years.

But to suggest there is a Sam Altman / OpenAI bias is delusional. In most posts about them there is at least some kind of skepticism or criticism towards Altman (his participation in Worldcoin and his accelerationist stance towards AGI) or his companies (OpenAI not being really open).

PS: I would say most people lurking here are just hackers (of many kinds, but still hackers), not investors with shady motives.

localfirst
1 replies
17h27m

My argument wasn't that there was a cabal of shady investors trying to influence perception here. Your observation is certainly valid that there is general disdain for Google, but specifically I'm calling out people who were blatantly telling lies, making outlandish claims, and attacking others who were simply pointing out that some of those people have financial motives (either being backed by YC or seeking to benefit from the work of others).

None of this is surprising to me and shouldn't shock you. You are literally on a site called Y Combinator. Had this been another platform without ties to investments, or one not drawing from a crowd that actively seeks to enrich themselves through participation in a narrative, this wouldn't even be a thing.

A large number of people who read my comment seem to agree, and this whole Worldcoin thing seems to me just another distraction (we've already been through why that was shady, but we are talking about something different here).

mrbungie
0 replies
16h50m

Well, you have a point. I've always thought that Hacker News <> YCombinator, but maybe the truth is in the middle. At the very least, this is food for thought.

baobabKoodaa
2 replies
21h35m

We have to take into account that this community (a good chunk have stakes in YC and a lot to gain from secondary shares in OpenAI)

You have to be pretty deep inside your own little bubble to think that even more than 0.001% of HN has "stakes in YC" or "secondary shares in OpenAI".

hu3
1 replies
21h32m

It can be a vocal minority. Still vocal.

I wouldn't discount it.

dylan604
0 replies
20h50m

I have 0% stake in anything YC, and I'm very vocal in my negativity against any of these "AI" anythings. All of these announcements are only slightly more than a toddler anxiously showing the parental units a finger painting, hoping to hang it on the fridge. Only instead of the fridge, they are hoping to get funding/investment, knowing that their product is not a fully fledged anything. It's comical.

JumpCrisscross
2 replies
22h15m

platform is going to favor its own and be aware that Sam Altman is the golden boy of YC's founder

I don’t know if there is a sentiment analysis tool for HN, but I’m pretty sure it’s been dead negative for Altman since at least Worldcoin.

saalweachter
0 replies
21h50m

A land of contrasts, etc.

betternet77
0 replies
20h21m

Yup, there's a significant anti-Google spin on HN and Twitter. For example, here's paulg claiming that Cruise handles driving around cyclists better than Waymo [1], obviously not true to anyone who's used both services.

[1] https://twitter.com/paulg/status/1360341492850708481

Xenoamorphous
8 replies
21h42m

It’s tiring. The same thing happened with the GPT-4o announcement yesterday. Apparently, because there’s no unquestionable AGI 14 months after GPT-4, everything sucks.

I always found HN contrarian but as I say it’s really tiring. I’ve no idea what the negative commenters are working on on a daily basis to be so dismissive of everybody else’s work, including work that leaves 90% of the population in a combination of awe and fear. Also people sometimes forget that behind big corp names there are actual people. People who might be reading this thread.

motoxpro
4 replies
21h5m

Yeah, it's pretty unfortunate. Saying something sucks shows such a lack of understanding that things are not static. I guess it's a sure way to be right, because there will always be progress and you can look back and say "See, I told you!"

IggleSniggle
3 replies
20h46m

Psh. Things are not static. Progress sucks now. Haven't you heard of enshittification? You can always look back and say, "See? I told you it would suck in the future!"

...why am I feeling the urge to point out that I am only making a joke here and not trying to make an actual counterpoint, even if one can be made...?

piloto_ciego
2 replies
19h13m

I commented on this elsewhere, but being a negative Nancy is really a winning strategy.

If you’re negative and you get it wrong, nobody cares; get it right and you look like a damn genius. Conversely, if you’re positive and get it wrong, you look like an idiot, and if you’re right you’re praised for a good call once. The rational "game theory" choice is to predict calamity.

motoxpro
1 replies
11h10m

Yeah it’s funny that optimism in the long term is optimal and pessimism in the short term is optimal.

piloto_ciego
0 replies
3h39m

Right, but I think people sometimes get the “what constitutes long term” factor a little bit wrong.

I am still talking to a lot of people who say, “what can any of this AI stuff even do?” It’s like, robots you could hold a conversation with effectively didn’t exist 3 years ago and you’re already upset that it’s not a money tree?

I think that people's expectation horizons narrowing may be the clearest evidence that we're in the singularity.

mupuff1234
1 replies
21h1m

What's also tiring is that no one is allowed to have any critical thoughts because "it's tiring".

From my own perspective the critique is usually a counter balance to extreme hype, so maybe let's just agree it's ok to have both types of comments, you know "checks and balances".

piloto_ciego
0 replies
19h17m

Being cynical is not a counterbalance though, it’s just as low effort as the hype people.

Workaccount2
0 replies
20h46m

AI is a pretty direct threat to software engineering. It's no surprise people are hostile towards it. Come 2030, how do you justify paying someone $175k/yr when a $20/mo app is 95% as good and the other 5% can be done by someone making $40k/yr?

piloto_ciego
7 replies
21h42m

I think it’s fear. Maybe not openly, but people are spooked at how fast stuff is happening, so shitting on progress is a natural reaction.

brikym
3 replies
20h39m

Progress? There are loads of downsides the AI fans won't acknowledge. It diminishes human value/creativity and will be owned and controlled by the wealthiest people. It's not like the horse being replaced by the tractor. This time it's different: there is no place to move to but doing nothing on a UBI (best case). That same power also opens the door to dystopian levels of censorship and surveillance. I see more of the Black Mirror scenarios coming true rather than breakthroughs that benefit society. Nobody is denying that it's impressive but the question is more whether it's good overall. Unfortunately the toothpaste seems to be out of the tube.

sshnuke
1 replies
12h10m

It diminishes human value/creativity and will be owned and controlled by the wealthiest people

"When you go to an art gallery, you are simply a tourist looking at the trophy cabinet of a few millionaires" - Banksy

piloto_ciego
0 replies
3h36m

Then… isn’t AI generated art something that empowers the non-millionaires?

piloto_ciego
0 replies
19h24m

Progress? There are loads of downsides the AI fans won't acknowledge.

I don’t know if this is true.

It diminishes human value/creativity

I don’t see this at all, I see it as enhancing creativity and human value.

and will be owned and controlled by the wealthiest people.

There are a lot of open source models being created, even if they are being released by Meta…

It's not like the horse being replaced by the tractor. This time it's different: there is no place to move to but doing nothing on a UBI (best case).

So, like, you wouldn’t do anything if you could just chill on UBI all day? If anything I’d get more creative.

That same power also opens the door to dystopian levels of censorship and surveillance.

I don’t disagree with this at all, but I think we can fight back here and overcome this, but we have to lean into the tech to do that.

I see more of the Black Mirror scenarios coming true rather than breakthroughs that benefit society.

I think this is basically wrong historically. Things are very seldom permanently dystopian if they’re dystopian at all. Things are demonstrably better than they were 100 years ago, and if you think back even a couple decades things are often a lot better.

The medical applications alone will save a lot of lives.

Nobody is denying that it's impressive but the question is more whether it's good overall. Unfortunately the toothpaste seems to be out of the tube.

There are going to be annoyances, but I would bet serious cash that things continue to get better.

kmacdough
1 replies
20h46m

I suspect it's also a general fatigue with the over-hype. It is moving fast, but every step improvement has come with its own mini hype cycle. The demos are very curated and make the model look incredibly flexible and resilient. But when we test the product in the wild, it's constantly surprising what simple tasks it blunders on. It's natural to become a bit cynical, and human to go on the attack with that cynicism. Not saying it's right, just natural, in the same way that it's natural for the marketing teams to be as misleading as they can get away with. Both are annoying, but there's not much to do.

piloto_ciego
0 replies
19h21m

Cynicism is (arguably) the intellectually easy strategy.

If you’re cynical and you get it right that everything “sucks” you look like a genius, if you get it wrong there is no penalty.

If you aren’t cynical and you talk about how great something is going to be and it flops you look like an idiot. The social penalty is much higher.

Workaccount2
0 replies
20h49m

I have noticed this the most in SWEs, who went from being code writers to "human intention decipherers". Ask an SWE in 2019 what they do and it was "Write novel and efficient code"; ask one in 2024 and you get "Sit in meetings and talk to project managers in order to translate their poor communication into good code".

Not saying the latter was never true, it's just interesting to see how people have reframed their work in the wake of breakneck AI progress.

jtolmar
2 replies
20h47m

I think it's just hype fatigue.

There's genuinely impressive progress being made, but there are also a lot of new models coming out promising way more than they can deliver. Even the Google AI announcements, which used to be carefully tailored to keep expectations low and show off their own limitations, now read more like marketing puff pieces.

I'm sure a lot of the HN crowd likes to pretend we're all perfectly discerning arbiters of the tech future with our thumbs on the pulse of the times or whatever, but realistically nobody is going to sift through a mountain of announcements ranging from "states it's revolutionary, is marginal improvement" to "states it's revolutionary, is merely an impressive step" to "states it's revolutionary, is bullshit" without resorting to vibes-based analysis.

throwup238
1 replies
20h31m

It's made all the worse by everything being a giant waitlist. Sora is still nowhere to be seen three months later, GPT-4o's conversational features aren't widely rolled out yet, and Google's AI releases have been waitlist after waitlist after waitlist.

Companies can either get people hyped or have never-ending georestricted waitlists; they can't have their cake and eat it too.

indigodaddy
0 replies
20h17m

Isn’t there a lot of positive forward motion and fruitfulness in the current state of the open source llama-3 community?

rmbyrro
0 replies
20h12m

I hope they didn't mess this one up with ideologically driven nonsense, like they did with Gemini.

jmkni
0 replies
20h54m

Well for me it linked to a Google Form to join a waitlist lol, so I'm not exactly pumped

Dig1t
0 replies
20h45m

Honestly, I just think that Google has burned their goodwill at this point. If you notice, most announcements by Apple are positively received here, and same with OpenAI. But Google's "don't be evil" persona has faded, and they went through so much churn WRT products that I think most people just don't want to see them win.

salamo
46 replies
16h9m

The first thing I will do when I get access to this is ask it to generate a realistic chess board. I have never gotten a decent-looking chessboard out of any image generator: one without deformed pieces, with the correct number of squares, squares properly in a checkerboard pattern, pieces placed in correct positions, the board oriented properly (white on the right!), and not an otherwise illegal position. It seems to be an "AI complete" problem.
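
For contrast, every constraint in that list is trivial to satisfy deterministically, which is what makes the generative failure interesting. A minimal sketch using the python-chess library, assuming it is installed; the output filename is arbitrary:

    # Render a correct, legal chessboard deterministically, for contrast with
    # the generative failures described above. Requires the python-chess package.
    import chess
    import chess.svg

    board = chess.Board()    # standard starting position: 64 squares, legal by construction
    assert board.is_valid()  # guaranteed legal, unlike a sampled image

    # orientation=chess.WHITE puts a light square on White's right-hand corner.
    svg = chess.svg.board(board, orientation=chess.WHITE, size=400)
    with open("board.svg", "w") as f:
        f.write(svg)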

arcticbull
37 replies
16h4m

Similarly the Veo example of the northern lights is a really interesting one. That's not what the northern lights look like to the naked eye - they're actually pretty grey. The really bright greens and even the reds really only come out when you take a photo of them with a camera. Of course the model couldn't know that because, well, it only gets trained on photos. Gets really existential - simulacra energy - maybe another good AI Turing test, for now.

porphyra
10 replies
13h24m

Human eyes are basically black and white in low light since rod cells can't detect color. But when the northern lights are bright enough you can definitely see the colors.

The fact that some things are too dark to be seen by humans but can be captured accurately with cameras doesn't mean that the camera, or the AI, is "making things up" or whatever.

Finally, nobody wants to see a video or a photo of a dark, gray, and barely visible aurora.

exodust
9 replies
12h34m

nobody wants to see a video or a photo of a dark, gray, and barely visible aurora

Except those who want to see an accurate representation of what it looks like to the naked eye.

stkhlm
7 replies
12h14m

Living in northern Sweden, I see the northern lights multiple times a year. I have never seen them pale or otherwise not colorful - greens and reds, always. That is to my naked eye. Photographs do look more saturated, but the difference isn't as large as this comment thread makes it out to be.

shwaj
2 replies
11h19m

That mirrors my experience from when I used to live in northern Canada

jabits
1 replies
10h36m

Even in Upper Michigan near Lake Superior we sometimes had stunning, colorful northern lights. Sometimes it seemed like they were flying overhead within your grasp.

DaSHacka
0 replies
9h56m

Most definitely, it's quite common to find people hanging around outside up towards Calumet whenever there's a night with a high KP Index.

I highly recommend checking them out if you're nearby; the recent auroras have been quite astonishing.

fzzzy
1 replies
7h16m

In the Upper Peninsula of Michigan I have only seen grey.

Jensson
0 replies
2h38m

That is the same latitude as Paris though, not very north at all.

peanut_merchant
0 replies
9h18m

Even in Northern Scotland (further south than northern Sweden) this is the case. The latest aurora showing was vividly colourful to the naked eye.

exodust
0 replies
6h20m

I'm in Australia where the southern lights are known to be not as intense as northern lights. That's where my remark comes from. Those who have never seen the aurora with their own eyes may like to see an accurate photo. A rare find among the collective celebration of saturation.

freedomben
0 replies
5h58m

Exactly. I went through major gaslighting trying to see the aurora. I just wasn't sure whether I was actually seeing it, because it always looked so different from the photos. It is absolutely maddening trying to find a realistic photo of what it looks like to the naked eye, so that you can know whether what you are seeing is actually the aurora and not just clouds.

pmlarocque
5 replies
15h46m

That's not true: they look grey when they aren't bright enough, but they can look green or red to the naked eye if they are bright. I have seen it myself, and yes, I was disappointed to see only grey ones last week.

see: https://theconversation.com/what-causes-the-different-colour...

arcticbull
4 replies
15h42m

[Aurora] only appear to us in shades of gray because the light is too faint to be sensed by our color-detecting cone cells."

Thus, the human eye primarily views the Northern Lights in faint colors and shades of gray and white. DSLR camera sensors don't have that limitation. Couple that fact with the long exposure times and high ISO settings of modern cameras and it becomes clear that the camera sensor has a much higher dynamic range of vision in the dark than people do.

https://www.space.com/23707-only-photos-reveal-aurora-true-c...

This aligns with my experiences.

With the brightest ones I saw in Northern Canada I even saw hints of red - but no real greens - until I looked at them through my phone, and then it looked just like the simulated video.

If I looked up and saw them the way they appear in the simulation, in real life, I'd run for a pair of leaded undies.

kortilla
0 replies
12h53m

This is such an arrogant pile of bullshit. I’ve seen very obvious colors on many different occasions in the northern part of the lower 48, up in southern Canada, and in Alaska.

Tronno
0 replies
15h29m

I've seen it bright green with the naked eye. It definitely happens. That article is inaccurate.

Maxion
0 replies
12h50m

Greens are the more common colors; reds and blues occur in higher-energy solar storms.

And yes, they can be as green to the naked eye as in that AI video. I've seen aurora shows that fill the entire night sky from horizon to horizon with my own eyes, way more impressive than that AI video.

Kiro
0 replies
13h1m

That is totally incorrect, as anyone who has seen real northern lights can attest. I'm sorry that you haven't gotten the chance to experience it and now think all northern lights are that lackluster.

paxys
5 replies
15h19m

That's not true at all. I have seen northern lights with my own eyes that were more neon green and bright purple than any mainstream photo.

cryptoz
4 replies
14h10m

There's a middle ground here. I saw the northern lights with my own eyes just days ago and it was mostly grey. I saw some color. But when I took a photo with a phone camera, the color absolutely popped. So it may be that you've seen more color than any photo, but the average viewer in Seattle this past weekend saw grey-er with their eyes and huge color in their phone photos.

(Edit: it was still super-cool even if grey-ish, and there was absolutely beautiful colors in there if you could find your way out of the direct city lights)

goostavos
3 replies
13h39m

The hubris of suggesting that your single experience of vaguely seeing the northern lights one time in Seattle has now led to a deep understanding of their true "color" and that the other person (perhaps all other people?) must be fooling themselves is... part of what makes HN so delightful to read.

I've also seen the northern lights with my own eyes, way up in the Arctic Circle in Sweden. Their color changes along with activity. Grey looking sometimes? Sure. But also colors so vivid that it feels like they envelop your body.

stavros
0 replies
10h50m

They did say "the average viewer in Seattle this past weekend", not "all other viewers".

Then again, the average viewer in Seattle this past weekend is hardly representative of what the northern lights look like.

lpapez
0 replies
10h8m

The hubris of suggesting that your single experience of vaguely seeing the northern lights one time in Seattle has now led to a deep understanding of their true "color" and that the other person (perhaps all other people?) must be fooling themselves is... part of what makes HN so delightful to read.

The H in HN stands for Hubris.

freedomben
0 replies
5h53m

The person they were responding to was saying that the people reporting grays were wrong, and that they had seen it and it was colorful. If anything, you should be accusing that person of hubris, not GP. GP's point was simply that it can differ in different situations. They used the example of Seattle to show that the person they were responding to is not correct that it is never gray and dull.

sdenton4
3 replies
15h46m

That doesn't seem in any way useful, though... To use a very blunt analogy, are color blind people intelligent/sentient/whatever? Obviously, yes: differences in perceptual apparatus aren't useful indicators of intelligence.

shermantanktop
2 replies
15h28m

As a colorblind person…I could see the northern lights way better than all the full-color-vision people around me squinting at their phones.

Wider bandwidth isn’t always better.

Ferret7446
1 replies
12h23m

I could see the northern lights way better than all the full-color-vision people around me

How would you know?

squeaky-clean
0 replies
10h9m

Quote the entire sentence, not just a portion of it.

skypanther
0 replies
5h4m

What struck me about the northern lights video was that it showed the Milky Way crossing the sky behind the northern lights. That bright part of the Milky Way is visible in the southern sky, but the aurora hugging the horizon like that indicates the viewer is looking north. (Swap directions for the southern hemisphere and the aurora australis.)

simonjgreen
0 replies
12h26m

To be fair, the prompt isn’t asking for a realistic interpretation it’s asking for a timelapse. What it’s generated is absolutely what most timelapses look like.

Prompt: Timelapse of the northern lights dancing across the Arctic sky, stars twinkling, snow-covered landscape

poulpy123
0 replies
8h58m

That's a bad example, since essentially all images of the aurora borealis are brightly colored. What I expect of an image generator is to output what's expected of it.

laserbeam
0 replies
14h47m

For decades, game engines have been working on realistic rendering. Bumping quality here and there.

The gold standard for rendering has always been cameras; it's always photo-realistic rendering. Maybe this won't be true for VR, but so far most effort goes toward being as good as video, not as good as the human eye.

Any sort of video generation AI is likely to have the same goal. Be as good as top notch cameras, not as eyes.

hoyd
0 replies
13h3m

I can see what you mean, and the video is somewhat unlike the real thing. I have lived in northern Norway most of my life and watched auroras a lot. They certainly look green and pink most of the time. Fainter ones would perhaps appear gray, I guess? Red, when viewed from a more southern viewpoint...

I work at Andøya Space, where perhaps most of the space research on the aurora has been done by sending scientific rockets into space over the last 60 yrs.

garyrob
0 replies
13h47m

Even in NY State, Hudson River Valley, I've seen them with real color. They're different each time.

darkstar_16
0 replies
10h32m

Northern lights are actually pretty colourful, even to the naked eye. I've never seen them pale or b/w

blhack
0 replies
13h38m

Have you ever seen the Northern Lights with your eyes? If so I'm curious where you saw them.

I echo what some other posters here have said: they're certainly not gray.

Kiro
0 replies
13h7m

Shouldn't the model reflect how it looks on video rather than our naked eye?

22c
0 replies
15h47m

I've only ever seen photos of the northern lights and I also didn't know that.

sdenton4
3 replies
15h44m

This strikes me as equally "AI complete" as drawing hands, which is now essentially a solved problem... No one test is sufficient, because you can add enough training data to address it.

salamo
2 replies
15h29m

Yeah "AI complete" is a bit tongue-in-cheek but it is a fairly spectacular failure mode of every model I've tried.

swyx
0 replies
7h46m

I've been using "agi-hard" https://latent.space/p/agi-hard as a term, because completeness isn't really what we're going for.

smusamashah
0 replies
9h40m

Ideogram and DALL-E do hands pretty well.

sabellito
1 replies
10h58m

Per usual, the top comment on anything AI related is snark about "it can't do [random specific thing] well yet".

kmacdough
0 replies
8h33m

Tiring, but so is the relentless over-marketing. Each new demo implies new use cases and flexible performance. But the reality is they're very brittle and blunder most seemingly simple tasks. I would personally love an ongoing breakdown of the key weaknesses. I often wonder "can it X?" The answer is almost always "almost, but not a useful almost".

perbu
0 replies
5h43m

Most generative AI will struggle when given a task that requires something more or less exact. They're probably pretty good at making something "chessish".

mikeocool
0 replies
6h37m

Ha, wow, I’d never seen this one before. The failures are pretty great. Even repeatedly trying to correct ChatGPT/Dall-e with the proper number of squares and pieces, it somehow makes it worse.

This is what dall-e came up with after trying to correct many previous iterations: https://imgur.com/Ss4TwNC

popcar2
34 replies
1d

Not nearly as impressive as Sora. Sora was impressive because the clips were long and had lots of rapid movement since video models tend to fall apart when the movement isn't easy to predict.

By comparison, the shots here are only a few seconds long and almost all look like slow motion or slow panning shots cherrypicked because they don't have that much movement. Compare that to Sora's videos of people walking in real speed.

The only shot they had that can compare was the cyberpunk video they linked to, and it looks crazy inconsistent. Real shame.

seoulmetro
3 replies
19h38m

There is now a note on that second link:

The videos below were edited by the artists, who creatively integrated Sora into their work, and had the freedom to modify the content Sora generated.

seoulmetro
0 replies
17h25m

That's hilarious. Your comment clearly got seen by someone.

Aeolun
0 replies
18h42m

If you modified something because it got some attention on HN, at least have the guts to own up to it :/

rvz
3 replies
22h20m

Interesting to see that OpenAI was successful in creating their own reality distortion spells, just like Apple's reality distortion field which has fooled many of these commenters here.

It's quite early to race to the conclusion that one is better than the other when not only they are both unreleased, but especially when the demos can be edited, faked or altered to look great for optics and distortion.

EDIT: It appears there is at least one commenter who replied below that is upset with this fact above.

It is OK to cope, but the truth really doesn't care, especially when the competition (Google) came out much stronger than expected with their announcements.

ijidak
2 replies
20h10m

Well, as a counterpoint, Apple did become a $2 trillion company...

Distortion is easiest when the products really work. :)

adventured
1 replies
20h2m

Apple got up to $3 trillion back in 2023.

turnsout
0 replies
17h39m

Indeed, and they’re at 2.87T today… Built largely on differentiated high-margin products, which is not how I would describe OpenAI. I should clarify that I’m a fan of both companies, but the reality is that OpenAI’s business model depends on how well it can commoditize itself.

hanspeter
0 replies
11h55m

I believe it was clear that Air Head was an edited video.

The intention wasn't to show "This is what Sora can generate from start to end" but rather "This is what a video production team can do with Sora instead of shooting their own raw footage."

Maybe not so obvious to others, but for me it was clear from how the other demo videos looked.

TIPSIO
6 replies
23h19m

Objectively speaking (if people would be honest with themselves), both are just decent at best.

I think comparing them now is probably not that useful outside of this AI hype train. Like comparing two children. A lot can happen.

The bigger message I am getting from this is it's clear OpenAI won't have a super AI monopoly.

TaylorAlexander
3 replies
23h8m

Comparing two children is a good one. My girlfriend has taken to pointing out when I’m engaging in “punditry”. They're an engineer like I am and we talk about tech all the time, but sometimes I talk about which company is beating which company like it’s a football game, and they call me out for it.

Video models are interesting, and to some extent trying to imagine which company is gonna eat the other's lunch is kind of interesting, but sometimes that's all people are interested in, and I can see my girlfriend's reasoning for being uninterested in such discussion.

Jonanin
2 replies
15h27m

Except that many of the people involved do think of it like a football game, and thus it actually is like one. Of course the researchers and engineers at both OpenAI and Google DeepMind have a sense of rivalry and strive to one up another. They definitely feel like they are in a competition.

TaylorAlexander
1 replies
11h54m

They definitely feel like they are in a competition.

Citation needed?

Although I did not work in AI, I did work at Google X robotics on a robot they often use for AI research.

Maybe some people felt like it was a competition, but I don’t have much reason to believe that feeling is common. AI researchers are literally in collaboration with other people in the field, publishing papers and reading the work of others to learn and build upon it.

Jensson
0 replies
9h53m

AI researchers are literally in collaboration with other people in the field, publishing papers and reading the work of others to learn and build upon it.

When OpenAI suddenly stopped publishing their stuff, I bet many researchers started feeling like it was a competition.

OpenAI is no longer cooperating, they are just competing. They still haven't said anything about how gpt-4 works.

motoxpro
0 replies
21h3m

What would make this "good"?

Aeolun
0 replies
18h40m

I’m fairly certain Google just has a big stack of these in storage but never released them, or the moment someone pulls ahead it’s all hands on deck to make the same thing.

Jensson
3 replies
23h28m

Sora was impressive because the clips were long and had lots of rapid movement

Sora videos ran at 1 beat per second, so everything in the image moved at the same beat and often too slow or too fast to keep the pace.

It is very obvious when you inspect the frames and notice that there are keyframes at every whole-second mark, where everything on the screen suddenly jumps to its next animation step.

That really limits the kind of videos you can generate.
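
A rough sketch of the kind of inspection described above: compute the mean absolute difference between consecutive frames and look for spikes at whole-second marks. Assumes OpenCV and numpy are installed; "sora_clip.mp4" is a hypothetical local copy of a generated clip:

    # Print per-frame differences, marking whole-second boundaries. If motion
    # is quantized to a 1 Hz beat, frames near the second marks should show
    # systematically larger differences than the frames between them.
    import cv2
    import numpy as np

    cap = cv2.VideoCapture("sora_clip.mp4")  # hypothetical filename
    fps = cap.get(cv2.CAP_PROP_FPS)

    diffs = []
    ok, prev = cap.read()
    while ok:
        ok, frame = cap.read()
        if not ok:
            break
        diffs.append(np.mean(cv2.absdiff(frame, prev)))  # mean abs pixel change
        prev = frame
    cap.release()

    for i, d in enumerate(diffs):
        marker = "<-- second mark" if round((i + 1) % fps, 1) == 0 else ""
        print(f"frame {i + 1}: diff={d:.2f} {marker}")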

lupire
2 replies
22h51m

So it needs to learn how far each object can travel in 1sec at its natural speed?

Jensson
1 replies
22h47m

It also needs separate animation steps for different objects so that objects can keep different speeds. It isn't trivial at all to go from having one keyframe for the whole picture to having separate keyframes for separate parts; you need to retrain the whole thing from the ground up, and the results will be way worse until you figure out a way to train that.

My point is that it isn't obvious at all that Sora's way is actually closer to the end goal. It might look better today to have those 1-second beats for every video, but where do you go from there?

Aerroon
0 replies
19h8m

The best-case scenario would probably be being able to generate "layers" one at a time. That would give more creative control over the outcome, but I have no idea how you would do it.

totaldude87
1 replies
22h39m

This could also be Google's doing: if Veo screws up, the weight falls on Alphabet stock, while OpenAI is not public and doesn't have to worry about anything. Even if OpenAI faked some of their AI videos (not saying they did), it wouldn't affect them the way it would affect Veo -> Google -> Alphabet.

Being cautious often puts a dent in innovation.

pheatherlite
1 replies
18h0m

Not just that, but anything with a subject in it felt uncanny-valley-ish... like that cowboy clip: the gait of the horse stood out as odd, and then I gave it some attention. It seems like a camel's gait. And the whole thing seems to be hovering, gliding rather than walking. Sora indeed seems to have an advantage.

__float
0 replies
17h18m

I thought a camel's gait is much closer to two legs moving almost at the same time. Granted, I don't see camels often. Out of curiosity can you explain that more?

ein0p
1 replies
23h47m

Also Sora demos had some really impressive generations featuring _people_. Here we hardly see any people which likely means exactly what you’d guess.

data-ottawa
0 replies
23h7m

Has Gemini started generating images of people again? My trial has ended and I haven’t been following the issue.

spiderfarmer
0 replies
23h46m

Also the horse just looks weird, just like the buildings and peppers.

It's impressive as hell though. Even if it would only be used to extrapolate existing video.

nuz
0 replies
23h55m

Sora is also limited to a certain range of movement, if you look at the clips closely. Probably something like filtering by some function of optical flow in both cases.
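
For the curious, "some function of optical flow" is cheap to compute. A sketch of scoring a clip by its mean Farneback flow magnitude with OpenCV; the filename is hypothetical, and whether either company actually filters this way is pure speculation:

    # Score a clip by average optical-flow magnitude, the kind of motion
    # filter speculated about above. Requires OpenCV and numpy.
    import cv2
    import numpy as np

    cap = cv2.VideoCapture("generated_clip.mp4")  # hypothetical filename
    ok, prev = cap.read()
    prev_gray = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)

    magnitudes = []
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        # Dense Farneback optical flow between consecutive frames.
        flow = cv2.calcOpticalFlowFarneback(prev_gray, gray, None,
                                            0.5, 3, 15, 3, 5, 1.2, 0)
        magnitudes.append(np.linalg.norm(flow, axis=2).mean())
        prev_gray = gray
    cap.release()

    # A release pipeline could keep only clips whose motion falls in some band.
    print(f"mean flow magnitude: {np.mean(magnitudes):.3f}")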

dyauspitr
0 replies
19h15m

They’re not showing people because that can get hairy quickly.

btown
0 replies
18h33m

A commercially available tool that can turn still images into depth-conscious panning shots is still tremendously impactful across all sorts of industries, especially tourism and hospitality. I’m really excited to see what this can do.

arcastroe
0 replies
22h44m

The shots here [..] almost all look like slow motion or slow panning shots.

I think this is arguably better than the alternative. With slow-mo generated videos, you can always speed them up in editing. It's much harder to take a fast-paced video and slow it down without terrible loss in quality.

LZ_Khan
0 replies
23h41m

I imagine that's just a function of how much training data you throw at it.

ugh123
23 replies
21h47m

From a filmmaking standpoint I still don't think this is impactful.

For that it needs a "director" to say: "turn the horse's head 90˚ the other way, trot 20 feet, and dismount the rider" and "give me additional camera angles" of the same scene. Otherwise this is mostly b-roll content.

I'm sure this is coming.

qingcharles
6 replies
20h31m

I can see using these video generators to create video storyboards. Especially if you can drop in a scribbled sketch and a prompt for each tile.

ancientworldnow
5 replies
18h19m

That sounds actively harmful. Often we want storyboards to be less specific, so as not to have some non-artist decision maker ask why it doesn't look like the storyboard.

And when we want it to match exactly in an animatic or whatever, it needs to be far more precise than this, matching real locations etc.

sbarre
2 replies
18h3m

I know you weren't implying this, but not every storyboard is for sharing with (or seeking approval from) decision makers.

I could see this being really useful for exploring tone, movement, shot sequences or cut timing, etc..

Right now you scrape together "kinda close enough" stock footage for this kind of exploration, and this could get you "much closer enough" footage..

shermantanktop
1 replies
15h12m

I think of it in terms of the anchoring bias. Imagine that your most important decisions are anchored for you by what a 10 year old kid heard and understood. Your ideas don’t come to life without first being rendered as a terrible approximation that is convincing to others but deeply wrong to you, and now you get to react to that instead of going through your own method.

So if it’s an optional tool, great, but some people would be fine with it, some would not.

sbarre
0 replies
6h18m

Absolutely. Everyone's creative process is different (and valid).

gregmac
0 replies
15h14m

I hadn't thought about that in movie context before, but it totally makes sense.

I've worked with other developers that want to build high fidelity wire frames, sometimes in the actual UI framework, probably because they can (and it's "easy"). I always push back against that, in favor of using whiteboard or Sharpies. The low-fidelity brings better feedback and discussion: focused on layout and flow, not spacing and colors. Psychologically it also feels temporary, giving permission for others to suggest a completely different approach without thinking they're tossing out more than a few minutes of work.

I think in the artistic context it extends further, too: if you show something too detailed it can anchor it in people's minds and stifle their creativity. Most people experience this in an ironically similar way: consider how you picture the characters of a book differently depending on if you watched the movie first or not.

cpill
0 replies
8h54m

I guess this will give birth to a new kind of filmmaking. Start with a rough sketch, generate 100 higher-quality versions with an image generator, select one to tweak, use that as input to a video generator which generates 10 versions, choose one to refine, etc.

chacham15
2 replies
16h57m

I don't think "turn the horse's head 90˚" is the right path forward. What I think is more likely and more useful is: here is a start keyframe and here is a stop keyframe (generated by text-to-image, using other things like ControlNet to control positioning etc.), and then having the AI generate the frames in between. Don't like the way it generated the in-between? Choose a keyframe, adjust it, and rerun with the segment before and the segment after.

GenerocUsername
0 replies
16h11m

This appeals to me because it feels auditable and controllable... But at the pace these things have been progressing over the last 3 years, I could imagine the tech leapfrogging all conventional understanding real soon. Likely outputting Gaussian-splat-style scenes where the scene is separate from the camera and all pieces can be independently tweaked via a VR director chair.

8note
0 replies
14h46m

So a declarative keyframe of "the horse's head is pointed forward" and a second one of "the horse is looking left"

And let the robot tween?

Vs an imperative for "tween this by turning the horse's head left"

imachine1980_
1 replies
16h37m

Stock videos are indeed crucial, especially now that we can easily search for precisely what we need. Take, for instance, the scene at the end of 'Look Up' featuring a native American dance in Peru: the dancer's movements were captured from a stock video, and the falling comet was seamlessly edited in. Now imagine having near-infinite stock videos tailored to the situation.

rzmmm
0 replies
14h43m

Stock photographers are already having issues with piracy due to very powerful AI watermark removal tools. And I suspect the companies are using these people's content to train these models too.

Eji1700
1 replies
17h15m

There's also the whole "oh you have no actual model/rigging/lighting/set to manipulate" for detail work issue.

That said, I personally think the solution will not be coming that soon, but at the same time we'll be seeing a LOT more content made with current tools, even if that means a (severe) dip in quality due to the cost it might save.

SJC_Hacker
0 replies
15h35m

This leads me to the question of why there hasn't been an effort to do this with 3D content (that I know of).

Because camera angles/lighting/collision detection/etc. at that point would be almost trivial.

I guess with the "2D only" approach that is based on actual, acquired video you get way more impressive shots.

But the obvious application is for games. Content generation in the form of modeling and animation is actually one the biggest cost centers for most studios these days.

thehappypm
0 replies
5h12m

If you or I don’t see the potential here, I think that just means someone more creative is going to do amazing things with it

teaearlgraycold
0 replies
16h49m

Everything I’ve heard from professionals backs that up. Great for B roll. Great for stock footage. That’s it.

sailfast
0 replies
19h23m

For most things I view on the internet B-roll is great content, so I'm sure this will enable a new kind of storytelling via YouTube Shorts / Instagram, etc at minimum.

lofaszvanitt
0 replies
12h35m

I can't wait to see what the big video camera makers are going to do with tech similar to this. Since Google clearly has zero idea what to do with it, and they lack the creativity, it's up to ARRI, Canon, Panasonic etc. to create their own solutions for this tech. I can't wait to see what Canon has up its sleeve with their new offerings that come in a few months.

larodi
0 replies
9h16m

Perhaps the only industries that immediately benefit from this are short ads and perhaps TikTok. But it's still very dubious, as people seem to actually enjoy being the directors of their own thing themselves, not having somebody else do it.

Maybe this works for ads for a döner place or shisha bar in some developing country. I've seen generated images used for menus in such places.

But I doubt serious filmmaking can be done this way. And if it can, it'd again be thanks to some smart concept on the part of humans.

kmacdough
0 replies
7h42m

I wouldn't be so sure it's coming. NNs currently don't have the structures for long-term memory and development. These are almost certainly necessary for creating longer works with real purpose and meaning. It's possible we're on the cusp with some of the work to tame RNNs, but it's taken us years to really harness the power of transformers.

gedy
0 replies
16h29m

I think with AI content, we'd need to not expect fine-grained control. E.g., instead something like "dramatic scene of rider coming down path, and dismounting horse, then looking into distance", etc. (Or even less detail eventually, once a cohesive story can be generated.)

evantbyrne
0 replies
20h42m

They claim it can accept an "input video and editing command" to produce a new video output. Also, "In addition, it supports masked editing, enabling changes to specific areas of the video when you add a mask area to your video and text prompt." Not sure if that specific example would work or not.

aetherson
0 replies
16h50m

Yeah, I've made a lot of images, and it sure is amazing if all you're interested in is, like, "Any basically good image," but if you start needing something very particular, rather than "anything that is on a general topic and is aesthetically pleasing," it gets a lot harder.

And there are a lot more degrees of freedom to get something wrong in film than in a single still image.

Keyframe
23 replies
1d

Kind of sucks to be Google. Even though they're making good progress here, and have laid the foundations of a lot if not most of these things, their products are... well, there aren't any noteworthy ones compared to the rest. And considering Google is sitting on top of one of the largest if not THE largest video databases, along with maps, traffic, search, internet.zip, Usenet, and vast vertically integrated computing resources... they have every advantage in the world. So what the hell are they doing? Why isn't their CEO already out? Expectations for them are higher than for anyone else.

atleastoptimal
10 replies
23h52m

Because they punish experimentation as it eats into their bottom line. AI is a tool for ads in the mind of executives at Google. Ads and monetization of human productivity, not an agent of productivity on its own.

lolinder
5 replies
23h42m

"Laser-focused on the bottom line at the expense of all else" is not how I'd describe Google, now or at any point in the past. They have a lot of dysfunction, but if anything that dysfunction stems from too much experimentation and autonomy at the leaf nodes of the organization. That's how they get into these crazy places where they have to pick between 5 chat apps or whatever.

If Google were as focused on ads as you seem to think we'd at least see some sort of coherent org-wide strategy instead of a complete lack of direction.

khazhoux
2 replies
23h5m

The person now in charge of Search is Elizabeth Hamon Reid, a long-time googler who came up through the ranks from engineer (in Google Maps) to VP over 20 years. She's legit.

khazhoux
0 replies
22h17m

Ah, according to this, she’s head of Search but reports to Prabhakar. I thought from recent reports that she’d taken search over from him.

Nonetheless, she was a good engineer and a good manager, back when we crossed paths many moons ago.

https://searchengineland.com/liz-reid-google-new-head-of-sea...

lolinder
0 replies
22h46m

That was a decision that prioritized the bottom line over other things. But saying that Google is "focused" on the bottom line implies that there's a pattern of them putting the bottom line first, which is simply not true if you look at Google as a whole. Search specifically, maybe, but not Alphabet.

khazhoux
3 replies
23h48m

C'mon, Google doesn't "punish" experimentation. Google X, Google Glass, Daydream, Fuchsia, moonshots, the lab spinoff (whose name I can't remember)... hell, even all the abandoned products everyone here always complains about.

The experiments often/usually fail, but they do experiment.

Koffiepoeder
2 replies
23h3m

If you prune all the branches, where will the fruits grow?

khazhoux
1 replies
22h56m

The branches were dead and could bear no fruit. New branches will sprout next season.

saalweachter
0 replies
16h37m

For grapes, the conventional wisdom is to prune all the old branches at the end of each season.

Workaccount2
4 replies
23h50m

I don't know why more people don't talk about the 1M context tokens. While the output is mediocre for a cutting-edge model, you can context-stuff the everliving hell out of it for some pretty amazing capabilities. 2M tokens is even crazier.

lordswork
1 replies
23h47m

It is pretty amazing. I've been using it every day. I do wish you could easily upload an entire repo into it though.

bongodongobob
0 replies
21h45m

Have it write a program to output a repo as a flat file.
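
For instance, a minimal Python sketch of such a flattener (naive: it skips only .git and papers over binary files with errors="ignore"):

    # Flatten a repo into one text file for context-stuffing a long-context model.
    import pathlib

    def flatten_repo(root: str, out: str = "repo.txt") -> None:
        with open(out, "w", encoding="utf-8") as f:
            for path in sorted(pathlib.Path(root).rglob("*")):
                if path.is_file() and ".git" not in path.parts:
                    f.write(f"\n===== {path} =====\n")
                    f.write(path.read_text(encoding="utf-8", errors="ignore"))

    flatten_repo(".")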

rm_-rf_slash
0 replies
23h30m

Anything approaching the token limit I turn into a file and upload to a vector store. Results are comparable between Chat and Assistants.
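
A rough sketch of that pattern with plain embeddings (the chunk size is arbitrary; the Assistants file-upload route works differently under the hood):

    # Naive chunk-and-embed; store `vectors` in any vector store and retrieve
    # the nearest chunks per query instead of stuffing the whole context.
    from openai import OpenAI

    client = OpenAI()
    doc = open("big_file.txt", encoding="utf-8").read()
    chunks = [doc[i:i + 2000] for i in range(0, len(doc), 2000)]
    resp = client.embeddings.create(model="text-embedding-3-small", input=chunks)
    vectors = [d.embedding for d in resp.data]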

Keyframe
0 replies
21h30m

That's a good point. Gemini gatekeeping me on so many answers made me forget about this extraordinary feature of it.

InfiniteVortex
3 replies
1d

Google search has been absolutely ruined in terms of quality. You're right, they've built the base in terms of R&D for many of the AI breakthroughs that are powering competing products... that happen to be better than Google's own. Google went from "Don't be evil" to just another big corporate tech company. They have so much potential. Regrettable.

CraftingLinks
1 replies
23h52m

They are fast on their way to becoming IBM 2.0.

jason-phillips
0 replies
22h6m

More like Xerox

dyauspitr
0 replies
23h6m

If anything, Google search with the Gemini area at the top has been very good for me.

dyauspitr
1 replies
23h8m

Their CEO is generating massive, growing profits every quarter while releasing generative technology, all the while walking a fine line on what those models generate, because missteps can be pretty devastating for a large corp like Google.

Keyframe
0 replies
23h2m

you think it's because of him or despite him?

softwaredoug
0 replies
23h22m

It's often said you need to disrupt your own business model.

Google had blinders on. They didn't relentlessly focus on reinventing their domain. They just milked what they had, gradually losing sight of the user experience[1] to focus on monetization above all else.

1 - https://twitter.com/pdrmnvd/status/1707395736458207430

inasio
18 replies
1d

From a 2014 Wired article [0]: "The average shot length of English language films has declined from about 12 seconds in 1930 to about 2.5 seconds today"

I can see more real-world impact from this (and/or Sora) than most other AI tools

[0] https://www.wired.com/2014/09/cinema-is-evolving/

mattgreenrocks
13 replies
23h51m

This is very noticeable. Watching movies from the 1970s is positively serene for me, whereas the shot time in modern films often leaves me wondering, "wait, what just happened there?"

And I'm someone who is fine playing fast action video games. Can't imagine what it's like if you're older or have sensory processing issues.

aidenn0
3 replies
23h7m

I'd like to fact check this amazing comment on that video, but it would require watching Taken 3:

Some of y'all may find how awful this editing gets pretty interesting: I did an Average Shot Length (ASL) for many movies for a recent project, and just to illustrate bad overediting in action movies, I looked at Taken 3 (2014) in its extended cut.

The longest shot in the movie is the last shot, an aerial shot of a pier at sunset ending the movie as the end credits start rolling over them. It clocks in at a runtime of 41 seconds and is, BY FAR, the longest shot in the movie.

The next longest is a helicopter establishing shot of the daughter's college after the "action scene" there a little over an hour in, at 5 seconds.

Otherwise, the ASL for Taken 3 (minus the end credits/opening logos), which has a runtime of 1:49:40 and 4,561 shots in all (!!!), is 1.38 SECONDS. For comparison, Zack Snyder's Justice League (2021) (minus end credits/opening logos) is 3:50:59, with 3,163 shots overall, giving it an ASL of 4.40 seconds. And this movie, at 1 hour 50 minutes, has 4,561 shots for an ASL of 1.38 seconds?!?! Taken 3 has more shots in it than Zack Snyder's Justice League, a movie more than double its length...

To further illustrate how ridiculous this editing gets, the ASL for Taken 3's non-action scenes is 2.27 seconds. To reiterate, this is the non-action scenes. The "slow scenes." The character stuff. Dialogue scenes. The stuff where any other movie would know to slow down. 2.27 SECONDS. For comparison, Mad Max: Fury Road (minus end credits/opening logos) has a runtime of 1:51:58, with 2,646 shots overall, for an ASL of 2.54 seconds. TAKEN 3'S "SLOW SCENES" ARE EDITED MORE AGGRESSIVELY THAN MAD MAX: FURY ROAD!

And Taken 3's action scenes? Their ASL is 0.68 seconds!

If it weren't for the sound people on the movie, Taken 3 wouldn't be an "action movie". It'd be abstract art.
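
The arithmetic roughly checks out (a quick sanity check; the quoted ASLs use runtimes with credits and logos trimmed, so dividing the raw runtimes gives slightly higher numbers):

    def asl(runtime_hms: str, shots: int) -> float:
        h, m, s = (int(x) for x in runtime_hms.split(":"))
        return (h * 3600 + m * 60 + s) / shots

    print(asl("1:49:40", 4561))  # Taken 3: ~1.44 s raw vs 1.38 s quoted
    print(asl("3:50:59", 3163))  # Justice League: ~4.38 s vs 4.40 s quoted
    print(asl("1:51:58", 2646))  # Fury Road: ~2.54 s, matching the quote
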
throwup238
1 replies
22h56m

It's worth noting that Taken 3 has a 13% rating on Rotten Tomatoes, which is well into "it's so bad it's good" territory. I don't think the rapid cuts went unnoticed.

nimithryn
0 replies
22h7m

Yeah, this sequence is a meme commonly cited to show "choppy modern editing"

llmblockchain
0 replies
22h12m

More chops than an MF DOOM track.

kristofferR
2 replies
22h53m

The top comment makes a really good point though:

"He's 68. I'm guessing they stitched it together like this because "geriatric spends 30 seconds scaling chainlink fence then breaks a hip" doesn't exactly make for riveting action flick fare."

Lingering shots are horrible for obscuring things.

troupo
0 replies
20h49m

Keanu Reeves was 57-58 when he shot the last John Wick. IIRC Bob Odenkirk was 58 in Nobody. Neeson was 60 in Taken 3.

There are ways to shoot an action scene with an aging star that don't involve 14 cuts in 4 seconds. You just have to care about your craft.

lupire
0 replies
22h47m

Movies have stunt performers.

And Neeson was only 60 when filming Taken 3.

nineteen999
0 replies
20h21m

Is it Liam Neeson, or his stunt double?

psbp
2 replies
21h36m

My brain processes too slow for modern action movies.

I can tell what's going on, but I always end up feeling agitated.

MarcScott
1 replies
12h13m

I'm okay with watching the majority of action movies, but I distinctly remember watching this fight scene in a Bourne movie and not having a clue what was going on. The constant camera changes, short shot length, and shaky cam, just confused the hell out of me.

https://youtu.be/uLt7lXDCHQ0?si=JnVMjmu0WgN5Jr5e&t=70

earthnail
0 replies
9h48m

I thought it was brilliant. Notice there’s no music. It’s one of the most brutal action scenes I know. Brutal in the sense of how honest it felt about direct combat.

kemitchell
0 replies
20h20m

Enjoy some Tarkovsky.

lobochrome
0 replies
18h31m

Shot length, yes - but the scene stays the same. Getting continuity with just prompts doesn't seem to be figured out yet.

Maybe it's easy, and you feed continuity stills into the prompt. Maybe it's not, and this will always remain just a more advanced storyboarding technique.

But then again, storyboards are always less about details and more about mood, dialog, and framing.

jsheard
0 replies
1d

Even if the shots are very short you still need coherency between shots, and they don't seem to have tackled that problem yet.

joshuahedlund
0 replies
21h36m

How many of those 2.5 second "shots" are back-and-forths between two perspectives (ex. of two characters talking to one another) where each perspective is consistent with itself? This would be extremely relevant for how many seconds of consistent footage are actually needed for an AI-generated "shot" at film-level quality.

chipweinberger
0 replies
18h5m

In 1930 they often literally had a single camera.

Just worth keeping that in mind. You could not just switch between multiple shots like you can today.

axblount
16 replies
23h58m

I hate to be so cynical, but I'm dreading the inevitable flood of AI generated video spam.

We really are about this close to Infinite Jest. Imagine TikTok's algorithm with on-demand video generation to suit your exact tastes. It may erase the social aspect, but for many users I doubt that would matter too much. "Lurking" into oblivion.

barbariangrunge
3 replies
23h40m

YouTube’s endgame is to not need content creators in the loop any more. The algorithm will just create everything

esafak
1 replies
23h18m

The endgame of that is that people will leave.

darby_eight
0 replies
23h6m

I'm somewhat surprised people still watch YouTube with the horrible recommendations and non-stop spam

belter
0 replies
23h8m

Henry Ford II: Walter, how are you going to get those robots to pay your union dues?

Walter Reuther: Henry, how are you going to get them to buy your cars?

lordswork
2 replies
23h45m

It's already here. There are communities forming around generating passive income by mass-producing AI videos as TikToks and Shorts.

tikkun
0 replies
18h34m

What's the subreddit?

axblount
0 replies
23h32m

I saw one of those where a guy just made videos about increasingly elaborate AI generated cakes. You're right, I guess we're mostly there.

But those still require some human input. I'm imagining a sort of genetic algorithm for video prompts, no human editing, input, or curation required.

redml
1 replies
22h47m

I think of it as we're replacing the SEO spam we have right now with AI spam. At least now we can fight that with more AI.

sph
0 replies
11h7m

There's a naive statement to make.

beacon294
1 replies
23h15m

Can you explain this aspect of Infinite Jest to me without spoiling the book?

_xander
0 replies
23h9m

It's introduced early on (and not what the book is really about): distribution of a video that is so entertaining that any viewer is compelled to watch it until they die

rm_-rf_slash
0 replies
23h28m

And somehow our exact tastes would also include influencer coded advertisements.

layer8
0 replies
22h16m

If it really suited my exact tastes, that would actually be great. But I don’t see how we’re anywhere close to that. And they won’t target matching your exact taste. They will target the threshold where it’s just barely interesting enough that people don’t turn it off.

jprete
0 replies
23h5m

At the bottom of the text blurb on the Veo page: "In the future, we’ll also bring some of Veo’s capabilities to YouTube Shorts and other products."

So...you're not cynical, it's an explicit product goal.

LZ_Khan
0 replies
23h39m

I had the same thought regarding Infinite Jest recently.

Invictus0
0 replies
23h3m

This basically already exists for porn

ArchitectAnon
13 replies
7h54m

I think the thing that most perturbs me about AI is that it takes jobs that involve manipulating colours, light, shade, and space directly and turns them into essay-writing exercises. As a dyslexic I fucking hate writing essays. 40% of architects are dyslexic. I wouldn't be surprised if that was similar or higher in other creative industries such as filmmaking and illustration. Coincidentally, 40% of the prison population is also dyslexic; I wonder if that's where all the spare creatives who are terrible at describing things with words will end up in 20 years' time.

gnobbler
2 replies
5h26m

You're entitled to your opinion but this will open up a world of possibilities to people who couldn't work in these fields previously due to their own non-dyslexia disability. Handless intelligent people shouldn't lose out because incumbents don't want to share their lane.

alt227
1 replies
5h18m

So, the fall of the skilled professional and the rise of anybody who knows how to write prompts?

Jensson
0 replies
5h11m

The AI we have today has very little to do with writing prompts; you still need to understand, correct, glue, and edit the results, and that is most of the work, so you still need skilled professionals.

fzzzy
2 replies
7h18m

You can speak instead if you wish. Speech-to-text is available on all operating systems.

cy6erlion
1 replies
6h51m

Speaking has sound but that is still just words with the same logic structure. "Colours, light, shade and space" have entirely different logic.

fzzzy
0 replies
6h48m

Very interesting. Thank you for the perspective, it is extremely illuminating.

What is a user interface which can move from color, light, shade, and space to images or text? Could there be an architecture that takes blueprints and produces text or images?

aavshr
1 replies
7h31m

I would imagine and hope for interfaces to exist where the natural language prompt is the initial seed and then you'd still be able to manipulate visual elements through other ways.

Art9681
0 replies
6h13m

This is the case today. You won't get a "perfect" image without heavy post-processing, even if that post-processing is AI-enhanced. ComfyUI is the new Photoshop, and although it's not an easy app to understand, once it "clicks" it's the most amazing piece of software to come out of the open-source oven in a long time.

seanw265
0 replies
4h5m

Your claim that 40% of architects are dyslexic piqued my curiosity. I wonder if this would have an impact on the success of tools like ChatGPT in the architecture industry.

Do you have a source for this stat? I can't seem to find anything to support it.

chromanoid
0 replies
7h11m

I guess in the near future prompts can be replaced by a live editing conversation with the AI, like talking to a phantom draughtsman or a camera operator / movie team. The AI will adjust while you talk to it and can also ask questions.

ChatGPT already allows this workflow to some extent. You should try it out. I just talked to ChatGPT on my phone to test it. I think I will not go back to text for these purposes. It's much more creative to just say what you don't like about a picture.

If your speech is also affected, rough sketches and other interfaces are or will also be available (see https://openart.ai/apps/sketch-to-image). What kind of expression do you prefer?

canes123456
0 replies
6h5m

It seems exceedingly clear to me that the primary interface for LLMs will be voice.

cainxinth
0 replies
5h42m

Terence McKenna predicted this:

“The engineers of the future will be poets.”

DeathArrow
0 replies
4h47m

As a dyslexic I fucking hate writing essays

You can feed AI an image and ask it to describe it. Kind of the inverse process.

mrcwinn
12 replies
15h9m

OpenAI has the model advantage.

Google and Apple have the ecosystem advantage.

Apple in particular has the deeper stack integration advantage.

Both Apple and Google have a somewhat poor software innovation reputation.

How does it all net out? I suspect ecosystem play wins in this case because they can personalize more deeply.

lowkey
3 replies
15h3m

Google has a deep addiction to AdWords revenue, which makes for a significant disadvantage. No matter how good their technology, they will struggle internally with deploying it at scale because that would risk their cash cow. Innovator's dilemma.

frankacter
1 replies
14h50m

Google Cloud and cloud services generated almost $9.57 billion. That's up 28% from the prior year:

https://www.crn.com/news/networking/2024/google-cloud-posts-...

They are embedding their models not only widely across their platform's suite of internal products and devices, but also computationally via API for 3rd-party development.

Those are all free from any perceived golden handcuffs that AdWords would impose.

damsalor
0 replies
11h55m

Yea, well. I still think there is a conflict of interest if you sell propaganda

xNeil
2 replies
14h12m

Google and Apple have a somewhat poor software innovation reputation.

I'm assuming you mean reputation as in general opinion among developers? Because Google's probably been the most innovative company of the 21st century so far.

bugbuddy
1 replies
12h58m

Yes, I miss Stadia so much. It was the most innovative streaming platform I had ever used. I wished I could still use it. Please, Google, bring Stadia back.

teaearlgraycold
0 replies
11h19m

They’re renting out the tech to 3rd parties

mirekrusin
2 replies
14h32m

Not mentioning Meta, the good guy now, is scandalous.

X is not going to sit quietly either.

There is also the rest of us.

riffraff
1 replies
12h20m

X is tiny compared to Apple/Meta/Google, both in engineering size and in "fingerprint" on people's lives.

Also, engineering-wise, currently every tweet is followed by a "my nudes in profile" reply and X seems unable to detect it as trivial spam; I doubt they have the chops to compete in this arena, especially after the mass layoffs they experienced.

mirekrusin
0 replies
10h35m

By X I mean one guy with big pocket who won't sit quietly - I wouldn't underestimate him.

miki123211
0 replies
14h39m

Google and Apple also have an "API access" advantage. It is similar to the ecosystem advantage but goes beyond it; Google and Apple restrict third-party app makers from access to crucial APIs like receiving and reading texts or interacting with onscreen content from other apps. I think that may turn out to be the most important advantage of them all. This should be a far bigger concern for antitrust regulators than petty squabbles over in-app purchases. Spotify and Netflix are possible (if slightly inconvenient) to use on iOS, a fully-featured AI assistant coming from somebody who isn't Apple is not.

Google (and to a lesser extent also Microsoft and Meta) also have a data advantage: they've been building search engines for years, and presumably have a lot more in-house expertise on crawling the web and filtering the scraped content. Google can also require websites which wish to appear in Google search to also consent to appearing in their LLM datasets. That decision would even make sense from a technical perspective: it's easier and cheaper to scrape once and maintain one dataset than to have two separate scrapers for different purposes.

Then there's the bias problem, all of the major AI companies (except for Mistral) are based in California and have mostly left-leaning employees, some of them quite radical and many of them very passionate about identity politics. That worldview is inconsistent with a half of all Americans and the large majority of people in other countries. This particularly applies to the identity politics part, which just isn't a concern outside of the English-speaking world. That might also have some impact on which AI companies people choose, although I suspect far less so than the previous two points.

hwbunny
0 replies
12h32m

ahem...zzzzzzzz

loudmax
10 replies
1d

The videos in this demo are pretty neat. If this had been announced just four months ago we'd all be very impressed by the capabilities.

The problem is that these video clips are very unimpressive compared to the Sora demonstration which came out three months ago. If this demo was announced by some scrappy startup it would be worth taking note. Coming from Google, the inventor of the Transformer and owner of the largest collection of videos in the world, these sample videos are underwhelming.

Having said that, Sora isn't publicly available yet, and maybe Veo will have more to offer than what we see in those short clips when it gets a full release.

fakedang
4 replies
23h50m

Honestly, if Veo becomes public faster than Sora, they could win the video AI race. But what am I wishfully thinking - it's Google we're talking about!

spaceman_2020
2 replies
11h50m

The cost to switch to new models is negligible. People will switch to Sora if its better instantly

I’ve switched to Opus from GPT-4 for coding and it was non-trivially easy

ndls
0 replies
11h33m

I think you used non-trivially wrong there, bud.

SilverSlash
0 replies
8h57m

Except your single experience doesn't mean it's generally true, bud. For instance, I have not switched to Opus despite claims that it is better, because I don't want to go through the effort of cancelling my ChatGPT subscription and subbing to Claude. Plus, I like getting the new stuff OpenAI occasionally gives out early, and the same could apply to Google's AI.

Jensson
0 replies
22h53m

But what am I wishfully thinking - it's Google we're talking about!

Google, the company known to launch way too many products? What other big company launches more stuff early than they do? What people complain about with Google is that they launch too much and then shut it down, not that they don't launch things.

alex_duf
4 replies
9h51m

these sample videos are underwhelming

wow, the speed at which we can be blasé is terrifying. 6 months ago this was not possible, and it felt like it was years away!

They're not underwhelming to me, they're beyond anything I thought would ever be possible.

are you genuinely unimpressed? or maybe trying to play it cool?

danielbln
1 replies
8h0m

The faster the tech cycle, the faster we become accustomed to it. Look at your phone, an absolute, wondrous marvel of technology that would have been utterly and totally sci-fi just 25 years ago. Yet we take it for granted, as we do with all technology eventually. The time frames just compress is all, for better or for worse.

newswasboring
0 replies
7h5m

Yeah man, but there has to be some threshold. We take phones for granted after years of active availability. I personally remember days when "what if your phone dies" was a valid concern for even short periods, and I'm not that old. Sora isn't even available publicly. At some point it crosses over from being jaded to just being a cynic.

steamer25
0 replies
39m

They didn't really do a very good job of selecting marketing examples. The only good one, that shows off creative possibilities, is the knit elephant. Everything else looks like the results of a (granted fairly advanced) search through a catalog of stock footage.

Even search, in and of itself, is incredibly amazing but fairly commoditized at this point. They should've highlighted more unique footage.

loudmax
0 replies
5h59m

On some level, it's healthy to retain a sense of humility at the technological marvels around us. Everything about our daily lives is impressive.

Just a few years ago, I would have been absolutely blown away by these demo videos. Six months ago, I would have been very impressed. Today, Google is rolling out a product that seems second best. They're playing catch-up in a game where they should be leading.

I will still be very impressed to see videos of that quality generated on consumer grade hardware. I'll also be extremely impressed if Google manages to roll out public access to this capability without major gaffes or embarrassments.

This is very cool tech, and the developers and engineers that produced it should be proud of what they've achieved. But Google's management needs to be asking itself how they've allowed themselves to be surpassed.

htrp
9 replies
1d

Was anyone else confused by that Donald Glover segment? It felt like we were going to get a short film, and we got 3-5 clips?

ZiiS
3 replies
1d

Also, it is either very good at generating living people or they need to put more thought into saying "Note: All videos on this page were generated by Veo and have not been modified".

jsheard
2 replies
1d

That "footage has not been modified" statement is probably to get ahead of any speculation that it was "cleaned up" in post, after it turned out that the Sora demo of the balloon headed man had fairly extensive manual VFX applied afterwards to fix continuity errors and other artifacts.

iamdelirium
1 replies
23h35m

Wait, where did you hear this? I would assume something like this would have made somewhat of a splash.

jsheard
0 replies
23h33m

The studio was pretty up front about it; they released a making-of video one day after debuting the short which made it clear they used VFX to fix Sora's errors in post, but OpenAI neglected to mention that in their own copy, so it flew under the radar for a while.

https://www.youtube.com/watch?v=KFzXwBZgB88

https://www.fxguide.com/fxfeatured/actually-using-sora/

> While all the imagery was generated in SORA, the balloon still required a lot of post-work. In addition to isolating the balloon so it could be re-coloured, it would sometimes have a face on Sonny, as if his face was drawn on with a marker, and this would be removed in AfterEffects. similar other artifacts were often removed.

Keyframe
1 replies
1d

It felt AI-generated.

htrp
0 replies
1d

I wish it were AI Donald Glover talking and the "Apple twist" at the end was that the entire 3 minute segment was a prompt for "Donald Glover talking about how Awesome Gemini Models are in a California vineyard"

thisoneworks
0 replies
1d

Yeah that wasn't obvious what they were trying to show. Demis said feature films will be released in a while

jsheard
0 replies
1d

And those clips mostly look like generic stock footage, not something specific that a director might want to pre-vis.

This is what movie pre-vis is actually like, it doesn't need to be pretty, it needs to be precise:

https://www.youtube.com/watch?v=KMMeHPGV5VE

curiousgal
0 replies
1d

Exactly!

"Hey guys big artist says this is fine so we're good"

mccraveiro
8 replies
1d

They didn't show any human videos, which could indicate that the technology struggles with generating them.

dyauspitr
2 replies
23h5m

You know why and it’s not that their technology struggles with it.

lewispollard
1 replies
11h30m

Please elaborate, because I certainly don't.

blinky88
0 replies
10h51m

I think he's talking about the diversity controversy

revscat
0 replies
23h50m

I’m sure part of the reason, beyond those given already, is that they want to avoid the debate around nudity.

mjfl
0 replies
21h6m

thank goodness.

karmasimida
0 replies
1d

Actually there is one in the last demo. It is not an individual one, but one shot in the demo where a team uses this model to create a scene with a human in it: they created an image of a Black woman, but only from her head up.

I would generally agree though, it is not normal that they didn't show more humans.

himinlomax
0 replies
20h47m

They're probably still wary of their latest PR disaster, the inclusive and diverse WW2 Germans from Gemini.

chubot
0 replies
22h33m

It's also probably that it's easier to spot fake humans than to spot fake cats or camels. We are more attuned to the faces of our own species

That is, AI humans can look "creepy" whereas AI animals may not. The cowboy looks pretty good precisely because it's all shadow.

CGI animators can probably explain this better than I can ... they have to spend way more time on certain areas and certain motions, and all the other times it makes sense to "cheat" ...

It explains why CGI characters look a certain way too -- they have to be economical to animate

airstrike
7 replies
1d

Veo

Sign up to try VisionFX

Is it Veo or VisionFX? Is it a sign up, a trial, or a waitlist?

How hard can it be to write a clear message? In the words of Don Miller, if you confuse, you lose.

peppertree
1 replies
22h55m

This is very on-brand with how Google does branding. "are you confused yet? no? try this other vaguely similar name."

davidw
0 replies
21h14m

Maybe it's going to be a new messaging app - but with AI!

Kidding... I signed up for the waitlist. I have ideas for videos I'd like to use to explain things that I have no hope of creating myself.

BlackJack
1 replies
22h51m

Disclaimer: I work at Google on related stuff

Veo is the name of a video model. VideoFX is the name of a new experimental tool at labs.google.com, which uses Veo and lets you make videos.

Thanks for the feedback though, I see how it's confusing for users.

zb3
0 replies
22h41m

I see the endpoint returns "Not Implemented" when trying to make a video :<

Imagen 3 is awesome though, generates nice logos :D

therein
0 replies
23h48m

Yeah I was like so is it Veo or VisionFX.

This landing page feels as haphazardly put together as the Coinbase downtime page last night.

qingcharles
0 replies
20h33m

And: Communication isn't what you say, it's what people hear

Agree this is totally confusing.

mike_hearn
0 replies
22h48m

Presumably this is DeepMind vs Labs fighting over the same project. A consequence of guaranteeing Demis some level of independence when DeepMind was bought, which still shows through in the fact that the DeepMind brand(s) survive.

londons_explore
1 replies
14h22m

Looks like in places this has learned video compression artifacts...

exodust
0 replies
11h44m

Funny if true. Perhaps in some generated video it will suddenly interrupt the sequence with pretend unskippable ads for phone cases & VPNs.

candiddevmike
1 replies
16h48m

For some reason this video reminds me of dreaming--details just kind of pop in and out and the entire thing seems very surreal and fractal.

jprete
0 replies
15h34m

Same impression here. The scene changes very abruptly from a sky view to following the car. The cars meld with the ground frequently, and I think I saw one car drive through another at one point.

nixpulvis
0 replies
15h49m

So… much… bloom. I like it, but still holy shit. I hate that I like it because I don’t want this art form to be reduced by overuse. Sadly, it’s too late.

I’ll just go back to living under a rock.

datashaman
0 replies
8h0m

1080p but it has pixelated artifacts...

aragonite
5 replies
22h41m

With so much recent focus by OpenAI/Google on AI's visual capabilities, does anyone know when we might see an OCR product as good as Whisper is for voice transcription? (Or has that already happened?) I had to convert some PDFs and MP3s to text recently and was struck by the vast difference in output quality. Whisper's transcription was near-flawless; all the OCR software I tried struggled with formatting, missed words, and made many errors.

nunez
0 replies
14h54m

This is a work of fucking art.

aragonite
0 replies
21h12m

This is so good - thanks for sharing this!

thesandlord
1 replies
20h44m

We use GPT-4o for data extraction from documents; it's really good. I published a small library that does a lot of the document conversion and output parsing: https://npmjs.com/package/llm-document-ocr

For straight OCR, it does work really well, but at the end of the day it's still not 100%.
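
For reference, a minimal sketch of the underlying call for straight OCR, using the OpenAI vision API directly rather than the library above (the prompt wording is just an example):

    # Send one rendered page image to GPT-4o and get a transcription back.
    import base64
    from openai import OpenAI

    client = OpenAI()

    def ocr_page(png_bytes: bytes) -> str:
        b64 = base64.b64encode(png_bytes).decode()
        resp = client.chat.completions.create(
            model="gpt-4o",
            messages=[{"role": "user", "content": [
                {"type": "text", "text": "Transcribe all text on this page."},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{b64}"}},
            ]}],
        )
        return resp.choices[0].message.content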

aragonite
0 replies
20h0m

Thanks! look forward to checking this out as soon as I get home.

Octokiddie
5 replies
14h53m

Oddly enough, I predict the final destination for this train will be for moving images to fade into the background. Everything will have a dazzling sameness to it. It's not unlike the weird place that action movies and pop music have arrived. What would have been considered unbelievable a short time ago has become bland. It's probably more than just novelty that's driving the comeback of vinyl.

rjh29
1 replies
14h27m

Even this site just did not impress me. I feel like it's all stuff I could easily imagine myself. True creativity is someone with a unique mind creating something you would never have thought of.

damsalor
0 replies
11h50m

Get a life

mFixman
0 replies
10h28m

AI-generated images and video are not competing against actual quality work with money put into it. They are competing against the quick Photoshop or Adobe After Effects work done by hobbyists and people learning to work in the creative arts.

I never heard HN claiming that Copilot will replace programmers. Why do so many people believe generative AI will replace artists?

jmathai
0 replies
14h10m

It's a lot more than novelty. It's dedicating the attention span needed to listen to an album track by track without skipping to another song or another artist. If that sounds dumb, give it time and you'll get there also.

It's not just technology though. Globalization has added so many layers between us and the objects we interact with.

I think Etsy was a bit ahead of their time. It's no longer a marketplace for handcrafted goods - it got overrun by mass produced goods masquerading as something artisan. I think the trend is continuing and in 5-10 years we'll be tired of cheap and plentiful goods.

hwbunny
0 replies
12h30m

Yeah, but if you bring up a generation or two on this trash, they will get used to it and think this will be the norm and gonna enjoy it like pigs at the troughs.

curiousgal
4 replies
1d

Oh look another half baked product release that's not available in any country. They're a joke.

mupuff1234
3 replies
1d

Is Sora available in any country?

bamboozled
1 replies
23h54m

I thought I read they've deemed Sora too dangerous to release pre-election? Or have reservations about it? I might be wrong…

sib
0 replies
21h14m

Sounds like a great excuse / communications strategy!

jaggs
0 replies
23h48m

Apparently it's only released to red teams at the moment as they try to manage safety. There's also the issue of releasing too close to an election.

svag
3 replies
21h54m

An interesting thing that Google does is watermark the AI-generated videos using SynthID technology (https://deepmind.google/technologies/synthid/).

It seems that SynthID is not only for AI-generated video but also for images, text, and audio.

bardak
1 replies
17h25m

I would like a bit more convincing that the text watermark will not be noticeable. AI text already has issues with using certain words too frequently. Messing with the weights seems like it might make the issue worse.

Tostino
0 replies
17h15m

Not to mention, when does it get applied? If I am asking an LLM to transform some data from one format to another, I don't expect any changes other than the format.

padolsey
0 replies
13h54m

It seems really clever, especially the encoding of a signature into LLM token-probability selections. I wonder if SynthID will trigger some standardization in the industry. I don't think there's much incentive to, though. Open-source gen AI will still exist. What does Google expect to occur? I guess they're just trying to present themselves as 'ethically pursuing AI'.
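
A toy sketch of the general idea, in the spirit of published "green list" text-watermarking schemes (SynthID's actual algorithm is Google's own and differs in its details):

    # A secret key plus the previous token seeds a "green list" of token IDs
    # whose logits get a small boost; a detector that knows the key counts
    # how often the output tokens land in the green list.
    import hashlib, random

    def green_list(key: str, prev_token: int, vocab: int, frac: float = 0.5) -> set:
        seed = hashlib.sha256(f"{key}:{prev_token}".encode()).digest()
        rng = random.Random(int.from_bytes(seed, "big"))
        return set(rng.sample(range(vocab), int(vocab * frac)))

    def bias(logits: list, key: str, prev_token: int, delta: float = 2.0) -> list:
        greens = green_list(key, prev_token, len(logits))
        return [x + delta if i in greens else x for i, x in enumerate(logits)]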

runeks
2 replies
11h21m

Didn't the model ever fail to generate realistic-looking content?

If I didn't know better, I'd think you just cherry-picked the prompts with the best-looking results.

carschno
1 replies
11h14m

What you see there is a product, not the scientific contribution behind it. Consequently, you see marketing material, not a scientific evaluation.

tsurba
0 replies
10h11m

Unfortunately, the majority of scientific papers for e.g. image generation have also had completely cherry-picked examples for a long time now.

neverokay
2 replies
7h26m

I really just need to make some porn with this stuff already, and I feel like we're all tiptoeing around this key feature.

Censored models are not going to work and we need someone to charge for an explicit model already that we can trust.

ranyume
1 replies
3h41m

Noo! Think about the children!

(this post is sarcastic)

neverokay
0 replies
1h8m

If they cared about the kids they would get out ahead of this before it spreads like wildfire.

indy
2 replies
1d

As someone who doesn't live in the US this year's Google IO feels like I'm outside looking in at all the cool kids who get to play with the latest toys.

roynasser
0 replies
1d

VPN'd right into that playground, turns out the toys were pretty blah

numbers
0 replies
22h38m

don't feel left out, we're all on the wait lists

hipadev23
2 replies
23h34m

I've never had to click "Sign in" so many times in a row.

flying_whale
1 replies
23h28m

...and then fill out an actual google form at the end, _after_ you've already signed in, to be added to the waitlist :sigh:

throwup238
0 replies
21h49m

...and enter your email into the form again despite being logged into a Google account.

willsmith72
1 replies
23h45m

all of this stuff i'll believe when it's ready for public release

1. safety measures lead to huge quality reductions

2. the devil's in the details. you can make me 1 million videos which look 99% realistic, but it's useless. consumers can pick it instantly, and it's a gigantic turn-off for any brand

aprilthird2021
0 replies
21h11m

There'll always be a market for cheap low-quality videos, and vice versa, always a market for shockingly high-quality videos. K. Asif's Mughal-e-Azam had enormous ticket sales and a huge budget spent on all sorts of stuff, like actual gold jewelry to make the actors feel that they were important, despite the film being black and white.

No matter how good AI gets, it will never be the highest budget. Hell, even technically more accurate quartz watches cannot compete price-wise with mechanical masterpiece watches of lower accuracy.

tauntz
1 replies
22h32m

Uh.. First it tells me that I can't sign up because my country isn't supported (yay, EU) and I can sign up to be notified when it's actually available. Great; after I complete that form, I get an error that the form can't be submitted and I'm taken to https://aitestkitchen.withgoogle.com/tools/video-fx where I can only press the "Join our waitlist" button. This takes me to a Google Form that doesn't have my country in the required country dropdown and has a hint that says: "Note: the dropdown only includes countries where ImageFX and MusicFX are publicly available.". Say what?

Why does this have to be so confusing? Is the name "Veo" or "VideoFX"? Why is the waitlist for VideoFX telling me something about public availability of ImageFX and MusicFX? Why is everything US only, again? Sigh..

pelorat
0 replies
21h36m

We can blame the EU AI act and other regulations for that.

solatic
1 replies
1h29m

It's critical to bring technologies like Veo to the world responsibly. Videos created by Veo are watermarked using SynthID, our cutting-edge tool for watermarking and identifying AI-generated content

And we're supposed to believe that this is resilient against prompt injection?

How do you prevent state actors from creating "proof" that their enemies engaged in acts of war, and they are only engaging in "self-defense"?

dmix
0 replies
1h12m

Nation states can run their own models if not now very soon. This isn't something you're going to control via AI-safety woo woo.

sebzim4500
1 replies
1d

The Donald Glover segment might be a new low for Google announcement videos. They spent all this time talking up the product but didn't actually show what he had created.

Imagine how bad the model must be if this is the best way Google can think of selling it.

fakedang
0 replies
23h54m

What seems worse is the Google TextFX video with Lupe Fiasco. What the heck am I supposed to get out of watching boring monologues by a couple of people? They could have just as easily shown, with less camera work, Lupe Fiasco actually using the LLM model, but they didn't - or at least not enough to grab my attention in 2 minutes.

Personally, I liked the above link, even as a Google skeptic, but the videos aren't helping their case.

rishav_sharan
1 replies
1d

Now that the first direct competitor to Sora has been announced, I am sure Sora will suddenly be ready for public consumption, all its AI safety concerns forgotten.

sebastiennight
0 replies
20h45m

I think there's a tremendous compute cost associated with both models still... I can't see how either company could withstand the instant enormous demand, even if they tried to command crazy prices.

Even at $1 per 5-second video, I think some use cases (including fun/non-business ones) would still overwhelm capacity.

nosmokewhereiam
1 replies
17h45m

Made an album in 10 mins. Typically, as a techno DJ, I'd mix the tracks together, so they sound kinda bare right now.

Here's my 10 minutes to 12:09 album debut:

https://on.soundcloud.com/FAXkJrLrC2JjoAyu7

TazeTSchnitzel
0 replies
17h28m

Even as a very inexperienced musician I think I can say these are not very compelling examples? They sound like unfinished sketches that took a few minutes to make each, but with no overarching theme and weirdly low fidelity. An absolute beginner could make better things just by messing around with a groovebox.

miohtama
1 replies
21h2m

Veo's cutting-edge latent diffusion transformers reduce the appearance of these inconsistencies, keeping characters, objects and styles in place, as they would in real life.

How is this achieved? Is there temporal memory between frames?

hackerlight
0 replies
17h2m

Probably similar to Sora: a patchified vision transformer where you sample a 3D patch (the third dimension is time) instead of a 2D patch.
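
A minimal sketch of that spatiotemporal patchification in PyTorch (the patch sizes are arbitrary; this only illustrates the tokenization, not the transformer itself):

    # Cut a (channels, frames, height, width) video into 2x16x16 patches,
    # flattening each patch into one token.
    import torch

    def patchify(video: torch.Tensor, pt: int = 2, ph: int = 16, pw: int = 16):
        c, t, h, w = video.shape
        x = video.reshape(c, t // pt, pt, h // ph, ph, w // pw, pw)
        x = x.permute(1, 3, 5, 0, 2, 4, 6)      # (T', H', W', c, pt, ph, pw)
        return x.reshape(-1, c * pt * ph * pw)   # (num_patches, patch_dim)

    tokens = patchify(torch.randn(3, 16, 128, 128))  # -> shape (512, 1536)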

iamleppert
1 replies
1d

It's so bad its laughable. Sundar really needs to crack the whip harder on those Googlers.

bamboozled
0 replies
1d

or someone has to crack the whip on Sundar :)

gliched_robot
1 replies
21h58m

This is far superior to Sora; there is no comparison.

monkeeguy
0 replies
20h55m

lol

endisneigh
1 replies
1d

I've noticed that a lot of the commentary on these models creates the sort of fervor you see around politics or sports.

In any case - no details on compute needed. Curious if this ever can be cheap. Even Midjourney still requires a lot.

I’m also surprised there hasn’t been some attempt at creating benchmarks for this. One example could be color accuracy.

stefan_
0 replies
23h55m

Never mind no benchmarks; half of these announcements in the past were straight made up: "offline enhanced", cherry-picked "examples", CGI fantasies.

Not to mention the whole AGI topic is forever doomed by sci-fi fans; just remember what happened with that room-temperature superconductor.

barbariangrunge
1 replies
23h36m

The company that controls online video is announcing a new tool, and ambitions to develop it further, to create videos without the need for content creators. Using their videos to make a machine that will cut them out of the loop.

infinitezest
0 replies
17h52m

Makes the very long Acknowledgments section at the bottom extra rich.

TheAceOfHearts
1 replies
13h0m

ImageFX fails at both of my tests:

1. Generating an image of "a group of catgirls activating a summoning circle". Anything related to catgirls tends to get tagged as sexual or NSFW so it's censored. Unsurprising.

2. The lamb described in Book of Revelation. Asking for it directly or pasting in the passage where the lamb is described both fail to generate any images. Normally this fails because there's not much art of the lamb from Book of Revelation from which the model can steal. If I gave the worst of artists a description of this, they'd be able to come up with something even if it's not great.

Overall, a very disappointing release. It's surprising that despite having effectively infinite money this is the best that Google is able to ship at the moment.

SomaticPirate
0 replies
12h53m

I think this comment is peak Hackernews… dripping with sarcasm and minimizing a significant engineering accomplishment

SoftTalker
1 replies
22h51m

Vaguely unsettling that the thumbnail for first example prompt "A lone cowboy rides his horse across an open plain at beautiful sunset, soft light, warm colors" looks something like the pixelated vision of The Gunslinger android (Yul Brynner's character) from the 1973 version of Westworld.

See 1:11 in this video https://www.youtube.com/watch?v=MAvid5fzWnY

Incidentally that was one of the early uses of computer graphics in a movie, supposedly those short scenes took many hours to render and had to be done three times to achieve a colorized image.

AceJohnny2
0 replies
22h48m

Can't say I see a visual similarity. In any case, "Cowboy silhouette in the sunset" is a pretty classic American visual.

But the parallel you made between android Brynner's vision and the generated imagery is fun to consider!

KorematsuFredt
1 replies
21h48m

I think we should all take a pause and just appreciate the amazing work Google, OpenAI, MS, and many others, including those in academia, have done. We do not know if Google or OpenAI or someone else is going to win the race, but unlike many other races, this one makes all of humanity move faster. Keep the negativity aside and appreciate the sweat and nights people have poured into making such things happen. The majority of these people are pretty ordinary folks working for a salary so they can spend their time with their families.

myaccountonhn
0 replies
16h16m

The majority of the people building the AI are artists having their work stolen, or workers earning extremely low wages to label gory and CSAM data to the point where it hurts their mental health.

Horffupolde
1 replies
1d

Google is the new Kodak.

bingbingbing777
0 replies
18h48m

Kodak failed because their CEO refused to go down the digital route. How is that comparable?

yoyopa
0 replies
10h9m

Stop with the ridiculous names; just use code numbers like BMW.

wseqyrku
0 replies
22h21m

Google puts more effort into the naming than the actual model, ngl.

totaldude87
0 replies
22h41m

It's 2024 and AI is taking over, and yet to sign up for this it takes way too many clicks and a Google Form entry.

Sigh. I still have hopes for Veo though.

toasted-subs
0 replies
20h18m

I could say something but I'm glad to get the confirmation.

thih9
0 replies
22h3m

Is there any non slow motion example?

The cyberpunk video seems better in that aspect, but I wish there were more.

sys32768
0 replies
23h8m

I assume for consumers to use this, we must agree to have product placements inserted into our productions every 48 seconds.

shaunxcode
0 replies
20h10m

truly removing the `id` from video.

sanjayk0508
0 replies
8h31m

It's a direct competitor to Sora.

s1k3s
0 replies
21h51m

This looks really good for promo videos. All scenes in here are basically that.

robertlagrant
0 replies
23h29m

Hold on to your papers!

rlhf
0 replies
16h42m

It seems so real, cool.

nbzso
0 replies
2h3m

How many billions of dollars and tons of water are wasted on this abomination and copyright theft?

mrkramer
0 replies
23h32m

YouTube people: We need more UGC.

DeepMind people: AI can do it.

moralestapia
0 replies
1d

Not nearly as good as Sora.

Google missed this train, big time.

makestuff
0 replies
23h11m

Is there any good blogs/videos that ELI5 how these video generation models even work?

m3kw9
0 replies
21h27m

Why is it always in slow motion? Is it hard to get the speed right?

infinitezest
0 replies
17h51m

A fast-tracking shot through a bustling dystopian sprawl

How apropos...

ijidak
0 replies
20h4m

These will be remembered as the AI wars.

Reminds me of the competition in tech in the late 80's early 90's between Microsoft and Borland, Microsoft and IBM, AMD and Intel, Word vs Wordperfect, etc.

It's a two horse race between Google and OpenAI.

iamleppert
0 replies
21h30m

Too little, too late. Google is a follower, not a leader. They need to stop trying, do more stock buybacks, and strip the company to barebones, like Musk did with Twitter and Tesla.

hehdhdjehehegwv
0 replies
1d

You have to log in just to see a demo? They are desperate to track people.

fidotron
0 replies
23h53m

If you were of a mind to give Google the benefit of the doubt you would have to think they are desperately trying not to overpromise and underdeliver, partly because that has been their track record to date. It's a very curious time to choose to make this switch though given their competition, and if it was motivated by the reception Bard received then it shows they didn't learn the right lessons from that mess at all.

esafak
0 replies
23h21m

I love the reference to Llama with the alpacas.

efitz
0 replies
15h52m

I’m surprised that the cowboy is not actually an Asian woman.

clawoo
0 replies
23h18m

"This tool isn’t available in your country yet"

How did I know I would see this message before clicking "Sign up to try"?

bpodgursky
0 replies
22h16m

I think it's funny the demos don't have people in them after the Gemini fiasco. I wonder if they didn't have time to re-train the model to show representative ethnicities.

benatkin
0 replies
22h26m

Yo lo veo.

belval
0 replies
23h31m

While it's cool that they chose to showcase full-resolution videos, they take so long to load I thought their videos were just a stuttery mess.

Turns out if you open the video in a new tab the smoothness is much more impressive.

animanoir
0 replies
20h4m

Google is so finished... Unless they remove Mr. Pinchar...

abledon
0 replies
20h7m

music is lacking.... suno, udio, riffusion all blow this out of the water

aaroninsf
0 replies
23h49m

It's mildly interesting how many of the samples shown fail to fully conform to the prompts. Lots of specifics are missing.

Kudos to Google for, if not foregrounding this, at least being entirely transparent about it.

TIPSIO
0 replies
1d

Seems like ImageFX, VideoFX (just a Google form and 3 demos), MusicFX, and TextFX at the links are down and not working.

Huge grammar error on front page too.

NegativeLatency
0 replies
21h28m

Shoulda used YouTube to host their videos; it's all broken and pixelated for me.

Dowwie
0 replies
7h13m

The Alpine Lake example is gorgeous

A4ET8a8uTh0
0 replies
14h30m

I was hoping to see more. I logged in and was greeted by a waitlist for videos. Since I was disappointed already, I figured I might as well spend some time on other, hopefully usable, features. So I moved to pictures.

First, a randomly selected 'feeling lucky' prompt got rejected because it did not meet some criteria, and a pop-up helpfully listed an FAQ to explain to me how I should be more sensitive to the program. I found it amusing.

Then I played with a couple of images, but it was nothing really exciting one way or another.

I guess you can color me disappointed overall. And no, I don't consider videos on repeat sufficient.