Nvidia CEO Jensen Huang announces new AI chips: ‘We need bigger GPUs’

paulpauper
65 replies
21h35m

Stock unchanged in afterhours. A lot of people were hoping for a big pop on some big development.

TheAlchemist
53 replies
21h7m

Well, the stock price is not a good short-term indicator of Nvidia's developments, nor of any company's for that matter. Nvidia is doing a very good job.

That being said, their stock is absolutely and hilariously overvalued.

costcofries
40 replies
20h58m

Tell me more about why you believe their stock is hilariously overvalued.

Workaccount2
15 replies
20h49m

They are priced as if they are the only ones who are capable of creating chips that can crunch LLM algos. But AMD, Google, Intel, and even Apple are also capable.

Apple is in talks with Google to bring Gemini to the iPhone, and it will obviously also be on Android phones. So almost every phone on earth is poised to be using Gemini in the near future, and Gemini runs entirely on Google's own custom hardware (which is at parity with or better than nVidia's offerings anyway).

jerf
10 replies
20h39m

This seems as good a place as any to be Corrected by the Internet, so... correct me if I'm wrong.

Making a graphics chip that is as good as Nvidia's: Very difficult. Huge moat, huge effort, lots of barriers, lots of APIs, lots of experience, decades of experience to overcome.

Making something that can run a NN: Much, much easier. I'd guess, start-up level feasible. The math is much simpler. There's a lot of it, but my biggest concern would be less about pulling it off and more around whether my custom hardware is still the correct custom hardware by the time it is released. You'd think you could even eke out a bit of a performance advantage in not having all the other graphics stuff around. LLMs in their current state are characterized by vast swathes of input data and unbelievably repetitive number crunching, not complicated silicon architectures and decades-refined algorithms. (I mean, the algorithms are decades refined, but they're still simple as programs go.)

I understand nVidia's graphics moat. I do not understand the moat implied by their stock valuation, that as you say, they are the only people who will ever be able to build AI hardware. That doesn't seem remotely true.

So... correct me Internet. Explain why nVidia has persistent advantages in the specific field of neural nets that can not be overcome. I'm seriously listening, because I'm curious; this is a deliberate Cunningham's Law invocation, not me speaking from authority.

smallmancontrov
2 replies
20h27m

I agree with you, but let me devil's advocate.

After 10 years of pretending to care about compute, AMD has filled the industry with burned-once experts who, when weighing Nvidia against competitors, instinctively include "likely boondoggle" against every competitor's quote, because they've seen it happen, possibly several times. Combine this with Nvidia's deep experience and huge rich-get-richer R&D budget keeping them always one or two architecture and software steps ahead, like it did in graphics, and their rich-get-richer TSMC budget buying them a step ahead in hardware, and you have a scenario where it continues to make sense to pay the green tax for the next generation or three. Red/blue/other rebels get zinged and join team "just pay the green tax." NV continues to dominate. Competitors go green with envy, as was foretold.

jerf
0 replies
4h35m

It's true that nobody has beaten nVidia yet, and that is a valid data point I don't deny.

But (as a reply to some other repliers as well), AMD was also chasing them on the entire graphics stack as well as compute. That is trying to cross the moat. Even reimplementing CUDA as a whole is trying to cross a moat, even a smaller one.

But just implementing a chip that does AI, as it stands today, full stop, seems like it would be a lot easier. There's a lot of people doing it and I can't imagine they're all going to fail. I would consider by far the more likely scenario to be that the AI research community finds something other than neural nets to run on and thus the latest hotness becomes something other than a neural net and the chips become much less relevant or irrelevant.

And with the valuation of nVidia basically being based not on their graphics, or CUDA, but specifically just on this one feeding frenzy of LLM-based AI, it seems to me there's a lot of people with the motivation to produce a chip that can do this.

htrp
0 replies
20h12m

burned-once experts

More like burned 2x / 3x / 4x by the "this time it's different" people.

Looking at you Intel

elorant
2 replies
20h2m

CUDA is a big reason for their moat. And that's not something you can build in a couple of years, no matter how much money you throw at it.

Without CUDA you have a chip that runs only in-house, without anyone outside having a clue how good it is - which is supposedly what Google does. Your only offering is cloud services. As big as that is, corporations will want to build their own datacenters.

sottol
0 replies
19h32m

Sure, CUDA has a lot of highly optimized utilities baked in (cuDNN and the like) and, maybe more importantly, implementors have a lot of experience with it. But afaict everyone is working on their own HAL/compiler and not using CUDA directly to implement the actual models; it's part of the HAL/framework. You can probably port any of these frameworks to a new hardware platform with a few man-years' worth of work imo, if you can spare the manpower.

I think nobody had the time to port any of these architectures away from CUDA because:

* the leaders want to maintain their lead and everyone needs to catch up asap so no time to waste,

* progress was _super_ fast so doubly no time to waste,

* there was/is plenty of money that buys some perceived value in maintaining the lead or catching up.

But imo: 1. progress has slowed a bit, maybe there's time to explore alternatives, 2. nvidia GPUs are pretty hard to come by, switching vendors may actually be a competitive advantage (if performance/price pans out and you can actually buy the hardware now as opposed to later).

In terms of ML "compilers"/frameworks, afaik there's:

* Google JAX/Tensorflow XLA/MLIR

* OpenAI Triton

* Meta Glow

* Apple PyTorch+Metal fork

sangnoir
0 replies
17h39m

CUDA is a big reason for their moat.

Zen 1 showed that absolute performance is not the be-all and end-all metric (Zen lost on single-core performance vs Intel). A lot of people care about the bang-for-buck metric. If AMD can squeak out good-enough drivers for cards with good-enough performance at a TCO[1] significantly lower than Nvidia's, they break Nvidia's current positive feedback cycle.

1. Initial cost and cooling - I imagine for AI data center usage, opex exceeds capex.
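
A back-of-the-envelope sketch of that bang-for-buck comparison. Every figure below (card prices, power draw, electricity cost, lifetime) is a made-up placeholder, just to show the shape of the TCO calculation, not real pricing or benchmarks:

    def tco_per_perf(capex, watts, perf, years=4, usd_per_kwh=0.10, cooling_overhead=1.4):
        # Total cost of ownership per unit of performance:
        # purchase price plus electricity and cooling over the card's lifetime.
        hours = years * 365 * 24
        opex = watts / 1000 * hours * usd_per_kwh * cooling_overhead
        return (capex + opex) / perf

    # Hypothetical incumbent card vs. a cheaper "good-enough" challenger.
    incumbent = tco_per_perf(capex=30_000, watts=700, perf=1.0)
    challenger = tco_per_perf(capex=15_000, watts=750, perf=0.8)
    print(f"incumbent  $/perf: {incumbent:,.0f}")
    print(f"challenger $/perf: {challenger:,.0f}")

Whether opex really exceeds capex depends entirely on the real numbers plugged in.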

lmm
0 replies
19h8m

So... correct me Internet. Explain why nVidia has persistent advantages in the specific field of neural nets that can not be overcome. I'm seriously listening, because I'm curious; this is a deliberate Cunningham's Law invocation, not me speaking from authority.

To become a person who writes driver infrastructure for this sort of thing, you need to be a smart person who commits, probably, several of their most productive years to becoming an expert in a particular niche skillset. This only makes sense if you get a job somewhere that has a proven commitment of taking driver work seriously and rewarding it over multiple years.

NVidia is the only company in history that has ever written non-awful drivers, and therefore it's not so implausible to believe that it might be the only company that can ever hire people who write non-awful drivers, and will continue to be the only company that can write non-awful drivers.

imtringued
0 replies
10h8m

It doesn't. If NVIDIA doesn't work with SK Hynix to integrate PIM GDDR into their products they are going to die, because processing in memory is already a thing and it is faster and more scalable than GPU based inference.

bgnn
0 replies
20h13m

CUDA is/was their biggest advantage to be honest, not the HW. They saw the demand for super high-end GPUs driven by the Bitcoin mining craze thanks to CUDA, and it transitioned gracefully to AI/ML workloads. Google was much further ahead in seeing the need and developing TPUs, for example.

I don't think they have a crazy advantage HW-wise. A couple of start-ups are able to achieve this. If the SW infrastructure end is standardized, we will have a more level playing field.

__mharrison__
0 replies
12h17m

Anecdata... one of the folks sitting in front of me at a session at GTC claimed to be an AMD employee who also claimed to have previously worked on CUDA. He seemed skeptical that AMD would pull this off. This is the sort of fun stuff that you hear at a conference and aren't sure how much of it is just technical bragging/oneupmanship.

belter
2 replies
20h20m

Good luck with that. Gemini Advanced is simply unusable right now... It's so bad it's hard to believe nobody has picked up on that yet.

belter
1 replies
19h13m

Go to Gemini Advanced and try a common programming task in parallel with Claude and ChatGPT4. Within 2 prompts, Claude and ChatGPT4 will give you nice working code you can use as a basis, while Gemini Advanced will ignore your prompts, provide partial code, and quickly tell you it can do more, until you tell it exactly what you want. It will go from looking usable to being stuck in "I can do A or I can do B, you tell me what you prefer" hell in less than 2 or 3 prompts... Unusable. And I say that as a paying customer who will soon cancel the service.

Workaccount2
0 replies
19h2m

You're not wrong, but it wouldn't be surprising if Google irons things out with a few more updates. The point is that it would be foolish to write off Gemini right now, and Gemini is totally independent of Nvidia's dominance.

drexlspivey
0 replies
7h43m

AMD is even more hilariously overvalued, currently at 360 PE

xyst
10 replies
20h48m

Because their stock value is highly coupled with the crypto mining and AI crazes.

The move from PoW to PoS for most crypto networks, in combination with the bust of '22, slid NVDA down in value.

OpenAI debuted ChatGPT in late 2022, and now the stock is suddenly jumping in price as the hype and rush for GPUs from companies of all types buys up Nvidia's supply. Demand is far outpacing supply. Nvidia can't keep up.

Thus, the share price is brittle. The GPU market is dominated by Nvidia. That can change, but so far OpenAI loves using Nvidia for some reason.

ryandrake
9 replies
20h31m

If you are a true believer that AI is not a craze, then the stock can only go up from here. If you think there is a chance that everyone gets bored of AI and moves on to some other fad that is not in Nvidia’s wheelhouse, then it’s probably down from here. I’m staying out of this bet: don’t have the stomach for it.

AlexandrB
4 replies
20h1m

There's another case for pessimism as well: cost. It's possible that many AI applications aren't worth the money required for the extra compute. AI-enhanced search comes to mind here: how is Microsoft going to monetize users of Copilot in Bing to justify the extra cost? Right now a lot of this stuff is heavily subsidized by VCs or the MSFTs of the world, but when it comes time to make a profit we'll see what actually sticks around.

_factor
2 replies
15h51m

Better question: why does a simple search for “What color is a labrador retriever” require any compute time when the answer can be cached? This is a simple example, but 90% of my searches don’t require an llm to process a simple question.

jazzyjackson
1 replies
10h34m

One time I came across a git repo that let me download a gigabyte of prime numbers and I thought to myself, is that more or less efficient than me running a program locally to generate a gigabyte of prime numbers?

The compute for a direct answer like that is fractions of a penny, it might be better to create answers on the fly than store an index of every question anyone has asked (well, that's essentially what the weights are after all)

jacobr1
0 replies
19h24m

This seems true as far as incentives go. But how much of that cost driver will be due to efficiencies driven by companies like NVIDIA? They seem well poised to benefit from a lot of the increased (non-hype) use of AI. Seems like we spent a decade or more of stalled CPU performance gains chasing better energy efficiency in the data center, same story could play out here.

throw0101b
2 replies
18h9m

If you think there is a chance that everyone gets bored of AI and moves on to some other fad that is not in Nvidia’s wheelhouse, then it’s probably down from here.

You may wish to look at history to see how things can work out: Cisco had a P/E ratio of 148 in 1999:

* https://www.dividendgrowthinvestor.com/2022/09/cisco-systems...

The share price tanked, but that does not mean that people got bored of the Internet and the need for routers and switches. QCOM had a P/E of 166: did people decide that mobile communications was a fad?

The connection between technological revolutions and financial bubbles dates back to (at least) Canal Mania:

* https://en.wikipedia.org/wiki/Canal_Mania

* https://en.wikipedia.org/wiki/Technological_Revolutions_and_...

It is possible for both AI to be a big thing and for NVDA to drop.

throw0101b
0 replies
6h45m

Neither of these were technologically based speculations.

https://en.wikipedia.org/wiki/Tulip_mania

While widely used as an example, most of the well-known stories about this were actually made up, and it wasn't as bad as it is often made out to be.

Quinn and Turner, when they wrote about bubbles:

* https://www.goodreads.com/book/show/48989633-boom-and-bust

* https://old.reddit.com/r/AskHistorians/comments/i2wfsm/i_am_...

purposefully excluded it because their research found it wasn't actually a thing. (Though for the general public it can be an illustrative parable.)

partiallypro
0 replies
20h1m

AI is obviously the future, though current iterations will probably die at some point. But the dot-com bubble ended with the internet being more pervasive than may have even been thought of at the time, and regardless, even the likes of Amazon's stock went bust before it recollected itself. Not a perfect comparison given Nvidia has really good revenue growth, but the point still stands.

smallmancontrov
7 replies
20h40m

No moat.

Yes, CUDA, but CUDA is maaaaaybe a few tens of billion USD deep and a few (more) years wide. When the rest of the industry saw compute as a vanity market, that was sufficient. Now, it's a matter of time before margins go to, uhhh, less than 90%.

Does that make shorting a good idea? I wouldn't count on it. The market can always remain irrational longer than you can remain solvent.

tiahura
4 replies
20h6m

And MS and everyone else have plenty of interest in helping AMD commodify CUDA compatibility.

stefan_
3 replies
18h49m

It's weird that it's taking them so long, because as far as anyone can tell AMD is mostly competent enough to make GPUs within some percentage points of Nvidia's, the "breadth of complexity" in what these things do at the end of the day is ... rather underwhelming, and the software stack may appear to be changing all the time but is also distinctly JavaScript-frontend-esque... Is there an insider who knows what the holdup is? Is AMD just averse to making a ton of money?

At this point AMD investors should be rebelling: it's pissing money out there but they are not getting wet, and management might have doubled the stock price, but that's little consolation if "order of magnitude" is what could have been.

wkat4242
0 replies
15h34m

It's kinda great for those of us wanting GPUs though. Nvidia might eventually decide it's not worth their time to bother with.

treprinum
0 replies
8h8m

AMD pays very little to its SW engineers (principal engineer in SFBA for ~200k), so they can't attract top-end people in SW to implement what they need. Semi companies are used to paying HW engineers peanuts, and that doesn't work in SW.

sangnoir
0 replies
17h0m

At this point AMD investors should be rebelling

Looking at the chart for $AMD over the past 5 years gives plenty of reasons to be happy, and no reason to rebel. A rational AMD investor shouldn't be jonesing over Nvidia catching lightning in a bottle via crypto + AI. The Transformers paper was published a few months before AMD released the Zen 1 chips - they did not have a lot of money for GPU R&D then.

The timing of the LLM-craze was very fortuitous for Nvidia.

yen223
0 replies
17h36m

I used to think that CUDA was something that would get commoditised real fast. How hard could building it be?

However, given that the nearest competitor AMD has basically given up on building a CUDA alternative, despite the fact that this could grow the company by literal trillions of dollars, I suspect the CUDA moat is much bigger than I give it credit for.

cma
0 replies
20h6m

They also bought infiniband which has played a big role in being the best at clustering, though Google's TPU reconfigurable topology stuff seems really cool too.

Tesla went after them with Dojo and has still ended up splurging on big H100 clusters.

swalsh
1 replies
20h46m

A 72 P/E ratio while they have a mere monopoly on one of the most valuable resources in the world.

Competition WILL come. Maybe it's Groq, maybe AMD, maybe Cerebras. Maybe there's a stealth startup out there. Point is, they're going to be challenged soon.

htrp
0 replies
20h12m

You and what fab?

It's almost impossible to manufacture at scale with good yields and leading edge fabs are almost all bought out.

TheAlchemist
1 replies
20h40m

Their market cap is 2.2T $.

In the past year, they had revenue of 60B $ and net income of 30B $. Absolutely amazing numbers, I agree. The year before, they had revenue of 30B $ and net income of 4.5B $ - and it was a rather good year. What happens next of course depends on how you judge the situation - was this peak hype demand? Will it stabilize now? Grow at the current extraordinary rates?

Scenario 1 - margins get back to normal due to hype going down, competition improving etc - in this case the company is worth at best ~200B $ - or 1/10 of what it is now.

Scenario 2 - they maintain current revenue and the exceptional margins - the company would be worth ~1T - or 1/2 of what it is now.

Scenario 3 - their current growth rate (based on the past 12 months) continues for ~5 years. In this case the company is worth ~2T $.
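
A rough sketch of how round numbers like these fall out of simple price-to-earnings arithmetic. The trailing revenue and income are the figures above; the margins and P/E multiples are illustrative assumptions, picked only so the output lands near the scenarios' round numbers:

    revenue = 60e9        # trailing 12-month revenue (from above)
    net_income = 30e9     # trailing 12-month net income (from above)

    # Scenario 1: margins revert to a more "normal" semiconductor level.
    assumed_normal_margin = 0.15
    assumed_mature_pe = 20
    scenario_1 = revenue * assumed_normal_margin * assumed_mature_pe   # ~$180B

    # Scenario 2: today's revenue and exceptional margins both hold.
    assumed_steady_pe = 30
    scenario_2 = net_income * assumed_steady_pe                        # ~$900B

    for name, value in [("Scenario 1", scenario_1), ("Scenario 2", scenario_2)]:
        print(f"{name}: ~${value / 1e9:.0f}B")

Scenario 3 is effectively today's price: if that growth actually materialises, the current ~2T $ market cap is roughly what you'd be paying for it.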

But they are in a business where most of the money comes from a handful of customers, all of which are working on similar chips - and given the sums in play now, the incentives are *very* strong.

My opinion is that the company is already priced for perfection - basically the current price reflects the perfect scenario. I struggle to see any upside, unless we have AGI in the next 5 years and it decides it can only run on Nvidia chips.

All of this is akin to Tesla in past years. They grew from a small startup to a medium-sized car maker - the % growth rate was huge of course, an amazing achievement in itself. But people projected that the % growth rate would continue - and the stock was priced accordingly. Reality is catching up with Tesla, even if some projections are still absolutely crazy.

CamperBob2
0 replies
15h18m

It does no good to design similar or even superior chips if you can't get them fabbed. How much of the world's fab capacity has Nvidia already reserved?

Takennickname
0 replies
20h54m

Because he missed the train. My guess.

Blammar
11 replies
19h9m

NVDA's forward PE is ~37, about what it has been for the past ~5 years I've been tracking that. So it's not overpriced based on that metric.

If you're convinced the stock is that overvalued, go short some or, if you like to live dangerously, buy some long-term put options (don't be an idiot and buy short-term options.)

I have no idea if NVDA is like Cisco Systems in 2000, or if it's something unique. What I am aware of is that there's around 5-7 trillion that were moved from stocks to t-bills since the Fed raised rates in March 2022. If and when they drop their rates back to the historical ~2.5%, it's reasonable to predict these funds will go back into stocks, which will presumably drive up prices.

TheAlchemist
7 replies
18h51m

That's exactly what I'm saying below - the PE is still very high, hence still projecting past growth into the future. But the scale has changed a bit. A 37 PE ratio is extremely high by historical standards - that was reserved for very promising small startups, not for 2T companies. I know this got distorted in the past 15 years by abnormally low interest rates, but sooner or later it will come back to something that makes sense.

Buying long-term put options on Nvidia now is extremely expensive - the stock has been so volatile that the price you pay for those options almost annihilates any gains you could expect, even if the stock loses 50% in 12 months.

You got me curious about those 5-7 trillions. Where do these numbers come from?

solumunus
3 replies
12h21m

People need to stop focusing on "historical standards". For better or worse, retail entering the market en masse (often with options trading) has created a new standard. The market over the last 10 years is the new normal; the smart people have worked this out and are making huge returns. TSLA at its peak had a WAY higher PE than NVDA does now, and NVDA is just as popular, with even stronger fundamentals.

TheAlchemist
2 replies
10h9m

For better or worse, human psychology doesn't change that much - those historical standards very much apply today.

For us older folks, we've seen this 'new normal' several times already - it will end as usual. There are no free lunches, and it appears to me that we have not entered any permanently high plateau.

It's even quite funny that ~100 years ago we had the previous big pandemic, and the biggest stock market crash. This century's epidemic is done; now we're waiting for the second part!

solumunus
1 replies
10h7m

No, that's objectively wrong. You simply haven't seen retail involvement in the market on this level before, not even close. It's a new precedent; the market has changed.

To clarify, I'm not saying NVDA won't crash from here or that bear markets no longer exist. I'm simply saying that historic PE valuations are a poor metric for assessing the potential of a stock in today's market conditions.

TheAlchemist
0 replies
9h15m

Well, if you use the word "objectively", I would expect some numbers to back it up.

Yes, it's probably the first time that retail is allowed to trade options. But it's not the first time that retail is all in on stocks. I've tried to find a funny number to back it up - just check the Wiki on the 1929 crash - https://en.wikipedia.org/wiki/Wall_Street_Crash_of_1929 - there was more money lent to 'small investors' so they could buy on margin ... than the entire amount of currency circulating at the time.

On average, all those retail guys will lose money - that's the sad truth. In the long term, stocks simply follow earnings - all other movements around this trend are pretty much a zero-sum game - and the most skilled operators are not losing money in that game.

The price-to-earnings ratio is just the number of years the company takes to 'pay for itself' if you buy it. A PER in the 40s for big chunks of the main indexes means that either there will be tremendous progress in the economy that boosts earnings, or people are hoping to resell to a bigger fool.

Note: I work in finance, and I very much see the retail involvement in stocks. Hedge Funds and banks, are making a ton of money out of them, that's for sure.

rytill
2 replies
16h22m

Then be the options seller. You can sell cash secured puts, or a put credit spread, or a call credit spread. Calls are even more expensive than puts right now.

TheAlchemist
1 replies
15h35m

Selling options is an even worse idea.

Frankly, I don't understand why we made it possible for individuals to gamble by selling options. As Charlie Munger used to say, Wall Street will sell shit as long as shit can be sold.

rytill
0 replies
10h39m

You would be right about selling naked options. But call/put credit spreads have bounded downside, just like buying options.

Selling cash secured puts or selling covered calls would be less risky than just holding stock.

dkrich
2 replies
15h29m

We are at a very unique time. The stock market has basically been in a bull market for 15 years with some very short-lived sell-offs along the way. During that time we've had some incredible innovations such as the iPhone, FANG stock dominance and unprecedented profitability for years.

You've also had three or four bona fide bubbles in that span, starting around 2017. First was Bitcoin along with the stock market as a whole (with Nvidia being one of the leading stocks of that bull market advance).

Then you had Tesla go parabolic and lots of people become rich. Then you had the whole post-COVID speculative mania.

The result of this has been extreme credulity by the average person. Today's keynote is the perfect summation of this phenomenon. I saw multiple people who almost certainly couldn't explain in any level of detail how Nvidia GPUs are used for training and inference, but rather rely on secondhand talking points like CUDA that they've learned from watching Jim Cramer, watching this keynote with excitement and anticipating how much it would pump their shares or call options.

Contrast this with Steve Jobs' keynotes from 15 years ago, when Apple's best days were well ahead of it. Most keynotes were questioned, in some cases even mocked. When Tesla stock broke out, many people couldn't make sense of it. Ditto for cryptocurrencies. But now, taking their cues from those cycles, the average person wants to ride the next bubble to riches, is trying to catch the wave, and so believes every story attached to a rising asset price.

CEOs aren't blind to this and are using every opportunity to create favorable storylines. The leadup to a keynote like this carries with it an enormous amount of pressure to deliver. Hence a company like Nvidia leaning into generative artwork and straight-up made-up storylines like robot development.

At the end of the day, I'm afraid that there likely isn't all that much substance and the evidence is beginning to pile up that the megacap tech stocks have run out of ideas which is why they are laying off people en masse and appealing to the AI hype cycle to carry their stocks higher.

Consider that Nvidia has gone up 8x - 800%! - in just over a year. The cycles are moving faster and faster. I remember just a year ago when lots of people said Nvidia at $250 was insane. Now here we are with the stock at more than three times that level and most people are calling it cheap. The stock market seems to have, in certain areas like semis, completely disconnected from the fundamentals and taken flight.

Yes, Nvidia earnings have grown. But understand that this is all part of a positive feedback loop where tech CEOs are pressured by their competitors and shareholders to show that they are investing in AI. Thus they all talk about it on their earnings calls and spend massively. All of their stocks rise in unison as you have a market that increasingly looks like it's chasing momentum stock trends up.

Nvidia's moves of late have almost nothing to do with any fundamental developments in the company. It has been routinely trading upwards of $45 billion a day. The Friday before last, that number was over $100 billion. These are absolutely insane figures. Compare that to Microsoft, the largest company by market cap in the world, which trades on average around $8 billion per day.

I think this is generally how bull markets end and I think we may be actually forming the top of the great bull market for the megacaps that began around 2010 but really hit its stride starting in 2017.

svnt
0 replies
5h4m

I’ll buy the credulity problem, and agree there is considerable risk in NVDA’s market position.

However they went up 8x because (neglecting crypto) they overnight transitioned from providing accessories to PC gamers and high end engineering workstations (both increasingly niche markets with tapering growth or decline) to being for the moment the only substrate of an entirely new consumer product segment that has seen the most rapid adoption of any new technology in the history of the world.

This could be the way things work now: the time constants shrink as the pipeline efficiency increases.

55555
0 replies
7h22m

Good post, thanks.

dxbydt
5 replies
21h12m

guy is messing it up bigtime and in real-time as well. sheesh. none of his jokes are landing. “we had 2 customers. we have more now”. long pause. screen behind him covered with logos of all his customers. pause. pause. finally applause. ok on to the next tidbit.

whole conference has been proceeding like this now. look if you invite cramer and the wall street crowd, you should throw in some dollar figures. like - who is paying for all this. how much. and why. talk is entirely about token generation bandwidth, exaflops and petaflops, data parallel vs tensor parallel vs pipeline parallel - do you honestly think cramer knows the difference between an ml pipeline and an oil pipeline ?

i am watching this conf with my kid - proper GenZ member - who got up after 5 mins and said man who is this comedian, his jokes are so bad, and left :(

smallmancontrov
0 replies
20h46m

You might not like it, but this is what peak performance looks like.

gmerc
0 replies
20h57m

Nah, Wall Street doesn't understand what it's looking at.

That's fine; it's a developer conference for a founder-led company that hasn't reached the "stock price is the product" state. He's not trying to optimize the next 5 days of stock.

There's a full ecosystem grab with NIM there, and a new GPU that forces every major datacenter to adopt it (or their competitors will massively increase their compute density).

anon291
0 replies
20h57m

This is a developer's conference, not a financial one.

Takennickname
0 replies
20h48m

Cramer is an entertainer. Not a developer or an investor.

Deasiomlo
0 replies
19h23m

Why would a Gen Z kid care about this conference?

It's not a hipster event.

And the pauses are clearly his style of presenting: an artificial pause to tell the audience it would be a good time to react.

You clearly don't like it; I don't mind it.

But just to be clear: we are seeing peak human ingenuity. This, right now, will be the history of how we as humans built AGI, full humanoid robots, etc. (in case we don't nuke ourselves).

You can read the conference results easily at any IT news portal.

There is no requirement for Nvidia to entertain you or your kid.

synergy20
0 replies
21h31m

Not only that, it lost steam during the day; maybe it was overheated too much and no more news can pump it up any further.

swalsh
0 replies
20h49m

At 2 trillion, it's all baked in already

rvz
0 replies
21h5m

A lot of people were hoping for a big pop on some big development

They are waiting for earnings projections for such a pop since right now it is extremely overbought and struggling to move past >$1,000 per share.

For now, Microsoft and OpenAI will use these chips, but in the long term they are just looking at this and plotting to build their own chips, reducing their dependence on Nvidia, and they will be ready to switch once their contracts have run out.

dagmx
0 replies
21h30m

I imagine it’ll pop in the morning

_factor
0 replies
15h59m

Nvidia is no secret. Whatever hidden value is in the stock is likely already represented.

cebert
56 replies
15h7m

“Nvidia … is becoming less of a mercenary chip provider and more of a platform provider, like Microsoft or Apple, on which other companies can build software.”

I can understand from a growth perspective why it’s more profitable for Nvidia if it can become more of a platform service for AI. However, it’s difficult to balance that with the partnerships the company already has with AWS and Microsoft. I’d expect to see some acquisitions or competing custom solutions in the future. Fortunately for Nvidia, a lot of AI is still dependent on CUDA. I’m interested to see how this plays out.

dheera
32 replies
15h0m

My prediction is that eventually there will be anti-trust litigation, they will be required to open the CUDA standard, after which AMD will become a competitor.

NVIDIA could voluntarily open the standard to avoid this litigation if they wanted to, though, and IMO it would be the smart thing to do, but almost every corporation in history has chosen the litigation instead.

nemothekid
19 replies
14h4m

My prediction is that eventually there will be anti-trust litigation, they will be required to open the CUDA standard, after which AMD will become a competitor.

If AMD isn't a competitor before government intervention, I don't think the government forcing Nvidia to open up CUDA changes much. CUDA's moat isn't due to some secret sauce - Nvidia put in the developer hours; and if AMD's CUDA implementation is still broken, people will continue to buy Nvidia.

There has been a lot of trying to get AMD to work - Hotz has been trying for a while now[1] and has been uncovering a ton of bugs in AMD drivers. To AMD's credit, they have been fixed, but it does give you a sense of how far behind they are in regards to their own software. Now imagine them trying to implement a competitor's spec?

[1] https://twitter.com/__tinygrad__/status/1765085827946942923

AYBABTME
15 replies
13h39m

I don't understand AMD in this. Isn't it insanity that they're not throwing all they've got at their software stack?

tempaccount420
5 replies
13h22m

Hardware people don't get along very well with software people.

elbear
4 replies
12h20m

Why's that?

rapsey
1 replies
11h25m

Because it is a different type of engineering. If you manage software development like you manage hardware development your software is going to be bad. That has always been AMD's problem and it is not likely to get fixed.

AYBABTME
0 replies
2h35m

2t$ problem of egos?

imtringued
1 replies
10h25m

Because they didn't go to uni when hardware-software-codesign was being taught.

nebula8804
0 replies
6h53m

What unis would that include? Isn't ATI Canadian? Therefore i'd expect lots of UToronto and Waterloo people there. Aren't they some of the best in this field?

roenxi
5 replies
12h15m

You know what happens to companies that panic and throw all their resources into knee-jerk software projects? I don't, but I'd predict it is ugly. Adding more people to a bad project generally makes it worse.

The issue that AMD has is they had a long period where they clearly had no idea what they were doing. You could tell just from looking at the websites: CUDA pretty much immediately gets to "here is a library for FFT", "here is a library for sparse matrices". AMD would explain that ROCm is an abbreviation of the ROCm Software Platform, or something unspeakably stupid. And that your graphics card wasn't supported.

That changed a few months ago, so it looks like they have put some competent PMs in the chair now or something. But it'll take months for the flow-on effects to reach the market. They have to figure out what the problems are, which takes months to do properly; then fix the software (1-3 months more minimum); then get it into the open and have foundational libraries like PyTorch pick it up (might take another year). You can speed that up, but more cooks in the kitchen is not the way. Bandwidth use needs to be optimised.

It isn't like ROCm lacks key features; it can technically do inference and training. My card crashes regularly though (might be a VRAM issue), so it is useless in practice. AMD can check boxes, but the software doesn't really work, and grappling with that organisationally is hard. Unless you have the right people in the right places, which AMD didn't have up to at least mid 2023.

elcomet
1 replies
10h16m

PyTorch has been supporting ROCm for the last 2 years

blagie
0 replies
7h27m

I'd add quotes there:

PyTorch has been "supporting" ROCm for the last 2 years

Certhas
1 replies
9h37m

Look at AMD vs Intel. They have now surpassed Intel in terms of CPUs sold and market cap. That was unthinkable even six, seven years ago.

It makes perfect sense that, organisationally, they were focused on that battle. If you remember the Athlon days, AMD beat Intel before, but briefly. It didn't last. This time it looks like they have beaten Intel and have had the focus to stay. Intel will come back and beat them in some cycles, but there is no collapse on the horizon.

So it makes sense that they started looking at nVidia in the last year or so. Of course nVidia has amassed an obscene war chest in the meantime...

nicoburns
0 replies
8h59m

AMD's graphics division was an acquisition though (ATI), and I understand that the company culture of that division might still be quite different.

makomk
0 replies
8h15m

AMD would explain that ROCm is an abbreviation of the ROCm Software Platform, or something unspeakably stupid. And that your graphics card wasn't supported.

If even that. A few years ago they managed to break basic machine learning code on the few commonly-used consumer GPUs that were officially supported at the time, and it was only after several months of more or less radio silence on the bug report and several releases that they declared those GPUs were no longer officially supported and they'd be closing the bug report: https://github.com/ROCm/ROCm/issues/1265

logicchains
0 replies
9h19m

It's a political problem. Good software engineers are paid more than good hardware engineers, but AMD management is unwilling to pay up to bring on good software engineers because then they'd also need to pay their hardware engineers more, otherwise the hardware engineers would be unsatisfied. If you check NVidia salaries online you'll see NVidia pays significantly more than AMD for both hardware and software engineers; it's a classic case of AMD management being penny-wise, pound-foolish.

imtringued
0 replies
10h22m

You have to remember that this only applies to cheap consumer GPUs; they tend to support their datacenter GPUs better. When you consider that Ryzen AI already eats the AI inference lunch, having better GPUs with better software only threatens to cannibalize their data center GPU offering. Given enough time, nobody will care about using AMD GPUs for AI.

e4325f
0 replies
7h20m

They did buy Nod.ai recently

matt-p
1 replies
6h2m

It changes a lot. It is not legal to make a 'CUDA' driver for an AMD GPU, as Nvidia owns CUDA. You can see there was an open implementation of this that AMD sponsored until they got threatened with a lawsuit by Nvidia.

dogma1138
0 replies
3m

ZLUDA bit the dust not because they implemented CUDA but because they were misusing compiled NVIDIA libraries.

If it was a clean-room implementation of the API, NVIDIA wouldn't care. Heck, that's exactly what AMD did with HIP.

But what you cannot do is essentially intercept calls to and reverse engineer NVIDIA binaries in real time because you can’t be arsed to build your own.

blitzar
0 replies
10h24m

Getting this working might be worth a trillion $ to AMD - they should be doing more than just waiting for a bootstrapped startup to debug their drivers for them.

anon291
7 replies
14h24m

The CUDA API is essentially open... HIP is basically a copy.

"CUDA" is such a misnomer for the moat. AMD doesn't have TensorRT, cuDNN, CUTLASS, etc. Forcing Nvidia to make these work on AMD is like forcing Microsoft to make Windows work on Apple hardware... Not going to happen.

alphabeta567
4 replies
9h20m

CUDA is not open. See what happened with ZLUDA.

coryrc
3 replies
9h1m

I'm not sure what you're implying. My understanding of the project is that AMD didn't want to invest in it anymore.

sgift
2 replies
8h21m

IMHO there's reason to believe that what was discussed here plays a role in that decision: https://news.ycombinator.com/item?id=39592689 - namely NVidia trying to forbid such APIs.

paulmd
0 replies
6h22m

that’s literally old news, it’s from ‘20 or ‘21 and just got noticed iirc

anon291
0 replies
4h58m

That has nothing to do with the API. The restriction there is you cannot use nvcc to generate nvidia bytecode, take that bytecode, decompile it, and translate it to another platform. This means that, if you use cuDNN, you cannot intercept the already-compiled neural network kernels and then translate those to AMD.

You can absolutely use the names of the functions and the programming model. Like I said, HIP is literally a copy. Llama.cpp changes to HIP with a #define, because llama.cpp has its own set of custom kernels.

And this is what I've said before, CUDA is hardly a moat. The API is well-known and already implemented by AMD. It's all the surrounding work: the thousands of custom (really fast!) kernels. The ease-of-use of the SDKs. The 'pre-built libraries for every use case'. You can claim that CUDA should be made open-source for competition, but all those libraries and supporting SDKs represent real work done by real engineers, not just designing a platform, but making the platform work. I don't see why NVIDIA should be compelled to give those away anymore than Microsoft should be compelled to support device driver development on linux.

wmf
1 replies
13h47m

They did force Microsoft to make Office work on Mac though... (Office for Mac already existed but I think MS agreed to not cancel it.)

pjmlp
0 replies
10h42m

It was more like Microsoft had the anti-trust stuff going on, and Apple was on the verge of going bankrupt.

wmf
3 replies
14h45m

It would be kind of genius for Nvidia to "open" the CUDA APIs (which have already been unofficially reverse engineered anyway) but not the code. Maybe they'd also officially support HIP and SYCL. Maybe they could open SXM after all competitors have already committed to OAM. They'd create the appearance of opening up while giving up very little.

sitkack
2 replies
12h27m

By "Opening Up" they cement their leadership position. AI frameworks are already targeting CL, SPIR-V, etc. The low level details will fade and so will Nvidias api dominance.

The MI300 smokes the H100 yet here we are.

incrudible
1 replies
9h38m

Just because they are a target doesn’t mean things just work. Historically, AMD hardware for GPGPU becomes obsolete well before the software landscape catches up. I am not going to risk my time and money finding out whether history repeats itself, just for a few potential FLOPS per dollar.

sitkack
0 replies
3h58m

Don't disagree, but it is nuts how AMD is leaving billions on the table by not finishing the project and writing the software.

wmf
20 replies
14h53m

I think they're planning for a world where half their customers (hyperscalers) just use GPUs and CUDA while the other half (the long tail) use more profitable, higher-level parts of the platform. They don't have the leverage to force customers one way or the other. It would be easier to just sell GPUs, but they know that sophisticated customers can switch to other chips while the platform provides lock-in for smaller customers.

neverokay
17 replies
12h52m

Doesn’t matter how good your in-house tech team was, your company still outsourced it to cloud infra.

That's what Nvidia faces. It doesn't matter how good the current in-house teams are at using the hardware directly; the corporate trend is to shift to a vendor (Google/AWS).

Nvidia can watch this inevitable shift or get ready to offer itself as a platform too.

joshellington
16 replies
12h28m

I get it, the service model always shines the brightest in the eye of the revenue calculator. But I have immediate skepticism they’ll be able to execute at a competitive level. Their core competency has always been manufacture and production, not service-based things. It’s a big rock to push up a tall hill.

bboygravity
12 replies
9h54m

Genuine question (I don't know much about cloud stuff): how is providing a cloud service/platform (at scale) even remotely as hard as designing, manufacturing and selling GPU's (including drivers and firmware) at massive scale?

It feels like reading that setting up something like Facebook would be extremely challenging for a company like SpaceX.

blagie
5 replies
7h30m

Genuine answer: Setting up Facebook WOULD be extremely challenging for a company like SpaceX. There's a reason Facebook is worth about 10x what SpaceX is worth, and most of that value doesn't come from the ability to build software. Facebook isn't even particularly good at building software.

To give an example in a closer domain: Look at how long Google lost money on cloud services through 2022 (over $15B in losses), and how it now only makes money by creative accounting (bundling "cloud services" together versus breaking out GCP; Microsoft does something similar with Office 365 and Azure).

Like many potential customers, I would not consider GCP because:

1) Google "support" is a buggy, automated algorithm which randomly thwacks customers on the head

2) Google randomly discontinues products

3) I've seen a half-dozen to a dozen instances where buying from Google was penny-wise and pound-foolish, and so have many other engineers I've worked with.

Google's overall attitude is that I'm a statistic defined by my value to Google. Google can and will externalize costs onto me. That attitude is 100% right for adwords and search, which are defined by margins, but not for something like GCP. If I am going with a cloud service/platform, I'll go with Amazon, Microsoft, or just about anyone else, for that matter.

That's not to say Google is a bad company. Google actually did have the skill set to build the software and data centers for a very, very good cloud provider. It's just that Google's core competencies lie very far from providing reliable service to customers, customer support, or all the things which go into providing me with stability and business continuity.

"Fixing" this would require a wholesale culture, value, and attitude change, and developing a core competency very far from what Google is good at.

I put "fixing" in quotes since if you develop too many core competencies, you usually stop being good at any of them. Focus is important, and there's a reason many businesses spin out units outside of their domains of focus. If Google is able to become good at this, but in the process loses their edge in their current core competencies, that's probably a bad deal.

FWIW: I haven't yet formed an opinion on NVidia's cloud strategy. However, their core competencies appear to be very much in the "hard" domains like silicon, digital design, machine learning rather than "soft" ones. Another relevant example for what can happen when hard skills are de-emphasized at engineering-driven companies is Boeing (if you've been following recent stories; if not, watch a documentary).

lotsofpulp
2 replies
6h4m

There's a reason Facebook is worth about 10x what SpaceX is worth, and most of that value doesn't come from the ability to build software. Facebook isn't even particularly good at building software.

One is a publicly listed business with as much of an objective look at real time "worth" as possible in today's world, and the other is a private business with confidential financials.

Seems like you would be unable to even calculate SpaceX's net worth, much less compare them to a business with the most objective measure of "worth".

blagie
1 replies
4h49m

SpaceX raised $750M at a valuation of $137B in January 2023.

A private investment at this scale should have a lot more transparency and due diligence behind it than SEC disclosures. If I were investing $750M, I'd have engineers under NDA review SpaceX technologies, financial auditors, legal auditors, etc.

Secondary sales place it a little bit higher (but those typically have all the issues you describe).

lotsofpulp
0 replies
2h5m

Fair enough, didn’t know about that recent round. Still, I would assume that number is higher than it would be if the business were publicly listed, but the $140B should be close enough.

fragmede
1 replies
7h5m

Eh, I mean Google moved GCP's revenue around because Microsoft was doing that to make Azure look bigger than GCP. If you can't beat 'em, join 'em. Google's got long-term contracts with a lot of companies and the government, so GCP isn't going to shut down anytime soon. Their consumer products division has problems with product longevity, but we're not paying them corporation-level money or signing serious contracts when buying a Stadia subscription. So it's just business.

What I've heard is Azure is a pain in the ass, and things take three times as long to set up there, for some reason. There's also Oracle Cloud, but you hear way less about them.

blagie
0 replies
6h10m

Having been down this road a few times, this:

What I've heard is Azure is a pain in the ass, and things take three times as long to set up there, for some reason.

Doesn't matter. The cost here is a rounding error. What does matter is something like this:

https://developer.chrome.com/docs/extensions/develop/migrate...

https://workspaceupdates.googleblog.com/2021/05/Google-Docs-...

https://workspace.google.com/blog/product-announcements/elev...

https://killedbygoogle.com/

https://www.tomsguide.com/news/g-suite-free-shutdown

Etc.

These sorts of behaviors take out whole swaths of businesses wholesale. It's random, and you never know when it will happen to you.

It's the difference between managing a classroom with:

- an annoying kid throwing spitballs every day (Azure)

- the quiet kid who, one day, brings an assault rifle, a few extra mags, and starts spraying bullets into the cafeteria (Google).

Yes, one is a constant source of annoyance, but really, it's very manageable when you consider the alternative.

(Oracle, in the school analogy, is the mean kid who spreads false rumors about you. As far as I can tell, there is never a sound, long-term business reason to pick Oracle. Most of the reason Oracle is chosen is they're very good at setting up offerings which align to misaligned incentives; they're very often the right choice for maximizing some quarterly or annual objective so someone gets their bonus. In return, the firm is usually completely milked by Oracle a few years down the line. By that point, the decision maker has typically collected their bonus, moved on, and it's no longer their problem.)

throwaway2037
2 replies
8h23m

I like this question. You raise a very good point. Occam's Razor tells me the simplest explanation is "core competency". Running AI SaaS is just a very different business from creating GPUs (including the required software ecosystem). As a counterpoint: look at Microsoft. Traditionally, they have been pretty good at writing (and making money from) enterprise software, but not very good at hardware. (Xbox is the one BIG exception I can think of.)

zer0c00ler
0 replies
6h52m

Microsoft's mice and keyboards also were exceptionally good

fragmede
0 replies
7h13m

Except for the place where they're good at hardware, they're not good at hardware? I mean that's true, but a bit of a twist of logic, wouldn't you say?

komadori
1 replies
8h32m

I share your intuition, perhaps unfairly, that it's indeed not as hard in absolute terms. However, it certainly requires a different set of skills and organisational practices.

Just because an organisation is extremely good at one thing doesn't mean it can easily apply that to another field. I would guess that SpaceX probably does have the talent on hand to throw together a Facebook clone, but equally I think they would struggle to actually compete with Facebook as a business at scale.

selimnairb
0 replies
7h38m

Well, motivation would be an obvious thing lacking. People who want to work on rockets, I would guess, would find working on facebook to be a “boring” solved problem.

michaelt
0 replies
7h4m

It's certainly possible to start new cloud providers - there are a bunch of smaller-scale VM providers. In the GPU cloud business there's companies like lambdalabs and runpod.

But the fattest profit margins are in selling to big corporations. Big corporations who already have accounts with the likes of AWS. They already have billing set up, and AWS provides every cloud service under the sun. Container registry? Logging database? Secret management? Private networks? Complicated-ass role management? Single-sign-on integration? SOC2/PCI/HIPAA compliance? A costs explorer with a full API? Everything a growing bureaucracy could need. Getting your GPU VMs from your existing cloud provider is the path of least resistance.

The smaller providers often compete by having lower prices - but competing on cost isn't generally a route to fat profit margins. And will folks at big corporations care that you're 30% cheaper, when they're not spending their own money?

nvidia could definitely launch a focused cloud product, that competes on price - but would they be happy doing that? If they want to get into the business of offering everything from SAML logon integration to a managed labelling workforce with folks fluent in 8 languages - that could be a great deal of work.

xbmcuser
0 replies
7h52m

They have been building partnerships with ISP's and service providers all over the world with their GeForce now game streaming service. They could continue and expand this by providing a similar backend for LLM services.

omnimus
0 replies
10h41m

This doesn't seem to be true. They have been running GeForce Now for a long time and it's one of the best game-streaming services. It seems they are doing it in partnership with other regional companies, but nobody says they can't use the same partners. Running games with low latency seems more complicated than LLMs on CUDA.

neverokay
0 replies
11h22m

Hey, if it doesn’t work out they can always return back to what they are, a consumer first company but now with world class hardware/software.

AI graphics service that renders games in the cloud (photorealistic), concluding its epic journey of being an amazing graphics card company.

Kinda cool when you think about it.

danpalmer
0 replies
7h29m

The hyperscalers aren't going to be on CUDA for long, Nvidia are taking too much of a cut. Google run tensor chips, Amazon and Microsoft both have their own accelerators in the works. They are massively incentivised to do so right now.

Now they do all re-sell Nvidia GPUs in their cloud businesses, but there the cost will be passed very directly on to infrastructure customers, who will see the competing higher level services from those cloud providers (hosted models) likely at lower or competitive prices, and it's going to be harder to justify renting CUDA cores for custom software.

bayindirh
0 replies
9h42m

There are already platforms and projects which provide resources for the long tail of science. The EU is supporting projects which bring together resource providers with researchers and, in some cases, private companies.

NVIDIA is not entering an empty marketplace here. Also there's already enough know-how to make things run on cloud, OpenStack, HPC, etc.

Unless they make things very difficult for independent platforms, they can't force their way in that much, from my perspective.

selvan
0 replies
7h3m

Microsoft and AWS would have partnerships with AMD/Intel for their GPUs, if those were as capable and widely used as Nvidia's.

Microsoft has partnerships with OpenAI and also with Mistral.

Present convenience may not hold true in future. Nvidia knows that well.

DanielHB
0 replies
2h48m

AWS is pushing ARM hard, yet people still buy x86/x64 compute en masse.

Even if AWS has its own hardware+software solution for neural networks, it would still take years if not decades to tear people away from the CUDA platform.

qwertox
28 replies
20h16m

What is FP4, 4-bit floating point? If so, the comparison graph [0] showing 30x above Hopper was a bit misleading.

[0] https://youtu.be/Y2F8yisiS6E?t=4698

s_m_t
16 replies
19h0m

How can 4 bits possibly be enough? Are intermediate calculations done at a higher width and then converted back down to FP4?

WhitneyLand
5 replies
18h4m

- Training isn’t done at 4-bits, to date this small size has only been for inference.

- Research for a while now has been finding that smaller weights are surprisingly effective. It’s kind of a counterintuitive result, but one way to think about it is there are billions of weights working together. So taken as a whole you still have a large amount of information.

tmalsburg2
1 replies
9h32m

- Training isn’t done at 4-bits, to date this small size has only been for inference.

Wasn't there a paper from Microsoft two weeks ago or so where they trained on log₂(3) bits?

Edit: https://arxiv.org/pdf/2402.17764.pdf

terramex
0 replies
7h9m

They don't "train on log₂(3) bits". Gradients and activations are still calculated at higher precision (8-bit or more) and the weights are quantised after every update.

This makes the network minimise loss not only with regard to the expected outcome but also with regard to the loss resulting from quantisation. With big networks, their "knowledge" is encoded in the relationships between weights, not in their absolute values, so lower precision works well as long as the network is big enough.
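A toy sketch of that idea (my own illustration, not the paper's exact recipe): the optimiser updates full-precision "latent" weights, while the forward pass only ever sees a quantised copy. The ternary quantiser and the tiny linear model here are placeholders.

    import numpy as np

    def quantise(w, levels=(-1.0, 0.0, 1.0)):
        # Snap each weight to the nearest allowed level (toy ternary quantiser).
        levels = np.asarray(levels)
        return levels[np.abs(w[:, None] - levels).argmin(axis=1)]

    rng = np.random.default_rng(0)
    x = rng.normal(size=(256, 8))
    y = x @ rng.normal(size=8)

    w_latent = rng.normal(size=8) * 0.1      # full-precision "master" weights
    for step in range(200):
        w_q = quantise(w_latent)             # forward pass uses quantised weights
        grad = x.T @ (x @ w_q - y) / len(x)  # gradient computed in full precision
        w_latent -= 0.05 * grad              # update applied to the latent weights

    print("quantised weights used at inference:", quantise(w_latent))

Because the loss is always measured through the quantised weights, the network learns to tolerate the quantisation error, which is the point being made above.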

acchow
1 replies
17h36m

Intuitively, there is a ton of redundancy, and there's still a long way we can go in compressing things.

imtringued
0 replies
10h18m

Each token is represented by a vector of 4096 floats. Of course there is redundancy.

coffeebeqn
0 replies
16h36m

Maybe the rounding errors are noise that is somewhat useful in a big enough neural net. Image generators also generate noise to work on.

yalok
4 replies
15h16m

There are research papers where even 1 bit (not floating point) was enough, with some quality loss.

4 bits is effectively 16 different floating-point numbers: 8 positive, 8 negative, no zero and no NaN/inf. 1 bit for sign, 3 bits for exponent, 0 bits for mantissa, with the exponent base implied to be 4. It's logarithmic, representing numbers in the range from -4^3 to 4^3, with the smallest magnitudes being 4^-3.
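For comparison, here's a sketch of a different FP4 convention, E2M1 (1 sign bit, 2 exponent bits, 1 mantissa bit, exponent bias 1, no inf/NaN), which does spend codes on zero. I'm not certain which layout Nvidia actually uses, so treat this purely as an illustration of how few values 4 bits buys you.

    def decode_e2m1(code: int) -> float:
        # sign = bit 3, exponent = bits 2..1 (bias 1), mantissa = bit 0
        sign = -1.0 if (code >> 3) & 1 else 1.0
        exp = (code >> 1) & 0b11
        man = code & 1
        if exp == 0:                          # subnormal: no implicit leading 1
            return sign * man * 0.5
        return sign * (1.0 + 0.5 * man) * 2.0 ** (exp - 1)

    values = sorted({decode_e2m1(c) for c in range(16)})
    print(values)   # 15 distinct values: 0 and +/- {0.5, 1, 1.5, 2, 3, 4, 6}

This is exactly the trade-off phh mentions below: two of the 16 codes get spent on (signed) zero in exchange for a representable zero.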

phh
1 replies
9h30m

Thanks. First source I've seen for what FP4 is. Gotta say I'm surprised: I would have chosen to lose one value but have a zero. (Though I have no doubt those people are much more clever and knowledgeable than I am.)

omikun
0 replies
1h25m

If the weight is zero it doesn’t need to exist

s_m_t
0 replies
14h42m

Thanks, I was thinking that zero, negative zero, inf, negative inf, and the NaNs were included, like in IEEE 754.

carlmr
0 replies
10h5m

1 bit (not floating point)

I like how you specified that it's not floating point.

anon291
2 replies
14h58m

The fundamental 'unit' of NN computation is not an individual vector element but rather an entire vector. One of the first results you learn in linear algebra is that some axes are more important than others (principal components, singular value decomposition). Thus, it stands to reason that what matters is not the underlying number format of each element but the vector machinery as a whole. All you have to do is make sure there are enough elements in the vector to get the job done for whatever bit size you choose per element.
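A quick toy experiment of my own (not from the talk) showing one sense in which this holds: each quantised weight can be off by up to half a step, but the error of an aggregate over many weights shrinks roughly as 1/sqrt(dim), because the per-element rounding errors partially cancel.

    import numpy as np

    rng = np.random.default_rng(0)
    step = 0.25                                   # coarse uniform quantiser
    for dim in (16, 256, 4096, 65536):
        w = rng.normal(size=dim)
        w_q = np.round(w / step) * step           # quantise every weight
        per_element = np.abs(w_q - w).max()       # up to step/2 = 0.125
        aggregate = abs(w_q.mean() - w.mean())    # error of the aggregate
        print(f"dim={dim:6d}  worst per-element error={per_element:.3f}  "
              f"error of the mean={aggregate:.6f}")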

s_m_t
1 replies
14h40m

I see, so the idea is that enough of the quantization errors are sort of averaged out across the dimensions of the vector space to still be useful?

singularity2001
0 replies
12h18m

The way I think about it: eventually it will end up as a binary feature vector similar to 20 Questions (male or female, alive or dead, ...), just with hundreds of dimensions.

wongarsu
0 replies
18h5m

For training FP4 sounds pretty niche, but for inference it might be very useful.

CamperBob2
0 replies
15h23m

The various sigmoid activation functions have the effect of keeping bit growth under control, by virtue of clamping to the +/- 1 range.

Havoc
5 replies
19h1m

bit misleading.

Only partially, because in LLMs FP4 isn't merely half as useful as FP8. So if you have gear that crushes it at FP4, then that's what you use, and you benefit from that increased speed (at minimal accuracy loss).

Definitely some marketing creativity in there, but it's not entirely wrong as a measure of real-world usage.

jxy
4 replies
15h1m

Curiously, what real-world usage actually uses FP4? AFAICT, most of the LLMs still use BF16, and even the quantizations down to 4 bits and 2 bits end up back at 16-bit or INT8 for the actual computations.
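For context, the pattern being referred to (4-bit storage, higher-precision math) usually looks roughly like this; the group size and absmax scaling here are illustrative, not any particular library's exact scheme.

    import numpy as np

    def quantise_4bit(w, group=32):
        # Store weights as 4-bit integers (range -7..7) plus one fp16 scale per group.
        g = w.reshape(-1, group)
        scale = np.abs(g).max(axis=1, keepdims=True) / 7.0
        q = np.clip(np.round(g / scale), -7, 7).astype(np.int8)
        return q, scale.astype(np.float16)

    def matmul_dequant(x, q, scale, shape):
        # Expand back to fp16 right before the matmul: the arithmetic itself is 16-bit.
        w = (q.astype(np.float16) * scale).reshape(shape)
        return x.astype(np.float16) @ w

    rng = np.random.default_rng(0)
    W = rng.normal(size=(64, 64)).astype(np.float16)
    x = rng.normal(size=(1, 64)).astype(np.float16)
    q, s = quantise_4bit(W)
    print("max error vs fp16 matmul:", np.abs(matmul_dequant(x, q, s, W.shape) - x @ W).max())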

creshal
2 replies
11h22m

Half the reason why they move back up to 8/16 bit is that current hardware doesn't properly support 4-bit floats, and you get better performance from the conversion. I think once this hardware hits, most of the computation will shift to native 4-bit just for efficiency's sake.

...assuming the recent 1.58b paper doesn't render the entire float quantization approach obsolete by then.

imtringued
1 replies
10h19m

The 1.58b approach is good for everyone including for quantization. It means that current quantization schemes have room for improvement.

creshal
0 replies
6h1m

It's good for everyone on the software side, but it's not so good for the hardware side, because it means that whatever you're designing now for tapeout in 6 months and release in 12 is going to be obsolete in 2 weeks.

buildbot
0 replies
13h41m

Llama.cpp and many others support 4 bit weights and lower

buildbot
0 replies
13h42m

Yes this is it!

wongarsu
0 replies
18h12m

It's 4-bit floating point, at twice the speed of 8-bit floating point. There's also FP6, which doesn't offer faster compute than FP8 but manages to take advantage of the better memory bandwidth and cache use of the 6-bit format.

Apparently some people are drawing connections to this paper [1] on 4 bit LLMs, which has one NVIDIA employee among its contributors

1: https://arxiv.org/pdf/2310.16836.pdf
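To make the memory-bandwidth point concrete, a quick back-of-the-envelope for a hypothetical 70B-parameter model (weights only; ignores activations, the KV cache and per-group scale metadata):

    params = 70e9
    for name, bits in [("FP16", 16), ("FP8", 8), ("FP6", 6), ("FP4", 4)]:
        gib = params * bits / 8 / 2**30
        print(f"{name:4s}: {gib:6.1f} GiB of weights")

Roughly speaking, every byte saved is a byte that doesn't have to cross the memory bus for each generated token, which is where the "no faster compute, but better bandwidth" argument for FP6 comes from.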

sipjca
0 replies
20h16m

yes

fancyfredbot
0 replies
19h33m

That's right. There was mention of a precision-aware transformer engine which might make it easier to use FP4, but it's not 30x faster in a like-for-like way. This shouldn't be surprising since it's more or less two Hoppers next to one another on a slightly improved process node. 2.5x seems more likely in cases where you don't exploit a new feature like that or the increased memory.

dagmx
26 replies
21h28m

FP8 being 2.5x Hopper is kind of disappointing after such a long time. Since it's 2 fused chips, that means a 25% effective delta per chip.

Though it seems most of the progress has been on memory throughput and power use, which is still very impressive.

I wonder how this will trickle down to the consumer segment.

azeirah
16 replies
21h14m

Jensen revealed later that LLM inference is 30x faster due to architectural improvements, which is massive. I don't know if that's latency, or just a 2-3x performance boost with 30x more customers served on the same chip. Either way, 30x is massive.

ephemeral-life
5 replies
21h10m

30x is the type of number that, when you see it claimed as a generational improvement, you should ignore as marketing fluff.

azeirah
4 replies
21h2m

From how I understood it, it means they optimised the entire stack, from CUDA to the networking interconnects, specifically for data centers, meaning you get 30x more inference per dollar for a datacenter. This is probably not fluff, but it's only relevant for a very specific use case, i.e. enterprises with the money to buy a stack to serve thousands of users with LLMs.

It doesn't matter for anyone who's not Microsoft, AWS, OpenAI or similar.

acchow
2 replies
20h15m

They showed 30x was for FP4. Who is using FP4 in practice?

KaoruAoiShiho
1 replies
20h4m

But maybe you should. Once the software stack is ready for it there'll be more people using it, since the performance gains are so massive.

dagmx
0 replies
15h28m

It would depend highly on the model though. Some stuff will generalize better to FP4 than others.

misterdabb
0 replies
18h46m

It's a weird graph... It's specifically tokens per GPU, but the x-axis is "interactivity per second", so the y-axis includes Blackwell being twice the size as well as the increase from FP8 -> FP4. Note that the FP4 gain effectively gets counted multiple times, since half as much data needs to go through the network as well.

modeless
3 replies
20h33m

He always does that. They stack up a bunch of special case features like sparsity that most people don't use in practice to get these unrealistic numbers. It'll be faster, certainly, but 30x will only be achievable in very special cases I'm sure.

cma
2 replies
20h17m

Isn't sparsity almost always a win at this point? Making everything fully connected is a major waste.

modeless
1 replies
19h41m

The kind of sparsity that the hardware supports is not fully general. I'm not aware of any large models trained using it. Maybe they are all leaving 2x perf on the table for no reason, but maybe not. I don't think sparsity is really proven to be "almost always a win" for training.
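For reference, the structured pattern Nvidia's sparse tensor cores accelerate is, as I understand it, "2:4" sparsity: at most 2 non-zero values in every group of 4 weights. A minimal magnitude-pruning sketch (real recipes retrain or fine-tune after pruning):

    import numpy as np

    def prune_2_of_4(w):
        # Zero out the two smallest-magnitude weights in every group of 4.
        groups = w.reshape(-1, 4)
        smallest = np.argsort(np.abs(groups), axis=1)[:, :2]
        pruned = groups.copy()
        np.put_along_axis(pruned, smallest, 0.0, axis=1)
        return pruned.reshape(w.shape)

    w = np.random.default_rng(0).normal(size=(4, 8))
    print(prune_2_of_4(w))   # each group of 4 consecutive weights keeps only its 2 largest

Whether large models actually get trained or fine-tuned to tolerate exactly this pattern is the open question raised above.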

cma
0 replies
15h35m

To train well with it I think you still need to store all the optimizer state (derivatives and momentum or whatever) if not all the weights (for RigL), so maybe not nearly as much memory bandwidth advantage as you get in inference?

my123
1 replies
21h4m

The 30x number is for a really narrow scenario tbh: running a 1.8T-parameter GPT (w/ MoE) on one GB200.

huac
0 replies
20h52m

'narrow scenario,' perhaps, but one that also happens to closely match rumors for GPT4's size

qwertox
0 replies
20h7m

But Blackwell in the graph is FP4 whereas Hopper is FP8.

kkielhofner
0 replies
19h16m

The other big announcement here is NIM - Nvidia Inference Microservice.

It's basically TensorRT-LLM + Triton Inference Server + pre-built models compiled to TensorRT-LLM engines + packaging + what appears to be an OpenAI-compatible API router in front of all of it + other "enterprise" management and deployment tools.

This software stack is extremely performant and very flexible; I've noted here before that it's what many large-scale hosted inference providers are already using (Amazon, Cloudflare, Mistral, etc).

From the article:

'Nvidia will work with AI companies like Microsoft or Hugging Face to ensure their AI models are tuned to run on all compatible Nvidia chips. Then, using a NIM, developers can efficiently run the model on their own servers or cloud-based Nvidia servers without a lengthy configuration process.

“In my code, where I was calling into OpenAI, I will replace one line of code to point it to this NIM that I got from Nvidia instead,” Das said.'

The dead giveaway is "I changed one line of code in my OpenAI code" which means "I pointed the OpenAI API base URL to an OpenAI compatible API proxy that likely interfaces with Triton on the backend via its gRPC protocol".
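In practice that swap tends to look something like this with the OpenAI Python client; the endpoint and model name below are made-up placeholders, not real NIM values.

    from openai import OpenAI

    client = OpenAI(
        base_url="http://my-nim-host:8000/v1",   # hypothetical local OpenAI-compatible gateway
        api_key="not-needed-locally",            # placeholder; local gateways often ignore it
    )

    resp = client.chat.completions.create(
        model="my-local-model",                  # whatever model the gateway exposes
        messages=[{"role": "user", "content": "Hello"}],
    )
    print(resp.choices[0].message.content)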

I have a lot of experience with TensorRT-LLM + Triton and have been working on a highly performant rust-based open source project for the OpenAI compatible API and routing portion[0].

On this hardware (FP4) with this software package 30x compared to other solutions (who knows what - base transformers?) on Hopper seems possible. TensorRT-LLM and Triton can already do FP8 on Hopper and as noted the performance is impressive.

[0] - https://github.com/toverainc/ai-router

jimmySixDOF
0 replies
14h45m

This is also the only place Nvidia is getting competitive pressure: from the likes of Groq (and likely, though less publicized, Cerebras), with higher inference T/s and better concurrency utilization/batching [1]. So if this proves to be true, then the case for big-chip systems (on today's specs) will be harder.

[1]https://twitter.com/swyx/status/1760065636410274162?t=rpbcr8...

dagmx
0 replies
21h1m

Yeah and the 30x is largely due to the increase in factors like packaging and throughput. It's not indicative of general purpose performance which is what I was talking about.

Again, I do think the throughput and energy efficiency gains are impressive, but the raw performance gain is lower than I'd have expected for the massive leap in node size etc

YetAnotherNick
8 replies
20h22m

How is 2.5x disappointing in one generation?

dagmx
5 replies
19h14m

Did you skip the sentence immediately after that one?

It’s two fused chips. So 1.25x per chip. 25% uplift. Not 2.5x uplift. The 2.5x is for the whole package.

throwaway11460
2 replies
16h53m

Is that how it works? Why don't we just put many chips in one computer?

dagmx
1 replies
15h30m

The massive Blackwell SoC he showed is two Blackwell dies with an interconnect. It's very similar to what Apple does with their Ultra series.

Then the B200 package is 2 of these plus a CPU. So a total of 4 GPU dies in each unit.

abhinavk
0 replies
15h15m

Then the B200 package is 2 of these plus a CPU.

That's GB200.

downvotetruth
1 replies
16h43m

two fused chips

Jensen's comment about being first was such a dig to Emerald Rapids.

dagmx
0 replies
15h31m

Is it the first? The Apple Ultra series chips are two Maxes fused with an interconnect, in which case it's both CPU and GPU.

I believe this is just the first for a GPU-only product.

chimney
1 replies
20h18m

Compare that to the 10x uplift that Hopper was.

YetAnotherNick
0 replies
20h3m

Because it involved scaling the chip area dedicated to FP8. The AI community realized a few years back that FP8 training is possible, so the transistor budget given to FP8 was scaled up. Overall I think transistor counts grow by only ~50% per generation, so most of the gains came from shrinking the FP32/FP64 share that was dominant 10 years back, and there's only so far that can go.

bluedino
22 replies
21h10m

They acquired Bright Cluster Manager a few years ago; who would be next on their list to acquire? It seems like they want to provide customers with the whole stack.

shiftpgdn
19 replies
20h28m

Canonical is a ripe target. Canonical has been trying to grow Ubuntu and other tools in the enterprise world for the last few years without significant success, and much of the Nvidia devkit stuff is built around Ubuntu.

echelon
14 replies
20h20m

Please do not give them this idea.

Ubuntu is actually a pretty great daily driver desktop Linux, and I'd hate for that to lose priority and disappear.

I'm not a fan of what happened to the Red Hat ecosystem for exactly the same reasons.

xmprt
9 replies
19h56m

As someone who used Ubuntu in the past and has since moved on to greener pastures, I appreciate everything Canonical and Ubuntu have done for the Linux community, but there are many better options today and Canonical is already far from the company it once was.

xarope
2 replies
14h8m

what would your top suggestions be for server or desktop, instead of ubuntu? Arch (too unstable for server?), Silverblue?

unmole
0 replies
10h58m

Debian for server, Fedora or openSUSE Tumbleweed for desktop.

rompledorph
0 replies
12h22m

Recently installed PopOS on my desktop. That is currently my top suggestion as an Ubuntu alternative

dgfitz
2 replies
19h47m

Whenever I see an open job req for canonical I run for the hills.

mianos
1 replies
17h53m

They must have a fast revolving door. I have known so many people who worked there in the last few years but seem to have moved on.

Probably depends on the team, and the worst ones have a lot of churn.

margorczynski
0 replies
16h39m

From what I see on Glassdoor they have bad, toxic management so that would explain a lot.

solumunus
1 replies
12h28m

I hate it when people say stuff like this and then don’t express their opinions on the better alternatives.

sitkack
0 replies
12h25m

They mean Nix and Arch.

hnlmorg
0 replies
19h6m

There have always been better distributions than Ubuntu. That isn’t something new. What Canonical did better than anyone else was mass market appeal. Or at least appeal to a wider market than Linux traditionally had. But as someone who’s used Linux since the 90s, I was always underwhelmed by Ubuntu as a distribution.

That all said, I have to work with a lot of CentOS and Rocky workstations for VFX and I enjoy those for desktop Linux even less than Ubuntu.

greggsy
1 replies
19h29m

Tbh, Ubuntu’s only pull is the support and breadth of users. As a desktop, it’s let down by Unity, which IMHO is basically a port of Windows 8 tablet UI.

If they defaulted back to a menu and taskbar-based WM, it might actually be more approachable to users who are more familiar with macOS and Windows.

kcb
0 replies
19h19m

Main Ubuntu hasn't shipped with Unity for like 7 years.

riffic
0 replies
18h43m

the desktop Linux ecosystem can survive w/o Ubuntu. Silverblue / Universal Blue for instance is quite compelling.

pm90
0 replies
14h0m

What happened to Red Hat? As far as I can see they're continuing to invest in Linux. I'm glad they are keeping CoreOS around as FCOS.

riffic
1 replies
18h44m

That's not culture, that's Shuttleworth.

ethbr1
0 replies
15h41m

"We hire only the best who can fully recall the intimate nuances of their high school experience?"

pjmlp
0 replies
10h40m

I would rather bet on Microsoft doing that, given their cozy relationship around .NET and Ubuntu being the main WSL distribution.

At least I would finally get to buy MS Ubuntu PCs at the shopping mall.

az226
0 replies
9h36m

Anthropic or Mistral and build AGI/ASI.

jairuhme
17 replies
21h23m

I haven't listened to Jensen speak before, but am I the only one who thought the presentation wasn't very polished? Not a knock on anything he has accomplished, just an observation that sorta surprised me

sct202
1 replies
20h38m

I think it's a good reminder that objectively great CEOs and leaders can be kind of cringe when presenting. A lot of times people like that get passed over for promotions in favor of smooth talkers.

wmf
0 replies
19h17m

It's been said that founders are people who can't get hired.

azeirah
1 replies
21h13m

He said he didn't rehearse well. I think it makes him come across very genuinely, not some dumb hyperpolished corporate blabla

CaptainFever
0 replies
8h26m

In comparison, Apple keynotes are precise, polished, practiced and pre-recorded.

acchow
1 replies
21h13m

He has more important things to do than perfecting a presentation. He likes his employees to message him freely with things they think he can help with.

PartiallyTyped
0 replies
11h30m

I've read about this here and there, and tbh I respect it a lot. Somehow Jensen has managed to scale the company and still keep it focused on the product, with high engineering standards.

swalsh
0 replies
20h41m

He's selling water in a desert; it kind of doesn't matter how polished his presentation is.

sumedh
0 replies
8h55m

If I am not mistaken he has said in some interviews he is not a big fan of public speaking.

modeless
0 replies
20h32m

It's typical. He isn't a great public speaker IMO. Not terrible but not great.

jefozabuss
0 replies
21h13m

When he had that slide up about generating everything, I was kind of expecting him to say the whole keynote was generated, including him. That'd have been crazy.

ipsum2
0 replies
21h4m

Yeah he said he didn't rehearse and it really shows.

erupt7893
0 replies
20h17m

I've been watching his keynotes for as long as I can remember; this is how it's always been.

caycep
0 replies
19h58m

I remember his opening line at NeurIPS 2017, to an audience of grad students and postdocs: "only Nvidia would unveil their most expensive product to an audience who's completely broke"

Then he went into a comedic monologue about GANs. But hey, at least that meant that the CEO was reading the actual conference proceedings...

cableshaft
0 replies
20h38m

The beginning in particular seemed pretty rough, but he seemed to mostly get into a groove about halfway into it. At least he started talking a lot smoother around then.

angm128
0 replies
21h2m

The products, animations and slides are doing some heavy lifting. Most jokes don't land and his presentation is somewhat confusing at times (e.g. star trek intro token count)

Havoc
0 replies
17h48m

Previous ones have been similar. Awkward “you may clap now” pauses etc.

Don’t think anyone cares as long as the company keeps getting the bets right

CoachRufus87
0 replies
16h21m

It's refreshing. Another tech keynote that I regularly watch (Apple) is far too polished these days.

tamimio
11 replies
20h16m

I think at this point they should stop making video "cards" and instead make video "stations": a full tower with a power supply and one giant "card" inside, with proper cooling, etc. It might also help justify the crazy prices anyway.

ufocia
4 replies
20h4m

Probably better to stick to the GPUs. Integration is a low margin game.

georgyo
2 replies
19h50m

I'd prefer they stick to GPUs, but I think you're oversimplifying.

Dell proves that selling complete units is very profitable.

Apple shows that owning the entire stack is immensely profitable.

Nvidia already has significant hardware and software investment. They very well could fully integrate and grab larger slices of the pie.

In fact, Nvidia already has complete, appliance-like, fully integrated machines. But enterprises like to install their own OS and run their own software stack. These appliances have not caught on, at least not yet.

zer00eyz
1 replies
18h0m

> Apple shows that owning the entire stack is immensely profitable.

Apple shows no such thing. Apple sells pretty, reliable and safe. A car is a car, but Apple is a sports car, or a saloon. Vertical integration is the way they chose to deliver that, and pretty and reliable are all normal people care about.

Nvidia is gonna have to think long and hard about the "whole stack". 20 years ago they might have been able to pull a NeXT, but right now anything that isn't Linux is a rounding error, and they don't want to turn into Sun (no one at Nvidia is smart enough to make them the next Sun).

Nvidia architecture + Nvidia OS is not something that I see them pulling off for the datacenter.

__mharrison__
0 replies
12h8m

Nvidia is moving up the stack. They announced NIMs today. I liken it to Docker for AI.

caycep
0 replies
20h1m

granted, at this point we plug the computer into the GPU, so it might not make a difference...

justinclift
0 replies
17h56m

Heh Heh Heh

"Ultimate AI Workstation". Pricing starts at US$83,549:

https://shop.lambdalabs.com/gpu-workstations/vector/customiz...

Adding every option only adds $2,100 to the price too (totalling $85,649). They should probably just include everything as standard. ;)

jazzyjackson
0 replies
10h33m

I love that "we installed a working python environment for you" is a front-page value-add

wmf
0 replies
19h19m

SXM for desktop would be great but it won't happen. The PC industry can't even adopt things like 12VO.

ribosometronome
0 replies
19h41m

Isn't that just a computer? Or an eGPU, if it doesn't contain the rest of the computer?

acchow
0 replies
19h14m

That’s the DGX GB200 they announced today, with liquid cooling.

herecomethefuzz
9 replies
19h59m

"Platform company" means multi-chip in this case?

Seems logical since it's becoming impractical to cram so many transistors on a single die.

1oooqooq
3 replies
19h37m

No, it means rent-seeking.

Imagine AWS if they also sold all the computers in the world; now you can only rent from them.

maximus-decimus
1 replies
15h9m

"For only 100$ a month, you'll be able to turn on the gpu you already paid for"

--Nvidia, pretty soon

bpye
0 replies
11h46m

This is sort of already a reality. Their vGPU functionality (partitioning a single physical GPU into multiple virtual GPUs) is already separately licensed - https://www.nvidia.com/en-us/data-center/buy-grid/

And that's once you've bought an expensive Tesla/Quadro GPU too.

throwaway11460
0 replies
19h14m

So like IBM at the beginning of computers

0xcde4c3db
3 replies
19h15m

I don't really understand the bird's-eye view of the product line, but judging by some of the raw physical numbers and configurations Jensen was bragging about, it means that they want to basically play the mainframe game of locking high-end applications into proprietary middleware running on proprietary chassis with proprietary cluster interconnect (hello, Mellanox acquisition).

wtallis
1 replies
18h47m

The lock-in is more of a bonus for them. The underlying problem is that it's impossible to build a chip big enough, or even a collection of chiplets big enough. Training LLMs requires more silicon than can fit on one PCB, so they need an interconnect that is as fast as possible. With interconnect bandwidth as a critical bottleneck, they're not going to wait around for the industry to standardize on a suitable interconnect when they can build what they need to be ready to ship alongside the chips they need to connect.

l33tman
0 replies
8h51m

Cerebras: -Hold my beer

anon291
0 replies
14h24m

In this case the interconnects are also doing compute.

dweekly
0 replies
19h14m

It means all the main chips required for a large-scale datacenter. And many of the layers of software on top of it.

Hardware:

* The GPU
* The GPU-GPU fabric (NVLink)
* The CPU
* The NIC
* The network fabric (InfiniBand)
* The switch

And that's not even starting to get into the many layers of the software stack (CUDA, Riva, Megatron, Omniverse) that they're contributing and working to get folks to build on.

__mharrison__
9 replies
12h31m

My take, from being at the keynote and the content I've seen so far at the conference, is that Nvidia is moving up the stack (like all good hardware vendors are prone to do).

Obviously they are going to keep going bigger. But the takeaway for me is that they are building "Docker for LLMs" - NIM. They are building a container system where you can download/buy(?) NIMs and easily deploy them on their hardware. Going to be fun to watch what this does to all the AI startups...

flessner
5 replies
11h31m

It won't do anything to most consumer-facing AI; the UI and convenience are already a major selling point. A bigger threat is that the feature the business is built around makes it into mainline software... there is no demand for (paid) background removal anymore as every iPhone can do it nowadays.

Generally if whatever AI product you have can easily just be a feature in whatever application businesses already use, then you are running a business on borrowed time.

dig1
2 replies
8h17m

there is no demand for (paid) background removal anymore as every iPhone can do it nowadays.

Proper background removal of even remotely complex content is still in demand, especially at large scale. I doubt you'd use an iPhone to work on >100 images per second.

actionfromafar
0 replies
7h3m

Holy Telephony Batman! That's both insane and perfectly rational at the same time.

jerska
0 replies
7h27m

there is no demand for (paid) background removal anymore as every iPhone can do it nowadays.

PhotoRoom begs to disagree with this statement.

Kinrany
0 replies
9h9m

Oh no, turns out the thing people really want from AIs is generality because everything else can be done in stupid software!

djtango
1 replies
11h32m

I'm not that abreast of all the developments in the AI space.

What specific class of AI startups do you have in mind here? AI-aaS startups who provide the "infra"?

fisf
0 replies
22m

AI startups who just wrap a (standard) API or model in a thin UI layer. The backend part will be a commodity and the UI layer offers no value proposition.

gammalost
7 replies
8h54m

I wonder when we as an industry will start to address the scaling issues in LLMs. It is obviously in Nvidia's interest to keep pushing out bigger and better GPUs, but what is the collective interest?

It has already been proven that good language models are possible given enough resources. The challenge now is to put these models into solutions which do not require unfathomable amounts of resources for the average use case.

bayindirh
6 replies
8h50m

Wasteful software development is easy and keeps development momentum. As long as growth is king, quick and dirty will always beat well-optimized and smaller systems.

This is not a problem with AI alone, but with all the software we use. Only two groups try to optimize things and fit them into smaller systems: passionate programmers, and people who are paid to do this (e.g. phone manufacturers' software teams, etc.).

Difwif
3 replies
6h48m

Not sure it's fair to characterize modern LLMs as "wasteful software development" or unoptimized. The implementations do quite an impressive level of optimization with the hardware that is available. New theoretical methods w.r.t. quantization represent most of our software optimization techniques, and we're probably hitting the limit of that shortly with ternary or binary gates.

To your point, enthusiasts and developers with strong financial motivation have fairly optimized code. AI definitely falls into the latter group. These are not like your typical web app :)

bayindirh
2 replies
6h39m

Currently AI folks create bigger models and feed in larger training data sets, because we still don't know the efficiency limits. IOW, currently we can't accelerate the learning beyond a certain point with less training data. This is especially true in the LLM/GenAI space.

In other areas where NNs are used, what I can see is that training models with less data is not only plausible but very possible, especially in image processing; an image probably carries more information than a single sentence.

I think AI falls on the spectrum with a slight bias to the latter group. Because if you can shrink a model 10% and lose a month, you'd rather have a 10% bigger model now, and reap the money^H^H^H^H^H fame.

I don't know anything about a typical web app, because I'm not a typical developer developing web apps. :)

Difwif
1 replies
5h45m

I just think we're talking about a completely different problem with AI optimization. There's $billions of effort and research that goes into AI optimization. Scaling model size and training set size happens because it's what the research and evidence tells us will improve model performance reliably. If we could reduce any of it for the same performance we would. Top model performance is an arms race and it's happening at every expense. The largest players are all shooting to beat GPT-4 or Claude Opus and achieve AGI (whatever that means).

This is very different than a program that requires zero research breakthroughs to dramatically improve and is simply slow and bloated because people have different priorities.

bayindirh
0 replies
4h39m

If we could reduce any of it for the same performance we would.

Nope, because all of them are harder than just going bigger.

Top model performance is an arms race and it's happening at every expense.

This is also what I said: "growth (in model performance) is king, so quick and dirty (going bigger) beats harder optimization efforts".

This is very different than a program that... [Snipped for brevity]

Again, this is what I meant by "it keeps development momentum". Yes, people have different priorities. Mostly money and fame at this point.

So, we don't disagree one bit.

gammalost
1 replies
5h39m

I don't think it is thanks to wasteful software development. The libraries used for LLMs do a lot to squeeze out the full potential of GPUs.

I think it is more of an information problem. How can we store enough information in the weights so that it is possible to train models without a budget similar to OpenAI's?

bayindirh
0 replies
4h33m

Just because you're squeezing out every bit of performance from a processor doesn't mean all the work you're doing is meaningful or can't be optimized.

I work on material simulations. I make processors hit their TDPs, saturate their pipelines and make them go as fast as they can. However, sometimes we come up with a formula optimization which does things 1-2% faster, which means we can save hours on a bigger computation. Utilization doesn't change, but speed does.

I think it is more of an information problem.

It's an interesting point of view, and partially true. However, we're still wasting "space" by just adding bits to the network to make it contain more data.

There's a long way to go. "Wasteful development" is a phase and will always be part of software development. The important part is not forgetting that optimization exists. Otherwise we can't sustain ourselves for long with all that energy use.

stevethomas
6 replies
20h42m

Time to sell. When they start becoming a platform, it means they have nothing more concrete in the near future. Sell now and buy again later once the price corrects.

pvg
0 replies
20h30m

If you held and then sold Nvidia stock when they announced CUDA or GeForce Live, you'd now be a big pile of negative money richer.

mark336
0 replies
16h54m

I wouldn't be 100% against them, but it is disappointing that they don't have any better ideas.

hsuz
0 replies
10h58m

Selling is a bit exaggerated. Scaling up is almost always non-detrimental. But I do feel that NVDA is slowly falling into a stall because there are no exciting new modes of business coming out.

golergka
0 replies
20h33m

Does a company like Nvidia have to have anything more concrete than newer, bigger and faster chips?

belter
0 replies
20h35m

Don't bet against a CEO who knows what he is talking about, has 80% market share and an arm tattoo of his own company logo... :-)

So far the short sellers have learned that bitter lesson.

Keyframe
0 replies
19h31m

They still have to announce you can send an email through their platform.

Deasiomlo
4 replies
19h15m

Double-digit petaflops, mass produced.

"The computing power needed to replicate the human brain’s relevant activities has been estimated by various authors, with answers ranging from 10^12 to 10^28 FLOPS."

A petaflop is 10^15 FLOPS.

Crazy times.
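For scale, "double-digit petaflops" is on the order of 10^16 FLOPS, which a quick calculation places about a quarter of the way (in orders of magnitude) along that estimate range.

    import math

    low, high, chip = 1e12, 1e28, 1e16          # brain estimates vs. ~double-digit petaflops
    frac = (math.log10(chip) - math.log10(low)) / (math.log10(high) - math.log10(low))
    print(f"{frac:.0%} of the way from the low to the high estimate, in orders of magnitude")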

teaearlgraycold
3 replies
19h5m

I’ll be happy with this if we use it to design viable fusion power plants. And I’ll be severely disappointed if it’s mostly used for ad targeting.

sbstp
0 replies
12h16m

You are about to be severely disappointed.

belter
0 replies
8h7m

Fusion plants will be used to power ad targeting Nanoprobes.

jakobov
3 replies
15h23m

They are claiming a 25x reduction in power consumption. That can't be right. Anyone understand where this number is coming from?

LTL_FTC
2 replies
15h1m

Did you read that in the linked article? I couldn’t find it. But maybe due to the better efficiency with regard to the performance boost (5x) and the ability to now use 27 trillion parameters versus 1.7 Trillion, one can presumably finish the same amount of work in 1/25th of the time and bam, reduction in power consumption. As you say, I’m skeptical the max power draw itself is 25x lower.

wmf
1 replies
14h37m

I think Jensen said something like needing 25x fewer GPUs (vs. A100) to get the same performance, which amounts to essentially the same thing.

creshal
0 replies
11h19m

It doesn't imply a full 25x reduction in power consumption though; that might "only" go down by 10x.

geor9e
3 replies
18h26m

Like a lot of the commenters here, I have a problem with this headline. They don't "seek to become a platform company" in the sense of building their own cloud platform where they rent out GPU time and stop selling GPUs to other cloud platforms. That easy misinterpretation makes good clickbait, but no, that's not what the article says. The article has Huang bragging that CUDA has already been a parallel computing platform for a decade or more, and that the Blackwell architecture is so integrated and customizable with CUDA (with all its user-extendable kernels and community) that it's thought of as a platform rather than just a chip architecture.

baobabKoodaa
1 replies
17h52m

Someone please "chip in" and confirm or deny if this interpretation is correct?

wmf
0 replies
16h50m

"Platform" can have several different meanings and some people in this thread are picking the most evil meaning as an excuse to shit on Nvidia. It's true that Nvidia is providing the GPU, the CPU, the server, the rack, the network, the drivers, and the orchestration software to run AI training/inference. (If you want that stuff. You could just buy GPUs if you want.) It's fair to call that a platform.

Nvidia is not becoming a cloud provider (beyond a small eval environment perhaps).

tzm
1 replies
19h16m

Platform co seems fitting, considering Nvidia's data center revenue in the fourth quarter of 2023 was a record $18.4 billion, which is 27% higher than the previous quarter and 409% higher than the previous year.

Seems revenue from inference is growing at a significant clip.

ec109685
0 replies
17h54m

Data center revenue includes sales to companies like Meta that run their chips in their own data centers.

nojvek
1 replies
6h30m

The issue with bigger LLMs, or even MMMs, is that the bigger they are, the more they are cramming in and regurgitating the training data, and that opens them up to lawsuits.

Making NNs generalize the way humans do is still a hard problem.

lewhoo
0 replies
6h13m

Is this indeed established? Could you provide a link or three?

lvl102
1 replies
20h31m

Seems Nvidia is going for maximum margin as they see competition ahead.

theGnuMe
0 replies
20h14m

And they can build a big moat with CUDA.

amelius
1 replies
19h22m

"platform co."

Platform company, as in they're allowing developers on their AI platform and opening an app store?

fnordpiglet
0 replies
19h16m

The headline is editorialized by the submitter; the actual headline is "Nvidia CEO Jensen Huang announces new AI chips: 'We need bigger GPUs'", which is arguably worse.

The article doesn't discuss becoming a platform co, but instead discusses ways their existing platform subscription model is evolving to add backwards compatibility testing.

adamnemecek
1 replies
15h10m

Which Blackwell is it named after?

bilsbie
0 replies
6h50m

Does this support that 1-2 bit stuff we were hearing about a few weeks ago?