In a podcast interview[1], Jonathan Ross, now CEO of Groq, talked about the creation of the original TPUs (which he built at Google). Apparently it was originally an FPGA he did in his 20% time because he sat near a team that was having inference speed issues.
They got it working, then Jeff Dean did the math and they decided to do an ASIC.
Now, of course, Google should spin off the TPU team as a separate company. It's the only credible competition Nvidia has, and its software support is second only to Nvidia's.
[1] https://open.spotify.com/episode/0V9kRgNS7Ds6zh3GjdXUAQ?si=q...
The way I see it, Nvidia only has a few advantages, ordered from most important to least:
1. Reserved fab space.
2. Highly integrated software.
3. Hardware architecture that exists today.
4. Customer relationships.
But all of these advantages are weak in one way or another:
For #1: fab space is tight, and Nvidia can strangle its consumer GPU market if it means selling more AI chips at a higher price. This advantage is gone if a competitor makes big bets years in advance, or if another company with a lot of fab space (Intel?) is willing to change priorities.
For #2: life is good when your proprietary software is the industry standard, but whether that actually matters will depend heavily on the use case.
For #3: a benefit now, but not for long. My estimation is that the hardware design for TPUs is fundamentally much simpler than for GPUs: no need for ray tracing, texture samplers, or rasterization; mostly just lots of matrix multiplication and memory. Others moving into the space will be able to catch up quickly.
For #4: useful for staying in the conversation, but in a field hungry for any advantage, the hardware vendor with the highest FLOPS (or equivalent) per dollar is going to win enough customers to saturate its manufacturing capacity.
So overall, I give them a few years, and then the competition is going to get real quite fast.
It seems you have not worked with ML workloads but are basing your comment on "internet wisdom" or, worse, business analysts (I am sorry if that's inaccurate).
On GPUs, ML "just works" (inference and training), and they are always an order of magnitude faster than whatever CPU you have. TPUs work very well for some model architectures (the old ones they were optimized and designed for), but on some novel ones they can actually be slower than a CPU (because of gathers and similar ops). This was my experience working on ML as an ML researcher at Google until 2022; maybe it has gotten better, but I doubt it. Older TPUs were OK only for inference of those specific models and useless for training. And with anything new I tried (a fundamental part of research...), the compiler would sometimes just break with an internal error, most of the time produce terrible and slow code, and bugs filed against it would stay open for years.
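To make the gather point concrete, here's a toy sketch (my own illustration, not real Google code; it assumes JAX is installed and the shapes are invented): a data-dependent lookup into a big table, which is mostly irregular memory traffic rather than the dense matmuls TPUs were built around.

    import jax
    import jax.numpy as jnp

    table = jnp.ones((100_000, 512))  # a big embedding-style table
    idx = jax.random.randint(jax.random.PRNGKey(0), (4096,), 0, 100_000)

    @jax.jit
    def lookup_and_pool(table, idx):
        rows = jnp.take(table, idx, axis=0)  # the gather: data-dependent access
        return rows.mean(axis=0)             # trivial compute relative to the access

    print(lookup_and_pool(table, idx).shape)  # (512,)

On a GPU this is just a memory-bandwidth problem; on the matmul-oriented TPU designs of that era, ops like this were exactly where the compiler tended to fall over.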
A GPU is so much more than a matrix multiplier - it's a fully general, programmable processor. With excellent compilers, but most importantly with low-level access, so you don't need to rely on proprietary compiler engineers (like the TPU ones) and anyone can develop something like Flash Attention. And as a side note: while a Transformer might be mostly matrix multiplication, many other models are not.
Also, it's disingenuous to say "there's only 4 things you need to beat NVIDIA" when each of the 4 is an enormous undertaking.
Not to mention that every not-so-serious, inference-heavy ML developer just wants something that works so they can deliver to the client. That itself is a semi-moat.
It's been talked to death, but non-CUDA implementations have their challenges regardless of use case. That's what first-mover advantage and > 15 years of investment by Nvidia in their overall ecosystem will do for you.
But support for production serving of inference workloads outside of CUDA is universally dismal. This is where I spend most of my time, and compared to CUDA, anything else is non-existent or a non-starter unless you're all-in on packaged, API-driven Google/Amazon/etc. tooling utilizing their TPUs (or whatever). It's the most significant vendor/cloud lock-in I think I've ever seen.
Efficient and high-scale serving of inference workloads is THE thing you need to do to serve customers and actually have a chance at ever making any money. It's shocking to me that Nvidia/CUDA has a complete stranglehold on this obvious use case.
A great summary of how unserious NVIDIA's competitors are is how long it took AMD's flagship consumer/retail GPU, the 7900 XT[X], to gain ROCm support.
That's quite literally unacceptable.
For those who don't know - one year after launch.
Meanwhile Nvidia will go as far as to back port Hopper support to CUDA 11.8 so it "just runs" the day of launch with everything you already have.
If you had worked with ML, you'd know that this is not true. It's actually more like the opposite. It also has nothing to do with the chips themselves. Things don't magically work "because GPU"; they work because manufacturers spend the time getting their drivers and ecosystems right. That's why, for example, no one is using AMD GPUs for ML, despite them offering more compute per dollar on paper. Getting the software stack to the point of Nvidia/CUDA, where things really do "just work", is an enormous undertaking. And as someone who has been researching ML for more than a decade now, I can tell you Nvidia also didn't get these things right in the beginning. That's the reason they have no real competition today (and still won't for quite some time).
Probably bartwr is using "GPUs" to mean NVIDIA GPUs. Seeing as nobody uses AMD GPUs for it, that simplification seems OK.
ML doesn't just work on GPUs. It's not uncommon to have architectures where GPUs don't really work; we just tend not to use those :)
Hey, this is a good comment. I've only toyed with ML stuff, but I've done a lot with GPUs. I hope you find my "step back" perspective as valuable as I find your up-close one.
My chief mistake in the above comment was using "TPU", as that's Google's branding. I probably should've used "AI focused co-processor". I'm not talking exclusively about Google's foray into the space, especially as I haven't used TPUs.
My list of things to ditch on GPUs doesn't include cores. My point there is that there are a bunch of components needed for graphics programming that are entirely pointless for AI workloads, both inside the core's ALU and as larger board components. The hardware components needed for AI seem relatively well understood at this point (though that could change with some other innovation).
Put another way, my point is this: historically, the high-end GPU market was mostly limited to scientific computing, enthusiast gaming, and some varied professional workloads. Nvidia has long been king here, but with relatively little attempt by others at competition. ML was added to that list in the last decade, but with a few exceptions (Google's TPU), the companies that could move into the space haven't. Then ChatGPT happened, investment in AI went crazy, and suddenly Nvidia is one of the most valuable companies in the world.
However, the list of companies that have proven they can make all the essential components (from my list in the grandparent) isn't large, but it's also not just Nvidia. Basically every computing device with a screen has some measure of GPU components, and now everyone is paying attention to AI. So I think within a few years Nvidia's market leadership will be challenged, and they certainly won't be the only supplier of top-of-the-line AI co-processors by the end of the decade. Whether first-mover advantage will keep them in first place, time will tell.
CUDA is absolute shit, segfaults or compiler errors if you look at it wrong.
Nvidia's software is the only reason I'm not using GPUs for ML tasks and likely never will.
That's just C. If you're accessing your arrays out of bounds, it's going to segfault. Hopefully.
Can't blame CUDA for that one.
I'm talking about the compiler segfaulting, not the end-user code.
Skill issue.
No, CUDA's botched gcc implementation segfaulting due to compiler errors during compilation is not a "skill issue".
(Well, a skill issue of whoever is patching gcc on Nvidia's end, I guess.)
Actually their real advantage is the large set of highly optimised CUDA kernels.
This is the thing that lets them outperform AMD chips even on inferior hardware. And the fact that anything new gets written for CUDA first.
There is also OpenAI's Triton language for this, and people are beginning to use it (shout-out to Unsloth here!).
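For anyone who hasn't seen it, this is roughly what Triton buys you: the kernel is written in Python and compiled to GPU code, so the tiling and masking logic is hackable without CUDA C++. A minimal vector-add, close to the one in the Triton tutorials (assumes the triton and torch packages and a CUDA-capable GPU):

    import torch
    import triton
    import triton.language as tl

    @triton.jit
    def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
        pid = tl.program_id(axis=0)                         # which block am I
        offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
        mask = offsets < n_elements                         # guard the tail
        x = tl.load(x_ptr + offsets, mask=mask)
        y = tl.load(y_ptr + offsets, mask=mask)
        tl.store(out_ptr + offsets, x + y, mask=mask)

    x = torch.randn(4096, device="cuda")
    y = torch.randn(4096, device="cuda")
    out = torch.empty_like(x)
    grid = (triton.cdiv(x.numel(), 1024),)
    add_kernel[grid](x, y, out, x.numel(), BLOCK_SIZE=1024)
    assert torch.allclose(out, x + y)

Flash-Attention-style kernels are obviously far more involved, but it's the same idea: the performance-critical loop structure stays in code the user can edit.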
While this is true, it's worth noting that the inference-only Groq chip, which gets 2x-5x better LLM inference performance, is on a 12 nm process.
Honest question: will AI help AMD catch up with optimized CUDA/ROCM kernels of their own?
I’ve spent the last month deep in GPU driver/compiler world and -
AMD or Apple (Metal) or someone (I haven’t tried Intel’s stuff) just needs to have a single guide to installing a driver and compiler that doesn’t segfault if you look at it wrong, and they would sweep the R&D mindshare.
It is insane how bad CUDA is; it’s even more insane how bad their competitors are.
If you work in hardware and are interested in solving this, lemme say this:
There are billions of dollars waiting for the first person to get this right. The only reason I haven’t jumped on this myself is a lack of familiarity with drivers.
These have always been NVIDIA's "few" advantages and yet they've still dominated for years. It's their relentless pace of innovation that is their advantage. They resemble Intel of old, and despite Intel's same "few" advantages, Intel is still dominant in the PC space (even with recent missteps).
They've dominated for years, but now all the big tech companies are using their products at a scale not seen before, and all have a vested interest in cutting Nvidia's margins by introducing some real competition.
Nvidia will do well in the future, but perhaps not well enough to justify its stock price.
NVidia's biggest advantage is that AMD is unwilling to pay for top notch software engineers (and unwilling to pay the corresponding increase in hardware engineer salaries this would entail). If you check online you'll see NVidia pays both hardware and software engineers significantly more than AMD does. This is a cultural/management problem, which AMD's unlikely to overcome in the near-term future. Apple so far seems like the only other hardware company that doesn't underpay its engineers, but Apple's unlikely to release a discrete/stand-alone GPU any time soon.
Don’t underestimate CUDA as the moat. It’s been a decade of sheer dominance with multiple attempts to loosen its grip that haven’t been super fruitful.
I’ll also add that their second moat is Mellanox. They have state-of-the-art interconnect and networking that puts them ahead of competitors who are currently focusing just on the single unit.
Nvidia has so much software behind all of this that your list is a tremendous understatement.
Just the number of internal ML things Nvidia builds helps them tremendously in understanding the market (what the market actually needs).
And they use their inventions themselves.
'only has a few' = 'has a handful that are easy to list but have huge implications which are not easily matched by AMD or Intel right now'
Nvidia's datacenter AI chips don't have ray tracing or rasterization. Heck, for all we know the new Blackwell chip is almost exclusively tensor cores; they gave no numbers for regular CUDA perf.
This is wrong: both AMD and Intel (through Habana) have accelerators comparable to H100s in performance.
Yes, but they don't have the custom kernels that CUDA has. TPUs do have some!
They have Vulkan, which is cross-compatible.
And AMD has ROCm. PyTorch is standard, and PyTorch has ROCm support. And the Google TPU v5 also has PyTorch support.
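For what it's worth, the ROCm build of PyTorch exposes AMD devices through the familiar torch.cuda namespace, so existing code mostly runs unchanged. A minimal sketch (assuming a ROCm build of PyTorch on a supported AMD card; not a claim about speed):

    import torch

    print(torch.version.hip)  # a version string on ROCm builds, None on CUDA builds
    device = "cuda" if torch.cuda.is_available() else "cpu"

    x = torch.randn(1024, 1024, device=device)
    y = x @ x.T               # identical user code on CUDA, ROCm, or CPU
    print(y.device, y.shape)

Whether the kernels behind that matmul are as fast or as stable as the CUDA ones is the real question upthread.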
We do have a couple of H100s, but I'd love to replace them with AMD's.
If AMD fixes or open-sources their proprietary firmware blob[0]. Geohot streamed all weekend on Twitch, reverse-engineering the AMD firmware. It was quite entertaining learning how that low-level firmware works[1], and his rants about AMD, of course.
[0] https://www.phoronix.com/news/Tinybox-Radeon-Again-UMR
[1] https://www.twitch.tv/georgehotz
Geohot doesn't know what he's talking about and I'm kinda ashamed to see this lazy thinking leak onto HN. There was an article a couple weeks back on AMD open sourcing drivers in the Linux kernel tree that you should look into.
Care to explain a bit more? His rant was about the firmware having crashes, not the Linux driver.
Firmware crashes => days-long "open source it and I'll fix it. No? Why does AMD hate its customers?"
I have an appointment and exactly one minute till I have to leave, so apologies for brevity: they can't open-source the full driver because then they'd have to release HDMI-spec stuff that the consortium says they can't. (I don't support any of that; my only intent is to communicate that George isn't really locked in here when he starts casting aspersions or claiming AMD doesn't care.)
Geohot is wrangling with unsupported consumer hardware.
The datacenter stuff is on a different architecture and driver stack. The number-one supercomputer on the Top500 list (Frontier at ORNL) is based on AMD GPUs, and AMD is probably more invested in supporting that.
I work with Frontier and ORNL/OLCF. They have had and continue to have issues with AMD/ROCm but yes, they do of course get excellent support from AMD. The entire team at OLCF is incredible as well (obviously) and they do amazing work.
Frontier certainly has some unique quirks but the documentation is online[0] and most of these quirks are inherent to the kinds of fundamental issues you'll see on any system in the space (SLURM, etc).
However, most of the issues are fundamentally ROCm and you'll run into them on any MIxxx anywhere. I run into them frequently with supported and unsupported consumer gear all the way up.
[0] - https://docs.olcf.ornl.gov/systems/frontier_user_guide.html
I mean, that's kinda Nvidia's whole shtick: anyone can play around synthesizing cat pictures on their gaming GPU, and if they make a breakthrough, the same software will transfer to X-million-dollar supercomputers.
Subscriber-only videos, so nobody can confirm that he did that, nor archive whatever valuable information he released. At least not without paying some money in the next 7-14 days before they're deleted.
https://www.youtube.com/@geohotarchive
But they're far behind in adoption in the AI space, while TPUs have adoption (inside Google) and, on top of that, a very strong software offering (JAX and TF).
There are also Amazon's AWS "Trainium" chips, which are what Anthropic will be using going forward.
If you're talking about training LLMs, involving tens of thousands of processors, then the specifics of one processor vs. another aren't the most important thing - it's the overall architecture and infrastructure in place to manage them.
Given the size of the market and its near-monopoly situation, I strongly think this has the potential to (almost immediately) surpass the Pixel hardware business. But the problem here is that the TPU is a relatively scarce computing resource even inside Google, and it's very likely that Google has a hard time meeting its internal demand...
I’m surprised they sell any to external customers, to be honest.
They don't sell any TPUs, do they? Besides the now-ancient Coral toy TPUs.
Has there been any development? The last update is from 2021 [0], but it is not officially killed by google(.com)
[0] https://coral.ai/news/updates-07-2021
My guess is that the "AI" accelerators in Google Tensor phone chips are based on Coral....
Yes.
But imagine how the company would do: they'd have a guaranteed market at Google for, say, 3 years, and while yes, maybe Google takes 100% of the production in the first 12 months, it's not a bad position to start from.
Plus there are other products they could ship that might not always need to be built on the latest process. I imagine there would be demand for inference-only, earlier-generation TPUs that can run LLMs fast if the power usage is low enough.
Speaking of which, mega props to Groq. They really are awesome: so many startups launch with bullshit and promises, but Groq came to the scene with something awesome already working, which is reason enough to love them. I really respect this company, and I say that extremely rarely.
I wouldn't call it awesome. It's just a big chip with lots of cache. You need hundreds of them to sufficiently load any decent model, at which point the cost has skyrocketed.
There seem to be conflicting reports as to who came up with the TPU: https://mastodon.social/@danluu/109641269333636407
Amazon acquired Annapurna Labs, which is doing the same thing, and they have their own Trainium/Inferentia silicon, and it definitely has more support than Google's.