
Google's First Tensor Processing Unit: Architecture

nl
50 replies
17h16m

In a podcast interview[1], Jonathon Ross, now Groq's CEO, talked about the creation of the original TPUs (which he built at Google). Apparently it was originally an FPGA he built in his 20% time because he sat near the team that was having inference speed issues.

They got it working, then Jeff Dean did the math and they decided to do an ASIC.

Now of course Google should spin off the TPU team as a separate company. It's the only credible competition NVidia has, and the software support is second only to NVidia.

[1] https://open.spotify.com/episode/0V9kRgNS7Ds6zh3GjdXUAQ?si=q...

Laremere
25 replies
13h47m

The way I see it, NVidia only has a few advantages, ordered from most important to least:

1. Reserved fab space.

2. Highly integrated software.

3. Hardware architecture that exists today.

4. Customer relationships.

but all of these aspects are weak in one way or another:

For #1, fab space is tight, and NVidia can strangle its consumer GPU market if it means selling more AI chips at a higher price. This advantage is gone if a competitor makes big bets years in advance, or another company that has a lot of fab space (intel?) is willing to change priorities.

2. Life is good when your proprietary software is the industry standard. Whether this actually matters will depend on the use case heavily.

3. A benefit now, but not for long. It's my estimation that the hardware design for TPUs is fundamentally much simpler than for GPUs. No need for raytracing, texture samplers, or rasterization. Mostly just needs lots of matrix multiplication and memory. Others moving into the space will be able to catch up quickly.

4. Useful to stay in the conversation, but in a field hungry for any advantage, the hardware vendor with the highest FLOPS (or equivalent) per dollar is going to win enough customers to saturate their manufacturing ability.

So overall, I give them a few years, and then the competition is going to be real quite fast.

bartwr
9 replies
7h36m

Seems you have not worked with ML workloads, but base your comment on "internet wisdom", or worse, business analysts (I am sorry if that's inaccurate).

On GPUs, ML "just works" (inference and training), and they are always an order of magnitude faster than whatever CPU you have. TPUs work very well for some model architectures (old ones that they were optimized and designed for), and on some novel ones they can actually be slower than a CPU (because of gathers and similar) - this was my experience working on ML stuff as an ML Researcher at Google until 2022; maybe it got better, but I doubt it. Older TPUs were OK only for inference of those specific models and useless for training. And for anything new I tried (a fundamental part of research...) the compiler would sometimes just break with an internal error, most of the time just produce terrible and slow code, and bugs filed against it would stay open for years.

A GPU is so much more than a matrix multiplier - it's a fully general, programmable processor, with excellent compilers but, most importantly, low-level access, so you don't need to rely on proprietary compiler engineers (like the TPU ones) and anyone can develop something like Flash Attention. And as a side note: while a Transformer might be mostly matrix multiplication, many other models are not.

sevagh
4 replies
7h4m

Also, it's disingenuous to say "there's only 4 things you need to beat NVIDIA" when each of the 4 is an enormous undertaking.

puppymaster
3 replies
5h9m

Not to mention every not-so-serious, inference-heavy ML developer just wants something that works so they can deliver to clients. That itself is a semi-moat.

kkielhofner
2 replies
4h53m

It's been talked to death but non-CUDA implementations have their challenges regardless of use case. That's what first-mover advantage and > 15 years of investment by Nvidia in their overall ecosystem will do for you.

But support for production serving of inference workloads outside of CUDA is universally dismal. This is where I spend most of my time and compared to CUDA anything else is non-existent or a non-starter unless you're all-in on packaged API driven Google/Amazon/etc tooling utilizing their TPUs (or whatever). The most significant vendor/cloud lock-in I think I've ever seen.

Efficient and high-scale serving of inference workloads is THE thing you need to do to serve customers and actually have a chance at ever making any money. It's shocking to me that Nvidia/CUDA has a complete stranglehold on this obvious use case.

sevagh
1 replies
4h44m

A great summary of how unserious NVIDIA's competitors are is how long it took AMD's flagship consumer/retail GPU, the 7900 XT[X], to gain ROCm support.

That's quite literally unacceptable.

kkielhofner
0 replies
1h41m

For those who don't know - one year after launch.

Meanwhile Nvidia will go as far as to back port Hopper support to CUDA 11.8 so it "just runs" the day of launch with everything you already have.

sigmoid10
1 replies
3h9m

On GPUs, ML "just works"

If you had worked with ML, you'd know that this is not true. It's actually more like the opposite. It also has nothing to do with the chips themselves. Things don't magically work "because GPU", they work because manufacturers spend the time getting their drivers and ecosystems right. That's why, for example, no one is using AMD GPUs for ML, despite them offering more compute per dollar on paper. Getting the software stack to the point of Nvidia/CUDA, where things really do "just work", is an enormous undertaking. And as someone who has been researching ML for more than a decade now, I can tell you Nvidia also didn't get these things right in the beginning. That's the reason why they have no real competition today (and still won't for quite some time).

mike_hearn
0 replies
8m

Probably bartwr is using "GPUs" to mean NVIDIA GPUs. Seeing as nobody uses AMD GPUs for it, that simplification seems OK.

sudosysgen
0 replies
1h50m

ML doesn't just work on GPUs. It's not uncommon to have architectures where GPUs don't really work, we just tend not to use those :)

Laremere
0 replies
1h50m

Hey, this is a good comment. I've only toyed with ML stuff, but I've done a lot with GPUs. I hope you can find my "step back" perspective as valuable as I find your up-close one.

My chief mistake in the above comment was using "TPU", as that's Google's branding. I probably should've used "AI focused co-processor". I'm not talking exclusively about Google's foray into the space, especially as I haven't used TPUs.

My list of things to ditch on GPUs doesn't include cores. My point there is that there's a bunch of components that are needed for graphics programming that are entirely pointless for AI workloads, both inside the core's ALU and as larger board components. The hardware components needed for AI seem relatively well understood at this point (though that's possible to change with some other innovation).

Put another way, my point is this: Historically, the high end GPU market was mostly limited to scientific computing, enthusiast gaming, and some varied professional workloads. Nvidia has long been king here, but with relatively little attempt by others at competition. ML was added to that list in the last decade, but with some few exceptions (Google's TPU), the people who could move into the space haven't. Then chatGPT happened, investment in AI has gone crazy, and suddenly Nvidia is one of the most valuable companies in the world.

However, the list of companies who have proven they can make all the essential components (in my list in the grandparent) isn't large, but it's also not just Nvidia. Basically every computing device with a screen has some measure of GPU components, and now everyone is paying attention to AI. So I think within a few years Nvidia's market leadership will be challenged, and they certainly won't be the only supplier of top of the line AI co-processors by the end of the decade. Whether first mover advantage will keep them in first place, time will tell.

otabdeveloper4
4 replies
10h56m

CUDA is absolute shit, segfaults or compiler errors if you look at it wrong.

NVidia's software is the only reason I'm not using GPUs for ML tasks and likely never will.

KeplerBoy
1 replies
10h18m

That's just C. If you're accessing your arrays out of bounds, it's going to segfault. Hopefully.

Can't blame CUDA for that one.

otabdeveloper4
0 replies
8h24m

I'm talking about the compiler segfaulting, not the end-user code.

Culonavirus
1 replies
9h34m

Skill issue.

otabdeveloper4
0 replies
8h25m

No, CUDA's botched gcc implementation segfaulting due to compiler errors during compilation is not a "skill issue".

(Well, a skill issue of whoever is patching gcc on Nvidia's end, I guess.)

nl
1 replies
12h25m

Actually their real advantage is the large set of highly optimised CUDA kernels.

This is the thing that lets them outperform AMD chips even on inferior hardware. And the fact that anything new gets written for CUDA first.

There is OpenAI's Triton language for this too and people are beginning to use it (shout out to Unsloth here!).

Reserved fab space.

While this is true, it's worth noting that the inference-only Groq chip, which gets 2x-5x better LLM inference performance, is on a 12nm process.

ants_everywhere
0 replies
6h22m

Honest question: will AI help AMD catch up with optimized CUDA/ROCM kernels of their own?

jimberlage
1 replies
1h20m

I’ve spent the last month deep in GPU driver/compiler world and -

AMD or Apple (Metal) or someone (I haven’t tried Intel’s stuff) just needs to have a single guide to installing a driver and compiler that doesn’t segfault if you look at it wrong, and they would sweep the R&D mindshare.

It is insane how bad CUDA is; it’s even more insane how bad their competitors are.

jimberlage
0 replies
46m

If you work in hardware and are interested in solving this, lemme say this:

There are billions of dollars waiting for the first person to get this right. The only reason I haven’t jumped on this myself is a lack of familiarity with drivers.

7e
1 replies
12h31m

These have always been NVIDIA's "few" advantages and yet they've still dominated for years. It's their relentless pace of innovation that is their advantage. They resemble Intel of old, and despite Intel's same "few" advantages, Intel is still dominant in the PC space (even with recent missteps).

weweersdfsd
0 replies
12h10m

They've dominated for years, but now all big tech companies are using their products at a scale not seen before, and all have a vested interest in cutting their margins by introducing some real competition.

Nvidia will do good in the future, but perhaps not good enough to justify their stock price.

logicchains
0 replies
12h31m

2. Highly integrated software.

NVidia's biggest advantage is that AMD is unwilling to pay for top notch software engineers (and unwilling to pay the corresponding increase in hardware engineer salaries this would entail). If you check online you'll see NVidia pays both hardware and software engineers significantly more than AMD does. This is a cultural/management problem, which AMD's unlikely to overcome in the near-term future. Apple so far seems like the only other hardware company that doesn't underpay its engineers, but Apple's unlikely to release a discrete/stand-alone GPU any time soon.

dagmx
0 replies
11h46m

Don’t underestimate CUDA as the moat. It’s been a decade of sheer dominance with multiple attempts to loosen its grip that haven’t been super fruitful.

I’ll also add that their second moat is Mellanox. They have state of the art interconnect and networking that puts them ahead of the competition that are currently focusing just on the single unit.

Oioioioiio
0 replies
4h33m

Nvidia has so much software behind all of this, your list is a tremendous understatement.

Just the number of internal ML things Nvidia builds helps them tremendously to understand the market (what the market needs).

And they use their inventions themselves.

'only has a few' = 'has a handful that are easy to list but with huge implications which are not easily matched by AMD or Intel right now'

KeplerBoy
0 replies
10h16m

Nvidia's datacenter AI chips don't have raytracing or rasterization. Heck, for all we know the new Blackwell chip is almost exclusively tensor cores. They gave no numbers for regular CUDA perf.

ipsum2
13 replies
13h1m

It's the only credible competition NVidia has

This is wrong, both AMD and Intel (through Habana) have GPUs comparable to H100s in performance.

nl
10 replies
12h24m

Yes, but they don't have the custom kernels that CUDA has. TPUs do have some!

rurban
9 replies
10h23m

They have Vulkan, which is cross-compatible.

And AMD has ROCm. pytorch is standard and pytorch has ROCm support. And the Google TPU v5 also has pytorch support.

We do have a couple of H100s, but I'd love to replace them with AMD ones.

refulgentis
2 replies
6h7m

Geohot doesn't know what he's talking about and I'm kinda ashamed to see this lazy thinking leak onto HN. There was an article a couple weeks back on AMD open sourcing drivers in the Linux kernel tree that you should look into.

Kelteseth
1 replies
5h25m

Care to explain a bit more? His rant was about the firmware having crashes not the Linux driver.

refulgentis
0 replies
4h53m

Firmware crashes => days long "open source it and I'll fix it. no? why does AMD hate its customers?"

I got an appointment and have exactly one minute till I have to leave, apologies for brevity: they can't open source the full driver because then they'd have to release HDMI spec stuff that the consortium says they can't. (I don't support any of that, my only intent is to communicate George isn't really locked in here when he starts casting aspersions or claiming AMD doesn't care)

KeplerBoy
2 replies
8h53m

Geohot is wrangling with unsupported consumer hardware.

The datacenter stuff is on a different architecture and driver stack. The number one supercomputer on the top500 list (frontier at ORNL) is based on AMD GPUs and AMD is probably more invested in supporting that.

kkielhofner
0 replies
4h41m

I work with Frontier and ORNL/OLCF. They have had and continue to have issues with AMD/ROCm but yes, they do of course get excellent support from AMD. The entire team at OLCF is incredible as well (obviously) and they do amazing work.

Frontier certainly has some unique quirks but the documentation is online[0] and most of these quirks are inherent to the kinds of fundamental issues you'll see on any system in the space (SLURM, etc).

However, most of the issues are fundamentally ROCm and you'll run into them on any MIxxx anywhere. I run into them frequently with supported and unsupported consumer gear all the way up.

[0] - https://docs.olcf.ornl.gov/systems/frontier_user_guide.html

QuadmasterXLII
0 replies
5h59m

I mean, that's kinda nvidia's whole shtick: anyone can play around synthesizing cat pictures on their gaming GPU and if they make a breakthrough, the same software will transfer to X million dollar supercomputers.

immibis
1 replies
7h32m

Subscriber only videos, so nobody can confirm that he did that, nor archive whatever valuable information he released. At least not without paying some money in the next 7-14 days before they're deleted.

kettleballroll
0 replies
11h55m

But they're far behind in adoption in the AI space, while TPUs have both adoption (inside Google and on top) and a very strong software offering (Jax and TF)

HarHarVeryFunny
0 replies
2h57m

There's also Amazon's AWS "Trainium" chips, which is what Anthropic will be using going forward.

If you're talking about training LLMs, involving tens of thousands of processors, then the specifics of one processor vs another aren't the most important thing - it's the overall architecture and infrastructure in place to manage it.

summerlight
5 replies
17h2m

Now of course Google should spin off the TPU team as a separate company.

Given the size of the market and its near-monopoly situation, I strongly think this has the potential to (almost immediately) surpass the Pixel hardware business. But the problem here is that TPU is a relatively scarce computing resource even inside Google and it's very likely that Google has a hard time meeting its internal demand...

fnbr
3 replies
16h36m

I’m surprised they sell any to external customers, to be honest.

KeplerBoy
2 replies
10h14m

They don't sell any TPUs, do they? Besides the now-ancient Coral toy TPUs.

Kelteseth
1 replies
9h17m

Has there been any development? The last update is from 2021 [0], but it is not officially killed by google(.com)

[0] https://coral.ai/news/updates-07-2021

ddalex
0 replies
7h51m

My guess is that the "AI" accelerators in Google Tensor phone chips are based on Coral....

nl
0 replies
12h20m

I strongly think this has the potential to (almost immediately) surpass the Pixel hardware business. But the problem here is that TPU is a relatively scarce computing resource even inside Google and it's very likely that Google has a hard time meeting its internal demand...

Yes.

But imagine how the company would do: they'd have a guaranteed market at Google, say for 3 years, and while yes, maybe Google takes 100% of the production in the first 12 months, it's not a bad position to start from.

Plus there are other products which they could ship that might not always need to be built on the latest process. I imagine there would be demand for inference only earlier generation TPUs that can run LLMs fast if the power usage is low enough.

bionhoward
1 replies
4h51m

Speaking of which, mega props to Groq, they really are awesome. So many startups launch with bullshit and promises, but Groq came to the scene with something awesome already working, which is reason enough to love them. I really respect this company, and I almost never say that.

elorant
0 replies
1h44m

I wouldn't call it awesome. It's just a big chip with lots of cache. You need hundreds of them to sufficiently load any decent model. At which point the cost has skyrocketed.

bfeynman
0 replies
16h35m

Amazon acquired Annapurna Labs, which does the same thing, and they have their own Trainium/Inferentia silicon; they definitely have more support than Google.

hipadev23
40 replies
16h19m

How is it that Google invented the TPU and Google Research came up with the paper behind LLMs, yet NVDA and AI startup companies have captured ~100% of the value?

neilv
21 replies
15h52m

There's an old joke explanation about Xerox and PARC, about the difficulty of "pitching a 'paperless office' to a photocopier company".

In Google's case, an example analogy would be pitching making something like ChatGPT widely available, when that would disrupt revenue from search engine paid placements, and from ads on sites that people wouldn't need to visit. (So maybe someone says, better to phase it in subtly, as needed for competitiveness, but in non-disruptive ways.)

I doubt it's as simple as that, but would be funny if that was it.

halflings
15 replies
14h38m

This (innovator's dilemma / too afraid of disrupting your own ads business model) is the most common explanation folks are giving for this, but seems to be some sort of post-rationalization of why such a large company full of competent researchers/engineers would drop the ball this hard.

My read (having seen some of this on the inside), is that it was a mix of being too worried about safety issues (OMG, the chatbot occasionally says something offensive!) and being too complacent (too comfortable with incremental changes in Search, no appetite for launching an entirely new type of product / doing something really out there). There are many ways to monetize a chatbot, OpenAI for example is raking billions in subscription fees.

nemothekid
7 replies
14h21m

There are many ways to monetize a chatbot, OpenAI for example is raking billions in subscription fees.

Compared to Google, OpenAI's billions are peanuts, while costing a fortune to generate. GPT-4 doesn't seem profitable (if it was, would they need to throttle it?)

nequo
5 replies
12h6m

Wouldn't Google be better able to integrate ads into a "ChatGoogle" service than OpenAI is into ChatGPT?

exitheone
4 replies
11h13m

The cost per ad is still astronomically different between search ads and LLMs

varjag
3 replies
9h46m

There could be an opposite avenue: ad-free Google Premium subscription with AI chat as a crown jewel. An ultimate opportunity to diversify from ad revenue.

disgruntledphd2
2 replies
4h35m

There's not enough money in it, at Google's scale.

Especially because the people who'd pay for Premium tend to be the most prized people from an advertiser perspective.

And most people won't pay, under any circumstances, but they will click on ads which make Google money.

varjag
1 replies
4h1m

YouTube does it, at Google scale. And these same people do pay $20/mo for ChatGPT anyway.

nemothekid
0 replies
3h6m

YouTube isn't comparable - YouTube revenue is roughly 30B/year, while Search revenue is roughly 175B/year.

Advertisers are willing to pay far more than $20/mo per user, combined with the fact that search costs way less per query than inference.

ro_sharp
0 replies
13h51m

GPT-4 doesn't seem profitable (if it was, would they need to throttle it?)

Maybe? Hardware supply isn’t perfectly elastic

Karrot_Kream
5 replies
12h31m

Google gets much more scrutiny than smaller companies, so it's understandable to be worried. Pretty much any small mistake of theirs turns into clickbait on here and the other tech news sites, and you get hundreds of comments about how evil Big Tech is. Of course it's their own fault that their PR skews negative so frequently, but still, it's understandable why they were so shy.

Symmetry
2 replies
6h4m

It's understandable that people at Google are worried because it's likely very unpleasant to see critical articles and tweets about something you did. But that isn't really bad for Google's business in any of the ways that losing to someone on AI would be.

michaelt
0 replies
3h49m

That's true for google, sure. But what about individual workers and managers at google?

You can push things forward hard, battle the many stakeholders all of whom want their thing at the top of the search results page, get a load of extra headcount to make a robust and scalable user-facing system, join an on-call rota and get called at 2am, engage in a bunch of ethically questionable behaviour skirting the border between fair use and copyright infringement, hire and manage loads of data labellers in low-income countries who get paid a pittance, battle the internal doubters who think Google Assistant shows chatbots are a joke and users don't want it, and battle the internal fearmongers who think your ML system is going to call black people monkeys, and at the end of it maybe it's great or maybe it ends up an embarrassment that gets withdrawn, like Tay.

Or you can publish some academic papers. Maybe do some work improving the automatic transcription for youtube, or translation for google translate. Finish work at 3pm on a Friday, and have plenty of time to enjoy your $400k salary.

IX-103
0 replies
3h51m

Google is constantly being sued for nearly everything they do. They create a Chrome Incognito mode like Firefox's private browsing mode and they get sued. They start restricting app permissions on Android, sued. Adding a feature where Google Maps lets you select the location of your next appointment as a destination in a single click, sued (that's leveraging your calendar monopoly to improve your map app).

Google has its hands in so many fields that any change they make that disrupts the status quo brings down antitrust investigations and lawsuits.

That's the reason why Firefox and Safari dropping support for 3rd party cookies gets a yawn from regulators while Google gets pinned between the CMA wanting to slow down or stop 3rd party cookies deprecation to prevent disrupting the ads market and the ICO wanting Google to drop support yesterday.

This is not about bad press or people feeling bad about news articles. Google has been hit by billion dollar fines in the past and has become hesitant to do anything.

Where smaller companies can take the "Elon Musk" route and just pay fines and settle lawsuits as just the cost of doing business, Google has become an unwieldy juggernaut unable to move out of fear of people complaining and taking another pound of flesh. To be clear, I don't agree with a strategy of ignoring inconvenient regulations, but Google's excess of caution has severely limited their ability to innovate. But given previous judgements against Google, I can't exactly say that they're wrong to do so. Even Google can only pay so many multi-billion dollar fines before they have to close shop, and I can't exactly say the world would be better off if that happened.

logicchains
1 replies
12h27m

Sydney when initially released was much less censored and the vast majority of responses online were positive, "this is hilarious/cool", not "OMG Sydney should be banned!".

knowriju
0 replies
6h14m

You have clearly not heard about Tay and Galactica.

rs11
0 replies
11h34m

Monetizing a chatbot is one thing. Beating revenues every year when you are already making $300B a year is a whole different ball game. There must be tens of execs who understand this, but their payout depends on keeping the status quo.

vineyardmike
3 replies
14h49m

The answer is far weirder - they had a chat bot, and no one even discussed it in the context of search replacements. They didn’t want to release it because they just didn’t think it should be a product. Only after OpenAI actually disrupted search did they start releasing Gemini/Bard which takes advantage of search.

ec109685
2 replies
14h39m

They were afraid to release it because of unaligned output and hallucinations.

ChatGPT showed that people could still get value out of something that wasn’t perfect.

E.g. they had this in their labs: https://www.theguardian.com/technology/2022/jun/12/google-en... from June 2022.

HarHarVeryFunny
0 replies
9m

LaMDA was also briefly available for public testing, but then rapidly withdrawn due to unhinged responses.

One advantage that OpenAI had over Google was having developed RLHF as a way to "align" the model's output to be more acceptable.

Part of Google's dropping the ball at that time period (but catching up now with Gemini) may also have been just not knowing what to do with it. It certainly wasn't apparent pre-ChatGPT that there'd be any huge public demand for something like this, or that people would find so many uses for it in API form, and especially so with LaMDA's behavioral issues.

eitally
0 replies
2h44m

My take as someone who worked in Cloud, closely with the AI product teams on GTM strategy, is that it was primarily the former: Google was always extremely risk averse when it came to AI, to the point that until Andrew Moore was pushed out, Google Cloud didn't refer to anything as AI. It was ML-only, hence the BigQuery ML, Video Intelligence ML, NLP API, and so many other "ML" product names. There was strong sentiment internally that the technology wasn't mature enough to legitimately call it "AI", and that any models adequately complex to be non-trivially explainable were a no-go. Part of this was just general conservatism around product launches within Google, but it was significantly driven by EU regulation, too. Having just come off massive GDPR projects and staring down the barrel of DMA, Google didn't want to do anything that expanded the risk surface, whether it was in Cloud, Ads, Mobile or anything else.

Their hand was forced when ChatGPT was launched ... and we're seeing how that's going.

abraae
8 replies
16h17m

For historical precedent see Xerox Parc.

readyplayernull
7 replies
16h12m

IBM, Intel, Apple's Newton.

klodolph
4 replies
14h16m

The story I like to tell for the Newton is that it was launched before the technology was ready yet. Like the Sega Game Gear. Old video phones. All those tablets that launched before the iPad.

They’re good ideas, but they shipped a few years too early, and the technology to make them work well at a good price point wasn’t available until later. Like, the Sega Game Gear had a cool active matrix LCD screen, but it took six AA batteries and the batteries only lasted like four hours.

Rinzler89
2 replies
9h19m

>and the batteries only lasted like four hours

Still more than the OG Steam Deck today :)

sofixa
1 replies
8h16m

Vastly depends on the game played and the settings. In a plane (so airplane mode, with Bluetooth headset) I played Hitman Absolution for 3 hours and still had 50%+ of the battery left. It was on minimal brightness because it was dark and didn't need more, but still.

Rinzler89
0 replies
4h20m

Yeah, no need to take a (semi)joke literally and go all technical to debunk it. Though without optimizations, battery life on the deck was lucky to hit 2h at first before valve brought in updates and people learned they had to cap resolution and FPS to increase battery life.

technofiend
0 replies
14h3m

The Palm Pilot V had a dockable cell phone modem, but the connectivity wasn't integrated into the OS. It worked but only as a demonstration. Then Palm released a model with integrated data, but the BlackBerry came out the same year. You can be first and still if someone comes along with a much more compelling product, that's the end of you.

Google has a few years left as a search company, but their enshittification of results has doomed them to replacement by LLMs. They seem to have forgotten Google pushed out their predecessors by having the best search results. Targeted advertisements don't qualify.

chillfox
1 replies
15h53m

Kodak

BolexNOLA
0 replies
15h27m

Man, I remember my last semester of college taking a history of photography course that was only offered every 3-4 years by a pretty legendary professor. The day before the first day of class (or super close), Eastman Kodak declared bankruptcy after, what, 110 years?

He scrapped his day 1 lecture and threw together a talk - with photos of course - about Kodak and how an intrepid engineer developed the first digital camera, and how the company then foolishly hid it because it would compete with their film line.

Incredible lecturer for sure haha

wstrange
2 replies
16h2m

It's far too early to suggest Google will not capture value from AI. They have plenty of opportunity to integrate AI into their products.

earthnail
1 replies
10h22m

Microsoft is sooooo far ahead in this game, it’s borderline ridiculous. Google really missed an opportunity to grab major market share from MS Office.

willvarfar
0 replies
9h1m

Yes, this! Google Docs is basically basic. But imagine if, years ago, Google had added built-in LLM-based auto-complete, refactoring, and summarization tools to documents and presentations etc...

Oh to dream.

tw04
1 replies
15h53m

Because Google can’t focus on a product for more than 18 months if it isn’t generating several billion in PROFIT. They are punch drunk on advertising.

Culonavirus
0 replies
9h25m

They're like a hyperactive dog chasing its own tail. How many projects did they create only to shut them down a bit later? All because there's always some nonsense to chase. Meanwhile the AI train has left the station without them and their search is now an ad-infested hot piece of garbage. Don't even get me started on their customer/dev support, or how aging things like the Google Translate API got absolutely KILLED by GPT-4-like APIs overnight.

Google has stage 4 leadership incompetency and can't be helped. The only humane option is euthanasia.

vineyardmike
0 replies
14h52m

I think the TPU is simple. They do sell it (via cloud), but they focus on themselves first. When there was no shortage of compute, it was an also-ran in the ML hardware market. Now it’s trendy.

ChatGPT v Google is a far crazier history. Not only did Google invent Transformers, not only did Google open-source PaLM and BERT, but they even built chat-tuned LLM chat bots and let employees talk with them. This isn't a case where they were avoiding disruption or protecting search - they genuinely didn't see its potential. Worse, they got so much negative publicity over it that they considered it an AI safety issue to release. If that guy hadn't gone to the press and claimed LaMDA was sentient, then they may have entirely open sourced it like PaLM. This would likely mean that GPT-3 was open sourced and maybe never chat tuned either.

GPT-2 was freely available and OpenAI showed off GPT-3 freely as a parlor trick before ChatGPT came out. ChatGPT was originally the same - fun text generation as chat, not a full product.

TLDR - TPUs probably didn't have a lot of value until NVidia chips became scarce, and they actively invented the original ChatGPT, but "AI Safety" concerns caused them to lock it down.

snats
0 replies
15h13m

To this day I am impressed that they have not figured out how to embed advertisements into Bard outputs so that it can go free.

romanovcode
0 replies
15h28m

Pretty sure it is because if ChatGPT and the like were updated as frequently as Google's website index, it would render search engines like Google obsolete and thus make their revenue nonexistent.

layer8
39 replies
17h56m

However, although tensors describe the relationship between arbitrary higher-dimensional arrays, in practice the TPU hardware that we will consider is designed to perform calculations associated with one and two-dimensional arrays. Or, more specifically, vector and matrix operations.

I still don’t understand why the term “tensor” is used if it’s only vectors and matrices.

sillysaurusx
14 replies
17h36m

I was confused as hell for a long time when I first got into ML, until I figured out how to think about tensors in a visual way.

You're right: fundamentally ML is about vector and matrix operations (1D and 2D). So then why are most ML programs 3D, 4D, and in a transformer sometimes up to 6D (?!)

One reasonable guess is that the third dimension is time. Actually not. It turns out that time is pretty rare in ML, and it's only (relatively) recently that it's been introduced into e.g. video models.

Another guess is that it's to represent "time" as in, think of how transformers work: they generate a token, then another given the previous, then a third given the first two, etc. That's a certain way of describing "time". But it turns out that transformers don't do this as a 3D or 4D dimension. It only needs to be 2D, because tokens are 1D -- if you're representing tokens over time, you get a 2D output. So even with a cutting edge model like transformers, you still only need plain old 2D matrix operations. The attention layer creates a mask, which ends up being 2D.

So then why do models get to 3D and above? Usually batching. You get a certain efficiency boost when you pack a bunch of operations together. And if you pack a bunch of 2D operations together, that third dimension is the batch dimension.
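To make the batching point concrete, here is a minimal numpy sketch (sizes are made up for illustration): a "3D" tensor is just a stack of 2D matrices, and the batched multiply is equivalent to looping a plain 2D matmul over the first dimension.

    import numpy as np

    batch, rows, cols = 32, 8, 128           # illustrative sizes
    x = np.random.randn(batch, rows, cols)   # a "3D tensor": a batch of 2D matrices
    w = np.random.randn(cols, 64)            # one shared 2D weight matrix

    y = x @ w                                # broadcasts the 2D matmul over the batch dim -> (32, 8, 64)
    y_loop = np.stack([x[i] @ w for i in range(batch)])  # same thing, one 2D matmul at a time
    assert np.allclose(y, y_loop)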

For images, you typically end up with 4D, with the convention N,C,H,W, which stands for "Batch, Channel, Height, Width". It can also be N,H,W,C, which is the same thing but it's packed in memory as red green blue, red green blue, etc instead of all the red pixels first, then all the green pixels, then all the blue pixels. This matters in various subtle ways.
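As a rough illustration of the NCHW vs NHWC point (a numpy sketch with made-up sizes): the values are identical, only the order they are packed in memory changes.

    import numpy as np

    n, c, h, w = 2, 3, 4, 4                 # illustrative sizes
    img_nchw = np.random.randn(n, c, h, w)  # channels-first: all R values, then all G, then all B

    img_nhwc = img_nchw.transpose(0, 2, 3, 1)        # logically N,H,W,C (this is just a strided view)
    img_nhwc_packed = np.ascontiguousarray(img_nhwc) # actually repacks memory as r,g,b, r,g,b, ...

    assert img_nchw[0, 1, 2, 3] == img_nhwc[0, 2, 3, 1]  # same value, different index order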

I have no idea why the batch dimension is called N, but it's probably "number of images".

"Vector" wouldn't quite cover all of this, and although "tensor" is confusing, it's fine. It's the ham sandwich of naming conventions: flexible, satisfying to some, and you can make them in a bunch of different varieties.

Under the hood, TPUs actually flatten 3D tensors down into 2D matrix multiplications. I was surprised by this, but it makes total sense. The native size for a TPU is 8x128 -- you can think of it a bit like the native width of a CPU, except it's 2D. So if you have a 3x4x256 tensor, it actually gets flattened out to 12x256, then the XLA black box magic figures out how to split that across a certain number of 8x128 vector registers. Note they're called "vector registers" rather than "tensor registers", which is interesting. See https://cloud.google.com/tpu/docs/performance-guide
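For the flattening step described above, a numpy sketch of the same idea (the 8x128 tiling itself is left to XLA; the shapes here just mirror the example, and the weight matrix is made up):

    import numpy as np

    t = np.random.randn(3, 4, 256)       # the 3x4x256 tensor from the example
    flat = t.reshape(-1, 256)            # flattened to 12x256

    w = np.random.randn(256, 128)        # an illustrative weight matrix
    out = (flat @ w).reshape(3, 4, 128)  # plain 2D matmul, then restore the leading dims
    # XLA's job is then to tile the 12x256 operand across 8x128 vector registers.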

layer8
8 replies
17h32m

Thanks for the background! I still don’t think it’s appropriate to call a batch of matrices a tensor.

sillysaurusx
4 replies
17h29m

You'd hate particle physics then. "Spin" and "action" and so on are terrible names, but scientists live with them, because convention.

Convention dominates most of what we do. I'm not sure there's a good way around this. Most conventions suck, but they were established back before there was a clear idea of what the best long-term convention should be.

layer8
3 replies
17h20m

At least in physics you can understand how the terms came about historically, where at some point they made sense. But "tensor" here, as noted in sibling comments, seems to have been chosen primarily for marketing reasons.

FridgeSeal
2 replies
16h58m

It comes from the maths, where tensors are generalisations of matrices/vectors. They got cribbed, because the ML stuff directly used a bunch of the underlying maths. It’s a novel term, it sounds cool, not surprised it also then got promoted up into a marketing term.

cowsandmilk
1 replies
16h6m

tensors are generalisations of matrices/vectors.

Is that what they are though? Because that really is not my understanding. Tensors are mappings, which not all matrices and vectors are. Maybe the matrices in ML layers are all mappings, but a matrix in general is not, nor is a vector always a mapping. So tensors aren't generalizations of matrices and vectors.

kergonath
0 replies
12h19m

Tensors are mappings which not all matrices and vectors are.

A tensor in Physics is an object that follows some rules when changing reference frame. Their matrix representation is just one way of writing them. It’s the same with vectors: a list with their components is a representation of a vector, not the vector itself. We can think about it that way: the velocity of an object does not depend on the reference frame. Changing the axes does not make the object change its trajectory, but it does change the numerical values of the components of the velocity vector.

So tensors aren’t generalizations of matrices and vectors.

Indeed. Tensors in ML have pretty much nothing to do with tensors in Maths or Physics. It is very unfortunate that they settled on the same name just because it sounds cool and sciency.

whimsicalism
0 replies
17h16m

why not? multilinear mappings can be represented by “batches of matrices” and that’s all that a tensor is

dekhn
0 replies
16h51m

I think to be a tensor, all the bases should be independent. The way I think of it is you use a tensor to describe the rotation of an asteroid around all its major axes (inertia tensor?)

WhitneyLand
0 replies
13h45m

It is appropriate in ML and computer science. It’s not in pure math.

There are many terms in math and science where the definition changes based on the context.

samstave
1 replies
17h26m

For whatever reason, I have held a mental image of a Tensor as a Tesseract/HyperCube where the connections are like elastic workout bands with differing tensile resistances, and they pull on one another to create their encapsulated info-cluster - I have no clue if that's truly an accurate depiction, but it works in my head....

https://upload.wikimedia.org/wikipedia/commons/thumb/1/1a/Or...

sillysaurusx
0 replies
16h37m

I'm reluctant to tell people "no, don't think of it that way," especially if it works for you, because I don't know the best way to think of things. I only know what works well for me. But for me, it'd be ~impossible to use your mental model to do anything useful. That doesn't mean it's bad, just that I don't understand what you mean.

The most straightforward mental model I've ever found for ML is, think of it as 2D matrix operations, like high school linear algebra. Matrix-matrix, matrix-vector, vector-matrix, and vector-vector will get you through 95% of what comes up in practice. In fact I'm having trouble thinking of something that doesn't work that way, because even if you have an RGB image that you multiply against a 2D matrix (i.e. HxWxC multiplied by a mask) the matrix is still only going to apply to 2 of the channels (height and width), since that's the only thing that makes sense. That's why there's all kinds of flattening and rearranging everywhere in practice -- everyone is trying to get a format like N,C,H,W down to a 2D matrix representation.

People like to talk up the higher level maths in ML, but highschool linear algebra (or for the gamedevs in the audience, the stuff you'd normally do in a rendering engine) really will carry you most of the way through your ML journey without loss of generality. The higher level maths usually happens when you start understanding how differentiation works, which you don't even need to understand until way later after you're doing useful things already.

samstave
0 replies
17h23m

One reasonable guess is that the third dimension is time. Actually not. It turns out that time is pretty rare in ML, and it's only (relatively) recently that it's been introduced into e.g. video models.

WRT to ML - may time be better thought of where a thing lives in relation to other things that occurred within the same temporal window?

so "all the shit that happened in 1999 also has an expression within this cluster of events from 1999" - but the same information appears in any location where it is relationally contextual to the other neighbors, such as the SUBJECT of the information? Is this accurate to say why its 'quantum' because the information will show up depending on where the Observation (query) for it is occurring?

(sorry for my kindergarten understanding of this)

parpfish
0 replies
17h27m

just because an image is 2-D doesn’t mean that the model can’t use higher dimensional representations in subsequent layers.

For an image, you could imagine a network learning to push the image through a filter bank that does oriented local frequency decomposition and turns it into 4D {height}x{width}x{spatial freq}X{orientation} before dealing with color channels or image batches

bsdpufferfish
0 replies
16h12m

higher dimensional vectors or matrices are still not tensors.

shrubble
5 replies
17h33m

Tensor is from mathematics and was popularized over a century ago.

layer8
4 replies
17h29m

I know what a tensor is mathematically. However, as far as I can see, ML isn’t based on tensor calculus as such.

phkahler
2 replies
17h22m

Something similar happens on Wikipedia, where topics that use math inevitably get explained in the highest level math possible. It makes topics harder to understand than they need to be.

jiggawatts
1 replies
17h18m

As a helpful Wiki editor just trying to make sure that we don't lead people astray, I've made some small changes to clarify your statement:

In the virtual compendium of Wikipedia, an extensive repository of human knowledge, there is a discernible proclivity for the hermeneutics of mathematically-infused topics to be articulated through the prism of esoteric and sophisticated mathematical constructs, often employing a panoply of arcane lexemes and syntactic structures of Greek and Latin etymology. This phenomenon, redolent of an academic periphrasis, tends to transmute the exegesis of such subjects into a crucible of abstruse and high-order mathematical discourse. Consequently, this modus operandi obfuscates the intrinsic didactic intent, thereby precipitating an epistemological chasm that challenges the layperson's erudition and obviates the pedagogical utility of the exposition.

xarope
0 replies
15h7m

scarily, I actually understood this.

whimsicalism
0 replies
17h17m

multidimensional arrays are multilinear mappings, and that is how they are used in ml usually. it seems fine to me

ralusek
5 replies
17h54m

If nothing else, the term "tensor" is shorter than "vectors and matrices," and then has the added benefit of representing n-dimensional arrays.

layer8
4 replies
17h41m

How is that an added benefit if the hardware doesn’t actually support n-dimensional arrays (other the n = 1 and 2)?

And, strictly speaking, a vector can be considered a 1xn (or nx1) matrix, so Matrix Processing Unit would have been fine.

whimsicalism
2 replies
17h17m

it’s an abstraction, just like 2d arrays

layer8
1 replies
17h14m

I’d say it’s more like calling an ALU that can perform unary and binary operations (so 1 or 2 inputs) an “array processing unit” because it’s like it can process 1- and 2-element arrays. ;)

whimsicalism
0 replies
2h46m

what? the ml framework can support n-dimensional arrays. that’s what i mean by an abstraction

thatguysaguy
0 replies
17h20m

At the end of the day all the arrays are 1 dimensional and thinking of them as 2 dimensional is just an indexing convenience. A matrix multiply is a bunch of vector dot products in a row. Higher tensor contractions can be built out of lower-dimensional ones, so I don't think it's really fair to say the hardware doesn't support it.
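A quick numpy illustration of the "indexing convenience" point (names and sizes are illustrative): the 2D view is just arithmetic over one flat buffer.

    import numpy as np

    mat = np.arange(12).reshape(3, 4)    # a "2D" matrix
    flat = mat.ravel()                   # the same 12 numbers in one flat buffer

    i, j = 1, 2
    assert mat[i, j] == flat[i * 4 + j]  # 2D indexing is just row * ncols + col on a 1D array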

necroforest
3 replies
17h43m

It's branding (see: TensorFlow); also, pretty much anything (linear) you would do with an arbitrarily ranked tensor can be expressed in terms of vector ops and matmuls
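For instance (a small numpy sketch, not from the article): contracting a rank-3 tensor with a vector is just a matrix-vector product after flattening the leading axes.

    import numpy as np

    t = np.random.randn(5, 6, 7)
    v = np.random.randn(7)

    direct = np.einsum('ijk,k->ij', t, v)             # rank-3 x rank-1 contraction
    as_matmul = (t.reshape(-1, 7) @ v).reshape(5, 6)  # same thing as a plain matvec
    assert np.allclose(direct, as_matmul)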

nxobject
2 replies
17h35m

"Fixed-Function Matrix Accelerator" just doesn't have the same buzzy ring to it.

selcuka
0 replies
17h5m

"Fixed-Function Matrix Accelerator" just doesn't have the same buzzy ring to it.

FixMax™ or Maxxelerator™ would be good brands.

bryzaguy
0 replies
17h21m

It’s the perfect name for my next EDM SoundCloud mix, though.

WhitneyLand
2 replies
14h3m

It says: tensors describe the relationship between high-d arrays

It does not say: tensors “only” describe the relationship between high-d arrays

The term “tensor” is used because it covers all cases: scalars, vectors, matrices, and higher-dimensional arrays.

Tensors are still a generalization of vectors and matrices.

Note the context: In ML and computer science, they are considered a generalization. From a strict pure math standpoint they can be considered different.

As frustrating as it seems, one is not really more right than the other, and context is the decider. There are lots of definitions across STEM fields that change based on the context or field they're applied to.

adrian_b
0 replies
7h48m

The word tensor has become more ambiguous over time.

Before 1900, the use of the word tensor was consistent with its etymology, because it was used only for symmetric matrices, which correspond to affine transformations that stretch or compress a body in certain directions.

The square matrix that corresponds to a general affine transformation can be decomposed into the product of a tensor (a symmetric matrix, which stretches) and a versor (a rotation matrix, which is orthogonal and which rotates).

When Ricci-Curbastro and Levi-Civita published the first theory of what are now called tensors, they did not define any new word for the concept of a multidimensional array with certain rules of transformation when the coordinate system is changed, which is now called a tensor.

When Einstein published the Theory of General Relativity during WWI, in which he used what is now called tensor theory, for an unknown reason and without any explanation for this choice he began to use the word "tensor" with the current meaning, in contrast with all previous physics publications.

Because Einstein became extremely popular immediately after WWI, his usage of the word "tensor" spread everywhere, including in mathematics (and including in the American translations of the works of Ricci and Levi-Civita, where the word tensor was introduced everywhere, despite the fact that it did not exist in the original).

Nevertheless, for many years the word "tensor" could not be used for arbitrary multi-dimensional arrays, but only for those which observe the tensor transformation rules with respect to coordinate changes.

The use of the word "tensor" as a synonym for the word "array", like in ML/AI, is a recent phenomenon.

Previously, e.g. in all early computer literature, the word "array" (or "table" in COBOL literature) was used to cover all cases, from scalars, vectors and matrices to arrays with an arbitrary number of dimensions, so no new word was necessary.

Symmetry
0 replies
5h40m

Famously whether free helium is a molecule or not depends on whether you're talking to a physicist or a chemist.

But yeah, people in different countries speak different languages and the same sound, like "no" can mean a negation in English but a possessive in Japanese. And as different fields establish their jargons they often redefine words in different ways. It's just something you have to be aware of.

thatguysaguy
1 replies
17h22m

Well, in the transformer forward pass there are a bunch of 4-dimensional arrays being used.

smilekzs
0 replies
16h13m

Came in to say this.

The Einsum notation makes it desirable to formulate your model/layer as multi-dimensional arrays connected by (loosely) named axes, without worrying too much about breaking it down to primitives yourself. Once you get used to it, the terseness is liberating.
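For example (a sketch with made-up shapes, using numpy's einsum rather than any particular framework): batched attention scores written with named axes, versus the equivalent transpose-and-matmul.

    import numpy as np

    b, h, t, d = 2, 4, 10, 16            # batch, heads, time, head dim (illustrative)
    q = np.random.randn(b, h, t, d)
    k = np.random.randn(b, h, t, d)

    scores = np.einsum('bhtd,bhsd->bhts', q, k)   # contract over d, keep the named axes
    scores2 = q @ k.transpose(0, 1, 3, 2)         # same thing spelled as transpose + batched matmul
    assert np.allclose(scores, scores2)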

jeffhwang
0 replies
15h52m

(I think) technically, all of these mathematical objects are tensors of different ranks:

0. Scalar numbers are tensors of rank 0.

1. Vectors (eg velocity, acceleration in intro high school physics) are tensors of rank 1.

2. Matrices that you learn in intro linear algebra are tensors of rank 2. Nested arrays 1 level deep, aka a 2d array.

3. Higher-order tensors are tensors of rank 3 or higher. I explain these to people with programming backgrounds as nested arrays of arrays, with 3 dimensions or more.

But I’m mostly self-taught in math so ymmv.

adrian_b
0 replies
9h43m

I do not know which is the real origin of the fashion to use the word tensor in the context of AI/ML.

Nevertheless, I have always interpreted it as a reference to the fact that the optimal method of multiplying matrices is to decompose the matrix multiplication into tensor products of vectors.

The other 2 alternative methods, i.e. decomposing the matrix multiplication into scalar products of vectors or into AXPY operations on pairs of vectors, have a much worse ratio between computation operations and transfer operations.

Unfortunately, most people learn in school the much less useful definition of the matrix multiplication based on scalar products of vectors, instead of its definition based on tensor products of vectors, which is the one needed in practice.

The 3 possible methods for multiplying matrices correspond to the 6 possible orders for the 3 indices of the 3 nested loops that compute a matrix product.
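To make the contrast concrete, a small numpy sketch (not from the comment, sizes made up) of the two formulations: the familiar scalar-product (dot-product) form versus the sum-of-outer-products (tensor-product) form; both give the same matrix.

    import numpy as np

    A = np.random.randn(64, 32)
    B = np.random.randn(32, 48)

    # Scalar-product form: each C[i,j] is a dot product of a row of A and a column of B.
    C_dot = np.empty((64, 48))
    for i in range(64):
        for j in range(48):
            C_dot[i, j] = A[i, :] @ B[:, j]

    # Tensor-product form: C is a sum of rank-1 (outer product) updates, one per value of k.
    C_outer = np.zeros((64, 48))
    for k in range(32):
        C_outer += np.outer(A[:, k], B[k, :])

    assert np.allclose(C_dot, C_outer)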

a_wild_dandan
0 replies
14h20m

Every tensor is just a stack of vectors wearing a trench coat.

formercoder
11 replies
16h22m

Googler here, if you haven’t looked at TPUs in a while check out the v5. They support PyTorch/JAX now, makes them much easier to use than TF only.

tw04
10 replies
15h52m

Where can I buy a TPU v5 to install in my server? If the answer is "cloud": that's why NVidia is wiping the floor with them.

foota
6 replies
15h35m

How many people are out there buying H100s for their personal use?

Workaccount2
2 replies
15h24m

Probably many orders of magnitude greater than those buying TPU's for personal use...

foota
1 replies
13h44m

Technically correct, but only because TPUs aren't for sale. H100s cost like 30,000 USD, if you can even get one.

immibis
0 replies
7h35m

So in other words, every AI company has at least 20.

michaelt
1 replies
6h35m

Ah, but part of the reason for CUDA's success is that the open source developer who wants to run unit tests or profile their kernel can pick up a $200 card. That PhD student with a $2000 budget can pick up a card. Academic lab with $20,000 for a beefy server, or tiny cluster? nvidia will take their money.

And that's all fixed capital expenditure - there's no risk a code bug or typo by an inexperienced student will lead to a huge bill.

Also, if you're looking for an alternative to CUDA because you dislike vendor lock-in, switching to something only available in GCP would be an absurd choice.

jsight
0 replies
4h18m

I'm really shocked at how dependent companies have become on the cloud offerings. Want a GPU? Those are expensive, let's just rent on Amazon and then complain about operational costs!

I've noticed this at companies. Yeah, the cloud is expensive, but you have a data center, and a few servers with RTX 3090s aren't expensive. A lot of research workloads can run on simple, cheap hardware.

Even older Nvidia P40s are still useful.

rhdunn
0 replies
6h21m

Probably not many. However, 4090s would be a different situation. There are plenty of guides on running LLMs, stable diffusion, etc. on local hardware.

The H100s would be for businesses looking to get into this space.

ShamelessC
2 replies
15h42m

You probably can't even rent them from Google if you wanted to, in my experience.

jedberg
0 replies
14h51m

I think OPs point was Google claims to have TPUs in their cloud but in reality they are rarely available.

IncreasePosts
10 replies
17h31m

Sigh...learning about TPUs a decade ago made me invest heavily in $GOOG for the coming AI revolution...got that one 100% wrong. +400% over 10 years isn't bad but I can't help but feel shortchanged seeing nvidia/etc

smallmancontrov
3 replies
17h23m

+400% over 10 years isn't bad.

layer8
2 replies
17h17m

It’s almost 15% per year, quite a lot.

fragmede
1 replies
16h47m

yeah, but nvda is up like 500% in 2 years, so if you’re naive enough to think you can time the market, you’d have fomo over having invested in the “wrong” thing.

bongodongobob
0 replies
13h15m

Seeing the difference between GPT2 and GPT3 made me run to NVDA immediately. One of the few bets in my life I've ever been confident about. I think NVDA was a pretty reasonable bet on AI like 5+, maybe 10 years ago when deep learning was ramping up.

genidoi
1 replies
17h22m

I don't think anybody in 2014 believed that the performance of GPT-4/Claude Opus/... was 10 years away. 25 years maybe, 50 years probably, but not 10.

bongodongobob
0 replies
13h9m

It wasn't just that, it was also all the deep learning stuff. Atari games playing themselves, deep style and variants. There was some interesting image generation happening. AlphaGo was 2015, etc. that was really when things started accelerating imo.

brcmthrowaway
1 replies
16h15m

An out-and-out cherrypicker complaining about missing out on gains... couldn't write a better script.

IncreasePosts
0 replies
1h54m

Buying and holding stock for 10 years is not cherry picking as I understand it.

astrange
0 replies
13h44m

Could've bought $QQQ unless you expected one of the components to do especially badly.

Symmetry
0 replies
5h35m

The potential usefulness of things like TPUs actually made me invest in Broadcom, which helped Google design them and could potentially help Amazon or whoever else design their equivalents. But I'm also long NVidia and a half dozen other companies with AI exposure, while still keeping most of my money in index funds.

rhelz
8 replies
16h51m

Quote from the OP: "The TPU v1 uses a CISC (Complex Instruction Set Computer) design with around only about 20 instructions."

chuckle CISC/RISC has gone from astute observation, to research program, to revolutionary technology, to marketing buzzwords....and finally to being just completely meaningless sounds.

I suppose it's the terminological circle of life.

cowsandmilk
6 replies
16h26m

You’re seeming to imply the number of instructions available is what distinguishes CISC, but it never has been.

rhelz
3 replies
16h17m

Guys....what are the instructions? The on-chip memory they are talking about is essentially...a big register set. So we have load from main memory into registers, store from registers into main memory, multiply matrices--source and dest are stored in registers....

We have a 20 instruction, load-store cpu....how is this not RISC? At least RISC how we used the term in 1995?

dmoy
0 replies
16h12m

I think the "multiply matrices" instruction is the one that makes it a cisc

brigade
0 replies
15h34m

Its design follows the old idea that an ISA should be designed for assembly programmers; that instructions should implement complex or higher-level functions intended for a programmer to use directly.

RISC rejected that notion (among other things) and focused on designing ISAs for a compiler to target when compiling high level languages, without wasting silicon on instructions a compiler cannot easily use. For the TPU, a compiler cannot easily take a 256x256 matrix multiply written in a high-level language like C and emit a Matrix_Multiply instruction.

Symmetry
0 replies
5h46m

I don't think it makes any sense to talk about all on-chip memory as a register set. In practice most uses of REP MOVS these days don't leave L3$ but because it's an instruction that runs for a highly variable amount of time while transferring data between different locations we consider it very CISCy. And the TPU also has instructions to transfer data over PCIe to and from the TPU's local DDR3 memory as well, which isn't on the chip and I hope you would agree that it's not like a register at that point.

If every instruction was always one 256 element unit maybe you could make the analogy stick. But it's working with 256*N element operations.

xarope
0 replies
14h57m

Right. CISC vs RISC has always been about simplifying the underlying micro-instructions and register set usage. It's definitely CISC if you have a large complex operation on multiple direct memory locations (albeit the lines between RISC and CISC blur, as all such polar philosophies do, when real-life performance optimizations come into play).

LelouBil
0 replies
16h19m

The fact that it's opposed to RISC (Reduced Instruction Set) adds to the confusion.

dmoy
0 replies
16h31m

Idk maybe it's just me, but what I was taught in comp architecture was that cisc vs risc has more to do with the complexity of the instructions, not the raw count. So TPU having a smaller number of instructions can still be a cisc if the instructions are fairly complex.

Granted the last time I took any comp architecture was a grad course like 15 years ago, so my memory is pretty fuzzy (also we spent most of that semester dicking around with Itanium stuff that is beyond useless now)

uptownfunk
1 replies
13h34m

What Google really needs to do is get into the 2nm EUV space and go sub-2nm. When they have the EUV lithography (or whatever tech ASML has that prints the chips), then you have something really dangerous. Probably a hardcore Google X moonshot-type project. Or maybe they have $500M sitting around to just buy one of the machines. If their TPUs are really that good, maybe it is a good business, especially if they can integrate all the way to having their own fab with their own tech.

ejiblabahaba
0 replies
12h30m

This is frankly infeasible. Between the decades of trade secrets they would first need to discover, the tens- or maybe hundreds- of billions in capital needed to build their very first leading edge fab, the decade or two it would take for any such business to mature to the extent it would be functional, and the completely inconsequential volumes of devices they'd produce, they would probably be lighting half a trillion dollars on fire just to get a few years behind where the leading edge sits today, ten or more years from now. The only reason leading edge fabs are profitable today is because of decades of talent and engineering focused on producing general purpose computing devices for a wide variety of applications and customers, often with those very same customers driving innovation independently in critical focus areas (e.g. Micron with chip-on-chip HDI yield improvements, Xilinx with interdie communication fabric and multi chip substrate design). TPUs will never generate the required volumes, or attract the necessary customers, to achieve remotely profitable economies of scale, particularly when Google also has to set an attractive price against their competitors.

If Google has a compelling-enough business case, existing fabs will happily allocate space for their hardware. TPU is not remotely compelling enough.

brcmthrowaway
1 replies
16h16m

Broadcom did the TPU

HarHarVeryFunny
0 replies
2h54m

Not the whole design - the core processing part (the systolic array matrix multiplier) was designed by Google, but Broadcom designed all the high-speed chip I/O and mapped the design onto TSMC's tools/rules.

ThinkBeat
1 replies
6h33m

This is probably a dumb question that just shows my ignorance, but I keep hearing on the consumer end of things that the M1-M4 chips are good at some AI.

The most important for me these days would be Photoshop, Resolve etc, and I have seen those run a lot faster on Apple's new proprietary chips than on my older machines.

That may not translate well at all to what this chip can do or what an H100 can do.

But does it translate at all?

Of course Apple is not selling their proprietary chips either, so for it to be practical Apple would have to release some form of external server stuffed with their GPUs and AI chips.

singhrac
0 replies
2h36m

I’m also not quite an expert, but have benchmarked an M1 and various GPUs.

The M* chips have unified memory and (especially Pro/Max/Ultra) have very high memory bandwidth even compared eg to a 1080 (an M1 Ultra has memory bandwidth between 2080 and 3090).

At small batch sizes (including 1, like most local tasks), inference is bottlenecked by memory bandwidth, not compute ability. This is why people say the M* chips are good for ML.

However H100s are used primarily for training (at enormous batch sizes) and require lots of interconnect to train large models. At that scale, arithmetic intensity is very high, and the M* chips aren’t very competitive (even if they could be networked) - they pick a different part of the Pareto power/efficiency curve than H100s which guzzle up power.
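A back-of-the-envelope sketch of that bottleneck (made-up layer size, fp16 weights, ignoring activations and the KV cache): arithmetic intensity, i.e. FLOPs per byte of weights moved, scales with batch size, which is why batch-1 local inference is bandwidth-bound while large-batch training is compute-bound.

    d_in, d_out = 4096, 4096          # an illustrative weight matrix
    bytes_per_weight = 2              # fp16

    def arithmetic_intensity(batch):
        flops = 2 * batch * d_in * d_out               # one multiply-add per weight per batch row
        weight_bytes = d_in * d_out * bytes_per_weight # weight traffic dominates at small batch
        return flops / weight_bytes

    print(arithmetic_intensity(1))    # ~1 FLOP/byte: limited by memory bandwidth
    print(arithmetic_intensity(512))  # ~512 FLOPs/byte: limited by compute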

xrd
0 replies
17h9m

This article really connected a lot of abstract pieces together into how they flow through silicon. I really enjoyed seeing the simple CISC instructions and how they basically map on to LLM inference steps.

sroussey
0 replies
2h41m

I wonder how hardware will change if LLMs quantized to -1,0,1 really take off.
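For reference, a toy sketch of what that kind of ternary quantization looks like (loosely in the spirit of BitNet-style work; the scaling rule here is just illustrative): weights become {-1, 0, +1} plus one scale, so a matmul against them needs only adds and subtracts.

    import numpy as np

    def ternarize(w, eps=1e-8):
        scale = np.abs(w).mean() + eps                     # one scale per tensor (illustrative choice)
        return np.clip(np.round(w / scale), -1, 1), scale  # weights in {-1, 0, +1}

    w = np.random.randn(4, 4)
    q, scale = ternarize(w)
    w_approx = q * scale   # dequantized approximation of the original weights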

kleton
0 replies
16h22m

Which ocean creature name is the current TPU?

ThinkBeat
0 replies
6h37m

Given what seems to be an enormous demand for fab space, when Microsoft or Google create a proprietary chip and need it produced how do they get to the front of the line?

Are they simple enough that "older outdated less in demand" fabs can produce them?

I know Apple and Nvidia have a lock on a lot of fab space?

11101010001100
0 replies
1h39m

But where is the puck going?