
Building Meta's GenAI infrastructure

DEDLINE
48 replies
1d1h

I wonder if Meta would ever try to compete with AWS / MSFT / GOOG for AI workloads

lifeisstillgood
32 replies
1d

FB does not have the flywheel of running data centres - all three of those mentioned run hyperscale datacentres that they can then juice by “investing” billions in AI companies, who then turn around and hand those billions back to the investors as revenue

OpenAI takes money from MSFT and buys Azure services

Anthropic takes Amazon money and buys AWS services (as do many robotics etc)

I am fairly sure it’s not illegal but it’s definitely low quality revenue

virtuallynathan
23 replies
22h15m

Facebook has more datacenter space and power than Amazon, Google, and Microsoft -- possibly more than Amazon and Microsoft combined...

jedberg
9 replies
21h38m

Unless you've worked at Amazon, Microsoft, Google, and Facebook, or a whole bunch of datacenter providers, I'm not sure how you could make that claim. They don't really share that information freely, even in their stock reports.

Heck I worked at Amazon and even then I couldn't tell you the total datacenter space, they don't even share it internally.

virtuallynathan
8 replies
19h35m

You can just map them all... I have. I also worked at AWS :)

the-rc
5 replies
16h59m

Mapping as in.. drawing the outlines of buildings and computing the square footage yourself?

virtuallynathan
4 replies
6h48m

Yep.

the-rc
3 replies
4h50m

Then you should be aware that, for the longest time, Google was against multiple floors, until they suddenly switched to four floors in many locations:

https://www.datacenterfrontier.com/cloud/article/11431213/sc...

A decade ago, there was a burst in construction and in some places the bottleneck was not getting the machines or electricity, but how fast they could deliver and pour cement, even working overnight.

virtuallynathan
2 replies
3h35m

Yep, I am aware, I have a square footage multiplier for their multi-story buildings.

jedberg
1 replies
2h30m

But how can you know how many floors they have? And where are you getting the list of buildings from? And what makes you think your list is complete?

Also how do you know their efficiency? Google might have less space but also a way to pack twice as much compute in the same place.

Like I said, this is impossible to know without a lot of insider information from a lot of companies.

chatmasta
1 replies
17h48m

This would be an interesting dataset to use for trading decisions (or sell to hedge funds).

But I wonder how much of their infrastructure is publicly mappable, compared to just the part of it that's exposed to the edge. (Can you map some internal instances in a VPC?)

That said, I'm sure there are a lot of side channels in the provisioning APIs, certificate logs, and other metadata that could paint a decently accurate picture of cloud sizes. It might not cover everything but it'd be good enough to track and measure a gradual expansion of capacity.

virtuallynathan
0 replies
6h46m

I’m not sure mapping VPCs is super helpful - the physical infra is fairly distinct.

AWS has also disclosed 20 million Nitro adapters have been deployed, so you can do some backwards napkin math from that.
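
As a rough sketch of that backwards napkin math (only the 20 million figure comes from AWS; every per-server and per-rack number below is an assumption for illustration):

```python
# Back-of-envelope: from the disclosed Nitro adapter count to a rough fleet size.
nitro_adapters = 20_000_000    # disclosed cumulative Nitro shipments
adapters_per_server = 4        # assumption: modern EC2 hosts carry several Nitro chips
servers_per_rack = 30          # assumption
sqft_per_rack = 50             # assumption: includes aisles, power and cooling space

servers = nitro_adapters / adapters_per_server
racks = servers / servers_per_rack
sqft = racks * sqft_per_rack
print(f"~{servers:,.0f} servers, ~{racks:,.0f} racks, ~{sqft:,.0f} sq ft")
```

Cumulative shipments also count replaced and retired hardware, so this overestimates the live fleet.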

virtuallynathan
7 replies
19h29m

To date, Facebook has built, or is building, 47,100,000 sq ft of space, totaling nearly $24bn in investment. Based on available/disclosed power numbers and extrapolating per sq ft, I get something like 4770MW.

Last I updated my spreadsheet in 2019, Google had $17bn in investments across their datacenters, totaling 13,260,000 sq ft of datacenter space. Additional buildings have been built since then, but not to the scale of an additional 30 million sq ft.

Amazon operates ~80 datacenter buildings in Northern Virginia, each ~200,000 sq ft -- about 16,000,000 sq ft total in that region. The other regions are much, much smaller, perhaps another 4 million sq ft. When I'm bored I'll go update all my maps and spreadsheets.
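
The per-sq-ft power extrapolation is the same kind of exercise; a sketch (the 47.1M sq ft total is from the comment above, while the per-site disclosures here are placeholders, not real figures):

```python
# Derive a W/sq ft density from sites where power is disclosed,
# then apply it to the total mapped footprint.
disclosed_sites = [
    {"sqft": 2_500_000, "mw": 250},   # hypothetical site A
    {"sqft": 4_000_000, "mw": 410},   # hypothetical site B
]
density_w_per_sqft = (sum(s["mw"] for s in disclosed_sites) * 1e6
                      / sum(s["sqft"] for s in disclosed_sites))

total_sqft = 47_100_000
total_mw = total_sqft * density_w_per_sqft / 1e6
print(f"{density_w_per_sqft:.0f} W/sq ft -> ~{total_mw:,.0f} MW")  # ~100 W/sq ft -> ~4,800 MW
```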

VirusNewbie
2 replies
18h56m

But Google-built data centers aren't the only data centers Google is running their machine fleet in...

chatmasta
1 replies
17h46m

Yeah, Google buys servers in public datacenters like those from Equinix. One "region" needn't be one datacenter, and sometimes AWS and GCP will even have computers in the same facility. It's actually quite annoying that "region" is such an opaque construct and they don't have any clear way to identify what physical building is hosting the hardware you rent from them.

the-rc
0 replies
17h13m

Those are almost lost in the noise, compared to the big datacenters. (I've been inside two Atlanta facilities, one leased and one built from scratch, and the old Savvis one in Sunnyvale).

the-rc
1 replies
17h1m

Does the square footage take into account multiple floors? What's the source? It can be misleading, because you don't know the compute density of what's inside. Using just public data, power is a more accurate proxy. Until at least 5-6 years ago, Google was procuring more electricity than Amazon. Before that, it had a further advantage from lower PUE, but I bet the big names are all comparable on that front by now. Anyone that has worked at several of them can infer that FB is not the largest (but it's still huge).

As for the dollars, were they just in 2019 or cumulative? The Google ones seem low compared to numbers from earnings.

virtuallynathan
0 replies
6h48m

Google certainly has more compute density than Amazon; the numbers I was able to find from the local power company were 250MW at Council Bluffs back in 2015 or so.

Amazon builds out 32MW shells, and the most utilized as of 5 or 6 years ago was 24MW or so, with most being much less than that.

virtuallynathan
0 replies
51m

I updated my map for AWS in Northern Virginia -- came up with 74 buildings (another source says 76, so I'll call it directionally correct). If I scale my sq ft by ~5% to account for missing buildings, we get 11,500,000 sq ft in the Northern Virginia area for AWS.

I'll finish my other maps and share them later...

samstave
0 replies
18h43m

At this point power companies (a la PG&E, etc.) should be investing in AI companies in a big way. Then they make money off the AI companies to build out power infra - and vice versa.

I am surprised we haven't heard about private electrical grids built out by such companies.

Surely they all have some owned power generation, but in the local areas where they DO build out power plants, they should have to build capacity for the local area as well, mayhaps in exchange for the normal tax subsidies they seek for all these large capital projects.

Can't wait until we have pods/clusters in orbit, with radioisotope batteries to power them along with the panels. (I wonder how close to a node an RI battery can be? Can each node have its own RI?) (Supposedly they can produce up to "several kW" -- but I can't find a reliable source for the max wattage of an RI...)

SpaceX should build an ISS module that's an AI DC cluster.

And have it build its LLM there based on all the data the ISS technologies create?

pgwhalen
1 replies
21h1m

I have zero evidence, but this seems extremely unlikely. Do you have more than zero evidence?

meiraleal
0 replies
19h44m

Meta can use all of their datacenter space for themselves, while Amazon's, Google's, and Microsoft's datacenter space is mostly rented out.

karmasimida
1 replies
21h32m

I don't think so. AWS hasn't disclosed these numbers, like datacenter space occupied, so how do you know?

virtuallynathan
0 replies
19h36m

I have mapped every AWS data center globally, and I worked at AWS.

Facebook publishes this data.

dsp
0 replies
22h4m

[citation needed]

woah
2 replies
1d

Sounds like it's free equity at the very least

lotsofpulp
1 replies
20h3m

How is it free equity? Spending money to invest it somewhere involves risks. You might recover some of it if the investment is valued by others, but there is no guarantee.

miohtama
0 replies
19h47m

You do not need cash in hand to invest. Instead, you print your own money (AWS credits) and use that to drive up the valuation, because this money costs you nothing today.

It might cost you tomorrow, though, when the company starts to use your services. However, depending on the deal structure, they might not use all the credit, might go belly up before the credit is used, or might get bought up by someone with real cash.

vineyardmike
1 replies
22h3m

NVidia also invests in their AI customers.

fikama
0 replies
10h24m

What do you mean? Could you elaborate, please, and name some deals so I can read more about them?

itslennysfault
1 replies
21h37m

Neither did AWS when they started. They were just building out data centers to run their little book website and decided to start selling the excess capacity. Meta could absolutely do the same, but in the short term, I think they find using that capacity more valuable than selling it.

otterley
0 replies
20h42m

Neither did AWS when they started. They were just building out data centers to run their little book website and decided to start selling the excess capacity.

This is a myth. It simply isn't true. AWS was conceived as a greenfield business by its first CEO. Besides, S3 and SQS were the first AWS services; EC2 didn't appear till a few years later. And it wasn't built from excess Amazon server capacity; it was totally separate.

miohtama
0 replies
19h50m

Such barter deals were also popular during the 00s Internet Bubble.

Here more on the deals (2003):

https://www.cnet.com/tech/services-and-software/aol-saga-ope...

Popular names included AOL, Cisco, Yahoo, etc.

Not sure if Amazon’s valuation-driving term sheets are anything more than AWS credits (Amazon’s own license to print money).

rthnbgrredf
11 replies
1d

Meta could build their own cloud offering. But it would take years to match the current existing offerings of AWS, Azure and GCP in terms of scale and wide range of cloud solutions.

oblio
8 replies
22h51m

The real question is: why aren't they? They had the infrastructure needed to seed a cloud offering 10 years ago. Heck, if Oracle managed to be in 5th (6th? 7th?) place, Facebook for sure could have been a top 5 contender, at least.

KaiserPro
3 replies
22h41m

because meta sucks at software, documentation and making sure end user products work in a supported way.

Offering reliable IaaS is super hard and capital intensive. It's also not profitable if you are perceived as shit.

logicchains
2 replies
22h1m

because meta sucks at software

Google started a cloud and their user-facing software is atrocious. Compare e.g. Angular to React, or TensorFlow to PyTorch.

negus
1 replies
19h22m

Why would you prefer Pytorch to Tensorflow/Keras?

chessgecko
0 replies
18h45m

Tensorflow and keras have gotten better, but pytorch historically had better flexibility than keras and was much easier to debug/develop in than tensorflow.

krschultz
2 replies
11h54m

Because they make more money using their servers for their own products than they would renting them to other people. Meta has an operating margin of 41% AFTER they burn a ton on Reality Labs, while AWS has a 21% margin with more disciplined spending. Social media is a more profitable business than infrastructure.

elbear
1 replies
11h25m

Does Meta make money from anything other than ads? It's not a dismissive question. I'm curious if social media implies anything other than ads.

oblio
0 replies
5h26m

Advertising (over 97.8% of revenues): the company generated over $131 billion in advertising, primarily consisting of displaying ad products on Facebook, Instagram, Messenger, and third-party.

https://fourweekmba.com/how-does-facebook-make-money/

Thaxll
0 replies
17h53m

Because it's not their business, they're not good at it, and the ROI is probably not worth it.

Also, how exactly would they do it? They don't have enough infra to rent out; they would need to 10x what they have now.

bionhoward
0 replies
23h56m

Aww, those existing offerings are overcomplicated as hell. A fresh look could yield a substantially simpler cloud developer experience, and that would compete well against those other cloud offerings on simplicity alone.

Cthulhu_
0 replies
22h45m

And then there's sales. All three of those - and more you haven't considered, like the Chinese mega-IT companies - spend huge amounts on training, partnerships, consultancy, etc. to get companies to use their services instead of their competitors'. My current employer seems all-in on Azure; my previous one was AWS.

There was one manager who worked at two large Dutch companies and sold AWS to them, as in, moving their entire IT, workloads and servers over to AWS. I wouldn't be surprised if there was a deal made there somewhere.

crowcroft
1 replies
23h24m

I think Meta have avoided doing this because it would complicate their business priorities. They don’t really do B2B.

carlossouza
0 replies
21h39m

What do you mean by “they don’t do B2B”? They sell ads to companies, don’t they?

redleader55
0 replies
21h43m

For consumers, AI could just be a stateless "microservice". Meta already has enough surfaces where customers can interact with AI.

dougdonohoe
37 replies
18h50m

Having lived through the dot-com era, I find the AI era slightly dispiriting because of the sheer capital cost of training models. At the start of the dot-com era, anyone could spin up an e-commerce site with relatively little infrastructure cost. Now, it seems, only the hyperscale companies can build these AI models: Meta, Google, Microsoft, OpenAI, etc.

renegade-otter
11 replies
18h24m

Not everything has to be AI. You can run a small business infra for MUCH less than you did back then, especially if you adjust for inflation (!).

Training AI models costs a fortune, but so far it's been just front-loading costs in hopes of a windfall. We'll see what actually happens.

boringg
10 replies
15h40m

Front loading costs to eventually extract rents on usage with one hell of a capital wall protecting the assets.

It's easier to spin up a business, for sure -- also easier to unwind it - they're not as sticky as they used to be.

whatshisface
4 replies
15h24m

If the government can stay back far enough that more than one AI company can train their models, it will end up working like steel mills - barely enough profit to pay the massive cost of capital due to competition. If the government regulates the industry into a monopoly, all bets are off. Their investors are going to push hard for shutting the door behind them so watch out.

The only question is - what tactic? I don't really know, but one trick I am aware of is "specifying to the vendor." In other words, the introduction of regulatory requirements that are at every step in the process a description of the most favored vendor's product. As the favored players add more features, potentially safety features, those features are required in new regulations, using very specific descriptions that more or less mandate that you reproduce the existing technology, to use a software engineer's term, bug-for-bug. If your product is better in some ways but worse in others, you might have a chance in the market - but to no avail, if the regulations demand exactly the advantages of the established suppliers.

woooooo
3 replies
5h36m

Funny example, US Steel was a textbook case for a monopoly achieved privately because it wasn't regulated against.

whatshisface
2 replies
1h32m

They were on top for a while, but later fell behind because they didn't invest. There were heavy tariffs in place to "protect" the monopoly from foreign competition.

woooooo
1 replies
21m

All true, but the initial agglomeration was 100% private

whatshisface
0 replies
13m

If privately arising monopolies could only be kept from buying out their regulators, they'd privately break down before they became too odious... for example Google, which for years was the only remotely good search engine, is now merely one of the better search engines. If there had been a "department of consumer information safety," staffed by the best industry professionals status can buy, that might have not happened.

brookst
4 replies
15h6m

This is typically called a high fixed cost business, like airlines, hotels/apartments, SpaceX, etc.

The dream may be barriers to entry that allow high margins (“rents” if you prefer the prejudicial), but all too often these huge capital costs bankrupt the company and lose money for investors (see: WeWork, Magic Leap). It is high risk, high return. Which seems fair.

boringg
2 replies
4h36m

I understand the economics concept. I'm just not sure WeWork was a great example; it had significant other challenges, such as a self-dealing founder and, frankly, a poor long-term model.

I would wager that the concept needs a bit of a refresh. Historically it has referred to high capital costs for the production of a hard good, though in this case there is more than just a good produced: there's a fair bit of influence and power associated with the good, and a ton of downstream businesses that are reliant upon it if it goes according to plan.

renegade-otter
0 replies
3h20m

It's more like "disrupting the market". The problem is that it's a whole market.

Uber just now turned its first profit since 2009, and I would wager that if not for the newly found appreciation of efficiency and austerity, it would still be burning through money like a drunken socialist sailor.

The classic approach required basic math: "Here is my investment, here is what I am going to charge for rent." You can actually figure out when your investment starts paying off.

This new "model" requires tall, loud, truth-massaging founders to "charm" VCs into giving away billions, with the promise of trillions, I guess. The founders do talk about conquering the world, like, a lot.

I do not know what the WeWork investors were thinking when they expected standard real estate to "10x" their money while the tenants were drinking free beer on tap. The whole thing screamed "scam" even to a lay-person.

brookst
0 replies
3h10m

Agreed, and Magic Leap had its own problems. My point was just that “invest huge amounts of capital to create a moat and then monetize in the long run” is an inherently risky strategy. Business would not work if society insisted that large, high-risk investments could not produce higher long-term margins than less risky investments.

nradov
0 replies
2h28m

Nothing in the WeWork business model is inherently capital intensive. Fundamentally they just take out long-term office leases at low rates, then sublease the space to short-term tenants at higher rates. They don't really own major assets and have no significant IP.

danielhanchen
5 replies
14h4m

Another way to compete with the big tech incumbents is instead of hardware, try maths and software hacks to level the playing field! Training models is still black magic, so making it faster on the software side can solve the capital cost issue somewhat!

toxik
4 replies
12h27m

This kind of research is also incredibly capital intensive. You have to pay some of the smartest people around to work in it.

djhn
3 replies
8h47m

That's labour and human capital intensive, not capital intensive. And I don't mean this as a technically correct nitpick: in terms of economics it's more accurate to call it the exact opposite of capital intensive.

toxik
2 replies
8h7m

That’s a good point, I wanted to make the point that doing the research is also incredibly expensive because it requires some of the smartest people around, and the right background (and what even is that background?)

djhn
0 replies
7h56m

Yes, I agree with the general idea that it's not easy. Yet at least to some extent it might allow people and/or nations with (some degree of, relative) lack of capital but high levels of education and innovation to benefit and catch up.

danielhanchen
0 replies
7h14m

Ye not a bad point - also agree with djhn on stuff.

It's true it'll still be relatively expensive - but I would propose it's relatively inexpensive if people want to make it faster and have the drive to do it :) On the other hand, capital expenditure requires large amounts of money, which also works.

I guess some general CUDA, some maths, knowing how to code transformers from scratch, some operating systems and hardware knowledge, and the constant drive to read new research papers + wanting to make things better.

I just think as humans, if we have drive, we can do it no matter the constraints!

herval
4 replies
14h31m

I’m not sure we went through the same dot-com era, but in my experience, it was extremely expensive to spin up anything. You’d have to run your own servers, buy your own T1 lines, develop with rudimentary cgi… it was a very expensive mess - just like AI today

Which gives me hope that - like the web - hardware will catch up and stuff will become more and more accessible with time

Jensson
3 replies
7h16m

I’m not sure we went through the same dot-com era, but in my experience, it was extremely expensive to spin up anything. You’d have to run your own servers, buy your own T1 lines, develop with rudimentary cgi… it was a very expensive mess - just like AI today

To make your own competing LLM today you need hundreds of millions of dollars; the "very expensive" here is on a whole different level. You could afford the things you talked about on a software engineering salary - it would be a lot of money for that engineer, but at least he could do it. No way anyone but a billionaire could fund a new competing LLM today.

anon373839
1 replies
6h38m

I think the foundation models are a commodity, anyway. The bulk of the economic value, as usual, will be realized at the application layer. Building apps that use LLMs, including fine-tuning them for particular purposes, is well within reach even of indie/solo devs.

That’s why Sam Altman makes so much noise about “safety” - OpenAI would really like a government-backed monopoly position so they can charge higher rents and capture more of that value for themselves. Fortunately, I think that llama has already left the barn.

herval
0 replies
2h23m

I think openai/anthropic/etc are banking on foundation models being the equivalent of the "datacenters" or AWS-equivalents of AI - there'll be PaaSes (eg replicate), and most businesses will just pay the "rent"

herval
0 replies
2h24m

Only if you're creating a foundation model. The equivalent would be competing with a well-funded Amazon back in 1999. You can compete in building LLM-powered products with much, much less money - less than a regular web app in '99.

andy99
3 replies
18h44m

So far it's been pretty "democratic" - I feel in no way disadvantaged because I can't train a foundation model myself. Actually the ecosystem is a lot better than 25 years ago - there are open source (or source available) versions of basically everything you'd want to participate in modern AI/ML.

mewpmewp2
2 replies
18h27m

But none of those are remotely as good as GPT4 for example.

to11mtm
1 replies
18h18m

Mixtral?

ametrau
0 replies
17h20m

Obviously not even close

tdudhhu
1 replies
7h43m

As far as I know training is the main issue.

I don't know a lot about ML. Does anyone know if it is possible to keep training the system while it is running?

That would help a lot if you don't have the possibility to use huge training sets as a starting point.

xdeepak81
0 replies
7h4m

Ads and search engines use continuous incremental training to add new relevant information.
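
In its simplest form that just means taking small gradient steps on fresh data while the model keeps serving; a toy sketch in PyTorch (the model and data here are stand-ins, not how any production ads system actually works):

```python
import torch
import torch.nn as nn

model = nn.Linear(128, 1)                      # stand-in for a ranking model
opt = torch.optim.SGD(model.parameters(), lr=1e-3)
loss_fn = nn.BCEWithLogitsLoss()

def incremental_update(x: torch.Tensor, y: torch.Tensor) -> None:
    """One small optimisation step on newly collected examples; the serving
    weights are refreshed in place rather than retrained from scratch."""
    opt.zero_grad()
    loss = loss_fn(model(x).squeeze(-1), y)
    loss.backward()
    opt.step()

# Called from a streaming job as new (features, click-label) pairs arrive:
for _ in range(10):                            # pretend: 10 fresh mini-batches
    x, y = torch.randn(32, 128), torch.randint(0, 2, (32,)).float()
    incremental_update(x, y)
```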

richardw
1 replies
11h20m

I find the market way more open and competitive than in the dot-com era. Everyone is throwing up a chatbot or RAG solution. There are tradesmen and secretaries and infinite 19-year-olds who are now able to wire together a no-code app or low-code bot and add value to real businesses. The hyperscalers are making some money but absolutely don't have this locked up. Any Groq or Mistral could wander in and eat their lunch, and we haven't really started the race yet. The next decade will be ridiculous.

infecto
0 replies
6h35m

Could not have said it better. Nobody has won the race yet and things are getting better. Building a foundation model is not cheap, but it's still not out of reach for a startup.

rmbyrro
0 replies
7h28m

Fine-tuning is quite accessible for the average small business or hacker, though.

nl
0 replies
3h57m

I too went through the dot com era: as in when Sun Microsystems had the tag line "we are the dot in dot com".

I assure you that before Apache and Linux took over that "dot" in the .com was not cheap!

Fortunately it only really lasted maybe 1993-1997 (I think Oracle announced Linux support in 1997, and that allowed a bunch of companies to start moving off Solaris).

But it wasn't until after the 2001 crash that people started doing sharded MySQL and then NoSQL to scale databases (when you needed it back then!).

It's early. You can do LoRA training now on home systems, and for $500 you can rent enough compute to do even more meaningful fine-tuning. Let's see where we are in 5 and 10 years' time.

(Provided the doomers don't get LLMs banned of course!)
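
For reference, here is a minimal sketch of what LoRA fine-tuning can look like with the Hugging Face transformers/peft stack on a single GPU with enough memory (swap in a smaller base model or 4-bit loading for more modest hardware); the base model name, data file, target modules and hyperparameters are placeholder assumptions, not recommendations:

```python
import torch
from datasets import load_dataset
from peft import LoraConfig, get_peft_model
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

base = "mistralai/Mistral-7B-v0.1"             # assumption: any 7B-class base model
tokenizer = AutoTokenizer.from_pretrained(base)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(base, torch_dtype=torch.float16,
                                             device_map="auto")

# Attach low-rank adapters to a small set of attention projections;
# only these adapter weights are trained, typically <1% of the parameters.
model = get_peft_model(model, LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05,
                                         target_modules=["q_proj", "v_proj"],
                                         task_type="CAUSAL_LM"))

data = load_dataset("text", data_files={"train": "my_corpus.txt"})["train"]
data = data.map(lambda ex: tokenizer(ex["text"], truncation=True, max_length=512))

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="lora-out", per_device_train_batch_size=1,
                           gradient_accumulation_steps=16, num_train_epochs=1,
                           fp16=True, logging_steps=10),
    train_dataset=data,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
model.save_pretrained("lora-out")              # saves only the small adapter weights
```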

mindwok
0 replies
16h20m

We will probably get there, it's just going to take time for hardware supply chains to catch up. I feel it's more comparable to mainframe eras - it took time for general purpose computing to become commoditised.

hackerlight
0 replies
12h54m

Foundation models != application layer. The question is whether the application layer's lunch will be eaten by better foundation models.

danielmarkbruce
0 replies
18h46m

It's not quite the same thing. A model is just one part of a product. You can spin up a product with zero infra by calling APIs that host models.

ZiiS
0 replies
6h51m

Only hyper-scale companies like AT&T could build the fibre; scrappy startups like Google and Amazon ate their lunch.

fuddle
32 replies
1d1h

How much are they paying for H100s? If they are paying $10k: 350,000 NVIDIA H100 x $10k = $3.5b

YetAnotherNick
13 replies
1d

$3.5b

Which is a fourth of what they spent on VR/AR in a year. And GenAI is something they could more easily get revenue from, as it has now become proven technology, and Meta could possibly leapfrog others because of their data moat.

NBJack
7 replies
1d

What moat exactly? Much of the user data they have access to is drying up due to new regulations, some of which, IIRC, prohibit direct use in models as well. I'm not even sure they can use historical data.

Meta certainly has an edge in engineer count, undoubtedly. But I'd say they really, really want the metaverse to succeed more, to have their own walled garden (i.e. equivalent power to the Apple and Google stores, etc.). There's a reason they gave a hard pass to a Google partnership.

Dr_Birdbrain
4 replies
23h56m

I think the raw text inside Facebook groups is at least as valuable as Reddit data. Even if demographics data is restricted under European law, the raw text of people interacting is quite valuable.

verticalscaler
2 replies
22h54m

Indeed, my deranged auntie posting on FB is approximately as valuable as my ADHD/PTSD quaranteeny nephew redditing.

infecto
0 replies
6h27m

Ahhh you had posted some other negative META criticism that was not even factual. Your made-up narratives really do not paint the correct picture.

fragmede
0 replies
14h13m

That ignores all the user groups that are on Facebook. From apartment communities (a la Nextdoor) to grief support counseling to mindfulness therapy groups, there’s a wealth of user comments of a quality a tad bit higher than Uncle John’s racist rants.

calvinmorrison
0 replies
23h31m

Facebook's downfall will be their lock-in. Every other social media platform lets you view a public profile, discussion groups, etc. It's all locked inside Facebook.

agar
0 replies
17h53m

There's a reason they gave a hard pass to a Google partnership.

AIUI, Google required Meta to basically cede control of a partnered OS to them:

"After years of not focusing on VR or doing anything to support our work in the space, Google has been pitching AndroidXR to partners and suggesting, incredibly, that WE are the ones threatening to fragment the ecosystem when they are the ones who plan to do exactly that.

"We would love to partner with them. They could bring their apps to Quest today! They could bring the Play store (with its current economics for 2d apps) and add value to all their developers immediately, which is exactly the kind of open app ecosystem we want to see. We would be thrilled to have them. It would be a win for their developers and all consumers and we’ll keep pushing for it.

"Instead, they want us to agree to restrictive terms that require us to give up our freedom to innovate and build better experiences for people and developers—we’ve seen this play out before and we think we can do better this time around."

-- From Andrew Bosworth

YetAnotherNick
0 replies
22h36m

Much of the user data they have access to is drying up due to new regulations, some of which prohibit IIRC direct use on models as well.

A source would be appreciated, because this is the opposite of obvious. Regulations against using public first-party data would be big news, and I haven't heard of anything like that. They use my data for recommending my feed, so why not for answering my question?

dougb5
4 replies
23h47m

Proven technology, maybe, but is there proven product-market fit for the kinds of things Facebook is using it for? Their linked blog about AI features gives examples like "AI stickers" and image editing... cool, but are these potential multi-billion-dollar lifts to their existing business? I guess I'm skeptical it's worthwhile unless they're able to unseat ChatGPT with a market-leading general-purpose assistant.

pests
1 replies
23h13m

I have a few group chats that just devolve into hours of sending stickers or image generations back and forth. Lately we've been "writing a book together" with @Meta AI as the ghostwriter, and while it utterly sucks, it's been a hilarious shared experience.

I don't think anyone else has nailed that group-chat-with-AI thing quite so well.

TaylorAlexander
0 replies
22h50m

On the podcast TrashFuture, November Kelly recently described AI systems as “garbage dispensers” which is both a funny image (why would anyone make a garbage dispenser??) and an apt description. Certainly these tools have some utility, but there are a load of startups claiming to “democratize creativity” by allowing anyone to publish AI generated slop to major platforms. On the podcast this phrase was used during discussion of a website which lets you create AI generated music and push it to Spotify, a move which Spotify originally pushed back on but has now embraced. Garbage dispenser indeed.

YetAnotherNick
1 replies
22h22m

unseat ChatGPT with a market-leading general purpose assistant.

It's not impossible. The prediction from many (not that I believe it) is that over the long run modelling tricks will become common knowledge and the only things that matter are compute and data, both of which Meta has.

Also, there could be a trend of LLMs for ads or feed recommendation in the future, as they have a large, completely unstructured dataset per user across multiple sites.

cj
0 replies
20h45m

Compute, data, and most importantly distribution/users.

IMO standalone AI companies like OpenAI might be successful by providing infrastructure to other companies, but I can’t imagine ChatGPT remaining #1 many years from now.

The web is still trending towards being a walled garden. Maybe not right now, but long term I think people will use whatever AI is most convenient, which will probably be AI built into a giant company with an established user base (FB, GOOG, MSFT, and Apple if they ever get around to launching - I would love Siri 2.0 if it meant not needing to open the ChatGPT iOS app).

trsohmers
11 replies
1d

Significantly more than that; MFN pricing for the NVIDIA DGX H100 (which has been getting priority supply allocation, so many have been suckered into buying them in order to get fast delivery) is ~$309k, while a basically equivalent HGX H100 system is ~$250k, coming to a price per GPU at the full-server level of ~$31.5k. With Meta’s custom OCP systems integrating the SXM baseboards from NVIDIA, my guess is that their cost per GPU would be in the ~$23-$25k range.

fuddle
9 replies
23h9m

350,000 NVIDIA H100 x $23k = $8b :0
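
The total is obviously very sensitive to the assumed per-GPU price; a quick sketch across the figures mentioned in this thread (all of them estimates, not disclosed Meta or NVIDIA numbers):

```python
gpus = 350_000
for label, unit_price in [("optimistic $10k figure", 10_000),
                          ("estimated Meta OCP cost", 23_000),
                          ("HGX server-level price", 31_500)]:
    print(f"{label:>24}: ${gpus * unit_price / 1e9:.1f}B")
```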

verticalscaler
8 replies
22h58m

Wait till you find out how much they spent on VR.

It is a real loophole in the economy. If you're a trillion-dollar company, the market will insist you set such sums on fire just to be in the race for $current-hype. If they do, it drives their market cap higher still; if they don't, they risk being considered un-innovative and therefore doomed to irrelevancy, and the market cap will spiral downwards.

Sort of reminds me of The Producers.

oblio
3 replies
22h53m

The thing is, this could be considered basic research, right? Basic research IS setting money on fire until (and if) that basic research turns into TCP/IP, Ethernet and the Internet.

verticalscaler
2 replies
22h47m

I wish.

Funnily enough Arpanet and all that Xerox stuff were like <$50 million (inflation adjusted!) total. Some real forward thinkers were able to work the system by breaking off a tiny pittance of a much larger budget.

Whereas I think this can more appropriately be considered the Meta PR budget. They simply can't not spend it; it would look bad for Wall Street. Have to keep up with the herd.

throwaway2037
0 replies
12h39m

    > Funnily enough Arpanet and all that Xerox stuff were like <$50 million (inflation adjusted!) total.

That doesn't say much. The industry was in utter infancy. How much do you think it cost to move Ethernet from 100Mbit/sec to 1Gbit/sec, to 10G, to 100G, to 400G, to 800G? At least one or two orders of magnitude.

How about the cost to build a fab for the Intel 8088 versus a fab that produces 5nm chips running @ 5GHz. Again, at least one or two orders of magnitude.

infecto
0 replies
6h29m

Funny you pick a company that has very little to answer to the markets for. Out of all the large tech companies, Meta is the rare one that does not need to answer, because Zuckerberg controls the company.

lotsofpulp
3 replies
20h8m

If you're a trillion dollar company the market will insist you set such sums on fire just to be in the race for $current-hype. If they do it drives their market cap higher still and if they don't they risk being considered un-innovative and therefore doomed to irrelevancy and the market cap will spiral downwards.

You don’t think earning increasing amounts of tens of billions of dollars in net income per year at some of the highest profit margins in the world at that size for 10+ years has anything to do with market cap?

verticalscaler
2 replies
18h9m

$1T Market Cap lets it be known it will invest $10B a year into $current-hype that will change everything. P/E loosens speculatively on sudden new unbounded potential, Market Cap $1.1T. Hype funded. PR as innovator cemented.

throwaway2037
0 replies
11h44m

Market Cap $1.1T. Hype funded.

I'm confused. How does your stock price, which determines market cap, affect your cashflow to fund R&D? It does not.

bigcat12345678
0 replies
14h17m

Would you kindly provide sources for the numbers? What is MFN?

Thanks! (Your numbers are consistent with what I've heard, but I never managed to find solid sources to back them up.)

vineyardmike
1 replies
22h5m

It’s often forgotten now, but just a few years ago Nvidia was cancelling production batches and writing down inventory when the GPU shortage cleared. No one needed more GPUs. It also happens to be when Meta first announced they were going to increase CapEx spending on compute.

I’m guessing that Meta got a sweetheart deal to help take a lot of inventory for NVidia and make commitments for future purchases.

transcriptase
0 replies
18h47m

I don’t think it was that nobody needed GPUs. It was that nvidia thought they could get scalper margins by restricting supply after the shortage showed people were willing to pay scalper prices.

ZiiS
1 replies
1d

They may have to pay a premium to secure ~¼ of the output; certainly unlikely to be that steep a discount.

theptip
0 replies
23h6m

SemiAnalysis posted recently noting that Meta locked in these purchases a while ago, something like a year or more, so they probably didn’t pay today’s spot rate.

loeg
0 replies
21h24m

Yes, billions in GPU cap ex.

dekhn
0 replies
23h24m

That sounds like a reasonable budget for 3 years of hardware at a major AI company.

hendersoon
28 replies
1d1h

350k H100 cards, around ten billion dollars just for the GPUs. Less if Nvidia gives a volume discount, which I imagine they do not.

renegade-otter
27 replies
1d1h

It will be ironic if Meta sinks all this money into the new trend and finds out later that it has been a huge boondoggle, just as publishers followed Facebook's "guidance" on video being the future, subsequently gutting the talent pool and investing into video production and staff - only to find out it was all a total waste.

echelon
17 replies
1d1h

As a practitioner in the field, I can assure you this is not a boondoggle.

Those GPUs are going to subsume the entire music, film, and gaming industries. And that's just to start.

__loam
16 replies
1d

"My paycheck depends on this technology destroying every field producing cultural artifacts"

echelon
15 replies
1d

Said the butter churner, cotton ginner, and petrol pumper.

I work in film. I've shot dozens of them the old fashioned way. I've always hated how labor, time, and cost intensive they are to make.

Despite instructions from the luminaries to "just pick up a camera", the entire process is stone age. The field is extremely inequitable, full of nepotism and "who you know". Almost every starry-eyed film student winds up doing drudge work for the rest of their lives. Most will never make a feature to match their ambition.

If the whole task was to simply convey my thoughts and dreams to others, why am I scrambling around to sign location rights, capture photons on expensive glass, and then smear and splice things together for months on end? This is ceremonial and soon to be anachronistic. I'm glad that whole mess is going to be replaced. It's a farce.

To phrase it another way - would you like to be hand-writing assembly on punch cards? To only gain entrance into the field with your mathematics PhD?

To speak of the liberty and the economics, why should I have to sell the rights to my idea to a studio so I can get it off the ground? Why should I have to obey the studio's rules and mind their interference?

This whole Gen AI thing is going to be the biggest liberating moment for filmmaking creatives. I know, because I am one.

And if you think any Jack or Jill can just come in and text prompt a whole movie, you're crazy. It's still hard work and a metric ton of good taste.

Art will never die. It's the human soul. It'll take more than some tech bros with GPUs to kill it.

AI is just another tool for the artist. A "bicycle for the mind" to quote Jobs, and a rocket ship for the imagination to convey my own direct experience.

wizzwizz4
10 replies
1d

And if you think any Jack or Jill can just come in and text prompt a whole movie, you're crazy. It's still hard work and a metric ton of good taste.

If you want anything good, yes. If you just want something… I reckon it'd take a week to assemble an incomprehensible-nonsense-film pipeline, after which it's just a matter of feeding the computer electricity.

Short-term, this is going to funnel resources away from the people with good taste. Long-term, it might help collapse the entire "creative industry", after which we might get some of that artist liberation stuff you're talking about – but we might just end up with new gatekeeping strategies from the wealthy and connected, and business as usual.

echelon
9 replies
23h47m

If you want anything good, yes. If you just want something ...

You don't even need AI for that.

https://en.wikipedia.org/wiki/YouTube_poop

https://en.wikipedia.org/wiki/Skibidi_Toilet

The idea that AI isn't going to be used as a creative tool too and that it won't lead to more and better art is a defeatist, Luddite attitude.

Similarly shaped people thought that digital cameras would ruin cinema and photography.

Short-term, this is going to funnel resources away from the people with good taste.

On the contrary - every budding film student will soon [1] be able to execute on their entire visions straight out of the gates. No decades of clawing their way to a very limited, almost impossible to reach peak.

it might help collapse the entire "creative industry"

The studio system. Not the industry.

new gatekeeping strategies from the wealthy and connected, and business as usual.

Creatives have more ways of building brands and followings for themselves than ever before. It's one of the largest growing sectors of the economy, and lots of people are earning livings off of it.

You'll be able to follow that steampunk vampire creator that's been missing from the world until now. Every long tail interest will be catered to. Even the most obscure and wild tastes, ideas, and designs. Stuff that would never get studio funding.

As a creative, I'm overjoyed by this. My friends and I are getting to create things we never could make before [2].

[1] This and next year.

[2] Just an inspiration / aesthetic sample, but we're making a full film: https://imgur.com/a/JNVnJIn

crmd
4 replies
22h14m

You'll be able to follow that steampunk vampire creator that's been missing from the world until now. Every long tail interest will be catered to. Even the most obscure and wild tastes, ideas, and designs. Stuff that would never get studio funding.

Your optimism reminds me of the optimism I had around the early internet. Power to the people, long tail, rise of the creative class, the fall of gatekeeping corporations, etc.

It was like that for a couple of years in the late 90s before power and control got vastly more centralized than before. Maybe this time it’ll be different.

munificent
3 replies
21h12m

The big difference is that back then, anyone with a consumer-level computer in their bedroom could turn it into a server and be a first-class citizen on the Internet.

With generative AI, models will be controlled by a handful of giant corporations who have the enormous corpuses (of dubious provenance) and compute ability to train them.

So it will be like last time, but even worse.

echelon
2 replies
21h2m

You can run ComfyUI and AnimateDiff on your PC. If you haven't checked them out, please do.

And there are other angles to consider. Apple, for one, is expressly interested in not becoming a thin client to cloud AI. They're baking a lot of inference power into their chips. If the creative class don't need their devices, that doesn't bode well for them...

munificent
1 replies
20h34m

Running local models isn't the same as being able to train them from scratch yourself on a corpus of your own choosing.

echelon
0 replies
20h18m

There are so many ways to do exactly this too!

FakeYou, CivitAi, WeightsGg, Comflowy, ... -- there are tons of vibrant communities to teach you everything you need to know. The tools are open source, free to use, and accessible.

This isn't hard at all once you dive in.

renegade-otter
1 replies
22h46m

Similarly shaped people thought that digital cameras would ruin cinema and photography.

Obviously, but you seem to be arguing that AI is just another evolution of productivity tools. You still need to have a photographer's eye while using this technology.

If you couldn't make a good composition on film, a digicam will not save you, and it definitely did not replace photographers. Perhaps it lowered the barrier to entry for prosumers.

https://www.nytimes.com/2023/12/26/opinion/ai-future-photogr...

echelon
0 replies
22h30m

We're arguing the same point. :)

wizzwizz4
0 replies
22h3m

Many YouTube Poops are artistic expression (e.g. https://redirect.invidious.io/watch?v=dO4eIEvHjSw). Skibidi Toilet is definitely artistic expression: it's a full-on epic. (Reactions from one ≈50-year-old: “baffling” “how did they do that?” “why would anyone make this?”)

If you think the Luddites were defeatist, you don't know much about the Luddites.

On the contrary - every budding film student will soon [1] be able to execute on their entire visions straight out of the gates. […] Creatives have more ways of building brands and followings for themselves than ever before.

Yet, we have no shortage of starving artists. Will AI provide them food and shelter?

This is unequivocally a win for creative expression for hobbyists, but it stands to harm professionals – at least in the short term, perhaps longer-term. It's not happening in a vacuum: the greedy are revoking livelihoods because they think AI can do it faster and cheaper (laundering appropriated hobbyist and increasingly-cheap professional labour).

The studio system. Not the industry.

Huh, the word 'industry' has a specialised meaning in economics. Didn't know that.

MrScruff
0 replies
10h55m

Are you talking about some as yet unseen research/technology? The aesthetic sample looks like something we could have seen on the SD subreddit for the last year.

sangnoir
0 replies
1d

And if you think any Jack or Jill can just come in and text prompt a whole movie, you're crazy. It's still hard work and a metric ton of good taste.

Yeah, I can't wait for ChuChuTV to get the best film Oscar /s.

munificent
0 replies
22h57m

Call me crazy, but I don't think churning butter and writing a novel are in the same category of human endeavor at all.

lmm
0 replies
14h35m

Said the butter churner, cotton ginner, and petrol pumper.

Said the bank teller, record producer, etc. Plenty of cases where we've been told technology and automation would democratise the field and remove the middleman, and actually it's the opposite.

Yes, it would be nice if AI made it easy for anyone who wanted to make a great movie. That doesn't mean it's going to happen.

dist-epoch
0 replies
20h44m

The field is extremely inequitable, full of nepotism and "who you know"

Maybe, but it's never been cheaper to make a movie.

I know someone with no connections and (almost) no money who in 4 years made multiple no. 1 box-office films (obviously not in the US, in a smaller country) and then got picked up by Netflix.

tayo42
5 replies
1d1h

What does "video not being the future" mean? In social media, TikTok and Reels are everywhere.

neon_electro
2 replies
1d

They are referring to Facebook/Meta’s 2015 “pivot to video”, speculating there may be a similar thing happening more recently with AI.

https://en.wikipedia.org/wiki/Pivot_to_video

tayo42
0 replies
1d

Interesting, thanks!

Feels like, in hindsight, maybe they were just too early to it.

michaelt
1 replies
1d

There are reports [1] that a bunch of companies like "College Humor" were convinced to switch to producing native video for Facebook (instead of directing users to their own sites) on the basis of bullshit metrics from Facebook, and had an extremely bad time as a result, with some companies going bankrupt.

Something like counting an autoplaying video that ran for 3 seconds as a 'view' IIRC

[1] https://twitter.com/adamconover/status/1183209875859333120

scubbo
0 replies
1d

Thankfully, Dropout (a spin-off of College Humor) is alive and well, and producing some of the best D&D Actual Play series as well as other non-D&D comedy shows. One of the entertainment services that I happily pay for because I want to support what they're doing.

motoxpro
1 replies
1d1h

It already paid off when the world moved from deterministic to probabilistic ad modeling. That's why their numbers are so good right now compared to every other advertiser.

blitzar
0 replies
20h18m

It already paid off. FB stonk price is up lots.

foobarian
0 replies
1d

There is still hope then for cheap gaming GPUs some day soon! I have pretty much the last 10 years of flagship releases to catch up on...

danielhanchen
22 replies
1d1h

float8 got a mention! x2 more FLOPs! Also xformers has 2:4 sparsity support now so another x2? Is Llama3 gonna use like float8 + 2:4 sparsity for the MLP, so 4x H100 float16 FLOPs? Pytorch has fp8 experimental support, whilst attention is still complex to do in float8 due to precision issues, so maybe attention is in float16, and RoPE / layernorms in float16 / float32, whilst everything else is float8?

andy99
8 replies
23h6m

Is there float8 support in any common CPU intrinsics? It sounds interesting, but I'm curious what the impact, if any, will be on CPU inference.

teaearlgraycold
4 replies
12h47m

I’m curious if there’s a meaningful quality difference between float8 and some uint8 alternative (fixed precision or a look up table).
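
For the lookup-table flavour, a toy sketch of what uint8-plus-LUT quantization looks like (the uniform codebook here is purely illustrative; in practice the table would be fitted to the weight distribution, e.g. k-means centroids):

```python
import torch

codebook = torch.linspace(-3.0, 3.0, 256)        # assumption: 256-entry uniform LUT

w = torch.randn(4096)                            # original fp32 weights
codes = (w.unsqueeze(1) - codebook).abs().argmin(dim=1).to(torch.uint8)  # quantize
w_hat = codebook[codes.long()]                   # dequantize: one table lookup per weight

rel_err = torch.linalg.vector_norm(w - w_hat) / torch.linalg.vector_norm(w)
print(f"relative error with an 8-bit LUT: {rel_err:.2e}")
```

Whether that beats float8 comes down to the value distribution and, as discussed in the replies below, the cost of the lookup itself.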

CraigJPerry
3 replies
10h36m

A LUT could be a significant performance penalty, would it not? Instead of a float8 (potentially multiple in the SIMD case) in a register, you're now having to head out to at least L1 cache to dereference the value in the LUT.

Plain uint8 wouldn't allow for the same accuracy range as float8, and it's the accuracy, not the precision (which uint would win for the largest values it can represent), that counts most.

danielhanchen
2 replies
10h25m

Oh oh was just gonna comment as well, but saw this! I think x86 has like pshufb for LUTs (used them like ages ago, but forgot now :() I think also some game (was it Spiderman) used loads of lookup tables.

The issue with LUTs is don't you have to update the LUT itself? You can select which memory address to load up, but the LUT itself has to be differentiable maybe? TBH I'm not an expert on LUTs.

On fixed point - similarly ye you have to fix the precision ranges as well, so again I'm unsure on how one changes the fixed point numbers over time. I'll have to read more on fixed point.

Maybe 1.58-bit using (-1, 0, 1), which gets rid of multiplications and leaves just additions, might be more useful, although you'll only get a 2x FLOP boost since you still need fp8 or fp16 addition.
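
A toy sketch of why ternary weights turn a matmul into pure additions/subtractions (plain PyTorch for clarity; a real kernel would bit-pack the +1/-1 masks instead of looping):

```python
import torch

def ternary_matvec(x: torch.Tensor, W: torch.Tensor) -> torch.Tensor:
    """x @ W where W only holds -1, 0, +1: per output column, add the inputs hit
    by +1 weights and subtract those hit by -1 weights -- no multiplications."""
    out = torch.empty(W.shape[1], dtype=x.dtype)
    for j in range(W.shape[1]):
        col = W[:, j]
        out[j] = x[col == 1].sum() - x[col == -1].sum()
    return out

x = torch.randn(64)
W = torch.randint(-1, 2, (64, 16))               # ternary weights in {-1, 0, 1}
assert torch.allclose(ternary_matvec(x, W), x @ W.to(x.dtype), atol=1e-5)
```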

danielhanchen
0 replies
9h13m

Oh I forgot about that!! But ye LUTs are very interesting and fascinating :) One of the hidden gems of CPU optimizations :)

ashvardanian
2 replies
19h49m

Nope. Moreover, simulating it even with AVX-512 is quite an experience. Been postponing it for 2 years now... But first of all, you need to choose the version of float8 you want to implement, as the standards differ between GPU vendors.

janwas
1 replies
16h21m

We use it in gemma.cpp [1]. This hybrid of E5M2 and E4M3 decodes to bf16 in ~14 instructions, so we can do that on the fly during dot products.

[1]: github.com/google/gemma.cpp

danielhanchen
0 replies
15h49m

Congratulations on gemma.cpp!!

ipsum2
4 replies
22h41m

You're still bounded by memory bandwidth, so adding multiples to FLOPs is not going to give you a good representation of overall speedup.

jabl
2 replies
22h35m

Well, those smaller floats require less BW to transfer back and forth as well. Perhaps not a reduction linear in the size of the float, as maybe smaller floats require more iterations and/or more nodes in the model graph to get an equivalent result.

But rest assured there's an improvement, it's not like people would be doing it if there wasn't any benefit!

andy99
1 replies
21h15m

The impact on bandwidth is the main reason smaller is better, I believe, certainly when it's the bottleneck. I'm only really familiar with CPU, but with, say, FP16 you might convert back to FP32 when you're doing the actual multiplication (so conversion plus multiplication is actually slower), but because you're moving half the data on and off, you still get a huge speedup.

danielhanchen
0 replies
15h50m

I can't remember which research paper it was, but even if you do float32 multiplications, keeping the data in bfloat16 by simply truncating the lower mantissa bits and doing packing still gets you speedups, since matrix multiplication is bound both by compute and by cache access. If you can optimize on the cache side of things, speedups are definitely there.

danielhanchen
0 replies
15h52m

I'm not sure exactly how NVIDIA calculates FLOPs, but I do know that for Intel's FLOPs it's calculated from how many FMA units there are, how many loads can be done in tandem, and what the throughput is. And ye, fp8 requires 2x less space. Sparse 2:4 might be less pronounced, since the matrix first needs to be constructed on the fly, and there is a small matrix of indicator values.
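
The usual peak-FLOPS back-of-envelope looks roughly like this (the numbers are made up to show the shape of the calculation, not any real SKU):

```python
# Peak FLOPS = cores * SIMD lanes * FMA units per core * 2 * clock,
# where the 2 counts an FMA as one multiply plus one add.
cores = 32                  # assumption
simd_lanes_fp32 = 16        # assumption: 512-bit vectors of fp32
fma_units_per_core = 2      # assumption
clock_hz = 2.5e9            # assumption

peak = cores * simd_lanes_fp32 * fma_units_per_core * 2 * clock_hz
print(f"~{peak / 1e12:.1f} TFLOPS fp32 peak")
# Halving the element width (fp32 -> fp16 -> fp8) doubles the lanes per vector,
# which is where the "2x more FLOPs" per precision step comes from,
# assuming the hardware actually supports the narrower type.
```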

boywitharupee
3 replies
17h8m

care to explain why attention has precision issues with fp8?

danielhanchen
2 replies
16h45m

Oh, so float8's L2-norm error from float32 is around 1e-4 I think, whilst float16's is 1e-6. Sadly attention is quite sensitive. There are some hybrid methods which, just before the attention kernel (which is done in fp8), upcast Q and K coming from the RoPE kernel to float16, and leave V in float8. Everything is done in fp8 on the fly, and the output is fp8. This makes errors go to 1e-6.
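
A quick way to see that gap is to round-trip a tensor through each dtype and measure the relative error (PyTorch 2.1+ exposes the fp8 dtypes; exact numbers depend on the data):

```python
import torch

x = torch.randn(1_000_000)            # stand-in for activations

def rel_error(dtype: torch.dtype) -> float:
    # Cast down to the low-precision dtype and back, then measure relative L2 error.
    y = x.to(dtype).to(torch.float32)
    return (torch.linalg.vector_norm(x - y) / torch.linalg.vector_norm(x)).item()

for dtype in (torch.bfloat16, torch.float16, torch.float8_e4m3fn, torch.float8_e5m2):
    print(dtype, f"{rel_error(dtype):.2e}")
# The fp8 round-trip error is orders of magnitude larger than fp16's, which is
# why the hybrid schemes above keep the attention inputs (Q/K) in float16.
```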

alecco
1 replies
7h56m

Yes, but it's a bit more complicated. There are 2 FP8 formats: E5M2 and E4M3.

E5M2 is like an IEEE 754 format. But to compensate for the smaller exponent, "E4M3’s dynamic range is extended by not representing infinities and having only one mantissa bit-pattern for NaNs".

Some people reported E4M3 is better for the forward pass (small range, more precision) and E5M2 is better for the backward pass (bigger range, less precision). And most implementations have some sort of scaling or other math tricks to shrink the error.

[0] FP8 Formats for Deep Learning (Nvidia/ARM/Intel) https://arxiv.org/abs/2209.05433

danielhanchen
0 replies
7h21m

Fair points! Ye, PyTorch's fp8 experimental support does scaling of the gradients. Interesting point on a smaller range for the forward pass, and a larger range for the gradients! I did not know that - so learnt something today!! Thanks! I'll definitely read that paper!

j45
1 replies
19h37m

Is it safe to assume this is the same float16 that exists in Apple m2 chips but not m1?

GamerAlias
1 replies
1d

I was thinking, why is this one guy on HN so deeply interested in and discussing technical details from a minor remark? Then I clocked the name. Great work on the Gemma bugs!

danielhanchen
0 replies
1d

Oh thanks :) I always like small details :)

mjburgess
20 replies
1d1h

It'd be great if they could invest in an alternative to Nvidia -- then, in one fell swoop, destroy the moats of everyone in the industry.

aeyes
9 replies
1d1h

Isn't Google trying to do this with their TPUs?

crakenzak
8 replies
1d

I still, for the life of me, can't understand why Google doesn't just start selling their TPUs to everyone. Nvidia wouldn't be anywhere near their size if they only made H100s available through their DGX cloud, which is what Google is doing by only making TPUs available through Google Cloud.

Good hardware, good software support, and the market is starving for performant competitors to the H100s (and soon B100s). They would sell like hotcakes.

dekhn
2 replies
20h47m

Do you mean, sell TPU hardware to other companies that would run it in their data centers? I can't imagine that would ever really work. The only reason TPUs work at Google is because they have huge teams across many different areas to keep them running (SRE, hardware repair, SWE, hardware infra), and it's coupled to the design of the data centers. To vend and externalize the software would require Google to set up similar teams for external customers (well beyond what Google Cloud provides for TPUs today) just to eke out some margin of profit. Plus, there is a whole proprietary stack running under the hood that Google wouldn't want to share with potential competitors.

Google used to sell a search appliance-in-a-box and eventually lost interest because hardware is so high-touch.

aeyes
1 replies
19h46m

Google used to sell a search appliance-in-a-box and eventually lost interest because hardware is so high-touch.

We had a GSA for intranet search and other than the paint this was a standard Dell server. I remember not being impressed by what the GSA could do.

We also had Google Urchin for web analytics, it wasn't a hardware appliance but the product wasn't very impressive either. They then killed that and tried to get you onto Google Analytics.

They just didn't commit to these on premise enterprise products.

dekhn
0 replies
19h16m

The server may have been Dell, but it included a full stack of google3 software, including Chubby, the lock server.

We had one at my company and it was widely loved - far better intranet search and domain-specific search for biotech.

ajcp
1 replies
1d

And undercut what they'd like to use as a huge motivator in people moving to GCP? Not likely. Even if they wanted to they can't keep up with their own internal demand.

Beyond that they might not be as stable or resilient outside of the closely curated confines of their own data-centers. In that case selling them would be more of an embarrassment.

htrp
0 replies
1d

> Beyond that they might not be as stable or resilient outside of the closely curated confines of their own data-centers. In that case selling them would be more of an embarrassment.

Once you go out of your heavily curated hardware stack, the headaches multiply exponentially.

qiine
0 replies
1d

Maybe selling hardware to customers worldwide, plus support, like Nvidia does is actually not trivial?

neuronexmachina
0 replies
1d

The impression I got from this thread yesterday is that Google's having difficulty keeping up with the heavy internal demand for TPUs: https://news.ycombinator.com/item?id=39670121

aseipp
0 replies
1d

It is an absolutely massive amount of work to turn something designed for your custom software stack and data centers (custom rack designs, water cooling, etc) into a COTS product that is plug-and-play; not just technically but also things like sales, support, etc. You are introducing a massive amount of new problems to solve and pay for. And the in-house designs like TPUs (or Meta's accelerators) are cost effective in part because they don't do that stuff at all. They would not be as cheap per unit of work if they had to also pay off all that other stuff. They also have had a very strong demand for TPUs internally which takes priority over GCP.

math_dandy
5 replies
1d1h

A company moving away from Nvidia/CUDA while the field is developing so rapidly would result in that company falling behind. When (if) the rate of progress in the AI space slows, then perhaps the big players will have the breathing room to consider rethinking foundational components of their infrastructure. But even at that point, their massive investment in Nvidia will likely render this impractical. Nvidia decisively won the AI hardware lottery, and that's why it's worth trillions.

whiplash451
2 replies
1d

People said the same thing when tensorflow was all the rage and pytorch was a side project.

Granted, HW is much harder than SW, but I would not discount Meta's ability to displace NVIDIA entirely.

Cthulhu_
1 replies
22h44m

I don't think they could; Nvidia has tons of talent, and Meta would have to poach it. Meta doesn't do anything in either consumer or datacenter hardware that isn't for themselves, either.

Meta is a services company; their hardware is secondary and for their own usage.

Wazako
0 replies
19h35m

Meta has the Quest. And it's not far-fetched that they're looking to create an LPU for their headset to run models locally.

mjburgess
1 replies
1d

I'm more concerned with avoiding Nvidia (et al.) market domination than with chasing the top edge of the genAI benefits sigmoid. That domination would prevent much broad-based innovation.

hx8
0 replies
1d

This space is so competitive that even if Nvidia is asleep at the wheel, a competitor will come and push them before too long. AMD has a history of noticing when their competitors are going soft and rapidly becoming competitive.

paxys
1 replies
1d1h

Except that "one fell swoop" would realistically be 20+ years of research and development from the top minds in the semiconductor industry.

logicchains
0 replies
21h59m

It's not the hardware keeping NVidia ahead, it's the software. Hardware-wise AMD is competitive with NVidia, but their lack of a competitive CUDA alternative is hurting adoption.

brucethemoose2
0 replies
1d1h

Facebook very specifically bought and customized Intel SKUs tailored for AI workloads for some time.

jvanderbot
15 replies
22h51m

So, I'd love to work on optimizing pipelines like this. How does one "get into" it? It seems an ML scientist with some C/C++ and infra knowledge just dips down into the system when required? Or is it CUDA/SIMD experts who move "up" into ML?

KaiserPro
8 replies
22h6m

A lot of the optimisation at this level is getting data into the right place at the right time, without killing the network.

It's also a group effort to provide simple-to-use primitives that "normal" ML people can use, even if they've never used hyper-scale clusters before.

So you need a good scheduler that understands dependencies (no, the k8s scheduler(s) are shit for this, plus it won't scale past 1k nodes without eating all of your network bandwidth), then you need a dataloader that can provide the dataset access, then you need the IPC that allows sharing/joining of GPUs together.

All of that needs to be wrapped up into a python interface that's fairly simple to use.

Oh, and it needs to be secure, pass an FTC audit (i.e. you need to prove that no user data is being used), and have high utilisation efficiency and uptime.

The model stuff is the cherry on top.
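
Purely as an illustration of the shape such a "simple python interface" could take (these names and the toy scheduler are hypothetical, not Meta's actual APIs):

    from dataclasses import dataclass, field

    @dataclass
    class Job:
        name: str
        command: list[str]
        gpus: int
        depends_on: list[str] = field(default_factory=list)

    def submit(jobs):
        # Toy dependency resolution: a real scheduler would also place jobs
        # near their data, gang-schedule the GPU groups, and handle
        # preemption, priorities and failures.
        done, remaining = set(), {j.name: j for j in jobs}
        while remaining:
            ready = [j for j in remaining.values() if set(j.depends_on) <= done]
            if not ready:
                raise RuntimeError("dependency cycle or missing job")
            for job in ready:
                print(f"launch {job.name} on {job.gpus} GPUs: {' '.join(job.command)}")
                done.add(job.name)
                del remaining[job.name]

    submit([
        Job("tokenize", ["python", "tokenize_corpus.py"], gpus=0),
        Job("train", ["python", "train.py"], gpus=2048, depends_on=["tokenize"]),
    ])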

jvanderbot
5 replies
21h14m

Ok, but back to my main question, how do I get into this?

willsmith72
4 replies
20h46m

It looks more like an infra problem than ML: software architects mixed with devops/infra/SRE people.

jvanderbot
3 replies
20h37m

Well since I'm not a ML engineer of any kind - that's good!

zooq_ai
2 replies
20h0m

At the end of the day, you are still moving, storing and manipulating 1s and 0s, whether you are a frontend engineer, a backend engineer, a systems engineer, an ML engineer or an infra engineer.

elbear
1 replies
11h36m

yeah, but how do you get the hiring managers to see things in the same way? :)

zooq_ai
0 replies
6h55m

Well, at least I fit my resume to match the "job description", because at the end of the day it's all hallucinations, and "real" software engineers who have core computer science skills can literally do anything.

claytonjy
1 replies
19h9m

Can you say more about the network issues with thousands of k8s nodes? I'm regularly running 2,000-3,000 nodes in a GKE cluster, the majority with GPUs. Is this something I need to be worrying about?

KaiserPro
0 replies
7h44m

Only if you are paying for the network bandwidth. For example, if there are nodes spanning more than one zone and you pay for that traffic, you might want to think about moving stuff to a single zone.

For other settings, moving to something like OpenCue might be better (caveats apply).

thegginthesky
1 replies
18h41m

I know someone who works on this at Meta. His resume is computer science heavy, with a masters in Machine Learning. On the previous-experience side, before getting into Meta he had about a decade working as a Software Engineer on Machine Learning systems in multiple languages, such as Go, C++ and Python.

To get the job he applied for a Software Engineer (applied Machine Learning) spot, went through the multiple-step interview process, and once hired did a few weeks of training and interviewing with teams. One of the teams in charge of optimizing ML code at Meta picked him up and now he works there.

Because of Meta's scale, optimizing code that saves a few ms or watts has a huge impact on the bottom line.

In sum:

- Get a formal education in the area
- Get work experience somewhere
- Apply for a big tech Software Engineer (applied ML) job
- Hope they hire you and that there's a spot on one of the teams in charge of optimizing stuff

jvanderbot
0 replies
17h32m

This is helpful, thank you. There's always some luck.

I have a PhD in CS, and lots of experience in optimization and some in throughput/speedups (in an Amdahl's-law sense) for planning problems. My biggest challenge is really getting something meaty with tight constraints or large compute requirements. By the time I get a pipeline set up, it's good enough and we move on. So it's tough to build up that skillset to get in the door where the big problems are.

gajjanag
1 replies
18h18m

Our group works on some of this stuff at Meta, and we have a pretty good diversity of backgrounds - high performance computing (the bulk), computer systems, compilers, ML engineers, etc. We are hiring.

Feel free to DM me to learn more.

jvanderbot
0 replies
17h30m

I will, thank you. Any info is very helpful.

yalok
0 replies
17h35m

Start with something small: take some kernel function in C and try to optimize it for your laptop's SIMD instruction set (in assembly or with intrinsics).

chillee
0 replies
16h37m

I work on PyTorch Compilers at Meta, and I think folks enter ML Systems from all directions :)

Some folks start with more familiarity in ML research and dip down as far as they need.

Other folks come from a traditional distributed systems/compilers/HPC background, and apply those skills to ML systems.

CuriouslyC
13 replies
1d1h

Yann wants to be open and Mark seems happy to salt the earth.

bananabrick
11 replies
1d1h

What do you mean?

CuriouslyC
10 replies
1d1h

In pretty much every interview, Yann has talked about how important it is that AI infrastructure be open and distributed for the good of humanity, and how he wouldn't work for a company that wasn't open. Since Mark doesn't have an AI product to cannibalize, it's in his interest to devalue the AI products of others ("salting the earth").

Legend2440
7 replies
1d

I don't see how they're devaluing other people's AI products.

crakenzak
3 replies
1d

The angle is that by releasing cutting-edge AI research to the public openly, the relative difference between open source models/tech and closed source tech shrinks.

Whether or not you think the "value" of AI products is proportional to their performance gap vs. the next closest thing is up to you. A very interesting PG essay I read recently argues the opposite (superlinear returns): if you're half as good as the next competitor, you don't get half the customers, you get zero.

Essay: https://paulgraham.com/superlinear.html

nova22033
2 replies
1d

New Linux versions don't "salt the earth" for Windows.

nemothekid
0 replies
23h6m

Linux is not competitive as a desktop platform for regular users, but Linux did "salt the earth" in the server market.

dkarras
0 replies
16h47m

Windows has to provide enough additional value to make up for what they are asking in money, compared to the free option. That is the point. If you had no other viable options, then they could do whatever they like. Now they have a baseline to compete with, and it is very hard to compete with free.

jedberg
0 replies
21h1m

It's called commoditize the complement.

If they make AI models free to use, it makes OpenAI nearly valueless, which means OpenAI can't survive and go on to sell Meta's competitors a better GenAI product than Meta can make themselves.

So basically, since they don't make money directly on GenAI, it makes sense for them to release it for free so no one else can have something better, and they don't have to compete on GenAI abilities with their competitors.

conradev
0 replies
1d

Mistral makes models comparable to Facebook's. Mistral charges money, Facebook does not. This negatively affects Mistral's pricing power because a customer can get 70% of the performance they need for 0% of the cost.

The "0% of the cost" part is unique to software businesses because you can copy software so cheaply.

CuriouslyC
0 replies
1d

The Llama models have played a large part in fostering the development of the open source LLM ecosystem, and I expect Llama3 to put in performance > mistral medium and anthropic haiku while being fully open and able to be run on consumer hardware.

chasd00
1 replies
1d

is "salting the earth", in the biblical sense of destroying your enemy and their land to the point where not even plants grow again, a SV term used for companies that promote open source?

CuriouslyC
0 replies
19h22m

It's a term used for making a certain type of business unviable. In this case, high quality open models will make closed source models less viable, since the closed source model providers won't be able to charge monopoly prices for their models, but will have to approach the price of cloud GPU time or lose customers to equally capable open models.

torginus
0 replies
1d

I genuinely think one of the most plausible short-term dangers of AI is the creation of lifelike bots which will be absolutely indistinguishable from real humans in short-form online interaction.

Since people don't want to talk to algorithms, this would result in them shunning all social media, which is a huge danger to companies in the space.

islewis
6 replies
1d1h

I know we won't get this from FB, but I'd be really interested to see how the relationship between compute power and engineering hours scales.

They mention custom building as much as they can. If FB magically has the option to 10x the compute power, would they need to re-engineer the whole stack? What about 100x? Is each of these re-writes just a re-write, or is it a whole order of magnitude more complex?

My technical understanding of what's under the hood of these clusters is pretty surface level; super curious if anyone with relevant experience has thoughts?

tintor
2 replies
23h52m

"just a re-write"

mirekrusin
1 replies
23h21m

...the idea is that at some point it "just re-writes" itself.

ametrau
0 replies
17h18m

The day after that, we have true AGI.

jvalencia
1 replies
19h3m

The cost of training quickly outpaces the cost of development as context length increases. So hardware is cheap until it isn't anymore, by orders of magnitude.
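
A back-of-the-envelope sketch of why context length bites, using the common ~6 * params * tokens rule of thumb for dense transformer training FLOPs plus a roughly 12 * layers * d_model * seq_len attention term per token; all of the numbers below are illustrative assumptions, not anything from the article:

    def train_flops(params, tokens, n_layers, d_model, seq_len):
        dense = 6 * params * tokens                              # matmul cost, forward + backward
        attention = 12 * n_layers * d_model * seq_len * tokens   # term that grows with context length
        return dense + attention

    # Illustrative 70B-ish model trained on 2T tokens at different context lengths.
    for seq_len in (2_048, 32_768, 131_072):
        f = train_flops(params=70e9, tokens=2e12, n_layers=80, d_model=8192, seq_len=seq_len)
        print(f"context {seq_len:>7}: ~{f:.2e} training FLOPs")

At short contexts the dense term dominates; by the time the context reaches the 100k range, the attention term is of the same order or larger, which is the "orders of magnitude" effect described above.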

samstave
0 replies
18h57m

But there is still significant cost in the physical buildout of new pods/DCs, whatever, and in the human engineering hours to physically build them, even though it's a mix of resources across the vendors and FB. It would still be interesting to know the man-hours that went into the physical build of the HW.

bilekas
0 replies
1d1h

I'm not 100% sure, but I would make an educated guess that the cluster in the first image, for example, is one sample of a scalable cluster design, so throwing more hardware at it could bring improvements, but sooner or later the cost-to-improvement ratio will call for an optimization or rewrite, as you call it. So a bit of both, usually. It seems a bit of a balancing act really!

elwell
6 replies
1d1h

> Meta's long-term vision is to build artificial general intelligence (AGI)
valzam
5 replies
18h2m

Don't worry, this goal will change with the next hype cycle

latchkey
4 replies
16h31m

I pity the fools that think AI is just another internet hype cycle.

brookst
3 replies
15h0m

I’m old enough to remember the proud, defiant declarations that the internet was just a hype cycle.

bennyelv
1 replies
11h56m

Well, it was, wasn't it? There was a massive boom where loads of companies over-promised what they would achieve, followed by a crash when everyone realised lots of them couldn't deliver, followed by stability for the smaller number that could.

It was the very definition of a hype cycle as far as I can see. "Hype cycle" doesn't mean "useless and will go away"; you get the second upward curve and then productivity.

https://en.m.wikipedia.org/wiki/Gartner_hype_cycle

brookst
0 replies
3h12m

I don’t disagree, but a lot of “analysis” was not that nuanced. At one time I worked for a company where 90% of revenue was from printed periodicals. Smart, capable executives assured the whole company that the internet was not a threat, just something college kids used for fun.

Colloquial, dismissive use of “hype cycle” does not usually mean “this will change the world but foolish things, soon forgotten, will also be done in the short term”. Though I agree a deeper understanding of the term can suggest that.

latchkey
0 replies
14h40m

I got my first email in 1991 and started my first internet business in 1995 (a web dev shop). My entire life has been an endless hype cycle.

spencerchubb
5 replies
16h55m

All this compute and my Instagram Reels feed still isn't as good as my TikTok feed

zeroonetwothree
4 replies
16h21m

What does that have to do with GenAI?

spencerchubb
2 replies
16h4m

GenAI infra is the same as regular AI infra. They used GenAI in the title because it's a buzzword.

refulgentis
0 replies
15h44m

Yeah, no.

ipsum2
0 replies
15h52m

Not really. Ranking and recommendation models require different infrastructure than LLMs. The models are generally smaller and require more data processing before training.

lmm
0 replies
15h1m

If Gen AI doesn't have anything to do with "Meta"'s actual business then WTF are they setting all this money on fire for?

choppaface
5 replies
1d1h

They say the total cluster will reach 350k H100s, which at a $30k street price is about $10B.

In contrast, Microsoft is spending over $10B per quarter in capex on cloud.

That makes Zuck look conservative after his big loss on the metaverse.

https://www.datacenterdynamics.com/en/news/q3-2023-cloud-res...
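
Quick arithmetic check of that figure (the $30k street price is the commenter's estimate, not an official number):

    gpus = 350_000
    price_per_gpu = 30_000   # assumed street price from the comment above
    print(f"~${gpus * price_per_gpu / 1e9:.1f}B in GPUs alone")   # ~$10.5B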

baby
2 replies
1d1h

What loss lol. Stop the fud

Legend2440
1 replies
1d

Has literally anyone spent money on the metaverse? Maybe it'll still take off in the future, but it's a $40b loss so far.

artninja1988
0 replies
23h55m

> Has literally anyone spent money on the metaverse?

I guess people buy their vr headsets, if that counts. I'm not too familiar with what the "metaverse" entails though...

yuliyp
0 replies
1d1h

That's a weird comparison. The GPU is only a part of the capex: there's the rest of the servers and racks, the networking, as well as the buildings/cooling systems to support that.

KaiserPro
0 replies
22h1m

The biggest cost at Meta is infra.

> In contrast, Microsoft is spending over $10b per quarter capex on cloud.

That's to service other people's workloads. It's a different business.

zerop
4 replies
1d1h

> At Meta, we handle hundreds of trillions of AI model executions per day

Such a large number; does it make sense?

sangnoir
0 replies
1d

How many ads does Meta serve a day, and how many AI model executions are done for each one? Repeat the same for stories, posts and comment recommendations on Facebook and Instagram, and you have very big numbers. To that, add VR, internal modeling and other backoffice/offline analyses over billions of users, and you'll easily get into the trillions.

pants2
0 replies
1d1h

Perhaps there's some combinatorics where every time an ad or post is displayed to the user, it runs through some hundreds/thousands of candidates and computes their relevance.

dakiol
0 replies
1d

What's an "AI model execution"? When I ask something to ChatGPT and it answers to me, does that count as 1 "AI model execution" for OpenAI?

GeneralMayhem
0 replies
1d

Sure. 100T/day * 1day/86400sec ~= 1B/sec. They're probably considering at least a few hundred candidates per impression, and every impression is going to go through _at least_ two models (relevance and pCTR/revenue), so you could get there just with online serving at 5Mqps, which is plausible. But they're also going to be doing a lot of stuff in batch - spam predictions, ad budget forecasts, etc - so that every candidate actually runs through four or five different models, and every actual impression could do more than that.
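
A quick sanity check of that arithmetic (all inputs are the commenters' illustrative guesses, not Meta-reported figures):

    executions_per_day = 100e12           # lower end of "hundreds of trillions"
    print(f"{executions_per_day / 86_400:.2e} executions/sec")   # ~1.2e9, about a billion per second

    qps = 5e6         # hypothetical online serving rate
    candidates = 100  # candidates scored per impression
    models = 2        # relevance + pCTR, per the comment
    print(f"{qps * candidates * models:.2e} executions/sec from online serving alone")  # 1e9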

latchkey
3 replies
1d1h

> we have successfully used both RoCE and InfiniBand clusters for large, GenAI workloads (including our ongoing training of Llama 3 on our RoCE cluster) without any network bottlenecks.

Interesting dig on IB. RoCE is the right solution since it is an open standard and, more importantly, available without a 52+ week lead time.

loeg
2 replies
21h22m

Yeah, and RoCE isn't single vendor. I'm not sure IB scales to the relevant cluster sizes, either.

anonymousDan
1 replies
20h18m

Is NVLink just not scalable enough here?

loeg
0 replies
19h34m

I don't know. I haven't actually worked with IB in this specific space (or since before Nvidia acquired MLNX). My experience with RoCE/IB was for storage cluster backend in the late 2010s.

froonly
3 replies
1d

lmfao at the Meta folks not giving any credit whatsoever to the company that actually came up with and implemented the infrastructure work.

jfkfif
2 replies
1d

What’s the company?

sangnoir
1 replies
1d

Facebook.

_zoltan_
0 replies
19h18m

???

alexsereno
3 replies
1d1h

Honestly, Meta is consistently one of the better companies at releasing tech stack info or just open-sourcing things; these kinds of articles are super fun.

adamnemecek
1 replies
1d1h

Do you find this informative?

alexsereno
0 replies
1d

Yes, of course, though it depends on the lens. If you mean "I'm learning to build better from this", then no, but it's very informative on Meta's own goals and mindset, as well as real numbers that allow comparison to investment in other areas, etc. Also, the point was mostly that Meta does publish a lot in the open, including actual open source tech stacks. They're reasonably good actors in this specific domain.

rshm
0 replies
1d1h

I think some elements of this stack might flow into Open Compute.

sashank_1509
2 replies
16h16m

Meta's backing itself into a corner with its admirable commitment to open source. Unfortunately, at some point when they decide to monetize their billions spent and try to release a closed source model, the level of vitriol they will deal with will be an order of magnitude above what even OpenAI is experiencing. I don't think they realize that!

bigcat12345678
0 replies
16h7m

No

Meta's commitment to open source is carefully calculated.

OCP is a way to rally lower-tier vendors to form a semi-alliance to keep up with super-gorillas like AWS & Google.

LLaMA has already gained much more than it cost (look at the stock price, the open source ecosystem built around LLaMA, and Google's open source Gemma models, which are proof of Meta's success).

IMHO, Meta's open source strategy already covers at least the next 5 years. That's enough to finesse a 180-degree turnaround if necessary (i.e., from open source to closed source).

Horffupolde
0 replies
16h10m

The general public doesn’t care. Only developers.

pinko
2 replies
22h28m

The link mentions "our internal job scheduler" and how they had to optimize it for this work -- does anyone know what this job scheduler is called, or how it works?

KaiserPro
1 replies
22h4m

It might be Twine: https://www.usenix.org/system/files/osdi20-tang.pdf

But I suspect it's not that, because Twine is optimised for services rather than batch processing, and doesn't really have the concept of priorities.

radicality
0 replies
19h57m

I would think it’s probably that. Also, has this been renamed to Twine from Tupperware?

mrkramer
2 replies
23h22m

"Share this: Hacker News" Noice

BonoboIO
1 replies
23h10m

I thought at first "what are you talking about", then I checked my uBlock filters: they were blocking the whole "Share this" content section.

Sharing on Hacker News... they know their audience.

mrkramer
0 replies
22h31m

I also use uBlock, but my filters are the default ones, and I saw it without any problem. Tbh this is the first time I've seen a post on the Web with HN as a share option, or at least the first time I was surprised to see it. Maybe it has something to do with Google ranking "trusted human information and knowledge" higher than "non-human" information and knowledge [0], or maybe some Meta software engineer loves and uses HN, so they decided to include HN as well, idk.

[0] https://news.ycombinator.com/item?id=39423949

zone411
1 replies
1d

Meta is still playing catch-up. It might be hard to believe, but according to Reuters they were running AI workloads mostly on CPUs until 2022, and they had to pull the plug on the first iteration of their AI chip.

https://www.reuters.com/technology/inside-metas-scramble-cat...

axpy906
0 replies
19h41m

The article definitely has some PR buzz and flex in it. Now I see why.

wseqyrku
1 replies
23h44m

> Commitment to open AI innovation

I see what you did there, Meta.

owenpalmer
0 replies
23h2m

Haha, I noticed that too xD

seydor
1 replies
22h50m

This is great news for Nvidia and their stock, but are they sure the LLMs and image models will scale indefinitely? Nature and biology have a preference for sigmoids. What if we find out that AGI requires different kinds of compute capabilities?

jiggawatts
0 replies
19h44m

If anything, NVIDIA H100 GPUs are too general purpose! The optimal compute for AI training would be more specialised, but then would be efficient at only one NN architecture. Until we know what the best architecture is, the general purpose clusters remain a good strategy.

delanyoyoko
1 replies
23h36m

You've got to read "open" roughly 3x in a paragraph.

papichulo2023
0 replies
22h23m

If they release models, I don't care honestly; they can brag about that as much as they want.

mejutoco
0 replies
23h42m

Those reviews are hilarious

sidcool
0 replies
18h25m

Those are some seriously great engineering numbers. Meta, with all the negative pressure it receives (rightfully so), is an engineering powerhouse.

But I do wonder how they foresee monetising this.

pwb25
0 replies
1d

So tired of this; not everyone needs to work on AI stuff. Work on Facebook instead, that page is a disaster.

pedrovhb
0 replies
17h52m

Meta seems to actually be taking all the right steps in how they're contributing to open source AI research. Is this a "commoditize your complement" kind of situation?

marmaduke
0 replies
1d1h

Just for comparison, the Swiss CSCS's new Alps system will get 5k GH200 nodes (each with an H100).

lvl102
0 replies
1d1h

This reads more like a flex for the investment community.

ilaksh
0 replies
1d

"Everything You Wanted to Know About GenAI at Meta, Except the One Thing You Honestly Care About" (Llama 3).

delegate
0 replies
1d

Subtitled 'Here's what you'll never be able to do'.

dekhn
0 replies
23h29m

It's really interesting just how similar these systems are to the designs adopted for HPC over the past few decades. I'm salty because it took a while for the ML community to converge on this (20+K GPUs connected by a real fabric with low latency and high bandwidth).

benreesman
0 replies
20h42m

I think it’s always useful to pay attention to the history on stuff like this and it’s a rare pleasure to be able to give some pointers in the literature along with some color to those interested from first-hand experience.

I’d point the interested at the DLRM paper [1]: that was just after I left and I’m sad I missed it. FB got into disagg racks and SDN and stuff fairly early, and we already had half-U dual-socket SKUs with the SSD and (increasingly) even DRAM elsewhere in the rack in 2018, but we were doing huge NNs for recommenders and rankers even for then. I don’t know if this is considered proprietary so I’ll play it safe and just say that a click-prediction model on IG Stories in 2018 was on the order of a modest but real LLM today (at FP32!).

The crazy part is they were HOGWILD trained on Intel AVX-2, which is just wild to think about. When I was screwing around with CUDA kernels we were time sharing NVIDIA dev boxes, typically 2-4 people doing CUDA were splitting up a single card as late as maybe 2016. I was managing what was called “IGML Infra” when I left and was on a first-name basis with the next-gen hardware people and any NVIDIA deal was still so closely guarded I didn’t hear more than rumors about GPUs for training let alone inference.

350k Hopper this year, Jesus. Say what you want about Meta but don’t say they can’t pour concrete and design SKUs on a dime: best damned infrastructure folks in the game pound-for-pound to this day.

The talk by Thomas “tnb” Bredillet in particular I’d recommend: one of the finest hackers, mathematicians, and humans I’ve ever had the pleasure to know.

[1] https://arxiv.org/pdf/1906.00091.pdf

[2] https://arxiv.org/pdf/2108.09373.pdf

[3] https://engineering.fb.com/2022/10/18/open-source/ocp-summit...

[4] https://youtu.be/lQlIwWVlPGo?si=rRbRUAXX7aM0UcVO