
Open source AI is the path forward

JumpCrisscross
63 replies
2h54m

“The Heavy Press Program was a Cold War-era program of the United States Air Force to build the largest forging presses and extrusion presses in the world.” This “program began in 1944 and concluded in 1957 after construction of four forging presses and six extruders, at an overall cost of $279 million. Six of them are still in operation today, manufacturing structural parts for military and commercial aircraft” [1].

$279mm in 1957 dollars is about $3.2bn today [2]. A public cluster of GPUs provided for free to American universities, companies and non-profits might not be a bad idea.

[1] https://en.m.wikipedia.org/wiki/Heavy_Press_Program

[2] https://data.bls.gov/cgi-bin/cpicalc.pl?cost1=279&year1=1957...
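
(For reference, a rough sketch of that inflation adjustment; the CPI index values below are approximate annual averages I'm assuming, while the BLS calculator in [2] uses official month-level data, so it lands slightly higher.)

    # Rough sketch of the CPI adjustment; index values are assumed approximations.
    cpi_1957 = 28.1     # approx. U.S. CPI-U annual average for 1957 (assumption)
    cpi_2024 = 314.0    # approx. U.S. CPI-U annual average for 2024 (assumption)
    cost_1957 = 279e6   # Heavy Press Program cost in 1957 dollars

    cost_today = cost_1957 * (cpi_2024 / cpi_1957)
    print(f"~${cost_today / 1e9:.1f}bn")   # roughly $3.1bn, in line with the figure above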

CardenB
29 replies
2h49m

Doubtful that GPUs purchased today would be in use for a similar time scale. Govt investment would also drive the cost of GPUs up a great deal.

Not sure why a publicly accessible GPU cluster would be a better solution than the current system of research grants.

JumpCrisscross
20 replies
2h46m

Doubtful that GPUs purchased today would be in use for a similar time scale

Totally agree. That doesn't mean it can't generate massive ROI.

Govt investment would also drive the cost of GPUs up a great deal

Difficult to say this ex ante. On its own, yes. But it would displace some demand. And it could help boost chip production in the long run.

Not sure why a publicly accessible GPU cluster would be a better solution than the current system of research grants

Those receiving the grants have to pay a private owner of the GPUs. That gatekeeping might be both problematic, if there is a conflict of interests, and inefficient. (Consider why the government runs its own supercomputers versus contracting everything to Oracle and IBM.)

rvnx
19 replies
2h42m

It would be better that the government removes IP on such technology for public use, like drugs got generics.

This way the government pays 2'500 USD per card, not 40'000 USD or whatever absurd price.

JumpCrisscross
17 replies
2h41m

better that the government removes IP on such technology for public use, like drugs got generics

You want to punish NVIDIA for calling its shots correctly? You don't see the many ways that backfires?

gpm
9 replies
2h28m

No. But I do want to limit the amount we reward NVIDIA for calling the shots correctly to maximize the benefit to society. For instance by reducing the duration of the government granted monopolies on chip technology that is obsolete well before the default duration of 20 years is over.

That said, it strikes me that the actual limiting factor is fab capacity not nvidia's designs and we probably need to lift the monopolies preventing competition there if we want to reduce prices.

JumpCrisscross
6 replies
2h25m

reducing the duration of the government granted monopolies on chip technology that is obsolete well before the default duration of 20 years is over

Why do you think these private entities are willing to invest the massive capital it takes to keep the frontier advancing at that rate?

I do want to limit the amount we reward NVIDIA for calling the shots correctly to maximize the benefit to society

Why wouldn't NVIDIA be a solid steward of that capital given their track record?

gpm
5 replies
2h18m

Why do you think these private entities are willing to invest the massive capital it takes to keep the frontier advancing at that rate?

Because whether they make 100x or 200x they make a shitload of money.

Why wouldn't NVIDIA be a solid steward of that capital given their track record?

The problem isn't who is the steward of the capital. The problem is that the economically efficient thing to do for a single company (given sufficient fab capacity and a monopoly) is to raise prices to extract a greater share of the pie at the expense of shrinking the size of the pie. I'm not worried about who takes the profit, I'm worried about the size of the pie.

whimsicalism
4 replies
1h41m

Because whether they make 100x or 200x they make a shitload of money.

It's not a certainty that they 'make a shitload of money'. Reducing the right tail payoffs absolutely reduces the capital allocated to solve problems - many of which are risky bets.

Your solution absolutely decreases capital investment at the margin, this is indisputable and basic economics. Even worse when the taking is not due to some pre-existing law, so companies have to deal with the additional uncertainty of whether & when future people will decide in retrospect that they got too large a payoff and arbitrarily decide to take it from them.

gpm
3 replies
1h29m

You can't just look at the costs of an action, you also have to look at the benefits.

Of course I agree I'm going to stop marginal investments from occurring into research into patentable technologies by reducing the expected profit. But I'm going to do so very slightly because I'm not shifting the expected value by very much. Meanwhile I'm going to greatly increase the investment into the existing technology we already have, and allow many more people to try to improve upon it, and I'm going to argue the benefits greatly outweigh the costs.

Whether I'm right or wrong about the net benefit, the basic economics here is that there are both costs and benefits to my proposed action.

And yes I'm going to marginally reduce future investments because the same might happen in the future and that reduces expected value. In fact if I was in charge the same would happen in the future. And the trade-off I get for this is that society gets the benefit of the same actually happening in the future and us not being hamstrung by unbreachable monopolies.

whimsicalism
0 replies
1h12m

But I'm going to do so very slightly because I'm not shifting the expected value by very much

I think you're shifting it by a lot. If the government can post-hoc decide to invalidate patents because the holder is getting too successful, you are introducing a substantial impact on expectations and uncertainty. Your action is not taken in a vacuum.

Meanwhile I'm going to greatly increase the investment into the existing technology we already have, and allow many more people to try to improve upon it, and I'm going to argue the benefits greatly outweigh the costs.

I think this is a much more speculative impact. Why will people even fund the improvements if the government might just decide they've gotten too large a slice of the pie later on down the road?

the trade-off I get for this is that society gets the benefit of the same actually happening in the future and us not being hamstrung by unbreachable monopolies.

No the trade-off is that materially less is produced. These incentive effects are not small. Take for instance, drug price controls - a similar post-facto taking because we feel that the profits from R&D are too high. Introducing proposed price controls leads to hundreds of fewer drugs over the next decade [0] - and likely millions of premature deaths downstream of these incentive effects. And that's with a policy with a clear path towards short-term upside (cheaper drug prices). Discounted GPUs by invalidating nvidia's patents has a much more tenuous upside and clear downside.

[0]: https://bpb-us-w2.wpmucdn.com/voices.uchicago.edu/dist/d/312...

hluska
0 replies
29m

You have proposed state ownership of all successful IP. That is a massive change and yet you have demonstrated zero understanding of the possible costs.

Your claim that removing a profit motivation will increase investment is flat out wrong. Everything else crumbles from there.

JumpCrisscross
0 replies
1h21m

I'm going to do so very slightly because I'm not shifting the expected value by very much

You're massively increasing uncertainty.

the same would happen in the future. And the trade-off I get for this is that society gets the benefit

Why would you expect it would ever happen again? What you want is an unrealized capital gains tax. Not to nuke our semiconductor industry.

whimsicalism
0 replies
1h48m

there is no such thing as a lump-sum transfer, this will shift expectations and incentives going forward and make future large capital projects an increasingly uphill battle

hluska
0 replies
32m

So, if a private company is successful, you will nationalize its IP under some guise of maximizing the benefit to society? That form of government was tried once. It failed miserably.

Under your idea, we’ll try a badly broken economic philosophy again. And while we’re at it, we will completely stifle investment in innovation.

Teever
5 replies
2h24m

There was a post[0] on here recently about how the US went from producing woefully insufficient numbers of aircraft to producing 300k by the end of World War 2.

One of the things that the post mentioned was the meager profit margin that the companies made during this time.

But the thing is that this set the American auto and aviation industry up to rule the world for decades.

A government going to a company and saying 'we need you to produce this product for us at a lower margin than you'd like to' isn't the end of the world.

I don't know if this is one of those scenarios but they exist.

[0] https://www.construction-physics.com/p/how-to-build-300000-a...

rvnx
4 replies
2h4m

In the case of NVIDIA it's even more sneaky.

They are an intellectual property company holding the rights to plans for making graphics cards, not even a company actually making graphics cards.

The government could launch an initiative "OpenGPU" or "OpenAI Accelerator", where the government orders GPUs from TSMC directly, without the middleman.

It may require some tweaking in the law to allow exception to intellectual property for "public interest".

whimsicalism
3 replies
1h46m

y'all really don't understand how these actions would seriously harm capital markets and make it difficult for private capital formation to produce innovations going forward.

freeone3000
2 replies
1h24m

If we have public capital formation, we don’t necessarily need private capital. Private innovation in weather modelling isn’t outpacing government work by leaps and bounds, for instance.

whimsicalism
1 replies
1h10m

because it is extremely challenging to capture the additional value that is being produced by better weather forecasts and generally the forecasts we have right now are pretty good.

private capital is absolutely the driving force for the vast majority of innovations since the beginning of the 20th century. public capital may be involved, but it is dwarfed by private capital markets.

freeone3000
0 replies
25m

It’s challenging to capture the additional value and the forecasts are pretty good because of continual large-scale government investment into weather forecasting. NOAA is launching satellites! it’s a big deal!

Private nuclear research is heavily dependent on governmental contracts to function. Solar was subsidized to heck and back for years. Public investment does work, and does make a difference.

I would even say governmental involvement is sometimes even the deciding factor, to determine if research is worth pursuing. Some major capital investors have decided AI models cannot possibly gain enough money to pay for their training costs. So what do we do when we believe something is a net good for society, but isn’t going to be profitable?

panarky
0 replies
2h24m

To the extent these are incremental units that wouldn't have been sold absent the government program, it's difficult to see how NVIDIA is "harmed".

kube-system
0 replies
1h21m

It would be better that the government removes IP on such technology for public use, like drugs got generics.

20-25 year old drugs are a lot more useful than 20-25 year old GPUs, and the manufacturing supply chain is not a bottleneck.

There's no generics for the latest and greatest drugs, and a fancy gene therapy might run a lot more than $40k.

ygjb
6 replies
2h32m

Of course they won't. The investment in the Heavy Press Program was the initial build, and just citing one example, the Alcoa 50,000 ton forging press was built in 1955, operated until 2008, and needed ~$100M to get it operational again in 2012.

The investment was made to build the press, which created significant jobs and capital investment. The press, and others like it, were subsequently operated by and then sold to a private operator, which in turn enabled the massive expansion of both military manufacturing, and commercial aviation and other manufacturing.

The Heavy Press Program was a strategic investment that paid dividends by both advancing the state of the art in manufacturing at the time it was built, and improving manufacturing capacity.

A GPU cluster might not be the correct investment, but a strategic investment in increasing, for example, the availability of training data, or interoperability of tools, or ease of use for building, training, and distributing models would probably pay big dividends.

dmix
2 replies
2h23m

I don't think there's a shortage of capital for AI... probably the opposite

Of all the things to expand the scope of government spending why would they choose AI, or more specifically GPUs?

hluska
0 replies
34m

Look at it from the perspective of an elected official:

If it succeeds, you were ahead of the curve. If it fails, you were prudent enough to fund an investigation early. Either way, bleeding edge tech gives you a W.

devmor
0 replies
48m

There may however, be a shortage of capital for open source AI, which is the subject under consideration.

As for the why... because there's no shortage of capital for AI. It sounds like the government would like to encourage redirecting that capital to something that's good for the economy at large, rather than good for the investors of a handful of Silicon Valley firms interested only in their own short term gains.

JumpCrisscross
1 replies
2h28m

A GPU cluster might not be the correct investment, but a strategic investment in increasing, for example, the availability of training data, or interoperability of tools, or ease of use for building, training, and distributing models would probably pay big dividends

Would you mind expanding on these options? Universal training data sounds intriguing.

ygjb
0 replies
2h17m

Sure, just on the training front, building and maintaining a broad corpus of properly managed training data with metadata that provides attribution (for example, content that is known to be human generated instead of model generated, what the source of data is for datasets such as weather data, census data, etc), and that also captures any licensing encumbrance so that consumers of the training data can be confident in their ability to use it without risk of legal challenge.

Much of this is already available to private sector entities, but having a publicly funded organization responsible for curating and publishing this would enable new entrants to quickly and easily get a foundation without having to scrape the internet again, especially given how rapidly model generated content is being published.
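
(As a loose illustration of the kind of record such a curated corpus might carry, here is a minimal sketch; every field name is a hypothetical illustration, not an existing schema.)

    from dataclasses import dataclass

    # Hypothetical record format for a publicly curated training corpus:
    # attribution, provenance, and licensing travel with every item.
    @dataclass
    class TrainingRecord:
        text: str            # the content itself
        source_url: str      # where it was collected from
        provenance: str      # e.g. "human-authored", "model-generated", "census data"
        license: str         # SPDX identifier, or "public-domain"
        collected_at: str    # ISO-8601 timestamp of collection

    record = TrainingRecord(
        text="Example weather observation: 21.3 C at 14:00 UTC.",
        source_url="https://example.gov/observations/123",   # placeholder URL
        provenance="sensor data",
        license="CC0-1.0",
        collected_at="2024-07-23T14:05:00Z",
    )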

whimsicalism
0 replies
1h49m

there are many things i think are more capital constrained, if the government is trying to subsidize things.

jvanderbot
0 replies
1h33m

A much better investment would be to (somehow) revolutionize production of chips for AI so that it's all cheaper, more reliable, and faster to stand up new generations of software and hardware codesign. This is probably much closer to the program mentioned in the top level comment: It wasn't to produce one type of thing, but to allow better production of any large thing from lighter alloys.

ks2048
6 replies
2h24m

How about using some of that money to develop CUDA alternatives so everyone is not paying the Nvidia tax?

whimsicalism
1 replies
2h4m

It seems like rocm is already fully ready for transformer inference, so you are just referring to training?

janalsncm
0 replies
50m

ROCm is buggy and largely undocumented. That’s why we don’t use it.

zitterbewegung
0 replies
1h35m

Either you port TensorFlow (Apple) [1] or PyTorch to your platform, or you allow CUDA to run on your hardware (AMD) [2]. Companies are incentivized to not let NVIDIA have a monopoly, but the thing is that CUDA is a huge moat due to the compatibility of all frameworks, and everyone knows it. Also, all of the cloud or on-premises providers use NVIDIA regardless.

[1] https://developer.apple.com/metal/tensorflow-plugin/ [2] https://www.xda-developers.com/nvidia-cuda-amd-zluda/

lukan
0 replies
2h15m

It would probably be cheaper to negate some IP. There are quite a few projects and initiatives to make CUDA code run on AMD, for example, but as far as I know, they all stopped at some point, probably out of fear of being sued into oblivion.

erickj
0 replies
1h3m

That's the kind of work that can come out of academia and open source communities when societies provide the resources required.

belter
0 replies
1h41m

Please start with the Windows Tax first for Linux users buying hardware...and the Apple Tax for Android users...

epaulson
4 replies
2h36m

The National Science Foundation has been doing this for decades, starting with the supercomputing centers in the 80s. Long before anyone talked about cloud credits, NSF has had a bunch of different programs to allocate time on supercomputers to researchers at no cost, these days mostly run out of the Office of Advanced Cyberinfrastructure. (The office name is from the early 00s) - https://new.nsf.gov/cise/oac

(To connect universities to the different supercomputing centers, the NSF funded the NSFnet network in the 80s, which was basically the backbone of the Internet in the 80s and early 90s. The supercomputing funding has really, really paid off for the USA)

JumpCrisscross
1 replies
2h35m

NSF has had a bunch of different programs to allocate time on supercomputers to researchers at no cost, these days mostly run out of the Office of Advanced Cyberinfrastructure

This would be the logical place to put such a programme.

alephnerd
0 replies
50m

The DoE has also been a fairly active purchaser of GPUs for almost two decades now thanks to the Exascale Computing Project [0] and other predecessor projects.

The DoE helped subsidize development of Kepler, Maxwell, Pascal, etc along with the underlying stack like NVLink, NGC, CUDA, etc either via purchases or allowing grants to be commercialized by Nvidia. They also played matchmaker by helping connect private sector research partners with Nvidia.

The DoE also did the same thing for AMD and Intel.

[0] - https://www.exascaleproject.org/

jszymborski
0 replies
46m

As you've rightly pointed out, we have the mechanism, now let's fund it properly!

I'm in Canada, and our science funding has likewise fallen year after year as a proportion of our GDP. I'm still benefiting from A100 clusters funded by tax payer dollars, but think of the advantage we'd have over industry if we didn't have to fight over resources.

cmdrk
0 replies
16m

Yeah, the specific AI/ML-focused program is NAIRR.

https://nairrpilot.org/

Terrible name unless they low-key plan to make AI researchers' hair fall out.

fweimer
3 replies
2h37m

Don't these public clusters exist today, and have been around for decades at this point, with varying architectures? In the sense that you submit a proposal, it gets approved, and then you get access for your research?

JumpCrisscross
1 replies
2h34m

Not--to my knowledge--for the GPUs necessary to train cutting-edge LLMs.

Maxious
0 replies
2h21m

All of the major cloud providers offer grants for public research https://www.amazon.science/research-awards https://edu.google.com/intl/ALL_us/programs/credits/research https://www.microsoft.com/en-us/azure-academic-research/

NVIDIA offers discounts https://developer.nvidia.com/education-pricing

eg. for Australia, the National Computing Infrastructure allows researchers to reserve time on:

- 160 nodes each containing four Nvidia V100 GPUs and two 24-core Intel Xeon Scalable 'Cascade Lake' processors.

- 2 nodes of the NVIDIA DGX A100 system, with 8 A100 GPUs per node.

https://nci.org.au/our-systems/hpc-systems

NewJazz
0 replies
1h21m

This is the most recent iteration of a national platform. They have tons of GPUs (and CPUs, and flash storage) hooked up as a Kubernetes cluster, available for teaching and research.

https://nationalresearchplatform.org/

light_hue_1
2 replies
2h40m

The problem is that any public cluster would be outdated in 2 years. At the same time, GPUs are massively overpriced. Nvidia's profit margins on the H100 are crazy.

Until we get cheaper cards that stand the test of time, building a public cluster is just a waste of money. There are far better ways to spend $1b in research dollars.

ninininino
0 replies
1h33m

What about dollar cost averaging your purchases of GPUs? So that you're always buying a bit of the newest stuff every year rather than just a single fixed investment in hardware that will become outdated? Say 100 million a year every year for 20 years instead of 2 billion in a single year?

JumpCrisscross
0 replies
2h35m

any public cluster would be outdated in 2 years

The private companies buying hundreds of billions of dollars of GPUs aren't writing them off in 2 years. They won't be cutting edge for long. But that's not the point--they'll still be available.

Nvidia's profit margins on the H100 are crazy

I don't see how the current practice of giving a researcher a grant so they can rent time on a Google cluster that runs H100s is more efficient. It's just a question of capex or opex. As a state, the U.S. has a structural advantage in the former.

far better ways to spend $1b in research dollars

One assumes the U.S. government wouldn't be paying list price. In any case, the purpose isn't purely research ROI. Like the heavy presses, it's in making a prohibitively-expensive capital asset generally available.

aiauthoritydev
2 replies
1h5m

Overall, government doing anything is a bad idea. There are cases, however, where government is the only entity that can do certain things. These are things that involve the military, law enforcement, etc. Outside of this we should rely on private industry and for-profit industry as much as possible.

pavlov
0 replies
1h2m

The American healthcare industry demonstrates the tremendous benefits of rigidly applying this mindset.

Why couldn’t law enforcement be private too? You call 911, several private security squads rush to solve your immediate crime issue, and the ones who manage to shoot the suspect send you a $20k bill. Seems efficient. If you don’t like the size of the bill, you can always get private crime insurance.

chris_wot
0 replies
57m

That’s not correct. The American health care system is an extreme example of where private organisations fail overall society.

goda90
1 replies
1h45m

I'd like to see big programs to increase the amount of cheap, clean energy we have. AI compute would be one of many beneficiaries of super cheap energy, especially since you wouldn't need to chase newer, more efficient hardware just to keep costs down.

Melatonic
0 replies
1h1m

Yeah, this would be the real equivalent of the program people are talking about above. That, and investing in core networking infrastructure (like cables) instead of just giving huge handouts to certain corporations that then pocket the money.

blackeyeblitzar
1 replies
2h28m

What about distributed training on volunteer hardware? Is that feasible?

oersted
0 replies
2h4m

It is an exciting concept, there's a huge wealth of gaming hardware deployed that is inactive at most hours of the day. And I'm sure people are willing to pay well above the electricity cost for it.

Unfortunately, the dominant LLM architecture makes it relatively infeasible right now.

- Gaming hardware has too limited VRAM for training any kind of near-state-of-the-art model. Nvidia is being annoyingly smart about this to sell enterprise GPUs at exorbitant markups.

- Right now communication between machines seems to be the bottleneck, and this is way worse with limited VRAM. Even with data-centre-grade interconnect (mostly Infiniband, which is also Nvidia, smart-asses), any failed links tend to cause big delays in training.

Nevertheless, it is a good direction to push towards, and the government could indeed help, but it will take time. We need both a more healthy competitive landscape in hardware, and research towards model architectures that are easy to train in a distributed manner (this was also the key to the success of Transformers, but we need to go further).

BigParm
1 replies
1h14m

So we'll have the government bypass markets and force the working class to buy toys for the owning class?

If anything, allocate compute to citizens.

_fat_santa
0 replies
1h9m

If anything, allocate compute to citizens.

If something like this were to become a reality, I could see something like "CitizenCloud" where once you prove that you are a US Citizen (or green card holder or some other requirement), you can then be allocated a number of credits every month for running workloads on the "CitizenCloud". Everyone would get a baseline amount, from there if you can prove you are a researcher or own a business related to AI then you can get more credits.

spullara
0 replies
37m

It makes much more sense to invest in a next generation fab for GPUs than to buy GPUs and more closely matches this kind of project.

prpl
0 replies
2h11m

Great idea, too bad the DOE and NSF were there first.

kjkjadksj
0 replies
1h58m

The size of the cluster would have to be massive or else your job will sit in the queue for a year. And even then, what are you going to do, downsize the resources requested so you can get in earlier? After a certain point it starts to make more sense to just buy your own Xeons and run your own cluster.

Aperocky
0 replies
1h54m

Imagine if they made a data center with 1957 electronics that cost $279 million.

They probably wouldn't be using it now, because the phone in your pocket is likely more powerful. Moore's law did end, but data center hardware is still evolving orders of magnitude faster than forging presses.

the8thbit
62 replies
2h52m

"Eventually though, open source Linux gained popularity – initially because it allowed developers to modify its code however they wanted ..."

I find the language around "open source AI" to be confusing. With "open source" there's usually "source" to open, right? As in, there is human legible code that can be read and modified by the user? If so, then how can current ML models be open source? They're very large matrices that are, for the most part, inscrutable to the user. They seem akin to binaries, which, yes, can be modified by the user, but are extremely obscured to the user, and require enormous effort to understand and effectively modify.

"Open source" code is not just code that isn't executed remotely over an API, and it seems like maybe its being conflated with that here?

candiddevmike
24 replies
2h45m

None of Meta's models are "open source" in the FOSS sense, even the latest Llama 3.1. The license is restrictive. And no one has bothered to release their training data either.

This post is an ad and trying to paint these things as something they aren't.

JumpCrisscross
23 replies
2h44m

no one has bothered to release their training data

If the FOSS community sets this as the benchmark for open source in respect of AI, they're going to lose control of the term. In most jurisdictions it would be illegal for the likes of Meta to release training data.

exe34
20 replies
2h31m

the training data is the source.

JumpCrisscross
15 replies
2h26m

the training data is the source

Sure. But that's not going to be released. The term open source AI cannot be expected to cover it because it's not practical.

diggan
11 replies
2h19m

So because it's really hard to do proper Open Source with these LLMs, means we need to change the meaning of Open Source so it fits with these PR releases?

JumpCrisscross
10 replies
2h17m

because it's really hard to do proper Open Source with these LLMs, means we need to change the meaning of Open Source so it fits with these PR releases?

Open training data is hard to the point of impracticality. It requires excluding private and proprietary data.

Meanwhile, the term "open source" is massively popular. So it will get used. The question is how.

Meta et al would love for the choice to be between, on one hand, open weights only, and, on the other hand, open training data, because the latter is impractical. That dichotomy guarantees that when someone says open source AI they'll mean open weights. (The way open source software, today, generally means source available, not FOSS.)

Palomides
6 replies
2h12m

source available is absolutely not the same as open source

you are playing very loosely with terms that have specific, widely accepted definitions (e.g. https://opensource.org/osd )

I don't get why you think it would be useful to call LLMs with published weights "open source"

JumpCrisscross
4 replies
1h53m

terms that have specific, widely accepted definitions

The OSI's definition is far from the only one [1]. Switzerland is currently implementing CH Open's definition, the EU another one, et cetera.

I don't get why you think it would be useful to call LLMs with published weights "open source"

I don't. I'm saying that if the choice is between open weights or open weights + open training data, open weights will win because the useful definition will outcompete the pristine one in a public context.

[1] https://en.wikipedia.org/wiki/Open-source_software#Definitio...

diggan
3 replies
1h45m

For the EU, I'm guessing you're talking about the EUPL, which is FSF/OSI approved and GPL compatible, generally considered copyleft.

For the CH Open, I'm not finding anything specific, even from Swiss websites, could you help me understand what you're referring to here?

I'm guessing that all these definitions have at least some points in common, which involves (another guess) at least being able to produce the output artifacts/binaries by yourself, something that you cannot do with Llama, just as an example.

JumpCrisscross
2 replies
1h40m

For the CH Open, I'm not finding anything specific, even from Swiss websites, could you help me understand what you're referring to here

Was on the HN front page earlier [1][2]. The definition comes strikingly close to source on request with no use restrictions.

all these definitions have at least some points in common

Agreed. But they're all different. There isn't an accepted definition of open source even when it comes to software; there is an accepted set of broad principles.

[1] https://news.ycombinator.com/item?id=41047172

[2] https://joinup.ec.europa.eu/collection/open-source-observato...

diggan
1 replies
1h33m

Agreed. But they're all different. There isn't an accepted definition of open source even when it comes to software; there is an accepted set of broad principles.

Agreed, but are we splitting hairs here and is it relevant to the claim made earlier?

(The way open source software, today, generally means source available, not FOSS.)

Do any of these principles or definitions from these orgs agree/disagree with that?

My hypothesis is that they generally would go against that belief and instead argue that open source is different from source available. But I haven't looked specifically to confirm if that's true or not, just a guess.

JumpCrisscross
0 replies
1h2m

are we splitting hairs here and is it relevant to the claim made earlier?

I don't think so. Take the Swiss definition. Source on request, not even available. Yet being branded and accepted as open source.

(To be clear, the Swiss example favours FOSS. But it also permits source on request and bundles them together under the same label.)

SquareWheel
0 replies
6m

specific, widely accepted definitions

Realistically, nobody outside of Hacker News commenters have ever cared about the OSD. It's just not how the term is used colloquially.

unethical_ban
1 replies
1h57m

Meanwhile, the term "open source" is massively popular. So it will get used. The question is how.

Here's the source of the disagreement. You're justifying the use of the term "open source" by saying it's logical for Meta to want to use it for its popularity and layman (incorrect) understanding.

Other person is saying it doesn't matter how convenient it is or how much Meta wants to use it, that the term "open source" is misleading for a product where the "source" is the training data, and the final product has onerous restrictions on use.

This would be like Adobe giving Photoshop away for free, but for personal use only and not for making ads for Adobe's competitors. Sure, Adobe likes it and most users may be fine with it, but it isn't open source.

The way open source software, today, generally means source available, not FOSS.

I don't agree with that. When a company says "open source" but it's not free, the tech community is quick to call it "source available" or "open core".

JumpCrisscross
0 replies
1h44m

You're justifying the use of the term "open source" by saying it's logical for Meta to want to use it for its popularity and layman (incorrect) understanding

I'm actually not a fan of Meta's definition. I'm arguing specifically against an unrealistic definition, because for practical purposes that cedes the term to Meta.

the term "open source" is misleading for a product where the "source" is the training data, and the final product has onerous restrictions on use

Agree. I think the focus should be on the use restrictions.

When a company says "open source" but it's not free, the tech community is quick to call it "source available" or "open core"

This isn't consistently applied. It's why we have the free vs open vs FOSS fracture.

diggan
0 replies
2h5m

Open training data is hard to the point of impracticality. It requires excluding private and proprietary data.

Right, so the onus is on Facebook/Meta to get that right; then they could call something Open Source. Until then, find another name that doesn't already have a specific meaning.

(The way open source software, today, generally means source available, not FOSS.)

No, but it's heading that way. Open Source, today, still means that the things you need to build a project are publicly available for you to download and run on your own machine, granted you have the means to do so. What you're thinking of is literally called "Source Available", which is very different from "Open Source".

The intent of Open Source is for people to be able to reproduce the work themselves, with modifications if they want to. Is that something you can do today with the various Llama models? No, because one core part of the project's "source code" (what you need to reproduce it from scratch), the training data, is being held back and kept private.

plsbenice34
1 replies
2h12m

Of course it could be practical - provide the data. The fact that society is a dystopian nightmare controlled by a few megacorporations that don't want free information does not justify outright changing the meaning of the language.

JumpCrisscross
0 replies
2h3m

provide the data

Who? It's not their data.

tintor
0 replies
1h45m

Meta can call it something else other than open source.

Synthetic part of the training data could be released.

wrs
0 replies
14m

I don’t think even that is true. I conjecture that Facebook couldn’t reproduce the model weights if they started over with the same training data, because I doubt such a huge training run is a reproducible deterministic process. I don’t think anyone has “the” source.

sangnoir
0 replies
28m

We've had a similar debate before, but the last time it was about whether Linux device drivers based on non-public datasheets under NDA were actually open source. This debate occurred again over drivers that interact with binary blobs.

I disagree with the purists - if you can legally change the source or weights - even without having access to the data used by the upstream authors - it's open enough for me. YMMV.

root_axis
0 replies
56m

No. It's an asset used in the training process, the source code can process arbitrary training data.

JimDabell
0 replies
1h44m

I don’t think it’s that simple. The source is “the preferred form of the work for making modifications to it” (to use the GPL’s wording).

For an LLM, that’s not the training data. That’s the model itself. You don’t make changes to an LLM by going back to the training data and making changes to it, then re-running the training. You update the model itself with more training data.

You can’t even use the training code and original training data to reproduce the existing model. A lot of it is non-deterministic, so you’ll get different results each time anyway.

Another complication is that the object code for normal software is a clear derivative work of the source code. It’s a direct translation from one form to another. This isn’t the case with LLMs and their training data. The models learn from it, but they aren’t simply an alternative form of it. I don’t think you can describe an LLM as a derivative work of its training data. It learns from it, it isn’t a copy of it. This is mostly the reason why distributing training data is infeasible – the model’s creator may not have the license to do so.

Would it be extremely useful to have the original training data? Definitely. Is distributing it the same as distributing source code for normal software? I don’t think so.

I think new terminology is needed for open AI models. We can’t simply re-use what works for human-editable code because it’s a fundamentally different type of thing with different technical and legal constraints.

mesebrec
1 replies
2h24m

Regardless of the training data, the license even heavily restricts how you can use the model.

Please read through their "acceptable use" policy before you decide whether this is really in line with open source.

JumpCrisscross
0 replies
2h13m

Please read through their "acceptable use" policy before you decide whether this is really in line with open source

I'm not taking a specific position on this license. I haven't read it closely. My broad point is simply that open source AI, as a term, cannot practically require the training data be made available.

blackeyeblitzar
3 replies
2h27m

That is just the inference code. Not training code or evaluation code or whatever pre/post processing they do.

osanseviero
0 replies
44m

Yes, there are a few dozen full open source models (license, code, data, models)

mesebrec
2 replies
2h23m

This is like saying any python program is open source because the python runtime is open source.

Inference code is the runtime; the code that runs the model. Not the model itself.

mkolodny
1 replies
1h19m

I disagree. The file I linked to, model.py, contains the Llama 3 model itself.

You can use that model with open data to train it from scratch yourself. Or you can load Meta’s open weights and have a working LLM.

causal
0 replies
57m

Yeah a lot of people here seem to not understand that PyTorch really does make model definitions that simple, and that has everything you need to resume back-propagation. Not to mention PyTorch itself being open-sourced by Meta.

That said, the Llama license doesn't meet strict definitions of OS, and I bet they have internal tooling for datacenter-scale training that's not represented here.
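
(For what it's worth, a minimal sketch of what "everything you need to resume back-propagation" looks like in practice, assuming the Hugging Face transformers packaging of the weights; the model id and gated-access approval are assumptions, and the 405B model would obviously need far more hardware than this implies.)

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # Assumed model id; access to the gated repo is also an assumption.
    name = "meta-llama/Meta-Llama-3.1-8B"
    tok = AutoTokenizer.from_pretrained(name)
    model = AutoModelForCausalLM.from_pretrained(name, torch_dtype=torch.bfloat16)

    batch = tok("The Heavy Press Program was", return_tensors="pt")
    out = model(**batch, labels=batch["input_ids"])   # causal LM loss over the batch
    out.loss.backward()                               # gradients flow into the released weights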

apsec112
1 replies
2h28m

That's not the training code, just the inference code. The training code, running on thousands of high-end H100 servers, is surely much more complex. They also don't open-source the dataset, or the code they used for data scraping/filtering/etc.

the8thbit
0 replies
2h18m

"just the inference code"

It's not the "inference code", its the code that specifies the architecture of the model and loads the model. The "inference code" is mostly the model, and the model is not legible to a human reader.

Maybe someday open source models will be possible, but we will need much better interpretability tools so we can generate the source code from the model. In most software projects you write the source as a specification that is then used by the computer to implement the software, but in this case the process is reversed.

Flimm
0 replies
1h55m

No, it's not. The Llama 3 Community License Agreement is not an open source license. Open source licenses need to meet the criteria of the only widely accepted definition of "open source", and that's the one formulated by the OSI [0]. This license has multiple restrictions on use and distribution which make it not open source. I know Facebook keeps calling this stuff open source, maybe in order to get all the goodwill that open source branding gets you, but that doesn't make it true. It's like a company calling their candy vegan while listing one of its ingredients as pork-based gelatin. No matter how many times the company advertises that their product is vegan, it's not, because it doesn't meet the definition of vegan.

[0] - https://opensource.org/osd

bilsbie
8 replies
2h48m

Can’t you do fine tuning on those binaries? That’s a modification.

the8thbit
7 replies
2h29m

You can fine tune the models, and you can modify binaries. However, there is no human readable "source" to open in either case. The act of "fine tuning" is essentially brute forcing the system to gradually alter the weights such that loss is reduced against a new training set. This limits what you can actually do with the model vs an actual open source system where you can understand how the system is working and modify specific functionality.

Additionally, models can be (and are) fine tuned via APIs, so if that is the threshold required for a system to be "open source", then that would also make the GPT4 family and other such API only models which allow finetuning open source.
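
(To make the distinction concrete, here is a minimal sketch of that weight-adjustment loop with a toy stand-in model; nothing in it involves reading or editing human-legible source for the behaviour being changed.)

    import torch

    model = torch.nn.Linear(16, 4)             # toy stand-in for a pretrained network
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)

    # Synthetic "new training set"; fine tuning nudges every weight a little at a time.
    new_data = [(torch.randn(8, 16), torch.randint(0, 4, (8,))) for _ in range(10)]
    for inputs, targets in new_data:
        loss = torch.nn.functional.cross_entropy(model(inputs), targets)
        optimizer.zero_grad()
        loss.backward()                         # gradients w.r.t. all weights
        optimizer.step()                        # small, opaque adjustment to all of them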

whimsicalism
4 replies
2h13m

I don't find this argument super convincing.

There's a pretty clear difference between the 'finetuning' offered via API by GPT4 and the ability to do whatever sort of finetuning you want and get the weights at the end that you can do with open weights models.

"Brute forcing" is not the correct language to use for describing fine-tuning. It is not as if you are trying weights randomly and seeing which ones work on your dataset - you are following a gradient.

the8thbit
3 replies
1h57m

"There's a pretty clear difference between the 'finetuning' offered via API by GPT4 and the ability to do whatever sort of finetuning you want and get the weights at the end that you can do with open weights models."

Yes, the difference is that one is provided over a remote API, and the provider of the API can restrict how you interact with it, while the other is performed directly by the user. One is a SaaS solution, the other is a compiled solution, and neither are open source.

""Brute forcing" is not the correct language to use for describing fine-tuning. It is not as if you are trying weights randomly and seeing which ones work on your dataset - you are following a gradient."

Whatever you want to call it, this doesn't sound like modifying functionality in source code. When I modify source code, I might make a change, check what that does, change the same functionality again, check the new change, etc... up to maybe a couple dozen times. What I don't do is have a very simple routine make very small modifications to all of the system's functionality, then check the result of that small change across the broad spectrum of functionality, and repeat millions of times.

emporas
1 replies
1h3m

When I modify source code, I might make a change, check what that does, change the same functionality again, check the new change, etc... up to maybe a couple dozen times.

You can modify individual neurons if you are so inclined. That's what Anthropic have done with the Claude family of models [1]. You cannot do that using any closed model. So "Open Weights" looks very much like "Open Source".

Techniques for introspection of weights are very primitive, but I do think new techniques will be developed, or even new architectures which will make it much easier.

[1] https://www.anthropic.com/news/mapping-mind-language-model

the8thbit
0 replies
34m

"You can modify individual neurons if you are so inclined."

You can also modify a binary, but that doesn't mean that binaries are open source.

"That's what Anthropic have done with the Claude family of models [1]. ... Techniques for introspection of weights are very primitive, but i do think new techniques will be developed"

Yeah, I don't think what we have now is robust enough interpretability to be capable of generating something comparable to "source code", but I would like to see us get there at some point. It might sound crazy, but a few years ago the degree of interpretability we have today (thanks in no small part to Anthropic's work) would have sounded crazy.

I think getting to open sourcable models is probably pretty important for producing models that actually do what we want them to do, and as these models become more powerful and integrated into our lives and production processes, the inability to make them do what we actually want them to do may become increasingly dangerous. Muddling the meaning of open source today to market your product, then, can have troubling downstream effects, as focus in the open source community may be taken away from interpretability and put toward distributing and tuning public weights.

Kubuxu
0 replies
1h13m

The gap between fine-tuning API and weights-available is much more significant than you give it credit for.

You can take the weights and train LoRAs (which is close to fine-tuning), but you can also build custom adapters on top (classification heads). You can mix models from different fine-tunes or perform model surgery (adding additional layers, attention heads, MoE).

You can perform model decomposition and amplify some of its characteristics. You can also train multi-modal adapters for the model. Prompt tuning requires weights as well.

I would even say that having the model is more potent in the hands of individual users than having the dataset.
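
(A minimal sketch of one of those options, a classification head on top of frozen weights, with a toy module standing in for a released model; this is exactly the kind of thing a fine-tuning API doesn't let you do.)

    import torch

    base = torch.nn.Sequential(torch.nn.Embedding(1000, 64), torch.nn.Flatten(1))  # stand-in "open weights"
    for p in base.parameters():
        p.requires_grad = False               # keep the released weights frozen

    head = torch.nn.Linear(64 * 8, 3)         # new task-specific classification head
    tokens = torch.randint(0, 1000, (4, 8))   # fake batch of 8-token sequences
    logits = head(base(tokens))               # only `head` gets gradients when trained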

bilsbie
1 replies
2h1m

You make a good point but those are also just limitations of the technology (or at least our current understanding of it)

Maybe an analogy would help. A family spent generations breeding the perfect apple tree and they decided to “open source” it. What would open sourcing look like?

the8thbit
0 replies
1h10m

"You make a good point but those are also just limitations of the technology (or at least our current understanding of it)"

Yeah, that is my point. Things that don't have source code can't be open source.

"Maybe an analogy would help. A family spent generations breeding the perfect apple tree and they decided to “open source” it. What would open sourcing look like?"

I think we need to be wary of dilemmas without solutions here. For example, let's think about another analogy: I was in a car accident last week. How can I open source my car accident?

I don't think all, or even most things, are actually "open sourcable". ML models could be open sourced, but it would require a lot of work to interpret the models and generate the source code from them.

orthoxerox
5 replies
2h48m

Open training dataset + open steps sufficient to train exactly the same model.

the8thbit
3 replies
2h37m

This isn't what Meta releases with their models, though I would like to see more public training data. However, I still don't think that would qualify as "open source". Something isn't open source just because it's reproducible out of composable parts. If one very critical and system-defining part is a binary (or similar) without publicly available source code, then I don't think it can be said to be "open source". That would be like saying that Windows 11 is open source because Windows Calculator is open source, and it's a component of Windows.

orthoxerox
1 replies
2h20m

That's what I meant by "open steps", I guess I wasn't clear enough.

the8thbit
0 replies
2h15m

Is that what you meant? I don't think releasing the sequence of steps required to produce the model satisfies "open source", which is how I interpreted you, because there is still no source code for the model.

Yizahi
0 replies
2h8m

They can't release the training dataset if it was illegally scraped from all over the web without permission :) (taps head)

stale2002
3 replies
2h26m

Ok call it Open Weights then if the dictionary definitions matter so much to you.

The actual point that matters is that these models are available for most people to use for a lot of stuff, and this is way way better than what competitors like OpenAI offer.

the8thbit
2 replies
2h5m

They don't "[allow] developers to modify its code however they want", which is a critical component of "open source", and one that Meta is clearly trying to leverage in branding around its products. I would like them to start calling these "public weight models", because what they're doing now is muddying the waters so much that "open source" now just means providing an enormous binary and an open source harness to run it in, rather than serving access to the same binary via an API.

Voloskaya
1 replies
1h8m

Feels a bit like you are splitting hairs for the pleasure of semantic arguments, to be honest. Yes, there is no source in ML, so if we want to be pedantic it shouldn't be called open source. But what really matters in the open source movement is that we are able to take a program built by someone and modify it to do whatever we want with it, without having to ask someone for permission or get scrutinized or have to pay someone.

The same applies here, you can take those models and modify them to do whatever you want (provided you know how to train ML models), without having to ask for permission, get scrutinized or pay someone.

I personally think using the term open source is fine, as it conveys the intent correctly, even if, yes, weights are not sources you can read with your eyes.

wrs
0 replies
17m

Calling that “open source” renders the word “source” meaningless. By your definition, I can release a binary executable freely and call it “open source” because you can modify it to do whatever you want.

Model weights are like a binary that nobody has the source for. We need another term.

jsheard
3 replies
2h47m

I also think that something like Chromium is a better analogy for corporate open source models than a grassroots project like Linux is. Chromium is technically open source, but Google has absolute control over the direction of its development, and realistically it's far too complex to maintain a fork without Google's resources, just like Meta has complete control over what goes into their open models. And even if they did release all the training data and code (which they don't), us mere plebs could never afford to train a fork from scratch anyway.

skybrian
2 replies
2h34m

I think you’re right from the perspective of an individual developer. You and I are not about to fork Chromium any time soon. If you presume that forking is impractical then sure, the right to fork isn’t worth much.

But just because a single developer couldn’t do it doesn’t mean it couldn’t be done. It means nobody has organized a large enough effort yet.

For something like a browser, which is critical for security, you need both the organization and the trust. Despite frequent criticism, Mozilla (for example) is still considered pretty trustworthy in a way that an unknown developer can’t be.

Yizahi
1 replies
2h9m

If Microsoft can't do it, then we can reasonably conclude that it can't be done for any practical purpose. Discussing infinitesimal possibilities is better left to philosophers.

skybrian
0 replies
1h2m

Doesn’t Microsoft maintain its own fork of Chromimum?

input_sh
0 replies
1h53m

Open Source Initiative (kind of a de-facto authority on what's open source and what not) is spending a whole lot of time figuring out what it means for an AI system to be open source. In other words, they're basically trying to come up with a new license because the existing ones can't easily apply.

I believe this is the current draft: https://opensource.org/deepdive/drafts/the-open-source-ai-de...

causal
0 replies
2h43m

"Open weights" is a more appropriate term but I'll point out that these weights are also largely inscrutable to the people with the code that trained it. And for licensing reasons, the datasets may not be possible to share.

There is still a lot of modifying you can do with a set of weights, and they make great foundations for new stuff, but yeah we may never see a competitive model that's 100% buildable at home.

Edit: mkolodny points out that the model code is shared (under llama license at least), which is really all you need to run training https://github.com/meta-llama/llama3/blob/main/llama/model.p...

Zambyte
0 replies
1h29m

If so, then how can current ML models be open source?

The source of a language model is the text it was trained on. Llama models are not open source (contrary to their claims), they are open weight.

diggan
21 replies
2h20m

Today we’re taking the next steps towards open source AI becoming the industry standard. We’re releasing Llama 3.1 405B, the first frontier-level open source AI model,

Why do people keep mislabeling this as Open Source? The whole point of calling something Open Source is that the "magic sauce" of how to build something is publicly available, so I could build it myself if I have the means. But without the training data publicly available, could I train Llama 3.1 if I had the means? No wonder Zuckerberg doesn't start with defining what Open Source actually means, as then the blog post would have lost all meaning from the get-go.

Just call it "Open Model" or something. As it stands right now, the meaning of Open Source is being diluted by all these companies pretending to doing one thing, while actually doing something else.

I initially got very excited seeing the title and the domain, but was hopelessly sad after reading through the article and realizing they're still trying to pass their artifacts off as Open Source projects.

valine
7 replies
2h8m

The codebase to do the training is way less valuable than the weights for the vast majority of people. Releasing the training code would be nice, but it doesn't really help anyone but Meta's direct competitors.

If you want to train on top of Llama there's absolutely nothing stopping you. Plenty of open source tools to do parameter optimization.

diggan
6 replies
2h3m

Not just the training code but the training data as well should be under a permissive license; otherwise you cannot call the project itself Open Source, which Facebook does here.

is way less valuable than the weights for the vast majority of people

The same is true for most Open Source projects: most people use the distributed binaries or other artifacts from the projects, and couldn't care less about the code itself. But that doesn't warrant us changing the meaning of Open Source just because companies feel like it's free PR.

If you want to train on top of Llama there's absolutely nothing stopping you.

Sure, but in order for the intent of Open Source to be true for Llama, I should be able to build this project from scratch. Say I have a farm of 100 A100's, could I reproduce the Llama model from scratch today?

talldayo
2 replies
1h52m

I will steelman the idea that a tokenizer and weights are all you need for the "source" of an LLM. They are components that can be modified, redistributed and when put together, reproduce the full experience intended.

If we insist upon the release of training data with Open models, you might as well kiss the idea of usable Open LLMs goodbye. Most of the content in training datasets like The Pile are not licensed for redistribution in any way shape or form. It would jeopardize projects that do use transparent training data while not offering anything of value to the community compared to the training code. Republishing all training data is an absolute trap.

enriquto
1 replies
1h46m

Most of the content in training datasets like The Pile are not licensed for redistribution in any way shape or form.

But distributing the weights is a "form" of distribution. You can recover many items of the dataset (most easily, the outliers) by using the weights.

Just because they are codified in a non-readily accessible way, does not mean that you are not distributing them.

It's scary to think that "training" is becoming a thinly veiled way to strip copyright of works.

talldayo
0 replies
1h43m

The weights are a transformed, lossy and non-complete permutation of the training material. You cannot recover most of the dataset reliably, which is what stops it from being an outright replacement for the work it's trained on.

does not mean that you are not distributing them.

Except you literally aren't distributing them. It's like accusing me of pirating a movie because I sent a screenshot or a scene description to my friend.

It's scary to think that "training" is becoming a thinly veiled way to strip copyright of works.

This is the way it's been for years. Google was granted Fair Use for redistributing incomplete parts of copyrighted text materials verbatim, since their application is transformative: https://en.wikipedia.org/wiki/Authors_Guild,_Inc._v._Google,....

Or Corellium, who won their case to use copyrighted Apple code in novel and transformative ways: https://www.forbes.com/sites/thomasbrewster/2023/12/14/apple...

Copyright has always been a limited power.

unshavedyak
1 replies
1h55m

Not just the training code but the training data as well, should be under a permissive license, otherwise you cannot call the project itself Open Source, which Facebook does here.

Does FB even have the capability to do that? I'd assume there's a bunch of data that's not theirs and they can't even release it. Let alone some data that they might not want to admit is in the source.

bornfreddy
0 replies
1h46m

If not, it is questionable whether they should train on such data anyway.

Also, that doesn't matter in this discussion - if you are unable to release the source under an appropriate licence (for whatever reason), you should not call it Open Source.

jncfhnb
0 replies
1h46m

People don’t typically modify distributed binaries.

People do typically modify model weights. They are the preferred form in which to modify a model.

Saying “build” llama is just a nonsense comparison to traditional compiled software. “Building llama” is more akin to taking the raw weights as text and putting them into a nice pickle file. Or loading it into an inference engine.

Demanding that you have everything needed to recreate the weights from scratch is like arguing an application cannot be open source unless it also includes the user testing history and design documents.

And of course some idiots don’t understand what a pickled weights file is and claim it’s as useless as a distributed binary if you want to modify the program just because it is technically compiled; not understanding that the point of the pickled file is “convenience” and that it unpacks back to the original form. Like arguing open source software can’t be distributed in zip files.
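To make the "convenience" point concrete, a tiny sketch (file and tensor names are hypothetical) of how a pickled weights file unpacks straight back into named tensors you can inspect and edit:

    # Hypothetical shard name; the point is that a weights file is just named tensors.
    import torch

    state = torch.load("llama_shard.pth", map_location="cpu")
    print(list(state.keys())[:3])        # e.g. embedding / attention weight names
    state["lm_head.weight"] *= 0.5       # tensors are directly editable
    torch.save(state, "llama_shard_modified.pth")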

Say I have a farm of 100 A100's, could I reproduce the Llama model from scratch today?

Say you have a piece of paper. Can you reproduce `print(“hello world”)` from scratch?

jdminhbg
6 replies
1h53m

Why do people keep mislabeling this as Open Source? The whole point of calling something Open Source is that the "magic sauce" of how to build something is publicly available, so I could build it myself if I had the means. But without the training data publicly available, could I train Llama 3.1 if I had the means?

I don't think not releasing the commit history of a project makes it not Open Source, and this seems like that to me. What's important is that you can download it, run it, modify it, and re-release it. Being able to see how the sausage was made would be interesting, but I don't think Meta has to show their training data any more than they are obligated to release their planning meeting notes for React development.

Edit: I think the restrictions in the license itself are good cause for saying it shouldn't be called Open Source, fwiw.

thenoblesunfish
1 replies
1h49m

You don't need the commit history to see "how it works". ML that works well does so in large part because of the training data used. The leading models today aren't distinguished by the way they're trained, but by what they're trained on.

jdminhbg
0 replies
1h46m

I agree that you need training data to build AI from scratch, much like you need lots of really smart developers and a mailing list and servers and stuff to build the Linux kernel from scratch. But it's not like having the training data and training code will get you the same result, in the way something like open data in science is about replicating results.

tempfile
1 replies
1h43m

For the freedom to change to be effective, a user must be given the software in a form they can modify. Can you tweak an LLM once it's built? (I genuinely don't know the answer)

diggan
1 replies
1h41m

I don't think not releasing the commit history of a project makes it not Open Source,

Right, I'm not talking about the commit history, but rather that anyone (with means) should be able to produce the final artifact themselves, if they want. For weights like this, that requires at least the training script + the training data. Without that, it's very misleading to call the project Open Source, when only the result of the training is released.

What's important is you can download it, run it, modify it, and re-release it

But I literally cannot download the project, build it and run it myself? I can only use the binaries (weights) provided by Meta. No one can modify how the artifact is produced, only modify the already produced artifact.

That's like saying that Slack is Open Source because if I want to, I could patch the binary with a hex editor and add/remove things as I see fit? No one believes Slack should be called Open Source for that.

jdminhbg
0 replies
1h32m

Right, I'm not talking about the commit history, but rather that anyone (with means) should be able to produce the final artifact themselves, if they want. For weights like this, that requires at least the training script + the training data.

You cannot produce the final artifact with the training script + data. Meta also cannot reproduce the current weights with the training script + data. You could produce some other set of weights that are just about as good, but it's not a deterministic process like compiling source code.

That's like saying that Slack is Open Source because if I want to, I could patch the binary with a hex editor and add/remove things as I see fit? No one believes Slack should be called Open Source for that.

This analogy doesn't work because it's not like Meta can "patch" Llama any more than you can. They can only finetune it like everyone else, or produce an entirely different LLM by training from scratch like everyone else.

The right to release your changes is another difference; if you patch Slack with a hex editor to do some useful thing, you're not allowed to release that changed Slack to others.

If Slack lost their source code, went out of business, and released a decompiled version of the built product into the public domain, that would in some sense be "open source," even if not as good as something like Linux. LLMs though do not have a source code-like representation that is easily and deterministically modifiable like that, no matter who the owner is or what the license is.

vngzs
0 replies
2h2m

Agreed. The Linux kernel source contains everything you need to produce Linux kernel binaries. The llama source does not contain what you need to produce llama models. Facebook is using sleight of hand to garner favor with open model weights.

Open model weights are still commendable, but it's a far cry from open-source (or even libre) software!

unraveller
0 replies
1h52m

Open-weights is not open-source, for sure, but I don't mind it being stated as an aspirational goal; the moment it is legally possible to publish the source without shooting themselves in the foot, they should do it.

They could release 50% of their best data but that would only stop them from attracting the best talent.

hbn
0 replies
1h56m

Is that even something they keep on hand? Or would WANT to keep on hand? I figured they're basically sending a crawler to go nuts reading things and discarding the data once they've trained on it.

If that included, e.g. reading all of Github for code, I wouldn't expect them to host an entire separate read-only copy of Github because they trained on it and say "this is part of our open source model"

elromulous
0 replies
1h59m

100%. With this licensing model, Meta gets to reap the benefits of open source (people contributing, social cachet) without any of the real detriment (exposing the secret sauce).

blcknight
0 replies
1h47m

InstructLab and the Granite Models from IBM seem the closest to being open source. Certainly more than whatever FB is doing here.

(Disclaimer: I work for an IBM subsidiary but not on any of these products)

JeremyNT
0 replies
1h52m

Why do people keep mislabeling this as Open Source?

I guess this is a rhetorical question, but this is a press release from Meta itself. It's just a marketing ploy, of course.

cs702
20 replies
2h30m

> We’re releasing Llama 3.1 405B, the first frontier-level open source AI model, as well as new and improved Llama 3.1 70B and 8B models.

Bravo! While I don't agree with Zuck's views and actions on many fronts, on this occasion I think he and the AI folks at Meta deserve our praise and gratitude. With this release, they have brought the cost of pretraining a frontier 400B+ parameter model to ZERO for pretty much everyone -- well, everyone except Meta's key competitors.[a] THANK YOU ZUCK.

Meanwhile, the business-minded people at Meta surely won't mind if the release of these frontier models to the public happens to completely mess up the AI plans of competitors like OpenAI/Microsoft, Google, Anthropic, etc. Come to think of it, the negative impact on such competitors was likely a key motivation for releasing the new models.

---

[a] The license is not open to the handful of companies worldwide which have more than 700M users.

throwaway_2494
4 replies
1h25m

We’re releasing Llama 3.1 405B

Is it possible to run this with ollama?

jessechin
2 replies
1h9m

Sure, if you have an H100 cluster. If you quantize it to int4 you might get away with using only 4 H100 GPUs!

sheepscreek
1 replies
28m

Assuming $25k a pop, that's at least $100k in GPUs alone. Throw in their linking technology (NVLink) and the cost of the remaining parts, and I wouldn't be surprised if you're looking at $150k for such a cluster. Which is not bad to be honest, for something at this scale.

Can anyone share the cost of their pre-built clusters, they’ve recently started selling? (sorry feeling lazy to research atm, I might do that later when I have more time).

rty32
0 replies
9m

You can rent H100 GPUs.

vorticalbox
0 replies
1h8m

If you have the RAM for it.

Ollama will offload as many layers as it can to the GPU, then the rest will run on the CPU/RAM.

tambourine_man
3 replies
1h11m

Praising is good. Gratitude is a bit much. They got this big by selling user generated content and private info to the highest bidder. Often through questionable means.

Also, the underdog always touts Open Source and standards, so it’s good to remain skeptical when/if tables turn.

sheepscreek
1 replies
31m

All said and done, it is a very expensive and blasé way to undercut competitors. They've spent > $5B on hardware alone, much of which will depreciate in value quickly.

Pretty sure the only reason Meta’s managed to do this is because of Zuck’s iron grip on the board (majority voting rights). This is great for Open Source and regular people though!

wrsh07
0 replies
12m

Zuck made a bet when they provisioned for Reels: buy enough GPUs to be able to spin up another Reels-sized service.

Llama is probably just running on spare capacity (I mean, sure, they've kept increasing capex, but if they're worried about an LLM-based FB competitor they sort of have to in order to enact their copycat strategy)

ricardo81
0 replies
3m

selling user generated content and private info to the highest bidder

Was always their modus operandi, surely. How else would they have survived?

Thanks for returning everyone else's content, and never mind all the content stealing your platform did.

pwdisswordfishd
3 replies
29m

Makes me wonder why he's really doing this. Zuckerberg being Zuckerberg, it can't be out of any genuine sense of altruism. Probably just wants to crush all competitors before he monetizes the next generation of Meta AI.

spiralk
0 replies
17m

It's certainly not altruism. Given that Facebook/Meta owns the largest user data collection systems, any advancement in AI ultimately strengthens their business model (which is still mostly collecting private user data, amassing large user datasets, and selling targeted ads).

There is a demo video that shows a user wearing a Quest VR headset asking the AI "what do you see", and it interprets everything around it. Then, "what goes well with these shorts"... You can see where this is going. Wearing headsets with AIs monitoring everything the users see and collecting even more data is becoming normalized. Imagine the private data harvesting capabilities of the internet, but anywhere in the physical world. People need not even choose to wear a Meta headset; simply passing a user with a Meta headset in public will be enough to have private data collected. This will be the inevitable result of vision model improvements integrated into mobile VR/AR headsets.

phyrex
0 replies
14m

You can always listen to the investor calls for the capitalist point of view. In short, attracting talent, building the ecosystem, and making it really easy for users to make stuff they want to share on Meta's social networks

bun_at_work
0 replies
13m

I really think the value of this for Meta is content generation. More open models (especially state of the art) means more content is being generated, and more content is being shared on Meta platforms, so there is more advertising revenue for Meta.

tintor
1 replies
35m

they have brought the cost of pretraining a frontier 400B+ parameter model to ZERO

It is still far from zero.

cs702
0 replies
9m

If the model is already pretrained, there's no need to pretrain it, so the cost of pretraining is zero.

troupo
0 replies
29m

There's nothing open source about it.

It's a proprietary dump of data you can't replicate or verify.

What were the sources? What datasets was it trained on? What are the training parameters? And so on and so on

swyx
0 replies
1h57m

the AI folks at Meta deserve our praise and gratitude

We interviewed Thomas who led Llama 2 and 3 post training here in case you want to hear from someone closer to the ground on the models https://www.latent.space/p/llama-3

sandworm101
0 replies
4m

> Bravo! While I don't agree with Zuck's views and actions on many fronts, on this occasion I think he and the AI folks at Meta deserve our praise and gratitude.

Nope. Not one bit. Supporting F/OSS when it suits you in one area and then being totally dismissive of it in every other area should not be lauded. How about open sourcing some of FB's VR efforts?

germinalphrase
0 replies
41m

"Come to think of it, the negative impact on such competitors was likely a key motivation for releasing the new models."

"Commoditize Your Complement" is often cited here: https://gwern.net/complement

advael
0 replies
17m

Look, absolutely zero people in the world should trust any tech company when they say they care about or will keep commitments to the open-source ecosystem in any capacity. Nevertheless, it is occasionally strategic for them to do so, and there can be ancillary benefits for said ecosystem in those moments where this is the best play for them to harm their competitors

For now, Meta seems to release Llama models in ways that don't significantly lock people into their infrastructure. If that ever stops being the case, you should fork rather than trust their judgment. I say this knowing full well that most of the internet is on AWS or GCP, most brick and mortar businesses use Windows, and carrying a proprietary smartphone is essentially required to participate in many aspects of the modern economy. All of this is a mistake. You can't resist all lock-in. The players involved effectively run the world. You should still try where you can, and we should still be happy when tech companies either slip up or make the momentary strategic decision to make this easier

rybosworld
10 replies
2h36m

Huge companies like facebook will often argue for solutions that on the surface, seem to be in the public interest.

But I have strong doubts they (or any other company) actually believe what they are saying.

Here is the reality:

- Facebook is spending untold billions on GPU hardware.

- Facebook is arguing in favor of open sourcing the models, that they spent billions of dollars to generate, for free...?

It follows that companies with much smaller resources (money) will not be able to match what Facebook is doing. Seems like an attempt to kill off the competition (specifically, smaller organizations) before they can take root.

mattnewton
5 replies
2h31m

I actually think this is one of the rare times where the small guys interests are aligned with Meta. Meta is scared of a world where they are locked out of LLM platforms, one where OpenAI gets to dictate rules around their use of the platform much like Apple and Google dictates rules around advertiser data and monetization on their mobile platforms. Small developers should be scared of a world where the only competitive LLMs are owned by those players too.

Through this lens, Meta’s actions make more sense to me. Why invest billions in VR/AR? The answer is simple: don’t get locked out of the next platform; maybe you can own the next one. Why invest in LLMs? Again, don’t get locked out. Google and OpenAI/Microsoft are far larger and ahead of Meta right now, and Meta genuinely believes the best way to make sure they have an LLM they control is to make everyone else have an LLM they can control. That way community efforts are unified around their standard.

mupuff1234
2 replies
2h22m

Sure, but don't you think the "not getting locked out" is just the pre-requisite for their eventual goal of locking everyone else out?

yesco
1 replies
1h15m

Does it really matter? Attributing goodwill to a company is like attributing goodwill to a spider that happens to clean up the bugs in your basement. Sure if they had the ability to, I'm confident Meta would try something like that, but they obviously don't, and will not for the foreseeable future.

I have faith they will continue to do what's in their best interests and if their best interests happen to align with mine, then I will support that. Just like how I don't bother killing the spider in my basement because it helps clean up the other bugs.

mupuff1234
0 replies
22m

But you also know that the spider has been laying eggs - So you better have an extermination plan ready.

myaccountonhn
1 replies
1h48m

I actually think this is one of the rare times where the small guys interests are aligned with Meta

Small guys are the ones being screwed over by AI companies and having their text/art/code stolen without any attribution or adherence to license. I don’t think Meta is on their side at all

MisterPea
0 replies
1h29m

That's a separate problem which affects small and large players alike (e.g. ScarJo).

Small companies' interests are aligned with Meta's, as they are now on an equal footing with large incumbent players. They can now compete with a similarly sized team at a big tech company instead of that team + dozens of AI scientists

ketzo
2 replies
2h11m

Meta is, fundamentally, a user-generated-content distribution company.

Meta wants to make sure they commoditize their complements: they don’t want a world where OpenAI captures all the value of content generation, they want the cost of producing the best content to be as close to free as possible.

chasd00
0 replies
1h52m

I was thinking along the same lines. A lot of content generated by LLMs is going to end up on Facebook or Instagram. The easier it is to create AI-generated content, the more content ends up on those applications.

Nesco
0 replies
1h27m

Especially because genAI is a copyright laundering system. You can train it on copyrighted material and none of the content generated with it is copyrightable, which is perfect for social apps

Salgat
0 replies
2h32m

The reason for Meta making their model open source is rather simple: They receive an unimaginable amount of free labor, and their license only excludes their major competitors to ensure mass adoption without benefiting their competition (Microsoft, Google, Alibaba, etc). Public interest, philanthropy, etc are just nice little marketing bonuses as far as they're concerned (otherwise they wouldn't be including this licensing restriction).

mvkel
8 replies
2h40m

It's a real shame that we're still calling Llama "open source" when at best it's "open weights."

Not that anyone would go buy 100,000 H100s to train their own Llama, but words matter. Definitions matter.

sidcool
3 replies
2h31m

Honest question. As far as LLMs are concerned, isn't open weights the same as open source?

paulhilbert
0 replies
2h19m

No, I would argue that of the three main ingredients - training data, model source code and weights - the weights are the furthest away from something akin to source code.

They're more like obfuscated binaries. When it comes to fine-tuning only, however, things shift a little bit, yes.

mesebrec
0 replies
2h26m

Open source requires, at the very least, that you can use it for any purpose. This is not the case with Llama.

The Llama license has a lot of restrictions, based on user base size, type of use, etc.

For example you're not allowed to use Llama to train or improve other models.

But it goes much further than that. The government of India can't use Llama because they're too large. Sex workers are not allowed to use Llama due to the acceptable use policy of the license. Then there is also the vague language prohibiting discrimination, racism, etc. - good luck getting something like that approved by your legal team.

aloe_falsa
0 replies
2h20m

GPL defines the “source code” of a work as the preferred form of the work for making modifications to it. If Meta released a petabyte of raw training data, would that really be easier to extend and adapt (as opposed to fine-tuning the weights)?

lolinder
3 replies
2h23m

Source versus weights seems like a really pedantic distinction to make. As you say, the training code and training data would be worthless to anyone who doesn't have compute on the level that Meta does. Arguably, the weights are source code interpreted by an inference engine, and realistically it's the weights that someone is going to want to modify through fine-tuning, not the original training code and data.

The far more important distinction is "open" versus "not open", and I disagree that we should cede that distinction while trying to fight for "source". The Llama license is restrictive in a number of ways (it incorporates an entire acceptable use policy) that make it most definitely not "open" in the customary sense.

lolinder
0 replies
16m

It's fine in that I'm happy to use it and don't think I'll be breaking the terms anytime soon. It's not fine in that one of the primary things that makes open source open is that an open source license doesn't restrict groups of people or whole fields from usage of the software. The policy has a number of such blanket bans on industries, which, while reasonable, make the license not truly open.

mvkel
0 replies
1h48m

training code and training data would be worthless to anyone who doesn't have compute on the level that Meta does

I don't fully agree.

Isn't that like saying *nix being open source is worthless unless you're planning to ship your own Linux distro?

Knowing how the sausage is made is important if you're an animal rights activist.

tw04
6 replies
2h50m

In the early days of high-performance computing, the major tech companies of the day each invested heavily in developing their own closed source versions of Unix.

Because they sold the resultant code and systems built on it for money... this is the gold miner saying that all shovels and jeans should be free.

Am I happy Facebook open sources some of their code? Sure, I think it's good for everyone. Do I think they're talking out of both sides of their mouth? Absolutely.

Let me know when Facebook opens up the entirety of their Ad and Tracking platforms and we can start talking about how it's silly for companies to keep software closed.

I can say with 100% confidence if Facebook were selling their AI advances instead of selling the output it produces, they wouldn't be advocating for everyone else to open source their stacks.

JumpCrisscross
3 replies
2h49m

if Facebook were selling their AI advances instead of selling the output it produces, they wouldn't be advocating for everyone else to open source their stack

You're acting as if commoditizing one's complements is either new or reprehensible [1].

[1] https://gwern.net/complement

tw04
2 replies
2h34m

You're acting as if commoditizing one's complements is either new or reprehensible [1].

I'm saying that calling on other companies to open source their core product, just because it's a complement for you, while acting as if it's for the benefit of mankind, is disingenuous, which it is.

stale2002
0 replies
2h19m

as if it's for the benefit of mankind

But it does benefit mankind.

More free tech products is good for the world.

This is a good thing. When people or companies do good things, they should get the credit for doing good things.

JumpCrisscross
0 replies
2h19m

acting as if it's for the benefit of mankind is disingenuous, which it is

Is it bad for mankind that Meta publishes its weights? Mutually beneficial is a valid game state--there is no moral law that requires anything good be made as a sacrifice.

rvnx
1 replies
2h45m

The source-code to Ad tracking platform is useless to users.

In the end, it's actually Facebook doing the right thing (though they are known for being evil).

It's a bit of an irony.

The supposedly "good" and "open" people like Google or OpenAI, haven't given their model weights.

A bit like Microsoft became the company that actually supports the whole open-source ecosystem with GitHub.

tw04
0 replies
2h32m

The source-code to Ad tracking platform is useless to users.

It's absolutely not useless for developers looking to build a competing project.

The supposedly "good" and "open" people like Google or OpenAI, haven't given their model weights.

Because they're monetizing it... the only reason Facebook is giving it away is because it's a complement to their core product of selling ads. If they were monetizing it, it would be closed source. Just like their Ads platform...

enriquto
6 replies
2h5m

It's alarming that he refers to llama as if it was open source.

The definition of free software (and open source, for that matter) is well-established. The same definition applies to all programs, whether they are "AI" or not. In any case, if a program was built by training against a dataset, the whole dataset is part of the source code.

Llama is distributed in binary form, and it was built based on a secret dataset. Referring to it as "open source" is not ignorance, it's malice.

jdminhbg
2 replies
1h50m

In any case, if a program was built by training against a dataset, the whole dataset is part of the source code.

I'm not sure why I keep seeing this. What is the equivalent of the training data for something like the Linux kernel?

enriquto
1 replies
1h42m

What is the equivalent of the training data for something like the Linux kernel?

It's the source code.

For the linux kernel:

          compile(sourcecode) = binary
For llama:

          train(data) = weights

jdminhbg
0 replies
1h40m

That analogy doesn't work. `train` is not a deterministic process. Meta has all of the training data and all of the supporting source code and they still won't get the same `weights` if they re-run the process.

The weights are the result of the development process, like the source code of a program is the result of a development process.

Nesco
2 replies
2h0m

The training data contains most likely insane amounts of copyrighted material. That’s why virtually none of the “open models” come with their training data

enriquto
1 replies
1h51m

The training data contains most likely insane amounts of copyrighted material.

If that is the case then the weights must inherit all these copyrights. It has been shown (at least in image processing) that you can extract many training images from the weights, almost verbatim. Hiding the training data does not solve this issue.

But regardless of copyright issues, people here are complaining about the malicious use of the term "open source", to signify a completely different thing (more like "open api").

tempfile
0 replies
1h38m

If that is the case then the weights must inherit all these copyrights.

Not if it's a fair use (which is obviously the defence they're hoping for)

mesebrec
5 replies
2h51m

Note that Meta's models are not open source in any interpretation of the term.

* You can't use them for any purpose. For example, the license prohibits using these models to train other models.

* You can't meaningfully modify them given there is almost no information available about the training data, how they were trained, or how the training data was processed.

As such, the model itself is not available under an open source license and the AI does not comply with the "open source AI" definition by OSI.

It's an utter disgrace for Meta to write such a blogpost patting themselves on the back while lying about how open these models are.

ChadNauseam
2 replies
2h48m

you can't meaningfully modify them given there is almost no information available about the training data, how they were trained, or how the training data was processed.

I was under the impression that you could still fine-tune the models or apply your own RLHF on top of them. My understanding is that the training data would mostly be useful for training the model yourself from scratch (possibly after modifying the training data), which would be extremely expensive and out of reach for most people

mesebrec
0 replies
2h36m

Indeed, fine-tuning is still possible, but you can only go so far with fine-tuning before you need to completely retrain the model.

This is why Silo AI, for example, had to start from scratch to get better support for small European languages.

chasd00
0 replies
1h36m

From what I understand, the training data and the careful curation of it is the hard part. Everyone wants existing training data sets to train their own models with, instead of producing the data themselves.

causal
1 replies
2h34m

You are definitely allowed to train other models with these models, you just have to give credit in the name, per the license:

If you use the Llama Materials or any outputs or results of the Llama Materials to create, train, fine tune, or otherwise improve an AI model, which is distributed or made available, you shall also include “Llama” at the beginning of any such AI model name.
starship006
4 replies
2h25m

Our adversaries are great at espionage, stealing models that fit on a thumb drive is relatively easy, and most tech companies are far from operating in a way that would make this more difficult.

Mostly unrelated to the correctness of the article, but this feels like a bad argument. AFAIK, Anthropic/OpenAI/Google are not having issues with their weights being leaked (are they?). Why is it that Meta's model weights are?

whimsicalism
0 replies
2h7m

We have no way of knowing whether nation-state level actors have access to those weights.

skybrian
0 replies
1h51m

I think it’s hard to say. We simply don’t know much from the outside. Microsoft has had some pretty bad security lapses, for example around guarding access to Windows source code. I don’t think we’ve seen a bad security break-in at Google in quite a few years? It would surprise me if Anthropic and OpenAI had good security since they’re pretty new, and fast-growing startups have a lot of organizational challenges.

It seems safe to assume that not all the companies doing leading-edge LLM’s have good security and that the industry as a whole isn’t set up to keep secrets for long. Things aren’t locked down to the level of classified research. And it sounds like Zuckerberg doesn’t want to play the game that way.

At the state level, China has independent AI research efforts and they’re going to figure it out. It’s largely a matter of timing, which could matter a lot.

There’s still an argument to be made against making proliferation too easy. Just because states have powerful weapons doesn’t mean you want them in the hands of people on the street.

meowface
0 replies
2h21m

AFAIK, Anthropic/OpenAI/Google are not having issues with their weights being leaked. Why is it that Meta's model weights are?

The main threat actors there would be powerful nation-states, in which case they'd be unlikely to leak what they've taken.

It is a bad argument though, because one day possession of AI models (and associated resources) might confer great and dangerous power, and we can't just throw up our hands and say "welp, no point trying to protect this, might as well let everyone have it". I don't think that'll happen anytime soon, but I am personally somewhat in the AI doomer camp.

dfadsadsf
0 replies
11m

We have nationals/citizens of every major US adversary working in those companies, with looser security practices than security at a local warehouse. The security check before hiring is a joke (it mostly checks that the resume checks out), laptops can be taken home, and internal communications are not segmented on a need-to-know basis. Essentially, if China wants weights or source code, it will have hundreds of people to choose from who can provide it.

blackeyeblitzar
4 replies
2h54m

Only if it is truly open source (open data sets, transparent curation/moderation/censorship of data sets, open training source code, open evaluation suites, and an OSI approved open source license).

Open weights (and open inference code) is NOT open source, but just some weak open washing marketing.

The model that comes closest to being TRULY open is AI2’s OLMo. See their blog post on their approach:

https://blog.allenai.org/hello-olmo-a-truly-open-llm-43f7e73...

I think the only thing they’re not open about is how they’ve curated/censored their “Dolma” training data set, as I don’t think they explicitly share each decision made or the original uncensored dataset:

https://blog.allenai.org/dolma-3-trillion-tokens-open-llm-co...

By the way, OSI is working on defining open source for AI. They post weekly updates to their blog. Example:

https://opensource.org/blog/open-source-ai-definition-weekly...

JumpCrisscross
1 replies
2h51m

Only if it is truly open source (open data sets, transparent curation/moderation/censorship of data sets, open training source code, open evaluation suites, and an OSI approved open source license)

You’re missing a then to your if. What happens if it’s “truly” open per your definition versus not?

blackeyeblitzar
0 replies
2h37m

I think you are asking what the benefits are? The main benefit is that we can trust what these systems are doing better. Or we can self host them. If we just take the weights, then it is unclear how these systems might be lying to us or manipulating us.

Another benefit is that we can learn from how the training and other steps actually work. We can change them to suit our needs (although costs are impractical today). Etc. It’s all the usual open source benefits.

itissid
0 replies
1h43m

Yeah, though I do wonder, for a big model like 405B, whether the original training recipe really matters for where models are heading, which practically speaking is smaller and more specific.

I imagine its main use would be to train other models by distilling them down with LoRA/quantization etc. (assuming we have a tokenizer). Or to use them to generate training data for smaller models directly.

But, I do think there is always a way to share without disclosing too many specifics, like this[1] lecture from this year's spring course at Stanford. You can always say, for example:

- The most common technique for filtering was using voting LLMs (without disclosing said llms or quantity of data).

- We built on top of a filtering technique for removing poor code using ____ by ____ authors (without disclosing or handwaving how you exactly filtered, but saying that you had to filter).

- We mixed certain proportion of this data with that data to make it better (without saying what proportion)

[1] https://www.youtube.com/watch?v=jm2hyJLFfN8&list=PLoROMvodv4...

haolez
0 replies
2h50m

There is also the risk of companies like Meta introducing ads in the training itself, instead of at inference time.

maxdo
3 replies
2h52m

so that North Korea will create small call centers for cheaper, since they can get these models for free?

tempfile
0 replies
1h40m

This argument implies that cheap phones are bad since telemarketers can use them.

mrfinn
0 replies
51m

You guys really need to get over your bellicose POV of the world. Actually, before it destroys you. Really, it's not necessary. Most people in the world just want to live in peace and see their children grow up happily. For each data center NK would create, there will be a thousand peaceful, kind, and well-intentioned AI projects going on. Or maybe more.

HanClinto
0 replies
2h44m

The article argues that the threat of foreign espionage is not solved by closing models.

Some people argue that we must close our models to prevent China from gaining access to them, but my view is that this will not work and will only disadvantage the US and its allies. Our adversaries are great at espionage, stealing models that fit on a thumb drive is relatively easy, and most tech companies are far from operating in a way that would make this more difficult. It seems most likely that a world of only closed models results in a small number of big companies plus our geopolitical adversaries having access to leading models, while startups, universities, and small businesses miss out on opportunities.
hubraumhugo
3 replies
2h18m

The big winners of this: devs and AI startups

- No more vendor lock-in

- Instead of just wrapping proprietary API endpoints, developers can now integrate AI deeply into their products in a very cost-effective and performant way

- A price race to the bottom, with near-instant LLM responses at very low prices, is on the horizon

As a founder, it feels like a very exciting time to build a startup as your product automatically becomes better, cheaper, and more scalable with every major AI advancement. This leads to a powerful flywheel effect: https://www.kadoa.com/blog/ai-flywheel

danielmarkbruce
1 replies
24m

It creates the opposite of a flywheel effect for you. It creates a leapfrog effect.

boringg
0 replies
18m

AI might cannibalize a lot of first-gen AI businesses.

boringg
0 replies
19m

- A price race to the bottom, with near-instant LLM responses at very low prices, is on the horizon

Maybe there will be a big price war while the market majors fight it out for positioning, but they still need to make money off their investments, so someone is going to have to raise prices at some point, and you'll be locked into their system if you build on it.

popcorncowboy
2 replies
1h50m

Developers can run inference on Llama 3.1 405B on their own infra at roughly 50% the cost of using closed models like GPT-4o

Does anyone have details on exactly what this means or where/how this metric gets derived?

rohansood15
0 replies
1h44m

I am guessing these are prices on services like AWS Bedrock (their post is down right now).

PlattypusRex
0 replies
1h20m

A big chunk of that is probably the fact that you don't need to pay someone who is trying to make a profit by running inference off-premises.

kart23
2 replies
2h55m

This is how we’ve managed security on our social networks – our more robust AI systems identify and stop threats from less sophisticated actors who often use smaller scale AI systems.

Ok, first of all, has this really worked? AI moderators still can't capture the mass of obvious spam/bots on all their platforms, Threads included. Second, AI detection doesn't work, and with how much better the systems are getting, it probably never will, unless you keep the best models for yourself, and it is clear from the rest of the note that that's not Zuck's intention.

As long as everyone has access to similar generations of models – which open source promotes – then governments and institutions with more compute resources will be able to check bad actors with less compute.

This just doesn't make sense. How are you going to prevent AI spam and AI deepfakes from causing harm with more compute? What are you gonna do with more compute about nonconsensual deepfakes? People are already using AI to bypass identity verification on your social media networks and pump out loads of spam.

simonw
0 replies
1h29m

"AI detection doesn't work, and with how much better the systems are getting, it's probably never going to, unless you keep the best models for yourself"

I don't think that's true. I don't think even the best privately held models will be able to detect AI text reliably enough for that to be worthwhile.

OpenComment
0 replies
2h43m

Interesting quotes. Less sophisticated actors just means humans who already write in 2020 what the NYT wrote in early 2022 to prepare for Biden's State Of The Union 180° policy reversals (manufacturing consent).

FB was notorious for censorship. Anyway, what is with the "actions/actors" terminology? This is straightforward totalitarian language.

carimura
2 replies
2h17m

Looks like you can already try out Llama-3.1-405b on Groq, although it's timing out. So. Hugged I guess.

TechDebtDevin
1 replies
1h54m

All the big providers should have it up by end of day. They just change their API configs (they're just reselling you AWS Bedrock).

jamiedg
0 replies
1h47m

405B and the other Llama 3.1 models are working and available on Together AI. https://api.together.ai

bun_at_work
2 replies
2h13m

Meta makes their money off advertising, which means they profit from attention.

This means they need content that will grab attention, and creating open source models that allow anyone to create any content on their own becomes good for Meta. The users of the models can post it to their Instagram/FB/Threads account.

Releasing an open model also releases Meta from the burden of having to police the content the model generates, once the open source community fine-tunes the models.

Overall, this is a sound business move for Meta - the post doesn't really talk about the true benefit, instead moralizing about open source.

natural219
0 replies
1h9m

AI moderators too would be an enormous boon if they could get that right.

jklinger410
0 replies
1h56m

This is a great point. Eventually, META will only allow LLAMA generated visual AI content on its platforms. They'll put a little key in the image that clears it with the platform.

Then all other visual AI content will be banned. If that is where legislation is heading.

amusingimpala75
2 replies
2h59m

Sure, but under what license? Because slapping "open source" on the model doesn't make it open source if it's not actually licensed that way. The 3.1 license still contains their non-commercial clause (over 700M users) and requires derivatives, whether fine-tunings or models trained on generated data, to use the Llama name.

redleader55
1 replies
2h49m

"Use it for whatever you want(conditions apply), but not if you are Google, Amazon, etc. If you become big enough talk to us." That's how I read the license, but obviously I might be missing some nuance.

mesebrec
0 replies
2h31m

You also can't use it for training or improving other models.

You also can't use it if you're the government of India.

Neither can sex workers use it. (Do you know if your customers are sex workers?)

There are also very vague restrictions for things like discrimination, racism etc.

abetusk
2 replies
2h48m

Another case of "open-washing". Llama is not available open source, under the common definition of open source, as the license doesn't allow for commercial re-use by default [0].

They provide their model, with weights and code, as "source available" and it looks like they allow for commercial use unless a 700M monthly active user cap is surpassed. They also don't allow you to train other AI models with their model:

""" ... v. You will not use the Llama Materials or any output or results of the Llama Materials to improve any other large language model (excluding Meta Llama 3 or derivative works thereof). ... """

[0] https://github.com/meta-llama/llama3/blob/main/LICENSE

whimsicalism
0 replies
2h6m

I think these clauses are unenforceable. It's telling that OAI hasn't tried a similar suit despite multiple extremely well-known cases of competitors training on OAI outputs.

sillysaurusx
0 replies
2h45m

They cannot legally enforce this, because they don’t have the rights to the content they trained it on. Whoever’s willing to fund that court battle would likely win.

There’s a legal precedent that says hard work alone isn’t enough to guarantee copyright, i.e. it doesn’t matter that it took millions of dollars to train.

1024core
2 replies
2h43m

"open source AI" ... "open" ... "open" ....

And you can't even try it without an FB/IG account.

Zuck will never change.

Gracana
0 replies
2h32m

You can also wait a bit for someone to upload quantized variants, finetunes, etc, and download those. FWIW I'm not making a claim about the legality of that, just saying it's an easy way around needing to sign the agreement.

typpo
1 replies
2h7m

Thanks to Meta for their work on safety, particularly Llama Guard. Llama Guard 3 adds defamation, elections, and code interpreter abuse as detection categories.

Having run many red teams recently as I build out promptfoo's red teaming featureset [0], I've noticed the Llama models punch above their weight in terms of accuracy when it comes to safety. People hate excessive guardrails and Llama seems to thread the needle.

Very bullish on open source.

[0] https://www.promptfoo.dev/docs/red-team/

swyx
0 replies
1h55m

Is there a #2 to Llama Guard? Meta seems curiously alone in doing this kind of, let's call it, "practical safety" work

tpurves
1 replies
35m

405 is a lot of B's. What does it take to run or host that?

danielmarkbruce
0 replies
17m

quantize to 0 bit. Run on a potato.

Jokes aside: ~405B params x 2 bytes each (FP16) is roughly 810 GB, maybe 1,000 GB or so required in reality; you'd need maybe 2 AWS p5 instances?
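A back-of-the-envelope sketch of the weight memory at different precisions (ignores KV cache and activation overhead, so real requirements are higher):

    # Rough weight-memory footprint for a 405B-parameter model.
    params = 405e9
    for name, bytes_per_param in [("FP16/BF16", 2), ("FP8/INT8", 1), ("INT4", 0.5)]:
        print(f"{name}: ~{params * bytes_per_param / 1e9:,.0f} GB just for the weights")
    # FP16/BF16: ~810 GB, FP8/INT8: ~405 GB, INT4: ~203 GB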

throwaway1194
1 replies
2h11m

I strongly suspect that what AI will end up doing is push companies and organizations towards open source, they will eventually realize that code is already being shared via AI channels, so why not do it legally with open source?

talldayo
0 replies
37m

they will eventually realize that code is already being shared via AI channels

Private repos are not being reproduced by any modern AI. Their source code is safe, although AI arguably lowers the bar to compete with them.

resters
1 replies
2h12m

This is really good news. Zuck sees the inevitability of it and the dystopian regulatory landscape and decided to go all in.

This also has the important effect of neutralizing the critique of US Government AI regulation because it will democratize "frontier" models and make enforcement nearly impossible. Thank you, Zuck, this is an important and historic move.

It also opens up the market to a lot more entry in the area of "ancillary services to support the effective use of frontier models" (including safety-oriented concerns), which should really be the larger market segment.

passion__desire
0 replies
57m

Probably, Yann Lecun is the Lord Varys here. He has Mark's ear and Mark believes in Yann's vision.

nuz
1 replies
2h45m

Everyone complaining about not having data access: remember that without Meta you would have OpenAI and Anthropic and that's it. I'm really thankful they're releasing this, and the reason they can't release the data is obvious.

mesebrec
0 replies
2h39m

Without Meta, you would still have Mistral, Silo AI, and the many other companies and labs producing much more open models with similar performance.

mav3ri3k
1 replies
2h17m

I am not deep into LLMs so I ask this. From my understanding, their last model was open source in the sense that you could use it, but the inner workings were "hidden"/not transparent.

With the new model, I am seeing a lot of talk about how open source they are and how they can be built upon. Is it now completely open source, or similar to their last models?

whimsicalism
0 replies
2h8m

It's intrinsic to transformers that the inner workings are largely inscrutable. This is no different, but it does not mean they cannot be built upon.

Gradient descent works on these models just like the prior ones.

jorblumesea
1 replies
2h32m

Cynically I think this position is largely due to how they can undercut OpenAI's moat.

wayeq
0 replies
1h35m

It's not cynical, it's just an awareness that public companies have a fiduciary duty to their shareholders.

itissid
1 replies
2h1m

How are smaller models distilled from large models? I know of LoRA, quantization and similar techniques; but does distilling also mean generating new datasets entirely from the big model, for conversing with smaller models on many simpler tasks?

tintor
0 replies
1h36m

Smaller models can be trained to match the log probs of the larger model. The larger model can also be used to generate synthetic data for the smaller model.
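For the log-prob matching part, a minimal sketch (PyTorch, names assumed) of the usual distillation loss: the student is trained to match the teacher's temperature-softened token distribution:

    import torch.nn.functional as F

    def distillation_loss(student_logits, teacher_logits, T=2.0):
        # KL divergence between the softened teacher and student distributions
        student_log_probs = F.log_softmax(student_logits / T, dim=-1)
        teacher_probs = F.softmax(teacher_logits / T, dim=-1)
        return F.kl_div(student_log_probs, teacher_probs, reduction="batchmean") * T * T

    # usage sketch: loss = distillation_loss(student(batch).logits, teacher(batch).logits)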

indus
1 replies
2h3m

Is there an argument against Open Source AI?

Not the usual nation-state rhetoric, but something that justifies that closed source leads to better user-experience and fewer security and privacy issues.

An ecosystem that benefits vendors, customers, and the makers of close source?

Are there historical analogies other than Microsoft Windows or Apple iPhone / iOS?

kjkjadksj
0 replies
1h50m

Let's take the iPhone. Secured by the industry's best security teams, I'm sure. Closed source, yet teenagers in Eastern Europe have cracked into it dozens of times making jailbreaks. Every law enforcement agency can crack into it. Closed source is not a security moat, but a trade protection moat.

frabjoused
1 replies
1h17m

Who knew FB would hold OpenAI's original ideals, and OpenAI now holds early FB ideals/integrity.

boringg
0 replies
17m

FB needed to differentiate drastically. FB is at its best creating large data infra.

bufferoverflow
1 replies
1h23m

Hard disagree. So far every big important model is closed-source. Grok is sort-of the only exception, and it's not even that big compared to the (already old) GPT-4.

I don't see open source being able to compete with the cutting-edge proprietary models. There's just not enough money. GPT-5 will take an estimated $1.2 billion to train. MS and OpenAI are already talking about building a $100 billion training data center.

How can you compete with that if your plan is to give away the training result for free?

sohamgovande
0 replies
1h15m

Where is the $1.2b number from?

avivo
1 replies
54m

The FTC also recently put out a statement that is fairly pro-open source: https://www.ftc.gov/policy/advocacy-research/tech-at-ftc/202...

I think it's interesting to think about this question of open source, benefits, risk, and even competition, without all of the baggage that Meta brings.

I agree with the FTC, that the benefits of open-weight models are significant for competition. The challenge is in distinguishing between good competition and bad competition.

Some kinds of competition can harm consumers and critical public goods, including democracy itself. For example, competing for people's scarce attention or for their food buying, with increasingly optimized and addictive innovations. Or competition to build the most powerful biological weapons.

Other kinds of competition can massively accelerate valuable innovation. The FTC must navigate a tricky balance here — leaning into competition that serves consumers and the broader public, while being careful about what kind of competition it is accelerating that could cause significant risk and harm.

It's also obviously not just "big tech" that cares about the risks behind open-weight foundation models. Many people have written about these risks even before it became a subject of major tech investment. (In other words, A16Z's framing is often rather misleading.) There are many non-big tech actors who are very concerned about current and potential negative impacts of open-weight foundation models.

One approach which can provide the best of both worlds, is for cases where there are significant potential risks, to ensure that there is at least some period of time where weights are not provided openly, in order to learn a bit about the potential implications of new models.

Longer-term, there may be a line where models are too risky to share openly, and it may be unclear what that line is. In that case, it's important that we have governance systems for such decisions that are not just profit-driven, and which can help us continue to get the best of all worlds. (Plug: my organization, the AI & Democracy Foundation; https://ai-dem.org/; is working to develop such systems and hiring.)

whimsicalism
0 replies
6m

making food that people want to buy is good actually

i am not down with this concept of the chattering class deciding what are good markets and what are bad, unless it is due to broad-based and obvious moral judgements.

amelius
1 replies
2h11m

One of my [Mark Zuckerberg, ed.] formative experiences has been building our services constrained by what Apple will let us build on their platforms. Between the way they tax developers, the arbitrary rules they apply, and all the product innovations they block from shipping, it’s clear that Meta and many other companies would be freed up to build much better services for people if we could build the best versions of our products and competitors were not able to constrain what we could build.

This is hard to disagree with.

glhaynes
0 replies
1h11m

I think it's very easy to disagree with!

If Zuckerberg had his way, mobile device OSes would let Meta ingest microphone and GPS data 24/7 (just like much of the general public already thinks they do because of the effectiveness of the other sorts of tracking they are able to do).

There are certainly legit innovations that haven't shipped because gatekeepers don't allow them. But there've been lots of harmful "innovations" blocked, too.

Oras
1 replies
2h51m

This is obviously good news, but __personally__ I feel the open-source models are just trying to catch up with whoever the market leader is, based on some benchmarks.

The actual problem is running these models. Very few companies can afford the hardware to run these models privately. If you run them in the cloud, then I don't see any potential financial gain for any company to fine-tune these huge models just to catch up with OpenAI or Anthropic, when you can probably get a much better deal by fine-tuning the closed-source models.

Also this point:

We need to protect our data. Many organizations handle sensitive data that they need to secure and can’t send to closed models over cloud APIs.

First, it's ironic that Meta is talking about privacy. Second, most companies will run these models in the cloud anyway. You can run OpenAI via Azure Enterprise and Anthropic on AWS Bedrock.

simonw
0 replies
1h27m

"Very few companies can afford the hardware to run these models privately."

I can run Llama 3 70B on my (64GB RAM M2) laptop. I haven't tried 3.1 yet but I expect to be able to run that 70B model too.

As for the 405B model, the Llama 3.1 announcement says:

To support large-scale production inference for a model at the scale of the 405B, we quantized our models from 16-bit (BF16) to 8-bit (FP8) numerics, effectively lowering the compute requirements needed and allowing the model to run within a single server node.
6gvONxR4sf7o
1 replies
1h17m

Third, a key difference between Meta and closed model providers is that selling access to AI models isn’t our business model. That means openly releasing Llama doesn’t undercut our revenue, sustainability, or ability to invest in research like it does for closed providers. (This is one reason several closed providers consistently lobby governments against open source.)

The whole thing is interesting, but this part strikes me as potentially anticompetitive reasoning. I wonder what the lines are that they have to avoid crossing here?

phkahler
0 replies
37m

> ...but this part strikes me as potentially anticompetitive reasoning.

"Commoditize your complements" is an accepted strategy. And while pricing below cost to harm competitors is often illegal, the reality is that the marginal cost of software is zero.

whimsicalism
0 replies
2h9m

OpenAI needs to release a new model setting a new capabilities highpoint. This is existential for them now.

wesleyyue
0 replies
1h48m

Just added Llama 3.1 405B/70B/8B to https://double.bot (VSCode coding assistant) if anyone would like to try it.

---

Some observations:

* The model is much better at trajectory correcting and putting out a chain of tangential thoughts than other frontier models like Sonnet or GPT-4o. Usually, these models are limited to outputting "one thought", no matter how verbose that thought might be.

* I remember in Dec of 2022 telling famous "tier 1" VCs that frontier models would eventually be like databases: extremely hard to build, but the best ones would eventually be open and win, as they're too important to too many large players. I remember the confidence in their ridicule at the time, but it seems increasingly likely that this will be true.

tpurves
0 replies
36m

405 sounds like a lot of B's! What do you need to practically run or host that yourself?

probablybetter
0 replies
2h24m

I would avoid Facebook and Meta products in general. I do NOT trust them. We have approx. 20 years of their record to go upon.

pja
0 replies
1h24m

“Commoditise your complement” in action!

openrisk
0 replies
1h7m

Open source "AI" is a proxy for democratising and making (much) more widely useful the goodies of high performance computing (HPC).

The HPC domain (data and compute intensive applications that typically need vector, parallel or other such architectures) has been around for a long time, but has largely been confined to academic / government tasks.

LLMs, with their famous "matrix multiply" at their very core, are basically demolishing an ossified frontier where a few commercial entities (Intel, Microsoft, Apple, Google, Samsung etc.) have defined for decades what computing looks like for most people.

Assuming that the genie is out of the bottle, the question is: what is the shape of end-user devices that are optimally designed to use compute intensive open source algorithms? The "AI PC" is already a marketing gimmick, but could it be that Linux desktops and smartphones will suddenly be "AI natives"?

For sure it's a transformational period, and the landscape at T+10 yrs could be drastically different...

mmmore
0 replies
1h8m

I appreciate that Mark Zuckerberg soberly and neutrally talked about some of the risks from advances in AI technology. I agree with others in this thread that this is more accurately called "public weights" instead of open source, and in that vein I noticed some issues in the article.

This is one reason several closed providers consistently lobby governments against open source.

Is this substantially true? I've noticed a tendency of those who support the general arguments in this post to conflate the beliefs of people concerned about AI existential risk, some of whom work at the leading AI labs, with the position of the labs themselves. In most cases I've seen, the AI labs (especially OpenAI) have lobbied against any additional regulation on AI, including with SB1047[1] and the EU AI Act[2]. Can anyone provide an example of this in the context of actual legislation?

On this front, open source should be significantly safer since the systems are more transparent and can be widely scrutinized. Historically, open source software has been more secure for this reason.

This may be true if we could actually understand what was happening inside neural networks, or train them to consistently avoid unwanted behaviors. As things are, the public weights are simply inscrutable black boxes, and the existence of jailbreaks and other strange LLM behaviors shows that we don't understand how our training processes create models' emergent behaviors. The capabilities of these models and their influence are growing faster than our understanding of them and our ability to steer them to behave precisely how we want, and that will only get harder as the models get more powerful.

At this point, the balance of power will be critical to AI safety. I think it will be better to live in a world where AI is widely deployed so that larger actors can check the power of smaller bad actors.

This paragraph ignores the concept of offense/defense balance. It's much easier to cause a pandemic than to stop one, and cyberattacks, while not as bad as pandemics, also seem to favor the attacker (this one is contingent on how much AI tools can improve our ability to write secure code). At the extreme, it would clearly be bad if everyone had access to an anti-matter weapon large enough to destroy the Earth; at some level of capability, we have to limit the commands an advanced AI will follow from an arbitrary person.

That said, I'm unsure if limiting public weights at this time would be good regulation. They do seem to have some benefits in increasing research around alignment/interpretability, and I don't know if I buy the argument that public weights are significantly more dangerous from a "misaligned ASI" perspective than many competing closed companies. I also don't buy the view of some in the leading labs that we'll likely have "human level" systems by the end of the decade; it seems possible but unlikely. But I worry that Zuckerberg's vision of the future does not adequately guard against downside risks, and is not compatible with the way the technology will actually develop.

[1] https://thebulletin.org/2024/06/california-ai-bill-becomes-a...

[2] https://time.com/6288245/openai-eu-lobbying-ai-act/

mensetmanusman
0 replies
2h15m

It’s easy to support open source AI when the code is 1,000 lines and the execution costs $100,000,000 of electricity.

Only the big players can afford to push go, and FB would love to see OpenAI’s code so they can point it to their proprietary user data.

manishrana
0 replies
1h24m

really useful insights

m3kw9
0 replies
2h8m

The truth is we need both closed and open source, they both have their discovery path and advantages and disadvantages, there shouldn’t be a system where one is eliminated over the other. They also seem to be driving each other forward via competition.

littlestymaar
0 replies
19m

I love how Zuck decided to play a new game called “commoditize some other billionaire's business to piss him off”. I can't wait until this becomes a trend and we get plenty of cool open source stuff.

If he really wants to replicate Linux's success against proprietary Unices, he needs to release Llama with some kind of GPL equivalent that forces everyone to play the open source game.

jmward01
0 replies
1h15m

I never thought I would say this but thanks Meta.

*I reserve the right to remove this praise if they abuse this open source model position in the future.

jamiedg
0 replies
1h45m

Looks like it's easy to test out these models now on Together AI - https://api.together.ai
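
For anyone who wants a quick test, here is a minimal sketch against Together's OpenAI-compatible endpoint (the base URL, environment variable, and model identifier below are assumptions; check their docs for the current names):

    # Minimal sketch: query Llama 3.1 through an OpenAI-compatible API.
    # Assumptions: TOGETHER_API_KEY is set and the model id matches Together's catalog.
    import os
    from openai import OpenAI

    client = OpenAI(
        api_key=os.environ["TOGETHER_API_KEY"],
        base_url="https://api.together.xyz/v1",  # assumed OpenAI-compatible endpoint
    )

    response = client.chat.completions.create(
        model="meta-llama/Meta-Llama-3.1-405B-Instruct-Turbo",  # assumed model id
        messages=[{"role": "user", "content": "In two sentences, what is new in Llama 3.1?"}],
    )
    print(response.choices[0].message.content)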

gooob
0 replies
1h11m

why do they keep training on publicly available online data, god dammit? what the fuck. don't they want to make a good LLM? train on the classics, on the essential reference manuals for different technologies, on history books, medical encyclopedias, journal notes from the top surgeons and engineers, scientific papers on the experiments that back up our fundamental theories. we want quality information, not recent information. we already have plenty of recent information.

didip
0 replies
1h47m

Is it really open source though? You can't run these models for your company. The license is extremely restrictive and there's NO SOURCE CODE.

btbuildem
0 replies
1h8m

The "open source" part sounds nice, though we all know there's nothing particularly open about the models (or their weights). The barriers to entry remain the same - huge upfront investments to train your own, and steep ongoing costs for "inference".

Is the vision here to treat LLM-based AI as a "public good", akin to a utility provider in a civilized country (taxpayer funded, govt maintained, not-for-profit)?

I think we could arguably call this "open source" when all the infra blueprints, scripts and configs are freely available for anyone to try and duplicate the state-of-the-art (resource and grokking requirements notwithstanding).

aliljet
0 replies
2h58m

And this is happening RIGHT as a new potential leader is emerging in Llama 3.1. I'm really curious about how this is going to match up on the leaderboards...

KingOfCoders
0 replies
1h45m

Open Source AI needs to include training data.

Invictus0
0 replies
2h44m

The irony of this letter being written by Mark Zuckerberg at Meta, while OpenAI continues to be anything but open, is richer than anyone could have imagined.

InDubioProRubio
0 replies
2h51m

CrowdStrike just added "Centralized Company Controlled Software Ecosystem" to every risk data sheet on the planet. Everything futureproof is self-hosted and open source.

GaggiX
0 replies
2h1m

Llama 3.1 405B is on par with GPT-4o and Claude 3.5 Sonnet, the 70B model is better than GPT 3.5 turbo, incredible.

Dwedit
0 replies
4m

Without the raw data that trained the model, how is it open source?