They've got a console for it as well, https://www.meta.ai/
And announcing a lot of integration across the Meta product suite, https://about.fb.com/news/2024/04/meta-ai-assistant-built-wi...
Neglected to include comparisons against GPT-4-Turbo or Claude Opus, so I guess it's far from being a frontier model. We'll see how it fares in the LLM Arena.
They didn't compare against the best models because they were trying to do "in-class" comparisons: the 70B model is in the same class as Sonnet (which they do compare against) and GPT-3.5 (which is much worse than Sonnet). If they're beating Sonnet, that means they're going to be within striking distance of Opus and GPT-4 for most tasks, with the only major difference probably showing up on extremely difficult reasoning benchmarks.
Since Llama is open source, though, we're going to see fine-tunes and LoRAs, unlike with Opus.
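To make that concrete: a minimal sketch of attaching a LoRA adapter to Llama 3 with Hugging Face's transformers and peft libraries (the model id and hyperparameters here are illustrative placeholders, not a tested recipe):

    # Sketch only: attach a small LoRA adapter so fine-tuning trains
    # a fraction of the parameters. Hyperparameters are illustrative.
    from transformers import AutoModelForCausalLM
    from peft import LoraConfig, get_peft_model

    base = AutoModelForCausalLM.from_pretrained("meta-llama/Meta-Llama-3-8B")

    lora = LoraConfig(
        r=16,                                # adapter rank
        lora_alpha=32,                       # scaling factor
        target_modules=["q_proj", "v_proj"], # attention projections to adapt
        lora_dropout=0.05,
        task_type="CAUSAL_LM",
    )

    model = get_peft_model(base, lora)
    model.print_trainable_parameters()  # only the adapter matrices train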
https://github.com/meta-llama/llama3/blob/main/LICENSE
Llama is not open source. It's corporate freeware with some generous allowances.
Open source licenses are a well-defined thing. Meta's marketing saying otherwise doesn't mean they get to usurp a well-understood and commonly used term like "open source."
https://opensource.org/license
Nothing about Meta's license is open source. It's a carefully constructed legal agreement intended to prevent any meaningful encroachment, by anyone, ever, into any potential Meta profit, and to disavow liability so the company avoids reputational harm if someone uses its freeware for something embarrassing.
If you use it against the license anyway, you'll just have to hope you never become successful enough that suing you and taking your product away is more profitable than it is annoying. Once that threshold is crossed, Meta's lawyers will start going after users of their IP.
What is "source" regarding an LLM? Public training data and initial parameters?
Not an expert, but the weights are often mentioned as not being open source. Happy to be corrected, as I'm not really sure.
Weights aren't source, because the goal of open source software is that you can know how the software you're consuming works and can produce the final artifact (the executable) from source yourself. When you only have weights, you're getting something like the executable. Sure, you can tweak it, but you don't have what you need to reproduce it, or to examine how it works and validate it for your purposes. As such, open weights are not in the spirit of open source.
I don't think the previous commenter was saying that it's okay to only release the weights.
The parameters and the license. Mistral uses Apache 2.0, a genuinely permissive open source license. As such, it's an open source model.
Models are similar to code you might run on a VM or natively on an operating system: llama.cpp is to a model as Python is to a Python script (see the sketch below). The license lays out the rights and responsibilities of the users of the software, or in this case the model. The training data, process, and pipeline used to build the model in the first place are distinct and separate things from the model itself. It'd be nice if those were open too, but when dealing with just the model:
If it uses an OSI recognized open source license, it is an open source model. If it doesn't use an OSI recognized open source license, it's not.
Llama is not open source. It's corporate freeware.
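To make the runtime analogy upthread concrete (the model as program, llama.cpp as interpreter), here's a minimal sketch using the llama-cpp-python bindings; the GGUF file path is hypothetical:

    # Sketch: llama.cpp plays the role of the interpreter, the weights
    # are the "program". The model file path here is hypothetical.
    from llama_cpp import Llama

    llm = Llama(model_path="./llama-3-8b-instruct.Q4_K_M.gguf")
    out = llm("Q: What is the capital of Norway? A:", max_tokens=32, stop=["\n"])
    print(out["choices"][0]["text"])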
Mistral is not "open source" either, since we cannot reproduce it (the training data is not published). Both are open-weight models, and both are released under licenses whose legal basis is unclear: it's not actually clear whether they hold any intellectual property in the models at all. Of course they claim such IP, but no court has ruled on this yet AFAIK, and legislators could also enact laws that make these public domain altogether.
See this discussion and blog post about a model called OLMo from AI2 (https://news.ycombinator.com/item?id=39974374). They try to be more truly open, although there are nuances even with them that make it not fully open. Just like with open source software, an open source model should provide everything you need to reproduce the final output, and with transparency. That means you need the training source code, the data sets, the evaluation suites, the inference code, and more.
Most of these other models, like Llama, are open weight, not open source, and open weight is just openwashing, since you're just getting the final output, like a compiled executable. But even with OLMo (and others like Databricks' DBRX) there are issues with proprietary licenses being used for some things, which prevent truly free use. For some reason, in the AI world there is heavy resistance to using OSI-approved licenses like Apache or MIT.
Finally, there is still a lack of openness and transparency around training data sets, even with models that release those data sets, because a lot of filtering happens to produce them without any transparency. For example, AI2's OLMo uses a dataset that has been filtered to remove "toxic" or "hateful" content, with input from "ethics experts" - and this is of course a key input into the overall model that can heavily bias its performance, accuracy, and neutrality.
Unfortunately, there is a lot missing from the current AI landscape as far as openness.
You seem to be making claims that have little connection to the actual license.
The license states you can't use the model if, at the time Llama 3 was released, you had more than 700 million monthly active users. It also says you can't use it for illegal/military/etc. uses. Other than that, you can use it as you wish.
Those additional restrictions mean it's not an open source license by the OSI definition, which matters if you care about words sometimes having unambiguous meanings.
I call models like this "openly licensed" but not "open source licensed".
The OSI definition applies to source code -- I'm not sure the term "open source" makes much sense applied to model weights.
Whilst I agree the term isn't ideal, I don't agree with the other comments in the post I originally replied to.
That "etc" is doing a lot of work here. The point of OSI licenses like MIT, Apache 2.0 is to remove the "etc". The licensing company gives up its right to impose acceptable use policies. More restrictive, but still OSI approved, licenses are as clear as they possibly can be about allowed uses and the language is as unambiguous as possible. Neither is the case for the Llama AUP.
I'm curious: given that the model will probably be hosted in a private server, how would meta know or prove that someone is using their model against the license?
If they can develop any evidence at all (perhaps from a whistleblower, perhaps from some characteristic unique to their model), they can sue, and then they get to do "discovery", which would force the sued party to reveal details.
Yes or no: do you concede that for almost everyone, none of what you said matters, that almost everyone can use Llama 3 for their use case, and that basically nobody is going to have to worry about being sued, other than maybe Google or an equivalent?
You are using all these scary words without saying the obvious, which is that for almost everyone, none of that matters.
Models are mostly fungible; if Meta decided to play games, it's not too hard to switch models. I think this is mostly a CYA play.
Llama is open weight, not open source. They don’t release all the things you need to reproduce their weights.
Not really that either, if we assume that “open weight” means something similar to the standard meaning of “open source”—section 2 of the license discriminates against some users, and the entirety of the AUP against some uses, in contravention of FSD #0 (“The freedom to run the program as you wish, for any purpose”) as well as DFSG #5&6 = OSD #5&6 (“No Discrimination Against Persons or Groups” and “... Fields of Endeavor”, the text under those titles is identical in both cases). Section 7 of the license is a choice of jurisdiction, which (in addition to being void in many places) I believe was considered to be against or at least skirting the DFSG in other licenses. At best it’s weight-available and redistributable.
ML Twitter was saying that they're working on a 400B parameter version?
And they even allow you to use it without logging in. Didn't expect that from Meta.
I do see on the bottom left:
> Log in to save your conversation history, sync with Messenger, generate images and more.
Think they meant it can be used without login.
Not in the EU though
or the UK
Doesn't work for me, I'm in EU.
Which indicates that they get enough value out of logged-in users. Potentially they can identify you without a login, so there's no need for one. But of course they also get a lot of value from the data users give them by interacting with the model.
I had the same reaction, but when I saw the thumbs up and down icons, I realized this was a smart way to crowdsource validation data.
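For what it's worth, the record such a button produces can be as simple as a rating attached to each prompt/response pair. A hypothetical sketch (the schema is invented, not Meta's):

    # Hypothetical sketch of logging thumbs up/down as preference data.
    import json, time

    def log_feedback(prompt, response, rating, path="feedback.jsonl"):
        record = {
            "ts": time.time(),
            "prompt": prompt,
            "response": response,
            "rating": rating,  # +1 thumbs up, -1 thumbs down
        }
        with open(path, "a") as f:
            f.write(json.dumps(record) + "\n")

    log_feedback("What's the capital of Norway?", "Oslo.", +1)

Aggregated over millions of users, that's cheap evaluation and preference-tuning data.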
I imagine that is to compete with ChatGPT, which began doing the same.
Yeah, but not for image generation unfortunately
I've never had a Facebook account, and I really don't trust them regarding privacy.
Where is it available? I got this in Norway.
Got the same in the Netherlands.
Probably the EU laws are getting too draconian. I'm starting to see it a lot.
The EU actually has the opposite of draconian privacy laws. It's more that Meta doesn't have a business model if they don't intrude on your privacy.
Well, exactly, and that's why IMO they'll end up pulling out of the EU. There's barely any money in non-targeted ads.
Claude has the same restriction [0]; the whole of Europe (except Albania) is excluded. Somehow I don't think it is retaliation against Europe for fining Meta and Google. I could be wrong, but a business decision seems more likely, like keeping usage down to a manageable level in an initial phase. Still, I'm curious to understand why, should anyone here know more.
[0] https://www.anthropic.com/claude-ai-locations
You also said that when Meta delayed the Threads release by a few weeks in the EU. I recommend reading "The Princess and the Pea", since you seem to be quite sheltered, using the term "draconian" so liberally.
Meta (and other privacy exploiting companies) have to actually... care? Even if it's just a bit more. Nothing draconian about it.
Got the same in Denmark
Just use the Replicate demo instead; you can even alter the inference parameters:
https://llama3.replicate.dev/
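If you'd rather script it, something like this should work with Replicate's Python client (the model slug and parameter names are from memory, so double-check them on the model page):

    # Sketch: calling Llama 3 through Replicate's Python client with
    # custom inference parameters. Needs REPLICATE_API_TOKEN in the env;
    # the model slug and input names are assumptions, verify on replicate.com.
    import replicate

    output = replicate.run(
        "meta/meta-llama-3-8b-instruct",
        input={
            "prompt": "Explain LoRA in one sentence.",
            "temperature": 0.6,
            "max_tokens": 128,
        },
    )
    print("".join(output))  # the client streams back chunks of text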
Or run a Jupyter notebook from Unsloth on Colab:
https://huggingface.co/unsloth/llama-3-8b-bnb-4bit
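Loading that checkpoint follows Unsloth's documented pattern; roughly (treat this as a starting point, not a tested recipe):

    # Sketch: load the pre-quantized 4-bit Llama 3 checkpoint with Unsloth.
    from unsloth import FastLanguageModel

    model, tokenizer = FastLanguageModel.from_pretrained(
        model_name="unsloth/llama-3-8b-bnb-4bit",
        max_seq_length=2048,  # adjust to your context needs
        load_in_4bit=True,    # the bnb-4bit checkpoint is already quantized
    )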
Blocked me for asking how to make feet soft.
lmaooo.
I was asking scientifically too. I mean, I had intentions, but I wasn't doing anything outright bad.
GPT-3.5 refused to extract data from a German receipt because it contained "Women's Sportswear", and sent back a "medium" severity sexual content rating. That was an API call, which should be less restrictive.
No free feet!
Sorry, still too sexy. Can’t have that.
We are living in a post-Dan Schneider world. Feet are off the table.
Yeah, almost like comparing a 70B model with a 1.8 trillion parameter model doesn't make any sense when you have a 400B model pending release.
(You can't compare parameter count with a mixture of experts model, which is what the 1.8T rumor says that GPT-4 is.)
You absolutely can, since it has a size advantage either way. MoE means the expert model performs better BECAUSE of the overall model size.
Fair enough, although it means we don't know whether a 1.8T MoE GPT-4 will have a "size advantage" over Llama 3 400B.
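Back-of-the-envelope, with hypothetical numbers shaped like the rumor (16 experts, 2 routed per token), the gap between total and active parameters is stark:

    # Hypothetical MoE arithmetic; every number here is an assumption
    # shaped like the GPT-4 rumor, not a confirmed spec.
    n_experts = 16          # rumored expert count
    active_per_token = 2    # rumored experts routed per token
    expert_params = 111e9   # hypothetical per-expert parameters
    shared_params = 55e9    # hypothetical shared attention/embedding params

    total = shared_params + n_experts * expert_params
    active = shared_params + active_per_token * expert_params

    print(f"total:  {total / 1e12:.2f}T")            # ~1.83T
    print(f"active: {active / 1e9:.0f}B per token")  # ~277B

So per-token compute looks like a ~280B dense model, while capacity (and memory footprint) reflects the full ~1.8T, which is exactly why the comparison is contested.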
> And announcing a lot of integration across the Meta product suite, ...
That's ominous...
Spending millions/billions to train these models is done for a reason, and it's not just for funsies.
They also stated that they are still training larger variants that will be more competitive:
That realtime `/imagine` prompt seems pretty great.