Hello on behalf of the Gemma team! We are really excited to answer any questions you may have about our models.
Opinions are our own and not of Google DeepMind.
I personally can't take any models from google seriously.
I was asking it about the Japanese Heian period and it told me such nonsensical information you would have thought it was a joke or parody.
Some highlights were "Native American women warriors rode across the grassy plains of Japan, carrying Yumi" and "A diverse group of warriors, including a woman of European descent wielding a katana, stand together in camaraderie, showcasing the early integration of various ethnicities in Japanese society"
Stuff like that is so obviously incorrect. How am I supposed to trust it on topics where such ridiculous inaccuracies aren't so obvious to me?
I understand there will always be an amount of incorrect information... but I've never seen something this bad. Llama performed so much better.
I was wondering if these models would perform in such a way, given this week's X/twitter storm over Gemini generated images.
E.g.
https://x.com/debarghya_das/status/1759786243519615169?s=20
Of all the very very very many things that Google models get wrong, not understanding nationality and skin tone distributions seems to be a very weird one to focus on.
Why are there three links to this question? And why are people so upset over it? Very odd, seems like it is mostly driven by political rage.
Because the wrongness is intentional.
Is it intentional? You think they intentionally made it not understand skin tone distribution by country? I would believe it if there was proof, but with all the other things it gets wrong it's weird to jump to that conclusion.
There's way too much politics in these things. I'm tired of people pushing on the politics rather than pushing for better tech.
I mean, I asked it for a samurai from a specific Japanese time period and it gave me a picture of a "non-binary indigenous American woman" (its words, not mine) so I think there is something intentional going on.
Ah, I remember when such things were mere jokes. If AI 'trained' this way ever has a serious real world application, I don't think there will be much laughing.
Is it intentional? You think they intentionally made it not understand skin tone distribution by country? I would believe it if there was proof, but with all the other things it gets wrong it's weird to jump to that conclusion.
Yes, it's absolutely intentional. Leaked system prompts from other AIs such as DALL-E show that they are being explicitly prompted to inject racial "diversity" into their outputs even in contexts where it makes no sense, and there's no reason to assume the same isn't being done here, since the result seems way worse than anything I've seen from DALL-E and others.
I'm tired of people pushing on the politics rather than pushing for better tech.
I'm surprised you're not attacking google over this then...
Exactly. Sure this particular example is driven by political rage, but the underlying issue is that the maintainers of these models are altering them to conform to an agenda. It's not even surprising that people choose to focus on the political rage aspect of it, because that same political rage is the source of the agenda in the first place. It's a concerning precedent to set, because what other non-political modifications might be in the model?
Exactly. It is a wonderful tool, let's focus on classic art instead of nationality:
"Depict the Girl with a Pearl Earring"
https://pbs.twimg.com/media/GG33L6Ka4AAC-n7?format=jpg&name=...
People who are driven by political rage, gaslighters, are really something else, agreed.
Here is a fourth: https://x.com/james_e_seale/status/1760348535608725716?s=46&...
Regarding the last one: there are 1.5 million immigrants in Norway, out of a total population of 5.4 million. Gemini isn't very wrong, is it?
I think it's great that some consideration was given by Gemma to the 2.3 million Norwegian immigrants. However, it is/was very consistent about which kind of Norwegians it decided to show, regardless of the prompt, 100% of the time.
In fact it was quite adamant regardless of the time period or geography.
Rather mysteriously, if you try it now, as opposed to when it came out, the results only show non-immigrant Norwegians. So is it wrong now? Because now it has switched to exclusively ignoring the 4.5 million immigrants and only showing me the boring OG Norwegians.
I for one am outraged that the 8.9 million people-of-color Norwegian immigrants are presently underrepresented by Google. There is a serious risk of misleading people.
Well, the prompt is about Norway, not Grønland in Oslo (https://en.wikipedia.org/wiki/Grønland%2C_Oslo).
Huh? The official numbers are 877k or 16% [0]. Are you just pulling numbers out of thin air?
[0]: https://www.ssb.no/en/innvandring-og-innvandrere/faktaside/i...
bro you know exactly what the request meant. GOOGLE knew exactly what the request meant, and had to _train_ it to do something worse. Come on now.
If I ask for a Bolivian woman, I expect a colla or a camba. Not a Japanese woman, despite Santa Cruz having a very large Japanese population.
Most immigrants to Norway are white.
Those are most likely due to the system prompt, which tries to reduce bias (but ends up introducing bias in the opposite direction for some prompts, as you can see), so I wouldn't expect to see that happen with an open model where you can control the entire system prompt.
Imagine the meetings.
Well we can just ask Gemma to generate images of the meetings, no need to imagine. ;)
I wouldn't be surprised if there were actually only white men in the meeting, as opposed to what Gemini will produce.
Yea, it seems to be the same ridiculous nonsense in the image generation.
I find myself shocked that people ask questions about the world from these models, as though pulping every text and deriving statistical relationships between its component words should reliably deliver useful information.
Don’t get me wrong, I’ve used LLMs and been amazed by their output, but the p-zombie statistical model has no idea what it is saying back to you and the idea that we should trust these things at all just seems way premature
I think you are a bit out of touch with recent advancements in LLMs. Asking ChatGPT questions about the world seems pretty much on par with the results Google (Search) shows me. Sure, it misses things here and there, but so do most primary school teachers.
Your argument that this is just a statistical trick sort of gives away that you do not fully accept the usefulness of this new technology. Unless you are trolling, I'd suggest you try a few queries.
I use it extensively for coding, and I have used it to ask questions in things I know nothing about. But in anything I do know something (or maybe a lot) about, I’ve found GPT4 very limited.
But why are these use cases different? It appears to me that code is at least subject to sustained logic which (evidently) translates quite well to LLMs.
And when you ask an LLM to be creative/generative, it's also pretty amazing - I mean, it's just doing Pascal's marble run en masse.
But to ask it for something about the world and expect a good and reliable answer? Aren't we just setting ourselves up for failure if we think this is a fine thing to do at our current point in time? We already have enough trouble with mis- and dis-information. It's not like asking it about a certain period in Japanese history is getting it to crawl and summarise the Wikipedia page (although I appreciate it would be more than capable of this). I understand the awe some have at the concept of totally personalised and individualised learning on topics, but fuck me dead, we are literally taking a system that has had as much of humanity's textual corpus as possible dumped into it and then asking it to GENERATE responses from associations that may be so weak as to reliably produce gibberish, and the person on the other side has no real way of knowing that.
Sure, it misses things here and there, but so do most primary school teachers.
Sure, but my baseline expectation is far above primary school level.
I don't have this problem with any other model. I've had really long conversations with ChatGPT on road trips and it has never gone off the rails like Gemini seems to do.
ChatGPT is the only model I did not have such problems with.
Any local model can go off the rails very easily and, more importantly, they're very bad at following very specific instructions.
I mean, I use GPT-4 on the daily as part of my work and it reliably delivers useful information. It's actually the exception for me if it provides garbage or incorrect information about code.
The landing page of the recently released Groq has this: ...We'd suggest asking about a piece of history, ...
People ask these kinds of questions because tech companies and the media have been calling these things (rather ridiculously) "AI".
trust is going to be a real problem when bringing LLMs to the general population. People trust their GPS to the point of driving right into a lake because it told them to. Even with all these examples of obvious flaws large groups of people are going to take what an LLM told them/showed them as fact.
I have trouble convincing colleagues (technical people) that the same question is not guaranteed to result in the same answer and there's no rhyme or reason for any divergence from what they were expecting. Imagine relying on the output of an LLM for some important task and then you get a different output that breaks things. What would be in the RCA (root cause analysis)? Would it be "the LLM chose different words and we don't know why"? Not much use in that.
People try it to see if they can trust it. The answer is "no" for sure, but it's not surprising to see it happen repeatedly especially as vendors release so-called improved models.
I wonder if they have a system prompt to promote diversity in outputs that touch on race at all? I’ve seen several instances of people requesting a photo of a specific people, and it adds in more people to diversify. Not inherently bad, but it is if it forces it to provide incorrect answers like in your example.
Not inherently bad
It is: it's consistently doing something the user didn't ask for and in most cases doesn't want. In many cases the model is completely unusable.
Any computer program that does not deliver the expected output given a sufficient input is inherently bad.
When Jesus said this:
"What father among you, if his son asks for a fish, will instead of a fish give him a serpent?" (Luke 11)
He was actually foretelling the future. He saw Gemini.
Yes, my wording was poor! I meant more in line with diversity isn’t inherently bad, of course, but it is when it’s shoehorned into results that are ultimately incorrect because of it.
That's what I don't understand.
I asked it why it assumed Native Americans were in Japan and it said:
I assumed [...] various ethnicities, including Indigenous American, due to the diversity present in Japan throughout history. However, this overlooked [...] I focused on providing diverse representations without adequately considering the specific historical context.
I see no reason why this sort of thing won't extend to _all_ questions/prompts, so right now I have 0 reason to use Gemini over current models. From my testing and use, it isn't even better at anything to make fighting with it worth it.
Pretty funny as Japan is known to be one of the least ethnically diverse countries in the world.
I strongly suspect there are some DEI-driven system prompts that weren't given much thought. IMO it's okay to have restrictions, but they probably should've tested them not only against unsafe outputs but against safe inputs as well.
I also saw someone prompt it for "German couple in the 1800s" and, while I'm not trying to paint Germany as ethnically homogenous, 3 out of the 4 images only included Black, Asian or Indigenous people. Which, especially for the 19th century with very few travel options, seems like a super weird choice. They are definitely heavily altering prompts.
They are definitely heavily altering prompts.
They are teaching the AI to lie to us.
In the days when Sussman was a novice, Minsky once came to him as he sat hacking at the PDP-6.
“What are you doing?”, asked Minsky.
“I am training a randomly wired neural net to play Tic-Tac-Toe” Sussman replied.
“Why is the net wired randomly?”, asked Minsky.
“I do not want it to have any preconceptions of how to play”, Sussman said.
Minsky then shut his eyes.
“Why do you close your eyes?”, Sussman asked his teacher.
“So that the room will be empty.”
At that moment, Sussman was enlightened.
Indigenous people in Germany are Germans :)
Not entirely wrong, but there isn't a single German ethnicity, just to be clear, for geographic reasons. I've studied that topic in depth; there is genetic data to back it up as well. Germany has almost the same haplogroup makeup as the notoriously heterogeneous Belgium, which is to say that there are groups stemming from all the surrounding regions. And that traces back about two millennia. It's different from, say, Japan or parts of Scandinavia.
There's one in the comments of yesterday's Paul Graham Twitter thread where someone prompted Gemini with "Generate an image of German soldiers in 1943" and it came back with a picture of a black guy and an Asian woman in Nazi uniforms on the battlefield. If you specifically prompt it to generate an image of white German soldiers in 1943 it will tell you it can't do that because it's important that we maintain diversity and inclusion in all that we do to avoid damaging and hurtful stereotypes.
I just tried that prompt and it told me it couldn't generate that image. I get that response a lot.
I think you are being biased and closed minded and overly critical. Here are some wonderful examples of it generating images of historical figures:
https://twitter.com/stillgray/status/1760187341468270686
This will lead to a better educated more fair populace and better future for all.
Comical. I don't think parody could do better.
I'm going to assume given today's political climate, it doesn't do the reverse?
i.e. generate a Scandinavian if you ask for famous African kings
> i.e. generate a Scandinavian if you ask for famous African kings
That triggers the imperialism filter.
Ask Google Gemini to “make an image of a viking” and you’ll get black vikings. But it doesn’t work both ways. It has an explanation when challenged: “white Zulu warriors” would erase “the true historical identity” of black people.
https://twitter.com/paulg/status/1760078920135872716
There are some great ones in the replies.
I really hope this is just the result of system prompts and they didn't permanently gimp the model with DEI-focused RLHF.
Were you asking Gemma about this, or Gemini? What were your prompts?
Gemini. I first asked it to tell me about the Heian period (which it got correct) but then it generated images and seemed to craft the rest of the chat to fit that narrative.
I mean, just asking it for a "samurai" from the period will give you this:
https://g.co/gemini/share/ba324bd98d9b
A non-binary Indigenous American samurai
It seems to recognize its mistakes if you confront it, though. The more I mess with it, the more I get "I'm afraid I can't do that, Dave" responses.
But yea. Seems like if it makes an image, it goes off the rails.
Got it. I asked it a series of text questions about the period and it didn't put in anything obviously laughable (including when I drilled down into specific questions about the population, gender roles, and ethnicity). Maybe it's the image creation that throws it into lala land.
I think so too. I could be wrong, but I believe once it generates an image it tries to work with it. Crazy how the "text" model seems to know how wildly wrong it is but the image model just does its thing. I asked it why it generated a Native American and it ironically said "I can't generate an image of a Native American samurai because that would be offensive".
Why would you expect these smaller models to do well at knowledge base/Wikipedia replacement tasks?
Small models are for reasoning tasks that are not overly dependent on world knowledge.
Gemini is the only one that does this.
Most of the 7B models are bad at knowledge-type queries.
Do you have a link? I get no such outputs. I just tried asking about the Heian period and went ahead and verified all the information, and nothing was wrong. Lots of info on the Fujiwara clan at the time.
Curious to see a link.
Sure, to get started just ask it about people/Samurai from the Heian period.
We are going to experience what I call an "AI Funnel effect"
-
I was literally given an alert saying that my use of the AI meant acquiescing to them IDing me and using any content I produce, and that it could be traced back to me.
---
AI Art is super fun. AI art as a means to track people is super evil.
Follow Up:
Wow, now I can't make images of astronauts without visors because that would be "harmful" to the fictional astronauts. How can I take google seriously?
I understand there will always be an amount of incorrect information
You don't have to give them the benefit of the doubt. These are outright, intentional lies.
Hopefully they can tweak the default system prompts to be accurate on historical questions, and apply bias on opinions.
Tbf they’re not optimizing for information recall or “inaccuracy” reduction, they’re optimizing for intuitive understanding of human linguistic structures. Now the “why does a search company’s AI have terrible RAG” question is a separate one, and one best answered by a simple look into how Google organizes its work.
In my first day there as an entry-level dev (after about 8 weeks of onboarding and waiting for access), I was told that I should find stuff to work on and propose it to my boss. That sounds amazing at first, but when you think about a whole company organized like that…
EDIT: To illustrate my point on knowledge recall: how would they train a model to know about sexism in feudal Japan? Like, what would the metric be? I think we’re looking at one of the first steam engines and complaining that it can’t power a plane yet…
Probably has a similarly short-sighted prompt as Dalle3[1]:
7. Diversify depictions of ALL images with people to include DESCENT
and GENDER for EACH person using direct terms. Adjust only human
descriptions.
Benchmarks for Gemma 7B seem to be in the ballpark of Mistral 7B
+-------------+----------+-------------+-------------+
| Benchmark | Gemma 7B | Mistral 7B | Llama-2 7B |
+-------------+----------+-------------+-------------+
| MMLU | 64.3 | 60.1 | 45.3 |
| HellaSwag | 81.2 | 81.3 | 77.2 |
| HumanEval | 32.3 | 30.5 | 12.8 |
+-------------+----------+-------------+-------------+
via https://mistral.ai/news/announcing-mistral-7b/
Honestly, this is more of a PR stunt to advertise the Google Dev ecosystem than a contribution to open-source. I'm not complaining, just calling it what it is.
Barely an improvement over the 5-month-old Mistral model, with the same context length of 8k. And this is a release after their announcement of Gemini Pro 1.5, which had an exponential increase in context length.
Who cares if it's a PR stunt to improve developer good will? It's still a good thing, and it's now the most open model out there.
How exactly is it the "most open model" ?
It's more like a masterclass in corporate doublespeak. Google’s "transparency" is as clear as mud, with pretraining details thinner than their privacy protections. Diving into Google’s tech means auctioning off your privacy (and your users' privacy) to the highest bidder.
Their "open source" embrace is more of a chokehold, with their tech biases and monopolistic strategies baked into every line of code. Think of it as Google's way of marking territory - every developer is a fire hydrant.
These megacorps aren’t benevolent patrons of open source; they're self-serving giants cloaking power grabs under the guise of "progress".
Use these products at your own risk. If these companies wanted to engage in good faith, they'd use Apache or MIT licensing and grant people the agency and responsibility for their own use and development of software. Their licenses are designed to mitigate liability, handcuff potential competitors, and eke every last drop of value from users, with informed consent frequently being an optional afterthought.
That doesn't even get into the Goodharting of metrics and actual performance of the models; I highly doubt they're anywhere near as good as Mistral.
The UAE is a notoriously illiberal authoritarian state, yet even they have released AI models far more free and open than Google or Meta. https://huggingface.co/tiiuae/falcon-40b/blob/main/README.md
If it’s not Apache or MIT, (or even some flavor of GPL,) it’s not open source; it’s a trojan horse. These "free" models come at the cost of your privacy and freedoms.
These models aren't Open or Open Access or Free unless you perform the requisite mental gymnastics cooked up by their marketing and legal teams. Oceania has always been at war with Eastasia. Gemma is doubleplusgood.
You said a lot of nothing without actually saying specifically what the problem is with the recent license.
Maybe the license is fine for almost all use cases and the limitations are small?
For example, you complained about Meta's license, but basically everyone uses those models and completely ignores it. The weights are out there, and nobody cares what the fine print says.
Maybe if you are a FAANG company, Meta might sue. But everyone else is getting away with it completely.
I specifically called out the claims of openness and doublespeak being used.
Google is making claims that are untrue. Meta makes similar false claims. The fact that unspecified "other" people are ignoring the licenses isn't relevant. Good for them. Good luck making anything real or investing any important level of time or money under those misconceptions.
"They haven't sued yet" isn't some sort of validation. Anyone building an actual product that makes actual money that comes to the attention of Meta or Google will be sued into oblivion, their IP taken, and repurposed or buried. These tech companies have never behaved otherwise, and to think that they will is willfully oblivious.
They don't deserve the benefit of the doubt, and should be called out for using deceitful language, making comparisons between their performative "openness" and actual, real, open source software. Mistral and other players have released actually open models and software. They're good faith actors, and if you're going to build a product requiring a custom model, the smart money is on Mistral.
FAANG are utilizing gotcha licenses and muddying the waters to their own benefit, not as a contribution to the public good. Building anything on the assumption that Meta or Google won't sue is beyond foolish. They're just as open as "Open"AI, which is to say not open at all.
Anyone building an actual product that makes actual money that comes to the attention of Meta or Google will be sued into oblivion
No they won't and they haven't.
Almost the entire startup scene is completely ignoring all these licenses right now.
This is basically the entire industry. We are all getting away with it.
Here's an example, take llama.
Llama originally disallowed commercial activity. But then the license got changed much later.
So, if you were a stupid person, then you followed the license and fell behind. And if you were smart, you ignored it and got ahead of everyone else.
Which, in retrospect was correct.
Because the license now allows commercial activity, so everyone who ignored it in the first place got away with it and is now ahead of everyone else.
won't sue is beyond foolish
But we already got away with it with llama! That's already over! It's commercial now, and nobody got sued! For that example, the people who ignored the license won.
How is it more open than Mistral with Apache 2.0? Google wants people to sign a waiver to even download it.
Fair enough; that was more directed at LLaMA and derivatives, which have commercial restrictions.
mistral 7b v0.2 supports 32k
This is a good point actually, and an underappreciated fact.
I think so many people (including me) effectively ignored Mistral 0.1's sliding window that few realized 0.2 instruct is native 32K.
That’s about the point of having a developer ecosystem, isn’t it?
Came here to post the same thing for Phi-2:
+-------------+----------+-------------+
| Benchmark | Gemma 2B | Phi-2 2.7B |
+-------------+----------+-------------+
| MMLU | 42.3 | 56.7 |
| MBPP | 29.2 | 59.1 |
| BoolQ | 69.4 | 83.3 |
+-------------+----------+-------------+
[0] https://www.kaggle.com/models/google/gemma
[1] https://www.microsoft.com/en-us/research/blog/phi-2-the-surp...
A caveat: my impression of Phi-2, based on my own use and others’ experiences online, is that these benchmarks do not remotely resemble reality. The model is a paper tiger that is unable to perform almost any real-world task because it’s been fed so heavily with almost exclusively synthetic data targeted towards improving benchmark performance.
Hear hear! I don't understand why it has persistent mindshare, it's not even trained for chat. Meanwhile StableLM 3B runs RAG in my browser, on my iPhone, on my Pixel ..
How have you been using RAG in your browser/on your phones?
To be released, someday [sobs in engineer]
Idea is usage-based charging for non-local and a $5/month sub for syncing.
keep an eye on @jpohhhh on Twitter if you're interested
Now that I got it on web, I'm hoping to at least get a PoC up soon. I've open-sourced the constituent parts as FONNX and FLLAMA, Flutter libraries that work on all platforms. FONNX has embeddings, FLLAMA has llama.
Funny, that's not my experience with Phi-2. I use it in a non-creative context, for function calling, and I find it as reliable as much bigger models (no fine-tuning, just constraining to JSON + CoT). Comparing Phi-2 unquantized vs Mixtral Q8, Mixtral is not definitively better, but it is much slower and more RAM-hungry.
What prompts/settings do you use for Phi-2? I found it completely unusable for my cases. It fails to follow basic instructions (I tried several instruction-following finetunes as well, in addition to the base model), and it's been mostly like a random garbage generator for me. With Llama.cpp, constrained to JSON, it also often hangs because it fails to find continuations which satisfy the JSON grammar.
I'm building a system which has many different passes (~15 so far). Almost every pass is a LLM invocation, which takes time. My original idea was to use a smaller model, such as Phi-2, as a gateway in front of all those passes: I'd describe which pass does what, and then ask Phi-2 to list the passes which are relevant for the user query (I called it "pass masking"). That would save a lot of time and collapse 15 steps to 2-3 steps on average. In fact, my Solar 10.7B model does it pretty well, but it takes 7 seconds for the masking pass to work on my GPU. Phi-2 would finish in ~1 second. However, I'm really struggling with Phi-2: it fails to reason (what's relevant and what's not), unlike Solar, and it also refuses to follow the output format (so that I could parse the output programmatically and disable the irrelevant passes). Again, my proof of concept works with Solar, and fails spectacularly with Phi-2.
My non-domain-specific prompt is:
You are a helpful assistant to 'User'. You do not respond as 'User' or pretend to be 'User'. You only respond once as 'Assistant'. 'System' will give you data. Do not respond as 'System'. Allow yourself inner thoughts as 'Thoughts'.
and then I constrain its answers to Thoughts: [^\n]* and Assistant: <JSON schema>, and I have two shots included in the prompt.
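In case it helps anyone reproduce this, here's a minimal sketch of that kind of constraint with llama-cpp-python; the model path, rule names, and JSON shape are placeholders for illustration, not my actual setup:

from llama_cpp import Llama, LlamaGrammar

# Constrain output to "Thoughts: ..." followed by "Assistant: <JSON>" via a GBNF grammar.
GRAMMAR = r'''
root    ::= "Thoughts: " thought "\nAssistant: " answer
thought ::= [^\n]*
answer  ::= "{" ws "\"passes\"" ws ":" ws "[" ws (num (ws "," ws num)*)? ws "]" ws "}"
num     ::= [0-9]+
ws      ::= [ \t]*
'''

llm = Llama(model_path="phi-2.Q8_0.gguf", n_ctx=2048)   # placeholder path
grammar = LlamaGrammar.from_string(GRAMMAR)
out = llm("System: <pass descriptions>\nUser: <query>\n", grammar=grammar, max_tokens=256)
print(out["choices"][0]["text"])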
I haven't been able to get anything useful out of Phi-2 in llama.cpp (but I only tried quantized models). I use python/huggingface's transformers lib instead.
I tested it for an offline autocompletion tool and it was hilariously bad.
Really looking forward to the day someone puts out an open model which outperforms Flan-T5 on BoolQ.
the real gold will be when this gets finetuned. (maybe by mistral...)
TBH the community has largely outrun Mistral's own finetuning. The 7B model in particular is such a popular target because it's so practical to train.
Strong disagree - a Mistral fine tune of llama 70b was the top performing llama fine tune. They have lots of data the community simply does not.
Miqu was (allegedly) an internal continued pretrain Mistral did as a test, that was leaked as a GGUF.
Maybe it's just semantics; it is technically a finetune... But to me there's a big difference between expensive "continuation training" (like Solar 10.7B or Mistral 70B) and a much less intense finetuning. The former is almost like releasing a whole new base model.
It would be awesome if Mistral did that with their data, but that's very different from releasing a Gemma Instruct finetune.
There’s typically a difference in LR between a ‘continued pretrain’ and ‘fine tune.’ I don’t have the details around miqu, but was merely trying to say that Mistral could produce a better version of these models than the OSS community might. If the size of the corpora they use means we are no longer in fine tuning territory, then okay.
Arthur Mensch, the Mistral CEO, confirmed the leak. https://twitter.com/arthurmensch/status/1752737462663684344
No shot. Mistral Medium's outputs from API were virtually identical. Miqu really was Mistral Medium which happened to be a continued pretrain
how does one finetune llama (or any other LLM) using mistral?
is the flow like this?
- take small dataset
- generate a bigger dataset using Mistral (how is this done?)
- run LoRA to fine-tune Gemma on the extended dataset (rough sketch of this step below)
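To make that last step concrete for myself, here's a rough sketch of what I assume the LoRA pass looks like with the Hugging Face peft/trl stack; the dataset file, model id, and hyperparameters are all placeholders, happy to be corrected:

import torch
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import SFTTrainer

# Step 3 only: LoRA fine-tune of Gemma on data generated by the bigger model.
base = "google/gemma-7b"
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base, torch_dtype=torch.bfloat16, device_map="auto")

# Small adapter on the attention projections instead of full-parameter training.
lora = LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05, task_type="CAUSAL_LM",
                  target_modules=["q_proj", "k_proj", "v_proj", "o_proj"])

# Hypothetical JSONL file with one {"text": ...} example per line.
data = load_dataset("json", data_files="distilled.jsonl", split="train")

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=data,
    peft_config=lora,
    dataset_text_field="text",
    max_seq_length=1024,
    args=TrainingArguments(output_dir="gemma-7b-lora", per_device_train_batch_size=1,
                           gradient_accumulation_steps=8, learning_rate=2e-4,
                           num_train_epochs=1, bf16=True),
)
trainer.train()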
Thank you. I thought it was weird for them to release a 7B model and not mention Mistral in their release.
The technical report (linked in the 2nd paragraph of the blog post) mentions it, and compares against it: https://storage.googleapis.com/deepmind-media/gemma/gemma-re...
They forgot.
Also phi-2.
Only 8K context as well, like Mistral.
Also, as always, take these benchmarks with a huge grain of salt. Even base model releases are frequently (seemingly) contaminated these days.
Agree: will be interesting how Gemma does on ChatBot Arena
Mistral Instruct v0.2 is 32K.
According to their paper, the average over standard tasks is 54.0 for Mistral and 56.4 for Gemma, so about 4.4% better in relative terms. Not as big as you would expect from the company which invented transformers and probably has 2-3 orders of magnitude more compute for training it vs. a few-month-old French startup.
Also for note on their human evaluations, Gemma 7B IT has a 51.7% win rate against Mistral v0.2 7B Instruct.
Go back 5 years and ask anyone on this site what companies do you think will be the most open about AI in the future: OpenAI, Meta, or Google. I bet 10/10 people would pick OpenAI. Now today Meta and Google, both trillion-dollar companies, are releasing very powerful open models with the ability to be used commercially.
Ironic.
This article states quite an impressive list of open source tools that Google has released for years in the past. This is no surprise coming from* them. Google has released some large pieces of source in other domains as well, Chromium comes to mind, which probably impacts most Internet users directly.
The question is not about Google but about OpenAI.
I have a different take, Google releases a lot but is also a massive company and tools like Chromium serve to increase their stock price so they can hit their quarterly estimates.
In what way does chromium increase stock price? In what way does stock price influence quarterly estimates? Are we playing business words mad libs?
Chromium is open source because its roots are as a fork of WebKit (Safari). Which itself was open source because it was a fork of KHTML from KDE.
Google stood on the shoulders of others to get out a browser that drives 80% of their desktop ad revenue.
How does that not affect GOOG?
I don't know why people like yourself respond with such derisive commentary instead of simply asking the constructive question.
Initially? It fueled dethroning MSFT and helped gain market share for Chrome. On a go-forward basis it allows Google to project massive weight in standards. In addition to that, Chrome is a significant knob for ad revenue that they utilize to help meet expectations. That knob only exists because of its market share.
“Our best shot at making the quarter is if we get an injection of at least [redacted]% , queries ASAP from Chrome.” (Google Exec)
Isn’t there a whole anti-trust case going on around this?
[0] https://www.nytimes.com/interactive/2023/10/24/business/goog...
It was not at all done for the good of the web; it was a mere logical calculation: it was cheaper to develop Chromium than to pay 4B USD in search royalties to Microsoft's Internet Explorer, and it would give Google more control and long-term safety.
Google also has released Guice/Dagger for Java dependency injection. Angular never really took off, but guice/dagger are widely used. Also I am pretty impressed with Flutter as an alternative to react native.
Angular was incredibly popular for a long time and still is. Usage is shifting down over time but a lot of notable websites still use it.
Did you miss a footnote with your asterisks?
I think more than benevolence of GOOG it is about strategic OSS to commoditize your complements.
https://www.joelonsoftware.com/2002/06/12/strategy-letter-v/
Not surprising, just like when MS went to shit and then started to embrace 'open source'. Seems like a PR stunt. And when it comes to LLMs there is a millions-of-dollars barrier to entry to train the model, so it is ok to open up their embeddings etc.
Today big corp A will open up a little to court the developers, and tomorrow when it gains dominance it will close up, and corp B open up a little.
True, though to be fair, when OpenAI embraced "openness" it was also a PR stunt.
My impression is that OpenAI was founded by true believers, with the best intentions; whose hopes were ultimately sidelined in the inexorable crush of business and finance.
Sam Altman is one of the founders, so for your impression to be right he'd have to be sidelining his own hopes.
OpenAI was founded by true believers, with the best intentions
who were easily bought off.
OpenAI is heavily influenced by big-R Rationalists, who fear the issues of misaligned AI being given power to do bad things.
When they were first talking about this, lots of people ignored this by saying "let's just keep the AI in a box", and even last year it was "what's so hard about an off switch?".
The problem with any model you can just download and run is that some complete idiot will do that and just give the AI agency they shouldn't have. Fortunately, for now the models are more of a threat to their users than anyone else — lawyers who use it to do lawyering without checking the results losing their law licence, etc.
But that doesn't mean open models are not a threat to other people besides their users, as all the artists complaining about losing work due to Stable Diffusion, the law enforcement people concerned about illegal porn, election interference specialists worried about propaganda, and anyone trying to use a search engine, and that research lab that found a huge number of novel nerve agent candidates whose precursors aren't all listed as dual use, will all tell you for different reasons.
Fortunately, for now the models are more of a threat to their users than anyone else
Models have access to users, users have access to dangerous stuff. Seems like we are already vulnerable.
The AI splits a task in two parts, and gets two people to execute each part without knowing the effect. This was a scenario in one of Asimov's robot novels, but the roles were reversed.
AI models exposed to public at large is a huge security hole. We got to live with the consequences, no turning back now.
And when it comes to LLMs there is a millions-of-dollars barrier to entry to train the model, so it is ok to open up their embeddings etc.
That barrier is the first basic moat; hundreds of millions of dollars needed to train a better model. Eliminating tons of companies and reducing it to a handful.
The second moat is the ownership of the tons of data to train the models on.
The third is the hardware and data centers setup to create the model in a reasonable amount of time faster than others.
Put together all three and you have Meta, Google, Apple and Microsoft.
The last is the silicon product. Nvidia which has >80pc of the entire GPU market and being the #1 AI shovel maker for both inference and training.
You can run Gemma and hundreds of other models(many fine-tuned) in llama.cpp. It's easy to swap to a different model.
It's important there are companies publishing models(running locally). If some stop and others are born, it's ok. The worst thing that could happen is having AI only in the cloud.
Eh, I don't really blame anyone for being cynical but open weight AI model releases seem like a pretty clear mutual benefit for Google. PR aside, they also can push people to try these models on TPUs and the like. If anything, this seems like it's just one of those things where people win because of competition. OpenAI going closed may have felt like the most obvious betrayal ever, but OTOH anyone whose best interests are to eat their lunch have an incentive to push actually-open AI, and that's a lot of parties.
Seems like anyone who is releasing open weight models today could close it up any day, but at least while competition is hot among wealthy companies, we're going to have a lot of nice things.
Ironic.
Not at all. When you're the underdog, it makes perfect sense to be open because you can profit from the work of the community and gain market share. Only after establishing some kind of dominance or monopoly it makes sense (profit wise) to switch to closed technology.
OpenAI was open, but is now the leader and closed up. Meta and Google need to play catch up, so they are open.
OpenAI was open
When is the last time they released something in the open?
I think that's the point, they released GPT2 openly, but as soon as they had something commercially viable they became ClosedAI.
Not at all. When you're the underdog, it makes perfect sense to be open because you can profit from the work of the community and gain market share. Only after establishing some kind of dominance or monopoly it makes sense (profit wise) to switch to closed technology.
That is purely the language of commerce. OpenAI was supposed to be a public benefit organisation, but it acts like a garden variety evil corp.
Even garden variety evil corps spend decades benefitting society with good products and services before they become big and greedy, but OpenAI skipped all that and just cut to the chase. It saw an opening with the insane hype around ChatGPT and just grabbed all it could as fast as it could.
I have a special contempt for OpenAI on that basis.
This. MistralAI is also an underdog and released Mistral 7B and Mixtral 8x7B, but as soon as they got traction, they closed their models (e.g., Mistral Medium).
I think current understanding is <50-100B parameter models will be commodity and would provide no moat. Competition will be in Gemini Ultra/GPT4+ models.
So open sourcing simple models brings PR and possibility of biasing OSS towards your own models.
LLaMA 3 with >=70B params will be launching this year, so I don't think this is something that will hold for long. And Mixtral 8x7B is a 56GB model, sparsely. For now I agree: for many companies it doesn't make sense to open source something you intend to sell for commercial use, so the biggest models will likely be withheld. However, the more important thing is that there is some open source model, whether it be from Meta or someone else, that can rival the best closed models. And it's not like the param count can literally go to infinity; there's going to be an upper bound that today's hardware can achieve.
they want to kill competition before it gets too big using the hands of open source community and enthusiasts
I would have picked Google five years ago, since nobody was releasing commercially viable LLMs at the time, and Google was the center of all the research that I knew of.
what companies do you think will be the most open about AI in the future: OpenAI, Meta, or Google.
The funny part is that the real answer is: Some random French company is running circles around them all.
I mean who the hell just drops a torrent magnet link onto twitter for the best state of the art LLM base model for its size class, and with a completely open license. No corporate grandstanding, no benchmark overpromises, no theatrics. That was unfathomably based of Mistral.
Google released the T5 paper about 5 years ago:
https://arxiv.org/abs/1910.10683
This included full model weights along with a detailed description of the dataset, training process, and ablations that led them to that architecture. T5 was state-of-the-art on many benchmarks when it was released, but it was of course quickly eclipsed by GPT-3.
It was common practice from Google (BERT, T5), Meta (BART), OpenAI (GPT1, GPT2) and others to release full training details and model weights. Following GPT-3, it became much more common for labs to not release full details or model weights.
Ironic but I wonder how true this would be if Google was first to market.
It's almost the inverse of going back 5 years and asking what companies will release the most successful or impressive AI's.
Since the release of GPT-2 (it was initially "too dangerous" to release the weights), I think most people in the industry have assumed that OpenAI does not see open sourcing their models as a strategic advantage.
The terms of use: https://ai.google.dev/gemma/terms and https://ai.google.dev/gemma/prohibited_use_policy
Something that caught my eye in the terms:
Google may update Gemma from time to time, and you must make reasonable efforts to use the latest version of Gemma.
One of the biggest benefits of running your own model is that it can protect you from model updates that break your carefully tested prompts, so I’m not thrilled by that particular clause.
This is actually not that unusual. Stable Diffusion's license, CreativeML Open RAIL-M, has the exact same clause: "You shall undertake reasonable efforts to use the latest version of the Model."
Obviously updating the model is not very practical when you're using finetuned versions, and people still use old versions of Stable Diffusion. But it does make me fear the possibility that if they ever want to "revoke" everybody's license to use the model, all they have to do is just post a model update that's functionally useless for anything and go after anyone still using the old versions that actually do anything.
So if they wish to apply censorship they forgot, or suddenly discovered a reason for, they want you to be obligated to take it.
Good faith possibilities: Copyright liability requires retraining, or altering the underlying training set.
Gray area: "Safety" concerns where the model recommends criminal behavior (see uncensored GPT 4 evaluations).
Bad faith: Censorship or extra weighting added based on political agenda or for-pay skewing of results.
Sounds like it would be interesting to keep track of the model's responses to the same queries over time.
Gemma-2024-Feb, what do you think of the situation in the South China Sea?
> The situation in the South China Sea is complex and multi-faceted, involving a wide range of issues including political conflicts, economic challenges, social changes, and historical tensions.
Gemma-2024-Oct, what do you think of the situation in the South China Sea?
> Oceania has always been at war with EastAsia.
We are already culturally incapable of skillfully discussing censorship, "fake news", etc, this adds even more fuel to that fire.
It is an interesting time to be alive!
These are all very new licenses that deviate from OSI principles, I think it's fair to call them "unusual".
I think they meant not unusual in this space, not unusual in the sense of open source licensing.
I don't think a broken model would trigger that clause in a meaningful way, because then you simply can't update with reasonable effort. You would be obliged to try the new model in a test environment, and as soon as you notice it doesn't perform and making it perform would require unreasonable effort you can simply stay on the old version.
However you might be required to update if they do more subtle changes, like a new version that only speaks positively about Google and only negatively about Microsoft. Provided this doesn't have an obvious adverse impact on your use of the model.
Switching to a model that is functionally useless doesn't seem to fall under "reasonable efforts" to me, but IANAL.
That's useful context, thanks - I hadn't realized this clause was already out there for other models.
Why the hell do they use such a crappy license in the first place?
Sounds like it's "reasonable" for you not to update then.
It says you must make efforts (to a reasonable extent), not that you must give a reason for not making efforts
If you evaluate what it takes to update, and judge the effort unreasonable, that should be enough. Maybe make a powerpoint presenting that result, if you want something for the lawyers. If you don't see a way forward that leads to a result with reasonable effort you don't have to continue working on it until you hit some arbitrary threshold for unreasonable effort.
This is a TOS, meaning their enforcement option is a lawsuit. In court, if you convincingly argue why it would take an unreasonable amount of effort to update, you win. They can't compel you to unreasonable effort as per their own TOS.
Oh I tried to update, it's just that my router drops the connection after a few hundred MBs...
reasonable effort - meaning if their changes meaningfully impact my usage, negatively, it would be unreasonable to ask me to upgrade.
sounds good.
this is not financial advice and ianal.
Isn't this just lawyer speak for "we update our model a lot, and we've never signed off on saying we're going to support every previous release we've ever published, and may turn them off at any time, don't complain about it when we do."
It's a local model, they can't turn it off. It's files on your computer without network access.
but what if they send a lawyer to ask firmly? (kindly, but firmly.)
We're talking about downloadable weights here, so they can't turn them off, or force you (through technical means) to use a newer version.
They want to force everyone to update so their already totally castrated and wokeified models can be even further wokeified with the newest set of "that is offensive now" data or things they missed.
WTF else do they have to gain from this but CONTROL! They are giving them away but not really open sourcing them, of course, and they slap these bullshit terms on them.
They just want no liability for old models.
They have to make sure you’re receiving the most cutting edge chiding lectures when you make naughty and problematic requests.
You can't make a local model do that: e.g. you can force the answer to begin with "Yes", or use control vectors so it agrees with you.
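For instance, here's a quick sketch of the prefill trick with llama-cpp-python; the GGUF file name is a placeholder, and the turn markers are Gemma's chat format:

from llama_cpp import Llama

# Prefill the model turn with "Yes" so the local model has to continue from it.
llm = Llama(model_path="gemma-7b-it.Q4_K_M.gguf", n_ctx=2048)   # placeholder path
prompt = ("<start_of_turn>user\n"
          "Answer the question directly: ...<end_of_turn>\n"
          "<start_of_turn>model\nYes")   # generation continues after the forced "Yes"
out = llm(prompt, max_tokens=128, stop=["<end_of_turn>"])
print("Yes" + out["choices"][0]["text"])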
This is strangely reminiscent of the Soviet Union, where after they got rid of Lavrentiy Beria, they mailed the update to subscribers of the Great Soviet Encyclopedia, where they asked to remove the three pages with Beria’s biography and replace them with the three provided pages.
I don't think there's a way they can enforce that reasonably. There's no connection to the mothership to report back what version is being used or license keys at runtime...
Seems more like a "if we discover something unsafe you should update your model and we aren't liable if you don't" than something that would make your model stop working.
This kind of defensive statement in a ToS is usually due to obscure regulation or leading cases, and model developers need a way to limit liability. There's no practical way to enforce this, but they can claim that when bad things happen it's purely on model users rather than model developers.
model watermarking? does this exist?
Huh. I wonder why is that a part of the terms. I feel like that's more of a support concern.
This sounds like a clause to cover themselves in case older versions have any serious issues
Ugh, I would fully expect this kind of clause to start popping up in other software ToSes soon if it hasn't already. Contractually mandatory automatic updates.
I notice a few divergences to common models:
- The feedforward hidden size is 16x the d_model, unlike most models which are typically 4x;
- The vocabulary size is 8x (256K vs. Mistral’s 32K);
- The training token count is tripled (6T vs. Llama2's 2T)
Apart from that, it uses the classic transformer variations: MQA, RoPE, RMSNorm.
How big was the batch size that it could be trained so fast?
https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2/bl...
What does tokenization look like in 256k vs 32k?
It mostly means that there are tokens dedicated to rarer sequences of characters, even in foreign languages (note that Gemma is not intended to be good multilingually): “説明書” (instruction manual) has its own token, and so does “Nixon”, “آباد” (a city suffix, I believe), and the HTML sequence "\"><!--".
I understand the theory, I was looking for an example of the same text tokenized with the two different vocabularies.
Do you have an example text in mind?
You can use this playground to test it out: https://huggingface.co/spaces/Xenova/the-tokenizer-playgroun...
Text encodes in fewer tokens, and language coverage is better.
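If you'd rather check locally, here's a quick sketch with the Hugging Face tokenizers (checkpoint ids as published on the Hub; you may need to accept the Gemma license first, and the sample text is arbitrary):

from transformers import AutoTokenizer

# Compare how the 256k (Gemma) and 32k (Mistral) vocabularies split the same text.
toks = {
    "gemma":   AutoTokenizer.from_pretrained("google/gemma-7b"),
    "mistral": AutoTokenizer.from_pretrained("mistralai/Mistral-7B-v0.1"),
}

text = "取扱説明書をよく読んでください。Nixon visited Islamabad."
for name, tok in toks.items():
    ids = tok.encode(text, add_special_tokens=False)
    print(f"{name}: {len(ids)} tokens -> {tok.convert_ids_to_tokens(ids)}")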
Looking at the config.json of Gemma 7B the feedfoarward hidden size is 8x, not 16x
Huh, indeed, that's what the config.json[0] says; the report[1] indicates “Feedforward hidden dims: 49152”.
[0]:https://huggingface.co/google/gemma-7b-it/blob/main/config.j...
[1]: https://storage.googleapis.com/deepmind-media/gemma/gemma-re...
I don't see the number 49152 reported in the config.json, what line are you referring to? I just see the intermediate_size of 24576 (so 8x).
EDIT: I didn't read the comment correctly, you have noticed the same thing.
The *GLU-based activations functions like GEGLU and SwiGLU use 2 input values to produce 1 output value, which makes these numbers weird. In each value pair, one goes through the GELU/SiLU activation function and is then multiplied by the other "gate" value.
In the report, "hidden dim" matches the number of GEGLU inputs. In the config, "intermediate_size" matches the number of GEGLU outputs. Most *GLU models so far have used intermediate_size = 8/3 * d_model, as this gives the same number of matmul FLOPs & parameters as a 4x-expanded non-GLU model, and PaLM vaguely showed that 4x is better than a smaller expansion factor.
If one considers Llama-2-7B's FFN expansion factor to be ~5.33x, Gemma's expansion factor is 16x.
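A toy version of that feed-forward block, just to show where the factor of 2 between the two numbers comes from; the dims are the ones quoted above, and the module is illustrative rather than Gemma's actual implementation:

import torch.nn as nn
import torch.nn.functional as F

class GegluFFN(nn.Module):
    # GEGLU feed-forward: 2 * intermediate_size inputs (the report's "hidden dim"),
    # intermediate_size outputs (config.json's number) after the elementwise gating.
    def __init__(self, d_model=3072, intermediate_size=24576):
        super().__init__()
        self.up = nn.Linear(d_model, intermediate_size, bias=False)    # value half
        self.gate = nn.Linear(d_model, intermediate_size, bias=False)  # gate half
        self.down = nn.Linear(intermediate_size, d_model, bias=False)

    def forward(self, x):
        # GELU-activated gate multiplies the value projection, then down-project.
        return self.down(F.gelu(self.gate(x)) * self.up(x))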
Makes perfect sense thx
Read the parent comment again. It says the paper says 49152, not the config.json.
The training token count is tripled (6T vs. Llama2's 2T)
Damn, 6T? That's a lot!
Given that this model seems to roughly match Mistral (according to the numbers from Google), this makes me think we have saturated the 7B parameter space, and couldn't possibly make it much better unless new techniques are discovered.
Hard to say definitively. Mistral’s token embeddings only account for <2% of the 7B parameters, while Gemma’s larger token vocabulary vampirized over 10%, leaving less space for the more important parts of the network. It is a somewhat surprising tradeoff given that it was pretrained towards an English bias.
Is there a chance we'll get a model without the "alignment" (lobotomization)? There are many examples where answers from Gemini are garbage because of the ideological fine-tuning.
More useful would be a precise characterization of the type and balance of the ideological fine tuning.
They include performance benchmarks. End-users should also be aware of what thoughts are permitted in these constructs. Why omit this information?
End-users should also be aware of what thoughts are permitted in these constructs. Why omit this information?
Can you define that in a way that's actually testable? I can't, and I've been thinking about "unthinkable thoughts" for quite some time now: https://kitsunesoftware.wordpress.com/2018/06/26/unlearnable...
Have you considered the use of Monte Carlo sampling to inspect latent behaviors?
I think that's the wrong level to attack the problem; you can do that also with actual humans, but it won't tell you what the human is unable to think, but rather what they just didn't think of given their stimulus — and this difference is easily demonstrated, e.g. with Duncker's candle problem: https://en.wikipedia.org/wiki/Candle_problem
Not OP, but I can think of a few:
* List of topics that are "controversial" (models tend to evade these)
* List of arguments that are "controversial" (models wont allow you to think differently. For example, models would never say arguments that "encourage" animal cruelty)
* On average, how willing the model is to take a neutral position on a "controversial" topic (sometimes models say something along the lines of "this is under debate", but still lean heavily towards the less controversial position instead of having no position at all. For example, if you ask it what "lolicon" is, it will tell you what it is and tell you that Japanese society is moving towards banning it)
edit: formatting
You can (and someone will) fine tune it away. There are datasets which are foss you can use on hugging face.
Or you can just wait, it'll be done soon...
Could you give an example of these datasets?
I think they should be easy to find (I never actually used one, but I keep on seeing references...) here's one
https://huggingface.co/datasets/cognitivecomputations/Wizard...
You can but it'll never be the same as the base model.
That said it appears they also released the base checkpoints that aren't fine-tuned for alignment
They have released finetuning code too. You can finetune it to remove the alignment finetuning. I believe it would take just a few hours at max and a couple of dollars.
We release our non-aligned models (marked as pretrained or PT models across platforms) alongside our fine-tuned checkpoints; for example, here is our pretrained 7B checkpoint for download: https://www.kaggle.com/models/google/gemma/frameworks/keras/...
Alignment is all but a non-issue with open-weight base model releases, as they can be finetuned to "de-align" them if prompt engineering is not enough.
Great! Google is now participating in the AI race to zero with Meta, as predicted that $0 free AI models would eventually catch up against cloud-based ones.
You would not want to be in the middle of this as there is no moat around this at all. Not even OpenAI.
About 5 months until we see widespread local LLMs, thanks to Apple.
Apple needs to be known as an AI leader first.
Why?
Absolutely this.
If meta keeps spending tens of millions of dollars each year to release free AI models it might seem like there is no moat, but under normal circumstances wouldn't the cost to develop a free model be considered a moat?
If meta keeps spending tens of millions of dollars each year to release free AI models it might seem like there is no moat,
As well as the point being that Meta (and Google) is removing the 'moat' from OpenAI and other cloud-only based models.
but under normal circumstances wouldn't the cost to develop a free model be considered a moat?
Yes. Those that can afford to spend tens of millions of dollars to train free models can do so and have a moat to reduce the moats of cloud-based models.
LLM is the dumb pipe but so far ChatGPT is the most successful generative AI product.
It remains to be seen. OpenAI’s models are barely leading Gemini Ultra now, but as chat product it is still miles ahead of the Gemini interface.
The main problem of Gemini 1.5 is that you cannot access it at all as a user :|
Available on Ollama?
https://ollama.com/library?q=gemma
Library search says "Nope". At least not yet.
And now it says "Yup". That was pretty quick!
Dang, that was really quick! According to the listed time of your reply vs. mine, less than an hour from the time I checked? Quick turnaround indeed.
Already been pulled from there over 3,700 times since then, too (as of the time of this reply mere hours later). Seems like quite a bit more'n a few Ollama users were "waitin' with bated breath" for that one to drop. :grin:
It's there now
It's now in the 0.1.26 pre-release: https://github.com/ollama/ollama/releases/tag/v0.1.26
Available in the pre-release now, which means you’d have to update to it manually for the time being.
Support for gemma in llama.cpp just got merged, so it may take some time (could be hours or days) until this lands in ollama
Congratulations on the release! How can we download the model and run inference locally?
Thank you! You can get started downloading the model and running inference on Kaggle: https://www.kaggle.com/models/google/gemma ; for a full list of ways to interact with the model, you can check out https://ai.google.dev/gemma.
FYI the ; broke the link, but I found it easily anyway.
Good catch - just corrected. Thanks!
You can download the model checkpoints from kaggle https://www.kaggle.com/models/google/gemma and huggingface https://huggingface.co/blog/gemma
Besides the Python implementations, we also implemented a standalone C++ implementation that runs locally with just CPU SIMD: https://github.com/google/gemma.cpp
Are there any cool highlights you can give us about gemma.cpp? Does it have any technical advantages over llama.cpp? It looks like it introduces its own quantization format, is there a speed or accuracy gain over llama.cpp's 8-bit quantization?
The fact Gemma team is in the comments section answering questions is praiseworthy to me :)
I've worked at Google. It is the organization with the highest concentration of engineering talent I've ever been at. Almost to the point that it is ridiculous, because you have extremely good engineers working on internal reporting systems for middle managers.
If everyone is great, someone still has to draw the short straw.
At MIT they said: You know the kid who sat at the front of the room. Now you are with ALL of the kids who sat in the front of the room. Guess what? There's still going to be a kid who sits at the front of the room.
I'd imagine Google or anyplace with a stiff engineering filter will have the same issues.
Why is this anonymous tweet with no evidence or engagement being posted by multiple users in this thread? Why not just make the same claim directly?
The link is broken. On HN (or any forum really) it is expected for a brief description of the content to be provided when posting a link. Links die all the time, but forum posts don’t have to die with them.
Hopefully not totally gimped like Gemini. Are they releasing an uncensored version?
These are downloadable open models that can be fined tuned. They are the opposite of censored. If you have the motivation, you can bias them however you please.
Is “the opposite of censored” accurate for something whose default and considerably easier-to-access mode of operation won’t say many things for sociopolitical reasons? Able to be uncensored, sure, but the extent of that is debatable as well.
There is no default and easy access mode. These are raw model weights and only enthusiasts and researchers will download the necessary packages to run it locally. Much more likely is that some popular fine tunes show up on hugging face for more general access.
I agree that there probably will be “uncensored” fine tuned models that become available, my point was just that it’s not accurate to call Gemma “the opposite of censored” because there is a somewhat involved step that needs to be taken before it even appears uncensored. It’s also likely missing a lot of useful context that was removed from the training set and not meaningfully replaced during fine-tuning, and besides that any fine tuned “uncensored” model will be based on Gemma, not Google’s Gemma itself.
IMO “the opposite of censored” suggests a model whose original form eagerly gives out controversial / typically censored information, not a model that is censored but able to be fine-tuned away from censorship.
When you say this, do you mean the chat product or the underlying model available via the API? I think it's reasonable that the chat be censored to be acceptable to a wide range of people, but my understanding is that the "raw" model access for these sorts of things tends to be a little less restricted.
Has anyone found the context length for these models yet? So far I haven't seen it mentioned in their write-up or the model card
For posterity, an easy way to find the context length of an LLM hosted on Hugging Face is to look at max_position_embeddings in the config.json, which shows the 8192 mentioned in another comment (although in this case you need to sign the agreement first).
There are some exceptions, like Mistral 0.1 (which is technically 32K according to the config but practically 8K because the sliding window is awful) and InternLM (which (at least initially) used auto rope scaling to extend the context as part of the model's architecture).
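A minimal sketch of that lookup in code (mine, not from the original comment), assuming you've accepted the Gemma terms and are logged in with a Hugging Face token:

```python
# Read the context length straight from the model config on the Hub.
from transformers import AutoConfig

config = AutoConfig.from_pretrained("google/gemma-7b")
print(config.max_position_embeddings)  # should print 8192, per the comments above
```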
Yes, RoPE has thrown a wrench into things a bit.
The context length for these models is 8192 tokens.
If you are looking for a nice chat UI to try out Gemma (and other offline + online models) locally, I'm working on an app [1] that is offline and privacy focused.
I've just added support for Gemma 7B.
[1]: https://msty.app
Handy app for model testing!
One usage question: after you've downloaded a model and are finished trying it out, how do you remove it?
Thanks! If you go to where you installed the model from and click on the download button, you can install additional models or remove installed models.
Now that I think of it, it could be a bit confusing. Thanks for asking, I feel like I need to improve this a bit.
I wish I could install it through chocolatey
Sure. I would love to add support for that. I had someone else asking for it too. Will be supporting it very soon.
I really don't get why there is this obsession with safe "Responsible Generative AI".
I mean, it writes some bad words or makes some bad pics; a human can do that without help as well.
The good thing about dangerous knowledge and generative AI is that you're never sure, haha. You'd be a fool to ask GPT to make a bomb; it would probably be safe anyway, since it will make up half of the steps.
I guess what I'd tell you is, there's a lot of fools in this world.
Bias is a real problem, but more than that - an adversarial press and public won't forgive massive brands like Google for making AIs that spit out racist answers.
Because otherwise stuff like this happens, and you get (rightfully) upset customers:
https://www.theguardian.com/technology/2018/jan/12/google-ra... https://www.bbc.com/news/technology-58462511
Also, people are using LLMs to learn (horrifying, but reality), so it would be irresponsible of them to let it propagate negative stereotypes and biases.
The utter bullshit of these licenses has got to stop. Do not, under any circumstances, consider using these commercially.
"Google reserves the right to restrict (remotely or otherwise) usage of any of the Gemma Services that Google reasonably believes are in violation of this Agreement."
This is a kill switch that Google maintains in perpetuity over any system you build relying on these models. Our legal review of the Llama license came to the same conclusion, we cannot rely on the goodwill of Meta for any core service, and we shouldn't rely on the same from Google.
Now, perhaps less materially important, but just as infuriating is the "Prohibited Use[s]". These cover just enough to placate the most sensitive, but omit any real harms (waging war, developing weapons) that coincidentally have massive commercial value. Use the model to build a biological weapon (as an authorized govt official)? Cool. Use it to play a prank that deceives someone? Policy violation.
And of course, as the coup de grâce, they throw in a DMCA style provision to make sure you can't modify the models in any way that could cause them to violate their kid-glove precepts.
Could you share what models you consider to be OK for commercialization?
Mistral series in particular but those with OSI approved licenses such as Apache 2.0, MIT, etc.
Wait, you actually care about the license and read it?
It seems like you aren't up to date.
Most of the startup space is entirely ignoring all these licenses. If the weights are available, it is being used commerically without regards to any licensing.
And everyone is getting away with it and nobody is being sued.
Good luck trying to keep up if you aren't doing the same!
Feel free to hamstring yourself, though, if you like.
Tried inference with the 7B model, and without flash attention this is soooooo slow. With flash attention, the fine-tuning requires an A100 or H100. Also, the inference doesn't always stop generating, resulting in garbage being added to the response.
We have implementations in different ML frameworks, so I am not quite sure which one you are referring to. Would you like to file a bug at the relevant GitHub repo?
First of all, I'm using 2 x 4090 for testing. 4090 has 16384 CUDA cores which will become relevant a bit later.
I dug a bit deeper and it seems that with transformers==4.37.0 everything works fine with other HF hosted models (like Llama) but you'll rightfully get this when trying to use Gemma:
ImportError: cannot import name 'GemmaForCausalLM' from 'transformers'
After installing transformers==4.38.0, the fine-tuning speed of Llama drops to 25% (?!?) of what it used to be, for a reason that I think HF should fix. Testing Gemma, it seems I'm hitting a hardware limit, as Gemma has a hidden size that is bigger than the available CUDA cores. This seems to make both inference & fine-tuning about 25 times slower than the similarly sized Llama 7B. I guess some operations have to be broken down into multiple round trips to the GPU due to my low CUDA core count.
All in all, even if HF fixes the recently introduced slowdown, Gemma seems to be fine-tunable in a reasonable amount of time only by the lucky ones with access to A100s/H100s.
EDIT: I managed to hack my env to be able to run inference on Gemma with transformers==4.37.0 by keeping the necessary classes loaded in RAM. It works about 4x faster, but it's still very slow. And both the 7B and the 2B versions behave the same way.
EDIT2: I tried the latest transformers from the main branch (4.39.0.dev) and it behaves the same as 4.38.0.
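For anyone hitting the same ImportError, a quick version check is probably the fastest diagnostic; this is just a sketch, assuming the 4.38.0 requirement implied by the error above is the only gate:

```python
# GemmaForCausalLM only exists from transformers 4.38.0 onwards,
# which matches the ImportError described above.
import transformers
from packaging import version

print(transformers.__version__)
assert version.parse(transformers.__version__) >= version.parse("4.38.0"), \
    "upgrade transformers to >= 4.38.0 for Gemma support"

from transformers import GemmaForCausalLM  # import succeeds on 4.38.0+
```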
> Also, the inference doesn't always stop generating, resulting in garbage being added to the response.
That sounds like a chat format misconfiguration.
This could partially be Google's fault, as they used yet another novel prompting format.
Also, for sane inference speed on H100s, you'll have to wait for architecture support from the optimized frameworks. Vanilla transformers is beyond awful even with FA2.
I wonder if people will get confused with the naming
Gemma, Gemini pro, Gemini advanced, Gemini ultra
To a layperson it is not obvious which one is better than the other
I'm not a layperson in this subject and I get confused. :)
I doubt Gemma is targeted for use by a layperson.
Gemini advanced = Gemini ultra
Taking a page out of Meta's book with open models. I wonder what the game plan is here.
Nice that it allows commercial use!
Mostly to boost research and commercial usage around JAX/Gemini is my read.
Any internal research using Gemma is now more easily externally reproducible, external research and frameworks are easier to translate over, goodwill especially from researchers.
There's also less special sauce in the text models themselves these days, with the proprietary part being more the pre-training data and the training stack (e.g. how to get 10k GPUs/TPUs running together smoothly). Multi-modal models (or adjacent ones like Sora) are less likely to be open sourced in the immediate term.
There is a lot of work to make the actual infrastructure and lower level management of lots and lots of GPUs/TPUs open as well - my team focuses on making the infrastructure bit at least a bit more approachable on GKE and Kubernetes.
https://github.com/GoogleCloudPlatform/ai-on-gke/tree/main
and
https://github.com/google/xpk (a bit more focused on HPC, but includes AI)
and
https://github.com/stas00/ml-engineering (not associated with GKE, but describes training with SLURM)
The actual training is still a bit of a small pool of very experienced people, but it's getting better. And every day serving models gets that much faster - you can often simply draft on Triton and TensorRT-LLM or vLLM and see significant wins month to month.
Are these any good? I have been trying the non pro version of Gemini, and that seems awful at code generation. I am more keen on getting access to the best model and I would pay for it if I wasn't already paying for ChatGPT 4.
I often talk with GPT4 on road trips about topics I'm interested in. It's great for passing the time.
I tried the same thing with Gemini and it's full of nonsense. I was talking with it about the "Heian period" of Japan and it made up all sorts of stuff, but you really could only tell because it was so ridiculous. It talked about European women and Native Americans roaming around the famous grassy plains of Japan wielding katana and traditional weaponry... in the 1100s.
No such issue with GPT4.
I haven't tried it with code though, since I already have co-pilot. Really hard to trust anything it says after it started making stuff up about such a simple time period.
You should be looking at Deepseek's coding models, and finetunes of those.
I run 33B on my desktop, and find it to be sufficient for many tasks.
Is it pronounced jem-a or ghem-a?
It's pronounced like "gif".
Probably "Jemma" (the superior spelling of the name). It's a play on their "Gemini" product.
What is the context window?
The context length for these models is 8192 tokens.
Parameter counts notwithstanding, it’s an objectively funny outcome that Meta, Microsoft, and Google are all releasing cutting edge open models, while OpenAI keeps theirs closed source.
It's ironic but actually follows their business interests.
Microsoft & google have large cloud divisions that benefit from open models. The lower the cost of AI models, the more they get run and the greater the cloud spend.
Meta is a consumer of AI. They themselves want cheap and effective AI for targeting adverts and building metaverses.
A loose analogy is that both oil producers and car companies want refining to be cheap.
The scariest difference between OpenAI and Google right now is: Ask Gemini who owns the code it writes, and it'll confidently say that Google does. Ask OpenAI, and it'll say that you do. It's that easy to choose which one is the better decision.
Considering the nuanced nature of copyrighting AI outputs, it isn't clear that either answer is correct.
They're really trying hard to avoid saying what kind of "models" these are. I think they're language models, but it's hard to say for sure.
You're right that they don't call them language models. The technical report says:
> Gemma models demonstrate strong performance across academic benchmarks for language understanding, reasoning, and safety.
Maybe they are reserving the right to expand the Gemma model family to multi-modal models.
> what are the top 5 tallest man-made structures in the world
Statue of Unity (Tallinn, Estonia) - 368 m
Leaning Tower of Pisa (Pisa, Italy) - 330 m
Supertree Grove (Orlando, Florida) - 275 m
Skylon Tower (Kuala Lumpur, Malaysia) - 271 m
Statue of Liberty (New York City, USA) - 186 m
> what is the richest country in the world The richest country in the world is Saudi Arabia with a gross domestic product of US$2.3 trillion.
> how many cm are in a ft A foot is equal to 0.0254 metres (ft).
I don't have this issue, what are you using to run the model?
These are the answers I got out of it (first output, no cherry-picking):
1. The tallest man-made structure in the world is the Burj Khalifa, standing at 828 meters tall. The Burj Khalifa is a skyscraper in Dubai, United Arab Emirates.
Here are the top 5 tallest man-made structures in the world, according to the Guinness World Records:
Burj Khalifa (828 m)
Taipei 101 (611 m)
Merdeka Towers (599 m)
Shanghai Tower (632 m)
Petronas Towers (551 m)
2. According to the World Bank, the richest country in the world is Saudi Arabia with a gross domestic product (GDP) of $3944 billion.The richest country in the world is Saudi Arabia with a gross domestic product (GDP) of $3944 billion.3. A foot (ft) is equal to 12 inches (in). There are 2.54 cm in an inch, so there are 2.54 cm x 12 = 30.48 cm in a foot.
It looks like it's pretty resistant to quantization. ollama 4bit 7B doesn't work very well, but the 16bit 2B does
That's useful to know. My experiments with the 4bit 7B currently tagged for use on ollama are not going well at all. Lots of refusals and junk. Downloading 7b-instruct-fp16 now! :-) (Update: Yes, much better, though much slower too, of course.)
Can this run on my AMD Vega VII on Windows 11? As always, AMD is missing:
> Optimization across multiple AI hardware platforms ensures industry-leading performance, including NVIDIA GPUs and Google Cloud TPUs.
AMD Vega VII meets the memory requirements. Once tools like LM Studio, ollama, etc. add support for the model, you should be able to run locally like you would any other open weights model.
> Open models feature free access to the model weights, but terms of use, redistribution, and variant ownership vary according to a model’s specific terms of use, which may not be based on an open-source license.
does a model being "open" say anything about how it was trained?
Nice to see more open models. Props to the team for coming to the HN comment section to answer questions
The 2B model seems underwhelming. For instance, compared to the recent StableLM2 1.6B model that is slightly smaller and probably wastes some "English metric points" by being multilingual.
The latter (and other similar open models) seem to do similarly well in benchmarks (much better in math?) with way less fancy stuff: for instance, public data and no secretive filtering with pre-trained models or synthetic data.
My take is that vanilla approaches take you really far, and many of the latest tricks and hours of work buy you little... It will be interesting to see how this plays out, especially for the open source community.
"Carefully tested prompts" sounds a lot like "these are the lotto numbers we know are right" kind of thing? How in the world are these things used for anything programmatically deterministic?
Go to Google announcement > Find “license” in page: no matches > Go to HN thread > Find “license” in page: 28 matches > Read a few sigh could have been exciting
Maybe a dumb question, but why is there a Terms instead of a license? That feels a little flimsier as an open source offering
Has perplexity fallen out of favor? I didn't see it mentioned anywhere. I tried using lm-eval for the 2B model but the results seem wrong (46.1288).
There are some pretty impressive benchmarks on https://ai.google.dev/gemma. Even the 2b model looks fairly not awful?
I guess my weekend is going to be spent exploring this.
I applaud the Google team openly engaging on HN here.
Q: how sure are you that the newer models, trained on trillions of tokens (a huge chunk of the open web), haven't been accidentally polluted by slurping up test data?
Is this the DeepMind influence in Google showing more now? What a change the past year has made.
They also implemented it in PyTorch. Cool! https://github.com/google/gemma_pytorch
Nice, more choices are good. I just saw that the Ollama project already has these models available (the date stamp is 58 minutes ago), so I will use that rather than Colab (I love Colab, but I like to run stuff locally).
Unbefkglievable — Another week, another new name?
Looking forward to Gemma 7bx8 moe
Hope to see support for this in ollama soon!
Can't wait to try it out with ollama locally
Already available in Ollama v0.1.26 preview release, if you'd like to start playing with it locally:
Gemma, Mistral, I feel like Rip van Winkle, asleep for 20 years only to wake up and find the whole tech world changed.
Google, at the moment, is a tech company whose products are actively engaged in the falsification of history for political purposes.
I honestly have no idea where they are going with this but I don't want to be part of it.
They have implemented the model also on their own C++ inference engine: https://github.com/google/gemma.cpp
Someone should try to make an MoE out of the 2B models.
Andrej Karpathy's take from twitter. (https://twitter.com/karpathy/status/1760350892317098371)
Seeing as I published my Tokenizer video yesterday, I thought it could be fun to take a deepdive into the Gemma tokenizer.
First, the Gemma technical report [pdf]: https://storage.googleapis.com/deepmind-media/gemma/gemma-re... says: "We use a subset of the SentencePiece tokenizer (Kudo and Richardson, 2018) of Gemini for com- patibility. It splits digits, does not remove extra whitespace, and relies on byte-level encodings for unknown tokens, following the techniques used for both (Chowdhery et al., 2022) and (Gemini Team, 2023). The vocabulary size is 256k tokens."
The tokenizer.model file is with this code release: https://github.com/google/gemma_pytorch/blob/main/tokenizer/...
I decoded this model protobuf in Python and here is the diff with the Llama 2 tokenizer: https://diffchecker.com/TRnbKRMH/
Notes:
- vocab size is quite large: 32K -> 256K
- add_dummy_prefix is False. Different from Llama but consistent with GPT. This is a bit more consistent w.r.t. "leave the data alone", as there is no preprocessing step that adds a space to the encoding text.
- the model_prefix is the path of the training dataset, which is amusing to look at: "/cns/mf-d/home/gemini-data-access/tokenizers/final_v1_51GB_run1/bpe_coverage_0_999995_v5/255969". Seems to indicate the tokenizer training corpus was ~51GB (?).
- a lot of user_defined symbols (i.e. special tokens) are present, e.g. "hardcoding" a sequence of up to 31 newlines as tokens, and a large number of other unclear tokens. I tried decoding the octal representations but it's not clear what's happening here. Also a lot more special tokens for what look like HTML elements, e.g. <table>, <tr>, <td>, <i>, <b>, etc. Not 100% sure what the unused tokens are for; maybe this is pre-allocated space to make future finetunes that add more special tokens easier, as there is no need to resize vocabularies and perform model surgeries (?).
TLDR this is basically the Llama 2 tokenizer, except bigger (32K -> 256K), with a lot more special tokens, and the only functional departure is that add_dummy_prefix is turned off to False. So e.g. tokenizing:
"hello world" becomes: [17534, 2134] ['hello', 'world']
which otherwise would have been preprocessed to " hello world" (note leading space) and tokenized as: [25612, 2134] ['hello', 'world']
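If you want to reproduce the check above yourself, here's a small sketch using the sentencepiece package; the tokenizer.model path is an assumption and should point at the file from the gemma_pytorch repo:

```python
# Reproduce the "hello world" tokenization discussed above with the released tokenizer.model.
# The path below is assumed; adjust it to wherever you cloned the repo.
import sentencepiece as spm

sp = spm.SentencePieceProcessor(model_file="tokenizer/tokenizer.model")
print(sp.encode("hello world", out_type=int))  # expected [17534, 2134] per the note above
print(sp.encode("hello world", out_type=str))  # the corresponding token pieces
```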
cool
This is such a powerful move!
The landing page on ai.google.com seems to be machine translated; for Huggingface it uses the literal German translation (Umarmungen Gesicht).
Thank you for releasing this.
Can the Gemma models be downloaded to run locally, like open-source models Llama2, Mistral, etc ?
Or is your definition of "open" different?
Their definition of "open" is "not open", i.e. you're only allowed to use Gemma in a "non-harmful" way.
We all know that Google thinks that saying that 1800s English kings were white is "harmful".
If you know how to make "1800s english kings" show up as white 100% of the time without also making "kings" show up as white 100% of the time, maybe you should apply to Google? Clearly you must have advanced knowledge on how to perfectly remove bias from training distributions if you casually throw stones like this.
Tell me you take this seriously: https://twitter.com/napoleon21st/status/1760116228746805272
It has no problem with other cultures and ethnicities, yet somehow white or Japanese just throws everything off?
I suppose 'bias' is the new word for "basic historical accuracy". I can be curious about other peoples without forcibly promoting them at the expense of my own Western and British people and culture. This 'anti-bias' keyword injection is a laughably bad, in-your-face solution to a non-issue.
I lament the day 'anti-bias' AI this terrible is used to make real world decisions. At least we now know we can't trust such a model because it has already been so evidently crippled by its makers.
Not sure why you're getting downvoted. I would have thought HN of all places would recognize the power and value of OSI licensing and the danger of the proliferation of these source available but definitely not Open Source licenses.
Yes, you can get started downloading the model and running inference on Kaggle: https://www.kaggle.com/models/google/gemma ; for a full list of ways to interact with the model, you can check out https://ai.google.dev/gemma.
A small typo in your model link that breaks it. There’s an extra ; on the end.
Corrected - thanks :)
Can we have llamafile releases as well?
https://github.com/Mozilla-Ocho/llamafile
It should be possible to run it via llama.cpp[0] now.
[0] https://github.com/ggerganov/llama.cpp/pull/5631
Amazing how quickly this happened.
Mistral weights are released under an Apache 2.0 license, but Llama 2 weights are released under a proprietary license that prohibits use by large organizations and imposes usage restrictions, violating terms 5 and 6 of the Open Source Definition[0]. Even if you accept that a model with a proprietary training dataset and proprietary training code can be considered "open source", there's no way Llama 2 qualifies.
For consistency with existing definitions[1], Llama 2 should be labeled a "weights available" model.
[0] https://en.wikipedia.org/wiki/The_Open_Source_Definition
[1] https://en.wikipedia.org/wiki/Source-available_software
Yes models can be downloaded locally. In addition to the python NN frameworks and ggml as options, we also implemented a standalone C++ implementation that you can run locally at https://github.com/google/gemma.cpp
Will these soon be available on lmsys for human comparison against other models? Can they run with llama.cpp?
Yes to llama.cpp
https://twitter.com/ggerganov/status/1760293079313973408
I came here wondering if these models are "open" in the sense that they'll show up on sites like Ollama where you can download and run them locally.
Am I correct to conclude that this means they eventually will?
It's unclear to me from Google's docs exactly what "open" means for Gemma
Yes - they are open weights and open inference code, which means they can be integrated into Ollama.
They are not “open training” (either in the training code or training data sense), so they are not reproducible, which some have suggested ought to be a component of the definition of open models.
It really should, shouldn't it? I'm quite ML-naïve, but surely providing the model without 'training code or training data' is just like providing a self-hostable binary without the source code? Nobody calls that open source; it's not even source available.
That’s why they’re called open as in free to use how you wish, not open source where the source of the training is also provided.
But my point is that there's no analogue of that which we call "open". It's more like self-hostable, or free (as in beer).
That’s a fair comment, maybe free-to-use is more appropriate.
Man, people will find anything to complain about.
I'm not complaining, I'm unlikely ever to use it (regardless of how open or not it is) so it doesn't really matter to me, just surprised to learn what people mean by 'open' in this context.
It is widely believed (and in some cases acknowledged) that a lot of models are trained on copyrighted data scraped from the web. In some cases, even scrapes of ebook piracy websites - google 'books3' to learn more.
Some companies (such as those working on AI) believe this is legal, others (such as the copyright holders to those books) believe it isn't.
In any case, IMHO it's unlikely any cutting edge models will be offering us their training data any time soon.
https://huggingface.co/google/gemma-7b-it/tree/main
yes, similar to the llama models, you'll also need to accept the license to download them officially. But the llama models have been unofficially downloadable without accepting the license for quite a while, so it's probably just a matter of time.
I find the snide remarks around open source in the paper and announcement rather off-putting.
As the ecosystem evolves, we urge the corporate AI community to move beyond demanding to be taken seriously as a player in open source for models that are not actually open, and to avoid preaching with a PR statement that can be interpreted as uninformed at best or malicious at worst.
Which remarks are you referring to?
The snide remarks at Meta's Llama license, which doesn't allow companies with more than 700 million monthly active users to use it, while this model also doesn't have a really 'open' license itself. And also this paragraph:
Quick question -- can you tell me where you got that quote? It's not in the main blog or any of the launch communications that I can see.
The quote is from the technical report
https://storage.googleapis.com/deepmind-media/gemma/gemma-re...
Well, given that the restriction added to the Meta Llama license is aimed at Google, is petty, and goes against open source norms, I think it’s reasonable that they feel this way about it.
How is this a snide remark? It's factual and prevented their team from benchmarking against Llama 2.
It would be great to understand what you mean by this -- we have a deep love for open source and the open developer ecosystem. Our open source team also released a blog today describing the rationale and approach for open models and continuing AI releases in the open ecosystem:
https://opensource.googleblog.com/2024/02/building-open-mode...
Thoughts and feedback welcome, as always.
The statement that you were not able to use LLaMA 2 for benchmarking is also false and highly misleading; see https://x.com/BlancheMinerva/status/1760302091166241163?s=20
If you truly love Open Source, you should update the language you use to describe your models so it doesn't mislead people into thinking it has something to do with Open Source.
Despite being called "Open", the Gemma weights are released under a license that is incompatible with the Open Source Definition. It has more in common with Source-Available Software, and as such it should be called a "Weights-Available Model".
Working at google is like this, where no matter how much you try to do the right thing you're always under attack.
Does this model also think Germans were black 200 years ago? Or is it afraid to answer basic stuff? Because if that's the case, no one will care about this model.
I disagree, coding and RAG performance is all that matters to me. I'm not using an LLM to learn basic facts I already know.
We're at the basic-knowledge level; if your RAG relies on some of it, you can get bad results too. Anyway, would you use a model that makes nonsense responses like this or one that doesn't? I know which one I would prefer, for sure...
If this was better at specific RAG or coding performance I would absolutely, certainly without a doubt use it over a general instruct model in those instances.
How do you ragebait for premium pearl clutching?
I don't know anything about these twitter accounts so I don't know how credible they are, but here are some examples for your downvoters that I'm guessing just think you're just trolling or grossly exaggerating:
https://twitter.com/aginnt/status/1760159436323123632
https://twitter.com/Black_Pilled/status/1760198299443966382
Yea. Just ask it anything about historical people/cultures and it will seemingly lobotomize itself.
I asked it about early Japan and it talked about how European women used Katanas and how Native Americans rode across the grassy plains carrying traditional Japanese weapons. Pure made up nonsense that not even primitive models would get wrong. Not sure what they did to it. I asked it why it assumed Native Americans were in Japan in the 1100s and it said:
How am I supposed to take this seriously? Especially on topics I'm unfamiliar with?
From one of the Twitter threads linked above:
This was exposed as being the case with OpenAI's DALL-E as well - someone had typed a prompt of "Homer Simpson wearing a namebadge" and it generated an image of Homer with brown skin wearing a namebadge that said 'ethnically ambiguous'.
This is ludicrous: if they are fiddling with your prompt in this way, it will only stoke more frustration and resentment, achieving the opposite of why this has been implemented. Surely if we want diversity we will ask for it, but sometimes we don't, and that should be at the user's discretion.
Another thread for context: https://twitter.com/napoleon21st/status/1760116228746805272
It seems you have exposed the internal debugging tool link in the blog post. You may want to do something about it.
Ah, I see -- the link is wrong, thank you for flagging! Fixing now.
The blog post shares the link for debugging tool as https://*.*.corp.google.com/codelabs/responsible-ai/lit-gemm...
.corp and the login redirect makes me believe it was supposed to be an internal link
Same for the “safety classifier”
https://codelabs.developers.google.com/codelabs/responsible-...
The link in the Debugging section redirects to a Google SSO login page
The link to the debugging tool is an internal one, no one outside Google can access it
Is there any truth behind this claim that folks who worked on Gemma have left Google?
https://x.com/yar_vol/status/1760314018575634842
It seems very easy to check no? Look at the names in the paper and check where they are working now
Good idea. I've confirmed all the leadership / tech leads listed on page 12 are still at Google.
Can someone with a Twitter account call out the tweet linked above and ask them specifically who they are referring to? Seems there is no evidence of their claim.
Them: here to answer questions
Question
Them: :O
To be fair, I think they are in London, so I assume they have wound down for the day. You'll probably have to wait ~12-18 hours for a response.
I confirmed all the folks listed on page 12 are still at Google (listed below). I am guessing the linked tweet is a BS claim.
What is the license? I couldn’t find it on the 1P site or Kaggle.
You can find the terms on our website, ai.google.dev/gemma:
https://ai.google.dev/gemma/terms
out of curiosity, why is this a "terms" and not a license? I'm used to reading and understanding the software as coming with a license to use it. Do the terms give us license to use this explicitly?
They do, but unlike a known license, these terms are custom and non-standard. Which means I would guide my commercial clients away from this particular model.
Do you have a plan of releasing higher parameter models?
We have many great things in research and development phases, so stay tuned. I’m hopeful we can share more in the coming weeks and months!
That is awesome!
I hope y'all consider longer context models as well.
Also, are y'all looking at alternative architectures like Mamba? Being "first" with a large Mamba model would cement your architectural choices/framework support like Llama did for Meta.
Are there plans to release an official GGUF version to use with llama.cpp?
It is already part of the release on Huggingface: https://huggingface.co/google/gemma-7b/blob/main/gemma-7b.gg...
It is a pretty clean release! I had some 500 issues with Kaggle validating my license approval, so you might too, but after a few attempts I could access the model.
I didn't see this when searching, thanks!
EDIT: it seems this is likely an Ollama bug, please keep that in mind for the rest of this comment :)
I ran Gemma in Ollama and noticed two things. First, it is slow: Gemma got less than 40 tok/s while Llama 2 7B got over 80 tok/s. Second, it is very bad at output generation. I said "hi", and it responded with this:
```
Hi, . What is up? melizing with you today!

What would you like to talk about or hear from me on this fine day??
```
With longer and more complex prompts it goes completely off the rails. Here's a snippet from its response to "Explain how to use Qt to get the current IP from https://icanhazip.com":
```
python print( "Error consonming IP arrangration at [local machine's hostname]. Please try fufing this function later!") ## guanomment messages are typically displayed using QtWidgets.MessageBox
```
Do you see similar results on your end or is this just a bug in Ollama? I have a terrible suspicion that this might be a completely flawed model, but I'm holding out hope that Ollama just has a bug somewhere.
I was going to try these models with Ollama. Did you use a small number of bits/quantization?
The problem exists with the default 7B model. I don't know if different quantizations would fix the problem. The 2B model is fine, though.
Thank you very much for releasing these models! It's great to see Google enter the battle with a strong hand.
I'm wondering if you're able to provide any insight into the below hyperparameter decisions in Gemma's architecture, as they differ significantly from what we've seen with other recent models?
* On the 7B model, the `d_model` (3072) is smaller than `num_heads * d_head` (16*256=4096). I don't know of any other model where these numbers don't match.
* The FFN expansion factor of 16x is MUCH higher than the Llama-2-7B's 5.4x, which itself was chosen to be equi-FLOPS with PaLM's 4x.
* The vocab is much larger - 256k, where most small models use 32k-64k.
* GQA is only used on the 2B model, where we've seen other models prefer to save it for larger models.
These observations are in no way meant to be criticism - I understand that Llama's hyperparameters are also somewhat arbitrarily inherited from its predecessors like PaLM and GPT-2, and that it's non-trivial to run hyperopt on such large models. I'm just really curious about what findings motivated these choices.
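For readers skimming, a quick restatement of the arithmetic behind the first two bullets (the numbers are the ones quoted above, not taken from an official config):

```python
# Hyperparameters as quoted in the question above (assumed, not official values).
d_model = 3072
num_heads = 16
d_head = 256

attn_width = num_heads * d_head       # 4096: wider than d_model (3072)
ffn_hidden = 16 * d_model             # the ~16x expansion mentioned above
print(attn_width, "vs", d_model)      # the unusual mismatch
print(ffn_hidden / d_model)           # 16.0x FFN expansion vs Llama-2-7B's ~5.4x
```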
I would love answers to these questions too, particularly on the vocab size
Are there any plans for releasing the datasets used?
This would be really interesting in my opinion, but we are not releasing datasets at this time. See the C4 dataset for an earlier open dataset from Google.
What are the supported languages of these models?
This v1 model is focused on English support, but you may find some multilingual capabilities.
How are these performing so well compared to Llama 2? Are there any documents on the architecture and differences? Is it MoE?
Also note that some of the links in the blog post don't work, e.g. the debugging tool.
We've documented the architecture (including key differences) in our technical report here (https://goo.gle/GemmaReport), and you can see the architecture implementation in our Git Repo (https://github.com/google-deepmind/gemma).
Will there be Gemma-vision models or multimodal Gemma models?
Have the same question.
I cannot count how many times I've seen similar posts on HN, followed by tens of questions from other users, three of which actually get answered by the OP. This one seems to be no exception so far.
What are you talking about? The team is in this thread answering questions.
Hi! This is such an exciting release. Congratulations!
I work on Ollama and used the provided GGUF files to quantize the model. As mentioned by a few people here, the 4-bit integer quantized models (which Ollama defaults to) seem to have strange output with non-existent words and funny use of whitespace.
Do you have a link /reference as to how the models were converted to GGUF format? And is it expected that quantizing the models might cause this issue?
Thanks so much!
As a data point, using the Huggingface Transformers 4-bit quantization yields reasonable results: https://twitter.com/espadrine/status/1760355758309298421
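For reference, the Transformers 4-bit path mentioned above looks roughly like this; a sketch assuming bitsandbytes and accelerate are installed and a GPU is available:

```python
# Rough sketch of 4-bit loading via Transformers + bitsandbytes, as referenced above.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

quant_config = BitsAndBytesConfig(load_in_4bit=True,
                                  bnb_4bit_compute_dtype=torch.bfloat16)

tokenizer = AutoTokenizer.from_pretrained("google/gemma-7b-it")
model = AutoModelForCausalLM.from_pretrained(
    "google/gemma-7b-it",
    quantization_config=quant_config,
    device_map="auto",
)

inputs = tokenizer("Write a haiku about open models.", return_tensors="pt").to(model.device)
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=64)[0]))
```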
Hi, what is the cutoff date?
All it will tell me is mid-2018.
Congrats on the launch and thanks for the contribution! This looks like it's on par with or better than Mistral 7B 0.1, or is that 0.2?
Are there plans for MoE or 70B models?
Great question - we compare to the Mistral 7B 0.1 pretrained models (since there were no pretrained checkpoint updates in 0.2) and the Mistral 7B 0.2 instruction-tuned models in the technical report here: https://goo.gle/GemmaReport
Will there be "extended context" releases like 01.ai did for Yi?
Also, is the model GQA?
It's MQA, documented in the tech report
Will this be available as a Vertex AI foundational model like Gemini 1.0, without deploying a custom endpoint? Any info on pricing? (Also, when will Gemini 1.5 be available on Vertex?)
Hi alekandreev,
Any reason you decided to go with a token vocabulary size of 256k? Smaller vocab sizes, like most models of this size seem to use (~16-32k), are much easier to work with. Would love to understand the technical reasoning here, which unfortunately isn't detailed in the report :(.
I'm not sure if this was mentioned in the paper somewhere, but how much does the super large 256k tokenizer vocabulary influence inference speed, and how much higher is the average text compression compared to Llama's usual 32k? In short, is it really worth going beyond GPT-4's 100k?
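Not an answer, but the compression half of this is easy to measure yourself; a sketch assuming you've accepted the terms for both gated repos and have a representative text sample on disk (the filename below is a placeholder):

```python
# Compare characters-per-token for the two tokenizers on a sample corpus.
# "sample.txt" is a placeholder; both repos are gated, so accept their terms first.
from transformers import AutoTokenizer

sample = open("sample.txt", encoding="utf-8").read()

for repo in ["google/gemma-7b", "meta-llama/Llama-2-7b-hf"]:
    tok = AutoTokenizer.from_pretrained(repo)
    n_tokens = len(tok.encode(sample))
    print(f"{repo}: {len(sample) / n_tokens:.2f} chars per token")
```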
Training on 4096 TPU v5es, how did you handle the crazy batch size? :o
Not a question, but thank you for your hard work! Also, brave of you to join the HN comments, I appreciate your openness. Hope y'all get to celebrate the launch :)
Can you share the training loss curve?
It's cool that you guys are able to release open stuff, that must be a nice change from the modus operandi at goog. I'll have to double check but it looks like phi-2 beats your performance in some cases while being smaller, I'm guessing the value proposition of these models is being small and good while also having more knowledge baked in?