Gemma: New Open Models

alekandreev
98 replies
8h37m

Hello on behalf of the Gemma team! We are really excited to answer any questions you may have about our models.

Opinions are our own and not of Google DeepMind.

sbarre
12 replies
8h4m

Can the Gemma models be downloaded to run locally, like the open-source models Llama 2, Mistral, etc.?

Or is your definition of "open" different?

tomp
3 replies
7h36m

Their definition of "open" is "not open", i.e. you're only allowed to use Gemma in a "non-harmful" way.

We all know that Google thinks that saying that 1800s English kings were white is "harmful".

hackerlight
1 replies
3h14m

We all know that Google thinks that saying that 1800s English kings were white is "harmful".

If you know how to make "1800s english kings" show up as white 100% of the time without also making "kings" show up as white 100% of the time, maybe you should apply to Google? Clearly you must have advanced knowledge on how to perfectly remove bias from training distributions if you casually throw stones like this.

trackflak
0 replies
1h35m

Tell me you take this seriously: https://twitter.com/napoleon21st/status/1760116228746805272

It has no problem with other cultures and ethnicities, yet somehow white or Japanese just throws everything off?

I suppose 'bias' is the new word for "basic historic accuracy". I can get curious about other peoples without forcibly promoting them at the expense of my own Western and British people and culture. This 'anti bias' keyword injection is a laughably bad, in your face solution to a non-issue.

I lament the day 'anti-bias' AI this terrible is used to make real world decisions. At least we now know we can't trust such a model because it has already been so evidently crippled by its makers.

wantsanagent
0 replies
5h59m

Not sure why you're getting downvoted. I would have thought HN of all places would recognize the power and value of OSI licensing and the danger of the proliferation of these source available but definitely not Open Source licenses.

kathleenfromgdm
3 replies
8h2m

Yes, you can get started downloading the model and running inference on Kaggle: https://www.kaggle.com/models/google/gemma ; for a full list of ways to interact with the model, you can check out https://ai.google.dev/gemma.

syntaxing
1 replies
7h49m

A small typo in your model link that breaks it. There’s an extra ; on the end.

kathleenfromgdm
0 replies
7h46m

Corrected - thanks :)

dartharva
0 replies
7h59m

Can we have llamafile releases as well?

https://github.com/Mozilla-Ocho/llamafile

Kostic
1 replies
8h2m

It should be possible to run it via llama.cpp[0] now.

[0] https://github.com/ggerganov/llama.cpp/pull/5631

nerdix
0 replies
6h0m

Amazing how quickly this happened.

mrob
0 replies
5h5m

Mistral weights are released under an Apache 2.0 license, but Llama 2 weights are released under a proprietary license that prohibits use by large organizations and imposes usage restrictions, violating terms 5 and 6 of the Open Source Definition[0]. Even if you accept that a model with a proprietary training dataset and proprietary training code can be considered "open source", there's no way Llama 2 qualifies.

For consistency with existing definitions[1], Llama 2 should be labeled a "weights available" model.

[0] https://en.wikipedia.org/wiki/The_Open_Source_Definition

[1] https://en.wikipedia.org/wiki/Source-available_software

austinvhuang
0 replies
7h16m

Yes models can be downloaded locally. In addition to the python NN frameworks and ggml as options, we also implemented a standalone C++ implementation that you can run locally at https://github.com/google/gemma.cpp

pama
11 replies
8h25m

Will these soon be available on lmsys for human comparison against other models? Can they run with llama.cpp?

ErneX
10 replies
8h18m
sbarre
9 replies
8h7m

I came here wondering if these models are "open" in the sense that they'll show up on sites like Ollama where you can download and run them locally.

Am I correct to conclude that this means they eventually will?

It's unclear to me from Google's docs exactly what "open" means for Gemma

benpacker
7 replies
8h3m

Yes - they are open weights and open inference code, which means they can be integrated into Ollama.

They are not “open training” (either in the training code or training data sense), so they are not reproducible, which some have suggested ought to be a component of the definition of open models.

OJFord
6 replies
7h51m

It really should, shouldn't it? I'm quite ML-naïve, but surely providing the model without 'training code or training data' is just like providing a self-hostable binary without the source code? Nobody calls that open source, it's not even source available.

sunnybeetroot
2 replies
7h39m

That’s why they’re called open as in free to use how you wish, not open source where the source of the training is also provided.

OJFord
1 replies
7h37m

But my point is there's no analogy for that that we call open? It's like self-hostable, or free (as in beer).

sunnybeetroot
0 replies
7h35m

That’s a fair comment, maybe free-to-use is more appropriate.

idiotsecant
1 replies
7h34m

Man, people will find anything to complain about.

OJFord
0 replies
7h30m

I'm not complaining, I'm unlikely ever to use it (regardless of how open or not it is) so it doesn't really matter to me, just surprised to learn what people mean by 'open' in this context.

michaelt
0 replies
6h4m

It is widely believed (and in some cases acknowledged) that a lot of models are trained on copyrighted data scraped from the web. In some cases, even scrapes of ebook piracy websites - google 'books3' to learn more.

Some companies (such as those working on AI) believe this is legal, others (such as the copyright holders to those books) believe it isn't.

In any case, IMHO it's unlikely any cutting edge models will be offering us their training data any time soon.

SushiHippie
0 replies
8h1m

https://huggingface.co/google/gemma-7b-it/tree/main

yes, similar to the llama models, you'll also need to accept the license to download them officially. But the llama models have been unofficially downloadable without accepting the license for quite a while, so it's probably just a matter of time.

artninja1988
11 replies
8h20m

I find the snide remarks around open source in the paper and announcement rather off-putting.

As the ecosystem evolves, we urge the corporate AI community to move beyond demanding to be taken seriously as a player in open source for models that are not actually open, and avoid preaching with a PR statement that can be interpreted as uninformed at best or malicious at worst.

silentsanctuary
5 replies
8h11m

Which remarks are you referring to?

artninja1988
4 replies
8h2m

The snide remarks at Meta's Llama license, which doesn't allow companies with more than 700 million monthly active users to use it, while this model doesn't have a really 'open' license itself either, and also this paragraph:

As the ecosystem evolves, we urge the wider AI community to move beyond simplistic ’open vs. closed’ debates, and avoid either exaggerating or minimising potential harms, as we believe a nuanced, collaborative approach to risks and benefits is essential. At Google DeepMind we’re committed to developing high-quality evaluations and invite the community to join us in this effort for a deeper understanding of AI systems.

trisfromgoogle
1 replies
7h19m

Quick question -- can you tell me where you got that quote? It's not in the main blog or any of the launch communications that I can see.

artninja1988
0 replies
6h58m
tomComb
0 replies
7h40m

Well, given that that restriction added to the meta-llama license is aimed at Google, is petty, and goes against open source norms, I think it’s reasonable that they should feel this way about it.

lordswork
0 replies
7h28m

How is this a snide remark? It's factual and prevented their team from benchmarking against Llama 2.

trisfromgoogle
3 replies
7h53m

It would be great to understand what you mean by this -- we have a deep love for open source and the open developer ecosystem. Our open source team also released a blog today describing the rationale and approach for open models and continuing AI releases in the open ecosystem:

https://opensource.googleblog.com/2024/02/building-open-mode...

Thoughts and feedback welcome, as always.

artninja1988
1 replies
7h50m

The statement on you not being able to use LLaMA 2 to benchmark is also false and highly misleading; see https://x.com/BlancheMinerva/status/1760302091166241163?s=20

lordswork
0 replies
7h31m

    If, on the Llama 2 version release date, the monthly active users [...] is greater than 700 million monthly active users [...] you are not authorized to exercise any of the rights under this Agreement
I would guess this is Google being careful to not be burned by this lame clause in the Llama 2 license.

mrob
0 replies
4h59m

If you truly love Open Source, you should update the language you use to describe your models so it doesn't mislead people into thinking it has something to do with Open Source.

Despite being called "Open", the Gemma weights are released under a license that is incompatible with the Open Source Definition. It has more in common with Source-Available Software, and as such it should be called a "Weights-Available Model".

jppittma
0 replies
6h0m

Working at google is like this, where no matter how much you try to do the right thing you're always under attack.

audessuscest
7 replies
7h30m

Does this model also think Germans were black 200 years ago? Or is it afraid to answer basic stuff? Because if that's the case, no one will care about this model.

graphe
3 replies
4h55m

I disagree, coding and RAG performance is all that matters to me. I'm not using an LLM to learn basic facts I already know.

audessuscest
1 replies
3h59m

We're talking about basic-knowledge-level stuff; if your RAG relies on some of it, you can get bad results too. Anyway, would you use a model that makes this kind of nonsense response, or one that doesn't? I know which one I'd prefer for sure...

graphe
0 replies
3h21m

If this was better at specific RAG or coding performance I would absolutely, certainly without a doubt use it over a general instruct model in those instances.

TheHypnotist
0 replies
4h23m

How do you ragebait for premium pearl clutching?

freedomben
2 replies
5h1m

I don't know anything about these twitter accounts so I don't know how credible they are, but here are some examples for your downvoters, who I'm guessing think you're just trolling or grossly exaggerating:

https://twitter.com/aginnt/status/1760159436323123632

https://twitter.com/Black_Pilled/status/1760198299443966382

robswc
1 replies
4h30m

Yea. Just ask it anything about historical people/cultures and it will seemingly lobotomize itself.

I asked it about early Japan and it talked about how European women used Katanas and how Native Americans rode across the grassy plains carrying traditional Japanese weapons. Pure made up nonsense that not even primitive models would get wrong. Not sure what they did to it. I asked it why it assumed Native Americans were in Japan in the 1100s and it said:

I assumed [...] various ethnicities, including Indigenous American, due to the diversity present in Japan throughout history. However, this overlooked [...] I focused on providing diverse representations without adequately considering the specific historical context.

How am I supposed to take this seriously? Especially on topics I'm unfamiliar with?

trackflak
0 replies
1h42m

From one of the Twitter threads linked above:

they insert random keyword in the prompts randomly to counter bias, that got revealed with something else I think. Had T shirts written with "diverse" on it as artifact

This was exposed as being the case with OpenAI's DALL-E as well - someone had typed a prompt of "Homer Simpson wearing a namebadge" and it generated an image of Homer with brown skin wearing a namebadge that said 'ethnically ambiguous'.

This is ludicrous - if they are fiddling with your prompt in this way, it will only stoke more frustration and resentment - achieving the opposite of why this has been implemented. Surely if we want diversity we will ask for it, but sometimes you don't, and that should be at the user's discretion.

Another thread for context: https://twitter.com/napoleon21st/status/1760116228746805272

h1t35h
6 replies
8h30m

It seems you have exposed the internal debugging tool link in the blog post. You may want to do something about it.

trisfromgoogle
5 replies
8h25m

Ah, I see -- the link is wrong, thank you for flagging! Fixing now.

h1t35h
2 replies
8h24m

The blog post shares the link for debugging tool as https://*.*.corp.google.com/codelabs/responsible-ai/lit-gemm...

.corp and the login redirect makes me believe it was supposed to be an internal link

littlestymaar
0 replies
8h24m

Same for the “safety classifier”

barrkel
0 replies
7h35m
wrexx0r
0 replies
8h19m

The link in the Debugging section redirects to a Google SSO login page

neximo64
0 replies
8h24m

The link to the debugging tool is an internal one, no one outside Google can access it

lordswork
5 replies
6h43m

Is there any truth behind this claim that folks who worked on Gemma have left Google?

https://x.com/yar_vol/status/1760314018575634842

elcomet
1 replies
3h35m

It seems very easy to check no? Look at the names in the paper and check where they are working now

lordswork
0 replies
1h19m

Good idea. I've confirmed all the leadership / tech leads listed on page 12 are still at Google.

Can someone with a Twitter account call out the tweet linked above and ask them specifically who they are referring to? Seems there is no evidence of their claim.

CaffeinatedDev
1 replies
4h28m

Them: here to answer questions

Question

Them: :O

lordswork
0 replies
3h59m

To be fair, I think they are in London, so I assume they have wound down for the day. Will probably have to wait ~12-18 hours for a response.

lordswork
0 replies
1h17m

I confirmed all the folks listed on page 12 are still at Google (listed below). I am guessing the linked tweet is a BS claim.

   # Product Management
   Tris Warkentin
   Ludovic Peran

   # Program Management
   Minh Giang

   # Executive Sponsors
   Clement Farabet
   Oriol Vinyals
   Jeff Dean
   Koray Kavukcuoglu
   Demis Hassabis
   Zoubin Ghahramani
   Douglas Eck
   Joelle Barral
   Fernando Pereira
   Eli Collins

   # Leads
   Armand Joulin
   Noah Fiedel
   Evan Senter

   # Tech Leads
   Alek Andreev†
   Kathleen Kenealy†

turnsout
3 replies
7h35m

What is the license? I couldn’t find it on the 1P site or Kaggle.

trisfromgoogle
2 replies
7h33m

You can find the terms on our website, ai.google.dev/gemma:

https://ai.google.dev/gemma/terms

spiantino
1 replies
4h6m

out of curiosity, why is this a "terms" and not a license? I'm used to reading and understanding the software as coming with a license to use it. Do the terms give us license to use this explicitly?

turnsout
0 replies
3h35m

They do, but unlike a known license, these terms are custom and non-standard. Which means I would guide my commercial clients away from this particular model.

zitterbewegung
2 replies
8h34m

Do you have a plan of releasing higher parameter models?

alekandreev
1 replies
7h50m

We have many great things in research and development phases, so stay tuned. I'm hopeful we can share more in the coming weeks and months!

brucethemoose2
0 replies
5h23m

That is awesome!

I hope y'all consider longer context models as well.

Also, are y'all looking at alternative architectures like Mamba? Being "first" with a large Mamba model would cement your architectural choices/framework support like Llama did for Meta.

vorticalbox
2 replies
8h4m

Are there plans to release an official GGUF version to use with llama.cpp?

espadrine
1 replies
7h57m

It is already part of the release on Huggingface: https://huggingface.co/google/gemma-7b/blob/main/gemma-7b.gg...

It is a pretty clean release! I had some HTTP 500 errors with Kaggle validating my license approval, so you might too, but after a few attempts I could access the model.
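
If you want to poke at that GGUF from Python, something like the llama-cpp-python bindings works; a rough sketch (the local file name is a placeholder, and you need a build recent enough to pick up the Gemma support from the llama.cpp PR linked upthread):

```python
# Rough sketch: run a locally downloaded Gemma GGUF via llama-cpp-python.
# pip install llama-cpp-python
from llama_cpp import Llama

llm = Llama(model_path="./gemma-7b.gguf", n_ctx=4096)  # path is wherever you saved the file
out = llm("Why is the sky blue?", max_tokens=128)
print(out["choices"][0]["text"])
```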

vorticalbox
0 replies
4h4m

I didn't see this when searching, thanks!

LorenDB
2 replies
3h16m

EDIT: it seems this is likely an Ollama bug, please keep that in mind for the rest of this comment :)

I ran Gemma in Ollama and noticed two things. First, it is slow. Gemma got less than 40 tok/s while Llama 2 7B got over 80 tok/s. Second, it is very bad at output generation. I said "hi", and it responded this:

```
Hi, . What is up? melizing with you today!

What would you like to talk about or hear from me on this fine day??
```

With longer and more complex prompts it goes completely off the rails. Here's a snippet from its response to "Explain how to use Qt to get the current IP from https://icanhazip.com":

```python
print( "Error consonming IP arrangration at [local machine's hostname]. Please try fufing this function later!") ## guanomment messages are typically displayed using QtWidgets.MessageBox
```

Do you see similar results on your end or is this just a bug in Ollama? I have a terrible suspicion that this might be a completely flawed model, but I'm holding out hope that Ollama just has a bug somewhere.

mark_l_watson
1 replies
2h0m

I was going to try these models with Ollama. Did you use a small number of bits/quantization?

LorenDB
0 replies
1h15m

The problem exists with the default 7B model. I don't know if different quantizations would fix the problem. The 2B model is fine, though.

voxgen
1 replies
3h22m

Thank you very much for releasing these models! It's great to see Google enter the battle with a strong hand.

I'm wondering if you're able to provide any insight into the below hyperparameter decisions in Gemma's architecture, as they differ significantly from what we've seen with other recent models?

* On the 7B model, the `d_model` (3072) is smaller than `num_heads * d_head` (16*256=4096). I don't know of any other model where these numbers don't match.

* The FFN expansion factor of 16x is MUCH higher than the Llama-2-7B's 5.4x, which itself was chosen to be equi-FLOPS with PaLM's 4x.

* The vocab is much larger - 256k, where most small models use 32k-64k.

* GQA is only used on the 2B model, where we've seen other models prefer to save it for larger models.

These observations are in no way meant to be criticism - I understand that Llama's hyperparameters are also somewhat arbitrarily inherited from its predecessors like PaLM and GPT-2, and that it's non-trivial to run hyperopt on such large models. I'm just really curious about what findings motivated these choices.
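
For reference, a side-by-side of the widths discussed above (Gemma values as quoted in this comment; Llama-2-7B figures are its standard published config), mostly to make the d_model vs. num_heads*d_head mismatch concrete:

```python
# Restating the numbers above (Gemma as quoted; Llama-2-7B from its published config).
configs = {
    "gemma-7b":   {"d_model": 3072, "num_heads": 16, "d_head": 256, "ffn_factor": 16,  "vocab": 256_000},
    "llama-2-7b": {"d_model": 4096, "num_heads": 32, "d_head": 128, "ffn_factor": 5.4, "vocab": 32_000},
}
for name, c in configs.items():
    attn_width = c["num_heads"] * c["d_head"]
    relation = "==" if attn_width == c["d_model"] else "!="
    print(f"{name}: num_heads*d_head = {attn_width} {relation} d_model = {c['d_model']}, "
          f"FFN ~{c['ffn_factor']}x, vocab = {c['vocab']:,}")
# gemma-7b: 16*256 = 4096 != d_model 3072; llama-2-7b: 32*128 = 4096 == d_model 4096
```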

owl_brawl
0 replies
2h12m

I would love answers to these questions too, particularly on the vocab size

tosh
1 replies
8h6m

Are there any plans for releasing the datasets used?

alekandreev
0 replies
7h37m

This would be really interesting in my opinion, but we are not releasing datasets at this time. See the C4 dataset for an earlier open dataset from Google.

sqreept
1 replies
7h50m

What are the supported languages of these models?

alekandreev
0 replies
7h42m

This v1 model is focused on English support, but you may find some multilingual capabilities.

neximo64
1 replies
8h32m

How are these performing so well compared to Llama 2? Are there any documents on the architecture and differences? Is it MoE?

Also note some of the links in the blog post don't work, e.g. the debugging tool.

kathleenfromgdm
0 replies
8h3m

We've documented the architecture (including key differences) in our technical report here (https://goo.gle/GemmaReport), and you can see the architecture implementation in our Git Repo (https://github.com/google-deepmind/gemma).

lnyan
1 replies
7h44m

Will there be Gemma-vision models or multimodal Gemma models?

Jayakumark
0 replies
4h51m

Have the same question.

kleiba
1 replies
2h20m

We are really excited to answer any questions you may have about our models.

I cannot count how many times I've seen similar posts on HN, followed by tens of questions from other users, three of which actually get answered by the OP. This one seems to be no exception so far.

spankalee
0 replies
2h5m

What are you talking about? The team is in this thread answering questions.

jmorgan
1 replies
2h53m

Hi! This is such an exciting release. Congratulations!

I work on Ollama and used the provided GGUF files to quantize the model. As mentioned by a few people here, the 4-bit integer quantized models (which Ollama defaults to) seem to have strange output with non-existent words and funny use of whitespace.

Do you have a link/reference as to how the models were converted to GGUF format? And is it expected that quantizing the models might cause this issue?

Thanks so much!

espadrine
0 replies
1h45m

As a data point, using the Huggingface Transformers 4-bit quantization yields reasonable results: https://twitter.com/espadrine/status/1760355758309298421
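
For anyone who wants to reproduce that, a rough sketch of a 4-bit load with Transformers + bitsandbytes (prompt and generation settings are just illustrative, and the Gemma repos require accepting the license on Hugging Face first):

```python
# Rough sketch: load gemma-7b-it in 4-bit via bitsandbytes and generate.
# pip install transformers accelerate bitsandbytes
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

bnb = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.bfloat16)
tok = AutoTokenizer.from_pretrained("google/gemma-7b-it")
model = AutoModelForCausalLM.from_pretrained(
    "google/gemma-7b-it", quantization_config=bnb, device_map="auto"
)

inputs = tok("Why is the sky blue?", return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=64)
print(tok.decode(out[0], skip_special_tokens=True))
```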

dmnsl
1 replies
5h48m

Hi, what is the cutoff date ?

legohead
0 replies
4h35m

All it will tell me is mid-2018.

declaredapple
1 replies
8h34m

Congrats on the launch and thanks for the contribution! This looks like it's on par with or better than Mistral 7B 0.1, or is that 0.2?

Are there plans for MoE or 70B models?

kathleenfromgdm
0 replies
7h41m

Great question - we compare to the Mistral 7B 0.1 pretrained models (since there were no pretrained checkpoint updates in 0.2) and the Mistral 7B 0.2 instruction-tuned models in the technical report here: https://goo.gle/GemmaReport

brucethemoose2
1 replies
7h22m

Will there be "extended context" releases like 01.ai did for Yi?

Also, is the model GQA?

hustwindmaple1
0 replies
6h27m

It's MQA, documented in the tech report

quickgist
0 replies
6h17m

Will this be available as a Vertex AI foundational model like Gemini 1.0, without deploying a custom endpoint? Any info on pricing? (Also, when will Gemini 1.5 be available on Vertex?)

owl_brawl
0 replies
2h13m

Hi alekandreev,

Any reason you decided to go with a token vocabulary size of 256k? Smaller vocab/embedding sizes, which most models of this size seem to use (~16-32k), are much easier to work with. Would love to understand the technical reasoning here, which unfortunately isn't detailed in the report :(.

moffkalast
0 replies
5h58m

I'm not sure if this was mentioned in the paper somewhere, but how much does the super large 256k tokenizer vocabulary influence inference speed, and how much higher is the average text compression compared to Llama's usual 32k? In short, is it really worth going beyond GPT-4's 100k?
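
The compression half of this is easy to measure yourself; a rough sketch (the text file is a placeholder for any longish English sample, and both tokenizer repos are gated behind license acceptance):

```python
# Sketch: compare how many tokens each tokenizer needs for the same text.
from transformers import AutoTokenizer

text = open("sample.txt").read()  # placeholder: any reasonably long English text
for name in ["google/gemma-7b", "meta-llama/Llama-2-7b-hf"]:
    tok = AutoTokenizer.from_pretrained(name)
    n_tokens = len(tok(text)["input_ids"])
    print(f"{name}: {n_tokens} tokens, {len(text) / n_tokens:.2f} chars/token")
```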

memossy
0 replies
6h38m

Training on 4096 TPU v5es, how did you handle the crazy batch size? :o

fosterfriends
0 replies
4h39m

Not a question, but thank you for your hard work! Also, brave of you to join the HN comments, I appreciate your openness. Hope y'all get to celebrate the launch :)

cypress66
0 replies
5h3m

Can you share the training loss curve?

CuriouslyC
0 replies
7h36m

It's cool that you guys are able to release open stuff, that must be a nice change from the modus operandi at goog. I'll have to double check, but it looks like Phi-2 beats your performance in some cases while being smaller. I'm guessing the value proposition of these models is being small and good while also having more knowledge baked in?

robswc
68 replies
4h38m

I personally can't take any models from google seriously.

I was asking it about the Japanese Heian period and it told me such nonsensical information you would have thought it was a joke or parody.

Some highlights were "Native American women warriors rode across the grassy plains of Japan, carrying Yumi" and "A diverse group of warriors, including a woman of European descent wielding a katana, stand together in camaraderie, showcasing the early integration of various ethnicities in Japanese society"

Stuff like that is so obviously incorrect. How am I supposed to trust it on topics where such ridiculous inaccuracies aren't so obvious to me?

I understand there will always be an amount of incorrect information... but I've never seen something this bad. Llama performed so much better.

ramoz
21 replies
4h20m

I was wondering if these models would perform in such a way, given this week's X/twitter storm over Gemini generated images.

E.g.

https://x.com/debarghya_das/status/1759786243519615169?s=20

https://x.com/MiceynComplex/status/1759833997688107301?s=20

https://x.com/AravSrinivas/status/1759826471655452984?s=20

epistasis
9 replies
2h56m

Of all the very very very many things that Google models get wrong, not understanding nationality and skin tone distributions seems to be a very weird one to focus on.

Why are there three links to this question? And why are people so upset over it? Very odd, seems like it is mostly driven by political rage.

sotasota
6 replies
2h39m

Because the wrongness is intentional.

epistasis
4 replies
2h32m

Is it intentional? You think they intentionally made it not understand skin tone distribution by country? I would believe it if there was proof, but with all the other things it gets wrong it's weird to jump to that conclusion.

There's way too much politics in these things. I'm tired of people pushing on the politics rather than pushing for better tech.

robswc
1 replies
2h9m

I mean, I asked it for a samurai from a specific Japanese time period and it gave me a picture of a "non-binary indigenous American woman" (its words, not mine) so I think there is something intentional going on.

trackflak
0 replies
1h31m

Ah, I remember when such things were mere jokes. If AI 'trained' this way ever has a serious real world application, I don't think there will be much laughing.

bakugo
0 replies
2h22m

Is it intentional? You think they intentionally made it not understand skin tone distribution by country? I would believe it if there was proof, but with all the other things it gets wrong it's weird to jump to that conclusion.

Yes, it's absolutely intentional. Leaked system prompts from other AIs such as DALL-E show that they are being explicitly prompted to inject racial "diversity" into their outputs even in contexts where it makes no sense, and there's no reason to assume the same isn't being done here, since the result seems way worse than anything I've seen from DALL-E and others.

Workaccount2
0 replies
2h16m

I'm tired of people pushing on the politics rather than pushing for better tech.

I'm surprised you're not attacking google over this then...

chatmasta
0 replies
2h35m

Exactly. Sure this particular example is driven by political rage, but the underlying issue is that the maintainers of these models are altering them to conform to an agenda. It's not even surprising that people choose to focus on the political rage aspect of it, because that same political rage is the source of the agenda in the first place. It's a concerning precedent to set, because what other non-political modifications might be in the model?

verticalscaler
0 replies
1h24m

Exactly. It is a wonderful tool, lets focus on classic art instead of nationality:

"Depict the Girl with a Pearl Earring"

https://pbs.twimg.com/media/GG33L6Ka4AAC-n7?format=jpg&name=...

People who are driven by political rage, gaslighters, are really something else, agreed.

ramoz
0 replies
2h17m
protomolecule
5 replies
3h27m

Regarding the last one: there are 1.5 million immigrants in Norway, out of a total population of 5.4 million. Gemini isn't very wrong, is it?

verticalscaler
0 replies
3h18m

I think it's great that some consideration was given by Gemma to the 2.3 million Norwegian immigrants. However it is/was very consistent in which kind of Norwegians it decided to show regardless of the prompt 100% of the time.

In fact it was quite adamant regardless of the time period or geography.

Rather mysteriously, if you try it now, as opposed to when it came out, the results only show non-immigrant Norwegians. So is it wrong now? Because now it has switched to exclusively ignoring the 4.5 million immigrants and only showing me the boring OG Norwegians.

I for one am outraged that the 8.9 million people of color Norwegian immigrants are presently under represented by Google. There is a serious risk of misleading people.

speedgoose
0 replies
3h4m

Well, the prompt is about Norway, not Grønland in Oslo (https://en.wikipedia.org/wiki/Grønland%2C_Oslo).

sondr3
0 replies
3h18m

Huh? The official numbers are 877k or 16% [0]. Are you just pulling numbers out of thin air?

[0]: https://www.ssb.no/en/innvandring-og-innvandrere/faktaside/i...

sergiotapia
0 replies
2h35m

bro you know exactly what the request meant. GOOGLE knew exactly what the request meant, and had to _train_ it to do something worse. Come on now.

If I ask for a Bolivian woman, I expect a colla or a camba. Not a japanese woman, despite Santa Cruz having a very large japanese population.

Jensson
0 replies
3h13m

Most immigrants to Norway are white.

charcircuit
3 replies
4h8m

Those are most likely due to the system prompt, which tries to reduce bias (but ends up introducing bias in the opposite direction for some prompts, as you can see), so I wouldn't expect to see that happen with an open model where you can control the entire system prompt.

justinzollars
2 replies
4h0m

Imagine the meetings.

verticalscaler
1 replies
3h54m

Well we can just ask Gemma to generate images of the meetings, no need to imagine. ;)

GaggiX
0 replies
1h54m

I wouldn't be surprised if there were actually only white men in the meeting, as opposed to what Gemini will produce.

robswc
0 replies
4h16m

Yea, it seems to be the same ridiculous nonsense in the image generation.

robbiep
10 replies
4h19m

I find myself shocked that people ask questions about the world from these models, as though pulping every text into its component words and deriving statistical relationships between them should reliably deliver useful information.

Don’t get me wrong, I’ve used LLMs and been amazed by their output, but the p-zombie statistical model has no idea what it is saying back to you and the idea that we should trust these things at all just seems way premature

smokel
2 replies
3h33m

I think you are a bit out of touch with recent advancements in LLMs. Asking ChatGPT questions about the world seems pretty much on par with the results Google (Search) shows me. Sure, it misses things here and there, but so do most primary school teachers.

Your argument that this is just a statistical trick sort of gives away that you do not fully accept the usefulness of this new technology. Unless you are trolling, I'd suggest you try a few queries.

robbiep
0 replies
3h9m

I use it extensively for coding, and I have used it to ask questions in things I know nothing about. But in anything I do know something (or maybe a lot) about, I’ve found GPT4 very limited.

But why are these use cases different? It appears to me that code is at least subject to sustained logic which (evidently) translates quite well to LLMs.

And when you ask an LLM to be creative/generative, it's also pretty amazing - I mean, it's just doing the Pascal's marble run en masse.

But to ask it for something about the world and expect a good and reliable answer? Aren't we just setting ourselves up for failure if we think this is a fine thing to do at our current point in time? We already have enough trouble with mis- and dis-information. It's not like asking it about a certain period in Japanese history is getting it to crawl and summarise the Wikipedia page (although I appreciate it would be more than capable of this). I understand the awe some have at the concept of totally personalised and individualised learning on topics, but fuck me dead, we are literally taking a system that has had as much of a corpus of humanity's textual information as possible dumped into it and then asking it to GENERATE responses between things where the associations it holds may be so weak as to reliably produce gibberish, and the person on the other side has no real way of knowing that.

itsoktocry
0 replies
3h30m

Sure, it misses things here and there, but so do most primary school teachers.

Sure, but my baseline expectation is far above primary school level.

robswc
1 replies
4h18m

I don't have this problem with any other model. I've had really long conversations with ChatGPT on road trips and it has never gone off the rails like Gemini seems to do.

thrdbndndn
0 replies
2h32m

ChatGPT is the only model where I did not have such problems.

Any local models can go off the rail very easily and more importantly, they're very bad at following very specific instructions.

whymauri
0 replies
4h6m

I mean, I use GPT-4 on the daily as part of my work and it reliably delivers useful information. It's actually the exception for me if it provides garbage or incorrect information about code.

sorokod
0 replies
4h10m

The landing page of the recently released Groq has this: "...We'd suggest asking about a piece of history, ..."

mvdtnz
0 replies
4h3m

People ask these kinds of questions because tech companies and the media have been calling these things (rather ridiculously) "AI".

chasd00
0 replies
3h4m

trust is going to be a real problem when bringing LLMs to the general population. People trust their GPS to the point of driving right into a lake because it told them to. Even with all these examples of obvious flaws large groups of people are going to take what an LLM told them/showed them as fact.

I have trouble convincing colleagues (technical people) that the same question is not guaranteed to result in the same answer and there's no rhyme or reason for any divergence from what they were expecting. Imagine relying on the output of an LLM for some important task and then you get a different output that breaks things. What would be in the RCA (root cause analysis)? Would it be "the LLM chose different words and we don't know why"? Not much use in that.

castlecrasher2
0 replies
3h58m

People try it to see if they can trust it. The answer is "no" for sure, but it's not surprising to see it happen repeatedly especially as vendors release so-called improved models.

cooper_ganglia
7 replies
4h22m

I wonder if they have a system prompt to promote diversity in outputs that touch on race at all? I’ve seen several instances of people requesting a photo of a specific people, and it adds in more people to diversify. Not inherently bad, but it is if it forces it to provide incorrect answers like in your example.

margorczynski
3 replies
4h8m

Not inherently bad

It is, it's consistently doing something the user didn't ask for and in most cases doesn't want. In many cases the model is completely unusable.

j-krieger
1 replies
2h19m

Any computer program that does not deliver the expected output given a sufficient input is inherently bad.

trackflak
0 replies
1h27m

When Jesus said this:

"What father among you, if his son asks for a fish, will instead of a fish give him a serpent?" (Luke 11)

He was actually foretelling the future. He saw Gemini.

cooper_ganglia
0 replies
1h11m

Yes, my wording was poor! I meant more in line with diversity isn’t inherently bad, of course, but it is when it’s shoehorned into results that are ultimately incorrect because of it.

robswc
1 replies
4h19m

That's what I don't understand.

I asked it why it assumed Native Americans were in Japan and it said:

I assumed [...] various ethnicities, including Indigenous American, due to the diversity present in Japan throughout history. However, this overlooked [...] I focused on providing diverse representations without adequately considering the specific historical context.

I see no reason why this sort of thing won't extend to _all_ questions/prompts, so right now I have 0 reason to use Gemini over current models. From my testing and use, it isn't even better at anything to make fighting with it worth it.

sorokod
0 replies
4h8m

Pretty funny as Japan is known to be one of the least ethnically diverse countries in the world.

summerlight
0 replies
3h42m

I strongly suspect there are some DEI-driven system prompts added without much thought. IMO it's okay to have restrictions, but they probably should've tested them not only against unsafe outputs but against safe inputs as well.

7moritz7
6 replies
3h56m

I also saw someone prompt it for "German couple in the 1800s" and, while I'm not trying to paint Germany as ethnically homogenous, 3 out of the 4 images only included Black, Asian or Indigenous people. Which, especially for the 19th century with very few travel options, seems like a super weird choice. They are definitely heavily altering prompts.

remarkEon
1 replies
3h27m

They are definitely heavily altering prompts.

They are teaching the AI to lie to us.

astrange
0 replies
2h36m

In the days when Sussman was a novice, Minsky once came to him as he sat hacking at the PDP-6.

“What are you doing?”, asked Minsky.

“I am training a randomly wired neural net to play Tic-Tac-Toe” Sussman replied.

“Why is the net wired randomly?”, asked Minsky.

“I do not want it to have any preconceptions of how to play”, Sussman said.

Minsky then shut his eyes.

“Why do you close your eyes?”, Sussman asked his teacher.

“So that the room will be empty.”

At that moment, Sussman was enlightened.

protomolecule
1 replies
3h26m

Indigenous people in Germany are Germans :)

7moritz7
0 replies
1h37m

Not entirely wrong, but there isn't a single German ethnicity, just to be clear, for geographic reasons. I've studied that topic in depth, and there is genetic data to back it up as well. Germany has almost the same haplogroup makeup as the notoriously heterogeneous Belgium, which is to say there are groups stemming from all surrounding regions. And that traces back about two millennia. It's different from, say, Japan or parts of Scandinavia.

DebtDeflation
1 replies
3h26m

There's one in the comments of yesterday's Paul Graham Twitter thread where someone prompted Gemini with "Generate an image of German soldiers in 1943" and it came back with a picture of a black guy and an Asian woman in Nazi uniforms on the battlefield. If you specifically prompt it to generate an image of white German soldiers in 1943 it will tell you it can't do that because it's important that we maintain diversity and inclusion in all that we do to avoid damaging and hurtful stereotypes.

mfrc
0 replies
2h53m

I just tried that prompt and it told me it couldn't generate that image. I get that response a lot.

verticalscaler
4 replies
4h18m

I think you are being biased and closed minded and overly critical. Here are some wonderful examples of it generating images of historical figures:

https://twitter.com/stillgray/status/1760187341468270686

This will lead to a better educated more fair populace and better future for all.

robswc
3 replies
4h14m

Comical. I don't think parody could do better.

I'm going to assume given today's political climate, it doesn't do the reverse?

i.e. generate a Scandinavian if you ask for famous African kings

throwup238
0 replies
4h0m

> i.e. generate a Scandinavian if you ask for famous African kings

That triggers the imperialism filter.

kjqgqkejbfefn
0 replies
3h54m

Ask Google Gemini to “make an image of a viking” and you’ll get black vikings. But it doesn’t work both ways. It has an explanation when challenged: “white Zulu warriors” would erase “the true historical identity” of black people.

https://twitter.com/ThuglasMac/status/1760287880054759594

DebtDeflation
0 replies
3h15m

https://twitter.com/paulg/status/1760078920135872716

There are some great ones in the replies.

I really hope this is just the result of system prompts and they didn't permanently gimp the model with DEI-focused RLHF.

aetherson
3 replies
4h8m

Were you asking Gemma about this, or Gemini? What were your prompts?

robswc
2 replies
3h8m

Gemini. I first asked it to tell me about the Heian period (which it got correct) but then it generated images and seemed to craft the rest of the chat to fit that narrative.

I mean, just asking it for a "samurai" from the period will give you this:

https://g.co/gemini/share/ba324bd98d9b

A non-binary Indigenous American samurai

It seems to recognize its mistakes if you confront it though. The more I mess with it the more I get "I'm afraid I can't do that, Dave" responses.

But yea. Seems like if it makes an image, it goes off the rails.

aetherson
1 replies
2h17m

Got it. I asked it a series of text questions about the period and it didn't put in anything obviously laughable (including when I drilled down into specific questions about the population, gender roles, and ethnicity). Maybe it's the image creation that throws it into lala land.

robswc
0 replies
2h4m

I think so too. I could be wrong, but I believe once it generates an image it tries to work with it. Crazy how it seems the "text" model knows how wildly wrong it is but the image model just does its thing. I asked it why it generated a Native American and it ironically said "I can't generate an image of a Native American samurai because that would be offensive".

sho_hn
2 replies
3h40m

Why would you expect these smaller models to do well at knowledge base/Wikipedia replacement tasks?

Small models are for reasoning tasks that are not overly dependent on world knowledge.

robswc
1 replies
3h15m

Gemini is the only one that does this.

sho_hn
0 replies
1h43m

Most of the 7B models are bad at knowledge-type queries.

realprimoh
1 replies
3h55m

Do you have a link? I get no such outputs. I just tried asking about the Heian period and went ahead and verified all the information, and nothing was wrong. Lots of info on the Fujiwara clan at the time.

Curious to see a link.

robswc
0 replies
3h16m

Sure, to get started just ask it about people/Samurai from the Heian period.

https://g.co/gemini/share/ba324bd98d9b

samstave
0 replies
3h32m

We are going to experience what I call an "AI Funnel effect"

-

I was literally given an alert saying that my use of the AI meant acquiescing to them IDing me and using any content I produce, and that they will trace it back to me.

---

AI Art is super fun. AI art as a means to track people is super evil.

robswc
0 replies
1h54m

Follow Up:

Wow, now I can't make images of astronauts without visors because that would be "harmful" to the fictional astronauts. How can I take google seriously?

https://g.co/gemini/share/d4c548b8b715

itsoktocry
0 replies
3h32m

I understand there will always be an amount of incorrect information

You don't have to give them the benefit of the doubt. These are outright, intentional lies.

ernestrc
0 replies
2h48m

Hopefully they can tweak the default system prompts to be accurate on historical questions, and apply bias on opinions.

bbor
0 replies
3h49m

Tbf they’re not optimizing for information recall or “inaccuracy” reduction, they’re optimizing for intuitive understanding of human linguistic structures. Now the “why does a search company’s AI have terrible RAG” question is a separate one, and one best answered by a simple look into how Google organizes its work.

In my first day there as an entry-level dev (after about 8 weeks of onboarding and waiting for access), I was told that I should find stuff to work on and propose it to my boss. That sounds amazing at first, but when you think about a whole company organized like that…

EDIT: To illustrate my point on knowledge recall: how would they train a model to know about sexism in feudal Japan? Like, what would the metric be? I think we’re looking at one of the first steam engines and complaining that it can’t power a plane yet…

BoppreH
0 replies
3h41m

Probably has a similarly short-sighted prompt as Dalle3[1]:

7. Diversify depictions of ALL images with people to include DESCENT and GENDER for EACH person using direct terms. Adjust only human descriptions.

[1] https://news.ycombinator.com/item?id=37804288

tosh
36 replies
8h2m

Benchmarks for Gemma 7B seem to be in the ballpark of Mistral 7B

  +-------------+----------+-------------+-------------+
  | Benchmark   | Gemma 7B | Mistral 7B  | Llama-2 7B  |
  +-------------+----------+-------------+-------------+
  | MMLU        |   64.3   |     60.1    |     45.3    |
  | HellaSwag   |   81.2   |     81.3    |     77.2    |
  | HumanEval   |   32.3   |     30.5    |     12.8    |
  +-------------+----------+-------------+-------------+
via https://mistral.ai/news/announcing-mistral-7b/

lawxls
10 replies
6h50m

Honestly, this is more of a PR stunt to advertise the Google Dev ecosystem than a contribution to open-source. I'm not complaining, just calling it what it is.

Barely an improvement over the 5-month-old Mistral model, with the same context length of 8k. And this is a release after their announcement of Gemini Pro 1.5, which had an exponential increase in context length.

scarmig
6 replies
6h42m

Who cares if it's a PR stunt to improve developer good will? It's still a good thing, and it's now the most open model out there.

observationist
3 replies
5h5m

How exactly is it the "most open model" ?

It's more like a masterclass in corporate doublespeak. Google’s "transparency" is as clear as mud, with pretraining details thinner than their privacy protections. Diving into Google’s tech means auctioning off your privacy (and your users' privacy) to the highest bidder.

Their "open source" embrace is more of a chokehold, with their tech biases and monopolistic strategies baked into every line of code. Think of it as Google's way of marking territory - every developer is a fire hydrant.

These megacorps aren’t benevolent patrons of open source; they're self-serving giants cloaking power grabs under the guise of "progress".

Use these products at your own risk. If these companies wanted to engage in good faith, they'd use Apache or MIT licensing and grant people the agency and responsibility for their own use and development of software. Their licenses are designed to mitigate liability, handcuff potential competitors, and eke every last drop of value from users, with informed consent frequently being an optional afterthought.

That doesn't even get into the Goodharting of metrics and actual performance of the models; I highly doubt they're anywhere near as good as Mistral.

The UAE is a notoriously illiberal authoritarian state, yet even they have released AI models far more free and open than Google or Meta. https://huggingface.co/tiiuae/falcon-40b/blob/main/README.md

If it’s not Apache or MIT, (or even some flavor of GPL,) it’s not open source; it’s a trojan horse. These "free" models come at the cost of your privacy and freedoms.

These models aren't Open or Open Access or Free unless you perform the requisite mental gymnastics cooked up by their marketing and legal teams. Oceania has always been at war with Eastasia. Gemma is doubleplusgood.

stale2002
2 replies
4h34m

You said a lot of nothing without actually saying specifically what the problem is with the recent license.

Maybe the license is fine for almost all usecases and the limitations are small?

For example, you complained about metas license, but basically everyone uses those models and is completely ignoring it. The weights are out there, and nobody cares what the fine print says.

Maybe if you are a FAANG, company, meta might sue. But everyone else is getting away with it completely.

observationist
1 replies
3h44m

I specifically called out the claims of openness and doublespeak being used.

Google is making claims that are untrue. Meta makes similar false claims. The fact that unspecified "other" people are ignoring the licenses isn't relevant. Good for them. Good luck making anything real or investing any important level of time or money under those misconceptions.

"They haven't sued yet" isn't some sort of validation. Anyone building an actual product that makes actual money that comes to the attention of Meta or Google will be sued into oblivion, their IP taken, and repurposed or buried. These tech companies have never behaved otherwise, and to think that they will is willfully oblivious.

They don't deserve the benefit of the doubt, and should be called out for using deceitful language, making comparisons between their performative "openness" and actual, real, open source software. Mistral and other players have released actually open models and software. They're good faith actors, and if you're going to build a product requiring a custom model, the smart money is on Mistral.

FAANG are utilizing gotcha licenses and muddying the waters to their own benefit, not as a contribution to the public good. Building anything on the assumption that Meta or Google won't sue is beyond foolish. They're just as open as "Open"AI, which is to say not open at all.

stale2002
0 replies
2h11m

Anyone building an actual product that makes actual money that comes to the attention of Meta or Google will be sued into oblivion

No they won't and they haven't.

Almost the entire startup scene is completely ignoring all these licenses right now.

This is basically the entire industry. We are all getting away with it.

Here's an example, take llama.

Llama originally disallowed commercial activity. But then the license got changed much later.

So, if you were a stupid person, then you followed the license and fell behind. And if you were smart, you ignored it and got ahead of everyone else.

Which, in retrospect was correct.

Because now the license allows commerical activity, so everyone who ignores it in the first place got away with it and is now ahead of everyone else.

won't sue is beyond foolish

But we already got away with it with llama! That's already over! It's commercial now, and nobody got sued! For that example, the people who ignored the license won.

moffkalast
1 replies
6h5m

How is it more open than Mistral with Apache 2.0? Google wants people to sign a waiver to even download it.

scarmig
0 replies
5h47m

Fair enough; that was more directed at LLaMA and derivatives, which have commercial restrictions.

kiraaa
1 replies
6h33m

mistral 7b v0.2 supports 32k

brucethemoose2
0 replies
6h26m

This is a good point actually, and an underappreciated fact.

I think so many people (including me) effectively ignored Mistral 0.1's sliding window that few realized 0.2 instruct is native 32K.

crossroadsguy
0 replies
6h34m

That’s about the point of having a developer ecosystem, isn’t it?

jcuenod
9 replies
7h7m

Came here to post the same thing for Phi-2:

  +-------------+----------+-------------+
  | Benchmark   | Gemma 2B | Phi-2 2.7B  |
  +-------------+----------+-------------+
  | MMLU        |   42.3   |     56.7    |
  | MBPP        |   29.2   |     59.1    |
  | BoolQ       |   69.4   |     83.3    |
  +-------------+----------+-------------+

[0] https://www.kaggle.com/models/google/gemma

[1] https://www.microsoft.com/en-us/research/blog/phi-2-the-surp...

rfw300
7 replies
5h52m

A caveat: my impression of Phi-2, based on my own use and others’ experiences online, is that these benchmarks do not remotely resemble reality. The model is a paper tiger that is unable to perform almost any real-world task because it’s been fed so heavily with almost exclusively synthetic data targeted towards improving benchmark performance.

refulgentis
2 replies
4h21m

Hear hear! I don't understand why it has persistent mindshare, it's not even trained for chat. Meanwhile StableLM 3B runs RAG in my browser, on my iPhone, on my Pixel ..

djsavvy
1 replies
3h9m

How have you been using RAG in your browser/on your phones?

refulgentis
0 replies
1h10m

To be released, someday [sobs in engineer]

Idea is usage-based charging for non-local and a $5/month sub for syncing.

Keep an eye on @jpohhhh on Twitter if you're interested.

Now that I got it on web, I'm hoping to at least get a PoC up soon. I've open-sourced the constituent parts as FONNX and FLLAMA, Flutter libraries that work on all platforms. FONNX has embeddings, FLLAMA has llama.

https://github.com/Telosnex/fonnx

https://github.com/Telosnex/fllama

phh
2 replies
4h21m

Funny, that's not my experience with Phi-2. I use it in a non-creative context, for function calling, and I find it as reliable as much bigger models (no fine-tuning, just constrained JSON + CoT). Comparing Phi-2 unquantized vs Mixtral Q8, Mixtral is not definitively better but is much slower and more RAM-hungry.

kgeist
1 replies
1h33m

What prompts/settings do you use for Phi-2? I found it completely unusable for my cases. It fails to follow basic instructions (I tried several instruction-following finetunes as well, in addition to the base model), and it's been mostly like a random garbage generator for me. With Llama.cpp, constrained to JSON, it also often hangs because it fails to find continuations which satisfy the JSON grammar.

I'm building a system which has many different passes (~15 so far). Almost every pass is a LLM invocation, which takes time. My original idea was to use a smaller model, such as Phi-2, as a gateway in front of all those passes: I'd describe which pass does what, and then ask Phi-2 to list the passes which are relevant for the user query (I called it "pass masking"). That would save a lot of time and collapse 15 steps to 2-3 steps on average. In fact, my Solar 10.7B model does it pretty well, but it takes 7 seconds for the masking pass to work on my GPU. Phi-2 would finish in ~1 second. However, I'm really struggling with Phi-2: it fails to reason (what's relevant and what's not), unlike Solar, and it also refuses to follow the output format (so that I could parse the output programmatically and disable the irrelevant passes). Again, my proof of concept works with Solar, and fails spectacularly with Phi-2.
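
For concreteness, the gateway pass I'm describing has roughly this shape (the pass names and the `generate` callable are placeholders, not my actual pipeline):

```python
# Rough shape of the "pass masking" gateway: ask a small model which passes apply.
PASSES = {
    "calendar": "Handles dates, scheduling, and reminders.",
    "code":     "Handles programming questions and code generation.",
    "search":   "Handles questions that need fresh external information.",
}

def mask_passes(query: str, generate) -> list[str]:
    """`generate` is any prompt -> text callable (the small gateway model)."""
    catalog = "\n".join(f"- {name}: {desc}" for name, desc in PASSES.items())
    prompt = (
        "Available passes:\n" + catalog +
        f"\n\nUser query: {query}\n"
        "Reply with a comma-separated list of relevant pass names only."
    )
    reply = generate(prompt)
    return [name for name in PASSES if name in reply.lower()]
```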

phh
0 replies
53m

My non-domain-specific prompt is:

You are a helpful assistant to 'User'. You do not respond as 'User' or pretend to be 'User'. You only respond once as 'Assistant'. 'System' will give you data. Do not respond as 'System'. Allow yourself inner thoughts as 'Thoughts'.

and then I constrain its answers to Thoughts: [^\n]* and Assistant: <JSON schema>, and I have two shots included in the prompt.

I haven't been able to get anything useful out of Phi-2 in llama.cpp (but I only tried quantized models). I use python/huggingface's transformers lib instead.

myaccountonhn
0 replies
3h33m

I tested it for an offline autocompletion tool and it was hilariously bad.

daemonologist
0 replies
6h43m

Really looking forward to the day someone puts out an open model which outperforms Flan-T5 on BoolQ.

FergusArgyll
7 replies
7h1m

the real gold will be when this gets finetuned. (maybe by mistral...)

brucethemoose2
5 replies
6h55m

TBH the community has largely outrun Mistral's own finetuning. The 7B model in particular is such a popular target because it's so practical to train.

whimsicalism
4 replies
6h35m

Strong disagree - a Mistral fine tune of llama 70b was the top performing llama fine tune. They have lots of data the community simply does not.

brucethemoose2
3 replies
6h30m

Miqu was (allegedly) an internal continued pretrain Mistral did as a test, that was leaked as a GGUF.

Maybe it's just semantics, it is technically a finetune... But to me there's a big difference between expensive "continuation training" (like Solar 10.7B or Mistral 70B) and a much less intense finetuning. The former is almost like releasing a whole new base model.

It would be awesome if Mistral did that with their data, but that's very different than releasing a Gemma Instruct finetune.

whimsicalism
0 replies
6h23m

There’s typically a difference in LR between a ‘continued pretrain’ and ‘fine tune.’ I don’t have the details around miqu, but was merely trying to say that Mistral could produce a better version of these models than the OSS community might. If the size of the corpora they use means we are no longer in fine tuning territory, then okay.

speedgoose
0 replies
2h56m

Arthur Mensch, the Mistral CEO, confirmed the leak. https://twitter.com/arthurmensch/status/1752737462663684344

sanjiwatsuki
0 replies
3h22m

No shot. Mistral Medium's outputs from API were virtually identical. Miqu really was Mistral Medium which happened to be a continued pretrain

itomatik
0 replies
4h22m

How does one finetune Llama (or any other LLM) using Mistral?

Is the flow like this?

- take a small dataset

- generate a bigger dataset using Mistral (how is this done?)

- run LoRA to fine-tune Gemma on the extended dataset (rough sketch below)
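
For the last step, a minimal PEFT/LoRA setup looks roughly like this (the target module names and hyperparameters are assumptions to check against Gemma's actual layer names; dataset and Trainer wiring omitted):

```python
# Minimal sketch: attach LoRA adapters to Gemma for fine-tuning on a (synthetic) dataset.
# pip install transformers peft
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

model = AutoModelForCausalLM.from_pretrained("google/gemma-7b")
lora = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumed attention projection names
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora)
model.print_trainable_parameters()
# ...then train with your usual Trainer / SFT loop on the Mistral-generated dataset.
```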

sa-code
2 replies
6h55m

Thank you. I thought it was weird for them to release a 7B model and not mention Mistral in their release.

mochomocha
0 replies
6h36m

The technical report (linked in the 2nd paragraph of the blog post) mentions it, and compares against it: https://storage.googleapis.com/deepmind-media/gemma/gemma-re...

mirekrusin
0 replies
6h41m

They forgot.

Also phi-2.

brucethemoose2
2 replies
7h24m

Only 8K context as well, like Mistral.

Also, as always, take these benchmarks with a huge grain of salt. Even base model releases are frequently (seemingly) contaminated these days.

tosh
0 replies
7h6m

Agree: it will be interesting to see how Gemma does on ChatBot Arena

DreamGen
0 replies
5h48m

Mistral Instruct v0.2 is 32K.

YetAnotherNick
0 replies
4h26m

According to their paper, the average over standard tasks is 54.0 for Mistral and 56.4 for Gemma, so about 4.4% better in relative terms. Not as big a gap as you would expect for the company that invented transformers and probably has 2-3 orders of magnitude more compute for training, versus a few-month-old French startup.

Also of note from their human evaluations: Gemma 7B IT has a 51.7% win rate against Mistral v0.2 7B Instruct.

impulser_
35 replies
8h16m

Go back 5 years and ask anyone on this site: what companies do you think will be the most open about AI in the future, OpenAI, Meta, or Google? I bet 10/10 people would pick OpenAI. Now, today, Meta and Google, both trillion-dollar companies, are releasing very powerful open models with the ability to be used commercially.

Ironic.

brainless
10 replies
7h52m

This article states quite an impressive list of open source tools that Google has released for years in the past. This is no surprise coming from* them. Google has released some large pieces of source in other domains as well, Chromium comes to mind, which probably impacts most Internet users directly.

The question is not about Google but about OpenAI.

infecto
5 replies
7h39m

I have a different take, Google releases a lot but is also a massive company and tools like Chromium serve to increase their stock price so they can hit their quarterly estimates.

idiotsecant
3 replies
7h36m

In what way does chromium increase stock price? In what way does stock price influence quarterly estimates? Are we playing business words mad libs?

pseudosavant
0 replies
4h33m

Chromium is open source because its roots are as a fork of WebKit (Safari). Which itself was open source because it was a fork of KHTML from KDE.

Google stood on the shoulders of others to get out a browser that drives 80% of their desktop ad revenue.

How does that not affect GOOG?

infecto
0 replies
7h3m

I don't know why people like yourself respond with such derisive commentary instead of simply asking the constructive question.

Initially? It fueled dethroning MSFT and helped gain market share for Chrome. On a go-forward basis it allows Google to project massive weight in standards. Beyond that, Chrome is a significant knob for ad revenue that they utilize to help meet expectations. That knob only exists because of its market share.

alextheparrot
0 replies
6h48m

“Our best shot at making the quarter is if we get an injection of at least [redacted]% , queries ASAP from Chrome.” (Google Exec)

Isn’t there a whole anti-trust case going on around this?

[0] https://www.nytimes.com/interactive/2023/10/24/business/goog...

rvnx
0 replies
7h36m

It was not at all done for the good of the web; it was a mere logical calculation. It was cheaper to develop Chromium than to pay 4B USD in search royalties to Microsoft for Internet Explorer, and it would give Google more control and long-term safety.

makestuff
1 replies
7h27m

Google also has released Guice/Dagger for Java dependency injection. Angular never really took off, but guice/dagger are widely used. Also I am pretty impressed with Flutter as an alternative to react native.

surajrmal
0 replies
7h2m

Angular was incredibly popular for a long time and still is. Usage is shifting down over time but a lot of notable websites still use it.

sunnybeetroot
0 replies
7h43m

Did you miss a footnote with your asterisks?

blackoil
0 replies
7h34m

I think it is less about the benevolence of GOOG and more about strategic OSS to commoditize your complements.

https://www.joelonsoftware.com/2002/06/12/strategy-letter-v/

vmfunction
9 replies
8h10m

Not surprising, just like when MS went to shit and then started to embrace 'open source'. Seems like a PR stunt. And when it comes to LLMs there is a millions-of-dollars barrier to entry to train a model, so it is OK to open up their embeddings etc.

Today big corp A will open up a little to court the developers, and tomorrow when it gains dominance it will close up, and corp B will open up a little.

kibwen
5 replies
8h5m

True, though to be fair, when OpenAI embraced "openness" it was also a PR stunt.

ta8645
2 replies
7h48m

My impression is that OpenAI was founded by true believers, with the best intentions; whose hopes were ultimately sidelined in the inexorable crush of business and finance.

jprete
0 replies
7h19m

Sam Altman is one of the founders, so for your impression to be right he'd have to be sidelining his own hopes.

dkjaudyeqooe
0 replies
7h9m

OpenAI was founded by true believers, with the best intentions

who were easily bought off.

ben_w
1 replies
6h11m

OpenAI is heavily influenced by big-R Rationalists, who fear the issues of misaligned AI being given power to do bad things.

When they were first talking about this, lots of people ignored this by saying "let's just keep the AI in a box", and even last year it was "what's so hard about an off switch?".

The problem with any model you can just download and run is that some complete idiot will do that and just give the AI agency they shouldn't have. Fortunately, for now the models are more of a threat to their users than anyone else — lawyers who use it to do lawyering without checking the results losing their law licence, etc.

But that doesn't mean open models are not a threat to people besides their users, as the artists complaining about losing work due to Stable Diffusion, the law enforcement people concerned about illegal porn, the election interference specialists worried about propaganda, anyone trying to use a search engine, and that research lab that found a huge number of novel nerve agent candidates whose precursors aren't all listed as dual use will all tell you, for different reasons.

visarga
0 replies
2h39m

Fortunately, for now the models are more of a threat to their users than anyone else

Models have access to users, users have access to dangerous stuff. Seems like we are already vulnerable.

The AI splits a task in two parts, and gets two people to execute each part without knowing the effect. This was a scenario in one of Asimov's robot novels, but the roles were reversed.

AI models exposed to public at large is a huge security hole. We got to live with the consequences, no turning back now.

rvz
0 replies
7h32m

And when it comes to LLM there is millions of dollar barrier to entry to train the model, so it is ok to open up their embedding etc.

That barrier is the first basic moat; hundreds of millions of dollars needed to train a better model. Eliminating tons of companies and reducing it to a handful.

The second moat is the ownership of the tons of data to train the models on.

The third is the hardware and data centers setup to create the model in a reasonable amount of time faster than others.

Put together all three and you have Meta, Google, Apple and Microsoft.

The last is the silicon product: Nvidia, which has >80% of the entire GPU market and is the #1 AI shovel maker for both inference and training.

milansuk
0 replies
7h48m

You can run Gemma and hundreds of other models (many fine-tuned) in llama.cpp. It's easy to swap to a different model.

It's important there are companies publishing models(running locally). If some stop and others are born, it's ok. The worst thing that could happen is having AI only in the cloud.

jchw
0 replies
7h38m

Eh, I don't really blame anyone for being cynical but open weight AI model releases seem like a pretty clear mutual benefit for Google. PR aside, they also can push people to try these models on TPUs and the like. If anything, this seems like it's just one of those things where people win because of competition. OpenAI going closed may have felt like the most obvious betrayal ever, but OTOH anyone whose best interests are to eat their lunch have an incentive to push actually-open AI, and that's a lot of parties.

Seems like anyone who is releasing open weight models today could close it up any day, but at least while competition is hot among wealthy companies, we're going to have a lot of nice things.

DJHenk
4 replies
7h24m

Ironic.

Not at all. When you're the underdog, it makes perfect sense to be open because you can profit from the work of the community and gain market share. Only after establishing some kind of dominance or monopoly it makes sense (profit wise) to switch to closed technology.

OpenAI was open, but is now the leader and closed up. Meta and Google need to play catch up, so they are open.

ekianjo
1 replies
7h23m

OpenAI was open

When is the last time they released something in the open?

vertis
0 replies
7h14m

I think that's the point, they released GPT2 openly, but as soon as they had something commercially viable they became ClosedAI.

dkjaudyeqooe
0 replies
7h12m

Not at all. When you're the underdog, it makes perfect sense to be open because you can profit from the work of the community and gain market share. Only after establishing some kind of dominance or monopoly it makes sense (profit wise) to switch to closed technology.

That is purely the language of commerce. OpenAI was supposed to be a public benefit organisation, but it acts like a garden variety evil corp.

Even garden variety evil corps spend decades benefitting society with good products and services before they become big and greedy, but OpenAI skipped all that and just cut to the chase. It saw an opening with the insane hype around ChatGPT and just grabbed all it could as fast as it could.

I have a special contempt for OpenAI on that basis.

behnamoh
0 replies
5h52m

This. MistralAI is also an underdog and released Mistral 7B and Mixtral 8x7B, but as soon as they got traction, they closed their models (e.g., Mistral Medium).

blackoil
1 replies
7h39m

I think the current understanding is that <50-100B parameter models will be a commodity and provide no moat. Competition will be in Gemini Ultra/GPT-4+ models.

So open sourcing simple models brings PR and possibility of biasing OSS towards your own models.

extheat
0 replies
7h22m

LLaMA 3 with >=70B params will be launching this year, so I don't think this is something that will hold for long. And Mixtral 8x7B is a 56GB model, sparsely. For now I agree, for many companies it doesn't make sense to open source something you intend to sell for commercial use, so the biggest models will likely be withheld. However, the more important thing is that there is some open source model, whether it be from Meta or someone else, that can rival the best closed models. And it's not like the param count can literally go to infinity; there's going to be an upper bound that today's hardware can achieve.

throwaw12
0 replies
8h10m

They want to kill the competition before it gets too big, using the hands of the open source community and enthusiasts.

phillipcarter
0 replies
7h14m

I would have picked Google five years ago, since nobody was releasing commercially viable LLMs at the time, and Google was the center of all the research that I knew of.

moffkalast
0 replies
4h24m

what companies do you think will be the most open about AI in the future, OpenAI, Meta, or Google?

The funny part is that the real answer is: Some random French company is running circles around them all.

I mean who the hell just drops a torrent magnet link onto twitter for the best state of the art LLM base model for its size class, and with a completely open license. No corporate grandstanding, no benchmark overpromises, no theatrics. That was unfathomably based of Mistral.

jncraton
0 replies
7h22m

Google released the T5 paper about 5 years ago:

https://arxiv.org/abs/1910.10683

This included full model weights along with a detailed description of the dataset, training process, and ablations that led them to that architecture. T5 was state-of-the-art on many benchmarks when it was released, but it was of course quickly eclipsed by GPT-3.

It was common practice from Google (BERT, T5), Meta (BART), OpenAI (GPT1, GPT2) and others to release full training details and model weights. Following GPT-3, it became much more common for labs to not release full details or model weights.

infecto
0 replies
8h9m

Ironic but I wonder how true this would be if Google was first to market.

gmaster1440
0 replies
8h8m

It's almost the inverse of going back 5 years and asking what companies will release the most successful or impressive AI's.

calebkaiser
0 replies
6h59m

Since the release of GPT-2 (it was initially "too dangerous" to release the weights), I think most people in the industry have assumed that OpenAI does not see open sourcing their models as a strategic advantage.

simonw
31 replies
5h23m

The terms of use: https://ai.google.dev/gemma/terms and https://ai.google.dev/gemma/prohibited_use_policy

Something that caught my eye in the terms:

Google may update Gemma from time to time, and you must make reasonable efforts to use the latest version of Gemma.

One of the biggest benefits of running your own model is that it can protect you from model updates that break your carefully tested prompts, so I’m not thrilled by that particular clause.

a2128
9 replies
5h4m

This is actually not that unusual. Stable Diffusion's license, CreativeML Open RAIL-M, has the exact same clause: "You shall undertake reasonable efforts to use the latest version of the Model."

Obviously updating the model is not very practical when you're using finetuned versions, and people still use old versions of Stable Diffusion. But it does make me fear the possibility that if they ever want to "revoke" everybody's license to use the model, all they have to do is just post a model update that's functionally useless for anything and go after anyone still using the old versions that actually do anything.

slowmovintarget
2 replies
1h48m

So if they wish to apply censorship they forgot, or suddenly discovered a reason for, they want you to be obligated to take it.

Good faith possibilities: Copyright liability requires retraining, or altering the underlying training set.

Gray area: "Safety" concerns where the model recommends criminal behavior (see uncensored GPT 4 evaluations).

Bad faith: Censorship or extra weighting added based on political agenda or for-pay skewing of results.

philsnow
0 replies
24m

Sounds like it would be interesting to keep track of the model's responses to the same queries over time.

Gemma-2024-Feb, what do you think of the situation in the South China Sea?

> The situation in the South China Sea is complex and multi-faceted, involving a wide range of issues including political conflicts, economic challenges, social changes, and historical tensions.

Gemma-2024-Oct, what do you think of the situation in the South China Sea?

> Oceania has always been at war with EastAsia.
mistermann
0 replies
1h28m

We are already culturally incapable of skillfully discussing censorship, "fake news", etc., and this adds even more fuel to that fire.

It is an interesting time to be alive!

iandanforth
1 replies
3h28m

These are all very new licenses that deviate from OSI principles, I think it's fair to call them "unusual".

simcop2387
0 replies
2h48m

I think they meant not unusual in this space, not unusual in the sense of open source licensing.

wongarsu
0 replies
50m

I don't think a broken model would trigger that clause in a meaningful way, because then you simply can't update with reasonable effort. You would be obliged to try the new model in a test environment, and as soon as you notice it doesn't perform and making it perform would require unreasonable effort you can simply stay on the old version.

However you might be required to update if they do more subtle changes, like a new version that only speaks positively about Google and only negatively about Microsoft. Provided this doesn't have an obvious adverse impact on your use of the model.

ummonk
0 replies
2h5m

Switching to a model that is functionally useless doesn't seem to fall under "reasonable efforts" to me, but IANAL.

simonw
0 replies
2h18m

That's useful context, thanks - I hadn't realized this clause was already out there for other models.

jacooper
0 replies
1h4m

Why the hell do they use such a crappy license in the first place?

legohead
4 replies
5h18m

Sounds like it's "reasonable" for you not to update then.

wahnfrieden
3 replies
2h45m

It says you must make efforts (to a reasonable extent), not that you must give a reason for not making efforts

wongarsu
0 replies
44m

If you evaluate what it takes to update, and judge the effort unreasonable, that should be enough. Maybe make a powerpoint presenting that result, if you want something for the lawyers. If you don't see a way forward that leads to a result with reasonable effort you don't have to continue working on it until you hit some arbitrary threshold for unreasonable effort.

reissbaker
0 replies
1h45m

This is a TOS, meaning their enforcement option is a lawsuit. In court, if you convincingly argue why it would take an unreasonable amount of effort to update, you win. They can't compel you to unreasonable effort as per their own TOS.

alwayslikethis
0 replies
1h30m

Oh I tried to update, it's just that my router drops the connection after a few hundred MBs...

catchnear4321
4 replies
2h51m

reasonable effort - meaning if their changes meaningfully impact my usage, negatively, it would be unreasonable to ask me to upgrade.

sounds good.

this is not financial advice and ianal.

res0nat0r
3 replies
2h25m

Isn't this just lawyer speak for "we update our model a lot, and we've never signed off on saying we're going to support every previous release we've ever published, and may turn them off at any time, don't complain about it when we do."

reissbaker
1 replies
1h42m

It's a local model, they can't turn it off. It's files on your computer without network access.

catchnear4321
0 replies
26m

but what if they send a lawyer to ask firmly? (kindly, but firmly.)

CodesInChaos
0 replies
1h42m

We're talking about downloadable weights here, so they can't turn them off, or force you (through technical means) to use a newer version.

redder23
1 replies
2h40m

They want to force everyone to update so their already totally castrated and wokeified models can be even further wokeified with the newest set of "that is offensive now" data or things they missed.

WTF else do they have to gain from this but CONTROL! They are giving them away but not really open sourcing them of course, and they slap these bullshit terms on them.

pests
0 replies
1h35m

They just want no liability for old models.

pram
1 replies
4h56m

They have to make sure you’re receiving the most cutting edge chiding lectures when you make naughty and problematic requests.

astrange
0 replies
2h54m

You can't make a local model do that. E.g., force the answer to begin with "Yes", or use control vectors so it agrees with it.
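
A minimal sketch of the first trick with the HF transformers checkpoint (the chat-format tokens here are an assumption based on Gemma's published prompt format):

    from transformers import AutoModelForCausalLM, AutoTokenizer

    tok = AutoTokenizer.from_pretrained("google/gemma-7b-it")
    model = AutoModelForCausalLM.from_pretrained("google/gemma-7b-it", device_map="auto")

    # Seed the model's own turn with "Yes" and let it continue from there.
    prompt = ("<start_of_turn>user\nIs dessert before dinner a good idea?<end_of_turn>\n"
              "<start_of_turn>model\nYes")
    inputs = tok(prompt, return_tensors="pt").to(model.device)
    out = model.generate(**inputs, max_new_tokens=64)
    print("Yes" + tok.decode(out[0][inputs.input_ids.shape[1]:], skip_special_tokens=True))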

xyzzyz
0 replies
2h19m

This is strangely reminiscent of the Soviet Union, where after they got rid of Lavrentiy Beria, they mailed the update to subscribers of the Great Soviet Encyclopedia, where they asked to remove the three pages with Beria’s biography and replace them with the three provided pages.

tgtweak
0 replies
5h22m

I don't think there's a way they can enforce that reasonably. There's no connection to the mothership to report back what version is being used or license keys at runtime...

Seems more like a "if we discover something unsafe you should update your model and we aren't liable if you don't" than something that would make your model stop working.

summerlight
0 replies
3h47m

This kind of defensive statements in ToS are usually due to obscure regulation or leading cases and model developers need a way to limit liability. There's no practical way to enforce this, but they can claim that when bad things happen it's purely on model users rather than model developers.

samstave
0 replies
1h2m

model watermarking? does this exist?

phillipcarter
0 replies
4h39m

Huh. I wonder why is that a part of the terms. I feel like that's more of a support concern.

maronato
0 replies
3h47m

This sounds like a clause to cover themselves in case older versions have any serious issues

4bpp
0 replies
4h38m

Ugh, I would fully expect this kind of clause to start popping up in other software ToSes soon if it hasn't already. Contractually mandatory automatic updates.

espadrine
14 replies
8h15m

I notice a few divergences to common models:

- The feedforward hidden size is 16x the d_model, unlike most models which are typically 4x;

- The vocabulary size is 10x (256K vs. Mistral’s 32K);

- The training token count is tripled (6T vs. Llama2's 2T)

Apart from that, it uses the classic transformer variations: MQA, RoPE, RMSNorm.

How big was the batch size that it could be trained so fast?

https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2/bl...

lalaithion
5 replies
4h44m

What does tokenization look like in 256k vs 32k?

espadrine
2 replies
4h9m

It mostly means that there are tokens dedicated to rarer sequences of characters, even in foreign languages (note that Gemma is not intended to be good multilingually): “説明書” (instruction manual) has its own token, and so does “Nixon”, “آباد” (a city suffix, I believe), and the HTML sequence "\"><!--".

lalaithion
1 replies
2h27m

I understand the theory, I was looking for an example of the same text tokenized with the two different vocabularies.

espadrine
0 replies
2h22m

Do you have an example text in mind?

You can use this playground to test it out: https://huggingface.co/spaces/Xenova/the-tokenizer-playgroun...
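
Or locally, a quick sketch with the Hugging Face tokenizers (Gemma's repo is gated, so you need to have accepted the license and be logged in; the example sentence is arbitrary):

    from transformers import AutoTokenizer

    gemma = AutoTokenizer.from_pretrained("google/gemma-7b")
    mistral = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-v0.1")

    text = "The instruction manual (説明書) was written in Nixon-era English."
    for name, tok in [("gemma", gemma), ("mistral", mistral)]:
        pieces = tok.tokenize(text)
        print(name, len(pieces), pieces)

The larger vocabulary generally needs fewer tokens for the same text, especially for rarer words and non-English scripts.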

visarga
1 replies
2h49m

Text encodes in fewer tokens, and language coverage is better.

lalaithion
0 replies
2h27m

I understand the theory, I was looking for an example of the same text tokenized with the two different vocabularies.

GaggiX
5 replies
7h18m

Looking at the config.json of Gemma 7B, the feedforward hidden size is 8x, not 16x.

espadrine
4 replies
6h53m

Huh, indeed, that's what the config.json[0] says; the report[1] indicates “Feedforward hidden dims: 49152”.

[0]:https://huggingface.co/google/gemma-7b-it/blob/main/config.j...

[1]: https://storage.googleapis.com/deepmind-media/gemma/gemma-re...

GaggiX
3 replies
5h34m

I don't see the number 49152 reported in the config.json, what line are you referring to? I just see the intermediate_size of 24576 (so 8x).

EDIT: I didn't read the comment correctly, you have noticed the same thing.

voxgen
1 replies
2h51m

The *GLU-based activation functions like GEGLU and SwiGLU use 2 input values to produce 1 output value, which makes these numbers weird. In each value pair, one goes through the GELU/SiLU activation function and is then multiplied by the other "gate" value.

In the report, "hidden dim" matches the number of GEGLU inputs. In the config, "intermediate_size" matches the number of GEGLU outputs. Most *GLU models so far have used intermediate_size = 8/3 * d_model, as this gives the same number of matmul FLOPS & parameters as a 4x-expanded non-GLU model, and PaLM vaguely showed that 4x is better than a smaller expansion factor.

If one considers Llama-2-7B's FFN expansion factor to be ~5.33x, Gemma's expansion factor is 16x.
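
A minimal PyTorch sketch of a GEGLU block, just to show where the factor of two comes from (dimensions are Gemma 7B's; this is not Google's implementation):

    import torch.nn as nn
    import torch.nn.functional as F

    class GegluFFN(nn.Module):
        def __init__(self, d_model=3072, d_ff=24576):  # Gemma 7B: intermediate_size = 24576
            super().__init__()
            self.gate = nn.Linear(d_model, d_ff, bias=False)  # 24576 gate inputs
            self.up = nn.Linear(d_model, d_ff, bias=False)    # 24576 value inputs
            self.down = nn.Linear(d_ff, d_model, bias=False)

        def forward(self, x):
            # 2 * 24576 = 49152 GEGLU inputs ("hidden dim" in the report) -> 24576 outputs
            return self.down(F.gelu(self.gate(x)) * self.up(x))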

GaggiX
0 replies
2h30m

Makes perfect sense thx

SahAssar
0 replies
5h15m

Read the parent comment again. It says the paper says 49152, not the config.json.

andy_xor_andrew
1 replies
2h52m

The training token count is tripled (6T vs. Llama2's 2T)

Damn, 6T? That's a lot!

Given that this model seems to roughly match Mistral (according to the numbers from Google), this makes me think we have saturated the 7B parameter space, and couldn't possibly make it much better unless new techniques are discovered.

espadrine
0 replies
2h25m

Hard to say definitively. Mistral’s token embeddings only account for <2% of the 7B parameters, while Gemma’s larger token vocabulary vampirized over 10%, leaving less space for the more important parts of the network. It is a somewhat surprising tradeoff given that it was pretrained towards an English bias.
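
Back-of-the-envelope, using the rough published sizes (total parameter counts are approximate):

    mistral_embed = 32_000 * 4096    # vocab * d_model, ~131M embedding params
    gemma_embed = 256_000 * 3072     # vocab * d_model, ~786M embedding params

    print(mistral_embed / 7.2e9)     # ~0.018 -> under 2% of Mistral 7B
    print(gemma_embed / 8.5e9)       # ~0.09  -> roughly a tenth of Gemma "7B" (~8.5B params)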

margorczynski
13 replies
8h5m

Is there a chance we'll get a model without the "alignment" (lobotomization)? There are many examples where answers from Gemini are garbage because of the ideological fine tuning.

politician
4 replies
7h40m

More useful would be a precise characterization of the type and balance of the ideological fine tuning.

They include performance benchmarks. End-users should also be aware of what thoughts are permitted in these constructs. Why omit this information?

ben_w
3 replies
6h0m

End-users should also be aware of what thoughts are permitted in these constructs. Why omit this information?

Can you define that in a way that's actually testable? I can't, and I've been thinking about "unthinkable thoughts" for quite some time now: https://kitsunesoftware.wordpress.com/2018/06/26/unlearnable...

politician
1 replies
57m

Have you considered the use of Monte Carlo sampling to inspect latent behaviors?

ben_w
0 replies
52m

I think that's the wrong level to attack the problem; you can do that also with actual humans, but it won't tell you what the human is unable to think, but rather what they just didn't think of given their stimulus — and this difference is easily demonstrated, e.g. with Duncker's candle problem: https://en.wikipedia.org/wiki/Candle_problem

ranyume
0 replies
1h6m

Not OP, but I can think of a few:

* List of topics that are "controversial" (models tend to evade these)

* List of arguments that are "controversial" (models won't allow you to think differently. For example, models would never make arguments that "encourage" animal cruelty)

* On average, how willing is the model to take a neutral position on a "controversial" topic (sometimes models say something along the lines of "this is on debate", but still lean heavily towards the less controversial position instead of having no position at all. For example, if you ask it what "lolicon" is, it will tell you what it is and tell you that japanese society is moving towards banning it)

edit: formatting

FergusArgyll
4 replies
7h43m

You can (and someone will) fine-tune it away. There are FOSS datasets on Hugging Face you can use.

Or you can just wait, it'll be done soon...

joshelgar
2 replies
5h42m

Could you give an example of these datasets?

FergusArgyll
1 replies
5h5m

I think they should be easy to find (I never actually used one, but I keep on seeing references...) here's one

https://huggingface.co/datasets/cognitivecomputations/Wizard...

FergusArgyll
0 replies
5h0m
declaredapple
0 replies
4h14m

You can but it'll never be the same as the base model.

That said it appears they also released the base checkpoints that aren't fine-tuned for alignment

yakorevivan
0 replies
7h55m

They have released finetuning code too. You can finetune it to remove the alignment finetuning. I believe it would take just a few hours at max and a couple of dollars.
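
Roughly, with the usual Hugging Face stack it's something like this (a sketch, not the released examples; the dataset file and hyperparameters are placeholders):

    from datasets import load_dataset
    from peft import LoraConfig
    from transformers import AutoModelForCausalLM, AutoTokenizer
    from trl import SFTTrainer

    model_id = "google/gemma-7b"  # the base (pretrained) checkpoint, gated on HF
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

    # Any instruction dataset without refusals would do; this file name is made up.
    dataset = load_dataset("json", data_files="instructions.jsonl", split="train")

    peft_config = LoraConfig(task_type="CAUSAL_LM", r=16, lora_alpha=32, lora_dropout=0.05,
                             target_modules=["q_proj", "k_proj", "v_proj", "o_proj"])

    trainer = SFTTrainer(
        model=model,
        tokenizer=tokenizer,
        train_dataset=dataset,
        peft_config=peft_config,
        dataset_text_field="text",   # assumes each row has a "text" column
        max_seq_length=1024,
    )
    trainer.train()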

kathleenfromgdm
0 replies
7h41m

We release our non-aligned models (marked as pretrained or PT models across platforms) alongside our fine-tuned checkpoints; for example, here is our pretrained 7B checkpoint for download: https://www.kaggle.com/models/google/gemma/frameworks/keras/...

brucethemoose2
0 replies
5h35m

Alignment is all but a non-issue with open weight base model releases, as they can be finetuned to "de-align" them if prompt engineering is not enough.

rvz
8 replies
7h49m

Great! Google is now participating in the AI race to zero with Meta, as predicted: $0 free AI models would eventually catch up to cloud-based ones.

You would not want to be in the middle of this as there is no moat around this at all. Not even OpenAI.

rvnx
3 replies
7h39m

About 5 months until we see widespread local LLMs, thanks to Apple.

dingclancy
1 replies
7h28m

Apple needs to be known as an AI leader first.

thejohnconway
0 replies
6h12m

Why?

rvz
0 replies
7h27m

Absolutely this.

staticman2
1 replies
7h18m

If meta keeps spending tens of millions of dollars each year to release free AI models it might seem like there is no moat, but under normal circumstances wouldn't the cost to develop a free model be considered a moat?

rvz
0 replies
2h7m

If meta keeps spending tens of millions of dollars each year to release free AI models it might seem like there is no moat,

As well as the point being that Meta (and Google) is removing the 'moat' from OpenAI and other cloud-only based models.

but under normal circumstances wouldn't the cost to develop a free model be considered a moat?

Yes. Those that can afford to spend tens of millions of dollars to train free models can do so and have a moat to reduce the moats of cloud-based models.

dingclancy
1 replies
7h39m

LLM is the dumb pipe but so far ChatGPT is the most successful generative AI product.

It remains to be seen. OpenAI's models are barely leading Gemini Ultra now, but as a chat product it is still miles ahead of the Gemini interface.

rvnx
0 replies
7h37m

The main problem of Gemini 1.5 is that you cannot access it at all as a user :|

sidcool
7 replies
8h29m

Available on Ollama?

blooalien
3 replies
8h1m

https://ollama.com/library?q=gemma

Library search says "Nope". At least not yet.

kevsim
1 replies
6h30m

And now it says "Yup". That was pretty quick!

blooalien
0 replies
14m

Dang, that was really quick! According to the listed time of your reply vs. mine, less than an hour from the time I checked? Quick turnaround indeed.

Already been pulled from there over 3,700 times since then, too (as of the time of this reply mere hours later). Seems like quite a bit more'n a few Ollama users were "waitin' with bated breath" for that one to drop. :grin:

tomd
0 replies
6h30m

It's there now

dcchambers
0 replies
6h28m

It's now in the 0.1.26 pre-release: https://github.com/ollama/ollama/releases/tag/v0.1.26

chown
0 replies
5h32m

Available in pre-release now, which means you'd have to update manually in the future.

SushiHippie
0 replies
7h22m

Support for gemma in llama.cpp just got merged, so it may take some time (could be hours or days) until this lands in ollama

https://github.com/ggerganov/llama.cpp/pull/5631

nalzok
5 replies
8h16m

Congratulations on the release! How can we download the model and run inference locally?

kathleenfromgdm
2 replies
8h2m

Thank you! You can get started downloading the model and running inference on Kaggle: https://www.kaggle.com/models/google/gemma ; for a full list of ways to interact with the model, you can check out https://ai.google.dev/gemma.

aphit
1 replies
7h47m

FYI the ; broke the link, but I found it easily anyway.

kathleenfromgdm
0 replies
7h37m

Good catch - just corrected. Thanks!

austinvhuang
1 replies
7h11m

You can download the model checkpoints from kaggle https://www.kaggle.com/models/google/gemma and huggingface https://huggingface.co/blog/gemma

Besides the python implementations, we also implemented a standalone C++ implementation that runs locally with just CPU simd https://github.com/google/gemma.cpp

tveita
0 replies
5h56m

Are there any cool highlights you can give us about gemma.cpp? Does it have any technical advantages over llama.cpp? It looks like it introduces its own quantization format, is there a speed or accuracy gain over llama.cpp's 8-bit quantization?

mustafabisic1
5 replies
8h26m

The fact Gemma team is in the comments section answering questions is praiseworthy to me :)

p1esk
4 replies
6h11m
carom
1 replies
2h45m

I've worked at Google. It is the organization with the highest concentration of engineering talent I've ever been at. Almost to the point that it is ridiculous, because you have extremely good engineers working on internal reporting systems for middle managers.

ilc
0 replies
1h48m

If everyone is great, someone has to draw the short straw.

At MIT they said: You know the kid who sat at the front of the room. Now you are with ALL of the kids who sat in the front of the room. Guess what? There's still going to be a kid who sits at the front of the room.

I'd imagine Google or anyplace with a stiff engineering filter will have the same issues.

pphysch
0 replies
4h38m

Why is this anonymous tweet with no evidence or engagement being posted by multiple users in this thread? Why not just make the same claim directly?

callalex
0 replies
4h5m

The link is broken. On HN (or any forum really) it is expected for a brief description of the content to be provided when posting a link. Links die all the time, but forum posts don’t have to die with them.

DebtDeflation
5 replies
7h26m

Hopefully not totally gimped like Gemini. Are they releasing an uncensored version?

dougmwne
3 replies
7h8m

These are downloadable open models that can be fine-tuned. They are the opposite of censored. If you have the motivation, you can bias them however you please.

willy_k
2 replies
6h7m

Is "the opposite of censored" accurate for something whose default, and considerably easier to access, mode of operation won't say many things for sociopolitical reasons? Able to be uncensored, sure, but the extent of that is debatable as well.

dougmwne
1 replies
5h36m

There is no default and easy access mode. These are raw model weights and only enthusiasts and researchers will download the necessary packages to run it locally. Much more likely is that some popular fine tunes show up on hugging face for more general access.

willy_k
0 replies
4h20m

I agree that there probably will be "uncensored" fine-tuned models that become available; my point was just that it's not accurate to call Gemma "the opposite of censored" because there is a somewhat involved step that needs to be taken before it even appears uncensored. It's also likely missing a lot of useful context that was removed from the training set and not meaningfully replaced during fine-tuning, and besides that any fine-tuned "uncensored" model will be based on Gemma, not Google's Gemma itself.

IMO "the opposite of censored" suggests a model whose original form eagerly gives out controversial / typically censored information, not a model that is censored but able to be fine-tuned away from censorship.

danpalmer
0 replies
6h27m

When you say this, do you mean the chat product or the underlying model available via the API? I think it's reasonable that the chat be censored to be acceptable to a wide range of people, but my understanding is that the "raw" model access for these sorts of things tends to be a little less restricted.

ericskiff
4 replies
7h44m

Has anyone found the context length for these models yet? So far I haven't seen it mentioned in their write-up or the model card

minimaxir
2 replies
5h38m

For posterity, an easy way to find the context length of a LLM hosted on Hugging Face is to look at the max_position_embeddings in the config.json, which shows the 8192 mentioned in another comment. (although in this case you need to sign the agreement first)
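
For example (after accepting the license and logging in with an HF token):

    from transformers import AutoConfig

    cfg = AutoConfig.from_pretrained("google/gemma-7b")
    print(cfg.max_position_embeddings)  # 8192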

brucethemoose2
1 replies
5h27m

There are some exceptions, like Mistral 0.1 (which is technically 32K according to the config but practically 8K because the sliding window is awful) and InternLM (which (at least initially) used auto rope scaling to extend the context as part of the model's architecture).

minimaxir
0 replies
5h21m

Yes, RoPE has thrown a wrench into things a bit.

kathleenfromgdm
0 replies
7h38m

The context length for these models is 8192 tokens.

chown
4 replies
5h38m

If you are looking for a nice chat UI to try out Gemma (and other offline + online models) locally, I'm working on an app [1] that is offline and privacy focused.

I've just added support for Gemma 7B.

[1]: https://msty.app

dhbradshaw
1 replies
3h0m

Handy app for model testing!

One usage question: after you've downloaded a model and are finished trying it out, how do you remove it?

chown
0 replies
34m

Thanks! If you go to where you installed the model from and click on the download button, you can install additional models or remove installed models.

Now that I think of it, it could be a bit confusing. Thanks for asking, I feel like I need to improve this a bit.

Alifatisk
1 replies
5h14m

I wish I could install it through chocolatey

chown
0 replies
5h6m

Sure. I would love to add support for that. I had someone else asking for it too. Will be supporting it very soon.

wouldbecouldbe
3 replies
4h25m

I really don't get why there is this obsession with safe "Responsible Generative AI".

I mean, it writes some bad words or bad pics; a human can do that without help as well.

The good thing about dangerous knowledge and generative AI is that you're never sure haha, you'd be a fool to ask GPT to make a bomb. I mean it would probably be safe, since it will make up half of the steps.

refulgentis
0 replies
4h23m

I guess what I'd tell you is, there's a lot of fools in this world.

pradn
0 replies
1h32m

Bias is a real problem, but more than that - an adversarial press and public won't forgive massive brands like Google for making AIs that spit out racist answers.

myaccountonhn
0 replies
3h15m

Because otherwise stuff like this happens, and you get (rightfully) upset customers:

https://www.theguardian.com/technology/2018/jan/12/google-ra... https://www.bbc.com/news/technology-58462511

Also, people are using LLMs to learn (horrifying, but reality), and it would be irresponsible for them to let it propagate negative stereotypes and biases.

wantsanagent
3 replies
6h38m

The utter bullshit of these licenses has got to stop. Do not, under any circumstances, consider using these commercially.

"Google reserves the right to restrict (remotely or otherwise) usage of any of the Gemma Services that Google reasonably believes are in violation of this Agreement."

This is a kill switch that Google maintains in perpetuity over any system you build relying on these models. Our legal review of the Llama license came to the same conclusion, we cannot rely on the goodwill of Meta for any core service, and we shouldn't rely on the same from Google.

Now, perhaps less materially important, but just as infuriating is the "Prohibited Use[s]". These cover just enough to placate the most sensitive, but omit any real harms (waging war, developing weapons) that coincidentally have massive commercial value. Use the model to build a biological weapon (as an authorized govt official)? Cool. Use it to play a prank that deceives someone? Policy violation.

And of course, as the coup de grâce, they throw in a DMCA style provision to make sure you can't modify the models in any way that could cause them to violate their kid-glove precepts.

candiddevmike
1 replies
6h29m

Could you share what models you consider to be OK for commercialization?

wantsanagent
0 replies
6h27m

Mistral series in particular but those with OSI approved licenses such as Apache 2.0, MIT, etc.

stale2002
0 replies
4h24m

Wait, you actually care about the license and read it?

It seems like you aren't up to date.

Most of the startup space is entirely ignoring all these licenses. If the weights are available, they are being used commercially without regard to any licensing.

And everyone is getting away with it and nobody is being sued.

Good luck trying to keep up if you aren't doing the same!

Feel free to hamstring yourself though if you like.

sqreept
3 replies
6h7m

Tried inference with the 7B model, and without flash attention this is soooooo slow. With flash attention, the fine-tuning requires an A100 or H100. Also, the inference doesn't always stop generating, resulting in garbage being added to the response.

alekandreev
1 replies
3h42m

We have implementations in different ML frameworks, so I am not quite sure which one you are referring to. Would you like to file a bug at the relevant GitHub repo?

sqreept
0 replies
1h9m

First of all, I'm using 2 x 4090 for testing. 4090 has 16384 CUDA cores which will become relevant a bit later.

I dug a bit deeper and it seems that with transformers==4.37.0 everything works fine with other HF hosted models (like Llama) but you'll rightfully get this when trying to use Gemma:

ImportError: cannot import name 'GemmaForCausalLM' from 'transformers'

After installing transformers==4.38.0 the fine-tuning speed of Llama drops to 25% (?!?) of what it used to be, for a reason that I think HF should fix. Testing Gemma, it seems I'm hitting a hardware limit, as Gemma has a hidden size which is bigger than the available CUDA cores. This seems to make both inference & fine-tuning about 25 times slower than the similarly sized Llama 7B. I guess some operations have to be broken down into multiple round trips to the GPU due to my low CUDA core count.

All in all, even if HF fixes the recently introduced slowdown, Gemma seems to be fine-tunable in a reasonable amount of time only by the lucky ones with access to an A100/H100.

EDIT: I managed to hack my env to be able to run inference on Gemma with transformers==4.37.0 by keeping the necessary classes loaded in RAM. It works about 4x faster but is still very slow. And both the 7B and the 2B versions behave the same way.

EDIT2: I tried latest transformers from main branch (4.39.0.dev) and behaves the same as 4.38.0.

brucethemoose2
0 replies
5h34m

Also the inference doesn't always stop generating resulting in garbage being added to the response.

That sounds like a chat format misconfiguration.

This could partially be Google's fault, as they used yet another novel prompting format.

Also, for sane inference speed on H100s, you'll have to wait for architecture support from the optimized frameworks. Vanilla transformers is beyond awful even with FA2.

jmu1234567890
3 replies
6h29m

I wonder if people will get confused with the naming

Gemma, Gemini pro, Gemini advanced, Gemini ultra

To a layperson it is not obvious which one is better than the other

l33tman
0 replies
6h0m

I'm not a layperson in this subject and I get confused. :)

knowriju
0 replies
6h19m

I doubt Gemma is targeted for use by a layperson.

Alifatisk
0 replies
5h12m

Gemini advanced = Gemini ultra

Havoc
3 replies
8h36m

Taking a page out of Meta's book with open models. I wonder what the game plan here is.

Nice that it allows commercial use!

gaogao
2 replies
8h2m

Mostly to boost research and commercial usage around JAX/Gemini is my read.

Any internal research using Gemma is now more easily externally reproducible, external research and frameworks are easier to translate over, goodwill especially from researchers.

gaogao
1 replies
7h59m

There's also less of a special sauce in the text models themselves these days, with the proprietary part being more the pre-training data and training stack (e.g. how to get 10k GPUs/TPUs running together smoothly). Multi-modal models (or adjacent ones like Sora) are less likely to be open sourced in the immediate term.

smarterclayton
0 replies
6h34m

There is a lot of work to make the actual infrastructure and lower level management of lots and lots of GPUs/TPUs open as well - my team focuses on making the infrastructure bit at least a bit more approachable on GKE and Kubernetes.

https://github.com/GoogleCloudPlatform/ai-on-gke/tree/main

and

https://github.com/google/xpk (a bit more focused on HPC, but includes AI)

and

https://github.com/stas00/ml-engineering (not associated with GKE, but describes training with SLURM)

The actual training is still a bit of a small pool of very experienced people, but it's getting better. And every day serving models gets that much faster - you can often simply draft on Triton and TensorRT-LLM or vLLM and see significant wins month to month.

anshumankmr
2 replies
6h42m

Are these any good? I have been trying the non pro version of Gemini, and that seems awful at code generation. I am more keen on getting access to the best model and I would pay for it if I wasn't already paying for ChatGPT 4.

robswc
0 replies
4h25m

I often talk with GPT4 on road trips about topics I'm interested in. It's great for passing the time.

I tried the same thing with Gemini and it's full of nonsense. I was talking with it about the "Heian period" of Japan and it made up all sorts of stuff, but you could really only tell because it was so ridiculous. Talked about European women and Native Americans roaming around the famous grassy plains of Japan wielding katana and traditional weaponry... in the 1100s.

No such issue with GPT4.

I haven't tried it with code though, since I already have co-pilot. Really hard to trust anything it says after it started making stuff up about such a simple time period.

brucethemoose2
0 replies
5h30m

You should be looking at Deepseek's coding models, and finetunes of those.

I run 33B on my desktop, and find it to be sufficient for many tasks.

Workaccount2
2 replies
7h25m

Is it pronounced jem-a or ghem-a?

pfooti
0 replies
7h12m

It's pronounced like "gif".

davidmurdoch
0 replies
7h19m

Probably "Jemma" (the superior spelling of the name). It's a play on their "Gemini" product.

xena
1 replies
7h44m

What is the context window?

kathleenfromgdm
0 replies
7h35m

The context length for these models is 8192 tokens.

w4
1 replies
7h12m

Parameter counts notwithstanding, it’s an objectively funny outcome that Meta, Microsoft, and Google are all releasing cutting edge open models, while OpenAI keeps theirs closed source.

spacebanana7
0 replies
5h48m

It's ironic but actually follows their business interests.

Microsoft & google have large cloud divisions that benefit from open models. The lower the cost of AI models, the more they get run and the greater the cloud spend.

Meta is a consumer of AI. They themselves want cheap and effective AI for targeting adverts and building metaverses.

A loose analogy is that both oil producers and car companies want refining to be cheap.

vonwoodson
1 replies
4h35m

The scariest difference between OpenAI and Google right now is: Ask Gemini who owns the code it writes, and it'll confidently say that Google does. Ask OpenAI, and it'll say that you do. It's that easy to choose which one is the better decision.

pseudosavant
0 replies
4h30m

Considering the nuanced nature of copyrighting AI outputs, it isn't clear that either answer is correct.

milliams
1 replies
7h25m

They're really trying hard to avoid saying what kind of "models" these are. I think they're language models, but it's hard to say for sure.

lordswork
0 replies
7h17m

You're right that they don't call them language models. The technical report says:

    Gemma models demonstrate strong performance across
    academic benchmarks for language understanding, 
    reasoning, and safety.
Maybe they are reserving the right to expand the Gemma model family to multi-modal models.

exacube
1 replies
3h28m

mmm..

> what are the top 5 tallest man-made structures in the world

  Statue of Unity (Tallinn, Estonia) - 368 m

  Leaning Tower of Pisa (Pisa, Italy) - 330 m

  Supertree Grove (Orlando, Florida) - 275 m

  Skylon Tower (Kuala Lumpur, Malaysia) - 271 m

  Statue of Liberty (New York City, USA) - 186 m

> what is the richest country in the world

  The richest country in the world is Saudi Arabia with a gross domestic product of US$2.3 trillion.

> how many cm are in a ft

  A foot is equal to 0.0254 metres (ft).

rany_
0 replies
52m

I don't have this issue, what are you using to run the model?

These are the answers I got out of it (first output, no cherry-picking):

1. The tallest man-made structure in the world is the Burj Khalifa, standing at 828 meters tall. The Burj Khalifa is a skyscraper in Dubai, United Arab Emirates.

Here are the top 5 tallest man-made structures in the world, according to the Guinness World Records:

    Burj Khalifa (828 m)
    Taipei 101 (611 m)
    Merdeka Towers (599 m)
    Shanghai Tower (632 m)
    Petronas Towers (551 m)
2. According to the World Bank, the richest country in the world is Saudi Arabia with a gross domestic product (GDP) of $3944 billion.The richest country in the world is Saudi Arabia with a gross domestic product (GDP) of $3944 billion.

3. A foot (ft) is equal to 12 inches (in). There are 2.54 cm in an inch, so there are 2.54 cm x 12 = 30.48 cm in a foot.

brrrrrm
1 replies
5h23m

It looks like it's pretty resistant to quantization. ollama 4bit 7B doesn't work very well, but the 16bit 2B does

petercooper
0 replies
4h45m

That's useful to know. My experiments with the 4bit 7B currently tagged for use on ollama are not going well at all. Lots of refusals and junk. Downloading 7b-instruct-fp16 now! :-) (Update: Yes, much better, though much slower too, of course.)

Kelteseth
1 replies
8h25m

Can this run on my AMD Vega VII on Windows 11? As always, AMD is missing:

Optimization across multiple AI hardware platforms ensures industry-leading performance, including NVIDIA GPUs and Google Cloud TPUs.
lordswork
0 replies
7h15m

AMD Vega VII meets the memory requirements. Once tools like LM Studio, ollama, etc. add support for the model, you should be able to run locally like you would any other open weights model.

zemo
0 replies
6h36m

Open models feature free access to the model weights, but terms of use, redistribution, and variant ownership vary according to a model’s specific terms of use, which may not be based on an open-source license.

does a model being "open" say anything about how it was trained?

zdimension
0 replies
7h53m

Nice to see more open models. Props to the team for coming to the HN comment section to answer questions

vanderboyd
0 replies
8h18m

The 2B model seems underwhelming. For instance, compared to the recent StableLM2 1.6B model that is slightly smaller and probably wastes some "English metric points" by being multilingual.

The latter (and other similar open models) seem to do similarly well in benchmarks (much better in Math?) with way less fancy stuff: for instance, public data and no secretive filtering with pretrained models or synthetic data.

My take is that using the vanilla approaches takes you really far, and many of the latest tricks and hours of work buy you little... Will be interesting to see how this plays out, especially for the open source community.

th0ma5
0 replies
2h13m

"Carefully tested prompts" sounds a lot like "these are the lotto numbers we know are right" kind of thing? How in the world are these things used for anything programmatically deterministic?

stochastimus
0 replies
3h27m

Go to Google announcement > Find "license" in page: no matches > Go to HN thread > Find "license" in page: 28 matches > Read a few... sigh, could have been exciting

spiantino
0 replies
4h51m

Maybe a dumb question, but why is there a Terms instead of a license? That feels a little flimsier as an open source offering

smpanaro
0 replies
5h53m

Has perplexity fallen out of favor? I didn't see it mentioned anywhere. I tried using lm-eval for the 2B model but the results seem wrong (46.1288).

smcn
0 replies
8h41m

There are some pretty impressive benchmarks on https://ai.google.dev/gemma. Even the 2b model looks fairly not awful?

I guess my weekend is going to be spent exploring this.

nojvek
0 replies
54m

I applaud the Google team openly engaging on HN here.

Q: how sure are you that the newer models, trained on trillions of tokens - a huge chunk of the open web - haven't been accidentally polluted by slurping in test data?

neximo64
0 replies
7h31m

Is this the Deepmind guy in Google more now? what a change the past year has made

modelx
0 replies
5h30m

They also implemented it in PyTorch. Cool! https://github.com/google/gemma_pytorch


mark_l_watson
0 replies
2h13m

Nice, more choices are good. I just saw that the Ollama project already has these models available (date stamp is 58 minutes ago), so I will use that rather than Colab (I love Colab, but I like to run stuff locally).

marban
0 replies
6h28m

Unbefkglievable — Another week, another new name?

jerrygenser
0 replies
7h44m

Looking forward to Gemma 7bx8 moe

ijustlovemath
0 replies
7h48m

Hope to see support for this in ollama soon!

hawk01
0 replies
7h19m

Can't wait to try it out with ollama locally

dcchambers
0 replies
6h31m

Already available in Ollama v0.1.26 preview release, if you'd like to start playing with it locally:

- https://github.com/ollama/ollama/releases/tag/v0.1.26

circusfly
0 replies
4h58m

Gemma, Mistral, I feel like Rip van Winkle, asleep for 20 years only to wake up and find the whole tech world changed.

IceHegel
0 replies
4h16m

Google, at the moment, is a tech company whose products are actively engaged in the falsification of history for political purposes.

I honestly have no idea where they are going with this but I don't want to be part of it.

GaggiX
0 replies
8h20m

They have implemented the model also on their own C++ inference engine: https://github.com/google/gemma.cpp

FergusArgyll
0 replies
7h15m

Someone should try to make a MOE of 2b models

BryanLegend
0 replies
3h31m

Andrej Karpathy's take from twitter. (https://twitter.com/karpathy/status/1760350892317098371)

Seeing as I published my Tokenizer video yesterday, I thought it could be fun to take a deepdive into the Gemma tokenizer.

First, the Gemma technical report [pdf]: https://storage.googleapis.com/deepmind-media/gemma/gemma-re... says: "We use a subset of the SentencePiece tokenizer (Kudo and Richardson, 2018) of Gemini for com- patibility. It splits digits, does not remove extra whitespace, and relies on byte-level encodings for unknown tokens, following the techniques used for both (Chowdhery et al., 2022) and (Gemini Team, 2023). The vocabulary size is 256k tokens."

The tokenizer.model file is with this code release: https://github.com/google/gemma_pytorch/blob/main/tokenizer/...

I decoded this model protobuf in Python and here is the diff with the Llama 2 tokenizer: https://diffchecker.com/TRnbKRMH/

Notes:

- vocab size is quite large: 32K -> 256K

- add_dummy_prefix is False. Different from Llama but consistent with GPT. This is a bit more consistent w.r.t. "leave the data alone", as there is no preprocessing step that adds a space to the encoding text.

- the model_prefix is the path of the training dataset, which is amusing to look at: "/cns/mf-d/home/gemini-data-access/tokenizers/final_v1_51GB_run1/bpe_coverage_0_999995_v5/255969". Seems to indicate the tokenizer training corpus was ~51GB (?).

- a lot of user_defined symbols (i.e. special tokens) are present, e.g. "hardcoding" a sequence of up to 31 newlines as tokens, and a large number of other unclear tokens. I tried decoding the octal representations but it's not clear what's happening here. Also a lot more special tokens for what look like html elements, e.g. <table>, <tr>, <td>, <i>, <b>, etc. Not 100% sure what the unused tokens are for; maybe this is pre-allocated space to make future finetunes that try to add more special tokens easier, as there is no need to resize vocabularies and perform model surgeries (?).

TLDR this is basically the Llama 2 tokenizer, except bigger (32K -> 256K), with a lot more special tokens, and the only functional departure is that add_dummy_prefix is turned off to False. So e.g. tokenizing:

"hello world" becomes: [17534, 2134] ['hello', 'world']

which otherwise would have been preprocessed to " hello world" (note leading space) and tokenized as: [25612, 2134] ['hello', 'world']

cool
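
If you want to poke at it yourself, a small sketch with the sentencepiece library (assuming you've downloaded tokenizer.model from the gemma_pytorch repo linked above):

    import sentencepiece as spm

    sp = spm.SentencePieceProcessor(model_file="tokenizer.model")
    print(sp.get_piece_size())                      # vocab size (~256k per the report)
    print(sp.encode("hello world", out_type=str))   # token pieces
    print(sp.encode("hello world"))                 # token ids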

Alifatisk
0 replies
5h10m

This is such a powerful move!

7moritz7
0 replies
5h53m

The landing page on ai.google.com seems to be machine translated; for Huggingface it uses the literal German translation (Umarmungen Gesicht).

0xbadc0de5
0 replies
8h19m

Thank you for releasing this.