So, better than GPT4 according to the benchmarks? Looks very interesting.
Technical paper: https://goo.gle/GeminiPaper
Some details:
- 32k context length
- efficient attention mechanisms (e.g. multi-query attention (Shazeer, 2019); see the sketch after this list)
- audio input via Universal Speech Model (USM) (Zhang et al., 2023) features
- no audio output? (Figure 2)
- visual encoding of Gemini models is inspired by our own foundational work on Flamingo (Alayrac et al., 2022), CoCa (Yu et al., 2022a), and PaLI (Chen et al., 2022)
- output images using discrete image tokens (Ramesh et al., 2021; Yu et al., 2022b)
- supervised fine-tuning (SFT) and reinforcement learning from human feedback (RLHF)
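(For anyone unfamiliar: multi-query attention shares one key/value head across all query heads, which shrinks the KV cache at long context lengths. A minimal PyTorch-style sketch, my own illustration rather than anything from the paper:)

    import torch

    def multi_query_attention(x, w_q, w_k, w_v, n_heads):
        # Multi-query attention (Shazeer, 2019): n_heads query heads but a
        # single shared key/value head, so the KV cache is ~n_heads times
        # smaller than with standard multi-head attention.
        # Shapes: x (B, T, D); w_q (D, D); w_k and w_v (D, D // n_heads).
        B, T, D = x.shape
        d = D // n_heads
        q = (x @ w_q).view(B, T, n_heads, d).transpose(1, 2)  # (B, H, T, d)
        k = (x @ w_k).view(B, T, 1, d).transpose(1, 2)        # (B, 1, T, d), shared
        v = (x @ w_v).view(B, T, 1, d).transpose(1, 2)        # (B, 1, T, d), shared
        att = torch.softmax(q @ k.transpose(-2, -1) / d**0.5, dim=-1)
        return (att @ v).transpose(1, 2).reshape(B, T, D)     # broadcasts over heads

The only change from standard multi-head attention is that k and v keep a single head that broadcasts across all query heads.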
I think that's already more detail than we got from OpenAI about GPT-4, but on the other hand, it's still very little detail.
That's for Ultra right? Which is an amazing accomplishment, but it sounds like I won't be able to access it for months. If I'm lucky.
I hate this "tierification" of products into categories: normal, pro, max, ultra
Apple does this, and it's obvious they do it to exploit the "decoy effect" when customers shop. Why buy a measly regular iPhone when you can spend a little more and get the Pro version?
But when it comes to AI, this tierification only leads to disappointment: everyone expects the best models from FAANGO (FAANG plus OpenAI); no one expects Google or OpenAI to offer shitty models that underperform their flagships when you can literally run Llama 2 and Mistral models that you actually own.
No, it’s not just to use the “decoy effect.” They do this to share development costs across a whole product line. Low volume, expensive products are subsidized by high volume, mass market devices. Without these tiers, they’d be unable to differentiate the products and so lose the margins of the high end products (and their entire reason for existing).
Unless you expect Apple to just sell the high end devices at a loss? Or do you want the high end chips to be sold in the mass market devices and for Apple to just eat the R&D costs?
Usually it’s the other way around. Mass market products have thin margins and are subsidized by high end / B2B products because the customers for those products have infinitely deep pockets.
This is literally what Steve Jobs was steadfast about :). One iPhone for everyone. He even insisted that the Plus models carry no extra features.
> Usually it’s the other way around. Mass market products have thin margins and are subsidized by high end / B2B products because the customers for those products have infinitely deep pockets.
That's usually what I've seen, but the M1 MacBook Air came out first and the M1 Pro and Max came out much later.
This isn't "tierificaton" or even premiumization. That may come later.
Large AI models have tight resource requirements. You physically can't use X billion parameters without on the order of X billion bytes of memory.
It makes complete sense to have these 3 "tiers". You have a max capability option, a price-performance scaling option, and an edge compute option.
Well, X billion parameters times the parameter size. For base models, weights are generally 32-bit (so 4X bytes), though smaller quantizations are possible and widely used for public models, and I would assume as a cost measure for closed hosted models as well.
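Back-of-the-envelope, just to make the arithmetic concrete (weights only; real deployments also need memory for activations and the KV cache, and the 70B figure is just an example size):

    def weight_memory_gb(params_billion, bits_per_param):
        # Memory to hold just the weights, in GB.
        return params_billion * bits_per_param / 8

    for bits in (32, 16, 8, 4):  # fp32, fp16/bf16, int8, int4
        print(f"70B params @ {bits}-bit: {weight_memory_gb(70, bits):.0f} GB")
    # 280 GB, 140 GB, 70 GB, 35 GB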
It has to be this way when current LLMs differ by orders of magnitude in electricity cost depending on the output you desire.
I'm honestly 100% okay with it as long as it's reasonable and not confusing to customers. (Not saying Apple isn't somewhat guilty here; buying a non-Pro iPhone 15 and not being able to view WebM files feels literally fucking insane, and that's apparently how it works, but that's a rant for a different thread.) In cases like this, presumably the idea isn't actually feature-gating; it's scaling up. AI inference costs compute time, and although I have no idea whether inference runs on special hardware, if it does, I can only presume that scaling that hardware up to meet demand is challenging and very much not like scaling up e.g. a typical web service.
IMO, tiers can be useful when they make sense and aren't just artificial market segmentation.
I think the expensive ones are used when the customer is the user — e.g. ChatGPT Plus (personal) subscription — and the cheap ones when they are not — e.g. customer support service bots.
I think it depends. It's always worth having a small, fast model for some tasks, one you can run completely offline on a mobile CPU. Maybe not as a chat companion, but for text understanding or indexing all your messages and photos for search, it may be enough.
I don't understand -- these are all literally tied directly to performance.
They're tiers of computing power and memory. More performance costs more money to produce. The "nano" can fit on a phone, while the others can't.
Are you really objecting to the existence of different price/performance tiers...? Do you object to McDonald's selling 3 sizes of soft drink? There's nothing "decoy" about any of this.
More expensive things cost more money, not a surprise imo
Yep, the announcement is quite cheeky.
Ultra is out sometime next year, with GPT-4 level capability.
Pro is out now (?) with ??? level capability.
Pro benchmarks are here: https://storage.googleapis.com/deepmind-media/gemini/gemini_...
Sadly it's 3.5 quality :(
Table 2 indicates Pro is generally closer to 4 than 3.5 and Ultra is on par with 4.
Ehhh, not really, it even loses to 3.5 on 2/8 tests. For me it feels pretty lackluster considering I use GPT-4 probably 100 or more times a day, and it would be a huge downgrade.
Pro is approximately in the middle between GPT-3.5 and GPT-4 on four measures (MMLU, BIG-Bench-Hard, Natural2Code, DROP), closer to 3.5 on two (MATH, HellaSwag), and closer to 4 on the remaining two (GSM8K, HumanEval). Two one way, two the other, and four in the middle.
So it's a split almost right down the middle, if anything closer to 4, at least if you assume the benchmarks to be of equal significance.
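One way to eyeball it is to normalize each Pro score against the 3.5-to-4 gap; the numbers below are placeholders, not the actual table values:

    def relative_position(pro, gpt35, gpt4):
        # 0.0 = matches GPT-3.5, 1.0 = matches GPT-4, ~0.5 = "in the middle".
        return (pro - gpt35) / (gpt4 - gpt35)

    # Hypothetical scores purely for illustration:
    print(relative_position(pro=79.0, gpt35=70.0, gpt4=86.0))  # ~0.56 -> middle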
If you think eval numbers mean a model is close to 4, then you clearly haven't been scarred by the legions of open source models that claim GPT-4-level evals but clearly struggle to perform challenging work as soon as you start testing.
Perhaps Gemini is different and Google has tapped into their own OpenAI-like secret sauce, but I'm not holding my breath
Lol that's why it's hidden in a PDF.
They basically announced GPT-3.5, then. Big whoop; by the time Ultra is out, GPT-5 is probably also out.
Yup, it's all a performance for the investors
+1. The investors are the customers of this release, not end users.
Yep, at this point I'd rather they hold their announcements until everybody can access it, not just the beautiful people. I'm excited and want to try it right now, and would actually use it for a PoC I have in mind, but in a few months the excitement will be gone.
It's to their detriment, also. Being told Gemini beats GPT-4 while withholding that what I'm trying out is not the model they're talking about would have me think they're full of crap. They'd be better off making it clear that this is not the one that surpasses GPT-4.
It really is. OpenAI has the Apple model of releases: when it's announced, the laptop is in your freaking hands 3 days later.
Google announces vaporware that's never going to come out, or something that will be out in 5 months. It's frustrating and very bad for their image in the LLM space.
I wonder if the "release" was done in spite of dev knowledge that it isn't really ready. Like "screw it, we want to attract eyeballs even though we know it's premature"
The article says "next year" - so that could be as soon as January, right?
given how google has been functioning, probably as late as December :)
There was a waiting period for GPT-4 as well, particularly for direct API access, and the web UI had (has?) a paywall.
Looking at the names on those papers, it seems like all those breakthroughs are from authors of Chinese and Indian origin. Stunning.
Why? Google is an international organization, and its technical workforce is heavily skewed towards these two origins. Also, Americans come from all sorts of places, regardless of their last names…
What is this about?
Are you trying to find a controversy?
They're making an observation. As you noted, a lot of the technical people at Google are immigrants. It is stunning because it implies native-born Americans are dramatically underrepresented. Inclusion means including everyone. This is just as bad as the CEOs at most companies all being of European ancestry.
No, it was just a bizarrely naive observation -- not even in a racist way, just really dumb, and it implied things that were not true.
how do you know they're not native born?
I assume if one of the names in the paper was O'Shaughnessy you would immediately think: "Irish immigrant!" Schmidt? German immigrant!
Most "Indian looking", forgive me the crude way of saying it, native-born Americans in my kid's school have traditional Indian names.
It is very common for children of immigrants to be high achievers because being a legal immigrant strongly correlates with high personal achievement - which is generally transmitted to children. Of course this isn't exclusive to immigrants, but it's a form of selection bias.
Two things:
1. I see few Native Americans at the graduate level in technical fields. Maybe because American culture, unlike that of Eastern countries, does not push students to go to college? Maybe because there are plenty of jobs out there that don't require advanced degrees? Maybe because Americans already live in the US, whereas for a typical Chinese or Indian person, getting a Ph.D. is a ticket to come to the US?
2. DEI policies in industry and academia sometimes lead to over-representation of those nationalities (speaking as a foreign national myself). Companies can treat an H-1B visa holder any way they want, because the visa holder couldn't easily get another job if they were fired, but the company can't behave like that towards a native-born American.
Isn't London a major location for Google AI expertise?
Also, native-born Americans have family origins, and therefore names, from all over the world. (I'm assuming from context that by native born you don't mean actual Native Americans)
I think the controversy was in making the observation
China has the second biggest output of AI/ML research after the US. So not that surprising.
Just looking at the names in that comment I see US, China, India, and France represented, but if you actually check the full list of authors from one of the papers you'll usually see a pretty broad range of backgrounds.
I miss when ML scientific papers had actual science in them. Now they all feel like ads.
That's because they're not "scientific papers", they're technical papers.
The table is *highly* misleading. It uses different methodologies all over the place.
For MMLU, it highlights the CoT@32 result, where Ultra beats GPT-4, but Ultra loses to GPT-4 under 5-shot, for example.
For GSM8K it uses Maj1@32 for Ultra and 5-shot CoT for GPT-4, etc.
Then also, for some reason, it uses different metrics for Ultra and Pro, making them hard to compare.
What a mess of a "paper".
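(For anyone unfamiliar with the notation: Maj1@32 means sampling the model 32 times and majority-voting the final answers, whereas 5-shot CoT is a single answer with five in-context examples, so the numbers aren't directly comparable. A toy sketch of the voting side; "sample_answer" is a hypothetical stand-in for one model call:)

    import random
    from collections import Counter

    def maj1_at_k(sample_answer, k=32):
        # Maj1@32: sample k answers (temperature > 0) and return the most
        # common one. This usually scores above a single greedy sample,
        # which is why comparing it against 5-shot is apples-to-oranges.
        votes = Counter(sample_answer() for _ in range(k))
        return votes.most_common(1)[0][0]

    # Toy model: right only 40% of the time per sample, yet the
    # majority vote usually lands on the modal answer "42".
    sample_answer = lambda: random.choices(["42", "41", "40"], weights=[4, 3, 3])[0]
    print(maj1_at_k(sample_answer))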