Mistral AI Valued at $2B

There is a lot of hype around LLMs, but (BUT!) Mistral well deserves the hype. I use their original 7B model, as well as some derived models, all the time. I can’t wait to see what they release next (which I expect to be a commercial product, although the MoE model set they just released is free).

Another company worthy of some hype is 01.AI, which released their Yi-34B model. I have been running Yi locally on my Mac (use “ollama run yi:34b”) and it is amazing.
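For anyone who wants to script against a locally running model rather than type into the terminal, here is a minimal sketch. It assumes Ollama is serving on its default local address and that the yi:34b model has already been pulled; the helper names build_payload and ask are made up for illustration.

```python
import json
import urllib.request

# Assumption: Ollama is serving on its default local address.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_payload(prompt: str, model: str = "yi:34b") -> dict:
    """Build the JSON body for a single, non-streaming completion."""
    return {"model": model, "prompt": prompt, "stream": False}


def ask(prompt: str, model: str = "yi:34b", timeout: float = 300.0) -> str:
    """Send the prompt to the local Ollama server and return the reply text."""
    body = json.dumps(build_payload(prompt, model)).encode("utf-8")
    request = urllib.request.Request(
        OLLAMA_URL,
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as reply:
        return json.loads(reply.read())["response"]


# Example (needs a running server and a pulled model):
#   print(ask("Write a haiku about page faults."))
```

Nothing is sent over the network until ask() is called, so the block is safe to import and poke at first.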

Hype away Mistral and 01.AI, hype away…

How do these small models compare to gpt4 for coding and technical questions?

I noticed that gpt3.5 is practically useless to me (either wrong or too generic), while gpt4 provides a decent answer 80% of the time.

They are not close to GPT-4. Yet. But the rate of improvement is higher than I expected. I think there will be open source models at GPT-4 level that can run on consumer GPUs within a year or two. Possibly requiring some new techniques that haven't been invented yet. The rate of adoption of new techniques that work is incredibly fast.

Of course, GPT-5 is expected soon, so there's a moving target. And I can't see myself using GPT-4 much after GPT-5 is available, if it represents a significant improvement. We are quite far from "good enough".

Curious thought: at some point a competitor’s AI might become so advanced that you can just ask it to tell you how to create your own, analogous system. Easier than trying to catch up on your own. Corporations will have to include their own trade secrets among the things that AIs aren’t presently allowed to talk about, like medical issues or sex.

How to create my own LLM?

Step 1: get a billion dollars.

That’s your main trade secret.

What is inherent about AIs that requires spending a billion dollars?

Humans learn a lot of things from very little input. Seems to me there's no reason, in principle, that AIs could not do the same. We just haven't figured out how to build them yet.

What we have right now, with LLMs, is a very crude brute-force method. That suggests to me that we really don't understand how cognition works, and much of this brute computation is actually unnecessary.

If we knew how to build humans for cheap, then it wouldn't require spending a billion dollars. Your reasoning is circular.

It's precisely because we don't know how to build these LLMs cheaply that one must spend so much money to build them.

The point is that it's not inherently necessary to spend a billion dollars. We just haven't figured it out yet, and it's not due to trade secrets.

Transistors used to cost a billion times more than they do now [1]. Do you have any reason to suspect AIs to be different?

[1] https://spectrum.ieee.org/how-much-did-early-transistors-cos...

Transistors used to cost a billion times more than they do now

However you would still need billions of dollars if you want state of the art chips today, say 3nm.

Similarly, an LLM may at some point not require a billion dollars; you may be able to get one on par with or surpassing GPT4 cheaply. State-of-the-art AI will still require substantial investment.

Maybe not $1 billion, but you'd want quite a few million.

According to [1] a 70B model needs $1.7 million of GPU time.

And when you spend that - you don't know if your model will be a damp squib like Bard's original release. Or if you've scraped the wrong stuff from the internet, and you'll get shitty results because you didn't train on a million pirated ebooks. Or if your competitors have a multimodal model, and you really ought to be training on images too.

So you'd want to be ready to spend $1.7 million more than once.

You'll also probably want $$$$ to pay a bunch of humans to choose between responses for human feedback to fine-tune the results. And you can't use the cheapest workers for that, if you need great English language skills and want them to evaluate long responses.

And if you become successful, maybe you'll also want $$$$ for lawyers after you trained on all those pirated ebooks.

And of course you'll need employees - the kind of employees who are very much in demand right now.

You might not need billions, but $10M would be a shoestring budget.

[1] https://twitter.com/moinnadeem/status/1681371166999707648
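A back-of-envelope sketch of where a number of that size can come from. It uses the common rule of thumb that training takes roughly 6 FLOPs per parameter per token. The token count, GPU throughput, utilization, hourly price and number of runs are all assumptions picked for illustration, not figures from the linked tweet.

```python
def training_gpu_hours(params: float, tokens: float,
                       peak_flops: float, utilization: float) -> float:
    """GPU-hours for one run, using the ~6 * N * D FLOPs rule of thumb."""
    total_flops = 6.0 * params * tokens
    effective_flops_per_second = peak_flops * utilization
    return total_flops / effective_flops_per_second / 3600.0


# All four inputs below are illustrative assumptions, not figures from the thread.
hours = training_gpu_hours(
    params=70e9,          # 70B parameters
    tokens=2e12,          # assumed 2T training tokens
    peak_flops=312e12,    # assumed A100-class peak, dense BF16
    utilization=0.45,     # assumed fraction of peak actually achieved
)
price_per_gpu_hour = 1.00  # assumed bulk price in dollars
runs = 3                   # assumed number of full attempts

print(f"GPU-hours per run: {hours:,.0f}")
print(f"Cost per run:      ${hours * price_per_gpu_hour:,.0f}")
print(f"Cost for {runs} runs:   ${hours * price_per_gpu_hour * runs:,.0f}")
```

With these inputs a single run comes out at a bit under 1.7 million GPU-hours, so the per-run cost is very sensitive to the hourly price you can actually negotiate, and the total scales linearly with how many times you have to retry.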

This just screams to me that we don’t have a clue what we’re doing. We know how to build various model architectures and train them, but if we can’t even roughly predict how they’ll perform then that really says a lot about our lack of understanding.

Most of the people replying to my original comment seem to have dropped the “in principle” qualifier when interpreting my remarks. That’s quite frustrating because it changes the whole meaning of my comment. I think the answer is that there isn’t anything in principle stopping us from cheaply training powerful AIs. We just don’t know how to do it at this point.

Humans learn a lot of things from very little input

They also need 8 hours of sleep per day, and are mostly worthless for the first 18 years. Oh, also they may tell you to fuck off while they go on a 3000 mile nature walk for 2 years because they like the idea of free love better.

Knowing how birds fly really doesn't make a useful aircraft that can carry 50 tons of supplies, or one that can go over the speed of sound.

This is the power of machines and bacteria: throwing massive numbers at the problem. Being able to solve problems of cognition by throwing 1GW of power at them will absolutely solve the problem of how our brain does it with 20 watts, and in a shorter period of time.

I agree about training time, but bear in mind LLMs like GPT4 and Mistral also have noisy recall of vastly more written knowledge than any human can read in their lifetime, and this is one of the features people like about LLMs.

You can't replace those types of LLM with a human, the same way you can't replace Google Search (or GitHub Search) with a human.

Acquiring and preparing that data may end up being the most expensive part.

Because that billion dollars gets you the R&D to know how to do it?

The original point was that an “AI” might become so advanced that it would be able to describe how to create a brain on a chip. This is flawed for two main reasons.

1. The models we have today aren’t able to do this. We are able to model existing patterns fairly well but making new discoveries is still out of reach.

2. Any company capable of creating a model which had singularity-like properties would discover them first, simply by virtue of the fact that they have first access. Then they would use their superior resources to write the algorithm and train the next-gen model before you even procured your first H100.

It might work for fine-tuning an open model to a narrow use case.

But creating a base model is out of reach. You probably need on the order of hundreds of millions of dollars (if not a billion) to get close to GPT 4.

As someone who doesn’t know much about how these models work or are created I’d love to see some kind of breakdown that shows what % of the power of GPT4 is due to how it’s modelled (layers or whatever) vs training data and the computing resources associated with it.

This isn't precisely knowable now, but it might be something academics figure out years from now. Of course, first principles of 'garbage in garbage out' would put data integrity very high, the LLM code itself is supposedly not even 100k lines of code, and the HW is crazy advanced.

so the ordering is probably data, HW, LLM model

This also fits the general ordering of

data = all human knowledge
HW = integrated complexity of most technologists
LLM = small team

Still requires the small team to figure out what to do with the first two, but it only happened now because the HW is good enough.

LLMs would have been invented by Turing and Shannon et al. almost certainly nearly 100 years ago if they had access to the first two.

By % of cost it's 99.9% compute cost and 0.1% data costs.

In terms of "secret sauce" it's 95% data quality and 5% architectural choices.

That’s true now, but maybe GPT6 will be able to tell you how to build GPT7 on an old laptop, and you’ll be able to summon GPT8 with a toothpick and three cc’s of mouse blood.

Model merging is easy, and a unique model merge may be hard to replicate if you don’t know the original recipe.

Model merging can create truly unique models. Love to see shit from ghost in the shell turn into real life

Yes training a new model from scratch is expensive, but creating a new model that can’t be replicated by fine tuning is easy

The limiting factor isn’t knowledge of how to do it, it is GPU access and RLHF training data.
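To make "merging is easy" concrete, here is a sketch of the simplest merge there is: a weighted average of two models that share an architecture. This is only one technique (plain linear interpolation), not necessarily the one the commenter has in mind, and the parameter names and values are toy data.

```python
from typing import Dict, List

Weights = Dict[str, List[float]]  # parameter name -> flattened values


def linear_merge(a: Weights, b: Weights, alpha: float = 0.5) -> Weights:
    """Return alpha * a + (1 - alpha) * b for every shared parameter."""
    if a.keys() != b.keys():
        raise ValueError("models must have identical parameter names")
    merged: Weights = {}
    for name in a:
        if len(a[name]) != len(b[name]):
            raise ValueError(f"shape mismatch for {name}")
        merged[name] = [alpha * x + (1.0 - alpha) * y
                        for x, y in zip(a[name], b[name])]
    return merged


# Toy "models" with made-up parameter names and values.
model_a = {"layer0.weight": [1.0, 2.0], "layer0.bias": [0.0]}
model_b = {"layer0.weight": [3.0, 4.0], "layer0.bias": [1.0]}
merged = linear_merge(model_a, model_b, alpha=0.25)
```

The "recipe" here is just the choice of parents and alpha. Someone who only sees the merged weights has no direct way to read either back out, which is the replication problem described above.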

I’m both excited and scared to think about this “significant improvement” over GPT-4.

It can make our jobs a lot easier or it can take our jobs.

Isn't that the same? At some point, your job becomes so easy that anyone can do it.

It's weird for programmers to be worried about getting automated out of a job when my job as a programmer is basically to try as hard as I can to automate myself out of a job.

You’re supposed to automate yourself out but not tell anyone. Didn’t you see that old Simpsons episode from the 90s about the self-driving trucks? The drivers rightfully STFU about their innovation and cashed in on great work-life balance, and Homer ruined it by blabbering about it to everyone, causing the drivers to go after him.

We are trying to keep SWE salaries up, and lowering the barrier to entry will drop them.

I expect the demand for SWE to grow faster than productivity gains.

The idea that demand scales to fill supply doesn’t work when supply becomes effectively infinite. Induction from the past is likely wrong in this case

I don't see the current tech making supply infinite. Not even close.

Maybe a more advanced type of model they'll invent in the next few years. Who knows... But GPT-like models? Nah, they won't write useful code applicable in prod without supervision by an experienced engineer.

LLMs are going to spit out a lot of broken shit that needs fixing. They're great at small context work but full applications require more than they're capable of imo.

Even if so, the next gen model will fix it.

Hey, I doubt it.

I believe one of the problems that OSS models need to solve, is... dataset. All of them lack a good and large dataset.

And this is most noticeable if you ask anything that is not in English-American-ish.

Maybe it should be an independent model in charge only of converting your question to American English and back, instead of trying to make a single model speak all languages

I don't think this is a good idea. A good model, if we are really aiming at anything that resembles AGI (or even a good LLM like GPT4), is a model that has world knowledge. The world is not just English.

There’s a lot of world knowledge that is just not present in an American English corpus. For example knowledge of world cuisine & culture. There’s precious few good English sources on Sichuan cooking.

I think there will be open source models at GPT-4 level that can run on consumer GPUs within a year or two.

There are indeed already open source models rivaling ChatGPT-3.5, but GPT-4 is an order of magnitude better.

The sentiment that GPT-4 is going to be surpassed by open source models soon is something I only notice on HN. Makes me suspect people here haven't really tried the actual GPT-4 but instead the various scammy services like Bing that claim they are using GPT-4 under the hood when they are clearly not.

Makes me suspect you don't follow HN user base very closely.

You're 100% right and I apologize that you're getting downvoted, in solidarity I will eat downvotes with you.

HN's funny right now because LLMs are all over the front page constantly, but there's a lot of HN "I am an expert because I read comments sections" type behavior. So many not-even-wrong comments that start from "I know LLaMa is local and C++ is a programming language and I know LLaMa.cpp is on GitHub and software improves and I've heard of Mistral."

Mistral's latest just released model is well below GPT-3 out of the box. I've seen people speculate that with fine-tuning and RLHF you could get GPT-3 like performance out of it but it's still too early to tell.

I'm in agreement with you, I've been following this field for a decade now and GPT-4 did seem to cross a magical threshold for me where it was finally good enough to not just be a curiosity but a real tool. I try to test every new model I can get my hands on and it remains the only one to cross that admittedly subjective threshold for me.

Still, for a 7B model, this is quite impressive.

Mistral's latest just released model is well below GPT-3 out of the box

The early information I see implies it is above. Mind you, that is mostly because GPT-3 was comparatively low: for instance its 5-shot MMLU score was 43.9%, while Llama2 70B 5-shot was 68.9%[0]. Early benchmarks[1] give Mixtral scores above Llama2 70B on MMLU (and other benchmarks), thus transitively, it seems likely to be above GPT-3.

Of course, GPT-3.5 has a 5-shot score of 70, and it is unclear yet whether Mixtral is above or below, and clearly it is below GPT-4’s 86.5. The dust needs to settle, and the official inference code needs to be released, before there is certainty on its exact strength.

(It is also a base model, not a chat finetune; I see a lot of people saying it is worse, simply because they interact with it as if it was a chatbot.)

[0]: https://paperswithcode.com/sota/multi-task-language-understa...

[1]: https://github.com/open-compass/MixtralKit#comparison-with-o...

Have you played with finetunes, like Cybertron? Augmented in wrappers and retrievers like GPT is?

It's not there yet, but it's waaaay closer than the plain Mistral chat release.

what types of things do you ask ChatGPT to do for you regarding coding?

Typically few-line snippets that would take me a few minutes of thinking but that ChatGPT provides immediately. It often works, but there are setbacks. For instance, if I'm lazy and don't check the code very carefully, it can introduce bugs that cancel out the benefits.

It can be useful, but I can see how it'll generate a class of lazy coders who can't think by themselves and just try to get the answer from ChatGPT. An amplified Stack Overflow syndrome.

If you can run yi34b, you can run phind-codellama. It's much better than yi and mistral for code questions. I use it daily. More useful than gpt3 for coding, not as good as gpt4, except that I can copy and paste secrets into it without sending them to openai.

Thanks, I will give codellama a try.

Open source models will probably catch up at the same rate as open source search engines have caught up to Google search.

One thing people should keep in mind when reading others’ comments about how good an LLM is at coding, is that the capability of the model will vary depending on the programming language. GPT-4 is phenomenal at Java because it probably ate an absolutely enormous amount of Java in training. Also, Java is a well-managed language with good backwards-compatibility, so patterns in code written at different times are likely to be compatible with each other. Finally, Java has been designed so that it is hard for the programmer to make mistakes. GPT-4 is great for Java because Java is great for GPT-4: it provides what the LLM needs to be great.

How do you use these models? If you don't mind sharing. I use GPT-4 as an alternative to googling, haven't yet found a reason to switch to something else. I'll for example use it to learn about the history, architecture, cultural context, etc of a place when I'm visiting. I've found it very ergonomic for that.

I use them in my editor with my plugin https://github.com/David-Kunz/gen.nvim

Interesting use case, but the issue is wasting all this compute energy for prediction?

Can you explain what you mean by this question?

I’ve used LM Studio. It hasn’t reached peak user friendliness, but it’s a nice enough GUI. You’ll need to fiddle with resource allocation settings and select an optimally quantized model for best performance. But you can do all that in the UI.

LM Studio is an accessible, simple way to use them. That said, expecting them to be anywhere near as good as GPT-4 is going to lead to disappointment.

If you want to experiment Kobold.cpp is a great interface and goes a long distance to guarantee backwards compatibility of outdated model formats.

I host them here: https://app.lamini.ai/playground

You can play with them, tune them, and download the weights

It isn’t exactly the same as open source because weights != source code, but it is close in the sense that it is editable

IMO we just don’t have great tools for editing LLMs like we do for code, but they are getting better

Prompt engineering, RAG, and finetuning/tuning are effective for editing LLMs. They are getting easier and better tooling is starting to emerge

You mind sharing what you find so amazing about Yi-34B? I haven’t had a chance to try it.

I just installed it on my 32GB Mac yesterday. First impressions: it does very well at reasoning, it does very well answering general common sense world knowledge questions, and so far when it generates Python code, the code works and is well documented. I know this is just subjective, but I have been running a 30B model for a while on my Mac and Yi-34B just feels much better. With 4-bit quantization, I can still run Emacs, terminal windows and a web browser with a few tabs without seeing much page faulting. Anyway, please try it and share a second opinion.
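The arithmetic behind why 4-bit quantization leaves room for other apps, as a rough sketch. It counts the weights only and assumes a machine with 32 GB of memory; real quantized files carry some extra overhead (scaling factors, a few layers kept at higher precision), and the context cache needs memory on top.

```python
def weight_memory_gb(params: float, bits_per_weight: float) -> float:
    """Approximate memory for the weights alone, in gigabytes (10^9 bytes)."""
    return params * bits_per_weight / 8 / 1e9


for bits in (16, 8, 4):
    print(f"{bits:>2}-bit: {weight_memory_gb(34e9, bits):.0f} GB")
```

At 16 bits a 34B model needs about 68 GB and at 8 bits about 34 GB, neither of which fits in 32 GB. At 4 bits it drops to about 17 GB, which is why the rest of the desktop still has room to breathe.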

The 200K finetunes are also quite good at understanding their huge context.

I use their original 7B model, as well as some derived models, all the time.

How does it compare to other models? and with chatgpt in particular?

No comparison to be made.

I concur, Yi 34B and Mistral 7B are fantastic.

But you need to run the top Yi finetunes instead of the vanilla chat model. They are far better. I would recommend Xaboros/Cybertron, or my own merge of several models on huggingface if you want the long context Yi.

Mistral has a lot of potential, but there's the obvious risk that without proper monetization strategies it might not achieve sustainable profitability in the long term.

Nothing stops them from launching a chat app.

The old open source, but we'll host it for you? I think Bezos is going to be in fits of evil laughter about that model in 5 years, as all the open source compute moves to the clouds, with dollars flowing his way.

But one thing Mistral could do is have a free foundational model, and have non-free (as in beer, as in speech) "pro" models. I think they will have to.

Here's to hoping such models run on dedicated chips locally, on Phones and PCs etc...

They already do, we just released a model equivalent to most 40-60b base models that runs on a MacBook Air no problem.

It's like 1.6gb, ones coming are better and smaller https://x.com/EMostaque/status/1732912442282312099?s=20

I think the large language model paradigm is pretty much done as we move to satisficing tbh

There are huge economy of scale benefits from providing hosted models.

I've been trying out all sorts of open models, and some of them are really impressive - but for my deployed web apps I'm currently sticking with OpenAI, because the performance and price I get from their API is generally much better than I can get for open models.

If Mistral offered a hosted version which didn't have any spin-up time and was price competitive with OpenAI I would be much more likely to build against their models.

This only is defensible for closed models though.

Release small, open, foundational models.

Deploy larger, fine tuned variants and charge for them.

There’s a reason we don’t have the data set or original training scripts for mistral

it’s a “mistry” ;)

Zero moat. Everybody's doing it.

I suppose they could be the Google to everyone else's Yahoo and Dogpile, but I expect that to be a hard game to play these days.

The French have an urge to be independent; the French government will hand them some juicy contract as soon as they can provide any product that justifies it.

Yeah they shouldn't worry, they'll get a big French government deal at worst

One of the French tycoons will eventually buy them.

The French have an urge to be independent

They lost that fight a long time ago though. It seems they don't even try to pretend anymore.

I would say most European countries have that desire. That, and the fact that it can easily be fine-tuned to the local language, could make these models very popular outside the US.

Wait what? If company don’t make $ it don’t survive?

HN could really elevate the discourse if they flagged the submarine ads of VCs

It is a relevant question in the AI industry specifically due to new concerns about ROI given the intense compute costs.

Same concern I have regarding Spotify. [Which seems to have insane recurring costs. Plus some risky expansive strategic moves]

I was wondering this. What is their business model exactly? Almost seems like Europe’s attempt to say “hey, look, we are relevant too”

They charge for the API, like OpenAI.

https://docs.mistral.ai/platform/pricing/

Being acquired.

On their pitch deck it said they will monetise serving of their models.

While it may feel like a low moat if anyone can spin up a cloud instance with the same model, it's still a reasonable starting point. I think they will also be getting a lot of EU clients who can't/don't want to use US providers.

People forget the released version is v0.1

If the commercially-served model has improved capability and is exclusive to Mistral's service, there is a possible moat there.

they seem pretty committed to open-source AI (from interviews I've heard with the founders) - but maybe if they manage to train models with truly amazing capabilities somewhere down the line, they will keep some closed source

At this valuation and given the strength of the team, it’s not hard to imagine a future acquisition yielding a significant ROI.

Besides, we don’t know what future opportunities will unfold for these technologies. Clearly there’s no shortage of smart investors happy to place bets on that uncertainty.

Coupled with the concern that once you’re charging users money for a product, you are also liable for sketchy things they do with it. Not so much when you post a torrent link on twitter that happens to have model weights.

Model-as-a-service should work just fine.

LLM space is so cringe so much excitement from supply side and no excitement/cringe from supposed demand side

I don’t know what you’re talking about. I use chatGPT extensively. Probably more than 50 times a day. I am extremely excited for anything that can top the already amazing thing we have now. They have a massive paying customer base.

What do you use it for?

I used it to write my wedding vows

Based

Not OP, but for me:

- Writing: emails, documentation, marketing. Write an unstructured skeleton of information, add a prompt about the intended audience and purpose, and possibly ask it to add some detail.

- Coding: Especially things like "Is there a method for this in this library" - a lot quicker than browsing through documentation. Some errors - copy-paste the error from the console, maybe a little bit for context, and quite often I get the solution.

And API based:

- Support bot

- Prompt engineering of some text models that normally would require labeling, training, and evaluation for weeks or months. A couple of use cases - unstructured text as an input + prompt, JSON as an output.
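The "unstructured text in, JSON out" pattern from the last bullet can be sketched like this. The prompt wording, the field names and the fake_model function are all hypothetical stand-ins; in practice fake_model would be replaced by a call to whichever LLM API you use.

```python
import json

PROMPT_TEMPLATE = (
    "Extract the fields below from the support message and reply with JSON only.\n"
    'Fields: "product" (string), "sentiment" ("positive", "neutral" or "negative").\n'
    "Message: {message}"
)
REQUIRED_KEYS = {"product", "sentiment"}


def fake_model(prompt: str) -> str:
    """Hypothetical stand-in for a real LLM API call; returns a canned reply."""
    return '{"product": "router", "sentiment": "negative"}'


def extract(message: str, model=fake_model) -> dict:
    """Prompt the model, then parse and validate its JSON reply."""
    reply = model(PROMPT_TEMPLATE.format(message=message))
    data = json.loads(reply)  # raises ValueError on malformed JSON
    missing = REQUIRED_KEYS - set(data)
    if missing:
        raise ValueError(f"model reply is missing keys: {sorted(missing)}")
    return data


result = extract("My router keeps dropping the connection, very annoying.")
```

The validation step matters more than the prompt: a model will occasionally return malformed or incomplete JSON, so treating the reply as untrusted input keeps the rest of the pipeline simple.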

"Is there a method for this in this library"

more efficient than just googling "<method description> <library name>"?

Not OP but I used it very successfully (not OpenAI but some wrapper solution) for technical/developer support. Turns out a lot of people prefer talking to a bot that gives a direct answer than reading the docs.

Support workload on our Slack was reduced by 50-75% and the output is steadily improving.

I wouldn’t want to go back tbh.

Bash scripts

A lot of very varied things so it’s hard to remember. Yesterday I used it extensively to determine what I need to buy for a chicken coop. Calculating the volume of concrete and cinder blocks needed, the type and number of bags of concrete I would need, calculating how many rolls of chicken wire I would need, calculating the number of shingles I would need, questions on techniques, and drying times for using those things, calculating how much mortar I would need for the cinderblocks (it took into account that I would mortar only on the edges, the thickness of mortar required for each joint, it accounted for the cores in the cinderblocks, it correctly determined I wouldn’t need mortar on the horizontal axis on the bottom row) etc. All of this, I could’ve done by hand, but I was able to sit and literally use my voice to determine all of this in under five minutes.

I use DALLE3 extensively for my woodworking hobby, where I ask it to come up with ideas for different pieces of furniture, and have constructed several based on those suggestions.

For work I use it to write emails, to come up with skeletons for performance reviews, look back look ahead documents, ideas for what questions to bring up during sprint reviews based on data points I provide it etc.

I usually go to it before google now if I’m looking for an answer to a specific question.

I know it can be wrong, but usually when it is, it’s obviously wrong

100%. ChatGPT is used heavily in my household (my wife and I both have paid subscriptions) and it’s absolutely worth it. One of the most interesting things for me has actually been watching my wife use it. She’s an academic in the field of education and I’ve seen her come up with so many creative uses of the technology to help with her work. I’m a power user too, but my usage, as a software engineer, is likely more predictable and typical.

It’s replaced Google for me, for most queries.

It’s just so much more efficient in getting the answers I need. And it makes a great pair programmer partner.

Microsoft Cloud AI revenue went $90M, $900M, $2.7B in three quarters. How much more hard dollar demand growth could there possibly be at this point?

it's shovels all the way down

shovelling what in your opinion? Or it’s just a giant house of cards?

Right now they’re shoveling “potential”. LLMs demonstrate capabilities we haven’t seen before, so there’s high uncertainty about the eventual impact. The pace of progress makes it _seem_ like an LLM “killer app” could appear any day, creating a sense of FOMO.

There's also the race to "AGI" -- companies spending tens of billions on training, hoping they'll hit a major intelligence breakthrough. If they don't hit anything significant that would have been money (mostly) down the drain, but Nvidia made out like a bandit.

I think there are enough genuine use cases. People are saving time using AI tools. There are a lot of people in office jobs. It is a huge market. Not to say it won't overshoot. With high interest rates valuations should be less frothy anyway.

They're selling to startups, not consumers.

The good startups are building, fine tuning, and running models locally.

Yeah, the demand side consists solely of those that think they will be supply side.

I can’t think of any software/service that’s grown more in terms of demand over a single year than ChatGPT (in all its incarnations, like the MS Azure one).

Valuation means jack shit for an early-stage startup. WeWork was valued at $50B at its peak.

Until a company is consistently showing growth in revenue and a path to sustainable profitability, valuation is essentially wild speculation.

OpenAI is wildly unprofitable right now. The revenue they make is through nice APIs.

What is Mistral’s plan for profitability?

Right now Stability AI is in the dumps and looking for a buyer.

The only companies I see making money in AI are those who live like cockroaches and are very capital efficient. Midjourney and Comma.ai come to mind.

Very much applaud them for open release of models and weights.

Valuation matters quite a bit for continued funding.

Yes, and it can matter in a very bad way if you need to subsequently have a "down round" (more funding at a lower valuation).

Initial high valuations mean the founders get a lot of initial money giving up little stock. This can be awesome if they become strongly cash-flow positive before they run out of that much runway. But if not, they'll get crammed hard in subsequent rounds.

The more key question is: how much funding did they raise at that great valuation, and is it sufficient runway? Looks like €450 million plus an additional €120 million in convertible debt. Might be enough, depending on their expenses...

I'm not saying that either of your concerns are invalid. The LLM space is just the wrong place to be for investors who are worried about cash-flow positivity this early in the game. These models are crazy expensive to develop _currently_, but they are getting cheaper to train all the time. That means Mistral spent a fraction of what OpenAI did on GPT-3 to train their debut model, and that companies started one year from now will be spending a fraction of what both are spending presently to train their debut models.

YUP. Plus, the points at the end of your post, about how much faster and cheaper it is getting to train new models, indicate that Mistral may have hit a real sweet spot. They are getting funding at a moment where the expectations are that huge capital is needed to build these models, just when those costs are declining, so the same investment will buy them a lot more runway than it did for previous competitors...

His point is with regards to reaching & maintaining profitability, not revenue spending.

It's too early for Mistral to focus on revenue. These AI companies are best thought of as moonshot projects.

It’s kinda weird thinking deep tech companies should be profitable a year in.

Like it takes time to make lots of money and it’s really hard to build state of the art models.

Reality is this market is huge and growing massively, as it is so much more efficient to use these models for many (but not all) tasks.

At Stability I told the team to focus on shipping models, as next year is the year for generative media, where we are the leader as language models go to the edge.

I acknowledge it’s easy to be an armchair critic. You are the ones on the battlefield doing real work and pushing the edge.

The thing is I don’t want the pro-open-source players to fizzle out and implode because funding dried up and they have no path to self sustainability.

AGI could be 6 months away or 6 decades away.

E.g Cruise has a high probability of imploding. They raised too much and didn’t deliver. Now California has revoked their license for driverless cars.

I’m 100% sure AGI, driverless cars and amazing robots will come. Fairly convinced the ones who get us there will be the cockroaches and not the dinosaurs.

I think it's also tough at the early stage of the diffusion (aha) of innovation curve. We are at the point of early adopters and high churn, before mass adoption of these technologies over the coming years as they become good enough, fast enough and cheap enough.

AGI is a bit of a canard imo, it's not really actionable in a business sense.

They didn't say that companies should be profitable at a year in.

To my mind they just seemed to be responding to the slightly clickbait-y title, which focuses on the valuation, which has some significance but is still pretty abstract. Still, headlines love the word "billion".

The straight-news version of the headline would probably focus more on a16z's new round.

OpenAI is wildly unprofitable right now.

Do we know some of its numbers? How many paid subscribers do they have? I pay for two subscriptions.

comma.ai is a great example of a good business.

But I might have a bias because I was following along as the company was built from whiteboard diagrams to what it became.

This is just tangential, but I wouldn't call their APIs "nice", I'd be far less charitable. I spent a few hours (because that's how long it took to figure out the API, due to almost zero documentation) and wrote a nicer Python layer:

https://github.com/skorokithakis/ez-openai/

With all that money, I would have thought they'd be able to design more user-friendly APIs. Maybe they could even ask an LLM for help.

Profitability likewise means jack shit. You just need to have a successful acquisition by a lazy dinosaur or make enough income to go public. You can lose money for 10 years straight while transferring wealth from the public to the investors/owners. With that said, I'm short Mistral for them being French. I have absolutely zero faith in EU-based orgs.

On profitability: for all the newcomers, I don't think anyone can wager that any of them is going to make money. Capital efficiency is overrated so long as they can survive for the next year or more; they are all trying to corner the market, and OpenAI is the one that seems to have found a way to milk the cow for now. I truly believe that the true hitmakers are yet to enter the scene.

Generally agree.

Instead of "path to profitability", I think path to ROI is more appropriate, though.

WhatsApp never had a path to profitability, but it had a clear path to ROI by building a unique and massive user base that major social networks would fight for.

Of course, the reason Mistral AI got a lot of press and publicity in the first place was because they open-sourced Mistral-7B despite the not-making-money-in-the-short-term aspect of it.

It's better for the AI ecosystem as a whole to incentivize AI startups to make a business through good and open software instead of building moats and lock-in ecosystems.

I don’t think that counts as open source. They didn’t share any details about their training, making it basically impossible to replicate.

It’s more akin to a SaaS company releasing a compiled binary that usually runs on their server. Better than nothing, but not exactly in the spirit of open source.

This doesn’t seem like a pedantic distinction, but I suppose it’s up to the community to agree or disagree.

It's IMO a pedantic distinction.

A compiled binary is a bad metaphor because it gives the implication that Mistral-7B is an as-is WYSIWYG project that's not easily modifiable. In contrast, there have been a bunch of powerful new models created by modifying or finetuning Mistral-7B, such as Zephyr-7B: https://huggingface.co/HuggingFaceH4/zephyr-7b-beta

The better analogy to Mistral-7B is something like modding Minecraft or Skyrim: although those games are closed source themselves, they have enabled innovations which help the open-source community directly.

It would be nice to have fully open-source methodologies but lacking them isn't an inherent disqualifier.

It's a big distinction, if I want to tinker with the model architecture I essentially can't because the training pipeline is not public.

If you want to tinker with the architecture Hugging Face has a FOSS implementation in transformers: https://github.com/huggingface/transformers/blob/main/src/tr...

If you want to reproduce the training pipeline, you couldn't do that even if you wanted to because you don't have access to thousands of A100s.

We do, via TRC. Eleuther does too. I think it’s a bad idea to have a fatalistic attitude towards model reproduction.

Exactly, nice work BTW. And no hate for Mistral, they're doing great work, but let's not confuse weights-available with fully open models.

With all the new national supercomputers, scale isn’t really going to be an issue. They all want large language models on 10k GH200s or whatever, and the libraries are getting easier to use.

I'm well aware of the many open source architectures, and the point stands. Models like GPT-J have open code and data, and that allows using them as a baseline for architecture experiments in a way that Mistral's models can't be. Mistral publishes weights and code, but not the training procedure or data. Not open.

According to the Free Software Definition:

"Source code is defined as the preferred form of the program for making changes in. Thus, whatever form a developer changes to develop the program is the source code of that developer's version."

According to the Open Source Definition:

"The source code must be the preferred form in which a programmer would modify the program. Deliberately obfuscated source code is not allowed. Intermediate forms such as the output of a preprocessor or translator are not allowed."

LLM models are usually modified by changing the model weights directly, instead of retraining the model from scratch. LLM weights are poorly understood, but this is an unavoidable side effect of the development methodology, not deliberate obfuscation. "Intermediate" implies a form must undergo further processing before it can be used, but LLM weights are typically used directly. LLMs did not exist when these definitions were written, so they aren't a perfect fit for the terminology used, but there's a reasonable argument to be made that LLM weights can qualify as "source code".

LLM models are usually modified by changing the model weights directly, instead of retraining the model from scratch. LLM weights are poorly understood, but this is an unavoidable side effect of the development methodology, not deliberate obfuscation.

They're understood based on knowing the training process though, and a developer working on them would want to have the option of doing a partial or full retraining where warranted.

Also because their model is unconstrained/uncensored, and they are committed to that, according to what they say: they build it so others can build on it. GPTs are not finished business, and hopefully the open source community will surpass the early successes.

They ought to rename to “ReallyOpenAI”

I really hope that a European startup can successfully compete with the major companies. I do not want to see privacy violations, such as OpenAI's default use of user prompts for training, become standard practice.

Does Anthropic count as European?

How on Earth would it count as European? It's a completely American company. Founded in the US, by Americans, headquartered in the US, funded by American VCs... I genuinely don't get how you arrived at the idea that it's European.

Big office and lots of jobs in UK. And with complex tax setups these days I wasn’t sure.

The UK is not in Europe anymore.

Interesting, TIL.

They cut through the continental shelf as part of Brexit.

maybe not the distinction you meant but the UK is still in Europe (the continent) and to me, European is a word based on location not membership of the European Union (which the UK left)

By that measure I guess Apple is Irish...?!

Dario is Italian-American?

That doesn't matter too much, the corporate structure is more interesting.

Elon is South African but that doesn't make Tesla a South African company.

With the new AI regulations the EU is going to adopt, how long will Mistral be Paris-based?

There's nothing in the new AI regulations hindering Mistral's work. Open-source foundation models are in no way impacted.

https://x.com/ylecun/status/1733481002234679685?s=20

We both know that's not how regulations work. Mistral is going to have to get a legal team to understand the regulations, have a line item for each provision, verify each one doesn't apply to them, get it signed off, and continuously monitor for changes both to the laws and the code to make sure it stays compliant. This will just be a mandate from HR/Legal/Investors.

A lot of work for a company with no commercial offering off the bat. And possibly an insurmountable amount of work for new players trying to enter.

A lot of work for a company with no commercial offering off the bat

If you have no commercial offering it doesn't apply to you at all in the first place

If you never have any commercial offering, you have a 0 valuation.

Meta didn't have any commercial offering until what, WhatsApp for business a few years ago, around 2018? By your logic they should have never been valued at anything or made any profit, yet they did.

Regardless of where a company is headquartered, it has to comply with local regulations.

Only if it wants to do business there. If a company is just headquartered there, they have to comply with regulations no matter what.

Maybe the regulations will be Mistral shaped.

Or another way to put it - if you are an enterprise based in Europe that needs to stay compliant, future regulation will make it very hard to not use Mistral :P.

Anyone else think Nvidia giving companies money to spend on Nvidia hardware at very high profit margin is a dubious valuation scheme?

Why would it be a dubious valuation scheme? I guess if an investor is looking at just revenue, or only looking at one area of their business finances, maybe? Otherwise it seems like the loss in funds would be weighed against the increase in revenue and wouldn't distort earnings.

If it was a good valuation scheme, then Nvidia giving them $100 million at a $2 billion valuation would mean that Nvidia thinks the company is worth $2 billion. But if Mistral uses that money to buy GPUs that Nvidia sells with 75% profit margin, the deal is profitable for Nvidia even if they believe the company is worth only $0.5 billion (since they effectively get 75% of the investment back). And if this deal fuels the wider LLM hype and leads other companies to spend just $50 million more at Nvidia, this investment is profitable for Nvidia even if Mistral had negative value.
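The arithmetic in the comment above can be sketched in a few lines of Python. This is a back-of-the-envelope illustration, not a model of any actual deal: the function name `breakeven_valuation` and its parameters are made up for this example, and it assumes the whole investment is spent on Nvidia hardware at the stated margin.

```python
def breakeven_valuation(investment, headline_valuation, gross_margin,
                        extra_sales=0.0):
    """Company valuation at which the investor breaks even on its stake.

    Hypothetical helper: assumes the full investment comes back as
    hardware sales at `gross_margin`, plus any `extra_sales` to third
    parties triggered by the deal (at the same margin).
    """
    stake = investment / headline_valuation        # fraction of the company bought
    recovered = (investment + extra_sales) * gross_margin
    net_cost = investment - recovered              # what the stake effectively cost
    return net_cost / stake


# Figures from the comment: $100M at a $2B valuation, 75% margin.
print(breakeven_valuation(100e6, 2e9, 0.75))
# Same deal, plus $50M of extra hardware sales to other companies.
print(breakeven_valuation(100e6, 2e9, 0.75, extra_sales=50e6))
```

With the comment's numbers, the first call comes out at roughly $0.5 billion and the second goes negative, i.e. the stake has more than paid for itself before Mistral is worth anything.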

With convertible debt and many of these rounds investors get the first money out, so the first 450m would go to the investors.

Say big green gives a company $100M with the rider that it needs to spend all that on nvidia's hardware in exchange for 10% of the company.

Has Nvidia valued the company at 1B? Say their margin is 80% on the sales. So Nvidia has lost some cashflow and $20M for that 10%. Has Nvidia valued the company at $200M?

I see :) Thanks for clarifying. I would say that I don't have a strong enough grasp on biz finances to do more than speculate here, but:

1) Is all the money spent up front? Or does it trickle back in over a few years? Cash flow might be impacted more than implied, but I doubt this is much of an issue.

2) I wonder how the 10% ownership at 2B valuation would be interpreted by investors. If it's viewed as a fairly liquid investment with low risk of depreciation then yeah, I could see Nvidia's strategy being quite the way to pad numbers. OTOH, the valuation could be seen as pure marketing fluff and mostly written off by the markets until regulations and profitability are firmly in place.

You'd be surprised how much more common this is than people realize.

Kinda like MS giving OpenAI all those Azure credits?

It's the heads I win, tails you lose investment model

My 1st thought as a European: "YAY! EU startup to the moon". My 2nd thought was "n'aww, American VC". I guess that's the best we can do around here.

It may feel that there are few EU startups and that's true.

But there are even fewer EU VCs.

Was CTO for some European startups. I'll always remember one where, by the time the EU VC was midway through its due diligence for a 500k seed, we already had some millions lined up from some US VCs, no questions asked.

The problem is that no European VC has that amount of capital. European VCs typically have a couple of hundred million under mgmt. SV VCs have a few billion under mgmt.

Index Ventures has the money. But the truth of the matter is that even most US VCs aren't willing to shell out 2B valuations for a company with no revenue.

There were European VCs investing in the very first round, a French one in particular. The founders are French. This qualifies as European in my book (let’s not get too demanding).

Unfortunately, the EU also just passed some AI regulations. Not sure how they impact Mistral's work, but just FWIW.

Why is that an unfortunately? We need regulations to set the rules of the game.

We don’t even know what AI is truly going to look like in 2 years, and 2 years ago nobody cared. Isn’t it a bit too early to regulate a field that’s barely starting?

By regulating it now, we can shape what it's going to look like in 2 years.

What is the business model?

Sshh

Sorry I forgot, in AI $2Bn is preseed

Get the French government to throw a ton of money at you for sovereignty reasons

The old Masters have a saying: never fall in love with your creation. The AI industry is falling into a trap of its own making (marketing). LLMs are nice toys, but implementation is resource/energy expensive and murky at best. There are a lot of real-life problems that would be solved through a rational approach. If someone is thirsty, the water is the most important part, not the type of glass:)

If you compared the efficiency of steam engines during industrial revolution with the ones used today, or power generation from 100 years ago to that of now, or between just about any chemical process, manufacturing method or agricultural technique at its invention and now, you'd be amazed by the difference. In some cases, the activity of today was several orders of magnitude more wasteful just 100 years ago.

Or, I guess look at how size, energy use and speed of computer hardware evolved over the past 70 years. Point is, implementation being, right now, "resource/energy expensive and murky at best" is how many very powerful inventions look at the beginning.

If someone is thirsty, the water is the most important part, not the type of glass:)

Sure, except here we're talking about one group selling a glass imbued with breakthrough nanotech, allowing it to keep the water at the desired temperature indefinitely and continuously refill itself by sucking moisture out of the air. Sometimes the type of glass may really matter, and then it's not surprising that many groups strive to be able to produce it.

"Don't fall in love with your creation" does not mean "stop creating".

https://www.cell.com/joule/fulltext/S2542-4351(23)00365-3

Perhaps someone can answer this: this is a one year old company. Does this mean that barriers to entry are low and replication relatively simple?

Main barrier right now is access to supercompute and how to run it, everything is standardising quickly in the space

The part of Meta research that worked on LLaMa happened to be based in the Paris office. Then some of the leads left and started Mistral.

Complex/simple is not really the right way to think about training these models; I'd say it's more arcane. Every mistake is expensive because it takes a ton of GPU time and/or human fine-tuning time. Take a look at the logbooks of some of the open source/research training runs.

So these engineers have some value as they've seen these mistakes (paid for by Meta's budget).

I have realised just how meaningless valuations now are. As much as we use them as a marker of success, you can find someone to write the higher valuation ticket when it suits their agenda too, e.g. the markup, the status signal, or just getting the deal done ahead of your more rational competitors in the investment landscape. Now that's not to say Mistral isn't a valuable company or that they aren't doing good work. It's just that valuation markers are meaningless and most of this capital raising in the AI space is about offsetting the cloud/GPU spend. Might get downvoted to death, but watching valuation news feels like no news.

It's smoke, but where there is smoke, there is some level of fire.

Not if it's a smoke machine

Valuation based on what? What is the business model?

I believe that the rationale is that if you can do an outstanding 7B model, it is likely that you are able to create, in the near future, something that may compete with OpenAI, and something that makes money, too.

$2B is super cheap when ChatGPT wrapper AI startups are worth $500M.

It’s insane how much money is flowing between these investors and startups.

Let me say this. Whoever is going to be able to let "normal" Mac users install and run a local copy of an LLM is going to reap tons of commercial benefits. (e.g. DMG, click-install, run. No command line).

It is nuts to me that we have 100M computers capable of running LLMs properly, and yet only a tiny fraction of them do.

Heck, let us do p2p, and lend our computing power to others.

Let us build a personalized LLM.

This is, IMHO, a really interesting path forward. It seems no one is doing it.

Whoever is going to be able to let "normal" Mac users install and run a local copy of an LLM is going to reap tons of commercial benefits. (e.g. DMG, click-install, run. No command line).

https://gpt4all.io

https://ollama.ai

Gotta give it to Nvidia and TSMC. In the big AI race, they’re the ones with real moat and no serious competition.

No matter who wins, they’ll need those sweet GPUs and fabs.

It's the good old "in a gold rush, sell shovels".

Curious to see how this will impact Aleph Alpha

Aleph Alpha raised even more ^_^

https://sifted.eu/articles/ai-startup-aleph-alpha-raises-500...

Perhaps too much off-topic, but I hate how the press (and often the startups themselves) focuses on the valuation number when a company receives funding. As we've seen in very recent history, those valuation numbers are at best a finger in the wind, and of course a big capital intensive project like AI requires a valuation that is at least a couple multiples of the investment, even if it's all essentially based on hope.

I think it would make much more sense to focus on the "reality side" of the transaction, e.g. "Mistral AI received a €450 million investment from top tech VC firms."

The valuation is meaningful in the sense of "Mistral sells 22.5% of company to VC firms."
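For reference, the 22.5% figure is just the raise divided by the valuation. A minimal sketch, assuming the $2B is a post-money valuation and ignoring that the raise is quoted in euros and the valuation in dollars (as the comment does); the function name is illustrative only.

```python
def stake_sold(raise_amount, post_money_valuation):
    """Fraction of the company sold in the round (hypothetical helper)."""
    return raise_amount / post_money_valuation


raise_amount = 450e6   # 450M, from the thread
valuation = 2e9        # 2B, from the thread; assumed to be post-money

print(f"stake sold: {stake_sold(raise_amount, valuation):.1%}")
print(f"implied pre-money valuation: {valuation - raise_amount:,.0f}")
```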

There is a lot of noise here suggesting it is too much, but relative to the supposed SV unicorns of two years ago this looks like an absolute steal.

The macroeconomic situation 2 years ago was wildly different from now.

Some folks on this forum seem to get irritated by the prospect of a successful AI company HQed in the EU. Why the hate?

Because many around here have a preconceived bias that Europe cannot be innovative, and any proof to the contrary needs to be shat upon as not good or innovative enough/only looking for government contracts, or that they're not the size of Meta or Alphabet or Apple so obviously they aren't really innovative, or some other goal post shifting exercise.

Does anyone have examples of products that make heavy use of an LLM API where it could make economic sense to use a self-hosted model (Mistral, LLaMA)?

I'm working on an embeddings database of my personal information, with the ability to query it. Purely for privacy reasons.
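For anyone wondering what such a setup looks like in miniature, here is a rough sketch of a local "embeddings database" with similarity search. Everything in it is an assumption for illustration: `embed` is a toy bag-of-words stand-in for a real, locally hosted embedding model, and `NoteStore` is a made-up in-memory class, not any particular library.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy stand-in for an embedding model: a sparse bag-of-words vector.

    A real setup would call a locally hosted embedding model here.
    """
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class NoteStore:
    """In-memory store of (text, vector) pairs with nearest-match lookup."""

    def __init__(self):
        self.items = []

    def add(self, text):
        self.items.append((text, embed(text)))

    def query(self, question, k=1):
        q = embed(question)
        ranked = sorted(self.items, key=lambda it: cosine(q, it[1]), reverse=True)
        return [text for text, _ in ranked[:k]]


store = NoteStore()
store.add("Dentist appointment on Friday at 10am")
store.add("Passport renewal form is in the blue folder")
store.add("Wifi password is taped under the router")

print(store.query("when is my dentist appointment"))
```

Nothing leaves the machine in this sketch, which is the privacy argument; swapping `embed` for a real local model would leave the store and query logic unchanged.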

Too many superlatives and groundbreaking miracles reported. Probably written by AI.

In a significant development for the European artificial intelligence sector, Paris-based startup Mistral AI has achieved a noteworthy milestone. The company has successfully secured a substantial investment of €450 million, propelling its valuation to an impressive $2 billion.

I’m cracking up. I don’t need to be a rocket scientist to read this and immediately conclude it’s AI-generated. I mean, they didn’t even try to hide that. Haha.

A competitor to OpenAI in like, benchmarks?

At least a competitor to Llama, for now.

https://medium.com/@datadrifters/mistral-7b-beats-llama-v2-1...

Noob questions (I don't know anything about LLM, I'm just a casual user of ChatGPT)

- is what Mistral does better than Meta or OpenAI?

- will LLMs eventually become open-source commodities with little room for innovation, or shall we expect to see a company with a competitive advantage that will make it the new Google? In other words, how much better can we expect these LLMs to be in the future? Should we expect significant progress, or have we reached diminishing returns (after all, this is only statistical prediction of the next word; maybe there's an intrinsic limitation of this method)?

- are there some sorts of benchmarks to compare all these new models?

How does Mistral monetize or plan to monetize? Create a ChatGPT-like service and charge? License to other businesses?

Dupe https://news.ycombinator.com/item?id=38580758

I see a lot of comments asking what people are using these models for, and how.

The promise of LLMs is not in chatbots (imho). At scale, you will not even realize you are interacting with a language model.

It just happens to be that the first, most boring, lowest hanging fruit products that OAI, Anthropic, et al pump out are chatbots.

Previously: OpenAI Rival Mistral Nears $2B Valuation with Andreessen Horowitz Backing (6 days ago, 2 points, 0 comments)[0], OpenAI Rival Mistral Nears $2B Valuation with Andreessen Horowitz Backing (5 days ago, 9 points, 1 comment)[1], French AI startup Mistral secures €2B valuation (2 days ago, 106 points, 74 comments)[2], Mistral, French A.I. Startup, Is Valued at $2B in Funding Round (6 hours ago, 15 points, 1 comment)[3]

[0]: https://news.ycombinator.com/item?id=38522873 [1]: https://news.ycombinator.com/item?id=38533725 [2]: https://news.ycombinator.com/item?id=38580758 [3]: https://news.ycombinator.com/item?id=38593526

This is inevitable. At some point companies like this will be too big to fail, like Airbus. Maybe it’s already there.

Who comes up with these valuations? The Donald?

That's fair given it's 50 times more difficult to use their model