Codestral: Mistral's Code Model

analyte123
56 replies
1d5h

The license for this [1] prohibits use of the model and its outputs for any commercial activity, or even any "live" (whatever that means) conditions, commercial or not.

There seems to be an exclusion for using the code outputs as part of "development". But wait! It also prohibits "any internal usage by employees in the context of the company's business activities". However you interpret these clauses, this puts their claims and comparisons on completely unequal ground. They only compare to other open-weight models, not GPT-4 or Opus, but a normal company or individual can do whatever they want with the Llama weights and outputs. LangChain? "Your favourite coding and building environment"? Who cares? It seems you're not allowed to integrate this with anything else and show it to anyone, even as an art project.

[1] https://mistral.ai/licenses/MNPL-0.1.md

foobiekr
32 replies
1d5h

There's some irony in the fact that people will ignore this license in exactly the same way Mistral and all the other LLM guys ignore the copyright and licensing on the works they ingest.

hehdhdjehehegwv
21 replies
1d2h

So basically I, as an open source author, had my code eaten up by Mistral without my consent, but if I want to use their code model I’m subject to a bunch of restrictions that benefit their bottom line?

The problem these AI companies have is they live in a glass house and they can’t throw IP rocks around without breaking their own “your content is our training data” foundation.

The only reason I can think of that Google doesn’t go after OpenAI for scraping YouTube is that they’d then put themselves in the same crosshairs, and might set a precedent they’d also be bound by.

Given the model is “on the web” I have the same rights as Mistral to use anything online however I want without regard for IP, right?

Utter absurdity.

IshKebab
10 replies
1d

So basically I, as an open source author, had my code eaten up by Mistral without my consent

Not necessarily. You consented to people reading your code and learning from it when you posted it on Github. Whether or not there's an issue with AI doing the same remains to be settled. It certainly isn't clear cut that separate consent would be required.

Liquix
4 replies
1d

MIT/BSD code is fair game, but isn't the whole point of GPL/AGPL "you can read and share and use this, but you can't take it and roll it into your closed commercial product for profit"? It seems like what Mistral and co are doing is a fundamental violation of the one thing GPL is striving to enforce.

rvz
0 replies
15h58m

but isn't the whole point of GPL/AGPL "you can read and share and use this, but you can't take it and roll it into your closed commercial product for profit"?

You can profit from GPL/AGPL code, but you must also make all your source code open and available for everyone to see.

poopypoopington
0 replies
5h22m

Is there an updated version of these license(s) that explicitly excludes projects from being used for training of AIs?

hehdhdjehehegwv
0 replies
21h38m

Precisely, this is such a basic violation of GPL it’s mind boggling they went for it.

gpm
0 replies
16h29m

No. Either MIT/BSD code isn't fair game because it requires attribution, or GPL/AGPL code is fair game because it isn't copyright infringement in the first place so no license is required.

It'll be a court fight to determine which. Worse, it will be a court fight that plays out in a bunch of different countries and they probably won't all come to the same conclusion. It's unlikely the two licenses have a different effect here though. Either they both forbid it, or neither had the power to forbid it in the first place.

hehdhdjehehegwv
3 replies
1d

I did not give consent to train on my software and the license does not allow commercial use of it.

They have taken my code and now are dictating how I can use their derived work.

Personally I think these tools are useful, but if the data comes from the commons the model should also belong to the commons. This is just another attempt to gain private benefit from public work.

There are legal issues to be resolved, and there is an explosion of lawsuits already, but the fact pattern is simple and applies to nearly all closed-source AI companies.

portaouflop
2 replies
1d

Mistral is as open as they get; most others are far worse. Here you can use the model without issues, and as others are saying, it’s doubtful they would sue you if you were to use code generated by the model in a commercial app.

mananaysiempre
0 replies
22h41m

Replit’s replit-code[1,2] is CC BY-SA 4.0 for the weights, Apache 2.0 for the sources. Replit has its own unpleasant history[3], but the model’s terms are good. (The model itself is not as good, but deciding whether that’s a worthwhile tradeoff is up to you. The tradeoff exists and is meaningful, is my point.)

[1] https://huggingface.co/replit/replit-code-v1-3b

[2] https://huggingface.co/replit/replit-code-v1_5-3b

[3] https://news.ycombinator.com/item?id=27424195

hehdhdjehehegwv
0 replies
21h39m

This model is more restricted than Mistral and Mixtral - this is a new development from them.

bobthecowboy
0 replies
17h24m

You consented to people reading your code and learning from it when you posted it on Github.

And if I never posted my code to GitHub, but someone else did? What if someone had posted proprietary code they had no rights to on GitHub at the same time the scraper bots were trawling it? A few years ago some Windows source code was leaked onto GitHub - did Microsoft consent then?

paulddraper
2 replies
16h9m

No. You are welcome to learn from Mistral's works, either as a meatbag or via machine agent.

You are not allowed to reproduce Mistral's works (beyond the usual Fair Use allowances).

Nor is Mistral entitled to reproduce your works (unless you have licensed as such).

If it does, you can sue for copyright infringement.

hehdhdjehehegwv
1 replies
2h7m

This is an actively litigated and unsettled area of law. Neither you nor anybody else can say any of this with confidence until these lawsuits get to a judge, and even then the rulings are per jurisdiction. The US, EU, and Japan may end up with different rulings. International trade agreements may be updated. Industry may settle on some sort of broadly acceptable revenue-sharing model.

The point is: nobody knows and the AI companies are getting well ahead of the law.

paulddraper
0 replies
1h56m

Those cases are about applying these principles to specific events/facts.

But what part of what I said do you believe to be undecided?

That a human can learn without violating copyright? That a machine can learn without violating copyright?

htrp
2 replies
1d2h

call it an Enterprise poison pill.

hehdhdjehehegwv
1 replies
1d1h

But a pill they also have to swallow.

dodslaser
0 replies
1d

Enterprise suicide cult.

rvz
1 replies
16h7m

The only reason I can think of that Google doesn’t go after OpenAI for scraping YouTube is that they’d then put themselves in the same crosshairs, and might set a precedent they’d also be bound by.

It will be the smartphone patent wars all over again with hundreds of lawsuits against big tech and AI companies.

We are already past the 'fair use' excuses at this point, especially when OpenAI is slowly striking deals with news companies to train on their content (with their permission) and with the intent of commercializing the model.

hehdhdjehehegwv
0 replies
4h9m

I think a lot of the license motivation is to have real time information for RAG. I doubt that is being used for foundation training, it’s just not enough volume.

esperent
1 replies
16h16m

I used to spend a lot of time (thousands of hours) contributing to open source projects. Over the past few years I've stopped contributing (except minor fixes) to any project under MIT/Apache or similar licences.

Has anyone else done this?

hehdhdjehehegwv
0 replies
4h11m

Interesting, I think that’s a totally valid response to this trend of capturing “value” of open source via Cloud Services (for a while now) and Code Gen (more recent).

I think SV is just dead set on killing the golden goose of open source and the web by extracting as much as possible with no regard for the wasteland left behind.

nicce
4 replies
1d5h

In many countries you can't even claim copyright over the output of an AI in order to use a license like this.

hannasanarion
2 replies
1d4h

Copyright on the software that produces something isn't the same as copyright on the output.

The library's copyright is intact, as normal, and they can control who uses it and how just like any other software.

The output of AI systems is not copyrightable, but the systems themselves are, and associated EULAs are valid.

nicce
1 replies
1d3h

Is that so certain? To be able to make claims about what you can use the output for, can you do that without making any claims about control and ownership of the output?

Of course, they can revoke your right to use the software, but if it goes to court, that would be an interesting case.

wrs
0 replies
22h12m

If there’s no copyright in the weights to begin with, the only restrictions you have are the ones you agreed to when you accepted the license agreement. Find the weights somewhere else and you don’t have to worry about the license.

I don’t know why there isn’t more discussion on this point and people just assume there’s an underlying copyright basis to the licensing of weights. As far as I know that isn’t settled at all.

wakawaka28
0 replies
21h35m

I'd like to know how they think they'll prove I didn't write whatever code I generate. Unless it is a direct copy of something else available to the investigator, good luck.

nullc
1 replies
1d2h

Five years ago it would not have been at all controversial that these weights would not be copyrightable in the US; they're machine-generated output on third-party data. Yet somehow we've entered a weird timeline where obvious copyfraud is fine, by the same entities that are at best on the line of engaging in commercial copyright infringement at a hitherto unforeseen scale.

esperent
0 replies
16h13m

It's clear that when enough money and power is on the line - and fear that other countries will overtake them - all countries are willing to conveniently and pragmatically ignore their laws. I don't think this is any kind of surprise.

belter
1 replies
1d4h

And nobody will sue anybody because suing means...discovery....

aitchnyu
0 replies
11h58m

Umm, are you saying it's mutually assured destruction for the book pirate (AI company) and the API wrapper startup?

rldjbpin
0 replies
12h52m

Should it be morally OK to not follow these kinds of licenses, except maybe when you are selling a service without making any changes? I wonder what people visiting this site think about this.

behnamoh
13 replies
1d5h

From the website:

licensed under the new Mistral AI Non-Production License, which means that you can use it for research and testing purposes. ...

Which basically means "we give you this model. Go find its weaknesses and report on r/locallama. Then we'll use that to improve our commercial model which we won't open-source."

I'm sick of abusing the word "open-source" in this field.

JimDabell
10 replies
1d5h

I'm sick of abusing the word "open-source" in this field.

They don’t call this open source anywhere, do they? As far as I can see, they only say it’s open weights and that it’s available under their Mistral AI Non-Production License for research and testing. That doesn’t scream “open source” to me.

demosthanos
7 replies
1d5h

They do say "open-weight", which is I think still very misleading in this context. Open-weight sounds like it should be the same as open-source, just for weights instead of the full source (for example, training data and the code used to generate the weights may not be released). This isn't really "open" in any meaningful sense.

Zambyte
5 replies
1d4h

The fact that I can download it and run it myself is a pretty meaningful amount of openness to me. I can easily ignore their bogus claims about what I'm allowed to do with it, due to their distribution model. I can't necessarily do the same with a proprietary service, as they can cut me off if the way I use the output makes them sad :(

TeMPOraL
2 replies
1d3h

The fact that I can download it and run it myself is a pretty meaningful amount of openness to me

That's typically called freeware, though.

Zambyte
1 replies
1d2h

The inference engine that I use to run open weight language models is fully free software. The model itself isn't really software in the traditional sense. So calling it ____ware seems inaccurate.

TeMPOraL
0 replies
1d1h

The interpreter is free software. The model is freeware distributed as a binary blob. Code vs. Data is a matter of perspective, but with large neural nets, more than anywhere, it makes no sense to pretend they're plain data. All the computational complexity is in the weights, they're very much code compiled for an unusual architecture (the inference engine).

demosthanos
1 replies
1d2h

I can easily ignore their bogus claims about what I'm allowed to do with it due to their distribution model.

If you're talking exclusively about personal use, sure. If you're talking about a business setting in a jurisdiction that Mistral can sue in, not so much.

Being able to use it in a business setting is a pretty darn important part of what Open Source has always meant (it's why it exists as a term at all).

Zambyte
0 replies
9h36m

If you're talking about a business setting in a jurisdiction that Mistral can sue in, not so much.

I'm reminded of the Japanese concept called Sosumi :)

Being able to use it in a business setting is a pretty darn important part of what Open Source has always meant (it's why it exists as a term at all).

I'm quite familiar with the history of that term, but neither I nor Mistral used it. None of their models have been open source; they have been open weight. You can argue that they are actually "weight available" given the terms they write next to the download link, but since there has been no ruling on whether weights themselves are covered by copyright (and I think that would be terribly bogus if they are), I simply choose not to care what they write in their "terms of use".

boulos
0 replies
1d1h

This is why I prefer the term "weights available" just like "source available". It makes it clear that you can get your hands on the copy, you could run this exact thing locally if they go out of business, etc. but it is definitely not open in the OSS sense.

gyudin
0 replies
1d5h

All their other models are “open source” and it was the selling point they built their brand on. I doubt they made their new model completely different from previous ones, so it’s supposed to be open source too, unless they found some juridical loophole lol

Rastonbury
0 replies
1d4h

No but they do say "empowering developers" and "democratising coding" as the subtitle, I guess only those who pay

wrs
0 replies
22h15m

If you want to live on the legal edge, it’s unclear whether there is any copyright in model weights (since they don’t have human authorship), so just wait for someone to post the weights someplace where you can get them without agreeing to the license.

benreesman
0 replies
19h50m

I use the term “available weight”.

This is maybe a debatable claim, but I’ll contend that without the magnificent rebel who leaked the original LLaMA weights the last, what, 15 months would have gone completely differently.

The legislators and courts and lawyers will be years if not decades sorting all this out.

For now there seems to be a productive if slightly uneasy truce: outside of a few groups at a few firms, everyone seems to be maximizing for innovation and generally behaving under a positive sum expectation.

One imagines if some really cool tune of this model shows up as a magnet or even on huggingface, the courteous thing probably happened: Mistral was notified in advance and some mutually beneficial arrangement was agreed to in outline, maybe inked, maybe not.

I don’t work for Mistral, so that’s pure speculation, but the big company I spent most of my career at would have certainly said “can we hire this person? can we buy this company? can we collaborate with people who do awesome stuff with our stuff that we didn’t think of?”

The icky actors kind of dominate the headlines and I’m as guilty as anyone and guiltier than most of letting that be top of mind too often.

In the large this is really cool and kind of new.

I’m personally rather optimistic that we’re well past the point when outright piracy or flagrantly adversarial license violations are either necessary or useful.

To me this license seems like an invitation to build on Mistral’s work and approach them with the results, and given how well a posture of openness with some safeguards is working out for FAIR and the LLaMA group, that’s certainly the outcome I’d be hoping for in their position.

Maybe open AI was an unrealistic goal. Maybe AvailableAI is what we wind up with, and that wouldn’t be too bad.

meiraleal
2 replies
1d5h

Who cares? It seems you're not allowed to integrate this with anything else and show it to anyone, even as an art project.

Now they just lack the means to enforce it.

localfirst
1 replies
1d4h

impossible to enforce

GuB-42
0 replies
20h17m

They could potentially watermark the model in order to identify the output. There are techniques for doing that: for example, randomly assign tokens into groups A and B and increase the probability of group A over group B during sampling; if group A is over-represented in a piece of text, chances are that the output comes from the watermarked model.
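
A minimal sketch of what detection could look like (the keyed hash, secret, and threshold here are made up for illustration; real schemes typically reseed the group split at each position):

    import hashlib
    import math

    SECRET = b"model-owner-secret"  # hypothetical key, known only to the model owner

    def in_group_a(token_id: int) -> bool:
        # Deterministically split the vocabulary into two halves with a keyed hash.
        digest = hashlib.sha256(SECRET + token_id.to_bytes(4, "big")).digest()
        return digest[0] % 2 == 0

    def watermark_z_score(token_ids: list[int]) -> float:
        # Without a watermark, ~50% of tokens land in group A. A model that boosted
        # group A during sampling pushes that fraction up; the z-score measures how
        # unlikely the observed count is under the 50/50 null hypothesis.
        n = len(token_ids)
        hits = sum(in_group_a(t) for t in token_ids)
        return (hits - 0.5 * n) / math.sqrt(0.25 * n)

A z-score above, say, 4 over a few hundred tokens would be fairly strong evidence the text came from the watermarked model.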

How effective these techniques are, and how acceptable they are as proof, is yet to be determined.

I don't think that is the case here; they probably don't really care, and watermarking has a cost.

rohansood15
1 replies
1d5h

That license is just hilarious.

das_keyboard
0 replies
1d5h

OT, but 7.2 reads like the description of some Yu-Gi-Oh card or something:

Mistral AI may terminate this Agreement at any time [...]. Sections 5, 6, 7 and 8 shall survive the termination of this Agreement.
croes
1 replies
1d5h

It's more like a demo version you can evaluate before you need to buy a commercial license.

On whose code is Mistral trained?

rvnx
0 replies
1d4h

Your code, my code, etc. But there is a common pattern with the law: copyright does not apply when you have billions.

Examples: recurring infringement from Microsoft on open-source projects, Google scraping content to build their own database, etc...

isoprophlex
0 replies
1d5h

So, it's almost entirely useless with that license, because the average pack of corpo beancounters will never let you use it over whatever Microsoft has already sold them.

ddavis
38 replies
1d6h

My favorite thing to ask the models designed for programming is: "Using Python write a pure ASGI middleware that intercepts the request body, response headers, and response body, stores that information in a dict, and then JSON encodes it to be sent to an external program using a function called transmit." None of them ever get it right :)
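
For reference, a minimal sketch of the kind of answer I have in mind (transmit is just a stand-in for whatever callable ships the payload off, per the prompt):

    import json

    class CaptureMiddleware:
        # Pure ASGI: no framework base classes, just the callable protocol.
        def __init__(self, app, transmit):
            self.app = app
            self.transmit = transmit

        async def __call__(self, scope, receive, send):
            if scope["type"] != "http":
                await self.app(scope, receive, send)
                return

            captured = {"request_body": b"", "response_headers": [], "response_body": b""}

            async def receive_wrapper():
                # The request body can arrive in several http.request messages,
                # and is only seen here if the downstream app actually reads it.
                message = await receive()
                if message["type"] == "http.request":
                    captured["request_body"] += message.get("body", b"")
                return message

            async def send_wrapper(message):
                if message["type"] == "http.response.start":
                    captured["response_headers"] = [
                        (k.decode("latin-1"), v.decode("latin-1"))
                        for k, v in message.get("headers", [])
                    ]
                elif message["type"] == "http.response.body":
                    # Response bodies can also be streamed across several messages.
                    captured["response_body"] += message.get("body", b"")
                await send(message)

            await self.app(scope, receive_wrapper, send_wrapper)

            self.transmit(json.dumps({
                "request_body": captured["request_body"].decode("utf-8", "replace"),
                "response_headers": captured["response_headers"],
                "response_body": captured["response_body"].decode("utf-8", "replace"),
            }))

Streamed bodies and apps that never read the request body are the kind of sharp edges that trip the models up.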

bongodongobob
21 replies
1d5h

Cool, you've identified that your prompt is inadequate for the task.

'Pray, Mr. Babbage, if you put into the machine wrong figures, will the right answers come out?'

TechDebtDevin
17 replies
1d5h

Damn, show us your brilliant prompt then. LLMs cannot do this, not even in Python, where libraries like BlackSheep honestly make it a trivial task.

bongodongobob
7 replies
1d5h

My point is that you shouldn't expect to one shot everything. Have it start by writing a spec, then outline classes and methods, then write the code, and feed it debug stuff.

TechDebtDevin
4 replies
1d5h

I see your point, but hand-holding isn't really a good way to benchmark a model's coding capabilities.

Closi
3 replies
1d5h

Depends if benchmarking is the aim, rather than decreasing the time it takes to build things.

TechDebtDevin
2 replies
1d5h

Well sure, but that wasn't what we were discussing. The original comment says they use that as their benchmark. While their coding task is a bit complex compared to other benchmarking prompts, it's not that crazy. Here is an example of prompts used for benchmarking with Python for reference:

https://huggingface.co/datasets/mbpp?row=98

At the end of the day LLMs in their current iteration aren't intended to do even moderately difficult tasks on their own but it's fun to query them to see progress when new claims are made.

Closi
1 replies
1d5h

The original comment says nothing about benchmarking; they just say that an AI can't one-shot their complex task?

amne
0 replies
1d3h

When I read

"My favorite thing to ask the models designed for programming is ....... None of them ever get it right"

I read "benchmark".

bottom999mottob
1 replies
1d5h

Exactly, expecting one shot 100% working code with one prompt is ridiculous at this point. It's why libraries like Aider are so useful, because you can iteratively diff generated code until it's useable.

TechDebtDevin
0 replies
1d5h

Sure, it's impossible at this point, but the point of a benchmark isn't to complete the task; it's to test efficacy overall and to see progress. None of them are at 100% on even the simplistic Python benchmarks, which doesn't mean we shouldn't measure that capability. But sure, I get it. That's not how they are intended to be used, but that's also not the point the commenter was making.

Closi
7 replies
1d5h

Break your prompt up into smaller pieces and it can.

qeternity
6 replies
1d5h

Taken to the extreme, a sufficiently broken down prompt is simply the code itself.

The whole point is to prompt less?

meiraleal
3 replies
1d4h

Taken to the extreme, a sufficiently broken down prompt is simply the code itself

it is not. But the artifacts generated through the steps will be code. The last prompt will have most of the code supplied to it as the context.

buddhistdude
1 replies
22h56m

No, he is right; he is saying "taken to the extreme". The point is that the more specifically you have to prompt, the more you are actually contributing to the result yourself, and the less the model is.

meiraleal
0 replies
22h26m

Yes, but the build-up isn't manual. You keep patching prompts with responses until the final result. The last prompt will be almost the whole completed code, obviously.

achierius
0 replies
1d3h

A prompt is just a specification for an output. Code is just what we call a sufficiently detailed specification.

bongodongobob
0 replies
18h6m

Well now we get into information density and Kolmogorov complexity. The more complicated your desired output program is, the more information you'll have to put in, i.e., more complicated prompts.

Closi
0 replies
1d4h

More practically, the whole point is to prompt enough to generate valid code.

ben_w
0 replies
1d5h

Prompts like yours (I ask them for a fluid dynamics simulator which also doesn't succeed) inform us of the level they have reached. A useful benchmark, given how many of the formal ones they breeze through.

I'm glad they can't quite manage this yet. Means I still have a job.

ddavis
1 replies
1d5h

It's something I know how to do after figuring it out myself and discovering the potential sharp edges, so I've made it into a fun game to test the models. I'd argue that it's a great prompt (to keep using consistently over time) to see the evolution of this wildly accelerating field.

kergonath
0 replies
1d5h

Do you notice any progress over time?

AnimalMuppet
0 replies
1d5h

How is that "putting in wrong figures"? It's a perfectly valid prompt, written in clear, proper English.

gyudin
3 replies
1d5h

I ask software developers to do the same thing and give them the same amount of time. None of them ever write a single line of code :)

dieortin
2 replies
1d4h

Give an LLM all the time you want, and they will still not get it right. In fact, they most likely will give worse and worse answers with time. That’s a big difference with a software developer.

mypalmike
0 replies
23h37m

My experience is very different. Often it (ChatGPT or Copilot, depending on what I'm trying to accomplish) gets things right the first time. When it doesn't, it's usually close enough that a bit of manual modification is all that's needed. Sometimes it's totally wrong, but I can usually point it in the right direction.

TeamDman
0 replies
3h24m

I mean, with a nonzero temperature, the randomness will eventually produce every combination of tokens in the corpus, so with a sufficiently large "all the time you want" you can produce limitless correct answers

JimDabell
3 replies
1d5h

I normally ask about building a multi-tenant system using async SQLAlchemy 2 ORM where some tables are shared between tenants in a global PostgreSQL schema and some are in a per-tenant schema.

Nothing gets it right first time, but when ChatGPT 4 first came out, I could talk to it more and it would eventually get it right. Not long after that though, ChatGPT degraded. It would get it wrong on the first try, but with every subsequent follow up it would forget one of the constraints. Then when it was prompted to fix that one, it forgot a different one. And eventually it would cycle through all of the constraints, getting at least one wrong each time.

Since then benchmarks came out showing that ChatGPT “didn’t really degrade”, but all of the benchmarks seemed focused on single question/answer pairs and not actual multi-turn chat. For this kind of thing, ChatGPT 4 has never managed to recover to as good as it was when it was first released in my experience.

It’s been months since I’ve had to deal with that kind of code, so I might be forgetting something, but I just tried it with Codestral and it spat out something that looked reasonable very quickly on its first try.

checkyoursudo
1 replies
1d2h

I had a similar experience. I was trying to get GPT 4 to write some R/Stan code for a bit of bayesian modelling. It would get the model wrong, and then I would walk it through how to do it right, and by the end it would almost get it right, but on the next step, it would be like, oh, this is what you want, and the output was identical to the first wrong attempt, which would start the loop over again.

happypumpkin
0 replies
23h41m

Similar experience using GPT4 for help with Apple's Accessibility API. I wanted to do some non-happy-path things and it kept looping between solutions that failed to satisfy at least one of a handful of requirements that I had, and in ways that I couldn't combine the different "solutions" to meet all the requirements.

I was eventually able to figure it out with the help of some early 2010s blog posts. Sadly I didn't test giving it that context and having it attempt to find a solution again (and this was before web browsing was integrated with the web app).

More of an issue than it not knowing enough to fulfill my request (it was pretty obscure so I didn't necessarily expect that it would be able to) was that it didn't mind emitting solutions that failed to meet the requirements. "I don't know how to do that" would've been a much preferred answer.

alephxyz
0 replies
1d1h

It would get it wrong on the first try, but with every subsequent follow up it would forget one of the constraints. Then when it was prompted to fix that one, it forgot a different one. And eventually it would cycle through all of the constraints, getting at least one wrong each time.

That drives me nuts and makes me ragequit about half the time. Although it's usually more effective to go and correct your initial prompt rather than prompt it again

spmurrayzzz
1 replies
1d

I love to ask it to "make me a Node.js library that pings an IPv4 address, but you must use ZERO dependencies, you must use only the native Node.js API modules"

The majority of models (both proprietary and open-weight) don't understand:

- by inference, ping means we're talking about ICMP

- ICMP requires raw sockets

- Node.js has no native raw socket API

You can do some CoT trickery to help it reason about the problem and maybe finally get it settled on a variety of solutions (usually some flavor of building a native add-on using C/C++/Rust/Go), or just guide it there step by step yourself, but the back and forth to get there requires a ton of pre-knowledge of the problem space which sorta defeats the purpose. If you just feed it the errors you get verbatim trying to run the code it generates, you end up in painful feedback loops.

(Note: I never expect the models to get this right, it's just a good microcosmic but concrete example of where knowledge & reasoning meets actual programming acumen, so its cool to see how models evolve to get better, if at all, at the task).

halJordan
0 replies
44m

This is the same level of gotcha that everyone complains about when interviewing. It mainly depends on the interviewee having the same assumptions (pings definitely do not have to be ICMP) and the same, usually bespoke, knowledge base (Node.js peculiarities). I can see that an LLM should know whether raw sockets are available, but that's not what you asked.

In fact you deliberately asked for something impossible and hold up undefined behavior as undefined like it's impugning something.

sanex
1 replies
1d5h

Can you get it right without an IDE?

ddavis
0 replies
1d5h

Nope, I don't know how to do it at all- that's why I have to ask AI!

nicce
1 replies
1d5h

I usually throw some complex Rust code with lifetime requirements at them and ask them to fix it. LLMs aren't capable of providing much help with that in general, other than in some very basic cases.

The best way to get your work done is still to look into Rust forums.

meiraleal
0 replies
1d5h

It works amazingly well for those who have never coded in Rust, at least in my experience. It took me a couple of hours and 120 lines of code to set up a WebRTC signaling server.

shepardrtc
0 replies
1d3h

gpt-4o gets it right on the first try for me. Just ran it and tested it.

meiraleal
0 replies
1d5h

Interesting. My favorite thing to ask the models is to refactor code I've not touched for too long and this works very well.

colesantiago
16 replies
1d6h

I'm so happy that LLMs are now democratising access to programming, especially open models like what Meta is doing with Llama and Mistral is doing with Codestral.

The abundance of programming is going to allow almost everyone to become a great programmer.

This is so exciting to see and each day programming is becoming a solved problem so we can focus on other things.

icedchai
5 replies
1d5h

I'm skeptical. I've run into people who used LLMs to code, then can't debug it without someone else's help. It may get you 80% there though.

whiplash451
2 replies
1d5h

It does not get you 80% there if it achieves what you described. It rather gets you 100% into trouble.

icedchai
0 replies
23h53m

I agree with you. I've had to debug some of that junk.

croes
0 replies
1d5h

Programmer view vs management view.

100% of nothing vs 80% of enough.

That's the risk of AI. Not that AI outperforms humans already but that managers believe it does. That and that code writing is the main work of programmers.

Cyphase
1 replies
23h17m

I've run into working programmers who were bad at debugging before LLMs existed.

icedchai
0 replies
21h18m

Absolutely. LLMs aren't going to replace good developers. The bad ones, maybe.

maskil
3 replies
1d5h

I would argue the opposite is true.

My experience with coding with LLMs is that the only thing it's really good at is generating boilerplate that it has more-or-less seen before (essentially a library, even if is somewhat adapted), however it is incapable of the creative thinking that developers regularly need to engage in when architecting a solution for their use case.

Kiro
2 replies
1d4h

My experience is the opposite. When I started using Copilot I thought it would only be good at standard boilerplate but I'm constantly surprised how well it understands my completely convoluted legacy architecture that barely I understand myself even though I'm the only contributor.

maskil
0 replies
19h59m

Understanding existing code is in its wheelhouse (provided the infrastructure feeding the existing code to the prompt is working well), but I believe if you examine the totality of work a human programmer is involved in, an LLM is woefully behind in many areas (gathering proper requirements, potentially iterating/pushing back on requirements, architecting a solution on a macro level, and other gaps an LLM cannot fill).

localfirst
0 replies
1d2h

I've been on both sides of the fence here.

The parent's problem, which I experienced: it gets "stuck", and its limitation is the learning loop (humans are always asking why they're stuck and how to get unstuck); LLMs just power through without understanding what "stuck" is.

For explaining an existing corpus or algorithm, it does a fantastic job.

So likely we will see significant wage garnishing in "agency/b2b enterprise" shops.

smokel
2 replies
1d5h

In my experience these tools amplify the quality of a programmer.

I have seen good programmers dramatically increase their productivity, but I've also seen others copy-pasting for loops inside other for loops where one loop would definitely suffice. We're not quite there yet.

croes
0 replies
1d5h

I'm curious for the long-term effect.

I observe a certain laziness in myself when it comes to certain problems. It's easier to ask a LLM and debug provided code, but I ask myself if I'm losing some problem solving capabilities in the long run because of this.

Similar to the loss of speed in doing mental arithmetic because of calculators on the smartphone.

bubbleRefuge
0 replies
1d3h

Absolutely it amplifies. Complex and esoteric configuration of frameworks, for example, entails so much reading and Googling and can be very time consuming without AI. AI can help to bring custom software to the markets that could not otherwise afford to pay for it.

skydhash
0 replies
1d5h

Shadow libraries did more to democratize anything than LLMs. And following a book like Elixir in Action (Manning) will get you there faster than chatting with LLMs or copilot generating code for you.

huygens6363
0 replies
1d3h

This enables everyone to be a great programmer in the same way that easily available power tools enable everyone to be a great carpenter and general craftsman.

You’ll get a lot of shitty stuff and the profession will get hollowed out losing attraction of the smart people. We’ll be left with low-quality, disposable bullshit while wondering where all the programmers went.

croes
0 replies
1d5h

The abundance of programming is going to allow almost everyone to become a great programmer.

How do you become a great programmer if you don't really program?

andruby
15 replies
1d6h

This is an open weights 22B model. The download on Huggingface is 44GB.

Is there a rule-of-thumb estimate for how much RAM this would need to be used locally?

Is the RAM requirement the same for a GPU and "unified" RAM like Apple silicon?

Terretta
5 replies
19h12m

Yes, the RAM requirement is basically the same for a GPU and for using the Metal/GPU in Apple Silicon.

Running LLM models on a MacBook Pro with Apple Silicon vs. a PC with an Nvidia 4090 GPU has trade-offs. My 128GB MacBook Pro handles models using up to 96GB of unified memory, running at a little under half the speed of a 4090. If you use a quantized version of a full floating-point model, you can run the largest open models available.

While the 4090 has 24GB of dedicated memory and higher bandwidth (1000 GB/s vs. 400 GB/s on M3 Max), the Mac’s unified memory system (up to 128GB) is flexible and holds smarter models (8-bit and 6-bit models still act mostly all there, 4-bit is so-so, 2-bit is brain damaged).

The M2 Ultra in the Mac Studio offers even more (800 GB/s bandwidth and 192GB memory). So, OK, 6 or 8 4090 cards or 4 A6000 cards excel in raw performance, but Apple’s unified memory in a laptop fits in your backpack.

It's not clear to me why Macbooks and Mac Studio Ultras with maxed out RAM aren't selling better if you look at the convenience and price relative to model size. Models that fit in one 4090 or even a pair of 4090s are toys compared to what fits on these, so for the big models you're comparing a laptop to a minifridge.

tyfon
2 replies
16h13m

I have a 5940x with 128 gb ram.

It's a bit slower perhaps than the mac, but i get the best of both worlds. That is I get a lot of RAM to hold the model and I can offload as much of it as possible to the GPU. This works especially well with models like mixtral 8x22, but also models like llama3 and the old large bloom model.

I also get the utility of running Linux instead of the closed up mac os.

But running large models locally is not exclusive to mac studio, you can do the same on PC for a much lower cost.

Terretta
1 replies
12h48m

I get the utility of a laptop that runs 20 hours on battery and slips in the side pocket of my carry-on or shoulder bag. (The Mac can also split between RAM and GPU.) Mixtral 8x22 and Llama 3 70b stream at roughly the same speed as last year's GPT-4.

closed up MacOS

https://github.com/apple-oss-distributions/distribution-macO...

  curl https://alx.sh | sh
https://asahilinux.org/

I prefer the "utility" of BSDs, but that's just a preference.

talldayo
0 replies
5h36m

Asahi is Fisher-Price tiers of support compared to what even Nvidia, the most loathed OEM on Linux, provides for free to their users. If that's the best option available, it should be no wonder that server customers are avoiding Apple like the plague. Apple has to beg their audience to reverse-engineer their own OpenCL drivers if they want them; Nvidia ships them alongside CUDA. These two companies are not the same.

Have you ever seen the inside of a datacenter? Why is it that surprising to you that nobody perks up when you start waxing on about battery life? Even in terms of power-to-performance, Apple's latest chips get ethered by Nvidia's server offerings.

This "Apple for Inference" meme is so dead that I can only feel sad when I see people unironically promoting it. You actually think serious customers are going to load up Asahi (even funnier, MacOS) on their Mac Pro... so they can inference half as fast as a single Blackwell GPU? You think the industry is doing this shit? I don't even think the Steve Jobs apologists are dumb enough to fall for this one, you must be a particularly aspirational shareholder.

daghamm
1 replies
15h14m

"It's not clear to me why Macbooks and Mac Studio Ultras with maxed out RAM aren't selling"

Aren't these machines extremely expensive and generally not upgradable?

Terretta
0 replies
12h55m

They cost less than the 4 - 8 graphics cards and monster desktop PC that would be needed, plus, hey, it's a laptop.

fnbr
4 replies
1d6h

The rule of thumb is roughly 44GB, as most models are trained in bf16 and require 16 bits per parameter, so 2 bytes. You need a bit more for activations, so maybe 50GB?

You need enough RAM and HBM (GPU RAM), so it’s a constraint on both.

sharbloop
2 replies
1d6h

Which GPU card can I buy to run this model? Can it run on a commercial RTX 3090 or does it need a custom GPU?

TechDebtDevin
0 replies
1d5h

Easy..

Havoc
0 replies
1d5h

3090 or 4090 will be able to run quantized 22B models.

Though realistically for code completion smaller models will be better due to speed

Novosell
0 replies
1d6h

Most GPUs still use GDDR I'm pretty sure, not HBM. Do you mean VRAM?

wing-_-nuts
0 replies
1d5h

Wait for a gguf release of this and it will fit neatly into a 3090 with a decent quant. I'm excited for this model and I'll be adding it to my collection.

mauricio
0 replies
1d6h

22B params * 2 bytes (FP16) = 44GB just for the weights. Doesn't include KV cache and other things.

When the model gets quantized to say 4bit ints, it'll be 22B params * 0.5 bytes = 11GB for example.
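
The same arithmetic for a few quantization levels, as a quick sketch (weights only, treating a billion parameters as a decimal GB; KV cache and runtime overhead come on top):

    def weights_gb(params_billions: float, bits_per_param: float) -> float:
        # bytes per parameter = bits / 8; one billion 8-bit parameters ~= 1 GB
        return params_billions * bits_per_param / 8

    for bits in (16, 8, 6, 4):
        print(f"{bits}-bit: {weights_gb(22, bits):.1f} GB")
    # 16-bit: 44.0 GB, 8-bit: 22.0 GB, 6-bit: 16.5 GB, 4-bit: 11.0 GB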

TechDebtDevin
0 replies
1d5h

I'm honestly not sure how to measure the amount of VRAM required for these models, but I suspect this would run relatively fast, depending on your use case, on a mid- to high-end 20- or 30-series card. No idea about Apple unified RAM. I get a lot of performance out of even older cards such as a 1080 Ti, but I haven't tested this model.

mousetree
13 replies
1d6h

How does this compare to Github Copilot? It's not shown in their comparison

ramon156
4 replies
1d6h

Knowing the training data GH has I doubt it's comparable, then again I don't have the benchmarks

ssgodderidge
2 replies
1d6h

Are you saying GH has more than Codestral and therefore GH has a better model? Or that Codestral would be better because Codestral is not littered with "bad" code?

nkozyra
1 replies
1d6h

Bad code is obviously very subjective, but I would wager that GH places a much higher value on feedback mechanisms like stars, issues, PRs, velocity, etc. Their ubiquity likely allows them to automatically cherry-pick less "bad code."

nicce
0 replies
1d2h

Nothing prevents Mistral from doing the same if they want to. Issues and PRs are public information, exposed by APIs, and not that heavily rate-limited.

ramon156
0 replies
1d6h

After typing this I tried the live chat out and it honestly seems a lot more promising than current GH Copilot, very nice!

Rastonbury
0 replies
1d5h

When it first launched, I too didn't know they had changed the model from the original Codex, which came out around the same time as GPT-3.5.

jasonjmcghee
0 replies
1d5h

Gpt-4 for chat, 3.5 for code

That thread is comparing sidebar chat to inline chat. Doesn't discuss code completions afaict.

esafak
1 replies
19h50m

It is fast all right, but the quality is not there. I asked it to implement OAuth with Stytch and Ktor and it made everything up. I pointed out the correct name for the package and asked if it really knew the SDK, and it apologized and repeated the same made up code after merely changing the name of the package.

cco
0 replies
18h42m

This is actually why we (Stytch) haven't rolled out any of these "chatbots for code".

We have a big list of example questions we get from devs trying us out and we've tested several home grown and third party providers and thus far haven't seen anything good enough that we'd put into production.

Thanks for testing this out for us! I'll cross it off our list :)

localfirst
0 replies
1d2h

It's miles better.

In fact I stopped using expensive GPT-4

Codestral just works; it's quick, the output is accurate, it's kinda scary.

swyx
9 replies
1d6h

I've been noticing that there's a divergence in philosophy between Llama-style LLMs (Mistral are Meta alums, so I'm counting them in there) and OpenAI/GPT-style LLMs when it comes to code.

GPT-3.5+ prioritized code very heavily - there's no CodeGPT, it's just GPT-4, and every version is better than the last.

Whereas the Llama/Mistral models are now shipping the general language model first, then adding CodeLlama/Codestral with additional pretraining (it seems like we don't know how many more tokens went into this one, but CodeLlama was 500B-1T extra tokens of code).

Zuck has mentioned recently that he doesn't see coding ability as important for his use cases, whereas obviously OpenAI is betting heavily on code as a way to improve LLM reasoning for AGI.

memothon
3 replies
1d6h

Zuck has mentioned recently

That's a really surprising thing to hear, where did you see that? The only quote I've seen is this one:

“One hypothesis was that coding isn’t that important because it’s not like a lot of people are going to ask coding questions in WhatsApp,” he says. “It turns out that coding is actually really important structurally for having the LLMs be able to understand the rigor and hierarchical structure of knowledge, and just generally have more of an intuitive sense of logic.”

https://www.theverge.com/2024/1/18/24042354/mark-zuckerberg-...

Davidzheng
0 replies
22h11m

I watched this podcast and I also remember Zuck saying it is important.

imachine1980_
0 replies
1d5h

Makes sense: they want better interaction with users for WhatsApp, Instagram, and Facebook marketers, content creation and moderation, and their AI/AR glasses. In that context I don't see why they should push more effort into LLM coding. It's sad anyways.

tkellogg
0 replies
1d6h

The OpenAI philosophy is that adding modes improves everything. Sure, it’s astronomically expensive, but I tend to think they’re on to something.

nabakin
0 replies
21h12m

there's no CodeGPT, it's just GPT-4

Codex[1] is OpenAI's CodeGPT. It's what powers GitHub Copilot and it is very good but not publicly accessible. Maybe they don't want something else to outcompete Copilot.

[1] https://openai.com/index/openai-codex/

behnamoh
0 replies
1d5h

Zuck

No, if anything he said Meta realized coding abilities make the model overall better, so they focused on those more than before.

Rastonbury
0 replies
1d5h

I thought that was the idea: open-source, small, specific models that most people can run, vs. general-purpose ones that require a massive number of GPUs.

IMTDb
9 replies
1d5h

Is there a way to use this within VSCode like Copilot, meaning having the "shadow code" appear while you code instead of having to go back and forth between the editor and a chat-like interface?

For me, a significant component of the quality of these tools resides on the "client" side: being able to engineer a prompt that will yield accurate code generated by the model. The prompt needs to find and embed the right chunks from the user's current workspace, or even from their entire org's repos. The model is "just" one piece of the puzzle.

pyepye
3 replies
1d5h

Not using Codestral (yet) but check out Continue.dev[1] with Ollama[2] running llama3:latest and starcoder2:3b. It gives you a locally running chat and edit via llama3 and autocomplete via starcoder2.

It's not perfect but it's getting better and better.

[1] https://www.continue.dev/ [2] https://ollama.com/

sa-code
0 replies
1d3h

This doesn't give the "shadow text" that the user specifically mentioned

mijoharas
0 replies
1d1h

Wow... That site (continue.dev) managed to consistently crash my mobile google chrome.

I've had the odd crash now and again, but I can't think of many sites that will reliably make it hard crash. It's almost impressive.

outlore
0 replies
16h57m

There are many extensions that hook up to Ollama: Continue, Twinny, Privy being a few

meiraleal
0 replies
1d5h

I created a simple CLI app that does this in my workspace, which is under source control so after the LLM execution all the changes are highlighted by diff and the LLM also creates a COMMIT_EDITMSG file describing what it changed. Now I don't use chatgpt anymore, only this cli tool.

I've never seen something like this integrated directly into VSCode though (and it isn't my preferred workflow anyway; the command line works better).

jdoss
0 replies
1d5h

I have been using Ollama to run the Llama3 model and I chat with it via Obsidian using https://github.com/logancyang/obsidian-copilot and I hook VSCode into it with https://github.com/ex3ndr/llama-coder

Having the chats in Obsidian lets me save them to reference later in my notes. When I first started using it in VSCode for programming in Python, it felt like a lot of noise. It kept generating a lot of useless recommendations, but recently it has been super helpful.

I think my only gripe is I sometimes forget to turn off my ollama systemd unit and I get some noticeable video lag when playing games on my workstation. I think for my next video card upgrade, I am going to build a new home server that can fit my current NVIDIA RTX 3090 Ti and use that as a dedicated server for running ollama.

jacekm
0 replies
1d5h

The article says that the model is available in Tabnine, a direct competitor to Copilot.

esafak
4 replies
1d5h

Are there any IDE plugins that index your entire code base in order to provide contextual responses AND let you pick between the latest models?

If not, consider it a product idea ;)

elmariachi
1 replies
1d2h

Cody by Sourcegraph allows you to do this. It doesn't have Codestral yet but probably will soon.

jdorfman
0 replies
1d

We are working on it.

saturatedfat
0 replies
1d4h

Supermaven, but you don’t get model choice.

pmmucsd
0 replies
1d4h

There are plugins for various IDEs that operate like Copilot but let you select the model you want to use; just supply your key. CodeGPT for JetBrains/Android Studio is pretty good. I think you can even use a model running locally.

sashank_1509
3 replies
1d6h

Seems nice but some preliminary testing against GPT-4o shows it’s lacking a bit. It does a pretty good job for easy questions though

jasonjmcghee
1 replies
1d5h

GPT-4o is really oddly hit or miss for code.

Sometimes it outperforms GPT-4 in quality by a fair amount, and other times it starts repeating itself. Duplicating function definitions, even misremembering what things are named.

It seems to have to do with length. If the output exceeds a few thousand tokens, it seems to experience some pretty bad failure modes.

afro88
0 replies
1d3h

4o can only output 4k tokens. So the training to complete an answer within 4k tokens is probably kicking in and nerfing the quality

localfirst
0 replies
1d2h

Personally this has performed consistently and just as well as, if not better than, GPT-4.

What strikes me is the consistency and the lack of the hallucination you got in GPT-4o, which made it unusable for any reliable code gen.

sebzim4500
1 replies
1d6h

Very impressed with it based on a short live chat, feels insanely fast considering its capability.

chat.mistral.ai

kergonath
0 replies
1d4h

We'll see how fast it is on consumer hardware once decent quantisations are available.

isoprophlex
1 replies
1d5h

Does it do SQL, and if so, which dialects? I am having a hard time figuring out what it is actually trained on

sebzim4500
0 replies
1d4h

They claim good results in a SQL benchmark but they don't specify what dialects it knows.

isaacrolandson
1 replies
1d2h

Will this run on an M3 48GB?

piskov
0 replies
1d1h

You’ll need 44GB just for the weights

By default only 75% of unified memory is available to the GPU if you have >36GB. So with 48GB total, only 36GB is available for the GPU, which is lower than 44.

tl;dr: without quantization you will not be able to run it.

bloopernova
1 replies
1d6h

Does anyone know of a link to a codegen comparison page? In other words, you write your request, and it's submitted to multiple codegen engines, so you can compare the output.

Sytten
1 replies
1d

Is there a VSCode extension that can plug in any model out there and give a similar experience to Copilot? I always want to try them but I can't be bothered to do a whole setup each time.

lioeters
0 replies
7h35m

Haven't used it personally, but someone up-thread mentioned:

Llama Coder is a better and self-hosted Github Copilot replacement for VS Code

https://github.com/ex3ndr/llama-coder

And:

Open-source VS Code and JetBrains extensions that enable you to easily create your own modular AI software development system

https://github.com/continuedev/continue

mirekrusin
0 replies
1d2h

Fifty shades of "open".

jstummbillig
0 replies
1d5h

How I interact with new model reports at this point: Open the page, ctrl + f, "gpt-4" and skip the rest.

jhonatan08
0 replies
1d6h

Do we have a list of the 80+ languages it was trained on? I couldn't find it

hallman76
0 replies
3h42m

dumb question - why not create smaller language-specific models?

gsuuon
0 replies
1d

How does the Mistral non-production license work for small/hobby/indie projects? Has anyone tried to get approval for that kind of use?

gavin_gee
0 replies
1d

what the heck is this for, if you can't use it for commercial work?

ein0p
0 replies
22h57m

If I can’t use the output of this in practical code completion use cases, it’s meaningless, because GH Copilot exists. Idk what they’re thinking or what business model they’re envisioning - Copilot is far and away the best model of this kind anyway

croes
0 replies
1d5h

Usage Limitation

- You shall only use the Mistral Models and Derivatives (whether or not created by Mistral AI) for testing, research, Personal, or evaluation purposes in Non-Production Environments;

- Subject to the foregoing, You shall not supply the Mistral Models, Derivatives, or Outputs in the course of a commercial activity, whether in return for payment or free of charge, in any medium or form, including but not limited to through a hosted or managed service (e.g. SaaS, cloud instances, etc.), or behind a software layer

asadm
0 replies
1d4h

How do people do infilling these days? In olden times models used to provide a way to provide suffix separately.

artninja1988
0 replies
1d4h

This is a business model I can get behind. The model is under a non-commercial license, but it's open weights and they have their official API for it

YetAnotherNick
0 replies
1d4h

What's the business model for semi-open-source models like these? Is it just because they can't be fully closed, as they then have to compare with OpenAI? Who would pay for these models if something better is available for cheaper from Anthropic or Google?

James_K
0 replies
1d

Democratising code

Did yall see what happened when they democratised art? I don't want to have a billion and one AI garbage libraries to sift through before I can find something reliable and human-made. At least the potential for creating horrific political software is slightly lower than with simple images.