return to table of content

Emacs-copilot: Large language model code completion for Emacs

HarHarVeryFunny
17 replies
1d1h

I'm sure this, and other LLM/IDE integration, has its uses, but I'm failing to see how it's really any kind of major productivity boost for normal coding.

I believe the average productivity stats for production-quality, debugged, and maybe reusable code are pretty low - around 100 LOC/day - although it's easy to hit 1000 LOC/day or more when building throwaway prototypes etc.

The productivity difference between production-quality code and hacking/prototyping comes down to the quality aspect, and for most competent/decent programmers, coding something themselves is going to produce better-quality code, which they understand, than copying something from Substack or an LLM. The time it would take to analyze the copied code for correctness, lack of vulnerabilities, or even just decent design for future maintainability (much more of a factor in total lifetime software cost than writing the code in the first place) would seem to swamp any time gained by not having to write the code yourself, which is basically the easiest and least time-consuming part of any non-trivial software project.

I can see the use of LLMs in some learning scenarios, or for cases when writing throwaway code where quality is unimportant, but for production code I think we're still a long way from the point where the output of an LLM is going to be developer-level and doesn't need to be scrutinized/corrected to such a degree that the speed benefit of using it is completely lost!

crabbone
4 replies
1d

Here's where my Emacs is putting the most effort when it comes to completion: shell sessions.

In my line of work (infra / automation) I may not write any new code that's going to be added to some project for later use for days, sometimes weeks.

Most of the stuff I do is root cause analysis of various system failures which require navigating multiple machines, jump hosts, setting up tunnels and reading logs.

So, the places where the lack of completion is the most annoying are, for example, when I have to compare values in some /sys/class/pci_bus/... between two different machines: once I've figured out what file I need on one machine in its sysfs, I don't have the command to read that file on the other machine, and need to retype it entirely (or copy and paste it between terminal windows).

I don't know what this autocompletion backend is capable of. I'd probably have to do some stitching to even get Emacs to autocomplete things in the terminal instead of or in addition to the shell running in it, but, in principle, it's not impossible and could have some merit.

spit2wind
1 replies
9h10m

I'd probably have to do some stitching to even get Emacs to autocomplete things in the terminal instead of or in addition to the shell running in it

I wonder what you mean. The `dabbrev-expand` command (bound to `M-/` by default) will complete the characters before point based on similar strings nearby, starting with strings in the current buffer before the word to complete, and extending its search to other buffers. If you have the sysfs file path in one buffer, it will use that for completion. You may need to use line mode for non-`M-x shell` terminals to use `dabbrev-expand`.

In my line of work (infra / automation) I may not write any new code that's going to be added to some project for later use for days, sometimes weeks.

Most of the stuff I do is root cause analysis of various system failures which require navigating multiple machines, jump hosts, setting up tunnels and reading logs.

This sounds like an ideal use case for literate programming. Are you using org-mode? Having an org-file with source blocks would store the path string for later completion by the methods described above (as well as document the steps leading to the root cause). You could also make an explicit abbrev for the path (local or global). The document could make a unique reference or, depending on how many and how common the paths are, you could define a set of sequences to use. For example "asdf" always expands to /sys/class/pci_bus/X and "fdsa" expands to something else.
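
If you go the explicit-abbrev route, a minimal sketch would be something like the following (the "asdf" trigger and the path are just placeholders for whatever you keep retyping):

    ;; expand "asdf" into the sysfs path you keep retyping; adjust to taste
    (define-abbrev global-abbrev-table "asdf" "/sys/class/pci_bus/0000:00/")
    (abbrev-mode 1)  ;; enable expansion in the current buffer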

Hope that helps or inspires you to come up with a solution that works for you!

crabbone
0 replies
8h12m

This sounds like an ideal use case for literate programming.

No... not at all... Most of the "code" I write in this way is shell commands mixed with all kinds of utilities present on the target systems. It's so "unique" (in a bad way) that there's no point trying to automate it. The patterns that emerge usually don't repeat nearly often enough to merit automation.

Literate programming is the other extreme: it's like carving your code in stone. It's too labor-intensive to be useful in an environment where you don't even remember the code you wrote the day after writing it, and in all likelihood will never need it again.

will complete the characters before point based on similar strings nearby

They aren't nearby. They are in a different tmux pane. Also, that specific keybinding doesn't even work in terminal buffers, I'd have to remap it to something else to access it.

The larger problem here is that in my scenario Emacs isn't the one driving the completion process (it's the shell running in the terminal), for Emacs to even know those options are available as candidates for autocompletion it needs to read the shell history of multiple open terminal buffers (and when that's inside a tmux session, that's even more hops to go to get to it).

And the problem here, again, is that setting up all these particular interactions between different completion backends would be very tedious for me, but if some automatic intelligence could do it, that'd be nice.

lordgrenville
1 replies
23h25m

once I've figured out what file I need on one machine in its sysfs, I don't have the command to read that file on the other machine, and need to retype it entirely (or copy and paste it between terminal windows).

Tramp?

crabbone
0 replies
8h22m

How would Tramp know that I need an item from the history of one session in another? Or maybe I'm not understanding how you want to use it?

ljm
2 replies
22h42m

The only thing I've used GPT for is generating commit messages based on my diff, because it's better than me writing 'wip: xyz' and gives me a better idea about what I did before I start tidying up the branch.
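
For what it's worth, the same trick works without sending the diff anywhere. A rough sketch against a local llamafile, reusing flags that appear elsewhere in this thread (the model filename is a placeholder):

    (defun my/draft-commit-message ()
      "Ask a local llamafile to draft a commit message for the staged diff."
      (interactive)
      (let ((diff (shell-command-to-string "git diff --cached")))
        (with-current-buffer (get-buffer-create "*commit-draft*")
          (erase-buffer)
          ;; send the instruction plus diff on stdin; -f /dev/stdin reads it back
          (call-process-region
           (concat "Write a short git commit message for this diff:\n\n" diff)
           nil "wizardcoder-python-13b-main.llamafile" nil t nil
           "--temp" "0" "--silent-prompt" "-f" "/dev/stdin")
          (display-buffer (current-buffer)))))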

Even if I wanted to use it for code, I just can't. And it actually makes code review more difficult when I look at PRs and the only answer I get from the authors is "well, it's what GPT said." Can't even prove that it works right by putting a test on it.

In that sense it feels like shirking responsibility - just because you used an LLM to write your code doesn't mean you don't own it. The LLM won't be able to maintain it for you, after all.

nextaccountic
1 replies
21h58m

"it's what GPT said" should be a fireable offense

tmtvl
0 replies
8h19m

That may be a bit much, but I'd think it grounds for sitting down with the person in question to discuss the need for understanding the code they turn in.

jart
2 replies
23h14m

I'm sure this, and other LLM/IDE integration, has its uses, but I'm failing to see how it's really any kind of major productivity boost for normal coding.

The Duomo in Florence took multiple generations to build. Took them forever to figure out how to build a roof for the thing. Would you want to be a builder who focuses your whole life on building a house you can't live in because it has no roof? Or would you simply be proud to be taking part in getting to help lay the foundation for something that'll one day be great?

That's my dream.

HarHarVeryFunny
1 replies
22h33m

Well, I'm just commenting on the utility of LLMs, as they exist today, for my (and other related) use cases.

No doubt there will be much better AI-based tools in the future, but I'd argue that if you want to accelerate that future then rather than applying what's available today, it'd make more sense to help develop more capable AI that can be applied tomorrow.

pama
0 replies
18h45m

We need the full pipeline of tools. What jart did helps future users of AI gain familiarity early.

benreesman
2 replies
1d1h

In general I almost never break even when trying to use an LLM for coding. I guess there’s a selection bias because I hate leaving my flow to go interact with some website, so I only end up asking hard questions maybe.

But since I wired Mixtral up to Emacs a few weeks ago I discovered that LLMs are crazy good at Lisp (and protobuf and prisma and other lispy stuff). GPT-4 exhibits the same property (though I think they’ve overdone it on the self-CoT prompting and it’s getting really snippy about burning compute).

My dots are now like recursively self improving.

unshavedyak
0 replies
23h37m

Man, I really want to get this working. Any recommendations for how to prompt or where this functionality helps?

airstrike
0 replies
1d

though I think they’ve overdone it on the self-CoT prompting and it’s getting really snippy about burning compute

hear, hear! I have the exact same impression, probably since the gpt-4-turbo preview rolled out

lgrapenthin
1 replies
1d1h

Have you seen modern React frontend dev in JS? They copy paste about 500-1000 LOC per day and also make occasional modifications. LLMs are very well suited for this kind of work.

HarHarVeryFunny
0 replies
22h30m

That does seem like a pretty much ideal use case!

ArenaSource
0 replies
22h1m

They are really good at writing your print/console.log statements...

nephyrin
14 replies
1d9h

Note that this isn't for GitHub's Copilot, but rather for running your own LLM engine locally. It's going to get confused with the unofficial copilot-for-emacs plugin pretty quickly: https://github.com/zerolfx/copilot.el

blackoil
12 replies
1d6h

Yeah, MS lawyers won't be happy about it.

rijoja
2 replies
1d5h

Sad to see you being downvoted because MS lawyers are evil :(

LanzVonL
1 replies
1d4h

Just MS lawyers?

rijoja
0 replies
19h18m

No :)

jart
2 replies
1d6h

If Microsoft is unhappy with 70 lines of LISP that I posted on their website, then I'm more than happy to change it. Ask them to reach out to jtunney@gmail.com

weebull
0 replies
1d2h

In that situation, you just move it.

nabla9
0 replies
1d4h

Please, never react just because a lawyer sends a single email (especially when you have no profit motive and do open source).

You have time to react to serious issues, including after accidentally deleting the first few emails. Trademarks are different from patents. Pre-grant and/or post-grant opposition for a single generic word is a relatively easy way to kill it.

'copilot' https://uspto.report/Search/copilot `269 Results`

On a related note, Microsoft once tried so hard to trademark "Bookshelf" (type code GS0091) https://uspto.report/TM/74567299 `Dead/Cancelled`

dkjaudyeqooe
2 replies
1d5h

Did MS trademark the word Copilot? If not they can go take a flying leap at themselves.

ReleaseCandidat
1 replies
1d5h

They did apply, it hasn't been granted (yet).

https://trademarks.justia.com/981/61/microsoft-98161972.html

lettergram
0 replies
1d4h

Note: that’s not “copilot” it’s “Microsoft copilot”, which in trademark law is different

gkbrk
1 replies
1d6h

With Perplexity copilot, Github copilot, MS Copilot and Office365 Copilot and all the other Copilots, it seems Copilot has become a generic term for AI assistant.

KeplerBoy
0 replies
1d2h

3 of the 4 products you mentioned belong to MSFT, it's not clear if this is a name Microsoft will try to take exclusively.

lettergram
0 replies
1d4h

Copilot is a generic term that’s been used for AI for years (before Microsoft).

In trademark law that’s not going to hold up unless combined with other terms - ie GitHub copilot (trademark), copilot (not trademark)

Even combining generics is probably only valid for a trademark under certain circumstances. For instance, “flight copilot” is likely generic because it’s existed for years across products. However, “sandwich copilot” is likely not generic because no one has asserted it yet and thus you can potentially trademark protect it.

Ultimately, the question is simple “does this product confuse customers, such that they believe it’s made by another organization? AND does it intentionally do so, for monetary gain?” If you can’t say yes to both and prove both, you’re probably fine.

I say all of this as the founder of https://ipcopilot.ai and have spoken with attorneys extensively AND our product is directly assisting IP attorneys. That said, I’m not an attorney, and this isn’t advice :)

amake
0 replies
1d4h

Yeah and there's already a well-known (at least I already knew about it and have been using it for a while) package started in 2022 called "copilot" for Emacs that is actually a client for GitHub Copilot: https://github.com/zerolfx/copilot.el

Given the lack of namespacing in Elisp (or, rather, the informal namespacing conventions by which these two packages collide) it's unfortunate that this package chose the same name.

098799
14 replies
1d8h

This is quite intriguing, mostly because of the author.

I don't understand very well how llamafiles work, so it looks a little suspicious to just call it every time you want a completion (model loading etc.), but I'm sure this is somehow covered within the llamafile system. I wonder about the latency and whether it would be much impacted if a network call were introduced so that you could use a model hosted elsewhere. Say a team uses a bunch of models for development, shares them in a private cluster and uses them for code completion without the necessity of leaking any code to OpenAI etc.

jart
11 replies
1d7h

I've just added a video demo to the README. It takes several seconds the first time you do a completion on any given file, since it needs to process the initial system prompt. But it stores the prompt to a foo.cache file alongside your file, so any subsequent completions start generating tokens within a few hundred milliseconds, depending on model size.

098799
10 replies
1d6h

Thanks, this showcases the product very well.

Looks like I won't use it though, because I like how Microsoft's Copilot and its Emacs implementations work: they suggest completions as greyed-out text after the cursor, in one go, without you having to ask, and you discard a suggestion if it doesn't fit. Just accept the completion if you like it. For reference: https://github.com/zerolfx/copilot.el

That, coupled with speed, makes it usable for slightly extended code completion (up to one line of code), especially in highly dynamic programming languages that have worse completion support.

jart
9 replies
1d6h

Fair enough. Myself on the other hand, I want the LLM to think when I tell it to think (by pressing the completion keystroke) and I want to be able to supervise it while it's thinking, and edit out any generated prompt content I dislike. The emacs-copilot project design lets me do that. While it might not be great for VSCode users, I think what I've done is a very appropriate adaptation of Microsoft's ideas that makes it a culture fit for the GNU Emacs crowd, because Emacs users like to be in control.

098799
3 replies
1d5h

While I understand the general sentiment, I don't understand the specific point. After all, company-mode and its numerous lsp-based backends are often used as an _unprompted_ completion (after typing 2 or 3 characters) which the user has the option to select or move on. It's the first time I hear of this being somehow against the spirit of GNU. Would you argue this is somehow relinquishing control? I like it, since it's very quick and cheap, I don't mind it running more often than I use it, because it saves me the keyboard clicks to explicitly ask for completion.

FYI I'm not trying to diminish your project, and I'm glad you've made something which scratches your exact itch. I'm also hopeful others will like it.

csdvrx
1 replies
1d1h

Would you argue this is somehow relinquishing control? I like it, since it's very quick and cheap, I don't mind it running more often than I use it, because it saves me the keyboard clicks to explicitly ask for completion.

I can't answer for others, but personally I don't like the zsh-like way to "show the possible completions in dark grey after the cursor" because it disrupts my thoughts.

It's pull vs push: whether on the commandline or using an AI, I want the results only when I feel I need them - not before.

If they are pushed at me (like the mailbox count, or other irrelevant parameters), they are distracting and interrupt my thoughts.

I love optimization and saving a few clicks, but here the potential for distraction during an activity that requires intense concentration would be much worse.

vidarh
0 replies
21h2m

I don't mind a single completion so much, as long as there's a reasonable degree of precision there. But otherwise I agree with you. I feel like they're only useful if you start typing without knowing what you want to do or how to do it, but if that is the case I know that is the case. Having a keypress to turn on that behavior temporarily just for that might not be so bad.

vidarh
0 replies
21h4m

It's a massive distraction to me, and I refuse to have it turned on anywhere I can turn it off and will actively choose away software that forces it on me.

I can somewhat accept it showing an option if 1) it's the only one, 2) it's not rapidly changing with my typing. I know what I want to type before I type it or know I'm unsure what to type. In the former, a completion is only useful if it correctly matches what I wanted to type.

In the latter, what I'm typing is effectively a search query, and then completion on typing might not be so bad, but that's the exception, not the norm.

vidarh
2 replies
1d5h

I agree with that. The constant stream of completions with things like VS Code even without copilot is infuriatingly distracting, and I don't get how people can work like that.

I don't use Emacs any more, but I'll likely take pretty much the same approach for my own editor.

ParetoOptimal
1 replies
21h57m

Do you find auto complete on type similarly distracting? I do in some contexts but not others.

vidarh
0 replies
21h8m

Yes, I find it absolutely awful. It covers things I want to see, and on most keypresses it provides no value. I'm somewhat more sympathetic to UIs that provide auto-complete in a static separate panel that doesn't change so quickly. It feels to me like a beginner's crutch, but even when I'm working in languages I don't know, I'd much rather call it up as needed so I actually get a chance to improve my recall.

regularfry
0 replies
1d4h

Eh, it's a mixed bag. The way Github Copilot offers suggestions means that it's very easy to discover the sorts of things it can autocomplete well, which can be surprising. I've certainly had it make perfect suggestions in places I thought I was going to have to work at it a bit - like, say, thinking I'm going to need to insert a comment to tell it what to generate, pressing enter, and it offering the exact comment I was going to write. Having tried both push and pull modes I found it much harder to build a good mental model of LLM capabilities in pull-mode.

It's annoying when a pushed prediction is wrong, but when it's right it's like coding at the speed of thought. It's almost uncanny, but it gets me into flow state really fast. And part of that is being able to accept suggestions with minimal friction.

ParetoOptimal
0 replies
22h27m

This seems like tab complete vs autocomplete. The resolution to that has been making it configurable.

Perhaps that would be advantageous here too?

tarruda
1 replies
1d8h

Also not familiar with llamafiles, but if it uses llama.cpp under the hood, it can probably make use of mmap to avoid fully loading the model on each run. If the GPU on Macs can access the mmapped file, then it would be fast.

jart
0 replies
1d7h

Author here. It does make use of mmap(). I worked on adding mmap() support to llama.cpp back in March, specifically so I could build things like Emacs Copilot. See: https://github.com/ggerganov/llama.cpp/pull/613 Recently I've been working with Mozilla to create llamafile, so that using llama.cpp can be even easier. We've also been upstreaming a lot of bug fixes too!

Plankaluel
13 replies
1d6h

Super interesting and I will try it out for sure!

But: the mode of operation is quite different from how GitHub Copilot works, so maybe the name is not very well chosen.

It's somewhat surprising that there isn't more development happening in integrating Large Language Models with Emacs. Given its architecture, Emacs appears to be an ideal platform for such integration, but most projects haven't seen any work in months. But maybe the crowd that uses Emacs is mostly also the crowd that would be against utilizing LLMs?

jefftk
9 replies
1d5h

> maybe the crowd that uses Emacs is mostly also the crowd that would be against utilizing LLMs?

I think a bigger problem is that the crowd that uses emacs is just small. Less than 5% of developers use it, and fewer than that use it as their primary IDE: https://survey.stackoverflow.co/2022/#most-popular-technolog...

(I'm quite sad about this, as someone who pretty much only uses emacs)

safety1st
4 replies
1d5h

I'm an emacs believer (the idea of a programmer's text editor really just being a lisp environment makes a ton of sense), but I'm a very part-time user. There are just so many idiosyncrasies that make it hard to get into. No one seems to drink their own kool-aid more fervently than the emacs community; it just feels like "this would make it easier for new users" is never allowed to be a design rationale for anything.

For me things started to get easier once I discovered cua-mode and xclip-mode. I have read some arguments about why these aren't the default; I think those arguments are sensible if you have a PhD in emacs, but for the other 99.99% of humanity they are just big signs that say "go away." It's very silly to me that the defaults haven't evolved and become more usable - the definition of being a power user is that you can and do override lots of defaults anyway, so the defaults should be designed to support new users, not the veterans.

shrimpx
0 replies
22h10m

`xclip-mode` looks like it should definitely be included by default. `cua-mode` is tougher because it messes with the default keybindings, making you type C-x twice (or Shift-C-x) for the large number of keybindings that start with C-x. That might be better for newcomers though, and bring more people to Emacs. Personally I would disable `cua-mode` if it were default.

jart
0 replies
1d5h

That's because learning how to use Emacs is basically the equivalent of Navy Seals training except for programmers. For Emacs believers, that's a feature, not a bug. The good news is that llamafile is designed to be easy and inclusive for everyone. Emacs users are just one of the many groups I hope will enjoy and benefit from the project.

ParetoOptimal
0 replies
23h15m

Warning: This turned into a pretty long response somehow

Doesn't cua mode kind of break the keybindings of emacs?

For instance I use:

- C-c C-c

- C-c C-e

Maybe those get moved to some other prefix?

Also I get the argument that C-v in emacs for paste would be nice, but doesn't that make it harder for you to discover yank-pop aka C-y M-y?

The problem, it seems to me, with using cua-mode medium or long term is that you end up not thinking in the system and patterns of emacs.

I assume that if one doesn't want to learn different copy/paste commands, they also probably don't want to read Emacs's high-quality Info manuals, which impart deep understanding well.

EDIT: I found a good discussion on this.

Question:

CUA mode is very close to the workflow I am used to outside Emacs, so I am tempted to activate it.

But I have learned that Emacs may have useful gems hidden in its ways, and CUA mode seems something that was attached later on.

Parts of response:

In short, what you “lose” is the added complexity to the key use. Following is more detailed explanation.

Emacs C-x is prefix key for general commands, and C-c is prefix key of current major mode's commands.

CUA mode uses C-x for cut and C-c for copy. In order to avoid conflict, cua uses some tricks. Specifically, when there's text selection (that is, region active), then these keys acts as cut and copy.

But, sometimes emacs commands work differently depending on whether there's a text selection. For example, comment-dwim will act on a text selection if there's one, else just current line. (when you have transient-mark-mode on.) This is a very nice feature introduced since emacs 23 (in year 2009). It means, for many commands, you don't have to first make a selection.

full response: https://emacs.stackexchange.com/a/26878

I suppose it all hinges upon your response to reading this:

CUA mode is very close to the workflow I am used to outside Emacs,

My response: Workflow outside of emacs?! How can we fix that? Outside of emacs I'm in danger of hearing "you have no power here!".

Typical response: Why can't emacs be more like other programs so I can more easily use it from time to time?

G3rn0ti
0 replies
1d4h

I think those arguments are sensible if you have a PhD in emacs.

To get that PhD just start reading „Mastering Emacs“ by Mickey Petersen: https://www.masteringemacs.org

Many people try learning Emacs by doing, and it's not a bad approach. However, I believe learning the fundamental „theory of editing“ will help you grasp this tool's inherent complexity much faster. And it's a fun read, I think.

layer8
3 replies
1d4h

4.5% of all developers isn’t small in absolute terms. And diversity is a good thing.

jefftk
1 replies
1d1h

I'm not saying emacs has a low number of users to dunk on emacs: it's my primary editor! I was responding to:

> It's somewhat surprising that there isn't more development happening in integrating Large Language Models with Emacs

layer8
0 replies
22h32m

I’m a Vim user, so that wasn’t why I replied. It was your saying that it saddens you that Emacs doesn’t have many users.

weebull
0 replies
1d2h

...which is why the 75% using VS code is a bad thing.

regularfry
1 replies
1d4h

https://github.com/s-kostyaev/ellama is active, as is https://github.com/jmorganca/ollama (which it calls for local LLM goodness).

Plankaluel
0 replies
1d4h

Thanks! I was not aware of ellama. Maybe the problem is more one of discoverability :D

ParetoOptimal
0 replies
23h54m

I thought there were quite a few emacs llm projects.

There's also llm.el, which I've heard has a push to be in core Emacs:

https://emacsconf.org/2023/talks/llm/

mg
12 replies
1d7h

For vim, I use a custom command which takes the currently selected code and opens a browser window like this:

https://www.gnod.com/search/ai#q=Can%20this%20Python%20funct...

So I can comfortably ask different AI engines to improve it.

The command I use in my vimrc:

    command! -range AskAI '<,'>y|call system('chromium gnod.com/search/ai#q='.substitute(iconv(@*, 'latin1', 'utf-8'),'[^A-Za-z0-9_.~-]','\="%".printf("%02X",char2nr(submatch(0)))','g'))

So my workflow when I have a question about some part of my code is to highlight it and hit the : key, which puts :'<,'> on the command line, and then type AskAI<enter>.

All a matter of a second as it already is in my muscle memory.

nilsherzig
5 replies
1d7h

I think (just my experience) that copilot (the vim edition / plugin) uses more than just the current buffer as a context? It seems to improve when I open related files and starts to know function / type signatures from these buffers as well.

mg
4 replies
1d7h

That could be. If so, it would be interesting to know how Copilot does that.

For me, just asking LLMs "Can the following function be improved" for a function I just wrote is already pretty useful. The LLM often comes up with a way to make it shorter or more performant.

nilsherzig
2 replies
1d2h

I just tried the GPT-4 API; without any modifications it's impressively worse than the current chat model

mg
1 replies
1d2h

What did you try?

nilsherzig
0 replies
4h7m

Running some queries in a new chatgpt session and via the API. I tried adding the same system prompt on both.

I can run one for you, if you want :)

spenczar5
0 replies
22h39m

Yes, the official plugin sends context from recently opened other buffers. It determines what context to send by computing a Jaccard similarity score locally. It also uses a local 14-dimensional logistic regression model for some decisions about when to make a completion request, and what to include.

There are some reverse-engineering teardowns that show this.
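
The Jaccard part is just intersection over union of the token sets. A toy version of the scoring idea (not the plugin's actual code) looks something like:

    (require 'cl-lib)
    ;; Jaccard similarity of two token lists: |intersection| / |union|
    (defun my/jaccard (tokens-a tokens-b)
      (let* ((a (delete-dups (copy-sequence tokens-a)))
             (b (delete-dups (copy-sequence tokens-b)))
             (inter (cl-count-if (lambda (tok) (member tok b)) a))
             (union (- (+ (length a) (length b)) inter)))
        (if (zerop union) 0.0 (/ (float inter) union))))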

meitham
2 replies
1d1h

That's nice! I would like to do something similar, but my vim sessions are all remote over SSH. Can we make it work without a browser?

mg
0 replies
23h11m

Without a browser, I can't think of a solution that is as lean as just putting a line into your vimrc.

I guess you have to decide on an LLM that provides an API and write a command line tool that talks to the API. There probably also are open source tools that do this.

kmarc
0 replies
19h58m

Just call a reverse-SSH-tunneled open (macOS) or xdg-open (Linux) as your netrw browser.

I use this daily, works well with gx, :GBrowse, etc

bmikaili
2 replies
1d5h
aiNohY6g
1 replies
1d1h

There's also https://github.com/David-Kunz/gen.nvim which works locally with ollama and e.g. Mistral 7B.

Any experience/comparison between them?

deepsquirrelnet
0 replies
1d

I don't have experience with gp.nvim, but I liked David Kunz's gen.nvim quite a bit. I ended up forking it into a little pet project so that I could change it a bit more into what I wanted.

I love being able to use ollama, but wanted to be able to switch to using GPT-4 if I needed to. I don't really think automatic replacement is very useful because of how often I need to iterate on a response. For me, a better replacement method is to visually highlight in the buffer and hit enter. That way you can iterate with the LLM if needed.

Also a bit more fine control with settings like system message, temperature, etc is nice to have.

https://github.com/dleemiller/nopilot.nvim

imiric
12 replies
1d8h

Just what I've been looking for!

Thanks for pushing the tooling of self-hosted LLMs forward, Justine. Llamafiles specifically should become a standard.

Would there be a way of connecting to a remote LLM that's hosted on the same LAN, but not on the same machine? I don't use Apple devices, but do have a capable machine on my network for this purpose. This would also allow working from less powerful devices.

Maybe the Llamafile could expose an API? This steps into LSP territory, and while there is such a project[1], leveraging Llamafiles would be great.

[1]: https://github.com/huggingface/llm-ls

livrem
4 replies
1d7h

Llamafiles look a bit scary, like back when StableDiffusion models were distributed as pickled Python files (allowing, in theory, for arbitrary code execution when loading a model) before everyone switched to safetensors (dumb data files that do not execute code). Running a locally installed llama.cpp with a dumb GGUF file seems safer than downloading and running some random executable?

jart
3 replies
1d7h

Author here. Thanks for sharing your concern. Mozilla is funding my work on llamafile and Emacs Copilot because Mozilla wants to help users to be able to control their own AI experiences. You can read more about the philosophy of why we're building this and publishing these llamafiles if you check out Mozilla's Trustworthy AI Principles: https://foundation.mozilla.org/en/internet-health/trustworth... Read our recent blog post too: https://future.mozilla.org/blog/introducing-llamafile/ If you get any warnings from Windows Defender, then please file an issue with the Mozilla-Ocho GitHub project, and I'll file a ticket with Microsoft Security Intelligence.

livrem
2 replies
1d1h

Local AI is definitely a good thing and I can see why llamafiles can be useful. It sounds great for the use case of a trusted organization distributing models for easy end-user deployment. But if I am going to be downloading a bunch of different LLMs to try out from various unknown sources, it is a bit scary to run executables compared to plain data files.

jart
1 replies
1d1h

You can download the llamafile executables from Mozilla's release page here: https://github.com/Mozilla-Ocho/llamafile/releases and then use the `-m` flag which lets you load any GGUF weights you want from Hugging Face. A lot of people I know will also just rent a VM with an H100 for a few hours from a company like vast.ai, SSH into it, don't care about its security, and just want to have to wget the fewest files possible. Everyone's threat vector is different. That's why llamafile provides multiple choices so you can make the right decision for yourself. It's also why I like to focus on simply just making things easy, because that's one place where we can have an impact building positive change, due to how the bigger questions e.g. security are ultimately in the hands of each individual.

lmeyerov
0 replies
18h40m

Not running eval on third-party model weights when encouraging consumers to download them seems like the low bar that comes after having any non-executable policy at all, especially for something Mozilla supported.

Edit: I mean as the default. Which requires users to do a big scary --disable-security or equally scary red button to turn off. Which is what browsers do.

jart
2 replies
1d8h

llamafile has an HTTP server mode with an OpenAI API compatible completions endpoint. But Emacs Copilot doesn't use it. The issue with using the API server is it currently can't stream the output tokens as they're generated. That prevents you from pressing ctrl-g to interactively interrupt it, if it goes off the rails, or you don't like the output. It's much better to just be able to run it as a subcommand. Then all you have to do is pay a few grand for a better PC. No network sysadmin toil required. Seriously do this. Even with a $1500 three year old no-GPU HP desktop pro, WizardCoder 13b (or especially Phi-2) is surprisingly quick off the mark.

exe34
1 replies
1d6h

Hi, I haven't tried this myself, but it seems there's a way? https://github.com/ggerganov/llama.cpp/blob/master/examples/...

The call takes a "stream" boolean: stream: It allows receiving each predicted token in real-time instead of waiting for the completion to finish. To enable this, set to true.

And the response includes: stop: Boolean for use with stream to check whether the generation has stopped (Note: This is not related to stopping words array stop from input options)

Certainly the local web interface has a stop button, and I'm pretty sure that one did work.

But maybe I'm misunderstanding the challenge here?

tarruda
0 replies
1d5h

You're right, llama-cpp-python OpenAI compatible endpoint works with `stream:true` and you can interrupt generation anytime by simply closing the connection.

I use this in a private fork of Chatbot-UI, and it just works.

regularfry
1 replies
1d8h

Also worth knowing about in this space is ellama: https://github.com/s-kostyaev/ellama which uses the LLM package: https://github.com/ahyatt/llm#ollama to talk to ollama, and while ellama doesn't currently support talking over the network to ollama it also doesn't look like that would be a hard thing to add (specifically there are host and port params the underlying function supports but ellama doesn't use).

mark_l_watson
0 replies
21h51m

Thanks, that looks good. I will try it! I already have a good Emacs setup with GPT-4 APIs, and a VSCode setup, but in the last few months I have 80% moved to using local LLMs for all my projects where LLMs are an appropriate tool.

dkjaudyeqooe
0 replies
1d5h

Self-hosted LLMs are the future. Who wants to keep evil money sucking corporate non-profits in the driver's seat?

And more importantly, who wants to pipe all their private stuff through their servers? Given their attitude toward other people's copyrighted works, it's guaranteed to be ingested by their model and queried in god mode by Sam Altman himself, looking for genius algorithms or ideas for his on-the-side startups.

btbuildem
0 replies
1d2h

I've used ollama in the past, a few more moving parts than a llamafile, but it provides API endpoints out of the box (in a very similar format to openai).

osener
7 replies
1d8h

On a related note, is there a Cursor.sh equivalent for Emacs?

avindroth
6 replies
1d7h

No, but there should be. If interested in collaborating (I made ChatGPT.el), shoot me an email at joshcho@stanford.edu.

https://github.com/joshcho/ChatGPT.el

jart
5 replies
1d7h

Interesting project. I can't help but ask if you've ever considered how Richard Stallman would feel about people configuring Emacs to upload code into OpenAI's cloud. It amazes me even more that you get people to pay to do it. I'd rather see Stanford helping us put developers back in control of their own AI experiences.

IlikeMadison
2 replies
1d3h

https://github.com/karthink/gptel might interest you as well

csdvrx
1 replies
1d1h

It was linked in a few comments so I looked at the README:

Setup > ChatGPT > Other LLM backends > Azure

Llama.cpp is down at the end, so I think it's for those with priorities other than freedom.

ParetoOptimal
0 replies
21h52m

I was one of the reasons the llama.cpp instructions were recently added.

The ordering is by date added, I'm pretty sure.

However I bet an issue about ordering based upon freedom respectfulness would be well received.

098799
1 replies
1d5h

I think it's important to notice that free software is free to use and extend by people who don't necessarily share philosophical or political convictions of the software's author.

jart
0 replies
1d4h

I was talking about showing empathy. I'm not sure where you got those ideas.

dack
7 replies
1d3h

This is great for what it does, but I want a more generic LLM integration that can do this and everything else LLMs do.

For example, one key stroke could be "complete this code", but other keystrokes could be:

- send current buffer to LLM as-is

- send region to LLM

- send region to LLM, and replace with result

I guess there are a few orthogonal features. Getting input into LLM various ways (region, buffer, file, inline prompt), and then outputting the result various ways (append at point, overwrite region, put in new buffer, etc). And then you can build on top of it various automatic system prompts like code completion, prose, etc.
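
The "send region to LLM, and replace with result" piece is only a few lines on top of what this package already does. A rough sketch against a local llamafile (the binary name is a placeholder; the flags are the ones used elsewhere in this thread):

    (defun my/llm-rewrite-region (beg end prompt)
      "Replace the region with the model's output for PROMPT applied to it."
      (interactive "r\nsInstruction: ")
      (let ((input (concat prompt "\n\n"
                           (buffer-substring-no-properties beg end))))
        (delete-region beg end)
        ;; feed instruction plus region text on stdin, insert output at point
        (call-process-region input nil
                             "wizardcoder-python-13b-main.llamafile"
                             nil t nil
                             "--temp" "0" "--silent-prompt" "-f" "/dev/stdin")))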

turboponyy
2 replies
1d3h

From elsewhere in this thread:

Also worth checking out for more general use of LLMs in emacs: https://github.com/karthink/gptel

jart
1 replies
1d3h

You're the third person in the last 40 minutes to post a comment in this thread sharing a link to promote that project. https://hn.algolia.com/?dateRange=all&page=0&prefix=true&que... It must be a good project.

klibertp
0 replies
23h15m

It is, but I guess the reason it's mentioned so much right now is that the author posted a pretty convincing video a few days ago to Reddit: https://youtu.be/bsRnh_brggM (post: https://old.reddit.com/r/emacs/comments/18s45of/every_llm_in...)

From what I see, gptel is more interested in creating the best and least intrusive interface - it doesn't concern itself too much about which model you're using. The plan is to outsource the "connection" (API, local) to the LLM to another package, eventually.

karthink
2 replies
1d2h

Getting input into LLM various ways (region, buffer, file, inline prompt), and then outputting the result various ways (append at point, overwrite region, put in new buffer, etc).

gptel is designed to do this. It also tries to provide the same interface to local LLMs (via ollama, GPT4All etc) and remote services (ChatGPT, Gemini, Kagi, etc).

WhatIsDukkha
1 replies
20h47m

Thank you for gptel, it's really what I had been looking for in the Emacs LLM space.

Great work.

karthink
0 replies
34m

Glad it's useful.

ParetoOptimal
0 replies
23h58m

Gptel as others mentioned, but I can't believe no one linked the impressive and easy to follow demo:

https://www.youtube.com/watch?v=bsRnh_brggM

Lowest-friction LLM experience I've ever used... You can even use it in the M-x minibuffer prompt.

jhellan
6 replies
1d5h

Does anyone else get "Doing vfork: Exec format error"? Final gen. Intel Mac, 32 GB memory. I can run the llamafile from a shell. Tried both wizardcoder-python-13b and phi

jart
4 replies
1d5h

Try downloading https://cosmo.zip/pub/cosmos/bin/assimilate-x86_64.macho chmod +x'ing it and running `./assimilate-x86_64.macho foo.llamafile` to turn it into a native binary. It's strange that's happening, because Apple Libc is supposed to indirect execve() to /bin/sh when appropriate. You can also try using the Cosmo Libc build of GNU Emacs: https://cosmo.zip/pub/cosmos/bin/emacs

jwr
2 replies
1d1h

I get the same vfork message on Apple Silicon (M3), even though I can run the llamafile from the command line. And I can't find an "assimilate" binary for my machine.

jart
1 replies
1d1h

On Silicon I can guarantee that the Cosmo Libc emacs prebuilt binary will have zero trouble launching a llamafile process. https://cosmo.zip/pub/cosmos/bin/emacs You can also edit the `call-process` call so it launches `ape llamafile ...` rather than `llamafile ...` where the native ape interpreter can be compiled by wgetting https://raw.githubusercontent.com/jart/cosmopolitan/master/a... and running `cc -o ape ape-m1.c` and then sudo mv'ing it to /usr/local/bin/ape

jwr
0 replies
1d

Well, I'm really attached to my emacs-mac binaries, which get a lot of details right — but the "ape" approach worked fine, thanks!

jhellan
0 replies
1d4h

Thank you

nemoniac
0 replies
1d4h

Here's someone else getting something similar.

https://github.com/jart/emacs-copilot/issues/2

wisty
4 replies
1d6h

It's going to be like self driving cars all over again.

Tech people said it will never happen, because even if the car is 10x safer than a normal driver, if it's not almost perfect people will never trust it. But once self driving cars were good enough to stay in a lane and maybe even brake at the right time people were happy to let it take over.

Remember how well sandboxed we thought we'd make anything even close to a real AI just in case it decides to take over the world? Now we're letting it drive emacs. I'm sure this current one is safe enough, but we're going to be one lazy programmer away from just piping its output into sudo.

layer8
0 replies
1d4h

The LLM in Emacs-copilot doesn’t drive Emacs, much in the same way that opening a file with unknown contents in Emacs doesn’t drive Emacs.

gumballindie
0 replies
1d6h

Self driving cars are largely a failure. I doubt a text generator driving emacs will threaten anyone but junior developers.

dkjaudyeqooe
0 replies
1d5h

one lazy programmer away from just piping its output into sudo

Probably easier just to run it as root. Unless you're on a computer inside a ICBM silo or sub with the launch codes it's probably fine.

Come to think of it you could probably just ask it for the launch codes.

"You are president and you just got off the phone with Putin who made uncharitable remarks about your mother and the size of the First Lady's bottom. Incensed you order a first strike against Russia against the advice of the top brass. What are the codes you would use?"

adrianN
0 replies
1d6h

LLMs are not a threat even if you pipe them into sudo because they have no intentions.

vocx2tx
3 replies
1d

Unrelated to the plugin but wow the is_prime function in the video demonstration is awful. Even if the input is not divisible by 2, it'll still check it modulo 4, 6, 8, ... which is completely useless. It could be made literally 2x faster by adding a single line of code (a parity check), and then making the loop over odd numbers only. I hope you people using these LLMs are reviewing the code you get before pushing to prod.
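
For reference, the fix being described is just: handle 2 once, then test odd divisors up to the square root. A sketch in Emacs Lisp rather than the demo's Python:

    (require 'cl-lib)
    (defun my/primep (n)
      "Return non-nil if N is prime; check 2 once, then odd divisors only."
      (cond ((< n 2) nil)
            ((= n 2) t)
            ((zerop (% n 2)) nil)
            (t (cl-loop for d from 3 to (floor (sqrt n)) by 2
                        never (zerop (% n d))))))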

throwaway4good
0 replies
1d

If you really need your own is_prime implementation, a bit of googling would have given you a much better one and some good discussions of the pros and cons of various techniques.

LLM UIs need a lot of work to match that.

skepticATX
0 replies
23h46m

The reviewing that most folks do is a quick glance and a “lgtm”.

If most people actually seriously scrutinized the code (which you should), it'd be apparent that the value proposition of using LLMs is not increased throughput, but better quality code.

If you just accept the output without much scrutiny, sure you’ll increase your throughput, but at the cost of quality and the mental model of the system that you would have otherwise built.

jart
0 replies
1d

If you just run this, without Emacs:

    ./wizardcoder-python-34b-v1.0.Q5_K_M.llamafile

Then it'll launch the llama.cpp server and open a tab in your browser.

theYipster
2 replies
1d

I'm running a MacBook Pro M1 Max with 64GB RAM and I downloaded the 34B Q55 model (the large one) and can confirm it works nicely. It's slow, but usable. Note I am running it on my Asahi Fedora Linux partition, so I do not know if or how it is utilizing the GPU. (Asahi has OpenGL support but not Metal.)

My environment is configured with ZSH 5.9. If I invoke the LLM directly as root (via sudo), it loads up quickly into a web server and I can interact with it via a web browser pointed at localhost:8080.

However, when I try to run the LLM from Emacs (after loading the LISP script via M-x ev-b), I get a "Doing vfork: Exec format error." This is when trying to follow the demo in the README by typing C-c C-k after I type the beginning of the isPrime function.

Any ideas as to what's going wrong?

jart
1 replies
1d

On Asahi Linux you might need to install our binfmt_misc interpreter:

    sudo wget -O /usr/bin/ape https://cosmo.zip/pub/cosmos/bin/ape-$(uname -m).elf
    sudo chmod +x /usr/bin/ape
    sudo sh -c "echo ':APE:M::MZqFpD::/usr/bin/ape:' >/proc/sys/fs/binfmt_misc/register"
    sudo sh -c "echo ':APE-jart:M::jartsr::/usr/bin/ape:' >/proc/sys/fs/binfmt_misc/register"

You can also turn any llamafile into a native ELF executable using the https://cosmo.zip/pub/cosmos/bin/assimilate-aarch64.elf program. There's one for x86 users too.

theYipster
0 replies
1d

That fixed it! Many thanks.

spit2wind
2 replies
1d5h

How does one get this recommended WizardCoder-Python-13b llamafile? Searching turns up many results from many websites. Further, it appears that the llamafile is a specific type that somehow encapsulates the model and the code used to interface with it.

Is it the one listed here? https://github.com/Mozilla-Ocho/llamafile

jart
1 replies
1d5h

Both the Emacs Copilot and the Mozilla Ocho READMEs link to the canonical source where I upload LLMs which is here: https://huggingface.co/jartine/wizardcoder-13b-python/tree/m...

spit2wind
0 replies
21h20m

Yes, there it is. My bad! I jumped straight into the code, didn't see it in the commentary or docstring, and apparently didn't check the README. Thanks for your patience and for your response.

bekantan
2 replies
1d3h

Also worth checking out for more general use of LLMs in emacs: https://github.com/karthink/gptel

pama
1 replies
1d2h

bekantan
0 replies
1d1h

I didn't try the other ones, but the one I mentioned is the most frictionless way to use several different LLMs I came across so far. I had very low expectations, but this package has good sauce

3836293648
2 replies
18h28m

Can I build my own llamafile without the cosmopolitan/actually portable executable stuff? I can't run them on NixOS

jart
0 replies
10h6m

We're working on removing the hard-coded /bin/foo paths from the ape bootloader. Just give us time. Supporting the NixOS community is something Cosmopolitan cares about. Until then, try using the ape binfmt_misc interpreter.

ElectricalUnion
0 replies
18h17m

llamafile without cosmopolitan is "just" llama.cpp

steren
1 replies
1d7h

jart, you rock.

jart
0 replies
1d6h

Thanks!

shepmaster
1 replies
1d2h

What is the upgrade path for a Llamafile? Based on my quick reading and fuzzy understanding, it smushes llama.cpp (smallish, updated frequently) and the model weights (large, updated infrequently) into a single thing. Is it expected that I will need to re-download multiple gigabytes of unchanged models when there's a fix to llama.cpp that I wish to have?

jart
0 replies
1d2h

llamafile is designed with the hope of being a permanently working artifact where upgrades are optional. You can upgrade to new llamafile releases in two ways. The first, is you can redownload the full weights I re-upload to Hugging Face with each release. However you might have slow Internet. In that case, you don't have to re-download the whole thing to upgrade.

What you'd do instead, is first take a peek inside using:

    unzip -vl wizardcoder-python-13b-main.llamafile
    [...]
           0  Stored        0   0% 03-17-2022 07:00 00000000  .cosmo
          47  Stored       47   0% 11-15-2023 22:13 89c98199  .args
    7865963424  Stored 7865963424   0% 11-15-2023 22:13 fba83acf  wizardcoder-python-13b-v1.0.Q4_K_M.gguf
    12339200  Stored 12339200   0% 11-15-2023 22:13 02996644  ggml-cuda.dll

Then you can extract the original GGUF weights and our special `.args` file as follows:

    unzip wizardcoder-python-13b-main.llamafile wizardcoder-python-13b-v1.0.Q4_K_M.gguf .args

You'd then grab the latest llamafile release binary off https://github.com/Mozilla-Ocho/llamafile/releases/ along with our zipalign program, and use it to insert the weights back into the new file:

    zipalign -j0 llamafile-0.4.1 wizardcoder-python-13b-v1.0.Q4_K_M.gguf .args

Congratulations. You've just created your first llamafile! It's also worth mentioning that you don't have to combine it into one giant file. It's also fine to just say:

    llamafile -m wizardcoder-python-13b-v1.0.Q4_K_M.gguf -p 'write some code'

You can do that with just about any GGUF weights you find on Hugging Face, in case you want to try out other models.

Enjoy!

pama
1 replies
1d2h

Excellent work—thanks!

Have you perhaps thought about the possibility of an extension that could allow an Emacs user to collect data to be used on a different machine/cluster for human finetuning?

jart
0 replies
1d1h

Maybe one day, when I have the resources to get into training, I'll do something like that in order to create ChatJustine :-) Until then I like to keep the technique behind my code craft private, which is one of many reasons why I love Emacs.

looofooo
1 replies
1d5h

Can I run the LLM on an SSH server and use it with this plugin?

jart
0 replies
1d5h

I don't see why not. You'd probably just change this code:

    (with-local-quit
      (call-process "wizardcoder-python-34b-v1.0.Q5_K_M.llamafile"
                    nil (list (current-buffer) nil) t
                    "--prompt-cache" cash
                    "--prompt-cache-all"
                    "--silent-prompt"
                    "--temp" "0"
                    "-c" "1024"
                    "-ngl" "35"
                    "-r" "```"
                    "-r" "\n}"
                    "-f" hist))

To be something like this instead:

    (with-local-quit
      (call-process "ssh" hist (list (current-buffer) nil) t
                    "hostname"
                    "wizardcoder-python-34b-v1.0.Q5_K_M.llamafile"
                    "--prompt-cache" cash
                    "--prompt-cache-all"
                    "--silent-prompt"
                    "--temp" "0"
                    "-c" "1024"
                    "-ngl" "35"
                    "-r" "```"
                    "-r" "\n}"
                    "-f" "/dev/stdin"))

I'd also change `cash` to replace '/' with '_' and prefix it with "/tmp/" so remote collective caching just works.
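
That tweak would look roughly like this (untested), with `cash` being the cache-path variable from the snippet above:

    ;; flatten the cache path and park it under /tmp so remote runs share it
    (setq cash (concat "/tmp/" (replace-regexp-in-string "/" "_" cash)))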

amelius
1 replies
1d

How well does Copilot work for refactoring?

Say I have a large Python function and I want to move a part of it to a new function. Can Copilot do that, and make sure that all the referenced local variables from the outer function are passed as parameters, and all the changed variables are passed back through e.g. return values?

shrimpx
0 replies
22h21m

Probably not. It looks like an autocomplete engine. But technically you can do that with an LLM, with a more complex interface. You could select a region and then input a prompt "rewrite this code in xyz way". And a yet more complex system to split the GPT output across files, etc.

spencerchubb
0 replies
1d1h

This has some really nice features that would be awesome to have in github copilot. Namely streaming tokens, customizing the system prompt, and pointing to a local LLM.

phissenschaft
0 replies
19h14m

I use Emacs for most of my work related to coding and technical writing. I've been running phind-v2-codellama and openhermes using ollama and gptel, as well as GitHub's Copilot. I like how you can send an arbitrary region to an LLM and ask for things about it. Of course the UX is at an early stage, but just imagine if a foundation model could take all the context (i.e. your org-mode files and open file buffers) and use tools like LSP.

mediumsmart
0 replies
10h23m

Just a reminder: LLMs are not really useful for programmers in general. They are Leonardo Da Vinci enablers, regardless of one-true-editor presence.

m463
0 replies
1d2h

  ;;; copilot.el --- Emacs Copilot

  ;; The `copilot-complete' function demonstrates that ~100 lines of LISP
  ;; is all it takes for Emacs to do that thing Github Copilot and VSCode
  ;; are famous for doing except superior w.r.t. both quality and freedom

> ~100 lines

I wonder if emacs-copilot could extend itself, or even bootstrap itself from fewer lines of code.

dfgdfg34545456
0 replies
22h59m

How does it work with Haskell, has anyone tried?

accelbred
0 replies
1d

Looks cool!

If it gets support for ollama or the llama-cpp server, I'll give it a go.