With all these AI tools requiring a prompt, do they really simplify/speed things up? From the example: I have to write "add a name param to the 'greeting' function, add all types", then wait for the result to be generated, read it carefully to be sure that it does what I want, and probably reiterate if the result doesn't match the expectation. This seems more time-consuming to me than actually doing the work myself. Does anyone have examples where prompting and double-checking is faster than doing it on your own? Is it faster when exploring new solutions and "unknown territory", and in that case, are the answers accurate (from what I've tried so far they were far off)? In that case, how do you compare it with "regular search" via Google/Bing/...? Sorry for the silly question but I'm genuinely trying to understand.
People interested in Aider (which is an awesome tool) might also be interested in checking out my project Plandex[1]. It's terminal-based like Aider and has a somewhat comparable set of features, but is more focused on using LLMs to work on larger and more complex tasks that span many files and model responses. It also uses a git-style CLI approach with independent commands for each action vs. Aider's interactive shell.
I studied Aider's code and prompts quite a bit in the early stages of building Plandex. I'm grateful to Paul for building it and making it open source.
This looked cool and I was excited to try it until I realized that I either need a subscription, or I need to set up a server. Why does this need a server, when Aider just works via the cli?
First I should note that while cloud will have a subscription eventually, it's free for now. There's an anonymous trial (with no email required) for up to 10 plans or 10 model responses, and then just a name and email are required to continue.
I did start out with just the CLI running locally, but it reached a point where I needed a database and thus a client-server model. Plandex is designed for working on many 'plans' at different levels of the project hierarchy (some users on cloud have 50+ after using it for a week), and there's also a fair amount of concurrency, so it got to be too much for a local filesystem or even something like a local SQLite db.
Plandex also has the ability to send tasks to the background, which I think will play a more and more important role as models get better and more capable of running autonomously for longer periods. I also want to add sharing and collaboration features in the future, so all-in-all I thought a client-server model was the best base to build from.
I understand where you're coming from though. That local-only simplicity is definitely a nice aspect of Aider.
I had a second look and the server doesn't look too hard to deploy. I like that there's reasoning behind requiring it, although I suspect SQLite is more than capable of handling this very easily.
I'm trying to deploy the server right now so I can try Plandex, it would be easier if I hadn't forgotten my Postgres password...
As a tip, self-hosting would be much easier (which may be something you don't want to encourage) if you provided a plain Docker image: then it would just be "pull the Docker image, specify the local directory, specify the DB URL, done".
By the way, why does it need a local directory if it has a database? What's stored in the directory?
Agreed on providing a docker image. I made an issue to track it here: https://github.com/plandex-ai/plandex/issues/78
I do want to make self-hosting as easy as possible. In my experience, there will still be enough folks who prefer cloud to make it work :)
There's a local .plandex directory in the project which just stores the project id, and a $HOME/.plandex-home directory that stores some local metadata on each project--so far just the current plan and current branch.
I see, thanks for the explanation! If you're only storing a bit of data, removing the requirement for a local directory would make deployment easier; these could just go into the database.
Oh sorry, my comment was referring to the local files created by the CLI. The server uses the file system much more heavily in order to enable efficient version control with an embedded git repo for each plan. Everything in a plan that's version-controlled (context, the conversation, model settings, and tentative file updates) is stored in this repo instead of the database.
Ah, that makes sense, thank you.
Can we get someone to automate stuff like copying files, renaming stuff, setting env variables, and any common tasks done in an OS?
You could do this with Plandex (or Aider... or ChatGPT) by having it output a shell script, then `chmod +x` it and run it. I experimented early on with doing script execution like this in Plandex, but decided to just focus on writing and updating files, as it seemed questionable whether execution could be made reliable enough to be worthwhile without significant model advances. That said, I'd like to revisit it eventually. Some more constrained tasks like copying and moving files around are likely doable without full-on shell script execution, though some scary failure cases are possible here if the model gets the paths wrong in a really bad way.
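To make the "output a script, then run it" idea concrete, here's a rough Python sketch of that workflow (my own illustration, not how Plandex or Aider actually implement anything): write the model's script to disk, make it executable, and only run it after the user confirms.

```python
import os
import stat
import subprocess

def run_generated_script(script_text: str, path: str = "llm_task.sh") -> None:
    """Write an LLM-generated shell script to disk, show it, and run it only after confirmation."""
    with open(path, "w") as f:
        f.write(script_text)
    # The chmod +x step: add the executable bit to whatever permissions the file already has.
    os.chmod(path, os.stat(path).st_mode | stat.S_IEXEC)

    print(script_text)
    if input("Run this script? [y/N] ").strip().lower() == "y":
        subprocess.run([f"./{path}"], check=True)

# Example usage with a trivial, hand-written "generated" script:
run_generated_script("#!/bin/sh\necho 'hello from the generated script'\n")
```

The confirmation prompt is the important part; without it, a script with a bad path is exactly the scary failure case described above.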
OpenInterpreter is another project you could check out that is more focused on code/script execution: https://github.com/OpenInterpreter/open-interpreter
I feel like what I am saying should be natively supported.
If you're worried about changes getting it wrong, just show a prompt with all the batched changes.
me > build my jar, move it to the last folder I copied it to, and run it.
LLM > built jar xyz.jar, moving jar to x/y/z
me > yes.
me > redo last command.
Provide rollback/log for these features if need be.
I really don't think you even need an LLM for this. I feel like I could do it with a simple classifier. It just needs to be hooked into the OS, so that it can scan what you were doing and replicate it.
For example if I keep opening up folder x and dropping a file called build.jar to folder y, a program should be able to easily understand "copy the new jar over"
I imagine at some point this is going to be done at the OS level.
It's a great concept and I agree it will definitely exist at some point, but working a lot with GPT-4 has made me viscerally aware of how many different ways something like "build my jar, move it to the last folder I copied it to, and run it" can be spectacularly misinterpreted, and how much context is needed for that command to have any hope of being understood. The other big issue is that there is no rollback for a `rm` or `mv` command that screws up your system.
I had similar ideas when I started on Plandex. I wanted it to be able to install dependencies when needed, move files around, etc., but I quickly realized that there's just so much the model needs to know about the system and its state to even have a chance of getting it right. That's not to say it's impossible. It's just a really hard problem and I'd guess the first projects/products to nail it will either come from the OS vendors themselves, or else from people focusing very specifically on that challenge.
You're right, there is a lot of ambiguity there. I think being able to scan user actions helps a ton with this though, because you know exactly the steps the user took. Most of the time I want this is when I literally have to repeat the same set of actions 5+ times and writing a script to do it isn't worth it. I want to be able to just save/train the model and have it do what I want. Today I literally built a jar 50 times, each time having to open up two folders and copy files between the same two directories. Massively annoying.
There is still some ambiguity there because cases might slightly differ, you're right.
For rm/mv: mv is easily reversible, no? You just need to store some context. Same with rm, just copy it to a temp directory. But again, with a confirmation prompt it's a non-issue either way.
Also maybe we need a slightly different kind of LLM, which instead of just assuming its top predictions are correct, gives you actions at critical steps on how to proceed.
me > build a jar.
LLM > I can build a jar with x, y, z, which do you want?
Open Interpreter can do that.
I _cannot_ wait for you to get local models working with this (I know, they need function calling/streaming first). It's amazing! I burned through $10 like it was nothing, and bigger context + local is going to make this killer IMHO. It needs additional guidance, and with more context maybe loading lint rules into the context would get back code matching my coding style/guide, but even as-is there is a ton of value here.
It was able to rewrite (partially, some didn't get fully done) 10 files before I hit my budget limits from Vue 2 Class Component syntax to Vue 3 Composition API. It would have needed another iteration or so to iron out the issues (plus some manual clean up/checking from me) but that's within spitting distance of being worth it. For now I'll use ChatGPT/Claude (which I pay for) to do this work but I will keep a close eye on this project, it's super cool!
Thanks for trying it and your feedback. I'm keeping tabs on open source/local models and will include them as soon as it's feasible.
I hear you on the API costs. You should see my OpenAI bills from building Plandex :-/
You should see my OpenAI bills from building Plandex :-/
Sorry if you have answered this before, but can you estimate how many man hours were saved using OpenAI or was the high usage more test related?
I have used Plandex a lot to help build Plandex faster, but yeah the high API costs are much more due to testing, where I need to run large tasks over and over in rapid succession in order to debug problems or iterate on the built-in prompts.
If you're thinking it is expensive, wait until you start to play with Claude Opus. Sooner or later I will declare bankruptcy.
Nice product BTW. I really liked the UI, it's very polished.
Do you have any plans to build IDE plugins for this? I understand it's open source and anyone could add that, I was just wondering if that was even on the roadmap? Having this run in my IDE would just be so awesome with the diff tool I'm used to, and all the other plugins/hotkeys/etc. I use.
Yes, VSCode and JetBrains plugins are on the roadmap. Here's the current roadmap by the way: https://github.com/plandex-ai/plandex#roadmap-%EF%B8%8F (it's not exhaustive, but can give you a sense of where I'd like to take Plandex in the future).
And I completely missed that somehow... My apologies. Thank you for pointing that out.
No worries, it's pretty far down in the readme :)
Can you describe how you read in a repo any better than aider?
Aider has a few blog posts speaking to it.
I haven't yet tried incorporating tree-sitter as Aider does to load in all definitions in the repo. In Plandex, the idea is more to load in just the files that are relevant to what you're building before giving a prompt. You can also load directory layouts (with file names only) with `plandex load some-dir --tree`.
I like the idea of something like `plandex load some-dir --defs` to load definitions with tree-sitter. I don't think I'd load the whole repo's defs by default like Aider does (I believe?), because that could potentially use a lot of tokens in a large repo and include a lot of irrelevant definitions. One of Plandex's goals is to give the user granular control over what's in context.
But for now if you wanted to do something where definitions across the whole repo would be helpful (vs. loading in specific files or directories) then Aider is better at that.
But... it was said that Plandex is similar to, or worthy of consideration next to, something like Aider, when this post is about Aider.
Understanding a codebase, along with the ins and outs between the calls, is pretty vital to any codebase, especially as a codebase gets larger.
I'm not attached to the way Aider or Plandex does anything, but I'm still not clear on which scenarios it's worth considering compared to Aider, or vice versa. Aider seems pretty unique and stands alone on a number of things. I'll still install Plandex and try it out.
Without details, it's a little surprising a post like this could get upvoted so much.
Plandex isn't really focused on understanding a whole codebase. It can be used for that to some extent, but it's more designed for building larger features where you'd load in maybe 5-20 relevant files and then have Plandex build the whole feature across potentially dozens of steps and model calls. With Aider (or ChatGPT) it would require a lot more back-and-forth and user interaction to get a similar result.
Like I said, I think Aider's use of tree-sitter is a great concept and something I'd like to incorporate in some way. I'm not at all trying to claim that Plandex is 'better' than Aider for every use case. I think they are suited to different kinds of tasks.
How do you situate changes in a file? That seems like the hard part to me, since the LLM can't necessarily count to output a patch with line numbers.
Doesn't the software just give the LLM the line numbers?
It does use line numbers, which definitely aren't infallible. That's why a `plandex changes` TUI is included to review changes before applying. Unfortunately no one has figured out a file update strategy yet that doesn't make occasional mistakes--probably we'll need either next-gen models or fine-tuning to get there.
That said, counting isn't necessarily required to use line numbers. If line numbers are included in the file when it's sent to the model, it becomes a text analysis task rather than a counting task. Here are the relevant prompts: https://github.com/plandex-ai/plandex/blob/main/app/server/m...
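For intuition, here's a minimal sketch of the general idea (my own illustration, not Plandex's actual prompt format): prefix each line with its number before sending the file, so the model reads positions off the text rather than counting them.

```python
def with_line_numbers(source: str) -> str:
    """Prefix each line with its 1-based line number so the model can
    reference positions by reading them instead of counting."""
    return "\n".join(
        f"{i}: {line}" for i, line in enumerate(source.splitlines(), start=1)
    )

print(with_line_numbers("def greeting():\n    return 'hi'\n"))
# 1: def greeting():
# 2:     return 'hi'
```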
I apologize if I'm not posting this in the correct place, but I've been trying to test this out (looks like it'll be fantastic, btw) and I keep running into a 429 error saying I've exceeded my current quota for ChatGPT, but I really don't think I have. That leads me to believe that maybe it's not really taking my API key when I run the export command. Is there a way to check, or another reason I could be getting this error?
You'd be getting a different error if your key wasn't getting through at all. Can you double check that you're using the right OpenAI API key, have enough API credits (as distinct from ChatGPT quota), and haven't hit any max spend limits? You can check here: https://platform.openai.com/account/api-keys
I'm interested in this and will probably set it up, but I wish more AI tools were better integrated into my IDE. I know GH Copilot is, and other big AI tools have plugins with chat/edit features, but most of the cool open source stuff doesn't seem to support IDEA/JetBrains.
I see the power of LLMs. I use GH Copilot, I use ChatGPT, but I crave deeper integration into my existing toolset. I need to force myself to try in-IDE Copilot Chat. My habit is to go to ChatGPT for anything of that nature and I'm not sure why that is. Sometimes it's the same way I break down my search into things "I know I can find" and then put together the results. In the same way, I break down the problem into small pieces and have ChatGPT write them individually or sometimes additively.
In my experience, Supermaven makes Copilot look like a joke, and they’ve just released a Jetbrains plugin. YMMV. It’s just code suggestions though, no chat box.
Have you compared it to Sourcegraph Cody which also has a Jetbrains plugin? Same monthly cost as Supermaven.
I’ve tried Cody Pro. I canceled my subscription in 3 days. The plugin (for JetBrains at least) gave auth and other errors regularly and when it did work its completions were “meh” at best.
I’ll admit some of that might be from me being used to what I get from GH Copilot but basic stuff like initializing a variable called “count” with `0` or “++”-ing it in a loop were both things it didn’t auto-complete. I switched back to Copilot and it did exactly what I expected.
The polish is lacking with Cody and the errors are completely unacceptable in a paid product. I’ve seen 2 Copilot outages the entire time I’ve been using it (since before GA) so to have Cody barf up stupid errors multiple times in a 3-day period is just ridiculous.
I’ll be trying this today, thank you for the suggestion. 300K context window is awesome.
I actually like completions more, it feels more natural. I’m fine to go to ChatGPT/Opus to chat if needed.
I didn't get time to test it beyond installing it on VSCode today, but take a look at https://GitHub.com/continuedev/continue, Apache 2.0 license, and they have an IDEA/Jetbrains plugin. Plus codebase context, full configurability to use local or remote LLMs.
I probably need to give it another try but I tried that before with my own GPT-4 key, a local model, and their models and just got errors last time I tried it. I hope that was just a temp issue but because of that I moved on. Also I've tried Cody Pro (again, weird errors and when it did work I felt like Copilot would have done better).
I've been using https://cursor.sh/ heavily for about 2 months and I'm pretty happy.
Cursor is a fork of VSCode focused on AI. I'd prefer to use something totally open-source, but Cursor is free, gets regular updates, and I can use my OpenAI API key.
The diff view works well with AI coding assistants. I end up parallelizing more. I let cursor do its thing while I'm already looking at the next file.
I love aider too! Have used it to automate things such as maintaining a translated version of the page in a git pre-commit hook.
Aider works a little differently in that it doesn't just do code completion or focus at the function level. It can solve much bigger problems.
Folks have developed VSCode and NeoVim integrations for aider. They're based on forks of aider, so I'm not sure how carefully their authors are keeping them up to date with aider releases.
The aider install instructions have more info:
https://aider.chat/docs/install.html#add-aider-to-your-edito...
What would be nice is a single plugin that focuses only on UX and allows plug-and-play AI models. I think we would benefit immensely from such a concept.
I’ve used aider to understand new codebases using technologies I don’t know and it did a fantastic job; much faster than grep/find + google.
To be fair, in a world of good LSP impls, grep/find are really primitive tools to be using. Not saying this isn't better than a more sophisticated editor setup, just that grep and find are a _really_ low bar.
When we reach that world, let me know. I'm still tripping over a "python-lsp-server was simply not implemented async so sometimes when you combine it with emacs lsp-mode it eats 100% CPU and locks your console" issue.
Wait, so this is why Emacs has been locking up on me in most of my Python projects??
Possibly. Definitely why it has been locking up on me when I added lsp-mode.
Lsp-mode will schedule one request per keypress but then cancel that request at the next keypress. But since the python LSP server doesn't do async, it handles cancel requests by ignoring them
If emacs hard blocks on LSP requests, that may be on emacs as well. I recommend you try ruff-lsp; although it does not cover everything and is more for linting, it's high quality.
Not sure if that's making things "fair". Grep & find are insanely powerful when you're a CLI power user.
Nonetheless, I'm particularly curious in which cases the AI tool can find things that are not easy to find via find & grep (e.g. finding URLs that are created via string concatenation, those that do not appear as a string literal in the source code).
Perhaps a larger question there, what's the overall false negative rate of a tool like this? Are there places where it is particularly good and/or particularly poor?
edits: brevity & clarity
I evaluate a lot of code, like ten to twenty applications per year currently, and terminal tooling is my go-to. Mostly the basic stuff: tree, ripgrep, find, wc, jq, things like that. I also use them on top of output from static analysis tooling.
It's not as slick as SQL on an RDBMS, but very close, and integrates well into e.g. vim, so I can directly pull in output from the tools and add notes when I'm building up my reports. Finding partial URLs, suspicious strings like API keys, SQL query concatenation and the like is usually trivial.
For me to switch to another toolset there would have to be very strong guarantees that the output is correct, deterministic and the full set of results, since this is the core basis for correctness in my risk assessments and value estimations.
Aider can answer questions I can’t search for via LSP, like “what code would process the following URL” and similar.
I have this 300 line Go application which manages git tags for me. I asked it to implement a -dry-run function. It failed twice. First time it just mangled the file. Second time it just made code that didn't do anything.
I asked it to rename a global variable. It broke the application and failed to understand scoping rules.
Perhaps it is bad luck, or perhaps my Go code is weird, but I don't understand how y'all wanna trust this.
It must be your app/lang/prompt/grandma/dog/... lol. LLMs are the future, and they will replaces Allllllll the coders in the woooorld (TM), and did you know "it" can create websites??? Wooo, let's go, baby!
Nah these things are all stupid as hell. Any back and forth between a human and an LLM in terms of problem solving coding tasks is an absolute disaster.
People here and certainly in the mainstream population see some knowledge and just naturally expect intelligence to go with it. But it doesn't. Wikipedia has knowledge. Books have knowledge. LLMs are just the latest iteration of how humans store knowledge. That's about it, everything else is a hyped up bubble. There's nothing in physics that stops us from creating an artificial, generally intelligent being, but it's NEVER going to be with auto-regressive next-token prediction.
Nah these things are all stupid as hell. Any back and forth between a human and an LLM in terms of problem solving coding tasks is an absolute disaster.
I actually agree in the general case, but for specific applications these tools can be seriously awesome. Case in point - this repo of mine, which I think it's fair to say was 80% written by GPT-4 via Aider.
https://github.com/epiccoleman/scrapio
Now of course this is a very simple project, which is obviously going to have better results. And if you read through the commit history [1], you can see that I had to have a pretty good idea of what had to be done to get useful output from the LLM. There are places where I had to figure out something that the LLM was never going to get on its own, places where I made manual changes because directing the AI to do it would have been more trouble than it was worth, etc.
But to me, the cool thing about this project was that I just wouldn't have bothered to do it if I had to do all the work myself. Realistically I just wanted to download and process a list of like 15 urls, and I don't think the time invested in writing a scraper would have made sense for the level of time I would have saved if I had to figure it all out myself. But because I knew specifically what needed to happen, and was able to provide detailed requirements, I saved a ton of time and labor and wound up with something useful.
I've tried to use these sorts of tools for tasks in bigger and more complicated repos, and I agree that in those cases they really tend to swing and miss more often than not. But if you're smart enough to use it as the tool it is and recognize the limitations, LLM-aided dev can be seriously great.
[1]: https://github.com/epiccoleman/scrapio/commits/master/?befor...
LLMs don't store information, though.
Language is a tool to convey information. LLMs are only about the language, not the information.
Which model? I've used Aider a bunch and for tasks like that GPT-4 has worked pretty well. But 3 would not be able to do it.
That’s why I’m not fully jumping in yet, as I think even GPT 4 is borderline. I’m grateful for those investing their energy into building things like this (and no doubt many will be successful) but I’m happy to remain an interested observer until the next generation when I think the value proposition may be much more evident.
Like another commenter I also use it for initial exploration in uncharted territory. For coding only helpful with autocompleting error strings. Even then, it messes with normal auto-complete, might get rid of it.
Thanks for trying aider, and sorry to hear you had trouble working with it. It might be worth looking through some of the tips on the aider GitHub page [0].
In particular, this is one of the most important tips: Large changes are best performed as a sequence of thoughtful bite sized steps, where you plan out the approach and overall design. Walk GPT through changes like you might with a junior dev. Ask for a refactor to prepare, then ask for the actual change. Spend the time to ask for code quality/structure improvements.
Not sure if this was a factor in your attempts? I'd be happy to help you if you'd like to open a GitHub issue [1] or jump into our discord [2].
[0] https://github.com/paul-gauthier/aider#tips
[1] https://github.com/paul-gauthier/aider/issues/new/choose
In my experience, these things work much better with Python than with anything else
GPT can write and edit code in most popular languages: python, javascript, typescript, html, css, etc.
I love how everyone always leaves PHP off these lists of "popular languages" despite the fact that 80% of the web runs on PHP.
PHP has had enormous staying power despite its idiosyncrasies.
I would however be curious to know what percentage of the 80% (or so) is WordPress et al. Since those largely don't involve folks actually writing code. I suspect a very small amount of PHP code is being run a lot.
PHP is still my go-to for anything web-page based that's not a single page app. It's just a really nice solution for prototyping and making web stuff quickly. I mean, custom stuff, not WordPress.
A lot of my successful projects have been rewritten later in nodejs. But for getting something up and running to test a concept, PHP is great if you're comfortable with its idiosyncrasies.
I'd say Python is just as idiosyncratic, and its packaging system is just too much of a pain point. And Node doesn't ship with mature database interfaces, its dependencies are scary, there's more concern about runaway scripts, crashes are harder to recover from, and a lot of times all you really want from a router is to serve your file structure with some access rules.
I think PHP is still the best choice for prototyping dynamic HTML and logic fast, without any packages or plug-ins. A lotta times I still even use it for short CLI scripts and cron tasks that do database ops.
Languages that can create beginners in programming who can ship early and often are invaluable.
I don’t use very much php, but would be remiss if I left my opinion of it as dated as whisper campaign rumors based on interpretation and preference.
Packages like Laravel and especially technologies like Hotwire are nothing to overlook.
Standardized and capable frameworks that have large workforces can be quite valuable at time of valuation and due diligence. Specialized and brittle techs can be a challenge.
+1
I had a largish website with a few thousand static webpages. Over a period the pages grew to around 100K, with some server-side features. Over the course of 10 years I did and redid this site in multiple technologies: React, Angular, Spring Boot + Freemarker, etc.
However, the PHP-powered version of it remains the best for SEO, has near-zero downtime and no maintenance whatsoever, and runs on a VM shared with about 10 other websites. Traffic served is around 100K visits a day.
I’ve updated the list to include php. Sorry for the oversight!
Aider has a big problem when working with a Python codebase:
1. Its dependencies will conflict with your code's requirements.
2. If you don't install it within the code's environment, you can't use `aider run`, where you can run local commands and pipe their outputs.
3. You will need to carry all of its dependencies even in the prod environment, which can increase the attack surface.
So until they introduce a global binary install, I suggest using Plandex, which is based on Go and can work across any environment within the system.
Thanks for trying aider, and sorry to hear you had dependency conflicts.
You can install aider with pipx to avoid this. There's a FAQ entry that explains how:
https://aider.chat/docs/faq.html#how-to-use-pipx-to-avoid-py...
Also, why would you want to install aider in a production environment? It's a development tool, I wouldn't expect anyone to use it in prod. But maybe there's a use case I'm not thinking of?
Thank you, didn't know about pipx possibility. Will give it a shot.
I don't want aider in the prod environment. I'm saying it's hard to remove it from prod if we can't isolate it from code dependencies, as it's hard to maintain multiple requirements.txt files for different envs.
This is the correct solution, use pipx to install any Python cli and save yourself a ton of hassle.
That is a problem with the Python packaging ecosystem in general, not with aider. @BiteCode_Dev wisely advises beginners to use virtual environments to install anything that uses Python: https://www.bitecode.dev/p/back-to-basics-with-pip-and-venv
I have used this technique for months and it’s great https://x.com/arjie/status/1575201117595926530
I just have copilot in my editor and switch into my editor with C-x C-e for AI completion. I use neovim, like in the example, but you can use whatever you like.
EDIT: Oh never mind. I see what it is now. It’s a terminal based flow for editing code. Mine is for command line writing live.
That’s cool, but is there a way to use another vendor instead of GitHub CoPilot?
There’s the fauxpilot project. This is just the minimal stuff to use the AI completer. But you can sub out faux pilot and get the same result (modulo quality).
Python projects depress me because of the dependency management problem.
No problem, just install it in a virtualized containerized pyenv virtualenv pipx poetry conda dies
Exactly.
How do Python people put up with that? Have none of them tried to run multiple Python projects over time on the same Linux install with commandline access to all of them at once?
If you prefer GUI with more space to write task requirements, or use your existing ChatGPT/Claude subscription without additional API costs, you can check out my desktop app: https://prompt.16x.engineer/
Everyone has one of these; I just made a script to print all the files in the current folder to the terminal. I have .context-ignore files to exclude some patterns, similar to .gitignore. The first file is a README.md that contains my initial description of what I am working on. At the end I type a new command. I copy & paste the text into the ChatGPT web interface.
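For anyone curious, a rough Python sketch of that kind of dump-the-context script (my own approximation of the workflow described above; the .context-ignore matching rules here are assumptions):

```python
import fnmatch
from pathlib import Path

def load_ignore_patterns(path=".context-ignore"):
    """Read glob patterns to exclude, one per line, similar in spirit to .gitignore."""
    p = Path(path)
    if not p.exists():
        return []
    return [line.strip() for line in p.read_text().splitlines()
            if line.strip() and not line.startswith("#")]

def dump_context(root="."):
    """Print README.md first, then every other non-ignored file, ready to paste into a chat."""
    patterns = load_ignore_patterns()
    files = sorted(Path(root).rglob("*"), key=lambda f: (f.name != "README.md", str(f)))
    for f in files:
        if not f.is_file():
            continue
        if any(fnmatch.fnmatch(str(f), pat) for pat in patterns):
            continue
        print(f"--- {f} ---")
        print(f.read_text(errors="replace"))

if __name__ == "__main__":
    dump_context()
```

Piping the output to a clipboard tool makes the copy & paste step a one-liner.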
I think I used it for 10-15 rounds of iteration on my latest project and it generated about 50% of the code of a web app with Python backend. Pretty sweet and costs nothing on top of the web subscription. The funny part is that I was using this AI coding tool to build another AI tool to manage a collection of prompts and demonstrations including automatic prompt evaluation, so I was using an AI tool to make another AI tool.
That's exactly how I started as well, I was using a similar workflow to build an AI-driven game.
Then I thought maybe it's a good idea to turn it into something less ad-hoc, more user-friendly, and work for any project.
So many AI coding agents are popping up that claim to build entire projects, but all I'm interested in (and happy to pay for) is:
- Review GitHub PR and suggest fixes.
- Improve the readability of code with a single command (devs suck at naming variables).
- context aware autocomplete for real
- context aware autocomplete for real
If the AI tool can fetch related classes for the code that I'm working with that would be so helpful!
devs suck at naming variables
This is the #1 problem I've faced with my contractors and I wish I could find a good solution for it
I just tried it and it's amazingly cool, but the quality of the output just isn't there for me yet. It makes too many subtle errors to be as useful as the screenshots and the gifs make it look.
I agree with you. It's okay at really simple code changes for really simple repos, but it falls apart for anything outside the ordinary.
I'm sure I'll have to eat these words, but: This just doesn't feel like the right interface to me. LLMs are incredible at generating "inroads" to a problem, but terrible at execution. Worse yet at anticipating future problems.
All this might very well change. But until it does, I just want my LLMs to help me brainstorm and help me with syntax. I think there's a sweet spot somewhere between this tool and Copilot, but I'm not sure where.
Hi, for somewhere between GitHub Copilot and aider, you can try the desktop app 16x Prompt. I have been using it daily for the past few months and it suits my working style nicely.
It is capable of handling complex tasks like feature development and refactoring across multiple files, but it doesn't try to generate diffs and apply them automatically.
Instead, you get a response from the LLM that is easy to read and allows you as a developer to quickly apply it to your existing codebase.
You can check it out here: https://prompt.16x.engineer/
I’m still waiting for that bastard Devin to write my killer app. Now you want me to code my own killer app with an AI micromanaging me?
exactly. who's training who here? :)
I appreciate @anotherpaulg's continual benchmarking of LLM performance with aider, for example:
OpenAI just released GPT-4 Turbo with Vision and it performs worse on aider’s benchmark suites than all the previous GPT-4 models. In particular, it seems much more prone to “lazy coding” than the GPT-4 Turbo preview models.
+ accompanying HN thread (117 comments) https://news.ycombinator.com/item?id=39985596
It gets commit messages wrong. Commit messages should signal intent, not what the patch does. "changing an enum" is a horrible commit message.
Adopt a convention like commitizen: https://github.com/commitizen/cz-cli
'typeofchange(scopeofchange): reason for change'
It sort of helps force devs to type out more meaningful commit messages.
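For instance, instead of "changing an enum" you'd get something like `fix(orders): add CANCELLED state to the status enum so stale orders stop shipping` (a made-up example in that convention), which records the why alongside the what.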
When working alone remotely from home, I simulate pair programming with a methodology I call "The Stranger". I sit on one of my hands until it becomes numb and tingly, and then it feels like somebody else is typing and moving the mouse!
I use synthetic aperture pair programming. I write some code and get it working. Then I get pulled off it for a few months and come back to it.
Me + 3 months: da fuk.
Just finishing up a responsive website design with Claude, no typing by me, just a natural language conversation as I'm looking at the rendering differences and trying strategies to find a one-size-fits-all without shims. The longer context is amazing, 'return the entire codebase to help us maintain a strong context, I have no hands' works extremely well, and it's handling an 875 line project of mixed html/css/js like a champ. This is as bad as it'll ever be. It makes web design a pleasure again, well, sort of...
Integration with neovim. https://github.com/nekowasabi/aider.vim
Promising. Looks like it can leverage existing requirements if included in the git repository as ReqIF exchange format files. I’m currently not able to verify with an experiment because I’m dealing with huge sets of requirements and I easily reach the limits of API (I guess), need to trim down. Any experience by others?
These tools have the same problem as image or video generative AI does - it can maybe render individual parts accurately, but what is basically autocomplete cannot reason about the bigger picture. You glance at it, and it looks ok, but then you look closer, and it's riddled with issues.
Statistical prediction has its limitations - who knew.
We have an issue in OpenDevin to add Aider as an agent, if anyone wants to take a crack at it:
I revisited Aider a couple of days ago, after going in circles with AutoGPT - which seemed to either forget or go lazy after a few prompts - to the point it refused to do something that it did a few prompts before. Then Aider delivered from the first prompt.
PS. I've gathered a list of LLM agents (for coding and general purpose) https://docs.google.com/spreadsheets/d/1M3cQmuwhpJ4X0jOw5XWT...
Aider is one of my favorite AI agents, especially because it can work with existing codebases. We've seen a lot of good results from folks who used it with Wasp (https://github.com/wasp-lang/wasp) - a full-stack web framework I'm working on.
A "marketingy" demo video: https://www.youtube.com/watch?v=DXunbNBpgZg&ab_channel=Wasp
While it is not recommended, --no-auto-commits will stop aider from git committing each of GPT’s changes.
Why is it recommended to not quickly review the changes (git status, git diff) before committing?
Navie AI + AppMap. I found it while looking for a tool that could read my code and create flow diagrams automatically. AppMap does exactly that: it records "appmaps" while the app runs. It is one of the most useful plugins I have seen in VS Code for those who have to deal with a lot of different codebases, languages, and repos on a daily basis. It is a really useful tool for engineering teams; it allows devs to jump into a new codebase very fast. Probably one of the most powerful tools I have seen running in Visual Studio Code. Their latest version includes an AI assistant that can feed from the appmaps + language models to build documentation, explanations, etc. in seconds.
If you're interested in this sort of stuff, you might like this diff-based CLI tool I wrote:
It runs on Groq (the company I work for), so it's super snappy.
Big fan of Aider.
We are interested in integrating Aider as a tool for Dosu https://dosu.dev/ to help it navigate and modify a codebase on issues like this https://github.com/langchain-ai/langchain/issues/8263#issuec...
This tool is amazing
Aider is the only tool I use for coding now with ChatGPT.
Copilot is pretty good but I like the split context of declaring what you are working on in the CLI.
It still suffers from ChatGPT laziness sometimes, you can see it retrying several times to get a correct output before giving up.
The author was one of the brains behind the early search engine Inktomi (which was a wild success, until Google came along ...)
For the Emacs user, maybe not exactly one-to-one, but useful:
Personally the use for me has been in writing boilerplate. As an example, one of my ongoing goals has been to port all the view code of a project to another framework, following its idioms. Using an LLM, I can process a file in a couple of seconds, and checking that everything is right takes just a few seconds as well. It’d take me hours to go through every file manually, and it’d be prone to human error. It’s not technically challenging stuff, just tedious and mind-numbing, which is perfect for an LLM.
I do agree though, these basic examples do seem quite pointless, if you already know what you’re doing. It’s just as pointless as telling another developer to “add a name param to ‘greeting’ function, add all types”, which you’d then have to review.
I think it comes down to your level of experience though. If you have years and years of experience and have honed your search skills and are perfectly comfortable, then I suspect there isn’t a lot that an LLM is going to do when it comes to writing chunks of code. That’s how I’ve felt about all these “write a chunk of code” tools.
In my case, apart from automating the kind of repetitive, mindless work I mentioned, it’s just been a glorified autocomplete. It works -really- well for that, especially with comments. Oftentimes I find myself adding a little comment that explains what I’m about to do, and then boop, I’ve got the next few lines autocompleted with no surprises.
I had to work without an internet connection a few days ago and it really, really hit me how much I’ve come to use that autocomplete - I barely ever type anything to completion anymore, it was jarring, having to type everything by hand. I didn’t realise how lazy my typing had become.
We live in a world with everything from macro systems and code generation to higher-order functions and types... if you find yourself writing the same "boilerplate" enough times that you find it annoying, just automate it, the same way you can automate anything else we do using software. I have found myself writing very little "boilerplate" in my decades of software development, as I'd rather at the extreme (and it almost never comes to this) throw together a custom compiler than litter my code with a bunch of hopefully-the-same-every-time difficult-to-adjust-later "boilerplate".
I'd say that using LLMs to write boilerplate falls under "automation"
Yes, except it's "bad automation", because as opposed to the automation referred to by GP, boilerplate written by an LLM (or an intern or whomever) is extra code that costs a lot of time to be maintained.
In many ways it's more akin to outsourcing than automating.
And might be wrong.
But, perhaps uniquely amongst all the systems for avoiding boilerplate since Lisp macros were introduced in the 1950s, it will sometimes make stuff up. I don't buy that "a worse way to write boilerplate" is going to revolutionise programming.
I don't have the impression I'm writing too much boilerplate, but I am curious about this as I have heard it multiple times: are there more examples of boilerplate that an LLM is better/faster at generating than a couple of copy/pastes? If it's more than a couple of copy/pastes and it's time for a rewrite, do you leverage AI for this? How do you usually introduce the abstraction?
One example of boilerplate that I've been automating is when you're creating model code for your ORM.
I paste the table definition into a comment, and let the LLM generate the model (if the ORM doesn't automate it), the list of validation rules, custom type casts, whatever specifics your project has. None of it is new or technically challenging, it's just autocompleting stuff I was going to write anyway.
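As a concrete sketch of that pattern (the table and SQLAlchemy model below are made up for illustration, not from any real project):

```python
# -- Pasted table definition as a prompt for the assistant (hypothetical schema) --
# CREATE TABLE users (
#     id         SERIAL PRIMARY KEY,
#     email      VARCHAR(255) NOT NULL UNIQUE,
#     full_name  VARCHAR(255),
#     created_at TIMESTAMP NOT NULL DEFAULT now()
# );

# -- The kind of boilerplate the LLM autocompletes from it --
from sqlalchemy import Column, DateTime, Integer, String, func
from sqlalchemy.orm import declarative_base

Base = declarative_base()

class User(Base):
    __tablename__ = "users"

    id = Column(Integer, primary_key=True)
    email = Column(String(255), nullable=False, unique=True)
    full_name = Column(String(255))
    created_at = Column(DateTime, nullable=False, server_default=func.now())
```

None of it is clever; it's just the mechanical mapping from schema to model that the LLM fills in reliably.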
It's not that you're writing "too much" boilerplate; this is a tiny part of my work as well. This is just the one part where I've actually found an LLM useful. Any time I feel like "yeah this doesn't require thought, just needs doing", I chuck it over to an LLM to do.
I've found this very useful as well. My typical workflow (server-side kotlin + spring) has been:
- create migration files locally, run statements against a containerized local postgres instance
- use a custom data extractor script in IntelliJ's data tool to generate r2dbc DAO files with a commented-out CSV table containing column_name, data_type, kotlin_type, is_nullable as headers
- let the AI assistant handle the rest
Call me a caveman but the lack of an option to use AI tools offline is a massive downside to me. I am connected to the internet most of the time but I take comfort in knowing that, for most of my work, I could lose my connection and not even notice
Try ollama. You can run models locally.
That's just not the reality anymore. You can run a decent open source coding language model on local hardware. Just needs a bit of work and it's not quite as seamless.
Can you explain more how "checking everything is right takes just a few seconds as well"? A code review can't happen in "just a few seconds", so maybe I don't understand what the process you're describing really is.
In the example I gave, it was just porting the same view code from one framework's way of writing view code to another. It's a one-off task involving hundreds of different views.
There's zero technical challenge, almost no logic, super tedious for a human to do, not quite automatable since there could be any kind of code in those views, and it's very very unlikely that the LLM gets it wrong. I give it a quick look over, it looks right, the tests pass, it's not really a big deal.
And one nice thing I did as well was ask it to "move all logic to the top of the file", which makes it -very- easy to clean up all the "quick fix" cruft that's built up over years that needs to be cleaned up or refactored out.
In those cases the file might indeed need more time dedicated to it, but it would've needed it either way.
thanks for the reply! I'll try it for commenting.
Most of the discussions about AI applied to coding end up having someone who states that it's just not worth it (at least the moment) and someone else who then chimes in to say that they mostly use it for "boilerplate" code.
I have trouble understanding the "boilerplate" thing because avoiding writing boilerplate is
1) already a solved "problem" long before AI
2) is it really a "problem"?
The first point:
* If you find yourself writing the same piece of code over and over again in the same codebase, it's an indication that you should abstract it away as a function / class / library.
* IDEs have had snippets / code completion for a long time to save you from writing the same pieces of code.
* Large piece of recycled functionalities are generally abstracted away in libraries of frameworks.
* Things like "writing similar static websites a million times" are the reason why solutions like WordPress exist: to take away the boilerplate part of writing websites. This of course applies to solutions / technologies / services that make "avoid writing boilerplate code" their core business
* The only type of real boilerplate that comes to my mind are things like "start a new React application" but that is a thing you do once per project and it's the reason why boostrappers exist so that you only really have to type "npx create-react-app my-app" once and the boilerplate part is taken care of.
The second point: Some mundane refactoring / translations of pieces of code from one technology to the other can actually be automated by AI (I think it's what you're talking about here, but how often does one really do such tasks?), but... Do you really want to? Automate, it, I mean?
I mean, yes, "let AI do the boring stuff so that I can concentrate on the most interesting parts" makes sense, but it's not something I want to do. Maybe it's because I'm aging, but I don't have it in me to be concentrated on demanding, difficult, tiring tasks 8 hours straight a day. It's not something that I can do, and it's also something that I don't want to do.
I much prefer alternating hard stuff that requires 100% of my attention with lighter tasks that I can do while listening to a podcast, letting off steam in order to rest my brain before going back to a harder task. Honestly I don't think anyone is supposed to be concentrated on demanding stuff all day long, all week long. That's the recipe for burnout.
The LLM output is also extremely prone to error, so it’s not like the second part of your sentence is a valid argument.
how can you check it in a few seconds if it'd take you hours to change it manually?
Does anyone has examples where promoting and double checking is faster than doing it on your own?
I find it is faster in lots of cases where the solution is 'simple' but long and a bit fiddly. As a concrete example from earlier today, I needed a function that took a polygon and returned a list of its internal angles. Could I write it myself? Sure. Did copilot generate the code (and unit tests) for me in a fraction of the time it would have taken me to do it? Absolutely.
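For reference, a minimal hand-written version of that kind of function (my own sketch, not the Copilot output; it assumes the vertices are given in counter-clockwise order):

```python
import math

def internal_angles(polygon):
    """Return the internal angle (in degrees) at each vertex of a simple polygon,
    given as a list of (x, y) tuples in counter-clockwise order."""
    n = len(polygon)
    angles = []
    for i in range(n):
        px, py = polygon[i - 1]          # previous vertex
        cx, cy = polygon[i]              # current vertex
        nx, ny = polygon[(i + 1) % n]    # next vertex
        to_prev = math.atan2(py - cy, px - cx)
        to_next = math.atan2(ny - cy, nx - cx)
        # Sweep from the edge toward the next vertex around to the edge toward
        # the previous vertex; for CCW order this is the interior angle.
        angles.append(math.degrees(to_prev - to_next) % 360)
    return angles

# Quick check: a unit square has four 90-degree internal angles.
assert all(abs(a - 90) < 1e-9 for a in internal_angles([(0, 0), (1, 0), (1, 1), (0, 1)]))
```

It's exactly the kind of fiddly-but-simple code where having it generated (with tests) saves the most time.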
Sorry, I'm not in your domain at all, but shouldn't that be a library function? Properties of polygons seem pretty universal to me. Will AI replace carefully curated libraries with repeated boilerplate, thus reducing the reusability of human efforts?
shouldn't that be a library function
It's a balance. Sometimes it's better to just to write a 10 line function and get on with your work, rather than dragging in a huge extra dependency to your project.
If there is a good library for it within the domain, ideally at some point the AI will suggest it. Can't wait until the AI writes its own library that it will reference in future answers.
I feel only a bit bad when deploying a billion-dollar model to ask "how to rename a git branch" every other week. It's the easiest way (https://github.com/tbckr/sgpt) compared to reading the manual, but reading the manual is the right way.
Not sure if you're talking about chatgpt or google
Can’t speak for everyone else but I almost exclusively use it for what you mentioned:
If it’s something I have no idea how to do I might describe the problem and just look at the code it spits out; not even copy pasting but just reading for a basic idea.
It’s much worse if there’s a blog post or example in the documentation that’s exactly what I’m looking for, but if it’s something novel, much better.
An example:
Recently asked how I could convert pressure and temperature data to “skew T” coordinates for a meteorological plot. Not something easy to Google, and the answers the AI gave were slightly wrong, but it gave me a foot in the door.
This is also where I've kind of ended up with it. I've also noticed that while I was at one point using it every day, I'm opening it less and less, maybe a few times a week, and I recently cancelled my subscription. It's still pretty useful for exploratory stuff, boilerplate, and sometimes it can give you a hint on debugging. Everything else I can write faster and more correctly myself.
One example where I successfully used an AI tool (plain ChatGPT) went a bit like this:
Me: Can you give me code for a simple image viewer in python? It should be able to open images via a file open dialog as well as show the previous and next image in the folder
GPT: [code doing that with tkinter]
Me: That code has a bug because the path handling is wrong on windows
GPT: [tries to convince me that the code isn't broken, fixes it regardless]
Me: Can you add keyboard shortcuts for the previous and next buttons
GPT: [adds keyboard shortcuts]
After that I did all development the old fashioned way, but that alone saved me a good chunk of time. Since it was just internal tooling for myself code quality didn't matter, and I wasn't too upset about the questionable error handling choices
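For a sense of what that conversation produces, here's a stripped-down sketch of that kind of viewer (my own approximation, not the actual ChatGPT output; it sticks to PNG/GIF since plain tkinter.PhotoImage can't read JPEGs without Pillow):

```python
import os
import tkinter as tk
from tkinter import filedialog

class ImageViewer:
    def __init__(self, root):
        self.root = root
        self.files = []   # images in the current folder
        self.index = 0
        self.label = tk.Label(root)
        self.label.pack()
        tk.Button(root, text="Open", command=self.open_image).pack(side=tk.LEFT)
        tk.Button(root, text="Previous", command=self.prev_image).pack(side=tk.LEFT)
        tk.Button(root, text="Next", command=self.next_image).pack(side=tk.LEFT)
        # Keyboard shortcuts for the previous/next buttons.
        root.bind("<Left>", lambda e: self.prev_image())
        root.bind("<Right>", lambda e: self.next_image())

    def open_image(self):
        path = filedialog.askopenfilename(filetypes=[("Images", "*.png *.gif")])
        if not path:
            return
        folder = os.path.dirname(path)
        # Normalize paths so the dialog's forward slashes match os.path.join on Windows
        # (the kind of path-handling detail the original GPT version got wrong).
        self.files = sorted(
            os.path.normpath(os.path.join(folder, f)) for f in os.listdir(folder)
            if f.lower().endswith((".png", ".gif"))
        )
        self.index = self.files.index(os.path.normpath(path))
        self.show()

    def show(self):
        if not self.files:
            return
        self.photo = tk.PhotoImage(file=self.files[self.index])  # keep a reference
        self.label.config(image=self.photo)

    def prev_image(self):
        if self.files:
            self.index = (self.index - 1) % len(self.files)
            self.show()

    def next_image(self):
        if self.files:
            self.index = (self.index + 1) % len(self.files)
            self.show()

root = tk.Tk()
ImageViewer(root)
root.mainloop()
```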
I use it as a glorified search engine and to read through bad documentation (ahem, AWS), but this only works for well-documented solutions.
Core programming hasn't really changed over the past years, with good reason: you need. to. understand. what you do. This is the bottleneck, not writing it.
Well, regular search means switching to a different application, with an implied context switch. It definitely takes longer for many things than just using GitHub copilot.
Yea that's where I've landed. Telling it what to do is time consuming.
Telling it what I want to do in broader terms and asking for code examples is a lot better, especially for something I don't know how to do.
Otherwise the autocomplete/suggestions in the editor are great for the minutiae and tedious crap and utility functions. Probably saves me about 20% of my typing, which is great on hands that have been typing for 20-odd years.
It's also good for finding tools and libraries (when it doesn't hallucinate) since https://libs.garden disappeared inexplicably (dunno what to do on Friday nights now that I can't browse through that wonderful site till 2am)
Generally, I agree. I have found it useful for writing SQL, mapping structs, converting from JSON to CSV, etc. i.e. repetitive stuff.
For me a useful coding assistant would be one that looks at what I'm _doing_ and helps me complete the boring parts of the task.
The current wave of coding assistants target junior programmers who don't know how to even start approaching a task. LLMs are quite good at spitting out code that will create a widget or instantiate a client for a given API, figuring out all the parameters and all the incantations that you'd otherwise need to copy paste from a documentation. In a way they are documentation "search and digest" tools.
While that's also useful for senior developers when they need to work outside of their particular focus area, it's not that useful to help you work on a mature codebase where you have your own abstractions and all sorts of custom things that have good reasons to be there but are project specific.
Sure, we could eventually have LLMs that can be fine tuned to your specific projects, company or personal style.
But there is also another area where we can use intelligent assistants: editors.
Right now editors offer powerful tools to move around and replace text, often in ways that respects the syntax of the language. But it's cumbersome to use and learn, relying on key bindings or complicated "refactoring" commands.
I wish there was a way for me to have a smarter editor. Something that understands the syntax and a bit of the semantics of the code, but also the general intent of the local change I'm working on and the wider context, so it can help me apply the right edits.
For example, right now I'm factoring out a part of a larger function into its own function so it can be called independently.
I know there are editor features that predate AI that can do this work, but for various reasons I can't use them. For example, you may have started to do it manually because it seemed simple, and then you realize you have to factor out 5 parameters and it becomes a boring exercise of copy-paste. Another example is that the function extraction refactoring tool of your IDE just can't handle that case, for example: func A(a Foo) { b := a.GetBar(); Baz(b.X, b.Y, c, d) } where you'd want to extract a function func _A(b Bar) { Baz(b.X.... and have A call that. In some simple cases the IDE can do that. In others you need to do it manually.
I want an editor extension that can help me with the boring parts of shuffling parameters around, moving them in structures etc etc all the while I'm in control of the shape of the code but I don't have to remember the advanced editor commands but instead augment my actions with some natural language comments (written or even spoken!)
Thanks for checking out aider.
That demo GIF is just showing a toy example. To see what it's like to work with aider on more complex changes you can check out the examples page [0].
The demo GIF was just intended to convey the general workflow that aider provides: you ask for some changes and aider shares your existing code base with the LLM, collects back the suggested code edits, applies them to your code and git commits with a sensible commit message.
This workflow is generally a big improvement over manually cutting and pasting bits of code back and forth between the ChatGPT UI and your IDE.
Beyond just sending the code that needs to be edited, aider also sends GPT a "repository map" [1] that gives it the overall context of your codebase. This makes aider more effective when working in larger code bases.
[0] https://aider.chat/examples/
[1] https://aider.chat/docs/repomap.html
Exactly. It's almost useless to me.