
Show HN: Plandex – an AI coding engine for complex tasks

BirbSingularity
17 replies
4d2h

It's pretty annoying that every project like this lately is just a wrapper for OpenAI API calls.

danenania
11 replies
4d2h

Supporting more models, including Claude, Gemini, and open source models is definitely at the top of the roadmap. Would that make it less annoying? :)

codeapprove
4 replies
4d1h

Not affiliated with the project but you could use something like OpenRouter to give users a massive list of models to choose from with fairly minimal effort

https://openrouter.ai/

danenania
2 replies
4d1h

Thanks, I need to spend some time digging into OpenRouter. The main requirement would be reliable function calling and JSON, since Plandex relies heavily on that. I'm also expecting to need some model-specific prompts, considering how much prompt iteration was needed to get things behaving how I wanted on OpenAI.

I've also looked at Together (https://www.together.ai/) for this purpose. Can anyone speak to the differences between OpenRouter and Together?

kekebo
1 replies
3d22h

I can't speak to the differences between OpenRouter and Together, but the OpenRouter endpoint should work as a drop-in replacement for OpenAI API calls after replacing the endpoint URL and the value of $OPENAI_API_KEY. The model names may differ from other APIs, but everything else should work the same.
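To make the "drop-in" claim concrete, here's a minimal sketch of the swap, assuming the standard /chat/completions request shape (the model id `anthropic/claude-3-opus` is just an illustrative OpenRouter-style name, and `chat_request` is my own helper, not part of any SDK):

```python
import os

# Only the endpoint and key change; the request shape stays the same.
OPENAI_CONFIG = {
    "base_url": "https://api.openai.com/v1",
    "api_key": os.environ.get("OPENAI_API_KEY", "sk-..."),
    "model": "gpt-4-turbo",
}

# Same request, pointed at OpenRouter; model names gain a vendor prefix.
OPENROUTER_CONFIG = {
    "base_url": "https://openrouter.ai/api/v1",
    "api_key": os.environ.get("OPENROUTER_API_KEY", "sk-or-..."),
    "model": "anthropic/claude-3-opus",  # illustrative model id
}

def chat_request(config, messages):
    """Build the URL, headers, and JSON body for a POST to
    {base_url}/chat/completions -- identical for both providers."""
    return {
        "url": config["base_url"] + "/chat/completions",
        "headers": {"Authorization": "Bearer " + config["api_key"]},
        "json": {"model": config["model"], "messages": messages},
    }
```

Any client code that builds requests this way only needs the config swapped.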

danenania
0 replies
3d21h

Awesome, looking forward to trying it out.

j45
0 replies
3d22h

Would love to hear any feedback from people who have gotten to know OpenRouter, as well as any similar tools.

npace12
3 replies
3d23h

I think Mistral-2-Pro would work really well for this, judging by the great results I've had with it on another project that's heavy on tool calling [1]

[1] https://github.com/radareorg/r2ai

danenania
2 replies
3d23h

Thanks, I'll give it a try. Plandex's model settings are version-controlled like everything else and play well with branches, so it will be fun to start comparing how all different kinds of models do vs. each other on longer coding tasks using a branch for each one.

p1esk
1 replies
3d21h

For challenging tasks, I typically get code outputs from all three top models (gpt4, opus, and ultra) and pick the best one. It would be nice if your tool could simplify this for me: run all three models, and perhaps even facilitate some type of model interaction to produce a better outcome.

danenania
0 replies
3d21h

Definitely, I'm very interested in doing something along these lines.

Aeolun
0 replies
3d18h

I think OpenAI is still the best of the bunch. It kind of feels like the others are there to make people realize OpenAI works the best. Maybe that changes when Gemini 1.5 is released?

aerhardt
1 replies
3d20h

I’m moving an inordinate amount of data between the ChatGPT browser window and my IDE (a lot through copying and pasting), and this demonstrates two things: 1) ChatGPT is incredibly useful to me, and 2) the workflow UX is still terrible. I think there is room for building innovative UXs with OpenAI, and so far what I’ve seen in JetBrains and VSCode isn’t it…

danenania
0 replies
3d20h

That was also my experience and thought process.

bottlepalm
0 replies
3d20h

Every program is a wrapper around a CPU, so annoying.

_ink_
0 replies
3d20h

But the open source models have OpenAI-compatible APIs, so as long as you can set the API endpoint, you can use whatever you want.

CharlieDigital
0 replies
3d22h

OpenAI API is simply a utility. The question is given this utility, how does one find the right use case, structure the correct context, and build the right UX.

OP has certainly built something interesting here and added significant value on top of the base utility of the OpenAI API.

visarga
7 replies
3d21h

This approach works. I just built a SPA in 3 days with GPT-4, of which about 50% was generated. My only tooling was a bash script that lists all the files in the repo (with some exceptions), including a README.md planning the project and a file list; at the end I type my task.

I run about 10-15 rounds with it. At the beginning I was using GPT more heavily, but in the middle I found it easier to just fix the code myself. The context got as big as 10k tokens, but was not a problem. At some point I might need to filter the files more aggressively.

But surprisingly, all that's needed for a bare-bones repo-level coding assistant is a script to list all the files so I can easily copy-paste the whole thing into the ChatGPT window.

danenania
2 replies
3d21h

Yes, well said. Doing exactly this kind of thing for months with ChatGPT is what convinced me the idea could work in the first place. I knew the underlying intelligence was there--the challenge is giving it the right prompts and supporting infra.

Aeolun
1 replies
3d18h

Do you have any of the issues where ChatGPT tends to forget the first parts of its context window? It could have the information explicitly spelled out, but if it weren’t in the last 2K tokens or so, it’d just start to hallucinate stuff for me.

danenania
0 replies
3d17h

Plandex uses gradual summarization as the conversation gets longer (the exact cutoff point in terms of tokens is configurable via `plandex set-model`). So eventually, with a long enough plan, you can start to lose some resolution. That said, assuming you use the default gpt-4-turbo model with a 128k context window, you'd need to go far beyond 2k tokens before you'd start seeing anything like that.

We don't know what ChatGPT's summarization strategy is since it's closed source, but it does seem to be quite a bit more aggressive than Plandex's.
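The gradual-summarization idea can be illustrated like this (a hedged sketch of the general technique, not Plandex's actual code; in a real system `summarize` would call the model):

```python
def fit_to_budget(messages, max_tokens, count_tokens, summarize):
    """Gradual summarization: fold the oldest messages into a running
    summary until the conversation fits the token budget, so recent
    messages keep full resolution while old ones lose it gradually."""
    summary = ""
    messages = list(messages)

    def total():
        return count_tokens(summary) + sum(count_tokens(m) for m in messages)

    while messages and total() > max_tokens:
        # oldest message is absorbed into the summary first
        summary = summarize(summary, messages.pop(0))
    prefix = [f"[summary] {summary}"] if summary else []
    return prefix + messages
```

With a large budget nothing is summarized at all, which matches the point above about 128k-context models rarely hitting the cutoff.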

sanmon3186
1 replies
3d15h

What’s your experience with API cost? I've also tried something similar, but I often end up using up my balance too quickly.

CGamesPlay
0 replies
3d13h

I can generally have these tools solve a simple issue in about 0.1 USD, or "complex" issues in 1-2 USD (complex generally just means that I'm spending time prompt engineering to get the model to do the right thing).
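For a rough sense of where those numbers come from, a back-of-the-envelope sketch (the pricing constants are assumed gpt-4-turbo rates from around that time, not quoted from anywhere; check current pricing before relying on them):

```python
# Assumed pricing: $10 per 1M input tokens, $30 per 1M output tokens.
INPUT_PER_TOKEN = 10 / 1_000_000
OUTPUT_PER_TOKEN = 30 / 1_000_000

def session_cost(rounds, context_tokens, output_tokens_per_round):
    """Iterative sessions resend the full context every round, so input
    cost dominates as the context grows."""
    per_round = (context_tokens * INPUT_PER_TOKEN
                 + output_tokens_per_round * OUTPUT_PER_TOKEN)
    return rounds * per_round

# A 10k-token context over 10 rounds with ~800 output tokens per round
# comes to roughly $1.24, consistent with the 1-2 USD figure above.
```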

ugh123
0 replies
3d19h

Do you have any boilerplate part of your prompt you can share?

nico
0 replies
3d20h

"a script to list all the files so I could easily copy paste the whole thing"

Just in case you are using a Mac, you can pipe the output of your script to pbcopy so that it goes directly into your clipboard

script.sh | pbcopy

ldelossa
7 replies
3d22h

Show me one of these things doing something more complex than a front-end intern project.

IshKebab
3 replies
3d21h

I agree, these things seem to do okish on trivial web projects. I've never seen them do anything more than that.

I still use ChatGPT for some coding tasks, e.g. I asked it to write C code to do some annoying fork/execve stuff (can't remember the details) and it did a decentish job, but it's like 90% right. Great for figuring out a rough shape and what functions to search for, but you definitely can't just take the code and expect it to work.

Same when I asked it to write a device driver for some simple peripheral. It had the shape of an answer but with random hallucinated numbers.

I've also noticed that because there is a ton of noob-level code on the internet it will tend to do noob-level things too, like for the device driver it inserted fixed delays to wait for the device to perform an operation rather than monitoring for when it had actually finished.
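The pattern the model should have produced is an ordinary poll-with-timeout loop instead of a fixed sleep -- roughly this, in Python for illustration (real driver code would be C, and `wait_until_ready` is my own name for the idiom):

```python
import time

def wait_until_ready(is_ready, timeout=1.0, interval=0.01):
    """Poll a completion flag instead of sleeping a fixed delay: return
    as soon as the operation finishes, fail loudly if it never does."""
    deadline = time.monotonic() + timeout
    while True:
        if is_ready():
            return True
        if time.monotonic() >= deadline:
            raise TimeoutError("operation did not complete in time")
        time.sleep(interval)
```

The fixed-delay version is slower in the common case and silently wrong when the device is slower than expected; this version is neither.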

I wonder if coding AIs would benefit from fine tuning on programming best practices so they don't copy beginner mistakes.

danenania
2 replies
3d21h

I used a web project in the demo because I figured it would be familiar to a wide range of developers, but actually many nontrivial pieces of Plandex have been built with the help of Plandex itself.

That's not to say it's perfect or will never make "noob-level" mistakes. That can definitely happen and is ultimately a function of the underlying model's intelligence. But I can at least assure you that it's quite capable of going far beyond a trivial web project.

It's also on me to show more in-depth examples, so thanks for calling it out. I'd love it if you would try some of the projects you mention and let me know how it goes.

Kamii0909
1 replies
1d4h

So basically you don't have any non-trivial examples. What else is to be expected?

chmod2
1 replies
3d20h

It's not something I would consider a complex job. A simple prompt to ChatGPT could even produce a working CDK template.

danenania
0 replies
3d20h

Here's another one, for the backend of a Stripe billing system: https://github.com/plandex-ai/plandex/blob/main/test/test_pr...

It seems like more examples demonstrating relatively complex tasks would be helpful, so I'll work on those.

I'm certainly not trying to claim that it can handle any task. The underlying model's intelligence and context size do place limits on what it can do. And it can definitely struggle with code that uses a lot of abstraction or indirection. But I've also been amazed by what it can accomplish on many occasions.

CGamesPlay
6 replies
3d13h

Love the idea of this, and very excited to see how it pans out. That said: I hate the code review UI. Just dump the changes as `git diff` does and let me review them using all the code review tools I use every day, then provide revision instructions. Building a high-quality TUI for side-by-side diffs should not be the thing you are spending time on, and there already exist great tools for viewing diffs in the terminal.

danenania
5 replies
3d13h

Thanks for the feedback! I actually had a ‘plandex diff’ command working at one point, but dropped it in favor of the changes TUI. I could definitely bring it back for people who prefer that format.

retendo
2 replies
3d11h

You could have a mode for people "who know what they are doing" that just auto-approves all the changes Plandex makes and lets users handle the changes themselves. I would actually prefer that, because I could keep using my IDE to look at diffs and decide what to keep.

danenania
0 replies
3d1h

Thanks, I'll consider this. It would be easy enough to add flags that will allow this.

adr1an
0 replies
3d8h

Agreed! I, for example, would prefer to use difftastic

parentheses
1 replies
2d17h

Providing diff output allows people to self select their approach to merging the changes.

danenania
0 replies
2d17h

Yeah, that makes sense. I'm going to add this soon.

wanderingmind
5 replies
3d19h

Congrats on the launch. Can you please compare and contrast Plandex's features with a similar solution like aider [1], which helps solve the same problem?

[1] https://github.com/paul-gauthier/aider

anotherpaulg
2 replies
3d15h

Thanks for mentioning aider! I haven't had a chance to look closely at plandex, but have read the author's description of differences wrt aider. I'd add a few comments:

I think the plandex UX is novel and interesting. The idea of a git-like CLI with various stateful commands is a new one in this space of ai coding tools. In contrast, aider uses a chat based "pair programming" UX, where you collaborate with the AI and ask for a sequence of changes to your local git repo.

The plandex author highlights that it makes changes in a "version-controlled sandbox" and can "rewind" unwanted changes.

These capabilities are all available "for free" in aider, because it is tightly integrated with git. Each AI change is automatically git committed with a sensible commit message. You can type “/diff” to check the diff, or "/undo" to undo any AI commit that you don't like. Or you can use "/git checkout -b <branch-name>" to start working on a branch to explore a longer sequence of changes, etc.

All your favorite git workflows are supported by invoking familiar git commands with "/git ..." inside the aider chat, or using any external git tooling that you prefer. Aider notices any changes in the underlying repo, however they occur.

bjornsing
1 replies
3d6h

These capabilities are all available "for free" in aider, because it is tightly integrated with git.

Sounds like the right approach to me. Some quick questions:

1. Is it easy to customize the system prompt with aider?

2. Does aider save a record of all OpenAI API calls? I’m thinking I may e.g. want to experiment with fine tuning an open source model using these one day.

3. What would you say are aider’s closest “competitors”?

danenania
0 replies
3d1h

Just to note, Plandex also has integration with git on the client-side and can automatically commit its changes (or not--you can decide when applying changes).

One of the reasons I think it's good to have the plan version-controlled separately from the repo is it avoids intermingling your changes and the model's changes in a way that's difficult to disentangle. It's also resilient to a "dirty" git state where you have a mix of staged, unstaged, and untracked changes.

One more benefit is that Plandex can be used in directories that aren't git repos, while still retaining version control for the plan itself. This can be useful for more one-off tasks where you're not working in an established project.

danenania
1 replies
3d18h

Thanks! Sure, I posted this comment in a Reddit thread a couple days ago to a user who asked the same question (and I added one additional point):

First I should say that it’s been a few months at least since I’ve used aider, so it’s possible my impression of it is a bit outdated. Also I’m a big fan of it and drew a lot of inspiration from it. That said:

Plandex is more focused on building larger and more complex functionality that involves multiple steps, whereas aider is more geared toward making a single change at a time.

Plandex has an isolated, version-controlled sandbox where tentative changes are accumulated. I believe with aider you have to apply or discard each change individually?

Plandex has a diff review TUI where changes can be viewed side-by-side and optionally rejected, a bit like GitHub’s PR review UI.

Plandex has branches that allow for exploring multiple approaches.

aider has cool voice input features that Plandex lacks.

aider’s maintainer Paul has done a lot of benchmarking of file update strategies. While I think Plandex’s approach is better suited to larger and more complex functionality, aider’s unified diff approach may have higher accuracy for a single change. I hope to do benchmarking work on this in the future.

aider requires Python and is installed via pip, while Plandex runs from a single binary with no dependencies, so Plandex installation is arguably easier overall, especially if you aren't a Python dev.

I’m sure I’m missing some other differences but those are the main ones that come to mind.

wanderingmind
0 replies
3d18h

Thank you. Branches to explore different approaches is a really good idea, since LLMs are most powerful when they are used as a rubber duck to generate boilerplate templates and this can help get multiple perspectives. Going to test it soon.

usernamed7
3 replies
2d2h

What's the deal with Plandex Cloud and the $10/$20-mo? The GitHub repo README devolves into a cloud pitch halfway through. I thought this was a local binary talking to OpenAI? I thought this was open source?

danenania
2 replies
2d2h

Hi, it’s open source and it also has a cloud option. You can either self-host or use cloud—it’s up to you.

The CLI talks to the Plandex server and the server talks to OpenAI.

usernamed7
1 replies
2d1h

But I still don't get what the cloud option would be doing that's worth $20/mo if it's talking to OpenAI. Does the Plandex server have large resource requirements?

danenania
0 replies
2d1h

The server does quite a bit. Most of the features are covered here: https://github.com/plandex-ai/plandex/blob/main/guides/USAGE...

I actually did start out with just the CLI running locally, but it reached a point where I needed a database, and thus a client-server model, to get it all working smoothly. I also want to add sharing and collaboration features in the future, and those require a client-server model.

timfsu
3 replies
3d20h

Congrats on the launch, I'm excited to give it a try. I'm curious how you're having it edit files in place - having built a similar project last summer, I had trouble reliably getting it to patch files with correct line numbers. It was especially a problem in React files with nested divs.

danenania
2 replies
3d20h

Thanks! I tried many different ways of doing it before settling on the current approach. It's still not perfect and can make mistakes (which is why the `plandex changes` diff review TUI is essential), but it's pretty good now overall.

I was able to improve reliability of line numbers by using a chain-of-thought approach where, for each change, the model first summarizes what's changing, then outputs code that starts and ends the section in the original file, and then finally identifies the line numbers from there.

The relevant prompts are here: https://github.com/plandex-ai/plandex/blob/main/app/server/m...
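The last step of that chain of thought can be sketched like this (a hedged illustration of the anchor-matching idea as described, not Plandex's actual code; `locate_section` and the exact-match assumption are mine):

```python
def locate_section(original, start_anchor, end_anchor):
    """Given the exact first and last lines of the section the model
    wants to change, recover 1-based (start, end) line numbers in the
    original file. Raises ValueError if an anchor doesn't match, which
    is the cue to re-prompt rather than guess."""
    lines = original.splitlines()
    start = lines.index(start_anchor) + 1
    # search for the end anchor from the start line onward, so a
    # duplicate line earlier in the file can't produce an inverted range
    end = start + lines[start - 1:].index(end_anchor)
    return start, end
```

Having the model emit verbatim anchor lines and deriving the numbers mechanically is more robust than asking it to count lines directly.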

nico
1 replies
3d20h

Amazing work. Loved the video and looking forward to trying it

Can a user ask plandex to modify a commit? Maybe the commit just needs a small change, but doesn’t need to be entirely re-written. Can the scope be reduced on the spot to focus only on a commit?

danenania
0 replies
3d19h

Thanks! There isn't anything built-in to specifically modify a commit, but you could make the modification to the file with Plandex and then `git commit --amend` for basically the same effect.

poulpy123
3 replies
3d7h

Not about this project specifically, but I've realized that while I've seen a lot of AI agents, I've never seen anything interesting built with them. Some simple websites, maybe even some very simple old games like Snake or Pong, but nothing better. Am I missing something?

dmos62
0 replies
3d5h

I brainstormed a text game engine powered by an LLM, but relying on a non-local LLM was off-putting. Local LLMs are getting more and more viable, though. A general problem I ran into was that thinking in terms of LLM queries is a very new way of computation design, and adapting takes a lot of effort. Then again, my central idea was a bit ambitious too: every game character would have a unique interpretation of what was happening.

danenania
0 replies
3d1h

Try to build something interesting with Plandex! Perhaps you will be pleasantly surprised. Either way, please let me know how it goes.

ijustlovemath
3 replies
3d6h

To support many other models you should look at ollama - it provides a REST API on your machine for local inference that works just like OpenAI's.

danenania
2 replies
3d1h

Thanks, I'm aware of ollama and the open source model ecosystem, but I haven't done a deep dive yet, so all the info in this thread has been quite helpful.

ijustlovemath
1 replies
2d20h

In theory, all you have to do is redirect the API gateway to localhost and all your existing integrations should just work!

danenania
0 replies
2d17h

There's an issue here to keep track of this: https://github.com/plandex-ai/plandex/issues/20

It seems that while ollama does have partial OpenAI API compatibility, it's missing function calling, so that's a blocker for now.

brap
3 replies
3d22h

This seems very interesting, but I think the interface choice is not good. There would have been much less friction if this was purely a GitHub/GitLab/etc bot.

danenania
1 replies
3d21h

I see where you're coming from and I do plan to add a web UI and plugin/integration options in the future.

I personally wanted something with a tighter feedback loop that felt more akin to git. I also thought that simplifying the UI side would help me stay focused on getting the data structures and basic mechanics right in the initial version. But now that the core functionality is in place, I think it will work well as a base for additional frontends.

ENGNR
0 replies
3d21h

I haven't tried it yet, but I think making it fast iteration and simple initially is the right way to go. Nice one sharing this as open source!

vertis
0 replies
3d22h

I disagree, having used Sweep extensively, I've found the GitHub Issue -> PR flow to be incredibly clunky with a lack of ability to see what is happening and what has gone wrong.

asadalt
3 replies
3d23h

In the demo it modified UI components. Is there any model that can look at the rendered page to see if it looks right? Right now all these wrappers just blindly edit the code.

danenania
1 replies
3d23h

Plandex can't do this yet, but soon I want to add GPT4-vision (and other multi-modal models) as model options, which will enable this kind of workflow.

asadalt
0 replies
3d22h

Well, I have built a similar project that lives in a GitHub Action, communicates via issues, and sends a PR when done.

GPT-4 Vision isn't there yet. It can mostly OCR, or pattern-recognize an image if it's popular or contains some known object. It cannot detect pixel differences or CSS/alignment issues.

razster
0 replies
3d22h

I paired mine with VSCode and used the live view addon for that folder. So far so good.

lprubin
2 replies
4d3h

Looks interesting. Can you go into more detail about why you like this better for large/complex tasks compared to GH Copilot?

Cieric
1 replies
4d2h

Not the author, but I'm in a discord with him, I believe the main selling point here is that it allows you to manage your updates and conversations in a branching pattern that's saved. So if you can't get the AI to do something you can always revert to a prior state and try a different method.

Also, it doesn't work on a "small view of the world" like Copilot. When I was using Copilot, it could only insert code around your cursor (I understand that Copilot pulls in a lot of context from all the files you have open, but the area it can modify is really small). Plandex can add/remove/update code in multiple files at once. It'll also show you a diff before applying, and you can select some or all of the changes made.

danenania
0 replies
3d21h

Yes, couldn't have said it better myself!

FezzikTheGiant
2 replies
3d10h

Curious to know how you built this. Is it GPT-4 or a fine-tuned model? How much does it cost?

danenania
1 replies
3d1h

It's written in Go. The models that it uses are configurable, but it mostly uses gpt-4-turbo by default currently. It calls the OpenAI API on your behalf with your own API key. No fine-tuning yet, though I'm interested in trying that in the future.

FezzikTheGiant
0 replies
2d22h

Appreciate the response. Really cool work!

ukuina
1 replies
3d17h

Congrats! Looks great, and I can't wait to try it.

Do you support AzureOpenAI with custom endpoints?

Are any special settings necessary to disable telemetry or non-core network requests?

danenania
0 replies
3d17h

Thanks! It doesn't yet support custom endpoints, but it will soon. I'd recommend either joining the Discord (https://discord.gg/plandex-ai) or watching the repo for updates if you want to find out when this gets released.

If you self-host the server, there is no telemetry and no data is sent anywhere except to your self-hosted server and OpenAI.

splatzone
1 replies
3d19h

This is really cool. I tried it and ran into a few syntax errors - it kept missing closing braces in PHP for some reason.

It seems it might be useful if it could actually try to execute the code, or somehow check for syntax errors/unimplemented functions before accepting the response from the LLM.

danenania
0 replies
3d19h

Thanks! Was this on cloud or self-hosted? If cloud and you created an account, feel free to ping me on Discord (https://discord.gg/plandex-ai) or by email (dane@plandex.ai) and let me know your account email so I can investigate. If you have an anonymous trial account on cloud, please still ping me--I can track it down based on file names. There is definitely some work to do in ironing out these kinds of edge cases.

"It seems it might be useful if it could actually try to execute the code, or somehow check for syntax errors/unimplemented functions before accepting the response from the LLM."

Indeed, I do have some ideas on how to add this.
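One cheap version of that idea: parse the model's output before accepting it. A sketch using Python's own parser for illustration (a PHP project would shell out to `php -l` instead; `accept_if_parses` is hypothetical, not an existing Plandex feature):

```python
import ast

def accept_if_parses(source):
    """Gate a model response: reject output that doesn't even parse,
    before it touches the working tree. On failure, return the error
    so it can be fed back to the model as a follow-up prompt."""
    try:
        ast.parse(source)
        return True, None
    except SyntaxError as e:
        return False, f"line {e.lineno}: {e.msg}"
```

This wouldn't catch unimplemented functions, but it would catch the missing-closing-brace class of error for free.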

rglover
1 replies
3d22h

This is something I've been thinking a lot about (a way to set context for an LLM against my own code), thank you for putting this out. Looks really polished.

danenania
0 replies
3d22h

Thanks! Please let me know how it goes for you if you try it :)

parentheses
1 replies
2d15h

Very small nit: it'd be nice to provide an OpenAI org in case multiple orgs exist.

danenania
0 replies
2d2h

Ok, I made a note to add that. Thanks for the feedback!

parentheses
1 replies
3d8h

This looks so damn good! Can't wait to try it in the morning!

danenania
0 replies
3d1h

Thanks! Please let me know how it goes for you.

mbil
1 replies
3d22h

Looks really interesting. Is it wrapping git for the rollback and diffing stuff? If I were a user I'd probably opt to use git directly for that sort of thing.

danenania
0 replies
3d22h

Yes, it does use git underneath, with the idea of exposing a very simple subset of git functionality to the user. There's also some locking and transaction logic involved to ensure integrity and thread safety, so it wouldn't really be straightforward to expose the repo directly.

I tried to build the backend so that postgres, the file system, and git would combine to form effectively a single transactional database.

liampulles
1 replies
3d10h

I appreciate in the copy here that you are not claiming plandex to be a super dev or some such nonsense.

I really dislike the hype marketing in some other solutions.

danenania
0 replies
3d1h

Thanks! I agree. I think the key to working effectively with LLMs is to understand and embrace their limitations, using them for tasks they're good at while not spinning your wheels on tasks (or parts of tasks) that they aren't yet well-suited for.

jtwaleson
1 replies
3d6h

As someone who is trying to build a bootstrapped startup in spare time (read: coding while tired), this is amazing. Thank you so much for creating it.

danenania
0 replies
3d1h

Thanks! I agree it's great for coding while tired. I also like it when I'm procrastinating or feeling lazy. I find it helps to reduce the activation energy of getting started.

jayloofah
1 replies
4d1h

What is the cost of planning and working through, let's say, a manageable issue in a repo? Does it make sense to use 3.5/Sonnet or some lower cost endpoint for these tasks?

danenania
0 replies
4d1h

It's hard to put a precise number on it because it depends on exactly how much context is loaded, how many model responses the task needs to finish, and how much iteration you need to do in order to get the results you're looking for.

That said, you can do quite a meaty task for well under $1. If you're using it heavily it can start to add up over time, so you'd just need to weigh that cost against how you value your time I suppose. In the future I do hope to incorporate fine tuned models that should bring the cost down, as well as other model options like I mentioned in the post.

You can try different models and model settings with `plandex set-model` and see how you go. But in my experience gpt-4 is really the minimum bar for getting usable results.

j45
1 replies
3d22h

Congrats on the launch.

danenania
0 replies
3d21h

Thank you!

htrp
1 replies
2d22h

Are you using plandex to write improvements to plandex?

danenania
0 replies
2d20h

Yes, quite often! Some of the most complex bits involving stream handling and concurrency were easier to do myself, but it’s been very helpful for http handlers, CLI commands, formatted output, TUIs, AWS infrastructure, and a lot more. I’ve also used it to track down bugs.

dr_kiszonka
1 replies
3d18h

Hi! Is it possible to tell Plandex that the code should pass all tests in, e.g., `tests.py`?

danenania
0 replies
3d17h

Hey! Not in an automated way (yet). But you can get pretty close by building your plan, applying it, and then piping the output of your tests back into Plandex:

  pytest tests.py | plandex load
  plandex tell "update the plan files to fix the failing tests from the included pytest output"

aksyam
1 replies
3d23h

Love this. Super excited about AI SWEs, will give it a try.

danenania
0 replies
3d12h

Awesome, thank you!

ahstilde
1 replies
3d20h

This looks neat, I can't wait to try it out.

danenania
0 replies
3d12h

Thanks! Let me know how it goes :)

_bry-guy
1 replies
3d19h

Wow, this is phenomenal! I can't wait to dig in. This is almost exactly the application I've been envisioning for my own workflow. I'm excited to contribute!

danenania
0 replies
3d18h

Thank you! Awesome, I'm glad to hear that! Looking forward to your thoughts, and your contributions :)

danenania
0 replies
3d2h

This is really cool! And quite accurate.

bobby_the_whal
0 replies
3d4h

If this thing really worked, why wouldn't you just point it at AWS documentation and ask it to implement the exact same APIs and come up with designs for the datacenters in extreme detail? Implementing APIs is completely legal.