Sadly I can't try this because I'm on Windows/Linux.
I've been testing apps like this, if anyone is interested:
Best / Easy to use:
More complex / Unpolished UI:
- https://www.nvidia.com/en-us/ai-on-rtx/chat-with-rtx-generat...
- https://github.com/LostRuins/koboldcpp
Misc:
- https://faraday.dev (AI Characters)
No UI / Command line (not for me):
- https://github.com/Mozilla-Ocho/llamafile
Pending to check:
Feel free to recommend more!
Since I couldn't find it in your list, I'd like to plug my own macOS (and iOS) app: Private LLM[1]. Unlike almost every other app in the space, it isn't based on llama.cpp (we use mlc-llm) and doesn't ship naive RTN-quantized models (we use OmniQuant). The app also has deep integrations with macOS and iOS (Shortcuts, Siri, macOS Services, etc.).
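(For the curious: "naive RTN" just means rounding each weight to the nearest level on a fixed grid. A toy sketch of the idea, not any app's actual quantizer:)

```ts
// Symmetric round-to-nearest (RTN) quantization of a weight vector.
// Illustrative only: real quantizers work per-group/per-channel, and
// OmniQuant additionally learns clipping/scaling parameters.
function rtnQuantize(weights: number[], bits = 4) {
  const qmax = 2 ** (bits - 1) - 1;                  // 7 for 4-bit symmetric
  const scale = Math.max(...weights.map(Math.abs)) / qmax;
  const q = weights.map(w => Math.round(w / scale)); // ints in [-qmax, qmax]
  const dequantized = q.map(v => v * scale);         // what inference sees
  return { q, scale, dequantized };
}

console.log(rtnQuantize([0.12, -0.5, 0.33, 0.9]));
```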
Incidentally, it currently runs the Mixtral 8x7B Instruct[2] and Mistral[3] models faster than any other macOS app. The comparison videos are against Ollama, but the result generalizes to almost every other macOS app I've seen that uses llama.cpp for inference. :)
nb: Mixtral 8x7B Instruct requires an Apple Silicon Mac with at least 32GB of RAM.
[1]: https://privatellm.app/
[2]: https://www.youtube.com/watch?v=CdbxM3rkxtc
[3]: https://www.youtube.com/watch?v=UIKOjE9NJU4
What's the performance like in tokens/s?
You can see ms/token in a tiny font at the top of the screen once text generation completes, in both videos I linked to. Performance will vary by machine. On my 64GB M2 Max Mac Studio, I get ~47 tokens/s (21.06 ms/token) with Mistral Instruct v0.2 and ~33 tokens/s (30.14 ms/token) with Mixtral Instruct v0.1.
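If you want to convert between the two units, it's just the reciprocal:

```ts
// Converting the on-screen ms/token figure to tokens/s:
const tokensPerSecond = (msPerToken: number) => 1000 / msPerToken;
console.log(tokensPerSecond(21.06).toFixed(1)); // ~47.5 (Mistral Instruct v0.2)
console.log(tokensPerSecond(30.14).toFixed(1)); // ~33.2 (Mixtral Instruct v0.1)
```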
Interesting! What's the prompt eval processing speed like compared to llama.cpp and kin?
I haven't run any specific low-level benchmarks lately, but chunked prefilling and the TVM auto-tuned Metal kernels from mlc-llm seemed to make a big difference the last time I checked. Also, compared to stock mlc-llm, I use a newer version of Metal (3.0) and have a few modifications that give models a slightly smaller memory and disk footprint, plus slightly faster execution. Unlike the mlc-llm folks, I only care about compatibility with Apple platforms; they support much more than that in their upstream project.
thanks, I'll give it a crack
MacGPT is way handy because of a global keyboard shortcut that opens a Spotlight-like prompt. I would love to have a local equivalent.
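If anyone wants to hack a local one together: in Electron this is mostly globalShortcut plus a frameless window. A rough sketch (the window styling and prompt.html are made up):

```ts
// Minimal sketch of a MacGPT-style global hotkey in Electron.
// "prompt.html" and the window options are hypothetical; the real pieces
// are globalShortcut and a frameless, always-on-top window.
import { app, BrowserWindow, globalShortcut } from "electron";

let promptWindow: BrowserWindow;

app.whenReady().then(() => {
  promptWindow = new BrowserWindow({
    width: 680,
    height: 80,
    frame: false,
    show: false,
    alwaysOnTop: true,
  });
  promptWindow.loadFile("prompt.html"); // your Spotlight-like input UI

  // Toggle the prompt from anywhere with one chord.
  globalShortcut.register("CommandOrControl+Shift+Space", () => {
    promptWindow.isVisible() ? promptWindow.hide() : promptWindow.show();
  });
});

app.on("will-quit", () => globalShortcut.unregisterAll());
```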
Oh thanks! I didn't know there were quite a few local ChatGPT alternatives. I was wondering which users they are targeting: engineers or average users? I guess average users will likely choose ChatGPT and Perplexity over local apps for more recent knowledge of the world.
Hi. I'm the author of Msty, the 2nd app on the list above. You are right that average users will likely choose ChatGPT over local models. My wife was the first and biggest user of my app: a software engineer by profession and training, but she prefers not to worry about the LLM world and just wants a tool that makes her more productive. As soon as she took Msty for a ride, I realized that some users, regardless of their background, care about online models, which led me to add support for them right away. She really likes the parallel chat feature: she gives the same prompt to both Mistral and ChatGPT, compares the outputs, and picks the best answer (or sometimes makes a hybrid of the two). She says being able to compare multiple outputs like that is tremendously helpful, but that's the extent of local LLMs for her. So far my aim has been to target a bit above the average user while keeping the app approachable for more advanced users as well.
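If anyone's curious, the parallel-chat idea boils down to something like this; a rough sketch against a local Ollama server (the endpoint and model names are just examples, not Msty's actual internals):

```ts
// Same prompt, two models, side by side.
async function ask(model: string, prompt: string): Promise<string> {
  const res = await fetch("http://localhost:11434/api/generate", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ model, prompt, stream: false }),
  });
  return (await res.json()).response;
}

const prompt = "Summarize what RAG is in two sentences.";
const [a, b] = await Promise.all([ask("mistral", prompt), ask("llama2", prompt)]);
console.log("mistral:\n" + a + "\n\nllama2:\n" + b);
```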
I’m looking for a ChatGPT client alternative, i.e. I can use my own OpenAI API key in some other client.
Offline isn't important for me; it's just that $20 a month is a lot of money when I'd wager most months my usage is worth a lot less. However, I'd still want access to completion, DALL-E, etc.
Would Msty be a good option for me?
Give it a try and see how you feel. To be completely honest, "yes, it will" would be a dishonest answer, at least at this point. The app has been out for just about a month and I am still working on it. I would love for a user like you to give it a try and give me some feedback (please). I am very active on our Discord if you want to get in touch (just mention your HN username and I will wave).
Thank you so much, I’m excited to give this a try in the next few days.
Looks great, though the fact that you have to ignore your antivirus warning during installation, and that it phones home (to insights.msty.app) right after launch despite the FAQ saying it doesn't collect any data, makes me a little skittish.
Do any of these let you dump in a bunch of your own documents to use as a corpus and then query and summarize them?
Yes, GPT4All has RAG-like features. Basically you configure some directories and then have it load docs from whatever folders you have enabled for the model you're currently using. I haven't used it a ton, but I have used it to review long documents and it's worked well depending on the model.
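If you're curious what's going on under the hood, here's a toy sketch of the general pattern: chunk your files, pick the chunks that best match the question, and stuff them into the prompt. (Real apps, GPT4All's LocalDocs included, use embeddings for retrieval rather than the crude keyword overlap below.)

```ts
import { readFileSync, readdirSync } from "node:fs";
import { join } from "node:path";

// Split every file in a directory into fixed-size text chunks.
function chunkDir(dir: string, size = 800): string[] {
  return readdirSync(dir).flatMap((f) => {
    const text = readFileSync(join(dir, f), "utf8");
    const parts: string[] = [];
    for (let i = 0; i < text.length; i += size) parts.push(text.slice(i, i + size));
    return parts;
  });
}

// Rank chunks by how many question words they contain (toy scoring).
function topK(question: string, docs: string[], k = 3): string[] {
  const words = new Set(question.toLowerCase().split(/\W+/));
  return docs
    .map((d) => ({ d, score: d.toLowerCase().split(/\W+/).filter((w) => words.has(w)).length }))
    .sort((a, b) => b.score - a.score)
    .slice(0, k)
    .map((x) => x.d);
}

const question = "What are the termination clauses?";
const context = topK(question, chunkDir("./docs")).join("\n---\n");
const prompt = `Answer using only this context:\n${context}\n\nQuestion: ${question}`;
// ...then send `prompt` to whichever local model you're running.
```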
https://github.com/imartinez/privateGPT
Author of Msty here. Not yet, but I am already working on the design for it, to be added in the very near future. I'd be happy to chat more to understand your needs and what you are looking for in such apps. Please hop on the Discord if you don't mind :)
Open-WebUI supports that; it uses #tags for each document, so you can ask questions about multiple specific documents.
The new one straight from Nvidia does, I believe.
I am the author of the Msty app mentioned here. So humbled to see an app that is just about a month old, and that I mostly wrote for my wife and some friends (who got overwhelmed with everything going on in the LLM world), at the top of your list. Thank you!
If you need help testing the Linux version, let me know; I'd be happy to help.
I was actually looking for one! What's the best way to reach you? Mind jumping on our Discord so that I can share the installer with you soon?
One bit of feedback: there's nowhere to put system messages. These can be much more influential than user prompts when it comes to shaping the tone and style of the response.
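For reference, a system message is just another role in the chat payload. For example, against a local Ollama server's /api/chat (the model name is only an example):

```ts
// The system message rides alongside the user prompt and steers tone/style.
const res = await fetch("http://localhost:11434/api/chat", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({
    model: "mistral",
    stream: false,
    messages: [
      { role: "system", content: "You are terse and always answer in rhyme." },
      { role: "user", content: "How do I undo my last git commit?" },
    ],
  }),
});
console.log((await res.json()).message.content);
```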
That's at the top of our list. It got pushed back because we want to support creating a character/profile (basically select a model and apply some defaults, including a system prompt), but I feel it was a mistake to wait for that. Regardless, it is getting added in the next release (the one after a release that is dropping in a day or two, which is a big release in itself).
Looks interesting, but can't see what it is doing. Any link to the source code?
lmstudio is using a dark pattern I really hate: don't put a GitHub logo on your webpage if your software is not source-available. It just takes you to some random config repos they have on GitHub. That's a poor choice in my opinion.
We call that stolen valor.
have you seen llamafile[0]?
[0] https://github.com/Mozilla-Ocho/llamafile
Add Open-WebUI (used to be Ollama-WebUI)
https://github.com/open-webui/open-webui
a well-featured UI with a very active team
We just added local LLM support to our curiosity.ai app too. If anyone wants to try it, we're looking for feedback there!
Just FYI, llamafile includes a web-based chat UI. It fires up automatically.
Nice, adding these to my list. Here's a list I put together; it has active GitHub projects for LLM UIs, ordered by stars:
- https://github.com/nomic-ai/gpt4all
- https://github.com/imartinez/privateGPT
- https://github.com/oobabooga/text-generation-webui
- https://github.com/FlowiseAI/Flowise
- https://github.com/lobehub/lobe-chat
- https://github.com/PromtEngineer/localGPT
- https://github.com/h2oai/h2ogpt
- https://github.com/huggingface/chat-ui
- https://github.com/SillyTavern/SillyTavern
- https://github.com/ollama-webui/ollama-webui
- https://github.com/Chainlit/chainlit
- https://github.com/LostRuins/koboldcpp
- https://github.com/ParisNeo/lollms-webui/
Thanks for the list. Tried Jan just now, as it is both easy and open source. It is a bit buggy, I think, but the concept is ace: quick install, it tells you which models work on your machine, one-click download, and then a ChatGPT-style interface. Mistral 7B running on my low-spec laptop at 6 tokens/s and making some damn sense is amazing. The bugs show up at inference time. Could be hardware issues though, not sure. YMMV.
Try this one: https://uneven-macaw-bef2.hiku.app/app/
It loads the LLM in the browser using WebGPU, so it works offline after the first load. It's also a PWA you can install. It should work on Chrome 113+ on desktop and Chrome 121+ on mobile.
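You can check whether a browser will take the WebGPU path in a couple of lines (navigator.gpu is the standard entry point; in TypeScript you'd want @webgpu/types for the declarations):

```ts
// Feature-detect WebGPU before trying to load a model.
if (!("gpu" in navigator)) {
  console.log("No WebGPU here; needs Chrome 113+ on desktop, 121+ on mobile.");
} else {
  const adapter = await navigator.gpu.requestAdapter();
  console.log(adapter ? "WebGPU adapter found" : "WebGPU exposed, but no adapter");
}
```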
Khoj was one of the first 'low-touch' solutions out there I think. It's ok, but still under active development, like all of them really.
https://khoj.dev/
What about https://github.com/open-webui/open-webui?
Seems to have more features than all of them