
Show HN: I made an app to use local AI as daily driver

pentagrama
36 replies
15h14m

Sadly I can't try this because I'm on Windows or Linux.

Was testing apps like this if anyone is interested:

Best / Easy to use:

- https://lmstudio.ai

- https://msty.app

- https://jan.ai

More complex / Unpolished UI:

- https://gpt4all.io

- https://pinokio.computer

- https://www.nvidia.com/en-us/ai-on-rtx/chat-with-rtx-generat...

- https://github.com/LostRuins/koboldcpp

Misc:

- https://faraday.dev (AI Characters)

No UI / Command line (not for me):

- https://ollama.com

- https://privategpt.dev

- https://serge.chat

- https://github.com/Mozilla-Ocho/llamafile

Pending to check:

- https://recurse.chat

Feel free to recommend more!

woadwarrior01
6 replies
10h23m

Since I couldn't find it in your list, I'd like to plug my own macOS (and iOS) app: Private LLM[1]. Unlike almost every other app in the space, it isn't based on llama.cpp (we use mlc-llm) or naive RTN quantized models (we use OmniQuant). Also, the app has deep integrations with macOS and iOS (Shortcuts, Siri, macOS Services, etc).

Incidentally, it currently runs the Mixtral 8x7B Instruct[2] and Mistral[3] models faster than any other macOS app. The comparison videos are against Ollama, but the result generalizes to almost every other macOS app I've seen that uses llama.cpp for inference. :)

nb: Mixtral 8x7B Instruct requires an Apple Silicon Mac with at least 32GB of RAM.

[1]: https://privatellm.app/

[2]: https://www.youtube.com/watch?v=CdbxM3rkxtc

[3]: https://www.youtube.com/watch?v=UIKOjE9NJU4

sigmoid10
4 replies
10h7m

What's the performance like in tokens/s?

woadwarrior01
3 replies
8h59m

You can see ms/token in a tiny font at the top of the screen once text generation completes in both of the videos I linked to. Performance will vary by machine. On my 64GB M2 Max Mac Studio, I get ~47 tokens/s (21.06 ms/token) with Mistral Instruct v0.2 and ~33 tokens/s (30.14 ms/token) with Mixtral Instruct v0.1.
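(ms/token and tokens/s are just reciprocals, so the two pairs of figures above are consistent; a quick sanity check, sketched in Python:)

    # throughput = 1000 / latency-per-token (in ms)
    for ms_per_token in (21.06, 30.14):
        print(f"{ms_per_token} ms/token = {1000 / ms_per_token:.1f} tokens/s")
    # 21.06 ms/token = 47.5 tokens/s; 30.14 ms/token = 33.2 tokens/s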

castles
2 replies
6h3m

Interesting! What's the prompt eval processing speed like compared to llama.cpp and kin?

woadwarrior01
1 replies
5h34m

I haven't run any specific low-level benchmarks lately, but chunked prefilling and the TVM auto-tuned Metal kernels from mlc-llm seemed to make a big difference the last time I checked. Also, compared to stock mlc-llm, I use a newer version of Metal (3.0) and have a few modifications to give models a slightly smaller memory and disk footprint, as well as slightly faster execution. That's because, unlike the mlc-llm folks, I only care about compatibility with Apple platforms; they support so much more than that in their upstream project.

castles
0 replies
4h50m

thanks, I'll give it a crack

iknowstuff
0 replies
9h11m

MacGPT is really handy because of its global keyboard shortcut, which opens a Spotlight-like prompt. I would love to have a local equivalent.

lolpanda
5 replies
13h59m

Oh thanks! I didn't know there were quite a few local ChatGPT alternatives. I was wondering which users they are targeting: engineers or average users? I guess average users will likely choose ChatGPT and Perplexity over local apps for more recent knowledge of the world.

chown
4 replies
13h38m

Hi. I'm the author of the Msty app, 2nd on the list above. You are right about average users likely choosing ChatGPT over local models. My wife was the first and biggest user of my app. She's a software engineer by profession and training, but she prefers not to worry about the LLM world and just to use it as a tool that makes you more productive. As soon as she took Msty for a ride, I realized that some users, regardless of their background, care about online models. This actually led me to add support for online models right away. However, she really likes the parallel chat feature: she gives the same prompt to both Mistral and ChatGPT models, then compares the output and chooses the best answer (or sometimes makes a hybrid of the two). She says that being able to compare multiple outputs like that is tremendously helpful. But that's the extent of local LLMs for her. So far my effort has been to target a bit higher than the average user while keeping the app approachable for more advanced users as well.

Gunnerhead
2 replies
13h28m

I’m looking for a ChatGPT client alternative, i.e. some other client where I can use my own OpenAI API key.

Offline isn’t important for me; it's just that $20/month is a lot of money when I'd wager most months my usage is worth a lot less. However, I’d still want access to completions, DALL-E, etc.

Would Msty be a good option for me?

chown
1 replies
13h20m

Give it a try and see how you feel. To be completely honest, "yes, it will" would be a dishonest answer, at least at this point. The app has been out for just about a month and I am still working on it. I would love for a user like you to give it a try and give me some feedback (please). I am very active on our Discord if you want to get in touch (just mention your HN username and I will wave).

Gunnerhead
0 replies
12h36m

Thank you so much, I’m excited to give this a try in the next few days.

AriedK
0 replies
10h11m

Looks great, though the fact that you have to ignore your anti-virus warning during installation, and the fact that it phones home (to insights.msty.app) directly after launch despite the line in the FAQ about not collecting any data, makes me a little skittish.

joshmarinacci
5 replies
13h42m

Do any of these let you dump in a bunch of your own documents to use as a corpus and then query and summarize them?

windexh8er
0 replies
13h37m

Yes, GPT4All has RAG-like features. Basically you configure some directories and then have it load docs from whatever folders you have enabled for the model you're currently using. I haven't used it a ton, but I have used it to review long documents and it's worked well depending on the model.

chown
0 replies
13h17m

Author of Msty here. Not yet, but I am already working on the design for it to be added in the very near future. I am happy to chat more with you to understand your needs and what you are looking for in such apps. Please hop on the Discord if you don't mind :)

Datagenerator
0 replies
11h27m

Open-WebUI has support for doing that; it works using #tags for each document, so you can ask questions about multiple specific documents.

8n4vidtmkvmk
0 replies
12h38m

The new one straight from Nvidia does I believe.

chown
5 replies
14h58m

I am the author of the Msty app mentioned here. So humbled to see an app that is just about a month old, and that I mostly wrote for my wife and some friends to begin with (who got overwhelmed with everything that was going on in the LLM world), at the top of your list. Thank you!

petemir
1 replies
11h5m

If you need help for testing the Linux version let me know, I’d be happy to help

chown
0 replies
8h20m

I was actually looking for one! What's the best way to reach you? Mind jumping on our Discord so that I can share the installer with you soon?

crooked-v
1 replies
10h9m

One bit of feedback: there's nowhere to put system messages. These can be much more influential than user prompts when it comes to shaping the tone and style of the response.

chown
0 replies
8h16m

That's at the top of our list. It got pushed back because we want to support creating a character/profile (basically select a model and apply some defaults, including a system prompt). But I feel like it was a mistake to wait for that. Regardless, it is getting added in the next release (the one after something that is dropping in a day or 2, which is a big release in itself).

Datagenerator
0 replies
11h30m

Looks interesting, but can't see what it is doing. Any link to the source code?

wanderingmind
1 replies
10h26m

lmstudio is using a dark pattern I really hate. Don't put a GitHub logo on your webpage if your software is not source-available. It just takes you to GitHub, to some random config repos they have. This is a poor choice in my opinion.

Hugsun
0 replies
6h21m

We call that stolen valor.

theolivenbaum
0 replies
12h33m

We just added local LLM support to our curiosity.ai app too - if anyone wants to try it, we're looking for feedback there!

stlhood
0 replies
13h51m

Just FYI, llamafile includes a web-based chat UI. It fires up automatically.

quickthrower2
0 replies
7h52m

Thanks for the list. Tried Jan just now as it is both easy and open source. It is a bit buggy I think, but the concept is ace: the quick install, telling you which models work on your machine, one-click download, and then a ChatGPT-style interface. Mistral 7B running on my low-spec laptop at 6 tokens/s while making some damn sense is amazing. The bugs are at inference time. Could be hardware issues though, not sure. YMMV.

hmdai
0 replies
10h38m

Try this one: https://uneven-macaw-bef2.hiku.app/app/

It loads the LLM in the browser using WebGPU, so it works offline after the first load. It's also a PWA you can install. It should work on Chrome > 113 on desktop and Chrome > 121 on mobile.

greggsy
0 replies
8h40m

Khoj was one of the first 'low-touch' solutions out there I think. It's ok, but still under active development, like all of them really.

https://khoj.dev/

girishso
34 replies
16h50m

I will totally pay for something like this if it answers from my local documents, bookmarks, browser history etc.

xyc
14 replies
16h34m

Yes, this would be the next big focus. Personal data connectivity is where I see local AI excelling, despite the difference in model power.

Satam
7 replies
12h48m

I have doubts about that. Most personal data actually lives in the cloud these days. If you need your Gmail emails, you'll need to use their API, which is guarded behind a certification fee of around $50k. I think there is a simpler version for personal use, but you still need to get the API key. Who's going to teach their mom about API keys? So I think for a lot of these data sources you'll end up with enterprise AIs integrating them first for a seamless experience.

noduerme
2 replies
8h45m

Seconding a sibling question: What $50k API fee? To access your gmail? I've been using gmail since 2008 or so without ever touching their web/app interface or getting an API key. You just use it as an IMAP server.

Satam
1 replies
7h58m

To use Google's sensitive APIs in production you have to certify your product, and that costs tens of thousands. To be honest, I didn't think about IMAP at first, but it looks like that could be getting tougher soon too: https://support.google.com/a/answer/14114704?hl=en. Soon they will require OAuth for IMAP, and with OAuth you'll need the certification: https://developers.google.com/gmail/imap/xoauth2-protocol. If it's for personal use, you might be able to get by with just some warnings in the login flow, but it won't be easy to get the OAuth flow set up in the first place.

noduerme
0 replies
7h16m

Yeah, Thunderbird integrated OAuth in the last few releases, mainly to keep up with the Gmail and Hotmail requirements. Made it very user-friendly to set up in the GUI right within T-bird. I don't see this being a major obstacle.

I'm not sure I can imagine a scenario in production where Google would, or should, allow API access to individual gmail accounts. What's that for? So you can read all your employees' mail without running your own email server?

xyc
1 replies
9h49m

I think this is a good take. While there's big enough niche for personal data locally, I'd love if there's a way to solve for email/cloud data requiring API keys.

noduerme
0 replies
8h40m

Ideally, though, a sufficiently smart LLM shouldn't need API access. It could navigate to your social media login page, supply your credentials, and scrape what it sees. Better yet, it should just reverse-engineer the API ;)

samstave
0 replies
1h3m

What?

I manage both gmail and protonmail via thunderbird - where I have better search and sort using IMAP.

coev
0 replies
9h46m

Why wouldn't you be able to use IMAP rather than the Gmail API? IMAP returns the text and headers of all your emails, which is what you'd want the LLM to ingest anyway.
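For what it's worth, here is a rough sketch of what pulling mail over IMAP for local ingestion could look like with Python's standard imaplib; the host, account, and app password are placeholders, and Gmail specifically requires IMAP to be enabled plus an app password (or OAuth) rather than your normal password:

    # Rough sketch: fetch recent message subjects/bodies over IMAP for local ingestion.
    import email
    import imaplib

    HOST = "imap.gmail.com"        # placeholder: any IMAP server works
    USER = "you@example.com"       # placeholder account
    APP_PASSWORD = "app-password"  # placeholder; not your normal password

    def fetch_recent(limit: int = 50) -> list[str]:
        conn = imaplib.IMAP4_SSL(HOST)
        conn.login(USER, APP_PASSWORD)
        conn.select("INBOX", readonly=True)
        _, data = conn.search(None, "ALL")
        texts = []
        for num in data[0].split()[-limit:]:
            _, msg_data = conn.fetch(num, "(RFC822)")
            msg = email.message_from_bytes(msg_data[0][1])
            body = b"" if msg.is_multipart() else (msg.get_payload(decode=True) or b"")
            texts.append(f"{msg['Subject']}\n{body.decode(errors='ignore')}")
        conn.logout()
        return texts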

ssnri
3 replies
13h57m

I would even let it have longer processing times for queries to apply against each document in my system, allow it to specialize/train itself on a daily basis…

Use all the resources you want if you save me brainpower

xyc
2 replies
12h27m

Agree, there's a non real-time angle to this.

samstave
1 replies
1h0m

"give me a summary of the news around this topic each morning for my daily read"

Help me plan for upcoming meetings: if I put something on the calendar, it builds a little dossier for the event and includes relevant info based on the type of event or meeting, mostly scheduling reminders or prompting you with updates or changes to the event, etc.

ssnri
0 replies
39m

“filter out baby pictures from my family text threads”

chaostheory
0 replies
15h54m

Yeah, we’re getting closer to “Her”

_boffin_
0 replies
15h13m

Good to know there's a market for that. Currently building out something that integrates data from numerous sources, processes it, and then utilizes it.

nice.

wkat4242
8 replies
7h52m

Stupid question but what does RAG stand for?

onehp
7 replies
7h24m

Retrieval-augmented generation. In short, you index your documents (or chunks of them) up front, typically by embedding them. Then, when you want to ask the LLM a question, you pull the most relevant chunks back and feed them to it as additional context.

danielovichdk
6 replies
6h39m

I don't get it. To my understanding it takes huge amounts of data to build any form of RAG, simply because it enlarges the statistical model you later prompt. If the model is not big enough, how would you expect it to answer you in a qualified manner? It simply can't.

So I don't really buy it, and I have yet to see it work better than any RDBMS search index.

Tell me I am wrong; I would like to see a local model, grounded in my own docs, give me quality answers to quality prompts.

tveita
3 replies
6h14m

RAG doesn't require much data or involve any training; it is a fancy name for "automatically paste some relevant context into the prompt".

Basically, if you have a database of three emails and ask when Biff wanted to meet for lunch, a RAG system would select the most relevant email based on any kind of search (embeddings are the most fashionable) and create a prompt like:

"Given this document: <your email>, answer the question: When does Biff want to meet for lunch?"

loudmax
1 replies
5h1m

That's not how RAG works. What you're describing is something closer to prompt optimization.

Sibling comment from discordance has a more accurate description of RAG. There's a longer description from Nvidia here: https://blogs.nvidia.com/blog/what-is-retrieval-augmented-ge...

tveita
0 replies
4h23m

Right, you read something nebulous about how "the LLM combines the retrieved words and its own response to the query into a final answer it presents to the user", and you think there is some magic going on, and then you click one link deeper and read at https://ai.meta.com/blog/retrieval-augmented-generation-stre... :

> Given the prompt “When did the first mammal appear on Earth?” for instance, RAG might surface documents for “Mammal,” “History of Earth,” and “Evolution of Mammals.” These supporting documents are then concatenated as context with the original input and fed to the [...] model

Finding the relevant context to put in the prompt is a search problem. Nearest-neighbour search on embeddings is one basic way to do it, but the singular focus on "vector databases" is a bit of a hype phenomenon IMO; a real-world product should factor a lot more than just pure textual content into the relevancy score. Or is your personal AI assistant going to treat emails from yesterday as equally relevant as emails from a year ago?
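As a minimal sketch of that idea, one could blend embedding similarity with a recency factor; the 30-day half-life and the 50/50 blend below are arbitrary assumptions, not anything a particular product does:

    # Relevancy score that mixes embedding similarity with recency,
    # so yesterday's email outranks an equally similar one from a year ago.
    def relevance(cosine_sim: float, age_days: float, half_life_days: float = 30.0) -> float:
        recency = 0.5 ** (age_days / half_life_days)  # exponential decay with age
        return cosine_sim * (0.5 + 0.5 * recency)     # down-weight old docs, never discard them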

machiaweliczny
0 replies
3h26m

Legit explanation, that's how it works AFAIK.

discordance
1 replies
5h58m

RAG:

1. First you create embeddings from your documents

2. Store that in a vector db

3. Ask what the user wants and do a search in the vector db (cosine similarity etc)

4. Feed the relevant search results (the retrieved document chunks) to your LLM along with the user's question, and do the usual LLM stuff with them
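A minimal sketch of those four steps, where embed() and llm() are stand-ins for whatever embedding model and LLM you actually use (llama.cpp, sentence-transformers, a hosted API, etc.):

    import numpy as np

    def embed(text: str) -> np.ndarray:
        raise NotImplementedError  # placeholder: return an embedding vector for the text

    def llm(prompt: str) -> str:
        raise NotImplementedError  # placeholder: call your local or hosted model

    def cosine(a: np.ndarray, b: np.ndarray) -> float:
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

    docs = ["chunk one ...", "chunk two ..."]  # 1. embeddings are created per document chunk
    index = [(d, embed(d)) for d in docs]      # 2. the "vector db" (here just a list)

    def answer(question: str, top_k: int = 3) -> str:
        q = embed(question)
        best = sorted(index, key=lambda it: cosine(q, it[1]), reverse=True)[:top_k]  # 3. similarity search
        context = "\n\n".join(chunk for chunk, _ in best)
        return llm(f"Answer using only this context:\n{context}\n\nQuestion: {question}")  # 4. feed chunks + question to the LLM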

bigfudge
0 replies
4h29m

Although RAG is often implemented via vector databases to find 'relevant' content, I'm not sure that's a necessary component. I've been doing what I call RAG by finding 'relevant' content for the current prompt context via a number of different algorithms that don't use vectors.

Would you define RAG only as 'prompt optimisation that involves embeddings'?

chb
2 replies
16h24m

This. There was a post in HN last week, iirc, referring to just such a solution called ZenFetch (?). I would have adopted it in a heartbeat but they don’t currently have a means of exporting the source data you feed to it (should you elect it as your sole means of bookmarking, etc)

gabev
1 replies
14h33m

Hey there,

This is Gabe, the founder of Zenfetch. Thanks for sharing. We're putting together an export option where you can download all your saved data as a CSV and should get that out by end of week.

samstave
0 replies
1h4m

Seems like this would be a good tool to build lessons on: if you could share a "class" and export a link for others to copy the class and expand on the lesson/class/topic in their own AI, but as a separate "class" that isn't fully integrated into my regular history blob?

I want the ability to search all my downloaded files and organize them based on context within. Have it create a category table, and allow me to "put all pics of my cat in this folder, and upload them to a gallery on imgur."

spiderfarmer
1 replies
10h21m

Next version of MacOS will probably have that.

tethys
0 replies
10h10m

As long as you use Safari for browsing, Notes for note taking, iCloud for mail …

scottrblock
1 replies
15h49m

+1, I would love to configure a folder of markdown/txt (and eventually images and PDFs) files that this can have access to. Ideally it could RAG over them in a sensible way. Would love to help support this!

xyc
0 replies
12h19m

Thank you! I'd love to learn more about your use cases. Would you mind sending an email to feedback@recurse.chat or DM me on https://x.com/chxy to get the conversation started?

CGamesPlay
18 replies
16h59m

Possibly a strange question, but do you have plans to add online models to the app? Local models just aren't at the same level, but I would certainly appreciate a consistent chat interface that lets me switch between GPT/Claude/local models.

iansinnott
10 replies
16h48m

You could try out Prompta [1], which I made for this use case. Initially created to use OpenAI as a desktop app, but can use any compatible API including Ollama if you want local completions.

[1]: https://github.com/iansinnott/prompta

CGamesPlay
9 replies
16h26m

This one doesn't seem to support system prompts, which are absolutely essential for getting useful output from LLMs.

iansinnott
3 replies
15h52m

You can update the system prompt in the settings. Admittedly this is not mentioned in the README, but is customizable.

refulgentis
2 replies
14h16m

> the system prompt

There isn't a singular system prompt. It really does matter!

Copy the OpenAI playground, you'll thank yourself later

iansinnott
0 replies
10h0m

Fair point, and it's not implemented that way currently. It's more like "custom instructions" but thanks for pointing that out. I haven't used multiple system prompts in the OpenAI playground either, so I hadn't given it much thought.

8n4vidtmkvmk
0 replies
12h34m

You use multiple system prompts in a single chat? What for?

derwiki
3 replies
15h53m

Can you speak more to this? I get useful output from LLMs all the time, but never use system prompts. What am I missing?

CGamesPlay
2 replies
14h51m

Sure, I use one system prompt template to make ChatGPT more concise. Compare these two: https://sharegpt.com/c/fEZKMIy vs https://sharegpt.com/c/S2lyYON

I use similar ones to get ChatGPT to be more thorough or diligent as well. From my limited experience with local models, this type of system prompting is even more important than it is with ChatGPT 4.
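For anyone wondering where the system prompt actually goes, here is a rough sketch of an OpenAI-compatible chat request; the endpoint, model name, and instruction text are placeholders (llama.cpp's server and many local apps expose the same request shape):

    import requests

    resp = requests.post(
        "http://localhost:8080/v1/chat/completions",  # placeholder endpoint
        json={
            "model": "mistral-7b-instruct",           # placeholder model name
            "messages": [
                # the system message shapes tone and verbosity for the whole conversation
                {"role": "system", "content": "Be terse. Answer in at most three sentences."},
                {"role": "user", "content": "How do I list files modified in the last day?"},
            ],
        },
    )
    print(resp.json()["choices"][0]["message"]["content"])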

addandsubtract
1 replies
9h8m

Is there a difference in using a system prompt and just pasting the "system prompt" part at the beginning of your message?

CGamesPlay
0 replies
8h5m

Haven't tested, but having it built-in is more convenient, and convenience is why I'm using these tools in the first place (as a replacement for StackOverflow, for example).

a_bonobo
0 replies
15h24m

I've run into the same problem with deploying Gemini locally; it does not seem to support system prompts. I've cheated around this by auto-prepending the system prompt to the user prompt and then deleting it from the user-displayed prompt again.
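A toy sketch of that workaround, with a made-up system prompt (the actual model call is omitted):

    SYSTEM = "You are a concise assistant."  # stand-in system prompt

    def prompt_for_model(user_text: str) -> str:
        # what actually gets sent to the model
        return f"{SYSTEM}\n\n{user_text}"

    def prompt_for_display(full_prompt: str) -> str:
        # what the user sees in the chat history
        return full_prompt.removeprefix(SYSTEM).lstrip()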

christiangenco
3 replies
16h14m

...how did you highlight a specific sentence like that?

QuinnyPig
1 replies
15h38m

It just worked on Safari on iOS. That’s pretty impressive.

xyc
0 replies
16h53m

Not strange at all! It's a very valid ask. The focus is local AI, but GPT-3.5/GPT-4 are actually included in the app (bring your own key), although customization is limited. Planning to expose some more customizability there including API base urls / model names.

longnguyen
0 replies
10h8m

Shameless plug: if you need multiple AI service providers, give BoltAI[0] a try. It's native (not Electron) and supports multiple services: OpenAI, Azure OpenAI, OpenRouter, Mistral, Ollama…

It also allows you to interact with LLMs via multiple different interfaces: a Chat UI, a context-aware feature called AI Command, and an Inline mode.

[0]: https://boltai.com

domano
9 replies
5h52m

Hey, i bought it, nice work!

A few things:

* The main thing that makes ChatGPT's UI useful to me is the ability to change any of my prompts in the conversation; it will then go back to that part of the conversation and regenerate, while removing the rest of the conversation after that point.

Such a chat UI is not usable for me without this feature.

* The feedback button does nothing for me, just changes focus to Chrome.

* The LLaVA model tells me that it cannot generate images since it is a text-based AI model. My prompts were "Generate an image of ..."

wodow
6 replies
5h44m

> The main thing that makes ChatGPT's UI useful to me is the ability to change any of my prompts in the conversation; it will then go back to that part of the conversation and regenerate, while removing the rest of the conversation after that point.

Agreed, but what I would also really like (from this and ChatGPT) would be branching: take a conversation in two different ways from some point and retain the separate and shared history.

I'm not sure what the UI should be. Threads? (like mail or Usenet)

shanusmagnus
2 replies
4h4m

1000 upvotes for you. My brain can't compute why someone hasn't made this, along with embeddings-based search that doesn't suck.

FredPret
0 replies
3h26m

I bet UI and UX innovation will follow, but model quality is the most important thing.

If I were OpenAI, I would put 95% of resources into ChatGPT 5 and 5% into UX.

Once the dust settles, if humanity still exists, and human customers are still economically relevant, AI companies will shift more resources to UX.

ItsMattyG
2 replies
3h31m

ChatGPT does this. You just click an arrow and it will show you other branches.

ApolloFortyNine
1 replies
1h39m

I have ChatGPT 4, and I have no idea what arrow you are talking about. Could you be more specific? I see no arrow on any of my previous messages or current ones.

wodow
0 replies
1h3m

By George, ItsMattyG is right! After editing a question (with the "stylus"/pen icon), the revision number counter that appears (e.g. "1 / 2") has arrows next to it that allow forward and backward navigation through the new branches.

This was surprisingly undiscoverable. I wonder if it's documented. I couldn't find anything from a quick look at help.openai.com .

xyc
0 replies
11m

Thank you for the support and the valuable feedback! Sorry about the response time; I hadn't expected the incoming volume of requests.

* For changing a prompt in the middle of a conversation: I'll take a crack at it this week. It's at the top of my post-launch list.

* Feedback button: Thanks for reporting this. The button was supposed to open the default email client to email feedback@recurse.chat.

* LLaVA model: I'll add more documentation. You are right, LLaVA cannot generate images; it can only describe images (similar to GPT-4V). Image generation is not supported in the app. While I don't have immediate plans for it, check out these projects for local image generation:

- https://diffusionbee.com/

- https://github.com/comfyanonymous/ComfyUI

- https://github.com/AUTOMATIC1111/stable-diffusion-webui

pps
0 replies
5h30m

> The LLaVA model tells me that it cannot generate images since it is a text-based AI model.

That's because it can't generate images; it can only describe images provided by the user.

raajg
7 replies
16h57m

Looks promising, but after looking at the website I'm yearning to learn more about it! How does it compare to alternatives? What's the performance like? There isn't enough to push me to stop using ChatGPT and use this instead. Offline is good, but to get users at scale there has to be a compelling reason to shift. I don't think that offline capability alone is going to win over a significant number of users.

Another tip: I try out a new chat interface to LLMs almost every week, and they're free to use initially. There isn't a compelling reason for me to spend $10 from the get-go on a use case that I'm not sure about yet.

FloorEgg
4 replies
16h40m

Maybe this isn't for everyone, just the people who place a high value on privacy.

copperx
0 replies
10h55m

Are you implying Claude is an open source model?

ukuina
0 replies
16h19m

But how can I guarantee this app is private?

I'm assuming I cannot block internet access to the app because it needs to verify App Store entitlement.

giblfiz
0 replies
16h0m

I mean, ok, then how do you distinguish yourself from LM Studio (Free)

bradnickel
1 replies
14h28m

The compelling reason to shift to local/decentralized AI is that all of compute will soon be AI, and that means your entire existence will go into it. The question you should ask yourself is: do you want everything about you being handled by Sam Altman, Google, Microsoft, etc.? Do you want all of your compute dependent on them always being up, and do you want to trust their security team with your life? Do you want to still be using closed/centralized/hosted AI when truly open AI surpasses all of them in performance and capability? If you have children or family, do you want them putting their entire lives in the hands of those folks?

Decentralized AI will eventually become p2p and swarmed and then the true power of agents and collaboration will soar via AI.

Anyway, excuse the soap box, but there are zero valid reasons for supporting and paying centralized keepers of AI that rarely share, collaborate or give back to the community that made what they have possible.

gverrilla
0 replies
6h38m

> when truly open AI surpasses all of them in performance and capability

Is this true? I tried Llama last year and it was not very helpful. GPT-4 is already full of problems and I have to keep circumventing them, so using something less capable doesn't get me too excited.

devinprater
6 replies
13h21m

There's another one someone made for blind users like themselves and me, called VOLlama (they use a Mac, so VoiceOver + Llama). It's really good. I haven't tested many others for accessibility, but it has RAG and uses Ollama as a backend, so it works very well for me.

https://github.com/chigkim/VOLlama/

chown
3 replies
13h6m

It's very nice that something like that exists. I am the author of one of the similar apps [1] someone listed in a different thread. I was hoping I could get in touch with someone like you who could give me some feedback on how to make my app more accessible for users like you. I really want it to be an "LLM for all" kind of app, but despite my best efforts and intentions, I suck at it. Any chance of getting in touch with you and getting some feedback? Only if you want to and have time, no pressure at all.

[1] https://msty.app

devinprater
1 replies
12h38m

Sure, I'll probably join the discord tomorrow morning, but a few notes:

* For apps like this, using live regions to speak updates may be helpful. Either that or change the buttons, like from "download local AI" to "configuring." Maybe a live region would be best for that one since sighted people would probably be looking near the bottom for the status bar, but anyway...

* Using live regions for chats is pretty important, because otherwise we don't know when a message is ready to read, and it makes reading those messages much simpler. The user types the message, presses Enter, and the screen reader reads the message to them. So, making a live region, and then sending the finished message, or a finished part of a message, to that live region would be really helpful.

* Now on to the UI. At the top, we have "index /text-chat-sessions". I guess that should just say "chats"? Below that, we have a list, with a button saying the same thing. After that list with one item is a button that says "index /local-ai". That should probably just be "local AI". Afterwards, there is "index /settings", which should just be "settings." Then, there is an unlabeled button. I'm guessing this is styled to look like a menu bar, across the top of the window, so it'd be the item on the right side. Now, there's a button below that that says "New Chat^N". I, being a technical user, am pretty sure the "^N" means "Control + N", but almost no one else knows that. So, maybe change that text label. Between that and the Recent Chats menu button are two unlabeled buttons. I'm not sure why a region landmark was used for the recent chats list, but after the chat name ("hello" in this case), where I can rename the chat, there is an unlabeled button. The button after the model chooser is unlabeled as well. After the user input in the conversation, there are three unlabeled buttons. After the response, there is a menu button with (oh, that's cool) items to transform the response into bullets, a table, etc., but that menu button was unlabeled so I had to open it to see what's inside. After that, all other buttons, like for adding instructions to refine this message, are also unlabeled.

So, live regions for speaking chat messages and state changes like "loading" or "ready" or whatever (keep them short), and label controls, and you should be good to go.

Live regions: https://developer.mozilla.org/en-US/docs/Web/Accessibility/A...

chown
0 replies
8h2m

Wow! This is already very helpful and was the kind of feedback I was looking for. Thank you!

indit
0 replies
7h46m

Hi, I just use msty. Could it use an already downloaded gguf file?

karolist
1 replies
3h52m

Hey. I'm sorry about your condition. I feel I'm approaching blindness eventually. This is very random, but perhaps you could share any resources I could use to learn how to prepare for this, so I can continue using the web when/if it happens.

ggerganov
5 replies
9h10m

> Thanks to the amazing work of @ggerganov on llama.cpp which made this possible. If there is anything that you wish to exist in an ideal local AI app, I'd love to hear about it.

The app looks great! Likewise, if you have any requests or ideas for improving llama.cpp, please don't hesitate to open an issue / discussion in the repo

petargyurov
1 replies
9h0m

Did not expect to see the Georgi Gerganov here :) How is GGML going?

Поздрави! (Cheers!)

ggerganov
0 replies
7h56m

So far is going great! Good community, having fun. Many ideas to explore :-)

xyc
0 replies
8h48m

Oh wow it's the goat himself, love how your work has democratized AI. Thanks so much for the encouragement. I'm mostly a UI/app engineer, total beginner when it comes to llama.cpp, would love to learn more and help along the way.

titaniumtown
0 replies
3h26m

Wow I've been following your work for a while, incredible stuff! Keep up the hard work, I check llama.cpp's commits and PRs very frequently and always see something interesting in the works (the alternative quantization methods and Flash Attention have been interesting).

duckkg5
0 replies
5h12m

Nothing to add except that your work is tremendous

tartrate
4 replies
9h41m

> Full Text Search. Blazingly fast search over thousands of messages.

Natural language processing has come full circle and just reinvented Ctrl+F.

I had to double check that a regular '90s search function was actually the thing being advertised here, and sure enough, there is a gif demonstrating exactly that.

addandsubtract
1 replies
9h13m

Ctrl+F only gets you so far. It doesn't allow you to perform semantic searches, for example. If you don't happen to know a unique word (or set of words) to search for, you're out of luck.

Just the other day, I was able to find a song by typing the phonetic pronunciation (well, as best I could) into ChatGPT, and it knew which song I was talking about right away. No way a regular search engine would've helped me there.

danielovichdk
0 replies
6h35m

No. Your own data only gets you so far, and this is exactly the issue. No local model will make sense because the dataset it's given is so small compared to what you are referring to (ChatGPT).

It's useless locally.

davely
0 replies
41m

Yeah, I think the call-out here is specifically because the ChatGPT interface doesn't have a search feature (on web). Interestingly, on their iOS app, you can search.

I often find myself opening the app on my phone if I want to find a previous conversation, even if I'm at my desk.

behnamoh
0 replies
3h57m

and yet ChatGPT doesn't support it.

jiriro
4 replies
9h2m

Out of curiosity – how is this app built?:-)

There is a demo clip with a vertical scroll bar which does not fade out as it would do in a native mac app:)

rangera
1 replies
8h27m

Scroll bars don't fade out if you're using a mouse (as opposed to just a trackpad) or if you've set Mac OS Settings > Appearance > Show scroll bars to "Always".

jiriro
0 replies
7h10m

I see! I've not used a mouse on a Mac :-o

Anyway, the UI doesn't look Mac-native. I'm interested in what it is :-)

Alifatisk
1 replies
6h31m

Yeah I am curious what the app is built with. I saw someone mention it's using Electron, so that's a start.

bradnickel
4 replies
14h55m

Love this! Just purchased. I am constantly harping on decentralized AI and love seeing power in simplicity.

Are you on Twitter, Threads, Farcaster? I'd like to tag you when I add you to my decentralized AI threads.

bradnickel
1 replies
14h33m

Found your Twitter account in a previous post. Just tagged you.

xyc
0 replies
13h6m

Awesome, thanks for the tag!

xyc
0 replies
14h39m

Thank you so much for the support! Simplicity is power indeed. I'm on twitter: https://x.com/chxy

hanniabu
0 replies
2h13m

What's your farcaster?

cooper_ganglia
3 replies
13h32m

I read the website for 30 seconds and instantly bought it.

It's clean, easy to use, and works really well! Easy local server hosting was cool, too. I've used the other LLM apps, and this feels like those, but simplified. It just feels good to use. I like it a lot!

I'm gonna test drive it for a while, and if I keep using it regularly, I'll definitely be sending in some feedback. Other users have made a lot of really great recommendations already, I'm excited to see how this evolves!

xyc
2 replies
12h59m

Thanks so much for the kind words and giving it a spin!

Feel free to send feedback, issues, feature suggestion as you use it more, I'm all ears. My twitter DM is also open: https://x.com/chxy.

madduci
1 replies
12h53m

Any chance to see it available on other operating systems as well?

xyc
0 replies
11h55m

Unfortunately not now. If you are interested in email updates: https://tally.so/r/wzDvLM

code51
3 replies
10h5m

Thank you for the work.

Please take this in a nice way: I can't see why I would use this over ChatbotUI+Ollama https://github.com/mckaywrigley/chatbot-ui

It seems the only advantage is having it as a native macOS app, and the only real distinction is maybe the fast import and search; I've yet to try that though.

ChatbotUI (and other similar stuff) are cross-platform, customizable, private, debuggable. I'm easily able to see what it's trying to do.

ayhoung
1 replies
9h26m

Not everyone is a dev

Alifatisk
0 replies
6h33m

HN users keep forgetting that

vood
0 replies
6h21m

Thanks for sharing ChatbotUI. While I'm not an author, I use it extensively and contribute to it. Thanks to the permissive license, I could offer ChatbotUI as a hosted solution with our API keys. https://labs.writingmate.ai.

android521
3 replies
15h19m

How big is the local model? What is the Mac spec requirement? I don't want to download it and find out it won't work on my computer. It seems like the first question everyone would ask, and it should be addressed on the website.

visarga
1 replies
13h9m

It uses ollama which is based on llama.cpp, and adds a model library with dozens of models in all quant sizes.

xyc
0 replies
12h43m

No, this doesn't use Ollama; it's just based on llama.cpp.

xyc
0 replies
12h44m

Appreciate the feedback! It works on mac with Apple Silicon only. I'll put some system requirements on the website.

3abiton
3 replies
16h41m

How different is this compared to Jan.ai for example?

xyc
2 replies
16h23m

As I understand it, jan.ai is more focused on enterprise / platform, while I'd see recursechat going in a direction more like "obsidian.md", but as your personal AI.

gexla
1 replies
15h56m

Obsidian has add-ons which do much of this.

internetter
0 replies
15h52m

People are treating Obsidian like it's the next Emacs

toomuchtodo
2 replies
14h4m

Hey! This is awesome! How hard would it be to plug it into something like Raindrop.io (bookmark manager) to train on all bookmarks collected?

xyc
1 replies
12h51m

I haven't tried Raindrop.io; looks neat! Saw some other posts mentioning bookmarks as well. I'll keep this in mind, but will have to try it out first to find out.

toomuchtodo
0 replies
3h51m

Appreciate it, thank you.

sen
2 replies
14h11m

This is awesome. I currently use Ollama with OpenWebUI but am a big fan of native apps so this is right up my alley.

xyc
0 replies
12h46m

Thank you!

rkuodys
2 replies
11h21m

Honest question: can it be used for programming? Or can anyone recommend a local-first development LLM which would take in a whole project (Python / Angular) and write code based on the full repo, not only the active window as with Copilot or JetBrains AI?

arzke
0 replies
7h57m

Have you tried using Copilot's @workspace command in the chat?

_ink_
0 replies
7h47m

Check out the Continue dev plugin (available for VS Code and JetBrains). You can attach it to OpenAI or local models, and it can consider files in your codebase. It has a @Codebase keyword, but so far I get better results by specifically pointing it to the needed files.

rexreed
2 replies
15h46m

What are the macOS and hardware requirements? How does it perform on a slightly older, lower-powered Mac? I wish I could test this to see how it would perform, and while it's only $10, I don't want to spend that just to realize it won't work on my older, underpowered Mac mini.

xyc
1 replies
12h53m

Good question, I'll put some system requirements on the website. It only supports mac with Apple Silicon now, if that's helpful.

pantulis
0 replies
10h7m

Instant buy, great work and the price point is exactly right. Good luck!

rbtprograms
2 replies
16h48m

Looks great! Does it support different sized models, i.e. can I run llama 70B and 7B, and is there a way to specify which model to chat with? Are there plans to allow users to ingest their own models through this UI?

xyc
1 replies
16h8m

If you have a gguf file you can link it. For ingesting new models - I'm thinking about adding some CRUD UIs to it, but I'd like to keep a very small set of default models.

rbtprograms
0 replies
11h13m

Thanks, it's a great project.

pentagrama
2 replies
15h38m

Congrats! Plans on Windows support?

theolivenbaum
0 replies
12h26m

You can try https://curiosity.ai, supports Windows and macOS

xyc
1 replies
14h45m

Wow, I did not expect at all that this would end up on the front page. Thank you for all the enthusiasm; I'll try to get to more questions later today, but if there's something I missed, my X/Twitter DM is open: https://x.com/chxy

castles
0 replies
6h1m

It seems "local" is all you need :)

tkgally
1 replies
16h55m

For an app like this, I would really like a spoken interface. Any possibility of adding text-to-speech and speech-to-text so that users can not only type but also talk with it?

xyc
0 replies
12h22m

Yes, I wish it could talk. It's behind other priorities, but I might try something experimental.

jedberg
1 replies
1h38m

The app is great but honestly I'm impressed with the home page! Can you go into more details on how you made the home page? What did you use to make the screenshots, and are you using any tools to generate the HTML/CSS/etc?

mvdtnz
0 replies
1h22m

Seriously? It grinds my phone to a near halt just trying to scroll from top to bottom. Worse in Firefox but still pretty bad in chrome.

giblfiz
1 replies
15h20m

So there are a few questions that leap out at me:

* What are you using for image generation? Is that local as well (Stable Diffusion)? Does it have integrated prompt generation?

* You mention the ability to import ChatGPT history; are you able to import other documents?

* How many "agent"-style capabilities does it have? Can it search the web? Use other APIs? Prompt itself?

* Does it have a plugin framework? You mention that it is "customizable", but that can mean almost anything.

* What is the license? What assurances do users have that their usage is private? I mean, we all know how many "local" apps exfiltrate a ton of data.

hanniabu
0 replies
2h16m

> What are you using for image generation?

It doesn't look like it supports image generation unfortunately. If it did then I would definitely adopt this as my daily driver.

SkepticMystic
1 replies
13h2m

I've found great utility with `llm` https://llm.datasette.io, a CLI to interact with LLMs. It has plugins for remote and local models.

xyc
0 replies
12h6m

Good to know. I've learned lots of things from Simon Willison's blog (he's Datasette's author), so I can't imagine llm not being useful.

zzz999
0 replies
8h29m

Any censorship?

(Can't try MacOS Apps)

xyst
0 replies
1h56m

I'll give it a shot. Appreciate the effort on keeping it local.

surrTurr
0 replies
10h13m

any plans on supporting ollama integration?

stuckkeys
0 replies
6h12m

No iPhone app? I'm assuming it would connect to a local server, or are you actually downloading the LLMs locally to the device?

maxfurman
0 replies
1h55m

Won't work on my Intel Macbook :-(

matthewmcg
0 replies
46m

The headline had me thinking you had a DIY self-driving car for a moment there. Didn't initially register that this was just the common metaphor. Looks like a great app.

machiaweliczny
0 replies
3h30m

Will it work fine on Macbook Air M2 16GB ?

konschubert
0 replies
6h19m

I want something that starts as a simple manager for my reminders, something that tells me what to do next. And then, as features are being added, grows into a full-blown personal assistant that can book flights for me.

howmayiannoyyou
0 replies
4h44m

Without Apple Shortcuts support I can't pay for this. I get pretty much the same experience from GPT4All. Hoping you add support for a CLI, Shortcuts, or something along those lines.

gnomodromo
0 replies
3h32m

I wonder how much space it takes.

geniium
0 replies
11h22m

I am very glad to see that kind of app. Well done!

ferfumarma
0 replies
4h44m

Is the haiku example a real Haiku?

I think it gives you 4, 7, and 9 syllables in the lines, rather than the usual 5-7-5.

I bet you can coax it to give you a better example, if you tinker a bit.

famahar
0 replies
8h2m

Will this work on an M1 MacBook Air? Looking for an offline solution like this but wary of hardware requirements.

boringg
0 replies
3h41m

This looks interesting -- might implement it. I'm curious how to ensure that it is local only?

bberenberg
0 replies
5h52m

There are a lot of tools listed in this thread, but I am not seeing the thing I want which is:

- Ability to use local and OpenAI models (ideally it has defaults for common local models)

- Chat UX

- Where I can point it to my JS/TS codebase

- It indexes the whole thing including dependencies for RAG. Ideally indexing has some form of awareness of model context length.

- I can use it for codegen / debugging.

The closest I have found has been aider, but it's python and I get into general python hell every time I try and run it.

Would appreciate a suggestion.

911e
0 replies
9h46m

Not a bit of open code, while I'm 100% sure they use some that requires it. If you're using AI + your data without insight into how it's used, you're a fool. 2 cents.