Show HN: Open-source macOS AI copilot using vision and voice

thomashop
17 replies
1d2h

Just used it with the digital audio workstation Ableton Live. It is amazing! Its tips were spot-on.

I can see how much time it will save me when I'm working with software or a domain I don't know very well.

Here is the video of my interaction: https://www.youtube.com/watch?v=ikVdjom5t0E&feature=youtu.be

These negative comments are weird. Did people actually try it?

mikey_p
6 replies
22h43m

Is it just me or is it incredibly useless?

"Here's a list of effects. Here's a list of things that make a song. Is it good? Yes. What about my drum effects? Yes here's the name of the two effects you are using on your drum channel"

None of this is really helpful and I can't get over how much it sounds like Eliza.

urbandw311er
3 replies
18h5m

Yeah I thought the same. Ultra generic advice and no evidence it has actually parsed anything unique or useful from the user’s actual composition.

thomashop
2 replies
14h19m

I made another one: https://www.youtube.com/watch?v=zyMmurtCkHI

In the one I posted I was just so amazed at how well it worked that I didn't really try anything useful. In this video you can see it giving me quite good advice on how to make a bassline dubby and how to carve frequencies out of the kick drum to make space for the bass.

It also looks at spectrograms and gives feedback / takes them into account. I'm pretty amazed.

ralfelfving
1 replies
9h21m

Did you change the GPT Vision system prompt at all? I wonder if changing it to say you want help specifically with Ableton, and maybe adding some guidelines around the kind of help you want, could make it better?
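Something like this, as a rough sketch (using the OpenAI Node SDK; the prompt text here is just illustrative, not what ships in the repo):

    import OpenAI from "openai";

    const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment

    // Made-up Ableton-specific system prompt, not the repo's actual one:
    const systemPrompt =
      "You are helping a music producer inside Ableton Live. " +
      "The screenshot shows their current session. Refer to the specific " +
      "devices, tracks and parameter values you can see, and prefer " +
      "concrete mixing and sound-design advice over generic tips.";

    async function askAbleton(base64Png: string, question: string) {
      const res = await openai.chat.completions.create({
        model: "gpt-4-vision-preview",
        max_tokens: 500,
        messages: [
          { role: "system", content: systemPrompt },
          {
            role: "user",
            content: [
              { type: "text", text: question },
              { type: "image_url", image_url: { url: "data:image/png;base64," + base64Png } },
            ],
          },
        ],
      });
      return res.choices[0].message.content;
    }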

thomashop
0 replies
5h57m

No, but I found it good enough as it is.

thomashop
0 replies
22h37m

I made that video right at the start. Since then I've asked it, for example, what kind of compression parameters would fit a certain track, and it explained how to find an expert function that I would otherwise have had to consult the manual for.

thomashop
0 replies
14h21m

I just made a video where I test it with a proper use case. It helps me find effects to make a bassline more dubby and helps carve out frequencies in the kick drum to make space for the bass.

https://www.youtube.com/watch?v=zyMmurtCkHI

ralfelfving
4 replies
1d

So glad when I saw this, thanks for sharing! Music production in Ableton was exactly the spark that lit this idea in my head the other week. I tried to explain to a friend who doesn't use GPT much that with Vision, you can speed up your music production and learn how to use advanced tools like Ableton more quickly. He didn't believe me. So I grabbed an Ableton screenshot off Google and used ChatGPT -- then I felt there had to be a better way, realized that I have my own use-cases, and it all evolved into this.

I sent him your video, hopefully he'll believe me now :)

thomashop
3 replies
23h44m

You may be interested in two proofs of concept I've been working on. I work with generative AI and music at a company.

MidiJourney: ChatGPT integrated into Ableton Live to create MIDI clips from prompts. https://github.com/korus-labs/MIDIjourney

I have some work on a branch that makes ChatGPT a lot better at generating symbolic music (a better prompt and music notation).

LayerMosaic allows you to layer MusicGen text-to-music loops with our company's music library. https://layermosaic.pixelynx-ai.com/

ralfelfving
0 replies
23h26m

Oooh. Yes, very interested in MusicGen. I played with MusicGen for the first time the other week and created a little script that uses GPT to create the prompt and params, which are stored to a text file along with the output. I let it loop for a few hours to get a few hundred output files, which let me learn a bit more about what kinds of prompts gave reasonable output (it was all bad, lol!)

ralfelfving
0 replies
23h17m

Oh, LayerMosaic is dope. I'm not entirely sure how it works, but the sounds coming out of it are good -- so you have me intrigued! Can I read more about it somewhere? I might have a crazy idea I'd like to use this for.

ralfelfving
0 replies
23h24m

My brain read midjourney until I clicked on the GH link. What a great name, MIDIjourney!

pelorat
4 replies
1d1h

I mean it does send a screenshot of your screen off to a 3rd party, and that screenshot will most likely be used in future AI training sets.

So... beware when you use it.

thomashop
2 replies
1d1h

Beware of it seeing a screenshot of my music set? OpenAI will start copying my song structure?

You can turn it on and off. Not necessary to turn it on when editing confidential documents.

You never enable screen-sharing in videoconferencing software?

aaronscott
1 replies
1d1h

I completely agree. A huge business with a singular focus isn’t going to pivot into the music business (or any of the myriad use cases the general public throws at it). And if they did use someone’s info, it’s more likely an unethical employee than a genuine business tactic.

Besides, the parent program uses the API, which allows opting out of training or retaining that data.

mecsred
0 replies
1d

Yes this makes perfect sense. As we know, businesses definitely do not treat data as a commodity and engage in selling/buying data sets on the open market as a "genuine business tactic". Therefore, since the company in question doesn't have a clear business case for data collection currently, we can be sure this data will never be used against our interests by any company.

zwily
0 replies
1d1h

OpenAI claims that data sent via the API (as opposed to chatGPT) will not be used in training. Whether or not you believe them is a separate question, but that's the claim.

ProfessorZoom
14 replies
1d3h

e-e-e-electron... for this..

ralfelfving
11 replies
1d3h

I don't know, man. I'm new to development; it's what I chose, and I probably don't know any better. Tell me, what would you have chosen instead?

xNeil
2 replies
1d3h

electron's a really nice option, especially for people who aren't interested in porting their apps or spending too much time on development

this seems to be a macOS-specific app - if you want better performance and more integration with the OS, i'd recommend using swift

ralfelfving
1 replies
1d3h

Time to learn Swift in the next project then! Thank you for the deets.

Filligree
0 replies
1d3h

The good news is you already have a tool to help you with the inevitable Xcode issues. *grin*

lolinder
2 replies
1d3h

Don't mind them—there's a certain subset of HN that is upset that web tech has taken over the world. There are some legitimate gripes about the performance of some electron apps, but with some people those have turned into compulsive shallow dismissals of any web app that they believe could have been native.

There's nothing wrong with using web tech to build things! It's often easier, the documentation is more comprehensive, and if you ever want to make it cross-platform, Electron makes it trivial.

If you were working for a company it might be worth considering the trade-offs—do you need to support Macs with less RAM?—but for a side project that's for yourself and maybe some friends, just do what works for you!

ralfelfving
1 replies
1d3h

Thank you for the explanation! At the end of the day, I'm a newbie and I'm in it to learn something new with each project. Next time I'll probably try my hand at a different framework.

millzlane
0 replies
1d1h

I just watched a video about building a startup. One of the key points was to use what you know to get to an MVP. Don't fret over which language or library to use (unless the goal is to learn a new framework). Just get building. I may not be a pro dev, but there is one thing I have learned over the years from hanging out amongst all of you: whether you use emacs or vim, tabs vs spaces, or Java vs Python, the end product is what matters at the end of the day. Code can always be refactored.

Good luck in your development journey.

jdamon96
1 replies
1d3h

ignore the naysayers; nice job building out your idea

ralfelfving
0 replies
1d3h

Thank you! I've got pretty thick skin, but there's always a bit of insecurity involved in doing something for the first time -- first public GH repo and Show HN :D

programmarchy
0 replies
21h9m

My two cents: I think you made a good, practical choice. If you're happy with Electron, I'd say stick with it, especially if you have cross-platform plans in the future.

If you want to niche down into a more macOS specific app, you could learn AppKit and SwiftUI and build a fully native macOS app.

If you want to stay cross-platform but you're not happy with Electron, it might be worth checking out Tauri. You still build the UI with web tech and get a JavaScript API to the host system, but without packaging a V8 runtime with your app bundle. Instead it uses the OS's built-in webview, e.g. WebKit on macOS, so it significantly reduces the download size of your app.

In terms of developing this into a product, on one hand it seems like deep integration with the host OS is the best way to build a "moat", but then again, Apple could release their own version and quickly blow a product like that out of the water.

guytv
0 replies
1d2h

What's important is to get a product out there. Nobody cares what stack you use, just us geeks. Don't get discouraged, you did well :)

airstrike
0 replies
1d2h

I think the parent comment is a shallow dismissal, but since you're asking, I would have built it in SwiftUI.

atraac
1 replies
1d3h

Ah yes, cause what's better than building a real, working MVP? Learning Rust for half a year just so you can 'optimize' the f out of an app that does two REST calls.

wtallis
0 replies
23h12m

To be fair, this does sound like the kind of app that would benefit from being able to launch instantly, and potentially registering with the OS as a service in a way that cross-platform frameworks like Electron cannot easily accommodate. But Rust would not be the easiest choice to avoid those limitations.

swiftcoder
6 replies
1d3h

Worth mentioning that if you are in a corporate environment, running a service that sends arbitrary desktop screenshots to a 3rd party cloud service is going to run afoul of pretty much every security and regulatory control in existence

ralfelfving
1 replies
1d3h

I assume that anyone capable of cloning the app, starting it on their machine, and obtaining + adding an OpenAI API key understands that some data is being sent offsite -- and will be aware of their corporate policies. I think that's a fair assumption.

greenie_beans
0 replies
1d2h

that's a fair assumption. feels like swiftcoder is just trying to gotcha

thelittleone
0 replies
1d3h

The control for that is that endpoints should be locked down to prevent installation of non-approved apps. Any org under regulatory controls would have some variation of that. It's safer to assume an org's users are stupid or nefarious and build defences accordingly.

isoprophlex
0 replies
1d2h

You're telling me... the cloud... is other people's computers?!

brookst
0 replies
1d2h

True, but also true of other screen capture utilities that send data to the cloud. Your PSA is true, but hardly unique to this little utility. And probably not surprising to the intended audience.

abrichr
0 replies
1d

This is exactly why in https://github.com/OpenAdaptAI/OpenAdapt we have implemented three separate PII scrubbing providers.

Congrats to the op on shipping!

qainsights
5 replies
1d

Great. I created `kel` for terminal users. Please check it out at https://github.com/qainsights/kel

dave1010uk
3 replies
23h31m

Very cool! Have you had much luck with Llama models?

I made Clipea, which is similar but has special integration with zsh.

https://github.com/dave1010/clipea

qainsights
1 replies
19h14m

Clipea is cool.

dave1010uk
0 replies
5h15m

Thanks!

qainsights
0 replies
19h15m

Yes, I used Langchain for Llama.

causal
0 replies
1d

Chatblade is another good one: https://github.com/npiv/chatblade

kssreeram
5 replies
17h14m

People reading this should check out Iris[1]. I’ve been using it for about a month, and it’s the best macOS GPT client I’ve found.

[1]: https://iris.fun/

LeoPanthera
3 replies
17h7m

Oof, $20/month is a lot, when I already have my own OpenAI API key.

kssreeram
2 replies
17h1m

I guess having to enter the API key is not a great user experience for regular people who aren’t developers.

immy
1 replies
16h54m

With ChatGPT Plus already at $20/mo, and this not replacing ChatGPT, it's a hard sell for price-conscious consumers. But I bet there are plenty who don't care.

kssreeram
0 replies
16h50m

I can see that.

For me, it did replace ChatGPT for one reason: The convenience of a lightweight Iris window being just a hotkey away.

mdrzn
0 replies
8h2m

I wish there was something like this for Windows!

jondwillis
5 replies
1d3h

You should add an option for streaming text as the response instead of TTS. And maybe also text in place of the voice command. I have been tire-kicking a similar kind of copilot for a while, hit me up on Discord @jonwilldoit
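Streaming is close to a one-flag change in the chat completions call; a quick sketch with the OpenAI Node SDK v4:

    import OpenAI from "openai";

    const openai = new OpenAI();

    async function streamAnswer(question: string) {
      const stream = await openai.chat.completions.create({
        model: "gpt-4-1106-preview",
        stream: true, // tokens arrive as they're generated instead of all at once
        messages: [{ role: "user", content: question }],
      });
      for await (const chunk of stream) {
        process.stdout.write(chunk.choices[0]?.delta?.content ?? "");
      }
    }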

tomComb
2 replies
1d2h

text in place of the voice command as well

That would be great for people with Mac mini who don't have a mic.

ralfelfving
1 replies
1d

Hmmm... what if I added functionality that uses the webcam to read your lips?

Just kidding. Text seems to be the most requested addition, and it wasn't on my own list :) Will see if I add it; it should be fairly easy to make it configurable and render a text input window with a button instead of triggering the microphone.

Won't make any promises, but might do it.
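If I do, it would probably look roughly like this (the config flag, file name, and helper are made up, not from the repo):

    import { BrowserWindow, ipcMain } from "electron";

    const config = { inputMode: "text" as "voice" | "text" }; // hypothetical setting

    async function getQuestion(): Promise<string> {
      if (config.inputMode === "voice") {
        return recordAndTranscribe(); // the existing mic + Whisper path
      }
      // Text mode: a small always-on-top window with one input field.
      return new Promise((resolve) => {
        const win = new BrowserWindow({ width: 420, height: 80, alwaysOnTop: true });
        win.loadFile("text-input.html"); // renders an <input> and a submit button
        // (the page would send this via ipcRenderer; preload setup omitted for brevity)
        ipcMain.once("question-submitted", (_event, text: string) => {
          win.close();
          resolve(text);
        });
      });
    }

    async function recordAndTranscribe(): Promise<string> {
      /* stand-in for the app's current microphone flow */
      return "";
    }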

danaris
0 replies
17h9m

People with a Mac mini may not have a webcam, either!

ralfelfving
1 replies
1d3h

There are definitely some improvements to make in shuttling the data between interface<->API; all of that was done in a few hours on day 1 and there are a few things I decided to fix later.

I prefer speaking over typing, and I sit alone, so probably won't add a text input anytime soon. But I'll hit you up on Discord in a bit and share notes.

jondwillis
0 replies
1d2h

Yeah, just some features I could see adding value and not being too hard to implement :)

netika
3 replies
1d3h

Such a shame it uses the Vision API, i.e. it can't be replaced by some random self-hosted LLM.

freedomben
1 replies
1d1h

Actually it's open source, so it can be replaced by some random self-hosted LLM

ralfelfving
0 replies
1d3h

It can be replaced with a self-hosted LLM; simply change the code where the Vision API is being called. That's true for all of the API calls in the app.
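If the local server speaks the OpenAI API (LM Studio and llamafile both do), it can even be just a base URL change rather than new code -- the env var and port here are assumptions, not something the repo reads today:

    import OpenAI from "openai";

    // Point the existing client at a local OpenAI-compatible server instead.
    const client = new OpenAI({
      baseURL: process.env.LOCAL_LLM_URL ?? "http://localhost:8080/v1",
      apiKey: "not-needed-locally", // most local servers ignore the key
    });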

havkom
3 replies
1d3h

A lot of negative comments here. However, I liked it!

Perfect Show HN, and a great start on a product if the author wants to take it further.

ralfelfving
2 replies
1d3h

Thank you, it's my first GH project & Show HN.. and.. yeah.. learning here :D

jonplackett
1 replies
1d2h

Also think this is fun.

In general I’m pretty excited about LLM as interface and what that is going to mean going forward.

I think our kids are going to think mice and keyboards are hilariously primitive.

ralfelfving
0 replies
1d

Before we know it, even voice might be obsolete when we can just think :) But maybe at that point even thinking becomes obsolete, because the AIs are doing all the thinking for us?!

e28eta
3 replies
1d1h

Did you find that calling it “OSX” in the prompt worked better than macOS? Or was that just an early choice that you didn’t spend much time on?

I was skimming through the video you posted, and was curious.

https://www.youtube.com/watch?v=1IdCWqTZLyA&t=32s

code link: https://github.com/elfvingralf/macOSpilot-ai-assistant/blob/...

ralfelfving
1 replies
1d1h

No, this is an oversight by me. To be completely honest, up until the other day I thought it was still called OSX. So the project was literally called cOSXpilot, but at some point I double-checked and realized it's been called macOS for many years. Updated the project, but apparently not the code :)

I suspect OSX vs macOS has marginal impact on the outcome :)

e28eta
0 replies
1d1h

Haha, makes perfect sense, thanks for the reply!

hot_gril
0 replies
21h28m

Heh. I remember calling it Mac OS back in the day and getting corrected that it's actually OS X, as in "OS ten," and hasn't been called Mac OS since Mac OS 9. Glad Apple finally saw it my way (except it's cased macOS).

zmmmmm
2 replies
21h8m

I've been wanting to build something like this by integrating into the terminal itself. Seems very straightforward and avoids the screenshotting. You would just type a comment in the right format and it would recognise it:

    $ ls 
    a.txt b.txt c.txt

    $ # AI: concatenate these files and sort the result on the third column
    $ #....
    $ # cat a.txt b.txt c.txt | sort -k 3
This already works brilliantly by just pasting into CodeLlama, so it's purely the terminal integration that's missing. All I need is the rest of life to stop being so annoyingly busy.

paulmedwards
1 replies
20h35m

I wrote a simple command line app to let me quickly ask a question in the terminal - https://github.com/edwardsp/qq. It outputs the command I need and puts it in the paste buffer. I use it all the time now, e.g.

    $ qq concatenate all files in the current directory and sort the result on the third column
    cat * | sort -k3

zmmmmm
0 replies
20h27m

yep absolutely - have seen a few of those. And how well they work is what inspires me to want the next parts, which are (a) send the surrounding lines and output as context - notice above I can ask it about "these files" (b) automatically add the result to terminal history so I can avoid copy/paste if I want to run it. I think this could make these things absolutely fluid, almost like autocomplete (another crazy idea is to actually tie it into bash-completion so when you press tab it does the above).

CodeLlama with GPU acceleration on a Mac M1 is almost instant in response; it's really compelling.

smcleod
2 replies
20h32m

Nice project, any plans to make it work with local LLMs rather than "open"AI?

ralfelfving
0 replies
20h2m

Thanks. Had no plans, but might give it a try at some point. For me, personally, using OpenAI for this isn't an issue.

hmottestad
0 replies
20h0m

I think that LM Studio has an OpenAI "compliant" API, so if there is something similar that supports vision+text then it would be easy enough to make the base URL configurable and then point it to localhost.

Do you know of a simple setup that I can run locally with support for both images and text?

qirpi
2 replies
21h24m

Awesome! I love it! I was just about to sign up for ChatGPT Plus, but maybe I will pay for the API instead. So much good stuff coming out daily.

How does the pricing per message + reply end up in practice? (If my calculations are right, it shouldn't be too bad, but sounds a bit too good to be true)

ralfelfving
1 replies
21h21m

I have a hard time saying how much this particular application costs to run, because I use the Voice+Vision APIs for so many different projects on a near-daily basis and haven't implemented a prompt cost estimator.

But I also pay for ChatGPT Plus, and it's sooo worth it to me.

If you'd like to skip Plus and use something else, I don't think my project is the right one. I'd STRONGLY suggest you check out TypingMind, the best wrapper I've found: https://www.typingmind.com/

qirpi
0 replies
20h59m

Wow, thanks for sharing that link, I've been looking for something like this :)

poorman
2 replies
23h23m

Currently imagining my productivity while waiting 10 seconds for the results of the `ls` command.

ralfelfving
1 replies
23h11m

It's a basic demo to show people how it works. I think you can imagine many other examples where it'll save you a lot of time.

hot_gril
0 replies
21h32m

The demo on Twitter is a lot cooler, partially because you scroll to show the AI what the page has. Maybe there's a more impressive demo to put on the GH too?

lordswork
2 replies
1d

This looks very cool. Does anyone know of something similar for Windows? (or does OP intend to extend support to Windows?)

ralfelfving
1 replies
1d

Hey, OP here. I don't have a Windows machine so have not been able to confirm if it works, and probably won't be able to develop/test for it either -- sorry! :/

I suspect you should be able to take my code and make it work with only a few tweaks tho; there shouldn't be much about it that is macOS-only.

coolspot
0 replies
23h14m

For testing/development, you can download a free Windows VM here: https://developer.microsoft.com/en-us/windows/downloads/virt...

hackncheese
2 replies
22h42m

Love it! Will definitely use this when a quick screenshot will help specify what I am confused about. Is there a way to hide the window when I am not using it? i.e. I hit cmd+shift+' and it shows the window, then when the response finishes reading, it hides again?

ralfelfving
1 replies
21h37m

There's a way for sure, it's just not implemented. Allowing for more configurability of the window(s) is on my list, because it annoys me too! :)

hackncheese
0 replies
21h28m

Annoyance Driven Development™

I_am_tiberius
2 replies
1d2h

I would love to have something like this but using an open source model and without any network requests.

trenchgun
0 replies
1d1h

Probably in about three months.

dave1010uk
0 replies
23h19m

LLaVA, Whisper and a few bash scripts should be able to do it (rough sketch after the steps below). I don't know how helpful the model is with screenshots though.

1. Download LLaVA from https://github.com/Mozilla-Ocho/llamafile

2. Run Whisper locally for speech to text

3. Save screenshots and send to the model, with a script like https://til.dave.engineer/openai/gpt-4-vision/
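A rough sketch of step 3 against llamafile's local server (port, paths, and the request shape are assumptions; depending on the server version you may need its native /completion endpoint with image_data instead of the OpenAI-style one):

    import { execSync } from "node:child_process";
    import { readFileSync } from "node:fs";

    async function askLlava(question: string): Promise<string> {
      execSync("screencapture -x /tmp/shot.png"); // macOS screenshot, no shutter sound
      const image = readFileSync("/tmp/shot.png").toString("base64");
      const res = await fetch("http://localhost:8080/v1/chat/completions", {
        method: "POST",
        headers: { "Content-Type": "application/json" },
        body: JSON.stringify({
          messages: [{
            role: "user",
            content: [
              { type: "text", text: question },
              { type: "image_url", image_url: { url: "data:image/png;base64," + image } },
            ],
          }],
        }),
      });
      const json = await res.json();
      return json.choices[0].message.content;
    }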

ukuina
1 replies
1d3h

This is very cool! Thank you for working on it and sharing it with us.

ralfelfving
0 replies
1d3h

Thank you for checking it out! <3

stephenblum
1 replies
1d

You made real-life Clippy for the Mac! This would be great for other Mac apps too. Add context from the currently running apps.

ralfelfving
0 replies
1d

It should work for any macOS app. It just takes a screenshot of the currently active window; you can even append the application name if you'd like.

spullara
1 replies
21h16m

Did you not find the built-in voice-to-text and text-to-speech APIs to be sufficient?

ralfelfving
0 replies
20h2m

Didn't even think of them to be honest.

satchlj
1 replies
1d2h

It's not working for me; I get a "Too many requests" HTTP error.

ralfelfving
0 replies
1d

Hmm... OpenAI bunches a few things into that error. IIRC this could be because you're out of credits / don't have a valid payment method on file, but it could also be that you're hitting rate limits. The Vision API could be the culprit; while in beta you can only call it X times per day (X varies by account).

Make the console.logs for the three API calls a bit more verbose to find out which call is causing this, and whether there's more info in the error body.
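Something along these lines (the call is a placeholder, but the SDK's error fields are real):

    import OpenAI from "openai";

    const openai = new OpenAI();

    async function probe(): Promise<void> {
      try {
        const res = await openai.chat.completions.create({
          model: "gpt-4-vision-preview",
          max_tokens: 10,
          messages: [{ role: "user", content: "ping" }],
        });
        console.log("vision ok:", res.usage);
      } catch (err: any) {
        // status/code/message tell "out of credits" apart from "rate limited"
        console.error("status:", err.status, "code:", err.code);
        console.error("message:", err.message);
      }
    }

    probe();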

rchaves
1 replies
8h27m

Hey, I was working on something to allow GPT-V to actually do stuff on the screen, click around and type. I tested it on my Mac and it's working pretty well. Do you think it would be cool to integrate? https://github.com/rogeriochaves/driver

ralfelfving
0 replies
5h29m

Yes. I think you commented this somewhere else, and I like it. I was considering doing something similar to have it execute keyboard commands, but decided it would have to wait for a future version. I think click + type + other actions would be powerful, especially if it can do them fast and accurately. Then it's less about "How do I do X?" and more "Can you do X for me?".

qup
1 replies
21h45m

I have a tangential question: my dad is old. I would love for him to have this feature, or any voice access to an LLM, via an easy-to-press external button. Kind of like the big "easy button" from Staples. Is there anything like that, that can be made to trigger a keypress perhaps?

ralfelfving
0 replies
21h34m

I personally have no experience with configuring or triggering keyboard shortcuts beyond what I learned and implemented in this project. But with that said, I'm very confident that what you're describing is not only possible but fairly easy.

quinncom
1 replies
22h54m

I’d love to see a version of this that uses text input/output instead of voice. I often have someone sleeping in the room with me and don’t want to speak.

ralfelfving
0 replies
22h42m

You're not the first to request it. Might add it, can't promise tho.

pyryt
1 replies
1d4h

Do you have use case demo videos somewhere? Would be great to see this in action

ralfelfving
0 replies
1d4h

There's one at 00:30 in this YouTube video (timestamped the link): https://www.youtube.com/watch?v=1IdCWqTZLyA&t=32s

mdrzn
1 replies
8h0m

Very cool, would love to have a Windows version of this.

ralfelfving
0 replies
5h28m

I've not tried this on Windows, but it might actually work if you run the packager. Try it. If it doesn't work, there shouldn't be too much that is macOS-specific -- so you should be able to tweak the underlying code to work with Windows with fairly few changes.

knowsuchagency
1 replies
1d2h

This is brilliant!

ralfelfving
0 replies
1d1h

Glad you liked it!

jamesmurdza
1 replies
23h22m

Have you thought about integrating the macOS accessibility API for either reading text or performing actions?

ralfelfving
0 replies
22h55m

No, my thought process never really stretched outside of what I built. I had this particular idea, then sat down to build it. I had some idea of getting OpenAI to respond with keyboard shortcuts that the application could execute.

E.g. in Photoshop: "How do I merge all layers" --> "To merge all layers you can use the keyboard shortcut Shift + command + E"

If you can get that response in JSON, you could prompt the user if they want to take the suggested action. I don't see myself using it very often, so didn't think much further about it.
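The JSON idea could look something like this (the schema and the canned reply are made up for illustration):

    type Shortcut = { modifiers: string[]; key: string };
    type Suggestion = { answer: string; shortcut?: Shortcut };

    // Add to the system prompt: 'If the answer is a keyboard shortcut,
    // respond ONLY with JSON: {"answer": "...", "shortcut": {...}}'
    function handleReply(raw: string): void {
      const s: Suggestion = JSON.parse(raw);
      console.log(s.answer);
      if (s.shortcut) {
        // Executing it could be done with AppleScript, e.g.:
        // osascript -e 'tell application "System Events" to keystroke "e" using {shift down, command down}'
        console.log("Press enter to run:", [...s.shortcut.modifiers, s.shortcut.key].join("+"));
      }
    }

    handleReply('{"answer":"Merge all layers with Shift+Cmd+E","shortcut":{"modifiers":["shift","command"],"key":"E"}}');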

jackculpan
1 replies
1d2h

This is awesome

ralfelfving
0 replies
1d

Thanks, glad you liked it!

faceless3
1 replies
1d4h

Wrote some similar scripts for my Linux setup that I bind to XFCE keyboard shortcuts:

https://github.com/samoylenkodmitry/Linux-AI-Assistant-scrip...

F1 - ask ChatGPT API about current clipboard content
F5 - same, but opens editor before asking
num+ - starts/stops recording microphone, then passes to Whisper (locally installed), copies to clipboard

I find myself rarely using them however.

ralfelfving
0 replies
1d4h

Nice!

dekhn
1 replies
16h22m

I misread the title and thought this was an app you run on a laptop as you drive around... which, if you think about it, would be pretty useful. A combined vision/hearing/language model with access to maps, local info, etc.

ralfelfving
0 replies
5h25m

It would be really cool, and I think we're not very far away from this being something you have on your phone.

The "pilot" name comes from Microsoft's use of "Copilot" for their AI assistant products, and I tried to play on it with macOSpilot, which is maco(s)pilot. I think that naming has completely flown over everyone's heads :D

d4rkp4ttern
1 replies
5h59m

I’ve looking for a simple way to use voice input on the main ChatGPT website, since it gets tiresome to type a lot of text into it. Anyone have recommendations? The challenge is to get technical words right.

ralfelfving
0 replies
5h32m

If you're ok with it, you can use the mobile app -- it supports voice. Then you just have the same chat/thread open on your computer in case you need to copy/paste something.

behat
1 replies
1d2h

Nice! I built something similar earlier to get fixes from ChatGPT for error messages on screen. No voice input because I don't like speaking. My approach then was Apple's VisionKit for OCR + ChatGPT. This reminds me to test out OpenAI's Vision API as a replacement.

Thanks for sharing!

ralfelfving
0 replies
1d1h

Thanks! You could probably grab what I have and tweak it a bit. Try checking if you can screenshot just the error message, and check what the value of window.owner is. It should be the name of the application, so you could just append `Can you help me with this error I get in ${window.owner}?` to the Vision API call.
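If you go that route, a sketch with the active-win npm package (which returns the focused window's title and owning app; the repo's actual window object may differ):

    import activeWindow from "active-win";

    async function buildPrompt(): Promise<string> {
      const win = await activeWindow(); // { title, owner: { name, ... } } or undefined
      const app = win?.owner.name ?? "an application";
      return `Can you help me with this error I get in ${app}?`;
    }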

amelius
1 replies
1d3h

Please include "OpenAI-based" in the title. (Now many people here are disappointed).

ralfelfving
0 replies
1d

Fair point, didn't think it would matter so much. Can't edit it any more, otherwise I'd change it to add OpenAI to the title!

Jayakumark
1 replies
1d

I was following these two projects by a user on GitHub which make similar things possible with local models. Sending screenshots to OpenAI is expensive if done every few seconds or minutes.

https://github.com/KoljaB/LocalAIVoiceChat

The one below uses OpenAI, but I don't see why it can't be replaced with the above project and a local model.

https://github.com/KoljaB/Linguflex

ralfelfving
0 replies
1d

Nice! Although the productivity increase from being able to resolve blockers more quickly adds up to a lot (at least for me), local models would be more cost-effective -- and probably feel less iffy for many people.

I went for OpenAI because I wanted to build something quickly, but you should be able to replace the external API calls with calls to your internal models.

Art9681
1 replies
21h27m

Make sure to set OpenAI API spend limits when using this or you'll quickly find yourself learning the difference between the cost of the text models and vision models.

EDIT: I checked again and it seems the pricing is comparable. Good stuff.

ralfelfving
0 replies
21h19m

I think a prompt cost estimator might be a nifty thing to add to the UI.

Right now there's also a daily limit on the Vision API that kicks in before it gets really bad: 100+ requests, depending on what your max spend limit is.
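An estimator is probably just a few lines on top of the usage field the API already returns (the per-1K-token prices below are placeholders; check OpenAI's pricing page):

    // Toy cost estimator; prices are assumptions, not current OpenAI rates.
    const INPUT_PER_1K = 0.01;   // USD per 1K prompt tokens, assumption
    const OUTPUT_PER_1K = 0.03;  // USD per 1K completion tokens, assumption

    function estimateCost(usage: { prompt_tokens: number; completion_tokens: number }): number {
      return (usage.prompt_tokens / 1000) * INPUT_PER_1K +
             (usage.completion_tokens / 1000) * OUTPUT_PER_1K;
    }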

nbzso
0 replies
6h30m

Welcome to the future, where nobody is a professional because there is no need for professionals. Just ask the Corporate Overlord Surveillance Bot to give you instructions on what to do and how to think. Voilà. You are the master of the Universe. Dunning-Kruger champion for the ages to come.

The problem is obvious: time to reaction, API call limits, and average responses for complex tasks due to the limitations of the vision module. Similar functionality has to be available for free with a local model tuned to these types of tasks - a helper/copilot. Apple and Microsoft will include helper models in the OS soon. Let's hope they are generous and don't turn this into a local data-gathering funnel (I have my doubts on this).

fake-name
0 replies
13h36m

Open Source

sends your screen off to OpenAI Vision

Pick one

LeoNatan25
0 replies
18h20m

“macOSpilot runs NodeJS/Electron”

Lost me.