Happy to answer any questions and open to suggestions :)
It's basically an LLM with access to a search engine and the ability to query a vector DB.
The top n results from each search query (issued by the LLM) are scraped, split into small chunks, and saved to the vector DB. The LLM can then query the vector DB to retrieve the relevant chunks. This obviously isn't as comprehensive as having a 128k-context LLM just summarize everything, but at least on local hardware it's a lot faster and far more resource-friendly. The demo on GitHub runs on a normal consumer GPU (AMD RX 6700 XT) with 12GB of VRAM.
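The chunking step described above can be sketched in a few lines of Go. This is a minimal illustration with made-up sizes, not the project's actual code; real pipelines usually split on sentence or token boundaries rather than raw characters.

```go
package main

import "fmt"

// splitChunks splits text into overlapping chunks of roughly chunkSize
// characters, stepping forward by chunkSize-overlap each time. The
// overlap keeps context that would otherwise be cut at a chunk border.
func splitChunks(text string, chunkSize, overlap int) []string {
	var chunks []string
	step := chunkSize - overlap
	for start := 0; start < len(text); start += step {
		end := start + chunkSize
		if end > len(text) {
			end = len(text)
		}
		chunks = append(chunks, text[start:end])
		if end == len(text) {
			break
		}
	}
	return chunks
}

func main() {
	// Tiny example input; real chunk sizes are in the hundreds of tokens.
	fmt.Println(splitChunks("abcdefghij", 4, 2))
	// [abcd cdef efgh ghij]
}
```

Each chunk would then be embedded and written to the vector DB, so a later similarity query can pull back only the few chunks relevant to the question.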
Wonderful work!
Is it possible to make it use only a subset of the web (only sites that I trust and think are relevant to producing an accurate answer)? Are there ways to make it work offline on pre-installed websites (Wikipedia, some other wikis, and possibly news sites that are archived locally)? And how about other forms of documents (books and research papers as PDFs)?
Seconded. I tried to do this many years ago for my dissertation and failed, but this would be a dream of mine.
Would it not be possible to create a search engine that only crawls certain sites?
I was most interested in the offline aspect of it, which I wouldn't know where to even start with if I were to fork.
How do you parse and efficiently store large, unstructured information for arbitrary, unstructured queries?
You put it in a search server, like ElasticSearch or Meili.
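To make the "put it in a search server" suggestion concrete, here is a hedged sketch of the JSON body an Elasticsearch full-text `match` query expects. The field and query strings are examples; the HTTP call to the server's `_search` endpoint is omitted.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// buildMatchQuery builds the request body for an Elasticsearch "match"
// query, which scores documents by full-text relevance against one field.
func buildMatchQuery(field, query string) ([]byte, error) {
	body := map[string]any{
		"query": map[string]any{
			"match": map[string]any{
				field: query,
			},
		},
	}
	return json.Marshal(body)
}

func main() {
	b, err := buildMatchQuery("content", "local llm search")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(b))
	// {"query":{"match":{"content":"local llm search"}}}
}
```

POSTing that body to an index's `_search` endpoint returns ranked hits, which is the "arbitrary, unstructured queries" part handled for you.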
LLocalSearch uses SearXNG, which has a feature to blacklist/whitelist sites for various purposes.
Also a great idea to expose this in the frontend. Thanks :)
Uhhh, both ideas are great. Would you like to turn them into GitHub issues? I will definitely look into both of them :)
What is the search engine that it uses?
SearXNG, a locally running metasearch engine that combines a lot of different sources (including Google and co).
This might be more of a SearXNG question, but doesn't it quickly run up against anti-bot measures, like CAPTCHA challenges and Forbidden responses? I can see the manual has some support for dealing with CAPTCHAs [1], but in practical terms, I would guess a tool like this can't be used extensively all day long.
I'm wondering if there's a search API that would make the backend seamless for something like this.
1. https://docs.searxng.org/admin/answer-captcha.html
As a last resort, we could have AI working on top of a real web browser, solving CAPTCHAs as well. It should look like normal usage. I think these kinds of systems (LLM + RAG + web agent) will become widespread and the preferred way to interact with the web.
We can escape all ads and dark UI patterns by delegating this task to AI agents. We could have it collect our feeds, filter, rank and summarize them to our preferences, not theirs. I think every web browser, operating system and mobile device will come equipped with its own LLM agent.
The development of AI screen agents will probably get a big boost from training on millions of screen capture videos with commentary on YouTube. They will become a major point of competition on features. Not just browser, but also OS, device and even the chips inside are going to be tailored for AI agents running locally.
If everyone consumes like that, what's even the incentive for content creators?
If content creators can't find anything that is uniquely human and cannot be made by AI, then maybe they are not creative enough for the job. The thing about generative AI is that it can take context: you can put a lot or very little guidance into it. The more you specify, the more you can mix your own unique sauce into the final result.
I personally use AI for text style changes, as a summarizer of ideas, and as a rubber duck, something to bounce ideas off of. It's good for getting ideas flowing, and sometimes it can help you realize things you missed, or frame something better than you could.
I didn't run into a lot of timeouts while using it myself, but you would probably need another search source if you plan to host this service for multiple users at the same time.
There are projects like FlareSolverr which might be interesting.
If you're open to it, it would be great if you could make a post explaining how you built this. Even if it's brief. Trying to learn more about this space and this looks pretty cool. And ofc, nice work!
a primer - https://github.com/nilsherzig/LLocalSearch/issues/17
Guys, I didn't think there would be this much interest in my project haha. I feel kinda bad for just posting it in this state. I would love to make a more detailed post on how it works in the future (keep an eye on the repo?).
When scraping the websites, do you just blindly cut all of the HTML into fixed-size chunks, or is there some more sophisticated logic to extract the text of interest?
I'm wondering because most news websites now have a lot of polluting elements like popups; would those also end up in the database?
If you look at the vector handler in his code, he is using the bluemonday sanitizer and doing some `replaceAll` calls.
So I think there may be some useless data in the vectors, but that may not be an issue since it comes from multiple sources (for simple questions at least).
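For intuition, here is a stdlib-only stand-in for that sanitize-then-replace step. It is not the project's code, and bluemonday parses HTML properly rather than using a regexp; this sketch just shows why boilerplate text (like popup copy) can survive into the chunks.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// tagRe matches HTML tags. A real sanitizer like bluemonday parses the
// markup; a regexp is only good enough for a rough sketch.
var tagRe = regexp.MustCompile(`<[^>]*>`)

// cleanHTML strips tags and collapses leftover whitespace, roughly what
// a strict sanitize plus replaceAll pipeline produces.
func cleanHTML(html string) string {
	text := tagRe.ReplaceAllString(html, " ")
	return strings.Join(strings.Fields(text), " ")
}

func main() {
	fmt.Println(cleanHTML(`<div class="popup">Subscribe!</div><p>Real   article text.</p>`))
	// Subscribe! Real article text.
}
```

Note that the popup's "Subscribe!" text survives sanitization, which is exactly the useless-data-in-the-vectors concern; in practice the retrieval step mostly ignores it because it doesn't match the query.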
Your project looks very cool. I had on my ‘list’ to re-learn Typescript (I took a TS course about 5 years ago, but didn’t do anything with it) so I just cloned your repo so I can experiment with it.
EDIT: I just noticed that most of the code is Go. Still going to play with it!
Thanks :). Yeah, only the web part is TypeScript, and I really wouldn't recommend learning from my TypeScript haha.
"normal consumer GPU"... well mine is a 4GB 6600.. so I guess that varies.
Sorry, it wasn't my intention to gatekeep, but my 300€ card really is on the low end for LLM things.
Any plans to support other backends besides Ollama?
Sure (if they are OpenAI-API-compatible I can add them within minutes); otherwise I'm open to pull requests :)
Also, I don't own an Nvidia card or a Windows/macOS machine.
This is awesome. I would love it if there were executable files with these dependencies bundled in. That would make it way more accessible, rather than just to those who know how to use the command line and resolve dependencies (yes, even Docker runs into that when fighting the local system).