SearXNG is a free internet metasearch engine

xnx
17 replies
2d18h

This takes me back. Before Google, metasearch tools increased your odds of finding a decent answer among the spammy results from AltaVista, HotBot, Lycos, etc.

nickburns
10 replies
2d17h

memories. and now it's come back around some decades later in the name of digital privacy.

rockskon
9 replies
2d13h

How are they private though? Unless they relay your search, it is the exact opposite of private.

sneela
5 replies
2d9h

If you host your own instance:

SearXNG protects the privacy of its users in multiple ways regardless of the type of the instance (private, public). Removal of private data from search requests comes in three forms:

1. removal of private data from requests going to search services

2. not forwarding anything from third-party services through search services (e.g. advertisement)

3. removal of private data from requests going to the result pages

From: https://docs.searxng.org/own-instance.html#how-does-searxng-...

The docs mention a caveat below at "What are the consequences of using public instances?":

If someone uses a public instance, they have to trust the administrator of that instance. This means that the user of the public instance does not know whether their requests are logged, aggregated, and sent or sold to a third party.

gtirloni
4 replies
2d5h

All of that is fine, but simply by having your IP, Google can continue to profile you in countless ways using data they collect elsewhere, and it wouldn't be expensive for them at all.

panki27
3 replies
2d4h

SearX acts as a proxy; you are not submitting your IP to Google.

gtirloni
1 replies
1d2h

We're talking about self hosting, right? The proxy is using the same IP.

nickburns
0 replies
3h29m

if self-hosting, that may very well be correct.

a few examples of a self-hosted design where it would not: policy-based routing over a VPN with one or multiple tunneled hops, or egress through another external proxy. (and then there's also that 'onion' routing 'protocol', but i'm not clear if/how that integrates with clearnet destinations like publicly-accessible search engines, if at all.)

nickburns
0 replies
2d2h

i think since 'IP address' has become something of a baseline non-technical understanding of one of the critical components of networking, it becomes increasingly difficult for non-netpeeps to fully grasp the many uses and non-uses of addressing.

a proxy (or proxies) and how they can shield but one or many of 'your' IP addresses throughout an egress packet's many hops (and from who or what destination it or those addresses can be shielded) is a pretty advanced concept when you think about it.

not to mention that, at this point, bare source IP address is a pretty dilute tracker compared to other current methods of identity profiling or traffic fingerprinting.

nice succinct correction on your part regardless.

yayr
0 replies
2d9h

I would assume that the relaying can strip identifying information from the request, such as the IP, cookies, and other tracking mechanisms you'd get when visiting e.g. google.com directly.

notpushkin
0 replies
2d12h

They do indeed relay your query. How else would they work?

nickburns
0 replies
2d9h

privacy is achieved through the proxy and the resulting aggregation of disparate requests/queries. some anonymity is therefore achieved, at least from the perspective of the upstream search engine operators, by blending into 'the crowd.'

but the idea is not necessarily anonymity so much as privacy by foiling the creation of any even somewhat accurate marketing/data profile derived from 'your search.'

chefandy
2 replies
2d13h

I liked Copernic: it was a native Windows 9x metasearch tool.

sheepscreek
0 replies
2d12h

Wow. I came here to the comments to write about Copernic! It was a super valuable tool in the pre-Google era, heck even in the early 2000s.

jszymborski
0 replies
2d11h

This brought a smile to my face... I worked at a company started by Copernic alums (Coveo).

marban
0 replies
2d15h

Northern Light was nice though.

giantrobot
0 replies
2d15h

Not just avoiding spam: some metasearch engines (Dogpile, IIRC) could also search specialist search engines like White Pages and Yellow Pages (long before Yelp etc. existed). You'd be able to find business listings and contact info that weren't normally found on web search engines. They could also include FTP search results, which was useful as public anonymous FTPs had yet to fall out of use.

8ig8
0 replies
2d18h

Dogpile was another.

sitkack
16 replies
2d19h

Run this on your personal internet connection, risk not being able to use a search engine.

mostlysimilar
9 replies
2d19h

Elaborate?

marginalia_nu
3 replies
2d19h

Virtually all public search engine endpoints see an insane amount of bot activity, often several queries per second.

If you delegate queries to e.g. Google or Bing at that rate, you'll be IP-blocked in a heartbeat.

RaisingSpear
1 replies
2d14h

Search engines: they scrape the web, but get narky when scraped themselves.

marginalia_nu
0 replies
2d9h

Difference is a crawler paces the requests, respects robots.txt and rate limits, and doesn't typically invoke 50-100MB of disk I/O per request.

Like, I don't mind automated access to my search engine; I even offer a public API to that effect, which you can in fact hook into SearXNG. What I mind is when one jabroni with a botnet decides their search traffic is more important than everyone else's and grabs all the compute for himself via a Sybil attack.

mostlysimilar
0 replies
2d19h

Ah duh, for some reason my mind didn't go to hosting the search instance locally and I misunderstood.

btw thank you for Marginalia! The spirit of the small web is very important to me.

Fnoord
3 replies
2d19h

It is a metasearch engine, so it uses other search engines. The point is to let multiple people use it, so that Google et al. don't know who's using their service. I.e. it is a glorified proxy.

Honestly, I just use Kagi. Though I need to find some way to limit my searches to 300 per month.

ranger_danger
1 replies
2d18h

that does not negate what OP said. your IP will still get blocked very quickly.

although existing searx instances have been run for years and they don't seem to be dropping like flies...

lannisterstark
0 replies
2d15h

Well. I host a public instance. IP is still not blocked. YMMV.

wkat4242
0 replies
2d15h

Isn't Kagi also really a delegator? I've heard they delegate to Brave among others.

HeatrayEnjoyer
0 replies
2d19h

Your IP address will get burned

baobun
4 replies
2d17h

Only if you expose it publicly without auth while routing queries through your residential connection, which is not an advised configuration.

For personal use, you can run it directly on your machine or access over VPN. Queries to upstream search engines can be forwarded over proxies or VPNs as you see fit. Some work fine over tor and some can go over commercial or DIY tunnels.
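
If memory serves, the upstream forwarding is just a few lines in settings.yml. A sketch for routing all outgoing engine requests through a local Tor SOCKS proxy (untested, check the docs for the exact keys):

    outgoing:
      proxies:
        all://:
          # assumption: Tor's default local SOCKS port;
          # socks5h also resolves DNS through the proxy
          - socks5h://127.0.0.1:9050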

ttt3ts
3 replies
2d16h

To add, I have been running an instance for years for family and friends. I run it behind nginx basic auth with a config that sets a forever cookie the first time you log in. Really simple. Another good option is Cloudflare Zero Trust.

nickburns
2 replies
2d16h

how many total regular F&F users? do they ever ask you about logging? or is that beyond the scope of what most of them realize is happening?

ttt3ts
1 replies
2d5h

A ~dozen. Several are technical and use it because it includes several private and paid engines on request.

Config is in a git repo I give access to if requested. One of the technical users modified it to keep pretty minimal logs. I guess they are trusting me to actually use that config but trust is pretty high in the group so not really an issue.

nickburns
0 replies
1d19h

very interesting. sounds like an enviable group. thanks for your reply.

BeetleB
0 replies
2d13h

I think this needs a lot more clarification than is provided in this thread.

If you run it locally, and only you use it, then you won't get blocked - a given search engine will see about the same number of requests as if you used it directly.

Add a few house members and you'll still be fine.

(I ran the original searx for a year or two locally - no issues at all).

keepamovin
14 replies
2d11h

If anyone is interested in searches applied to the full text of every page in your browser history, or to only select pages that you bookmark, check out our project DownloadNet (formerly, and possibly, futurely: "DiskerNet").

It hooks into your browser to give you an augmented experience. The UI is pretty simple (think 1997 era google but without CSS haha), and we don't do anything super complex with search (but could in future), but it works not bad. Check it out!!!

https://github.com/dosyago/DownloadNet

Oh, it also makes your content (again, either everything you browsed or only what you bookmarked) available offline. So if you work on an oil rig, or shipping, or long haul freight, it can be a good way to browse as normal but save yer satellite bandwidth!!!

nickburns
9 replies
2d7h

it's unclear to me why anyone, particularly anyone with even a passing interest in what the topic of this submission has to offer, would be even remotely interested in being the "master archivist of your own internet browsing."

i don't need anything else archiving anything related to my internet browsing except for my human brain. and yes, that's just me...

but how is the shameless plug of this not just therefore off-topic but diametrically-opposed-to-total-personal-privacy tool appropriate here?

keepamovin
8 replies
2d7h

It's funny because this totally offline and locally hosted search engine in DownloadNet is potentially the most private of all.

I get if you’re not interested, but I imagine people interested in locally hosted search-related solutions, may be.

Your view is probably more personal and hard to support in general given this, and given the comment’s position and votes indicating at least some people are interested.

I totally understand why you wouldn’t want your browsing history archived anywhere. But that is what search engines do somewhat. It’s okay, everyone’s different.

nickburns
7 replies
2d7h

none of that (most includingly comment position and votes) = privacy.

this tool is not relevant here.

keepamovin
6 replies
2d7h

  none of that = privacy

  this tool is not relevant here

No it’s relevant. You don’t think self hosted and offline is private?

nickburns
5 replies
2d7h

i self-host my human brain online in my own skull. i feed it and nurture it so that it can continue to perform and offer me the highest level of privacy i could possibly maintain.

keepamovin
3 replies
2d7h

mental privacy, huh, nickburns? That’s an interesting concept.

there is no man in the desert. And no man needs nothing.

Tho I prefer the west coast of Zaire or Suid-Afrika myself.

nickburns
2 replies
2d6h

  mental privacy, huh, nickburns? That’s an interesting concept.
you made your plug under the guise of asking if anyone had interest, i offered mine, and now i think we're done here, keepamovin.

keepamovin
1 replies
2d6h

Seemed so

nickburns
0 replies
2d6h

before you go... i apologize for being a dick about it. i'd have to really reflect some more on why it felt necessary to go about it in this way, which is inevitably a deeply personal reflection.

but if i may just say, privacy as a concept for a truly egalitarian society is something very near and critical in my opinion. marketing, on the other hand, is not.

good day to you, sir.

Art9681
0 replies
19h25m

The entire field of Information Technology exists because the world disagrees that the thing in your skull has an acceptable level of intellectual and recall performance. And here we are on a social site dedicated to the pursuit of making it better. And so are you!

One of us, apparently.

DavideNL
2 replies
2d9h

I don't see any documentation at all, like what browsers are supported, etc. :/

alexdeloy
1 replies
2d8h

Me neither; it really would benefit from better documentation, since I like the idea a lot.

I just tried it out and it seems to be tied to Chrome. Since I use Firefox and Chromium as my daily drivers, this does not work for my case. I understand that they probably rely on some Chrome internals to dig through the content; a SOCKS proxy approach would have worked better and would remove the need to switch between a "save" and "serve" mode. But then again, I was only scratching the surface because of the lack of browser support. Will keep an eye on this one though!

outofpaper
0 replies
2d

While it lacks a search feature last I checked, there's always https://github.com/davidfstr/webcrystal

One .py file. Only one dependency (urllib3). With a little love the concept could become a full transparent proxy.

Avamander
0 replies
2d

How is this better than YaCy?

arcastroe
11 replies
2d14h

This is great. I wish there was a way to block certain domains from ever appearing on search results.

EDIT: Looks like there's already an open issue:

https://github.com/searxng/searxng/issues/2351

BeetleB
5 replies
2d13h

Kagi does that.

lannisterstark
3 replies
2d13h

I trust myself a bit more than I trust someone else to run my queries, sadly. I understand that they claim to store no user data or associations etc., but honestly, it's just their word.

BeetleB
2 replies
2d12h

  I understand that they claim to store no user data or associations etc, but honestly, it's just their word.

My guess is that if they are found to do so, then they open themselves up to lawsuits. Not collecting data isn't merely a perk - it's practically the reason Kagi exists.

zelphirkalt
0 replies
2d10h

Even if not a lawsuit, people will judge it themselves and vote with their feet.

marginalia_nu
0 replies
2d8h

Another big reason not to keep this stuff is just the cost of dealing with requests from law enforcement. At some point you start getting them.

If you don't have any logs you can just always say the princess is in another castle, since you can't provide data that doesn't exist.

If on the other hand you do have the requested information, you need to determine the validity of the request, and then extract the data; or refuse to comply and possibly put yourself at legal risk. For a smaller business that's probably a can of worms you'd rather avoid opening.

arcastroe
0 replies
2d13h

Kagi requires an account to use, which is not great for privacy.

I understand Kagi is generally reputable, but I like the idea of a self-hosted alternative where you're in full control.

squarefoot
1 replies
2d14h

uBlacklist does just that with Google and some other search engines. I use it with Firefox to filter out pinterest junk from search results. Also available for Chrome and Safari.

https://github.com/iorate/ublacklist
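
Its rules are simple match patterns, one per line in the extension's settings; for example (the second pattern is just to show the shape):

    *://*.pinterest.com/*
    *://*.example.org/*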

poulpy123
0 replies
2d11h

just installed it to try. For the people who want to give it a try too: I noticed that several of the public lists contain legitimate websites such as Canva or Reddit.

dalf
1 replies
2d9h

Disclaimer: I am one of the maintainers.

The intent of SearXNG is to be stateless (with no sessions on the server) and to work without JavaScript.

However, this approach limits certain features because of the restricted size of cookies (and other forms of browser storage require JavaScript).

arcastroe
0 replies
1d21h

Thank you, that makes a lot of sense. Stateless is very good for privacy and I agree with that approach for a multi-user instance, (which I suppose is the most common use-case).

I'm picturing more of an instance-wide configuration of domain blocks for a private, single-user, self-hosted instance. But I understand this may not be the intended use of the project.
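
That said, newer SearXNG versions apparently ship a "hostnames" plugin that can drop matching domains instance-wide via settings.yml. Something like this, if I'm reading the docs right (untested sketch; the regexes are only examples):

    hostnames:
      remove:
        # drop any result whose hostname matches these patterns
        - '(\.|^)pinterest\.com$'
        - '(\.|^)example\.org$'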

ThinkBeat
0 replies
2d9h

Google used to do that, but then stopped. You can still do it manually by excluding domains in (every) search you do, but the list can get long and it is far from a good user experience.
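
For example, something like this in the query itself (assuming the underlying engine honors the minus and site: operators):

    best pizza dough recipe -site:pinterest.com -site:quora.com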

Kagi has this feature built in and it is a good user experience.

You can also use the uBlacklist browser plugin. My problem with that is that it slows everything down. I am not certain, but I think all the work is done after the search completes, i.e. it filters the actual results. The two above prevent blocked domains from ever being part of the results.

ozehlaw
5 replies
2d19h

Interesting. Google is going to shit these days.

mrexroad
1 replies
2d14h

Web content itself has gone to shit these days, in order to win Google's SEO game to win Google's AdSense game. “Google going to shit” is just a second-order effect (or third/fourth depending on how you look at it).

prmoustache
0 replies
2d1h

The good content has not disappeared. So it is still Google going to shit if it can't sort out what is good and what isn't, which was the reason people started using it in the first place 25 years ago.

YvUuXJiO
1 replies
2d15h

i don't think so

nickburns
0 replies
2d15h

i don't either. it's done been to shit for some time now. scraping its results can be highly effective though.

ijijijjij
0 replies
2d7h

Google search has gone to shit since Google+... or more precisely, when they removed the plus operator in Google search around 2011. And no, the quotes aren't as good.

My bet is that Google will become "Google TV" and search won't be possible. They will just show you what they want. They'll probably frame it as "AI knows what you want to see".

Maybe they should ban Google instead of TikTok (I don't use either though).

keepamovin
4 replies
2d14h

That's clever. X-ING (like those 'crossing' roadsigns), so it's like Search-ching.

There's quite some similarity between the CH and the X sound in English.

But, as this is HN probably someone with a PhD in comparative phonetics will explain why this is a common and infuriating misunderstanding of layfolken.

keepamovin
0 replies
2d11h

Hahaha! :) Good to know

nickburns
0 replies
2d7h

the abbreviation "ng" in software development/forking/maintenance evolution denotes 'next generation.'

lygten
3 replies
2d12h

Try this guy. It's not Kagi, but the search results are pretty good. You can host it yourself with Docker.

    https://felladrin-minisearch.hf.space/
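
A rough compose sketch from memory (the image name and port are assumptions, not verified; check the project's README for the real ones):

    # hypothetical compose file; image and port are assumptions, verify
    services:
      minisearch:
        image: ghcr.io/felladrin/minisearch:main  # assumed image name
        ports:
          - "7860:7860"  # assumed port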

zuhsetaqi
1 replies
2d12h

Wow, it's annoying that it reloads the results while I'm going through them.

lygten
0 replies
2d9h

You can disable the AI

skoocda
1 replies
1d19h

First few instances I tried are either returning no results, or only DDG results.

    Error! Engines cannot retrieve results:
    brave (Suspended: too many requests)
    google (Suspended: too many requests)
    qwant (server API error)

johnklos
0 replies
1d3h

I've run into exactly that. By putting my SearXNG on a machine that also does NAT for a busy network, this can be avoided. This is definitely one instance where IPs from a colo are a bad thing and residential IPs are a good thing ;)

bilegeek
1 replies
2d19h

OOC, does this support YaCy as another engine? Would be the best of all worlds if it did IMO.

hagbard_c
0 replies
2d18h

It does, I run it that way with an optional fan-out to my personal YaCy instance. Here's the relevant part of settings.yml:

   - name: yacy
     engine: yacy
     categories: general
     search_type: text
     base_url: https://yacy.searchlab.eu
     shortcut: ya
     disabled: true
     # required if you aren't using HTTPS for your local yacy instance
     # https://docs.searxng.org/dev/engines/online/yacy.html
     # enable_http: true
     # timeout: 3.0
     # search_mode: 'global'
Change 'disabled' to 'false' and point it at whatever YaCy instance you want to use. It can use the 'general' and 'images' categories.

ranguna
0 replies
2d7h

Lots of people publicly host searx instances. There's a list of publicly available instances online, but if you are looking for a tool that randomly redirects each search from your browser's address bar to a different instance, you can use this Neocities page: https://searx.neocities.org/changelog

I use this all the time. A downside is that sometimes you land on an instance that doesn't provide any results or gives you really poor ones. This has been happening less frequently recently.

nickburns
0 replies
2d17h

i use it when i'm tired of picking obfuscated Fumo plushies or Minecraft screencaps on https://4get.ca/. i don't even know what a Fumo plushie is, never mind six of them.

longitudinal93
0 replies
2d19h

Been running this as the default on all my devices for the past year and I couldn't be happier. It has only choked twice, and it just needed to be updated to be back in business.

kristjank
0 replies
2d10h

It's my favourite tool to average out the individually shitty results of mainstream search engines into something vaguely usable.

johnklos
0 replies
1d3h

SearXNG is great, with a few caveats:

Running it on a machine that also does NAT for many other machines helps prevent getting blocked by upstream search engines like DuckDuckGo. It'd be good if access to certain upstream search engines could be sent through, say, a proxy set up elsewhere, to prevent this very, very common problem if you can't run it from an IP used for other things.

I'd like to figure out how to have a mode where my search is 100% literal, where every word I type must be in the search results exactly as I type it. Perhaps that's the equivalent of putting a "+" in front of each word and putting each word in quotes? It's annoying that my words are constantly getting changed for me because there aren't many results, which I expressly don't want.

As mentioned elsewhere, I want to be able to explicitly exclude certain domains. I get that SearXNG wants to be stateless, but I could either configure a separate URL for it or simply configure it for all searches. For instance, if I search for a PDF manual for something, I never, ever, ever want to see anything from "manualslib.com" and sites like it.

Other than these things, which'd be nice to address, I'd say running SearXNG and encouraging people to use it instead of Google has worked quite well :)

fasa99
0 replies
2d2h

People always sell SearX, but myself, I'm a fan of presearch.com. I have no affiliation with them whatsoever or financial interest. I have no interest in their crypto-based business model. In fact, I think their lack of Google- or Bing-style search result filtering is entirely due to lack of funding and/or prioritizing other things more important to success, not due to taking a stand on free speech or anything like that. And that's perhaps how it was in the early days of the internet, when Maslow's hierarchy of corporate needs focused on trying to make the thing work versus public relations goodfeels and presenting only rightspeech.

Anyway, if I'm looking for some topic I believe Google would be known to filter heavily, or something esoteric, I take a look at presearch to get a second opinion. I'd also love to see archive.org do something similar; archive.org has an amazing collection of data, poorly indexed and poorly searchable.

dalf
0 replies
2d2h

A few years ago, I remember someone conducted a study on the quality of SearX(NG) results using different Internet providers: mobile, fiber, and VPN.

I'm not sure if this person is still active on HN, but I'm really curious about the results.

cess11
0 replies
2d7h

Been using SearX/SearXNG a lot over the years, in large part because I used and preferred the Dogpile metasearch many, many years ago.

Apparently Dogpile still exists, didn't expect that: https://en.wikipedia.org/wiki/Dogpile