HN comments for: SearXNG is a free internet metasearch engine

xnx

17 replies

2d18h

2024-04-05 23:43:27 UTC

This takes me back. Before Google, meta search tools increased your odds of finding a decent answer between the spammy results from AltaVista, HotBot, Lycos, etc.

nickburns

10 replies

2d17h

2024-04-06 00:41:41 UTC

memories. and now it's come back around some decades later in the name of digital privacy.

rockskon

9 replies

2d13h

2024-04-06 04:42:13 UTC

How are they private though? Unless they relay your search it is the exact opposite of private.

sneela

5 replies

2d9h

2024-04-06 09:03:38 UTC

If you host your own instance:

SearXNG protects the privacy of its users in multiple ways regardless of the type of the instance (private, public). Removal of private data from search requests comes in three forms:

1. removal of private data from requests going to search services

2. not forwarding anything from a third party services through search services (e.g. advertisement)

3. removal of private data from requests going to the result pages

From: https://docs.searxng.org/own-instance.html#how-does-searxng-...

The docs mention a caveat below at "What are the consequences of using public instances?":

If someone uses a public instance, they have to trust the administrator of that instance. This means that the user of the public instance does not know whether their requests are logged, aggregated and sent or sold to a third party.
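A minimal Python sketch of point 1 above, assuming a simple allowlist policy. This is not SearXNG's actual code. The proxy builds the upstream request itself and copies over only a couple of harmless client headers. As a result, cookies, `Referer`, and forwarding headers that carry the client IP never reach the search service. `SAFE_HEADERS` and `upstream_headers` are hypothetical names.

```python
# Hypothetical allowlist: only these client headers are ever forwarded upstream.
SAFE_HEADERS = {"accept", "accept-language"}


def upstream_headers(client_headers: dict[str, str], user_agent: str) -> dict[str, str]:
    """Build the headers for the request sent to the search service.

    Anything not allowlisted (Cookie, Referer, X-Forwarded-For, Authorization, ...)
    is dropped, and the client's User-Agent is replaced by one the proxy chooses.
    """
    out = {k: v for k, v in client_headers.items() if k.lower() in SAFE_HEADERS}
    out["User-Agent"] = user_agent
    return out


incoming = {
    "Cookie": "session=abc123",
    "X-Forwarded-For": "203.0.113.7",
    "Referer": "https://example.org/private-page",
    "Accept-Language": "en-US,en;q=0.8",
    "User-Agent": "MyBrowser/1.0 (unique build 42)",
}
print(upstream_headers(incoming, "Mozilla/5.0 (generic)"))
# {'Accept-Language': 'en-US,en;q=0.8', 'User-Agent': 'Mozilla/5.0 (generic)'}
```

Points 2 and 3 (ads and result-page requests) need more machinery than a header filter and are not covered by this sketch.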

gtirloni

4 replies

2d5h

2024-04-06 12:32:41 UTC

All of that is fine but by simply having your IP, Google can continue to profile you in countless ways with data they collect in other ways and it wouldn't be expensive for them at all.

panki27

3 replies

2d4h

2024-04-06 13:39:35 UTC

SearX acts as a proxy, you are not submitting your IP to Google.

gtirloni

1 replies

1d2h

2024-04-07 16:03:16 UTC

We're talking about self hosting, right? The proxy is using the same IP.

nickburns

0 replies

3h29m

2024-04-08 14:59:44 UTC

if self-hosting, that may very well be correct.

a few examples of a self-hosted design where that would not be the case include policy-based routing over a VPN with one or multiple tunneled hops, or routing through another external proxy. (and then there's also that 'onion' routing 'protocol', but i'm not clear if/how that integrates with clearnet destinations like publicly-accessible search engines, if at all.)

nickburns

0 replies

2d2h

2024-04-06 16:11:19 UTC

i think since 'IP address' has become something of a baseline non-technical understanding of one of the critical components of networking, it becomes increasingly difficult for non-netpeeps to fully grasp the many uses and non-uses of addressing.

a proxy (or proxies) and how they can shield but one or many of 'your' IP addresses throughout an egress packet's many hops (and from who or what destination it or those addresses can be shielded) is a pretty advanced concept when you think about it.

not to mention that, at this point, bare source IP address is a pretty dilute tracker compared to other current methods of identity profiling or traffic fingerprinting.

nice succinct correction on your part regardless.

yayr

0 replies

2d9h

2024-04-06 08:52:40 UTC

I would assume that the relaying can strip identifying information such as your IP, cookies and other tracking mechanisms from the request, which you would otherwise expose when visiting e.g. google.com.

notpushkin

0 replies

2d12h

2024-04-06 05:48:10 UTC

They do indeed relay your query. How else would they work?

nickburns

0 replies

2d9h

2024-04-06 09:07:20 UTC

privacy is achieved through the proxy and therefore aggregation of disparate requests/queries. some anonymity is therefore achieved, at least from the perspective of source search engine operators, by blending into 'the crowd.'

but the idea is not necessarily anonymity so much as privacy by foiling the creation of any even somewhat accurate marketing/data profile derived from 'your search.'

chefandy

2 replies

2d13h

2024-04-06 04:46:46 UTC

I liked Copernic: it was a native Windows 9x Meta search tool.

sheepscreek

0 replies

2d12h

2024-04-06 05:57:22 UTC

Wow. I came here to the comments to write about Copernic! It was a super valuable tool in the pre-Google era, heck even in the early 2000s.

jszymborski

0 replies

2d11h

2024-04-06 07:26:59 UTC

This brought a smile to my face... I worked at a company started by Copernic alums (Coveo).

marban

0 replies

2d15h

2024-04-06 02:41:13 UTC

Northern Light was nice though.

giantrobot

0 replies

2d15h

2024-04-06 03:02:42 UTC

Not just avoiding spam but some meta search engines (Dogpile IIRC) could also search specialist search engines like White Pages and Yellow Pages (long before Yelp etc existed). You'd be able to find business listings and contact info that wasn't normally found on web search engines. They could also include FTP search results which was useful as public anonymous FTPs had yet to fall from use.

8ig8

0 replies

2d18h

2024-04-05 23:55:47 UTC

Dogpile was another.

sitkack

16 replies

2d19h

2024-04-05 23:08:10 UTC

Run this on your personal internet connection, risk not being able to use a search engine.

mostlysimilar

9 replies

2d19h

2024-04-05 23:12:46 UTC

Elaborate?

marginalia_nu

3 replies

2d19h

2024-04-05 23:17:04 UTC

Virtually all public search engine endpoints see an insane amount of bot activity, often several queries per second.

If you delegate queries to e.g. google or bing at that rate, you'll be ip blocked in a heartbeat.
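Here is one way to see why a busy endpoint gets blocked while a personal one does not: model the upstream engine as enforcing a rate limit. This is a minimal token-bucket sketch. The numbers are made up (a burst of 5, refilling 1 request every 2 seconds), since Google and Bing do not publish their thresholds. The `TokenBucket` class is hypothetical, and the clock is passed in so the example is deterministic.

```python
class TokenBucket:
    """Allow short bursts, but cap the long-run rate at `rate` requests per second."""

    def __init__(self, capacity: float, rate: float, now: float = 0.0):
        self.capacity = capacity
        self.rate = rate
        self.tokens = capacity
        self.last = now

    def allow(self, now: float) -> bool:
        # Refill in proportion to elapsed time, never above capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


# Bot traffic: 4 queries per second for 5 seconds.
bot_bucket = TokenBucket(capacity=5, rate=0.5)
bot = [bot_bucket.allow(t / 4) for t in range(20)]
print(sum(bot), "of", len(bot), "allowed")  # 7 of 20 allowed

# One person searching every 30 seconds.
person_bucket = TokenBucket(capacity=5, rate=0.5)
human = [person_bucket.allow(t * 30.0) for t in range(10)]
print(sum(human), "of", len(human), "allowed")  # 10 of 10 allowed
```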

RaisingSpear

1 replies

2d14h

2024-04-06 04:05:29 UTC

Search engines: they scrape the web, but get narky when scraped themselves.

marginalia_nu

0 replies

2d9h

2024-04-06 08:33:30 UTC

Difference is a crawler paces the requests, respects robots.txt and rate limits, and doesn't typically invoke 50-100MB disk I/O per request.

Like I don't mind automated access to my search engine, I even offer a public API to the effect, that you can in fact hook into SearXNG. What I mind is when one jabroni with a botnet decides their search traffic is more important than everyone else's and grabs all the compute for himself via a sybil attack.
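The polite-crawler behaviour described above can be sketched with Python's standard-library `urllib.robotparser`. The robots.txt content, the `ExampleCrawler` user agent, and the host are made up for illustration. A well-behaved crawler checks `can_fetch` before each request and honours the non-standard but widely used `Crawl-delay`.

```python
import urllib.robotparser

# A made-up robots.txt, parsed from a string so the example runs offline.
ROBOTS_TXT = """\
User-agent: *
Disallow: /search
Crawl-delay: 10
"""

rp = urllib.robotparser.RobotFileParser()
rp.parse(ROBOTS_TXT.splitlines())

for path in ["/", "/about", "/search?q=searxng"]:
    url = "https://search.example.org" + path
    print(path, "->", "fetch" if rp.can_fetch("ExampleCrawler", url) else "skip")
# / -> fetch
# /about -> fetch
# /search?q=searxng -> skip

# Seconds a polite crawler should wait between requests to this host.
print("crawl delay:", rp.crawl_delay("ExampleCrawler"))  # crawl delay: 10
```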

mostlysimilar

0 replies

2d19h

2024-04-05 23:21:35 UTC

Ah duh, for some reason my mind didn't go to hosting the search instance locally and I misunderstood.

btw thank you for Marginalia! The spirit of the small web is very important to me.

Fnoord

3 replies

2d19h

2024-04-05 23:17:10 UTC

It is a metasearch engine, so it uses other search engines. The point is to let multiple people use it, so that Google et al. do not know who is using their service, i.e., it is a glorified proxy.

Honestly, I just use Kagi. Though I need to find some way to limit my searches to 300 per month.
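Besides proxying, a metasearch engine has to merge what the upstream engines return. This minimal sketch assumes a simple reciprocal-rank score: each engine contributes 1/position for every URL it returns, and duplicates are merged by URL. It illustrates the idea only and is not SearXNG's actual scoring. The engine names and URLs are made up.

```python
from collections import defaultdict


def merge_results(per_engine: dict[str, list[str]]) -> list[tuple[str, float]]:
    """Merge ranked URL lists from several engines into one ranking.

    A page that several engines rank highly accumulates the largest score.
    """
    scores = defaultdict(float)
    for urls in per_engine.values():
        for position, url in enumerate(urls, start=1):
            scores[url] += 1.0 / position
    # Highest score first; ties broken alphabetically for stable output.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


results = {
    "engine_a": ["https://a.example/1", "https://b.example/2", "https://c.example/3"],
    "engine_b": ["https://b.example/2", "https://a.example/1"],
    "engine_c": ["https://b.example/2", "https://d.example/4"],
}
for url, score in merge_results(results):
    print(f"{score:.2f}  {url}")
# 2.50  https://b.example/2
# 1.50  https://a.example/1
# 0.50  https://d.example/4
# 0.33  https://c.example/3
```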

ranger_danger

1 replies

2d18h

2024-04-05 23:50:22 UTC

that does not negate what OP said. your IP will still get blocked very quickly.

although existing searx instances have been run for years and they don't seem to be dropping like flies...

lannisterstark

0 replies

2d15h

2024-04-06 03:07:23 UTC

Well. I host a public instance. IP is still not blocked. YMMV.

wkat4242

0 replies

2d15h

2024-04-06 03:07:37 UTC

Isn't Kagi also really a delegator? I've heard they delegate to brave among others.

HeatrayEnjoyer

0 replies

2d19h

2024-04-05 23:15:53 UTC

Your IP address will get burned

baobun

4 replies

2d17h

2024-04-06 00:51:20 UTC

Only if you expose it publicly without auth while routing queries through your residential connection, which is not an advised configuration.

For personal use, you can run it directly on your machine or access over VPN. Queries to upstream search engines can be forwarded over proxies or VPNs as you see fit. Some work fine over tor and some can go over commercial or DIY tunnels.

ttt3ts

3 replies

2d16h

2024-04-06 01:49:15 UTC

To add, I have been running an instance for years for family and friends. I run it behind nginx basic auth with a config that sets a forever cookie the first time you log in. Really simple. Another good option is Cloudflare Zero Trust.
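For anyone unfamiliar with it, HTTP Basic authentication (RFC 7617) sends `base64(user:password)` in an `Authorization` header. That is why it should only be used over HTTPS. This is a minimal Python sketch of the check a reverse proxy performs. It assumes a single hard-coded credential pair for illustration. nginx itself reads hashed passwords from an htpasswd file, and this is not its code.

```python
import base64
import hmac
from typing import Optional

# Hypothetical credentials, hard-coded for illustration only.
EXPECTED_USER = "family"
EXPECTED_PASSWORD = "correct horse battery staple"


def basic_auth_ok(authorization: Optional[str]) -> bool:
    """Check an `Authorization: Basic <base64(user:password)>` header value."""
    if not authorization:
        return False
    scheme, _, encoded = authorization.partition(" ")
    if scheme.lower() != "basic":
        return False
    try:
        decoded = base64.b64decode(encoded, validate=True).decode("utf-8")
    except ValueError:  # covers binascii.Error and UnicodeDecodeError
        return False
    user, sep, password = decoded.partition(":")
    if not sep:
        return False
    # compare_digest avoids short-circuiting on the first mismatching byte.
    user_ok = hmac.compare_digest(user.encode(), EXPECTED_USER.encode())
    password_ok = hmac.compare_digest(password.encode(), EXPECTED_PASSWORD.encode())
    return user_ok and password_ok


good = "Basic " + base64.b64encode(b"family:correct horse battery staple").decode()
print(basic_auth_ok(good))                  # True
print(basic_auth_ok("Basic Zm9vOmJhcg=="))  # "foo:bar" -> False
```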

nickburns

2 replies

2d16h

2024-04-06 01:59:48 UTC

how many total regular F&F users? do they ever ask you about logging? or is that beyond the scope of what most of them realize is happening?

ttt3ts

1 replies

2d5h

2024-04-06 13:15:51 UTC

A ~dozen. Several are technical and use it because it includes several private and paid engines on request.

Config is in a git repo I give access to if requested. One of the technical users modified it to keep pretty minimal logs. I guess they are trusting me to actually use that config but trust is pretty high in the group so not really an issue.

nickburns

0 replies

1d19h

2024-04-06 23:24:00 UTC

very interesting. sounds like an enviable group. thanks for your reply.

BeetleB

0 replies

2d13h

2024-04-06 04:36:28 UTC

I think this needs a lot more clarification than is provided in this thread.

If you run it locally, and only you use it, then you won't get blocked - a given search engine will see about the same number of requests as if you used it directly.

Add a few house members and you'll still be fine.

(I ran the original searx for a year or two locally - no issues at all).

keepamovin

14 replies

2d11h

2024-04-06 07:15:43 UTC

If anyone is interested in searches applied to the full text of every page in your browser history, or to only select pages that you bookmark, check out our project DownloadNet (formerly, and possibly, futurely: "DiskerNet").

It hooks into your browser to give you an augmented experience. The UI is pretty simple (think 1997 era google but without CSS haha), and we don't do anything super complex with search (but could in future), but it works not bad. Check it out!!!

https://github.com/dosyago/DownloadNet

Oh, it also makes your content (again, either everything you browsed or only what you bookmarked) available offline. So if you work on an oil rig, in shipping, or in long-haul freight, it can be a good way to browse as normal but save yer satellite bandwidth!!!
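Full-text search over saved pages is typically built on an inverted index, which maps each word to the pages containing it. This is a toy Python sketch. It assumes simple lowercase alphanumeric tokenization and AND semantics for multi-word queries. It is not how DownloadNet is implemented, and the page data is made up.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z0-9]+", text.lower())


class InvertedIndex:
    """Map each word to the set of page URLs containing it."""

    def __init__(self):
        self.postings = defaultdict(set)

    def add(self, url: str, text: str) -> None:
        for word in tokenize(text):
            self.postings[word].add(url)

    def search(self, query: str) -> set[str]:
        # AND semantics: a page must contain every query word.
        words = tokenize(query)
        if not words:
            return set()
        result = set(self.postings.get(words[0], set()))
        for word in words[1:]:
            result &= self.postings.get(word, set())
        return result


idx = InvertedIndex()
idx.add("https://example.org/yacy", "YaCy is a peer-to-peer search engine")
idx.add("https://example.org/searxng", "SearXNG is a metasearch engine")
print(sorted(idx.search("search engine")))  # ['https://example.org/yacy']
print(sorted(idx.search("engine")))  # ['https://example.org/searxng', 'https://example.org/yacy']
```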

nickburns

9 replies

2d7h

2024-04-06 10:58:33 UTC

it's unclear to me why anyone, particularly anyone with even a passing interest in what the topic of this submission has to offer, would be even remotely interested in being the "master archivist of your own internet browsing."

i don't need anything else archiving anything related to my internet browsing except for my human brain. and yes, that's just me...

but how is the shameless plug of this not just therefore off-topic but diametrically-opposed-to-total-personal-privacy tool appropriate here?

keepamovin

8 replies

2d7h

2024-04-06 11:04:19 UTC

Is funny because this totally offline and locally hosted search engine in DownloadNet is potentially the most private of all.

I get it if you’re not interested, but I imagine people interested in locally hosted search-related solutions may be.

Your view is probably more personal and hard to support in general given this, and given the comment’s position and votes indicating at least some people are interested.

I totally understand why you wouldn’t want your browsing history archived anywhere. But that is what search engines do somewhat. It’s okay, everyone’s different.

nickburns

7 replies

2d7h

2024-04-06 11:07:47 UTC

none of that (most includingly comment position and votes) = privacy.

this tool is not relevant here.

keepamovin

6 replies

2d7h

2024-04-06 11:11:32 UTC

  none of that = privacy

  this tool is not relevant here

No it’s relevant. You don’t think self hosted and offline is private?

nickburns

5 replies

2d7h

2024-04-06 11:14:24 UTC

i self-host my human brain online in my own skull. i feed it and nurture it so that it can continue to perform and offer me the highest level of privacy i could possibly maintain.

keepamovin

3 replies

2d7h

2024-04-06 11:26:40 UTC

mental privacy, huh, nickburns? That’s an interesting concept.

there is no man in the desert. And no man needs nothing.

Tho I prefer the west coast of Zaire or Suid-Afrika myself.

nickburns

2 replies

2d6h

2024-04-06 11:29:37 UTC

  mental privacy, huh, nickburns? That’s an interesting concept.

you made your plug under the guise of asking if anyone had interest, i offered mine, and now i think we're done here, keepamovin.

keepamovin

1 replies

2d6h

2024-04-06 11:36:15 UTC

Seemed so

nickburns

0 replies

2d6h

2024-04-06 11:42:53 UTC

before you go... i apologize for being a dick about it. i'd have to really reflect some more on why it felt necessary to go about it in this way, which is inevitably a deeply personal reflection.

but if i may just say, privacy as a concept for a truly egalitarian society is something very near and critical in my opinion. marketing, on the other hand, is not.

good day to you, sir.

Art9681

0 replies

19h25m

2024-04-07 23:03:34 UTC

The entire field of Information Technology exists because the world disagrees that the thing in your skull has an acceptable level of intellectual and recall performance. And here we are on a social site dedicated to the pursuit of making it better. And so are you!

One of us, apparently.

DavideNL

2 replies

2d9h

2024-04-06 08:49:40 UTC

I don't see any documentation at all, like what browsers are supported, etc. :/

alexdeloy

1 replies

2d8h

2024-04-06 09:41:02 UTC

Me neither, it really would benefit from a better documentation since I like the idea a lot.

I just tried it out and it seems to be tied to Chrome. Since I use Firefox and Chromium as my daily drivers, this does not work for my case. I understand that they probably rely on some Chrome internals to dig through the content; a SOCKS proxy approach would have worked better and would have no need to switch between a "save" and "serve" mode. But then again, I was only scratching the surface because of the lack of browser support. Will keep an eye on this one though!

outofpaper

0 replies

2024-04-06 18:21:00 UTC

While it lacks a search feature last I checked there's always https://github.com/davidfstr/webcrystal

One .py file. Only one dependency (urllib3). With a little love, the concept could become a full transparent proxy.

Avamander

0 replies

2024-04-06 18:02:02 UTC

How is this better than YaCy?

arcastroe

11 replies

2d14h

2024-04-06 03:29:58 UTC

This is great. I wish there was a way to block certain domains from ever appearing on search results.

EDIT: Looks like there's already an open issue:

https://github.com/searxng/searxng/issues/2351
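The filtering itself is simple once you have a list. This minimal sketch assumes results are plain URLs and that blocking a domain should also block its subdomains, so `manualslib.com` also removes `www.manualslib.com`. It matches whole domain labels rather than substrings, so an unrelated site like `notpinterest.com` is not caught. The function names and the blocklist are hypothetical, not part of SearXNG.

```python
from urllib.parse import urlsplit

BLOCKED = {"pinterest.com", "manualslib.com"}


def is_blocked(url: str, blocked: set[str] = BLOCKED) -> bool:
    host = (urlsplit(url).hostname or "").rstrip(".")
    # Check the host and each parent domain: a.b.example.com -> b.example.com -> example.com
    labels = host.split(".")
    return any(".".join(labels[i:]) in blocked for i in range(len(labels)))


def filter_results(urls: list[str]) -> list[str]:
    return [u for u in urls if not is_blocked(u)]


results = [
    "https://www.manualslib.com/manual/123",
    "https://example.com/manual.pdf",
    "https://notpinterest.com/page",
    "https://pinterest.com/pin/1",
]
print(filter_results(results))
# ['https://example.com/manual.pdf', 'https://notpinterest.com/page']
```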

BeetleB

5 replies

2d13h

2024-04-06 04:36:51 UTC

Kagi does that.

lannisterstark

3 replies

2d13h

2024-04-06 05:24:33 UTC

I trust myself a bit more than I trust someone else to run my queries sadly. I understand that they claim to store no user data or associations etc, but honestly, it's just their word.

BeetleB

2 replies

2d12h

2024-04-06 06:12:03 UTC

  I understand that they claim to store no user data or associations etc, but honestly, it's just their word.

My guess is that if they are found to do so, then they open themselves up to lawsuits. Not collecting data isn't merely a perk - it's practically the reason Kagi exists.

zelphirkalt

0 replies

2d10h

2024-04-06 08:09:14 UTC

Even if not a lawsuit, people will judge it themselves and vote with their feet.

marginalia_nu

0 replies

2d8h

2024-04-06 10:03:56 UTC

Another big reason not to keep this stuff is just the cost of dealing with requests from law enforcement. At some point you start getting them.

If you don't have any logs you can just always say the princess is in another castle, since you can't provide data that doesn't exist.

If on the other hand you do have the requested information, you need to determine the validity of the request, and then extract the data; or refuse to comply and possibly put yourself at legal risk. For a smaller business that's probably a can of worms you'd rather avoid opening.

arcastroe

0 replies

2d13h

2024-04-06 04:55:17 UTC

Kagi requires an account to use, which is not great for privacy.

I understand Kagi is generally reputable, but I like the idea of a self-hosted alternative where you're in full control.

squarefoot

1 replies

2d14h

2024-04-06 03:51:08 UTC

uBlacklist does just that with Google and some other search engines. I use it with Firefox to filter out Pinterest junk from search results. Also available for Chrome and Safari.

https://github.com/iorate/ublacklist

poulpy123

0 replies

2d11h

2024-04-06 07:04:01 UTC

just installed it to try. For people who want to give it a try too: I noticed that several of the public lists contain legitimate websites such as Canva or Reddit.

dalf

1 replies

2d9h

2024-04-06 08:47:43 UTC

Disclaimer: I am one of the maintainers.

The intent of SearXNG is to be stateless (with no sessions on the server) and to work without JavaScript.

However, this approach limits certain features because of the restricted size of cookies (and other forms of browser storage require JavaScript).
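Some numbers make the cookie constraint concrete. RFC 6265 asks browsers to support at least 4096 bytes per cookie, counting name, value, and attributes. Any per-user setting, such as a domain blocklist, has to fit in that budget alongside the other preferences. This rough sketch assumes a hypothetical encoding: comma-separated domains, zlib-compressed, then URL-safe base64. It is not SearXNG's preference format, and the domain list is made up.

```python
import base64
import zlib

COOKIE_LIMIT = 4096  # per-cookie size browsers are asked to support (RFC 6265)


def encode_blocklist(domains: list[str]) -> str:
    raw = ",".join(sorted(domains)).encode()
    return base64.urlsafe_b64encode(zlib.compress(raw, 9)).decode()


def decode_blocklist(value: str) -> list[str]:
    raw = zlib.decompress(base64.urlsafe_b64decode(value))
    return raw.decode().split(",") if raw else []


def fits_in_cookie(name: str, value: str) -> bool:
    # Rough check: name + "=" + value, ignoring attributes such as Path or Max-Age.
    return len(name) + 1 + len(value) <= COOKIE_LIMIT


domains = [f"spam-site-{i}.example" for i in range(100)]
value = encode_blocklist(domains)
print(len(value), fits_in_cookie("blocked", value))
```

Compression buys headroom for repetitive lists, but a long list of unrelated domains will eventually exceed the limit. That is one reason such a feature fits more naturally in server-side, per-instance configuration.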

arcastroe

0 replies

1d21h

2024-04-06 21:13:54 UTC

Thank you, that makes a lot of sense. Stateless is very good for privacy and I agree with that approach for a multi-user instance, (which I suppose is the most common use-case).

I'm picturing more of an instance-wide configuration of domain blocks for a private, single-user, self-hosted instance. But I understand this may not be the intended use of the project.

ThinkBeat

0 replies

2d9h

2024-04-06 09:03:38 UTC

Google used to do that, but then stopped. You can still do it manually by excluding them in every search you do, but the list can get long and it is far from a good user experience.

Kagi has this feature built in and it is a good user experience.

You can also use the uBlacklist browser plugin. My problem with that is that it slows everything down. I am not certain, but I think all the work is done after the search is complete, i.e., it filters the actual results. The two above keep those sites from ever being part of the results.
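The manual approach mentioned above amounts to appending an exclusion operator for each unwanted domain. The engine then leaves those pages out instead of the browser hiding them afterwards. This minimal sketch assumes an engine that honours `-site:`. Google does, but support elsewhere varies. The helper name is hypothetical.

```python
def with_exclusions(query: str, blocked_domains: list[str]) -> str:
    """Append a -site: operator for each blocked domain to a search query."""
    exclusions = " ".join(f"-site:{d}" for d in blocked_domains)
    return f"{query} {exclusions}" if exclusions else query


print(with_exclusions("thinkpad x220 manual pdf", ["manualslib.com", "pinterest.com"]))
# thinkpad x220 manual pdf -site:manualslib.com -site:pinterest.com
```

As noted, this gets unwieldy as the list grows, and engines cap how long a query can be.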

ozehlaw

5 replies

2d19h

2024-04-05 22:56:51 UTC

Interesting. Google is going to shit these days.

mrexroad

1 replies

2d14h

2024-04-06 03:52:23 UTC

Web content itself has gone to shit these days, in order to win Google’s SEO game to win Google’s AdSense game. “Google going to shit” is just a second-order effect (or third/fourth, depending on how you look at it).

prmoustache

0 replies

2d1h

2024-04-06 17:27:46 UTC

The good content has not disappeared. So it is still Google going to shit if it can't make out what is good and what isn't, which was the reason people started using it in the first place 25 years ago.

YvUuXJiO

1 replies

2d15h

2024-04-06 03:16:49 UTC

i dont think so

nickburns

0 replies

2d15h

2024-04-06 03:22:59 UTC

i don't either. it's done been to shit for some time now. scraping its results can be highly effective though.

ijijijjij

0 replies

2d7h

2024-04-06 11:24:50 UTC

Google search has gone to shit since Google+ .... or more precisely, when they removed the plus operator in Google search around 2011. And no, the quotes aren't as good.

My bet is that Google will become "Google TV" and search won't be possible. They will just show you what they want. They'll probably frame it as "AI knows what you want to see".

Maybe they should ban Google instead of TikTok (I don't use either though).

keepamovin

4 replies

2d14h

2024-04-06 04:05:42 UTC

That's clever. X-ING (like those 'crossing' roadsigns), so it's like Search-ching.

There's quite some similarity between the CH and the X sound in English.

But, as this is HN probably someone with a PhD in comparative phonetics will explain why this is a common and infuriating misunderstanding of layfolken.

lannisterstark

1 replies

2d13h

2024-04-06 05:27:05 UTC

^_^'

Hah. FWIW, it's a fork of searX. https://github.com/searx/searx

keepamovin

0 replies

2d11h

2024-04-06 07:12:59 UTC

Hahaha! :) Good to know

yunohn

0 replies

2d6h

2024-04-06 12:09:29 UTC

X is pronounced as ch in Catalan too.

https://spanish.stackexchange.com/questions/16203/use-of-x-i....

nickburns

0 replies

2d7h

2024-04-06 11:25:52 UTC

the abbreviation "ng" in software development/forking/maintenance evolution denotes 'next generation.'

lygten

3 replies

2d12h

2024-04-06 06:01:37 UTC

Try this guy. It's not Kagi, but the search results are pretty good. Host it yourself on Docker.

    https://felladrin-minisearch.hf.space/

zuhsetaqi

1 replies

2d12h

2024-04-06 06:28:00 UTC

Wow, it's annoying that it reloads the results while I'm going through them

lygten

0 replies

2d9h

2024-04-06 08:38:20 UTC

You can disable the AI

boudin

0 replies

2d6h

2024-04-06 11:35:57 UTC

It seems to be based on SearXNG https://github.com/felladrin/MiniSearch

skoocda

1 replies

1d19h

2024-04-06 22:39:08 UTC

First few instances I tried are either returning no results, or only DDG results.

Error! Engines cannot retrieve results:

brave ( Suspended: too many requests )

google ( Suspended: too many requests )

qwant ( server API error )
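Messages like "Suspended: too many requests" indicate that the instance has stopped querying that engine for a while after being rate-limited. This minimal sketch shows the pattern with exponential backoff: the delay doubles on each consecutive failure, up to a cap. The class, the 60-second base, and the one-hour cap are hypothetical, not SearXNG's implementation or its configured suspension times.

```python
class EngineSuspension:
    """Track when an upstream engine may be queried again after rate-limit errors."""

    def __init__(self, base: float = 60.0, cap: float = 3600.0):
        self.base = base
        self.cap = cap
        self.failures = 0
        self.suspended_until = 0.0

    def available(self, now: float) -> bool:
        return now >= self.suspended_until

    def record_too_many_requests(self, now: float) -> float:
        # Double the wait on each consecutive failure: 60s, 120s, 240s, ... up to the cap.
        self.failures += 1
        delay = min(self.cap, self.base * 2 ** (self.failures - 1))
        self.suspended_until = now + delay
        return delay

    def record_success(self) -> None:
        self.failures = 0


engine = EngineSuspension()
print([engine.record_too_many_requests(now=0.0) for _ in range(8)])
# [60.0, 120.0, 240.0, 480.0, 960.0, 1920.0, 3600.0, 3600.0]
print(engine.available(now=100.0), engine.available(now=3600.0))
# False True
```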

johnklos

0 replies

1d3h

2024-04-07 14:42:53 UTC

I've run into exactly that. By putting my SearXNG instance on a machine that also does NAT for a busy network, this can be avoided. This is definitely one instance where IPs from a colo are a bad thing and residential IPs are a good thing ;)

bilegeek

1 replies

2d19h

2024-04-05 23:05:22 UTC

OOC, does this support YaCy as another engine? Would be the best of all worlds if it did IMO.

hagbard_c

0 replies

2d18h

2024-04-05 23:30:55 UTC

It does, I run it that way with an optional fan-out to my personal YaCy instance. Here's the relevant part of settings.yml:

   - name: yacy
     engine: yacy
     categories: general
     search_type: text
     base_url: https://yacy.searchlab.eu
     shortcut: ya
     disabled: true
     # required if you aren't using HTTPS for your local yacy instance
     # https://docs.searxng.org/dev/engines/online/yacy.html
     # enable_http: true
     # timeout: 3.0
     # search_mode: 'global'

Change 'disabled' to 'false' and point it at whatever YaCy instance you want to use. It can use the 'general' and 'images' categories.

ranguna

0 replies

2d7h

2024-04-06 10:55:08 UTC

Lots of people publicly host searx instances. There's a list of publicly available instances online, but if you are looking for a tool that randomly redirects you to an instance for every search you do on your browser's bar, you can use neocities: https://searx.neocities.org/changelog

I use this all the time. A downside is that sometimes you land on an instance that doesn't provide any results or gives you really poor ones. This has been happening less frequently recently.
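The random-redirect trick is easy to reproduce locally. This minimal sketch assumes a hand-maintained list of instance base URLs; the ones below are placeholders, not real instances. It also assumes the instances accept GET requests on SearXNG's `/search?q=` endpoint. It picks a random instance per query and builds the URL a browser search shortcut would open.

```python
import random
from typing import Optional
from urllib.parse import urlencode

# Placeholder instance list; in practice you would maintain your own.
INSTANCES = [
    "https://searx.example.org",
    "https://search.example.net",
    "https://metasearch.example.com",
]


def random_search_url(query: str, rng: Optional[random.Random] = None) -> str:
    rng = rng or random.Random()
    base = rng.choice(INSTANCES)
    return f"{base}/search?{urlencode({'q': query})}"


print(random_search_url("yacy settings.yml"))
```

Keeping your own list lets you drop instances that keep returning empty or poor results, which is the downside mentioned above.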

nickburns

0 replies

2d17h

2024-04-06 00:55:16 UTC

i use it when i'm tired of picking obfuscated Fumo plushies or Minecraft screencaps on https://4get.ca/. i don't even know what a Fumo plushie is, never mind six of them.

longitudinal93

0 replies

2d19h

2024-04-05 23:15:49 UTC

Been running this as the default on all my devices for the past year and I couldn't be happier. It has only choked twice, and it just needed to be updated to be back in business.

kristjank

0 replies

2d10h

2024-04-06 08:13:30 UTC

It's my favourite tool to average out the individually shitty results of mainstream search engines into something vaguely usable.

johnklos

0 replies

1d3h

2024-04-07 14:50:08 UTC

SearXNG is great, with a few caveats:

Running it on a machine that also does NAT for many other machines helps to prevent getting blocked by upstream search engines like DuckDuckGo. It'd be good if access to certain upstream search engines could be sent through, say, a proxy set up elsewhere to prevent this very, very common problem, if you can't run it from an IP used for other things.
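SearXNG's settings.yml already has an `outgoing` section that accepts proxy settings; check the documentation for the exact keys in your version. As a plain illustration of the routing described here, this sketch maps engine names to proxy pools and rotates through each pool. Engines without a pool connect directly. The engine names, proxy URLs, and `ProxyRouter` class are all hypothetical.

```python
import itertools
from typing import Optional


class ProxyRouter:
    """Pick an outgoing proxy per upstream engine, rotating within each pool."""

    def __init__(self, pools: dict[str, list[str]], default: Optional[str] = None):
        self._cycles = {engine: itertools.cycle(urls) for engine, urls in pools.items() if urls}
        self.default = default  # None means "connect directly"

    def proxy_for(self, engine: str) -> Optional[str]:
        cycle = self._cycles.get(engine)
        return next(cycle) if cycle else self.default


router = ProxyRouter({
    "duckduckgo": ["socks5h://10.0.0.2:1080", "socks5h://10.0.0.3:1080"],
    "google": ["http://proxy.example.net:3128"],
})
print([router.proxy_for("duckduckgo") for _ in range(3)])
# ['socks5h://10.0.0.2:1080', 'socks5h://10.0.0.3:1080', 'socks5h://10.0.0.2:1080']
print(router.proxy_for("wikipedia"))
# None
```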

I'd like to figure out how to have a mode where my search is 100% literal - where every word I type must be in the search results exactly as I type them. Perhaps that's the equivalent of putting a "+" in front of each word, and putting each word in quotes? It's annoying that my words are constantly getting changed for me because there aren't many results, which I expressly don't want.
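Quoting each word, as guessed here, is the usual workaround. Google and several other engines treat a double-quoted term as an exact match, though none guarantees it. This minimal sketch rewrites a query that way and leaves existing quoted phrases and operators such as `site:` or `-word` untouched. The tokenization rules are assumptions.

```python
import re


def literal_query(query: str) -> str:
    """Wrap every bare word in double quotes; keep quoted phrases and operators as-is."""
    parts = []
    for token in re.findall(r'"[^"]*"|\S+', query):
        if len(token) > 1 and token.startswith('"') and token.endswith('"'):
            parts.append(token)  # already an exact phrase
        elif token.startswith("-") or ":" in token:
            parts.append(token)  # operator such as -word or site:example.org
        else:
            word = token.strip('"')  # drop stray, unbalanced quotes
            if word:
                parts.append(f'"{word}"')
    return " ".join(parts)


print(literal_query('rare widget "service manual" site:example.org -ebay'))
# "rare" "widget" "service manual" site:example.org -ebay
```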

Like mentioned elsewhere, I want to be able to explicitly exclude certain domains. I get that SearXNG wants to be stateless, but I could either configure a separate URL for it or simply configure it for all searches. For instance, if I search for a PDF manual for something, I never, ever, ever want to see anything from "manualslib.com" and sites like it.

Other than these things which'd be nice to address, I'd say running SearXNG and encouraging people to use it instead of Google has worked quite well :)

fasa99

0 replies

2d2h

2024-04-06 15:41:49 UTC

People always sell SearX, but myself, I'm a fan of presearch.com. I have no affiliation with them whatsoever or financial interest. I have no interest in their crypto-based business model. In fact, I think their lack of Google- or Bing-style search result filtering is entirely due to lack of funding and/or prioritizing other things more important to success, not due to taking a stand on free speech or anything like this. And that's perhaps how it was in the early days of the internet, when Maslow's hierarchy of corporate needs focused on trying to make the thing work versus public relations goodfeels and presenting only rightspeech.

Anyway, if I'm looking for some topic I believe google would be known to filter heavily, or something esoteric, I take a look at presearch to get a second opinion. I'd also love to see archive.org do something similar, archive.org has an amazing collection of data, poorly indexed and poorly searchable.

dalf

0 replies

2d2h

2024-04-06 16:18:21 UTC

A few years ago, I remember someone conducted a study on the quality of SearX(NG) results using different Internet providers: mobile, fiber, and VPN.

I'm not sure if this person is still active on HN, but I'm really curious about the results.

cess11

0 replies

2d7h

2024-04-06 10:48:59 UTC

Been using SearX/SearXNG a lot over the years, in large part because I used and preferred the Dogpile meta search many, many years ago.

Apparently Dogpile still exists, didn't expect that: https://en.wikipedia.org/wiki/Dogpile

RGamma

0 replies

2d9h

2024-04-06 09:17:38 UTC

There's also https://metager.org