This takes me back. Before Google, meta search tools increased your odds of finding a decent answer among the spammy results from AltaVista, HotBot, Lycos, etc.
Run this on your personal internet connection and you risk not being able to use a search engine.
Elaborate?
Virtually all public search engine endpoints see an insane amount of bot activity, often several queries per second.
If you delegate queries to e.g. Google or Bing at that rate, you'll be IP-blocked in a heartbeat.
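To put "several queries per second" in perspective, here is a minimal sketch of the kind of per-engine throttle a self-hosted metasearch proxy could put in front of each upstream engine. The `TokenBucket` class and the rate and burst numbers are illustrative assumptions, not something SearXNG is claimed to ship.

```python
import time


class TokenBucket:
    """Hypothetical per-engine throttle: bursts of up to `capacity` queries,
    refilled at `rate` queries per second."""

    def __init__(self, rate: float, capacity: float, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity  # start with a full bucket
        self.clock = clock      # injectable so the behaviour can be tested
        self.last = clock()

    def try_acquire(self) -> bool:
        """Spend one token and return True if a query may go upstream now."""
        now = self.clock()
        # Refill in proportion to elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


# e.g. at most one query every 2 seconds to a given engine, with a burst of 3
google_bucket = TokenBucket(rate=0.5, capacity=3)
```

A bot fleet doing several queries per second per IP would drain a bucket like this instantly, which is roughly what upstream engines are reacting to.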
Search engines: they scrape the web, but get narky when scraped themselves.
The difference is that a crawler paces its requests, respects robots.txt and rate limits, and doesn't typically invoke 50-100MB of disk I/O per request.
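A minimal sketch of the "paces the requests, respects robots.txt" part, using Python's standard `urllib.robotparser`. The robots.txt body, the URLs, and the `mybot` user agent are made-up examples, and the one-second fallback delay is an assumption.

```python
from urllib.robotparser import RobotFileParser

# Example robots.txt body (made up for illustration).
ROBOTS_TXT = """\
User-agent: *
Crawl-delay: 2
Disallow: /search
"""


def load_rules(text: str) -> RobotFileParser:
    """Parse a robots.txt body without fetching it over the network."""
    rp = RobotFileParser()
    rp.parse(text.splitlines())
    return rp


def plan_fetches(rp: RobotFileParser, user_agent: str, urls, default_delay: float = 1.0):
    """Keep only URLs robots.txt allows, each paired with the pause to take before it."""
    delay = rp.crawl_delay(user_agent)
    pause = float(delay) if delay is not None else default_delay
    return [(url, pause) for url in urls if rp.can_fetch(user_agent, url)]
```

Note that a search engine's own `/search` path is exactly what many sites disallow, which is the asymmetry being complained about.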
Like, I don't mind automated access to my search engine; I even offer a public API to that effect, which you can in fact hook into SearXNG. What I mind is when one jabroni with a botnet decides their search traffic is more important than everyone else's and grabs all the compute for himself via a sybil attack.
Ah duh, for some reason my mind didn't go to hosting the search instance locally and I misunderstood.
btw thank you for Marginalia! The spirit of the small web is very important to me.
It is a metasearch engine, so it uses other search engines. The point is to let multiple people use it, so that Google et al. don't know who's using their service. I.e. it is a glorified proxy.
Honestly, I just use Kagi. Though I need to find some way to limit my searches to 300 per month.
that does not negate what OP said. your IP will still get blocked very quickly.
although existing searx instances have been run for years and they don't seem to be dropping like flies...
Well. I host a public instance. IP is still not blocked. YMMV.
Isn't Kagi also really a delegator? I've heard they delegate to Brave among others.
Your IP address will get burned
Only if you expose it publicly without auth while routing queries through your residential connection, which is not an advised configuration.
For personal use, you can run it directly on your machine or access it over a VPN. Queries to upstream search engines can be forwarded over proxies or VPNs as you see fit. Some work fine over Tor and some can go over commercial or DIY tunnels.
To add, I have been running an instance for years for family and friends. I run it behind nginx basic auth with a config that sets a forever cookie the first time you log in. Really simple. Another good option is Cloudflare Zero Trust.
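The nginx config itself isn't in the thread, so here is a hedged Python sketch of the two checks such a setup performs: validate HTTP Basic credentials once, then hand out a long-lived signed cookie so the browser stops prompting. This is not how nginx implements it; the user table, signing key, cookie name `auth`, and ten-year lifetime are all hypothetical.

```python
import base64
import hmac
from http.cookies import SimpleCookie
from typing import Optional

# Hypothetical credentials and signing key, for illustration only.
USERS = {"family": "correct horse battery staple"}
SIGNING_KEY = b"change-me"
TEN_YEARS = 10 * 365 * 24 * 3600  # "forever", in seconds


def check_basic_auth(header: str) -> Optional[str]:
    """Return the username if an 'Authorization: Basic ...' value is valid."""
    scheme, _, encoded = header.partition(" ")
    if scheme.lower() != "basic":
        return None
    try:
        user, _, password = base64.b64decode(encoded).decode("utf-8").partition(":")
    except ValueError:  # covers bad base64 and bad UTF-8
        return None
    expected = USERS.get(user)
    if expected is not None and hmac.compare_digest(password.encode(), expected.encode()):
        return user
    return None


def sign(user: str) -> str:
    mac = hmac.new(SIGNING_KEY, user.encode(), "sha256").hexdigest()
    return f"{user}.{mac}"


def login_cookie(user: str) -> str:
    """Build a Set-Cookie value that effectively never expires."""
    cookie = SimpleCookie()
    cookie["auth"] = sign(user)
    cookie["auth"]["max-age"] = TEN_YEARS
    cookie["auth"]["httponly"] = True
    cookie["auth"]["secure"] = True
    return cookie["auth"].OutputString()


def cookie_is_valid(value: str) -> bool:
    """Accept the cookie only if its signature matches the embedded username."""
    user, _, _ = value.rpartition(".")
    return hmac.compare_digest(sign(user).encode(), value.encode())
```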
how many total regular F&F users? do they ever ask you about logging? or is that beyond the scope of what most of them realize is happening?
A ~dozen. Several are technical and use it because it includes several private and paid engines on request.
Config is in a git repo I give access to if requested. One of the technical users modified it to keep pretty minimal logs. I guess they are trusting me to actually use that config but trust is pretty high in the group so not really an issue.
very interesting. sounds like an enviable group. thanks for your reply.
I think this needs a lot more clarification than is provided in this thread.
If you run it locally, and only you use it, then you won't get blocked - a given search engine will see about the same number of requests as if you used it directly.
Add a few house members and you'll still be fine.
(I ran the original searx for a year or two locally - no issues at all).
If anyone is interested in searches applied to the full text of every page in your browser history, or to only select pages that you bookmark, check out our project DownloadNet (formerly, and possibly, futurely: "DiskerNet").
It hooks into your browser to give you an augmented experience. The UI is pretty simple (think 1997 era google but without CSS haha), and we don't do anything super complex with search (but could in future), but it works not bad. Check it out!!!
https://github.com/dosyago/DownloadNet
Oh, it also makes your content (again, either everything you browsed or only what you bookmarked) available offline. So if you work on an oil rig, or in shipping, or long-haul freight, it can be a good way to browse as normal but save yer satellite bandwidth!!!
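For anyone curious what "search applied to the full text of every page" involves at its simplest, here is a toy inverted index in Python. It is a hypothetical illustration of the general technique, not DownloadNet's actual code; the tokenizer and AND-only query semantics are assumptions.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    """Lowercase and split on anything that isn't a letter or digit."""
    return re.findall(r"[a-z0-9]+", text.lower())


class PageIndex:
    """Toy inverted index: word -> set of URLs whose saved text contains it."""

    def __init__(self):
        self.postings = defaultdict(set)

    def add(self, url: str, text: str) -> None:
        for word in tokenize(text):
            self.postings[word].add(url)

    def search(self, query: str) -> set[str]:
        """Return URLs containing every query word (AND semantics)."""
        words = tokenize(query)
        if not words:
            return set()
        result = set(self.postings.get(words[0], set()))
        for word in words[1:]:
            result &= self.postings.get(word, set())
        return result
```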
it's unclear to me why anyone, particularly anyone with even a passing interest in what the topic of this submission has to offer, would be even remotely interested in being the "master archivist of your own internet browsing."
i don't need anything else archiving anything related to my internet browsing except for my human brain. and yes, that's just me...
but how is the shameless plug of this not just therefore off-topic but diametrically-opposed-to-total-personal-privacy tool appropriate here?
It's funny, because this totally offline and locally hosted search engine in DownloadNet is potentially the most private of all.
I get it if you’re not interested, but I imagine people interested in locally hosted search-related solutions may be.
Your view is probably more personal and hard to support in general given this, and given the comment’s position and votes indicating at least some people are interested.
I totally understand why you wouldn’t want your browsing history archived anywhere. But that is what search engines do somewhat. It’s okay, everyone’s different.
none of that (most includingly comment position and votes) = privacy.
this tool is not relevant here.
No it’s relevant. You don’t think self hosted and offline is private?
i self-host my human brain online in my own skull. i feed it and nurture it so that it can continue to perform and offer me the highest level of privacy i could possibly maintain.
mental privacy, huh, nickburns? That’s an interesting concept.
there is no man in the desert. And no man needs nothing.
Tho I prefer the west coast of Zaire or Suid-Afrika myself.
you made your plug under the guise of asking if anyone had interest, i offered mine, and now i think we're done here, keepamovin.
Seemed so
before you go... i apologize for being a dick about it. i'd have to really reflect some more on why it felt necessary to go about it in this way, which is inevitably a deeply personal reflection.
but if i may just say, privacy as a concept for a truly egalitarian society is something very near and critical in my opinion. marketing, on the other hand, is not.
good day to you, sir.
The entire field of Information Technology exists because the world disagrees that the thing in your skull has an acceptable level of intellectual and recall performance. And here we are on a social site dedicated to the pursuit of making it better. And so are you!
One of us, apparently.
I don't see any documentation at all, like what browsers are supported, etc. :/
Me neither, it really would benefit from a better documentation since I like the idea a lot.
I just tried it out and it seems to be tied to Chrome. Since I use Firefox and Chromium as my daily drivers, this does not work for my case. I understand that they probably rely on some Chrome internals to dig through the content; a SOCKS proxy approach would have worked better and wouldn't need to switch between a "save" and a "serve" mode. But then again, I only scratched the surface because of the lack of browser support. Will keep an eye on this one though!
While it lacks a search feature, last I checked, there's always https://github.com/davidfstr/webcrystal
One .py file. Only one dependency (urllib3). With a little love the concept could become a full transparent proxy.
How is this better than YaCy?
This is great. I wish there was a way to block certain domains from ever appearing on search results.
EDIT: Looks like there's already an open issue:
Kagi does that.
I trust myself a bit more than I trust someone else to run my queries sadly. I understand that they claim to store no user data or associations etc, but honestly, it's just their word.
My guess is that if they are found to do so, then they open themselves up to lawsuits. Not collecting data isn't merely a perk - it's practically the reason Kagi exists.
Even if not a lawsuit, people will judge it themselves and vote with their feet.
Another big reason not to keep this stuff is just the cost of dealing with requests from law enforcement. At some point you start getting them.
If you don't have any logs you can just always say the princess is in another castle, since you can't provide data that doesn't exist.
If on the other hand you do have the requested information, you need to determine the validity of the request, and then extract the data; or refuse to comply and possibly put yourself at legal risk. For a smaller business that's probably a can of worms you'd rather avoid opening.
Kagi requires an account to use, which is not great for privacy.
I understand Kagi is generally reputable, but I like the idea of a self-hosted alternative where you're in full control.
uBlacklist does just that with Google and some other search engines. I use it with Firefox to filter out Pinterest junk from search results. Also available for Chrome and Safari.
Just installed it to try. For others who want to give it a try: I noticed that several of the public lists contain legitimate websites such as Canva or Reddit.
Disclaimer: I am one of the maintainers.
The intent of SearXNG is to be stateless (with no sessions on the server) and to work without JavaScript.
However, this approach limits certain features because of the restricted size of cookies (and other forms of browser storage require JavaScript).
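To make the cookie constraint concrete: if a stateless instance kept a per-user domain blocklist in the browser, the whole list would have to fit in a cookie. This is a hypothetical sketch, not how SearXNG stores its preferences. It packs the list as compressed JSON in URL-safe base64 and checks it against the 4096 bytes per cookie that RFC 6265 asks browsers to support at minimum.

```python
import base64
import json
import zlib

# RFC 6265 (section 6.1): browsers should support at least 4096 bytes per cookie,
# measured over name, value and attributes. Treat it as the practical ceiling.
COOKIE_LIMIT = 4096


def encode_blocklist(domains: list[str]) -> str:
    """Pack a domain blocklist into a cookie-safe string."""
    raw = json.dumps(sorted(set(domains)), separators=(",", ":")).encode()
    return base64.urlsafe_b64encode(zlib.compress(raw, 9)).decode()


def decode_blocklist(value: str) -> list[str]:
    return json.loads(zlib.decompress(base64.urlsafe_b64decode(value)))


def fits_in_cookie(name: str, value: str, limit: int = COOKIE_LIMIT) -> bool:
    # Counts only "name=value"; real Set-Cookie attributes eat into the budget too.
    return len(f"{name}={value}".encode()) <= limit
```

A handful of domains fits easily; a few thousand distinct ones do not, even compressed, which is the limit the maintainer describes.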
Thank you, that makes a lot of sense. Stateless is very good for privacy and I agree with that approach for a multi-user instance (which I suppose is the most common use case).
I'm picturing more of an instance-wide configuration of domain blocks for a private, single-user, self-hosted instance. But I understand this may not be the intended use of the project.
Google used to do that, but then stopped. You can still do it manually by excluding them in (every) search you do, but the list can get long and it is far from a good user experience.
Kagi has this feature built in and it is a good user experience.
You can also use the uBlacklist browser plugin. My problem with that is that it slows everything down. I am not certain, but I think all the work is done after the search is complete, i.e. it filters the actual results. The two above keep those sites from ever being part of the results.
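The manual "exclude them in every search" workaround amounts to appending a `-site:` operator per domain, which Google and several other engines accept. A minimal sketch; the function names are hypothetical, and how many operators a given engine honours per query varies.

```python
from urllib.parse import urlencode


def exclude_domains(query: str, blocked: list[str]) -> str:
    """Append a '-site:' exclusion for each blocked domain to the query."""
    return " ".join([query.strip(), *(f"-site:{d}" for d in blocked)])


def google_search_url(query: str, blocked: list[str]) -> str:
    """Build a Google search URL with the exclusions baked into the query."""
    return "https://www.google.com/search?" + urlencode({"q": exclude_domains(query, blocked)})
```

Every excluded domain makes the query longer, which is why this gets unwieldy compared with a built-in blocklist.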
Interesting. Google is going to shit these days.
Web content itself has gone to shit these days, in order to win Google’s SEO game to win Google’s AdSense game. “Google going to shit” is just a second-order effect (or third/fourth, depending on how you look at it).
The good content has not disappeared. So it is still Google going to shit if it can't tell what is good and what isn't, which was the reason people started using it in the first place 25 years ago.
i dont think so
i don't either. it's done been to shit for some time now. scraping its results can be highly effective though.
Google search has gone to shit since Google+ .... or more precisely, when they removed the plus operator in Google search around 2011. And no, the quotes aren't as good.
My bet is that Google will become "Google TV" and search won't be possible. They will just show you what they want. They'll probably frame it as "AI knows what you want to see".
Maybe they should ban Google instead of TikTok (I don't use either though).
That's clever. X-ING (like those 'crossing' roadsigns), so it's like Search-ching.
There's quite some similarity between the CH and the X sound in English.
But, as this is HN probably someone with a PhD in comparative phonetics will explain why this is a common and infuriating misunderstanding of layfolken.
^_^'
Hah. FWIW, it's a fork of searX. https://github.com/searx/searx
Hahaha! :) Good to know
X is pronounced as ch in Catalan too.
https://spanish.stackexchange.com/questions/16203/use-of-x-i....
the abbreviation "ng" in software development/forking/maintenance evolution denotes 'next generation.'
Try this guy. It's not Kagi, but the search results are pretty good. Host it yourself on Docker.
https://felladrin-minisearch.hf.space/
Wow, it's annoying that it reloads the results while I'm going through them
You can disable the AI
It seems to be based on SearXNG: https://github.com/felladrin/MiniSearch
First few instances I tried are either returning no results, or only DDG results.
Error! Engines cannot retrieve results:
brave ( Suspended: too many requests )
google ( Suspended: too many requests )
qwant ( server API error )
I've run in to exactly that. By putting my SearXNG on a machine that also does NAT for a busy network, this can be avoided. This is definitely one instance where IPs from a colo are a bad thing and residential IPs are a good thing ;)
OOC, does this support YaCy as another engine? Would be the best of all worlds if it did IMO.
It does, I run it that way with an optional fan-out to my personal YaCy instance. Here's the relevant part of settings.yml:
  - name: yacy
    engine: yacy
    categories: general
    search_type: text
    base_url: https://yacy.searchlab.eu
    shortcut: ya
    disabled: true
    # required if you aren't using HTTPS for your local yacy instance
    # https://docs.searxng.org/dev/engines/online/yacy.html
    # enable_http: true
    # timeout: 3.0
    # search_mode: 'global'
Change 'disabled' to 'false' and point it at whatever YaCy instance you want to use. It can use the 'general' and 'images' categories.
Lots of people publicly host searx instances. There's a list of publicly available instances online, but if you are looking for a tool that randomly redirects you to an instance for every search you do from your browser's address bar, you can use the one on Neocities: https://searx.neocities.org/changelog
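Roughly what such a redirector has to do, as a hedged sketch: pick an instance at random and build its search URL. The instance URLs below are placeholders (a real tool would pull them from a maintained list), and the `/search?q=` path is the usual SearXNG search endpoint; nothing here describes how searx.neocities.org is actually implemented.

```python
import random
from typing import Optional
from urllib.parse import urlencode

# Placeholder instances; swap in real ones from a maintained public list.
INSTANCES = [
    "https://searx.example.org",
    "https://search.example.net",
    "https://sx.example.com",
]


def random_search_url(query: str, rng: Optional[random.Random] = None) -> str:
    """Pick an instance at random and build its /search URL for `query`."""
    chooser = rng if rng is not None else random
    base = chooser.choice(INSTANCES).rstrip("/")
    return f"{base}/search?{urlencode({'q': query})}"
```

Because the choice is random per search, a dead or rate-limited instance only costs you one query, which matches the occasional empty result pages mentioned in the reply.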
I use this all the time. A downside is that sometimes you land on an instance that doesn't provide any results or gives you really poor ones. This has been happening less frequently recently.
i use this when i'm tired of picking out obfuscated Fumo plushies or Minecraft screencaps on https://4get.ca/. i don't even know what a Fumo plushie is, never mind six of them.
Been running this as the default on all my devices for the past year and I couldn't be happier. It has only choked twice, and it just needed to be updated to be back in business.
It's my favourite tool to average out the individually shitty results of mainstream search engines into something vaguely usable.
SearXNG is great, with a few caveats:
Running it on a machine that also does NAT for many other machines helps to prevent getting blocked by upstream search engines like DuckDuckGo. It'd be good if access to certain upstream search engines could be sent through, say, a proxy set up elsewhere to prevent this very, very common problem, if you can't run it from an IP used for other things.
I'd like to figure out how to have a mode where my search is 100% literal - where every word I type must be in the search results exactly as I type them. Perhaps that's the equivalent of putting a "+" in front of each word, and putting each word in quotes? It's annoying that my words are constantly getting changed for me because there aren't many results, which I expressly don't want.
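Quoting every word is the closest common stand-in for the old `+` operator. A hedged sketch of the query rewrite only; `literal_query` is a hypothetical helper, and whether each upstream engine actually honours quotes strictly varies by engine.

```python
def literal_query(text: str) -> str:
    """Quote every word so engines that honour quotes must match each one as typed."""
    words = (w.strip('"') for w in text.split())
    return " ".join(f'"{w}"' for w in words if w)
```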
Like mentioned elsewhere, I want to be able to explicitly exclude certain domains. I get that SearXNG wants to be stateless, but I could either configure a separate URL for it or simply configure it for all searches. For instance, if I search for a PDF manual for something, I never, ever, ever want to see anything from "manualslib.com" and sites like it.
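For a private single-user instance, an instance-wide domain filter on the merged results could be as simple as the sketch below. It is hypothetical code, not SearXNG's API (its plugin documentation is worth checking for a built-in hostname filter first); the blocklist and the result shape `{"url": ...}` are assumptions. It matches the domain itself and any subdomain, but not lookalikes such as `notmanualslib.com`.

```python
from urllib.parse import urlsplit

BLOCKED = {"manualslib.com"}  # hypothetical blocklist


def is_blocked(url: str, blocked=BLOCKED) -> bool:
    """True if the URL's host is a blocked domain or any subdomain of one."""
    host = (urlsplit(url).hostname or "").lower().rstrip(".")
    return any(host == d or host.endswith("." + d) for d in blocked)


def filter_results(results, blocked=BLOCKED):
    """Drop results whose 'url' points at a blocked domain."""
    return [r for r in results if not is_blocked(r["url"], blocked)]
```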
Other than these things which'd be nice to address, I'd say running SearXNG and encouraging people to use it instead of Google has worked quite well :)
People always sell SearX, but personally, I'm a fan of presearch.com. I have no affiliation with them whatsoever and no financial interest. I have no interest in their crypto-based business model. In fact, I think their lack of Google- or Bing-style search result filtering is entirely due to lack of funding and/or prioritizing other things more important to success, not due to taking a stand on free speech or anything like this. And that's perhaps how it was in the early days of the internet, when Maslow's hierarchy of corporate needs focused on trying to make the thing work versus public relations goodfeels and presenting only rightspeech.
Anyway, if I'm looking for some topic I believe Google would be known to filter heavily, or something esoteric, I take a look at Presearch to get a second opinion. I'd also love to see archive.org do something similar; archive.org has an amazing collection of data, but it is poorly indexed and poorly searchable.
A few years ago, I remember someone conducted a study on the quality of SearX(NG) results using different Internet providers: mobile, fiber, and VPN.
I'm not sure if this person is still active on HN, but I'm really curious about the results.
Been using SearX/SearXNG a lot over the years, in large part because I used and preferred the Dogpile meta search many, many years ago.
Apparently Dogpile still exists, didn't expect that: https://en.wikipedia.org/wiki/Dogpile
There's also https://metager.org
memories. and now it's come back around some decades later in the name of digital privacy.
How are they private though? Unless they relay your search it is the exact opposite of private.
If you host your own instance:
From: https://docs.searxng.org/own-instance.html#how-does-searxng-...
The docs mention a caveat below at "What are the consequences of using public instances?":
All of that is fine but by simply having your IP, Google can continue to profile you in countless ways with data they collect in other ways and it wouldn't be expensive for them at all.
SearX acts as a proxy, you are not submitting your IP to Google.
We're talking about self hosting, right? The proxy is using the same IP.
if self-hosting, that may very well be correct.
a few examples of a self-hosted design that would not, include policy-based routing over a VPN with one or multiple tunneled hops, or through another external proxy. (and then there's also that 'onion' routing 'protocol' there—but i'm not clear if/how that integrates with clearnet destinations like publicly-accessible search engines if at all.)
i think since 'IP address' has become something of a baseline non-technical understanding of one of the critical components of networking, it becomes increasingly difficult for non-netpeeps to fully grasp the many uses and non-uses of addressing.
a proxy (or proxies) and how they can shield one or many of 'your' IP addresses throughout an egress packet's many hops (and from who or what destination it or those addresses can be shielded) is a pretty advanced concept when you think about it.
not to mention that, at this point, bare source IP address is a pretty dilute tracker compared to other current methods of identity profiling or traffic fingerprinting.
nice succinct correction on your part regardless.
I would assume that the relaying can strip the request from identifying information such as IP, cookies and other tracking mechanisms that you get when visiting e.g. google.com.
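A minimal sketch of that stripping, using an allowlist so that cookies, the browser's own User-Agent, and forwarding headers never reach the upstream engine. The allowlisted header names and the shared User-Agent string are assumptions, and SearXNG's actual behaviour may differ. Note it does nothing about the source IP: upstream still sees the relay's address, which is why a self-hosted instance on your own connection exposes your IP.

```python
# Hypothetical allowlist: only these client headers are passed upstream.
FORWARDED_HEADERS = {"accept", "accept-language"}
SHARED_USER_AGENT = "Mozilla/5.0 (generic)"  # hypothetical, same for every user


def upstream_headers(client_headers: dict[str, str]) -> dict[str, str]:
    """Copy only allowlisted headers and substitute a shared User-Agent."""
    headers = {k: v for k, v in client_headers.items() if k.lower() in FORWARDED_HEADERS}
    headers["User-Agent"] = SHARED_USER_AGENT
    return headers
```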
They do indeed relay your query. How else would they work?
privacy is achieved through the proxy and therefore aggregation of disparate requests/queries. some anonymity is therefore achieved, at least from the perspective of source search engine operators, by blending into 'the crowd.'
but the idea is not necessarily anonymity so much as privacy by foiling the creation of any even somewhat accurate marketing/data profile derived from 'your search.'
I liked Copernic: it was a native Windows 9x Meta search tool.
Wow. I came here to the comments to write about Copernic! It was a super valuable tool in the pre-Google era, heck even in the early 2000s.
This brought a smile to my face... I worked at a company started by Copernic alums (Coveo).
Northern Light was nice though.
Not just avoiding spam but some meta search engines (Dogpile IIRC) could also search specialist search engines like White Pages and Yellow Pages (long before Yelp etc existed). You'd be able to find business listings and contact info that wasn't normally found on web search engines. They could also include FTP search results which was useful as public anonymous FTPs had yet to fall from use.
Dogpile was another.