YaCy, a distributed Web Search Engine, based on a peer-to-peer network

ssijak
41 replies
5d10h

A long time ago I worked for a startup called Wowd, which built a distributed search engine. It was acquihired by Facebook.

One of the biggest issues was how to entice people to download and run the client/node.

I half wondered afterwards if slapping some crypto on top of it, which would be mined by running the node and providing resources, would help. My gut says easy yes, but my mind grimaces at the abomination.

lifty
23 replies
5d9h

Not sure why it would be an abomination. This is the exact use case which is a fit for cryptocurrency networks.

rakoo
22 replies
5d5h

You have to look beyond the surface. Cryptocurrencies work specifically to address a system where no node can trust any other node. If I cannot trust any other node, why would I fetch anyone else's index, or ask them for the results of a query, or even talk to them?

Unless there is a way to trivially verify what others tell you, cryptocurrencies are a dead end.

mhluongo
9 replies
5d5h

You have that issue without cryptocurrencies as well, you'd just be relying on the kindness of users rather than crypto incentives.

You always need a way to hold nodes accountable in a system like this, or it'll be rife with manipulation — because there's already a strong, innate incentive to manipulate results. Today, we call that industry "SEO".

rakoo
8 replies
5d5h

What you don't understand is that "I don't trust others" is not a terminal state. I'd rather build trust again and create human connections, or rather, put them front and center, because there are always connections; nothing works if you trust no one.

Building a societal system where you know you can rely on your peers, where you build together, is a more joyful, more resilient, more ecological, and also more realistic way of building a thriving society than the distrust-by-default that cryptocurrencies live for.

idiotsecant
5 replies
5d4h

Your current fiat currency is not based on love and trust. It's Proof-of-World-Hegemony, which puts crypto-based consensus mechanisms to shame in terms of how not based on love and trust it is.

rakoo
3 replies
5d2h

My current fiat currency is absolutely based on trust that the State will resolve any disagreement, even though I know it is not benevolent.

I also don't understand your point. The current world is not what I want, so let's make it worse according to my values?

idiotsecant
2 replies
5d1h

The value of, for example, the dollar is not based on your trust, at least not at the first order. It's based on the economic and military power backing it up.

rakoo
1 replies
4d21h

Absolutely it is: it is based on the trust we all have that the government will do whatever it takes to guarantee the value of a dollar. Me being able to do commerce with you in dollars and not, say, in old Zimbabwean dollars rests on the shared assumption that the US State can and will be there.

idiotsecant
0 replies
1d19h

We're playing with words now, but the power of the dollar does not arise from your faith in it; it arises from the nations around the world that view it as a stable medium of trade. This stability is based on the employment, and the perceived ability to employ, hegemonic hard and soft power. The US isn't going anywhere, because it has the guns, bullets, bombs, allies, trade, and diplomacy to stay in the top spot. Your faith is a product of that system, not a cause of it. Your faith won't build aircraft carriers.

Brian_K_White
0 replies
4d23h

Sure it is. When someone gives me a dollar, I have no idea if it's fake or stolen.

That sort of thing only gets handled very indirectly and much later and after a bad actor does their bad thing enough times for the surrounding greater population of good actors to notice a pattern.

Brian_K_White
1 replies
4d23h

And I think this is not even stupid either.

Bad actors exist and there must be some process for identifying and dealing with them, but they are not the majority of people and so probably don't have to be the first, last, primary, and only consideration at all times.

I.e., living in a bomb shelter is not a life worth living, even though, yes, you will be safe from bombs and thieves.

rakoo
0 replies
4d20h

Exactly. If I'll have to depend on someone else anyway (and I will), I might as well build trust, because a life spent being cautious about everything and everyone is not worth living. Only those with already vast amounts of money can afford it, because they trust (heh) other people working for them to take care of that, but to non-jokingly propose it as a standard for everyone is a dystopia.

zubairq
3 replies
5d2h

Interesting comment about how cryptocurrencies can enable a system where no node can trust any other node. Something for me to think about as I am building a peer to peer system (not a search engine though)

rakoo
2 replies
5d2h

Cryptocurrencies only help where no one can trust anyone. But if that's the case, I claim that such a system is not viable in the long term.

zubairq
1 replies
4d23h

Good point. Does this mean that Bitcoin is not viable in the long term?

rakoo
0 replies
4d20h

Bitcoin as a speculation tool lives as long as speculation can live. Bitcoin, or any cryptocurrency, as an actual currency exchanged at large scale will not work, or at least not in a democracy.

miohtama
2 replies
5d2h

Cryptocurrencies solve a spam problem, not a trust problem. No one can spam the network with new write data (transactions) because spam becomes expensive. Although people still do, and Ethereum is full of spam tokens, meaning the transaction cost is still too low. This was also the use case of Hashcash, the proof-of-work predecessor, which was designed to solve email spam.

You are paying either

- Block space: your transaction to be included in a block

- State: modifying the world state (EVM in Ethereum)

The trust problem is solved by various other means, usually at the libp2p level, by banning nodes (IP addresses) that send you bad data, which you can verify by comparing it to data from other peers.
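
Hashcash's mechanism is simple enough to sketch in a few lines: the sender must find a nonce whose hash has N leading zero bits, which is expensive to produce but takes one hash to verify. A toy Python sketch (not Hashcash's actual stamp format):

```python
import hashlib
from itertools import count

def mint(resource: str, bits: int = 16) -> int:
    """Find a nonce so that sha256(resource:nonce) has `bits` leading zero bits."""
    for nonce in count():
        digest = hashlib.sha256(f"{resource}:{nonce}".encode()).digest()
        if int.from_bytes(digest, "big") >> (256 - bits) == 0:
            return nonce

def verify(resource: str, nonce: int, bits: int = 16) -> bool:
    """One hash to check what took ~2^bits hashes to produce."""
    digest = hashlib.sha256(f"{resource}:{nonce}".encode()).digest()
    return int.from_bytes(digest, "big") >> (256 - bits) == 0

stamp = mint("alice@example.com", bits=16)
assert verify("alice@example.com", stamp, bits=16)
```

Raising `bits` makes minting exponentially more expensive while verification stays constant, which is the whole anti-spam lever.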

rakoo
0 replies
4d20h

Cryptocurrencies slow down the rate of data not because of spam but because a slower rate means a higher consistency across the network: cryptocurrencies' goal is to agree on a consistent state with peers who do not want to negotiate. If the consistent state is pure garbage then that is not a problem for blockchains, because from blockchains' point of view, everything is fine.

Spam is not a function of rate but of content. Spam can absolutely be sent in a blockchain, as you say, and making the price higher only makes both spam and non-spam more difficult. Spam for me might be actual legit information for you.

Hashcash is another beast, it only has the proof-of-work part, not the money part (contrary to its name) so it's not comparable.

dumbfounder
0 replies
4d23h

They also solve the trust problem through consensus using proof of stake. If there is enough financial skin in the game to behave correctly, then that should be enough to make sure that results are not tainted.

mattdesl
2 replies
5d2h

This seems like something that could be verified through ZK proofs. The data to search could be represented by a public Merkle root, and the searching/indexing given the user query could be programmed in a ZKVM like RISC0[1].

[1] https://www.risczero.com/zkvm
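
A toy sketch of the commitment half of that idea (plain hashing, not RISC0's API): the documents to search are hashed pairwise up into a single Merkle root that can be published, so any later claim about the data can be checked against it:

```python
import hashlib

def h(b: bytes) -> bytes:
    return hashlib.sha256(b).digest()

def merkle_root(leaves: list[bytes]) -> bytes:
    """Hash leaves pairwise until one root remains (duplicating an odd tail)."""
    level = [h(leaf) for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

docs = [b"page-1 contents", b"page-2 contents", b"page-3 contents"]
root = merkle_root(docs)
# Any change to a single document changes the root.
assert merkle_root([b"tampered", b"page-2 contents", b"page-3 contents"]) != root
```

The ZKVM part then proves "I ran this search over data whose root is X" without revealing the data itself.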

notfed
1 replies
5d1h

Most information is not a math equation.

mattdesl
0 replies
4d10h

As it turns out, a lot can be.

The concept of a ZK VM is that it is able to prove arbitrary code. Risc0 and the more recent SP1[1] both compile arbitrary Rust programs into ZK circuits for generating and verifying execution proofs.

[1] https://github.com/succinctlabs/sp1

lifty
1 replies
5d3h

You should be able to add incentives to the system so that people store the correct index. Check the incentive design of Filecoin for an example of how you can do that. Obviously, how the incentive mechanism should be built depends on the application.

rakoo
0 replies
5d2h

Filecoin is "easy": it is trivial to verify that the blob you stored is the one I wanted you to store. There is no trivial way to verify that you indexed what I wanted you to index, or that you reply what I wanted you to reply.

I highly dislike monetary incentives because they perpetuate inequalities by design, so here's another incentive: if you store a correct index, I will keep working with you and we can build an awesome system together. We can coordinate by talking to each other rather than trying to get money from each other.
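
The asymmetry described above can be made concrete with a toy content-addressing sketch: when the name of a blob is its hash, verifying storage is a single hash, while no equally cheap check exists for "did you index this correctly":

```python
import hashlib

def content_id(blob: bytes) -> str:
    # A CID-like handle: the hash *is* the name, so verification is one hash.
    return hashlib.sha256(blob).hexdigest()

def verify_storage(cid: str, returned_blob: bytes) -> bool:
    """Trivial check: does what you gave back hash to what I asked you to store?"""
    return content_id(returned_blob) == cid

blob = b"the exact bytes I asked you to store"
cid = content_id(blob)
assert verify_storage(cid, blob)
assert not verify_storage(cid, b"something else")
# No analogous one-liner exists for "did you index this correctly?";
# checking that would mean rebuilding the index yourself.
```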

worksonmine
10 replies
5d9h

but my mind grimaces at the abomination

Why would that be an abomination? It's a perfect use-case. Like you noticed, people need incentives to volunteer their hardware. If you hate crypto because it's crypto, you can just use fiat instead.

komali2
4 replies
5d8h

Like you noticed, people need incentives to volunteer their hardware.

I wonder if this is because "volunteer your hardware" projects sometimes involve someone else making money, and if someone else is making money but not you, why should you donate your hardware?

For the truly libre "hardware donation" projects, they seem to be doing ok without financial incentivization. What immediately comes to mind is the petabytes of data flying around on peer to peer systems through torrenting. I know people that spend thousands of dollars a year on upkeep and upgrades for what are essentially super seedbox homelabs (I'm one of them too :P )

There's also communities like soulseek where people keep TBs of music up, often seeking out rare tracks to make available to the community for free.

There's folding@home and seti@home, and I'm sure other similar projects I haven't heard of, where people donate cycles just for the common good.

folding@home is a great example because we can directly compare the people that are "incentivized" to participate with bananocoin, a cryptocurrency rewarded based on work cycles in folding@home. You can see all bananocoin miners under the banano.cc team here: https://stats.foldingathome.org/ That team is in first place for work completed, but it is only just surpassing the Linus Tech Tips team, and compared to a bunch of other teams (and private "donors") it accounts for a very small % of the work completed for folding@home.

So therefore I disagree that people "need" incentives, there just needs to be no, erm, disincentives, if that's a word.

shinryuu
3 replies
5d7h

I know people that spend thousands of dollars a year on upkeep and upgrades for what are essentially super seedbox homelabs

And then you end with "there just needs to be no disincentives". If anything spending thousands of dollars a year on upkeep should be a disincentive for most people. You are not most people though, since you do it voluntarily.

komali2
2 replies
5d7h

I'm a maniac though. I used to run my stack just fine off a Raspberry Pi with a USB hard drive plugged in.

Actually, before that, I used to run it off an old macbook.

Do we need it to be where everyone hosts a node? I just had this conversation with a friend yesterday, actually. We were in disagreement about the accessibility of self-hosting and federation. He was of the opinion that we should push LLMs to the point where anyone can type "I want to host a video hosting platform" and chatgpt.exe will find and install Jellyfin on their computer and set up a Cloudflare tunnel, or whatever.

I'm more of the opinion that we should increase the quality of documentation until the one person just weird and nerdy enough out of a group of 20 will be able to deploy things on leftover hardware, and share with their friends.

What do you think?

shinryuu
1 replies
4d20h

In terms of accessibility, I don't think it would be bad per se if chatgpt.exe were able to help you with that. Though both of us know that there is maintenance involved, and once something catches fire (which will happen at some point), you are kind of helpless.

Something like pikapods.com certainly helps with accessibility, even if it isn't self-hosting per se.

But all of that has little to do with incentives or disincentives. Even with very high accessibility there are disincentives to self-host. It will cost time and money in some way. For some people the intrinsic motivation will override those disincentives. But I think for the majority of people there will still not be enough motivation to do it.

There are more important things to do for them.

komali2
0 replies
4d15h

There are more important things to do for them.

Well yes, because right now society disincentivizes people from spending their time on anything that doesn't earn them at least a little bit of money. Kind of to my earlier point that "FOSS" projects with a monetization angle disincentivize people from contributing their time to make someone else money. Well, except for the fact that it's almost a requirement for people in certain geographies to have FOSS commits on their portfolio, due to economic disparity. Yay, free labor pool.

Should we actually leverage our technology to share the bounty of post-scarcity we could have today, don't you think people would spend more time on passion projects?

bawolff
4 replies
5d4h

I mean, how do you verify nodes are being honest and not just sending fake data for the free crypto (like what happened with seti@home, and there wasn't even money involved)?

Not to mention, where is the value of this coin going to come from? Will people pay to use this search engine? That seems unlikely.

It doesn't sound like the perfect use case to me.

numpad0
1 replies
5d2h

Agreed; it feels to me that people here are underestimating malice on the Internet. A simple crypto-based search credit system will be overtaken by fake queries and fake data.

I'm not entirely convinced that crypto-like reward mechanisms for distributed search are fundamentally flawed and unusable, but both the problem and the solution need to be refined a bit more.

worksonmine
0 replies
5d

Agreed; it feels to me that people here are underestimating malice on the Internet.

I don't think we do. We just prefer to put our trust in algorithms and verifiable data sources. It's not like Google et al are the pinnacle of altruism, there have been cases where the promoted results are faked copies of the actual site you want to visit, fooling less computer savvy users to install malware.

The trust is put into the code, same principle as reproducible builds. It doesn't matter where you get the source, as long as the checksum matches. This way the censor side of the problem is solved.

That leaves the spam, which isn't really solved by the big corporations either. Last time I used Google I got 2-3 pages of the same auto-generated bullshit on every technical search term I tried. This could be fixed by limiting the main index to trusted sites, at the expense of discovering new content. The latter can be handled by opt-in indexes. If the goal is to index everything, users could have their own filters for sites they don't want.

If you really want to spice it up allow me to maintain my own query function (dangerous and potential exploit yes) that I send to the nodes and I can handle my own ranking.

There's nothing that makes a distributed index more unsafe than one run by Google. If every query picks 2 random nodes and compares the results, I would trust that query more than the current Google execs' opinions of what I'm allowed to see.
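
The "pick 2 random nodes and compare" step could look roughly like this toy quorum check (the nodes here are mocked functions, not a real network):

```python
import random
from collections import Counter

def quorum_search(query, nodes, k=2, rng=random):
    """Query k random nodes; keep only results every sampled node agrees on."""
    sampled = rng.sample(list(nodes), k)
    answers = [nodes[name](query) for name in sampled]
    counts = Counter(r for ans in answers for r in set(ans))
    return [r for r, c in counts.items() if c == len(sampled)]

# Mock nodes: two honest, one that injects a malicious result.
honest = lambda q: ["result-a", "result-b"]
spammer = lambda q: ["result-a", "malware.example"]
nodes = {"n1": honest, "n2": honest, "n3": spammer}

agreed = quorum_search("some query", nodes, k=2)
# The injected result only survives if every sampled node returns it,
# so a lone bad node can't slip results past the comparison.
```

Raising k trades latency for stronger agreement, the same dial the comment is gesturing at.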

worksonmine
0 replies
5d3h

That's exactly why blockchain is a good choice. You verify that whatever X sends matches what Y and Z would send before any reward is received. Based on the shared index, every query should return the same results; kid's stuff, really.

The monetization is a nut to crack, yes, but Kagi works as a paywalled search engine. Otherwise, just serve ads like all the rest already do? Tried and proven model, and in this solution they could be very transparent, as there's no corporation behind it trying to dupe users for clicks to maximize profits. I even see the possibility of a hybrid model: don't like ads? Pay for the compute with your own coins.

The value comes from the network, trust and use-case. It doesn't have to be a new coin.

px43
0 replies
5d3h

By ignoring cryptocurrencies, you've missed out on over 10 years of progress in this space. We have things like zero-knowledge notaries and data availability sampling proofs. Actively Validated Services are also a thing: service providers stake some asset, and interested parties can challenge them at certain intervals to ensure that they are properly performing their duties. Through the magic of Merkle trees, and soon Verkle trees (basically Merkle trees, but using vector commitments for super fast proofs), challengers can demand that service providers generate a proof that some data they hold matches some criteria. The nice thing about it is that because it's a zero-knowledge proof, the challenger doesn't even need to know what that data is, and what they get back is a succinct proof that they can check very quickly, basically like checking an md5 sum for execution correctness.

It's cool shit, you should really look into it.
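
Minus the zero-knowledge part, the challenge mechanic described above can be sketched with a plain Merkle inclusion proof: the provider returns a leaf plus its sibling path, and the challenger re-hashes up to the committed root in O(log n) steps:

```python
import hashlib

def h(b: bytes) -> bytes:
    return hashlib.sha256(b).digest()

def build_tree(leaves):
    """All levels of the tree, bottom-up; levels[-1][0] is the root."""
    levels = [[h(l) for l in leaves]]
    while len(levels[-1]) > 1:
        lvl = levels[-1]
        if len(lvl) % 2:
            lvl = lvl + [lvl[-1]]
        levels.append([h(lvl[i] + lvl[i + 1]) for i in range(0, len(lvl), 2)])
    return levels

def proof(levels, index):
    """Sibling hash at each level for the given leaf index."""
    path = []
    for lvl in levels[:-1]:
        if len(lvl) % 2:
            lvl = lvl + [lvl[-1]]
        path.append((lvl[index ^ 1], index % 2))  # (sibling, am-I-the-right-child)
        index //= 2
    return path

def verify(leaf, path, root):
    node = h(leaf)
    for sibling, is_right in path:
        node = h(sibling + node) if is_right else h(node + sibling)
    return node == root

leaves = [b"doc-0", b"doc-1", b"doc-2", b"doc-3"]
levels = build_tree(leaves)
root = levels[-1][0]
assert verify(b"doc-2", proof(levels, 2), root)
assert not verify(b"doc-2", proof(levels, 1), root)
```

The real schemes wrap this in a ZK proof so the challenger learns only "it checks out", not the data itself.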

zoklet-enjoyer
2 replies
5d9h

We have proof of stake now. The nodes could be run by the chain validators and they get a cut of the staking rewards. Look up how proof of stake works on the Cosmos chain. You could totally do this and I bet it would take off, at least in that section of the Internet that's into Cosmos/Tendermint chains. I'd use it

zoklet-enjoyer
0 replies
5d

Hahaha one downvote. I love to see it

ssijak
0 replies
5d9h

I was definitely thinking of some kind of proof of stake, not proof of work.

mdaniel
0 replies
5d1h

Nothing new under the sun, as they say: https://www.presearch.io/engine and just as you said I was unwilling to run a closed-source node binary

colinsane
0 replies
5d

if the situation is really "nobody will run this software unless i pay them to", then you're doomed regardless. there's nothing wrong with the classic route: package your software for the stores/distros you're familiar with, make your software as easy to package as humanly possible for anyone else who'll come around, document the hell out of it, submit it to the handful of top-level news feeds from which it'll percolate, and then wait. maybe you don't like waiting?

6510
0 replies
5d9h

After that the issue becomes ranking. I should say became, since LLMs could both rank pages and generate them on "demand" to fit the query.

YaCy has so many buttons that I'm not even sure whether it lacks that, but playing around with it, it is very cool to crawl large amounts of pages and serve requests until you want to do other things with the computer and the background process is too bloated. Something like the turtle mode torrent clients have would be useful.

Long ago there was a Chinese p2p client with a rootkit that would seed at 1 kb. I haven't used it but was told it worked remarkably well.

vGPU
6 replies
5d9h

Has it gotten any better recently?

I run a node but I haven’t actually used it as a search engine in a while, as I found the result quality to be exceedingly poor.

rahen
3 replies
5d9h

I remember trying it for a while in 2012, but the results were essentially worthless, probably because there were so few nodes/crawlers back then. I guess the more users there are, the better the results.

WarOnPrivacy
1 replies
5d4h

I remember trying it for a while in 2012, but the results were essentially worthless,

I had mine crawling gov, mil, etc. sites for pages that Google was starting to delist back then. Inbound requests were heavy with porn until I tweaked - IDK, something.

Brian_K_White
0 replies
4d12h

"until I tweaked - IDK, something."

omg so much this.

I got an instance going in a TrueNAS Core jail, on FreeBSD, using FreeBSD's Java, not a Linux VM or Linux ABI compatibility. I had to make my own rc script.

Then I had to mess with the disk & RAM settings to get it to run for more than a day. But the settings are not actually explained at all, and whatever they do, they definitely don't do what their names and worthless tooltips say they do.

It seems to be running indefinitely now without killing either itself or the host, in full p2p mode, but I really have no idea why it's working, or for sure whether it fully is. I changed "idk, something".

And I don't use it for search myself so far. Maybe some day but for now I'm paying for kagi.

I just like the idea and want it to be a thing, and it seemed a little less "invite a world of shit and attention onto my ip" than running say a tor exit or something. Maybe only a bit less but I'll see how it goes and react if I need to.

viraptor
0 replies
5d6h

Alternatively, ignore the public network (it's still useless) and run it as your own crawler. Seed it with your browsing history, some aggregators like HN, your favourite RSS feeds, etc. and you'll be good.

Avamander
1 replies
5d4h

No.

Either it picks up too much garbage if you allow any P2P data exchange (you can't allow only outgoing, AFAIK), or it only knows about the sites you already know about. Which kinda defeats the purpose.

Even assuming you just want a specific index of your own content for yourself, it struggles to display useful snippets for the results, which makes it really tedious to sift through the already poor results.

If you try to proactively blacklist garbage, which is incredibly tedious because there's no quick "delete from index and blocklist" button under the index explorer, you'll soon run into an unmanageable blocklist; the admin interface doesn't handle long lists well. At some point (around 160k blocked domains) YaCy just runs out of heap during startup trying to load it, which makes the instance unusable.

It also can't really handle being reverse proxied (accessed securely by both the users and peers).

It also likes to completely deplete disk space or memory, so both have to be forcefully constrained. But that ends up with a nonfunctional instance you can't really manage. It also doesn't separate functionality enough that you could manually delete a corrupt index for example.

Running (z)grep on locally stored web archives works significantly better.

bobajeff
0 replies
4d23h

Those are pretty bad issues. I remember using it a long time ago and only remember the results being bad. I've heard that YaCy could be good for searching sites you've already visited, but it sounds like even that might not be a good use case for it.

I do understand the taking up of disk space thing. It's hard to store the text of all your sites without it taking up a lot of space unless you can intelligently determine which text is unique and desired. Unless you are just crawling static pages, it becomes hard to know what needs to be saved or updated.

b2bsaas00
6 replies
5d9h

Could this be used for a Torrent search engine?

worksonmine
2 replies
5d9h

Recently there was a distributed tracker on the front page. Probably more what you're looking for.

rakoo
0 replies
5d4h

Note that it's not a distributed tracker; it's an indexer/tracker/search engine that uses distributed resources (the nodes in the DHT).

feverzsj
1 replies
5d8h

btdig is still alive.

qingcharles
0 replies
5d1h

btdig has the data, but its search is subpar :(

fddrdplktrew
0 replies
5d9h

if it is not censored, probably?

renegat0x0
4 replies
5d4h

There are already many projects about search:

- https://www.marginalia.nu/

- https://searchmysite.net/

- https://lucene.apache.org/

- Elasticsearch

- https://presearch.com/

- https://stract.com/

- https://wiby.me/

I think all these projects are fun. I would like to see one succeed at reaching mainstream levels of attention.

I have also been gathering link metadata for some time. Maybe I will use it to feed an eventual self-hosted search engine, or language model, if I decide to experiment with that.

- domains for seed https://github.com/rumca-js/Internet-Places-Database

- bookmarks seed https://github.com/rumca-js/RSS-Link-Database

- links for year https://github.com/rumca-js/RSS-Link-Database-2024

wongarsu
0 replies
5d

To be fair, of those only Apache Lucene predates YaCy. YaCy is very mature, but in terms of relative popularity for general web search probably peaked around 15 years ago.

fsflover
0 replies
5d3h

But which of those projects are distributed and FLOSS?

DrDroop
4 replies
5d10h

I once went to a workshop on a Sunday morning at the local makerspace to listen to someone talk about some kind of distributed search engine or something like that. One of the developers came from (I think) Germany to explain this to us, the centralized sheeple. He just gave a demonstration of the thing, like here is the box where you type stuff and here are the results. When I started to ask questions about how it worked and all, he sort of acted annoyed, saying it was all too difficult to explain. This was more than ten years ago, and yes, I am still angry about it.

belter
0 replies
5d9h

160 bits ought to be enough for anybody :-)

albert180
1 replies
5d9h

It's probably him; YaCy is made by a German dude.

buffalobuffalo
3 replies
5d2h

I ran YaCy for a while, but not as a node on their distributed search index. I just ran it as a search engine for all my own bookmarks. Unfortunately I never found a particularly good way of getting bookmarks into the system. So eventually I shut it down. Cool idea in theory though.

justusthane
2 replies
4d15h

I have a plan that I haven't implemented yet: I want to route all my outbound internet traffic through a Squid proxy, which will in turn add every visited URL to YaCy (except for domains I choose to exempt).

That way I'll have a fully searchable index of every website I ever visit, which will hopefully solve the "Oh shit, what was that one website I found about X two months ago?" problem.

A potentially easier thing to do would be to create a bookmarklet that adds the current page to YaCy.
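
A rough sketch of pushing visited URLs into YaCy from such a proxy hook. Caveat: the endpoint and parameter names (`/Crawler_p.html`, `crawlingURL`, `crawlingDepth`) are assumptions based on YaCy's admin UI, so check your instance's crawler form before relying on them:

```python
from urllib.parse import urlencode

YACY = "http://localhost:8090"            # assumed local YaCy instance
SKIP = {"bank.example", "mail.example"}   # domains exempted from indexing

def crawl_request(url, depth=0):
    """Build the request that would ask YaCy to index a single visited page.
    Endpoint/parameter names are assumptions, not a verified API reference."""
    host = url.split("/")[2]
    if host in SKIP:
        return None
    params = urlencode({"crawlingURL": url, "crawlingDepth": depth})
    return f"{YACY}/Crawler_p.html?{params}"

req = crawl_request("https://news.ycombinator.com/item?id=1")
# urllib.request.urlopen(req) would then submit it to a running instance.
```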

buffalobuffalo
0 replies
4d3h

Yeah. Bookmark indexing was my original goal, but YaCy doesn't have a great interface for that. Doable with some work, but not something I wanted to sink too much time into.

arboles
2 replies
5d6h

Sort of hijacking the thread to ask: can YaCy, or something similar, be an alternative to Google's Programmable Search Engine? All I use it for is limiting a search to a medium-sized list of domains. The part that makes running a search engine difficult on your own is, I expect, the lack of resources for crawling. But since I only care about a small list of domains, could I ditch Google's and run my own crawler like YaCy?

gtirloni
1 replies
5d6h

Is that the deceased code search tool?

You could run Sourcegraph and import/sync those repositories.

Or you could run your own Elasticsearch/Meilisearch and crawl the websites yourself (if you're interested in things other than git repositories).

arboles
0 replies
5d5h

Is that the deceased code search tool?

No, it's Programmable Search Engine, though it's not actually programmable. I should've written Custom Search Engine instead; that's also a name for it.

cse.google.com - It's quaint that past the modern landing page, when using the search portal today, you still get some outdated iteration of Google UI design.

It's used, for example, for making OSINT searches.[0] Or at some point by at least one Wikipedia editor for a custom list of Reliable Sources for Anime & Manga.[1]

[0] https://www.osintme.com/index.php/2020/09/28/

[1] https://gwern.net/me#wikis

anthk
2 replies
5d4h

Ugh, Java. I'll wait for something like what i2pd is for I2P: a yacyd in C, C++, or Go.

ravenstine
1 replies
5d1h

What's your objection to Java?

anthk
0 replies
5d

High CPU and RAM usage.

WarOnPrivacy
2 replies
5d4h

Yacy's still around. Nice.

After a year or two of hosting a Yacy instance (2014?) I started winding up on some general (probes, etc) blacklists.

I also host a small mail server and I was getting mail returned. I'd force an IP swap and a few weeks later it'd be the same. I had to let Yacy go.

1oooqooq
1 replies
5d2h

So that is how they block a people's search/crawler. I didn't think they would use the most complicated method.

They also use block lists to add every single TOR node (even if not an exit) and every VPN under the sun (except for streaming, because why would they; that's why they exist).

WarOnPrivacy
0 replies
4d12h

So that is how they block a people's search/crawler.

It seems less that this was the intent and more that they blacklist IPs with behavior they find annoying. They're super general lists.

They also use block lists to add every single TOR node

This annoyed the crap out of me. The stupid Dan list guy made it as easy as he could to lump low-risk bridges in with high-risk exit nodes.

jrussbowman
1 replies
5d

Nice to see search projects are still popping up. After a move, family life taking over, and me getting more interested in Unreal Engine, my poor search engine is now more of an experiment in seeing how well it runs on the basically life-support maintenance updates I do. I'm starting to think I honestly should just take it down and save the $50 a month I spend maintaining it.

But I'll post it in a hacker news comment and maybe you all will give it enough traffic I can get excited about it again, lol

https://www.unscatter.com

jrussbowman
0 replies
5d

And for my immature moment of the day, the above comment was comment #69

gonesilent
1 replies
5d9h

Infrasearch / Gonesilent sold to Sun turned into project JXTA and died.

fortran77
1 replies
5d1h

Related to this: I'd love to see individuals making web pages again, and federated search engines indexing them. People don't make their own hobby or fan or art websites anymore, and I think that's partly because nobody will ever find them with the big search engines.

emrah
0 replies
5d1h

I think it would be nice if the search results were "distributed" rather than deterministic.

So when I enter the same keywords, let's say there are 50 pages, each of which would be an equivalently good result for the search; rather than one page "winning", the search engine would alternate the winner among the many possibilities.
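
One way to sketch that: treat every result whose score is within some epsilon of the best as equally good, and draw the "winner" at random from that pool (the epsilon and scores below are made up for illustration):

```python
import random

def pick_winner(scored_results, epsilon=0.05, rng=random):
    """scored_results: list of (url, score). Results within epsilon of the
    top score form a pool; the winner is drawn uniformly from that pool,
    so equally good pages share the top slot across repeated queries."""
    best = max(score for _, score in scored_results)
    pool = [url for url, score in scored_results if best - score <= epsilon]
    return rng.choice(pool)

results = [("a.example", 0.97), ("b.example", 0.95), ("c.example", 0.60)]
winners = {pick_winner(results) for _ in range(100)}
# a.example and b.example alternate as the winner; c.example never wins.
```

The same trick generalizes to shuffling the whole near-tied block rather than just the top slot.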

charcircuit
1 replies
5d9h

Are the results still being gamed by sites using content keyword stuffing? The last time I used it the searching and ranking technology felt like they were 40 years behind state of the art.

liotier
0 replies
5d4h

In distributed indexing, spam management seems a much bigger problem than the indexing itself.

boyter
1 replies
5d9h

I actually half wrote an RFC of a spec and 2 implementations of a federated search last year, rather than the distributed hash table that YaCy does.

I wanted results to be re-rankable by the peers by sharing the scores that went into them. The idea being that with a common protocol based on the ideas of ActivityPub, you could get peers of searchers working together to hopefully surface interesting things.

Something I should probably finish and publish at some point. It worked up to the hundreds of peers I tested.
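
A sketch of that re-ranking idea: if peers share the raw score components rather than one opaque score, the receiving peer can recombine them with its own weights (the component names here are hypothetical, not from the actual spec):

```python
def rerank(peer_results, weights):
    """peer_results: list of dicts with a url and raw score components.
    Because peers share the components, any node can recombine them
    with its own weights instead of trusting the sender's ranking."""
    def local_score(r):
        return sum(weights.get(k, 0.0) * v for k, v in r["components"].items())
    return sorted(peer_results, key=local_score, reverse=True)

peer_results = [
    {"url": "a.example", "components": {"text_match": 0.9, "freshness": 0.1}},
    {"url": "b.example", "components": {"text_match": 0.5, "freshness": 0.9}},
]
# A freshness-loving peer ranks b first; a text-focused peer ranks a first.
fresh_first = rerank(peer_results, {"text_match": 0.3, "freshness": 0.7})
text_first = rerank(peer_results, {"text_match": 0.9, "freshness": 0.1})
```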

The reason I mention this is because I wanted to also add a front end onto YaCy, which turned out to be harder than I expected. It's a wonderful project and you can find great stuff through it, but the way the peers return results, sometimes it's hard to find it again. It's also not quite as hackable as I would have hoped, probably due to the project's age.

I still think there is value in it though, and I'd love to see YaCy have its protocol explained as a spec so people could build implementations in other languages more easily.

detourdog
0 replies
5d1h

I remember the first days of gopher browsing were like that. Gopher browsing to me was like swinging from vine to vine. The trick was remembering/documenting where each vine went.

treprinum
0 replies
4d23h

Is it worth dedicating 1-2 low power NUCs (4-8 core) to this on a 250MBit/s connection? Or does it need beefier CPUs/network?

rasulkireev
0 replies
5d10h

Love it. Super easy to self host and use. Now I have a personal Google!

nairboon
0 replies
4d22h

If you run YaCy with docker and it is still a junior peer, does the search return results from the global index or just the one that appears to be 'preinstalled'?

maxloh
0 replies
5d9h

See also: Presearch, another decentralized search engine, which claimed it will be open source. No source code is available at the moment, though.

https://presearch.com/

fho
0 replies
5d

I've used it several times over the last decades and never got good results. I think one instance is still running on my old computer at uni :-)

dredmorbius
0 replies
5d

Previously:

YaCy – your own search engine | https://news.ycombinator.com/item?id=32597309 | 2 years ago | 93 comments

YaCy: Decentralized Web Search | https://news.ycombinator.com/item?id=22246732 | 4 years ago | 41 comments

YaCy – The Peer to Peer Search Engine | https://news.ycombinator.com/item?id=17089240 | 6 years ago | 3 comments

YaCy: a free distributed search engine | https://news.ycombinator.com/item?id=12433010 | 8 years ago | 24 comments

YaCy: Decentralized Web Search | https://news.ycombinator.com/item?id=8746883 | 9 years ago | 29 comments

YaCy takes on Google with open source search engine | https://news.ycombinator.com/item?id=3288586 | 12 years ago | 17 comments

RGBCube
0 replies
5d9h

    curl failed to verify the legitimacy of the server and therefore could not
    establish a secure connection to it. To learn more about this situation and
    how to fix it, please visit the web page mentioned above.
Can't seem to access the page.