Google is the only search engine that works on Reddit now, thanks to AI deal

jedberg
45 replies
1d3h

They changed robots.txt a month or so ago. For the first 19 years of life, reddit had a very permissive robots.txt. We allowed all by default and then only restricted certain poorly behaved agents (and Bender's Shiny Metal Ass(tm))

But I can understand why they made the change they did. The data was being abused.

My guess is that this was an oversight -- that they will do an audit and reopen it for search engines after those engines agree not to use the data for training, because let's face it, reddit is a for profit business and they have to protect their income streams.

JohnMakin
28 replies
1d3h

One (in this case, 2) company's incentive for profit should not take priority over the usability/well being of the internet as a whole, ever, and is exactly why we are where we are now. This is an absolutely terrible precedent.

jedberg
22 replies
1d3h

I agree with you in theory, but in practice someone has to pay for all this magic.

ToucanLoucan
12 replies
1d3h

We did. As in we, the Internet, existed for a long time without anyone making money and we paid for the privilege. Websites were built and hosted at owner's expense, for years, with no expectation that they be financially rewarded. Sure some would run donation drives, or work with sponsors relevant to the community in question, but a whole ton, mine included, just cost me a lot of money over many years.

Those websites were definitely technically inferior, as the march of progress is unavoidable, but web hosting is cheaper than it's ever been. A VPS that utterly blows away what mine was capable of in 2007 for nearly a hundred a month can now be had for about $10 per month. Yet everyone wants these monolith platforms, but even that wouldn't be the worst thing ever, except that every one of these platforms has a backend to support that we in the Old Internet never did: a C-suite's worth of executives and millions of shareholders, who for some reason have decided that reddit can't exist unless reddit makes them reams and reams of money.

I'd be very, very interested to see how much of, even what's probably the most massive one of all, Facebook, is non-essential busywork that could easily be shut down tomorrow with no adverse effects to the platform. Firstly the entire executive class, just, they don't do shit to make Facebook the product. In fact I'd argue their decisions almost universally have made it worse as a product very consistently for its entire lifetime. Then, all the marketing people. There's just no goddamn reason to advertise Facebook (or reddit for that matter); the brand is so ubiquitous, if you actually found someone who'd never heard of it, I'd give you a large chunk of money. Add to that, if Facebook was doing a good job of being what it ostensibly is, then people immediately become the best advertising, because people want to hang with people in these digital spaces. Then get rid of the people working to make Facebook addictive with dark patterns. Then get rid of the entire targeted ad division, because it's gross and inhumane. Pare the company down to engineers who build the product, and if anything, expand the moderation team so they can actually ensure the safety of the platform, and of course the IT staff to back them. Now what does Facebook cost to operate?

As far as I'm concerned, this pearl-clutching about "well websites have to make money" is grossly, grossly overstated. Websites don't cost that much to run. A ton of money is being siphoned off by the MBA parasites playing games in Excel all day. A ton more is being wasted developing features that advertisers want and users hate. A ton more is being funneled into making products artificially addictive to vulnerable people, to exploit them, so let's just not do that. And of course, leadership, rewarding themselves with generous compensation packages they aren't even remotely able to justify. Now what does your website cost to maintain? Surely not nothing, and for websites of substantial size, it will still be high, but I'm willing to bet it's a hell, hell, hell of a lot less than it was before.

kjkjadksj
6 replies
1d3h

Part of the issue is that it isn’t just the web, but the inevitable american corporate shareholder model. Even businesses could be mom and pop ified and made way more popular overnight: quit raising prices and cutting corners and it would actually stand for itself like a massive $7 burrito. However the expectation is that shareholders get returns. Costs must be cut. Prices must be raised. Margins must be improved. It doesn’t matter if this eats the business alive, as shareholders are sufficiently leveraged. The whole system is incentivized to select for inferior quality and taking all the available money on the table.

ToucanLoucan
5 replies
1d2h

My rant above and your response reminded me of all those tons of MMO games out there that are ancient, with a tiny playerbase, that remain profitable nonetheless simply because if you have a product that people like using, putting it into maintenance mode and doing the bare minimum to keep it running is a perfectly valid business strategy. The companies that buy these service games and run them effectively just buy completed money printers and keep them operating. It's not going to make anyone rich probably, but it's a perfectly valid and profitable way to go about things.

The silicon valley "grow at all costs, always evolve and innovate forever" model is so detached from the reality of most businesses in my experience.

Suppafly
2 replies
1d2h

The companies that buy these service games and run them effectively just buy completed money printers and keep them operating.

I hadn't really thought about that topic in that way before. Really explains why some of those older MMOs have no desire to really make any improvements, the owners are happy to just keep them powered up and collect a check but have no incentive to invest in making them better.

ToucanLoucan
1 replies
1d2h

I think the notion that sometimes things are just "done" is incredibly undervalued in our industry. Frankly I wish a ton of games I play would STOP updating.

Suppafly
0 replies
22h31m

I think the notion that sometimes things are just "done" is incredibly undervalued in our industry.

I agree, but also the flip side is that things rapidly switch from 'done and working' to 'dead' pretty quickly if no one is willing to do minor maintenance.

u8080
0 replies
1d1h

Yeah, like Rockstar with GTA V Online.

isoprophlex
0 replies
1d2h

In biology, you'd call that a cancer. But to people praising the gospel of VC money, it's something desirable...

lotsofpulp
4 replies
1d

Websites don't cost that much to run.

Popular websites that allow user content to be uploaded or linked do cost that much to run, due to content moderation.

There might be a small (relatively) forum here and there that a few good moderators are willing to slave away at keeping clean, but you will never see a website that allows user content with as many users as Reddit/Youtube/Instagram/etc be cheap.

Although, due to AI, the cost to spam the small forums might be so small that even they might come into the crosshairs.

account42
2 replies
4h54m

Popular websites that allow user content to be uploaded or linked do cost that much to run, due to content moderation.

Reddit outsourced most of its moderation to unpaid volunteers.

lotsofpulp
1 replies
4h18m

I am referring to moderation of child sexual abuse material and other legally problematic content. I assume volunteers do not handle that.

ToucanLoucan
0 replies
3h33m

I don't see how that would fall to different people in reddit's case. I'm sure reddit has some moderators on staff but the vast, vast majority of their moderation happens on the proverbial front lines, which is basically all volunteers. I would hope there's a dedicated abuse team at Reddit that are actually paid people whom the volunteers can kick the truly sick shit to so it can be properly dealt with, but given the corporate culture Reddit has shown over the years, I also wouldn't be awfully surprised if it's JUST down to the volunteers either.

megaman821
0 replies
1d

Although it is quite surprising that mainly text websites (Reddit, Twitter) are hard to run sustainably but video and image websites (YouTube, Instagram, TikTok) can because it is easier to sell ads against them.

JohnMakin
4 replies
1d3h

This is a false dichotomy. You can have services, and not have them devolve into complete unusability in the name of profit. This isn’t sustainable either. The myopic pursuit of short term gains at the expense of the product will collapse at some point in the future, no matter how much you believe in this weird frog-boil internet we’ve inherited now.

twelve40
1 replies
1d2h

Complete unusability is when ai tools clone the content and people stop visiting the original service and participating. I'll leave it up to them to defend blocking duck duck go for example, but blocking "AI" bots for an online community is a matter of survival at this point.

talldayo
0 replies
1d1h

Alternatively, it's because the base platform has also devolved into unusability. Both Reddit and Twitter are in a position where their info is easily scraped, and their community is barely worth the advertising/paid-premium experience they demand from you. As both platforms continue to decline in quality, you might not even need to replace the original service. Both businesses appear intent on getting replaced.

talldayo
1 replies
1d2h

The myopic pursuit of short term gains at the expense of the product will collapse at some point in the future,

The myopic pursuit of short-term gains is the only playbook that works. Long-term business strategy is a gamble, and today's businesses have all learned that they'd rather make hay when the sun is shining than be remembered as a good business.

Twitter tried a long-term playbook to reverse their unprofitable sinkhole of a website. That ended up with them being undervalued and sold to the highest bidder.

latexr
0 replies
7h36m

Twitter tried a long-term playbook to reverse their unprofitable sinkhole of a website.

From what I recall reading at the time, Twitter was finally becoming profitable before the sale (last two years? It’s hard to find a source now since every story since is about some shit show or other post sale).

That ended up with them being undervalued and sold to the highest bidder.

You make it seem they were in dire straits and had to be sold for scraps, but that’s far from the case. They sold for more than their valuation to the only bidder because they understood what a good deal it was for them. They forced the buyer to not back out, after all.

meiraleal
2 replies
1d2h

how can we keep paying the ever-growing profits of multi-trillion dollar companies? This is insane.

jsnell
1 replies
1d2h

Reddit is 100x from being a trillion-dollar company, and is not profitable.

meiraleal
0 replies
1d

Reddit offers no magic; it is just a forum. Google used to do some magic decades ago and still profits from it.

account42
0 replies
5h2m

People were paying for forums before Reddit came along.

BeetleB
4 replies
1d2h

I know people will hate to hear this, but Reddit is not important to the well-being of the Internet.

TeaBrain
2 replies
1d2h

I think it's the other way around, in that people don't like to hear how Reddit has become important due to the death of independent forums and the degree to which information has become concentrated on the site.

latexr
0 replies
7h45m

Independent forums, like RSS, are not dead. I use both every day.

BeetleB
0 replies
23h29m

The death of independent forums has been greatly exaggerated.

Of all the forums I used to be active in, many are still active. The ones that died did so because the community died (i.e. they did not shift to Reddit and the like).

Reddit is great simply because it allowed anyone to create a community. No need to get a LAMP stack and deal with security vulnerabilities in your forum SW.

These days you have Lemmy and its ilk. Much higher barrier than the old LAMP stack, but also much superior to it. I do hope it takes off.

account42
0 replies
5h3m

GP was not claiming that it is.

Closi
4 replies
1d2h

But I can understand why they made the change they did. The data was being abused.

Depends how you see it - if you see it as 'their' data (legally true) or if you see it as user content (how their users would likely see it).

If you see it as 'user content', they are actually selling the data to be abused by one company, rather than stopping it being abused at all.

From a commercial 'let's sell user data and make a profit' perspective I get it, although it does seem short-sighted to decide to effectively de-list yourself from alternative search engines (guess they just got enough cash to make it worth their while).

Ajedi32
2 replies
1d2h

if you see it as 'their' data (legally true)

Is that actually true? Reddit may indeed have a license to use that data (derived from their ToS), but I very much doubt they actually own the copyright to it. If I write a comment on Reddit, then copy-paste it somewhere else, can Reddit sue me for copyright infringement?

jedberg
1 replies
1d1h

They own a non-exclusive worldwide right to it. You own the copyright, they have a license to use it however they see fit.

account42
0 replies
5h8m

They own a non-exclusive worldwide right to it.

They demand that right. That doesn't mean they actually have a right to use the content in ways that are not directly required for the operation of the website or that are otherwise surprising to the average user. Putting something in the TOS doesn't always make it a valid contract.

passwordoops
0 replies
1d2h

Enough cash or enough data on hand to show the majority of traffic comes from the search monopoly

ekidd
3 replies
1d2h

I personally feel that this kind of "exclusive search only by Google deal" should result in an anti-trust case against Google. This is the kind of abuse of monopoly power that caused anti-trust laws to be passed in the 1890s.

eddd-ddde
2 replies
23h38m

if i create a vacuum cleaner and decide to only sell it at Walmart you can't get mad at me for not wanting to sell it at costco

you can always buy a competitor's or make your own vacuum cleaner if you hate buying at Walmart

maybe what you are really mad about is Reddit monopolising content

ffgjgf1
0 replies
11h52m

Unless you’re deemed to be an unfair monopoly and abuse your position to harm consumers interests.

You don’t even need >90% market share for that to be the case. For example, Standard Oil controlled at most 64% of the US market, and it was still broken up.

ekidd
0 replies
21h24m

Usually, to trigger any kind of anti-trust law, you need to have massive market share. In this case, for example, Reddit almost certainly hasn't committed any antitrust violations, because they're a relatively minor player in their market.

Similarly, if you start a vacuum cleaner company, you can make whatever exclusive deals you want. But if you control 80% of the market for vacuum cleaners, then you might need to be more careful about leveraging your market share in unfair ways.

If a company is part of a robust, competitive market (like Reddit), it's usually wiser to let customers vote with their wallets, and leave the government out of it. If a company becomes massively dominant (like Google or TicketMaster), and if it starts pushing exclusive contracts, it's much harder for customers to switch away.

ColinHayhurst
2 replies
1d3h

Person extensively quoted in the article here. They are welcome to reach out. But not a single person from any level did that, nor replied to my polite requests to explain and engage. We first contacted them in early June and by 13th June, I had escalated to Steve Huffman @spez.

toomuchtodo
1 replies
1d

An acquaintance investigating Reddit's moderation mechanization inquired how a major subreddit was moderated after an Associated Press post was auto removed by automod. They were banned from said sub. They inquired why they were banned, and they shared they would share any responses with a journalism org (to be transparent where any replies would be going, because they are going to a journalism org). They were muted by mods for 28 days and were "told off" in a very poor manner (per the screenshots I've seen) by the anonymous mod who replied to them. They were then banned from Reddit for 3 days after an appeal for "harassment"; when they requested more info about what was considered harassment, they were ignored. Ergo, inquiring as to how the mods of a major sub are automodding non-biased journalism sources (the AP, in this case) without any transparency appears to be considered harassment by Reddit. The interaction was submitted to the FTC through their complaint system to contribute towards their existing antitrust investigation of Reddit.

Shared because it is unlikely Reddit responds except when required by law, so I recommend engaging regulators (FTC, and DOJ at the bare minimum) and legislators (primarily those focused on Section 230 reforms) whenever possible with regards to this entity. They're the only folks worth escalating to, as Reddit's incentives are to gate content, keep ad buyers happy, and keep the user base in check while they struggle to break even, sharing as little information publicly as possible along the way [1] [2].

[1] https://www.bloomberg.com/news/articles/2024-05-09/reddit-la... | https://archive.today/wQuKM

[2] https://www.sec.gov/edgar/browse/?CIK=1713445

account42
0 replies
4h51m

non-biased journalism sources

No such thing, and definitely not the AP.

fredgrott
0 replies
1d3h

The article quotes Reddit's policy change: Reddit considers search and ads commercial activities and thus subject to the robots.txt block and exclusion.

account42
0 replies
5h11m

Ah so when reddit uses user content for monetization it's ok but when others do it then it isn't? Reddit may want that double standard but I think the only thing they are going to achieve with this stunt is more people ignoring robots.txt.

EasyMark
0 replies
14h17m

How was it being abused? You still clicked through to the information and saw the Reddit ads. Now they won't get any of that traffic from sites that rival Google. I guess they figured the 60 million was more than that ad revenue. Seems greedy but I don't think it's illegal like others are suggesting.

ColinHayhurst
0 replies
1d3h

The blocks for MojeekBot, a Cloudflare-verified and respectful bot for 20 years, started before the robots.txt file changes. We first noticed in early June.

We thought it was an oversight too at first. It usually is. Large publishers have blocked us when they have not considered the details, but then reinstated us when we got in touch and explained.

popcalc
31 replies
1d4h

  # Welcome to Reddit's robots.txt
  # Reddit believes in an open internet, but not the misuse of public content.
  # See https://support.reddithelp.com/hc/en-us/articles/26410290525844-Public-Content-Policy Reddit's Public Content Policy for access and use restrictions to Reddit content.
  # See https://www.reddit.com/r/reddit4researchers/ for details on how Reddit continues to support research and non-commercial use.
  # policy: https://support.reddithelp.com/hc/en-us/articles/26410290525844-Public-Content-Policy

  User-agent: *
  Disallow: /
Source: https://www.reddit.com/robots.txt
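
For anyone wondering what those two directives mean to a crawler that honors them, here is a minimal sketch using Python's standard `urllib.robotparser`. The bot names and the URL are only illustrative, and `is_allowed` is a hypothetical helper, not anything Reddit or a search engine ships. The file as quoted has no per-agent exceptions, so every compliant crawler gets the same answer.

```python
from urllib.robotparser import RobotFileParser

# The two directives from the robots.txt quoted above (comments omitted).
ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if a crawler honoring robots_txt may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # Illustrative agent names; none of them is listed in the file itself.
    for agent in ("Googlebot", "MojeekBot", "ExampleBot"):
        allowed = is_allowed(ROBOTS_TXT, agent, "https://www.reddit.com/r/programming/")
        print(f"{agent}: {'allowed' if allowed else 'disallowed'}")
```

This only models voluntary compliance. Nothing in robots.txt enforces anything; actual blocking happens elsewhere.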

immibis
13 replies
1d2h

Nobody who wants to be successful obeys robots.txt. And I do mean nobody.

chippiewill
9 replies
1d2h

They changed it to disallow so that scrapers can't just claim the robots.txt gave them permission.

tedivm
4 replies
1d

According to the US court systems the robots.txt file is meaningless. If they respond with a 200 status code giving you the access then you can legally scrape it all you want. If they require that you log in then you have to follow the terms you agree to when creating an account. Public means public though, and if Reddit doesn't want to make the content private (put it behind a login) then we can scrape away.

Note that scraping, regardless of the level of permission, doesn't mean you can do anything you want with the content. Copyright still applies. But you can scrape it, and if your use falls under Fair Use or another caveat to the copyright laws then you can go ahead and do it without needing any permission from the authors.

sssilver
3 replies
8h18m

Fascinating. Where can one learn more about this?

datavirtue
1 replies
4h26m

The internet.

deprecative
0 replies
3h52m

If you have nothing constructive to say why say anything?

neongreen
0 replies
3h4m

I liked the chapter on DMCA from the 5-volume E-Commerce & Internet Law. It was super detailed.

I haven’t read volume 1, but apparently half of it is about data scraping, and I expect it to be similarly detailed. So if I were you, that’s where I’d start.

Another option is looking for “robots.txt” at Google Scholar and trying various keywords like “legality”, “scraping”, “case law”, etc.

toomuchtodo
3 replies
1d2h

Independent scrapers can launder the data between Reddit and AI consumers. The only folks this hurts are users seeking info via search engines and folks willing to kowtow to rules that take little effort to evade. Next steps would be (from an adversarial perspective) browser extensions that stream back data for ingestion, similar to RECAP for PACER [1].

[1] https://free.law/recap/faq

(full disclosure: assisting someone pursuing regulatory action against reddit in the EU for a separate issue from scraping, it's a valuable resource, but the folks who own and control it are meh)

whycome
2 replies
16h19m

Scrapings laundering. Do we have a term for this?

throwaway4pp24
1 replies
13h40m

Yes, right in the law - "fair use"

account42
0 replies
6h43m

Even more basic, it's free speech. The data itself is public domain so your free speech is not restricted and you don't need fair use exemptions for those restrictions. Only the access through the official system is restricted.

maxnevermind
0 replies
17h35m

Hasn't the NYT tried to sue OpenAI for ignoring robots.txt? Or do you mean it's impossible to prove and/or it's still more profitable to just ignore robots.txt?

latexr
0 replies
7h59m

That’s a weird statement to be absolutist about. The majority of individuals and companies who want to be successful do not do so by scraping websites, thus have no reason to disobey robots.txt. Most people in the world, ambitious or not, wouldn’t even understand what your sentence refers to.

JohnFen
0 replies
22h36m

Sadly true. That's why I gave up on robots.txt years ago and started blocking crawlers outright in .htaccess

Of course, that became unsustainable so now I have everything behind a login wall.

dogleash
12 replies
1d2h

# Reddit believes in an open internet, but not the misuse of public content.

Calling it "public" content in the very act of exercising their ownership over it. The balls on whoever wrote that.

pas
8 replies
1d

it's even worse. it's not theirs (it's the users'), they are merely hosting it and using it (ToS gives them a fancy irrevocable license I guess).

so they can do whatever they want with it and the actual owners/authors have no chance to really influence Reddit at all to make it crawlable. (the GDPR-like data takeout is nice, but ... completely useless in these cases where the value is in the composition and aggregation with other users' content.)

visarga
3 replies
23h31m

the GDPR like data takeout is nice

Is there a way to export my history? How?

visarga
0 replies
15h2m

Thank you, it worked

Terr_
0 replies
15h28m

I tried it many months back when they glitch-killed my decade-plus account. (Yes, I'm still bitter over the kafkaesque injustice.)

Anyway, you basically submit a request and then later they will email you a link to a zip file that contains a few dozen CSV files with unescaped newlines. One for all the comments you made, one for up/down-votes, one for blocked users, etc.
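
If anyone wants to load those files, here is a minimal sketch with Python's `csv` module. It assumes the multi-line fields are at least wrapped in double quotes, which I can't confirm for the real export; if they are not quoted, this won't help. The column names `id` and `body` and the sample data are hypothetical.

```python
import csv
import io

# Hypothetical sample: two comments, the first with a raw newline inside its quoted body.
SAMPLE = 'id,body\r\nabc,"first line\nsecond line"\r\ndef,"single line"\r\n'


def read_rows(stream):
    """Return all rows as dicts; newlines inside quoted fields stay in the field."""
    return list(csv.DictReader(stream))


if __name__ == "__main__":
    # For a real file, open it with open(path, newline="", encoding="utf-8")
    # so that Python does not translate line endings before csv sees them.
    for row in read_rows(io.StringIO(SAMPLE, newline="")):
        print(row["id"], repr(row["body"]))
```

The `newline=""` part is what the `csv` documentation recommends; without it, embedded line breaks can be altered or misread.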

throwaway290
2 replies
20h25m

actually owners/authors like me would not want our stuff crawlable because that gives up our ownership.

When I am answering some random dude on reddit with a problem I want that dude to read my solution. I don't want this to be crawled and forever stored (probably deanonymized) or enshrined in a dozen commercial LLMs. There is substack for that stuff.

pas
1 replies
7h42m

have you heard about DMs?

throwaway290
0 replies
6h40m

That's why more and more exchanges are moving to DMs and closed communities. Because we don't plan on stuff we intend for participants in the discussion to be harvested but it more and more is. If reddit breaks that trend I only welcome it.

deepfriedbits
0 replies
1d

On top of that, a sizable chunk of Reddit content is ripped from elsewhere, whether videos, images, etc.

shit_game
1 replies
18h45m

Their license/EULA clearly states that Reddit has perpetual whatever to content posted on Reddit, but relying solely on DMCA for "stolen" content _yet again_ feels like a terrible way to deal with non-original content. Part of me hopes that Reddit gets hit with some new precedent-setting lawsuits regarding non-original content that requires useful attribution, but I doubt that will ever happen.

account42
0 replies
6h50m

An EULA does not change the morality of the situation anyway. They are a leech profiting off users generating content who are now upset about not getting a cut from third-parties also profiting from said user generated content.

Khelavaster
0 replies
31m

The Fake News police should shut down this sort of messaging

raverbashing
0 replies
1d1h

With the amount of crap in Reddit, cleaning it must be a very non-trivial problem. (I mean, it never is, but in the case of Reddit it's probably extra complicated)

Zuiii
0 replies
13h23m

We believe in something that we will now proceed to violate.

I will never take a statement given by a company that blatantly lies like this at face value going forward. What a bunch of clowns.

onlyrealcuzzo
25 replies
1d3h

This is an interesting development.

How many other sites might have leverage to charge to be indexed?

I don't want to live in a world where you have to use X search engine to get answers from Y site - but this seems like the beginning of that world.

From an efficiency perspective - it's obviously better for websites to just lease their data to search engines than for both sides to pay for tons of bandwidth and compute to get that data onto search engines.

Realistically, there are only 2 search engines now.

This seems very bad for Kagi - but possibly could lead to the old, cool, hobbyist & un-monetized web being reinvented?

WarOnPrivacy
17 replies
1d2h

Realistically, there are only 2 search engines now.

From the article:

     Many alternatives to GBY [Google, Bing, and Yandex] exist, but almost none of them have their own results;
This seems to assert that ~0 other search providers do any crawling at all. Ever. Are we sure that's the case?

   (they could crawl but never ever return those results == more odd).

MichaelZuo
9 replies
1d1h

Bing provides far fewer verbatim results for pretty much all search queries that I've tested.

And Yandex isn't much better for non cyrillic search, Baidu is only for the Chinese web effectively.

And all other search engines either don't even attempt to do full web crawls anymore and/or buy from one of the four above.

So realistically there's just one search engine for the full web that actually does the work.

dev1ycan
5 replies
1d

Brave has their own search engine, yandex I only use for reverse image search, baidu's interface is really clean and feels like old school google... but I don't speak chinese so I can't use it.

I hope that one day they get a western version

MichaelZuo
4 replies
22h59m

Brave doesn't have its own index of the full web, and it's even less useful than Yandex. And very likely buys some of it, according to what I've heard. So it falls into the last category.

em-bee
2 replies
20h10m

if that is true then they are lying on their site where they claim: "Brave Search operates from a fully independent search index"

do you have any reference for your claim?

i use brave search and find it very useful. very rarely there is something i can't find, and when i run into that other search engines are not much better.

MichaelZuo
1 replies
9h58m

Notice that it doesn't say "Brave Search solely operates from..." or "only operates from"?

Instead the wording leaves wiggle room for the possibility of using multiple.

em-bee
0 replies
52m

it would still be lying by omission at least

hn_go_brrrrr
0 replies
11h16m

Given the piles of spammy shit on Google these days, I'm wondering if "doesn't have its own index of the full web" is actually a competitive advantage.

WarOnPrivacy
2 replies
1d1h

And Yandex isn't much better for non cyrillic search,

I like Yandex when I'm rabbit-holing after obscure musicians/music. I routinely have a better experience than I do with DDG or Kagi or Goog.

freediver
0 replies
19h2m

Kagi also uses Yandex index so it would be unusual that something exists in Yandex and not in Kagi results.

MichaelZuo
0 replies
1d

It's also vastly better for finding livejournal blogs.

ColinHayhurst
3 replies
1d1h

It's a very long article so understandable that you did not read on and learn about other search engines crawling beyond GBY. Still there are indeed very few that are crawling at web scale, and internationally. We are at 8 billion pages and totally independent [0], hence expressing our concerns to 404 media after being blanked by Reddit.

[0] https://www.mojeek.com/about/why-mojeek

WarOnPrivacy
1 replies
1d1h

did not read on and learn about other search engines crawling beyond GBY. Still there are indeed very few that are crawling at web scale, and internationally

That's helpful clarification.

In criticism of the article, you might agree that

none of them have their own results

is a fairly absolute statement. It signals: Final word on the matter; no nuance to follow.

topaz0
0 replies
1d1h

Omitting the "almost" from "almost none" makes it sound disingenuously more absolute than it actually is.

shadowgovt
0 replies
1d

I mean, I didn't read on because it's paid.

I'm not taking their reporting without compensation, but that also means I didn't have the whole story. Such is life in this era of the internet.

tdeck
0 replies
13h29m

Aside: Does anyone know how the GBY term became a thing and why it includes Yandex but not Baidu?

darreninthenet
0 replies
20h2m

I believe Kagi has its own crawler as well and it merges all the results and does whatever Kagi does behind the scenes to show the mix

Yawrehto
0 replies
1d

Doesn't it list three major ones, Google, Bing, and Yandex, plus Mojeek and a few other small ones? That's a bit more than two.

McDyver
1 replies
1d1h

That seems like the business model for streaming. You subscribe to X provider to watch Y series. So, as for streaming, I suppose a pirate bay search engine will come up

toomuchtodo
0 replies
1d1h

Pirate Bay is probably not the most optimal analogy, more like Anna's Archive imho [1], individually offered by web property scrape runs compressed into a package, maybe served by torrents like this Academic Torrents site example [2].

Scraper engine->validation/processing/cleanup->object storage->index + torrent serving is a rough pipeline sketch.

[1] https://hn.algolia.com/?dateRange=all&page=0&prefix=false&qu... ("HN Search: annas archive")

[2] https://academictorrents.com/details/9c263fc85366c1ef8f5bb9d... ("AcademicTorrents: Reddit comments/submissions 2005-06 to 2023-12 [2.52TB]")
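
To make the stage boundaries concrete, here is a toy, in-memory sketch of that pipeline in Python. Everything in it is hypothetical: the "scraper" is just a list of dicts handed in by the caller, object storage is a content-addressed dict, the index is a plain inverted index, and torrent serving is left out entirely.

```python
import hashlib
import json
import re
from collections import defaultdict


def clean(record):
    """Validation/cleanup: normalize whitespace, drop records with no text."""
    text = " ".join(str(record.get("text", "")).split())
    if not text:
        return None
    return {"url": record.get("url", ""), "text": text}


def store(objects, record):
    """Object storage stand-in: key each blob by the SHA-256 of its content."""
    blob = json.dumps(record, sort_keys=True).encode("utf-8")
    key = hashlib.sha256(blob).hexdigest()
    objects[key] = blob
    return key


def index(inverted, key, record):
    """Index: map every lowercase word to the keys of the objects containing it."""
    for word in set(re.findall(r"[a-z0-9]+", record["text"].lower())):
        inverted[word].add(key)


def run_pipeline(scraped):
    """Run scraped records through cleanup, storage, and indexing."""
    objects, inverted = {}, defaultdict(set)
    for raw in scraped:
        record = clean(raw)
        if record is None:
            continue  # failed validation
        index(inverted, store(objects, record), record)
    return objects, inverted


if __name__ == "__main__":
    scraped = [
        {"url": "https://example.com/a", "text": "  robots.txt   changes "},
        {"url": "https://example.com/b", "text": ""},  # dropped by clean()
    ]
    objects, inverted = run_pipeline(scraped)
    print(len(objects), "object(s) stored;", sorted(inverted))
```

Content-addressed keys mean re-scraping the same page twice stores it once, which matters if the packages are meant to be redistributed.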

splwjs
0 replies
1d1h

idk man i bet you five bucks and a handshake it's just going to play out like the existing startup grift.

There's an established player with institutional protections, then a scrappy upstart takes a bunch of VC money, converts it into runway, gives away the product for free, gradually replaces and becomes the standard, then puts out an s-1 document saying "we don't make money and we never have, want to invest?" and then they start to enjoy all the institutional protections. Or they don't. Either way you pay yourself handsomely from the runway money so who cares.

The upstart gets indexed and has an API, the established player doesn't.

The upstart is more easily found and modular but the institutional player can refuse to be indexed to own their data and they can block their API to prevent ai slop from getting in and dominating their content.

gtirloni
0 replies
23h12m

> but this seems like the beginning of that world.

It's not the beginning, it's mere continuation.

Walled gardens have existed since the AOL days. They deteriorate over time but it doesn't prevent companies from trying (each time, in bigger attempts).

aAaaArrRgH
0 replies
13h20m

but possibly could lead to the old, cool, hobbyist & un-monetized web being reinvented?

It still exists. It just isn't that popular.

debacle
22 replies
1d3h

Reddit has been ripe for disruption for years. It's just waiting on an inflection point and someone to take it behind the barn.

bdw5204
7 replies
1d3h

The strange thing to me is how everybody keeps trying to make distributed Twitter happen when distributed Reddit is the low hanging fruit for federated social media.

You don't want to end up banned from a movies forum because you also participate in a political forum. Federation solves that problem because you can use separate accounts without either forum knowing that you also use the other.

Suppafly
1 replies
1d2h

The strange thing to me is how everybody keeps trying to make distributed Twitter happen when distributed Reddit is the low hanging fruit for federated social media.

Honestly, it's strange to me how hard people are trying to make distributed anything happen. Federation mostly solves a problem that real people don't have or care about.

Yawrehto
0 replies
23h59m

Honestly, it's strange to me how hard people are trying to make distributed anything happen.

IMO, something federation is very good at is solving one slow-moving problem - enshittification of social platforms. It's not immune, of course, but an Elon Musk-style takeover is much harder with Mastodon than Twitter, and it would be hard to run it into the ground in other ways because the platforms are owned by different people and groups.

teabee
0 replies
1d2h

Is this not just what the internet was before reddit? What features would "distributed reddit" have that an internet full of independent community forums be missing?

ravetcofx
0 replies
1d3h

This exists with Lemmy already and is fostering nice communities (and due to ActivityPub is interoperable with Mastodon accounts)

psunavy03
0 replies
1d2h

They had this years ago, and they were called "forums."

ks2048
0 replies
1d2h

I like it in principle, but after watching the situation with Twitter clones, I'm not too optimistic on federated services taking off.

I would like to see a wikipedia-style system for Twitter/Reddit: open access data, non-profit.

astrange
0 replies
19h12m

It's not possible because the most common problems with running a forum are spam and moderation, and both of those are too much work unless it's centralized.

crazygringo
6 replies
1d3h

The network effects are too strong.

Remember, the only reason Reddit "won" was because Digg destroyed itself with a radical upgrade that everyone hated.

Reddit would have to do something similarly self-inflicted, and I can't even guess where people would go. Reddit was already an alternative to Digg -- what's the alternative to Reddit? I mean, it's certainly not Quora.

NoMoreNicksLeft
1 replies
1d2h

It was already dead by then. Really, it was the various Slashdot exoduses... sites like K5 got large initial boosts, but stumbled and started to deteriorate. If the Digg exodus is what sent you to Slashdot, chances are you're the kind of user everyone else was trying to escape.

what's the alternative to Reddit? I mean, it's certainly not Quora.

If it was deliberate I certainly can't tell, but one of the characteristics of Reddit is that it caused so many other little tiny internet forums to just wither away. Most were visually unappealing, running some ancient phpbb software or whatever, but there were so many like stars in the night sky. Now, if they're even still running, you look for the newest post, and it will say "November 2023". Hell, the only reason they are still running is that the credit card number on file paying for hosting doesn't expire until next year somehow. Reddit is a red tide algae choking out all life in the ocean, nothing else gets to exist anymore.

bobajeff
0 replies
1d2h

I think you might be onto something with the observation about people moving from old forum software like phpbb to subreddits.

It's like what happened to personal websites when things like Blogger, Tumblr and Facebook popped up.

It's hard to beat something that is easy to set up and pays for hosting but still lets you control moderation. It's a no brainer.

Managing your own domain where users post content is a minefield of problems these days even if you didn't mind the cost of running it.

Something like this might also explain the move to things like Discord over IRC.

tayo42
0 replies
1d3h

Reddit is quietly a huge website with a significant amount of users. So many people use it but don't talk about it. Google search says 1 billion MAU? Twice as big as Twitter.

nope1000
0 replies
1d3h

There is Lemmy for example, very similar to old Reddit. The big problem is the missing content outside of mainstream communities.

Suppafly
0 replies
1d1h

Reddit was already an alternative to Digg -- what's the alternative to Reddit?

This site is essentially 'orange reddit', they just need to add sub-HNs or tagging or something and it'd be ready for an influx of reddit refugees. Not that any of us really want it, but it's possible.

CSMastermind
0 replies
1d1h

I don't think this is true.

The main thing I see Reddit being useful for are discussions about entertainment.

There's probably a subreddit for your favorite sports team, Twitch streamer, TV show, book series, video game, or politics (which is entertainment for some people).

Reddit has seriously degraded the experience of a lot of these communities with things like restricting custom CSS.

It seems to me that the way you'd disrupt Reddit as a startup is to pick a vertical and laser focus on becoming the best discussion board for that community. If it's sports, then have integrations for live stats, scores, etc.

In general you could attract users by offering profit sharing on ads the same way Youtube does for creators.

Have the best moderation tools in the world, a constant pain point with Reddit. Give admins more flexibility over the appearance of the board, all things Reddit took away.

The other path for disruption would be if an established company with those communities tackled the problem. Lots of communities already use Discord, but they tend to also have a subreddit because chat and forums are different communication methods. Discord could easily offer a forum product as an extension of their chat services. If they do it well they'd drive a lot of users away from the subreddits.

onlyrealcuzzo
2 replies
1d3h

Or for Google to buy it.

They could monetize it much better while being less annoying.

Ultimately - Google is getting everything they want from Reddit with this deal without having to buy it outright.

Short of Reddit transforming to an entirely different product (difficult) - I'm not sure where the major growth opportunity is for it.

jessriedel
0 replies
1d3h

Very few of the reddit users who are providing the content for free are motivated by which search engines are allowed to index the content, so I don't see how this would make it more ripe for competition. (If you just mean society would now be even better off if reddit were disrupted, ok, maybe, but that's a different thing.)

escapecharacter
0 replies
1d3h

Man, I just want to be able to search the entire internet for when I’m doing niche research.

Does this mean there will be a future where everyone is running their own crawler? I suppose.

api
0 replies
1d3h

Network effects are more powerful than we are. Witness the number of people who despise Xhitter but are still on there. Once something has a sufficient network effect they become immune to normal market forces and able to abuse their position with near impunity.

QVVRP4nYz
0 replies
9h1m

The killing of third party clients didn't have significant impact. I don't know what they would have to do to lose users, other than some kind of mandatory subscription fee.

VoidWhisperer
20 replies
1d4h

Wow, reddit found a way to make themselves even less useful somehow. After the API fiasco, that seemed like it'd be pretty hard to do.

wvenable
7 replies
1d3h

But, apparently, they did finally find a way to make money.

jasode
5 replies
1d2h

>But, apparently, they did finally find a way to make money.

The most recent 10-Q financial results for the quarter ending 2024-03-31 (filed 2024-05-08) show they actually lost money: https://www.sec.gov/edgar/browse/?CIK=1713445

(For 2024-Q1, Reddit lost $575 million on revenue of $242 M.)

If the quoted "$60 million deal"[1] from Feb 2024 is accurate, that small amount from Google may not be enough for Reddit to turn a profit. It remains to be seen what the Q2 or Q3 financials will show.

[1] https://www.google.com/search?q=google+ai+deal+reddit

wuiheerfoj
4 replies
1d2h

Wow, perhaps I’m naive but what the hell are they spending over $800M a year on? That seems an obscene amount for a glorified message board.

I just read they have 2000 employees which is also puzzling to me

toomuchtodo
0 replies
1d1h

They were a public good currently larping as a for profit concern now run by a vanity and wealth driven executive driving it into the ground while it flails to monetize when that is likely incompatible with the entity.

Compare and contrast to say, HN, run on two servers in a colo with less than a handful of mods.

splwjs
0 replies
1d1h

it's not just a message board, it's an influence machine.

They need to make sure the stuff they want people to think is posted often and has a big number next to it, they need to make sure things that people like are associated with the stuff they want people to like/think/do and things that people don't like are never associated with the stuff they want people to like/think/do. They need to make sure that people who say the wrong things are silenced or persuaded to leave, etc etc. Man they probably have at least one contact in at least one intelligence agency and they have to make sure not to run afoul of that contact.

Like the news isn't just a list of what happened recently, political debates aren't just two guys talking, and reddit/twitter aren't just message boards.

alephxyz
0 replies
1d1h

They spent 400M on R&D this quarter, which means more "personalisation"/ad targeting and probably cooking up some DOA chatbot/assistant product that's costing them a ton in compute

LunaSea
0 replies
1d2h

Barely enough to pay the CEO

abdullahkhalids
3 replies
1d1h

The API changes and these robots.txt changes were part of the same strategy - preventing third parties from scraping their data and reducing the AI generated content that makes it into their data. So they can sell that data and make money.

AlexandrB
1 replies
1d

their data

Love how it's their data when it might make them money but not their data if they get sued.

abdullahkhalids
0 replies
23h28m

That's fair. I agree with you that in some sense it is user data. And that Reddit is operating unethically.

kjkjadksj
0 replies
1d

Their dataset is already polluted with misinformation campaigns and shilling

Hikikomori
3 replies
1d3h

The only thing it does for me is force me to use Google, as a large share of the answers I need are on reddit.

immibis
0 replies
1d2h

That's what Google is paying them for :)

brewdad
0 replies
1d2h

So then this gambit worked. It sucks and I hate it. I will continue to use DDG/Bing first but it looks like I'll be hitting up Google more often too.

WarOnPrivacy
0 replies
1d2h

The only thing it does for me is force me to use Google

Startpage, Kagi and Lukol are 3 that source from Google. I imagine there are others.

stainablesteel
1 replies
1d1h

which is ironic because pre-AI every solid piece of obscure information and non-programming question usually had an answer on reddit, it's an extremely valuable dataset looking back. but moving forward i think it's only going to become less valuable and people will probably manually/custom-scrape all the questions out of worthwhile subreddits and open up their data for free

splwjs
0 replies
1d1h

When I was young, my brother knew a guy who was really into movies. If you wanted to know about a movie you couldn't remember, you would go talk to that guy.

For a while, the internet had an end-run play that made that guy less useful. You can just go on the internet for obscure movie information, buddy.

But now it seems like knowing a movie guy is going to be the only way to get a real person's opinion on movies. The internet is about to forget everything without a profit motive and just start telling you that the latest product from a monolith corp like disney is the only movie worth watching. If someone scrapes all the useful movie opinions off of reddit and spends their time crafting it into a usable format, that guy's probably got a company. But not Bill. Bill's just a guy you can know or not know. You can't monetize knowing Bill. Sidenote that's probably why it irked me so bad when some bozo coined the phrase "social capital".

splwjs
1 replies
1d1h

If they kept their API open then by now the entirety of the site would be ai slop that was built with chatgpt and launched with the api.

Then again most of what that site does is just blend and regurgitate the information that's currently on it anyway.

miohtama
0 replies
1d1h

Those AI bots would likely be more intelligent commenters than Redditors

PaulRobinson
20 replies
1d3h

This is great. It means I won't see Reddit content popping up all over search results in other engines. Can Medium do the same? And perhaps Quora?

bdjsiqoocwk
6 replies
1d3h

What a weird thing to say. Reddit has for a long time been a place where real people hang out and have real conversations, unlike quora and medium.

candiddevmike
1 replies
1d3h

I think Reddit lost that kind of authenticity a while ago. Advertisers know the "search:reddit.com <product>" trick, and when you look at the number of upvotes, it costs _pennies_ to get your product trending in the comments.

Suppafly
0 replies
1d2h

I don't search reddit for <product> though I search it for <highly technical issue with product> because reddit is the only place where real people discuss such issues and the solutions to them.

psunavy03
0 replies
1d2h

Yeah, but each sub to a greater or lesser degree, has its own hivemind you'll be run out of town (or possibly even banned) for challenging. And the average member of Reddit is quite willing to spout off confidently incorrect BS and downvote people into the ground who actually know what they're talking about.

Not exactly always a reliable source of info outside uncontroversial niche topics or places like /r/AskHistorians that actually moderate. And even there I've seen the occasional humdinger.

VancouverMan
0 replies
1d2h

where real people hang out and have real conversations

I don't consider the discussions there to be "real" in any meaningful way, thanks to the extensive moderation.

From what I've seen, there typically ends up being a small handful of moderator-enforced narratives that are deemed "acceptable" for a given subreddit, and any commenters deviating from those narratives get banned, or their comments end up as "[removed]" by "[deleted]", or the comments get obscured with the "comment score below threshold" notice.

It's generally some of the most one-sided and blandest discussion around. Given that there's often no meaningful back-and-forth involving differing perspectives of any sort, I'm not even sure if it should be considered "discussion". It's more like regurgitation and repetition.

I've found the situation to be particularly bad on the Canadian locale-specific subreddits, for example, but enough of the tech-oriented ones I've seen seem to end up like that, too.

PaulRobinson
0 replies
9h28m

I've got thousands of karma. Used to love it. No more.

MattPalmer1086
0 replies
1d3h

It's not strange to me. Every single time I've followed a Reddit link from search results, I've got a short and fairly useless conversation that doesn't help me at all. So I have never understood why people like it.

Obviously, people do see value in it, or they wouldn't keep saying so! I would happily exclude Reddit links from search results though.

jonpurdy
5 replies
1d3h

FYI, Kagi lets you do this and personalize it as you desire. They even share aggregated stats※ about which domains users choose to block/lower. (Mine generally match these stats.)

※ - https://kagi.com/stats?stat=leaderboard&k=-2

WarOnPrivacy
3 replies
1d1h

Kagi lets you do this and personalize it as you desire.

Kagi shill here. Are they finally applying filters and operands to image searches?

Asking because it was a tough year seeing Pinterest as top filter choice and top result in images (when set as filter=block).

(edit: I just tried searching->image: beautiful quilt patterns. I didn't spot any Pinterest results!)

I have never understood why DDG, etc steadfastly refuse to obey operands in image searches. Most days. Every blue moon operands seem to work. I think.

sidebar: Yesterday I saw Yandex obey quotes in a web search. It was the 1st time I've seen that.

hugh_kagi
2 replies
1d

Are they finally applying filters and operands to image searches?

That was a bug, apologies. It should be fixed now.

WarOnPrivacy
1 replies
4h40m

I sincerely appreciate the diligence. I really did see Pinterest results over a longish span. I may well have only noted their presence and not their absence - skewing my perspective.

Overall, my experience is very positive. I'm on many PCs throughout the day and I miss Kagi when it isn't there.

elashri
0 replies
3h10m

Pinterest serves pages in many domains (different TLDs). It is better to use a regex to block them.
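To make the regex idea concrete, here is a small Python sketch of a host matcher that covers subdomains and country-code TLDs. The pattern and the `is_blocked` helper are illustrative assumptions, not the rule syntax of Kagi or any other specific blocker, and the TLD shapes it accepts are a guess at common cases rather than a complete list.

```python
import re
from urllib.parse import urlsplit

# Matches pinterest.<tld> and pinterest.<sld>.<cc> (e.g. .com, .de, .co.uk),
# with any number of subdomains in front (www., br., ...).
PINTEREST_HOST = re.compile(
    r"^(?:[a-z0-9-]+\.)*pinterest\.[a-z]{2,3}(?:\.[a-z]{2})?$"
)


def is_blocked(url):
    """Return True if the URL's hostname looks like a Pinterest domain."""
    host = (urlsplit(url).hostname or "").lower()
    return PINTEREST_HOST.match(host) is not None
```

Anchoring on the hostname rather than the whole URL avoids false positives from pages that merely mention Pinterest in their path.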

eliasdaler
0 replies
6h28m

You can also do this for free with uBlackList: https://github.com/iorate/ublacklist

This has greatly enhanced my Google experience - easy to ban content farms, AI-generator websites from appearing in Google Images etc.

kingnothing
2 replies
1d3h

What use do you get out of a search engine if not searching for reddit and other forums? The rest of the internet has become a cesspool of useless AI generated crap.

kevincox
0 replies
1d2h

To be fair Reddit threads are more and more often getting filled with useless AI generated crap as well.

jjulius
0 replies
1d2h

To be fair, Reddit has plenty of astroturfing, too.

troyvit
0 replies
1d2h

Kagi lets you configure the search engine to deprioritize or even fully eliminate search results. They ride on the back of Google's indexing so -- if you ever change your mind -- you could bring reddit searches back.

rkangel
0 replies
1d1h

Interesting. I have long found Reddit to be an excellent source of solutions to problems. Stack Overflow usually beats it for programming specific stuff, but for everything else usually the most helpful answer comes from Reddit. It's a real person, helping another real person with a real problem.

lfkdev
0 replies
1d3h

Yeah awesome, reddit was one of the last useful results beside the spam blogs and ai generated articles.

Suppafly
0 replies
1d2h

This is great. It means I won't see Reddit content popping up all over search results in other engines.

Honestly, that makes those other engines way less valuable because for many topics, telling the engine to specifically narrow the results down to reddit comments is the only way to get a decent answer to what you're looking for. I'd definitely support blocking Quora from everything though.

nerfbatplz
12 replies
1d3h

I propose we change the term enshittification to engoogleification in regards to the internet.

crazygringo
10 replies
1d3h

This is about Reddit disallowing other search engines.

Blame Reddit, not Google.

dvngnt_
9 replies
1d3h

plenty of blame to go around

crazygringo
8 replies
1d3h

You'll have to demonstrate that.

Is Google's contract with Reddit exclusive, so that other search engines aren't given the opportunity to also pay?

I highly doubt that, especially since the DOJ would go after that immediately because of antitrust.

So no, pretty sure the blame here is 100% on Reddit unless you have evidence otherwise.

dvngnt_
7 replies
1d3h

I don't think the DOJ acts immediately.

so that other search engines aren't given the opportunity to also pay?

this makes it harder for new engines if google has exclusive deals with some of the most popular sites

crazygringo
6 replies
1d2h

if google has exclusive deals

My comment said, show me that the Google deal with Reddit is exclusive.

You haven't done that.

And there's no reason to think it would be, because of antitrust. The DOJ doesn't have to act "immediately", the point is that obvious antitrust violations come with fines that make it unprofitable to attempt in the first place. And this would be black-and-white obvious antitrust violation, given Google's monopoly status in search. This isn't a gray area where it might be worth it for Google to roll the dice.

dvngnt_
5 replies
1d2h

clearly some deal was reached between the two parties or we wouldn't be here.

whether or not the deal is exclusive OR companies have to pay to index reddit it's still bad for competition. money has a barrier to entry preventing newcomers.

I can blame reddit for creating the deal and I can blame google for accepting the deal if the effect is bing, ddg and others cannot display reddit results without reaching some deal.

crazygringo
4 replies
1d2h

I'm not saying it's not bad for competition.

I'm saying the blame is 100% with Reddit.

Blaming Google for accepting it makes no sense. That's like if a shopper goes to grocery store and buys an expensive $20 piece of cheese, and other shoppers can't afford cheese that pricey, and you're blaming that one shopper for buying it because it means other shoppers can't also get the cheese without paying for it. That doesn't make any sense. The store set the price, and they're the one to blame if other shoppers can't afford it.

If Bing, DDG and others can't reach a deal with Reddit, that has nothing to do with Google.

Again, blame here is 100% on Reddit, and 0% on Google. To assign blame to a purchaser in a case like this doesn't make any sense.

dvngnt_
3 replies
1d2h

bing probably has the money to reach a deal, smaller companies without monopolies are less likely to, and that's the problem.

i don't think google is blameless like you propose.

crazygringo
2 replies
1d1h

bing probably has the money to reach a deal, smaller companies without monopolies are less likely to, and that's the problem.

Reddit can charge smaller companies less money. So if there's a problem, again, the problem is 100% with Reddit.

Google is absolutely blameless here. You may not like Google, and you can certainly blame them for plenty of other things. But in this situation, literally all of the blame is with Reddit for deciding to remove their content from all search engines unless they pay. Reddit didn't have to do that. Google didn't make them do that.

Reddit did this. Not Google.

dvngnt_
1 replies
1d1h

takes two to tango. reddit can't do anything without google signing papers as well

bryan_w
0 replies
23h8m

I don't think you're right about that.

frizlab
0 replies
1d3h

I back up this proposal.

dvngnt_
12 replies
1d3h

site:reddit.com works for kagi for new posts this week?

Suppafly
3 replies
1d1h

weird, I've never heard of mojeek before and this is the 2nd comment in this thread I've seen mentioning it.

zamadatix
2 replies
1d1h

As of the time of writing there are 8 search matches in this thread: 1 from you, the rest from Colin (CEO of said company).

rozab
0 replies
22h41m

Spam is fine by me if it's from the CEOs personal account, lol. Clearly I wasn't familiar with the product so it's a helpful comment for me

Suppafly
0 replies
22h43m

the rest from Colin (CEO of said company)

I assumed he had financial connection to them, but didn't want to take the time to research it. Mojeek is the new fetch.

karaterobot
1 replies
1d3h

From the second paragraph of the article:

Searching for Reddit still works on Kagi, an independent, paid search engine that buys part of its search index from Google.

dvngnt_
0 replies
1d2h

thanks i only read the first paragraph. then i went to kagi discord and they provided more context

AndroidKitKat
1 replies
1d3h

Kagi gets part of their index from Google, per the article, so perhaps that's the reason Kagi still works. Wonder if Vlad and Kagi will do (or have done) the calculus to see if buying crawlability from Reddit itself is cheaper than buying results from Google for Reddit search.

hugh_kagi
0 replies
1d

Not yet but it's something we want to look into.

ColinHayhurst
0 replies
1d3h

Kagi pays to use APIs from Mojeek and Google

tempfile
9 replies
1d3h

Hopefully this paves the way for antitrust action, but I won't hold my breath.

Reddit's justification for this is profoundly wrong. Their "public content policy" is absurd doublespeak, and counter to everything the open internet is and hopes to be. You cannot simultaneously call yourself "open" and "public" while refusing access to automated clients. Every client is automated. They even go so far as to say that "crawling" (also known as "downloading") is an "abuse" and violates user privacy.

This is absurd, and not justified. I would love to see legislation that restricted server operators' ability to prohibit automated access in this way, but I suppose it will never happen. Some people in this thread have attempted to justify the policy by saying "they have to protect their income streams". No they don't. You don't have a right to an income stream, and you certainly don't have a right to lie in order to get all the benefits of an open internet with none of the downsides. Noting of course that the "downsides" are in this case actually just "competitors".

semiquaver
8 replies
1d3h

Sorry, what is the antitrust concern about Reddit blocking crawlers that aren’t paying them? Surely you don’t think Reddit has a monopoly on anything?

Or are you somehow suggesting that it’s google’s fault that Reddit took this step? I don’t see any indication that’s the case.

em-bee
6 replies
1d2h

not that reddit has a monopoly, but that google has.

google is using their power to prevent others from competing.

the problem here is of course that if reddit would be in financial trouble (i don't know if they are but let's imagine they need this money), they'd be between a rock and a hard place.

google should not be allowed to make exclusive deals, and reddit could not survive without the deal, then what would be left? google buys reddit, or the relevant authority approves of the deal?

i thought about the same problem with firefox. let's assume firefox is forced to allow people to make a choice of the default search engine (just like microsoft was forced to allow a choice of default browser on windows) then google might stop paying mozilla, and they could end up in financial trouble.

ideally no company ever depends on a single other company that much, but that only works if we don't allow companies to grow that much in the first place.

asadotzler
2 replies
23h54m

let's assume firefox is forced to allow people to make a choice of the default search engine

Firefox has always allowed people to make a choice of the default search engine, since before it was even called Firefox. I know. I was there building it.

em-bee
0 replies
23h43m

yes, but the default is google, and you have to go into the settings to make a choice, so most people keep the default. what i meant was the EU directive for microsoft where they actually had to put up a prompt at first use asking the user which browser they want, without allowing any default (and, i am not sure, maybe even a randomized list)

if the same was done for search engine choice for firefox then google would no longer be the default, and they would have no reason to pay firefox for that.

account42
0 replies
4h8m

This is not true. Firefox comes with a fixed default on installation. You can change the search engine afterwards but that does not make the default a user choice.

ColinHayhurst
1 replies
1d2h

let's assume firefox is forced to allow people to make a choice of the default search engine

let's assume apple is forced to allow people to make a choice of the default search engine in safari then google might stop paying apple, and ...

tempfile
0 replies
1d1h

surely firefox is the more interesting example, since they have orders of magnitude less alternative revenue?

account42
0 replies
4h11m

If the only way your company can exist is as part of a monopoly scheme then it should stop existing.

And as for Firefox, Mozilla being forced off Google's teat would be the best thing that could happen to the browser.

tempfile
0 replies
1d1h

Yes, sorry, should have been more clear: I claim google is in a monopoly position, not reddit. The rest of the comment is unrelated ranting about reddit's betrayal of their previously-held "public data is public" position.

nomilk
8 replies
1d3h

Suppose a crawler or rival search engine doesn’t respect robots.txt, reddit can’t stop them. Make it a bit trickier, yes, but not stop them.
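For what it's worth, robots.txt is only a request that a well-behaved crawler chooses to check. A minimal Python sketch using the standard library's `urllib.robotparser` shows the check; the rules, bot names, and URL below are made up for illustration and are not Reddit's actual file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: one named crawler allowed, everyone else disallowed.
ROBOTS_TXT = """\
User-agent: Googlebot
Allow: /

User-agent: *
Disallow: /
"""


def may_crawl(user_agent, url, robots_txt=ROBOTS_TXT):
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)
```

Nothing on the server side runs this check. A crawler that never calls something like `may_crawl` fetches the page anyway, which is why sites pair robots.txt with IP blocking, rate limits, or login walls.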

miyuru
4 replies
1d2h

reddit blocked datacenter IPs even before this change.
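Blocking by IP range is mechanically simple. Here is a hedged Python sketch using the standard `ipaddress` module; the ranges are reserved documentation networks standing in for real datacenter ranges, and how Reddit actually implements its blocking is not known from this thread.

```python
import ipaddress

# Placeholder ranges (RFC 5737 documentation networks), NOT real datacenter lists.
BLOCKED_NETWORKS = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]


def is_datacenter_ip(ip_string):
    """Return True if the address falls inside any blocked network."""
    address = ipaddress.ip_address(ip_string)
    return any(address in network for network in BLOCKED_NETWORKS)
```

The hard part in practice is keeping the range list current, not the membership test itself.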

nomilk
3 replies
1d2h

Could a motivated scraper not buy IPs/proxies that aren’t in those ranges, i.e. to blend in with general users?

Manuel_D
1 replies
1d

Proxy IPs are also known and typically blocked. In fact, you can't even browse reddit without logging in when connected to most proxies.

Many web scraping companies have loads of phones hooked up in a rack in order to use mobile IPs. Companies can't just block mobile IPs because their site would become unusable for several city blocks (mobile IPs often correspond to a specific cell tower). This is the face of modern web scraping: https://i.imgur.com/U2RXi5G.jpeg

BeFlatXIII
0 replies
16h57m

Or, if you're on the shadier parts of the web, hacked IoT devices put to good use instead of DDoS zombies.

xeromal
0 replies
1d

Just like every security feature in the physical and digital worlds, security just inconveniences honest people and the cost to bypass reduces the amount of people who try.

Eventually it becomes expensive to scrape reddit's data and most people will stop.

eschneider
2 replies
1d3h

It is evidence that they didn't have permission if you sue them.

tagawa
0 replies
1d2h

This is not even scraping - it’s just crawling and indexing.

StrauXX
8 replies
1d1h

IANAL but as far as I understand the current legal status (in the US), a change in robots.txt or terms and conditions is not binding for web scrapers since the data is publicly accessible. Neither does displaying a banner "By using this site you accept our terms and conditions" change anything about that. The only thing that can make these kinds of terms binding is if the data is only accessible after proactively accepting terms, for instance by restricting the website until one has created an account. LinkedIn lost a case against a startup scraping and indexing their data because of that a few years ago.

redcobra762
2 replies
23h27m

Has anyone ever successfully been prosecuted for violating this statute?

qingcharles
1 replies
18h26m

I don't know. That data is really hard to put together.

quink
0 replies
18h2m

That’s a blink and you miss it joke of the highest calibre - well done!

jpalomaki
3 replies
1d

Quite sure they are also enforcing these with some technical measures to limit scraping.

renlo
2 replies
23h46m

As was LinkedIn, who was forced to stop rate limiting / IP-banning scrapers for public pages.

therealdrag0
0 replies
14h11m

Really? That seems strange.

altdataseller
0 replies
10h17m

Many other websites enforce rate limiting or IP banning for public pages (using products like Cloudflare). Why is this not legal?

daft_pink
7 replies
1d2h

I don’t understand how this isn’t anti-competitive behavior. It seems like reddit has to offer this deal with similar terms to google’s competitors.

dathinab
0 replies
1d2h

yep, but for things which are "only" search engines it's not a viable offer. Only if you expect "big AI business value" from it does it make sense, maybe.

eddd-ddde
1 replies
23h36m

I don't see how this tracks at all. Companies can decide to only sell their products with some retailer if they want. You can't force them to make deals with other companies.

gtirloni
0 replies
23h0m

You certainly can in monopoly situations (which apparently isn't the case here).

Suppafly
1 replies
1d2h

Most business deals are anti-competitive in some way. What makes you think this specifically rises to the level where they'd legally have to offer similar terms to competitors?

daft_pink
0 replies
5h0m

I’m not sure. Maybe the angle is that Google is anti-competitive by signing an agreement that limits information to its rivals.

Being forced into using google services, because they are paying information companies to deal only with them seems like a disaster for the web.

carlosjobim
0 replies
1d1h

Why in the world would they have to do that? There are thousands of exclusive business-to-business deals being signed into action every second of the day.

voisin
5 replies
1d2h

Makes sense that Google did this deal since their search quality tanked and they became a de facto front end UI for Reddit.

NoMoreNicksLeft
3 replies
1d2h

Up until 2016 (I think, +/- 1 year), if you could remember 3 uncommon words in a comment, you could find any reddit post instantly on Google. I'd want to follow up on a thread from weeks ago, and it was magic. Number one result. Then one day that just stopped working, and even adding site:*.reddit.com didn't fix it. At the time, I think, I didn't realize that it was mostly Google's fault, I thought maybe Reddit had changed their infrastructure so that it couldn't be crawled properly.

Google hasn't been a search engine in a long while, it's just an advertisement engine now.

dev1ycan
2 replies
1d

it's so bad it's crazy, you can legit not find stuff on the internet anymore, it's the same with youtube, I search something and get like 20 or so results and then everything else is hidden.

it started when youtube removed the ability to search for videos older than 5 years, if I had to guess? cost saving, have every old video in cheaper storage... but it sort of fragments youtube, every couple of years you only get newer content.

stuffoverflow
1 replies
15h36m

One day I was searching live videos of a local band from youtube and when sorting by upload date the oldest video was from 2010. I knew for a fact there had to be older videos so I got a youtube API key and searched via the API, ended up finding multiple videos starting from 2006. Learned that youtube is full of videos that are basically impossible to find with the regular search.
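
For anyone wanting to try the same thing, a rough sketch of such a query follows. It only builds the request URL for the YouTube Data API v3 `search.list` endpoint, using its `publishedBefore` and `order` parameters to ask for videos older than a cutoff. The query text, the cutoff date, the `YOUR_API_KEY` placeholder and the `build_search_url` helper are assumptions for illustration; the comment above does not say which parameters were actually used.

```python
from urllib.parse import urlencode

# Public endpoint of the YouTube Data API v3 search.list method.
SEARCH_ENDPOINT = "https://www.googleapis.com/youtube/v3/search"


def build_search_url(query, published_before, api_key, max_results=25):
    """Build a search.list URL restricted to videos published before a date.

    published_before must be an RFC 3339 timestamp, e.g. 2010-01-01T00:00:00Z.
    """
    params = {
        "part": "snippet",
        "q": query,
        "type": "video",
        "order": "date",
        "publishedBefore": published_before,
        "maxResults": max_results,
        "key": api_key,  # placeholder, supply your own key
    }
    return f"{SEARCH_ENDPOINT}?{urlencode(params)}"


# Hypothetical query: videos uploaded before 2010.
url = build_search_url("some local band live", "2010-01-01T00:00:00Z", "YOUR_API_KEY")
print(url)
```

Fetching that URL returns JSON; paging through results would additionally need the `pageToken` parameter.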

dev1ycan
0 replies
14h33m

One of my pastimes is looking up anime opening reactions for fun/to hear people listen to bands I enjoy that sometimes do anime ops, searching that is so scary, you see the same 4 people or every month or so you get 1 or get 1 removed, you can't tell me there's not more than 4 people on the internet that upload opening reactions... when anime is a billion+ dollar industry, if you know particular channels you can see daily uploads on plenty of channels, that are simply search banned for whatever reason by the algorithm.

But yeah, the most outrageous one is older videos, I do believe the reason is that they are using some long term cloud storage that is cheaper for older videos so they removed the ability to search by date.

Additionally, I don't believe the API fully fixes it, because Bing has a wrapper for youtube and searches do not really vary

LegitShady
0 replies
20h32m

"we noticed that since our search results had gotten so bad nobody can use them to find the things they want, people just kept adding "reddit" to search terms anyways, so we figured we might as well make it official and exclusive"

roughly
4 replies
1d2h

Boy, the LLMs have really been an apocalypse moment for the web, haven’t they? Between hoovering up and monetizing every bit of content they can without any attribution or compensation and the absolute flood of mediocre generated content, they’ve really done in the last straggling remains of the open internet.

It’s not like everyone wasn’t already pulling the same grift, but quantity really does have a quality all its own.

imglorp
3 replies
1d

Of course, we have to be careful not to villainize a neutral tech. Instead let's call it what it is: unchecked capitalism and monopolistic behaviors.

Capitalism seems to work ok for the common good until you remove all the protections. LLMs provide a de facto monopoly for the owner which must already be a near monopoly: they take vast resources to train; only a giant corp can afford to buy all the content and provision enough resources to train one.

LLM did not enshittify what's left of the internet, greed did it.

synicalx
1 replies
18h11m

Of course, we have to be careful not to villainize a neutral tech

This is a very good point IMO. If we're going to chastise LLMs we may as well give servers, switches, routers, fiber-optic cable, and silicon a bollocking as well since that's ultimately what's facilitating all this.

latexr
0 replies
7h5m

No, those are not comparable. If someone criticises the electric chair, it’s not reasonable to defend it with “if we’re going to chastise the electric chair we may as well give wood, metal, chairs, and electricity a bollocking since that's ultimately what's facilitating all this”.

Things are more than the sum of their parts. If you have a ton of beneficial things which can be cobbled together into one bad detrimental thing, the existence of the latter does not remove the benefits of the former.

latexr
0 replies
7h13m

On the one hand, you’re absolutely right. But on the other hand it’s not like it matters in practice. Isn’t most technology technically neutral? But it’s also made to be used by people, who can do so beneficially or detrimentally. Criticising a technology is a shorthand for criticising how it’s used.

lifestyleguru
4 replies
1d3h

I deeply regret every minute spent on and kilobyte of text contributed to reddit.

Ylpertnodi
1 replies
1d3h

I don't. There's nothing around that is similar...with the same traction. The various 'verses are variations on cat pics. I'm still looking, though.

wccrawford
0 replies
1d2h

While it's still not Reddit, I've been enjoying Lemmy. I have a similar range of communities on each, and other than some annoying groupthink, the content is often similar.

And to me, forgetting to log in to each of them feels similar, too. For what that's worth. (I hate both of them when not logged in.)

trallnag
0 replies
1d2h

I can confidently state that I'm a net negative for Reddit, looking at the dozens of banned accounts in the trash bin of my KeePass vault

card_zero
0 replies
22h41m

I mostly contributed to r/nonsense and I'm pleased by the thought of that sub's content being used to train future AI, with information about the architectural uses of super-tall chef's hats, the prehistoric invasion of Europe by Beak People, and so forth.

ykonstant
3 replies
1d3h

It's ironic, because Reddit is the only search engine that works on Google now thanks to shittening.

maxwell
1 replies
1d2h

They're both running on fumes at this point.

riiii
0 replies
20h16m

Also sniffing them.

QVVRP4nYz
0 replies
9h8m

For years reddit's built-in search was broken (or at least broken) and people were forced to use 3rd parties like Google, so we came full circle.

wtf242
3 replies
1d1h

This problem is only going to get worse. For my thegreatestbooks.org site I used to just get indexed/scraped by Google and Bing. Now it's like 50+ AI bots scraping my entire site just so they can train an LLM to answer questions my site answers without having a user ever visit my site. I just checked Cloudflare and in the past 24 hours I've had 1.2 million bot/automated requests.

graeme
1 replies
13h35m

Anyone have any experience with this? Is there nothing but upside in blocking these bots?

account42
0 replies
4h46m

Considering it's Buttflare, enabling it probably also means blocking random users. But of course that's not Buttflare's problem because it's not enabled by default.

lmeyerov
3 replies
1d

FWIW, we inquired to the reddit sales team about paying for data sometime last year, as we do similar elsewhere for use cases like helping emergency responders, and even though they were launching the program and asking for customers... no email back. Nor on our second and I think third attempt.

I'm not sure what to make of that.

morkalork
2 replies
1d

How much were you willing to pay? Still, rude of them not to even discuss the issue. Every time I've gone to buy data, if I'm too small of a fish, vendors have always been happy to refer me to a reseller.

lmeyerov
0 replies
23h21m

We do 4-6 figures/yr for providers, which is normal in our world.

An enterprise sales team with only 1 customer happens (e.g., Mozilla's search bar), but... That's surprising here, and scary as a sustainable & scalable business. Ignoring 5-6 figure/yr inquiries says a lot to me. In contrast, we did that same-day with Twitter without talking to anyone.

heisenbit
0 replies
23h11m

Certainly rude but also possibly legally problematic. If they were judged to be in a dominant position in a market and were found making deals with exclusivity then it can get expensive.

It all depends of course on what the market is. If one looks at reddit not as a whole but as a collection of niches then one could imho find niches where reddit has a dominant knowledge position.

bitpush
3 replies
1d1h

When Microsoft strikes an exclusive deal with OpenAI to use their models, it is a smart, brilliant, clever move.

When Apple strikes an exclusive deal with suppliers for parts, it is sound business practice.

When Google strikes an exclusive deal with Reddit, it is ..

Some of you have no idea how businesses work, and it shows.

riku_iki
2 replies
1d

When Google strikes an exclusive deal with Reddit, it is ..

It's because reddit is selling content created by users, based on promises that reddit supports open internet, open data, etc., without their consent and without sharing revenue, which may be legal but is likely not ethical.

bitpush
1 replies
23h28m

Let's get specific. You're confusing copyright with licensing.

The users hold the copyright (reddit claim that they made the meme) but reddit has the non-exclusive right to redistribute and license the content.

Two different things.

riku_iki
0 replies
23h23m

but reddit has the non-exclusive right to redistribute and license the content.

that's what I said: it is legal.

dathinab
2 replies
1d2h

Worse, it doesn't even really "work" anymore, given how most searches are flooded with garbage SEO results and paid advertisements "basically" looking like search results (most of the time more garbage that isn't what you are looking for; in the cases where it isn't, it is quite often along the lines of "Google's algorithm blackmailing companies to buy ads for users who want to find them through Google but wouldn't without ads").

I wonder if this might affect reddit, as in slowly kill its user base, especially when it comes to users providing (and often also looking for) high quality content, because which of those users would want to use Google search?

john-radio
1 replies
1d2h

Worse, it doesn't even really "work" anymore, given how most searches are flooded with garbage SEO results and paid advertisements "basically" looking like search results ...

I don't understand what you're saying. That's exactly why people append `site:reddit.com` to their searches in the first place, because those search results typically aren't like that.

wwweston
0 replies
1d

Or at least, reddit posts and comments that are content messaging / marketing (human or AI) fit in better with earnest and natural posts, so that they're more effective.

Elfener
2 replies
1d3h

I mean, the reddit company did go public, so things like this were inevitable.

Also things like the API fiasco, and also small annoyances like the fact that when you click on an image on reddit, it now goes to a wrapper html page instead of just the actual image (this was one of the reasons reddit was better than most social media...).

mrec
0 replies
1d3h

Maybe it's just me or something temporary (I use Old Reddit, like all right-thinking folk) but for the past couple of days the image wrapper page seems to have been sent to the glue factory. I'm just getting the image now, unadorned.

account42
0 replies
3h44m

It used to be that Reddit didn't host images and you'd have to link (often shitty) external image hosts. Then someone created imgur to host images for reddit. And slowly but surely imgur became just another shitty image host (and social media site for some reason). Then reddit wanted some of the dough imgur was pulling in (probably just making losses) and added their own image hosting. At first it worked just like you are saying with you getting direct links to the image file. Now they also turned into yet another shitty image host.

Part of the blame for the redirect-to-wrapper page lies with browsers. If browsers didn't let servers reliably differentiate between a direct request and an <img> embed then this practice would not be as widespread.
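
One way a server can tell the two apart is the Fetch Metadata request header `Sec-Fetch-Dest`, which supporting browsers set to `document` for a top-level navigation and `image` for an `<img>` load. The Python sketch below shows that check in isolation; the `classify_image_request` helper is hypothetical, and whether reddit relies on this header (rather than, say, `Accept`) is an assumption.

```python
def classify_image_request(headers: dict[str, str]) -> str:
    """Guess how an image URL was requested, based on Sec-Fetch-Dest.

    Returns "navigation" (user opened the URL directly), "embed" (loaded by
    an <img> tag), or "unknown" (header missing or some other destination).
    """
    # HTTP header names are case-insensitive, so normalise before lookup.
    normalised = {name.lower(): value.strip().lower() for name, value in headers.items()}
    dest = normalised.get("sec-fetch-dest")
    if dest == "document":
        return "navigation"
    if dest == "image":
        return "embed"
    return "unknown"


print(classify_image_request({"Sec-Fetch-Dest": "document"}))  # navigation
print(classify_image_request({"sec-fetch-dest": "image"}))     # embed
print(classify_image_request({"Accept": "image/webp"}))        # unknown
```

A server doing this could answer "navigation" with the wrapper page and "embed" with the raw file; clients that omit the header fall into the "unknown" branch.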

r_singh
1 replies
1d3h

I wonder how Aaron Swartz would react to this

geodel
0 replies
1d3h

My guess is he'd freak out once he'd hear that lawyers, law enforcement may get involved on this issue.

nullc
1 replies
16h30m

It's weird to say that reddit "works" with google. Every page they serve to google is stuffed full of hidden unrelated content, so any reddit result in google is unlikely to actually contain what you were searching for.

Google really should blacklist reddit entirely for this practice, but sadly as bad as reddit is it's still a much higher quality result than average for google.

account42
0 replies
3h53m

Ugh, it's absurd how incompetent Google is at filtering out "related" content or similar volatile "sidebar" feeds in the sites they index that have nothing to do with the main content and won't be there when the user actually opens that link.

neilv
1 replies
1d

I'm concerned multiple ways by this, but I also could see some positive fallout from this, if it sets precedents that help protect 'content' owners from AI goldrush companies just taking everything.

gtirloni
0 replies
22h58m

AI companies are the least of our worries in the Reddit situation. The fact that Reddit has full control of user-generated data to do as they please gives them freedom to do as they please. I think this is the crux of today's issue.

AI companies like Google, Microsoft and OpenAI have deep pockets to 'unprotect' themselves from anything. The barrier to entry is for small AI companies and those aren't really making an impact currently.

mutatio
1 replies
1d2h

It's funny in the context of Google's past motto of "don't be evil". I feel the right thing for Google here would have been to decline any deal regarding exclusivity, then Reddit wouldn't have pulled the trigger with its robots.txt update. The entire manoeuvre required both parties.

peddling-brink
0 replies
1d2h

Google should abandon its mission to “organize the world’s information” because doing so requires spending money for valuable data, and others might not want to spend that money?

venkat223
0 replies
1d1h

google is selfish

venkat223
0 replies
1d1h

Google is selfish

thih9
0 replies
1d

Story / rant warning.

I remember seeing an unhelpful hyperlink for the first time. It was a random word in the body of a random tech site that redirected to a list of articles from that site tagged with that term.

I remember being stunned, my expectation was that the link would lead me to another website, one that would be an authoritative source on that term and freely accessible.

20 years later we get a paywalled article about fragmented web – and we’re not slowing down.

r_singh
0 replies
23h54m

Thinking from Reddit's perspective, they have nothing to lose really. It’s not like other search engines are going to pay any attention to the robots.txt and Google’s AI would have still scraped data from Reddit regardless of the deal. Now they will just feel less bad about not citing sources possibly, depending on the user experience they want to deliver.

numbers
0 replies
1d1h

"Information is power. But like all power, there are those who want to keep it for themselves. The world’s entire scientific and cultural heritage, published over centuries in books and journals, is increasingly being digitized and locked up by a handful of private corporations." - Aaron Swartz (2008)

myrandomcomment
0 replies
16h36m

So I went Slashdot, Digg, Reddit. I stopped spending any time on Reddit 5 years ago. Not worth it.

melodyogonna
0 replies
1d1h

Wait that's actually terrible.

mediumsmart
0 replies
1d2h

that is awesome but I can't open old.reddit.com in my browser so it's a non-issue.

manishsharan
0 replies
3h58m

For my use cases, Google is pretty much useless without Reddit.

For example, when I search for product reviews, I always specify reddit. Otherwise the search results are inundated with SEO spam.

lpod
0 replies
1d3h

Interesting move by Reddit to lock down their search functionality to just Google. I guess this means Bing and others are out of luck. Seems like another step towards the walled garden approach – good for ad revenue, but probably not great for user choice. Wonder how long it’ll be before other platforms follow suit?

lowbloodsugar
0 replies
1d2h

Funny that source of TFA blocked me from reading the whole thing.

ein0p
0 replies
10h33m

Good for other search engines, I suppose. Reddit is a giant toxic pile of bovine manure.

earthboundkid
0 replies
19h25m

They literally think the scissor statement is a real thing that will really work, fml.

dbg31415
0 replies
20h4m

Every time I think, “How scummy…” Reddit always finds another way to go lower.

causal
0 replies
1d3h

It feels like Reddit is approaching an inflection point anyway where bot-made content is concentrated enough to spoil the whole experience. Closed servers like Discord and Slack may be the last haven of online human interaction.

blackeyeblitzar
0 replies
16h21m

We need laws that make it so that giant platforms like Reddit have no exclusive rights to content submitted by users. It would be ridiculous for only Google to be able to train AI on YouTube or Reddit content for example.

arnaudsm
0 replies
1d

I understand the AI context, but this is dangerously anticompetitive for other search engines.

This is a dangerous precedent for the internet. Business conglomerates have been controlling most of the web, but refusing basic interoperability is even worse.

Khelavaster
0 replies
1d1h

robots.txt isn't legally binding. Can Reddit really force Bing not to crawl it..?

ChrisArchitect
0 replies
12h22m

Fine with this. This is the world OpenAI created. And all the people that started searching with +Reddit tacked on weirdly like 5 years ago. Reddit's covering themselves from internal user-concern and their general exposure to AI training and Google was smart enough to get on that quickly. We'll see what Bing's take is and what changes if anything now that 404 Media's outrage farming is at play. This isn't a recent change after all, a month ago?

1vuio0pswjnm7
0 replies
17h6m

"If you use Bing, DuckDuckGo, Mojeek, Qwant or any other alternative search engine that doesn't rely on Google's indexing and search Reddit by using "site:reddit.com," you will not see any results from the last week."

The veracity of this statement is questionable.

I found at least four web search engines not using Google's index that produced results from the last week.

Example: Recent eruption at Yellowstone Black Diamond Pool

https://www.ecosia.org/search?method=index&q=site:reddit.com...

https://search.brave.com/search?q=reddit.com+black+diamond+p...

https://api.yep.com/fs/2/search?client=web&gl=all&no_correct...

   POST /sp/search HTTP/1.0
   host: www.startpage.com
   content-length: 74
   content-type: application/x-www-form-urlencoded

   query=site:reddit.com black diamond pool&abp=-1&t=&lui=english&sc=&cat=web

At least for this example, I got the same desired result using Reddit site search.

https://old.reddit.com/search/?q=black+diamond+pool

If anyone has some good examples of search queries that I can test showing why a search engine must be used, please share.