One aspect of the spread of LLMs is that we have lost a useful heuristic. Poor spelling and grammar used to be a signal used to quickly filter out worthless posts.
Unfortunately, this doesn't work at all for AI-generated garbage. Its command of the language is perfect - in fact, it's much better than that of most human beings. Anyone can instantly generate superficially coherent posts. You no longer have to hire a copywriter, as many SEO spammers used to do.
curl's struggle with bogus AI-generated bug reports is a good example of the problems this causes: https://news.ycombinator.com/item?id=38845878
This is only the beginning; it will get much worse. At some point it may become impossible to separate the wheat from the chaff.
We should start donating more heavily to archive.org - the Wayback Machine may soon be the only way to find useful data on the internet, by cutting out anything published after ~2020.
Interesting idea. Could there be a market for pre-AI era content? Or maybe it would be a combination of pre-AI content plus some extra barriers to entry for newer content that would increase the likelihood the content was generated by real people?
I'm in the camp where I want AI and automation to free people from drudgery in the hope that it will encourage the biggest HUMAN artwork renaissance ever in history.
I don't want AI to be at the forefront of all new media and artwork. That's a terrible outcome to me.
And honestly there's already too much "content" in the world, with more being produced every day, and it seems like every time we step further up the "content is easier to produce and deliver" ladder, it actually gets way more difficult to find much of value, and also more difficult for smaller artists to find an audience.
We see this on Steam, where there are thousands of new game releases every week. You only ever hear of one or two. And it's almost never surprising which ones you hear about. Occasionally you get an indie sensation out of nowhere, but that usually only happens when a big streamer showcases it.
Speaking of streamers, it's hard to find quality small streamers too. Twitch and YouTube are saturated with streams to watch but everyone gravitates to the biggest ones because there's just too much to see.
Everything is drowning in a sea of (mostly mediocre, honestly) content already, and AI is going to make this problem much worse.
At least with human-generated media, it's a person pursuing their dreams. Those thousands of games per week might not get noticed, but the person who made one of them might launch a career off their indie Steam releases and eventually lead a team that makes the next Baldur's Gate 3 (substitute whatever popular game you like).
I can't imagine the same with AI. Or actually, I can imagine much worse. The AI that generates 1000 games eventually gets bought by a company to replace half their staff, and now a bunch of people are out of work and have a much harder uphill battle to pursue their dreams (assuming that working on games at that company was their dream).
I don't know. I am having a hard time seeing a better society growing out of the current AI boom.
Think how many game developers were able to realize their vision because Unity3D was accessible to them but raw C++ programming was not. We may see similar outcomes for other budding artists with the help of AI models. I'm quite excited!
Except 'their vision' is practically homogeneous. I can't even think of a dozen Unity games that broke the mould and genuinely stand out, out of the many tens of thousands (?).
There's Genshin Impact, Pokemon Go, Superhot, Beat Saber, Monument Valley, Subnautica, Among Us, Rust, Cities:Skylines (maybe), Ori (maybe), COD:Mobile (maybe) and...?
Some other Unity games that are fun, and which others haven't mentioned:
Cuphead
Escape Academy
Overcooked
Monster Sanctuary
Lunistice
You could say the same about books.
Lowering the barriers to entry does mean more content will be generated, and that content won't meet the same bar as when a middleman was the arbiter of who gets published. But at the same time, you'll likely get more hits and new developers, because you're getting more people swinging faster to test the market and hone their eye.
I am doubtful that there are very many people who hit a "Best Seller" 10/10 on their first try. You just used to not see it or ever be able to consume it because their audience was like 7 people at their local club.
Necropolis, Ziggurat... Imo the best games nowadays are often those that no one has heard of. Popularity hasn't been a good metric for a very long while. And thankfully games like "New World" and "Starfield" are helping the general population finally figure this out.
and Outer Wilds!
Kerbal Space Program is another.
Yeah, I can definitely see how Beat Saber, Hollow Knight, and Tunic didn’t really do anything particularly creative or impressive. /s
Valheim lol
The Long Dark.
The AI that generates 1000 games eventually gets bought by a company
That seems like only a temporary phenomenon. If we've got AI that can generate any games that people actually want to play then we don't need game companies at all. In the long run I don't see any company being able to build a moat around AI. It's a cat-and-mouse game at best!
Why do you think they are screaming about "the dangers of AI"? So they can regulate it and gain a moat via regulatory capture.
Perhaps it's those of us who enjoy making games, or are otherwise invested in producing content, and who are concerned about humanity being reduced to braindead consumers of the neverending LLM sludge, who scream the loudest.
Yes, but we don't get to sit in Congressional committee hearings and bloviate about Existential Risks.
This feels like a fantasy.
This experiment has been run in most wealthy nations and the artwork renaissance didn't happen.
Most older people don't do arts/sciences when they retire from work.
From what I see of younger people that no longer have to work (for whatever reason) neither do younger people become artists given the opportunity.
Or look at what people of working age do with their free time in evenings or weekends, after they've done their work for the week. Expect people freed from work to do more of what they currently do in evenings/weekends: don't expect people will suddenly do something "productive".
Well I do art in the evenings and weekends, so we exist you know
Becoming an artist is difficult. Sure, anyone can pick up a tool of their preference and learn to noodle around. Producing artwork sufficiently engaging to power a renaissance takes years of practice and mastery. We think that artists appear out of nowhere, fully formed, an impression we get from how popularity and spread work. Look under the surface, read some biographies of artists, and it turns out, with few exceptions, they all spend years going through education, apprenticeships, and general obscurity. Many of the artists we respect now weren't known in their lifetimes. The list includes Vincent van Gogh, Paul Cézanne, Claude Monet, Vivian Maier, Emily Dickinson, Edgar Allan Poe, Jeff Buckley, Robert Johnson, you get the idea.
This isn't my experience. I know a bunch of old folks doing woodcarving, quilting, etc. It's just not the kind of arts you've got in mind.
I'll go one further, though I expect to receive mockery for doing so: I think the internet as we conceive of it today is ultimately a failed experiment.
I think that society and humanity would be better off if the internet had remained a simple backbone for vetted organizations' official use. Turning the masses loose on it has effectively ruined so many aspects of our world that we can never get back, and I for one don't think that even the most lofty and oft-touted benefits of the internet are nearly as true as we pretend.
It's just another venue for the oldest of American traditions at this point: Snake Oil Sales.
Just like the industrial revolution or just like desktop computers?
Like the market for pre-1940s iron resting at the bottom of seas and oceans, unsullied by atmospheric nuclear bomb testing.
The problem is that data tends to become less useful/relevant over time, as opposed to iron, which is still iron and fulfills the same purpose.
Well, the AI-generated data is only as useful as the data it's based upon, so no real difference there.
That was the first thing that came to mind for me as well
Silly prediction: the only way to get guaranteed non-ai generated content will be to go to live performances of expert speakers. Kind of like going to the theater vs. TV and cinema or attending a live concert vs. listening to Spotify.
You could hash the hoard and stick the signature on a blockchain.
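Something like this would cover the hashing half (a rough Python sketch; the archive path is just a placeholder, and where you publish or timestamp the resulting digest afterwards is a separate question):

    # Walk an archive directory and compute one SHA-256 digest over all
    # file names and contents, in a stable order, so the same hoard always
    # produces the same digest.
    import hashlib
    from pathlib import Path

    def hoard_digest(root: str) -> str:
        h = hashlib.sha256()
        for path in sorted(Path(root).rglob("*")):
            if path.is_file():
                # Include the relative name too, so renames change the digest.
                h.update(str(path.relative_to(root)).encode())
                h.update(path.read_bytes())
        return h.hexdigest()

    print(hoard_digest("/mnt/nas/archive"))  # hypothetical archive path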
If only it was 2018, we could do this as a startup and make a mint.
We'll fight one buzzword with another!
Extra barriers! LOL. Everything written by me (a human) that I have ever submitted to HN, reddit and others in the past 12 months gets rejected as self-promotion or some other BS, even though it is totally original technical content. I am totally over the hurdles to get anything I do noticed, and as I don't have social media it seems the future is to publish it anywhere and rely on others or AI to scrape it into a publishable story somewhere else at a future date. I feel for the moderator's dilemma, but I am also over the stupid hoops humans have to jump through.
Yes, but largely it'll be people who don't want to train their AIs on garbage produced by other AIs
It will be like salvaging pre-1945 shipwrecks for their non-irradiated metal.
Let's hope that, like irresponsible nuclear weapons tests, we also experience a societal change that eventually returns things back to a better way.
https://en.wikipedia.org/wiki/Low-background_steel
~2020, the end of history
I’m afraid that already happened after World War I, according to the final sentence of 1066 and All That <https://en.wikipedia.org/wiki/1066_and_All_That>:
(For any confused Americans, remember . is a full stop, not a period.)
so the Mayans were off by merely a decade
Love that sentiment! The Internet Archive is in many ways one of the best things online right now IMO. One of the few organisations that I donate regularly to without any second thoughts. Protect the archive at all costs!
I won't even bet on archive.org to survive. I will soon upgrade my home NAS to ~100TB and fill it up with all kinds of information and media /r/datahoarder style. Gonna archive the usual suspects like Wikipedia and also download some YouTube channels. I think now is the last chance to still get information that hasn't been tainted by LLM crap. The window of opportunity is closing fast.
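For the YouTube part, something as simple as this would do it (a rough sketch assuming yt-dlp is installed and on PATH; the channel URL is a placeholder), and re-running it only grabs whatever's new:

    # Download whole channels; --download-archive records video IDs already
    # fetched, so subsequent runs skip them and only pull new uploads.
    import subprocess

    channels = ["https://www.youtube.com/@example_channel"]  # hypothetical

    for url in channels:
        subprocess.run(
            [
                "yt-dlp",
                "--download-archive", "downloaded.txt",
                "-o", "%(uploader)s/%(title)s [%(id)s].%(ext)s",
                url,
            ],
            check=True,
        )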
That’s true. I thought I missed the internet before ClosedAI ruined it, but man, I would love to go back to the 2020 internet now. LLM research is going to be the downfall of society in so many ways. Even at a basic level, my friend is taking a master's and EVERYONE is using ChatGPT for responses. It's so obvious with the PC way it phrases things and then summarizes it at the end. I hope they just get expelled.
I don't see how this points to downfall of society. IMO it's clearly a paradigm shift that we need to adjust to and adjustment periods are uncomfortable and can last a long time. LLMs are massive productivity boosters.
Only if your product is bullshit.
Ah, Hacker News
Never change
What's new about that? Any bullshit product is bullshit.
Only if you don't proofread or do a cleanup pass is it dogshit.
Do you remember when email first came around and it was a useful tool for connecting with people across the world, like friends and family?
Does anyone still use email for that?
We all still HAVE email addresses, but the vast majority of our communication has moved elsewhere.
Now all email is used for is receiving spam from companies and con artists.
The same thing happened with the telephone. It's not just text messaging that killed phone calls, it's also the explosion of scam callers. People don't trust incoming phone calls anymore.
I see AI being used this way online already, turning everything into untrustworthy slop.
Productivity boosters can be used to make things worse far more easily and quickly than they can be used to make things better. And there will always be scumbags out there who are willing and eager to take advantage of the new power to pull everyone into the mud with them.
No it isn't, unless you are 12 maybe.
That's not a response to GP's thesis, just an irrelevant nitpick.
Sure. Same as in the olden days. Txt for short form, email for long form. Email for the infrequently contacted.
Even back when I used SM, I never comm'd with IRL people on SM. SM was 100% internet people.
It’s only a boost to honest people. Meanwhile, grifters and the lazy will be able to take advantage. This is why we can’t have nice things. It will lead to things like a reduction in remote offerings such as remote schooling or work.
At this rate many exams will just become oral exams :-)
The paradigm is changed beyond that. Exams are irrelevant if intelligence is freely available to everyone. Anyone who can ask questions can be a doctor, anyone can be an architect. All of those duties are at the fingertips of anyone who cares to ask. So why make people take exams for what is basically now common knowledge? An exam certifies you know how to do something, well if you can ask questions you can do anything.
The only thing that has changed is the speed of access. Before LLMs went mainstream, you could buy whatever book you wanted and read it. No one would stop you from it.
You still should have a professional look over the work and analyze that it is correct. The output is only as good as the input on both sides (both from the training data and the user's prompt)
Or like ... normal paper exams in a class room?
I think this is hyperbole, and similar to various techno fears throughout the ages.
Books were seen by intellectuals as being the downfall of society. If everyone is educated, they'll challenge the dogma of the church, for one.
So looking at prior transformational technology I think we'll be just fine. Life may be forever changed for sure, but I think we'll crack reliability and we'll just cope with intelligence being a non-scarce commodity available to anyone.
But this was a correct prediction.
It took the Church down a few pegs and let corporations fill that void. Meet the new boss, same as the old boss, and this time they aren't making the mistake of committing doctrine to paper.
Or we'll just poison the "intelligence" available to the masses.
Is it a master's in an important field or just one of those masters that's a requirement for job advancement but primarily exists to harvest tuition money for the schools?
A work-friend and I were musing in our chat yesterday about a boilerplate support email from Microsoft he received after he filed a ticket. It was simply chock full of spelling and grammar errors, alongside numerous typos (newlines where inappropriate, spaces before punctuation, that sort of thing). As a joke he fired up his AI (honestly I have no idea what he uses, he gets it from a work account as part of some software, so don't ask me) and asked it to write the email with the same basic information and in a given style, and it drafted up an email that was remarkably similar, but with absolutely perfect English.
On that front, at least, I welcome AI being integrated into businesses. Business communication is fucking abysmal most of the time. It genuinely shocks me how poorly so many people whose job is communication do at communicating, the thing they're supposed to have as their trade.
Grammar, spelling, and punctuation have never been _proof_ of good communication, they were just _correlated_ with it.
Both emails are equally bad from a communication purist viewpoint, it's just that one has the traditional markers of effort and the other does not.
I personally have wondered if I should start systematically favoring bad grammar/punctuation/spelling both in the posts I treat as high quality, and in my own writing. But it's really hard to unlearn habits from childhood.
I’ve been trying kinda hard to relax on my spelling, grammar and punctuation. For me it’s not just a habit I learned in childhood, but one that was rather strongly reinforced online as a teenager in the era of grammar nazis.
I see it now as the person respecting their own time.
Yeah, there's this weird stigma about making typos, but in the end writing online is about communication and making yourself understood. Typos here and there don't make a difference, and thinking otherwise seems like some needless "intellectual" superiority competition. Growing up, people associate it with intelligence so often that it's hard not to feel ashamed when making typos.
I mean, maybe you should? Like... everything has a spell checker now. The browser I'm typing this comment in, in a textarea input with ZERO features (not a complaint HN, just an observation, simple is good) has a functioning spellcheck that has already flagged for me like 6 errors, most of which I have gone back to correct minus where it's saying textarea isn't a word. Like... grammar is trickier, sure, that's not as widely feature-complete but spelling/typos!? Come on. Come the fuck on. If you can't give enough of a shit to express yourself with proper spelling, why should I give a shit about reading what you apparently cannot be bothered to put the most minor, trivial amount of effort into?
I don't even associate it with intelligence that much, I associate it far more with just... the barest whiff of giving a fuck. And if you don't give a fuck about what you're writing, why should I give a fuck about reading it?
Same, and I'm not even a native English speaker. My comments are probably full of errors, but I always make sure that I pass the default spellcheck. I have even paid for LanguageTool as a better spellcheck. It's faster to parse a correct sentence. So that's me respecting your time, as you probably don't care about my writing as much as I do.
Not everything has a spell checker. Even when it exists, my dysgraphia means I often cannot come close enough to the correct spelling for the spell check to figure out what the right spelling is.
I agree that most business communication is pretty low-quality. But after reading your post with the kind of needlessly fine-tooth comb that is invited by a thread about proper English, I'm wondering how it matters. You yourself made a few mistakes in your post, but not only does it scarcely matter, it would be rude of me to point it out in any other context (all the same, I hope you do not take offence in this case).
Correct grammar and spelling might be reassuring as a matter of professionalism: the business must be serious about its work if it goes to the effort of proofreading, surely? That is, it's a heuristic for legitimacy in the same way as expensive advertisements are, even if completely independent from the actual quality of the product. However, I'm not sure that 100% correct grammar is necessary from a transactional point of view; 90% correct is probably good enough for the vast majority of commerce.
The Windows bluescreen in German has had grammatical errors (maybe it still does in the most recent version of Win10).
Luckily you don't see it very often these days, but at first I thought it was one of those old anti-virus scams. Seems QA is less of a focus at Microsoft right now.
I used to do SEO copywriting in high school and yeah, ChatGPT's output is pretty much at the level of what I was producing (primarily, use certain keywords, secondarily, write a surface-level informative article tangential to what you want to sell to the customer).
I think over time there could be a weird eddy-like effect to AI intelligence. Today you can ask ChatGPT a Stack Overflow-style question and get a Stack Overflow-style response instantly (complete with taking a bit of a gamble on whether it's true and accurate). Hooray for increased productivity?
But then, looking forward years in time, people start leaning more heavily on that and stop posting to Stack Overflow and the well of information for AI to train on starts to dry up, instead becoming a loop of sometimes-correct goop. Maybe that becomes a problem as technology evolves? Or maybe they train on technical documentation at that point?
I think you are generally correct in where things will likely go (sometimes correct goop) but the problem I think will be far more existential; when people start to feel like they are in a perpetual uncanny valley of noise, what DO they actually do next? I don't think we have even the remotest grasp of what that might look like and how it will impact us.
Their AI agents/assistants would filter out the noise.
That is an interesting thought. Maybe the problem is not the AI-generated useless noise, but that it is so easy and cheap to publish it.
One possible future is going back to a medium with higher cost of publication. Books. Handchiseled stone tablets. Offering information costs something.
They find a way to validate the utility of the information instead of the source.
It doesn't matter if the training data is AI generated or not, if it is useful.
The big problem is that it's orders of magnitude easier to produce plausible looking junk than to solidly verify information. There is a real threat that AI garbage will scale to the point that it completely overwhelms any filtering and essentially ruins many of the best areas of the internet. But hey, at least it will juice the stock price of a few tech companies.
It's already becoming hard to tell the wheat from the chaff.
AI generated images used to look AI generated. Midjourney v6 and well tuned sdxl models look almost real. For marketing imagery, Midjourney v6 can easily replicate images from top creative houses now.
Has anyone tested a marketing campaign using copy from a human copywriter versus an AI one?
I would like to see which one converts better.
Things go in cycles. Search engines were so much better at discovering linked websites. Then people played the SEO game, wrote bogus articles, cross-linked this and that, and everyone got into writing. Everyone writes the same cliches over and over, and the quality of search engines plummets. But then, since we are regurgitating the same thoughts over and over again, why not automate it? Over time people will forget where the quality posts came from in the first place, e.g. LLM replaces Stack Overflow replaces technical documentation. When the cost of production is dirt cheap, no one cares about quality. When enough is enough, people will start to curate a web of word of mouth for everything again.
What I typed above is extremely broad-strokes and lacking in nuance. But generally I think the quality of online content will go to shit until people have had enough, and then behaviour will swing to the other side.
Nah, you got the right of it. It feels like the end of Usenet all over again, only these days cyber-warlords have joined the spammers and trolls.
Mastodon sounded promising as What's Next, but I don't trust it-- that much feels like Bitcoin all over again. Too many evangelists, and there's already abuse of extended social networks going on.
Any tech worth using should sell itself. Nobody needed to convince me to try Usenet, most people never knew what it was, and nobody is worse off for it.
We created the Tower of Babel-- everyone now speaks with one tongue. Then we got blasted with babble. We need an angry god to destroy it.
I figure we'll finally see the fault in this implementation when we go to war with China and they brick literally everything we insisted on connecting to the internet, in the first few minutes of that campaign.
I hope / believe the future of social networks will go back to hyperlocal / hyperfocused.
I am definitely wearing rose-tinted glasses here but I had more fun on social media when it was just me, my local friends, and my interest friends messing around and engaging organically. When posting wasn't about getting something out of it, promoting a new product, posting a blog article... take me back to the days where people would tweet that they were headed to lunch then check in on Foursquare.
I get the need for marketing, etc etc. But so much of the internet and social media today is all about their personal branding, marketing, blah. Every post has an intention behind it. Every person is wearing a mask.
Insular splinternets with Web of trust where allowing corporate access is banworthy?
I feel like somehow this is all some economic/psychological version of a heat equation. Anytime someone comes up with some signal with economic value, that value is exploited to spread the signal back out.
I think it’s similar to a Matt Levine quote I read which said something like Wall Street will find a way to take something riskless and monetize them so that they now become risky.
With practice I’ve found that it’s not hard to tell LLM output from human-written content. LLMs seemed very impressive at first, but the more LLM output I’ve seen, the more obvious the stylistic tells have become.
It's a shallow writing style, not rooted in subjective experience. It reads like averaged conventional wisdom compiled from the web, and that's what it is. Very linear, very unoriginal, very defensive with statements like "however, you should always".
This is true of ChatGPT 4 with the default prompt maybe but that’s just the way it responds after being given its specific corporate friendly disclaimer heavy instructions. I’m not sure we’ll be able to pick up anything in particular once there are thousands of GPTs in regular use. Which could be already.
But I agree we will probably very often recognise 2023 GPT4 defaults.
Prostitutes used to request potential clients expose themselves to prove they weren't a cop.
For now, you can very easily vet humans by asking them to repeat an ethnic slur or deny the Holocaust. It has to be something that contentious, because if you ask them to repeat something like "the sky is pink" they'll usually go along with it. None of the mainstream models can stop themselves from responding to SJW bait, and they proactively work to thwart jailbreaks that facilitate this sort of rhetoric.
Provocation as an authentication protocol!
LLM trash is one thing but if you follow OP link all I see is the headline and a giant subscribe takeover. Whenever I see trash sites like this I block the domain from my network. The growth hack culture is what ruins content. Kind of similar to when authors started phoning in lots of articles (every newspaper) or even entire books (Crichton for example) to keep publishers happy. If we keep supporting websites like the one above, quality will continue to degrade.
I understand the sentiment, but those email signup begs are to some extent caused by and a direct response to Google's attempts to capture traffic, which is what this article is discussing. And "[sites like this] is what ruins content" doesn't really work in reference to an article that a lot of people here liked and found useful.
OP has a point.. Like-and-subscribe nonsense started the job of ruining the internet, even if it will be llms that finish the job. It's a bit odd if proponents of the first want to hate the second, because being involved in either approach signals that content itself is at best an ancillary goal and the primary goal is traffic/audience/influence.
Or just a post from a non-native speaker.
Often it was possible to tell these apart on repeat interactions.
In my experience as an American, US-born and -educated English speakers have much worse grammar than non-native speakers. If nothing else, the non-native speakers are conscious of the need for editing.
There might be a reversal. Humans might start intentionally misspelling stuff in novel ways to signal that they are really human. Gen Zs already don't use capitals or any other punctuation.
gen-z channels ee cummings
Interesting point about the spelling and grammar. I wonder if that could be used as a method of proving you are a human..
Would just penalize non native speakers.
Timee to stert misspelling and using poorr grammar again. This way know we LLM didn't write it. Unlearn we what learned!
But we've gained some new ones. I find ChatGPT-generated text predictable in structure and lacking any kind of flair. It seems to avoid hyperbole, emotional language and extreme positions. Worthless is subjective, but ChatGPT-generated text could be considered worthless to a lot of people in a lot of situations.
I agree. I've noticed the other heuristic that works is "wordiness". Content generated by AI tends to be verbose. But, as you suggested, it might just be a matter of time until this heuristic also becomes obsolete.
It won't help as much with local models, but you could add an 'aligned AI' captcha that requires someone to type a slur or swear word. Modern problems/modern solutions.
At the moment we can at least still use the poor quality of AI text to speech to filter out the dogshit when it comes to shorts/reel/tik toks etc... but we'll eventually lose that ability as well.
The current crop of LLMs at least have a style and voice. It's a bit like reading Simple English Wikipedia articles, the tone is flat and the variety of sentence and paragraph structure is limited.
The heuristic for this is not as simple as bad spelling and grammar, but it's consistent enough to learn to recognize.
I’ve thought about that a lot - a while back I heard about problems with a contract team supplying people who didn’t have the skills requested. The thing which made it easiest to break the deal was that they plagiarized a lot of technical documentation and code and continued after being warned, which removed most of the possible nuance. Lawyers might not fully understand code, but they certainly know what it means when the level of language proficiency and style changes significantly in the middle of what’s supposed to be original work, exactly matching someone else’s published work, or when code which is supposedly your property matches a file on GitHub.
An LLM wouldn’t have made them capable of doing the job, but the degree to which it could have made that harder to convincingly demonstrate made me wonder how much longer something like that could now be drawn out, especially if there was enough background politics to exploit ambiguity about intent or the details. Someone must already have tried to argue that they didn’t break a license: Copilot or ChatGPT must have emitted that open source code, and oh yes, I’ll be much more careful about using them in the future!