One aspect of the spread of LLMs is that we have lost a useful heuristic. Poor spelling and grammar used to be a signal used to quickly filter out worthless posts.
Unfortunately, this doesn't work at all for AI-generated garbage. Its command of the language is perfect - in fact, it's much better than that of most human beings. Anyone can instantly generate superficially coherent posts. You no longer have to hire a copywriter, as many SEO spammers used to do.
curl's struggle with bogus AI-generated bug reports is a good example of the problems this causes: https://news.ycombinator.com/item?id=38845878
This is only the beginning; it will get much worse. At some point it may become impossible to separate the wheat from the chaff.
We should start donating more heavily to archive.org - the Wayback Machine may soon be the only way to find useful data on the internet, by cutting out anything published after ~2020.
Interesting idea. Could there be a market for pre-AI era content? Or maybe it would be a combination of pre-AI content plus some extra barriers to entry for newer content that would increase the likelihood the content was generated by real people?
I'm in the camp where I want AI and automation to free people from drudgery in the hope that it will encourage the biggest HUMAN artwork renaissance ever in history.
I don't want AI to be at the forefront of all new media and artwork. That's a terrible outcome to me.
And honestly there's already too much "content" in the world, with more being produced every day, and it seems like every time we step further up the "content is easier to produce and deliver" ladder, it actually gets way more difficult to find much of value, and also more difficult for smaller artists to find an audience.
We see this on Steam, where there are thousands of new game releases every week. You only ever hear of one or two. And it's almost never surprising which ones you hear about. Occasionally you get an indie sensation out of nowhere, but that usually only happens when a big streamer showcases it.
Speaking of streamers, it's hard to find quality small streamers too. Twitch and YouTube are saturated with streams to watch but everyone gravitates to the biggest ones because there's just too much to see.
Everything is drowning in a sea of (mostly mediocre, honestly) content already, and AI is going to make this problem much worse.
At least with human-generated media, it's a person pursuing their dreams. Those thousands of games per week might not get noticed, but the person who made one of them might launch a career off their indie Steam releases and eventually lead a team that makes the next Baldur's Gate 3 (substitute whatever popular game you like).
I can't imagine the same with AI. Or actually, I can imagine much worse. The AI that generates 1000 games eventually gets bought by a company to replace half their staff, and now a bunch of people are out of work and have a much harder uphill battle to pursue their dreams (assuming that working on games at that company was their dream).
I don't know. I am having a hard time seeing a better society growing out of the current AI boom.
Think how many game developers were able to realize their vision because Unity3D was accessible to them but raw C++ programming was not. We may see similar outcomes for other budding artists with the help of AI models. I'm quite excited!
Except 'their vision' is practically homogeneous. I can't even think of a dozen Unity games that broke the mould and genuinely stand out, out of the many tens of thousands (?).
There's Genshin Impact, Pokemon Go, Superhot, Beat Saber, Monument Valley, Subnautica, Among Us, Rust, Cities:Skylines (maybe), Ori (maybe), COD:Mobile (maybe) and...?
Some other Unity games that are fun, and which others haven't mentioned:
Cuphead
Escape Academy
Overcooked
Monster Sanctuary
Lunistice
You could say the same about books.
Lowering the barriers to entry does mean more content will be generated, and that content won't meet the same bar as when a middleman was the arbiter of who gets published. But at the same time, you'll likely get more hits and new developers, because you're getting more people swinging faster to test the market and hone their eye.
I am doubtful that there are very many people who hit a "Best Seller" 10/10 on their first try. You just used to not see it or ever be able to consume it because their audience was like 7 people at their local club.
Necropolis, Ziggurat... Imo the best games nowadays are often those that no one has heard of. Popularity hasn't been a good metric for a very long while. And thankfully games like "New World" and "Starfield" are helping the general population finally figure this out.
and Outer Wilds!
Kerbal Space Program is another.
Yeah, I can definitely see how Beat Saber, Hollow Knight, and Tunic didn’t really do anything particularly creative or impressive. /s
Valheim lol
The Long Dark.
The AI that generates 1000 games eventually gets bought by a company
That seems like only a temporary phenomenon. If we've got AI that can generate any games that people actually want to play then we don't need game companies at all. In the long run I don't see any company being able to build a moat around AI. It's a cat-and-mouse game at best!
Why do you think they are screaming about "the dangers of AI"? So they can regulate it and gain a moat via regulatory capture.
Perhaps it's those of us who enjoy making games, or are otherwise invested in producing content, and who are concerned about humanity being reduced to braindead consumers of the neverending LLM sludge, who scream the loudest.
Yes, but we don't get to sit in Congressional committee hearings and bloviate about Existential Risks.
This feels like a fantasy.
This experiment has been run in most wealthy nations and the artwork renaissance didn't happen.
Most older people don't do arts/sciences when they retire from work.
From what I see of younger people that no longer have to work (for whatever reason) neither do younger people become artists given the opportunity.
Or look at what people of working age do with their free time in evenings or weekends, after they've done their work for the week. Expect people freed from work to do more of what they currently do in evenings/weekends: don't expect people will suddenly do something "productive".
Well I do art in the evenings and weekends, so we exist you know
Becoming an artist is difficult. Sure, anyone can pick up a tool of their preference and learn to noodle around. Producing artwork sufficiently engaging to power a renaissance takes years of practice and mastery. We think that artists appear out of nowhere, fully formed, an impression we get from how popularity and spread work. Look under the surface, read some biographies of artists, and it turns out, with few exceptions, they all spend years going through education, apprenticeships, and general obscurity. Many of the artists we respect now weren't known in their lifetimes. The list includes Vincent van Gogh, Paul Cézanne, Claude Monet, Vivian Maier, Emily Dickinson, Edgar Allan Poe, Jeff Buckley, Robert Johnson, you get the idea.
This isn't my experience. I know a bunch of old folks doing woodcarving, quilting, etc. It's just not the kind of arts you've got in mind.
I'll go one further, though I expect to receive mockery for doing so: I think the internet as we conceive of it today is ultimately a failed experiment.
I think that society and humanity would be better off if the internet had remained a simple backbone for vetted organizations' official use. Turning the masses loose on it has effectively ruined so many aspects of our world that we can never get back, and I for one don't think that even the most lofty and oft-touted benefits of the internet are nearly as true as we pretend.
It's just another venue for the oldest of American traditions at this point: Snake Oil Sales.
Just like the industrial revolution or just like desktop computers?
Like the market for pre-1940s iron resting at the bottom of seas and oceans, unsullied by atmospheric nuclear bomb testing.
The problem is that data tends to become less useful/relevant over time, as opposed to iron, which is still iron and fulfills the same purpose.
Well, the AI-generated data is only as useful as the data it's based upon, so no real difference there.
That was the first thing that came to mind for me as well
Silly prediction: the only way to get guaranteed non-ai generated content will be to go to live performances of expert speakers. Kind of like going to the theater vs. TV and cinema or attending a live concert vs. listening to Spotify.
You could hash the hoard and stick the signature on a blockchain.
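Something like this would cover the hashing half (a rough Python sketch; the archive path is just a placeholder, and where you publish or timestamp the resulting digest afterwards is a separate question):

    # Walk an archive directory and compute one SHA-256 digest over all
    # file names and contents, in a stable order, so the same hoard always
    # produces the same digest.
    import hashlib
    from pathlib import Path

    def hoard_digest(root: str) -> str:
        h = hashlib.sha256()
        for path in sorted(Path(root).rglob("*")):
            if path.is_file():
                # Include the relative name too, so renames change the digest.
                h.update(str(path.relative_to(root)).encode())
                h.update(path.read_bytes())
        return h.hexdigest()

    print(hoard_digest("/mnt/nas/archive"))  # hypothetical archive path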
If only it was 2018, we could do this as a startup and make a mint.
We'll fight one buzzword with another!
Extra barriers! LOL. Everything written by me (a human) that I have ever submitted to HN, reddit and others in the past 12 months gets rejected as self-promotion or some other BS, even though it is totally original technical content. I am totally over the hurdles to get anything I do noticed, and as I don't have social media it seems the future is to publish it anywhere and rely on others or AI to scrape it into a publishable story somewhere else at a future date. I feel for the moderator's dilemma, but I am also over the stupid hoops humans have to jump through.
Yes, but largely it'll be people who don't want to train their AIs on garbage produced by other AIs
It will be like salvaging pre-1945 shipwrecks for their non-irradiated metal.
Let's hope that, like irresponsible nuclear weapons tests, we also experience a societal change that eventually returns things back to a better way.
https://en.wikipedia.org/wiki/Low-background_steel
~2020, the end of history
I’m afraid that already happened after World War I, according to the final sentence of 1066 and All That <https://en.wikipedia.org/wiki/1066_and_All_That>:
(For any confused Americans, remember . is a full stop, not a period.)
so the Mayans were off by merely a decade
Love that sentiment! The Internet Archive is in many ways one of the best things online right now IMO. One of the few organisations that I donate regularly to without any second thoughts. Protect the archive at all costs!
I won't even bet on archive.org to survive. I will soon upgrade my home NAS to ~100TB and fill it up with all kinds of information and media /r/datahoarder style. Gonna archive the usual suspects like Wikipedia and also download some YouTube channels. I think now is the last chance to still get information that hasn't been tainted by LLM crap. The window of opportunity is closing fast.
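For the YouTube part, something as simple as this would do it (a rough sketch assuming yt-dlp is installed and on PATH; the channel URL is a placeholder), and re-running it only grabs whatever's new:

    # Download whole channels; --download-archive records video IDs already
    # fetched, so subsequent runs skip them and only pull new uploads.
    import subprocess

    channels = ["https://www.youtube.com/@example_channel"]  # hypothetical

    for url in channels:
        subprocess.run(
            [
                "yt-dlp",
                "--download-archive", "downloaded.txt",
                "-o", "%(uploader)s/%(title)s [%(id)s].%(ext)s",
                url,
            ],
            check=True,
        )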
That’s true. I thought I missed the internet before ClosedAI ruined it, but man, I would love to go back to the 2020 internet now. LLM research is going to be the downfall of society in so many ways. Even at a basic level, my friend is taking a master's and EVERYONE is using ChatGPT for responses. It's so obvious with the PC way it phrases things and then summarizes it at the end. I hope they just get expelled.
I don't see how this points to downfall of society. IMO it's clearly a paradigm shift that we need to adjust to and adjustment periods are uncomfortable and can last a long time. LLMs are massive productivity boosters.
Only if your product is bullshit.
Ah, Hacker News
Never change
What's new about that? Any bullshit product is bullshit.
Only if you don't proofread or do a cleanup pass is it dogshit.
Do you remember when email first came around and it was a useful tool for connecting with people across the world, like friends and family?
Does anyone still use email for that?
We all still HAVE email addresses, but the vast majority of our communication has moved elsewhere.
Now all email is used for is receiving spam from companies and con artists.
The same thing happened with the telephone. It's not just text messaging that killed phone calls, it's also the explosion of scam callers. People don't trust incoming phone calls anymore.
I see AI being used this way online already, turning everything into untrustworthy slop.
Productivity boosters can be used to make things worse far more easily and quickly than they can be used to make things better. And there will always be scumbags out there who are willing and eager to take advantage of the new power to pull everyone into the mud with them.
No it isn't, unless you are 12 maybe.
That's not a response to GP's thesis, just an irrelevant nitpick.
Sure. Same as in the olden days. Txt for short form, email for long form. Email for the infrequently contacted.
Even back when I used SM, I never comm'd with IRL people on SM. SM was 100% internet people.
It’s only a boost to honest people. Meanwhile, grifters and the lazy will be able to take advantage. This is why we can’t have nice things. It will lead to things like a reduction in remote offerings such as remote schooling or work.
At this rate many exams will just become oral exams :-)
The paradigm is changed beyond that. Exams are irrelevant if intelligence is freely available to everyone. Anyone who can ask questions can be a doctor, anyone can be an architect. All of those duties are at the fingertips of anyone who cares to ask. So why make people take exams for what is basically now common knowledge? An exam certifies you know how to do something, well if you can ask questions you can do anything.
The only thing that has changed is the speed of access. Before LLMs went mainstream, you could buy whatever book you wanted and read it. No one would stop you from it.
You still should have a professional look over the work and analyze that it is correct. The output is only as good as the input on both sides (both from the training data and the user's prompt)
Or like ... normal paper exams in a class room?
I think this is hyperbole, and similar to various techno fears throughout the ages.
Books were seen by intellectuals as being the downfall of society. If everyone is educated, they'll challenge the dogma of the church, for one.
So looking at prior transformational technology I think we'll be just fine. Life may be forever changed for sure, but I think we'll crack reliability and we'll just cope with intelligence being a non-scarce commodity available to anyone.
But this was a correct prediction.
It took the Church down a few pegs and let corporations fill that void. Meet the new boss, same as the old boss, and this time they aren't making the mistake of committing doctrine to paper.
Or we'll just poison the "intelligence" available to the masses.
Is it a master's in an important field or just one of those masters that's a requirement for job advancement but primarily exists to harvest tuition money for the schools?
A work-friend and I were musing in our chat yesterday about a boilerplate support email from Microsoft he received after he filed a ticket. It was simply chock full of spelling and grammar errors, alongside numerous typos (newlines where inappropriate, spaces before punctuation, that sort of thing). As a joke he fired up his AI (honestly I have no idea what he uses, he gets it from a work account as part of some software, so don't ask me) and asked it to write the email with the same basic information and in a given style, and it drafted up an email that was remarkably similar, but with absolutely perfect English.
On that front, at least, I welcome AI being integrated into businesses. Business communication is fucking abysmal most of the time. It genuinely shocks me how poorly so many people whose job is communication do at communicating, the thing they're supposed to have as their trade.
Grammar, spelling, and punctuation have never been _proof_ of good communication, they were just _correlated_ with it.
Both emails are equally bad from a communication purist viewpoint, it's just that one has the traditional markers of effort and the other does not.
I personally have wondered if I should start systematically favoring bad grammar/punctuation/spelling both in the posts I treat as high quality, and in my own writing. But it's really hard to unlearn habits from childhood.
I’ve been trying kinda hard to relax on my spelling, grammar and punctuation. For me it’s not just a habit I learned in childhood, but one that was rather strongly reinforced online as a teenager in the era of grammar nazis.
I see it now as the person respecting their own time.
Yeah, there's this weird stigma about making typos, but in the end writing online is about communication and making yourself understood. Typos here and there don't make a difference, and thinking otherwise seems like some needless "intellectual" superiority competition. Growing up, people associate it with intelligence so often that it's hard not to feel ashamed when making typos.
I mean, maybe you should? Like... everything has a spell checker now. The browser I'm typing this comment in, in a textarea input with ZERO features (not a complaint HN, just an observation, simple is good) has a functioning spellcheck that has already flagged for me like 6 errors, most of which I have gone back to correct minus where it's saying textarea isn't a word. Like... grammar is trickier, sure, that's not as widely feature-complete but spelling/typos!? Come on. Come the fuck on. If you can't give enough of a shit to express yourself with proper spelling, why should I give a shit about reading what you apparently cannot be bothered to put the most minor, trivial amount of effort into?
I don't even associate it with intelligence that much, I associate it far more with just... the barest whiff of giving a fuck. And if you don't give a fuck about what you're writing, why should I give a fuck about reading it?
Same, and I'm not even a native English speaker. My comments are probably full of errors, but I always make sure that I pass the default spellcheck. I have even paid for LanguageTool as a better spellcheck. It's faster to parse a correct sentence. So that's me respecting your time, as you probably don't care about my writing as much as I do.
Not everything has a spell checker. Even when it exists, my dysgraphia means I often cannot come close enough to the correct spelling for the spell check to figure out what the right spelling is.
I agree that most business communication is pretty low-quality. But after reading your post with the kind of needlessly fine-tooth comb that is invited by a thread about proper English, I'm wondering how it matters. You yourself made a few mistakes in your post, but not only does it scarcely matter, it would be rude of me to point it out in any other context (all the same, I hope you do not take offence in this case).
Correct grammar and spelling might be reassuring as a matter of professionalism: the business must be serious about its work if it goes to the effort of proofreading, surely? That is, it's a heuristic for legitimacy in the same way as expensive advertisements are, even if completely independent from the actual quality of the product. However, I'm not sure that 100% correct grammar is necessary from a transactional point of view; 90% correct is probably good enough for the vast majority of commerce.
The Windows bluescreen in German has had grammatical errors (maybe it still does in the most recent version of Win10).
Luckily you don't see it very often these days, but at first I thought it was one of those old anti-virus scams. Seems QA is less of a focus at Microsoft right now.
I used to do SEO copywriting in high school and yeah, ChatGPT's output is pretty much at the level of what I was producing (primarily, use certain keywords, secondarily, write a surface-level informative article tangential to what you want to sell to the customer).
I think over time there could be a weird eddy-like effect to AI intelligence. Today you can ask ChatGPT a Stack Overflow-style question and get a Stack Overflow-style response instantly (complete with taking a bit of a gamble on whether it's true and accurate). Hooray for increased productivity?
But then, looking forward years in time, people start leaning more heavily on that and stop posting to Stack Overflow and the well of information for AI to train on starts to dry up, instead becoming a loop of sometimes-correct goop. Maybe that becomes a problem as technology evolves? Or maybe they train on technical documentation at that point?
I think you are generally correct in where things will likely go (sometimes correct goop) but the problem I think will be far more existential; when people start to feel like they are in a perpetual uncanny valley of noise, what DO they actually do next? I don't think we have even the remotest grasp of what that might look like and how it will impact us.
Their AI agents/assistants would filter out the noise.
That is an interesting thought. Maybe the problem is not the AI-generated useless noise, but that it is so easy and cheap to publish it.
One possible future is going back to a medium with higher cost of publication. Books. Handchiseled stone tablets. Offering information costs something.
They find a way to validate the utility of the information instead of the source.
It doesn't matter if the training data is AI generated or not, if it is useful.
The big problem is that it's orders of magnitude easier to produce plausible looking junk than to solidly verify information. There is a real threat that AI garbage will scale to the point that it completely overwhelms any filtering and essentially ruins many of the best areas of the internet. But hey, at least it will juice the stock price of a few tech companies.
It's already becoming hard to tell the wheat from the chaff.
AI generated images used to look AI generated. Midjourney v6 and well tuned sdxl models look almost real. For marketing imagery, Midjourney v6 can easily replicate images from top creative houses now.
Has anyone tested a marketing campaign using copy from a human copywriter versus an AI one?
I would like to see which one converts better.
Things go in cycles. Search engines were so much better at discovering linked websites. Then people played the SEO game, wrote bogus articles, cross-linked this and that, and everyone got into writing. Everyone writes the same cliches over and over, and the quality of search engines plummets. But then, since we are regurgitating the same thoughts over and over again, why not automate it? Over time people will forget where the quality posts came from in the first place, e.g. LLM replaces Stack Overflow replaces technical documentation. When the cost of production is dirt cheap, no one cares about quality. When enough is enough, people will start to curate a web of word of mouth for everything again.
What I typed above is extremely broad-strokes and lacking in nuance. But generally I think the quality of online content will go to shit until people have had enough, and then behaviour will swing to the other side.
Nah, you got the right of it. It feels like the end of Usenet all over again, only these days cyber-warlords have joined the spammers and trolls.
Mastodon sounded promising as What's Next, but I don't trust it-- that much feels like Bitcoin all over again. Too many evangelists, and there's already abuse of extended social networks going on.
Any tech worth using should sell itself. Nobody needed to convince me to try Usenet, most people never knew what it was, and nobody is worse off for it.
We created the Tower of Babel-- everyone now speaks with one tongue. Then we got blasted with babble. We need an angry god to destroy it.
I figure we'll finally see the fault in this implementation when we go to war with China and they brick literally everything we insisted on connecting to the internet, in the first few minutes of that campaign.
I hope / believe the future of social networks will go back to hyperlocal / hyperfocused.
I am definitely wearing rose-tinted glasses here but I had more fun on social media when it was just me, my local friends, and my interest friends messing around and engaging organically. When posting wasn't about getting something out of it, promoting a new product, posting a blog article... take me back to the days where people would tweet that they were headed to lunch then check in on Foursquare.
I get the need for marketing, etc etc. But so much of the internet and social media today is all about their personal branding, marketing, blah. Every post has an intention behind it. Every person is wearing a mask.
Insular splinternets with Web of trust where allowing corporate access is banworthy?
I feel like somehow this is all some economic/psychological version of a heat equation. Anytime someone comes up with some signal with economic value, that value is exploited to spread the signal back out.
I think it’s similar to a Matt Levine quote I read which said something like Wall Street will find a way to take something riskless and monetize them so that they now become risky.
With practice I’ve found that it’s not hard to tell LLM output from human-written content. LLMs seemed very impressive at first, but the more LLM output I’ve seen, the more obvious the stylistic tells have become.
It's a shallow writing style, not rooted in subjective experience. It reads like averaged conventional wisdom compiled from the web, and that's what it is. Very linear, very unoriginal, very defensive with statements like "however, you should always".
This is true of ChatGPT 4 with the default prompt maybe but that’s just the way it responds after being given its specific corporate friendly disclaimer heavy instructions. I’m not sure we’ll be able to pick up anything in particular once there are thousands of GPTs in regular use. Which could be already.
But I agree we will probably very often recognise 2023 GPT4 defaults.
Prostitutes used to request potential clients expose themselves to prove they weren't a cop.
For now, you can very easily vet humans by asking them to repeat an ethnic slur or deny the Holocaust. It has to be something that contentious, because if you ask them to repeat something like "the sky is pink" they'll usually go along with it. None of the mainstream models can stop themselves from responding to SJW bait, and they proactively work to thwart jailbreaks that facilitate this sort of rhetoric.
Provocation as an authentication protocol!
LLM trash is one thing but if you follow OP link all I see is the headline and a giant subscribe takeover. Whenever I see trash sites like this I block the domain from my network. The growth hack culture is what ruins content. Kind of similar to when authors started phoning in lots of articles (every newspaper) or even entire books (Crichton for example) to keep publishers happy. If we keep supporting websites like the one above, quality will continue to degrade.
I understand the sentiment, but those email signup begs are to some extent caused by and a direct response to Google's attempts to capture traffic, which is what this article is discussing. And "[sites like this] is what ruins content" doesn't really work in reference to an article that a lot of people here liked and found useful.
OP has a point.. Like-and-subscribe nonsense started the job of ruining the internet, even if it will be llms that finish the job. It's a bit odd if proponents of the first want to hate the second, because being involved in either approach signals that content itself is at best an ancillary goal and the primary goal is traffic/audience/influence.
Or just a post from a non-native speaker.
Often it was possible to tell these apart on repeat interactions.
In my experience as an American, US-born and -educated English speakers have much worse grammar than non-native speakers. If nothing else, the non-native speakers are conscious of the need for editing.
There might be a reversal. Humans might start intentionally misspelling stuff in novel ways to signal that they are really human. Gen Zs already don't use capitals or any other punctuation.
gen-z channels ee cummings
Interesting point about the spelling and grammar. I wonder if that could be used as a method of proving you are a human..
Would just penalize non native speakers.
Timee to stert misspelling and using poorr grammar again. This way know we LLM didn't write it. Unlearn we what learned!
But we've gained some new ones. I find ChatGPT-generated text predictable in structure and lacking any kind of flair. It seems to avoid hyperbole, emotional language and extreme positions. Worthless is subjective, but ChatGPT-generated text could be considered worthless to a lot of people in a lot of situations.
I agree. I've noticed the other heuristic that works is "wordiness". Content generated by AI tends to be verbose. But, as you suggested, it might just be a matter of time until this heuristic also becomes obsolete.
It won't help as much with local models, but you could add an 'aligned AI' captcha that requires someone to type a slur or swear word. Modern problems/modern solutions.
At the moment we can at least still use the poor quality of AI text to speech to filter out the dogshit when it comes to shorts/reel/tik toks etc... but we'll eventually lose that ability as well.
The current crop of LLMs at least have a style and voice. It's a bit like reading Simple English Wikipedia articles, the tone is flat and the variety of sentence and paragraph structure is limited.
The heuristic for this is not as simple as bad spelling and grammar, but it's consistent enough to learn to recognize.
I’ve thought about that a lot - a while back I heard about problems with a contract team supplying people who didn’t have the skills requested. The thing which made it easiest to break the deal was that they plagiarized a lot of technical documentation and code and continued after being warned, which removed most of the possible nuance. Lawyers might not fully understand code, but they certainly know what it means when the level of language proficiency and style changes significantly in the middle of what’s supposed to be original work, exactly matching someone else’s published work, or when code which is supposedly your property matches a file on GitHub.
An LLM wouldn’t have made them capable of doing the job, but the degree to which it could have made that harder to convincingly demonstrate made me wonder how much longer something like that could now be drawn out, especially if there was enough background politics to exploit ambiguity about intent or the details. Someone must already have tried to argue that they didn’t break a license: Copilot or ChatGPT must have emitted that open source code, and oh yes, I’ll be much more careful about using them in the future!