404 Media ran an exposé on a new LLM product called "Reply Guy" (lol), designed to mimic real users having discussions on Reddit and plug your product in the comments
https://www.404media.co/ai-is-poisoning-reddit-to-promote-pr...
Google is failing, so users have started appending "Reddit" to their search queries. Where do we go when Reddit is no longer useful and contains the same AI-generated dreck as all the Google search results? It shows how many single points of failure there are on the informational web. Pretty much the only informational resource on the web that's still unscathed is Wikipedia (thanks to herculean efforts by its editors, mind you), but I wouldn't bank on it any more than I'd bank on Reddit. The "information age" might be coming to an end.
So are we headed towards some sort of identification, like a passport or driver's license, to be able to post?
Would you be able to create a system that somehow battles this spam but retains privacy in some way?
Is there an alternative that retains max privacy in a world with a trillion bots spamming away?
I.e., do any good systems exist where, say, you can get a HUMAN-ID via some sort of verification, which then grants you the ability to create users, but no one can see which users are tied to which HUMAN-ID? You could only create, say, 5 total, and if some are busted for spamming, they are all revoked (a bad Orwellian idea).
Or maybe some advanced federated trust chains, where if enough different people deem you a spammer your users can be taken away, but no state power can revoke them in one move, for example, or see who you are.
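Purely as a toy sketch of the capped-pseudonym half of that first idea (every name here is hypothetical, and real unlinkability from the registrar would need blind or group signatures, which this skips):

```python
import hashlib
import secrets

MAX_PSEUDONYMS = 5

class Registrar:
    """Hypothetical registrar: verifies a human once, hands out a master
    secret, and keeps only a revocation list of master-secret hashes."""
    def __init__(self):
        self.revoked = set()

    def enroll(self) -> bytes:
        # In reality this would follow some real-world identity check.
        return secrets.token_bytes(32)

    def revoke(self, master_secret: bytes) -> None:
        self.revoked.add(hashlib.sha256(master_secret).hexdigest())

    def is_revoked(self, master_secret: bytes) -> bool:
        return hashlib.sha256(master_secret).hexdigest() in self.revoked

def pseudonym(master_secret: bytes, index: int) -> str:
    """Derive pseudonym #index from the master secret. Only the holder
    can link their own pseudonyms together."""
    assert 0 <= index < MAX_PSEUDONYMS
    return hashlib.sha256(master_secret + bytes([index])).hexdigest()[:16]

registrar = Registrar()
me = registrar.enroll()
handles = [pseudonym(me, i) for i in range(MAX_PSEUDONYMS)]  # 5 accounts, period
registrar.revoke(me)             # one handle gets busted for spam...
print(registrar.is_revoked(me))  # ...and all five die together: True
```

The genuinely hard part, which this toy dodges, is letting a forum check "this pseudonym belongs to a non-revoked HUMAN-ID" without anyone learning which one.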
Block and fine ISPs that host bots. Throw people in prison who run bots.
Not possible as most are in Russia/China/Iran/Nigeria/etc...
Block
Yes, it's possible to do this. I wrote up a scheme for that years ago that I called "proof of passport". You can create anonymous identities tied to a hash of your ePassport certificate using SGX enclaves and some careful protocol design.
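Not claiming this is the parent's actual design, but the core derivation could look roughly like this (hand-wavy sketch: `enclave_key` stands in for a secret only ever released to an attested SGX enclave, and a real implementation would parse the passport's document-signer certificate properly):

```python
import hashlib
import hmac

def anonymous_identity(enclave_key: bytes, passport_cert_der: bytes,
                       service_id: str) -> str:
    """Derive a per-service pseudonym from a passport certificate hash.

    The same passport always yields the same pseudonym for a given
    service (no unlimited account creation), but pseudonyms for
    different services can't be linked without enclave_key, which only
    the attested enclave ever sees.
    """
    cert_hash = hashlib.sha256(passport_cert_der).digest()
    return hmac.new(enclave_key, cert_hash + service_id.encode(),
                    hashlib.sha256).hexdigest()

# Hypothetical usage: the enclave computes and signs this for the site.
print(anonymous_identity(b"\x00" * 32, b"<DER cert bytes>", "example-forum.com"))
```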
Needless to say, such ideas make some people very unhappy, although it can be done in a way that doesn't grant governments any new powers they don't already have. The most common objection comes from Americans, who make the same arguments they make about elections: some people don't have ID of any kind and shouldn't be expected to get one.
You can also of course buy identities from people who don't care, as a sibling comment says. But that's inevitable for any identity system where identities can be had cheaply.
I think systems like yours could become extremely valuable sooner than people expect, as the alternative is effectively 100% noise.
As others have mentioned, there are numerous ideological issues. However, the alternative might be never encountering a real person online again.
And if not applicable to the broader internet, then probably in smaller or even country-sized gated communities, where people will expect to interact with 'real humans'.
Also, while IDs may be traded, the relatively small number of fake IDs is not even comparable to the effectively infinite bots that can be created today.
Even if a passport was required, I think the same problems would appear. There are plenty of people with no interest in ever posting on Reddit. Some of them might be convinced to allow someone else to use a bot to post on their behalf if there is money to be made.
Not to mention there are plenty of leaked/faked passports out there
We will probably need dogs like they did in the Terminator movies at some point.
Folks have suggested web-of-trust systems. I don't know how they would be implemented - for now, I guess this is already sort of a thing on any platform where users can "repost"/"retweet" things.
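For what it's worth, the core mechanic is simple enough to sketch; here's a naive version where trust decays by half per hop through the vouching graph (all names and numbers are made up, and a real system would need Sybil resistance on top):

```python
from collections import deque

# Hypothetical vouching graph: who -> accounts they personally vouch for.
VOUCHES = {
    "alice": {"bob", "carol"},
    "bob":   {"dave"},
    "carol": {"dave", "eve"},
}

def trust(viewer: str, target: str, decay: float = 0.5) -> float:
    """Breadth-first trust propagation: each hop multiplies by `decay`.
    Returns 0.0 if no vouching path exists."""
    if viewer == target:
        return 1.0
    seen = {viewer}
    queue = deque([(viewer, 1.0)])
    while queue:
        node, score = queue.popleft()
        for nxt in VOUCHES.get(node, ()):
            if nxt == target:
                return score * decay       # shortest path found first
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, score * decay))
    return 0.0

print(trust("alice", "dave"))     # 0.25: two hops away
print(trust("alice", "mallory"))  # 0.0: nobody vouches for mallory
```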
I'll just put this out there because I don't know if I could ever implement it: I've had an idea that's essentially "IP permitted from".
We would extend the whois database to contain an OAuth URL for a given IP block, and then forums or other services that need to ensure a real human is present (like at registration, or when combined with some other trust system) would bounce the user over to that URL, which would require them to log in via U2F/passkeys/TOTP/etc. (see the sketch below).
The thinking is that ISPs are the ones who know their customers are real, and as long as they can challenge them in a human-interactive way, that should provide a strong signal that it's a real human. It's also a good way to protect against cookie stealing, and could provide resistance to "man in the browser" attacks, since the end user would become suspicious of all the ISP challenge pages popping up if their machine were being used for spamming.
It's not foolproof: there could be insiders working at the ISP, and it would require the cooperation of all ISPs everywhere, but it would be a step in the right direction.
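The lookup half is easy to picture if you grant the (entirely hypothetical) premise that WHOIS/RDAP records grew a human-auth field per IP block; the redirect-and-challenge leg would be ordinary OAuth from there:

```python
import ipaddress
from typing import Optional

# Hypothetical extended-WHOIS data: IP block -> ISP's human-challenge URL.
WHOIS_HUMAN_AUTH = {
    ipaddress.ip_network("203.0.113.0/24"): "https://auth.example-isp.net/human",
    ipaddress.ip_network("198.51.100.0/24"): "https://verify.other-isp.example/u2f",
}

def challenge_url_for(client_ip: str) -> Optional[str]:
    """Find the ISP's human-verification endpoint for a client IP.
    A forum would redirect the user here at registration, then accept
    a signed assertion back (the OAuth/U2F leg, not shown)."""
    ip = ipaddress.ip_address(client_ip)
    for block, url in WHOIS_HUMAN_AUTH.items():
        if ip in block:
            return url
    return None  # ISP doesn't participate; fall back to CAPTCHAs etc.

print(challenge_url_for("203.0.113.42"))  # https://auth.example-isp.net/human
print(challenge_url_for("192.0.2.7"))     # None
```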
Seems like we need to start trading anonymity for credibility?
Maybe that’s not such a bad thing.
Anonymity on the web has led to some pretty atrocious behavior.
Plus, at this point, anonymity on the internet is an illusion anyway.
Just use existing trusted CAs to issue personal certs based on some reasonably robust verification process.
Maybe the AI apocalypse will help fix the internet by making anonymity untenable.
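For what the relying-party side of that CA idea could look like: a minimal sketch with the Python `cryptography` package, assuming an RSA CA key and skipping chain building, expiry, and revocation checks entirely:

```python
from cryptography import x509
from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric import padding

def issued_by(user_cert_pem: bytes, ca_cert_pem: bytes) -> bool:
    """Check that a personal cert's signature came from the given CA."""
    user_cert = x509.load_pem_x509_certificate(user_cert_pem)
    ca_cert = x509.load_pem_x509_certificate(ca_cert_pem)
    try:
        ca_cert.public_key().verify(
            user_cert.signature,
            user_cert.tbs_certificate_bytes,
            padding.PKCS1v15(),                    # assumes an RSA CA key
            user_cert.signature_hash_algorithm,
        )
        return True
    except InvalidSignature:
        return False
```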
Historically speaking, Reddit has been incredibly loose about identifying who is behind an account, not even requiring email verification, let alone a phone number or something more advanced like a driver's license.
The future is likely more similar to LinkedIn
None of this content is AI generated, not sure why you're bringing that up?
You don't define a bot as AI?
Why would someone define a bot as AI? Bots have been around forever. A bot could use AI, but most bots currently do not.
Because it is? AI is just artificial intelligence; the term doesn't say it has to be done with ML (machine learning), LLMs, or even any statistical methods.
A bot is just anything automated, which need not involve AI in any capacity; conflating the two only confuses the conversation.
What does this have to do with it not being generative AI?
I personally don't make much distinction between content that's generated by AI (LLMs), posted by bots, and manually forwarded by your grandma to your old AOL account. It's all the same spam, the new stuff is just more sophisticated.
I mean, you're welcome to do whatever you want, but I guess don't be surprised in the future when people are either confused by what you're saying or annoyed that you decided to talk about something else.
I have predicted this exact, VERY predictable scenario for years and got downvoted by AI enthusiasts who don’t even want to deal with any downsides of AI.
Examples: https://news.ycombinator.com/item?id=35688266
We are racing towards the abyss orders of magnitude faster than with climate change or nuclear proliferation, and even the overwhelming majority of AI experts coming out and saying there is at least a 20% chance of a global catastrophe or even a risk of extinction earns a mere shrug: https://arstechnica.com/information-technology/2023/05/opena...
And CEOs: https://amp.cnn.com/cnn/2023/06/14/business/artificial-intel...
And yet even the most mild, libertarian-friendly proposal to mitigate the harm is utterly rejected by AI fans who gang up on any criticism, as the future botswarms will: https://news.ycombinator.com/item?id=35808289
I said the entire internet will turn into a dark forest, including all forums like Reddit and even HN. Swarms of bots will cooperate to “gang up on” opponents and astroturf to shift public opinion. And the worst part will be when humans start to PREFER bots, the way organizations already do (e.g. trading bots replaced humans on Wall Street).
The AI people are building a dystopian future for us and won’t face ANY accountability or disincentives, but rather the opposite. I expect this post to be downvoted by AI people chafing at any criticism. (Like the opposite of web3 posts.) The replies, if any appear, will predictably be “well, it was all already possible with human efforts”, ignoring the massive difference in scale and cost to malicious actors (well, the replies would have been that if I hadn’t called it out just now, because they always are), with hardly any actual substantive discussion of the extremely dangerous outcomes that are only now starting to appear in their earliest stages.
Can anybody explain, specifically, what that 20% risk looks like? The most specific I ever see is "an adversarial AI will become sentient and wipe out humanity". It sounds like as much snake oil as the people pushing AI itself.
It doesn't need to become sentient to cause great disruption.
1. Bot swarms will simply disrupt everything about the Internet as we know it. Most people ALREADY barely scrutinize chats and articles, so bots can EASILY produce those at scale to push opinion in any direction, or just sell shyt.
2. Bot swarms will outplay humans at adversarial games for karma/reputation points, as well as launch coordinated attacks on opponents organically trying to stop whatever viewpoint is being gradually pushed or sold, until they give up or are totally discredited reputationally.
3. People will start to PREFER bots to humans, just as they PREFER Google Maps to asking for directions, etc. At that point most humans will be surrounded by 100-1000 bots and have no way of affecting other humans.
4. The physical world: cameras capturing all the info and cross-correlating where you are. Maybe slaughterbots are mass-produced. Who knows.
When the costs come down and the scale goes up, AGI doesn’t matter; the entire society is disrupted permanently. And that’s what AI is on track to do. It’s far easier to continually create a mess than to continually clean it up.
You see, the jump from "Bots take over all the karma points on social media sites" to "Slaughterbots" is a pretty wide chasm I'm having trouble getting over mentally. This is why I can't take such predictions seriously.
Okay. So remove point 4 and it’s still very dystopian…
Not to mention point 4 contains things that have been in place already for over a decade. It’s not even a prediction: https://magarshak.com/blog/?p=169
But sure, take the one tiny thing you can caricature and ignore the rest. That’s one step up from a strawman, I guess.
Your original post didn't just posit a "dystopian" future. "...even the overwhelming majority of AI experts coming out and saying there is at least a 20% chance of a global catastrophe or even a risk of extinction".
You're the one bringing up the prospect of extinction. Extinction! And a 20% chance at that. So no, I don't think it's unreasonable to ask about how we arrive at that outcome. Because there's a massive distinction between "dystopian" and "extinction".
No I’m not “the one” bringing it up, the experts and the people asking them and publishing their words are, and you seize on the most hard-to-substantiate claims first, and ignore the rest. Great debate technique for realtime debates, but this is HN and I can reply to focus the point.
My main concern for the next 5 years is that the Internet is going to become a dark forest where you can’t trust anything, it will be impossible to discern fake stuff, and even if it was, the botswarms will gang up to take care of any dissent.
That alone is extremely plausible and scary. Every single institution we have relies on the inefficiency of an attacker. Let alone swarms of attackers that any member of the institution can run in their stead, and that can be subverted to bring about ANY goal by who knows whom behind the scenes.
I've made the same prediction. It was blatantly obvious to me what would happen as soon as I saw GPT 3.5 producing decent quality responses. I had hoped the finger problem of image generators would last longer, but there are a lot of people with absolutely no foresight on the potential downsides of technology. SORA and other video generators are absolute madness.
Wikipedia is great as long as the topic isn’t politically controversial. In those cases you get the US State Dept/corporate media-approved perspectives with all the censored perspectives available in the Talk page.
Even coverage of political and historical events that you would not think are controversial has become biased.
As an example, there are some Wikipedia editors that continually remove mentions of genocide from the opening paragraphs of Stalin's page, whereas they leave them there for Hitler.
People really enjoy pushing their ideology in their spare time. I really don't understand it.
I think choosing the personal and political life of Stalin as your "political and historical events that you would not think are controversial" is a very, very poor example. It's very controversial and still impacts the lives of ~1 billion people.
Don’t forget Corporate PR
I was kinda shocked to see the stats on active editors, laid out fairly well in this report by a source that was banned from Wikipedia by a ridiculously small group
https://thegrayzone.com/2020/06/10/wikipedia-formally-censor...
That was a particularly egregious ban. I googled a random username of those that voted to ban them, and it belonged to someone that worked for a political think tank in DC that was opposed to their editorial stance on US foreign policy.
Examples?
Tangential: 404 Media is a fun outlet; I always enjoy their articles. Edit: This plug is not by a bot, but in the near future, nobody will ever trust a plug like this because they’ll suspect it is by a bot. Weird
A bot could add "Edit: ...." to their reply just to make it seem like they're human and need to edit their responses
If you prompt it on how to do it, ChatGPT nails this.
"This plug is not by a bot"
Exactly what a bot would say.
Ask a conservative about that opinion. Do it before you do exactly what I'm accusing you of and downvote the bad man who said the thing against "your side".
EDIT: Yea, thanks for the gaslighting, but Wikipedia's organized effort to remove conservative editors and shift the content's bias leftward is well documented. I'm the crazy one injecting politics into "fair and unbiased Wikipedia", lol
Why do you assume that I'm not a conservative?
Wikipedia censors leftist content, too (in favor of "centrist", US State Department positions). Part of the problem is that their definition of neutral point of view is pegged to the editorial biases of the papers of record, which through a combination of corporate ownership and "access journalism" converge on a particular world-view, that of neoliberalism.
Injecting right/left politics into everything is so tired. I hope one day you realize how silly and artificial it is. Stop letting people who benefit from civil strife convince you that we always have to fight each other.
This is a little melodramatic, no?
Information, even without Wikipedia or Reddit, can still be found easily (compared to pre-internet days). I personally don’t use Google Search anymore, but I can still find links to public MIT textbooks (like SICP or Deep Learning) by searching on there. I’m sure Google Scholar, Sci-Hub, and arXiv will be around for a good while.
I’m sure if Wikipedia falls, another encyclopedia would take its place, since so many primary sources are still discoverable if you know the terms to search for. Maybe with a paywall, maybe not.
I don't think it's melodramatic.
The first time I heard people making that claim was around 2020, in the context of corona IIRC. I think they called it "the age of misinformation", and that has only become more relevant since then, so I think it was even more on point than they realized back then.
Yeah, the problem is that academia has the same issue with garbage papers. As long as information has some ad value, be it commercial or political, garbage will fill all spaces to make a buck.
Huh, that just puts a new perspective on Facebook's huge push for open LLMs. The less useful anonymous stuff becomes, the more useful content made by people you know IRL is. And that's Facebook's/IG's original value proposition.
Then why is my feed 95% ads lately?
Even more frustrating than ads is the recommended content from folks and pages I didn’t ask to be shown
Very timely: I just came across this account doing this exact thing, if you want to see it in action.
https://www.reddit.com/user/Clear-Car862/
If you inspect their comment history, they are recommending several products in almost every reply. ContractsCounsel is one of the services they recommend. The formula they use for recommending is very similar in every post.
Also interesting, one of their only actual posts mentions the phenomenon of using bots to advertise, I guess trying to throw people off their trail?
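You can also quantify that "very similar formula" with nothing but the standard library; a throwaway sketch (the comment list is hypothetical, scraped however you like):

```python
from difflib import SequenceMatcher
from itertools import combinations

# Hypothetical comment history of a suspected shill account.
comments = [
    "Great question! I'd check out ContractsCounsel, they match you with a lawyer fast.",
    "Tough spot. I'd check out ContractsCounsel, they match you with an employment lawyer fast.",
    "Honestly, just talk to a lawyer. ContractsCounsel matched me with one fast.",
]

def template_score(texts: list[str]) -> float:
    """Mean pairwise similarity; fill-in-the-blanks bot templates tend
    to score far higher than organic comment histories."""
    pairs = list(combinations(texts, 2))
    return sum(SequenceMatcher(None, a, b).ratio() for a, b in pairs) / len(pairs)

print(f"{template_score(comments):.2f}")  # high -> suspiciously templated
```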
TBF I just went into this EmploymentLaw sub, and when you look at the posts here (https://www.reddit.com/r/EmploymentLaw/comments/1boucq8/cali...), it just looks like there are bots talking to bots.
I find it pretty funny tbh.
I think I've come across this bot on reddit before. I read a lot of skincare-related subreddits and people talk about their routine and 'holy grail' products...so that seems like an appealing place for this type of thing to infest.
It wasn't quite obvious marketer-speak, but certain comments just didn't seem worded quite the way a regular commenter would word things.
I figured it was regular humans doing it though. Sigh...
It’s hard to tell because the small communities have their own newspeak which seems weird to outsiders.
I've started appending reddit, servethehome, etc. to my searches. Otherwise, the results in Google or Bing are lackluster.
Are you a bot?
I downloaded Wikipedia months ago so I would have access to only-slightly-tainted information during the information winter.
I'm glad that I'm not the only one who keeps local backups of Wikipedia
I absolutely love having access to so much information, but it really seems like most people just don't even care. The ability to access experts of all kinds for advice or just to fill curiosity has been a boon to me, and I like sharing what I know with people who are interested. But when I look around at the people I know -- some of them are incredibly smart (much smarter than I am) but instead of making a reddit post or going on a topical forum, they just watch a youtube video or try to attempt whatever it is poorly themselves or just don't care to know more about things.
When I was a kid I wanted to learn electronics, so I got some books and parts at Radio Shack, but certain things weren't obvious for someone who knows nothing -- I didn't know where the ground was supposed to go in a schematic diagram, for instance. And the adults around me didn't know, so I just had to figure it out. Now a kid can go on /r/askengineers and get an answer from an engineer in less than 20 minutes.
But overall -- maybe the 'information age' has backfired for society in general. Those kids will figure out what they want to know regardless of how easy it is, and so many people just look for information that confirms what they already think, then weaponize 'facts' so they don't have to budge.
I'm really not sure -- it is so useful to me, but every time a nice place gets an influx of people it turns to shit, so I tend to lean misanthropic in the long term.
I'm one of the people who exhibit the behavior you've observed (watching a YouTube video rather than creating a forum post, not the "try to attempt whatever it is poorly themselves or just don't care to know more about things" part), so I thought I'd explain how it got to this point. First, there is the "why not create a reddit post" part:
A problem I have with reddit is users looking through my post history, trying to extract more information than I present at face value in my post. Sometimes it works out in my favor, because I have an XY problem and get redirected to the correct resource. But most of the time, it's just used to determine how they'll engage with me (seriously or not, mockingly or not, high effort or low effort). This is a specific problem that doesn't exist on HN, as a rule.
I'd rather not delete my post after I got my answer, in order to help other people coming in from google searches, so instead I create an account for each subreddit that I post in. If each subreddit should be considered its own forum, it would make sense to have a different account for each forum. It was even once encouraged by reddit for users to have multiple accounts.
The issue now is that most subs "shadow queue" (not shadow ban) posts from new accounts in an attempt to curtail spam. You'll still see your post if you're logged in, but not if you're logged out. And there is no engagement on it until a mod releases it.
Similarly, I am permanently behind a VPN, so creating accounts causes them to be actually shadow banned by default by reddit admins. I must message them to prove that I'm actually a human, after realizing that the mods also don't see my post in their queue. Once, after I successfully appealed my account, it got shadow banned again, for reasons unknown to me. I was particularly bitter, since I had spent 4 hours making a single high-effort comment on that account.
Even if I've managed to overcome all this friction and my post actually appears in the "new" queue, there are myriad reasons why it won't yield fruitful results. It could be the timing of the day or the week. Perhaps my title wasn't catchy enough. Maybe my wall of text was too big because I tried to fit in enough context, and people's eyes just glazed over. Maybe the post got overshadowed, upvote-wise, by more successful posts and never made it to the hot page of the sub. Perhaps my question is way outside the skill range of the average sub's users (I've seen this happen often on my posts and others', with various outcomes). Or perhaps the regulars there have seen the same introductory-level questions twice a day over the years and simply refuse to engage with them anymore.
It has gotten to the point that if I can't find someone else with the same question in various wordings as I have on reddit through a site:reddit.com search, I simply assume that no one has the answer.
As for why YouTube: it's not where I usually start, but it ends up being the best solution after all other potential solutions have been exhausted. I'll give some examples:
For music (I know it's not work related), a lot of amateur/indie music is so old it can't be bought anymore, and can't be licensed to Spotify. Most real piracy solutions are defunct (lack of seeders, dead mediafire/megaupload links). The only way to find the song is some random person's channel that was made 12 years ago and hasn't been updated since.
For many "open core" saas products, the problem starts at the documentation. Often, I just need a "getting started"/bird-eye-view of the system and how it would potentially connect with the rest of my systems. The first thing you are told to do in the docs is to sign up for their managed offering. Once I find my way to the self-hosted section of the docs, I am told to download a docker image. I don't want to download a docker image or sift through a 500 line docker file to know which configs are relevant or will do to my system. Then, they'll have a "you can also compile it from source" link that points to their github project page. If things they had a binary upload, excellent. But now I need to figure out what to stick in my config file, environment headers to set, arguments to pass before I have a "sane" startup.
The docs are also an excellent way to get lost in the weeds and "try to attempt whatever it is poorly themselves". You can easily get misled, as terms are recycled between different products with different meanings in each one. They may provide API docs, but no working examples. They may provide working examples, but without any notes, comments, or implications of what each line or command does (To get started, run this command in your console: sudo curl ... | sh). You may have gotten to a certain point before it stopped working, and now you're stuck and not sure where the issue is. Sometimes the docs are sparse, and when you're trying to learn a concept from a page, there are links all over it pointing to other concepts. You don't know if those are advanced concepts you can ignore for now or fundamental concepts that are prerequisites for what you're trying to learn.
The community around these products is also less likely to help you out. The product devs are focused solely on building the product, or support only the paying managed-SaaS users. The other users are often "drive-by" GitHub issue filers, mostly employees working with said product. They will post massive log dumps and Grafana screenshots, with machines provisioned with TBs of memory and clusters of hundreds of nodes. They're there to get their problem solved so they can move on with their workday, not to subscribe to the project page and receive notifications from others who might have the same issues.
Youtube has "solved" these "open core" issues for me more than 3 times now. When you find a good 30min/1hr/playlist, its like finding a gold mine. They almost always start with a succinct birds-eye-view so you can early return/break once you realize this isn't what you need, rather than the product's landing page saying how its the silver bullet to all your problems. The web of "concept" has been linearized in a dependency chain for you. You can see the person doing things and their effects in real time without having to commit the effort. You can see what auxiliary (debugging) tools are used and their install process. You can see which commands are more relevant than other, instead of wading through `prog --help` or `man prog`. They comment on what they're doing so you know the scope and side effects of each command. You can watch it at 2x speed, skip, rewind. All of this allows you to cement a better fundamental understanding of the product you're working with, rather just calling up support from the paid managed service and slapping the it on your CV.
Then there are all the other fast-evolving spheres of tech. Being stuck in the usual enterprise CRUD, it can be hard to dip your toes into adjacent domains, whether it be fine-tuning an LLM for your purposes, FPGAs, Linux, GPU shader programming, networking, Photoshop/Illustrator, video editing, game dev, etc. These domains are all evolving rapidly, and if you want to start and finish something with only a weekend of free time, a YouTube tutorial is often good enough.
ReplyGuy was posted here six months back under a different name...
https://news.ycombinator.com/item?id=38070502
Or maybe we weren't all the way into it quite yet, and the real information age is defined not only by the availability of information, but also by the massive quantity of it, which drowns out simplistic search methodologies.
Maybe this is the natural end state of information systems. First they gather useful information, then they gather all information, then information starts being generated that is tailored to the system for the purpose of being in the system and affecting how it's used, often negatively. I can think of lots of examples, from internal wikis to rumor mills at work.
/r/LocalLLaMA has also had multiple[0][1] trending threads about dead-Internet generators just within the past week
[0] https://reddit.com/r/LocalLLaMA/comments/1cc0fyy/i_made_a_li...
[1] https://www.reddit.com/r/LocalLLaMA/comments/1cg39yq/deaddit...
We live in crazy times