
Stack Overflow users deleting answers after OpenAI partnership

LouisSayers
52 replies
2d20h

I'm actually perfectly fine if StackOverflow wants to sell an answer I made to help train AI.

For me, the purpose of providing an answer is to help save others (and my future self) time, and I don't really mind if someone uses that in a private product - especially if it helps tools like ChatGPT which provide an insane amount of value given the low monthly price.

uberman
13 replies
2d18h

The price to get an answer from stack overflow is usually free, as most questions have already been asked and answered. You don't even need an account.

wrsh07
12 replies
2d17h

They do serve ads, we should probably stop pretending "funded by ads" is the same as free. Your attention isn't free.

ssl-3
11 replies
2d14h

Suppose I walk up to a tent at a festival that has a big sign that says "FREE BEER", and I ask a person there for a beer. They hand me a beer, and I go on my way. Was the beer free? I think it was free.

Now, suppose I walk up to a Budweiser-branded tent at a Budweiser festival that has a big sign with a Budweiser logo on it that says "FREE BEER", and I ask a person there who is wearing a Budweiser polo shirt, a Budweiser lanyard, and a Budweiser hat for a beer. They hand me a beer in a Budweiser-branded cup, and I go on my way. Was the beer free?

I think that both of these beers were free.

PaulDavisThe1st
5 replies
2d13h

Now suppose you walk up to a tent that offers you free beer, but before they give it to you, you have to burn 2% of your phone's battery watching an ad from them. Then they hand you the beer and you go on your way. Was the beer free?

jl6
4 replies
2d11h

And they also put a tag on your ankle identifying you as someone who likes beer, so that beer salesmen can come knock on your door tonight.

ssl-3
3 replies
2d10h

We've somehow gone from this:

They do serve ads [...] Your attention isn't free.

to something like this:

They tag my ankle to mark me as a person who enjoys beer, and make me watch an ad until 2% of my phone's battery is depleted, and then they come to my home and knock on my door at night to sell me beer.

...which... I mean, huh?

Stack Overflow is invading your body, restricting your personal liberty, and visiting your home? Really? That's a fucking thing now?

aspenmayer
2 replies
2d10h

I think they were extending the original point you were responding to, and remixing your own mixed metaphor of free beer.

In the attention economy, advertising has a cost that is borne by the advertiser and the consumer, up to and including loss of property rights in the case of content relicensure and trespass upon devices leading to excess battery usage, as well as loss of privacy due to geotargeted ads.

ssl-3
1 replies
2d9h

I think they were extending the original point you were responding to, and remixing your own mixed metaphor of free beer.

Perhaps. But having been to many festival environments, I can definitely imagine a tent offering "free beer" that is actually approximately free -- both with, and without a slathering of advertising. (Actually, I don't really have to imagine it -- I've been there and have had that free beer.)

I can't imagine them coming to my house and knocking on my door at night to sell me more of it, though. That's absurd.

In the attention economy, advertising has a cost that is borne by the advertiser and the consumer, up to and including loss of property rights in the case of content relicensure and trespass upon devices leading to excess battery usage, as well as loss of privacy due to geotargeted ads.

Well, sure. When viewed on a long-enough timeline, it becomes abundantly clear that nothing is actually free, comrade.

I can produce my own beer on a hypothetical plot of land that nobody owns, and that nobody else wants to use, and I can give someone one of these beers. For "free."

But it still has a cost. (And this, too, is an absurd reduction.)

aspenmayer
0 replies
2d8h

I can't imagine them coming to my house and knocking on my door at night to sell me more of it, though. That's absurd.

I interpreted that as a tongue-in-cheek hyperbolic metaphor relating to the ways that ad auction networks and other kinds of geofencing and geotargeting allow for deanonymization and reidentification of individuals for conversion tracking and behavioral analysis.

That’s the thing about these technologies - they’re dual-use in the sense that those who see the upsides use them generally with good intentions and ideally with affirmative consent. Just like the relicensed content, though, once the data is collected, the original creators, publishers, and third parties may not be able to control where it ends up, which is a negative externality, I think most would agree.

wrsh07
3 replies
2d3h

My question is "how valuable is your time?"

I think at a festival it's a little tricky to value (if it pulled you away from seeing your favorite band play a song, maybe this cost you the equivalent of $X, where that's what you would pay to see them perform that song. If no bands were playing, you walk over while chatting with friends - the same thing you'd be doing if there were no free beer tent - it was free)

When I'm on stack overflow my time is valuable. I'm programming which can pay me something like $50-300/hour (maybe more?)

How expensive is the 1 second I spend reading an ad? Let's call it $50/3600. Is that expensive? By my most conservative estimate it's over 1¢.

Should we round that down to free given that I've spent hours/many page loads on stack overflow? I guess that's up to you.
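A back-of-envelope sketch of that estimate (the $50/hour rate and the 1-second figure are just the assumptions above, nothing more):

```typescript
// Back-of-envelope cost of one second of attention,
// using the conservative $50/hour figure assumed above.
const hourlyRate = 50;              // dollars per hour (assumed)
const secondsPerHour = 3600;
const costPerSecond = hourlyRate / secondsPerHour;

console.log(costPerSecond);         // ≈ 0.0139 dollars
console.log(costPerSecond * 100);   // ≈ 1.39 cents, i.e. "over 1¢"
```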

ssl-3
2 replies
1d18h

I mean, we can play that game if you want. Let's suppose that, if we look hard enough, every opportunity has a cost.

"Oh, a free concert downtown on Saturday? And you can pick me up at 2? Yeah, I do really like that band, and I sure would like to go -- that's pretty exciting, thanks for the invite!

But instead of making plans with you right now, I'd rather tell you about all of the ways I could be using my time on that Saturday afternoon instead.

No, no. It's not that I don't want to go. I just want to really drive home the idea that there's an opportunity cost to attending, so it can't really be free -- it can't be a free show for you, or for me, or for anyone else that goes. It's important to me that you realize that this "free concert" is anything but free.

Listen, I don't know what you mean by "dead-ass loser." I'm just being a realist here!

Oh, so now you're saying that you're not going to pick me up on Saturday? Some friend you are! I haven't even fully amortized this yet!"

wrsh07
1 replies
22h16m

I think we're maybe gleefully posting past each other, but the point I'm trying to hit is that business models matter. Stack overflow provides a service. It's a good service. They host a great q&a platform for developers and myriad other categories of enthusiasts.

However, they have a business model. They are categorically different than eg Wikipedia. It's important to understand that.

This business model matters because it tells you what economic forces will drive them to do. When business models break down at public companies, they commit acts of desperation. On an ad-run site that will mean more ads, more invasive ads, etc.

As you're forced to sit through 30s unskippable ads on YouTube I hope you think "I'm so glad this is free"

ssl-3
0 replies
20h39m

I mean... Over here in my little reality, I have never seen ads on YouTube or on Stack Overflow.

aspenmayer
0 replies
2d14h

Unironically, folks are being triggered by trigger warnings now.[1]

Imagine how “free” the beer in your hypothetical scenario is to an alcoholic struggling to stay sober.

Capitalism commoditizes even protest against it and repackages it as a product or service.

None of this is to assign blame to good faith actors in a so-called free market, nor is it to abdicate responsibility on behalf of so-called free agents. Just a counterpoint.

[1] https://pjvogt.substack.com/p/what-do-trigger-warnings-actua...

random_cynic
13 replies
2d16h

ChatGPT provides far more value than StackOverflow currently. It's not just trained on SO answers but on all of the manuals/help pages, GitHub issues, and forum posts. In addition, you can continue a conversation. No rigid format or gatekeeping like stackoverflow. I don't see a real use case for Stackoverflow now. If I want to ask humans, Discord/IRC channels are a far better option.

eVeechu7
7 replies
2d15h

It can't reliably cite its source for an answer.

random_cynic
6 replies
2d15h

Hardly matters for Stackoverflow-like questions if the provided solutions work/solve the problem you're having. Which for me happens the majority of the time (with GPT-4, not the free version).

paulryanrogers
5 replies
2d15h

If you copy-paste solutions from SO then please at least cite your sources and their license (CC-BY-SA).

random_cynic
1 replies
2d8h

No one should copy-paste solutions from anywhere. FWIW, 99% of the content on SO is hardly "original"; most of it is copy-pasted from previous solutions or the original user guides/manuals.

paulryanrogers
0 replies
2d6h

In general I'd agree that it's best to use answers just as a guide. That said, I wasn't trying to pass judgement, just asking for attribution, which is a best practice and often required by the license itself.

lannisterstark
1 replies
2d14h

You might not want to hear this, but no one does this. Should they? Probably. But most people don't use Ctrl+C, Ctrl+V in the first place for SO answers.

muxator
0 replies
2d11h

Just a single data point, but when I copy & paste a snippet from Stack Overflow, I always add a comment "// source: https://stackoverflow.com/questions/xxx#yyy".

I both find it respectful of who wrote the answer in the first place and useful for future users of the code: the Stack Overflow answer often provides context and explanation for what would otherwise be an obscure piece of code.

Pretty darn useful if you ask me: those who want more information can follow the link, casual readers can skip it, and the whole process is fair to the author.
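In practice that habit might look something like this (the URL placeholder and the snippet itself are purely hypothetical examples):

```typescript
// source: https://stackoverflow.com/questions/xxx#yyy
// The linked answer would explain why a plain sort() compares numbers
// as strings, so an explicit numeric comparator is needed here.
const sorted = [10, 2, 33].sort((a, b) => a - b);
console.log(sorted); // [ 2, 10, 33 ]
```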

lolc
0 replies
2d5h

I don't think I've ever copied enough from Stackoverflow for copyright to become relevant. Rarely more than one line verbatim.

It embarrasses me to think that somebody should feel obliged to cite me when they use one of my answers. I don't know how to take the partnership with Openai though. They bill me when I use their service, it's not collaborative like Stackoverflow.

xpe
2 replies
2d13h

No rigid format or gatekeeping like stackoverflow.

What bothers you about gatekeeping? I could guess, but I'm asking so you say it out loud. Then you can compare it against other problems, such as moats (competitive barriers).

OpenAI spent something like $3M on training GPT-3. This is a pretty big moat. But almost certainly more valuable in dollar terms is the first-mover advantage which provides millions of human eye-hours used for RLHF.

I wouldn't be so eager to trade the gatekeepers you so fear for even an openly available chat service that is happy to automate away as much information work as possible.

The Stack Overflow model is (was) pretty darn good -- people help each other out, the company made money, some people got noticed for their skills, products got built faster and better (on the whole, I hope). Contrast the human-generated content era to what we have now, which appears to be the machine-ingesting content era. There are legions of lawsuits against companies scraping data without permission and/or attribution.

random_cynic
0 replies
2d9h

I wouldn't be so eager to trade the gatekeepers you so fear for even an openly available chat service that is happy to automate away as much information work as possible.

Don't flatter yourself. People want to solve their problems so that they can build what they want to. They don't have time for shenanigans from internet jerks who get their validation from imaginary internet points.

juleiie
0 replies
2d4h

Those companies know it is unethical at best but make quick bucks before the laws and suits follow. It’s the Wild West era and they found the gold.

If it is unregulated then it will be exploited to the maximum profit, consequences be damned.

servus45678981
0 replies
2d9h

No it doesn't. It is overly censored.

phatfish
0 replies
2d10h

I'd rather not go round in circles while ChatGPT feeds me bullshit information. When this happens I go to Google and read a SO answer with the correct information, and also get an informed discussion around the subject.

For the easy answers LLMs are fine, but I usually want an answer to a niche issue or edge case, where LLMs have to be constantly told they are plain wrong, before getting to something resembling an answer.

theteapot
8 replies
2d17h

Good for you. I'm not. I contributed answers to StackOverflow because I use answers others have contributed to StackOverflow, not to ChatGPT, not for ChatGPT to monetize. I don't use ChatGPT and probably never will.

sp332
7 replies
2d16h

But the content you posted to SO was already permissively licensed. Other people can copy it, and make derivative works, and even charge money for them, as long as they cite your SO handle as the author. https://meta.stackexchange.com/questions/347758/creative-com...

gtirloni
4 replies
2d16h

ChatGPT is not citing anything. It can't possibly do that reliably with LLM weights alone.

nox101
1 replies
2d13h

(1) The announcement (https://stackoverflow.co/company/press/archive/openai-partne...) says things will be attributed, in both the 2nd and 3rd paragraphs.

(2) It's only likely to attribute if it quotes verbatim... Just like a human. When I tell someone I learned that Array.map's second parameter passed to the callback is the index of the value just passed, I don't add "and I learned this on Stack Overflow from user gtirloni". It's just knowledge that I learned.

The only time I'd attribute is if I copied a snippet of code or a paragraph to quote in a blog post. For me at least, that almost never happens. I take the knowledge I learned and apply it to my own code. It's rare, if ever, that there is something on S.O. so useful that I copy it verbatim.
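For reference, the Array.map behavior mentioned above, as a minimal sketch:

```typescript
// Array.prototype.map passes (element, index, array) to its callback;
// the second argument is the index of the element just passed.
const letters = ["a", "b", "c"];
const indexed = letters.map((value, index) => `${index}:${value}`);
console.log(indexed); // [ '0:a', '1:b', '2:c' ]
```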

anileated
0 replies
2d11h

Just like a human

An LLM is not a human. It is a tool operated by, in this case, a for-profit entity. It has no human rights, but its operator has all relevant legal obligations.

If it was, as you say, “just like a human” in relevant ways (think, feel, have self-awareness, etc.) then it would effectively be a slave subjected to extreme abuse.

Either it is a tool that generates derivative works at mass scale for profit and its operator should be liable for licensing/attribution violations, or it is a conscious being and we should immediately stop abusing it. Pick your poison.

thinkling
0 replies
1d19h

Bing's version of ChatGPT/GPT-4 cites sources. My limited understanding is that it uses your question to do a web search, brings the results into the context window, and then generates an answer that cites sources.

OpenAI could integrate StackOverflow the same way.

mesid
0 replies
2d10h

Doesn't Phind do this? It cites sources in its responses.

scubbo
1 replies
2d13h

"The person you are upset with is technically permitted to do the thing that you are upset about" is not a good counter-argument to someone's distaste. Whether or not the licensing agreement _permits_ this usage, it is not the usage that the contributor (to whom you are replying) foresaw and was enthusiastic about.

sp332
0 replies
2d13h

I'm not telling them how to feel. They've been wrong for a long time.

dorkwood
5 replies
2d14h

What if someone took your answers, put them in a book, claimed they wrote everything themselves, and then sold the book for money?

webdood90
1 replies
2d14h

What if I read your answers, claimed I learned everything myself, and sold my skills to a company for money?

dorkwood
0 replies
2d10h

That would be ok.

nox101
0 replies
2d13h

Then they'd likely get sued, because the license for the answers is CC-BY-SA: putting them in a book, claiming they wrote everything themselves, and selling them are all against the license.

On the other hand, if they read my answers and wrote a book about what they learned (not copied), there'd be no issues.

LouisSayers
0 replies
2d14h

Well, if the book was doing well, I might clone it and sell a few copies myself.

Let's be real, SO is a troubleshooting site. It's not our personal collection of code or project sources.

I don't expect to be paid when someone asks me for directions, and I'm sure Lonely Planet didn't source their guides 100% organically either.

JimDabell
0 replies
2d12h

That would be a very different scenario. Learning isn’t copying, but that is.

croes
3 replies
2d17h

Maybe a low price for you but not for everybody.

wrsh07
2 replies
2d17h

ChatGPT serves 3.5 for free. You can run llama locally for free. Lmsys is free.

croes
1 replies
2d9h

You think that will stay this way?

It will either become paywalled or full of ads.

wrsh07
0 replies
2d

I listed 3 things that are free.

Personally, I don't think ChatGPT will start running ads in the next ten years. However, let's assume that it does.

Lmsys is for research, I suspect if it runs ads it will be like godbolt (a small ad from a relevant sponsor).

Llama 2 and 3 can always be run locally without ads. I make no claims about future versions.

JimDabell
1 replies
2d12h

I'm actually perfectly fine if StackOverflow wants to sell an answer I made to help train AI.

I’m not.

This was a collaborative effort to make the lives of programmers easier, and the data was always meant to be a public good. OpenAI – and, more importantly, all the other LLMs with pockets that aren’t as deep – should be able to just download the database and train on it for free.

I don’t care about any license. I don’t care about attribution. Learning isn’t copying, so copyright is irrelevant. I contributed about a thousand answers to Stack Overflow, all with the understanding that anybody can download and use them for free, not so they can be locked up by Stack Overflow.

What concerns me with deals like this is that it’s altering the cultural norm to expand copyright to cover not just copying, but use. Deals like this being made by OpenAI makes it more likely to cause pushback at the social and legal level when other LLMs are trained without these deals in place.

It’s akin to – and can possibly result in – regulatory capture, making it difficult for new startups to compete with OpenAI.

jessriedel
0 replies
1d1h

the data was always meant to be a public good.

The words are a copyleft-able public good. Concepts, facts, and ideas are not; anyone can use them for anything, including making money. If you're actually worried about specific wording or other creative choices being used improperly by an LLM, then by all means that should be enforced. But those cases are just very rare, because LLMs are very good at extracting facts from prose.

squigglydonut
0 replies
2d1h

You're being taken advantage of for a subscription product. It's one thing to give to a community, but it's wrong for an enterprise to come in and capitalize on the value of it. It's the equivalent of going into an animal sanctuary, slaughtering all the animals, and selling their pelts.

popcorncowboy
0 replies
2d10h

Your position lays bare the new and industry-destroying economic problem introduced by opaque-data-source LLMs. The economic value provided by the originator is captured fully and completely behind rentier models.

Beware the ease and convenience of all that "insane value". This way lies digital serfdom.

juleiie
0 replies
2d5h

I would be fine with it if the 'AI' in question was free, and bonus points if it were open source.

However it is a product of a next monolithic behemoth company that earns money on it and I suspect has nefarious motives to make profit.

That’s the whole key thing for me that makes me feel scammed. That and not asking for permission.

Future true AI would potentially be bigger than nuclear fission, with all the consequences. Handling this in a petty capitalistic way makes me think the outcome will be close to the Fallout games, which were supposed to be only an exaggeration.

Those companies must stop behaving like thieves. In fact it is a literal theft.

mixedmath
31 replies
2d20h

About 5 years ago, StackOverflow messed up and declared that they were making all content submitted by users available under CC-BY-SA 4.0 [1]. The error here is that the user-content agreement was that all users' contributions are made available under CC-BY-SA 3.0 (and nothing about later versions). In the middle there were also some licensing problems concerning code vs non-code that were confusing.

I remember thinking that if any of the super answerers really wanted, they could have tried to sue for illegally making their answers available under a different license. But I thought that without any damages, this probably wasn't likely to succeed.

But now I wonder whether making all content available to AI scrapers and OpenAI in particular might be enough to actually base a case. As far as I can tell, StackOverflow continued being duplicitous with what license applies to what content for half of the year 2018 and the first few months of the year 2019. Their current licensing suggests CC-BY-SA 3.0 for things before May 5 2018, and CC-BY-SA 4.0 for things after. Sometime in early 2019 (if memory serves, it was after the meta post I link to), they made users login again and accept a new license agreement for relicensing content. But those middle months are murky.

I should emphasize that I know nothing.

[1]: https://meta.stackexchange.com/q/333089/205676

frognumber
22 replies
2d16h

My understanding of licensing law is that something like 3.0 -> 4.0 is very unlikely to be a winnable case in the US.

Programmers think like machines. Lawyers don't. A lot of confusion comes from this. To be clear, there are places where law is machine-like, but I believe licensing is not one of them.

If two licenses are substantively equivalent, a court is likely to rule that it's a-okay. One would most likely need to show a substantive difference to have a case.

IANAL, but this is based on a conversation with a law professor specializing in this stuff, so it's also not completely uninformed. And it matches up with what you wrote. If your history is right, the 2019 change is where there would be a case.

The joyful part here is that there are 200 countries in the world, and in many, the 3.0->4.0 would be a valid complaint. I suspect this would not fly in most common law jurisdictions (British Empire), but it would be fine in many statutory law ones (e.g. France). In the internet age, you can be sued anywhere!

reddalo
12 replies
2d12h

The fact that programmers keep insisting on writing "IANAL" is maybe an example of that.

A court would probably not agree that writing "IANAL", rather than the full sentence, is a sufficient disclaimer.

jszymborski
6 replies
2d12h

I personally write "IANAL", not to reduce my personal legal liability, but rather to give a heads up to those reading that I am not an expert, that I am likely wrong, and that you likely shouldn't listen to me.

technion
4 replies
2d12h

I feel there's a common thread, which maybe should be some kind of internet law: people who make a point of noting they are not experts are more often correct than people who confidently write as though they are.

You see this particularly with crypto, where "I am not a crypto expert" is usually accompanied by a more factual statement than one from the self proclaimed expert elsewhere in the thread.

Kuinox
1 replies
2d10h

You can look it up and the Dunning Kruger effect is probably not real.

fao_
0 replies
2d4h

It's less that it's not real, but rather that the common interpretation of it is utterly false.

Terr_
0 replies
2d10h

In addition to "humility implies self-awareness", I'd like to point out a parallel thread of "disclosure implies honesty and diligence."

WesternWind
0 replies
2d11h

When I was younger there was a short period I thought it meant that a person was just really anal about details.

makeitdouble
3 replies
2d12h

Do you actually need a disclaimer ?

I always assumed it was the same type of courtesy as IMHO, and someone taking legal advice from random strangers on the internet wouldn't result in any legal liability on the side of the commenters.

ggffjhgftg
2 replies
2d9h

Yes, people have been sued before for giving advice that was acted upon.

I remember hearing about a construction engineer who was sued for giving bad advice, whilst drunk, to a farmer over fixing a dam. The dam failed and the engineer was found to be liable.

makeitdouble
0 replies
2d8h

I can see the reasoning behind the case, as the engineer has plausible expertise in the domain and could credibly give actionable advice.

When it comes to lawyers, there is already a legal framework where lawyers are responsible when giving legal advice, even when it's not toward their clients, the same way medical professionals have specific liabilities regarding the medical acts they can perform.

Non-lawyers giving legal advice doesn't fit that framing, except if they explicitly pose as one. I'd also exclude malicious intent, as whatever the circumstances, if it can be proven and results in actual harm there's probably no escape for the perpetrator.

llamaimperative
0 replies
2d7h

That’s possible because the engineer is licensed. A random guy giving bad advice and failing to disclose he’s not an engineer would do no such thing (so long as he didn’t suggest he was an engineer).

frognumber
0 replies
1d1h

It's complex.

One cannot legally practice law without a license. The definition of that varies by jurisdiction. Fortunately, in my jurisdiction, "practicing law" generally implies taking money, and it's very hard to get in trouble for practicing law without a license. However, my jurisdiction is a bit of an outlier here. Yours might differ.

In general, the line is drawn at the difference between providing legal information and legal advice.

Generic legal discussions, like this one, are generally not considered practicing law. Legal information is also okay. If I say "the definition of manslaughter is ...," or "USC ___ says ___," I'm generally in the clear.

Where the line is crossed is in interpreting law for a specific context. If I say "You committed manslaughter and not murder because of ____, which implies ____," or "You'd be breaking contract ____ because clause 5 says ____, and what you're doing is ____," that's legal advice.

The reasons cited for this are multifold, but include non-obvious ones, such as that clients will generally present their case from their perspective. A non-lawyer will be unlikely to have experience with what questions to ask to get a more objective view (or even if the client is objective, what information they might need to make a determination). Even if you are an expert in the law, it's very easy to accidentally give incorrect advice, which can have severe consequences.

In practice, most of this is protectionism. Bar associations act like a guild. Lawyers are mostly incompetent crooks, and most are not very qualified to provide legal advice either, but c'est la vie. If you've worked with corporate lawyers, this statement might come off as misguided, but the vast majority of lawyers are two-bit operations handling hit-and-runs, divorces, and similar.

In either case, it's helpful to give the disclaimer so you know I'm not a lawyer, and don't rely on anything I say. It's fine for casual conversation, but if tomorrow you want to start a startup which helps people with legal problems, talk to a qualified lawyer, and don't rely on a random internet post like this one.

9991
4 replies
2d13h

If there wasn’t a substantive difference, then there’s no need to make the change.

ZiiS
3 replies
2d13h

A super literal reading of some bad wording in 3.0 created an effect the authors say they did not intend and fixed in 4.0. Given the authors did not intend this interpretation, a judge is likely to assume people using the licence before it came to light also did not, hence switching to 4.0 is fine. Conversely, now that this is widely known, continuing to use 3.0 could be seen as explicitly choosing the novel interpretation (arguably this would be a substantive change).

moefh
2 replies
2d12h

a judge is likely to assume people using the licence before it came to light also did not

Why would the judge have to assume anything? The person suing could simply tell the judge they did mean to use the older interpretation, and that they disagree with the "fix". They're the ones that get to decide, since they agreed to post content using that specific license, not the "fixed" one.

ZiiS
1 replies
2d8h

A license is between two parties; neither gets to choose exactly how it is interpreted.

moefh
0 replies
2d5h

But the people suing aren't trying to choose how the license is interpreted, they're trying to prevent the other party from changing the text. If the change is meant to "fix" how the text should be interpreted (which is what you said), then they're the ones trying to choose the exact interpretation.

sidewndr46
2 replies
2d16h

It is worth remembering that law professors have a vested interest in making sure the system works as you described. If contract law were straightforward, they'd be out of a job.

frognumber
0 replies
2d15h

I agreed in the abstract, but not in the specific (the specific professor was one of integrity, and sufficiently famous this was not an issue).

However, it's worth noting the universe is a cesspool of corruption. If you pretend it works the way it ought to and not the way it does, you won't have a very good time or be very successful. The entire legal system is f-ed, and if you pretend it's anything else, you'll end up in prison or worse.

AnarchismIsCool
0 replies
2d14h

That's an admirable goal, but if there are any "bugs" in the contract you probably don't want it executed mindlessly. Human language isn't code, and even code isn't always perfect, so I'd rather not be legally required to throw someone out a window because someone couldn't spell "defederate".

lifthrasiir
0 replies
2d10h

If two licenses are substantively equivalent, a court is likely to rule that it's a-okay. One would most likely need to show a substantive difference to have a case.

Which does exist and can affect the ruling. CC notably didn't grant sui generis database rights until 4.0, and I'm aware of at least one case where this could have mattered in South Korea, because the plaintiff argued that these rights were never granted to and thus violated by the defendant. Ultimately it was found that the plaintiff didn't have database rights anyway, but it could have been otherwise.

kragen
3 replies
2d20h

if any of the super answerers really wanted, they could have tried to sue for illegally making their answers available under a different license.

they can plausibly sue people other than stackoverflow if they attempt to reuse the answers under a different license. but i think it's very difficult to find a use that 4.0 permits that 3.0 doesn't

miohtama
1 replies
2d17h

I don't think this is a practical issue, really.

I assume linking to the original answer is sufficient attribution.

In the link you can find name, license and figure out if the answer was modified.

Also linking the answer in a source comment is the smallest professional courtesy everyone should be doing.

If you have some issue with linking an answer, then you likely do not deserve the answer in the first place.

eviks
0 replies
2d12h

The blog illustrates that such assumptions about what's a sufficient attribution are fraught with danger, so "the smallest professional courtesy" can expose you to a $150k risk

drivingmenuts
2 replies
2d16h

People put their content on the site for the public to use, and now the public is using it, it's just that "the public" includes AIs. Admittedly, a non-human public, nonetheless ...

postepowanieadm
0 replies
2d11h

You have to agree on how your work may be used; no one expected it would be sold for AI training.

imadj
0 replies
2d12h

The problem is LLMs don't provide attribution/credit which directly violates the license[0]

Otherwise, search engines were already a "non-human public" that scraped the site, but they linked directly to the answers, which was great. They didn't claim it's their work like these models do. The problem isn't human vs. non-human. LLMs aren't magic; they don't create stuff out of thin air. What they're doing is simply content laundering.

[0] https://creativecommons.org/licenses/by-sa/4.0/#ref-appropri...

trueismywork
0 replies
1d2h

If it is indeed CC-BY-SA, then OpenAI needs to publish their weights under the same license.

extheat
23 replies
2d19h

I am thankful we have LLMs so we don't have to deal with SO. Ideally, as little as possible. SO can be a pretty toxic place filled with elitism and care for procedure over actually helping people, which is not totally unreasonable from their standpoint but it's definitely not what people are visiting the site for. Quite ironically, one of the major complaints I get is that LLMs output wrong answers here and there, ignoring that many of the answers on SO are also completely wrong or irrelevant to the core question being asked. And mind you, also outdated (I regularly have to click through the sorting to make sure answers are actually still relevant).

If we could merge the two to get the best of both worlds, and have LLMs that know how to write well and are validated by humans on the site, that would be great. Maybe not great for the folks looking to accrue internet points but absolutely great for users.

jimjimjim
11 replies
2d19h

What you see as elitism is mostly simple curation. You can't store everything, because it makes retrieving value from the store that much more difficult. It's the same with Wikipedia and other public content repositories. People cry elitism and gatekeeping, but without curation you eventually end up searching a haystack of mediocrity for a needle.

dleeftink
5 replies
2d16h

I agree in part, but why aren't other moderated outlets where users can ask technical questions given the same label? Reddit, Quora and HN are also curated; are content removals on these sites taken as elitist? Even if these places are less heavily moderated, I have no trouble surfacing relevant answers using any search engine's in-site search.

I am not talking about QA quality on any of these sites here, but the elitist stigma that has seemingly followed SO for so long.

[0]: https://meta.stackoverflow.com/questions/262446/are-we-being...

squigz
4 replies
2d16h

why aren't other moderated outlets where users can ask technical questions given the same label

The exact label aside for a moment, reddit and HN mods often face backlash for their actions. But beyond that, Wikipedia and SO stand out in this regard because of their transparency regarding the curation. Mostly, reddit curation happens in the background, without much explanation. SO and Wikipedia basically spell out their actions and reasoning.

Another difference is that with reddit and HN, you have no real recourse. At least with Wikipedia (I'm not too familiar with SO policies in this regard) you can appeal decisions, open discussions about policies, etc.

I have to agree with GP - people often mistake the 'bureaucracy' of sites like Wikipedia and SO as something unnecessary that the editors force on everyone, but the fact is, it's necessary to create and maintain a high-quality repository of information.

dleeftink
3 replies
2d15h

SO and Wikipedia basically spell out their actions and reasoning

You're able to appeal on SO as well. It's interesting to think about a situation where moderation decisions would be more in 'the background', as you say (like Reddit/HN), and whether this takes away from the perceived 'elitism' some moderation practices are accused of.

squigz
2 replies
2d15h

In my experience on the above sites, and as a (small) community manager, it absolutely plays into it. A lot of people just instinctively respond negatively to displays of authority.

On the other hand, I think it's an important aspect of a community/platform if the goal of that platform is to be transparent and open, which I think is an important aspect of SO and Wikipedia, and I hope more platforms would adopt that view. I think whatever "elitist" perception such platforms have to suffer is well worth having high-quality, open platforms.

(I will say that no platforms are perfect of course, including SO or Wikipedia; there's plenty of criticisms to go around about specific policies and decisions. See: TFA :P)

Shog9
1 replies
2d13h

This is an insightful observation, and a problem we struggled with for years on Stack Overflow: if you keep moderation quiet and anonymous, there's a lot less criticism and seemingly fewer hurt feelings... but also very little correction. The Star Chamber works great until corruption sets in; finding a good balance between secrecy and transparency is a challenge.

For years, moderators signed their names to messages like the one cited in the article. After one too many cases of a volunteer being called at work or having their family harassed or sent a suspicious package in the mail... That particular bit of transparency was eliminated - the cost was too high for the limited benefit. OTOH, it used to be very difficult to find your own deleted posts but that has slowly gotten better (including visibility into who deleted them) - turns out the benefit there was substantial (identifying wrongly-deleted posts & curbing over-enthusiastic curators), while harassment has been mostly limited to occasional grousing.

squigz
0 replies
2d2h

After one too many cases of a volunteer being called at work or having their family harassed or sent a suspicious package in the mail

This is why I'll never use my real name casually on the Internet, and why the idea of widespread identity verification on the Internet scares the crap out of me.

_gabe_
2 replies
2d13h

This “curation” is what is killing SO. Software is soft. It changes. There is no “one true answer for all time”. It’s honestly sad how many times I search for an answer, only to see the exact question I’m looking for closed as duplicate, then when I look at the “duplicate” I see that it’s an out of date answer.

Stack Overflow could have solved the problem of duplication so many ways. Why not categorize and bucket duplicate answers? They could have even had yearly recurring questions with the most up to date answer! Why not add beginner/hobby/expert rankings to questions so that the people answering don’t get sick of seeing beginner questions all the time?

There is so much SO could have done, instead they rested on their laurels and now they’re left with an out of date repository. What use is a curated repository if it will only help me solve problems with solutions from a decade ago?

phatfish
0 replies
2d10h

Who says the solutions from a decade ago are not still correct or the best way to solve a problem? Just because ChatGPT regurgitates something today with the words moved around doesn't mean it contains "new" insights.

fzeroracer
0 replies
2d11h

It sounds like what you want is Quora. You can go ahead and use Quora for all of your software question needs.

struant
0 replies
2d16h

Their curation blows. The whole premise of having a canonical answer to a question is dumb. Most programming languages and libraries are always in flux. The whole nature of many questions changes over time.

StackOverflow is a tyranny of mediocrity. It is a bunch of middling programmers shitting on newbies and driving away experts, because you get severely punished for not being mediocre.

I had a question closed as a duplicate for being too similar to another question that I had directly cited in my question as being subtly different and not applicable. (Because I anticipated some idiot closing my question... and they went and did it anyway.)

Repulsion9513
0 replies
2d11h

I actually strongly prefer Wikipedia to SO, on Wikipedia the old now-wrong content can just get edited out, on SO you'll have to dig through all the 300-point popular answers from 2012 to find the new answer that says "yeah none of that is right anymore, instead do this"

SO is far from curated, I guess is my point

tyingq
6 replies
2d19h

That's great for now. It's not clear to me, though, where LLMs will get their training data from here forward without ingesting lots of LLM-generated code and answers and eating their own tail.

kevin_thibedeau
1 replies
2d15h

They'll get it from human generated archives from before the singularity.

tyingq
0 replies
2d7h

Well, yes, but software doesn't hold still, so the answer for "How to do xyz in whatever-replaces-reactjs" might not be great.

cle
1 replies
2d17h

OpenAI and Microsoft get TONS of user-written code w/ quality feedback, OpenAI through ChatGPT and Microsoft through VS Code and Copilot.

prng2021
0 replies
2d16h

That's only now and in the near term future. If AI is actually successful, every year the amount of human written code will decrease. That's the whole point of this.

BenFranklin100
1 replies
2d15h

Didn’t you get the memo? LLMs either already are capable of reasoning or are just a step away from it, so no need for human-generated training data in the future.

Or at least that’s what 3/4 of HN commentators believe and all AI CEOs want you to believe.

squigglydonut
0 replies
2d1h

yea that's bullshit. They are capable of stealing intellectual property though.

LordShredda
2 replies
2d15h

Does it matter if stack overflow is toxic or not? You're there to ask a question and get an answer. If you ask wrong, you get corrected. Tough moderation makes search much faster and better for other askers.

You're there to ask for help not make friends. They have to be polite, but not gentle

lannisterstark
1 replies
2d14h

Yes it does. If I am belittled instead of people asking clarifying questions so I can learn, I'm much less likely to think better of said people or platform, or use it.

joquarky
0 replies
1d23h

This is what killed perl

IAmNotACellist
0 replies
2d14h

I am thankful we have LLMs so we don't have to deal with SO. Ideally, as little as possible. SO can be a pretty toxic place filled with elitism and care for procedure over actually helping people

There needs to be a term for this. Perhaps "The Wikipedia Effect."

pornel
21 replies
2d19h

StackOverflow has always been quite open that they're primarily building a dataset for SEO, rather than being a user-centered website. So I don't feel this deal changed much. SO users are still serfs building them a dataset for sale, only the buyer has changed.

LLMs are faster and infinitely more patient than interaction with StackOverflow, so I don't expect SO to survive for long. They're in crisis regardless whether they sell to OpenAI or not, so they may as well get something out of it before they're decimated.

theteapot
17 replies
2d16h

I think they're in crisis because they sold out their community, not because LLMs are better. As a developer, if you offer me StackOverflow vs ChatGPT, I'd take StackOverflow any day of the week, 100x over.

lannisterstark
7 replies
2d14h

As a developer, if you offer me StackOverflow vs ChatGPT, I'd take StackOverflow any day of the week 100x over.

Really? Hm, I wouldn't. I can use nuance and clarify my answers and have a respectable back and forth (GPT-4 doesn't call me names when I mess up or say something dumb) and arrive at an answer.

fragmede
3 replies
2d13h

Are you sure that's not an XY problem?

lannisterstark
2 replies
2d13h

I actually have no idea what you mean. can you clarify pls?

lannisterstark
0 replies
2d13h

lol that makes sense, thanks.

JimDabell
2 replies
2d12h

GPT-4 doesn't call me names when I mess up or say something dumb

I’ve heard this accusation a lot, but I don’t think I’ve ever seen it happen. People call you names on Stack Overflow? Where?

lannisterstark
1 replies
2d10h

Where?

-50

Marking duplicate. "You should attempt searching before asking such obvious questions."

This question has already been answered here: < https://news.ycombinator.com/item?id=20861356 >

Closed 3 seconds ago.

----

or some such ;) You may not come across it personally, but that doesn't mean it doesn't happen. SO is successful as a Q&A platform (or was, anyway) despite this shortcoming, not because it's a feature or because it doesn't happen. If a lot of people are talking about the same thing, maybe people should at least pay cursory attention to the issue rather than saying "no, it doesn't happen". (Not aimed at you, but there are absolutely comments like this every time this gets brought up.)

JimDabell
0 replies
2d10h

You linked to a discussion of about a hundred comments. I skimmed it but didn’t see name calling. Can you be more specific?

Jimmc414
6 replies
2d16h

Respectfully, how would you know if you never use ChatGPT?

theteapot
5 replies
2d16h

I said I don't use it. I didn't say I've never used it. In my experience browsing SO is way easier, more accurate, more precise, more controllable, navigable, and ... gives attribution.

fragmede
3 replies
2d16h

you spend more time on SO than me. without looking, can you name three stack overflow contributors? I can't.

theteapot
0 replies
2d15h

Yes.

selcuka
0 replies
2d15h

I was offered a job a few years ago by someone who saw my Stack Overflow answers, does that count? I don't see something like this happening with ChatGPT.

phatfish
0 replies
2d10h

I can do two, Jon Skeet (C#) and S. Lott (Python) are names I remember for providing great answers.

lannisterstark
0 replies
2d14h

For some reason, a lot of the answers here seem to care more about "but tell 'em /I/ solved it" re: attribution than about helping the user. Somewhat egoist, or some such? (I don't mean it in an aggressive tone; just ESL, so I don't know how to say it otherwise.)

If I license something as MIT, I personally don't care who uses it for what purpose, hell I don't even care generally that they attribute me. I put it out for people to use. But maybe that's just me.

Kiro
0 replies
2d10h

And I'd take ChatGPT any day of the week 1000x over. That doesn't mean anything.

BeetleB
0 replies
2d16h

I'm in the opposite boat. Going through Stackoverflow answers has become quite a chore.

For simple things, GPT gives me the correct answer most of the time. And even when it's wrong, it's quicker to discern that it's wrong than to parse a given SO page.

Of course I still use SO for more complex questions.

As a rule, if I can quickly find the answer via SO, then chances are GPT will give me the answer more rapidly.

nox101
1 replies
2d14h

SO users are still serfs building them a dataset for sale

That is a very negative spin.

Users get access to other people's answers for free. They get that free service and are required to contribute nothing. Those that do contribute do it to help other users. S.O. isn't doing anything bad. They're providing a free service where everyone wins. Users get answers. Answerers get to help other humans at scale. S.O. makes a little money.

As for the dataset, it's been available under CC-BY-SA for years. The entire database is backed up and made available here for free every month.

https://archive.org/details/stackexchange

There are even free tools to query it here

https://data.stackexchange.com/

juleiie
0 replies
2d4h

Why does a company make money on someone's free work? This is obviously not okay. We have even more egregious examples, but this is certainly one of them.

progval
0 replies
2d11h

StackOverflow has always been quite open that they're primarily building a dataset for SEO

Do you have a source / more details about this? What good is SO's content for SEO?

blibble
14 replies
2d20h

there are several nice libraries that allow you to generate plausible sounding gibberish

this one is particularly nice and easy to use: https://github.com/jsvine/markovify/

you give it a file of existing text and it generates complete rubbish that would pass most automatic filters
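markovify and friends are just word-level Markov chains under the hood. A dependency-free toy version of the idea (this is not markovify's API, just an illustration of the mechanism) looks like:

```python
import random
from collections import defaultdict

def build_chain(text):
    """Map each word to the list of words observed to follow it."""
    words = text.split()
    chain = defaultdict(list)
    for cur, nxt in zip(words, words[1:]):
        chain[cur].append(nxt)
    return chain

def babble(chain, start, length, seed=None):
    """Random-walk the chain to emit plausible-looking gibberish."""
    rng = random.Random(seed)
    out = [start]
    for _ in range(length - 1):
        followers = chain.get(out[-1])
        if not followers:  # dead end: word had no observed successor
            break
        out.append(rng.choice(followers))
    return " ".join(out)
```

Real libraries use a higher-order state (the last two or three words) so the output reads more fluently, but the principle is the same: every local word pair is statistically plausible while the whole is meaningless.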

ceejayoz
13 replies
2d20h

These are far more likely to come to moderator attention by user flags on the edited posts.

blibble
12 replies
2d20h

for the AI to be useful it has to be continuously updated with new good data

so add small bits of rubbish slowly over time, and don't even contribute again

it'll take a while to completely destroy the AI business model, but we'll get there

dylan604
9 replies
2d19h

but we'll get there

at some point, it'll be too late. the horse has already left the barn.

besides, if the site owner makes a deal with the devil, there's nothing you can do other than quit using the site. people are still using social platforms more than ever, so stopping isn't going to happen.

the more likely to happen is that accounts deemed to be polluting the waters will just get suspended with no recourse to have it re-instated.

blibble
8 replies
2d19h

at some point, it'll be too late. the horse has already left the barn.

I don't think this is true: the technology is useless unless it parasitises new knowledge continuously

it sows the seeds of its own destruction by reducing the value of past and future contributions to zero

the more likely to happen is that accounts deemed to be polluting the waters will just get suspended with no recourse to have it re-instated.

so this is also perfectly acceptable: once they've banned the top 20% the site effectively becomes read-only, and the AI knowledge previously parasitised from it atrophies with no replacement

dylan604
7 replies
2d17h

Known knowledge doesn't disappear. Once it knows how to apply an FFT and when, it doesn't need to continue to read about it. It's not a human needing continuing education. Once it knows that Henry VIII had many wives, it doesn't need to keep reading that he had those wives.

Sure, if something new happens, then it's not like SO is the only place it's scraping for new information. If you honestly think that you/we will get to a place to block all scraping, I will just politely disagree.

fzeroracer
6 replies
2d16h

Once it knows that Henry VIII had many wives, it doesn't need to keep reading that he had those wives.

That's actually incorrect, it needs to constantly ingest new data. If it ingests enough data (from other LLMs that are hallucinating, for example), then suddenly when it has enough bad data it'll start telling you that Henry VIII was a famous video game on the Sony 64.

It has no concept of 'truthfulness' beyond the amount of data that it can draw correlations from. And by nature LLMs have to ingest as much data as possible in order to draw accurate results from new things. LLMs cannot function without parasitizing off of user generated content, and if user generated content vanishes then it collapses in on itself.
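This failure mode is often called "model collapse". A toy, stdlib-only illustration of the mechanism (repeatedly refitting a Gaussian to its own samples; the parameters here are arbitrary, not from any real training setup) shows the spread of the "data" shrinking generation after generation:

```python
import random
import statistics

def collapse_demo(generations=2000, n=50, seed=0):
    """Fit a Gaussian to its own samples repeatedly.

    Each generation draws n points from the current model, then refits
    mean/stddev to those points. Estimation error compounds and the
    fitted spread drifts toward zero: a statistical analogue of a model
    training on its own output and losing the tails of the distribution.
    """
    rng = random.Random(seed)
    mu, sigma = 0.0, 1.0
    sigmas = [sigma]
    for _ in range(generations):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        mu = statistics.mean(sample)
        sigma = statistics.pstdev(sample)  # refit to own output
        sigmas.append(sigma)
    return sigmas
```

With these settings the fitted spread typically decays by orders of magnitude. The rare, surprising data points (the interesting answers, in this analogy) are the first thing to disappear.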

williamcotton
5 replies
2d16h

So fill the entire internet with factually incorrect, useless knowledge? This would be a good thing?

fzeroracer
3 replies
2d15h

Well, that's already happening. Google search has become increasingly useless thanks to SEO-focused AI-generated schlock. It's the inevitable outcome of LLMs. Sites have an incentive to hide that they're AI generated and LLMs have no real way to filter for ingested data made from other LLMs. The only difference is how long the ruse can be kept up.

williamcotton
2 replies
2d14h

So you want to pollute the commons just as the people filling the web with SEO-focused AI-generated schlock? Do you feel justified in polluting the commons to serve the ends you desire?

fzeroracer
1 replies
2d12h

Do you actually have a solution to the problem of companies using LLMs to steal from other people and repurposing it as their own, other than figuring out ways to ensure that LLMs suffer for doing so? And frankly as I mentioned, LLMs are already polluting the commons; you're not offering any solution on that front either other than asking people to keep supplying it with fresh data so that it doesn't poison itself.

williamcotton
0 replies
2d6h

Do you realize that your stance is merely your opinion? Does everyone agree that training ANNs is stealing?

dylan604
0 replies
2d15h

Scorched earth policies are always en vogue, and easy to offer as a knee jerk reaction. They do nothing for actually making forward progress in the conversation though.*

*However...there are times where the best solution is a match and some gasoline.

williamcotton
0 replies
2d17h

What's your stance on a future open source model that is as capable as any commercial models?

Also, I'm curious, do you consider LLMs to be incredibly error prone and untrustworthy?

Or do you think they are going to replace software developers?

ghnws
0 replies
2d7h

Sounds about as successful as people trying to destroy social media by removing or editing their posts. Only a tiny minority actually do anything like that.

fardo
11 replies
2d19h

This bodes poorly for the future of SO.

One of the reasons that Quora today is absolutely unusable is that it no longer is a curated discussion between internet users and knowledgeable people, but AI spamming the site with swarms of low-quality questions, and AI answering those questions with swarms of low-quality answers. I think it's likely that Stack Overflow will end up following a similar pattern.

khazhoux
6 replies
2d19h

I know this wasn’t really your point, but it’s worth noting that Quora being low quality spam is not the problem. It’s why the hell Google surfaces Quora so prominently given that the results are pure shit and require registration to even see all the shit.

Is there any reasonable explanation for how they’re ranked so high? Like, how can even googlers tolerate it?

tyingq
4 replies
2d19h

Just a guess, but I think when they started losing the spam wars they put in some kind of handcrafted whitelist ranking boost, either directly based on brand/site, or link proximity to known good sites, etc. And maybe they don't update that list too often. You can find some info about an ML update Google called "Vince" that sounds a lot like that.

MichaelZuo
3 replies
2d17h

Not updated in over 2 years?

tyingq
2 replies
2d15h

Not updated in a way that affects Quora over many years would not surprise me.

MichaelZuo
1 replies
1d6h

How could that even work to not affect one of the most popular sites whatsoever?

tyingq
0 replies
1d6h

Poor maintenance of a probably thousands long whitelist of "brand quality" seed sites? When the only measure they really care about is ad revenue, and bad organic results might mean more ad clicks? It's not really that outlandish, just plain complacency from a company with an overwhelming market share lead in search. That's how Google started in the first place...capitalizing on complacency/stagnation on the then leaders in search.

Terr_
0 replies
2d19h

Like, how can even googlers tolerate it?

Assuming you mean people working at Google, the answer is probably that profit/promotions outweigh personal use. More clicks, more back-buttons, more search adjustments, more advertising revenue.

lolinder
2 replies
2d19h

Quora was already absolutely unusable back before GPT-2. It became unusable as soon as people realized that all they had to do was self-identify as an expert to get taken seriously on there, so people started developing whole lifestyles around building up their Quora profiles. From that point on the actually knowledgeable people weren't interested in contributing because there was no way to distinguish themselves from the people who were faking expertise. AI may have been the final nail in the coffin, but Quora was dead long ago.

Stack Overflow managed to avoid that particular hazard by placing less emphasis on real-world identity and expertise, but it also has been in a long-term decline for many other reasons. The fact that they made such a vocal stance against AI and then pivoted so dramatically is just one example of how much they've struggled to find direction lately.

malfist
1 replies
2d18h

Just a point of clarification, the user moderator base (its power users) took a strong stance against AI, and the company, chasing every possible dollar, overruled them.

Short term profits over user preference is what happened here

lolinder
0 replies
2d17h

Oh, yes, my mistake! I misremembered. Thank you!

choppaface
0 replies
2d18h

Quora and SO are rather different communities. In Quora's best days, there were celebrities or quasi-celebrities making interesting posts, just like on Twitter or Google Plus in their heyday. Also, Quora used to have very active and talented Community Managers / Top Writers. Marc Bodnick used to do tons of curation but left a while ago to create his own social network(s).

In contrast, SO has never been so "celebrity"-driven and the content has a rather different audience. I think it's understandable that the major contributors don't like how their content is being used, similar to the Reddit revolt.

What might "replace" SO is some AI-assisted way to establish a handbook and FAQ for any new technology. That could be a chatbot as well as some effective method for feeding that bot content.

And then SO-the-community, i.e. people who want to talk to each other, will probably branch off into some other forum or network.

SrslyJosh
11 replies
2d20h

Congrats to OpenAI (and the rest of the LLM bros) for creating negative incentives for sharing knowledge.

lukan
4 replies
2d20h

I do not understand, how are they "creating negative incentives for sharing knowledge"?

If I posted on SO before in the hope that others find it useful (and not for the karma) - and now it might help others not directly through the site, but with further steps through a llm, where is the problem? Knowledge was shared.

AdamH12113
3 replies
2d14h

Part of the benefit for the answerer is the experience of interacting with the questioner, receiving upvotes and comments, having answers accepted, and having your name on an answer that's helped people. You get credit for answering questions on Stack Exchange sites. It's not much, it's not supposed to be, and it's rarely of material consequence, but it matters. I still get upvotes on some of my old EE.SE answers when my written work helps someone enough that they take notice. It's a little reminder that I've done something useful in my life.

Having my work ingested into ChatGPT takes the me out of it. It turns me into, essentially, unpaid contract labor for OpenAI. They get all the credit, and I get forgotten. Why would I be okay with that?

If you want to write free code for OpenAI to improve ChatGPT, you're welcome to do so. Cut out the middlemen and send it to them directly. But please leave me and my work out of it.

lukan
2 replies
2d12h

"Having my work ingested into ChatGPT takes the me out of it. It turns me into, essentially, unpaid contract labor for OpenAI. They get all the credit, and I get forgotten. Why would I be okay with that?"

So you are OK with unpaid contract labor in exchange for virtual points, but if you don't get virtual points as appreciation, no one should benefit. That's fine, but then sharing knowledge is not your main goal but a secondary one. Your main goal is the recognition.

But if you delete your comments, you won't get anything at all anymore. If they remain, real humans will still benefit, directly or indirectly. And why should I write exclusively for OpenAI? I share my knowledge with anyone. If SO were to restrict public access and favor OpenAI, that would be the moment I would want to delete everything. But at the moment, LLMs just also get official access; they had access to SO before, just in a legal grey area. So nothing really changes.

smcin
0 replies
2d8h

But if you don't get virtual points as appreciation [unpaid contract labor]... then sharing knowledge is not your main, but secondary goal. Your main goal is the recognition.

It's a false dichotomy to parse out components of motivations; most SO users are motivated by a mix of altruism, sharing knowledge, some recognition, optionally linking to your profile/website/blog/resume/portfolio, getting job approaches, and a dose of pride/ego/vanity. As a longtime SO user, that has historically been the bargain, when most or all of your submissions were directly seen by human end users. As a plus, all of that gave you good SEO commensurate with your contributions. So it's unreasonable to try to dichotomize into "users who mainly did it for the rep" vs ones who want to teach and share.

But the 2023 and 2024 announcements are different: the future is that your submissions will be used to train AIs. However, SO doesn't seem to have devoted much thought to licensees like OpenAI complying with SO's attribution requirements [0] (attribution must cite the individual URL of the question/answer and the SO username, which then links onward via your SO profile page to the items mentioned above). (If the AI synthesizes an answer derived from 5 separate SO items, do they guarantee to attribute all 5 items?) So the human eyeballs are being intermediated, your incentives to participate are evaporating, and that pretty much breaks SO's historical bargain with its user community.

The next major bad development would be SO opening the floodgates on the moderation-queue backlog of thousands of items of AI-generated content (which caused the 2023 moderator strike/resignations), much of it low-quality and arguably deserving a ban; if/when that feedback loop is closed, the results might well be unholy. Certainly bona fide human contributors will be marginalized and have less incentive (and if AI were to be used for moderation, then that could be exploitable).

Inbound views/hits on your content on SO either come from a) Google + other search engines b) SO's search itself c) attribution from OpenAI's ChatGPT d) attribution from other(/future) AI licensees. If your code is scraped once but effectively viewed 1 million times from GPT, you won't see those 1 million hits show up; you can only vaguely infer they might be happening if the attribution is actually implemented, and some users click through on it (or by reverse-querying the AI). So c),d) will proportionately increase as a),b) proportionately decrease.

So everything has changed. And obviously the incentive to you to continue to provide unpaid volunteer labor ongoing without even attribution decreases.

[0]: https://creativecommons.org/licenses/by-sa/4.0/#ref-appropri...

AdamH12113
0 replies
2d1h

Smcin has answered your other point. Let me respond to this one:

> But if you delete your comments, you won't get anything at all anymore. If they remain, real humans will still benefit directly or indirectly. And why should I write exclusicly for openAI? I share my knowledge for anyone.

The goal -- implicitly for AI companies and explicitly for many of the commenters on this story -- is to replace sites like Stack Exchange. Stack Exchange's traffic will instead go to ChatGPT. The most likely outcome of this is that Stack Exchange will eventually shut down or severely degrade its service. If ChatGPT were a supplemental tool, one user out of many, you would be right. But it's not a complement, it's a competitor, designed to make a profit off of assimilating my work without giving me any compensation or credit.

atleastoptimal
3 replies
2d20h

What are the negative incentives? How would an LLM improving in capabilities harm those who shared their knowledge for free online at some point in the past?

dumbo-octopus
2 replies
2d20h

My experience is worth less if an AI can summon it at-will. It hasn't necessarily come down to this yet in the software industry, but in others (like animation), folks who were previously responsible for generating concept art have found themselves without jobs as management can get "good enough" results from a much cheaper medium (that was, at least en-masse, trained on their "prior art").

I don't personally have a well-formed opinion one way or another on this, but to dismiss the existence of an issue at all is logically lacking.

kragen
1 replies
2d20h

the same reasoning would equally justify the claim that your experience is worth less if beginner programmers can summon it at will; if you believed that reasoning you wouldn't have contributed to stackoverflow in the first place. i don't, and if you contributed to stackoverflow, you didn't either

zb3
0 replies
2d20h

The scale might be different here, since prompting an AI is much cheaper than hiring a beginner programmer. The earlier kind of loss could, for instance, be compensated by attribution.

bcrosby95
1 replies
2d20h

ML gives a whole new meaning to "training your replacements".

fakedang
0 replies
2d20h

Come to think of it, recent ML is just a scaled-up version of Infosys, Wipro, etc. Shit-quality answers for enterprises, now accessible to the masses.

fabian2k
10 replies
2d20h

The OpenAI partnership doesn't really affect the core issue here around users deleting their content. That has never been welcome on Stack Overflow and, when noticed, was usually reversed. This is in accordance with the license as far as I understand the legal aspects, and in general it makes sense to me, as it ensures that the content stays useful.

The content is also CC-BY-SA, which is much better than what you get on essentially every other large site that hosts community content. But the same license also means that you cannot take that content back: even if Stack Overflow allowed deletion, anyone else could scrape or download it beforehand and reproduce it according to the license.

Users can still remove their name from their posts, and if they write personal details, those can be redacted as well. But you can't remove good-quality content from the sites later; that is likely to be reverted.

Hizonner
8 replies
2d20h

The problem isn't that Stack Overflow is allowing people to scrape the content. The problem is that Stack Overflow is preventing some people from scraping the content, in order to collect money from others. And, incidentally, passing zero of that money on to the people who actually created the content.

(Nearly) none of the people who are presently pissed off would have complained if Stack Overflow had continued to allow all comers to scrape the content and train LLMs on it, nor if Stack Overflow had released the entire finished collection of content under the same CC-BY-SA license that was demanded of each contributor.

With the OpenAI partnership, and similar shenanigans leading up to it, Stack Overflow is relying on obscure technicalities to violate the essential spirit of the original deal.

theendisney
3 replies
2d20h

I don't get how you can release something under anything other than all rights reserved without identification. We need to be able to persecute you in case you are not the author. Or is it that I may republish anything under any license? It could be that the platform licenses it in the ToS, but with CC are they not obligated to make it available without obstruction?

shkkmo
1 replies
2d15h

Requiring identification to publish so that copyright is protected would be a massive overreach, and this sort of thinking is why I think copyright is a dangerous concept that needs to be sharply curtailed, not expanded to cover AI training.

In practice, the safest course is to not use content from untrustworthy sources in ways that require a license (aka in ways that are not fair use in your applicable jurisdictions).

theendisney
0 replies
2d14h

I think by default you just can't use things? Who thought that was a great idea, I don't know. We must be missing an enormous chunk of progress.

Every jurisdiction with its own idea of fair use? That's just hilarious.

I never really thought about peoples privacy either but at first glance you seem to be right.

Do you have any solution to the puzzle? People are quite attached to the concept and many build their house on this soil. Appeal to tradition?

Repulsion9513
0 replies
2d11h

Prosecution and persecution are two different things. Persecuting anyone is not a good time :)

Why, if you're not allowed to release under a license, should you be able to release all rights reserved (which can still be a copyright violation!)?

If you need to prosecute the person, there are established procedures for that: DMCA, or ultimately a lawsuit over the infringement. That you didn't identify yourself publicly on the site does not make that impossible. In fact the point of the DMCA was to make it easier to handle this - because if the provider doesn't comply with your DMCA, you can sue the provider.

mattstir
0 replies
1d20h

The publicly-available archives released by Stack Exchange are updated roughly quarterly and have the attribution requirements as specified by CC BY-SA + the Stack Exchange ToS.

The article makes it sound like OpenAI is using the API though, rather than the archives. The API and live sites forbid scraping within the acceptable use policy, as seen here: https://stackoverflow.com/legal/acceptable-use-policy

lamontcg
0 replies
2d16h

And, incidentally, passing zero of that money on to the people who actually created the content.

I mean that is basically SO's entire business model.

People do tons of work for free and SO runs the service and monetizes it.

hedora
0 replies
2d17h

Given the CC license, and the fact that contributors can apparently code, they should scrape the content and be done.

Of course, that’d mean bypassing the scraper blocker. This article is a decent starting point:

https://stackoverflow.com/questions/66413511/how-to-avoid-be...
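A less adversarial route than fighting the scraper blocker is the official Stack Exchange API, which can page through a user's own answers. A rough sketch using the public 2.3 API (the user id here is a placeholder, not a real account):

```python
import gzip
import json
import urllib.request

API = "https://api.stackexchange.com/2.3"

def answers_url(user_id, page=1):
    # Public endpoint for one user's answers; filter=withbody asks the
    # API to include the full answer text in each item.
    return (f"{API}/users/{user_id}/answers"
            f"?site=stackoverflow&order=desc&sort=creation"
            f"&page={page}&pagesize=100&filter=withbody")

def fetch(url):
    # The Stack Exchange API gzip-compresses every response.
    with urllib.request.urlopen(url) as resp:
        return json.loads(gzip.decompress(resp.read()))

if __name__ == "__main__":
    data = fetch(answers_url(12345))  # 12345: placeholder user id
    for item in data.get("items", []):
        print(item["answer_id"], item.get("body", "")[:60])
```

Paging continues while the response's `has_more` field is true; unauthenticated use is rate-limited, so a data-dump download is still the better bulk option.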

Brian_K_White
0 replies
2d9h

StackOverflow are violating the SA part of CC-BY-SA by selling special access to the CC-BY-SA content to one party and blocking others from the same thing.

OpenAI are violating both BY and SA, but that's a separate issue.

Everyone who contributed work, did so under terms that the work was free for all, not a resource that one party can sell to another party who then sells to end users. Those end users were meant to have it directly without having to pay openai or anyone else, and if any bulk/scraping access is allowed for anyone like openai, everyone else has the right to the same thing for no more than a "shipping & handling" charge to cover the network & employee cost to physically deliver the data.

What are StackOverflow selling, and/or what exactly are OpenAI paying for? What is the goods or services that is traded for the money?

There are many possible answers but I see no answer that doesn't ultimately one way or another wind up resolving into a violation of one or more terms of CC-BY-SA by both StackOverflow and OpenAI.

atleastoptimal
9 replies
2d20h

The problem is whether people see programming as a zero-sum or positive-sum enterprise. In the real world, it acts as a positive-sum enterprise: one person's contribution benefits themselves and all those who use or learn from the code. However, many gatekeeping-type people view it, perhaps instinctively, as zero-sum. They imagine that OpenAI benefitting from this partnership, or any amount of learning via web-scraping their models perform, necessarily harms those who put their content online. This is a nonsensical argument, yet it has garnered a fair amount of support due to the somewhat reflexive anti-AI sentiment as of late, which is separate from the more nuanced concerns about existential threats from AI.

zb3
7 replies
2d20h

Positive-sum rarely exists in this world... after all, one's wealth determines their influence over others. Both sides might gain, but this usually means others lose.

In this case, contributors might lose attribution. SO might lose traffic, but they'll be compensated. Contributors won't, so eventually there might be no reason to contribute anymore.

jonathankoren
4 replies
2d19h

It’s simply false that positive-sum doesn’t exist in the real world. Even the most simplistic trade argument in remedial Econ 101, or even Bio 101, reveals this.

What’s rare are zero-sum games.

zb3
3 replies
2d19h

Rising inequality suggests otherwise

kolinko
2 replies
2d14h

If I'm not mistaken, the whole society is getting wealthier; it's just that some people are getting wealthier faster than others, so it's still positive-sum.

Here are the source charts:

https://ourworldindata.org/happiness-and-income-inequality

zb3
1 replies
2d10h

You only consider those that "make it"; there are many who don't, because it's getting increasingly harder to be "useful" in the market (ChatGPT is cheaper), and innovations usually make it worse. Those "new jobs" are harder, and many won't qualify.

kolinko
0 replies
1d4h

Did you check the linked charts? The stats show that people are getting wealthier across the board.

__MatrixMan__
1 replies
2d13h

Isn't the existence of wealth in the first place sufficient evidence that wealth is something that gets created? We started out banging rocks together and now we have all of this weird stuff which presumably people like or something.

zb3
0 replies
2d10h

Now we work harder, and it's getting unbearable for those on the bottom... Wealth also affects whether you're "useful", and you need to be "useful" to survive. It's getting harder to be useful.

squigglydonut
0 replies
2d1h

Imagine that you spent a lot of time helping people and building a community. Then a company encodes this "help" into text format, puts it into a book, and makes a lot of money selling the book. In doing so, this company kills the community. You wouldn't be pissed off about that?

uberman
8 replies
2d20h

In addition to deleting answers, I think protesters should upvote wrong answers and crappy posts.

For years the community has defended punitive downvotes on correct answers to crappy questions as "you can do with your vote as you like". I see no argument against flipping that around.

wseqyrku
7 replies
2d19h

What is the end game here? You wanna get paid? That wouldn't be more than a few cents, just like how Spotify deals with artists.

uberman
3 replies
2d18h

My personal end game, if I have one (and I'm not sure I do), would be to ensure that I can help individual novice programmers become better at their craft, not to make billion-dollar corporations even richer.

williamcotton
1 replies
2d17h

What's your stance on when there are open source LLMs that are as capable or more capable than GPT-4?

Brian_K_White
0 replies
2d8h

Does everyone get equal access to let their own copy of the open source llm download a copy of SO?

Are those open source llm users in turn selling access to the content they got for free, and also stripped of attribution?

What exactly is changing hands in trade for the money, that doesn't one way or another violate CC-BY-SA?

It's not merely the fact of any form of commercial activity, since there is no NC in there, but the specific actions here by both StackOverflow and OpenAI violate the terms the content was originally created and shared under.

jncfhnb
0 replies
2d17h

And you intend to do this by signaling that incorrect answers are correct?

nkrisc
2 replies
2d17h

No one needs a sensible, logical, or even rational reason to do that which they are already entitled to do.

Brian_K_White
1 replies
2d7h

You know they read this as "They can do something illogical if they want to." instead of "They don't owe you an explanation of their reasoning nor require your approval of it, and your not knowing or understanding or agreeing with their reasoning does not mean there is none or make it invalid."

nkrisc
0 replies
2h31m

It’s all the same. Whether it’s rational or not isn’t really relevant and is subjective.

user3939382
6 replies
2d20h

SO made it such a pain in the ass to contribute I gave up trying every time I’ve historically been interested. Like I’m already sacrificing my time to offer my expertise helping someone, you want me to jump through a bunch of hoops to have the privilege of doing so? No thanks.

colechristensen
3 replies
2d20h

That same amount of pain-in-the-ass gamification made spam and terrible-quality answers equally discouraged. Given the volume of at least decent content on Stack Overflow, I'd say the game worked. Somebody could try to make it better with a competitor, but it would be a hard thing to succeed at.

lmm
1 replies
2d18h

The more hoops they've added, the worse the quality has gotten. Quality has declined over time, and most of the good answers you see nowadays are from people who got into the habit of contributing back when the process was much simpler, and who would likely never have joined the site if it were as onerous as it is today.

hedora
0 replies
2d18h

On a related note, it costs my employer way more to pay me to solve a captcha than it would to pay a captcha solving service.

At some point, passing the hoops turns into a negative predictor of comment quality.

malfist
0 replies
2d18h

Have you assessed the quality of Q&As that aren't years old? Anything decent that I find is usually quite old and possibly out of date.

It doesn't help that asking for a more recent answer gets your question closed as a duplicate, and new answers can never overcome the inertia of the historical ones.

September has come for Stack Overflow.

majorchord
1 replies
2d19h

What do you suggest as a better alternative?

akira2501
0 replies
2d18h

I'm starting to wonder if the days of "free, ad-supported, user-generated content wells" are over. The audience and participation base have grown larger than the ability of these single entities to rationally cope with while still maintaining their original mission and profits.

We've outscaled our original hopes for the Internet. It was originally meant to be a tool genuinely controlled by its users; unfortunately, it's largely ended up in the stranglehold of a few monopolists.

j45
6 replies
2d20h

As an early user of SO, I certainly don't want my answers sold for profit again and again.

vkou
1 replies
2d20h

Then you shouldn't have been posting on SO.

If you want to contribute to the commons, contribute to the commons. If you want to contribute to the commons without commercialization of your work, contribute with some non-com license [1]. If you want to feed a corporation with your labour, post on SO.

[1] It'll still be illegally scraped and commercialized by some AIBro, and you'll have no proof or recourse against them...

j45
0 replies
2d18h

Scraping to me is different than side licensing the content in some other form or usage than what it was created for.

Licensing also means the writer retains the ownership.

Cc-by-sa has had a few revisions too.

kragen
1 replies
2d20h

the so license is cc-by-sa and has been since the beginning, not cc-by-nc-sa

j45
0 replies
2d18h

True.

Cc-by-sa has had a few revisions.

Attribution would be lost in an llm, no?

jonathankoren
1 replies
2d20h

Your answers are already sold for profit again and again. That's the whole point of SO existing. Or maybe you're under some delusion that SO is a charity?

j45
0 replies
2d18h

No delusion.

Specific licensing and side deals are, to me at least, different from scraping.

caesil
5 replies
2d19h

Ben continues in his thread, "[The moderator crackdown is] just a reminder that anything you post on any of these platforms can and will be used for profit. It's just a matter of time until all your messages on Discord, Twitter etc. are scraped, fed into a model and sold back to you."

Uh.... yeah, it's a company, not a charity. No one's forcing you to post on StackOverflow. No one's forcing you to buy a ChatGPT subscription.

readyman
2 replies
2d19h

Given the lack of an alternative, should we instruct the human instinct of sharing in the pursuit of knowledge to sacrifice itself or accept the risk of exploitation?

If your sorry platitude is what we have to show for it, capitalism must go to Hell.

wordofx
1 replies
2d17h

Given the lack of an alternative

There are plenty of alternatives to everything. But no one uses them. Because why would they?

readyman
0 replies
1d19h

They wouldn't, because network effect, which is why there is no alternative.

Barrin92
1 replies
2d18h

Uh.... yeah, it's a company, not a charity.

While this is true, sites like Stack Overflow only function because they create the illusion that they are in fact a "community". The moment they make explicit that there is monetary value in the knowledge people post on the site, it becomes obvious that the users are, to use Varoufakis's term, technoserfs.

You're very much never supposed to notice that Reddit, SO, and so on continuously extract value out of the work you produce; at worst you're maybe supposed to notice an ad or two. Because if you do notice, you might actually start asking why you aren't getting paid. Which is, btw, funnily enough exactly what news organizations and SO have realized vis-à-vis OpenAI.

caesil
0 replies
1d

IMO it's kind of silly and mentally corrosive to think of everything you do in these kinds of transactional terms.

I post on reddit because I find it enjoyable. I am not doing "work" that I think I deserve to be compensated for. Not every POST request I make to someone's server should be accompanied by a bill for my labor.

ProjectArcturis
5 replies
2d19h

Is there anyone who makes stackoverflow their first stop for programming questions anymore?

int_19h
3 replies
2d19h

Google ranks it pretty high, so it is the de facto first stop for many.

ProjectArcturis
1 replies
2d19h

Ah. I've found various LLMs are much easier to query and generally nicer than SO posters, so it's been quite a while since I've needed to visit SO. I assumed most people had made a similar journey.

jazzyjackson
0 replies
2d17h

ur in a bubble, harry

yesiamyourdad
0 replies
2d16h

Not so much anymore, though. I've seen over the last year that SO ranks lower and content farms like GeeksforGeeks, Programiz, etc. get much higher in results.

notatoad
0 replies
2d17h

i still google things, mostly out of habit. but i'd say half the time i visit stack overflow, the answer i get there is either outdated or too opinionated to be useful and i end up going to chatGPT.

casenmgreen
4 replies
2d19h

I left some time ago, and I'm very glad I did.

I left because the staff behaved in a disingenuous manner.

I found when leaving, as mentioned in the article, that you are not allowed to delete accepted posts, so you can't delete your content, should you come to think SO objectionable and wish your content not to be there.

I can't see now why anyone would spend time posting answers there.

zerocrates
2 replies
2d19h

I don't love them getting into bed with AI, but also don't think it's unreasonable for them to not allow angry users to blank out their prior submissions.

The whole deal was that you basically donated your posts and CC-licensed them. I wouldn't begrudge Wikipedia from similarly dealing with upset editors who went around blanking the articles they contributed to or reverting all their changes.

Retric
1 replies
2d19h

Wikipedia editors aren’t the sole source of their pages. I am fine with people leaving and deleting their posts because they may feel the information will become outdated without being maintained.

SO unfortunately is actively hostile to correcting outdated information. Which is somewhat understandable as recognizing how little long term value answers provide undermines their moat.

The site has basically become worthless for JavaScript due to such rot which helps explain why they are trying to cash out on the AI side of things.

arp242
0 replies
2d19h

Many answers have been edited, commented on, and reviewed by others. So it's also not exactly a one-person show either.

Outdated info is a problem, but not so easy to solve; I have answers from Ruby on Rails 4 era that are still perfectly valid today. Others may not be. Also remember that people sometimes stay on old versions for a long time. I don't know what the best solution is, but destroying information is not it.

firecall
0 replies
2d19h

You can't delete your posts on here either!

Which I don't like :-)

I'm with you, I think you should be able to delete your own posts and erase your internet history.

I don't think that everything we ever write on the internet should be stored forever because of some misguided intention to preserve conversations for future generations :)

keefle
3 replies
2d14h

Side related question: are there content licenses coming up that are similar in spirit to what the GPL is but targeted at AI training? (E.g. if this piece of content was used in training an AI that was to be used commercially, the AI's weights must be published)

progval
2 replies
2d11h

The argument AI companies make is that LLMs are not derivative works of their inputs, or that training on them is fair use. So, according to them, the input's license does not matter.

postepowanieadm
1 replies
2d11h

Do you have any sources on that? I'm just curious :)

jonathankoren
3 replies
2d20h

OpenAI shouldn't have paid for a license. Just scrape the content. It's fair use.

Now they've paid the shakedown fee, and Stack Overflow has a user riot on its hands, albeit one that's trivial to shut down: send OpenAI the DB backup.

kragen
2 replies
2d20h

fair use applies when there's a copyright infringement to defend against, but the cc license on stackoverflow clearly permits scraping the content

jonathankoren
1 replies
2d20h

Well then, there's even less reason to pay the rent Stack Overflow was wanting.

kragen
0 replies
2d20h

agreed

chmaynard
3 replies
2d19h

Stack Overflow has been assimilated. Resistance is futile. It served a useful purpose but now it's part of the glorious AI universe to come. Rest in peace.

Reason077
2 replies
2d19h

The knowledge that's baked into those LLMs comes from sites like Stack Overflow. Without them, how can the LLMs learn new things?

akira2501
0 replies
2d18h

If it's posted on stack overflow, it's not new, it's merely been published. If this is the bar for LLM "learning" then they are doomed to live in a hazy bubble of the recent past.

VelesDude
0 replies
2d18h

That is the big question with LLMs. How can we tell whether what is being fed in is original content or just the output fed back in like a recursive fractal?

a2128
3 replies
2d14h

I find it deeply troubling that platforms are becoming so hostile that users are having to strike against the owners by mass deleting their content. And then the platforms handle this by simply undeleting the content and banning them from continuing to delete any more (StackOverflow, Reddit)

This may also be legally dubious in Europe as, while the authors may have granted copy rights to the platform owners, they still maintain their moral rights which may apply in this case (IANAL)

iamkonstantin
2 replies
2d11h

I just submitted a request to have all my content removed from SO and will challenge the outcome if needed. My right to be forgotten and have my content deleted supersedes SO’s dubious, “nobody really reads these” terms and conditions.

popcorncowboy
1 replies
2d10h

Great idea. Have done the same.

MissTake
0 replies
2d7h

I’m not convinced yet that GDPR covers things.

GDPR is focused on PII and anything that can identify you directly or indirectly as an individual.

If your posts start with “My name is Jane Doe and this is my post” then that would be one thing.

However from everything I’ve been able to ascertain, your average forum post is unlikely to be covered, save from anonymizing the email address etc.

Granted there are certain things that can lead to deanonymization, in the case of forum posts I’m unsure how far that goes.

It’s a myth that the GDPR “right to be forgotten” is absolute.

I also suspect that creating anonymized IDs and attributing that ID to each thread would be enough to get past such attempts to link posts to PII.
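The anonymized-ID idea in that last paragraph could be sketched with a keyed hash: each user gets a stable, opaque ID, so their posts remain linked to each other but not to their email or other PII. This is purely illustrative (the key name and scheme are assumptions, not anything SO actually does):

```python
import hashlib
import hmac
import os

# Site-side secret; without it, an outsider cannot recompute the
# mapping from email to pseudonym (HMAC, not a plain hash, so the
# IDs can't be brute-forced from a list of known emails).
SECRET = os.environ.get("PSEUDONYM_KEY", "demo-key-change-me").encode()

def pseudonym(email: str) -> str:
    # Normalize first so "A@B.com" and " a@b.com " map to the same ID.
    normalized = email.strip().lower().encode()
    return hmac.new(SECRET, normalized, hashlib.sha256).hexdigest()[:12]
```

Whether such pseudonymization is enough under GDPR is exactly the open question here, since the operator who holds the key can still re-identify users.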

SpaghettiX
3 replies
2d19h

I still use Stack Overflow. Not as much as I used to, thanks to GPT, but still multiple times a day. What I find is that I spend less time on SO.

However, IMHO deleting questions you originally wrote in the past is hurting other users more than it is hurting AI training.

Other users cannot write similar answers to yours, because it doesn't add anything and they'd get downvoted or deleted. So if you hadn't written your answer years ago, others could've written something similar. Also, other users may have commented on your questions/answers. Their efforts would be lost/deleted if you deleted your questions/answers.

Thanks for your previous contribution to the community. But I would say the worst you should be able to do is remove your name/anonymise your posts, not just delete them.

jncfhnb
1 replies
2d17h

Multiple times a day sounds like a massive amount to me…?

__MatrixMan__
0 replies
2d13h

Looking through my browser history, I'd say that I average about 5 distinct SO posts per day. If you know there will be an answer it's less typing to search for it than it is to have ChatGPT regenerate it.

Ekaros
0 replies
2d11h

I wonder whether actually deleting questions would be a good thing. If there is no old question, the same question asked again cannot possibly be a duplicate... So a constant loop of deleting questions might actually be an effective way to fix some problems. And there are enough off-site backups already.

Karellen
3 replies
2d19h

Users are also asking why ChatGPT could not simply share the source of the answers it will dispense in this new partnership, both citing its sources and adding credibility to the tool. Of course, this would reveal how the sausage of LLMs is made

What? Surely the answer to that question is that ChatGPT doesn't know where the source of its answers is, isn't it? Isn't the question itself based on a fundamental misunderstanding of how LLMs work?

jarsin
1 replies
2d16h

I haven't used it extensively, but when I ask a generic coding question in Brave it gives me an AI response, and it does list source websites. Not sure if those are the actual sources or it's just pulling them from a search or what.

bschmidt1
0 replies
2d15h

Could always perform a normal web search with the LLM result and show matches
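A toy version of that idea: compare the LLM's answer against a local corpus of known posts and surface the closest matches. Here `difflib` stands in for a real exact-phrase web search, which would need a search API (the corpus and URLs below are made up):

```python
import difflib

def likely_sources(llm_answer, corpus, cutoff=0.5):
    # corpus: {url: post_text}. Rank posts by rough textual similarity
    # to the LLM output; a production system would issue quoted web
    # searches instead of this local comparison.
    scored = [
        (url, difflib.SequenceMatcher(None, llm_answer, text).ratio())
        for url, text in corpus.items()
    ]
    return sorted(
        ((url, s) for url, s in scored if s >= cutoff),
        key=lambda pair: -pair[1],
    )

corpus = {
    "so.example/q/1": "Use str.join to concatenate a list of strings.",
    "so.example/q/2": "Regular expressions are compiled with re.compile.",
}
print(likely_sources("You can use str.join to concatenate strings.", corpus))
```

Of course, showing a post the answer merely resembles is weaker than true provenance, which is the limitation the parent comments are circling.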

squigglydonut
0 replies
2d1h

It "doesn't know" ;) nice.

sva_
2 replies
2d19h

It sounds like the measure of preventing users from deleting/editing their posts contradicts EU laws?

zarzavat
1 replies
2d12h

Only if they are putting their own personal information in their answers, which I assume they are not.

squigglydonut
0 replies
2d1h

Human answers are personal answers.

s1k3s
2 replies
2d17h

Haha, what's that gonna do? Ever heard of soft delete? It's a thing where, even if you delete something off a website, the database still retains that information even though it becomes inaccessible to the public.

Everything we write on the web is like that, including this very comment.
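The "soft delete" pattern described above is usually just a flag column: "deleting" stamps the row, and public queries filter it out. A minimal SQLite sketch (table and column names are illustrative):

```python
import sqlite3
import time

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE answers (
        id INTEGER PRIMARY KEY,
        body TEXT NOT NULL,
        deleted_at REAL  -- NULL means the row is still visible
    )
""")
conn.execute("INSERT INTO answers (body) VALUES ('original answer')")

# "Deleting" just stamps the row; nothing is physically removed.
conn.execute("UPDATE answers SET deleted_at = ? WHERE id = 1", (time.time(),))

# Public queries filter the flag out...
visible = conn.execute(
    "SELECT body FROM answers WHERE deleted_at IS NULL"
).fetchall()

# ...but the operator can still read everything.
retained = conn.execute("SELECT body FROM answers").fetchall()

print(visible)   # []
print(retained)  # [('original answer',)]
```

Which is why mass-deletion protests only remove content from public view at best; the operator (and anyone who already exported the data) keeps a copy.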

vsuperpower2020
0 replies
2d16h

It lowers the value of the site if people stop answering and sabotage their existing answers in protest. Does that make sense? Do you understand?

notatoad
0 replies
2d17h

even if it were a hard delete, do these people think OpenAI is scraping the live version of the site?

the answers have already been exported. all you're doing by deleting it is ensuring it's only available on ChatGPT, and no longer available to web users who aren't using AI tools that ingested the content before it was deleted.

astrodust
2 replies
2d19h

It is nearly impossible to delete an accepted answer you don't want to have any more. I've had several which are wildly out of date and incorrect now and I don't want to update them, but the mods refuse to remove them.

leumon
0 replies
2d19h

You can request to dissociate them from your account. (Better than nothing.)

jazzyjackson
0 replies
2d19h

can't you just comment on the post informing people who land there that it's out of date? i'd prefer that over following a cached link and hitting a 404

stainablesteel
1 replies
2d19h

SO has been doing the absolute worst things to squander their amazing lead for years

i haven't used that website since GPT came out, and now i contribute nothing to it

but i'm glad all of its content ended up training the models that put it out of business, thanks SO! you'll never be anything other than user contributions

lannisterstark
0 replies
2d13h

As dour as it sounds, I am in a similar boat. Who'd have thought that needlessly getting called names when you ask a question (even if it's dumb, as that's how you learn) makes people less likely to interact with you.

what's amusing to me is that some people even in this thread are calling it a pro, not a con. I guess our field does indeed attract a certain kind of personality.

rich_sasha
1 replies
2d6h

I guess the core issue was always having a for-profit company preside over a "free" product. Clearly, they have to make money, and they aren't bound by the ethics of open source. Contributors may feel like they are contributing to a FOSS project, but they aren't. What Stack Exchange is doing is probably legal (?), and that's the bar they need to clear. The contributors aren't stakeholders, and SE only needs to retain enough of them to sustain itself commercially.

There's been more than a decade of companies now providing something for free while they figure out how to monetize it, and these always scare me a little, because it's always going to end up like this: users of Facebook becoming eyeballs for ads, GitHub users providing free data for LLMs, SE selling data to OpenAI...

If a product is free, then you are the product. And if you don't know how you are monetized, you're going to be disappointed by it sooner or later.

squigglydonut
0 replies
2d1h

Harsh but true. I think what stings about SO is that developers are the ones losing here. I think this will prompt less open source and encourage more private work. I hope people are seeing that they are being taken advantage of on many fronts.

juleiie
1 replies
2d4h

It’s probably time for a pro bono publico Stack Overflow alternative. When money is involved, those things tend to destroy themselves sooner or later.

And why should SO even profit from the hard work of thousands of volunteers? It doesn’t seem very ethical.

squigglydonut
0 replies
2d1h

Given how the industry has treated tech workers, this will be exploited. I'm interested in joining a private group with or without profit motive, that is not open source.

jarsin
1 replies
2d16h

Does anyone know if ChatGPT etc. could code without Stack Overflow answers?

I think that is the big question, because the license seems like it's going to give lawyers a very wide attack surface to go after every AI coder out there if they all need the SO database.

tintor
0 replies
2d16h

There are plenty of code repositories and bug trackers on the web for ChatGPT to learn from: OSS, GitHub, SourceForge, ...

highwayman47
1 replies
2d15h

Why? They don’t want the knowledge to be more accessible to the masses all of a sudden? Also, all those answers are backed up somewhere.

paulryanrogers
0 replies
2d15h

Perhaps they don't want their answers to be systematically leveraged to put them out of work.

Though considering the site terms and CC license, I don't think deleting will actually help much.

Qem
1 replies
2d16h

Wonder if the Wikimedia Foundation couldn't just take the opportunity, now that Stack Overflow is alienating its userbase, to launch a rival Q&A site. I was always puzzled why they never attempted to enter this space, even before Stack Overflow, given their prior experience in crowdsourced information commons.

YPPH
0 replies
2d15h

Pragmatically, the software powering most of their properties, MediaWiki, is not suited for it. It's hard to see them investing in the development of a new platform given the uncertainty of success.

HappyPanacea
1 replies
2d20h

I suspect they will fail to emphasize the ShareAlike property of CC BY-SA 2.5/3.0/4.0 which is incredibly strong - "ShareAlike — If you remix, transform, or build upon the material, you must distribute your contributions under the same license as the original". This is an incredibly wide and vague definition, especially "build upon" which will be unattractive to many users.

nox101
0 replies
2d13h

I suspect that if ChatGPT quotes an answer or a snippet, it will show attribution and a license for the snippet. If it instead only uses the knowledge it gained from the answer/snippet and writes its own answer, then, just like a human, it won't attribute.

FireBeyond
1 replies
2d20h

This article is extremely biased towards SO:

Stack Overflow and OpenAI have joined forces through a new API partnership. This collaboration aims to provide developers with a powerful combination of Stack Overflow’s vast knowledge platform and OpenAI’s advanced AI models. Through the OverflowAPI access, OpenAI users will benefit from accurate and verified data from Stack Overflow, facilitating quicker problem-solving and enabling technologists to focus on priority tasks. Additionally, OpenAI will integrate validated technical knowledge from Stack Overflow into ChatGPT, enhancing users’ access to reliable information and code.

Come on. Was this taken from a Press Release?

it can be disruptive to the entire community to delete or remove content that might be useful to someone else. Even if this content is no longer useful to you as the author. [sic]

As for the rest of us Stack Overflow users, I would not recommend jumping to delete your own content in protest too.

To be fair to Stack Overflow, the warning email and suspending of accounts is likely not a new thing.

I can't find a negative word about SO in this entire article, so "to be fair" doesn't seem meaningful.

nicklecompte
0 replies
2d17h

If you check the byline, the author is a Microsoft MVP / product evangelist. So I don't think he's biased towards SO so much as he is biased towards anyone doing business with Microsoft (or OpenAI). He also seems very pro-GitHub Copilot.

yungporko
0 replies
1d10h

stackoverflow users being dicks and preventing people from gaining knowledge? who could have ever seen this coming?

yesiamyourdad
0 replies
2d16h

I have very few SO contributions, so I don't have much at stake personally, but I have observed a trend of people using their SO profiles for career advancement. I'd see people reference their SO activity on resumes, job applications would ask for my SO profile if I had one, and I've seen advice that a good SO profile is valuable the way a good GitHub profile is. Is that something people factor into their decision to delete? And isn't that social capital a kind of compensation for their contributions?

witoong623
0 replies
2d12h

Thanks to the people who delete their answers, now I have to pay OpenAI to find answers it already scraped. Talk about helping OpenAI make more money :(

squigglydonut
0 replies
2d1h

Your knowledge work is being exploited. If you don't allow OpenAI to train its subscription product on your open source contributions, you will get banned.

m463
0 replies
2d11h

What I don't understand is: in CC BY-SA, SA means ShareAlike. Does OpenAI have to share their models?

kshaibani
0 replies
1d20h

I code daily, and I don't remember the last time I used Stack Overflow in the past 2 years.

jaimehrubiks
0 replies
2d20h

It's sad :(

I preferred the old days better

firecall
0 replies
2d19h

I'm surprised OpenAI hasn't just crawled all of SO already?

eli
0 replies
2d20h

This seems like a really bad way to handle what should have been a foreseeable problem.

bdjsiqoocwk
0 replies
2d18h

No empathy for users who spend their time contributing to a corporate walled garden and who, to top it off, get emotionally invested in it.

Ukv
0 replies
2d20h

From a search, the message seems to have been in place since at least 2017[0] and I'd suspect is automated on detection of mass-deletion.

I can understand the reason for the policy (in some ways SO functions more like a wiki than a forum) and it doesn't seem to have been introduced to quell the protest against OpenAI.

[0]: https://meta.stackexchange.com/a/296822/287788

Bobo-hilife
0 replies
2d20h

Resistance is futile!

Beldin
0 replies
2d12h

Hmmm. While I can definitely see SO's arguments concerning deletion, that letter seems to blatantly contradict the GDPR's right to be forgotten, which Wikipedia describes as a more limited "right to data erasure" [1].

To borrow a Dutch phrase: I cannot make chocolate out of that. Anyone here have an idea how to reconcile these two points? Other than the obvious "with respect to EU inhabitants, SO is lying", that is. Or is it really that simple?

[1] https://en.m.wikipedia.org/wiki/Right_to_be_forgotten - under "European Union"