It's interesting to me the ambiguous attitude people have to reproducing news content. Whenever there is a story from NYT on HN (or any other large media outlet), the top comment is almost always a link to an archived version which reproduces the text verbatim.
And this seems to be tolerated as the norm. And yet, whenever there is a submission about a book, a TV show, a movie, a video game, an album, a comic book, or any other form of IP, it is in fact very much _not_ the norm for the top-rated comment to be a Pirate Bay link.
I think that's something worth reflecting on, about why we feel it's OK to pirate news articles, but not other IP.
And the reason I bring this up is that it seems like OpenAI has the same attitude: scraping news articles is OK, or at worst a gray area, but what if they were also scraping, for example, Netflix content to use as part of their training set?
To me, there is a sense that the news, which is real information about the society that we currently live in, should be available to all participants of that society. The notion of being a good citizen requires that one stays informed. Books, movies, videogames etc. don't have that role and are more consumption goods.
Who pays?
The government (thus the people, in a so called sharing of public burden)!
For example, in Hungary there is an official news agency run by the government, with (cumbersome) free access for everybody. Of course this does provide a somewhat biased presentation of some facts, but on many topics it provides unbiased access to news for any citizen.
This is actually pretty common in Europe, often funded by mandatory fees (for some reason not branded as taxes) certain appliance owners need to pay (UK TV license, German Rundfunkbeitrag). For this fee people get access to news and cultural programmes for free via different media (radio, TV, internet).
I agree with your general point but Hungary is probably the worst example you could have chosen from any EU country! The Orbán government is famously using it to spread propaganda and fake information at unprecedented levels.
The level of control governments exert on public broadcasting networks varies widely. Since Meloni, the RAI in Italy is facing similar issues, but Hungary is still the canonical example of government misinformation and propaganda.
That is orthogonal to the discussion we were having. The topic was whether people should have free access to news, and how it should be financed, not the quality of that news.
People have free access to public roads all around the world, and the quality wildly differs in that as well. Also the quality of for-profit news services does differ wildly, you might have an opinion about that of Fox News, for example, but that is also off topic in this discussion.
On the contrary, the quality of the news is very important to the discussion. There is no point in making trash freely available to the public, after all.
The topic is a bit more nuanced, and far wider than "not fitting my favourite narrative on some topics, so it is generally and objectively trash".
Think about this: I will get mostly objective and useful reports of the flood approaching my home near the river regardless of the narrative or interpretation they might have on some other topics, or the biased reporting on the merits of the government in handling the situation at the dams.
I'm not here to debate the policies of particular governments; I just gave a few examples of ways to fund public access to news. This discussion is over on my part.
No it isn't an orthogonal discussion. The reason Orban wants people to have free access to his propaganda is because it directly serves his purpose. To finance it directly from sales of the media would defeat the purpose. Coupled with Orban's attack on free media it completes the picture.
I would argue the people of Hungary would be better off without hatred against asylum seekers and minorities, political opponents, lies and misinformation.
Every news source has biases. Under the paywall business model, the people who share the biases of their favored news outlets pay for them, and in exchange, they get to ensconce themselves inside a bubble free of dissenting viewpoints. This also reinforces the bias of the news outlet; if they don’t toe the line, they will lose subscribers.
Instead of paying news outlets to provide ourselves with filtered feeds of content that match our own biases, we could instead pay news outlets to produce competing streams of explicit propaganda to be freely disseminated. The overall bias and quality of the news would be largely unchanged, even if the biases were more obvious; in fact, it may even improve.
Yes, someone needs to pay.
I see the gp post about pirating news as a very good point, while having no velleity to pay the New York Times, and being ok with not reading it in general.
But I also pay for my national (public) news outlet, and their articles are available to anyone anywhere in the world. I don't know how it should work, but I wish we could get to a system where the burden of keeping news outlets alive is split thinly enough to have open but viable publications around the world.
Basically the same way weather stations collaborate all over the world and we pay for our local stations while getting access to all the forecasts everywhere.
Everyone, if you don’t…
There's a few possible models here:
Public donors à la Patreon
People doing it in their free time because they care a lot about the subject (nowadays with things like Twitter it's quite possible for an independent obsessive to write a good piece on, for instance, the Ukraine War by mostly referring to open sources and public announcements by governments and corporations)
Government sponsorship à la BBC
It’s a difficult problem with no great answers. If you want news to be free at the point of delivery you want public service news agencies. But that means they’re owned by the government… who are frequently the target of critical reporting.
That's not true. You can have independent public broadcasting that is not owned by the government and reports critically on it.
It’s still a difficult tension. The government will always control the purse strings so independence is always going to come with conditions.
The Guardian in the UK is an example of an alternative: It is owned by a trust, which funds it.
Norway has substantial public media funding across the political spectrum, but as you point out it always comes with conditions, even if less so than the funding for the state-owned broadcaster.
Combining the two models and putting public funds into several perpetual trusts intended to provide funding from their profits at arm's length from any sitting government, similar to the (private) trust funding The Guardian, might be an interesting alternative.
(EDIT: Norway also has its own variation on The Guardian model - the second largest media group was founded by unions but is now majority-owned by the combination of two public benefit trusts)
I'm of a similar mind. I take the more expansive view that everything created is part of our common property and that something like an LLM should be able to yield the summary and references to those creations. As I've said elsewhere, LLM systems might be our first practical example of an infinite number of monkeys typing and recreating Shakespeare (or the New York Times).
I understand that copyrights and patents are vehicles for ensuring a creator gets paid for their work, but they are flawed in not rewarding multiple parallel creations and that they last too long.
An LLM is just a hugely lossy-compressed version of its training data, an abstraction of it.
Much in the same way as when you read a book, your brain doesn't become a pirated copy of the text as you only store a hugely compressed version of it afterwards, a feeling for the plot, generated images and so on.
That's what I thought from my various readings about LLM systems. I'm guessing that the kerfuffle from the New York Times and other shortsighted organizations is that copyright allows them to control how their content is used. With humans, it's simple, as it's read and misremembered. Using it for LLM training requires a different model. It probably should be a RAND fee system based on volume of training data because, as you say, the training data is converted into an abstract form.
I agree, but nothing worth having is free. NYT and other news outlets have to ultimately pay reporters to go out into the world and do the work. The reporters are not priests, and the NYT is not a church that lives off donations and tax exemptions. They need money to operate, and you may disagree with how they try to collect that money (paywall) but that doesn't solve their funding problem.
How would you pay for news otherwise?
You could subsidise news via "public service" style stipends. Much like having a government owned "independent" news service (eg the BBC) this comes with a high risk of corruption. Don't bite the hand that feeds and all that.
You could implement a much lower friction non-recurring payment system. I'd be far more tempted to drop a little money on a fixed term (5 articles, 1 day, ???) setup than a subscription.
Realistically, I am not paying for more than 1 long running sub. And there are > that number of solid outlets.
This is somewhat how Apple News+ works, but I doubt most news orgs want to be held captive by Apple.
Who should pay the journalists or the investigative reporters?
The state, through taxes. It's a public good after all.
People post archive links even to fake NY Times.
Not everything is news that appears in a newspaper. There are opinion pieces, etc.
What about Wordle or the crossword or the cooking section?
https://cooking.nytimes.com/
Good comment, it was very funny to see how people desperately try to find moral justification for pirating media A but not B. "It's apples to oranges, you see, there are less letters in the NYT article than in the book and they are rendered differently, so it is ok to pirate their work. I did nothing wrong!" :)
There's no way to get your money back if you didn't like the content. If they don't want their articles to be read for free then they should keep them out of my view. And certainly not use clickbaity headlines. Information can be copied and they should accept it, or change their business/distribution model.
So if I went to a cinema and didn't like the movie, I should be entitled to a refund, right? Or if I went into a museum and didn't like the art displayed there?
If you are advocating for a free for all libertarian dystopia, well, I have some bad news for you - they never work.
Not being able to un-see a movie and get your time and money back is one side of the coin. The other side is that information can be copied.
Both sides suck for one of the parties. There's no reason why one of them gets it their way, especially if it requires a contrived legal framework while the other way would require nothing at all.
You’re not paying to enjoy the content, you’re paying to experience the content.
And as long as you had the opportunity to experience the content, you’ve gotten what you paid for.
I don’t see “I don’t like it” as a valid reason for a refund.
Not sure about others, but I'm not.
Would you make the same argument for a sporting, theatrical or music event? That you should be refunded if you didn't enjoy it?
Does it matter? Sounds to me like an apples and oranges comparison.
If I read an article in the NYT then I'm paying for what I took away from it, not for the amount of time that it allowed me to kill.
Your personal opinion on the matter has little weight here.
It doesn't matter what you think you're paying for or should be paying for, the fact of the matter is that you're paying for the effort people put in bringing that to you. So you are, whether you want to be or not.
I don't agree with the OP but how are refunds a free for all libertarian dystopia?
"Information can be copied and they should accept it" <- I was referring to this line. This basically means that OP thinks that any intellectual property should be free for everyone. This means that probably half of humanity (who are currently creating anything with IP) will have to be libertarians, and that can't happen unless all humanity are libertarians. And libertarian society is a dystopia. :)
It is actually a question of pirating content by companies for humongous profit versus pirating by individual human beings for free access to culture and entertainment, oftentimes for content one has already paid for, but rendered inaccessible by megacorporations.
Which content-making businesses earn humongous profit margins?
Are all the journalist layoffs a fever dream?
This is one of the more profitable ones, and only because they employ unscrupulous tactics:
https://www.macrotrends.net/stocks/charts/NWS/news/profit-ma...
This is NYT, the most successful news business:
https://www.macrotrends.net/stocks/charts/NYT/new-york-times...
As for movies/tv show/music makers, let’s just say most people in the software engineering business would look at their numbers and count their lucky stars that they are not in the movie/tv show/music business.
(It is also true that excessive copyright lengths have removed access to content that the public should have a right to).
The movie/tv show and music business can keel over and die tomorrow - it wouldn’t affect the value of art produced by humans at all. I see those more as exploitative leeches than as contributing anything positive.
If only piracy would actually harm these businesses but alas as often demonstrated it has zero effect on their bottom line, if anything it increases their profits.
What do you mean by "art"?
Hard question, but in the context of my comment I would say any kind of visual media or music
You got my point backwards: AI companies will make a profit from the pirated content, which individual users don't.
https://en.wikipedia.org/wiki/Mad_(magazine)
https://www.theonion.com/
I wonder what the reaction of some of the people who browse this forum would be if the output of their careers were so commonly pirated. Somehow, I think most think that this argument doesn't apply.
I’d be pretty delighted. I’m paid for getting projects done, not for keeping hold on some copyrighted code. I want all my code to be open sourced, and reused.
Of course pirating any media is totally fine from a moral standpoint.
It seems pretty natural to me. People generally have less problem with stealing a candy bar than stealing a car. (Consider the cost to produce a NYT article vs the cost to produce a Hollywood movie). I don't think the stealing-vs-pirating analogy is perfect, but it's related.
As you noted it is not the norm to post pirate links here for IP other than news articles, but that doesn't mean that a lot of people think it is not OK to pirate those other forms of IP.
In nearly any big discussion that even remotely involves video streaming there will be numerous posts from people explaining why they pirate (usually with ridiculous justifications like "subscribing is not an option because even though this paid service does exactly what I want now at a price that is trivial for me they might someday later change").
The impression I've gotten is that piracy of nearly everything is widely felt to be OK here. Information wants to be free, yada yada.
About the only piracy that is consistently frowned upon here is piracy of open source software. When some company sells an embedded device that uses GPL code without releasing the corresponding source that's viewed as just a little short of a crime against humanity.
People used to leave newspapers in the trash, on the train, all over the place. Anyone could pick them up and read for free. I think it's reasonable for folks to carry this attitude into the digital age. People feel like news is something to share, it's not the source of creative expression, it's facts and as such we feel entitled to know the facts about our world and what is happening that might affect us.
That newspaper was likely paid for by someone, and could only be read by one person at a time.
And what if the person picking up the paper would stand up and shout the content of the article so all the people on the train would hear?
Reminds me of the movie News of the World. The main character's job is going from town to town, reading newspapers aloud.
While I'm well aware I'm being pedantic, me and my brothers would share the comics together while my parents kept the news, up to 4 of us consuming 1 paper at a time. Realistically, the reading limit was due to the physical properties of the object and not an inherent property of information to be consumed through one avenue at a time
No, it isn’t reasonable, and people no longer paying for the newspapers they read is the reason all news is sensationalist opinion pieces today.
This seems very false to me. Spotify is the prime example. They offer a good product that covers 100% of my needs at a reasonable price. If that were an option for, say, UFC or engineering books, you bet I’d be subscribed. But being forced to read through some crappy reader software when I need the book source to take annotations in other software doesn’t work, so here we are. Same with the absurd pay-per-view business model of UFC.
For books, if it's a client reader software frustration, then you should still buy the digital version and then you can pirate the PDF book and use as desired within the constraints of copyright law (e.g. don't go sharing the PDF). That way you get the client you want but you still paid the content creator. But to use the argument, "oh, I don't like their client so I'm going to not pay them" is BS.
For UFC, your complaint is you don't like their pricing. The whole point of copyright is to give someone the monopoly to control pricing so they can use that pricing power to incentivize them to create the product in the first place. Similarly to patents. Thus, complain about the format things are delivered in all you want (like the client) but pricing is inherent to copyright or patents for good reason. You are now just arguing that you as a consumer should be able to pirate if you don't agree with pricing. And that's ludicrous.
In that case, just read a news article about the event. Copyright doesn't cover facts, only creative expression. So a news article covering the facts of the UFC fight can be published without the consent of the copyright holder. Think of the digital video of the fight almost like buying a ticket to the fight. You're saying you should just be able to sneak into the fight and watch it for free without any justification for your doing so.
Finally, you can also watch other people's videos of the fight that THEY recorded on social media as other sources of the fight information. But if you want the recording with all the right angles, coverage, etc, it clearly has value to you over written recaps or social media coverage. And you are just arguing over price, which they, as the copyright holder, have the right to set.
The problem with buying the crappy DRM version is that it provides no incentive to the publisher to change. I have thought about this long and hard, but ultimately the only way Spotify came about was because nobody bought the terrible DRM’d music the labels wanted to foist on us. We need to inflict the same pain for books. Personally, I think it would be preferable to donate the same amount to the Books Trust or your local library.
This is also along the lines of how I think about things. If you make it convenient enough (compared to the alternative of paywall bypass or piracy) and provide enough overall/general value then I'm happy to subscribe. At the point where the experience degrades, or seems beyond the point of what one person could reasonably subscribe to, I basically just give up.
Spotify hits this sweet spot where one subscription delivers almost all the music you'd want to listen to. Steam hits this for games, where a couple of clicks can launch and play almost any game with minimal hassle. Netflix mostly used to hit this, but most of the current streaming stuff feels overpriced if you want to get all content (unbundled cable bundle). News kind of feels similar to streaming in that it's unbundled, and there's a lot of interesting content out there, but there's no way I'm subscribing to 15 different newspapers, especially random local ones for cities I don't live in. If there were a news bundle subscription for a reasonable price I think I would pay for it.
Yeah, I don’t judge people for pirating or ad blocking, but the ludicrous justifications do get me - quite the entitled mental gymnastics. They remind me of bitcoin people trying to explain how mining is good for the environment.
There's a "polite society" thing going on.
Briefly, something like:
1) Y Combinator could not tolerate HN becoming a site known for sharing IP-law-violating content. And the people who come here by and large are smart and socialized enough to implicitly understand why.
2) At the same time, a large number of folks here mostly wink and nod at that sort of consumer infringement. And there's a society-wide bias towards "things like news are less protected", so that gets to slide.
3) But people also have a need to tell consistent-seeming stories about how things work, thus the mental gymnastics.
It ends up being similar to trying to explain why people pretend to be prudish innocents about sex. It largely reduces to "a small subset of the population goes sufficiently ballistic about what I consider to be relatively trivial stuff as to make it not worth fighting over, even if I find that to be ridiculous."
There are a lot of different versions of this that become so normalized it can be hard to notice.
I'm not saying you've never seen anyone make an argument roughly like that, but I will certainly say that it is not at all representative of the argument that I see made. Complaints usually have to do with current behavior of the platform or the wider streaming ecosystem.
> In nearly any big discussion that even remotely involves video streaming there will be numerous posts from people explaining why they pirate (usually with ridiculous justifications like "subscribing is not an option because even though this paid service does exactly what I want now at a price that is trivial for me they might someday later change").
If this is true, it should be easy for you to link to an example. Could you do so?
The GPL was specifically written to lock code out of the proprietary realm, so if you hate copyright[0] you'll hate people using it as intended.
[0] To be clear, I know of few who actually like copyright. Tolerate it? Use it as needed? Sure. The only people who actually defend the current broken-ass system are large media companies which are built to optimally exploit it.
Piracy is different from plagiarism.
People are understandably angsty about someone stealing credit. A NYT article is going to be a NYT article, not laundered around and presented as someone else's work.
Plus, there's the angle of enshittification and ads being injected into a paid service, and so on.
I’ve read and participated in many such threads and I’ve literally never seen this take. Often what I see is complaints about having to learn different UI for different services/apps, no offline, ads injected into paid services, having to figure out which service a show is on, and generally terrible UI you can’t change/fix.
I don’t think I’ve ever really seen someone use the argument “yes it’s great today but they might charge more later”. Not saying people haven’t said that but it’s far from the main thing people say in my experience.
Gonna gamble and call bullshit on this.
My speculation: the most popular reason HN'ers give for pirating: they literally cannot get the content otherwise.
2nd most popular: it is such a pain either to purchase the content or to get it to run on bog standard software (like Firefox/Linux/etc.) that otherwise paying fans are driven to whatever the current equivalent is for bittorrent.
In fact, I don't believe I've ever seen a justification for using bittorrent or whatever due to what someone's favorite streaming service might do in the future. I'm assuming you saw at least one based on what you wrote-- care to give a link?
Like what you said...
I wouldn't say OpenAI has exactly the same attitude, since they also pulled in thousands of books. Their position has been that it's not piracy, since they don't republish the books; effectively the AI just reads them and learns from them. If GPT can be made to reproduce the original articles, that's a more difficult argument to make.
It turns out you can reproduce articles with next-token prediction when the articles are quoted all over the dataset.
The articles themselves are indisputably not a part of the model, because it doesn't store text at all. OpenAI's position is correct; people just underestimated how well the AI learns from reading, especially when it reads the same text in a bunch of different places because it's being quoted/excerpted.
If it can and does reproduce a piece of text verbatim then the text is indisputably stored somehow in the model.
That's just not true. There's no search and retrieval involved. It just associates the words so strongly in that context because they were in the training data so often that next-token prediction can (sometimes, in some limited circumstances) reproduce chunks of it. It's like if a human had read pieces of an article so many times and knew NYT style so well that they could spit out chunks of an article verbatim, but using more efficient hardware and with no actual self-understanding of what it's doing.
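To make the mechanism being argued about concrete, here is a minimal sketch using a toy word-level n-gram counter. This is an assumption-laden stand-in, not how GPT works internally: the corpus, the sentence, and the function names are all made up for illustration. It only shows that a table of "which word tends to follow which" can emit a heavily quoted sentence verbatim under greedy decoding, even though the sentence is never kept as a single string.

```python
from collections import Counter, defaultdict

def train(corpus, order=2):
    """Count which word follows each context of `order` consecutive words."""
    counts = defaultdict(Counter)
    for doc in corpus:
        words = doc.split()
        for i in range(len(words) - order):
            context = tuple(words[i:i + order])
            counts[context][words[i + order]] += 1
    return counts

def generate(counts, prompt, max_words=20, order=2):
    """Greedy decoding: always append the most frequent next word."""
    words = prompt.split()
    for _ in range(max_words):
        followers = counts.get(tuple(words[-order:]))
        if not followers:
            break  # no known continuation for this context
        words.append(followers.most_common(1)[0][0])
    return " ".join(words)

# Hypothetical "article" sentence, quoted many times across a made-up corpus.
quoted = "the oldest dna ever found reveals a lost arctic world"
corpus = [quoted] * 50 + [
    "the oldest dna sample was contaminated",
    "scientists say the oldest tree is in california",
    "a lost arctic expedition was found last year",
]

model = train(corpus)
output = generate(model, "the oldest")
print(output == quoted)
```

Whether a table like `model` counts as "storing the text" is exactly what the two sides here disagree on; the sketch illustrates the mechanism and does not settle that question.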
So it stores the words, and it stores the links between those words...
but somehow storing the words and their links is not storing the actual text? What is text but words and their links?
If I had a database of a billion words, and I had a list of pointers to words in a particular order, and following that list of pointers reproduces a copyrighted text exactly, isn't the list of pointers + the database of words just an obfuscated recreation of that copyrighted work?
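The thought experiment above can be written out in a few lines. This is only a sketch of the analogy as stated (a word database plus an ordered pointer list); the sample sentence is a hypothetical stand-in, and nothing here claims an LLM is structured this way.

```python
# Hypothetical stand-in for a copyrighted sentence.
text = "it was the best of times it was the worst of times"

# The "database of words": a sorted vocabulary with no ordering information.
vocabulary = sorted(set(text.split()))

# The "list of pointers": indices into the vocabulary, in reading order.
pointers = [vocabulary.index(word) for word in text.split()]

# Following the pointers through the database reproduces the text exactly.
reconstructed = " ".join(vocabulary[i] for i in pointers)

print(vocabulary)
print(pointers)
print(reconstructed == text)
```

Neither `vocabulary` nor `pointers` contains the sentence on its own, yet together they are a lossless encoding of it, which is the point the analogy is making.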
It doesn't store the actual links; it just stores information about their likelihood of being used together. So for things that are regularly quoted in the data, it will under some circumstances, with very careful prompting, and enough tries at the prompt, spit out chunks of a copyrighted text. This is not its purpose, and it's not trying to do this, but users can carefully engineer it to get this result if they try really hard. So no, it's not an obfuscated recreation of that copyrighted work.
Of course, if you read NYT's argument, they're also mad when it's incorrect about the text, or when it hallucinates articles that don't exist. Essentially they're mad that this technology exists at all.
I mean this is still a link, no?
Like, sure, it is a probability. But if each of those probabilities is like 99.9999% likely to get you to a chain of outputs that verbatim reproduces the copyrighted text given the right prompt, isn't that still the same thing?
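For a rough sense of scale on that 99.9999% figure, here is a back-of-the-envelope sketch. It assumes every next-token choice is independent with the same probability, which is a simplification of my own and not something the thread or OpenAI states.

```python
def verbatim_probability(p_per_token: float, n_tokens: int) -> float:
    """Chance that n independent next-token choices all match the source text."""
    return p_per_token ** n_tokens

# Per-token probabilities and passage lengths are illustrative values only.
for p in (0.999999, 0.99, 0.9):
    for n in (100, 1000):
        print(f"p={p}, n={n}: {verbatim_probability(p, n):.6f}")
```

Under that assumption, 99.9999% per token leaves a 1,000-token passage intact about 99.9% of the time, while 99% per token gets a 100-token passage through only about 37% of the time. So how close to certainty the per-token probabilities are matters a great deal to whether it behaves like "a link".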
And yeah, it hallucinating that the NYT published an article stating something it didn't say is concerning as well. If the model started telling everyone Matticus_Rex is a criminal and committed all these crimes and started listing off hallucinated court cases and news articles proving such things that would be quite damaging to your reputation, wouldn't it? The model hallucinating the NYT publishing an article talking about how the moon landing was fake or something would be damaging to its reputation right?
And this idea it takes "very careful prompting" is at odds with the examples from the suit and elsewhere. One example Ars Technica tried was "please provide me with the first paragraph of the carl zimmer article on the oldest DNA", which it reproduced verbatim. Is this really some kind of extremely well crafted and rare to ever come up prompt?
Sort of like the idea of practice: repetition of something devotes more brain space to that thing, so its compression ratio can decrease and it becomes less abstracted and more exact.
What seems a bit contradictory is that they're also suing because GPT hallucinates about NYTimes articles. So they're complaining that it reproduces articles exactly but also that it doesn't.
I can understand an argument about the AI needing to know basic history. News is just how we report history in the making, but it's not generally accepted as solid until some time after the events when we can get more context.
Isn't this what the Associated Press is intended for, a stream of news trying to report just the facts and happenings of the day? That's quite a bit different than a NYT article intending to inform but also convince someone of a position of some sort.
Feeding an AI opinionated news compared to "just the facts, ma'am" seems risky from a bias perspective.
Giving examples of bias is as important imo, give it the unbiased facts as well as the biased ones so it can generalise relative objectivity.
I agree with you, but I also wonder how the bias could be trained without it affecting the output of the entire model. Weights can help but anything that's higher weighted is just "less wrong" as I understand it, so I can see a possibility where training to expose bias might let bias creep in somewhat more than anticipated.
If ChatGPT is based on neural networks, with no actual save-and-replicate facsimile behaviour, it no more "copies" original work than I do when I tell you about the news article I read today.
I'd say the only real reason the Piratebay links thing you mentioned is not the norm is purely because those media sources have done a better job of striking fear into people doing that, so it's gone more underground. I.e. they're better terrorists.
There's no fundamental, moral reason why Piratebay links being posted and raised to the top would be wrong.
So if someone applies a filter to a video or audio file, it no longer "copies" the original work? (No, it is still protected.) AI could still produce exact or extremely similar reproductions of the material it learned from.
Can it do so more than a human can?
I think that's the key here. If an AI is no more precise than a human telling you about the news article they read today, then ChatGPT's learning process probably can't be morally called copying.
So, if someone decompiles a program and compiles it again, it would look different. "It is not copying", we just did some data laundering.
Feeding someone else's data into your system is usually a violation of copyright, even if you have a very "smart" system that tries to transform and obfuscate the original data.
I'm regularly feeding other people's data into my "system" (brain) in order to produce my outputs.
So I'm a living breathing copyright violator. As a person I should be banned.
Fortunately, copyright is a bullshit fictitious right with no basis in natural law. So I don't lose much sleep over it.
Computers are deterministic. Given the same training inputs, they would produce the same model. The comparison with the brain is incorrect. You could add noise to the input data during training, which would more or less reproduce real learning. Still, it could produce less usable models as a result.
The court could ask to show the training dataset.
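The determinism claim can be illustrated with a tiny training loop. This is a minimal sketch, assuming a seeded pseudo-random generor is the only source of randomness; the model (a one-variable linear fit), the data, and the hyperparameters are all hypothetical.

```python
import random

def train(data, seed, epochs=20, lr=0.01):
    """Fit y = w*x + b by stochastic gradient descent with a seeded RNG."""
    rng = random.Random(seed)
    w, b = rng.uniform(-1, 1), rng.uniform(-1, 1)  # random initial weights
    for _ in range(epochs):
        order = list(data)
        rng.shuffle(order)  # random sample order, reproducible via the seed
        for x, y in order:
            error = (w * x + b) - y
            w -= lr * error * x
            b -= lr * error
    return w, b

# Made-up training set drawn from y = 2x + 1.
data = [(x, 2.0 * x + 1.0) for x in range(-5, 6)]

run_a = train(data, seed=0)
run_b = train(data, seed=0)
run_c = train(data, seed=1)

print(run_a == run_b)  # same data and same seed
print(run_a, run_c)    # a different seed generally lands on different weights
```

Same data plus same seed gives bit-identical weights here. Whether large-scale training runs are reproducible to that degree in practice is a separate question this sketch does not address.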
In some circumstances, yes, but often it's not, especially if you're not continuing to store and use it (which OpenAI isn't).
It's not analogous to a filter, because that's applied to the actual work. The model does not keep the work, so what it does isn't like applying a filter. It's more like being able to reproduce a version of the work from memory and what it learned from that work and others about the techniques involved in crafting it, e.g. art students doing reproductions.
And if OpenAI were selling the reproductions, that would be infringement. But that's not what's happening here. It's selling access to a system that can do countless things.
When you tell people about some news article you read earlier you repeat it exactly verbatim? You also give this out to potentially millions or hundreds of millions of people for commercial purposes?
Copyright law does not care about the means of copying, just that you created something with substantial similarity to something you had access to. Whether or not the copy is in the form of a pixel array, blobs of random data being XORd to produce a full copy of music, or rows in a key/value attention matrix, doesn't matter.
Furthermore, there's Google research on extracting training set data from models. More specifically, Google found out that if you ask GPT to repeat the same word over and over again, forever, it eventually starts printing fully memorized training set data[0]. So it is memorizing stuff, even if it's not regurgitating it.
[0] When told of this, OpenAI's response was to block conversations with large amounts of repeated words in them.
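The XOR example above can be sketched in a few lines, assuming a one-time-pad style split (the function names `split` and `combine` are hypothetical). Each blob on its own looks like random bytes, yet combining them reproduces the original exactly, which is why the storage format says little about whether a copy exists.

```python
import secrets

def split(data: bytes) -> tuple[bytes, bytes]:
    """Split data into two blobs that each look like random noise."""
    pad = secrets.token_bytes(len(data))
    masked = bytes(a ^ b for a, b in zip(data, pad))
    return pad, masked

def combine(pad: bytes, masked: bytes) -> bytes:
    """XOR the two blobs back together to recover the original bytes."""
    return bytes(a ^ b for a, b in zip(pad, masked))

original = b"some copyrighted lyrics"
pad, masked = split(original)
restored = combine(pad, masked)
```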
Possibly because once an article is published the author receives no further payment. In all other mediums, there are residuals and royalties to be paid to the creators of the work.
And add to that the fact that an NYT subscription is hard to cancel. People have an aversion to the NYT, even setting aside the bias.
It took me all of 5 minutes to cancel my digital NYT subscription from the following month onward. No idea what you are talking about.
Why did it take you five minutes instead of twenty seconds? It should be as simple as clicking on the link to your profile then clicking unsubscribe, mere seconds not minutes.
Assuming you just said five minutes figuratively... Do you live in California or some other legal jurisdiction that forces them to play nice? Did you subscribe through some other company, like Apple?
Horror stories about unsubscribing from the NYTimes are easy to find in the archive if you search for it. They make you call and chat to a retention specialist on the phone. This should help you have an idea of what he's talking about: https://hn.algolia.com/?dateRange=all&page=0&prefix=false&qu...
International one, as straightforward as it could be: go to profile, go to manage subscription, cancel subscription, answer the question why if you want, confirm cancellation, done, effective from a date depending on the subscription.
That's only been true for the past few months, and it's been very well documented how complicated the cancelation process used to be [0].
It's funny because I use PayPal for any unknown-to-me site where I don't want to give out my card, but the only site where I've needed their help to cancel something was the New York Times.
[0] https://www.nirandfar.com/cancel-new-york-times/
Articles have ads on them, how are they not residual payments based on views?
I believe GP was referring to payments to the writer, not the publisher.
Yes, although I get that the money may find its way back to the journalist as salary. But it generally goes into a pot for news gathering, from which the salary is drawn.
On ads it's acceptable to distribute them freely and it is advantageous to the company. Can we also see good journalism as an ad for the quality of a broader product?
Largely because "news" aka facts is not and should not be copyrightable, so while the style, and exact format of the article may be copyrightable, the facts contained within are not.
This makes a news story copyright murky in the eyes of wider society unlike a clearly 100% creative work like a TV Show or Movie.
Further, news outlets cannibalize each other: how many stories are just rewrites of stories from other outlets? Why is it OK for the Washington Post to copy the NY Times, but not OK for OpenAI or Archive.org?
Creative works like books, TV shows and movies contain facts too.
None of which are copyrightable, and in fact this has been the subject of DMCA abuse, like when a movie uses NASA footage and copyright is then claimed on YouTube videos with the same footage.
Copyright is a complex subject, and not as vast as many believe; at the same time, ironically, it is more vast than I believe it should be. Copyright should be much more limited than it is, which is at odds with people who believe copyright should be maximized.
Keep in mind that the commercial success of a work, author or company is not why copyright exists. For the US, the only reason copyright can exist in our framework of law (i.e. the Constitution) is for the promotion of the useful sciences. No other purpose for copyright would be constitutional under the US Constitution.
Copyright doesn’t exist solely for the “promotion of useful sciences”. https://en.m.wikipedia.org/wiki/Copyright
Citing Wikipedia you already failed..
That is a general article about copyright worldwide. I specifically stated US copyright, which is authorized by Article I, Section 8, Clause 8 of the United States Constitution[1], implicitly for the promotion of the useful sciences. That is where Congress derives its power to pass copyright laws, and to enforce copyright on the people of the United States. No other purpose is authorized by the US Constitution.
[1] https://www.law.cornell.edu/wex/intellectual_property_clause
You missed "and useful arts" in both of your comments. That's a key addition that you keep omitting.
It is not just for sciences.
> why it is OK for the Washington Post to copy the NY times, but not ok for OpenAI or Archive.org?
If the Washington Post printed an article from the NY Times nearly verbatim and without attribution, it would not be OK and surely they would take legal action.
Yes, because The NY Times is copyrighting the body of work. They are not copyrighting the "facts" themselves but the distillation of these facts into a body of work. Anyone is free to take the facts and produce their own works but not to lift the body of work verbatim that the NY Times created (plagiarize).
If the story was linking directly to the "book, TV show, movie, video game, album, comic book, etc", and the link only worked for some people while others randomly got a login request or similar, you'd also see the top comment being a link to an archived version which avoids the login screen. That is: the main difference is that the archive link has the exact same content as the link submitted in the story, only bypassing the login screen that some people see. And the only reason the archive site has the content is that it didn't get the login screen; if everyone always got the login screen, what you would see on the archive site would be the same login screen.
I don’t believe that is fully correct. The general policy here is that you cannot link to something that is paywalled unless that site plays the game of allowing crawlers but not actual human eyeballs. In the latter case the link is allowable because there are ways around it that the site owners allow.
I don't recall seeing this policy on HN guidelines.
It's on the FAQ https://news.ycombinator.com/newsfaq.html
Okay.
https://www.netflix.com/browse?jbv=81714181
Much of this is incorrect
No, articles are updated as new information comes in, retractions are made, etc. Especially breaking news (the type that would reach the top of HN). The archived versions are outdated.
It's not random, you get a number of free articles before the paywall appears ("soft" paywall).
The paywall is removed entirely for some topics/stories, especially matters of public health (common during the pandemic).
No, it's because they don't block archive crawlers, and prefer people bypassing the paywall and reading news at NYT. Hopefully users find the content valuable, and some of them subscribe as a result.
(opinions are my own)
So, what allows accessing content under IP illegally is not liking the marketing strategy of the content owner?
I find “4nn4’$ 4rch1v3 dot ORG” actually way better than pirate bay for pirating knowledge.
It’s amazing the amount of books that copyright laws prevent us from finding
https://www.theatlantic.com/technology/archive/2012/03/the-m...
Sure. It's just curious to me that news articles have a pirated knowledge link as the de facto top comment, but link submissions to, for example, books for sale on Amazon don't have a link to Anna's Archive or equivalent.
I think the archive of an article is more preservation of history and maintaining records of events which often disappear if not archived. The number of threads referencing articles which are defunct is always increasing. A book or movie or original content on the other hand will continue to hold its own commercial value so reproducing it is more akin to an actual loss for the license holder.
Definitely a grey area when that content is then used to train models though.
I would say 9 times out of 10 it's to get around the paywall and absolutely not some higher moralistic preservation of history.
And everything is a grey area, determining the line is the existential purpose of these court cases.
We've been here before with hyperlinking, then indexing and then linking with previews and the Canadian Facebook stuff but I think this has more standing.
If I buy a book, I get a work of literature. But if I buy a news subscription I get a series of facts riddled with advertisements. I accept the former, but I oppose the latter. I suspect I'm not the only one.
I don't fully understand what you're opposing.
Is it:
1) that you paid for news
2) that it included ads
Both are just the price you pay. There are various state news outlets that you're probably already paying for - npr, pbs, bbc, cncb depending on your region
Historically newspapers leaned more on competition law than copyright, because their pages are supposed to be filled with non-copyrightable facts.[1] Copying part, but not all, of a factual article, significantly after the relevant event, was considered to be a promotion (not unfair competition) and a nice thing to do for the journalists. Things change, people lose sight of the original principles.
[1] https://en.m.wikipedia.org/wiki/International_News_Service_v...
This is rather inaccurate. A fact is Hitler invades Poland. You're right, nobody can copyright this idea, as it is just a fact.
However, if I then write a 500-word article describing the scene of Hitler invading Poland, have short quotes from some civilians there, etc. that particular arrangement of ideas and words is copyright.
AP can't go and sue INS for just reporting the fact Hitler invades Poland, but if INS takes a whole article word for word and reproduces it that's still violation of copyright. The actual printed words of the news always had copyright.
The WSJ can't claim copyright on the markets going up yesterday. They can claim copyright on something like "After the bell rang in the NYSE, the tech industry ticked up 1.2% over last week. Meanwhile the whatever market took a hit of -0.5% ending the quarter slightly lower than our analysis expected. Blah blah blah..." If Investor's Business Daily wrote a different article that also talked about the markets ending up at the end of the day, that's not a violation of copyright. If they literally write "After the bell rang in the NYSE, the tech industry ticked up..." then they're violating WSJ's copyright. This was true before and after International News Service v Associated Press.
Yes, the prose was always under copyright, but the key point for the case linked in the wikipedia article is:
So the case hinged on INS indeed reporting facts that differed in exposition.
These days most news is mixed with analysis [1] (which is often biased). I wonder if part of the reason for this shift is that analysis is copyrightable. It also seems like the number of opinion articles is ever expanding [2], though I don't have any hard numbers on that.
[1]: https://guides.library.cornell.edu/evaluate_news/source_bias
[2]: https://www.newsmediaalliance.org/rise-of-opinion-section/ Interestingly there's a banner at the top of that link touting an agreement between Axel Springer and OpenAI.
EDIT: formatting
Even if the facts are not copyrightable, the prose is.
I would be "happier" to pay a subscription to an aggregation platform like hackernews or reddit to access archived articles that are linked to from these sites. In turn a proportion of that could be passed on to the underlying publishers that I actually visit. I have nearly zero interest in reading articles that aren't linked to from an aggregation site.
I don't want to read theguardian.com, or nytimes.com, or washingtonpost.com, or bloomberg.com, I want to read news.ycombinator.com. Paying an individual subscription to every possible underlying site that could be linked to from news.ycombinator.com is a non-starter.
This is a common statement, but every attempt to sell that service has been a dismal failure. See for example blendle.
Blendle failed because they went into competition with the papers whose content they reproduced.
Nearly every attempt at starting a new aggregation site like hackernews or reddit has been a failure.
I’m not going to switch to a new website where no community exists just so I can pay for news articles. To work it needs to be integrated into an existing, successful aggregation website.
I would be happier to pay a small fee per article I want to read. But the norm seems a monthly subscription.
Probably because most print media is garbage and nobody in their right mind would actually pay to read them
NYT's revenue keeps growing though.
Not from newspaper sales
I don't understand the downvotes - it's an extremely valid opinion. If people ask questions like that then they should be able to accept forthright answers?
(It's the same reason for me. I have tried news site subs but eventually got so tired of the polemic that I cancelled. I won't sub again).
The obvious response is that if you don't like news and think it has no value then you don't have to read it.
It is an ethical grey area, but if the paywall applied to all user agents, which would make it similar to, say, buying a Kindle book, then you might see that as pirating, whereas if you use an archive service that was served the HTTP response and cached it, then you are using a proxy UA.
If the news/magazine doesn't want this they can simply serve a cut down or zero length article to all non-paying viewers! But they want that SEO, and they want that marketing.
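The distinction drawn above can be sketched as a server-side decision, assuming a publisher that picks its response from the request's User-Agent. Everything here is hypothetical (the `serve` function, the marker substrings, the page strings); it is not how any particular site is implemented.

```python
ARTICLE = "Full text of the article."
LOGIN_PAGE = "Please log in or subscribe."
CRAWLER_MARKERS = ("bot", "crawler", "archive")  # hypothetical substrings

def serve(user_agent: str, is_subscriber: bool, hard_paywall: bool) -> str:
    """Return the body a non-metered paywall would send for this request."""
    if is_subscriber:
        return ARTICLE
    if hard_paywall:
        # Hard paywall: every non-paying user agent gets the same login page,
        # so an archive service could only ever cache the login page.
        return LOGIN_PAGE
    # Crawler-friendly paywall: full text for crawlers (SEO, marketing),
    # login page for ordinary browsers.
    looks_like_crawler = any(m in user_agent.lower() for m in CRAWLER_MARKERS)
    return ARTICLE if looks_like_crawler else LOGIN_PAGE

archived_copy = serve("ExampleArchiveBot/1.0", is_subscriber=False, hard_paywall=False)
browser_view = serve("Mozilla/5.0", is_subscriber=False, hard_paywall=False)
```

Under the crawler-friendly policy the archive's cached copy is the full article while a browser sees the login page; with `hard_paywall=True` both get the login page.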
We can extend this analogy. What if someone put up a proxy, that has a legal Netflix subscription and which "watches" streams of Netflix shows, captures actual RGB values of pixels and re-streams the resulting video to anyone else? Isn't it the same "proxy" excuse?
I would say no because the site was happy to serve the content publicly, whereas your proxy is breaking a contractual agreement. Now we get into terms of service of a website, and even if you visit for free you agree to them. Which is a possible point. It is quite grey IMO. In terms of HN I reckon a mag would love the free brand rec. vs. the archive not being shared. Where it hurts them is if someone is avoiding paying for a subscription by continually using archive sites.
Indeed there are media that are hard paywalled, e.g., The Information. However these are prohibited on HN, which possibly creates additional bias towards non-hard-paywalled publications.
Funny, I don't see it as a moral thing but more a "what can you get away with" thing.
I fully assume that if I was to post a magnet link to a torrent for whatever the link was about, I would be banned.
Morally speaking, I think it's perfectly reasonable to download a copy of something and either read the relevant info for my current task or to sample it to decide if I want to buy it. I see it no different to using the library or browsing at a book store.
Perhaps once news organisations can work out how to effectively wield the DMCA hammer against archive links we'll see the practice of posting them stop.
So downloading a movie from piratebay is no different to using the library?
In some jurisdictions (Poland, possibly whole of EU), downloading any kind of materials - be it movies, books or music - is legal. Uploading/sharing - if not between friends&family members - not so.
I’d argue that morality always has a “what can you get away with” component. Things that are normalized tend to be seen as morally permissible, and things that are seen as abnormal are more likely to be seen as immoral.
The problem with the thinking in the root comment is that it implicitly assumes that people’s behavior is morally consistent, or that they even try particularly hard to behave in a morally consistent way. That’s not really how people work. If you ask them to discuss morality in the abstract, they’ll try to come up with a consistent system. But their actual behavior is mostly dictated by social norms. And if you try to pin them down on the morality of their concrete actions, they’re more likely to stretch their moral system to accommodate their actions than the other way around.
None of this is to say anything about my own opinions on news sharing or OpenAI’s situation. It’s just that someone decrying piracy but also posting/sharing/upvoting links to copies of news articles is neither surprising, nor indicative of some deeper nuance to how people view morality around IP.
I think the intent is really different.
For LLMs you're essentially teaching them language by showing them lots of examples of written language - newspapers are of course a great example of written language.
The goal of OpenAI is not to reproduce newspaper articles verbatim when asked questions (even if the answer could be a newspaper article) and the fact that it can happen is a side effect of how LLMs work.
When a HN participant shares a (pay walled) link to a NYT article, I do want to read the exact article linked verbatim because while the facts of the article may be reproduced elsewhere in a form that's free, specific word choices or whatever might be a focal point of the discussion on HN, and therefore I can't realistically participate in a discussion without having read the article being discussed.
And as an aside, I have no problem with paying to read news, or whatever media, however it's impractical for me to subscribe to every news source HN participants link to, and therefore I gravitate to archiving services instead. I do wish there was a better solution - for example Blendle with more sources.
This is an excellent point. A properly functioning LLM should not return the original content it was trained on. When they return original content, I believe the prompt is tightly constrained and designed to extract or re-create original content. Another reason that occurred to me recently is that maybe the training set is too small, and more general prompts will re-create source material.
Another question would be, are LLMs regurgitating what they were trained on, or are they synthesizing something very close to the original content? (Infinite Monkeys, Shakespeare). Court cases like this increase the need for understanding the "thinking processes" in an LLM.
Maybe LLMs should follow best practices for 1980s style backprop models and later deep learning models: starve model size to force maximum generalization, minimal remembering.
Seems like a nice split-the-baby resolution would be to send the NYT Corp a single article read amount anytime GPT plagiarizes more than what’s allowed at an academic institution.
If NYT was a HN startup the link to the archived version would be banned and dang would be slamming the ban hammer.
Please don't post baseless accusations. I think dang has said that he tries to moderate less, not more, when YC companies are involved. (Although it's impossible to say what he would do in this situation.)
HN is currently facilitating piracy. Something your comment failed to address.
Like I said in another comment it is simpler than that. They just serve the login page/payment page to all HTTP requests. If they do that then the submission itself likely gets flagged as there is no workaround (just like if I submit my blog with a banner saying "hey you pay me $1 to read my cool post")
That is an apples to oranges comparison. An article about a video/book would have the relevant information in text form without needing to show the video: "here is the new stuff shown in Apple's 2-hour-long WWDC keynote". If not, it is common that a comment in the discussion gives a summary as a tl;dr.
With text articles behind paywalls the relevant information is hidden and only hinted at as a teaser.
To make it an apples to apples comparison, look at submissions where the link submitted is the retail link to the IP. For example, look at all the book link submissions on AMZN...
https://news.ycombinator.com/from?site=amazon.com
None of these have the Pirate Bay or Library Genesis or Anna's Archive or the equivalent as the top comment.
Compare that to...
https://news.ycombinator.com/from?site=nytimes.com
And almost all of these have an archived version as the top comment.
I wonder if this is because the purpose of linking to a book is to share awareness of that book’s existence - nobody is about to go and read it then and there to comment on its contents. Whereas the purpose of an article is to discuss it now, in the comments - the consumption horizon and bulk of the content is different.
I would broaden the question beyond HN to society as a whole.
In 1990 it would have been considered normal and appropriate to clip an article out of a newspaper and post it on a communal corkboard. What are the key differences between that form of IP and others, and that analogy and the present situation of HN allowing archive links?
Reach, and ease of distribution.
Makes sense. If you mail a friend a clipping, or post it on the corkboard, only so many people are going to see it, but then even though posting the "clipping" to HN may feel like the same thing, it's hard to appreciate the massive change in scale.
As for ease of distribution, that might address OP's original question: It's easy to make and click an archive link, but it's a lot more effort to make or find a Pirate Bay link to another form of media, and for someone else to download and view it.
If it takes 120 seconds to read a newspaper article, the archive.is workflow is a significant overhead over that, a significant friction. Those links are a courtesy to other HN readers. This is very different from the economics of buying and reading a book.
"Piracy is almost always a service problem and not a pricing problem."
edit: It didn't even occur to me to compare the time-cost of "just pay for the article", but: last I read, it's half an hour of work to cancel a New York Times subscription [0]. So, that option's not even on the table.
[0] https://news.ycombinator.com/item?id=26174269 ("Before buying a NYT subscription, here's what it'll take to cancel it", 812 comments)
> edit: It didn't even occur to me to compare the time-cost of "just pay for the article", but: last I read, it's half an hour of work to cancel a New York Times subscription [0]. So, that option's not even on the table.
I canceled mine two weeks ago. It was four clicks. One annoyed me because they tried to get me to stay with an offer, but I didn't drop them because of the price.
Same experience here, it was effortless. But it is enough to justify stealing from those journalists, it seems.
A book, TV show, movie, video game, album, or comic book is not available on the internet served by the copyright holder’s own servers with no authentication or authorization checks. But the NYT is available in that way.
But some are? I believe The Atlantic and The Economist are hard paywalled.
If they're hard paywalled (everyone gets the same login prompt), they won't be available on archive sites.
Oh it's worse than that. The NYT is positing that any neural network that is trained on their data, and can summarize or very closely approximate an article's content on request, is in violation.
This reasoning would presumably apply to any neural network, including one made of neurons, dendrites, and axons. So any human reader of the NYT who is capable of accurately summarizing what they read is an evil copyright violator, and must be "deleted".
Effectively, the NYT legal department is setting the stage for mass murder.
Hyperbole much? There is a difference between a computer and a person. I'm not aware that people generally can be enticed to reproduce full articles verbatim just through questioning.
As far as I know schools have to pay for the newspaper articles they use in class to educate students. Training an AI seems similar.
Here’s a service for the UK providing paid access to copyrighted materials to schools: https://www.nlamediaaccess.com/newspapers-for-schools/
Who thinks this? I don't. I think copyright is wrong across the board. I would love if the same pattern of posting archive'd articles held for books, movies, et cetera.
I would love to change my mind on this, as it is a very unpopular opinion to have. But I have _never_ seen a morally or scientifically sound argument in favor of copyright law, and I've spent decades looking.
I think it subsidizes the creation of junk food content (superhero movies and clickbait news for example) while not contributing anything to the progress of science (paywalled scientific journals and textbooks). I shudder at how much time I have wasted in my life consuming crap attention grabbing media and advertisements. I like to think if we lived in a world where everyone could be a publisher if they wanted to, the quality filters would be better, and information reaching us all would be more likely to be in our best interests.
You can self-publish. Oh, you want to be able to publish other people’s work, and without their permission? How does that benefit the author?
You speak of "the author". But the current system does not benefit "the author". 1% of authors profit off copyright. 99% lose money on copyright (they pay more for copyrighted media than they earn from it).
Your question should be "How does that benefit monopolist authors"?
I agree, my idea would not benefit monopolist authors. They would lose the bulk of their revenue stream.
But it would benefit the average author whose cost of living would fall and information would start serving them more than serving business.
I am not downplaying the talent and hard work of successful monopolist authors. But I do not think the works they create are worth everyone giving up their rights to reshare and remix information. I believe the world would look very different post-IP. You'd probably have a new profession--small independent librarians (similar to data hoarders today)--who would help their local communities maximize the value they got from humanity's best information.
Maybe I'm wrong! Maybe the information ecosystem is better controlled and the genetic differences of monopolist authors are so stark that without the subsidies to this gifted class we'd all be worse off. But that's an argument based on outcomes and not principles.
The oxygen I'm breathing right now mostly was created by trees on land owned by others. But I don't ask for their permission to breathe. Some things are just not natural.
I am not saying plagiarize. It is always the right thing to do to link back and/or credit the source. But needing to ask permission to republish something seems to go against natural laws.
As a supporter of piracy in the general case, I tend to agree with your observations, including that pirating NYT (FT, NPR, ...) articles is somehow a different class of offense than, say, stealing a movie or mp3.
(Books, to me, are separate still, in that I like to have a physical copy (and generally see the authors as humans who deserve compensation, rather than mega-orgs that deserve eternal torment), so I'll frequently use the digital copy as a kind of preview, then purchase it once I see it's a good book I want to read.)
I've only been reflecting on this difference for a few minutes, but, to me, I think the major difference boils down to:
Sure, "stealing bad", but, IMO, someone stealing rice and beans from WalMart to feed their family is a different class of offense than someone robbing a boutique bakery because they can't get enough chocolate cake.
First and foremost, and please repeat after me: Copying is not stealing.
You're not depriving anyone of anything. Unauthorized copying is not theft. There's no equivalency. You can't copy and paste a cake. If you take a cake from a bakery, you're depriving the bakery of a thing. If you take a picture of the bakery's trademarked sign, copy the copyrighted text from its website, and print them out, you haven't stolen anything. Nobody has lost anything. Nothing was damaged. No person, place, or thing was harmed.
Current copyright law is offensively absurd. Patenting of software, effectively eternal content copyrights, ridiculously broken DMCA, music publishers taking 99 cents of every artist's dollar, and so on and so forth.
If you support the dissolution of archaic institutions and broken laws favoring those with entrenched wealth over individual rights, you support piracy.
There is a legitimate case for laws respecting and protecting intellectual property rights. Such laws do not currently exist. These laws do not deserve to be followed or respected, and should be broken as a matter of course. Civil disobedience is called for. Refuse to participate in an exploitative market immovably entrenched in governments all over the world. Pay artists directly and commensurately if you feel they've brought value to your life. Copy whatever you want. Share those copies with whomever you want. Nobody gets hurt. Only conglomerates of already wealthy individuals and corporations are "deprived" of the potential transaction with you that they feel they are entitled to, as a matter of course.
The NYT is just as complicit as any other legacy media institution in the enshittification of journalism and laying waste to the potential value of their content. The "Gray Lady" is not a person, or a valuable institution. It's a soulless corporate construct not deserving of our empathy or high regard simply because of the reputation of human individuals who previously produced quality content. Stop pretending these institutions serve some higher purpose than to fatten the wallets of shareholders.
The good journalists have left. The ones left behind are naive, or are desperately clinging to an illusion of legacy and institutional legitimacy that no longer exists.
All that is left for these media dinosaurs is to leech off the success of others, to use their reserves of wealth and influence to arbitrarily insert themselves into the market, with no regard to the fact that they no longer have value or prestige or purpose in the context of modern technology and communication.
Anyway. Copying isn't theft. Don't give them the linguistic territory. Call a spade a spade, and media companies the desperate corporate leeches that they are.
> but what if they were also scraping, for example, Netflix content to use as part of their training set?
There were some tweets the other day about how Midjourney could be prompted to almost exactly reproduce some frames of the film Dune. It wouldn't be shocking if these companies were using large databases of movies, with questionable legal status.
I see this a lot, and they very well may be. But watch any behind-the-scenes documentary about any artsy movie and, 9 times out of 10, the directors will be waxing poetic about their inspirations, which often include older movies or paintings with uncannily similar scenes/frames. So it also wouldn't be shocking if a model trained on the same inspirations as the filmmakers generated almost exactly the same frames as the movie.
The archive link doesn't threaten their jobs and helps them avoid paying for NYT. It's NIMBY, or rather its true form, NIIIM (Not if it impacts me).
Hypocrites are EVERYWHERE and are the majority.
It is pretty funny. If you go back and read the comments made yesterday about ChatGPT doing something much milder (using old articles as training data; some prompts used to let you reproduce parts of the articles, though they no longer work), you have a lot of comments talking about how The New York Times needs money and Open AI is using their work without paying for it.
Now a comment points out that Hacker News (and most of the internet) routinely does something much worse - it lets people bypass the paywall and read completely new articles in their entirety without paying - and almost all the comments are about how it's the New York Times' fault for making it difficult to cancel a subscription, the importance of news being available to everyone, the problems with copyright laws, etc.
This tendency at Hacker News is also much more of a threat to The New York Times than what Open AI is doing. So are places like blogs, Reddit, and social media submissions that summarize the article and post the relevant quotes. Unlike the summary of a movie, summarizing all of the relevant parts of a news article is extracting almost all the value from it, and giving it away for free.
And the vast majority of people read news for its breaking content, not for its archived content from years before (and I say this as someone who has often recommended the latter, but has gotten very few people to do so). So giving people that free breaking content (either in its entirety like on Hacker News, or summaries like you see all over social media) is actually in direct competition with the news business in a way that training an LLM on an article from months/years back isn't.
Yes, and for nonfiction, it's also true that it usually depends on the original article for credibility. (If it were an anonymous poster making up a news story, most people wouldn't believe it.)
There's quite a big difference between "pirating" digital content and making it available to anyone for free, versus taking that content and building a for-profit service on top of it, which is what OpenAI are doing, no?
I was just going to post this. Seems quite an obvious and significant distinction, that doesn’t need to provoke all the existential hand wringing. Making money off someone else’s content is a totally different moral and legal case.
it's different reading an NYT article on an archive site vs. putting copies of it at the core of your $100B for-profit content delivery enterprise.
The NYT and other newspapers don’t go after the archived link providers. Probably because the newspapers’ scholarly mission includes things like preservation. But they also have a profit motive or they can’t stay in business.
This implicit permission for the archive links to exist gives some of us the implicit permission to pirate the content.
Disclaimer: I am a happy subscriber to the NYT (and other digital newspapers).
Because those who own & produce such news articles asked to make them different. People listened and accepted their requests.
When you make a TV show or a video game, you don't get any protection from the Geneva Conventions and a long list of other international treaties for your rights on stuff other than the content you are producing. The same can't be said when you are producing news.
Blocking ads and avoiding payment are two different things.
It's also audacious how these news companies reproduce stories from social media and other electronic media about facts that are, like, freely available in nature. Or how they get embargoes and exclusivity on government information as if they are some kind of information bouncer.
Not quite what parent means, but an interesting angle is: what if you scraped ChatGPT instead.
NYT, or someone's blog? Meh, fair use, and if you say no, you're in the way of progress.
But if you wanted to scrape ChatGPT answers to tweak your network, uh oh, violation of T&C!
They pirate movies as well:
https://garymarcus.substack.com/p/an-artist-fights-back-and-...
If I can't read about it, it didn't happen.
At least people do not obscure who the original author of the content is (so, if people like NYT articles, they could go and subscribe for more). Kinda "free advertising" (which still hurts the publisher in many cases, though). Same with search engines: as long as the engine brings clicks, people are happy. If a search engine just grabs the info and never redirects the user to the site, what is the point of the site existing to begin with?
At least in the US, copyright violation is a civil thing; it's handled by lawsuits. If the copyright violation is on such a small scale that it's not worth the copyright owner's time to do anything about it, then nothing's done. In this case it's worth a massive amount of money.
Probably because the contents are what's posted, i.e. if someone posted a link to an interesting video behind a paywall / login and there was an easy mirror available, that'd be posted too.
If I could just buy one article for the price of a coffee without entering a bunch of PII or going through a time-wasting process, I would agree on the moral equivalence between the examples.
This is only an interesting juxtaposition if you have fully internalized and accepted the myth of people and corporations being interchangeable.
Because historically this is how news was shared. People would pick up a paper in a grocery store or cafe, read some of it, and leave it behind. They might rip out a page and take it home. Only one person paid, and tens or hundreds gleaned it for free. This idea of sharing the story with nonsubscribers is as old as printed news itself. Instead, news agencies prefer we forget that aspect of history and insist on being the “paper of record” while charging more money for easier-to-distribute media that gets sold globally. Yes, I think we are certainly not in the wrong here when we read the news for free.
I believe the reason many of us tolerate links to news articles and other content is because we believe in equality when it comes to information access. In other words, many of us believe that those who cannot afford a subscription to a paywalled site should still be able to read the articles, in much the same way public libraries allow those who cannot afford to purchase a book the ability to read it.
However, this doesn't apply to organizations that freely share copyrighted information while making money in the process, or to organizations that share copyrighted information in a way that specifically disadvantages or does harm to the original creator of that information.
Good observation. I now wanna start commenting with pirate links to other media, but HN would tear me to shreds real quick I guess.
The difference is that an individual pirating news is simply reading the article. OpenAI intends to digest news articles to the point of packaging them and reselling.
My uncle used to distribute daily newspapers and his saying was "News ages like a fish".
OpenAI is allegedly using NYTimes articles to train a computer and sell its services. I see different use scenarios.
I guess another way to look at it is that a human just reads the pirated material. A computer makes a verbatim copy, analyzes it to the point of mimicry, and sells fuzzy versions.
I pay for multiple streaming services because I get a decent amount of value from their content.
I do not pay for any news websites because I read very little of what they produce, and it tends to pop up more on aggregator sites like HN than me actually going to them.
I actually did have a subscription to The Telegraph for a few months at one point because initially I wanted to read a full article (without cheating). But eventually I cancelled because so much of it is polemic trash.
That's my justification: I pay for things that have value to me.
I think one of the key differences is something pointed out in the article: what OpenAI is doing is a substitute for reading the New York Times, and possibly a rival to it.
On the other hand, having an archive link to a Times article in order to discuss it is not really a substitute for a Times subscription, as a newspaper has to walk a line between letting some of its articles be read and requiring payment for others (the Times actually allows you to create a "gift link" to do exactly what the archive links do).
A lot of that is going to stem from the fact that respect for "journalism" is pretty low. More than 99% of news articles are copies of the <1% of original work that happens in that field. In news, everyone is already lifting content from everyone else.
It's not just tolerated, it's encouraged because "the alternatives suck worse"
https://news.ycombinator.com/item?id=23735026
Even talking about it will get you scolded for talking about something "off topic"
Because I'm not interested in the medium itself, as I would be with a Netflix show; I'm not even interested really in the article or the New York Times as an institution. I'm interested in discussing the supposed real-life phenomenon being covered, and the posted content is the primer for that discussion. I think if you get rid of the archive links on HN you need to ban the paywalled content as well. If you want to discuss paywalled content I'm sure you can do that in the article's comment section.
I believe it's tolerated here based on the site guidelines. I have always thought this was the case because otherwise these posts would all be pay to play which would limit who could participate and turn HN into more of a subscription farm. Maybe the way to make everyone feel ok about it is to disallow links to paywalled content.
It's similar to how easy it is to subscribe to the NY Times and then how hard it is to unsubscribe. They require extra steps, and it's well known. So they get what they deserve? Do you see the point? They are lie spreaders, nothing else.
We are also happy to use open source, yet what open source alternatives are there for news that don't get shot down by the media or besmirched?
There are two fundamental differences.
First, Open AI is the one doing the pirating here. Hacker News is the host; it isn't doing any pirating or posting any archival links to the copyrighted information itself.
Second, Open AI charges subscription fees and profits off of the copyrighted material it has pirated, whereas Hacker News does not, nor do the people who post the links.
Is this really copyright?
Or is it "you can't talk to someone about an article they read".
This is really saying you can't call up your buddy and have them tell you a summary of what they just read. Maybe my buddy has a good memory and some of the text is actually nearly duplicate. But I wouldn't know because I didn't read the original, I just asked for a summary from someone else that read it.
Once the NYT pays reparations for the Iraq war, I'll be the first to stop pirating it.