But I can't even get cartoons to most people for free now, without doing unpaid work for the profit-making companies who own the most used channels of communication.
This is the sticking point for me. OpenAI isn't a profit-making company, but it's certainly a valuable company. A valuable company built from content that others created, without transferring any value back to them. Regardless of legalities, that's wrong to me.
Put it this way - you remove all the copyrighted, permission-less content from OpenAI's training, and what value do OpenAI's products have? If you think OpenAI is less valuable because it can't use copyrighted content, then it should give some of that value back to the content creators.
But we are allowed to use copyrighted content. We are not allowed to copy copyrighted content. We are allowed to view and consume it, to be influenced by it, and under many circumstances even outright copy it. If one doesn't want anyone to see/consume or be influenced by one's copyrighted work, then lock it in a box and don't show it to anyone.
I have some, but diminishing, sympathy for artists screaming about how AI generates images too similar to their work. Yes, the output does look very similar to your work. But if I take your work and compare it to the millions of other people's work, I'd bet I can find some preexisting human-made art that looks similar to your stuff too.
This is why clothing doesn't qualify for copyright. No matter how original you think your clothing seems, someone in the last many thousands of years of fashion has done it before. Visual art may be approaching a similar point. No matter how original you think your drawings are, someone out there has already done something similar. They may not have created exactly the same image, but neither does AI literally copy images. That reality won't kill the visual arts, just as it didn't kill off the fashion industry.
I firmly believe that training models qualifies as fair use. I think it falls under research, and is used to push the scientific community forward.
I also firmly believe that commercializing models built on top of copyrighted works (which all works start out as) does not qualify as fair use (or at least shouldn't), and that commercializing models built on copyrighted material is nothing more than license laundering. Companies that commercialize copyrighted work in this manner should be paying for a license to train on the data, or should stick to the licenses the content was released under.
I don't think your example is valid either. The reason that AI models generate content similar to other people's work is that those models were explicitly trained to do that. That is literally what they are and how they work. That is very different from people having similar styles.
If I read a lot of stories in a certain genre that I like, and I later write my own story, it’s almost by definition going to be a mish-mash of everything I like.
Should I pay the authors of the books I read when I sell mine?
We shouldn't hold individual humans and ML models to the same standards, because ML models themselves are products capable of mass production and individual humans are not even remotely at the same scale.
If you write that book, chances are you will gain some fans that are also fans of other authors in that genre.
If ML models write in that genre, they can flood it so thoroughly that human artists won't be able to compete.
It's not even a remotely equivalent scenario
Computers and machines have been capable of mass production for decades, and humans have used them as tools. In the past 170 years, these tools of mass production have already diminished many thousands of professions that were staffed by people who had to painstakingly craft things one at a time.
Why is art some special case that should be protected, when many other industries were not?
Why should we kill this technology to protect existing artistic business models, when many other technologies were allowed to bloom despite killing other existing business models?
Nobody can really answer these questions.
It shouldn't be.
As soon as someone makes an AI that can produce its own artwork without needing to ingest every piece of stolen artwork it can, then I'm on board.
But as long as it needs to be trained on the work of humans it should not be allowed to displace those people it relied on to get to where it is. Simple as that.
Are there any humans that can produce artwork without ingesting inspiration from other art? Do you know any artists that lived in a box their whole life and never saw other art? Do you know any writers who'd never read a book?
Are there any human artists who can't, if requested, draw or write something that's a copy of some other person's drawings or writings?
Also, FYI, you can't steal digital artwork. You can only commit copyright infringement, which is not the same crime as theft, because theft requires depriving the owner of something in their possession.
This is still pretending that humans and AI models are equivalent actors that should have the same rights.
Emphatically no they shouldn't. The capabilities are vastly different. Fair use should not apply to AI.
Fair use applies even to use of traditional algorithms, like the thumbnailing/caching performed by search engines. If I make a spam detector network, why should it not be covered by fair use?
No idea on the legality, but common sense suggests that the difference would be that a spam detector doesn't replace the products that it was trained on, while AI-generated "art" is intended to replace human artists.
The question is "is it a derivative work of the original?" - not whether it is a generative work.
If that were the distinction to be made, using ChatGPT as a classifier would be acceptable, while using it to write new spam (see the "I am sorry" Amazon listings of the other day) would be unacceptable.
If a tool allows for both infringing and non-infringing uses (are photocopiers allowed to make copies(!) of copyrighted works?), it has generally been the case that the tool is allowed, and the person with the agency to use the copyrighted work in either an infringing or a non-infringing way is the one to come under scrutiny.
I believe that if OpenAI is found to have committed copyright infringement in training the model, then an argument that training a model on spam is copyright infringement could reasonably be constructed.
If, on the other hand, OpenAI is found to have been sufficiently transformative in its creation of the model and some uses are infringing, then it is the person who did the infringing (as with a photocopier or a printer printing off a copy of a comic from the web) who should face the legal consequences.
Yeah, I really think it should fall on the user as opposed to the tool.
The extent to which it supplants the original work is one of the fair use considerations.
I think it'd make more sense to take the stance "current LLMs and image generators should be judged by the fair use factors, and I believe they'd fail" - though I'd still disagree with it - than to subject machine learning models to a different set of rules than humans and traditional algorithms.
That is indeed the most common stance. There isn't nearly as much outcry over, say, image classification by LLMs, as there is over AI "art" generation.
Fair use applies to humans and the things they do (including with AI). It is not something that applies to algorithms in themselves. AIs are not people; the people who use them are people, and fair use may or may not apply to the things they do depending on the circumstances. The agent is always the human, not the machine.
True; consider the "it" in my question ("If I make a spam detector network, why should it not be covered by fair use?") as "my making (and usage) of the network".
This isn't about giving "rights" to machines. Machines are just tools. The question is about what humans are allowed to do with those tools. Are humans using AI models and humans not using AI models equivalent actors that should have the same rights? I'd argue emphatically yes they should.
The thing is, we already have doctrine that starts to encompass some of these concepts with fair use.
The four pronged test in US case law:
- the purpose and character of use (is a machine doing this different in purpose and character? many would say yes. is "ripping-off-this-artist-as-a-service" different than an isolated work that builds upon another artist's art?)
- the nature of the copyrighted work
- the amount and substantiality of the portion taken (can this be substantially different with AI?)
- the effect of the use upon the potential market for the original work (might mechanization of reproducing a given style have a larger impact than an individual artist inspired by it?)
These are well balanced tests, allowing me as a classroom teacher to duplicate articles nearly freely but preventing me from duplicating books en masse for profit (different purpose; different portion taken; different impact on market).
The problem with this conversation is that it's being had by people like the top-level commenter here, who states that clothing is not copyrightable. It is. Clothing design is copyrightable. This was a huge recent case, Star Athletica. They know nothing about copyright law and just build intuitions from the world around them, but the intuitions are complete nonsense because they are made in ignorance of the actual law, what the law does, and why it does it. I find it exhausting.
Your sentiment is probably correct in that there are many aspects of copyright law that are not strictly aligned with the public's intuition. But your example is a bit of a reach. Star Athletica was a relatively novel holding that a specific piece of clothing, when properly argued, could qualify as copyrightable as a semi-sculptural work of art; however, this quality of a given piece is separate from its character as clothing. In fact, the USSC in Star Athletica explicitly held that a designer/manufacturer has “no right to prohibit any person from manufacturing [clothing] of identical shape, cut, and dimensions” to clothing which they design/manufacture. That quote is directly from a discussion of the ability to apply copyright protections to clothing design. I think the end result is that trying to argue technical legal issues around a poorly implemented statutory regime is always fraught with errors. That really leaves moral and commercial arguments outstanding, and advocacy should focus on those when not fighting to effect change in the law these copyright determinations are based on.
And just to be clear, this post does not constitute legal advice.
You're dismissing my comment because of what someone else said upthread?
I hate the desire to meta-comment about the site rather than argue on the merits.
We obviously don't know much about how courts will interpret copyright with LLMs. There are a lot of arguments on all sides, and we're only going to know in several years, after a whole lot of case law solidifies. There are so many questions (fair use, originality, can weights be copyrighted? when can model output be copyrighted? etc.). Not to mention that the legislative branch may weigh in.
This discourse by citizens who are informed about technology is essential for technology to be regulated well, even if not all participants in the conversation are as legally informed as you'd wish. Today's well-meaning intuitions about what deserves copyright, and why, inform tomorrow's case law and legislation.
This sounds so detached from human experience that I am tempted to ask if you are a human or just a disembodied spirit that haunts the internet.
When the first Neanderthal drew a deer on the walls of a cave, where did they get inspiration?
When a little child draws a tree for the first time, where do they draw inspiration? Do you think they were reviewing works of Picasso?
When the first man made an axe, chopped a tree, made a bed, sewed some clothes, discovered fire, where did they draw inspiration?
Do you not have eyes, ears, do you not perceive and get inspiration from the natural world around you?
Yeah, but that’s not really your sole source of inspiration. My son has been ‘inspired’ by the art of all the other kids in his kindergarten. Certainly by the time he gets to the age where he does it professionally he’ll have been inspired by an uncountable number of people.
What % is his independent inspiration? 30%? 90%? There are certainly people for whom it was 90%. For most we don’t know.
We do know one thing for sure - that for AI it’s 0%
We don't know what percentage is independent inspiration for a person using the AI to create art.
Once upon a time it was a contentious idea that humans had significant authorship in photographs, which merely mechanically captured the world. What % is the camera's independent inspiration?
Here, we have humans guiding what's often a quite involved process of synthesis of past human (and machine) creation.
The person using the AI doesn't matter in the equation. They aren't an artist, they're a monkey with a typewriter.
We're talking about the AI here, because it can generate the same images no matter which monkey with a typewriter is typing the prompts.
That's an opinion.
Does your opinion hold in all circumstances? If I spend 20 hours with an AI, iterating prompts, erasing portions of output and asking it to repaint and blend, and combining scenes-- did I do anything creative?
Being inspired isn't against the law. Copying is. It'd be one thing if this conversation could be had with useful terminology that's actually on point. Instead we have you, insisting that there is no creative process, that there is only experiencing other art and inevitably copying (because apparently you think that's the only thing humans can do!). It's all so telling. Yet it's tragic, because so many here don't even realize it. I'm sad for your inability to engage with creativity and creative acts.
I think a lot of the discussion is where the balance of the creativity lies when a human uses a model (trained on other artistic works) to create art.
Is the result a copy, or perhaps a derivative work of the art in the training set?
Does the person using the model have authorship of the result?
Was it even okay to use the art to train the model and then share the resulting weights?
Are the resultant weights protected by copyright themselves?
I suspect the actual answers we'll come to on these topics will be full of nuance.
Are we going to discount the hundreds to thousands of artistic pictures children are exposed to? Or how about the teacher sitting up front demonstrating to the class how to draw a tree?
Learning to see as an artist is a distinct skill. Being able to take the super-compressed, simplified world view that the mind sees and put something recognizable on paper is a specialized skill that has to be developed. That skill is developed by doing it over and over again, often by copying the style of an artist that someone enjoys.
Or to put it another way, go to any period in history prior to the mid 20th century and art in a given region starts to share the same style, dramatically so, because people were inspired by each other, almost to a comical extent. (Financial reasons also had something to do with it as well of course, Artists paint/carve/engrave/etc what sells!)
Do you think art was there before humans? Or humans made art?
If you believe the 1st proposition… please tell me about your very unique religion!
If not… you've answered your own question.
Logically, the answer to this is (almost certainly) yes, so you’ll need to discount this argument.
If the answer were no, then either an infinite number of humans have lived (such that there was always a previous artist to learn from), or it was true in the past but false in the present, which seems unlikely given human brains have generally become more, not less, sophisticated over time.
I presume what you’re missing here is that the brain can be inspired from other sources than human art. For example: nature; life experience; conversation.
Not making any other comment about what machines can or can’t do; just wanted to point out this argument is invalid. It comes up a lot and is probably grounded in ignorance of the artistic process. It’s such a strange idea to suggest that the artistic process is ingesting lots of art to make more art. That’s such a weird world view. It’s like insisting every artist makes art the way Quentin Tarantino makes films.
I’ve spent a lot of time with artists, I’ve worked with them, I’ve been in relationships with artists, and I can tell you the great ones see the world differently. There’s something about their brains that would cause them to create art even if born on a desert island without other human contact. Some of them don’t even take an interest in other art.
In fact, those artists that _do_ make art heavily based on other artists’ work as suggested are often derided as “derivative” and “unoriginal”.
Do you feel the same way about tools like Google Translate?
Tbh I'm not familiar enough with how Google Translate is built, but if it's ingesting tons of people's work without their permission so it can be used to replace them then yes I do.
For what it's worth: that's pretty much how Translate works.
Translate operates at a large-chunk resolution, and one of the insights in solving the problem was the idea that you can often get a pretty-good-enough translation by swapping a whole sentence for another whole sentence. So they ingest vast amounts of pre-translated content (the UN publications are a great source, because they have to be published in the language of every member nation), align it for sentence- and paragraph-match, and feed the translation engine at that level.
It's created an uncanny amount of accuracy in the result, and it's basically fed wholesale by the diligent work of translators who were not asked their consent to feed that beast. Almost nobody bats an eye about this because the value (letting people using different languages communicate with each other) grossly outstrips the opportunity cost of lost human translator work, and even the translators are, in general, in favor of it; they aren't going to be displaced because (a) it doesn't really work in realtime (yet), (b) it can't handle any of the deeper signal (body language, tone, nuance) of face-to-face negotiation, and (c) languages are living things that constantly evolve, and human translators handle novel constructs way better than the machines do (so in high-touch political environments, they matter; the machines have replaced translators in roles like "rewriting instruction manuals" that were always pretty under-served in the first place).
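For the curious, here is a minimal sketch of that parallel-corpus idea in Python. To be clear, this is not Google's actual pipeline: the naive sentence splitter and the length-ratio filter below are illustrative assumptions standing in for the statistical aligners (e.g. Gale-Church) and neural models that real systems use.

```python
def split_sentences(text):
    """Naive sentence splitter; real systems use language-aware tokenizers."""
    normalized = text.replace("!", ".").replace("?", ".")
    return [s.strip() for s in normalized.split(".") if s.strip()]

def align(source_doc, target_doc, max_len_ratio=2.0):
    """Pair up sentences 1:1 in order, keeping pairs whose lengths look plausible.

    Gale-Church-style aligners use a statistical model of length ratios;
    this toy version just filters on a crude ratio threshold.
    """
    src = split_sentences(source_doc)
    tgt = split_sentences(target_doc)
    pairs = []
    for s, t in zip(src, tgt):
        ratio = max(len(s), len(t)) / min(len(s), len(t))
        if ratio <= max_len_ratio:  # lengths plausible for a translation pair
            pairs.append((s, t))
    return pairs

# Usage: each aligned pair becomes one training example for the engine.
en = "The assembly adopted the resolution. The session was closed."
fr = "L'assemblée a adopté la résolution. La séance a été levée."
for s, t in align(en, fr):
    print(s, "=>", t)
```

The point of the sketch is just to show where the translators' work enters the system: every aligned pair is a unit of human labor turned directly into training data.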
Vastly inappropriate comparison - there are millions of pages of text out of copyright; you can get a good translation engine using the public domain.
That is not the case for art: the vast majority of the art used by Midjourney is not public domain.
Is that true? How did you establish that?
It's unfortunately also not great for translation. Language changes fast enough that training on content that went out of copyright is old data.
OpenAI has basically admitted it. Is OpenAI even disputing that it ingested all the works it's being sued over? Not as far as I can tell.
Google Translate is very basic and not even close to good if you already know both languages. It's useful if you're translating into your own language (you do the correction when reading), but it can lead to confusion the other way.
Interesting distinction.
If you can do the correction when reading, it seems reasonable to assume the reader in the opposite direction has the same correction capability.
I would expect the chance of confusion to be identical. The only difference is a matter of perspective, where in one case you are the reader and in one case you are the author.
Yes, they are identical. But I believe the reader is better armed to deal with the confusion, or at least to recognize the error, because the error doesn't fit the context. When producing, you don't know the target language, so there's a better chance for errors to slip in unnoticed.
It's better for me to receive a text in the original language and translate it myself than to try to decipher something translated automatically.
I would argue that Translate being fed by paid UN translators, who likely agreed to the use of their transcriptions in a TOS or something, is not comparable to unpaid artists having the art they posted online swept into training sets used in for-profit models such as OpenAI's, something they never consented to. OpenAI is a nonprofit parent company, but it spawned a for-profit child company, OpenAI LP, which most of their staff work for, and which is meant to deliver many-fold returns to its shareholders - who are effectively profiting from the labor of all the artists and sources in their training data.
What about code? Or what if we eventually have robot labourers trained by observing human labourers?
Code has licenses too. And we've had very high profile lawsuits based on "copying code".
Interesting point, but by that point in time I don't think generative art will even be in the top 10 ethical dilemmas to solve for "sentient" robots.
As it is now, robots aren't the ones at the helm grabbing data for themselves. Humans give orders (scripts) and provide data and what/where to obtain that data.
Because in this case the art is still necessary for the machine to work. You don't need horse buggies to make a car, nor existing books to make a printing press. You DO need artist's art to make these generative AI tools work.
If these worked purely off of open source art or from true scratch, I wouldn't personally have an issue.
We don't need to kill it. Just pay your dang labor. But if we are treating proper compensation as stifling technology, I'm not surprised people are against it.
Maybe in the 2010s tech would have had the goodwill to pull this off in PR, but the 2020s have drained that goodwill and then some. Tech made so many promises to make lives easier, and now it has joined the very corporations it claimed to fight against.
Well, it's in the courts, so someone is going to answer it soon-ish.
> We don't need to kill it. Just pay your dang labor.
> But if we are treating proper compensation as stifling technology, I'm not surprised people are against it.
That's just it, nobody looking to get paid by OpenAI actually did any labor for OpenAI. They did labor for other reasons, and were happy with it.
OpenAI found a way to benefit by learning from these images. The same way that every artist on the planet benefits by learning from the images of their fellow artists. OpenAI just uses technology to do it much more efficiently.
This has never been considered labor in the past. We've never asked artists to "properly compensate" each other for learning/inspiration in the past. I don't know why it should be considered labor or proper compensation now.
But we shall see what the courts decide!
There are many ways an artist can compensate their influences. Some of them are monetary.
When discussing our work, we can name them.
When one of our influences comes out with a new body of work, we can gush about it to our own fans.
When we find ourselves in a position of authority, we can offer work to our influences. No animation studio is really complete without someone old enough to be a grandfather hanging out helping to teach the new kids the ropes in between doing an amazing job on their own scenes, and maybe putting together a few pitches, for instance.
We can draw fan art and send it to them.
None of these are mandatory, but artists tend to do this, because we are humans, and we recognize that we exist in a community of other artists, and these all just feel like normal human things to do for your community.
And if an artist suddenly starts wholesale swiping another artist's style without crediting them, their peers get angry. [1]
1: https://en.wikipedia.org/wiki/Keith_Giffen#Controversy
OpenAI isn't gonna tell you that it was going for a Cat & Girl kind of feel in this drawing. OpenAI isn't gonna offer Dorothy Gambrell a job. OpenAI isn't going to tell you that she just came out with a new collection and she's still at the top of her game, and that you should buy it. OpenAI's not going to send her a painting of Cat & Girl that it did for fun. OpenAI isn't going to do anything for her unless the courts force it to, because OpenAI is a corporation who has found a way to make money by strip-mining the stuff people post publicly on the Internet because they want other humans to be able to see it.
Most people know 20,000-40,000 words. Let's call it 30,000. You've learned 99.999% of those 30,000 words from other people. And don't get me started on phrases, cliches, sentence structures, etc.
How many of those words do you remember learning? How many can you confidently say you remember the person or the book that taught you the word? 5? 10? Maybe 100?
That's how brains work. We ingest vast amounts of information that other people put out into the world. We consume it, incorporate it, and start using it in our own work. And we forget where we even got it. My brain works this way. Your brain works this way. Artists' brains work this way. GPT-4 works this way.
The idea that a visual artist can somehow recall where they first saw many of the billions of images stored in their brain -- the photos, movies, architecture, paintings, and real-life scenes that play out every second of every day -- is laughable. Almost all of that goes uncredited, and always will.
This is what it is to learn.
I tend to fall more on the "training should be fair use" side than most, but your comment seems to be missing the point. Nobody is arguing that models are violating copyright or social norms around credit simply because they consume this information. Nobody ever argued/argues that the traditional text generation in markov models on your phone's keyboard runs afoul of these issues. The argument being made is that these particular models are now producing content that very clearly does run into these norms in a qualitatively different way. You cannot convincingly make the argument that the countless generated "X, but in the style of Y" images, text, and video going around the internet are exclusively the product of some unknowable mishmash of influences -- there is clearly some internalized structure of "this work has this name" and "these works are associated with this creator".
To take it to an extreme, you obviously can't just use one of the available neural net lossless compression algorithms to circumvent copyright law or citation rules (e.g., distributing a local LLM that helpfully displays the entirety of some particular book when you ask it to). Nor can you just tweak it to be a little lossy by changing one letter, or a little more lossy than that, and so on. On the other hand, any LLM that performs exactly the same as a markov model would presumably be fine. So there is a line somewhere.
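To make that "line somewhere" concrete, here is a toy sketch - my own illustration, an assumption, not how any real LLM or compression scheme works: an order-k character Markov model trained on a single text. At small k the output is a statistical remix; once k is large enough that every context in the training text is unique, the model degenerates into a lossless memorizer that replays the text verbatim.

```python
import random
from collections import defaultdict

def train(text, k):
    """Map each k-character context to the characters observed after it."""
    model = defaultdict(list)
    for i in range(len(text) - k):
        model[text[i:i + k]].append(text[i + k])
    return model

def generate(model, seed, length, rng=random.Random(0)):
    """Extend the seed one character at a time by sampling from the model."""
    out = seed
    for _ in range(length):
        choices = model.get(out[-len(seed):])
        if not choices:
            break
        out += rng.choice(choices)
    return out

text = "the cat sat on the mat and the cat ran to the rat "
for k in (2, 12):
    model = train(text, k)
    print(k, "->", generate(model, text[:k], 60))
# k=2 produces a remix of fragments; at k=12 nearly every context is
# unique, so generation just replays the training text verbatim.
```

The same model class slides from "fine, clearly transformative" to "clearly just a copy" as one parameter changes, which is exactly why the line is hard to draw in law.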
A company hires an artist. That artist has observed a ton of other artists' work over the years. The company instructs that artist to draw, "X but in the style of Y", where Y is some copyrighted artwork. The company then prints the result and puts it on their packaging.
A company builds an AI tool. That AI tool is trained on a ton of artists' work over the years. The company opens up the AI tool and asks it to draw "X but in the style of Y," where Y is some copyrighted artwork. The company then prints the result and puts it on their packaging.
What's the difference?
I'd argue there isn't one. The copyright infringement isn't the ability of the artist or the AI tool to make a copy. It's the act of actually using it to make a copy, and then putting that out into the world.
Okay, but then that's an argument subject to the critiques made upthread that you were initially trying to dismiss? You can't claim that AI doesn't need to worry about citing influences because it's just doing a thing humans wouldn't cite influences for, then proceed to cite, as evidence, an example where you would very much be expected to cite your influences and AI wouldn't.
I never argued that AI doesn't need to worry about citing influences. If I am a person using a tool to create a work, and the final product clearly resembles some copyrighted work that I need to reference and give credit to, what does it matter if my tool is a pencil, a graphics editing program, a GPT, or my own mind? I can cite the work.
Like I said, this is exactly what the comment you first replied to was explaining. It is very clearly not the same as a pencil or a graphics editing program, because those things do not have a notion of Cat & Girl by Willem de Kooning embedded in them that they can utilize without credit. It is clearly not the same as your mind, because your mind can and, assuming you want to stay in good standing, will provide credit for influence.
Again, take it back to basics: do you believe it is permissible to share a model itself (not the model output, the model), either directly or via API, that can trivially reproduce entire copyrighted works?
I'd say that a tool itself can't be guilty of copyright infringement; only the person using the tool can. So it doesn't matter whether the GPT has some sort of "notion" of a copyrighted work in it or not. GPTs aren't sentient beings. They don't go around creating things on their own. Humans have to sit down and command them, and at that point, whoever issued the command is responsible for the output. Copyright violation happens at the point of creation or distribution, not at the much earlier point of inspiration or learning.
So yeah, of course imo it should be permissible to share a model that can reproduce copyrighted works. Being "capable of being used" to violate a law is not the same thing as violating a law.
A ton of software on my computer can copy-paste others' work, both images and words. It can trivially break copyright. Hell, there are even programs out there that can auto-generate code for me, code that various companies have patent claims on. Do I think distributing any of this software should be illegal? No. But I think using that software to infringe on someone's copyright should be.
(Note: This is different than if the program distributed came with a folder that included bunch of copyrighted works. To me, sharing something like that would be a copyright violation.)
I'm not sure how to explain this any more clearly. I am talking about neural net compression algorithms. As in, it is literally just a neural net encoding some copyrighted work, and nothing else. It is ultimately no more intelligent than a zip file, other than that the file and the program are one and the same. You can't seriously believe that these programs allow you to avoid copyright claims, can you? Movie studios, music producers, and book publishers should just pack it in, because pirates just need to switch to compressing works by training a NN and seeding those instead, and there's no legal precedent to stop them? If you do think that, do you at least understand why nobody is going to take your position seriously?
A neural net designed to do nothing other than compress and decompress a copyrighted work is completely different than GPT-4, unless I'm uninformed. To me that sounds like comparing a VCR to a brain. GPT-4's technology is clearly something that "learns" in order to be able to produce novel thoughts and ideas, rather than merely compressing. A judge or jury would easily understand that it wasn't designed just to reproduce copyrighted works.
> It is clearly not the same as your mind, because your mind can and, assuming you want to stay in good standing, will provide credit for influence
I forgot to respond to this, but it's not true. Your mind is incapable of providing credit for 99.9% of its influence and inspiration, even when you want it to. You simply don't remember where you've learned most of the things you've learned. And when you have a seemingly novel idea, you can't always be aware of every single influential example of another person's work that combined to generate that new idea.
Ultimately, only high courts in each jurisdiction can decide. I can imagine a case where some highly advanced nations decide different interpretations that cause conflict. Then, we need an amendment to the widely accepted international copyright rules, the Berne Convention. Ref: https://en.wikipedia.org/wiki/Berne_Convention
The artist has a claim for production of a derivative work and for passing off against the other artist.
Individual words aren't comparable to the things people are worried about getting copied. People are much more able to tell you where they learned about more sophisticated concepts and styles.
The same principle applies, though. They can tell you about maybe a dozen, maybe a few dozen, concepts they've learned and use in their work. But what about the thousands of concepts they use in their work that they can't tell you about? The patterns they've noticed, the concepts that don't even have names, that came from seeing things in the world that were all created by other people?
For example, how many artists drawing street scenes credit the designer at Ford Motors for teaching them what a generic car looks like? How many even know which designers created their mental model of a car?
Nobody working on a new cancer drug actually did any work for me. They did labour for other reasons, and were happy with it.
Therefore it is okay for me to steal their recipe and sell their cancer drug.
Nope, but it’s ok for you to read their recipe if they place it on the internet (research paper), and use it to make your own drug.
And that is a good thing we should all celebrate.
The entire point of the patent system was to say inventors can put their design on the net without it being stolen; so future inventors can build on their work.
To me this is a strong point in favor of the idea that OpenAI has no business using their work. How can you even think it's ok for OpenAI to use work that was not done for them without paying some kind of license? They aren't entitled to the free labor of everyone on the internet!
At the risk of answering a rhetorical question: because copyright covers four rights: copying, distribution, creation of derivative works, and public performance, and LLM training doesn't fit cleanly into any of these, which is why many think copying-for-the-purpose-of-training might be fair use (courts have yet to rule here).
I think the most sane outcome would be to find that:
- Training is fair use
- Direct, automated output of AI models cannot be copyrighted (I think this has already been ruled on[0] in the US).
- Use of a genAI to create works that would otherwise be considered a "derivative work" under copyright law can still be challenged under copyright.
The end result here would be that AI can continue to be a useful tool, but artists still have legal teeth to come after folks using the tool to create infringing works.
Of course, determining whether a work is similar enough to be considered infringing remains a horribly difficult challenge, but that's nothing new[1], and will continue to hinge on how courts assess the four factors that govern fair use[2].
[0]: https://www.reuters.com/legal/ai-generated-art-cannot-receiv...
[1]: https://www.npr.org/2023/05/18/1176881182/supreme-court-side...
[2]: https://fairuse.stanford.edu/overview/fair-use/four-factors/
LLMs are collections of GPUs crunching numbers. "Inspiration" doesn't really apply to them.
A better analogy is sampling, and musicians remixing music are very much required to pay for the samples they use.
Only if the "use" means distribution.
If I sample a track and play it in my home I don't properly compensate anyone.
If I ask GPT to create a cool new comic based on the article and then delete it or use it privately, the same applies.
True. Sadly, most of those copyrights are probably owned by other megacorps. So they either collude to suppress the entire industry or eat each other alive in legal clashes. The latter is happening as we speak (the writers for the NYT are probably long retired, but the NYT still owns the words), so I guess we'll see how that goes.
If we treat AI like humans, art historically has an equally thin line between inspiration and plagiarism. There are simply more objective metrics to measure now because we can indeed go inside an AI's proverbial brain. So the metaphor is pretty apt, except with more scrutiny able to be applied.
They were happy until their copyright got stolen, I guess. Then got unhappy.
Redacted.
Mass production hasn't killed art and never will.
What's killing art is this idea by a vocal minority of "artists" that they need to mass produce their work, enter the market, and attempt to make millions of dollars by selling and distributing it to millions.
That's not art. That's capitalism. That's competing to produce something that customers will want to buy more than what your competitors offer.
If you want to compete on the capitalistic marketplace, then compete on the capitalistic marketplace. But if you want to be an artist, be an artist.
Art is still alive and well and always will be. Every day I see people singing because they love singing, making pottery because they love making pottery, writing because they love writing. Whether other people love or enjoy their art, the artist may or may not care. Whether they can profit from their art, the artist may or may not care. But many billions of artists will keep creating, crafting, and designing day after day, and they will never be stopped by AI or anything else.
Redacted.
Jobs have never been less soul crushing, or more creative, in the history of humanity. And that becomes increasingly true every decade.
Do you know what a job does? What a company does? It contributes to society! It produces something that someone else values. That they value so much they're willing to pay for it. Being part of this isn't a bad thing. It's what makes society work.
A job/company entertains. It keeps things clean. It transports people to where they need to go. It produces. It gives people things they want. It creates tools, and paints, and nails, and shirts. I look out my window, and I see people delivering furniture, chefs cooking food and selling it out of trucks, keepers maintaining grounds, people walking dogs.
Being useful to the fellow members of your society for 40 hours a week is not "soul crushing."
Hey. Thanks. Sorry about wasting your time. Shouldn't have started in the first place. It was my fault for trying to make a silly point.
Too mid to understand your point.
(This is a response to your comment before you edited it.)
Find the intersection of something that people increasingly value, that you enjoy, and that you can compete at.
The best proof that people value something is that they're spending money for it. If people aren't spending money, they don't value it, and you probably don't want to go into it. If people aren't spending more and more money on it every year, then it's not increasing in value, and you probably don't want to go into it.
The best proof that you enjoy something is that you enjoyed it in the past. Things you liked as a kid, activities that excited you as a young adult, etc., are often the best candidates.
Look for intersections of the two things above. Do some Googling, do some research.
Finally, you need to be able to compete at it. If you do something worse than everyone else does it, then no one will pick you, because you're probably not being helpful. The simple answer to this is to practice to make yourself better. But most people don't want to do that. A better answer to this is to be more unique, so you can avoid the competition. Don't do a job that has a title, a college major, and millions of talented applicants. It's not that helpful to society to do something a hundred million other people can already do, which is why there's more competition and lower wages.
When you find the intersection of what's valued and what you enjoy, call up some people in those fields and ask what's rare. What in their area is needed. What are they missing. What is no one else doing.
Or just start your own company. That's the easiest way to be unique. But it's hard.
Finally, if you feel you're too "mid," then make sure your standards aren't crazy. Don't let society tell you that you need to be a millionaire with a yacht and designer clothes to be happy. Get a normal 9 to 5 with some purpose in it, that you can be proud of, that others appreciate. Live within your means and don't stress yourself out financially. Spend your free time doing things you like. Take care of your health, find good relationships, and treasure them. That's a happy life at any income. I know a bunch of miserable depressed rich people who are very good at making money and very bad at health/relationships/etc., which is the real stuff that life is made out of.
People can do whatever they want with their own property. You have no right to steal it just because they want to monetise it. What’s killing art is stealing it en masse using procedural generators.
This is an interesting example, because even in the $100 case you are still talking about machine augmentation. You can have a seamstress or a tailor customize patterns using off-the-shelf textiles for that order-of-magnitude price, but if you want custom-built, exotic materials, or many kinds combined, the cost is on the order of thousands, not hundreds. There is also a large industry of just printing designs on stock shirts, which sits at a different effort-scale equilibrium.
Thinking about how automation disintermediates is very important. In animation, productions often have key-frame artists in the pipeline who define scenes, and then others who take those to flesh out all the details of the scene. GenAI can potentially automate that process: you could still have the artist producing a keyframe, and render that into video.
Another big factor is style. One hypothesized reason that impressionism, absurdism, and abstract art all became styles is photography. Once cheap, machine-produced photography became available, there was less need for a portrait artist. But further, realistic portraiture was no longer high status, and others pushed trends in alternative directions.
All the experimentation and innovation going on right now will definitely settle into a different set of roles for artists, and different trends for them to satisfy. Art style itself will change as a result of both what is technically possible and what is _not_ easily automatable, in order to gain prestige.
Too much wall of text for nothing. Nobody is stopping you from buying hand crafted masterpiece. Just get out of the way of progress.
I'm confused about your point. Are you saying we should ban $10 mass produced shirts so that more people can make a living hand-crafting $100 shirts?
What if the AI was solely trained on this person's work, then from that churned out a similar replacement that was monetized?
Well, art predates other professions by thousands of years, so it rightfully earned its privileges.
It's an interesting predicament. Assuming the stories produced by person and machine are indistinguishable and of the same quality, then the difference here is the ability to scale. Setting aside bias in favor of humanity, why should we give entitlement to output derived from a human over something else of the same quality?
I hate making analogies, but if we make humans plant rows of potatoes, should that command a higher price and be seen as more valuable than planting potatoes by tractor, 20 rows wide?
No, we should absolutely be giving bias to humanity. Flesh and blood humans matter, their lives matter, their thoughts matter and their work matters.
Machines are tools for them to use not entities given the same rights and same consideration.
I reject your whole premise.
Exactly; their flesh, blood, energy, etc. do matter. That supports my argument, not yours, lmao. There's nothing more remarkable about my hand-planted potato row vs the tractor-planted rows, and my energy can be spent elsewhere. I am not entitled to make a living hand-planting potatoes if there's no market for it.
People have the choice to continue making stories, and they'll have a fanbase for it, and always will, because that's ultimately a part of freedom and choice. Many are less what I'll call purist here; they don't care about how it came to be, they just want a quality story.
What you're loosely proposing is art being a protected class of output, when we have tools that can match it and will soon have the potential to surpass it. Is that not a terrific way to stunt what you're trying to defend?
For transparency, I am an advocate for human made art, but I am against stunting tooling that can otherwise match said creativity. I see that as an artform in itself.
If you believe AI tooling is an artform then you categorically are advocating against human made art as far as I am concerned.
This is just gatekeeping. Art is not better because it was made by hand as opposed to with technology. If I use a generative model to make art then I’m an artist.
I would argue art is better when it's the result of the effort and vision of an individual
Prompting a search engine to stitch images together on your behalf might result in an image you can call art, but imo all the art generated whole cloth like this sucks. Necessarily derivative. Put into the world without thought.
My favorite critique of LLM work: "why would I bother to read a story that no one bothered to write"
You are free to think so, but it really doesn't make you an artist any more than wearing a medal you bought second hand makes you a war hero.
Something else did the work and you're just claiming credit. It's honestly kind of sad.
Seriously asking: if I customize my order at a fast food joint am I a chef? How is that different from prompt engineering to generate art?
Plenty of people would disagree so clearly this is not a settled matter
I think AI art will by definition never surpass human art. Humans can be inspired by things other than the art of others.
Descartes told us that animals are mere soulless automatons, not entities given the same rights and same consideration as humans.
Well ok, that was 300 years ago and views have changed dramatically since then.
Nice strawman.
So you instead want to what? Ban the tools because they interfere with doing things the human way?
No - force the people creating and profiting from the tools to get permission from the people they mine the data from, or cease operating.
I feel like the issue here is that you are giving AIs agency.
AIs are not magic. They are tools. They are not alive, they do not have agency. They do not do things by themselves. Humans do things, some humans use AI to do those things. Agency always rests with a combination of the tool's creator and operator, never the tool itself.
Is there really a difference between a human flooding the market using AI and a human flooding the market using a printing press?
Even if humans can't compete (an obviously untrue premise from my perspective, but let's assume it for the sake of argument), is that a bad thing? The human endeavor is not meant to be a make-work project. Humans should not be forced to pointlessly toil out of protectionism when they could be turning their attention to something that can't be automated.
A magnitude of difference, yes. Even a printing press will be limited by natural resources, which require humans to procure.
A computer server can do a lot more with a lot less. And is much easier to scale than a printing press.
When the AI can be argued to be stealing humans' work, yes. A printing press didn't need to copy Shakespeare to be useful. And it benefited Shakespeare anyway, because more people got to read his works.
So far I don't see how AI benefits artists. Optimistically:
- an artist can make their own market? Doubtful; they will be outgunned by SEO-optimized ads from corporations.
- they can make commissions faster? To begin with, commissions aren't a sustainable business. Even if they 5x'd their output and somehow kept the same prices, they wouldn't be living well. But in reality, they will get less business, as people will AI their "good enough" art and probably won't pay as much for something not fully hand drawn.
- okay, they can make bigger commissions? There's drama about spending 50k on a 3-minute AMV; imagine if that could be done by a single artist in a day now! ... Well, give it another 10 years. A lot of gen AI is static assets. Rigging and animating are still far from acceptable quality, and a much harder problem space. I also wouldn't be surprised if by then the AI models have gone through their own phase of enshittification and you end up blowing hundreds or thousands anyway.
-----
Until someone conceptualizes a proper UBI scheme, pointless toil is how most of the non-elite live. I have yet to hear of a real alternative for these displaced artists to move towards.
So what? So we all just become managers in meetings in 30 years?
AI runs on some of the most power-hungry and expensive silicon on the planet. Comparing a GPU cluster to a printing press and then saying the GPU cluster is not limited by natural resources is just silly. Where do the materials come from to make the processors?
The same can be true for AI as well. I could see a picture and then ask AI whose style it is. Then I could go look up more work by that artist, increasing their visibility.
> - they can make commissions faster? To begin with, commissions aren't a sustainable business. Even if they 5x'd their output and somehow kept the same prices, they wouldn't be living well. But in reality, they will get less business, as people will AI their "good enough" art and probably won't pay as much for something not fully hand drawn.
Is this a complaint that something got cheaper to make? This one affects more than just artists. For instance, code output quality from LLMs is quite high. So wages across the board will decrease, yet capabilities will increase. This is a problem external to AI.
Again, it's not just artists, and the path forward is the same as it’s always been with technological advancements: increase your skill level to above the median created by the new technology.
Probably mined by 3rd-world slaves (in the literal "owning people" sense). But still, these servers already exist and scale up way more than a tree.
Sure, and you can use p2p to download perfectly legal software. We know how the story ends.
It's a complaint that people, even with more efficiency, still can't make a living, while the millionaires become billionaires. I'm not even concerned about software wages. Some principal SWE going from 400k to 200k will still live fine.
Artists going from 40k to 40k (but now working more efficiently) is exactly how we ended up with wages stagnating for 30 years. And yes, it is affecting everyone, even pre-AI. The median is barely a living wage anymore, which is what "minimum wage" used to be.
If we lived in a work optional world I don't think many would care. But we don't and recklessly taking jobs to feed the billionaires is just going to cause societal collapse if left unchecked.
Why do you think the device you're using to make this comment is better than a GPU?
Was your comment written by some new flamebait AI? You say so many arguments that fall apart under the slightest examination
It's like you are mad at gravity. That sucks you feel that way, but very unlikely to change anything.
It'd be a good point if it weren't for the fact that search engines didn't exist until Google, because of technology, and that courts didn't need to consider the issue until then. So where does your point get us? We are here now.
Search engines are an index, which have existed for centuries.
When I was in university, I remember there was this humanities professor who had a concordance for the Iliad on his shelf. As a CS person, it was so cool to see the ancient version of a search engine.
"search engines didn't exist until google" - you might want to, uh, google that
These models are not conscious; they’re not acting on their own. If I make art using a generative model, it’s no more the model doing it than it is the sketchbook doing it when I use that. I’m making art using some tool; sometimes that tool is more or less powerful. But I’m the one doing it.
What about ML models that only publish 1 or 2 books a year?
Is it really about volume?
Did you not pay them when you bought their book to read it in the first place? That dead trees don't lend themselves to that sort of payoff is a limitation of the technology. In music, sampling is a well-accepted mechanism for creating new music, and the original authors of the sampled music do get paid when the new work is used.
No, I bought the books used for 25 cents at a local booksale, and the authors did not benefit from my secondary market transaction.
But they did. The presence of a secondary market for used books increased the value of some new books. People buy them knowing that they might one day recoup some costs by selling them. Would people pay more, or less, for a new car if they were told they could never sell or trade it away as a used car?
Gee I don't know, but I'm glad that digital goods do not incur the same material costs as a car. "You wouldn't download a car", we've come full circle.
Lol, did an AI write this? Literally no one buys books because they might one day recoup a fraction of the sticker price on the secondary market.
Baffling
You got that via the legal "first sale doctrine" which has been killed for digital works.
It's a tough issue to correlate to physical goods, especially when you realize that people sometimes donate books.
"In 2012, the Court of Justice of the European Union (ECJ) held in UsedSoft GmbH v. Oracle International Corp that the first sale doctrine applies to used copies of [intangible goods] downloaded over the Internet and sold in the European Union." [0]
Arguably the U.S. courts are in the wrong here. We can only hope first sale doctrine is extended to digital goods in the U.S. in the future, as it has been in the EU for over a decade.
[0] https://scholarlycommons.law.northwestern.edu/cgi/viewconten...
How many books can you write per second?
How many books per second can you read to influence and change your personal style?
I don't think any person who has actually worked on anything creative in their life would compare a personal style to a model that can output in nearly any style at extreme speed. And even if you're inspired by a specific author, invariably what happens is that your work becomes a mix of yourself plus those influences, not a damn near-copy.
With visual mediums it's even worse, because you have to take the time [months, years] to specialize in that specific medium/style.
On my laptop, using modern tools backed by AI? ... many.
Thanks now to AI, hundreds. I can plug the output of the book-reading AI into the input of the tool I use to write my books and thereby update my personal style to incorporate all the latest trends. Blame the idiots who are paying me for my books.
So, zero. You yourself: zero.
You completely ignored the premise of the question.
You should read the response more carefully. Generative models are just tools. If I use one to write a story it’s no less a story that I wrote than if I’d chiseled it into a Persian mountainside.
It pretty clearly is. Less of a story that is.
This is clearly a bad-faith response to the point that the GP was making
I don't think anyone who has ever read a novel in their life would say that an AI can write literature at all, in any style.
The obvious solution is to just treat it as if a human did it. If you did not know the authorship of the output and thought it was a human, would you still consider it copyright infringement? If yes, fair enough. If no, then I think it is clearly not a "damn near-copy."
Put differently - if you perfectly memorise Harry Potter, write it down into a book and sell it, you'll get into trouble.
Right, I don't think anyone disagrees with that.
The question is about someone/something writing a book _influenced_ by Harry Potter -- do they owe JK Rowling royalties?
That depends on a variety of factors. You may find yourself in trouble if you write about a wizard boy called Perry Hotter going to Elkwood school of magic and he ends up with two sidekicks (a smarter girl and a redhead boy).
It could be argued quite convincingly that stories like Brooks's Shannara and Eddings's Belgariad are LOTR with the serial numbers filed off, but there is more than enough difference in how the various pieces work for those series to be unique creations that neither infringe on the properties nor hew too closely to the story. (Although I cringe at putting the execrable Belgariad books in any class with either LOTR or Shannara.)
The "best" modern example of this is the 50 Shades series. These are Twilight fan fiction (it is acknowledged as such) with the vampire bits filed off. They are inspired by Twilight, but they are not identifiably Twilight in the end. It might be hard to tell the quality of writing from that which an LLM can produce, and frankly Anne Rice did it all better decades before (both vampires and BSDM).
Humans can be influenced by writers, artists, etc. LLMs cannot. They can produce statistically approximated mishmashes of the original works themselves, but there is no act or spark of creation, insight, or influence going on that makes the sort of question you’re asking silly. LLMs are just math. Humans may be just chemistry, but there’s qualia that LLMs do not have any more than `fortune` does.
I'm with all your other arguments ... but not this point. What is the special magic property that machine-generated art doesn't have? Both human- and machine-generated art can be banal, can be crap. And I think there is plenty of machine-generated art that is quite beautiful, and if well prompted, even very insightful. Non-GenAI art can be this way too: Conway's Game of Life has a quality of beauty to it that rivals forms of modern art (a tiny sketch follows this comment). If you wanted to argue that there is still the need for a human to provide some initial inspiration as input, or programming, before something of value can be generated, then I would agree, at least for now, though there is a meta-argument about asking LLMs to generate their own prompts that makes this an increasingly gray area.
But I don't think the stochastic parrot argument holds water. Most _human_ creation is derivative. A unique mix of pre-existing approaches, techniques, and substance often _is_ the creative act. True innovation with no tie to existing materials seems vanishingly rare to me; it is a really high bar that most humans never clear.
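To make the Game of Life aside concrete, here is a minimal sketch in plain Python with NumPy (the grid size, density, and seed are arbitrary choices of mine): two simple update rules, no training data, and patterns emerge that many people find beautiful.

    import numpy as np

    def step(grid):
        # count each cell's eight neighbors, with wraparound edges
        n = sum(np.roll(np.roll(grid, dy, 0), dx, 1)
                for dy in (-1, 0, 1) for dx in (-1, 0, 1) if (dy, dx) != (0, 0))
        # a cell lives next tick with exactly 3 neighbors, or 2 if already alive
        return ((n == 3) | ((grid == 1) & (n == 2))).astype(int)

    grid = (np.random.default_rng(1).random((20, 40)) < 0.3).astype(int)
    for _ in range(50):
        grid = step(grid)
    print("\n".join("".join("#" if c else "." for c in row) for row in grid))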
I hope you paid for the book you read.
If OpenAI paid usage fees for the training material for every user it generates content for, it would never be profitable, and artists would be fine. But as it stands, even all the shares are owned by people who have given this system none of its knowledge.
In that case, good? I thought that, if nothing else, the past year or two would teach companies about sinking money into unsustainable businesses and then price gouging later (I know it won't; the moment interest rates fall we are back to square one). If it isn't profitable, scale down the means of production (which may include paying C-class executives one less yacht per year, tragic), charge customers more upfront, or work out better deals with your third parties (which in this case means the artists).
I also find some schadenfreude in that these companies are trying to sell "fewer employees" to other companies but would also benefit from said scaling down, even as they throw out defenses of "we can't afford to pay for every copyright".
Is your mishmash going to be a literal statistical model built on top of those other stories?
There are two problems with this (very common) line of argument.
First, the law is pretty clear that yes if your story is too similar to another work, they have rights. Second, it's not at all obvious we can or should generalize from "what a human can do" and "what a bunch of computers can do" in areas like this.
You are not an AI model, and AI models are not human authors, so your comparison is invalid and question irrelevant.
Did you read the books with the intent to incorporate their ideas into your head and profit off of this?
You are not a machine.
Have you noticed that authors and artists love sharing their inspirations? Let's say you're an up-and-coming author. In an interview, you list your sources of inspiration.
Using your logic, why does the creative community celebrate you and your inspirations instead of crying foul like they are with LLMs?
I feel like the keyword is 'almost' and then you begin pulling on that thread:
How closely is this the case? What blind spots exist? How do you measure this? What capacity for original idea generation does the human mind have, and how does it inspire a unique spin?
This is one of those areas where 'thought experiments' are never going to pass muster against genuine experiments with metrics, trials, and robust scientific research.
But with the stakes as they are, I don't have faith that a good-faith dialogue exists in this arena.
But it's also going to be affected by the teachers you had in pre-school, the people you hang around with, your relatives, films you've seen, adverts you watched, good memories and bad memories of events. You bring your lived experience to your story, and not just a mish-mash of stories in a particular genre, but everything.
Whereas when you train a model, you know the exact input, and that exact input may be 100% copyright material.
Ah, just like humans who train against the output of other humans. AI models are not fundamentally different in kind in this regard, only scope, and even that isn't perfectly obvious to me a priori.
Going by this logic, why is OpenAI forbidding use of the content it generates for training other models?
Well, mostly because of corporate greed of ownership. But the underlying issue is that training AI on AI output is a recipe for ruining the entire training set, at least in these early stages.
Not just greed: they want to silence copyright holders whose works they freely use, and at the same time prevent others from using theirs. It is like having a different set of rules just for them. I don't believe training itself is ruining anything; it is the proposed model of value capture and the marginalizing of content creators that poses the greater threat.
Yes, you've condensed the problem on display quite well here. It's not even just hypocrisy, but also short-sighted behaviour.
Artists will learn not to trust the web, if they haven't already. The best time to train a model was yesterday; eventually no novel ideas, expressions, or art will prosper on the "open" web. Just a regurgitation of some statistical idea of words and pixels.
They can write whatever they want in their Terms of Service. That's the logic.
That doesn't mean that courts will meaningfully enforce it for them.
I understand that, only pointing out the hypocrisy
Because any company, hypocrisy be damned, will use every legal lever at their disposal to protect their business model.
Hope we are not normalizing hypocrisy, usually it is very destructive.
AI models are fundamentally different because a computer is a lump of silicon which is neither a moral subject nor object. A human author is a living sentient being that needs to earn a living and is deserving of dignity and regard.
I'm sorry, but I'm going to fundamentally disagree with you. One does not get a morality pass because "the computer did it". People are creating these AI models, selecting data and feeding the models data on which to be trained. The outcome of that rests upon _both_ the creators of the models and the users prompting the models to achieve a result.
To make it even more stark, that logic amounts to saying people don't kill people, it's the gun that does it.
Oh, right. It just reads a million books in a couple of days, removes all the source information, mixes and matches it the way it sees fit, and sells this output for $10/month to anyone who comes along with a credit card.
It's the same thing with GitHub's copilot.
A book publisher would seize everything I have and shoot me in a back alley if I did 0.0001% of this.
Yeah, fair use implicitly relies on the constraints of a typical human lifetime and ability to moderate how much damage is done to publishers. That wasn't an issue until recently, as humans were the only ones who could create output under fair use.
Authors Guild, Inc. v. Google, Inc. strongly disagrees with you on that (the "Google Books case").
Humans usually add their own style to things, and it’s hard to discuss copyright without that larger context along with the question of scale (me making copies of your paintings by hand is not as significant a risk to your livelihood as being able to make them unimaginably faster than you can at much lower cost). Just as rules about making images of people in certain ways or places only became critical when photography made image reproduction an industrial-scale process, I think we’ll be seeing updates to fair-use rules based on scale and originality.
Humans can also come up with their own styles and can draw things they’ve never seen, which ML models as they currently exist are not capable of (and likely will never be). A human artist who has lived their entire life in the wilderness and has never trained themselves with the work of another artist will still be able to produce art with styles produced entirely by personal experimentation.
ML models have a long way to go before comparisons to humans make any kind of sense.
Humans who train on material usually buy a book of it, pay for entry to an exhibition, or even pay to own an original.
Or maybe the work is released for free, possibly supported by ads, at least so the creator gets some recognition and a job if it is well received.
Yeah, and creating derivative work without permission is against the law.
[citation needed]
Of course they are fundamentally different. They don't get to decide what to absorb.
Humans that make those decisions, correspondingly, should pay the price.
I really don't get why so many people seem to think that an AI model training on copyrighted work and outputting work in that same style is exactly the same thing (morally, ethically, legally, whatever dimension you want) as a human looking at copyrighted work and then being influenced by that work when they create their own work.
The first thing is the output of a mathematical function as computed by a computer, while the second is an expression of human creativity. AI models are not alive. They are not creative. They do not have emotion. These things are not even in the same ballpark, let alone similar or the same.
Maybe someday AI will be sophisticated enough to be considered alive, to have emotion, and to be deserving of the same rights and protections that humans have. And I hope if and when that day comes, humanity recognizes that and doesn't try to turn AI into an enslaved underclass. But we are far, far from that point now, and the current computer programs generating text and images and video are not exhibiting creativity, and do not deserve these kinds of protections. The people creating the art that is used to feed the inputs and outputs of these computer programs... those are the people that should have their rights protected.
As others have already pointed out, that's not how human artists learn or produce art. Everyone who uses this brain-dead argument outs themselves as someone who knows nothing about the subject.
The heck are you on about?
Have you ever tried to "train a human"?
They don't work that way, not unless your "training" involves some weird torture stuff you probably shouldn't be boasting about.
Maybe try asking some teachers (of both adults and children) how it works with people...
There's a hell of a lot of money to be made from this belief, so of course the HN crowd will hold it.
Some of us here who have been around the copyright hustle for a little longer laugh at this bitterly and pray that the courts and/or Doctorow's activism saves us. But there's so much money to be made from automated plagiarism, and the forces against it are so weak, that there is not much hope.
The world will be a much, much poorer place once all the artists this view exploits stop making art because they need to make a living.
See https://twitter.com/molly0xFFF/status/1744422377113501998 and https://i.imgur.com/zOOcPCi.jpg
Generative models are just a tool. Artists are mad because this tool empowers other people, who they view as less talented, to make art too.
The camera and 1-hour film developing didn’t destroy oil paintings, it just enabled more people to have control over what was on their walls.
Sure. It's just a tool. That need other people's art to work.
If it's "just a tool" in and of itself, then there's no problem keeping it away from other people's art.
Because the copyright laws were extended to include photographic reproduction of art as something you need to obtain permission (and a license) for.
The same needs to happen for generative AI.
A photocopy machine is just a tool too. So is the printing press.
So does a human brain.
Which brings us to the other side of the reasoning: tools like Midjourney and OpenAI enable idiots (when it comes to drawing/animating ... that includes me) to create engaging artwork.
Recently, generating artwork like that went from almost impossible to "great, but easily recognizable as AI artwork". Frankly, I expect the discussion will end when it stops being recognizable.
I hate Andreessen Horowitz's reasoning, but they're right about one thing: once we have virtual artists that are not easy to distinguish from "real" ones, the discussion will end. It does not really matter what anyone's opinion on the matter is, as it will not make a difference in the end.
A major difference between a human training himself by looking at art, and a computer doing it, is that the human ends up working for himself, the computer is owned by some billionaire.
One enhances the creative potential of humanity as a whole, the other consolidates it in the hands of the already-powerful.
Another major difference is that a human can't use that knowledge to mass-produce that art at a scale that will put other artists in the poorhouse. The computer can.
Copyright exists to benefit humanity as a whole... And frankly, I see no reason for why a neural network's output should be protected by copyright. Only humans can produce copyrightable works, and a prompt is not a sufficient creative input.
Visual artists cannot create without tools. Whether that tool is a brush and paint, a camera, or a neural network.
Whether an artist pays for a subscription to OpenAI or buys paint pots on Amazon.com, money is going to a billionaire either way; that is not a difference between AI and other art.
You are also ignoring the existence of non-commercial, open-source AI models; they exist.
Regarding copyright, we copyright output not input. Otherwise most photography would be uncopyrightable.
One small nitpick: It is completely possible for an artist to make all of their own tools, and indeed for the majority of history that is exactly how things went.
But today the artist that can also create a robust version of photoshop on their own doesn’t really exist. Maybe some can write code to that level but certainly not a majority and it’s certainly not the same as sanding wood to make a paintbrush.
Ok, here's a pile of sand, the goal is 1. a computer and 2. an AI to run on it. Go!
(spoiler: bootstrapping yourself up the tech tree gets progressively harder)
There's a substantive difference in whether the artist is using the tool, or the tool works on its own. A paintbrush doesn't produce a painting by itself, a human needs to apply an incredibly specialized creative skillset, in conjunction with the paintbrush to do so.
An LLM takes a prompt and produces a painting. No sane person would say that I 'drew' the painting in question, even if I provided the prompt.
We copyright things that require creative input. A list of facts or definitions did not require creative input, and is therefore not copyrightable.
Using an LLM does not meet the bar for creative input.
That sounds like a kinder restatement of the opinion at the top of the thread: "Artists are mad because this tool empowers other people, who they view as less talented, to make art too."
Artists might not like the phrasing, but scratch the surface and there's a degree of truth there. It's an argument from self-interest, at core.
Your brain is a neural network, just FYI.
If you graduated from school and only used work that was public domain, would you have all the knowledge you currently have? Have you learned anything from anybody since graduating?
Where is the line? It’s ok for humans to learn from others work but not a machine?
Yes.
The machine doesn't get to make its own choices. Once it does, we'll have a different conversation.
Presently, humans decide what goes into the training set, and what comes out. Those humans are the ones that we need to regulate.
Hot take: The LLM is learning as much as a ZIP file is learning.
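For what it's worth, the analogy can be poked at directly. Here's a toy illustration using only the Python standard library (the sample sentence is an arbitrary choice of mine): a general-purpose compressor also "adapts" to the statistics of its input, so text it has effectively seen before costs almost nothing to encode again.

    import zlib

    passage = b"All happy families are alike; each unhappy family is unhappy in its own way. "
    repeated = passage * 100  # the same passage over and over, like redundant input

    print(len(zlib.compress(passage)))   # compressed size of one copy
    print(len(zlib.compress(repeated)))  # far less than 100x that: repeats are nearly free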
No, I have to disagree here. I'm not an artist, but I respect the creations of others. OpenAI does not. They could have trained on free data, but they did not want to because it would have cost more (finding humans to find/sanction said data, etc.).
I literally met and worked with Doctorow on a protest back in 2005, so I'm not exactly new to this. I also think that the only way you could have written your comment was by grossly misinterpreting my comment.
I hope the idea of Intellectual Property as a whole is thrown out the window and copyright with it.
That is a pretty unfair response when you're skipping the part about how commercializing should not be fair use.
I don't think this is as cut-and-dried as you make it out to be here. If I train a model on, say, every one of the New York Times' articles and release it for free, and it finds use as a way of circumventing their paywall, I have difficulty justifying that as fair use/fair dealing. The purpose/character of the model should indeed be a factor, but certainly not nearly as dispositive a one as I think you're suggesting.
To the extent that training the model serves a research purpose I think that the general use / public release of the trained model does not in general serve the same research purpose and ought to have substantially lower protection on the basis of, e.g., the effect on the original work(s) in the market.
Wouldn't that depend on the use case? If you just had the model regenerate articles that roughly approximate its source material, that is a much more clear-cut violation of a paywall. But if you use that data as general background knowledge to synthesize aggregative works, such as a history of the Vietnam War, or trends in musical theatre in the 1980s relative to the 1970s, or shifts in the usage of formal honorifics, then those seem to me to be clearly fair-use categories. There are gray areas, such as aggregating the opinions of a certain op-ed writer over a short timeframe: while that might produce a novel work, it is basically a mishmash of recent articles. But would that be unfair, especially if not done in the original author's style?
Technical distinctions like these will probably matter in whatever form regulation eventually takes.
Quite a lot of what news publications like the New York Times do is precisely regenerating articles that roughly approximate source material from some other publication. If I remember rightly, a lot of smaller, more local news organisations aren't happy about this because of course it's a more or less direct substitute for the original and a handful of big news organisations (particularly the New York Times) are taking so much of the money that people are willing to pay for news that it's affecting the viability of the rest - but it's not illegal, since facts cannot be copyrighted.
Yes, I think this is a rather fact-specific inquiry. My main point is that the research/commercial distinction is not the only factor (and not even the most important one).
I don't think this is clear. If someone were to train a model on several books about the Vietnam War and then publish their own model-created book on the history of the Vietnam War, I would be inclined to say that that is infringement. And if they changed the dataset to include a plurality of additional books which happen to not be about Vietnam, I don't think that changes the analysis substantially.
I think it is hard to earnestly claim that in that instance the output (setting aside the model itself, which is also a consideration) is transformative, and so I would think, absent more specific facts, that all four fair use factors are against it.
So what about the fact that these cartoons look like Keith Haring meets Cathy Guisewite meets Scott Adams? These cartoons are artistically derivative. They are obviously not derivative from the perspective of copyright as style is an idea, not an expression.
These models were not trained on just the cartoonist in question, nor just their inspirations. The intent was to train on all images and styles. The expression of the idea using these models is not going to match the expression of the idea of all images, even those conforming to a certain bounded prompt.
For the life of me I can't get DALL-E or Stable Diffusion to produce anything like Cat and Girl, nor anything coherent for the above-mentioned inspirations. DALL-E flat out refuses to create things in the style of the above, and Stable Diffusion has insane-looking outputs, overwhelmed by Haring.
Most importantly, copyright is concerned with specific works that specifically infringe, and whose damages are either statutory or based on quantifiable earnings from infringement. Copyright does not cover all works, especially when, again, the intent is to learn all styles, which rarely, if ever, reproduces direct expressions.
The only point at which these images are directly copied is when they are in the machine's memory, which already has case law allowing it, followed by back propagation that begins distilling the direct copies into their underlying formal qualities (a toy sketch follows this comment).
It seems like a lot of people are going to be upset when the courts eventually rule in favor of the training and use of these models, not least because the defendant has a lot of resources to throw at a legal team.
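As a purely illustrative aside, and emphatically not how any real image model is trained, here is a toy gradient step in Python/NumPy showing the "transient copy" point above: the image sits in memory only for the duration of the update, and afterwards only shifted weights remain. All names, shapes, and the learning rate are invented for this sketch.

    import numpy as np

    rng = np.random.default_rng(0)
    weights = rng.normal(size=(64, 64)) * 0.01   # the model's parameters

    def training_step(weights, image, lr=1e-4):
        # toy reconstruction loss for a linear model: prediction = weights @ image
        error = weights @ image - image
        gradient = error @ image.T                # "back propagation" for this toy model
        return weights - lr * gradient            # the weights shift; the image is not stored

    image = rng.normal(size=(64, 64))             # stand-in for one training image
    weights = training_step(weights, image)       # after this call, only the weights remain
    del image                                     # the direct copy existed only transiently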
Your argument is that it's not infringing because they copied everything at once?
I get that there's case law on copying in memory on the input side not being infringing, but I can't for the life of me understand how they get away with not paying for it. At least libraries buy the books before loaning them out; OpenAI and Midjourney presumably pirated the works or otherwise ignored the licenses of published works, and just say "if we found it on the internet, it's fair game"
Libraries loan out specific books!
Can you explain what you mean by "qualifies as free use?" I've never heard that term before.
I would be reasonably confident they mean "fair use" [1] instead.
[1]: https://en.wikipedia.org/wiki/Fair_use
The poster probably meant "fair use", which is an American term of copyright law. The UK, Canada, and other Commonwealth countries have a concept known as "fair dealing", which is similar to, but distinct from, fair use[1]. EU copyright law has explicitly permitted uses[2], which are exceptions to copyright restrictions. Research is one of them, but it requires explicit attribution.
[1] https://library.ulethbridge.ca/copyright/fairdealing/#s-lib-... [2] https://www.ippt.eu/legal-texts/copyright-information-societ...
One could argue that you're paying for the resources required to run the model rather than paying for use of the model itself.
Depends; some models are not freely available, and in that case you very much pay for access to the model.
I think it's worth noting that one of the things that makes this question so vexing is that this topic really is pretty novel. We've only had a few machines like this in history and almost no legal precedent around how they should be treated. I can't remember anyone ever bringing suit over a Markov chain engine, for example, and fabricating one is basically "baby's first introductory 'machine intelligence' project" these days (partially because the output sucks, so nobody has ever felt they have something to lose from competing with a Markov engine; see the sketch after this comment).
Existing copyright precedent serves this use-case poorly, and so the question is far more philosophical than legal; there's a good case to be made that there's no law clearly governing this kind of machine, only loose-fit analogies that degenerate badly upon further scrutiny.
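To give a sense of how low that bar is, here is roughly what such a "baby's first" word-level Markov chain engine looks like in Python; the toy corpus and starting word are invented for illustration, and the output quality is as bad as advertised.

    import random
    from collections import defaultdict

    def train(corpus):
        words = corpus.split()
        table = defaultdict(list)
        for a, b in zip(words, words[1:]):
            table[a].append(b)  # duplicates weight the random draw below
        return table

    def generate(table, start, length=20):
        word, out = start, [start]
        for _ in range(length - 1):
            followers = table.get(word)
            if not followers:
                break
            word = random.choice(followers)
            out.append(word)
        return " ".join(out)

    table = train("the cat sat on the mat and the dog sat on the cat")
    print(generate(table, "the"))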
That’s literally what human artists do, and how they work. Art is iteratively building on the work of others. That’s why it’s so easy to trace its evolution.
Because we are humans and our capability of abusing those rights is limited. The scale and speed at which LLMs can abuse copyrighted work to threaten the livelihoods of the authors of those works is reason enough to consider it unethical.
I don’t think it is. What you describe is similar to any other industry disruption, and I don’t think those are unethical. I’d actually argue that preventing disruption is often (not always) unethical, because you artificially prolong an inefficient or inferior alternative.
So you're saying that we should stop pursuing art and prose? Because when you fine-tune Midjourney with 30 or so images from an artist, it can create any image in that artist's style.
You removed the value and authenticity of that artist in 30 minutes, you applauded it, and you defended that it should be the norm.
OK then, we can close down the entire entertainment business and generate everything with AI, because it can mimic styles, clone sounds, animate things with Gaussian splats, and so on.
Maybe we can hire coders to "code" films? Oh sorry, ChatGPT can do that too. So we need a keypad then, which only the most wealthy can press. Press 1 for a movie, 2 for a new music album, 3 for a new book, and so on.
We need 10 buttons or so, as far as I can see. Maybe I can ask ChatGPT 4 to code one for me.
"So you're saying that, we should stop pursuing art and prose?", no, it becomes a hobby like any other. People still sew for fun.
Great, hold on, I'm calling Hollywood to tell them that all they do is a hobby now.
...and the writers' guild, too.
Well obviously AI isn't at the level of replacing Hollywood yet.
But once it is? I mean, yeah, it'll replace Hollywood.
People will tell Netflix, "hey I want a movie about X in the style of Y and I want Z to star in it", and bam -- your own bespoke movie.
I mean, once the capability's there, it's just inevitable. And yeah -- acting will become a hobby, just like sewing is today.
Then people will see how empty and inferior it is and want movies with actual people and writers again.
Then the market will decide, won't it? Why the fuss about generative AI then? If you're so confident about its inferiority, you shouldn't have to worry about it, right? The better product will win, right?
The market does not choose the superior product. It might choose the least common denominator, the cheapest product, the product that got on the market the earliest, or the one with the richest backers, but not "the superior product".
The "superior" product is subjective.
The objectively superior product is the one that people pay for. They are exchanging labor/capital for the item/content.
I could make the best movie ever conceived, the movie to end all movies. If nobody watches it, it has 0 value.
The first part is debatable, unless you qualify it as "superior at making their creator money".
The market selects for that, and only that. Other qualities of the product are secondary, making any statements to the effect of "the best product [outside the context of simply making the most money] will win" misguided at best.
No, because the market isn't fair.
What will actually happen is people will think "meh good enough", shitty AI art will become the norm, and we'll be boiling frogs and not realize how shitty things have become.
Honestly, that'll be boring. I don't want to be a star of a movie, that's not what pulls me in.
I want to see what the person has imagined, what the story carries from the author, what the humans in it added to it and what they got out of it.
When I read a book, I look from another human's eyes, with their thoughts and imagination. That's interesting and life-changing actually. Also, the author's life and inner world leaks into the thing they created.
The most notable example for me is Neon Genesis Evangelion. Its psychological aspects (which hit very hard, actually) are a reflection of Hideaki Anno's clinical depression. You can't fake this even if you want to.
This is what makes human creation special. It's a precipitation of a thousand and one things in an unforeseen way, and this is what feeds us, even though we are not aware of it and love to deny it at the same time.
"This is what makes human creation special.", that's a load of garbage. There is nothing inherently special about human creation. Some AI artwork I've seen is incredible, the fact it was AI generated didn't change its being an incredible piece of art.
Thinking our creation has some kind of 'specialness' to it is like believing in a soul, or some other stupid thing. It's pure hubris.
Actually, I'm coming from a gentler point of view: "Nature and living things are much more complex than we anticipate".
There are many breakthroughs and realizations in science which excite me more than "this thing called AI": Bacteria have generational memory. Bees have a sense of time. Mitochondria (and cells) inside a human body communicate and try to regulate aging and call for repairs. Ants have evolved antibiotics, and expel the ones with incurable and spreadable diseases. Bees and ants have social norms; they have languages. Plants show more complex behavior than we anticipated. I'm not even entering the primates' and birds' territory, because the titles alone would fill a short chapter.
While some of them might be very simple mechanisms at the chemical level, together they make a much more complex system, and the nature we live in is far more sophisticated than we know, or want to acknowledge.
I'm not looking from a "humans are superior" perspective. Instead, I'm looking from an "our understanding of everything is too shallow" perspective. Instead of trying to understand or acknowledge that we're living in a much more complex system on a speck of dust in vast emptiness, we connect a bunch of silicon chips, dump everything we have ever babbled into a "simulated neural network", and it gives us semi-nonsensical, grammatically correct half-truths.
That thing can do it because it puts word after word according to a very complex, weighted randomization learned from how we do it, imitating us blindly, and we think we have understood and unlocked what intelligence is. Then we applaud ourselves because we're one step closer to stripping a living thing of its authenticity and making Ghost in the Shell a reality.
Living things form themselves over a long life with sight, hearing, communication, interaction, and emotions, at the very least, and we assume that a couple of million lines of code can do much better because we poured in a quadruple-distilled, three-times-diluted version of what we have gone through.
This is pure hubris if you ask me, if there's one.
An artist's style is not copyrightable, at least in the US.
And if they changed that because of "AI"? My word, the lawsuits that would arise between artists...
Doesn't matter. You pay the artist for their style of rendering things. Consider XKCD, PHD Comics, Userfriendly, etc. At least 50% of the charm is the style, remaining 50% is the characters and the story arc.
You can't copyright style of a Rolex, but people pay a fortune to get the real deal. Same thing.
Artists imitate/copy other artists as a compliment, at least in the illustration and comics world. Unless you do it in bad faith, I don't think artists are going to do that. Artists have a sense of humor to begin with, because art is making fun of this world, in a sense.
No, you pay them for the finished product. The STYLE is independent. Lots of artists have similar styles. They don't all pay each other for copying their styles.
Every artist has their own style, because it's their way of creating the product.
Pixar, Disney and Dreamworks have different styles, same for actors, writers, and designers, too. You can generally tell who made what by reading, looking, listening, etc.
I can recognize a song by Deep Purple or John Mayer or Metallica, just by their guitar tone, or their mastering profile (yes, your ear can recognize that), in a couple of seconds.
If style were that easy, we could have 50 Picassos, 200 John Mayers, and 45 Ara Gulers (a photographer) whom you couldn't tell apart, but it doesn't work that way.
XKCD had a couple of guest artists fill in for personal reasons. It was very evident, even though the drawing style was the same.
People, art, and hand-made things are much more complex than they look. Many programmers forget this because everything is rendered in their favorite font, but no two hand-made things are ever the same. Have the same recipe from two different cooks, even if you measure out the ingredients and hand them over beforehand, and you'll get different tastes.
Style is a reflection of who you are. You can maybe imitate it, but you can't be it.
Heck, even two people implementing the same algorithm in the same programming language don't write the same thing.
Isn't this an argument that AI-generated artwork will never be more than a lesser facsimile? That'd suggest that human-made works will always be more sought-after, because they're authentic.
It will be, and human-made things will always be better and more sought-after; however, capitalism doesn't work that way.
When the replacements become "good enough", they'll push out the better things by being cheaper and 90% of the way there. I have some hand-made items and they're a treat to hold and use. They perform way better than their mass-produced counterparts, they last longer, they feel human, and no, they're not inferior in quality. In fact it's the opposite, but most of them are not cheap, and when you want to maximize profits, you need to reduce your costs, ideally to zero.
Do you really feel that way universally? Would it be ethical to disrupt the pharmaceutical industry by removing all restrictions around drug trials? Heck, you could probably speed things up even further if you could administer experimental drugs to subjects without their consent.
Obviously this is a bit facetious, but basing your ethical framework on utilitarianism and _nothing_ else is pretty radical.
If having those restrictions makes the world worse overall, then it would be ethical to remove them. But I assume the restrictions are designed by intelligent people with the intention of making the world better, so I don’t see any reason to think that’s the case.
I agree that the current crop of artists are worse off with AI art tools being generally available. But consumers of art, and people who like making art with AI art tools, are better off with those tools being available. To me it’s clear that the benefit of the consumers outweighs the cost to the artists, and I would say the same if it was coders being put out of jobs instead. You can prove this to yourself by applying it to anything else that’s been automated. Recording music playback put thousands of musicians out of work, but do you really regret recorded music playback having been invented?
P.S. Adobe firefly is pretty competent and is only trained on material that adobe has the license to. If copyright were the real reason people didn’t like AI art tools, you would see artists telling everyone to get Adobe subscriptions instead of Midjourney.
Worse how? As defined by whom?
You could make a pretty compelling argument that "the world" would be better off by, e.g., forcing cancer patients through drug trials against their will. We basically could speed run a cure to cancer!
These longtermist, ends justify the means, ideas can easily turn extremely gross.
Yes, that is true. I 100% agree. It is needed without a doubt.
For one moment, let's think of it this way. You are a 20-year experienced engineer making whatever money you are making. Suddenly, your skills are invalidated by a new disruption. And you have a friend in the same situation.
Fortunately for you, luck played out and you could transition! You found a way into life, meaning and value. Your joy and your everyday life continued as it is.
But the other friend enjoyed the process, liked doing what they were doing, and there was no suitable transition for them. Humans are adaptable, but to them, nothing mattered because existence no longer offered any value. The sole act of doing was robbed WITHOUT ANY ALTERNATIVE. The experience and value of a person were rendered worthless.
Can you relate to that feeling? If yes, thank you.
If no, your words are empty and hold no value.
Artists went through a similar phase with the invention of photography. Now it is rather soul-crushing, because anything an artist makes can easily be replicated, making the whole artistic journey moot.
Being sympathetic towards those people doesn't mean you should bend to their will if you don't believe it's the right thing to do. I can be sympathetic to a child who cries over not being able to ride a roller coaster because they aren't tall enough without thinking the height requirement should be removed.
I think the big difference is that it's not a direct replacement - it feeds off of the existing people while making it much harder for them to make a living.
It would be as if instead of cars running on gasoline, they ran on chopped up horseflesh. Not good for the horses, and not sustainable in the long term.
Don't even try to stop my grocery-store-sample-hoarding robot army, Wegmans! You're being unethical in your pathetic attempt to prevent your sampling disruption!
Some "disruptions" are unethical, some are not. It's about what they actually consist of. Labelling many things as "industry disruption" abstracts beyond usefulness.
Are photocopy machines illegal? Are CD-ROM burners illegal? Both allow near-unlimited copies of copyrighted material at a scale much faster than a human could do alone.
The tools are not the problem, it's how humans use them.
They can be used in an illegal way if used to copy copyrighted material, yes.
And are the burners themselves illegal because you can name illegal uses for them?
No, and I don't think anyone is arguing that LLMs should be illegal either.
I personally am not against LLMs training on things the operator has rights to, and even training on copyrighted things, but I am against it laundering those things back out and claiming it's legal.
Same as an LLM, they can be used in an illegal way if used to copy copyrighted material. So I can't tell it to reproduce a copyrighted work. But it can create new material in the style of another artist.
The difference is that the LLM is still copying copyrighted material in your case, but if I burn a Linux ISO, that is not happening.
You do not have to produce an exact copy of something to violate copyright, and I think anything the LLM outputs is violating copyright for everything it has ever trained on, unless the operator (the person operating the LLM and/or the person prompting it) has rights to that content.
"abusing those rights" is a subjective phrase. What about it is "abuse"? If I learned how to draw cartoon characters from copying Family Guy and released a cartoon where the characters are drawn in a similar style, would that be abuse (assuming my show takes some of Family Guy's viewership)? Is your ethical hangup with the fact it's wrong to use the data of others to influence one's work (which could potentially be an algorithm) or that people are losing opportunities based on the influenced work?
If it's the latter how do we find the line between what's acceptable and what's not? For example, most people wouldn't be against the creation and release of a cure for cancer developed in this way. It would lead to the loss of opportunities for cancer researchers but I believe most people would deem that an acceptable tradeoff. A grayer area would be an AI art generator used to generate the designs for a cancer research donation page. If it could potentially lead to a 10% increase in donations, does that make it worth it?
Intellectual property law does presently restrict the development of cancer treatments and demands in many cases exorbitant royalties from patients and practitioners, so I'm not convinced that this is accurate. If people believed that the loss of opportunities would constrain innovation in the field of cancer research, I think they'd expect the AI users to pay royalties as well.
This comes down to the product of AI.
If the AI produces a cancer treatment identical to what is already covered by patent, I think commercialization would be contingent on the permission of the IP holder.
If the AI produced a novel cancer treatment, using a transformative synthesis of available knowledge, most people would not expect royalties.
I never made a legal appeal in my previous comment so legalities are irrelevant. It also differs from my argument on derivative/transformative works rather than specific works.
What I was questioning was whether people would think it's morally right or not to generate inspired works. For example, if someone made an algorithm to read the relevant papers and produce a cancer treatment that addresses the same areas/conditions as a method under IP law but doesn't replicate the exact method, I don't see that as a morally wrong action by itself.
Because we are humans and our capability of abusing those rights is limited. The scale and speed at which looms can abuse copyrighted work to threaten the livelihoods of the seamstresses of those works is reason enough to consider it unethical.
Replace loom with printing press, etc. Do you realize you're a Luddite?
Ned Ludd was onto something. He wasn't anti-progress. He was anti-labour theft. The problem was not that people were losing their jobs, but that they were being punished by society for losing their jobs and not being given the ability to adapt, all to satisfy the greed of the ownership class.
I am hearing a strong rhyme.
Commercialized LLMs are absolutely labour theft even if they are useful.
Capitalism has really done a number on the human psyche. WE WANT OUR LABOR STOLEN. That's the whole point, so we don't have to labor anymore.
It boggles my mind how warped people's thinking is.
We do not want our labour stolen. We want to labour less, and we want to be fairly compensated for when we have to labour.
The Luddites and the original saboteurs (from the French sabot) had a problem where the capital class invested in machines that let them (a) get more work done per person, (b) employ fewer people, and (c) pay those fewer people less because now they weren't working as hard. The people they fired? They (and the governments of the day — just like now) basically told them to go starve.
Yes, we want to work less. But fair work should result in fair compensation. Ultimately, this is something that the copyright washing of current commercialized LLMs cannot achieve.
But unethical =/= illegal, unfortunately.
That is very much fortunate.
Does copyright law say you can ingest copyrighted work at very large scale and sell derivatives of those works to gain massive profit / massive market capitalization? Honestly wondering. This seems to be the crux of the issue here.
Like how Google has parsed webpages of content to develop their page rank algorithm for searching on the web? I'm assuming it does.
Google is not producing something that competes with or is comparable to what it's parsing and displaying, which makes it very different.
Google is displaying the exact content and a link to the source, and is functioning as a search engine.
Copying music (or whatever), and then outputting music based on the copied music is not the same thing as a search engine, it's outputting a new "art" that is competing with the original.
Another way to put it, is that you can't use a search engine to copy something in any meaningful way, but copying music to produce more music is actually copying something.
The goal of my post was not to explain what differentiates Google search from LLMs and other generative models; it was to respond to the original post above.
The reasons why I don't think training on copyrighted data is a problem are stated in my other comments replying to people who have made arguments about its immorality.
Googles search engine is not selling derivative works.
If you search for a Disney movie on Google search, it does not try to sell you a film derived from the movie.
They sell you ad space on full Disney movies (re)uploaded by random people who are not affiliated with Disney though: https://www.google.com/search?q=finding+nemo+full+movie
I can also get Disney coloring book pages directly from Google's cache on Google images: https://www.google.com/search?q=disney+princess+coloring+boo...
Authors Guild, Inc. v. Google, Inc. determined that Google's wholesale scanning and indexing of books is fair use; the books were borrowed from the University of Michigan library, which had paid for them (or a donor had, at some point). Here's a book of bedtime stories available in its entirety: https://www.google.com/books/edition/Picnics_in_the_Wood_and...
No, because crawling the web, ingesting copyrighted content, and ranking them is not a derivative work of that content.
If crawling the web, ingesting copyrighted content, and ranking them is not a derivative work for that content, then using them to change the values of a mathematical expression should also exempt the expression from being a derivative work.
In that case the OP should have never posed this irrelevant question because access to the expression isn't giving access to a derivative work.
It does. See app stores and the endless copy-cat games and apps, right down to the art style.
I sue people for copyright infringement frequently. It's rare that I have a defendant whose defense is "the internet is full of other infringers, why should I be held responsible?" They have never won. This debate would go better if people didn't base it on assumptions they glean from the world around them, but engaged with the actual law instead of specious reasoning like "well, they did it too!!"
I'd love to be the lawyer of the first anime artist then.
Why engage with nearly 200 years of copyright jurisprudence when you can just insist you are right because anime?
No, I can insist they are copying my style, which is anime.
Or maybe you cannot copyright style, and all those apps do stand on solid legal ground?
Style is not copyrightable. Please make an actual effort to engage with copyright law and not just ask me smarmy questions because you think you are right because you've made no efforts past looking at things immediately in front of you.
https://letmegooglethat.com/?q=is+style+copyrightable
Entire industries exist dedicated to such things. News aggregators. TV parody shows. Standup comedians. Fan fiction. Biography writers. Art critics. Movie critics. Sometimes the derivative work even outsells the original, especially when the original was horrible or unappreciated. I have never played Among Us or the new Call of Duty, but I do enjoy watching NeebsGaming do their youtube parodies of them.
No, copyright law prohibits that. The best example so far is Google's image search being considered fair use; notably, it's not commercial in the sense that they do not sell the derivative work, though they might sell ads on the image search results. OpenAI sells their service, which is the result of the copies, i.e., a derivative work. It's also probably true that the AI weights themselves are derivatives of the works they are based on.
Yes, I believe that is correct. If you do something "transformative" with the material then you are allowed to treat it as something new. There's also the idea of using a portion of a copyrighted work (like a quote or a clip of a song or video), this would be "fair use".
It's important to consider in any legalistic argument over copyright that, unlike conventional property rights which are to some degree prehistoric, copyright is a recent legal construct that was developed for a particular economic purpose.
https://en.wikipedia.org/wiki/Intellectual_property#History
The existing standards of fair use are what they are because copyright was developed with supporting the art industry as an intentional goal, not because it was handed down from the heavens or follows a basic human instinct. Ancient playwrights clipped each other's ideas liberally; late medieval economists observed that restricting this behavior seemed to encourage more creativity. Copyright law is a creation of humans, for humans, and is subordinate to moral and economic reasoning, not prior to it.
Copyright only makes any sense for goods with a high fixed cost of production and low to zero marginal cost. Any further use beyond solving that problem is pure rent-seeking behavior.
Also, now that computers are in everything, copyright has become a tool of social control; any function of a physical object can be taken away from you at a whim, with no recourse, so long as a computer can be inserted into the object. Absent a major change in how society sees copyright, I envision a very bleak totalitarian future arising from this trend.
Don't put the cart before the horse. What art will we have for AI to copy if there's no more artists next generation?
The future of good AI art is Adobe Firefly; a tool in a picture editor which gives users great productivity for certain tasks. Artists won’t go extinct; they will be able to produce a lot more art.
That's the future of AI art. But is AI art the future of art? If AI artists can't maintain any profit from their work, how are they going to afford the compute time?
If that's the case, then novels, news articles, digital images, etc. are things that copyright absolutely makes sense for. If you think that they have a "low cost of production", you are sadly misinformed about the artistic process.
Some of these have vanishingly low marginal costs when it comes to reproduction, but in light of their high fixed cost of production, I don't see how that matters.
Novels maybe. News articles, with rare exception, and digital images absolutely do not have high fixed costs of production.
No, copyright only makes sense insofar that it provides a net positive value for society: that it promotes/protects more creativity leading to economic output than it prevents.
That is, does the amount of creative/economic output dissuaded by allowing AI (preventing people who would not be able to or not want to create art if they couldn't get paid) exceed the creative/economic output of letting people develop and use such AIs?
GenAI reduces the fixed cost of creating images/text/whatever which all else equal will increase the amount created. Whether or not you think that is a good thing is probably mostly a function of do you make money creating these things or do you pay money to have these things created.
100% agree. But even then it's not very good. Abolish copyright, severely limit patents, and leave trademarks as they are. The IP paradigm needs an overhaul.
Sorry, but these arguments by analogy are patently ridiculous.
We are not talking about the eons old human practice of creative artistic endeavor, which yes, is clearly derivative in some fashion, but which we have well established practices around.
We are discussing a new phenomenon of mass replication or derivation by machine at a scale impossible for a single individual to achieve by manual effort.
Further, artists tend to either explicitly or implicitly acknowledge their priors in secondary or even primary material, much like one cites work in an academic context.
Also, the claim itself is ridiculous. A: you haven't, nor will you ever, actually do this. B: this is never how the system of artistic practice has worked up to this point, precisely because this sort of activity is beyond the scale of human effort.
In addition, plagiarism exists and is bad. There's no reason that concept can't be extended and expanded to include stochastic reproduction at scale.
If you want a future in which artists don't have a say and capital concentrates even further into the hands of a few technological elites who make their money off of flouting existing laws and the labor of thousands, by all means. But this argument that, by analogy to human behavior, companies should somehow not be responsible for the vast use of material without permission is absolutely preposterous. These are machines owned by companies. They are not human beings, and they do not participate in the social systems of human beings the way human beings do. You may want to consider a distinction in the rules that adequately reflects this distinction in participatory status in a social system.
So your argument is predicated on the scale of inspired work being the problem?
I don't think this adds anything to the argument besides you using it as a reason that analogies with humans can't be used to compare the specific concept of inspired works. I don't think this holds up.
Algorithms participating in social systems has nothing to do with whether inspired works have a moral claim to existence for some. The fact that your ethics system values the biological classification of the originator of inspired works is something that can't be reconciled into a general argument. I could make the claim that the prompt engineer is the artist in this case.
That can be said by the development of any technology. Fear of capital concentration is more a critique on capitalism than it is on technological development.
Technology does not exist in a vacuum. All of the utility and relevance of technology to humans is dependent on the social and economic conditions in which that technology is developed and deployed. One cannot possibly critique technology without also critiquing a social system, and typically a critique of technology is precisely a critique about its potential abuses in a given social system. And yes, that's what I'm attempting to do here.
This is a fair point. One could argue that an LLM, properly considered, is just another tool in the artist's toolbox. I think a major distinction, though, between an LLM and, say, a paintbrush or even a text editor or Photoshop, is that these tools do not have content baked into them. An LLM is in a different class insofar as it is not simply a tool, but is also partially the content.
The use of two different LLMs by the same artist, with the same prompt, will produce different results regardless of the intent of the so-called artist/user. The use of a different paintbrush, by the same artist, with the same pictorial intention, may produce slightly different results due to material conditions, but the artist is able to consciously and partially deterministically constrain the result. In the LLM case, the tool itself is already a partial realization of the output, and that output is trained on masses of works by unknown individuals.
I think this is a key difference in the "AI as art tool" case. A traditional tool does not harbor intentionality, or digital information. It may constrain the type of work you can produce with it, but it does not have inherent, specific forms that it produces regardless of user intent. LLMs are a different beast in this sense.
Law is a realization of the societal values we want to uphold. Just as we can't in principle claim that training of LLMs on scores of existing work is wrong solely due to the technical function of LLMs, we cannot claim that this process shouldn't be subject to constraints and laws due to the technical function of LLMs and/or human beings, which is precisely what the arguments by analogy try to do. They boil down to "well it can't be illegal since humans basically do the same thing" which is a hyper-reductive viewpoint that ignores both the complexities and novelty of the situation and the role of law in shaping willful societal structure, and not just "adhering" to natural facts.
Your original quote was not using the impact of the technology, it was disparaging the algorithmic source of the inspired work (by saying it does not participate in social systems the way humans do).
LLMs, despite being able to reproduce content in the case of overtraining, do not store the content they are trained from. Also, the usage of "content" here is ambiguous so I assumed you meant the storage of training data.
To me, the content of an LLM is its algorithm and weights. If the weights can reproduce large swaths of content to a verifiable metric of closeness (and to an amount that's covered by current law) I can understand the desire to legally enforce current policies. The problem I have is against the frequent argument to ban generative algorithms altogether.
I would counter this by saying the prompts constrain the result. How deterministically depends on how well one understands the semantic meaning of the weights and what the model was trained on. Also, as a disclaimer, I don't think that makes prompts proprietary (for various different reasons).
Assigning "intent" is an anthropomorphism of the algorithm in my opinion as they don't have any intent.
I do agree with your last paragraph, though: one individual's (or even a group's) feelings don't make something legal or illegal. I can make a moral claim as to why I don't think it should be subject to constraints and laws, but of course that doesn't change what the law actually is.
The analogies are trying to make this appeal in an effort to influence those who would make the laws overly restrictive. There are many laws that don't make sense, and logic can't change their enforcement. The idea is to make a logical appeal to those who may have inconsistencies in their value system, to try to prevent more nonsensical laws from being developed.
Difference in scale (order of magnitude) is difference in kind (in every area of life), so yes, scale can be argued as the problem.
I think this is the central issue and is not limited to just AI generated art. Wealth concentrates to the few from each technological development. When robots replaced factory workers, the surplus profit went to the capital holders, not the workers who lost their jobs. AI generated art will be no different but I don't think it will replace the creative art that people will want to make, just the art that people are making to pay the bills.
It’s not replication and that’s all there is to it.
I'm not as interested in making a technical/legal argument as I am in sharing my feelings on the topic (and eventually, what I think the law should be), but during training, copies are made of copyrighted material, even if the model doesn't contain exact copies of the work. Crawling, downloading, and storing (temporarily) for training all involve making copies, and thus are subject to copyright law. Maybe those copies are fair use, maybe they're not (I think they shouldn't be).
My main point is that OpenAI is generating an incredible amount of value all hinging on other people's work at a massive scale, without paying for their materials. Take all the non-public domain work off Netflix and Netflix doesn't have the same value they have today, so Netflix must pay for content it uses. Same goes for OpenAI imho.
Assume I agree that copyright holders should be compensated for their works (because I do in some sense).
How would this compensation work? Let's say a portion of profits from LLMs that were trained on copyrighted work should be sent to the copyright holders.
How would we allocate which portion of the profits go to which creators? The only "fair" way here would be if we could trace how much a specific work influenced a specific output but this is currently impossible and will likely remain impossible for quite some time.
This is what licensing negotiations are for. One doesn't get to throw up their hands and say "I don't know how to fairly pay you so I won't pay you at all".
Your argument is ridiculous, because it could identically be applied to "every human artist should have to pay a license to every artist whose work they were inspired by". That would obviously be a horrible future, but megacorps like Disney would love it.
Every time you view an image online you’re making a copy so that argument is spurious.
I'd feel a lot better about that argument if we had sane copyright laws and anything older than 7-10 years was automatically in the public domain. Suddenly, Netflix is looking a lot more valuable with just public domain works, and there'd be a ton of public domain art to train AI models with. I suspect the technology would still leave a lot of artists concerned in that situation, though, because even once the issue of copyright is largely solved, the fact remains that AI enables people who aren't artists to create art.
Calling it copyright today is a misnomer; it's not actually the act of copying the work that's the problem. It should really be called "performance rights" or "redistribution rights." The part where this gets complicated is that OpenAI has (presumably; if they haven't, that's a different matter) acquired the works through legal means. And having acquired them, they're free to do most anything with them, so long as they don't redistribute or perform the works.
The big question is where "training an AI on this corpus of works and then either distributing the weights or performing the work via API" falls. Should the weights be considered derivative works? I personally don't think so, and although the weights can be used to produce obviously infringing works, I don't think this meets the bar of being a redistribution of the work via a funny lossy compression algo, as some are claiming. But who knows? Copyright is more political than logical, so I think where it bends is really gonna be a balance of the tangible IRL harms artists can demonstrate vs. the desires of unrelated industries who wish to leverage this technology and are better off for having all this data available.
The artist in the article clearly states that his work was free to use only if it was not used to make a profit, those were the terms of their license. In the artist's opinion, OpenAI violated that license by training their tool on their work and then selling that tool.
This artist doesn't complain about work similar to their own being generated, and their artwork is very clearly not clothing.
So? Why does the author's opinion even enter into the equation? Authors cannot claim ownership beyond the bounds of copyright. If what AI is doing qualifies as fair use, the artist cannot do anything about it. I'm sure that lots of artists would not want anyone to lampoon or criticize their work. They cannot stop such things. I'm sure lots of artists would never want anyone to ever create anything in any way similar to their work. They cannot do that either.
It is not clear that training an LLM falls under "fair use". We are then left with the license of the work, in this case that license forbids re-selling the work for a profit. It is the artist's license for their work at issue, not their opinion.
If the legality is ambiguous then we're left with an impending court decision. Fair use is an affirmative defense, considered case by case.
Replace "use" with "copy". No one may copy the work to make a profit. Fair Use has long been an exemption to copyright, with Learning an example of Fair Use. But no one expected AIs to learn so quickly. I don't think it is clear either way, and will end up in SCOTUS.
The proper construction is that copyright is an exemption from the freedom of speech. Fair use is a partial description of freedom of speech, a description to narrow the limits of copyright rather than to broaden the already limitless bounds of freedom of speech.
The default for expression is that it is allowed except if copyrighted, as opposed to copyrighted except when covered by fair use.
I disagree that a person learning is the same as an AI model being trained. That aside, fair use typically covers the use of an excerpt or a portion of the material, not reproduction of the work in its entirety.
Agreed: in the end, courts will make the decision.
Clothes are inherently consumable goods. If you use them, they will wear out. If you do not use them, they still age over time. You cannot "copy" a piece of clothing without a truly astonishing amount of effort. Both the processes and the materials may be difficult or impossible to imitate without a very large investment of effort.
Compare this to digital art: you can copy it literally for free. Before AI, at least you had to copy it mostly verbatim (modulo some relatively boring transforms, like up/down-scaling, etc.). That limited artists' incomes, but not their future works. But in a post-AI world, you can suck in an artist's life's work and generate an unlimited number of copycats. Right now, the quality of those might be insufficient to be true replacements, but it's not hard to imagine a world, not so far off, where it will be sufficient, and then artists will be truly screwed.
Sure you can. There's a whole industry making knockoffs.
GP compared copying a piece of clothing to copying digital art. I'd say that setting up a factory to make knockoffs - or even "just" buying a sewing machine, finding and buying the right fabric, laying out the piece you want to copy, tracing it, cutting the fabric, sewing it, and iterating until it comes out right - would qualify as "a truly astonishing amount of effort" for a person.
You can outsource. Look for "knockoff clothing manufacturers".
Let's say I'm an artist. I have, thus far, distributed my art for consumption without cost, because I want people to engage with and enjoy it. But, for whatever reason, I have a deep, irrational philosophical objection to corporate profit. I want to preclude any corporation from ever using my art to turn a profit, when at all possible. I have accepted that in some sense, electrical and internet corporations will be turning a profit using my work, but cannot stomach AI corporations doing so. If I cannot preclude AI corporations from turning a profit using my work, I will stop producing and distributing my work.
Do you think it's reasonable for me to want some legal framework that allows me to explicitly deny that use of my work? Because I do.
When you put it that way, I think you just laid out the case for creating a Copyleft for art.
A copyleft license is enforced by copyright. That’s the reason others can’t simply ignore the license.
I agree with the other commenters that the scale of this "deriving inspiration from others" is where it feels wrong.
It feels similar to the ye olden debates on police surveillance. Acquiring a warrant to tail a suspect, tapping a single individual’s phone line, etc all feels like very normal run-of-the-mill police work that no one has a problem with. Collating your behavior across every website and device you own from a data broker is fundamentally the same thing as a single phone’s wiretap, but it obviously feels way grosser and more unethical because it scales way past the point of what you’d imagine as being acceptable.
In that example it's not the scale that makes it right or wrong, the scale of people impacted just affects the degree of wrongs that have been committed.
If acquiring a warrant is the basic action being scaled, I'd be okay with that ethically if it was done under what I define as reasonable pretenses. Regardless of how it scales, I still think it would be the right thing to do, assuming the pretenses for the first action could be applied to everyone wiretapped. Now, if I thought the base action was morally wrong (someone was tailed or wiretapped without proper pretenses), I'd think it wrong regardless of the scale. The number of people it affected might impact how wrong I saw it, but not whether it was right or wrong.
People keep saying this but it's actually much more complicated, and in many cases you can't view copyrighted content.
An example: Microsoft employees are not permitted to view or learn from an open-source (GPL-2) terminal emulator:
https://github.com/microsoft/terminal/issues/10462#issuecomm...
Another example is proprietary software that may have its source available, whether intentionally or not. If you view this and then work on something related to it, like WINE for example, you are definitely at risk of being successfully sued.
If you worked at Microsoft on Windows, you would not be able to participate in WINE development at all without violating copyright.
If you viewed leaked Windows source code you also would not be able to participate in WINE development.
An interesting question I have is whether training on proprietary, non-trade-secret sources would be allowed. Something like Unreal Engine, where you can view the source but it's still proprietary.
Another question is whether training on leaked sources of proprietary and private but non-trade-secret code, like source dumps of Windows, is legal.
Your link isn't very clear, but I think you are talking about the "clean room design" strategy: https://en.m.wikipedia.org/wiki/Clean_room_design
The way this works is the way many of us are arguing that AI and copyright should work.
Viewing (or training on) copyrighted work isn't copyright infringement.
What can be copyright infringement is using an employee who has viewed (or a model that was trained on) copyrighted work to create a duplication of that work.
In most of the examples of infringing output that I've seen, the prompt is pretty explicit in its request to duplicate copyrighted material.
Models that produce copyrighted content when not explicitly asked to will have trouble getting traction among users who are concerned about the risk of infringement (such as the examples you listed).
I also see this approach opening an opportunity for models that acquire specific licenses for the content they train on that would grant licenses to the users of the model to duplicate some or all of the copyrighted works.
The responsibility for how a model is used should rest primarily on the user, not the model trainers.
Who is "we" here? Are you making a distinction between people and machines? If I built a machine that randomly copied from a big sample of arts that I wanted, would that machine be ok?
OpenAI built a machine that does exactly that. They just sampled _everyone_.
OP's argument was about right and wrong, not about legal and illegal. There's a difference.
You'd have to argue that copyright law is ethical in its entirety to make your version of the argument.
Copyright is just made up for pragmatic purposes: to incentivize creation. It does not matter that training models is not the same as reproducing something exactly; if we decide that it's unfair, or even just economically desirable, to disallow it, then we are free to make that decision. The trade-offs are fairly profound in both directions, I think, and likely some compromise will need to be made that is fair to all parties and does not cripple economic and social progress.
Copyright is a bad idea in the first place, and should just be thrown out entirely; but that isn't the whole picture here.
If OpenAI is allowed to be ignorant of copyright, then the rest of us should be allowed, too.
The problem is that OpenAI (alongside a handful of other very large corporations) gets exclusive rights to that ignorance. They get to monopolize the un-monopoly. That's even worse than the problem we started with.
people and companies are copying copyrighted content when they're using datasets that contain copyrighted content (which also repackage and distribute copyrighted content, not just as links but as actual works/images too), downloading linked copyrighted content, and storing that copyrighted content. plenty of copies created and stored, it seems to me.
and what, do you think they're trying their damnedest to keep datasets clean and to not store any images in the process? how do you think they retrain on datasets over and over? it's really simple: by storing terabytes of copyrighted content. for ease of use, of course. why download something over and over if you can just download it once and keep it? and if they really wanted to steer clear of copyright infringement, if there's truly "no good solution" (which is bullshit for compute: oh, they can compute everything but not that part), why can't they just refrain from recklessly scraping everything, in case something were to just 'slip in'? like, if you know it's kinda bad, just don't do the thing, right? well, maybe copyright infringement is just acceptable to them. if not the actual goal.
what they generate is kinda irrelevant; there's plenty of copyright infringement happening even before any training is done. the assembling of datasets, and bad datasets containing copyrighted content, are the start and the core of the copyright problems.
there's a really banal thing at the core of this, and it's just multi-TB storage filled with pirated works.
If training a model is fair use, then model output should also meet fair use criteria. The very first thing you can find on the internet about fair use is the Wikipedia article on the topic. It lists a number of factors for deciding whether something is fair use, and the very first one includes a quote from an old copyright case.
Most uses of LLMs and image generation models do not produce criticism of their training data. The most common use is to produce similar works. There's a very common "trick" to get a specific style of output: add "in the style of <artist>". Is this a direct way "to supersede the use of the original work"?
You can certainly see how other factors more or less put gen ai output into the grey zone.
The fact that clothing doesn't qualify for copyright doesn't mean text and images don't. Or, if you advocate that they don't, then you pretty much advocate for the abolition of copyright, because those are the major areas of copyright applicability at the moment. That is a stance one can have, but you'd probably do better to actually say so, because claiming that copyright applies to some images and text but not others is a much harder position to defend.
Just like the rest of AI, if your argument is "humans can already do this by hand, why is it a problem to let machines do it?", it's because you are incorrectly valuing the labor that goes into doing it by hand. If doing X has a potentially negative side effect Y, then the human labor required to accomplish X is the principal barrier to Y, and Y can be mitigated via existing structures. Remove the labor barrier, and the existing mitigation structures cease to be effective. The fact that we never deliberately established those barriers is irrelevant to the fact that our society expects them to be there.
Copying is not illegal; publishing is. You can keep as many private copies of any content as you wish.
A lot of popular AI tools are designed to mimic a specific artist's style. A human is not permitted to draw something so similar.
In theory: sure
In practice: not really, especially when you're small and the other side is big and has lots of lawyers and/or lawmakers in their pockets.
Disney ("In 1989, for instance, the company even threatened to sue three Florida daycare centers unless they removed murals featuring some of its characters") and Deutsche Telekom[1][2] ("the company's actions just smack of corporate bully tactics, where legions of lawyers attempt to hog natural resources — in this case a primary color — that rightfully belong to everyone") are just two examples that spring to mind.
[0] https://hls.harvard.edu/today/harvard-law-i-p-expert-explain... [1] https://www.dw.com/en/court-confirms-deutsche-telekoms-right... [2] https://futurism.com/the-byte/tmobile-legal-rights-obnoxious...
AI doing things that humans laboriously learned and drew inspiration from is just different. After all, sheer quantity can be its own quality, especially with AI learning.
Now, I am worried about companies like OpenAI monopolizing the technology by keeping it proprietary. I think their output should be public domain, and copyright should apply only to human authors, if it should apply at all.
Well, not exactly. Certain uses are fair. The question is whether OpenAI's use counts as fair, and I don't think your immediate response comes close to addressing that question, despite your conviction otherwise.
Also, clothing designs are copyrightable. The conviction expressed by some participants in this debate is exhausting in light of their apparent unfamiliarity with actual copyright law.
Same for patents
Most every fashion company has a legal team that reviews print and pattern, as well as certain other aspects of design, relative to any source of inspiration. My husband works in the industry and has to send everything he does for review in this way. I’m not sure where you got the idea that there are no IP protections for fashion, but this is untrue.
I feel the emotionally charged nature of the topic prevents a lot of rational discussion from taking place. That's totally understandable, too; it's the livelihood of some of those involved. Unless we start making specific regulations for generative AI, current copyright law is pretty clear: you can't call your art a Picasso, but you can certainly say it was inspired by Picasso. The difference is that GAI can do it much faster and cheaper. The best middle ground, in my opinion, is to allow GAI to train on copyrighted data, but the output cannot be copyrighted, and the model weights producing it can't be copyrighted either. Any works modified by a human attempting to gain copyright protection should have to fulfill the same requirements of being substantive and transformative that fair use imposes now.
I think there is a case to be made when AI models do produce copies. For instance, I think the NYT has a right to take issue with the near-verbatim recall of NYT articles. It's not clear cut, though: when these models produce copies, they are not functioning as intended. Legally that might produce a quagmire. Is it fair use when you intend to be transformative but by accident aren't? Does it matter if you have no control over which bits are not transformative? Does it matter if you know in advance that some bits will be non-transformative, but you don't know which ones?
I presume there are people working on research relating to how to prevent output of raw training data, what is the state of the art in this area? Would it be sufficient to prevent output of the training data or should the models be required to have no significant internal copies of training examples?
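As a purely illustrative sketch of the first option (not any lab's actual method; the n-gram size and threshold below are assumptions), one naive mitigation is to screen generated text against a verbatim n-gram index of the training corpus before returning it:

    # Naive sketch: flag output that overlaps the training corpus in long
    # verbatim runs. A real system would need a scalable index (e.g. a
    # Bloom filter or suffix array), not an in-memory set.
    def build_ngram_index(corpus_docs, n=8):
        index = set()
        for doc in corpus_docs:
            tokens = doc.split()
            for i in range(len(tokens) - n + 1):
                index.add(tuple(tokens[i:i + n]))
        return index

    def looks_like_regurgitation(output_text, index, n=8, threshold=0.2):
        tokens = output_text.split()
        total = len(tokens) - n + 1
        if total <= 0:
            return False
        hits = sum(tuple(tokens[i:i + n]) in index for i in range(total))
        return hits / total >= threshold

Even this toy version shows why output filtering alone may not satisfy the stronger requirement: it catches verbatim runs but says nothing about internal copies or close paraphrases.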
I do think it's worth remembering there's a difference between "legal" and "good".
It's entirely legal for me to leave the pub every time it comes up to my round. It's legal for me to get into a lift and press all the buttons.
It's not unreasonable I think for people to be surprised at what is now possible. I'm personally shocked at the progress in the last few years - I'd not have guessed five years ago that putting a picture online might result in my style being easily recreated by anyone for the benefit mostly of a profitable company.
What you'd probably get is an LLM that can perfectly understand well-written text, as you might find on Wikipedia, but that would struggle severely with colloquial language of the kind found on Reddit and Twitter.
That's literally built into their corporate rules for how to take investment money, and when those rules were written they were criticised because people didn't think they'd ever grow enough for it to matter.
How is OpenAI compensating the owners of IP they trained their models on? Or is that not what you mean? It's certainly how I read the part of the GP comment you quoted.
So far, looks like funding a UBI study. As the IP owners are approximately "everyone" in law, UBI is kinda the only way to compensate all the IP owners.
https://openai.com/our-structure
That makes no sense. If I write a book by myself, post part of it on my website and OpenAI ingests part of it - how does that make anyone besides me myself and I an "owner" of the IP?
I don't understand why you're confused, but I think it's linguistics.
If you write a book by yourself and post parts on your website and they ingest it, you are the copyright holder of that specific IP, and when I post this specific comment to Hacker News I am the copyright holder of this specific IP.
In aggregate you and I together are the copyright holders of that book sample and this post, and I don't know any other way of formulating that sentence, though it sounds like you think I'm trying to claim ownership of your hypothetical book while also giving you IP ownership of this post? But that's not my intent.
I don't think you're trying to claim ownership. It sounded like you were suggesting that the only recourse for OpenAI would be to fund a UBI program as a form of payment instead of directly paying the people who own the IP it ingested?
Yes, I'm saying that because there's (currently) no way to even tell how much the model was improved by my comments on HN vs. an equal number of tokens that came from e.g. nytimes.com; furthermore, to the extent that it is even capable of causing economic losses to IP holders, I think this necessarily requires the model to be actually good and not just a bad mimic[0] or prone to whimsy[1] and that this economic damage will occur equally to all IP holders regardless of whether or not their IP was used in training. For both of these reasons independently, I currently think UBI is the only possible fair outcome.
[0] I find the phrase "stochastic parrot" to be ironic, as people repeat it mindlessly and with a distribution that could easily be described by a Markov model.
[1] if the model is asked to produce something in the style of NYT, but everyone knows it may randomly insert a nonsense statement about President Trump's first visit to the Moon, that's not detracting from the value of buying a copy of the newspaper.
They ingested the entirety of the internet. Everyone who has ever written anything that is online, including our (implicitly copyrighted) HN comments and letters written 400 years ago, had their words used to train GPT-4.
This is a load of bullshit and I sincerely hope you know that as well as I do.
As a thought experiment, let's say I pirate enough ebooks to stock a virtual library roughly equivalent in scope to a large metropolitan library system, then put up a website where you can download these books for free. I make money on the ads I run on this website, etc. This is theft, but as "compensation" I put some percentage of my revenues into funding a UBI study that might, if we're lucky—in half a century or so, in a progressive, enlightened version of the future we are by no means guaranteed to realize—make a fractional contribution to the thrust of a successful UBI movement.
Does that make what I'm doing okay? Should all those authors deprived of royalties on their work now, even deprived of publishing opportunities as legitimate sales collapse, understand my token contribution to UBI as fair compensation for what I'm taking from them?
That to me is a joke, and the only difference between it and what OpenAI is doing is that OpenAI's product relies on a technical means of laundering intellectual property that seems tailor-made to dodge a body of existing copyright law designed by people who could not possibly have conceived of what modern genAI is capable of. We will see what our lawmakers and courts make of it now, but either way, making a promise to pay me back later does not justify you in taking all the cash out of my wallet without my consent. Nor, for that matter, does tearing it up and returning it to me in the form of a papier-mâché sculpture of Shrek's head protruding from the bowl of a "skibidi toilet".
IMO that's a terrible thought experiment given the situation.
LLMs do not store enough content, or with enough accuracy, to come even close to a virtual library. Unlike, say, Google and the Wayback Machine: the former stores enough to show snippets from the pages it presents to you as search results (and got sued for that in certain categories of result), and the latter is straight up an archive of all the sites it crawls.
Furthermore, the "percentage" in question for OpenAI is "once we've paid off our investors, all of it goes to benefitting humanity one way or another" — the parent company is a not-for-profit.
Here's a different question for you: if a generative AI is trained only on out-of-copyright novels and openly licensed modern works, and still deprives everyone of all publishing opportunities forever because, as in this thought experiment, it's better and cheaper than any human novelist, is that any more or any less fair to literally any person on the planet? The outcomes are the same.
I'm sure someone's already thought of making such a model; it's just a question of whether they've raised enough money to train it.
You may have noticed from the version number that they're on versions 3 and 4. When version 2 came out in 2019, they said:
"""Our model, called GPT-2 (a successor to GPT), was trained simply to predict the next word in 40GB of Internet text. Due to our concerns about malicious applications of the technology, we are not releasing the trained model."""
They were mocked for this: https://slate.com/technology/2019/02/openai-gpt2-text-genera...
Even here: https://news.ycombinator.com/item?id=21306542
Indeed, they are still mocked for suggesting their models may carry any risk at all. Of any kind. There are plenty of people who want to rush forward with this and think OpenAI are needlessly slow and cautious.
You may also have noticed their CEO gave testimony in the US Congress, and that the people asking him questions were surprised he said (to paraphrase) "regulate us specifically — not the open source models, they're not good enough yet — us".
To the extent that any GenAI can pose an economic threat to a creative job, it has to be better than a human in that same job. For now, IMO, they're assistant-level, not economic-threat-level. And when they get to economic-threat-level (which in fairness could be next month or next year), they'll be that threat even if none of your IP ever entered their training runs.
I already addressed this: "the only difference between it and what OpenAI is doing is that OpenAI's product relies on a technical means of laundering intellectual property that seems tailor-made to dodge a body of existing copyright law designed by people who could not possibly have conceived of what modern genAI is capable of."
You are certainly welcome to disagree with what I've said, but you can't simply pretend I didn't say it.
A quick Googling suggests that OpenAI employees are not working for free—far from it, in fact. In this frame I don't particularly care whether the organization itself is nominally "non profit", because profit motives are obviously present all the same.
They are certainly welcome to try! Given how profoundly incapable extant genAI systems are of generating novel (no pun intended) output, including but not limited to developing artistic styles of their own, I think it would be quite funny to see these companies try to outcompete human artists with AI generated slop 70+ years behind the curve of art and culture. As for modern "public domain"-ish content, if genAI companies actually decided to respect intellectual property rights, I expect those licenses would quickly be amended to prohibit use in AI training.
AI systems will probably get there eventually, though it's very difficult to predict when. However, that speculation does not justify theft today.
People are absolutely throwing money at genAI right now, so if nobody has thrown enough money at this particular idea to give it a fair shake then the obvious conclusion is that people who know genAI think it's a relatively bad one. I'm inclined to agree with them.
Why is this relevant? I'm not talking about AI safety or "X risk" or whatever—I'm talking about straightforward intellectual property theft, which OpenAI and their contemporaries are obviously very comfortable with. The models they sell to anybody willing to pay today could literally not exist without their training datasets.
I sincerely think you (in the thought experiment) sound like an incredible hero.
Libraries are awesome and great for reducing inequality. Using ads to support that cause and also funneling cash to UBI initiatives? Even better.
That... doesn't seem sufficient, or legal, or (if legal) ethical. You can't just "compensate" people for using their copyrighted works via whatever means you've decided is fair.
I think funding UBI studies and lobbying for that sort of thing is a public good, but is entirely unrelated to -- and does not make up for -- wholesale copyright infringement.
I take no position at all about legality, if scraping is or is not legal and if LLMs are or are not "fair use" is quite beyond my limited grasp of international copyright law.
But moral/ethical, and/or sufficient compensation?
IMO (and it is just opinion), the damage GenAI does to IP value happens when, and only when, an AI is good enough to make some human unemployable, and that happens to all (e.g.) novelists around the same time, even those whose IP was explicitly excluded from training the model. So, twist question: is it fair to pay a UBI to people who refuse to contribute to some future AI that does end up making us all redundant? (My answer is "yes, it's fair to pay all, even if they contributed nothing; this is a terrible place for schadenfreude".)
Conversely, mediocre regurgitation of half-remembered patterns that mimic the style of a famous author cannot cause any more harm when done by AI than when done by fan fiction.
Right now these models are pretty bad at creative fiction, pretty good at writing code, so I expect this to impact us before novelists, despite the flood of mediocre AI books reported in various places.
Other damage can happen independently of IP value damage, like fully automated propaganda, but that seems like it's not a place where compensation would go to a copyright holder in the first place.
Sufficient compensation? If AI works out, nobody will have any economic advantage, and UBI is the only fair and ethical option I am yet aware of for that.
It's what happens between here and there that's messy.
Good catch, hadn't seen that.
So the researchers, shareholders, and leadership of OpenAI will be happy to give up being ridiculously wealthy so they can be only moderately wealthy, and everyone else gets a basic income?
I'm also just skeptical of UBI in general, I suppose - 'free' money tends to just inflate everything to account for it, and it still won't address scarcity issues for limited physical assets like land/property.
I'd love to be wrong about both of these things.
I agree UBI can have that problem, but I think it can be avoided if e.g. the government owns the means of production.
There are still risks in this scenario.
Until my check from OpenAI shows up, they're not.
"That's literally built into their corporate rules for how to take investment money, and when those rules were written they were criticised because people didn't think they'd ever grow enough for it to matter. "
that sounds like insane bullshit to me. they're trained on the whole internet. there's no way they give back to the whole of the internet; more likely, a lot of jobs will be taken away by their work.
plus, there was no consent.
If they make a model that's good enough to actually take away jobs rather than merely making everyone more productive — is it a tool or a human level AI, I'm not sure either way though I lean toward the latter — the only possible compensation is UBI… which they're funding research into.
I agree with you about this. That's a different problem, but I do agree.
It's not a matter of "is it good enough to replace humans"; despite all of us here knowing it's not, we could list many companies (and even industries) where it's already happening.
That comment is self-contradicting. If it's already replacing humans, then economically speaking (which is what matters for economic harm), it's good enough to replace those specific humans.
The reason I'm not sure how much this tech really is at the level of replacing humans in the workplace, is that there's always a lot of loud excitement about new tech changing the world, and a lot of noise and confounding variables in employment levels.
But if it is actually replacing them, then it must be at the level of those employees in the ways that matter.
"In the ways that matter": the only way that matters for a lot of employers is what is cheaper.
This maybe isn't strictly related to the topic of this post or conversation, but a lot of companies have been replacing most, or even all, support channels with AI assistants. No, it isn't good enough to replace those humans in the sense most would consider essential (helping the customers who reach the support line), but businesses find it "good enough" in the sense that it's cheaper than human workers, and the additional cost of unhappy customers is small enough for it to still be worth it.
This is very cheap: https://man7.org/linux/man-pages/man1/yes.1.html
I would agree with you that what counts as "good enough" is kinda hard to quantify (which itself leads into the whole free market vs state owned business discourse from 1848 to 1991), but I do mean specifically from the PoV of "does it make your jobs go away?"
Although now I realise I should be even more precise, as I mean "you singular and jobs plural for now and into the future" while my previous words may reasonably be understood as "you plural and each job is just your current one".
Would that be less valuable?
I wouldn't be able to guess.
Plus side: Smaller more focused model with lower rate of falsehoods.
Minus side: it kant reed txt ritten liek this, buuuut dat only matters wen nrml uzers akchuly wanna computr dat wrks wif dis style o ritin lol
I suspect there is a lot of value in the latter, and while I don't expect it to be as much as the value in the former, I wouldn't want to gamble very much either way.
Can it train on the Wikipedia meta community? Maybe we could get an LLM that talks like "You're wrong, see WP:V, WP:RS, WP:WTF".
Great. Let's do that then. No good reason to volunteer it for a lobotomy.
I see it differently. To me, if you post your work online as an artist, it's there for everyone to view and be inspired by. As long as nobody copies it verbatim, I don't think you've been hurt by any other usage. If another artist views it and is inspired by it, so be it. If an AI views it and is inspired by it, again, no harm done.
AI doesn't get inspired. It's not human. It adds everything about the work to its endless bank of levers to pull, and if you pull the right ones, it will just give you the source verbatim, as proven by the NYT lawsuit filing, where it was outputting unaltered copyrighted NYT article text.
That's a matter of perspective. AIs do not make a copy of the source material; they just adjust their internal weights, which, from a broad-minded perspective, can be seen as simple inspiration, not copying.
Of course, just like a human artist, it could probably closely approximate the source material if it wanted to, but it would still be its own approximation, not an exact duplicate. As for plagiarism, we all have to be careful to rephrase things we've read elsewhere, and perhaps AIs need to be trained to do this a bit better... but that doesn't change the underlying fact that they're learning from the text they read, not storing a verbatim copy (at least, no more so than a human reader with a good memory).
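To make the "adjusting weights" point concrete, here's a toy sketch; it assumes nothing about any real model's internals, and the numbers are invented. A training step nudges parameters based on an example, and the example itself is discarded afterward:

    # Toy illustration: training mutates weights; the example is not kept.
    import random

    weights = [random.uniform(-1, 1) for _ in range(4)]

    def train_step(features, target, lr=0.01):
        # prediction is a simple dot product over the current weights
        pred = sum(w * x for w, x in zip(weights, features))
        error = pred - target
        # nudge each weight to reduce the error, then drop the example
        for i, x in enumerate(features):
            weights[i] -= lr * error * x

    train_step([0.5, -1.2, 0.3, 0.9], target=1.0)

Whether billions of such nudges can still add up to memorization in aggregate is, of course, exactly what the overfitting cases are about.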
Based on the comment you replied to, it seems they are indeed producing verbatim copies.
Well i'll leave it to the legal system to decide if that's true.
But in any case, that's no different from a human with a photographic memory doing the same thing after reading a paragraph. We don't blame them for their superior memory, or being inspired by the knowledge. We don't claim they've violated copyright because their memory contains an exact copy of what they read.
We may still demand that they avoid reproducing the exact words they've read, even though they are capable of it, which is fine. We can demand the same of AIs. All I object to is the idea that a smart AI with a great memory is guilty of something just by reading or viewing content that was willingly shared online.
If I tell you water is wet and the sky is blue, will you be waiting for a court case to grind through the appeals process on that as well? The examples in the filing were unambiguous. You can go look it up and see them, they were also cited in all the news articles I saw about it. The AI regurgitated many paragraphs of text with very, very few small modifications.
The issue at hand is not if some words were copied; it's a legal issue of whether that constitutes legal or otherwise fair use or not. And I'm not a lawyer, and am happy to wait for the court to decide.
But to be honest, I don't really care one way or the other, since it doesn't get to the heart of the matter as I see it. To my mind, it's no different than a human with a good memory doing the same thing.
Specifically, it isn't the consumption of the media that is the problem, or even remembering it very well. Rather, it is the public verbatim reproduction of it, which a human might inappropriately do as well. AIs need to be trained to avoid unfair use of source material, but I don't think that means they should be prohibited from consuming public material.
I think the term "AI" is one of the most loaded and misleading to come up in recent discourse. We don't say that relational databases "pack and ship" data, or web clients "hold a conversation" with each other. But for some reason we can say that LLMs and generative models "get inspired" by the data they ingest. It's all just software.
In my own opinion, I don't think the models can copy verbatim except in cases of overfitting, but people like the author of the post have a right to feel that something is very wrong with the current system. It's the same principle as compressing a JPEG of the Mona Lisa to 20% and calling that an original work. I believe the courts don't care that it's just a new set of numbers; they want to know where those numbers originated. It is a color of bits[1] situation.
When software is anthropomorphized, it seems like a lot of criticisms against it are pushed aside. Maybe it is because if you listen to the complaints and stop iterating on something like AI, it's like abandoning your child before their potential is fully realized. You see glimpses of something like yourself within its output and become (parentally?) invested in the software in a way beyond just saying "it's software." I feel as if people are getting attached to this kind of software unlike they would to a database, for example.
A thought experiment I have is whenever the term "AI" appears, mentally replace it with the term "advanced technology." The seeming intent behind many headlines changes with this replacement. "Advanced developments in technology will displace jobs." The AI itself isn't the one coming for people.
[1] https://ansuz.sooke.bc.ca/entry/23
You make a good point.
My own perspective is that humans do not have an exclusive right to intelligence, or ultimately to personhood. I am not anthropomorphizing when I defend the rights of AI. Instead, I am doing so in the abstract sense, without claiming that the current technology should be rightly classified as AI or not. But since the arguments are being framed against the rights of AI to consume media, I think the defense needs to be framed in the same way.
If you pull the right levers, you can also copy the NYT article fully.
Yeah, and that's copyright infringement. That's why, if you're reverse engineering something, it needs to be done in a clean-room environment: your prior exposure to the copyrighted material poisons the well for any derivative you create.
This extends to music as well, if someone hears a song and is inspired by that in their work, the original artist gets credit.
That's only true in a very narrow set of circumstances. Imagine the case where someone listens to 10,000 songs, and then takes the sum total of that experience and writes their own. There's no credit given to the inspiration that each of those 10,000 songs gave. And that is in fact much closer to what AI is currently doing.
If we're discussing the current situation, where ChatGPT is outputting entire articles of NYT copyrighted content, then it certainly matches, say, Bitter Sweet Symphony containing a small sample of a Rolling Stones song, resulting in the Stones getting credit for the entire work.
There is a rather vibrant culture of _human_ remix artists who sample existing music and generate something completely new. It is still an open question of how long a sample they can use before it requires licensing:
https://www.ajschwartzlaw.com/can-i-use-that-part-2-remix-cu...
But all of this doesn't have much to do with whether an AI should be allowed to consume public media, even if we agree they need to do a better job of avoiding verbatim reconstruction when prompted.
Edit: I wasn't able to find the Lawrence Lessig talk I wanted to link, but here's a quite old one that makes some of the same points:
https://youtu.be/X8ULxxgjBuI
I’m not a lawyer but I’m pretty certain that’s not actually how things work.
This is literally impossible in the general case. There isn't a way to compress everything an AI consumes down to a finite number of weights; that would be a perfect compression algorithm, which is mathematically impossible.
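Back-of-envelope arithmetic makes the capacity gap concrete; the figures below are assumed round numbers, not any specific model's actual statistics:

    # Assumed round numbers, for illustration only.
    params = 100e9          # hypothetical parameter count
    bytes_per_param = 2     # fp16 storage
    corpus_bytes = 40e12    # hypothetical ~40 TB training corpus
    capacity = params * bytes_per_param          # ~200 GB of weights
    print(f"weights hold at most ~{capacity / 1e9:.0f} GB")
    print(f"corpus is ~{corpus_bytes / capacity:.0f}x larger")

On those assumptions, the corpus is a couple of hundred times larger than the weights, so wholesale storage is out; any memorization has to be selective.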
I would be careful in asserting that it's proven until the evidence is heard in court and we have all the data.
You can look at the filing yourself. It is unambiguous.
It's unambiguous about repeating NY Times articles with hallucinations occasionally interwoven. That isn't nearly as strong an argument.
...and if you ask a human artist to exactly reproduce an artwork, that is copyright infringement, or forgery. I don't see how this is any different. The person using the AI to make exact replicas would be the one committing the copyright infringement, not the AI.
There's such thing as consent, I hope you've heard of it.
No artist whose work was used to train the AIs consented to such use.
Particularly if they released their work online before generative AI was a possibility.
A generative AI model is not "everyone". It's a model, a combination of the data that goes into it.
It's a thing. A product. A derivative work, to be specific, made by the person who trained it.
Such a romantic notion!
But by the same metric, a photocopier is an auteur that gets inspired by the work it happens to stumble into and produces its own original art.
No.
The AI doesn't "view" the work; it has no agency. The human who trains the model does.
And that human is the one that is ripping the artist off.
The AI, as many people said, is just a tool. It doesn't suddenly turn into a person for copyright purposes.
It still remains a tool for people who train the models. A tool to rip off others' intellectual property, in the case we're discussing.
When you release something into the world, you have to accept that things beyond your control will engage with it. When you walk outside, down the street, you consent to being in public, and being recorded by video cameras, whether you like it or not. Even if a new 3D camera you don't know about exists and is recording you in ways you don't understand.
And in the same way, when you release something on the internet, it is there essentially forever. If you imagine that you get a "do-over" when the world changes in ways you didn't anticipate -- you're wrong. Nobody should feel entitled to such a reset.
Nobody is being ripped off.
People said the same thing about slaves. They aren't real people, worthy of rights. They were compared to animals, the same way you compared AI to a photocopier. Someday, this human self-obsession and self-importance against silicon based lives, will be seen in the same light, and artificial intelligences will be granted personhood. We should start protecting their rights in law today. Please don't be a bigot.
Please don't project.
LOL.
You can't have it both ways:
-AI is just a tool, work produced with AI is original work of the user
-AI is an intelligence that gets "inspired" by data it's trained on
Anyway. I don't want to regulate the AI.
I want to regulate the humans that make the choice of which data goes into training an AI.
Yes, and those laws will be very useful as soon as any actual artificial intelligence crops up. Blindly applying them to DALL-E or ChatGPT because they are marketed as AI would be... short-sighted.
You had me till that^ line. In your example, if the "inspired" human starts competing with you, then there is harm. If the inspired human is replaced by an AI, then it also harms. By harm I am referring to competition.
So instead of saying "no harm done," maybe it's more accurate to say "the same harm as other humans being inspired by your work."
I don't disagree with you. But then that is a completely different issue, and has nothing specifically to do with AI, but rather the tradeoff between the benefits of releasing your work to the public, and the potential competition it might inspire.
This was a reference to Instagram et al., where distributing your illustrations there implicitly allows their ad machine to profit from your work.
You decide to put your stuff on Instagram; you don't decide to put your work into Midjourney.
But sometimes your work is put on Instagram without your knowledge or consent (eg by an Instagram aggregator account)
And this whole ecosystem of credit (or not sharing credit) is undoubtedly encouraged by Instagram (because it's valuable to me to have an Instagram account with many followers)
That's copyright infringement. You can't claim to own other people's work and then license it to others when you don't own it.
Of course it is! It being illegal doesn't mean Instagram is incentivized to crack down on it
And if you're a tiny independent artist you also aren't well-resourced to do anything about it.
(Afaict the best thing we can do is maintain a culture that demands attribution)
I represent tiny independent artists and the best thing you can do is hope that some moneyed media company infringes your work. They never have an excuse and they always pay up.
But that's not what happens on Instagram
This isn't like some TV ad using a song they forgot to license
Sure, but it is quite difficult to prove your work was used in the models, and pursuing it is legally expensive and very time consuming. So in practice, does it matter? I still believe in the presumption of innocence, but that isn't to say it can't be exploited, or that the system itself isn't costly (monetarily and temporally). I think you're oversimplifying the issue and dismissing it before considering any nuance.
What am I dismissing? I think these AI companies are stealing these works. Nerds here are equating the creation of art with what LLMs output, simply out of their own inexperience making art. It's terrible and I hate it.
I believe I misinterpreted; it sounded like you were suggesting it was copyright infringement and thus the artist was empowered to sue the companies, as if the process were rather straightforward and the fault therefore lay with the artist for not pursuing it. I'll admit I'm primed by other comments which do express this sentiment explicitly. Sorry if I misunderstood.
Is that boundary something that the internet is likely to be able to recognize or enforce though? In 100 years? In 500?
What are we building for here?
If the boundary isn't enforced you'll likely see art move increasingly to a patronage system, whereby a select elite will get to choose what art is suitable for your consumption. Maybe you'll like that, maybe you won't.
I think it depends in large part on how the moving parts interact.
If the emergence of automata and internet creatures also means that people can spend more of their time doing what makes them happy, and if people doing that results in amazing art and music emerging, then I don't think the purview of selection will be so dangerously cornered by an elite.
But it's up to us to do the work to build that future.
It is?
But this is also dismissive without understanding either system at play. Traditional art has galleries because no one wants to buy art they haven't seen. But this creates a problem if our galleries are on the internet: it no longer matters how hard you try to stop a user from downloading or copying the art; they still can. We even see efforts to watermark art on public posting, but we also see efforts to remove the watermarks. It's very clear to me that this is a more complex time to be an artist than, say, 5 years ago. Maybe better than 100 years ago, but that would be a ridiculous comparison, as we'd have to get into an argument about the point of societies and whether a goal is to make life better.
Strangely, this is something some in the NFT crowd were attempting to solve. We can all agree that there were lots of scams in the area (I'd even say it was dominated by scams and grifting) and that even good-faith attempts were poorly executed, but I'm just trying to say that this was the motivation for some. They weren't even trying to solve as difficult a problem; they were just trying to solve proof of ownership. It's not as if you can simply use timestamps or public/private key pairs for proof (see the sketch below). And it still wouldn't solve the problem of redistribution or training; it wouldn't prevent someone from buying the art and training on it even if the artist specified that this was against the TOS (a whole other philosophical debate about TOS and physical ownership that I don't want to get into).
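As a sketch of why hashes and timestamps alone fall short (illustrative code only, no real service implied): a digest proves someone possessed the exact bytes at some moment, not that they authored them.

    # Hashing proves possession of bytes, not authorship of the work.
    import hashlib, time

    def fingerprint(artwork_bytes):
        digest = hashlib.sha256(artwork_bytes).hexdigest()
        return {"sha256": digest, "seen_at": time.time()}

    # Anyone who downloads the image can produce the identical digest,
    # so this alone can't establish who created it.
    print(fingerprint(b"...image bytes..."))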
I think you're simplifying the problem and being ready to dismiss without adequately demonstrating an understanding of the issues and environment. I don't know what the answer is but I sure as hell know it is a lot more complex than "let the 'free market' run its course".
We have a patronage system today. See: services like patreon
You mean, will the right of an individual human to property, and the rule of law, be recognized in 100 or 500 years?
Hopefully yes, unless you foresee some inescapable transfer to dictatorship, which you'll have to argue for to maintain your argument.
I meant something softer and simpler, like power structures which somehow convince learning machines to avert their gaze when media from a forbidden domain appears.
That just doesn't seem like the direction of the internet to me.
As for the rule of law, it seems that what we're talking about here is what authority can be capable of supplying the law. I highly doubt that a confusing and contradictory set of mandates from nation states will continue to be "the law" on the internet.
This seems like a strange criticism to me - if you're posting your illustrations on social media, it's presumably because you feel that you're getting value out of doing so. Who cares if they're also getting value out of you doing it, particularly when that value comes at no cost to you?
If you sell your art, then art marketplaces and printers and shipping services all profit from your work, but I don't imagine she's complaining about that. What's the difference? In all of those cases, as with social media, companies are making money from your work in return for providing a useful service to you (and one you don't have to use if you don't think it's useful).
On the contrary, I think it is very natural. The environment changed, and thus the deal. There's no clear way to negotiate the terms of the deal. It may be easy to say to just drop off the platform, but we've seen how difficult it can be to destroy a social media platform. Sure, Myspace and Google+ failed, but they didn't have the network base that Facebook and Twitter do, which have also fallen into this trap (I'd even add Amazon here for the more general context; certainly none of these companies are optimizing for the public, because there's a difference between who their customers are and the public). Network effects are powerful.
So I see complaints like these as attempts to alter the deal and start a coalition. The deal was changed under them, so why do they not have a right to attempt to negotiate? It is quite clear that there is a very large power differential. It's quite clear that unless you're exceptionally famous (which means the alteration is less impactful) that abandoning the platform comes at risking your entire livelihood unless you can garner a large enough coalition. It's like quitting a shitty job without a new job and not knowing who would hire you. Most people are going to stick around and complain __while__ looking for a way out.
The author reminisces about a time that was favorable to them, and implies that back then distribution was somehow free and this was taken away. Which is interesting, because there were ads back then too. A lot. Banners. Toolbars. Google made a shitton of money back then too. And there are amazing non-FAANG spaces today, like the Fediverse, where the author can distribute their work, and the number of users there is easily comparable to the 'old internet' user counts.
But that points at the main question: what was this magical old way of distribution that was somehow pure and is now gone? Mailing lists? Still there. RSS? Still here; Google Reader died (peace be upon it), but others started to grow the very next day once its shadow was gone. IRC? Forums?
So maybe the author is really missing the audience? Sure, platforms and powerful interests shape userbases, but it's not like those old sites were run on pure air.
Of course now there's a new possible way to use copyrighted works, and while it's very similar in effect to what human artists can do (and of course sometimes do), there's a very clear difference in scale, economics, and control, and thus it's natural that politics is involved.
This sounds like another way to say that the environment changed and thus the deal did.
There's still a ton of ads. And it isn't like Meta is making less money. They're just below their all time high[0] and it's not like there was one of the largest global financial crises between these two periods or something. Meta is in the top 10 for market caps and just shy of getting into the trillion dollar club. I'm not sure what argument you're trying to make because I'm not seeing how Meta (or any FAANG) is struggling.
Be realistic. We both know that 1) many of the artists are trying to distribute there, 2) that this doesn't prevent their work from being used in ML models since you can still download the material, 3) the audience is substantially smaller and Mastodon is not even close to replacing Twitter.
Also remember that platforms with large userbases shape users. There's a reason why the classic Silicon Valley Strategy is to run in the red and attempt to establish a (near) monopoly. Because once you have it, you have a lot of power and can make up for all the cash you burned. Or you now have a product to sell: users.
[0] https://seekingalpha.com/symbol/META
Perhaps your artwork has an anti-capitalist message, and you do not want it to appear anywhere near an ad for the latest beauty cream.
In the early days of the Internet, there were places to promote a webcomic with no commercial interest, like Usenet and forums, and it was typical to visit an artist's website directly.
These days, the average new Internet user might not even be aware of the concept that an artist can have their own website and own the user experience of visiting that website from end to end. Web design in the early 2000s had a lot of creativity and easter eggs built into the experience of navigating the pages themselves.
An artist absolutely has the right not to want to upload their creative work (which takes days and weeks to produce) onto a bland social media site with its own terms and conditions regarding how that content is treated and monetized.
Places like Instagram are bleak compared to what was pushing creative boundaries of the web in the mid to late 2000s. Sure, there are still fun websites like this but they are difficult to find (what happened to StumbleUpon?)
This, and social media is almost always the worst possible representation of an artwork: crunched, cropped, and compressed into an inferior version of itself.
We’ve forgotten how to appreciate digital artworks.
The author, presumably.
Who would (has) argued that there is a cost.
I think they agree with you, and are making the claim that since they choose not to use the service (considering it exploitative), it is getting increasingly difficult to distribute their work in a way they find ethical.
I'm not sure I agree with their arguments completely, but they aren't really "strange" as you suggest.
Yes, and I'm sure somewhere in the Instagram terms of service, users have agreed to license their work to Instagram for those purposes.
Couldn't we say the same thing about search engines?
What value would google have without content to search for?
Is the conclusion that we should make search engines pay royalties? That seems infeasible at Google scale. Should Google just be straight-up illegal? That also seems like a bad outcome; I like search engines, I am glad they exist.
I guess I'm left with: I don't like this argument because of what it would imply for other projects if you follow the logic to its natural conclusion.
Search engines don't replicate the content, they index and point to it. When search engines have been caught replicating content they have been sued or had to work out licenses.
How do they make the index without ingesting a copy of the content?
Storing a copy of the content isn't the relevant part. It's what that copy is used for.
In a search engine, it's used to direct traffic to the original author. In Midjourney, it's used to create an alternative to the original author.
I certainly use the blurb in search results at times to get an alternative to the original author (especially if the blurbed content is paywalled).
In the US, that blurb is covered by fair use. Google doesn't regurgitate the entire published work.
These many AI lawsuits will likely settle whether training LLMs falls under fair use.
I mean, it seems like cases of AI regurgitating entire works verbatim are pretty rare. It happens, but it is unusual.
And it's also not really what this is about. If AI were just another Library Genesis, it would be a lot less controversial.
Wrong. Here's Google's cache of an entire paywalled article currently on the front page of HN: https://webcache.googleusercontent.com/search?q=cache:c-xOMj...
You can get these by clicking the three dots next to any search result, expanding the options in the pop-over, and then clicking "cached."
I mean, generative AI basically involves turning content into vectors in order to try and find related content. If that's not an index structure, I don't know what is.
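For what it's worth, here's a toy sketch of that "content as vectors" idea: embed items, then rank neighbors by cosine similarity. The embed() below is a made-up stand-in; real systems use a learned model to produce the vectors.

    # Toy vector index: embed items, rank by cosine similarity.
    import numpy as np

    def embed(text: str) -> np.ndarray:
        # Hypothetical stand-in for a learned embedding model:
        # hash characters into a fixed-size unit vector.
        vec = np.zeros(64)
        for i, ch in enumerate(text):
            vec[(i + ord(ch)) % 64] += 1.0
        return vec / np.linalg.norm(vec)

    corpus = ["a cat on a mat", "dogs in the park", "a kitten on a rug"]
    index = np.stack([embed(t) for t in corpus])

    query = embed("cat sitting on a rug")
    scores = index @ query  # cosine similarity, since vectors are unit length
    print(corpus[int(np.argmax(scores))])  # the most related item

Whether that makes a generative model an "index" in any legal sense is, of course, exactly what's being argued about.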
The technical definition of how something works is often irrelevant in the eyes of the law. What matters is the output, and its effects.
The output of a search engine is nothing like the output of generative AI.
Most ML models are certainly compression systems, as you imply. But reproduction and redistribution of copyrighted material is quite different from providing a service that allows someone to find said material. The question about caching is a different one, but I doubt it falls under the same intent, as Google is incentivized to cache content to better facilitate the search __and redirection__ service, not to facilitate distribution.
Ya that worked out real well for Genius.
Search engine caches work on an opt-out basis, which was found legal in Field v. Google Inc.
The deal with search engines was always that you would get traffic out of them.
Use my content, to get people to me. Google's snippets kinda broke that deal and people have indeed complained about that, but otoh you can still technically opt out of being indexed.
It's not clear how you opt out of LLMs.
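For context, the search-side opt-out mentioned above is the robots.txt convention; a minimal sketch (Googlebot, GPTBot, and CCBot are real published user agents, but which AI crawlers honor the convention, and how, varies):

    # robots.txt at the site root -- a voluntary convention crawlers check
    User-agent: Googlebot
    Disallow: /private/

    # Some AI crawlers publish user agents too (OpenAI's GPTBot,
    # Common Crawl's CCBot), but compliance is voluntary, and opting
    # out does nothing about content that was already scraped.
    User-agent: GPTBot
    Disallow: /

Which is the parent's point: there's no enforced mechanism for LLMs, just a courtesy, and no way to claw back what's already in a training set.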
Google does transfer value back to the websites, by sending them traffic.
However, Google does get a lot of criticism when they do slurp up content and serve it back without sending traffic back to the websites! Yelp and others have testified to Congress complaining about this!
You can remove your work from a search index anytime you want. Not so with work included in a training set.
But they are free to use the fruits of the model, same as anyone else. I suppose the difference is they don't care; they already have the talent to transform their labor into visual art, so what use do they have for a visual-art-generation machine?
I find strong parallels in the building of web crawlers and search indexers, except... perhaps the indexers provided more universal, symmetrical value. It's hard to make the case that someone whose work is crawled and added to a search index doesn't derive value from that strong, healthy index being searchable (even librarians and news reporters thrive on having data indexed and reachable; the index is a force-multiplier. It's not "stealing their labor" to crawl their sub-indexing work and agglomerate it into a bigger index, nor does it cheapen the value of their information-sorting-and-sifting skills when the machine sorts and sifts).
So perhaps there is a dimension of symmetry here where the give-and-take aspect of what is created breaks. Much like the rich don't have to care whether it's legal to sleep under a bridge, artists don't have to care whether a machine can do 60% of the work of getting to a reasonable visual representation of an idea. No, more than that: it's harmful to them if it's legal to sleep under the bridge.
They'd be landlords in this analogy, crying to the city that because people can sleep under bridges the value of the houses they maintain has dropped.
Or they lost a means of income? Like, is this a difficult concept? Livelihoods will most likely be lost, and probably never really come back. Sure, we can say industries change; however, we had protections in place to prevent artists from losing money to people copying… A company has said "screw the rules, here's a supercharged printer."
Imagine someone came up with a way to print houses for nearly-free. Anyone needs shelter? Bam, instant house.
Landlords would lose income left and right.
The relevant question is: would society stop the building of these free houses to protect the interest of the landlords?
(Now, take that analogy and throw it in the trash. ;) Housing is essential; art is fundamentally a luxury good. If we want to, we can have a society that protects profit-making on luxury goods and profit-making on necessities differently. "Someone is losing income, therefore this change is bad" is too simple a rule to build a healthy society upon, but a reasonable person can conclude "Artists are losing their livelihoods, therefore this change is bad").
Money is just paper. And with cashless systems, it's now only bits on some computers. But no one sane wants to max out everyone's accounts or print cash to hand out. Art takes hours, days, even years of human labor to produce. Copyright laws set up an artificial scarcity, because if a work can be replicated at no cost, there's no way for the author to recoup the cost of production.
In the analogy you provided, you'd still have to deal with land scarcity and environmental impact.
Well, depending on whether one considers the output of one of the AI image diffusion algorithms "art"... Not anymore, right? That's rather the point of the debate? That we've gone over a few decades from a world of "This wall is blank. Maybe I'll go to the art show and find a piece someone made, or commission a piece based on an idea in my head" to "This wall is blank. Maybe I'll spend two hours futzing with DALL-E and send the result off to a local print-shop, then pop it over to Michael's and frame it?"
No one wants to live under the bridge even if it was perfectly legal.
OpenAI is very much a for-profit company with the same incentives to make money as every other US for-profit company. I understand that there's another company that is a non-profit and that company bosses the for-profit company. In my opinion, that's more of a footnote in their governance story, it doesn't make OpenAI a non-profit. Their website says...
"A new for-profit subsidiary would be formed, capable of issuing equity to raise capital and hire world class talent, but still at the direction of the Nonprofit. Employees working on for-profit initiatives were transitioned over to the new subsidiary."
https://openai.com/our-structure
They might be a for-profit company, but I don't believe they have made a profit. I wasn't even thinking of the murky non-profit status.
This is an important distinction. The difference between making a boatload of money on the cheap and making a boatload of money after spending an even bigger amount should matter to this cartoonist.
Wrong way to evaluate it, and you're missing the complaint from the artists. It doesn't matter what the company makes or loses; it matters whether people end up with money in their pockets. I don't know whether ML output counts as derivative or not, but I don't think that's important here. Regardless, it is a fact that an artist does work, that work isn't paid for, and then that work is used by someone else in some way to sell a product. The complaint is specifically that they have no control over deciding whether their work can be used that way.
At the end of the day, everyone working at OpenAI ends up with lots of money in their pockets. Regardless of the company's profits or revenue, this money is connected to the work of these artists, who make little or nothing from it. That's the underlying issue, and we must read between the lines a bit to understand the difference between a specific point and the general thesis.
Also, remember Stability does have revenue as well. OpenAI is being pointed at because they are top dog, but it's not like there aren't others doing the same thing. So even a very specific justification may not refute the original complaint.
They have over a billion dollars in revenue per year. They aren't making a profit because they're reinvesting all of that money into growing the company.
Particularly after the whole Sam Altman debacle, regardless of whether one thinks the board was being logical, and regardless of whether anyone thinks Sam Altman should have been fired, it's still very clear that the non-profit side of the company is not in control of the for-profit side.
We've seen zero evidence that the non-profit side of OpenAI meaningfully constrains the for-profit side in any way, and have seen direct evidence that when the non-profit and for-profit groups disagree with each other, the for-profit side wins.
It would be good to see how much it changes if you only include work that is public domain.
Any prompt response longer than 30 characters inevitably devolves into verbatim passages from The Iliad.
I could get on board with this.
If that includes a drastic revamp of copyright laws to increase the public domain, why not.
I don't see why Disney or Universal would be more legitimate than OpenAI when it comes to profiting from stuff made 60 years ago by now-dead authors. Both seem equally legitimate.
I thought this was more of a reference to Facebook, etc. I could be wrong, though, it's pretty vague.
Yes, OP knows that the quote is referring to platforms/media channels like youtube/facebook/google, etc. But it's also referring to profit-making companies on the internet, like OpenAI.
I think it's a reference to the media landscape, Facebook and OpenAI inclusive.
If you remove all the copyrighted, permission-less content from a human's training, what value does the human have, in connection with work?
When is AI good enough that the contents it contains can be comparable to human brain content, copyright wise?
And conversely, now that we can read signals from neurons in a human brain, and create images from dreams and audio from thoughts, would not that also break the copyright of the content?
There is absolutely zero comparison between living in the world and experiencing it, and building a model, loading in copyrighted, carefully curated material and then charging for the outputs of said model without paying royalties. It's hard to even believe people can't understand the difference.
The fact is, the majority of people do not want to steal others' work for profit, and for those bottom feeders that do, there are laws to discourage such behavior and to protect the original producer.
If these models were trained on creative commons licensed material only, then you'd have a leg to stand on.
I even had to pay for my tuition, and textbook material. Even if some portion of my knowledge comes from osmosis, I have still contributed at some stage to access training material.
When I was 16, I wanted to learn to code, do you know what I did? I went and purchased coding books because even at 16, I understood that it was the right thing to do. To pay the author for the privilege of accessing their work.
How basic can one get?
Would you like it if I broke into your house and used your things without asking? Because that's about what's happening here for professionals.
I wonder if AI shrinks the economy. Not that such a metric is the most important ruler by which to measure goodness, but it would be ironic to have a massive tech company that produces less value than it removes from the world.
In a way it will remove value, but in a way it will add value back in the form of an avalanche of derivative junk, lacking authenticity and context. If you find value in memes, for example, there will be a lot of that type of content in the future.
It would be nice if using a style meant paying a fee to whoever owns that style. If you don't want to pay a fee to the owners of an art style, then you can always use public domain works. Hell, this might even be better for public domain works and restoration, if a small fee went to them as well.
I hope I never have to live in a world where such a thing exists.
The only way I see this working with our current economics and IP law is if the people training models license the work they are using. And a traditional license wouldn't do; it would have to be one specific to training AI models.
As to the question of worth, obviously OpenAI's models have value without the training data. Just having a collection of images does not make a trained AI. But the total value of the system is a combination of that model and the training data.
This goes for your knowledge as well, as AI fundamentally doesn't learn any differently than humans do.
If you remove all knowledge gained from learning from or copying others works, what value do you provide?
Nothing on this planet can learn without copying something else. So if we open the can of worms for AI, we should do the same for humans and require paying royalties to those who taught you.
I keep thinking: this is what e.g. Google has done all along. The content it uses to train models and present answers to us absolutely belongs to others, but try to get any content (e.g. maps data) out of them for free at scale.
But the business model emerged and delivered enough value to us that we didn't consider asking for money for our content. We like being searched and linked to. Less so Google snippets presented to users without the users landing on our site. Even less so content generated without any interaction at all. But it's still all our content.
That's not enough to say that. All companies benefit from what has been made before; nothing exists in a vacuum. AI adds to the current landscape.
It basically just looked at them. It’s absolutely preposterous that you can own a painting, thereby claiming nobody is allowed to draw that anymore, and now people can’t even look at your shitty drawing without paying? Then don’t put it online in the first place..
Seriously the audacity of these so called artists.. just because I sang a song one day does not mean I am entitled to own it and force people to pay me to be allowed to sing it. That’s absolutely insane.
that's arguably true for humans learning from the content as well
Agreed with the overall sentiment, but let's be clear. OpenAI is currently a (capped) for-profit company. They are partnered with Microsoft, a for-profit company. They commercially license their products. They provide services to for-profit companies. The existential crisis of the last six months of the company seems to have been over moving in the direction of being more profit-oriented, and one side clearly won that battle. OpenAI may soon be getting contracts with the defense department, after silently revoking their promise not to. Describing them as a non-profit company is de facto untrue, regardless of their nominal governance structure, and describing them (or them describing themselves) as a non-profit feels like falling for a sleight of hand trick at this point.
I think a model that would make more sense is to punish bad behavior in the form of infringement, so if someone monetizes an AI output that infringes on someone's copyright/trademark, then go after that person. Otherwise we are going to be completely stuck for the sake of some kind of outdated mentality around intellectual property.
I really haven't liked the crypto bros == AI bros memes, but in this way I do see the similarity.
As in: we will change the world, all that is required is that we throw away all previous protections! The ends justify the means!
I do see a much more beneficial trajectory for LLMs vs cryptocurrencies, but yeah, this is gross and unfair.
Note: as the days go on, I continue to realize the pitfalls of Utilitarianism. I do miss the simplicity, but nope.
If you take away all music created by black people what value do The Rolling Stones have?
I don't know yet exactly how this compares, I’m trying to think it all through.
AI has different levels - output can be loosely inspired by, style cloning, or near exact reproductions of specific work.
In my opinion, the play is this: steal everything, build the models, then:
* People won't notice, or the majority will forget (doesn't seem to be happening).
* Raise enough money that you can smash anyone who complains in court.
* Make a model good enough to generate synthetic data, then claim new models aren't trained on anyone's data.
* All of the above.
Anyway, I 100% agree with you. The value is in the content that everyone has produced for free; they're repackaging and reselling it in a different format.