But I can't even get cartoons to most people for free now, without doing unpaid work for the profit-making companies who own the most used channels of communication.
This is the sticking point for me. OpenAI isn't a profit-making company, but it's certainly a valuable company. A valuable company built from content that others created, without transferring any value back to them. Regardless of legalities, that's wrong to me.
Put it this way - you remove all the copyrighted, permission-less content from OpenAI's training, and what value do OpenAI's products have? If you think OpenAI is less valuable because it can't use copyrighted content, then it should give some of that value back to the content creators.
But we are allowed to use copyrighted content. We are not allowed to copy copyrighted content. We are allowed to view and consume it, to be influenced by it, and under many circumstances even outright copy it. If one doesn't want anyone to see/consume or be influenced by one's copyrighted work, then lock it in a box and don't show it to anyone.
I have some, but diminishing, sympathy for artists screaming about how AI generates images too similar to their work. Yes, the output does look very similar to your work. But if I take your work and compare it to the millions of other people's work, I'd bet I can find some preexisting human-made art that looks similar to your stuff too.
This is why clothing doesn't qualify for copyright. No matter how original you think your clothing seems, someone in the last many thousands of years of fashion has done it before. Visual art may be approaching a similar point. No matter how original you think your drawings are, someone out there has already done something similar. They may not have created exactly the same image, but neither does AI literally copy images. That reality won't kill the visual arts, just as it didn't kill off the fashion industry.
I firmly believe that training models qualifies as fair use. I think it falls under research, and is used to push the scientific community forward.
I also firmly believe that commercializing models built on top of copyrighted works (which all works start out as) does not qualify as fair use (or at least shouldn't), and that commercializing models built on copyrighted material is nothing more than license laundering. Companies that commercialize copyrighted work in this manner should be paying for a license to train on the data, or should stick to the licenses the content was released under.
I don't think your example is valid either. The reason that AI models generate content similar to other people's work is that those models were explicitly trained to do that. That is literally what they are and how they work. That is very different from people having similar styles.
If I read a lot of stories in a certain genre that I like, and I later write my own story, it’s almost by definition going to be a mish-mash of everything I like.
Should I pay the authors of the books I read when I sell mine?
We shouldn't hold individual humans and ML models to the same standards, because ML models themselves are products capable of mass production and individual humans are not even remotely at the same scale.
If you write that book, chances are you will gain some fans that are also fans of other authors in that genre.
If ML models write in that genre, they can flood it so thoroughly that human artists won't be able to compete.
It's not even a remotely equivalent scenario
Computers and machines have been capable of mass production for decades, and humans have used them as tools. In the past 170 years, these tools of mass production have already diminished many thousands of professions that were staffed by people who had to painstakingly craft things one at a time.
Why is art some special case that should be protected, when many other industries were not?
Why should we kill this technology to protect existing artistic business models, when many other technologies were allowed to bloom despite killing other existing business models?
Nobody can really answer these questions.
It shouldn't be.
As soon as someone makes an AI that can produce its own artwork without needing to ingest every piece of stolen artwork it can, then I'm on board.
But as long as it needs to be trained on the work of humans it should not be allowed to displace those people it relied on to get to where it is. Simple as that.
Are there any humans that can produce artwork without ingesting inspiration from other art? Do you know any artists that lived in a box their whole life and never saw other art? Do you know any writers who'd never read a book?
Are there any human artists who can't, if requested, draw or write something that's a copy of some other person's drawings or writings?
Also, FYI, you can't steal digital artwork. You can only commit copyright infringement, which is not the same crime as theft, because theft requires depriving the owner of something in their possession.
This is still pretending that humans and AI models are equivalent actors that should have the same rights.
Emphatically no they shouldn't. The capabilities are vastly different. Fair use should not apply to AI.
Fair use applies even to use of traditional algorithms, like the thumbnailing/caching performed by search engines. If I make a spam detector network, why should it not be covered by fair use?
No idea on the legality, but common sense suggests that the difference would be that a spam detector doesn't replace the products that it was trained on, while AI-generated "art" is intended to replace human artists.
The question is "is it a derivative work of the original?" - not whether it is a generative work.
If that were the distinction to be made, using ChatGPT as a classifier would be acceptable, while using it to write new spam (see the "I am sorry" Amazon listings of the other day) would be unacceptable.
If a tool allows for both infringing and non-infringing uses (are photocopiers allowed to make copies(!) of copyrighted works?), it has generally been the case that the tool is allowed, and the person with the agency to use the copyrighted work in either an infringing or a non-infringing way is the one to come under scrutiny.
I believe that if OpenAI is found to have committed copyright infringement in training the model, then an argument that training a model on spam is copyright infringement could reasonably be constructed.
If, on the other hand, OpenAI is found to have been sufficiently transformative in its creation of the model and some uses are infringing, then it is the person who did the infringing (as with a photocopier or a printer printing off a copy of a comic from the web) who should face the legal consequences.
Yeah, I really think it should fall on the user as opposed to the tool.
The extent to which it supplants the original work is one of the fair use considerations.
I think it'd make more sense to take the stance "current LLMs and image generators should be judged by the fair use factors, and I believe they'd fail" - though I'd still disagree with it - than to subject machine learning models to a different set of rules than humans and traditional algorithms.
That is indeed the most common stance. There isn't nearly as much outcry over, say, image classification by LLMs, as there is over AI "art" generation.
Fair use applies to humans and the things they do (including with AI). It is not something that applies to algorithms in themselves. AIs are not people; the people who use them are people, and fair use may or may not apply to the things they do depending on the circumstances. The agent is always the human, not the machine.
True; consider the "it" in my question ("If I make a spam detector network, why should it not be covered by fair use?") as "my making (and usage) of the network".
This isn't about giving "rights" to machines. Machines are just tools. The question is about what humans are allowed to do with those tools. Are humans using AI models and humans not using AI models equivalent actors that should have the same rights? I'd argue emphatically yes they should.
The thing is, we already have doctrine that starts to encompass some of these concepts with fair use.
The four pronged test in US case law:
- the purpose and character of use (is a machine doing this different in purpose and character? many would say yes. is "ripping-off-this-artist-as-a-service" different than an isolated work that builds upon another artist's art?)
- the nature of the copyrighted work
- the amount and substantiality of the portion taken (can this be substantially different with AI?)
- the effect of the use upon the potential market for the original work (might mechanization of reproducing a given style have a larger impact than an individual artist inspired by it?)
These are well balanced tests, allowing me as a classroom teacher to duplicate articles nearly freely but preventing me from duplicating books en masse for profit (different purpose; different portion taken; different impact on market).
The problem with this conversation is that it's being had by people like the top-level commenter here, who states that clothing is not copyrightable. It is. Clothing design is copyrightable. This was a huge recent case, Star Athletica. They know nothing about copyright law and just build intuitions from the world around them, but the intuitions are complete nonsense because they are made in ignorance of the actual law, what the law does, and why it does it. I find it exhausting.
Your sentiment is probably correct in that there are many aspects of copyright law that are not strictly aligned with the public's intuition. But your example is a bit of a reach. Star Athletica was a relatively novel holding that a specific piece of clothing, when properly argued, could qualify as copyrightable as a semi-sculptural work of art; however, this quality of a given piece is separate from its character as clothing. In fact, the USSC in Star Athletica explicitly held that a designer/manufacturer has “no right to prohibit any person from manufacturing [clothing] of identical shape, cut, and dimensions” to clothing which they design/manufacture. That quote is directly from a discussion of the ability to apply copyright protections to clothing design. I think the end result is that trying to argue technical legal issues around a poorly implemented statutory regime is always fraught with errors. That really leaves moral and commercial arguments outstanding, and advocacy should focus on those when not fighting to effect change in the law these copyright determinations are based on.
And just to be clear, this post does not constitute legal advice.
You're dismissing my comment because of what someone else said upthread?
I hate the desire to meta-comment about the site rather than argue on the merits.
We obviously don't know much about how courts will interpret copyright with LLMs. There are a lot of arguments on all sides, and we're only going to know in several years, after a whole lot of case law solidifies. There are so many questions (fair use, originality, can weights be copyrighted? when can model output be copyrighted? etc.). Not to mention that the legislative branch may weigh in.
This discourse by citizens who are informed about technology is essential for technology to be regulated well, even if not all participants in the conversation are as legally informed as you'd wish. Today's well-meaning intuitions about what deserves copyright, and why, inform tomorrow's case law and legislation.
This sounds so detached from human experience that I am tempted to ask if you are a human or just a disembodied spirit that haunts the internet.
When the first Neanderthal drew a deer on the walls of a cave, where did they get inspiration?
When a little child draws a tree for the first time, where do they draw inspiration? Do you think they were reviewing works of Picasso?
When the first man made an axe, chopped a tree, made a bed, sewed some clothes, discovered fire, where did they draw inspiration?
Do you not have eyes, ears, do you not perceive and get inspiration from the natural world around you?
Yeah, but that’s not really your sole source of inspiration. My son has been ‘inspired’ by the art of all the other kids in his kindergarten. Certainly by the time he gets to the age where he does it professionally he’ll have been inspired by an uncountable number of people.
What % is his independent inspiration? 30%? 90%? There are certainly people for whom it was 90%. For most we don’t know.
We do know one thing for sure - that for AI it’s 0%
We don't know what percentage is independent inspiration for a person using the AI to create art.
Once upon a time it was a contentious idea that humans had significant authorship in photographs, which merely mechanically captured the world. What % is the camera's independent inspiration?
Here, we have humans guiding what's often a quite involved process of synthesis of past human (and machine) creation.
The person using the AI doesn't matter in the equation. They aren't an artist, they're a monkey with a typewriter.
We're talking about the AI here, because it can generate the same images no matter which monkey with a typewriter is typing the prompts.
That's an opinion.
Does your opinion hold in all circumstances? If I spend 20 hours with an AI, iterating prompts, erasing portions of output and asking it to repaint and blend, and combining scenes-- did I do anything creative?
Being inspired isn't against the law. Copying is. It'd be one thing if this conversation could be had with useful terminology that's actually on point. Instead we have you, insisting that there is no creative process, that there is only experiencing other art and inevitably copying (because apparently you think that's the only thing humans can do!). It's all so telling. Yet it's tragic, because so many here don't even realize it. I'm sad for your inability to engage with creativity and creative acts.
I think a lot of the discussion is where the balance of the creativity lies when a human uses a model (trained on other artistic works) to create art.
Is the result a copy, or perhaps a derivative work of the art in the training set?
Does the person using the model have authorship of the result?
Was it even okay to use the art to train the model and then share the resulting weights?
Are the resultant weights protected by copyright themselves?
I suspect the actual answers we'll come to on these topics will be full of nuance.
Are we going to discount the hundreds to thousands of artistic pictures children are exposed to? Or how about the teacher sitting up front demonstrating to the class how to draw a tree?
Learning to see as an artist is a distinct skill. Being able to take the super-compressed, simplified world view that the mind sees and put something recognizable on paper is a specialized skill that has to be developed. That skill is developed by doing it over and over again, often by copying the style of an artist that someone enjoys.
Or to put it another way, go to any period in history prior to the mid 20th century and art in a given region starts to share the same style, dramatically so, because people were inspired by each other, almost to a comical extent. (Financial reasons also had something to do with it as well of course, Artists paint/carve/engrave/etc what sells!)
Do you think art was there before humans? Or humans made art?
If you believe the 1st proposition… please tell me about your very unique religion!
If not… you've answered your own question.
Logically, the answer to this is (almost certainly) yes, so you’ll need to discount this argument.
If the answer were no, then either an infinite number of humans have lived (such that there was always a previous artist to learn from), or it was true in the past but false in the present, which seems unlikely given human brains have generally become more, not less, sophisticated over time.
I presume what you’re missing here is that the brain can be inspired from other sources than human art. For example: nature; life experience; conversation.
Not making any other comment about what machines can or can’t do; just wanted to point out this argument is invalid. It comes up a lot and is probably grounded in ignorance of the artistic process. It’s such a strange idea to suggest that the artistic process is ingesting lots of art to make more art. That’s such a weird world view. It’s like insisting every artist makes art the way Quentin Tarantino makes films.
I’ve spent a lot of time with artists, I’ve worked with them, I’ve been in relationships with artists, and I can tell you the great ones see the world differently. There’s something about their brains that would cause them to create art even if born on a desert island without other human contact. Some of them don’t even take an interest in other art.
In fact, those artists that _do_ make art heavily based on other artists’ work as suggested are often derided as “derivative” and “unoriginal”.
Do you feel the same way about tools like Google Translate?
Tbh I'm not familiar enough with how Google Translate is built, but if it's ingesting tons of people's work without their permission so it can be used to replace them then yes I do.
For what it's worth: that's pretty much how Translate works.
Translate operates at a large-chunk resolution, and one of the insights in solving the problem was the idea that you can often get a pretty-good-enough translation by swapping a whole sentence for another whole sentence. So they ingest vast amounts of pre-translated content (the UN publications are a great source, because they have to be published in the language of every member nation), align it for sentence- and paragraph-match, and feed the translation engine at that level.
It's created an uncanny amount of accuracy in the result, and it's basically fed wholesale by the diligent work of translators who were not asked their consent to feed that beast. Almost nobody bats an eye about this because the value (letting people using different languages communicate with each other) grossly outstrips the opportunity cost of lost human translator work, and even the translators are, in general, in favor of it; they aren't going to be displaced because (a) it doesn't really work in realtime (yet), (b) it can't handle any of the deeper signal (body language, tone, nuance) of face-to-face negotiation, and (c) languages are living things that constantly evolve, and human translators handle novel constructs way better than the machines do (so in high-touch political environments, they matter; the machines have replaced translators in roles like "rewriting instruction manuals" that were always pretty under-served in the first place).
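For the curious, here is a minimal sketch of that parallel-corpus idea in Python. To be clear, this is not Google's actual pipeline: the naive sentence splitter and the length-ratio filter below are illustrative assumptions standing in for the statistical aligners (e.g. Gale-Church) and neural models that real systems use.

```python
def split_sentences(text):
    """Naive sentence splitter; real systems use language-aware tokenizers."""
    normalized = text.replace("!", ".").replace("?", ".")
    return [s.strip() for s in normalized.split(".") if s.strip()]

def align(source_doc, target_doc, max_len_ratio=2.0):
    """Pair up sentences 1:1 in order, keeping pairs whose lengths look plausible.

    Gale-Church-style aligners use a statistical model of length ratios;
    this toy version just filters on a crude ratio threshold.
    """
    src = split_sentences(source_doc)
    tgt = split_sentences(target_doc)
    pairs = []
    for s, t in zip(src, tgt):
        ratio = max(len(s), len(t)) / min(len(s), len(t))
        if ratio <= max_len_ratio:  # lengths plausible for a translation pair
            pairs.append((s, t))
    return pairs

# Usage: each aligned pair becomes one training example for the engine.
en = "The assembly adopted the resolution. The session was closed."
fr = "L'assemblée a adopté la résolution. La séance a été levée."
for s, t in align(en, fr):
    print(s, "=>", t)
```

The point of the sketch is just to show where the translators' work enters the system: every aligned pair is a unit of human labor turned directly into training data.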
Vastly inappropriate comparison - there are millions of pages of text out of copyright; you can get a good translation engine using the public domain.
That is not the case for art: the vast majority of the art used by Midjourney is not public domain.
Is that true? How did you establish that?
It's unfortunately also not great for translation. Language changes fast enough that training on content that went out of copyright is old data.
OpenAI has basically admitted it. Is OpenAI even disputing that it ingested all the works it's being sued over? Not as far as I can tell.
Google Translate is very basic and not even close to good if you already know both languages. It's useful if you're translating into your own language (you do the correction when reading), but it can lead to confusion the other way.
Interesting distinction.
If you can do the correction when reading, it seems reasonable to assume the reader in the opposite direction has the same correction capability.
I would expect the chance of confusion to be identical. The only difference is a matter of perspective, where in one case you are the reader and in one case you are the author.
Yes, they are identical. But I believe the reader is better armed to deal with the confusion, or at least to recognize the error, because the error doesn't fit the context. When producing, you don't know the target language, so there's a better chance for errors to slip in unnoticed.
It's better for me to receive a text in the original language and translate it myself than to try to decipher something translated automatically.
I would argue that Translate being fed by paid UN translators, who likely agreed to the use of their transcriptions in a TOS or something, is not comparable to unpaid artists having the art they posted online swept into training sets used in for-profit models such as OpenAI's, something they never consented to. OpenAI is a nonprofit parent company, but it spawned a for-profit child company, OpenAI LP, which most of their staff work for, and which is meant to deliver many-fold returns to its shareholders - who are effectively profiting from the labor of all the artists and sources in their training data.
What about code? Or what if we eventually have robot labourers trained by observing human labourers?
Code has licenses too. And we've had very high profile lawsuits based on "copying code".
Interesting point, but by that point in time I don't think generative art will even be in the top 10 ethical dilemmas to solve for "sentient" robots.
As it is now, robots aren't the ones at the helm grabbing data for themselves. Humans give orders (scripts) and provide data and what/where to obtain that data.
Because in this case the art is still necessary for the machine to work. You don't need horse buggies to make a car, nor existing books to make a printing press. You DO need artist's art to make these generative AI tools work.
If these worked purely off of open source art or from true scratch, I wouldn't personally have an issue.
We don't need to kill it. Just pay your dang labor. But if we are treating proper compensation as stifling technology, I'm not surprised people are against it.
Maybe in the 2010s tech would have had the goodwill to pull this off in PR, but the 2020s have drained that goodwill and then some. Tech made so many promises to make lives easier, and now it has joined the very corporations it claimed to fight against.
Well, it's in the courts, so someone is going to answer it soon-ish.
> We don't need to kill it. Just pay your dang labor.
> But if we are treating proper compensation as stifling technology, I'm not surprised people are against it.
That's just it, nobody looking to get paid by OpenAI actually did any labor for OpenAI. They did labor for other reasons, and were happy with it.
OpenAI found a way to benefit by learning from these images. The same way that every artist on the planet benefits by learning from the images of their fellow artists. OpenAI just uses technology to do it much more efficiently.
This has never been considered labor in the past. We've never asked artists to "properly compensate" each other for learning/inspiration in the past. I don't know why it should be considered labor or proper compensation now.
But we shall see what the courts decide!
There are many ways an artist can compensate their influences. Some of them are monetary.
When discussing our work, we can name them.
When one of our influences comes out with a new body of work, we can gush about it to our own fans.
When we find ourselves in a position of authority, we can offer work to our influences. No animation studio is really complete without someone old enough to be a grandfather hanging out helping to teach the new kids the ropes in between doing an amazing job on their own scenes, and maybe putting together a few pitches, for instance.
We can draw fan art and send it to them.
None of these are mandatory, but artists tend to do this, because we are humans, and we recognize that we exist in a community of other artists, and these all just feel like normal human things to do for your community.
And if an artist suddenly starts wholesale swiping another artist's style without crediting them, their peers get angry. [1]
1: https://en.wikipedia.org/wiki/Keith_Giffen#Controversy
OpenAI isn't gonna tell you that it was going for a Cat & Girl kind of feel in this drawing. OpenAI isn't gonna offer Dorothy Gambrell a job. OpenAI isn't going to tell you that she just came out with a new collection and she's still at the top of her game, and that you should buy it. OpenAI's not going to send her a painting of Cat & Girl that it did for fun. OpenAI isn't going to do anything for her unless the courts force it to, because OpenAI is a corporation who has found a way to make money by strip-mining the stuff people post publicly on the Internet because they want other humans to be able to see it.
Most people know 20,000-40,000 words. Let's call it 30,000. You've learned 99.999% of those 30,000 words from other people. And don't get me started on phrases, cliches, sentence structures, etc.
How many of those words do you remember learning? How many can you confidently say you remember the person or the book that taught you the word? 5? 10? Maybe 100?
That's how brains work. We ingest vast amounts of information that other people put out into the world. We consume it, incorporate it, and start using it in our own work. And we forget where we even got it. My brain works this way. Your brain works this way. Artists' brains work this way. GPT-4 works this way.
The idea that a visual artist can somehow recall where they first saw many of the billions of images stored in their brain -- the photos, movies, architecture, paintings, and real-life scenes that play out every second of every day -- is laughable. Almost all of that goes uncredited, and always will.
This is what it is to learn.
I tend to fall more on the "training should be fair use" side than most, but your comment seems to be missing the point. Nobody is arguing that models are violating copyright or social norms around credit simply because they consume this information. Nobody ever argued/argues that the traditional text generation in markov models on your phone's keyboard runs afoul of these issues. The argument being made is that these particular models are now producing content that very clearly does run into these norms in a qualitatively different way. You cannot convincingly make the argument that the countless generated "X, but in the style of Y" images, text, and video going around the internet are exclusively the product of some unknowable mishmash of influences -- there is clearly some internalized structure of "this work has this name" and "these works are associated with this creator".
To take it to an extreme, you obviously can't just use one of the available neural net lossless compression algorithms to circumvent copyright law or citation rules (e.g., distributing a local LLM that helpfully displays the entirety of some particular book when you ask it to). Nor can you just tweak it to be a little lossy by changing one letter, or a little more lossy than that, and so on. On the other hand, any LLM that performs exactly the same as a markov model would presumably be fine. So there is a line somewhere.
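To make that "line somewhere" concrete, here is a toy sketch - my own illustration, an assumption, not how any real LLM or compression scheme works: an order-k character Markov model trained on a single text. At small k the output is a statistical remix; once k is large enough that every context in the training text is unique, the model degenerates into a lossless memorizer that replays the text verbatim.

```python
import random
from collections import defaultdict

def train(text, k):
    """Map each k-character context to the characters observed after it."""
    model = defaultdict(list)
    for i in range(len(text) - k):
        model[text[i:i + k]].append(text[i + k])
    return model

def generate(model, seed, length, rng=random.Random(0)):
    """Extend the seed one character at a time by sampling from the model."""
    out = seed
    for _ in range(length):
        choices = model.get(out[-len(seed):])
        if not choices:
            break
        out += rng.choice(choices)
    return out

text = "the cat sat on the mat and the cat ran to the rat "
for k in (2, 12):
    model = train(text, k)
    print(k, "->", generate(model, text[:k], 60))
# k=2 produces a remix of fragments; at k=12 nearly every context is
# unique, so generation just replays the training text verbatim.
```

The same model class slides from "fine, clearly transformative" to "clearly just a copy" as one parameter changes, which is exactly why the line is hard to draw in law.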
A company hires an artist. That artist has observed a ton of other artists' work over the years. The company instructs that artist to draw, "X but in the style of Y", where Y is some copyrighted artwork. The company then prints the result and puts it on their packaging.
A company builds an AI tool. That AI tool is trained on a ton of artists' work over the years. The company opens up the AI tool and asks it to draw "X but in the style of Y," where Y is some copyrighted artwork. The company then prints the result and puts it on their packaging.
What's the difference?
I'd argue there isn't one. The copyright infringement isn't the ability of the artist or the AI tool to make a copy. It's the act of actually using it to make a copy, and then putting that out into the world.
Okay, but then that's an argument subject to the critiques made upthread that you were initially trying to dismiss? You can't claim that AI doesn't need to worry about citing influences because it's just doing a thing humans wouldn't cite influences for, then proceed to cite, as evidence, an example where you would very much be expected to cite your influences and AI wouldn't.
I never argued that AI doesn't need to worry about citing influences. If I am a person using a tool to create a work, and the final product clearly resembles some copyrighted work that I need to reference and give credit to, what does it matter if my tool is a pencil, a graphics editing program, a GPT, or my own mind? I can cite the work.
Like I said, this is exactly what the comment you first replied to was explaining. It is very clearly not the same as a pencil or a graphics editing program, because those things do not have a notion of Cat & Girl by Willem de Kooning embedded in them that they can utilize without credit. It is clearly not the same as your mind, because your mind can and, assuming you want to stay in good standing, will provide credit for influence.
Again, take it back to basics: do you believe it is permissible to share a model itself (not the model output, the model), either directly or via API, that can trivially reproduce entire copyrighted works?
I'd say that a tool itself can't be guilty of copyright infringement; only the person using the tool can. So it doesn't matter whether the GPT has some sort of "notion" of a copyrighted work in it or not. GPTs aren't sentient beings. They don't go around creating things on their own. Humans have to sit down and command them, and at that point, whoever issued the command is responsible for the output. Copyright violation happens at the point of creation or distribution, not at the much earlier point of inspiration or learning.
So yeah, of course imo it should be permissible to share a model that can reproduce copyrighted works. Being "capable of being used" to violate a law is not the same thing as violating a law.
A ton of software on my computer can copy-paste others' work, both images and words. It can trivially break copyright. Hell, there are even programs out there that can auto-generate code for me, code that various companies have patent claims on. Do I think distributing any of this software should be illegal? No. But I think using that software to infringe on someone's copyright should be.
(Note: This is different than if the program distributed came with a folder that included bunch of copyrighted works. To me, sharing something like that would be a copyright violation.)
I'm not sure how to explain this any more clearly. I am talking about neural net compression algorithms. As in, it is literally just a neural net encoding some copyrighted work, and nothing else. It is ultimately no more intelligent than a zip file, other than that the file and the program are one and the same. You can't seriously believe that these programs allow you to avoid copyright claims, can you? Movie studios, music producers, and book publishers should just pack it in, because pirates just need to switch to compressing works by training a NN and seeding those instead, and there's no legal precedent to stop them? If you do think that, do you at least understand why nobody is going to take your position seriously?
A neural net designed to do nothing other than compress and decompress a copyrighted work is completely different than GPT-4, unless I'm uninformed. To me that sounds like comparing a VCR to a brain. GPT-4's technology is clearly something that "learns" in order to be able to produce novel thoughts and ideas, rather than merely compressing. A judge or jury would easily understand that it wasn't designed just to reproduce copyrighted works.
> It is clearly not the same as your mind, because your mind can and, assuming you want to stay in good standing, will provide credit for influence
I forgot to respond to this, but it's not true. Your mind is incapable of providing credit for 99.9% of its influence and inspiration, even when you want it to. You simply don't remember where you've learned most of the things you've learned. And when you have a seemingly novel idea, you can't always be aware of every single influential example of another person's work that combined to generate that new idea.
Ultimately, only high courts in each jurisdiction can decide. I can imagine a case where some highly advanced nations decide different interpretations that cause conflict. Then, we need an amendment to the widely accepted international copyright rules, the Berne Convention. Ref: https://en.wikipedia.org/wiki/Berne_Convention
The artist has a claim for production of a derivative work and for passing off against the other artist.
Individual words aren't comparable to the things people are worried about getting copied. People are much more able to tell you where they learned about more sophisticated concepts and styles.
The same principle applies, though. They can tell you about maybe a dozen, maybe a few dozen, concepts they've learned and use in their work. But what about the thousands of concepts they use in their work that they can't tell you about? The patterns they've noticed, the concepts that don't even have names, that came from seeing things in the world that were all created by other people?
For example, how many artists drawing street scenes credit the designer at Ford Motors for teaching them what a generic car looks like? How many even know which designers created their mental model of a car?
Nobody working on a new cancer drug actually did any work for me. They did labour for other reasons, and were happy with it.
Therefore it is okay for me to steal their recipe and sell their cancer drug.
Nope, but it’s ok for you to read their recipe if they place it on the internet (research paper), and use it to make your own drug.
And that is a good thing we should all celebrate.
The entire point of the patent system was to say inventors can put their design on the net without it being stolen; so future inventors can build on their work.
To me this is a strong point in favor of the idea that OpenAI has no business using their work. How can you even think it's ok for OpenAI to use work that was not done for them without paying some kind of license? They aren't entitled to the free labor of everyone on the internet!
At the risk of answering a rhetorical question: because copyright covers four rights: copying, distribution, creation of derivative works, and public performance, and LLM training doesn't fit cleanly into any of these, which is why many think copying-for-the-purpose-of-training might be fair use (courts have yet to rule here).
I think the most sane outcome would be to find that:
- Training is fair use
- Direct, automated output of AI models cannot be copyrighted (I think this has already been ruled on[0] in the US).
- Use of a genAI to create works that would otherwise be considered a "derivative work" under copyright law can still be challenged under copyright.
The end result here would be that AI can continue to be a useful tool, but artists still have legal teeth to come after folks using the tool to create infringing works.
Of course, determining whether a work is similar enough to be considered infringing remains a horribly difficult challenge, but that's nothing new[1], and will continue to hinge on how courts assess the four factors that govern fair use[2].
[0]: https://www.reuters.com/legal/ai-generated-art-cannot-receiv...
[1]: https://www.npr.org/2023/05/18/1176881182/supreme-court-side...
[2]: https://fairuse.stanford.edu/overview/fair-use/four-factors/
LLMs are collections of GPUs crunching numbers. "Inspiration" doesn't really apply to them.
A better analogy is sampling, and musicians remixing music are very much required to pay for the samples they use.
Only if the "use" means distribution.
If I sample a track and play it in my home I don't properly compensate anyone.
If I ask GPT to create a cool new comic based on the article and then delete it or use it privately, the same applies.
True. Sadly, most of those copyrights are probably owned by other megacorps. So they either collude to suppress the entire industry or eat each other alive in legal clashes. The latter is happening as we speak (the writers for the NYT are probably long retired, but the NYT still owns the words), so I guess we'll see how that goes.
If we treat AI like humans, art historically has an equally thin line between inspiration and plagiarism. There are simply more objective metrics to measure now because we can indeed go inside an AI's proverbial brain. So the metaphor is pretty apt, except with more scrutiny able to be applied.
They were happy until their copyright got stolen, I guess. Then got unhappy.
Redacted.
Mass production hasn't killed art and never will.
What's killing art is this idea by a vocal minority of "artists" that they need to mass produce their work, enter the market, and attempt to make millions of dollars by selling and distributing it to millions.
That's not art. That's capitalism. That's competing to produce something that customers will want to buy more than what your competitors offer.
If you want to compete on the capitalistic marketplace, then compete on the capitalistic marketplace. But if you want to be an artist, be an artist.
Art is still alive and well and always will be. Every day I see people singing because they love singing, making pottery because they love making pottery, writing because they love writing. Whether other people love or enjoy their art, the artist may or may not care. Whether they can profit from their art, the artist may or may not care. But many billions of artists will keep creating, crafting, and designing day after day, and they will never be stopped by AI or anything else.
Redacted.
Jobs have never been less soul crushing, or more creative, in the history of humanity. And that becomes increasingly true every decade.
Do you know what a job does? What a company does? It contributes to society! It produces something that someone else values. That they value so much they're willing to pay for it. Being part of this isn't a bad thing. It's what makes society work.
A job/company entertains. It keeps things clean. It transports people to where they need to go. It produces. It gives people things they want. It creates tools, and paints, and nails, and shirts. I look out my window, and I see people delivering furniture, chefs cooking food and selling it out of trucks, keepers maintaining grounds, people walking dogs.
Being useful to the fellow members of your society for 40 hours a week is not "soul crushing."
Hey. Thanks. Sorry about wasting your time. Shouldn't have started in the first place. It was my fault for trying to make a silly point.
Too mid to understand your point.
(This is a response to your comment before you edited it.)
Find the intersection of something that people increasingly value, that you enjoy, and that you can compete at.
The best proof that people value something is that they're spending money for it. If people aren't spending money, they don't value it, and you probably don't want to go into it. If people aren't spending more and more money on it every year, then it's not increasing in value, and you probably don't want to go into it.
The best proof that you enjoy something is that you enjoyed it in the past. Things you liked as a kid, activities that excited you as a young adult, etc., are often the best candidates.
Look for intersections of the two things above. Do some Googling, do some research.
Finally, you need to be able to compete at it. If you do something worse than everyone else does it, then no one will pick you, because you're probably not being helpful. The simple answer to this is to practice to make yourself better. But most people don't want to do that. A better answer to this is to be more unique, so you can avoid the competition. Don't do a job that has a title, a college major, and millions of talented applicants. It's not that helpful to society to do something a hundred million other people can already do, which is why there's more competition and lower wages.
When you find the intersection of what's valued and what you enjoy, call up some people in those fields and ask what's rare. What in their area is needed. What are they missing. What is no one else doing.
Or just start your own company. That's the easiest way to be unique. But it's hard.
Finally, if you feel you're too "mid," then make sure your standards aren't crazy. Don't let society tell you that you need to be a millionaire with a yacht and designer clothes to be happy. Get a normal 9 to 5 with some purpose in it, that you can be proud of, that others appreciate. Live within your means and don't stress yourself out financially. Spend your free time doing things you like. Take care of your health, find good relationships, and treasure them. That's a happy life at any income. I know a bunch of miserable depressed rich people who are very good at making money and very bad at health/relationships/etc., which is the real stuff that life is made out of.
People can do whatever they want with their own property. You have no right to steal it just because they want to monetise it. What’s killing art is stealing it en masse using procedural generators.
This is an interesting example, because even in the $100 case you are still talking about machine augmentation. You can have a seamstress or a tailor customize patterns using off-the-shelf textiles for that order-of-magnitude price, but if you want custom-built, exotic materials, or many kinds combined, the cost is on the order of thousands, not hundreds. There is also a large industry of just printing designs on stock shirts, which sits at a different effort-scale equilibrium.
Thinking about how automation disintermediates is very important. In animation, productions often have key-frame artists in the pipeline who define scenes, and then others who take those to flesh out all the details of the scene. GenAI can potentially automate that process: you could still have the artist producing a keyframe, and render that into video.
Another big factor is style. One hypothesized reason that impressionism, absurdism, and abstract art all became styles is photography. Once cheap, machine-produced photography became available, there was less need for a portrait artist. But further, realistic portraiture was no longer high status, and others pushed trends in alternative directions.
All the experimentation and innovation going on right now will definitely settle into a different set of roles for artists, and different trends for them to satisfy. Art style itself will change as a result of both what is technically possible and what is _not_ easily automatable, in order to gain prestige.
Too much wall of text for nothing. Nobody is stopping you from buying hand crafted masterpiece. Just get out of the way of progress.
I'm confused about your point. Are you saying we should ban $10 mass produced shirts so that more people can make a living hand-crafting $100 shirts?
What if the AI was solely trained on this person's work, then from that churned out a similar replacement that was monetized?
Well, art predates other professions by thousands of years, so it rightfully earned its privileges.
It's an interesting predicament. Assuming the stories produced by person and machine are indistinguishable and of the same quality, then the difference here is the ability to scale. Setting aside bias in favor of humanity, why should we give entitlement to output derived from a human over something else of the same quality?
I hate making analogies, but if we make humans plant rows of potatoes, should that command a higher price and be seen as more valuable than planting potatoes by tractor, 20 rows wide?
No, we should absolutely be giving bias to humanity. Flesh and blood humans matter, their lives matter, their thoughts matter and their work matters.
Machines are tools for them to use not entities given the same rights and same consideration.
I reject your whole premise.
Exactly; their flesh, blood, energy, etc. do matter. That supports my argument, not yours, lmao. There's nothing more remarkable about my hand-planted potato row vs the tractor-planted rows, and my energy can be spent elsewhere. I am not entitled to make a living hand-planting potatoes if there's no market for it.
People have the choice to continue making stories, and they'll have a fanbase for it, and always will, because that's ultimately a part of freedom and choice. Many are less what I'll call purist here; they don't care about how it came to be, they just want a quality story.
What you're loosely proposing is art being a protected class of output, when we have tools that can match it and will soon have the potential to surpass it. Is that not a terrific way to stunt what you're trying to defend?
For transparency, I am an advocate for human made art, but I am against stunting tooling that can otherwise match said creativity. I see that as an artform in itself.
If you believe AI tooling is an artform then you categorically are advocating against human made art as far as I am concerned.
This is just gatekeeping. Art is not better because it was made by hand as opposed to with technology. If I use a generative model to make art then I’m an artist.
I would argue art is better when it's the result of the effort and vision of an individual
Prompting a search engine to stitch images together on your behalf might result in an image you can call art, but imo all the art generated whole cloth like this sucks. Necessarily derivative. Put into the world without thought.
My favorite critique of LLM work: "why would I bother to read a story that no one bothered to write"
You are free to think so, but it really doesn't make you an artist any more than wearing a medal you bought second hand makes you a war hero.
Something else did the work and you're just claiming credit. It's honestly kind of sad.
Seriously asking: if I customize my order at a fast food joint am I a chef? How is that different from prompt engineering to generate art?
Plenty of people would disagree so clearly this is not a settled matter
I think AI art will by definition never surpass human art. Humans can be inspired by things other than the art of others.
Descartes told us that animals are mere soulless automatons, not entities given the same rights and same consideration as humans.
Well ok, that was 300 years ago and views have changed dramatically since then.
Nice strawman.
So you instead want to what? Ban the tools because they interfere with doing things the human way?
No - force the people creating and profiting from the tools to get permission from the people they mine the data from, or cease operating.
I feel like the issue here is that you are giving AIs agency.
AIs are not magic. They are tools. They are not alive, they do not have agency. They do not do things by themselves. Humans do things, some humans use AI to do those things. Agency always rests with a combination of the tool's creator and operator, never the tool itself.
Is there really a difference between a human flooding the market using AI and a human flooding the market using a printing press?
Even if humans can't compete (an obviously untrue premise from my perspective, but let's assume it for the sake of argument), is that a bad thing? The human endeavor is not meant to be a make-work project. Humans should not be forced to pointlessly toil out of protectionism when they could be turning their attention to something that can't be automated.
A magnitude of difference, yes. Even a printing press will be limited by natural resources, which require humans to procure.
A computer server can do a lot more with a lot less. And is much easier to scale than a printing press.
When the AI can be argued to be stealing humans' work, yes. A printing press didn't need to copy Shakespeare to be useful. And it benefited Shakespeare anyway, because more people got to read his works.
So far I don't see how AI benefits artists. Optimistically:
- an artist can make their own market? Doubtful; they will be outgunned by SEO-optimized ads from corporations.
- they can make commissions faster? To begin with, commissions aren't a sustainable business. Even if they 5x'd their output and somehow kept the same prices, they wouldn't be living well. But in reality, they will get less business, as people will AI their "good enough" art and probably won't pay as much for something not fully hand drawn.
- okay, they can make bigger commissions? There's drama about spending 50k on a 3-minute AMV; imagine if that could be done by a single artist in a day now! ... Well, give it another 10 years. A lot of gen AI is static assets. Rigging and animating are still far from acceptable quality, and a much harder problem space. I also wouldn't be surprised if by then the AI models have gone through their own phase of enshittification and you end up blowing hundreds or thousands anyway.
-----
Until someone conceptualizes a proper UBI scheme, pointless toil is how most of the non-elite live. I have yet to hear of a real alternative for these displaced artists to move towards.
So what? So we all just become managers in meetings in 30 years?
AI runs on some of the most power-hungry and expensive silicon on the planet. Comparing a GPU cluster to a printing press and then saying the GPU cluster is not limited by natural resources is just silly. Where do the materials come from to make the processors?
The same can be true for AI as well. I could see a picture and then ask AI whose style it is. Then I could go look up more work by that artist, increasing their visibility.
> - they can make commissions faster? To begin with, commissions aren't a sustainable business. Even if they 5x'd their output and somehow kept the same prices, they wouldn't be living well. But in reality, they will get less business, as people will AI their "good enough" art and probably won't pay as much for something not fully hand drawn.
Is this a complaint that something got cheaper to make? This one affects more than just artists. For instance, code output quality from LLMs is quite high. So wages across the board will decrease, yet capabilities will increase. This is a problem external to AI.
Again, it's not just artists, and the path forward is the same as it’s always been with technological advancements: increase your skill level to above the median created by the new technology.
Probably mined by 3rd-world slaves (in the literal "owning people" sense). But still, these servers already exist and scale up way more than a tree.
Sure, and you can use p2p to download perfectly legal software. We know how the story ends.
It's a complaint that people, even with more efficiency, still can't make a living, while the millionaires become billionaires. I'm not even concerned about software wages. Some principal SWE going from 400k to 200k will still live fine.
Artists going from 40k to 40k (but now working more efficiently) is exactly how we ended up with wages stagnating for 30 years. And yes, it is affecting everyone, even pre-AI. The median is barely a living wage anymore, which is what "minimum wage" used to be.
If we lived in a work optional world I don't think many would care. But we don't and recklessly taking jobs to feed the billionaires is just going to cause societal collapse if left unchecked.
Why do you think the device you're using to make this comment is better than a GPU?
Was your comment written by some new flamebait AI? You say so many arguments that fall apart under the slightest examination
It's like you are mad at gravity. That sucks you feel that way, but very unlikely to change anything.
It'd be a good point if it weren't for the fact that search engines didn't exist until Google, because of technology, and that courts didn't need to consider the issue until then. So where does your point get us? We are here now.
Search engines are an index, which have existed for centuries.
When I was in university, I remember there was this humanities professor who had a concordance for the Iliad on his shelf. As a CS person, it was so cool to see the ancient version of a search engine.
"search engines didn't exist until google" - you might want to, uh, google that
These models are not conscious; they’re not acting on their own. If I make art using a generative model, it’s no more the model doing it than it is the sketchbook doing it when I use that. I’m making art using some tool; sometimes that tool is more or less powerful. But I’m the one doing it.
What about ML models that only publish 1 or 2 books a year?
Is it really about volume?
Did you not pay them when you bought their book to read it in the first place? That dead trees don't lend themselves to that sort of payoff is a limitation of the technology. In music, sampling is a well-accepted mechanism for creating new music, and the original authors of the sampled music do get paid when the new work is used.
No, I bought the books used for 25 cents at a local booksale, and the authors did not benefit from my secondary market transaction.
But they did. The presence of a secondary market for used books increased the value of some new books. People buy them knowing that they might one day recoup some costs by selling them. Would people pay more, or less, for a new car if they were told they could never sell or trade it away as a used car?
Gee I don't know, but I'm glad that digital goods do not incur the same material costs as a car. "You wouldn't download a car", we've come full circle.
Lol, did an AI write this? Literally no one buys books because they might one day recoup a fraction of the sticker price on the secondary market.
Baffling
You got that via the legal "first sale doctrine" which has been killed for digital works.
It's a tough issue to correlate to physical goods, especially when you realize that people sometimes donate books.
"In 2012, the Court of Justice of the European Union (ECJ) held in UsedSoft GmbH v. Oracle International Corp that the first sale doctrine applies to used copies of [intangible goods] downloaded over the Internet and sold in the European Union." [0]
Arguably the U.S. courts are in the wrong here. We can only hope first sale doctrine is extended to digital goods in the U.S. in the future, as it has been in the EU for over a decade.
[0] https://scholarlycommons.law.northwestern.edu/cgi/viewconten...
How many books can you write per second?
How many books per second can you read to influence and change your personal style?
I don't think any person who has actually worked on anything creative in their life would compare a personal style to a model that can output in nearly any style at extreme speed. And even if you're inspired by a specific author, invariably what happens is that your work becomes a mix of yourself plus those influences, not a damn near-copy.
With visual mediums it's even worse, because you have to take the time [months, years] to specialize in that specific medium/style.
On my laptop, using modern tools backed by AI? ... many.
Thanks now to AI, hundreds. I can plug the output of the book-reading AI into the input of the tool I use to write my books and thereby update my personal style to incorporate all the latest trends. Blame the idiots who are paying me for my books.
So, zero. You yourself: zero.
You completely ignored the premise of the question.
You should read the response more carefully. Generative models are just tools. If I use one to write a story it’s no less a story that I wrote than if I’d chiseled it into a Persian mountainside.
It pretty clearly is. Less of a story that is.
This is clearly a bad-faith response to the point that the GP was making
I don't think anyone who has ever read a novel in their life would say that an AI can write literature at all, in any style.
The obvious solution is to just treat it as if a human did it. If you did not know the authorship of the output and thought it was a human, would you still consider it copyright infringement? If yes, fair enough. If no, then I think it is clearly not a "damn near-copy."
Put differently - if you perfectly memorise Harry Potter, write it down into a book and sell it, you'll get into trouble.
Right, I don't think anyone disagrees with that.
The question is about someone/something writing a book _influenced_ by Harry Potter -- do they owe JK Rowling royalties?
That depends on a variety of factors. You may find yourself in trouble if you write about a wizard boy called Perry Hotter going to Elkwood school of magic and he ends up with two sidekicks (a smarter girl and a redhead boy).
It could be argued quite convincingly that stories like Brooks's Shannara and Eddings's Belgariad are LOTR with the serial numbers filed off, but there is more than enough difference in how the various pieces work for those series to be unique creations that neither infringe on the properties nor hew too closely to the story. (Although I cringe at putting the execrable Belgariad books in any class with either LOTR or Shannara.)
The "best" modern example of this is the 50 Shades series. These are Twilight fan fiction (it is acknowledged as such) with the vampire bits filed off. They are inspired by Twilight, but they are not identifiably Twilight in the end. It might be hard to tell the quality of writing from that which an LLM can produce, and frankly Anne Rice did it all better decades before (both vampires and BSDM).
Humans can be influenced by writers, artists, etc. LLMs cannot. They can produce statistically approximated mishmashes of the original works themselves, but there is no act or spark of creation, insight, or influence going on that makes the sort of question you’re asking silly. LLMs are just math. Humans may be just chemistry, but there’s qualia that LLMs do not have any more than `fortune` does.
I'm with all your other arguments ... but not this point. What is the special magic property that machine-generated art doesn't have? Both human- and machine-generated art can be banal, can be crap. And I think there is plenty of machine-generated art that is quite beautiful, and if well prompted, even very insightful. Non-GenAI art can be this way too: Conway's Game of Life has a quality of beauty to it that rivals forms of modern art (a tiny sketch follows this comment). If you wanted to argue that there is still the need for a human to provide some initial inspiration as input, or programming, before something of value can be generated, then I would agree, at least for now, though there is a meta-argument about asking LLMs to generate their own prompts that makes this an increasingly gray area.
But I don't think the stochastic parrot argument holds water. Most _human_ creation is derivative. A unique mix of pre-existing approaches, techniques, and substance often _is_ the creative act. True innovation with no tie to existing materials seems vanishingly rare to me; it is a really high bar that most humans never clear.
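To make the Game of Life aside concrete, here is a minimal sketch in plain Python with NumPy (the grid size, density, and seed are arbitrary choices of mine): two simple update rules, no training data, and patterns emerge that many people find beautiful.

    import numpy as np

    def step(grid):
        # count each cell's eight neighbors, with wraparound edges
        n = sum(np.roll(np.roll(grid, dy, 0), dx, 1)
                for dy in (-1, 0, 1) for dx in (-1, 0, 1) if (dy, dx) != (0, 0))
        # a cell lives next tick with exactly 3 neighbors, or 2 if already alive
        return ((n == 3) | ((grid == 1) & (n == 2))).astype(int)

    grid = (np.random.default_rng(1).random((20, 40)) < 0.3).astype(int)
    for _ in range(50):
        grid = step(grid)
    print("\n".join("".join("#" if c else "." for c in row) for row in grid))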
I hope you paid for the book you read.
If OpenAI paid usage fees for the training material for every user it generates content for, it would never be profitable, and artists would be fine. But as it stands, even all the shares are owned by people who have given this system none of its knowledge.
In that case, good? I thought that, if nothing else, the past year or two would teach companies about sinking money into unsustainable businesses and then price gouging later (I know it won't; the moment interest rates fall we are back to square one). If it isn't profitable, scale down the means of production (which may include paying C-class executives one less yacht per year, tragic), charge customers more upfront, or work out better deals with your third parties (which in this case means the artists).
I also find some schadenfreude in that these companies are trying to sell "fewer employees" to other companies but would also benefit from said scaling down, even as they throw out defenses of "we can't afford to pay for every copyright".
Is your mishmash going to be a literal statistical model built on top of those other stories?
There are two problems with this (very common) line of argument.
First, the law is pretty clear that yes if your story is too similar to another work, they have rights. Second, it's not at all obvious we can or should generalize from "what a human can do" and "what a bunch of computers can do" in areas like this.
You are not an AI model, and AI models are not human authors, so your comparison is invalid and question irrelevant.
Did you read the books with the intent to incorporate their ideas into your head and profit off of this?
You are not a machine.
Have you noticed that authors and artists love sharing their inspirations? Let's say you're an up-and-coming author. In an interview, you list your sources of inspiration.
Using your logic, why does the creative community celebrate you and your inspirations instead of crying foul like they are with LLMs?
I feel like the keyword is 'almost' and then you begin pulling on that thread:
How closely is this the case? What blind spots exist? How do you measure this? What capacity for original idea generation does the human mind have, and how does it inspire a unique spin?
This is one of those areas where 'thought experiments' are never going to pass muster against genuine experiments with metrics, trials, and robust scientific research.
But with the stakes as they are, I don't have faith that a good-faith dialogue exists in this arena.
But it's also going to be affected by the teachers you had in pre-school, the people you hang around with, your relatives, films you've seen, adverts you watched, good memories and bad memories of events. You bring your lived experience to your story, and not just a mish-mash of stories in a particular genre, but everything.
Whereas when you train a model, you know the exact input, and that exact input may be 100% copyright material.
Ah, just like humans who train against the output of other humans. AI models are not fundamentally different in kind in this regard, only scope, and even that isn't perfectly obvious to me a priori.
Going by this logic, why is OpenAI forbidding use of the content it generates for training other models?
Well, mostly because of corporate greed of ownership. But the underlying issue is that training AI on AI output is a recipe for ruining the entire training set, at least in these early stages.
Not just greed: they want to silence copyright holders whose works they freely use, and at the same time prevent others from using theirs. It is like having a different set of rules just for them. I don't believe training itself is ruining anything; it is the proposed model of value capture and the marginalizing of content creators that poses the greater threat.
Yes, you've condensed the problem on display quite well here. It's not even just hypocrisy, but also short-sighted behaviour.
Artists will learn not to trust the web, if they haven't already. The best time to train a model was yesterday; eventually no novel ideas, expressions, or art will prosper on the "open" web. Just a regurgitation of some statistical idea of words and pixels.
They can write whatever they want in their Terms of Service. That's the logic.
That doesn't mean that courts will meaningfully enforce it for them.
I understand that, only pointing out the hypocrisy
Because any company, hypocrisy be damned, will use every legal lever at their disposal to protect their business model.
Hope we are not normalizing hypocrisy, usually it is very destructive.
AI models are fundamentally different because a computer is a lump of silicon which is neither a moral subject nor object. A human author is a living sentient being that needs to earn a living and is deserving of dignity and regard.
I'm sorry, but I'm going to fundamentally disagree with you. One does not get a morality pass because "the computer did it". People are creating these AI models, selecting data and feeding the models data on which to be trained. The outcome of that rests upon _both_ the creators of the models and the users prompting the models to achieve a result.
To make it even more stark, that logic amounts to saying people don't kill people, it's the gun that does it.
Oh, right. It just reads a million books in a couple of days, removes all the source information, mixes and matches it the way it sees fit, and sells this output for $10/month to anyone who comes along with a credit card.
It's the same thing with GitHub's copilot.
A book publisher would seize everything I have and shoot me in a back alley if I did 0.0001% of this.
Yeah, fair use implicitly relies on the constraints of a typical human lifetime and ability to moderate how much damage is done to publishers. That wasn't an issue until recently, as humans were the only ones who could create output under fair use.
Authors Guild, Inc. v. Google, Inc. strongly disagrees with you on that (the "Google Books case").
Humans usually add their own style to things, and it’s hard to discuss copyright without that larger context along with the question of scale (me making copies of your paintings by hand is not as significant a risk to your livelihood as being able to make them unimaginably faster than you can at much lower cost). Just as rules about making images of people in certain ways or places only became critical when photography made image reproduction an industrial-scale process, I think we’ll be seeing updates to fair-use rules based on scale and originality.
Humans can also come up with their own styles and can draw things they’ve never seen, which ML models as they currently exist are not capable of (and likely will never be). A human artist who has lived their entire life in the wilderness and has never trained themselves with the work of another artist will still be able to produce art with styles produced entirely by personal experimentation.
ML models have a long way to go before comparisons to humans make any kind of sense.
Humans who train on material usually buy a book of it, pay for entry to an exhibition, or even pay to own an original.
Or maybe the work is released for free, possibly supported by ads, at least so the creator gets some recognition and a job if it is well received.
Yeah, and creating derivative work without permission is against the law.
[citation needed]
Of course they are fundamentally different. They don't get to decide what to absorb.
Humans that make those decisions, correspondingly, should pay the price.
I really don't get why so many people seem to think that an AI model training on copyrighted work and outputting work in that same style is exactly the same thing (morally, ethically, legally, whatever dimension you want) as a human looking at copyrighted work and then being influenced by that work when they create their own work.
The first thing is the output of a mathematical function as computed by a computer, while the second is an expression of human creativity. AI models are not alive. They are not creative. They do not have emotion. These things are not even in the same ballpark, let alone similar or the same.
Maybe someday AI will be sophisticated enough to be considered alive, to have emotion, and to be deserving of the same rights and protections that humans have. And I hope if and when that day comes, humanity recognizes that and doesn't try to turn AI into an enslaved underclass. But we are far, far from that point now, and the current computer programs generating text and images and video are not exhibiting creativity, and do not deserve these kinds of protections. The people creating the art that is used to feed the inputs and outputs of these computer programs... those are the people that should have their rights protected.
As others have already pointed out, that's not how human artists learn or produce art. Everyone who uses this brain-dead argument outs themselves as someone who knows nothing about the subject.
The heck are you on about?
Have you ever tried to "train a human"?
They don't work that way, not unless your "training" involves some weird torture stuff you probably shouldn't be boasting about.
Maybe try asking some teachers (of both adults and children) how it works with people...
There's a hell of a lot of money to be made from this belief, so of course the HN crowd will hold it.
Some of us here who have been around the copyright hustle for a little longer laugh at this bitterly and pray that the courts and/or Doctorow's activism saves us. But there's so much money to be made from automated plagiarism, and the forces against it are so weak, that there is not much hope.
The world will be a much, much poorer place once all the artists this view exploits stop making art because they need to make a living.
See https://twitter.com/molly0xFFF/status/1744422377113501998 and https://i.imgur.com/zOOcPCi.jpg
Generative models are just a tool. Artists are mad because this tool empowers other people, who they view as less talented, to make art too.
The camera and 1-hour film developing didn’t destroy oil paintings, it just enabled more people to have control over what was on their walls.
Sure. It's just a tool. That need other people's art to work.
If it's "just a tool" in and of itself, then there's no problem keeping it away from other people's art.
Because the copyright laws were extended to include photographic reproduction of art as something you need to obtain permission (and a license) for.
The same needs to happen for generative AI.
A photocopy machine is just a tool too. So is the printing press.
So does a human brain.
Which brings us to the other side of the reasoning: tools like Midjourney and OpenAI enable idiots (when it comes to drawing/animating ... that includes me) to create engaging artwork.
Recently, generating artwork like that went from almost impossible to "great, but easily recognizable as AI artwork". Frankly, I expect the discussion will end when it stops being recognizable.
I hate Andreessen Horowitz's reasoning, but they're right about one thing: once we have virtual artists that are not easy to distinguish from "real" ones, the discussion will end. It does not really matter what anyone's opinion on the matter is, as it will not make a difference in the end.
A major difference between a human training himself by looking at art, and a computer doing it, is that the human ends up working for himself, the computer is owned by some billionaire.
One enhances the creative potential of humanity as a whole, the other consolidates it in the hands of the already-powerful.
Another major difference is that a human can't use that knowledge to mass-produce that art at a scale that will put other artists in the poorhouse. The computer can.
Copyright exists to benefit humanity as a whole... And frankly, I see no reason for why a neural network's output should be protected by copyright. Only humans can produce copyrightable works, and a prompt is not a sufficient creative input.
Visual artists cannot create without tools. Whether that tool is a brush and paint, a camera, or a neural network.
Whether an artist pays for a subscription to OpenAI or buys paint pots on Amazon.com, money is going to a billionaire either way; that is not a difference between AI and other art.
You are also ignoring the existence of non-commercial, open-source AI models; they exist.
Regarding copyright, we copyright output not input. Otherwise most photography would be uncopyrightable.
One small nitpick: It is completely possible for an artist to make all of their own tools, and indeed for the majority of history that is exactly how things went.
But today the artist that can also create a robust version of photoshop on their own doesn’t really exist. Maybe some can write code to that level but certainly not a majority and it’s certainly not the same as sanding wood to make a paintbrush.
Ok, here's a pile of sand, the goal is 1. a computer and 2. an AI to run on it. Go!
(spoiler: bootstrapping yourself up the tech tree gets progressively harder)
There's a substantive difference in whether the artist is using the tool, or the tool works on its own. A paintbrush doesn't produce a painting by itself, a human needs to apply an incredibly specialized creative skillset, in conjunction with the paintbrush to do so.
An LLM takes a prompt and produces a painting. No sane person would say that I 'drew' the painting in question, even if I provided the prompt.
We copyright things that require creative input. A list of facts or definitions did not require creative input, and is therefore not copyrightable.
Using an LLM does not meet the bar for creative input.
That sounds like a kinder restatement of the opinion at the top of the thread: "Artists are mad because this tool empowers other people, who they view as less talented, to make art too."
Artists might not like the phrasing, but scratch the surface and there's a degree of truth there. It's an argument from self-interest, at core.
Your brain is a neural network, just FYI.
If you graduated from school and only used work that was public domain, would you have all the knowledge you currently have? Have you learned anything from anybody since graduating?
Where is the line? It’s ok for humans to learn from others work but not a machine?
Yes.
The machine doesn't get to make its own choices. Once it does, we'll have a different conversation.
Presently, humans decide what goes into the training set, and what comes out. Those humans are the ones that we need to regulate.
Hot take: The LLM is learning as much as a ZIP file is learning.
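For what it's worth, the analogy can be poked at directly. Here's a toy illustration using only the Python standard library (the sample sentence is an arbitrary choice of mine): a general-purpose compressor also "adapts" to the statistics of its input, so text it has effectively seen before costs almost nothing to encode again.

    import zlib

    passage = b"All happy families are alike; each unhappy family is unhappy in its own way. "
    repeated = passage * 100  # the same passage over and over, like redundant input

    print(len(zlib.compress(passage)))   # compressed size of one copy
    print(len(zlib.compress(repeated)))  # far less than 100x that: repeats are nearly free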
No, I have to disagree here. I'm not an artist, but I respect the creations of others. OpenAI does not. They could have trained on free data, but they did not want to because it would have cost more (finding humans to find/sanction said data, etc.).
I literally met and worked with Doctorow on a protest back in 2005, so I'm not exactly new to this. I also think that the only way you could have written your comment was by grossly misinterpreting my comment.
I hope the idea of Intellectual Property as a whole is thrown out the window and copyright with it.
That is a pretty unfair response when you're skipping the part about how commercializing should not be fair use.
I don't think this is as cut-and-dried as you make it out to be here. If I train a model on, say, every one of the New York Times' articles and release it for free, and it finds use as a way of circumventing their paywall, I have difficulty justifying that as fair use/fair dealing. The purpose/character of the model should indeed be a factor, but certainly not nearly as dispositive a one as I think you're suggesting.
To the extent that training the model serves a research purpose I think that the general use / public release of the trained model does not in general serve the same research purpose and ought to have substantially lower protection on the basis of, e.g., the effect on the original work(s) in the market.
Wouldn't that depend on the use case? If you just had the model regenerate articles that roughly approximate its source material, that is a much more clear-cut violation of a paywall. But if you use that data as general background knowledge to synthesize aggregative works, such as a history of the Vietnam War, or trends in musical theatre in the 1980s relative to the 1970s, or shifts in the usage of formal honorifics, then those seem to me to be clearly fair-use categories. There are gray areas, such as aggregating the opinions of a certain op-ed writer over a short timeframe: while that might produce a novel work, it is basically a mishmash of recent articles. But would that be unfair, especially if not done in the original author's style?
Technical distinctions like these will probably matter in whatever form regulation eventually takes.
Quite a lot of what news publications like the New York Times do is precisely regenerating articles that roughly approximate source material from some other publication. If I remember rightly, a lot of smaller, more local news organisations aren't happy about this because of course it's a more or less direct substitute for the original and a handful of big news organisations (particularly the New York Times) are taking so much of the money that people are willing to pay for news that it's affecting the viability of the rest - but it's not illegal, since facts cannot be copyrighted.
Yes, I think this is a rather fact-specific inquiry. My main point is that the research/commercial distinction is not the only factor (and not even the most important one).
I don't think this is clear. If someone were to train a model on several books about the Vietnam War and then publish their own model-created book on the history of the Vietnam War, I would be inclined to say that that is infringement. And if they changed the dataset to include a plurality of additional books which happen to not be about Vietnam, I don't think that changes the analysis substantially.
I think it is hard to earnestly claim that in that instance the output (setting aside the model itself, which is also a consideration) is transformative, and so I would think, absent more specific facts, that all four fair use factors are against it.
So what about the fact that these cartoons look like Keith Haring meets Cathy Guisewite meets Scott Adams? These cartoons are artistically derivative. They are obviously not derivative from the perspective of copyright as style is an idea, not an expression.
These models were not trained on just the cartoonist in question, nor just their inspirations. The intent was to train on all images and styles. The expression of the idea using these models is not going to match the expression of the idea of all images, even those conforming to a certain bounded prompt.
For the life of me I can't get DALL-E or Stable Diffusion to produce anything like Cat and Girl, nor anything coherent for the above-mentioned inspirations. DALL-E flat out refuses to create things in the style of the above, and Stable Diffusion has insane-looking outputs, overwhelmed by Haring.
Most importantly, copyright is concerned with specific works that specifically infringe, and whose damages are either statutory or based on quantifiable earnings from infringement. Copyright does not cover all works, especially when, again, the intent is to learn all styles, which rarely, if ever, reproduces direct expressions.
The only point at which these images are directly copied is when they are in the machine's memory, which already has case law allowing it, followed by back propagation that begins distilling the direct copies into their underlying formal qualities (a toy sketch follows this comment).
It seems like a lot of people are going to be upset when the courts eventually rule in favor of the training and use of these models, not least because the defendant has a lot of resources to throw at a legal team.
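As a purely illustrative aside, and emphatically not how any real image model is trained, here is a toy gradient step in Python/NumPy showing the "transient copy" point above: the image sits in memory only for the duration of the update, and afterwards only shifted weights remain. All names, shapes, and the learning rate are invented for this sketch.

    import numpy as np

    rng = np.random.default_rng(0)
    weights = rng.normal(size=(64, 64)) * 0.01   # the model's parameters

    def training_step(weights, image, lr=1e-4):
        # toy reconstruction loss for a linear model: prediction = weights @ image
        error = weights @ image - image
        gradient = error @ image.T                # "back propagation" for this toy model
        return weights - lr * gradient            # the weights shift; the image is not stored

    image = rng.normal(size=(64, 64))             # stand-in for one training image
    weights = training_step(weights, image)       # after this call, only the weights remain
    del image                                     # the direct copy existed only transiently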
Your argument is that it's not infringing because they copied everything at once?
I get that there's case law on copying in memory on the input side not being infringing, but I can't for the life of me understand how they get away with not paying for it. At least libraries buy the books before loaning them out; OpenAI and Midjourney presumably pirated the works or otherwise ignored the licenses of published works, and just say "if we found it on the internet, it's fair game"
Libraries loan out specific books!
Can you explain what you mean by "qualifies as free use?" I've never heard that term before.
I would be reasonably confident they mean "fair use" [1] instead.
[1]: https://en.wikipedia.org/wiki/Fair_use
The poster probably meant "fair use", which is an American term of copyright law. The UK, Canada, and other Commonwealth countries have a concept known as "fair dealing", which is similar to, but distinct from, fair use[1]. EU copyright law has explicitly permitted uses[2], which are exceptions to copyright restrictions. Research is one of them, but it requires explicit attribution.
[1] https://library.ulethbridge.ca/copyright/fairdealing/#s-lib-... [2] https://www.ippt.eu/legal-texts/copyright-information-societ...
One could argue that you're paying for the resources required to run the model rather than paying for use of the model itself.
Depends; some models are not freely available, and in that case you very much pay for access to the model.
I think it's worth noting that one of the things that makes this question so vexing is that this topic really is pretty novel. We've only had a few machines like this in history and almost no legal precedent around how they should be treated. I can't remember anyone ever bringing suit over a Markov chain engine, for example, and fabricating one is basically "baby's first introductory 'machine intelligence' project" these days (partially because the output sucks, so nobody has ever felt they have something to lose from competing with a Markov engine; see the sketch after this comment).
Existing copyright precedent serves this use-case poorly, and so the question is far more philosophical than legal; there's a good case to be made that there's no law clearly governing this kind of machine, only loose-fit analogies that degenerate badly upon further scrutiny.
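To give a sense of how low that bar is, here is roughly what such a "baby's first" word-level Markov chain engine looks like in Python; the toy corpus and starting word are invented for illustration, and the output quality is as bad as advertised.

    import random
    from collections import defaultdict

    def train(corpus):
        words = corpus.split()
        table = defaultdict(list)
        for a, b in zip(words, words[1:]):
            table[a].append(b)  # duplicates weight the random draw below
        return table

    def generate(table, start, length=20):
        word, out = start, [start]
        for _ in range(length - 1):
            followers = table.get(word)
            if not followers:
                break
            word = random.choice(followers)
            out.append(word)
        return " ".join(out)

    table = train("the cat sat on the mat and the dog sat on the cat")
    print(generate(table, "the"))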
That’s literally what human artists do, and how they work. Art is iteratively building on the work of others. That’s why it’s so easy to trace its evolution.
Because we are humans and our capability of abusing those rights is limited. The scale and speed at which LLMs can abuse copyrighted work to threaten the livelihoods of the authors of those works is reason enough to consider it unethical.
I don’t think it is. What you describe is similar to any other industry disruption, and I don’t think those are unethical. I’d actually argue that preventing disruption is often (not always) unethical, because you artificially prolong an inefficient or inferior alternative.
So you're saying that we should stop pursuing art and prose? Because when you fine-tune Midjourney with 30 or so images from an artist, it can create any image in that artist's style.
You removed the value and authenticity of that artist in 30 minutes, you applauded it, and you defended that it should be the norm.
OK then, we can close down the entire entertainment business and generate everything with AI, because it can mimic styles, clone sounds, animate things with Gaussian splats, and so on.
Maybe we can hire coders to "code" films? Oh sorry, ChatGPT can do that too. So we need a keypad then, which only the most wealthy can press. Press 1 for a movie, 2 for a new music album, 3 for a new book, and so on.
We need 10 buttons or so, as far as I can see. Maybe I can ask ChatGPT 4 to code one for me.
"So you're saying that, we should stop pursuing art and prose?", no, it becomes a hobby like any other. People still sew for fun.
Great, hold on, I'm calling Hollywood to tell them that all they do is a hobby now.
...and the writers' guild, too.
Well obviously AI isn't at the level of replacing Hollywood yet.
But once it is? I mean, yeah, it'll replace Hollywood.
People will tell Netflix, "hey I want a movie about X in the style of Y and I want Z to star in it", and bam -- your own bespoke movie.
I mean, once the capability's there, it's just inevitable. And yeah -- acting will become a hobby, just like sewing is today.
Then people will see how empty and inferior it is and want movies with actual people and writers again.
Then the market will decide, won't it? Why the fuss about generative AI then? If you're so confident about its inferiority, you shouldn't have to worry about it, right? The better product will win, right?
The market does not choose the superior product. It might choose the least common denominator, the cheapest product, the product that got on the market the earliest, or the one with the richest backers, but not "the superior product".
The "superior" product is subjective.
The objectively superior product is the one that people pay for. They are exchanging labor/capital for the item/content.
I could make the best movie ever conceived, the movie to end all movies. If nobody watches it, it has 0 value.
The first part is debatable, unless you qualify it as "superior at making their creator money".
The market selects for that, and only that. Other qualities of the product are secondary, making any statements to the effect of "the best product [outside the context of simply making the most money] will win" misguided at best.
No, because the market isn't fair.
What will actually happen is people will think "meh good enough", shitty AI art will become the norm, and we'll be boiling frogs and not realize how shitty things have become.
Honestly, that'll be boring. I don't want to be a star of a movie, that's not what pulls me in.
I want to see what the person has imagined, what the story carries from the author, what the humans in it added to it and what they got out of it.
When I read a book, I look from another human's eyes, with their thoughts and imagination. That's interesting and life-changing actually. Also, the author's life and inner world leaks into the thing they created.
The most notable example for me is Neon Genesis Evangelion. Its psychological aspects (which hit very hard, actually) are a reflection of Hideaki Anno's clinical depression. You can't fake this even if you want to.
This is what makes human creation special. It's a precipitation of a thousand and one things in an unforeseen way, and this is what feeds us, even though we are not aware of it and love to deny it at the same time.
"This is what makes human creation special.", that's a load of garbage. There is nothing inherently special about human creation. Some AI artwork I've seen is incredible, the fact it was AI generated didn't change its being an incredible piece of art.
Thinking our creation has some kind of 'specialness' to it is like believing in a soul, or some other stupid thing. It's pure hubris.
Actually, I'm coming from a gentler point of view: "Nature and living things are much more complex than we anticipate".
There are many breakthroughs and realizations in science which excite me more than "this thing called AI": Bacteria have generational memory. Bees have a sense of time. Mitochondria (and cells) inside a human body communicate and try to regulate aging and call for repairs. Ants have evolved antibiotics, and expel the ones with incurable and spreadable diseases. Bees and ants have social norms; they have languages. Plants show more complex behavior than we anticipated. I'm not even entering the primates' and birds' territory, because the titles alone would fill a short chapter.
While some of them might be very simple mechanisms at the chemical level, together they make a much more complex system, and the nature we live in is far more sophisticated than we know, or want to acknowledge.
I'm not looking from a "humans are superior" perspective. Instead, I'm looking from an "our understanding of everything is too shallow" perspective. Instead of trying to understand or acknowledge that we're living in a much more complex system on a speck of dust in vast emptiness, we connect a bunch of silicon chips, dump everything we have ever babbled into a "simulated neural network", and it gives us semi-nonsensical, grammatically correct half-truths.
That thing can do it because it puts word after word according to a very complex, weighted randomization learned from how we do it, imitating us blindly, and we think we have understood and unlocked what intelligence is. Then we applaud ourselves because we're one step closer to stripping a living thing of its authenticity and making Ghost in the Shell a reality.
Living things form themselves over a long life with sight, hearing, communication, interaction, and emotions, at the very least, and we assume that a couple of million lines of code can do much better because we poured in a quadruple-distilled, three-times-diluted version of what we have gone through.
This is pure hubris if you ask me, if there's one.
An artist's style is not copyrightable, at least in the US.
And if they changed that because of "AI"? My word, the lawsuits that would arise between artists...
Doesn't matter. You pay the artist for their style of rendering things. Consider XKCD, PHD Comics, Userfriendly, etc. At least 50% of the charm is the style, remaining 50% is the characters and the story arc.
You can't copyright style of a Rolex, but people pay a fortune to get the real deal. Same thing.
Artists imitate/copy other artists as a compliment, at least in the illustration and comics world. Unless you do it in bad faith, I don't think artists are going to do that. Artists have a sense of humor to begin with, because art is making fun of this world, in a sense.
No, you pay them for the finished product. The STYLE is independent. Lots of artists have similar styles. They don't all pay each other for copying their styles.
Every artist has their own style, because it's their way of creating the product.
Pixar, Disney and Dreamworks have different styles, same for actors, writers, and designers, too. You can generally tell who made what by reading, looking, listening, etc.
I can recognize a song by Deep Purple or John Mayer or Metallica, just by their guitar tone, or their mastering profile (yes, your ear can recognize that), in a couple of seconds.
If style were that easy, we could have 50 Picassos, 200 John Mayers, and 45 Ara Gulers (a photographer) whom you couldn't tell apart, but it doesn't work that way.
XKCD had a couple of guest artists fill in for personal reasons. It was very evident, even though the drawing style was the same.
People, art, and hand-made things are much more complex than they look. Many programmers forget this because everything is rendered in their favorite font, but no two hand-made things are ever the same. Have the same recipe from two different cooks, even if you measure out the ingredients and hand them over beforehand, and you'll get different tastes.
Style is a reflection of who you are. You can maybe imitate it, but you can't be it.
Heck, even two people implementing the same algorithm in the same programming language don't write the same thing.
Isn't this an argument that AI-generated artwork will never be more than a lesser facsimile? That'd suggest that human-made works will always be more sought-after, because they're authentic.
It will be, and human-made things will always be better and more sought-after; however, capitalism doesn't work that way.
When the replacements become "good enough", they'll push out the better things by being cheaper and 90% of the way there. I have some hand-made items and they're a treat to hold and use. They perform way better than their mass-produced counterparts, they last longer, they feel human, and no, they're not inferior in quality. In fact it's the opposite, but most of them are not cheap, and when you want to maximize profits, you need to reduce your costs, ideally to zero.
Do you really feel that way universally? Would it be ethical to disrupt the pharmaceutical industry by removing all restrictions around drug trials? Heck, you could probably speed things up even further if you could administer experimental drugs to subjects without their consent.
Obviously this is a bit facetious, but basing your ethical framework on utilitarianism and _nothing_ else is pretty radical.
If having those restrictions makes the world worse overall, then it would be ethical to remove them. But I assume the restrictions are designed by intelligent people with the intention of making the world better, so I don’t see any reason to think that’s the case.
I agree that the current crop of artists are worse off with AI art tools being generally available. But consumers of art, and people who like making art with AI art tools, are better off with those tools being available. To me it’s clear that the benefit of the consumers outweighs the cost to the artists, and I would say the same if it was coders being put out of jobs instead. You can prove this to yourself by applying it to anything else that’s been automated. Recording music playback put thousands of musicians out of work, but do you really regret recorded music playback having been invented?
P.S. Adobe firefly is pretty competent and is only trained on material that adobe has the license to. If copyright were the real reason people didn’t like AI art tools, you would see artists telling everyone to get Adobe subscriptions instead of Midjourney.
Worse how? As defined by whom?
You could make a pretty compelling argument that "the world" would be better off by, e.g., forcing cancer patients through drug trials against their will. We basically could speed run a cure to cancer!
These longtermist, ends justify the means, ideas can easily turn extremely gross.
Yes, that is true. I 100% agree. It is needed without a doubt.
For one moment, let's think of it this way. You are a 20-year experienced engineer making whatever money you are making. Suddenly, your skills are invalidated by a new disruption. And you have a friend in the same situation.
Fortunately for you, luck played out and you could transition! You found a way into life, meaning and value. Your joy and your everyday life continued as it is.
But the other friend enjoyed the process, liked doing what they were doing, and there was no suitable transition for them. Humans are adaptable, but to them, nothing mattered because existence no longer offered any value. The sole act of doing was robbed WITHOUT ANY ALTERNATIVE. The experience and value of a person were rendered worthless.
Can you relate to that feeling? If yes, thank you.
If no, your words are empty and hold no value.
Artists went through a similar phase with the invention of photography. Now it is rather soul-crushing, because anything an artist makes can easily be replicated, making the whole artistic journey moot.
Being sympathetic towards those people doesn't mean you should bend to their will if you don't believe it's the right thing to do. I can be sympathetic to a child who cries over not being able to ride a roller coaster because they aren't tall enough without thinking the height requirement should be removed.
I think the big difference is that it's not a direct replacement - it feeds off of the existing people while making it much harder for them to make a living.
It would be as if instead of cars running on gasoline, they ran on chopped up horseflesh. Not good for the horses, and not sustainable in the long term.
Don't even try to stop my grocery-store-sample-hoarding robot army, Wegmans! You're being unethical in your pathetic attempt to prevent your sampling disruption!
Some "disruptions" are unethical, some are not. It's about what they actually consist of. Labelling many things as "industry disruption" abstracts beyond usefulness.
Are photocopy machines illegal? Are CD-ROM burners illegal? Both allow near-unlimited copies of copyrighted material at a scale much faster than a human could do alone.
The tools are not the problem, it's how humans use them.
They can be used in an illegal way if used to copy copyrighted material, yes.
And are the burners themselves illegal because you can name illegal uses for them?
No, and I don't think anyone is arguing that LLMs should be illegal either.
I personally am not against LLMs training on things the operator has rights to, and even training on copyrighted things, but I am against it laundering those things back out and claiming it's legal.
Same as an LLM, they can be used in an illegal way if used to copy copyrighted material. So I can't tell it to reproduce a copyrighted work. But it can create new material in the style of another artist.
The difference is that the LLM is still copying copyrighted material in your case, but if I burn a Linux ISO, that is not happening.
You do not have to produce an exact copy of something to violate copyright, and I think anything the LLM outputs is violating copyright for everything it has ever trained on, unless the operator (the person operating the LLM and/or the person prompting it) has rights to that content.
"abusing those rights" is a subjective phrase. What about it is "abuse"? If I learned how to draw cartoon characters from copying Family Guy and released a cartoon where the characters are drawn in a similar style, would that be abuse (assuming my show takes some of Family Guy's viewership)? Is your ethical hangup with the fact it's wrong to use the data of others to influence one's work (which could potentially be an algorithm) or that people are losing opportunities based on the influenced work?
If it's the latter how do we find the line between what's acceptable and what's not? For example, most people wouldn't be against the creation and release of a cure for cancer developed in this way. It would lead to the loss of opportunities for cancer researchers but I believe most people would deem that an acceptable tradeoff. A grayer area would be an AI art generator used to generate the designs for a cancer research donation page. If it could potentially lead to a 10% increase in donations, does that make it worth it?
Intellectual property law does presently restrict the development of cancer treatments and demands in many cases exorbitant royalties from patients and practitioners, so I'm not convinced that this is accurate. If people believed that the loss of opportunities would constrain innovation in the field of cancer research, I think they'd expect the AI users to pay royalties as well.
This comes down to the product of AI.
If the AI produces a cancer treatment identical to what is already covered by patent, I think commercialization would be contingent on the permission of the IP holder.
If the AI produced a novel cancer treatment, using a transformative synthesis of available knowledge, most people would not expect royalties.
I never made a legal appeal in my previous comment so legalities are irrelevant. It also differs from my argument on derivative/transformative works rather than specific works.
What I was questioning was whether people would think it's morally right or not to generate inspired works. For example, if someone made an algorithm to read the relevant papers and produce a cancer treatment that addresses the same areas/conditions as a method under IP law but doesn't replicate the exact method, I don't see that as a morally wrong action by itself.
Because we are humans and our capability of abusing those rights is limited. The scale and speed at which looms can abuse copyrighted work to threaten the livelihoods of the seamstresses of those works is reason enough to consider it unethical.
Replace loom with printing press, etc. Do you realize you're a Luddite?
Ned Ludd was onto something. He wasn't anti-progress. He was anti-labour theft. The problem was not that people were losing their jobs, but that they were being punished by society for losing their jobs and not being given the ability to adapt, all to satisfy the greed of the ownership class.
I am hearing a strong rhyme.
Commercialized LLMs are absolutely labour theft even if they are useful.
Capitalism has really done a number on the human psyche. WE WANT OUR LABOR STOLEN. That's the whole point, so we don't have to labor anymore.
It boggles my mind how warped people's thinking is.
We do not want our labour stolen. We want to labour less, and we want to be fairly compensated for when we have to labour.
The Luddites and the original saboteurs (from the French sabot) had a problem where the capital class invested in machines that let them (a) get more work done per person, (b) employ fewer people, and (c) pay those fewer people less because now they weren't working as hard. The people they fired? They (and the governments of the day — just like now) basically told them to go starve.
Yes, we want to work less. But fair work should result in fair compensation. Ultimately, this is something that the copyright washing of current commercialized LLMs cannot achieve.
But unethical =/= illegal, unfortunately.
That is very much fortunate.
Does copyright law say you can ingest copyrighted work at very large scale and sell derivatives of those works to gain massive profit / massive market capitalization? Honestly wondering. This seems to be the crux of the issue here.
Like how Google has parsed webpages of content to develop their page rank algorithm for searching on the web? I'm assuming it does.
Google is not producing something that competes with or is comparable to what it's parsing and displaying, which makes it very different.
Google is displaying the exact content and a link to the source, and is functioning as a search engine.
Copying music (or whatever), and then outputting music based on the copied music is not the same thing as a search engine, it's outputting a new "art" that is competing with the original.
Another way to put it, is that you can't use a search engine to copy something in any meaningful way, but copying music to produce more music is actually copying something.
The goal of my post was not to explain what differentiates Google search from LLMs and other generative models; it was to respond to the original post above.
The reasons why I don't think training on copyrighted data is a problem are stated in my other comments replying to people who have made arguments about its immorality.
Googles search engine is not selling derivative works.
If you search for a Disney movie on Google search, it does not try to sell you a film derived from the movie.
They sell you ad space on full Disney movies (re)uploaded by random people who are not affiliated with Disney though: https://www.google.com/search?q=finding+nemo+full+movie
I can also get Disney coloring book pages directly from Google's cache on Google images: https://www.google.com/search?q=disney+princess+coloring+boo...
Authors Guild, Inc. v. Google, Inc. determined that Google's wholesale scanning and indexing of books is fair use; the books were borrowed from the University of Michigan library, which had paid for them (or a donor had, at some point). Here's a book of bedtime stories available in its entirety: https://www.google.com/books/edition/Picnics_in_the_Wood_and...
No, because crawling the web, ingesting copyrighted content, and ranking them is not a derivative work of that content.
If crawling the web, ingesting copyrighted content, and ranking them is not a derivative work for that content, then using them to change the values of a mathematical expression should also exempt the expression from being a derivative work.
In that case the OP should have never posed this irrelevant question because access to the expression isn't giving access to a derivative work.
It does. See app stores and the endless copy-cat games and apps, right down to the art style.
I sue people for copyright infringement frequently. It's rare that I have a defendant whose defense is "the internet is full of other infringers, why should I be held responsible?" They have never won. This debate would go better if people didn't base it on assumptions they glean from the world around them, but engaged with the actual law instead of specious reasoning like "well, they did it too!!"
I'd love to be the lawyer of the first anime artist then.
Why engage with nearly 200 years of copyright jurisprudence when you can just insist you are right because anime?
No, I can insist they are copying my style, which is anime.
Or maybe you cannot copyright style, and all those apps do stand on solid legal ground?
Style is not copyrightable. Please make an actual effort to engage with copyright law and not just ask me smarmy questions because you think you are right because you've made no efforts past looking at things immediately in front of you.
https://letmegooglethat.com/?q=is+style+copyrightable
Entire industries exist dedicated to such things. News aggregators. TV parody shows. Standup comedians. Fan fiction. Biography writers. Art critics. Movie critics. Sometimes the derivative work even outsells the original, especially when the original was horrible or unappreciated. I have never played Among Us or the new Call of Duty, but I do enjoy watching NeebsGaming do their youtube parodies of them.
No, copyright law prohibits that. The best example so far is Google's image search being considered fair use; notably, it's not commercial in the sense that they do not sell the derivative work, though they might sell ads on the image search results. OpenAI sells their service, which is the result of the copies, i.e., a derivative work. It's also probably true that the AI weights themselves are derivatives of the works they are based on.
Yes, I believe that is correct. If you do something "transformative" with the material then you are allowed to treat it as something new. There's also the idea of using a portion of a copyrighted work (like a quote or a clip of a song or video), this would be "fair use".
It's important to consider in any legalistic argument over copyright that, unlike conventional property rights which are to some degree prehistoric, copyright is a recent legal construct that was developed for a particular economic purpose.
https://en.wikipedia.org/wiki/Intellectual_property#History
The existing standards of fair use are what they are because copyright was developed with supporting the art industry as an intentional goal, not because it was handed down from the heavens or follows a basic human instinct. Ancient playwrights clipped each other's ideas liberally; late medieval economists observed that restricting this behavior seemed to encourage more creativity. Copyright law is a creation of humans, for humans, and is subordinate to moral and economic reasoning, not prior to it.
Copyright only makes any sense for goods with a high fixed cost of production and low to zero marginal cost. Any further use beyond solving that problem is pure rent-seeking behavior.
Also, now that computers are in everything, copyright has become a tool of social control; any function of a physical object can be taken away from you at a whim, with no recourse, so long as a computer can be inserted into the object. Absent a major change in how society sees copyright, I envision a very bleak totalitarian future arising from this trend.
Don't put the cart before the horse. What art will we have for AI to copy if there's no more artists next generation?
The future of good AI art is Adobe Firefly; a tool in a picture editor which gives users great productivity for certain tasks. Artists won’t go extinct; they will be able to produce a lot more art.
That's the future of AI art. But is AI art the future of art? If AI artists can't maintain any profit from their work, how are they going to afford the compute time?
If that's the case, then novels, news articles, digital images, etc. are things that copyright absolutely makes sense for. If you think that they have a "low cost of production", you are sadly misinformed about the artistic process.
Some of these have vanishingly low marginal costs when it comes to reproduction, but in light of their high fixed cost of production, I don't see how that matters.
Novels maybe. News articles, with rare exception, and digital images absolutely do not have high fixed costs of production.
No, copyright only makes sense insofar that it provides a net positive value for society: that it promotes/protects more creativity leading to economic output than it prevents.
That is, does the amount of creative/economic output dissuaded by allowing AI (preventing people who would not be able to or not want to create art if they couldn't get paid) exceed the creative/economic output of letting people develop and use such AIs?
GenAI reduces the fixed cost of creating images/text/whatever which all else equal will increase the amount created. Whether or not you think that is a good thing is probably mostly a function of do you make money creating these things or do you pay money to have these things created.
100% agree. But even then it's not very good. Abolish copyright, severely limit patents, and leave trademarks as they are. The IP paradigm needs an overhaul.
Sorry, but these arguments by analogy are patently ridiculous.
We are not talking about the eons old human practice of creative artistic endeavor, which yes, is clearly derivative in some fashion, but which we have well established practices around.
We are discussing a new phenomenon of mass replication or derivation by machine at a scale impossible for a single individual to achieve by manual effort.
Further, artists tend to either explicitly or implicitly acknowledge their priors in secondary or even primary material, much like one cites work in an academic context.
Also, the claim itself is ridiculous. A: you haven't, nor will you ever, actually do this. B: this is never how the system of artistic practice has worked up to this point, precisely because this sort of activity is beyond the scale of human effort.
In addition, plagiarism exists and is bad. There's no reason that concept can't be extended and expanded to include stochastic reproduction at scale.
If you want a future in which artists don't have a say and capital concentrates even further into the hands of a few technological elites who make their money off of flouting existing laws and the labor of thousands, by all means. But this argument that, by analogy to human behavior, companies should somehow not be responsible for the vast use of material without permission is absolutely preposterous. These are machines owned by companies. They are not human beings, and they do not participate in the social systems of human beings the way human beings do. You may want to consider a distinction in the rules that adequately reflects this distinction in participatory status in a social system.
So your argument is predicated on the scale of inspired work being the problem?
I don't think this adds anything to the argument besides you using it as a reason that analogies with humans can't be used to compare the specific concept of inspired works. I don't think this holds up.
Algorithms participating in social systems has nothing to do with whether inspired works have a moral claim to existence for some. The fact that your ethics system values the biological classification of the originator of inspired works is something that can't be reconciled into a general argument. I could make the claim that the prompt engineer is the artist in this case.
That can be said by the development of any technology. Fear of capital concentration is more a critique on capitalism than it is on technological development.
Technology does not exist in a vacuum. All of the utility and relevance of technology to humans is dependent on the social and economic conditions in which that technology is developed and deployed. One cannot possibly critique technology without also critiquing a social system, and typically a critique of technology is precisely a critique about its potential abuses in a given social system. And yes, that's what I'm attempting to do here.
This is a fair point. One could argue that an LLM, properly considered, is just another tool in the artist's toolbox. I think a major distinction, though, between an LLM and, say, a paintbrush or even a text editor or Photoshop, is that these tools do not have content baked into them. An LLM is in a different class insofar as it is not simply a tool, but is also partially the content.
The use of two different LLMs by the same artist, with the same prompt, will produce different results regardless of the intent of the so-called artist/user. The use of a different paintbrush, by the same artist, with the same pictorial intention, may produce slightly different results due to material conditions, but the artist is able to consciously and partially deterministically constrain the result. In the LLM case, the tool itself is already a partial realization of the output, and that output is trained on masses of works by unknown individuals.
I think this is a key difference in the "AI as art tool" case. A traditional tool does not harbor intentionality, or digital information. It may constrain the type of work you can produce with it, but it does not have inherent, specific forms that it produces regardless of user intent. LLMs are a different beast in this sense.
Law is a realization of the societal values we want to uphold. Just as we can't in principle claim that training of LLMs on scores of existing work is wrong solely due to the technical function of LLMs, we cannot claim that this process shouldn't be subject to constraints and laws due to the technical function of LLMs and/or human beings, which is precisely what the arguments by analogy try to do. They boil down to "well it can't be illegal since humans basically do the same thing" which is a hyper-reductive viewpoint that ignores both the complexities and novelty of the situation and the role of law in shaping willful societal structure, and not just "adhering" to natural facts.
Your original quote was not using the impact of the technology, it was disparaging the algorithmic source of the inspired work (by saying it does not participate in social systems the way humans do).
LLMs, despite being able to reproduce content in the case of overtraining, do not store the content they are trained from. Also, the usage of "content" here is ambiguous so I assumed you meant the storage of training data.
To me, the content of an LLM is its algorithm and weights. If the weights can reproduce large swaths of content to a verifiable metric of closeness (and to an amount that's covered by current law) I can understand the desire to legally enforce current policies. The problem I have is against the frequent argument to ban generative algorithms altogether.
I would counter this by saying the prompts constrain the result. How deterministically depends on how well one understands the semantic meaning of the weights and what the model was trained on. Also, as a disclaimer, I don't think that makes prompts proprietary (for various different reasons).
Assigning "intent" is an anthropomorphism of the algorithm in my opinion as they don't have any intent.
I do agree with your last paragraph, though: one individual's (or even a group's) feelings don't make something legal or illegal. I can make a moral claim as to why I don't think it should be subject to constraints and laws, but of course that doesn't change what the law actually is.
The analogies are trying to make this appeal in an effort to influence those who would make the laws overly restrictive. There are many laws that don't make sense, and logic can't change their enforcement. The idea is to make a logical appeal to those who may have inconsistencies in their value system, to try to prevent more nonsensical laws from being developed.
Difference in scale (order of magnitude) is difference in kind (in every area of life), so yes, scale can be argued as the problem.
I think this is the central issue and is not limited to just AI generated art. Wealth concentrates to the few from each technological development. When robots replaced factory workers, the surplus profit went to the capital holders, not the workers who lost their jobs. AI generated art will be no different but I don't think it will replace the creative art that people will want to make, just the art that people are making to pay the bills.
It’s not replication and that’s all there is to it.
I'm not as interested in making a technical/legal argument as I am in sharing my feelings on the topic (and eventually, what I think the law should be), but during training, copies are made of copyrighted material, even if the model doesn't contain exact copies of the work. Crawling, downloading, and storing (temporarily) for training all involve making copies, and thus are subject to copyright law. Maybe those copies are fair use, maybe they're not (I think they shouldn't be).
My main point is that OpenAI is generating an incredible amount of value all hinging on other people's work at a massive scale, without paying for their materials. Take all the non-public domain work off Netflix and Netflix doesn't have the same value they have today, so Netflix must pay for content it uses. Same goes for OpenAI imho.
Assume I agree that copyright holders should be compensated for their works (because I do in some sense).
How would this compensation work? Let's say a portion of profits from LLMs that were trained on copyrighted work should be sent to the copyright holders.
How would we allocate which portion of the profits go to which creators? The only "fair" way here would be if we could trace how much a specific work influenced a specific output but this is currently impossible and will likely remain impossible for quite some time.
This is what licensing negotiations are for. One doesn't get to throw up their hands and say "I don't know how to fairly pay you so I won't pay you at all".
Your argument is ridiculous, because it could identically be applied to "every human artist should have to pay a license to every artist whose work they were inspired by". That would obviously be a horrible future, but megacorps like Disney would love it.
Every time you view an image online you’re making a copy so that argument is spurious.
I'd feel a lot better about that argument if we had sane copyright laws and anything older than 7-10 years was automatically in the public domain. Suddenly, Netflix is looking a lot more valuable with just public domain works, and there'd be a ton of public domain art to train AI models with. I suspect the technology would still leave a lot of artists concerned in that situation, though, because even once the issue of copyright is largely solved, the fact remains that AI enables people who aren't artists to create art.
Calling it copyright today is a misnomer; it's not actually the act of copying the work that's the problem. It should really be called "performance rights" or "redistribution rights." The part where this gets complicated is that OpenAI has (presumably; if they haven't, that's a different matter) acquired the works through legal means. And having acquired them, they're free to do most anything with them, so long as they don't redistribute or perform the works.
The big question is where "training an AI on this corpus of works and then either distributing the weights or performing the work via API" falls. Should the weights be considered derivative works? I personally don't think so, and although the weights can be used to produce obviously infringing works, I don't think this meets the bar of being a redistribution of the work via a funny lossy compression algo, as some are claiming. But who knows? Copyright is more political than logical, so I think where it bends is really gonna be a balance of the tangible IRL harms artists can demonstrate vs. the desires of unrelated industries who wish to leverage this technology and are better off for having all this data available.
The artist in the article clearly states that his work was free to use only if it was not used to make a profit, those were the terms of their license. In the artist's opinion, OpenAI violated that license by training their tool on their work and then selling that tool.
This artist doesn't complain about work similar to their own being generated, and their artwork is very clearly not clothing.
So? Why does the author's opinion even enter into the equation? Authors cannot claim ownership beyond the bounds of copyright. If what AI is doing qualifies as fair use, the artist cannot do anything about it. I'm sure that lots of artists would not want anyone to lampoon or criticize their work. They cannot stop such things. I'm sure lots of artists would never want anyone to ever create anything in any way similar to their work. They cannot do that either.
It is not clear that training an LLM falls under "fair use". We are then left with the license of the work, in this case that license forbids re-selling the work for a profit. It is the artist's license for their work at issue, not their opinion.
If the legality is ambiguous then we're left with an impending court decision. Fair use is an affirmative defense, considered case by case.
Replace "use" with "copy". No one may copy the work to make a profit. Fair Use has long been an exemption to copyright, with Learning an example of Fair Use. But no one expected AIs to learn so quickly. I don't think it is clear either way, and will end up in SCOTUS.
The proper construction is that copyright is an exemption from the freedom of speech. Fair use is a partial description of freedom of speech, a description to narrow the limits of copyright rather than to broaden the already limitless bounds of freedom of speech.
The default for expression is that it is allowed except if copyrighted, as opposed to copyrighted except when covered by fair use.
I disagree that a person learning is the same as an AI model being trained. That aside, fair use typically covers the use of an excerpt or a portion of the material, not reproduction of the work in its entirety.
Agreed: in the end, courts will make the decision.
Clothes are inherently consumable goods. If you use them, they will wear out. If you do not use them, they still age over time. You cannot "copy" a piece of clothing without a truly astonishing amount of effort. Both the processes and the materials may be difficult or impossible to imitate without a very large investment of effort.
Compare this to digital art: you can copy it literally for free. Before AI, at least you had to copy it mostly verbatim (modulo some relatively boring transforms, like up/down-scaling, etc.). That limited artists' incomes, but not their future works. But in a post-AI world, you can suck in an artist's life's work and generate an unlimited number of copycats. Right now, the quality of those might be insufficient to be true replacements, but it's not hard to imagine a world, not so far off, where it will be sufficient, and then artists will be truly screwed.
Sure you can. There's a whole industry making knockoffs.
GP compared copying a piece of clothing to copying digital art. I'd say that setting up a factory to make knockoffs - or even "just" buying a sewing machine, finding and buying the right fabric, laying out the piece you want to copy, tracing it, cutting the fabric, sewing it, and iterating until it comes out right - would qualify as "a truly astonishing amount of effort" for a person.
You can outsource. Look for "knockoff clothing manufacturers".
Let's say I'm an artist. I have, thus far, distributed my art for consumption without cost, because I want people to engage with and enjoy it. But, for whatever reason, I have a deep, irrational philosophical objection to corporate profit. I want to preclude any corporation from ever using my art to turn a profit, when at all possible. I have accepted that in some sense, electrical and internet corporations will be turning a profit using my work, but cannot stomach AI corporations doing so. If I cannot preclude AI corporations from turning a profit using my work, I will stop producing and distributing my work.
Do you think it's reasonable for me to want some legal framework that allows me to explicitly deny that use of my work? Because I do.
When you put it that way, I think you just laid out the case for creating a Copyleft for art.
A copyleft license is enforced by copyright. That’s the reason others can’t simply ignore the license.
I agree with the other commenters that the scale of this "deriving inspiration from others" is where it feels wrong.
It feels similar to the ye olden debates on police surveillance. Acquiring a warrant to tail a suspect, tapping a single individual’s phone line, etc all feels like very normal run-of-the-mill police work that no one has a problem with. Collating your behavior across every website and device you own from a data broker is fundamentally the same thing as a single phone’s wiretap, but it obviously feels way grosser and more unethical because it scales way past the point of what you’d imagine as being acceptable.
In that example it's not the scale that makes it right or wrong, the scale of people impacted just affects the degree of wrongs that have been committed.
If acquiring a warrant is the basic action being scaled, I'd be okay with that ethically if it was done under what I define as reasonable pretenses. Regardless of how it scales, I still think it would be the right thing to do, assuming the pretenses for the first action could be applied to everyone wiretapped. Now, if I thought the base action was morally wrong (someone was tailed or wiretapped without proper pretenses), I'd think it wrong regardless of the scale. The number of people it affected might impact how wrong I saw it, but not whether it was right or wrong.
People keep saying this but it's actually much more complicated, and in many cases you can't view copyrighted content.
An example: Microsoft employees are not permitted to view or learn from an open-source (GPL-2) terminal emulator:
https://github.com/microsoft/terminal/issues/10462#issuecomm...
Another example is proprietary software that may have its source available, whether intentionally or not. If you view this and then work on something related to it, like WINE for example, you are definitely at risk of being successfully sued.
If you worked at Microsoft on Windows, you would not be able to participate in WINE development at all without violating copyright.
If you viewed leaked Windows source code you also would not be able to participate in WINE development.
An interesting question I have is whether training on proprietary, non-trade-secret sources would be allowed. Something like Unreal Engine, where you can view the source but it's still proprietary.
Another question is whether training on leaked sources of proprietary and private but non-trade-secret code, like source dumps of Windows, is legal.
Your link isn't very clear, but I think you are talking about the "clean room design" strategy: https://en.m.wikipedia.org/wiki/Clean_room_design
The way this works is the way many of us are arguing that AI and copyright should work.
Viewing (or training on) copyrighted work isn't copyright infringement.
What can be copyright infringement is using an employee who has viewed (or a model that was trained on) copyrighted work to create a duplication of that work.
In most of the examples of infringing output that I've seen, the prompt is pretty explicit in its request to duplicate copyrighted material.
Models that produce copyrighted content when not explicitly asked to will have trouble getting traction among users who are concerned about the risk of infringement (such as the examples you listed).
I also see this approach opening an opportunity for models that acquire specific licenses for the content they train on that would grant licenses to the users of the model to duplicate some or all of the copyrighted works.
The responsibility for how a model is used should rest primarily on the user, not the model trainers.
Who is "we" here? Are you making a distinction between people and machines? If I built a machine that randomly copied from a big sample of arts that I wanted, would that machine be ok?
OpenAI built a machine that does exactly that. They just sampled _everyone_.
OP's argument was about right and wrong, not about legal and illegal. There's a difference.
You'd have to argue that copyright law is ethical in its entirety to make your version of the argument.
Copyright is just made up for pragmatic purposes: to incentivize creation. It does not matter that training models is not the same as reproducing something exactly; if we decide that it's unfair, or even just economically desirable, to disallow it, then we are free to make that decision. The trade-offs are fairly profound in both directions, I think, and likely some compromise will need to be made that is fair to all parties and does not cripple economic and social progress.
Copyright is a bad idea in the first place, and should just be thrown out entirely; but that isn't the whole picture here.
If OpenAI is allowed to be ignorant of copyright, then the rest of us should be allowed, too.
The problem is that OpenAI (alongside a handful of other very large corporations) gets exclusive rights to that ignorance. They get to monopolize the un-monopoly. That's even worse than the problem we started with.
people and companies are copying copyrighted content when they're using datasets that contain copyrighted content (which also repackage and distribute copyrighted content, not just as links but as actual works/images too), downloading linked copyrighted content, and storing that copyrighted content. plenty of copies created and stored, it seems to me.
and what, do you think they're trying their damnedest to keep datasets clean and to not store any images in the process? how do you think they retrain on datasets over and over? it's really simple: by storing terabytes of copyrighted content. for ease of use, of course. why download something over and over if you can just download it once and keep it? and if they really wanted to steer clear of copyright infringement, if there's truly "no good solution" (which is bullshit for compute: oh, they can compute everything but not that part), why can't they just refrain from recklessly scraping everything, in case something were to just 'slip in'? like, if you know it's kinda bad, just don't do the thing, right? well, maybe copyright infringement is just acceptable to them. if not the actual goal.
what they generate is kinda irrelevant; there's plenty of copyright infringement happening even before any training is done. the assembling of datasets, and bad datasets containing copyrighted content, are the start and the core of the copyright problems.
there's a really banal thing at the core of this, and it's just multi-TB storage filled with pirated works.
If training a model is fair use, then model output should also meet fair use criteria. The very first thing you can find on the internet about fair use is the Wikipedia article on the topic. It lists a number of factors for deciding whether something is fair use, and the very first one includes a quote from an old copyright case.
Most uses of LLMs and image generation models do not produce criticism of their training data. The most common use is to produce similar works. There's a very common "trick" to get a specific style of output: add "in the style of <artist>". Is this a direct way "to supersede the use of the original work"?
You can certainly see how other factors more or less put gen ai output into the grey zone.
The fact that clothing doesn't qualify for copyright doesn't mean text and images don't. Or, if you advocate that they don't, then you pretty much advocate for the abolition of copyright, because those are the major areas of copyright applicability at the moment. That is a stance one can have, but you'd probably do better to actually say so, because claiming that copyright applies to some images and text but not others is a much harder position to defend.
Just like the rest of AI, if your argument is "humans can already do this by hand, why is it a problem to let machines do it?", it's because you are incorrectly valuing the labor that goes into doing it by hand. If doing X has a potentially negative side effect Y, then the human labor required to accomplish X is the principal barrier to Y, and Y can be mitigated via existing structures. Remove the labor barrier, and the existing mitigation structures cease to be effective. The fact that we never deliberately established those barriers is irrelevant to the fact that our society expects them to be there.
Copying is not illegal; publishing is. You can keep as many private copies of any content as you wish.
A lot of popular AI tools are designed to mimic a specific artist's style. A human is not permitted to draw something so similar.
In theory: sure
In practice: not really, especially when you're small and the other side is big and has lots of lawyers and/or lawmakers in their pockets.
Disney ("In 1989, for instance, the company even threatened to sue three Florida daycare centers unless they removed murals featuring some of its characters") and Deutsche Telekom[1][2] ("the company's actions just smack of corporate bully tactics, where legions of lawyers attempt to hog natural resources — in this case a primary color — that rightfully belong to everyone") are just two examples that spring to mind.
[0] https://hls.harvard.edu/today/harvard-law-i-p-expert-explain... [1] https://www.dw.com/en/court-confirms-deutsche-telekoms-right... [2] https://futurism.com/the-byte/tmobile-legal-rights-obnoxious...
AI doing things that humans laboriously learned and drew inspiration from is just different. After all, sheer quantity can be its own quality, especially with AI learning.
Now, I am worried about companies like OpenAI monopolizing the technology by keeping it proprietary. I think their output should be public domain, and copyright should apply only to human authors, if it should apply at all.
Well, not exactly. Certain uses are fair. The question is whether OpenAI's use counts as fair, and I don't think your immediate response comes close to addressing that question, despite your conviction otherwise.
Also, clothing designs are copyrightable. The conviction expressed by some participants in this debate is exhausting in light of their apparent unfamiliarity with actual copyright law.
Same for patents
Most every fashion company has a legal team that reviews print and pattern, as well as certain other aspects of design, relative to any source of inspiration. My husband works in the industry and has to send everything he does for review in this way. I’m not sure where you got the idea that there are no IP protections for fashion, but this is untrue.
I feel the emotionally charged nature of the topic prevents a lot of rational discussion from taking place. That's totally understandable, too; it's the livelihood of some of those involved. Unless we start making specific regulations for generative AI, current copyright law is pretty clear: you can't call your art a Picasso, but you can certainly say it was inspired by Picasso. The difference is that GAI can do it much faster and cheaper. The best middle ground, in my opinion, is to allow GAI to train on copyrighted data, but the output cannot be copyrighted, and the model weights producing it can't be copyrighted either. Any works modified by a human attempting to gain copyright protection should have to fulfill the same requirements of being substantive and transformative that fair use imposes now.
I think there is a case to be made when AI models do produce copies. For instance, I think the NYT has a right to take issue with the near-verbatim recall of NYT articles. It's not clear cut, though: when these models produce copies, they are not functioning as intended. Legally that might produce a quagmire. Is it fair use when you intend to be transformative but by accident aren't? Does it matter if you have no control over which bits are not transformative? Does it matter if you know in advance that some bits will be non-transformative, but you don't know which ones?
I presume there are people working on research relating to how to prevent output of raw training data, what is the state of the art in this area? Would it be sufficient to prevent output of the training data or should the models be required to have no significant internal copies of training examples?
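As a purely illustrative sketch of the first option (not any lab's actual method; the n-gram size and threshold below are assumptions), one naive mitigation is to screen generated text against a verbatim n-gram index of the training corpus before returning it:

    # Naive sketch: flag output that overlaps the training corpus in long
    # verbatim runs. A real system would need a scalable index (e.g. a
    # Bloom filter or suffix array), not an in-memory set.
    def build_ngram_index(corpus_docs, n=8):
        index = set()
        for doc in corpus_docs:
            tokens = doc.split()
            for i in range(len(tokens) - n + 1):
                index.add(tuple(tokens[i:i + n]))
        return index

    def looks_like_regurgitation(output_text, index, n=8, threshold=0.2):
        tokens = output_text.split()
        total = len(tokens) - n + 1
        if total <= 0:
            return False
        hits = sum(tuple(tokens[i:i + n]) in index for i in range(total))
        return hits / total >= threshold

Even this toy version shows why output filtering alone may not satisfy the stronger requirement: it catches verbatim runs but says nothing about internal copies or close paraphrases.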
I do think it's worth remembering there's a difference between "legal" and "good".
It's entirely legal for me to leave the pub every time it comes up to my round. It's legal for me to get into a lift and press all the buttons.
It's not unreasonable I think for people to be surprised at what is now possible. I'm personally shocked at the progress in the last few years - I'd not have guessed five years ago that putting a picture online might result in my style being easily recreated by anyone for the benefit mostly of a profitable company.
What you'd probably get is an LLM that can perfectly understand well-written text, as you might find on Wikipedia, but that would struggle severely with colloquial language of the kind found on Reddit and Twitter.
That's literally built into their corporate rules for how to take investment money, and when those rules were written they were criticised because people didn't think they'd ever grow enough for it to matter.
How is OpenAI compensating the owners of IP they trained their models on? Or is that not what you mean? It's certainly how I read the part of the GP comment you quoted.
So far, looks like funding a UBI study. As the IP owners are approximately "everyone" in law, UBI is kinda the only way to compensate all the IP owners.
https://openai.com/our-structure
That makes no sense. If I write a book by myself, post part of it on my website and OpenAI ingests part of it - how does that make anyone besides me myself and I an "owner" of the IP?
I don't understand why you're confused, but I think it's linguistics.
If you write a book by yourself and post parts on your website and they ingest it, you are the copyright holder of that specific IP, and when I post this specific comment to Hacker News I am the copyright holder of this specific IP.
In aggregate you and I together are the copyright holders of that book sample and this post, and I don't know any other way of formulating that sentence, though it sounds like you think I'm trying to claim ownership of your hypothetical book while also giving you IP ownership of this post? But that's not my intent.
I don't think you're trying to claim ownership. It sounded like you were suggesting that the only recourse for OpenAI would be to fund a UBI program as a form of payment instead of directly paying the people who own the IP it ingested?
Yes, I'm saying that because there's (currently) no way to even tell how much the model was improved by my comments on HN vs. an equal number of tokens that came from e.g. nytimes.com; furthermore, to the extent that it is even capable of causing economic losses to IP holders, I think this necessarily requires the model to be actually good and not just a bad mimic[0] or prone to whimsy[1] and that this economic damage will occur equally to all IP holders regardless of whether or not their IP was used in training. For both of these reasons independently, I currently think UBI is the only possible fair outcome.
[0] I find the phrase "stochastic parrot" to be ironic, as people repeat it mindlessly and with a distribution that could easily be described by a Markov model.
[1] if the model is asked to produce something in the style of NYT, but everyone knows it may randomly insert a nonsense statement about President Trump's first visit to the Moon, that's not detracting from the value of buying a copy of the newspaper.
They ingested the entirety of the internet. Everyone who has ever written anything that is online, including our (implicitly copyrighted) HN comments and letters written 400 years ago, had their words used to train GPT-4.
This is a load of bullshit and I sincerely hope you know that as well as I do.
As a thought experiment, let's say I pirate enough ebooks to stock a virtual library roughly equivalent in scope to a large metropolitan library system, then put up a website where you can download these books for free. I make money on the ads I run on this website, etc. This is theft, but as "compensation" I put some percentage of my revenues into funding a UBI study that might, if we're lucky—in half a century or so, in a progressive, enlightened version of the future we are by no means guaranteed to realize—make a fractional contribution to the thrust of a successful UBI movement.
Does that make what I'm doing okay? Should all those authors deprived of royalties on their work now, even deprived of publishing opportunities as legitimate sales collapse, understand my token contribution to UBI as fair compensation for what I'm taking from them?
That to me is a joke, and the only difference between it and what OpenAI is doing is that OpenAI's product relies on a technical means of laundering intellectual property that seems tailor-made to dodge a body of existing copyright law designed by people who could not possibly have conceived of what modern genAI is capable of. We will see what our lawmakers and courts make of it now, but either way, making a promise to pay me back later does not justify you in taking all the cash out of my wallet without my consent. Nor, for that matter, does tearing it up and returning it to me in the form of a papier-mâché sculpture of Shrek's head protruding from the bowl of a "skibidi toilet".
IMO that's a terrible thought experiment given the situation.
LLMs do not store enough content, or with enough accuracy, to come even close to a virtual library. Unlike, say, Google and the Wayback Machine: the former stores enough to show snippets from the pages it presents to you as search results (and got sued for that in certain categories of result), and the latter is straight up an archive of all the sites it crawls.
Furthermore, the "percentage" in question for OpenAI is "once we've paid off our investors, all of it goes to benefitting humanity one way or another" — the parent company is a not-for-profit.
Here's a different question for you: if a generative AI is trained only on out-of-copyright novels and openly licensed modern works, and still deprives everyone of all publishing opportunities forever because, as in this thought experiment, it's better and cheaper than any human novelist, is that any more or any less fair to literally any person on the planet? The outcomes are the same.
I'm sure someone's already thought of making such a model; it's just a question of whether they've raised enough money to train it.
You may have noticed from the version number that they're on versions 3 and 4. When version 2 came out in 2019, they said:
"""Our model, called GPT-2 (a successor to GPT), was trained simply to predict the next word in 40GB of Internet text. Due to our concerns about malicious applications of the technology, we are not releasing the trained model."""
They were mocked for this: https://slate.com/technology/2019/02/openai-gpt2-text-genera...
Even here: https://news.ycombinator.com/item?id=21306542
Indeed, they are still mocked for suggesting their models may carry any risk at all. Of any kind. There are plenty of people who want to rush forward with this and think OpenAI are needlessly slow and cautious.
You may also have noticed their CEO gave testimony in the US Congress, and that the people asking him questions were surprised he said (to paraphrase) "regulate us specifically — not the open source models, they're not good enough yet — us".
To the extent that any GenAI can pose an economic threat to a creative job, it has to be better than a human in that same job. For now, IMO, they're assistant-level, not economic-threat-level. And when they get to economic-threat-level (which in fairness could be next month or next year), they'll be that threat even if none of your IP ever entered their training runs.
I already addressed this: "the only difference between it and what OpenAI is doing is that OpenAI's product relies on a technical means of laundering intellectual property that seems tailor-made to dodge a body of existing copyright law designed by people who could not possibly have conceived of what modern genAI is capable of."
You are certainly welcome to disagree with what I've said, but you can't simply pretend I didn't say it.
A quick Googling suggests that OpenAI employees are not working for free—far from it, in fact. In this frame I don't particularly care whether the organization itself is nominally "non profit", because profit motives are obviously present all the same.
They are certainly welcome to try! Given how profoundly incapable extant genAI systems are of generating novel (no pun intended) output, including but not limited to developing artistic styles of their own, I think it would be quite funny to see these companies try to outcompete human artists with AI generated slop 70+ years behind the curve of art and culture. As for modern "public domain"-ish content, if genAI companies actually decided to respect intellectual property rights, I expect those licenses would quickly be amended to prohibit use in AI training.
AI systems will probably get there eventually, though it's very difficult to predict when. However, that speculation does not justify theft today.
People are absolutely throwing money at genAI right now, so if nobody has thrown enough money at this particular idea to give it a fair shake then the obvious conclusion is that people who know genAI think it's a relatively bad one. I'm inclined to agree with them.
Why is this relevant? I'm not talking about AI safety or "X risk" or whatever—I'm talking about straightforward intellectual property theft, which OpenAI and their contemporaries are obviously very comfortable with. The models they sell to anybody willing to pay today could literally not exist without their training datasets.
I sincerely think you (in the thought experiment) sound like an incredible hero.
Libraries are awesome and great for reducing inequality. Using ads to support that cause and also funneling cash to UBI initiatives? Even better.
That... doesn't seem sufficient, or legal, or (if legal) ethical. You can't just "compensate" people for using their copyrighted works via whatever means you've decided is fair.
I think funding UBI studies and lobbying for that sort of thing is a public good, but is entirely unrelated to -- and does not make up for -- wholesale copyright infringement.
I take no position at all about legality, if scraping is or is not legal and if LLMs are or are not "fair use" is quite beyond my limited grasp of international copyright law.
But moral/ethical, and/or sufficient compensation?
IMO (and it is just opinion), the damage GenAI does to IP value happens when, and only when, an AI is good enough to make some human unemployable, and that happens to all (e.g.) novelists around the same time, even those whose IP was explicitly excluded from training the model. So, twist question: is it fair to pay a UBI to people who refuse to contribute to some future AI that does end up making us all redundant? (My answer is "yes, it's fair to pay all, even if they contributed nothing; this is a terrible place for schadenfreude".)
Conversely, mediocre regurgitation of half-remembered patterns that mimic the style of a famous author cannot cause any more harm when done by AI than when done by fan fiction.
Right now these models are pretty bad at creative fiction, pretty good at writing code, so I expect this to impact us before novelists, despite the flood of mediocre AI books reported in various places.
Other damage can happen independently of IP value damage, like fully automated propaganda, but that seems like it's not a place where compensation would go to a copyright holder in the first place.
Sufficient compensation? If AI works out, nobody will have any economic advantage, and UBI is the only fair and ethical option I am yet aware of for that.
It's what happens between here and there that's messy.
Good catch, hadn't seen that.
So the researchers, shareholders, and leadership of OpenAI will be happy to give up being ridiculously wealthy so they can be only moderately wealthy, and everyone else gets a basic income?
I'm also just skeptical of UBI in general, I suppose - 'free' money tends to just inflate everything to account for it, and it still won't address scarcity issues for limited physical assets like land/property.
I'd love to be wrong about both of these things.
I agree UBI can have that problem, but I think it can be avoided if e.g. the government owns the means of production.
There are still risks in this scenario.
Until my check from OpenAI shows up, they're not.
"That's literally built into their corporate rules for how to take investment money, and when those rules were written they were criticised because people didn't think they'd ever grow enough for it to matter. "
that sounds like insane bullshit to me. they're trained on the whole internet. there's no way they give back to the whole of the internet; more likely, a lot of jobs will be taken away by their work.
plus, there was no consent.
If they make a model that's good enough to actually take away jobs rather than merely making everyone more productive — is it a tool or a human level AI, I'm not sure either way though I lean toward the latter — the only possible compensation is UBI… which they're funding research into.
I agree with you about this. That's a different problem, but I do agree.
It's not a matter of "is it good enough to replace humans"; despite all of us here knowing it's not, we could list many companies (and even industries) where it's already happening.
That comment is self-contradicting. If it's already replacing humans, then economically speaking (which is what matters for economic harm), it's good enough to replace those specific humans.
The reason I'm not sure how much this tech really is at the level of replacing humans in the workplace, is that there's always a lot of loud excitement about new tech changing the world, and a lot of noise and confounding variables in employment levels.
But if it is actually replacing them, then it must be at the level of those employees in the ways that matter.
"In the ways that matter": the only way that matters for a lot of employers is what is cheaper.
This maybe isn't strictly related to the topic of this post or conversation, but a lot of companies have been replacing most, or even all, support channels with AI assistants. No, it isn't good enough to replace those humans in the sense most would consider essential (helping the customers who reach the support line), but businesses find it "good enough" in the sense that it's cheaper than human workers, and the additional cost of unhappy customers is small enough for it to still be worth it.
This is very cheap: https://man7.org/linux/man-pages/man1/yes.1.html
I would agree with you that what counts as "good enough" is kinda hard to quantify (which itself leads into the whole free market vs state owned business discourse from 1848 to 1991), but I do mean specifically from the PoV of "does it make your jobs go away?"
Although now I realise I should be even more precise, as I mean "you singular and jobs plural for now and into the future" while my previous words may reasonably be understood as "you plural and each job is just your current one".
Would that be less valuable?
I wouldn't be able to guess.
Plus side: Smaller more focused model with lower rate of falsehoods.
Minus side: it kant reed txt ritten liek this, buuuut dat only matters wen nrml uzers akchuly wanna computr dat wrks wif dis style o ritin lol
I suspect there is a lot of value in the latter, and while I don't expect it to be as much as the value in the former, I wouldn't want to gamble very much either way.
Can it train on the Wikipedia meta community? Maybe we could get an LLM that talks like "You're wrong, see WP:V, WP:RS, WP:WTF".
Great. Let's do that then. No good reason to volunteer it for a lobotomy.
I see it differently. To me, if you post your work online as an artist, it's there for everyone to view and be inspired by. As long as nobody copies it verbatim, I don't think you've been hurt by any other usage. If another artist views it and is inspired by it, so be it. If an AI views it and is inspired by it, again, no harm done.
AI doesn't get inspired. It's not human. It adds everything about the work to its endless bank of levers to pull, and if you pull the right ones, it will just give you the source verbatim, as proven by the NYT lawsuit filing, where it was outputting unaltered copyrighted NYT article text.
That's a matter of perspective. AIs do not make a copy of the source material; they just adjust their internal weights, which, from a broad-minded perspective, can be seen as simple inspiration, not copying.
Of course, just like a human artist, it could probably closely approximate the source material if it wanted to, but it would still be its own approximation, not an exact duplicate. As for plagiarism, we all have to be careful to rephrase things we've read elsewhere, and perhaps AIs need to be trained to do this a bit better... but that doesn't change the underlying fact that they're learning from the text they read, not storing a verbatim copy (at least, no more so than a human reader with a good memory).
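To make the "adjusting weights" point concrete, here's a toy sketch; it assumes nothing about any real model's internals, and the numbers are invented. A training step nudges parameters based on an example, and the example itself is discarded afterward:

    # Toy illustration: training mutates weights; the example is not kept.
    import random

    weights = [random.uniform(-1, 1) for _ in range(4)]

    def train_step(features, target, lr=0.01):
        # prediction is a simple dot product over the current weights
        pred = sum(w * x for w, x in zip(weights, features))
        error = pred - target
        # nudge each weight to reduce the error, then drop the example
        for i, x in enumerate(features):
            weights[i] -= lr * error * x

    train_step([0.5, -1.2, 0.3, 0.9], target=1.0)

Whether billions of such nudges can still add up to memorization in aggregate is, of course, exactly what the overfitting cases are about.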
Based on the comment you replied to, it seems they are indeed producing verbatim copies.
Well i'll leave it to the legal system to decide if that's true.
But in any case, that's no different from a human with a photographic memory doing the same thing after reading a paragraph. We don't blame them for their superior memory, or being inspired by the knowledge. We don't claim they've violated copyright because their memory contains an exact copy of what they read.
We may still demand that they avoid reproducing the exact words they've read, even though they are capable of it, which is fine. We can demand the same of AIs. All I object to is the idea that a smart AI with a great memory is guilty of something just by reading or viewing content that was willingly shared online.
If I tell you water is wet and the sky is blue, will you be waiting for a court case to grind through the appeals process on that as well? The examples in the filing were unambiguous. You can go look it up and see them, they were also cited in all the news articles I saw about it. The AI regurgitated many paragraphs of text with very, very few small modifications.
The issue at hand is not if some words were copied; it's a legal issue of whether that constitutes legal or otherwise fair use or not. And I'm not a lawyer, and am happy to wait for the court to decide.
But to be honest, I don't really care one way or the other, since it doesn't get to the heart of the matter as I see it. To my mind, it's no different than a human with a good memory doing the same thing.
Specifically, it isn't the consumption of the media that is the problem, or even remembering it very well. Rather, it is the public verbatim reproduction of it, which a human might inappropriately do as well. AIs need to be trained to avoid unfair use of source material, but I don't think that means they should be prohibited from consuming public material.
I think the term "AI" is one of the most loaded and misleading to come up in recent discourse. We don't say that relational databases "pack and ship" data, or web clients "hold a conversation" with each other. But for some reason we can say that LLMs and generative models "get inspired" by the data they ingest. It's all just software.
In my own opinion, I don't think the models can copy verbatim except in cases of overfitting, but people like the author of the post have a right to feel that something is very wrong with the current system. It's the same principle as compressing a JPEG of the Mona Lisa to 20% and calling that an original work. I believe the courts don't care that it's just a new set of numbers; they want to know where those numbers originated. It is a color of bits[1] situation.
When software is anthropomorphized, it seems like a lot of criticisms against it are pushed aside. Maybe it is because if you listen to the complaints and stop iterating on something like AI, it's like abandoning your child before their potential is fully realized. You see glimpses of something like yourself within its output and become (parentally?) invested in the software in a way beyond just saying "it's software." I feel as if people are getting attached to this kind of software unlike they would to a database, for example.
A thought experiment I have is whenever the term "AI" appears, mentally replace it with the term "advanced technology." The seeming intent behind many headlines changes with this replacement. "Advanced developments in technology will displace jobs." The AI itself isn't the one coming for people.
[1] https://ansuz.sooke.bc.ca/entry/23
You make a good point.
My own perspective is that humans do not have an exclusive right to intelligence, or ultimately to personhood. I am not anthropomorphizing when I defend the rights of AI. Instead, I am doing so in the abstract sense, without claiming that the current technology should be rightly classified as AI or not. But since the arguments are being framed against the rights of AI to consume media, I think the defense needs to be framed in the same way.
If you pull the right levers, you can also copy the NYT article fully.
Yeah, and that's copyright infringement. That's why, if you're reverse engineering something, it needs to be done in a clean-room environment: your prior exposure to the copyrighted material poisons the well for any derivative you create.
This extends to music as well, if someone hears a song and is inspired by that in their work, the original artist gets credit.
That's only true in a very narrow set of circumstances. Imagine the case where someone listens to 10,000 songs, and then takes the sum total of that experience and writes their own. There's no credit given to the inspiration that each of those 10,000 songs gave. And that is in fact much closer to what AI is currently doing.
If we're discussing the current situation, where ChatGPT is outputting entire articles of NYT copyrighted content, then it certainly matches, say, Bitter Sweet Symphony containing a small sample of a Rolling Stones song, resulting in the Stones getting credit for the entire work.
There is a rather vibrant culture of _human_ remix artists who sample existing music and generate something completely new. It is still an open question of how long a sample they can use before it requires licensing:
https://www.ajschwartzlaw.com/can-i-use-that-part-2-remix-cu...
But all of this doesn't have much to do with whether an AI should be allowed to consume public media, even if we agree they need to do a better job of avoiding verbatim reconstruction when prompted.
Edit: I wasn't able to find the Lawrence Lessig talk I wanted to link, but here's a quite old one that makes some of the same points:
https://youtu.be/X8ULxxgjBuI
I’m not a lawyer but I’m pretty certain that’s not actually how things work.
This is literally impossible in the general case. There isn't a way to compress everything an AI consumes down to a finite number of weights; that would be a perfect compression algorithm, which is mathematically impossible.
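Back-of-envelope arithmetic makes the capacity gap concrete; the figures below are assumed round numbers, not any specific model's actual statistics:

    # Assumed round numbers, for illustration only.
    params = 100e9          # hypothetical parameter count
    bytes_per_param = 2     # fp16 storage
    corpus_bytes = 40e12    # hypothetical ~40 TB training corpus
    capacity = params * bytes_per_param          # ~200 GB of weights
    print(f"weights hold at most ~{capacity / 1e9:.0f} GB")
    print(f"corpus is ~{corpus_bytes / capacity:.0f}x larger")

On those assumptions, the corpus is a couple of hundred times larger than the weights, so wholesale storage is out; any memorization has to be selective.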
I would be careful in asserting that it's proven until the evidence is heard in court and we have all the data.
You can look at the filing yourself. It is unambiguous.
It's unambiguous about repeating NY Times articles with hallucinations occasionally interwoven. That isn't nearly as strong an argument.
...and if you ask a human artist to exactly reproduce an artwork, that is copyright infringement, or forgery. I don't see how this is any different. The person using the AI to make exact replicas would be the one committing the copyright infringement, not the AI.
There's such thing as consent, I hope you've heard of it.
No artist whose work was used to train the AIs consented to such use.
Particularly if they released their work online before generative AI was a possibility.
A generative AI model is not "everyone". It's a model, a combination of the data that goes into it.
It's a thing. A product. A derivative work, to be specific, made by the person who trained it.
Such a romantic notion!
But by the same metric, a photocopier is an auteur that gets inspired by the work it happens to stumble into and produces its own original art.
No.
The AI doesn't "view" the work; it has no agency. The human who trains the model does.
And that human is the one that is ripping the artist off.
The AI, as many people said, is just a tool. It doesn't suddenly turn into a person for copyright purposes.
It still remains a tool for people who train the models. A tool to rip off others' intellectual property, in the case we're discussing.
When you release something into the world, you have to accept that things beyond your control will engage with it. When you walk outside, down the street, you consent to being in public, and being recorded by video cameras, whether you like it or not. Even if a new 3D camera you don't know about exists and is recording you in ways you don't understand.
And in the same way, when you release something on the internet, it is there essentially forever. If you imagine that you get a "do-over" when the world changes in ways you didn't anticipate -- you're wrong. Nobody should feel entitled to such a reset.
Nobody is being ripped off.
People said the same thing about slaves. They aren't real people, worthy of rights. They were compared to animals, the same way you compared AI to a photocopier. Someday, this human self-obsession and self-importance against silicon based lives, will be seen in the same light, and artificial intelligences will be granted personhood. We should start protecting their rights in law today. Please don't be a bigot.
Please don't project.
LOL.
You can't have it both ways:
-AI is just a tool, work produced with AI is original work of the user
-AI is an intelligence that gets "inspired" by data it's trained on
Anyway. I don't want to regulate the AI.
I want to regulate the humans that make the choice of which data goes into training an AI.
Yes, and those laws will be very useful as soon as any actual artificial intelligence crops up. Blindly applying them to DALL-E or ChatGPT because they are marketed as AI would be... short-sighted.
You had me till that^ line. In your example, if the "inspired" human starts competing with you, then there is harm. If the inspired human is replaced by an AI, then it also harms. By harm I am referring to competition.
So instead of saying "no harm done," maybe it's more accurate to say "the same harm as other humans being inspired by your work."
I don't disagree with you. But then that is a completely different issue, and has nothing specifically to do with AI, but rather the tradeoff between the benefits of releasing your work to the public, and the potential competition it might inspire.
This was a reference to Instagram et al., where distributing your illustrations there implicitly allows their ad machine to profit from your work.
You decide to put your stuff on Instagram; you don't decide to put your work into Midjourney.
But sometimes your work is put on Instagram without your knowledge or consent (eg by an Instagram aggregator account)
And this whole ecosystem of credit (or not sharing credit) is undoubtedly encouraged by Instagram (because it's valuable to me to have an Instagram account with many followers)
That's copyright infringement. You can't claim to own other people's work and then license it to others when you don't own it.
Of course it is! It being illegal doesn't mean Instagram is incentivized to crack down on it
And if you're a tiny independent artist you also aren't well-resourced to do anything about it.
(Afaict the best thing we can do is maintain a culture that demands attribution)
I represent tiny independent artists and the best thing you can do is hope that some moneyed media company infringes your work. They never have an excuse and they always pay up.
But that's not what happens on Instagram
This isn't like some TV ad using a song they forgot to license
Sure, but it is quite difficult to prove your work was used in the models, and pursuing it is legally expensive and very time consuming. So in practice, does it matter? I still believe in the presumption of innocence, but that isn't to say it can't be exploited, or that the system itself isn't costly (monetarily and temporally). I think you're oversimplifying the issue and dismissing it before considering any nuance.
What am I dismissing? I think these AI companies are stealing these works. Nerds here are equating the creation of art with what LLMs output, simply out of their own inexperience making art. It's terrible and I hate it.
I believe I misinterpreted; it sounded like you were suggesting it was copyright infringement and thus the artist was empowered to sue the companies, as if the process were rather straightforward and the fault therefore lay with the artist for not pursuing it. I'll admit I'm primed by other comments which do express this sentiment explicitly. Sorry if I misunderstood.
Is that boundary something that the internet is likely to be able to recognize or enforce though? In 100 years? In 500?
What are we building for here?
If the boundary isn't enforced you'll likely see art move increasingly to a patronage system, whereby a select elite will get to choose what art is suitable for your consumption. Maybe you'll like that, maybe you won't.
I think it depends in large part on how the moving parts interact.
If the emergence of automata and internet creatures also means that people can spend more of their time doing what makes them happy, and if people doing that results in amazing art and music emerging, then I don't think the purview of selection will be so dangerously cornered by an elite.
But it's up to us to do the work to build that future.
It is?
But this is also dismissive without understanding either system at play. Traditional art has galleries because no one wants to buy art they haven't seen. But this creates a problem if our galleries are on the internet: it no longer matters how hard you try to stop a user from downloading or copying the art; they still can. We even see efforts to watermark art on public posting, but we also see efforts to remove the watermarks. It's very clear to me that this is a more complex time to be an artist than, say, 5 years ago. Maybe better than 100 years ago, but that would be a ridiculous comparison, as we'd have to get into an argument about the point of societies and whether a goal is to make life better.
Strangely, this is something some in the NFT crowd were attempting to solve. We can all agree that there were lots of scams in the area (I'd even say it was dominated by scams and grifting) and that even good-faith attempts were poorly executed, but I'm just trying to say that this was the motivation for some. They weren't even trying to solve as difficult a problem; they were just trying to solve proof of ownership. It's not as if you can simply use timestamps or public/private key pairs for proof (see the sketch below). And it still wouldn't solve the problem of redistribution or training; it wouldn't prevent someone from buying the art and training on it even if the artist specified that this was against the TOS (a whole other philosophical debate about TOS and physical ownership that I don't want to get into).
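As a sketch of why hashes and timestamps alone fall short (illustrative code only, no real service implied): a digest proves someone possessed the exact bytes at some moment, not that they authored them.

    # Hashing proves possession of bytes, not authorship of the work.
    import hashlib, time

    def fingerprint(artwork_bytes):
        digest = hashlib.sha256(artwork_bytes).hexdigest()
        return {"sha256": digest, "seen_at": time.time()}

    # Anyone who downloads the image can produce the identical digest,
    # so this alone can't establish who created it.
    print(fingerprint(b"...image bytes..."))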
I think you're simplifying the problem and being ready to dismiss without adequately demonstrating an understanding of the issues and environment. I don't know what the answer is but I sure as hell know it is a lot more complex than "let the 'free market' run its course".
We have a patronage system today. See: services like patreon
You mean, will the right of an individual human to property, and the rule of law, be recognized in 100 or 500 years?
Hopefully yes, unless you foresee some inescapable transfer to dictatorship, which you'll have to argue for to maintain your argument.
I meant something softer and simpler, like power structures which somehow convince learning machines to avert their gaze when media from a forbidden domain appears.
That just doesn't seem like the direction of the internet to me.
As for the rule of law, it seems that what we're talking about here is what authority can be capable of supplying the law. I highly doubt that a confusing and contradictory set of mandates from nation states will continue to be "the law" on the internet.
This seems like a strange criticism to me - if you're posting your illustrations on social media, it's presumably because you feel that you're getting value out of doing so. Who cares if they're also getting value out of you doing it, particularly when that value comes at no cost to you?
If you sell your art, then art marketplaces and printers and shipping services all profit from your work, but I don't imagine she's complaining about that. What's the difference? In all of those cases, as with social media, companies are making money from your work in return for providing a useful service to you (and one you don't have to use if you don't think it's useful).
On the contrary, I think it is very natural. The environment changed, and thus the deal. There's no clear way to negotiate the terms of the deal. It may be easy to say to just drop off the platform, but we've seen how difficult it can be to destroy a social media platform. Sure, Myspace and Google+ failed, but they didn't have the network base that Facebook and Twitter do, which have also fallen into this trap (I'd even add Amazon here for the more general context; certainly none of these companies are optimizing for the public, because there's a difference between who their customers are and the public). Network effects are powerful.
So I see complaints like these as attempts to alter the deal and start a coalition. The deal was changed under them, so why do they not have a right to attempt to negotiate? It is quite clear that there is a very large power differential. It's quite clear that unless you're exceptionally famous (which means the alteration is less impactful) that abandoning the platform comes at risking your entire livelihood unless you can garner a large enough coalition. It's like quitting a shitty job without a new job and not knowing who would hire you. Most people are going to stick around and complain __while__ looking for a way out.
The author reminisces about a time that was favorable to them, and implies that back then distribution was somehow free and this was taken away. Which is interesting, because there were ads back then too. A lot. Banners. Toolbars. Google made a shitton of money back then too. And there are amazing non-FAANG spaces today, like the Fediverse, where the author can distribute their work, and the number of users there is easily comparable to the 'old internet' user counts.
But that points at the main question: what was this magical old way of distribution that was somehow pure and is now gone? Mailing lists? Still there. RSS? Still here; Google Reader died (peace be upon it), but others started to grow the very next day once its shadow was gone. IRC? Forums?
So maybe the author is really missing the audience? Sure, platforms and powerful interests shape userbases, but it's not like those old sites were run on pure air.
Of course now there's a new possible way to use copyrighted works, and while it's very similar in effect to what human artists can do (and of course sometimes do), there's a very clear difference in scale, economics, and control, and thus it's natural that politics is involved.
This sounds like another way to say that the environment changed and thus the deal did.
There's still a ton of ads. And it isn't like Meta is making less money. They're just below their all time high[0] and it's not like there was one of the largest global financial crises between these two periods or something. Meta is in the top 10 for market caps and just shy of getting into the trillion dollar club. I'm not sure what argument you're trying to make because I'm not seeing how Meta (or any FAANG) is struggling.
Be realistic. We both know that 1) many of the artists are trying to distribute there, 2) that this doesn't prevent their work from being used in ML models since you can still download the material, 3) the audience is substantially smaller and Mastodon is not even close to replacing Twitter.
Also remember that platforms with large userbases shape users. There's a reason why the classic Silicon Valley Strategy is to run in the red and attempt to establish a (near) monopoly. Because once you have it, you have a lot of power and can make up for all the cash you burned. Or you now have a product to sell: users.
[0] https://seekingalpha.com/symbol/META
Perhaps your artwork has an anti-capitalist message, and you do not want it to appear anywhere near an ad for the latest beauty cream.
In the early days of the Internet, there were places to promote a webcomic with no commercial interest, like Usenet and forums, and it was typical to visit an artist's website directly.
These days, the average new Internet user might not even be aware of the concept that an artist can have their own website and own the user experience of visiting that website from end to end. Web design in the early 2000s had a lot of creativity and easter eggs built into the experience of navigating the pages themselves.
An artist absolutely has the right not to want to upload their creative work (which takes days and weeks to produce) onto a bland social media site with its own terms and conditions regarding how that content is treated and monetized.
Places like Instagram are bleak compared to what was pushing creative boundaries of the web in the mid to late 2000s. Sure, there are still fun websites like this but they are difficult to find (what happened to StumbleUpon?)
This, and social media is almost always the worst possible representation of an artwork: crunched, cropped, and compressed into an inferior version of itself.
We’ve forgotten how to appreciate digital artworks.
The author, presumably.
Who would (has) argued that there is a cost.
I think they agree with you, and are making the claim that since they choose not to use the service (considering it exploitative), it is getting increasingly difficult to distribute their work in a way they find ethical.
I'm not sure I agree with their arguments completely, but they aren't really "strange" as you suggest.
Yes, and I'm sure somewhere in the Instagram terms of service, users have agreed to license their work to Instagram for those purposes.
Couldn't we say the same thing about search engines?
What value would google have without content to search for?
Is the conclusion that we should make search engines pay royalties? That seems infeasible at Google scale. Should Google just be straight-up illegal? That also seems like a bad outcome; I like search engines, I am glad they exist.
I guess I'm left with: I don't like this argument because of what it would imply for other projects if you follow the logic to its natural conclusion.
Search engines don't replicate the content, they index and point to it. When search engines have been caught replicating content they have been sued or had to work out licenses.
How do they make the index without ingesting a copy of the content?
Storing a copy of the content isn't the relevant part. It's what that copy is used for.
In a search engine, it's used to direct traffic to the original author. In Midjourney, it's used to create an alternative to the original author.
I certainly use the blurb in search results at times to get an alternative to the original author (especially if the blurbed content is paywalled).
In the US, that blurb is covered by fair use. Google doesn't regurgitate the entire published work.
These many AI lawsuits will likely settle whether training LLMs falls under fair use.
I mean, it seems like cases of AI regurgitating entire works verbatim are pretty rare. It happens, but it is unusual.
And it's also not really what this is about. If AI were just another Library Genesis, it would be a lot less controversial.
Wrong. Here's Google's cache of an entire paywalled article currently on the front page of HN: https://webcache.googleusercontent.com/search?q=cache:c-xOMj...
You can get these by clicking the three dots next to any search result, expanding the options in the pop-over, and then clicking "cached."
I mean, generative AI basically involves turning content into vectors in order to try and find related content. If that's not an index structure, I don't know what is.
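For what it's worth, here's a toy sketch of that "content as vectors" idea: embed items, then rank neighbors by cosine similarity. The embed() below is a made-up stand-in; real systems use a learned model to produce the vectors.

    # Toy vector index: embed items, rank by cosine similarity.
    import numpy as np

    def embed(text: str) -> np.ndarray:
        # Hypothetical stand-in for a learned embedding model:
        # hash characters into a fixed-size unit vector.
        vec = np.zeros(64)
        for i, ch in enumerate(text):
            vec[(i + ord(ch)) % 64] += 1.0
        return vec / np.linalg.norm(vec)

    corpus = ["a cat on a mat", "dogs in the park", "a kitten on a rug"]
    index = np.stack([embed(t) for t in corpus])

    query = embed("cat sitting on a rug")
    scores = index @ query  # cosine similarity, since vectors are unit length
    print(corpus[int(np.argmax(scores))])  # the most related item

Whether that makes a generative model an "index" in any legal sense is, of course, exactly what's being argued about.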
The technical definition of how something works is often irrelevant in the eyes of the law. What matters is the output, and its effects.
The output of a search engine is nothing like the output of generative AI.
Most ML models are certainly compression systems, as you imply. But reproduction and redistribution of copyrighted material is quite different from providing a service that allows someone to find said material. The question about caching is a different one, but I doubt it falls under the same intent, as Google is incentivized to cache content to better facilitate the search __and redirection__ service, not to facilitate distribution.
Ya that worked out real well for Genius.
Search engine caches work on an opt-out basis, which was found legal in Field v. Google Inc.
The deal with search engines was always that you would get traffic out of them.
Use my content, to get people to me. Google's snippets kinda broke that deal and people have indeed complained about that, but otoh you can still technically opt out of being indexed.
It's not clear how you opt out of LLMs.
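For context, the search-side opt-out mentioned above is the robots.txt convention; a minimal sketch (Googlebot, GPTBot, and CCBot are real published user agents, but which AI crawlers honor the convention, and how, varies):

    # robots.txt at the site root -- a voluntary convention crawlers check
    User-agent: Googlebot
    Disallow: /private/

    # Some AI crawlers publish user agents too (OpenAI's GPTBot,
    # Common Crawl's CCBot), but compliance is voluntary, and opting
    # out does nothing about content that was already scraped.
    User-agent: GPTBot
    Disallow: /

Which is the parent's point: there's no enforced mechanism for LLMs, just a courtesy, and no way to claw back what's already in a training set.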
Google does transfer value back to the websites, by sending them traffic.
However, Google does get a lot of criticism when they do slurp up content and serve it back without sending traffic back to the websites! Yelp and others have testified to Congress complaining about this!
You can remove your work from a search index anytime you want. Not so with work included in a training set.
But they are free to use the fruits of the model, same as anyone else. I suppose the difference is they don't care; they already have the talent to transform their labor into visual art, so what use do they have for a visual-art-generation machine?
I find strong parallels in the building of web crawlers and search indexers, except... perhaps the indexers provided more universal, symmetrical value. It's hard to make the case that someone whose work is crawled and added to a search index doesn't derive value from that strong, healthy index being searchable (even librarians and news reporters thrive on having data indexed and reachable; the index is a force-multiplier. It's not "stealing their labor" to crawl their sub-indexing work and agglomerate it into a bigger index, nor does it cheapen the value of their information-sorting-and-sifting skills when the machine sorts and sifts).
So perhaps there is a dimension of symmetry here where the give-and-take aspect of what is created breaks. Much like the rich don't have to care whether it's legal to sleep under a bridge, artists don't have to care whether a machine can do 60% of the work of getting to a reasonable visual representation of an idea. No, more than that: it's harmful to them if it's legal to sleep under the bridge.
They'd be landlords in this analogy, crying to the city that because people can sleep under bridges the value of the houses they maintain has dropped.
Or they lost a means of income? Like, is this a difficult concept? Livelihoods will most likely be lost, and probably never really come back. Sure, we can say industries change; however, we had protections in place to prevent artists from losing money to people copying… A company has said "screw the rules, here's a supercharged printer."
Imagine someone came up with a way to print houses for nearly-free. Anyone needs shelter? Bam, instant house.
Landlords would lose income left and right.
The relevant question is: would society stop the building of these free houses to protect the interest of the landlords?
(Now, take that analogy and throw it in the trash. ;) Housing is essential; art is fundamentally a luxury good. If we want to, we can have a society that protects profit-making on luxury goods and profit-making on necessities differently. "Someone is losing income, therefore this change is bad" is too simple a rule to build a healthy society upon, but a reasonable person can conclude "Artists are losing their livelihoods, therefore this change is bad").
Money is just paper. And with cashless systems, it's now only bits on some computers. But no one sane wants to max out everyone's accounts or print cash to hand out. Art takes hours, days, even years of human labor to produce. Copyright laws set up an artificial scarcity, because if a work can be replicated at no cost, there's no way for the author to recoup the cost of production.
In the analogy you provided, you'd still have to deal with land scarcity and environmental impact.
Well, depending on whether one considers the output of one of the AI image diffusion algorithms "art"... Not anymore, right? That's rather the point of the debate? That we've gone over a few decades from a world of "This wall is blank. Maybe I'll go to the art show and find a piece someone made, or commission a piece based on an idea in my head" to "This wall is blank. Maybe I'll spend two hours futzing with DALL-E and send the result off to a local print-shop, then pop it over to Michael's and frame it?"
No one wants to live under the bridge even if it was perfectly legal.
OpenAI is very much a for-profit company with the same incentives to make money as every other US for-profit company. I understand that there's another company that is a non-profit and that company bosses the for-profit company. In my opinion, that's more of a footnote in their governance story, it doesn't make OpenAI a non-profit. Their website says...
"A new for-profit subsidiary would be formed, capable of issuing equity to raise capital and hire world class talent, but still at the direction of the Nonprofit. Employees working on for-profit initiatives were transitioned over to the new subsidiary."
https://openai.com/our-structure
They might be a for-profit company, but I don't believe they have made a profit. I wasn't even thinking of the murky non-profit status.
This is an important distinction. The difference between making a boatload of money on the cheap and making a boatload of money after spending an even bigger amount should matter to this cartoonist.
Wrong way to evaluate it, and you're missing the complaint from the artists. It doesn't matter what the company makes or loses; it matters whether people end up with money in their pockets. I don't know whether ML output counts as derivative or not, but I don't think that's important here. Regardless, it is a fact that an artist does work, that work isn't paid for, and then that work is used by someone else in some way to sell a product. The complaint is specifically that they have no control over deciding whether their work can be used that way.
At the end of the day, everyone working at OpenAI ends up with lots of money in their pockets. Regardless of the company's profits or revenue, this money is connected to the work of these artists, who make little or nothing from it. That's the underlying issue, and we must read between the lines a bit to understand the difference between a specific point and the general thesis.
Also, remember Stability does have revenue as well. OpenAI is being pointed at because they are top dog, but it's not like there aren't others doing the same thing. So even a very specific justification may not refute the original complaint.
They have over a billion dollars in revenue per year. They aren't making a profit because they're reinvesting all of that money into growing the company.
Particularly after the whole Sam Altman debacle, regardless of whether one thinks the board was being logical, and regardless of whether anyone thinks Sam Altman should have been fired, it's still very clear that the non-profit side of the company is not in control of the for-profit side.
We've seen zero evidence that the non-profit side of OpenAI meaningfully constrains the for-profit side in any way, and have seen direct evidence that when the non-profit and for-profit groups disagree with each other, the for-profit side wins.
It would be good to see how much it changes if you only include work that is public domain.
Any prompt response longer than 30 characters inevitably devolves into verbatim passages from The Iliad.
I could get on board with this.
If that includes a drastic revamp of copyright laws to increase the public domain, why not.
I don't see why Disney or Universal would be more legitimate than OpenAI when it comes to profiting from stuff made 60 years ago by now-dead authors. Both seem equally legitimate.
I thought this was more of a reference to Facebook, etc. I could be wrong, though, it's pretty vague.
Yes, OP knows that the quote is referring to platforms/media channels like youtube/facebook/google, etc. But it's also referring to profit-making companies on the internet, like OpenAI.
I think it's a reference to the media landscape, Facebook and OpenAI inclusive.
If you remove all the copyrighted, permission-less content from a human's training, what value does the human have, in connection with work?
When is AI good enough that the contents it contains can be comparable to human brain content, copyright wise?
And conversely, now that we can read signals from neurons in a human brain, and create images from dreams and audio from thoughts, would not that also break the copyright of the content?
There is absolutely zero comparison between living in the world and experiencing it, and building a model, loading in copyrighted, carefully curated material and then charging for the outputs of said model without paying royalties. It's hard to even believe people can't understand the difference.
The fact is, the majority of people do not want to steal others' work for profit, and for those bottom feeders that do, there are laws to discourage such behavior and to protect the original producer.
If these models were trained on creative commons licensed material only, then you'd have a leg to stand on.
I even had to pay for my tuition, and textbook material. Even if some portion of my knowledge comes from osmosis, I have still contributed at some stage to access training material.
When I was 16, I wanted to learn to code, do you know what I did? I went and purchased coding books because even at 16, I understood that it was the right thing to do. To pay the author for the privilege of accessing their work.
How basic can one get?
Would you like it if I broke into your house and used your things without asking? Because that's about what's happening here for professionals.
I wonder if AI shrinks the economy. Not that such a metric is the most important ruler by which to measure goodness, but it would be ironic to have a massive tech company that produces less value than it removes from the world.
In a way it will remove value, but in a way it will add value back in the form of an avalanche of derivative junk, lacking authenticity and context. If you find value in memes, for example, there will be a lot of that type of content in the future.
It would be nice if using a style meant paying a fee to whoever owns that style. If you don't want to pay a fee to the owners of an art style, then you can always use public domain works. Hell, this might even be better for public domain works and restoration, if a small fee went to them as well.
I hope I never have to live in a world where such a thing exists.
The only way I see this working with our current economics and IP law is if the people training models license the work they are using. And a traditional license wouldn't do; it would have to be one specific to training AI models.
As to the question of worth, obviously OpenAI's models have value without the training data. Just having a collection of images does not make a trained AI. But the total value of the system is a combination of that model and the training data.
This goes for your knowledge as well, as AI fundamentally doesn't learn any differently than humans do.
If you remove all knowledge gained from learning from or copying others works, what value do you provide?
Nothing on this planet can learn without copying something else. So if we open the can of worms for AI, we should do the same for humans and require paying royalties to those who taught you.
I keep thinking: this is what e.g. Google has done all along. The content it uses to train models and present answers to us absolutely belongs to others, but try to get any content (e.g. maps data) out of them for free at scale.
But the business model emerged and delivered enough value to us that we didn't consider asking for money for our content. We like being searched and linked to. Less so Google snippets presented to users without the users landing on our site. Even less so content generated without any interaction at all. But it's still all our content.
That's not enough to say that. All companies benefit from what has been made before; nothing exists in a vacuum. AI adds to the current landscape.
It basically just looked at them. It’s absolutely preposterous that you can own a painting, thereby claiming nobody is allowed to draw that anymore, and now people can’t even look at your shitty drawing without paying? Then don’t put it online in the first place..
Seriously the audacity of these so called artists.. just because I sang a song one day does not mean I am entitled to own it and force people to pay me to be allowed to sing it. That’s absolutely insane.
that's arguably true for humans learning from the content as well
Agreed with the overall sentiment, but let's be clear. OpenAI is currently a (capped) for-profit company. They are partnered with Microsoft, a for-profit company. They commercially license their products. They provide services to for-profit companies. The existential crisis of the last six months of the company seems to have been over moving in the direction of being more profit-oriented, and one side clearly won that battle. OpenAI may soon be getting contracts with the defense department, after silently revoking their promise not to. Describing them as a non-profit company is de facto untrue, regardless of their nominal governance structure, and describing them (or them describing themselves) as a non-profit feels like falling for a sleight of hand trick at this point.
I think a model that would make more sense is to punish bad behavior in the form of infringement, so if someone monetizes an AI output that infringes on someone's copyright/trademark, then go after that person. Otherwise we are going to be completely stuck for the sake of some kind of outdated mentality around intellectual property.
I really haven't liked the crypto bros == AI bros memes, but in this way I do see the similarity.
As in: we will change the world, all that is required is that we throw away all previous protections! The ends justify the means!
I do see a much more beneficial trajectory for LLMs vs cryptocurrencies, but yeah, this is gross and unfair.
Note: as the days go on, I continue to realize the pitfalls of Utilitarianism. I do miss the simplicity, but nope.
If you take away all music created by black people what value do The Rolling Stones have?
I don't know yet exactly how this compares, I’m trying to think it all through.
AI has different levels - output can be loosely inspired by, style cloning, or near exact reproductions of specific work.
In my opinion, the play is this: steal everything, build the models, then:
* People won't notice, or the majority will forget (doesn't seem to be happening).
* Raise enough money that you can smash anyone who complains in court.
* Make a model good enough to generate synthetic data, then claim new models aren't trained on anyone's data.
* All of the above.
Anyway, I 100% agree with you. The value is in the content that everyone has produced for free; they're repackaging and reselling it in a different format.