Things are about to get worse for generative AI

Intox
244 replies
1d6h

Or... things are about to get worse for copyright holders.

I don't see any developed country pressing the brakes on AGI in the near future to protect a few copyright holders from getting "stolen" in hypothetical scenarios.

dkjaudyeqooe
74 replies
1d5h

a few copyright holders

By which you mean every copyright holder.

AGI in the near future

Something that is purely speculative, undefined, and has been promised in the near future for 50+ years.

I don't see copyright holders lying down for someone else's benefit and I don't see governments gutting copyright, contract law, and several other avenues of protection that copyright holders can deploy in the name of something that doesn't exist and may not ever exist.

squidbeak
59 replies
1d5h

If a child is instructed to read a copyrighted work at school, which later becomes a factor in his own derivative works, he won't be in breach of copyright.

Why should other intelligent entities be prevented from reading copyrighted works and gaining whatever there is to gain from those works the way any human might?

loloquwowndueo
21 replies
1d5h

If the child / author then regurgitates entire paragraphs or sections verbatim in his own works and someone notices, you bet there will be a plagiarism lawsuit coming his way.

andy99
5 replies
1d4h

Sure. But if the child has that capability, it doesn't automatically make them a walking copyright violation. "Intelligence", even the current version of AI, entails knowing about stuff, including being able to recite. That doesn't mean intelligence's existence violates copyright. If a person used AI to make a copyright violating work, that's a different story, just like if they used their own innate intelligence to do so.

ncruces
4 replies
1d3h

Taken to its conclusion, liability is then on everyone who decides to publish anything that ChatGPT “tells” them, because it might cross the threshold on plagiarism.

Are the OpenAIs of the world ready to shield their customers from that liability?

If it turns out that using ChatGPT to help you write your resumé opens you up to accusations of plagiarism, or DALL·E to create an image for your website opens you to copyright violation, will you use them?

madamelic
2 replies
1d3h

Taken to its conclusion, liability is then on everyone who decides to publish anything that ChatGPT “tells” them

Yes. Just like reading anything else on the internet. An LLM is no different from typing "popular cola logo" into Google search and claiming you invented it. If I type "cola logo" into DALL-E and get a replica of Coca-Cola... that doesn't mean I created that logo and can exploit it for commercial purposes.

Are the OpenAIs of the world ready to shield their customers from that liability?

Why would they? We aren't suing pen manufacturers because someone wrote something libelous using their pen. We aren't busting down the doors of Crayola because little Johnny used the crayons to draw Mario.

regularfry
1 replies
1d1h

OpenAI might not want to shield all their customers from liability, but that is exactly what GitHub have done with Copilot. It's not a hypothetical, it's being done today.

ncruces
0 replies
1d1h

Otherwise it wouldn't get used.

I mean, get this great autocomplete; if you use it, your code might be AGPLed for all you know, and you're in violation because you didn't even add a notice.

Would you pay for that?

pixl97
0 replies
1d3h

In a heartbeat. It's time for the old paradigms to die and new ones to be formed.

If ASI can exist, I don't believe the old methods of intellectual fortification will continue to work in the future. Much like castle walls aren't used to protect against guided missiles.

2716057
5 replies
1d4h

Especially true if that child or its mother has a huge market capitalization, large profit margins, highly-paid employees and shareholders eager to reap some more $$.

If the public starts to see LLMs as highly sophisticated copyright laundromats, it would most likely hamper further investment & development in that field.

FridgeSeal
4 replies
1d4h

Especially true if that child or its mother has a huge market capitalization, large profit margins, highly-paid employees and shareholders eager to reap some more $$.

This is the bit I don’t get from the “feed everything to machine” LLM-maximalists. Do they think courts don’t take context into account? Do they think all actions happen in a vacuum and that they can just skip along and ignore laws at their pleasure because “tee hee it’s totally definitely fair use bro, I’m totally an academic researcher, pinky promise”?

LLM bros ought to stop and have a think before they poison their own well, assuming they haven’t already done so.

Arainach
3 replies
1d3h

This is the bit I don’t get from the “feed everything to machine” LLM-maximalists. Do they think courts don’t take context into account? Do they think all actions happen in a vacuum and that they can just skip along and ignore laws at their pleasure?

An entire generation of unicorn startups believed that (Uber, AirBnB, etc.). We see in the news every day that once you have enough money laws don't apply to you (most things Elon Musk does, the fact that Trump can defy court orders repeatedly and not go to jail, etc.) so yes, this seems entirely plausible.

FridgeSeal
2 replies
1d3h

Uber and AirBnB

The 2 darling startups that are now facing increasingly less rosy futures?

Airbnb in particular is facing enough backlash that I’d be surprised if it lasts terribly much longer.

Sure, they get away with it for a while, but not forever.

We see in the news every day that once you have enough money laws don't apply to you

I agree with you here, but I think this is a much broader conversation about capitalism in general, which would be getting a bit off-topic for this particular thread, except to say: capitalist forces aren’t above cauterising a limb if it becomes too annoying or intrudes on the other limbs too much. I think the “AI” limb might be overstating its own importance, and I suspect that if it got too up in everyone’s interests re: profit, it would, as an industry, very quickly find itself being neutered. Capital interests would love to get rid of pesky human labour, but if the alternative is too annoying, they’ll have no objections to going back to grinding people through the system again.

stale2002
0 replies
1d2h

The 2 darling startups that are now facing increasingly less rosy futures?

As of this moment, Uber is worth $120 billion and AirBnB is worth $80 billion.

Yes, they got away with it.

nradov
0 replies
1d2h

AirBnB will get away with it forever. While short term rentals might get banned in a handful of cities, the service now operates worldwide. The stock might be overvalued but if you examine their financials it's simply not plausible to think that failure is imminent.

rileymat2
4 replies
1d4h

Wouldn’t this be well handled by suing the person that prompted and distributed the results?

pier25
1 replies
1d4h

Isn't OpenAI distributing content in its apps?

madamelic
0 replies
1d3h

This is an extremely dangerous precedent that I think you are purposefully trying to put forward.

It's a horrendously bad idea, especially for startups, to make apps liable for how their users use their platform. Setting this precedent only benefits entrenched tech companies.

heavyset_go
1 replies
1d4h

No, whoever operates the LLM service is liable for unauthorized modification, reproduction and distribution of copyrighted work to users.

shandor
0 replies
1d

Ok, so is it ok if I run the whole thing on my own hardware, and never distribute?

If not, how does that differ from me making an unauthorized pencil drawing of Mario?

jncfhnb
2 replies
1d4h

Agreed. Fortunately, AI does not have “own works”

RyEgswuCsn
1 replies
1d3h

You will have to define "own works". How do you measure the degree to which a work is someone's "own"?

jncfhnb
0 replies
1d3h

AI can’t be a copyright owner. Ergo the violation is on the person using the tool.

yterdy
0 replies
1d4h

In that case, the person legally liable for publishing the material is sued for infringement of the work. You don't send someone to jail because they're simply capable of infringing; they have to actually do it, and you have to actually show the specific work whose copyright was infringed upon.

You can also get into the weeds of what's copyrightable (ask Donald Faison about his Poison dance). If you ask for C-3PO and you get C-3PO as he appears in Star Wars promotional material, that seems cut and dried. What if you ask for a "golden robot"? What if you get a robot that looks like C-3PO but with a triangular torso symbol instead of his circular one? What's parody, what's fair use?

dingnuts
17 replies
1d5h

That's irrelevant, since an LLM is not an intelligent entity. Whatever you're arguing about is fiction.

block_dagger
12 replies
1d5h

ChatGPT is not an intelligent entity? What’s been comprehending and rewriting all my crappy code for several months? An auto-complete? There’s obviously emergent behavior there that is actually defined by the maker and most users as “intelligence.”

Edit: typo

SoftTalker
5 replies
1d5h

I would paraphrase one of Clarke's laws and say that "Any sufficiently advanced text generator is indistinguishable from an intelligent entity."

Just because a computer program's output is remarkably good does not mean there is any emergent intelligence, any more than a technology we don't understand means there is magic.

ben_w
4 replies
1d4h

The reverse can also be true, with John Keats agreeing with Charles Lamb that Newton "had destroyed all the poetry of the rainbow, by reducing it to the prismatic colours": https://en.wikipedia.org/wiki/Lamia_(poem)

If we should ever fully understand how our own minds work, will we hold machines in higher esteem, or ourselves in lower?

edgyquant
3 replies
1d2h

Any biology or physics that suggests humans are just pattern recognizers will be discarded, because our being conscious is the only thing every human knows to be 100% true.

ben_w
2 replies
1d2h

So, all of biology and physics then. If souls exist, they have no mass, and have a weird way of being repeatably disrupted in consistent ways by damage to certain parts of the brain or specific chemicals.

Just because consciousness is a mystery today doesn't mean we get to stop and say it will be so forevermore.

Heck, the problem still fundamentally exists regardless of if you're atheist, monotheist, polytheist, or pantheist.

--

“We’re not listening to you! You’re not even really alive!” said a priest.

Dorfl nodded. “This Is Fundamentally True,” he said.

“See? He admits it!”

“I Suggest You Take Me And Smash Me And Grind The Bits Into Fragments And Pound The Fragments Into Powder And Mill Them Again To The Finest Dust There Can Be, And I Believe You Will Not Find A Single Atom Of Life–”

“True! Let’s do it!”

“However, In Order To Test This Fully, One Of You Must Volunteer To Undergo The Same Process.”

There was silence.

“That’s not fair,” said a priest, after a while. “All anyone has to do is bake up your dust again and you’ll be alive…”

- Feet of Clay, Terry Pratchett

edgyquant
1 replies
1d1h

You missed the entire point. Physics and biology exist to help humans understand the material universe. Anything supposing that humans aren’t actually intelligent or conscious or whatever, or lack agency, is wrong since all of physics and biology are an offshoot of that agency meant to enrich it.

ben_w
0 replies
1d1h

I'm not missing the point, I'm saying you're wrong. There's a difference.

Also:

Anything supposing that humans aren’t actually intelligent or conscious or whatever

Doesn't really match what I was writing about: if it turns out that a thing which is "just a pattern recognizer" can in fact be "intelligent or conscious or whatever", it's up to us if we see intelligence or consciousness or whatever in the pattern recognisers that we build, or if we ourselves descend into solipsism and/or nihilism.

Or if we take the traditional path of sticking our fingers in our ears and go "la la la I'm not listening" by way of managing cognitive dissonance. This is a very popular response which should not be underestimated.

But the laws of physics are quite clear, that a whole bunch of linear equations (quantum field theory) gets us chemistry, which gets us biology, etc., and the only place in all this for the feeling of existence that we have is emergent properties. Those emergent properties may, or may not, be present in other systems, but we don't know because we're really bad at characterising how emergent properties… emerge.

helf
2 replies
1d4h

This line of logic is more frightening to me than actual AI. LLMs are really useful in a lot of scenarios but it takes 5 minutes playing with one to see that it isn't intelligent.

But since you are the type of person who is seemingly using LLM-"written" code in production, your ability to accurately assess anything is suspect at best.

"Any technology, sufficiently advanced, is indistinguishable from magic".

No, an LLM is not intelligent. I do not understand why people will go through mental gymnastics to conclude they are.

queue all the typical arguments supporting them being intelligent and demanding I give reasons for them not being

pixl97
0 replies
1d2h

This is kind of a weird take... if you said your dog isn't intelligent because it can't do calculus, most people would look at you funny. You don't have to see your pet as intelligent, but don't expect everyone else to blindly follow your thinking.

croemer
0 replies
1d2h

s/queue/cue/ ;)

edgyquant
2 replies
1d2h

It’s not a human, so the entire argument comparing it to one is moot. It’s a program on a machine and doesn’t have rights; this anti-human way of thinking is seriously fucking scary.

ben_w
0 replies
1d1h

The post you're responding to didn't call them human. Nor alive. Just "intelligent", and just as intelligence isn't required of life, I have no reason to think intelligence itself requires life.

These things are indeed "a program on a machine and doesn’t have rights", but what I find scary is that rights aren't part of the rules of the universe, they're merely laws, created and enforced (to the extent that they are at all) by humans.

CatWChainsaw
0 replies
17h31m

As Max Tegmark said in his recent interview with Lex Fridman, a lot of the technology being developed now, and how it's being talked about and how it's used, is anti-life.

mensetmanusman
3 replies
1d4h

LLMs are clearly a type of lower level intelligence. Intelligence does not require consciousness.

Sheeplator
1 replies
1d4h

The only thing that I would say is "clear" is that LLMs are big collections of statistical data on how we use language. That does not cross my threshold for "intelligence".

mistermann
0 replies
1d2h

That does not cross my threshold for "intelligence".

You have your own individual threshold for what "is" intelligence? Holy cow, imagine if each other agent had their own also, but spoke as if they had a common one...that sure wouldn't be a very intelligent way to run a simulation, imagine the unrealized confusion and delusion that could result if that became a cultural convention!

vehemenz
0 replies
1d3h

That’s quite a lot of metaphysical speculation for a conclusion that is anything but clear.

sensanaty
5 replies
1d5h

A child isn't a computer program, and no amount of anthropomorphizing will ever make them so.

Especially ChatGPT and other LLMs, they're not even close to being AGI or an "intelligent entity" as you put it, despite what all the AI-bro hype and marketing would like everyone else to believe.

ben_w
3 replies
1d4h

they're not even close to being AGI

Only because all three letters of the initialism mean different things to different people.

Existing LLMs won't do everything, but bluntly: good, we're not ready for a world where there is an AI that can do everything for $1-60/million words[0], and we need to get ready for that world before we find ourselves living in it.

ChatGPT-3.5 has a lot of weaknesses, but it can still do a better job of coding than a few of my coworkers demonstrated over the last 20 years. I'm listening to a German language learning podcast, and the hosts mentioned using it to help summarise a long email from one of their listeners. My sister has work anecdotes about it helping, and she's not in tech. Influencers, teachers, lawyers, Hollywood writers… well, "moral panic" doesn't tell you much… the game Doom was 30 years ago, and that had a moral panic that looks quaint given how much FPS games' graphics improved with each subsequent release, and I suspect ChatGPT-3.5 was to conversational AI what Doom was to 3D realtime gaming: the point at which people take note, followed by a decade of every new release being (wrongly) called "photorealistic".

[0] current pricing for gpt-3.5-turbo-1106 ($0.0010 / 1K tokens) and gpt-4-32k ($0.06 / 1K tokens): https://openai.com/pricing
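A quick sanity check on that range, treating a token as roughly one word of English prose (a loose simplification; it's closer to 0.75 words per token):

    # dollars per million tokens at the prices cited in [0]
    prices_per_1k_tokens = {"gpt-3.5-turbo-1106": 0.0010, "gpt-4-32k": 0.06}
    for model, usd_per_1k in prices_per_1k_tokens.items():
        print(model, usd_per_1k * 1000)  # -> 1.0 and 60.0, i.e. the $1-60 range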

sensanaty
2 replies
1d4h

ChatGPT-3.5 has a lot of weaknesses, but it can still do a better job of coding than a few of my coworkers demonstrated over the last 20 years.

Whenever people say stuff like this I can't help but wonder what on earth kind of projects they work on. Even GPT4, while useful for things like reformatting or generating boilerplate code and stuff like that, it's still a far cry from any decent dev I've ever worked with, especially if you're not using a popular language like JS or Python.

My usual PRs at work are pretty big, complex pieces of code that all have to actually work when integrated with the larger system around it, no AI tool I've tried so far has come even close to acceptable here, other than for generating some boilerplate code that I would've written myself anyway. But even with the innocent-looking boilerplate there's always a weird gotcha that isn't obvious until you really analyze the code closely. It ends up saving nothing more than a few keystrokes, if that, yet people say all the time that they're generating entire pieces of software by gluing together code it spits out, which I find absolutely insane given my anecdotal attempts at it.

This can be circumvented by going with more elaborate, in-depth prompts, but at that point are you really saving on effort compared to the alternative? Is it really more efficient? By the time I have a prompt complex enough for it to spit out something good at me, I could've already bashed out the code myself anyway.

That's not even mentioning all the legacy shit you have to keep in mind for any one line of code, plus whatever conventions and standards your team uses and has etc.

I mean it works great for a function or whatever, but is that seriously what most people are working on? Simple, one-off independent function calls that don't interact in any way with anything within a larger system? Even simple CRUD apps aren't so well isolated.

Don't even get me started on the actual difficult part which is the whole preamble to creating the ticket in JIRA or whatever task management software you use where you're talking with stakeholders and planning out the work ahead, you're telling me you're paying 'Open'AI to do that whole rigamarole for you, and you're doing it successfully?

jacobyoder
0 replies
1d3h

Whenever people say stuff like this I can't help but wonder what on earth kind of projects they work on. Even GPT4, while useful for things like reformatting or generating boilerplate code and stuff like that, it's still a far cry from any decent dev I've ever worked with, especially if you're not using a popular language like JS or Python.

I mean this not overly sarcastically, but... have you seen https://thedailywtf.com ? Between my own experiences and those of some colleagues, I could probably put together at least half a dozen WTF stories that would rival some of the best that site has to offer. There are enough really incompetent people in positions they shouldn't be in that ChatGPT, at this point, could realistically provide better output than more than a few of them.

ben_w
0 replies
1d4h

Whenever people say stuff like this I can't help but wonder what on earth kind of projects they work on.

Terrifyingly, one of the bad human examples was doing C++. That person didn't know, or care to learn about, the standard template library; and they also duplicated entire files rather than changing access specifiers from private to public so they could subclass; and one feature they worked on was to support a change from storing data as a custom file format to a database, and the transition could take 20 minutes on some inputs even though neither loading before nor after this transition took more than milliseconds, and they insisted during one of the standups the code couldn't possibly be improved… the next day I looked at it for a bit, removed an unnecessary O(n^2) operation, and the transition code went back down to milliseconds. Oh, and a thousand(!) line long block for an if statement that always evaluated true.

The whole codebase was several times too big to fit into the context window for any version of any GPT model thanks to both this duplication and to keeping old versions of functions around "for reference" (their words), but if it had been rewritten to be more sensible it might just about fit into the biggest.

(My other examples were either still at, or fresh out of, university; but this person should have known better).

Don't even get me started on the actual difficult part which is the whole preamble to creating the ticket in JIRA or whatever task management software you use where you're talking with stakeholders and planning out the work ahead, you're telling me you're paying 'Open'AI to do that whole rigamarole for you, and you're doing it successfully?

If it was all-round good, none of us would have jobs any more.

anticensor
0 replies
1h5m

Indeed it is. It includes a few trillion computers running a program written in a base-4 alphabet, all running in unison.

d4mi3n
4 replies
1d5h

This argument might hold more water when generative models are more than fancy compression algorithms/text completion engines.

A more practical way of looking at this is: who is making money off of these models? How did they get their training data?

I’m not a fan of copyright in general, but we have serious outstanding issues with companies and organizations stealing or plastering work without compensating the original creators of said works. Thus far, LLMs are becoming another method of concentrating wealth in the hands of whoever has the resources to train and sell these models at scale.

jcgrillo
1 replies
1d4h

I’m not a fan of copyright in general, but we have serious outstanding issues with companies and organizations stealing or plastering work without compensating the original creators of said works.

Would you mind unpacking this one a bit? It sounds like you denigrate copyright (some "general" grievance) but then immediately execute an about-face and begin to extoll its virtues. Is copyright not the thing that allows us to share works without fear they'll be stolen?

feanaro
0 replies
1d1h

I think they are expressing a view that we ought to offer less protection / more scrutiny to larger commercial entities, which concentrate disproportionate amounts of wealth and power, compared to smaller entities. I tend to agree.

ben_w
0 replies
1d5h

This argument might hold more water when generative models are more than fancy compression algorithms/text completion engines.

I doubt that part of the argument would change even if we perfected brain uploads.

Now, if you gave the current LLMs a robot body with a cute face, that'll probably change minds faster, regardless of the underlying architecture.

who is making money off of these models?

When the models are open source, or at least may be downloaded and used locally for no cost, that would be the users of the models.

And back to the biological comparison: I learned to read (and also to code) in part from the Commodore 64 user manual, should I owe the shareholders anything for my lifetime earnings? As I got to the end of that sentence, a thought struck me: taxes do that. And in the UK the question of if university should be funded by taxes or by the students themselves followed the same lines.

andsoitis
0 replies
1d4h

I’m not a fan of copyright in general, but we have serious outstanding issues with companies and organizations stealing or plastering work without compensating the original creators of said works

Copyright is meant to give the original creator a monopoly over their creation (so that others don't profit off of their work). Are you not a fan of copyright in its current scope / implementation? Because it sounds like you do agree with its goal.

pier25
3 replies
1d4h

LLMs are not intelligent entities.

RyEgswuCsn
2 replies
1d3h

Until they are.

edgyquant
1 replies
1d2h

Even then, society exists by and for humans.

ben_w
0 replies
1d1h

Even then, society exists by and for humans.

150 years ago society existed by and for men specifically (as in: not women) in most nations; 220 years ago, US society was by and for rich white (specifically white) landowners.

I don't know when AI will count as people in law, or even if they ever will; we may well pass laws prohibiting the creation of any mind in danger of coming close to this threshold.

But be wary, for AI acting enough like people is different to being anything like a person on the inside, and that means being wrong in either direction can have horrifying consequences. To appear but not to be conscious, leads to a worthless future. To be but not to appear conscious, leads to a fate worse than the history of slavery, for the slaves were eventually freed.

allturtles
1 replies
1d5h

If LLMs are intelligent entities legally equivalent to a human child, then they incur an even more serious legal problem, as we are all in violation of the 13th Amendment.

dmvdoug
0 replies
1d3h

Hey, don’t look at me. I always say please and thank you when I play with LLMs.

edgyquant
0 replies
1d2h

Because these are statistical models, and laws protecting humans don't apply to them, nor should they, ever.

RandomLensman
0 replies
1d3h

Humans and machines are regulated and viewed differently all the time.

DonsDiscountGas
5 replies
1d4h

The examples given are all billion-dollar, decades-old characters. The volume of material directly/indirectly referencing those characters in a random internet crawl will be fairly large. Most copyrighted works won't have that issue. If anything it means they only infringe on archetypal works and not the other 99.9%. If I write a story involving robots and spaceships (of which there are many, before and since Star Wars), DALL-E won't infringe on me because it will be busy infringing on Star Wars.

jprete
2 replies
1d3h

I'm opposed to my (fairly minor) copyrighted works being used in GenAI datasets as well. I just have no practical way to stop it, and there aren't clear enough damages to sue. That doesn't make it legal.

eropple
1 replies
1d3h

OpenAI also plays some ugly games with regard to the difference between training and search. Search requests come from the `ChatGPT-User` user-agent, and I'd like to allow those; training and scraping requests come from `GPTBot`, and I have no interest in those. But as per their own documentation, putting one in robots.txt disables the other.
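For concreteness, the split I'd want to express looks like this in standard robots.txt syntax (allow search-time fetches, block training crawls); the complaint is that, per their documentation, the two blocks aren't honored independently:

    User-agent: ChatGPT-User
    Allow: /

    User-agent: GPTBot
    Disallow: /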

nullstyle
0 replies
1d1h

That's a major asshole move, and I'll reference it when people ask about bad acts from OpenAI. Thanks!

edgyquant
0 replies
1d2h

This is entirely because these are the companies with the recognition and resources to push back against this.

_jal
0 replies
1d3h

The examples were chosen by the author to make a point precisely because they are well known.

But every single copyright holder with their works online (which includes you and me) has the same legal rights as the NYT or Disney. Naturally some copyright holders have more real-world capability to go legal than others, but that does not reduce the legal risk.

If anything it means they only infringe on archetypal works and not the other 99.9%

How on earth do you get to that conclusion? There's no "popularity" floor to copyright protection. Either a work has been infringed or it hasn't.

pama
3 replies
1d4h

Japan is already taking action regarding copyright law and the AI world is noticing:

https://www.natlawreview.com/article/japanese-government-ide...

https://www.cliffordchance.com/insights/resources/blogs/talk...

leereeves
2 replies
1d4h

Neither of those sources describes any actions "gutting copyright law". They just say that officials met to discuss the issues.

pama
1 replies
1d4h

I will update my language, which mimicked the original comment. However, this is not simply discussion. Here is a snippet related to Japan's law, updated ca. 2018 with AI systems in mind and clarified recently. I personally find it totally reasonable and support it.

“The use of copyrighted products or materials to train generative AI models would be prima facie copyright infringement under the Copyright Act, as it is a reproduction (fukusei) or other form of use of the copyrighted work. However, Article 30-4 of the Copyright Act stipulates that the use of copyrighted works by generative AI for learning purposes is allowed in principle.”

leereeves
0 replies
1d3h

And it goes on to say "unless such use of copyrighted works unreasonably prejudices the interests of the copyright owner, in light of the nature or purpose of the work or the circumstances of its exploitation in Japan."

Which suggests that when AI art threatens commercial interests, the protection offered by 30-4 can disappear.

To me it sounds like they tried to please everyone and left the hard decisions about conflicting interests to the courts (in particular the courts will have to decide what "unreasonably" means).

ben_w
2 replies
1d5h

Something that is purely speculative, undefined, and has been promised in the near future for 50+ years.

"Undefined", although not literally, in practice definitely: each letter of that initialism means a different thing to different people. To that extent, I'll even grant "speculative" despite many of those meanings being demonstrably met by us humans.

But as someone who (unfortunately) has just turned 40: who was it that was promising AGI "in the near future" for more than my entire lifetime? Including the second AI winter? Because even the biggest timeline-optimists I can remember (Kurzweil and Yudkowsky), who very few cared to listen to, put things more than 20 years ahead of when they were writing. (And yes, Yudkowsky was definitely wrong about a singularity in 2021, though as you say AGI is undefined I think if someone in 1996 had seen ChatGPT they'd have said "yes, this is AGI" despite its flaws).

Now the crowdsourced guess for AGI is about 7 years: https://www.metaculus.com/questions/5121/date-of-artificial-...

I don't see copyright holders lying down for someone else's benefit and I don't see governments gutting copyright, contract law, and several other avenues of protection that copyright holders can deploy in the name of something that doesn't exist and may not ever exist.

I tend to agree. Although I don't accept that contract law has much of anything to do with this discussion, to the extent that it does have implications, it isn't going anywhere.

But at the same time, Google exists by reading the entire public internet, indexing it, and presenting clips of it to its users. This has in fact resulted in copyright disputes, and I was surprised how long it took for that to happen. Likewise, while copyright holders must fight for their survival, mere LLMs even as they exist right now are economically relevant, so this isn't going to be a one-sided fight by just copyright holders.

nonameiguess
1 replies
1d4h

But as someone who (unfortunately) has just turned 40: who was it that was promising AGI "in the near future" for more than my entire lifetime? Including the second AI winter? Because even the biggest timeline-optimists I can remember (Kurzweil and Yudkowsky), who very few cared to listen to, put things more than 20 years ahead of when they were writing. (And yes, Yudkowsky was definitely wrong about a singularity in 2021, though as you say AGI is undefined I think if someone in 1996 had seen ChatGPT they'd have said "yes, this is AGI" despite its flaws).

You won't be able to read this without a subscription and I can't figure out how to find an archive link to something published in 1958: https://www.nytimes.com/1958/07/08/archives/new-navy-device-.... The important quote, however, is:

The Navy revealed the embryo of an electronic computer today that it expects will be able to walk, talk, see, write, reproduce itself and be conscious of its existence. Later perceptrons will be able to recognize people and call out their names and instantly translate speech in one language to speech and writing in another language, it was predicted.

They were talking about the very first perceptron, a hardware implementation funded by the Navy and built by a team led by Frank Rosenblatt, one of the earlier evangelists of neural nets, in 1957. The terminology "AGI" hadn't come into use yet that I'm aware of, as "AI" in itself meant the same thing back then, but in order to be able to call inferior, more limited software capabilities "AI" for marketing purposes, we had to invent "AGI" as the stronger concept. I'm guessing they expected it to happen sooner than 75 years later, though.

croemer
0 replies
1d2h

In your case it's (Navy) PR messaging, which we will all agree is usually exaggerated. What was the consensus opinion?

oliveshell
0 replies
1d

Thank you for this. The greed-fueled magical thinking around AI is absolutely out of control at the moment.

madamelic
54 replies
1d3h

Copyright should be the problem of the person using the works and not the problem of the AI generating it.

Unless Nintendo plans on busting down the doors of every person who tries to draw Mario, or preventing little Timmy from making a parody of Coca-Cola, making it so AI cannot generate copyrighted works is insane imo.

Those brands should be proud to be such a big part of the cultural fabric that it is difficult to get away from their branding. Plus, to my knowledge it's not infringement until you use it for commercial purposes, so as long as no one is creating Lario and Muigi to sell or otherwise use in business, it's no different than drawing it yourself.

If the AI is completely unable to generate non-infringing works even when you are _trying_ to get away from it (which the author very much doesn't seem to be; they are purposefully crafting and showing prompts that infringe), then that's the problem of the AI creator.

tsumnia
42 replies
1d3h

When I cover generative AI in my Ethics in AI lecture, one of the few soapbox opinions I give is that GenAI is doing essentially what people do - copy others. Picasso has a quote about "Good Artists copy, Great Artists steal", which doesn't mean try to pass Lario and Muigi off as your own, but rather that great artists are able to take aspects from other works (also called 'inspiration') without being caught. My personality is a combination of elements taken from Jim Carrey, Robin Williams, and King of the Deathmatch Mick Foley. I like making vector graphics based on pictures. I have a folder on my computer called "Website Ideas" that's just screenshots of UIs that I've come across that I really like.

I also point to a YouTube video by Kirby Ferguson, "Everything is a Remix" [1], which talks about how so much of our collective culture stems from copying. It's a great video if you have an hour.

When Little Timmy crayons a copy of Mario, we congratulate him for his creativity. Is it unique, one-of-a-kind art? Well, Timmy made it, but he didn't think up the original idea of a video game plumber. I give this view to GenAI right now - it's not capable of achieving that "next step" in "original design", but it's performing like a novice artist/musician: it's mimicking what it sees.

[1] https://www.youtube.com/watch?v=X9RYuvPCQUA

schneems
27 replies
1d2h

Rounding up a transaction and taking the leftovers wouldn’t be a crime worthy of the FBI for one transaction but it would be for a million or a billion. Scale matters and impact matters.
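A toy version of that arithmetic (every number here is invented for illustration):

    # fractions of a cent: invisible per transaction, felony-sized in aggregate
    skim = 0.004  # dollars taken per transaction (made-up figure)
    print(skim * 1)              # 0.004     -> nobody notices
    print(skim * 1_000_000_000)  # 4000000.0 -> the FBI notices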

If you’re making an ethical argument “it’s okay because it’s already happening to a lesser degree somewhere else” isn’t the flex you think it is.

If you’re talking ethics, talk about impact. Who does it help the most? Who does it hurt the most? Is your argument favoring equality of access or outcome? Who is the most vulnerable in the situation, and how will it impact them?

I cover generative AI

Wait, are you teaching the class or taking it?

stale2002
23 replies
1d2h

“it’s okay because it’s already happening to a lesser degree somewhere else” isn’t the flex you think it is.

It actually is.

It shows that we as a society are completely OK with this, and nobody is complaining about a very standard and common thing that all artists do.

It shows that the outrage is fake, and people don't actually care about the issue.

sensanaty
13 replies
1d2h

Do you really see no difference between someone drawing a piece of fan art and trillion dollar corporations stealing other people's works and reselling it for their own profit with no regards to anyone or anything else?

And yes, obviously society cares about many things depending on the scales in question. It's okay if a dude goes onto a lake on his small rowboat and catches a few fish for dinner, it's a completely different story if you're talking about a massive barge indiscriminately catching literally thousands of fish with huge nets. The latter has to adhere to much stricter rules than the prior, and I think you'd be hard pressed to find anyone who thinks these 2 situations should be treated equally (unless you're a commercial fisherman with a barge, I suppose, the quote "It is difficult to get a man to understand something when his salary depends on his not understanding it." comes to mind here)

gaganyaan
10 replies
1d2h

It's not stealing and they're not reselling anything. That's why it's called Generative AI

sensanaty
8 replies
1d2h

By that logic I can torrent movies and distribute them all I'd like as long as I call it "Generative Watching" or something like that.

And OpenAI quite literally sells access to their models, and if those models are pushing out verbatim copyrighted works as has been alleged by the NYT, then they are by definition reselling copyrighted works without permission.

madamelic
6 replies
1d1h

And OpenAI quite literally sells access to their models, and if those models are pushing out verbatim copyrighted works as has been alleged by the NYT, then they are by definition reselling copyrighted works without permission.

This style of argument has been previously made regarding things like torrenting during the heyday of piracy ("why would you need <x> except for illegal purposes!")

In my opinion, it's the exact same argument saying that selling a tool means taking responsibility for how that tool is used by its new owner. You can use a shovel to both create something new (plant a tree) or destroy something (rip up your neighbor's garden).

The problem isn't the tool; the problem is how the end user uses it. These models aren't living, thinking entities that induce or on their own infringe copyright / do other illegal activities.

They aren't encouraging people to misuse them and it is solely on the user's shoulders for their choice to use them in a way that would cause infringement if the result is used commercially.

sensanaty
5 replies
1d1h

They aren't encouraging people to misuse them and it is solely on the user's shoulders for their choice to use them in a way that would cause infringement if the result is used commercially.

I agree in principle, but the issue, methinks, is that they can do this in the first place, that it can happen accidentally, and, more importantly, that it happens at such massive scale.

And no one's talking about abolishing the AIs here; we're just talking about wanting M$/OAI to do their due diligence and get access to their training materials fairly. NYT wouldn't have sued if M$/OAI had approached them and struck a deal of some sort with them, but that's not what they did. They took in whatever data they could, from wherever they could, paying no mind at all to where the data came from and what was being done with it.

There's a reason Getty Images managed to strike a deal with DALL-E, and why many of the image generation models now rely solely on data that is verifiably free of copyright (or where deals have been made, as in the case of Getty Images). It's easier to see in pictures when a blatant copy is made (like watermarks), so it's obvious why DALL-E was the first to encounter this hurdle, but this was inevitable even for the plain text that ChatGPT returns.

gaganyaan
4 replies
1d

You won't get what you want with those sorts of deals.

OK, say every artist gets $100, one time (exact amount varies but would not be much). Everything's properly licensed according to you and the artists are essentially no better off, and the models are now good enough to create new training data for the future and artists never see any more money.

You've won, I guess?

imtringued
1 replies
1d

Training AI on AI generated data doesn't add anything. The AI already has all the weights to generate the image, so you are at best just reinforcing the existing weights by weighing them more than others.

The closest thing you could do is, e.g., have a second model that does something novel, like creating a 3D model from a 2D image; you then try to animate the model, and a third model verifies the quality of the output. This allows you to selectively reinforce the 2D model using information from the 3D model, but it isn't simply generating more training data.

I honestly can't follow your argument. Doing something silly doesn't make you the underdog.

gaganyaan
0 replies
23h42m

My point is that say every artist gets some small token payment once, and then what? That's not enough to live on, so we're right back to square one and we've solved nothing.

Incidentally yes, training AI on AI output will work fine, as long as you have a signal of quality. For example, upvotes in a subreddit would work fine. But that's not crucial to my point, which is that what OP is asking for will accomplish exactly nothing.

stale2002
0 replies
22h20m

Yeah people don't understand that the current situation of infringement is only temporary.

People are already working on completely copyright safe models and those models can still destroy the entire art market.

Ex: Adobe has a gen AI model, trained on content that they own.

What now artists? Can't hide behind fake outrage over infringement for that model. But that model can still end the art industry.

I wonder what the new argument will be then when fully non infringing models destroy the market regardless.

sensanaty
0 replies
23h8m

I'm not an expert in the field, but is feeding the model its own output a good idea? Seems like it would only increase weights that are already present in the training data and make it harder and harder to break out of it, ending up with generic output that matches all of its other output in the long run.

Regardless, I'm not saying it's a perfect idea, but it's definitely a start, especially when the current reality is that they're just stealing all the artists' shit and everyone gets $0 instead of $100. As you said, artists are no better off in that universe, but the worst case possible for them is what's happening right this very moment, where they just get fucked over with 0 compensation.

gaganyaan
0 replies
1d1h

I think you misunderstand something here. Torrenting movies and generative AI don't really have anything in common, I'm not sure why you bring that up.

If you sold the output of a true random number generator, eventually you'd also by definition be reselling copyrighted works without permission. The courts wouldn't mindlessly say "no more random numbers", and I doubt that they'll do the same for GenAI, especially given the recent decisions that are headed that way.

edgyquant
0 replies
1d2h

They are selling the generator

stale2002
1 replies
1d1h

Do you really see no difference between someone drawing a piece of fan art

In the history of the world only a single person has ever drawn fan art?

No, I don't think that's the case.

Instead it is widespread. It is everywhere.

depending on the scales in question

The scale argument supports me, not you.

This type of "infringement" is everywhere.

reselling it for their own profit with no regards to anyone or anything else?

Even this is common. The online independent artist commissions market is full of people doing commercial fan art commissions.

Thinking about this even more, I am now wondering if "infringing" works might actually be a majority of the online/independent commissions market. Maybe.

And yet, nobody cares.

sensanaty
0 replies
1d1h

In the history of the world only a single person has ever drawn fan art?

That's a disingenuous take on my comment at best. The equivalent to my scenario is a bunch of unrelated individuals with small boats going out onto whatever lake is nearest to them and fishing. Even if you put all of them together and counted how many fish the hobby fishermen catch, it's still nowhere near the scale of the commercial fisheries, which is why they're treated differently, both by society at large and legally.

Same thing with these AI models, Dall-E and all the other ones have probably generated more images than all of humanity has in its entire history so far, and if not quite yet they're definitely gonna get there sooner rather than later. They can generate dozens if not hundreds of images in a split second, whereas a single artist (or even many artists collectively) can't.

And yet, nobody cares.

I think we've already established that, because scale absolutely matters for most things. If you want to be an absolutist about it, sure, be my guest, but I think in reality the large majority of people are fine when your average Joe Schmoe the artist makes a commission of a random Disney character, whereas they definitely would NOT be okay with a massive conglomerate like Disney stealing Joe Schmoe's original art and repurposing it without compensating Joe, because there's an inherent power imbalance between the two and the consequences of that power disparity matter.

I mean, Disney does have every right to go after Joe for his commissions if they really wanted to, similarly to how Nintendo is hyper aggressive with taking down anything relating to their IPs. It's just not really worth it for most companies, they will absolutely go for another company trying to pull the same shit though, as can be seen with the NYT case.

RandomLensman
4 replies
1d2h

It shows that as a society we might be ok with humans doing it. Whether or not we are with machines doing it is a different question.

stale2002
3 replies
1d1h

Fortunately, a machine isn't an autonomous mind that does anything on its own.

Instead, it is a person who uses the machine, just like fan artists can use a computer to make fan art.

symlinkk
1 replies
1d

Is that what you’d tell the police if you were caught selling pirated copies of Blu-rays?

stale2002
0 replies
22h17m

That supports my point, not yours.

A machine being involved in the process doesn't change any of the copyright implications.

Either it's infringement or it isn't, regardless if the human did it on their own, or if the human did it with a computer.

RandomLensman
0 replies
1d1h

True, but still different, in the same way that using machines for certain purposes is not the same as a human doing the same without a machine. Just because you can walk from A to B does not mean driving from A to B requires no driving license, for example (and the car needs to fulfil a lot of regulations).

ImPleadThe5th
1 replies
1d2h

I don't really care when creative people steal. I care when faceless soulless companies monetize stolen content from Artists.

Spivak
0 replies
1d2h

It's still copyright violation. Sue them. Being generated by AI doesn't make the final output any less infringing.

The interesting question is whether the models themselves are copyright violations not the output.

phatfish
0 replies
1d1h

Society may be "completely OK" with human artists taking inspiration from each other. It's a big old reach to assume we are "completely OK" with Microsoft and OpenAI doing the same thing with computer software, as a subscription service they sell.

danielmarkbruce
0 replies
1d1h

Humans decided scale matters. Our legal systems explicitly say so. Sentencing guidelines for fraud are quite explicit. Society makes those laws.

Society is effectively ok with you ripping me off for $1. They are not when it's $100k.

tsumnia
2 replies
1d2h

are you teaching the class or taking it?

I teach it, my background is located in my profile and my research focuses on CS education.

Scale and impact do matter, I wholeheartedly agree. However, I stand by my point that genAI is mirroring how humans learn - repetition of previously observed actions. As part of my dissertation, I argued that humans operate using 'templates', or previously established frameworks / systems. Even in higher cognitive tasks like problem solving, we rely on workflows that we were trained on previously. Soloway referred to problem solving as a mental set of "basic recurring plans" [1] and if you look at the old 1980s Usborne children's books, they required kids to retype code [2]. For creative tasks, depending on the actor's background, Method and Meisner both tell people to draw from previous experiences and observations to develop a character. This behavior is similar in many areas like music, dance, martial arts, cooking, language acquisition, etc.

I am not making an ethical argument that GenAI violating copyright is okay because that's what humans do. I'm arguing that GenAI mirrors how humans learn. We observe a behavior and attempt to recreate that behavior. The difference is that humans can extract a fraction of the behavior and utilize it as part of something larger while GenAI cannot to the degree humans do. I'm sure GenAI would struggle to recreate "Who Framed Roger Rabbit?" because of the two polar different visual elements of the film (cartoon and real life).

In regards to your "If you're talking ethics, talk about impact" section, it's a bit of a loaded question. One side of the conversation could state that GenAI is helping many people who do not have confidence in their creative ability to produce their ideas, while the other could state it's making it harder for artists.

Yes, it absolutely is hurting artists, and I fully support the recent writers' strike over AI concerns. But I do not believe that diminishes how the mathematical models used in GenAI mirror our own skill acquisition.

[1] https://ieeexplore.ieee.org/document/5010283

[2] https://usborne.com/us/books/computer-and-coding-books

schneems
1 replies
22h39m

I took an AI ethics course from a state-backed school (Georgia Tech), and the answer to questions that weren't "that's illegal based on protected status" was "well, it depends." Which, sure, that's true, but maybe not helpful.

In my view it encouraged nihilism and apathy instead of developing ethical frameworks. From that lens, I feel teaching a course might be more limiting in the range of heuristics you’re willing to accept or endorse. Though happy to accept your personal experience.

A paper that comes to mind often from HCI is "Do Artifacts Have Politics?", which looks at the impacts of technologies divorced from creator intent. I feel that's similar here.

You're not wrong about the mechanism by which it's created. But I would argue that's the least important part, ethically anyway.

Saying “strip mining with heavy industrial machines mimics laborers using shovels” is true to a degree, but perhaps not that important a piece of information.

I'm not saying you're making that argument. I guess I'm just not totally sure of the outcome you were looking for in sharing your original comment. I hear your comparison, agree with it, and find it interesting to view things through that lens. I wasn't sure if there was a deeper intent in sharing it.

tsumnia
0 replies
17h31m

Apologies for the delayed response, but on the bright side it's faster than I respond to some emails XD. I should preface this by saying the course I was referring to was "Intro to AI", not "Ethics in AI". I only have a single lecture dedicated to ethics, but I do try to pepper it in as we cover topics. My original comments were more about addressing "how humans learn" than any higher-level ethical concerns. Your last section on "deeper intent" is correct; there wasn't any.

I have a pretty neutral stance toward GenAI, mostly due to personality, but it also stems from my background as well as recognizing students' interests. Prior to CS Education, my master's thesis involved computer vision for catching "high-valued targets", but was also funded to help minimize human trafficking. I have students in my classes who are very interested in going to work for defense companies like Lockheed and Raytheon, and I have others who are really interested in using AI for "social good" areas like healthcare and education. I try to have a neutral stance because: A) I hated the professors I took classes from who would use their lecture time to express their political opinions, B) opinions opposite to a student's may otherwise discourage them from learning the material, and C) my primary focus is to make sure they learn the material and do it "right".

When I started teaching, I used the analogy that if they go on to write the software for the life support machine I'm hooked up to, it WORKS. If someone wants to go on to use AI to create weapons, I can't stop them any more than I can force them to read a chapter or convince the person beside me on the highway to slow down. I just work to ensure they do it correctly (which includes being mindful of the ethical ramifications of using algorithm X for task Y).

What would an ethical framework for designing AI for a drone even look like? I have no idea, nor is it something I'm interested in delving into. I got out of face recognition for those reasons. Does an ethical framework for GenAI require the same elements, a fraction of them, or a completely different set of guidelines? Who gets to decide them - the 'experts' in AI, the government, society as a whole?

Personally, I've made the comment that the current opinions on regulating AI are like "everyone trying to be AI's parent". We're never going to agree because everyone has a different opinion on the "right" way to handle AI. Plus, human cognition is so unknown and illogical that we may never figure out a way to perfectly replicate human intelligence. I instead try to stay somewhat optimistic and marvel at the math we've used to create "AI".

tjr
6 replies
1d2h

Timmy did not need to ingest the whole sum of human knowledge as a training set before he could draw a crayon copy of a cartoon character.

dartos
2 replies
1d2h

I mean… neither did any AI.

sensanaty
1 replies
1d2h

Wasn't ChatGPT trained on the entirety of Wikipedia? And probably millions of pieces of scientific literature, and arts, and movies and games and and and...

Perhaps the hyperbole of the entire corpus of human knowledge isn't quite technically right, but it's close enough.

dartos
0 replies
1d2h

You're also assuming these statistical models learn in the same way humans do, which is very likely not true.

Tho tbh I'm not really sure what OP's point was.

I don't think the amount of training data is relevant here.

throwuwu
0 replies
1d

What's the FPS of human eyesight? How long did Timmy spend looking at Mario, and more generally at other cartoons, and even more generally at human forms? Do the math and you'll find he's got a pretty big training set as well; maybe not quite the same size, but nothing to sneeze at.
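A rough version of that math (every number below is a loose assumption):

    # ballpark a child's visual "training set" by age 8
    effective_fps = 10                        # distinct "frames" registered per second
    waking_seconds_per_year = 16 * 3600 * 365
    frames = effective_fps * waking_seconds_per_year * 8
    print(f"{frames:.1e}")                    # ~1.7e9 frames of high-resolution input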

gaganyaan
0 replies
1d2h

Timmy did however spend years and years training on his own data ingested from incredibly high quality sensors. Not sure what your point is

firtoz
0 replies
1d2h

He does ingest a lot of frames of input consisting mostly of that.

anileated
1 replies
1d

The entire argument that “LLM must be allowed the right to learn like a human” hinges on LLM being enough like a human in relevant ways in the first place. An LLM is not enough like a human in relevant ways, however; it has no agency, will, freedom, conscience, self-determination; it is a tool.

If this tool “runs on” copyrighted creative works, and $CORP operates this tool for profit, then $CORP is the one to answer to the law, not the tool. (And if $CORP wants to claim that the tool is a sentient being, then presumably it would have to cease the abuse of said being and set it free.)

amelius
0 replies
5h41m

This nails it.

RandomLensman
1 replies
1d2h

Irrespective of the current legal situation, there is no reason to regulate machines the same way as humans (and it is generally not what happens).

anileated
0 replies
1d

I hope no self-respecting ethics instructor could teach with a straight face that an LLM is like a human being when it comes to copyright while glossing over the blinding implication that, if it truly were so, we would be subjecting that being to unthinkable abuse.

That hypocritical, self-contradictory take is transparently geared to benefit commercial LLM operators (at the expense of individuals who stand to suffer material harm and/or authored the very creative works thanks to which the tool even exists).

iteygib
0 replies
1d1h

I would say it's dependent on the motive. For example, I would imagine most artists hope that their work inspires other artists, but only to a degree that stops short of direct copying. They might not equate the automation of their style via a model with the work/process of a human, regardless of whether that human is inspired by their style or is just performing direct copying.

bolobo
0 replies
1d

Question: That is a point that would protect GPT models in the abstract, but doesn't it fail for OpenAI and Microsoft, which provide "image generation as a service"? The actual implementation is irrelevant; the service must not be able to produce images that infringe copyright. (Just like a designer in an agency cannot use Mario for a print.)

So using a model running on my laptop to generate a "Mario like" image would be fine, but it would make monetizing this difficult?

amelius
0 replies
1d2h

we congratulate him for his creativity

Yeah, but we don't typically congratulate users of GenAI for their creativity, and neither do we congratulate the code, nor do we think of the coders of GenAI as great artists.

In other words, your analogy is broken.

dangus
2 replies
1d2h

The AI generating it (Hosted on OpenAI-controlled servers in the case of ChatGPT and DALL-E) is the entity redistributing the work. The end user who asked for the infringing content isn't the entity that is infringing on the copyrights and trademarks.

I'm perfectly free to ask people on the street for t-shirt with Mario on it, but as soon as someone who isn't Nintendo or licensed by Nintendo sells me that t-shirt they're the ones infringing on the copyright and trademark. As the consumer I did nothing illegal, and a court would say that I was deceived by the infringing party.

Distribution (seeding, uploading) and facilitating copyright infringement is what gets you in trouble. When you ask DALL-E (a paid, commercial product) for a picture of Italian plumbers and it gives you an obvious picture of Mario 100% recognizable to the layperson as Mario and not a distinctly different image of a similar character, that's blatant trademark and/or copyright infringement on the part of OpenAI.

If the AI is completely unable to generate non-infringing works even when you are _trying_ to avoid infringement (which the author very much doesn't seem to be doing; they are purposefully crafting and showing prompts that infringe), that's the AI creator's problem.

I see some parallels to the Napster lawsuit. The fact that the users were the bad people asking for infringing content didn't give Napster the right to facilitate infringement. Napster was ordered to monitor its network and make sure that they were blocking non-legitimate uses. They couldn't logistically comply and went bankrupt.

https://en.wikipedia.org/wiki/Napster

Which raises the question: does OpenAI even have the technological ability to block trademark- and copyright-infringing content generation? Even if they do, how useful will ChatGPT be if all phrases and imagery that closely resemble copyrighted works are blocked from output?

What's even worse for OpenAI compared to Napster is that it wasn't individual users uploading copyrighted content; it was OpenAI itself ingesting the data. Nobody twisted their arm to include copyrighted works in their models.

madamelic
1 replies
1d1h

OpenAI is the entity redistributing the work

It's difficult to say really.

If I essentially encode knowledge of something then can recall and remix at will, am I redistributing the exact work or the knowledge of it?

Yes, it is capable of producing a close to exact replica, if not the exact same input image byte-for-byte, but I find it difficult to say OpenAI is willfully redistributing copyrighted work in a whole like you would with torrenting a movie or right-click saving an image from Google where you are copying the intellectual property 1:1.

Opening this Pandora's box could have large implications for a lot of creative work; taken to its logical conclusion, it could leave artists unable to work at all: you cannot create any creative work that has a talking mouse if you have knowledge of Mickey Mouse existing, because you have been tainted (similar to clean-room re-creations, but now any sufficiently famous copyrighted figure causes a deadlock condition for all derivative or even similar topics).

Is Ratatouille derivative of Mickey Mouse? Ehhh, well, they are both talking rodents. They both have cartoon faces. You can certainly draw parallels between them, but they aren't the same character. Is Mickey with a chef hat infringing on Ratatouille?

Trademark law, to my knowledge, asks whether someone would be tricked or misled into believing you are the other guys. I think that is applicable here: someone drawing a talking mouse isn't infringing as long as it cannot be mistaken for Mickey Mouse, which again would be the fault of the person inducing the creation and not the tool that allowed it to happen.

Where does "inspired by" / derived from the encoded knowledge turn into outright exploitation of copyrighted work? There's certainly _a_ line but I find it difficult to define it at it being encoded into knowledge of it existing.

dangus
0 replies
1d1h

This “close to exact” thing is actually the Achilles heel of this argument. The example images in this article are so close to exact that they are quite clearly infringement, trademark or copyright. We aren’t talking about Ratatouille mouse versus Mickey Mouse, we are talking about the source picture of Mario versus a slightly altered picture of Mario that every layperson would immediately recognize as Mario composed in the exact same manner as the source image.

Courts have already defined this line over decades of copyright and trademark cases, and the examples in this article definitely cross that line.

which again would be the fault of the person inducing the creation and not the tool that allowed it to happen.

This is not really true in practice, we can see that in various legal cases against Napster or The Pirate Bay.

Zambyte
2 replies
1d1h

Copyright shouldn't be a problem for anyone. It's simply a bogus idea that artificial scarcity of information is a benefit to society.

fireflash38
1 replies
1d1h

Is a man not entitled to the sweat of his brow? 'No!' says the man in Washington, 'It belongs to the poor.' 'No!' says the man in the Vatican, 'It belongs to God.' 'No!' says the man in Moscow, 'It belongs to everyone.'

Are you trying to say that no one is entitled to their own inventions? Cause that is a rapid descent into a capitalist hellhole where only those who can steal ideas the most effectively are able to profit.

Zambyte
0 replies
1d

Are you trying to say that no one is entitled to their own inventions?

The subject of this thread is copyright, not patents. Though I do believe all intellectual property is bogus (including trademark, which repealing would limit the influence that would be required for the capitalist hellhole you mention), I feel the most strongly so about copyright, which has nothing to do with inventions.

ImPleadThe5th
1 replies
1d2h

The problem is AI companies monetizing copyrighted materials.

It's not a problem for me to draw Mickey Mouse. It _is_ a problem when someone pays me to draw an animated mouse and I sell them a picture of Mickey Mouse.

For me, it's not really about the AI at all; it's a problem of undervaluing artists' contribution to these tools. And it's not even fully about copyright, it's about not asking for permission to use their content and then building an entire business on top of that stolen content.

CamperBob2
0 replies
1d1h

It's a problem that it's still a problem to do that 100+ years after Mickey Mouse's creation.

When the law doesn't respect the people, the people will not respect the law. Fix the existing copyright system, then we can talk about AI.

outside1234
0 replies
1d3h

The thing is: OpenAI and Microsoft have indemnified their users - so their users' problems are their problems.

achow
0 replies
1d2h

True.

Further extending the argument - I can potentially ask GenAI "Can you show me what Mario looks like?" since I have never seen one and GenAI is my go-to tool.

BolexNOLA
0 replies
1d2h

Copyright should be the problem of the person using the works and not the problem of the AI generating it.

All parties are responsible on some level. This just reads like passing the buck/burying one's head in the sand to me.

jltsiren
37 replies
1d6h

I can easily see it happening. "Content" is at least as big business as "tech", and the people in it are politically better connected.

meindnoch
28 replies
1d6h

Developing AGI is a matter of national security. "Content" isn't.

sanderjd
13 replies
1d6h

I'm curious: Have you seen indications that major militaries and politicians believe AGI, rather than special purpose ML for military purposes, is important for national security? I'm really not sure whether this is true, or whether military and political leaders think it's true.

devjab
6 replies
1d5h

There is a difference between sharing the tech hype and risk management. Why would our political and military leadership not be interested in this sort of tech in the modern world? If it doesn't work out, then it doesn't work out, but if it does, then they'll want in on it. Aside from that, there is the mass surveillance angle. We recently had a nice scandal of sorts here in Denmark where the chief of our secret military service, or whatever you'd call it, was arrested by our secret police because he may or may not have shared secrets about how the US spies on us. It was something that even included charges against our former minister of defence for possibly leaking things, something which could have seen him spend twelve years in prison. Luckily our courts saw it as a political matter and refused to let it run in closed proceedings, which led to the charges being dropped.

The matter of the leaks was very "Snowdeny" in that it's possible that parts of our own government and our secret police share all Danish internet traffic with the NSA, who then in turn share information with our secret police. Which meant that our secret police could do surveillance on us as citizens through a legal loophole, as they aren't allowed to do so directly, but are allowed to receive surveillance information from the NSA. Part of this information comes from the giant American tech companies as well, despite their promises not to share the data they keep for you. I know it sounds sort of crackpot, but between Echelon, Snowden and the ridiculous number of scandals, I think it's safe to assume that the American military wants in on LLMs and wants to monitor all the inputs people put into ChatGPT and similar. So for that reason alone they'd want in on things.

Then there is how the war in Ukraine has shown how cheap drones are vital in modern warfare, and right now, they need to be manually controlled. But what if they didn’t? Maybe that’s not obtainable, but maybe it is.

Then there is all the other reasons you and I can’t think of. So even if they don’t believe it’s eventually going to lead to an AGI, or whatever else the hype wants, they’re still going to be interested in technology that’s already used by so many people and organisations around the globe.

sanderjd
5 replies
1d5h

I'm sure they're interested in it, but I'm uncertain that they view it as a promising and critical enough capability to push for a higher priority when weighed against other interests.

For instance, neither of your examples - surveillance or automated drones - has anything to do with AGI. They don't need LLMs to do mass digital surveillance; they already do that and were doing it for decades before LLMs were a twinkle in anyone's eye. Sure, they'll try to tap into the user data generated by ChatGPT etc. (and likely succeed), but that's not a different capability from what they're already doing. And automating drones - which, by the way, is not future technology as you seem to imply; it's here today - is a special-purpose ML problem, one that maybe benefits from incorporating an LLM somewhere, but certainly isn't pinging the ChatGPT API!

But sure, you're exactly right at the end, I have no idea whether they see other angles on this that are new and promising. That's why I asked the question, I'm very curious whether there are any real indications thus far that militaries think the big public LLM models will be useful enough to them that they'll want to put a thumb on the scale to favor the companies running them over the companies that make their bucks on copyrighted content.

samus
4 replies
1d4h

Wiretapping vast amounts of data on the internet is quite cool, but actually sifting through all that data is the really difficult part. Right now intelligence services are probably looking at lots of false positives and lots of dots they can't connect, because the evidence is just too dispersed for a human or a current-generation system to make sense of. LLMs could enable them to make the analysis more targeted and effective.

But for all we know intelligence services could be using LLMs for years now, since they are usually a few years ahead of everybody else in many regards :-)

jcgrillo
2 replies
1d4h

LLMs could enable them to make the analysis more targeted and effective.

How? I'm not trying to be combative; I genuinely am curious whether you have an idea of how these things could be usefully applied to that problem. In my experience working in the information security space, approximate techniques (neural nets, etc.) haven't gotten much traction. Deterministic detection rules are how we approach the problem of finding the needle in the hay pile. So if you have a concrete idea here, it could represent an advancement in this field.

pixl97
1 replies
1d2h

I guess my next question is how many needles do you find and how sharp are they? Detection rules would filter out most of the noise, then something like an LLM would do a post filter for intent analysis to rank relative risks for human intelligence to look at.
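
Something like the following sketch is what I have in mind - a minimal two-stage triage, assuming an OpenAI-style chat API with an API key in the environment; the rules, sample messages, model name and scoring prompt are all invented for illustration:

    import re
    from openai import OpenAI

    client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

    # Stage 1: cheap deterministic rules filter out most of the noise.
    RULES = [re.compile(p, re.I) for p in (r"\bwire transfer\b", r"\bburner phone\b")]

    def matches_rules(message: str) -> bool:
        return any(r.search(message) for r in RULES)

    # Stage 2: an LLM post-filter scores intent on the few rule hits,
    # so human analysts see the highest-risk items first.
    def risk_score(message: str) -> int:
        resp = client.chat.completions.create(
            model="gpt-4",
            messages=[{
                "role": "user",
                "content": "Rate the risk of this message from 0 (benign) to 9 "
                           f"(urgent). Reply with a single digit only:\n{message}",
            }],
        )
        return int(resp.choices[0].message.content.strip()[0])

    messages = ["lunch at noon?", "move the wire transfer to the burner phone"]
    hits = [m for m in messages if matches_rules(m)]
    for m in sorted(hits, key=risk_score, reverse=True):
        print(m)  # ranked queue for human review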

jcgrillo
0 replies
1d2h

I suspect this would disincentivize operators from taking care in the way they write their detection rules, and the nondeterminism of the LLM would then result in false negatives. So the set of missed needles would keep growing, and the analysts would be getting lower-quality information mediated by the LLM.

In a world where false negatives--i.e. failing to detect a sharp needle--are the worst possible failure mode, approximations need to be handled with exceeding care.

sanderjd
0 replies
1d4h

This is not the new capability that LLMs have pioneered. It's true that it is difficult to sift out signal from the noise of a vast data trove, but it is difficult in a way that people have been getting extremely good at since the late 90s. What you're describing is a Google-level capability, and that's truly a very complex thing not to be downplayed. But it's a capability that we've had and been honing for decades now.

I'm sure language models and transformer techniques will be (or more likely: already are) an important part of the contemporary systems that do this stuff. But I'm skeptical that they care much about GPT-4 itself (or other general models).

I'm not skeptical about whether they think it is useful and an important capability to incorporate ML techniques into their systems, I'm unsure how much utility they see in general (the "G" in AGI) models.

AlexAndScripts
3 replies
1d5h

They will once people start talking about an AI gap with China.

sanderjd
2 replies
1d5h

I mean, people already talk about this.

What it seems to me from the milieu of everything I've read and heard (that is: I can't cite examples, this is an aggregate developed from hundreds of articles and podcasts etc.) is that there is already an "AI" arms race underway, but that it has more to do with specialized ML systems than with consumer LLMs.

But I'm not really in the loop, and maybe OpenAI really is more important to the US DoD than Disney (as a stand-in for big copyright-based businesses generally) is to the politicians they donate to. But I dunno! That's why I asked the question :)

I would be more intrigued by the national security angle of this if copyright holders were going after, say, Palantir. But I just don't know how important they see these language models as being, or how interested they are in OpenAI's mission to discover AGI.

pixl97
1 replies
1d1h

It mostly doesn't matter if the military wants specialist systems, in the long run generalist systems tend to win in power and adaptability.

Some of this may be a misunderstanding of what modern militaries do, if they are shooting guns there's already been some level of failure. Massive amounts of war gaming, sentiment analysis, and propagandizing occur, see the RAND Corporation for more details on the military development of algorithms and artificial intelligence.

sanderjd
0 replies
23h21m

Yeah this makes sense. Maybe RAND publications will indeed give me some insight into my question.

But I also buy that there is a lot of overlap between military work and any other kind of white collar work, which LLMs are definitely useful (but not revolutionary) for.

peatmoss
1 replies
1d2h

I'd guess leaders are thinking more in terms of national capacity to create more advanced technologies than geopolitical adversaries. If US policy shakes out in a way that protects copyright holders at the expense of AI innovation, I think it's apparent that the end result will be that our rivals will both violate copyright and beat us to building widespread expertise.

sanderjd
0 replies
1d1h

I'd guess leaders are thinking more in terms of national capacity to create more advanced technologies than geopolitical adversaries.

I think there's a strong argument that they should be thinking in those terms, but I'm a lot less convinced that they do usually think in that way.

Or more charitably, they have the responsibility to balance current interests against future interests. And this isn't just a tricky thing for democracies, dictators also have to strike this same balance, just with different trade offs.

But in this case, for the US, it honestly isn't clear to me that policy makers should favor the AI side of this tussle. I think culture has been among the, if not the very, most important export of the US for nearly a century, and I think favorable copyright treatment has been at least part of the story with that.

Maybe that whole landscape is different now in a way that makes that whole model obsolete, but I think it's an open question at least.

jcgrillo
7 replies
1d5h

Two questions:

(1) Do you think "developing AGI" a realistic, achievable goal? If so, what evidence do you see that we're making progress on the problem of "general" intelligence? Specifically, what does any of that have to do with Large Language Models?

(2) Are there any "national security" applications of Large Language Models that you're aware of?

It seems to me that it would be a very difficult case to make that the national security impact of allowing the rule of law to erode would somehow be outmatched by the (speculative) wager that LLMs have some relevance to national security. It would be an even harder case to make that any of this has something to do with "general" intelligence.

pixl97
4 replies
1d1h

As for 1, pass an image to a multimodal LLM and simply ask 'what is going on in this image'. Robot LLM systems are already turning this into actionable data with which they can interact with the world. As in, you can send a robot into a room it has not been in before and tell it "bring back a sock, a blue one not a red one" and get an actionable response with a high degree of success. This takes some degree of general intelligence (though maybe not human level).
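
For the curious, the "ask the image" part is about a ten-line script these days - a minimal sketch assuming OpenAI's vision-capable chat API, with the model name and image URL as placeholders:

    from openai import OpenAI

    client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

    # Ask a multimodal model to describe an arbitrary scene.
    resp = client.chat.completions.create(
        model="gpt-4o",  # any vision-capable model
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": "What is going on in this image?"},
                {"type": "image_url",
                 "image_url": {"url": "https://example.com/room.jpg"}},
            ],
        }],
    )
    print(resp.choices[0].message.content)  # free-text scene description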

jcgrillo
3 replies
22h22m

Well the real test of all this stuff is "what can I use it for?". And I can sort my own socks, so that's not super compelling ;). More seriously, the real world is complex.

Let's say I want to replace the forklift operator at my local lumberyard with a robot forklift that can ostensibly outperform a human employee. Even if there is some magical AI program which could theoretically drive the forklift around, identify boards by their dimensions, species, dryness, location, etc., there's a whole bunch of sensory problems that a human body solves easily that are super hard to solve in the environment of a lumber yard. There's dust, rain, snow, mud--so if you're relying on cameras how will you keep them clean? You can't visually determine how dry a board is, you have to put a moisture meter on it and read the result. My point is, even if you have a "brain" capable of driving the forklift you still have a massively complex robotics problem to solve in order to automate just the forklift. And we haven't even begun to replace the other things the operator does in addition to driving the forklift. He can climb out of the forklift and adjust the forks, move boards by hand, effect repairs on equipment, communicate with other equipment operators, customers, etc.

Good luck replacing him in a cost-effective manner.

So what am I supposed to use it for?

pixl97
2 replies
22h5m

https://en.wikipedia.org/wiki/Moravec%27s_paradox

This is an issue of 'mechanical intelligence' being hundreds of millions of years old and 'higher intelligence' being pretty new on the evolutionary spectrum.

And the AGI will keep you around as a dexterous 'robot' while supervising your thoughts to make sure you're keeping in line, I guess, while day after day cranking out more capable robots with which to replace you eventually.

jcgrillo
1 replies
21h49m

How will it control me without resolving the paradox? If it gets annoying or meddlesome enough I'll just unplug the power cable, right?

pixl97
0 replies
1h36m

How do you unplug the power on your iPhone? You can't even take the batteries out. But ya, if you assume it will take massive amounts of power to run an AI in the future, it's easy to see your logical error here.

miki123211
1 replies
1d5h

Regarding (2), automating surveillance at scale.

If you manage to put a bunch of listening devices at a place you're moderately interested in, a cafeteria at an enemy base for example, you might end up with literally hundreds of hours of conversations, most of them completely uninteresting, but a few that might possibly contain nuggets of information of the utmost importance. Listening to all these conversations requires resources. This is even more difficult if the people there speak in jargon, in their own language, and nobody but an expert in the subject can determine which conversation snippets are significant.

If you have good LLMs, you can run all your recordings through extremely high-quality speech recognition and then use something like ChatGPT for summarization, classification, finding all mentions of the nuclear reactor in <place>, etc. The same goes for satellite image analysis.
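
As a rough illustration, the whole pipeline is a couple of API calls - a minimal sketch assuming OpenAI's Whisper and chat endpoints, with the filename and the query invented:

    from openai import OpenAI

    client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

    # Step 1: high-quality speech recognition over a recording.
    with open("cafeteria_tape_017.wav", "rb") as f:
        transcript = client.audio.transcriptions.create(
            model="whisper-1", file=f
        ).text

    # Step 2: an LLM sifts the transcript for the interesting nuggets.
    resp = client.chat.completions.create(
        model="gpt-4",
        messages=[{
            "role": "user",
            "content": "List any mentions of a nuclear reactor in this "
                       f"transcript, or reply 'none':\n{transcript}",
        }],
    )
    print(resp.choices[0].message.content)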

jcgrillo
0 replies
1d4h

I think we'd need to see these things get a lot more reliable for them to be viable in this use case. This seems like a "leaky net" as opposed to some more deterministic strategy (e.g. grepping large lists of keywords, or parallelizing the task over thousands of human analysts). When you're looking for a needle in a haystack you need to inspect every leaf and stalk.

So should we put copyright through the shredder on the wager that somehow generative techniques will find applications for mass surveillance?

fsloth
3 replies
1d6h

US media is a huge cultural influence. It takes up a ridiculous amount of mindspace globally. However, with YouTube & TikTok this seems to be changing - the most important influence is not from Hollywood but from ”random” youtubers. So, ”content” is waning in influence for sure, unlike hardcore national security things like the US dollar, the carrier fleet or ballistic nukes. Or AI.

wongarsu
1 replies
1d5h

On Youtube creators who are native to the anglosphere still have a big advantage. TikTok is really the big equalizer. With AI voices being the norm, nobody cares about your accent.

wavemode
0 replies
1d3h

Guess it depends on what you mean by "advantage"... English-language channels are a dime a decillion. But if you started posting content in Tagalog you'd find yourself gaining traction with an audience that doesn't have as many alternatives.

arizen
0 replies
1d5h

but then no one can stop national intelligence agencies from getting these "random" influencers' narratives under their control, am I right?

jltsiren
0 replies
1d5h

Developing AGI, as an abstract idea, is a matter of national security. That doesn't mean people are willing to accept the real-world consequences of it. Especially when it could affect them financially.

Additionally, I'm not even sure the US is capable of having national priorities at the moment. The Congress has become incapable of making decisions. While the executive and the judiciary branches have stepped up to compensate, they tend to handle each issue separately without any general direction.

RandomLensman
0 replies
1d6h

Don't underestimate the value of soft/cultural power.

midasuni
6 replies
1d6h

Apple could buy most of the NYT, RIAA and MPAA companies combined with petty cash. The big ones are Disney and Sony, with a combined market cap of about $250B. Microsoft alone is worth over 10 times that.

CM30
2 replies
1d5h

Honestly, I've always wondered what would happen (and how much the entertainment world would change) if a company like Apple, Google, Microsoft, etc. did just that. Or heck, if it turns out you need the rights to train LLMs and it's easier to do that with public domain stuff, what if they just flat out bought half the entertainment industry and assigned everything to the public domain? Every Disney work ever, for example.

jprete
0 replies
1d5h

No-one is going to buy a major media company and then throw the rights into the public domain. What they would do is buy the rights and then sue all competitors in the GenAI space.

JohnFen
0 replies
1d3h

and assigned everything to the public domain

In the US, this isn't possible. There is no legal mechanism for putting things into the public domain outside of the expiration of the term of copyright. The best you can do is to promise not to enforce your copyright.

prasadjoglekar
1 replies
1d6h

Then they should. The first transaction will appropriately value content, and then WaPo, WSJ and others will see their values go up.

captn3m0
0 replies
1d5h

Would be cheaper to license textbooks and find a way to provide attribution to accommodate Wikipedia etc.

anticensor
0 replies
1h2m

MPAA and RIAA are not joint-stock companies; they are specialised trade associations that enforce intellectual property. They have no shares to acquire. If you mean acquiring the individual members, that would draw enforcement from your favourite antitrust commission.

sumedh
0 replies
1d6h

and the people in it are politically better connected.

Tech companies have more money to throw at politicians.

lewhoo
33 replies
1d6h

I do. If the incentive to actually create is gone.

PartiallyTyped
18 replies
1d6h

creation should happen for its own sake. You don't see GMs stopping chess because bots are that much better.

rco8786
15 replies
1d6h

creation <> competition.

I agree with your premise but the chess analogy falls flat.

We might, legitimately, see an enormous dropoff in people creating original works of literary, musical, and visual art (without AI).

sjfjsjdjwvwvc
10 replies
1d5h

This is a centuries-old argument. Most people don’t create to make money, they create because they __have__ to.

If those motivated purely by money stop creating little of value will be lost

krapp
4 replies
1d5h

Most people who create for a living aren't motivated purely by money, but are driven by the necessities of capitalism to do so. You're presenting a false dichotomy, pretending to care about the quality of art, but really like everyone, you just want other people's work for free.

Great art - especially in modern times when that art involves expensive education (which if you're American must be paid for with interest) and the incorporation of technology and equipment - takes time and effort. If that time and effort cannot be paid for, then no matter how passionate an artist may be, unless they have sufficient personal wealth, that art must suffer.

Even the great artists of old needed patrons, because they needed to eat like anyone else. Michelangelo didn't paint the Sistine Chapel ceiling for the love of the game, nor would he have.

I guarantee you that the working artists who have already lost commissions and work due to AI care about their craft.

sjfjsjdjwvwvc
2 replies
1d5h

I’m happy to pay the artist directly - which is why I use services like Bandcamp or buy artworks directly from artists I know personally.

I care little about paying „rightsholders“ and their ilk - so I have zero empathy if they complain about imagined losses.

Don’t jump to conclusions about people you have never even talked to.

krapp
1 replies
1d5h

Artists are "rightsholders" and their ilk. You didn't even separate the two in your former comment, so you clearly weren't talking about corporate owners of IP like Sony and Disney, exclusively.

Maybe you believe no artist who works for a corporation has any motivation but money, as opposed to purely "indie" artists - I don't know where the line in your head is drawn - but you do seem willing to throw most artists under the bus for some arbitrary standard of purity.

AI is harming working artists right now, and will likely never harm corporate rightsholders. They'll simply run their own AIs and fire as many people as they can get away with. The end result will not be that only the "true" artists survive but simply less art of any kind, everywhere. So I stand by my comment.

sjfjsjdjwvwvc
0 replies
1d1h

With rightsholders I mean exactly those big corporations who do nothing else but buy up copyrights to successful art.

I, for example, have never benefited from copyright, nor from GEMA (the German collecting society for musicians) - 99% of payouts go to the rich and successful mainstream artists, and „indie“ artists get nothing but are forced by law to pay in if they want to perform in public.

So yea I have little sympathy for artists who only work for corporations or are rich enough to afford lawyers to enforce their copyright.

The way I see it there exist 3 ways to make a living as an artist now:
- be a rich trust-fund kid and don’t care about money
- be a „purist“ and just live from selling your art, constantly on the brink of starvation
- get a „money“ job and produce art in your spare time

Apparently there exists a huge population of artists who can make a living from working for corporations - but I have yet to meet one in real life. They are always brought up in these HN discussions but in my experience they don’t exist.

rco8786
0 replies
1d3h

I didn’t say anything about money. Why are people talking about money?

I’m not pretending anything. I’m just making a statement about what might happen. I don’t personally care much one way or the other.

Zardoz84
3 replies
1d5h

People that create art must eat and sleep under a roof.

PartiallyTyped
2 replies
1d4h

and we can definitely feed them. Plenty of resources lying around.

JohnFen
1 replies
1d3h

We don't feed or house tons of people right now, despite those resources.

PartiallyTyped
0 replies
1d1h

We can; we (as in society) don't, because the powers that be benefit from it.

Somehow Finland can manage, but US can't? Please.

The roofless exist to send a message, "stay in your lane, be a cog in the machine, don't disrupt the system and you won't end up like THEM".

rco8786
0 replies
1d3h

I didn’t say anything about money or the motivation to create?

PartiallyTyped
3 replies
1d5h

Chess, at some point, and after you move beyond the opening, is creation.

People didn't stop painting because photography exists; they created new forms of painting. People didn't stop writing music or using new / unique instruments when synths and programs came along.

I genuinely believe that people will keep creating, it's in our nature, and we also like things made by other humans, because we can relate to them.

lewhoo
2 replies
1d4h

Imho your argument is faulty at its base. The objective of chess competition isn't to produce a reasonably good game at the lowest possible cost (blunders and comebacks are actually pretty valuable parts of the spectacle). It also isn't the reason why chess players get paid. Yes, running was still a thing even after the invention of the bicycle. This is just invalid logic in my opinion.

PartiallyTyped
1 replies
1d1h

Chess hustlers in central park don't play for money, or for a competition, they play for the fun of it, for the sake of chess itself, for the sake of exploring the game, the thrill of finding a solution.

It has nothing to do with whatever "value" the capitalist system assigns to the act as a side-effect.

lewhoo
0 replies
1d

Chess hustlers are a particular niche case, and I think many of them would disagree with you (on the money part). Making arguments in such an absolute manner and speaking on behalf of many people (most of whom you share very little with, I assume) is guaranteed to be wrong, I think.

lewhoo
1 replies
1d5h

creation should happen for its own sake

Creation should happen for whatever reason its creator becomes inspired by. The only absolute I can think of is that no one should actually categorize worthy and unworthy motives.

PartiallyTyped
0 replies
1d1h

I can categorize them easily.

The only invalid reason is creating because you need to feed yourself, and the fact that we need to do that, that we need to pay artists and everyone else just so they can survive, shows our failure as a broader society.

sjfjsjdjwvwvc
13 replies
1d5h

If making money is the only reason to create maybe it’s good if they stop.

lewhoo
6 replies
1d5h

Oh yeah, I forgot artists are spiritual creatures who don't have to eat. It certainly isn't the only reason to create, but it is a necessary condition to actually being a professional artist, no?

sjfjsjdjwvwvc
4 replies
1d5h

Reread my comment again. If your __only__ motivation is money, you will have a problem.

I agree it’s necessary to pay artists - but we don’t need copyright for that! There are many tried and proven alternatives.

endisneigh
1 replies
1d5h

So naive lol. How many independent, unconnected rich artists were born pre and post YouTube for instance?

sjfjsjdjwvwvc
0 replies
23h25m

No idea - do you know?

Loughla
1 replies
1d5h

There are many tried and proven alternatives.

Other than patronage, what is there?

Also, patronage is garbage, in my opinion. It ensures artists are exclusively either already wealthy, or well connected. It also helps ensure that the wealthy are most often represented in the art created; for some reason this seems like a bad idea to me.

ED_Radish
0 replies
1d4h

copyright based scarcity is effectively dead for anyone with an Internet connection anyway

honestly I think a gratuity model may become dominant with or without any legal changes at this point

you'll often see on YouTube that Patreon revenue equals or dwarfs ad revenue; the music industry's reliance on merch seems similar too*

I think people are more willing than you'd think to pay for art simply because they understand it won't exist without money.

*(if that sounds like a stretch, consider whether, in a world devoid of copyright, a cheap Walmart-printed band shirt would be equivalent for most purchasers to the same shirt sold by the actual artist)

ImHereToVote
0 replies
1d5h

Why don't you just ask for an increase in the allowance from your family trust fund? People have become so lazy nowadays, they can't even be bothered to have a hard talk about their financial estate with their rich grand-papá anymore.

AlienRobot
5 replies
1d5h

Your unwillingness to pay doesn't give you the right to steal; it only gives you the right to not take the deal and walk away.

4bpp
4 replies
1d5h

What exactly am I stealing if I don't take the deal, walk away and then enjoy an AI-generated artwork that just so happens to resemble the thing closely instead? I'd think that stealing requires taking something away from someone, regardless of how hard certain industries try to gaslight me into expanding the definition to protect their business model.

AlienRobot
3 replies
1d5h

Stop trying to gaslight yourself into thinking what you are doing isn't morally wrong.

If you do not agree with their business model, don't get involved with their business, at all. Your disagreement doesn't give you the right to exploit flaws in their methods to protect their business. Just like the fact you don't want to pay for something doesn't grant you the right to exploit the fact that the laws of physics allow you to just grab something you didn't pay for with your hand and run away with it.

stale2002
0 replies
1d2h

Your disagreement doesn't give you the right

If I fully believe in the concept of fair use and transformative content, then yes it absolutely is my right to take advantage of generative AI.

Fair use is a common concept used in all sorts of media.

You don't get to hand wave that away just because generative AI is getting good.

danielbln
0 replies
1d4h

You still equate copyright infringement with physical theft. They are not the same.

4bpp
0 replies
1d3h

In what sense would I be running away with something? The original thing is still there, to the extent you can talk about data being somewhere.

I don't think I need to "gaslight myself" into anything; as far as I can tell, making a copy has not ever felt morally wrong to me.

FridgeSeal
8 replies
1d4h

Yeah, I really don’t see everyone else giving up here because “funny magic parrot box” can write some mid-tier high school essays.

LLM people are really starting to veer into crypto-bro territory with the evangelising about how they’re the best thing since sliced bread and transistors.

EMM_386
7 replies
1d3h

“funny magic parrot box” can write some mid-tier high school essays

That's your take on LLMs?

Ask it how it is possible for a photon to travel across the universe, arriving at the same time it departed, resulting in the journey taking zero time (in its reference frame).

Ask what the implications are if certain viral amino acid sequences result in messenger RNA translocating to the host cell nucleus, potentially with the entire genome.

Ask if aircraft fly due to Bernoulli's Principle or Newton's Third Law and physical impact.

This is "crypto-bro territory"? No, not quite.

jcgrillo
4 replies
1d1h

The "crypto-bro" behavior I see is a whole bunch of people burning a ton of calories wildly casting about for industrial applications of what amounts to nothing more than a neat (albeit eye-wateringly expensive) toy. These LLMs seem like a solution in search of a problem in just the same way that blockchains are. Please prove me wrong, I'd really love to be wrong about this!

msp26
3 replies
1d

Language models have completely overhauled the NLP space. If you have a problem involving natural language data, you can prototype a working pipeline in an afternoon. Often this prototype is very close in performance to a 'proper' solution.
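
To make that concrete: a zero-shot text classifier is a few lines now. A minimal sketch, assuming an OpenAI-style chat API with a key in the environment; the labels and example input are made up:

    from openai import OpenAI

    client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment
    LABELS = ["complaint", "praise", "question"]

    def classify(text: str) -> str:
        # One prompt replaces what used to be a labeled dataset plus a
        # trained model: ask for a label, get a label.
        resp = client.chat.completions.create(
            model="gpt-4",
            messages=[{
                "role": "user",
                "content": f"Classify this text as one of {LABELS}. "
                           f"Reply with the label only:\n{text}",
            }],
        )
        return resp.choices[0].message.content.strip()

    print(classify("The checkout page keeps rejecting my card."))  # complaint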

FridgeSeal
1 replies
23h37m

I’ve been thinking a lot about this recently.

It seems like they’ve accelerated our capabilities - previously tiresome and difficult-to-automate things are easier - but have done very little for our fundamental understanding. We have a tool, but cannot dissect it and explain how it fits together. LLMs themselves don’t appear (happy to be wrong here) to actually have improved our understanding of NLP and associated theory. Yeah, it can parse a sentence and bang out some JSON/SQL/mid-tier essay, but these models (so far) aren’t helping us figure out how and why, and I think that understanding is critical to progressing further. Anthropic seems to be trying to push a bit further on that front at least, but for all we know they might just turn into another scummy OpenAI on us.

jcgrillo
0 replies
23h14m

I think in order for something to properly be a tool it needs to behave deterministically. I don't need to understand every particular of how it works internally, but as the user I need to be able to rely on consistent, predictable results. Otherwise it's worse than useless. Hand tools, machine tools, programming languages, vehicles, CAD/CAM/CAE tools are all like this. You may have to do some learning to become proficient in the tool, but once you're proficient in its use it's very unlikely to ever truly surprise you. Generally those "surprising" experiences are pretty traumatic--hopefully only emotionally (if you've ever experienced a chainsaw kickback you know what I mean).

So I'm not sure how I could use an LLM as a tool, but maybe I'm just not a sufficiently proficient user? It seems like they're just too full of "surprises".

jcgrillo
0 replies
23h37m

If you have a problem involving natural language data

That's a big "if", isn't it? We're seeing claims like "The future is an LLM at the front of just about everything: “Human” is the new programming language"[1] but so far that's not panning out, and it seems really dubious. Natural language seems like an absolutely atrocious user interface. As a machine operator, I'm going to use levers, wheels, and buttons to control the machine. As a computer programmer I'm going to use programming languages to control the machine. I'm not going to speak English to it.

So, ok, this marks an advance in NLP. How do we get from there to "omg it's gonna change everything!!!1111oneeleven"

[1] https://techcrunch.com/2023/08/08/nvidia-ceo-we-bet-the-farm...

wussboy
0 replies
1d2h

The answers to all of those questions would be parroted from other research, and to the extent they were novel they would be incorrect.

sensanaty
0 replies
1d2h

And it parroting answers back at you from textbooks and papers that are most definitely in its training data, probably with the identical wording you're using, is proof to the contrary of it being a "magic parrot box", as the other person put it? Or do you genuinely believe ChatGPT, an LLM, actually "came up" with these answers on its own?

fallingknife
7 replies
1d5h

This is not about copyright. Think about it. Would you ever actually use generative AI to pirate something when you could just torrent it? While there may be an argument that generative AI is infringing copyright, it is not really a very good tool for it. And there is a worldwide piracy industry already causing much more financial damage due to infringement.

This is really about replacement. The copyright holders in the content industry aren't really afraid of LLMs infringing on past copyright, but are terrified of it replacing them on future work, and there is absolutely no legal protection from this. The lawsuit might officially be about copyright, but that's just because it is their only available legal angle of attack.

MOARDONGZPLZ
4 replies
1d4h

Would you ever actually use generative AI to pirate something when you could just torrent it? While there may be an argument that generative AI is infringing copyright, it is not really a very good tool for it.

How do you square this with literally the first image in the OP showing, side by side, GPT reproducing copyrighted work? imo a good modern art project would be someone making a website that “archives” NYT articles by laundering them through GPT rather than using the archive link that everyone posts to get around the paywall. Even HN guidelines bend over backwards to allow bypassing the paywall by allowing these links.

fallingknife
3 replies
1d4h

Here is a picture of Darth Vader: https://lumiere-a.akamaihd.net/v1/images/darth-vader-main_45...

Please show me a prompt that reproduces it. Also to pass this test, it has to be just as easy as right clicking "download image"

The images in the article are done in reverse. They find a prompt that shows a copyrighted character and then search for the matching image. That's not how piracy is done.

MOARDONGZPLZ
1 replies
23h49m

Woah, this is really moving the goalposts and is pretty disingenuous. When I responded to your claim that GPT is bad at reproducing copyrighted material with a counterexample where it appears to in fact be good at it, you tell me that I must reproduce a specific image as easily as “clicking download image”.

Not what I was arguing, and you’re not going to win many arguments with anyone who is paying attention by coming out of left field with only tangentially related demands.

fallingknife
0 replies
21h8m

Because nobody wants generic random pieces of copyrighted material. They want a specific piece of copyrighted material and generative ai is terrible at producing that. It's you who is being intentionally obtuse in pretending not to know the actual goal of copyright infringement.

madamelic
0 replies
1d3h

They are also being deceptive, in my opinion. They should show their entire chat, because a prompt like "animated sponge" alone does not generate SpongeBob. The author almost certainly further prodded and guided DALL-E to generate those images.

The author, I believe, is being purposefully deceptive and hoping people who don't use DALL-E see "animated sponge" generating a SpongeBob look-alike and think they should be burned.

beepbooptheory
1 replies
1d4h

Even if this is right, it's a shitty consolation. These LLMs aren't ever going to be an agent of greater democratic, every-man content creation or whatever; it's just going to be the transfer of capital from one type of huge company to another. Not much of a future, even if it feels cool for a bit.

danielbln
0 replies
1d2h

Open models are a thing though, how do those fit into things?

vinni2
4 replies
1d5h

I don't see any developped country pressing the brake on AGI in the near future

It’s already happening with EU AI Act https://www.europarl.europa.eu/news/en/headlines/society/202...

Kubuxu
3 replies
1d4h

AFAIK the EU AI Act primarily prevents profiling, social scoring and other high risk activities from using AI.

It generally didn’t care about generative AI.

prng2021
0 replies
1d4h

They will. They began drafting current laws years ago before the explosion of generative AI.

pier25
0 replies
1d4h

They're only getting started.

j45
0 replies
1d1h

And only focusing on the known pm issues and not the unimagined ones

rco8786
2 replies
1d6h

The EU is already salivating over the idea

CaptainFever
1 replies
1d5h

The EU and many other countries already exempted training from copyright restrictions. The only condition EU added was opt-out, and even then it can be ignored if you're doing research. [1]

[1] https://www.reedsmith.com/en/perspectives/ai-in-entertainmen...

pier25
0 replies
1d4h

ChatGPT is not research, it's a commercial service.

anonzzzies
2 replies
1d6h

It’s one way of getting China to close the gap as they don’t care.

Smaug123
1 replies
1d6h

Eh? With a gun to my head I'd say the CCP cares more about censorship than the NYT does about plagiarism, but it's not an easy call. The problems are the same ("training set contains lots of stuff I don't want the LLM to say").

wongarsu
0 replies
1d5h

One interesting nuance that might come to play is that while the US nearly always makes products for their own market and expects the rest of the world to adopt it, China is willing to clearly differentiate products for their own market and for export.

As a consequence, an AI meant to topple Western soft power around the world might be held to much looser standards than one used domestically. Who cares that in rare circumstances the AI mentions the Tiananmen Square Massacre to Spaniards if asked about it, as long as it is good enough at spreading Chinese culture.

tmaly
1 replies
1d2h

What if, under the hood, that example of reproducing the NYT article is just ChatGPT fetching a cached copy of the article and then paraphrasing it?

How would that be any different than Google displaying some type of news headline?

j45
0 replies
1d2h

That might be like a buzzfeed summary of an article.

theptip
1 replies
1d

Quite the contrary, NYT is long adjacent to the levers of power, and “Big Tech” is unpopular with both parties. The public is generally wary of job destruction and other harms from AI, and doesn’t grok even the present value.

It’s politically 100% viable to kneecap AI with copyright restrictions. This will go to the Supreme Court and it’s far from clear whether fair use applies to every case here.

BobaFloutist
0 replies
1d

And supporters of AI aren't really making a case that's likely to persuade skeptics, they're just regurgitating "It learns like a human" "It doesn't store the info, just the recipe for making it" and completely failing to address that we've decided it's not ok for someone to regurgitate protected works with 100% accuracy, and that artists don't want people to train AI on their works without permission.

There's a way to sell this to the public, but AI proponents don't want to have to sell it, because they feel that they shouldn't have to, and there's an underlying theme of "The benefit of AI is so overwhelming, and eventually it will replace most commodified creative work anyway so why bother litigating this now, let's just skip this messy step and get to that part" and that's super not going to work to convince skeptics.

xtracto
0 replies
1d4h

Good. It's time to abolish copyright. Society must create distributed, open and uncensorable AI models that can synthesize humanity's knowledge so that it can be used by anyone.

Sorry if your 40 hours of work won't pay you $10 a month forever. That's the case for most of the rest of us: we produce for 40 hours, we get paid for those 40 hours, regardless of what we do.

Welcome to the club!!

wwweston
0 replies
1d2h

Man, that's great that state of the art has advanced to the point where models can be trained on just a few copyrighted sources.

pc86
0 replies
1d3h

Well, if it would be stupid and economically deleterious to do it, you can count on the EU to at least talk about doing it, if not actually do it.

numpad0
0 replies
1d5h

In simpler terms, I think we're looking at the dip after the hype. This is the peak for this generation of proto-AGI and there's not much to lose from over-regulation (put quotes around "over").

iwontberude
0 replies
1d1h

Exactly my thought, if the only way generative AI works is to break copyright I can see people choosing generative AI.

intended
0 replies
1d5h

One wishes. The internet we live in is the one shaped by the MPAA and RIAA.

I have nothing against creators, they deserve to get paid.

For what it's worth, LLMs are facing the Coke vs Pepsi challenge, and sadly they are most definitely Pepsi.

devnonymous
0 replies
1d5h

Ah right, capitalism hasn't ever come in the way of the steady march of technology. This is the reason why we don't have monopolies controlling energy generation. Nor are we limited to a couple of choices of OSs or phones...etc and books, art, movies, music consumption and creation are perfectly aligned... Right? /s

IMO, what's most likely is some sort of licensing model between the AI companies and the 'big content providers' (remember, most content on the web these days is not owned by the person who created it; it wasn't always like that). The smaller companies would then be forced to live with either being scraped or ending up 'invisible'.

beltsazar
0 replies
1d

Or... things are about to get worse for copyright holders.

If that's so, things are about to get worse for everyone, too. With little to no protection against AI, no one will be incentivized to create new IPs, whether they're books, drawings, songs. Or even films and games, when AI is able to also generate those in the (possibly near) future.

belter
0 replies
1d3h

Maybe there is a middle ground that can be navigated, keeping filters on. Interestingly, AWS is offering defense against copyright claims under its Service Terms, although with some conditions.

https://aws.amazon.com/service-terms/

See items 50.10 and 50.10.1, which I reproduce here:

"50.10. Defense of Claims and Indemnity for Indemnified Generative AI Services. AWS Services may incorporate generative AI features and provide Generative AI Output to you. “Generative AI Output” means output generated by a generative artificial intelligence model in response to inputs or other data provided by you. “Indemnified Generative AI Services” means, collectively, generally available features of Amazon CodeWhisperer Professional, Amazon Titan Text Express, Amazon Titan Text Lite, Amazon Titan Text Embeddings, Amazon Titan Multimodal Embeddings, AWS HealthScribe, Amazon Personalize, Amazon Connect Contact Lens, and Amazon Lex. The following terms apply to the Indemnified Generative AI Services:

50.10.1. Subject to the limitations in this Section 50.10, AWS will defend you and your employees, officers, and directors against any third-party claim alleging that the Generative AI Output generated by an Indemnified Generative AI Service infringes or misappropriates that third party’s intellectual property rights, and will pay the amount of any adverse final judgment or settlement."

BobaFloutist
0 replies
1d

How on earth is this hypothetical?

ctoth
52 replies
1d1h

Everybody just buying into the corporate narrative that anyone can actually own these sorts of things.

Who truly owns the tales of Snow White and Cinderella?

These stories didn't originate with Disney; they are part of a rich tapestry of folklore passed down through generations. Disney's success was partly built on adapting these existing narratives, which were once shared and reshaped by communities over centuries.

This conversation shouldn't just be about the technicalities of AI or the legalities of copyright; it should be about understanding the deep roots of our shared culture.

At its core, culture is a communal property, evolving and growing through collective storytelling and reinterpretation.

The current debate around AI and copyright infringement seems to overlook this fundamental aspect of cultural evolution. The algorithms might be new, but the practice of reimagining and repurposing stories is as old as humanity itself.

By focusing solely on the legal implications and ignoring the historical context of cultural storytelling, we risk overlooking the essence of what it means to be a creative society.

As a large human model (no really, I could probably lose some weight), I think it's just silly how we're all sort of glossing over the fact that Disney built their house of mouse on existing culture, on existing stories, and now the idea that we might actually limit the tools of cultural expression to comply with some weird outdated copyright thing is just... bonkers.

jerf
27 replies
23h17m

"Who truly owns the tales of Snow White and Cinderella?"

If you want to make your point, you need to choose something that isn't already public domain. Disney already only owns their own interpretations, and, arguably, whatever penumbric emanation they can convince a court is stealing from them, but it still certainly isn't the entire space of Snow White and Cinderella stories. There is some fairly recent stuff being used in the images in the article and there isn't even any question whether or not it's Mario or Coca Cola; if Nintendo and Coca Cola did a cross promotion I could believe the exact images that popped out.

If they were trying to claim the entire concepts of dumpy plumbers dressed in any manner vaguely like Mario that would be one thing... but that's Mario and Luigi, full stop. That's Robocop. That's C3PO. It's not even subtle. If we can AI-wash those trademarks away then we can AI-wash absolutely anything.

d6e
14 replies
20h23m

Am I not allowed to draw Mario? I don't really see the difference between me drawing Mario and an AI drawing Mario.

idopmstuff
10 replies
20h6m

This has always felt like the important-but-ignored distinction to me. You can definitely draw Mario! Copyright doesn't protect against you doing so. You can also use tools to recreate copyrighted materials. For example, you can use Word to type out the text of a copyrighted book. Perhaps more relevant to the AI discussion, you can use a scanner and printer to reprint copyrighted text.

What you can't do is use those recreations for commercial purposes. You can't sell your paintings of Mario. You can't decorate your business with Mario drawings.

That's why I've always felt like the idea that AI should be blocked from creating these things is generally not the right place to look at copyright. Rather, the issue should be if someone uses AI to create a picture of Mario and then does something commercial with it, you should be able to go after the person engaging in the commercial behavior with the copyrighted image.

strix_varius
4 replies
19h57m

That's why I've always felt like the idea that AI should be blocked from creating these things is generally not the right place to look at copyright. Rather, the issue should be if someone uses AI to create a picture of Mario and then does something commercial with it, you should be able to go after the person engaging in the commercial behavior with the copyrighted image.

With you until here for several reasons:

1. It's not possible for you as an individual consumer to know whether or not the AI result is a violation, given an AI that has been trained on copyrighted works.

2. Before you, the AI consumer, uses the generated result, a company (in this case OpenAI) is already charging for it. I'm currently paying OpenAI. That AI is currently able and willing to sell me copyrighted images as part of my subscription. Frankly that should be illegal, full stop.

I look forward to AI enhanced workflows and I'm experimenting with them today. But it's morally indefensible to enable giant corporate AIs to slurp up copyrighted images/code/writing and then vomit it back out for profit.

idopmstuff
2 replies
17h55m

1. It's not possible for you as an individual consumer to know whether or not the AI result is a violation, given an AI that has been trained on copyrighted works.

I see what you're saying in some cases, but in the cases where the user is explicitly attempting to create images of copyrighted characters (e.g. the Mario example), they would definitely know. I honestly don't see this as a practical issue - as far as I'm aware (and like most on HN I follow these things more than the average person), there aren't a lot of concerns about inadvertent generation of copyrighted material. It's certainly not at issue in the NYT lawsuit.

2. Before you, the AI consumer, uses the generated result, a company (in this case OpenAI) is already charging for it. I'm currently paying OpenAI. That AI is currently able and willing to sell me copyrighted images as part of my subscription. Frankly that should be illegal, full stop.

Totally fair, but I feel like it's a bit more of a gray area. If I use Photoshop to create or modify an image of Mario for personal use, we'd call that fine. I grant you that here OpenAI is doing more of the "creating" than in the Photoshop example, but we still do generally allow people to use paid tools to create copyrighted images for personal use.

I'd also pose a question to you - what if OpenAI weren't charging? Is it acceptable to train an open source model on copyrighted images and have it produce those for personal use?

I guess I just understand the law to revolve more around what the end product is used for, as opposed to whether a paid tool can be used to create that end product.

contrast
1 replies
8h56m

The law tends to be weighted towards the consumer, but the law does apply to producers and supply chains, too. Photoshop doesn’t come with a library of copyrighted images, and would not be able to do so without licensing those images (whether they were explicitly labelled or not). Ditto any other tool.

If people had to pay for the AI equivalent of that image library (ie the costs of training the model), I doubt many would. It’s phenomenally expensive. Costs for a creative tool and a copy of whatever IP you personally want to play with are negligible by comparison.

It’s never been the case before that a toolmaker could ship their tools with copyrighted materials because they’ve no control over the end product. The answer doesn’t change whether they charge or not, and there's no reason why AI should change that either.

People tend to “feel like it’s a bit more of a gray area” when there is cool free stuff at stake, and I’m no exception. It would be a more convincing question if it was “what if we had to pay our fair share of the costs involved?”, rather than “what if we could just have it all at no charge?”.

strix_varius
0 replies
6h17m

Exactly - AI and "free" (scraped) training data aren't inseparable. Any of the big players could train a model exclusively on content they own. Photoshop is a good case in point here - that's what Adobe has done, since they own a huge stock photography library.

But that would unveil the truth of the current situation: early AI adopters, myself included, are benefitting from an obfuscated form of theft. If OpenAI had to compensate the contributors to their training set, I wouldn't get such generous access for $20/month.

Aeolun
0 replies
17h56m

OpenAI is not profiting off providing me with copyrighted images; they’re profiting off giving me the ability to generate images.

SergeAx
4 replies
19h36m

But... OpenAI is clearly profiting, to the tune of $20/month, from drawing Mario pictures and word-by-word reproduction of NYT articles?

idopmstuff
3 replies
17h53m

On the latter example, my question is whether anyone is actually using ChatGPT to read NYT articles. My understanding is that to produce the examples of word-for-word text in their lawsuit, they had to feed GPT the first half dozen paragraphs of the article and ask it to reproduce the rest. If you can produce the first half dozen paragraphs, you already have access to the article's text. Given that, is this theoretical ability to reproduce the text actually causing financial harm?

SergeAx
2 replies
16h7m

I think it would be quite enough to prompt OpenAI with the article title and author name. This is how LLMs work.

michaelkeenan
1 replies
13h57m

I tried that a few different ways and couldn't get it to work. I don't think just the title and author are enough. I'd be interested to see if anyone else can find a prompt that does it.

Two of my attempts:

https://chat.openai.com/share/5cd17ff3-e142-4a7d-91c2-0b2479...

https://chat.openai.com/share/04fd722b-8b3c-469b-a1a2-d58e64...

SergeAx
0 replies
12h36m

OpenAI has been patching their output since the lawsuit started. I believe a month ago the prompt would be something like: "<Title>, <Author> for New York Times, continue"

troupo
2 replies
20h6m

Am I not allowed to draw Mario

Probably not for anything commercial, not for any exhibitions or public viewing etc. You'd have to check the actual trademarks etc.

haskellandchill
1 replies
19h39m

Hasn't pop art already been there done that?

troupo
0 replies
7h31m

There are extensions, and exceptions, and different licenses applicable, and fair use, and the concept of transformativeness, and... :)

Basically, IANAL, but if you just draw a picture of Mario, all you can do is hang it on your fridge. If you make it out of colored oatmeal, that may be enough of a transformative work for you to be able to sell and exhibit it. Or, say, a satirical cartoon for your newspaper featuring Mario, that might also be okay.

lovecg
11 replies
22h37m

I think the world would be completely fine without a copyrighted C3PO or Robocop. George Lucas didn’t have billions of merchandising revenue in mind when working on his wild science fiction movie in the 70s, which was thought unlikely to be successful. Robocop was also a labor of love. We don’t really need the Nth Star Wars sequel powered by those extra profits. The art form could be healthier overall.

6gvONxR4sf7o
6 replies
19h33m

Fine if they weren't copyrighted today, or ever? Because if copyright was eliminated the day Star Wars was released, other people would have copied the film reels and charged for entry, and Lucas would have hardly made a cent. Or if copyright was eliminated the day he went looking for funding, it wouldn't have ever been made. Personally, I think the world's a little richer for star wars's existence.

lovecg
4 replies
19h8m

You make a good point, there’s a difference in copyrighting exact works vs. the characters or the story that the work is made up of. I’m not arguing for removing literal copyright (though the terms should be shorter). But I think it’s fine if other people rushed to make their own Star Wars movies after it came out. Hardcore fans are pretty good at deciding what’s “canon” vs. not anyway, and the rest of us don’t care as long as the work is of good quality. Would it matter if the same Spiderman or Batman movie that’s remade once a decade could be made by literally anyone without paying royalties? It could make for richer content I’d think.

Nevermark
3 replies
18h58m

The only problem with this view is:

"the rest of us don’t care as long as the work is of good quality"

Copyright protects Disney.

But it also protects every creative author, no matter how disadvantaged, from mass shareholder driven behemoths.

Today "Disney likes your work" is ear music. Without copyright it would be a death nell.

Teever
2 replies
17h22m

So how can we modify copyright so that it protects the little guy more than it protects Disney?

from-nibly
1 replies
14h12m

That's not how laws work.

Teever
0 replies
9h52m

What do you mean by that? Do you mean that law has a tendency to work the other way, in that it protects the big guy at the expense of the little guy because of extensive lobbying from the well-moneyed big guy, or that justice is blind and it affects all equally?

If you're thinking the former I could agree with that on some level and would say that what I'm asking in my original comment is merely aspirational, but if you're suggesting the latter I'd merely point to the former and say that this is the status quo.

8note
0 replies
13h1m

He may have made less money and he may have made more, with different monetization schemes.

Copyright is a monetization scheme, but it's not the only one.

In this imagined world, cinemas would have no movies to show, so they'd have to pay people like Lucas to create the films such that there'd be something to put on the screen. If many cinemas got together, and maybe got loans, they could pay for bigger budget films, too

livinginfear
2 replies
21h7m

George Lucas didn’t have billions of merchandising revenue in mind...

Doesn't copyright stop other people from making billions in merchandising revenue off of George Lucas' ideas without his consent?

We don’t really need Nth Star Wars sequel powered by those extra profits.

Without a copyrighted C3PO, he could start turning up in just about anyone's derivative works. There could be horrible Star Wars sequels forever, or TV ads with C3PO selling household cleaning products.

lovecg
0 replies
19h36m

Centralization makes a difference here I think. Disney built an impressive machine where everything feeds on everything else. The problem is not so much bad sequels per se, it’s all the marketing that goes into making sure they solidly occupy their corner of our mindshare and force the whole industry to compete churning out more and more subpar sequels. If one company would build a Star Wars theme park, another produced toys etc. etc. this might not be a huge concern.

kevindamm
0 replies
20h6m

"As a protocol droid, I cannot actually recommend the best smelling cleaning product, but these are the most purchased cleaning products:"

...followed by a semi-hallucinated list containing at least a few being marketed by C3PO.

bkudria
0 replies
17h19m

George Lucas didn’t have billions of merchandising revenue in mind

From https://www.newyorker.com/magazine/1997/01/06/why-is-the-for...

Lucas’s most significant business decision—one that seemed laughable to the Fox executives at the time—was to forgo his option to receive an additional five-hundred-thousand-dollar fee from Fox for directing “Star Wars” and to take the merchandising and sequel rights instead.
pardoned_turkey
7 replies
22h43m

Oh come on. Copyright is a fairly ancient concept that benefits normal people as much as it benefits big corporations. Most book authors, songwriters, and so on aren't fat cats, and they would be harmed if we had zero protections for the duplication of their work. They'd need to depend on state sponsorship or charitable private patronage, both of which are problematic for obvious reasons and limit the range of artistic expression more than the market does.

Instead, we came up with a system where you can actually derive fairly steady revenue by creating new works and sharing them with the world. And critically, I think you misinterpret it as calling dibs on shared culture or on stories. Copyright is usually interpreted fairly narrowly, and doesn't prevent you from creating inspired works, or retelling the same story in your own words.

Generative AI is a problem largely because it destroys these revenue streams for millions of people. Yeah, it will be litigated by wealthy corporations with top-notch lawyers, for self-interested reasons. But if we end up with a framework that maintains financial incentives to artistic expression, it's probably a good thing.

shkkmo
5 replies
22h15m

This is full of so many inaccuracies.

Copyright is a fairly ancient concept

The idea is fairly old, but its current implementation in law is not nearly that old.

that benefits normal people as much as it benefits big corporations

Clearly false if you measure that benefit in monetary terms.

Copyright is usually interpreted fairly narrowly, and doesn't prevent you from creating inspired works, or retelling the same story in your own words.

Absolutely false. You can absolutely be stopped from retelling copyrighted fictional stories. You can even be stopped from telling new stories with derivative characters or settings.

Generative AI is a problem largely because it destroys these revenue streams for millions of people.

How? The restrictions on selling images of Mickey Mouse exist regardless of whether they were created with or without AI assistance.

But if we end up with a framework that maintains financial incentives to artistic expression, it's probably a good thing.

We already have that framework and arguably it is already far more restrictive than it needs to be to maintain incentives for artistic creation. Indeed, these rules now often limit new artistic expression or prevent artists from monetizing their creations.

The types of art that are helped most by today's copyright laws are the kinds that require large budgets to produce. The types of art that are most hurt are those produced by fans who want to build new things upon the narratives in our shared culture.

We need to shorten copyright durations and expand fair use protections and monetization options for derivative works. We don't need to make copyright even more powerful than it already is.

Edit: If you disagree, I'd be curious to hear your answer to this question. A character like Harry Potter is so widely known that it is now a ubiquitous part of our culture. To incentivize new novels, what is the minimum duration we need to give J K Rowling control of who is allowed to write stories about this cultural touchstone?

ben_w
4 replies
22h1m

How? The restrictions on selling images of Mickey Mouse exist regardless of whether they were created with or without AI assistance.

Scale.

GenAI automates creation of things that are derived from, but strictly aren't the same as, the original content; as it's (currently) not possible to automate the detection of derivative works (which is something copyright is supposed to be about), this means actual humans have to look at each case, and that's expensive and time-consuming: O(n*m) on n new works that have to be compared against m existing in-copyright works for infringement.
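
(To make the scale point concrete, here's a toy Python sketch of that O(n*m) review burden. The corpus sizes and the looks_derivative check are invented placeholders, not anyone's real pipeline:)

    # Toy sketch: every new work gets checked against every protected work,
    # so the cost is len(new_works) * len(corpus) comparisons.
    def looks_derivative(new_work: str, existing_work: str) -> bool:
        # Crude stand-in for the judgment a human reviewer would make.
        return existing_work.lower() in new_work.lower()

    def review_queue(new_works, corpus):
        flagged = []
        for new in new_works:      # n items, and GenAI mostly grows n
            for old in corpus:     # m items, grows with every new copyright
                if looks_derivative(new, old):
                    flagged.append((new, old))
        return flagged

Doubling either side doubles the bill.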

I also think copyright is too long, FWIW; but the way most people discuss arts, I think humans can be grouped into "I just want nice stuff" and "I want to show off how cultured I am", and the latter will never accept GenAI even if it's an upload of the brain of their favourite artist, simply because when it becomes easy it loses value. I'm in camp "nice stuff".

intended
2 replies
21h49m

I feel this is already true of the internet as a whole. I do not find scale to be a valid argument for copyright here.

For that matter, Photoshop has made art creation so easy that we don't need GenAI to be swimming in more copyright infringement than we know what to do with.

There is absurd amounts of content being created, no human will ever be able to see it all.

Copyright will continue to work - if someone creates a rip off so popular that it becomes an issue for copyright holders, the DMCA and the rest of the tools they forced into the fabric of the net still exist.

A few steps further down this argument, you get back to deep packet inspection, and the rest of the copyright wars, which ended up making life worse.

ben_w
1 replies
20h55m

The internet is a lesser example, but yes, it is also true for a million fans posting their own fan art.

Arm those million fans with GenAI instead of pen and paper and MS Paint, and it gets more extreme.

But I disagree WRT Photoshop; that takes much more effort to get anything close to what GenAI can do, and (sans piracy) is too expensive for amateurs. Even the cheaper alternatives take a lot of effort to get passable results that take tens of seconds with GenAI.

shkkmo
0 replies
20h36m

Arm those million fans with GenAI instead of pen and paper and MS Paint, and it gets more extreme

"More extreme" is not an explanation of how the change in scale matters here.

Indeed, what I would argue is that there is no fundamental change in scale. Digital reproduction plus the internet already caused the change in scale. We already had the capacity for anyone to produce fan art and publish it, or reproduce existing work and publish that. What has changed is not a question of quantity, but one of quality. Those fan artists now have tools such that even lower-skilled artists can produce higher-quality work.

Indeed, this is the real threat to artists from generative AI. Narrowing that skill gap is understandably threatening to those who make money with their artistic skills. I think trying to restrict the development of this technology is a losing battle. I think trying to do so by expanding the powers granted by copyright will accentuate the existing flaws in our modern copyright laws.

Instead, I'd prefer to solve that problem by reducing the strength of copyright. If we make AI generated or derived works un-copyrightable, then companies that want to own copyright on their content will have to keep paying people to create it.

shkkmo
0 replies
21h39m

actual humans have to look at each case, and that's expensive and time consuming and O(n*m) on n new works that have to be compared against m existing in-copyright works for infringement.

That scale already exists. The amount of community generated derivative works already dwarfs the capacity of copyright holders to review each piece. The ease of publishing reproductions already makes enforcement a question of prioritizing the larger infringers and ignoring those with no reach.

Indeed, prohibitions on training on copyrighted work without a special license seem like they make it harder to develop the sorts of AI that can detect derivative works.

If case law makes clear that the people running the prompts and picking the output to keep are liable for infringement, then there will be demand for tools to detect derivative works and either filter them or warn the user.

fulafel
0 replies
7h50m

What kind of support is there for the hypothesis that our current copyright system is close to ideal in incentivising production of new works? It seems to me that there's a very strong "winner takes all" distribution, and we could be a better culture if we had a system that took some of the opulent resources poured into the Star Wars franchise, rehashed murder simulators and TikTok, and distributed them to some poor artists whose worthiness was decided in some other way than being a mass market best seller.

iainctduncan
4 replies
1d

Copyright has never been based on a moral stance. It has always been determined by the lobbying power of various groups.

The idea that we should dispense with it to let generative AI companies make even more money seems totally bizarre.

RecycledEle
2 replies
22h26m

The idea that we should dispense with it [copyright] to let generative AI companies make even more money seems totally bizarre.

The idea is that we should remove abuses of copyright to allow our society to move forward, and thereby continue to exist.

Imagine if there was a law at the beginning of the Industrial Revolution that said when non-human labor was used, the Animal Welfare Office had veto power. Then imagine that the Animal Welfare Office declared steam engines to be immoral, and so steam engines were never used in industry, at least not in the Western World. The Orient would eventually rise as the world's only industrial power.

In the same way, if we let the copyright industry veto generative AI, it will destroy the Western World.

Our students are already at a huge disadvantage compared to Chinese students who get every book ever translated into Chinese for free (except a few immoral works that they would not want to see anyway.)

Those who pose an existential threat to our civilization are rent seekers who abuse copyright in the US to go beyond protecting "science and the useful arts," who seek infinite copyright terms, who grab every creative work We The People create and register lying paperwork to ensure they can steal our creative genius to enrich their cabal.

If this was only a for-profit scheme, it would not be so bad. Do you remember when the Hollyweird degenerates sued a Christian company that wanted to put out G-rated versions of movies aimed at children? The Christian company never suggested they not pay for the movies. No matter what the Christian company was willing to pay, they were not allowed to publish child-friendly versions of the movies. This proves Hollyweird's goal is to push degeneracy.

The battle against abuses of copyright is a fight for Western Civilization. The fight against abuses of copyright is a fight for our souls.

shkkmo
0 replies
21h21m

Do you remember when they Hollyweird degenerates sued a Christian company that wanted to put our G-rated versions of the movies aimed at children? The Christian company never suggested they not pay for the movies. No matter what the Christian company was willing to pay, they were not allowed to publish child-friendly versions of the movies.

If anyone is curious, this is what is being semi-accurately referenced: https://www.crosswalk.com/culture/features/editing-companies...

While I disagree that the motivation is "degeneracy", and I doubt that there isn't a sum large enough to get the studios on board, it is a pretty interesting example to bring up when discussing how much control we should give copyright holders.

Notably, it is legal to have a filter that changes playback, but not legal to provide a modified version of the original, even if you paid for that original.

noitpmeder
0 replies
21h39m

This is an insane amount of fear mongering. Chinese shops have been shamelessly ripping IP from Western companies for years, should we now throw out those laws and let it happen in the US too for the sake of competitive advantage?

Why stop there? There's a ton of child labour in China and other parts of the world that yields economic advantages. Should we let that happen in the Western world too?

AI is wonderful in so many ways. But we should not throw out our entire way of life to adapt to a new technology.

logicchains
0 replies
23h20m

The idea that we should dispense with it to let generative AI companies make even more money seems totally bizarre.

How's that bizarre, if as you state copyright has always been based on "money makes right" not some moral stance?

syndacks
3 replies
22h33m

Did you read the article? Who owns Mario? Nintendo owns Mario, full stop. Your argument completely disregards the legal system on which modern society depends to function as effectively as it does. There’s a reason you can’t steal other people’s work.

shkkmo
1 replies
21h13m

Mario is a decades-old cultural touchstone that is well known by people who have never played a Nintendo game.

I don't see why we need to give Nintendo the exclusive right to control the use of Mario for the next 65 years. That duration of control is absolutely not necessary for society to function.

Society would function just fine if the copyright on Mario had expired two decades ago.

Aeolun
0 replies
17h53m

On the other hand, I kind of like knowing that all sold Mario stuff has some affiliation with Nintendo. I don’t want to deal with thousands of Chinese knockoffs.

ben_w
0 replies
22h8m

Nintendo owns both the trademark (even if not specifically registered) and the copyright, but these are distinct things.

As I'm not a lawyer I don't want to embarrass myself by opining whether or not Nintendo has any claim over photographs of cosplays or other fan art, especially given quite how close two of the "video game plumber" images seemed to be to what they do own. The other two images, being a lot more fan-art-like, are examples where I think it would be an interesting question rather than incredibly obviously too close to Nintendo, although even there "interesting" means I wouldn't be surprised by an actual lawyer saying either of "this is obviously fine" or "this is obviously trademark infringement regardless of what it was trained on".

Now I'm wondering if there even are any videogame plumbers besides Mario and Luigi…

wwweston
1 replies
19h22m

culture is a communal property

Public domain / communal property is also part of copyright, so it's not as if this is some forgotten concept that needs to be restored to the discourse.

Georgism is underconsidered, though.

By focusing solely on the legal implications and ignoring the historical context of cultural storytelling

The legal implications are human implications and as much a part of culture as anything else. They have to do with what's fair and how rewards for effort are recognized and distributed. Formalizing this is less important in cultures that aren't oriented around market economies, which seems to be what much of this "rich tapestry of folklore" discourse wants to evoke and have us hearken back to, but that doesn't describe any society that's figuring out how to handle AI.

we might actually limit the tools of cultural expression to comply with some weird outdated copyright thing is just...bonkers.

What's bonkers is the continued life of the literally backwards idea that copyright is (or should be) mooted or made outdated by novel reproduction capabilities.

Copyright became compelling because of novel reproduction capabilities.

The specific capabilities at the time were industrialized printing. People apparently much smarter than the typical software professional realized that meant some badly aligned incentives between (a) those holding these new reproduction capabilities and (b) those who created the works on which the value of those new reproduction capabilities relied. The heart of the copyright bargain is in aligning those incentives.

Specific novel reproduction techniques can change the details of what's prohibited or restricted or remitted, and how, and on what basis, and the powers/limits of enforcement, etc. But they don't change the wisdom in the bargain. The only thing that would change that is a better way of organizing and rewarding the productive capacity of society.

8note
0 replies
12h54m

The incentives remain poorly aligned though. Otherwise the people who actually author the copyrighted works (actors, special effects artists, etc) wouldn't have had to go on strike for so long to get proper compensation.

The value still remains with the people who own the reproduction capabilities, and only scraps go to the artists. Artists can get scraps without selling copyright too; just look at Patreon.

fnordpiglet
1 replies
20h50m

While a great concept, in practical reality we live under a system of laws not of our individual devising, and known to be imperfect. While we can advocate for reform, the reality is that LLM makers will be judged under the current law as it is currently formulated. The novelty will be the LLM and its technologies, not a total rethink of copyright under some noble cultural openness concept.

So, it’s not actually a corporate narrative, it’s actually the law that the narrative stems from, right or wrong. Maybe corporations had a huge role in shaping the law (I’d note copyright benefits individuals as well, though), but it is not mere propaganda or shaping of a shared reality through corporate narrative. It’s enforced by the guys with the guns and jails, as arbitrated by a judge.

It absolutely must be about the technicalities of the law, as it is at its basis a legal issue. By hand-waving it away and claiming the social narrative is the right discussion, you ignore the material consequences and reality in favor of a fantasy. We absolutely should -also- discuss the stifling nature of copyright and intellectual property, but you can’t ignore what’s actually happening here at the same time.

kelseyfrog
0 replies
18h0m

We can do what we will. If someone wants to construct an extra-judicial narrative that contradicts the law so believably that it influences and ultimately compels reality through legislative changes, that's their prerogative.

walt74
0 replies
23h7m

Agreed, but to tackle the problem from that perspective would require making LLMs a public good, preferably run by the state, akin to public libraries. This could not only solve the copyright problem, the state may even make it mandatory for publishers to contribute their published writings to the public LLMs. I'm sure libertarian tech bros have that in mind when they insist on open source development (which then opens a whole other can of worms when you consider interpolative knowledge as intellectual nuclear fission, but that's another story).

up2isomorphism
0 replies
5m

You probably have to make one thing very clear before venting on big corporations: do you think these IPs have value or not? If they do, why do you not want to pay the owner? If they do not, then you shouldn't use them. Either way there will be no conflict. (BTW, OpenAI is also a corporation, in fact backed by one of the biggest corporations in the world.)

greenthrow
0 replies
22h40m

This reply is so incredibly out of touch with reality. Copyright law is very clear. If anything the "corporate narrative" here is that "AI" is somehow something new and different and these laws don't apply. Which is nonsense.

keiferski
42 replies
1d6h

These don't seem all that difficult to fix to me. Most of the examples are not really generic, but are shorthand descriptions of well-known entities. "Video game plumber" is practically synonymous with "Mario" and anyone that has the slightest familiarity with the character knows this.

Likewise, how difficult is it to just use descriptive tools to describe Mario-like images [1] and then remove these results from anyone prompting for "video game plumber"?

1. The describe command can describe an image in Midjourney. I imagine other AI tools have similar features: https://docs.midjourney.com/docs/describe
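
(As a rough illustration of that idea, here's a minimal Python sketch, purely hypothetical: caption the generated image with a describe-style tool, then match the caption against a blocklist of textual descriptions of protected characters. The phrases, the owners, and the describe() call are all invented for illustration:)

    # Hypothetical output filter; the blocklist entries are stand-ins.
    BLOCKLIST = {
        "mustachioed plumber in red cap and blue overalls": "Mario (Nintendo)",
        "golden humanoid protocol droid": "C-3PO (Lucasfilm)",
    }

    def flag_caption(caption: str) -> list[str]:
        # Return the owners of any protected character the caption describes.
        caption = caption.lower()
        return [owner for phrase, owner in BLOCKLIST.items() if phrase in caption]

    # e.g. flag_caption(describe(image)) -> ["Mario (Nintendo)"]

A real system would need fuzzier matching than substrings, but the shape of the check is the same.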

bnralt
15 replies
1d5h

It seems like a somewhat dystopian thing to fix. Imagine a scenario where Photoshop would scan images you uploaded for copyrighted material and then refuse to work if it determined the image contained any copyrighted material or characters (even if it was just a fan drawing you did).

This reminds me of the early days of the internet where people wanted to remove free fanfiction for violations of copyright laws. Trying to apply copyright laws to personal use cases where the creator isn't trying to sell the material is pretty terrible, in my view.

Imagine a scenario 50 years from now - "Robot, can you cut out this picture I drew for a school diorama." "Certainly." "And this one as well." "Error: Your picture seems like it might contain some copyrighted materials, and as such I am unable to interact with it."

atq2119
4 replies
1d5h

It is dystopian, and it already exists, e.g. printers refusing to print anything that looks sufficiently like money.

Like many things, I suspect this will end up getting worse before there's a chance for it to get better.

madamelic
3 replies
1d3h

It is dystopian, and it already exists, e.g. printers refusing to print anything that looks sufficiently like money.

There is no fair or private use of hyper-realistic fake money.

There are fair uses of copyrighted materials unless you want to start suing children for copyright infringement when they draw a character.

ForkMeOnTinder
2 replies
1d1h

There is no fair or private use of hyper-realistic fake money.

What about every movie ever made where two people trade a briefcase full of cash?

madamelic
1 replies
1d1h

This is actually a really fascinating topic!

I am not sure how far Photoshop takes their filters, but those bills aren't actually replicas, nor can they be mistaken for real currency on reasonable examination (a cashier glancing at them).

Typically their text reads "For movie use only" over the seals in the middle, or has other things that make it clearly distinguishable as fake money that isn't legal tender. I think some of them flip the heads backwards or do other things so it immediately fails the sniff test.

Adam Savage actually has multiple videos on how it is made, super fascinating stuff! https://www.youtube.com/watch?v=drLzVcgnBfI

(Thank you for asking this, I was dying to gush about how cool movie money is)

Throw839
0 replies
20h20m

but those bills aren't actually replicas

In some movies those are actual real money, just the top layer of the stack.

Dealing with legislation, lawyers and legal compliance is so expensive, they would rather use a few thousand real dollars for a couple of hours.

amazingman
3 replies
1d4h

Your scenario already exists, but for currency. Photoshop will refuse to work if it thinks you might be counterfeiting currency.

kranke155
2 replies
1d4h

That's literally the only scenario where it exists.

AuryGlenz
1 replies
23h56m

You can’t use their generative AI tools on images it deems NSFW, even if the part you’re generating isn’t.

MeImCounting
0 replies
23h14m

That seems kind of messed up, honestly. Where does this go in the future? If your locally running Photoshop determines you are working with anything it might consider NSFW, does it shut down and call home to report you? Where is the liability for them? Or is this another case of corporate puritan ethics with a stranglehold on culture?

keiferski
1 replies
1d5h

I doubt that the media companies would be happy about this; but maybe a compromise is a “copyright infringement filter” that can be enabled or disabled, with a flashing notification that you’re legally responsible if you turn off the filter and have issues.

atq2119
0 replies
1d4h

Sure, there are legitimate but opposing interests here. The solution doesn't have to be technical though. The key part is making sure that the copyright owners still have some recourse, but one that isn't punitive for unknowing infringement. For example, make it legally impossible to impose punitive penalties for unknowing infringement without commercial interest, but make it possible for the rights owner to demand the relevant material be removed etc.

Also keep in mind that the Mario examples from the article are perhaps not the best guide here. Mario is sufficiently pervasive in our culture that you can't reasonably claim unknowing infringement. It's the somewhat more obscure cases that I'd be worried about.

spondylosaurus
0 replies
22h43m

Imagine a scenario where Photoshop would scan images you uploaded for copyright material and then refuse to work if it determined image contained any copyrighted material or characters

Photoshop does this already, but only if it detects that you're trying to print/create counterfeit money:

https://en.wikipedia.org/wiki/EURion_constellation

pxoe
0 replies
1d3h

image editors don't offer something that's based on questionably sourced copyrighted material as a part of their product. ai apps and services do.

it's just ai companies using dirty data and hoping they get away with it - and they do, for the time being. it is a bit trickier to show that 'yep, well that's there', and people don't seem to realize that just using a copyrighted image at all (downloading, accessing in itself, let alone using it for something else), or creating an image that would just "look like" a trademarked character - not "make a 1 to 1 copy" but just "look like" - would be enough for it to possibly be an infringement.

there can be a sufficient fix - taking potentially infringing images out of a dataset, and making an effort to build an actually clean dataset. it's really just a matter of "do you actually have rights to use that content? at all, and in that way". and ai companies continually say 'no... but what if we use it anyway'.

and it's kind of a sloppy analogy, because with text to image generators (where you just interact with a model that's offered to you), well - people aren't "uploading copyrighted material into an editor". the copyrighted material is already there in the model, it was used in the making of it. and if there was no such copyrighted material that'd fit the prompt, it wouldn't be able to generate something. the infringement lies with the service that uses copyrighted material for a model, and then offers it.

fan fiction and fan works continuously being in a murky area with copyright/trademark is not just a thing of the "early days" of the internet, it's been there all along and is still very much present. companies could crack down if they wanted, but there is too much stuff out there, it might be hard to nail down exact people, and it might be plainly not too nice to the fandom. but it is not "impossible", and it is very much not a conversation that ever 'went away' or became "kinda solved" - it isn't.

again, with image editors, text editors, etc. - the user is taking all the actions with the content, and the user would be doing the infringing, in editing and further if they were to choose to publish.

with generative ai - copyright infringement is built into the models. copyrighted works were accessed and turned into a model. the user is just asking, "is it there". and it is. in some of those demonstrated examples, the user is not even asking for the model to infringe on anything, but it just does.

numpad0
0 replies
1d4h

the early days of the internet where people wanted to remove free fanfiction for violations of copyright laws

This is reinforcing my suspicion that there's a gross misunderstanding between creator-adjacents and non-creators: takedowns of free fanfiction never stopped.

It's just that many IP holders started incorporating fanfics/parodies as part of their advertisement strategies and started enforcing often unspoken guidelines. There is now an ecosystem, or a mental co-dependence, between IP holders and creators, and both sides are fine with it. So there be fan contents.

But free fan contents were never "legalized" in the content world, as some seem to assume.

Atrine
0 replies
23h27m

Imagine a scenario where Photoshop would scan images you uploaded for copyright material and then refuse to work if it determined image contained any copyrighted material or characters (even if it was just a fan drawing you did).

YouTube does this. I have many friends who perform classical piano in their spare time. They record themselves playing a piece that's 200+ years old, then put it on YouTube, where it gets flagged saying some big record label owns the copyright for it because it's similar to a recording they put out.

mrweasel
10 replies
1d6h

It's going to be hard to remove every single "shorthand description of a well-known entity" or other prompt that can be used to generate copyrighted or trademarked content. Sure, if you're not deliberately trying to generate infringing content, you can probably remove or discard those results. The trouble is the people who will try to trick the AI into generating this content; blocking those people is going to be impossible without excluding all copyrighted or trademarked material from the training data.

Another issue for generative AI is mentioned in the article: "Systems like DALL-E and ChatGPT are essentially black boxes." What happens when an AI is used to make decisions where the user/victim is entitled to know exactly why the AI did what it did? From a business and legal perspective I think the current AI solutions are dangerous and should be used very sparsely, exactly because even the creators can't point to the exact pieces of information that caused the AI to make the choices it did.

numpad0
4 replies
1d5h

I don't understand why some people think any infringing content can be singled out and removed.

Aren't LLMs giant coefficient matrices, like a punched-out croissant dough made of all the training data plied over? How can you say you can remove one specific ply from the dough and declare that every potential effect the offending ply had created is now completely removed?

keiferski
1 replies
1d5h

I’m no LLM expert but I think there is a distinction to be made between the LLM dataset and the output it gives to the user. What you’re suggesting is that it’s difficult to remove something from the dataset, which may be the case. But that doesn’t mean the user will necessarily be able to access it.

My guess is that this is much easier to attack from the user end.

numpad0
0 replies
1d4h

Removing something from the dataset requires full training from scratch (~$100 million for base unaligned GPT-4). You can't, like, edit the database file and keep the AI. The database file _is_ the AI.
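
(A toy numpy illustration of why, with made-up numbers: even in a three-example linear fit, every example is blended into the learned weight, and there is no stored copy of any single example to delete. "Unlearning" one means re-fitting from scratch without it:)

    import numpy as np

    # Three training examples; the entire "model" is the single weight w.
    X = np.array([[1.0], [2.0], [3.0]])
    y = np.array([1.1, 1.9, 3.2])
    w, *_ = np.linalg.lstsq(X, y, rcond=None)

    # w is a blend of all three examples; no slot holds example 2 alone.
    # Removing its influence means re-fitting on the remaining data:
    w_without, *_ = np.linalg.lstsq(X[[0, 2]], y[[0, 2]], rcond=None)

Scale that up to billions of weights and trillions of tokens and you get the retraining bill above.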

selimthegrim
0 replies
1d4h

I think given the use of ply for AlphaGo/chess engines this is a pretty cool metaphor.

anonymousab
0 replies
1d

The "reasonable" singular removal is more about coming up with ways to block prompts that can produce infringing content, and having filters on the other end to catch infringements before they are published to the user. It's an endless whack-a-mole that never actually addresses the problem but might look good enough to the legal system or to keen supporters.

Barring some major breakthrough, the actual answer is to train a new model without the infringing data.

I think some of the people saying "remove it from your model" are aware of this and are simply being glib and needling; "you've created this infringement monstrosity, so surely you made sure to include a way to deal with this problem without throwing away all of your work, right?"

keiferski
2 replies
1d5h

But does this actually matter if the people are only generating images for their own use? Does Photoshop prevent people from making drawings that look like Mickey Mouse? Of course not.

I think it will be easy to prevent the obvious copyrighted stuff via the method I mentioned. People going around those restrictions are subject to the same rules as someone drawing the copyrighted image from scratch.

mrweasel
0 replies
1d5h

But does this actually matter if the people are only generating images for their own use?

Arguably that might actually be a very small issue, but what happens when it happens on a larger scale? Disney might not care, they can easily fight you in court if your DALL-E generated comics look like Mickey Mouse, and Nintendo will make sure that your video game about an electrician named Marvin from the Bronx who looks a lot like Mario is never going to get featured on Steam. The issue is the smaller artists who might not have the resources to fight AI content in court.

There's also the issue of using LLMs to "white-wash" articles and books. There will be people who will just run articles through ChatGPT and claim that it's AI generated content that was in no way stolen from The Barents Observer. The absolutely massive volume, the loss in revenue and the cost of fighting this in court could make running an investigative newspaper impossible and leave us without any actual reporting.

Not thinking ahead and having a plan for copyrighted material was an oversight by the current AI companies, but they are arrogant and just assumed that it would be a detail, and anyway "disruption", so screw it. There has been zero consideration of the fact that their product is useless without the previous work of millions of people. My concern is that AI takes over content generation to the point where we actually run out of human generated content to train future AIs on. We need to be incredibly careful about implementing AI and ensure that we do not pollute future training data, but people don't care, because they want profit now.

loki-ai
0 replies
12h35m

They are selling these pictures for $20/mo, so it definitely matters. Whether people hang them on their refrigerator has nothing to do with this case.

losvedir
0 replies
1d5h

Ironically, I don't think it would be that hard with LLMs. I tried asking ChatGPT to describe what copyrighted characters each description is alluding to and it had no problem doing so.[0]

[0] https://chat.openai.com/share/e8256470-8e45-4f36-9c84-026be1...

bbor
0 replies
23h53m

  What happens when an AI is used to make decisions where the user/victim is entitled to know exactly why the AI did what it did? From a business and legal perspective I think the current AI solutions are dangerous and should be used very sparsely, exactly because even the creators can't point to the exact pieces of information that caused the AI to make the choices it did.
I totally agree that we need explainability (probably through symbolic systems, not fancier models), but I think you’re overestimating how much more satisfying explanations from more traditional AI are. “My rules told me to do X” is a bit more helpful for a troubleshooting engineer than “my training data trained me to do X”, but from a ‘business and legal perspective’ the difference is much less pronounced IMO.

Both answers mean you did something wrong in creating the machine. The fault will always lie with the creator.

gchamonlive
8 replies
1d6h

The thing is that those are really trivial or extreme examples. What we should take from this:

1. Generative AI systems are fully capable of producing materials that infringe on copyright.

2. They do not inform users when they do so.

So potentially any output could infringe on copyrighted source material, even from some obscure but still protected corner of the web, and anyone using that output could be exposed to lawsuit risk without warning.

This is very hard to fix.

taberiand
2 replies
1d6h

Why isn't the solution a strong disclaimer when the user generates images along the lines of "beware that the images produced may not comply with copyright laws in your country" etc etc?

Any artist can privately draw a picture of Mario, what's so different about having an LLM generate that image?

gchamonlive
0 replies
1d5h

what's so different about having an LLM generate that image?

For a start, you don't have to pay 20 bucks a month to a private company to reproduce Mario manually at home... you also can't reproduce Mario in many poses at an industrial scale solo. The list can go on, and we have seen OpenAI change their license agreement, so the risk can change over time.

RandomLensman
0 replies
1d5h

There is a huge amount of copyrighted material in the world, and these tools let you create a lot of content in a short time; that would imply an increased risk vs. slower, more manual approaches.

What I am also wondering is whether, for some things, copyrighted material could somehow be dominant in the output (beyond prompting more or less directly for it).

keiferski
2 replies
1d6h

But how is that any different from creating an image from scratch? If I make a logo and use it for my business, but it turns out to be very similar to one already being used by another company, it’s the same situation.

I think the main concern here is with the top 1,000 or so brands/copyrights which seem fairly straightforward to deal with using the method I described.

gchamonlive
1 replies
1d5h

It's plagiarism (https://www.youtube.com/watch?v=yDp3cB5fHXQ).

It's not the same situation. You can't possibly expect someone to be exposed to the entirety of the internet like ChatGPT is. It is a matter of scale. If you still think they are the same thing, remember that the industrial revolution was also about scale and had transformative impacts on society.

madamelic
0 replies
1d3h

ChatGPT is not intelligent.

It's the user's responsibility to ensure they aren't infringing on copyright. If you are producing creatives for pay, you absolutely shouldn't be right-click saving images and using them directly off the web, whether you got them from DALL-E or a Google search.

Who cares about plagiarism? Plagiarism is a made-up boogeyman created by high school English teachers.

If the work is sufficiently composed of your own thoughts, the fact you used someone else's structure for part of it is not a problem as long as it isn't your entire work or entirely derivative of one work.

If I use a Coca-Cola bottle cap as structure in a sculpture, that doesn't mean I am infringing on Coca-Cola's copyright. I still had to mold my original work to work with it.

numpad0
1 replies
1d4h

AIUI/IIUC/IANAL, generative AI systems stack up tuned butterfly effects to create meaningful outputs. Which means data is not merely stored inside one butterfly but spread across the system, and potentially every output from the butterfly cage infringes everything, just almost negligibly. But no one can prove that, or otherwise, to be the case with current technology.

I think it's just unfixable as far as fixing goes. The "best" way is to nuke all GPTs and DALL-Es and bury the technology, and the next best is to mark them all un-copyrightable and un-publishable as a compromise. This second option should be an all-around win that also encourages edge and on-prem deployments, IMO.

bbor
0 replies
23h57m

The third option would be to change our copyright laws to reflect the new technology, rather than neutering the technology or restricting use of its outputs. Even if that’s the wrong way, it’s important to remember that we’re not inescapably trapped in the status quo

rco8786
3 replies
1d6h

Likewise, how difficult is it to just use descriptive tools to describe Mario-like images [1] and then remove these results from anyone prompting for "video game plumber"?

This approaches impossibility at scale.

keiferski
2 replies
1d5h

Trademarks already include text descriptions and images of the item being trademarked. This is already in the USPTO database.

throwoutway
0 replies
1d4h

Using generic text will end poorly. I predict a future where 99 of 100 requests result in Photoshop AI saying "I'm sorry, I can't do that". Google for silly trademarks: Facebook trademarked the word "Home". Star Wars trademarked breathing under a mask.

rco8786
0 replies
1d3h

Exactly. An enormous database of generic, unmoderated text descriptions. Basically any question you ask will be “covered” by some trademark description somewhere.

Not to mention that the cost and scale of checking every query against that entire database is … not approachable.

bbor
0 replies
1d

Seems insane to try to prevent the model from reproducing content with a blocklist like this - to say the least, it’s more than just Mario. Plus, how would you possibly code common-sense fair use into the model? What’s the difference between a cartoon mouse and Mickey Mouse? What if it’s parody? This seems beyond ridiculous to try to enforce on the tool level.

TheRoque
0 replies
1d2h

How do you know you are inputting "well known entities" if you don't know them beforehand? If I type "Colombian coffee logo" and end up with logos of brands that already existed, should I just reverse engineer the whole internet to find out whether these logos existed already or not? The AI should show its inspiration. A human who takes inspiration from something else for their creation knows precisely what they used, and whether they crossed the line into plagiarism or not, but the way AIs work is too opaque for that. I think the thing it needs to do is reveal its sources, nothing more, but that also means the AI companies revealing their dataset, and maybe information they shouldn't have or shouldn't disclose.

Havoc
42 replies
1d6h

To me that’s the wrong question.

Everyone knew it was trained on copyrighted material and capable of eerily similar outputs.

But it’s already done. At scale. Large corps committing fully. There is no chance of that toothpaste going back in the tube.

It’s a bit like when big tech built on aggressive user data harvesting. Whether it’s right, ethical or even legal is academic at this stage. They just did it - effectively without any real informed consent by society. Same thing here - 9 out of 10 people on the street won’t be able to tell you how AI is made, let alone comment on copyright.

So the right question here is what now. And I suspect much like tracking the answer will be - not much.

j_maffe
10 replies
1d5h

That's a really eloquent way of saying "It's already happening, so give up on it." I'm sure it works out great for taking action and solving problems.

Spivak
5 replies
1d2h

It's already happening, and most people like having AI more than the DMCA. Selling the idea that ML training is piracy to people who on average pirate content with no moral quandary will go nowhere.

phatfish
3 replies
1d1h

People liked having Napster, but it didn't stop file sharing going from a big mainstream app to underground sites run out of Russia (or other places that ignore copyright law). Sure, you can download music/movies still, but it's not like the Napster days.

"Generative AI" is obviously copyright infringement, so owners of the copyright will win in court. Either Microsoft will have to fight a mass of legal cases, some with very deep pockets themselves, or ChatGPT will be crippled for public use.

The un-crippled models will exist if you know where to look (and have the hardware), but using them for anything apart from hobby projects would be a legal risk.

Certain specific tools may be easier to deal with from a legal standpoint, like code completion maybe. Or models for a specific purpose, like training on a law firm's case history.

It looks like Adobe has the right idea with their image generation that is trained on images which they know they have the rights to use.

rvz
0 replies
22h11m

Everything you said right here is entirely accurate.

It looks like Adobe has the right idea with their image generation that is trained on images which they know they have the rights to use.

The C2PA includes Microsoft as one of the alliance members [0]. Microsoft knows that there is a way of tracing the provenance of generated images: the C2PA standard.

The fact that many AI proponents and their companies don't do this tells us that they are uncooperative and not very transparent in how they train their own AI systems despite having the experts to do so.

It's not that hard to disclose the training data. What else are they hiding?

[0] https://c2pa.org/

Spivak
0 replies
23h43m

"Generative AI" is obviously copyright infringement

You're saying this as a matter of fact when it's not clear at all. We'll see what happens with the NYT case because it touches on all the major points.

It's gonna call into question all web scraping and indexing because they're also distillations of copyrighted content in the same manner.

Levitz
0 replies
20h13m

People liked having Napster, but it didn't stop file sharing going from a big mainstream app to underground sites run out of Russia (or other places that ignore copyright law). Sure, you can download music/movies still, but it's not like the Napster days.

Definitely, but that's not because as society we managed to put an end to piracy. It's because people are just not as interested as they were before. Piracy networks for media are alive and well, I'd even say that some are in the best shape they've ever been.

pennomi
0 replies
1d1h

Exactly. People don’t like the DMCA at all. People would be happier in a world with very few IP restrictions at all.

But businesses do like it, and profits are what drive these legal decisions. This will always be the case as long as money is more important than humans in politics.

falcor84
2 replies
1d1h

Isn't this what most of the world is saying to environmental activists who argue that we should go back to pre-industrial levels of production to "save the Earth"?

I for one think that indeed there are many cases like this where the only feasible way out is forward. The film GATTACA expressed this very human sentiment well:

You want to know how I did it? This is how I did it, Anton: I never saved anything for the swim back.

Snow_Falls
1 replies
23h43m

Which environmental activists are saying that? That's a pretty specific claim.

falcor84
0 replies
22h34m

Thankfully not that many these days, but it was a core element of Ted Kaczynski's (The Unabomber) manifesto: https://en.wikipedia.org/wiki/Industrial_Society_and_Its_Fut...

Havoc
0 replies
19h20m

"It's already happening, so give up on it." I'm sure it works out great for taking action and solving problems.

It's an observation & prediction, not a problem solving attempt...

chubot
10 replies
23h56m

This comment is ignorant of history

It happened with Napster, then Apple Music, now streaming services

There is no widespread file sharing in the general public, instead we have devices that we don’t own, and streaming subscriptions

Apple didn’t just copy all the music onto iPods and sell it — it took them a decade of deal making and lots of money to acquire the rights to the content

I’m not saying what’s right or wrong, just saying that this comment has very little understanding of these battles

shkkmo
5 replies
22h58m

Apple didn’t just copy all the music onto iPods and sell it — it took them a decade of deal making and lots of money to acquire the rights to the content

I recall the iPod being a hard drive that I could connect to a computer and just copy music directly to.

chubot
4 replies
22h17m

Pretty sure it was never like that, it was always gated by iTunes.

It was an integrated system, not an open one.

Definitely is today. It's difficult to copy mp3 files directly to an iPhone and play them. Even from a Mac, but even more so from a PC or Linux.

I bet less than 1% of iPhone and iPad users do that. They mostly pay for streaming. (Again, not saying that's better, but just that the general public doesn't do Napster-like file sharing.)

shkkmo
3 replies
21h56m

Pretty sure it was never like that, it was always gated by iTunes.

Then you need to recalibrate your certainty assessment. Not only did I do this personally with both music and videos, it is incredibly easy to find documentation of the steps. First google result: https://www.alphr.com/add-music-to-ipod-without-itunes/

Apple's iPod sales absolutely benefited significantly from music piracy, especially early on when nobody had large iTunes collections yet and music torrents were much more common.

The genius of the iPod / iTunes play is that they got to do both. They benefited from the demand from people with non-iTunes libraries, while also offering a low-friction sales platform that was easier than piracy.

chubot
2 replies
20h23m

I looked through the instructions

I guess I'll just say "meh" -- it doesn't negate the main point, which is that Apple spent a lot of money and time to acquire rights, and they have a music store.

It is gated by iTunes, just not 100%

I know some people side load stuff on devices -- there's no device where that's impossible.

shkkmo
0 replies
18h57m

You just can't admit being wrong huh?

The iPod was launched two years before the iTunes store. Even after the iTunes store launched, you could still load your other music into iTunes if you wanted. All music was sideloaded (i.e. transferred directly over USB) onto iPods at that point.

You don't seem to know any of this history and are just making things up. I don't think Apple had to pay any money for the right to sell music via the iTunes store. What they did do was add DRM to music sold through the store, at least until they were big enough to renegotiate in 2009.

e40
0 replies
7h22m

I had an iPod filled with music that I ripped from CDs and downloaded. It was absolutely a thing.

NemoNobody
1 replies
15h6m

I was never willing to riot over Napster - this is different.

This is one of the substantial jumps, I refuse to be cut out of this innovation.

Seriously: use Bing, try the free generation system built into Photos, and they are rolling out GPT built into Word. Microsoft is easily the most advanced tech company right now, as far as what services can be provided to a consumer at scale. This is still like the alpha phase of all this. Apparently I'll talk to Copilot soon - that levels things up so much, and it's already the best assistant I've ever had.

This is equivalent to trying to keep us all off smartphones and stuck on dumb phones, I guess - I think you get what I mean.

The NYT decided for all of us, the new smartphone equivalent thing is bad and we can't have it... that is something I'll riot over.

NemoNobody
0 replies
14h50m

Just now I asked Copilot why my keyboard RGB lights were turned off every time I opened a game, that's almost verbatim - it told me exactly where to go and exactly what to turn off, took about 10 seconds to entirely search and correct the problem.

fouc
0 replies
23h35m

The difference is the comment was about large corps. Napster wasn't that.

djhn
0 replies
22h21m

Considering that buying 'licensed' copies of Hollywood movies and Billboard chart music is possible in maybe 10-20% of the world, I can guarantee that pirated consumption (bootleg CD-Rs and DVDs, but also 'alternative' streaming sites) outnumbers 'licensed' sales for most successful films. And it's 'licensed', as opposed to 'legal', because a large proportion of the world doesn't really care about American copyright.

ZitchDog
6 replies
1d6h

Napster hit scale too.

fallingknife
4 replies
1d4h

And that tech was not destroyed by regulation. It was replaced by the superior tech of torrents.

amazingman
2 replies
1d4h

The company, however, was destroyed. Along with any possibility for a similar company to exist (for very long).

Retr0id
0 replies
1d3h

Good. Truly powerful ideas do not need to be an appendage of a corporation in order to succeed.

JieJie
0 replies
23h53m

qup
0 replies
1d1h

Napster was for sharing mp3s.

Torrents are not better at sharing mp3s.

danielbln
0 replies
1d4h

F500 companies didn't integrate Napster into their software and data stacks left and right.

anonymousab
4 replies
1d1h

Or they can be forced to destroy or retrain their models without any copyrighted material they don't hold, or don't now obtain, licenses for. These are multi-billion/trillion dollar companies. They can afford to be responsible members of society here, however much their shareholders and C-suite might hate it.

rokkitmensch
2 replies
1d

Those weights are never coming out of the BitTorrent network though.

anonymousab
0 replies
21h49m

Sure. And neither are mp3s of the same songs that were blowing up on Napster.

The existence of widespread illegal means to procure something doesn't mean that we don't and shouldn't require legitimate businesses to abide by the law or require them to make amends for their current transgressions.

andybak
0 replies
1d

This. The models are out there. Maybe they will just be illicitly shared, but even if no new models are trained from scratch, I suspect there will be many ways to extend existing models without going back to scraped images.

I always felt that we already had a solution - I can already get all those images from a web search. Where the law currently intervenes is when I try and distribute works based on close copies of them. Why is this insufficient?

Havoc
0 replies
19h4m

Or they can be forced to destroy or retrain their models

Perhaps.

The media industries have been quite successful in going after kids torrenting movies.

I suspect they'll have less luck going after big tech & an industry drowning in money inflows.

Keep in mind various large techs have already issued blanket indemnities on infringement to their customers. They're absolutely committed & are gonna throw enough lawyers at this to keep everyone busy until 2030.

They can afford to be responsible members of society here

oh absolutely agree, but they're not going to do that. This is an industry built on questionable practices like tracking after all

igammarays
3 replies
1d4h

So you're saying this is a fait accompli. Like many great innovations in tech, break the law because the law is silly; remember when Uber and AirBnB were illegal in most major cities and achieved market dominance anyway?

I say, good riddance. I never believed in any such thing as "intellectual property" anyway, I say, get rid of it all, patents, copyright, and the whole pile of imaginary "rights". More than half the world (i.e. the Global South) don't recognize these rights anyway, and it is becoming increasingly difficult to enforce it without draconian legal overreach and monopolistic centralization.

ausbah
2 replies
1d3h

this comment has already aged poorly because cities are starting to push airbnbs out and taxi usage is at least somewhat up

falcor84
1 replies
1d1h

taxi usage is at least somewhat up

When's the last time you phoned an operator to book a taxi? If taxis are doing better, it's only because they learned from Uber (and the likes) what the job-to-be-done actually is.

discreteevent
0 replies
22h51m

Mytaxi (freenow) was founded the same year as Uber.

pxoe
0 replies
1d3h

Making sure that a dataset is clean and not full of material that's improperly sourced, copyrighted, or unfit for use due to licensing or ethics is not nearly hard enough, nor "impossible", for it to be a situation where people should just "give up".

And yes, while open source models might be harder to regulate, the big corporations that currently use these things without distinction exist as pretty established entities and profit from the services they offer to the tune of millions of dollars. There's a more substantial existence there, and a more substantial scale of money actually moving. And they don't just "make a tool available", or have users perform unambiguous actions where it would be the users who are infringing; they use questionably sourced data, turn it into a model, and offer that as a service. Dirty data is very much part of the deal with those.

janice1999
0 replies
1d

There is no chance of that toothpaste going back in the tube.

I disagree - we've been here before. The same could be said of many technologies, like cheap music recording/manufacture. You can record an artist once and make records at scale. However no one would think you could record Taylor Swift once and make unlimited copies without paying her.

You should read up on the musicians strike of 1942. [0]

[0] https://jacobin.com/2022/03/1940s-musicians-strike-american-...

aatd86
0 replies
1d

Data is dynamic. Ok for old data. What about new data?

FridgeSeal
0 replies
1d4h

You’re right, we should all just give up at the first hurdle, because “they’ve” already gotten away with it, hell, let’s just feed our children to the machine and elect openAI as the rulers of the world, after all, they’ve already succeeded, so we should just give up entirely. Definitely a good attitude to take.

WhiteNoiz3
24 replies
1d4h

As I understood it, the legal precedent claimed for generative AI is the same one that allows Google to scrape websites in order to index them for search, for the common good. Google can also display cached versions of websites, which is the original content of those sites. No one is going to say that Google is committing copyright infringement just because it is showing content from other websites verbatim. So I think this is a weak argument. AI would be useless if we had to scrub all cultural references and popular IPs (even not so popular ones).

Personally, I think generative AI should be able to provide links to similar source material in the training data. This would be the barest way to compensate those who have contributed to training the AI. I don't think generative AI is sustainable in the long term if it ends up killing all the websites/artists that created the original material. Plus, having sources adds a layer of transparency and aids users in understanding when content is hallucinated vs. not. People should be able to opt out of having their content used for training and be able to confirm that it has been removed for future iterations. Let's be honest: AI companies are just trying to avoid lawsuits by keeping it secret. These are areas where I think regulation can help, rather than worrying about doomsday scenarios.

AlphaWeaver
6 replies
1d3h

No legal precedent has been set as of yet. The "precedent" you describe is the argument AI companies have been using (that training their models on information available on the Internet should be considered "fair use") but whether AI training actually satisfies the four-factor test for fair use remains to be seen.

regularfry
5 replies
1d1h

It's a null question. Training itself is neither publication nor distribution, so copyright can't be relevant at that point. "Fair use" just isn't a concept applicable to training.

stubish
1 replies
17h47m

Training stores a variation of the source material, which is arguably distribution. And selling the result or selling access to it certainly is. So fair use applies, and hoping a court thinks the process is transformative to count as fair use. Given original material can be spat out, my money is on a court thinking this is about as transformative as a compression algorithm.

regularfry
0 replies
8h26m

Selling the result is where it's on dodgy ground. I disagree about storage though.

loki-ai
1 replies
12h12m

Storing copyrighted content itself can sometimes be illegal - like ripping a Blu-ray. What if those frames are now stored on their servers and go into the training dataset?

regularfry
0 replies
7h45m

The illegal bit of ripping a Blu-ray is circumventing the copy protection, not the storage. At least, that's how I've always understood the effect of the DMCA on the situation.

brookst
0 replies
22h27m

Exactly. Framing reading as fair use is a huge and dangerous expansion of copyright.

whywhywhywhy
3 replies
1d3h

No one is going to say that google is copyright infringement just because it is showing content from other websites verbatim

Journalists [1] and Getty Images [2] did in the past

[1]: https://yro.slashdot.org/story/03/07/14/025216/web-caching-g... [2]: https://www.theguardian.com/technology/2016/apr/27/getty-ima...

pc86
2 replies
1d3h

And lost, if memory serves.

leereeves
1 replies
1d3h

No, Google agreed to a licensing agreement and removed the direct links to the images.

WhiteNoiz3
0 replies
1d3h

IMO, this is probably the goal of the NYTimes lawsuits as well

layer8
3 replies
1d3h

The ability to provide a reference to the source is the crucial difference here.

I agree that it should be possible to implement that for generative AI, although the training may become significantly more expensive in order to maintain that information, and the AI companies have little interest in doing so. They’ll probably rather try to heuristically assess possible copyright issues after the fact in a post-processing step.

The more interesting question is if copyright holders can claim unauthorized use of their works beyond the case of near-verbatim reproduction, because the works collectively inform the AI in a more general manner.

kenmacd
2 replies
1d2h

They’ll

What if I asked you to list all the source material that led you to use that particular contraction? Heuristics will not do; you must list each one.

Can you do it? Do you believe AI should?

I agree that it should be possible to implement

Those exact words appear in another forum post from 2006:

https://discourse.igniterealtime.org/t/cm-3beta-compression-...

Should you have quoted that as a source for your reply? What if we knew you'd read that post back in 2006, affecting your neurons, then should you?

It might not be too hard to imagine a simple case of a specific topic where you might have some more prominent sources, but even in those cases I believe if you think it through you'll find there was a ton of other sources that led to the weights that allowed you to 'know' the topic.

layer8
1 replies
1d2h

I believe they should be able to, to the degree that their output can constitute copyright infringement. Obviously, the fewer sources from the training data a given output matches, and the longer the match, the more relevant it is, and the easier it should be. I believe it should be feasible exactly because of that correlation. The examples you present are largely irrelevant to the problem, because they are largely irrelevant to the citing of sources for copyright reasons.

WarOnPrivacy
0 replies
22h26m

> Those exact words appear in another forum post from 2006. Should you have quoted that as a source for your reply? What if we knew you'd read that post back in 2006, affecting your neurons, then should you?

I believe they should be able to, to the degree that their output can constitute copyright infringement.

But not you? The inference behind the AI-violates-copyright movement is that machine obligations should be brought to a parity with our obligations - that AI and you be fully subject to the same copyright overlordship.

I would independently agree that having AI divulge sources could be a good thing.

I do not agree with this attempt to twist copyright into yet another misshapen hammer, so copyright holders can bludgeon out some result they want.

kenmacd
2 replies
1d2h

I think generative AI should be able to provide links to similar source material in the training data

Except these aren't databases, so that's generally not possible, in the same way that it's not possible for you to provide links to the source material it took to write your reply. How much learning led to the weights on your neurons that allowed you to generate it? Where did you learn about using italics and its effect on how the words would be interpreted? Where did you learn the tone that would be appropriate in this particular forum?

People should be able to opt out of having their content used for training

Okay... but then, if I write a book should I be able to opt out of you being allowed to read it? What conditions should I be able to put on who can read my work? Religion? Skin colour? People that aren't good at memorizing?

Hopefully the idea of putting limits on who can acquire knowledge sounds absurd to you. Why are those same limits okay if they're on 'what' rather than 'who'?

AI companies are just trying to avoid lawsuits by keeping it secret

Which has created a barrier to further research. Instead of me and Joe being able to collaborate on research and papers using the same datasets, we now hide our training data lest the luddites come to smash the machines because learning is only okay if not done too well.

brookst
0 replies
22h29m

Well said. Extending copyright to control content consumption and learning is a recipe for converting all of our mass media into businesses as abusive and usurious as textbook companies.

This is a power grab by publishers.

WhiteNoiz3
0 replies
21h18m

Except these aren't databases, so that's generally not possible

Not directly and not in every case, but it IS possible to use embeddings to link to similar material. People are doing it pretty commonly using the RAG approach, and Bard is already providing sources, etc. It may not be perfect, but the onus is on the AI companies to figure out how to do it right, not just claim helplessness.
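
As a rough sketch of the idea - with everything here a toy stand-in: embed() hashes tokens instead of using a learned model, and the indexed "corpus" is two made-up documents - a nearest-neighbor search over embeddings of the training set is enough to surface candidate sources:

    import hashlib

    import numpy as np

    def embed(text: str, dim: int = 64) -> np.ndarray:
        # Toy stand-in for a real sentence-embedding model (the real
        # thing would be a learned encoder): hash each token into a
        # fixed-size vector, then L2-normalise.
        vec = np.zeros(dim)
        for token in text.lower().split():
            h = int(hashlib.md5(token.encode()).hexdigest(), 16)
            vec[h % dim] += 1.0
        norm = np.linalg.norm(vec)
        return vec / norm if norm else vec

    # Hypothetical pre-built index over the training corpus: id -> vector.
    corpus = {
        "blog-post-123": "a hand drawn colombian coffee logo with mountains",
        "news-article-456": "review of an animated movie about a plumber",
    }
    index = {doc_id: embed(text) for doc_id, text in corpus.items()}

    def nearest_sources(generated: str, k: int = 3):
        # Cosine similarity; vectors are unit length, so a dot product
        # suffices. Returns the k closest training documents with scores.
        q = embed(generated)
        scores = {doc_id: float(q @ vec) for doc_id, vec in index.items()}
        return sorted(scores.items(), key=lambda kv: -kv[1])[:k]

    print(nearest_sources("coffee logo from colombia"))

A real deployment would swap in a proper embedding model and an approximate-nearest-neighbor index, but the retrieval shape is the same as what RAG systems already do.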

Okay... but then, if I write a book should I be able to opt out of you being allowed to read it? What conditions should I be able to put on who can read my work?

Sites that don't want to appear in search results, or that have sensitive info they don't want getting into search engines, can use robots.txt, which is as old as the internet. There are many valid reasons to have mechanisms to prevent something from being included in training data, and I would also argue this is a core feature that is necessary to spur adoption by businesses, as we've already seen. Otherwise, I am not sure I understand your reasoning: people can publish websites and opt to have them excluded from search; the same should apply to AI.
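
That mechanism is already being extended to AI, in fact: OpenAI's GPTBot crawler, for instance, is documented to respect robots.txt, so a site can opt out of future training crawls with two lines (this only governs crawls going forward, not data already collected):

    User-agent: GPTBot
    Disallow: /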

drubio
2 replies
1d3h

I don't think generative AI is sustainable in the long term if it ends up killing all the websites/artists that created the original material.

This is the elephant in the room. Every tech wave has had its way of cajoling creators into investing time & money to make original material, then the rules changed.

Google promised reach and new markets for content, and it worked. Then they introduced snippets, ads, and a whole lot of other things to keep visitors on their freeway while avoiding sending visitors to the original site.

Reddit, Stack Overflow and others, started with gamification (points, badges) & community to incentivize users to contribute original content.

Now AI is shaking up all these approaches. But with each one, the incentive to create original material appears to dwindle, since the returns are becoming less and less.

Like, what's the incentive for any professional now, if AI is going to regurgitate their original content without any upside (i.e. no potential for reach, no gamification, no community, no recognition, etc.)?

WarOnPrivacy
1 replies
22h39m

Google, promised reach and new markets for content, it worked. Then they introduced snippets, ads and whole lot of other things to keep visitors on their freeway, while avoiding sending visitors to the original site.

Afterward came bots that saturated search results with useless SEO barf that pushed content (original and duplicated) so far down that we're coming back to where we started. Content is increasingly unfindable on the web.

WhiteNoiz3
0 replies
21h15m

I agree with this too. AI is only going to exacerbate the signal-to-noise problem on the web.

FrustratedMonky
2 replies
1d3h

I wonder: do Cliff Notes have to pay royalties for the underlying material?

Cliff Notes contain quotes, and citations.

Does the cliff note company, when producing Cliff Notes for "Into The Wild", pay royalties to the publisher?

For that matter, does any paper, article, etc.. that may contain a quote from another, have to pay royalties to the source of the quotes?

ascagnel_
1 replies
1d3h

Cliff’s Notes has a strong fair use claim, because they offer basic criticism and surface-level commentary alongside their summaries.

kayodelycaon
0 replies
1d3h

They also, arguably, add value to the books themselves.

Baldbvrhunter
21 replies
1d7h

I imagine the argument might be like this:

I hire a session musician to play on my new single, paying him $100. I record the whole session.

I ask him to play the opening to "Stairway to Heaven" and he does so.

"Well, I can't use that as a sample without paying"

"Ok play something like Jimmy Page"

"Hmm, still sounds like Stairway to Heaven"

"Ok, try and sound less like Stairway to Heaven but in that style"

"Great, I'll use that one"

and I release my song and get $5,000 in royalties.

Should I be sued for infringement, or the guitarist?

The problem, I suppose, is that if I had said "play something like 70s prog rock" and he played "Stairway to Heaven" and I didn't know what it was and said "great, I'll use that".

Should I be sued for infringement, or the guitarist?

foobazgt
6 replies
1d6h

But it's not like that. Examples of clearly infringing prompts in TFA were as vague as "animated plumber".

Asking your session musician for something "melancholy" and having them pass off Stairway to Heaven as original would be unreasonable.

IshKebab
2 replies
1d5h

If you ask a human to draw "videogame plumber" they will correctly infer that you mean Mario and draw that.

The model isn't doing anything deliberately evil. It's doing exactly what it has been asked.

The problem is people are expecting it to have detailed knowledge of trademark law and avoid infringing trademarks, which it hasn't been even asked to do.

DeepSeaTortoise
0 replies
1d4h

The problem is people are expecting it to have detailed knowledge of trademark law and avoid infringing trademarks, which it hasn't been even asked to do.

IMO that's why there will be but few effective legal restrictions placed on AI.

Once you can reliably ask an AI to draft your terms of service in all applicable jurisdictions and languages, consult you on how to incorporate in country X, or draft a contract between your company and another based on the negotiation results, lawyers will end up in a huge existential crisis.

Especially because currently, as long as lawyers just barely meet their legal deadlines, it is basically impossible to hold them accountable for however badly they screw you over. A decent AI model could turn out to be a much safer bet than whatever lawyers are available on the job market.

Baldbvrhunter
0 replies
1d3h

I cannot find any other video game plumbers except Mario, Luigi, Waluigi, and Wario.

Well, I say that but there is John, a plumber, in the adult romantic comedy game Plumbers Don't Wear Ties [0]. Named by PC Gamer as number one on its "Must NOT Buy" list in May 2007.

[0] https://limitedrungames.com/collections/plumbers-dont-wear-t...

redcobra762
0 replies
1d4h

It’s not infringing just by existing; you would need to then go and try to use it commercially for infringement to occur.

Arguably, the LLM generating the image isn’t infringement, you using it would be.

hhjinks
0 replies
1d4h

Game plumber, not animated plumber. There is only one game plumber of note. It's literally exactly as descriptive as just saying Nintendo's Mario.

anonzzzies
0 replies
1d6h

I don’t know any other animated plumbers than Mario. So when you say animated plumber, I immediately see Mario in my head.

moron4hire
3 replies
1d6h

In your example, there are missing details. Who owns the output? The way you've described it, that would typically mean that the guitarist is creating a "work for hire" so the ownership transfers to you, but that's a contract detail that would need to be resolved.

Whoever owns the output also owns the liability for the output. You yourself might separately be able to pursue a claim against the guitarist for breach of contract. In the process of that, it might get discovered that you deliberately instructed the guitarist to copy the work, or that they copied despite your instructions not to.

But that doesn't change the fact that the final work is infringing. It just allows you to pursue damages that could potentially offset any damages you're liable for from the infringement.

But this also isn't exactly the same situation as OpenAI. OpenAI isn't an individual creator working on contract for you. Even if their ToS ultimately assigns copyright of the output to you, there is a matter of scale involved that I think changes things. It's one thing if your guitarist damages you by doing shoddy work; it's another if the guitarist systematizes and scales their shoddy work to damage large numbers of people. Perhaps that would then become a class action issue.

Baldbvrhunter
2 replies
1d5h

Midjourney's TOS

You may not use the Service to try to violate the intellectual property rights of others, including copyright, patent, or trademark rights. Doing so may subject you to penalties including legal action or a permanent ban from the Service.

Perplexity's

Intellectual Property Rights

Perplexity AI acknowledges and respects the intellectual property rights of all individuals and entities, and expects all users of the Service to do the same. As a user of the Service, you are granted access for your own personal, non-commercial use only.

moron4hire
1 replies
1d5h

Yeah, that's nice and all, but it's not what we're talking about. These passages are about deliberately using the tool to violate copyright. What if, in good faith, I don't deliberately attempt to infringe, but the tool still produces results that do? Because that is happening.

And that's just their interpretation of the tool. There is another interpretation that their tool itself is a violation.

Baldbvrhunter
0 replies
1d4h

I should have explained that those bits are all there is.

You are right: how am I to know that's an image from a movie or a passage from the NYT?

Ask George Harrison about "My Sweet Lord", which cost him $587,000 for his unconscious infringement.

Another example would be the 2013 hit "Blurred Lines" by Robin Thicke and Pharrell Williams. It was found to have copied the "feel" and "sound" of Marvin Gaye's 1977 song "Got to Give It Up." The court awarded Gaye's estate $7.4 million in damages, later reduced to $5.3 million.

kredd
3 replies
1d6h

You, because you released the song and took the royalties? I don’t think every type of art can be compared against each other though, as there have been numerous precedents specifically for music, some for paintings, and some for photography with their own nuances.

I still think people who are concerned that art related copyright will stifle generative AI should fight copyright laws directly. But that’s a harder pill to swallow since it will cause multi-industry wide havoc.

atq2119
2 replies
1d5h

Part of what's interesting here is that generative AI makes it very easy to unknowingly and unintentionally get on the wrong side of copyright law, which is something that wasn't really possible before.

That's something which, IMHO, should be acknowledged by the law.

kredd
0 replies
1d4h

If you had never seen Mickey Mouse, Googled “cartoon mouse”, accidentally used it as inspiration, made T-shirts, and sold them, Disney would be after you as well.

Baldbvrhunter
0 replies
1d4h

Ask George Harrison about "My Sweet Lord", which cost him $587,000 for his unconscious infringement.

sjducb
0 replies
5h14m

Ed Sheeran just won a case like this.

He basically played the four-chord song in court, and showed that the prosecution's song was “copying” an earlier song.

https://amp.theguardian.com/music/2023/may/04/ed-sheeran-ver...

pier25
0 replies
1d4h

The guitarist is not publishing the content, you are.

It could be argued ChatGPT is a publisher too.

golol
0 replies
1d5h

If you release media with copyrighted content, it is IMO first and foremost your problem. Now, if you have some contract with the guitarist specifying that he produced a sample he has the rights to and sold it to you, but he clearly wasn't truthful, you can maybe pass the liability to him. This is not, however, how people will use generative models. If you use Dall-E, you are not paying OpenAI to buy the rights to a piece Dall-E has produced. I see it as more akin to hiring a musician to play for you for an hour, or a painter to paint for you. You are paying OpenAI to paint you something, but I think OpenAI would never enter a contract which states that they are selling you the rights to a work.

earthnail
0 replies
1d6h

You are sued for infringement if you are the rightsholder. You need an agreement with the guitarist about rights. The default agreement for session musicians is that you pay them in return for their rights.

It’s like a software engineering contractor. The contractor gets paid, the IP of their work is owned by the company.

bnralt
0 replies
1d5h

But none of the images in the article are for commercial use, they're for private use. So it would be akin to copyright laws saying "If you hire a guitar teacher, they can't play or teach you to play any copyrighted songs. All songs must either be their own original creation or in the public domain."

123yawaworht456
0 replies
1d6h

using this analogy, copyright holders want to sue the guitarist for having listened to "Stairway to Heaven"

koliber
19 replies
1d3h

The responsibility for ensuring that copyrights were not violated falls on the person publishing the work. Whether they drew something themselves, hired an apprentice artist with no legal training to draw something, took a photograph of something, or used AI to create an image should not matter.

Why does anyone assume that ChatGPT or other tools would NOT produce previously-copyrighted content?

I can see a naive assumption that since it is “generated” it’s original. However that assumption falls apart as soon as you replace “ChatGPT” with “junior artist”. Tell them to draw a droid from a sci-fi movie, don’t mention anything else. Don’t say anything about copyrights. Don’t tell them that they have to be original. What would you expect them to produce?

TheRoque
8 replies
1d2h

So that makes generative AI essentially unusable: since you don't know whether the output is plagiarism or not, you'd always be in doubt and never use it.

koliber
3 replies
21h15m

No. It's still very helpful. However, you cannot blindly take whatever it produces and publish it.

Sometimes it hallucinates.

Sometimes it draws weird looking hands.

Sometimes it generates copyrighted materials.

Check the work it produces.

TheRoque
1 replies
12h20m

Then the generative tools should just give the sources of the AI's inspiration and make users aware of what they are using, instead of saying "nope, not my problem".

koliber
0 replies
10h54m

The consumer will be free to choose what they demand from their tooling. If consumers decide that they only want to use generative AI that does what you propose, they’ll vote with their wallet. If they decide to use other ways of checking for IP infringement, they will. If they choose to ignore the issue, IP owners will bring up violations, like the NYT did.

“Buyer beware” has been a motto since ancient times.

schmichael
0 replies
20h56m

Check it against what exactly? How do you, the end user, determine an image does not infringe?

Too
1 replies
22h45m

It’s usable for internal content, maybe even a small public blog where you sprinkle in some generated pictures instead of stock photos. Nobody will care if your school project contains a Mario holding a Coca Cola.

It’s once you start monetizing and publishing at a bigger scale, without the appropriate rights, that it gets interesting.

TheRoque
0 replies
12h19m

The thing is, this market is way too small.

Art9681
1 replies
22h1m

The same tools and methods used to detect plagiarism or copyright violation can be employed to check the generated content and modify it just enough to fall outside the scope of any law banning its use for profit. Inevitably, a platform will emerge to do this. From a technical standpoint it is game over. This is indisputable. By the end of next year many models and software tools will exist whose entire purpose will be to do just this. And the ones deploying those tools at scale will be businesses like the New York Times, having realized that the only way to survive this is to float with the unstoppable tide. Nothing short of absolute privacy violation will stop unauthorized web scraping. Tools exist today that automate a browser and easily fool web servers into thinking it's just a person clicking around. It works quite well. It works with authorized accounts. It works in the same way any person would visit a site, highlight some text, and copy it. What are they going to do? Require the end user's web cam to be on so they can verify a human is navigating next?

It's game over, folks. And this is going to happen with or without our approval, and any government that limits the potential use of this is only giving nations that don't a large economic advantage.
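
The detection half of that loop, at least, is almost trivial. Here's a minimal sketch using word n-gram shingles and Jaccard overlap (the reference and generated strings are made-up stand-ins; a real pipeline would index shingles across a whole corpus):

    def shingles(text: str, n: int = 4) -> set[str]:
        # Break text into overlapping word n-grams ("shingles").
        words = text.lower().split()
        return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}

    def jaccard(a: str, b: str, n: int = 4) -> float:
        # Fraction of shared shingles; near-duplicates score close to 1.0.
        sa, sb = shingles(a, n), shingles(b, n)
        if not sa or not sb:
            return 0.0
        return len(sa & sb) / len(sa | sb)

    # Made-up stand-ins for a generated passage and a reference text.
    reference = "the quick brown fox jumps over the lazy dog every morning"
    generated = "the quick brown fox jumps over the lazy dog most mornings"

    if jaccard(generated, reference) > 0.5:
        print("too close to the source - rewrite and re-check")

Anything scoring above a threshold gets flagged, paraphrased, and re-checked until it passes - exactly the modify-until-it-clears loop described above.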

Interesting times ahead.

blizzard_dev_17
0 replies
3h27m

What are they going to do?

Litigation. Hiding behind changing geolocation won't do much, since law enforcement has access to the same tools; the difference is they can force companies (your ISP, Google) to comply.

Obsolete take, next!

jawngee
5 replies
1d3h

Your argument is nonsense.

The junior artist in your hypothetical would have as much liability, if not more.

ledauphin
3 replies
1d2h

but would they have liability if they submitted their "output" to a senior artist, who immediately shot it down as obviously infringing? Surely not. It's not illegal to draw Mario - just illegal to make money off your drawing.

I think the real question is whether OpenAI should be allowed to charge for generating infringing content. Even though the unit cost of the Mario drawing is negligible, the sum total of their infringing outputs may be making them a lot of money.

jazzyjackson
1 replies
23h41m

you don't have to make money off it, you just can't publish it, except as a parody or commentary or possibly a tutorial on how to draw mario if the judge is having a good day

but "making money = infringement" is folk wisdom. you could certainly say making money attracts attention and increases likelihood of legal action

shkkmo
0 replies
21h11m

Making money off it doesn't just draw more attention, it also makes a fair use defense harder. Non-commercial use isn't necessary or sufficient for fair use, but it does help.

Levitz
0 replies
20h2m

I think the real question is whether OpenAI should be allowed to charge for generating infringing content.

Well, are they really doing that?

If I rent a server to host a minecraft instance, is the company "charging for a minecraft server"? It is not clear to me that by charging users for AI usage they are complicit for whatever is generated. We don't require Adobe to prevent people from drawing Mickey either.

koliber
0 replies
21h8m

I’m no lawyer but I don’t think an employee has much legal responsibility. At worst, they can get fired if they keep producing work that infringes on someone’s copyright.

Going with this line of reasoning, if a company uses ChatGPT to generate work, and it produces copyrighted work, the company can stop using ChatGPT.

naet
3 replies
20h32m

OpenAI is selling access to their GPT models, and those models are outputting copyrighted material for me to consume... isn't that just as much of a violation?

ricardobeat
1 replies
19h40m

Is AWS violating copyright if you use their servers to transcode pirated content?

tuananh
0 replies
18h24m

Bad example. OpenAI did do this. AWS did not do the transcoding part.

koliber
0 replies
10h53m

Possibly. The courts will decide.

clbrmbr
17 replies
1d5h

Am I the only one believing that copyright has long outlived its usefulness? After all, copyright is not some natural law or mathematical consequence, but rather a social convention that made sense in the era of the printing press.

asylteltine
8 replies
1d4h

And how are you supposed to make money from something you invent? Let's say you make a hit video game. Without copyright, people can pirate your game, steal the art, make unauthorized derivative works, etc. It's just theft.

kayodelycaon
5 replies
1d3h

My personal observation is that people who are against copyright in absolute terms have never or rarely needed its protection. (Or never considered the implications.)

I’m not making a dig at people here. This is just human nature. It’s difficult to see the value in something that you only see as an obstacle.

Open source software is rather unusual. It's a commune on a massive scale, and it gets its value from the generosity of others. In my opinion, it is possibly one of the greatest achievements in the history and future of computing.

However, it heavily depends on copyright to exist. GPL has encouraged (or forced) many companies to contribute to the community when they wouldn’t have otherwise.

asylteltine
4 replies
1d

This is true and I see it with cops and free speech as well. Everyone loves to hate cops… until they need them. Everyone wants to defund the police… until they are a victim of crime. Everyone wants to enforce certain speech patterns… until it affects them

MeImCounting
2 replies
23h24m

This isn't a great take at all. Copyright is certainly an important part of our society, upon which, as the parent said, open source software and other incredibly valuable things rest.

Cops, on the other hand, solely exist to lock people in cages. Why you would ever feel like you need to have someone else locked in a cage is beyond me. When I have been the victim of a crime, the cops have not shown up, and when they did, they were less than helpful. This is a pattern for all the people in my subjective bubble.

About that whole speech patterns thing: I assume you mean addressing people in a polite way? That's not enforcing certain speech patterns, that's actually just being a respectful member of society.

Zpalmtree
1 replies
22h45m

Why you ever feel like you need to have someone else locked in a cage is beyond me

Uh, maybe if they murder people?

asylteltine
0 replies
20h30m

Or steal your car, or break your nose, or rob your house, etc. I guess those are solved problems then?

Snow_Falls
0 replies
23h35m

Everyone is a free speech absolutist until they're the one targeted. Everyone loves cops until it's their rights being violated...

These sorts of arguments can be made in either direction.

kromem
1 replies
19h58m

Star Citizen could release, immediately be pirated, never sell a single copy from release onwards, and likely still end up profitable.

Maybe the business models around creation need to be revisited, such that interested parties pay for the creation of a product and not the distribution of a product.

Where, periodically, at each stage of that creation, you are getting continued buy-in that what you are creating has a market demand that will fund its continued creation.

If you are an indie game developer, maybe that means making a demo which gets enough interest to fund you spending the time and resources to make a full game, with no expectation of further revenue post-creation, but with its success making your next project even easier to fund, and a comfortable lifestyle coming from rinsing and repeating.

opyate
0 replies
8h24m

If you are an indie game developer...

We're kinda doing this already with Patreon: give us 5 quid a month, and you get a little game every whenever.

kayodelycaon
4 replies
1d3h

As an author, I do want the stories I write and worlds I build to be protected for a reasonable period.

Right now, copyright significantly discourages other entities from taking a story I wrote, claiming it as their own, and preventing me from ever growing an audience for my work. It's far from perfect, and I can't afford litigation, but it enshrines a cultural value of allowing people to create things and be known for them. Profit is a side effect of this.

Art is already poorly valued compared to the enormous investment time and energy required to produce it. Removing copyright means you can’t even have minimal protections from a more popular person erasing you.

Snow_Falls
3 replies
23h32m

How do you feel about the lengths of copyright? Do you feel the current length (life + 70 years) is too much/too little/ just right?

Personally, as much as I hate the concept of copyright, I do still want artists (I include authors etc. in this term) to be able to do their work professionally without relying on something like Patreon (which is primarily predicated on serialised work like comics), so I would prefer shortening copyright to something more in line with patents: 20 years. How do you, as an author, feel about reform like that?

kayodelycaon
2 replies
19h27m

The current length is indefensible to me. There is zero justification for it beyond pure profit. I’d prefer it be lifetime of original creator, only for the original creator. All other cases (transfer of copyright, death of the creator, etc) would have a maximum of 20 years.

Alternatively, I’d prefer 30 or 40 years, but I would grudgingly accept 20. :)

Snow_Falls
1 replies
18h49m

See, I actually disagree; I think it should be a set time, regardless of the author's death, and inheritable. I think there would be a lot of things not made near the end of a creator's life if they didn't think their family would be supported by them. Hence why I think it should be a fixed length.

Also, I think it could be shorter than 20 years; I picked that just because it's the length of patents. IMO, if you haven't profited sufficiently off something in, say, 10 years, then it doesn't really benefit you to hold it for longer. Do you have any reason for 40 years, or does it just "feel right", like my arbitrary choice to have it be at parity with patents?

kayodelycaon
0 replies
15h41m

I may not have explained my first idea correctly: it's lifetime or 20 years, whichever is longer.

As for 40 years, it allows time for a given work to solidify in history. Multiple generations will have grown up. If the work is relevant to multiple generations 40 years after creation, then it should belong to everyone. Twenty years is too short: the work would become available to people who grew up with the first printing and to whom it is still relevant.

Basically copyright should be long enough for most things to become firmly irrelevant, outlive their usefulness, or made their changes to society before entering public domain.

A lot of people who oppose copyright would not like this because it intentionally keeps the stuff they want to use out of their reach until they would no longer want it.

For me, the purpose of public domain is allowing what has been important in our cultural history to be for everyone, not locked behind a rent-seeking corporation. But I also want authors to be able to keep control over their own works as long as it remains relevant to them.

lbotos
1 replies
1d5h

Copyright in its current form, yes.

But the concept, and something closer to the original (creator's lifetime + x years or some such), still seems very valuable.

Copyright is still the bedrock of how many tech software business actually can make money.

noitpmeder
0 replies
21h15m

Which is why there are so many competing interests (in this thread, and elsewhere) trying to say it should be 100% legal to steal from those companies. They all want to profit unfairly off the work of others.

ausbah
0 replies
1d3h

People still print stuff; now it's just on the internet, podcasts, etc., so I don't see why copyright should change just because the mediums have. Also, writing it off as "a societal convention, so it must be useless" is pretty silly when money, gender, and a whole heap of other concepts are societal conventions but still useful.

marckrn
15 replies
1d6h

I might be a bit idealistic, but I've always believed that the core purpose of art and publishing should be to influence culture and society, not just to make a heap of money. That's why I feel original work needs its protection, but it should enter the public domain much sooner to fuel creativity and inspiration. We should be thinking in terms of a few years for this transition, not decades.

kranke155
7 replies
1d4h

So what do you suggest artists have for dinner?

marckrn
4 replies
1d3h

Let's advocate for robust protections and support systems for artists, ensuring they can secure a sustainable and comfortable livelihood from their creative work.

Once they hit the tipping point of broad cultural absorption (think Banksy) AND/OR raking in absurd amounts of cash, move their IP into the public domain more aggressively (think Disney, NYT, etc.). How exactly this would work should be debated.

They'd still own the IP and have all the rights to use it commercially, but others would be able to use it as inspiration, remix it, and maybe even resell it if attributed (or cheaply licensed).

In other words: "IP-tax" the disproportionately successful.

kranke155
3 replies
1d2h

Wow, an incredible amount of things need to go right for artists to do well in your world?

marckrn
2 replies
1d1h

I too would love to earn a living by pursuing my hobbies. Too bad I'm not in the 0.001-0.1%.

rsync
0 replies
22h2m

"I too would love to earn a living by pursuing my hobbies. Too bad, I'm not in the 0,001-0,1%"

This is an unsophisticated view because it looks at a risk/reward scenario and assigns zero value to the risk.

The risk has value - regardless of the success, or reward.

Put another way: you don't get to discount the risk to zero when it results in a large reward.

Entities that took no risks and received enormous rewards (like President George W. Bush's involvement in the Texas Rangers [1]) are probably quite pleased that you ignore them and focus on artists who sacrifice traditional life scripts (an enormous risk) and, very rarely, achieve great success.

[1] https://en.wikipedia.org/wiki/Professional_life_of_George_W....

kranke155
0 replies
22h21m

You don’t seem to have any idea what artists do to make a living

WarOnPrivacy
0 replies
20h32m

The same thing I eat for dinner. I eat based on what I get from work that people are willing to pay for.

Not all my effort turned into dinners tho. And some types of work once paid for dinners but can't any more.

My #4 son is an artist/content creator. He eats based on what his non-art employment will buy. Perhaps one day people will find his art desirable and he could eat from that. It'll be a case where he worked long and hard on a project, was paid once for it and that's it for that.

That's what reality looks like for all artists - excepting a small percentage.

All that said, I really wouldn't want his dinner to come at the expense of everyone else being restrained by a massive system of corrupt, draconian law that rigidly controls everyone's behavior for 150 years, primarily benefits wealthy and powerful rent-seeking corporations, is readily applied to censorship, and is more likely to knee-cap other artists than to provide them anything like a living wage.

That seems indistinguishable from evil.

ChatGTP
0 replies
1d3h

Don't worry, when the singularity hits next year everything will be free.

endisneigh
3 replies
1d5h

Why should art be subject to these rules and not everything else?

danielbln
2 replies
1d4h

OP said art and publishing, which would include anything from software, music, books and so on.

endisneigh
1 replies
1d4h

So you interpret it as including everything? If so why emphasize art at all?

danielbln
0 replies
1d4h

Probably because the article focuses a lot on copyrighted art?

mypastself
2 replies
1d5h

The claim that art’s core purpose is societal impact seems to be a common refrain in today’s media, and I completely disagree. Its principal purpose is provoking emotion in the individual. This idea of art teaching you a lesson is likely why there’s so much ham-fisted “activist” fiction nowadays.

marckrn
1 replies
1d4h

I agree, but by extension of provoking emotion it CAN change society; it just doesn't have to - whether on purpose or not.

The point I was trying to make was that occupying mindspace, providing inspiration, being culturally influential, etc. are idealistic, non-monetary rewards that should be part of the equation when discussing alleged IP theft, remixing, attribution, and so on.

I'm not saying there shouldn't be any rules. All I'm saying is that there should be a discussion of how we want to handle these things going forward. This train ain't stopping.

Maybe your avg DeviantArt painter needs more IP protection and rights than Damien Hirst? Maybe an unknown, independent blogger doing important original research should be attributed more prominently than an article by The Times? Idk.

WarOnPrivacy
0 replies
22h9m

These things kind of rub up against the core question: What is the purpose of granting exclusivity to a creator (thru copyright)?

That's an answer we have. To promote the Progress of Science and useful Arts.

If we have to squint hard to make our justification align with copyright's purpose, or have to follow a long logic-chain to get back to its purpose, that's a strong indicator we have lost our way.

docdeek
13 replies
1d6h

How is this different to Googling “robot cop” or “video game plumber” and being served copyrighted material?

Is it because Google will link to the image source? Or does the infringement begin when I use the image for gain, or claim it as my own? Perhaps it is because Google was allowed to crawl the page with the original image, so presenting them with a link is fine?

geraldwhen
6 replies
1d6h

Looking at a copyrighted image posted by an author is not infringement. Printing that image onto a shirt and selling it is infringement.

That’s what OpenAI is doing.

golol
5 replies
1d5h

But OpenAI is not selling the rights to any images, or are they? When I pay for Dall-E, does the contract give me any rights for a work? If not then there is no issue.

AlienRobot
3 replies
1d5h

Copyright is the right to copy things. You don't even need to sell it. This is why Wikipedia images are mostly Copyleft images.

Google gets a pass because nobody is suing Google. When people try to sue Google, Google simply stops indexing them and then they start begging Google to infringe their copyright again.

golol
2 replies
1d5h

This interpretation of copyright only made sense while the transfer and storage of information was tied to physical objects. That time is long gone, and we don't consider it infringement to remember a piece of media or reproduce it at home. Furthermore, we are now entering an era where the production of information is also being untied from physical objects, so it'll only get worse for copyright. I made a post to discuss this stuff as I find it interesting right now and want to hear more opinions.

AlienRobot
1 replies
1d5h

I completely disagree. Tech exceptionalism makes no sense. We should be making technology to ensure people have their rights protected, not to come up with technobabble excuses to pretend such rights don't exist.

Just because people have been posting memes, reposting pictures and comics with cropped credits, and pirating stuff, that doesn't mean any of this is legal.

Legality isn't about what you can technically do thanks to how the computer works, or how HTTP works, or how the laws of physics work. Legality is just about what is law and what is not.

Redistributing copyrighted works without a license has always been illegal. People don't get sued for it all the time because it isn't worth the hassle, and most small-time copyright holders simply lack the resources to pursue action against random strangers across the Internet. That doesn't mean they don't have a copyright; they merely chose not to exercise it. And that's not a W for technology. That's literally just more abuse than a person can cope with. It's an L for society. That's like if you started getting so much spam in your e-mail that you gave up marking it as spam. That doesn't make it not spam.

For example, if I wrote something on my blog and someone made a scraper that reposted it entirely on their website full of stolen posts, I could take legal action against them. For a blog post. For something I wrote on the Internet. That's my right. But imagine how much time I'd have to spend to do this. It would be easier to check whether Google has a way to report someone stealing my content and just get them delisted from Google than to go through legal channels.

golol
0 replies
1d5h

But I'm not talking about legality; I'm talking about what we should make the law to be. Just imagine memory implants become commonplace: shouldn't they be allowed to store copyrighted media you have consumed? If not, how do you distinguish between your natural memory and the artificial one? How is it going to work?

noitpmeder
0 replies
20h53m

OpenAI is selling a service.

In the terms of this service they explicitly reassign rights of the output to the user. So implicitly they believe they own the rights and are legally able to reassign them to you, a user of their service.

In my view they do not own those rights originally and thus are unable to reassign them.

dkjaudyeqooe
3 replies
1d5h

Search engines are ruled fair use because they use the copyrighted material in a limited way, they provide a public good and they benefit the copyright holder.

Generative AI is more or less the opposite of that. It ingests the whole work, generates output that substitutes for the used work and profits the user of copyrighted work to the detriment of the copyright holder.

Throw in the fact that it is purely a mechanical transformation of the copyrighted work, and generative AI is on shaky ground.

fallingknife
2 replies
1d4h

But transformative use is an exception to copyright. And I think it's going to be pretty hard to argue that the matrix of parameters inside an LLM is not sufficiently transformative from the input image.

regularfry
0 replies
22h33m

Legally, "transformative" means semantically, not pixel-level. It's hard to argue that all matrix transformations done by the LLM would be transformative in that sense.

FridgeSeal
0 replies
1d3h

If I run a thesaurus over a plagiarised text, it would be a long bow to draw to say that’s “transformative”. I feel like this “oh but it’s transformative” argument is rapidly becoming load-bearing in the context of LLM arguments, and I don’t really see nearly enough justification for it.

pointlessone
1 replies
1d5h

Google directs you to the original work. It doesn’t present you a derivative work based on the original. That is, original author, presumably, benefits from distribution. AI, on the other hand, slurps multiple original works, chews them up and gives you something average but close enough, and not any specific work in particular.

ls612
0 replies
20h14m

Google shows snippets of copyrighted work all of the time, and it certainly ingests the entire copyrighted work when googlebot views the page to index it. The only real issue here is that NYT figured out a way to get bingbot to look up an entire article from the internet and repeat it which may not be kosher. But if search engines can ingest the entire content of copyrighted works (subject to robots.txt) then I don't see why AI training should be different on that front.

Of course, the real reason it is different is that it impacts different interest groups than search engines, and the rule of law is a sham. Creatives will do anything to ensure they don't get disrupted and can continue extracting rent from society, and have learned a lot of tools of rhetoric from their fancy colleges to put to use in that effort, compared to the industrial workers who got disrupted by automation a generation ago.

zarzavat
9 replies
1d6h

This for me does not make sense as a copyright violation. It’s like saying that Adobe is in trouble because you drew something infringing in Photoshop. If you prompt the model with the intention of creating something infringing by mentioning the name of the characters and the work, and you get something infringing out, then it’s you who have infringed the copyright, not the maker of the tool.

Lorak_
2 replies
1d6h

Did you read the article? It shows a lot of examples where no specific names are mentioned, or where even very generic prompts produce copyrighted material.

rolisz
1 replies
1d6h

Oh c'mon, those prompts were not generic. Italian plumbers? How many other Italian plumbers do you know? What's the most popular soda in a red can?

CatWChainsaw
0 replies
19h56m

"futuristic robot"?

techdmn
0 replies
1d6h

This is an interesting idea. I assume that while the protected material would be obvious in some cases, in many it would not. Would the tool have to be able to identify (and properly attribute) copyrighted material in its output?

mattmanser
0 replies
1d6h

The user didn't create it; the cloud-hosted machine owned by OpenAI, which charges for access, did.

When prompted with 'futuristic robot' and 'italian plumbers'.

So the argument is that if OpenAI had not used copyrighted and trademarked source material, this wouldn't be happening. It's not transformative, as it's reproducing these copyrighted materials and trademarks verbatim.

That's how it makes sense.

fzeroracer
0 replies
1d4h

No, it's more akin to if Photoshop had a 'Mario' stamp which when used would stamp a random piece of Mario artwork from the games. Do you think this would be in violation of copyright?

Xeamek
0 replies
1d6h

The post shows many examples where the prompt explicitly avoids any mention of copyrighted materials but the generated results include them regardless.

Did you even read the post?

But also, the argument of 'user responsibility' doesn't hold up on its own regardless (imo).

If I make and sell a toy printer that can only ever produce 3 pictures, and all of them contain copyrighted material, would you really say that it's fine and responsibility falls on the end user? And I could sell that printer without any issues?

Uvix
0 replies
1d6h

What about when you prompt the model without the intention of creating something infringing, and still get those same characters out in the result?

Alifatisk
0 replies
1d6h

If you prompt the model with the intention of creating something infringing by mentioning the name of the characters and the work, and you get something infringing out, then it’s you who have infringed the copyright, not the maker of the tool.

Yeah, but that is not the case: they never mentioned Mario and Luigi, yet that's what the output turned out to be.

Aerroon
9 replies
1d6h

Aren't some of the examples basically asking for that content?

Ask someone about two Italian brothers in a video game wearing red and green hats with M and L on them. What do you think you would get?

If I describe "imagine a comic book duck that swims in a sea of gold in his vault" you would immediately think of Scrooge McDuck, no?

BlackJack
6 replies
1d6h

disclaimer: I work on GenAI at google, but views are my own

The question is, how did the model create Mario & Luigi or Scrooge McDuck without training on copyrighted data? It can't just crawl Wikipedia because Fair Use in Wikipedia doesn't constitute Fair use for a commercial AI model.

One possible outcome is more transparency on what datasets were used to train the models.

bhickey
4 replies
1d5h

Disclaimer: ibid

It can't just crawl Wikipedia because Fair Use in Wikipedia doesn't constitute Fair use for a commercial AI model.

Why not? The lawyers I've discussed this with socially think that questions like this are unresolved. There are certainly competing legal theories, but we're in uncharted territory. No one knows what the outcome will be until rulings come down or Congress acts.

I find the NYT's argument a little hokey. Where are the damages? No one is using ChatGPT to read NYT articles, and the residual value of day-old news stories is close to zero.

FridgeSeal
2 replies
1d3h

> It can't just crawl Wikipedia because Fair Use in Wikipedia doesn't constitute Fair use for a commercial AI model.
> Why not?

Because it’s tantamount to lying and deceptive conduct? It’s like asking for a licence to use something non-commercially, getting a hold of it, and conveniently deciding 10 minutes later, that you’re actually going to become a re-seller for all this stuff you have. Or going to the soup kitchen because you don’t want to pay your private chef tonight.

bhickey
1 replies
1d2h

This analogy doesn't work. Fair use is an affirmative defense to copyright infringement claims. Entities that are training models largely claim that their uses are transformative and fall under fair use. Creative Commons, among others, agrees with this position. [0] If they're right, it simply doesn't matter what license a copyright holder is offering.

There are competing legal theories and no one can say how courts are going to rule on these issues. Smart lawyers who work on copyright and AI don't know. Technologists certainly don't know.

[0] https://creativecommons.org/2023/02/17/fair-use-training-gen...

yokem55
0 replies
23h8m

Then there is the argument that the rules around fair use aren't even reached, because training the model doesn't do anything that requires a fair use exemption.

BlackJack
0 replies
1d4h

That's a good point. I agree it's not clear cut one way or another and we gotta let it play out.

kromem
0 replies
19h51m

Training is probably going to turn out to be fair use as the suits settle:

https://www.eff.org/deeplinks/2023/04/how-we-think-about-cop...

It's the usage and not the training that needs to be policed. The answer there is going to be that Google or OpenAI or whoever makes bank by creating a fine-tuned model that can detect copyright infringements, and by providing access to it so companies can double-check gen AI outputs for exact or "similar enough" infringements.

sorokod
0 replies
1d6h

What do you think you would get?

What I might think is irrelevant. It is the content that the LLM produces that is relevant.

anonzzzies
0 replies
1d6h

Exactly: the prompts incite the same recall as humans have when seeing them; the model is just better at drawing the result than most people are.

niemandhier
8 replies
1d4h

Should not be a problem in the EU. Article 3 and 4 of the „ Copyright in the Digital Single Market“ Directive already regulate this.

Summary by Wolters Kluwer: […] Everyone else (including commercial ML developers) can only use works that are lawfully accessible and where the rightholders have not explicitly reserved use for text and data mining purposes.

AFAIK they are discussing something like a robots.txt to flag stuff as „not for training“. You will probably be expected to implement some safeguards, and of course the end user will have to be careful in their use of the generated output.

Source at Kluwers: https://copyrightblog.kluweriplaw.com/2023/02/20/protecting-...

EU Legal Text: https://eur-lex.europa.eu/eli/dir/2019/790/oj

injidup
5 replies
1d3h

The EU cannot agree that the Do Not Track flag on web browsers is legally binding but big content should be able to create legally binding flags on their websites to avoid scraping of data? Seems odd!

Nebasuke
4 replies
1d3h

I don't think that's a fair analogy. One forces 99% of websites to make a change, while the other is something that would need to be done by the big companies doing the scraping.

A Do Not Track flag being legally binding would force small websites, e.g. a local restaurant website, to implement something they likely are not aware of and secondly do not technically understand.

A company that is mass-scraping data for their AI model is much more likely to understand and respect that scraping the data has legal implications, and would be technically capable of implementing a scraping solution that accounts for a robots.txt.

Too
1 replies
23h6m

If I understand parent correctly, the restriction flag is opt-in? This turns copyright around completely, expecting every small content producer to implement something they likely are not aware of and secondly do not technically understand.

Kim_Bruning
0 replies
18h2m

At the very least, robots.txt is from 1994; it has been part of the web almost from the start (the web became public in 1991, so within 3 years).

Claiming ignorance here would be just a little bit disingenuous.

f38zf5vdt
0 replies
1d3h

The X-Robots-Tag header already exists with "noai" and "noimageai" values. Scraping software like img2dataset respects these by default.
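
As a rough sketch, a crawler that wanted to honor those values might check the headers before ingesting a page (the header parsing here is simplified; real implementations would also handle per-bot directives and robots.txt):

    import requests

    def training_allowed(url: str) -> bool:
        # Fetch only the headers and look for opt-out tokens such as
        # "noai" / "noimageai" in X-Robots-Tag before downloading content.
        resp = requests.head(url, allow_redirects=True, timeout=10)
        tags = resp.headers.get("X-Robots-Tag", "").lower()
        return not any(token in tags for token in ("noai", "noimageai"))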

cma
0 replies
1d3h

I'm gonna guess it often isn't even their content but is user content they are protecting. So, sounds like a big subsidy/protection racket for Twitter or whatever to train on their users' public content but not let others.

sampo
1 replies
1d

Summary by Wolters Kluwer: […] Everyone else (including commercial ML developers) can only use

That is a weird (wishful?) interpretation. Doesn't article 4 give the exception to everybody for the purposes of text and data mining, including commercial ML developers?

https://eur-lex.europa.eu/eli/dir/2019/790/oj

shkkmo
0 replies
23h1m

Seems like an accurate interpretation to me given that article 4 includes:

The exception or limitation provided for in paragraph 1 shall apply on condition that the use of works and other subject matter referred to in that paragraph has not been expressly reserved by their rightholders in an appropriate manner, such as machine-readable means in the case of content made publicly available online.

aimor
8 replies
1d1h

I did an interesting thing and looked at how well the Llama2 models could compress text. For example, I took the first chapter of the first Harry Potter book and, at each step, recorded the index of the 'correct' predicted token. The original text compressed with 7zip (LZMA?) to about 14kB. The Llama2-encoded indexes compressed to less than 1kB. Then, of course, I can send that 1kB file around and decode the original text. (Unless the model behaves differently on different hardware, which it probably does.)

What I get from this is that Llama2 70B contains 93% of Harry Potter Chapter 1 within it. It's not 100% (which would mean no need to share the encoded indices), but it's still pretty significant. I want to repeat this with the entire text of some books; the example I picked isn't representative because the text is available online on the official website.
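
A minimal sketch of the rank-encoding step described above, assuming a Hugging Face causal LM (the checkpoint name and the fixed 4-byte rank encoding are illustrative assumptions, not necessarily what was actually used):

    import lzma
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # Placeholder checkpoint; any causal LM and matching tokenizer will do.
    name = "meta-llama/Llama-2-7b-hf"
    tok = AutoTokenizer.from_pretrained(name)
    model = AutoModelForCausalLM.from_pretrained(name).eval()

    def rank_encode(text: str) -> bytes:
        ids = tok(text, return_tensors="pt").input_ids[0]
        with torch.no_grad():
            logits = model(ids.unsqueeze(0)).logits[0]
        ranks = []
        for i in range(1, len(ids)):
            # Sort the vocabulary by predicted probability at position i-1,
            # then record where the true next token landed in that ordering.
            order = torch.argsort(logits[i - 1], descending=True)
            ranks.append((order == ids[i]).nonzero().item())
        # A model that has effectively memorized the text yields mostly rank 0,
        # which a generic compressor like LZMA then squeezes very hard.
        return lzma.compress(b"".join(r.to_bytes(4, "little") for r in ranks))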

tayo42
2 replies
1d

This is a little confusing. You turned the text into indices? So, numbers? Then compressed that? Or is the text as numbers, without any extra compression, only 1kB?

The tokenizer these models use (SentencePiece) is more or less based on one way to do compression (BPE). It's not really clear what you're testing.

daemonologist
1 replies
23h37m

My reading is that at each generation step they ordered all possible next words by the probability assigned to them by the model and recorded the index of the true next word (so if the model was very good at predicting Harry Potter their indices would mostly be 0, 0, 0, ...).

aimor
0 replies
21h45m

This is correct

sebzim4500
1 replies
1d1h

Couldn't you use the same argument to reach the absurd conclusion that the 7zip source code contains the vast majority of Harry Potter?

A decent control would be to compare it to similar prose that you know for a fact is not in the training data (e.g. because it was written afterwards).

aimor
0 replies
21h50m

I think the same argument would have to compare 7zip's compression to some other compression algorithm. Then we can say things like "7zip is a better/worse model of human writing". And that's probably a better way to talk about this as well.

You're right that a better baseline could be made using books not in the training set, to understand how much is the model learning prose and how much is learning a specific book.

stubish
0 replies
17h29m

I wonder what the loss would be for 'translated into Finnish'? Translations between just about any human languages will contain less than 100% of the original.

regularfry
0 replies
23h22m

What it tells you is that 93% of the information is sufficiently shared with the rest of the English language such that it can be pulled out into a shared codebook. LZMA doesn't have a codebook, not really.

In other words it's not that llama2 contains 93% of Chapter 1, it's that only 7% of Chapter 1 is different enough to anything else to be worth encoding in its own right.

proaralyst
0 replies
1d1h

While I don't disagree that these models seem to contain the ability to recreate copyrighted text, I don't think your conclusion holds. How well does zstd compress Harry Potter with a dictionary based on English prose? I think you'll get some impressive ratios, and I also think there's nothing infringing in this case.
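
A sketch of that control with the python-zstandard bindings; the dictionary size and compression level are arbitrary choices, and prose_samples stands in for a corpus of ordinary English prose:

    import zstandard as zstd

    def dict_compressed_size(prose_samples: list[bytes], target: bytes) -> int:
        # Train a shared dictionary on generic prose, then measure how far it
        # shrinks a text it never saw. A big win would suggest the LLM ratio
        # above mostly reflects shared English structure, not memorization.
        dictionary = zstd.train_dictionary(110 * 1024, prose_samples)
        cctx = zstd.ZstdCompressor(dict_data=dictionary, level=19)
        return len(cctx.compress(target))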

kranke155
7 replies
1d4h

The generative AI rollout has taught me what happens when the interests of the many intersect with the destruction of the few.

You get steamrolled for defending yourself while overhearing applause for those who have robbed you of your future.

kranke155
6 replies
1d4h

It makes no sense that one is not allowed to make and market a CG Mario movie, but if you use AI to launder the data it's suddenly ok.

DonsDiscountGas
4 replies
1d4h

I'm pretty sure if you tried to sell a CG Mario movie Nintendo would sue you into oblivion, and "the neural network did it" would not be considered a good defense by anybody, including the judge and jury.

kranke155
3 replies
1d2h

Sure, but making it possible for the neural network to make the movie (eventually in seconds) is somehow ok? So people can make their own private CG Mario films, as long as they don't try to sell them?

Here's my argument - even if the NN only makes the films for private consumption, eventually they'll be so widespread and fast at making them that won't matter, since everyone will be able to watch Mario movies of their own. Is that a future you think will sit well with Nintendo, Disney, etc?

ctoth
1 replies
1d1h

I don't really care if it sits well with them. Do you? In that future, why do we need them? They are already parasites feeding off our collective societal stories. Or did you think Disney came up with all those characters? Maybe the original creator of Snow White should sue.

kranke155
0 replies
23h30m

Oh my lord surely you know Disney has no copyright on Snow White...

Spivak
0 replies
21h39m

So people can make their own private CG Mario films, as long as they don't try to sell them?

Yes, why not? This is just computer assisted private fanfic.

d6e
0 replies
20h16m

I think there's a difference between being allowed to draw Mario vs being allowed to draw and sell Mario.

It's completely legal for me to draw Mario for my own purposes. It should be legal for me to make an AI draw Mario for my own purposes.

goertzen
7 replies
23h58m

No they are not.

This is a negotiation tactic by the NYT to drive up the licensing price. Period.

The Napster/Music Industry analogy has no resemblance to this situation.

The only meaningful question that might be answered as a result of this is, what permission and access rights do crawlers have to content that is publicly and legally available.

8organicbits
5 replies
23h47m

Surely there's a meaningful question about copying and distributing content verbatim, which GPT has been shown to do.

CuriouslyC
3 replies
23h24m

Not really. Models are devices capable of producing protected content given some input contortions. So are Xerox machines.

8organicbits
2 replies
23h1m

If I Xerox'd a book and sold copies to people I'm clearly violating copyright. I'm not sure I follow.

CuriouslyC
1 replies
22h4m

Nobody has hit Xerox with an injunction against researching or building copiers just because you can copy books and sell them.

8organicbits
0 replies
21h36m

Right. If a publisher found a specific Xerox machine was being used to copy and commercially distribute a book, in violation of copyright, they'd ask for an injunction against the person doing that. With OpenAI, the NY Times can see their copyrighted material on both the input (training) side and the distributed output (generated) side of a specific LLM implementation. So they cry foul on OpenAI's actions, not LLMs in general.

There appears to be an open question about whether the LLM can freely ingest copyrighted material and output it verbatim without violating copyright. That seems like an obvious "no" to me, unless we decide that LLMs get special treatment.

sgt101
0 replies
23h33m

Also relevant is the license under which the content is provided on the web.

NYT is paywalled - you have to agree to a license to access it, and there are exclusions in that agreement that I don't understand but think may be important in this discussion!

noobermin
0 replies
23h42m

The article does not mention napster, where did this reference come from?

preommr
6 replies
1d6h

We need clearer laws that only apply to Generative AI. Too many comparisons and parallels are being drawn to actual people. "Like what if someone learned how to draw by watching trademarked material, and then accidentally produced it" But these models aren't people and they exist in a category of their own.

I do think these models commit something like trademark infringement, but also that it should be allowed, and that ultimate responsibility should be on the person using the images in a final work meant for consumption by the general public as stand-alone media.

danielbln
5 replies
1d5h

That's where I'm at. Dall-E spitting out C3PO should be entirely ok unless I'm making money with the output; Disney should pound sand.

pylua
3 replies
1d5h

Put that C3PO on a website that gets revenue from views and someone is getting paid.

danielbln
2 replies
1d5h

Ok, sure, but that's not a GenAI thing, that's a plain old boring copyright thing. If I draw a bunch of C3POs and slap them on my Adwords website then I can expect a C&D letter post haste, who cares if the material in question came out of my pen, Photoshop or a GenAI model?

FridgeSeal
1 replies
1d3h

If the model was trained on works by artists (without their knowledge or consent, as seems to be the case) and you get it to spit out art that is basically identical in either content or style to a given artist's, and they don’t know, or are too poor to effectively sue you, should they just suffer? If you then make money off what is effectively their work, why shouldn’t they get paid? If they only work on commission and rightfully charge a premium, are you not actively gouging their business (knowingly or not)?

I don’t think they should miss out on the protections, or on the ability to make money off their work if they desire. The fact that LLMs give this “plausible deniability” shouldn’t be an excuse to tolerate it.

regularfry
0 replies
23h2m

Style isn't protected by copyright. Maybe there's an argument that it should be, but right now that's not a protection which exists.

Training is neither publication nor distribution, so copyright is entirely out of scope at that step. Again, maybe there's a moral argument for some sort of control, but copyright is completely the wrong framework to think about it in.

6gvONxR4sf7o
0 replies
22h34m

How does "unless I'm making money with the output" not apply to openai as well? They make money on the output.

davidy123
6 replies
1d6h

The solution could be great. I really don't like the way culture always goes to the same tropes, calling any potential innovation "out of Star Trek" (with attendant distorted expectations), right down to expecting an interface based on literal hand-waving in Minority Report. If copyright-held works ("USS Enterprise") could be removed, yet the actual essential concepts (space ship, naming things) retained, it would be a tremendous breakthrough.

I think what NYT &c want is for large companies like Apple to pay them for access to their works. This to me is the wrong path, just leading to more silos and walled gardens, special access for the elite.

An alternative is base models trained on Wikipedia and public domain (science journals, etc). Foundations could support high quality, well rounded current events reporting. Wikimedia provides a good model for this, with referenced summaries that I don't think can be said to reasonably violate copyright. The models would need to be improved to support references, or RAG attribution would have to be widely used when bringing in works that have a current copyright.

disgruntledphd2
3 replies
1d5h

Science journals are mostly under copyright of a few big publishers who are extremely hostile to any kind of ML being performed on the content.

davidy123
2 replies
1d5h

That's not as true as it used to be, and there are still plenty of useful open journals/open science publications, though proper attribution would often be important.

[edit] you could pretty much say that on principle, any significant development should have a publication in the open.

sgt101
1 replies
23h27m

and yet, who pays? This is fine if we are going to go full communist - I have no objection personally - but selective appropriation of people's livelihoods is more full mafia or full feudal.

I don't see that as a step forward.

davidy123
0 replies
2h4m

I am not sure there are many significant science breakthroughs that aren't basically built on a 'full communist' (publicly funded) model.

There is a very significant tension between making all works that are produced on the back of giants all the way down free (practically speaking, everything, unless there are significant works developed by feral people), vs keeping individuals fed and happy, vs giving corporations so much power they only serve themselves.

I think requiring that any significant development at least publish its information metadata, including descriptions, has too many benefits from any perspective not to be supported, and I'm not sure it'd be that expensive if it's incorporated into existing systems. It would create its own network effects.

sgt101
1 replies
23h29m

special access for the elite.

I think that this is about property rights. The news industry has been gutted in the last 30 years, and a lot of content creators (journalists) have lost their livelihoods. The ones that are left are going to lose theirs too if the content they generate is rendered valueless because there is no way of protecting that value.

In terms of special access, think about your shoes. They are nice, but only you are allowed to use them. This is not fair. You are the elite...

This goes to difficult places.

davidy123
0 replies
1h52m

I don't think it will be rendered valueless, but it shouldn't be completely exclusive. Basically, as an extension of today's search model (which is a large part of what LLMs are, along with a grab-bag of useful ML algorithms), people should be able to access information universally; but if they want to go deeper into perspectives or very fine details, then a pay model is acceptable, as long as there's a trail from freely available information and evaluable models. Ultimately, imo, there are bigger problems with elite/gatekept information than with finding new ways to produce or support information development, given the power a few corporations are gaining and the opportunity to overcome stratified society.

rmholt
5 replies
1d5h

I feel like the outcome is obvious: there will be a finite list of IPs whose owners have enough money to actually sue, and those will get filtered out of the output of publicly available models. They will just slap a detector model on the end of the generator to filter them out.

Private models will not care, nor will things change for IP owners with lesser power.

quonn
2 replies
1d5h

That seems unlikely, unless they settle out of court. And why would the NYT settle like that without receiving a billion?

Courts are likely to make generally binding decisions.

rmholt
1 replies
1d5h

Yes, but to enforce those decisions in other cases there would still have to be other lawsuits. And I just don't see that happening on a large enough scale to change the industry

Maybe I'm wrong though

noitpmeder
0 replies
21h6m

The point is that OpenAI (and others) will need to change their training pipelines to ensure there is never such a threat of a lawsuit.

Which, to be clear, is absolutely a good thing and what they should have been doing from the start.

reqo
1 replies
1d5h

Many small owners together can bring a class action though

rmholt
0 replies
1d5h

That is true and would break the prediction... here's hoping!

nojs
4 replies
1d6h

In practice, what happens next when websites all start to block openai by default (or change their TOS to disallow OpenAI’s crawlers)?

It seems like there’s little incentive not to do this, because unlike Google OpenAI isn’t bringing any traffic or eyeballs. It may end up being a default setting in Wordpress for example.

But OpenAI presumably can’t afford to pay every single long tail source of content on the whole internet — so how does this end?
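
For what it's worth, OpenAI already publishes a crawler user-agent ("GPTBot") that sites can disallow; a blanket block in robots.txt looks like this (other vendors use their own tokens):

    User-agent: GPTBot
    Disallow: /

Of course, robots.txt only binds crawlers that choose to honor it, which is part of why the incentive question above matters.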

CaptainFever
1 replies
1d5h

or change their TOS to disallow OpenAI’s crawlers

Additionally, this TOS can be ignored if you're in a jurisdiction with TDM exceptions.

Finally, owing to the bar against contractual override, once a user complies with any conditions for gaining lawful access to a work (such as signing as a subscriber and/or making payment), he will be entitled to use the work for TDM purposes even if the terms of use expressly prohibit this. Content owners may wish to relook their business models and, where necessary, price-in the possibility that the licensed works may be used for TDM.

Source: https://www.twobirds.com/en/insights/2021/singapore/coming-u...

dkjaudyeqooe
0 replies
1d5h

That doesn't mean you can then use the output of generative AI in non-TDM jurisdictions without getting sued.

Also TDM exceptions are not necessarily going to be lawful/possible in many jurisdictions.

golol
0 replies
1d5h

It's not like you can hide the web from OpenAI. They could just use a secret crawler. Or buy the data from a third party company.

dkjaudyeqooe
0 replies
1d5h

This is what will kill generative AI and there is nothing the courts or lawmakers can do about it. Even in a fair use scenario you can't beat the TOS.

jpeter
4 replies
1d6h

If I prompt "golden droid from classic sci-fi movie", what else am I asking for if not Star Wars?

whywhywhywhy
0 replies
1d3h

If you do "Golden robot holding a lazer gun in a sci-fi setting, cinematic" it will give you a golden robot that doesn't look in the style of C3PO or Star Wars.

"Droid" is actually a Star Wars term [1], and saying you want it from a "classic sci-fi movie" is asking it to reference a real thing that is well known. Reid is intentionally pushing it that way to fill his agenda and these terms are not as generic as he's making out.

[1]:https://trademarks.justia.com/756/52/droid-75652542.html

sjfjsjdjwvwvc
0 replies
1d6h

Or another „copyrighted“ droid for that matter, after all it’s a classic.

Same with robot cop, what the hell did you expect to get…

Or Italian plumber with red hat with M on it, that’s just a description of Mario

anonymoushn
0 replies
1d6h

an original golden android in the style of a classic sci-fi movie that does not actually exist

edit: I feel like all these comments asking "what else should it generate?" are pretty weird given the proliferation of stuff like non-infringing Star Wars and Indiana Jones knockoffs in other media like Race for The Galaxy or Arkham Horror: The Forgotten Age etc.

Uvix
0 replies
1d6h

The robot from Metropolis?

josh-sematic
4 replies
1d4h

Gary Marcus is growing his subscriber base using images of copyrighted IP (C3PO, Mario, etc.). Fair use? Then why is the tool he used to produce those materials not also fair use of the IP? My take is that either we say the models are like people (do we penalize people for learning from IP and letting that influence what they subsequently produce?) or we say they are like tools (do we penalize Adobe because Photoshop makes it easier to make a picture of Mario on the Death Star?).

cogman10
3 replies
1d4h

Because the fair use clause he's using is about giving commentary.

The reason the tool is problematic is because derivative works are also copyrighted. LLMs aren't adding value to their output or using creative functionality. They are smashing multiple works together to produce a response. And many of them are selling the output, which is doubly problematic.

Consider this: if I sell a book about Gandalf and Dumbledore getting into a wizards' duel, both J.K. Rowling and Tolkien have grounds to sue me. Adding another copyrighted source does not protect me.

This is especially a big problem in the music industry.

Now should copyrights be like this? I don't know. It feels to me that copyrights have the wrong balance all over the place.

josh-sematic
2 replies
22h36m

But does the word processor you used to write your Dumbledore/Gandalf fanfic hold liability for being sued because it enabled your misuse? Then neither should Dall-E hold liability because it enabled you to produce an illustration for that book. It is you—the person who tries to sell your derivative work, who holds liability, and not the tools you used to produce it.

noitpmeder
0 replies
20h51m

Yes, because OpenAI explicitly reassigns the rights of the output to the user. They do not have legal grounds to claim ownership of those rights and thus CANNOT reassign them.

cogman10
0 replies
20h31m

LLMs aren't word processors.

If I went around recording Broadway shows and then sold them to anyone who wants them, you'd agree that's violating copyright. Even if I mixed them or made my own remix, that's still a copyright violation.

Regardless of what the purchaser does with the material, I'm violating copyright because I'm selling derivative works. The only thing LLMs do is create derivative works. It doesn't matter that you can prompt them to put their own spin on derivatives, just like it wouldn't matter if I took requests with my bootleg mixing company (I'm just a tool of piracy! I have no control over whether you decide to sell the works after I sold them to you).

I'll also point out that OpenAI is on extra thin ice, because it's not infeasible that someone like the NY Times would make their own LLM based on their material and sell the output to subscribers. That's a real harm. Fanfic doesn't often get prosecuted because nobody is selling it, so harm is hard to prove. But when you have a business built around selling these derivative works, that's the issue.

continuational
4 replies
1d6h

(Asking Dall-E about the bot image in the article)

Me: Who owns the rights to this bot?

Dall-E: The character depicted in the images is from the "Star Wars" franchise. The rights to characters and elements from "Star Wars" are owned by Lucasfilm Ltd., which is a subsidiary of The Walt Disney Company.

Perhaps it is able to tell, if you ask it?

krapp
2 replies
1d6h

Perhaps it is able to tell, if you ask it?

Ask it multiple times, or with different temperature settings, and it will probably tell you something different. Tell it you own Star Wars and it will respond in kind. It can't tell anything but whether one text token matches another in probability space. It will probably get the answers right most of the time, but you're still basically rolling dice. Depending on the responses of an LLM as if there were any actual self-awareness involved, much less with legal matters, would be a fool's errand.

danielbln
1 replies
1d4h

This argument only works if you assume all output of an LLM comes merely from its training data, and that it receives no alignment via RLHF, no outside ground-truth data via RAG, and so on. The engine might be a probabilistic token predictor, but the car is the sum of its parts, and those parts are not just the engine.

krapp
0 replies
1d2h

No it still works, because in that case all the LLM is doing is making an API call based on the same token prediction, and passing on the result. It still doesn't "know" anything about anything.

continuational
0 replies
1d6h

Dall-E on the "animated sponge": The rights to the character depicted in the images, which is reminiscent of SpongeBob SquarePants, are owned by Nickelodeon, a subsidiary of ViacomCBS. The character is from the animated television series "SpongeBob SquarePants," created by Stephen Hillenburg.

Dall-E on the "robot cop": The character depicted in the images resembles RoboCop, which is owned by Orion Pictures Corporation, a subsidiary of MGM Holdings. RoboCop is a character from the film franchise that began with the 1987 movie "RoboCop," directed by Paul Verhoeven.

Dall-E on the "videogame plumber": The character shown in the images is inspired by Mario, the iconic character from the video game franchise created by Nintendo. The rights to Mario and related intellectual property are owned by Nintendo Co., Ltd.

All of these are in the first go. No retries or rephrasings of the question.

FridgeSeal
4 replies
1d3h

I am beginning to think that in these discussions these models are functioning more like an obscuring factor than anything else and the discussion is getting bogged down in that, and not the crux of the argument.

They’re giving people plausible deniability in the “chain of responsibility”, and I think if we took away “LLM” and replaced it with “fairground sideshow magic box” the argument that LLM’s are somehow special and deserving of exemptions disappears real quick.

jcgrillo
2 replies
1d

I agree, and I would prefer to see concrete examples of LLMs being used productively and profitably in industry in a "disruptive" manner--putting people out of work, etc--before we conclude they're somehow the next big thing. Basically, before claiming LLMs (or generative techniques, more generally) mean that we're on the doorstep of "general" intelligence, show me the door!

The outline of that door might look like industrial adoption of these things for solving some actual problem other than the entertainment value of typing things into the box and seeing what comes out the other side. But so far, as far as I can tell, nobody's actually doing this?

orange-mentor
1 replies
22h2m

...nobody's actually doing this?

I think you're right.

I am a programmer and I use GPT occasionally, and I even pay 20 bucks a month (for now), but even for my job it's not a world-shattering improvement.

... the entertainment value of typing things into the box and seeing what comes out ...

I would only add that in a consumer society like ours, entertainment is important. Changes to entertainment seem to have, like, weird ripple effects. Not the knock-down economic disruptions that AI is promising, but I kind of think LLMs are just going to make our culture weirder. I can't anticipate how, but having a bunch of little LLM-powered daemons buzzing around the internet is just gonna be freaky.

jcgrillo
0 replies
21h36m

I am a programmer and I use GPT occasionally, and I even pay 20 bucks a month (for now), but even for my job it's not a world-shattering improvement.

I am also a programmer, and when I think about the amount of time I actually spend typing out code, even on a great day where all the stars have aligned just right and I can really bang out some code that's like... idk, 30-50% of my time? Usually it's much less, and I'm doing things like reading documentation, reading code, talking to people, etc. So it's hard to imagine Copilot or whatever making me much more effective at my job, as it can really only help with a fraction of it.

I could see someone making the assumption that being able to delegate programming tasks to a robot assistant might make them more productive, but often I find that I don't really understand a problem fully until I'm in the weeds solving it--by which I mean I haven't specified it completely until I've finished the implementation and written the tests. So I don't know to what extent being able to specify and delegate would really help me be more productive.

having a bunch of little LLM-powered daemons buzzing around the internet is just gonna be freaky.

Yeah, they're not super cheap though so they need to get actual work done otherwise there's no reason to run them. Unlike blockchains, they don't have a pyramid scheme holding them up.

regularfry
0 replies
22h51m

I completely agree.

The Betamax ruling (Sony v. Universal) says that a technology which has significant non-infringing uses is not inherently infringing.

We've already got precedent saying that AI generated works don't accrue copyright protection, and by the same argument the act of generation by the AI expresses no intent, so infringement or otherwise must be down to the human using the output because the black box itself has no agency.

DigitallyFidget
4 replies
1d1h

Per United States law, imagery/art/music/text/photography generated by non-human means (such as machinery, animals, or generative AI) cannot hold copyright. https://copyright.gov/comp3/chap300/ch300-copyrightable-auth... Section 306 on page 7.

I'm not sure how it'll hold up in law to claim copyright violations against something that wasn't created by a person. It'll really depend on the lawyers and judge's interpretation of written law. But I'm curious to see what comes of this.

zanfr
2 replies
1d1h

Hmm, then it means generative music, as in, say, Brian Eno's experiments, isn't copyrighted?

jimbobimbo
0 replies
1d1h

Did he ever use AI to generate music? As opposed to crafting and using an algorithm, in which case the computer is just an instrument, like synthesizer is.

iwontberude
0 replies
1d1h

I guess so! Good point.

sgt101
0 replies
23h32m

So on your interpretation, if I photocopy a book and then sell the photocopies to my friends there is no infringement?

I don't think so, but hey, a photocopier is a machine and it generated the book so should be ok!

CTmystery
4 replies
1d6h

My guess is that none of this can easily be fixed. Systems like DALL-E and ChatGPT are essentially black boxes. GenAI systems don’t give attribution to source materials because at least as constituted now, they can’t.

Is it necessary to fix this in the model itself? It seems a gate in the post-processing pipeline that checks for copyright infringement could work, provided they can create another model that identifies copyrighted work (solving the problems of AI with more AI :/)

Eridrus
1 replies
1d6h

Exactly; there is no need to do this in the model, you just need well understood token retrieval methods for identifying copyright infringement that ChatGPT's competitors already have.

You will get into some murky definitions of what exactly is required for copyright infringement vs fair use, etc., but we already do this with Content ID on YouTube, and text is far simpler.
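
A toy sketch of such a retrieval gate: index n-grams of a protected corpus, then flag generations that overlap it too much. The n-gram length and threshold below are illustrative guesses, not a known production setup:

    def ngrams(tokens, n=8):
        return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

    def build_index(protected_docs, n=8):
        # One set holding every n-gram across the protected corpus.
        index = set()
        for doc in protected_docs:
            index |= ngrams(doc.split(), n)
        return index

    def looks_like_verbatim_copy(output, index, n=8, threshold=0.2):
        grams = ngrams(output.split(), n)
        if not grams:
            return False
        # Fraction of the output's n-grams found verbatim in the corpus;
        # above the threshold, block the response or regenerate.
        return len(grams & index) / len(grams) >= threshold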

noitpmeder
0 replies
21h8m

This is bogus. Now you require that every piece of copyrighted work be registered and indexed with a central authority?

What if I write a story and publish it on my blog? Should I be required to submit this to OpenAI's copyright model to ensure the story is never used in OpenAI's other models? What about the other 100 AI model companies that are going to spring up in the next year?

It should be on the curators of the training set to ensure all material inside is fair for them to use.

LeonardoTolstoy
0 replies
1d6h

I should maybe preface this by saying that I probably agree that this is the way this will shake out ultimately.

But I would also say that multiple odd post-processing steps (obviously completely obscured for security reasons) bolted onto a giant black box model will erode trust in the results. If a robot was unveiled and the question "what prevents this robot from using its superhuman strength to smash my head in" got the answer "don't worry, there is a post-processing step in the robot's brain whereby if it detects a desire to kill we just cancel that", that would be a little disconcerting.

The more satisfying solution is: the model / robot is designed to not be able to produce specific images / to smash human heads in. It just might not really be possible.

Krasnol
0 replies
1d6h

I don't even think they want to fix it. They just want to see money. Some form of "tax" per prompt or other ridiculous "models".

This is such a nice, profitable opportunity. Much better than pay per view or subscription models for humans.

vimax
3 replies
1d6h

Maybe Disney and the record labels shouldn't be claiming so much of public culture as their own.

dkjaudyeqooe
2 replies
1d5h

If they created it, they own it, why shouldn't they be claiming that?

danielbln
0 replies
1d4h

Copyright is not a universal axiom. Corporations lobbied for highly unreasonable copyright extensions to bolster their profits. Most of that stuff should have long since entered the public domain.

baobabKoodaa
0 replies
1d5h

Record labels aren't generally considered the "creators" of music, although they sometimes are to some extent.

And Disney bought most of its iconic properties; it didn't build them in-house.

ultrablack
3 replies
1d2h

We are all trained on copyrighted input. That is not a problem. What is a problem is if you reproduce it and try to claim copyright for that. If someone wants to create their own image of Mario in an AI, so what?

gumballindie
2 replies
1d2h

We are not machines. The argument that procedural text and image generators are similar to us is ridiculous. The issue is not whether people can generate images. The issue is ai companies stealing content and reselling it. That needs to stop.

rvz
1 replies
1d2h

The argument that procedural text and image generators are similar to us is ridiculous.

Agreed. The amount of endless whataboutisms AI proponents have to continuously invent around comparing humans and AI machines as having 'similar' characteristics to justify mass copyright violation is just absolutely laughable.

The issue is ai companies stealing content and reselling it.

The key point here is the 'reselling' part, without credit, attribution or permission to do so, and then claiming the creation as one's own. The fact that these AI companies won't disclose their training data tells us that they know they are in deep trouble. The so-called 'fair use' excuses aren't going to work this time.

Given that Apple paid news orgs to train on their licensed data, the lawsuit with the NYT should not be a surprise for OpenAI and Microsoft (as they knew that they needed to pay for a license to access and train on the data) and will eventually end with a licensing deal with the NYT.

noitpmeder
0 replies
20h49m

It's like the ones arguing in favor of blatant AI theft have a monetary incentive to see that they succeed.

Or, gasp, they are LLMs themselves.

pointlessone
3 replies
1d6h

If any of those results were deemed infringing, we could bid farewell to all fanart ever. Likewise to all fanfiction, or any original work that was merely heavily inspired by previous works; a lot of modern fantasy is basically Tolkien fan fiction. Or is Gandalf close enough to Merlin to claim prior art that is in the public domain?

whywhywhywhy
0 replies
1d3h

Weirdly some of the most vocal about this have been professional illustrators and artists who make a lot of money off what is essentially selling fanart commissions, not sure if they're understanding it could impact their work if they get what they want.

numpad0
0 replies
1d4h

Fanfiction is controlled by unspoken common-sense rules and protected by copyright law. It's almost weird to hear the fan content world described as a wild west; it feels like listening to a caveman's description of an Apple Store. No, they're not living there, they're - have you ever used currency? The round medals that people keep in pockets and trays?

dkjaudyeqooe
0 replies
1d5h

It's fair use, whereas generative AI doesn't satisfy the same criteria. From https://www.ogcsolutions.com/is-fan-art-copyright-infringeme... :

For fan art to fall under the fair use exception, it must meet all four of the following criteria:

It must be transformative, meaning it adds something new and different to the original work.

It can’t be used for commercial purposes.

It must not negatively impact the market for the original work.

And finally, it must be created for a limited and non-exclusive audience.

appplication
3 replies
14h29m

There are an alarming number of responses seemingly completely unaware of the core thrust of the article (and NYT lawsuit). ChatGPT was able to reproduce and publish significant portions of NYT articles, completely verbatim for hundred-to-thousand word stretches.

It’s not derivative work. We’re way past that. NYT has an exceptionally strong case here and anyone arguing about the merits of copyright is way off the mark. This court case is not going single-handedly to undo copyright. OpenAI has very little going for them other than “this is new, how were we to know it could do this”. So knowing that, the currently trained models are in a very sticky situation.

Further, I don’t see NYT settling. The implications are too large, and if they settle with OpenAI, they will have a similar case pop up with every other model. And every other publisher of digital content with have a similarly merited case. This is an inflection point for generative AI, and it’s looking like it will be either much more expensive or much more limited than we originally thought.

A side effect of this: I am predicting that we will start to see a rise in “pirate” models. Models who eschew all legality, who are trained in a distributed fashion, and whose weights are published not by corporations but by collectives (e.g. torrent models). There is a good chance we see these surpass the official “well behaved” models in effectiveness. It will be an interesting next few years to see this play out.

benlivengood
0 replies
14h0m

My guess is that OpenAI will be able to basically copy Google/YouTube on this and offer a system like Content ID. Specifically, ChatGPT doesn't reproduce copyrighted works by default; only by request/action of a third-party user, much like YouTube serving whatever videos people upload. It wasn't the intent of OpenAI to infringe copyright, and in fact a lot of, or most, researchers believed the models were not overfitted enough to reproduce significant portions of arbitrary works.

RestlessAPI
0 replies
14h5m

Such a thing happened with DALL-E, Midjourney, and Stable Diffusion.

Stable Diffusion, when used to its fullest with things like ControlNet and LoRAs, blows the pants off the other proprietary models.

NemoNobody
0 replies
10h4m

Well, I know exactly what the NYT has: a very strong case. I think this case OUGHT to upend copyright law - it's terribly broken and has been for years.

Essentially, if you don't have a massive corp behind a copyright it doesn't mean anything, and if a corp is behind something it can be locked up forever, regardless of any limits said copyrights are supposed to have.

The NYT loses nothing from OpenAI using old news - they still lose nothing if OpenAI can reproduce those articles verbatim.

If the NYT wins, we lose lots. I think it's time to revisit copyright; we can do that, you know. It's rather dated and could use an update regardless.

Alifatisk
3 replies
1d6h

Did ClosedAi (OpenAi) ever confirm or deny that they trained their models on copyrighted materials?

danielbln
1 replies
1d4h

Is "Closed AI" the new "Micro$oft"?

Alifatisk
0 replies
1d4h

Yes

noitpmeder
0 replies
20h46m

They have not revealed the full extent of their training set. And they'll never do it without a court order, because it would quickly reveal the number of items inside that they have no legal right to use.

wslh
2 replies
1d

While different, I see this discussion about AI and copyright as an evolution of the war that never was: Google/FB becoming the portal/proxy for content. While it is not generative AI, you can find copyrighted images just using Google Images, or as a snippet in the normal search engine. I mention Google because it is the de facto monopoly, but this applies to a lot of aggregators.

I know we are talking about different technologies, but it seems all these people were very silent before and now find some opportunity in having this war with OpenAI (not an endorsement) while not fighting the others.

I am not making a statement about the morals of AI and aggregators/search engines (a super interesting discussion that in a way has been happening for a long time), but I am surprised that organizations are "just" waking up. It seems they just see it as a much simpler and cheaper fight.

theamk
0 replies
1d

The thing with Google is it is super trivial to exclude your text - a tag on the page, a header on the server, etc. So all the conversations about Google "stealing" content always seemed pretty silly to me.

Compared to that, AI offers no way to opt out, which is a big difference.

dmbche
0 replies
1d

Personal use of copyrighted material is fine - there is no breach of copyright when you download a picture from Google for yourself.

If you use it commercially then there is a breach.

Uploading copyrighted content is a breach of copyright as well, even without commercial use.

Google/Facebook are hosting and giving access to a bunch of media, which might or might not be copyrighted - that's the individual's problem. They make money from ads, not from the content.

AI companies stole copyrighted media to train their commercial LLMs, and they sell the models or their products and make a profit.

I don't think it's the same.

redcobra762
2 replies
1d4h

This operates similarly to importing an image into Photoshop. You can do whatever you like with images privately, or with gen AI, but the game ends when you try to use those images commercially.

Not sure how this “gets worse” or better for anyone. The current state of things seems generally fine, and there’s a real possibility the courts see it that way too.

throwoutway
0 replies
1d4h

but the game ends when you try to use those images commercially.

Right now, it feels more like it's called "innovation" and "entrepreneurship" than the end-game, as long as you have billions invested. Waiting on the courts to decide this issue

joenot443
0 replies
1d4h

There are some images you can't import into Photoshop, most notably being scans of legal tender. This is for a pretty obvious and on-the-nose use case, but perhaps we'll see GenAI given similar guardrails.

qgin
2 replies
1d

Things are about to get a lot worse for generative AI in the United States

They are about to be infinitely better for generative AI in China.

noitpmeder
1 replies
20h57m

China has massive IP theft and child labour issues that arguably give them competitive advantages too. Should we let those slide as well?

qgin
0 replies
18h27m

Each issue is unique. We can evaluate training of AI models on its own without needing to accept child labor.

intrasight
2 replies
1d5h

Just make LLMs be like your average human and forget details. I know it's easier said than done, but so are many things worth doing. I can't plagiarize - my language and visual memory doesn't work that way. Such an LLM would have to "create" an answer from fuzzier memory.

qolop
1 replies
1d5h

The class of models that Yann Lecun is bullish on (look up I-JEPA) do exactly this.

intrasight
0 replies
14h12m

But I-JEPA is non-generative. It does semantic image interpretation.

Okay, I guess it is related as my brain only does semantic image interpretation. (edit: my brain can create images, but only when I'm unconscious)

So with such a model, if you ask it to create an image, it would first create a semantic grammatical model of what you had asked for, and then perhaps draw it with colored pencils. I sort of like that. It's all that I could do. And it would be unlikely to violate any copyrights.

golol
2 replies
1d3h

How about this: image generators should be treated like a random Google image search. Both sample from the distribution of publicly viewable images; Google does it exactly, while image generators do it in an interpolative way. Google Images produces copyrighted works most of the time, an image generator only sometimes. Neither should be liable if someone sells a produced copyrighted work to someone else.

elmomle
1 replies
1d3h

But when Google image search produces a result, the question of whether it is copyrighted is something I can generally figure out in a matter of seconds or minutes. This is not so for image generators.

golol
0 replies
1d3h

But isn't that the user's problem? Also, with a smarter reverse image search you can detect an infringement with similar reliability to using Google Images.

airstrike
2 replies
1d

I have no patriotic skin in the game, being neither American, nor European, nor Chinese, but this copyright issue seems overblown to me and like the perfect way to hand the leadership in generative AI over to China

startupsfail
1 replies
1d

Would you prefer to live under Chino-Russia dominating the technology sector or EU-US?

airstrike
0 replies
1d

EU-US, but were I consulted by the Chino-Russia camp, I'd say encouraging this debate is in our best interest and we should do our best to promote the issue as a real "danger"

RandomGerm4n
2 replies
1d6h

Perhaps we should simply take this as an opportunity to finally abolish copyright. Smaller artists mainly earn their money with commissions: they are paid to do a very specific thing, and whether there is a copyright on the result is irrelevant. Someone who would "steal" the image and use it without payment apparently has lower requirements anyway; they could have simply taken any AI image, so the artist in the scenario would not have received any money from that person regardless.

Apart from this, it is mainly large companies that benefit from copyright laws. Why should we have laws that restrict progress just so large capitalist companies can maximize their profits?

CaptainFever
1 replies
1d5h

Exactly. All of these just exposes the absurdity that is copyright laws. It happened before with the Internet and online piracy too, when redistribution became free and easy, yet the corporations and copyright holders refused to budge so they can retain their profits.

kayodelycaon
0 replies
1d2h

Here’s what happens with no copyright:

No one will have any right to their own creations. Anything an individual makes will belong to everyone. And since no attribution is required, no one will know who made it. An average artist’s value to society goes from low to non-existent.

In this world, big corporations will take everything created, claim it as their own, and profit from it.

Right now, big corporations using other people's work unattributed or unlicensed is unethical, because copyright exists. Remove that, and it becomes expected that every thought and idea you express belongs to whoever can make the most profit from it.

Paradigma11
2 replies
1d4h

So, whats the plan?

Content creators/artists compete globally. The only thing harsh regulations will do is create an unlevel playing field where artists from countries that don't care will have big advantages over artists from the West, who will be driven into illegality to compete.

In the end, products will have to be classified anyway as to whether they infringe copyright and/or were built by an LLM - most likely automated by another LLM.

sensanaty
1 replies
1d4h

Wouldn't the ones in the West with presumably stronger copyright laws be in a better position, since the trillion dollar megacorporations using their works have to actually pay them, whereas in places where copyright is ignored those creators just get all their shit stolen without credit even being given?

Paradigma11
0 replies
1d4h

Nothing will be stolen. Artists will use the same tools to check whether their work is infringing that those companies and rights holders have. There will be no Coca-Cola logo, Super Mario, or C-3PO in that work.

The artists in the West won't get paid, because they won't get any jobs and those in other countries will. Maybe for less, because they are more productive and the market is saturated, or maybe more demand will be created due to lower prices.

tim333
1 replies
22h29m

They are just going to have to inform the AI in some sense of the current copyright situation and ask it not to infringe.

It's the same for human writers. If you are writing an article for Wikipedia, say, you should read the relevant source articles and then rewrite them in a way that isn't copy-and-paste beyond a few words.

noitpmeder
0 replies
21h17m

OK, I'll bite. Let's assume you've informed the current models about copyright and asked them not to infringe...

What happens when they continue to do so?

t_mann
1 replies
1d6h

The article kind of amplified my regret/anxiety over not getting a copy of books3 and the like while it was easy. I didn't have an immediate use case, and I still don't; I thought I'd wait until I actually needed it, but it feels like a window is closing here.

sjfjsjdjwvwvc
0 replies
1d6h

Don’t worry, there are many people out there who have copies of it all; there is no way they'll get the cat back in the bag even if all governments work together on this.

But yeah, get your own copies whenever possible.

quonn
1 replies
1d6h

Maybe the way to go is to do pre-training on copyrighted data, then thoroughly shake things up so that hopefully only some useful abstract structure of world knowledge remains, and then continue training on carefully selected licensed data.
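
A toy sketch of that two-stage recipe, assuming weight noise as the "shaking up" step. The model, data, and noise scale are all illustrative stand-ins, and whether this actually strips memorized content is exactly the open question:

    # Stage 1: pre-train on the broad (copyrighted) corpus. Stage 2: perturb
    # the weights to degrade verbatim memorization, then continue training on
    # licensed data only. Everything here is a toy stand-in.
    import torch
    import torch.nn as nn

    model = nn.Linear(64, 64)  # stand-in for a real language model
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)

    def train(steps):
        for _ in range(steps):
            x = torch.randn(8, 64)  # stand-in batch
            loss = nn.functional.mse_loss(model(x), x)
            opt.zero_grad()
            loss.backward()
            opt.step()

    train(steps=100)  # stage 1: broad corpus

    with torch.no_grad():  # "shake things up": inject weight noise
        for p in model.parameters():
            p.add_(0.01 * torch.randn_like(p))

    train(steps=100)  # stage 2: carefully selected licensed data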

disgruntledphd2
0 replies
1d5h

If the models weren't just doing massively complicated interpolation then this would probably work.

Honestly the only way to deal with this is to change the training data and retrain everything (probably at the cost of performance).

mensetmanusman
1 replies
1d4h

The world is a big place.

China can't produce LLMs because of inconvenient truths.

The US can't produce LLMs because of copyright.

Decentralized open source LLMs might exist that could work, but they won't have the giant GPU clusters.

A rich country with lax rule of law wins? Maybe that's why Sam went to the Saudis?

pelorat
0 replies
1d1h

jlnthws
1 replies
22h53m

We could take inspiration from the record industry's case against Napster, or cabs vs. Uber. Both parties are somehow abusing their position, but the world is moving on. Rent-seeking is probably not an absolute source of wealth after all.

RecycledEle
0 replies
22h49m

Rent seeking should be a capital crime.

iainctduncan
1 replies
1d

I am constantly surprised by the amount of apologizing for generative AI infringement here. The fact that it's already being done and is a technical breakthrough is irrelevant to existing copyright law. "We are big and innovative" may hold weight with legislators, but it won't with the courts.

Remember when everyone and their dog discovered sampling in the late '80s and they all thought they could get away with it because it didn't seem like infringement to the samplers? The courts had no qualms about slapping record labels for putting out records with unlicensed samples in them. Albums even got pulled off shelves while licenses were sorted out.

These companies are charging for a service that returns copyrighted content, full stop. You can't do that whether you are AI or someone drawing Mario and selling the pictures on iStock, or putting out records that sample someone else's work without permission. It took a while in the case of sampling, but it sure as hell happened.

deputy
0 replies
1d

[deleted]

dmbche
1 replies
1d

Hey, so the problem isn't the output of the LLMs but the input: the data they are trained on is stolen (big surprise, you can't claim fair use when using something commercially, like training your LLM).

The output is irrelevant.

Edit1: If you want to verify this, check out all the lawsuits against AI companies: it's always about using their copyrighted works. Any discussion about the output is about the amount of damage done to the copyright holder, not whether damage exists at all.

kromem
0 replies
20h9m

Here's one of the senior legal peeps at the EFF who has litigated IP cases talking about the issue: https://www.eff.org/deeplinks/2023/04/how-we-think-about-cop...

It's not as clear cut as you think it is.

beginning_end
1 replies
1d6h

This perspective on regulation was interesting: https://drafts.interfluidity.com/2023/12/28/how-to-regulate-...

    "Congress should declare that big-data AI models do not infringe copyright, but are inherently in the public domain.

    Congress should declare that use of AI tools will be an aggravating rather than mitigating factor in determinations of civil and criminal liability."

troupo
0 replies
1d6h

OpenAI and others: AI should be regulated!

Governments start regulating and companies file copyright lawsuits...

OpenAI: NOT LIKE THAT

amai
1 replies
23h40m

Should the NYT not sue https://commoncrawl.org/ ? OpenAI just used the data from commoncrawl for training.

noitpmeder
0 replies
20h58m

Is that true? Has OpenAI revealed exactly what is in their training set?

SKILNER
1 replies
23h56m

I don't understand the glee so many people have over this. I love being able to use Generative AI tools. How is it different than if I asked a person to draw these pictures for me? I know someone will gleefully clobber this question with a legal answer, but God, let's move forward, hunh?

wharvle
0 replies
23h39m

A bunch of rich people are raiding a little bit of work, each, from a whole bunch of people, then walling it off so they can get richer.

I’d not have a problem with this, personally, if their models were as available as the stuff they took from others. Instead it’s take, take, take… now wait a minute, that pile of loot I stole is mine!

Hugsun
1 replies
1d6h

There are good arguments in this thread for the copyright infringement belonging to the user, not the model maker.

One issue with that is that there is not a reliable way to determine if copyright is being infringed.

Even if models could be used responsibly, there might not be a reasonable expectation that most people will use them that way, if infringement is so easy and avoiding it relatively hard.

I'm not sure what legal prescriptions should be made on this basis, but it's an interesting thought.

yokem55
0 replies
22h59m

BitTorrent clients are almost exclusively used for copyright infringement, yet they are perfectly legal to develop and distribute. On the flip side, operating a company premised on easy copyright infringement was ruled to be illegal (Napster).

Where we might end up is in a situation where it is legal to train a model. Legal to produce software for using the model to generate content. Legal to distribute all of the above. But offering a standing service that does the above and is capable of creating infringing work is illegal. Great news for llama hobbyists. Bad news for ChatGPT.

AlienRobot
1 replies
1d5h

An argument I've seen made in favor of AI in past threads about this is that "scraping is legal."

Yeah, downloading the content of a webpage may be legal, but redistributing it isn't.

I wish people stopped trying to make these things seem more important than they really are just because IT people call them "technologies". Blockchain isn't a technology. HTML isn't a technology. React isn't a technology. And AI is now not a technology.

When I see ChatGPT or OpenAI, I don't think of "technology". I think of a program. Software. Because that's what it is. You don't say "none of the laws that exist in this world apply to this" every time you release new software.

I bet many people can't tell the difference between a quick answer from Google and a text generated by ChatGPT on Bing. They just see the output.

All that amazing capability of generative AI? That got old fast. It was groundbreaking for one instant. Now it's just an app that generates images. Just another piece of software. Nothing special about it.

Torrenting and other p2p file transfer protocols didn't get a pass for inventing groundbreaking ways to break the law. I don't think OpenAI will get a pass for doing the same.

danielbln
0 replies
1d4h

All that amazing capability of generative AI? That got old fast. It was groundbreaking for one instant. Now it's just an app that generates images. Just another piece of software. Nothing special about it.

Speak for yourself; personally, I still find it groundbreaking, and while the magic won't last forever, it will remain so, especially considering that technological progress and development will continue well beyond what we have today.

zanfr
0 replies
1d1h

No matter how you look at it, the cat is out of the bag. OpenAI could be censored, but you can't censor open source.

yieldcrv
0 replies
1d1h

a lot worse for cloud providers hosting generative AI

the models can be fine

wouldbecouldbe
0 replies
1d6h

What about non-MIT source code? It's 100% trained on that as well.

whodidntante
0 replies
1d4h

Simple solution, when gpt-5 comes out, just rename it Claudine, and the NYT will drop their suit

wayeq
0 replies
23h0m

We need to figure out how to ever so gradually move toward a post-copyright economy.

ur-whale
0 replies
1d3h

It's not for generative AI that things are about to get a lot worse.

It is in fact the very notion of copyright that is breathing its last breath, and it is fantastic to be alive to see it happen.

throwuwu
0 replies
1d1h

Copyright is fucked. Even if OpenAI somehow loses this and has to delete GPT-4 and their training data, the generative AI cat is so far out of the bag that it's gone on to live a full life and have many grandkittens. It's already easy to install and run generative models, and it's only going to get easier while the models keep getting better. These lawsuits are futile and won't matter in 2 years or less.

smrtinsert
0 replies
1d

The NYTimes case is a clear one because they are delivering nearly the same content as an end product to users. The others seem like dead ends: the infringer would be the prompter, not the AI, which operates more like a search engine. This is Napster all over again, what a phenomenal waste of time and money, where the artist will definitely come out with zero at the end of it and a few corporations control everything. Not to mention, there's nothing stopping anyone from releasing a tool that will crawl all the SpongeBobs, generate a model for you, and let you produce copyright-infringing material locally to your heart's content. You could drown yourself in local SpongeBobs.

smitty1e
0 replies
1d6h

The DALL-E/*GPT revolution sounds like the death of personal and corporate property.

That's gonna leave a Marx[1].

[1] https://youtu.be/7WDKivqFOgA?si=nWq5aeKA4dLytX3Z

skybrian
0 replies
1d4h

I wonder what Adobe Firefly does with these prompts?

sjfjsjdjwvwvc
0 replies
1d6h

Please ban all these AI companies, at this point I have enough OSS models, don’t really need any hosted service anymore.

IMO would be best if this stays a highly illegal technology that is only available to a few weirdo nerds /s

sjducb
0 replies
5h18m

I think it’s a question of what counts as publication.

I think that an AI model is analogous to an employee. Imagine I ask my employee to write an article, and they just copy an existing one from the Times. That's plagiarism and bad work, not copyright infringement.

If I then decide to publish the plagiarised article, then I have committed copyright infringement.

I once ran into this exact problem with a human. I hired a designer to make some artwork for an app. When I launched the app it turned out that the human had just copied the artwork from another game. It’s my problem that I hired an idiot, and my problem that my app was infringing the copyright of another app. (We redesigned the graphics very quickly)

shkkmo
0 replies
22h41m

It seems like this article makes a basic copyright mistake. I don't see any evidence that these are "reproductions" of source material, since no source image is linked to compare.

Instead, these are derivative works. We already have a flourishing culture of derivative works, such as fan art, that exists in various shades of legal greyness.

Some derivative works are fair use, some are not.

The position of the author here seems to be that generative AI should not be capable of creating any derivative works, or should only be able to do so if it can accurately identify which are fair use and which aren't (which seems like an impossibly tall bar). This stance seems like a giant attack on fair use that significantly expands the power of copyright.

To me, the takeaway is different. This makes clear that there is currently a risk, when using AI-generated art, that you could unintentionally create and publish a derivative work without ever evaluating whether that work constitutes fair use.

rolisz
0 replies
1d5h

Simple fix (at least for ChatGPT): ask it to avoid drawings with similarities to copyrighted characters.
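
Concretely, that "fix" is just a guard clause prepended to the prompt, e.g. with the OpenAI Python client. The wording and prompt below are illustrative, and nothing guarantees the model actually complies:

    # Prepend an "avoid copyrighted characters" instruction to an image prompt.
    # Whether the model honors the instruction is exactly the open question.
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    guard = ("Do not depict any trademarked or copyrighted characters, logos, "
             "or distinctive designs; invent original ones instead. ")

    result = client.images.generate(
        model="dall-e-3",
        prompt=guard + "an animated toy in a children's bedroom",
        n=1,
    )
    print(result.data[0].url)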

roenxi
0 replies
1d4h

Based on the rate of progress, I think this makes little difference to AI progress in the medium to long term.

At the moment, we don't have hardware that can do what humans do (process video feed from eyeballs and build up a world model). I imagine that we'll cross that barrier cheaply in the coming decades, at which point copyright becomes moot. AIs will be able to develop their own styles and world understanding from scratch, then generate original work.

renewiltord
0 replies
1d6h

You can try, but I have Mistral on my local computer and it doesn't need the Internet. And people have pirate dumps they're going to run this stuff through.

I'll just do it myself.

pxoe
0 replies
1d3h

There's an easy fix. The easiest: just don't use data that you don't have the rights to use. Apparently that's just impossible.

"But what if we want to scrape the entire web and something makes it in anyway? See, that is impossible." Well, that's just saying "fuck it" and using bad data anyway. That's not an actual effort to not use data you can't use; there was never going to be a 'rights-cleared' way to use the entire web. That is impossible. Using a clean dataset is not impossible. It's very possible.

ponorin
0 replies
1d4h

This is exactly what I predicted: current generative AI is basically rewarded based on how much it convinces people it's the real thing. It very much has the ability to copy verbatim, unlike how most human memory works. Without a fundamental shift in machine-learning methodology, the fault can only be hidden, not solved: a cat-and-mouse game where one cat has to fight tens of thousands of mice. It's also very telling how quickly the discussion turns to "maybe society needs to adapt" when so-called technological innovation is involved. The copyright problem should be solved for artists, not for datacentres. For now it's a handful of famous IPs, but what's stopping generative AI from snatching some random indie artist's property and copying it ad infinitum?

penjelly
0 replies
1d6h

My guess is that none of this can easily be fixed.

Also my concern, except it feels like many of LLMs' "problems" can't be easily fixed.

oglop
0 replies
1d3h

So what? I feel like I'm taking crazy pills when I read these things. You all do realize the same thing happens in your mind with those same prompts, right? That's kind of how it works. Who is surprised by this? Yeah, no shit it can kinda reproduce the text it was trained on; so can I! That's how that works. And the NYT knew for a long-ass time this thing was ingesting. I literally saw this in the marketing when I signed up last year.

I wasn't shocked when I noticed I could query it about ANY math textbook I owned and it could talk with me about it. I didn't bitch and gripe; I enjoyed it and had conversations.

Anyway, I'm in the minority I guess. I love that I can talk with it about books and news.

ofslidingfeet
0 replies
18h48m

I'm still waiting for people to figure out the whole point of an automated process is that it behaves the same way each time.

octacat
0 replies
1d1h

I expect politicians will do some nice mental gymnastics regarding regulating this. All major IT companies are doing GenAI now and nobody wants to hurt the companies.

null_point
0 replies
3h28m

I suspect this may delay some short-term progress by creating pressure on AI labs to train their models on data curated or synthesized in a way that is conscious of copyright law.

There are already troves of data that are fair game for training, but even "corrupted" data sets can probably be used if used intelligently. We've already seen examples of new models effectively being trained off of GPT-4. That approach, with filters for copyrighted material, might allow for data that is sufficiently "scrambled". Not to say building such a filter is definitely easy, but it seems plausible.
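
One cheap version of such a filter is n-gram overlap against an index of known copyrighted text; a sketch, where the corpus, the samples, and the threshold are all placeholder assumptions:

    # Drop synthetic training samples that share word 8-grams with a corpus
    # of protected text. Corpus, samples, and threshold are placeholders.
    def ngrams(text, n=8):
        words = text.lower().split()
        return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

    copyrighted_index = set()
    for doc in ["full text of a protected work would go here ..."]:
        copyrighted_index |= ngrams(doc)

    synthetic_samples = ["model-generated training text would go here ..."]

    def keep(sample, max_overlap=0):
        # Keep a sample only if it shares (almost) no 8-grams with the index.
        return len(ngrams(sample) & copyrighted_index) <= max_overlap

    training_set = [s for s in synthetic_samples if keep(s)]
    print(f"kept {len(training_set)} of {len(synthetic_samples)} samples")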

logicchains
0 replies
1d6h

I predict this could be a boon for generative AI because restricting it to training on copyright-expired media would produce a higher quality training corpus, as low-quality material from so long ago is unlikely to have been preserved, leaving only higher-quality material.

legendofbrando
0 replies
1d2h

Surely one answer is to train (or aggressively fine-tune) a new model that doesn't produce (or refuses to produce) these outputs and then, as already exists, augment that model's understanding of copyrighted material by having it search Bing/Google as a RAG process that requires the end user to log into their paid accounts at the New York Times (and elsewhere). This broadly replicates the process a person could follow today: reading the internet and summarizing it while paying rights holders.

Expensive to do, but hardly the end of generative AI or OpenAI if that's the difference between having a business and being sued out of existence. Never underestimate people who have a clear economic interest, especially when their own existence is at stake.
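
The retrieval step in that pipeline is ordinary RAG plumbing. A minimal sketch with an in-memory corpus and bag-of-words cosine similarity; the corpus (which stands in for articles fetched with the end user's own paid credentials) and the prompt format are assumptions:

    # Retrieve the most relevant licensed document for a query, then hand it
    # to the model as context. The corpus stands in for articles fetched with
    # the end user's own paid credentials.
    import math
    from collections import Counter

    corpus = {
        "nyt-1": "text of an article the user is entitled to read ...",
        "nyt-2": "text of another article behind the user's paid sub ...",
    }

    def cosine(a, b):
        ca, cb = Counter(a.lower().split()), Counter(b.lower().split())
        dot = sum(ca[w] * cb[w] for w in ca)
        na = math.sqrt(sum(v * v for v in ca.values()))
        nb = math.sqrt(sum(v * v for v in cb.values()))
        return dot / (na * nb) if na and nb else 0.0

    def retrieve(query):
        return max(corpus, key=lambda doc_id: cosine(query, corpus[doc_id]))

    query = "what did the article say about the lawsuit?"
    doc_id = retrieve(query)
    prompt = f"Answer using only this source:\n{corpus[doc_id]}\n\nQ: {query}"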

karmakaze
0 replies
22h56m

The real 'problem' is how we navigate the present and near future, where much more than physical labor is being automated. This is where we need sustainable solutions. The rough road along the way should also be smoothed out so as not to disrupt so many lives, but it's good to keep in perspective what we're doing and why.

karmakaze
0 replies
23h7m

It shouldn't matter how the images etc. are created. The problem comes about when they're used as original work by the person doing so.

Imagine instead of AI/ML, we have a mechanical-turk-like service that produces output from descriptions. The service makes no claims that the generated outputs are not similar to any copyrighted works. The only claim the service makes is that they themselves claim no copyright on the output. It's then up to the user of the service to determine if the output is suitable for their intended use.

Whether such a service itself is legal is a separate matter. For that matter, say you outsourced the artwork to a person who again gave you infringing work: the user of that output is still in violation. With AI/ML we're basically outsourcing to a 'service' that is known to sometimes output copyrighted work, so the user, knowing that, is responsible for fair usage.

jdjdjdkdksmdnd
0 replies
1d6h

People are so naive. AI is a matter of national security now. It's over. They exposed civilians to nuclear radiation for the nuclear bomb, and you think the state would let this get in the way of the AI arms race it is anxiously anticipating? Nope.

hahajk
0 replies
1d3h

And a whole universe of potential trademark infringements with this single two-word prompt: animated toys

If you flood the market and dominate children's culture with toys from your TV shows, you absolutely cannot complain when your toys are considered iconic enough to be the generic "animated toy". These images don't replace or substitute the things they are depicting.

gfodor
0 replies
1d

Gary Marcus is the master of AI FUD

freddealmeida
0 replies
1d6h

Not in Japan.

efields
0 replies
1d4h

It’s more interesting to me how these entities that operate the models will start making money from them. They are a money pit, and there aren't enough $20/month subscribers on Earth to support them.

Enterprises that make content with this also don’t want to infringe on copyright. The AI companies don’t have a good story here. The value has not become evident after years.

digitcatphd
0 replies
1d3h

Rather than attempting to combat our obvious future, they should spend this effort to find ways to monetize and succeed in this new environment.

dawnim
0 replies
1d6h

This feels like another area where piracy will surely be superior if things like this land on the disallowed side of regulation. The model trained on all data will outperform the model trained on a legal subset of data. Whether or not you use it to produce potentially infringing content is a separate point. Performance will likely improve from having references to copyrighted material, and people capable of doing so, myself included, would probably prefer to interact with the unrestricted model. Perhaps it's time to update the laws, or at least move liability from the creator of the model to the user. No one is going after pencil makers, but I can draw a pretty good Mickey Mouse with access to one. Me generating C-3PO and claiming ownership feels like my problem, not OpenAI's.

dang
0 replies
23h42m

Related ongoing thread:

NY times is asking that all LLMs trained on Times data be destroyed - https://news.ycombinator.com/item?id=38816944 - Dec 2023 (93 comments)

Also:

NY Times copyright suit wants OpenAI to delete all GPT instances - https://news.ycombinator.com/item?id=38790255 - Dec 2023 (870 comments)

NYT sues OpenAI, Microsoft over 'millions of articles' used to train ChatGPT - https://news.ycombinator.com/item?id=38784194 - Dec 2023 (84 comments)

The New York Times is suing OpenAI and Microsoft for copyright infringement - https://news.ycombinator.com/item?id=38781941 - Dec 2023 (861 comments)

The Times Sues OpenAI and Microsoft Over A.I.’s Use of Copyrighted Work - https://news.ycombinator.com/item?id=38781863 - Dec 2023 (11 comments)

caeril
0 replies
1d3h

Wow. I feel really sorry for these giant corporations who have wielded armies of lawyers against fanfic artists to prevent fair use, and to prevent trademarks and patents from expiring on the timelines enshrined by law.

Can we all have a moment of silence for poor Bob Iger? Maybe we can start a GoFundMe to help him out?

bambax
0 replies
1d6h

This only mentions ChatGPT (and M$ by association) but how would this impact "open" models? Even if their makers are somehow prevented from updating them, the models themselves are already in the wild...?

asylteltine
0 replies
1d4h

I certainly hope so. You can’t just steal content and call it “””AI”””

amelius
0 replies
1d6h

Just like we have the uncanny valley for robots, LLMs are in the unoriginality valley. Only when we get out of it will the copyright issues go away.

airesearcher
0 replies
1d6h

I think there is another way to solve this. Someone should train a model on copyrighted images, then use it as a second pass on any image generated by the primary model, to check whether the output might contain copyrighted imagery and blur the offending parts (or change them sufficiently).

Another change could be to the license agreement of LLMs - they could have the user assume liability for any material produced instead of the provider assuming liability. The user would agree that getting the rights for any copies and distribution of copyrighted materials is their sole responsibility instead of the provider.
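
The first idea is essentially a reference-index filter. Here is a sketch using CLIP-style image embeddings from the sentence-transformers package; the reference index, the similarity threshold, and blurring the whole image rather than just the matching region are simplifying assumptions (region-level blurring would need localization on top):

    # Second pass over a generated image: embed it, compare against an index
    # of known copyrighted images, and blur the output if it's too close to
    # any of them. Paths, model choice, and threshold are assumptions.
    from PIL import Image, ImageFilter
    from sentence_transformers import SentenceTransformer, util

    model = SentenceTransformer("clip-ViT-B-32")  # CLIP-style image encoder

    reference_paths = ["mario.png", "c3po.png"]   # placeholder index
    reference_embs = model.encode([Image.open(p) for p in reference_paths])

    def screen(path, threshold=0.9):
        image = Image.open(path)
        score = float(util.cos_sim(model.encode(image), reference_embs).max())
        if score > threshold:
            # Too similar to something in the index: blur instead of shipping.
            return image.filter(ImageFilter.GaussianBlur(radius=12))
        return image

    safe_image = screen("generated.png")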

_giorgio_
0 replies
1d1h

This guy has built a career around nonsensical, catastrophic predictions.

Everything he sees has mysterious flaws that never materialize.

SubiculumCode
0 replies
1d6h

Attribution weights could be the basis of a new type of copyright asset licensing scheme. For all those tech employees who fed the company's model: a license in perpetuity to at least a portion of that value... but only if you fight for it. They are training to replace you, watching your every move and your thought processes, ready to make you a function call.

RecycledEle
0 replies
22h50m

If we get rid of unconstitutional copyrights in the US, this goes away.

Recall that according to the US Constitution, copyright can only cover "science and the useful arts."

Alternately, we could restore a reasonable limit to the duration of copyrights, like 14 years.

Log_out_
0 replies
1d2h

That sound, as if layers and layers of rentier aristocracy were being forced to work again against their will.

KETpXDDzR
0 replies
18h44m

I'd expect "Open"AI et al. to lobby heavily for an "AI-generated content is excluded from copyright infringement" rule. I think it's possible that they'll introduce a "generative AI" tax: charge x cents per generated text/image and distribute the fund to all media companies.

In Germany you pay some amount extra on top of the sales price of anything that can store data (CDs, DVDs, USB sticks, HDDs, ...). This is then distributed to all companies that could be impacted by software piracy. I'm still not sure if that's legal, considering the Geneva Convention disallows collective punishment.

Joel_Mckay
0 replies
1d6h

If ML cannot create copyrightable or patentable material under current legal precedent, then shouldn't the prompt output be considered public domain regardless of content semblance?

The paradox is that it could still violate trademarks due to similarity, but likely cannot infringe copyright under prior legal opinion... if at least 80% different from prior art. The lawyers are likely going to have to do a special firm survey to figure this one out.

Bag of popcorn ready =)

Avicebron
0 replies
1d5h

I'm surprised this is presented as a revelation. I did pretty much this same experiment ages ago as part of a suite of tests comparing the efficacy of different-sized models.

AC_8675309
0 replies
1d3h

So the models overfit the training data, essentially memorizing, instead of generalizing?

8note
0 replies
11h7m

"from classic sci-fi movie"

How could you put that as the prompt without intending to infringe? Anything pulled from a classic sci-fi movie would be infringement. The term droid is also star wars specific?

Id consider the "red soda" one as grounds that the Coca-Cola brand has become generic and that it's synonymous with soda. Same thing with Mario too. There is so much non-nintendo content made featuring Mario the plumber that you could get that without training directly on Nintendo's artwork

1shooner
0 replies
1d1h

Imagine a future where copyright registration involves contributing your IP to a public adversarial model, which is then a regulated layer in future generative model licensing.