
Stable-Audio-Demo

shon
77 replies
12h37m

Interestingly, Ed Newton-Rex, the person hired to build Stable Audio, quit shortly after it was released due to concerns around copyright and the training data being used.

He’s since founded https://www.fairlytrained.org/

Reference: https://x.com/ednewtonrex

doctorpangloss
67 replies
12h20m

For generative models, if the model authors do not publish the architecture of their model, and the model transforms text into another kind of media, you can assume that they have delegated some part of their model to a text encoder or similar component trained on data that they do not have an express license to.

Even for rightsholders with tens of millions to hundreds of millions of library items, like images or audio snippets, the performance of the encoder or similar component in text-to-X generative models is too poor when trained on the less than a billion tokens of text in those large repositories. This includes Adobe's Firefly.

It is also a misconception that large amounts of similar data, like the kinds that appear in these libraries, are especially useful. Without a powerful text encoder, the net result is that most text-to-X models create things that look or sound very average.

The simplest way to dispel such issues is to publish the architecture of the model.

But anyway, even if it were all true, the only reason we are talking about diffusers, and the only reason we are paying attention to this author's work Fairly Trained, is because of someone training on data that was not expressly licensed.

sillysaurusx
65 replies
11h50m

If you require licensing fees for training data, you kill open source ML.

That’s why it’s important for OpenAI to win the upcoming court cases.

If they lose, they’ll survive. But it will be the end of open model releases.

To be clear, I don’t like the idea of companies profiting off of people’s work. I just like open source dying even less.

JoshTriplett
33 replies
9h56m

If you require licensing fees for training data, you kill open source ML.

And likely proprietary ML as well, hopefully.

(To be clear, I think AI is an absolutely incredible innovation, capable of both good and harm; I also think it's not unreasonable to expect it to play a safer, slower strategy than the Uber "break the rules to grow fast until they catch up to you" playbook.)

I'm all for eliminating copyright. Until that happens, I'm utterly opposed to AI getting a special pass to ignore it while everyone else cannot.

Fair use was intended for things like reviews, commentary, education, remixing, non-commercial use, and many other things; that doesn't make it appropriate for "slurp in the entire Internet and make billions remixing all of it at once". The commercial value of AI should utterly break the four-factor test.

Here's the four-factor test, as applied to AI:

"What is the character of the use?" - Commercial

"What is the nature of the work to be used?" - Anything and everything

"How much of the work will you use?" - All of it

"If this kind of use were widespread, what effect would it have on the market for the original or for permissions?" - Directly competes with the original, killing or devaluing large parts of it

Literally every part of the four-factor test is maximally against this being fair use. (Open Source AI fails three of four factors, and then many users of the resulting AI fail the first factor as well.)

If they lose, they’ll survive.

That seems like an open question. If they lose these court cases, setting a precedent, then there will be ten thousand more on the heels of those, and it seems questionable whether they'd survive those.

To be clear, I don’t like the idea of companies profiting off of people’s work. I just like open source dying even less.

You're positioning these as opposed because you're focused on the case of Open Source AI. There are a massive number of Open Source projects whose code is being trained on, producing AIs that launder the copyrights of those projects and ignore their licenses. I don't want Open Source projects serving as the training data for AIs that ignore their license.

sillysaurusx
7 replies
9h24m

It’s not so clear cut. Many lawyers believe all that matters is whether the output of the model is infringing. As much as people love to cite ChatGPT spitting out code that violates copyright, the vast majority of the outputs do not. Those that do are quickly clamped down on — you’ll find it hard to get Dalle to generate an image of anything Nintendo related, unless you’re using crafty language.

There’s also the moral question. Should creators have the right to prevent their bits from being copied at all? Fundamentally, people are upset that their work is being used. But "used" in this case means "copied, then transformed." There’s precedent for such copying and transformation. Fair use is only one example. You’re allowed to buy someone’s book and tear it up; that copy is yours. You can also download an image and turn it into a meme. That’s something that isn’t banned either. The question hinges on whether ML is quantitatively different, not qualitatively different. Scale matters, and it’s a difference of opinion whether the scale in this case is enough to justify banning people from training on art and source code. The courts’ opinion will have the final say.

The thing is, I basically agree with you in terms of what you want to happen. Unfortunately the most likely outcome is a world where no one except billion dollar corporations can afford to pay the fees to create useful ML models. Are you sure it’s a good outcome? The chance that OpenAI will die from lawsuits seems close to nil. Open source AI, on the other hand, will be the first on the chopping block.

raphman
3 replies
4h50m

Many lawyers believe all that matters is whether the output of the model is infringing.

What I don't understand (as a European with little knowledge of court decisions on fair use): with the same reasoning you might make software piracy a case of 'fair use', no? You take stuff someone else wrote - without their consent - and use it to create something new. The output (e.g. the artwork you create with Photoshop) is definitely not copyrighted by the manufacturer of the software. But in the case of software piracy, it is not about the output. With software, it seems clear that the act of taking something you do not have the rights for and using it for personal (financial) gain is not covered by fair use.

Why can OpenAI steal copyrighted content to create transformative works but I cannot steal Photoshop to create transformative works? What am I missing?

dannyobrien
0 replies
11m

So, fair use is seen as a balance, and generally the balance is thought of as being codified under four factors:

https://www.copyright.gov/title17/92chap1.html#107

(1) the purpose and character of the use, including whether such use is of a commercial nature or is for nonprofit educational purposes;

(2) the nature of the copyrighted work;

(3) the amount and substantiality of the portion used in relation to the copyrighted work as a whole; and

(4) the effect of the use upon the potential market for or value of the copyrighted work.

There's more detailed discussion here: https://copyright.columbia.edu/basics/fair-use.html

Ukv
0 replies
3h11m

Why can OpenAI steal copyrighted content to create transformative works but I cannot steal Photoshop to create transformative works? What am I missing?

If Photoshop was hosted online by Adobe, you would be free to do so. It's copyrighted, but you'd have an implied license to use it by the fact it's being made available to you to download. Same reason search engines can save and present cached snapshots of a website (Field v. Google).

In other situations (e.g: downloading from an unofficial source) you're right that private copying is (in the US) still prima facie copyright infringement. However, when considering a fair use defense, courts do take the distinction into strong consideration: "verbatim intermediate copying has consistently been upheld as fair use if the copy is ‘not reveal[ed] . . . to the public.’" (Authors Guild v. Google)

If you were using Photoshop in some transformative way that gives it new purpose (e.g: documenting the evolution of software UIs, rather than just making a photo with it as designed) then you may* be able to get away with downloading it from unofficial sources via a fair use defense.

*: (this is not legal advice)

JAlexoid
0 replies
3h19m

make software piracy a case of 'fair use'

That's not a good example. Making a copy of a record you own (for example, ripping an audio CD to MP3) is absolutely fair use. Giving your video game to your neighbor to play - that's also fair use.

Fair use is limited when it comes to transformative/derivative work. Similar laws are in place all over the world; it's just that in the US some of those come from case law.

With software, it seems clear that the act of taking something you do not have the rights for and using it for personal (financial) gain is not covered by fair use.

Why can OpenAI steal copyrighted content to create transformative works but I cannot steal Photoshop to create transformative works?

That's not a good analogy. The argument, that is not settled yet, is that a model doesn't contain enough copyrightable material to produce an infringing output.

Take your software example - you legally acquire Civ6, you play Civ6, you learn the concepts and the visuals of Civ6... then you take that knowledge and create a game that is similar to Civ6. If you're a copyright maximalist - then you would say that creating any games that mimic Civ6 by people who have played Civ6 is copyright infringement. Legally there are definitely lower limits to copyright - like no one owns the copyright to the phrase "Once upon a time", but there may be a copyright on "In a galaxy far far away".

navjack27
0 replies
3h32m

Dalle on Bing is happy to generate Mario and Luigi and Sonic and basically everybody from everybody without using crafty language so I'm unsure of what you're talking about.

bryanrasmussen
0 replies
8h50m

Those that do are quickly clamped down on — you’ll find it hard to get Dalle to generate an image of anything Nintendo related, unless you’re using crafty language.

Really, it seems more like someone was afraid of angering Nintendo, a corporate adversary one does not like to fight, and thus added a bunch of blocks to keep it from generating anything that offends Nintendo. That does not really translate to quickly and easily stopping and blocking offending generations across every copyrighted work in the world.

JAlexoid
0 replies
3h32m

It would be interesting to see if courts agree that training+transforming = copying.

If I paint a picture inspired by Starry Night (Van Gogh) - does that inherently infringe on the original? I looked at that painting, learned the characteristics, looked at other similar paintings and painted my own. I basically trained my brain. (and I mean the copyright, not the individual physical painting)

And I mean cases where I am not intentionally trying to recreate the original, but doing a derivative (aka inspired) work.

Because it's already settled that recreating the original from memory will infringe on copyright.

Ukv
6 replies
8h38m

Fair use was intended for things like reviews, commentary, education, remixing, non-commercial use, and many other things

"many other things" has included, for example, Google Books scanning millions of in-copyright books, storing internally them in full, and making snippets available.

The basis for copyright itself is to "promote the progress of science and useful arts". For that reason a key consideration of fair use, which you've skipped entirely, is the transformative nature of the new work. As in Campbell v. Acuff-Rose Music: "The more transformative the new work, the less will be the significance of other factors", defined as "whether the new work merely 'supersede[s] the objects' of the original creation [...] or instead adds something new".

"How much of the work will you use?" - All of it

For the substantiality factor, courts make the distinction between intermediate copying and what is ultimately made available to the public. As in Sega v. Accolade: "Accolade, a commercial competitor of Sega, engaged in wholesale copying of Sega's copyrighted code as a preliminary step in the development of a competing product" yet "where the ultimate (as opposed to direct) use is as limited as it was here, the factor is of very little weight". Or as in Authors Guild v. Google: “verbatim intermediate copying has consistently been upheld as fair use if the copy is ‘not reveal[ed] . . . to the public.’”

The factor also takes into account whether the copying was necessary for the purpose. As in Kelly v. Arriba Soft: "If the secondary user only copies as much as is necessary for his or her intended use, then this factor will not weigh against him or her"

While there are still cases of overfitting resulting in generated outputs overly similar to training data, I think it's more favorable to AI than simply "it trained on everything, so this factor is maximally against fair use".

Directly competes with the original, killing or devaluing large parts of it

The factor is specifically the effect of the use of the copyrighted work - not the extent to which your work would be devalued even if the model had not been trained on it.

TheOtherHobbes
3 replies
7h54m

None of those arguments make sense. The output of AI absolutely does supersede the objects of the original creation. If it didn't, artists wouldn't care that they were no longer able to make a living.

Substantiality of code does not apply to substantiality of style. What's being copied is look and feel, which is very much protected by copyright.

The copying clearly is necessary for the purpose. No copying, no model. The fact that the copying is then compressed after ingestion doesn't change the fact that it's necessary for the modelling process.

Last point - see first point.

IANAL, but if I was a lawyer I'd be referring back to look and feel cases. It's the essence of an artist's look and feel that's being duplicated and used for commercial gain without a license.

That's true whether it's one artist - which it can be, with added training - or thousands.

Essentially what MJ etc do is curate a library of looks and feels, and charge money for access.

It's a little more subtle than copying fixed objects, but the principle remains the same - original work is being copied and resold.

JAlexoid
1 replies
3h0m

What's being copied is _look and feel_, which is very much protected by copyright.

If that were the case, no one would be able to paint any cubist paintings. (Picasso estate would own the copyright, to this day)

It's not that clear cut, there are a lot of nuances.

UncleEntity
0 replies
1h29m

Ironically, Picasso was notorious for copying other artists' 'look and feel'...

Ukv
0 replies
7h20m

None of those arguments make sense. The output of AI absolutely does supersede the objects of the original creation. If it didn't, artists wouldn't care that they were no longer able to make a living.

The question for transformative nature is whether it merely supersedes or instead adds something new. E.g., Google Translate was trained on books/documents translated by human translators and may in part displace that need, but adds new value in on-demand translation of arbitrary text - which the static works it was trained on did not provide.

Substantiality of code does not apply to substantiality of style.

I'm not certain what you're saying here.

The copying clearly is necessary for the purpose. No copying, no model.

Which, for the substantiality factor, works in favor of the model developers.

It's the essence of an artist's look and feel that's being duplicated and used for commercial gain without a license.

Copyright protects works fixed in a tangible medium, not ideas in someone's head. It would protect a work's look/appearance (which can be an issue for AI when overfitting causes outputs that are substantially similar to a protected work), but not style or "an artist's look and feel".

JoshTriplett
1 replies
1h21m

"many other things" has included, for example, Google Books scanning millions of in-copyright books, storing internally them in full, and making snippets available.

That succeeds on a different part of the four-factor test, the degree to which it competes with / affects the market for the original.

Google Books is not automatically producing new books derived from their copies that compete with the original books.

Ukv
0 replies
6m

That succeeds on a different part of the four-factor test, the degree to which it competes with / affects the market for the original

It satisfied multiple parts of the four-factor test. It was found to satisfy the first factor due to being "highly transformative", the second factor was considered not dispositive in isolation and favoring Google when combined with its transformative purpose, and it satisfied the third factor as the usage was "necessary to achieve that purpose" - with the court making the distinction between what was copied (lots) and what is revealed to the public (limited snippets).

As you had all factors as "maximally against" fair use, do you believe that AI is significantly less transformative than Google Books? I'd say even in cases where the output is the same format as the content it was trained on, like Google Translate, it's still generally highly transformative.

the degree to which it competes with

Specifically, to be pedantic, it's the effect of the use/copying of the original copyrighted work.

andybak
5 replies
8h52m

Bear with me here. Rushed and poorly articulated post incoming...

In the broadest sense, generative AI helps achieve the same goals that copyleft licences aim for. A future where software isn't locked away in proprietary blobs and users are empowered to create, combine and modify software that they use.

Copyleft uses IP law against itself to push people to share their work. Generative AI aims to assist in writing (or generating) code and make sharing less necessary.

I argue that if you are a strong believer in the ultimate goals of copyleft licences you should also be supporting the legality of training on open source code.

TheOtherHobbes
2 replies
8h4m

The obvious difference is that copyleft is voluntary, while having your art style stolen isn't.

If an artist approached a software developer, created a painting of them using their Mac, and said "There, I've done your job for you" you'd think they were an idiot.

This is the same from the other side. The inability to understand why that's a realistic analogy does not change the fact that it is.

webmaven
0 replies
6h4m

"> The obvious difference is that copyleft is voluntary, while having your art style stolen isn't."

This is why it is important whether you consider that infringement occurs upon ingestion or output. If it only matters for outputs, then artists have a problem, since copyright doesn't protect styles at all, see for example the entire fashion industry.

There is a saving grace though: Artists can make a case that the association of their distinctive style with their name is at least potentially a violation of trademark or trade dress, especially if that association is being used to promote the outputs to the public. This is a fairly clear case of commercial substitution in the market for creating new works in that artist's style and creating confusion concerning the origin of the resulting work.

Note that the market for creating new works in a particular artist's distinctive and named style kind of goes away upon the artist's passing. What remains is the trademark issue of whether a particular work was actually created by the artist or not, which existing trademark law is well suited to policing, as long as the trademark is defended, even past the expiration of the copyright.

Meanwhile, trademark (and copyright) also apply to the subjects of works, like Nintendo's Mario or Disney's Mickey Mouse or Marvel's Iron Man. But we don't really want models to simply be forbidden from producing them as outputs, or they become useless as tools for the purpose of parody and satire, not to mention the ability to create non-commercial fan art. The potential liability for violating these trademarks by publishing works featuring those characters rests with the users rather than the tools, though, and again existing law is fairly well suited to policing the market. Similarly, celebrities' right of publicity probably shouldn't prevent models from learning what they look like or from making images that include their likeness when prompted with their name, but users better be prepared to justify publishing those results if sued.

You can also make the (technical) argument that if you just ask for an image of Wonder Woman, and you get an image that looks like Gal Gadot as Wonder Woman, that the model is overfitting. That's also the issue with the recent spate of coverage of Midjourney producing near-verbatim screenshots from movies.

It might be appropriate though to regulate commercial generative AI services to the extent of requiring them to warn users of all the potential copyright/trademark/etc. violations, if they ask for images of Taylor Swift as Elsa, or Princess Peach, or Wonder Woman, for example.

llm_trw
0 replies
7h57m

The obvious difference is that copyleft is voluntary, while having your art style stolen isn't.

What a curious type of theft where the author keeps their art and I get different art.

kmeisthax
1 replies
3h42m

The majority of AI models out there (at least by popularity / capability) are proprietary; with weights and even model architectures that are treated as trade secret. Instead of having human-written music and movies that you legally can't copy, but practically can; you now have slop-generating models that live on a cloud server you have no control over. Artists and programmers who want to actually publish something - copyright or no - now have to compete with AI spam on search engines, while ChatGPT gets to merely be "confidently wrong" because it was built on the Internet equivalent of low-background metal - pre-AI training data. Generative AI is not a road that leads to less intellectual property[0], it's just an argument for reappropriating it to whoever has the fastest GPUs.

This is contrary to the goals of the Free Software movement - and also why Free Software people were the first to complain about all the copying going on. One of the things Generative AI is really good at is plagiarism - i.e. taking someone else's work and "rewriting it" in different words. If that's fair use, then copyleft is functionally useless.

It's important to keep in mind the difference between violating the letter of the law and opposing the business interests of the people who wrote the law. Copyleft and share-alike clauses have the intention of getting in the way of copyright as an institution, but it also relies on copyright to work, which is why the clauses have power even though they violate the spirit of copyright. Generative AI might violate the letter of the law, but it's very much in the spirit of what the law wants.

[0] Cory Doctorow: "Intellectual property is any law that allows you to dictate the conduct of your competitors"

CaptainFever
0 replies
1h29m

Is FSF's stance on AI actually clear? I thought they were just upset it was made by Microsoft.

Creative Commons has been fairly pro-AI -- they have been quite balanced, actually, but they do say that opt-in is not acceptable, it should be opt-out at most. EFF is fairly pro AI too -- at least, against using copyright to legislate against it.

You shouldn't discount progress in the open model ecosystem. You can sort of pirate ChatGPT by fine-tuning on its responses, there's GPU sharing initiatives like Stable Horde, there's TabbyML which works very well nowadays, and Stable Diffusion is still the most advanced way of generating images. There's very much an anti-IP spirit going on there, which is a good thing -- it's what copyleft is there for in spirit, isn't it?

fenomas
4 replies
6h47m

Where this argument falls down for me is that "use" w.r.t. copyright means copying, and neither AI models nor their outputs include any material copied from the training data, in any usual sense. (Of course the inputs are copied during training, but those copies seem clearly ephemeral.)

Genuinely curious: for anyone who thinks AI obviously violates copyright, how do you resolve this? E.g. do you think the violation happens during training or inference? And is it the trained model, or the model output, that you think should be considered a derived work?

frabcus
3 replies
6h32m

Personally I think trained models are derived works of all the training data.

Just like a translation of a book is a derived work of the original. Or a binary compiled output is a derived work of some source code.

fenomas
1 replies
6h11m

Wikipedia:

In copyright law, a derivative work is an expressive creation that includes major copyrightable elements of ... the underlying work

A trained model fails that on two counts, doesn't it? Both the "includes" part, and the fact that a model is itself not an expressive work of authorship.

thuuuomas
0 replies
2h56m

Curating training data is an exercise in editorial judgement.

JAlexoid
0 replies
3h5m

You're trying to use words without the legal context here. The legal definition of words isn't 1-to-1 with our colloquial usage.

Translation of a book is non-transformative and retains the original author's artistic expression.

As a counter example - if you write an essay about Picasso's Guernica painting, it is derivative according to our colloquial use of the term, but legally it's an original work.

viraptor
3 replies
9h42m

"How much of the work will you use?" - All of it

That depends on the interpretation of "use", and it would be interesting to read what lawyers think. You learned the language largely from speech and copyrighted works. (All the stories, books, movies, etc. you ever read/heard) When you wrote this comment did you use all of them for that purpose? Is the case of AI different?

To be clear that's a rhetorical question - I don't expect anyone here to actually have a convincing enough argument either way.

JoshTriplett
2 replies
9h38m

Principles applied to human brains are not automatically applicable to AI training. To the best of my knowledge, there's no particular law that says a human brain is exempt from copyright, but it empirically is, because the alternative would be utterly unreasonable. No such exemption exists for AI training, nor should it.

Ideas/works/etc literally live rent-free in your head. That doesn't mean they should live rent-free in an AI's neural network.

Changing that should involve actually reducing or eliminating copyright, for everyone, not giving a special pass to AI.

JAlexoid
1 replies
2h52m

To the best of my knowledge, there's no particular law that says a human brain is exempt from copyright, but it empirically is, because the alternative would be utterly unreasonable.

The human brain most definitely is not exempt. If you read Lord of the Rings and then write down a new book with the same characters and same story line - that's plain copying (look up the etymology of the verb to copy). If you look at a painting and paint a very similar painting - that's still copying.

Human brains are the reason we have copyright. Your recital of passages from any copyrighted book would violate the copyright, if not for fair use doctrine. And it has nothing to do with whether you do it yourself, or have a TTS engine produce the sound.

JoshTriplett
0 replies
1h17m

The human brain is absolutely exempt, insofar as the copy stored in your brain does not make your brain subject to copyright, even if a subsequent work you produce might be. Nobody's filing copyright infringement claims over people's memories in and of themselves.

I'm saying that AI does not and should not automatically get the exception that a human brain does.

barrkel
1 replies
5h50m

AI is a genie that you can't really stuff back into a bottle. It's out and it's global.

If the US had tighter regulations, China or someone else will take over the market. If AI is genuinely transformative for productivity, then the US would just fall behind, sooner or later.

beepbooptheory
0 replies
5h12m

Then let them! If another country put forward tighter regulations to help actual people over and above the state that holds them, then that is good in itself, and either way will pay for itself. Why are we worried about China or whoever taking over the market of something that we see has bad effects?

Like, we see this line everywhere now, and it simply doesn't make sense. At some point you just have to believe something, be principled. Treating the entire world as this zero-sum deadlock of "progress" does nothing but prevent one from actually being critical about anything.

This would-be Oppenheimer cosplay is growing really old in these discussions.

magicalhippo
0 replies
8h59m

"What is the character of the use?" - Commercial

Your first factor seems to be nothing like the one Stanford has in its guidelines[1], which they call the transformative factor:

In a 1994 case, the Supreme Court emphasized this first factor as being an important indicator of fair use. At issue is whether the material has been used to help create something new or merely copied verbatim into another work.

LLMs mostly create something new, but sometimes seem to be able to regurgitate passages verbatim, so I can see arguments for and against; to my untrained eyes it doesn't seem as clear cut.

[1]: https://fairuse.stanford.edu/overview/fair-use/four-factors/

dkjaudyeqooe
7 replies
9h46m

That makes no sense. OpenAI must lose, and it must not be possible to have proprietary models based on copyrighted works. It's not fair use because OpenAI is profiting from the copyright holders' work and substituting for it while not giving them recompense.

The alternative is that any models widely trained on copyrighted work are uncopyrightable and must be disclosed, along with their data sources. In essence this is forcing all such models to be open. This is the only equitable outcome. Any use of the model to create works has the same copyright issues as existing work creation, i.e. if it substantially replicates an existing work it must be licensed.

sillysaurusx
3 replies
9h34m

For what it’s worth, I agree with your second paragraph. But it would take legislation to enforce that. For now, it’s unclear that OpenAI will lose. Quite the opposite; I’ve spoken with a few lawyers who believe OpenAI is on solid legal footing, because all that matters is whether the model’s output is infringing. And it’s not. No one reads books via ChatGPT, and Dalle 3 has tight controls preventing it from generating Pokémon or Mario.

All outcomes suck. The trick is to find the outcome that sucks the least for the majority of people. Maybe the needs of copyright holders will outweigh the needs of open source, but it’s basically guaranteed that open source ML will die if your first paragraph comes true.

dr_dshiv
1 replies
8h25m

Proposal: revenue from Generative AI should be taxed 10% for an international endowment for the arts. In exchange, copyright claims are settled.

Filligree
0 replies
3h36m

With a minimum rate, such that no-one can pretend they’re getting no income from it.

We might apply that as a $5000 or so surcharge on AI accelerators capable of running the models, such as the 4090.

dkjaudyeqooe
0 replies
7h31m

But it would take legislation to enforce that.

Absolutely true. That's the end game and we should be working toward influencing that. It's within our power.

I’ve spoken with a few lawyers who believe OpenAI is on solid legal footing

No one knows anything, this is too novel, and even if OpenAI gets some fair use ruling, it will be inequitable and legislation is inevitable. OpenAI is between a rock and a hard place here. If you read the basis for fair use and give each aspect serious consideration, as a judge should do, I can't see it passing fair use muster. It's not a case of simply reproducing work, which is unclear here; it's the negative effect on copyright holders, and that effect is undeniable.

All outcomes suck.

I don't think so. It's possible to fashion something equitable, but people other than the corporations have to get involved.

Joeri
2 replies
6h58m

Just because something is not copyrightable doesn’t automatically mean it must be disclosed. If weights aren’t copyrightable (and I don’t think they should be, as the weights are not a human creation), commercial AI’s just get locked behind API barriers, with terms of usage that forbid cloning. Copyright then never enters the picture, unless weights get leaked.

Whether or not that’s equitable is in the eye of the beholder. Copyright is an artificial construct, not a natural law. There is nothing that says we must have it, or we must have it in its current form, and I would argue the current system of copyright has been largely harmful to creativity for a long time now. One of the most damning statements I’ve read in this thread about the current copyright system is how there’s simply not enough unlicensed content to train models on. That is the bed that the copyright-holding corporations have made for themselves by lobbying to extend copyright to a century, and it all but assured the current situation.

dkjaudyeqooe
0 replies
5h20m

Just because something is not copyrightable doesn’t automatically mean it must be disclosed.

No, I'm saying that's what the law should be, because models can be built and used without anyone knowing. If it's illegal not to disclose them, you can punish people.

Copyright is something that protects the little guy as much as big corps. But the former has more to lose as a group in the world of AI models, and they will lose something here no matter what happens.

Hoasi
0 replies
2h20m

I would argue the current system of copyright has been largely harmful to creativity for a long time now

I'd love to hear that argument.

How has the current system of copyright been harmful to creativity?

deely3
6 replies
11h15m

If you require licensing fees for training data, you kill open source ML.

kill open source ML -> decrease speed of improvements for some open source ML

sillysaurusx
5 replies
11h9m

Sadly not. Making something illegal has social effects, not just legal effects. I’ve grown tired of being verbally spit on for books3. One lovely fellow even said that he hoped my daughter grows up resenting me for it.

It being legal is the only guard against that kind of thing. People will still be angry, but they won’t be so numerous. Right now everyone outside of AI almost universally despises the way AI is trained.

Which means you won’t be able to say that you do open source ML without risking your job. People will be angry enough to try to get you fired for it.

(If that sounds extreme, count yourself lucky that you haven’t tried to assemble any ML datasets and release them. The LAION folks are in the crosshairs for supposedly including CSAM in their dataset, and it’s not even a dataset, just an index.)

multjoy
2 replies
11h2m

If everyone is unhappy with your rampant piracy, then perhaps that is a sign that you’re doing it wrong?

sillysaurusx
0 replies
10h55m

Perhaps. The reason I did it was because OpenAI was doing it, and it’s important for open source to be able to compete with ChatGPT. But if OpenAI’s actions are ruled illegal, then empirically open source wasn’t a persuasive enough reason to allow it.

4bpp
0 replies
8h15m

Is there evidence that it's actually everyone or even close to everyone? The core innovation that the internet brought to harassment is that it is sufficient for some 0.0...01% of all people to take issue with you and be sufficiently dedicated to it for every waking minute of your life to be filled with a non-stop torrent of vitriol, as a tiny percentage of all internet users still amounts to thousands.

viraptor
0 replies
9h35m

US copyright has limited reach. There are models trained in China, where the IP rules are... not really enforced. It would be an interesting world where you use / pay for those models because you can't train them locally.

JAlexoid
0 replies
2h26m

Right now everyone outside of AI almost universally despises the way AI is trained.

I don't agree with this. Most people don't care at all, and at best people would argue about some form of compensation.

Saying "everyone" is unsubstantiated.

I mean... "Everyone was angry at Napster" at the same time "everyone is angry at the MPAA/RIAA"

sillysaurusx
5 replies
11h41m

Replying to a deleted comment:

It sounds as if you imply that would be bad. But what if it wasn't?

Entirely possible. The early history of aviation was open source in the sense that many unlicensed people participated, and died. The world is strictly better with licensing requirements in place for that field.

But no one knows. And if history is any guide for software, it seems better to err on freedoms that happen to have some downside rather than clamping down on them. One could imagine a world where BitTorrent was illegal. Or cryptography, or bitcoin.

raverbashing
4 replies
10h15m

Are you really comparing licensing for a profession with licensing of IP?

sillysaurusx
3 replies
10h11m

It’s much the same. Only authorized people are allowed to do X. Since X costs a lot of money, by definition it can’t be open source. There are no hobbyist pilots that carry passengers without a license, and if there are, they’re quickly told to stop. Generative AI faces a real chance of having the same fate. Which means open source will look similar to these planes trying to compete with commercial aircraft: https://pilotinstitute.com/flying-without-a-license/

If you can think of a better example, I’d like to know though. I’ll use it in future discussions. It’s hard to think of good analogies when the tech has new social effects.

PeterStuer
2 replies
8h44m

If I fly a plane and crash, my passengers die. If I generate an image using a model whose training included some unlicensed imagery... Disney misses out on a fraction of a cent?

There is a real reason why some professions are licenced and others are not.

Your analogy is nonsensical. Not having a better one is irrelevant.

sillysaurusx
1 replies
8h27m

If training data requires licensing fees, ML practitioners will become a licensed field de facto, because no one in the open source world will have the resources to pursue it on their own.

Perhaps a better analogy is movies. At least with acting, you can make your own movies, even if you’re on a shoestring budget. With ML, you quite literally can’t make a useful model. There’s not enough uncopyrighted data to do anything remotely close to commercial models, even in spirit.

avisser
0 replies
3h37m

If training data requires licensing fees, ML practitioners will become a licensed field de facto,

You know the word "license" has multiple, dissimilar meanings, right?

marcyb5st
2 replies
11h12m

Is there a license that states: if you use this data for ML training you must open source model weights and architecture?

sillysaurusx
1 replies
11h4m

It’s deeper than that. The basis of licensing is copyright. If the upcoming court cases rule in OpenAI’s favor, you won’t be able to apply copyright to training data. Which means you can’t license it.

Or rather, you can, but everyone is free to ignore you. A license without teeth is no license at all. The GPL is only relevant because it’s enforceable in court.

I’m sure some countries will try the licensing route though, so perhaps there you’d be able to make one.

EDIT: I misread you, sorry. You’re saying that if OpenAI loses and license fees become the norm, maybe people will be willing to let their data be used for open source models, and a license could be crafted to that effect.

Probably, yes. But the question is whether there’s enough training data to compete with the big companies that can afford to license much more. I’m doubtful, but it could be worth a try.

JAlexoid
0 replies
2h34m

The GPL is only relevant because it’s enforceable in court.

The irony of the GPL is that its validity with respect to users is only now being tested in court.

https://www.dlapiper.com/en/insights/publications/2024/01/sf...

iamsaitam
2 replies
9h53m

The point should be to kill training on unlicensed material. There needs to be regulation, and tools to identify what the training data was. But as always, first comes the siphoning part, the massive extraction of value; then, when the damage is done, there will be the slow-moving reparations and conservationism.

cornel_io
0 replies
9h17m

A ton of us out here don't agree with your goals. I think these models are transformative enough that the value added by organizing and extracting patterns from the data outweighs the interests of the extremely diffuse set of copyright holders whose data was ingested. So regardless of the technical details of copyright law (which I still think are firmly in favor of OpenAI et al), I would strongly oppose any effort to tighten a legal noose here.

JAlexoid
0 replies
2h31m

Agreed. And every software engineer writing code should pay 10% of their salary to the publishers of the books that they learned their programming skills from.

chasing
1 replies
2h35m

If you require licensing fees for training data, you kill open source ML.

This is another one of those “well if you treat the people fairly it causes problems” sort of arguments. And: Sorry. If you want to do this you have to figure out how to do it ethically.

There are all sorts of situations where research would go much faster if we behaved unethically or illegally. Medicine, for example. Or shooting people in rockets to Mars. But we can’t live in a society where we harm people in the name of progress.

Everyone in AI is super smart — I’m sure they can chin-scratch and figure out a way to make progress while respecting the people whose work they need to power these tools. Those incapable of this are either lazy, predatory, or not that smart.

sillysaurusx
0 replies
2h4m

"Ethical" in this case is a matter of opinion. The whole point of copyright was to promote useful sciences and arts. It’s in the US constitution. You don’t get to control your work out of some sense of fairness, but rather because it’s better for the society you live in.

As an ML researcher, no, there’s basically no way to make progress without the data. Not in comparison with billion dollar corporations that can throw money at the licensing problem. Synthetic data is still a pipe dream, and arguably still a copyright violation according to you, since traditional models generate such data.

To believe that this problem will just go away or that we can find some way around it is to close one’s eyes and shout "la la la, not listening." If you want to kill open source AI, that’s fine, but do it with eyes open.

pk-protect-ai
0 replies
3h30m

I would say that GPT-3 and its successors have nothing to do with open source, and if OpenAI uses open source as a shield, then we are all doomed. I would distance myself and any open source projects from involvement in OpenAI court cases as far as possible. Yes, they have delivered some open source models, but not all of them. Their defense must revolve around fair use and purchased content if they use books and materials that were never freely available. It should be permissible to purchase a book or other materials once and use them for the training of an unlimited number of models without incurring licensing fees.

benreesman
0 replies
4h59m

The reality is always a dynamic tension between law, regulation, precedent, and enforceability.

It is possible to strangle OpenAI without strangling AI: pmarca is anti-OpenAI in print, but you can bet your butt he hopes to invest in whatever replaces it, and he’s got access to information that like, 10 people do.

A useful example would be the Napster Wars: the music industry had been rent seeking (taking the fucking piss really) for decades and technology destroyed the free ride one way or another. The public (led by the technical/hacker/maker public) quickly showed that short of disconnecting the internet, we were going to listen to the 2 good songs without buying the 8 shitty ones. The technical public doesn’t flex its muscles in a unified way very often, but when it does, it dictates what is and isn’t on the menu.

The public wants AI, badly. They want it aligned by them within the constraints of the law (which is what “aligned” should mean to begin with).

The public is getting what it wants on this: you can bet the rent. Whether or not OpenAI gets on board or gets run the fuck over is up to them.

“You in the market for a Tower Records franchise Eduardo?”

silviot
0 replies
10h50m

But anyway, even if it were all true, the only reason we are talking about diffusers, and the only reason we are paying attention to this author's work Fairly Trained, is because of someone training on data that was not expressly licensed.

Thanks for putting this into words. I'm of the same opinion and this is the best articulation I have so far.

ImprobableTruth
4 replies
6h48m

Calling him "the person hired to build Stable Audio" seems a bit misleading? He was in an executive position (VP of product for Stability's audio group). An important position, but "person hired to build" to me evokes the image of a lead developer/researcher.

I think that also helps in understanding his departure, since he's a founder with a music background.

a_vanderbilt
3 replies
3h24m

It isn't unusual for those in leadership positions to use such phrasing when talking about projects and products. It's not a "taking credit" from the engineers sort of thing, but rather about the leadership of the engineers.

shon
0 replies
1h2m

Agreed. Leadership can sometimes bring actual value ;)

And to be clear, I’m not sure Ed would call himself that. Those are my words, not his.

Zetaphor
0 replies
1h4m

Managing a group of people is not synonymous with doing the actual knowledge work of researching and developing innovations that enabled this technology. I find it hard to believe that the contribution of his management somehow uniquely enabled this group of engineers to create this using their experience and expertise.

A captain may steer the ship, but they're not the one actually creating and maintaining the means by which it moves.

ARandomerDude
0 replies
1h38m

Person A gets hired to write the software that is the company's actual product.

Person B gets hired to observe Person A working, check email, and be the audio output buffer for Jira.

Person B says "I built this."

That's dishonesty no matter what the titles are or how important the emails were.

prmoustache
1 replies
10h59m

Not that it would have stopped the company from doing it anyway, but couldn't he have thought about that before working for them?

Or did he need that, as it is part of the business model of his certifications?

emadm
0 replies
2h33m

It's a complex topic and perceptions change.

Ed still likes Stability, especially as we fully trained Stable Audio on rights-licensed data (a bit different in audio than in other media types), offer opt-out of datasets, etc.

gcanko
0 replies
1h18m

There has to be a solution for the copyright roadblocks that companies encounter when training models. I see it as no different from an artist creating music influenced by the music they have listened to throughout their whole life; fundamentally it's the exact same thing. You cannot create music, or art in general, in a vacuum.

az226
0 replies
12h23m

That’s an interesting take. But quite the odd stance since he joined Stability and the training of Stable Diffusion was well known.

jsiepkes
13 replies
10h51m

Warning: This website may not function properly on Safari. For the best experience, please use Google Chrome.

We've come full circle with the 90's and Internet Explorer. Well, I guess this time the dominant browser is open source, so that's at least something...

Can someone please create an animated GIF button for Chrome which says: "Best viewed with Google Chrome"?

Maxion
7 replies
10h45m

Chrome isn't open source; Chromium is. Best not to confuse the two.

schleck8
5 replies
9h47m

Chrome and Chromium are virtually identical except for Google services, which aren't required to do anything with the browser except for installing Chrome extensions that can alternatively be sideloaded, so this is nitpicking.

forgotusername6
1 replies
8h54m

Tangential, but I tried to build Chromium the other day and stopped when it said it required access to Google Cloud Platform to actually build it. If something requires a proprietary build system, does it matter that it's open source?

nolist_policy
0 replies
7h59m

That is not true. See every distribution packaging chromium.

In particular, this package[1] by openSUSE builds completely offline. Many other distributions require packages to build offline.

[1] https://build.opensuse.org/package/show/network:chromium/chr...

urbandw311er
0 replies
9h18m

Jumping in to defend the parent comment: there’s nothing Open Source about Google Chrome, and it’s highly relevant in this context because they are notorious for putting technologies and tracking in there that many people find objectionable.

squeaky-clean
0 replies
2h59m

Don't forget media DRM built into Chrome but not Chromium.

berkes
0 replies
9h21m

It's essential nitpicking

m463
0 replies
7h38m

I found this article to explain it well:

https://www.lifewire.com/chromium-and-chrome-differences-417...

and there is a further ungoogled-chromium:

https://en.wikipedia.org/wiki/Ungoogled-chromium

superb_dev
2 replies
10h37m

Website works fine on safari too, I didn’t notice any issues

nness
1 replies
8h42m

Same, I wonder what issue they thought they had...

earthnail
0 replies
4h22m

Safari is known to be troublesome when a webpage contains many HTML audio players. It can get extremely slow and unresponsive.

Every researcher I know in the audio domain uses Chrome for exactly that reason. The alternative would be not to use the standard HTML audio tag, which would be ridiculous.

IndisciplineK
1 replies
5h11m

Can someone please create an animated GIF button for Chrome which says: "Best viewed with Google Chrome"?

Here you go:

<img src="data:image/gif;base64,R0lGODlhWAAfAKIAAAAAAP///zNm/zOZM//MM/8zM8zMzP///yH/C05FVFNDQVBFMi4wAwEAAAAh+QQFZAAHACwAAAAAWAAfAAAD/wi63P4wyklnuDjrzbv/YNgpQWGehaGubOu+cCzP81KiJr02uqLvlYnBhsv9fEMADXmEFAuRJKBELZR+SRVyAVRym40n6iGtVrG8rI/J7DHETx7RnGpqvYy7Hr2Ai/NzGXVuem2FSnwAfnBcREWJbI2RiYt/ayRPWJqbQANPGgShoqGXU1anV5yqQDAKA54nFwKzsxejpHimdC9beKsthjuvsCYBtMcBt6RlqKe8iMG/WbzDsMbHyMq5VILPh3fQvr2IUuTA1cXY2bfbmc+9auLy8dMuANWe1+oCyezMj+/ClZtX6lK9c+j0qes3qt2FYoPskTPIwsGeb9TwKcTGUJuUQys3YkwqtyfSOHMV8b3aWEsZgY83IkqbeUelLGQdcTHTIJPmRT4qV2YY4LJUTBR2hnyDt2lBUJXajLpbkusgU01Onw4rKrWKEapS6EmKJjKr1qhdT32t4UWpQXhkA957ijZtzERh6el10wBqQ4uBMPQsW1UvRb4OqnmE8A8HH1bT3qKUEUSII8c+M/u0Ubmz58+gQ4seTXpBAgAh+QQFyAAHACwNAAMAKwAZAAADf2i63P4wyvkArSBfZ/fqHhcaIJOB55d26VQqKPnJsBxLrBbvs3VqOBPN1iu+XMUaTzk8YoAtElCqg01HHid2E916v+DwNQz5+bRka2OcNr3M6hz0R1Xp3jnqOZ6X+vVreVNzf4RmXYV7an6EjCVjhiiCfXeBK5NujZp0bZ2eDAkAOw==">

Edit: View the button: https://indiscipline.github.io/post/best-viewed-in-google-ch...
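For the curious, producing such a button is just base64-encoding a GIF into a data: URI. A minimal sketch in Python (the filename is a placeholder):

  import base64

  # Read a small GIF and emit a data: URI suitable for an <img> tag.
  with open("best-viewed-in-chrome.gif", "rb") as f:
      encoded = base64.b64encode(f.read()).decode("ascii")

  print(f'<img src="data:image/gif;base64,{encoded}">')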

rikafurude21
0 replies
54m

Surprised to see an actual gif pop up after adding that to a site. I guess that's just base64; still kind of amazing that it's all inside a seemingly random string of text.

lbourdages
12 replies
14h10m

This is right into the "uncanny valley" of music.

It definitely sounded "like music", but none of it is what a human would produce. There's just something off.

RobinL
7 replies
11h50m

Here is a silly song I generated using suno.ai, which I have found to be incredibly impressive (at least, a small percentage of its outputs are very good; most are bad). I think it's good enough that most humans wouldn't realise it's AI generated. https://app.suno.ai/song/8a64868d-9dd3-46db-91af-f962d4bec8b...

comex
2 replies
9h27m

Wow. I’m guessing it’s generating MIDI or something rather than synthesizing audio from scratch? Even so, the quality of the score is leaps and bounds better than any of the long-form audio on the Stable Audio demo page (either Stable Audio itself or the other models). The audio model outputs seem to take a sequence of 1 to 3 chords, add a barebones melody on top, and basically loop this over and over. When they deviate from the pattern, it feels unplanned and chaotic and they often just snap back to the pattern without resolving the idea added by the deviation. (Either that or they completely change course and forget what they were doing before.) Yes, EDM in particular often has repetitive chord structures and basic melodies, but it’s not that repetitive. In comparison, from listening to a few suno.ai outputs, they reliably have complex melodies and reasonable chord progressions. They do tend to be repetitive and formulaic, but the repetition comes on a longer time scale and isn’t as boring. And they do sometimes get confused and randomly set off in a new direction, but not as often. Most of the time, the outputs sound like real songs. Which is not something I knew AI could do in 2024.

RobinL
0 replies
8h59m

I don't have any special insight into how it works, but I suspect it is largely synthesizing audio from scratch. The more I've thought about it, the more the task of generating music feels like the task of text-to-speech with realistic intonation. So it feels like the same techniques would be applicable.

Suno do have an open source repo here that presumably uses similar tech: https://github.com/suno-ai/bark

Bark was developed for research purposes. It is not a conventional text-to-speech model but instead a fully generative text-to-audio model, which can deviate in unexpected ways from provided prompts. Suno does not take responsibility for any output generated. Use at your own risk, and please act responsibly.

I've generated probably >200 songs now with Suno, of which perhaps 10 have been any good, and I can't detect any pattern in terms of the outputs.

Here's another one which is pretty good. I accidentally copied and pasted the prompt and lyrics, and it's amazing to me how 'musically' it renders the prompt:

https://app.suno.ai/song/d7bad82b-3018-4936-a06d-8477b400aae...

Here are a couple more which are pretty good (I use it primarily for making fun songs for my kids):

https://app.suno.ai/song/a308ca8a-9971-47a3-8bb3-a95126ff1a8...

https://app.suno.ai/song/3b78a631-b52a-4608-a885-94f2edc190b...

And this one's kind of interesting in that it can render 'Gregorian chant' (I mean, it's not very good): https://app.suno.ai/song/0da7502b-73cf-4106-88e8-26f4f465a5f...

But this is one reason it feels like these models are very similar to text-to-speech, but with a different training set.

Agraillo
0 replies
4h15m

My understanding is that they use a side effect of the Bark model. The comment https://news.ycombinator.com/item?id=35647569 from JonathanFly probably explains it well. If you train your model on a massive amount of audio mixes of lyrics+music, then prompting with lyrics alone pulls the music along with it, much as that comment suggests that prompting context-correlated text might pull in the background noises usual for such a context. Already while writing this, I can imagine training on a huge set of publicly performed poetry pieces, which would allow generating novel performances by artificial poets from novel prompts. This is different from riffusion.com's approach, where the genius idea is to more or less feed spectrograms as images to Stable Diffusion.
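To make the riffusion-style idea concrete, here is a minimal sketch of rendering audio as a spectrogram image, assuming librosa is installed; the parameter values are illustrative, not riffusion's actual settings:

  import numpy as np
  import librosa

  # Render a short clip as a mel spectrogram "image" that an image
  # diffusion model could be trained on.
  y, sr = librosa.load("clip.wav", sr=22050)
  mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=256)
  db = librosa.power_to_db(mel, ref=np.max)

  # Normalize to 0..255 so the array can be saved as a grayscale image.
  img = ((db - db.min()) / (db.max() - db.min()) * 255).astype(np.uint8)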

urbandw311er
1 replies
9h13m

That’s impressive. Why do the printed lyrics for the second chorus differ from the audio? (Which repeats those from the first chorus)

RobinL
0 replies
8h58m

I generated the lyrics using ChatGPT 4 and the suno model attempts to follow them.

It generally does a good job, but I have noticed it's fairly common in a second chorus for it to ignore the direction and instead use the same lyrics as the first chorus.

npteljes
0 replies
8h32m

That is fantastic. It has a bit of weirdness in the background, but nothing that would stop me from enjoying it.

Agraillo
0 replies
9h53m

Very good for my taste, but I should clarify: I'm obsessed with catchy tunes, as a listener and as a hobby musician, growing my own brainworms from time to time. And I must say that suno.ai is very impressive; in my case semi-ready brainworms show up in 30%-50% of outputs. What's more important, it's really an inspiration tool for all kinds of tasks, like lyrics polishing or playing along after track separation. Maybe catchy melodies are not for everyone, but who can argue with the charts, when The Beatles, ABBA and Queen almost always produced them.

bane
1 replies
13h36m

The overall audio quality sounds pretty good and it seems to do a good job of sustaining a consistent rhythm and musical concept. But I agree there's something "off" about some of the clips.

- The rave music sounds great. But that's because EDM can be quite out there in terms of musical construction.

- The guitar sounds weird: the chords don't sound like anything a human hand can make, in a tuning nobody tunes their guitar to, with a strange mix of open and closed strings that doesn't make sense. I think the model doesn't understand the physical constraints of a guitar.

- The disco chord progression is bizarre. It doesn't sound bad, but it's unlikely to be something somebody working in the genre would choose.

- meditation music - I mean, most of that genre may as well just be some randomized process

- drum solo - there are some weird issues in some of the drum sounds: cymbals, rides and hats changing tone in the middle of a note; some of the toms sound odd, like a mix of stick and brush playing at the same time. It's sort of the same problem the solo guitar has: it's just not produced within the constraints of what a drummer can actually do on an instrument made of actual drums.

- sound effects - all are pretty good, if a little chunky and low-bit-rate or low-sample-rate sounding; there's probably something going on in the network that's reducing the rate before it gets built back up. There's a constant sort of reverb in all of the examples.

I honestly can't say I prefer their model over some of the musicgen output even if their model is doing a better job at following the prompts in some cases.

All of the models have low-bitrate encoding problems and other weird anomalies. Some of it reminds me of the output from older MP3 encoders, where hi-hats and such would get very "swishy" sounding. You can hear some of it in the autoencoder reconstructions, especially the trumpet and the last example.

In any case, I'm glad to see the progress being made in this area. It's really impressive; this was complete science fiction only a few years ago.

darkwater
0 replies
8h21m

> drum solo - there are some weird issues in some of the drum sounds: cymbals, rides and hats changing tone in the middle of a note; some of the toms sound odd, like a mix of stick and brush playing at the same time. It's sort of the same problem the solo guitar has: it's just not produced within the constraints of what a drummer can actually do on an instrument made of actual drums.

And I would say that there is also background noise from time to time; at some point I heard something akin to voices. Maybe it's an artifact of the training data (many drum solos are recorded live).

otabdeveloper4
0 replies
13h5m

AI pictures are the same. We are more tolerant of six-fingered pictures with missing limbs, for some reason.

dcre
0 replies
4h30m

One thing I noticed is that when it’s playing chords, it seems a lot more likely than human players to put both major and minor thirds in. This isn’t unheard of — the famous Hendrix chord in “Purple Haze” consists of root, major third, 7th, minor third. But it sounds pretty weird when you do it in every chord.

romanzubenko
10 replies
14h0m

As with Stable Diffusion, text prompting will be the least controllable way to get useful output from this model. I can easily imagine MIDI being used as an input with a ControlNet to essentially get a neural synthesizer.
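
For illustration, here's roughly what such a conditioning signal could look like, sketched with pretty_midi (the file name is a placeholder, and the ControlNet-style branch is hypothetical, not an existing audio ControlNet):

    import numpy as np
    import pretty_midi

    # MIDI -> piano roll: a dense (pitch x time) array, structurally similar
    # to the depth or edge maps ControlNet feeds to Stable Diffusion.
    midi = pretty_midi.PrettyMIDI("melody.mid")  # placeholder path
    roll = midi.get_piano_roll(fs=100)           # shape: (128 pitches, time steps)
    roll = (roll > 0).astype(np.float32)         # keep only note on/off activity

    # A hypothetical ControlNet-style branch would downsample `roll` to the
    # model's latent resolution and inject its features into the diffusion
    # network, leaving timbre and texture to the text prompt.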

zone411
3 replies
13h51m

Yes. Since working on my AI melodies project (https://www.melodies.ai/) two years ago, I've been saying that producing a high-quality, finalized song from text won't be feasible or even desirable for a while, and it's better to focus on using AI in various aspects of music making that support the artist's process.

l33tman
1 replies
4h45m

Emad hinted here on HN, the last time this was discussed, that they were experimenting with exactly that. It will come, from them or from someone else, quickly.

Text-prompting is just a very coarse tool to quickly get some base to stand on; ControlNet is where human creativity enters again.

emadm
0 replies
2h26m

Yeah, we built ComfyUI, so you can imagine what is coming soon around that.

Need to add more stuff to my Soundcloud https://on.soundcloud.com/XrqNb

3cats-in-a-coat
0 replies
3h49m

Text will be an important input channel for texture, sound type, voice type and so on. You can't just use input audio; that defeats the point of generating something new. You also can't use only MIDI; the model still needs to know what sits behind those notes: what performance, what instrument. So we need multiple channels.

raincole
1 replies
9h27m

For music, perhaps. For sound effects I think text prompting is a rather good UI.

bemmu
0 replies
6h19m

A ControlNet/img2img style, where you mimic a sound with your mouth and the model then makes it realistic, could also be useful.

numpad0
1 replies
10h9m

It's crazy that nobody cares. It seems to me that ML hype trends focus on denying skills and disproving creativity by denoising random noise into outputs indistinguishable from human work, and to me this whole chain of negatives doesn't seem to have proven its worth.

JAlexoid
0 replies
2h11m

LLMs allow people without certain skills to be creative in forms of art that are inaccessible to them.

With DALL-E I can get an image of something I have in my head, without investing in hundreds of hours of watching Bob Ross (which I do anyway).

With audio generators I can produce music that is in my head, without learning how to play an instrument or paying someone to do it. I have to arrange it correctly, but I can put out a techno track without spending years learning the intricacies.

gcanko
0 replies
51m

I think it would be ideal if it could take an audio recording of humming or singing a melody, together with a text prompt, and spit out a track that resembles it.

b0ner_t0ner
0 replies
5h35m

But it works great when you don't need much control. Prompt example: "Free-jazz solo by tenor saxophonist, no time signature."

qwertox
7 replies
12h58m

I think we still need the step where the AI learns what a high-quality sound library sounds like and then applies its learned abilities by triggering sounds from that library via MIDI.

That way you'd get perfect audio quality with the creativity of a musical AI.

eru
3 replies
12h33m

How would MIDI get you, e.g., a guitar being played dirty? Or the subtle echo that comes from recording in a bathroom?

sebzim4500
0 replies
5h54m

You could have AI do some postprocessing. I think a similar approach is the future for image generation: you have a model output a 3D scene, use a classical raytracer to render it, and then have a final model apply corrections to achieve photorealism.

qwertox
0 replies
12h17m

It would use a sampler and, for the subtle echo effect, add a reverb to the bus.

https://www.youtube.com/watch?v=EQdp2QLiSYQ&t=187s
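
Sketched with Spotify's pedalboard library (file names are placeholders, and the sampler step that renders the MIDI to dry audio is assumed to happen upstream):

    from pedalboard import Pedalboard, Reverb
    from pedalboard.io import AudioFile

    # Assume a sampler has already rendered the AI's MIDI to dry audio.
    with AudioFile("dry_take.wav") as f:  # placeholder path
        audio, sr = f.read(f.frames), f.samplerate

    # "Recorded in a bathroom" is just a big, wet reverb on the bus.
    board = Pedalboard([Reverb(room_size=0.9, wet_level=0.4)])
    wet = board(audio, sr)

    with AudioFile("wet_take.wav", "w", sr, wet.shape[0]) as f:
        f.write(wet)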

arrakeen
0 replies
12h22m

the AI designs and controls the effects chain and mastering too

jchw
1 replies
11h36m

I've always wished for something like that for image generation AI. It'd be much cooler/more interesting to watch AI try to draw/paint pictures with strokes rather than just magically iterate into a fully-rendered image. I dunno what kind of dataset or architecture you could possibly apply to accomplish this, but it would be very interesting.

AuryGlenz
0 replies
10h55m

I get what you’re saying, but if you watch Stable Diffusion do each step it’s at least kind of similar. If you keep the same seed but change a detail, often the broad “strokes” are completely the same.

3ds
0 replies
10h44m

Isn't that what suno.ai does?

jpc0
5 replies
12h19m

> Warning: This website may not function properly on Safari. For the best experience, please use Google Chrome

Do better

popalchemist
2 replies
11h30m

Have you ever heard of an MVP?

prmoustache
1 replies
10h54m

That would be pertinent if it weren't just a static web page with text and some audio files to be played.

zamadatix
0 replies
1h48m

Reading about it, that ironically seems to be the exact problem Safari has. I mean, the page "works" in Safari; it's just that you get these really random delays before some of the sounds start, and there are all sorts of web discussion threads suggesting different ways to mitigate it on different platforms. I don't really fault them for having the goal of publishing a paper and going the extra bit to make a friendly but imperfect webpage, instead of being website creators who happen to publish papers on the side.

pmontra
0 replies
11h7m

By the way, it does work on Firefox on Android. No idea what's different about Safari compared to Chrome and Firefox here.

Aachen
0 replies
10h45m

...and recommend Firefox

is what you meant to say, right? :)

TillE
5 replies
13h30m

I was briefly excited about the idea of generating sound effects, but those "footsteps" are incredibly bad.

laborcontract
4 replies
12h23m

I tried generating music on stableaudio.com and, yes, it's bad. However, given the blistering pace of development in these models, I would not be surprised if they sound incredible in a year or two.

berkes
3 replies
9h17m

Everyone, every time, seems to assume a linear (or exponential) curve upwards.

But what is the evidence for that?

I consider it far more likely that we had a breakthrough and are now rushing towards the next plateau. Maybe we are nearing it already.

Like the curve of a PID controller. It's how many human improvements go.

leodriesch
1 replies
8h39m

I'd say most are thinking of Midjourney's success in image generation when talking about this kind of progress.

berkes
0 replies
7h10m

Me too.

But I still see no evidence that it keeps improving rather than plateauing at some (possibly the current) level.

spacebanana7
0 replies
5h2m

The plateau we're heading for is professional human-level output from these models, approached with logarithmic progress.

I suspect this is because the underlying production factors like compute, data & model design are steadily improving whilst humans have diminishing sensitivity to output quality.

In the game of AI-generated photorealistic images or history essays, there's not much improvement left to make. Most humans are already convinced by the output of these things.

ShamelessC
4 replies
14h3m

So there aren't public weights, is that right? Having trouble finding anything that says one way or the other.

edit: Oh okay, didn't realize this was somehow a controversial comment to make. It would have been great if you had answered the question before downvoting but that's fine I suppose.

grey8
3 replies
9h7m

Nope. They did release code for training, inference and fine-tuning, but no datasets or weights.

See https://github.com/Stability-AI/stable-audio-tools

turnsout
1 replies
1h56m

Wonder if it's an IP issue. They don't want every record label coming after them.

ShamelessC
0 replies
1h20m

Yeah that tracks.

ShamelessC
0 replies
8h57m

Thanks!

PeterStuer
4 replies
8h10m

"Gen AI is the only mass-adoption technology that claims it's Ok to exploit everyone's work without permission, payment, or bringing them any other benefit."

Is it? What about the printing press, photography, the copier, the scanner ...

Sure, if a commercial image is used in a commercial setting, there is a potential legal case that could argue infringement. But this should NOT depend on the means of production; it should depend on the merits of a comparison of the produced images.

Xerox should not be sued because you can use a copier to copy a book (trust me kids, book copying used to be very, very big).

Art, by its social nature, is always derivative. I can use diffusion models to create incontestably original imagery. I can also try to get them to generate something close to an image in the training set, if the model is large enough relative to the training set or the work is just really formulaic. However, it would be far easier and more efficient to just Google the image in the first place and patch it up with some Photoshop, if that were my goal.

zamadatix
0 replies
1h59m

Where was this quote pulled from? I can't find it in the site, paper, or code repo readmes for some reason. Did the HN link get changed?

wnkrshm
0 replies
7h45m

But the social nature of art also means that humans credit the originator and their influences - of course not the entire chain, but at least the nearest neighbours of influence. A user of a diffusion generator, by contrast, does not even know the influences unless they specifically ask.

Shoulders of giants as a service.

webmaven
0 replies
5h38m

> Xerox should not be sued because you can use a copier to copy a book (trust me kids, book copying used to be very, very big).

The appropriate analogy here isn't suing Xerox, but suing Kinko's (now FedEx Office).

And it isn't just books, but other sorts of copyrighted material as well, such as photographs, which are still an issue.

haswell
0 replies
3h2m

> Art, by its social nature, is always derivative. I can use diffusion models to create incontestably original imagery

How are you defining “incontestably original” here?

The output could not exist if not for the training set used to train the model. While the process of deriving the end result is different from the one humans use when creating artwork, the end result is still derived from other works, and the difference in originality is one of degree, not of kind, when compared to human output. (I acknowledge that the AI tool is enabled by a different process than the one humans use, but I'm not sure that a change in process changes the derivative nature of all subsequent output.)

As a thought experiment, imagine that, assuming we survive, after another million years of human evolution our brains can process imagery at the scale of generative AI models and can produce derivative output taking into account more influences than any human could even begin to approach with our 2024 brains.

Is the output no longer derivative?

Now consider the future human’s interpretation of the work vs. the 2024 human’s interpretation of the work. “I’ve never seen anything like this”, says the 2024 human. “The influences from 5 billion artists over time are clear in this piece” says the future human.

The fundamental question is: on what basis is the output of an AI model original? What are the criteria for originality?

reissbaker
3 replies
13h52m

This is incredibly good compared to SOTA music models (MusicGen, MusicLM). It looks like there's also a product page where you can subscribe to use it, similar to Midjourney: https://www.stableaudio.com/

Sadly it's not open-weight and it doesn't look like there's an API (again like Midjourney): you subscribe monthly to generate audio in their UI, rather than having something developers can integrate or wrap.

nullandvoid
0 replies
12h50m

I was hoping to use it to generate some sound effects for a game I'm working on - but it looks like I need an "enterprise license" (https://www.stableaudio.com/pricing).

I wonder why this has a different clause, rather than just falling under "In commercial products below 100,000 MAU"?

ex3ndr
0 replies
13h11m

Thankfully you can train it at home; the bigger question is the data.

emadm
0 replies
2h32m

There is a CC-licensed version coming soon, plus an API.

Models are advancing very fast; it will be quite the year for music.

gregorvand
3 replies
7h58m

Not trying to knock the progress here; it's impressive. But as a drummer, the 'drum solo' is about as boring as it gets, with some weird interspersed sounds. So, it depends on the intended audience.

FWIW the sound effects also are not 'realistic' to my ear, at the moment.

But again, the progress is huge, well done!

toxik
0 replies
7h48m

Yeah, the drum solo really highlights how badly the model missed the point. I'm not a drummer, but it's just not pleasing to hear; it sounds like somebody randomly banging drums more or less in tempo.

It does okay with muzak-type things though, which I guess tracks with my expectations.

redman25
0 replies
3h31m

I think I was more disappointed by the music samples not having any transitions. Most songs have key changes and percussion turnovers.

ZoomZoomZoom
0 replies
7h37m

As a drummer, the 'drum solo' was surprisingly interesting to listen to, if you consider it happening over a stable 4/4 pulse. The random-but-not-quite nature of the part makes for very unconventional rhythmic patterns. I'd like to be able to syncopate like this on the spot.

Don't ask me to transcribe it.

Tempo consistency is great. Extraneous noises and random cymbal tails show the model's deficiencies, though.

andrewstuart
3 replies
13h25m

I felt a great disturbance in the Force, as though all the music licensing lawyers in the USA cried out at once.

shon
2 replies
13h0m

Perhaps the disturbance you feel is actually the RIAA moving their Death Star into firing range of Stability.ai

emadm
1 replies
2h25m

stableaudio.com is fully licensed; music is an interesting area:

https://www.musicbusinessworldwide.com/stability-ai-launches...

kouteiheika
0 replies
1h20m

Serious question, I'd genuinely like to know - why?

You didn't license the images when training Stable Diffusion, and yet you did for Stable Audio? In both cases the training should either be fair use and legal without any licensing, or be infringing and need licensing. Why is audio different than images? Am I missing something here?

alacritas0
2 replies
13h5m

This can produce some pretty disturbing but interesting music using the prompt "energetic music, violin, voice, orchestra, piano, minimalism, john adams, nixon in china": https://www.stableaudio.com/1/share/953f079e-d704-4138-904c-...

seydor
0 replies
6h52m

Finally, some music from the future

FergusArgyll
0 replies
10h57m

It reminds me a little of the Breath of the Wild guardian music.

MrThoughtful
2 replies
12h59m

So many questions ...

They publish the code to train on your own music, but not the weights of their model? So you cannot just upload this thing to some EC2 instance and start creating your own music, correct?

Is this the same as https://www.stableaudio.com?

nextworddev
0 replies
12h53m

StabilityAI is just a marketing machine at this point that is praying for an acquisition, since the runway is diminishing

alacritas0
0 replies
12h55m

This sounds like progress, but it's still very bad except for highly repetitive music like the EDM examples they give, and even then it can't get the tempo right.

8n4vidtmkvmk
2 replies
14h1m

The music is pretty meh but the sound effects are exciting for indie game dev!

nullandvoid
0 replies
12h52m

<deleted>

AuryGlenz
0 replies
10h53m

Too bad that, according to their page, you need an enterprise license even for indie games.

ttul
1 replies
11h29m

I find it interesting that they are releasing the code and lovely instructions for training, but no model. They are almost begging anonymous folks to hook the data loader up to an Apple Music account and go nuts. Not that I am suggesting anyone do that.

zamadatix
0 replies
2h2m

Speculatively, it might have been part of the agreement under which they were given AudioSparx's licensed stock audio library to train on: in exchange, they wouldn't redistribute the resulting model.

slicerdicer1
1 replies
8h8m

Obviously someone shadowy and non-corporate (e.g. an artist) just needs to come out and make a model that includes promptable artist/producer/singer/instrumentalist/song metadata.

Describing music without referring to musicians is clunky because music is never labelled well. Of course saying "disco house with funk bass and soulful vocals, uplifting" is going to be bland. Saying "disco house with Nile Rodgers rhythm guitar, Michael McDonald singing, and a bassline in the style of Patrick Alavi's 'Power'" is going to get you some magic.

ever1337
0 replies
2h23m

So this model can only ever understand music that has been classified, described, labelled, standardized, and then recombine those categories. Sounds boring; sounds like the opposite of what (I would like to believe) people listen to music for, outside of a corporate stock-audio context.

lopkeny12ko
1 replies
11h40m

> We append “high-quality, stereo” to our sound effects prompts because it is generally helpful.

It's hilarious that we've discovered you can get better outputs from these models simply by telling them nicely to generate better results.
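
In code it's nothing more exotic than string concatenation; a trivial sketch (the guess about why it helps is mine, not the paper's):

    # Quality tags plausibly steer the model toward the cleaner,
    # stereo-mastered slice of its training data.
    def augment_sfx_prompt(prompt: str) -> str:
        return f"{prompt}, high-quality, stereo"

    print(augment_sfx_prompt("footsteps on gravel"))
    # footsteps on gravel, high-quality, stereo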

nine_k
0 replies
11h27m

Maybe sometimes you want an old cassette sound, or even older scratched 78 rpm sound, etc. Computers, as usual, do what you asked them to do, not what you meant.

emadm
1 replies
2h29m

This is part of a paper on the prior version of the model: https://x.com/stableaudio/status/1755558334797685089?s=20

https://arxiv.org/abs/2402.04825

It outperforms similar music models.

The pace is accelerating and even better ones are coming with far greater cohesion and... stuff. Will be quite the year for music.

emadm
0 replies
2h29m

Particularly interesting with the scaled up version of https://www.text-description-to-speech.com

Do try https://www.stableaudio.com for a rights-licensed model you can use commercially.

ecmascript
1 replies
9h39m

Just a few days ago I was downvoted for stating AI will be better at creating music than humans: https://news.ycombinator.com/item?id=39273380#39273532

Now this is released, and I feel I have grist for my mill.

Sure, it still kind of sucks, but it's very impressive for a _demo_, and remember that this tech is very much in its infancy.

larschdk
0 replies
4h56m

I don't find this music to be good in any way. It sounds interesting over a few notes, but then completely fails to find any kind of progression that goes anywhere interesting, never iterating on the theme, never teasing you with subtle or surprising variation over a core theme, no build-ups or clear resolutions. Very annoying to actually listen to.

Jeff_Brown
1 replies
5h12m

Music without changes is boring. I enjoyed the much less stable results of OpenAI's Jukebox (2020) more than any music AI to come since. The newer models' sound quality is better, but they only seem to produce one monotonous texture at a time.

coldcode
0 replies
4h13m

As a musician, I found the pieces unremarkable. Of course, a lot of contemporary music is forgettable as well, as people try to create songs that all sound like hits but, in doing so, create uninteresting songs. I wonder what music the model is based on. I suppose for game music/sounds, perhaps it's good enough?

zdimension
0 replies
5h34m

The few examples I was able to play are very promising, unfortunately the host seems to be getting some sort of HN-hug, because all the audio files are buffering every other second -- they seem to throttle at 32 KiB/s.

webprofusion
0 replies
9h33m

Music is perfect for AI generation using trained models, because artists have been copying each other for at least the past 100 years and having a computer do it for you is only notionally different. Sure a computer can never truly know your pain, but it can copy someone else's.

seydor
0 replies
6h50m

Trying to describe music with words is awkward! We need a model that is trained on dance

nprateem
0 replies
1h51m

The problem with music generation is the difficulty of editing. Photos and text can be easily edited, but music can't be. Either the piece needs to be MIDI, with relevant parameterisation of instruments, or a UI is needed that allows segments of the audio to be reworked, like in-painting.
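
A sketch of what the in-painting interface could look like at the waveform level (numpy only; the model that would fill the gap is hypothetical):

    import numpy as np

    # In-painting setup: mark the segment to be reworked with a mask the
    # model would condition on. This is the interface, not a real model.
    sr = 44_100
    audio = np.random.randn(sr * 10).astype(np.float32)  # stand-in for a track
    mask = np.ones_like(audio)
    mask[3 * sr : 5 * sr] = 0.0                          # rework seconds 3-5
    corrupted = audio * mask

    # A diffusion in-painter would regenerate only where mask == 0, keeping
    # the untouched audio fixed at every denoising step.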

kleiba
0 replies
8h15m

My son suggested to play "Calm meditation music to play in a spa lobby" and "Drum solo" at the same time - sounds pretty good, actually...

andbberger
0 replies
13h11m

wake me up when it can write a fugue

TrackerFF
0 replies
7h14m

Now, if they could also generate MIDI tracks to accompany the audio - that'd be great.

That would add some much-needed levels of customization.

3cats-in-a-coat
0 replies
3h51m

The reconstruction demo is in effect an audio compression codec. And I bet it makes existing audio codecs look like absolute toys.
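
Back-of-envelope, with made-up but plausible latent dimensions (not Stable Audio's published specs):

    # Raw CD-quality PCM vs. a hypothetical autoencoder latent.
    sr, channels, bit_depth = 44_100, 2, 16
    raw_kbps = sr * channels * bit_depth / 1000  # ~1411 kbps

    # Assumed latent: 64 channels at ~21.5 frames/sec, stored as fp16.
    latent_kbps = 21.5 * 64 * 16 / 1000          # ~22 kbps

    print(f"{raw_kbps:.0f} -> {latent_kbps:.0f} kbps, "
          f"~{raw_kbps / latent_kbps:.0f}x smaller")

    # Caveat: unlike a real codec, the latent isn't bit-exact, so this is
    # a capacity comparison, not a fidelity one.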