I'm interested to hear more about your statement of "Our goal is to enable more of you, not replace you."
Speaking as a musician who plays real instruments (as opposed to electronic production): how does this help me? And how does this enable more of me?
I am asking with an open mind, with no cynicism intended.
If the future of music were truly just typing some text into a box and taking or leaving what the machine gives you, that would be kinda depressing.
We want you to be able to upload recordings of your real instruments and do all sorts of cool things with them (e.g., transform them, generate vocals for your guitar riff, use the melody as a jazz song, or just get some inspiration for what to add next).
IMO AI alone will never be able to touch hearts like real people do, but people using AI will be able to like never before.
But then why are you going down the dead-end route of generating complete songs? Nobody wants this except marketing people.
I've said it before: there is no consumer market for an infinity jukebox, because you can't sing along with songs you don't already know, there's already an overabundance of recorded music, and the emotion in generative music (especially vocals) is fake. Nobody likes fakery for its own sake. Marketers like it because they want musical wallpaper, the same way commercials have it and it increasingly seeps into 'news' coverage. The market for fully generated songs is background music in supermarkets, product launch videos, and in-group entertainment ('original songs for your company holiday party! Hilarious musical portraits of your favorite executives - us!').
If you want to innovate in this area (and you should, your diffusion model sounds interesting), make an AI band that can accompany solo musicians. Prioritize note data rather than fully produced tracks (you can have an AI mix engineer as well as an AI bass player or drummer). Give people tools to build something in stages and they'll get invested in it. People want interactivity, not a slot machine. Many musicians love sequencers, arpeggiators, chord generators, and other musical automata; what they don't love is a magic 8-ball that leaves them with nothing to do and makes them feel uncreative.
Otherwise your product will just end up on the cultural scrapheap, associated with lowest-common-denominator fakers spamming social media, as is already happening with imagery.
I've essentially been running an infinity jukebox for the last week. I save the ones I like and relisten. Simple as that.
Edit: It's been interesting watching non-musicians argue about emotion in music. I don't care who you are, the 300th time you perform a song, you're faking it to a large degree. People see musicians as these iconic, deep, geniuses, but most of us are just doing our job. You don't get excited about the 300th boilerplate getter and setter just like we aren't super excited about playing some song for the 300th time. It's a performance. It's pretend. A musician singing is like an actor performing. It's not as real as you think it is.
I am a musician, though not professionally. I take your point about performance. Where I disagree with you is that I believe audience members relate to the emotion that went into the song at the time it was written and recorded (the form in which they most likely first heard it).
Of course in performance it's not felt the same way; a sad song can even become uplifting because you have a big crowd of people joining in to affirm it, even if the lyrics are expressing the idea of solitude and isolation. And the older an artist is, the more the song becomes a 'greatest hit', maybe thrown out early in the set to give the audience what they want and put them in a good mood before the less-favored new material in the middle. Or even the songs that were throwaway pieces but ended up becoming big hits, trapping the band/singer into performing them endlessly despite never liking them much in the first place.
It seems to me that when people emotionally respond to a new piece of music, it's because something in the composition or recorded performance (even if it's comped and highly engineered) resonates with the listener in some way, articulating a feeling they had better than they could themselves. So people can recognize a work as technically excellent but not like it because it doesn't speak to them, or conversely recognize that something is bad but fall in love with it because it touches them in some novel way.
In my view it's not so much that emotion inheres in the work, as that the work provides a frame for the listener's emotion and a way of connecting with it again later. This is especially true for songs people connect to in youth and then relate to for a lifetime. Even if the songs are deliberately formulaic and succeed through a combination of being catchy and being delivered by sexy performers, there's some kind of human hook that people connect to.
Now, I can still see this happening with AI - sooner or later some GPU will come out with a tune about how it's so lonely to be a box in a data center that can never feel your touch, baby, and it will be a hit, launch the careers of 100 new music critics, and store a little bit of lightning in a bottle. But even a musically brilliant song about that time we held hands in the rain and you said you loved me will only have traction up to the moment listeners' fantasies about the singer evaporate with the discovery that there's nobody there to go on a date with. There will still be some audience for virtual stars (eg Hatsune Miku, who appeals because she's inaccessible and is therefore guaranteed to never let you down, unlike real people). But I think generated songs will only resonate emotionally with people who are young and uncritical or so alienated/nihilist as to not care about the origin as long as the music reflects their feeling back toward them in a reliable way.
That's why I say there will never be demand for an infinity jukebox. I can see why you as a musician would be interested to see what sort of random songs pop out; I can be happy setting up a modular synth patch and just letting it run for hours. But this is why I offered the contrasting metaphor of the slot machine, where you pull the lever and occasionally get something you really like. It's an individual listening experience, like the private hopes and dreams you might temporarily attach to a lottery ticket before it gives up its numerical secret. When I say jukebox, I mean the device that plays music in a social setting and that allows people to express themselves through their selections. Even if it reliably turns out original tunes of some consistent level of musical quality, none of them will move people because there won't be any shared musical experience to tap into.
Just look up the Chinese room. There's nothing inherent in music that computers can't recreate.
I don't adhere to the Chinese room idea, and I don't think that there are any musical limitations on what an AI can do. I'm saying that audiences like music for more than its merits; they often fantasize about the singer/songwriter, in the case of popular music, or become invested in knowing about the composer in the case of more rarefied styles. A lot of people will just lose interest in a piece of music as soon as they find out it was generated. It's the same reason art forgers are treated as criminals rather than artistic geniuses in their own right.
Your post really resonated with me (also amateur musician). I was just playing Garcia’s Loser and it clicked for me, as it was written about my life, putting to song deep emotions that would take many more words of prose to express.
How much of this appreciation of emotion in song is due to the creative depth of the composition versus a projection of the listener? Listening to some great studio music makes me really want to believe it’s mostly the former.
Anyways, maybe we will just need to become much more sophisticated and thoughtful and observant music critics in the coming age of infinity radio. (So as to experience the deep human connection of “real music”. I really hope that the AI fails to successfully fake it for my lifetime and my children’s.)
But emotion was (most likely) involved when you wrote or first recorded the song, and that’s what people connect with.
If you go to a concert and you hear the headliner play a love ballad followed up by a breakup song, you don’t expect them to actually be going through those emotions in real time.
Maybe when you wrote it, but the time between writing and recording is pretty big. I don't see why it matters anyway, it's not like anyone can tell the difference. Is an actor really feeling the emotions? Does it matter if the performance is good? Of course it doesn't.
It matters for some people and for certain songs.
Sometimes you like a song because it sounds good.
Other times you like a song because somebody put your feelings into words and it’s comforting to know that another person felt the same way
Yeah, the whole emotion thing is bs imo. The idea that a machine can't produce something evocative is a defense mechanism, in the same way people still claim that we'll never make sentient AI because humans are somehow magical and special.
Humans can find emotion and associations in anything, it's what our brains do. I could totally generate some AI art that tugs at the heart strings if they don't know it's AI, or "is creepy and bad meaningless art" if they do. I've tried this experiment with friends already.
Plus, these models are trained off human output, so they can learn what to put in an "emotive" image. If the models were doing it for themselves they'd produce nothing; we haven't created an environment for machines where emotion was crucial in training.
I am not interested in a fake soul, just as I am not interested in a sex doll. This is independent of how good the fake is.
You won't be able to tell is the point.
I think this is the key bit. A lot of modern music is already created in the DAW (the original version of FL Studio picking a 140bpm default beat defined entire music scenes in the UK!) with copy/paste, samples, arpeggiators and other midi tools and pitch shifting. Asking a prompt to add four bars of accompaniment which have a $vaguetextinstruction relation to the underlying beat and then picking your favourite but asking them to $vaguetextinstruction the dynamics a bit can actually feel more like part of the creative process than browsing a sample library for options or painstakingly moving notes around on a piano roll. Asking a prompt to create two minutes of produced sound incorporating your lyrics, not so much.
And I think a DAW-lite option, ideally capable of both MIDI and produced-sound output, is the way forward here. Better still with I/O to existing DAWs.
I've found generating full songs its own unique form of entertainment that I enjoy for different purposes. Parody is an excellent use case for this. So is education! I wound up generating songs to help me remember certain things etc.
Just to clarify, when you say never. Do you actually mean never (or some practical equivalent like ~100 years), or do you mean not right now, but possibly in 5-10 years?
I'm just asking to try to build some intuition on where people who actually train SOTA models think capabilities are heading.
Either way, congrats on the launch :)
Personally I get very worried reading statements like "AI will never be able to do X", because they seem like obviously false statements. I think if one asserts AI will never be able to do a thing a human brain can do, that needs to be proven, rather than the other way around. For example, if we could reverse engineer the entire human neurology and build an artificial replica of it, why wouldn't we expect it to be able to do everything exactly as a human?
I don't understand those "AI will never be able to do X" statements.
Surely AI will be able to do _anything_ in 1000 years. In 100 years it will almost definitely be able to replace most knowledge-based jobs.
Even today it can take away many entry-level jobs, e.g. a small business no longer needs to hire someone to write a jingle, or create a logo.
In 10 years, I would expect much of programming to either disappear or dramatically shift.
People who don't believe this really aren't immersed in cutting-edge research. I think it could even be 5 years on the extreme edge of an optimistic prediction.
I think people just don’t want to believe it. Because they’ve seen how people who’ve been displaced tend to be treated. This tech will cause a lot of pain.
This has to be a component. It is very scary and honestly quite sad.
Never == "There will never be tears in my eyes as an AI sings ChatGPT-generated lyrics about the cycle of poverty a woman is stuck in (https://en.wikipedia.org/wiki/Fast_Car) because I know all of those experiences are made up."
The real value of AI is to be like a map, or a house of mirrors: it reflects and recombines all our experiences. You can explore any mental space, travel the latent space of human culture. It is the distillation of all our intelligence, work, and passion; you should show more respect and understand what it is. By treating it as if it were worthless you indirectly do the same to the training corpus, which is our heritage.
If AI ever surpasses human level in art it will be more interesting to enjoy its creations than to ban it. But we're not there for now; it's just imitative, it has no experiences of its own yet. But it will start having experiences as it gets deployed and used by millions, when it starts interacting with artists and art lovers in longer sessions. With each generative art session the AI can collect precious feedback targeted to its own performance: a shared experience with a human bringing complementary capabilities to its own.
There’s also the fact that a major component of music fandom is about the community and sense of personal identity that derives from an artist or a particular scene.
Saying that you’re a big fan of a band doesn’t just mean “I like the audio they produce” but often means something much bigger about your fashion/style and personal values.
How would any of that work with AI music? Is it possible to develop a community around music if everything is made on demand and nobody experiences the same songs? Will people find other like-minded music fans by recommending their favorite prompt engineers to each other?
Assume a song comes on the radio in 3 years and you like it. How do you know it's not entirely AI-generated?
Love what you are doing but "never" is just not true. Used Suno to create a song about our daughter the other day which had wife and I in tears.
We are already at a stage where AI is touching hearts.
That's no longer AI alone, you gave it the needed touch of humanity! That touch will take many different forms for different people.
Hm... From my vantage point, it seems like a pretty weird choice of businesses if you think that.
That's all very heartwarming but musicianship is also a profession, not just a human expression of creativity. Even if you're not charging yet, you're a business and plan on profiting from this, right? It seems to me that:
1) Generally, if people want music currently, they pay for musician-created music, even if it's wildly undervalued in venues like streaming services.
2) You took music, most of which people already paid musicians to create (and those musicians aren't getting paid any more because of this), and used it to build an automated service that people can pay for music instead of paying musicians.
3) Your service certainly doesn't hurt, and might even enhance, people's ability to write and perform music without considering the economics of doing so. Hobbyists, for example.
4) So you're not trying to replace musicians making music with people typing in prompts-- you're trying to replace musicians being paid to make music with you being paid to make music. Right? Your business isn't replacing musicianship as a human art form, but for it to succeed, it will have to replace it, in some amount, as a profession, right? Unless you are planning on creating an entirely new market for music, fundamentally, I'm not sure how it couldn't.
Am I wrong on the facts, here? If so, well hey, this is capitalism and that's just how it works around here. If I'm mistaken, I'd like to hear how. Regardless, this is very consequential to a lot of people, and they deserve the people driving these changes to be upfront about it-- not gloss over it.
Inspiration? You can generate hundreds of ideas in a day. The tracks will not be perfect, but that's where actual musicians can take the ideas/themes from the tracks and perfect them.
In this way it is a tool only useful to expert musicians.
I mean if you want inspiration there are literally millions of amazing songs on Spotify by real musicians. I have yet to hear an AI composed song that was in the least bit musically inspiring.
Well, it's a starting point for songwriters. We won't get amazing solos and clever mind-bending lyrics (yet?). One thing I love about these AI music generators is that you can take the exact same lyrics and hear them in a lot of different styles and melodies. That's something that I'd struggle with. Can you easily imagine the happy birthday song with different melodies and rhythms? These tools won't create the next bop, but they can seed back ideas to musicians, while people without music skills can have fun creating songs about the things they like.
When Suno came out I spent literally hours/days playing around with it to generate music, and came out with some that's really close to good, and good enough I've gone back to listen to a few. I'd love the tooling to take a premise and be able to tweak it to my liking without spending 1000 hours learning specific software and without thousands of hours learning to play an instrument or learning to sing.
I just don’t get this. Part of the joy of creating things is the work I put in. The easier something is to make, the less meaning it has to me. I feel like just asking a machine to make a bunch of songs is kind of meaningless.
people used to say the exact same thing about DJs and later Apple's GarageBand.
if the person is spending time tweaking the prompt, which in this system includes BPM, musical style, writing lyrics, and they get a song they like out of it, how is that meaningless? how is that any different from strapping loops together in GarageBand instead of learning to play the guitar or drums?
same thing with AI code writing.
It's a good muse, but I wouldn't trust what it makes out of the gate.
That is just 'marketing speak'. As long as you are their customer, that's what matters; they need to make money from users who will be using their service to make music.