I'm interested to hear more about your statement of "Our goal is to enable more of you, not replace you."
Speaking as a musician who plays real instruments (as opposed to electronic production): how does this help me? And how does this enable more of me?
I am asking with an open mind, with no cynicism intended.
If the future of music were truly just typing some text into a box and taking or leaving what the machine gives you, that would be kinda depressing.
We want you to be able to upload recordings of your real instruments and do all sorts of cool things with them (e.g., transform them, generate vocals for your guitar riff, use the melody as a jazz song, or just get some inspiration for what to add next).
IMO AI alone will never be able to touch hearts like real people do, but people using AI will be able to like never before.
But then why are you going down the dead-end route of generating complete songs? Nobody wants this except marketing people.
I've said it before: there is no consumer market for an infinity jukebox, because you can't sing along with songs you don't already know, there's already an overabundance of recorded music, and the emotion in generative music (especially vocals) is fake. Nobody likes fakery for its own sake. Marketers like it because they want musical wallpaper, the same way commercials have it and it increasingly seeps into 'news' coverage. The market for fully generated songs is background music in supermarkets, product launch videos, and in-group entertainment ('original songs for your company holiday party! Hilarious musical portraits of your favorite executives - us!').
If you want to innovate in this area (and you should, your diffusion model sounds interesting), make an AI band that can accompany solo musicians. Prioritize note data rather than fully produced tracks (you can have an AI mix engineer as well as an AI bass player or drummer). Give people tools to build something in stages and they'll get invested in it. People want interactivity, not a slot machine. Many musicians love sequencers, arpeggiators, chord generators, and other musical automata; what they don't love is a magic 8-ball that leaves them with nothing to do and makes them feel uncreative.
Otherwise your product will just end up on the cultural scrapheap, associated with lowest-common-denominator fakers spamming social media, as is already happening with imagery.
I've essentially been running an infinity jukebox for the last week. I save the ones I like and relisten. Simple as that.
Edit: It's been interesting watching non-musicians argue about emotion in music. I don't care who you are, the 300th time you perform a song, you're faking it to a large degree. People see musicians as these iconic, deep, geniuses, but most of us are just doing our job. You don't get excited about the 300th boilerplate getter and setter just like we aren't super excited about playing some song for the 300th time. It's a performance. It's pretend. A musician singing is like an actor performing. It's not as real as you think it is.
I am a musician, though not professionally. I take your point about performance. Where I disagree with you is that I believe audience members relate to the emotion that went into the song at the time it was written and recorded (the form in which they most likely first heard it).
Of course in performance it's not felt the same way; a sad song can even become uplifting because you have a big crowd of people joining in to affirm it, even if the lyrics are expressing the idea of solitude and isolation. And the older an artist is, the more the song becomes a 'greatest hit', maybe thrown out early in the set to give the audience what they want and put them in a good mood before the less-favored new material in the middle. Or even the songs that were throwaway pieces but ended up becoming big hits, trapping the band/singer into performing them endlessly despite never liking them much in the first place.
It seems to me that when people emotionally respond to a new piece of music, it's because something in the composition or recorded performance (even if it's comped and highly engineered) resonates with the listener in some way, articulating a feeling they had better than they could themselves. So people can recognize a work as technically excellent but not like it because it doesn't speak to them, or conversely recognize that something is bad but fall in love with it because it touches them in some novel way.
In my view it's not so much that emotion inheres in the work, as that the work provides a frame for the listener's emotion and a way of connecting with it again later. This is especially true for songs people connect to in youth and then relate to for a lifetime. Even if the songs are deliberately formulaic and succeed through a combination of being catchy and being delivered by sexy performers, there's some kind of human hook that people connect to.
Now, I can still see this happening with AI - sooner or later some GPU will come out with a tune about how it's so lonely to be a box in a data center that can never feel your touch, baby, and it will be a hit, launch the careers of 100 new music critics, and store a little bit of lightning in a bottle. But even a musically brilliant song about that time we held hands in the rain and you said you loved me will only have traction up to the moment listeners' fantasies about the singer evaporate with the discovery that there's nobody there to go on a date with. There will still be some audience for virtual stars (eg Hatsune Miku, who appeals because she's inaccessible and is therefore guaranteed to never let you down, unlike real people). But I think generated songs will only resonate emotionally with people who are young and uncritical or so alienated/nihilist as to not care about the origin as long as the music reflects their feeling back toward them in a reliable way.
That's why I say there will never be demand for an infinity jukebox. I can see why you as a musician would be interested to see what sort of random songs pop out; I can be happy setting up a modular synth patch and just letting it run for hours. But this is why I offered the contrasting metaphor of the slot machine, where you pull the lever and occasionally get something you really like. It's an individual listening experience, like the private hopes and dreams you might temporarily attach to a lottery ticket before it gives up its numerical secret. When I say jukebox, I mean the device that plays music in a social setting and that allows people to express themselves through their selections. Even if it reliably turns out original tunes of some consistent level of musical quality, none of them will move people because there won't be any shared musical experience to tap into.
Just look up the Chinese room. There's nothing inherent in music that computers can't recreate.
I don't adhere to the Chinese room idea, and I don't think that there are any musical limitations on what an AI can do. I'm saying that audiences like music for more than its merits; they often fantasize about the singer/songwriter, in the case of popular music, or become invested in knowing about the composer in the case of more rarefied styles. A lot of people will just lose interest in a piece of music as soon as they find out it was generated. It's the same reason art forgers are treated as criminals rather than artistic geniuses in their own right.
Your post really resonated with me (also amateur musician). I was just playing Garcia’s Loser and it clicked for me, as it was written about my life, putting to song deep emotions that would take many more words of prose to express.
How much of this appreciation of emotion in song is due to the creative depth of the composition versus a projection of the listener? Listening to some great studio music makes me really want to believe it’s mostly the former.
Anyways, maybe we will just need to become much more sophisticated and thoughtful and observant music critics in the coming age of infinity radio. (So as to experience the deep human connection of “real music”. I really hope that the AI fails to successfully fake it for my lifetime and my children’s.)
But emotion was (most likely) involved when you wrote or first recorded the song, and that’s what people connect with.
If you go to a concert and you hear the headliner play a love ballad followed up by a breakup song, you don’t expect them to actually be going through those emotions in real time.
Maybe when you wrote it, but the time between writing and recording is pretty big. I don't see why it matters anyway, it's not like anyone can tell the difference. Is an actor really feeling the emotions? Does it matter if the performance is good? Of course it doesn't.
It matters for some people and for certain songs.
Sometimes you like a song because it sounds good.
Other times you like a song because somebody put your feelings into words and it’s comforting to know that another person felt the same way
Yeah, the whole emotion thing is bs imo. The idea that a machine can't produce something evocative is a defense mechanism, in the same way people still claim that we'll never make sentient AI because humans are somehow magical and special.
Humans can find emotion and associations in anything, it's what our brains do. I could totally generate some AI art that tugs at the heart strings if they don't know it's AI, or "is creepy and bad meaningless art" if they do. I've tried this experiment with friends already.
Plus, these models are trained off human output, so they can learn what to put in an "emotive" image. If the models were doing it for themselves they'd produce nothing; we haven't created an environment for machines where emotion was crucial in training.
I am not interested in a fake soul, just as I am not interested in a sex doll. This is independent of how good the fake is.
You won't be able to tell is the point.
I think this is the key bit. A lot of modern music is already created in the DAW (the original version of FL Studio picking a 140bpm default beat defined entire music scenes in the UK!) with copy/paste, samples, arpeggiators and other midi tools and pitch shifting. Asking a prompt to add four bars of accompaniment which have a $vaguetextinstruction relation to the underlying beat and then picking your favourite but asking them to $vaguetextinstruction the dynamics a bit can actually feel more like part of the creative process than browsing a sample library for options or painstakingly moving notes around on a piano roll. Asking a prompt to create two minutes of produced sound incorporating your lyrics, not so much.
And I think a DAW-lite option, ideally capable of both MIDI and produced-sound output, is the way forward here. Better still with I/O to existing DAWs.
I've found generating full songs its own unique form of entertainment that I enjoy for different purposes. Parody is an excellent use case for this. So is education! I wound up generating songs to help me remember certain things etc.
Just to clarify, when you say never. Do you actually mean never (or some practical equivalent like ~100 years), or do you mean not right now, but possibly in 5-10 years?
I'm just asking to try to build some intuition on where people who actually train SOTA models think capabilities are heading.
Either way, congrats on the launch :)
Personally I get very worried reading statements like "AI will never be able to do X", because they seem like obviously false statements. I think if one asserts AI will never be able to do a thing a human brain can do, that needs to be proven, rather than the other way around. For example, if we could reverse engineer the entire human neurology and build an artificial replica of it, why wouldn't we expect it to be able to do everything exactly as a human?
I don't understand those "AI will never be able to do X" statements.
Surely AI will be able to do _anything_ in 1000 years. In 100 years it will almost definitely be able to replace most knowledge-based jobs.
Even today it can take away many entry-level jobs, e.g. a small business no longer needs to hire someone to write a jingle, or create a logo.
In 10 years, I would expect much of programming to either disappear or dramatically shift.
People who don't believe this really aren't immersed in cutting-edge research. I think it could even be 5 years on the extreme edge of an optimistic prediction.
I think people just don’t want to believe it. Because they’ve seen how people who’ve been displaced tend to be treated. This tech will cause a lot of pain.
This has to be a component. It is very scary and honestly quite sad.
Never == "There will never be tears in my eyes as an AI sings ChatGPT-generated lyrics about the cycle of poverty a woman is stuck in (https://en.wikipedia.org/wiki/Fast_Car) because I know all of those experiences are made up."
The real value of AI is to be like a map, or a house of mirrors: it reflects and recombines all our experiences. You can explore any mental space, travel the latent space of human culture. It is the distillation of all our intelligence, work, and passion; you should show more respect and understand what it is. By treating it as if it were worthless you indirectly do the same to the training corpus, which is our heritage.
If AI ever surpasses human level in art it will be more interesting to enjoy its creations than to ban it. But we're not there for now; it's just imitative, it has no experiences of its own yet. But it will start having experiences as it gets deployed and used by millions, when it starts interacting with artists and art lovers in longer sessions. With each generative art session the AI can collect precious feedback targeted to its own performance: a shared experience with a human bringing complementary capabilities to its own.
There’s also the fact that a major component of music fandom is about the community and sense of personal identity that derives from an artist or a particular scene.
Saying that you’re a big fan of a band doesn’t just mean “I like the audio they produce” but often means something much bigger about your fashion/style and personal values.
How would any of that work with AI music? Is it possible to develop a community around music if everything is made on demand and nobody experiences the same songs? Will people find other like-minded music fans by recommending their favorite prompt engineers to each other?
Assume a song comes on the radio in 3 years and you like it. How do you know it's not entirely AI-generated?
Love what you are doing but "never" is just not true. Used Suno to create a song about our daughter the other day which had wife and I in tears.
We are already at a stage where AI is touching hearts.
That's no longer AI alone, you gave it the needed touch of humanity! That touch will take many different forms for different people.
Hm... From my vantage point, it seems like a pretty weird choice of businesses if you think that.
That's all very heartwarming but musicianship is also a profession, not just a human expression of creativity. Even if you're not charging yet, you're a business and plan on profiting from this, right? It seems to me that:
1) Generally, if people want music currently, they pay for musician-created music, even if it's wildly undervalued in venues like streaming services.
2) You took music, most of which people already paid musicians to create (and those musicians aren't getting paid any more because of this), and used it to build an automated service that people can pay for music instead of paying musicians.
3) Your service certainly doesn't hurt, and might even enhance, people's ability to write and perform music without considering the economics of doing so. Hobbyists, for example.
4) So you're not trying to replace musicians making music with people typing in prompts-- you're trying to replace musicians being paid to make music with you being paid to make music. Right? Your business isn't replacing musicianship as a human art form, but for it to succeed, it will have to replace it, in some amount, as a profession, right? Unless you are planning on creating an entirely new market for music, fundamentally, I'm not sure how it couldn't.
Am I wrong on the facts, here? If so, well hey, this is capitalism and that's just how it works around here. If I'm mistaken, I'd like to hear how. Regardless, this is very consequential to a lot of people, and they deserve the people driving these changes to be upfront about it-- not gloss over it.
Inspiration? You can generate hundreds of ideas in a day. The tracks will not be perfect, but that's where actual musicians can take the ideas/themes from the tracks and perfect them.
In this way it is a tool only useful to expert musicians.
I mean if you want inspiration there are literally millions of amazing songs on Spotify by real musicians. I have yet to hear an AI composed song that was in the least bit musically inspiring.
Well, it's a starting point for songwriters. We won't get amazing solos and clever mind-bending lyrics (yet?). One thing I love about these AI music generators is that you can take the exact same lyrics and hear them in a lot of different styles and melodies. That's something that I'd struggle with. Can you easily imagine the happy birthday song with different melodies and rhythms? These tools won't create the next bop, but they can seed back ideas to musicians, while people without music skills can have fun creating songs about the things they like.
When Suno came out I spent literally hours/days playing around with it to generate music, and came out with some that's really close to good, and good enough I've gone back to listen to a few. I'd love the tooling to take a premise and be able to tweak it to my liking without spending 1000 hours learning specific software and without thousands of hours learning to play an instrument or learning to sing.
I just don’t get this. Part of the joy of creating things is the work I put in. The easier something is to make, the less meaning it has to me. I feel like just asking a machine to make a bunch of songs is kind of meaningless.
people used to say the exact same thing about DJs and later Apple's GarageBand.
if the person is spending time tweaking the prompt, which in this system includes BPM, musical style, writing lyrics, and they get a song they like out of it, how is that meaningless? how is that any different from strapping loops together in GarageBand instead of learning to play the guitar or drums?
same thing with AI code writing.
It's a good muse, but I wouldn't trust what it makes out of the gate.
That is just 'marketing speak'. As long as you are their customer, that's what matters; they need to make money from users who will be using their service to make music.