For instance, a guitarist will have a track they wish they had vocals (and lyrics) for, and if they could pay for that, they would.
Literally, if you could highlight a section of a tune in your DAW, prompt it, and have vocals + lyrics generated, possibly different versions or harmonies for existing parts, etc. Musicians already pay for plugins, but the singing ones are awful to use so far.
I would probably be happy to pay for a service I could drop a riff into and get a decent drum track that goes with it. Even better would be something that modifies and adapts while you're recording or playing, and that can itself be recorded and clipped. Something that fits a clean workflow. If anyone makes this, please don't make it as much of a pain as most VSTs and plugin systems, where there are like four different installers and licensing software layers.
The other very real aspect here is that "training data" has to come from somewhere, and the copyright implications of this are far from solved.
In the past I worked on real algorithmic music composition: an algorithmic sequencer paired with hardware or software synthesizers. I could give it feedback and it'd evolve the composition, all without training data. It was computationally cheap, didn't infringe anyone's copyright, and a human still had very real creative influence (which instruments, scale, tempo, etc.). Message me if anyone's still interested in "dumb" AI like that. :-)
Computer-assisted music is nothing new, but taking away the creativity completely is turning music into noise -- noise that sounds like music.
The reason is greed. They jump on the bandwagon to get rich, not to bring art. They don't care about long term effects on creativity. If it means that it kills motivation to create new music, or even learn how to play an instrument, that's fine by these people. As long as they get their money.
I’m so sick of hearing this excuse. “I can’t draw so I use AI,” as if the people who can draw were born that way.
No, they spent countless hours practicing and that’s what makes it art. Because it’s the product of hours of decision making and learning. You can not skip ahead in line. Full stop.
Not sure how to reach out, but I'm definitely interested in reading about procedural methods in music synthesis. Any links describing your approach?
Actually, noise that sounds like music is some of the best music there is: electroacoustic music.
A lot better than most music on the radio. ;-)
I don't see any contact info in your profile, but I have an email in mine. I am interested in hearing more about your process and if you have music for sale anywhere, I like to support electronic artists doing interesting stuff.
It’s less than worthless.
How is that specific to text prompting? If you tap your fingers to a model and it generates a song from your tapping, it's still just fitting the training data as you say.
Here are a few of my songs, I think they are fairly consistent?
https://sonauto.ai/song/e2e3d210-69b4-4ad7-96d1-fb5744d0c648
https://sonauto.ai/song/a94e04a9-7b74-4b87-b5ed-ca3e8d2798d0
https://sonauto.ai/song/55a36595-c60a-4346-81d8-6f03ebe690ff
It might also be helpful to come up with some ways to segregate customers so that “prosumer” users get faster “cold starts” (so that they can iterate faster) at the expense of sometimes having to wait for generation to start back up again.
8. OUTPUT As between You and the Services, and to the extent permitted by applicable law, You own any right, title, or interest that may exist in the musical and/or audio content that You generate using the Services ("Outputs"). We hereby assign to You all our right, title, and interest, if any, in and to Your Outputs. This assignment does not extend to other users' Outputs, regardless of similarity between Your Outputs and their Outputs. You grant to us an unrestricted, unlimited, irrevocable, perpetual, non-exclusive, transferable, royalty-free, fully-paid, worldwide license to use Your Output to provide, maintain, develop, and improve the Services, to comply with applicable law, and/or to enforce our terms and policies. You are solely responsible for Outputs and Your use of Outputs, including ensuring that Outputs and Your use thereof do not violate any applicable law or these terms of service. We make no warranties or representations regarding the Outputs, including as to their copyrightability or legality. By using the Services, You warrant that You will use Outputs only for legal purposes.
You own the rights, but Sonauto is granted the rights to use it as well.
>We hereby assign to You all our right, title, and interest, if any
>You are solely responsible for Outputs and Your use of Outputs
I love how it clearly lays out the scenario where the rights may not even exist, yet you are still responsible.
Error creating billing portal Failed to create billing portal session: No configuration provided and your live mode default configuration has not been created. Provide a configuration or create your default by saving your customer portal settings in live mode at https://dashboard.stripe.com/settings/billing/portal.
Also, when trying to update my profile picture:
Failed to update image! column users.current_period_end does not exist
"Generate a rock song" is not a problem that working musicians have. "Take this riff I recorded with whatever guitar I have in the studio and show me what it sounds like as a Les Paul through a 5150 or Strat through a AC30" is, though.
I have used Music AI for what you describe, by uploading a track I have made myself and running diffusion on it with various genres, although I don't think that is the most interesting use case for Music AI.
Fusion is the most interesting use case IMHO.
Are you saying that it is wrong for a machine to learn? Even if it were, it is the massive amount of material it is based on that makes it less problematic, as that makes room for building abstracted knowledge.
So overall, yes, anyone can, with some luck, create a "radio friendly" track with music AI, but not everyone can create good art with it. You still need to understand music, and the more you know about genres and composers and how they fit together, the more power you have to create musically interesting patterns.
Keep in mind that Sonauto has 3400 generic tags plus an undisclosed number of artist names. You need to understand what makes it possible to combine different artists/musicians and what could lead to something interesting.
So musical knowledge is still needed.
Not to mention that now you can have playlists that transition seamlessly between two songs. Low-cost party DJ?
Are there any good papers or writeups on them?
Are there any open source implementations to play with?
Audio models are actually quite similar to image models, but there are a few key differences. First, the autoencoder needs to be designed much more carefully, as human hearing is insanely good and music requires orders of magnitude more compression (image AEs do 8x8 downsampling; audio AEs need to downsample by thousands of times). Second, the model itself needs to be really good at placing lyrics/beats (similar to placing text in image diffusion): a sixth finger in an image model is fine, but a missed beat can ruin a song. That's why language-model approaches (which have a stronger sequential inductive bias than diffusion models, which is good for rhythm and lyric placement) have been really popular in audio.
If you're interested in papers (IMO not good for new people as they make everything seem more complicated than it is):
Stable Audio (similar to our architecture): https://arxiv.org/abs/2402.04825 (code: https://github.com/Stability-AI/stable-audio-tools)
MusicGen (Suno-style architecture): https://arxiv.org/abs/2306.05284 (code: https://github.com/facebookresearch/audiocraft/tree/main)
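To get a feel for the downsampling gap, here's a rough back-of-envelope sketch; all of the numbers are assumptions for illustration, not our actual architecture:

```python
# Rough back-of-envelope comparison of how much the autoencoder shrinks the
# sequence the generative model has to handle. Example numbers only.

# Image: a 1024x1024 picture through a typical 8x8-downsampling AE
image_pixels = 1024 * 1024
image_latent_positions = (1024 // 8) * (1024 // 8)   # 16,384 latent positions

# Audio: 3 minutes of 44.1 kHz audio through a hypothetical ~2000x
# temporal downsampler (~21.5 latent frames per second)
audio_samples = 44_100 * 180
audio_latent_positions = int(21.5 * 180)             # ~3,870 latent positions

print(f"image: {image_pixels:,} px -> {image_latent_positions:,} latents "
      f"({image_pixels / image_latent_positions:.0f}x reduction)")
print(f"audio: {audio_samples:,} samples -> {audio_latent_positions:,} latents "
      f"({audio_samples / audio_latent_positions:.0f}x reduction)")
# Without thousands-of-times downsampling, a 3-minute song would be
# millions of positions long, far too many for the diffusion/LM backbone.
```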
But music production and distribution is (actually, was) my home turf, so here's my two cents on the topic:
I've already heard music qualitatively on par with the tracks available on your demo page. I've heard way more of it than I truly wanted or felt was necessary, at least once a day while tracking, on Pro Tools, hundreds of albums you've never ever heard of, in studios in France and LA, for years.
It was made by people with the best intentions, coming from all sorts of walks of life, and yet it was obvious from the first note they played that they were condemned to oblivion, their music destined to basically never be heard by anyone.
And this has been done every day, multiple times a day, in every studio around the world, since the '60s.
20% of Spotify music has never been played once. IIRC less than 40% has been played more than once.
There's a genuinely humbling scene in the 2002 documentary "Scratch" where DJ Shadow, a world-renowned DJ and producer, wades through stacks of EPs in a record store in NY that have never, ever been played once[1], which perfectly captures how little of the musical output being recorded we actually get to listen to.
Making music is very easy. Making music people want to listen to is hard, mind-bogglingly so. For every whitebread pop track you've heard on the radio, there's thousands of other similar tracks that have been discarded by an A&R, a radio DJ, some label, or simply by the audience.
I'm saying this with no ill feelings towards you or your work, but I can't conceive even the flimsiest of reasons why anyone would ever listen to (or license/sync/track) any of those generated songs once the novelty of "music made by the AI" is gone.
Easy: Independent/single-dev operations needing some quick background music for a project (game, whatever)
Nobody will hire live studio musicians or a symphony orchestra to create background music. Way too expensive.
Obviously these tools don't do everything necessary to make great music, but the barrier of entry to making music is being lowered, and the quality floor is being raised -- and that'll result in a lot more would-be "musicians"[0] creating music that wouldn't otherwise exist[1].
[0] I leave the argument of whether these generative musicians count as "real" musicians to the Scotsmen in the audience.
[1] Bonus question: does art still hold value if no one sees it?
Yes. Art can have intrinsic and personal value for its creator, independent of any external audience. Unseen art lacks immediate external value [to others] but retains latent worth, potentially realized when discovered or appreciated in the future.
Eh I don't think the world is exactly clamoring for even more music.
I can't speak for everyone's process, but if you don't know how to make music, I'm not convinced that this allows you to do so because the medium of input (aka writing text) is far too divergent from the resultant melodic output to allow for any kind of meaningful individuality.
But fuck all the people who have a career teaching them those skills as a part of a thousands-year long artistic tradition whose value isn't solely defined by the exchange of currency for lessons, but in that it subsidizes those artists' work which goes unpaid and furthers human experience.
It's wonderful that techies with a surface-level knowledge of the arts are cannibalizing the entire supply chain and marketplace so they can make a buck off the AI craze.
The barrier to entry is already zero. AI lowers the ceiling, not the floor.
I can. It’s predatory behavior, performed by people looking to steal and cash in on something they have neither the skill, the understanding, nor the love to make on their own.
Agreed, although making music is "easy" once you've put in the hours to learn how to make music, and getting to somewhat professional standards requires a lot of time investment.
This reduces that to zero.
The point where it becomes a "problem" is the people abusing it to pump out hundreds (or dare I say thousands) of bullshit filler "music" to get some stream income at the expense of people who have put in the effort.
One of the most famous mantras in punk music was "this is a chord, this is another, now make a band"[1]. "Imagine" by John Lennon is a song written using the simplest scale and chord progressions, at a very slow 4/4 tempo.
The hard part is not knowing the biggest amount of chords, it's knowing what not to use to carry your emotions.
Also, the "time investment" is the music. Once the final waveform hits the tape/DAW/recorder it's not art anymore, it's publishing.
And for most artists "learn to make music" is usually the fun part. Complicated and frustrating sometimes, but rewarding. For a lot of them, the "now play it in front of other people" part is the truly annoying one, frightening in some occasions.
[1]https://austinkleon.com/2019/01/13/this-is-a-chord-this-is-a...
Getting it from a service like this is the equivalent of buying an already assembled lego kit. Putting aside ethical concerns (we're talking about the music industry after all, there were none even before AI arrived) is there a viable business in it?
AI music for the sake of just having it in the background removes that human element. It's just more stuff. To be fair, like you said, making generic music isn't anything new. But everything is turning into this. Games are using AI-generated music which, by definition, isn't able to try anything new, and AI art which is just regurgitated from other artists.
The enshittification of Spotify is here. Why pay an artist $100 for 500k plays when you can just push AI music and pay $1 for every 500k plays? As it is, music (really any entertainment) is a horrible way to make money.
So I guess I'll just keep working on my beats with Maschine (the only software that keeps me on Windows!) and sharing them with a few people every now and then.
That could just be me though. I am curious what users of Udio/Suno think?
Based on the results so far, it also looks like a more flexible approach to AI generation would be to generate a set of stems/samples based on the user's description and let them actually compose, instead of producing complete audio (maybe this is already happening somewhere)
- in either case, these models will most likely need to produce properly looped tracks at some point
If you have lyrics in your conditioning, then you can usually use a [Chorus] section as your loop point, as most music AI models render chorus sections in a way that affords crossfades. THAT ASSUMES you keep the chorus sections within a gap of less than about 60 seconds, with a 30-second extension, as the model can only see 90 seconds of the music track (or something in that range).
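For the splice itself, here's a minimal crossfade sketch; it assumes you've exported the two chorus-sharing renders as mono NumPy float arrays at the same sample rate, and none of it is a Sonauto API, just generic audio math:

```python
import numpy as np

def crossfade_splice(clip_a: np.ndarray, clip_b: np.ndarray,
                     sample_rate: int = 44_100,
                     fade_seconds: float = 2.0) -> np.ndarray:
    """Blend the tail of clip_a into the head of clip_b at the shared chorus.

    Both clips are assumed to be mono float waveforms that overlap on the
    same chorus material, so an equal-power fade hides the seam.
    """
    n = int(sample_rate * fade_seconds)
    t = np.linspace(0.0, np.pi / 2, n)
    fade_out = np.cos(t)   # equal-power fade-out applied to clip_a's tail
    fade_in = np.sin(t)    # equal-power fade-in applied to clip_b's head
    seam = clip_a[-n:] * fade_out + clip_b[:n] * fade_in
    return np.concatenate([clip_a[:-n], seam, clip_b[n:]])
```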
> Creating looping sections with Sonauto is generally not very difficult as the model is overly consistent when extending
the point is not consistency within the extension, but consistency between the extension (if any) and the starting audio, so the end loops back into the beginning
have you actually done this and gotten a usable looped track?
- link to both examples
- briefly outline the process of combining them for this use case
- since this is API usage, I suppose it's most likely not viable to obtain (reasonably) looped audio via user prompting (i.e. what's currently available in the browser) - would that be correct?
Edit: I have just heard the funniest, most ridiculous metal song ever, without a touch of metal inside. Breathe of Death, it’s like a bad joke.
If that’s the future of anything, I’m going back to plain C (code) when I retire and I’ll never approach the internet ever again.
Humans create value. AI consumes and commoditizes that value, stealing it from the people and selling it back to their customers.
It’s unethical and will be detrimental in the long run. All profit should be distributed to all artists in the training set.
I beg of you, speak to some real life musicians. A human composing or improvising is not choosing notes based on a set of probabilities derived from all the music they’ve heard in their life.
> I think an alternative legal interpretation where all of humanity's musical knowledge and history are controlled by three megacorporations (UMG/Sony/Warner) would be kinda depressing.
Your impoverished worldview of music as an artistic endeavor is depressing. Humanity’s musical knowledge extends far beyond the big 3.
> If the above is true we might as well shutdown OpenAI and delete all LLM weights while we're at it
Now we’re talking.
> losing massive value to humanity.
Nothing of value would be lost. In fact it would refund massive value to humanity that was stolen by generative AI.
edit: I have to add how disingenuous I find calling out corporations owning "all of humanity's musical knowledge and history" as if generative AI music trained on unlicensed work from artists is somehow a moral good. At least the contracts artists make with these corporations are consensual and have the potential to yield the artist some benefit which is more than you can say for these gen-AI music apps.
My point re: LLMs wasn't meant to exclusively be a "they're doing it" one, the hope was to give an example of something many people would agree is super useful and valuable (I work much faster and learned so much more in college thanks to LLMs) that would be impossible in the proposed strict interpretation of copyright.
edit responding to your edit:
Re: moral good: I think that bringing the sum of human musical knowledge to anybody who cares to try for free is a moral good. Music production software costs >$200 and studios cost thousands and majoring in music costs hundreds of thousands, but we can make getting started so much easier.
Is it really consent for those artists signing to labels when only three companies have total control of all music consumption and production for the mass market? To be clear, artists absolutely have a right to benefit from reproduction of their recordings. I just don't think anyone should have rights to the knowledge built into those creations, since in most cases it wasn't theirs to begin with (if their right to this knowledge were affirmed, every new song someone creates could hypothetically have a conga line of lawyer teams clamoring for "their cut" of that chord progression/instrument sample/effect/lyrical theme/style).
edit:
> I think that bringing the sum of human musical knowledge to anybody who cares to try for free is a moral good
Generative AI music isn't in any way accomplishing this goal. A free Spotify account with ads accomplishes this goal -- being able to generate a passable tune using a mish-mash of existing human works isn't bringing musical knowledge to the masses, it's just enabling end users to entertain themselves and you to profit from that.
> Is it really consent for those artists signing to labels
Yes? Ignoring the fact that there are independent labels outside the ownership of the Big Three you mention, artists enter into contracts with labels consensually because of the benefits the label can offer them. You train your model on these artists' output without their consent, credit or notification, profit off of it and offer nothing in return to the artists.
btw, if the user of the AI doesn't do any of the above then I think the US copyright office says it can't be copyrighted in the first place (so no profiting for them anyway).
Am I understanding right that the point here is that while you are able to get away with using copyrighted material to turn a profit, your end users cannot, so no worries?
1. Anthropomorphizing the kind of “influence” and “learning” these tools are doing, which is quite unrelated to the human process
2. Underrepresenting the massive differences in scale when comparing the human process of learning vs. the massive data centers training the AI models
3. Ignoring that this isn’t just about influence, it’s about the fact that the models would not exist at all, if not for the work of the artists it was trained on
This premise is false. I have made plenty of money busking on the street, for example. Or selling audio recordings at shows.
> To be clear, artists absolutely have a right to benefit from reproduction of their recordings.
This is correct. Artists benefit when you pay them for the right to reproduce. When you don't (like what you are doing), you get sued. Here's a YouTube video covering 9 examples:
https://www.youtube.com/watch?v=IIVSt8Y1zeQ
> I just don't think anyone should have rights to the knowledge built into those creations since in most cases it wasn't theirs to begin with
What?
That's why I specified mass market. However, given a choice between literally being on the street and working with a record label I'd probably choose the label, though I don't know about others.
> pay them for the right to reproduce
My point is learning patterns/styles does not equate to reproducing their recordings. If someone wants to listen to "Hey Jude" they cannot do so with our model, they must go to Spotify. There are cases where models from our competitors were trained for too long on too small a dataset and were able to recite songs, but that's a bug they admit is wrong and are fighting against, not a feature.
> in most cases it wasn't theirs to begin with
In most cases they did not invent the chord progression they're using or instruments they're playing or style they're using or even the lyrical themes they're singing. All are based on what came before and the musicians that come after them are able to use any new knowledge they contribute freely. It's all a fork of a fork of a fork of a fork, and if everyone along the line decided they were entitled to a cut we'd have disaster.
What's the worst that can happen if we allow unregulated AI training on existing music? Musician as a job won't exist anymore, except for the greatest artists. But it makes creating music much more accessible to billions of people. Is it good music? Let the market decide. And people will still make music because the creative process is enjoyable.
The animus towards AI-generated music stems deeply from job security. I work in software and I see it as more likely that AI will eventually be able to replace software devs. I may lose my job if that happens. But I don't care. Find another career. Humanity needs to progress instead of stagnating for the sake of a few interest groups.
But beyond the originality !== novelty discussion, I'm not sure how we've come to equate 'creativity' (and the rights to retaining it) to a sort of fingerprint encoding one's work. As if a band, artist or creator should stick to a certain brand once invented, and we can sufficiently capture that brand in dense legalese or increasingly, stylistic prompts.
How many of today's artists just 'riffing' off existing motifs will remain, if the end result of their creative endeavours will be absorbed into generative tools in some manner? What's the incentive for indies to distribute digitally, beyond the guarantee their works will provide the (auditory) fingerprints for the next content generation system?
Intellectual property laws for thee but not for me, I guess.
AI music is a weird business model. They hope that there's enough money peddling music slop after paying off the labels (and maybe eventually the independent music platforms) whose music you stole. Meanwhile, not even Spotify can figure out how to be reliably profitable, serving music people want to hear.
I bring it up only to provide a bit of balance to the soulless slop debate, proving creators can have diverging opinions on what is good in music creation and life—they don't all feel threatened by poor substitutes no one can possibly enjoy.
My dream product in this space (...that I didn't know existed until I discovered your site about 10 minutes ago LOL):
I listen to music when I work/code, and I used to loooooove Spotify Playlist Radio (a feature they killed for reasons I will never understand) because it helped me discover new music in the style of music I already enjoyed working to. Liked a song? Add it to the seed list and click play to fine-tune the radio station.
So what I really want is just a fine-tuneable infinite stream of novel music to work to. And by fine-tuneable, I mean I'd love to be able to nudge the generation (Pandora style) with thumbs ups / thumbs downs, or other more specific guidance/feedback (more bass, faster tempo, etc.) until I have this perfectly crafted, customized-for-me stream of music.
I'd probably listen to it all day and happily pay $$ for this.
Is this a pipe dream?
I was having a discussion with a friend who writes a lot of guitar music but can also play bass and sing. However, getting good drums is a problem. What he'd like is a service to upload his songs in some form (just guitar, or a mixed version with bass and vocals) and get an output that layers a drum track without altering the input. Ideally with appropriate fills, etc. I mean, just getting an in-time drum stem would probably be even better.
Is there any GenAI service to do this kind of incremental additive drums?
Less of this, more robots that do my dishes, please.
For the API: I think this could be integrated into artists workflows in lots of ways we can't even imagine right now as it gets better. One example I gave above was generating transitions between songs.
I also used it when I was living in New Orleans to help a friend come up with a riff for a live set he had, which had some unusual constraints (only a singer, drummer, and trombone, no others, in an echoey space). He used the generated song hook as inspiration for that night's arrangement.
There's lots of stuff, and some of it supports artists who have tight timelines and want creative support.
What your friend did, using generation for inspiration for real music he creates is fine. But if someone gifted me an AI generated song I would ask why they didn't pay a few dollars -- honestly not much more -- to a real artist to do the same.
Ten years ago a friend of mine did that, hired a real person, and it cost less than $20 to write a ditty. That's comparable to the cost in tokens for an AI except you could support a real human artist instead of megalomaniac Yarvinists Sam Altman and friends.
And the song would have real meaning. You gave your friend a non-gift. The Let Me Google That For You of gifts. Honestly if one of my friends did that I'd wonder if they even like me.
I literally made a Mountain Goats song about them playing a fantasy video game together with their daughter as we all sat on the couch. This did not rob any artist of any amount of money they ever would have seen. It was a novel moment accentuated and joyful for humans at zero cost to anyone else. The creative world is not zero-sum like you're presenting it to be
So even if you just pay someone else to make you a song, it's not really any more expensive than this. Same with painting. What does this AI bring to the table, at all? It grosses me out.
People on this site should go pick up a guitar and write a 3-chord song about someone; it'll take you a day, if that. It's not hard! It's fun!
When this critical number is not amassed, the genre effectively dies.
With A.I. we can resurrect dead genres, but not only that, we can combine genres: popular genres with one another, popular with unpopular genres, or popular with dead genres.
Using A.I. for music is easier and much faster than traditional means, and this could greatly reduce the critical mass of musicians needed to support a genre. It could be reduced by as much as 10 times, or 100 times, like one person creating 10000 songs or something similar.
By trying to compare A.I. music to traditional music, you are comparing the 10 songs a real band makes with the 10000 songs an A.I. (human) musician makes. It's an apples-and-oranges comparison.
I don't see why human music cannot be a genre, the best of all genres but just one, and an innumerable amount of A.I. genres which may not be so good, but they are infinite.
The real human music genre might be the best forever or just for the next 3 years, but so what? Let there be more genres, some good, some bad. No one is gonna listen to a cheap copy of an already existing song of an already existing genre, but songs already in existence should be used to train A.I. weights.
Regarding A.I. weights, smaller models forget much of the information they are trained on, and they are cheaper, faster, and easier to fine-tune, and probably also easier to apply RL reasoning to. In that way, A.I. musicians (or real musicians) could run the model on their computers and use it as an instrument instead of relying on companies with big, slow, and expensive models.
And sometimes big, inefficient models copy text/code/music verbatim from the training data. But this is a bug; when small models become competitive enough, most people are gonna use those. They might even carry them around, like a personal band always ready to make melodies for them.
> The problem with real music, is that it requires a hefty amount of musicians to establish a genre.
Why is establishing genre a goal in the first place?
> This amount could be somewhere in the range of 100 to 1000 musicians.
This is demonstrably false. Genre is defined by critical consensus, and it can arise around one or a handful of bands.
> With A.I. we can resurrect dead genres
What dead genre are you after? I’d imagine there are folk styles that haven’t been kept alive, but I question whether AI recreations would satisfy anyone. I’d rather listen to authentic recordings instead. And if the genre doesn’t have a significant recorded catalog, you can’t train a generative AI to produce it anyway.
I think what the OP is trying to articulate is that they are aware of more genres now? Maybe AI makes exploration of niche genres more exciting and participatory for them. They are finding new genres and "expanding them", but it's just because they were ignorant of them (or unengaged with the content of the song style) before they could participate in this way. I dunno, just trying to think what they might have experienced that would make them think some new universal was coming true *shrug*
What is culture, if not a common agreement on what is beautiful and ugly? Establishing a genre in music is not a goal, but we see it happen over and over again. It is how humans operate since forever, we mimic one another in fashion, in music and many other things.
> Genre is defined by critical consensus, and it can arise around one or a handful of bands.
It arises around a handful of bands, but if it doesn't grow past 5 bands let's say, we are talking about 50 songs in total every year. Who listens nowadays to only 50 songs per year?
> And if the genre doesn’t have a significant recorded catalog, you can’t train a generative AI to produce it anyway.
Yes you can. Synthetic data generation is a big thing already and tens of millions of dollars are poured into it every year.
I haven’t done the analysis, but consider someone who listens to pop radio: if one new song per week makes it into heavy rotation, that's about 50 new songs a year, which sounds like the right ballpark.
Personally I’d be ecstatic if there were 50 worthwhile new songs to listen to each year.
I understand synthetic data. I question whether anyone will accept the results.
What makes these tracks "slop"?
https://sonauto.ai/song/942c0122-9358-4805-96c4-eb0537d97ca2
https://sonauto.ai/song/ba87e490-19f0-47c5-acb0-83a3b57a90e9
https://sonauto.ai/song/2b217436-0876-49bc-8a9c-d5807e626962
Copyright does not cover ideas, generic patterns, timbres etc.
If you know music history, then you will know that classical composers borrowed heavily. Without that, we would still be at the Stone Age level.
This situation can easily spiral out of control in a way that you end up with an oligarchy in music, where AI captures most of the attention, since it will be backed by those with most means to shove it in your face.
So yes, this is piracy. Hopefully the law will catch onto the ethics.
It's def valid to ask about the value of projects like this, but I think "Please delete this project, as you are actively making the world worse." isn't the right way to start that discussion if that was your intent. I also detailed my thoughts about the whole industry a little further down so I'll avoid duplicating that.