Stability AI’s new open source text-to-audio generator was trained on free music libraries to “respect creator rights”

The model was trained using audio data from free music libraries Freesound and the Free Music Archive.


Stability AI, the company behind AI-powered image generator Stable Diffusion, has launched Stable Audio Open, an open source model for generating short audio samples, sound effects and production elements using text prompts.

The new model was trained on audio data from free music libraries Freesound and the Free Music Archive. “This allowed us to create an open audio model while respecting creator rights,” says Stability AI. The company adds that Stable Audio Open’s specialised training makes it ideal for creating drum beats, instrument riffs, ambient sounds, foley recordings and other audio samples for music production and sound design.

Users can generate up to 47 seconds of audio data by inputting text descriptions like “warm arpeggios on an analog synthesizer with a gradually rising filter cutoff and a reverb tail” and “rock beat played in a treated studio, session drumming on an acoustic kit”.

One key advantage of the open source release is that users can fine-tune the model on their own custom audio data. For example, a drummer could fine-tune on samples of their own drum recordings to generate new beats.

That said, while Stable Audio Open can generate short musical clips, it is not optimised for full songs, melodies or vocals, unlike the company’s flagship Stable Audio service. The latter can produce tracks with coherent musical structure up to three minutes in length, and offers advanced capabilities like audio-to-audio generation and coherent multi-part musical compositions.

According to Stability AI, the open source model “provides a glimpse into generative AI for sound design while prioritising responsible development alongside creative communities.”

The company’s latest focus on ‘responsible audio generation’ follows the high-profile exit last November of its VP of generative audio, Ed Newton-Rex, who quit over disagreements with the firm about what constitutes “fair use” of copyrighted works.

The former executive said he disagreed “with the company’s opinion that training generative AI models on copyrighted works” constitutes fair use. Newton-Rex also told the BBC that he thought it was “exploitative” for developers to use creative work without consent – a view with which, he said, many AI firms, including Stability AI, would disagree.


© 2024 MusicTech is part of NME Networks.