Stability AI has been on a roll leveling up its GenAI game, with Stable Diffusion 3, Stable Video 3D, Stable Audio 2, and now an AI audio generator, Stable Audio Open.
Highlights:
- Stability AI launches Stable Audio Open, its latest open-source audio generation model.
- It can generate musical snippets and audio effects of up to 47 seconds across diverse styles and genres.
- It was trained on 486492 audio files, all under Creative Commons licenses.
What is Stable Audio Open?
Stable Audio Open is Stability AI’s open-source text-to-audio model. It is an audio generator that can produce sound effects and samples lasting up to 47 seconds from text prompts.
The model has been trained to generate a wide range of audio samples and sound effects. A duration of 47 seconds is long for a generated sound effect, which makes this release a notable step for the audio side of generative AI.
Because of its specific training, Stable Audio Open is well suited to producing foley recordings, ambient noises, drum beats, instrument riffs, and other audio samples for use in sound design and music creation. The model can also create variations of audio samples and transfer the style of one sample to another.
We’re excited to announce Stable Audio Open, an open source model optimised for generating short audio samples, sound effects and production elements using text prompts.
This release marks a key milestone as we further open portions of our generative audio capabilities to… pic.twitter.com/KZlqJdTHiu
Stability AI (@StabilityAI), June 5, 2024
A major advantage of this open-source model is that users can fine-tune it on their own audio data. A drummer could, for instance, fine-tune it on samples of their own drum recordings to create new beats. A pianist could likewise generate notes and tunes in their preferred keys.
Take a listen to this audio snippet generated by Stable Audio Open; you won’t regret lending an ear to its pleasant melancholy.
Stable Audio Open by StabilityAI 🎶
> Generates stereo at 44.1KHz
> Max 47 sec generations
> T5 text embeddings
> Transformers-based diffusion model (DiT)
> Trained on Freesound (472618 hours) and Free Music Archive (FMA) (13874 hours)
> Takes roughly 30sec to generate 45 sec of… pic.twitter.com/L6YDNQQfRF
Vaibhav (VB) Srivastav (@reach_vb), June 5, 2024
Even casual users can generate sound effects to suit a particular time, situation, or event. That means a lot of freedom for creators of all kinds who need short audio and music clips!
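To put the specs quoted above in concrete terms, here is a rough back-of-the-envelope sketch of how much raw audio a maximum-length clip contains. The sample rate, channel count, and duration come from the tweet; the 16-bit PCM export format and the helper name `clip_stats` are assumptions made only for illustration, not something Stability AI specifies.

```python
SAMPLE_RATE_HZ = 44_100   # 44.1 kHz, as quoted above
CHANNELS = 2              # stereo
MAX_SECONDS = 47          # maximum generation length
BYTES_PER_SAMPLE = 2      # assumption: clip is saved as 16-bit PCM


def clip_stats(seconds=MAX_SECONDS):
    """Return (frames per channel, total samples, uncompressed size in bytes)."""
    frames = seconds * SAMPLE_RATE_HZ
    samples = frames * CHANNELS
    size_bytes = samples * BYTES_PER_SAMPLE
    return frames, samples, size_bytes


frames, samples, size_bytes = clip_stats()
print(f"{frames:,} frames per channel")
print(f"{samples:,} samples across both channels")
print(f"{size_bytes / 2**20:.1f} MiB as uncompressed 16-bit PCM")
```

Under these assumptions, a full 47-second stereo clip works out to about 2 million frames per channel and roughly 7.9 MiB on disk before any compression.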
Another key aspect of the model is the dataset used to train it. According to Stability AI, the collection contains 486492 audio recordings: 472618 from Freesound and 13874 from the Free Music Archive (FMA). All of the audio files are licensed under CC0, CC BY, or CC Sampling+.
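The dataset figures can be checked with a few lines of arithmetic. The sketch below uses only the recording counts reported in this article; the variable names are illustrative.

```python
# Recording counts per source, as reported by Stability AI
sources = {
    "Freesound": 472_618,
    "Free Music Archive": 13_874,
}

total = sum(sources.values())
shares = {name: count / total for name, count in sources.items()}

print(f"Total recordings: {total:,}")
for name, count in sources.items():
    print(f"{name}: {count:,} recordings ({shares[name]:.1%})")
```

The two sources add up to the stated 486492 recordings, with Freesound contributing about 97% of them. That skew is in line with the model's focus on sound effects and samples rather than full songs.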
This large dataset helps the model respond to a wide range of prompts. Stability AI used the data to train both the autoencoder and the diffusion transformer (DiT). Thanks to the licenses, Stability AI says it was able to train an open model while respecting the rights of the creators.
Is It Really Worth It, or Is It the Same as Stable Audio?
Stability’s previously launched commercial model, Stable Audio, produces complete, high-quality songs of up to 3 minutes in length with a coherent musical structure.
Stable Audio Open, by contrast, focuses on sound effects, production elements, and audio samples. Although it can produce brief musical snippets, it is not designed to handle vocals, melodies, or entire songs.
The terms of service for Stable Audio Open also forbid commercial use. Additionally, the model has biases: it performs differently on descriptions written in languages other than English, as well as across musical genres and cultural contexts. Stability AI attributes this to the training set.
Users who want to generate longer melodies and full songs should therefore stick with the commercial Stable Audio, since the open model is optimized for short audio snippets rather than longer pieces.
Conclusion
Although Stable Audio Open is not built for long-duration audio, it lays a foundation for generating short sound effects and samples on demand.