{"id":3312,"date":"2024-04-05T17:42:28","date_gmt":"2024-04-05T17:42:28","guid":{"rendered":"https:\/\/favtutor.com\/articles\/?p=3312"},"modified":"2024-04-22T10:37:22","modified_gmt":"2024-04-22T10:37:22","slug":"stable-audio-2","status":"publish","type":"post","link":"https:\/\/favtutor.com\/articles\/stable-audio-2\/","title":{"rendered":"Stable Audio 2 Now Crafts 3 Minutes of AI-Generated Music"},"content":{"rendered":"\n<p class=\"wp-block-paragraph\">Stability AI has made a major upgrade to its text-to-audio AI model with Stable Audio 2. Here&#8217;s what&#8217;s new!<\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><strong>Highlights:<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Stable Audio 2 can now generate tracks up to 3 minutes long at 44.1 kHz stereo from a single prompt.<\/li>\n\n\n\n<li>The model also has an audio-to-audio generation feature that lets users modify samples using text prompts.<\/li>\n\n\n\n<li>The new model is available on the Stable Audio website for free.<\/li>\n<\/ul>\n\n\n\n<h2 class=\"wp-block-heading\"><strong>What&#8217;s New in Stable Audio 2?<\/strong><\/h2>\n\n\n\n<p class=\"wp-block-paragraph\"><strong>Stable Audio 2 by Stability AI is a text-to-audio AI tool that can generate music tracks up to 3 minutes long. 
It generates high-quality, full tracks with a coherent musical structure at 44.1 kHz stereo from a single natural-language prompt.<\/strong><\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Below is an example shared by the company, a 3-minute soundtrack generated from the prompt <em>&#8220;Cinematic Synthwave&#8221;<\/em>:<\/p>\n\n\n\n<figure class=\"wp-block-embed is-type-video is-provider-youtube wp-block-embed-youtube wp-embed-aspect-16-9 wp-has-aspect-ratio\"><div class=\"wp-block-embed__wrapper\">\n<div class=\"jeg_video_container jeg_video_content\"><iframe title=\"Stable Audio 2.0\" width=\"500\" height=\"281\" src=\"https:\/\/www.youtube.com\/embed\/2tob9emMhJw?feature=oembed\" frameborder=\"0\" allow=\"accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share\" referrerpolicy=\"strict-origin-when-cross-origin\" allowfullscreen><\/iframe><\/div>\n<\/div><\/figure>\n\n\n\n<p class=\"wp-block-paragraph\">Another major feature in this upgrade is audio-to-audio generation. Users can now upload their own audio samples and transform them with text prompts, giving them more flexibility and control over the creative process. 
<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Here is a demo in which a user uploads a sample and, with a single text prompt, adds drum or guitar effects to it:<\/p>\n\n\n\n<figure class=\"wp-block-embed is-type-video is-provider-youtube wp-block-embed-youtube wp-embed-aspect-16-9 wp-has-aspect-ratio\"><div class=\"wp-block-embed__wrapper\">\n<div class=\"jeg_video_container jeg_video_content\"><iframe title=\"Audio-to-Audio Feature Demo\" width=\"500\" height=\"281\" src=\"https:\/\/www.youtube.com\/embed\/1JKlwgsCwEg?feature=oembed\" frameborder=\"0\" allow=\"accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share\" referrerpolicy=\"strict-origin-when-cross-origin\" allowfullscreen><\/iframe><\/div>\n<\/div><\/figure>\n\n\n\n<p class=\"wp-block-paragraph\">The improved model aims to deliver clear musical structure and high-quality sound. It compresses complex audio waveforms into shorter, more manageable representations and then reconstructs them into music that tries to capture the essence of human compositions. The goal is for the AI to grasp the nuances of music well enough to replicate its patterns and sequences.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Stable Audio 2 was trained exclusively on a licensed dataset from the AudioSparx music library, honouring opt-out requests and ensuring fair compensation for creators. 
The 1.0 model was also trained on data from AudioSparx, which has over 800,000 audio files containing music, sound effects, and single-instrument stems.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>How Does Stable Audio 2 Work?<\/strong><\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">At its core, Stable Audio 2 uses a diffusion transformer (DiT), following the same approach as Stability AI&#8217;s upcoming Stable Diffusion 3 image generator and marking a shift from the U-Net architecture it used previously.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">DiT and U-Net are both common architectures used in machine learning, but DiT is designed to refine random noise into structured data incrementally, making it particularly effective at handling long data sequences. U-Net, by contrast, focuses on accuracy for short generations but is less capable of handling longer, more complex sequences.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>How to Use Stable Audio 2?<\/strong><\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">Stable Audio 2 is available for free on the <a href=\"https:\/\/stableaudio.com\/\" target=\"_blank\" rel=\"noreferrer noopener nofollow\">Stable Audio website<\/a>. It will soon be available through the Stable Audio API.<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/lh7-us.googleusercontent.com\/1TmxqPgoWaj3R2_M9-RArj_zbc-lZGCNBBqVl-bK99ej6qd239DPWUgkFD27lEIxRjQvTdUM9uDOkLXaMLXbtYQSLWfzBjda4m_p-xwmnWAwM0z7dHmK80oiAI5AEdXDuya6Xs4TVeZOsVo3Y7OBlps\" alt=\"Stable Audio website\"\/><\/figure>\n\n\n\n<p class=\"wp-block-paragraph\">Stable Audio can be accessed through this web interface. Just sign up and prompt it! The site has a prompt library, which is a good starting point. Each free account receives 20 free credits a month for AI music generation. 
Note that Stable Audio 1 requires 1 credit and Stable Audio 2 requires 2 credits.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>Comparison with Stable Audio 1<\/strong><\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">We tested both versions of Stable Audio with the same text prompt: \u201c<em>Create a chill, melodic downtempo instrumental with warm piano, mellow electric guitar, subtle bassline and light percussion textures like shakers and cymbals. The vibe should be introspective and dreamy.<\/em>\u201d<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">This was the output from version 1:<\/p>\n\n\n\n<div align=\"center\"><blockquote class=\"twitter-tweet\" data-conversation=\"none\"><p lang=\"en\" dir=\"ltr\">This is a sample generated with the following prompt using stable audio 1.0<br>&quot;Create a chill, melodic downtempo instrumental with warm piano, mellow electric guitar, subtle bassline and light percussion textures like shakers and cymbals. The vibe should be introspective and\u2026 <a href=\"https:\/\/t.co\/LK86ikjc0n\" target=\"_blank\">pic.twitter.com\/LK86ikjc0n<\/a><\/p>&mdash; Kaustubh Saini (@kaustubh_saini) <a href=\"https:\/\/twitter.com\/kaustubh_saini\/status\/1775543770584588389?ref_src=twsrc%5Etfw\" target=\"_blank\" rel=\"noopener\">April 3, 2024<\/a><\/blockquote> <script async src=\"https:\/\/platform.twitter.com\/widgets.js\" charset=\"utf-8\"><\/script><\/div>\n\n\n\n<p class=\"wp-block-paragraph\">This was the output from version 2:<\/p>\n\n\n\n<div align=\"center\"><blockquote class=\"twitter-tweet\" data-conversation=\"none\"><p lang=\"en\" dir=\"ltr\">using the same prompt on stable audio 2.0, I found the generated audio to be much more detailed and consistent with the prompt. 
<a href=\"https:\/\/t.co\/oOGtt7FyMO\" target=\"_blank\">pic.twitter.com\/oOGtt7FyMO<\/a><\/p>&mdash; Kaustubh Saini (@kaustubh_saini) <a href=\"https:\/\/twitter.com\/kaustubh_saini\/status\/1775544937691701507?ref_src=twsrc%5Etfw\" target=\"_blank\" rel=\"noopener\">April 3, 2024<\/a><\/blockquote> <script async src=\"https:\/\/platform.twitter.com\/widgets.js\" charset=\"utf-8\"><\/script><\/div>\n\n\n\n<p class=\"wp-block-paragraph\">Version 2 produced a much more cohesive and detailed track, consistently incorporating all the elements specified in the prompt.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Stability AI is becoming a big player in the AI space with tools like <a href=\"https:\/\/favtutor.com\/articles\/stable-video-3d\/\">Stable Video 3D<\/a>, <a href=\"https:\/\/favtutor.com\/articles\/stable-diffusion-3-new\/\">Stable Diffusion 3<\/a>, and <a href=\"https:\/\/favtutor.com\/articles\/stable-code-instruct-3b\/\">Stable Code Instruct 3B<\/a>.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><strong>Conclusion<\/strong><\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">Stable Audio 2 massively improves on the capabilities of its earlier version and gives tough competition to engines like Suno. 
The fact that a user can whistle a simple tune and with the help of prompts turn it into a detailed track is its trump card and the reason it\u2019s better than most audio engines out there.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>The latest text-to-audio AI model Stable Audio 2 by Stability AI can now make 3 minutes of music, along with audio-to-audio generation.<\/p>\n","protected":false},"author":20,"featured_media":3320,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"jnews-multi-image_gallery":[],"jnews_single_post":null,"jnews_primary_category":{"id":"","hide":""},"footnotes":""},"categories":[57],"tags":[56,59,88,74,155,156],"class_list":["post-3312","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-ai","tag-ai","tag-generative-ai","tag-music","tag-stability-ai","tag-stable-audio","tag-stable-audio-2"],"_links":{"self":[{"href":"https:\/\/favtutor.com\/articles\/wp-json\/wp\/v2\/posts\/3312","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/favtutor.com\/articles\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/favtutor.com\/articles\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/favtutor.com\/articles\/wp-json\/wp\/v2\/users\/20"}],"replies":[{"embeddable":true,"href":"https:\/\/favtutor.com\/articles\/wp-json\/wp\/v2\/comments?post=3312"}],"version-history":[{"count":2,"href":"https:\/\/favtutor.com\/articles\/wp-json\/wp\/v2\/posts\/3312\/revisions"}],"predecessor-version":[{"id":3321,"href":"https:\/\/favtutor.com\/articles\/wp-json\/wp\/v2\/posts\/3312\/revisions\/3321"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/favtutor.com\/articles\/wp-json\/wp\/v2\/media\/3320"}],"wp:attachment":[{"href":"https:\/\/favtutor.com\/articles\/wp-json\/wp\/v2\/media?parent=3312"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/favtutor.com\/articles\/wp-json\/wp\/v2\/
categories?post=3312"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/favtutor.com\/articles\/wp-json\/wp\/v2\/tags?post=3312"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}