This news might be music to your ears! Stability AI, one of the leading players in generative AI audio models, is bringing these capabilities directly to smartphones. You can now create custom sound effects and audio samples wherever you are, with no internet connection required!
Stability AI’s Generative Audio on Your Phone
Imagine you are editing a video on your phone, maybe for TikTok or Instagram, and you want a unique sound effect or some cool background music, but you need something very specific. With Stability AI’s model, you just prompt it with “gentle ocean waves on sunset” and get your AI-generated audio within seconds.
All of this comes from a partnership between Stability AI and Arm. They have optimized Stable Audio Open to run efficiently on Arm’s mobile CPUs. Stable Audio Open can generate about 45 seconds of audio from a text prompt and is built for short-form audio such as drum beats or ambient sounds. Stability AI’s other commercial models are designed for complete songs, whereas this one targets shorter clips only.
AI-powered audio apps already exist, of course. Suno and Udio are two examples, but both depend on cloud processing, so offline audio generation is something to look out for. Stability AI also claims that Stable Audio Open was trained exclusively on royalty-free audio and songs.
Processing audio locally on the device also keeps personal information private and secure, addressing the privacy concerns usually associated with cloud-based processing.
Arm CPUs offer strong performance and power efficiency, which makes them well suited to running these AI workloads. The aim is to make the AI tools available to “builders” (creators and developers) everywhere. This is billed as the first offline text-to-audio generation for smartphones.
It was a long road to get here: in the first attempt, generating a short audio clip took about 4 minutes. With clever optimizations and Arm’s KleidiAI libraries, they brought that down to 8 seconds, which is about 30 times faster!
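The "30 times faster" figure follows directly from the two timings quoted above. As a quick sanity check, here is a minimal Python sketch; the function and variable names are made up for illustration and are not part of any Stability AI or Arm tooling.

```python
def speedup(before_seconds: float, after_seconds: float) -> float:
    """Return how many times faster the 'after' run is than the 'before' run."""
    if after_seconds <= 0:
        raise ValueError("after_seconds must be positive")
    return before_seconds / after_seconds


# Figures quoted in the article: about 4 minutes before, 8 seconds after.
baseline_seconds = 4 * 60
optimized_seconds = 8

factor = speedup(baseline_seconds, optimized_seconds)
print(f"{factor:.0f}x faster")  # prints "30x faster"
```

Since the 4-minute figure is itself approximate, the 30x result should be read as a rough number as well.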
The KleidiAI library provides performance-focused routines called ‘microkernels’ that are specialized for Arm CPUs. It is integrated into XNNPACK and ExecuTorch: XNNPACK is a library of neural network operators optimized for mobile devices, and ExecuTorch is a framework that streamlines model execution on mobile devices.
In addition, optimizations were made to take advantage of the characteristics of the CPU cores in the Armv9 architecture. Armv9 has an extended instruction set for machine learning workloads, which enables more efficient execution.
Such improvements highlight how hardware and software need to be integrated to advance AI on mobile devices. With 99% of smartphones globally powered by Arm technology, this is quite a breakthrough!
“As more and more professional creatives and businesses adopt generative AI to power their production pipeline, it’s important that our models and workflows are available everywhere for builders to build and creators to create. We are excited to partner with Arm for this exact reason. Arm’s prevalence across the ecosystem from the server to the smartphone and its work to accelerate AI models across all the popular frameworks by integrating Arm Kleidi into the software stack, made it a no brainer.”
– Stability CEO Prem Akkaraju
This opens up possibilities for users to create custom sound effects and audio samples directly on their smartphones.
Takeaways
This collaboration is a significant step towards making AI tools more accessible and user-friendly for everyday use, and towards wider adoption of AI across applications. The partnership may soon expand to images, videos, and 3D content as well. This shift enhances the user experience by providing faster, more secure, offline functionality.