PyTorch recently announced the release of ExecuTorch Alpha, a new tool for deploying large language models (LLMs) and other large machine-learning models on edge devices with limited resources. It bridges the gap between advanced AI capabilities and environments with limited computational resources.
Highlights:
- ExecuTorch Alpha is a new tool released by PyTorch for the deployment of LLMs and large ML models on edge devices.
- It leverages quantization and other techniques to pack LLMs for efficient execution on edge devices.
- It supports running models like Meta’s Llama 2 7B and Llama 3 8B on smartphones and wearables.
Why do we need ExecuTorch Alpha?
Existing approaches to running large language models require computers with substantial compute and memory. This has limited their use on edge devices such as smartphones and other mobile devices.
With this new tool, PyTorch aims to optimize model execution on edge devices while maintaining performance and efficiency.
Built on the PyTorch framework, ExecuTorch Alpha offers a complete workflow for deploying models on edge devices. To bring LLMs to edge devices, it heavily leverages quantization and other techniques to pack these models appropriately.
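A back-of-the-envelope estimate of weight storage shows why quantization matters so much here. The sketch below assumes roughly 7 billion parameters (Llama 2 7B) and, for the 4-bit case, one 16-bit scale per group of 32 weights. That group size is an illustrative assumption, not a documented ExecuTorch setting. The estimate ignores activations, the KV cache and runtime overhead.

```python
# Rough weight-only memory estimate for a model at different precisions.
# Assumptions (illustrative): ~7 billion parameters, one fp16 scale per group
# of 32 weights in the 4-bit case; activations, KV cache and runtime overhead
# are ignored.

def weight_bytes(num_params: int, bits_per_weight: float) -> float:
    """Bytes needed to store num_params weights at the given bit width."""
    return num_params * bits_per_weight / 8


def quantized_bytes(num_params: int, bits: int, group_size: int, scale_bits: int = 16) -> float:
    """Bytes for group-wise quantized weights plus one scale per group."""
    num_groups = -(-num_params // group_size)  # ceiling division
    return weight_bytes(num_params, bits) + num_groups * scale_bits / 8


GIB = 1024 ** 3
params = 7_000_000_000

fp32 = weight_bytes(params, 32)
fp16 = weight_bytes(params, 16)
int4 = quantized_bytes(params, bits=4, group_size=32)

for name, size in [("fp32", fp32), ("fp16", fp16), ("int4 (g=32)", int4)]:
    print(f"{name:>12}: {size / GIB:6.2f} GiB")
```

Under these assumptions, the weights shrink from about 13 GiB in fp16 to under 4 GiB at 4 bits. That difference is what makes a 7B model plausible within a phone's memory budget.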
The Alpha release focuses on deploying large language models and other large ML models to edge devices, stabilizing the API surface, and improving the installation process.
ExecuTorch Alpha supports 4-bit post-training quantization using GPTQ. On CPU, PyTorch has expanded device support by landing dynamic shape support and new dtypes in XNNPACK. The team has also made significant improvements to export and lowering, reduced memory overhead, and improved runtime performance, all of which make better use of limited device resources.
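GPTQ quantizes weights column by column. After each step, it updates the not-yet-quantized weights to compensate for rounding error, using second-order information from calibration data. To show what 4-bit weight storage looks like, the sketch below uses the simpler round-to-nearest (RTN) method rather than GPTQ. Each group of weights shares one floating-point scale, values map to signed integers in [-8, 7], and two values are packed into each byte. The function names and the group size are illustrative assumptions, not ExecuTorch APIs.

```python
# Minimal group-wise symmetric 4-bit round-to-nearest (RTN) quantization.
# NOTE: this is NOT GPTQ. GPTQ additionally updates not-yet-quantized weights
# to compensate for rounding error; this sketch only shows the storage format.
from typing import List, Tuple

QMIN, QMAX = -8, 7  # range of a signed 4-bit integer


def quantize_group(weights: List[float]) -> Tuple[List[int], float]:
    """Map a group of floats to 4-bit integers that share one scale."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / QMAX if max_abs > 0 else 1.0
    q = [max(QMIN, min(QMAX, round(w / scale))) for w in weights]
    return q, scale


def dequantize_group(q: List[int], scale: float) -> List[float]:
    """Approximate the original floats from the integers and the group scale."""
    return [v * scale for v in q]


def pack_int4(q: List[int]) -> bytes:
    """Pack pairs of signed 4-bit values into single bytes (low nibble first)."""
    if len(q) % 2:
        q = q + [0]  # pad to an even count
    out = bytearray()
    for lo, hi in zip(q[0::2], q[1::2]):
        out.append((lo & 0xF) | ((hi & 0xF) << 4))
    return bytes(out)


def unpack_int4(data: bytes, n: int) -> List[int]:
    """Inverse of pack_int4: recover the first n signed 4-bit values."""
    vals: List[int] = []
    for byte in data:
        for nibble in (byte & 0xF, byte >> 4):
            vals.append(nibble - 16 if nibble > 7 else nibble)
    return vals[:n]


def quantize(weights: List[float], group_size: int = 4) -> List[Tuple[List[int], float]]:
    """Split weights into groups and quantize each group independently."""
    return [quantize_group(weights[i:i + group_size])
            for i in range(0, len(weights), group_size)]


w = [0.12, -0.5, 0.33, 0.07, 1.4, -0.9, 0.0, 0.25]
for q, s in quantize(w, group_size=4):
    packed = pack_int4(q)  # 4 values -> 2 bytes, versus 16 bytes as fp32
    print(q, f"scale={s:.4f}", f"{len(packed)} bytes", dequantize_group(q, s))
```

Each group of four weights occupies two bytes plus its scale, and every dequantized value lies within half a scale step of the original. GPTQ produces weights in a similarly compact form but chooses the integer values more carefully to reduce the error in the model's outputs.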
By prioritizing portability and efficient memory management, ExecuTorch makes it possible to run small, efficient model runtimes on a wide range of resource-constrained edge devices. This connects powerful AI models, including advanced LLMs, with environments that have limited computational resources.
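One technique that supports this kind of memory efficiency is ahead-of-time memory planning. Once a model graph has been exported, the lifetime of every intermediate tensor is known before the model runs. Each tensor can therefore be given a fixed offset in one shared arena, and tensors whose lifetimes never overlap can reuse the same bytes. The sketch below is a simplified greedy planner with hypothetical tensor names and sizes. It illustrates the idea only and is not ExecuTorch's own memory-planning code.

```python
# Simplified greedy ahead-of-time memory planner (illustrative only).
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class TensorSpec:
    name: str
    size: int       # bytes
    first_use: int  # index of the op that produces the tensor
    last_use: int   # index of the last op that reads it (inclusive)


def plan_memory(tensors: List[TensorSpec]) -> Dict[str, int]:
    """Assign each tensor a byte offset in one shared arena.

    Tensors are placed largest-first; two tensors may occupy the same bytes
    only if their lifetimes [first_use, last_use] do not intersect.
    """
    offsets: Dict[str, int] = {}
    placed: List[TensorSpec] = []
    for t in sorted(tensors, key=lambda t: t.size, reverse=True):
        # Memory ranges of already-placed tensors that are alive at the same time.
        busy = sorted(
            (offsets[p.name], offsets[p.name] + p.size)
            for p in placed
            if not (p.last_use < t.first_use or t.last_use < p.first_use)
        )
        offset = 0
        for start, end in busy:
            if offset + t.size <= start:
                break  # the tensor fits in the gap before this busy range
            offset = max(offset, end)
        offsets[t.name] = offset
        placed.append(t)
    return offsets


def arena_size(tensors: List[TensorSpec], offsets: Dict[str, int]) -> int:
    """Total arena bytes needed for the planned offsets."""
    return max(offsets[t.name] + t.size for t in tensors)


# Hypothetical tensors from a four-op graph.
specs = [
    TensorSpec("a", 400, 0, 1),
    TensorSpec("b", 400, 1, 2),
    TensorSpec("c", 400, 2, 3),
    TensorSpec("d", 100, 0, 3),
]
offsets = plan_memory(specs)
print(offsets)
print("arena:", arena_size(specs, offsets), "bytes; separate buffers:", sum(t.size for t in specs))
```

For these hypothetical tensors, the planner needs a 900-byte arena instead of the 1,300 bytes that separate allocations would use, because `c` reuses the space of `a` once `a` is no longer needed.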
Support for various models
ExecuTorch Alpha enables running Meta’s Llama 2 7B efficiently on iPhone 15 Pro, iPhone 15 Pro Max, Samsung Galaxy S22, S23, and S24 phones, and other edge devices. It also provides early support for Meta’s latest model, Llama 3 8B.
In addition to other improvements, this release enables running Meta Llama 2 7B efficiently on devices like the iPhone 15 Pro, Samsung Galaxy S24 and other edge devices — it also includes early support for Llama 3 8B.
More details on ExecuTorch Alpha ⬇️ https://t.co/aVkecCkQeQ
Source: AI at Meta (@AIatMeta), April 30, 2024
PyTorch built ExecuTorch Alpha in close collaboration with its partners at Apple, Arm, Qualcomm Technologies, Google, and MediaTek.
👇 Get started with ExecuTorch Alpha for optimal performance of LLMs on the CPU, alongside delegation to GPU and NPU.
With LLMs already running on our efficient CPUs, our close partnership with @PyTorch is making this easier on @Meta’s Llama 2, 3 and other broad models. https://t.co/HuwGHoJR8t
Source: Arm (@Arm), April 30, 2024
PyTorch has also significantly expanded the list of supported models across NLP, vision, and speech. Although support for on-device LLMs is still early, most traditional models are expected to work out of the box, with delegation to XNNPACK, Core ML, MPS, TOSA, and HTP for performance.
ExecuTorch is already used in production. Meta uses it for hand tracking on Meta Quest 3 and for various models on Ray-Ban Meta Smart Glasses, and has begun integrating ExecuTorch into Instagram, WhatsApp, and other Meta products.
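Delegation means handing the parts of a model that a backend can accelerate (for example, XNNPACK on CPU or Core ML on Apple devices) to that backend, while everything else runs on default kernels. The sketch below shows the basic partitioning idea on a simple linear chain of operators: consecutive supported ops are grouped into one delegated segment. The op names, the support set and the `partition` helper are hypothetical. Real partitioners work on the exported graph and are considerably more involved.

```python
# Illustrative partitioning of a linear chain of ops between a backend
# delegate and fallback kernels. Op names and the support set are hypothetical.
from itertools import groupby
from typing import List, Set, Tuple


def partition(ops: List[str], supported: Set[str], backend: str) -> List[Tuple[str, List[str]]]:
    """Group consecutive ops into (target, ops) segments.

    Runs of ops the backend supports become one delegated segment; all other
    ops fall back to default CPU kernels, labeled "portable" here as a placeholder.
    """
    segments: List[Tuple[str, List[str]]] = []
    for is_supported, run in groupby(ops, key=lambda op: op in supported):
        segments.append((backend if is_supported else "portable", list(run)))
    return segments


ops = ["linear", "relu", "linear", "custom_topk", "softmax"]
for target, seg in partition(ops, {"linear", "relu", "softmax"}, "XNNPACK"):
    print(f"{target:>9}: {seg}")
```

Grouping consecutive supported ops matters because each switch between the delegate and fallback kernels can add overhead, so fewer and larger delegated segments are generally preferable.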
With ExecuTorch Alpha, PyTorch also aims to provide a powerful software development kit (SDK) that supports the entire workflow, from model authoring to deployment. The SDK lets developers debug a model much as they would debug a Python program, which helps them analyze model performance and identify bottlenecks.
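The article does not describe the SDK's API, so the sketch below only illustrates the general idea behind operator-level profiling: time each step of a model's execution, average over several runs, and rank the steps to see where time is spent. The pipeline, the step names and the `profile_ops` helper are hypothetical and are not part of the ExecuTorch SDK.

```python
# Illustrative per-step profiler: average each step's runtime over several
# runs and rank steps from slowest to fastest. Not the ExecuTorch SDK API.
import time
from typing import Callable, Dict, List, Tuple

Step = Tuple[str, Callable[[float], float]]


def profile_ops(steps: List[Step], x: float, runs: int = 5) -> List[Tuple[str, float]]:
    """Return (step name, mean seconds) pairs, slowest first.

    Steps are assumed to have unique names; timings for duplicate names
    would be summed together.
    """
    totals: Dict[str, float] = {name: 0.0 for name, _ in steps}
    for _ in range(runs):
        value = x
        for name, fn in steps:
            start = time.perf_counter()
            value = fn(value)
            totals[name] += time.perf_counter() - start
    means = [(name, total / runs) for name, total in totals.items()]
    return sorted(means, key=lambda kv: kv[1], reverse=True)


def slow_reduce(v: float) -> float:
    # Deliberately expensive step standing in for a heavy operator.
    return v + sum(i * 1e-9 for i in range(200_000)) * 0.0


pipeline: List[Step] = [
    ("double", lambda v: v * 2),
    ("slow_reduce", slow_reduce),
    ("add_bias", lambda v: v + 0.5),
]

for name, seconds in profile_ops(pipeline, 1.0, runs=3):
    print(f"{name:>12}: {seconds * 1e3:.3f} ms")
```

Averaging over several runs smooths out timer noise. Ranking the steps points directly at the operator worth optimizing or delegating first.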
Conclusion
PyTorch’s release of ExecuTorch Alpha offers an innovative solution for deploying LLMs and large machine-learning models on resource-constrained edge devices. NVIDIA is also working in a similar space with its NIM platform for deploying LLMs.