Meta recently introduced an upgraded version of Meta AI, powered by the company’s new advanced LLM, Meta Llama 3. It has been integrated into WhatsApp, Instagram, Facebook, and Messenger, and Meta claims that it outperforms competing open-source models on key benchmarks.
Highlights:
- Meta introduced an upgraded version of Meta AI powered by Llama 3, the company’s latest open-source advanced LLM.
- It has been integrated into the search features of Meta’s social media apps WhatsApp, Instagram, Facebook, and Messenger.
- It beats open-source competitors such as Mistral 7B and Google’s Gemma 7B on key benchmarks.
What is Llama 3?
Llama 3 is the successor to Meta’s previous language models, Llama and Llama 2, both released in 2023. It offers improved performance, enhanced capabilities, and a more extensive knowledge base.
Meta says that Llama 3 is among the best open models currently available, offering users a powerful tool for generating text, creating AI images, and assisting with various tasks.
Introducing Meta Llama 3: the most capable openly available LLM to date. Today we’re releasing 8B & 70B models that deliver on new capabilities such as improved reasoning and set a new state-of-the-art for models of their sizes. Today's release includes the first two Llama 3…
— AI at Meta (@AIatMeta) April 18, 2024
Meta’s CEO said the following about the launch:
“We’re upgrading Meta AI with our new state-of-the-art Llama 3 AI model, which we’re open sourcing. With this new model, we believe Meta AI is now the most intelligent AI assistant that you can freely use.”
Mark Zuckerberg
The Llama 3 family includes pretrained and instruction-fine-tuned language models with 8 billion and 70 billion parameters that support a wide range of use cases.
Meta described the new models, Llama 3 8B and Llama 3 70B, as a significant performance advancement over the previous generation of Llama 2 models.
Model Architecture
For Llama 3, Meta opted for a relatively standard decoder-only transformer architecture with several key improvements made compared to Llama 2. It utilizes a tokenizer with a vocabulary of 128,000 tokens that encodes language much more efficiently, leading to substantially improved model performance.
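The efficiency gain from a larger vocabulary can be illustrated with a toy greedy longest-match tokenizer (the vocabularies and the matching scheme below are hypothetical simplifications for illustration, not Llama 3’s actual BPE tokenizer):

```python
def tokenize(text, vocab):
    """Greedy longest-match tokenization: repeatedly take the longest
    vocabulary entry that prefixes the remaining text, falling back to
    single characters when nothing matches."""
    tokens, i = [], 0
    while i < len(text):
        for end in range(len(text), i, -1):
            piece = text[i:end]
            if piece in vocab or end == i + 1:
                tokens.append(piece)
                i = end
                break
    return tokens

# A bigger vocabulary holds longer pieces, so it covers the same text
# with fewer tokens -- the effect behind Llama 3's 128K-token vocabulary.
small_vocab = {"me", "ta", "ll", "a", "mo", "del"}
large_vocab = small_vocab | {"meta", "llama", "model"}

text = "metallamamodel"
print(tokenize(text, small_vocab))  # 8 tokens
print(tokenize(text, large_vocab))  # ['meta', 'llama', 'model']
```

Fewer tokens per text means more content fits into the same context window and each forward pass covers more language, which is why Meta credits the tokenizer for part of the performance gain.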
To enhance the inference efficiency of the new open-source models, grouped-query attention (GQA) was adopted across both the 8 billion and 70 billion parameter models.
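The idea behind GQA is that several query heads share a single key/value head, shrinking the key/value cache at inference time. A minimal numpy sketch of the attention math (shapes only; the real implementation adds causal masking, rotary embeddings, and learned projections):

```python
import numpy as np

def grouped_query_attention(q, k, v, n_kv_heads):
    """Scaled dot-product attention where groups of query heads share
    one key/value head (a sketch of GQA, not Meta's implementation)."""
    n_heads, seq_len, head_dim = q.shape
    group = n_heads // n_kv_heads          # query heads per KV head
    # Repeat each KV head so it lines up with its group of query heads.
    k = np.repeat(k, group, axis=0)        # (n_heads, seq, head_dim)
    v = np.repeat(v, group, axis=0)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(head_dim)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ v                     # (n_heads, seq, head_dim)

# 8 query heads sharing 2 KV heads (head counts here are illustrative).
rng = np.random.default_rng(0)
q = rng.standard_normal((8, 4, 16))
k = rng.standard_normal((2, 4, 16))
v = rng.standard_normal((2, 4, 16))
out = grouped_query_attention(q, k, v, n_kv_heads=2)
print(out.shape)  # (8, 4, 16)
```

Because only `n_kv_heads` key/value tensors are cached instead of one per query head, the memory footprint during generation drops roughly in proportion to the group size.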
The models were trained on sequences of 8,192 tokens, using a mask to ensure self-attention does not cross document boundaries. This prevents the model from attending to tokens across different documents during training.
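A sketch of such a mask, assuming each token in the packed training sequence is labeled with the index of the document it came from (True means "this query token may attend to this key token"):

```python
import numpy as np

def document_mask(doc_ids):
    """Causal attention mask that also blocks attention across document
    boundaries: a position may attend only to earlier positions within
    the same document. `doc_ids` is one packed sequence's per-token
    document index (an illustrative sketch of the masking described)."""
    doc_ids = np.asarray(doc_ids)
    same_doc = doc_ids[:, None] == doc_ids[None, :]
    causal = np.tril(np.ones((len(doc_ids), len(doc_ids)), dtype=bool))
    return same_doc & causal

# Two documents packed into one 6-token sequence: the result is a
# block-diagonal, lower-triangular pattern.
mask = document_mask([0, 0, 0, 1, 1, 1])
print(mask.astype(int))
```

The mask is block-diagonal along documents and lower-triangular within each block, so packing many short documents into one 8,192-token sequence does not let them leak information into each other.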
Training Data
Meta invested heavily in pretraining data for Llama 3, which was pretrained on over 15 trillion tokens collected from publicly available sources. This training dataset is seven times larger than the one used for Llama 2 and includes four times more code data.
To ensure the model was trained on the highest quality data, Meta developed a series of data-filtering pipelines. These included using heuristic filters, NSFW filters, semantic deduplication approaches, and text classifiers to predict data quality.
They found that previous generations of Llama were surprisingly effective at identifying high-quality data, hence Llama 2 was used to generate the training data for the text-quality classifiers powering Llama 3.
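The pipeline described above might be sketched as follows. All function names, thresholds, and heuristics here are illustrative stand-ins: the exact-hash step stands in for semantic deduplication, and `quality_score` stands in for the Llama 2-trained text-quality classifiers.

```python
import hashlib

def heuristic_filter(doc):
    """Reject very short or mostly non-alphabetic documents
    (thresholds are made up for illustration)."""
    text = doc["text"]
    alpha = sum(c.isalpha() for c in text)
    return len(text) >= 20 and alpha / len(text) > 0.5

def quality_score(doc):
    """Toy stand-in for a learned text-quality classifier:
    scores lexical diversity, capped at 1.0."""
    return min(len(set(doc["text"].split())) / 10.0, 1.0)

def dedup_key(doc):
    # Exact content hash as a stand-in for semantic deduplication.
    return hashlib.sha1(doc["text"].lower().encode()).hexdigest()

def filter_corpus(docs, min_quality=0.3):
    """Run documents through heuristic filtering, quality scoring,
    and deduplication, keeping only the survivors."""
    seen, kept = set(), []
    for doc in docs:
        if not heuristic_filter(doc):
            continue
        if quality_score(doc) < min_quality:
            continue
        key = dedup_key(doc)
        if key in seen:
            continue
        seen.add(key)
        kept.append(doc)
    return kept
```

The real pipeline also includes NSFW filtering and operates at trillion-token scale, but the shape is the same: cheap heuristics first, learned classifiers next, deduplication last.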
Platform Integration
One of the most notable applications of Llama 3 is its integration with WhatsApp. This allows users to generate AI images and text, request the latest news, and compose birthday messages and greetings directly within the messaging app. This feature is being rolled out to users gradually, making AI-powered creative tools more accessible to a broader audience.
With Llama 3, WhatsApp users can easily generate images and text from their prompts, opening up new possibilities for creative expression and communication.
Llama 3 is also available on Meta’s other social media apps: Instagram, Facebook, and Messenger. The upgraded Meta AI is designed to help users with their queries across Meta’s apps and smart glasses. For easier access, Meta has integrated its AI assistant into the search features of these apps and has also launched an official website for it.
The new models will soon be available on AWS, Databricks, Google Cloud, Hugging Face, Kaggle, IBM watsonx, Microsoft Azure, NVIDIA NIM, and Snowflake, with support from hardware platforms offered by AMD, AWS, Dell, Intel, NVIDIA, and Qualcomm.
To further support the adoption and deployment of Llama, Meta has partnered with Microsoft to make the model available on the Azure cloud computing platform. This collaboration enables developers and businesses to easily integrate Llama 3 into their applications and services, leveraging the scalability and reliability of Azure.
The availability of Llama 3 on Azure is expected to accelerate the development of AI-powered solutions across various industries.
Results
Meta’s new language models, Llama 3 8B and Llama 3 70B, demonstrate impressive performance across multiple benchmarks compared to other open-source and industry models.
The smaller 8B parameter version outperforms models like Mistral 7B and Google’s Gemma 7B on at least 9 benchmarks covering areas such as reasoning, math, coding, and general knowledge.
These include MMLU, ARC, DROP, GPQA (a set of biology-, physics- and chemistry-related questions), HumanEval (a code generation test), GSM-8K (math word problems), MATH (another mathematics benchmark), AGIEval (a problem-solving test set) and BIG-Bench Hard (a commonsense reasoning evaluation).
While some of these competitor models are not the latest versions, Llama 3 8B still scores a few percentage points higher on several benchmarks.
More notably, the larger 70B parameter Llama 3 model is competitive with flagship industry models like Google’s Gemini 1.5 Pro. It outperforms Gemini 1.5 Pro on benchmarks like MMLU, HumanEval, and GSM-8K math word problems.
Additionally, while it does not rival Anthropic’s highest-performing Claude 3 Opus model, Llama 3 70B scores better than the mid-tier Claude 3 Sonnet model on 5 benchmarks.
To optimize Llama 3 for practical performance beyond benchmarks, Meta developed a new evaluation set of 1,800 prompts spanning 12 real-world use cases and restricted access to it during training to prevent overfitting. Human evaluations on this set show how Llama 3 compares to models like Claude, Mistral, and GPT-3.5 across these scenarios.
The chart below shows the aggregated results of human evaluations across the discussed categories and prompts against Claude Sonnet, Mistral Medium, GPT-3.5, and Meta Llama 2.
Overall, Meta claims their new Llama 3 models, especially the 70B version, demonstrate state-of-the-art performance that is competitive with or superior to other leading open-source and commercial language models across a wide range of capabilities and benchmarks.
Conclusion
Meta’s decision to make Llama 3 an open model is a significant step towards democratizing AI technology. By allowing researchers, developers, and businesses to access and build upon the open-source model, Meta aims to foster innovation and collaboration within the AI community.