Open-source AI is an ever-evolving space, with new models being released rapidly. This month in particular has been busy, with the release of Llama 3, DBRX, and now Arctic. Arctic is the newest family of models, developed by the cloud-based data warehousing company Snowflake.
Highlights:
- Snowflake, a data warehousing platform, unveiled its enterprise-focused Arctic LLM.
- Arctic LLM has a context length of 4,000 tokens and matches Llama 3 8B and Llama 2 70B on enterprise metrics.
- While it is easily integrated with Snowflake’s Cortex platform, it fails to distinguish itself from other excellent open-source models.
Arctic LLM by Snowflake is here
Arctic is aimed at enterprise tasks such as SQL generation, code generation, and instruction following.
Fundamentally, Snowflake’s Arctic is similar to other recent open-source LLMs that use the Mixture of Experts (MoE) architecture, including DBRX, Grok-1, and Mixtral.
The MoE architecture builds a model from many smaller expert networks; a routing mechanism activates only a subset of these experts for each input, so the combined model can excel at different kinds of problems without paying the full computational cost of all its parameters. Arctic LLM is a combination of 128 such experts.
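To make the routing idea concrete, here is a minimal sketch of a top-k gated MoE layer in PyTorch. The dimensions, expert count, and expert design are illustrative toy values, not Arctic's actual implementation:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoELayer(nn.Module):
    """Toy Mixture-of-Experts layer: a router picks the top-k
    experts per token and blends their outputs by router weight."""

    def __init__(self, d_model=512, n_experts=8, top_k=2):
        super().__init__()
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, 4 * d_model),
                          nn.GELU(),
                          nn.Linear(4 * d_model, d_model))
            for _ in range(n_experts)
        )
        self.router = nn.Linear(d_model, n_experts)  # gating network
        self.top_k = top_k

    def forward(self, x):                        # x: (tokens, d_model)
        scores = F.softmax(self.router(x), dim=-1)
        weights, idx = scores.topk(self.top_k, dim=-1)
        weights = weights / weights.sum(dim=-1, keepdim=True)
        out = torch.zeros_like(x)
        # Only the selected experts run for each token.
        for slot in range(self.top_k):
            for e in range(len(self.experts)):
                mask = idx[:, slot] == e          # tokens routed to expert e
                if mask.any():
                    out[mask] += weights[mask, slot, None] * self.experts[e](x[mask])
        return out

x = torch.randn(16, 512)                          # 16 tokens
print(MoELayer()(x).shape)                        # torch.Size([16, 512])
```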
Here is how they describe the new flagship model in the official announcement:
“This is a watershed moment for Snowflake, with our AI research team innovating at the forefront of AI. By delivering industry-leading intelligence and efficiency in a truly open way to the AI community, we are furthering the frontiers of what open source AI can do. Our research with Arctic will significantly enhance our capability to deliver reliable, efficient AI to our customers.”
Sridhar Ramaswamy, CEO, Snowflake
Snowflake says it trained the new LLM in under three months on a budget of less than $2 million, roughly one-eighth the cost of similar models.
Architecture of Arctic LLM
Arctic LLM uses a Dense Mixture-of-Experts (Dense-MoE) hybrid transformer architecture. In this design, model quality depends primarily on the number of experts, the total parameter count of the MoE model, and the number of ways in which these experts can be combined.
It leverages a large-scale architecture with numerous components (experts) to enable high-level intelligence.
Simultaneously, it intelligently selects and activates only the most relevant experts and a manageable number of parameters during training and inference, thereby reducing computational demands and improving efficiency without compromising the model’s overall capabilities.
Arctic is designed to have 480B parameters spread across 128 fine-grained experts and uses top-2 gating to choose 17B active parameters.
In contrast, most recent MoE models are built with significantly fewer experts. Arctic currently runs with a 4K-token context length, but Snowflake has announced plans for a 32K-context model in the future.
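As a rough sanity check on these numbers, Snowflake's announcement describes the Dense-MoE split as an always-active dense component of about 10B parameters plus 128 experts of about 3.66B parameters each. The back-of-the-envelope arithmetic below (approximate figures) recovers the quoted totals:

```python
# Rough parameter accounting for Arctic's Dense-MoE hybrid.
# The ~10B dense and ~3.66B per-expert figures come from Snowflake's
# announcement; treat all numbers as approximations.
dense_params = 10e9          # always-active dense transformer
expert_params = 3.66e9       # parameters per expert
n_experts = 128
top_k = 2                    # experts activated per token (top-2 gating)

total = dense_params + n_experts * expert_params
active = dense_params + top_k * expert_params

print(f"total:  {total / 1e9:.0f}B")   # ~478B -> the quoted 480B
print(f"active: {active / 1e9:.1f}B")  # ~17.3B -> the quoted 17B
```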
Performance of Arctic LLM
Snowflake emphasized that the model is competitive with Llama 3 8B and Llama 2 70B on enterprise metrics while using less than half of their training compute budget. It even matches or beats Llama 2 70B in instruction following, SQL generation, and common-sense reasoning.
Enterprise intelligence is a composite of metrics covering SQL generation, coding, complex instruction following, and the ability to produce grounded answers. On this composite, Arctic scores about 65%, on par with Llama 2 70B, despite its far smaller training budget.
Inference efficiency is critical for cost-effective deployment. Arctic represents a leap in MoE scale, using more experts and total parameters than any other open-source auto-regressive MoE model.
Is Arctic truly all it claims to be?
A major flaw in Arctic, one that has been glossed over, is its context length. The context length of a large language model refers to the maximum number of tokens (words or subword units) that the model can process and consider as input at once.
It essentially defines the window of context that the LLM can “see” and take into account when generating output or making predictions.
This is an essential factor because, while LLMs are trained on vast amounts of text data, at inference time they can only reason over what fits inside the context window. When a model’s context is small, it is more likely to hallucinate and produce inconsistent outputs: with less reference material available, the answers it generates can contain incorrect facts or details.
The context length of Arctic is currently only 4,000 tokens. Compare that to Llama 3 with 8,000 tokens, or Mixtral and DBRX with 32,000 tokens!
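To see what this limit means in practice, here is a small sketch that counts tokens and truncates a prompt to a 4,096-token window using the Hugging Face tokenizer. Snowflake/snowflake-arctic-instruct is Snowflake's published Hugging Face checkpoint, but any tokenizer illustrates the point:

```python
from transformers import AutoTokenizer

# Tokenizer from the published Hugging Face checkpoint.
tok = AutoTokenizer.from_pretrained("Snowflake/snowflake-arctic-instruct")

prompt = "lorem ipsum " * 4000          # stand-in for a long document

ids = tok(prompt)["input_ids"]
print(f"prompt is {len(ids)} tokens")   # well past the 4K window

# Tokens beyond the window are invisible to the model, so long inputs
# must be truncated or chunked before inference.
MAX_CTX = 4096
clipped = tok(prompt, truncation=True, max_length=MAX_CTX)["input_ids"]
print(f"clipped to {len(clipped)} tokens")
```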
Snowflake also claims this new LLM is a top-of-the-line “Enterprise AI” model. However, for any given enterprise metric, it is entirely possible that a foundation model fine-tuned on the specific tech stack an enterprise uses would outperform a general-purpose enterprise AI model.
Conclusion
Arctic, the first AI model family released by Snowflake, is most easily integrated with Snowflake’s Cortex platform. Arctic LLM demonstrates impressive performance and efficiency, particularly in enterprise-focused tasks, even if it has yet to clearly distinguish itself from other strong open-source models.
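For readers already on Snowflake, here is a minimal sketch of calling Arctic through Cortex’s COMPLETE SQL function from Python. The connection parameters are placeholders, and you should verify the 'snowflake-arctic' model identifier and Cortex availability for your account and region:

```python
import snowflake.connector

# Placeholder credentials; fill in your own account details.
conn = snowflake.connector.connect(
    account="<account>", user="<user>", password="<password>",
    warehouse="<warehouse>",
)

# Cortex exposes LLMs as SQL functions; COMPLETE runs a single prompt.
cur = conn.cursor()
cur.execute(
    "SELECT SNOWFLAKE.CORTEX.COMPLETE('snowflake-arctic', %s)",
    ("Generate a SQL query that lists the top 10 customers by revenue.",),
)
print(cur.fetchone()[0])
```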