Microsoft’s WizardLM-2 seemed to have finally caught up with OpenAI’s models, but it was taken down shortly after release. Let’s discuss it in detail!
Highlights:
- Microsoft released WizardLM-2, an open-source model built by fine-tuning the Mixtral 8x22B on synthetic data.
- The model excels in complex tasks and provides top-tier reasoning skills.
- It was later deleted because the team skipped a required toxicity-testing step.
WizardLM-2 Unboxed
WizardLM was an instruction-following model built on top of Meta’s LLaMA; the researchers fine-tuned LLaMA on machine-generated instruction data.
The second version of this model, WizardLM-2, was built by fine-tuning Mistral AI’s Mixtral 8x22B on synthetic data. The model family includes three cutting-edge models: WizardLM-2 8x22B, 70B, and 7B.
The models demonstrate highly competitive performances compared to leading proprietary LLMs.
WizardLM-2 8x22B is the most advanced model, falling only slightly behind GPT-4-1106-preview. The 70B reaches top-tier reasoning capabilities among models of its size, and the 7B is the fastest, achieving performance comparable to leading models 10x larger.
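For readers who grabbed the weights before they vanished, the release notes describe a Vicuna-style multi-turn prompt format. The following is a minimal sketch of that format; the `build_prompt` helper and the exact system line are illustrative assumptions, not an official API.

```python
# Sketch of the Vicuna-style multi-turn prompt format described in the
# WizardLM-2 release notes. The system line and helper name are
# illustrative, not an official API.
SYSTEM = ("A chat between a curious user and an artificial intelligence assistant. "
          "The assistant gives helpful, detailed, and polite answers to the user's questions.")

def build_prompt(turns):
    """turns: list of (user_msg, assistant_msg_or_None) pairs.
    A None assistant message marks where the model should continue."""
    parts = [SYSTEM]
    for user_msg, assistant_msg in turns:
        parts.append(f"USER: {user_msg}")
        if assistant_msg is None:
            parts.append("ASSISTANT:")  # generation continues from here
        else:
            parts.append(f"ASSISTANT: {assistant_msg}</s>")
    return " ".join(parts)

print(build_prompt([("Hi, who are you?", None)]))
```

Formatting prompts this way matters for fine-tuned chat models: deviating from the template the model was trained on typically degrades output quality.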
The model was trained on synthetic data generated by AI models. The company said in an X post:
As the natural world's human data becomes increasingly exhausted through LLM training, we believe that: the data carefully created by AI and the model step-by-step supervised by AI will be the sole path towards more powerful AI. Thus, we built a Fully AI powered Synthetic…
— WizardLM (@WizardLM_AI) April 15, 2024
What are the Capabilities of the Model?
- MT-Bench: WizardLM-2 8x22B demonstrates highly competitive performance compared to the most advanced proprietary models such as GPT-4-Turbo and Claude 3. Meanwhile, the 7B and 70B versions are top performers among leading baselines at the 7B to 70B model scales.
- Human Preferences Evaluation: In human preference evaluations, WizardLM-2’s capabilities come very close to cutting-edge proprietary models such as GPT-4-1106-preview, and significantly ahead of all other open-source models.
But It Later Disappeared!
The model turned out to be quite the magician: its weights were available on Hugging Face but vanished after only a few hours.
Speculation started about the reason for the sudden withdrawal, until the team revealed in an update on X that they had missed an important step in the release process: toxicity testing.
It’s been a while since we’ve released a model months ago😅, so we’re unfamiliar with the new release process now: We accidentally missed an item required in the model release process – toxicity testing. We are currently completing this test quickly… 🫡 We are sorry for that.
— WizardLM (@WizardLM_AI) April 16, 2024
Toxicity in LLMs refers to a model’s tendency to produce harmful or inappropriate content. A toxic model reflects badly on its creators, especially at a time when concerns about AI’s negative effects run high. One wrong output can spread across the internet, and regulators may take notice. No company wants such consequences.
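Toxicity testing in practice usually means running a model’s outputs through a trained classifier before release. As a toy illustration only (not the actual test Microsoft ran, which is not public), a naive blocklist screen might look like this; the `BLOCKLIST` terms and `is_flagged` helper are placeholders:

```python
# Toy illustration of output screening. Real toxicity testing relies on
# trained classifiers (e.g. Perspective-style scoring), not blocklists;
# the terms below are neutral placeholders.
BLOCKLIST = {"badword1", "badword2"}

def is_flagged(text: str) -> bool:
    """Return True if any blocklisted term appears as a word in the text."""
    words = {w.strip(".,!?").lower() for w in text.split()}
    return not BLOCKLIST.isdisjoint(words)

print(is_flagged("A perfectly harmless reply."))  # False
```

A release pipeline would run a check like this over a large batch of generated outputs and block the release if the flag rate exceeds a threshold.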
So, both the GitHub and Hugging Face repositories for the model have been emptied; the pages now return 404 errors.
However, many people had already downloaded the model weights, and several users ran the model on additional benchmarks before the repositories were taken down.
But when it comes back, a powerful open-source model like this will find applications across various domains and among AI enthusiasts.
Conclusion
Despite the controversy surrounding the release and then deletion of the model weights and posts, WizardLM-2 shows great potential to dominate the open-source AI space. With the imminent arrival of Llama-3, this is the perfect time for Microsoft to drop a new model. Perhaps a bit hasty with the procedures, but no harm done!