Microsoft’s new AI model beats GPT-3.5 and Gemini Pro at solving mathematics problems, despite being much smaller in size. Find out everything you need to know about Orca-Math here!
Highlights:
- Microsoft unveils Orca-Math, a Small Language Model based on the Mistral 7B model.
- Delivers strong mathematical reasoning capabilities after being trained on a large synthetic dataset of varied math word problems.
- The training dataset is also publicly available for development purposes.
Orca-Math is Made for School Math Problems
Microsoft announced Orca-Math, an SLM (Small Language Model) for solving grade school-level math problems. It was created by fine-tuning the Mistral 7B model on 200,000 synthetically generated math problems.
The announcement was first made by Arindam Mitra, Senior Researcher at Microsoft, via X:
Introducing Orca-Math, our Mistral-7B offshoot excelling in math word problems! 🧮🐳
– Impressive 86.81% score on GSM8k
– Surpasses models 10x larger or with 10x more training data
– No code, verifiers, or ensembling tricks needed
— arindam mitra (@Arindam1408) March 4, 2024
Orca-Math follows its predecessors Orca and Orca 2, whose main goal is to show how better training signals and techniques can raise the reasoning capabilities of smaller language models to a level comparable to that of much larger ones.
Top-Notch Benchmarks Beating GPT-3.5
One of the most fascinating aspects of Orca-Math is its impressive score of 86.81% on the GSM8K benchmark (pass@1).
The GSM8K dataset contains 8.5K well-written, linguistically diverse grade school math word problems composed by human problem writers. Each problem typically takes a middle schooler two to eight steps to solve. You can find more about the dataset here.
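For reference, here is a minimal sketch of how you could load and inspect GSM8K yourself with the Hugging Face datasets library. The dataset ID, config name, and column names below are taken from the public gsm8k dataset card and may change, so treat them as assumptions.

```python
# Minimal sketch: loading GSM8K with the Hugging Face `datasets` library.
# Dataset ID ("gsm8k"), config ("main"), and columns are assumed from the public dataset card.
from datasets import load_dataset

gsm8k = load_dataset("gsm8k", "main")   # splits: "train" (~7.5K) and "test" (~1.3K)

example = gsm8k["test"][0]
print(example["question"])  # the grade-school word problem
print(example["answer"])    # step-by-step solution ending in "#### <final answer>"
```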
Orca-Math’s impressive results on this test show that it outperforms much larger models such as Gemini Pro and GPT-3.5. It also surpasses math-specialized models such as MetaMath-70B and WizardMath-70B.
This shows how Orca-Math is narrowing the gap between AI models and complex mathematical problems by bringing SLMs into the picture. Until now, solving lengthy math word problems was considered too demanding for Small Language Models, but that is no longer the case thanks to Microsoft.
Mistral Large has also launched recently, giving GPT-4 some competition.
Teaching mathematics to an AI model is a complex task, so the big question on everyone’s mind is: how did Microsoft get Orca-Math to solve maths? Here are the key aspects of how the model was built.
High-quality synthetic dataset of 200K math problems
According to the developers, they created a fresh set of 200,000 word problems using specialized agents working together, including teacher and student AI agents, with the teacher providing feedback on the student’s responses. This approach differs from methods such as SPIN (Self-Play Fine-Tuning), which rely on the student model’s own answers as negative examples.
The problems were seeded from 36,217 “sample math word problems from existing open-source datasets.” OpenAI’s GPT-4 was then asked to provide the solutions, and the new Mistral 7B variant, Orca-Math, was trained on the responses.
“We create Orca-Math-dataset, a synthetic dataset of 200K math problems, paired with GPT-4-Turbo solutions. The dataset was generated using an agent-based setup, hereby referred to as, Agent-Instruct, that not only paraphrases existing problems but aims to expand the problem set both in diversity and difficulty.”
Orca-Math Research Paper
Microsoft used three main agents, namely Ask Me Anything, Suggester, and Editor, to improve the quality of the math word problems and to build a predefined dataset of varied and intricate problems with solutions.
Ask Me Anything, Suggester, and Editor
The Ask Me Anything agent mainly expands existing word problems by generating additional questions from each seed problem. Take a look at Orca-Math’s research paper, which shows Ask Me Anything in action.
The Suggester examines a given problem and proposes several ways to make it more challenging, without writing the new problem itself. The Editor then takes the original word problem and the Suggester’s recommendations and produces an updated, trickier problem.
This process can run over several rounds, with each round further increasing the difficulty of the previously generated problem.
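To make the rewriting loop more concrete, here is a hypothetical sketch of a Suggester/Editor round implemented with an OpenAI-compatible chat client. The prompts, the model name, and the number of rounds are illustrative assumptions, not Microsoft’s actual AgentInstruct implementation.

```python
# Hypothetical Suggester/Editor rewriting loop (illustrative only, not Microsoft's code).
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def chat(prompt: str) -> str:
    """Send a single-turn prompt to a chat model and return the text reply."""
    response = client.chat.completions.create(
        model="gpt-4-turbo",  # model choice is an assumption for this sketch
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

def harden_problem(problem: str, rounds: int = 2) -> str:
    """Iteratively increase a word problem's difficulty using Suggester and Editor roles."""
    for _ in range(rounds):
        # Suggester: propose ways to raise difficulty without writing the new problem.
        suggestions = chat(
            "Suggest several ways to make this math word problem more challenging, "
            f"without rewriting or solving it:\n\n{problem}"
        )
        # Editor: apply the suggestions to produce the harder problem.
        problem = chat(
            "Rewrite the problem below into a harder version by applying these suggestions. "
            f"Return only the new problem.\n\nProblem:\n{problem}\n\nSuggestions:\n{suggestions}"
        )
    return problem
```

Each round feeds the hardened problem back in, mirroring the multi-round process described above.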
Iterative Learning
Microsoft used a student-teacher paradigm for the iterative learning process: the SLM (the student) learns from demonstrations produced by the large model (the teacher).
The iterative learning process begins by using the AgentInstruct dataset to teach the SLM through demonstrations of problems and their solutions. This first supervised stage gives the model its baseline problem-solving ability.
In the second stage, the SLM is given free rein to practice problem-solving: it generates multiple candidate solutions for each problem, and the teacher model then provides feedback on them. If the SLM fails to produce a correct solution even after several attempts, the teacher’s solution is used instead.
In the final phase of training, preference data is created from this feedback, and the SLM is retrained after being shown both excellent and poor solutions to the same problem.
According to Microsoft, this whole cycle can be repeated iteratively during Orca-Math’s training. The research team also employed Kahneman-Tversky Optimization (KTO), a technique developed late last year and released publicly by the startup Contextual AI.
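As a rough illustration of what the preference-tuning phase could look like in code, here is a hedged sketch using the open-source TRL library’s KTOTrainer. The example records, model name, and hyperparameters are assumptions for illustration, and argument names can differ between TRL versions; this is not Microsoft’s training code.

```python
# Hedged sketch: preference data plus KTO fine-tuning with TRL (illustrative only).
from datasets import Dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import KTOConfig, KTOTrainer

# KTO works on unpaired examples labeled desirable (True) or undesirable (False),
# which matches "excellent and poor solutions to the same problem".
records = [
    {"prompt": "A shop sells pencils at ...", "completion": "Correct worked solution ...", "label": True},
    {"prompt": "A shop sells pencils at ...", "completion": "Flawed student attempt ...", "label": False},
]
train_dataset = Dataset.from_list(records)

model_name = "mistralai/Mistral-7B-v0.1"  # base model family behind Orca-Math
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

trainer = KTOTrainer(
    model=model,
    args=KTOConfig(output_dir="orca-math-kto-sketch", per_device_train_batch_size=1),
    train_dataset=train_dataset,
    processing_class=tokenizer,  # older TRL releases use `tokenizer=` instead
)
trainer.train()
```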
Microsoft’s 200,000 maths word problem set
The set of 200,000 math word problems created by the Microsoft Orca team is available on Hugging Face under a permissive MIT license, enabling “everyone to explore, build, and innovate” with them, even for commercial use.
So, developers, what are you waiting for? Go ahead and try out the dataset for commercial or personal use. It can give you insights into training your own custom chatbots and models to solve complex maths problems, as well as a feel for the training process.
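As a starting point, here is a minimal sketch for loading the dataset with the Hugging Face datasets library. The repository ID and column names below are taken from the public dataset card, so verify them before building on this.

```python
# Minimal sketch: loading Microsoft's Orca-Math word-problem dataset.
# Repository ID and column names are assumptions taken from the public dataset card.
from datasets import load_dataset

orca_math = load_dataset("microsoft/orca-math-word-problems-200k", split="train")

print(len(orca_math))      # roughly 200K problems
sample = orca_math[0]
print(sample["question"])  # synthetic grade-school word problem
print(sample["answer"])    # GPT-4-Turbo generated solution
```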
Conclusion
Orca-Math’s detailed iterative training and high-quality mathematical problem-solving capabilities give us an idea of how far AI has come. This is just the beginning for the world of generative AI, and much more will surely unfold in the days to come. Stay tuned to our articles for more updates on the world of AI.