AutoCoder, a brand-new open-source LLM for coding, has shocked the world by outperforming OpenAI’s GPT-4 Turbo and GPT-4o (Omni) on coding benchmarks.
Highlights:
- AutoCoder, the latest AI coding assistant LLM, has beaten GPT-4o on coding benchmarks.
- Trained with the AIEV-Instruct method, it can install coding dependencies automatically, without any further assistance.
- The model is open-source and can be run locally.
It’s been just a few days since GPT-4o launched and took the generative AI world by storm with its natural language processing and strong vision capabilities. Developers around the world have enjoyed using GPT-4o to develop AI systems, run and debug complex code, and much more.
Now, just days later, we already have an AI coding assistant that has surpassed GPT-4o on coding benchmarks. AutoCoder has arrived to shake up this ever-evolving generative AI industry.
In this article, we will look into AutoCoder’s brand-new open-source technology and analyze this state-of-the-art coding assistant in much detail. So, let’s explore right away!
AutoCoder: The Latest AI Coding Assistant
AutoCoder, a new large language model, has surpassed GPT-4 Turbo and GPT-4o in benchmark tests. Its broad range of applications is enhanced by an adaptable code interpreter that permits the installation of external packages.
AutoCoder comes in two sizes: 6.7 billion and 33 billion parameters. The 33-billion-parameter version has beaten GPT-4 Turbo and GPT-4o on the HumanEval benchmark.
By installing necessary packages automatically and allowing for selective interpreter use, AutoCoder performs better than other models. This is what developers are looking for nowadays: a coding assistant that can install all dependencies without any instructions, which saves a lot of time in hectic coding tasks and development projects.
AutoCoder comes with a unique training methodology termed AIEV-Instruct (Instruction Tuning with Agent-Interaction and Execution-Verified). This technique combines agent interactions with external code execution and verification, a collaboration that pushes the model’s coding ability to new heights.
This allows AutoCoder, whenever the user wants to run code, to install the necessary packages automatically and keep re-running the code until it determines there are no problems.
This open code interpreter offers freedom in code verification by executing all generated Python code without requiring user input. Overall, this LLM is the perfect example of an AI agent doing the coding tasks for you automatically, without requiring any assistance at all!
How can you Access it?
AutoCoder is currently Open-Source and is accessible to all. You can download AutoCoder’s model weights from Hugging Face and run it locally using LM Studio.
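For readers who prefer the `transformers` library over LM Studio, a minimal loading sketch is below. The repo id `Bin12345/AutoCoder` is an assumption; verify it on the model card before use, and note that the 33B variant needs substantial GPU memory.

```python
def build_chat(user_request: str) -> list[dict]:
    """Format a single-turn request for the tokenizer's chat template."""
    return [{"role": "user", "content": user_request}]

if __name__ == "__main__":
    # Heavy imports and the model download are gated so the helper above can
    # be reused without pulling in torch/transformers.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "Bin12345/AutoCoder"  # assumption: check the Hugging Face model card
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id, torch_dtype=torch.bfloat16, device_map="auto"
    )

    inputs = tokenizer.apply_chat_template(
        build_chat("Write a Python function that reverses a string."),
        return_tensors="pt",
    ).to(model.device)
    outputs = model.generate(inputs, max_new_tokens=256)
    print(tokenizer.decode(outputs[0][inputs.shape[1]:], skip_special_tokens=True))
```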
See the video below to understand how you can set up AutoCoder locally!
AIEV-Instruct: The Overall Architecture
The overall architecture of AutoCoder is built upon AIEV-Instruct, a novel method for creating high-quality large code datasets. Through agent interactions, it simulates programmers writing code and running unit tests, while an external code executor guarantees the accuracy of the annotations.
“Compared to previous large-scale code dataset generation methods, AIEV-INSTRUCT reduces dependence on proprietary large models and provides execution-validated code dataset.”
— Bin Lei, primary author, in AutoCoder’s research paper.
The architecture comprises a Teaching Stage and a Self-Learning Stage, which lessens the annotation process’s dependency on expensive closed-source models. Let’s take a look at these two stages in depth.
The Teaching Stage
During the Teaching Stage, GPT-4 Turbo is used as the teacher model to supplement and rectify the open-source code snippets used in the dataset. This stage is composed of four phases:
1. Initialization
Two roles are assigned to GPT-4 Turbo: programmer and questioner.
Ensuring diversity in the generated data prevents convergence to a particular discussion template and produces a more uniform probability distribution. The dialogue messages start as an empty list that holds data during the process: the dialogue is recorded on this list over numerous rounds and is eventually included as a single entry in the final dataset.
2. Proposing the Question
GPT-4 Turbo is used to run OSS-Instruct: based on an open-source code fragment, it creates a problem description and a targeted solution, including a code snippet.
Crucially, GPT-4 Turbo also supplies unit tests here. These unit tests further guarantee the accuracy of the code in the dataset. The problem description, the solution, and the unit tests are successively appended to the dialogue messages initialized in the preceding phase.
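To make the role of these generated unit tests concrete, here is an illustrative (invented, not drawn from the actual dataset) pairing of a generated solution with the tests that would validate it before the entry is accepted:

```python
# Illustrative example of how generated unit tests gate a generated solution.
# Both the "solution" and the "tests" stand in for teacher-model output.

def running_sum(nums: list[int]) -> list[int]:
    """Candidate solution: return the prefix sums of nums."""
    total, out = 0, []
    for n in nums:
        total += n
        out.append(total)
    return out

def test_running_sum() -> bool:
    """Unit tests supplied alongside the solution; the entry is kept only
    if these pass when executed by the code interpreter."""
    assert running_sum([1, 2, 3, 4]) == [1, 3, 6, 10]
    assert running_sum([]) == []
    return True
```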
3. Execution Feedback
The researchers used multiple rounds of execution feedback to check the generated code, thereby improving the dataset’s quality.
The code snippet created in the second phase is first passed to the Code Interpreter. In the event of an execution error, the complete stderr output is appended to the dialogue messages. The questioner then receives this stderr data and uses it to create a natural-language description of the problem.
The programmer continues to alter the code based on these natural-language descriptions and the stderr, and each newly generated version of the code is appended to the dialogue messages.
4. Termination
Lastly, the model uses the Code Interpreter to run the final code generated by the programmer. If the program executes successfully, the stdout is appended to the dialogue messages. This completes the construction of one data entry.
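The four phases above can be sketched as a loop. The code below is a hypothetical simulation: `fix_code` is a stub standing in for the LLM programmer, and the message roles are illustrative rather than the paper’s exact schema.

```python
import subprocess
import sys

# Hypothetical simulation of the execution-feedback loop (phases 3-4):
# run the generated code, feed stderr back into the dialogue, let the
# "programmer" revise, and stop when execution succeeds.

def execute(code: str) -> subprocess.CompletedProcess:
    """Run a code snippet in a fresh interpreter, capturing stdout/stderr."""
    return subprocess.run([sys.executable, "-c", code], capture_output=True, text=True)

def feedback_loop(code: str, fix_code, max_rounds: int = 5) -> list[dict]:
    """Build one data entry's dialogue messages via execution feedback."""
    messages = [{"role": "programmer", "content": code}]
    for _ in range(max_rounds):
        result = execute(code)
        if result.returncode == 0:
            # Termination: record stdout; the data entry is complete.
            messages.append({"role": "interpreter", "content": result.stdout})
            return messages
        # Execution feedback: record stderr, then let the programmer revise.
        messages.append({"role": "interpreter", "content": result.stderr})
        code = fix_code(code, result.stderr)
        messages.append({"role": "programmer", "content": code})
    return messages
```

In AIEV-Instruct the `fix_code` role is played by GPT-4 Turbo during the Teaching Stage and by AutoCoder itself during the Self-Learning Stage.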
The Self-Learning Stage
During the Self-Learning Stage, AutoCoder refines its understanding through continuous interaction and feedback.
The Self-Learning Stage differs from the Teaching Stage in that the student model takes the place of the teacher model: the entire execution-feedback process is carried out by the student model, which now acts as both questioner and programmer.
Overall, this approach ensures that AutoCoder generates accurate and reliable code, setting it apart from other models in the market.
The Shocking Benchmarks
AutoCoder has completely shocked the world with its powerful benchmarks.
According to the evaluation results, AutoCoder achieved remarkable Pass@1 scores of 61.4%, 68.9%, and 60.8% in Java, C++, and Rust, respectively. Its performance was surpassed only by a few models (like CodeQwen1.5-Chat) in the other three languages. This illustrates the strength of AutoCoder’s multilingual code generation skills.
It is superior to OpenAI’s GPT-4 Turbo and GPT-4o on the HumanEval benchmark and performs exceptionally well on coding tasks: its test accuracy on the HumanEval base dataset is higher than that of GPT-4 Turbo (April 2024), at 90.9% as opposed to 90.2%.
Because of this efficiency, AutoCoder is a great tool for developers looking for a dependable open-source solution with sophisticated code-interpretation skills.
What makes AutoCoder special?
A code interpreter is one of the key parts of a coding assistant nowadays. The code interpreter helps the model debug and run code, which is necessary for fully automating operations connected to scientific computation, sophisticated coding, and related tasks.
To build a code interpreter, the model must correctly identify the code blocks it needs to execute. But currently, only a few AI models, such as GPT-4o and InternLM-Chat, support such interpreters.
However, these interpreters’ closed operating environment and inability to communicate with external systems severely restrict their ability to run code that requires the installation of external packages.
This is where AutoCoder’s ability to install dependencies without requiring any sort of instruction comes into play. This is accomplished by training the model to execute bash commands when necessary.
For a simple single-execution-feedback example, the original data entry contains three parts: natural language from the user; natural language + bash command + natural language + code block + natural language from the assistant; and the execution result from the code interpreter.
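Concretely, such a three-part entry might look like the sketch below. The roles and field names are illustrative, not the dataset’s actual schema, and the example task is invented.

```python
# Hypothetical sketch of a single-execution-feedback data entry: one user
# turn, one assistant turn mixing prose, a bash command, and a code block,
# and the interpreter's execution result.
data_entry = [
    {"role": "user",
     "content": "Show which Python version is running, from the shell and from code."},
    {"role": "assistant",
     "content": (
         "First, check the interpreter from the shell:\n"
         "```bash\npython --version\n```\n"
         "The same information is available programmatically:\n"
         "```python\nimport sys\nprint(sys.version)\n```\n"
         "This prints the full version string of the active interpreter."
     )},
    {"role": "code interpreter",
     "content": "Python 3.11.9"},
]
```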
Conclusion
AI coding assistants will never be the same after AutoCoder. With its innovative capabilities, distinct training approach, and exceptional benchmark performance, it’s an excellent option for developers and enterprises searching for a potent open-source solution. AutoCoder could change the way we approach coding tasks with its capacity to install external packages, generate precise code, and learn continually through feedback and interaction.