It has been only five months since the hype around Devin, which is still not available to the public, and there is already a new AI sensation in the software engineering world. Cosine, a human reasoning lab based in the UK, was founded in 2022 when its founder, Alistair Pullen, saw potential in using LLMs to perform complex coding tasks by imitating the behavior of human software developers.
Today, they have finally launched Genie, a fully autonomous AI software engineering model that can ideate, write, build, and test code iteratively until it succeeds. Find out more about this groundbreaking model now.
Highlights:
- World’s highest score on SWE-Bench: 30.08%, outperforming competitors by a wide margin
- Trained on a unique dataset to imitate real software engineers’ workflow
- Generates successful code significantly faster than humans
How does Genie work?
Alistair Pullen, the CEO and co-founder of Cosine AI, has launched Genie, which the company bills as the most capable autonomous software engineering model, able to produce working code faster than a human could. Watch the announcement video here:
I'm excited to share that we've built the world's most capable AI software engineer, achieving 30.08% on SWE-Bench – ahead of Amazon and Cognition. This model is so much more than a benchmark score: it was trained from the start to think and behave like a human SWE. pic.twitter.com/OyvqKLxcGV
— Alistair (@AlistairPullen) August 12, 2024
In the video, the CEO explains the approach they took to build this model:
- Typical LLMs work by predicting the next most probable token and simply regenerate the code when it fails. Cosine felt that a model that writes accurate code needs to watch and learn how a human software engineer works.
- They trained the model on a unique dataset containing examples of real software engineers doing their jobs. Apparently, the dataset contains perfect information lineage, incremental knowledge discovery, and step-by-step decision-making processes – representing everything a human engineer does logically.
- The model is set to tackle problems exactly as a human does. The company demonstrates a small example of how the model works in the announcement video.
- You can either write a prompt to the model describing your problem or attach a GitHub issue link. It initiates the process by thinking about what is needed to solve the problem.
- The model then starts finding all the files in the codebase that are related to your issue. Every step taken by the model is iterative, so it keeps going until it is satisfied that it has found everything relevant.
This entire process is documented by the model. After finding the necessary files, it analyzes them and starts devising the next steps. It’s really interesting to see the entire process as the model thinks of issues and then addresses the problems itself. It almost feels like we are inside the brain of a real software engineer.
Next, the model starts building the code. The co-founder says that one of the greatest advantages they have in this data-first approach is that the model has watched more humans solve problems than any typical human can watch in their lifetime, so it has a great grasp on how to solve issues.
The model can also edit the code easily, unlike various base models that need to rewrite the whole code in case they find errors. The company has provided the model with some debugging tools that help it resolve any problem in the code that it may face. Genie can also try different approaches to the same problem until it succeeds in finding code that runs without any errors.
As mentioned, the entire process is iterative: if Genie writes code that contains errors it cannot debug, it restarts the whole cycle of thinking, finding files, planning actions, and writing code. In the demonstration video, Genie solved the given problem in 84 seconds after trying different approaches.
Genie can also write a PR and publish it on GitHub using the Cosine web platform. If it receives any comments or feedback on the code, it will understand them and act accordingly.
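The iterative cycle described above (find files, plan, write code, test, and start over on failure) can be pictured with a minimal Python sketch. This is not Cosine's implementation: every name here (`solve_iteratively`, `Attempt`, and the four stub functions) is hypothetical, and the stubs simply pretend that the third approach is the one whose tests pass.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Attempt:
    """Record of one full cycle (hypothetical structure, for illustration only)."""
    number: int
    files: List[str]
    plan: str
    patch: str
    passed: bool


def solve_iteratively(
    issue: str,
    find_files: Callable[[str, List[Attempt]], List[str]],
    make_plan: Callable[[str, List[str], List[Attempt]], str],
    write_patch: Callable[[str, List[str]], str],
    run_tests: Callable[[str], bool],
    max_attempts: int = 5,
) -> List[Attempt]:
    """Repeat the find -> plan -> write -> test cycle until tests pass or attempts run out."""
    history: List[Attempt] = []
    for number in range(1, max_attempts + 1):
        files = find_files(issue, history)       # locate files relevant to the issue
        plan = make_plan(issue, files, history)  # decide on an approach, given past failures
        patch = write_patch(plan, files)         # produce a candidate change
        passed = run_tests(patch)                # check whether the change works
        history.append(Attempt(number, files, plan, patch, passed))
        if passed:
            break
    return history


# Toy stand-ins (assumptions): a real system would call a model and a test runner here.
def find_files(issue: str, history: List[Attempt]) -> List[str]:
    return ["app/models.py"]


def make_plan(issue: str, files: List[str], history: List[Attempt]) -> str:
    return f"approach {len(history) + 1} for {issue}"


def write_patch(plan: str, files: List[str]) -> str:
    return f"patch using {plan}"


def run_tests(patch: str) -> bool:
    return "approach 3" in patch  # pretend only the third approach passes


history = solve_iteratively("issue-123", find_files, make_plan, write_patch, run_tests)
print(len(history), history[-1].passed)
```

Keeping the full `history` mirrors the article's point that the whole process is documented: each failed attempt stays visible and is available when the next approach is planned.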
Evaluation metrics
The company has released a technical report detailing the results of its evaluation. Genie is touted as the best software engineering model because it achieved 30.08% on the SWE-Bench evaluation and 50.67% on SWE-Bench Lite.
According to the reported results, Genie was tested on the 2,294 mixed-complexity tasks in SWE-Bench and outperformed its counterparts. It beat major competitors like Factory’s Code Droid and AutoCodeRover, and even well-known names like Devin, billed as the world’s first AI software engineer. It also scored higher than the best LLMs available right now, like GPT-4 and Claude 3 Opus, even when they are combined with SWE-agent.
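To put the percentages in concrete terms, the short Python sketch below converts the reported scores into approximate task counts. The 2,294 figure comes from the text above; the 300-task size of SWE-Bench Lite is an assumption taken from the public benchmark, not from Cosine's report, and the helper `resolved_tasks` is a hypothetical name.

```python
def resolved_tasks(score_percent: float, total_tasks: int) -> int:
    """Approximate number of tasks solved for a given percentage score."""
    return round(score_percent / 100 * total_tasks)


# SWE-Bench: 2,294 tasks, as stated in the article.
swe_bench = resolved_tasks(30.08, 2294)

# SWE-Bench Lite: 300 tasks is an assumption based on the public benchmark.
swe_lite = resolved_tasks(50.67, 300)

print(f"SWE-Bench: about {swe_bench} of 2294 tasks")
print(f"SWE-Bench Lite: about {swe_lite} of 300 tasks")
```

In other words, the headline score corresponds to roughly 690 solved tasks, which also means about seven in ten tasks in the full benchmark remained unsolved.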
Although the model was reportedly trained on billions of tokens, the data mix is concentrated in a few languages: JavaScript and Python each make up 21%, while popular languages like Java, C++, C#, and C account for less than 5% of the data mix.
When it comes to the type of development work represented in the data, the largest shares are feature development (25%) and bug fixing (20%).
The company has also written that it is still working on expanding the model’s capabilities, as much potential remains untapped. They have stated, “By broadening the data and introducing new abilities, Genie will become proficient in more programming languages and the latest frameworks, meeting developers exactly where they work.” So expect many updates even after the model is released publicly.
Collaboration with OpenAI
In the last part of the video, the founder thanks OpenAI for allowing them to fine-tune the model on a large context window. The website also carries the line “powered by OpenAI”, as Cosine is part of OpenAI’s experimental access program. Earlier, Cosine and OpenAI were thought to be competitors, but they turned out to be collaborators. Recently, Cosine raised a €2.2 million round led by US venture firms Uphonest and SOMA Capital, with participation from Lakestar, Focal, and others.
Conclusion
We can’t wait to get our hands on Genie. However, the model is not available for public access yet. To use it, you need to head to the company’s website and join the waitlist. Yes, you read that correctly. Another waitlist.