Devin has new competition: SWE-agent, an open-source agent that can turn a GitHub issue into a pull request.
Highlights:
- Researchers from Princeton NLP Group announced SWE-agent, an open-source AI software development system.
- It can turn language models like GPT-4 into software engineering agents that can fix bugs in real GitHub repositories.
- It resolves 12.29% of the issues in the SWE-bench benchmark, close to Devin AI’s 13.86%.
SWE-agent Explained
SWE-agent (Software Engineering Agent) turns LMs into software engineering agents that fix bugs in GitHub repos.
It has demonstrated near-parity with Devin’s performance on the SWE-bench benchmark. That result suggests agents of this kind could change how software engineers handle complex issues and streamline their workflows.
The video below shows how SWE-agent resolves an issue in a GitHub repository by tracking down its cause:
The agent takes an average of 93 seconds to complete a task. It works through a specialized terminal that lets it open and search files, edit specific lines, and write and run tests.
How to access SWE-agent?
Because SWE-agent is open source, developers can set it up on their local machines. The setup instructions for local deployment are available in the agent’s official GitHub repository.
A demo is also available on the official website. Both are free to use, so software engineers can try the agent in their current workflows without extensive technical know-how.
How SWE-agent Works
SWE-agent follows a systematic problem-solving strategy that consists of planning, execution, observation, and iterative adjustment. This lets the agent break a complex issue down into simpler steps and work through them one at a time.
It does this through simple, LM-centric commands and feedback formats that make it easier for the LM to navigate the repository and to view, edit, and execute code files.
The team calls this layer the Agent-Computer Interface (ACI). It sits between the agent and the terminal. Because the agent engages directly with the development environment, the interface reduces reliance on human involvement and speeds up problem-solving.
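The plan, execute, observe, and adjust cycle described above can be sketched as a short loop. This is a minimal illustration, not SWE-agent's actual code: `run_agent`, `Step`, `scripted_policy`, and `fake_execute` are hypothetical names, and the scripted policy stands in for a real language model.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Step:
    thought: str      # the plan for this turn
    command: str      # the action sent to the environment
    observation: str  # the feedback that came back


def run_agent(
    policy: Callable[[List[Step]], Tuple[str, str]],
    execute: Callable[[str], str],
    max_steps: int = 10,
) -> List[Step]:
    """Loop: plan (policy) -> execute -> observe -> adjust on the next turn."""
    history: List[Step] = []
    for _ in range(max_steps):
        # The policy sees all earlier steps, so it can adjust its plan.
        thought, command = policy(history)
        if command == "submit":
            history.append(Step(thought, command, "submitted"))
            break
        observation = execute(command)
        history.append(Step(thought, command, observation))
    return history


# Hypothetical stand-ins for the LM and the terminal.
def scripted_policy(history: List[Step]) -> Tuple[str, str]:
    script = [
        ("Find where the bug is triggered", "search buggy_function"),
        ("Check whether the tests pass", "run_tests"),
        ("Nothing left to do", "submit"),
    ]
    return script[min(len(history), len(script) - 1)]


def fake_execute(command: str) -> str:
    return f"ran: {command}"


history = run_agent(scripted_policy, fake_execute)
for step in history:
    print(step.command, "->", step.observation)
```

Passing the full history back to the policy on every turn is what makes the adjustment iterative: each new command can depend on what the previous ones returned.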
SWE-agent contains features that the team discovered to be immensely helpful during the ACI design process:
- They added a linter that runs when an edit command is issued and does not let the edit command go through if the code isn’t syntactically correct.
- They gave the agent a custom file viewer instead of having it rely on the ‘cat’ command alone. The viewer worked best when it showed at most 100 lines per turn. The file editor they built also has commands for scrolling and for searching within the file.
- They gave the agent a purpose-built command for searching a string across a full directory. It was important for this tool to list matches concisely, showing only each file that contains at least one match. Giving the model more context about each match proved too confusing for it.
- When a command produced no output, the interface returned the message “Your command ran successfully and did not produce any output.”
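The four design choices in the list above can be sketched in a few small functions. This is an illustration under stated assumptions, not the project's implementation: all function names are hypothetical, and Python's built-in `ast.parse` stands in for whatever linter SWE-agent actually runs.

```python
import ast
import os
from typing import List, Tuple

WINDOW = 100  # lines shown per turn, the figure quoted in the article
EMPTY_MSG = "Your command ran successfully and did not produce any output."


def view_window(lines: List[str], start: int, window: int = WINDOW) -> str:
    """Show at most `window` lines from 0-based index `start`, numbered from 1."""
    start = max(0, min(start, max(len(lines) - 1, 0)))
    chunk = lines[start:start + window]
    return "\n".join(f"{start + i + 1}: {line}" for i, line in enumerate(chunk))


def guarded_edit(
    lines: List[str], first: int, last: int, replacement: List[str]
) -> Tuple[List[str], str]:
    """Replace 1-based lines first..last; keep the old file if the result
    is not syntactically valid Python (assumption: ast.parse as the linter)."""
    candidate = lines[:first - 1] + replacement + lines[last:]
    try:
        ast.parse("\n".join(candidate))
    except SyntaxError as exc:
        return lines, f"Edit rejected: {exc.msg} (line {exc.lineno})"
    return candidate, "Edit applied."


def search_dir(root: str, term: str) -> str:
    """List only the files under `root` that contain `term`, one per line."""
    hits = []
    for dirpath, _dirs, files in os.walk(root):
        for name in sorted(files):
            path = os.path.join(dirpath, name)
            try:
                with open(path, encoding="utf-8") as fh:
                    found = any(term in line for line in fh)
            except (UnicodeDecodeError, OSError):
                continue  # skip binary or unreadable files
            if found:
                hits.append(path)
    return "\n".join(hits)


def format_output(output: str) -> str:
    """Replace empty output with an explicit success message."""
    return output if output.strip() else EMPTY_MSG


source = ["def f():", "    return 1"]
_, message = guarded_edit(source, 2, 2, ["    return (1"])
print(message)            # the unbalanced parenthesis is rejected
print(format_output(""))  # the explicit "no output" message
```

Each function returns short, explicit feedback rather than raw tool output, which is the common thread in the list: the model always gets a small, unambiguous observation to act on.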
The image below shows the agent’s thought process as it fixes an issue in a repository:
How does it compete with Devin?
SWE-agent achieves similar accuracy to Devin AI on the SWE-bench benchmark, solving 12.29% of problems autonomously, compared to Devin’s 13.86%.
However, it is important to remember that Devin was evaluated on only 25% of the SWE-bench benchmark, so the two scores are not directly comparable. SWE-agent takes an average of 93 seconds to complete a task, as opposed to about 5 minutes for Devin.
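To put the quoted figures side by side, the short calculation below works out the gap in resolve rate and the ratio of task times. It uses only the numbers given in this article. Since the two scores were not measured on the same set of problems, the gap is indicative only.

```python
# Figures as quoted in the article.
swe_agent_rate = 12.29   # percent of SWE-bench issues resolved
devin_rate = 13.86       # percent resolved, on a 25% subset
swe_agent_seconds = 93   # average time per task
devin_seconds = 5 * 60   # "5 minutes" per task

gap_points = devin_rate - swe_agent_rate
speed_ratio = devin_seconds / swe_agent_seconds

print(f"Gap in resolve rate: {gap_points:.2f} percentage points")
print(f"Devin's task time is about {speed_ratio:.1f}x SWE-agent's")
```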
Also, its open-source design lets developers access and contribute to it whenever needed, which encourages them to customize and extend it to tackle a range of software engineering problems. This is not the case with Devin, which has not been officially released yet.
Conclusion
The potential influence of SWE-agent extends beyond making GitHub issue management more efficient. By drawing on the collective expertise of the developer community, SWE-agent could evolve into a tool that changes how software is developed and maintained.