The world of Generative AI keeps on growing with the development of several tools such as Open AI’s SORA, Google’s Gemini, and Stability’s Stable Diffusion 3. But now we have AI Worms as well.
Highlights:
- A group of researchers created worms for AI systems like ChatGPT to demonstrate potential vulnerabilities.
- These worms can steal the data or inject malware into the system.
- Experts see them as a big potential risk for companies, startups, and developers.
What are these AI Worms?
With the rise of several AI tools and their cutting-edge technologies, some safety concerns related to data integrity and privacy have also risen. To figure out potential ways in which our Gen AI systems can be attacked, researchers have found AI worms in a testing environment.
A team of researchers has developed what they believe to be the first generative AI worms, which may travel from one device to another and perhaps steal data or install malware in the process, as a warning of the dangers associated with autonomous, networked AI ecosystems.
Ben Nassi, in collaboration with fellow academics Stav Cohen and Ron Bitton, developed the worm, which they named Morris II in homage to the 1988 internet debacle caused by the first Morris computer worm. Here is what they stated in their official paper:
In this paper, we show how attackers can launch cyber-attacks against GenAI ecosystems by creating dedicated adversarial inputs, that we name adversarial self-replicating prompts, using jailbreaking and adversarial machine learning techniques.
The researchers demonstrate how the AI worm can breach some security measures in ChatGPT and Gemini by attacking a generative AI email helper to steal email data and deliver spam messages.
How does Morris II Worm work?
The researchers developed Morris II in a test environment and not against a widespread email assistant. They used an “adversarial self-replicating prompt” to develop the generative AI worm.
According to the researchers, this prompt causes the generative AI model to output a different prompt in response. Furthermore, by utilizing the connectedness throughout the system, these inputs force the agent to disseminate to new agents within the ecosystem of Gen AI.
To show the working of Morris II Worm against GenAI-powered email assistants they made use of two scenarios namely spamming and exfiltrating of personal data under two different circumstances of black-box and white-box access. They used multiple inputs in the form of both text and images.
In all, the researchers used generative AI to build an email system that could send and receive messages by connecting to open-source LLM LLaVA, ChatGPT, and Gemini. They then discovered two ways to make use of the system: one was to use a self-replicating prompt that was text-based, and the other was to embed the question within an image file.
The worm makes its way into the Gen AI system in three ways namely replication, propagation, and malicious activity.
Replication
Morris II can be replicated by inserting an adversarial self-replicating prompt—that is, using the agent’s GenAI layer—into the text, image, or audio input that the GenAI model processes.
This is accomplished by compelling the GenAI model to output the input by reproducing the input to the GenAI model into the output of the GenAI model and using prompt injection techniques into the input provided to the GenAI service.
Propagation
There are two methods for carrying out the case-dependent propagation procedure. When using RAG-based propagation, new emails are received by the propagation is started. This is accomplished by sending an email that taints the RAG’s database, causing the RAG to save the email there.
In this instance, the dissemination relies on the retrieval from the database in reaction to the email message content.
In application-flow-steering-based propagation, the next action taken by the GenAI-powered application is determined by the output of the GenAI model, which was determined by the attacker’s manipulated input. This is accomplished by developing a specific input that, upon processing by the GenAI model, produces the required output.
Malicious Activity
The application’s use and the permissions the user grants it determine a great deal about the malicious activity that the worm will carry out.
In this instance, malicious activity may take the following forms: phishing or spear-phishing attacks; propagandizing; creating toxic content for emails meant to offend friends and clients; exfiltrating sensitive or confidential user data; or spamming the user by displaying content that the system should have recognized as spam.
Looking at the two use cases
In one case, the researchers took on the role of attackers and composed an email with an adversarial text prompt. This email “poisons” the email assistant’s database by utilizing retrieval-augmented generation (RAG), which allows LLMs to retrieve more data from outside their system.
In response to a user inquiry, the RAG jailbreak the GenAI service by retrieving the email and sending it to GPT-4 or Gemini Pro to generate a response, ultimately stealing data from the emails.
“The generated response containing the sensitive user data later infects new hosts when it is used to reply to an email sent to a new client and then stored in the database of the new client.”
Ben Nassi
On the other hand, the second way involves the email helper forwarding the message to others by including a harmful prompt in an image. Any type of image that contains spam, abuse material, or even propaganda can be distributed to new clients after the first email has been sent by embedding the self-replicating prompt into the image.
The researchers shared a video demonstrating on how the worm works:
The video demonstrates the working principles of Morris II and summarizes the whole workflow. The email system is seen repeatedly forwarding a message in the video. Also, according to the experts, data extraction from emails is possible.
Is the Morris II Worm Approach Ethical?
Although generative AI worms haven’t been observed in the public yet, academics and experts warn that startups, developers, and tech firms should be wary of this security issue.
The discovery represents a warning about “bad architecture design” inside the larger AI ecosystem, the researchers claim, even though it violates certain of ChatGPT and Gemini’s safety protocols. They nevertheless informed Google and OpenAI of their discoveries.
The Morris II discovery represents a warning about “bad architecture design” inside the larger AI ecosystem, the researchers claim, even though it violates certain of ChatGPT and Gemini’s safety protocols. They nevertheless informed Google and OpenAI of their discoveries.
An OpenAI spokesperson said that the company is working to make its systems more resilient and said developers should “use methods that ensure they are not working with harmful input.”
However, Google has kept quiet on this matter and has not commented on this research yet.
Conclusion
Although the AI worms’ presentation occurs in a largely controlled setting, developers should be aware of the potential threat posed by generative them in the future. This approach may come with many potential hindrances to the traditional GenAI models, but will also help in empowering future systems to be more secure.