A group of researchers has filled a long-standing gap in the field of AI by connecting two AI models and getting them to communicate. Let’s dig into the results!
Highlights:
- Researchers at the University of Geneva conducted a study in which they developed an Artificial Neural Network.
- The AI system could learn and execute several novel tasks and then describe them to another AI system, establishing communication between the two.
- By achieving this AI-to-AI communication, the model marks a significant improvement over GPT and multimodal alternatives.
New Research on Communication Between AI Models
Scientists at the University of Geneva have built an Artificial Neural Network that can learn tasks and then relay that knowledge to another AI system for replication.
The study succeeded in simulating an artificial neural network with this level of cognitive ability: the AI learned and executed several simple tasks, then described them linguistically to a “sister” AI, which carried them out in turn.
As humans, we can pick up a new skill from brief instructions and then describe it well enough for someone else to reproduce it. This ability is essential to human communication and a fundamental aspect of the world we live in.
Until now, this ability has been exclusive to humans. It distinguishes us from other species, which need many attempts, guided by positive or negative reinforcement signals, to learn a new task, and which cannot convey what they have learned to their fellows.
This study recreates that human faculty with the help of Natural Language Processing, a sub-field of artificial intelligence that aims to build machines capable of comprehending and reacting to spoken or written language. It comes at a time when there is a lot of talk about AGI and its impact.
How was the Research Conducted?
The study was led by Alexandre Pouget and his team. They combined the power of natural language models with artificial neural networks, which are loosely modeled on the neurons that carry and transmit electrical signals in the brain.
The resulting model combines sensorimotor and language knowledge, enabling AIs to comprehend commands and communicate them, carrying out tasks like reaching for an object on a shelf or moving along a specific route.
“We, therefore, seek to leverage the power of language models in a way that results in testable neural predictions detailing how the human brain processes natural language in order to generalize across sensorimotor tasks.”
Alexandre Pouget, Department of Basic Neuroscience, University of Geneva
Let’s walk through the research process, step by step, that may have given rise to this human-like technology.
1) Training with Pre-Trained Language Models: S-Bert
The researchers trained a recurrent neural network (the sensorimotor-RNN) on a set of fifty simple psychophysical tasks. The instructions for each task were processed through a pre-trained language model before being passed to the network.
The 50 tasks essentially fell into five groups: “Go,” “Decision-making,” “Comparison,” “Duration” and “Matching”; tasks within a group required different responses even when they shared identical sensory input structures.
The researchers gave two example tasks: the decision-making (DM) task requires the network to respond in the direction of the stimulus with the highest contrast, whereas the anti-decision-making (AntiDM) task requires it to respond in the direction of the stimulus with the lowest contrast.
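To make the distinction concrete, here is a minimal sketch of the DM/AntiDM pair: identical sensory input, opposite required responses. The stimulus values are invented for illustration.

```python
import numpy as np

# Hypothetical illustration of DM vs. AntiDM: the input is the same list
# of stimuli, but the two tasks demand opposite responses.
stimuli = [(np.pi / 4, 0.9), (np.pi, 0.4)]  # (direction in radians, contrast)

dm_response = max(stimuli, key=lambda s: s[1])[0]       # DM: strongest contrast
anti_dm_response = min(stimuli, key=lambda s: s[1])[0]  # AntiDM: weakest contrast

print(f"DM -> {dm_response:.2f} rad, AntiDM -> {anti_dm_response:.2f} rad")
```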
For every task, the RNN produced motor response activity after receiving sensory input and task-identifying information. Input stimuli were encoded by two one-dimensional maps of neurons, each representing a distinct input modality, with periodic Gaussian tuning curves to angles over (0, 2π).
The same encoding scheme was applied to output responses. The inputs also included a fixation unit; the model was allowed to respond to the input stimuli only once the fixation signal was turned off.
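Here is a minimal sketch of that input encoding. The neuron count, tuning width and stimulus values are assumptions for illustration; only the periodic Gaussian tuning over (0, 2π) comes from the text above.

```python
import numpy as np

N_UNITS = 32   # neurons per input map (assumed for illustration)
SIGMA = 0.5    # tuning-curve width in radians (assumed)
PREFERRED = np.linspace(0, 2 * np.pi, N_UNITS, endpoint=False)

def circular_distance(a, b):
    """Shortest angular distance on the circle over (0, 2*pi)."""
    d = np.abs(a - b) % (2 * np.pi)
    return np.minimum(d, 2 * np.pi - d)

def encode_stimulus(direction, strength=1.0):
    """Population activity of one 1-D neuron map for a stimulus at `direction`."""
    d = circular_distance(PREFERRED, direction)
    return strength * np.exp(-0.5 * (d / SIGMA) ** 2)

# Two stimuli presented on the same modality map add their population activities.
activity = encode_stimulus(np.pi / 4, 0.9) + encode_stimulus(np.pi, 0.4)
print(activity.round(2))
```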
2) Task Identification
The researchers used two non-linguistic controls as the task-identifying input. In the first, SIMPLENET, a task’s identity was represented by one of fifty orthogonal rule vectors. In the second, STRUCTURENET, tasks were encoded as combinations drawn from a set of ten orthogonal structure vectors.
Each structure vector represents one dimension of the task set (for example, whether to respond to the weakest or the strongest stimulus). Consequently, STRUCTURENET fully captured all the relevant relationships between the tasks, while SIMPLENET encoded none of this structure.
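The contrast between the two controls is easy to see in code. Below is a hedged sketch: one-hot rows are a direct reading of “fifty orthogonal rule vectors,” but the assignment of structure dimensions to the DM/AntiDM pair is a hypothetical example.

```python
import numpy as np

N_TASKS, N_DIMS = 50, 10

# SIMPLENET: one orthogonal rule vector per task -- no shared structure.
simplenet_rules = np.eye(N_TASKS)

# STRUCTURENET: each task combines a subset of 10 orthogonal structure vectors.
structure_basis = np.eye(N_DIMS)

def structurenet_rule(dims):
    """Sum the structure vectors for the dimensions a task possesses."""
    return structure_basis[list(dims)].sum(axis=0)

# Hypothetical assignment: DM and AntiDM share a "decision-making" dimension (0)
# but differ on "respond to strongest" (1) vs. "respond to weakest" (2).
dm, anti_dm = structurenet_rule([0, 1]), structurenet_rule([0, 2])

print(dm @ anti_dm)                             # 1.0: shared structure is visible
print(simplenet_rules[0] @ simplenet_rules[1])  # 0.0: SIMPLENET encodes none
```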
3) A Sensorimotor RNN and Instruction Embedding
Finally, the researchers also tested a multimodal control: the language embedder from CLIP, a model that learns a joint embedding space of images and their text captions. Each sensorimotor-RNN was named after the language model that produced its instruction embeddings (LANGUAGEMODELNET), with a letter indicating the size of that language model.
For each language model, the transformer’s final hidden state was pooled into a fixed-length representation, which was then passed through a set of linear weights trained during task learning. This produced a 64-dimensional instruction embedding in all models.
The language models’ own weights remained fixed unless otherwise noted. Lastly, as a control, the researchers examined a bag-of-words (BoW) strategy that embeds each instruction using only word-count data.
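A sketch of that pipeline is below, using the open-source sentence-transformers library as a stand-in sentence encoder. The checkpoint name and its 384-dimensional output are assumptions; only the frozen encoder, the trained linear map and the 64-dimensional output match the text above.

```python
import torch
import torch.nn as nn
from sentence_transformers import SentenceTransformer

# Frozen pretrained sentence encoder (stand-in for the study's S-BERT).
encoder = SentenceTransformer("all-MiniLM-L6-v2")

class InstructionEmbedder(nn.Module):
    """Pooled language-model output -> trainable linear map -> 64-d embedding."""
    def __init__(self, in_dim=384, out_dim=64):
        super().__init__()
        self.proj = nn.Linear(in_dim, out_dim)  # the only weights trained here

    def forward(self, instructions):
        with torch.no_grad():  # language-model weights stay fixed
            pooled = torch.tensor(encoder.encode(instructions))
        return self.proj(pooled)

embedder = InstructionEmbedder()
emb = embedder(["respond in the direction of the weakest stimulus"])
print(emb.shape)  # torch.Size([1, 64]) -- fed to the sensorimotor-RNN
```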
Overall, during the initial phase of the study, the neuroscientists trained this network to mimic Wernicke’s area, the region of the brain responsible for language perception and interpretation.
During the second phase, the network was trained to mimic Broca’s area, which is responsible for producing and articulating words and is influenced by Wernicke’s area. The entire procedure ran on standard laptop computers, after which the AI received written instructions in English.
What did the Results Show?
The researchers tested S-Bert alongside several other models, including STRUCTURENET, SIMPLENET, and GPTNET, on novel psychophysical tasks.
The results were impressive: the top-performing models were SBERTNET (L) and SBERTNET, which achieved average performance of 97% and 94%, respectively, on validation instructions, demonstrating that these networks can infer the correct semantic content even from completely new instructions.
This was a marked improvement over GPTNET, which fell short of the standard 95% performance criterion; the researchers had to lower the criterion to 85% for all GPT-based models.
For every task, S-Bert accurately deduced the shared semantic meaning across 15 different instruction formulations.
How is it an Improvement over Multimodal Models?
Some well-known machine learning models, like GPT, can take natural language as a prompt to perform linguistic tasks or render images, but their outputs are hard to interpret in terms of the sensorimotor mapping we would expect in a biological system.
Although the actions of more recent multimodal interactive agents may be easier to understand, these agents combine language and vision at an early stage of their perceptual hierarchy. That makes it challenging to map their actions onto the anatomically and functionally separate language and vision areas of the human brain.
BERT is a machine learning model trained to predict masked words within a text passage. It also uses an unsupervised sentence-level objective, in which the network is given two sentences and must decide whether they follow each other in the original text.
SBERT starts from a trained BERT model and is further tuned on the Stanford Natural Language Inference task, a hand-labelled dataset specifying the logical relationship between two candidate sentences.
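A quick way to see why this matters for instruction following: an SBERT-family encoder places paraphrases of the same task close together in embedding space. The checkpoint below is an off-the-shelf sentence-transformers model, not necessarily the one used in the study.

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

dm_1 = model.encode("respond to the stimulus with the highest contrast")
dm_2 = model.encode("go in the direction of the strongest stimulus")
anti = model.encode("respond to the stimulus with the lowest contrast")

# Paraphrases of DM should be more similar to each other than to AntiDM.
print(util.cos_sim(dm_1, dm_2), util.cos_sim(dm_1, anti))
```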
S-Bert not only improves results on novel psychophysical tasks; it also provides a means of achieving human-like communication between two AI models.
The AI system was able to understand and carry out commands, correctly completing novel, unseen tasks 83% of the time based only on verbal instructions.
When given tasks to learn, the system could produce descriptions in such a way that another AI could comprehend and perform the same tasks with a comparable success rate.
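The overall loop, in which one network describes a learned task in language and a partner network executes it from that description alone, can be sketched with toy stand-ins. Everything below (the instruction set, the hash-based embedder, the nearest-instruction policy) is a simplification for illustration, not the paper’s architecture.

```python
import numpy as np

# Hypothetical instruction set shared by both networks.
INSTRUCTIONS = {
    "DM": "respond in the direction of the strongest stimulus",
    "AntiDM": "respond in the direction of the weakest stimulus",
}

def embed(text):
    """Toy stand-in for a pretrained language model: hash words into a vector."""
    vec = np.zeros(64)
    for word in text.split():
        vec[hash(word) % 64] += 1.0
    return vec / np.linalg.norm(vec)

def execute(instruction_embedding, contrasts, directions):
    """Toy 'partner' policy: act on whichever known task the embedding matches."""
    sims = {t: instruction_embedding @ embed(s) for t, s in INSTRUCTIONS.items()}
    task = max(sims, key=sims.get)
    idx = np.argmax(contrasts) if task == "DM" else np.argmin(contrasts)
    return directions[idx]

# The "producer" network describes the task it has learned...
description = INSTRUCTIONS["AntiDM"]
# ...and the partner network carries it out from language alone.
angle = execute(embed(description), contrasts=[0.9, 0.4], directions=[0.79, 3.14])
print(f"partner responds toward {angle:.2f} rad")  # weakest stimulus -> 3.14
```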
Conclusion
This study lays a new foundation for AI models that can communicate with each other. It expands the models’ capacity to acquire new skills and interact through language, opening new avenues for robotics research and development. This is just a novel idea right now, but we can’t wait until it gets adopted by major tech companies to benefit developers worldwide!