Just days after robotics company Figure partnered with OpenAI, the company released a demo video in which its robot talks like a human. After the video went viral, netizens dubbed it ‘ChatGPT with a body’. Find out more about Figure 01’s GPT-powered advancements!
Highlights:
- Figure releases its latest demo showcasing its robot’s new speech and reasoning capabilities.
- Its humanoid robot can automate several mundane tasks and interact naturally with humans.
- Combined with OpenAI’s multimodal model, it unlocks a wide range of use cases that could have a massive impact in the future.
Meet Figure 01 Robot, Now Powered by OpenAI
Figure 01, which the company calls “the world’s first commercially viable autonomous humanoid robot”, was already a trending topic in the robotics space. Now, after the collaboration with OpenAI, the robot can converse like a human.
In a status-update video, Figure demonstrated that Figure 01 can hold full conversations with people, powered by OpenAI’s visual knowledge and speech intelligence.
Watch the official video here:
In the video, the robot identifies the objects placed in front of it, answers questions, carries out the task asked of it (handing an apple to the person), and explains its reasoning, all while continuing with another task.
Integrating the robot with OpenAI’s vision-language model is like giving ChatGPT a body: it lets the robot engage in natural, human-like conversations and perform tasks autonomously, without manual intervention. The goal is to combine OpenAI’s research with Figure’s deep understanding of the underlying hardware and software for robotics.
The high-level visual and language intelligence, combined with Figure 01’s underlying neural networks, unlocks a range of possibilities, from answering basic questions about the environment it is operating in to explaining the exact reasoning behind a particular action.
Some of the new capabilities are:
- Understand its surroundings
- Apply simple reasoning when needed
- Resolve ambiguity and translate high-level requests into actions
- Explain the reasoning behind a particular task it has performed
- Use conversational context to understand pronouns like “they” and “them”
- Choose the best response to an ambiguous query
The final robot is fully electric, stands 5 feet 6 inches tall, weighs 60 kilograms, carries a 20 kg payload, and runs for 5 hours on a charge.
The Motive behind Figure 01
Figure says its robot can improve human productivity, address labor shortages, and reduce the number of workers employed in risky jobs. Here is what its CEO has to say:
“Today, we are seeing unprecedented labor shortages. There are over 10 million unsafe or undesirable jobs in the U.S. alone, and an aging population will only make it increasingly difficult for companies to scale their workforces. As a result, the labor supply growth is set to flatline this century. If we want continued growth, we need more productivity — and this means more automation.”
Brett Adcock, Founder of Figure
This gives us an idea of what the company is aiming for: robots that can think, learn, reason, and engage with their surroundings, and ultimately surpass humans in terms of performance.
How Does Figure 01 Work With OpenAI?
Figure 01’s architecture is built around neural networks that deliver fast, skilful robot actions. The steps used to process the input and generate the required output are as follows (a rough code sketch of the loop appears after the list):
- The user speaks a query to the robot, providing the input.
- All of the robot’s behaviours are learned from data. There is no teleoperation involved, which means Figure 01 does not rely on human control to execute actions.
- Images captured by the robot’s cameras, along with text transcribed from the speech picked up by its microphones, are fed to a large multimodal vision-language model (VLM) trained by OpenAI, which handles both images and text.
- Figure’s neural networks take in camera images at 10 Hz and output 24-DOF (degrees of freedom) actions at 200 Hz.
- The VLM decides which learned behaviour to run on the robot to fulfil a given command. This includes loading the required neural network weights onto the GPU and executing the corresponding policy based on the context and input received.
- The model also reviews the entire conversation history, including previous images.
- Finally, it generates a language response, which is converted from text to speech and spoken back to the user.
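To make the flow above concrete, here is a minimal Python sketch of what such a perception-speech-action loop could look like. Everything in it (the `robot`, `vlm`, `policies`, and `tts` objects and their methods) is a hypothetical placeholder rather than Figure’s or OpenAI’s actual API; only the 10 Hz / 200 Hz / 24-DOF figures come from the description above.

```python
# Hypothetical sketch of a VLM-driven robot loop, following the steps above.
# None of these classes, functions, or attribute names are Figure's or
# OpenAI's real APIs; they are placeholders for illustration only.

import time
from dataclasses import dataclass, field


@dataclass
class Conversation:
    """Rolling history of text and images fed back to the vision-language model."""
    turns: list = field(default_factory=list)

    def add(self, role: str, content: dict) -> None:
        self.turns.append({"role": role, "content": content})


def handle_user_utterance(robot, vlm, policies, tts, history: Conversation) -> None:
    # 1. Capture speech through the microphones and transcribe it to text.
    audio = robot.record_audio()
    user_text = robot.transcribe(audio)

    # 2. Grab the latest camera frame (frames arrive at roughly 10 Hz).
    image = robot.latest_camera_frame()

    # 3. Feed the whole conversation history, plus the new image and text,
    #    to the multimodal VLM, which returns a reply and a behaviour to run.
    history.add("user", {"text": user_text, "image": image})
    decision = vlm.reason(history.turns)

    # 4. Load the chosen behaviour's weights onto the GPU and run it as a
    #    fast closed-loop policy: ~10 Hz visual input, ~200 Hz 24-DOF actions.
    policy = policies.load_onto_gpu(decision.behaviour)
    while not policy.done():
        frame = robot.latest_camera_frame()
        action = policy.act(frame)         # 24-DOF action vector
        robot.apply_action(action)
        time.sleep(1 / 200)                # ~200 Hz control loop

    # 5. Speak the language response back to the user via text-to-speech.
    history.add("assistant", {"text": decision.reply})
    tts.speak(decision.reply)
```

The design point this sketch tries to capture is the split Figure describes: a slower, deliberative vision-language model decides what to do and what to say, while fast, learned low-level policies actually move the body.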
On the hardware side, Figure has integrated the motors, firmware, thermals, electronics, middleware, battery systems, and actuator sensors that make this possible.
How Does It Differ from Tesla’s Optimus?
In Tesla’s Optimus laundry-folding demo, a hand was visible in the bottom-right corner of the frame, guiding the robot as it folded the laundry. This showed that the robot could not perform the operation independently and was being teleoperated, a method of remote manipulation.
By contrast, Brett Adcock asserts that Figure’s demo involves no such tricks. “The video is showing end-to-end neural networks. There is no teleop,” he said in a tweet, emphasizing the genuine nature of Figure 01’s interactions:
“The video is showing end-to-end neural networks. There is no teleop. Also, this was filmed at 1.0x speed and shot continuously. As you can see from the video, there’s been a dramatic speed-up of the robot; we are starting to approach human speed.”
— Brett Adcock (@adcock_brett) March 13, 2024
Recently, we also got an update on Covariant’s RFM-1, a robotics foundation model designed to help robots think like humans.
Conclusion
Figure 01 is a remarkable advancement, something few of us thought possible a few years ago. The partnership with OpenAI is a big step for both companies and for how their products can be useful in the future. However, it also threatens the job prospects of many workers in the industries it targets.