{"id":2433,"date":"2024-03-15T07:12:07","date_gmt":"2024-03-15T07:12:07","guid":{"rendered":"https:\/\/favtutor.com\/articles\/?p=2433"},"modified":"2024-03-15T07:59:55","modified_gmt":"2024-03-15T07:59:55","slug":"figure-robot-openai-demo","status":"publish","type":"post","link":"https:\/\/favtutor.com\/articles\/figure-robot-openai-demo\/","title":{"rendered":"OpenAI Makes Figure&#8217;s Robot Talk Like Human, Video Went Viral"},"content":{"rendered":"\n<p>Just days after Robotics company Figure partnered with OpenAI, they released a demo video where the robot can talk like a human. Netizens call it &#8216;ChatGPT with a Body&#8217; after it went viral. Find out more about Figure 01&#8217;s GPT advancements!<\/p>\n\n\n\n<p><strong>Highlights:<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Figure releases its latest demo showcasing its robot&#8217;s new speech reasoning capabilities.<\/li>\n\n\n\n<li>Their Humanoid Robot can automate several mundane tasks and naturally interact with humans.<\/li>\n\n\n\n<li>Combined with OpenAI\u2019s multimodal model, it unlocks tons of use cases that can have a massive impact in the future.<\/li>\n<\/ul>\n\n\n\n<h2 class=\"wp-block-heading\"><strong>Meet Figure 01 Robot, Now Powered by OpenAI<\/strong><\/h2>\n\n\n\n<p>Figure 01, which they call <em>\u201cThe world\u2019s first commercially viable autonomous humanoid robot\u201d<\/em>, was already a trending topic in the robotics space. But now after <a href=\"https:\/\/www.popsci.com\/technology\/openai-wants-to-make-a-walking-talking-humanoid-robot-smarter\/\" target=\"_blank\" rel=\"noreferrer noopener nofollow\">collaboration with OpenAI<\/a>, this robot can converse like humans.<\/p>\n\n\n\n<p><strong>Figure demonstrated in a status update video that their Figure 01 robot can make full conversations with people, powered by OpenAI&#8217;s visual knowledge and speech intelligence. 
<p>Watch the official video here: <a href="https://www.youtube.com/watch?v=Sq1QZB5baNw">Figure Status Update - OpenAI Speech-to-Speech Reasoning</a></p>

<p>In the video, the robot identifies the objects placed in front of it, answers questions, carries out requests (handing an apple to the person), and explains its reasoning, all while continuing with another task.</p>

<p>It is like giving ChatGPT a body: integrating the robot with OpenAI's vision-language model lets it engage in natural, human-like conversation and perform tasks autonomously, without manual intervention. The goal is to combine OpenAI's research with Figure's deep understanding of the underlying hardware and software for robotics.</p>

<p><strong>High-level visual and language intelligence, combined with Figure 01's underlying neural-network architecture, unlocks a range of possibilities.</strong> The tasks Figure 01 can handle range from answering basic questions about the environment it is operating in to explaining the exact reasons behind a particular action.</p>

<p>Some of the new capabilities (illustrated in the sketch below) are:</p>

<ul>
<li>Understand its surroundings</li>
<li>Use simple reasoning when needed</li>
<li>Eliminate ambiguity and translate high-level requests into actions</li>
<li>State the reasoning behind a particular action</li>
<li>Use conversational context to resolve pronouns like "they" and "them"</li>
<li>Identify the best response to an ambiguous query</li>
</ul>

<p>The robot itself is fully electric, stands 5 feet 6 inches tall, weighs 60 kilograms with a 20 kg payload, and runs for 5 hours on a charge.</p>
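<p>Figure has not disclosed exactly which model or API powers the robot, but OpenAI's public chat completions API already accepts interleaved images and text. Purely as an illustration, here is a minimal Python sketch of how a camera frame plus the running conversation could be sent to a vision-capable model; the model name, the image file, and the dialogue are assumptions, not details confirmed by Figure.</p>

<pre><code>
# Hypothetical illustration only: Figure has not disclosed which model or API
# it uses. This sketch calls OpenAI's public chat completions API with a
# vision-capable model to show how a camera frame plus conversation history
# could produce a spoken-style reply.
import base64
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def encode_image(path: str) -> str:
    """Base64-encode a camera frame so it can be sent inline."""
    with open(path, "rb") as f:
        return base64.b64encode(f.read()).decode("utf-8")

# Prior turns are what let the model resolve pronouns like "they" and "them".
history = [
    {"role": "user", "content": "What do you see on the table?"},
    {"role": "assistant", "content": "I see an apple, some dishes, and a cup."},
]

frame = encode_image("camera_frame.jpg")  # hypothetical camera capture
history.append({
    "role": "user",
    "content": [
        {"type": "text", "text": "Great, can you put them where they belong?"},
        {"type": "image_url",
         "image_url": {"url": f"data:image/jpeg;base64,{frame}"}},
    ],
})

reply = client.chat.completions.create(
    model="gpt-4-vision-preview",  # assumption: any vision-capable model works
    messages=history,
    max_tokens=300,
)
print(reply.choices[0].message.content)  # would then be fed to text-to-speech
</code></pre>

<p>In this sketch, the conversation history is what allows the model to resolve "them" to the dishes just mentioned, matching the pronoun-resolution capability in the list above.</p>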
<h3>The Motive behind Figure 01</h3>

<p>Figure says its robot can improve human productivity, address labor shortages, and reduce the number of workers employed in hazardous jobs. Here is what their CEO has to say:</p>

<blockquote>
<p>"Today, we are seeing unprecedented labor shortages. There are over 10 million unsafe or undesirable jobs in the U.S. alone, and an aging population will only make it increasingly difficult for companies to scale their workforces. As a result, the labor supply growth is set to flatline this century. If we want continued growth, we need more productivity — and this means more automation."</p>
<cite>Brett Adcock, Founder of Figure</cite>
</blockquote>

<p>This gives us an idea of what such companies envision for the future: robots equipped with the ability to think, learn, reason, and engage with their surroundings, and ultimately to surpass human performance.</p>

<h3>How Figure 01 Works With OpenAI</h3>

<p>Figure 01's architecture is built on neural networks that deliver fast, skilful robot actions. The steps used to process an input and generate the required output are as follows (a simplified code sketch of this loop appears below):</p>

<ol>
<li>The user queries the robot by speaking to it; the speech is captured by onboard microphones.</li>
<li>All behaviors are learned from data. There is no teleoperation involved, meaning Figure 01 does not rely on a human operator to execute actions.</li>
<li>Images captured by the robot's cameras, along with the transcribed text of the speech input, are fed to a large vision-language model (VLM) trained by OpenAI, which handles both images and text.</li>
<li>Figure's neural networks take in images at 10 Hz from the onboard cameras, and the policy network outputs actions for 24 degrees of freedom at 200 Hz.</li>
<li>The model decides which learned behavior to run to fulfil a given command. This includes loading the required neural-network weights onto the GPU and executing a policy appropriate to the context and input received.</li>
<li>The model conditions on the entire conversation history, including previous images.</li>
<li>Finally, it generates a language response, which is converted from text to speech and spoken back to the user.</li>
</ol>

<img src="https://favtutor.com/articles/wp-content/uploads/2024/03/How-Figure-01-works-with-OpenAI.jpg" alt="How Figure 01 works with OpenAI">

<p>Figure and OpenAI have integrated the motors, firmware, thermals, electronics, middleware, battery systems, and actuator sensors into a single system.</p>
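<p>Figure has not published code, but the two update rates above imply a familiar split: a slow, deliberative layer (the VLM choosing a behavior) wrapped around a fast control loop (the policy emitting joint actions). Below is a minimal Python sketch of that two-rate loop, using only the 10 Hz, 200 Hz, and 24-DOF figures from Figure's own description; the class names, the behavior-selection rule, and the robot interface are all hypothetical stand-ins.</p>

<pre><code>
# A minimal sketch of the two-rate loop described above. The 10 Hz / 200 Hz /
# 24-DOF numbers come from Figure's description; everything else is hypothetical.
import time

VISION_HZ = 10    # cameras feed the networks at 10 Hz
CONTROL_HZ = 200  # the policy emits 24-DOF actions at 200 Hz

class Policy:
    """Stand-in for a learned whole-body policy (weights loaded onto the GPU)."""
    def __init__(self, name):
        self.name = name

    def act(self, image):
        # Placeholder: a real policy maps the latest image (plus proprioception)
        # to one setpoint per degree of freedom.
        return [0.0] * 24

def select_behavior(transcript, image):
    """Stand-in for the slow, deliberative step: the VLM picking a behavior."""
    return Policy("hand_over_apple" if "apple" in transcript else "idle")

class FakeRobot:
    """Hypothetical hardware interface, only here to make the sketch runnable."""
    def capture_frame(self):
        return None

    def apply(self, action):
        pass

def run(robot, transcript, seconds=5):
    image = robot.capture_frame()
    policy = select_behavior(transcript, image)  # slow step, runs once
    steps_per_frame = CONTROL_HZ // VISION_HZ    # each frame is reused 20 times
    deadline = time.monotonic()
    for step in range(CONTROL_HZ * seconds):
        if step % steps_per_frame == 0:          # refresh vision at 10 Hz
            image = robot.capture_frame()
        robot.apply(policy.act(image))           # act at 200 Hz
        deadline += 1.0 / CONTROL_HZ
        time.sleep(max(0.0, deadline - time.monotonic()))

run(FakeRobot(), "please hand me the apple")
</code></pre>

<p>The design point the sketch captures is that vision refreshes 20 times more slowly than the action stream, so each camera frame is reused across consecutive control steps while the high-rate policy keeps the robot moving smoothly.</p>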
<h3>How Does It Differ from Tesla's Optimus?</h3>

<p>In Tesla's Optimus demo, a hand was visible in the bottom-right corner of the frame, guiding the robot as it folded laundry. This indicated that the robot could not perform the task independently and was being teleoperated, that is, controlled remotely by a human.</p>

<p>By contrast, Brett Adcock asserts that Figure 01's performance involves no such tricks. "The video is showing end-to-end neural networks. There is no teleop," he said in a tweet, emphasizing the genuine nature of Figure 01's interactions:</p>

<blockquote>
<p>"The video is showing end-to-end neural networks. There is no teleop. Also, this was filmed at 1.0x speed and shot continuously. As you can see from the video, there's been a dramatic speed-up of the robot; we are starting to approach human speed."</p>
<cite><a href="https://twitter.com/adcock_brett/status/1767914155183681673">Brett Adcock (@adcock_brett), March 13, 2024</a></cite>
</blockquote>

<p>Recently, we also got an update on <a href="https://favtutor.com/articles/covariant-rfm-1-llm/">RFM-1 by Covariant</a>, a model that aims to make robots reason like humans.</p>

<h2>Conclusion</h2>

<p>Figure 01 is a remarkable advancement, something few of us thought possible a few years ago. The partnership with OpenAI is a big step for both companies and for how their products could be used in the future. However, it also raises concerns about the job prospects of workers in the industries it would automate.</p>