{"id":2448,"date":"2024-03-15T08:50:01","date_gmt":"2024-03-15T08:50:01","guid":{"rendered":"https:\/\/favtutor.com\/articles\/?p=2448"},"modified":"2024-03-15T08:50:02","modified_gmt":"2024-03-15T08:50:02","slug":"google-sima-ai-gaming-agent","status":"publish","type":"post","link":"https:\/\/favtutor.com\/articles\/google-sima-ai-gaming-agent\/","title":{"rendered":"Google&#8217;s SIMA Can Play 3D Video Games But Like Humans"},"content":{"rendered":"\n<p class=\"wp-block-paragraph\">We know AI can play video games, but now it can play like humans and not just to beat high scores! Google\u2019s DeepMind team unveiled the world\u2019s first AI Gaming Agent called SIMA. So, let&#8217;s look further into its amazing features!<\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><strong>Highlights:<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Google\u2019s DeepMind unveils SIMA, short for Scalable Instructable Multiworld Agent.<\/li>\n\n\n\n<li>This AI is trained in various gaming environments with the help of natural language instructions from users.<\/li>\n\n\n\n<li>Tested on 9 games, with the help of 4 testing environments and several pre-trained models. <\/li>\n<\/ul>\n\n\n\n<h2 class=\"wp-block-heading\"><strong>What is Google&#8217;s SIMA?<\/strong><\/h2>\n\n\n\n<p class=\"wp-block-paragraph\"><strong>SIMA is the world\u2019s first AI gaming agent designed by Google and trained in several virtual gaming environments. It has been described as a basic generalist agent that acts as a general instructable game-playing agent.\u00a0<\/strong><\/p>\n\n\n\n<p class=\"wp-block-paragraph\">SIMA stands for Scalable Instructable Multiworld Agent. It can work with just natural-language instructions to play 3D video games, just like humans. <\/p>\n\n\n\n<p class=\"wp-block-paragraph\">The goal is not to win the game with the best score possible, which other AI can do by accessing the game&#8217;s source code. 
SIMA receives only the images on the screen and the user\u2019s instructions as input, and it plays the game using only keyboard and mouse controls.<\/p>\n\n\n\n<div align=\"center\"><blockquote class=\"twitter-tweet\"><p lang=\"en\" dir=\"ltr\">Introducing SIMA: the first generalist AI agent to follow natural-language instructions in a broad range of 3D virtual environments and video games. \ud83d\udd79\ufe0f<br><br>It can complete tasks similar to a human, and outperforms an agent trained in just one setting. \ud83e\uddf5 <a href=\"https:\/\/t.co\/qz3IxzUpto\" target=\"_blank\">https:\/\/t.co\/qz3IxzUpto<\/a> <a href=\"https:\/\/t.co\/02Q6AkW4uq\" target=\"_blank\">pic.twitter.com\/02Q6AkW4uq<\/a><\/p>&mdash; Google DeepMind (@GoogleDeepMind) <a href=\"https:\/\/twitter.com\/GoogleDeepMind\/status\/1767918515585994818?ref_src=twsrc%5Etfw\" target=\"_blank\" rel=\"noopener\">March 13, 2024<\/a><\/blockquote> <script async src=\"https:\/\/platform.twitter.com\/widgets.js\" charset=\"utf-8\"><\/script><\/div>\n\n\n\n<p class=\"wp-block-paragraph\">SIMA can follow instructions across a variety of game environments and settings and learn from them. 
It can even use these virtual environments as a reference to build on and expand what it knows.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">This update comes days after <a href=\"https:\/\/favtutor.com\/articles\/google-genie-ai-create-game-environments\/\">Google launched its Genie AI<\/a>, which can create entire playable virtual environments.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><strong>How was SIMA trained?<\/strong><\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">The agent was exposed to a range of virtual gaming environments through DeepMind\u2019s partnerships with game developers.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><strong>Google worked with eight different game companies to train and test SIMA on nine distinct video games, including Hello Games&#8217; No Man&#8217;s Sky, Tuxedo Labs&#8217; Teardown, Valheim, and Wobbly Life.<\/strong><\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Every game in SIMA&#8217;s library presents a new interactive world and a range of skills to learn, such as basic navigation and menu use, resource mining, spaceship piloting, and helmet crafting.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">In addition, four research environments were used to train SIMA. According to their <a href=\"https:\/\/storage.googleapis.com\/deepmind-media\/DeepMind.com\/Blog\/sima-generalist-ai-agent-for-3d-virtual-environments\/Scaling%20Instructable%20Agents%20Across%20Many%20Simulated%20Worlds.pdf\" target=\"_blank\" rel=\"noreferrer noopener nofollow\">research paper<\/a>, the team chose 3D embodied environments that offer a wide variety of open-ended interactions, since these are the kinds of environments in which rich and deep language interactions are possible. 
<\/p>\n\n\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-large\"><img decoding=\"async\" width=\"1024\" height=\"423\" src=\"https:\/\/favtutor.com\/articles\/wp-content\/uploads\/2024\/03\/SIMA-Research-Environments-1024x423.jpg\" alt=\"SIMA Research Environments\" class=\"wp-image-2449\" srcset=\"https:\/\/favtutor.com\/articles\/wp-content\/uploads\/2024\/03\/SIMA-Research-Environments-1024x423.jpg 1024w, https:\/\/favtutor.com\/articles\/wp-content\/uploads\/2024\/03\/SIMA-Research-Environments-300x124.jpg 300w, https:\/\/favtutor.com\/articles\/wp-content\/uploads\/2024\/03\/SIMA-Research-Environments-768x317.jpg 768w, https:\/\/favtutor.com\/articles\/wp-content\/uploads\/2024\/03\/SIMA-Research-Environments-750x310.jpg 750w, https:\/\/favtutor.com\/articles\/wp-content\/uploads\/2024\/03\/SIMA-Research-Environments-1140x471.jpg 1140w, https:\/\/favtutor.com\/articles\/wp-content\/uploads\/2024\/03\/SIMA-Research-Environments.jpg 1184w\" sizes=\"(max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n<\/div>\n\n\n<p class=\"wp-block-paragraph\">One of the chosen environments, Construction Lab, was built with Unity.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Construction Lab is a new research environment in which agents must build novel items and sculptures out of interlocking building blocks, including ramps to climb, bridges to cross, and dynamic contraptions. 
It focuses on cognitive skills such as object manipulation and an intuitive understanding of the physical world.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">The other three environments, namely Playhouse, ProcTHOR, and WorldLab, were used for graphical interactions, data collection, and physics simulation, respectively.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>Pre-Trained Models<\/strong><\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">In addition to components that were trained from scratch, SIMA\u2019s agent architecture includes several pre-trained models, such as Phenaki, a video prediction model, and SPARC, a model trained on fine-grained image-text alignment. It also includes a text encoder that processes the text-based input.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">By combining these pre-trained models with fine-tuning and from-scratch training, the agent can draw on internet-scale pretraining while still specializing to the environments and control tasks it encounters.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>Transformers<\/strong><\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">To build a state representation, SIMA&#8217;s agent uses trained-from-scratch transformers that combine the outputs of the various pre-trained vision components, the encoded language instruction, and a Transformer-XL that attends to prior memory states.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">The resulting state representation is fed to a policy network that produces keyboard-and-mouse actions in sequences of eight actions. The agent is trained by behavioural cloning, with an additional auxiliary objective of predicting goal completion.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>Classifier-Free Guidance<\/strong><\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">When the trained agent is run in an environment, Classifier-Free Guidance (CFG) is also used to strengthen the agent\u2019s language conditioning. 
CFG was first proposed to improve text conditioning in diffusion models, but it has also shown promise in language models and language-conditioned agents.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>SIMA\u2019s Workflow: How Does the Agent Operate?<\/strong><\/h3>\n\n\n\n<p class=\"wp-block-paragraph\"><strong>SIMA can perceive and understand a range of environments and then act to accomplish a given task. It comprises a video model that predicts what will happen next on screen and a model for precise image-language mapping.<\/strong><\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Google fine-tuned these models on training data from 3D settings.<\/p>\n\n\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-large\"><img decoding=\"async\" width=\"1024\" height=\"418\" src=\"https:\/\/favtutor.com\/articles\/wp-content\/uploads\/2024\/03\/Google-SIMA-workflow-1024x418.jpg\" alt=\"Google SIMA workflow\" class=\"wp-image-2450\" srcset=\"https:\/\/favtutor.com\/articles\/wp-content\/uploads\/2024\/03\/Google-SIMA-workflow-1024x418.jpg 1024w, https:\/\/favtutor.com\/articles\/wp-content\/uploads\/2024\/03\/Google-SIMA-workflow-300x122.jpg 300w, https:\/\/favtutor.com\/articles\/wp-content\/uploads\/2024\/03\/Google-SIMA-workflow-768x313.jpg 768w, https:\/\/favtutor.com\/articles\/wp-content\/uploads\/2024\/03\/Google-SIMA-workflow-750x306.jpg 750w, https:\/\/favtutor.com\/articles\/wp-content\/uploads\/2024\/03\/Google-SIMA-workflow-1140x465.jpg 1140w, https:\/\/favtutor.com\/articles\/wp-content\/uploads\/2024\/03\/Google-SIMA-workflow.jpg 1280w\" sizes=\"(max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n<\/div>\n\n\n<p class=\"wp-block-paragraph\">This setup is efficient when it comes to data collection. Unlike traditional game-playing AI models, SIMA does not need access to a game\u2019s source code or APIs. 
<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">It needs just two inputs: the images displayed on the screen and the user&#8217;s simple, natural-language instructions. To carry out these instructions, SIMA controls the game&#8217;s main character via keyboard and mouse outputs.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Because this simple interface is the same one humans use, SIMA can potentially interact with any virtual environment.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Further, the agent maps language instructions and visual observations to keyboard and mouse actions. Given suitable instructions, it breaks tasks down into simpler subtasks that can be reused in entirely new situations.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">After data is collected from user instructions, the agent is trained on it using the pre-trained models, environments, transformers, and CFG. This is what allows it to act intelligently and interactively across several environments.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>Performance Compared to Specialized Agents<\/strong><\/h3>\n\n\n\n<p class=\"wp-block-paragraph\"><strong>In Google DeepMind\u2019s evaluations, SIMA agents trained on a set of nine 3D games from its library considerably outperformed all specialized agents that were trained exclusively on a single one of those games.<\/strong><\/p>\n\n\n\n<p class=\"wp-block-paragraph\">In the test, DeepMind evaluated how well SIMA followed instructions to complete nearly 1,500 unique in-game tasks, in part using human judges. 
The performance of SIMA\u2019s environment-specialized agents served as the baseline for comparison against three types of generalist SIMA agents that were trained across multiple environments.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><strong>SIMA&#8217;s Research Update<\/strong><\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">Google\u2019s DeepMind said that SIMA is still in the development phase and needs more research before it can approach human-level performance. In the official announcement, they stated:<\/p>\n\n\n\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\">\n<p class=\"wp-block-paragraph\">&#8220;SIMA\u2019s results show the potential to develop a new wave of generalist, language-driven AI agents. This is early-stage research and we look forward to further building on SIMA across more training environments and incorporating more capable models.&#8221;<\/p>\n<\/blockquote>\n\n\n\n<p class=\"wp-block-paragraph\">DeepMind plans to expose the gaming agent to more training worlds and improve its abilities in the future. The broader aim is to build general AI agents that can carry out many different tasks.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><strong>Conclusion<\/strong><\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">SIMA is an exceptional advancement in the world of AI agents. The idea of training AI agents in video games has become a reality. This tool could soon put diverse virtual environments at developers\u2019 fingertips!<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Google&#8217;s new SIMA gaming agent can play video games with natural language instructions. 
Find out how SIMA&#8217;s AI works and how it was trained.<\/p>\n","protected":false},"author":15,"featured_media":2452,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"jnews-multi-image_gallery":[],"jnews_single_post":null,"jnews_primary_category":{"id":"","hide":""},"footnotes":""},"categories":[57],"tags":[56,108,58,107],"class_list":["post-2448","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-ai","tag-ai","tag-gaming","tag-google","tag-sima"],"_links":{"self":[{"href":"https:\/\/favtutor.com\/articles\/wp-json\/wp\/v2\/posts\/2448","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/favtutor.com\/articles\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/favtutor.com\/articles\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/favtutor.com\/articles\/wp-json\/wp\/v2\/users\/15"}],"replies":[{"embeddable":true,"href":"https:\/\/favtutor.com\/articles\/wp-json\/wp\/v2\/comments?post=2448"}],"version-history":[{"count":2,"href":"https:\/\/favtutor.com\/articles\/wp-json\/wp\/v2\/posts\/2448\/revisions"}],"predecessor-version":[{"id":2453,"href":"https:\/\/favtutor.com\/articles\/wp-json\/wp\/v2\/posts\/2448\/revisions\/2453"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/favtutor.com\/articles\/wp-json\/wp\/v2\/media\/2452"}],"wp:attachment":[{"href":"https:\/\/favtutor.com\/articles\/wp-json\/wp\/v2\/media?parent=2448"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/favtutor.com\/articles\/wp-json\/wp\/v2\/categories?post=2448"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/favtutor.com\/articles\/wp-json\/wp\/v2\/tags?post=2448"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}