{"id":2006,"date":"2024-02-28T05:52:14","date_gmt":"2024-02-28T05:52:14","guid":{"rendered":"https:\/\/favtutor.com\/articles\/?p=2006"},"modified":"2024-02-28T05:53:46","modified_gmt":"2024-02-28T05:53:46","slug":"google-genie-ai-create-game-environments","status":"publish","type":"post","link":"https:\/\/favtutor.com\/articles\/google-genie-ai-create-game-environments\/","title":{"rendered":"Create An Entire Playable Game World with Google&#8217;s Genie AI"},"content":{"rendered":"\n<p class=\"wp-block-paragraph\">Google is again changing the AI landscape with &#8216;Genie&#8217;, an AI model that lets you create 2D game environments very easily. Let&#8217;s find out more about Google&#8217;s Genie AI and take a closer look at the technology behind it!<\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><strong>Highlights:<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Google announces Genie, an AI model that generates 2D video game worlds.<\/li>\n\n\n\n<li>It accepts a range of inputs, from a single text prompt to synthetic images and even sketches.<\/li>\n\n\n\n<li>It is a foundation model and paves the way for various generalist agents and virtual worlds.<\/li>\n<\/ul>\n\n\n\n<h2 class=\"wp-block-heading\"><strong>What Is Genie AI and How Does It Work?<\/strong><\/h2>\n\n\n\n<p class=\"wp-block-paragraph\"><strong>Google has announced Genie, an AI model that creates 2D video game environments from a single text or image prompt.<\/strong> Genie stands for &#8220;Generative Interactive Environments&#8221;.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">According to Google, Genie is a foundation model trained mostly on Internet videos that can generate a variety of playable video games. The generated games can span a range of genres, depending on the input provided. 
You can prompt it with anything from a single text prompt to synthetic images and hand-drawn sketches.\u00a0<\/p>\n\n\n\n<div align=\"center\"><blockquote class=\"twitter-tweet\"><p lang=\"en\" dir=\"ltr\">I am really excited to reveal what <a href=\"https:\/\/twitter.com\/GoogleDeepMind?ref_src=twsrc%5Etfw\" target=\"_blank\" rel=\"noopener\">@GoogleDeepMind<\/a>&#39;s Open Endedness Team has been up to \ud83d\ude80. We introduce Genie \ud83e\uddde, a foundation world model trained exclusively from Internet videos that can generate an endless variety of action-controllable 2D worlds given image prompts. <a href=\"https:\/\/t.co\/TnQ8uv81wc\" target=\"_blank\">pic.twitter.com\/TnQ8uv81wc<\/a><\/p>&mdash; Tim Rockt\u00e4schel (@_rockt) <a href=\"https:\/\/twitter.com\/_rockt\/status\/1762026090262872161?ref_src=twsrc%5Etfw\" target=\"_blank\" rel=\"noopener\">February 26, 2024<\/a><\/blockquote> <script async src=\"https:\/\/platform.twitter.com\/widgets.js\" charset=\"utf-8\"><\/script><\/div>\n\n\n\n<p class=\"wp-block-paragraph\">The gap between generative AI and human-made content continues to narrow, first with <a href=\"https:\/\/favtutor.com\/articles\/sora-ai-video-generator-openai\/\">OpenAI\u2019s Sora\u2019s text-to-video technology<\/a> and now with Genie AI\u2019s video game generation technology.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">According to <a href=\"https:\/\/arxiv.org\/abs\/2402.15391\" target=\"_blank\" rel=\"noopener\">Google&#8217;s technical paper<\/a>, Genie has 11B parameters and can be considered a foundation model. It was trained on a large dataset of more than 200,000 hours of publicly available online videos. Having learned from these videos, Genie generates 2D game worlds frame by frame rather than relying on an actual game engine.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Genie is trained without any action labels. Its ability to learn fine-grained controls from unlabeled online videos sets it apart. 
This is a hard problem because Internet videos usually lack labels indicating which action is being performed, or even which part of the image is controllable.\u00a0<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Notably, Genie not only learns which parts of an observation are generally controllable but also infers a variety of latent actions that behave consistently across the generated environments.\u00a0<\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><strong>What Can Genie AI Do?<\/strong><\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">Genie can create 2D video game environments from several kinds of input. Here we look at the input types it can turn into playable worlds.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>Images to Video Games<\/strong><\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">Genie can take a single image as input and produce a new interactive game environment from it. The team uses Imagen2, a state-of-the-art text-to-image model, to generate starting frames, which Genie then brings to life.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Below are some of Genie\u2019s generated video games from static images:<\/p>\n\n\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-large\"><img decoding=\"async\" width=\"1024\" height=\"422\" src=\"https:\/\/favtutor.com\/articles\/wp-content\/uploads\/2024\/02\/Image-to-Game-using-Genie-AI-1024x422.jpg\" alt=\"Image to Game using Genie AI\" class=\"wp-image-2007\" srcset=\"https:\/\/favtutor.com\/articles\/wp-content\/uploads\/2024\/02\/Image-to-Game-using-Genie-AI-1024x422.jpg 1024w, https:\/\/favtutor.com\/articles\/wp-content\/uploads\/2024\/02\/Image-to-Game-using-Genie-AI-300x124.jpg 300w, https:\/\/favtutor.com\/articles\/wp-content\/uploads\/2024\/02\/Image-to-Game-using-Genie-AI-768x317.jpg 768w, https:\/\/favtutor.com\/articles\/wp-content\/uploads\/2024\/02\/Image-to-Game-using-Genie-AI-750x309.jpg 750w, 
https:\/\/favtutor.com\/articles\/wp-content\/uploads\/2024\/02\/Image-to-Game-using-Genie-AI-1140x470.jpg 1140w, https:\/\/favtutor.com\/articles\/wp-content\/uploads\/2024\/02\/Image-to-Game-using-Genie-AI.jpg 1200w\" sizes=\"(max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n<\/div>\n\n\n<h3 class=\"wp-block-heading\"><strong>Sketches to Video Games<\/strong><\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">Ever imagined your sketches coming to life, frame by frame, as video games? Genie can do that!&nbsp;<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Intriguingly, Genie\u2019s abilities extend to human-made input images: it can even generate video games from hand-drawn sketches.&nbsp;<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Below are some of the video games generated from sketch inputs:<\/p>\n\n\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-large\"><img decoding=\"async\" width=\"1024\" height=\"405\" src=\"https:\/\/favtutor.com\/articles\/wp-content\/uploads\/2024\/02\/Skteches-to-Games-using-Genie-AI-1024x405.jpg\" alt=\"Sketches to Games using Genie AI\" class=\"wp-image-2008\" srcset=\"https:\/\/favtutor.com\/articles\/wp-content\/uploads\/2024\/02\/Skteches-to-Games-using-Genie-AI-1024x405.jpg 1024w, https:\/\/favtutor.com\/articles\/wp-content\/uploads\/2024\/02\/Skteches-to-Games-using-Genie-AI-300x119.jpg 300w, https:\/\/favtutor.com\/articles\/wp-content\/uploads\/2024\/02\/Skteches-to-Games-using-Genie-AI-768x304.jpg 768w, https:\/\/favtutor.com\/articles\/wp-content\/uploads\/2024\/02\/Skteches-to-Games-using-Genie-AI-750x297.jpg 750w, https:\/\/favtutor.com\/articles\/wp-content\/uploads\/2024\/02\/Skteches-to-Games-using-Genie-AI-1140x451.jpg 1140w, https:\/\/favtutor.com\/articles\/wp-content\/uploads\/2024\/02\/Skteches-to-Games-using-Genie-AI.jpg 1200w\" sizes=\"(max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n<\/div>\n\n\n<h3 class=\"wp-block-heading\"><strong>Real-World Images to Video 
Games<\/strong><\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">Most impressively, Genie can turn real-world images into playable 2D worlds. Although the output is still 2D, it captures plausible real-world motion and action sequences frame by frame.&nbsp;<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Below are some of the video games generated from real-world image inputs:<\/p>\n\n\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-large\"><img decoding=\"async\" width=\"1024\" height=\"394\" src=\"https:\/\/favtutor.com\/articles\/wp-content\/uploads\/2024\/02\/Real-World-Images-using-Genie-AI-1024x394.jpg\" alt=\"Real World Images using Genie AI\" class=\"wp-image-2009\" srcset=\"https:\/\/favtutor.com\/articles\/wp-content\/uploads\/2024\/02\/Real-World-Images-using-Genie-AI-1024x394.jpg 1024w, https:\/\/favtutor.com\/articles\/wp-content\/uploads\/2024\/02\/Real-World-Images-using-Genie-AI-300x116.jpg 300w, https:\/\/favtutor.com\/articles\/wp-content\/uploads\/2024\/02\/Real-World-Images-using-Genie-AI-768x296.jpg 768w, https:\/\/favtutor.com\/articles\/wp-content\/uploads\/2024\/02\/Real-World-Images-using-Genie-AI-750x289.jpg 750w, https:\/\/favtutor.com\/articles\/wp-content\/uploads\/2024\/02\/Real-World-Images-using-Genie-AI-1140x439.jpg 1140w, https:\/\/favtutor.com\/articles\/wp-content\/uploads\/2024\/02\/Real-World-Images-using-Genie-AI.jpg 1200w\" sizes=\"(max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n<\/div>\n\n\n<h2 class=\"wp-block-heading\"><strong>Looking into Genie\u2019s Architecture<\/strong><\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">The build is quite intricate. <\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><strong>Genie\u2019s architecture is mainly based on a Vision Transformer. 
To cope with the quadratic memory cost that transformers incur on video, the Google DeepMind team adopted a memory-efficient ST-transformer (spatiotemporal transformer) architecture.<\/strong> Using it across all model components balances model capacity against computational constraints.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">An ST-transformer differs from a traditional transformer, in which every token attends to every other token. It consists of \ud835\udc3f spatiotemporal blocks, each containing interleaved spatial and temporal attention layers followed by a feed-forward layer (FFW), as in a conventional attention block. This design is central to Genie\u2019s frame-by-frame generation of 2D game worlds.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Looking closer, the model consists of three main components:<\/p>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>1) Video Tokenizer<\/strong><\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">This part serves as a fundamental building block for video game generation. The video tokenizer compresses raw video frames into compact, discrete units called tokens. These tokens are the pieces from which the output 2D video is generated.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>2) Latent Action Model<\/strong><\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">This component infers what changes between one video frame and the next. Those changes can be anything from sprinting and jumping to interacting with the game&#8217;s elements.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">These are termed latent actions, and they are inferred by an encoder. It takes all previous video frames together with the next frame as input and outputs a corresponding set of latent actions. 
A decoder then predicts the next frame from all of the previous frames and the latent actions.&nbsp;<\/p>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>3) Dynamics Model<\/strong><\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">The dynamics model is a decoder-only MaskGIT transformer. At each time step, it predicts the next frame&#8217;s tokens from the tokenized video and the latent actions. Google again uses an ST-transformer here, whose causal structure lets the model take the tokens and latent actions from all preceding frames as input.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">It produces predictions for all the next frames. The model is trained with a cross-entropy loss between the predicted and ground-truth tokens.&nbsp;<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">In short, it generates the next frame from the current state of the game world and the player&#8217;s action. Repeating this prediction step over and over produces an interactive, playable experience.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><strong>Are there Any Limitations?<\/strong><\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">Genie is still in development and has some notable limitations. These are its limitations as of now:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Limited visual quality:<\/strong> Genie currently generates only about one frame per second, which hurts smoothness and overall visual fidelity.<\/li>\n\n\n\n<li><strong>Access restricted to researchers:<\/strong> It is still a Google DeepMind research project and is not currently accessible to the general public.<\/li>\n\n\n\n<li><strong>Ethical considerations:<\/strong> As with any powerful technology, the potential for misuse requires careful thought. 
Google is working on the ethical aspects to ensure responsible development and deployment.<\/li>\n<\/ul>\n\n\n\n<h2 class=\"wp-block-heading\"><strong>The Future of Generalist Agents and Generative AI<\/strong><\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">Genie holds vast potential for training generalist agents. Game environments are a useful testbed for developing AI agents, but the number of readily available games often limits how far this approach can go. With Genie, the AI agents of the future could be trained on an endless curriculum of newly generated worlds.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Google also stated that Genie is a general method that can be applied to several domains without requiring any additional domain knowledge.<\/p>\n\n\n\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\">\n<p class=\"wp-block-paragraph\"><em>We trained a smaller 2.5B model on action-free videos from\u00a0RT1. As was the case for Platformers, trajectories with the same latent action sequence typically display similar behaviors. This indicates Genie can learn a consistent action space which may be amenable to training embodied generalist agents.<\/em><\/p>\n<cite>via <a href=\"https:\/\/sites.google.com\/view\/genie-2024\" target=\"_blank\" rel=\"noopener\">Google\u2019s announcement blog<\/a><\/cite><\/blockquote>\n\n\n\n<p class=\"wp-block-paragraph\">Once it is released, Genie is expected to open up new creative possibilities in a variety of fields. 
Its capacity to create interactive worlds with little input could pave the way for innovative developments in education and entertainment.\u00a0<\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><strong>Conclusion<\/strong><\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">Genie\u2019s announcement has caused a stir in the world of generative AI, and people can\u2019t wait to get their hands on the latest video game generation technology. However, we must not forget that this is just an early research phase, and Genie may yet run into many more limitations.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Now you can create complete 2D video game environments using the Genie AI model. Find out about its architecture and how it works.<\/p>\n","protected":false},"author":15,"featured_media":2011,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"jnews-multi-image_gallery":[],"jnews_single_post":null,"jnews_primary_category":{"id":"","hide":""},"footnotes":""},"categories":[57],"tags":[56,59,77,58],"class_list":["post-2006","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-ai","tag-ai","tag-generative-ai","tag-genie-ai","tag-google"],"_links":{"self":[{"href":"https:\/\/favtutor.com\/articles\/wp-json\/wp\/v2\/posts\/2006","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/favtutor.com\/articles\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/favtutor.com\/articles\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/favtutor.com\/articles\/wp-json\/wp\/v2\/users\/15"}],"replies":[{"embeddable":true,"href":"https:\/\/favtutor.com\/articles\/wp-json\/wp\/v2\/comments?post=2006"}],"version-history":[{"count":2,"href":"https:\/\/favtutor.com\/articles\/wp-json\/wp\/v2\/posts\/2006\/revisions"}],"predecessor-version":[{"id":2012,"href":"https:\/\/favtutor.com\/articles\/wp-json\/wp\/v2\/posts\/2006\/revisions\/20
12"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/favtutor.com\/articles\/wp-json\/wp\/v2\/media\/2011"}],"wp:attachment":[{"href":"https:\/\/favtutor.com\/articles\/wp-json\/wp\/v2\/media?parent=2006"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/favtutor.com\/articles\/wp-json\/wp\/v2\/categories?post=2006"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/favtutor.com\/articles\/wp-json\/wp\/v2\/tags?post=2006"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}