{"id":3694,"date":"2024-04-15T08:25:50","date_gmt":"2024-04-15T08:25:50","guid":{"rendered":"https:\/\/favtutor.com\/articles\/?p=3694"},"modified":"2024-04-15T08:25:52","modified_gmt":"2024-04-15T08:25:52","slug":"grok-1-5-vision-use-cases","status":"publish","type":"post","link":"https:\/\/favtutor.com\/articles\/grok-1-5-vision-use-cases\/","title":{"rendered":"New Use Cases of Grok-1.5V: It Can Now Understand Images"},"content":{"rendered":"\n<p>Elon Musk-backed Grok can now process a wide variety of visual input data like images, documents, and diagrams! <\/p>\n\n\n\n<p><strong>Highlights:<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>xAI revealed Grok 1.5V, its first multimodal model, capable of processing a wide range of visual input.<\/li>\n\n\n\n<li>A standout feature is its ability to convert logical diagrams into executable code.<\/li>\n\n\n\n<li>Grok 1.5V outperforms its peers in the new RealWorldQA benchmark that measures real-world spatial understanding.<\/li>\n<\/ul>\n\n\n\n<h2 class=\"wp-block-heading\"><strong>The Capabilities of Grok 1.5V<\/strong><\/h2>\n\n\n\n<p><strong>Grok-1.5V is xAI&#8217;s first-generation multimodal model, which can process visual information in addition to text.<\/strong> It will be available to existing Grok users soon.<\/p>\n\n\n\n<p>This release comes mere weeks after xAI released its updated chatbot model, Grok 1.5, featuring enhanced reasoning capabilities and a 128,000-token context. The model showed advanced coding and math capabilities and was praised for its minimal censoring and its ability to answer controversial questions.<\/p>\n\n\n\n<p>Don&#8217;t forget that <a href=\"https:\/\/favtutor.com\/articles\/grok-ai-open-source\/\">Grok 1 is also open-source<\/a> now!<\/p>\n\n\n\n<p>Grok 1.5V outperforms GPT-4 in text reading, mathematics, and real-world question answering. 
The real-world QA capabilities might just be the most impressive feature Grok has displayed yet.<\/p>\n\n\n\n<p>In its blog post, the company described seven vision-based use cases for Grok, from generating Python code from a flowchart to counting calories based on the nutritional information on a package. Here is an overview of the possible uses of this model in different domains.<\/p>\n\n\n\n<p>Let\u2019s take a look at all the use cases!<\/p>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>1) Writing Code from Diagrams<\/strong><\/h3>\n\n\n\n<p>This capability is impactful because it lets anyone with a strong grasp of logic start programming without detailed knowledge of any particular language! <\/p>\n\n\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-large\"><img decoding=\"async\" width=\"1024\" height=\"518\" src=\"https:\/\/favtutor.com\/articles\/wp-content\/uploads\/2024\/04\/image4-5-1024x518.png\" alt=\"Code from diagram using Grok\" class=\"wp-image-3695\" srcset=\"https:\/\/favtutor.com\/articles\/wp-content\/uploads\/2024\/04\/image4-5-1024x518.png 1024w, https:\/\/favtutor.com\/articles\/wp-content\/uploads\/2024\/04\/image4-5-300x152.png 300w, https:\/\/favtutor.com\/articles\/wp-content\/uploads\/2024\/04\/image4-5-768x389.png 768w, https:\/\/favtutor.com\/articles\/wp-content\/uploads\/2024\/04\/image4-5-1536x777.png 1536w, https:\/\/favtutor.com\/articles\/wp-content\/uploads\/2024\/04\/image4-5-750x379.png 750w, https:\/\/favtutor.com\/articles\/wp-content\/uploads\/2024\/04\/image4-5-1140x577.png 1140w, https:\/\/favtutor.com\/articles\/wp-content\/uploads\/2024\/04\/image4-5.png 1591w\" sizes=\"(max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n<\/div>\n\n\n<p>It empowers people to start building immediately without having to learn the finer intricacies of code.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>2) Calculation and Real-World Understanding<\/strong><\/h3>\n\n\n\n<p>Far too often, AI models make errors in 
calculation even when all information is provided to them in text form. Here the model extracts information from the image and performs accurate mathematical calculations:<\/p>\n\n\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-large\"><img decoding=\"async\" width=\"1024\" height=\"516\" src=\"https:\/\/favtutor.com\/articles\/wp-content\/uploads\/2024\/04\/image3-4-1024x516.png\" alt=\"Real-world understanding using Grok 1.5V\" class=\"wp-image-3696\" srcset=\"https:\/\/favtutor.com\/articles\/wp-content\/uploads\/2024\/04\/image3-4-1024x516.png 1024w, https:\/\/favtutor.com\/articles\/wp-content\/uploads\/2024\/04\/image3-4-300x151.png 300w, https:\/\/favtutor.com\/articles\/wp-content\/uploads\/2024\/04\/image3-4-768x387.png 768w, https:\/\/favtutor.com\/articles\/wp-content\/uploads\/2024\/04\/image3-4-1536x773.png 1536w, https:\/\/favtutor.com\/articles\/wp-content\/uploads\/2024\/04\/image3-4-360x180.png 360w, https:\/\/favtutor.com\/articles\/wp-content\/uploads\/2024\/04\/image3-4-750x378.png 750w, https:\/\/favtutor.com\/articles\/wp-content\/uploads\/2024\/04\/image3-4-1140x574.png 1140w, https:\/\/favtutor.com\/articles\/wp-content\/uploads\/2024\/04\/image3-4.png 1587w\" sizes=\"(max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n<\/div>\n\n\n<p>We wonder whether it will be able to handle more complicated mathematics, such as percentages and taxes. If so, imagine how efficient doing your taxes could be: simply upload your records and ask the model to evaluate them!<\/p>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>3) Understanding Images<\/strong><\/h3>\n\n\n\n<p>The next three examples show the depth of image understanding demonstrated by Grok. 
From a simple child&#8217;s drawing, it identified the elements in the drawing and built a story around them:<\/p>\n\n\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-large\"><img decoding=\"async\" width=\"1024\" height=\"466\" src=\"https:\/\/favtutor.com\/articles\/wp-content\/uploads\/2024\/04\/image9-1-1024x466.png\" alt=\"Understanding Images using Grok 1.5 Vision\" class=\"wp-image-3697\" srcset=\"https:\/\/favtutor.com\/articles\/wp-content\/uploads\/2024\/04\/image9-1-1024x466.png 1024w, https:\/\/favtutor.com\/articles\/wp-content\/uploads\/2024\/04\/image9-1-300x137.png 300w, https:\/\/favtutor.com\/articles\/wp-content\/uploads\/2024\/04\/image9-1-768x350.png 768w, https:\/\/favtutor.com\/articles\/wp-content\/uploads\/2024\/04\/image9-1-1536x699.png 1536w, https:\/\/favtutor.com\/articles\/wp-content\/uploads\/2024\/04\/image9-1-750x341.png 750w, https:\/\/favtutor.com\/articles\/wp-content\/uploads\/2024\/04\/image9-1-1140x519.png 1140w, https:\/\/favtutor.com\/articles\/wp-content\/uploads\/2024\/04\/image9-1.png 1586w\" sizes=\"(max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n<\/div>\n\n\n<p>It even explained a meme in a pop culture context and analyzed an image of defective wood to diagnose the problem:<\/p>\n\n\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-large\"><img decoding=\"async\" width=\"1024\" height=\"302\" src=\"https:\/\/favtutor.com\/articles\/wp-content\/uploads\/2024\/04\/image5-3-1024x302.png\" alt=\"meme explanation\" class=\"wp-image-3698\" srcset=\"https:\/\/favtutor.com\/articles\/wp-content\/uploads\/2024\/04\/image5-3-1024x302.png 1024w, https:\/\/favtutor.com\/articles\/wp-content\/uploads\/2024\/04\/image5-3-300x88.png 300w, https:\/\/favtutor.com\/articles\/wp-content\/uploads\/2024\/04\/image5-3-768x227.png 768w, https:\/\/favtutor.com\/articles\/wp-content\/uploads\/2024\/04\/image5-3-1536x453.png 1536w, 
https:\/\/favtutor.com\/articles\/wp-content\/uploads\/2024\/04\/image5-3-750x221.png 750w, https:\/\/favtutor.com\/articles\/wp-content\/uploads\/2024\/04\/image5-3-1140x336.png 1140w, https:\/\/favtutor.com\/articles\/wp-content\/uploads\/2024\/04\/image5-3.png 1590w\" sizes=\"(max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n<\/div>\n\n\n<p>With fine-tuning, this kind of image understanding could find many uses in industries like healthcare.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>4) Data Extraction and Problem-Solving<\/strong><\/h3>\n\n\n\n<p>Grok can also extract data from images and convert it to a required format, such as a CSV file or a dataframe. It can also give detailed solutions to competitive coding problems, including the test cases:<\/p>\n\n\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-large\"><img decoding=\"async\" width=\"1024\" height=\"351\" src=\"https:\/\/favtutor.com\/articles\/wp-content\/uploads\/2024\/04\/image2-2-1024x351.png\" alt=\"Grok 1.5 Vision for Converting Table to CSV\" class=\"wp-image-3699\" srcset=\"https:\/\/favtutor.com\/articles\/wp-content\/uploads\/2024\/04\/image2-2-1024x351.png 1024w, https:\/\/favtutor.com\/articles\/wp-content\/uploads\/2024\/04\/image2-2-300x103.png 300w, https:\/\/favtutor.com\/articles\/wp-content\/uploads\/2024\/04\/image2-2-768x263.png 768w, https:\/\/favtutor.com\/articles\/wp-content\/uploads\/2024\/04\/image2-2-1536x527.png 1536w, https:\/\/favtutor.com\/articles\/wp-content\/uploads\/2024\/04\/image2-2-750x257.png 750w, https:\/\/favtutor.com\/articles\/wp-content\/uploads\/2024\/04\/image2-2-1140x391.png 1140w, https:\/\/favtutor.com\/articles\/wp-content\/uploads\/2024\/04\/image2-2.png 1586w\" sizes=\"(max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n<\/div>\n\n\n<p>This ability is very likely to be misused in coding competitions, which is a problem that platforms like LeetCode will have to 
solve as soon as possible:<\/p>\n\n\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-large\"><img decoding=\"async\" width=\"1024\" height=\"530\" src=\"https:\/\/favtutor.com\/articles\/wp-content\/uploads\/2024\/04\/image1-6-1024x530.png\" alt=\"Solving a Coding problem using Grok\" class=\"wp-image-3700\" srcset=\"https:\/\/favtutor.com\/articles\/wp-content\/uploads\/2024\/04\/image1-6-1024x530.png 1024w, https:\/\/favtutor.com\/articles\/wp-content\/uploads\/2024\/04\/image1-6-300x155.png 300w, https:\/\/favtutor.com\/articles\/wp-content\/uploads\/2024\/04\/image1-6-768x398.png 768w, https:\/\/favtutor.com\/articles\/wp-content\/uploads\/2024\/04\/image1-6-1536x795.png 1536w, https:\/\/favtutor.com\/articles\/wp-content\/uploads\/2024\/04\/image1-6-750x388.png 750w, https:\/\/favtutor.com\/articles\/wp-content\/uploads\/2024\/04\/image1-6-1140x590.png 1140w, https:\/\/favtutor.com\/articles\/wp-content\/uploads\/2024\/04\/image1-6.png 1588w\" sizes=\"(max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n<\/div>\n\n\n<p>Grok 1.5V\u2019s standout ability is its real-world understanding, as xAI explained in its <a href=\"https:\/\/x.ai\/blog\/grok-1.5v\" target=\"_blank\" rel=\"noreferrer noopener nofollow\">official announcement<\/a>:<\/p>\n\n\n\n<p>\u201cIn order to develop useful real-world AI assistants, it is crucial to advance a model&#8217;s understanding of the physical world. Toward this goal, we are introducing a new benchmark, RealWorldQA. This benchmark is designed to evaluate basic real-world spatial understanding capabilities of multimodal models. 
While many of the examples in the current benchmark are relatively easy for humans, they often pose a challenge for frontier models.\u201d<\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><strong>Conclusion<\/strong><\/h2>\n\n\n\n<p>The developers of Grok 1.5 Vision anticipate significant growth in multimodal capabilities across data types such as audio and video, along with documents and images. Its real-world understanding is a step in the right direction to eventually <a href=\"https:\/\/favtutor.com\/articles\/agi-elon-musk-experts-prediction\/\">achieve AGI<\/a>.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Grok 1.5 Vision preview unveils better visual processing, real-world understanding, and coding use cases with its first multimodal model.<\/p>\n","protected":false},"author":20,"featured_media":3703,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"jnews-multi-image_gallery":[],"jnews_single_post":null,"jnews_primary_category":{"id":"","hide":""},"footnotes":""},"categories":[57],"tags":[56,101,100,72,114],"class_list":["post-3694","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-ai","tag-ai","tag-elon-musk","tag-grok","tag-llm","tag-xai"],"_links":{"self":[{"href":"https:\/\/favtutor.com\/articles\/wp-json\/wp\/v2\/posts\/3694","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/favtutor.com\/articles\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/favtutor.com\/articles\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/favtutor.com\/articles\/wp-json\/wp\/v2\/users\/20"}],"replies":[{"embeddable":true,"href":"https:\/\/favtutor.com\/articles\/wp-json\/wp\/v2\/comments?post=3694"}],"version-history":[{"count":2,"href":"https:\/\/favtutor.com\/articles\/wp-json\/wp\/v2\/posts\/3694\/revisions"}],"predecessor-version":[{"id":3704,"href":"https:\/\/favtutor.com\/articles\/wp-json\/wp\/v2\/posts\/3694\/revisions
\/3704"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/favtutor.com\/articles\/wp-json\/wp\/v2\/media\/3703"}],"wp:attachment":[{"href":"https:\/\/favtutor.com\/articles\/wp-json\/wp\/v2\/media?parent=3694"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/favtutor.com\/articles\/wp-json\/wp\/v2\/categories?post=3694"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/favtutor.com\/articles\/wp-json\/wp\/v2\/tags?post=3694"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}