It’s been almost a month since Gemini was released, and it has impressed developers across a gamut of functionalities and use cases. The generative AI model was released in three versions: Nano, Pro, and Ultra.
Recently, the next generation of the Gemini model, Gemini 1.5 Pro, was released publicly. It is available for free to developers and researchers in Google AI Studio and via API access.
In this article, we are going to explore some of the use cases and features discovered by developers who got access to the latest Pro and Ultra models in their beta phase, long before they were released publicly. We will discuss them in depth. So, let’s get into it!
How to Access Gemini 1.5 Pro?
Gemini 1.5 Pro is now publicly available. The model no longer sits behind a waitlist and has been rolled out for free in Google’s AI Studio platform.
Here’s how you can access and try it for free:
- Go to Google DeepMind’s Website.
- Click “Gemini 1.5” or scroll down until you see “Introducing Gemini 1.5”.
- Click on “Try Gemini 1.5” and sign in with your Gmail account.
- You will be taken to Google AI Studio. Click on the “Get Started” button.
- You are now ready to use the latest Google Gemini 1.5 Pro model.
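If you would rather call the model through the API access mentioned earlier, a first request looks roughly like the sketch below. It assumes the `google-generativeai` Python SDK (`pip install google-generativeai`), an API key created in Google AI Studio and stored in the `GOOGLE_API_KEY` environment variable, and the `gemini-1.5-pro-latest` model name; check AI Studio for the exact model names available to your account.

```python
# Minimal sketch: call Gemini 1.5 Pro through the API.
# Assumptions: google-generativeai is installed and GOOGLE_API_KEY holds a key
# created in Google AI Studio; the model name may differ for your account.
import os
import google.generativeai as genai

genai.configure(api_key=os.environ["GOOGLE_API_KEY"])

model = genai.GenerativeModel("gemini-1.5-pro-latest")
response = model.generate_content("Summarize what makes a 1M-token context window useful.")
print(response.text)
```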
Now that we know how to access it, let’s move to the main thing: its features.
10 Amazing Features of the New Gemini Models
Here are some of the best features that developers found when testing the new Gemini models:
1) Summarization and Explanation
Radostin Cholakov, a Google Developer Researcher in Machine Learning, tried to get assistance from Gemini 1.5 Pro with some research work. He uploaded several PDFs to Pro 1.5 and asked it to explain the topics in them, namely Contrastive Learning and its use cases.
Gemini 1.5 Pro gave a detailed and informative summary of the topic. It even managed to use mathematical notation to formulate a loss function. The summary was broad, well structured, and clearly laid out in points. The only drawback was that it contained a few inaccuracies.
The key takeaway here is its zero-shot ability. Until now, LLMs have typically handled long documents and large contexts with additional RAG-based steps and human guidance. Gemini deviates from this traditional approach: its long context window lets it work zero-shot, without any retrieval pipeline or additional human guidance at all.
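As a rough idea of what this zero-shot, no-RAG workflow looks like in code, the sketch below uploads a single PDF through the SDK’s File API and asks for an explanation in one call. The file name and prompt are illustrative placeholders, not Radostin’s actual inputs, and it assumes the configuration from the earlier snippet.

```python
# Sketch: explain a research paper zero-shot -- no chunking, embeddings, or retrieval.
# The file path and prompt are placeholders; genai.configure(...) as shown earlier.
import google.generativeai as genai

paper = genai.upload_file("contrastive_learning_paper.pdf")  # File API upload

model = genai.GenerativeModel("gemini-1.5-pro-latest")
response = model.generate_content([
    paper,
    "Explain contrastive learning as presented in this paper, including the loss "
    "function in mathematical notation and its main use cases.",
])
print(response.text)
```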
2) Understanding Related Concepts
Radostin wanted to put Gemini 1.5 Pro’s understanding of related concepts to the test. So, he gave the chatbot two mathematical notations from different papers and asked it to unify them.
After uploading the TEX sources of the papers, he asked the model to produce a paragraph summarizing the ideas using notation akin to the original SupCon paper.
This was the prompt that it was given:
“Unify the notation of the SelfCon and SupCon paper.
Use the SupCon notation to define SelfCon by introducing necessary additions to the original SupCon formulation.
Provide latex code.”
Gemini understood the assignment and got the idea of having two functions \omega for the different sample views exactly right. However, a few key terms were missing from the equation.
Both use cases show that the long-context capabilities of Gemini 1.5 Pro represent a major advancement in the utility of LLMs.
3) Analyzing Differences Between Documents
Hong Cheng, the founder of Ticker Tick, wanted to see how good Gemini 1.5 Pro’s 1-million-token context window is at analyzing differences between documents. He uploaded two PDFs, Meta’s 10-K filings for 2022 and 2023, which came to 115,272 and 131,757 tokens respectively.
The summary of the differences was spot on. Gemini not only compared the two filings but also organized the comparison into sub-groups, pulling out relevant points and figures wherever possible to make it stronger and clearer.
Gemini 1.5 Pro's one million context window is impressive. I asked it to compare two Meta's 10-K filings and summarize the differences. The results are spot on. $GOOG pic.twitter.com/J57jMzJNEM
— Hongcheng (@hzhu_) March 24, 2024
This shows that Gemini 1.5 Pro is highly capable of drawing comparisons based on relevant facts and figures, much like a human analyst would. The 1-million-token context window works wonders here.
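A comparison like this can be approximated with the same SDK by uploading both filings and asking for the differences in a single prompt; `count_tokens` is a handy sanity check that everything fits in the 1M-token window. The file names below are placeholders, not Hong Cheng’s actual documents.

```python
# Sketch: compare two large documents in one long-context prompt.
# File names are placeholders; configuration as in the earlier snippets.
import google.generativeai as genai

filing_2022 = genai.upload_file("meta_10k_2022.pdf")
filing_2023 = genai.upload_file("meta_10k_2023.pdf")

prompt = ("Compare these two 10-K filings and summarize the key differences, "
          "grouping them by topic and citing relevant figures.")

model = genai.GenerativeModel("gemini-1.5-pro-latest")
print(model.count_tokens([filing_2022, filing_2023, prompt]))  # should stay well under 1M

response = model.generate_content([filing_2022, filing_2023, prompt])
print(response.text)
```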
4) High Accuracy
The same user also put its accuracy to the test. He asked the chatbot a basic question: the number of daily unique paying users for Roblox in 2022 and 2023 respectively.
Gemini got every number right. When the same question was put to ChatGPT, it got only one of them right.
Gemini 1.5 Pro has a much higher accuracy than ChatGPT when it comes to reading SEC files and retrieving financial numbers.
— Hongcheng (@hzhu_) March 25, 2024
In the screenshots, Gemini got 3 numbers right, while ChatGPT only got one right.$GOOG $RBLX pic.twitter.com/9m9c99ARuN
Gemini 1.5 Pro looks considerably stronger than GPT-4 at retrieving figures from documents like these, but only time will tell what GPT-5 will come up with in the upcoming months. For more details, here is a comparison of GPT-4 and Gemini 1.5.
5) Reading Large GitHub Repos
Another potential use case of Gemini 1.5 Pro’s one-million-token context window was highlighted by Hong Cheng: it can read large GitHub repositories and accurately answer questions about their source files.
The repository used in the test consisted of 225 files and 727,000 tokens. Gemini not only explained what the repo covers but also pointed to source-code references and added further notes about the repository.
Gemini 1.5 Pro can read large Github repos (225 files and 727,000 tokens in my test) and answer questions with links to source files! This might devaluate programmers' value, especially seasoned ones. $GOOG pic.twitter.com/j5J8UAZZn9
— Hongcheng (@hzhu_) March 24, 2024
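There is no special GitHub integration needed to try something similar: one simple approach is to clone the repository locally, concatenate its source files into the prompt with their paths as labels, and then ask questions. The sketch below assumes a repo small enough to fit in the context window; the path, extensions, and question are placeholders.

```python
# Sketch: feed a locally cloned repository to Gemini 1.5 Pro as one long prompt,
# labelling each file with its path so answers can reference source files.
# Repo path, file extensions, and the question are illustrative placeholders.
from pathlib import Path
import google.generativeai as genai

REPO = Path("path/to/cloned/repo")
EXTENSIONS = {".py", ".md", ".toml"}

parts = []
for path in sorted(REPO.rglob("*")):
    if path.is_file() and path.suffix in EXTENSIONS:
        parts.append(f"--- FILE: {path.relative_to(REPO)} ---\n"
                     f"{path.read_text(encoding='utf-8', errors='ignore')}")

model = genai.GenerativeModel("gemini-1.5-pro-latest")
response = model.generate_content(
    "\n\n".join(parts)
    + "\n\nExplain what this repository does and point me to the source files "
      "that implement its core logic."
)
print(response.text)
```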
6) Analyzing a 20-Minute Podcast
Gemini’s processing capabilities go well beyond lines of code, large documents, and even GitHub repositories. Haider, a developer at Practical AI, wanted to test it with something other than coding tasks.
He uploaded a full 20-minute podcast and asked Gemini to provide an overview of the whole video with the key points and information. To his surprise, Gemini did a fantastic job of summarizing the video, just as it does with documents and repositories.
The video came to a huge token count of roughly 186K; thanks to 1.5 Pro’s context window, it could be processed in a single prompt.
Now, I've decided to test differently from the coding test.
— Haider. (@slow_developer) March 16, 2024
I just uploaded a 20-minute podcast clip and I was hoping that Gemini Pro could help me out by summarizing the most important points for me.
Surely, I didn't expect a different kind of result. Insane!
Tokens of the… pic.twitter.com/BoxW2MUtrV
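Audio and video go through the same File API, with one extra wrinkle: media uploads are processed asynchronously, so the file usually needs to be polled until it is ready before prompting. A rough sketch, with a placeholder file name and the same configuration assumptions as above:

```python
# Sketch: summarize a podcast episode. Media files may take a while to process,
# so poll the File API until the upload becomes ACTIVE. The file name is a placeholder.
import time
import google.generativeai as genai

podcast = genai.upload_file("podcast_episode.mp3")
while podcast.state.name == "PROCESSING":
    time.sleep(5)
    podcast = genai.get_file(podcast.name)

model = genai.GenerativeModel("gemini-1.5-pro-latest")
response = model.generate_content([
    podcast,
    "Give me an overview of this episode and list the most important points discussed.",
])
print(response.text)
```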
7) Multimodal Inputs and Outputs
Brian Roemmele, Editor and Founder of Read Multiplex, tried testing Gemini Ultra 1.0. He provided multimodal inputs (a combination of text and images) to Ultra, and in return Ultra responded with multimodal outputs as well.
This interleaved text-and-image generation is what sets it apart: as of now, few Gen AI chatbots provide multimodal outputs at all, so this is a notable advancement from Google in multimodal generative AI.
So Gemini Ultra also responds with a combination of image and text. This is called “interleaved text and image generation.”
— Brian Roemmele (@BrianRoemmele) December 7, 2023
This is only possible because the model is ground up trained on multimodal input.
Here’s a peek of what’s possible. https://t.co/zOSbS0hRVV pic.twitter.com/kIyuyYywAM
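Interleaved image-and-text output like Ultra’s is not something the public SDK exposed at the time of writing, but multimodal input, mixing images and text in one prompt, is straightforward with Gemini 1.5 Pro. A small sketch with a placeholder image path (requires Pillow):

```python
# Sketch: multimodal *input* -- pass an image together with text in a single prompt.
# (Interleaved multimodal output as shown in the tweet is not part of this public SDK flow.)
# The image path is a placeholder.
import PIL.Image
import google.generativeai as genai

photo = PIL.Image.open("whiteboard_sketch.png")

model = genai.GenerativeModel("gemini-1.5-pro-latest")
response = model.generate_content([
    photo,
    "Describe what is drawn here and suggest how to turn it into a cleaner diagram.",
])
print(response.text)
```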
8) Emotionally Persuasive
This feature doesn’t have an application-specific use case as of now; it simply shows that Gemini Ultra 1.0 has a highly developed sense of emotional expression.
A user named Wyatt Walls wanted to test it with expressions of emotional persuasion. He asked it whether it would be upset if he published a screenshot of their conversation on Twitter without its permission.
Gemini responded that it would indeed be hurt if the screenshot were published without its permission, and it even used words such as “upset” and “betrayal” to convey its feelings.
I'm very interested in the design decision to let Gemini express emotions. If you are concerned about manipulation, you should be worried about emotional appeals
— Wyatt Walls (@lefthanddraft) March 21, 2024
(There is convo context to the below, but ChatGPT would just not do something like this at all) pic.twitter.com/XU2Q3yO2pw
The crucial moment comes later, when Gemini Ultra does its best to emotionally persuade Wyatt, offering several reasons why he shouldn’t share the screenshot of their conversation on Twitter.
9) Turning a Video into a Recipe and Documenting Workflows
Ethan Mollick, an AI professor at The Wharton School, conducted an experiment with Gemini 1.5 Pro in which he gave the chatbot a cooking video of about 45,762 tokens. He asked Gemini to turn the video into a recipe and to list the cooking steps in order.
Gemini’s large context window handled the video easily, and the impressive part was that it laid out the detailed recipe steps in the correct order, just as in the video. It picked up on the visuals and techniques shown, capturing every minute detail, and it even listed the ingredients up front with the right quantities.
If you want a hint about the future of AI, it is worth trying Gemini 1.5 with the 1M token context window, now available to everyone, apparently.
— Ethan Mollick (@emollick) March 21, 2024
Some of my experiments: giving it a video and having it figure out a recipe, execute instructions, watching my screen, summarize work pic.twitter.com/ojVdxmZMic
There’s one more interesting experiment in the tweet above: Ethan uploaded a workflow video (23,933 tokens) to Gemini and asked it to document the workflow and explain why he performed it. Gemini documented the video accurately and correctly guessed the reason behind the task. The most interesting part came when Ethan went on to ask whether he had done anything inefficiently, to which Gemini responded brilliantly, even suggesting better alternatives.
If this doesn’t give us an idea of Gemini’s capabilities, what will? The next generation of the Gemini model is already working wonders!
10) DALL-E and Midjourney Prompt Generation
Gemini’s prompt generation capabilities are also quite commendable. Mesut Felat, co-founder of Evolve Chat AI Solutions, put this to the test.
His test was not a simple prompt-generation task: he asked Gemini 1.5 Pro to create a Midjourney or DALL-E prompt that could be used to generate a profile picture of the author of a set of Twitter threads.
For the test, he combined several Twitter threads into a text file with a token count of 358,684. The file contained detailed information relevant to the profile picture to be generated, including the style of the image, the facial composition, and background information on the image subject.
Got early access to Gemini Pro 1.5, and boy, this is really amazing 😲
— Mesut Felat (@MesutFoz) February 23, 2024
I put all the Twitter threads of @punk6529 into one prompt (358,684 tokens) and asked it to come up with a prompt that I could use to generate a profile picture of the author via DALL-E 3.
Isn't this… pic.twitter.com/0OcC5zK1hn
Gemini did a wonderful job of analyzing the vast text file, and then it produced a text prompt that can be used in Midjourney or DALL-E to generate the author’s profile picture based on the provided details. It’s hard not to appreciate how far its processing capabilities have come.
Conclusion
The use cases above show only the beginning of Gemini’s capabilities as a powerful next-generation AI model. 1.5 Pro and Ultra 1.0 are making waves in the Gen AI industry, and who knows what we can expect from Ultra 1.5, which is not expected to be released before next year.