Both Gemini 1.5 and GPT-4 have taken the world of Generative AI by storm with their latest updates. Developers across the world now want to tap their full potential and explore their use cases. In this blog, we compare the benchmarks of Gemini 1.5 and GPT-4 and see which tool is better suited for developers.
Gemini 1.5 Pro Has a Large Contextual Window
OpenAI changed the Generative AI landscape with the release of Sora, its text-to-video model. However, Google came right back into the picture with the release of Gemini 1.5, the latest upgrade to Gemini, the rebranded version of Bard.
Gemini 1.5 Pro comes with a context window of 1 million tokens, the larger of the two. It far surpasses GPT-4 Turbo's limit of 128k tokens.
With a context window larger than that of any of its predecessors, Gemini can take in and process far more information in a single prompt. That can range from 1 hour of video to as much as 11 hours of audio.
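To make the difference in context size concrete, here is a minimal sketch that checks whether a prompt would fit in each model's window. The window sizes are the figures quoted above. The model labels and the ratio of 4 characters per token are assumptions made for illustration, not either vendor's real tokenizer or API names.

```python
# Rough context-window check.
# Window sizes are the figures quoted in this post; the dictionary keys are
# illustrative labels, not necessarily official API model identifiers.
CONTEXT_WINDOWS = {
    "gemini-1.5-pro": 1_000_000,
    "gpt-4-turbo": 128_000,
}

# Assumption for illustration only: about 4 characters per token.
CHARS_PER_TOKEN = 4


def estimate_tokens(text: str) -> int:
    """Estimate the token count of `text` from its character length."""
    return -(-len(text) // CHARS_PER_TOKEN)  # ceiling division


def fits(text: str, model: str, reserved_for_output: int = 0) -> bool:
    """Return True if the estimated prompt size fits the model's window."""
    needed = estimate_tokens(text) + reserved_for_output
    return needed <= CONTEXT_WINDOWS[model]


if __name__ == "__main__":
    prompt = "x" * 2_000_000  # roughly 500k estimated tokens
    for model in CONTEXT_WINDOWS:
        print(model, fits(prompt, model))
```

For real workloads, the estimate should be replaced with the token count reported by each provider's own tokenizer, since the true ratio varies with language and content.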
In a recent blog post, the CEO of Google said:
This new generation also delivers a breakthrough in long-context understanding. We’ve been able to significantly increase the amount of information our models can process — running up to 1 million tokens consistently, achieving the longest context window of any large-scale foundation model yet.
-Sundar Pichai, CEO of Google and Alphabet
Therefore, developers looking to process vast sources of information in less time should consider Gemini 1.5 Pro. It lets developers work with different forms of input media such as text, images, video, and audio. They can also expect more relevant, consistent, and useful output than before.
Audio and Video Processing
Even after the release of Sora's text-to-video technology, Gemini 1.5 appears to have the stronger understanding of video and audio input.
It scores higher on video captioning and video question answering, two key tasks in generating content from video data.
It also surpasses GPT-4 in audio processing, doing better at understanding and translating spoken language. The data comes from Google's official technical report and a study by bito.ai.
Video Understanding:
Benchmark | Gemini 1.5 | GPT-4 Turbo | Description |
VATEX | 63% | 56% | English Video Captioning |
Perception Test MCQA | 56.2% | 46.3% | Video Question Answering |
Audio Processing:
Benchmark | Gemini 1.5 | GPT-4 Turbo | Description |
CoVoST 2 | 40.1% | 29.1% | Automatic Speech Translation |
FLEURS | 6.6% | 17.6% | Automatic Speech Recognition (word error rate, lower is better) |
If you are a developer looking to extract factual information from video and audio, Gemini seems the better option, at least for now.
General and Mathematical Reasoning
When it comes to comprehension of the subject matter, Gemini 1.5 slightly outperforms GPT-4 on multitask language understanding (MMLU) and multi-step reasoning (Big-Bench Hard), although the margins are small.
However, it falls behind GPT-4 on reading comprehension (DROP) and commonsense reasoning for everyday tasks (HellaSwag). In mathematics the results are split: GPT-4 is slightly ahead on grade-school problems (GSM8K), while Gemini 1.5 leads on the advanced problems in MATH.
General Reasoning and Comprehension:
Benchmark | Gemini 1.5 | GPT-4 Turbo | Description |
MMLU | 81.9% | 80.48% | Multitask Language Understanding |
Big-Bench Hard | 84% | 83.90% | Multi-Step Reasoning Task |
DROP | 78.9% | 83% | Reading Comprehension |
HellaSwag | 92.5% | 96% | Common Sense Reasoning for Everyday Tasks |
Mathematical Reasoning:
Benchmark | Gemini 1.5 | GPT-4 Turbo | Description |
GSM8K | 91.7% | 92.95% | Basic Arithmetic and Grade School Math Problems |
MATH | 58.5% | 54% | Advanced Math Problems |
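The reasoning and math tables above can be read programmatically. The following is a small sketch that copies the scores from those two tables and reports which model leads on each benchmark and by how many percentage points. The function and variable names are made up for this example.

```python
# Scores copied from the tables in this post, in percent:
# benchmark -> (Gemini 1.5, GPT-4 Turbo). Higher is better for all six.
SCORES = {
    "MMLU": (81.9, 80.48),
    "Big-Bench Hard": (84.0, 83.90),
    "DROP": (78.9, 83.0),
    "HellaSwag": (92.5, 96.0),
    "GSM8K": (91.7, 92.95),
    "MATH": (58.5, 54.0),
}


def leader(gemini: float, gpt4: float) -> str:
    """Name the model with the higher score, or 'tie' if they are equal."""
    if gemini == gpt4:
        return "tie"
    return "Gemini 1.5" if gemini > gpt4 else "GPT-4 Turbo"


def summarize(scores: dict) -> dict:
    """Map each benchmark to (leading model, margin in percentage points)."""
    return {
        name: (leader(g, o), round(abs(g - o), 2))
        for name, (g, o) in scores.items()
    }


if __name__ == "__main__":
    for name, (model, margin) in summarize(SCORES).items():
        print(f"{name}: {model} leads by {margin} points")
```

Several of the margins are well under one percentage point, so small differences here should not be over-interpreted.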
Take a look at this post from X, where a user can be seen asking a similar question to both Gemini and GPT:
Gemini vs GPT-4 settled for good. pic.twitter.com/hCzU7Uap92
— Emsi (@emsi_kil3r) February 16, 2024
Gemini gives a more detailed explanation, yet it makes a very silly mistake in describing its appearance. GPT's answer is shorter, but it presents the required information correctly.
Code Generation
Here comes the X factor that every developer is looking for.
On HumanEval, GPT-4 still outperforms Gemini 1.5 at generating correct code snippets, although the gap is small.
Every developer wants a tool that not only generates code but makes it optimal, robust, and, most importantly, accurate. On HumanEval, the older of the two benchmarks, GPT-4 has the edge for Python code generation, while Gemini 1.5 scores higher on the newer Natural2Code dataset.
Benchmark | Gemini 1.5 | GPT-4 Turbo | Description |
HumanEval | 71.9% | 73.17% | Python code generation |
Natural2Code | 77.7% | 75% | Python code generation, new dataset |
Yet Gemini can take in far more code at once thanks to its 1 million token context window. Gemini 1.5 Pro shows remarkable accuracy in retrieving information from large inputs, with a 100% recall rate for up to 530,000 tokens.
When the input grows to one million tokens, recall decreases marginally to 99.7%, and it still stays at an astoundingly high 99.2% for inputs of up to ten million tokens.
Now it's up to developers to decide which aspect of coding matters most to them. If you want clarity and accuracy in short snippets, go for GPT-4. If you instead need to work with larger codebases and longer blocks of code, the answer is Gemini 1.5.
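Recall figures like these are commonly measured with a needle-in-a-haystack style test: a short fact is hidden at different positions in a long input, and the model is asked to retrieve it. The sketch below shows the idea under stated assumptions. The filler text, the needle, and `toy_model` are invented for illustration, and any real model call would have to be supplied in place of the stand-in. It is not the exact protocol behind the numbers quoted above.

```python
from typing import Callable, List

# Invented filler and needle, used only for this illustration.
FILLER = "The quick brown fox jumps over the lazy dog. "
NEEDLE = "The secret code is 7421. "
ANSWER = "7421"


def build_haystack(n_sentences: int, depth: float) -> str:
    """Place NEEDLE at a relative depth (0.0 = start, 1.0 = end) in filler text."""
    sentences = [FILLER] * n_sentences
    sentences.insert(int(depth * n_sentences), NEEDLE)
    return "".join(sentences)


def recall_rate(
    ask_model: Callable[[str], str], n_sentences: int, depths: List[float]
) -> float:
    """Fraction of needle positions at which the model returns the answer."""
    hits = 0
    for depth in depths:
        prompt = build_haystack(n_sentences, depth) + "\nWhat is the secret code?"
        if ANSWER in ask_model(prompt):
            hits += 1
    return hits / len(depths)


def toy_model(prompt: str) -> str:
    """Stand-in for a real model: it only 'sees' the first 2,000 characters."""
    visible = prompt[:2000]
    return ANSWER if ANSWER in visible else "I don't know"


if __name__ == "__main__":
    depths = [0.0, 0.25, 0.5, 0.75, 1.0]
    print(recall_rate(toy_model, n_sentences=100, depths=depths))
```

The stand-in model deliberately has a tiny window, so it misses needles placed deep in the input. A model with a sufficiently large context window would be expected to score close to 1.0 on the same harness.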
Is Gemini 1.5 better than GPT-4?
Based on the above benchmarks, gathered from various sources, it's hard to say outright which tool is better. The answer depends largely on what users need and which capabilities they plan to use.
Gemini 1.5 is promising when you need to work with content across various modalities. It can handle images, text, video, and audio, which helps it build a more comprehensive and factual understanding of the subject matter.
However, GPT-4 is suited to other developer needs, such as code generation with accuracy, clarity, and robustness. And we must not forget that OpenAI's Sora still holds the formidable power of text-to-video generation, which not only developers but also firms and enterprises worldwide can't wait to get their hands on.
Conclusion
Both Gemini 1.5 and GPT-4 are excellent advancements in the world of Generative AI. Both tools are still limited to a few tech enthusiasts and enterprises, so we must be patient before forming a definitive opinion on which one performs better. As of now, they are quite impressive and are meeting users' demands in complementary ways.