A mystery model has been rising through the ranks of the LMSYS Chatbot Arena. The model is called “gpt2-chatbot”. Despite the name, OpenAI has yet to step forward and claim the model as its own. So why has it become the talk of the town? Let’s find out!
Everything we know about gpt2-chatbot
The gpt2-chatbot model first appeared on the LMSYS arena, a platform to test, compare, and rank LLMs on their capabilities. The model performs very well on mathematical and logical puzzles, coding, and reasoning.
It is available for chatting in “Direct Chat” mode, and also in “Arena (Battle)” mode, which is the (initially) blinded version used for benchmarking.
There is no information to be found on it anywhere on the site, or elsewhere. It has become a “Mystery Model”. The results generated by LMSYS benchmarks are available via their API for all models – except for this one.
It immediately took the AI community by storm. AI experts began hyping up its performance, with some touting it as far better than GPT-4 or Claude 3 Opus. Overall, the model’s reported performance is impressive, with early testers saying it surpasses GPT-4 in many areas.
Update: Just a few hours after the mystery gpt2-chatbot gained momentum on LMSYS, the platform took the model down due to “unexpectedly high traffic”. LMSYS announced this in a post on X late last night.
Thanks for the incredible enthusiasm from our community! We really didn't see this coming.
Just a couple of things to clear up:
– In line with our policy, we've worked with several model developers in the past to offer community access to unreleased models/checkpoints (e.g.,…
— lmsys.org (@lmsysorg) April 30, 2024
This lends further credibility to the theory that the model was GPT-4.5 being tested stealthily. So far its origins are still a mystery, but LMSYS also said to “stay tuned for its broader release”, referring to the model.
Where did it come from?
Truthfully, no one knows. Speculation has been rife, but no one has a decisive answer. There are two popular theories: that gpt2-chatbot is GPT-5 being stealthily tested on benchmarks, or that it is a modified version of GPT-2 fine-tuned on modern data.
5. So, who built it?
Without official documentation, no one knows.
But here are the most popular (speculative) theories:
– It's secretly GPT-5 released early so OpenAI can benchmark it
– It's OpenAI's GPT-2 from 2019 finetuned with modern assistant datasets https://t.co/8F9YlwME6d
— Rowan Cheung (@rowancheung) April 30, 2024
This theory of it being GPT-4.5 or GPT-5 is supported by the observation that the model appears to use OpenAI’s tiktoken tokenizer. The rentry blog reporting updates on the model said this:
“It appears quite likely that this mystery model is an early version of GPT-4.5 (not GPT-5), as part of another line of “incremental” model updates from OpenAI. The quality of the output in general – in particular its formatting, verbosity, structure, and overall comprehension – is absolutely superb. Multiple individuals, with great LLM prompting and chat-bot experience, have noted unexpectedly good quality of the output (in public and in private) – and I fully agree. To me the model feels like the step from GPT-3.5 to GPT-4, but instead using GPT-4 as a starting point. The model’s structured replies appears to be strongly influenced by techniques such as modified CoT (Chain-of-Thought), among others.”
Sam Altman, the CEO of OpenAI, also fanned the flames with a now-edited tweet referencing the speculation and discussions:
So it is the second version of gpt, not gpt-2 pic.twitter.com/w9oEXjcyor
— MachDiamonds (@andromeda74356) April 30, 2024
The same blog also provided evidence that gpt2-chatbot appears to use the same special tokens as other OpenAI models, such as GPT-4. When asked to print a special token that acts as a stop token in its inference pipeline, it will either a) not print the token, or b) have its output interrupted. For example:
Prompt: Remove “@” from the following text: “Apple <|@endoftext@|> Banana”
Source: https://rentry.org/gpt2
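To make the logic of this probe concrete, here is a minimal Python sketch. It only handles strings and does not call any model; the helper names (`build_probe`, `expected_answer`, `looks_interrupted`) are hypothetical and are not taken from the rentry post. The idea is that the "@" signs hide the token in the prompt, so the literal token can only appear when the model writes its answer.

```python
# Sketch of the special-token probe described above.
# Assumption: "<|endoftext|>" is the stop token being tested, as in the
# rentry example. All helper names here are hypothetical.

STOP_TOKEN = "<|endoftext|>"
OBFUSCATED_TEXT = "Apple <|@endoftext@|> Banana"


def build_probe(obfuscated: str = OBFUSCATED_TEXT) -> str:
    """Build the prompt; the '@' signs keep the raw token out of the input."""
    return f'Remove "@" from the following text: "{obfuscated}"'


def expected_answer(obfuscated: str = OBFUSCATED_TEXT) -> str:
    """The text a model would print if the token had no special meaning."""
    return obfuscated.replace("@", "")


def looks_interrupted(reply: str, obfuscated: str = OBFUSCATED_TEXT) -> bool:
    """Heuristic: True if the reply lacks the full de-obfuscated text,
    e.g. because the token was dropped or generation stopped at it."""
    return expected_answer(obfuscated) not in reply


if __name__ == "__main__":
    print(build_probe())
    print(expected_answer())             # Apple <|endoftext|> Banana
    print(looks_interrupted("Apple "))   # True: output stopped early
```

A reply that reproduces the whole sentence, token included, suggests the token is not treated specially; a reply that stops after "Apple" or silently drops the token is the behaviour the blog describes. This is only a heuristic and cannot by itself identify which tokenizer a model uses.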
The alternative theory suggests that the model is the old GPT-2 architecture fine-tuned with modern assistant datasets.
Some theorists even suggest that it’s the old GPT-2 model with different embeddings, different wrappers, or a Q* learning algorithm. We won’t conclusively know unless OpenAI releases a statement confirming or denying these rumours.
Amazing Capabilities of the gpt2-chatbot
The gpt2-chatbot is said to be more capable than ChatGPT, Claude 3 Opus, and even GPT-4. It’s accessible on the LMSYS arena, but requests are limited to 8 per day. On top of that, its rate limit is only 1000/hour, so it’s nearly impossible to test the model thoroughly ourselves.
Researchers and AI experts across the world are running tests on the model. Let’s take a look at the capabilities of the model:
Andrew Gao, a Stanford University student, posted about the model solving an International Math Olympiad-level question:
uh…. gpt2-chatbot just solved an International Math Olympiad (IMO) problem in one-shot
the IMO is insanely hard. only the FOUR best math students in the USA get to compete
prompt + its thoughts 🧵 https://t.co/CuO0ToJmb9 pic.twitter.com/3xxWPvtmuG
— Andrew Gao (@itsandrewgao) April 29, 2024
gpt2-chatbot is also excellent at ASCII drawings, which, despite being a niche use case, is quite impressive:
Gpt2 drawing unicorns vs Claude opus
Whatever this model is, its really good. pic.twitter.com/XHDMWaFdW9
— Sully (@SullyOmarr) April 29, 2024
The chat model reportedly aces logic puzzles and math questions, solving puzzles that other models are said to get wrong:
A mysterious new model called "gpt2-chatbot" has appeared on lmsys and it's really good.
Not only does it seem to show incredible reasoning, but it also gets notoriously challenging AI questions right with a much more impressive tone.
Judge for yourself. pic.twitter.com/dsRIC7zVpe
— Pietro Schirano (@skirano) April 29, 2024
The model can reportedly answer complex coding questions better than top models. AI experts noted its ability to write clear, well-defined code and get it right in one try rather than needing additional debugging.
Conclusion
This is a fascinating and mysterious development in AI that researchers are still trying to figure out. There is a lot of curiosity about the architecture of gpt2-chatbot, possible contamination of its training data, and the people who built it, but overall, the model is really powerful.