People are Testing AI Models with Minecraft Builds

People love Minecraft and compete to see who can create the most beautiful builds in it. If AI models are meant to match human abilities, they need to be creative too. A new website lets you test and compare how good AI models are at building in Minecraft.

Minecraft Benchmarking for AI Models

MC-Bench, or Minecraft Benchmarking, is a website created by 12th-grader Aditya Singh. On the website (mcbench.ai), you can compare how well two AI models generate innovative Minecraft creations from the same prompt.

MC-Bench serves as a benchmarking platform specifically designed to evaluate AI models’ capabilities in generating Minecraft builds.

Here’s how it works: when you visit the website, you are shown two creations and you have to vote for which one looks better. For example, here are two tables made by two different AI models:

You vote for one of them, or pick the “Tie” option if you think both are equally good.

After you vote, the site reveals the names of the AI models:

If you are a Minecraft player, this is a fun game worth trying at least once, because you can use your gaming experience to judge the AI models.

Here is another example of building Frosty the Snowman:

The builds also include Earth from space:

Herein, GPT 4.5 – Preview (2025-02-27) uses a simple Perlin Noise approximation to "Build our Earth as a sphere viewed from space, as detailed and realistic as possible."

Share link below.

cc: @OpenAI pic.twitter.com/8dYfl5GJxi
— Minecraft Benchmark (@_mcbench) March 14, 2025

Even unicorns:

Sometimes a model produces an elegant algorithm for placing blocks.

Other times it does the calculations "in its head."

Herein, GPT 4.5 – Preview (2025-02-27) just lays down the blocks to create "A fancy colorful Unicorn."

Share link below. pic.twitter.com/eVUOtwv3hZ
— Minecraft Benchmark (@_mcbench) March 13, 2025
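The difference between the two approaches in the posts above can be illustrated with a short Python sketch. It is not taken from MC-Bench or from any model's output: the function name `sphere_shell` and the coordinate format are assumptions made for illustration. The "algorithmic" style computes block positions from a formula, while the "in its head" style would simply list every coordinate by hand.

```python
from typing import List, Tuple

# A block position as integer (x, y, z) coordinates (hypothetical format).
Block = Tuple[int, int, int]


def sphere_shell(radius: int) -> List[Block]:
    """Return block positions that approximate a hollow sphere.

    A position is kept when its squared distance from the origin lies
    between (radius - 1)^2 (exclusive) and radius^2 (inclusive), which
    gives a shell roughly one block thick.
    """
    blocks: List[Block] = []
    for x in range(-radius, radius + 1):
        for y in range(-radius, radius + 1):
            for z in range(-radius, radius + 1):
                dist_sq = x * x + y * y + z * z
                if (radius - 1) ** 2 < dist_sq <= radius ** 2:
                    blocks.append((x, y, z))
    return blocks


# The "in its head" style would instead be a hand-written list, e.g.:
# blocks = [(3, 0, 0), (-3, 0, 0), (0, 3, 0), ...]
shell = sphere_shell(3)
```

A formula like this scales to any radius with no extra effort, whereas a hand-written list grows with every block in the build.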

Overall, users vote on the best Minecraft build before discovering which AI created it. This means it is a human preference leaderboard, just like LMArena.

Minecraft has achieved remarkable success since its release in 2009, becoming the best-selling video game of all time. As of October 2023, it had sold over 300 million copies worldwide. That is why the creator of the website chose Minecraft for benchmarking AI models. He told TechCrunch:

“Minecraft allows people to see the progress (of AI development) much more easily. People are used to Minecraft, used to the look and the vibe.”

-Aditya Singh

Traditional AI benchmarks typically use complex metrics and programming challenges that are difficult for the average person to understand. While valuable for researchers, these benchmarks often lack accessibility.

There is also a leaderboard on the website. The #1 spot is currently held by Anthropic’s Claude 3.7 Sonnet, which had a win rate of 86% the last time I checked. The runner-up is another Anthropic model, Claude 3.5 Sonnet. OpenAI’s GPT-4.5 Preview is in third place.

According to the creator, the leaderboard reflects his own experience with these models, which suggests that MC-Bench's rankings are a reasonable assessment.
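To make the idea of a win rate concrete, here is a minimal Python sketch of how blind pairwise votes could be turned into a leaderboard. This is not MC-Bench's actual scoring method, which the article does not describe. The vote format, the function name `win_rates`, the placeholder model names, and the choice to count a tie as half a win for each side are all assumptions.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# One vote: (model_a, model_b, outcome), where outcome is "a", "b", or "tie".
# This format is hypothetical, chosen only for illustration.
Vote = Tuple[str, str, str]


def win_rates(votes: Iterable[Vote]) -> Dict[str, float]:
    """Compute each model's win rate from pairwise votes.

    Assumption: a tie counts as half a win for both models.
    """
    wins: Dict[str, float] = defaultdict(float)
    games: Dict[str, int] = defaultdict(int)
    for model_a, model_b, outcome in votes:
        games[model_a] += 1
        games[model_b] += 1
        if outcome == "a":
            wins[model_a] += 1.0
        elif outcome == "b":
            wins[model_b] += 1.0
        elif outcome == "tie":
            wins[model_a] += 0.5
            wins[model_b] += 0.5
        else:
            raise ValueError(f"unknown outcome: {outcome!r}")
    return {model: wins[model] / games[model] for model in games}


# Placeholder model names, not real leaderboard data.
sample_votes = [
    ("model-x", "model-y", "a"),
    ("model-x", "model-y", "a"),
    ("model-x", "model-y", "tie"),
    ("model-y", "model-z", "b"),
]
rates = win_rates(sample_votes)
# Sort from highest to lowest win rate to form a leaderboard.
leaderboard = sorted(rates.items(), key=lambda item: item[1], reverse=True)
```

A plain win rate ignores the strength of the opponents a model faced, so real preference leaderboards often use a rating system instead. The sketch only shows the simplest possible aggregation.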

People online are also finding it enjoyable. Some are calling it the “coolest benchmark ever”.

As of 15 March 2025, over 10,000 individual build samples have been voted on. There are still 20,000 builds yet to be evaluated, according to the latest update from the project's X account.

Minecraft’s open-ended nature makes it an ideal testing ground for AI creativity. Benchmarking AI models in this environment helps determine how well AI can design within Minecraft’s constraints.

But this is not the first time games have been used for AI research. Classic games like Super Mario Bros., Street Fighter, and Pokémon Red have also been used recently to test LLMs.

Takeaways

We have seen many ways to test AI models, but this is the most interesting method I have seen so far. It also adds some fun to a technical industry, which might encourage young minds to get started in the AI world.

Ranish Chauhan
