So far we have seen several kinds of Generative AI models that can produce high-quality images and videos, and stunning audio for sound effects and music. Now we can have 3D models generated in less than a minute from nothing more than a simple text prompt. Welcome Meta’s new AI: Meta 3D Gen!
Meta’s 3D Gen Model
Meta 3D Gen (3DGen) is a new text-to-3D asset generator that can create high-quality 3D assets in under a minute.
📣 New research from GenAI at Meta, introducing Meta 3D Gen: A new system for end-to-end generation of 3D assets from text in <1min. Meta 3D Gen is a new combined AI system that can generate high-quality 3D assets, with both high-resolution textures and material maps end-to-end,…
— AI at Meta (@AIatMeta) July 2, 2024
It is compatible with physically-based rendering (PBR), which is required for relighting 3D assets in real-world applications. Furthermore, 3DGen can generatively retexture previously generated (or artist-created) 3D shapes using additional text prompts supplied by the user.
This means the model will not only faithfully turn your prompt into an attractive 3D asset, but also enhance its texture and lighting so the result comes as close as possible to the model you imagined.
And there is one more highlight. Once the object is created, it takes only about 20 seconds to further modify and customize its texture, offering higher quality at a significantly lower cost than other options. The same method can be used, without any changes, to texture 3D meshes made by artists.
Latest Text-to-3D Asset Technology
It’s quite interesting to look at the blueprint, i.e. the innovative technical approach behind this text-to-3D generation model.
3DGen, which builds upon AssetGen and TextureGen, efficiently integrates three highly complementary representations of the 3D object: volumetric space (3D shape and appearance), UV space (texture), and view space (images of the object).
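To make these three representations a little more concrete, here is a minimal Python sketch of the data such an asset might carry. It is purely illustrative: Meta has not released 3DGen’s code, so the class, field names, and array shapes below are assumptions rather than the actual implementation.

```python
from dataclasses import dataclass, field
import numpy as np

@dataclass
class GeneratedAsset:
    """Illustrative container for the three complementary representations."""
    # Volumetric space: 3D shape and appearance, e.g. values sampled on a grid.
    volume: np.ndarray                    # e.g. (D, H, W) shape/appearance grid
    # UV space: texture and PBR material maps unwrapped onto the mesh surface.
    albedo_map: np.ndarray                # (H, W, 3) base colour
    metallic_roughness_map: np.ndarray    # (H, W, 2) PBR parameters
    # View space: rendered images of the object from several camera angles.
    views: list[np.ndarray] = field(default_factory=list)
```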
Meta 3D Gen is a two-stage method that combines two components: one for text-to-3D generation and the other for text-to-texture generation. This integration results in higher-quality 3D generation for the production of immersive content. Let’s look at the whole process stage by stage:
Stage 1) 3D Asset Generation
In Stage 1, a 3D asset is created by the Meta 3D AssetGen model from the text prompt supplied by the user. This step produces a 3D mesh with texture and PBR material maps, and inference takes about 30 seconds.
To start this process, AssetGen uses a multi-view and multi-channel variant of a text-to-image generator to produce several fairly consistent views of the object.
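Conceptually, Stage 1 behaves like a single function from a text prompt to a textured, relightable mesh. The sketch below only illustrates that flow; AssetGen is not a public library, so every function name, signature, and array shape here is a hypothetical stand-in, with stubs returning placeholder data.

```python
import numpy as np

# Hypothetical stand-ins: AssetGen is not publicly released, so these stubs
# only illustrate the order of operations in Stage 1, not a real API.

def multiview_text_to_image(prompt: str, num_views: int = 4) -> list[np.ndarray]:
    """Multi-view, multi-channel text-to-image step producing consistent views."""
    return [np.zeros((512, 512, 3), dtype=np.float32) for _ in range(num_views)]

def reconstruct_and_extract_mesh(views: list[np.ndarray]) -> dict:
    """Lift the views into volumetric space, then extract a mesh with an
    initial texture and PBR material maps."""
    return {
        "vertices": np.zeros((0, 3), dtype=np.float32),
        "faces": np.zeros((0, 3), dtype=np.int64),
        "albedo": np.zeros((1024, 1024, 3), dtype=np.float32),
        "metallic_roughness": np.zeros((1024, 1024, 2), dtype=np.float32),
    }

def stage1_generate_asset(prompt: str) -> dict:
    views = multiview_text_to_image(prompt)      # consistent object views
    return reconstruct_and_extract_mesh(views)   # ~30 s in the real system

asset = stage1_generate_asset("a knight in ornate armour holding a lantern")
```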
Stage 2) Generative 3D Texture Refinement
Stage 2 takes the 3D asset created in Stage 1, together with the original text prompt, and produces higher-quality textures and PBR maps for it. It makes use of the Meta 3D TextureGen text-to-texture generator.
To elaborate, an AssetGen reconstruction network first extracts an initial version of the 3D object in volumetric space. Mesh extraction follows, establishing the object’s 3D shape and an initial texture.
This step is important for adding depth and texture quality to the initially generated 3D shapes and materials.
Stage 3) Generative 3D Retexturing
Ultimately, the TextureGen component regenerates the texture by combining UV-space and view-space generation to increase the material’s resolution and quality while maintaining the original prompt details.
Stage 2 can also be used to produce a texture from scratch for an untextured 3D mesh, given a prompt specifying its desired appearance (the mesh can be previously generated or made by an artist).
Inference here takes about 20 seconds. All things considered, every stage of 3DGen builds upon Meta’s robust text-to-image models. Better textures are obtained by fine-tuning these models on renders of synthetic 3D data from an internal dataset so they can perform multi-view generation in both view space and UV space.
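Putting this together, the relationship between texture refinement and retexturing can be sketched as two calls into the same hypothetical Stage 2 function: one fed with a Stage 1 asset, the other with a bare artist-made mesh. TextureGen has no public API either, so every name below is a placeholder used only to illustrate the two paths.

```python
import numpy as np

# Hypothetical placeholders: TextureGen is not a public library, so these names
# only illustrate how texture refinement and retexturing share one mechanism.

def stage2_refine_texture(mesh: dict, prompt: str) -> dict:
    """Regenerate the texture by combining UV-space and view-space generation,
    keeping the prompt's details while raising resolution and quality (~20 s)."""
    refined = dict(mesh)
    refined["albedo"] = np.zeros((2048, 2048, 3), dtype=np.float32)
    refined["metallic_roughness"] = np.zeros((2048, 2048, 2), dtype=np.float32)
    return refined

# Path A: refine the texture of an asset produced in Stage 1.
stage1_asset = {"vertices": np.zeros((0, 3)), "faces": np.zeros((0, 3), dtype=int),
                "albedo": np.zeros((1024, 1024, 3))}
refined = stage2_refine_texture(stage1_asset, "a weathered bronze dragon statue")

# Path B: texture an untextured, artist-made mesh from scratch with a new prompt.
artist_mesh = {"vertices": np.zeros((0, 3)), "faces": np.zeros((0, 3), dtype=int)}
retextured = stage2_refine_texture(artist_mesh, "covered in iridescent scales")
```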
How Efficient is the Model?
Testing Meta 3D Gen across various evaluations showed that it is quite efficient at producing 3D assets while staying faithful to the user prompt and improving the texture quality of the generated meshes.
In both stages, 3DGen surpasses all industry baselines on prompt fidelity, with third-party text-to-3D (T23D) generators emerging as its most formidable rivals.
It was found that annotators with less 3D experience are relatively insensitive to minor texture and geometry artifacts and prefer assets with sharper, more vibrant, realistic, and detailed textures. Across all categories, professional 3D artists indicated a greater preference for 3DGen’s generations.
The researchers additionally examined performance for visual quality, geometry, texture detail, and the presence of texture artifacts as a function of the scene complexity indicated by the text prompt.
Plots demonstrate that, although certain baselines match up well for basic prompts, 3DGen begins to significantly outperform baselines as prompt complexity rises, moving from objects to characters and their compositions.
They compare 3DGen’s win rate against each baseline, with 50% marking the threshold above which their approach outperforms that baseline.
Lastly, they also performed visual comparisons of Stage 1 and Stage 2. Stage 2 tends to produce more visually appealing and realistic results with higher-frequency details, and Stage 2 generations were preferred in 68% of cases across different objects and compositions.
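For context on these percentages: a win rate in this kind of head-to-head evaluation is simply the fraction of pairwise comparisons in which annotators preferred one system’s output, so anything above 50% means it beats the alternative; the 68% Stage 2 preference is exactly such a number. A tiny illustration with invented preference labels (not the study’s actual annotation data):

```python
# Toy win-rate computation; the preference labels are made up for illustration.
preferences = ["A", "A", "B", "A", "B", "A", "A", "B", "A", "A"]  # "A" = system under test

win_rate = preferences.count("A") / len(preferences)
print(f"win rate: {win_rate:.0%}")  # 70%, above the 50% threshold
print("beats the baseline" if win_rate > 0.5 else "does not beat the baseline")
```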
Just a few months ago, Stability AI made a similar move in this space with TripoSR, their model for generating 3D objects.
Conclusion
Meta 3D Gen lays a brand new foundation for Generative AI models that generate captivating 3D assets. Its innovative approach steps ahead of the baseline models by not only generating the shapes and meshes you ask for but also enhancing them across compositions and texture quality.