Generating 3D assets using AI
Imagine creating 3D assets using AI. In this article, we review the first AI models that make this possible and compare commercial and open-source options.
Generative AI is at the forefront of a remarkable technological revolution. Since Google researchers introduced the Transformer architecture in the groundbreaking 2017 paper "Attention Is All You Need", the tech community has recognized AI's transformative potential. However, it wasn't until OpenAI released the now-famous ChatGPT on November 30, 2022, that the world truly began to grasp the profound impact AI would have on the future. As a result, the number of companies and technology labs investing in AI has grown steadily year after year.
Large Language Models (LLMs) emerged first, laying the groundwork for more complex multimodal models. One area where this evolution has had a significant impact is the audiovisual sector. Multimodal models that combine text and images, known as Vision-Language Models (VLMs), have since been introduced. The audiovisual revolution gained momentum with OpenAI's announcement of CLIP and DALL-E in 2021. DALL-E, which uses a variant of the GPT-3 architecture to generate images, marked a pivotal moment. The year 2022 saw a surge in AI advancements, with companies such as Midjourney and Stability AI releasing their own image generation models.
Following the success of image generation models, the first AI models for audio and music creation emerged, such as Google Research's AudioLM and MusicLM. Tools like Suno and Udio have since pushed the boundaries further, becoming state-of-the-art in AI-driven audio creation, capable of generating complex soundtracks and music across various genres.
In February 2024, OpenAI announced Sora, a new video generation model that, while not the first of its kind, represented a significant leap forward in the field. The year 2024 has been particularly fruitful for AI video, with the release of tools such as Runway, Dream Machine, and Kling, among others.
As video generation becomes increasingly viable, the next frontier appears to be the generation of 3D assets using AI. This year has seen the introduction of the first AI models for 3D asset creation, including TripoAI, Meshy, Genie, and CSM.
TripoAI, Meshy, and CSM offer web-based services for users, as well as REST APIs for developers to integrate these services into external tools. These platforms typically operate on a credit-based system, where users purchase credits that are consumed with each generation. In contrast, Genie, developed by Luma Labs, was launched in an alpha version and is currently free to use.
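These credit-based REST APIs generally follow a submit-then-poll pattern: you POST a generation task (which consumes credits), receive a task id, and poll until the asset is ready. The sketch below illustrates that pattern in Python. The endpoint, field names, and status values (`API_BASE`, `task_id`, `succeeded`) are hypothetical, not any provider's real API; consult the TripoAI, Meshy, or CSM API documentation for the actual routes and parameters.

```python
import json
import time
import urllib.request

# Hypothetical endpoint for illustration only — each provider
# (TripoAI, Meshy, CSM) defines its own routes and payload schema.
API_BASE = "https://api.example-3dgen.com/v1"


def build_task_payload(prompt: str, output_format: str = "glb") -> dict:
    """Build the JSON body for a text-to-3D generation request."""
    return {"type": "text_to_3d", "prompt": prompt, "format": output_format}


def submit_task(api_key: str, payload: dict) -> str:
    """POST the task (this is the step that consumes credits); returns a task id."""
    req = urllib.request.Request(
        f"{API_BASE}/tasks",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["task_id"]


def poll_until_done(api_key: str, task_id: str, interval: float = 5.0) -> dict:
    """Poll the task until it reaches a terminal state; returns the task record."""
    while True:
        req = urllib.request.Request(
            f"{API_BASE}/tasks/{task_id}",
            headers={"Authorization": f"Bearer {api_key}"},
        )
        with urllib.request.urlopen(req) as resp:
            task = json.load(resp)
        if task["status"] in ("succeeded", "failed"):
            return task
        time.sleep(interval)
```

The same submit/poll structure applies whether the input is a prompt or an image; only the payload changes.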
The following table compares the AI services for 3D generation discussed in this article:
These AI models enable the generation of 3D assets from text or images, producing both geometry and textures. However, the quality and fidelity of the results vary between models. Additionally, some of these models offer extra features, such as generating PBR textures for the assets or rigging and animating the 3D models.
To evaluate the capabilities of these AI models, the Plain Concepts Research team has created a test batch for comparison, which you can see below.
First, we compare the commercial AI models for 3D generation using a text input.
| Tag | Prompt |
| --- | --- |
| Person | A person in a black official suit, red tie, highly detailed, ultra-realistic, A-pose |
| Animal | Realistic horse |
| Object | Gorgeous chair with blue cloth and wooden armrests |
| Food | Photorealistic, high-quality hamburger with detailed textures |
| Fantasy | Noisy goblin |
In the second test, we compare the AI models using an image input. The images for this test were generated with Flux, the new image generation model from Black Forest Labs.
Note: Genie (the AI model from Luma Labs) does not appear in this comparison because it currently lacks an image-to-3D feature.
There are also open-source AI models that can be run on local machines; some of the most notable are TripoSR, CRM, InstantMesh, and SF3D. These models are particularly valuable because you can experiment with, refine, and enhance them, and integrate them into your own tools without relying on external services.
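As an illustration of local use, these open-source models typically ship an inference script you can drive from your own pipeline. The sketch below wraps such a script with `subprocess`, using the script name and flags that the TripoSR repository documented at the time of writing (`run.py`, `--output-dir`, `--device`); treat these as assumptions and verify them against the repository's README before use.

```python
import subprocess
from pathlib import Path

# Assumed CLI shape, modeled on the TripoSR repository's README
# (e.g. `python run.py examples/chair.png --output-dir output/`).
# Verify the script name and flags against the repo you actually use.


def build_triposr_command(image_path: str, output_dir: str,
                          device: str = "cuda:0") -> list[str]:
    """Assemble the argument list for a local image-to-3D inference run."""
    return [
        "python", "run.py", image_path,
        "--output-dir", output_dir,
        "--device", device,
    ]


def generate_mesh(image_path: str, output_dir: str = "output/") -> Path:
    """Run the inference script on one input image; returns the output folder."""
    subprocess.run(build_triposr_command(image_path, output_dir), check=True)
    return Path(output_dir)
```

Running inference this way requires the model's repository, its Python dependencies, and (for reasonable speed) a CUDA-capable GPU; pass `device="cpu"` to fall back to CPU if the script supports it.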
To evaluate these models, we recreated the same test batch used with the commercial models.
Additionally, Meta recently introduced a new 3D generation model called Assets3DGen, which shows great promise with its high-quality results. This model supports text-to-3D, image-to-3D, and even PBR material creation. Although Meta announced the project as open source, the source code has not yet been made public, so we cannot include it in our comparison at this time. We will stay tuned for updates in the coming months; in the meantime, here is the announcement video for Assets3DGen.
Finally, as part of our commitment to working with cutting-edge technology and providing our clients with the best solutions, the Plain Concepts Research team is developing an experimental tool for generating 3D assets using AI models. This tool is built on TripoAI services and Evergine, a powerful industrial graphics engine developed by Plain Concepts. The result is DonatelloAI, a tool that offers:
- Text-to-3D generation
- Image- or sketch-to-3D generation
- Auto-rigging and animation of generated models
- Stylization
- Model optimization and export to multiple formats (glTF, USDZ, FBX, OBJ, STL)
- A model gallery
- Composition of large scenes from generated models
This experimental tool is available in the following repository; anyone can test it on a local machine by downloading it from here:
This is just the beginning. In a few years, creatives will have access to powerful new tools that will enable them to push boundaries even further. At Plain Concepts, we are excited to join media companies on this journey toward the next leap forward.