The DeanBeat: Nvidia CEO Jensen Huang says AI will automatically fill in the 3D images of the metaverse


It takes all kinds of AI to create a virtual world. Nvidia CEO Jensen Huang said this week during a Q&A at the GTC22 online event that AI will automatically fill in the 3D images of the metaverse.

He believes that AI will make the first pass at creating the 3D objects that populate the vast virtual worlds of the metaverse – and then human creators will take over and refine them to their liking. And while that’s a very big claim about how smart AI will be, Nvidia has the research to back it up.

Nvidia Research is announcing this morning that a new AI model could help make the massive virtual worlds created by a growing number of companies and creators more easily populated with a multitude of 3D buildings, vehicles, characters and more.

This kind of mundane imagery represents a huge amount of tedious work. Nvidia said the real world is full of variety: the streets are full of unique buildings, with different vehicles speeding by and different crowds passing through. Manually modeling a 3D virtual world that reflects this is incredibly time-consuming, making it difficult to fill out a detailed digital environment.

This kind of task is what Nvidia wants to make easier with its Omniverse tools and cloud service. It hopes to make developers’ lives easier when it comes to creating metaverse applications. And auto-generating art – as we’ve seen happen with the likes of DALL-E and other AI models this year – is a way to lighten the burden of building a universe of virtual worlds like those in Snow Crash or Ready Player One.

Jensen Huang, CEO of Nvidia, speaks at the GTC22 keynote.

I asked Huang in a press Q&A earlier this week about what could make the metaverse come faster. He alluded to the Nvidia Research work, though the company didn’t spill the beans until today.

“First of all, as you know, the metaverse is created by users. And it’s either created by us by hand, or it’s created by us with the help of AI,” Huang said. “And in the future, it’s very likely that we’ll describe some characteristic of a house or characteristic of a city or something like that. And it’s like this city, or it’s like Toronto, or it’s like New York City, and it creates a new city for us. And maybe we don’t like it. We can give it more prompts. Or we can just keep hitting enter until it automatically generates one that we want to start from. And then from that world, we want to modify it. So I think AI to create virtual worlds is being realized as we speak.”

GET3D details

Trained with only 2D images, Nvidia GET3D generates 3D shapes with high-quality textures and complex geometric details. These 3D objects are created in the same format used by popular graphics programs, so users can instantly import their figures into 3D renderers and game engines for further editing.

The generated objects can be used in 3D representations of buildings, outdoor spaces or entire cities, designed for industries including gaming, robotics, architecture and social media.

GET3D can generate a virtually unlimited number of 3D shapes based on the data it is trained on. Like an artist turning a lump of clay into a detailed sculpture, the model transforms numbers into complex 3D shapes.
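The core idea – sampling a latent code and mapping it to a distinct shape – can be illustrated with a toy sketch. To be clear, this is not Nvidia’s code: `toy_generator` is a hypothetical stand-in that maps a latent seed to simple box dimensions, where GET3D maps learned latent vectors to full textured meshes.

```python
import random

def toy_generator(latent_seed: int) -> dict:
    """Toy stand-in for a 3D generative model: maps a latent
    code (here just an integer seed) to the parameters of a box.
    GET3D itself maps learned latent vectors to complete textured
    meshes; this sketch only illustrates the sampling idea."""
    rng = random.Random(latent_seed)
    return {
        "width": round(rng.uniform(0.5, 3.0), 2),
        "height": round(rng.uniform(0.5, 3.0), 2),
        "depth": round(rng.uniform(0.5, 3.0), 2),
    }

# Each latent sample yields a different shape, just as GET3D
# yields a different car, animal or chair per sample.
shapes = [toy_generator(seed) for seed in range(3)]
```

Drawing more samples simply means feeding in more latent codes, which is why the model can produce a virtually unlimited stream of variations on whatever category it was trained on.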

“At the heart of it is the very technology I was talking about just a second ago called large language models,” he said. “To be able to learn from all of humanity’s creations, and to be able to imagine a 3D world. And then from words, through a large language model, will come out one day, triangles, geometry, textures and materials. And from that, we would change that. And because none of it is pre-baked, and none of it is pre-rendered, all this simulation of physics and all simulation of light has to be done in real time. And that’s why the latest technologies that we’re creating with considerations for RTX neural rendering are so important. Because we can’t brute force it. We need the help of artificial intelligence to be able to do it.”

With a training dataset of 2D car images, for example, it creates a collection of sedans, trucks, racing cars and vans. When trained on animal images, it comes up with creatures such as foxes, rhinos, horses and bears. Given chairs, the model generates assorted swivel chairs, dining chairs and cozy recliners.

“GET3D brings us one step closer to democratizing AI-powered 3D content creation,” said Sanja Fidler, vice president of AI research at Nvidia and head of the Toronto-based AI lab that created the tool. “Its ability to instantly generate textured 3D shapes can be a game changer for developers, helping them quickly populate virtual worlds with varied and interesting objects.”

GET3D is one of more than 20 Nvidia-authored papers and workshops accepted to the NeurIPS AI conference, which takes place in New Orleans and virtually Nov. 26-Dec. 4.

Nvidia said that while faster than manual methods, previous 3D generative AI models were limited in the level of detail they could produce. Even newer inverse rendering methods can only generate 3D objects based on 2D images taken from different angles, requiring developers to build one 3D shape at a time.

Instead, GET3D can churn out around 20 shapes per second when running inference on a single Nvidia graphics processing unit (GPU) – working like a generative adversarial network for 2D images while generating 3D objects. The larger and more diverse the training dataset it learns from, the more varied and detailed the output.

Nvidia researchers trained GET3D on synthetic data consisting of 2D images of 3D shapes taken from different camera angles. It took the team just two days to train the model on around one million images using Nvidia A100 Tensor Core GPUs.

GET3D gets its name from its ability to generate explicit textured 3D meshes – meaning that the shapes it creates are in the form of a triangular mesh, like a papier-mâché model, covered with a textured material. This allows users to easily import the objects into game engines, 3D modelers and movie renderers – and edit them.
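What an “explicit textured mesh” buys you is portability: triangles plus texture-ready geometry can be written straight to standard formats that game engines and 3D modelers read. As a minimal sketch (not Nvidia’s exporter), here is what writing a triangle mesh to the common Wavefront OBJ format looks like:

```python
def write_obj(vertices, faces, path):
    """Write a triangle mesh as Wavefront OBJ text - the kind of
    explicit mesh (triangles, like a papier-mache model) that
    GET3D's output can be exported as. Face indices are 1-based,
    per the OBJ convention."""
    lines = [f"v {x} {y} {z}" for x, y, z in vertices]
    lines += [f"f {a} {b} {c}" for a, b, c in faces]
    with open(path, "w") as fh:
        fh.write("\n".join(lines) + "\n")

# A single triangle: three vertices, one face.
write_obj([(0, 0, 0), (1, 0, 0), (0, 1, 0)], [(1, 2, 3)], "tri.obj")
```

Because the output is an ordinary mesh file rather than an opaque neural representation, artists can open, retopologize or retexture it in whatever tool they already use.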

When creators export GET3D-generated shapes into a graphics program, they can apply realistic lighting effects as the object moves or rotates in a scene. By incorporating another AI tool from Nvidia Research, StyleGAN-NADA, developers can use text prompts to add a certain style to an image, such as turning a rendered car into a burned-out car or a taxi, or an ordinary house into a haunted one.

The researchers note that a future version of GET3D could use camera pose estimation techniques to allow developers to train the model on real-world data instead of synthetic datasets. It could also be enhanced to support universal generation – meaning developers could train GET3D on all kinds of 3D shapes at once, rather than on one object category at a time.

Prologue is Brendan Greene’s next project.

“So AI will generate worlds,” Huang said. These worlds will be simulations, not just animations. And to drive all of this, Huang envisions the need to create a “new type of data center around the world.” It’s called a GDN, not a CDN. It’s a graphics delivery network, battle-tested through Nvidia’s GeForce Now cloud gaming service. Nvidia has taken that service and is using it to create Omniverse Cloud, a suite of tools that can be used to create Omniverse applications, anytime, anywhere. GDN will host cloud gaming as well as the metaverse tools of the Omniverse Cloud.

This type of network can deliver the real-time computing necessary for the metaverse.

“It’s interactivity that’s essentially instantaneous,” Huang said.

Are there any game developers asking for this? Well, actually I know someone who is. Brendan Greene, creator of the battle royale game PlayerUnknown’s Battlegrounds, called for this kind of technology this year when he announced Prologue and then revealed Project Artemis, an effort to create a virtual world the size of Earth. He said it could only be built with a combination of game design, user-generated content and AI.

Well, hell.

GamesBeat’s creed when covering the gaming industry is “where passion meets business.” What does this mean? We want to tell you how the news matters to you – not only as a decision-maker at a game studio, but also as a fan of games. Whether you’re reading our articles, listening to our podcasts, or watching our videos, GamesBeat will help you learn about the industry and enjoy engaging with it. Discover our Briefings.
