Google DeepMind launches world model Genie 3 to redefine “generative AI”

Google DeepMind on Monday announced Genie 3, the third generation of its general-purpose world model, which can generate an unprecedented diversity of interactive environments from a text prompt. The dynamic worlds it generates can be navigated in real time at 24 frames per second and remain consistent for several minutes at 720p resolution.

Genie 3 will initially be available as a limited research preview to a small group of scholars and creators to gather critical feedback.

Genie 3 Breakthrough

DeepMind has accumulated more than ten years of experience in the field of simulated environments. From training AI to play real-time strategy games to developing open learning environments for robots, these studies all point to a common goal: building powerful models of the world.

Genie 3 is the first world model to allow real-time interaction, while also offering improved consistency and realism compared with previous-generation models such as Genie 1 and Genie 2 and video generation models such as Veo 2. Veo 3 shows a deep understanding of intuitive physics.

Feature	Genie 2	Veo	Genie 3
Resolution	360p	720p to 4K	720p
Domain	3D environments	General	General
Control	Limited keyboard/mouse actions	Video-level description	Real-time navigation; promptable world events
Interaction horizon	10-20 seconds	8 seconds	Several minutes
Interaction latency	Not real time	Not applicable	Real time

Core capabilities

Simulate the physical properties of the world: Genie 3 has a deep understanding of physical laws and can realistically simulate water flow, light and shadow changes, and complex environmental interactions, such as helicopters carefully maneuvering around cliffs and waterfalls.

Simulate the natural world: From vibrant ecosystems on the shores of glacial lakes to adorable furry creatures hopping across rainbow bridges in fantasy worlds, Genie 3 turns imagination into worlds that can be explored.

Animation and fiction: Users can draw on their imagination to create fantastical scenes and expressive animated characters.

Explore different regions and historical scenes: The model can transcend geographical and time constraints and lead users to explore different places and historical eras, whether they are flying over snow-capped mountains in a wingsuit or immersed in an ancient city with a long history.

Pushing the limits of real-time performance: To achieve a high degree of controllability and real-time interactivity, the model must take into account the previously generated trajectory, which grows over time, during the autoregressive generation of each frame. For example, if a user revisits a location one minute later, the model must refer back to the relevant information from one minute ago. To stay interactive, this computation must happen multiple times per second in response to new user inputs as they arrive.
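To make the numbers above concrete, here is a minimal Python sketch of the per-frame time budget and of an autoregressive loop that conditions each new frame on the frames before it. Everything in it is an illustrative assumption: `toy_next_frame`, `rollout`, and the bounded history window are hypothetical and do not describe how Genie 3 is actually implemented. Only the 24 frames per second figure and the one-minute lookback come from the article.

```python
from collections import deque

FPS = 24                      # frame rate quoted in the article
FRAME_BUDGET_MS = 1000 / FPS  # time available to produce one frame
LOOKBACK_S = 60               # "one minute ago" from the example above
CONTEXT_FRAMES = FPS * LOOKBACK_S  # frames spanned by a one-minute lookback


def toy_next_frame(history, action):
    """Hypothetical stand-in for a model: the 'frame' is just a position,
    computed from the most recent frame plus the user's action."""
    last = history[-1] if history else 0
    return last + action


def rollout(actions, context_frames=CONTEXT_FRAMES):
    """Autoregressive loop: each step conditions on previously generated frames.

    The bounded window is an assumption of this sketch, used to keep the
    amount of history per step fixed.
    """
    history = deque(maxlen=context_frames)
    frames = []
    for action in actions:
        frame = toy_next_frame(history, action)
        history.append(frame)
        frames.append(frame)
    return frames


if __name__ == "__main__":
    print(f"Per-frame budget: {FRAME_BUDGET_MS:.1f} ms")
    print(f"Frames in a one-minute lookback: {CONTEXT_FRAMES}")
    print(rollout([1, 1, -1]))
```

At 24 frames per second, each frame has to be produced in roughly 42 milliseconds, and a one-minute lookback spans 1,440 frames.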

Long-term environment consistency: For AI-generated worlds to be immersive, they must remain physically consistent over long periods of time. However, generating an environment autoregressively is generally a harder technical problem than generating an entire video, because inaccuracies tend to accumulate over time. Genie 3 environments remain largely consistent for several minutes, with visual memory extending as far back as one minute. Genie 3's worlds are also more dynamic and richer because they are created frame by frame from the world description and the user's actions.
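A toy calculation can illustrate why accumulated inaccuracies make frame-by-frame generation hard. The Python sketch below assumes, purely for illustration, that every generated frame multiplies the existing deviation by a small fixed factor. The function name `compounded_error` and the 0.1% per-frame figure are hypothetical and are not measurements of Genie 3.

```python
def compounded_error(per_frame_error: float, num_frames: int) -> float:
    """Return the growth factor of a deviation after num_frames steps,
    assuming each step multiplies it by (1 + per_frame_error)."""
    return (1.0 + per_frame_error) ** num_frames


if __name__ == "__main__":
    fps = 24  # frame rate quoted in the article
    for seconds in (1, 10, 60):
        factor = compounded_error(0.001, fps * seconds)  # 0.1% is an assumed value
        print(f"{seconds:>3} s -> deviation grows by a factor of {factor:.2f}")
```

Under this assumption, an error that is negligible over one second grows severalfold over one minute, which is why consistency over minutes is harder to achieve than consistency over a clip of a few seconds.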

Promptable World Events: In addition to navigation input, Genie 3 supports a more expressive form of text-based interaction called promptable world events. Promptable world events can alter the generated world, for example by changing weather conditions or introducing new objects and characters, which enriches the experience beyond navigation controls. This ability also broadens the range of counterfactual, or "what-if", scenarios that agents can use to learn from experience and handle unexpected situations.

Research on empowering embodied intelligence

One of the ultimate goals of Genie 3 is to provide an infinitely rich training ground for embodied agents. DeepMind has tested it in combination with the general-purpose agent SIMA. Researchers can give SIMA a goal (such as finding an industrial mixer in a bakery), and SIMA attempts to complete the task by sending navigation instructions to Genie 3. Genie 3 behaves like a real world, responding in real time to SIMA's actions and allowing the agent to learn and grow across a large number of what-if scenarios.
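The interaction described above follows the familiar agent-environment loop: the agent sends an action, the world returns an observation, and the cycle repeats until the goal is reached. The Python sketch below illustrates that loop with a toy one-dimensional world. `ToyWorld`, `toy_agent`, and `run_episode` are hypothetical names invented for this illustration and do not correspond to any Genie 3 or SIMA interface.

```python
from dataclasses import dataclass


@dataclass
class ToyWorld:
    """Hypothetical stand-in for a world model: a 1-D corridor with a target."""
    agent_pos: int = 0
    target_pos: int = 5

    def step(self, action: int) -> int:
        """Apply a navigation action (-1 or +1) and return the new observation."""
        self.agent_pos += action
        return self.agent_pos


def toy_agent(observation: int, goal_pos: int) -> int:
    """Hypothetical policy: move one step toward the goal."""
    return 1 if observation < goal_pos else -1


def run_episode(world: ToyWorld, max_steps: int = 20) -> int:
    """Run the action/observation loop and return the number of steps taken."""
    obs = world.agent_pos
    steps = 0
    while obs != world.target_pos and steps < max_steps:
        obs = world.step(toy_agent(obs, world.target_pos))
        steps += 1
    return steps


if __name__ == "__main__":
    print(run_episode(ToyWorld()))  # steps needed to walk from 0 to 5
```

The step limit stands in for the bounded interaction duration of a generated world: if the goal is not reached in time, the episode simply ends.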

Current limitations

Current limitations of Genie 3:

Limited action space: The range of actions an agent can perform directly is still limited.

Lack of multi-agent simulation: Accurately simulating complex interactions between multiple independent agents remains difficult.

Insufficient geographic accuracy: The model cannot perfectly replicate real-world geographic locations.

Poor text rendering: Generated text is often blurry unless it is specified in the initial prompt.

Limited interaction duration: The model currently supports a few minutes of continuous interaction, not hours.