Google DeepMind on Monday announced Genie 3, the third generation of its general-purpose world model, which can generate an unprecedented diversity of interactive environments from text prompts. The dynamic worlds it generates can be navigated in real time at 24 frames per second and remain consistent for several minutes at 720p resolution.

Genie 3 will initially be available as a limited research preview to a small group of academics and creators, so that DeepMind can gather critical feedback.
Genie 3 Breakthrough
DeepMind has more than ten years of experience with simulated environments. From training AI to play real-time strategy games to developing simulated environments for open-ended learning and robotics, this work all points to a common goal: building powerful world models.
According to DeepMind, Genie 3 is its first world model to allow real-time interaction, while also offering better consistency and realism than earlier models such as Genie 1 and Genie 2. It also builds on the video generation models Veo 2 and Veo 3, which show a deep understanding of intuitive physics. The table below compares Genie 3 with Genie 2 and Veo.
| Feature | Genie 2 | Veo | Genie 3 |
|---|---|---|---|
| Resolution | 360p | 720p to 4K | 720p |
| Domain | 3D environments | General | General |
| Control method | Limited keyboard/mouse actions | Video-level description | Real-time navigation; promptable world events |
| Interaction duration | 10-20 seconds | 8 seconds | Several minutes |
| Interaction latency | Not real time | Not applicable | Real time |
Core capabilities
Simulate the physical properties of the world: Genie 3 has a deep understanding of physical laws and can realistically simulate water flow, light and shadow changes, and complex environmental interactions, such as helicopters carefully maneuvering around cliffs and waterfalls.
Simulate the natural world: From vibrant ecosystems on the shores of glacial lakes to furry creatures hopping across rainbow bridges in fantasy worlds, Genie 3 turns imagined scenes into worlds that can be explored.
Model animation and fiction: Genie 3 can create fantastical scenes and expressive animated characters.
Explore different regions and historical scenes: The model can move beyond geographic and temporal boundaries, letting users explore different places and historical eras, whether they are flying over snow-capped mountains in a wingsuit or wandering through an ancient city.
Pushing the limits of real-time performance: To achieve a high degree of controllability and real-time interactivity, the model must take into account the previously generated trajectory, which grows over time, during the autoregressive generation of each frame. For example, if a user revisits a location one minute later, the model must refer back to the relevant information from one minute earlier. For the interaction to be real time, this computation must happen many times per second as new user inputs arrive.
Long-term environment consistency: For AI-generated worlds to be immersive, they must remain physically consistent over long periods of time. However, generating an environment autoregressively is generally a harder technical problem than generating an entire video, because inaccuracies tend to accumulate over time. Genie 3 environments remain largely consistent for several minutes, with visual memory extending as far back as one minute. Worlds generated by Genie 3 are also more dynamic and richer because they are created frame by frame from the user's world description and actions.
Promptable world events: In addition to navigation input, Genie 3 supports a more expressive form of text-based interaction called promptable world events. These events can alter the generated world, for example by changing the weather or introducing new objects and characters, which extends what navigation controls alone can do. The capability also broadens the range of counterfactual, or "what-if", scenarios that agents can learn from in order to handle unexpected situations.
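The real-time and memory constraints described above can be made concrete with a toy sketch. It is not Genie 3's implementation, which the announcement does not detail: the function names `next_frame` and `rollout`, the tuple used as a "frame", and the fixed-size history window are all assumptions made for illustration. Only the 24 frames per second rate and the roughly one-minute visual memory come from the text.

```python
from collections import deque

FPS = 24                               # frame rate reported in the announcement
FRAME_BUDGET_MS = 1000 / FPS           # time available per frame, about 41.7 ms
MEMORY_SECONDS = 60                    # visual memory of roughly one minute
MEMORY_FRAMES = FPS * MEMORY_SECONDS   # 1440 frames of history to consult

def next_frame(history, action, event=None):
    """Toy autoregressive step (hypothetical): the new frame is computed
    from the past frames plus the newest user input."""
    x, weather = history[-1]
    # Navigation input moves the viewpoint left or right.
    x += {"left": -1, "right": 1}.get(action, 0)
    # A promptable world event changes the world itself.
    if event is not None:
        weather = event
    return (x, weather)

def rollout(actions, events=None, start=(0, "clear")):
    """Generate frames one at a time, keeping a bounded window of history.

    `events` maps a step index to a text event such as "rain".
    """
    events = events or {}
    history = deque([start], maxlen=MEMORY_FRAMES)
    for t, action in enumerate(actions):
        history.append(next_frame(history, action, events.get(t)))
    return list(history)
```

Calling `rollout(["right", "right", "left"], events={1: "rain"})` moves the viewpoint twice to the right and once back, with the weather switching to rain at the second step. In a real system the whole step, including consulting the history, would have to fit inside the per-frame budget of about 41.7 ms, which is why a trajectory that keeps growing is the hard part.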
Advancing research on embodied agents
One of the ultimate goals of Genie 3 is to provide an effectively unlimited training ground for embodied agents. DeepMind has tested it together with its general-purpose agent SIMA. Researchers give SIMA a goal (such as finding an industrial mixer in a bakery), and SIMA tries to complete the task by sending navigation actions to Genie 3. Genie 3 plays the role of the environment, returning results in real time in response to SIMA's behavior, so the agent can learn from a large number of what-if scenarios.
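The interaction described above follows the familiar agent-environment loop, sketched below in a deliberately tiny form. The classes `ToyEnvironment` and `ToyAgent`, the one-dimensional corridor, and the action names are hypothetical stand-ins, not the SIMA or Genie 3 interfaces. The sketch assumes the agent holds the goal while the environment only reacts to the actions it receives.

```python
class ToyEnvironment:
    """Hypothetical stand-in for the world model: it receives actions
    and returns observations, and is never told the goal."""

    def __init__(self, start=0):
        self.position = start

    def step(self, action):
        # Apply the navigation action and return the new observation.
        self.position += {"forward": 1, "back": -1}.get(action, 0)
        return self.position


class ToyAgent:
    """Hypothetical stand-in for the agent: it knows the goal and
    chooses a navigation action from each observation."""

    def __init__(self, goal):
        self.goal = goal

    def act(self, observation):
        if observation < self.goal:
            return "forward"
        if observation > self.goal:
            return "back"
        return "stop"


def run_episode(agent, env, max_steps=100):
    """Alternate agent actions and environment feedback until the agent stops."""
    observation = env.position
    trajectory = [observation]
    for _ in range(max_steps):
        action = agent.act(observation)
        if action == "stop":
            break
        observation = env.step(action)
        trajectory.append(observation)
    return trajectory
```

For example, `run_episode(ToyAgent(goal=3), ToyEnvironment())` walks the agent from position 0 to position 3 and then stops. Keeping the goal on the agent side means the same environment can be reused for any task the researchers choose.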
Current limitations
Genie 3 currently has several limitations:
Limited action space: The range of actions an agent can perform directly is still constrained.
Lack of multi-agent simulation: Accurately simulating complex interactions between multiple independent agents remains difficult.
Insufficient geographic accuracy: Genie 3 cannot reproduce real-world locations with perfect geographic accuracy.
Poor text rendering: Text in generated worlds is often blurry unless it is specified in the initial prompt.
Limited interaction duration: The model currently supports a few minutes of continuous interaction, not hours.