Scientists turn mouse brain activity into movies, reconstructing visual scenes with high precision

For the first time, scientists have reconstructed a 10-second video clip watched by a mouse using only the animal's brain activity, opening a new window onto how the brain encodes and processes visual experience. The work comes from a research team led by University College London (UCL), and the paper was recently published in the journal eLife.

In recent years, neuroscience has continued to ask how the brain pieces together the world we see from the signals received by the eyes. Past studies mostly showed volunteers images or videos while they lay in imaging equipment such as functional magnetic resonance imaging (fMRI) scanners, and then tried to decode the visual information from brain activity down to the level of individual pixels. The new work shares this general goal, but records the activity of the visual cortex in mice at single-cell resolution to obtain a more detailed picture of the brain's visual representation.

Using only activity data from the mice's visual cortex, the team was able to reconstruct video clips the mice had previously viewed with surprising quality. The paper's first author, Joel Bauer of the Sainsbury Wellcome Centre at UCL, said the team wanted to find a more general and realistic way to explore how the brain makes sense of what it sees. Many existing methods can only make inferences under specific conditions or for specific stimuli, and are difficult to generalize to more natural and complex visual scenes, whereas the new method tries to capture directly what the brain is representing and compare it with reality.

On the technical side, the research team adopted a "dynamic neural encoding model." The model, originally developed by another team for the 2023 Sensorium competition, predicts how strongly each neuron responds as a mouse watches a movie, taking into account factors such as the animal's spontaneous movements and pupil diameter. The UCL team built on this model using the same dataset: they compared two sets of neuronal activity, namely the activity the model predicts for an initially "blank" screen and the real activity measured with microscopic imaging. This imaging method identifies which neurons are active at a given moment from changes in the local calcium concentration inside each cell.

To reconstruct a clip, the researchers start from a "blank movie" and let the algorithm adjust each pixel step by step until the activity the model predicts for the generated video closely matches the activity recorded while the mice watched the real one. With the trained model, the team could reconstruct video clips about 10 seconds long from the brain activity of mice watching a new video. Notably, the videos used for reconstruction were not part of the model's training data, which better demonstrates the generality of the method.
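The idea of starting from a blank frame and adjusting pixels can be illustrated with a deliberately simplified sketch. It assumes a toy linear encoding model and noise-free "recordings"; the study's actual model is a far more complex network, and every name below (`predict`, `reconstruct`, `weights`) is hypothetical and chosen only for illustration.

```python
import random

def predict(weights, pixels):
    """Toy linear encoding model: one predicted response per neuron."""
    return [sum(w * p for w, p in zip(row, pixels)) for row in weights]

def reconstruct(weights, recorded, n_pixels, steps=3000, lr=0.05):
    """Start from a blank (all-zero) frame and adjust every pixel by gradient
    descent until the predicted activity matches the recorded activity."""
    pixels = [0.0] * n_pixels
    n_neurons = len(weights)
    for _ in range(steps):
        # Mismatch between predicted and recorded activity, one value per neuron
        errors = [p - r for p, r in zip(predict(weights, pixels), recorded)]
        for j in range(n_pixels):
            # Gradient of the mean squared error with respect to pixel j
            grad = 2.0 * sum(errors[i] * weights[i][j] for i in range(n_neurons)) / n_neurons
            pixels[j] -= lr * grad
    return pixels

random.seed(0)
n_pixels, n_neurons = 4, 20
# Assumption: random weights stand in for a trained encoding model
weights = [[random.gauss(0, 1) for _ in range(n_pixels)] for _ in range(n_neurons)]
true_frame = [0.2, 0.8, 0.5, 0.1]
# Assumption: noise-free model output stands in for measured neural activity
recorded = predict(weights, true_frame)
estimate = reconstruct(weights, recorded, n_pixels)
print([round(p, 3) for p in estimate])
```

Because this toy has more neurons than pixels and no noise, the estimate converges to the true frame. With fewer neurons the problem becomes underdetermined, which fits the article's point that adding data from more neurons improves the reconstruction.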

Bauer noted that the detail of the reconstructed videos improved significantly when data from more individual neurons were included, underscoring the importance of obtaining more comprehensive neural recordings. To evaluate the reconstructions, the team used a pixel correlation metric, comparing each pixel of each frame in the original movie with the corresponding pixel in the reconstructed movie. The results show that the two differ only modestly in space and time, indicating that this kind of "movie translation" from brain activity can achieve very high accuracy.
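A pixel-by-pixel comparison of this kind can be sketched as a Pearson correlation computed over all pixels of all frames. This is an assumption about the general form of such a metric, not the study's exact procedure (which may, for example, be computed per frame or after preprocessing); the function name `pixel_correlation` and the tiny example movies are hypothetical.

```python
import math

def pixel_correlation(original, reconstructed):
    """Pearson correlation between two movies, compared pixel by pixel.

    Each movie is a list of frames; each frame is a list of rows of pixel values.
    """
    a = [p for frame in original for row in frame for p in row]
    b = [p for frame in reconstructed for row in frame for p in row]
    if len(a) != len(b):
        raise ValueError("movies must have the same shape")
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

# Two 2x2 frames of made-up grayscale values
original = [[[0.0, 1.0], [0.5, 0.2]], [[0.9, 0.1], [0.4, 0.6]]]
# A dimmer, lower-contrast copy keeps the same spatial and temporal pattern
dimmer = [[[0.5 * p + 0.1 for p in row] for row in frame] for frame in original]
# A contrast-inverted copy has the opposite pattern
inverted = [[[1.0 - p for p in row] for row in frame] for frame in original]

print(pixel_correlation(original, dimmer))
print(pixel_correlation(original, inverted))
```

Correlation ignores overall brightness and contrast, so the dimmer copy scores close to 1 and the inverted copy close to -1. A metric like this rewards getting the pattern of light and dark right rather than the absolute pixel values.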

In the future, the researchers plan to collect brain data with higher resolution and wider coverage to support sharper reconstructions of larger visual scenes. In terms of applications, they particularly hope to use the technique to explore the gap between perception and reality: why and how the visual representation in the brain deviates from the objective image in front of us. Bauer pointed out that the human brain does not hold a perfectly faithful "copy of the world." Visual information is selectively amplified, compressed or distorted as it is transmitted and processed. This deviation is not a simple error, but a functional mechanism the brain uses to interpret and enhance what it perceives.

This study of movie reconstruction in mice lays the foundation for similar work in more complex animals, and eventually in humans. As imaging technology, computational models and data analysis methods continue to develop, scientists may gain a deeper understanding of how we "see" the world, and the work is expected to provide new theoretical support for the diagnosis and treatment of visual impairments, for brain-computer interfaces, and for immersive artificial perception systems.