Estimating the three-dimensional structure of the human body from real scenes is a challenging task and is of great significance to fields such as artificial intelligence, graphics, and human-computer interaction. However, existing 3D human pose estimation datasets are usually collected under controlled conditions with static backgrounds and fail to represent the diversity of real-world scenes, thus limiting the development of accurate models for real-world applications.
Existing datasets such as Human3.6M and HuMMan are widely used for 3D human pose estimation, but they are collected in controlled laboratory environments and cannot fully capture the complexity of real-world settings: they are limited in scene diversity, human motion, and scalability. Researchers have proposed various models for 3D human pose estimation, but when these models are applied to real scenes, their effectiveness is often hindered by the limitations of existing datasets.
A research team in China has released "FreeMan", a project jointly developed by the Chinese University of Hong Kong (Shenzhen), Tencent, and other institutions. It is an innovative multi-view dataset that aims to bring new breakthroughs to the field of 3D human pose estimation.
FreeMan is a novel large-scale multi-view dataset designed to address the limitations of existing datasets for 3D human pose estimation in real scenes, with the goal of facilitating the development of more accurate and robust models.
A defining characteristic of FreeMan is its size and diversity. The dataset consists of synchronized recordings from 8 smartphones across 10 different scenarios and 27 real venues, totaling more than 11 million video frames. Each scene is captured under varying lighting conditions, making the dataset a unique resource.
The FreeMan dataset is open source, promoting the development of large-scale pre-training datasets and providing a new benchmark for outdoor 3D human pose estimation. Beyond the videos themselves, it offers rich annotations, including 2D and 3D human keypoints, SMPL parameters, and bounding boxes, giving researchers ample resources to advance work in related fields.
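To make these annotation types concrete, the snippet below sketches what a single annotated frame could look like. The layout and field names are hypothetical and may not match FreeMan's released file format; only the kinds of annotations (2D/3D keypoints, SMPL parameters, bounding box) come from the description above.

```python
import numpy as np

# Hypothetical per-frame annotation record; field names and layout are
# illustrative only and may differ from FreeMan's actual files.
frame_annotation = {
    "keypoints_2d": np.zeros((17, 2)),        # 2D keypoints (x, y) in one view
    "keypoints_3d": np.zeros((17, 3)),        # 3D keypoints in world coordinates
    "smpl": {
        "betas": np.zeros(10),                # SMPL shape parameters
        "pose": np.zeros(72),                 # SMPL pose (24 joints x 3, axis-angle)
    },
    "bbox": np.array([0.0, 0.0, 0.0, 0.0]),   # person bounding box (x, y, w, h)
}

for name, value in frame_annotation.items():
    print(name, getattr(value, "shape", value))
```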
It is worth noting that FreeMan introduces variation in camera parameters and human scale to make the data more representative. The research team developed an automated annotation pipeline to efficiently generate accurate 3D annotations from the collected data; the pipeline covers human detection, 2D keypoint detection, 3D pose estimation, and mesh annotation. The resulting dataset is valuable for a variety of tasks, including monocular 3D estimation, 2D-to-3D pose lifting, multi-view 3D estimation, and neural rendering of human subjects.
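As a rough sketch of how such a pipeline chains its stages, the code below wires detection, 2D keypoint estimation, multi-view 3D pose recovery, and SMPL mesh fitting together. Every function here is a stand-in stub, not the FreeMan team's actual tooling.

```python
import numpy as np

# Stub stages; a real pipeline would call detection, pose, and SMPL-fitting models.
def detect_humans(frame):
    return [np.array([0.0, 0.0, 100.0, 200.0])]            # dummy (x, y, w, h) box

def detect_keypoints_2d(frame, box):
    return np.zeros((17, 2))                                # dummy 2D keypoints

def triangulate_3d_pose(kp2d_per_view, cameras):
    return np.zeros((17, 3))                                # dummy 3D joints

def fit_smpl_mesh(kp3d):
    return {"betas": np.zeros(10), "pose": np.zeros(72)}    # dummy SMPL fit

def annotate_frame(views, cameras):
    """Run the four annotation stages on one set of synchronized views."""
    boxes = [detect_humans(v)[0] for v in views]                       # 1. detection
    kp2d = [detect_keypoints_2d(v, b) for v, b in zip(views, boxes)]   # 2. 2D keypoints
    kp3d = triangulate_3d_pose(kp2d, cameras)                          # 3. 3D pose
    smpl = fit_smpl_mesh(kp3d)                                         # 4. mesh annotation
    return {"bbox": boxes, "kp2d": kp2d, "kp3d": kp3d, "smpl": smpl}

# Example: 8 synchronized smartphone views of one frame.
result = annotate_frame([np.zeros((720, 1280, 3))] * 8, cameras=None)
```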
The researchers provide comprehensive evaluation baselines on FreeMan across a variety of tasks, comparing models trained on FreeMan with models trained on Human3.6M and HuMMan. Notably, the model trained on FreeMan performed significantly better when tested on the 3DPW dataset, highlighting FreeMan's superior generalization to real-world scenarios.
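The article does not name the evaluation metric, but work in this area commonly reports mean per-joint position error (MPJPE). A minimal sketch of that metric, assuming predicted and ground-truth joints are given in the same units (typically millimeters):

```python
import numpy as np

def mpjpe(pred, gt):
    """Mean per-joint position error.

    pred, gt: arrays of shape (num_frames, num_joints, 3).
    Poses are root-aligned (pelvis subtracted) before comparison,
    a common convention in 3D pose evaluation.
    """
    pred = pred - pred[:, :1, :]
    gt = gt - gt[:, :1, :]
    return np.linalg.norm(pred - gt, axis=-1).mean()

# Example: error between random predictions and ground truth for 100 frames.
print(mpjpe(np.random.randn(100, 17, 3), np.random.randn(100, 17, 3)))
```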
In the multi-view 3D human pose estimation experiments, models trained on FreeMan generalized better to cross-domain test sets than models trained on Human3.6M. The results consistently demonstrate the advantages of FreeMan's diversity and scale.
In the 2D-to-3D pose lifting experiments, FreeMan's difficulty is evident: models face a harder task on this dataset than on lab-collected data. However, performance improved when the model was trained on the full FreeMan training set, showing the dataset's potential to strengthen models as more of it is used.
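To give a sense of what the 2D-to-3D lifting task involves, here is a toy lifting network in PyTorch. It is an illustrative baseline, not the model used in the FreeMan experiments; the joint count and layer widths are assumptions.

```python
import torch
import torch.nn as nn

class LiftingMLP(nn.Module):
    """Toy 2D-to-3D pose lifting network: maps J 2D keypoints to J 3D joints."""
    def __init__(self, num_joints=17, hidden=1024):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(num_joints * 2, hidden), nn.ReLU(), nn.Dropout(0.25),
            nn.Linear(hidden, hidden), nn.ReLU(), nn.Dropout(0.25),
            nn.Linear(hidden, num_joints * 3),
        )

    def forward(self, kp2d):                # kp2d: (batch, J, 2)
        batch = kp2d.shape[0]
        out = self.net(kp2d.flatten(1))     # (batch, J * 3)
        return out.view(batch, -1, 3)       # (batch, J, 3)

# Example: lift a batch of 4 poses with 17 joints each.
model = LiftingMLP()
pred3d = model(torch.randn(4, 17, 2))       # -> shape (4, 17, 3)
```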
The availability of FreeMan is expected to drive advances in the fields of human body modeling, computer vision and human-computer interaction, bridging the gap between controlled laboratory conditions and real-life scenarios.