A Virtual Dataset of Human Bodies

December 12, 2012

Some background

As everyone working on biometrics and computer vision knows, the availability of data is fundamental for research. For me, this was entirely new when I started working on biometrics in 2010. I came from a telecommunications environment, where we rely heavily on Monte Carlo methods to simulate channel distortion, so collecting real data was unfamiliar territory. Given the interest of the biometric community, and mainly of the funding agencies (FBI, DoD), human shape analysis was my main research focus. In 2011/2012 I was able to collect some interesting data with the Microsoft Kinect and a NIR (near-infrared) camera, together with some anthropometric measurements, but the collection covered only around 130 subjects, mostly undergrad students with some adults. These kinds of collections are expensive and time-consuming, two constraints that are often a big deal in small universities with small budgets and students who need to finish because funding is running out. So, on my own, I came up with a solution that kept my research going and spared me from changing my Ph.D. topic.

Body models and human shape in computer vision and gaming

The human body is quite an interesting object: it is composed of rigid and non-rigid material, it has many degrees of freedom, and its appearance can change with age, gender, race, health status, and lifestyle. An enormous amount of work on it can be found in both modern and past literature; Leonardo da Vinci’s Vitruvian Man is just one example. In the last century, human body studies focused more on health assessment or recognition, applying different statistical and more physiologically based techniques. However, in a data-driven approach, collecting human data for a statistically significant number of individuals is fundamental to avoiding biased results. The only dataset with a good number of subjects is the CAESAR dataset. However, it is not free, and it contains only 2400 individuals.

We can, however, leverage the NHANES dataset from the CDC, which collects many anthropometric measurements and statistics of the American population across different years. My initial thought was: if we can build bodies from these measurements, we can replicate many analyses on real subjects with far less money and resources. My study focused on works like SCAPE: Shape Completion and Animation of People, and on skinned body models. Unfortunately, this was before the enormous work done by Michael Black on human shape and human pose. However, Black’s goals were slightly different from mine, and the only work relevant to my research became the basis for a startup, now acquired by Amazon.

Another surprise came with the discovery of character modeling for games. This area is definitely closer to the gaming and developer communities than to the scientific and research communities. I was attracted by the open source MakeHuman. Before that, I was considering some commercial software, but the cost and the scarcity of funds (in those years I was close to being unfunded! maybe I’ll write a blog….) made me lean toward the open source option, where I could get my hands dirty with the source code.

MakeHuman

MakeHuman is an open source library for prototyping mesh characters. The contributions of many developers have made MakeHuman extremely stable, with multiple additions and plugins. MakeHuman is unique because it is written in Python, using common Python libraries and just a few dependencies. The code is completely modular, so additional plugins and classes can be written with little effort.

Unfortunately, MakeHuman was designed to generate a single character at a time, relying on other software for animation. However, I developed an efficient pipeline capable of generating many bodies automatically.

Body Generator

The main differences between my body generator and the other solutions (Shotton et al.¹, Buys et al.²) come from two fundamental design goals.

The distribution of the generated population needs to be close enough to the distribution of the real body population. Shotton et al.’s generator focuses on body poses, so its goal is to generate mesh poses similar to real poses. The generator uses the CMU MoCAP dataset to re-target the mesh to a new pose. However, only a limited number of subjects is used. This biases the algorithm toward average-size subjects, as we can deduce from the Kinect v1 specs. Our goal instead is to generate a large variety of body shapes as anthropometric measurements and body composition change. We use the anthropometric measurements of real subjects from the NHANES dataset as the target distribution for the new population. We feed these measurements to the generator, represented by the building blocks in the figure. A final measuring tool compares the anthropometric measurements of the generated population with those of the original population. In a later iteration of the system, we included a feedback loop to tune the body measurements closer to the original measurements.

We don’t generate multiple body poses, but we have a virtual camera environment that can take multiple views of the body rendering.
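The measure-and-compare feedback loop from the first design goal can be illustrated with a minimal sketch. It does not use the MakeHuman API: `toy_generate_and_measure` is a hypothetical stand-in for "build the mesh, then measure it", and the bias it applies, the tolerance, and the measurement names are all assumptions chosen only for illustration.

```python
from typing import Callable, Dict, Tuple

Measurements = Dict[str, float]


def toy_generate_and_measure(params: Measurements) -> Measurements:
    """Hypothetical stand-in for generating a mesh and measuring it.

    It pretends the generated body comes out with a systematic bias
    relative to the requested parameters (an assumption, not real data).
    """
    return {name: 0.9 * value + 2.0 for name, value in params.items()}


def fit_to_target(
    target: Measurements,
    generate_and_measure: Callable[[Measurements], Measurements],
    tolerance: float = 0.1,
    max_iterations: int = 50,
    gain: float = 1.0,
) -> Tuple[Measurements, Measurements]:
    """Adjust generator inputs until the measured body matches the target."""
    params = dict(target)  # start by requesting the target values directly
    for _ in range(max_iterations):
        measured = generate_and_measure(params)
        errors = {name: target[name] - measured[name] for name in target}
        if max(abs(error) for error in errors.values()) <= tolerance:
            break
        # Proportional correction: push each input by its measured error.
        for name, error in errors.items():
            params[name] += gain * error
    return params, generate_and_measure(params)


# Example target for one subject (values are made up for the sketch).
target = {"stature_cm": 175.0, "waist_cm": 90.0}
tuned_params, measured = fit_to_target(target, toy_generate_and_measure)
```

A simple proportional correction is enough here because the toy generator is nearly linear; a real mesh generator may need a smaller `gain` or a different update rule.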

Results

VirtualBody: Datasets Generated

We generated two datasets, which can satisfy most real-world scenarios.

Virtual NHANES Dataset:

Based on 500 real NHANES measurements, 12500 total shapes.
25 shapes per family: 5 values of weight and 5 values of fat percentage.
Each family: same stature, but very fine step variations in body shape.
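The family structure above (5 weights × 5 fat percentages, fixed stature) is a small grid per subject. The sketch below shows one way to build such a grid; the step sizes, the field names, and the choice to center the grid on the real subject are assumptions made for illustration, not the values used for the dataset.

```python
from itertools import product
from typing import Dict, List


def build_family(
    stature_cm: float,
    base_weight_kg: float,
    base_fat_pct: float,
    weight_step_kg: float = 1.0,  # assumed step, for illustration only
    fat_step_pct: float = 1.0,    # assumed step, for illustration only
    steps: int = 5,
) -> List[Dict[str, float]]:
    """Return steps x steps shape specifications sharing one stature."""
    # For steps=5 the offsets are -2, -1, 0, 1, 2 around the base value.
    offsets = [index - steps // 2 for index in range(steps)]
    return [
        {
            "stature_cm": stature_cm,
            "weight_kg": base_weight_kg + weight_offset * weight_step_kg,
            "fat_pct": base_fat_pct + fat_offset * fat_step_pct,
        }
        for weight_offset, fat_offset in product(offsets, offsets)
    ]


# One hypothetical subject; 500 subjects would give 500 * 25 = 12500 shapes.
family = build_family(stature_cm=170.0, base_weight_kg=70.0, base_fat_pct=20.0)
```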

Virtual Random Dataset:

19995 shapes.
The model parameters are chosen at random.
Each parameter is drawn from a defined distribution over a defined range of values.
The result is a population with a wide diversity of shapes.
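Drawing parameters from defined distributions with defined ranges can be sketched as follows. The parameter names, the distributions, and the ranges are hypothetical; only the population size (19995) comes from the dataset description. Out-of-range Gaussian draws are rejected and redrawn, which is one of several reasonable ways to enforce a range.

```python
import random
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class ParameterSpec:
    kind: str          # "uniform" or "gauss"
    low: float         # inclusive lower bound
    high: float        # inclusive upper bound
    mean: float = 0.0  # used only by "gauss"
    std: float = 1.0   # used only by "gauss"


# Hypothetical parameters on a normalized 0..1 scale.
SPECS: Dict[str, ParameterSpec] = {
    "stature": ParameterSpec("uniform", 0.0, 1.0),
    "weight": ParameterSpec("gauss", 0.0, 1.0, mean=0.5, std=0.2),
    "fat": ParameterSpec("gauss", 0.0, 1.0, mean=0.5, std=0.2),
}


def sample_parameter(spec: ParameterSpec, rng: random.Random) -> float:
    """Draw one value that respects the distribution and its range."""
    if spec.kind == "uniform":
        return rng.uniform(spec.low, spec.high)
    if spec.kind == "gauss":
        while True:  # rejection sampling keeps the value inside the range
            value = rng.gauss(spec.mean, spec.std)
            if spec.low <= value <= spec.high:
                return value
    raise ValueError(f"unknown distribution kind: {spec.kind}")


def sample_population(
    size: int, specs: Dict[str, ParameterSpec], seed: int = 0
) -> List[Dict[str, float]]:
    """Sample `size` parameter sets; a fixed seed makes the run repeatable."""
    rng = random.Random(seed)
    return [
        {name: sample_parameter(spec, rng) for name, spec in specs.items()}
        for _ in range(size)
    ]


population = sample_population(19995, SPECS)
```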

¹ J. Shotton et al., “Real-time human pose recognition in parts from single depth images,” in CVPR, 2011.
² D. Van Deun, V. Verhaert, K. Buys, B. Haex, and J. Vander Sloten, “Automatic generation of personalized human models based on body measurements,” 2011.