Recovering a person's height from a single image is important for virtual garment fitting, autonomous driving, and surveillance. However, it is also very challenging without absolute scale information. Here, we examine the rarely addressed case where camera parameters and scene geometry are both unknown. Under these circumstances, scale is inherently ambiguous, and height can only be inferred from statistics that are intrinsic to human anatomy and can be estimated directly from images, such as articulated pose, bone-length proportions, and facial features. Our contribution is twofold. First, we create a new human-height dataset that is three orders of magnitude larger than existing ones, by mining explicit height labels and propagating them to additional images through face recognition and assignment consistency. Second, we test a wide range of machine learning models (linear, shallow, and deep) to capture the relation between image content and human height. We also show that performance is predominantly limited by dataset size. Our central finding is that height can only be estimated with large uncertainty. The remaining high variance demonstrates that the geometrically motivated scale ambiguity persists into the age of deep learning, which has important implications for how to pose monocular reconstruction tasks, such as 3D human pose estimation, in a scale-invariant way.
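The scale ambiguity mentioned above can be illustrated with a minimal sketch, assuming an ideal pinhole camera. This is not the authors' code: the `project` helper, the focal length, and the person's dimensions are all hypothetical values chosen for illustration. A person and a copy scaled by the same factor in size and distance produce identical image coordinates, so geometry alone cannot recover absolute height.

```python
import numpy as np

def project(points_3d, focal_length):
    """Ideal pinhole projection of Nx3 camera-frame points (Z > 0) to Nx2 image coordinates."""
    points_3d = np.asarray(points_3d, dtype=float)
    return focal_length * points_3d[:, :2] / points_3d[:, 2:3]

# Hypothetical person: feet and head positions in metres, 1.6 m tall, 4 m from the camera.
person = np.array([[0.0, 0.0, 4.0],
                   [0.0, 1.6, 4.0]])

# Scale height and distance by the same factor: now 2.0 m tall, 5 m away.
scale = 1.25
scaled_person = scale * person

focal_length = 1000.0  # assumed focal length in pixels
image_a = project(person, focal_length)
image_b = project(scaled_person, focal_length)

# Both people occupy exactly the same pixels.
same_image = np.allclose(image_a, image_b)
print(same_image)
```

Because the two projections coincide, any height estimate has to rely on appearance cues such as body proportions, which is the setting the abstract describes.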
Touradj Ebrahimi, Yuhang Lu, Zewei Xu