Vid2Actor: Free-viewpoint Animatable Person Synthesis from Video in the Wild
Chung-Yi Weng, Brian Curless, Ira Kemelmacher-Shlizerman
University of Washington
Given an "in-the-wild" video, we train a deep network on the video frames to produce an animatable human representation that can be rendered from any camera view in any body pose, enabling applications such as motion re-targeting and bullet-time rendering without the need for rigged 3D meshes. Here we rebuild a 3D animatable Roger Federer from a video of the 2015 US Open Final. (AP Photo/Bill Kostroun)
Supplementary Video
Abstract
Given an "in-the-wild" video of a person, we reconstruct an animatable model of the person in the video. The output model can be rendered in any body pose to any camera view, via the learned controls, without explicit 3D mesh reconstruction. At the core of our method is a volumetric 3D human representation reconstructed with a deep network trained on input video, enabling novel pose/view synthesis. Our method is an advance over GAN-based image-to-image translation since it allows image synthesis for any pose and camera via the internal 3D representation, while at the same time it does not require a pre-rigged model or ground truth meshes for training, as in mesh-based learning. Experiments validate the design choices and yield results on synthetic data and on real videos of diverse people performing unconstrained activities (e.g. dancing or playing tennis). Finally, we demonstrate motion re-targeting and bullet-time rendering with the learned models.
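The core idea can be sketched with a toy example: a learned color/density volume is queried at points sampled along camera rays and alpha-composited into pixels, and because the whole pipeline is differentiable it can be supervised directly with video frames. The sketch below is only an illustration under simplifying assumptions, not the Vid2Actor implementation: the names (CanonicalVolume, render_rays), the voxel resolution, and the omission of pose-dependent warping (Vid2Actor additionally conditions the volume on body pose) are all hypothetical.

# Minimal sketch (assumed names, not the authors' code): render a learned
# RGB + density voxel grid by sampling along camera rays and
# alpha-compositing, then regress the rendered pixels to video pixels.
import torch
import torch.nn as nn
import torch.nn.functional as F

class CanonicalVolume(nn.Module):
    """Learnable RGB + density voxel grid (a stand-in for the learned volume)."""
    def __init__(self, res=32):
        super().__init__()
        # Shape (1, 4, D, H, W): 3 color channels + 1 density channel.
        self.grid = nn.Parameter(torch.zeros(1, 4, res, res, res))

    def forward(self, pts):
        # pts: (N, 3) points in normalized volume coordinates, in [-1, 1].
        samples = F.grid_sample(
            self.grid,
            pts.view(1, -1, 1, 1, 3),           # (1, N, 1, 1, 3)
            align_corners=True,
        ).view(4, -1).t()                        # (N, 4)
        rgb = torch.sigmoid(samples[:, :3])
        sigma = F.softplus(samples[:, 3])
        return rgb, sigma

def render_rays(volume, rays_o, rays_d, near=0.2, far=1.0, n_samples=64):
    """Alpha-composite color along each ray; rays_o, rays_d are (R, 3)."""
    t = torch.linspace(near, far, n_samples)                            # (S,)
    pts = rays_o[:, None, :] + rays_d[:, None, :] * t[None, :, None]    # (R, S, 3)
    rgb, sigma = volume(pts.reshape(-1, 3))
    rgb = rgb.view(-1, n_samples, 3)
    sigma = sigma.view(-1, n_samples)
    delta = (far - near) / n_samples
    alpha = 1.0 - torch.exp(-sigma * delta)                             # (R, S)
    # Transmittance: probability the ray reaches each sample unoccluded.
    trans = torch.cumprod(
        torch.cat([torch.ones_like(alpha[:, :1]), 1.0 - alpha + 1e-10], dim=-1),
        dim=-1,
    )[:, :-1]
    weights = alpha * trans                                             # (R, S)
    return (weights[..., None] * rgb).sum(dim=1)                        # (R, 3)

if __name__ == "__main__":
    # Toy usage: render a small batch of rays and fit them to target pixels.
    vol = CanonicalVolume(res=32)
    rays_o = torch.zeros(128, 3)
    rays_d = F.normalize(torch.randn(128, 3), dim=-1)
    target = torch.rand(128, 3)          # stand-in for pixels from a video frame
    pred = render_rays(vol, rays_o, rays_d)
    loss = F.mse_loss(pred, target)
    loss.backward()
    print("rendered:", tuple(pred.shape), "loss:", float(loss))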
Paper (arXiv)
BibTex
@article{weng2020vid2actor,
  title={Vid2Actor: Free-viewpoint Animatable Person Synthesis from Video in the Wild},
  author={Weng, Chung-Yi and Curless, Brian and Kemelmacher-Shlizerman, Ira},
  journal={arXiv preprint arXiv:2012.12884},
  year={2020}
}