Vid2Actor: Free-viewpoint Animatable Person Synthesis from Video in the Wild
Chung-Yi Weng, Brian Curless, Ira Kemelmacher-Shlizerman
University of Washington
Given an "in-the-wild" video, we train a deep network on the video frames to produce an animatable human representation that can be rendered from any camera view in any body pose, enabling applications such as motion re-targeting and bullet-time rendering without the need for rigged 3D meshes. Here we rebuild a 3D animatable Roger Federer from a video of the 2015 US Open Final. (AP Photo/Bill Kostroun)
Abstract

Given an "in-the-wild" video of a person, we reconstruct an animatable model of the person in the video. The output model can be rendered in any body pose from any camera view, via the learned controls, without explicit 3D mesh reconstruction. At the core of our method is a volumetric 3D human representation reconstructed with a deep network trained on the input video, enabling novel pose/view synthesis. Our method is an advance over GAN-based image-to-image translation since it allows image synthesis for any pose and camera via the internal 3D representation, while at the same time it does not require a pre-rigged model or ground truth meshes for training, as in mesh-based learning. Experiments validate the design choices and yield results on synthetic data and on real videos of diverse people performing unconstrained activities (e.g., dancing or playing tennis). Finally, we demonstrate motion re-targeting and bullet-time rendering with the learned models.
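The abstract describes rendering a volumetric 3D representation from any camera view. The paper's network architecture is not detailed here, but volumetric representations are typically rendered by compositing density and color samples along camera rays. As an illustrative sketch only (the function name and sampling setup are assumptions, not the paper's implementation), the standard volume-rendering composite looks like:

```python
import numpy as np

def composite_ray(densities, colors, deltas):
    """Alpha-composite samples along one camera ray (standard volume rendering).

    densities: (N,) non-negative volume densities at N samples along the ray
    colors:    (N, 3) RGB color predicted at each sample
    deltas:    (N,) distances between consecutive samples
    Returns the rendered RGB value for the ray.
    """
    # Per-sample opacity from density and sample spacing.
    alphas = 1.0 - np.exp(-densities * deltas)
    # Transmittance: fraction of light reaching each sample unoccluded.
    trans = np.cumprod(1.0 - alphas + 1e-10)
    trans = np.concatenate([[1.0], trans[:-1]])
    # Each sample contributes in proportion to opacity times transmittance.
    weights = alphas * trans
    return (weights[:, None] * colors).sum(axis=0)

# A single, effectively opaque red sample renders as pure red.
rgb = composite_ray(np.array([1e3]),
                    np.array([[1.0, 0.0, 0.0]]),
                    np.array([1.0]))
```

In a pose-conditioned system, the densities and colors at each sample would come from a network queried with the 3D point and the target body pose; the compositing step itself is what makes the representation renderable from arbitrary viewpoints.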
Paper

Paper (arXiv) | Code (Coming Soon)
Acknowledgments

This work was supported by the UW Reality Lab, Facebook, Google, Futurewei, and Amazon. We thank all of the photo owners for allowing us to use their photos. The training videos were downloaded from the US Open Tennis Championships and Official World of Dance YouTube channels. Photo credits: Reuters/Carlo Allegri, Getty Images/Clive Brunskill, Getty Images/Maddie Meyer, AP Photo/Bill Kostroun.