HumanNeRF:
Free-viewpoint Rendering of Moving People
from Monocular Video

CVPR 2022 (Oral)
Chung-Yi Weng¹   Brian Curless¹   Pratul P. Srinivasan²   Jonathan T. Barron²   Ira Kemelmacher-Shlizerman¹
¹University of Washington   ²Google Research

HumanNeRF transforms a YouTube video into a full 360-degree free-viewpoint video.

Abstract

We introduce a free-viewpoint rendering method -- HumanNeRF -- that works on a given monocular video of a human performing complex body motions, e.g. a video from YouTube. Our method enables pausing the video at any frame and rendering the subject from arbitrary new camera viewpoints or even a full 360-degree camera path for that particular frame and body pose. This task is particularly challenging, as it requires synthesizing photorealistic details of the body, as seen from various camera angles that may not exist in the input video, as well as synthesizing fine details such as cloth folds and facial appearance. Our method optimizes for a volumetric representation of the person in a canonical T-pose, in concert with a motion field that maps the estimated canonical representation to every frame of the video via backward warps. The motion field is decomposed into skeletal rigid and non-rigid motions, produced by deep networks. We show significant performance improvements over prior work, and compelling examples of free-viewpoint renderings from monocular video of moving humans in challenging uncontrolled capture scenarios.
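To make the decomposition described above concrete, the sketch below illustrates how a point sampled along a camera ray in observation space might be backward-warped into the canonical T-pose: first by a weighted blend of per-bone rigid transforms, then by a small learned non-rigid offset, before the canonical volume is queried for color and density. This is a minimal illustration rather than the authors' implementation; the names canonical_nerf, skeletal_weights, bone_transforms, and nonrigid_mlp are hypothetical placeholders.

```python
# Minimal sketch (not the authors' code) of the backward-warp composition:
# observation-space point -> skeletal rigid warp -> non-rigid offset -> canonical NeRF query.
import numpy as np

def skeletal_backward_warp(x_obs, bone_transforms, skeletal_weights):
    """Blend per-bone rigid transforms (R, t) to map a point to the canonical pose."""
    x_can = np.zeros(3)
    for (R, t), w in zip(bone_transforms, skeletal_weights(x_obs)):
        x_can += w * (R @ x_obs + t)          # weighted blend of rigid motions
    return x_can

def warp_to_canonical(x_obs, bone_transforms, skeletal_weights, nonrigid_mlp, pose):
    """Skeletal rigid warp followed by a pose-conditioned non-rigid correction."""
    x_skel = skeletal_backward_warp(x_obs, bone_transforms, skeletal_weights)
    return x_skel + nonrigid_mlp(x_skel, pose)  # small offset, e.g. for cloth folds

def render_sample(x_obs, canonical_nerf, **warp_kwargs):
    """Query the canonical volume (color, density) at the backward-warped point."""
    x_can = warp_to_canonical(x_obs, **warp_kwargs)
    return canonical_nerf(x_can)               # (rgb, sigma) used in volume rendering
```

Treating the non-rigid term as a small correction on top of the skeletal warp is what allows fine details such as cloth folds to be captured while the overall body pose is handled by the rigid bone transforms.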

Video

Results

Data in the wild

We demonstrate our method on YouTube videos (story, invisible, way2sexy) as well as self-captured data (rugby, hoodie). For each sequence, we first show the input video frames, then free-viewpoint renderings, and finally the synthesized results.

ZJU-MoCap (Single View)

We show results optimized on the ZJU-MoCap dataset, using video frames from only a single camera as input. All frames in the videos below are synthesized.

Resources

BibTeX

@InProceedings{weng_humannerf_2022_cvpr,
    title     = {Human{N}e{RF}: Free-Viewpoint Rendering of Moving People From Monocular Video},
    author    = {Weng, Chung-Yi and 
                 Curless, Brian and 
                 Srinivasan, Pratul P. and 
                 Barron, Jonathan T. and 
                 Kemelmacher-Shlizerman, Ira},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2022},
    pages     = {16210-16220}
}

Acknowledgement

We thank Marquese Scott for generously allowing us to feature his inspiring videos in this work. Special thanks to Lulu Chu for her enduring support and to Aaron Weng for reminding me to stay forever curious about the world.

This work was funded by the UW Reality Lab, Meta, Google, Futurewei, and Amazon.


Inspired by Michaël Gharbi and Jon Barron | Written by Chung-Yi Weng