We introduce a fully automatic pipeline for inferring depth, occlusion, and lighting/shadow information from image sequences of a scene. All of this information is extracted simply by observing people (and other objects, such as cars) moving through the scene, using them as passive scene probes. We also develop a tool for image compositing based on the inferred depth, occlusion ordering, lighting, and shadows. Please see more results in the supplementary video.


By analyzing the motion of people and other objects in a scene, we demonstrate how to infer depth, occlusion, lighting, and shadow information from video taken from a single camera viewpoint. This information is then used to composite new objects into the same scene with a high degree of automation and realism. In particular, when a user places a new object (2D cut-out) in the image, it is automatically rescaled, relit, and properly occluded, and it casts realistic shadows in the correct direction relative to the sun, conforming to the scene geometry. We demonstrate results (best viewed in the supplementary video) on a range of scenes and compare to alternative methods for depth estimation and shadow compositing.
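The rescaling and occlusion steps described above can be illustrated with a minimal NumPy sketch. This is purely illustrative and not the paper's implementation: the function names, the single-depth-per-object assumption, and the hard alpha mask are our simplifications. The idea is that under a pinhole camera apparent size scales inversely with depth, and a scene pixel occludes the cut-out whenever its inferred depth is nearer than the cut-out's placement depth.

```python
import numpy as np

def rescale_factor(place_depth, ref_depth):
    # Pinhole model: apparent size is inversely proportional to depth, so a
    # cut-out captured at ref_depth scales by ref_depth / place_depth when
    # placed at place_depth. (Illustrative simplification.)
    return ref_depth / place_depth

def composite(background, scene_depth, cutout, cutout_alpha, place_depth, top, left):
    """Paste `cutout` (h x w x 3) into `background` at (top, left).

    Scene pixels whose inferred depth is nearer than `place_depth`
    occlude the cut-out; farther pixels are covered by it.
    """
    out = background.copy()
    h, w = cutout.shape[:2]
    region = out[top:top + h, left:left + w]
    depth_region = scene_depth[top:top + h, left:left + w]
    # Visible where the scene is farther than the placed object, modulated
    # by the cut-out's alpha matte.
    visible = (depth_region > place_depth)[..., None] * cutout_alpha
    out[top:top + h, left:left + w] = visible * cutout + (1 - visible) * region
    return out
```

For example, with a toy 4x4 background whose depth map contains one near pixel, that pixel occludes the pasted cut-out while the rest of the cut-out remains visible.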


Full paper + Supplementary material: [High Res, 42.2 MB], [Low Res, 6.2 MB], ECCV 2020

arXiv: [Link]

@inproceedings{wang2020people,
  title={People as Scene Probes},
  author={Wang, Yifan and Curless, Brian L and Seitz, Steven M},
  booktitle={European Conference on Computer Vision (ECCV)},
  year={2020}
}


Inference code coming soon!


This work was supported by the UW Reality Lab, Facebook, Google, and Futurewei.


Yifan Wang
