Multi-View Stereo for Community Photo Collections

Michael Goesele, Noah Snavely, Brian Curless, Hugues Hoppe, Steven M. Seitz

CPC on flickr.com
reconstructed geometry model

Abstract

We present a multi-view stereo algorithm that addresses the extreme changes in lighting, scale, clutter, and other effects in large online community photo collections. Our idea is to intelligently choose images to match, both at a per-view and per-pixel level. We show that such adaptive view selection enables robust performance even with dramatic appearance variability. The stereo matching technique takes as input sparse 3D points reconstructed from structure-from-motion methods and iteratively grows surfaces from these points. Optimizing for surface normals within a photoconsistency measure significantly improves the matching results. While the focus of our approach is to estimate high-quality depth maps, we also show examples of merging the resulting depth maps into compelling scene reconstructions. We demonstrate our algorithm on standard multi-view stereo datasets and on casually acquired photo collections of famous scenes gathered from the Internet.
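The per-pixel view selection described above can be illustrated with a small sketch. It is not the paper's implementation. It assumes that candidate image patches have already been resampled into the reference view (the projection step is omitted), and it uses normalized cross-correlation (NCC), a common photoconsistency score that is invariant to gain and offset changes in brightness. The function names `ncc` and `select_views` and the parameters `min_score` and `max_views` are hypothetical and chosen for illustration only.

```python
import numpy as np


def ncc(a, b, eps=1e-8):
    """Normalized cross-correlation of two equally sized patches, in [-1, 1]."""
    a = np.asarray(a, dtype=float).ravel()
    b = np.asarray(b, dtype=float).ravel()
    a = a - a.mean()  # remove brightness offset
    b = b - b.mean()
    denom = np.linalg.norm(a) * np.linalg.norm(b)  # remove brightness gain
    if denom < eps:
        return 0.0  # textureless patch: no usable signal
    return float(np.dot(a, b) / denom)


def select_views(ref_patch, candidate_patches, min_score=0.6, max_views=4):
    """Return indices of the candidate views that best match the reference.

    Assumption: each candidate patch is already warped into the reference
    view. Views scoring below min_score are discarded (illustrative value).
    """
    scored = [(ncc(ref_patch, p), i) for i, p in enumerate(candidate_patches)]
    kept = [(s, i) for s, i in scored if s >= min_score]
    kept.sort(key=lambda t: (-t[0], t[1]))  # best score first, ties by index
    return [i for _, i in kept[:max_views]]


if __name__ == "__main__":
    ref = np.arange(25, dtype=float).reshape(5, 5)
    brighter = 2.0 * ref + 10.0  # same structure under different lighting
    inverted = -ref              # contradicting structure
    flat = np.ones((5, 5))       # no texture
    print(select_views(ref, [brighter, inverted, flat]))
```

In this toy setup only the patch that differs from the reference by a lighting change is kept, which is the kind of behavior one would want when photos of the same scene vary strongly in exposure.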

Publication

Multi-View Stereo for Community Photo Collections
Michael Goesele, Noah Snavely, Brian Curless, Hugues Hoppe, Steven M. Seitz
Proceedings of ICCV 2007, Rio de Janeiro, Brazil, October 14-20, 2007.

Team

Michael Goesele - University of Washington, TU Darmstadt
Noah Snavely - University of Washington
Brian Curless - University of Washington
Hugues Hoppe - Microsoft Research
Steven M. Seitz - University of Washington

Overview Talk

The Google Tech Talk "Navigating the World's Photographs" by Steve Seitz, Noah Snavely, and Michael Goesele gives a good overview of our current work on community photo collections (including multi-view stereo reconstruction). View a video of the talk on Google Video or download the video in Flash Video (FLV) format (114 MB) or AVI format (141 MB).

In the Press ...

NewScientist.com ran a story on this work on October 29th, 2007. Have a look at the article "Holiday snapshots used to model the world in 3D" by Will Knight, which explains the basic ideas behind the paper.

Datasets

The following datasets were reconstructed using the proposed multi-view stereo technique. Models were trimmed using standard mesh processing operations to remove spurious geometry introduced by the Poisson reconstruction approach.

Venus de Milo, Paris, France: reconstruction based on 129 images from Flickr.
Duomo in Pisa, Italy: reconstruction based on 56 images from Flickr captured by 8 photographers.
Notre Dame de Paris, France: reconstruction based on 653 images from Flickr captured by 313 photographers. The result movie below shows a reconstruction of the central portal based on the same dataset.
Temple and dino models from the multi-view stereo evaluation page:
    templeFull: reconstruction based on 312 images from the test set (0.42 mm accuracy, 98.2% completeness).
    dinoFull: reconstruction based on 363 images from the test set (0.46 mm accuracy, 96.7% completeness).
    More information is available at the multi-view stereo evaluation page.
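To give a feel for the accuracy and completeness figures quoted for the temple and dino models, here is a rough point-based sketch of how such metrics can be computed. It is an approximation under stated assumptions, not the benchmark's evaluation code: the benchmark compares surfaces, whereas this sketch compares point samples by brute force. The defaults assumed here are a 90% fraction for accuracy and a 1.25 mm threshold for completeness. The function names are hypothetical.

```python
import numpy as np


def nearest_distances(src, dst):
    """Distance from each point in src to its nearest point in dst (brute force)."""
    diff = src[:, None, :] - dst[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=2)).min(axis=1)


def accuracy(recon, truth, fraction=0.9):
    """Distance d such that `fraction` of reconstructed points lie within d
    of the ground truth (smaller is better)."""
    d = nearest_distances(recon, truth)
    return float(np.quantile(d, fraction))


def completeness(recon, truth, threshold=1.25):
    """Percentage of ground-truth points that lie within `threshold`
    of the reconstruction (larger is better)."""
    d = nearest_distances(truth, recon)
    return float((d <= threshold).mean() * 100.0)


if __name__ == "__main__":
    # Toy data in millimetres: the reconstruction misses one ground-truth point.
    truth = np.array([[0.0, 0, 0], [1, 0, 0], [2, 0, 0], [10, 0, 0]])
    recon = np.array([[0.0, 0, 0.1], [1, 0, 0.1], [2, 0, 0.1]])
    print(accuracy(recon, truth), completeness(recon, truth))
```

The two metrics look in opposite directions: accuracy measures how far the reconstruction strays from the ground truth, while completeness measures how much of the ground truth the reconstruction covers.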

Result Movie

The following movie shows a reconstruction of the central portal of Notre Dame cathedral in Paris. The reconstruction is based on 653 images from Flickr.

rendered model (click to play)

image of the portal from Flickr

Acknowledgements

We would like to thank all photographers who made their images available via Flickr.

More about Community Photo Collections ...

... can be found at the Community Photo Collections project page.