Building Rome in a Day


The Colosseum, 2,106 images, 819,242 points, Full resolution video

Entering the search term Rome on Flickr returns more than two million photographs. This collection represents an increasingly complete photographic record of the city, capturing every popular site, facade, interior, fountain, sculpture, painting, cafe, and so forth. It also offers us an unprecedented opportunity to richly capture, explore and study the three dimensional shape of the city.

In this project, we consider the problem of reconstructing entire cities from images harvested from the web. Our aim is to build a parallel distributed system that downloads all the images associated with a city, say Rome, from Flickr.com. After downloading, it matches these images to find common points and uses this information to compute the three dimensional structure of the city and the pose of the cameras that captured these images. All of this is to be done in a day.
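To get a feel for the scale of the matching step, the sketch below counts how many image pairs an exhaustive all-pairs comparison would have to consider for the three data set sizes reported further down this page. It is only back-of-the-envelope arithmetic: the function name and the dictionary are illustrative, and nothing here describes how our matching system actually selects which pairs to compare.

```python
from math import comb

# Image counts as reported on this page for each city data set.
DATASET_SIZES = {"Dubrovnik": 58_000, "Rome": 150_000, "Venice": 250_000}


def exhaustive_pairs(num_images: int) -> int:
    """Number of unordered image pairs an all-pairs comparison would visit."""
    return comb(num_images, 2)


if __name__ == "__main__":
    for city, n in DATASET_SIZES.items():
        print(f"{city}: {n:,} images -> {exhaustive_pairs(n):,} candidate pairs")
```

Even the smallest collection yields well over a billion candidate pairs, which is why matching has to be both selective and distributed.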

This poses new challenges for every stage of the 3D reconstruction pipeline, from image matching to large-scale optimization. The key contributions of our work are a new parallel distributed matching system that can match massive collections of images very quickly, and new bundle adjustment software that can solve the extremely large non-linear least squares problems encountered in three dimensional reconstruction.
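As a rough illustration of the kind of non-linear least squares problem that bundle adjustment minimizes, the sketch below evaluates a sum of squared reprojection errors. It assumes a simplified pinhole camera (rotation matrix, translation, and a single focal length, with no lens distortion); this camera model, the function names, and the data layout are assumptions made for the example and are not taken from our software. A real solver would minimize this cost over all cameras and points at once rather than merely evaluate it.

```python
def project(point, rotation, translation, focal):
    """Project a 3D point with a simplified pinhole camera (assumed model)."""
    # Camera-frame coordinates: X_cam = R * X + t
    cam = [
        sum(rotation[r][c] * point[c] for c in range(3)) + translation[r]
        for r in range(3)
    ]
    # Perspective division followed by scaling with the focal length.
    return (focal * cam[0] / cam[2], focal * cam[1] / cam[2])


def reprojection_cost(cameras, points, observations):
    """Sum of squared reprojection errors over all observations.

    cameras:      list of (rotation 3x3, translation 3-vector, focal length)
    points:       list of 3D points
    observations: list of (camera_index, point_index, (u, v)) measurements
    """
    total = 0.0
    for cam_idx, pt_idx, (u, v) in observations:
        rotation, translation, focal = cameras[cam_idx]
        pu, pv = project(points[pt_idx], rotation, translation, focal)
        total += (pu - u) ** 2 + (pv - v) ** 2
    return total


if __name__ == "__main__":
    identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    cameras = [(identity, [0.0, 0.0, 0.0], 500.0)]
    points = [[1.0, 0.0, 5.0]]
    # The point projects to (100, 0); the measurement is one pixel off in u.
    observations = [(0, 0, (101.0, 0.0))]
    print(reprojection_cost(cameras, points, observations))
```

The number of unknowns grows with every camera and every 3D point, so reconstructions with thousands of images and millions of points lead to very large instances of this problem.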

The project is a work in progress, and over the next few months we hope to have full-scale results on data sets consisting of 1 million images or more. Shown below are some preliminary results of running our system on three city data sets downloaded from Flickr: Dubrovnik, Croatia; Rome, Italy; and Venice, Italy. The static images were rendered from viewpoints chosen using the Canonical Views algorithm. Our current results are sparse point clouds; in collaboration with Yasutaka Furukawa, we are also working on producing dense mesh models.

This research is part of the Community Photo Collections project at the University of Washington GRAIL Lab, which explores the use of large-scale internet image collections for furthering research in computer vision and graphics. Our work uses and builds upon a number of previous works, in particular Photo Tourism and Skeletal Sets.

Team

Papers

Building Rome in a Day
Sameer Agarwal, Noah Snavely, Ian Simon, Steven M. Seitz and Richard Szeliski
International Conference on Computer Vision, 2009, Kyoto, Japan.

Reconstructing Rome
Sameer Agarwal, Yasutaka Furukawa, Noah Snavely, Brian Curless, Steven M. Seitz and Richard Szeliski
IEEE Computer, pp. 40-47, June, 2010

Building Rome in a Day
Sameer Agarwal, Yasutaka Furukawa, Noah Snavely, Ian Simon, Brian Curless, Steven M. Seitz and Richard Szeliski
Communications of the ACM, Vol. 54, No. 10, Pages 105-112, October 2011.
with a Technical Perspective by Prof. Carlo Tomasi

Software

The structure from motion code underlying our system has been released as the Bundler toolkit. We plan to release other parts of our software as well; please check back here for periodic updates.

Press

University of Washington Press Release
National Geographic
Popular Science
Slashdot
Seattle Times
The Telegraph
The New York Times
Science Nation
US News

Rome

The data set consists of 150,000 images from Flickr.com associated with the tags "Rome" or "Roma". Matching and reconstruction took a total of 21 hours on a cluster with 496 compute cores. Upon matching, the images organized themselves into a number of groups corresponding to the major landmarks in the city of Rome. Among these clusters are the Colosseum, St. Peter's Basilica, Trevi Fountain and the Pantheon. One of the advantages of using community photo collections is the rich variety of viewpoints from which these photographs are taken. A striking example of this is the reconstruction of the interior of St. Peter's Basilica shown below.

# Images	# Cores	Match Time	Reconstruction Time	Largest Component
150,000	496	13 Hours	8 Hours	2,106
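The "Largest Component" column above can be read as the size of the largest group of images linked to one another through matches. The sketch below shows one common way to compute such groups, a union-find pass over a list of matched image pairs. It is a minimal illustration under that assumption: the function names and the toy input are hypothetical, and this is not a description of our actual pipeline.

```python
from collections import Counter


def find(parent, i):
    """Return the representative of i's set, compressing the path on the way."""
    root = i
    while parent[root] != root:
        root = parent[root]
    while parent[i] != root:
        nxt = parent[i]
        parent[i] = root
        i = nxt
    return root


def component_sizes(num_images, matched_pairs):
    """Sizes of the connected components of the match graph, largest first."""
    parent = list(range(num_images))
    for a, b in matched_pairs:
        root_a, root_b = find(parent, a), find(parent, b)
        if root_a != root_b:
            parent[root_a] = root_b  # merge the two groups
    sizes = Counter(find(parent, i) for i in range(num_images))
    return sorted(sizes.values(), reverse=True)


if __name__ == "__main__":
    # Toy example: 7 images, three successful pairwise matches.
    print(component_sizes(7, [(0, 1), (1, 2), (3, 4)]))
```

In the toy example, images 0, 1 and 2 form one group, images 3 and 4 a second, and the two unmatched images each remain on their own.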


Trevi Fountain, 1,936 images, 656,699 points, Full resolution video	St. Peter's Basilica, 1,294 images, 530,076 points, Full resolution video

photo by Ceci And Brandon

Click here for static views of the reconstruction

Venice

The Venice data set is the largest image collection that we have experimented with so far. Matching on this data set took 27 hours, and the 3D reconstruction took 38 hours, on 496 compute cores. The matching process gave rise to three major components: the Grand Canal, San Marco Square, and Doge's Palace. The first two are illustrated with video fly-throughs below. San Marco Square is also our largest reconstruction to date, with just over 14,000 images and over 4.5 million 3D points.

# Images	# Cores	Match Time	Reconstruction Time	Largest Component
250,000	496	27 Hours	38 Hours	14,079


The Grand Canal, 3,272 images, 561,389 points, Full resolution video	San Marco Square, 14,079 images, 4,515,157 points, Full resolution video

Click here for static views of the reconstruction

photo by pattyvi

Dubrovnik

At the time of our experiments, there were only 58,000 images of Dubrovnik on Flickr. For this city we were able to experiment with the entire collection. Matching took only 5 hours on 352 compute cores. The largest and most interesting component corresponds to the old city. It is interesting that the reconstruction time for Dubrovnik is so much longer than that for Rome. The reason lies in how the data sets are structured. The Rome data set is essentially a collection of landmarks which at large scale have a simple geometry and visibility structure. The largest connected component in Dubrovnik, on the other hand, captures the entire old city. With its narrow alleyways, complex visibility and widely varying viewpoints, it is a much more complicated reconstruction problem, and this is reflected in the time it took to solve it.

Also worth noting is the fact that the reconstruction is not restricted to the city itself. As can be seen in the video below, it also contains the hills surrounding the city and part of Lokrum island, which lies to the south east of the city.