Steven M. Seitz, Luke Zettlemoyer
We introduce an approach for analyzing Wikipedia and other text, together with online photos, to produce annotated 3D models of famous tourist sites. The approach is completely automated, and leverages online text and photo co-occurrences via Google Image Search. It enables a number of new interactions, which we demonstrate in a new 3D visualization tool. Text can be selected to move the camera to the corresponding objects, 3D bounding boxes provide anchors back to the text describing them, and the overall narrative of the text provides a temporal guide for automatically flying through the scene to visualize the world as you read about it. We show compelling results on several major tourist sites.
Paper & Presentation
Code and Data
The source code is available on our GitHub project page.
We also provide the input text, the reference image, and a Matlab data structure containing the sparse 3D point cloud for the Pantheon in Rome (73 MB), which is used by the demo script in the source code.
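Outside of Matlab, the released `.mat` data can be read with SciPy. The sketch below is a minimal, hypothetical example: the field name `points3D` and the 100-point array are stand-ins (here written to an in-memory buffer so the snippet runs without the actual 73 MB download), not the real structure of the released file.

```python
import io

import numpy as np
from scipy.io import loadmat, savemat

# Build a tiny synthetic stand-in for the released .mat file.
# The key "points3D" is an assumed field name, not the actual one.
buf = io.BytesIO()
savemat(buf, {"points3D": np.random.rand(100, 3)})
buf.seek(0)

# Load it back the same way one would load the real file:
#   data = loadmat("pantheon.mat")
data = loadmat(buf)
points = data["points3D"]  # N x 3 array of 3D point positions
print(points.shape)
```

For the real file, replace the in-memory buffer with the downloaded path and inspect `data.keys()` to find the actual field names.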
The research was supported in part by the National Science Foundation (IIS-1250793), the Intel Science and Technology Centers for Visual Computing (ISTC-VC) and Pervasive Computing (ISTC-PC), and Google.