Image-Based Remodeling
- Abstract:
- Imagining what a proposed home remodel might look like without actually performing it is challenging. We present an image-based remodeling methodology that allows real-time photorealistic visualization during both the modeling and remodeling process of a home interior. Large-scale edits, like removing a wall or enlarging a window, are performed easily and in real time, with realistic results. Our interface supports the creation of concise, parameterized, and constrained geometry, as well as remodeling directly from within the photographs. Real-time texturing of modified geometry is made possible by precomputing view-dependent textures for all faces that are potentially visible to each original camera viewpoint, blending multiple viewpoints and hole-filling when necessary. The resulting textures are stored and accessed efficiently, enabling intuitive real-time realistic visualization, modeling, and editing of the building interior.
- Citation:
- Alex Colburn, Aseem Agarwala, Aaron Hertzmann, Brian Curless, and Michael F. Cohen. Image-Based Remodeling. IEEE Transactions on Visualization and Computer Graphics, to appear.
- On-line documents:
- Project Page
Preprint article (PDF)
A Landmark-free Framework for the Detection and Description of Shape Differences in Embryos
- Abstract:
- This paper introduces a new method to quantify and characterize shape changes during early facial development without the use of landmarks. Landmarks are traditionally used in morphometric analysis, but very few can be identified reliably across all stages of embryonic development. This method uses deformable registration to produce a dense vector field describing the point correspondences between two images. Low and mid-level features are extracted from the deformable vector field to find regions of organized differences that are biologically relevant. These methods are shown to detect regions of difference when evaluated on chick embryo images warped with small magnitude deformations in regions critical to midfacial development.
- Citation:
- S. M. Rolfe, L. G. Shapiro, T. C. Cox, A. M. Maga, and L. L. Cox. A Landmark-free Framework for the Detection and Description of Shape Differences in Embryos. 33rd Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC '11), August 2011.
- On-line documents:
- Complete article (PDF)
Learning to Compute the Plane of Symmetry for Human Faces
- Abstract:
- Facial symmetry analysis is complex in both computer vision and medicine. This paper presents a method to compute the plane of symmetry for 3D meshes of the human head and face through learning. The two steps of processing include: 1) landmark-related region detection and 2) symmetry plane computation in the learning stage, which uses the landmarks and the standard symmetry planes identified by medical experts for training. Experimental results show that our method performs better than the existing mirror method [1], and is robust to rotation and noise.
- Citation:
- Jia Wu, Raymond Tse, Carrie L. Heike, and Linda G. Shapiro. Learning to Compute the Plane of Symmetry for Human Faces. ACM Conference on Bioinformatics, Computational Biology and Biomedicine 2011, August 2011.
- On-line documents:
- Complete article (PDF)
Classification and Feature Selection for Craniosynostosis
- Abstract:
- Craniosynostosis is the premature fusion of the bones of the calvaria resulting in abnormal skull shapes that can be associated with increased intracranial pressure. The goal of this work is to analyze the various 3D skull shapes that manifest in isolated single suture craniosynostosis. A logistic regression is used to identify different types of synostosis and quantify the differences. Due to the high-dimensionality of the feature data, a sophisticated feature selection technique is required to avoid overfitting and to improve the classification accuracy on the unseen data. In addition, feature selection allows the identification of surface areas that contribute to the major skull deformations that characterize isolated synostosis. We applied three sparse feature selection methods: L1 regularization (lasso [9]), fused lasso ([10]) and a novel regularization method we have developed called the clustering lasso (cLasso). L1 regularized logistic regression locates important surface points, and the fused lasso groups these points into regions. The cLasso was designed to assign similar weights to groups of correlated shape features. Experimental results indicated that the regularized logistic regression models achieve a significantly lower misclassification rate than unregularized logistic regression.
- Citation:
- Shulin Yang, Linda G. Shapiro, Michael L. Cunningham, Matthew Speltz, and Su-In Lee. Classification and Feature Selection for Craniosynostosis. ACM Conference on Bioinformatics, Computational Biology and Biomedicine 2011, August 2011.
- On-line documents:
- Complete article (PDF)
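As a rough illustration of the L1-regularized logistic regression mentioned in the craniosynostosis abstract above, the sketch below fits a sparse classifier by proximal gradient descent (ISTA). This is a generic textbook formulation, not the authors' implementation; the function names, step size, penalty value, and toy data are all assumptions made for the example, and the fused lasso and cLasso variants are not shown.

```python
import math


def soft_threshold(value, threshold):
    """Proximal operator of the L1 norm: shrink value toward zero."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_l1_logistic(X, y, l1_penalty=0.1, step=0.1, iterations=500):
    """Minimize mean logistic loss + l1_penalty * ||w||_1 (bias unpenalized).

    Hypothetical helper for illustration: X is a list of feature rows,
    y is a list of 0/1 labels.
    """
    n, d = len(X), len(X[0])
    w = [0.0] * d
    b = 0.0
    for _ in range(iterations):
        grad_w = [0.0] * d
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j] / n
            grad_b += err / n
        # Gradient step on the smooth loss, then shrinkage for the L1 term.
        w = [soft_threshold(wj - step * gj, step * l1_penalty)
             for wj, gj in zip(w, grad_w)]
        b -= step * grad_b
    return w, b


# Toy data: feature 0 separates the classes, feature 1 is small noise.
X = [[-2.0, 0.1], [-1.0, -0.1], [-1.5, 0.1],
     [1.0, -0.1], [2.0, 0.1], [1.5, -0.1]]
y = [0, 0, 0, 1, 1, 1]
weights, bias = fit_l1_logistic(X, y)
```

On this toy data the shrinkage step keeps the weight of the uninformative feature at exactly zero, which is the property that makes L1 regularization usable as a feature selector.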
Groupwise Pose Normalization for Craniofacial Applications
- Abstract:
- A general framework is proposed for solving groupwise pose normalization problems and is analyzed in detail under different feature spaces. The analysis shows that using principal component analysis for pose normalization is a special case of using the proposed framework under a special feature space. The experimental results on two craniofacial datasets show the proposed method achieved promising results for solving groupwise pose normalization problems for craniofacial applications.
- Citation:
- Jiun-Hung Chen and Linda G. Shapiro. Groupwise Pose Normalization for Craniofacial Applications. IEEE Workshop on Applications of Computer Vision (WACV), January 2011.
- On-line documents:
- Complete article (PDF)
Estimating Image Segmentation Difficulty
- Abstract:
- The heavy use of camera phones and other mobile devices all over the world has produced a market for mobile image analysis, including image segmentation to separate out objects of interest. Automatic image segmentation algorithms, when employed by many different users for multiple applications, cannot guarantee high quality results. Yet interactive algorithms require human effort that may become quite tedious. To reduce human effort and achieve better results, it is worthwhile to know in advance which images are difficult to segment and may require further user interaction or alternate processing. For this purpose, we introduce a new research problem: how to estimate the image segmentation difficulty level without actually performing image segmentation. We propose to formulate it as an estimation problem, and we develop a linear regression model using image features to predict segmentation difficulty level. Different image features, including graytone, color, gradient, and texture features are tested as the predictive variables, and the segmentation algorithm performance measure is the response variable. We use the benchmark images of the Berkeley segmentation dataset with corresponding F-measures to fit, test, and choose the optimal model. Additional standard image datasets are used to further verify the model's applicability to a variety of images. A new feature that combines information from the log histogram of log gradient and the local binary pattern histogram is a good predictor and provides the best balance of predictive performance and model complexity.
- Citation:
- Dingding Liu, Yingen Xiong, Kari Pulli, and Linda Shapiro. Estimating Image Segmentation Difficulty. Machine Learning and Data Mining in Pattern Recognition, August 2011.
- On-line documents:
- Complete article (PDF)
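The segmentation-difficulty abstract above names the local binary pattern (LBP) histogram as one ingredient of its best-performing feature. A minimal sketch of a basic 8-neighbor LBP histogram follows; the abstract does not say which LBP variant was used, so the neighbor ordering, the `>=` comparison, and the function names here are assumptions, and the log-gradient half of the combined feature is not shown.

```python
def lbp_code(image, row, col):
    """8-bit local binary pattern code for one interior pixel.

    Each neighbor that is at least as bright as the center sets one bit.
    Neighbors are visited clockwise starting at the top-left.
    """
    center = image[row][col]
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    code = 0
    for bit, (dr, dc) in enumerate(offsets):
        if image[row + dr][col + dc] >= center:
            code |= 1 << bit
    return code


def lbp_histogram(image):
    """Normalized 256-bin histogram of LBP codes over interior pixels."""
    hist = [0] * 256
    for r in range(1, len(image) - 1):
        for c in range(1, len(image[0]) - 1):
            hist[lbp_code(image, r, c)] += 1
    total = sum(hist)
    return [h / total for h in hist] if total else hist


# A flat patch: every neighbor equals the center, so every code is 255.
flat = [[5] * 4 for _ in range(4)]
flat_hist = lbp_histogram(flat)

# An isolated bright pixel: no neighbor reaches the center value, code 0.
peak = [[0, 0, 0], [0, 9, 0], [0, 0, 0]]
peak_hist = lbp_histogram(peak)
```

A histogram like this could serve as one block of predictor variables in a linear regression against a segmentation quality score, which is the general setup the abstract describes.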
Nonlinear Inverse Reinforcement Learning with Gaussian Processes
- Abstract:
- We present a probabilistic algorithm for nonlinear inverse reinforcement learning. The goal of inverse reinforcement learning is to learn the reward function in a Markov decision process from expert demonstrations. While most prior inverse reinforcement learning algorithms represent the reward as a linear combination of a set of features, we use Gaussian processes to learn the reward as a nonlinear function, while also determining the relevance of each feature to the expert's policy. Our probabilistic algorithm allows complex behaviors to be captured from suboptimal stochastic demonstrations, while automatically balancing the simplicity of the learned reward structure against its consistency with the observed actions.
- Citation:
- Sergey Levine, Zoran Popović, and Vladlen Koltun. Nonlinear Inverse Reinforcement Learning with Gaussian Processes. Neural Information Processing Systems 24, December 2011.
- On-line documents:
- Complete article (PDF)
Algorithm discovery by protein folding game players
- Abstract:
- Foldit is a multiplayer online game in which players collaborate and compete to create accurate protein structure models. For specific hard problems, Foldit player solutions can in some cases outperform state-of-the-art computational methods. However, very little is known about how collaborative gameplay produces these results and whether Foldit player strategies can be formalized and structured so that they can be used by computers. To determine whether high performing player strategies could be collectively codified, we augmented the Foldit gameplay mechanics with tools for players to encode their folding strategies as "recipes" and to share their recipes with other players, who are able to further modify and redistribute them. Here we describe the rapid social evolution of player-developed folding algorithms that took place in the year following the introduction of these tools. Players developed over 5,400 different recipes, both by creating new algorithms and by modifying and recombining successful recipes developed by other players. The most successful recipes rapidly spread through the Foldit player population, and two of the recipes became particularly dominant. Examination of the algorithms encoded in these two recipes revealed a striking similarity to an unpublished algorithm developed by scientists over the same period. Benchmark calculations show that the new algorithm independently discovered by scientists and by Foldit players outperforms previously published methods. Thus, online scientific game frameworks have the potential not only to solve hard scientific problems, but also to discover and formalize effective new strategies and algorithms.
- Citation:
- Firas Khatib, Seth Cooper, Michael D. Tyka, Kefan Xu, Ilya Makedon, Zoran Popović, David Baker, and Foldit Players. Algorithm discovery by protein folding game players. PNAS, 108(47), 2011.
- On-line documents:
- Complete article (PDF)
Pause-and-Play: Automatically Linking Screencast Video Tutorials with Applications
- Abstract:
- Video tutorials provide a convenient means for novices to learn new software applications. Unfortunately, staying in sync with a video while trying to use the target application at the same time requires users to repeatedly switch from the application to the video to pause or scrub backwards to replay missed steps. We present Pause-and-Play, a system that helps users work along with existing video tutorials. Pause-and-Play detects important events in the video and links them with corresponding events in the target application as the user tries to replicate the depicted procedure. This linking allows our system to automatically pause and play the video to stay in sync with the user. Pause-and-Play also supports convenient video navigation controls that are accessible from within the target application and allow the user to easily replay portions of the video without switching focus out of the application. Finally, since our system uses computer vision to detect events in existing videos and leverages application scripting APIs to obtain real time usage traces, our approach is largely independent of the specific target application and does not require access or modifications to application source code. We have implemented Pause-and-Play for two target applications, Google SketchUp and Adobe Photoshop, and we report on a user study that shows our system improves the user experience of working with video tutorials.
- Citation:
- Suporn Pongnumkul, Mira Dontcheva, Wilmot Li, Jue Wang, Lubomir Bourdev, Shai Avidan, and Michael F. Cohen. Pause-and-Play: Automatically Linking Screencast Video Tutorials with Applications. UIST'11, October 2011.
- On-line documents:
- Complete article (PDF)
Autonomous Generation of Complete 3D Object Models Using Next Best View Manipulation Planning
- Abstract:
- Recognizing and manipulating objects is an important task for mobile robots performing useful services in everyday environments. In this paper, we develop a system that enables a robot to grasp an object and to move it in front of its depth camera so as to build a 3D surface model of the object. We derive an information gain based variant of the next best view algorithm in order to determine how the manipulator should move the object in front of the camera. By considering occlusions caused by the robot manipulator, our technique also determines when and how the robot should re-grasp the object in order to build a complete model.
- Citation:
- Michael Krainin, Brian Curless, and Dieter Fox. Autonomous Generation of Complete 3D Object Models Using Next Best View Manipulation Planning. Proc. of International Conference on Robotics and Automation (ICRA), 2011.
- On-line documents:
- Complete article (PDF)
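To make the idea of an information-gain-based next best view (from the abstract above) more concrete, here is a deliberately simplified sketch. It assumes a voxel map with independent occupancy probabilities and scores each candidate view by the total entropy of the voxels it would observe without occlusion; the data layout, names, and scoring rule are illustrative assumptions, not the formulation used in the paper.

```python
import math


def binary_entropy(p):
    """Entropy in bits of a Bernoulli variable with probability p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))


def next_best_view(occupancy, visible_by_view):
    """Pick the candidate view with the largest summed voxel entropy.

    occupancy: dict mapping voxel id -> occupancy probability.
    visible_by_view: dict mapping view id -> voxel ids that view would see
        after accounting for occlusion (for example, by the manipulator).
    """
    def expected_gain(view):
        return sum(binary_entropy(occupancy[v]) for v in visible_by_view[view])

    return max(visible_by_view, key=expected_gain)


# Voxels "a" and "b" are still uncertain; "c" and "d" are nearly resolved.
occupancy = {"a": 0.5, "b": 0.5, "c": 0.99, "d": 0.01}
visible_by_view = {"front": ["c", "d"], "side": ["a", "b"]}
chosen_view = next_best_view(occupancy, visible_by_view)
```

In this toy setup a view whose useful voxels are all occluded would score near zero for every pose, which hints at when a re-grasp might be worth considering.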
Candid Portrait Selection From Video
- Abstract:
- In this paper, we train a computer to select still frames from video that work well as candid portraits. Because of the subjective nature of this task, we conduct a human subjects study to collect ratings of video frames across multiple videos. Then, we compute a number of features and train a model to predict the average rating of a video frame. We evaluate our model with cross-validation, and show that it is better able to select quality still frames than previous techniques, such as simply omitting frames that contain blinking or motion blur, or selecting only smiles. We also evaluate our technique qualitatively on videos that were not part of our validation set, and were taken outdoors and under different lighting conditions.
- Citation:
- Juliet Fiss, Aseem Agarwala, and Brian Curless. Candid Portrait Selection From Video. ACM Transactions on Graphics 30(6), December 2011.
- On-line documents:
- Complete article (PDF)
Project Page
High-resolution structure of a retroviral protease folded as a monomer
- Abstract:
- The crystal structure of Mason-Pfizer monkey virus protease folded as a monomer has been solved by molecular replacement using a model generated by players of the online game Foldit. The structure shows at high resolution the details of a retroviral protease folded as a monomer which can guide rational design of protease dimerization inhibitors as retroviral drugs.
- Citation:
- M. Gilski, M. Kazmierczyk, S. Krzywda, H. Zábranská, S. Cooper, Z. Popović, F. Khatib, F. DiMaio, J. Thompson, D. Baker, I. Pichová, M. Jaskolski. High-resolution structure of a retroviral protease folded as a monomer. Acta Crystallographica D67, 2011, pp. 907-914.
PhotoCity: Training Experts at Large-scale Image Acquisition Through a Competitive Game
- Abstract:
- Large-scale, ground-level urban imagery has recently developed as an important element of online mapping tools such as Google's Street View. Such imagery is extremely valuable in a number of potential applications, ranging from augmented reality to 3D modeling, and from urban planning to monitoring city infrastructure. While such imagery is already available from many sources, including Street View and tourist photos on photo-sharing sites, these collections have drawbacks related to high cost, incompleteness, and accuracy. A potential solution is to leverage the community of photographers around the world to collaboratively acquire large-scale image collections. This work explores this approach through PhotoCity, an online game that trains its players to become "experts" at taking photos at targeted locations and in great density, for the purposes of creating 3D building models. To evaluate our approach, we ran a competition between two universities that resulted in the submission of over 100,000 photos, many of which were highly relevant for the 3D modeling task at hand. Although the number of players was small, we found that this was compensated for by incentives that drove players to become experts at photo collection, often capturing thousands of useful photos each.
- Citation:
- Kathleen Tuite, Noah Snavely, Dun-Yu Hsiao, Nadine Tabing, and Zoran Popović. PhotoCity: Training Experts at Large-scale Image Acquisition Through a Competitive Game. CHI 2011, May 2011, Vancouver, BC, Canada.
- On-line documents:
- Complete article (PDF)
Project Page
On the Harmfulness of Secondary Game Objectives
- Abstract:
- Secondary game objectives, optional challenges that players can choose to pursue or ignore, are a fundamental element of game design. Still, little is known about how secondary objectives affect player behavior. It is commonly believed that secondary objectives such as coins or collectible items can increase a game's flexibility, replayability, and depth. In contrast, we present results from analysis of two popular online Flash games showing that secondary objectives can easily harm the retention of many players. We support our findings with data collected from over 27,000 players through large-scale A/B tests in which we measured play time, progress, and return rate. We show that while secondary objectives can encourage long-term players to extend their playtime, they can also cause many players to play for less time. By modifying secondary objectives so that they reinforce the primary goal of the game instead of distracting from it, we are able to avoid negative consequences and still maintain the retention of long-term players. Our results suggest that secondary objectives that support the primary goal of the game are consistently useful, while secondary objectives that do not support the main goal require extensive testing to avoid negative consequences.
- Citation:
- Erik Andersen, Yun-En Liu, Richard Snider, Roy Szeto, Seth Cooper, and Zoran Popović. On the Harmfulness of Secondary Game Objectives. Foundations of Digital Games 2011.
- On-line documents:
- Complete article (PDF)
Project Page
Placing a Value on Aesthetics in Online Casual Games
- Abstract:
- Game designers frequently invest in aesthetic improvements such as music, sound effects, and animations. However, their exact value for attracting and retaining players remains unclear. Seeking to estimate this value in two popular Flash games, we conducted a series of large-scale A/B tests in which we selectively removed aesthetic improvements and examined the effect of each component on play time, progress, and return rate. We found that music and sound effects had little or no effect on player retention in either game, while animations caused users to play more. We also found, counterintuitively, that optional rewards caused players to play less in both games. In one game, this gameplay modification affected play time three times as much as the largest aesthetic variation. Our methodology provides a way to determine where resources may be best spent during the game design and development process.
- Citation:
- Erik Andersen, Yun-En Liu, Richard Snider, Roy Szeto, and Zoran Popović. Placing a Value on Aesthetics in Online Casual Games. CHI 2011, May 2011, Vancouver, BC, Canada.
- On-line documents:
- Complete article (PDF)
Project Page
Analysis of Social Gameplay Macros in the Foldit Cookbook
- Abstract:
- As games grow in complexity, gameplay needs to provide players with powerful means of managing this complexity. One approach is to give automation tools to players. In this paper, we analyze an in-game automation tool, the Foldit cookbook, for the scientific discovery game Foldit. The cookbook allows players to write recipes that can automate their strategies. Through analysis of cookbook usage, we observe that players take advantage of social mechanisms in the game to share, run, and modify recipes. Further, players take advantage of both a simplified visual programming interface and a text-based scripting interface for creating recipes. This indicates that there is potential for using automation tools to disseminate expert knowledge, and that it is useful to provide support for multiple authoring styles, especially for games where the final game goal is unbounded or hard to attain.
- Citation:
- Seth Cooper, Firas Khatib, Ilya Makedon, Hao Lu, Janos Barbero, David Baker, James Fogarty, and Zoran Popović. Analysis of Social Gameplay Macros in the Foldit Cookbook. Foundations of Digital Games, 2011.
- On-line documents:
- Complete article (PDF)
Project Page
Crystal structure of a monomeric retroviral protease solved by protein folding game players
- Abstract:
- Following the failure of a wide range of attempts to solve the crystal structure of M-PMV retroviral protease by molecular replacement, we challenged players of the protein folding game Foldit to produce accurate models of the protein. Remarkably, Foldit players were able to generate models of sufficient quality for successful molecular replacement and subsequent structure determination. The refined structure provides new insights for the design of antiretroviral drugs.
- Citation:
- Firas Khatib, Frank DiMaio, Seth Cooper, Maciej Kazmierczyk, Miroslaw Gilski, Szymon Krzywda, Helena Zábranská, Iva Pichová, James Thompson, Zoran Popović, Mariusz Jaskolski, and David Baker. Crystal structure of a monomeric retroviral protease solved by protein folding game players. Nature Structural & Molecular Biology, 18, 2011, pp. 1175-1177.
- On-line documents:
- Complete article (PDF)
Project Page
Feature-Based Projections for Effective Playtrace Analysis
- Abstract:
- Visual data mining is a powerful technique allowing game designers to analyze player behavior. Playtracer, a new method for visually analyzing play traces, is a generalized heatmap that applies to any game with discrete state spaces. Unfortunately, due to its low discriminative power, Playtracer's usefulness is significantly decreased for games of even medium complexity, and is unusable on games with continuous state spaces. Here we show how the use of state features can remove both of these weaknesses. These state features collapse larger state spaces without losing salient information, resulting in visualizations that are significantly easier to interpret. We evaluate our work by analyzing player data gathered from three complex games in order to understand player behavior in the presence of optional rewards, identify key moments when players figure out the solution to the puzzle, and analyze why players give up and quit. Based on our experiences with these games, we suggest general principles for designers to identify useful features of game states that lead to effective play analyses.
- Citation:
- Yun-En Liu, Erik Andersen, Richard Snider, Seth Cooper, and Zoran Popović. Feature-Based Projections for Effective Playtrace Analysis. Foundations of Digital Games, 2011.
- On-line documents:
- Complete article (PDF)
Project Page
Face Reconstruction in the Wild
- Abstract:
- We address the problem of reconstructing 3D face models from large unstructured photo collections, e.g., obtained by Google image search or from personal photo collections in iPhoto. This problem is extremely challenging due to the high degree of variability in pose, illumination, facial expression, non-rigid changes in face shape and reflectance over time, and occlusions. In light of this extreme variability, no single reconstruction can be consistent with all of the images. Instead, we define the goal of reconstruction as recovering a model that is locally consistent with the image set; that is, each local region of the model is consistent with a large set of photos, resulting in a model that captures the dominant trends in the input data for different parts of the face. Our approach leverages multi-image shading, but unlike traditional photometric stereo approaches, allows for changes in viewpoint and shape. We optimize over pose, shape, and lighting in an iterative approach that seeks to minimize the rank of the transformed images. This approach produces high quality shape models for a wide range of celebrities from photos available on the Internet.
- Citation:
- Ira Kemelmacher-Shlizerman, Steven M. Seitz. "Face Reconstruction in the Wild." to appear, International Conference on Computer Vision (ICCV), 2011.
- On-line documents:
- Complete article (PDF)
Project Page
Binocular Photometric Stereo
- Abstract:
- This paper considers the problem of computing scene depth from a stereo pair of cameras under a sequence of illumination directions. By integrating parallax and shading cues, we obtain both metric depth and fine surface details. Casting this problem into the filter flow framework enables a convex formulation of the problem, and thus a globally optimal solution. We demonstrate high quality, continuous depth maps on a range of examples.
- Citation:
- Hao Du, Dan B Goldman, Steven M. Seitz. "Binocular Photometric Stereo." British Machine Vision Conference (BMVC), 2011.
- On-line documents:
- Complete article (PDF)
Exploring Photobios
- Abstract:
- We present an approach for generating face animations from large image collections of the same person. Such collections, which we call photobios, sample the appearance of a person over changes in pose, facial expression, hairstyle, age, and other variations. By optimizing the order in which images are displayed and crossdissolving between them, we control the motion through face space and create compelling animations (e.g., render a smooth transition from frowning to smiling). Used in this context, the cross dissolve produces a very strong motion effect; a key contribution of the paper is to explain this effect and analyze its operating range. The approach operates by creating a graph with faces as nodes, and similarities as edges, and solving for walks and shortest paths on this graph. The processing pipeline involves face detection, locating fiducials (eyes/nose/mouth), solving for pose, warping to frontal views, and image comparison based on Local Binary Patterns. We demonstrate results on a variety of datasets including time-lapse photography, personal photo collections, and images of celebrities downloaded from the Internet. Our approach is the basis for the Face Movies feature in Google's Picasa.
- Citation:
- Ira Kemelmacher-Shlizerman, Eli Shechtman, Rahul Garg, Steven M. Seitz. "Exploring Photobios." ACM Transactions on Graphics 30(4), July 2011.
- On-line documents:
- Complete article (PDF)
Project Page
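The Photobios abstract above describes a graph with faces as nodes and similarities as edges, with transitions found by solving for shortest paths. A minimal sketch of that last step, using Dijkstra's algorithm over hypothetical dissimilarity costs, is shown below; the photo labels, edge weights, and function name are invented for illustration, and the paper's actual similarity measure (based on Local Binary Patterns after pose warping) is not reproduced here.

```python
import heapq


def shortest_path(graph, start, goal):
    """Dijkstra's algorithm.

    graph: dict mapping node -> dict of neighbor -> non-negative cost,
    where a lower cost means the two face photos look more alike.
    Returns the list of nodes from start to goal, or None if unreachable.
    """
    best = {start: 0.0}
    previous = {}
    queue = [(0.0, start)]
    visited = set()
    while queue:
        cost, node = heapq.heappop(queue)
        if node in visited:
            continue
        visited.add(node)
        if node == goal:
            break
        for neighbor, weight in graph.get(node, {}).items():
            new_cost = cost + weight
            if new_cost < best.get(neighbor, float("inf")):
                best[neighbor] = new_cost
                previous[neighbor] = node
                heapq.heappush(queue, (new_cost, neighbor))
    if goal not in best:
        return None
    path = [goal]
    while path[-1] != start:
        path.append(previous[path[-1]])
    path.reverse()
    return path


# Hypothetical photos: going through "neutral" is cheaper than a direct jump.
faces = {
    "frown": {"neutral": 1.0, "smile": 5.0},
    "neutral": {"frown": 1.0, "smile": 1.0},
    "smile": {"neutral": 1.0, "frown": 5.0},
    "profile": {},
}
transition = shortest_path(faces, "frown", "smile")
```

Displaying the photos along `transition` in order, with a cross dissolve between consecutive ones, corresponds to the kind of frown-to-smile animation the abstract mentions.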
Multicore Bundle Adjustment
- Abstract:
- We present the design and implementation of new inexact Newton type Bundle Adjustment algorithms that exploit hardware parallelism for efficiently solving large scale 3D scene reconstruction problems. We explore the use of multicore CPUs as well as multicore GPUs for this purpose. We show that overcoming the severe memory and bandwidth limitations of current generation GPUs not only leads to more space efficient algorithms, but also to surprising savings in runtime. Our CPU based system is up to ten times and our GPU based system is up to thirty times faster than the current state of the art methods, while maintaining comparable convergence behavior.
- Citation:
- Changchang Wu, Sameer Agarwal, Brian Curless, Steven M. Seitz. "Multicore Bundle Adjustment." CVPR 2011.
- On-line documents:
- Complete article (PDF)
Project Page
Repetition-based Dense Single-View Reconstruction
- Abstract:
- This paper presents a novel approach for dense reconstruction from a single-view of a repetitive scene structure. Given an image and its detected repetition regions, we model the shape recovery as the dense pixel correspondences within a single image. The correspondences are represented by an interval map that tells the distance of each pixel to its matched pixels within the single image. In order to obtain dense repetitive structures, we develop a new repetition constraint that penalizes the inconsistency between the repetition intervals of the dynamically corresponding pixel pairs. We deploy a graph-cut to balance between the high-level constraint of geometric repetition and the low-level constraints of photometric consistency and spatial smoothness. We demonstrate the accurate reconstruction of dense 3D repetitive structures through a variety of experiments, which prove the robustness of our approach to outliers such as structure variations, illumination changes, and occlusions.
- Citation:
- Changchang Wu, Jan-Michael Frahm, Marc Pollefeys. "Repetition-based Dense Single-View Reconstruction." CVPR 2011.
- On-line documents:
- Complete article (PDF)
Content-Aware Dynamic Timeline for Video Browsing
- Abstract:
- A traditional timeline slider control loses effectiveness and precision as a video's length grows. When browsing videos with more frames than pixels in the slider, aside from some frames being inaccessible, scrolling actions cause sudden jumps in a video's continuity and cause video frames to flash by too fast for one to assess the content. We propose a content-aware dynamic timeline control that is designed to overcome these limitations. Our timeline control decouples video speed and playback speed, and leverages video content analysis to allow salient shots to be presented at an intelligible speed. Our control also takes advantage of previous work on elastic sliders, which allows us to produce an accurate navigation control.
- Citation:
- Suporn Pongnumkul, Jue Wang, Gonzalo Ramos, Michael Cohen. "Content-Aware Dynamic Timeline for Video Browsing." Proceedings of UIST 2010, pp. 139-142.
- On-line documents:
- Complete article (PDF)
Bundle Adjustment in the Large
- Abstract:
- We present the design and implementation of a new inexact Newton type algorithm for solving large-scale bundle adjustment problems with tens of thousands of images. We explore the use of Conjugate Gradients for calculating the Newton step and its performance as a function of some simple and computationally efficient preconditioners. We show that the common Schur complement trick is not limited to factorization-based methods and that it can be interpreted as a form of preconditioning. Using photos from a street-side dataset and several community photo collections, we generate a variety of bundle adjustment problems and use them to evaluate the performance of six different bundle adjustment algorithms. Our experiments show that truncated Newton methods, when paired with relatively simple preconditioners, offer state of the art performance for large-scale bundle adjustment.
- Citation:
- Sameer Agarwal, Noah Snavely, Steven M. Seitz, and Richard Szeliski. "Bundle Adjustment in the Large." Proceedings of ECCV 2010, Part II, pp. 29-42.
- On-line documents:
- Complete article (PDF)
Project Page
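The bundle adjustment abstract above refers to computing the Newton step with Conjugate Gradients and simple preconditioners. As a small, generic illustration of that inner solve, the sketch below runs preconditioned conjugate gradients with a Jacobi (diagonal) preconditioner on a tiny symmetric positive definite system. The choice of Jacobi preconditioning, the dense list-of-lists matrix, and the function names are assumptions for the example; the abstract does not list the specific preconditioners evaluated, and a real bundle adjustment solver would work with sparse, Schur-reduced systems.

```python
def mat_vec(A, x):
    """Dense matrix-vector product for a list-of-rows matrix."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def jacobi_pcg(A, b, tolerance=1e-10, max_iterations=100):
    """Solve A x = b for symmetric positive definite A.

    Uses conjugate gradients with M = diag(A) as the preconditioner.
    """
    n = len(b)
    x = [0.0] * n
    inv_diag = [1.0 / A[i][i] for i in range(n)]
    r = list(b)                                   # residual b - A x with x = 0
    z = [m * ri for m, ri in zip(inv_diag, r)]    # preconditioned residual
    p = list(z)
    rz = dot(r, z)
    for _ in range(max_iterations):
        if dot(r, r) ** 0.5 <= tolerance:
            break
        Ap = mat_vec(A, p)
        alpha = rz / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        z = [m * ri for m, ri in zip(inv_diag, r)]
        rz_new = dot(r, z)
        beta = rz_new / rz
        p = [zi + beta * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x


# Tiny stand-in for a normal-equations system; exact solution is [1/11, 7/11].
A = [[4.0, 1.0], [1.0, 3.0]]
b = [1.0, 2.0]
solution = jacobi_pcg(A, b)
```

In an inexact (truncated) Newton method the loop would typically be stopped early, after a modest number of iterations or at a loose tolerance, rather than run to full convergence as in this toy example.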
Predicting protein structures with a multiplayer online game
- Abstract:
- People exert large amounts of problem-solving effort playing computer games. Simple image- and text-recognition tasks have been successfully 'crowd-sourced' through games, but it is not clear if more complex scientific problems can be solved with human-directed computing. Protein structure prediction is one such problem: locating the biologically relevant native conformation of a protein is a formidable computational challenge given the very large size of the search space. Here we describe Foldit, a multiplayer online game that engages non-scientists in solving hard prediction problems. Foldit players interact with protein structures using direct manipulation tools and user-friendly versions of algorithms from the Rosetta structure prediction methodology, while they compete and collaborate to optimize the computed energy. We show that top-ranked Foldit players excel at solving challenging structure refinement problems in which substantial backbone rearrangements are necessary to achieve the burial of hydrophobic residues. Players working collaboratively develop a rich assortment of new strategies and algorithms; unlike computational approaches, they explore not only the conformational space but also the space of possible search strategies. The integration of human visual problem-solving and strategy development capabilities with traditional computational algorithms through interactive multiplayer games is a powerful new approach to solving computationally-limited scientific problems.
- Citation:
- Seth Cooper, Firas Khatib, Adrien Treuille, Janos Barbero, Jeehyung Lee, Michael Beenen, Andrew Leaver-Fay, David Baker, Zoran Popović & Foldit players. "Predicting protein structures with a multiplayer online game." Nature, volume 466 (05 August 2010), pp. 756-760.
- On-line documents:
- Complete article (PDF)
The challenge of designing scientific discovery games
- Abstract:
- Incorporating the individual and collective problem solving skills of non-experts into the scientific discovery process could potentially accelerate the advancement of science. This paper discusses the design process used for Foldit, a multiplayer online biochemistry game that presents players with computationally difficult protein folding problems in the form of puzzles, allowing ordinary players to gain expertise and help solve these problems. The principal challenge of designing such scientific discovery games is harnessing the enormous collective problem-solving potential of the game playing population, who have not been previously introduced to the specific problem, or, often, the entire scientific discipline. To address this challenge, we took an iterative approach to designing the game, incorporating feedback from players and biochemical experts alike. Feedback was gathered both before and after releasing the game, to create the rules, interactions, and visualizations in Foldit that maximize contributions from game players. We present several examples of how this approach guided the game's design, and allowed us to improve both the quality of the gameplay and the application of player problem-solving.
- Citation:
- Seth Cooper, Firas Khatib, Adrien Treuille, Janos Barbero, Jeehyung Lee, Michael Beenen, Andrew Leaver-Fay, David Baker, Zoran Popović. "The challenge of designing scientific discovery games." FDG 2010, Monterey, CA, June 2010.
- On-line documents:
- Complete article (PDF)
Single Image Deblurring Using Motion Density Functions
- Abstract:
- We present a novel single image deblurring method to estimate spatially non-uniform blur that results from camera shake. We use existing spatially invariant deconvolution methods in a local and robust way to compute initial estimates of the latent image. The camera motion is represented as a Motion Density Function (MDF) which records the fraction of time spent in each discretized portion of the space of all possible camera poses. Spatially varying blur kernels are derived directly from the MDF. We show that 6D camera motion is well approximated by 3 degrees of motion (in-plane translation and rotation) and analyze the scope of this approximation. We present results on both synthetic and captured data. Our system out-performs current approaches which make the assumption of spatially invariant blur.
- Citation:
- Ankit Gupta, Neel Joshi, C. Lawrence Zitnick, Michael Cohen, Brian Curless. "Single Image Deblurring Using Motion Density Functions." ECCV 2010.
- On-line documents:
- Complete article (PDF)
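The Motion Density Function described in the abstract above records the fraction of exposure time spent at each camera pose, so a blurred image can be viewed as a weighted blend of the sharp image seen from those poses. The sketch below illustrates only that weighting idea under strong simplifications that are assumptions of this example: poses are integer in-plane translations, shifts wrap around at the borders, and rotation is omitted (so the resulting blur is spatially uniform, unlike the spatially varying case the paper addresses). The name `blur_from_mdf` is hypothetical.

```python
import numpy as np

def blur_from_mdf(image, mdf):
    """Blend shifted copies of `image`, weighted by time spent at each pose.

    `mdf` maps an integer (dy, dx) translation to a non-negative weight.
    Weights are normalized so they sum to one.
    """
    total = sum(mdf.values())
    blurred = np.zeros_like(image, dtype=float)
    for (dy, dx), w in mdf.items():
        # np.roll wraps around at the border; a simplification for this sketch
        blurred += (w / total) * np.roll(image, shift=(dy, dx), axis=(0, 1))
    return blurred

# A single bright pixel makes the blur kernel directly visible.
sharp = np.zeros((5, 5))
sharp[2, 2] = 1.0
mdf = {(0, 0): 0.5, (0, 1): 0.25, (0, 2): 0.25}   # camera drifts to the right
blurred = blur_from_mdf(sharp, mdf)
```

Because the weights are normalized, total image intensity is preserved, and an MDF concentrated on a single pose leaves the image unchanged.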
Being John Malkovich
- Abstract:
- Given a photo of person A, we seek a photo of person B with similar pose and expression. Solving this problem enables a form of puppetry, in which one person appears to control the face of another. When deployed on a webcam-equipped computer, our approach enables a user to control another person's face in real-time. This image-retrieval-inspired approach employs a fully-automated pipeline of face analysis techniques, and is extremely general: we can puppet anyone directly from their photo collection or videos in which they appear. We show several examples using images and videos of celebrities from the Internet.
- Citation:
- Ira Kemelmacher-Shlizerman, Aditya Sankar, Eli Shechtman, Steven M. Seitz. "Being John Malkovich." ECCV 2010.
- On-line documents:
- Complete article (PDF)
Project page
Learning Behavior Styles with Inverse Reinforcement Learning
- Abstract:
- We present a method for inferring the behavior styles of character controllers from a small set of examples. We show that a rich set of behavior variations can be captured by determining the appropriate reward function in the reinforcement learning framework, and show that the discovered reward function can be applied to different environments and scenarios. We also introduce a new algorithm to recover the unknown reward function that improves over the original apprenticeship learning algorithm. We show that the reward function representing a behavior style can be applied to a variety of different tasks, while still preserving the key features of the style present in the given examples. We describe an adaptive process where an author can, with just a few additional examples, refine the behavior so that it has better generalization properties.
- Citation:
- Seong Jae Lee, Zoran Popović. "Learning Behavior Styles with Inverse Reinforcement Learning." ACM Transactions on Graphics 29(4), Article 122, July 2010.
- On-line documents:
- Complete article (PDF)
Generating Sharp Panoramas from Motion-blurred Videos
- Abstract:
- In this paper, we show how to generate a sharp panorama from a set of motion-blurred video frames. Our technique is based on joint global motion estimation and multi-frame deblurring. It also automatically computes the duty cycle of the video, namely the percentage of time between frames that is actually exposure time. The duty cycle is necessary for allowing the blur kernels to be accurately extracted and then removed. We demonstrate our technique on a number of videos.
- Citation:
- Yunpeng Li, Sing Bing Kang, Steven M. Seitz, Daniel P. Huttenlocher. "Generating Sharp Panoramas from Motion-blurred Videos." CVPR 2010.
- On-line documents:
- Complete article (PDF)
Project page
Seeing through Obscure Glass
- Abstract:
- Obscure glass is textured glass designed to separate spaces and 'obscure' visibility between the spaces. Such glass is used to provide privacy while still allowing light to flow into a space, and is often found in homes and offices. We propose and explore the challenge of 'seeing through' obscure glass, using both optical and digital techniques. In some cases - such as when the textured surface is on the side of the observer - we find that simple household substances and cameras with small apertures enable a surprising level of visibility through the obscure glass. In other cases, where optical techniques are not usable, we find that we can model the action of obscure glass as convolution of spatially varying kernels and reconstruct an image of the scene on the opposite side of the obscure glass with surprising detail.
- Citation:
- Qi Shan, Brian Curless, Tadayoshi Kohno. "Seeing through Obscure Glass." ECCV 2010.
- On-line documents:
- Complete article (PDF)
Reconstructing the World in 3D: Bringing Games with a Purpose Outdoors
- Abstract:
- We are interested in reconstructing real world locations as detailed 3D models, but to achieve this goal, we require a large quantity of photographic data. We designed a game to employ the efforts and digital cameras of everyday people to not only collect this data, but to do so in a fun and effective way. The result is PhotoCity, a game played outdoors with a camera, in which players take photos to capture flags and take over virtual models of real buildings. The game falls into the genres of both games with a purpose (GWAPs) and alternate reality games (ARGs). Each type of game comes with its own inherent challenges, but as a hybrid of both, PhotoCity presented us with a unique combination of obstacles. This paper describes the design decisions made to address these obstacles, and seeks to answer the question: Can games be used to achieve massive data-acquisition tasks when played in the real world, away from standard game consoles? We conclude with a report on player experiences and showcase some 3D reconstructions built by players during gameplay.
- Citation:
- Kathleen Tuite, Noah Snavely, Dun-Yu Hsiao, Adam M. Smith, Zoran Popović. "Reconstructing the World in 3D: Bringing Games with a Purpose Outdoors." FDG 2010, June 2010.
- On-line documents:
- Complete article (PDF)
Character Animation in Two-Player Adversarial Games
- Abstract:
- The incorporation of randomness is critical for the believability and effectiveness of controllers for characters in competitive games. We present a fully automatic method for generating intelligent real-time controllers for characters in such a game. Our approach uses game theory to deal with the ramifications of the characters acting simultaneously, and generates controllers which employ both long-term planning and an intelligent use of randomness. Our results exhibit nuanced strategies based on unpredictability, such as feints and misdirection moves, which take into account and exploit the possible strategies of an adversary. The controllers are generated by examining the interaction between the rules of the game and the motions generated from a parametric motion graph. This involves solving a large-scale planning problem, so we also describe a new technique for scaling this process to higher dimensions.
- Citation:
- Kevin Wampler, Erik Andersen, Evan Herbst, Yongjoon Lee, Zoran Popović. "Character Animation in Two-Player Adversarial Games." ACM Transactions on Graphics 29(3), Article 26, June 2010.
- On-line documents:
- Complete article (PDF)
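The abstract above argues that randomness is essential when two characters act simultaneously. A much smaller illustration of the same game-theoretic point, not the paper's planner, is the closed-form mixed strategy for a two-action zero-sum game: the row player randomizes so that the opponent gains nothing by choosing either response. The function name and the payoff numbers below are assumptions made for this example.

```python
def mixed_strategy_2x2(a, b, c, d):
    """Row player's equilibrium for the zero-sum payoff matrix [[a, b], [c, d]].

    Returns (p, value), where p is the probability of playing the first row.
    Assumes the game has no pure-strategy saddle point, which makes the
    denominator nonzero and p fall strictly between 0 and 1.
    """
    denom = a - b - c + d
    if denom == 0:
        raise ValueError("degenerate game: no unique mixed equilibrium")
    p = (d - c) / denom               # makes the column player indifferent
    value = (a * d - b * c) / denom   # expected payoff to the row player
    return p, value

# Hypothetical payoffs: rows are the attacker's moves, columns the defender's.
p, value = mixed_strategy_2x2(2, -1, -1, 1)
```

With these payoffs the attacker plays the first move 40% of the time and earns an expected 0.2 regardless of the defender's choice; any deterministic choice could be exploited.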
3D Point Correspondence by Minimum Description Length in Feature Space
- Abstract:
- Finding point correspondences plays an important role in automatically building statistical shape models from a training set of 3D surfaces. For the point correspondence problem, Davies et al. [1] proposed a minimum-description-length-based objective function to balance the training errors and generalization ability. A recent evaluation study [2] that compares several well-known 3D point correspondence methods for modeling purposes shows that the MDL-based approach [1] is the best method.
We adapt the MDL-based objective function for a feature space that can exploit nonlinear properties in point correspondences, and propose an efficient optimization method to minimize the objective function directly in the feature space, given that the inner product of any vector pair can be computed in the feature space. We further employ a Mercer kernel [3] to define the feature space implicitly. A key aspect of our proposed framework is the generalization of the MDL-based objective function to kernel principal component analysis (KPCA) [4] spaces and the design of a gradient-descent approach to minimize such an objective function. We compare the generalized MDL objective function on KPCA spaces with the original one and evaluate their abilities in terms of reconstruction errors and specificity. From our experimental results on different sets of 3D shapes of human body organs, the proposed method performs significantly better than the original method.
- Citation:
- Chen, J.-H., Zheng, K. Colin, and Shapiro, Linda G.. 3D Point Correspondence by Minimum Description Length in Feature Space. ECCV 2010.
- On-line documents:
- Complete article (PDF)
Robust Interactive Image Segmentation with Automatic Boundary Refinement
- Abstract:
- We propose an effective image segmentation approach with a novel automatic boundary refinement procedure that requires little user interaction and makes the object cutout process more robust and convenient. It achieves these goals by the following three steps. First, merge over-segmented regions according to the maximal similarity rule using a few marking strokes as input. Second, detect possible erroneous low-contrast object boundaries by analyzing image content. Third, automatically refine those boundary regions using both local and global information. Experimental results are good even on very complex images.
- Citation:
- Liu, Dingding, Xiong, Yingen, Shapiro, Linda and Pulli, Kari. Robust Interactive Image Segmentation with Automatic Boundary Refinement. IEEE International Conference on Image Processing, 2010.
- On-line documents:
- Complete article (PDF)
Fast Interactive Image Segmentation by Discriminative Clustering
- Abstract:
- We propose a novel and fast interactive image segmentation algorithm for use on mobile phones. Instead of using global optimization, our algorithm begins with an initial over-segmentation using the mean shift algorithm and follows this by discriminative clustering and local neighborhood classification. This procedure obtains better quality results than previous methods that use graph cuts on oversegmented regions or region merging based on maximal similarity, yet its running time is smaller by an order of magnitude. We compare and analyze the strengths and limitations of the three approaches and have implemented our algorithm as part of an interactive object cut out application running on a mobile phone.
- Citation:
- Liu, Dingding, Pulli, Kari, Shapiro, Linda G., and Xiong, Yingen. Fast Interactive Image Segmentation by Discriminative Clustering. First ACM International Workshop on Mobile Cloud Media Computing, 2010.
- On-line documents:
- Complete article (PDF)
Haar Random Forest Features and SVM Spatial Matching Kernel for Stonefly Species Identification
- Abstract:
- This paper proposes an image classification method based on extracting image features using Haar random forests and combining them with a spatial matching kernel SVM. The method works by combining multiple efficient, yet powerful, learning algorithms at every stage of the recognition process. On the task of identifying aquatic stonefly larvae, the method has state-of-the-art or better performance, but with much higher efficiency.
- Citation:
- Larios, N., Soran, B., Shapiro, L. G., Martínez-Muñoz, G., Lin, J., and Dietterich, T. G.. Haar Random Forest Features and SVM Spatial Matching Kernel for Stonefly Species Identification. International Conference on Pattern Recognition, 2010.
- On-line documents:
- Complete article (PDF)
Terrain-Adaptive Bipedal Locomotion Control
- Abstract:
- We describe a framework for the automatic synthesis of biped locomotion controllers that adapt to uneven terrain at run-time. The framework consists of two components: a per-footstep end-effector path planner and a per-timestep generalized-force solver. At the start of each footstep, the planner performs short-term planning in the space of end-effector trajectories. These trajectories adapt to the interactive task goals and the features of the surrounding uneven terrain at run-time. We solve for the parameters of the planner for different tasks in offline optimizations. Using the per-footstep plan, the generalized-force solver takes ground contacts into consideration and solves a quadratic program at each simulation timestep to obtain joint torques that drive the biped. We demonstrate the capabilities of the controllers in complex navigation tasks where they perform gradual or sharp turns and transition between moving forwards, backwards, and sideways on uneven terrain (including hurdles and stairs) according to the interactive task goals. We also show that the resulting controllers are capable of handling morphology changes to the character.
- Citation:
- Wu, Jia-chi and Popović, Z.. Terrain-Adaptive Bipedal Locomotion Control. ACM Transactions on Graphics 29(4), July 2010.
- On-line documents:
- Complete article (PDF)
Regenerative Morphing
- Abstract:
- We present a new image morphing approach in which the output sequence is regenerated from small pieces of the two source (input) images. The approach does not require manual correspondence, and generates compelling results even when the images are of very different objects (e.g., a cloud and a face). We pose the morphing task as an optimization with the objective of achieving bidirectional similarity of each frame to its neighbors, and also to the source images. The advantages of this approach are 1) it can operate fully automatically, producing effective results for many sequences (but also supports manual correspondences, when available), 2) ghosting artifacts are minimized, and 3) different parts of the scene move at different rates, yielding more interesting (and less robotic) transitions.
- Citation:
- Shechtman, E., Rav-Acha, A., Irani, M., and Seitz, S. M.. Regenerative Morphing. Proceedings of CVPR 2010, June 2010.
- On-line documents:
- Complete article (PDF)
Project Website
Towards Internet-scale Multi-view Stereo
- Abstract:
- This paper introduces an approach for enabling existing multi-view stereo methods to operate on extremely large unstructured photo collections. The main idea is to decompose the collection into a set of overlapping sets of photos that can be processed in parallel, and to merge the resulting reconstructions. This overlapping clustering problem is formulated as a constrained optimization and solved iteratively. The merging algorithm, designed to be parallel and out-of-core, incorporates robust filtering steps to eliminate low-quality reconstructions and enforce global visibility constraints. The approach has been tested on several large datasets downloaded from Flickr.com, including one with over ten thousand images, yielding a 3D reconstruction with nearly thirty million points.
- Citation:
- Furukawa, Y., Curless, B., Seitz, S. M., and Szeliski, R.. Towards Internet-scale Multi-view Stereo. Proceedings of CVPR 2010, June 2010.
- On-line documents:
- Complete article (PDF)
Video Tapestries with Continuous Temporal Zoom
- Abstract:
- We present a novel approach for summarizing video in the form of a multiscale image that is continuous in both the spatial domain and across the scale dimension: There are no hard borders between discrete moments in time, and a user can zoom smoothly into the image to reveal additional temporal details. We call these artifacts tapestries because their continuous nature is akin to medieval tapestries and other narrative depictions predating the advent of motion pictures. We propose a set of criteria for such a summarization, and a series of optimizations motivated by these criteria. These can be performed as an entirely offline computation to produce high quality renderings, or by adjusting some optimization parameters the later stages can be solved in real time, enabling an interactive interface for video navigation. Our video tapestries combine the best aspects of two common visualizations, providing the visual clarity of DVD chapter menus with the information density and multiple scales of a video editing timeline representation. In addition, they provide continuous transitions between zoom levels. In a user study, participants preferred both the aesthetics and efficiency of tapestries over other interfaces for visual browsing.
- Citation:
- Barnes, C., Goldman, D. B., Shechtman, E., and Finkelstein, A.. Video Tapestries with Continuous Temporal Zoom. ACM Transactions on Graphics 29(4), July 2010.
- On-line documents:
- Complete article (PDF)
Project Website
The Use of Genetic Programming for Learning 3D Craniofacial Shape Quantifications
- Abstract:
- Craniofacial disorders commonly result in various head shape dysmorphologies. The goal of this work is to quantify the various 3D shape variations that manifest in the different facial abnormalities in individuals with a craniofacial disorder called 22q11.2 Deletion Syndrome. Genetic programming (GP) is used to learn the different 3D shape quantifications. Experimental results show that the GP method achieves a higher classification rate than those of human experts and existing computer algorithms.
- Citation:
- Atmosukarto, I., Shapiro, L. G., and Heike, C.. The Use of Genetic Programming for Learning 3D Craniofacial Shape Quantifications. Proceedings of ICPR 2010.
- On-line documents:
- Complete article (PDF)
3D Object Retrieval Using Salient Views
- Abstract:
- This paper presents a method for selecting salient 2D views to describe 3D objects for the purpose of retrieval. The views are obtained by first identifying salient points via a learning approach that uses shape characteristics of the 3D points [4, 3]. The salient views are selected by choosing views with multiple salient points on the silhouette of the object. Silhouette-based similarity measures from [6] are then used to calculate the similarity between two 3D objects. Experimental results show that the retrieval results using the salient views are comparable to the existing light field descriptor method [6], and our method achieves a 15-fold speedup in the feature extraction computation time.
- Citation:
- Atmosukarto, I., and Shapiro, L. G.. 3D Object Retrieval Using Salient Views. Proceedings of MIR 2010, March 2010.
- On-line documents:
- Complete article (PDF)
Gameplay Analysis through State Projection
- Abstract:
- Analysis of gameplay data is crucial for evaluating design decisions and refining a game experience. However, identifying player strategies and finding areas of confusion is difficult because a designer may not know what queries to ask or what patterns to look for in the data. To make this task easier, we present Playtracer, a method for visually analyzing play traces that is independent of a specific game's structure. Playtracer applies multidimensional scaling to cluster players and game states, providing a detailed visual representation of the paths the players take through a game. We evaluate our method by analyzing an educational puzzle game and highlighting common hypotheses, pitfalls, confusing elements, and anomalies. Our results suggest that Playtracer can be an effective tool for game analysis and improvement.
- Citation:
- Andersen, E., Liu, Y-E., Apter, E., Boucher-Genessee, F., and Popović, Z.. Gameplay Analysis through State Projection. Proceedings of FDG 2010, June 2010.
- On-line documents:
- Complete article (PDF)
Project Website
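The Playtracer abstract above relies on multidimensional scaling to lay out players and game states in a plane. The sketch below shows classical (Torgerson) MDS with NumPy; which MDS variant and which distance between game states the authors use is not stated in the abstract, so both the variant and the toy state vectors here are assumptions.

```python
import numpy as np

def classical_mds(D, dims=2):
    """Embed n items in `dims` dimensions from an n-by-n distance matrix D."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n          # centering matrix
    B = -0.5 * J @ (D ** 2) @ J                  # double-centered Gram matrix
    eigvals, eigvecs = np.linalg.eigh(B)         # eigenvalues in ascending order
    idx = np.argsort(eigvals)[::-1][:dims]       # keep the largest ones
    scale = np.sqrt(np.clip(eigvals[idx], 0.0, None))
    return eigvecs[:, idx] * scale

# Hypothetical game states described by two numeric features each.
states = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [3.0, 2.0]])
D = np.linalg.norm(states[:, None, :] - states[None, :, :], axis=-1)
coords = classical_mds(D, dims=2)
```

The recovered coordinates match the original layout only up to rotation, reflection, and translation, so the useful check is that pairwise distances are preserved.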
Food Recognition Using Statistics of Pairwise Local Features
- Abstract:
- Food recognition is difficult because food items are deformable objects that exhibit significant variations in appearance. We believe the key to recognizing food is to exploit the spatial relationships between different ingredients (such as meat and bread in a sandwich). We propose a new representation for food items that calculates pairwise statistics between local features computed over a soft pixel-level segmentation of the image into eight ingredient types. We accumulate these statistics in a multi-dimensional histogram, which is then used as a feature vector for a discriminative classifier. Our experiments show that the proposed representation is significantly more accurate at identifying food than existing methods.
- Citation:
- Yang, S., Chen, M., Pomerleau, D., and Sukthankar, R.. Food Recognition Using Statistics of Pairwise Local Features. Proceedings of CVPR 2010.
- On-line documents:
- Complete article (PDF)
Effective Gradient Domain Object Editing on Mobile Devices
- Abstract:
- We present a gradient domain object editing approach and its implementation for mobile devices. It can be used for creating a new composite image by removing, adding, and moving objects in an image. The approach can be divided into two parts: creation and editing of a new gradient vector field, and recovery of a new composite image from the new gradient vector field. In the first part, a new gradient vector field is created from the gradients of the source image, and then updated by inserting the new object gradients, by removing object gradients and filling removed areas with the gradients of best-fit patches found in other parts of the source image, or by combining these two processes when moving objects. In the second part, a divergence vector field is computed from the gradient vector field and used for a guidance vector to construct a Poisson equation. The new composite image is recovered from the gradient vector field by solving the Poisson equation with boundary conditions.
Our approach can merge all regions in the picture seamlessly with smooth color transition for the whole picture. It can be used for large object removal and for filling the background of the removed object. The final composite image is a globally optimal solution. The approach is implemented and runs with good performance on mobile camera phones.
- Citation:
- Xiong, Yingen, Liu, Dingding, and Pulli, Kari. Effective Gradient Domain Object Editing on Mobile Devices. IEEE 43rd Asilomar Conference on Signals, Systems, and Computers.
- On-line documents:
- Complete article (PDF)
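The gradient-domain editing abstract above recovers an image by taking the divergence of an edited gradient field and solving a Poisson equation with boundary conditions. The sketch below shows that second stage in one dimension with a dense solver; it is an illustration under assumptions (1D signal, forward differences, fixed end values, the hypothetical name `reconstruct_from_gradients`), not the mobile implementation described in the paper.

```python
import numpy as np

def reconstruct_from_gradients(grad, left, right):
    """Recover a 1D signal f of length len(grad) + 1 from forward differences
    grad[i] ~ f[i+1] - f[i], with fixed boundary values f[0] = left and
    f[-1] = right (Dirichlet conditions)."""
    n = len(grad) + 1
    m = n - 2                                   # unknown interior samples
    div = grad[1:] - grad[:-1]                  # divergence of the gradient field
    L = (np.diag(-2.0 * np.ones(m))
         + np.diag(np.ones(m - 1), 1)
         + np.diag(np.ones(m - 1), -1))         # discrete Laplacian
    rhs = div.astype(float).copy()
    rhs[0] -= left                              # move known boundary values
    rhs[-1] -= right                            # to the right-hand side
    f = np.empty(n)
    f[0], f[-1] = left, right
    f[1:-1] = np.linalg.solve(L, rhs)
    return f

source = np.array([1.0, 3.0, 2.0, 5.0, 4.0, 4.0])
grad = np.diff(source)
grad[2] = 0.0                                   # edit: flatten one edge
result = reconstruct_from_gradients(grad, source[0], source[-1])
```

When the edited gradients no longer agree with the boundary values, this solve returns the least-squares fit to the gradient field, which spreads the mismatch smoothly across the signal instead of leaving a visible seam.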
GradientShop: A Gradient-Domain Optimization Framework for Image and Video Filtering
- Abstract:
- We present an optimization framework for exploring gradient-domain solutions for image and video processing. The proposed framework unifies many of the key ideas in the gradient-domain literature under a single optimization formulation. Our hope is that this generalized framework will allow the reader to quickly gain a general understanding of the field and contribute new ideas of their own.
We propose a novel metric for measuring local gradient-saliency that identifies salient gradients that give rise to long, coherent edges, even when the individual gradients are faint. We present a general weighting-scheme for gradient-constraints that improves the visual appearance of results. We also provide a solution for applying gradient-domain filters to videos and video streams in a coherent manner.
Finally, we demonstrate the utility of our formulation in creating effective yet simple to implement solutions for various image-processing tasks. To exercise our formulation we have created a new saliency-based sharpen filter and a pseudo image-relighting application. We also revisit and improve upon previously defined filters such as non-photorealistic rendering, image de-blocking, and sparse data interpolation over images (e.g., colorization using optimization).
- Citation:
- Bhat, P., Zitnick, C. L., Cohen, M. F., and Curless, B.. GradientShop: A Gradient-Domain Optimization Framework for Image and Video Filtering. ACM TOG 29(2), March 2010.
- On-line documents:
- Complete article (PDF)
Project Website
Globally Optimal Algorithms for Stratified Autocalibration
- Abstract:
- We present practical algorithms for stratified autocalibration with theoretical guarantees of global optimality. Given a projective reconstruction, we first upgrade it to affine by estimating the position of the plane at infinity. The plane at infinity is computed by globally minimizing a least squares formulation of the modulus constraints. In the second stage, this affine reconstruction is upgraded to a metric one by globally minimizing the infinite homography relation to compute the dual image of the absolute conic (DIAC). The positive semidefiniteness of the DIAC is explicitly enforced as part of the optimization process, rather than as a post-processing step.
For each stage, we construct and minimize tight convex relaxations of the highly non-convex objective functions in a branch and bound optimization framework. We exploit the inherent problem structure to restrict the search space for the DIAC and the plane at infinity to a small, fixed number of branching dimensions, independent of the number of views. Chirality constraints are incorporated into our convex relaxations to automatically select an initial region which is guaranteed to contain the global minimum.
Experimental evidence of the accuracy, speed and scalability of our algorithm is presented on synthetic and real data.
- Citation:
- Chandraker, M., Agarwal, S., Kriegman, D., and Belongie, S.. Globally Optimal Algorithms for Stratified Autocalibration. International Journal of Computer Vision, 2009.
- On-line documents:
- Complete article (PDF)
Project Website
3D Head Shape Quantification for Infants with and without Deformational Plagiocephaly
- Abstract:
- Citation:
- Atmosukarto, I., Shapiro, L. G., Starr, J. R., Heike, C. L., Collett, B., Cunningham, M. L., and Speltz, M. L.. 3D Head Shape Quantification for Infants with and without Deformational Plagiocephaly. In The Cleft-Palate Craniofacial Journal, 2009.
- On-line documents:
- Complete article (PDF)
3D object classification using salient point patterns with application to craniofacial research
- Abstract:
- This paper presents a new 3D shape representation and classification methodology developed for use in craniofacial dysmorphology studies. The methodology computes low-level features at each point of a 3D mesh representation, aggregates the features into histograms over mesh neighborhoods, learns the characteristics of salient point histograms for each particular application, and represents the points in a 2D spatial map based on a longitude–latitude transformation. Experimental results on the medical classification tasks show that our methodology achieves higher classification accuracy compared to medical experts and existing state-of-the-art 3D descriptors. Additional experimental results highlight the strength and advantage of the flexible framework that allows the methodology to generalize from specific medical classification tasks to general 3D object classification tasks.
- Citation:
- Atmosukarto, I., Wilamowska, K., Heike, C., and Shapiro, L. G.. 3D object classification using salient point patterns with application to craniofacial research. Pattern Recognition Journal, 43, 2010.
- On-line documents:
- Complete article (PDF)
PatchMatch: A Randomized Correspondence Algorithm for Structural Image Editing
- Abstract:
- This paper presents interactive image editing tools using a new randomized algorithm for quickly finding approximate nearest-neighbor matches between image patches. Previous research in graphics and vision has leveraged such nearest-neighbor searches to provide a variety of high-level digital image editing tools. However, the cost of computing a field of such matches for an entire image has eluded previous efforts to provide interactive performance. Our algorithm offers substantial performance improvements over the previous state of the art (20-100x), enabling its use in interactive editing tools. The key insights driving the algorithm are that some good patch matches can be found via random sampling, and that natural coherence in the imagery allows us to propagate such matches quickly to surrounding areas. We offer theoretical analysis of the convergence properties of the algorithm, as well as empirical and practical evidence for its high quality and performance. This one simple algorithm forms the basis for a variety of tools – image retargeting, completion and reshuffling – that can be used together in the context of a high-level image editing application. Finally, we propose additional intuitive constraints on the synthesis process that offer the user a level of control unavailable in previous methods.
- Citation:
- Barnes, C., Shechtman, E., Finkelstein, A., and Goldman, D. B. PatchMatch: A Randomized Correspondence Algorithm for Structural Image Editing. ACM Transactions on Graphics 28(3), August 2009.
- On-line documents:
- Complete article (PDF)
Project Website
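The two insights named in the PatchMatch abstract above, random sampling and propagation of good matches to neighbors, can be sketched compactly. The following is a simplified illustration for small grayscale arrays, assuming a sum-of-squared-differences patch distance, alternating scan order, and a halving search radius; the function names and parameter defaults are choices made for this example, and it makes no attempt at the performance of the published algorithm.

```python
import numpy as np

def patch_dist(A, B, ay, ax, by, bx, p):
    """Sum of squared differences between p-by-p patches (top-left corners)."""
    d = A[ay:ay + p, ax:ax + p] - B[by:by + p, bx:bx + p]
    return float(np.sum(d * d))

def patchmatch(A, B, p=3, iters=4, seed=0):
    """Approximate nearest-neighbor field from patches of A to patches of B."""
    rng = np.random.default_rng(seed)
    ah, aw = A.shape[0] - p + 1, A.shape[1] - p + 1   # valid patch origins in A
    bh, bw = B.shape[0] - p + 1, B.shape[1] - p + 1   # valid patch origins in B
    # Random initialization of the match for every patch in A
    nnf = np.stack([rng.integers(0, bh, (ah, aw)),
                    rng.integers(0, bw, (ah, aw))], axis=-1)
    cost = np.array([[patch_dist(A, B, y, x, nnf[y, x, 0], nnf[y, x, 1], p)
                      for x in range(aw)] for y in range(ah)])

    def try_match(y, x, by, bx):
        # Accept a candidate only if it is in bounds and strictly better
        if 0 <= by < bh and 0 <= bx < bw:
            d = patch_dist(A, B, y, x, by, bx, p)
            if d < cost[y, x]:
                cost[y, x] = d
                nnf[y, x] = (by, bx)

    for it in range(iters):
        step = 1 if it % 2 == 0 else -1               # alternate scan direction
        ys = range(ah) if step == 1 else range(ah - 1, -1, -1)
        xs = range(aw) if step == 1 else range(aw - 1, -1, -1)
        for y in ys:
            for x in xs:
                # Propagation: reuse the already-visited neighbor's match, shifted
                if 0 <= y - step < ah:
                    try_match(y, x, nnf[y - step, x, 0] + step, nnf[y - step, x, 1])
                if 0 <= x - step < aw:
                    try_match(y, x, nnf[y, x - step, 0], nnf[y, x - step, 1] + step)
                # Random search around the current match, halving the radius
                radius = max(bh, bw)
                while radius >= 1:
                    by = nnf[y, x, 0] + rng.integers(-radius, radius + 1)
                    bx = nnf[y, x, 1] + rng.integers(-radius, radius + 1)
                    try_match(y, x, by, bx)
                    radius //= 2
    return nnf, cost

rng = np.random.default_rng(1)
B = rng.random((16, 16))
A = np.roll(B, shift=(2, 3), axis=(0, 1))             # A is a shifted copy of B
nnf, cost = patchmatch(A, B)
```

Because a candidate replaces the current match only when it lowers the patch distance, the per-patch cost can never increase from one iteration to the next.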
A Prism-based System for Multispectral Video Acquisition
- Abstract:
- In this paper, we propose a prism-based system for capturing multispectral videos. The system consists of a triangular prism, a monochrome camera, and an occlusion mask. Incoming light beams from the scene are sampled by the occlusion mask, dispersed into their constituent spectra by the triangular prism, and then captured by the monochrome camera. Our system is capable of capturing videos of high spectral resolution. It also allows for different tradeoffs between spectral and spatial resolution by adjusting the focal length of the camera. We demonstrate the effectiveness of our system with several applications, including human skin detection, physical material recognition, and RGB video generation.
- Citation:
- Du, H., Tong, X., Cao, X., and Lin, S.. A Prism-based System for Multispectral Video Acquisition. ICCV 2009.
- On-line documents:
- Complete article (PDF)
Project Website
Mining Web Interactions to Automatically Create Mash-Ups
- Abstract:
- The deep web contains an order of magnitude more information than the surface web, but that information is hidden behind the web forms of a large number of web sites. Metasearch engines can help users explore this information by aggregating results from multiple resources, but previously these could only be created and maintained by programmers. In this paper, we explore the automatic creation of metasearch mash-ups by mining the web interactions of multiple web users to find relations between query forms on different web sites. We also present an implemented system called TX2 that uses those connections to search multiple deep web resources simultaneously and integrate the results in context in a single results page. TX2 illustrates the promise of constructing mash-ups automatically and the potential of mining web interactions to explore deep web resources.
- Citation:
- Bigham, J. P., Kaminsky, R. S., and Nichols, J. Mining Web Interactions to Automatically Create Mash-Ups. UIST 2009, October 2009, to appear.
- On-line documents:
- Complete article (PDF)
Alignment of 3D Point Clouds to Overhead Images
- Abstract:
- We address the problem of automatically aligning structure-from-motion reconstructions to overhead images, such as satellite images, maps and floor plans, generated from an orthographic camera. We compute the optimal alignment using an objective function that matches 3D points to image edges and imposes free space constraints based on the visibility of points in each camera. We demonstrate the accuracy of our alignment algorithm on several outdoor and indoor scenes using both satellite and floor plan images. We also present an application of our technique, which uses a labeled overhead image to automatically tag the input photo collection with textual information.
- Citation:
- Kaminsky, R. S., Snavely, N., Seitz, S. M., and Szeliski, R. Alignment of 3D Point Clouds to Overhead Images. CVPR 2009 Workshop on Internet Vision, June 2009.
- On-line documents:
- Complete article (PDF)
Reconstructing Building Interiors from Images
- Abstract:
- This paper proposes a fully automated 3D reconstruction and visualization system for architectural scenes (interiors and exteriors). The reconstruction of indoor environments from photographs is particularly challenging due to texture-poor planar surfaces such as uniformly-painted walls. Our system first uses structure-from-motion, multiview stereo, and a stereo algorithm specifically designed for Manhattan-world scenes (scenes consisting predominantly of piece-wise planar surfaces with dominant directions) to calibrate the cameras and to recover initial 3D geometry in the form of oriented points and depth maps. Next, the initial geometry is fused into a 3D model with a novel depth-map integration algorithm that, again, makes use of Manhattan-world assumptions and produces simplified 3D models. Finally, the system enables the exploration of reconstructed environments with an interactive, image-based 3D viewer. We demonstrate results on several challenging datasets, including a 3D reconstruction and image-based walk-through of an entire floor of a house, the first result of this kind from an automated computer vision system.
- Citation:
- Furukawa, Y., Curless, B., Seitz, S. M., and Szeliski, R. Reconstructing Building Interiors from Images. ICCV 2009.
- On-line documents:
- Complete article (PDF)
Project
Filter Flow
- Abstract:
- The filter flow problem is to compute a space-variant linear filter that transforms one image into another. This framework encompasses a broad range of transformations including stereo, optical flow, lighting changes, blur, and combinations of these effects. Parametric models such as affine motion, vignetting, and radial distortion can also be modeled within the same framework. All such transformations are modeled by selecting a number of constraints and objectives on the filter entries from a catalog which we enumerate. Most of the constraints are linear, leading to globally optimal solutions (via linear programming) for affine transformations, depth-from-defocus, and other problems. Adding a (non-convex) compactness objective enables solutions for optical flow with illumination changes, space-variant defocus, and higher-order smoothness.
- Citation:
- Seitz, Steven M. and Baker, S. Filter Flow. ICCV 2009.
- On-line documents:
- Complete article (PDF)
Supplemental Material (PDF)
Building Rome in a Day
- Abstract:
- We present a system that can match and reconstruct 3D scenes from extremely large collections of photographs such as those found by searching for a given city (e.g., Rome) on Internet photo sharing sites. Our system uses a collection of novel parallel distributed matching and reconstruction algorithms, designed to maximize parallelism at each stage in the pipeline and minimize serialization bottlenecks. It is designed to scale gracefully with both the size of the problem and the amount of available computation. We have experimented with a variety of alternative algorithms at each stage of the pipeline and report on which ones work best in a parallel computing environment. Our experimental results demonstrate that it is now possible to reconstruct cities consisting of 150K images in less than a day on a cluster with 500 compute cores.
- Citation:
- Agarwal, S., Snavely, N., Simon, I., Seitz, S. M., and Szeliski, R. Building Rome in a Day. ICCV 2009.
- On-line documents:
- Complete article (PDF)
Project
The Dimensionality of Scene Appearance
- Abstract:
- Low-rank approximation of image collections (e.g., via PCA) is a popular tool in many areas of computer vision. Yet, surprisingly little is known justifying the observation that images of an object or scene tend to be low dimensional, beyond the special case of Lambertian scenes. This paper considers the question of how many basis images are needed to span the space of images of a scene under real-world lighting and viewing conditions, allowing for general BRDFs. We establish new theoretical upper bounds on the number of basis images necessary to represent a wide variety of scenes under very general conditions, and perform empirical studies to justify the assumptions. We then demonstrate a number of novel applications of linear models for scene appearance for Internet photo collections. These applications include image reconstruction, occluder-removal, and expanding field of view.
- Citation:
- Garg, R., Du, H., Seitz, S. M., and Snavely, N. The Dimensionality of Scene Appearance. ICCV 2009.
- On-line documents:
- Complete article (PDF)
Project
Optimal Gait and Form for Animal Locomotion
- Abstract:
- We present a fully automatic method for generating gaits and morphologies for legged animal locomotion. Given a specific animal's shape we can determine an efficient gait with which it can move. Similarly, we can also adapt the animal's morphology to be optimal for a specific locomotion task. We show that determining such gaits is possible without the need to specify a good initial motion, and without manually restricting the allowed gaits of each animal. Our approach is based on a hybrid optimization method which combines an efficient derivative-aware spacetime constraints optimization with a derivative-free approach able to find non-local solutions in high-dimensional discontinuous spaces. We demonstrate the effectiveness of this approach by synthesizing dynamic locomotions of bipeds, a quadruped, and an imaginary five-legged creature.
- Citation:
- Wampler, K. and Popović, Z. Optimal Gait and Form for Animal Locomotion. ACM Transactions on Graphics 28(3), August 2009 (Proceedings of SIGGRAPH 2009).
- On-line documents:
- Complete article (PDF)
Project
Contact-aware Nonlinear Control of Dynamic Characters
- Abstract:
- Dynamically simulated characters are difficult to control because they are underactuated - they have no direct control over their global position and orientation. In order to succeed, control policies must look ahead to determine stabilizing actions, but such planning is complicated by frequent ground contacts that produce a discontinuous search space. This paper introduces a locomotion system that generates high-quality animation of agile movements using nonlinear controllers that plan through such contact changes. We demonstrate the general applicability of this approach by emulating walking and running motions in rigid-body simulations. Then we consolidate these controllers under a higher-level planner that interactively controls the character's direction.
- Citation:
- Muico, U., Lee, Y., Popović, J. and Popović, Z. Contact-aware Nonlinear Control of Dynamic Characters. ACM Transactions on Graphics 28(3), August 2009 (Proceedings of SIGGRAPH 2009).
- On-line documents:
- Complete article (PDF)
Project
Dense 3D Motion Capture for Human Faces
- Abstract:
- This paper proposes a novel approach to motion capture from multiple, synchronized video streams, specifically aimed at recording dense and accurate models of the structure and motion of highly deformable surfaces such as skin, that stretches, shrinks, and shears in the midst of normal facial expressions. Solving this problem is a key step toward effective performance capture for the entertainment industry, but progress so far has been hampered by the lack of appropriate local motion and smoothness models. The main technical contribution of this paper is a novel approach to regularization adapted to nonrigid tangential deformations. Concretely, we estimate the nonrigid deformation parameters at each vertex of a surface mesh, smooth them over a local neighborhood for robustness, and use them to regularize the tangential motion estimation. To demonstrate the power of the proposed approach, we have integrated it into our previous work for markerless motion capture [9], and compared the performances of the original and new algorithms on three extremely challenging face datasets that include highly nonrigid skin deformations, wrinkles, and quickly changing expressions. Additional experiments with a dataset featuring fast-moving cloth with complex and evolving fold structures demonstrate that the adaptability of the proposed regularization scheme to nonrigid tangential motion does not hamper its robustness, since it successfully recovers the shape and motion of the cloth without overfitting it despite the absence of stretch or shear in this case.
- Citation:
- Furukawa, Y. and Ponce, J. Dense 3D Motion Capture for Human Faces. CVPR 2009, June 2009.
- On-line documents:
- Complete article (PDF)
Project
Parallax Photography: Creating 3D Cinematic Effects from Stills
- Abstract:
- We present an approach to convert a small portion of a light field with extracted depth information into a cinematic effect with simulated, smooth camera motion that exhibits a sense of 3D parallax. We develop a taxonomy of the cinematic conventions of these effects, distilled from observations of documentary film footage and organized by the number of subjects of interest in the scene. We present an automatic, content-aware approach to apply these cinematic conventions to an input light field. A face detector identifies subjects of interest. We then optimize for a camera path that conforms to a cinematic convention, maximizes apparent parallax, and avoids missing information in the input. We describe a GPU-accelerated, temporally coherent rendering algorithm that allows users to create more complex camera moves interactively, while experimenting with effects such as focal length, depth of field, and selective, depth-based desaturation or brightening. We evaluate and demonstrate our approach on a wide variety of scenes and present a user study that compares our 3D cinematic effects to their 2D counterparts.
- Citation:
- Zheng, K., Colburn, A., Agarwala, A., Agrawala, M., Curless, B., Salesin, D., and Cohen, M. Parallax Photography: Creating 3D Cinematic Effects from Stills. Proceedings of Graphics Interface 2009.
- On-line documents:
- Complete article (PDF)
Project
Dictionary-Free Categorization of Very Similar Objects via Stacked Evidence Trees
- Abstract:
- Current work in object categorization discriminates among objects that typically possess gross differences which are readily apparent. However, many applications require making much finer distinctions. We address an insect categorization problem that is so challenging that even trained human experts cannot readily categorize the insects based on their images. The state of the art that uses visual dictionaries, when applied to this problem, yields mediocre results (16.1% error). Three possible explanations for this are (a) the dictionaries are unsupervised, (b) the dictionaries lose the detailed information contained in each keypoint, and (c) these methods rely on hand-engineered decisions about dictionary size. This paper presents a novel, dictionary-free methodology. A random forest of trees is first trained to predict the class of an image based on individual keypoint descriptors. A unique aspect of these trees is that they do not make decisions but instead merely record evidence - i.e., the number of descriptors from training examples of each category that reached each leaf of the tree. We provide a mathematical model showing that voting evidence is better than voting decisions. To categorize a new image, descriptors for all detected keypoints are "dropped" through the trees, and the evidence at each leaf is summed to obtain an overall evidence vector. This is then sent to a second-level classifier to make the categorization decision. We achieve excellent performance (6.4% error) on the 9-class STONEFLY9 data set. Also, our method achieves an average AUC of 0.921 on the PASCAL06 VOC, which places it fifth out of 21 methods reported in the literature and demonstrates that the method also works well for generic object categorization.
- Citation:
- Martínez-Muñoz, G., Zhang, W., Payet, N., Todorovic, S., Larios, N., Yamamuro, A., Lytle, D., Moldenke, A., Mortensen, E., Paasch, R., Shapiro, L., and Dietterich, T. Dictionary-Free Categorization of Very Similar Objects via Stacked Evidence Trees. CVPR 2009, June 2009.
- On-line documents:
- Complete article (PDF)
Manhattan-World Stereo
- Abstract:
- Multi-view stereo (MVS) algorithms now produce reconstructions that rival laser range scanner accuracy. However, stereo algorithms require textured surfaces, and therefore work poorly for many architectural scenes (e.g., building interiors with textureless, painted walls). This paper presents a novel MVS approach to overcome these limitations for Manhattan World scenes, i.e., scenes that consist of piece-wise planar surfaces with dominant directions. Given a set of calibrated photographs, we first reconstruct textured regions using an existing MVS algorithm, then extract dominant plane directions, generate plane hypotheses, and recover per-view depth maps using Markov random fields. We have tested our algorithm on several datasets ranging from office interiors to outdoor buildings, and demonstrate results that outperform the current state of the art for such texture-poor scenes.
- Citation:
- Furukawa, Y., Curless, B., Seitz, Steven M., and Szeliski, R. Manhattan-World Stereo. CVPR 2009, June 2009.
- On-line documents:
- Complete article (PDF)
Project
Enhancing and Experiencing Spacetime Resolution with Videos and Stills
- Abstract:
- We present solutions for enhancing the spatial and/or temporal resolution of videos. Our algorithm targets the emerging consumer-level hybrid cameras that can simultaneously capture video and high-resolution stills. Our technique produces a high spacetime resolution video using the high-resolution stills for rendering and the low-resolution video to guide the reconstruction and the rendering process. Our framework integrates and extends two existing algorithms, namely a high-quality optical flow algorithm and a high-quality image-based-rendering algorithm. The framework enables a variety of applications that were previously unavailable to the amateur user, such as the ability to (1) automatically create videos with high spatiotemporal resolution, and (2) shift a high-resolution still to nearby points in time to better capture a missed event.
- Citation:
- Gupta, A., Bhat, P., Dontcheva, M., Cohen, Michael F., Curless, B., and Deussen, O. Enhancing and Experiencing Spacetime Resolution with Videos and Stills. ICCP 2009, April 2009.
- On-line documents:
- Complete article (PDF)
Project Page
Zoetrope: Interacting with the Ephemeral Web
- Abstract:
- The Web is ephemeral. Pages change frequently, and it is nearly impossible to find data or follow a link after the underlying page evolves. We present Zoetrope, a system that enables interaction with the historical Web (pages, links, and embedded data) that would otherwise be lost to time. Using a number of novel interactions, the temporal Web can be manipulated, queried, and analyzed from the context of familiar pages. Zoetrope is based on a set of operators for manipulating content streams. We describe these primitives and the associated indexing strategies for handling temporal Web data. They form the basis of Zoetrope and enable our construction of new temporal interactions and visualizations.
- Citation:
- Adar, E., Dontcheva, M., Fogarty, J., and Weld, D. S. Zoetrope: Interacting with the Ephemeral Web. UIST 2008.
- On-line documents:
- Complete article (PDF, 2MB)
Adaptive Layout for Dynamically Aggregated Documents
- Abstract:
- We present a system for designing and displaying grid-based document designs that adapt to many different viewing conditions and content selections. Our system can display traditional, static documents, or it can assemble dynamic documents "on the fly" from many disparate sources via the Internet. Our adaptive layouts for aggregated documents are inspired by traditional newspaper design. Furthermore, our system allows documents to be interactive so that readers can customize documents as they read them. Our system builds on previous work on adaptive documents, using constraint based templates to specify content-independent page designs. The new templates we describe are much more flexible in their ability to adapt to different types of content and viewing situations. This flexibility comes from allowing the individual components, or "elements," of the templates to be mixed and matched, according to the content being displayed. We demonstrate our system with two example applications: an interactive news reader for the New York Times, and an Internet news aggregator based on MSN Newsbot.
- Citation:
- Schrier, E., Dontcheva, M., Jacobs, C., Wade, G., and Salesin, D. Adaptive Layout for Dynamically Aggregated Documents. IUI 2008.
- On-line documents:
- Complete article (PDF, 4MB)
Creating Map-based Storyboards for Browsing Tour Videos
- Abstract:
- Watching a long unedited video is usually a boring experience. In this paper we examine a particular subset of videos, tour videos, in which the video is captured by walking about with a running camera with the goal of conveying the essence of some place. We present a system that makes the process of sharing and watching a long tour video easier, less boring, and more informative. To achieve this, we augment the tour video with a map-based storyboard, where the tour path is reconstructed, and coherent shots at different locations are directly visualized on the map. This allows the viewer to navigate the video in the joint location-time space. To create such a storyboard we employ an automatic pre-processing component to parse the video into coherent shots, and an authoring tool to enable the user to tie the shots with landmarks on the map. The browser-based viewing tool allows users to navigate the video in a variety of creative modes with a rich set of controls, giving each viewer a unique, personal viewing experience. Informal evaluation shows that our approach works well for tour videos compared with conventional media players.
- Citation:
- Pongnumkul, S., Wang, J., and Cohen, M. Creating Map-based Storyboards for Browsing Tour Videos. UIST 2008.
- On-line documents:
- Complete article (PDF, 2MB)
A Salient-Point Signature for 3D Object Retrieval
- Abstract:
- In this paper we describe a new 3D object signature and evaluate its performance for 3D object retrieval. The signature is based on a learning approach that finds the characteristics of salient points on a 3D object and represents the points in a 2D spatial map based on a longitude-latitude transformation. Experimental results show that the signature is able to achieve good retrieval scores for both pose-normalized and randomly-rotated object queries.
- Citation:
- Atmosukarto, I. and Shapiro, L. G. A Salient-Point Signature for 3D Object Retrieval. ACM Multimedia Information Retrieval (MIR), October 2008.
- On-line documents:
- Complete article (PDF, 2MB)
Fourier Analysis of the 2D Screened Poisson Equation for Gradient Domain Problems
- Abstract:
- We analyze the problem of reconstructing a 2D function that approximates a set of desired gradients and a data term. The combined data and gradient terms enable operations like modifying the gradients of an image while staying close to the original image. Starting with a variational formulation, we arrive at the "screened Poisson equation" known in physics. Analysis of this equation in the Fourier domain leads to a direct, exact, and efficient solution to the problem. Further analysis reveals the structure of the spatial filters that solve the 2D screened Poisson equation and shows gradient scaling to be a well-defined sharpen filter that generalizes Laplacian sharpening, which itself can be mapped to gradient domain filtering. Results using a DCT-based screened Poisson solver are demonstrated on several applications including image blending for panoramas, image sharpening, and de-blocking of compressed images.
- Citation:
- Bhat, P., Curless, B., Cohen, M., and Zitnick, C. L. Fourier Analysis of the 2D Screened Poisson Equation for Gradient Domain Problems. ECCV 2008.
- On-line documents:
- Complete article (PDF, 1MB)
Project Site
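The variational formulation mentioned in the screened Poisson abstract can be written out as a short worked derivation. The notation here is an assumption chosen for illustration and may differ from the paper's: $u$ is the input image (data term), $\mathbf{g} = (g_x, g_y)$ the desired gradient field, $\lambda_d \ge 0$ the data weight, and capital letters denote Fourier transforms with frequencies $(\omega_x, \omega_y)$.

```latex
% Energy: stay close to u while matching the desired gradients g.
E(f) = \iint \lambda_d \,(f - u)^2 + \lVert \nabla f - \mathbf{g} \rVert^2 \; dx\, dy

% Euler-Lagrange equation: the screened Poisson equation.
\lambda_d f - \nabla^2 f = \lambda_d u - \nabla \cdot \mathbf{g}

% In the Fourier domain, differentiation becomes multiplication by i\omega,
% so the equation is solved independently at each frequency.
F(\omega_x, \omega_y) =
  \frac{\lambda_d\, U - i\,(\omega_x G_x + \omega_y G_y)}
       {\lambda_d + \omega_x^2 + \omega_y^2}
```

For $\lambda_d > 0$ the denominator never vanishes, which is why the solution is direct and needs no iterative solver.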
Scene Segmentation Using the Wisdom of Crowds
- Abstract:
- Given a collection of images of a static scene taken by many different people, we identify and segment interesting objects. To solve this problem, we use the distribution of images in the collection along with a new field-of-view cue, which leverages the observation that people tend to take photos that frame an object of interest within the field of view. Hence, image features that appear together in many images are likely to be part of the same object. We evaluate the effectiveness of this cue by comparing the segmentations computed by our method against hand-labeled ones for several different models. We also show how the results of our segmentations can be used to highlight important objects in the scene and label them using noisy user-specified textual tag data. These methods are demonstrated on photos of several popular tourist sites downloaded from the Internet.
- Citation:
- Simon, I. and Seitz, S. M. Scene Segmentation Using the Wisdom of Crowds. ECCV 2008.
- On-line documents:
- Complete article (PDF, 16MB)
Fast Algorithms for L_infty Problems in Multiview Geometry
- Abstract:
- Many problems in multi-view geometry, when posed as minimization of the maximum reprojection error across observations, can be solved optimally in polynomial time. We show that these problems are instances of a convex-concave generalized fractional program. We survey the major solution methods for solving problems of this form and present them in a unified framework centered around a single parametric optimization problem. We propose two new algorithms and show that the algorithm proposed by Olsson et al. [21] is a special case of a classical algorithm for generalized fractional programming. The performance of all the algorithms is compared on a variety of datasets, and the algorithm proposed by Gugat [12] stands out as a clear winner. An open source MATLAB toolbox that implements all the algorithms presented here is made available.
- Citation:
- Agarwal, S., Snavely, N., and Seitz, S. M. Fast Algorithms for L_infty Problems in Multiview Geometry. CVPR 2008.
- On-line documents:
- Complete article (PDF, 1MB)
Video Object Annotation, Navigation, and Composition
- Abstract:
- We explore the use of tracked 2D object motion to enable novel approaches to interacting with video. These include moving annotations, video navigation by direct manipulation of objects, and creating an image composite from multiple video frames. Features in the video are automatically tracked and grouped in an off-line preprocess that enables later interactive manipulation. Examples of annotations include speech and thought balloons, video graffiti, path arrows, video hyperlinks, and schematic storyboards. We also demonstrate a direct-manipulation interface for random frame access using spatial constraints, and a drag-and-drop interface for assembling still images from videos. Taken together, our tools can be employed in a variety of applications including film and video editing, visual tagging, and authoring rich media such as hyperlinked video.
- Citation:
- Goldman, D. B., Gonterman, C., Curless, B., Salesin, D., and Seitz, S. M. Video Object Annotation, Navigation, and Composition. UIST 2008.
- On-line documents:
- Complete article (PDF, 7MB)
Project Site
In Defense of Nearest-Neighbor Based Image Classification
- Abstract:
- State-of-the-art image classification methods require an intensive learning/training stage (using SVM, Boosting, etc.). In contrast, non-parametric Nearest-Neighbor (NN) based image classifiers require no training time and have other favorable properties. However, the large performance gap between these two families of approaches rendered NN-based image classifiers useless.
We claim that the effectiveness of non-parametric NN-based image classification has been considerably undervalued. We argue that two practices commonly used in image classification methods have led to the inferior performance of NN-based image classifiers: (i) Quantization of local image descriptors (used to generate "bags-of-words" codebooks). (ii) Computation of 'Image-to-Image' distance, instead of 'Image-to-Class' distance.
We propose a trivial NN-based classifier, NBNN (Naive-Bayes Nearest-Neighbor), which employs NN-distances in the space of the local image descriptors (and not in the space of images). NBNN computes direct 'Image-to-Class' distances without descriptor quantization. We further show that under the Naive-Bayes assumption, the theoretically optimal image classifier can be accurately approximated by NBNN.
Although NBNN is extremely simple, efficient, and requires no learning/training phase, its performance ranks among the top leading learning-based image classifiers. Empirical comparisons are shown on several challenging databases (Caltech-101, Caltech-256 and Graz-01).
- Citation:
- Boiman, O., Shechtman, E., and Irani, M. In Defense of Nearest-Neighbor Based Image Classification. CVPR 2008.
- On-line documents:
- Complete article (PDF, 1MB)
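The 'Image-to-Class' idea in the NBNN abstract can be sketched in a few lines. This is an illustrative sketch only: it assumes descriptors are plain numeric tuples, uses squared Euclidean distance, and searches neighbours by brute force. The function name `nbnn_classify` and the data layout are hypothetical.

```python
def nbnn_classify(query_descriptors, class_descriptors):
    """Pick the class with the smallest summed nearest-neighbour distance.

    query_descriptors: list of descriptor tuples from the query image.
    class_descriptors: dict mapping a label to the pooled, unquantized
                       descriptors of all training images of that class.
    """
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    totals = {}
    for label, pool in class_descriptors.items():
        # Image-to-Class distance: each query descriptor is matched to its
        # nearest descriptor anywhere in the class, not in a single image.
        totals[label] = sum(
            min(sq_dist(d, p) for p in pool) for d in query_descriptors
        )
    return min(totals, key=totals.get)
```

For example, with `{"a": [(0, 0), (1, 0)], "b": [(5, 5), (6, 5)]}` as the class pools, a query whose descriptors lie near the origin is assigned to class `"a"`. There is no training step; adding a class only means adding its descriptors to the dictionary.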
Summarizing Visual Data Using Bidirectional Similarity
- Abstract:
- We propose a principled approach to summarization of visual data (images or video) based on optimization of a well-defined similarity measure. The problem we consider is re-targeting (or summarization) of image/video data into smaller sizes. A good "visual summary" should satisfy two properties: (1) it should contain as much as possible visual information from the input data; (2) it should introduce as few as possible new visual artifacts that were not in the input data (i.e., preserve visual coherence). We propose a bi-directional similarity measure which quantitatively captures these two requirements: Two signals S and T are considered visually similar if all patches of S (at multiple scales) are contained in T, and vice versa.
The problem of summarization/re-targeting is posed as an optimization problem of this bi-directional similarity measure. We show summarization results for image and video data. We further show that the same approach can be used to address a variety of other problems, including automatic cropping, completion and synthesis of visual data, image collage, object removal, photo reshuffling and more.
- Citation:
- Simakov, D., Caspi, Y., Shechtman, E., and Irani, M. Summarizing Visual Data Using Bidirectional Similarity. CVPR 2008.
- On-line documents:
- Complete article (PDF, 2.5MB)
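The bi-directional similarity measure described in this abstract has two terms, one per direction. The sketch below is a simplified illustration, not the authors' method: it assumes 1D signals, a single patch size rather than multiple scales, and a sum-of-squared-differences patch distance. All function names are hypothetical.

```python
def patches(signal, size):
    # All contiguous patches of the given size.
    return [tuple(signal[i:i + size]) for i in range(len(signal) - size + 1)]

def ssd(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def directed(src, dst, size):
    # Average distance from each patch of src to its best match in dst.
    src_p, dst_p = patches(src, size), patches(dst, size)
    return sum(min(ssd(p, q) for q in dst_p) for p in src_p) / len(src_p)

def bidirectional_distance(s, t, size=2):
    completeness = directed(s, t, size)  # patches of S should appear in T
    coherence = directed(t, s, size)     # patches of T should come from S
    return completeness + coherence
```

A summary `t` that drops content from `s` is penalized by the completeness term, while one that introduces new artifacts is penalized by the coherence term; the distance is zero only when each signal's patches are all found in the other.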
MySong: Automatic Accompaniment Generation for Vocal Melodies
- Citation:
- Simon, I., Morris, D., and Basu, S. MySong: Automatic Accompaniment Generation for Vocal Melodies. CHI 2008.
- On-line documents:
- Complete article (PDF, 1MB)
Finding Paths through the World's Photos
- Abstract:
- When a scene is photographed many times by different people, the viewpoints often cluster along certain paths. These paths are largely specific to the scene being photographed, and follow interesting regions and viewpoints. We seek to discover a range of such paths and turn them into controls for image-based rendering. Our approach takes as input a large set of community or personal photos, reconstructs camera viewpoints, and automatically computes orbits, panoramas, canonical views, and optimal paths between views. The scene can then be interactively browsed in 3D using these controls or with six degree-of-freedom free-viewpoint control. As the user browses the scene, nearby views are continuously selected and transformed, using control-adaptive reprojection techniques.
- Citation:
- Snavely, N., Garg, R., Seitz, S. M., and Szeliski, R. Finding Paths through the World's Photos. ACM Transactions on Graphics 27(3), August 2008.
- On-line documents:
- Complete article (PDF, 12MB)
Modeling the World from Internet Photo Collections
- Abstract:
- There are billions of photographs on the Internet, comprising the largest and most diverse photo collection ever assembled. How can computer vision researchers exploit this imagery? This paper explores this question from the standpoint of 3D scene modeling and visualization. We present structure-from-motion and image-based rendering algorithms that operate on hundreds of images downloaded as a result of keyword-based image search queries like "Notre Dame" or "Trevi Fountain." This approach, which we call Photo Tourism, has enabled reconstructions of numerous well-known world sites. This paper presents these algorithms and results as a first step towards 3D modeling of the world's well-photographed sites, cities, and landscapes from Internet imagery, and discusses key open problems and challenges for the research community.
- Citation:
- Snavely, N., Seitz, S. M., and Szeliski, R. Modeling the World from Internet Photo Collections. Accepted to IJCV, 2008.
- On-line documents:
- Complete article (PDF, 2MB)
Skeletal Graphs for Efficient Structure from Motion
- Abstract:
- We address the problem of efficient structure from motion for large, unordered, highly redundant, and irregularly sampled photo collections, such as those found on Internet photo-sharing sites. Our approach computes a small skeletal subset of images, reconstructs the skeletal set, and adds the remaining images using pose estimation. Our technique drastically reduces the number of parameters that are considered, resulting in dramatic speedups, while provably approximating the covariance of the full set of parameters. To compute a skeletal image set, we first estimate the accuracy of two-frame reconstructions between pairs of overlapping images, then use a graph algorithm to select a subset of images that, when reconstructed, approximates the accuracy of the full set. A final bundle adjustment can then optionally be used to restore any loss of accuracy.
- Citation:
- Snavely, N., Seitz, S. M., and Szeliski, R. Skeletal Graphs for Efficient Structure from Motion. CVPR 2008.
- On-line documents:
- Complete article (PDF, 2MB)
Automated Generation of Interactive 3D Exploded View Diagrams
- Abstract:
- We present a system for creating and viewing interactive exploded views of complex 3D models. In our approach, a 3D input model is organized into an explosion graph that encodes how parts explode with respect to each other. We present an automatic method for computing explosion graphs that takes into account part hierarchies in the input models and handles common classes of interlocking parts. Our system also includes an interface that allows users to interactively explore our exploded views using both direct controls and higher-level interaction modes.
- Citation:
- Li, W., Agrawala, M., Curless, B., and Salesin, D. Automated Generation of Interactive 3D Exploded View Diagrams. ACM Transactions on Graphics 27(3), August 2008.
- On-line documents:
- Complete article (PDF, 4.5MB)
Project page
Practical Global Optimization for Multiview Geometry
- Abstract:
- This paper presents a practical method for finding the provably globally optimal solution to numerous problems in projective geometry including multiview triangulation, camera resectioning and homography estimation. Unlike traditional methods which may get trapped in local minima due to the non-convex nature of these problems, this approach provides a theoretical guarantee of global optimality. The formulation relies on recent developments in fractional programming and the theory of convex underestimators and allows a unified framework for minimizing the standard L2-norm of reprojection errors which is optimal under Gaussian noise as well as the more robust L1-norm which is less sensitive to outliers. Even though the worst case complexity of our algorithm is exponential, the practical efficacy is empirically demonstrated by good performance on experiments for both synthetic and real data. An open source MATLAB toolbox that implements the algorithm is also made available to facilitate further research.
- Citation:
- Practical Global Optimization for Multiview Geometry. Kahl, F., Agarwal, S., Chandraker, M., Kriegman, D. and Belongie, S. International Journal of Computer Vision, 79(3), September 2008, pages 271-284.
- On-line documents:
- Complete article (PDF)
Rectified Surface Mosaics
- Abstract:
- We approach mosaicing as a camera tracking problem within a known parameterized surface. From a video of a camera moving within a surface, we compute a mosaic representing the texture of that surface, flattened onto a planar image. Our approach works by defining a warp between images as a function of surface geometry and camera pose. Globally optimizing this warp to maximize alignment across all frames determines the camera trajectory, and the corresponding flattened mosaic image. In contrast to previous mosaicing methods which assume planar or distant scenes, or controlled camera motion, our approach enables mosaicing in cases where the camera moves unpredictably through proximal surfaces, such as in medical endoscopy applications.
- Citation:
- Rectified Surface Mosaics. Carroll, R. E. and Seitz, S. M. IEEE Computer Society Workshop on Mathematical Methods in Biomedical Image Analysis (MMBIA 2007), Rio de Janeiro, Brazil, October 2007.
- On-line documents:
- Complete article (PDF, 2.6MB)
A Probabilistic Model for Object Recognition, Segmentation, and Non-Rigid Correspondence
- Abstract:
- We describe a method for fully automatic object recognition and segmentation using a set of reference images to specify the appearance of each object. Our method uses a generative model of image formation that takes into account occlusions, simple lighting changes, and object deformations. We take advantage of local features to identify, locate, and extract multiple objects in the presence of large viewpoint changes, nonrigid motions with large numbers of degrees of freedom, occlusions, and clutter. We simultaneously compute an object-level segmentation and a dense correspondence between the pixels of the appropriate reference images and the image to be segmented.
- Citation:
- A Probabilistic Model for Object Recognition, Segmentation, and Non-Rigid Correspondence. Simon, I. and Seitz, S. M. Proceedings of CVPR 2007, Minneapolis, Minnesota, June 2007.
- On-line documents:
- Complete article (PDF, 1.6MB)
Scene Summarization for Online Image Collections
- Abstract:
- We formulate the problem of scene summarization as selecting a set of images that efficiently represents the visual content of a given scene. The ideal summary presents the most interesting and important aspects of the scene with minimal redundancy. We propose a solution to this problem using multi-user image collections from the Internet. Our solution examines the distribution of images in the collection to select a set of canonical views to form the scene summary, using clustering techniques on visual features. The summaries we compute also lend themselves naturally to the browsing of image collections, and can be augmented by analyzing user-specified image tag data. We demonstrate the approach using a collection of images of the city of Rome, showing the ability to automatically decompose the images into separate scenes, and identify canonical views for each scene.
- Citation:
- Scene Summarization for Online Image Collections. Simon, I., Snavely, N. and Seitz, S. M. Proceedings of ICCV 2007, Rio de Janeiro, Brazil, October 2007.
- On-line documents:
- Complete article (PDF, 2.4MB)
Project Page
Multi-View Stereo for Community Photo Collections
- Abstract:
- We present a multi-view stereo algorithm that addresses the extreme changes in lighting, scale, clutter, and other effects in large online community photo collections. Our idea is to intelligently choose images to match, both at a per-view and per-pixel level. We show that such adaptive view selection enables robust performance even with dramatic appearance variability. The stereo matching technique takes as input sparse 3D points reconstructed from structure-from-motion methods and iteratively grows surfaces from these points. Optimizing for surface normals within a photoconsistency measure significantly improves the matching results. While the focus of our approach is to estimate high-quality depth maps, we also show examples of merging the resulting depth maps into compelling scene reconstructions. We demonstrate our algorithm on standard multi-view stereo datasets and on casually acquired photo collections of famous scenes gathered from the Internet.
- Citation:
- Multi-View Stereo for Community Photo Collections. Goesele, M., Snavely, N., Curless, B., Hoppe, H. and Seitz, S. M. Proceedings of ICCV 2007, Rio de Janeiro, Brazil, October 2007.
- On-line documents:
- Complete article (PDF, 9.2MB)
Project Page
Globally Optimal Affine and Metric Upgrades in Stratified Autocalibration
- Abstract:
- We present a practical, stratified autocalibration algorithm with theoretical guarantees of global optimality. Given a projective reconstruction, the first stage of the algorithm upgrades it to affine by estimating the position of the plane at infinity. The plane at infinity is computed by globally minimizing a least squares formulation of the modulus constraints. In the second stage, the algorithm upgrades this affine reconstruction to a metric one by globally minimizing the infinite homography relation to compute the dual image of the absolute conic (DIAC). The positive semidefiniteness of the DIAC is explicitly enforced as part of the optimization process, rather than as a post-processing step.
For each stage, we construct and minimize tight convex relaxations of the highly non-convex objective functions in a branch and bound optimization framework. We exploit the problem structure to restrict the search space for the DIAC and the plane at infinity to a small, fixed number of branching dimensions, independent of the number of views.
Experimental evidence of the accuracy, speed and scalability of our algorithm is presented on synthetic and real data. MATLAB code for the implementation is made available to the community.
- Citation:
- Globally Optimal Affine and Metric Upgrades in Stratified Autocalibration. Chandraker, M., Agarwal, S., Kriegman, D. and Belongie, S. Proceedings of ICCV 2007, Rio de Janeiro, Brazil, October 2007.
- On-line documents:
- Complete article (PDF, 1.5MB)
Relations, Cards, and Search Templates: User-Guided Web Data Integration and Layout
- Abstract:
- We present three new interaction techniques for aiding users in collecting and organizing Web content. First, we demonstrate an interface for creating associations between websites, which facilitate the automatic retrieval of related content. Second, we present an authoring interface that allows users to quickly merge content from many different websites into a uniform and personalized representation, which we call a card. Finally, we introduce a novel search paradigm that leverages the relationships in a card to direct search queries to extract relevant content from multiple Web sources and fill a new series of cards instead of just returning a list of webpage URLs. Preliminary feedback from users is positive and validates our design.
- Citation:
- Relations, Cards, and Search Templates: User-Guided Web Data Integration and Layout. Dontcheva, M., Drucker, S. M., Salesin, D. H. and Cohen, M. F. Proceedings of UIST 2007, Newport, Rhode Island, October 2007.
- On-line documents:
- Complete article (PDF, 9.3MB)
Near-optimal Character Animation with Continuous Control
- Abstract:
- We present a new model for real-time character animation with multidimensional, interactive control. The underlying motion engine is data-driven, enables rapid transitions, and automatically enforces foot-skate constraints without inverse kinematics. On top of this motion space, our algorithm learns approximately optimal controllers which use a compact basis representation to guide the system through multidimensional state-goal spaces. These controllers enable real-time character animation that fluidly responds to changing user directives and environmental constraints.
- Citation:
- Near-optimal Character Animation with Continuous Control. Treuille, A., Lee, Y., and Popović, Z. ACM Transactions on Graphics 26(3), August 2007.
- On-line documents:
- Complete article (PDF, 0.8MB)
Project Page
Video Watercolorization using Bidirectional Texture Advection
- Abstract:
- In this paper, we present a method for creating watercolor-like animation, starting from video as input. The method involves two main steps: applying textures that simulate a watercolor appearance; and creating a simplified, abstracted version of the video to which the texturing operations are applied. Both of these steps are subject to highly visible temporal artifacts, so the primary technical contributions of the paper are extensions of previous methods for texturing and abstraction to provide temporal coherence when applied to video sequences. To maintain coherence for textures, we employ texture advection along lines of optical flow. We furthermore extend previous approaches by incorporating advection in both forward and reverse directions through the video, which allows for minimal texture distortion, particularly in areas of disocclusion that are otherwise highly problematic. To maintain coherence for abstraction, we employ mathematical morphology extended to the temporal domain, using filters whose temporal extents are locally controlled by the degree of distortions in the optical flow. Together, these techniques provide the first practical and robust approach for producing watercolor animations from video, which we demonstrate with a number of examples.
- Citation:
- Video Watercolorization using Bidirectional Texture Advection. Adrien Bousseau, Fabrice Neyret, Joëlle Thollot, David Salesin. ACM Transactions on Graphics 26(3), August 2007.
- On-line documents:
- Complete article (PDF, 5.3MB)
Project Page
Active Learning for Real-time Motion Controllers
- Abstract:
- This paper describes an approach to building real-time highly-controllable characters. A kinematic character controller is built on-the-fly during a capture session, and updated after each new motion clip is acquired. Active learning is used to identify which motion sequence the user should perform next, in order to improve the quality and responsiveness of the controller. Because motion clips are selected adaptively, we avoid the difficulty of manually determining which ones to capture, and can build complex controllers from scratch while significantly reducing the number of necessary motion samples.
- Citation:
- Active Learning for Real-time Motion Controllers. Seth Cooper, Aaron Hertzmann, Zoran Popović. ACM Transactions on Graphics 26(3), August 2007.
- On-line documents:
- Complete article (PDF, 2.4MB)
Project Page
Layered Depth Panoramas
- Abstract:
- Representations for interactive photorealistic visualization of scenes range from compact 2D panoramas to data-intensive 4D light fields. In this paper, we propose a technique for creating a layered representation from a sparse set of images taken with a hand-held camera. This representation, which we call a layered depth panorama (LDP), allows the user to experience 3D by off-axis panning. It combines the compelling experience of panoramas with limited 3D navigation. Our choice of representation is motivated by ease of capture and compactness. We formulate the problem of constructing the LDP as the recovery of color and geometry in a multi-perspective cylindrical disparity space. We leverage a graph cut approach to sequentially determine the disparity and color of each layer using multi-view stereo. Geometry visible through the cracks at depth discontinuities in a frontmost layer is determined and assigned to layers behind the frontmost layer. All layers are then used to render novel panoramic views with parallax. We demonstrate our approach on a variety of complex outdoor and indoor scenes.
- Citation:
- Layered Depth Panoramas. Ke Colin Zheng, Sing Bing Kang, Michael Cohen, Richard Szeliski. CVPR 2007, Minneapolis, Minnesota.
- On-line documents:
- Complete article (PDF, 4.4MB)
Project Page
Soft Scissors: An Interactive Tool for Realtime High Quality Matting
- Abstract:
- We present Soft Scissors, an interactive tool for extracting alpha mattes of foreground objects in realtime. We recently proposed a novel offline matting algorithm capable of extracting high-quality mattes for complex foreground objects such as furry animals [Wang and Cohen 2007]. In this paper we both improve the quality of our offline algorithm and give it the ability to incrementally update the matte in an online interactive setting. Our realtime system efficiently estimates foreground color thereby allowing both the matte and the final composite to be revealed instantly as the user roughly paints along the edge of the foreground object. In addition, our system can dynamically adjust the width and boundary conditions of the scissoring paint brush to approximately capture the boundary of the foreground object that lies ahead on the scissor's path. These advantages in both speed and accuracy create the first interactive tool for high quality image matting and compositing.
- Citation:
- Soft Scissors: An Interactive Tool for Realtime High Quality Matting. Jue Wang, Maneesh Agrawala and Michael Cohen. ACM Transactions on Graphics 26(3), August 2007.
- On-line documents:
- Complete article (PDF, 5.6MB)
Simultaneous Matting and Compositing
- Abstract:
- Recent work in matting, hole filling, and compositing allows image elements to be mixed in a new composite image. Previous algorithms for matting foreground elements have assumed that the new background for compositing is unknown. We show that, if the new background is known, the matting algorithm has more freedom to create a successful matte by simultaneously optimizing the matting and compositing operations.
We propose a new algorithm that integrates matting and compositing into a single optimization process. The system is able to compose foreground elements onto a new background more efficiently and with fewer artifacts compared with previous approaches. In our examples, we show how one can enlarge the foreground while maintaining the wide angle view of the background. We also demonstrate composing a foreground element on top of similar backgrounds to help remove unwanted portions of the background or to re-scale or re-arrange the composite. We compare and contrast our method with a number of previous matting and compositing systems.
- Citation:
- Simultaneous Matting and Compositing. Jue Wang and Michael Cohen. CVPR 2007, Minneapolis, Minnesota.
- On-line documents:
- Complete article (PDF, 6.8MB)
Optimized Color Sampling for Robust Matting
- Abstract:
- Image matting is the problem of determining, for each pixel in an image, whether it is foreground or background, and of estimating the mixing parameter, "alpha," for those pixels that are a mixture of foreground and background. Matting is inherently an ill-posed problem. Previous matting approaches either use naive color sampling methods to estimate foreground and background colors for unknown pixels, or use propagation-based methods to avoid color sampling under weak assumptions about image statistics. We argue that neither method itself is enough to generate good results for complex natural images.
We analyze the weaknesses of previous matting approaches, and propose a new robust matting algorithm. In our approach we also sample foreground and background colors for unknown pixels, but more importantly, analyze the confidence of these samples. Only high confidence samples are chosen to contribute to the matting energy function which is minimized by a Random Walk. The energy function we define also contains a neighborhood term to enforce the smoothness of the matte. To validate the approach, we present an extensive and quantitative comparison between our algorithm and a number of previous approaches in hopes of providing a benchmark for future matting research.
- Citation:
- Optimized Color Sampling for Robust Matting. Jue Wang and Michael Cohen. CVPR 2007, Minneapolis, Minnesota.
- On-line documents:
- Complete article (PDF, 3.2MB)
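As a hedged illustration of the mixing parameter mentioned in the abstract above: the standard compositing model writes each pixel color I as alpha*F + (1 - alpha)*B, where F and B are the foreground and background colors. The sketch below is not the paper's algorithm (it has no confidence analysis and no Random Walk solver); it only shows the compositing equation and a naive per-pixel alpha estimate under the assumption that F and B samples are already known. The function names `composite` and `estimate_alpha` are hypothetical.

```python
def composite(alpha, fg, bg):
    """Blend foreground and background colors: I = alpha*F + (1 - alpha)*B."""
    return tuple(alpha * f + (1.0 - alpha) * b for f, b in zip(fg, bg))


def estimate_alpha(pixel, fg, bg):
    """Least-squares alpha for one pixel, given assumed F and B samples.

    Projects (I - B) onto (F - B) and clamps the result to [0, 1].
    """
    diff = [f - b for f, b in zip(fg, bg)]
    denom = sum(d * d for d in diff)
    if denom == 0.0:
        # F equals B, so alpha is undetermined; returning 0 is an arbitrary choice.
        return 0.0
    num = sum((p - b) * d for p, b, d in zip(pixel, bg, diff))
    return min(1.0, max(0.0, num / denom))


if __name__ == "__main__":
    red, blue = (1.0, 0.0, 0.0), (0.0, 0.0, 1.0)
    mixed = composite(0.25, red, blue)
    print(mixed, estimate_alpha(mixed, red, blue))
```

The estimate is only as good as the F and B samples it is given, which is the weakness of naive color sampling that the abstract points to.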
Principal Curvature-Based Region Detector for Object Recognition
- Abstract:
- This paper presents a new structure-based interest region detector called Principal Curvature-Based Regions (PCBR) which we use for object class recognition. The PCBR interest operator detects stable watershed regions within the multi-scale principal curvature image. To detect robust watershed regions, we "clean" a principal curvature image using a combination of grayscale morphological closing and a new "eigenvector flow" hysteresis thresholding. Robustness across scales is achieved by selecting the maximal stable regions across consecutive scales. PCBR typically detects distinctive patterns distributed evenly on the objects and it shows significant robustness to local intensity perturbations and intra-class variations. We evaluate PCBR both qualitatively (through visual inspection) and quantitatively (by measuring repeatability and classification accuracy in real-world object-class recognition problems). Experiments on different benchmark datasets show that PCBR is comparable or superior to state-of-art detectors for both feature matching and object recognition problems. Moreover, we demonstrate the application of PCBR to symmetry detection.
- Citation:
- Principal Curvature-Based Region Detector for Object Recognition. Hongli Deng, Wei Zhang, Eric Mortensen, Thomas Dietterich, Linda Shapiro. CVPR 2007, Minneapolis, Minnesota.
- On-line documents:
- Complete article (PDF, 3.0MB)
Using Photographs to Enhance Videos of a Static Scene
- Abstract:
- We present a framework for automatically enhancing videos of a static scene using a few photographs of the same scene. For example, our system can transfer photographic qualities such as high resolution, high dynamic range and better lighting from the photographs to the video. Additionally, the user can quickly modify the video by editing only a few still images of the scene. Finally, our system allows a user to remove unwanted objects and camera shake from the video. These capabilities are enabled by two technical contributions presented in this paper. First, we make several improvements to a state-of-the-art multiview stereo algorithm in order to compute view-dependent depths using video, photographs, and structure-from-motion data. Second, we present a novel image-based rendering algorithm that can re-render the input video using the appearance of the photographs while preserving certain temporal dynamics such as specularities and dynamic scene lighting.
- Citation:
- Using Photographs to Enhance Videos of a Static Scene. Pravin Bhat, C. Lawrence Zitnick, Noah Snavely, Aseem Agarwala, Maneesh Agrawala, Michael Cohen, Brian Curless, Sing Bing Kang. Eurographics Symposium on Rendering 2007.
- On-line documents:
- Complete article (PDF, 20.0MB)
Project Page
Automated Insect Identification through Concatenated Histograms of Local Appearance Features
- Abstract:
- This paper describes a computer vision approach to automated rapid-throughput taxonomic identification of stonefly larvae. The long-term goal of this research is to develop a cost-effective method for environmental monitoring based on automated identification of indicator species. Recognition of stonefly larvae is challenging because they are highly articulated, they exhibit a high degree of intraspecies variation in size and color, and some species are difficult to distinguish visually, despite prominent dorsal patterning. The stoneflies are imaged via an apparatus that manipulates the specimens into the field of view of a microscope so that images are obtained under highly repeatable conditions. The images are then classified through a process that involves (a) identification of regions of interest, (b) representation of those regions as SIFT vectors [1], (c) classification of the SIFT vectors into learned "features" to form a histogram of detected features, and (d) classification of the feature histogram via state-of-the-art ensemble classification algorithms. The steps (a) to (c) compose the concatenated feature histogram (CFH) method. We apply three region detectors for part (a) above, including a newly developed principal curvature-based region (PCBR) detector. This detector finds stable regions of high curvature via a watershed segmentation algorithm. We compute a separate dictionary of learned features for each region detector, and then concatenate the histograms prior to the final classification step.
We evaluate this classification methodology on a task of discriminating among four stonefly taxa, two of which, Calineuria and Doroneuria, are difficult even for experts to discriminate. The results show that the combination of all three detectors gives four-class accuracy of 82% and three-class accuracy (pooling Calineuria and Doroneuria) of 95%. Each region detector makes a valuable contribution. In particular, our new PCBR detector is able to discriminate Calineuria and Doroneuria much better than the other detectors.
- Citation:
- Automated Insect Identification through Concatenated Histograms of Local Appearance Features: Feature Vector Generation and Region Detection for Deformable Objects. Enrique Larios, Hongli Deng, Wei Zhang, Matt Sarpola, Jenny Yuen, Robert Paasch, Andrew Moldenke, David Lytle, Salvador Ruiz Correa, Eric Mortensen, Linda Shapiro, and Tom Dietterich. In Machine Vision and Applications, 2007.
- On-line documents:
- Complete article (PDF, 0.9MB)
Interactive Cutaway Illustrations of Complex 3D Models
- Abstract:
- We present a system for authoring and viewing interactive cutaway illustrations of complex 3D models using conventions of traditional scientific and technical illustration. Our approach is based on the two key ideas that 1) cuts should respect the geometry of the parts being cut, and 2) cutaway illustrations should support interactive exploration. In our approach, an author instruments a 3D model with auxiliary parameters, which we call "rigging," that define how cutaways of that structure are formed. We provide an authoring interface that automates most of the rigging process. We also provide a viewing interface that allows viewers to explore rigged models using high-level interactions. In particular, the viewer can just select a set of target structures, and the system will automatically generate a cutaway illustration that exposes those parts. We have tested our system on a variety of CAD and anatomical models, and our results demonstrate that our approach can be used to create and view effective interactive cutaway illustrations for a variety of complex objects with little user effort.
- Citation:
- Interactive Cutaway Illustrations of Complex 3D Models. Wilmot Li, Lincoln Ritter, Maneesh Agrawala, Brian Curless, David Salesin. ACM Transactions on Graphics 26(3), August 2007.
- On-line documents:
- Complete article (PDF, 15.0MB)
Project page
A Theory of Frequency Domain Invariants: Spherical Harmonic Identities for BRDF / Lighting Transfer and Image Consistency
- Abstract:
- This paper develops a theory of frequency domain invariants in computer vision. We derive novel identities using spherical harmonics, which are the angular frequency domain analog to common spatial domain invariants such as reflectance ratios. These invariants are derived from the spherical harmonic convolution framework for reflection from a curved surface. Our identities apply in a number of canonical cases, including single and multiple images of objects under the same and different lighting conditions. One important case we consider is two different glossy objects in two different lighting environments. For this case, we derive a novel identity, independent of the specific lighting configurations or BRDFs, that allows us to directly estimate the fourth image if the other three are available. The identity can also be used as an invariant to detect tampering in the images.
While this paper is primarily theoretical, it has the potential to lay the mathematical foundations for two important practical applications. First, we can develop more general algorithms for inverse rendering problems, which can directly relight and change material properties by transferring the BRDF or lighting from another object or illumination. Second, we can check the consistency of an image, to detect tampering or image splicing.
- Citation:
- A Theory of Frequency Domain Invariants: Spherical Harmonic Identities for BRDF / Lighting Transfer and Image Consistency. Dhruv Mahajan, Ravi Ramamoorthi, and Brian Curless. To appear, IEEE Pattern Analysis and Machine Intelligence.
- On-line documents:
- Complete article (PDF, 3.0MB)
Devices That Tell On You: Privacy Trends in Consumer Ubiquitous Computing
- Abstract:
- We analyze three new consumer electronic gadgets in order to gauge the privacy and security trends in mass-market UbiComp devices. Our study of the Slingbox Pro uncovers a new information leakage vector for encrypted streaming multimedia. By exploiting properties of variable bitrate encoding schemes, we show that a passive adversary can determine with high probability the movie that a user is watching via her Slingbox, even when the Slingbox uses encryption. We experimentally evaluated our method against a database of over 100 hours of network traces for 26 distinct movies.
Despite an opportunity to provide significantly more location privacy than existing devices, like RFIDs, we find that an attacker can trivially exploit the Nike+iPod Sport Kit's design to track users; we demonstrate this with a GoogleMaps-based distributed surveillance system. We also uncover security issues with the way Microsoft Zunes manage their social relationships.
We show how these products' designers could have significantly raised the bar against some of our attacks. We also use some of our attacks to motivate fundamental security and privacy challenges for future UbiComp devices.
- Citation:
- Devices That Tell On You: Privacy Trends in Consumer Ubiquitous Computing. T. Scott Saponas, Jonathan Lester, Carl Hartung, Sameer Agarwal and Tadayoshi Kohno, to appear USENIX Security 2007.
- On-line documents:
- Complete article (PDF, 1.5MB)
Project Page
Generalized Non-metric Multidimensional Scaling
- Citation:
- Generalized Non-metric Multidimensional Scaling. Sameer Agarwal, Josh Wills, Lawrence Cayton, Gert Lanckriet, David Kriegman and Serge Belongie. AISTATS 2007, San Juan, Puerto Rico.
- On-line documents:
- Complete article (PDF, 0.9MB)
ShadowCuts: Photometric Stereo with Shadows
- Abstract:
- We present an algorithm for performing Lambertian photometric stereo in the presence of shadows. The algorithm has three novel features. First, a fast graph cuts based method is used to estimate per pixel light source visibility. Second, it allows images to be acquired with multiple illuminants, and there can be fewer images than light sources. This leads to better surface coverage and improves the reconstruction accuracy by enhancing the signal to noise ratio and the condition number of the light source matrix. The ability to use fewer images than light sources means that the imaging effort grows sublinearly with the number of light sources. Finally, the recovered shadow maps are combined with shading information to perform constrained surface normal integration. This reduces the low frequency bias inherent to the normal integration process and ensures that the recovered surface is consistent with the shadowing configuration.
The algorithm works with as few as four light sources and four images. We report results for light source visibility detection and high quality surface reconstructions for synthetic and real datasets.
- Citation:
- ShadowCuts: Photometric Stereo with Shadows. Manmohan Chandraker, Sameer Agarwal, David Kriegman. CVPR 2007, Minneapolis, Minnesota.
- On-line documents:
- Complete article (PDF, 1.6MB)
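To make the role of per-pixel light source visibility in the abstract above concrete, here is a minimal sketch of Lambertian photometric stereo for a single pixel. It assumes the visibility flags are already given, whereas the paper estimates them with graph cuts, and it performs no normal integration. Under the Lambertian model, each unshadowed measurement satisfies I_i = albedo * (l_i · n), so the scaled normal can be recovered by least squares over the visible lights only. The function names and the use of Cramer's rule on the normal equations are illustrative choices, not the authors' implementation.

```python
import math


def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def estimate_normal(lights, intensities, visible):
    """Return (albedo, unit normal) for one pixel from unshadowed lights.

    lights: list of 3-vectors (light directions), intensities: list of floats,
    visible: list of bools marking which lights are not shadowed at this pixel.
    """
    rows = [(l, i) for l, i, v in zip(lights, intensities, visible) if v]
    if len(rows) < 3:
        raise ValueError("need at least three visible lights")
    # Normal equations (A^T A) g = A^T b, where g = albedo * normal.
    ata = [[sum(l[r] * l[c] for l, _ in rows) for c in range(3)] for r in range(3)]
    atb = [sum(l[r] * i for l, i in rows) for r in range(3)]
    d = det3(ata)
    if abs(d) < 1e-12:
        raise ValueError("visible light directions are degenerate")
    g = []
    for k in range(3):
        # Cramer's rule: replace column k of A^T A with A^T b.
        m = [[atb[r] if c == k else ata[r][c] for c in range(3)] for r in range(3)]
        g.append(det3(m) / d)
    albedo = math.sqrt(sum(x * x for x in g))
    if albedo == 0.0:
        raise ValueError("zero albedo: normal is undefined")
    return albedo, tuple(x / albedo for x in g)


if __name__ == "__main__":
    lights = [(0.0, 0.0, 1.0), (0.6, 0.0, 0.8), (0.0, 0.6, 0.8), (-0.6, 0.0, 0.8)]
    # The fourth light is shadowed at this pixel, so its reading is excluded.
    print(estimate_normal(lights, [0.5, 0.4, 0.4, 0.0], [True, True, True, False]))
```

Including the shadowed reading in the fit would bias the recovered normal, which is why a visibility estimate matters before solving.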
Autocalibration via Rank-Constrained Estimation of the Absolute Quadric
- Abstract:
- We present an autocalibration algorithm for upgrading a projective reconstruction to a metric reconstruction by estimating the absolute dual quadric. The algorithm enforces the rank degeneracy and the positive semidefiniteness of the dual quadric as part of the estimation procedure, rather than as a post-processing step. Furthermore, the method allows the user, if he or she so desires, to enforce conditions on the plane at infinity so that the reconstruction satisfies the chirality constraints.
The algorithm works by constructing low degree polynomial optimization problems, which are solved to their global optimum using a series of convex linear matrix inequality relaxations. The algorithm is fast, stable, robust and has time complexity independent of the number of views. We show extensive results on synthetic as well as real datasets to validate our algorithm.
- Citation:
- Autocalibration via Rank-Constrained Estimation of the Absolute Quadric. Manmohan Chandraker, Sameer Agarwal, Fredrik Kahl, David Nistér, David Kriegman. CVPR 2007, Minneapolis, Minnesota.
- On-line documents:
- Complete article (PDF, 0.2MB)
Stylizing 2.5-D Video
- Abstract:
- In recent years considerable interest has been given to non-photorealistic rendering of photographs, video, and 3D models for illustrative or artistic purposes. Conventional 2D inputs such as photographs and video are easy to create and capture, while 3D models allow for a wider variety of stylization techniques, such as cross-hatching. In this paper, we propose using video with depth information (2.5D video) to combine the advantages of 2D and 3D input. 2.5D video is becoming increasingly easy to capture, and with the additional depth information, stylization techniques that require shape information can be applied. However, because 2.5D video contains only limited shape information and 3D correspondence over time is unknown, it is difficult to create temporally coherent stylized animations directly from raw 2.5D video. In this paper, we present techniques for processing 2.5D video to overcome these drawbacks, and demonstrate several styles that can be created using these techniques.
- Citation:
- Stylizing 2.5-D Video. Noah Snavely, C. Lawrence Zitnick, Sing Bing Kang, Michael Cohen. In Proc. Symposium on Non-Photorealistic Animation and Rendering (NPAR) 2006, pages 63-69.
- On-line documents:
- Complete article (PDF, 0.8MB)
Summarizing Personal Web Browsing Sessions
- Abstract:
- We describe a system, implemented as a browser extension, that enables users to quickly and easily collect, view, and share personal Web content. Our system employs a novel interaction model, which allows a user to specify webpage extraction patterns by interactively selecting webpage elements and applying these patterns to automatically collect similar content. Further, we present a technique for creating visual summaries of the collected information by combining user labeling with predefined layout templates. These summaries are interactive in nature: depending on the behaviors encoded in their templates, they may respond to mouse events, in addition to providing a visual summary. Finally, the summaries can be saved or sent to other users to continue the research at another place or time. Informal evaluation shows that our approach works well for popular websites, and that users can quickly learn this interaction model for collecting Web content.
- Citation:
- Mira Dontcheva, Steven Drucker, Geraldine Wade, David Salesin and Michael F. Cohen. Summarizing Personal Web Browsing Sessions. Proceedings of ACM UIST 2006.
- On-line documents:
- Complete article (PDF, 5.0MB)
Project Page
Painting With Texture
- Abstract:
- We present an interactive texture painting system that allows the user to author digital images by painting with a palette of input textures. At the core of our system is an interactive texture synthesis algorithm that generates textures with natural-looking boundary effects and alpha information as the user paints. Furthermore, we describe an intuitive layered painting model that allows strokes of texture to be merged, intersected and overlapped while maintaining the appropriate boundaries between texture regions. We demonstrate the utility and expressiveness of our system by painting several images using textures that exhibit a range of different boundary effects.
- Citation:
- Lincoln Ritter, Wilmot Li, Maneesh Agrawala, Brian Curless, David Salesin. Painting With Texture. Proceedings of the 17th Eurographics Symposium on Rendering, 2006.
- On-line documents:
- Complete article (PDF, 1.7MB)
Project Page
Learning a correlated model of identity and pose-dependent body shape variation for real-time synthesis
- Abstract:
- We present a method for learning a model of human body shape variation from a corpus of 3D range scans. Our model is the first to capture both identity-dependent and pose-dependent shape variation in a correlated fashion, enabling creation of a variety of virtual human characters with realistic and non-linear body deformations that are customized to the individual. Our learning method is robust to irregular sampling in pose-space and identity space, and also to missing surface data in the examples. Our synthesized character models are based on standard skinning techniques and can be rendered in real time.
- Citation:
- Brett Allen, Brian Curless, Zoran Popović, Aaron Hertzmann. Learning a correlated model of identity and pose-dependent body shape variation for real-time synthesis. Proceedings of the ACM SIGGRAPH / Eurographics Symposium on Computer Animation, 2006, pp. 147-156.
- On-line documents:
- Complete article (PDF, 1.9MB)
Project Page
Gaze-Based Interaction for Semi-Automatic Photo Cropping
- Abstract:
- We present an interactive method for cropping photographs given minimal information about the location of important content, provided by eye tracking. Cropping is formulated in a general optimization framework that facilitates adding new composition rules, as well as adapting the system to particular applications. Our system uses fixation data to identify important content and compute the best crop for any given aspect ratio or size, enabling applications such as automatic snapshot recomposition, adaptive documents, and thumbnailing. We validate our approach with studies in which users compare our crops to ones produced by hand and by a completely automatic approach. Experiments show that viewers prefer our gaze-based crops to uncropped images and fully automatic crops.
- Citation:
- Anthony Santella, Maneesh Agrawala, Doug DeCarlo, David H. Salesin, Michael F. Cohen. Gaze-Based Interaction for Semi-Automatic Photo Cropping. ACM Human Factors in Computing Systems (CHI), 2006, pp. 771-780.
- On-line documents:
- Complete article (PDF, 2.2MB)
Project Page
Photo Tourism: Exploring Photo Collections in 3D
- Abstract:
- We present a system for interactively browsing and exploring large unstructured collections of photographs of a scene using a novel 3D interface. Our system consists of an image-based modeling front end that automatically computes the viewpoint of each photograph as well as a sparse 3D model of the scene and image to model correspondences. Our photo explorer uses image-based rendering techniques to smoothly transition between photographs, while also enabling full 3D navigation and exploration of the set of images and world geometry, along with auxiliary information such as overhead maps. Our system also makes it easy to construct photo tours of scenic or historic locations, and to annotate image details, which are automatically transferred to other relevant images. We demonstrate our system on several large personal photo collections as well as images gathered from Internet photo sharing sites.
- Citation:
- Noah Snavely, Steven M. Seitz, Richard Szeliski. Photo Tourism: Exploring Photo Collections in 3D. ACM Transactions on Graphics 25(3) (ACM SIGGRAPH 2006), July 2006.
- On-line documents:
- Complete article (PDF, 1.7MB)
Project Page
The Cartoon Animation Filter
- Abstract:
- We present the "Cartoon Animation Filter," a simple filter that takes an arbitrary input motion signal and modulates it in such a way that the output motion is more "alive" or "animated." The filter adds a smoothed, inverted, and (sometimes) time shifted version of the second derivative (the acceleration) of the signal back into the original signal. Almost all parameters of the filter are automated. The user only needs to set the desired strength of the filter. The beauty of the animation filter lies in its simplicity and generality. We apply the filter to motions ranging from hand drawn trajectories, to simple animations within PowerPoint presentations, to motion captured DOF curves, to video segmentation results. Experimental results show that the filtered motion exhibits anticipation, follow-through, exaggeration and squash-and-stretch effects which are not present in the original input motion data.
- Citation:
- Jue Wang, Steven M. Drucker, Maneesh Agrawala, Michael F. Cohen. The Cartoon Animation Filter. ACM Transactions on Graphics 25(3) (ACM SIGGRAPH 2006), July 2006.
- On-line documents:
- Complete article (PDF, 0.6MB)
Project Page
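The filter described in the abstract above (subtracting a smoothed second derivative from the signal) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the Gaussian smoothing kernel, the function names, and the `strength` and `sigma` parameters are hypothetical, and the optional time shift mentioned in the abstract is omitted.

```python
import math


def gaussian_kernel(sigma):
    """Normalized Gaussian weights covering roughly +/- 3 sigma (assumed kernel)."""
    radius = max(1, int(math.ceil(3 * sigma)))
    weights = [math.exp(-0.5 * (k / sigma) ** 2) for k in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def second_difference(signal):
    """Discrete second derivative (acceleration); the two endpoints are left at zero."""
    out = [0.0] * len(signal)
    for i in range(1, len(signal) - 1):
        out[i] = signal[i - 1] - 2.0 * signal[i] + signal[i + 1]
    return out


def smooth(signal, sigma):
    """Convolve with the Gaussian kernel, clamping indices at the borders."""
    kernel = gaussian_kernel(sigma)
    radius = len(kernel) // 2
    n = len(signal)
    out = []
    for i in range(n):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = min(max(i + k - radius, 0), n - 1)
            acc += w * signal[j]
        out.append(acc)
    return out


def cartoon_filter(signal, strength=1.0, sigma=2.0):
    """Add the inverted, smoothed acceleration back into the original signal."""
    accel = smooth(second_difference(signal), sigma)
    return [x - strength * a for x, a in zip(signal, accel)]


# A step from 0 to 1: the filtered motion dips below 0 just before the step
# (anticipation) and rises above 1 just after it (follow-through).
step = [0.0] * 20 + [1.0] * 20
filtered = cartoon_filter(step)
```

On the step input, the only user-facing control in this sketch is `strength`, mirroring the abstract's statement that the user sets only the desired strength of the filter.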
Composition of Complex Optimal Multi-Character Motions
- Abstract:
- This paper presents a physics-based method for creating complex multi-character motions from short single-character sequences. We represent multi-character motion synthesis as a spacetime optimization problem where constraints represent the desired character interactions. We extend standard spacetime optimization with a novel timewarp parameterization in order to jointly optimize the motion and the interaction constraints. In addition, we present an optimization algorithm based on block coordinate descent and continuations that can be used to solve the large problems that multiple characters usually generate. This framework allows us to synthesize multi-character motion drastically different from the input motion. Consequently, a small set of input motions is sufficient to express a wide variety of multi-character motions.
- Citation:
- C. Karen Liu, Aaron Hertzmann, Zoran Popović. Composition of Complex Optimal Multi-Character Motions. ACM SIGGRAPH / Eurographics Symposium on Computer Animation, 2006.
- On-line documents:
- Complete article (PDF, 3.5MB)
Project Page
Photographing Long Scenes with Multi-Viewpoint Panoramas
- Abstract:
- We present a system for producing multi-viewpoint panoramas of long, roughly planar scenes, such as the facades of buildings along a city street, from a relatively sparse set of photographs captured with a handheld still camera that is moved along the scene. Our work is a significant departure from previous methods for creating multi-viewpoint panoramas, which composite thin vertical strips from a video sequence captured by a translating video camera, in that the resulting panoramas are composed of relatively large regions of ordinary perspective. In our system, the only user input required beyond capturing the photographs themselves is to identify the dominant plane of the photographed scene; our system then computes a panorama automatically using Markov Random Field optimization. Users may exert additional control over the appearance of the result by drawing rough strokes that indicate various high-level goals. We demonstrate the results of our system on several scenes, including urban streets, a river bank, and a grocery store aisle.
- Citation:
- Aseem Agarwala, Maneesh Agrawala, Michael F. Cohen, David H. Salesin, Richard Szeliski. Photographing Long Scenes with Multi-Viewpoint Panoramas. ACM Transactions on Graphics 25(3) (ACM SIGGRAPH 2006), July 2006.
- On-line documents:
- Complete article (PDF, 4.3MB)
Project Page
Volumetric Density Capture From a Single Image
- Abstract:
- We propose a new approach to capture the volumetric density of scattering media instantaneously with a single image. The volume is probed with a set of laser lines and the scattered intensity is recorded by a conventional camera. We then determine the density along the laser lines taking the scattering properties of the media into account. A specialized interpolation technique reconstructs the full density field in the volume. We apply the technique to capture the volumetric density of participating media such as smoke.
- Citation:
- Christian Fuchs, Tongbo Chen, Michael Goesele, Holger Theisel, Hans-Peter Seidel. Volumetric Density Capture From a Single Image, Proceedings of the International Workshop on Volume Graphics 2006, July 2006.
- On-line documents:
- Complete article (PDF, 3.5MB)
Model Reduction for Real-time Fluids
- Abstract:
- We present a new model reduction approach to fluid simulation, enabling large, real-time, detailed flows with continuous user interaction. Our reduced model can also handle moving obstacles immersed in the flow. We create separate models for the velocity field and for each moving boundary, and show that the coupling forces may be reduced as well. Our results indicate that surprisingly few basis functions are needed to resolve small but visually important features such as spinning vortices.
- Citation:
- Adrien Treuille, Andrew Lewis, Zoran Popović. Model Reduction for Real-time Fluids, ACM Transactions on Graphics 25(3) (SIGGRAPH 2006), July 2006.
- On-line documents:
- Complete article (PDF, 3MB)
Project Page
Continuum Crowds
- Abstract:
- We present a real-time crowd model based on continuum dynamics. In our model, a dynamic potential field simultaneously integrates global navigation with moving obstacles such as other people, efficiently solving for the motion of large crowds without the need for explicit collision avoidance. Simulations created with our system run at interactive rates, demonstrate smooth flow under a variety of conditions, and naturally exhibit emergent phenomena that have been observed in real crowds.
- Citation:
- Adrien Treuille, Seth Cooper, Zoran Popović. Continuum Crowds, ACM Transactions on Graphics 25(3) (SIGGRAPH 2006), July 2006.
- On-line documents:
- Complete article (PDF, 3.4MB)
Project Page
Schematic Storyboarding for Video Visualization and Editing
- Abstract:
- We present a method for visualizing short video clips in a single static image, using the visual language of storyboards. These schematic storyboards are composed from multiple input frames and annotated using outlines, arrows, and text describing the motion in the scene. The principal advantage of this storyboard representation over standard representations of video (generally either a static thumbnail image or a playback of the video clip in its entirety) is that it requires only a moment to observe and comprehend but at the same time retains much of the detail of the source video. Our system renders a schematic storyboard layout based on a small amount of user interaction. We also demonstrate an interaction technique to scrub through time using the natural spatial dimensions of the storyboard. Potential applications include video editing, surveillance summarization, assembly instructions, composition of graphic novels, and illustration of camera technique for film studies.
- Citation:
- Dan B Goldman, Brian Curless, David H. Salesin, Steven M. Seitz. Schematic Storyboarding for Video Visualization and Editing, ACM Transactions on Graphics 25(3) (SIGGRAPH 2006), July 2006.
- On-line documents:
- Complete article (PDF, 6MB)
Project Page
Spatio-Angular Resolution Tradeoff in Integral Photography
- Abstract:
- An integral camera samples the 4D light field of a scene within a single photograph. This paper explores the fundamental tradeoff between spatial resolution and angular resolution that is inherent to integral photography. Based on our analysis we divide previous integral camera designs into two classes depending on how the 4D light field is distributed (multiplexed) over the 2D sensor. Our optical treatment is mathematically rigorous and extensible to the broader area of light field research. We argue that for many real-world scenes it is beneficial to sacrifice angular resolution for higher spatial resolution. The missing angular resolution is then interpolated using techniques from computer vision. We have developed a prototype integral camera that uses a system of lenses and prisms as an external attachment to a conventional camera. We have used this prototype to capture the light fields of a variety of scenes. We show examples of novel view synthesis and refocusing where the spatial resolution is significantly higher than is possible with previous designs.
- Citation:
- Todor Georgiev, Ke Colin Zheng, Brian Curless, David H. Salesin, Shree Nayar, Chintan Intwala. Spatio-Angular Resolution Tradeoff in Integral Photography, Proceedings of Eurographics Symposium on Rendering, 2006.
- On-line documents:
- Complete article (PDF, 0.6MB)
Project Page
Multi-View Stereo Revisited
- Abstract:
- We present an extremely simple yet robust multi-view stereo algorithm and analyze its properties. The algorithm first computes individual depth maps using a window-based voting approach that returns only good matches. The depth maps are then merged into a single mesh using a straightforward volumetric approach. We show results for several datasets, showing accuracy comparable to the best of the current state of the art techniques and rivaling more complex algorithms.
- Citation:
- Michael Goesele, Steven M. Seitz and Brian Curless. Multi-View Stereo Revisited, Proceedings of CVPR 2006, New York, NY, USA, June 2006.
- On-line documents:
- Complete article (PDF, 5.3MB)
Mesostructure from Specularity
- Abstract:
- We describe a simple and robust method for surface mesostructure acquisition. Our method builds on the observation that specular reflection is a reliable visual cue for surface mesostructure perception. In contrast to most photometric stereo methods, which take specularities as outliers and discard them, we propose a progressive acquisition system that captures a dense specularity field as the only information for mesostructure reconstruction. Our method can efficiently recover surfaces with fine-scale geometric details from complex real-world objects with a wide variety of reflection properties, including translucent, low albedo, and highly specular objects. We show results for a variety of objects including human skin, dried apricot, orange, jelly candy, black leather and dark chocolate.
- Citation:
- Tongbo Chen, Michael Goesele and Hans-Peter Seidel. Mesostructure from Specularity, Proceedings of CVPR 2006, New York, NY, USA, June 2006.
- On-line documents:
- Complete article (PDF, 4.0MB)
Project Page
Piecewise Image Registration in the Presence of Multiple Large Motions
- Abstract:
- We present a technique for computing a dense pixel correspondence between two images of a scene containing multiple large, rigid motions. We model each motion with either a homography (for planar objects) or a fundamental matrix. The various motions in the scene are first extracted by clustering an initial sparse set of correspondences between feature points; we then perform a multi-label graph cut optimization which assigns each pixel to an independent motion and computes its disparity with respect to that motion. We demonstrate our technique on several example scenes and compare our results with previous approaches.
- Citation:
- Pravin Bhat, Ke Colin Zheng, Noah Snavely, Aseem Agarwala, Maneesh Agrawala, Michael F. Cohen and Brian Curless. Piecewise Image Registration in the Presence of Multiple Large Motions, Proceedings of CVPR 2006, New York, NY, USA, June 2006.
- On-line documents:
- Complete article (PDF, 0.8MB)
A Comparison and Evaluation of Multi-View Stereo Reconstruction Algorithms
- Abstract:
- This paper presents a quantitative comparison of several multi-view stereo reconstruction algorithms. Until now, the lack of suitable calibrated multi-view image datasets with known ground truth (3D shape models) has prevented such direct comparisons. In this paper, we first survey multi-view stereo algorithms and compare them qualitatively using a taxonomy that differentiates their key properties. We then describe our process for acquiring and calibrating multi-view image datasets with high-accuracy ground truth and introduce our evaluation methodology. Finally, we present the results of our quantitative comparison of state-of-the-art multi-view stereo reconstruction algorithms on six benchmark datasets. The datasets, evaluation details, and instructions for submitting new models are available online at http://vision.middlebury.edu/mview.
- Citation:
- Steven M. Seitz, Brian Curless, James Diebel, Daniel Scharstein, and Richard Szeliski. A Comparison and Evaluation of Multi-View Stereo Reconstruction Algorithms, Proceedings of CVPR 2006, New York, NY, USA, June 2006.
- On-line documents:
- Complete article (PDF, 1.8MB)
Project Page
A Theory of Spherical Harmonic Identities for BRDF/Lighting Transfer and Image Consistency
- Abstract:
- We develop new mathematical results based on the spherical harmonic convolution framework for reflection from a curved surface. We derive novel identities, which are the angular frequency domain analogs to common spatial domain invariants such as reflectance ratios. They apply in a number of canonical cases, including single and multiple images of objects under the same and different lighting conditions. One important case we consider is two different glossy objects in two different lighting environments. Denote the spherical harmonic coefficients by $B^{\text{light},\text{material}}_{lm}$, where the subscripts refer to the spherical harmonic indices, and the superscripts to the lighting (1 or 2) and object or material (again 1 or 2). We derive a basic identity, $B^{1,1}_{lm} B^{2,2}_{lm} = B^{1,2}_{lm} B^{2,1}_{lm}$, independent of the specific lighting configurations or BRDFs. While this paper is primarily theoretical, it has the potential to lay the mathematical foundations for two important practical applications. First, we can develop more general algorithms for inverse rendering problems, which can directly relight and change material properties by transferring the BRDF or lighting from another object or illumination. Second, we can check the consistency of an image, to detect tampering or image splicing.
- Citation:
- Dhruv Mahajan, Ravi Ramamoorthi and Brian Curless. A Theory of Spherical Harmonic Identities for BRDF/Lighting Transfer and Image Consistency, in Proceedings of the Ninth European Conference on Computer Vision (ECCV 2006), Graz, Austria, May 2006.
- On-line documents:
- Complete article (PDF, 16.0MB)
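One way to see why the basic identity in the abstract above holds is sketched below. It assumes the usual convolution form for a radially symmetric BRDF, in which each reflected coefficient factors into a lighting term and a material term. The symbols $L^{i}_{lm}$ (lighting coefficients), $\rho^{j}_{l}$ (BRDF filter coefficients), and $\Lambda_l$ (normalization constant) are assumptions introduced for illustration and do not appear in this listing.

```latex
% Assumed factorization: lighting i, material j
B^{i,j}_{lm} = \Lambda_l \, \rho^{j}_{l} \, L^{i}_{lm}

% Both products then contain the same four factors, so they are equal
B^{1,1}_{lm} \, B^{2,2}_{lm}
  = \Lambda_l^{2} \, \rho^{1}_{l} \, \rho^{2}_{l} \, L^{1}_{lm} \, L^{2}_{lm}
  = B^{1,2}_{lm} \, B^{2,1}_{lm}
```

Because the lighting and material factors cancel symmetrically, the identity can be checked from the four sets of image coefficients without knowing either lighting environment or either BRDF.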
Audio Analogies: Creating new music from an existing performance by concatenative synthesis
- Abstract:
- This paper describes a method for creating new music by concatenative synthesis. Given a MIDI score and an audio recording of an example piece of monophonic music, our method synthesizes audio to correspond with a new MIDI score. The algorithm we use is based on concatenative synthesis, commonly used for generating speech. Two versions of our algorithm are explored, one in which individual notes from the example piece are concatenated, and one in which pairs of adjacent notes from the example piece are concatenated. We examine the range of example pieces and target scores for which each version of our algorithm yields good results. Our underlying framework remains general enough to be applicable to other problems, such as rendering a stylized version of the target score, and other types of sound analogies.
- Citation:
- Audio Analogies: Creating new music from an existing performance by concatenative synthesis. Simon, I., Basu, S., Salesin, D. H. and Agrawala, M. Proceedings of ICMC 2005, Barcelona, Spain.
- On-line documents:
- Complete article (PDF, 0.4MB)
Dance reveals symmetry especially in young men
- Abstract:
- Dance is a common part of human courtship. Is it just for fun or does it carry a hidden message? This question was tackled in a population -- Jamaican -- where dance is particularly important. One property that dance might reflect is bodily symmetry, often used in evolutionary studies to measure developmental stability and genetic quality. A study using motion capture cameras to create video images of the dancers reveals a strong link between symmetry and dancing ability. The effect is stronger for men than for women, and women rate dances by symmetrical men relatively more positively than do men. It works both ways; symmetrical men value symmetry in women dancers more highly than less symmetrical men. In Jamaica at least, it seems that dance is a factor in sexual selection and reveals important information about the dancer. Freeze-frame images on the cover (by William M. Brown) show a symmetrical male dancer in action.
- Citation:
- William M. Brown, Lee Cronk, Keith Grochow, Amy Jacobson, C. Karen Liu, Zoran Popović, Robert Trivers. Dance reveals symmetry especially in young men. Nature 438(7071), 22 Dec 2005, pp. 1148-1150.
- On-line documents:
- Complete article (PDF, 0.2MB)
Project Page
A Theory of Inverse Light Transport
- Abstract:
- In this paper we consider the problem of computing and removing interreflections in photographs of real scenes. Towards this end, we introduce the problem of inverse light transport -- given a photograph of an unknown scene, decompose it into a sum of n-bounce images, where each image records the contribution of light that bounces exactly n times before reaching the camera. We prove the existence of a set of interreflection cancelation operators that enable computing each n-bounce image by multiplying the photograph by a matrix. This matrix is derived from a set of "impulse images" obtained by probing the scene with a narrow beam of light. The operators work under unknown and arbitrary illumination, and exist for scenes that have arbitrary spatially-varying BRDFs. We derive a closed-form expression for these operators in the Lambertian case and present experiments with textured and untextured Lambertian scenes that confirm our theory's predictions.
- Citation:
- Steven M. Seitz, Yasuyuki Matsushita and Kiriakos N. Kutulakos. A Theory of Inverse Light Transport, in Proceedings of the Tenth IEEE International Conference on Computer Vision (ICCV 2005), Beijing, China, October 2005.
- On-line documents:
- Complete article (PDF, 4.0MB)
Vignette and Exposure Calibration and Compensation
- Abstract:
- We discuss calibration and removal of "vignetting" (radial falloff) and exposure (gain) variations from sequences of images. Unique solutions for vignetting, exposure and scene radiances are possible when the response curve is known. When the response curve is unknown, an exponential ambiguity prevents us from recovering these parameters uniquely. However, the vignetting and exposure variations can nonetheless be removed from the images without resolving this ambiguity. Applications include panoramic image mosaics, photometry for material reconstruction, image-based rendering, and preprocessing for correlation-based vision algorithms.
- Citation:
- Dan B Goldman and Jiun-Hung Chen. Vignette and Exposure Calibration and Compensation, in Proceedings of the Tenth IEEE International Conference on Computer Vision (ICCV 2005), Beijing, China, October 2005.
- On-line documents:
- Complete article (PDF, 6.0MB)
Shape and Spatially-Varying BRDFs From Photometric Stereo
- Abstract:
- This paper describes a photometric stereo method designed for surfaces with spatially-varying BRDFs, including surfaces with both varying diffuse and specular properties. Our method builds on the observation that most objects are composed of a small number of fundamental materials. This approach recovers not only the shape but also material BRDFs and weight maps, yielding compelling results for a wide variety of objects. We also show examples of interactive lighting and editing operations made possible by our method.
- Citation:
- Dan B Goldman, Brian Curless, Aaron Hertzmann and Steven M. Seitz. Shape and Spatially-Varying BRDFs From Photometric Stereo, in Proceedings of the Tenth IEEE International Conference on Computer Vision (ICCV 2005), Beijing, China, October 2005.
- On-line documents:
- Complete article (PDF, 6.0MB)
Parameter Estimation for MRF Stereo
- Abstract:
- This paper presents a novel approach for estimating parameters for MRF-based stereo algorithms. This approach is based on a new formulation of stereo as a maximum a posteriori (MAP) problem, in which both a disparity map and MRF parameters are estimated from the stereo pair itself. We present an iterative algorithm for the MAP estimation that alternates between estimating the parameters while fixing the disparity map and estimating the disparity map while fixing the parameters. The estimated parameters include robust truncation thresholds, for both data and neighborhood terms, as well as a regularization weight. The regularization weight can be either a constant for the whole image, or spatially-varying, depending on local intensity gradients. In the latter case, the weights for intensity gradients are also estimated. Experiments indicate that our approach, as a wrapper for existing stereo algorithms, moves a baseline belief propagation stereo algorithm up six slots in the Middlebury rankings.
- Citation:
- Li Zhang and Steven M. Seitz. Parameter Estimation for MRF Stereo, in Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), San Diego CA, June 2005.
- On-line documents:
- Complete article (PDF, 1.0MB)
Project Web Page
Interactive Video Cutout
- Abstract:
- We present an interactive system for efficiently extracting foreground objects from a video. We extend previous min-cut based image segmentation techniques to the domain of video with four new contributions. We provide a novel painting-based user interface that allows users to easily indicate the foreground object across space and time. We introduce a hierarchical mean-shift preprocess in order to minimize the number of nodes that min-cut must operate on. Within the min-cut we also define new local cost functions to augment the global costs defined in earlier work. Finally, we extend 2D alpha matting methods designed for images to work with 3D video volumes. We demonstrate that our matting approach preserves smoothness across both space and time. Our interactive video cutout system allows users to quickly extract foreground objects from video sequences for use in a variety of applications including compositing onto new backgrounds and NPR cartoon style rendering.
- Citation:
- Jue Wang, Pravin Bhat, R. Alex Colburn, Maneesh Agrawala, Michael F. Cohen. Interactive Video Cutout. ACM Transactions on Graphics 24(3), July 2005.
- On-line documents:
- Complete article (PDF, 60MB)
Animating Pictures with Stochastic Motion Textures
- Abstract:
- In this paper, we explore the problem of enhancing still pictures with subtly animated motions. We limit our domain to scenes containing passive elements that respond to natural forces in some fashion. We use a semi-automatic approach, in which a human user segments the scene into a series of layers to be individually animated. Then, a "stochastic motion texture" is automatically synthesized using a spectral method, i.e., the inverse Fourier transform of a filtered noise spectrum. The motion texture is a time-varying 2D displacement map, which is applied to each layer. The resulting warped layers are then recomposited to form the animated frames. The result is a looping video texture created from a single still image, which has the advantages of being more controllable and of generally higher image quality and resolution than a video texture created from a video source. We demonstrate the technique on a variety of photographs and paintings.
- Citation:
- Yung-Yu Chuang, Dan B Goldman, Ke Colin Zheng, Brian Curless, David H. Salesin, Richard Szeliski. Animating Pictures with Stochastic Motion Textures. ACM Transactions on Graphics 24(3), July 2005.
- On-line documents:
- Complete article (PDF, 1.3MB)
Project web page
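The spectral method named in the abstract above (the inverse Fourier transform of a filtered noise spectrum) can be illustrated with a one-dimensional sketch. This is a simplified example under stated assumptions, not the authors' method: the Gaussian low-pass gain, the function name, and the `cutoff` and `seed` parameters are hypothetical, and the paper's motion textures are time-varying 2D displacement maps rather than a single scalar track.

```python
import cmath
import math
import random


def filtered_noise_signal(n, cutoff, seed=0):
    """Return n real samples of a looping, zero-mean displacement signal.

    Random complex noise is shaped by an assumed low-pass gain in the
    frequency domain, then transformed back with a naive inverse DFT.
    """
    rng = random.Random(seed)
    spectrum = [0j] * n  # the DC term stays zero, so the signal has zero mean
    for k in range(1, (n + 1) // 2):
        gain = math.exp(-((k / cutoff) ** 2))  # hypothetical spectral filter
        coeff = gain * complex(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0))
        spectrum[k] = coeff
        spectrum[n - k] = coeff.conjugate()  # Hermitian symmetry gives a real signal
    signal = []
    for t in range(n):
        acc = 0j
        for k in range(n):
            acc += spectrum[k] * cmath.exp(2j * math.pi * k * t / n)
        signal.append(acc.real / n)
    return signal


# One period of a smooth pseudo-random offset; because the inverse DFT is
# periodic in n, the last sample flows back into the first when looped.
displacement = filtered_noise_signal(64, cutoff=4.0)
```

Generating the motion in the frequency domain is what makes the result loop seamlessly, which matches the abstract's description of the output as a looping video texture.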
Learning Physics-based Motion Style with Nonlinear Inverse Optimization
- Abstract:
- This paper presents a novel physics-based representation of realistic character motion. The dynamical model incorporates several factors of locomotion derived from the biomechanical literature, including relative preferences for using some muscles more than others, elastic mechanisms at joints due to the mechanical properties of tendons, ligaments, and muscles, and variable stiffness at joints depending on the task. When used in a spacetime optimization framework, the parameters of this model define a wide range of styles of natural human movement.
Due to the complexity of biological motion, these style parameters are too difficult to design by hand. To address this, we introduce Nonlinear Inverse Optimization, a novel algorithm for estimating optimization parameters from motion capture data. Our method can extract the physical parameters from a single short motion sequence. Once captured, this representation of style is extremely flexible: motions can be generated in the same style but performing different tasks, and styles may be edited to change the physical properties of the body.
- Citation:
- C. Karen Liu, Aaron Hertzmann, Zoran Popović. ACM Transactions on Graphics 24(3), July 2005.
- On-line documents:
- Complete article (PDF, 954KB)
Project web page
Panoramic Video Textures
- Abstract:
- This paper describes a mostly automatic method for taking the output of a single panning video camera and creating a panoramic video texture (PVT): a video that has been stitched into a single, wide field of view and that appears to play continuously and indefinitely. The key problem in creating a PVT is that although only a portion of the scene has been imaged at any given time, the output must simultaneously portray motion throughout the scene. Like previous work in video textures, our method employs min-cut optimization to select fragments of video that can be stitched together both spatially and temporally. However, it differs from earlier work in that the optimization must take place over a much larger set of data. Thus, to create PVTs, we introduce a dynamic programming step, followed by a novel hierarchical min-cut optimization algorithm. We also use gradient-domain compositing to further smooth boundaries between video fragments. We demonstrate our results with an interactive viewer in which users can interactively pan and zoom on high-resolution PVTs.
- Citation:
- Aseem Agarwala, Ke Colin Zheng, Chris Pal, Maneesh Agrawala, Michael Cohen, Brian Curless, David H. Salesin, Richard Szeliski. ACM Transactions on Graphics 24(3), July 2005.
- On-line documents:
- Complete article (PDF, 954KB)
Project web page
Physically Based Rigging for Deformable Characters
- Abstract:
- In this paper we introduce a framework for instrumenting ("rigging") characters that are modeled as dynamic elastic bodies, so that their shapes can be controlled by an animator. Because the shape of such a character is determined by physical dynamics, the rigging system cannot simply dictate the shape as in traditional animation. For this reason, we introduce forces as the building blocks of rigging. Rigging forces guide the shape of the character, but are combined with other forces during simulation. Forces have other desirable features: they can be combined easily and simulated at any resolution, and since they are not tightly coupled with the surface geometry, they can be more easily transferred from one model to another. Our framework includes a new pose-dependent linearization scheme for elastic dynamics, which ensures a correspondence between forces and deformations, and at the same time produces plausible results at interactive speeds. We also introduce a novel method of handling collisions around creases.
- Citation:
- Steve Capell, Matthew Burkhart, Brian Curless, Tom Duchamp, and Zoran Popović. Proceedings of ACM SIGGRAPH / Eurographics Symposium on Computer Animation, 2005.
Extended version: Steve Capell, Matthew Burkhart, Brian Curless, Tom Duchamp, and Zoran Popović. Graphical Models, vol. 69, p. 71-87, 2007.
- On-line documents:
- Complete article (PDF, 4MB)
Project web page
Exploring the space of human body shapes: data-driven synthesis under anthropometric control
- Abstract:
- In this paper, we demonstrate a system for synthesizing high-resolution, realistic 3D human body shapes according to user-specified anthropometric parameters. We begin with a corpus of whole-body 3D laser range scans of 250 different people. For each scan, we warp a common template mesh to fit each scanned shape, thereby creating a one-to-one vertex correspondence between each of the example body shapes. Once we have a common surface representation for each example, we then use principal component analysis to reduce the data storage requirements. The final step is to relate the variation of body shape with concrete parameters, such as body circumferences, point-to-point measurements, etc. These parameters can then be used as "sliders" to synthesize new individuals with the required attributes, or to edit the attributes of scanned individuals.
- Citation:
- Brett Allen, Brian Curless, and Zoran Popović. Exploring the space of human body shapes: data-driven synthesis under anthropometric control. SAE 2004: Digital Human Modeling for Design and Engineering (DHMC), 2004.
- On-line documents:
Project Web Page
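
The pipeline in the abstract above (principal component analysis over corresponded meshes, then relating shape variation to measurements) might be sketched as follows. This is a simplified illustration that assumes a linear least-squares relation between measurements and PCA coefficients, which the abstract does not specify; `fit_shape_model` and `synthesize` are hypothetical names.

```python
import numpy as np

def fit_shape_model(shapes, measurements, n_components):
    """shapes: (n_people, 3 * n_vertices); measurements: (n_people, n_params)."""
    mean_shape = shapes.mean(axis=0)
    centered = shapes - mean_shape
    # PCA via SVD: the rows of vt are the principal directions.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    basis = vt[:n_components]
    coeffs = centered @ basis.T
    # Assumed linear map from [measurements, 1] to PCA coefficients.
    design = np.hstack([measurements, np.ones((len(measurements), 1))])
    mapping, *_ = np.linalg.lstsq(design, coeffs, rcond=None)
    return mean_shape, basis, mapping

def synthesize(mean_shape, basis, mapping, params):
    """Build a new shape vector from 'slider' values (one per measurement)."""
    coeffs = np.append(params, 1.0) @ mapping
    return mean_shape + coeffs @ basis

# Toy data: shapes that vary linearly with a single measurement.
rng = np.random.default_rng(0)
base, direction = rng.normal(size=12), rng.normal(size=12)
m = np.array([[1.0], [2.0], [3.0], [5.0]])
model = fit_shape_model(base + m * direction, m, n_components=1)
new_shape = synthesize(*model, np.array([4.0]))
```

In the toy data the shapes depend exactly linearly on the measurement, so a slider value of 4.0 reproduces `base + 4.0 * direction`; real scan data would only be approximated.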
Interactive, Image-Based Exploded View Diagrams
- Abstract:
- We present a system for creating interactive exploded view diagrams using 2D images as input. This image-based approach enables us to directly support arbitrary rendering styles, eliminates the need for building 3D models, and allows us to leverage the abundance of existing static diagrams of complex objects. We have developed a set of semi-automatic authoring tools for quickly creating layered diagrams that allow the user to specify how the parts of an object expand, collapse, and occlude one another. We also present a viewing system that lets users dynamically filter the information presented in the diagram by directly expanding and collapsing the exploded view and searching for individual parts. Our results demonstrate that a simple 2.5D diagram representation is powerful enough to enable a useful set of interactions and that, with the right authoring tools, effective interactive diagrams in this format can be created from existing static illustrations with a small amount of effort.
- Citation:
- Wilmot Li, Maneesh Agrawala, David H. Salesin. Interactive Image-Based Exploded View Diagrams, Graphics Interface 2004, May 2004.
- On-line documents:
Project Web Page
Example-Based Stereo with General BRDFs
- Abstract:
- This paper presents an algorithm for voxel-based reconstruction of objects with general reflectance properties from multiple calibrated views. It is assumed that one or more reference objects with known geometry are imaged under the same lighting and camera conditions as the object being reconstructed. The unknown object is reconstructed using a radiance basis inferred from the reference objects. Each view may have arbitrary, unknown distant lighting. If the lighting is calibrated, our model also takes into account shadows that the object casts upon itself. To our knowledge, this is the first stereo method to handle general, unknown, spatially-varying BRDFs under possibly varying, distant lighting, and shadows. We demonstrate our algorithm by recovering geometry and surface normals for objects with both uniform and spatially-varying BRDFs. The normals reveal fine-scale surface detail, allowing much richer renderings than the voxel geometry alone.
- Citation:
- Treuille, Adrien, Hertzmann, Aaron, Seitz, Steven M. Example-Based Stereo with General BRDFs, 8th European Conference on Computer Vision (ECCV 2004), Prague, Czech Republic, May 2004.
- On-line documents:
Video-Based Document Tracking: Unifying Your Physical and Electronic Desktops
- Abstract:
- This paper presents an approach for tracking paper documents on the desk over time and automatically linking them to the corresponding electronic documents using an overhead video camera. We demonstrate our system in the context of two scenarios, paper tracking and photo sorting. In the paper tracking scenario, the system tracks changes in the stacks of printed documents and books on the desk and builds a complete representation of the spatial structure of the desktop. When users want to find a printed document buried in the stacks, they can query the system based on appearance, keywords, or access time. The system also provides a remote desktop interface for directly browsing the physical desktop from a remote location. In the photo sorting scenario, users sort printed photographs into physical stacks on the desk. The system automatically recognizes the photographs and organizes the corresponding digital photographs into separate folders according to the physical arrangement. Our framework provides a way to unify the physical and electronic desktops without the need for a specialized physical infrastructure except for a video camera.
- Citation:
- Kim, Jiwon, Seitz, Steven M. and Agrawala, Maneesh. Video-Based Document Tracking: Unifying Your Physical and Electronic Desktops, UIST 2004, Santa Fe, New Mexico, USA, October 2004.
- On-line documents:
Project Page
Momentum-based Parameterization of Dynamic Character Motion
- Abstract:
- This paper presents a system for rapid editing of highly dynamic motion capture data. The heart of this system is an optimization algorithm that can transform the captured motion so that it satisfies high-level user constraints while enforcing that the linear and angular momentum of the motion remain physically plausible. Unlike most previous approaches to motion editing, our algorithm does not require pose specification or model reduction, and the user only need specify high-level changes to the input motion. To preserve the similar dynamic behavior of the input motion, we introduce a spline-based parameterization that matches the linear and angular momentum pattern of the motion capture data. Because our algorithm enables rapid convergence by presenting a good initial state of the optimization, the user can efficiently generate a large family of realistic motions from a single input motion. The algorithm can then populate the dynamic space of motions by simple interpolation, effectively parameterizing the space of realistic motions. We show how this framework can be used to produce an effective interface for rapid creation of dynamic animations, as well as to drive the dynamic motion of a character in real-time.
- Citation:
- Abe, Y., Liu, C. K., Popović, Z. Momentum-based Parameterization of Dynamic Character Motion, ACM SIGGRAPH / Eurographics Symposium on Computer Animation, August 2004.
- On-line documents:
Project page
Flow-based Video Synthesis and Editing
- Abstract:
- This paper presents a novel algorithm for synthesizing and editing video of natural phenomena that exhibit continuous flow patterns. The algorithm analyzes the motion of textured particles in the input video along user-specified flow lines, and synthesizes seamless video of arbitrary length by enforcing temporal continuity along a second set of user-specified flow lines. The algorithm is simple to implement and use. We used this technique to edit video of waterfalls, rivers, flames, and smoke.
- Citation:
- Bhat, Kiran S., Seitz, Steven M., Hodgins, Jessica K., Khosla, Pradeep K. Flow-based Video Synthesis and Editing, ACM Transactions on Graphics 23(3), July 2004.
- On-line documents:
- PDF (2.6MB)
Project page
Video Tooning
- Abstract:
- We describe a system for transforming an input video into a highly abstracted, spatio-temporally coherent cartoon animation with a range of styles. To achieve this, we treat video as a space-time volume of image data. We have developed an anisotropic kernel mean shift technique to segment the video data into contiguous volumes. These provide a simple cartoon style in themselves, but more importantly provide the capability to semi-automatically rotoscope semantically meaningful regions.
In our system, the user simply outlines objects on keyframes. A mean shift guided interpolation algorithm is then employed to create three dimensional semantic regions by interpolation between the keyframes, while maintaining smooth trajectories along the time dimension. These regions provide the basis for creating smooth two dimensional edge sheets and stroke sheets embedded within the spatio-temporal video volume. The regions, edge sheets, and stroke sheets are rendered by slicing them at particular times. A variety of styles of rendering are shown. The temporal coherence provided by the smoothed semantic regions and sheets results in a temporally consistent non-photorealistic appearance.
- Citation:
- Wang, Jue, Xu, Yingqing, Shum, Heung-Yeung, Cohen, Michael F. Video Tooning, ACM Transactions on Graphics 23(3), July 2004.
- On-line documents:
- PDF (4.0MB)
Spacetime Faces: High-Resolution Capture for Modeling and Animation
- Abstract:
- We present an end-to-end system that goes from video sequences to high resolution, editable, dynamically controllable face models. The capture system employs synchronized video cameras and structured light projectors to record videos of a moving face from multiple viewpoints. A novel spacetime stereo algorithm is introduced to compute depth maps accurately and overcome over-fitting deficiencies in prior work. A new template fitting and tracking procedure fills in missing data and yields point correspondence across the entire sequence without using markers. We demonstrate a data-driven, interactive method for inverse kinematics that draws on the large set of fitted templates and allows for posing new expressions by dragging surface points directly. Finally, we describe new tools that model the dynamics in the input sequence to enable new animations, created via key-framing or texture-synthesis techniques.
- Citation:
- Zhang, Li, Snavely, Noah, Curless, Brian, Seitz, Steven M. Spacetime Faces: High-Resolution Capture for Modeling and Animation. ACM Transactions on Graphics 23(3), July 2004.
- On-line documents:
- PDF (10.3MB)
Project page
Fluid Control using the Adjoint Method
- Abstract:
- We describe a novel method for controlling physics-based fluid simulations through gradient-based nonlinear optimization. Using a technique known as the adjoint method, derivatives can be computed efficiently, even for large 3D simulations with millions of control parameters. In addition, we introduce the first method for the full control of free-surface liquids. We show how to compute adjoint derivatives through each step of the simulation, including the fast marching algorithm, and describe a new set of control parameters specifically designed for liquids.
- Citation:
- McNamara, Antoine, Treuille, Adrien, Popović, Zoran, Stam, Jos. Fluid Control using the Adjoint Method, ACM Transactions on Graphics 23(3), July 2004.
- On-line documents:
- PDF (4.0MB)
Project page
Interactive Digital Photomontage
- Abstract:
- We describe an interactive, computer-assisted framework for combining parts of a set of photographs into a single composite picture, a process we call "digital photomontage." Our framework makes use of two techniques primarily: graph-cut optimization, to choose good seams within the constituent images so that they can be combined as seamlessly as possible; and gradient-domain fusion, a process based on Poisson equations, to further reduce any remaining visible artifacts in the composite. Also central to the framework is a suite of interactive tools that allow the user to specify a variety of high-level image objectives, either globally across the image, or locally through a painting-style interface. Image objectives are applied independently at each pixel location and generally involve a function of the pixel values (such as "maximum contrast") drawn from that same location in the set of source images. Typically, a user applies a series of image objectives iteratively in order to create a finished composite. The power of this framework lies in its generality; we show how it can be used for a wide variety of applications, including "selective composites" (for instance, group photos in which everyone looks their best), relighting, extended depth of field, panoramic stitching, clean-plate production, stroboscopic visualization of movement, and time-lapse mosaics.
- Citation:
- Agarwala, Aseem, Dontcheva, Mira, Agrawala, Maneesh, Drucker, Steven, Colburn, Alex, Curless, Brian, Salesin, David H., Cohen, Michael. Interactive Digital Photomontage, ACM Transactions on Graphics 23(3), July 2004.
- On-line documents:
- PDF (6.0MB)
Project page
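
The per-pixel image objective described in the abstract above (for example, "maximum contrast" evaluated at the same location in each source image) can be illustrated with a deliberately simplified sketch. It picks the best source independently at every pixel and omits both the graph-cut seam optimization and the gradient-domain fusion that the framework relies on. The contrast measure (absolute deviation from a local box-filtered mean) and the names `local_contrast` and `max_contrast_composite` are assumptions for illustration.

```python
import numpy as np

def local_contrast(img, radius=1):
    """Assumed contrast measure: |pixel - mean of its (2r+1)x(2r+1) window|."""
    k = 2 * radius + 1
    padded = np.pad(img, radius, mode="edge")
    acc = np.zeros_like(img, dtype=float)
    # Sum the window by accumulating shifted views of the padded image.
    for dy in range(k):
        for dx in range(k):
            acc += padded[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    return np.abs(img - acc / (k * k))

def max_contrast_composite(images):
    """Pick, at each pixel, the grayscale source image with the highest contrast."""
    stack = np.stack([np.asarray(im, dtype=float) for im in images])
    scores = np.stack([local_contrast(im) for im in stack])
    labels = np.argmax(scores, axis=0)  # index of the winning source per pixel
    composite = np.take_along_axis(stack, labels[None], axis=0)[0]
    return composite, labels

flat = np.full((4, 4), 0.5)
checker = np.indices((4, 4)).sum(axis=0) % 2
composite, labels = max_contrast_composite([flat, checker])
```

Choosing labels independently per pixel generally produces visible seams, which is the problem the seam-aware optimization in the abstract is meant to address.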
Keyframe-Based Tracking for Rotoscoping and Animation
- Abstract:
- We describe a new approach to rotoscoping --- the process of tracking contours in a video sequence --- that combines computer vision with user interaction. In order to track contours in video, the user specifies curves in two or more frames; these curves are used as keyframes by a computer-vision-based tracking algorithm. The user may interactively refine the curves and then restart the tracking algorithm. Combining computer vision with user interaction allows our system to track any sequence with significantly less effort than interpolation-based systems --- and with better reliability than pure computer vision systems. Our tracking algorithm is cast as a spacetime optimization problem that solves for time-varying curve shapes based on an input video sequence and user-specified constraints. We demonstrate our system with several rotoscoped examples. Additionally, we show how these rotoscoped contours can be used to help create cartoon animation by attaching user-drawn strokes to the tracked contours.
- Citation:
- Agarwala, Aseem, Hertzmann, Aaron, Salesin, David H., Seitz, Steven. Keyframe-Based Tracking for Rotoscoping and Animation, ACM Transactions on Graphics 23(3), July 2004.
- On-line documents:
- PDF (2.4MB)
Project page
Style-based Inverse Kinematics
- Abstract:
- We present an inverse kinematics system based on a learned model of human poses. Given a set of constraints, our system can produce the most likely pose satisfying those constraints, in realtime. Training the model on different input data leads to different styles of IK. The model is represented as a probability distribution over the space of all possible poses. This means that our IK system can generate any pose, but prefers poses that are most similar to the space of poses in the training data. We represent the probability with a novel model called a Scaled Gaussian Process Latent Variable Model. The parameters of the model are all learned automatically; no manual tuning is required for the learning component of the system. We additionally describe a novel procedure for interpolating between styles.
Our style-based IK can replace conventional IK, wherever it is used in computer animation and computer vision. We demonstrate our system in the context of a number of applications: interactive character posing, trajectory keyframing, real-time motion capture with missing markers, and posing from a 2D image.
- Citation:
- Grochow, Keith, Martin, Steven L., Hertzmann, Aaron, and Popović, Zoran. Style-based Inverse Kinematics, ACM Transactions on Graphics 23(3), July 2004.
- On-line documents:
- PDF (1.4MB)
Project page
On Creating Animated Presentations
- Abstract:
- Computers are used to display visuals for millions of live presentations each day, and yet only the tiniest fraction of these make any real use of the powerful graphics hardware available on virtually all of today's machines. In this paper, we describe our efforts toward harnessing this power to create better types of presentations: presentations that include meaningful animation as well as at least a limited degree of interactivity. Our approach has been iterative, alternating between creating animated talks using available tools, then improving the tools to better support the kinds of talk we wanted to make. Through this cyclic design process, we have identified a set of common authoring paradigms that we believe a system for building animated presentations should support. We describe these paradigms and present the latest version of our script-based system for creating animated presentations, called SLITHY. We show several examples of actual animated talks that were created and given with versions of SLITHY, including one talk presented at SIGGRAPH 2000 and four talks presented at SIGGRAPH 2002. Finally, we describe a set of design principles that we have found useful for making good use of animation in presentation.
- Citation:
- Zongker, Douglas E. and Salesin, David H. On Creating Animated Presentations, Eurographics / ACM SIGGRAPH Symposium on Computer Animation, July 2003.
- On-line documents:
- PDF (1.0MB)
Adaptive Grid-Based Document Layout
- Abstract:
- Grid-based page designs are ubiquitous in commercially printed publications, such as newspapers and magazines. Yet, to date, no one has invented a good way to easily and automatically adapt such designs to arbitrarily-sized electronic displays. The difficulty of generalizing grid-based designs explains the generally inferior nature of on-screen layouts when compared to their printed counterparts, and is arguably one of the greatest remaining impediments to creating on-line reading experiences that rival those of ink on paper. In this work, we present a new approach to adaptive grid-based document layout, which attempts to bridge this gap. In our approach, an adaptive layout style is encoded as a set of grid-based templates that know how to adapt to a range of page sizes and other viewing conditions. These templates include various types of layout elements (such as text, figures, etc.) and define, through constraint-based relationships, just how these elements are to be laid out together as a function of both the properties of the content itself, such as a figure's size and aspect ratio, and the properties of the viewing conditions under which the content is being displayed. We describe an XML-based representation for our templates and content, which maintains a clean separation between the two. We also describe the various parts of our research prototype system: a layout engine for formatting the page; a paginator for determining a globally optimal allocation of content amongst the pages; and a graphical user interface for interactively creating adaptive templates. We also provide numerous examples demonstrating the capabilities of this prototype, including this paper, itself, which has been laid out with our system.
- Citation:
- Jacobs, C., Li, W., Schrier, E., Bargeron, D., and Salesin, D. Adaptive Grid-Based Document Layout, ACM Transactions on Graphics 22(3) (Proceedings of ACM SIGGRAPH 2003), July 2003, pp. 838-847.
- On-line documents:
- PDF (8.6MB)
Shape and Motion under Varying Illumination: Unifying Structure from Motion, Photometric Stereo, and Multi-view Stereo
- Abstract:
- This paper presents an algorithm for computing optical flow, shape, motion, lighting, and albedo from an image sequence of a rigidly-moving Lambertian object under distant illumination. The problem is formulated in a manner that subsumes structure from motion, multi-view stereo, and photometric stereo as special cases. The algorithm utilizes both spatial and temporal intensity variation as cues: the former constrains flow and the latter constrains surface orientation; combining both cues enables dense reconstruction of both textured and texture-less surfaces. The algorithm works by iteratively estimating affine camera parameters, illumination, shape, and albedo in an alternating fashion. Results are demonstrated on videos of hand-held objects moving in front of a fixed light and camera.
- Citation:
- Zhang, L., Curless, B., Hertzmann, A. and Seitz, Steven M. Shape and Motion under Varying Illumination: Unifying Structure from Motion, Photometric Stereo, and Multi-view Stereo, Proceedings of the 9th IEEE International Conference on Computer Vision (ICCV), Nice, France, October 2003.
- On-line documents:
Project Web Page
A Sketching Interface for Articulated Animation
- Abstract:
- We introduce a new interface for rapidly creating 3D articulated figure animation from 2D sketches of the character in the desired key frame poses. Since the exact 3D pose corresponding to a 2D drawing is ambiguous, we first reconstruct a set of possible 3D configurations and then apply a set of constraints and assumptions to present the user with the most likely 3D pose. The user can refine this candidate pose by choosing among alternate poses proposed by the system. This interface is supported by pose reconstruction and optimization methods specifically designed to work with imprecise hand drawn figures. Our system provides a simple, intuitive and fast interface for creating rough animations that leverages our users' existing ability to draw. The resulting key framed sequence can be exported to a commercial animation package for interpolation and additional refinement.
- Citation:
- Davis, J., Agrawala, M., Chuang, E., Popović, Z. and Salesin, David H. 2003. A Sketching Interface for Articulated Animation, Eurographics / ACM SIGGRAPH Symposium on Computer Animation.
- On-line documents:
- PDF (3.5MB)
Project Web Page
Estimating Cloth Simulation Parameters from Video
- Abstract:
- Cloth simulations are notoriously difficult to tune due to the many parameters that must be adjusted to achieve the look of a particular fabric. In this paper, we present an algorithm for estimating the parameters of a cloth simulation from video data of real fabric. A perceptually motivated metric based on matching between folds is used to compare video of real cloth with simulation. This metric compares two video sequences of cloth and returns a number that measures the differences in their folds. Simulated annealing is used to minimize the frame by frame error between the metric for a given simulation and the real-world footage. To estimate all the cloth parameters, we identify simple static and dynamic calibration experiments that use small swatches of the fabric. To demonstrate the power of this approach, we use our algorithm to find the parameters for four different fabrics. We show the match between the video footage and simulated motion on the calibration experiments, on new video sequences for the swatches, and on a simulation of a full skirt.
- Citation:
- Bhat, K. S., Twigg, C. D., Hodgins, J. K., Khosla, P. K., Popović, Z. and Seitz, Steven M. 2003. Estimating Cloth Simulation Parameters from Video, Eurographics / ACM SIGGRAPH Symposium on Computer Animation.
- On-line documents:
- PDF (7.5MB)
Project Web Page
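
The abstract above uses simulated annealing to minimize an error between simulated and real cloth. A generic sketch of that optimizer follows. It is not the authors' code: the fold-based metric is replaced by a placeholder quadratic `toy_error`, and the Gaussian proposal, geometric cooling schedule, and all parameter names are assumptions for illustration.

```python
import math
import random

def simulated_annealing(error_fn, initial, step=0.1, t_start=1.0,
                        t_end=1e-3, n_iters=5000, seed=0):
    """Minimize error_fn over a list of parameters (illustrative only)."""
    rng = random.Random(seed)
    current = list(initial)
    current_err = error_fn(current)
    best, best_err = list(current), current_err
    for i in range(n_iters):
        # Assumed geometric cooling from t_start down to t_end.
        t = t_start * (t_end / t_start) ** (i / max(1, n_iters - 1))
        candidate = [p + rng.gauss(0.0, step) for p in current]
        cand_err = error_fn(candidate)
        # Always accept improvements; accept worse moves with Boltzmann probability.
        if cand_err <= current_err or rng.random() < math.exp((current_err - cand_err) / t):
            current, current_err = candidate, cand_err
            if current_err < best_err:
                best, best_err = list(current), current_err
    return best, best_err

def toy_error(params):
    """Placeholder for a video-vs-simulation error; minimum at (0.3, -0.2)."""
    return (params[0] - 0.3) ** 2 + (params[1] + 0.2) ** 2

best_params, best_err = simulated_annealing(toy_error, [0.0, 0.0])
```

In the setting the abstract describes, each call to the error function would require running a cloth simulation and comparing it against video, so the number of iterations is the dominant cost.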
Layered Acting for Character Animation
- Abstract:
- We introduce an acting-based animation system for creating and editing character animation at interactive speeds. Our system requires minimal training, typically under an hour, and is well suited for rapidly prototyping and creating expressive motion. A real-time motion-capture framework records the user's motions for simultaneous analysis and playback on a large screen. The animator's real-world, expressive motions are mapped into the character's virtual world. Visual feedback maintains a tight coupling between the animator and character. Complex motion is created by layering multiple passes of acting. We also introduce a novel motion-editing technique, which derives implicit relationships between the animator and character. The animator mimics some aspect of the character motion, and the system infers the association between features of the animator's motion and those of the character. The animator modifies the mimic by acting again, and the system maps the changes onto the character. We demonstrate our system with several examples.
- Citation:
- Dontcheva, M., Yngve, G. and Popović, Z. Layered Acting for Character Animation. ACM Transactions on Graphics (ACM SIGGRAPH 2003).
- On-line documents:
Project Web Page
Bird Flight
- Abstract:
- In this paper we describe a physics-based method for synthesis of bird flight animations. Our method computes a realistic set of wingbeats that enables a bird to follow the specified trajectory. We model the bird as an articulated skeleton with elastically deformable feathers. The bird motion is created by applying joint torques and aerodynamic forces over time in a forward dynamics simulation. We solve for each wingbeat motion separately by optimizing for wingbeat parameters that create the most natural motion. The final animation is constructed by concatenating a series of optimal wingbeats. This detailed bird flight model enables us to produce flight motions of different birds performing a variety of maneuvers including taking off, cruising, rapidly descending, turning, and landing.
- Citation:
- Wu, Jia-Chi and Zoran Popović. 2003. Realistic Modeling of Bird Flight Animations, ACM Transactions on Graphics (ACM SIGGRAPH 2003).
- On-line documents:
- PDF (3.6MB)
Project Web Page
Keyframe Control of Smoke Simulations
- Abstract:
- We describe a method for controlling smoke simulations through user-specified keyframes. To achieve the desired behavior, a continuous quasi-Newton optimization solves for appropriate "wind" forces to be applied to the underlying velocity field throughout the simulation. The cornerstone of our approach is a method to efficiently compute exact derivatives through the steps of a fluid simulation. We formulate an objective function corresponding to how well a simulation matches the user's keyframes, and use the derivatives to solve for force parameters that minimize this function. For animations with several keyframes, we present a novel multiple-shooting approach. By splitting large problems into smaller overlapping subproblems, we greatly speed up the optimization process while avoiding certain local minima.
- Citation:
- Treuille, A., McNamara, A., Popović, Z. and Stam, J. 2003. Keyframe Control of Smoke Simulations, ACM Transactions on Graphics (ACM SIGGRAPH 2003).
- On-line documents:
- PDF (0.9 MB)
Project web page
Shadow Matting and Compositing
- Abstract:
- In this paper, we describe a method for extracting shadows from one natural scene and inserting them into another. We develop physically-based shadow matting and compositing equations and use these to pull a shadow matte from a source scene in which the shadow is cast onto an arbitrary planar background. We then acquire the photometric and geometric properties of the target scene by sweeping oriented linear shadows (cast by a straight object) across it. From these shadow scans, we can construct a shadow displacement map without requiring camera or light source calibration. This map can then be used to deform the original shadow matte. We demonstrate our approach for both indoor scenes with controlled lighting and for outdoor scenes using natural lighting.
- Citation:
- CHUANG, Y.-Y., GOLDMAN, D. B, CURLESS, B., SALESIN, D. H., and SZELISKI, R. 2003. Shadow Matting and Compositing, ACM Transactions on Graphics (ACM SIGGRAPH 2003).
- On-line documents:
- PDF (1.8 MB)
- Project web page
The space of human body shapes: reconstruction and parameterization from range scans
- Abstract:
- We develop a novel method for fitting high-resolution template meshes to detailed human body range scans with sparse 3D markers. We formulate an optimization problem in which the degrees of freedom are an affine transformation at each template vertex. The objective function is a weighted combination of three measures: proximity of transformed vertices to the range data, similarity between neighboring transformations, and proximity of sparse markers at corresponding locations on the template and target surface. We solve for the transformations with a non-linear optimizer, run at two resolutions to speed convergence. We demonstrate reconstruction and consistent parameterization of 250 human body models. With this parameterized set, we explore a variety of applications for human body modeling, including: morphing, texture transfer, statistical analysis of shape, model fitting from sparse markers, feature analysis to modify multiple correlated parameters (such as the weight and height of an individual), and transfer of surface detail and animation controls from a template to fitted models.
- Citation:
- ALLEN, B., CURLESS, B., and POPOVIĆ, Z. 2003. The space of human body shapes: reconstruction and parameterization from range scans, ACM Transactions on Graphics (ACM SIGGRAPH 2003).
- On-line documents:
- Paper web page
- PDF (6.3 MB)
- Project web page
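The weighted combination of three measures described in the abstract above can be written down roughly as follows. This is a sketch only: the symbols (the per-vertex affine transformations T_i, template vertices v_i, scan surface D, marker positions m_k with corresponding template indices kappa_k, and the weights alpha, beta, gamma, w_i) are notation assumed here for illustration and are not taken from the listing.

```latex
% Assumed notation: T_i is the affine transformation at template vertex v_i,
% D is the range-scan surface, m_k are the sparse 3D markers, and
% \kappa_k is the index of the template vertex that corresponds to marker k.
\begin{align}
E   &= \alpha\, E_d + \beta\, E_s + \gamma\, E_m \\
E_d &= \sum_{i} w_i \,\operatorname{dist}^2\!\left(T_i v_i,\; D\right)
      && \text{(proximity to the range data)} \\
E_s &= \sum_{(i,j)\,\in\,\text{edges}} \left\lVert T_i - T_j \right\rVert_F^2
      && \text{(similarity of neighboring transformations)} \\
E_m &= \sum_{k} \left\lVert T_{\kappa_k} v_{\kappa_k} - m_k \right\rVert^2
      && \text{(proximity of sparse markers)}
\end{align}
```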
Shape and Materials by Example: A Photometric Stereo Approach
- Abstract:
- This paper presents a technique for computing the geometry of objects with general reflectance properties from images. For surfaces with varying material properties, a full segmentation into different material types is also computed. It is assumed that the camera viewpoint is fixed, but the illumination varies over the input sequence. It is also assumed that one or more example objects with similar materials and known geometry are imaged under the same illumination conditions. Unlike most previous work in shape reconstruction, this technique can handle objects with arbitrary and spatially-varying BRDFs. Furthermore, the approach works for arbitrary distant and unknown lighting environments. Finally, almost no calibration is needed, making the approach exceptionally simple to apply.
- Citation:
- Aaron Hertzmann, Steven M. Seitz, Proceedings of CVPR 2003.
- On-line documents:
- Project web page
Spacetime Stereo: Shape Recovery for Dynamic Scenes
- Abstract:
- This paper extends the traditional binocular stereo problem into the spacetime domain, in which a pair of video streams is matched simultaneously instead of matching pairs of images frame by frame. Almost any existing stereo algorithm may be extended in this manner simply by replacing the image matching term with a spacetime term. By utilizing both spatial and temporal appearance variation, this modification reduces ambiguity and increases accuracy. Three major applications for spacetime stereo are proposed in this paper. First, spacetime stereo serves as a general framework for structured light scanning and generates high quality depth maps for static scenes. Second, spacetime stereo is effective for a class of natural scenes, such as waving trees and flowing water, which have repetitive textures and chaotic behaviors and are challenging for existing stereo algorithms. Third, the approach is one of very few existing methods that can robustly reconstruct objects that are moving and deforming over time, achieved by use of oriented spacetime windows in the matching procedure. Promising experimental results in the above three scenarios are demonstrated.
- Citation:
- Li Zhang, Brian Curless, and Steven M. Seitz, Proceedings of CVPR 2003.
- On-line documents:
- Complete article [Acrobat pdf file]
- Project web page
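The idea of replacing the image matching term with a spacetime term, as described in the abstract above, can be illustrated with a minimal sketch. It assumes rectified grayscale videos stored as nested lists indexed `video[t][y][x]` and uses a plain axis-aligned sum-of-squared-differences window; the function names and parameters are hypothetical, and the oriented spacetime windows mentioned in the abstract are not implemented.

```python
def spacetime_ssd(left, right, x, y, t, d, half_xy=1, half_t=1):
    """Sum of squared differences over a window spanning space and time.

    left, right: rectified grayscale videos indexed as video[t][y][x].
    (x, y, t): window center in the left video; d: candidate disparity.
    The caller must keep the window inside the video bounds (negative
    Python indices would silently wrap around).
    """
    cost = 0.0
    for dt in range(-half_t, half_t + 1):
        for dy in range(-half_xy, half_xy + 1):
            for dx in range(-half_xy, half_xy + 1):
                a = left[t + dt][y + dy][x + dx]
                b = right[t + dt][y + dy][x + dx - d]
                cost += (a - b) ** 2
    return cost


def best_disparity(left, right, x, y, t, max_d, half_xy=1, half_t=1):
    """Return the disparity in 0..max_d with the lowest spacetime SSD cost."""
    return min(
        range(max_d + 1),
        key=lambda d: spacetime_ssd(left, right, x, y, t, d, half_xy, half_t),
    )
```

Setting `half_t=0` reduces the cost to ordinary frame-by-frame window matching, which makes the two approaches easy to compare on the same data.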
View-dependent refinement of multiresolution meshes with subdivision connectivity
- Abstract:
- We present a view-dependent level-of-detail algorithm for triangle meshes with subdivision connectivity. The algorithm is more suitable for textured meshes of arbitrary topology than existing progressive mesh-based schemes. It begins with a wavelet decomposition of the mesh, and, per frame, finds a partial sum of wavelets necessary for high-quality renderings from that frame's viewpoint. We present a screen-space error metric that measures both geometric and texture deviation and tends to outperform prior error metrics developed for progressive meshes. In addition, wavelets that lie outside the view frustum or in backfacing areas are eliminated. The algorithm takes advantage of frame-to-frame coherence for improved performance and supports geomorphs for smooth transitions between levels of detail.
- Citation:
- Daniel I. Azuma, Daniel N. Wood, Brian Curless, Tom Duchamp, David H. Salesin, and Werner Stuetzle, Proceedings of AFRIGRAPH 2003.
- On-line documents:
- Complete article [Acrobat pdf file]
Single View Modeling of Free-Form Scenes
- Abstract:
- This paper presents a novel approach for reconstructing free-form, texture-mapped, 3D scene models from a single painting or photograph. Given a sparse set of user-specified constraints on the local shape of the scene, a smooth 3D surface that satisfies the constraints is generated. This problem is formulated as a constrained variational optimization problem. In contrast to previous work in single view reconstruction, our technique enables high quality reconstructions of free-form curved surfaces with arbitrary reflectance properties. A key feature of the approach is a novel hierarchical transformation technique for accelerating convergence on a non-uniform, piecewise continuous grid. The technique is interactive and updates the model in real time as constraints are added, allowing fast reconstruction of photorealistic scene models. The approach is shown to yield high quality results on a large variety of images.
- Citation:
- L. Zhang, G. Dugas-Phocion, J.-S. Samson, and S. M. Seitz, Proceedings of CVPR 2001.
- L. Zhang, G. Dugas-Phocion, J.-S. Samson, and S. M. Seitz, Journal of Visualization and Computer Animation, 2002, (Invited paper).
- On-line documents:
- Project web page
Curve Analogies
- Abstract:
- This paper describes a method for learning statistical models of 2D curves, and shows how these models can be used to design line art rendering styles by example. A user can create a new style by providing an example of the style, e.g. by sketching a curve in a drawing program. Our method can then synthesize random new curves in this style, and modify existing curves to have the same style as the example. This method can incorporate position constraints on the resulting curves.
- Citation:
- Aaron Hertzmann, Nuria Oliver, Brian Curless, and Steven M. Seitz. 13th Eurographics Workshop on Rendering, Pisa, Italy, June 26-28, 2002.
- On-line documents:
- Complete article [Acrobat pdf file]
Rapid Shape Acquisition Using Color Structured Light and Multi-pass Dynamic Programming
- Abstract:
- This paper presents a color structured light technique for recovering object shape from one or more images. The technique works by projecting a pattern of stripes of alternating colors and matching the projected color transitions with observed edges in the image. The correspondence problem is solved using a novel, multi-pass dynamic programming algorithm that eliminates global smoothness assumptions and strict ordering constraints present in previous formulations. The resulting approach is suitable for generating both high-speed scans of moving objects when projecting a single stripe pattern and high-resolution scans of static scenes using a short sequence of time-shifted stripe patterns. In the latter case, spacetime analysis is used at each sensor pixel to obtain inter-frame depth localization. Results are demonstrated for a variety of complex scenes.
- Citation:
- Li Zhang, Brian Curless, and Steven M. Seitz. 1st international symposium on 3D data processing, visualization, and transmission, Padova, Italy, June 19-21, 2002.
- On-line documents:
- Complete article [Acrobat pdf file]
- Project page
Articulated Body Deformation from Range Scan Data
- Abstract:
- This paper presents an example-based method for calculating skeleton-driven body deformations. Our example data consists of range scans of a human body in a variety of poses. Using markers captured during range scanning, we construct a kinematic skeleton and identify the pose of each scan. We then construct a mutually consistent parameterization of all the scans using a posable subdivision surface template. The detail deformations are represented as displacements from this surface, and holes are filled smoothly within the displacement maps. Finally, we combine the range scans using k-nearest neighbor interpolation in pose space. We demonstrate results for a human upper body with controllable pose, kinematics, and underlying surface shape.
- Citation:
- Brett Allen, Brian Curless, and Zoran Popović. Proceedings of SIGGRAPH 2002, in Computer Graphics Proceedings, Annual Conference Series, 2002.
- On-line documents:
- Complete article [Acrobat pdf file]
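The final step in the abstract above, combining scans by k-nearest neighbor interpolation in pose space, can be sketched as follows. The inverse-distance weighting, the Euclidean pose distance, and all names here are assumptions made for illustration; the listing does not say which weighting or distance the paper uses.

```python
import math


def knn_interpolate(query_pose, examples, k=3, eps=1e-9):
    """Blend example displacement vectors from the k nearest example poses.

    examples: list of (pose, displacements) pairs, each a list of floats.
    Weights are inverse Euclidean distances in pose space (an assumption).
    """
    # Keep the k examples whose poses are closest to the query pose.
    nearest = sorted(
        ((math.dist(query_pose, pose), disp) for pose, disp in examples),
        key=lambda pair: pair[0],
    )[:k]
    # An exact pose match returns that example directly (avoids dividing by zero).
    if nearest[0][0] < eps:
        return list(nearest[0][1])
    weights = [1.0 / dist for dist, _ in nearest]
    total = sum(weights)
    n = len(nearest[0][1])
    return [
        sum(w * disp[i] for w, (_, disp) in zip(weights, nearest)) / total
        for i in range(n)
    ]
```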
Interactive Skeleton-Driven Dynamic Deformations
- Abstract:
- This paper presents a framework for the skeleton-driven animation of elastically deformable characters. A character is embedded in a coarse volumetric control lattice, which provides the structure needed to apply the finite element method. To incorporate skeletal controls, we introduce line constraints along the bones of simple skeletons. The bones are made to coincide with edges of the control lattice, which enables us to apply the constraints efficiently using algebraic methods. To accelerate computation, we associate regions of the volumetric mesh with particular bones and perform locally linearized simulations, which are blended at each time step. We define a hierarchical basis on the control lattice, so for detailed interactions the simulation can adapt the level of detail. We demonstrate the ability to animate complex models using simple skeletons and coarse volumetric meshes in a manner that simulates secondary motions at interactive rates.
- Citation:
- Steve Capell, Seth Green, Brian Curless, Tom Duchamp, and Zoran Popović. Proceedings of SIGGRAPH 2002, in Computer Graphics Proceedings, Annual Conference Series, 2002.
- On-line documents:
- Complete article (PDF, 1.7MB)
- Project web page
A Multiresolution Framework for Dynamic Deformations
- Abstract:
- We present a novel framework for the dynamic simulation of elastic deformable solids. Our approach combines classical finite element methodology with a multiresolution subdivision framework in order to produce fast, easy to use, and realistic animations. We represent deformations using a hierarchical basis constructed using volumetric subdivision. The subdivision framework provides topological flexibility and the hierarchical basis allows the simulation to add detail where it is needed. Since volumetric parameterization is difficult for complex models, we support the embedding of objects in domains that are easier to parameterize.
- Citation:
- Steve Capell, Seth Green, Brian Curless, Tom Duchamp, and Zoran Popović. Proceedings of ACM SIGGRAPH Symposium on Computer Animation, 2002.
- On-line documents:
- Complete article (PDF, 0.7MB)
- Project web page
Video Matting of Complex Scenes
- Abstract:
- This paper describes a new framework for video matting, the process of pulling a high-quality alpha matte and foreground from a video sequence. The framework builds upon techniques in natural image matting, optical flow computation, and background estimation. User interaction is comprised of garbage matte specification if background estimation is needed, and hand-drawn keyframe segmentations into "foreground," "background," and "unknown". The segmentations, called trimaps, are interpolated across the video volume using forward and backward optical flow. Competing flow estimates are combined based on information about where flow is likely to be accurate. A Bayesian matting technique uses the flowed trimaps to yield high-quality mattes of moving foreground elements with complex boundaries filmed by a moving camera. A novel technique for smoke matte extraction is also demonstrated.
- Citation:
- Yung-Yu Chuang, Aseem Agarwala, Brian Curless, David H. Salesin, and Richard Szeliski. Proceedings of SIGGRAPH 2002, in Computer Graphics Proceedings, Annual Conference Series, 2002.
- On-line documents:
- Complete article [Acrobat pdf file]
- Project web page
Synthesis of Complex Dynamic Character Motion from Simple Animations
- Abstract:
- In this paper we present a general method for rapid prototyping of realistic character motion. We solve for the natural motion from a simple animation provided by the animator. Our framework can be used to produce relatively complex realistic motion with little user effort. We describe a novel constraint detection method that automatically determines different constraints on the character by analyzing the input motion. We show that realistic motion can be achieved by enforcing a small set of linear and angular momentum constraints. This simplified approach helps us avoid the complexities of computing muscle forces. Simpler dynamic constraints also allow us to generate animations of models with greater complexity, performing more intricate motions. Finally, we show that by learning a small set of key parameters that describe a character pose we can help a non-skilled animator rapidly create realistic character motion.
- Citation:
- C. Karen Liu and Zoran Popović. Proceedings of SIGGRAPH 2002, in Computer Graphics Proceedings, Annual Conference Series, 2002.
- On-line documents:
- Complete article [Acrobat pdf file]
The Space of All Stereo Images
- Abstract:
- A theory of stereo image formation is presented that enables a complete classification of all possible stereo views, including non-perspective varieties. Towards this end, the notion of epipolar geometry is generalized to apply to multiperspective images. It is shown that any stereo pair must consist of rays lying on one of three varieties of quadric surfaces. A unified representation is developed to model all classes of stereo views, based on the concept of a quadric view. The benefits include a unified treatment of projection and triangulation operations for all stereo views. The framework is applied to derive new types of stereo image representations with unusual and useful properties. Experimental examples of these images are constructed and used to obtain 3D binocular object reconstructions.
- Citation:
- Steven M. Seitz and Jiwon Kim. Marr Prize Special Issue, IJCV 2001. First published in ICCV 2001.
- On-line documents:
- Project web page
Image Analogies
- Abstract:
- This paper describes a new framework for processing images by example, called "image analogies." The framework involves two stages: a design phase, in which a pair of images, with one image purported to be a "filtered" version of the other, is presented as training data; and an application phase, in which the learned filter is applied to some new target image in order to create an "analogous" filtered result. Image analogies are based on a simple multi-scale autoregression, inspired primarily by recent results in texture synthesis. By choosing different types of source image pairs as input, the framework supports a wide variety of "image filter" effects, including traditional image filters, such as blurring or embossing; improved texture synthesis, in which some textures are synthesized with higher quality than by previous approaches; super-resolution, in which a higher-resolution image is inferred from a low-resolution source; texture transfer, in which images are "texturized" with some arbitrary source texture; artistic filters, in which various drawing and painting styles are synthesized based on scanned real-world examples; and texture-by-numbers, in which realistic scenes, composed of a variety of textures, are created using a simple painting interface.
- Citation:
- Aaron Hertzmann, Charles E. Jacobs, Nuria Oliver, Brian Curless, and David H. Salesin. Proceedings of SIGGRAPH 2001, in Computer Graphics Proceedings, Annual Conference Series, 2001.
- On-line documents:
- Project web page
A Bayesian Approach to Digital Matting
- Abstract:
- This paper proposes a new Bayesian framework for solving the matting problem, i.e. extracting a foreground element from a background image by estimating an opacity for each pixel of the foreground element. Our approach models both the foreground and background color distributions with spatially-varying sets of Gaussians, and assumes a fractional blending of the foreground and background colors to produce the final output. It then uses a maximum-likelihood criterion to estimate the optimal opacity, foreground and background simultaneously. In addition to providing a principled approach to the matting problem, our algorithm effectively handles objects with intricate boundaries, such as hair strands and fur, and provides an improvement over existing techniques for these difficult cases.
- Citation:
- Yung-Yu Chuang, Brian Curless, David H. Salesin, and Richard Szeliski. Proceedings of CVPR 2001.
- On-line documents:
- Complete article [Acrobat pdf file]
- Project web page
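The fractional blending assumption in the abstract above corresponds to the standard compositing equation C = alpha * F + (1 - alpha) * B. The sketch below shows only that equation and the least-squares opacity for a pixel once foreground and background colors are fixed. It does not model the Gaussian color distributions or the simultaneous estimation of F and B described in the abstract, and the function names are hypothetical.

```python
def composite(alpha, fg, bg):
    """Blend a foreground and background color: C = alpha*F + (1 - alpha)*B."""
    return tuple(alpha * f + (1.0 - alpha) * b for f, b in zip(fg, bg))


def estimate_alpha(c, fg, bg):
    """Least-squares opacity for observed color c given fixed fg and bg.

    Projects (C - B) onto (F - B) and clamps the result to [0, 1].
    """
    num = sum((ci - bi) * (fi - bi) for ci, fi, bi in zip(c, fg, bg))
    den = sum((fi - bi) ** 2 for fi, bi in zip(fg, bg))
    if den == 0.0:
        # F and B coincide, so alpha is undetermined; return 0 by convention.
        return 0.0
    return min(1.0, max(0.0, num / den))
```

For example, `estimate_alpha(composite(0.25, (1, 0, 0), (0, 0, 1)), (1, 0, 0), (0, 0, 1))` recovers the opacity that was used to build the composite.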
Interactive Control of Rigid Body Simulations
- Abstract:
- Physical simulation of dynamic objects has become commonplace in computer graphics because it produces highly realistic animations. In this paradigm the animator provides few physical parameters such as the objects' initial positions and velocities, and the simulator automatically generates realistic motions. The resulting motion, however, is difficult to control because even a small adjustment of the input parameters can drastically affect the subsequent motion. Furthermore, the animator often wishes to change the end-result of the motion instead of the initial physical parameters. We describe a novel interactive technique for intuitive manipulation of rigid multi-body simulations. Using our system, the animator can select bodies at any time and simply drag them to desired locations. In response, the system computes the required physical parameters and simulates the resulting motion. Surface characteristics such as normals and elasticity coefficients can also be automatically adjusted to provide a greater range of feasible motions, if the animator so desires. Because the entire simulation editing process runs at interactive speeds, the animator can rapidly design complex physical animations that would be difficult to achieve with existing rigid body simulators.
- Citation:
- Jovan Popovic, Steven M. Seitz, Michael Erdmann, Zoran Popovic, and Andrew Witkin. Proceedings of SIGGRAPH 2000, in Computer Graphics Proceedings, Annual Conference Series, 2000.
- On-line documents:
- Complete article [Acrobat pdf file]
The Digital Michelangelo Project: 3D Scanning of Large Statues
- Abstract:
- We describe a hardware and software system for digitizing the shape and color of large fragile objects under non-laboratory conditions. Our system employs laser triangulation rangefinders, laser time-of-flight rangefinders, digital still cameras, and a suite of software for acquiring, aligning, merging, and viewing scanned data. As a demonstration of this system, we digitized 10 statues by Michelangelo, including the well-known figure of David, two building interiors, and all 1,163 extant fragments of the Forma Urbis Romae, a giant marble map of ancient Rome. Our largest single dataset is of the David - 2 billion polygons and 7,000 color images. In this paper, we discuss the challenges we faced in building this system, the solutions we employed, and the lessons we learned. We focus in particular on the unusual design of our laser triangulation scanner and on the algorithms and software we developed for handling very large scanned models.
- Citation:
- Marc Levoy, Kari Pulli, Brian Curless, Szymon Rusinkiewicz, David Koller, Lucas Pereira, Matt Ginzton, Sean Anderson, James Davis, Jeremy Ginsberg, Jonathan Shade, and Duane Fulk. Proceedings of SIGGRAPH 2000, in Computer Graphics Proceedings, Annual Conference Series, 2000.
- On-line documents:
- Project web page
Video Textures
- Abstract:
- This paper introduces a new type of medium, called a video texture, which has qualities somewhere between those of a photograph and a video. A video texture provides a continuous infinitely varying stream of images. While the individual frames of a video texture may be repeated from time to time, the video sequence as a whole is never repeated exactly. Video textures can be used in place of digital photos to infuse a static image with dynamic qualities and explicit action. We present techniques for analyzing a video clip to extract its structure, and for synthesizing a new, similar looking video of arbitrary length. We combine video textures with view morphing techniques to obtain 3D video textures. We also introduce video-based animation, in which the synthesis of video textures can be guided by a user through high-level interactive controls. Applications of video textures and their extensions include the display of dynamic scenes on web pages, the creation of dynamic backdrops for special effects and games, and the interactive control of video-based animation.
- Citation:
- Arno Schödl, Richard Szeliski, David H. Salesin, and Irfan Essa. Proceedings of SIGGRAPH 2000, in Computer Graphics Proceedings, Annual Conference Series, 2000.
- On-line documents:
- Complete article [Acrobat pdf file]
- Project web page
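One plausible reading of the analysis step in the abstract above is that a jump from frame i to frame j looks seamless when the frame that would normally follow i resembles j. The sketch below turns that idea into transition probabilities. It assumes frames are flat lists of pixel values, uses an L2 frame distance, and maps distances to probabilities with an exponential controlled by a hypothetical `sigma` parameter; none of these details are stated in the listing.

```python
import math


def frame_distance(a, b):
    """L2 distance between two frames given as flat lists of pixel values."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def transition_probabilities(frames, sigma=1.0):
    """Row i holds the probabilities of jumping from frame i to each frame j.

    A jump i -> j is favored when frame i+1 (the natural successor of i)
    is close to frame j. The last frame has no successor, so it gets no row.
    """
    n = len(frames)
    rows = []
    for i in range(n - 1):
        weights = [
            math.exp(-frame_distance(frames[i + 1], frames[j]) / sigma)
            for j in range(n)
        ]
        total = sum(weights)
        rows.append([w / total for w in weights])
    return rows
```

Sampling the next frame from each row yields a sequence that can run indefinitely while mostly following smooth transitions.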
Surface Light Fields for 3D Photography
- Abstract:
- A surface light field is a function that assigns a color to each ray originating on a surface. Surface light fields are well suited to constructing virtual images of shiny objects under complex lighting conditions. This paper presents a framework for construction, compression, interactive rendering, and rudimentary editing of surface light fields of real objects. Generalizations of vector quantization and principal component analysis are used to construct a compressed representation of an object's surface light field from photographs and range scans. A new rendering algorithm achieves interactive rendering of images from the compressed representation, incorporating view-dependent geometric level-of-detail control. The surface light field representation can also be directly edited to yield plausible surface light fields for small changes in surface geometry and reflectance properties.
- Citation:
- Daniel N. Wood, Daniel I. Azuma, Ken Aldinger, Brian Curless, Tom Duchamp, David H. Salesin, and Werner Stuetzle. Proceedings of SIGGRAPH 2000, in Computer Graphics Proceedings, Annual Conference Series, 2000.
- On-line documents:
- Complete article [Acrobat pdf file]
- Project web page
Escherization
- Abstract:
- This paper introduces and presents a solution to the "Escherization" problem: given a closed figure in the plane, find a new closed figure that is similar to the original and tiles the plane. Our solution works by using a simulated annealer to optimize over a parameterization of the "isohedral" tilings, a class of tilings that is flexible enough to encompass nearly all of Escher's own tilings, and yet simple enough to be encoded and explored by a computer. We also describe a representation for isohedral tilings that allows for highly interactive viewing and rendering. We demonstrate the use of these tools -- along with several additional techniques for adding decorations to tilings -- with a variety of original ornamental designs.
- Citation:
- Craig S. Kaplan and David H. Salesin. Proceedings of SIGGRAPH 2000, in Computer Graphics Proceedings, Annual Conference Series, 2000.
- On-line documents:
- Complete article [Acrobat pdf file]
- Project web page
Environment Matting Extensions: Towards Higher Accuracy and Real-Time Capture
- Abstract:
- Environment matting is a generalization of traditional bluescreen matting. By photographing an object in front of a sequence of structured light backdrops, a set of approximate light-transport paths through the object can be computed. The original environment matting research chose a middle ground---using a moderate number of photographs to produce results that were reasonably accurate for many objects. In this work, we extend the technique in two opposite directions: recovering a more accurate model at the expense of using additional structured light backdrops, and obtaining a simplified matte using just a single backdrop. The first extension allows for the capture of complex and subtle interactions of light with objects, while the second allows for video capture of colorless objects in motion.
- Citation:
- Yung-Yu Chuang, Douglas E. Zongker, Joel Hindorff, Brian Curless, David H. Salesin, and Richard Szeliski. Proceedings of SIGGRAPH 2000, in Computer Graphics Proceedings, Annual Conference Series, 2000.
- On-line documents:
- Complete article [Acrobat pdf file, 1,458 Kb (hi-res figures)]
- Complete article [Acrobat pdf file, 423 Kb (lo-res figures)]
- Technical Report UW-CSE-2000-05-01 [Acrobat pdf file, 1,735 Kb] (SIGGRAPH paper + appendix)
- Project web page
Example-Based Hinting of TrueType Fonts
- Abstract:
- Hinting in TrueType is a time-consuming manual process in which a typographer creates a sequence of instructions for better fitting the characters of a font to a grid of pixels. In this paper, we propose a new method for automatically hinting TrueType fonts by transferring hints of one font to another. Given a hinted source font and a target font without hints, our method matches the outlines of corresponding glyphs in each font, and then translates all of the individual hints for each glyph from the source to the target font. It also translates the control value table (CVT) entries, which are used to unify feature sizes across a font. The resulting hinted font already provides a great improvement over the unhinted version. More importantly, the translated hints, which preserve the sound, hand-designed hinting structure of the original font, provide a very good starting point for a professional typographer to complete and fine-tune, saving time and increasing productivity. We demonstrate our approach with examples of automatically hinted fonts at typical display sizes and screen resolutions. We also provide estimates of the time saved by a professional typographer in hinting new fonts using this semi-automatic approach.
- Citation:
- Douglas E. Zongker, Geraldine Wade, and David H. Salesin. Proceedings of SIGGRAPH 2000, in Computer Graphics Proceedings, Annual Conference Series, 2000.
- On-line documents:
- Complete article [Acrobat pdf file, 280 Kb (hi-res figures)]
Environment Matting and Compositing
- Abstract:
- This paper introduces a new process, environment matting, which captures not just a foreground object and its traditional opacity matte from a real-world scene, but also a description of how that object refracts and reflects light, which we call an environment matte. The foreground object can then be placed in a new environment, using environment compositing, where it will refract and reflect light from that scene. Objects captured in this way exhibit not only specular but glossy and translucent effects, as well as selective attenuation and scattering of light according to wavelength. Moreover, the environment compositing process, which can be performed largely with texture mapping operations, is fast enough to run at interactive speeds on a desktop PC. We compare our results to photos of the same objects in real scenes. Applications of this work include the relighting of objects for virtual and augmented reality, more realistic 3D clip art, and interactive lighting design.
- Citation:
- Douglas E. Zongker, Dawn M. Werner, Brian Curless, and David H. Salesin. Proceedings of SIGGRAPH 99, in Computer Graphics Proceedings, Annual Conference Series, 1999.
- On-line documents:
- Complete article [Acrobat pdf file, 1,703 Kb (hi-res figures)]
- Complete article [Acrobat pdf file, 484 Kb (lo-res figures)]
- Project web page
Interactive Arrangement of Botanical L-System Models
- Abstract:
- In this paper, we explore the problem of interactively manipulating plant models without sacrificing their botanical accuracy. The primary technical contribution of the paper is a method for interactively manipulating plant structures using an inverse-kinematics optimization technique. The branches of the plant are endowed with flexural and torsional stiffnesses, and these are used in the IK optimization. We demonstrate our approach with several examples of plant models arranged in this fashion.
- Citation:
- Joanna L. Power, A. J. Bernheim Brush, David H. Salesin, and Przemyslaw Prusinkiewicz. 1999 ACM Symposium on Interactive 3D Graphics.
- On-line documents:
- Complete article [Acrobat pdf file, 120 Kb]
- Color Plate [Acrobat pdf file, 44 Kb]
Computer-Generated Floral Ornament
- Abstract:
- This paper describes some of the principles of traditional floral ornamental design, and explores ways in which these designs can be created algorithmically. It introduces the idea of "adaptive clip art," which encapsulates the rules for creating a specific ornamental pattern. Adaptive clip art can be used to generate patterns that are tailored to fit a particularly shaped region of the plane. If the region is resized or reshaped, the ornament can be automatically regenerated to fill this new area in an appropriate way. Our ornamental patterns are created in two steps: first, the geometry of the pattern is generated as a set of two-dimensional curves and filled boundaries; second, this geometry is rendered in any number of styles. We demonstrate our approach with a variety of floral ornamental designs.
- Citation:
- Michael T. Wong, Douglas E. Zongker, and David Salesin. Proceedings of SIGGRAPH 98, in Computer Graphics Proceedings, Annual Conference Series, 1998.
- On-line documents:
- Complete article [Acrobat pdf file, 9.7 Mb]
Layered Depth Images
- Abstract:
- In this paper we present a set of efficient image based rendering methods capable of rendering multiple frames per second on a PC. The first method warps Sprites with Depth representing smooth surfaces without the gaps found in other techniques. A second method for more general scenes performs warping from an intermediate representation called a Layered Depth Image (LDI). An LDI is a view of the scene from a single input camera view, but with multiple pixels along each line of sight. The size of the representation grows only linearly with the observed depth complexity in the scene. Moreover, because the LDI data are represented in a single image coordinate system, McMillan's warp ordering algorithm can be successfully adapted. As a result, pixels are drawn in the output image in back-to-front order. No z-buffer is required, so alpha-compositing can be done efficiently without depth sorting. This makes splatting an efficient solution to the resampling problem.
- Citation:
- Jonathan Shade, Steven J. Gortler, Li-wei He, and Richard Szeliski. Proceedings of SIGGRAPH 98, in Computer Graphics Proceedings, Annual Conference Series, 1998.
- On-line documents:
- Complete article [Acrobat pdf file, 841 Kb]
- Project web page
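The representation described in the abstract above, a single camera view with multiple pixels along each line of sight, can be sketched as a small data structure. The class and method names are hypothetical. The sketch does not implement warping or McMillan's ordering algorithm: it sorts the samples of one pixel explicitly by depth and composites them back to front with the "over" operator, only to show how the stored layers combine.

```python
from collections import defaultdict


class LayeredDepthImage:
    """One camera view that stores several depth samples per pixel."""

    def __init__(self, width, height):
        self.width = width
        self.height = height
        # (x, y) -> list of (depth, color, alpha) samples along that line of sight
        self.layers = defaultdict(list)

    def add_sample(self, x, y, depth, color, alpha=1.0):
        """Append one depth sample to the pixel's line of sight."""
        self.layers[(x, y)].append((depth, color, alpha))

    def composite_pixel(self, x, y, background=(0.0, 0.0, 0.0)):
        """Blend the pixel's samples back to front (farthest sample first)."""
        result = background
        samples = sorted(self.layers.get((x, y), []), key=lambda s: s[0], reverse=True)
        for _, color, alpha in samples:
            result = tuple(alpha * c + (1.0 - alpha) * r for c, r in zip(color, result))
        return result
```

Storage grows with the number of samples actually added, which mirrors the abstract's point that the representation grows linearly with the observed depth complexity.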
Reproducing Color Images Using Custom Inks
- Abstract:
- We investigate the general problem of reproducing color images on an offset press using custom inks in any combination and number. While this problem has been explored previously for the case of two inks, there are a number of new mathematical and algorithmic challenges that arise as the number of inks increases. These challenges include more complex gamut mapping strategies, more efficient ink selection strategies, and fast and numerically accurate methods for computing ink separations in situations that may be either over- or under-constrained. In addition, the demands of high-quality color printing require an accurate physical model of the colors that result from overprinting multiple inks using halftoning, including the effects of trapping, dot gain, and the interreflection of light between ink layers. In this paper, we explore these issues related to printing with multiple custom inks, and address them with new algorithms and physical models. Finally, we present some printed examples demonstrating the promise of our methods.
- Citation:
- Eric J. Stollnitz, Victor Ostromoukhov, and David Salesin. Proceedings of SIGGRAPH 98, in Computer Graphics Proceedings, Annual Conference Series, 1998.
- On-line documents:
- Article without appendices [Acrobat pdf file, 221 Kb]
- Project web page
Synthesizing Realistic Facial Expressions from Photographs
- Abstract:
- We present new techniques for creating photorealistic textured 3D facial models from photographs of a human subject, and for creating smooth transitions between different facial expressions by morphing between these different models. Starting from several uncalibrated views of a human subject, we employ a user-assisted technique to recover the camera poses corresponding to the views as well as the 3D coordinates of a sparse set of chosen locations on the subject's face. A scattered data interpolation technique is then used to deform a generic face mesh to fit the particular geometry of the subject's face. Having recovered the camera poses and the facial geometry, we extract from the input images one or more texture maps for the model. This process is repeated for several facial expressions of a particular subject. To generate transitions between these facial expressions we use 3D shape morphing between the corresponding face models, while at the same time blending the corresponding textures. Using our technique, we have been able to generate highly realistic face models and natural looking animations.
- Citation:
- Frederic Pighin, Jamie Hecker, Dani Lischinski, Richard Szeliski, and David Salesin. Proceedings of SIGGRAPH 98, in Computer Graphics Proceedings, Annual Conference Series, 1998.
- On-line documents:
- Complete article [Acrobat pdf file, 272 Kb]
- Also available as Department of Computer Science and Engineering Technical Report TR 97-01-03.
- Text only [compressed PostScript file, 80 Kb]
- Color plates [compressed PostScript file, 80 Kb]
- Project web page
Computer-Generated Watercolor
- Abstract:
- This paper describes the various artistic effects of watercolor and shows how they can be simulated automatically. Our watercolor model is based on an ordered set of translucent glazes, which are created independently using a shallow-water fluid simulation. We use a Kubelka-Munk compositing model for simulating the optical effect of the superimposed glazes. We demonstrate how computer-generated watercolor can be used in three different applications: as part of an interactive watercolor paint system, as a method for automatic image "watercolorization", and as a mechanism for non-photorealistic rendering of three-dimensional scenes.
- Citation:
- Cassidy Curtis, Sean Anderson, Josh Seims, Kurt Fleischer, and David H. Salesin. Proceedings of SIGGRAPH 97, in Computer Graphics Proceedings, Annual Conference Series, 1997.
- On-line documents:
- The paper (gzipped PostScript, 8.1 Mb)
- The paper (.pdf, 1.83 Mb)
- The paper (.pdf, 517 Kb, images compressed beyond recognition)
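- Illustrative sketch (not taken from the paper's code): the Kubelka-Munk compositing named in the abstract above can be written for a single color channel using the standard Kubelka-Munk layer equations. The function names, the scalar per-channel treatment, and the sample coefficients are assumptions made for illustration; the absorption coefficient K and scattering coefficient S must both be positive in this form.

```python
import math

def km_layer(K, S, x):
    """Return (reflectance, transmittance) of one pigment layer.

    K: absorption coefficient (> 0), S: scattering coefficient (> 0),
    x: layer thickness. All values are for a single color channel.
    """
    a = 1.0 + K / S
    b = math.sqrt(a * a - 1.0)
    c = a * math.sinh(b * S * x) + b * math.cosh(b * S * x)
    R = math.sinh(b * S * x) / c
    T = b / c
    return R, T

def km_composite(top, bottom):
    """Optically combine two layers, each given as (reflectance, transmittance)."""
    R1, T1 = top
    R2, T2 = bottom
    denom = 1.0 - R1 * R2  # accounts for light bouncing between the layers
    R = R1 + T1 * T1 * R2 / denom
    T = T1 * T2 / denom
    return R, T

# Hypothetical coefficients for two glazes in one channel.
glaze_a = km_layer(K=0.5, S=1.0, x=1.0)
glaze_b = km_layer(K=0.2, S=0.8, x=0.5)
stack = km_composite(glaze_a, glaze_b)
```

A full-color version would run the same two functions once per channel and fold the glazes together in their painted order.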
Multiperspective Panoramas for Cel Animation
- Abstract:
- Traditional 2D cel-animation uses background panoramas over which foreground characters, and the camera, move. Because characters move through complex worlds, the panorama often contains multiple views of the world taken from different perspectives, but nonetheless seamlessly integrated into a 2D painting that is locally coherent, but may be globally nonsensical. This is a difficult task in which computer graphics can be of service. The panorama-creation process is currently performed by specialists, and the complexity of camera paths through the world is limited by their ability to assemble multiple views into a coherent whole. Furthermore, once an artist has created the panorama it is often difficult to incorporate computer-generated imagery elements into the animation because it is hard to abstract a meaningful 3D geometry for the world. This paper presents a system that creates a layout-guide from a crude 3D model and a camera path through that model; this layout-guide is then used in the production of a panorama, but one in which complex paths are possible, and in which the incorporation of CG elements is simple.
- Citation:
- Daniel N. Wood, Adam Finkelstein, John F. Hughes, Craig E. Thayer, and David H. Salesin. Proceedings of SIGGRAPH 97, in Computer Graphics Proceedings, Annual Conference Series, 1997.
- On-line documents:
- Complete article [Acrobat pdf file, 2 Mb]
Orientable Textures for Image-Based Pen-and-Ink Illustration
- Abstract:
- We present an interactive system for creating pen-and-ink-style line drawings from greyscale images in which the strokes of the rendered illustration follow the features of the original image. The user, via new interaction techniques for editing a direction field, specifies an orientation for each region of the image; the computer draws oriented strokes, based on a user-specified set of example strokes, that achieve the same tone as the image via a new algorithm that compares an adaptively-blurred version of the current illustration to the target tone image. By aligning the direction field with surface orientation of the objects in the image the user can create textures that appear attached to those objects instead of merely conveying their darkness. The result is a more compelling pen-and-ink illustration than was previously possible from 2D reference imagery.
- Citation:
- Michael P. Salisbury, Michael T. Wong, John F. Hughes, and David H. Salesin. Proceedings of SIGGRAPH 97, in Computer Graphics Proceedings, Annual Conference Series, 401-406, August 1997.
- On-line documents:
- Complete article [compressed PostScript file, 2.3 Mb]
Progressive Previewing of Ray-Traced Images Using Image Plane Discontinuity Meshing
- Abstract:
- This paper presents a new method for progressively previewing a ray-traced image while it is being computed. Our method constructs and incrementally updates a constrained Delaunay triangulation for the image plane containing various important discontinuity edges in the image along with all of the image samples that have been computed by the ray tracer. The triangulation is rendered using hardware Gouraud shading, yielding a piecewise linear approximation to the final image. Texture-mapped surfaces, as well as other regions in the image that are not well approximated by linear interpolation, are handled through the use of hardware texture mapping.
- Citation:
- F.P. Pighin, D. Lischinski, and D.H. Salesin. Eighth Eurographics Workshop on Rendering, 115-125, Saint-Etienne, France, June 1997.
- On-line documents:
- Complete article [Acrobat pdf file, 1.2 MB]
Clustering for Glossy Global Illumination
- Abstract:
- We present a new clustering algorithm for global illumination in complex environments. The new algorithm extends previous work on clustering for radiosity to allow for non-diffuse (glossy) reflectors. We represent clusters as points with directional distributions of outgoing and incoming radiance and importance, and we derive an error bound for transfers between these clusters. The algorithm groups input surfaces into a hierarchy of clusters, and then permits clusters to interact only if the error bound is below an acceptable tolerance. We show that the algorithm is asymptotically more efficient than previous clustering algorithms even when restricted to ideally diffuse environments. Finally, we demonstrate the performance of our method on two complex glossy environments.
- Citation:
- Per H. Christensen, Dani Lischinski, Eric J. Stollnitz, and David H. Salesin. Clustering for glossy global illumination. ACM Transactions on Graphics 16, January 1997.
- On-line documents:
- Complete article [Acrobat file, 843 Kb]
- Complete article [compressed PostScript file, 9.1 Mb]
- Article without color images [compressed PostScript file, 92 Kb]
Comic Chat
- Abstract:
- Comics have a rich visual vocabulary, and people find them appealing. They are also an effective form of communication. We have built a system, called Comic Chat, that represents on-line communications in the form of comics. Comic Chat automates numerous aspects of comics generation, including balloon construction and layout, the placement and orientation of comic characters, the default selection of character gestures and expressions, the incorporation of semantic panel elements, and the choice of zoom factor for the virtual camera. This paper describes the mechanisms that Comic Chat uses to perform this automation, as well as novel aspects of the program's user interface. Comic Chat is a working program, allowing groups of people to communicate over the Internet. It has several advantages over other graphical chat programs, including the availability of a graphical history, and a dynamic graphical presentation.
- Citation:
- D. Kurlander, David H. Salesin, T. Skelly. Comic chat. Proceedings of SIGGRAPH 96, in Computer Graphics Proceedings, Annual Conference Series, 225-236, August 1996.
- On-line documents:
- Complete article [Acrobat file, 2.3 Mb]
Declarative Camera Control for Automatic Cinematography
- Abstract:
- Animations generated by interactive 3D computer graphics applications are typically portrayed either from a particular character's point of view or from a small set of strategically-placed viewpoints. By ignoring camera placement, such applications fail to realize important storytelling capabilities that have been explored by cinematographers for many years. In this paper, we describe several of the principles of cinematography and show how they can be formalized into a declarative language, called the Declarative Camera Control Language (dccl). We describe the application of dccl within the context of a simple interactive video game and argue that dccl represents cinematic knowledge at the same level of abstraction as expert directors by encoding 16 idioms from a film textbook. These idioms produce compelling animations, as demonstrated on the accompanying videotape.
- Citation:
- David B. Christianson, Sean E. Anderson, Li-Wei He, Daniel S. Weld, Michael F. Cohen, David H. Salesin. Declarative camera control for automatic cinematography. Proceedings of AAAI '96 (Portland, OR), 148-155, 1996. An earlier version appeared as Department of Computer Science and Engineering Technical Report TR 95-01-03, University of Washington, 1995.
- On-line documents:
- Complete article [Acrobat file, 240 KB]
Fast Rendering of Complex Environments Using a Spatial Hierarchy
- Abstract:
- We present a new method for accelerating the rendering of complex static scenes. The technique is applicable to unstructured scenes containing arbitrary geometric primitives and has sublinear asymptotic complexity. Our approach is to construct a spatial hierarchy of cells over the scene and to associate with each cell a simplified representation of its contents. The scene is then rendered using a traversal of the hierarchy in which a cell's approximation is drawn instead of its contents if the approximation is sufficiently accurate. We apply the method to several different scenes and demonstrate significant speedups with little image degradation. We also exhibit and discuss some of the artifacts that our approximation may cause.
- Citation:
- Brad Chamberlain, Tony DeRose, Dani Lischinski, David H. Salesin, and John Snyder. Fast rendering of complex environments using a spatial hierarchy. Proceedings of Graphics Interface '96 (Toronto), 132-141, 1996.
- On-line documents:
- Text of article [compressed PostScript file, 38 Kb]
- Color plates 1 and 2 [compressed PostScript file, 225 Kb]
- Color plates 3 to 5 [compressed PostScript file, 1 Mb]
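- Illustrative sketch (not the authors' implementation): the traversal described in the abstract above, where a cell's approximation is drawn instead of its contents when it is accurate enough, might look like the following. The `Cell` class, the per-cell `approx_error` value (assumed to be already evaluated for the current viewpoint), and the `tolerance` parameter are all hypothetical names introduced for this example.

```python
from dataclasses import dataclass, field

@dataclass
class Cell:
    name: str
    approx_error: float                    # hypothetical error of the cell's simplified stand-in
    children: list = field(default_factory=list)

def render(cell, tolerance, out):
    """Append draw commands for `cell` to `out`, coarsest acceptable level first."""
    if cell.approx_error <= tolerance:
        out.append(("approx", cell.name))    # stand-in is accurate enough: stop here
    elif not cell.children:
        out.append(("geometry", cell.name))  # leaf cell: draw the actual primitives
    else:
        for child in cell.children:          # otherwise refine into the sub-cells
            render(child, tolerance, out)
    return out

scene = Cell("root", 5.0, [Cell("near", 3.0), Cell("far", 0.5)])
commands = render(scene, tolerance=1.0, out=[])
# commands == [("geometry", "near"), ("approx", "far")]
```

Raising the tolerance makes the traversal stop higher in the hierarchy, which trades image accuracy for fewer draw commands.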
Global Illumination of Glossy Environments Using Wavelets and Importance
- Abstract:
- We show how importance-driven refinement and a wavelet basis can be combined to provide an efficient solution to the global illumination problem with glossy and diffuse reflections. Importance is used to focus the computation on the interactions having the greatest impact on the visible solution. Wavelets are used to provide an efficient representation of radiance, importance, and the transport operator. We discuss a number of choices that must be made when constructing a finite element algorithm for glossy global illumination. Our algorithm is based on the standard wavelet decomposition of the transport operator and makes use of a four-dimensional wavelet representation for spatially- and angularly-varying radiance distributions. We use a final gathering step to improve the visual quality of the solution. Features of our implementation include support for curved surfaces as well as texture-mapped anisotropic emission and reflection functions.
- Citation:
- Per H. Christensen, Eric J. Stollnitz, David H. Salesin, and Tony D. DeRose. Global illumination of glossy environments using wavelets and importance. ACM Transactions on Graphics, 15(1):37-71, January 1996.
- On-line documents:
- Complete article [Acrobat file, 611 Kb]
- Complete article [compressed PostScript file, 2.8 Mb]
Hierarchical Image Caching for Accelerated Walkthroughs of Complex Environments
- Abstract:
- We present a new method for accelerating walkthroughs of geometrically complex static scenes. As a preprocessing step, our method constructs a BSP-tree that hierarchically partitions the geometric primitives in the scene. In the course of a walkthrough, images of nodes at various levels of the hierarchy are cached for reuse in subsequent frames. A cached image is applied as a texture map to a single quadrilateral that is drawn instead of the geometry contained in the corresponding node. Visual artifacts are kept under control by using an error metric that quantifies the discrepancy between the appearance of geometry contained in a node and the cached image. The new method is shown to achieve significant speedups for a walkthrough of a complex outdoor scene, with little or no loss in rendering quality.
- Citation:
- Jonathan Shade, Dani Lischinski, Tony D. DeRose, John Snyder, and David H. Salesin. Hierarchical image caching for accelerated walkthroughs of complex environments. Proceedings of SIGGRAPH 96, in Computer Graphics Proceedings, Annual Conference Series, 75-82, August 1996.
- On-line documents:
- Complete article [Acrobat pdf file, 180 Kb]
- Also available as Department of Computer Science and Engineering Technical Report TR 96-01-06.
- TR 96-01-06 [compressed postscript file, 40 Kb]
- Plate1 [compressed Postscript file, 824 Kb]
- Plate2 [compressed Postscript file, 704 Kb]
- Project web page
Interactive Multiresolution Surface Viewing
- Abstract:
- Multiresolution analysis has been proposed as a basic tool supporting compression, progressive transmission, and level-of-detail control of complex meshes in a unified and theoretically sound way.
- We extend previous work on multiresolution analysis of meshes in two ways. First, we show how to perform multiresolution analysis of colored meshes by separately analyzing shape and color. Second, we describe efficient algorithms and data structures that allow us to incrementally construct lower resolution approximations to colored meshes from the geometry and color wavelet coefficients at interactive rates. We have integrated these algorithms in a prototype mesh viewer that supports progressive transmission, dynamic display at a constant frame rate independent of machine characteristics and load, and interactive choice of tradeoff between the amount of detail in geometry and color. The viewer operates as a helper application to Netscape, and can therefore be used to rapidly browse and display complex geometric models stored on the World Wide Web.
- Citation:
- Andrew Certain, Jovan Popovic, Tony DeRose, Tom Duchamp, David H. Salesin, Werner Stuetzle. Interactive multiresolution surface viewing. Proceedings of SIGGRAPH 96, in Computer Graphics Proceedings, Annual Conference Series, 91-98, August 1996.
- Available as Department of Computer Science and Engineering Technical Report TR 96-01-07, University of Washington, 1996.
- On-line documents:
- Complete article [Acrobat file, 435 Kb]
Multiresolution Video
- Abstract:
- We present a new representation for time-varying image data, called multiresolution video. The representation allows for varying -- and arbitrarily high -- spatial and temporal resolutions in different parts of a video sequence. The representation is based on a sparse, hierarchical encoding of the video data. We show how multiresolution video supports a number of primitive operations: drawing frames at a particular spatial and temporal resolution; and translating, scaling, and compositing multiresolution sequences. These primitives are then used as the building blocks to support a variety of applications: video compression; multiresolution playback, including motion-blurred "fast-forward" and "reverse"; constant speed display; enhanced video scrubbing; and "video clip art" editing and compositing. The multiresolution representation requires little storage overhead, and the algorithms using the representation are both simple and efficient.
- Citation:
- Adam Finkelstein, Charles E. Jacobs, David H. Salesin. Multiresolution Video. Proceedings of SIGGRAPH 96, in Computer Graphics Proceedings, Annual Conference Series, 281-290, August 1996.
- On-line documents:
- Complete article [Acrobat file, 651 Kb]
- Complete article [compressed Postscript file, 3.5 Mb]
- Article without color images [compressed Postscript file, 54Kb]
Rendering Parametric Surfaces in Pen and Ink
- Abstract:
- This paper presents new algorithms and techniques for rendering parametric free-form surfaces in pen and ink. In particular, we introduce the idea of "controlled-density hatching" for conveying tone, texture, and shape. The fine control over tone this method provides allows the use of traditional texture mapping techniques for specifying the tone of pen-and-ink illustrations. We also show how a planar map, a data structure central to our rendering algorithm, can be constructed from parametric surfaces, and used for clipping strokes and generating outlines. Finally, we show how curved shadows can be cast onto curved objects for this style of illustration.
- Citation:
- Georges Winkenbach, David H. Salesin. Rendering parametric surfaces in pen and ink. Proceedings of SIGGRAPH 96, in Computer Graphics Proceedings, Annual Conference Series, 469-476, August 1996.
- On-line documents:
- Available as Technical Report:
- 96-01-05 [compressed Postscript file, 801 Kb]
Reproducing Color Images as Duotones
- Abstract:
- We investigate a new approach for reproducing color images. Rather than mapping the colors in an image onto the gamut of colors that can be printed with cyan, magenta, yellow, and black inks, we choose the set of printing inks for the particular image being reproduced. In this paper, we look at the special case of selecting inks for duotone printing, a relatively inexpensive process in which just two inks are used. Specifically, the system we describe takes an image as input, and allows a user to select 0, 1, or 2 inks. It then chooses the remaining ink or inks so as to reproduce the image as accurately as possible and produces the appropriate color separations automatically.
- Citation:
- Joanna L. Power, Brad S. West, Eric J. Stollnitz, and David H. Salesin. Reproducing color images as duotones. Proceedings of SIGGRAPH 96, in Computer Graphics Proceedings, Annual Conference Series, 237-248, August 1996.
- On-line documents:
- Article without duotones [Acrobat file, 2.8 Mb]
- Article without duotones [compressed PostScript file, 3.0 Mb]
Scale-dependent Reproduction of Pen-and-ink Illustrations
- Abstract:
- This paper describes a compact resolution- and scale-independent representation for pen-and-ink illustrations. The proposed representation consists of a low-resolution grey-scale image, augmented by a set of discontinuity segments. We also present a new reconstruction algorithm that magnifies the low-resolution image while keeping the image sharp along the discontinuities. By storing pen-and-ink illustrations in this representation, we can produce high-fidelity illustrations at any scale and resolution by generating an image of the desired size and filling that image with pen-and-ink strokes.
- Citation:
- Mike Salisbury, Corey Anderson, Dani Lischinski, and David H. Salesin. Scale-dependent Reproduction of Pen-and-ink Illustrations. Proceedings of SIGGRAPH 96, in Computer Graphics Proceedings, Annual Conference Series, 461-468, August 1996.
- On-line documents:
- Available as Technical Report:
- TR 96-01-02 [compressed PostScript file, 11.5 Mb]
Wavelets for Computer Graphics: Theory and Applications
- Preview:
- This distinctly accessible introduction to wavelets provides computer graphics professionals and researchers with the mathematical foundations for understanding and applying this new and powerful tool.
- Wavelets are rapidly becoming a core technique in computer graphics, with applications to:
- image editing and compression;
- automatic level-of-detail control for editing and rendering curves and surfaces;
- surface reconstruction from contours; and
- physical simulation for global illumination and animation.
- Stressing intuition and clarity, this book offers a solid understanding of the theory of wavelets and their proven applications in computer graphics.
- Although previous introductions to wavelets have presented an elegant mathematical framework, that framework is too restrictive to apply to many problems in graphics. In contrast, this book focuses on a generalized theory that naturally accommodates the kinds of objects that commonly arise in computer graphics, including images, open curves, and surfaces of arbitrary topology.
- The book also contains a foreword by Ingrid Daubechies and an appendix covering the necessary background material in linear algebra.
- Contents: See the table of contents.
- Citation:
- Eric J. Stollnitz, Tony D. DeRose, and David H. Salesin. Wavelets for Computer Graphics: Theory and Applications. Morgan Kaufmann, San Francisco, 1996.
- ISBN: 1-55860-375-1
- Ordering information: See Morgan Kaufmann's web site.
- On-line material:
- Matlab code from Appendix C [compressed tar file, 17 Kb]
The Virtual Cinematographer: a Paradigm for Automatic Real-Time Camera Control and Directing
- Abstract:
- This paper presents a paradigm for automatically generating complete camera specifications for capturing events in virtual 3D environments in real-time. We describe a fully implemented system, called the Virtual Cinematographer, and demonstrate its application in a virtual "party" setting. Cinematographic expertise, in the form of film idioms, is encoded as a set of small hierarchically organized finite state machines. Each idiom is responsible for capturing a particular type of scene, such as three virtual actors conversing or one actor moving across the environment. The idiom selects shot types and the timing of transitions between shots to best communicate events as they unfold. A set of camera modules, shared by the idioms, is responsible for the low-level geometric placement of specific cameras for each shot. The camera modules are also responsible for making subtle changes in the virtual actors' positions to best frame each shot. In this paper, we discuss some basic heuristics of filmmaking and show how these ideas are encoded in the Virtual Cinematographer.
- Citation:
- Li-Wei He, Michael F. Cohen, David H. Salesin. The virtual cinematographer: a paradigm for automatic real-time camera control and directing. Proceedings of SIGGRAPH 96, in Computer Graphics Proceedings, Annual Conference Series, 217-224, August 1996.
- On-line documents:
- Complete article [Acrobat file, 158 Kb]
Fast Multiresolution Image Querying
- Abstract:
- We present a method for searching in an image database using a query image that is similar to the intended target. The query image may be a hand-drawn sketch or a (potentially low-quality) scan of the image to be retrieved. Our searching algorithm makes use of multiresolution wavelet decompositions of the query and database images. The coefficients of these decompositions are distilled into small "signatures" for each image. We introduce an "image querying metric" that operates on these signatures. This metric essentially compares how many significant wavelet coefficients the query has in common with potential targets. The metric includes parameters that can be tuned, using a statistical analysis, to accommodate the kinds of image distortions found in different types of image queries. The resulting algorithm is simple, requires very little storage overhead for the database signatures, and is fast enough to be performed on a database of 20,000 images at interactive rates (on standard desktop machines) as a query is sketched. Our experiments with hundreds of queries in databases of 1000 and 20,000 images show dramatic improvement, in both speed and success rate, over using a conventional L1, L2, or color histogram norm.
- Citation:
- Charles E. Jacobs, Adam Finkelstein, David H. Salesin. Fast Multiresolution Image Querying. Proceedings of SIGGRAPH 95, in Computer Graphics Proceedings, Annual Conference Series, pages 277-286, August 1995.
- On-line documents:
- Available as Technical Report TR 95-01-06:
- Complete report [compressed PostScript file, 474 Kb]
- Report without color plates [compressed PostScript file, 63 Kb]
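- Illustrative sketch (a simplification, not the paper's metric): the abstract above describes distilling wavelet coefficients into small signatures and counting how many significant coefficients a query shares with each target. The version below keeps only the positions and signs of the `m` largest-magnitude coefficients and scores by the number of matches; it omits the tunable weights and any average-color term. The function names and the flat coefficient lists are assumptions made for this example.

```python
def signature(coeffs, m):
    """Positions and signs of the m largest-magnitude coefficients.

    Index 0 is skipped because it is assumed to hold the overall average.
    """
    ranked = sorted(range(1, len(coeffs)), key=lambda i: abs(coeffs[i]), reverse=True)
    return {(i, coeffs[i] > 0) for i in ranked[:m] if coeffs[i] != 0}

def match_count(query_sig, target_sig):
    """Number of significant coefficients with the same position and sign."""
    return len(query_sig & target_sig)

def best_match(query_coeffs, database, m):
    """Name of the database entry sharing the most signature entries with the query."""
    q = signature(query_coeffs, m)
    return max(database, key=lambda name: match_count(q, signature(database[name], m)))

# Hypothetical wavelet coefficients for a query and two stored images.
query = [5.0, 4.0, -3.0, 0.1, 2.0]
database = {
    "a": [5.0, 3.5, -2.0, 0.0, 0.5],
    "b": [5.0, -4.0, 3.0, 0.0, 0.0],
}
winner = best_match(query, database, m=2)  # "a"
```

In practice the database signatures would be computed once and stored, so that only the query signature is built at search time.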
Multiresolution Analysis of Arbitrary Meshes
- Abstract:
- In computer graphics and geometric modeling, shapes are often represented by triangular meshes. With the advent of laser scanning systems, meshes of extreme complexity are rapidly becoming commonplace. Such meshes are notoriously expensive to store, transmit, render, and are awkward to edit. Multiresolution analysis offers a simple, unified, and theoretically sound approach to dealing with these problems. Lounsbery et al. have recently developed a technique for creating multiresolution representations for a restricted class of meshes with subdivision connectivity. Unfortunately, meshes encountered in practice typically do not meet this requirement. In this paper we present a method for overcoming the subdivision connectivity restriction, meaning that completely arbitrary meshes can now be converted to multiresolution form. The method is based on the approximation of an arbitrary initial mesh M by a mesh M3 that has subdivision connectivity and is guaranteed to be within a specified tolerance.
- The key ingredient of our algorithm is the construction of a parametrization of M over a simple domain. We expect this parametrization to be of use in other contexts, such as texture mapping or the approximation of complex meshes by NURBS patches.
- Citation:
- Matthias Eck, Tony DeRose, Tom Duchamp, Hugues Hoppe, Michael Lounsbery, and Werner Stuetzle. Multiresolution Analysis of Arbitrary Meshes. Technical Report #95-01-02, January 1995.
- On-line documents:
- Article [compressed PostScript file 1.3 Mb]
- Color plate 1 [compressed Postscript file 1.1 Mb]
- Color plate 2 [compressed Postscript file 1.6 Mb]
- Color plate 3 [compressed Postscript file 1 Mb]
- Color plate 4 [compressed Postscript file 1 Mb]
Wavelets for Computer Graphics: A Primer
- Abstract:
- Wavelets are a mathematical tool for hierarchically decomposing functions. Using wavelets, a function can be described in terms of a coarse overall shape, plus details that range from broad to narrow. Regardless of whether the function of interest is an image, a curve, or a surface, wavelets provide an elegant technique for representing the levels of detail present. This primer is intended to provide those working in computer graphics with some intuition for what wavelets are, as well as to present the mathematical foundations necessary for studying and using them. In Part 1, we discuss the simple case of Haar wavelets in one and two dimensions, and show how they can be used for image compression. Part 2 presents the mathematical theory of multiresolution analysis, develops bounded-interval spline wavelets, and describes their use in multiresolution curve and surface editing.
- Citations:
- Eric J. Stollnitz, Tony D. DeRose, and David H. Salesin. Wavelets for computer graphics: A primer, part 1. IEEE Computer Graphics and Applications, 15(3):76-84, May 1995.
- Eric J. Stollnitz, Tony D. DeRose, and David H. Salesin. Wavelets for computer graphics: A primer, part 2. IEEE Computer Graphics and Applications, 15(4):75-85, July 1995.
- On-line documents:
- Part 1 [Acrobat file, 264 Kb]
- Part 1 [compressed PostScript file, 473 Kb]
- Part 2 [Acrobat file, 865 Kb]
- Part 2 [compressed PostScript file, 417 Kb]
- Matlab code [compressed tar file, 17 Kb]
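- Illustrative sketch (independent of the Matlab code listed above): the one-dimensional Haar decomposition discussed in Part 1 of the primer can be shown with repeated pairwise averaging and differencing. This is the unnormalized form; the function names are assumptions, and the input length is assumed to be a power of two.

```python
def haar_decompose(values):
    """Unnormalized 1D Haar transform: overall average followed by detail coefficients."""
    coeffs = [float(v) for v in values]
    n = len(coeffs)
    while n > 1:
        half = n // 2
        averages = [(coeffs[2 * i] + coeffs[2 * i + 1]) / 2 for i in range(half)]
        details = [(coeffs[2 * i] - coeffs[2 * i + 1]) / 2 for i in range(half)]
        coeffs[:n] = averages + details  # coarser signal first, then its details
        n = half
    return coeffs

def haar_reconstruct(coeffs):
    """Invert haar_decompose by re-adding details level by level."""
    values = list(coeffs)
    n = 1
    while n < len(values):
        merged = []
        for avg, det in zip(values[:n], values[n:2 * n]):
            merged += [avg + det, avg - det]
        values[:2 * n] = merged
        n *= 2
    return values

coeffs = haar_decompose([9, 7, 3, 5])   # [6.0, 2.0, 1.0, -1.0]
restored = haar_reconstruct(coeffs)     # [9.0, 7.0, 3.0, 5.0]
```

Setting small detail coefficients to zero before reconstructing gives a lossy approximation, which is the basic idea behind the wavelet image compression the primer describes.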
Computer-Generated Pen-and-Ink Illustration
- Abstract:
- This paper describes the principles of traditional pen-and-ink illustration, and shows how a great number of them can be implemented as part of an automated rendering system. It introduces "stroke textures," which can be used for achieving both texture and tone with line drawing. Stroke textures also allow resolution-dependent rendering, in which the choice of strokes used in an illustration is appropriately tied to the resolution of the target medium. We demonstrate these techniques using complex architectural models, including Frank Lloyd Wright's "Robie House."
- Citation:
- Georges Winkenbach and David H. Salesin. Computer-Generated Pen-and-Ink Illustration. Proceedings of SIGGRAPH 94 (Orlando, Florida, July 24-29, 1994). In Computer Graphics, Annual Conference Series, 1994.
- On-line documents:
- Available as Technical Report:
- TR 94-01-08b [compressed Postscript file, 2.2 Mb]
Interactive Pen-and-Ink Illustration
- Abstract:
- We present an interactive system for creating pen-and-ink illustrations. The system uses stroke textures--collections of strokes arranged in different patterns--to generate texture and tone. The user "paints" with a desired stroke texture to achieve a desired tone, and the computer draws all of the individual strokes.
- The system includes support for using scanned or rendered images for reference to provide the user with guides for outline and tone. By following these guides closely, the illustration system can be used for interactive digital halftoning, in which stroke textures are applied to convey details that would otherwise be lost in this black-and-white medium.
- By removing the burden of placing individual strokes from the user, the illustration system makes it possible to create fine stroke work with a purely mouse-based interface. Thus, this approach holds promise for bringing high-quality black-and-white illustration to the world of personal computing and desktop publishing.
- Citation:
- Michael P. Salisbury, Sean E. Anderson, Ronen Barzel, and David H. Salesin. Interactive Pen-and-Ink Illustration. Proceedings of SIGGRAPH 94, in Computer Graphics Proceedings, Annual Conference Series, pages 101-108, July 1994.
- On-line documents:
- Complete article [Acrobat file, 21MB]
Multiresolution Analysis for Surfaces of Arbitrary Topological Type
- Abstract:
- Multiresolution analysis provides a useful and efficient tool for representing shape and analyzing features at multiple levels of detail. Although the technique has met with considerable success when applied to univariate functions, images, and more generally to functions defined on R^n, to our knowledge it has not been extended to functions defined on surfaces of arbitrary genus.
- In this report, we demonstrate that multiresolution analysis can be extended to surfaces of arbitrary genus using techniques from subdivision surfaces. We envision many applications for this work, including automatic level-of-detail control in high-performance graphics rendering, compression of CAD models, and acceleration of global illumination algorithms. We briefly sketch one of these applications, that of automatic level-of-detail control of polyhedral surfaces.
- Citation:
- Tony D. DeRose, Michael Lounsbery, Joe Warren. Multiresolution Analysis for Surfaces of Arbitrary Topological Type. Department of Computer Science and Engineering, University of Washington Technical Report TR 93-10-05, October 29, 1993.
- On-line documents:
- [PDF document, 1.0MB]
Multiresolution Curves
- Abstract:
- We describe a multiresolution curve representation, based on wavelets, that conveniently supports a variety of operations: smoothing a curve; editing the overall form of a curve while preserving its details; and approximating a curve within any given error tolerance for scan conversion. We present methods to support continuous levels of smoothing as well as direct manipulation of an arbitrary portion of the curve; the control points, as well as the discrete nature of the underlying hierarchical representation, can be hidden from the user. The multiresolution representation requires no extra storage beyond that of the original control points, and the algorithms using the representation are both simple and fast.
- Citation:
- Adam Finkelstein, David H. Salesin. Multiresolution Curves. In Proceedings of SIGGRAPH '94, pages 261-268. ACM, New York, 1994.
- On-line documents:
- Available as Technical Report:
- TR 94-01-06b [compressed PostScript file, 352Kb]
Multiresolution Painting and Compositing
- Abstract:
- We describe a representation for "multiresolution images"--images that have different resolutions in different places--and methods for creating such images using painting and compositing operations. These methods are very easy to implement, and they are efficient in both memory and speed. At a particular resolution, the representation requires space proportional only to the amount of detail actually present, and the most common painting operations, "over" and "erase," require time proportional only to the number of pixels displayed. Finally, we show how "fractional-level zooming" can be implemented in order to allow a user to display and edit portions of a multiresolution image at any arbitrary size.
- Citation:
- Deborah F. Berman, Jason T. Bartell, David H. Salesin. Multiresolution Painting and Compositing. Proceedings of SIGGRAPH 94, in Computer Graphics Proceedings, Annual Conference Series, 85-90, July 1994.
- On-line documents:
- Available as Technical Report:
- TR 94-01-09b [compressed PostScript file, 8.8 Mb]
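- Illustrative sketch (not from the paper): the "over" operation named in the abstract is commonly defined as the Porter-Duff compositing operator. The snippet below is a minimal single-pixel version, assuming premultiplied-alpha colors stored as (r, g, b, a) tuples with components in [0, 1]. The function name `over` and the tuple layout are assumptions made for illustration; the paper's multiresolution data structure is not reproduced here.

```python
def over(fg, bg):
    """Composite one premultiplied (r, g, b, a) pixel over another.

    Porter-Duff "over": result = fg + (1 - fg_alpha) * bg, applied to
    every channel, including alpha. This is the standard single-pixel
    operator only, not the paper's multiresolution algorithm.
    """
    fg_alpha = fg[3]
    return tuple(f + (1.0 - fg_alpha) * b for f, b in zip(fg, bg))


# Half-transparent red (premultiplied) over opaque blue.
result = over((0.5, 0.0, 0.0, 0.5), (0.0, 0.0, 1.0, 1.0))
```

Premultiplied alpha keeps the operator to one multiply-add per channel, which is the usual reason it is preferred for repeated compositing.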
Multiresolution Tiling
- Abstract:
- This paper describes an efficient method for constructing a tiling between a pair of planar contours. The problem is of interest in a number of domains, including medical imaging, biological research and geological reconstructions. Our method, based on ideas from multiresolution analysis and wavelets, requires O(n) space and appears to require O(n log n) time for average inputs, compared to the O(n^2) space and O(n^2 log n) time required by the optimizing algorithm of Fuchs, Kedem and Uselton. The results computed by our algorithm are in many cases nearly the same as those of the optimizing algorithm, but at a small fraction of the computational cost. The performance improvement makes the algorithm usable for large contours in an interactive system. The use of multiresolution analysis provides an efficient mechanism for data compression by discarding wavelet coefficients smaller than a threshold value during reconstruction. The amount of detail lost can be controlled by appropriate choice of the threshold value. The use of lower resolution approximations to the original contours yields significant savings in the time required to display a reconstructed object, and in the space required to store it.
- Citation:
- David Meyers. Multiresolution Tiling. Computer Graphics Forum, December 1994.
- On-line documents:
- Complete article [compressed PostScript file, 233Kb]
- Graphics Interface '94 version [compressed PostScript file, 189Kb]
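- Illustrative sketch (not from the paper): the compression idea mentioned in the abstract, discarding wavelet coefficients below a threshold, can be shown on a one-dimensional signal. The snippet assumes an unnormalized Haar basis (pairwise averages and differences) and an input whose length is a power of two. The abstract does not say which wavelet basis the paper uses, so the Haar choice and the helper names `haar_decompose`, `haar_reconstruct`, and `threshold` are assumptions. Contour tiling itself is not shown.

```python
def haar_decompose(signal):
    """Return [overall average, coarse details ..., fine details]."""
    coeffs = [float(v) for v in signal]
    n = len(coeffs)
    while n > 1:
        half = n // 2
        averages = [(coeffs[2 * i] + coeffs[2 * i + 1]) / 2 for i in range(half)]
        details = [(coeffs[2 * i] - coeffs[2 * i + 1]) / 2 for i in range(half)]
        coeffs[:n] = averages + details
        n = half
    return coeffs


def haar_reconstruct(coeffs):
    """Invert haar_decompose."""
    out = list(coeffs)
    n = 1
    while n < len(out):
        merged = []
        for avg, det in zip(out[:n], out[n:2 * n]):
            merged.extend([avg + det, avg - det])
        out[:2 * n] = merged
        n *= 2
    return out


def threshold(coeffs, eps):
    """Zero every detail coefficient smaller than eps in magnitude.

    The overall average at index 0 is always kept.
    """
    return [c if i == 0 or abs(c) >= eps else 0.0 for i, c in enumerate(coeffs)]


coeffs = haar_decompose([9, 7, 3, 5])
exact = haar_reconstruct(coeffs)
approx = haar_reconstruct(threshold(coeffs, 1.5))
```

Raising `eps` zeroes more detail coefficients and gives a coarser reconstruction, which is the trade-off the abstract describes as controlling the amount of detail lost.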
Piecewise Smooth Surface Reconstruction
- Abstract:
- We present a general method for automatic reconstruction of accurate, concise, piecewise smooth surface models from scattered range data. The method can be used in a variety of applications such as reverse engineering--the automatic generation of CAD models from physical objects. Novel aspects of the method are its ability to model surfaces of arbitrary topological type and to recover sharp features such as creases and corners. The method has proven to be effective, as demonstrated by a number of examples using both simulated and real data.
- A key ingredient in the method, and a principal contribution of this paper, is the introduction of a new class of piecewise smooth surface representations based on subdivision. These surfaces have a number of properties that make them ideal for use in surface reconstruction: they are simple to implement, they can model sharp features concisely, and they can be fit to scattered range data using an unconstrained optimization procedure.
- Citation:
- Hugues Hoppe, Tony DeRose, Tom Duchamp, Mark Halstead, Hubert Jin, John McDonald, Jean Schweitzer, Werner Stuetzle. Piecewise Smooth Surface Reconstruction. Computer Graphics Proceedings, Annual Conference Series, 1994.
- On-line documents:
- Complete article [compressed PostScript file, 98Kb]
Wavelet Radiance
- Abstract:
- In this paper, we show how wavelet analysis can be used to provide an efficient solution method for global illumination with glossy and diffuse reflections. Wavelets are used to sparsely represent radiance distribution functions and the transport operator. In contrast to previous wavelet methods (for radiosity), our algorithm transports light directly among wavelets, and eliminates the pushing and pulling procedures.
- The framework we describe supports curved surfaces and spatially-varying anisotropic BRDFs. We use importance to make the global illumination problem tractable for complex scenes, and we use a final gathering step to improve the visual quality of the solution.
- Citation:
- Per H. Christensen, Eric J. Stollnitz, David H. Salesin, and Tony D. DeRose. Wavelet radiance. In G. Sakas, P. Shirley, and S. Müller, editors, Photorealistic Rendering Techniques, pages 295-309. Springer-Verlag, Berlin, 1995.
- Reprinted from Proceedings of the Fifth Eurographics Workshop on Rendering (Darmstadt, Germany, June 1994), pages 287-302.
- On-line documents:
- Complete article [Acrobat file, 263 Kb]
- Complete article [compressed PostScript file, 1.4 Mb]
Electronic "How Things Work" Articles: Two Early Prototypes
- Abstract:
- The Electronic Encyclopedia Exploratorium (E3) is a vision of a future computer system--a kind of electronic "How Things Work" book. Typical articles in E3 will describe such mechanisms as compression refrigerators, engines, telescopes, and mechanical linkages. Each article will provide simulations, 3-dimensional animated graphics that the user can manipulate, laboratory areas that allow a user to modify the device or experiment with related artifacts, and a facility for asking questions and receiving customized, computer-generated English language explanations. In this paper, we discuss some of the foundational technology--especially focusing on topics in artificial intelligence, graphics, and user interfaces--needed to achieve this long-term vision. We describe our two initial prototypes and the technical lessons we've learned from them.
- Citation:
- F. Amador, Deborah Berman, Alan Borning, Tony D. DeRose, Adam Finkelstein, D. Neville, David Notkin, David H. Salesin, Michael Salisbury, J. Sherman, Y. Sun, D. Weld, G. Winkenbach. Electronic "How Things Work" articles: Two early prototypes. IEEE Transactions on Knowledge and Data Engineering 5(4): 611-618, August 1993.
- On-line documents:
- Earlier version of article [Postscript file, 306 Kb]
Mesh Optimization
- Abstract:
- We present a method for solving the following problem: Given a set of data points scattered in three dimensions and an initial triangular mesh M0, produce a mesh M, of the same topological type as M0, that fits the data well and has a small number of vertices. Our approach is to minimize an energy function that explicitly models the competing desires of conciseness of representation and fidelity to the data. We show that mesh optimization can be effectively used in at least two applications: surface reconstruction from unorganized points, and mesh simplification (the reduction of the number of vertices in an initially dense mesh of triangles).
- Citation:
- Hugues Hoppe, Tony DeRose, Tom Duchamp, John McDonald, Werner Stuetzle. Mesh Optimization. In SIGGRAPH 93 Conference Proceedings. ACM, New York, 1993.
- On-line documents:
- Available as Technical Report:
- TR 93-01-01 [compressed PostScript file, 972 Kb]
Three-Dimensional Computer Graphics: A Coordinate Free Approach
- Preface:
- This manuscript is intended as a rigorous introduction to the field of computer graphics at a level appropriate for advanced undergraduates and beginning graduate students in computer science. My intent is not to present a completely comprehensive survey of the field. Rather, my goal is to provide a firm, modern account of those topics within the subfield of three-dimensional raster graphics that can be given adequate treatment in a ten week session. I have therefore, unfortunately, been forced to eliminate discussions of many interesting topics. The text by Foley, van Dam, Feiner, and Hughes should be considered a primary reference for topics not covered here.
- The manuscript is based on two courses (CSE 457 and 557) that I have taught over the past several years. The most distinguishing feature is the treatment of the geometric component of the material. Rather than using coordinate calculations, matrices, and matrix manipulations to accomplish geometric computations, a so-called coordinate-free approach is used. It is my feeling that a great deal of conceptual clarity and programming power is achieved by moving to the slightly higher level of abstraction provided by the coordinate-free framework.
- Citation:
- Tony D. DeRose, unpublished manuscript, 1993.
- On-line documents:
- Complete manuscript [compressed PostScript file, 1.3 Mb]
- Coordinate-free library for geometric programming:
A Continuous Adjoint Formulation for Radiance Transport
- Abstract:
- We describe a continuous adjoint formulation for radiance transport that allows a global illumination algorithm to focus on the directional interactions that contribute most to the visible scene. We show how the adjoint quantity for radiance can be described by an angular distribution that is only piecewise-continuous. This observation motivates the formulation of a related adjoint quantity, called exitant directional importance, whose angular distribution is continuous. We prove that exitant directional importance is equivalent to radiance in the sense that the two quantities satisfy the same transport equation and can be propagated through the environment in the same fashion.
- An adjoint formulation can dramatically reduce the time to compute radiosities when much of the scene is invisible. We present some preliminary results that demonstrate how the adjoint formulation for radiance can provide significant speed-ups even when all surfaces are visible.
- Citation:
- Per H. Christensen, David H. Salesin, Tony D. DeRose. Proceedings of the Fourth Eurographics Workshop on Rendering (Paris, France), 95-104, 1992.
- On-line documents:
- Article without figures [Acrobat file, 107 Kb]
Reconstructing Illumination Functions with Selected Discontinuities
- Abstract:
- Typical illumination functions contain boundaries that are discontinuous in intensity or derivative. These discontinuities arise from contact between surfaces, and from the penumbra and umbra boundaries of shadows cast by area light sources. In this paper, we present an algorithm that allows for smooth (C^1) reconstruction of intensity everywhere across a surface except along selected edges of intensity or derivative discontinuity. The reconstruction algorithm is based on a piecewise-cubic scattered data interpolation method originally proposed by Clough and Tocher. Our results show marked improvement over piecewise linear or C^1 quadratic reconstructions of some simple illumination functions.
- Citation:
- Dani Lischinski, Tony D. DeRose, David H. Salesin. Proceedings of the Third Eurographics Workshop on Rendering (Bristol, England), 99-112, 1992.
- On-line documents:
- Article without figures [Acrobat file, 155 Kb]
Surface Reconstruction from Unorganized Points
- Abstract:
- We describe and demonstrate an algorithm that takes as input an unorganized set of points {x_1, ..., x_n} on or near an unknown manifold M, and produces as output a simplicial surface that approximates M. Neither the topology, the presence of boundaries, nor the geometry of M are assumed to be known in advance - all are inferred automatically from the data. This problem naturally arises in a variety of practical situations such as range scanning an object from multiple view points, recovery of biological shapes from two-dimensional slices, and interactive surface sketching.
- Citation:
- Hugues Hoppe, Tony DeRose, Tom Duchamp, John McDonald, Werner Stuetzle. Surface Reconstruction from Unorganized Points. Computer Graphics Proceedings, Annual Conference Series, August 1992, pages 295-302. Available as Department of Computer Science and Engineering Technical Report TR 91-12-03, University of Washington, 1991.
- On-line documents:
- Directory containing TR version
Comments to grail-webmaster@cs.washington.edu 14 June 2001 sns