Multicore Bundle Adjustment


The emergence of multicore computers represents a fundamental shift, with major implications for the design of computer vision algorithms. Most computers sold today have a multicore CPU with 2-16 cores and a GPU with anywhere from 4 to 128 cores. Exploiting this hardware parallelism will be key to the success and scalability of computer vision algorithms in the future. In this project, we consider the design and implementation of new inexact Newton-type Bundle Adjustment algorithms that exploit hardware parallelism for efficiently solving large-scale 3D scene reconstruction problems. We explore the use of multicore CPUs as well as multicore GPUs for this purpose. We show that overcoming the severe memory and bandwidth limitations of current-generation GPUs not only leads to more space-efficient algorithms, but also to surprising savings in runtime. Our CPU-based system is up to ten times faster, and our GPU-based system up to thirty times faster, than the current state-of-the-art methods, while maintaining comparable convergence behavior.



Changchang Wu, Sameer Agarwal, Brian Curless, and Steven M. Seitz,
"Multicore Bundle Adjustment", CVPR 2011 (poster, supplemental material)


Software (written and maintained by Changchang Wu)

pba (v1.0.5) (3MB, including code and win32 binary)
manual.pdf (8/01/2011, a short usage manual)
VisualSFM, an integrated SFM software you may be interested in.

CMakeList contributed by Pierre Moulon
Code snippet for Multi-GPU contributed by Pravin Bhat

The software is distributed under the GNU General Public License V3.
For commercial licensing of the software, please contact Changchang Wu.


Recent changes (complete changelist, previous versions)

  Fixed a rarely occurring bug in the CUDA kernel configuration
  Added support for constraints on camera clusters with equal focal lengths
  Added first version of an Intel AVX-based implementation for CPU

  Added support for a per-camera constant flag (CameraT::SetConstantCamera)
  Added automatic switching from CUDA to CPU if CUDA is not supported
  Added CPU thread count adjustment according to the number of cores
  Added motion-only and structure-only bundle adjustment modes
  Added first draft of the manual


Performance tuning for CPU implementation

The thread number settings are coded in function:
You can tune the thread count to get better performance. If you run "driver filename -profile --float", the program will print out timings for the main functions, which you can use to guide the tuning.
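As a starting point for tuning, a thread count can be derived from the number of hardware threads and then adjusted by hand against the profiling output. The following is a minimal sketch, not PBA code: ClampThreadCount and DefaultThreadCount are hypothetical helper names, and the cap of 16 is an assumed placeholder rather than a value taken from the software.

```cpp
#include <algorithm>
#include <cassert>
#include <thread>

// Hypothetical helper (not part of PBA): clamp a reported hardware thread
// count to the range [1, max_threads].
inline int ClampThreadCount(unsigned hardware_threads, int max_threads) {
    assert(max_threads >= 1);
    // std::thread::hardware_concurrency() may return 0 when the value
    // cannot be determined; fall back to a single thread in that case.
    if (hardware_threads == 0) return 1;
    const unsigned cap = static_cast<unsigned>(max_threads);
    return static_cast<int>(std::min(hardware_threads, cap));
}

// Hypothetical helper: suggest a thread count for the current machine.
// The default cap of 16 is an assumption chosen for illustration only.
inline int DefaultThreadCount(int max_threads = 16) {
    return ClampThreadCount(std::thread::hardware_concurrency(), max_threads);
}
```

A value obtained this way is only an initial guess; the timings printed with -profile remain the better guide for the final setting.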



The data used in the paper are from the Bundle Adjustment in the Large project. The only exception is the slightly larger Venice Final model.

Some additional test data from UNC Landmark image reconstruction projects can be found here.



A short manual is available here.

You should use more Levenberg-Marquardt (LM) iterations for the first few cameras if the two-view initialization is bad (decomposed from the fundamental matrix rather than the essential matrix). An alternative is to switch from regular BA to PBA only after a few cameras (e.g. 5). In particular, for Bundler, you can switch from run_sfm to run_sfm_pba after 5 cameras.
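The two strategies above can be sketched as a small decision helper in an incremental reconstruction loop. This is an illustrative sketch only: BABackend, ChooseBackend, and LmIterations are hypothetical names that do not appear in PBA or Bundler, the threshold of 5 follows the example in the text, and "after 5 cameras" is assumed here to mean that the switch happens once more than 5 cameras are registered.

```cpp
#include <cassert>

// Hypothetical labels for the two solvers mentioned in the text.
enum class BABackend { Regular, PBA };

// Use regular BA while the model is small, then switch to PBA.
// Assumption: the switch happens once more than `switch_after`
// cameras have been registered.
inline BABackend ChooseBackend(int num_cameras, int switch_after = 5) {
    assert(num_cameras >= 0 && switch_after >= 0);
    return num_cameras > switch_after ? BABackend::PBA : BABackend::Regular;
}

// Alternative strategy: keep one solver but spend more LM iterations
// while only the first few cameras are registered. The iteration
// counts are supplied by the caller; no recommended values are implied.
inline int LmIterations(int num_cameras, int few_cameras,
                        int early_iterations, int normal_iterations) {
    assert(early_iterations >= normal_iterations);
    return num_cameras <= few_cameras ? early_iterations : normal_iterations;
}
```

For example, with the default threshold, ChooseBackend(5) selects regular BA and ChooseBackend(6) selects PBA.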