How to Build Cities from Tourist Photos

My research colleague Vinh Nguyen sent me this video of the University of Washington’s latest image-based reconstruction project (via a New Scientist article, “Entire cities recreated from Flickr photos”).

 

 

The video shows reconstructed mesh models of the city of Rome. The meshes were constructed using “Structure from Motion”, a “process of finding the three-dimensional structure by analyzing the motion of an object over time”. In this case, the motion is actually that of tourists moving around the city and capturing the same buildings from many different angles. The power of such an approach is that thousands of Creative Commons images of a particular city can be retrieved with a quick Flickr search.
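To make the idea of recovering 3D structure from several viewpoints concrete, here is a minimal sketch of one small piece of the problem: triangulating a single point from two viewing rays. It assumes the camera centres and ray directions are already known, which a real structure-from-motion system would also have to estimate from the photos. The function name and helpers are my own, not taken from any of the systems mentioned here.

```python
# Midpoint triangulation: find the 3D point closest to two viewing rays.
# Assumption: camera centres and ray directions are known in advance.
# A real structure-from-motion pipeline has to estimate these too.

def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def add(a, b):
    return tuple(x + y for x, y in zip(a, b))

def scale(a, s):
    return tuple(x * s for x in a)

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def triangulate_midpoint(c1, d1, c2, d2):
    """Return the midpoint of the shortest segment between two rays.

    c1, c2 are camera centres; d1, d2 are ray directions
    (they do not need to be unit length).
    """
    w = sub(c1, c2)
    a, b, c = dot(d1, d1), dot(d1, d2), dot(d2, d2)
    d, e = dot(d1, w), dot(d2, w)
    denom = a * c - b * b
    if abs(denom) < 1e-12:
        raise ValueError("rays are parallel; no unique closest point")
    s = (b * e - c * d) / denom   # parameter along the first ray
    t = (a * e - b * d) / denom   # parameter along the second ray
    p1 = add(c1, scale(d1, s))
    p2 = add(c2, scale(d2, t))
    return scale(add(p1, p2), 0.5)

# Two "tourists" standing 4 units apart, both looking at the same corner
# of a building located at (1, 2, 5).
print(triangulate_midpoint((0.0, 0.0, 0.0), (1.0, 2.0, 5.0),
                           (4.0, 0.0, 0.0), (-3.0, 2.0, 5.0)))
# (1.0, 2.0, 5.0)
```

With noisy real-world measurements the two rays rarely intersect exactly, which is why the midpoint of the closest approach is used rather than a true intersection.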

Sameer Agarwal’s team has built on existing reconstruction technology such as Microsoft’s Photosynth and PhotoTourism, part of the University of Washington’s Community Photo Collections project. The big improvement, however, is that they have extended it to run on many computers in parallel, allowing it to process thousands of photos quickly:

The data set consists of 150,000 images from Flickr.com associated with the tags “Rome” or “Roma”. Matching and reconstruction took a total of 21 hours on a cluster with 496 compute cores.
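For a rough sense of scale, the quoted figures can be turned into some back-of-the-envelope numbers. This is only a sketch: it assumes every core was busy for the full 21 hours, and the pair count is just the number of pairs an exhaustive comparison would face. The quote does not say how many pairs were actually compared.

```python
# Figures taken from the quoted project description.
IMAGES = 150_000
HOURS = 21
CORES = 496

def core_hours(hours, cores):
    """Upper bound on compute spent, assuming every core was busy throughout."""
    return hours * cores

def exhaustive_pairs(n):
    """Number of distinct pairs if every image were compared with every other."""
    return n * (n - 1) // 2

total = core_hours(HOURS, CORES)
print(total)                     # 10416 core-hours
print(total * 3600 / IMAGES)     # about 250 core-seconds per image
print(exhaustive_pairs(IMAGES))  # 11249925000 candidate pairs
```

On a single core, the same amount of work would take well over a year, which is why running in parallel matters so much here.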

 

Previous Systems

I have been experimenting with structure-from-motion for a while now. Since late 2008, Photosynth has allowed users to upload and process their own photosets, and there are tools available (binarymillenium exporter, Photosynth Point Cloud Exporter) for exporting the resulting point-cloud, producing results like this:

Unfortunately, I have not had much success with my own photosets using Photosynth; the algorithm has failed to match the images up correctly.

 

Video-based structure-from-motion

After disappointing results from image-based structure-from-motion, I experimented with video-based processing. The theoretical advantage of video-based processing is that the position of each consecutive frame is constrained, because the camera travels along a continuous path of motion. There are a handful of camera-tracking applications designed for video production, intended to allow the overlay of computer-generated content that moves correctly with the rest of the scene. Voodoo Camera Tracker is totally free, and allows calculated point-clouds to be exported. The following are the quite promising results from an initial test:
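One way to picture the advantage of ordered video frames is to count how many frame pairs need to be considered for matching. The sketch below is purely illustrative and says nothing about how Voodoo Camera Tracker works internally; the function name, frame count and window size are all hypothetical.

```python
from math import comb

def sequential_pairs(n_frames, window):
    """Pair each frame only with the next `window` frames in the video."""
    return [(i, j)
            for i in range(n_frames)
            for j in range(i + 1, min(i + window, n_frames - 1) + 1)]

n_frames, window = 100, 3   # hypothetical values for illustration

# Ordered video: neighbouring frames are known to overlap.
print(len(sequential_pairs(n_frames, window)))  # 294

# Unordered photo collection: any image might overlap with any other.
print(comb(n_frames, 2))                        # 4950
```

The ordered case grows roughly linearly with the number of frames, while the unordered case grows quadratically.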

 

PhotoTourism code

Excitingly, the University of Washington has released the updated code used for PhotoTourism at the Bundler website. According to the site, Bundler is

a structure-from-motion system for unordered image collections (for instance, images from the Internet) written in C and C++. An earlier version of this system was used in the Photo Tourism project.

It will be interesting to see how robust the calculated camera-positions/point-clouds will be from this system. I will post results as I get them.

If you’ve had any experience with any of these systems, I’d love to hear from you (via the comments) as to the outcomes!