My research colleague Vinh Nguyen sent me this video of the University of Washington’s latest image-based reconstruction project (via a New Scientist article, “Entire cities recreated from Flickr photos”).
The video shows reconstructed mesh models of the city of Rome. The meshes were constructed using “Structure from Motion”, a “process of finding the three-dimensional structure by analyzing the motion of an object over time”. In this case, the motion is actually the motion of tourists around the city, capturing the same buildings from many different angles. The power of such an approach is that thousands of Creative Commons images of a particular city can be retrieved with a quick Flickr search.
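To make that concrete: the core of the pipeline detects local features in each photo, matches them between photos, estimates the relative camera pose from the matches, and triangulates the matched points into 3D. Here is a minimal two-view sketch using OpenCV; the filenames and the guessed focal length are placeholders, and a full system like this one repeats the process over thousands of views and refines everything with bundle adjustment:

```python
# A minimal two-view structure-from-motion sketch using OpenCV
# (requires SIFT, built in from OpenCV 4.4). The filenames and the
# guessed focal length are placeholders.
import cv2
import numpy as np

img1 = cv2.imread("photo1.jpg", cv2.IMREAD_GRAYSCALE)
img2 = cv2.imread("photo2.jpg", cv2.IMREAD_GRAYSCALE)

# 1. Detect and describe local features in each image.
sift = cv2.SIFT_create()
kp1, des1 = sift.detectAndCompute(img1, None)
kp2, des2 = sift.detectAndCompute(img2, None)

# 2. Match features between the two views.
matcher = cv2.BFMatcher(cv2.NORM_L2, crossCheck=True)
matches = matcher.match(des1, des2)
pts1 = np.float32([kp1[m.queryIdx].pt for m in matches])
pts2 = np.float32([kp2[m.trainIdx].pt for m in matches])

# 3. Estimate the relative camera pose (RANSAC rejects bad matches).
K = np.array([[800.0, 0.0, img1.shape[1] / 2],
              [0.0, 800.0, img1.shape[0] / 2],
              [0.0, 0.0, 1.0]])  # guessed intrinsics
E, mask = cv2.findEssentialMat(pts1, pts2, K, method=cv2.RANSAC)
_, R, t, mask = cv2.recoverPose(E, pts1, pts2, K, mask=mask)

# 4. Triangulate the matched points into a 3D point cloud.
P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = K @ np.hstack([R, t])
pts4d = cv2.triangulatePoints(P1, P2, pts1.T, pts2.T)
cloud = (pts4d[:3] / pts4d[3]).T  # N x 3 world points
print("triangulated %d points" % len(cloud))
```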
Sameer Agarwal’s team have built on existing reconstruction technology such as Microsoft’s Photosynth and Photo Tourism, part of the University of Washington’s Community Photo Collections project. The big improvement, however, is that they have extended it to run on many computers in parallel, allowing it to quickly process thousands of photos:
The data set consists of 150,000 images from Flickr.com associated with the tags “Rome” or “Roma”. Matching and reconstruction took a total of 21 hours on a cluster with 496 compute cores.
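For a rough idea of how a photoset like that could be gathered, here is a sketch against Flickr’s REST API. The API key is a placeholder, and the license codes (which are meant to select Creative Commons images) should be checked against flickr.photos.licenses.getInfo:

```python
# Rough sketch of gathering a city photoset via the Flickr API.
# YOUR_API_KEY is a placeholder.
import requests

params = {
    "method": "flickr.photos.search",
    "api_key": "YOUR_API_KEY",
    "tags": "Rome,Roma",          # matches either tag
    "license": "1,2,3,4,5,6",     # Creative Commons license codes
    "per_page": "500",
    "format": "json",
    "nojsoncallback": "1",
}
resp = requests.get("https://api.flickr.com/services/rest/", params=params)
photos = resp.json()["photos"]["photo"]

# Build a source URL for each result, using Flickr's documented URL scheme.
for p in photos:
    url = "https://farm{farm}.staticflickr.com/{server}/{id}_{secret}.jpg".format(**p)
    print(url)
```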
Previous Systems
I have been experimenting with structure-from-motion for a while now. Since late 2008, Photosynth has allowed users to upload and process their own photosets, and there are tools available (binarymillenium exporter, Photosynth Point Cloud Exporter) for exporting the resulting point-cloud, producing results like this:
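For reference, exporters like these typically write a standard point-cloud format such as PLY, which tools like MeshLab can open directly. A minimal sketch of what such an export amounts to (dummy points, not the exporters’ actual code):

```python
# Write an ASCII PLY point cloud with per-point colour, the kind of
# file these point-cloud exporters produce. The points are dummy data.
def write_ply(path, points):
    """points: iterable of (x, y, z, r, g, b) tuples."""
    points = list(points)
    with open(path, "w") as f:
        f.write("ply\nformat ascii 1.0\n")
        f.write("element vertex %d\n" % len(points))
        f.write("property float x\nproperty float y\nproperty float z\n")
        f.write("property uchar red\nproperty uchar green\nproperty uchar blue\n")
        f.write("end_header\n")
        for x, y, z, r, g, b in points:
            f.write("%f %f %f %d %d %d\n" % (x, y, z, r, g, b))

write_ply("cloud.ply", [(0.0, 0.0, 0.0, 255, 0, 0),
                        (1.0, 0.5, 2.0, 0, 255, 0)])
```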
Unfortunately, I have not had very successful results with my own photosets using Photosynth; the algorithm has failed to match the images up correctly.
Video-based structure-from-motion
After disappointing results from image-based structure-from-motion, I experimented with video-based processing. The theoretical advantage of video is that there are constraints on the positions of consecutive frames as the camera travels along its path of motion. There are a handful of camera-tracking applications designed for video production, intended to allow the overlay of computer-generated content that moves correctly with the rest of the scene. Voodoo Camera Tracker is completely free and allows exporting of the calculated point clouds. The quite promising results from an initial test are shown below, after the sketch.
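To give a flavour of the constraint that video provides, here is a minimal sketch of frame-to-frame feature tracking using OpenCV’s KLT tracker. The filename is a placeholder, and this illustrates the general technique rather than Voodoo’s internals:

```python
# Features move only slightly between consecutive frames, so KLT
# optical flow can follow them directly -- the constraint that makes
# video easier to track than unordered photos. "input.avi" is a
# placeholder filename.
import cv2

cap = cv2.VideoCapture("input.avi")
ok, prev = cap.read()
prev_gray = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)
pts = cv2.goodFeaturesToTrack(prev_gray, maxCorners=500,
                              qualityLevel=0.01, minDistance=7)

tracks = []  # per-frame arrays of surviving feature positions
while True:
    ok, frame = cap.read()
    if not ok or len(pts) == 0:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    # Follow each feature from the previous frame into this one.
    new_pts, status, err = cv2.calcOpticalFlowPyrLK(prev_gray, gray, pts, None)
    pts = new_pts[status.flatten() == 1].reshape(-1, 1, 2)
    tracks.append(pts.copy())
    prev_gray = gray

cap.release()
print("tracked %d features into the final frame" % len(pts))
```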
Photo Tourism code
Excitingly, the University of Washington has released an updated version of the code used for Photo Tourism at the Bundler website. According to the site, Bundler is
a structure-from-motion system for unordered image collections (for instance, images from the Internet) written in C and C++. An earlier version of this system was used in the Photo Tourism project.
It will be interesting to see how robust the camera positions and point clouds calculated by this system turn out to be. I will post results as I get them.
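Bundler writes its reconstruction to a plain-text bundle.out file, whose layout is described on the Bundler site: a comment line, a “&lt;num_cameras&gt; &lt;num_points&gt;” line, then per-camera focal length and radial distortion, a 3×3 rotation, and a translation. Here is a sketch of a parser for the camera block, based on my reading of that description; the output path is a guess at the typical location produced by the bundled scripts:

```python
# Parse the camera block of Bundler's bundle.out output, per the
# format description on the Bundler site: each camera is
# <f> <k1> <k2>, a 3x3 rotation R, and a translation t.
import numpy as np

def read_bundle_cameras(path):
    with open(path) as f:
        tokens = [t for line in f if not line.startswith("#")
                    for t in line.split()]
    vals = iter(map(float, tokens))
    num_cameras = int(next(vals))
    num_points = int(next(vals))
    cameras = []
    for _ in range(num_cameras):
        f_len, k1, k2 = next(vals), next(vals), next(vals)
        R = np.array([next(vals) for _ in range(9)]).reshape(3, 3)
        t = np.array([next(vals) for _ in range(3)])
        # Camera centre in world coordinates: c = -R^T t
        centre = -R.T @ t
        cameras.append({"f": f_len, "k1": k1, "k2": k2,
                        "R": R, "t": t, "centre": centre})
    return cameras, num_points

cameras, n_points = read_bundle_cameras("bundle/bundle.out")
print(len(cameras), "cameras,", n_points, "points")
```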
If you’ve had any experience with any of these systems, I’d love to hear from you (via the comments) about the outcomes!
talkie_tim
Jan 21, 2010 -
That’s a pretty cool concept. I’d love to see some landmarks local to me being modelled using this method. The most popular subjects in Bristol are the cranes in the docks, the Clifton Suspension Bridge, and the crystal ball in Millennium Square.
Josh
Jan 21, 2010 -
The Bundler code is there for the taking (Windows binary or source code available). Maybe this could be your homework for the week? It combines well with your EOS 450D shutterbugging 🙂
Phil Wilson
Jan 22, 2010 -
That Voodoo tracker example is fascinating.
At work we’ve been thinking about whether there’s anything we can do in this area to help prospective students get an idea of what the university is like, both before they see it and whilst they’re visiting. There is some low-hanging fruit like Layar, as well as better mapping and routing using a combination of OpenStreetMap’s path detail and Google Maps’ aerial photography, but I hadn’t thought about using Photosynth and certainly not anything more advanced.
Josh
Jan 23, 2010 -
I’m not convinced that Photosynth itself has come far enough to actually be helpful. The feature-point matching isn’t very good, and I personally get very little out of their image-navigation interface.
Some of this current research, though, is really on the cusp of becoming something genuinely useful. Being able to tap into a world of freely contributed images, which are often timestamped, lets you reconstruct places without having to assume things are static, and (I would argue) without imposing any preconceptions or structure on your mapping.
Another project from the University of Washington (which I want to post about) tries to reconstruct architectural structure from images.
Of course, it would be fun – and probably not too difficult – to look at all these city-mapping systems, find where there are holes in their coverage (say, a particular nook of Notre Dame or the Colosseum), and digitally insert yourself as a statue into some pictures. Then seed a few Flickr accounts with them. The next iteration of digital reconstruction projects will pick it up. Want to help?