RESULTS - Stereo Reconstruction
The stereo tracker algorithm builds accurate 3D maps from the images of two cameras mounted as a fixed stereo rig. The system constructs the map during the survey (on-line process) using feature-based registration techniques (namely SURF and SIFT) extended to a stereo framework. A final off-line bundle adjustment then refines the structure and the camera motion by minimizing the reprojection error of the 3D estimates in the cameras.

Before executing any survey, the stereo system must be calibrated to obtain the intrinsic parameters of each camera and the extrinsic parameters relating the two cameras. The calibration also estimates the non-linear image distortion parameters (radial and tangential), which enables us to remove these distortions afterwards. Furthermore, after calibration, each stereo image pair obtained by the rig can be rectified. The rectification process transforms the image pairs into new pairs that can be considered to come from a fronto-parallel stereo system. In such a system, corresponding points always lie on the same scan-line in both images. This constraint reduces the correspondence search from 2D to 1D.
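
As an illustration of the radial and tangential terms mentioned above, the following minimal sketch assumes the widely used polynomial model with two radial coefficients (k1, k2) and two tangential coefficients (p1, p2), applied in normalized image coordinates. The source does not state which model or how many coefficients the calibration uses, so the function names and parameters here are hypothetical.

```python
def distort(x, y, k1, k2, p1, p2):
    """Apply radial (k1, k2) and tangential (p1, p2) distortion to a point
    given in normalized image coordinates (assumed model, see lead-in)."""
    r2 = x * x + y * y
    radial = 1.0 + k1 * r2 + k2 * r2 * r2
    xd = x * radial + 2.0 * p1 * x * y + p2 * (r2 + 2.0 * x * x)
    yd = y * radial + p1 * (r2 + 2.0 * y * y) + 2.0 * p2 * x * y
    return xd, yd


def undistort(xd, yd, k1, k2, p1, p2, iterations=20):
    """Invert the model numerically by fixed-point iteration, starting
    from the distorted point as the initial guess."""
    x, y = xd, yd
    for _ in range(iterations):
        r2 = x * x + y * y
        radial = 1.0 + k1 * r2 + k2 * r2 * r2
        dx = 2.0 * p1 * x * y + p2 * (r2 + 2.0 * x * x)
        dy = p1 * (r2 + 2.0 * y * y) + 2.0 * p2 * x * y
        x = (xd - dx) / radial
        y = (yd - dy) / radial
    return x, y
```

The iteration converges quickly for moderate distortion; strongly distorted lenses may need more iterations or a different solver.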

Figure 1 shows, on the left, two image pairs taken at time t and time t+1 and, on the right, the result of correcting the non-linear distortions and rectifying them.
[Image: source stereo pairs (left) and rectified pairs (right); top row at time t+1, bottom row at time t]
Fig 1. The left column shows the images acquired at time t and t+1; the right column shows the same images after the non-linear distortion correction and the stereo rectification.

To achieve the reconstruction, the stereo tracker algorithm executes the following actions at each step:

  • Feature Detection
  • Feature Matching
  • Triangulation
  • 3D Registration and ego-motion estimation

Figure 2 depicts the features detected with SURF in the quadruplet of rectified images gathered by the stereo rig at time t and time t+1. The best features are selected so that they are evenly spread across each image by using a non-maximal suppression algorithm.
Fig 2. Features detected using the SURF algorithm.
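
The selection of well-spread features can be sketched as a greedy, radius-based non-maximal suppression. This is only one possible variant, assumed here for illustration; the source does not specify which suppression scheme is used. Keypoints are hypothetical (x, y, response) tuples.

```python
def suppress_non_maxima(keypoints, radius, max_features=None):
    """Greedy non-maximal suppression.

    Visits keypoints from strongest to weakest response and keeps one
    only if no already-kept keypoint lies closer than `radius` pixels.
    """
    kept = []
    r2 = radius * radius
    ordered = sorted(keypoints, key=lambda k: k[2], reverse=True)
    for x, y, response in ordered:
        far_enough = all(
            (x - kx) ** 2 + (y - ky) ** 2 >= r2 for kx, ky, _ in kept
        )
        if far_enough:
            kept.append((x, y, response))
            if max_features is not None and len(kept) == max_features:
                break
    return kept
```

A larger radius yields fewer, more evenly distributed features; a smaller one keeps more features but allows them to cluster.
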
Once the features are detected, SURF descriptors around them are computed. The feature descriptors are matched across four image pairs: a) left-right at time t; b) left-right at time t+1; c) left-left between time t and t+1; and d) right-right between time t and t+1.

Figure 3 shows, on the left, all the matches found and, on the right, the matches that remain after applying the epipolar, disparity and quadruplet constraints. The epipolar constraint only allows a feature at (x, y) in the left image to be matched with a correspondence at (x+Δx, y) in the right image. The disparity constraint bounds Δx to a narrow range around x. Finally, the quadruplet constraint discards the quadruplet matches that do not form a closed loop: if we select a matched feature in the left image at time t and follow the matching links as edges in a graph, after four steps we must end up at the initial feature. In other words, the shape obtained by linking a set of four matched features must be a closed parallelogram.
Fig 3. The left image shows all the correspondences obtained after the matching process. The right image shows the matches that remain after filtering with the epipolar, disparity and quadruplet constraints.
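
The three filters can be sketched as follows. This is a minimal illustration under stated assumptions: matches are stored as hypothetical index-to-index dictionaries, disparity is taken as x_left - x_right (so Δx = -disparity), and the tolerance and range values are placeholders rather than the values used by the system.

```python
def filter_stereo_matches(left_pts, right_pts, matches,
                          max_row_diff=1.0, min_disp=0.0, max_disp=64.0):
    """Keep left->right matches that lie on the same scan-line (epipolar
    constraint) and whose disparity falls inside the allowed range."""
    kept = {}
    for i, j in matches.items():
        xl, yl = left_pts[i]
        xr, yr = right_pts[j]
        disparity = xl - xr
        same_row = abs(yl - yr) <= max_row_diff
        if same_row and min_disp <= disparity <= max_disp:
            kept[i] = j
    return kept


def closed_quadruplets(lr_t, rr, lr_t1, ll):
    """Return the quadruplets whose matching links form a closed loop.

    lr_t:  left(t)   -> right(t)
    rr:    right(t)  -> right(t+1)
    lr_t1: left(t+1) -> right(t+1)
    ll:    left(t)   -> left(t+1)
    """
    right_to_left_t1 = {r: l for l, r in lr_t1.items()}
    quads = []
    for l_t, r_t in lr_t.items():
        r_t1 = rr.get(r_t)
        l_t1 = right_to_left_t1.get(r_t1)
        # The fourth link must lead back to the starting feature.
        if l_t1 is not None and ll.get(l_t) == l_t1:
            quads.append((l_t, r_t, r_t1, l_t1))
    return quads
```

Only features that survive both functions would be passed on to triangulation.
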
Using the pairwise matches at time t, we triangulate the 3D position of every matched pair of points. The same process is applied to the matched features at time t+1. This step yields two sets of 3D points together with the correspondences between them, since the matches between time t and t+1 have already been computed. With this information, a robust 3D registration, which runs RANSAC over the absolute orientation algorithm, recovers the stereo rig motion (rotation and translation) between time t and time t+1.
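
The triangulation and registration steps can be sketched with NumPy as below. This is an illustrative sketch, not the implementation used by the system: it assumes an ideal rectified rig (shared focal length f, principal point (cx, cy), known baseline), uses the SVD-based closed-form solution for absolute orientation, and picks a hypothetical inlier threshold and iteration count.

```python
import numpy as np


def triangulate_rectified(xl, yl, xr, f, cx, cy, baseline):
    """3D point in the left-camera frame from a rectified match."""
    z = f * baseline / (xl - xr)  # depth from disparity
    return np.array([(xl - cx) * z / f, (yl - cy) * z / f, z])


def absolute_orientation(P, Q):
    """Least-squares R, t such that Q ~ P @ R.T + t (SVD closed form)."""
    cp, cq = P.mean(axis=0), Q.mean(axis=0)
    H = (P - cp).T @ (Q - cq)
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so that R is a proper rotation.
    d = -1.0 if np.linalg.det(Vt.T @ U.T) < 0 else 1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    return R, cq - R @ cp


def ransac_motion(P, Q, threshold=0.05, iterations=200, seed=0):
    """Robustly estimate R, t mapping points at time t (P) onto the same
    points at time t+1 (Q), both expressed in the rig frame."""
    rng = np.random.default_rng(seed)
    best = np.zeros(len(P), dtype=bool)
    for _ in range(iterations):
        sample = rng.choice(len(P), size=3, replace=False)
        R, t = absolute_orientation(P[sample], Q[sample])
        residuals = np.linalg.norm(Q - (P @ R.T + t), axis=1)
        inliers = residuals < threshold
        if inliers.sum() > best.sum():
            best = inliers
    if best.sum() < 3:
        raise ValueError("not enough inliers to estimate the motion")
    # Refit on all inliers of the best hypothesis.
    R, t = absolute_orientation(P[best], Q[best])
    return R, t, best
```

The returned R and t describe how the points move relative to the rig; the motion of the rig itself is the inverse transform.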

The incrementally estimated trajectory drifts over time: the more measurements are incorporated, the more error accumulates. We reduce this drift with an off-line optimization (bundle adjustment) step. To feed more information into the optimization, the 2D features are tracked across consecutive frames, which provides two or more 2D projections for each 3D point. We use an optimized C/C++ sparse bundle adjustment implementation, extended to handle stereo data, that optimizes structure-from-motion data and can account for multiple 2D projections of a single 3D point across frames.

Bundle adjustment computes the maximum likelihood estimate (MLE) for structure-from-motion problems by minimizing the reprojection error; however, its cost is cubic in the number of parameters. A sparse bundle adjustment algorithm exploits the block-sparse structure of the Jacobian matrix to save time and memory. Moreover, we use sparse bundle adjustment to optimize the error in 3D instead of 2D, which saves a little more time and memory because the same problem formulated in 3D is smaller than in 2D.
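
To make the reprojection error concrete, the sketch below evaluates the 2D stereo cost for given poses and 3D points. It assumes an ideal rectified rig and a hypothetical data layout (poses as (R, t) pairs mapping world to left-camera coordinates, observations as tuples of frame index, point index and the two image measurements). It shows only the cost being minimized, not the sparse solver, and not the 3D error variant described above.

```python
def to_camera(R, t, X):
    """Transform world point X into the left-camera frame: R * X + t."""
    return tuple(
        sum(R[r][c] * X[c] for c in range(3)) + t[r] for r in range(3)
    )


def project_stereo(Xc, f, cx, cy, baseline):
    """Project a left-camera-frame point into both rectified images."""
    x, y, z = Xc
    v = f * y / z + cy
    left = (f * x / z + cx, v)
    right = (f * (x - baseline) / z + cx, v)
    return left, right


def reprojection_error(poses, points, observations, f, cx, cy, baseline):
    """Sum of squared pixel residuals over all stereo observations.

    Each observation is (frame_index, point_index, (ul, vl), (ur, vr)).
    """
    total = 0.0
    for frame, idx, (ul, vl), (ur, vr) in observations:
        R, t = poses[frame]
        Xc = to_camera(R, t, points[idx])
        (pul, pvl), (pur, pvr) = project_stereo(Xc, f, cx, cy, baseline)
        total += (ul - pul) ** 2 + (vl - pvl) ** 2
        total += (ur - pur) ** 2 + (vr - pvr) ** 2
    return total
```

Bundle adjustment would adjust the entries of `poses` and `points` to drive this sum down; each residual depends on only one pose and one point, which is the source of the block-sparse Jacobian.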

Figure 4 illustrates the reconstruction process run over an outdoor dataset.

Fig 4. Video sequence. The top row shows the stereo input sequence at 25 FPS. The bottom-left panel shows consecutive pairs of stereo shots at a subsampled rate (1 out of every 20 stereo pairs) with the stereo feature pairs tracked over time. The bottom-right panel illustrates the evolution of the reconstruction. The blue ellipsoid represents the camera position uncertainty (99% confidence) and the green ellipsoid the uncertainty of the scene reconstructed at the last shot (99% confidence). The uncertainty in the reconstruction makes it possible to detect the loop closure point and relate the features of the first and last stereo frames. A global optimization (stereo sparse bundle adjustment) is then performed. The final camera trajectory and the images are fed to a dense reconstruction algorithm.