Depth Image Alignment: An Implementation

Stefan Sechelmann

Date: February 10, 2007


This documentation accompanies an implementation created for the "Scanning Real World Objects Without Worries" project of Prof. Marc Alexa during winter term $06/07$ at the Technical University of Berlin. It is also available in PDF format at Documentation.pdf. The Doxygen documentation of the project is available at refman.pdf. The talks given for this project are available here:

The goal of the project is to build a 3D scanning device. The challenge has been to construct a scanner that is usable in an outdoor environment; it therefore has to be operated in an easy and time-efficient manner.

The idea is to use a PMD depth image camera and two commercial cameras, which are mounted on a tripod. To produce range data from the stereo pair, the depth information of the PMD camera is exploited. Once depth images of different views of the object have been taken, alignment, registration, and meshing have to be performed.

Figure 1: Artificial depth images used for testing
(Five renderings of the statue at $60^{\circ}$, $30^{\circ}$, $0^{\circ}$, $330^{\circ}$, and $300^{\circ}$.)

This documentation describes the alignment part of the process. At the time of writing, the acquisition of the depth images is still in development, so artificial data has been produced for testing (see fig. 1). The test data has a depth resolution of $8$ bits and has been created with the software renderer Blender. The $Z$-buffer is scaled and clamped to fit the size of the model.

The idea for the alignment is to generate feature points from the range data and then match those points to construct a rigid transformation between each pair of overlapping images in turn. This procedure is described in section 2.3.

The range images are treated as point clouds after removal of the background area.


The implementation is a C++ program. The source tree is divided into $8$ sub-folders. It includes a front-end, the algorithms, and other features such as exporting and OpenSceneGraph support for the Qt widget set. The outline of the source folder is listed in table 1.

Table 1: The source folders
controller: ...ons, includes transformation calculation


Figure 2: Saliency Features
(Saliency features mapped onto the gray-scale images: $1500$ and $3000$ features, each for the $0^{\circ}$ and $330^{\circ}$ views.)

The feature points are created by various methods, which are described in the documentation of Jonas Hurrelmann. Features generated with the saliency method are shown in figure 2. Two different quantities of features are shown: $1500$ and $3000$ points are mapped onto the gray-scale images of the statue. In practice we used $1500$ features, which seems to be enough to cover all interesting points.

Figure 3: The Feature Class

The abstract class Feature is defined in the file features/Feature.h (see fig. 3). A feature $f$ has a center $p\in\Bbb R^{3}$ and some descriptor information, which is different for each feature type. The feature space $F$ forms a metric space, implemented by the method getDistance2(Feature*). For features $f_{i},f_{j},f_{k}\in F$ it holds that

$d(f_{i},f_{j}) \geq 0$
$d(f_{i},f_{j}) = 0 \Leftrightarrow i=j$
$d(f_{i},f_{j}) = d(f_{j},f_{i})$
$d(f_{i},f_{j}) + d(f_{j},f_{k}) \geq d(f_{i},f_{k}).$

The features also form a vector space, but only coordinate access (operator[]) and the dimension (dimension()) are exposed. The folder features contains two implementations. The TrivialFeature does not contain any data and is used for manual feature selection. The SaliencyFeature has an extra double value holding the saliency at the center of the feature; its metric is the standard metric on $\Bbb R$. Other implementations using a saliency kernel or a normal variance kernel are defined in the feature documentation. In the remainder of this document features are denoted by a lowercase $f$ and the corresponding center by $p$. Feature sets are denoted by a capital $F$. Sets are assumed to have a natural ordering, such that $f_{i}$ is the $i$'th feature of $F$ and $\# F$ is the cardinality of $F$.

Managing Depth Images And Features

Table 2: The Fields Of FeatureQImage
name: the file name of the loaded dept...
...lay: an overlay image for the front-end

Depth images are managed by the class FeatureQImage, defined in features/FeatureQImage.h. This class extends the QImage class from the Qt library. Important fields are listed in table 2. A FeatureQImage can only manage one feature set at a time. There are methods for the addition, removal, and creation of features. At this time it is possible to create saliency and saliency kernel features.

The program stores the set of loaded images in the Controller. The Controller is defined in the file controller/Controller.h. It manages global application data and some inter module communication.

The Alignment Algorithm

Algorithm 1: the modified RANSAC alignment; returns $T_{r},F_{r},\hat{F}_{r}$.
Algorithm 2: FINDOPTIMALMATCHING; returns $F_{r}$.
Algorithm 3: NEARESTQUERY; returns $\hat{F}$.
The alignment algorithm is a modified RANSAC (see algorithm 1). It is defined in matching/Matching.h and calculates the matching transformation for the left feature set. The right set is assumed to already carry an arbitrary transformation $\hat{T}$. The important parameters of RANSAC are:

$F$: the left feature set
$\hat{T}$: the matching transformation of the right feature set
$\hat{F}$: the right feature set
$k$: the number of randomly chosen features
$\Sigma$: the minimal distance between the chosen features
$r_{p}$: the search radius (e.g. the disparity maximum of the images)
$s$: the number of samples in FINDOPTIMALMATCHING at lines $11$-$25$
$\varepsilon$: the error threshold; $E_{g}<\varepsilon$ $\Rightarrow$ optimal matching found
$n$: an iteration limit
The algorithm determines $k$ randomly chosen features $f_{1},\dots,f_{k}$ such that

$\| p_{i}-p_{j}\| \geq\Sigma \quad \text{for all } i\neq j,\; i,j=1,\dots,k.$

For those features it finds matching features $\hat{f}_{1},\dots,\hat{f}_{k}$ from image $2$ with the following properties:

$\| p_{i}-\hat{p}_{i}\| \leq r_{p}$
$\| p_{i}-p_{j}\| \approx \|\hat{p}_{i}-\hat{p}_{j}\|$
$d(\hat{f}_{i},f_{i}) = \min.$

The result of the alignment is of type FeatureMatching, which is defined in matching/FeatureMatching.h. It is a wrapper for the result of RANSAC. Each FeatureMatching has a name, the two matched feature sets, and pointers to the corresponding images. The features are copied, so new memory is consumed for each matching. In the program the FeatureMatchings are managed by the Controller.

The global matching error is the mean distance of the nearest features from the transformed images. For feature centers $p_{1},\dots,p_{n}$ from the left image and $\hat{p}_{1},\dots,\hat{p}_{m}$ from the right image it is

$E_{g} = \frac{1}{\min(n,m)}\sum_{i=1}^{\min(n,m)}\| Tp_{i}-nq(Tp_{i})\|$

where $nq(Tp_{i})\in\{\hat{T}\hat{p}_{1},\dots,\hat{T}\hat{p}_{m}\}$ is chosen such that $\| Tp_{i}-nq(Tp_{i})\|$ is minimal. $T$ and $\hat{T}$ are the matching transformations of the left and right image respectively.

To speed up the NEARESTQUERY operation we use a kd-tree implementation by Scott Kircher, which can be found in KDTree.h.

The Matching Transformation

For feature points $p_{1},\dots,p_{n}$ and $\hat{p}_{1},\dots,\hat{p}_{n}$ we calculate the matching transformation $T$. We implemented two methods for calculating the transformation: a direct method, which takes only $3$ matched points, and a least-squares approach, which uses principal component analysis.

Direct calculation constructs a transformation $T$ with the following properties

$p_{1} \mapsto \hat{p}_{1}$
$p_{2} \mapsto \hat{p}_{2}'\in\overline{\hat{p}_{1}\hat{p}_{2}}$ (1)
$p_{3} \mapsto \hat{p}_{3}'\in\overline{\hat{p}_{1}\hat{p}_{2}\hat{p}_{3}}$

where $\overline{\hat{p}_{1}\hat{p}_{2}}$ is the line through $\hat{p}_{1},\hat{p}_{2}$ and $\overline{\hat{p}_{1}\hat{p}_{2}\hat{p}_{3}}$ is the plane through $\hat{p}_{1},\hat{p}_{2}$ and $\hat{p}_{3}$. The matching transformation $T$ can be decomposed as

$\displaystyle T=R_{2}\cdot R_{1}\cdot T_{0}$

where $R_{2},R_{1}$ and $T_{0}$ correspond to the mappings (1):

$T_{0}$: the translation which takes $p_{1}$ to $\hat{p}_{1}$
$R_{1}$: the rotation in the plane of $\hat{p}_{1},\hat{p}_{2}$ and $T_{0}p_{2}$ that fixes $\hat{p}_{1}$ and takes $T_{0}p_{2}$ to the line $\overline{\hat{p}_{1}\hat{p}_{2}}$
$R_{2}$: the rotation that fixes the line $\overline{\hat{p}_{1}\hat{p}_{2}}$ and takes $R_{1}T_{0}p_{3}$ to the plane $\overline{\hat{p}_{1}\hat{p}_{2}\hat{p}_{3}}$.
For the principal component analysis (PCA) we calculate the covariance matrices of the two feature point sets and map the corresponding eigenvectors onto each other. We calculate the following quantities:

$b_{1},b_{2}$: the barycenters of the two feature sets
$X_{1},X_{2}$: the covariance matrices
$V_{1},V_{2}$: the eigenvector matrices of $X_{1}$ and $X_{2}$.
The resulting transformation is given by

$\displaystyle T=T(b_{2})\cdot V_{2}\cdot V_{1}^{T}\cdot T(b_{2})^{-1}\cdot T(b_{1},b_{2})$

where $T(b_{1},b_{2})$ is the translation taking $b_{1}$ to $b_{2}$ and $T(b):=T(0,b)$.

The transformation calculation, among other useful utility functions, is implemented in the file util/NewmatUtil.cpp.


Figure 4: Matchings On The Test Data

(Left: result for $k=7$; right: result for $k=10$.)

The algorithm has been successfully applied to the test data. The angle between the images is $30^{\circ}$. We use the following parameters: $\Sigma=50.0$, $r_{p}=50.0$, $s=6$, $\varepsilon=5.0$, $n=200$. The matching is performed on $1500$ features in each image. Results for $k=7$ and $k=10$ are shown in figure 4.

The User Interface

Figure 5: Screenshot Of The Application
Image screenshot

The user interface has been implemented using the widget library Qt, version $4.2.2$. A screenshot is displayed in figure 5. An overview of the implemented GUI classes can be found in the Doxygen documentation created from the project.

The preview panel is an OpenSceneGraph widget showing the loaded and transformed range images as lit point clouds. The normals of the points are calculated by examining the direct neighbourhood of a point in the image. A slider alters the size of the displayed points.

The image library shows all currently loaded depth images. Buttons for loading and removing images are at the top of this docking widget. One can add color overlays to the range files, which will then be displayed instead. The reset button sets the matching transformation of the active image to the identity.

The main part of the application is the matching frame. Here one chooses the two images to be matched; they are shown next to each other. Features can be manually added and removed by clicking into the image panel: a left-click adds a feature, a right-click removes the feature under the mouse. Dragging features to a different location is also possible.

The lower section provides the tools for feature generation and matching. Controls to generate and remove features of a selected image are located in the features tab. The number of features and skipped features can be entered. The matching tab offers spin boxes for all important parameters of the RANSAC algorithm. It also features controls for matching manually picked features.

Once a matching is acquired for the active pair of images, it is drawn in the image panel. Blue lines connect the matched feature points. All stored matchings for a pair of images are available in the drop-down menu below the image panel.

The Output File Format

If one is satisfied with the alignment result viewed in the preview dock, the range data and matching transformations as well as the feature points can be exported to an XML file. The export functionality is implemented in the file export/Export.cpp. An example output can be seen at example.pcml.

Future Improvements

The definition of the global error can be improved by exploiting symmetric queries to exclude unmatchable features from the calculation. This approach can also be used to improve the matching quality of the features: a feature match is considered correct only if the two features are each other's nearest match in both directions.

To match a series of range images, a batch processor is needed. Feature generation and matching would in turn have to be automated to be useful in a scanning environment.

Generated on Sat Feb 10 15:56:53 2007 for Features and Matching by  doxygen 1.5.0