Update Using the Predictive Online Method

Introduction

Goal:

We do not perform a full reconstruction from every corresponding set of frames. Instead, the previously constructed model is updated with the current 3D information. The feature points that exhibit an intensity change between the previous and the current set of frames are the ones that affect the current temporal 3D model. Therefore, we use only this small subset of points to estimate the changes in the new model.

Methodology:

Table 3. Predictive algorithm of the online reconstruction

Step 1: Updating using the prediction of the linear statistical model (Section IV A)
    1.1 Predicting the 3D structure points
    1.2 Generating the predicted structure matrix
    1.3 Removing the wrong 3D points

Step 2: Validating the predicted values using the feature-based method (Section IV B)
    2.1 Validating the predicted feature points
    2.2 Validating the removed wrong feature points
    2.3 Repeating Step 1 and Step 2 until there are no more frame subsequences

 

 

In the structure-from-motion approach, a number of feature points are tracked and a measurement matrix is formed in which each element corresponds to the image coordinates of a tracked point. The factorization method is then used to recover the camera motion and the 3D model from those points. In any realistic situation, the measurement matrix may have missing entries, either because certain feature points are occluded in some frames or because the tracking algorithm fails. In our method, we reduce the number of frames used in the reconstruction process, so a number of feature points may be missing from one frame to the next. A prediction process must therefore exist to estimate the values of these feature points. We exploit a linear statistical model to predict the positions of the feature points in frames that exhibit a large difference in optical flow.
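For illustration, the following minimal sketch (Python/NumPy, with variable names of our own choosing) builds such a 2F x P measurement matrix, with NaN marking the entries that are missing because of occlusion or tracking failure:

```python
import numpy as np

def build_measurement_matrix(u, v):
    """Stack per-frame image coordinates into the 2F x P measurement matrix.

    u, v : (F, P) arrays holding the horizontal/vertical coordinates of P
           tracked points over F frames; np.nan marks an entry that was not
           observed (occluded, or lost by the tracker) in that frame.
    """
    return np.vstack([u, v])   # rows 0..F-1 hold u-coords, rows F..2F-1 hold v-coords

# Toy example: 3 frames, 4 points, one point lost after the first frame.
u = np.array([[10.0, 52.0, 33.0, 71.0],
              [12.0, 54.0, 35.0, np.nan],
              [15.0, 57.0, 38.0, np.nan]])
v = np.array([[20.0, 22.0, 60.0, 45.0],
              [21.0, 23.0, 61.0, np.nan],
              [23.0, 25.0, 63.0, np.nan]])
W = build_measurement_matrix(u, v)   # shape (6, 4); NaN entries are the ones to predict
```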

 

IV A. Step 1: Predicting the location of the feature points using the linear statistical model and the flow-based model

 

In some situations, a tracking algorithm may fail to track proper feature points in the next frame because the object rotates in front of the camera. In our case, we have an object rotating in front of the camera, and such movement causes some feature points to disappear and others to appear. The disappearance of a feature point is preceded by a decrease in the displacement between its positions in consecutive frames each time the object turns; the feature finally disappears when the object has rotated completely to one side. This behavior is the reason for using the regression model as the prediction model. In the following subsections, we describe our method for predicting the locations of the feature points.

The linear statistical model is used to predict the value a feature point would have had if it were not missed, using its displacement history before it was missed. We use a form of multiple regression analysis in which the relationship between one or more independent variables and another variable is modeled by a least-squares function, called a linear regression equation. This function is a linear combination of one or more model parameters, called regression coefficients.

 

         Step 1.1: Predicting the 3D structure points

 

The following equation indicates that the position of the feature point $p_{ij}$, defined in Eq. (3), in frame $i$ is its position in frame $i-1$ plus the flow displacement of the point between frames $i-1$ and $i$.

That is:

$u_{ij} = u_{(i-1)j} + d^{u}_{ij}$, and $v_{ij} = v_{(i-1)j} + d^{v}_{ij}$

where $d^{u}_{ij}$ and $d^{v}_{ij}$ are the horizontal and vertical displacements between the positions of the feature point in the two frames.

 

The expected ratio of the increase or decrease in the displacement over the last set of frames containing the feature point is estimated using linear regression. We assume that the missed point appeared in at least four frames before being missed, so four predictors are used to predict this increase or decrease in the displacement of the feature point over the last set of frames it appears in. We predict the horizontal displacement and the vertical displacement of the feature point with separate equations. The following predictors, taken from the past four frames, are used:

The u- and v-displacements of the feature point in the previous four frames, $i-4$ to $i-1$, are represented by:

$d^{u}_{(i-1)j},\; d^{u}_{(i-2)j},\; d^{u}_{(i-3)j},\; d^{u}_{(i-4)j}$, and $d^{v}_{(i-1)j},\; d^{v}_{(i-2)j},\; d^{v}_{(i-3)j},\; d^{v}_{(i-4)j}$

 

So, the equation to predict the u-displacement of the feature point is

$\hat{d}^{u}_{ij} = \beta^{u}_{0} + \beta^{u}_{1} d^{u}_{(i-1)j} + \beta^{u}_{2} d^{u}_{(i-2)j} + \beta^{u}_{3} d^{u}_{(i-3)j} + \beta^{u}_{4} d^{u}_{(i-4)j}$

Similarly, the equation to predict the v-displacement of the feature point in frame $i$ is:

$\hat{d}^{v}_{ij} = \beta^{v}_{0} + \beta^{v}_{1} d^{v}_{(i-1)j} + \beta^{v}_{2} d^{v}_{(i-2)j} + \beta^{v}_{3} d^{v}_{(i-3)j} + \beta^{v}_{4} d^{v}_{(i-4)j}$

where $\hat{d}^{u}_{ij}$ and $\hat{d}^{v}_{ij}$ are the predicted displacement values for the feature point. To solve the normal equations for the $\beta$ coefficients, we substitute the five displacement values (including the predicted one) with existing displacement values of five observed points. Once the $\beta$ matrix is available, the linear system is ready to predict the displacement of a feature point in frame $i$ given its displacements in the previous four frames.

By substituting the predicted displacements into the position-update equations, the predicted u- and v-coordinates of the feature point are:

$\hat{u}_{ij} = u_{(i-1)j} + \hat{d}^{u}_{ij}$, and $\hat{v}_{ij} = v_{(i-1)j} + \hat{d}^{v}_{ij}$

and the predicted feature point takes the form:

$\hat{p}_{ij} = (\hat{u}_{ij}, \hat{v}_{ij})$
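A minimal NumPy sketch of this prediction step follows. It fits the $\beta$ coefficients from a single point's displacement history of at least five values, whereas the paper substitutes displacement values of five existing points, so the choice of fitting data here is our own simplification:

```python
import numpy as np

def fit_beta(d_hist):
    """Least-squares fit of beta = [b0, b1, b2, b3, b4]: each displacement is
    modeled as a linear combination of the four displacements preceding it."""
    d = np.asarray(d_hist, dtype=float)
    X = np.array([[1.0, d[k - 1], d[k - 2], d[k - 3], d[k - 4]]
                  for k in range(4, len(d))])
    y = d[4:]
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

def predict_point(p_prev, du_hist, dv_hist):
    """Predict the next (u, v) position of a missed point from its last known
    position plus the displacements predicted by the two regression models."""
    du = np.asarray(du_hist, dtype=float)
    dv = np.asarray(dv_hist, dtype=float)
    du_hat = fit_beta(du) @ np.r_[1.0, du[-1], du[-2], du[-3], du[-4]]
    dv_hat = fit_beta(dv) @ np.r_[1.0, dv[-1], dv[-2], dv[-3], dv[-4]]
    return p_prev[0] + du_hat, p_prev[1] + dv_hat
```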

 

 

         Step 1.2: Generating the predicted Structure matrix

After predicting the feature points (Step 1.1) and removing the wrong feature points (Step 1.3), the predicted structure matrix is generated using the factorization method.

 

The equations in Step 1.1 give the coordinates of the predicted feature points. According to Eq. (6), the measurement matrix that contains some predicted feature points is called the predicted measurement matrix and has the form:

$\hat{W} = \begin{bmatrix} \hat{u}_{11} & \cdots & \hat{u}_{1P} \\ \vdots & \ddots & \vdots \\ \hat{u}_{F1} & \cdots & \hat{u}_{FP} \\ \hat{v}_{11} & \cdots & \hat{v}_{1P} \\ \vdots & \ddots & \vdots \\ \hat{v}_{F1} & \cdots & \hat{v}_{FP} \end{bmatrix}_{2F \times P}$

where
$F$: number of frames
$P$: number of points

Note that not all points in the predicted measurement matrix are predicted; only some of them are. Accordingly, we compute the predicted registered measurement matrix $\tilde{W}$ by subtracting the means $\bar{u}_i$ and $\bar{v}_i$ from $\hat{W}$, where $\bar{u}_i$ and $\bar{v}_i$ are the means of the $u$- and $v$-entries in each row of the measurement matrix:

$\tilde{u}_{ij} = \hat{u}_{ij} - \bar{u}_i, \qquad \tilde{v}_{ij} = \hat{v}_{ij} - \bar{v}_i$

$\tilde{W}$ can be expressed in matrix form as $\tilde{W} = M\hat{S}$, where

$M = \begin{bmatrix} \mathbf{i}_{1}^{T} \\ \vdots \\ \mathbf{i}_{F}^{T} \\ \mathbf{j}_{1}^{T} \\ \vdots \\ \mathbf{j}_{F}^{T} \end{bmatrix}_{2F \times 3}$

represents the motion matrix, and

$\hat{S} = \begin{bmatrix} \mathbf{s}_{1} & \mathbf{s}_{2} & \cdots & \mathbf{s}_{P} \end{bmatrix}_{3 \times P}$

represents the predicted shape matrix.

 

The rows of $M$ represent the predicted orientations of the horizontal and vertical camera axes throughout the stream. The columns of $\hat{S}$ are the coordinates of the predicted 3D feature points with respect to their centroid.

The centroid can be computed as:

$\bar{\mathbf{s}} = \frac{1}{P}\sum_{j=1}^{P} \mathbf{s}_{j}$

 

The registered measurement matrix is decomposed by SVD (Singular Value Decomposition) into $\tilde{W} = U\,\Sigma\,V^{T}$. If $U'$, $\Sigma'$, and $V'$ correspond to the 3 largest singular values obtained, we estimate the respective motion matrix and structure matrix to be:

$\hat{M} = U'\,(\Sigma')^{1/2}$

$\hat{S} = (\Sigma')^{1/2}\,V'^{T}$

The true values of the predicted motion and structure matrices are given by

$M = \hat{M}\,A, \qquad S = A^{-1}\,\hat{S}$

where $A$ is the $3 \times 3$ matrix that enforces the metric (orthonormality) constraints on the recovered camera axes.
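A compact sketch of this registration-and-SVD factorization step is shown below (NumPy; it assumes the predicted entries of the measurement matrix have already been filled in, and it stops at the rank-3 estimates without computing the metric-upgrade matrix $A$):

```python
import numpy as np

def factorize(W_hat):
    """Register a (2F x P) predicted measurement matrix and factorize it into
    rank-3 motion and shape estimates (up to the affine ambiguity that the
    metric-upgrade matrix A later removes)."""
    # Registration: subtract the per-row mean (u-bar_i / v-bar_i of each frame).
    row_mean = W_hat.mean(axis=1, keepdims=True)
    W_reg = W_hat - row_mean

    # SVD and truncation to the 3 largest singular values.
    U, s, Vt = np.linalg.svd(W_reg, full_matrices=False)
    U3, s3, Vt3 = U[:, :3], s[:3], Vt[:3, :]

    M_hat = U3 * np.sqrt(s3)               # (2F x 3) estimated motion matrix
    S_hat = np.sqrt(s3)[:, None] * Vt3     # (3 x P) estimated shape matrix
    return M_hat, S_hat, row_mean
```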

 

         Step 1.3: Removing the wrong feature points

 

Our prediction model, along with the flow-based method, is also used to detect wrong feature points in the frames. Some features are wrongly selected by the KLT tracker because of noise or differences in intensity values in the background. Wrong points are characterized by their slow movement through the sequence of frames compared to the movement of the moving object. In our method, wrong feature points are removed from the set of feature points. Let the position of a feature point at some frame be $p_{ij} = (u_{ij}, v_{ij})$, where $i$ is the frame index and $j$ is the feature-point index.

A feature point is considered wrong if its total displacement over the frames in which it appears is less than a threshold. The point is deleted when its overall displacement across the consecutive frames falls below this threshold, and deletion is performed by assigning 0 to its value.

 

$p_{ij} = 0 \quad \text{if} \quad D_{j} < \text{threshold}$

where

$D_{j} = \sum_{i=f_{j}^{\text{first}}+1}^{f_{j}^{\text{last}}} \left( \left| u_{ij} - u_{(i-1)j} \right| + \left| v_{ij} - v_{(i-1)j} \right| \right)$

and $D_{j}$ represents the sum of the horizontal and vertical displacements of feature point $j$ from the first frame it appears in, $f_{j}^{\text{first}}$, up to the last frame it appears in, $f_{j}^{\text{last}}$.
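The removal rule can be sketched as follows (NumPy; the threshold value and the in-place zeroing convention are assumptions of this sketch):

```python
import numpy as np

def remove_wrong_points(u, v, threshold):
    """Zero out feature points whose total |du| + |dv| motion over the frames
    they appear in stays below `threshold` (slow points are treated as wrongly
    selected background features)."""
    du = np.abs(np.diff(u, axis=0))          # per-frame horizontal displacements
    dv = np.abs(np.diff(v, axis=0))          # per-frame vertical displacements
    total = np.nansum(du + dv, axis=0)       # total motion per point, NaN gaps ignored
    wrong = total < threshold
    u[:, wrong] = 0.0                        # deletion convention: assign 0 to the point
    v[:, wrong] = 0.0
    return wrong
```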

 

IV B. Step 2: Validating the predicted feature points using the feature-based method

 

Some feature points are wrongly predicted, so a correction method is necessary to validate the prediction process. Moreover, since we used prediction to remove the wrongly chosen feature points, the correction step is also used to make sure that we did not remove a good feature point. We use the feature-based method to detect and correct the values of the wrongly predicted feature points.

 

         Step 2.1: Correcting the predicted feature points.


To correct the predicted 3D point $\hat{\mathbf{s}}_j$, we first convert it to its corresponding 2D point in the frame that has a big difference in optical flow, in the following manner:

Having the predicted 3D point $\hat{\mathbf{s}}_j$ and the predicted motion $\hat{\mathbf{i}}_f$ and $\hat{\mathbf{j}}_f$ for the frame $f$ that has a big difference in optical flow (compared with its previous frame), the corresponding 2D point in frame $f$ can be estimated using the orthographic projection,

$\hat{u}_{fj} = \hat{\mathbf{i}}_f^{T}\,\hat{\mathbf{s}}_j + \bar{u}_f, \qquad \hat{v}_{fj} = \hat{\mathbf{j}}_f^{T}\,\hat{\mathbf{s}}_j + \bar{v}_f$

where $\bar{u}_f$ and $\bar{v}_f$ can be estimated from the row means removed during registration.
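A small sketch of this orthographic back-projection (the argument names are ours; the per-frame offsets are taken to be the row means removed during registration):

```python
import numpy as np

def project_orthographic(s_j, i_f, j_f, u_mean_f, v_mean_f):
    """Back-project a 3D point s_j into frame f under the orthographic model:
    the recovered camera axes i_f, j_f give the registered coordinates, and the
    row means removed during registration restore the image-space offset."""
    u = i_f @ s_j + u_mean_f
    v = j_f @ s_j + v_mean_f
    return np.array([u, v])
```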

 

Having the estimated 2D point, we use the KLT tracker to search for similar intensity information within a window in order to correct the location of the estimated 2D point, in the following manner.

 

First, we set a window region around the estimated 2D point. Then, the KLT algorithm uses the local image intensity gradient vectors to find the corresponding feature point at locations where the minimum eigenvalue of the symmetric matrix $G$ in Eq. (KLT1) is above some threshold, where

$G = \sum_{\mathbf{x} \in W} \begin{bmatrix} I_x^2 & I_x I_y \\ I_x I_y & I_y^2 \end{bmatrix}$

After that, the KLT algorithm uses the matrix $G$ to find the displacement $\mathbf{d}$ between the previous frame $I$ and the current frame $J$. A solution may be found by seeking the displacement that minimizes the sum of squared intensity differences in the cost function of Eq. (KLT3):

$\epsilon(\mathbf{d}) = \sum_{\mathbf{x} \in W} \left[ J(\mathbf{x} + \mathbf{d}) - I(\mathbf{x}) \right]^2$

We then go through Eqs. (KLT3) to (KLT5) to find the displacement $\mathbf{d}$.
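For concreteness, a single-iteration, non-pyramidal sketch of this Lucas-Kanade step is shown below; the window size, the eigenvalue threshold, and the sign conventions are assumptions of the sketch rather than values from the paper:

```python
import numpy as np

def klt_displacement(I, J, pt, win=7, min_eig=1e-3):
    """One Lucas-Kanade step: build the gradient matrix G and residual vector e
    over a (2*win+1)^2 window around `pt`, then solve G d = e for the
    displacement of the point between grayscale frames I and J."""
    x, y = int(round(pt[0])), int(round(pt[1]))
    rows, cols = slice(y - win, y + win + 1), slice(x - win, x + win + 1)

    Iy, Ix = np.gradient(I.astype(float))            # intensity gradients of frame I
    gx, gy = Ix[rows, cols].ravel(), Iy[rows, cols].ravel()
    it = (I[rows, cols].astype(float) - J[rows, cols].astype(float)).ravel()

    G = np.array([[np.sum(gx * gx), np.sum(gx * gy)],
                  [np.sum(gx * gy), np.sum(gy * gy)]])
    e = np.array([np.sum(gx * it), np.sum(gy * it)])

    # Trackability test: reject the window if G's minimum eigenvalue is too small.
    if np.linalg.eigvalsh(G).min() < min_eig:
        return None                                   # no reliable displacement found
    return np.linalg.solve(G, e)                      # displacement d = (dx, dy)
```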

Once the displacement $\mathbf{d}$ is calculated, the corrected feature point can be computed from its position in the previous frame as:

$p^{c}_{fj} = p_{(f-1)j} + \mathbf{d}$

If the KLT tracker finds the displacement of the estimated feature, the predicted point is replaced with the corrected point. Otherwise, if no intensity similarity is found, the feature point is considered a missing point and its value is estimated as suggested by Eqs. (a) to (c) in Section III.
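The same accept-or-mark-missing decision can be written against OpenCV's pyramidal KLT implementation; the function below is a sketch under our own naming, and the Section III re-estimation is only referenced, not implemented:

```python
import cv2
import numpy as np

def correct_predicted_point(prev_img, cur_img, p_prev, p_pred, win=15):
    """Track p_prev from the previous frame into the current one; on success the
    predicted point is replaced by the tracked (corrected) point, otherwise the
    point is flagged as missing so it can be re-estimated (Section III)."""
    p0 = np.array([[p_prev]], dtype=np.float32)        # shape (1, 1, 2), as OpenCV expects
    p1, status, _err = cv2.calcOpticalFlowPyrLK(
        prev_img, cur_img, p0, None, winSize=(win, win), maxLevel=2)
    if status[0][0] == 1:
        return p1[0, 0], False     # corrected point, not missing
    return p_pred, True            # keep the prediction, flag as missing
```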

 

After correcting the value of the predicted point, the new 3D point is projected as described in Eq. (3).

 

         Step 2.2: Validating the removed wrong feature points.

 

To verify whether the removed feature points were wrong, the KLT tracker is applied.

We first convert the removed 3D point to its corresponding 2D points in all the frames it appears in. The corresponding 2D point in frame $f$ can be estimated using the orthographic projection,

$u_{fj} = \mathbf{i}_f^{T}\,\mathbf{s}_j + \bar{u}_f, \qquad v_{fj} = \mathbf{j}_f^{T}\,\mathbf{s}_j + \bar{v}_f$

where $\bar{u}_f$ and $\bar{v}_f$ can be estimated from the registration means, as in Step 2.1.

 

Then we set a window around the estimated feature point whose size is the predefined window size used by the KLT tracker when tracking the feature points among the frame subsequences. The window location is fixed, meaning that it does not move from the first image containing this feature point up to the last frame.

 

Now the KLT tracker is used to search for similar intensity information within the window in order to locate the wrong feature point. As before, the KLT algorithm uses the local image intensity gradient vectors to find the corresponding feature point at locations where the minimum eigenvalue of the symmetric matrix $G$ in Eq. (KLT1) is above some threshold, with $G$ defined as in Step 2.1.

 

After that, the KLT algorithm uses the matrix $G$ to find the displacement between the previous frame and the current frame for all the frames the feature point appears in. As in Step 2.1, a solution may be found by seeking the displacement that minimizes the sum of squared intensity differences in the cost function of Eq. (KLT3).

 

We then go through Eqs. (KLT3) to (KLT5) to find the displacement $\mathbf{d}$. If the displacement is found within the small window, it indicates that the feature point did not move much; it is therefore a wrong feature point and is not recovered. If the feature point is not detected within the small window, this indicates that the feature point is not a wrong point, and it must be recovered.
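A small sketch of this validation rule (our own reading of the fixed-window test; the half-window criterion is an assumption):

```python
import numpy as np

def stays_inside_window(displacements, win_size):
    """A removed point remains classified as wrong if every tracked displacement
    keeps it inside the fixed search window (it barely moved); if the point
    escapes the window it is recovered instead."""
    d = np.asarray(displacements, dtype=float)         # (N, 2) per-frame displacements
    return bool(np.all(np.abs(d) <= win_size / 2.0))   # True -> wrong point, keep removed
```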

 

After recovering the value of the removed point, the new 3D point is projected, as in Step 2.1.

 

The prediction and validation steps are repeated for every subsequence of frames, as indicated in the flow chart.

Results

Data Condition:

We tested our online 3D face reconstruction method on two real image sequences:

1. Hotel Dataset:

A real image sequence of a small model building was used. The dataset was prepared in the laboratory by using a camera mounted on a computer-controlled movable platform. The camera motion included substantial translation in depth and across the field of view. The dataset consists of one hundred frames.

Five frames of the hotel dataset are shown:

The following figure shows the output of the KLT feature tracker using the predictive online reconstruction method.

Left image: The first image of the stream with the extracted feature points.

Right image: The tracking path of the feature points in the remaining 99 frames using the predictive online reconstruction method.

3D structure points of the hotel image stream, shown from different viewpoints, obtained with the predictive reconstruction method.

The hotel 3D model with texture mapping:

2. Face Dataset: a male subject turning his face left and right while facing the camera.

We collected the dataset by capturing a continuous 6-second video of a male subject. The camera was positioned at the level of the subject's head, about 30 cm from his face. The video was recorded under normal white light, and the brightness did not change during the recording. The background behind the face contained different objects and colors but remained fixed during the recording. ImageReady software was then used to cut frames from the video every 0.03 seconds (30 ms). The software generated two hundred frames, each about 1 KB in size. The two hundred frames were divided into consecutive sequences of 10 frames each.
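The paper used ImageReady to slice the video; an equivalent frame-extraction step can be scripted, for example with OpenCV, as in the following sketch (file names and output format are assumptions):

```python
import cv2

def extract_frames(video_path, out_dir, step_ms=30):
    """Save one frame every `step_ms` milliseconds from a short video clip."""
    cap = cv2.VideoCapture(video_path)
    t, idx = 0.0, 0
    while True:
        cap.set(cv2.CAP_PROP_POS_MSEC, t)   # seek to the next sampling instant
        ok, frame = cap.read()
        if not ok:
            break
        cv2.imwrite(f"{out_dir}/frame_{idx:03d}.png", frame)
        t += step_ms
        idx += 1
    cap.release()
    return idx   # roughly 200 frames for a 6 s clip sampled every 30 ms
```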

Five frames of the face dataset are shown:

The output of the KLT feature tracker using the predictive online reconstruction method.

Left image: The first image of the face image stream with the extracted feature points.

Right image: The tracking path of the feature points in the remaining frames using the predictive online reconstruction method.

The 3D points of the face structure

 

Left image: The face mask

Right image: The face model with texture mapping

 

Comparison between the offline and the online reconstruction methods

Rotation (Face dataset)

Complexity (number of frames and computational cost)

From the results above, we note the effectiveness of our online 3D face model estimation from image sequences. The method removes redundancy by discarding the frames that do not show a big difference in optical flow, and it therefore operates efficiently without compromising the quality of the 3D model.

