COMPUTER VISION

Computer vision, or machine vision, refers to the process of extracting meaningful information from images and video. Although humans find it easy to recognize thousands of different objects and make sense of what is going on around them, it is difficult to write a program that allows a computer, which sees an image only as a matrix of numbers, to complete the same tasks. To accomplish this, computer vision researchers turn to a mix of image processing, machine learning, and statistical methods when designing their algorithms.

Fortunately, Intel has compiled an open-source C++ library, called OpenCV, of optimized implementations of many well-known computer vision methods. It allows researchers to use high-level functions for techniques that are already well established in the field, avoiding the need to reinvent the wheel. For this project we developed a library in Visual Basic for interacting with the camera over the HTTP protocol. JPEG images received from the camera are converted to the IplImage type, which is the standard image format for OpenCV processing. The majority of the image processing is done with OpenCV functions that can be modified for specific purposes.
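As an illustration, a frame received over HTTP can be sanity-checked before it is handed to the decoder: a JPEG stream always begins with the SOI marker bytes 0xFF 0xD8 and ends with the EOI marker 0xFF 0xD9. The sketch below (the helper name is ours, not from the project code) shows such a check in C++:

```cpp
#include <cstdint>
#include <vector>

// Sketch: verify that a buffer received from the camera looks like a
// complete JPEG before converting it to an IplImage. A JPEG stream starts
// with the SOI marker (0xFF 0xD8) and ends with the EOI marker (0xFF 0xD9);
// a truncated HTTP response will usually fail this check.
bool looksLikeCompleteJpeg(const std::vector<uint8_t>& buf) {
    if (buf.size() < 4) return false;
    bool hasSoi = buf[0] == 0xFF && buf[1] == 0xD8;
    bool hasEoi = buf[buf.size() - 2] == 0xFF && buf[buf.size() - 1] == 0xD9;
    return hasSoi && hasEoi;
}
```

Dropping incomplete buffers early avoids feeding partial frames into the decoder and the tracker.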

[Figure: OpenCV Structure]

FEATURES

We use OpenCV to find the region in the current frame that best matches the target region in the previous frame, in order to estimate the target's position and velocity relative to the camera. Many techniques have been proposed for this task, but most are inadequate for our PTZ tracking application. Methods such as block matching are costly and inefficient unless combined with an efficient search strategy. Additionally, to be useful in any practical application, the camera's tracking system should not fail when the target moves toward or away from the camera. The tracking method must therefore be somewhat robust to changes in scale, as well as to the noise and distortion introduced by the lens, autofocus, and other non-ideal camera behavior. For these reasons we decided to track a subset of features within the target region that exhibit these properties.
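To make the cost argument concrete, here is a minimal sketch of exhaustive block matching with a sum-of-squared-differences score (the types and names are illustrative, not from our code). The nested loops show why a brute-force search over a W x H frame with B x B blocks costs on the order of W*H*B*B operations per block, which is what an efficient search strategy must avoid:

```cpp
#include <climits>
#include <utility>
#include <vector>

// Illustrative exhaustive block matching: slide a template block over
// every position in the frame and keep the offset with the smallest sum
// of squared differences (SSD).
using Image = std::vector<std::vector<int>>;  // grayscale, [row][col]

// SSD between the size x size block at (ay, ax) in `a` and (by, bx) in `b`.
long long ssd(const Image& a, int ay, int ax,
              const Image& b, int by, int bx, int size) {
    long long sum = 0;
    for (int r = 0; r < size; ++r)
        for (int c = 0; c < size; ++c) {
            long long d = a[ay + r][ax + c] - b[by + r][bx + c];
            sum += d * d;
        }
    return sum;
}

// Returns the (row, col) in `frame` whose size x size block best matches
// the block at (ty, tx) in `target`. Cost: O(W*H*B*B) per block.
std::pair<int, int> bestMatch(const Image& target, int ty, int tx,
                              const Image& frame, int size) {
    long long best = LLONG_MAX;
    std::pair<int, int> bestPos{0, 0};
    int rows = static_cast<int>(frame.size());
    int cols = static_cast<int>(frame[0].size());
    for (int y = 0; y + size <= rows; ++y)
        for (int x = 0; x + size <= cols; ++x) {
            long long cost = ssd(target, ty, tx, frame, y, x, size);
            if (cost < best) { best = cost; bestPos = {y, x}; }
        }
    return bestPos;
}
```

Note also that SSD on raw pixel values is not invariant to scale changes, which is the other reason block matching alone is a poor fit for PTZ tracking.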

Our features consist of 3x3 neighborhoods of pixels centered at a point. For a feature to be useful, it should be distinguishable from the surrounding pixels. To identify good features to track, we use the Shi and Tomasi criterion for finding points with texture in two or more directions. During initialization, our software prompts the user to identify the target by drawing a box around the region to be tracked. Highly textured features are identified in the specified region and tracked through the following frames. Statistical analysis of the positions and displacement vectors of the tracked features then yields information about the overall motion of the target.
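A minimal version of the Shi and Tomasi score can be written directly from its definition: build the 2x2 structure tensor from image gradients in a window around the candidate pixel and take its smaller eigenvalue. The sketch below is our own illustration (OpenCV provides cvGoodFeaturesToTrack for production use); a large minimum eigenvalue means the patch has texture in two directions and is therefore a trackable feature:

```cpp
#include <cmath>
#include <vector>

using ImageF = std::vector<std::vector<double>>;  // grayscale, [row][col]

// Shi-Tomasi score at (y, x): the smaller eigenvalue of the structure
// tensor [[Ixx, Ixy], [Ixy, Iyy]] accumulated over a small window.
// The caller must keep (y, x) at least halfWin + 1 pixels from the border.
double shiTomasiScore(const ImageF& img, int y, int x, int halfWin = 1) {
    double ixx = 0.0, iyy = 0.0, ixy = 0.0;
    for (int r = y - halfWin; r <= y + halfWin; ++r)
        for (int c = x - halfWin; c <= x + halfWin; ++c) {
            // central-difference gradients
            double gx = (img[r][c + 1] - img[r][c - 1]) / 2.0;
            double gy = (img[r + 1][c] - img[r - 1][c]) / 2.0;
            ixx += gx * gx;
            iyy += gy * gy;
            ixy += gx * gy;
        }
    // smaller eigenvalue of the 2x2 symmetric tensor
    double trace = ixx + iyy;
    double diff = ixx - iyy;
    return 0.5 * (trace - std::sqrt(diff * diff + 4.0 * ixy * ixy));
}
```

The score behaves as the criterion requires: it is zero on a flat patch, zero on a straight edge (texture in only one direction), and large at a corner, which is exactly why thresholding it selects features that can be localized reliably from frame to frame.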