Visual Saliency Tracking in C++ using NMPT's FastSUN Algorithm and OpenCV:

Last updated on 20th Oct, 2016. Posted originally on 6th June, 2010 by Shervin Emami

Visual Saliency Tracking is a way for a computer or robot to decide where to look in a scene, in much the same way that humans are drawn to interesting things. For example, we are more likely to look at something that is moving than at something that is still, and more likely to look at something with many bright colors and sharp edges than at a blank wall.

The FastSaliency library in NMPT (created by Nicholas Butko et al. in 2008, with its original website at UCSD) tries to look around a scene or video the way a human might, by tracking attention based on visual salience.

 

Demonstration:

Posted on 6th June, 2010 by Shervin Emami

Here is a demo of my Visual Saliency Tracking program based on the FastSaliency tracker in NMPT.

The green dot tries to follow the same sorts of things in the video that a human would, based on both the movement and the contrast of the objects.


This video was taken at a beach in Australia on a hot sunny day. The areas of least visual saliency (least important) are shown in black, and the most important areas are shown in white. The blue dot marks the most important point in each camera frame. The green dot is a smoothed version of the blue dot that does not jump around as much, since there are often multiple pixels in the scene that are considered important.
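
The article does not say how the green dot is smoothed, so the following is only a minimal sketch of one common approach: an exponential moving average over the per-frame "blue dot" positions. The names PointF and SmoothedTracker and the parameter alpha are hypothetical, chosen for this illustration, and are not part of NMPT or OpenCV.

```cpp
#include <cassert>

// A 2D position in pixel coordinates (hypothetical helper type for this sketch).
struct PointF {
    float x;
    float y;
};

// Smooths a stream of raw "most salient pixel" positions with an exponential
// moving average. This is an assumed technique, not necessarily the one used
// in the demo video.
//   alpha close to 0: heavy smoothing, the dot moves slowly.
//   alpha close to 1: light smoothing, the dot follows the raw point closely.
class SmoothedTracker {
public:
    explicit SmoothedTracker(float alpha)
        : alpha_(alpha), initialized_(false), pos_{0.0f, 0.0f} {
        assert(alpha > 0.0f && alpha <= 1.0f);
    }

    // Feed the raw position for the current frame; returns the smoothed position.
    PointF update(PointF raw) {
        if (!initialized_) {
            // Start at the first observation instead of drifting in from (0, 0).
            pos_ = raw;
            initialized_ = true;
        } else {
            // Move a fraction alpha of the way towards the new raw position.
            pos_.x += alpha_ * (raw.x - pos_.x);
            pos_.y += alpha_ * (raw.y - pos_.y);
        }
        return pos_;
    }

private:
    float alpha_;
    bool initialized_;
    PointF pos_;
};
```

In a tracking loop you would call update() once per frame with the position of the blue dot and draw the returned position as the green dot. A small alpha gives a steadier dot at the cost of reacting more slowly to a genuinely new point of interest.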

 

How to compile FastSaliency in Visual Studio:

Posted on 11th March, 2010 by Shervin Emami

NMPT (Nick's Machine Perception Toolbox, freely available here) is designed to be compiled on Mac or Linux (or on Windows through Cygwin). However, most of the NMPT system is code that performs a thorough statistical analysis of its two algorithms compared with other algorithms. If you just want to do saliency tracking, you only need a few of the C++ classes, which can be compiled directly into your own project. That way you don't have to use CMake, Cygwin or anything else that was used to build NMPT; you can compile it just like the rest of your C++ code (e.g. with Microsoft VC++ 2005).


To test NMPT's Simple Saliency Tracker program, this is what I did:
1) Create a new blank console project called "SimpleSaliency" in VC++ 2005.
2) Add these files (from NMPT's source code folder):
    SimpleSaliency.cpp
    FastSaliency.cpp
    FastSaliency.h
    OpenCVBoxFilter.cpp
    OpenCVBoxFilter.h
3) Compile it with OpenCV in the same way as any other OpenCV 1 or OpenCV 2 program.
For example, if you have the OpenCV 2.0 SDK installed at "C:\OpenCV2.0", then in your Project Properties in VS2005:
- Add "C:\OpenCV2.0\include\opencv" to "C/C++ -- Additional Include Directories".
- Add "C:\OpenCV2.0\build\lib\Release" to "Linker -- Additional Library Directories". (Note that this folder is probably somewhere different on your computer, so you will have to find the folder that contains files like "cv200.lib".)
- Add "cv200.lib cxcore200.lib cvaux200.lib cxts200.lib highgui200.lib opencv_ffmpeg200.lib" to "Linker -- Input -- Additional Dependencies".

You should now be able to compile the SimpleSaliency example, which uses NMPT's FastSaliency library and the OpenCV library. To run the SimpleSaliency program, you need to run it from the "NMPT" folder, since it tries to open the file "data\HDMovieClip.avi". So if you open a command prompt window, assuming that you installed NMPT to "C:\Nmpt" and your test project to "C:\Nmpt\TestNmpt", you can type:

cd C:\Nmpt
C:\Nmpt\TestNmpt\Release\SimpleSaliency.exe

(This should run the SimpleSaliency program you just compiled from the C:\Nmpt folder, so that it can access the video at "data\HDMovieClip.avi". You should see a small window on the screen showing the greyscale saliency values of the HDMovieClip.avi video.)
Now that you can get the saliency of a video stream, you can track motion and salient objects by searching the saliency image for the point of maximum saliency. Note that the saliency image is a 32-bit floating-point image, so you might want to convert it to an 8-bit UCHAR image.
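
As a rough illustration of those two steps, here is a minimal sketch in plain C++ that works on a row-major buffer of floats rather than on an OpenCV image, so it can be read without any library setup. The names Peak, findMostSalient and toUint8 are hypothetical, and stretching the minimum and maximum values to 0..255 is an assumption, since the value range of the saliency image is not stated here. With OpenCV's C API, cvMinMaxLoc() and cvConvertScale() should be able to do the equivalent work directly on an IplImage.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Location and value of the most salient pixel (hypothetical helper type).
struct Peak {
    int x;
    int y;
    float value;
};

// Finds the pixel with the highest saliency in a row-major float image.
// If several pixels share the maximum value, the first one in scan order wins.
Peak findMostSalient(const std::vector<float>& img, int width, int height) {
    assert(width > 0 && height > 0);
    assert(img.size() == static_cast<std::size_t>(width) * static_cast<std::size_t>(height));
    const std::size_t best = static_cast<std::size_t>(
        std::max_element(img.begin(), img.end()) - img.begin());
    const std::size_t w = static_cast<std::size_t>(width);
    return Peak{static_cast<int>(best % w), static_cast<int>(best / w), img[best]};
}

// Converts a float image to 8 bits per pixel by linearly stretching
// [min, max] to [0, 255]. A completely flat image maps to all zeros.
std::vector<std::uint8_t> toUint8(const std::vector<float>& img) {
    assert(!img.empty());
    const auto mm = std::minmax_element(img.begin(), img.end());
    const float lo = *mm.first;
    const float range = *mm.second - lo;
    std::vector<std::uint8_t> out(img.size(), 0);
    if (range <= 0.0f) {
        return out;
    }
    for (std::size_t i = 0; i < img.size(); ++i) {
        // Adding 0.5 rounds to the nearest integer before truncation.
        out[i] = static_cast<std::uint8_t>((img[i] - lo) / range * 255.0f + 0.5f);
    }
    return out;
}
```

The position returned by findMostSalient() corresponds to the blue dot in the demo above, and the 8-bit buffer is the kind of image you would display as the black-to-white saliency map.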