Work by the (primarily) JHU group on range-based selection and saliency

The current saliency toolbox (by Laurent Itti) only includes bottom-up processing for selecting objects based on orientation, intensity, and color. However, goal-based navigation of visual scenes involves more specific information about the object, such as its expected size in the visual field given its distance. For example, when we are actively searching for a mug in a scene, we expect the mug to subtend about 5 degrees when it is 2 meters from us, about 2 degrees when it is 10 meters from us, and to be invisible when it is more than 100 meters away. Hence, including depth (distance from the camera) information in the saliency map can significantly accelerate the visual search by constricting it to relevant parts of the image.

Here, we have used a LIDAR camera (SwissRanger SR4000) in conjunction with a webcam to acquire images of indoor scenes. The LIDAR camera measures the distance of objects from the camera via the phase delay between a transmitted and a received beam of infrared light. We set the LIDAR modulation frequency to 15 MHz, corresponding to a 10 m range (15 mm resolution). The LIDAR images were acquired using MATLAB. Figure 1 below shows the image captured by the webcam (left) as well as the LIDAR image of the scene (right).
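The 10 m figure follows from the phase-based time-of-flight principle: the unambiguous range is c/(2*f_mod), and the measured phase delay maps linearly onto depth within that range. A minimal MATLAB sketch of this relation is given below; the variable names are our own illustration, not part of the camera's API:

    c     = 3e8;                      % speed of light (m/s)
    f_mod = 15e6;                     % modulation frequency set on the SR4000 (Hz)

    rangeMax = c / (2*f_mod);         % unambiguous range: 10 m at 15 MHz

    phi   = pi/2;                     % example measured phase delay (rad)
    depth = c*phi / (4*pi*f_mod);     % corresponding depth: 2.5 m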


Figure 1: (left) An image of an indoor scene taken by the webcam. (right) Simultaneous LIDAR image acquired by the SwissRanger SR4000 camera. Hotter colors correspond to longer distances.

The model

The objective of this project is to selectively navigate specific parts of the image using the depth information. We have used both the Saliency Toolbox (by Laurent Itti) and the saliency model presented by Russel et al. (https://neuromorphs.net/nm/wiki/2010/att10/ProtoObjects). Since the latter model has superior performance, we incorporated the depth information into it. Without the depth information, the model only focuses on a few objects in a random order.


Figure 2: Salient features obtained by the traditional bottom-up saliency map. The map can only extract a few objects, in a random order.

We incorporate the LIDAR information into the saliency model as follows. First, we select the depth range in which we are going to look for features. We then convert the LIDAR image into a binary image according to this range (1 inside the selected range and 0 outside it). Figure 3 below shows the block diagram of the resulting model.

Figure 3: Block diagram of the proposed range-based saliency model. The LIDAR image is converted into a binary matrix based on the selected range and then multiplied by the grouping map obtained from the saliency model. Only features in the selected range can become salient objects.

For example, if we are looking in the 2 to 2.2 m range, we convert the LIDAR depth matrix W so that

W_ij = 0   if W_ij < 2 or W_ij > 2.2

W_ij = 1   otherwise

Now we simply multiply this matrix, W, elementwise by the saliency map matrix S to obtain the new saliency map:

S'_ij = W_ij × S_ij
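A minimal MATLAB sketch of this range-gating step is shown below. The variable names (lidarDepth for the LIDAR depth image and groupingMap for the map produced by the saliency model) are placeholders of ours, and the two images are assumed to be co-registered and of equal size:

    D = lidarDepth;                       % LIDAR depth image (meters)
    S = groupingMap;                      % grouping/saliency map, same size as D

    rMin = 2.0;  rMax = 2.2;              % selected range (m)
    W = double(D >= rMin & D <= rMax);    % binary range mask

    Snew = W .* S;                        % saliency restricted to the selected range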
The toolbox now identifies salient objects only in the selected distance range.

Figure 4: Selective range-based saliency maps for three different ranges: (left) 0.7-1 m, (middle) 2-2.2 m, and (right) 3-3.5 m. Only the objects in the selected range are chosen.

Future work
Here, we only derived a binary weighting from the LIDAR information. In practice, we can emphasize any depth in the image by giving more weight to that specific range. Moreover, the LIDAR has a limited range (10 m) and only works well for indoor scenes. Instruments with a longer range are needed to acquire depth data from outdoor scenes.
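One possible soft weighting is a Gaussian profile centered on a preferred depth, sketched below in MATLAB. The target depth and width are illustrative parameters, not values used in this project; D and S are the depth image and saliency map as above:

    dTarget = 2.1;                                % preferred depth (m), illustrative
    sigma   = 0.2;                                % tolerance around that depth (m)

    W = exp(-(D - dTarget).^2 / (2*sigma^2));     % per-pixel Gaussian depth weight
    Snew = W .* S;                                % depth-weighted saliency map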


Andreas Andreou, Ernst Niebur, Jeremy Wolfe, Ralph Etienne-Cummings, Alex Russel, Mohsen Mollazadeh