Using information about target distance for attentional selection

The saliency map approach (Niebur & Koch, 1996; Itti et al., 1998) models bottom-up attentional selection based on orientation, intensity, and color (other versions include additional feature dimensions in the computation of the saliency map). It is also possible to include top-down feature information, e.g. the color of a visual target.

In this project, we use information about the size of perceptual objects. If the physical size of an object is known (at least approximately), as well as its distance from the observer, this information can be used for more efficient search. For example, when we actively search for a mug in a scene, we expect it to subtend about 5 degrees of visual angle when it is 2 meters from us, about 2 degrees at 10 meters, and to be essentially invisible beyond 100 meters. Including distance information in the saliency map can therefore accelerate visual search by directing it to relevant parts of the image and excluding parts of the visual scene where search is unlikely to succeed. More generally, it allows us to constrain the search to selected parts of the scene and to objects of the appropriate angular size in a given part of the scene.
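The relation between physical size, distance, and visual angle used in this example can be sketched as follows (the 0.18 m mug height is an illustrative assumption, not a measured value from this project):

```python
import math

def angular_size_deg(size_m, distance_m):
    """Visual angle (degrees) subtended by an object of a given physical size
    at a given distance: theta = 2 * atan(size / (2 * distance))."""
    return math.degrees(2.0 * math.atan(size_m / (2.0 * distance_m)))

# The subtended angle falls off roughly as 1/distance for distant objects.
for d in (2.0, 10.0, 100.0):
    print(f"{d:6.0f} m -> {angular_size_deg(0.18, d):.2f} deg")
```

A search process with access to depth can thus rule out image regions where the expected angular size falls below the resolution of the feature pyramid.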

The primate visual system uses a variety of cues to estimate the distance to an object, for instance disparity, intensity, and shading. Our interest here is not to study these mechanisms in detail. Instead, we assume that the system has somehow obtained an estimate of the distance, and we are interested in how that estimate can be used for attentional selection. We therefore use a LIDAR range scanner to obtain distance information, which is used in conjunction with a webcam that acquires images of indoor scenes. The LIDAR camera (Swissranger SR4000) measures the distance of objects from the camera via the phase delay between transmitted and received infrared light. We set the LIDAR modulation frequency to 15 MHz, resulting in a 10 m range (15 mm resolution). The LIDAR images were acquired using MATLAB.
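The 10 m figure follows from the phase-measurement principle: the measured phase wraps after one modulation wavelength of round-trip travel, so the unambiguous range is c/(2f). A quick check (standard time-of-flight physics, not vendor-specific code):

```python
# Unambiguous range of a phase-based time-of-flight camera: the phase delay
# wraps once the round-trip path exceeds one modulation wavelength, so the
# maximum one-way distance that can be measured unambiguously is c / (2 f).
C = 299_792_458.0  # speed of light, m/s

def unambiguous_range_m(mod_freq_hz):
    return C / (2.0 * mod_freq_hz)

print(unambiguous_range_m(15e6))  # ~10 m at the 15 MHz setting used here
```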

Figure 1 below shows the image captured by the webcam (left) as well as the LIDAR image of the same scene (right).

Figure 1: (left) An image of an indoor scene taken by the webcam. (right) Simultaneous LIDAR image acquired by the Swissranger SR4000 camera. Warmer colors correspond to larger distances. The distance scale (right) is in meters.

The model

We can now select specific parts of the image using the distance information, which is integrated into the (proto-)object oriented saliency map model described elsewhere, see https://neuromorphs.net/nm/wiki/2010/results/att10/ProtoObjects. (A preliminary version was implemented using the Saliency Toolbox, www.saliencytoolbox.net, but its results were inferior and we only show results obtained with the proto-object based saliency map model.) Figure 2 shows the selection of salient areas without the depth information.

Figure 2: Salient locations obtained by the traditional bottom-up saliency map. Selected regions are circled in red and subsequent fixations are indicated by black arrows. Objects are selected without regard to distance.

Distance information is integrated into the saliency map as follows. First, we select the target distance range. This range is used to convert the LIDAR image to a binary image, which is set to 1 within the selected range and to 0 outside it. Figure 3 below shows the block diagram of the combined model.

Figure 3: Block diagram of the saliency map with integrated distance information. The LIDAR image is converted to a binary (0,1) matrix based on the selected range and then multiplied by the saliency map (our saliency map includes grouping information, see text). The Winner-Take-All mechanism which selects attended areas then only operates in the selected range.

For example, if we are looking in the 2 m to 2.2 m range, we convert the LIDAR depth matrix D into a binary mask W such that

Wij = 1 if 2 ≤ Dij ≤ 2.2

Wij = 0 otherwise

Now we simply multiply this mask, W, elementwise by the matrix S representing the original saliency map to obtain the distance-weighted saliency map

S'ij = Wij Sij

and salient objects are now only selected in the chosen distance range, see Figure 4.

Figure 4: Range-based selective saliency map for three different ranges: (left) 0.7-1 m, (middle) 2-2.2 m, and (right) 3-3.5 m. Only the objects in the selected range are chosen.
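This range gating can be sketched with NumPy as follows (array shapes and values are a toy illustration; in the actual system the mask is applied to the proto-object saliency map):

```python
import numpy as np

def range_gated_saliency(depth, saliency, d_min, d_max):
    """Zero out saliency at pixels whose depth lies outside [d_min, d_max]."""
    mask = ((depth >= d_min) & (depth <= d_max)).astype(saliency.dtype)
    return mask * saliency

# Toy 2x2 example: a depth map (meters) and a matching saliency map.
depth = np.array([[1.90, 2.10],
                  [2.15, 2.50]])
saliency = np.array([[0.8, 0.6],
                     [0.4, 0.9]])

gated = range_gated_saliency(depth, saliency, 2.0, 2.2)
# Winner-Take-All now only considers pixels in the 2-2.2 m range.
winner = np.unravel_index(np.argmax(gated), gated.shape)
```

Note that the most salient pixel overall (saliency 0.9 at 2.5 m) is outside the selected range, so the Winner-Take-All stage selects a different location after gating.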

Future work

So far, we have only employed a binary weighting of the LIDAR information. This can be generalized if more specific information is available. For instance, the likelihood of detecting an object of a given size (and possibly other visual properties) as a function of distance from the observer can be used as a weight. Furthermore, it may be possible to selectively search only for objects in a specific range of visual angles, which corresponds to different levels of the pyramid structure used in the saliency map construction (thus, which level to choose would depend on the distance between object and observer).
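One possible form of such a graded weighting is sketched below as a Gaussian centered on the expected target distance; the Gaussian shape and its width are illustrative assumptions, not part of the implemented model:

```python
import numpy as np

def graded_distance_weight(depth, d_expected, sigma):
    """Smooth weight that peaks at the expected target distance and falls
    off with distance mismatch, replacing the hard 0/1 range mask."""
    return np.exp(-0.5 * ((depth - d_expected) / sigma) ** 2)

depth = np.array([1.0, 2.0, 3.0, 5.0])  # pixel depths in meters
w = graded_distance_weight(depth, d_expected=2.0, sigma=0.5)
```

The weight array `w` would then multiply the saliency map in place of the binary mask, penalizing rather than excluding locations whose depth deviates from the expected target distance.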

A limitation is that the LIDAR scanner we used has a limited range (10 m) and only works well for indoor scenes. Instruments with a larger range are needed to acquire distance data from outdoor scenes.


Andreas Andreou, Ernst Niebur, Jeremy Wolfe, Ralph Etienne-Cummings, Alex Russel, Mohsen Mollazadeh


Itti, L., Koch, C. and Niebur, E. (1998). A model of saliency-based visual attention for rapid scene analysis. IEEE Trans. PAMI 20(11):1254-1259.

Niebur, E. and Koch, C. (1996). Control of selective visual attention: modeling the 'where' pathway. Advances in Neural Information Processing Systems 8:802-808.