Spike-Based Cognition in Active Neuromorphic Systems

Members: Jorg Conradt, Garrick Orchard

Organizers: Ryad Benjamin Benosman (UPMC, Institut de la Vision), Cornelia Fermuller (U Maryland), Garrick Orchard (National University of Singapore), Michele Rucci (Boston University), Bert Shi (Hong Kong University of Science and Technology)

Invitees: Rodrigo Alvarez (IBM Research), Marco Antonelli (Hong Kong University of Science and Technology), Andrew Cassidy (IBM Research), Daniel Lee (University of Pennsylvania), Chen Song (Tsinghua University)

Focus and Goals

The process of cognition is often thought to take place purely within the brain, where a representation of the external environment is formed and then used to plan behavioral strategies that are implemented by the body. This “feedforward model” has recently been challenged by the notion of embodied cognition, which posits that the body of an agent is an important component of cognition. This viewpoint raises many questions; this topic area focuses on two:

  • How do sensorimotor contingencies encode cognitive concepts, and how can these be learned in an unsupervised way?
  • How can we exploit the dynamics of the physical body as a potential substrate for computation?

Both of these tasks rely upon an understanding of the dynamical behavior of the agent’s body and the environment, so real-time processing is critical. Event-based retinas take full advantage of the dynamic characteristics of visual scenes and introduce time as a major source of information for processing visual data. Although great progress has been made in the development of new sensory hardware, and the understanding of biological vision has greatly improved in the past decades, we still lack efficient bio-inspired algorithms that can perform complex cognitive tasks directly on the event-stream output of such sensors. At first glance, this task might appear even more difficult for an active observer, since events may be generated by changes in the input induced by self-motion as well as by the independent motion of external objects in the environment. On the other hand, it has been demonstrated that an active agent can actually simplify visual computations through appropriate behavior (Ref. 1-2). However, this idea has yet to be fully exploited within the context of neuromorphic cognitive systems. We believe that algorithms based upon the concept of embodied cognition will provide an excellent platform for fully exploiting the dynamic information available from event-based retinas.

We will exploit recent developments in event-based visual processing and control strategies, which show that a promising new methodology for cognition can emerge when machine learning and probabilistic inference techniques are combined with the spike-based framework (Ref. 3-5). We will also exploit recent work combining generative models, which learn an efficient representation of sensory inputs, with reinforcement learning of active control policies that allow the generative model to function most efficiently, in order to investigate the unsupervised learning of sensorimotor contingencies (Ref. 6). We will address this problem in the context of challenging real-world tasks, such as stereovision, visual tracking, movement classification, robot localization and object detection/recognition.

The main idea of the projects is to offer the possibility of tackling cognitive problems which depend upon the dynamic nature of the environment, the body and the interaction between the two. The projects will be based around two hardware platforms. The first consists of an active binocular vision system equipped with Dynamic Vision Sensors (DVS) or spike-based ATIS sensors. The second consists of an aerial drone carrying DVS or ATIS sensors. These will form the basis of projects in which participants seek to infer information about the environment from spike trains generated by the motion of the agent in the environment, coupled with knowledge of self-motion from encoders or inertial measurement units mounted on the physical hardware.

We will also provide tutorials to introduce a number of existing approaches for event-based feature detection, stereovision, physical computation, optical flow, shape tracking and spike-based algorithms for learning and Bayesian inference, generative models for learning neural representations of sensory input, and reinforcement learning.


The projects will focus on two themes: 1.) developing algorithms for robots to perform basic visual navigation, and 2.) active perception: studying the advantages of taking actions for visual perception.

1.) Visual Navigation

Systems that move need a set of basic processes to build essential representations of the environment. These processes include: image motion estimation, egomotion estimation (computing their own 3D motion), detecting independently moving objects, navigating in the presence of visual obstacles, and building spatial representations of the scene. We will implement algorithms for these tasks on a TurtleBot robot and a drone.

Subprojects include:

Ego-motion estimation

Egomotion estimation is the task of determining the motion of oneself relative to the world (in our case, the motion of the camera relative to the world). There are two main constraints available for egomotion estimation. The approach commonly used in computer vision relies on the so-called ‘epipolar constraint’: exact image motion estimates (optical flow) or feature-point correspondences are related through linear equations to functions of the motion parameters, from which the three rotation parameters and the two parameters of the translation direction are computed. However, the DVS does not yield very accurate flow estimates; in particular, the magnitude of the optic flow is difficult to obtain, although we may be able to compute a sparse set of good correspondences at object corners. A second constraint uses the sign of the normal flow (the flow component perpendicular to edges). Although less powerful, this constraint is very robust and provides sets of candidates for the rotation and translation. We plan to explore ways of combining the robust normal flow constraints with a possibly sparse set of feature correspondences. In addition, we will explore how to combine these constraints with estimates from the IMU, which is expected to give good initial estimates of the rotation. Furthermore, as shown in previous studies, egomotion estimation is much easier and more stable when the field of view is large, so we will use multiple DVS (ATIS, DAVIS) sensors, calibrated and arranged on a spherical surface.
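As an illustration of how these constraints might be combined, here is a minimal sketch (not a project deliverable): it removes the IMU-estimated rotational component from sparse flow vectors under one common sign convention for the motion field, and then solves a linear least-squares problem for the focus of expansion, i.e. the direction of translation. It assumes normalized image coordinates and perfect calibration; all names and parameters are illustrative.

```python
import numpy as np

def derotate_flow(pts, flow, omega):
    """Subtract the rotational component of the motion field at each image point.

    pts:   (N, 2) normalized image coordinates (focal length = 1)
    flow:  (N, 2) measured optic flow
    omega: (3,)   angular velocity from the IMU (rad/s)
    """
    x, y = pts[:, 0], pts[:, 1]
    wx, wy, wz = omega
    u_rot = x * y * wx - (1.0 + x**2) * wy + y * wz
    v_rot = (1.0 + y**2) * wx - x * y * wy - x * wz
    return flow - np.stack([u_rot, v_rot], axis=1)

def estimate_foe(pts, flow_t):
    """Least-squares focus of expansion: purely translational flow is radial from the FOE,
    so each flow vector gives one linear constraint  v*FOE_x - u*FOE_y = v*x - u*y."""
    u, v = flow_t[:, 0], flow_t[:, 1]
    x, y = pts[:, 0], pts[:, 1]
    A = np.stack([v, -u], axis=1)
    b = v * x - u * y
    foe, *_ = np.linalg.lstsq(A, b, rcond=None)
    return foe  # (FOE_x, FOE_y) in normalized image coordinates
```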


Detection of independently moving objects

Using image motion, a moving system can separate a scene into surfaces at different depths, because the image motion differs for different depth values. However, the DVS does not provide dense image motion: there is signal at object boundaries, at strong textures, and due to noise. The problem thus becomes one of clustering the different image motion values and filling in information where there is no signal. The more difficult task for a moving system is to separate the scene into static background and independently moving objects; this is the problem of “independent moving object detection.” Since the values of the motion field are due to both the 3D motion and the depth of the scene, solving this problem also requires some knowledge of the system’s own motion. Using inertial measurements, we can obtain approximate values for the 3D motion, sufficient to detect many moving objects. For more challenging scenarios, segmentation, 3D motion estimation, and ultimately image motion estimation are coupled and need to be addressed together.
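As a rough illustration of the simple case, the sketch below flags flow vectors that are inconsistent with pure camera translation: after derotation (e.g. using the IMU) and estimation of the focus of expansion, vectors that do not point radially away from it are candidate independently moving points. The threshold and variable names are illustrative assumptions, not part of the project specification.

```python
import numpy as np

def flag_independent_motion(pts, flow_t, foe, angle_thresh_deg=30.0):
    """Return a boolean mask of derotated flow vectors whose direction deviates
    from the radial direction out of the focus of expansion by more than the threshold."""
    radial = pts - foe                                     # expected direction under pure translation
    radial /= np.linalg.norm(radial, axis=1, keepdims=True) + 1e-9
    unit_flow = flow_t / (np.linalg.norm(flow_t, axis=1, keepdims=True) + 1e-9)
    cos_dev = np.sum(radial * unit_flow, axis=1)           # cosine of the angular deviation
    return cos_dev < np.cos(np.deg2rad(angle_thresh_deg))
```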

Localization and Automatic Path Planning

The main, overarching application will be a fast-moving robot that, on the basis of DVS input, navigates the room, moves between doorways, or flies through windows. We intend to implement integrated vision-control solutions on the TurtleBot and/or a drone. Current approaches in the literature use motion-capture systems to monitor the position of flying drones; the approach here, instead, seeks an onboard solution, with the DVS signal used directly to control the drone. The problem becomes one of visual servoing: the robot needs to learn to distinguish features of multiple locations (e.g. the lines and corners of a window) and use these DVS features to compute the 3D motion of the robot.
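As a deliberately simplified picture of what such a visual-servoing loop might look like, the sketch below computes a proportional steering command from the image position and apparent size of a tracked feature such as a window. The tracker itself, the gains and the target size are all illustrative assumptions.

```python
def servo_command(feature_x, feature_size, target_size=0.4,
                  k_yaw=1.5, k_fwd=0.8):
    """Image-based proportional controller.

    feature_x:    horizontal position of the tracked feature in [-1, 1]
    feature_size: apparent size of the feature (a proxy for inverse distance)
    Returns (yaw_rate, forward_speed) commands for the robot or drone.
    """
    yaw_rate = -k_yaw * feature_x                          # turn to centre the feature
    forward_speed = k_fwd * (target_size - feature_size)   # approach until the feature is large enough
    return yaw_rate, forward_speed
```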

Object recognition

Recently, several algorithms have been proposed for learning features from event-based vision streams collected from jittering dynamic retinas (e.g. HOTS and the spike-based GASSOM). This project seeks to investigate efficient algorithms and their possible implementations on the IBM TrueNorth architecture. In related work on using spike-based computation to recognize objects in images, a recent arXiv paper from IBM shows how TrueNorth can be used to implement convolutional neural networks. Tasks to be studied include the following (a minimal time-surface sketch appears after the list):

  • Investigating spike-based object recognition models and their implementation on TrueNorth
  • Developing visual tracking strategies to allow the iCub to follow objects; optical flow can be used to separate the background from moving objects when performing saccades
  • Learning models for object recognition, based on spatial relationships between identified features and their motion in time
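As a minimal illustration of the kind of event-driven feature used by HOTS-style approaches, the sketch below computes a time surface around each incoming event: a local patch of exponentially decayed timestamps that could serve as input to a classifier. The patch radius and time constant are illustrative choices, and this is not the TrueNorth implementation itself.

```python
import numpy as np

def time_surfaces(events, width, height, radius=4, tau=50e-3):
    """Yield one local time-surface patch per event.

    events: iterable of (t, x, y, polarity) tuples, with t in seconds.
    """
    last_t = np.full((height, width), -np.inf)   # timestamp of the most recent event per pixel
    for t, x, y, _pol in events:
        last_t[y, x] = t
        y0, y1 = max(0, y - radius), min(height, y + radius + 1)
        x0, x1 = max(0, x - radius), min(width, x + radius + 1)
        # Recent events map to values near 1, old ones decay towards 0.
        patch = np.exp((last_t[y0:y1, x0:x1] - t) / tau)
        yield patch
```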

2.) Active Perception

Humans (and animals) do not usually perceive images “passively.” We are always involved in tasks, and vision serves the actions we are engaged in. At the high level we plan where to look and what to process next. At the lower level, active head and eye movements can serve the vision processes to facilitate perception. In this project we will look at the functional roles of eye movements and how to learn basic visual-motor processes using as input the transient image signal (simulated by DVS data). Subprojects include:

Studying Eye Movements

There is a growing body of research suggesting that small movements made by our eyes during fixation play an important role in perception. For more reading on this topic, see the Vision Research special issue on fixational eye movements and perception. We will address questions such as:

  • Active learning: performing saccades and head rotations to resolve ambiguities about object or location identities
  • Dynamics of visual perception after saccades
  • Comparing DVS spiking output with retinal models of different species
  • Building a predictive model of sensory input based on recent history and knowledge of camera motion (a minimal sketch follows this list)
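As one possible starting point for the last item, the sketch below predicts the next accumulated event image from the current one plus the rotation reported by an IMU (or by encoders on the pan-tilt unit), using the homography induced by a small pure rotation. The calibration matrix, the small-angle approximation and the pure-rotation model are illustrative assumptions.

```python
import numpy as np

def predict_event_image(event_img, K, omega, dt):
    """Warp an accumulated event image by the small rotation omega*dt (pure-rotation model)."""
    wx, wy, wz = np.asarray(omega, dtype=float) * dt
    R = np.array([[1.0, -wz,  wy],
                  [ wz, 1.0, -wx],
                  [-wy,  wx, 1.0]])                 # first-order (small-angle) rotation matrix
    H = K @ R @ np.linalg.inv(K)                    # homography induced by a pure rotation
    h, w = event_img.shape
    ys, xs = np.mgrid[0:h, 0:w]
    pts = np.stack([xs.ravel(), ys.ravel(), np.ones(h * w)])
    warped = H @ pts
    xw = np.clip(np.round(warped[0] / warped[2]).astype(int), 0, w - 1)
    yw = np.clip(np.round(warped[1] / warped[2]).astype(int), 0, h - 1)
    pred = np.zeros_like(event_img)
    pred[yw, xw] = event_img[ys.ravel(), xs.ravel()]  # forward-scatter the current events
    return pred
```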

Depth Perception

Knowledge of the structure of a scene (i.e. depth) can be very useful for a robot, especially for tasks such as path planning. Although many cues can be used to infer depth (size, occlusion, perspective, etc.), the most common methods rely either on stereo (two cameras) or on motion parallax. Subprojects include (a minimal depth-from-disparity sketch follows the list):

  • Developing stereovision strategies that allow the active vision head to verge on a presented object, as a way of learning the sensorimotor contingency between visual disparity and vergence control, which is a correlate of depth in the environment
  • How can stereoscopic input be interpreted when the two eyes move independently?
  • Monocular depth perception using motion information about the camera obtained from an IMU
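For the stereo route, the underlying geometry is the familiar relation Z = f·B/d between disparity and depth. A minimal sketch is given below; the focal length and baseline are illustrative parameters, and event-based disparity estimation itself is the actual research question.

```python
import numpy as np

def depth_from_disparity(disparity_px, focal_px, baseline_m):
    """Depth Z = f * B / d for a rectified stereo pair; invalid disparities map to NaN.

    disparity_px: array of horizontal disparities in pixels
    focal_px:     focal length in pixels
    baseline_m:   stereo baseline in metres
    """
    d = np.asarray(disparity_px, dtype=float)
    z = np.full_like(d, np.nan)
    valid = d > 0
    z[valid] = focal_px * baseline_m / d[valid]
    return z
```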

Saliency and Attention

Subprojects include:

  • Learning salient visual features (e.g. edges, corners, ...) of objects from event-based input (a simple event-density baseline is sketched after this list)
  • Integrating attention into a predictive model of the sensory input
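As a trivially simple baseline for saliency on event data, the sketch below maintains a leaky map of recent event counts, so image regions that generate many events (edges, moving objects) stand out. The decay constant is an illustrative choice; learned features or attention models would replace this in the actual project.

```python
import numpy as np

def update_saliency(saliency, events_xy, decay=0.95):
    """Update a leaky event-count map in place.

    saliency:  (H, W) float array
    events_xy: (N, 2) integer pixel coordinates (x, y) of new events
    """
    saliency *= decay                                             # forget old activity
    np.add.at(saliency, (events_xy[:, 1], events_xy[:, 0]), 1.0)  # accumulate new events
    return saliency
```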


Silicon Retina

The silicon retina is a biologically inspired vision sensor on which we will rely for some of the projects. It comes in several 'flavours' (e.g. DVS, ATIS, DAVIS).

For more detailed information, this paper gives a good overview:

  • Posch, C., Serrano-Gotarredona, T., Linares-Barranco, B., & Delbruck, T. (2014). Retinomorphic event-based vision sensors: Bioinspired cameras with spiking output. Proceedings of the IEEE, 102(10), 1470–1484. http://doi.org/10.1109/JPROC.2014.2346153

IBM TrueNorth

IBM's TrueNorth is a neural processor. Each TrueNorth chip is capable of simulating over one million neurons in real time at millisecond precision. Some members of the IBM team will be present at the workshop to support the hardware. You can read more about it on the IBM website, or take a look at the article below, which made the cover of Science:

  • Merolla, P. A., Arthur, J. V., Alvarez-Icaza, R., Cassidy, A. S., Sawada, J., Akopyan, F., … Modha, D. S. (2014). A million spiking-neuron integrated circuit with a scalable communication network and interface. Science, 345(6197), 668–673. http://doi.org/10.1126/science.1254642

Or watch the IBM SyNAPSE Deep Dive video series on YouTube.

Pan-tilt unit

We will bring a pan-tilt unit which allows us to precisely control the motion of a vision sensor as we investigate the visual sensorimotor loop.

Ground Vehicles

We will provide a TurtleBot ground robot as the mobile platform for the visual navigation projects.

Aerial Vehicles

We will provide an aerial drone carrying DVS or ATIS sensors for the navigation and visual servoing projects.

Background Reading Material

Attached to this page (at the bottom) you will find some papers with useful background reading. This section contains a brief description of these papers.

Spiking_GASSOM_v1.6.pdf - paper by Chandrapala and Shi describing an architecture for learning invariant features from a jittering retina following a fixational eye movement model

Antonelli_ICRA_2014.pdf - paper by Antonelli, del Pobil and Rucci describing a monocular approach for computing a dense depth map of the scene during fixational head/eye movements.

Antonelli_IROS_2016.pdf - paper by Antonelli, Rucci and Shi describing an architecture for the unsupervised learning of depth-tuned cells during the execution of fixational head/eye movements.

Countourmotion-dvs-final.pdf - paper by Barranco, Fermuller and Aloimonos on contour motion estimation from DVS cameras

Some papers on small eye movements (fixational eye movements)

These are the eye movements (microsaccades, drifts, and tremor) that humans incessantly perform during the fixation periods in which information is acquired.

For a very brief, general introduction to these movements, see: M. Rucci, P. V. McGraw and R. J. Krauzlis (2016), Fixational eye movements and perception, Vision Research, 118:1-4.

1. Some theoretical ideas and computational mechanisms

These are some of the computational ideas that have been driving our experimental work.

M. Rucci and J. D. Victor (2015). The unsteady eye: An information processing stage, not a bug, Trends in Neurosciences, 38(4):195-206. Some ideas on how humans encode information with jittering eyes. The main idea is that spatial information is encoded into temporal modulations.

X. Kuang, M. Poletti, J. D. Victor and M. Rucci (2012), Temporal encoding of spatial information during active visual fixation, Current Biology, 22(6), 510-514. Drifts and tremor whiten the input to the retina during viewing of natural scenes: fixational eye movements are tuned to the characteristics of the natural world, so the resulting visual input does not follow the 1/k² spectrum of natural scenes.

2. Experimental support

There is a growing body of evidence showing that these movements are useful. Here is an example:

M. Rucci, R. Iovin, M. Poletti and F. Santini (2007), Miniature eye movements enhance fine spatial detail, Nature, 447(7146), 851-854. A psychophysical study supporting the idea that humans encode space in time by means of eye movements.

3. Control of fixational eye movements

It is now clear that both microsaccades and drifts are controlled. Microsaccades are precisely controlled to position a preferred retinal locus within the foveola on the stimulus in high-acuity tasks. Drifts are controlled so that the input signal on the retina has specific characteristics (a sort of Brownian motion with a fixed diffusion constant).

H.-K. Ko, M. Poletti and M. Rucci (2010), Microsaccades precisely relocate gaze in a high visual acuity task, Nature Neuroscience, 13, 1549-1553.

M. Poletti, M. Aytekin and M. Rucci (2015), Head-eye coordination at a microscopic scale, Current Biology, 25(24):3253-3259.


References

1. J. Aloimonos, I. Weiss and A. Bandyopadhyay, "Active vision," Int. J. Comput. Vision, vol. 1, pp. 333-356, 1988.

2. D. H. Ballard, "Animate vision," Artif. Intell., vol. 48, pp. 57-86, 1991.

3. R. Benosman, S.-H. Ieng, P. Rogister and C. Posch, "Asynchronous event-based Hebbian epipolar geometry," IEEE Trans. Neural Networks, 2011.

4. R. Benosman, S.-H. Ieng, C. Clercq, C. Bartolozzi and M. Srinivasan, "Asynchronous frameless event-based optical flow," Neural Networks, 2011.

5. M. Pfeiffer, B. Nessler, R. J. Douglas and W. Maass, "Reward-modulated Hebbian learning of decision making," Neural Computation, 2010.

6. Y. Zhao, C. A. Rothkopf, J. Triesch and B. E. Shi, "A unified model of the joint development of disparity selectivity and vergence control," IEEE Int. Conf. on Development and Learning (ICDL), 2012.

7. H. Hauser, A. J. Ijspeert, R. M. Füchslin, R. Pfeifer and W. Maass, "Towards a theoretical foundation for morphological computation with compliant bodies," Biological Cybernetics, vol. 105, pp. 355-370, 2011.