Spike-Based Cognitive Computing: Seeing, Hearing, and Thinking with Spikes

Members: Arindam Basu, Andreas Andreou, Andrew Cassidy, Brandon Carroll, Bruno Umbria Pedroni, Yi Chen, Diederik Paul Moeys, Daniel Mendat, David Anderson, Emina Alickovic, Eric Hunsberger, Emre Neftci, Francisco Barranco, Cornelia Fermuller, Guillaume Garreau, Giovanny Sanchez-Rivera, Garrick Orchard, George Ritmiller, John Harris, Paul Isaacs, John Arthur, Jonathan Tapson, Jie Jack Zhang, Kaitlin Fair, Kate Fischl, Kan Li, Luca Longinotti, Michelle Collins, Mark Wang, Ernst Niebur, Peter Diehl, Paul Merolla, Philip Tully, Ralph Etienne-Cummings, Rodrigo Alvarez, Saeed Afshar, Sergio Davies, Shih-Chii Liu, Suraj Honnuraiah, Shashikant Koul, Soumyajit Mandal, Timmer Horiuchi, Tobi Delbruck, Zonghua Gu

Organizers: Arindam Basu (NTU), John Arthur (IBM Almaden)


Cameras and microphones are everywhere, capturing frames of pixels and streams of sound intensity at a data rate that we cannot process, store, or transmit. The conventional approach to this challenge is to store a fraction of the incoming data and process it later, after an important event occurs. This approach is reactive rather than proactive. Data is growing faster than processing, storage, and bandwidth, and we are falling further and further behind the deluge. This conventional approach is in stark contrast to the one taken by neurobiology, in which visual and auditory information is processed continuously, eliminating the need to store it in memory or transmit it over long distances. Events of interest are detected as they occur, allowing an animal to respond immediately, a crucial evolutionary advantage.

Machine learning methods running on traditional computing systems, including deep learning, have shown great promise in object recognition from images and in speech recognition, but networks on traditional hardware cannot keep up with the data rate. Further, these methods are inherently based on computing models that do not represent time explicitly, like artificial neural networks of earlier generations. The brain, on the other hand, uses spikes to communicate information, and it is increasingly believed that spike timing carries important information.

This workgroup will enable participants to use large-scale spiking neuromorphic systems to explore spike-based representations and their benefits for computing and communication. TrueNorth and Extreme Learning Machine chips will process visual and audio input from spike-based retinal and cochlear sensors, performing visual, auditory, and decision-making tasks in real time. All of the hardware will natively use spikes, allowing us to connect various blocks to compose complex systems, including higher-order decision-making systems (using, for example, the Neural Engineering Framework, Extreme Learning Machines, and/or Echo State Networks). We will explore bio-inspired spike-based computing, contrast it with traditional machine learning, and build hybrids of the two approaches.

Final Projects

Neural Networks for Natural Language Processing: using Word2Vec on TrueNorth

MNIST ATIS: Classify MNIST digits captured live with the ATIS retinal camera and processed on TrueNorth, using a network trained with Caffe

Speech Recognition using Cochlea data?

Speech Recognition using Cochlea+ELM IC

Speech Recognition using Cochlea+TrueNorth: Sparse Representations on Spikes

Pedestrian Dataset: Acquire and label a spiking dataset with the DAVIS retinal camera

Spiking Sensor Interface to TrueNorth via UDP: ATIS, DAVIS, Cochlea, Sonar, Radar?

Lip Reading via Sensory Fusion: Acquired a multi-modal dataset to improve speech recognition from a silicon cochlea using data from a silicon retina (DVS or DAVIS sensor)

Localization Using Remote Sensing: Localization of a mobile robot by using a spiking neural network to classify signals from sonar and radar sensors

Finite State Machines (FSMs) on TrueNorth

Mapping Neural Populations to Cores?

Projects

This is a list of project ideas. Feel free to suggest your own; we will support you with equipment, software, and expertise.
Broadly, we propose two projects: Vision and Audition (with sensory fusion).

Vision
We propose to tackle pedestrian detection from video input, covering three broad areas: camera input pre-processing, feature extraction, and classification. Several sub-projects under this umbrella could be:

  • Pre-process DVS inputs (e.g., spatio-temporal filters)
  • Feature generation (oriented filters, etc.)
  • Saliency --- find areas likely to have pedestrians (likely low resolution)
  • Classification --- pedestrian or not (higher resolution)
  • Stereo --- find objects in space using two DVS inputs or egomotion (if we can mount the DVS on something mobile)
  • Tracking --- track moving objects
  • Face detection --- find faces in scene
  • Face recognition --- identify faces (we could train on images from participants)
  • Collision detection --- predict whether pedestrians will collide with each other or with the camera
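As an illustration of the DVS pre-processing step, one common choice is a spatio-temporal correlation ("background activity") filter: an event is kept only if a neighbouring pixel fired shortly before it, since isolated events are usually sensor noise. The sketch below is a simple variant under stated assumptions: events arrive time-sorted as (t, x, y, polarity) tuples with microsecond timestamps, and the names Event, background_activity_filter, and dt_us are hypothetical, not part of jAER or any TrueNorth toolchain.

```python
from typing import List, NamedTuple


class Event(NamedTuple):
    """Hypothetical DVS event record."""
    t: int  # timestamp in microseconds
    x: int  # pixel column
    y: int  # pixel row
    p: int  # polarity: +1 (ON) or -1 (OFF)


def background_activity_filter(events: List[Event], width: int, height: int,
                               dt_us: int = 10_000) -> List[Event]:
    """Keep an event only if one of its 8 neighbours fired within dt_us before it.

    Assumes `events` is sorted by timestamp.
    """
    never = -(10 ** 18)  # sentinel: pixel has not fired yet
    last_t = [[never] * width for _ in range(height)]
    kept = []
    for ev in events:
        # Short-circuit `and` guards the bounds check before indexing last_t.
        supported = any(
            0 <= ev.x + dx < width and 0 <= ev.y + dy < height
            and ev.t - last_t[ev.y + dy][ev.x + dx] <= dt_us
            for dx in (-1, 0, 1) for dy in (-1, 0, 1)
            if (dx, dy) != (0, 0)
        )
        if supported:
            kept.append(ev)
        last_t[ev.y][ev.x] = ev.t  # remember this pixel's most recent event
    return kept
```

The window dt_us trades noise rejection against losing genuine events from slowly moving edges; a real deployment would tune it to scene speed.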

Audition (With Sensory Fusion)
We will attempt lip reading as an example of using sensory fusion to improve speech recognition. We can use the DVS and AEREAR to simultaneously record images of moving lips and the sound produced. A simple vocabulary, such as the ten digits (similar to TI Digits), will be used (the initial task can be binary classification). After creating this database, we can classify the image and sound modalities separately and then develop a strategy to combine these features in a single classifier. Eventually, we can corrupt the signal in one modality by adding noise and aim to maintain good performance.
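To build the partially corrupted part of such a database, one option is to add white Gaussian noise to the audio at a controlled signal-to-noise ratio, so that classifier performance can be reported against SNR. This is a minimal sketch assuming the corruption is applied to a sampled audio waveform (for example, before it is played to the cochlea); the function name add_noise_at_snr is hypothetical, and corrupting the DVS stream would need a different noise model (such as injecting random events).

```python
import numpy as np


def add_noise_at_snr(signal: np.ndarray, snr_db: float,
                     rng: np.random.Generator) -> np.ndarray:
    """Return signal + white Gaussian noise with 10*log10(P_signal / P_noise) = snr_db."""
    p_signal = np.mean(signal ** 2)               # average signal power
    p_noise = p_signal / (10.0 ** (snr_db / 10.0))
    noise = rng.normal(0.0, np.sqrt(p_noise), size=signal.shape)
    return signal + noise
```

Sweeping snr_db (e.g., from clean down to 0 dB) on one modality while leaving the other clean gives a direct measure of how much the fused classifier relies on each sensor.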

For classification, we will use advanced classifiers such as ELMs or SVMs and explore using HMMs to integrate their outputs over time. We will also try an alternative strategy based on sparse representations followed by a simple classifier. A dictionary can be trained to represent both signal types sparsely, using frames from the video and possibly spectrograms from the audio recordings. The signals can then be represented in the sparse domain relative to the trained dictionary elements, mimicking the receptive fields of biological systems, which are learned over time and made sparse via an inhibition-like operation. Classification can then be performed with a simple classifier in this sparse domain. Hence, several sub-projects under this umbrella could be:

  • Database creation, including partially corrupted samples
  • Single-modality (audio or video) classification using ELM/SVM on the ELM IC or TrueNorth, exploring HMMs to integrate the outputs of these classifiers over time
  • Single-modality (video or audio) classification in the sparse domain using a simple linear classifier
  • Fusing the two sources for enhanced classification using ELM/TrueNorth (as well as HMMs for integrating information over time)
  • Fusing the two sources for enhanced classification in the sparse domain using a simple linear classifier
    • Align data to a common frame before performing sparse coding (sensor registration)
    • Arrange data so sparse representations aren’t needed from each sensor simultaneously
    • Find a relationship matrix between audio and visual recordings (similar to low resolution to high resolution reconstruction)
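The sparse-domain pipeline above can be sketched in a few lines. The sketch assumes a dictionary D (one unit-norm atom per column) has already been trained, and computes codes with ISTA (iterative shrinkage-thresholding), a standard L1 sparse-coding solver swapped in here for illustration; the inhibition-like Locally Competitive Algorithm would be the more biologically motivated alternative. Fusion is shown as simple concatenation of audio and video codes, and the readout is a ridge-regression linear classifier. All function names (sparse_code, fuse_codes, train_linear_classifier, predict) are hypothetical.

```python
import numpy as np


def soft_threshold(v: np.ndarray, thresh: float) -> np.ndarray:
    """Shrink each entry toward zero by `thresh`; small entries become exactly 0."""
    return np.sign(v) * np.maximum(np.abs(v) - thresh, 0.0)


def sparse_code(x: np.ndarray, D: np.ndarray, lam: float = 0.05,
                n_iter: int = 500) -> np.ndarray:
    """ISTA: approximately solve min_a 0.5*||x - D a||^2 + lam*||a||_1."""
    step = 1.0 / np.linalg.norm(D, 2) ** 2  # 1 / Lipschitz constant of the gradient
    a = np.zeros(D.shape[1])
    for _ in range(n_iter):
        a = soft_threshold(a + step * D.T @ (x - D @ a), lam * step)
    return a


def fuse_codes(A_audio: np.ndarray, A_video: np.ndarray) -> np.ndarray:
    """Feature-level fusion: concatenate per-sample codes (rows = samples)."""
    return np.hstack([A_audio, A_video])


def train_linear_classifier(A: np.ndarray, labels: np.ndarray, n_classes: int,
                            ridge: float = 1e-3) -> np.ndarray:
    """Ridge-regression readout onto one-hot targets."""
    Y = np.eye(n_classes)[labels]
    return np.linalg.solve(A.T @ A + ridge * np.eye(A.shape[1]), A.T @ Y)


def predict(A: np.ndarray, W: np.ndarray) -> np.ndarray:
    return np.argmax(A @ W, axis=1)
```

Concatenation is the simplest fusion choice; the registration and relationship-matrix ideas in the list would replace fuse_codes with a learned mapping between the two modalities.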

Invited Guests

Tobi Delbruck (INI) Expertise: Event-based visual sensors (6/28 - 7/18)

David Anderson (Georgia Tech) Expertise: Audition

Paul Merolla (IBM) Expertise: TrueNorth (TN)

Andrew Cassidy (IBM) Expertise: TrueNorth (TN)

Rodrigo Alvarez (IBM) Expertise: TrueNorth (TN)

Yi Chen (NTU) Expertise: ELM IC

Saeed Afshar (UWS) Expertise: Spike-based learning algorithms

Lectures, Tutorials, and Slides

  • Talk on Random Projection Neural Networks by Jon Tapson and Arindam Basu on July 3. Abstract -- We will explain the advantages of these networks starting from the basics, show results of deep ELMs that need ~100X less training time than standard backprop, and present our ELM IC for decoding signals from implantable BMIs at sub-µW power levels.

Slides are here:  http://neuromorphs.net/nm/attachment/wiki/2015/scc15/random-projections.pdf
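The training-time advantage of random projection networks comes from the fact that only the output layer is learned: the input-to-hidden weights are random and fixed, so training reduces to one regularized least-squares solve instead of iterative backpropagation. The following is a minimal software sketch of a single-hidden-layer ELM under that assumption; the class name, tanh nonlinearity, and ridge parameter are illustrative choices, and it does not model how the ELM IC realizes its random weights in hardware.

```python
import numpy as np


class ELM:
    """Extreme Learning Machine: fixed random hidden layer + closed-form linear readout."""

    def __init__(self, n_inputs: int, n_hidden: int, n_classes: int,
                 ridge: float = 1e-3, seed: int = 0):
        rng = np.random.default_rng(seed)
        self.W_in = rng.normal(size=(n_inputs, n_hidden))  # random, never trained
        self.b = rng.normal(size=n_hidden)                  # random hidden biases
        self.ridge = ridge
        self.n_classes = n_classes
        self.W_out = None

    def _hidden(self, X: np.ndarray) -> np.ndarray:
        return np.tanh(X @ self.W_in + self.b)

    def fit(self, X: np.ndarray, y: np.ndarray) -> "ELM":
        H = self._hidden(X)
        T = np.eye(self.n_classes)[y]  # one-hot targets
        # Only the output weights are learned, via ridge regression.
        A = H.T @ H + self.ridge * np.eye(H.shape[1])
        self.W_out = np.linalg.solve(A, H.T @ T)
        return self

    def predict(self, X: np.ndarray) -> np.ndarray:
        return np.argmax(self._hidden(X) @ self.W_out, axis=1)
```

For example, ELM(2, 20, 2).fit(X, y) learns XOR from its four points, which a linear classifier on the raw inputs cannot do, because the random tanh features make the classes linearly separable.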

  • Talk on Silicon Retina Technology by Tobi Delbruck on July 3. Abstract -- Recent advances in event-based vision sensors based on DVS technology were presented.

Slides are here:  http://neuromorphs.net/nm/attachment/wiki/2015/scc15/Silicon%20retinas%20Telluride%202015.pdf
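The core DVS principle can be illustrated with an idealised per-pixel model of the kind used in DVS simulators: each pixel memorises its log intensity at its last event and emits an ON (+1) or OFF (-1) event whenever the current log intensity moves one contrast threshold away from that level. This sketch is an assumption-laden simplification (no refractory period, noise, or finite photoreceptor bandwidth, and events take the sample time rather than an interpolated timestamp); dvs_pixel_events and threshold are hypothetical names, not a sensor API.

```python
import math
from typing import List, Sequence, Tuple


def dvs_pixel_events(times: Sequence[float], intensities: Sequence[float],
                     threshold: float = 0.25) -> List[Tuple[float, int]]:
    """Emit (t, +1) / (t, -1) whenever log intensity changes by `threshold`
    relative to the level memorised at the previous event."""
    ref = math.log(intensities[0])  # memorised log intensity
    events = []
    for t, intensity in zip(times[1:], intensities[1:]):
        log_i = math.log(intensity)
        while log_i - ref >= threshold:  # brightness increased: ON events
            ref += threshold
            events.append((t, +1))
        while ref - log_i >= threshold:  # brightness decreased: OFF events
            ref -= threshold
            events.append((t, -1))
    return events
```

Because the pixel responds to log-intensity changes, a static scene produces no output, which is why DVS data rates scale with scene activity rather than frame rate.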

Hardware

  • Spiking Retinas
    • DVS
    • ATIS
  • Spiking Cochleas (AEREAR)
  • Extreme Learning Machine Chip
  • TrueNorth
  • FPGA

Software

  • ELM IC emulator
  • coming soon