Problem Description and Data Collection

Haptics relates to the perception and manipulation of objects through the senses of touch and proprioception. Humans develop haptic perception through everyday interaction with surrounding objects, and gradually learn to relate tactile feedback to visual perception. Specifically, adults develop the capability of hallucinating tactile feedback from visual input when seeing a certain configuration of a hand interacting with an object. For example, consider a scene with a human firmly grasping a paper cup: current computer vision systems can easily detect that there is one hand and one paper cup in the image. Through grasp type classification, the system can further recognize that the hand holds the paper cup with a cylindrical grasp. But humans can also estimate the tactile responses (or a force pattern) after seeing this scene, and can tell that the opposing forces applied to the paper cup may crush it. In other words, the capability of hallucinating force patterns from visual input is essential for a more detailed analysis of humans interacting with the surrounding physical world, beyond what current state-of-the-art vision systems provide.

The goal of this study is to develop a computer vision system that can estimate the corresponding force patterns on the fingers and palm from visual input. These estimated force responses, which mimic the function of mirror neurons, can further be used by an intelligent agent to 1) reason about the current physical interaction between the hand and the object under manipulation, and 2) predict the action consequences implied by the estimated force pattern.

Data Collection

In order to learn the mapping from image features to force responses, we need an instrument to collect force data. Thus, a glove was made with eleven force sensors attached to its surface: one on the palm and the other ten on the five fingers, two per finger, one at the fingertip and the other on the middle phalanx. The sensor locations are arranged to capture as many different grasp types as possible with a limited number of sensors. The figure below shows a photo of the force-sensing glove. Each sensor is a piezoresistive force sensor from Tekscan. Its voltage output is linear with respect to the force applied perpendicular to the sensor surface, and the accuracy claimed by the manufacturer is within 3%. The sensor is thin and flexible, which is desirable for our application. The sensors at the fingertips have a maximum measurement range of 4.4 N (1 lb); the sensors at the middle phalanges and on the palm have a maximum measurement range of 110 N (25 lb). The sensors on the fingers have a round sensing area 9.53 mm in diameter, and the one on the palm has a sensing area 2.54 cm in diameter. Each sensing area is treated as a single contact point.
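As a concrete illustration of the linear sensor model described above, the sketch below converts raw glove readings into newtons with a per-sensor gain. Only the linearity of the response and the 4.4 N / 110 N full-scale ranges come from the text; the ADC resolution, reference voltage, zero-load voltage, and sensor ordering are assumptions for illustration.

```python
import numpy as np

# Assumed sensor order: palm first, then (tip, middle phalanx) for each
# of the five fingers. Full-scale ranges are from the text: 4.4 N at the
# fingertips, 110 N at the middle phalanges and the palm.
FULL_SCALE_N = np.array([110.0] + [4.4, 110.0] * 5)  # 11 sensors

def voltage_to_force(adc_counts, v_ref=3.3, adc_max=1023,
                     v_zero=0.0, v_full=3.3):
    """Map raw ADC counts (shape [T, 11]) to forces in newtons,
    assuming a linear response between zero-load and full-scale voltage.
    All electrical parameters here are hypothetical defaults."""
    volts = adc_counts / adc_max * v_ref
    frac = np.clip((volts - v_zero) / (v_full - v_zero), 0.0, 1.0)
    return frac * FULL_SCALE_N  # broadcast over the 11 sensor channels
```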

To resolve the dilemma between manipulating with a bare hand (needed for visual feature extraction) and manipulating with a gloved hand (needed to collect force responses), we asked each human subject to perform the manipulation task synchronously with both hands, mirroring the action. Three subjects were asked to perform ten different manipulation tasks (five for each tool), with five repetitions each. From this data, a recurrent neural network (RNN) based regression model is trained to map visual features from a pre-trained convolutional neural network (CNN) to the force responses recorded by the glove.
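Because the video frames and the glove samples come from separate devices at different rates, training the regressor requires pairing each frame's CNN feature with a force reading. Below is a minimal alignment sketch, assuming timestamped streams and simple linear interpolation; the actual synchronization procedure is not specified in the text.

```python
import numpy as np

def resample_force_to_frames(force_t, force_vals, frame_t):
    """Linearly interpolate each of the 11 force channels at the video
    frame timestamps, so every CNN feature vector gets a force label.
    force_t: [N] sensor timestamps; force_vals: [N, 11]; frame_t: [M]."""
    return np.stack(
        [np.interp(frame_t, force_t, force_vals[:, c])
         for c in range(force_vals.shape[1])],
        axis=1)  # -> [M, 11]
```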

We collected force data from eight subjects with three objects: a cup, a sponge, and a spoon. Five affordances correspond to each object, as listed below.

The cup has the affordances: drink, transfer, pour, hit, and shake.

The sponge has the affordances: squeeze, flip, wash, wipe, and scratch.

The spoon has the affordances: scoop, hit, eat, stir, and sprinkle.
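For bookkeeping, the object-affordance pairs above can be kept in a small lookup table when labeling the recorded sequences; this is merely one convenient encoding, not a structure prescribed by the text.

```python
# Object-affordance pairs exactly as listed above.
AFFORDANCES = {
    "cup":    ["drink", "transfer", "pour", "hit", "shake"],
    "sponge": ["squeeze", "flip", "wash", "wipe", "scratch"],
    "spoon":  ["scoop", "hit", "eat", "stir", "sprinkle"],
}
```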

Figure: The force-sensing glove.


The model we use for force estimation has a similar structure to our action prediction model; in this task, we use the RNN model for regression.
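Since the text specifies only an RNN regressor over pre-trained CNN features, the following is a minimal sketch of such a model; the LSTM cell type, hidden size, feature dimension, and training details are all assumptions, not the authors' exact architecture.

```python
import torch
import torch.nn as nn

class ForceRNN(nn.Module):
    """Sequence regression from per-frame CNN features to the 11 force
    channels of the glove. Hyperparameters are illustrative only."""
    def __init__(self, feat_dim=4096, hidden=256, n_sensors=11):
        super().__init__()
        self.rnn = nn.LSTM(feat_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_sensors)

    def forward(self, feats):        # feats: [B, T, feat_dim]
        out, _ = self.rnn(feats)     # out:   [B, T, hidden]
        return self.head(out)        # force: [B, T, n_sensors]

# One training step with an L2 regression loss against glove recordings
# (dummy tensors stand in for real CNN features and force labels):
model = ForceRNN()
loss_fn = nn.MSELoss()
opt = torch.optim.Adam(model.parameters(), lr=1e-4)
feats = torch.randn(2, 30, 4096)    # 2 clips, 30 frames each
target = torch.rand(2, 30, 11)      # per-frame force labels
opt.zero_grad()
loss = loss_fn(model(feats), target)
loss.backward()
opt.step()
```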