Full hand model tracking and grasp classification for recognizing manipulation actions

by Yezhou Yang, Cornelia Fermuller, and Antonis Argyros


Hand tracking and grasp classification play crucial roles in the recognition of manipulation actions. By tracking the hand pose over time, we obtain information that helps predict the progress of the action (where the hand is going, where the object is expected) and characterizes the grasp, which in turn facilitates object recognition and the classification of possible actions. Using a 3D hand tracker as input, we obtain the pose of the hand and the articulation of the fingers, from which we derive a robust low-dimensional description of the hand pose that distinguishes different types of grasps. These descriptions are then used as part of the action model.

Fig. 1 Left: one example of full hand model tracking, Right: an illustration of the tracked joints model.

FORTH Hand Tracking Software

The software [1] tracks the 3D position, orientation, and full articulation of a human hand from markerless RGB-D observations. The method describes the hand with a 3D geometric model built from parameterized geometric primitives (a cylinder and ellipsoids for the palm, cones and spheres for the fingers, etc.) and models the kinematics with a 26-DoF model. The hand pose at every frame is estimated by minimizing the discrepancy between the re-projection of the hand model and the observations obtained from edge and depth data. The method has the following properties. It: 1) estimates the full articulation of a hand (26 DoFs) involved in unconstrained motion, 2) operates on input acquired by easy-to-install and widely supported RGB-D cameras (e.g., Kinect, Xtion), 3) does not require markers or special gloves, 4) runs at a rate of 20 fps on modern architectures (with GPU acceleration), 5) does not require calibration, and 6) does not rely on any proprietary built-in tracking technologies (NiTE, OpenNI, Kinect SDK).

Typical Grasp Types

Many grasping taxonomies have been reported in the literature, in different fields such as robotics, rehabilitation, and the medical sciences, most of them developed for a specific application. The figure below shows a classification [2] into ten different grasp types:


Fig.2 Typical grasp types

Using the FORTH tracker, we collected the trails of 10 different grasp types relevant to our manipulation action tasks: 1) Small Diameter, 2) Parallel Extension, 3) Adducted Thumb, 4) Precision Sphere, 5) Index Finger Extension, 6) Palmar Pinch, 7) Prismatic 4 Finger, 8) Power Disk, 9) Sphere 3 Finger, and 10) Lateral Tripod.

Then, using an unsupervised method, we reduced the dimensionality of the data and learned four general grasp types, as explained in the following sections.

What differentiates different grasps

Fig.3 Arches of a human hand. [3]

Instead of using the 26-DoF data directly, we first compute the finger arches shown in Fig. 3, i.e., the 4 oblique arches (red) and the 4 longitudinal arches (brown). We then apply PCA in the resulting eight-dimensional space. The result is shown below.
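The step above can be sketched as follows. This is a minimal illustration, not the project's actual code: the layout of the 26-DoF pose vector and the way arch angles are summed from joint flexions are assumptions for the example, and the input here is random stand-in data rather than tracker output.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for per-frame tracker output: 200 poses, 26 DoF each.
poses = rng.normal(size=(200, 26))

def arch_features(pose):
    """Reduce one 26-DoF pose to 8 arch features (hypothetical layout).

    Assumed layout: pose[6:26] holds the 20 finger-joint angles, 4 per
    finger (abduction + 3 flexion angles), thumb first. The real FORTH
    parameterization may differ.
    """
    fingers = pose[6:26].reshape(5, 4)
    longitudinal = fingers[1:, 1:].sum(axis=1)       # 4 longitudinal arches
    oblique = fingers[0, 1:].sum() + fingers[1:, 1]  # 4 oblique (thumb-to-finger) arches
    return np.concatenate([longitudinal, oblique])

X = np.array([arch_features(p) for p in poses])      # shape (200, 8)

# PCA via SVD on the centered feature matrix.
Xc = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
top3 = Vt[:3]            # top three principal directions
projected = Xc @ top3.T  # (200, 3) low-dimensional embedding
```

The same projection matrix `top3` learned on training data would later be reused to embed test sequences.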

On the features extracted this way, we applied k-means clustering and discovered four general hand poses: 1) Rest, 2) Extension, used to pin down objects, 3) Firm Grasp, used to hold tools, and 4) Pinch, used to hold objects delicately.
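A minimal k-means (Lloyd's algorithm) sketch of this clustering step is below. The data here is synthetic (four well-separated Gaussian blobs standing in for the PCA-projected arch features), and the simple seeded initialization is an assumption; any standard k-means implementation would do.

```python
import numpy as np

def kmeans(X, centers, iters=50):
    """Plain Lloyd's algorithm: alternate assignment and centroid updates."""
    centers = centers.copy()
    for _ in range(iters):
        # Assign every point to its nearest center.
        labels = np.argmin(((X[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        # Move each center to the mean of its assigned points.
        for j in range(len(centers)):
            mask = labels == j
            if mask.any():
                centers[j] = X[mask].mean(axis=0)
    return labels, centers

rng = np.random.default_rng(1)
# Synthetic stand-in for PCA-projected features: 4 blobs of 50 points.
blob_means = [(0, 0, 0), (3, 0, 0), (0, 3, 0), (0, 0, 3)]
X = np.concatenate([rng.normal(loc=m, scale=0.3, size=(50, 3)) for m in blob_means])

init = X[[0, 50, 100, 150]].astype(float)  # one seed point per blob (illustrative init)
labels, centers = kmeans(X, init)
```

In the project, the four resulting clusters were inspected and named Rest, Extension, Firm Grasp, and Pinch.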

Grasp type classification

For every test sequence, we first extract the 8 arches and then project the resulting data onto the space spanned by the top three principal components. A Naive Bayes classifier is then used to assign each hand pose to one of the four clusters discovered in the previous section. Intuitively, each pose occupies a point in this space, and a whole trail traces a path (shown below).
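A Gaussian Naive Bayes classifier over the 3D projected features could look like the sketch below. This is our own minimal implementation for illustration (the original work may have used a different variant or library), trained and evaluated on synthetic clusters standing in for the four grasp types.

```python
import numpy as np

class GaussianNB:
    """Naive Bayes with an independent Gaussian per class and feature."""

    def fit(self, X, y):
        self.classes = np.unique(y)
        self.mu = np.array([X[y == c].mean(axis=0) for c in self.classes])
        self.var = np.array([X[y == c].var(axis=0) + 1e-6 for c in self.classes])
        self.logprior = np.log(np.array([(y == c).mean() for c in self.classes]))
        return self

    def predict(self, X):
        # Log-likelihood of each sample under each class-conditional Gaussian.
        ll = -0.5 * (((X[:, None] - self.mu) ** 2) / self.var
                     + np.log(2 * np.pi * self.var)).sum(-1)
        return self.classes[np.argmax(ll + self.logprior, axis=1)]

rng = np.random.default_rng(0)
# Synthetic stand-ins for the four grasp-type clusters in PCA space.
means = np.array([[0, 0, 0], [4, 0, 0], [0, 4, 0], [0, 0, 4]], dtype=float)
X = np.concatenate([rng.normal(m, 0.4, size=(60, 3)) for m in means])
y = np.repeat(np.arange(4), 60)

clf = GaussianNB().fit(X, y)
acc = (clf.predict(X) == y).mean()
```

At test time, each frame's projected pose is classified independently, and the per-frame labels along a trail form the path through grasp-type space.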

Recognition result

Discovering general grasp types in an unsupervised way

Within the scope of this project, we focus on the four grasp types described above. We track the full hand model for both hands to obtain the trails. At every time stamp, we assign each hand one of the four grasp-type labels. A sample result is shown below.
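Per-frame labeling of a two-hand trail can be sketched as a nearest-cluster assignment in the projected space. The cluster centers and the synthetic trail below are hypothetical values for illustration only; in practice the centers come from the clustering step and the trails from the tracker.

```python
import numpy as np

# Hypothetical cluster centers for the four grasp types, in PCA space.
GRASP_NAMES = ["Rest", "Extension", "Firm Grasp", "Pinch"]
centers = np.array([[0, 0, 0], [3, 0, 0], [0, 3, 0], [0, 0, 3]], dtype=float)

def label_trail(trail):
    """Assign each frame of a (T, 3) projected trail its nearest grasp label."""
    idx = np.argmin(((trail[:, None] - centers) ** 2).sum(-1), axis=1)
    return [GRASP_NAMES[i] for i in idx]

rng = np.random.default_rng(2)
# Synthetic 30-frame trail hovering near the "Firm Grasp" cluster.
left_trail = centers[2] + 0.2 * rng.normal(size=(30, 3))
left_labels = label_trail(left_trail)
```

Running the same labeling on both hands yields one grasp-type label per hand per time stamp, as in the sample result.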