Image Database of Facial Actions and Expressions

One important product of the Automating Facial Expression Measurement grant from NSF is a database of images of specific facial actions. These examples of actions are needed to train the neural networks to classify facial behaviors based on the Facial Action Coding System (FACS). On request, subjects contract specific facial muscles, singly and in combination, repeatedly and over multiple sessions. Few individuals can perform the requested actions accurately, and performers must know FACS to understand the complicated requests and to interpret feedback about what to do or not to do. The expressions depicted in these images are thus deliberately produced behaviors made on request.

The subjects' performances are videotaped (in NTSC format) under standard conditions, which include relatively flat lighting. The videotapes are carefully examined using FACS to locate performances that contain the requested actions and nothing else, and to determine the exact location of each frame to be digitized. To provide good training examples for the network, performances must meet several stringent criteria; for example, each individual action in a combination of actions must be performed at roughly the same strength of contraction throughout the course of the actions.

Sequences of individual frames from each correct performance are then digitized in color. To examine the motion or time course of the actions, several images that sample the increasing strength of muscle contraction are digitized. Where possible, video frames showing low, medium, and high intensity action are digitized with a frame grabber that captures full-frame (1/30th sec.) stills. The time difference between the frames representing different intensities varies, depending on how fast the subject moved. These intervals are recorded as SMPTE time codes in a relational database, along with other information about the images and the project. Two consecutive frames at each intensity level are digitized to represent movement over small time intervals. Many of the performances also have a neutral face image located within a few frames of the beginning of the action. The images are 24-bit color, 640x480 pixels, and stored in TIFF format.
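
As a rough illustration of how such records might be organized, the following is a minimal sketch of a table for cataloging digitized frames, written in Python with SQLite. The column names and example values are assumptions made for illustration only; they are not the project's actual schema.

    import sqlite3

    # Hypothetical schema for one digitized frame; the project's actual
    # relational database may be organized quite differently.
    conn = sqlite3.connect("facs_images.db")
    conn.execute("""
        CREATE TABLE IF NOT EXISTS frames (
            subject_id    INTEGER,   -- one of the 24 subjects
            requested_aus TEXT,      -- e.g. '4+6' or '10+15'
            intensity     TEXT,      -- 'neutral', 'low', 'medium', or 'high'
            frame_pair    INTEGER,   -- 1 or 2: consecutive frames at each intensity
            smpte_code    TEXT,      -- time code of the frame on the source videotape
            tiff_file     TEXT       -- 640x480, 24-bit color TIFF
        )
    """)

    # One illustrative record (values are invented).
    conn.execute(
        "INSERT INTO frames VALUES (?, ?, ?, ?, ?, ?)",
        (1, "4+6", "low", 1, "00:12:34:05", "s01_au4_6_low_1.tif"),
    )
    conn.commit()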

To date, 24 subjects are represented in this database, yielding roughly 6 to 18 examples of each of the 150 different requested actions. In all, about 7,000 color images are included in the database, and each has a matching grayscale image used in the neural network analysis.
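
The grayscale companions could be derived from the color images in many ways; below is a minimal sketch using the Pillow library and a simple luminance conversion. The conversion actually used to prepare images for the neural network analysis is not specified here, and the file names are hypothetical.

    from PIL import Image

    def make_grayscale_copy(color_path, gray_path):
        # Convert a 24-bit color TIFF to an 8-bit grayscale TIFF using a
        # luminance ("L") conversion; the project's actual conversion may differ.
        Image.open(color_path).convert("L").save(gray_path, format="TIFF")

    # Example (hypothetical file names):
    # make_grayscale_copy("s01_au4_6_low_1.tif", "s01_au4_6_low_1_gray.tif")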

All that is known about these images is that they are good examples of the corresponding FACS scores. We know nothing about how observers would perceive the meaning of the images, or even whether they have any meaning, beyond what might be predicted from previous studies.

The images listed below are examples from the database, reduced to 30% of their original size and to 256 colors, and pasted together so that the top row shows the low intensity action, the middle row the medium intensity, and the bottom row the high intensity (a sketch of assembling such a composite follows the list). The images in each row are adjacent frames, with the earlier frame on the left. No neutral face is shown.

  • AUs 4+6 (gif, about 62K)
  • AUs 10+15 (gif, about 63K)
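
As an illustration only, the following sketch assembles such a composite from six hypothetical frame files using the Pillow library, assuming 640x480 inputs; it is not the procedure used to produce the GIFs above.

    from PIL import Image

    # Hypothetical frame files: two adjacent frames at each of three intensities.
    rows = [
        ("low_1.tif", "low_2.tif"),        # top row: low intensity
        ("medium_1.tif", "medium_2.tif"),  # middle row: medium intensity
        ("high_1.tif", "high_2.tif"),      # bottom row: high intensity
    ]

    scale = 0.30                           # reduce to 30% of original size
    w, h = int(640 * scale), int(480 * scale)

    composite = Image.new("RGB", (2 * w, 3 * h))
    for r, (early, late) in enumerate(rows):
        for c, name in enumerate((early, late)):   # earlier frame on the left
            composite.paste(Image.open(name).resize((w, h)), (c * w, r * h))

    # Reduce to 256 colors and save, as in the GIF examples above.
    composite.quantize(colors=256).save("composite.gif")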

For further information contact: Joseph C. Hager jchager@ibm.net
Last updated: Oct. 23, 1995