Recognizing Human Facial Emotions in Video: A Psychologically-Inspired Fusion Model
Communication among humans is rich in complexity. It is not limited to verbal signals; emotions are also conveyed through gesture, pose and facial expression. Facial Emotion Recognition and Analysis, the set of techniques by which non-verbal communication is quantified from video of the face, is an exemplar case of a task in which computers have difficulty detecting underlying feelings. It has applications in medicine (treatment of Asperger’s syndrome), video games (Xbox Kinect), human-computer interaction and affective computing. The challenge for image analysis is to design a system that maps apparent facial expressions to underlying emotional states. To date, no system has been proposed that can recognize emotions robustly in naturally captured, spontaneous video of faces. We propose two advancements over state-of-the-art methods for this challenge: (1) a novel method, combining perceptual psychology and image analysis, that prunes frames from large data sets to reduce memory cost, retaining significant frames in the same way the human visual system perceives motion in a scene; and (2) a new face alignment technique that warps faces so that facial structures are precisely aligned across all frames in a video while the internal information of those structures is left unmodified by the warping. These two improvements are demonstrated to significantly improve emotion recognition rates over the baseline and other state-of-the-art approaches on the challenging AVEC2011 video sub-challenge dataset. This research is a major step towards empathetic computers that are sensitive to the emotional states of humans.