Head Pose Gesture Recognition
(EECS 498-005 Final Project)
Thomas Cohn, Lance Ying
EECS 498-005 was a special topics course offered by the University of Michigan in the winter semester of 2021, titled Applied Machine Learning for Affective Computing. The class focused on how machine learning can be used to understand and interpret human behavior through text, audio, and video.
In our project, we aimed to identify head-based gestures in a video, such as nodding "yes" or shaking "no". Our implementation begins by detecting a set of facial landmarks in a video feed. We initially built our own detector using a convolutional neural network. However, our model did not generalize beyond the images in its dataset, so we switched to a prebuilt detector from the Dlib library. We then estimated the head pose from these facial landmarks by solving the Perspective-n-Point problem with OpenCV.
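As a rough illustration of the step that follows pose estimation: OpenCV's solvePnP reports rotation as an axis-angle (Rodrigues) vector, so reading off nod, shake, and tilt angles requires a small conversion. The sketch below is not the project's actual code. The function names and the choice of convention (R = Rz(roll) Ry(yaw) Rx(pitch), with pitch about the camera x axis, yaw about y, and roll about z) are assumptions made for illustration.

```python
import math

def rodrigues(rvec):
    """Convert an axis-angle rotation vector into a 3x3 rotation matrix."""
    rx, ry, rz = rvec
    theta = math.sqrt(rx * rx + ry * ry + rz * rz)
    if theta < 1e-12:
        # No rotation: return the identity matrix.
        return [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    kx, ky, kz = rx / theta, ry / theta, rz / theta  # unit rotation axis
    c, s = math.cos(theta), math.sin(theta)
    v = 1.0 - c
    return [
        [c + kx * kx * v, kx * ky * v - kz * s, kx * kz * v + ky * s],
        [ky * kx * v + kz * s, c + ky * ky * v, ky * kz * v - kx * s],
        [kz * kx * v - ky * s, kz * ky * v + kx * s, c + kz * kz * v],
    ]

def head_angles(rvec):
    """Return (pitch, yaw, roll) in degrees, assuming R = Rz(roll) Ry(yaw) Rx(pitch)."""
    R = rodrigues(rvec)
    # Clamp before asin to guard against tiny floating-point overshoot.
    yaw = math.asin(max(-1.0, min(1.0, -R[2][0])))
    pitch = math.atan2(R[2][1], R[2][2])
    roll = math.atan2(R[1][0], R[0][0])
    return (math.degrees(pitch), math.degrees(yaw), math.degrees(roll))

# Hypothetical rotation vector, in the form returned by a PnP solver.
pitch, yaw, roll = head_angles((0.0, 0.3, 0.0))
print(f"pitch={pitch:.1f} yaw={yaw:.1f} roll={roll:.1f}")
```

Applying a conversion like this to every frame yields a sequence of angle triples, which is one possible representation of the head-pose sequences compared in the next step.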
Given a sequence of head poses obtained from a video, we could then attempt to identify the gesture. To compare these spatio-temporal sequences, we used a similarity metric described by Shen-Shyang Ho et al. in their paper, "Manifold Learning for Multivariate Variable-Length Sequences With an Application to Similarity Search", and then performed nearest-neighbors classification with respect to this metric.
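The classification step can be sketched as nearest-neighbors search over variable-length pose sequences with a pluggable distance function. The example below does not implement the metric from Ho et al.; it swaps in dynamic time warping (DTW) purely as a stand-in, and the function names and toy sequences of (pitch, yaw, roll) angles in degrees are made up for illustration.

```python
import math
from collections import Counter

def dtw_distance(a, b):
    """Dynamic time warping distance between two sequences of pose vectors."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = cheapest alignment of the first i poses of a with the first j of b.
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(a[i - 1], b[j - 1])  # Euclidean distance between two poses
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]

def classify(query, labeled, k=1, distance=dtw_distance):
    """Label a query sequence by majority vote among its k nearest labeled sequences."""
    ranked = sorted(labeled, key=lambda item: distance(query, item[0]))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

# Toy reference gestures: a nod varies pitch, a shake varies yaw.
nod = [(0, 0, 0), (10, 0, 0), (0, 0, 0), (-10, 0, 0), (0, 0, 0)]
shake = [(0, 0, 0), (0, 15, 0), (0, 0, 0), (0, -15, 0), (0, 0, 0)]
examples = [(nod, "nod"), (shake, "shake")]

# A longer, noisier sequence that mostly varies pitch.
query = [(0, 0, 0), (4, 1, 0), (9, 0, 0), (1, 0, 0), (-8, 0, 0), (-2, 0, 0), (0, 0, 0)]
print(classify(query, examples))
```

Because the distance is passed in as a parameter, a different sequence similarity measure could be substituted without changing the classifier itself.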