How much training data for facial action unit detection?

Tags

methodological, nonverbal behavior, machine learning, proceeding
Author

Girard, Cohn, Jeni, Lucey, & De la Torre

DOI

Citation (APA 7)

Girard, J. M., Cohn, J. F., Jeni, L. A., Lucey, S., & De la Torre, F. (2015). How much training data for facial action unit detection? Proceedings of the 11th IEEE International Conference on Automatic Face & Gesture Recognition (FG), 1–8.

Abstract

By systematically varying the number of subjects and the number of frames per subject, we explored the influence of training set size on appearance- and shape-based approaches to facial action unit (AU) detection. Digital video and expert coding of spontaneous facial activity from 80 subjects (over 350,000 frames) were used to train and test support vector machine classifiers. Appearance features were shape-normalized SIFT descriptors, and shape features were 66 facial landmarks. Ten-fold cross-validation was used in all evaluations. The number of subjects and the number of frames per subject differentially affected appearance- and shape-based classifiers. For appearance features, which are high-dimensional, increasing the number of training subjects from 8 to 64 incrementally improved performance, regardless of the number of frames taken from each subject (ranging from 450 to 3600). In contrast, for shape features, increases in the number of training subjects and frames yielded mixed results. In summary, maximal performance was attained using appearance features from large numbers of subjects with as few as 450 frames per subject. These findings suggest that increasing the number of subjects, rather than the number of frames per subject, yields the most efficient performance.
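
Method sketch

The following is a minimal sketch of the experimental design described in the abstract: subsample the number of training subjects and frames per subject, fit a linear SVM, and score AU detection on held-out subjects. It uses synthetic features in place of the shape-normalized SIFT descriptors, and all names, sizes, and classifier settings here are illustrative assumptions, not the authors' implementation.

```python
import numpy as np
from sklearn.svm import LinearSVC
from sklearn.metrics import f1_score

rng = np.random.default_rng(0)

# Illustrative sizes loosely mirroring the abstract (80 subjects,
# up to 450 frames per subject); the 128-dim features are synthetic
# stand-ins for the SIFT appearance descriptors.
N_SUBJECTS, FRAMES_PER_SUBJECT, N_FEATURES = 80, 450, 128
X = rng.normal(size=(N_SUBJECTS, FRAMES_PER_SUBJECT, N_FEATURES))
y = rng.integers(0, 2, size=(N_SUBJECTS, FRAMES_PER_SUBJECT))  # binary AU labels

def evaluate(n_train_subjects, n_frames):
    """Train on a subject/frame subsample; test on the remaining subjects."""
    subjects = rng.permutation(N_SUBJECTS)
    train_s = subjects[:n_train_subjects]
    test_s = subjects[n_train_subjects:]
    frames = rng.choice(FRAMES_PER_SUBJECT, size=n_frames, replace=False)

    X_tr = X[train_s][:, frames].reshape(-1, N_FEATURES)
    y_tr = y[train_s][:, frames].ravel()
    X_te = X[test_s].reshape(-1, N_FEATURES)
    y_te = y[test_s].ravel()

    clf = LinearSVC(C=1.0, max_iter=2000).fit(X_tr, y_tr)
    return f1_score(y_te, clf.predict(X_te))

# Sweep the two factors the paper varied: number of training subjects
# (8 to 64) and frames per subject (capped at 450 in this sketch).
for n_sub in (8, 16, 32, 64):
    for n_fr in (100, 250, 450):
        print(f"subjects={n_sub:2d} frames={n_fr:3d} F1={evaluate(n_sub, n_fr):.3f}")
```

On real data, the paper's finding would show up in this sweep as F1 rising with the number of training subjects while staying roughly flat across frame counts; with the random labels above, the scores are of course uninformative.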