Despite recent achievements in the multi-modal emotion recognition task, two problems remain and have not been well investigated: 1) the relationships between different emotion categories are not utilized, which leads to sub-optimal performance; and 2) current models fail to cope well with low-resource emotions, especially unseen ones. In this paper, we propose a modality-transferable model with emotion embeddings to tackle these issues. We use pre-trained word embeddings to represent emotion categories for textual data. Then, two mapping functions are learned to transfer these embeddings into the visual and acoustic spaces. For each modality, the model calculates the representation distance between the input sequence and the target emotions and makes predictions based on these distances. As a result, our model can directly adapt to unseen emotions in any modality, since their pre-trained embeddings and the modality mapping functions are available. Experiments show that our model achieves state-of-the-art performance on most of the emotion categories. In addition, our model outperforms existing baselines in the zero-shot and few-shot scenarios for unseen emotions.
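
The distance-based prediction described above can be illustrated with a minimal sketch. It assumes, for illustration only, that each mapping function is a plain linear map (a hypothetical `mappings` dictionary of matrices), that cosine similarity stands in for the representation distance, and that per-modality scores are simply summed; none of these choices is claimed to be the paper's exact formulation. The point it shows is the zero-shot property: scoring a new emotion only requires its pre-trained word vector, because the same mapping functions carry it into the visual and acoustic spaces.

```python
import math
from typing import Dict, List

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Apply a linear map (given as a list of rows) to vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)


def emotion_scores(
    seq_reprs: Dict[str, Vector],     # modality name -> encoded input sequence
    emotion_vecs: Dict[str, Vector],  # emotion name -> pre-trained word embedding
    mappings: Dict[str, Matrix],      # hypothetical: modality -> linear map from text space
) -> Dict[str, float]:
    """Score every emotion by summing per-modality similarities (assumed combination)."""
    scores: Dict[str, float] = {}
    for emotion, word_vec in emotion_vecs.items():
        total = 0.0
        for modality, h in seq_reprs.items():
            # Text compares against the word embedding directly;
            # other modalities compare against its mapped copy.
            target = word_vec if modality == "text" else matvec(mappings[modality], word_vec)
            total += cosine(h, target)
        scores[emotion] = total
    return scores


# Toy 2-D example with made-up vectors (not real GloVe values).
emotion_vecs = {"happy": [1.0, 0.0], "sad": [0.0, 1.0]}
mappings = {"visual": [[0.0, 1.0], [1.0, 0.0]]}  # hypothetical learned map (a swap)
seq_reprs = {"text": [0.9, 0.1], "visual": [0.1, 0.9]}

scores = emotion_scores(seq_reprs, emotion_vecs, mappings)

# Zero-shot: an unseen emotion needs only its word vector; no retraining.
emotion_vecs["surprise"] = [0.7, 0.7]
zero_shot_scores = emotion_scores(seq_reprs, emotion_vecs, mappings)
best = max(zero_shot_scores, key=zero_shot_scores.get)  # 'happy' scores highest
```

In a trained model the mapping functions and sequence encoders would be learned jointly; the toy values here only demonstrate the scoring mechanics.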

We use the CMU-Multimodal SDK to download and pre-process the datasets. It performs data alignment and early-stage feature extraction for each modality. The textual data is tokenized at the word level and represented using GloVe (Pennington et al., 2014) embeddings. Facial action units, which indicate muscle movements and expressions, are extracted using Facet (iMotions, 2017); they are a commonly used feature for facial expression recognition. For acoustic data, COVAREP is used to extract basic features such as mel-frequency cepstral coefficients (MFCCs), pitch tracking, and glottal source parameters.
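
Word-level alignment means that each word receives one visual and one acoustic feature vector, even though Facet and COVAREP produce many frames per word. A common way to do this is to average all frames whose timestamps fall inside the word's time interval. The sketch below illustrates that idea in plain Python; it does not use or reproduce the CMU-Multimodal SDK's API, and `align_to_words`, the half-open interval convention, and zero-padding for empty intervals are all assumptions made for illustration.

```python
from typing import List, Tuple

Interval = Tuple[float, float]  # (start, end) of a word, in seconds


def align_to_words(
    word_intervals: List[Interval],
    frame_times: List[float],
    frame_feats: List[List[float]],
) -> List[List[float]]:
    """Hypothetical helper: average frame features within each word interval."""
    dim = len(frame_feats[0]) if frame_feats else 0
    aligned: List[List[float]] = []
    for start, end in word_intervals:
        # Collect frames whose timestamp lies in [start, end).
        inside = [f for t, f in zip(frame_times, frame_feats) if start <= t < end]
        if inside:
            # Column-wise mean over the collected frames.
            aligned.append([sum(col) / len(inside) for col in zip(*inside)])
        else:
            aligned.append([0.0] * dim)  # no frames in this word: zero-pad
    return aligned


# Two words, three acoustic frames with 2-D toy features.
aligned = align_to_words(
    word_intervals=[(0.0, 0.5), (0.5, 1.0)],
    frame_times=[0.1, 0.3, 0.6],
    frame_feats=[[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]],
)
# aligned == [[2.0, 3.0], [5.0, 6.0]]
```

Averaging keeps the sequence length equal to the number of words, so textual, visual, and acoustic inputs can be fed to the model step by step.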
