Abstract: Analyzing human multimodal language is an emerging area of research in NLP. Human communication is intrinsically multimodal (heterogeneous), temporal, and asynchronous; it consists of the language (words), visual (expressions), and acoustic (paralinguistic) modalities, all in the form of asynchronous coordinated sequences. From a resource perspective, there is a genuine need for large-scale datasets that allow for in-depth studies of multimodal language. In this paper we introduce CMU Multimodal Opinion Sentiment and Emotion Intensity (CMU-MOSEI), the largest dataset for sentiment analysis and emotion recognition to date. Using data from CMU-MOSEI and a novel multimodal fusion technique called the Dynamic Fusion Graph (DFG), we conduct experiments to investigate how modalities interact with each other in human multimodal language. Unlike previously proposed fusion techniques, DFG is highly interpretable and achieves competitive performance compared to the current state of the art.

Multimodal Language Analysis in the Wild: CMU-MOSEI Dataset and Interpretable Dynamic Fusion Graph
