Abstract: Multimodal machine learning is a core research area spanning the language, visual, and acoustic modalities. The central challenge in multimodal learning is to learn representations that can process and relate information from multiple modalities. In this paper, we propose two methods for unsupervised learning of joint multimodal representations using sequence-to-sequence (Seq2Seq) methods: a Seq2Seq Modality Translation Model and a Hierarchical Seq2Seq Modality Translation Model. We also explore several variations on the multimodal inputs and outputs of these Seq2Seq models. Our experiments on multimodal sentiment analysis using the CMU-MOSI dataset indicate that our methods learn informative multimodal representations that outperform the baselines, most notably in the bimodal case, where our model improves the F1 score by twelve points. We also discuss future directions for multimodal Seq2Seq methods.
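
For readers less familiar with the metric, the F1 score cited in the abstract is the harmonic mean of precision and recall. The sketch below is a minimal illustration in plain Python. The function name `f1_score` and the sample labels are hypothetical; they are not taken from the paper or from CMU-MOSI, and the paper's exact evaluation protocol may differ.

```python
def f1_score(y_true, y_pred, positive=1):
    """Binary F1: harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    # Count true positives, false positives, and false negatives.
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No correct positive predictions: F1 is defined as 0 here.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical binary sentiment labels (1 = positive, 0 = negative).
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 0, 1, 1, 1, 0, 0]
score = f1_score(y_true, y_pred)
print(f"F1 = {score:.2f}")
```

Assuming "points" refers to the usual 0-to-100 reporting scale, a gain of twelve points corresponds to an increase of 0.12 in the value this function returns.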

Seq2Seq2Sentiment: Multimodal Sequence to Sequence Models for Sentiment Analysis
