Abstract: Learning multimodal representations is a fundamentally complex research problem due to the presence of multiple heterogeneous sources of information. Although the presence of multiple modalities provides additional valuable information, there are two key challenges to address when learning from multimodal data: 1) models must learn the complex intra-modal and cross-modal interactions for prediction and 2) models must be robust to unexpected missing or noisy modalities during testing. In this paper, we propose to optimize for a joint generative-discriminative objective across multimodal data and labels. We introduce a model that factorizes representations into two sets of independent factors: multimodal discriminative and modality-specific generative factors. Multimodal discriminative factors are shared across all modalities and contain joint multimodal features required for discriminative tasks such as sentiment prediction. Modality-specific generative factors are unique for each modality and contain the information required for generating data. Experimental results show that our model is able to learn meaningful multimodal representations that achieve state-of-the-art or competitive performance on six multimodal datasets. Our model demonstrates flexible generative capabilities by conditioning on independent factors and can reconstruct missing modalities without significantly impacting performance. Lastly, we interpret our factorized representations to understand the interactions that influence multimodal learning.
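
The factorization described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact loss or network architecture, so everything here is an assumption made for illustration: the two modality names, the dimensions, the random linear maps standing in for learned encoders and decoders, and the use of plain squared error for both the generative (reconstruction) and discriminative (prediction) terms. The sketch also does not enforce independence between factors; it only shows how a shared factor and modality-specific factors could feed a joint objective.

```python
import random

random.seed(0)

def rand_matrix(rows, cols):
    """Random weights standing in for a learned network (hypothetical)."""
    return [[random.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]

def matvec(vec, mat):
    """Row vector times matrix; returns a list of length len(mat[0])."""
    return [sum(v * row[j] for v, row in zip(vec, mat)) for j in range(len(mat[0]))]

def mse(a, b):
    """Mean squared error between two equal-length lists."""
    return sum((p - q) ** 2 for p, q in zip(a, b)) / len(a)

# Hypothetical modalities and factor sizes (not taken from the paper).
DIMS = {"text": 8, "audio": 5}
SHARED_DIM, SPECIFIC_DIM = 3, 2

# One shared (discriminative) encoder over all modalities,
# one specific (generative) encoder and one decoder per modality.
enc_shared = rand_matrix(sum(DIMS.values()), SHARED_DIM)
enc_specific = {m: rand_matrix(d, SPECIFIC_DIM) for m, d in DIMS.items()}
decoder = {m: rand_matrix(SHARED_DIM + SPECIFIC_DIM, d) for m, d in DIMS.items()}
classifier = rand_matrix(SHARED_DIM, 1)

def joint_loss(x, y, weight=1.0):
    """Return (total, generative, discriminative) loss for one example."""
    # Shared factor: computed from all modalities concatenated.
    joint_input = [v for m in DIMS for v in x[m]]
    f_shared = matvec(joint_input, enc_shared)

    # Discriminative term: the label is predicted from the shared factor only.
    disc = mse(matvec(f_shared, classifier), [y])

    # Generative term: each modality is reconstructed from the shared
    # factor plus its own modality-specific factor.
    gen = 0.0
    for m in DIMS:
        f_specific = matvec(x[m], enc_specific[m])
        x_hat = matvec(f_shared + f_specific, decoder[m])
        gen += mse(x_hat, x[m])

    return gen + weight * disc, gen, disc

x = {m: [random.gauss(0.0, 1.0) for _ in range(d)] for m, d in DIMS.items()}
total, gen, disc = joint_loss(x, y=0.5)
```

In this sketch the `weight` argument trades off the two terms: setting it to zero leaves a purely generative objective, while a large value emphasizes prediction from the shared factor.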

Learning Multimodal Representations with Factorized Deep Generative Models
