Abstract: In this paper, we propose a novel multimodal fusion framework, the locally confined modality fusion network (LMFN), which contains a bidirectional multiconnected LSTM (BM-LSTM), to address the multimodal human affective computing problem. Instead of conducting fusion only at a holistic level, we propose a hierarchical fusion strategy that considers both local and global interactions to obtain a comprehensive interpretation of multimodal information. Specifically, we partition the feature vector of each modality into multiple segments and learn every local interaction through a tensor fusion procedure. Global interaction is then modeled by learning the interconnections among local interactions via the newly designed BM-LSTM architecture, which establishes direct connections of cells and states between local interactions that are several time steps apart. LMFN offers advantages over other methods in the following respects: 1) local interactions are modeled with a feasible vector-segmentation procedure that explores cross-modal dynamics in a more specialized manner; 2) global interactions are modeled with BM-LSTM to obtain an integral view of the multimodal information, which guarantees sufficient information exchange; and 3) the general fusion strategy is highly extensible, since other local and global fusion methods can be substituted in. Experiments show that LMFN yields state-of-the-art results. Moreover, LMFN is more efficient than other models that apply the outer product as the fusion method.
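The local fusion step described in the abstract can be illustrated with a minimal sketch. The function name, feature dimensions, and number of segments below are hypothetical, and appending a constant 1 to each segment follows common tensor-fusion practice rather than details from this paper; the BM-LSTM global stage is not reproduced here.

```python
import numpy as np

def local_tensor_fusion(x_a, x_v, x_t, n_segments=4):
    """Sketch of segment-wise tensor fusion over three modalities.

    Each modality's feature vector is split into n_segments equal parts;
    segment i of every modality is fused via a three-way outer product
    (with a constant 1 appended, so unimodal and bimodal terms survive
    alongside the trimodal ones). Returns one flattened local-interaction
    vector per segment; a global model (e.g., the paper's BM-LSTM) would
    then consume this sequence.
    """
    def segments(x):
        return np.split(np.asarray(x, dtype=float), n_segments)

    local = []
    for a, v, t in zip(segments(x_a), segments(x_v), segments(x_t)):
        a1 = np.concatenate([a, [1.0]])
        v1 = np.concatenate([v, [1.0]])
        t1 = np.concatenate([t, [1.0]])
        fused = np.einsum('i,j,k->ijk', a1, v1, t1)  # 3-way outer product
        local.append(fused.ravel())
    return np.stack(local)

# Example: 8-dim audio, visual, and text features split into 4 segments;
# each segment fuses three (2+1)-dim vectors, giving 3*3*3 = 27 features.
rng = np.random.default_rng(0)
out = local_tensor_fusion(rng.normal(size=8),
                          rng.normal(size=8),
                          rng.normal(size=8))
print(out.shape)  # (4, 27)
```

Because the outer product is taken per segment rather than over the full feature vectors, the fused dimensionality grows with the cube of the segment size instead of the full modality size, which is the efficiency advantage the abstract claims over holistic outer-product fusion.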