Abstract: In this paper, we propose a novel multimodal fusion framework, the locally confined modality fusion network (LMFN), which contains a bidirectional multiconnected LSTM (BM-LSTM), to address the multimodal human affective computing problem. Instead of conducting fusion only at a holistic level, we propose a hierarchical fusion strategy that considers both local and global interactions to obtain a comprehensive interpretation of multimodal information. Specifically, we partition the feature vector of each modality into multiple segments and learn every local interaction through a tensor fusion procedure. Global interaction is then modeled by learning the interconnections among local interactions via the newly designed BM-LSTM architecture, which establishes direct connections of cells and states between local interactions that are several time steps apart. LMFN offers advantages over other methods in the following respects: 1) local interactions are modeled with a feasible vector-segmentation procedure that explores cross-modal dynamics in a more specialized manner; 2) global interactions are modeled with BM-LSTM to obtain an integral view of the multimodal information, which guarantees sufficient information exchange; and 3) the general fusion strategy is highly extensible, since other local and global fusion methods can be substituted in. Experiments show that LMFN yields state-of-the-art results. Moreover, LMFN is more efficient than other models that apply the outer product as the fusion method.
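The local fusion step described in the abstract can be illustrated with a minimal sketch. The function name, feature dimensions, and number of segments below are hypothetical, and appending a constant 1 to each segment follows common tensor-fusion practice rather than details from this paper; the BM-LSTM global stage is not reproduced here.

```python
import numpy as np

def local_tensor_fusion(x_a, x_v, x_t, n_segments=4):
    """Sketch of segment-wise tensor fusion over three modalities.

    Each modality's feature vector is split into n_segments equal parts;
    segment i of every modality is fused via a three-way outer product
    (with a constant 1 appended, so unimodal and bimodal terms survive
    alongside the trimodal ones). Returns one flattened local-interaction
    vector per segment; a global model (e.g., the paper's BM-LSTM) would
    then consume this sequence.
    """
    def segments(x):
        return np.split(np.asarray(x, dtype=float), n_segments)

    local = []
    for a, v, t in zip(segments(x_a), segments(x_v), segments(x_t)):
        a1 = np.concatenate([a, [1.0]])
        v1 = np.concatenate([v, [1.0]])
        t1 = np.concatenate([t, [1.0]])
        fused = np.einsum('i,j,k->ijk', a1, v1, t1)  # 3-way outer product
        local.append(fused.ravel())
    return np.stack(local)

# Example: 8-dim audio, visual, and text features split into 4 segments;
# each segment fuses three (2+1)-dim vectors, giving 3*3*3 = 27 features.
rng = np.random.default_rng(0)
out = local_tensor_fusion(rng.normal(size=8),
                          rng.normal(size=8),
                          rng.normal(size=8))
print(out.shape)  # (4, 27)
```

Because the outer product is taken per segment rather than over the full feature vectors, the fused dimensionality grows with the cube of the segment size instead of the full modality size, which is the efficiency advantage the abstract claims over holistic outer-product fusion.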