Abstract: Block-wise missingness, which is common in multimodal data collected in practical settings such as multimedia intelligent tutoring systems (ITSs), poses a challenging barrier to analysis. In this work, we collected data from 194 undergraduates via a biology ITS that records three modalities: student-system logfiles, facial expressions, and eye tracking. However, only 32 of the 194 students had all three modalities; 83% were missing the facial expression data, the eye-tracking data, or both. To handle such block-wise missingness, we propose Progressively Refined Imputation for Multi-modalities by auto-Encoder (PRIME), which trains imputation models on single, pairwise, and complete sets of modalities in a progressive manner, thereby maximally utilizing all available data. We evaluated PRIME against a log-only single-modality baseline (which requires no missingness handling) and five state-of-the-art missing-data handling methods on an important yet challenging student modeling task: predicting students’ learning gains. Our results show that the multimodal data made usable by missing-data handling yields better prediction performance than using logfiles only, and that PRIME outperforms the baseline methods on both the learning gain prediction and data reconstruction tasks.
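The progressive training idea in the abstract can be illustrated with a minimal sketch. The code below is not the paper's PRIME architecture: it uses a toy linear autoencoder on synthetic data, with illustrative modality names, dimensions, and a hand-made block-wise missingness pattern. It only demonstrates the schedule of training on single blocks, then pairs, then all three modalities, with a reconstruction loss restricted to observed entries.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for the three modalities named in the abstract; the
# sample size, feature dimensions, and missingness pattern here are
# illustrative assumptions, not the paper's dataset.
blocks = {"log": slice(0, 4), "face": slice(4, 7), "eye": slice(7, 10)}
n, D = 100, 10
X = rng.normal(size=(n, D))

# Block-wise missingness: logs always observed, face/eye blocks absent
# for some students (entire blocks missing at once, not single cells).
mask = np.ones((n, D), dtype=bool)
mask[:40, blocks["face"]] = False
mask[30:70, blocks["eye"]] = False


def train_stage(X, stage_mask, We, Wd, lr=0.02, epochs=300):
    """One stage of a linear autoencoder with a masked reconstruction
    loss: gradients flow only through entries observed in this stage."""
    Xf = np.where(stage_mask, X, 0.0)  # zero-fill unobserved inputs
    for _ in range(epochs):
        H = Xf @ We                    # encode
        R = H @ Wd                     # decode
        G = 2.0 * (R - X) * stage_mask / n
        gWd = H.T @ G                  # compute both gradients first,
        gWe = Xf.T @ (G @ Wd.T)        # then update the weights
        Wd -= lr * gWd
        We -= lr * gWe
    return We, Wd


hidden = 6
We = rng.normal(scale=0.1, size=(D, hidden))
Wd = rng.normal(scale=0.1, size=(hidden, D))

# Reconstruction error on observed entries before any training.
init_err = np.mean(((np.where(mask, X, 0.0) @ We @ Wd - X)[mask]) ** 2)

# Progressive schedule: single modalities, then pairs, then all three,
# so every partially observed student contributes to some stage.
stages = [["log"], ["face"], ["eye"],
          ["log", "face"], ["log", "eye"], ["face", "eye"],
          ["log", "face", "eye"]]
for stage in stages:
    stage_mask = np.zeros_like(mask)
    for name in stage:
        stage_mask[:, blocks[name]] = mask[:, blocks[name]]
    We, Wd = train_stage(X, stage_mask, We, Wd)

R = np.where(mask, X, 0.0) @ We @ Wd
final_err = np.mean((R - X)[mask] ** 2)
# Impute: keep observed values, fill missing blocks with reconstructions.
X_imputed = np.where(mask, X, R)
```

The key design point the sketch shares with the abstract's description is that the loss is masked, so students with only one or two modalities still provide a training signal instead of being discarded.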