Abstract: Affect plays a critical role in the learning process. In this study, we investigate the multimodal fusion of facial, keystroke, mouse-click, head-posture, and contextual features for detecting student frustration in an Affective Tutoring System. The results (AUC = 0.64) show empirically that a multimodal approach offers higher accuracy and better robustness than a unimodal approach. In addition, including keystrokes and mouse clicks fills the detection gap when video-based sensing modes (facial features and head posture) are unavailable. These findings contribute to our end research objective of optimizing student learning by adapting empathetically, or tailoring, to students' affective states.