Abstract: Understanding and modeling people’s behavior in social interactions is an important problem in Social Computing. In this work, we automatically predict a person’s communication skill in two kinds of interview-based social interactions: interface-based interviews (without an interviewer) and traditional face-to-face interviews. We investigate how behavior perception and the automatic prediction of communication skill differ when the same participant gives both interviews. Automated video interview platforms, which allow interviews to be conducted anywhere and at any time, are gaining increasing attention. Until recently, interviews were conducted face-to-face, whether for screening or for automatic assessment. Our dataset consists of 100 dual interviews, in which each participant takes part in both settings. External observers rate the interviews by answering several behavior-based assessment questions (manually annotated attributes). Multimodal features capturing lexical, acoustic, and visual behavior are extracted automatically and used to train supervised learning models such as Support Vector Machines (SVM) and Logistic Regression. We study the participant’s verbal behavior extensively, using spoken responses obtained from both manual transcriptions and an Automatic Speech Recognition (ASR) tool. We also explore early and late fusion of modalities to improve prediction. Our best results indicate that automatic assessment is feasible with interface-based interviews.
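To make the distinction between early and late fusion concrete, the following is a minimal sketch on synthetic data, not the pipeline used in this work. It assumes three modalities with hypothetical feature counts (2 lexical, 3 acoustic, 2 visual), a binary skill label, and a small hand-written logistic regression standing in for a library classifier. All function names are illustrative. Early fusion concatenates the per-modality features and trains one classifier; late fusion trains one classifier per modality and, as one common choice assumed here, averages their predicted probabilities.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logreg(X, y, lr=0.1, epochs=200):
    """Fit logistic regression by batch gradient descent; returns (weights, bias)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(model, X):
    """Return the predicted probability of the positive class for each row."""
    w, b = model
    return [sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) for xi in X]


def concatenate(modalities):
    """Join the per-modality feature vectors of each sample into one vector."""
    return [[v for vec in sample for v in vec] for sample in zip(*modalities)]


def early_fusion(train_mods, y_train, test_mods):
    """Early fusion: one classifier trained on the concatenated features."""
    model = train_logreg(concatenate(train_mods), y_train)
    return predict_proba(model, concatenate(test_mods))


def late_fusion(train_mods, y_train, test_mods):
    """Late fusion: one classifier per modality, probabilities averaged."""
    per_modality = [
        predict_proba(train_logreg(X_tr, y_train), X_te)
        for X_tr, X_te in zip(train_mods, test_mods)
    ]
    return [sum(p) / len(p) for p in zip(*per_modality)]


def make_toy_data(n, seed):
    """Synthetic stand-in data: class-dependent mean plus Gaussian noise."""
    rng = random.Random(seed)
    y = [rng.randint(0, 1) for _ in range(n)]
    dims = [2, 3, 2]  # hypothetical lexical, acoustic, visual feature counts
    mods = [
        [[(1.0 if yi else -1.0) + rng.gauss(0, 1.5) for _ in range(d)] for yi in y]
        for d in dims
    ]
    return mods, y


def accuracy(probs, y):
    """Fraction of samples whose thresholded probability matches the label."""
    return sum((p >= 0.5) == bool(yi) for p, yi in zip(probs, y)) / len(y)


if __name__ == "__main__":
    train_mods, y_train = make_toy_data(80, seed=0)
    test_mods, y_test = make_toy_data(40, seed=1)
    early = early_fusion(train_mods, y_train, test_mods)
    late = late_fusion(train_mods, y_train, test_mods)
    print("early fusion accuracy:", accuracy(early, y_test))
    print("late fusion accuracy:", accuracy(late, y_test))
```

In this sketch the two strategies differ only in where the modalities are combined: before training (feature level) or after prediction (decision level). Other late-fusion rules, such as weighted averaging or majority voting, would slot into `late_fusion` in the same place.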