POV Learning: Individual Alignment of Multimodal Models using Human Perception

Simon Werner

Katharina Christ

Laura Bernardy

Marion G. Müller

Achim Rettinger

Aligning machine learning systems with human expectations is mostly attempted by training with manually vetted human behavioral samples, typically explicit feedback. This is done on a population level, since the context capturing the subjective Point-Of-View (POV) of a concrete person in a specific situation is not retained in the data. However, we argue that alignment on an individual level can considerably boost the subjective predictive performance for the individual user interacting with the system. Since perception differs for each person, the same situation is observed differently. Consequently, the basis for decision making and the subsequent reasoning processes and observable reactions differ. We hypothesize that individual perception patterns can be used to improve alignment on an individual level. We test this by integrating perception information into machine learning systems and measuring their predictive performance w.r.t. individual subjective assessments. For our empirical study, we collect a novel data set of multimodal stimuli and corresponding eye tracking sequences for the new task of Perception-Guided Crossmodal Entailment and tackle it with our Perception-Guided Multimodal Transformer. Our findings suggest that exploiting individual perception signals for the machine learning of subjective human assessments provides a valuable cue for individual alignment. It not only improves the overall predictive performance from the point-of-view of the individual user but might also contribute to steering AI systems towards every person's individual expectations and values.
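
To make the idea of perception-guided fusion concrete, the sketch below shows one way an individual viewer's fixation sequence could be injected as an additional token stream alongside pre-extracted text and image features in a transformer encoder for crossmodal entailment. This is a minimal illustration, not the authors' architecture: all module names, feature dimensions, and the (x, y, duration) gaze encoding are assumptions made for the example.

```python
import torch
import torch.nn as nn


class PerceptionGuidedFusion(nn.Module):
    """Sketch: fuse text/image features with a viewer's gaze (fixation)
    sequence and classify crossmodal entailment. Hypothetical dimensions."""

    def __init__(self, text_dim=768, image_dim=512, gaze_dim=3,
                 d_model=256, n_heads=4, n_layers=2, n_classes=2):
        super().__init__()
        # Project each modality into a shared embedding space.
        self.text_proj = nn.Linear(text_dim, d_model)
        self.image_proj = nn.Linear(image_dim, d_model)
        self.gaze_proj = nn.Linear(gaze_dim, d_model)    # (x, y, duration) per fixation
        self.modality_emb = nn.Embedding(3, d_model)     # 0=text, 1=image, 2=gaze
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(encoder_layer, num_layers=n_layers)
        self.cls = nn.Parameter(torch.zeros(1, 1, d_model))
        self.head = nn.Linear(d_model, n_classes)

    def forward(self, text_feats, image_feats, gaze_seq):
        # text_feats: (B, T_t, text_dim), image_feats: (B, T_i, image_dim),
        # gaze_seq: (B, T_g, 3) -- the individual viewer's fixation sequence.
        t = self.text_proj(text_feats) + self.modality_emb.weight[0]
        i = self.image_proj(image_feats) + self.modality_emb.weight[1]
        g = self.gaze_proj(gaze_seq) + self.modality_emb.weight[2]
        cls = self.cls.expand(t.size(0), -1, -1)
        tokens = torch.cat([cls, t, i, g], dim=1)
        encoded = self.encoder(tokens)
        return self.head(encoded[:, 0])  # entailment logits from the [CLS] token


# Toy usage with random stand-ins for pre-extracted features.
model = PerceptionGuidedFusion()
logits = model(torch.randn(2, 12, 768),   # text token features
               torch.randn(2, 49, 512),   # image patch features
               torch.randn(2, 20, 3))     # 20 fixations: x, y, duration
print(logits.shape)  # torch.Size([2, 2])
```

Treating gaze fixations as just another token stream lets the encoder attend jointly over what was shown and what this particular person actually looked at, which is the mechanism the abstract appeals to for individual-level alignment.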
