Discover how Hidden Markov Models enhance eye tracking technology by analyzing visual attention and gaze patterns. This formal overview explores their applications in fields such as psychology, marketing, and user experience research.
Eye tracking reveals where people look, but expert analysis is about understanding how attention unfolds over time under noisy, real-world conditions. Gaze data is inherently imperfect, and it is affected by factors such as hardware quality, recording environments, head movement, and individual differences between participants.
As eye tracking becomes more ecologically valid and diverse, for example through the use of webcams, handling individual-level noise becomes more important. Without dedicated hardware and controlled recording environments, the signal-to-noise ratio is more variable and differs between individuals rather than between studies.
The problem with classifying fixations with webcam eye-tracking:
In the presence of individual-level noise, how that noise is handled during fixation classification becomes an important concern. The current gold-standard fixation classifiers set hard cut-offs based on gaze velocity or on the temporal and spatial proximity of gaze samples.
If such a classifier is used, researchers risk misclassifying noise as saccades or fixations (false positives). Without a classifier, researchers are left with every data point, noise and signal alike, and lose the benefit of knowing where participants were actively taking in information during fixations and where they were scanning with saccades.
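To make the hard cut-off idea concrete, here is a minimal sketch of a velocity-threshold classifier. The function names, the pixel units, the 30 Hz sampling rate, and the 1000 px/s threshold are all assumptions chosen for illustration, not values from any specific product.

```python
import math

def point_speeds(xs, ys, sample_rate_hz):
    """Speed between consecutive gaze samples, in position units per second."""
    speeds = [0.0]  # the first sample has no predecessor
    for i in range(1, len(xs)):
        dist = math.hypot(xs[i] - xs[i - 1], ys[i] - ys[i - 1])
        speeds.append(dist * sample_rate_hz)
    return speeds

def threshold_classify(xs, ys, sample_rate_hz, threshold):
    """Label each sample using one fixed velocity cut-off (hypothetical helper)."""
    return ["saccade" if v > threshold else "fixation"
            for v in point_speeds(xs, ys, sample_rate_hz)]

ys = [100] * 6
clean_xs = [100, 101, 100, 300, 301, 300]   # one real jump between two fixations
noisy_xs = [100, 150, 100, 300, 301, 300]   # same data with one 50 px jitter spike

print(threshold_classify(clean_xs, ys, 30, 1000))
print(threshold_classify(noisy_xs, ys, 30, 1000))
```

In this toy data the single jitter spike pushes two extra samples over the cut-off, so they are labelled as saccades even though the gaze did not really move. A fixed threshold has no way to tell that apart from a genuine eye movement.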
How Hidden Markov Models can help with webcam based eye-tracking:
This is precisely why Hidden Markov Models (HMMs) play such a central role in modern eye tracking, and why they are the primary Markov-based approach implemented in eye-tracking platforms.
Rather than treating eye movements as clean, easily separable events, HMMs acknowledge uncertainty. They model what we cannot observe directly and infer it probabilistically from noisy signals. This makes them especially well suited to eye tracking.
[Figure: a visualization of how Hidden Markov Models are implemented in eye tracking.]

Hidden Markov Models, Briefly Explained
A Hidden Markov Model (HMM) describes a system that evolves through a sequence of unobservable (hidden) states over time, while producing observable signals that are probabilistically linked to those states. The defining assumption, known as the Markov property, is that the probability of the next hidden state depends only on the current state, not on the full sequence of past states.
Think about reading a line of text. You don’t go back to the beginning of the page every time you reach a new word; you keep going in a sequence where only the last known point is needed to determine the next one.

This is a critical distinction in eye tracking. Eye-movement states such as fixations, saccades, and blinks are not directly observed. Instead, they must be inferred from noisy gaze signals like position, velocity, and signal stability. HMMs provide a principled way to perform this inference by combining information about the current observations with realistic constraints on how eye movements transition over time.
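For readers who prefer notation, the structure described in this section can be written compactly. The symbols are introduced here only for illustration and do not come from any particular implementation: s_t is the hidden state at sample t, o_t is the observed gaze signal at sample t, \pi holds the initial state probabilities, a_{ij} is the probability of moving from state i to state j, and b_j(o) is the probability (or density) of observing o while in state j.

```latex
% Markov property: the next state depends only on the current one
P(s_t \mid s_1, \dots, s_{t-1}) = P(s_t \mid s_{t-1}) = a_{s_{t-1} s_t}

% Joint probability of a hidden state sequence and the observed gaze signal
P(s_{1:T}, o_{1:T}) = \pi_{s_1}\, b_{s_1}(o_1) \prod_{t=2}^{T} a_{s_{t-1} s_t}\, b_{s_t}(o_t)
```

Classification then amounts to finding the state sequence that makes this joint probability as large as possible for the recorded observations.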
Hidden Markov Models for eye-tracking classification
A Hidden Markov Model is a statistical model for time series data. It assumes that every sample is an observation generated by one of a set of possible states, and that these states cannot be observed directly; they are hidden.
In the scenario of fixation classification, it is assumed that every eye-tracking sample represents a data point collected during either a fixation or a saccade, with the eye movement pattern assumed to be an unobservable Markov process. Transitioning from one state (the fixation) to the other (the saccade) has a certain probability, which is unknown and needs to be learned by the algorithm from the recorded data.
Since this computation is executed for each respondent independently of the other respondents, the algorithm can adapt to individual eye movement patterns and levels of data noise.
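As a rough illustration of this two-state setup, the sketch below decodes the most likely fixation/saccade sequence from a list of gaze speeds with the Viterbi algorithm. Everything specific in it is an assumption made for the example and not a description of the iMotions implementation: the Gaussian emission model on gaze speed, the parameter values, and the function names. It also assumes a non-empty list of speeds.

```python
import math

STATES = ("fixation", "saccade")

def gaussian_logpdf(x, mean, std):
    """Log density of a normal distribution, used here as the emission model."""
    return -0.5 * math.log(2 * math.pi * std ** 2) - (x - mean) ** 2 / (2 * std ** 2)

def viterbi(speeds, means, stds, stay_prob, start_prob=(0.5, 0.5)):
    """Return the most likely fixation/saccade label for each gaze speed."""
    n = len(STATES)
    # With two states, leaving state i always means moving to the other state.
    log_trans = [[math.log(stay_prob[i] if i == j else 1 - stay_prob[i])
                  for j in range(n)] for i in range(n)]
    score = [math.log(start_prob[s]) + gaussian_logpdf(speeds[0], means[s], stds[s])
             for s in range(n)]
    backpointers = []
    for v in speeds[1:]:
        prev, score, pointers = score, [], []
        for j in range(n):
            # Best previous state for arriving in state j at this sample.
            best = max(range(n), key=lambda i: prev[i] + log_trans[i][j])
            pointers.append(best)
            score.append(prev[best] + log_trans[best][j]
                         + gaussian_logpdf(v, means[j], stds[j]))
        backpointers.append(pointers)
    # Trace back from the best final state to recover the full path.
    state = max(range(n), key=lambda s: score[s])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    return [STATES[s] for s in reversed(path)]

speeds = [30, 20, 40, 6000, 5500, 30, 25]          # illustrative speeds in px/s
labels = viterbi(speeds, means=(50, 5000), stds=(100, 2000), stay_prob=(0.95, 0.6))
print(labels)
```

Unlike a fixed threshold, each label here depends both on how well the sample fits each state and on how plausible the transition from the previous state is.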
How does the Hidden Markov Model work:
The model is given a certain amount of information before it encounters a new dataset. For example, it is given criteria for what a fixation and a saccade look like. The model also starts from assumed probabilities of staying in a given state or leaving it.
When given a new dataset, the model uses these known definitions and probabilities to estimate the Markov (unknown) parameters used in the classification. These parameters are what the model adapts at the individual level, which allows it to handle noise separately for each individual dataset.
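The text does not say which learning procedure is used; the standard choice for HMMs is the Baum-Welch algorithm. As a simpler stand-in, the sketch below shows a single re-estimation step in the style of "hard" EM (Viterbi training): given one participant's speeds and a current labelling, it updates that participant's mean speed per state and the probability of staying in each state. The function name and the example data are hypothetical.

```python
from statistics import mean

def reestimate(speeds, labels, states=("fixation", "saccade")):
    """Update per-participant parameters from one labelled sequence."""
    params = {}
    for s in states:
        # Speeds of the samples currently assigned to state s.
        own = [v for v, lab in zip(speeds, labels) if lab == s]
        # States that follow a sample in state s (the last sample has no successor).
        following = [b for a, b in zip(labels, labels[1:]) if a == s]
        stays = sum(1 for b in following if b == s)
        params[s] = {
            "mean_speed": mean(own) if own else None,
            "stay_prob": stays / len(following) if following else None,
        }
    return params

speeds = [30, 20, 40, 6000, 5500, 30, 25]
labels = ["fixation"] * 3 + ["saccade"] * 2 + ["fixation"] * 2
print(reestimate(speeds, labels))
```

Alternating a decoding step with a re-estimation step like this one, separately for each participant, is one way parameters could come to reflect that participant's own noise level.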
HMM vs other classifiers in iMotions
The default classifier for webcam-based eye-tracking data in iMotions is a Hidden Markov Model.
Validation study 1: HMM vs I-VT for a screen based eye-tracker
An internal validation study showed very high consistency between the fixation classification of Hidden Markov Models and the gold-standard velocity threshold classifier (I-VT) for data recorded with standard screen-based eye trackers. This was found consistently across a total of 24 respondents recorded with screen-based eye trackers running at sampling rates from 60 Hz up to 600 Hz.
The intraclass correlation of fixation counts across 6×6 gridded AOIs was above 0.99 for all studies, indicating excellent reliability. Comparing the two classifiers through confusion matrices revealed excellent accuracy, above 0.94, with Kappa values above 0.8 for all respondents.
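For readers unfamiliar with these agreement measures, the sketch below shows how accuracy and Kappa can be computed from two label sequences. It assumes that Kappa refers to Cohen's kappa and that agreement is measured sample by sample; the function name and the toy labels are made up for illustration and are not data from the validation studies.

```python
def accuracy_and_kappa(labels_a, labels_b):
    """Raw agreement and Cohen's kappa between two label sequences."""
    n = len(labels_a)
    categories = set(labels_a) | set(labels_b)
    # Observed agreement: share of samples where both classifiers agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Agreement expected by chance, from each classifier's label frequencies.
    expected = sum((labels_a.count(c) / n) * (labels_b.count(c) / n)
                   for c in categories)
    if expected == 1:
        return observed, 1.0  # both sequences use a single identical label
    return observed, (observed - expected) / (1 - expected)

F, S = "fixation", "saccade"
classifier_a = [F, F, F, S, S, F, F, F, S, F]
classifier_b = [F, F, S, S, S, F, F, F, F, F]
acc, kappa = accuracy_and_kappa(classifier_a, classifier_b)
print(round(acc, 2), round(kappa, 2))
```

Because fixation samples usually outnumber saccade samples, some agreement is expected by chance alone. Kappa corrects for this, which is why it can be noticeably lower than accuracy for the same pair of classifiers.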
Validation study 2: HMM vs I-VT for webcam based eye-tracking
Two internal studies, with 5 participants each, recorded with iMotions’ webcam-based eye tracking algorithm, found a medium level of consistency between Hidden Markov Models and the velocity-based filter.
Fixation counts across 6×6 gridded AOIs indicated good reliability, with intraclass correlation values of 0.87 or higher. However, predicting I-HMM classifications from I-VT classifications yielded an accuracy of at least 0.73, with Kappa values ranging from 0.3 to 0.8 across respondents. Visual inspection of the data revealed that Hidden Markov Models better classified fixations from webcam-based eye tracking data.
What HMM can and cannot do
It is important to note that a fixation filter or classifier cannot correct the data obtained from a webcam. The low accuracy and precision that come with the lack of a dedicated tracker will not be corrected by any fixation classification filter. However, HMM proved to be better at separating the true signal from individual-level noise.
Conclusion
With eye tracking moving beyond controlled lab settings and into more natural, scalable, and remote environments, handling noise at the individual level becomes a central challenge. Traditional fixation classifiers based on fixed thresholds work well when signal quality is high and consistent, but their performance degrades as variability increases, which is common in webcam-based eye tracking.
Hidden Markov Models provide a more flexible and principled approach by treating eye movements as a latent, sequential process and explicitly modeling uncertainty. Rather than forcing all data into a single rigid framework, HMMs adapt fixation and saccade classification to individual behavior and noise characteristics. While they cannot overcome the fundamental limitations of webcam hardware, they are better suited to separating meaningful signal from noise, enabling more reliable inference of visual attention in ecologically valid and remote eye-tracking research.
Frequently Asked Questions
1. Why is fixation classification more difficult with webcam-based eye tracking?
Webcam-based eye tracking typically has lower spatial accuracy and higher variability than dedicated eye trackers. Differences in lighting, head movement, camera quality, and participant behavior introduce individual-level noise, making fixed threshold classifiers more prone to misclassifying noise as fixations or saccades.
2. How do Hidden Markov Models differ from velocity-based fixation classifiers?
Velocity-based classifiers rely on hard cut-offs to separate fixations and saccades, assuming consistent signal quality. Hidden Markov Models instead treat eye movements as hidden states that generate noisy observations, allowing classification to be inferred probabilistically and adapt to variation within each individual dataset.
3. Do Hidden Markov Models correct poor-quality eye-tracking data?
No. HMMs do not improve the underlying accuracy or precision of webcam-based eye tracking. They cannot recover information that was never captured. Their strength lies in better separating meaningful signals from noise given the limitations of the data.
4. Why are HMMs well suited for individual-level eye-movement analysis?
HMMs are trained and applied at the individual level, allowing transition probabilities and state definitions to adapt to each participant’s eye-movement patterns and noise characteristics. This makes them especially useful in studies with heterogeneous participants and recording conditions.
5. When should researchers prefer Hidden Markov Models over traditional classifiers?
HMMs are particularly advantageous in remote, online, or ecologically valid eye-tracking studies where signal quality varies across participants. In high-quality lab recordings with dedicated hardware, traditional classifiers may perform similarly, but HMMs offer greater robustness when consistency cannot be guaranteed.
