Introduction Healthcare team performance directly impacts the quality and safety of medical care. However, measuring team performance is challenging and requires methodologies that can investigate its different contributing elements. This study proposes an artificial intelligence (AI)-driven multimodal approach to visualising gaze (ie, joint visual attention) and speech in medical team performance and examines how these differ across levels of medical expertise, using eye-trackers and our own automatic gaze annotation programme.
Method Four simulation sessions, two in Japan and two in the UK, were filmed with eye-trackers worn by a clinician and a nurse. At each site, one session was conducted with an experienced pair (UK_Ex and JP_Ex) and the other with a less experienced pair (UK_LessEx and JP_LessEx). The scenarios were a difficult intubation in Japan and a urinary tract infection, with a family member present, in the UK. The numbers of occurrences and the durations of joint attention and of each individual's speech in the four data sets were compared in total and within 15 s time ranges to examine their correlations.
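For illustration, the binned correlation analysis described above can be sketched roughly as follows. This is a minimal sketch assuming pandas and SciPy; the file names, column layout and helper function are hypothetical and do not reflect the authors' actual annotation pipeline, and events are assigned to the 15 s bin containing their start time as a simplification.

import pandas as pd
from scipy.stats import pearsonr

# Hypothetical event logs: one row per annotated event, start/end times in seconds.
joint_attention = pd.read_csv("joint_attention_events.csv")  # columns: start, end
speech = pd.read_csv("speech_events.csv")                     # columns: start, end

BIN = 15  # length of each time range in seconds

def per_bin(events, agg):
    # Assign each event to the 15 s bin containing its start time, then aggregate.
    bins = (events["start"] // BIN).astype(int)
    if agg == "count":
        return events.groupby(bins).size()
    return events.assign(dur=events["end"] - events["start"]).groupby(bins)["dur"].sum()

ja_counts = per_bin(joint_attention, "count")   # number of joint attention episodes per bin
speech_dur = per_bin(speech, "duration")        # total speech duration per bin (seconds)

# Align the two series on bin index, filling bins with no events with zero.
df = pd.concat([ja_counts, speech_dur], axis=1, keys=["ja_n", "speech_s"]).fillna(0)
r, p = pearsonr(df["ja_n"], df["speech_s"])
print(f"Pearson r = {r:.2f} (p = {p:.3f})")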
Result The Ex pairs in both contexts engaged in joint visual attention more frequently and for longer, and spoke more, than the LessEx pairs. In JP_Ex, a positive correlation was found between the number of joint attention episodes and total speech duration (r=0.81), indicating that the team members verbally coordinated each other's objects of attention, which we term coregulative attentional engagement. In contrast, in UK_Ex, the correlation was negative (r=−0.70): the pair visually monitored each other's actions while talking to the patient's family, which we call coinfluential attentional engagement. These tendencies were weaker in the LessEx pairs.
Conclusion Although the accuracy of the automatic annotation (approximately 40%–60%) should be improved before it is applied to medical training, the research method could provide preliminary insight into the elements of good team performance.





