Introducing speech-to-text and valence analysis

The integration of artificial intelligence (AI) has revolutionized various industries, and now it is transforming the realm of human behavior research. iMotions recently integrated AssemblyAI’s sophisticated speech recognition and analysis API into our iMotions Lab platform, bringing speech-to-text analysis to our users.

Speech-to-text - outside interview
Analyze and segment any audio and video files, with speech recognition, valence analysis, and auto-transcription.

This integration marks a significant milestone in our data collection and analysis endeavors, enabling iMotions users to unlock deeper insights from spoken language and empowering researchers and analysts with enhanced capabilities for understanding and interpreting human communication. This article gives a general overview of the capabilities iMotions users can expect from this new integration.

Import, transcribe, and analyze

The new speech-to-text analysis feature allows users to import videos or audio files, and through AssemblyAI’s API, have the audio automatically transcribed and analyzed. Analysis of the audio includes detection of the number of speakers, discursive valence detection (the use of positive, negative, or neutrally laden words), and speech summarization. 

Speech-to-text and speech analysis components are among the top features our clients and users have requested over the years. The value of analyzing interviews, focus groups, speeches, or interactions recorded on audio or video cannot be overestimated: it can add new depth to the analysis of just about any recorded verbal account.

iMotions has customers and users around the world, so it was important to us that as many people as possible can benefit from this new feature without necessarily mastering English – and the same goes for their respondents, of course. Our speech-to-text feature currently supports English (US, UK, AUS, Global), Dutch, French, German, Italian, Portuguese, and Spanish – with more to come.

Wide range of applications

For academic users, speech-to-text will prove a valuable tool for research data collection: researchers conducting interviews or gathering qualitative data can transcribe interviews, focus groups, or other recorded conversations using the speech-to-text feature.

The ability to auto-transcribe and analyze speech can accelerate the data analysis part of most research projects, and through discursive sentiment and valence analysis, it can highlight linguistic features that might have been overlooked. 

speech-to-text in group interview
iMotions’ new speech analysis feature allows users to identify speakers and assign an individual track to each.

The value and application of speech-to-text analysis extend far beyond academic research. Industries such as market research, customer experience management, and sentiment analysis can now delve deeper into the subtleties of human communication.

Organizations can gain unprecedented insights into customer sentiment through interviews, analyze large volumes of call center interactions, and extract valuable data from focus group discussions or social media conversations. The ability to process spoken data empowers decision-makers to act swiftly and make informed choices based on comprehensive linguistic analysis.

How speech-to-text and speech analysis works

AssemblyAI’s API is a paid cloud-based processing service. This means that in order to use speech-to-text and speech analysis in iMotions, users must purchase a unique API license key from AssemblyAI and activate it in the iMotions Software. Once that is done, the audio processing can commence.  

Using speech-to-text and speech analysis in iMotions is simple: select the study you want to process and choose the speech-to-text option in the post-processing menu, and the processing will run in the background, just like all other post-processing features in iMotions.

Once the processing is complete, the results will be stored and visible as annotations to your data. Your audio will be divided into a list of speakers, each with a string of transcribed blocks, as well as a sentiment analysis that indicates whether a speaker is using positive, negative, or neutrally laden words.

That way, on top of the convenience of having an audio file broken down into its component parts and transcribed in the process, you can also gain a valuable discursive overview of each individual speaker. 
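To make the idea of a per-speaker discursive overview concrete, here is a minimal sketch in Python. The data structure below is hypothetical – it is modeled on the description above (transcribed blocks, each with a speaker label and a valence label), not on the actual iMotions or AssemblyAI output format – and the function simply tallies valence labels per speaker.

```python
from collections import Counter, defaultdict

# Hypothetical transcript blocks, modeled on the results described above:
# each block carries a speaker label, the transcribed text, and a valence label.
blocks = [
    {"speaker": "A", "text": "I really enjoyed the new interface.", "valence": "POSITIVE"},
    {"speaker": "B", "text": "The setup took some time.", "valence": "NEGATIVE"},
    {"speaker": "A", "text": "Overall it works well.", "valence": "POSITIVE"},
    {"speaker": "B", "text": "The documentation was okay.", "valence": "NEUTRAL"},
]

def valence_overview(blocks):
    """Count positive, negative, and neutral blocks for each speaker."""
    overview = defaultdict(Counter)
    for block in blocks:
        overview[block["speaker"]][block["valence"]] += 1
    return {speaker: dict(counts) for speaker, counts in overview.items()}

print(valence_overview(blocks))
# → {'A': {'POSITIVE': 2}, 'B': {'NEGATIVE': 1, 'NEUTRAL': 1}}
```

A summary like this is one way to turn a diarized, valence-tagged transcript into the kind of speaker-level overview described here.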

Look out for more voice analysis features from iMotions

The release of the speech-to-text feature marks the beginning of iMotions’ venture into voice and speech analysis. Later in the year, we will release an entire module dedicated to voice analysis, which we are very excited about. If you are interested in staying updated with news from iMotions, we recommend subscribing to our monthly newsletter or following us on LinkedIn. 

If you are interested in our speech-to-text feature and would like to use it in your research, please do not hesitate to reach out to us here.
