Applied Mathematics & Information Sciences
An International Journal
               
 
 
 
 
 
 
 
 
 
 
 
 
 

Volumes > Volume 19 > No. 5

 
   

The Automated Method of Collecting and Labeling Data for Speech Emotion Recognition based on Face Emotion Recognition

PP: 1067-1077
doi:10.18576/amis/190508
Author(s)
Aisultan Shoiynbek, Darkhan Kuanyshbay, Paulo Menezes, Gustavo Assunção, Bakhtiyor Meraliyev, Assylbek Mukhametzhanov, Temirlan Shoiynbek, Sergey Sklyar
Abstract
Speech Emotion Recognition (SER) is vital for enabling natural and effective human–machine interaction, yet its advancement is constrained by the scarcity of richly annotated emotional speech corpora, the laborious nature of manual labeling, and the difficulty of eliciting genuine expressions. We propose an automated data-collection and labeling pipeline that synchronizes video-based facial emotion recognition (FER) with audio capture to annotate speech recordings according to speakers’ natural facial expressions. Applying this method, we processed 1,243 YouTube videos (1,058 hours of raw footage) and extracted 218,359 candidate utterances. After FER-guided filtering, these yielded a high-quality corpus of 45,459 recordings (33 h 15 min of audio) across seven basic emotions in Kazakh (15,076 utterances) and Russian (30,383 utterances). We trained a deep neural network on the combined dataset and achieved 86.84% overall test accuracy for seven-way emotion classification, with per-language accuracies of 89.00% (Kazakh) and 85.20% (Russian); a support vector machine reached 82.47% under the same conditions. By reducing manual annotation effort by over 80% while maintaining consistent labels, our approach delivers a scalable, language-agnostic solution for generating authentic emotional speech datasets and paves the way for more robust, real-world SER systems.
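The FER-guided labeling idea can be illustrated with a minimal sketch. The abstract does not state the actual filtering rule, so everything below is an assumption made for illustration: the `FramePrediction` and `Utterance` types, the `label_utterance` function, the majority-vote rule, and both thresholds (`min_confidence`, `min_agreement`) are hypothetical and are not taken from the paper. The sketch assumes a FER model has already produced one timestamped emotion prediction per video frame, and that the audio has already been segmented into utterances on the same timeline.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class FramePrediction:
    """One hypothetical FER output for a single video frame."""
    time_s: float       # frame timestamp on the shared audio/video timeline
    emotion: str        # predicted facial emotion label
    confidence: float   # model confidence in [0, 1]


@dataclass
class Utterance:
    """One hypothetical speech segment cut from the audio track."""
    start_s: float
    end_s: float


def label_utterance(
    utt: Utterance,
    frames: List[FramePrediction],
    min_confidence: float = 0.6,   # assumed threshold, not from the paper
    min_agreement: float = 0.7,    # assumed threshold, not from the paper
) -> Optional[str]:
    """Return an emotion label for the utterance, or None to discard it.

    Frames inside the utterance window that pass the confidence threshold
    vote for their emotion; the utterance is kept only if the winning
    emotion has a large enough share of those votes.
    """
    votes = [
        f.emotion
        for f in frames
        if utt.start_s <= f.time_s < utt.end_s and f.confidence >= min_confidence
    ]
    if not votes:
        return None  # no usable face evidence during this utterance
    emotion, count = Counter(votes).most_common(1)[0]
    if count / len(votes) < min_agreement:
        return None  # facial expressions too inconsistent to trust
    return emotion


# Small made-up example: six frame predictions and three utterances.
demo_frames = [
    FramePrediction(0.1, "happy", 0.90),
    FramePrediction(0.5, "happy", 0.80),
    FramePrediction(0.9, "happy", 0.95),
    FramePrediction(1.2, "sad", 0.90),
    FramePrediction(1.6, "happy", 0.70),
    FramePrediction(1.8, "sad", 0.30),  # below the confidence threshold
]
demo_labels = [
    label_utterance(Utterance(0.0, 1.0), demo_frames),  # consistent frames
    label_utterance(Utterance(1.0, 2.0), demo_frames),  # split vote
    label_utterance(Utterance(5.0, 6.0), demo_frames),  # no frames at all
]
```

In this sketch, discarding an utterance (returning `None`) plays the role of the filtering step that reduces the candidate pool to the final corpus. A real pipeline would also need face detection, speaker-to-face matching, and audio segmentation, none of which are shown here.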
