.:: Natural Sciences Publishing ::.

Applied Mathematics & Information Sciences

An International Journal

	Volume 19, No. 5


	The Automated Method of Collecting and Labeling Data for Speech Emotion Recognition based on Face Emotion Recognition

	PP: 1067-1077

	doi:10.18576/amis/190508

	Author(s)

	Aisultan Shoiynbek, Darkhan Kuanyshbay, Paulo Menezes, Gustavo Assunção, Bakhtiyor Meraliyev, Assylbek Mukhametzhanov, Temirlan Shoiynbek, Sergey Sklyar

	Abstract

	Speech Emotion Recognition (SER) is vital for enabling natural and effective human–machine interactions, yet its advancement is constrained by the scarcity of richly annotated emotional speech corpora, the laborious nature of manual labeling, and the difficulty of eliciting genuine expressions. We propose an automated data-collection and labeling pipeline that synchronizes video-based facial emotion recognition (FER) with audio capture to annotate speech recordings according to speakers’ natural facial expressions. Applying this method, we processed 1 243 YouTube videos (1 058 hours of raw footage) and extracted 218 359 candidate utterances, which, after FER-guided filtering, yielded a high-quality corpus of 45 459 recordings (33 h 15 min of audio) across seven basic emotions in Kazakh (15 076 utterances) and Russian (30 383 utterances). We trained a deep neural network on the combined dataset and achieved 86.84% overall test accuracy, with per-language accuracies of 89.00% (Kazakh) and 85.20% (Russian) for seven-way emotion classification; a support vector machine reached 82.47% under the same conditions. By reducing manual annotation effort by over 80% while maintaining consistent labels, our approach delivers a scalable, language-agnostic solution for generating authentic emotional speech datasets, substantially cutting down on human labor and paving the way for more robust, real-world SER systems.
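
	The FER-guided filtering step described in the abstract could be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not give the emotion list, the thresholds, or the voting rule, so the names `FramePrediction`, `label_utterance`, `min_confidence`, and `min_agreement`, the majority-vote rule, and the emotion set below are all hypothetical.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional, Sequence

# Assumption: a commonly used set of seven basic emotion labels.
# The abstract says "seven basic emotions" but does not list them.
EMOTIONS = ("anger", "disgust", "fear", "happiness",
            "sadness", "surprise", "neutral")


@dataclass
class FramePrediction:
    """Hypothetical output of a facial emotion recognizer for one video frame."""
    time_s: float       # timestamp of the frame within the video, in seconds
    emotion: str        # predicted label, expected to be one of EMOTIONS
    confidence: float   # classifier confidence in [0, 1]


def label_utterance(frames: Sequence[FramePrediction],
                    start_s: float,
                    end_s: float,
                    min_confidence: float = 0.6,
                    min_agreement: float = 0.7) -> Optional[str]:
    """Return an emotion label for the utterance spanning [start_s, end_s),
    or None if the utterance should be discarded.

    Assumed rule (not from the paper): keep only confident frames that fall
    inside the utterance, then accept the majority label if enough of those
    frames agree on it.
    """
    inside = [f for f in frames
              if start_s <= f.time_s < end_s and f.confidence >= min_confidence]
    if not inside:
        return None  # no usable face evidence for this utterance
    emotion, count = Counter(f.emotion for f in inside).most_common(1)[0]
    if count / len(inside) < min_agreement:
        return None  # facial expressions too inconsistent to trust
    return emotion


# Usage: five frames during a two-second utterance, four of them "happiness".
example_frames = [
    FramePrediction(0.2, "happiness", 0.9),
    FramePrediction(0.6, "happiness", 0.8),
    FramePrediction(1.0, "sadness", 0.7),
    FramePrediction(1.4, "happiness", 0.9),
    FramePrediction(1.8, "happiness", 0.95),
]
example_label = label_utterance(example_frames, start_s=0.0, end_s=2.0)
```

	In a sketch like this, utterances that return `None` are dropped, which mirrors in spirit how a large pool of candidate utterances can shrink to a smaller labeled corpus; the actual criteria used in the paper may differ.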
