Academic Journal

Strong Generalized Speech Emotion Recognition Based on Effective Data Augmentation

Bibliographic Details
Title: Strong Generalized Speech Emotion Recognition Based on Effective Data Augmentation
Authors: Huawei Tao, Shuai Shan, Ziyi Hu, Chunhua Zhu, Hongyi Ge
Source: Entropy; Volume 25; Issue 1; Pages: 68
Publication Information: Multidisciplinary Digital Publishing Institute
Publication Year: 2022
Collection: MDPI Open Access Publishing
Subject Terms: speech emotion recognition, data augmentation, multi-channel feature extractor, Wasserstein distance, feature distributions, speaker-invariant emotional representations
Description: The scarcity of labeled samples limits the development of speech emotion recognition (SER). Data augmentation is an effective way to address sample sparsity, but data augmentation algorithms have received little study in the field of SER. This paper analyzes the effectiveness of classical acoustic data augmentation methods in SER and, based on that analysis, proposes a strongly generalized speech emotion recognition model built on effective data augmentation. The model uses a multi-channel feature extractor consisting of multiple sub-networks to extract emotional representations. Different kinds of augmented data that effectively improve SER performance are fed into the sub-networks, and the emotional representations are obtained by weighted fusion of the output feature maps of the sub-networks. To make the model robust to unseen speakers, adversarial training is employed to generalize the emotional representations: a discriminator estimates the Wasserstein distance between the feature distributions of different speakers, which forces the feature extractor to learn speaker-invariant emotional representations. Experimental results on the IEMOCAP corpus show that the proposed method outperforms the related SER algorithm by 2–9%, which demonstrates its effectiveness.
Document Type: text
File Description: application/pdf
Language: English
Relation: Signal and Data Analysis; https://dx.doi.org/10.3390/e25010068
DOI: 10.3390/e25010068
Availability: https://doi.org/10.3390/e25010068
Rights: https://creativecommons.org/licenses/by/4.0/
Accession Number: edsbas.AC82E011
Database: BASE
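
Illustrative note: the description in this record mentions two operations, the weighted fusion of sub-network feature maps and a Wasserstein distance between the feature distributions of different speakers. The following is only a minimal sketch of those two ideas, not the authors' implementation. It assumes softmax-normalized fusion weights and flat feature vectors, and it replaces the learned discriminator with the closed-form empirical Wasserstein-1 distance for equal-size one-dimensional samples. All function names (`softmax`, `weighted_fusion`, `wasserstein_1d`) and the sample values are hypothetical.

```python
import math


def softmax(weights):
    """Normalize raw fusion weights so they are positive and sum to 1."""
    m = max(weights)  # subtract the max for numerical stability
    exps = [math.exp(w - m) for w in weights]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_fusion(feature_maps, raw_weights):
    """Fuse equally sized feature vectors (one per sub-network) by a weighted sum.

    Assumption: the fusion weights are softmax-normalized; the record does not
    say how the weights are obtained.
    """
    alphas = softmax(raw_weights)
    dim = len(feature_maps[0])
    return [
        sum(a * fm[i] for a, fm in zip(alphas, feature_maps))
        for i in range(dim)
    ]


def wasserstein_1d(xs, ys):
    """Empirical Wasserstein-1 distance between two equal-size 1-D samples.

    For equal-size 1-D samples this is the mean absolute difference of the
    sorted values. A learned discriminator would estimate the distance for
    high-dimensional features instead; this closed form is a stand-in.
    """
    if len(xs) != len(ys):
        raise ValueError("samples must have the same size")
    return sum(abs(a - b) for a, b in zip(sorted(xs), sorted(ys))) / len(xs)


if __name__ == "__main__":
    # Hypothetical outputs of three sub-networks, fused with equal weights.
    maps = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
    print(weighted_fusion(maps, [0.0, 0.0, 0.0]))

    # Hypothetical 1-D features from two speakers.
    print(wasserstein_1d([0.0, 1.0, 2.0], [1.0, 2.0, 3.0]))
```

In an adversarial setup of the kind the description outlines, the feature extractor would be trained to make this distance small across speakers while remaining useful for emotion classification.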