Bibliographic Details
Title: |
Strong Generalized Speech Emotion Recognition Based on Effective Data Augmentation |
Authors: |
Huawei Tao, Shuai Shan, Ziyi Hu, Chunhua Zhu, Hongyi Ge |
Source: |
Entropy; Volume 25; Issue 1; Pages: 68 |
Publisher Information: |
Multidisciplinary Digital Publishing Institute |
Publication Year: |
2022 |
Collection: |
MDPI Open Access Publishing |
Subject Terms: |
speech emotion recognition, data augmentation, multi-channel feature extractor, Wasserstein distance, feature distributions, speaker-invariant emotional representations |
Description: |
The scarcity of labeled samples limits the development of speech emotion recognition (SER). Data augmentation is an effective way to address sample sparsity, yet data augmentation algorithms have received little study in the field of SER. This paper analyzes the effectiveness of classical acoustic data augmentation methods in SER and, based on that analysis, proposes a strong generalized speech emotion recognition model built on effective data augmentation. The model uses a multi-channel feature extractor consisting of multiple sub-networks to extract emotional representations. Different kinds of augmented data that effectively improve SER performance are fed into the sub-networks, and the emotional representations are obtained by weighted fusion of the output feature maps of the sub-networks. To make the model robust to unseen speakers, adversarial training is employed to generalize the emotional representations: a discriminator estimates the Wasserstein distance between the feature distributions of different speakers and forces the feature extractor to learn speaker-invariant emotional representations. Experimental results on the IEMOCAP corpus show that the performance of the proposed method is 2–9% better than that of the related SER algorithm, which demonstrates the effectiveness of the proposed method. |
Document Type: |
text |
File Description: |
application/pdf |
Language: |
English |
Relation: |
Signal and Data Analysis; https://dx.doi.org/10.3390/e25010068 |
DOI: |
10.3390/e25010068 |
Availability: |
https://doi.org/10.3390/e25010068 |
Rights: |
https://creativecommons.org/licenses/by/4.0/ |
Accession Number: |
edsbas.AC82E011 |
Database: |
BASE |