Academic Journal
Masked multi-center angular margin loss for language recognition
Title: | Masked multi-center angular margin loss for language recognition |
---|---|
Authors: | Ju, Minghang, Xu, Yanyan, Ke, Dengfeng, Su, Kaile |
Publisher: | Springer |
Publication Year: | 2022 |
Collection: | Griffith University: Griffith Research Online |
Subject Terms: | Audio processing, Science & Technology, Technology, Acoustics, Engineering, Electrical & Electronic |
Description: | Embedding-based language recognition aims to maximize inter-class variance and minimize intra-class variance. Previous research is limited to the training constraint of a single centroid, which cannot accurately describe the overall geometric characteristics of the embedding space. In this paper, we propose a novel masked multi-center angular margin (MMAM) loss from the perspective of multiple centroids, resulting in better overall performance. Specifically, numerous global centers are used to jointly approximate the entities of each class. To capture local neighbor relationships effectively, a small number of centers are adopted to construct the similarity relationship between these centers and each entity. Furthermore, we use a new reverse label propagation algorithm to adjust neighbor relations according to the ground-truth labels, so as to learn a discriminative metric space during classification. Finally, an additive angular margin is added, which learns more discriminative language embeddings by simultaneously enhancing intra-class compactness and inter-class discrepancy. Experiments are conducted on the APSIPA 2017 Oriental Language Recognition (AP17-OLR) corpus. We compare the proposed MMAM method with seven state-of-the-art baselines and show that it achieves 26.2% and 31.3% relative improvements in equal error rate (EER) and Cavg, respectively, in the full-length test (“full-length” means the average duration of the utterances is longer than 5 s). It also achieves 31.2% and 29.3% relative improvements in the 3-s test, and 14% and 14.8% relative improvements in the 1-s test. |
Document Type: | article in journal/newspaper |
Language: | English |
ISSN: | 1687-4722 |
Relation: | EURASIP Journal on Audio, Speech, and Music Processing; Ju, M; Xu, Y; Ke, D; Su, K, Masked multi-center angular margin loss for language recognition, EURASIP Journal on Audio, Speech, and Music Processing, 2022, 2022, pp. 17; http://hdl.handle.net/10072/419732 |
DOI: | 10.1186/s13636-022-00249-4 |
Availability: | http://hdl.handle.net/10072/419732 https://doi.org/10.1186/s13636-022-00249-4 |
Rights: | http://creativecommons.org/licenses/by/4.0/ ; © The Author(s). 2022 Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. ; open access |
Accession Number: | edsbas.6FB3EB6A |
Database: | BASE |
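The abstract combines several centers per class with an additive angular margin. The Python sketch below illustrates only those two generic ingredients, in the style of sub-center ArcFace (a separately published technique), and is not the authors' MMAM loss. It omits the masking of neighboring centers and the reverse label propagation step, because the abstract does not describe them in enough detail to reproduce. The function name `multi_center_angular_margin_loss`, the nested-list layout of `centers`, and the default `margin` and `scale` values are hypothetical choices.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def _normalize(v: Vector) -> List[float]:
    """Scale a vector to unit L2 norm."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def _cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two vectors."""
    a_n, b_n = _normalize(a), _normalize(b)
    return sum(x * y for x, y in zip(a_n, b_n))


def multi_center_angular_margin_loss(
    embedding: Vector,
    centers: Sequence[Sequence[Vector]],  # hypothetical layout: centers[c][k] is the k-th center of class c
    label: int,
    margin: float = 0.2,  # hypothetical additive angular margin, in radians
    scale: float = 30.0,  # hypothetical logit scale
) -> float:
    """Sub-center-style multi-center loss with an additive angular margin.

    Each class is scored by its best-matching center (max cosine). The target
    class angle is then enlarged by `margin` before a scaled softmax
    cross-entropy. Masking and reverse label propagation are NOT implemented.
    """
    # Per-class cosine: similarity to the closest of that class's centers.
    class_cos = [
        max(_cosine(embedding, c) for c in class_centers) for class_centers in centers
    ]

    logits = []
    for cls, cos_t in enumerate(class_cos):
        if cls == label:
            # Clamp to acos's domain to guard against rounding just past +/-1.
            theta = math.acos(max(-1.0, min(1.0, cos_t)))
            # Cap the penalized angle at pi so cos(theta + m) keeps decreasing.
            cos_t = math.cos(min(theta + margin, math.pi))
        logits.append(scale * cos_t)

    # Numerically stable log-sum-exp for the softmax cross-entropy.
    max_logit = max(logits)
    log_sum = max_logit + math.log(sum(math.exp(z - max_logit) for z in logits))
    return log_sum - logits[label]
```

For example, with `scale=1.0`, raising `margin` from 0 to 0.2 increases the loss for a correctly classified embedding. This is the intended effect of the margin: the target class must win by an angular gap rather than by a small cosine lead. Taking the maximum over each class's centers lets one class cover several separate regions of the embedding space, which is the motivation the abstract gives for using multiple centroids instead of one.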