Academic Journal

Masked multi-center angular margin loss for language recognition

Bibliographic Details
Title: Masked multi-center angular margin loss for language recognition
Authors: Ju, Minghang; Xu, Yanyan; Ke, Dengfeng; Su, Kaile
Publication Information: Springer
Publication Year: 2022
Collection: Griffith University: Griffith Research Online
Subject Terms: Audio processing, Science & Technology, Technology, Acoustics, Engineering, Electrical & Electronic
Description: Embedding-based language recognition aims to maximize inter-class variance and minimize intra-class variance. Previous work is limited to training under the constraint of a single centroid, which cannot accurately describe the overall geometric characteristics of the embedding space. In this paper, we propose a novel masked multi-center angular margin (MMAM) loss that takes the perspective of multiple centroids and yields better overall performance. Specifically, numerous global centers are used to jointly approximate the entities of each class. To capture the local neighbor relationship effectively, a small number of centers are adapted to construct the similarity relationship between these centers and each entity. Furthermore, we use a new reverse label propagation algorithm to adjust neighbor relations according to the ground-truth labels, so that a discriminative metric space is learned during classification. Finally, an additive angular margin is added, which learns more discriminative language embeddings by simultaneously enhancing intra-class compactness and inter-class discrepancy. Experiments are conducted on the APSIPA 2017 Oriental Language Recognition (AP17-OLR) corpus. We compare the proposed MMAM method with seven state-of-the-art baselines and verify that our method achieves 26.2% and 31.3% relative improvements in equal error rate (EER) and Cavg, respectively, in the full-length test ("full-length" means the average duration of the utterances is longer than 5 s). There are also 31.2% and 29.3% relative improvements in the 3-s test and 14% and 14.8% relative improvements in the 1-s test.
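Illustrative note: the description above combines several centers per class with an additive angular margin. The following is a minimal sketch of that general idea only, not the authors' MMAM loss. It assumes that the per-class similarity is the maximum cosine over that class's centers, and it omits the masking and reverse label propagation steps entirely. The function names (`cosine`, `multi_center_aam_logits`, `cross_entropy`) and the `margin` and `scale` values are hypothetical choices made for this example.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def multi_center_aam_logits(embedding, centers, target, margin=0.2, scale=30.0):
    """Scaled cosine logits with an additive angular margin on the target class.

    centers[k] is the list of center vectors for class k.
    Assumption (not from the source): a class is scored by its
    best-matching center, i.e. the maximum cosine over its centers.
    """
    logits = []
    for cls, class_centers in enumerate(centers):
        cos = max(cosine(embedding, c) for c in class_centers)
        cos = max(-1.0, min(1.0, cos))  # guard acos against rounding error
        if cls == target:
            # Additive angular margin: penalize the target angle by `margin`,
            # capped at pi so the cosine stays monotonic.
            theta = min(math.acos(cos) + margin, math.pi)
            cos = math.cos(theta)
        logits.append(scale * cos)
    return logits


def cross_entropy(logits, target):
    """Numerically stable softmax cross-entropy for one example."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[target]


# Two classes, two hypothetical centers each, in a 2-D embedding space.
centers = [
    [[1.0, 0.0], [0.9, 0.1]],   # class 0
    [[0.0, 1.0], [-1.0, 0.0]],  # class 1
]
embedding = [1.0, 0.0]
logits = multi_center_aam_logits(embedding, centers, target=0)
loss = cross_entropy(logits, target=0)
```

With the margin applied, the target-class logit is lower than it would be without a margin, so the loss keeps pushing the embedding closer to its class centers even when it is already classified correctly.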
Document Type: article in journal/newspaper
Language: English
ISSN: 1687-4722
Relation: EURASIP Journal on Audio, Speech, and Music Processing; Ju, M; Xu, Y; Ke, D; Su, K, Masked multi-center angular margin loss for language recognition, EURASIP Journal on Audio, Speech, and Music Processing, 2022, 2022, pp. 17; http://hdl.handle.net/10072/419732
DOI: 10.1186/s13636-022-00249-4
Availability: http://hdl.handle.net/10072/419732
https://doi.org/10.1186/s13636-022-00249-4
Rights: http://creativecommons.org/licenses/by/4.0/ ; © The Author(s). 2022 Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. ; open access
Accession Number: edsbas.6FB3EB6A
Database: BASE