Semi-Supervised Speaker Adaptation for End-to-End Speech Synthesis with Pretrained Models

Bibliographic details
Title: Semi-Supervised Speaker Adaptation for End-to-End Speech Synthesis with Pretrained Models
Authors: Sunao Hara, Katsuki Inoue, Shinji Watanabe, Ryuichi Yamamoto, Masanobu Abe, Tomoki Hayashi
Source: ICASSP
Publication information: IEEE, 2020.
Publication year: 2020
Subject terms: Similarity (geometry), Computer science, Speech recognition, Speech synthesis, 010501 environmental sciences, computer.software_genre, 01 natural sciences, Pipeline (software), 030507 speech-language pathology & audiology, 03 medical and health sciences, End-to-end principle, Transcription (linguistics), 0305 other medical science, computer, 0105 earth and related environmental sciences, Speaker adaptation
Description: Recently, end-to-end text-to-speech (TTS) models have achieved remarkable performance, but they require a large amount of paired text and speech data for training. On the other hand, a dozen minutes of unpaired speech recordings from a target speaker can be collected easily without corresponding text data. To make use of such accessible data, the proposed method leverages state-of-the-art end-to-end automatic speech recognition (ASR) systems and obtains the corresponding transcriptions from pretrained ASR models. Although these models provide only text output rather than intermediate linguistic features such as phonemes, end-to-end TTS can be trained well on such raw text directly. Thus, the proposed method greatly simplifies the speaker adaptation pipeline by consistently employing the end-to-end ASR/TTS ecosystem. Experimental results show that the proposed method achieves performance comparable to a paired-data adaptation method in terms of subjective speaker similarity and objective cepstral distance measures.
DOI: 10.1109/icassp40776.2020.9053371
Access URL: https://explore.openaire.eu/search/publication?articleId=doi_________::858c159b6af41c659715109383e20628
https://doi.org/10.1109/icassp40776.2020.9053371
Rights: CLOSED
Accession number: edsair.doi...........858c159b6af41c659715109383e20628
Database: OpenAIRE
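
Note: The description above outlines a pseudo-labeling step, in which a pretrained ASR model transcribes untranscribed target-speaker recordings so that the resulting (speech, text) pairs can be used to adapt an end-to-end TTS model. The following is only a minimal sketch of that data-preparation idea, not the authors' implementation. The names `PairedUtterance`, `pseudo_label`, and the `transcribe` callable are hypothetical, and dropping utterances with empty ASR output is an added assumption that the record does not mention.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class PairedUtterance:
    """One adaptation example: a recording and its (ASR-produced) text."""
    audio_path: str
    text: str


def pseudo_label(
    audio_paths: Iterable[str],
    transcribe: Callable[[str], str],
) -> List[PairedUtterance]:
    """Build pseudo-paired data from unpaired speech.

    `transcribe` stands in for any pretrained ASR system (hypothetical
    interface: audio path in, raw text out). No phoneme conversion is
    applied, since the description says raw text is used directly.
    """
    pairs = []
    for path in audio_paths:
        text = transcribe(path).strip()
        if text:  # assumption: skip recordings the ASR returned nothing for
            pairs.append(PairedUtterance(audio_path=path, text=text))
    return pairs
```

The returned list would then be passed to whatever fine-tuning routine the chosen TTS toolkit provides, in place of manually transcribed data.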