ArTST: Arabic Text and Speech Transformer

Bibliographic Details
Title:	ArTST: Arabic Text and Speech Transformer
Authors:	Toyin, Hawau Olamide; Djanibekov, Amirbek; Kulkarni, Ajinkya; Aldarmaki, Hanan
Publication Year:	2023
Collection:	Computer Science
Subject Terms:	Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computer Science - Sound, Electrical Engineering and Systems Science - Audio and Speech Processing
Description:	We present ArTST, a pre-trained Arabic text and speech transformer for supporting open-source speech technologies for the Arabic language. The model architecture follows the unified-modal framework, SpeechT5, that was recently released for English, and is focused on Modern Standard Arabic (MSA), with plans to extend the model to dialectal and code-switched Arabic in future editions. We pre-trained the model from scratch on MSA speech and text data, and fine-tuned it for the following tasks: Automatic Speech Recognition (ASR), Text-To-Speech synthesis (TTS), and spoken dialect identification. In our experiments comparing ArTST with SpeechT5, as well as with previously reported results in these tasks, ArTST performs on par with or exceeds the current state-of-the-art in all three tasks. Moreover, we find that our pre-training is conducive to generalization, which is particularly evident in the low-resource TTS task. The pre-trained model as well as the fine-tuned ASR and TTS models are released for research use. Comment: 11 pages, 1 figure, SIGARAB ArabicNLP 2023
Document Type:	Working Paper
Access URL:	http://arxiv.org/abs/2310.16621
Accession Number:	edsarx.2310.16621
Database:	arXiv

