Investigating Code-Mixed Modern Standard Arabic-Egyptian to English Machine Translation

Bibliographic Details
Title: Investigating Code-Mixed Modern Standard Arabic-Egyptian to English Machine Translation
Authors: El Moatez Billah Nagoudi, AbdelRahim A. Elmadany, Muhammad Abdul-Mageed
Publication Details: arXiv, 2021.
Publication Year: 2021
Subject Terms: FOS: Computer and information sciences, Machine Learning (cs.LG), Computation and Language (cs.CL), Machine translation, Natural language processing, Language model, Modern Standard Arabic, Egyptian Arabic, Artificial intelligence, Computer science
Description: Recent progress in neural machine translation (NMT) has made it possible to translate successfully between monolingual language pairs where large parallel data exist, with pre-trained models improving performance even further. Although there exists work on translating in code-mixed settings (where one of the pairs includes text from two or more languages), it is still unclear what recent success in NMT and language modeling exactly means for translating code-mixed text. We investigate one such context, namely MT from code-mixed Modern Standard Arabic and Egyptian Arabic (MSAEA) into English. We develop models under different conditions, employing both (i) standard end-to-end sequence-to-sequence (S2S) Transformers trained from scratch and (ii) pre-trained S2S language models (LMs). We are able to acquire reasonable performance using only MSA-EN parallel data with S2S models trained from scratch. We also find LMs fine-tuned on data from various Arabic dialects to help the MSAEA-EN task. Our work is in the context of the Shared Task on Machine Translation in Code-Switching. Our best model achieves 25.72 BLEU, placing us first on the official shared task evaluation for MSAEA-EN.
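The headline result above is reported in BLEU. As a reference point for how that score is computed, here is a minimal single-reference, sentence-level BLEU sketch in pure Python: uniform weights over 1-4-gram precisions, a brevity penalty, and a simple floor on zero precisions in place of standard smoothing. This is an illustrative approximation of the metric, not the shared task's official scorer.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """All contiguous n-grams of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def bleu(candidate, reference, max_n=4):
    """Single-reference sentence BLEU with uniform n-gram weights
    and brevity penalty (after Papineni et al., 2002)."""
    cand, ref = candidate.split(), reference.split()
    precisions = []
    for n in range(1, max_n + 1):
        cand_counts = Counter(ngrams(cand, n))
        ref_counts = Counter(ngrams(ref, n))
        # clipped n-gram matches against the reference
        overlap = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        total = max(1, len(cand) - n + 1)
        # floor zero precisions so the log-average stays finite
        precisions.append(max(overlap, 1e-9) / total)
    log_avg = sum(math.log(p) for p in precisions) / max_n
    # brevity penalty: punish candidates shorter than the reference
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / max(1, len(cand)))
    return bp * math.exp(log_avg)
```

A perfect match scores 1.0 (i.e., 100 on the conventional 0-100 scale used in the paper), while a candidate sharing no n-grams with the reference scores near zero.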
Comment: CALCS2021, colocated with NAACL-2021
DOI: 10.48550/arxiv.2105.13573
Access URL: https://explore.openaire.eu/search/publication?articleId=doi_dedup___::6b82bbf9b0fafddc958748b30241350f
Rights: OPEN
Accession Number: edsair.doi.dedup.....6b82bbf9b0fafddc958748b30241350f
Database: OpenAIRE