MAGPIE: A Large Corpus of Potentially Idiomatic Expressions

Bibliographic Details
Title: MAGPIE: A Large Corpus of Potentially Idiomatic Expressions
Authors: Haagsma, Hessel; Bos, Johan; Nissim, Malvina; Calzolari, Nicoletta; Bechet, Frederic; Blache, Philippe; Choukri, Khalid; Cieri, Christopher; Declerck, Thierry; Goggi, Sara; Isahara, Hitoshi; Maegaard, Bente; Mariani, Joseph; Mazo, Helene; Moreno, Asuncion; Odijk, Jan; Piperidis, Stelios
Source: Proceedings of The 12th Language Resources and Evaluation Conference: LREC 2020, 279-287
Publication Info: European Language Resources Association (ELRA), 2020.
Publication Year: 2020
Description: Given the limited size of existing idiom corpora, we aim to enable progress in automatic idiom processing and linguistic analysis by creating the largest-to-date corpus of idioms for English. Using a fixed idiom list, automatic pre-extraction, and a strictly controlled crowdsourced annotation procedure, we show that it is feasible to build a high-quality corpus comprising more than 50K instances, an order of magnitude larger than previous resources. Crucial ingredients of crowdsourcing were the selection of crowdworkers, clear and comprehensive instructions, and an interface that breaks down the task into small, manageable steps. Analysis of the resulting corpus revealed strong effects of genre on idiom distribution, providing new evidence for existing theories on what influences idiom usage. The corpus also contains rich metadata and is made publicly available.
File Description: application/pdf
Language: English
Access URL: https://explore.openaire.eu/search/publication?articleId=narcis______::b1ff2e3ee2c4d3c8298407a4f4adafb8
https://research.rug.nl/en/publications/98738fc4-9838-42e7-b951-4d96414e657a
Rights: OPEN
Accession Number: edsair.narcis........b1ff2e3ee2c4d3c8298407a4f4adafb8
Database: OpenAIRE