Lithuanian morphologically annotated corpus - MATAS v1.0

Bibliographic Details
Title: Lithuanian morphologically annotated corpus - MATAS v1.0
Authors: Rimkutė, Erika; Bielinskienė, Agnė; Dadurkevičius, Virginijus; Kovalevskaitė, Jolanta; Utka, Andrius; Boizou, Loïc
Publisher: Vytautas Magnus University
Publication Year: 2019
Subject Terms: morphologically annotated, POS tagged, Lithuanian, lang, geo
Description: MATAS corpus (version 1.0)
DESCRIPTION
Manually checked, morphologically annotated corpus MATAS.
FORMATS
1. CoNLL-U (CONLLU, conllu)
2. SketchEngine: tab-delimited, one word per line (TAB-WPL, txt)
SIZE
Wordform count: 1,693,410
Sentence count: 144,047
GENRES
Contains 5 genres: Documents (14%), Fiction (19%), Periodicals (36%), Scientific texts (24%), Transcripts (7%).
TAGSETS
The morphological annotation is provided in 3 different tagsets:
- Universal Dependencies (POS in column 4, morphological features in column 6), see universaldependencies.org;
- Jablonskis (column 5), see the Documentation folder;
- MULTEXT-East (column 10), see the Documentation folder.
JABLONSKIS AND MULTEXT-EAST TAGSETS
- Jablonskis: Lithuanian tagset, human-readable.
- MULTEXT-East: English tagset, machine-readable.
Please use the following text to cite this item: Rimkutė E., Daudaravičius V., Utka A. 2007: Morphological Annotation of the Lithuanian Corpus. Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics; Workshop Balto-Slavonic Natural Language Processing 2007, Prague, 94–99.
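Because the CoNLL-U release spreads the three tagsets over different columns, a short reading sketch may help. It is a minimal Python example that assumes the standard 10-column, tab-separated CoNLL-U layout and the column mapping given in the description (UD POS in column 4, Jablonskis in column 5, UD features in column 6, MULTEXT-East in column 10). The names `read_conllu` and `Token` and the sample sentence are hypothetical, and the `XPOS-n` / `MTE-n` values are placeholders, not real MATAS tags; the exact content of column 10 should be checked against the Documentation folder.

```python
from typing import Iterable, Iterator, List, NamedTuple


class Token(NamedTuple):
    form: str
    upos: str        # column 4: Universal Dependencies POS
    jablonskis: str  # column 5: Jablonskis tag (as stated in the description)
    feats: str       # column 6: UD morphological features
    multext: str     # column 10: MULTEXT-East tag (assumed to be the whole field)


def read_conllu(lines: Iterable[str]) -> Iterator[List[Token]]:
    """Yield one list of Tokens per sentence from CoNLL-U lines."""
    sentence: List[Token] = []
    for raw in lines:
        line = raw.rstrip("\n")
        if not line:                  # blank line ends a sentence
            if sentence:
                yield sentence
                sentence = []
            continue
        if line.startswith("#"):      # sentence-level comments (sent_id, text, ...)
            continue
        cols = line.split("\t")
        if len(cols) != 10:
            raise ValueError(f"expected 10 tab-separated columns, got {len(cols)}")
        if "-" in cols[0] or "." in cols[0]:
            continue                  # skip multiword-token ranges and empty nodes
        sentence.append(Token(cols[1], cols[3], cols[4], cols[5], cols[9]))
    if sentence:                      # file may not end with a blank line
        yield sentence


# Hypothetical sample; XPOS-n and MTE-n are placeholders, not MATAS tags.
SAMPLE = """# sent_id = example-1
1\tJis\tjis\tPRON\tXPOS-1\tCase=Nom|Gender=Masc|Number=Sing|Person=3|PronType=Prs\t_\t_\t_\tMTE-1
2\tskaito\tskaityti\tVERB\tXPOS-2\tMood=Ind|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin\t_\t_\t_\tMTE-2
3\t.\t.\tPUNCT\tXPOS-3\t_\t_\t_\t_\tMTE-3

"""

if __name__ == "__main__":
    for sent in read_conllu(SAMPLE.splitlines()):
        for tok in sent:
            print(tok.form, tok.upos, tok.jablonskis, tok.feats, tok.multext)
```

For a real file, `read_conllu(open(path, encoding="utf-8"))` streams sentences one at a time, which keeps memory use flat even for a corpus of about 1.7 million wordforms.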
Document Type: dataset
Language: unknown
Relation: http://hdl.handle.net/20.500.11821/33
Availability: https://hdl.handle.net/20.500.11821/33
Rights: lic_clarin-pub
Accession Number: edsbas.58DB8047
Database: BASE