Bibliographic details
Title: |
Lithuanian morphologically annotated corpus - MATAS v1.0 |
Authors: |
Rimkutė, Erika, Bielinskienė, Agnė, Dadurkevičius, Virginijus, Kovalevskaitė, Jolanta, Utka, Andrius, Boizou, Loïc |
Publication info: |
Vytautas Magnus University |
Publication year: |
2019 |
Subject terms: |
morphologically annotated, POS tagged, Lithuanian, lang, geo |
Description: |
MATAS corpus (version 1.0). DESCRIPTION: Manually checked, morphologically annotated corpus MATAS. FORMATS: 1. CoNLL-U (CONLLU, conllu); 2. SketchEngine tab-delimited word per line (TAB-WPL, txt). SIZE: Wordform count: 1,693,410; sentence count: 144,047. GENRES: Contains 5 genres: Documents (14%), Fiction (19%), Periodicals (36%), Scientific texts (24%), Transcripts (7%). TAGSETS: Morphological annotation is presented with 3 different tagsets: Universal Dependencies (POS in column 4, morphological categories in column 6), see universaldependencies.org; Jablonskis (column 5), see Documentation folder; Multext-EAST (column 10), see Documentation folder. JABLONSKIS AND MULTEXT-EAST TAGSETS: Jablonskis -> Lithuanian tagset -> human-readable; Multext-East -> English tagset -> machine-readable. Please use the following text to cite this item: Rimkutė E., Daudaravičius V., Utka A. 2007: Morphological Annotation of the Lithuanian Corpus. Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics; Workshop Balto-Slavonic Natural Language Processing 2007, Prague, 94–99. |
Document type: |
dataset |
Language: |
unknown |
Relation: |
http://hdl.handle.net/20.500.11821/33 |
Availability: |
https://hdl.handle.net/20.500.11821/33 |
Rights: |
lic_clarin-pub |
Accession number: |
edsbas.58DB8047 |
Database: |
BASE |