Dissertation/Thesis
Lithuanian text difficulty characterization with syllables frequencies / Lietuviškų tekstų sudėtingumo analizė
Title: | Lithuanian text difficulty characterization with syllables frequencies / Lietuviškų tekstų sudėtingumo analizė |
---|---|
Authors: | Štulaitė, Laima |
Publication Information: | Institutional Repository of Vilnius University |
Publication Year: | 2024 |
Collection: | Vilnius University Virtual Library (VU VL) / Vilniaus universitetas virtuali biblioteka |
Subject Terms: | Zipf's law, Yule model, Beta model, Zipf-Mandelbrot, rank-frequency distribution, syllable's entropy rate, syllable's conditional entropy, complex text classification, gradient boost classification |
Description: | The frequency of words in a language is well described by Zipf's (1949) law. However, studies at the syllable level are relatively rare in quantitative linguistics, and Zipf's law does not necessarily describe the distribution of syllables. In examining the frequency of syllable occurrence in the Lithuanian language, I found that the ranked frequencies of syllables are best described by the Yule distribution model. The Yule equation fits the distribution of Lithuanian syllable rank frequencies better than the Zipf, Beta, and Zipf-Mandelbrot models. To account for the complexity of the Lithuanian language, I employed Shannon and conditional entropy measures. The Shannon entropy rate averaged 8.91 bits of information per syllable across the Lithuanian text corpus, and the conditional entropy averaged 6.45, conditioned on the preceding syllable. The Shannon entropy rate was used to identify the more complex texts, and the gradient boosting classification algorithm demonstrated the best accuracy and balance in classifying fractions of syllables from 80 Lithuanian texts into complex and not complex categories. |
Document Type: | master thesis |
File Description: | application/pdf |
Language: | Lithuanian; English |
Relation: | https://epublications.vu.lt/object/elaba:192057754/192057754.pdf; https://repository.vu.lt/VU:ELABAETD192057754&prefLang=en_US |
Availability: | https://repository.vu.lt/VU:ELABAETD192057754&prefLang=en_US |
Rights: | info:eu-repo/semantics/openAccess |
Accession Number: | edsbas.C42FA545 |
Database: | BASE |
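
The two entropy measures named in the description can be illustrated with a minimal sketch. It assumes the text has already been split into a list of syllables and uses plain plug-in (relative-frequency) estimates; the function names and the toy input are hypothetical, and the thesis may use a different estimator or syllabification.

```python
import math
from collections import Counter


def shannon_entropy(syllables):
    """Plug-in estimate of H(X) in bits per syllable from unigram counts."""
    counts = Counter(syllables)
    total = sum(counts.values())
    return sum((c / total) * math.log2(total / c) for c in counts.values())


def conditional_entropy(syllables):
    """Plug-in estimate of H(X_n | X_{n-1}) in bits, from adjacent pairs."""
    pairs = Counter(zip(syllables, syllables[1:]))  # (previous, current) counts
    prev = Counter(syllables[:-1])                  # counts of the conditioning syllable
    total = sum(pairs.values())
    # Sum over pairs of p(a, b) * log2(1 / p(b | a)), with p(b | a) = c / prev[a].
    return sum((c / total) * math.log2(prev[a] / c) for (a, _), c in pairs.items())


# Hypothetical toy sequence; real input would be a syllabified Lithuanian text.
toy = ["la", "ba", "la", "ba", "die", "na"]
print(shannon_entropy(toy), conditional_entropy(toy))
```

Conditioning on the preceding syllable typically lowers the estimate because part of the uncertainty about a syllable is explained by its predecessor, which matches the direction of the two corpus averages reported in the description.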