Dissertation/Thesis

Lithuanian text difficulty characterization with syllables frequencies / Lietuviškų tekstų sudėtingumo analizė.

Bibliographic Details
Title: Lithuanian text difficulty characterization with syllables frequencies / Lietuviškų tekstų sudėtingumo analizė.
Authors: Štulaitė, Laima
Publication Information: Institutional Repository of Vilnius University
Publication Year: 2024
Collection: Vilnius University Virtual Library (VU VL) / Vilniaus universitetas virtuali biblioteka
Subject Terms: Zipf’s law, Yule model, Beta model, Zipf-Mandelbrot, rank-frequency distribution, syllable’s entropy rate, syllable’s conditional entropy, complex text classification, gradient boost classification
Description: The frequency of words in a language is well described by Zipf's law (1949). However, studies at the syllable level are relatively rare in quantitative linguistics, and Zipf's law does not necessarily describe the distribution of syllables. In examining the frequency of syllable occurrence in the Lithuanian language, I found that the ranked frequencies of syllables are best described by the Yule distribution model. The Yule equation fits the distribution of Lithuanian syllable rank frequencies better than the Zipf, Beta, and Zipf-Mandelbrot models. To account for the complexity of the Lithuanian language, I employed Shannon and conditional entropy measures. The Shannon entropy rate averaged 8.91 bits of information per syllable across the Lithuanian text corpus, and the conditional entropy averaged 6.45, conditioned on the preceding syllable. The Shannon entropy rate was used to classify more complex texts, and the gradient boost classification algorithm demonstrated the best accuracy and balance in classifying fractions of syllables from 80 Lithuanian texts into complex and not complex categories.
Document Type: master thesis
File Description: application/pdf
Language: Lithuanian
English
Relation: https://epublications.vu.lt/object/elaba:192057754/192057754.pdf; https://repository.vu.lt/VU:ELABAETD192057754&prefLang=en_US
Availability: https://repository.vu.lt/VU:ELABAETD192057754&prefLang=en_US
Rights: info:eu-repo/semantics/openAccess
Accession Number: edsbas.C42FA545
Database: BASE
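
Note: The description above refers to a Shannon entropy rate per syllable and to a conditional entropy conditioned on the preceding syllable. The following is a minimal sketch of how such quantities are commonly estimated from a syllable sequence. It is not the thesis's implementation. The function names and the toy input are hypothetical, and the input is assumed to be already split into syllables (syllabification itself is not shown).

```python
from collections import Counter
from math import log2


def shannon_entropy(symbols):
    """Entropy, in bits per symbol, of the empirical symbol distribution."""
    counts = Counter(symbols)
    total = sum(counts.values())
    return -sum((c / total) * log2(c / total) for c in counts.values())


def conditional_entropy(symbols):
    """H(current | previous), in bits, estimated from adjacent pairs."""
    pairs = list(zip(symbols, symbols[1:]))
    pair_counts = Counter(pairs)
    prev_counts = Counter(prev for prev, _ in pairs)
    total = len(pairs)
    # Sum over observed pairs of p(prev, cur) * log2 p(cur | prev), negated.
    return -sum(
        (c / total) * log2(c / prev_counts[prev])
        for (prev, _), c in pair_counts.items()
    )


# Hypothetical, hand-split toy sequence (an assumption, not thesis data).
syllables = ["la", "ba", "la", "ba", "die", "na"]
print(shannon_entropy(syllables))
print(conditional_entropy(syllables))
```

On a sequence of this size the estimates are not meaningful. Values comparable to those reported in the description would require a large corpus and the thesis's own preprocessing.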