Electronic Resource

Impact of Data Augmentation on Hate Speech Detection in Roman Urdu

Bibliographic Details
Title: Impact of Data Augmentation on Hate Speech Detection in Roman Urdu
Authors: Atzori, M, Ciaccia, P, Ceci, M, Mandreoli, F, Malerba, D, Sanguinetti, M, Pellicani, A, Motta, F, Maqbool, F, Spahiu, B, Maurino, A, Fariha Maqbool, Blerina Spahiu, Andrea Maurino
Publication Information: CEUR-WS 2024
Document Type: Electronic Resource
Abstract: The prevalence of hate speech leads to an increase in hate crimes, online violence, and serious harm to social safety, physical security, and cyberspace. To address this issue, several studies have been conducted on hate speech detection in European languages, whereas little attention has been paid to low-resource South Asian languages, making social media vulnerable for millions of users. Due to the scarcity of the datasets and the samples available, there is a need to apply some strategies to increase the data samples. In this paper, we improved the performance of the already fine-tuned m-Bert model by applying data augmentation techniques to one of the datasets on hate speech on tweets in Roman Urdu language. F1-score and accuracy matrix have been used to compare the results. We also experiment to determine the optimal percentage of augmented data to be included and the percentage of words augmented in each instance of data. The new RUHSOLD++ Dataset containing the augmented data has also been published publicly. The improvement in hate speech detection of the model proved that the performance of the models can be improved by applying data augmentation techniques to the dataset with a limited number of instances.
Index Terms: data augmentation, under resourced languages, large language models, info:eu-repo/semantics/conferenceObject
URL: https://hdl.handle.net/10281/490399
https://ceur-ws.org/Vol-3741/
ispartofbook:Proceedings of the 32nd Symposium on Advanced Database Systems
32nd Italian Symposium on Advanced Database Systems, SEBD 2024 - 23 June 2024 through 26 June 2024
volume:3741
firstpage:321
lastpage:330
numberofpages:10
serie:CEUR WORKSHOP PROCEEDINGS
alleditors:Atzori, M; Ciaccia, P; Ceci, M; Mandreoli, F; Malerba, D; Sanguinetti, M; Pellicani, A; Motta, F
Availability: Open access content.
info:eu-repo/semantics/openAccess
Note: English
Other Numbers: ITBAO oai:boa.unimib.it:10281/490399
info:eu-repo/semantics/altIdentifier/scopus/2-s2.0-85202057651
1446972448
Contributing Source: BICOCCA OPEN ARCH
From OAIster®, provided by the OCLC Cooperative.
Accession Number: edsoai.on1446972448
Database: OAIster