Electronic Resource
Impact of Data Augmentation on Hate Speech Detection in Roman Urdu
Title: | Impact of Data Augmentation on Hate Speech Detection in Roman Urdu |
---|---|
Authors: | Atzori, M, Ciaccia, P, Ceci, M, Mandreoli, F, Malerba, D, Sanguinetti, M, Pellicani, A, Motta, F, Maqbool, F, Spahiu, B, Maurino, A, Fariha Maqbool, Blerina Spahiu, Andrea Maurino |
Publication Information: | CEUR-WS 2024 |
Document Type: | Electronic Resource |
Abstract: | The prevalence of hate speech leads to an increase in hate crimes, online violence, and serious harm to social safety, physical security, and cyberspace. To address this issue, several studies have been conducted on hate speech detection in European languages, whereas little attention has been paid to low-resource South Asian languages, leaving social media unsafe for millions of users. Because datasets and samples are scarce, strategies are needed to increase the number of data samples. In this paper, we improved the performance of the already fine-tuned m-Bert model by applying data augmentation techniques to one of the datasets of hate speech in tweets in the Roman Urdu language. F1-score and accuracy metrics were used to compare the results. We also experimented to determine the optimal percentage of augmented data to include and the percentage of words augmented in each data instance. The new RUHSOLD++ dataset containing the augmented data has also been published. The improvement in the model's hate speech detection shows that model performance can be improved by applying data augmentation techniques to a dataset with a limited number of instances. |
Index Terms: | data augmentation, under resourced languages, large language models, info:eu-repo/semantics/conferenceObject |
URL: | ispartofbook:Proceedings of the 32nd Symposium on Advanced Database Systems 32nd Italian Symposium on Advanced Database Systems, SEBD 2024 - 23 June 2024 through 26 June 2024 volume:3741 firstpage:321 lastpage:330 numberofpages:10 serie:CEUR WORKSHOP PROCEEDINGS alleditors:Atzori, M; Ciaccia, P; Ceci, M; Mandreoli, F; Malerba, D; Sanguinetti, M; Pellicani, A; Motta, F |
Availability: | Open access content. info:eu-repo/semantics/openAccess |
Note: | English |
Other Numbers: | ITBAO oai:boa.unimib.it:10281/490399 info:eu-repo/semantics/altIdentifier/scopus/2-s2.0-85202057651 1446972448 |
Contributing Source: | BICOCCA OPEN ARCH From OAIster®, provided by the OCLC Cooperative. |
Accession Number: | edsoai.on1446972448 |
Database: | OAIster |