Bibliographic Details
Title:
New benchmark dataset and fine-grained cross-modal fusion framework for Vietnamese multimodal aspect-category sentiment analysis
Authors:
Nguyen, Quy Hoang (20521815@gm.uit.edu.vn); Nguyen, Minh-Van Truong (20522146@gm.uit.edu.vn); Van Nguyen, Kiet (kietnv@uit.edu.vn)
Source:
Multimedia Systems, Feb. 2025, Vol. 31, Issue 1, pp. 1–28.
Abstract:
The emergence of multimodal data on social media platforms presents new opportunities to better understand user sentiments toward a given aspect. However, existing multimodal datasets for aspect-category sentiment analysis (ACSA) often focus on textual annotations, neglecting fine-grained information in images. Consequently, these datasets fail to fully exploit the richness inherent in multimodal data. To address this, we introduce a new Vietnamese multimodal dataset, named ViMACSA, which consists of 4,876 text-image pairs with 14,618 fine-grained annotations for both text and image in the hotel domain. Additionally, we propose a fine-grained cross-modal fusion framework (FCMF) that effectively learns both intra- and inter-modality interactions and then fuses this information to produce a unified multimodal representation. Experimental results show that our framework outperforms SOTA models on the ViMACSA dataset, achieving the highest F1 score of 79.73%. We also explore characteristics and challenges in Vietnamese multimodal sentiment analysis, including misspellings, abbreviations, and the complexities of the Vietnamese language. This work contributes both a benchmark dataset and a new framework that leverages fine-grained multimodal information to improve multimodal aspect-category sentiment analysis. Our dataset is available for research purposes. [ABSTRACT FROM AUTHOR]
Database:
Academic Search Index |
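The abstract describes a fusion framework that models intra-modality interactions (within text, within image) and inter-modality interactions (across the two), then combines them into a unified representation. The paper's actual FCMF architecture is not given here; the following is a minimal, hypothetical sketch of that general pattern using scaled dot-product self-attention and cross-attention over toy feature matrices (all names and dimensions are illustrative assumptions, not the authors' implementation).

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    # Scaled dot-product attention: each query row attends over all keys.
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores, axis=-1) @ v

def cross_modal_fuse(text, image):
    """Hypothetical fusion sketch (NOT the paper's FCMF):
    1) self-attention per modality -> intra-modality interactions,
    2) cross-attention between modalities -> inter-modality interactions,
    3) mean-pool and concatenate -> unified multimodal representation."""
    t = attention(text, text, text)      # text tokens attend to text tokens
    v = attention(image, image, image)   # image regions attend to image regions
    t2v = attention(t, v, v)             # text queries image regions
    v2t = attention(v, t, t)             # image queries text tokens
    return np.concatenate([t2v.mean(axis=0), v2t.mean(axis=0)])

rng = np.random.default_rng(0)
text_feats = rng.standard_normal((10, 64))   # 10 token embeddings, dim 64
image_feats = rng.standard_normal((5, 64))   # 5 region embeddings, dim 64
fused = cross_modal_fuse(text_feats, image_feats)
print(fused.shape)  # (128,)
```

In a real system the pooled vector would feed an aspect-category sentiment classifier; this sketch only illustrates how intra- and inter-modality attention outputs can be combined into one fixed-size representation.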