DIUSum: Dynamic Image Utilization for Multimodal Summarization

Bibliographic details
Title:	DIUSum: Dynamic Image Utilization for Multimodal Summarization
Authors:	Xiao, Min, Zhu, Junnan, Zhai, Feifei, Zhou, Yu, Zong, Chengqing
Source:	Proceedings of the AAAI Conference on Artificial Intelligence; Vol. 38 No. 17: AAAI-24 Technical Tracks 17; pp. 19297-19305; ISSN 2374-3468; ISSN 2159-5399
Publisher:	Association for the Advancement of Artificial Intelligence
Publication year:	2024
Collection:	Association for the Advancement of Artificial Intelligence: AAAI Publications
Subject terms:	NLP: Summarization, NLP: Language Grounding & Multi-modal NLP
Description:	Existing multimodal summarization approaches focus on fusing image features in the encoding process, ignoring the individualized needs for images when generating different summaries. However, whether intuitively or empirically, not all images can improve summary quality. Therefore, we propose a novel Dynamic Image Utilization framework for multimodal Summarization (DIUSum) to select and utilize valuable images for summarization. First, to predict whether an image helps produce a high-quality summary, we propose an image selector to score the usefulness of each image. Second, to dynamically utilize the multimodal information, we incorporate the hard and soft guidance from the image selector. Under the guidance, the image information is plugged into the decoder to generate a summary. Experimental results have shown that DIUSum outperforms multiple strong baselines and achieves SOTA on two public multimodal summarization datasets. Further analysis demonstrates that the image selector can reflect the improved level of summary quality brought by the images.
Document type:	article in journal/newspaper
File description:	application/pdf
Language:	English
Relation:	https://ojs.aaai.org/index.php/AAAI/article/view/29899/31571; https://ojs.aaai.org/index.php/AAAI/article/view/29899/31572; https://ojs.aaai.org/index.php/AAAI/article/view/29899
DOI:	10.1609/aaai.v38i17.29899
Availability:	https://ojs.aaai.org/index.php/AAAI/article/view/29899; https://doi.org/10.1609/aaai.v38i17.29899
Rights:	Copyright (c) 2024 Association for the Advancement of Artificial Intelligence
Accession number:	edsbas.9F3C117A
Database:	BASE
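The description mentions "hard and soft guidance" from the image selector but gives no formulas. As a rough, hypothetical illustration only (not the DIUSum implementation), the sketch below assumes the selector emits one raw logit per image, reads hard guidance as a keep/drop decision at an assumed threshold of 0.5, and reads soft guidance as scaling each kept image's features by its sigmoid usefulness score. The names `sigmoid` and `guide_image_features` and the `threshold` parameter are invented for this example.

```python
import math
from typing import List, Tuple


def sigmoid(x: float) -> float:
    """Map a raw selector logit to a usefulness probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def guide_image_features(
    image_feats: List[List[float]],
    selector_logits: List[float],
    threshold: float = 0.5,  # hypothetical cut-off, not taken from the paper
) -> Tuple[List[List[float]], List[int]]:
    """Hypothetical hard + soft gating of per-image feature vectors.

    Returns the gated feature vectors and the hard keep/drop mask.
    """
    gated: List[List[float]] = []
    mask: List[int] = []
    for feat, logit in zip(image_feats, selector_logits):
        score = sigmoid(logit)                  # soft signal: usefulness score
        keep = 1 if score >= threshold else 0   # hard signal: include or drop image
        mask.append(keep)
        # Dropped images become all-zero vectors; kept ones are scaled by score.
        gated.append([keep * score * v for v in feat])
    return gated, mask


if __name__ == "__main__":
    feats = [[1.0, 2.0], [3.0, 4.0]]
    gated, mask = guide_image_features(feats, [2.0, -2.0])
    print(mask)   # [1, 0]
    print(gated)
```

With logits [2.0, -2.0], the first image is kept and scaled by about 0.88, while the second is dropped and its features become zeros. How DIUSum actually feeds the selector's guidance into the decoder is described in the paper itself.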