Report
Visual and audio scene classification for detecting discrepancies in video: a baseline method and experimental protocol
Title: | Visual and audio scene classification for detecting discrepancies in video: a baseline method and experimental protocol |
---|---|
Authors: | Apostolidis, Konstantinos; Abesser, Jakob; Cuccovillo, Luca; Mezaris, Vasileios |
Publication Year: | 2024 |
Collection: | Computer Science |
Subject Terms: | Computer Science - Computer Vision and Pattern Recognition, Computer Science - Multimedia, Computer Science - Sound, Electrical Engineering and Systems Science - Audio and Speech Processing |
Description: | This paper presents a baseline approach and an experimental protocol for a specific content verification problem: detecting discrepancies between the audio and video modalities in multimedia content. We first design and optimize an audio-visual scene classifier, to compare with existing classification baselines that use both modalities. Then, by applying this classifier separately to the audio and the visual modality, we can detect scene-class inconsistencies between them. To facilitate further research and provide a common evaluation platform, we introduce an experimental protocol and a benchmark dataset simulating such inconsistencies. Our approach achieves state-of-the-art results in scene classification and promising outcomes in audio-visual discrepancy detection, highlighting its potential in content verification applications. Comment: Accepted for publication, 3rd ACM Int. Workshop on Multimedia AI against Disinformation (MAD'24) at ACM ICMR'24, June 10, 2024, Phuket, Thailand. This is the "accepted version". |
Document Type: | Working Paper |
Access URL: | http://arxiv.org/abs/2405.00384 |
Accession Number: | edsarx.2405.00384 |
Database: | arXiv |
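
The description above says the classifier is applied separately to the audio and the visual modality and that scene-class inconsistencies between the two are then detected. A minimal sketch of that comparison step follows. It is not the authors' implementation: the label set `SCENE_CLASSES`, the function `detect_discrepancy`, and the simple "top-1 labels differ" rule are all assumptions made for illustration, and the per-class scores are taken as given rather than produced by any real model.

```python
from typing import Sequence, Tuple

# Hypothetical scene labels, used only for illustration.
SCENE_CLASSES = ["airport", "park", "metro"]


def argmax(scores: Sequence[float]) -> int:
    """Return the index of the highest score (first one on ties)."""
    return max(range(len(scores)), key=lambda i: scores[i])


def detect_discrepancy(
    visual_scores: Sequence[float],
    audio_scores: Sequence[float],
    classes: Sequence[str] = SCENE_CLASSES,
) -> Tuple[bool, str, str]:
    """Compare the top-1 scene class predicted from each modality.

    Returns (is_discrepant, visual_label, audio_label). The decision rule
    (flag a discrepancy when the two top-1 labels differ) is an assumption
    of this sketch, not a rule stated in the record.
    """
    if len(visual_scores) != len(classes) or len(audio_scores) != len(classes):
        raise ValueError("each score vector needs one entry per scene class")
    visual_label = classes[argmax(visual_scores)]
    audio_label = classes[argmax(audio_scores)]
    return visual_label != audio_label, visual_label, audio_label
```

For example, calling `detect_discrepancy([0.7, 0.2, 0.1], [0.1, 0.1, 0.8])` would flag a mismatch between a visually predicted "airport" and an acoustically predicted "metro". A practical system would likely also consider prediction confidence before flagging, which this sketch leaves out.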