Electronic Resource
Multimodal information extraction from videos: Automatic creation of highlight clips from political speeches
Title: | Multimodal information extraction from videos: Automatic creation of highlight clips from political speeches |
---|---|
Authors: | Strafforello, Ombretta (author) |
Publication Information: | 2019-10-03 |
Document Type: | Electronic Resource |
Abstract: | With the huge amount of data that is collected every day and shared on the internet, many recent studies have focused on methods to make multimedia browsing simple and efficient, investigating techniques for automatic multimedia analysis. This work specifically delves into the case of information extraction from videos, which is still an open challenge due to the combination of their semantic complexity and dynamic nature. The majority of the existing solutions are tailored to specific video categories and result in the creation of key-frame time-lapses, video summaries, video overviews or highlight clips. In particular, this thesis project focuses on the case of highlight extraction from videos where one person speaks facing the camera. Automating the analysis of this specific kind of video is important in the industrial context because it can be harnessed for several interesting applications, such as the automatic video summarisation of interviews or the automatic creation of personal video curricula vitae. In this setting, the research objective is to investigate how Machine Learning can be deployed for the task of information extraction. From the target videos, multiple types of features can be extracted, such as textual features from the speech transcription; visual features from the facial expressions, head pose, eye gaze and hand gestures; and audio features from the variations in the tone of the voice. The exploitation of multimodal features enhances the capacity of Machine Learning algorithms. In fact, as proven in former research, the integration of multiple channels of information (textual, audio, visual) makes it possible to derive more precise and more extensive knowledge, just as humans exploit their multiple senses, in addition to experience, to make classifications or predictions. In this work, two approaches for multimodal information extraction from videos are investigated. 
The first approach is based on simple mul Computer Science |
Index Terms: | Machine Learning, Multimodal Machine Learning, Video analysis, Highlights extraction, Crowdsourcing, master thesis |
URL: | |
Availability: | Open access content. © 2019 Ombretta Strafforello |
Note: | English |
Other Numbers: | NLTUD oai:tudelft.nl:uuid:a6f1d8c3-9915-4a31-aca7-953965f4454e 1200034558 |
Contributing Source: | DELFT UNIV OF TECHNOL From OAIster®, provided by the OCLC Cooperative. |
Accession Number: | edsoai.on1200034558 |
Database: | OAIster |