Dissertation/Thesis

Detection of the location of talkers via video and audio bimodal processing

Bibliographic Details
Title: Detection of the location of talkers via video and audio bimodal processing
Alternate Title: 利用視訊與聲訊雙重處理進行說話者位置偵測
Authors: Kung, Fan-Jie, 孔繁傑
Thesis Advisors: Liu, Yi-Wen, 劉奕汶
Publication Year: 2013
Collection: National Digital Library of Theses and Dissertations in Taiwan
Description: 101
Source detection that combines audio and video methods has recently attracted much research. In noisy and reverberant environments, the combined audio-video method reduces the bias of source detection more effectively than the audio method alone. In this thesis, we design a talker detection system that uses two microphones and a web camera. For audio, we use the definition of a hyperbolic surface to estimate the direction of sound sources relative to the microphones. For video, we use the Viola-Jones algorithm to detect faces. We then use the Turk-Pentland algorithm to compute eigenfaces by principal component analysis, and later use the eigenfaces to recognize the face. The location of a talking person is determined in two steps. First, we estimate the normal distance between the talker and the imaging plane of the camera from the size of the talker’s face in the image. Then, we obtain an estimate of the talker's two-dimensional location by considering the angle of the talker relative to the camera (or the center of the two microphones). Because video and audio information are used jointly, the system can identify the talker, and the availability of audio information makes face detection robust against rotations. In addition, when there are multiple talkers in the room, the number of sound sources can be estimated under the assumption that the sources are uncorrelated; this can be achieved either by counting the number of faces in the video or by calculating the cross-correlation function between the signals obtained by the two microphones. Experiments showed that the bias in estimating the location of a single talker is less than 5 cm. Experiments on double-talker estimation were also conducted and demonstrated that, in principle, two microphones alone suffice to detect two sources as long as the sources are uncorrelated.
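The two-step localization described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis implementation. It assumes the time difference of arrival is taken from the peak of the cross-correlation between the two microphone signals, and it replaces the hyperbolic-surface formulation with the simpler far-field approximation (the asymptote of the hyperbola). The distance step assumes a pinhole camera model. All function names, the speed of sound, and the default face width are hypothetical values chosen for illustration.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, assumed value for room-temperature air


def estimate_tdoa(sig_a, sig_b, fs):
    """Delay (seconds) of sig_b relative to sig_a, from the cross-correlation peak."""
    corr = np.correlate(sig_b, sig_a, mode="full")
    # Index len(sig_a) - 1 of the full correlation corresponds to zero lag.
    lag = int(np.argmax(corr)) - (len(sig_a) - 1)
    return lag / fs


def direction_angle(tdoa_s, mic_spacing_m):
    """Far-field angle (radians) from broadside; an assumed simplification."""
    ratio = SPEED_OF_SOUND * tdoa_s / mic_spacing_m
    return float(np.arcsin(np.clip(ratio, -1.0, 1.0)))


def face_distance(face_width_px, focal_length_px, real_face_width_m=0.16):
    """Normal distance (m) to the image plane under an assumed pinhole model."""
    return focal_length_px * real_face_width_m / face_width_px


def talker_location(distance_m, angle_rad):
    """2D position (lateral offset, normal distance) in metres."""
    return (distance_m * float(np.tan(angle_rad)), distance_m)


# Synthetic check: the second signal is the first one delayed by 5 samples.
rng = np.random.default_rng(0)
fs = 16000
sig_a = rng.standard_normal(1024)
sig_b = np.concatenate([np.zeros(5), sig_a[:-5]])

tdoa = estimate_tdoa(sig_a, sig_b, fs)
angle = direction_angle(tdoa, mic_spacing_m=0.2)
location = talker_location(face_distance(160, 800), angle)
```

In this sketch the audio supplies only the angle and the video supplies only the distance, which mirrors the division of labor the abstract describes. A real system would need camera calibration to obtain the focal length in pixels and a measured microphone spacing.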
Original Identifier: 101NTHU5442086
Document Type: 學位論文 ; thesis
File Description: 70
Availability: http://ndltd.ncl.edu.tw/handle/86177478829798856851
Accession Number: edsndl.TW.101NTHU5442086
Database: Networked Digital Library of Theses & Dissertations