Dissertation/Thesis
Detection of the location of talkers via video and audio bimodal processing
Title: | Detection of the location of talkers via video and audio bimodal processing |
---|---|
Alternate Title: | 利用視訊與聲訊雙重處理進行說話者位置偵測 |
Authors: | Kung, Fan-Jie, 孔繁傑 |
Thesis Advisors: | Liu, Yi-Wen, 劉奕汶 |
Publication Year: | 2013 |
Collection: | National Digital Library of Theses and Dissertations in Taiwan |
Description: | 101 Source detection that combines audio and video methods has attracted much research recently. In noisy and reverberant environments, the audio-video approach reduces the bias of source detection more effectively than the audio method alone. In this thesis, we design a talker detection system that uses two microphones and a web camera. For audio, we use the definition of a hyperbolic surface to estimate the direction of sound sources relative to the microphones. For video, we use the Viola-Jones algorithm to detect faces. We then use the Turk-Pentland algorithm to find the eigenfaces by principal component analysis and use the eigenfaces to recognize the face. The location of a talking person is determined in two steps. First, we estimate the normal distance between the talker and the imaging plane of the camera from the size of the talker's face in the image. Then, we obtain an estimate of the talker's two-dimensional location by considering the angle of the talker relative to the camera (or the center of the two microphones). Because video and audio information are used jointly, the system can identify the talker, and the audio information makes face detection robust against rotations. In addition, when there are multiple talkers in the room, the number of sound sources can be estimated under the assumption that the sources are uncorrelated, either by counting the number of faces in the video or by calculating the cross-correlation function between the signals from the two microphones. Experiments showed that the bias in estimating the location of a single talker is less than 5 cm. Experiments on two-talker estimation were also conducted and demonstrated that, in principle, two microphones alone suffice to detect two sources as long as the sources are uncorrelated. |
Original Identifier: | 101NTHU5442086 |
Document Type: | 學位論文 ; thesis |
File Description: | 70 |
Availability: | http://ndltd.ncl.edu.tw/handle/86177478829798856851 |
Accession Number: | edsndl.TW.101NTHU5442086 |
Database: | Networked Digital Library of Theses & Dissertations |
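
The two-step localization outlined in the description can be illustrated with a minimal sketch. This is not the thesis's implementation: it assumes a far-field (plane-wave) approximation in place of the hyperbolic-surface formulation, a pinhole camera model for the face-size distance estimate, and a speed of sound of 343 m/s. All function names and parameters (`estimate_delay`, `direction_angle`, `face_distance`, `talker_xy`, `focal_px`, `face_width_m`) are hypothetical.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s; assumed room-temperature value


def estimate_delay(x_left, x_right, fs):
    """Delay in seconds of x_right relative to x_left (cross-correlation peak)."""
    corr = np.correlate(x_right, x_left, mode="full")
    # Index 0 of the "full" output corresponds to lag -(len(x_left) - 1).
    lag = int(np.argmax(corr)) - (len(x_left) - 1)
    return lag / fs


def direction_angle(delay_s, mic_spacing_m):
    """Far-field angle (radians) from the broadside direction of the mic pair."""
    ratio = SPEED_OF_SOUND * delay_s / mic_spacing_m
    return float(np.arcsin(np.clip(ratio, -1.0, 1.0)))


def face_distance(focal_px, face_width_m, face_width_px):
    """Normal distance to the imaging plane under a pinhole camera model."""
    return focal_px * face_width_m / face_width_px


def talker_xy(normal_distance_m, angle_rad):
    """2-D location (lateral offset, normal distance) relative to the camera."""
    return normal_distance_m * np.tan(angle_rad), normal_distance_m


if __name__ == "__main__":
    fs = 16000
    rng = np.random.default_rng(0)
    source = rng.standard_normal(1024)
    left = source
    right = np.concatenate([np.zeros(5), source[:-5]])  # 5-sample delay

    tau = estimate_delay(left, right, fs)
    theta = direction_angle(tau, mic_spacing_m=0.2)
    z = face_distance(focal_px=800.0, face_width_m=0.16, face_width_px=64.0)
    print(talker_xy(z, theta))
```

The sign convention here is that a positive delay means the sound reached the left microphone first. In practice the focal length in pixels and the assumed physical face width would need calibration, and a reverberant room may call for a more robust correlation weighting than the plain cross-correlation used in this sketch.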