
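Author: 楊善翔 (Shan-hsiang Yang)
Thesis title: Three-dimensional Sound Source Localization (聲源三維方位偵測之研究)
Advisor: 古鴻炎 (Hung-yan Gu)
Committee members: 鍾國亮 (Kuo-Liang Chung), 范欽雄 (Chin-Shyurng Fahn), 王新民 (Hsin-min Wang), 陳柏琳 (Ber-lin Chen)
Degree: Master
Department: College of Electrical Engineering and Computer Science - Department of Computer Science and Information Engineering
Year of publication: 2009
Graduation academic year: 97 (ROC calendar; 2008-2009)
Language: Chinese
Pages: 78
Keywords (Chinese): 聲源方位偵測
Keywords (English): sound source localization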

This thesis studies and implements a system that detects the direction of a sound source in three-dimensional space. On the hardware side, the input device is an equilateral-triangle array of only three microphones. On the software side, the system performs VAD (Voice Activity Detection), TDOA (Time Delay of Arrival) estimation, and direction detection, in that order. For VAD, we propose a method that combines spectral entropy with SNR verification to distinguish speech frames from non-speech frames. For TDOA estimation, we compute the generalized cross-correlation function with an approximation algorithm, propose a synchronous phase replication method to address unstable phase, and further study parabolic interpolation to improve the accuracy of the estimated TDOA values. For direction detection, we compute the distances between the vector of estimated TDOA values and the vectors of theoretical TDOA values to find the direction of the sound source, and we improve the accuracy of the result by interpolation. We further propose a weighted voting mechanism that determines the final direction angle from the angles obtained in several speech frames. On-line experiments show that the system requires little computation and can run in real time. The average error is 3.43 degrees in azimuth and 2.08 degrees in elevation, so the overall performance of the system is fairly good.
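The TDOA and direction-detection steps described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the PHAT weighting, the far-field (plane wave) model, the speed of sound, the 1-degree search grid, and all function names (`gcc_phat_tdoa`, `triangle_array`, `theoretical_tdoas`, `locate`) are assumptions made for illustration, and the thesis's approximation algorithm, synchronous phase replication, and voting mechanism are not reproduced here.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value
PAIRS = ((0, 1), (0, 2), (1, 2))  # microphone pairs used for the TDOA vector


def gcc_phat_tdoa(x, y, fs, max_lag):
    """Delay of y relative to x in seconds (positive when y lags x); max_lag >= 1."""
    n = len(x) + len(y)  # zero-pad so the correlation is linear, not circular
    cross = np.fft.rfft(y, n) * np.conj(np.fft.rfft(x, n))
    cross /= np.abs(cross) + 1e-12  # PHAT weighting (assumed): keep phase only
    cc = np.fft.irfft(cross, n)
    # Reorder so that index k corresponds to lag k - max_lag.
    cc = np.concatenate((cc[-max_lag:], cc[:max_lag + 1]))
    k = int(np.argmax(cc))
    offset = 0.0
    if 0 < k < len(cc) - 1:
        a, b, c = cc[k - 1], cc[k], cc[k + 1]
        denom = a - 2.0 * b + c
        if denom != 0.0:
            # Vertex of the parabola through the peak and its two neighbours.
            offset = 0.5 * (a - c) / denom
    return (k - max_lag + offset) / fs


def triangle_array(side):
    """3x3 array of microphone positions: an equilateral triangle in the xy-plane."""
    r = side / np.sqrt(3.0)  # circumradius of the triangle
    ang = np.deg2rad([90.0, 210.0, 330.0])
    return np.stack((r * np.cos(ang), r * np.sin(ang), np.zeros(3)), axis=1)


def theoretical_tdoas(az_deg, el_deg, mics):
    """Far-field TDOA vector(s) for the three pairs; result has shape (..., 3)."""
    az, el = np.deg2rad(az_deg), np.deg2rad(el_deg)
    u = np.stack((np.cos(el) * np.cos(az),
                  np.cos(el) * np.sin(az),
                  np.sin(el)), axis=-1)  # unit vector pointing at the source
    # Delay of mic j relative to mic i is (p_i - p_j) . u / c.
    delays = [u @ (mics[i] - mics[j]) for i, j in PAIRS]
    return np.stack(delays, axis=-1) / SPEED_OF_SOUND


def locate(tdoa_vec, mics, step=1.0):
    """(azimuth, elevation) in degrees of the grid direction nearest to tdoa_vec."""
    az = np.arange(0.0, 360.0, step)
    el = np.arange(0.0, 90.0 + step, step)  # upper half-space only
    az_grid, el_grid = np.meshgrid(az, el, indexing="ij")
    table = theoretical_tdoas(az_grid, el_grid, mics)
    dist = np.linalg.norm(table - np.asarray(tdoa_vec), axis=-1)
    i, j = np.unravel_index(np.argmin(dist), dist.shape)
    return float(az[i]), float(el[j])
```

In use, `gcc_phat_tdoa` would be called once per microphone pair in `PAIRS` on a speech frame, and the three delays passed to `locate` as one vector. The search is restricted to elevations from 0 to 90 degrees because a planar three-microphone array cannot tell a source above the plane from its mirror image below it.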

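Abstract (Chinese)
Abstract (English)
Acknowledgements
Table of Contents
List of Figures and Tables
Chapter 1 Introduction
  1.1 Motivation and Objectives
  1.2 Related Work
  1.3 Research Method
  1.4 Thesis Organization
Chapter 2 Voice Activity Detection
  2.1 Spectral Entropy Measurement
  2.2 SNR Measurement
  2.3 VAD Experiments
    2.3.1 Removing Fluctuations in the Entropy Curve
    2.3.2 SNR Verification
Chapter 3 Time Delay Estimation and Direction Detection
  3.1 Time Delay Estimation
    3.1.1 Cross-Correlation Function
    3.1.2 Generalized Cross-Correlation Function
    3.1.3 Filter Frequency Response Functions
    3.1.4 Improvements
  3.2 Direction Detection
    3.2.1 Previously Proposed Methods
    3.2.2 Method Used in This Thesis
    3.2.3 Weighted Voting Mechanism
Chapter 4 System Implementation
  4.1 Hardware
    4.1.1 Microphones
    4.1.2 Amplifier and Filter Circuits
    4.1.3 Data Acquisition Device (DAQ)
  4.2 Software
    4.2.1 Voice Activity Detection
    4.2.2 Sound Source Direction Detection
    4.2.3 Software Interface
Chapter 5 Experiments
  5.1 Off-line Tests
    5.1.1 Best Option Combination
    5.1.2 Second-Best Option Combination
    5.1.3 Additional Options
  5.2 On-line Tests
  5.3 Performance Comparison
Chapter 6 Conclusion
References
About the Author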

