
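Graduate student: 石棟樑 (Dong-Liang Shih)
Thesis title: Bird Call Identification System Based on Various Time-Frequency Feature Extraction (基於多種類時頻特徵擷取之鳥鳴聲辨識系統)
Advisor: 林敬舜 (ChingShun Lin)
Oral defense committee: 陳維美 (Wei-Mei Chen), 林淵翔 (Yuan-Hsiang Lin), 王煥宗 (Huan-Chun Wang)
Degree: Master
Department: College of Electrical Engineering and Computer Science, Department of Electronic and Computer Engineering
Year of publication: 2014
Academic year of graduation: 102 (ROC calendar)
Language: Chinese
Number of pages: 71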
Keywords: Bird call identification, Feature extraction, Linear time-frequency analysis, Bilinear time-frequency analysis, Short-time Fourier transform, Wigner-Ville distribution, Choi-Williams distribution, Born-Jordan distribution, Levin distribution, Hilbert-Huang transform

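Bird sound identification systems have been under development for some time, and most of the algorithms used in this research are adapted from human speaker or speech recognition systems. In other words, the feature extraction methods in most bird call identification systems build in the characteristics of human hearing, with MFCC as the main example. Birds, however, differ physiologically from humans, so a model that includes human-factor parameters may lose features that matter for identification. Choosing a suitable feature extraction method is therefore one of the key research directions for such systems. The bird call identification system proposed in this thesis has three parts: selection of the time-frequency transform, training of reference templates, and comparison of bird sounds. The choice of time-frequency transform strongly affects the result of feature extraction. The system converts the bird call into a time-frequency distribution for analysis, and the distribution data serve as visual pattern information that is then quantized into comparable features. This study compares ordinary time-frequency transforms with transforms that incorporate human auditory characteristics, to show how using or omitting a human-ear model affects suitability. It also examines the results produced by different transform types, including linear time-frequency transforms, bilinear time-frequency transforms, and the Hilbert-Huang transform, in order to find the transform suited to each bird species and retune the system accordingly.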
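The concern about human-hearing models can be illustrated with the mel scale that underlies MFCC. The sketch below uses one commonly quoted mel formula, which is an assumption here (the record does not say which variant the thesis uses), and the two frequency bands are illustrative values only. It measures how many mels two bands of equal width in Hz occupy, one low and one high.

```python
import math


def hz_to_mel(f_hz: float) -> float:
    """Convert Hz to mel with a commonly used formula (assumed, not taken from the thesis)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_width(f_lo: float, f_hi: float) -> float:
    """Width of the band [f_lo, f_hi] measured in mels."""
    return hz_to_mel(f_hi) - hz_to_mel(f_lo)


# Two bands that are both 500 Hz wide (illustrative values).
low_band = mel_width(500.0, 1000.0)
high_band = mel_width(6000.0, 6500.0)

print(f"500-1000 Hz spans {low_band:.1f} mel")
print(f"6000-6500 Hz spans {high_band:.1f} mel")
```

Under this formula the high band occupies far fewer mels than the low band, so a mel-spaced analysis resolves high frequencies more coarsely. Whether that actually discards useful cues for a given species is the kind of question the thesis investigates.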


Automatic bird sound identification systems have been under development for several years. Traditional recognition approaches are adapted from human speech processing systems, and the feature extraction algorithms commonly used in bird call identification are based on human auditory models, such as Mel-frequency cepstral coefficients (MFCC). However, a human auditory model is not well suited to bird sound recognition, because birds differ physiologically from humans. The bird call identification system in this thesis has three major parts: selection of the time-frequency method, training of reference templates, and comparison of bird sounds. We analyze a bird call by transforming it into the time-frequency domain, and the result serves as a visual pattern for further feature extraction. The recognition system includes several transforms: linear time-frequency transforms, bilinear time-frequency transforms, and the Hilbert-Huang transform. We also compare ordinary time-frequency transforms with transforms based on human perception, and identify the best transform for each bird species.
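As a rough illustration of a linear time-frequency representation, the following sketch computes a short-time Fourier transform magnitude with NumPy. Everything specific in it is an assumption made for the example and not taken from the thesis: the signal is a synthetic upward sweep standing in for a bird call, and the sampling rate, frame length, hop size and Hann window are arbitrary choices.

```python
import numpy as np


def stft_magnitude(x, frame_len=256, hop=128):
    """Magnitude STFT with a Hann window; shape is (n_frames, frame_len // 2 + 1)."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack(
        [x[i * hop : i * hop + frame_len] * window for i in range(n_frames)]
    )
    return np.abs(np.fft.rfft(frames, axis=1))


fs = 16000                      # sampling rate in Hz (illustrative)
duration = 0.5                  # seconds
t = np.arange(int(duration * fs)) / fs

# Synthetic linear sweep from 2 kHz to 6 kHz, a stand-in for a real recording.
f0, f1 = 2000.0, 6000.0
phase = 2.0 * np.pi * (f0 * t + 0.5 * (f1 - f0) / duration * t**2)
x = np.sin(phase)

S = stft_magnitude(x)
freqs = np.fft.rfftfreq(256, d=1.0 / fs)

# Strongest frequency bin in each frame traces the sweep over time.
ridge = freqs[S.argmax(axis=1)]
print(S.shape, ridge[0], ridge[-1])
```

The array `S` is the kind of time-frequency image the abstract refers to as a visual pattern. A bilinear distribution or the Hilbert-Huang transform would replace `stft_magnitude` while the later feature extraction steps operate on the resulting image.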

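Abstract (Chinese)
Abstract
Table of Contents
List of Figures
List of Tables
Chapter 1 Introduction
  1.1 Preface
  1.2 Literature Review
    1.2.1 Speech Recognition Systems
    1.2.2 Characteristics of Bird Calls
    1.2.3 Bird Call Identification Systems
  1.3 Thesis Organization
Chapter 2 Time-Frequency Analysis and Feature Recognition
  2.1 Development of Time-Frequency Analysis
  2.2 Linear Time-Frequency Analysis
    2.2.1 Short-Time Fourier Transform
    2.2.2 Window Functions
    2.2.3 Wavelet Transform
  2.3 Bilinear Time-Frequency Analysis
    2.3.1 Wigner-Ville Distribution
    2.3.2 Cohen Distribution
    2.3.3 Reduced Interference Distribution
  2.4 Hilbert-Huang Transform
  2.5 Feature Processing
    2.5.1 Mel-Frequency Cepstral Coefficients
    2.5.2 SEAV
  2.6 Decision Models
    2.6.1 Gaussian Mixture Model
    2.6.2 Dynamic Time Warping
Chapter 3 Sound Identification System
  3.1 Time-Frequency Representation
  3.2 Pattern Quantization
  3.3 Gaussian-Fitted Distribution
  3.4 Decision Model
Chapter 4 Experimental Results
  4.1 Construction of the Bird Call Data
  4.2 Experimental Procedure and Results
    4.2.1 Single Time-Frequency Distribution
    4.2.3 Mixed Time-Frequency Distributions
  4.3 Comparison of Experimental Results
Chapter 5 Conclusions and Future Work
  5.1 Conclusions
  5.2 Future Work
References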


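Full text release date: 2019/01/21 (campus network)
Full text release date: not authorized for public release (off-campus network)
Full text release date: not authorized for public release (National Central Library: National Digital Library of Theses and Dissertations in Taiwan)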