| Field | Value |
|---|---|
| Graduate student | 石棟樑 (Dong-Liang Shih) |
| Thesis title | 基於多種類時頻特徵擷取之鳥鳴聲辨識系統 (Bird Call Identification System Based on Various Time-Frequency Feature Extraction) |
| Advisor | 林敬舜 (Ching-Shun Lin) |
| Oral defense committee | 陳維美 (Wei-Mei Chen), 林淵翔 (Yuan-Hsiang Lin), 王煥宗 (Huan-Chun Wang) |
| Degree | Master |
| Department | Department of Electronic and Computer Engineering, College of Electrical Engineering and Computer Science |
| Year of publication | 2014 |
| Academic year of graduation | 102 (ROC calendar, 2013–2014) |
| Language | Chinese |
| Pages | 71 |
| Keywords | Bird call identification; feature extraction; linear time-frequency analysis; bilinear time-frequency analysis; short-time Fourier transform; Wigner-Ville distribution; Choi-Williams distribution; Born-Jordan distribution; Levin distribution; Hilbert-Huang transform |
| Views / downloads | 453 / 4 |
Bird call identification systems have been under development for some time, and most of the algorithms used in this line of research are adapted from human speaker- or speech-recognition systems. That is, the feature extraction methods used in most bird call identification systems are algorithms built around the characteristics of human hearing, with MFCC being the predominant example. However, for birds, whose physiology differs from that of humans, models that embed human-factor parameters may discard important discriminative features, so choosing an appropriate feature extraction method is one of the key research directions for an identification system. The bird call identification system proposed in this thesis consists of three parts: selection of the time-frequency transform, training of reference templates, and comparison of bird calls. The choice of time-frequency transform strongly affects the result of feature extraction: the system converts the bird-call signal into a time-frequency distribution for analysis, and this distribution serves as a visual pattern to be further quantified into comparable features. This study compares the results of general time-frequency transforms with those of transforms that incorporate human auditory characteristics, to show how the use (or omission) of a human-ear model affects suitability. In addition, we examine the results produced by different transform families, including linear time-frequency transforms, bilinear time-frequency transforms, and the Hilbert-Huang transform, in order to find the transform best suited to each bird species and retune the system accordingly.
Automatic bird sound identification systems have been developed for several years, and traditional recognition approaches are adapted from human speech processing systems. The feature extraction algorithms usually used in bird call identification are based on human auditory models, such as the Mel-frequency cepstral coefficients (MFCCs). However, an auditory model built for human hearing is not well suited to bird sound recognition, owing to the physiological differences between humans and birds. In this thesis, our bird call identification system is composed of three major parts: time-frequency method selection, reference template training, and bird sound comparison. We analyze each bird call by transforming the data into the time-frequency domain, and the resulting distribution is used as a visual pattern for further feature extraction. A variety of transforms, including the linear time-frequency transform, the bilinear time-frequency transform, and the Hilbert-Huang transform, are included in the recognition system. We also compare ordinary time-frequency transforms against transforms related to human perception, and conclude which transform is best suited to each bird species.
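The abstract contrasts linear and bilinear time-frequency transforms as sources of visual patterns for feature extraction. The following minimal sketch, which is not taken from the thesis, illustrates that contrast on a synthetic frequency-modulated chirp standing in for a bird syllable: a short-time Fourier transform (linear) versus a plain discrete Wigner-Ville distribution (bilinear). All signal parameters and the `wigner_ville` helper are illustrative assumptions.

```python
import numpy as np
from scipy.signal import stft, hilbert

# Synthetic FM chirp standing in for one bird syllable
# (the thesis uses real recordings; this signal is purely illustrative).
fs = 8000                                   # sampling rate in Hz
t = np.arange(0, 0.5, 1.0 / fs)             # 0.5 s of signal
x = np.cos(2 * np.pi * (2000 * t + 1500 * t ** 2))   # 2 kHz -> 3.5 kHz sweep

# Linear transform: short-time Fourier transform, whose magnitude is the
# spectrogram pattern used for feature extraction.
f, frames, Zxx = stft(x, fs=fs, nperseg=256, noverlap=192)
spectrogram = np.abs(Zxx)

# Bilinear transform: a plain discrete Wigner-Ville distribution,
#   W[n, k] = DFT over the lag m of x[n + m] * conj(x[n - m]).
# The analytic signal (Hilbert transform) is used to reduce interference
# terms; note the discrete WVD halves the usable bandwidth.
def wigner_ville(sig, n_lags=128):
    sig = np.asarray(sig, dtype=complex)
    n = len(sig)
    half = n_lags // 2
    m = np.arange(-half, half)
    wvd = np.empty((n, n_lags))
    for i in range(n):
        kernel = np.zeros(n_lags, dtype=complex)
        ok = (i + m >= 0) & (i + m < n) & (i - m >= 0) & (i - m < n)
        kernel[ok] = sig[i + m[ok]] * np.conj(sig[i - m[ok]])
        # place lag m = 0 at index 0 so the DFT of the conjugate-symmetric
        # instantaneous autocorrelation comes out real
        wvd[i] = np.fft.fft(np.roll(kernel, -half)).real
    return wvd

wvd = wigner_ville(hilbert(x)[:1024])

# The chirp's instantaneous frequency rises with time, so the spectrogram
# peak near the middle frame should sit around 2.75 kHz.
mid = spectrogram.shape[1] // 2
peak_hz = f[np.argmax(spectrogram[:, mid])]
```

Each row of `wvd` is a frequency slice at one sample instant; in a pipeline like the one the abstract describes, such distributions would be quantized into comparable features and matched against the trained reference templates.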