Author: 徐晟紘 (Cheng-Hung Hsu)
Thesis Title: 鳥聲辨識與聲源定位APP (Bird Sound Recognition and Sound Source Localization APP)
Advisor: 楊傳凱 (Chuan-Kai Yang)
Committee: 賴源正 (Yuan-Cheng Lai), 林伯慎 (Bor-Shen Lin)
Degree: Master
Department: Department of Information Management, College of Management
Thesis Publication Year: 2023
Graduation Academic Year: 111
Language: Chinese
Pages: 59
Keywords (in Chinese): 鳥聲辨識、聲源定位、到達時間差、到達方向、廣義互相關
Keywords (in other languages): Bird Sound Recognition, Sound Source Localization, Time Difference Of Arrival, Direction Of Arrival, Generalized Cross-Correlation
Reference times: Clicks: 165, Downloads: 0
  • Recent advances in deep learning have shown encouraging results. We use convolutional neural networks (CNNs) to train large-scale bird sound recognition, and additionally train dedicated models for bird species found in Taiwan. The training data are downloaded from Xeno-canto, a website that provides a large archive of tagged and categorized recordings. We then extract bird sound features from the recordings and feed them into the CNN models for training.

    Sound source localization is the listener's identification of a detected sound source in terms of direction and distance. This study attempts to perform localization with the two microphones of a high-end smartphone and establishes a feasible sound source localization algorithm for it.

    In this thesis, we develop a mobile application that provides the following techniques:
    1. A simple microphone-array algorithm that estimates the direction and distance of a sound source in two-dimensional space, using only the two built-in microphones of a smartphone. For direction, we apply a real-time direction-of-arrival estimation technique on the phone. For distance, based on the geometric relationship between each pair of microphones and the estimated time delay, we propose a method of obtaining the distance to the sound source whose computational complexity is low and whose algorithm is simple to implement.
    2. We train three CNN models for bird sound recognition and test them with different configurations and hyperparameters. The models are deployed directly on the front end, so no data needs to be sent to a back end: recognition runs entirely on the phone. Compared with the traditional approach, offline use is faster without loss of accuracy.


    We use convolutional neural networks (CNNs) for large-scale bird sound recognition, and additionally train dedicated models for bird species in Taiwan. The dataset is downloaded from Xeno-canto, a website that provides a large archive of tagged and categorized bird sounds. We then extract bird sound features from the recordings and feed them into a CNN model for training.
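    The thesis's actual pipelines (ResNet, EfficientNet, and YAMNet, detailed in Chapter 3) each have their own preprocessing; as a minimal, hypothetical sketch of how a recording becomes a two-dimensional CNN input, a log-magnitude spectrogram can be computed with NumPy alone (the frame sizes below are illustrative, not the thesis's settings):

```python
import numpy as np

def log_spectrogram(samples, n_fft=1024, hop=512):
    """Frame the signal, apply a Hann window, and take log-magnitude FFT bins.

    Returns an array of shape (n_frames, n_fft // 2 + 1) that can be fed
    to a 2-D CNN as a single-channel "image".
    """
    window = np.hanning(n_fft)
    n_frames = 1 + (len(samples) - n_fft) // hop
    frames = np.stack([samples[i * hop:i * hop + n_fft] * window
                       for i in range(n_frames)])
    mag = np.abs(np.fft.rfft(frames, axis=1))
    return np.log(mag + 1e-10)   # small floor avoids log(0)

# A 1-second, 16 kHz synthetic tone stands in for a bird recording.
sr = 16000
t = np.linspace(0, 1, sr, endpoint=False)
feat = log_spectrogram(np.sin(2 * np.pi * 3000 * t))
```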

    Sound source localization is the listener's identification of a detected sound source in terms of direction and distance. This study attempts to use the two microphones of a high-end smartphone for localization and establishes a feasible sound source localization algorithm for it.
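    The thesis keywords name generalized cross-correlation (GCC) for estimating the time difference of arrival (TDOA) between the two channels. A standard GCC-PHAT sketch in NumPy (the function name and parameters are illustrative, not the thesis's code) looks like:

```python
import numpy as np

def gcc_phat(sig, ref, fs):
    """Estimate the TDOA (seconds) of `sig` relative to `ref` via GCC-PHAT."""
    n = len(sig) + len(ref)                 # zero-pad to avoid circular wrap
    R = np.fft.rfft(sig, n=n) * np.conj(np.fft.rfft(ref, n=n))
    R /= np.abs(R) + 1e-15                  # PHAT weighting: keep phase only
    cc = np.fft.irfft(R, n=n)
    max_shift = n // 2
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    return (np.argmax(np.abs(cc)) - max_shift) / fs

# Synthetic check: the same noise burst delayed by 10 samples on one channel.
fs = 16000
rng = np.random.default_rng(0)
ref = rng.standard_normal(4096)
sig = np.concatenate((np.zeros(10), ref))[:4096]
tdoa = gcc_phat(sig, ref, fs)   # expected: 10 / 16000 s
```

    The PHAT weighting discards magnitude and keeps only phase, which sharpens the correlation peak in reverberant outdoor recordings at the cost of amplitude information.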

    In this paper, we develop a mobile application (APP), and our contributions are as follows:

    1. A simple microphone-array algorithm to estimate the direction and distance of sound sources in two-dimensional space. The technique requires only the two built-in microphones of a smartphone.

    2. Three CNN models for bird sound recognition are trained and tested with different configurations and hyperparameters. The models run offline on the device: unlike traditional approaches, no data needs to be sent back to a server, which is an advantage when there is no network connection.
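    Given a TDOA estimate, the far-field direction of arrival follows from the microphone spacing and the speed of sound via θ = arcsin(cτ/d). A minimal sketch (the 14 cm spacing is a hypothetical value; the thesis's actual geometry and its distance method are described in Chapter 4):

```python
import math

SPEED_OF_SOUND = 343.0  # m/s in air at roughly 20 °C

def doa_angle_deg(tdoa_s, mic_spacing_m, c=SPEED_OF_SOUND):
    """Far-field angle of arrival (degrees) from a two-microphone TDOA.

    0° means broadside (source equidistant from both mics); ±90° means endfire.
    """
    # Clamp against TDOA estimates slightly beyond the physical limit d/c.
    x = max(-1.0, min(1.0, c * tdoa_s / mic_spacing_m))
    return math.degrees(math.asin(x))

# Example with a hypothetical 14 cm microphone spacing:
angle = doa_angle_deg(2.0e-4, 0.14)
```

    The clamp matters in practice: measurement noise can push cτ/d slightly past ±1, where arcsin is undefined.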

    Table of Contents
    Chinese Abstract
    English Abstract
    Acknowledgements
    List of Figures
    List of Tables
    Chapter 1  Introduction
      1.1 Research Background
      1.2 Research Motivation and Purpose
      1.3 Thesis Organization
    Chapter 2  Literature Review
      2.1 Bird Sound Recognition
      2.2 Sound Source Localization
    Chapter 3  Bird Sound Recognition Methodology
      3.1 Bird Sound Recognition System Flow
      3.2 Data Preprocessing
        3.2.1 Extracting Audio Features: ResNet
        3.2.2 Extracting Audio Features: EfficientNet
        3.2.3 Resampling the Audio: YAMNet
      3.3 Model Training
        3.3.1 ResNet Model
        3.3.2 YAMNet Model
        3.3.3 EfficientNet Model
      3.4 Model Conversion
      3.5 Recording
      3.6 Extracting Audio Features (on the Phone)
      3.7 Resampling the Audio (on the Phone)
      3.8 Prediction
    Chapter 4  Sound Source Localization Methodology
      4.1 Sound Source Localization System Flow
      4.2 Recording
      4.3 Splitting the Stereo Channels
      4.4 Computing the TDOA
      4.5 Computing the Angle of Arrival
      4.6 ARCore
      4.7 Computing the Distance
    Chapter 5  Results and Evaluation
      5.1 System Environment
      5.2 Experimental Environment
      5.3 Dataset
      5.4 Bird Sound Recognition: Results and Analysis
      5.5 Sound Source Localization: Results and Analysis
      5.6 APP Screenshots
    Chapter 6  Conclusion and Future Work
    References


    Full text public date: 2026/02/13 (Intranet, Internet, and National Library)