
Graduate Student: 胡准雄 (Winner Roedily)
Thesis Title: Evaluation of Real-Time Noise Classifier based on CNN-LSTM and MFCC for Smartphones
Advisors: 阮聖彰 (Shanq-Jang Ruan), 力博宏 (Lieber Po-Hung Li)
Oral Defense Committee: 林淵翔 (Yuan-Hsiang Lin), 林敬舜 (Ching-Shun Lin), 力博宏 (Lieber Po-Hung Li), Peter Chondro
Degree: Master (碩士)
Department: Department of Electronic and Computer Engineering, College of Electrical Engineering and Computer Science
Publication Year: 2020
Graduation Academic Year: 108 (2019-2020)
Language: English
Pages: 57
Chinese Keywords: Noise classification, MFCC, CNN-LSTM, TarsosDSP library, Android
Foreign Keywords: Noise classification, MFCC, CNN-LSTM, TarsosDSP library, Android
Views: 181; Downloads: 4
Abstract: Recent studies have demonstrated various methods for classifying the noises present in daily human activity. Most of these methods rely on multiple audio features that require heavy computation, which increases latency. This thesis presents a smartphone-based real-time sound classifier that uses only the Mel-Frequency Cepstral Coefficients (MFCC) as the feature vector. By relying on this single feature and an augmented audio dataset, the system drastically reduces computational complexity and achieves 92.06% accuracy. The system uses the TarsosDSP library for feature extraction and a Convolutional Neural Network - Long Short-Term Memory (CNN-LSTM) network both for classification and for determining the number of MFCCs. The results show that the developed system classifies noises with higher accuracy and shorter processing time than other architectures. Additionally, the system consumes only 0.03 W of power, which makes it suitable for future commercial use.


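The record does not state how the CNN-LSTM model runs on the phone; a common deployment route on Android is TensorFlow Lite. The sketch below shows one such inference step under that assumption. The model file name, input shape, and noise labels are hypothetical placeholders, not values from the thesis.

```java
import android.content.Context;
import android.content.res.AssetFileDescriptor;
import org.tensorflow.lite.Interpreter;
import java.io.FileInputStream;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

public class NoiseClassifier {
    // Hypothetical model file and label set; not taken from the thesis.
    private static final String MODEL_FILE = "cnn_lstm_noise.tflite";
    private static final String[] LABELS = { "babble", "traffic", "machinery", "quiet" };

    private final Interpreter interpreter;

    public NoiseClassifier(Context context) throws Exception {
        // Memory-map the TFLite model bundled in the app's assets.
        AssetFileDescriptor fd = context.getAssets().openFd(MODEL_FILE);
        try (FileInputStream fis = new FileInputStream(fd.getFileDescriptor())) {
            MappedByteBuffer model = fis.getChannel().map(
                    FileChannel.MapMode.READ_ONLY,
                    fd.getStartOffset(), fd.getDeclaredLength());
            interpreter = new Interpreter(model);
        }
    }

    // Classify a window of MFCC frames shaped [1][frames][coefficients] (assumed layout).
    public String classify(float[][][] mfccWindow) {
        float[][] scores = new float[1][LABELS.length];
        interpreter.run(mfccWindow, scores); // single forward pass through the CNN-LSTM
        int best = 0;
        for (int i = 1; i < LABELS.length; i++) {
            if (scores[0][i] > scores[0][best]) best = i;
        }
        return LABELS[best];
    }
}
```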

Table of Contents:
Abstract
Table of Contents
List of Tables
List of Figures
1 Introduction
  1.1 Hearing Loss and Hearing Aids
  1.2 Noise Classification
  1.3 Organization of this Thesis
2 Related Works
  2.1 MFCC Characteristics
  2.2 MFCC Extraction
  2.3 Convolutional Neural Network (CNN)
  2.4 Long Short-Term Memory (LSTM)
  2.5 Motivations
3 Proposed Method
  3.1 Feature Extraction
  3.2 Noise Classification
    3.2.1 Datasets
    3.2.2 Noise Classifier Model
4 Experimental Result
  4.1 MFCC Determination
  4.2 The Runtime of the Developed Application
  4.3 Resource Consumption during Classification
  4.4 Developed Application Overview
  4.5 Discussion
5 Conclusions
References
