
Graduate Student: Winner Roedily (胡准雄)
Thesis Title: Evaluation of Real-Time Noise Classifier based on CNN-LSTM and MFCC for Smartphones
Advisors: Shanq-Jang Ruan (阮聖彰); Lieber Po-Hung Li (力博宏)
Oral Defense Committee: Yuan-Hsiang Lin (林淵翔); Ching-Shun Lin (林敬舜); Lieber Po-Hung Li (力博宏); Peter Chondro
Degree: Master
Department: Department of Electronic and Computer Engineering, College of Electrical Engineering and Computer Science
Publication Year: 2020
Graduation Academic Year: 108 (ROC calendar; AY 2019-2020)
Language: English
Pages: 57
Keywords: Noise classification, MFCC, CNN-LSTM, TarsosDSP library, Android

Recent studies have demonstrated various methods for classifying the noises present in daily human activity. Most of these methods rely on multiple audio features that require heavy computation, which increases latency. This thesis presents a real-time, smartphone-based sound classifier that uses only Mel-frequency cepstral coefficients (MFCCs) as the feature vector. By relying on this single feature together with an augmented audio dataset, the system drastically reduces computational complexity while achieving 92.06% classification accuracy. The system uses the TarsosDSP library for feature extraction and a Convolutional Neural Network – Long Short-Term Memory (CNN-LSTM) network for both noise classification and MFCC determination. The results show that the developed system classifies noises with higher accuracy and shorter processing time than comparable architectures. In addition, the system consumes only 0.03 W of power, which makes it suitable for future commercial use.
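The thesis itself is not reproduced on this page, but the pipeline the abstract describes (microphone capture, MFCC extraction with TarsosDSP, CNN-LSTM inference) can be sketched in Java as below. This is a minimal illustration under stated assumptions, not the thesis's implementation: it assumes the Android build of TarsosDSP and a TensorFlow Lite export of the CNN-LSTM model, and every constant (sample rate, frame size, coefficient count, window length, class count) as well as the file name cnn_lstm_noise.tflite and the class NoiseClassifierSketch are illustrative guesses.

// Hypothetical sketch: TarsosDSP MFCC extraction feeding a TensorFlow Lite
// export of the CNN-LSTM model. Requires the RECORD_AUDIO permission on
// Android. All constants and the model file name are illustrative, not the
// thesis's actual settings.
import java.io.File;

import be.tarsos.dsp.AudioDispatcher;
import be.tarsos.dsp.AudioEvent;
import be.tarsos.dsp.AudioProcessor;
import be.tarsos.dsp.io.android.AudioDispatcherFactory;
import be.tarsos.dsp.mfcc.MFCC;

import org.tensorflow.lite.Interpreter;

public class NoiseClassifierSketch {

    private static final int SAMPLE_RATE = 16000; // assumed capture rate (Hz)
    private static final int FRAME_SIZE  = 512;   // samples per analysis frame
    private static final int OVERLAP     = 256;   // 50% overlap between frames
    private static final int NUM_MFCC    = 13;    // cepstral coefficients per frame
    private static final int NUM_FRAMES  = 40;    // frames per classification window
    private static final int NUM_CLASSES = 5;     // hypothetical number of noise classes

    public static void start() {
        // Input/output tensors for a model whose (assumed) input shape is
        // [1, NUM_FRAMES, NUM_MFCC] and whose output is one score per class.
        float[][][] input  = new float[1][NUM_FRAMES][NUM_MFCC];
        float[][]   output = new float[1][NUM_CLASSES];
        Interpreter model  = new Interpreter(new File("cnn_lstm_noise.tflite"));

        // TarsosDSP pulls PCM frames from the default microphone and pushes
        // them through the processor chain in the order processors were added.
        AudioDispatcher dispatcher =
                AudioDispatcherFactory.fromDefaultMicrophone(SAMPLE_RATE, FRAME_SIZE, OVERLAP);

        // MFCC processor: frame size, sample rate, number of coefficients,
        // number of mel filters, and the mel filter-bank frequency range (Hz).
        MFCC mfcc = new MFCC(FRAME_SIZE, SAMPLE_RATE, NUM_MFCC, 40, 300f, 8000f);
        dispatcher.addAudioProcessor(mfcc);

        // Collector: runs after the MFCC processor for each frame, so
        // mfcc.getMFCC() holds the coefficients of the frame just processed.
        dispatcher.addAudioProcessor(new AudioProcessor() {
            private int frame = 0;

            @Override
            public boolean process(AudioEvent audioEvent) {
                System.arraycopy(mfcc.getMFCC(), 0, input[0][frame], 0, NUM_MFCC);
                if (++frame == NUM_FRAMES) {
                    model.run(input, output); // CNN-LSTM forward pass
                    // output[0] now holds one score per noise class.
                    frame = 0;                // start filling the next window
                }
                return true; // keep the audio flowing through the chain
            }

            @Override
            public void processingFinished() {
                model.close();
            }
        });

        // AudioDispatcher implements Runnable; run it off the UI thread.
        new Thread(dispatcher, "noise-classifier").start();
    }
}

A real implementation would additionally normalize the MFCCs exactly as during training and map the output scores to class labels; those details are not specified in the abstract.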


Abstract
Table of Contents
List of Tables
List of Figures
1 Introduction
  1.1 Hearing Loss and Hearing Aids
  1.2 Noise Classification
  1.3 Organization of this Thesis
2 Related Works
  2.1 MFCC Characteristics
  2.2 MFCC Extraction
  2.3 Convolutional Neural Network (CNN)
  2.4 Long Short-Term Memory (LSTM)
  2.5 Motivations
3 Proposed Method
  3.1 Feature Extraction
  3.2 Noise Classification
    3.2.1 Datasets
    3.2.2 Noise Classifier Model
4 Experimental Result
  4.1 MFCC Determination
  4.2 The Runtime of the Developed Application
  4.3 Resource Consumption during Classification
  4.4 Developed Application Overview
  4.5 Discussion
5 Conclusions
References

