Graduate student: 洪子翔
Tz-Shiang Hung
Thesis title: 基於Wavenet 和SENet 的異常聲音偵測系統
Detecting Anomalous Sound Event Based on WaveNet and SENet
Advisor: 沈上翔
Shan-Hsiang Shen
Committee members: 洪西進
Shi-Jinn Horng
林灶生
Jzau-Sheng Lin
Degree: Master
Department: College of Electrical Engineering and Computer Science - Department of Computer Science and Information Engineering
Year of publication: 2020
Academic year of graduation: 108 (ROC calendar)
Language: English
Number of pages: 39
Chinese keywords: 深度學習, 異常偵測
English keywords: Deep Learning, Anomalous Detection

In recent years, many researchers have focused on detecting abnormal events from images. Although image-based abnormal event detection performs well, it has drawbacks such as limited camera angles and high installation cost. Sound-based abnormal event detection avoids these drawbacks. Approaches to sound-based detection range from probability distribution theory to deep learning. In this thesis, we design and implement a complete deep-learning-based abnormal sound detection system that uses WaveNet for fast simulation and SENet for training optimization. We also use a tailor-made embedded system to balance accuracy and speed, so the system can be installed on an embedded device and deployed in a factory. Compared with the existing WaveNet approach on our self-collected validation dataset, adding SENet does not significantly improve the EB F1-score, but it reduces training time by 22% and raises the SB F1-score from 86.67% to 88.48%.
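As a rough illustration of the squeeze-and-excitation idea that SENet contributes, the sketch below rescales each channel of a one-dimensional feature map by a learned gate. It is a generic sketch, not the thesis's implementation: the function name `squeeze_excite_1d`, the weight shapes, and the reduction ratio are all assumptions made for illustration, and the abstract does not say where the block sits inside WaveNet.

```python
import math
import random


def squeeze_excite_1d(x, w1, b1, w2, b2):
    """Apply a squeeze-and-excitation gate to x (a list of channels,
    each a list of time samples). Returns (rescaled_x, gates)."""
    # Squeeze: one summary value per channel (global average over time).
    z = [sum(ch) / len(ch) for ch in x]
    # Excitation, step 1: bottleneck linear layer followed by ReLU.
    h = [max(0.0, sum(w * v for w, v in zip(row, z)) + b)
         for row, b in zip(w1, b1)]
    # Excitation, step 2: expand back to one sigmoid gate per channel.
    gates = [1.0 / (1.0 + math.exp(-(sum(w * v for w, v in zip(row, h)) + b)))
             for row, b in zip(w2, b2)]
    # Scale: multiply every sample in a channel by that channel's gate.
    y = [[g * v for v in ch] for g, ch in zip(gates, x)]
    return y, gates


# Hypothetical sizes: 8 channels, 100 time steps, reduction ratio 4.
random.seed(0)
channels, steps, reduction = 8, 100, 4
hidden = channels // reduction
x = [[random.gauss(0.0, 1.0) for _ in range(steps)] for _ in range(channels)]
w1 = [[random.gauss(0.0, 0.1) for _ in range(channels)] for _ in range(hidden)]
b1 = [0.0] * hidden
w2 = [[random.gauss(0.0, 0.1) for _ in range(hidden)] for _ in range(channels)]
b2 = [0.0] * channels

y, gates = squeeze_excite_1d(x, w1, b1, w2, b2)
print([round(g, 3) for g in gates])
```

Each gate lies strictly between 0 and 1, so the block can only attenuate channels relative to one another; in a trained network the weights would be learned jointly with the rest of the model.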

I. Introduction
II. The Proposed Method
    A. Detection Pre-processing
    B. Original WaveNet
    C. WaveNet Joined with SENet
    D. Optimization Model
    E. Detection Post-processing
III. Experiments
    A. System Architecture
    B. Pipeline Leakage Experiments
    C. Real Anomalies
References

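Full-text release date: 2025/07/31 (campus network)
Full-text release date: not authorized for public release (off-campus network)
Full-text release date: not authorized for public release (National Central Library: National Digital Library of Theses and Dissertations in Taiwan)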