
Graduate student: Zohaib Mushtaq
Dissertation title: DCNN based Environmental Sounds Classification by Using Spectrogram Images and Various Data Augmentation Techniques
Advisor: Shun-Feng Su (蘇順豐)
Oral examination committee: Tsu-Tian Lee (李祖添)
Wen-June Wang
Wei-Yen Wang
Yo-Ping Huang
Ching-Chih Tsai
Jyh-Horng Chou
Sheng-Luen Chung
Degree: Doctoral (Ph.D.)
Department: College of Electrical Engineering and Computer Science, Department of Electrical Engineering
Year of publication: 2020
Academic year of graduation: 109 (ROC calendar)
Language: English
Number of pages: 128
Keywords: Environmental Sound Classification, Augmentation, Aggregation, Deep Convolutional Neural Network, Spectrogram


In recent years, Sound Event Recognition (SER) has become a popular research topic because of the complex nature of environmental sounds. Automatic classification of dynamic and intricate urban sounds is a key component of applications such as noise source identification, surveillance systems, and other sound identification tasks. This dissertation therefore proposes methodologies that classify environmental sounds with higher accuracy. Three open-source environmental sound datasets are used: ESC-10, ESC-50, and UrbanSound8K (Us8k).

The first method proposes two L2-regularized Deep Convolutional Neural Network (DCNN) models built from scratch, named Model-1 (with max-pooling) and Model-2 (without max-pooling). Three well-known auditory features are used as inputs: the Mel spectrogram (Mel), the Log-Mel spectrogram (LM), and Mel Frequency Cepstral Coefficients (MFCC). Audio data augmentation is also applied. Model-2 with the LM feature achieves the best results.

The second methodology feeds spectrogram images, rather than audio clips, directly to a DCNN. Transfer learning is applied to the spectrogram images using 11 different sets of weights pretrained on ImageNet. The layers are fine-tuned with discriminative learning rates, that is, separately optimized learning rates for different groups of layers. This methodology is also combined with our proposed meaningful data augmentation approach, and the best results are obtained with the ResNet-152 and DenseNet-161 models using the proposed data enhancement techniques.

The third and final strategy builds on the best-performing transfer learning model from the previous experiment. Several acoustic feature aggregation approaches are implemented. In this part, we also introduce two novel Mel-filter-based auditory features: Log(Log-Mel), denoted L2M, and Log(Log(Log-Mel)), denoted L3M.
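The features named above can be made concrete with a short sketch. The NumPy code below computes Mel, LM, and MFCC from a waveform and then applies the logarithm repeatedly to obtain L2M and L3M. It is an illustration under stated assumptions, not the dissertation's implementation: the HTK Mel formula, the frame and hop sizes, the 64 Mel bands, the 20 MFCCs, and especially the `log_shift` step (which shifts its input so the minimum is a small positive epsilon before taking the log) are hypothetical choices, and every function name here is made up for illustration.

```python
import numpy as np


def hz_to_mel(f):
    # HTK-style Mel scale (an assumed convention; Slaney-style scales also exist).
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)


def mel_filterbank(sr, n_fft, n_mels):
    """Triangular Mel filters with peak 1, shape (n_mels, n_fft // 2 + 1)."""
    fft_freqs = np.linspace(0.0, sr / 2.0, n_fft // 2 + 1)
    hz_points = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2))
    fb = np.zeros((n_mels, fft_freqs.size))
    for i in range(n_mels):
        left, center, right = hz_points[i : i + 3]
        rising = (fft_freqs - left) / (center - left)
        falling = (right - fft_freqs) / (right - center)
        fb[i] = np.maximum(0.0, np.minimum(rising, falling))
    return fb


def power_spectrogram(y, n_fft, hop):
    """|STFT|^2 with a Hann window, shape (n_fft // 2 + 1, n_frames)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(y) - n_fft) // hop
    frames = np.stack([y[t * hop : t * hop + n_fft] * window for t in range(n_frames)])
    return (np.abs(np.fft.rfft(frames, axis=1)) ** 2).T


def dct_ii(x, n_coeffs):
    """Orthonormal DCT-II along axis 0, keeping the first n_coeffs coefficients."""
    n = x.shape[0]
    k = np.arange(n_coeffs)[:, None]
    basis = np.cos(np.pi * k * (2 * np.arange(n)[None, :] + 1) / (2 * n))
    basis[0] *= np.sqrt(1.0 / n)
    basis[1:] *= np.sqrt(2.0 / n)
    return basis @ x


def log_shift(x, eps=1e-6):
    # ASSUMPTION: shift so the minimum equals eps, then take the natural log.
    # The dissertation's own normalization for L2M/L3M is not given in the abstract.
    return np.log(x - x.min() + eps)


def extract_features(y, sr, n_fft=1024, hop=512, n_mels=64, n_mfcc=20):
    power = power_spectrogram(y, n_fft, hop)
    mel = mel_filterbank(sr, n_fft, n_mels) @ power   # Mel spectrogram
    lm = 10.0 * np.log10(mel + 1e-10)                 # Log-Mel (LM), in dB
    mfcc = dct_ii(lm, n_mfcc)                         # MFCC: DCT of the Log-Mel bands
    l2m = log_shift(lm)                               # L2M = Log(Log-Mel)
    l3m = log_shift(l2m)                              # L3M = Log(Log(Log-Mel))
    return {"mel": mel, "lm": lm, "mfcc": mfcc, "l2m": l2m, "l3m": l3m}


if __name__ == "__main__":
    sr = 22050
    y = np.sin(2 * np.pi * 440.0 * np.arange(sr) / sr)  # 1 s synthetic 440 Hz tone
    for name, feat in extract_features(y, sr).items():
        print(name, feat.shape)
```

The shift-then-log step is the main design choice in this sketch: a dB-scaled Log-Mel spectrogram usually contains negative values, so taking its logarithm directly would be undefined.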
Two new data enhancement techniques are also proposed. They combine, through reinforcement and accumulation, spectrogram representations of audio-augmented acoustic features, covering general features as well as both conventional and the novel Mel-filter-based features. The proposed data augmentation approaches are also tested on real audio recordings collected from YouTube that resemble the original datasets. The proposed methodologies are robust and efficient, and they achieve state-of-the-art results on both the validation data and the real audio data.
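The abstract does not specify how the reinforcement and accumulation of augmented features are carried out. One plausible reading, sketched below purely as an assumption, is that each clip is augmented at the waveform level, every original and augmented version is converted into each chosen spectrogram feature, and all resulting images are accumulated into the training set under the clip's label. The augmentations (additive Gaussian noise, circular time shift, random gain), their parameter ranges, the toy class labels, and the names `augment_waveform`, `log_power_spectrogram`, and `accumulate_dataset` are all hypothetical.

```python
from functools import partial

import numpy as np


def log_power_spectrogram(y, n_fft=256, hop=128):
    """Log-power STFT image in dB, shape (n_fft // 2 + 1, n_frames)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(y) - n_fft) // hop
    frames = np.stack([y[t * hop : t * hop + n_fft] * window for t in range(n_frames)])
    return 10.0 * np.log10(np.abs(np.fft.rfft(frames, axis=1)).T ** 2 + 1e-10)


def augment_waveform(y, rng, noise_std=0.005, max_shift=0.2, gain_db=(-6.0, 6.0)):
    """Hypothetical waveform augmentations: additive noise, circular shift, random gain."""
    noisy = y + rng.normal(0.0, noise_std, size=y.shape)
    shifted = np.roll(y, int(rng.uniform(-max_shift, max_shift) * len(y)))
    scaled = y * 10.0 ** (rng.uniform(*gain_db) / 20.0)
    return [noisy, shifted, scaled]


def accumulate_dataset(clips, labels, feature_fns, rng):
    """Turn every original and augmented clip into every feature image, keeping labels."""
    images, image_labels = [], []
    for y, label in zip(clips, labels):
        for version in [y] + augment_waveform(y, rng):
            for feature_fn in feature_fns:
                images.append(feature_fn(version))
                image_labels.append(label)
    return images, image_labels


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    t = np.arange(4000) / 8000.0
    clips = [np.sin(2 * np.pi * 440.0 * t), np.sin(2 * np.pi * 1000.0 * t)]
    features = [log_power_spectrogram, partial(log_power_spectrogram, n_fft=512)]
    images, image_labels = accumulate_dataset(clips, ["siren", "drill"], features, rng)
    print(len(images), image_labels.count("siren"))
```

With two clips, four versions per clip (the original plus three augmentations), and two feature extractors, the training set grows from 2 clips to 16 labeled images; Mel-based extractors such as LM, L2M, or L3M could be passed in the same way.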

Contents
Abstract
Acknowledgment
List of Abbreviations
Contents
List of Figures
List of Tables
Chapter 1. Introduction
  1.1. Motivation
  1.2. Contributions
  1.3. Dissertation Arrangement
Chapter 2. Materials and Datasets
  2.1. Datasets Description
  2.2. Description of Real Audio Data from YouTube
  2.3. Hardware Specifications of the System
  2.4. Software Specifications of the System
Chapter 3. Classification of Environmental Sounds by Using a Regularized DCNN with Data Augmentation
  3.1. Methodology
  3.2. Results and Discussions
  3.3. Comparison with Related Studies
  3.4. Concluding Remarks
Chapter 4. Spectral Images based Environmental Sounds Classification using CNN with Meaningful Data Augmentation
  4.1. Methodology
  4.2. Results and Discussions
  4.3. Analysis and Comparison with Related Literature
  4.4. Concluding Remarks
Chapter 5. Efficient Classification of Environmental Sounds Through Multiple Features Aggregation and Novel Data Enhancement Techniques for Spectrogram Images
  5.1. Methodology
  5.2. Experimental Results
  5.3. Comparison and Analysis of Results with Previous and Baseline Models
  5.4. Concluding Remarks
Chapter 6. Conclusions and Future Work
References

