
Author: Zohaib Mushtaq
Title (Chinese): 以光譜圖及不同的資料擴增技術用於深層卷積神經網路的環境聲音分類辨識
Title (English): DCNN based Environmental Sounds Classification by Using Spectrogram Images and Various Data Augmentation Techniques
Advisor: Shun-Feng Su (蘇順豐)
Committee members: Tsu-Tian Lee (李祖添), Wen-June Wang (王文俊), Wei-Yen Wang (王偉彥), Leehter Yao (姚立德), Yo-Ping Huang (黃有評), Ching-Chih Tsai (蔡清池), Jyh-Horng Chou (周至宏), Sheng-Luen Chung (鍾聖倫)
Degree: Doctoral
Department: Department of Electrical Engineering, College of Electrical Engineering and Computer Science
Year of publication: 2020
Graduation academic year: 109 (ROC calendar, 2020-2021)
Language: English
Number of pages: 128
Keywords: Environmental Sound Classification, Augmentation, Aggregation, Deep Convolutional Neural Network, Spectrogram

Abstract:
    In recent years, Sound Event Recognition (SER) has become very popular due to the complex nature of environmental sounds. The autonomous classification of dynamic and intricate urban sounds is one of the major aspects of applications such as noise source recognition, surveillance, and various sound identification systems. Therefore, this dissertation aims to propose a cutting-edge methodology that classifies environmental sounds with a high accuracy rate. The study uses three open-source environmental sound datasets, namely ESC-10, ESC-50, and UrbanSound8K (Us8k). The first method proposes two L2-regularized Deep Convolutional Neural Network (DCNN) models built from scratch, named Model-1 (with max-pooling) and Model-2 (without max-pooling). Three well-known auditory features, the Mel spectrogram (Mel), the Log-Mel spectrogram (LM), and Mel-Frequency Cepstral Coefficients (MFCC), are used in this experiment, and audio augmentation is also performed; Model-2 with the LM feature attains the best results. The second methodology uses spectrogram images instead of audio clips, fed directly to a DCNN. Eleven pretrained networks with weights trained on ImageNet are deployed on the spectrogram images through transfer learning, and layers are fine-tuned with optimal learning rates chosen by discriminative learning. This methodology is also applied to our proposed meaningful data augmentation approach, and it obtains remarkable results with the ResNet-152 and DenseNet-161 models on the recommended data enhancement techniques. The third and final strategy selects the best-performing transfer learning model from the previous experiment and implements various acoustic feature aggregation approaches. In this part of the strategy, we also generate two novel Mel-filter-based auditory features, named Log(Log-Mel) (L2M) and Log(Log(Log-Mel)) (L3M), and propose two new data enhancement techniques, which mix the reinforcement and accumulation of the audio-augmented general features and the ordinary and novel Mel-filter-based acoustic features in the form of spectrograms. The proposed data augmentation approaches are also tested on real audio recordings obtained from YouTube that resemble the original datasets used. The proposed methodologies are robust and efficient, and they achieve state-of-the-art results on both the validation and the real audio data.
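    The auditory features named above can be sketched briefly in Python with the librosa library; the snippet below is a minimal illustration, not the dissertation's released code. The clip name, sampling rate, FFT settings, and augmentation factors are illustrative assumptions, and because this record does not give the exact formulation used when re-applying the logarithm for L2M and L3M, a small epsilon and an absolute value are assumed to keep the logarithm defined.

        # Minimal feature-extraction sketch with librosa (assumptions noted above).
        import numpy as np
        import librosa

        y, sr = librosa.load("clip.wav", sr=22050)        # any ESC-10/ESC-50/Us8k clip

        # Mel spectrogram (Mel)
        mel = librosa.feature.melspectrogram(y=y, sr=sr, n_fft=2048,
                                             hop_length=512, n_mels=128)

        # Log-Mel spectrogram (LM): log compression of the Mel energies
        lm = librosa.power_to_db(mel, ref=np.max)

        # Mel-Frequency Cepstral Coefficients (MFCC)
        mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=40)

        # Novel Chapter-5 features: repeated log compression of the Mel filter
        # output; the epsilon and absolute value are assumptions made here only
        # to keep the logarithm well defined.
        eps = 1e-6
        l2m = np.log(np.abs(lm) + eps)      # Log(Log-Mel), "L2M"
        l3m = np.log(np.abs(l2m) + eps)     # Log(Log(Log-Mel)), "L3M"

        # Two common audio augmentations (the dissertation's own augmentation
        # schemes are described in Chapters 3-5 and are not reproduced here).
        y_stretched = librosa.effects.time_stretch(y, rate=1.1)
        y_shifted = librosa.effects.pitch_shift(y, sr=sr, n_steps=2)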

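    The transfer-learning step of the second methodology can likewise be illustrated with a short PyTorch/torchvision sketch: an ImageNet-pretrained ResNet-152 is fine-tuned on folders of spectrogram images, with a smaller learning rate for the pretrained backbone than for the newly initialized classification head (a simple form of discriminative learning rates). The dissertation cites the fastai library for this step; the sketch below uses plain PyTorch instead, and the directory layout, batch size, and learning rates are assumptions for illustration only.

        # Hedged transfer-learning sketch; expects spectrogram images stored as
        # spectrograms/train/<class_name>/*.png (an assumed layout).
        import torch
        import torch.nn as nn
        from torchvision import datasets, models, transforms

        tfms = transforms.Compose([
            transforms.Resize((224, 224)),   # ImageNet-pretrained CNNs expect 224x224 RGB input
            transforms.ToTensor(),
        ])
        train_ds = datasets.ImageFolder("spectrograms/train", transform=tfms)
        train_dl = torch.utils.data.DataLoader(train_ds, batch_size=32, shuffle=True)

        model = models.resnet152(pretrained=True)                           # ImageNet weights
        model.fc = nn.Linear(model.fc.in_features, len(train_ds.classes))   # new head

        # Discriminative learning rates: update the pretrained backbone gently
        # while the freshly initialized classification head learns faster.
        backbone = [p for n, p in model.named_parameters() if not n.startswith("fc")]
        optimizer = torch.optim.Adam([
            {"params": backbone, "lr": 1e-5},
            {"params": model.fc.parameters(), "lr": 1e-3},
        ])
        criterion = nn.CrossEntropyLoss()

        model.train()
        for images, labels in train_dl:      # a single pass shown for brevity
            optimizer.zero_grad()
            loss = criterion(model(images), labels)
            loss.backward()
            optimizer.step()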
    Contents
        Abstract
        Acknowledgment
        List of Abbreviations
        Contents
        List of Figures
        List of Tables
        Chapter 1. Introduction
            1.1. Motivation
            1.2. Contributions
            1.3. Dissertation Arrangement
        Chapter 2. Materials and Datasets
            2.1. Datasets Description
            2.2. Description of Real Audio Data from YouTube
            2.3. Hardware Specifications of the System
            2.4. Software Specifications of the System
        Chapter 3. Classification of Environmental Sounds by Using a Regularized DCNN with Data Augmentation
            3.1. Methodology
            3.2. Results and Discussions
            3.3. Comparison with Related Studies
            3.4. Concluding Remarks
        Chapter 4. Spectral Images based Environmental Sounds Classification using CNN with Meaningful Data Augmentation
            4.1. Methodology
            4.2. Results and Discussions
            4.3. Analysis and Comparison with Related Literature
            4.4. Concluding Remarks
        Chapter 5. Efficient Classification of Environmental Sounds Through Multiple Features Aggregation and Novel Data Enhancement Techniques for Spectrogram Images
            5.1. Methodology
            5.2. Experimental Results
            5.3. Comparison and Analysis of Results with Previous and Baseline Models
            5.4. Concluding Remarks
        Chapter 6. Conclusions and Future Work
        References


    Full-text availability: campus network access from 2025/12/10; not authorized for release on the off-campus network or through the National Central Library (Taiwan NDLTD system).