Graduate Student: Said Karam
Thesis Title: Episodic Memory Based Continual Learning without Catastrophic Forgetting for Environmental Sound Classification
Advisor: Shanq-Jang Ruan
Committee Members: Chang Hong Lin, Wei-Mei Chen, Yuan-Hsiang Lin
Degree: Master
Department: Department of Electronic and Computer Engineering, College of Electrical Engineering and Computer Science
Publication Year: 2021
Graduation Academic Year: 109
Language: English
Pages: 65
Keywords: Catastrophic Forgetting, Continual Learning, Episodic Memory, Sound Classification
Abstract:
Convolutional neural networks suffer from catastrophic forgetting during continual learning: solving new problems without forgetting previously learned information remains one of the major obstacles in artificial intelligence. In this thesis, we propose an episodic memory technique for learning sound data incrementally. The proposed method observes tasks sequentially and solves each new task without forgetting the previous ones. The results show that the proposed method transfers knowledge both backward and forward efficiently, and the performance evaluation demonstrates that it outperforms the other benchmarks. On the ESC-50 and UrbanSound8K datasets, the proposed method achieves 96.5% and 93.2% accuracy, respectively.
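The core idea of episodic-memory replay described above — keeping a small buffer of exemplars from earlier tasks and mixing them into each new task's training batches — can be sketched as follows. This is a minimal illustrative sketch, not the thesis's actual implementation: the class name, buffer sizes, and sampling strategy are assumptions chosen for clarity.

```python
import random

class EpisodicMemory:
    """Minimal episodic-memory buffer for replay-based continual learning.

    A fixed number of exemplars is stored per task; during training on a
    new task, each batch is augmented with replayed exemplars from earlier
    tasks so the model keeps rehearsing old classes and forgets less.
    """

    def __init__(self, exemplars_per_task=20, seed=0):
        self.exemplars_per_task = exemplars_per_task
        self.memory = {}  # task_id -> list of (features, label) exemplars
        self.rng = random.Random(seed)

    def store(self, task_id, samples):
        """Keep a random subset of this task's samples as exemplars."""
        k = min(self.exemplars_per_task, len(samples))
        self.memory[task_id] = self.rng.sample(samples, k)

    def replay_batch(self, current_batch, replay_size=8):
        """Return the current batch augmented with replayed old exemplars."""
        old = [s for exemplars in self.memory.values() for s in exemplars]
        if not old:  # first task: nothing to replay yet
            return list(current_batch)
        replay = self.rng.sample(old, min(replay_size, len(old)))
        return list(current_batch) + replay
```

In a full training loop, the mixed batch returned by `replay_batch` would be fed to the classifier's loss in place of the plain current-task batch, and `store` would be called once per task after training on it finishes.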