
Graduate Student: Said Karam
Thesis Title: Episodic Memory Based Continual Learning without Catastrophic Forgetting for Environmental Sound Classification (基於事件記憶並避免災難性遺忘的持續學習方法應用於環境聲音分類)
Advisor: Shanq-Jang Ruan (阮聖彰)
Committee Members: Chang Hong Lin (林昌鴻), Wei-Mei Chen (陳維美), Yuan-Hsiang Lin (林淵翔)
Degree: Master
Department: College of Electrical Engineering and Computer Science - Department of Electronic and Computer Engineering
Publication Year: 2021
Graduation Academic Year: 109
Language: English
Number of Pages: 65
Keywords: Catastrophic Forgetting, Continual Learning, Episodic Memory, Sound Classification
Abstract:
    Convolutional neural networks suffer from catastrophic forgetting during continual learning. This is one of the major obstacles for artificial intelligence: solving new problems without forgetting previously learned information. In this article, we propose an episodic memory technique for learning sound data incrementally. The proposed method observes tasks sequentially and solves each new task without forgetting the previous ones. The results show that the proposed method transfers knowledge backward and forward efficiently, and the performance evaluation demonstrates that it achieves better performance than the other benchmarks. On the ESC-50 and UrbanSound8K datasets, the proposed method obtains 96.5% and 93.2% accuracy, respectively.
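
    The replay idea summarized in the abstract can be illustrated with a short sketch. The following Python/PyTorch fragment is a minimal, illustrative version of episodic-memory replay for incremental sound classification, not the thesis's exact algorithm: the buffer capacity, reservoir sampling policy, equal loss weighting, and names such as EpisodicMemory and train_step are assumptions made only to show the general technique.

    # Minimal sketch of episodic-memory replay for incremental sound
    # classification. NOT the thesis's exact method; buffer size, sampling
    # policy, and loss weighting are illustrative assumptions.
    import random
    import torch
    import torch.nn.functional as F

    class EpisodicMemory:
        """Reservoir-style buffer of (spectrogram, label) pairs from past tasks."""

        def __init__(self, capacity=500):
            self.capacity = capacity
            self.data = []    # list of (tensor, int) pairs
            self.seen = 0     # number of examples observed so far

        def add(self, x, y):
            # Reservoir sampling keeps a uniform sample over everything seen.
            self.seen += 1
            if len(self.data) < self.capacity:
                self.data.append((x.cpu(), int(y)))
            else:
                j = random.randrange(self.seen)
                if j < self.capacity:
                    self.data[j] = (x.cpu(), int(y))

        def sample(self, batch_size):
            batch = random.sample(self.data, min(batch_size, len(self.data)))
            xs = torch.stack([x for x, _ in batch])
            ys = torch.tensor([y for _, y in batch])
            return xs, ys

    def train_step(model, optimizer, memory, x_new, y_new, replay_size=32):
        """One update on the current task, regularized by replaying old examples."""
        optimizer.zero_grad()
        loss = F.cross_entropy(model(x_new), y_new)
        if len(memory.data) > 0:
            x_old, y_old = memory.sample(replay_size)
            loss = loss + F.cross_entropy(model(x_old), y_old)
        loss.backward()
        optimizer.step()
        # Store current examples so later tasks can replay them.
        for x, y in zip(x_new, y_new):
            memory.add(x, y)
        return loss.item()

    In this sketch, each task's examples flow through the same memory object, so training on a new task always mixes in samples of earlier classes; this is the standard mechanism by which replay-based methods mitigate catastrophic forgetting.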

    Table of Contents:
    Abstract in English
    Acknowledgements
    Contents
    List of Figures
    List of Tables
    1 Introduction
    2 Related Prior Works
    3 Proposed Methodology
    3.1 Datasets
    3.1.1 ESC-50
    3.1.2 UrbanSound8K
    3.2 Data Pre-processing
    3.2.1 Spectrogram
    3.3 Model
    4 Experimental Results
    4.1 Experimental Setup
    4.1.1 Training setup
    4.1.2 Comparison with other methods
    4.1.3 Performance of the proposed method on different episodic memories
    4.1.4 Complete Experiments
    5 Conclusions
    5.1 Future Work
    References


    Full text available from 2025/09/29 (campus network)
    Full text available from 2026/09/29 (off-campus network)
    Full text available from 2026/09/29 (National Central Library: Taiwan thesis and dissertation system)