
Author: 黃炬智 (Chu-Chih Huang)
Thesis title: 適用於分類中文構音障礙之深度學習模型 (Classification of Chinese Articulation Disorder based on Deep Learning Model)
Advisor: 阮聖彰 (Shanq-Jang Ruan)
Committee members: 郭柏齡 (Po-Ling Kuo), 湯梓辰 (Tzu-Chen Tang)
Degree: Master
Department: 電資學院 - 電子工程系 (Department of Electronic and Computer Engineering)
Year of publication: 2019
Academic year of graduation: 107
Language: Chinese
Number of pages: 80
Keywords (Chinese): 構音障礙, 深度學習, 卷積神經網路, LeNet-5
Keywords (English): Articulation disorder, Deep Learning, Convolutional neural network, LeNet-5
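  • Articulation disorder refers to errors or difficulties that occur during pronunciation, leading to incorrect articulation and, in turn, unclear speech. It has long been a common language problem in children. At present, the medical community in Taiwan has no unified classification of articulation disorders, so hospital treatment generally requires a speech therapist for both assessment and therapy. The therapist designs a series of words that target the sounds lacking in each category of articulation disorder and has the child repeat them. After the child has pronounced the series of words, the therapist makes a judgment based on the child's pronunciation, and the child returns for follow-up visits over several months to improve. Because children with articulation disorders can receive treatment and feedback only in the hospital, the treatment cycle is prolonged. The purpose of this thesis is to diagnose articulation disorders automatically by incorporating recent AI convolutional neural networks (CNNs). The results show that LeNet-5, which has the smallest model size, achieves 94.56% Top-1 accuracy and an average F1-score of 0.995, making it better suited to classifying articulation disorders on mobile devices.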


    Articulation disorder refers to difficulties during pronunciation that lead to incorrect articulation and unclear sentences. It has long been a common language problem in children. Currently, there is no unified classification of articulation disorders in Taiwan's medical field, so a speech therapist is required for assessment and treatment in hospitals. After the child pronounces a series of words, the speech therapist makes an assessment based on the child's pronunciation. The child then returns to the hospital over several months to improve. However, because children with articulation disorders can receive treatment and feedback only in hospitals, the treatment cycle is prolonged. The purpose of this work is to automate the diagnosis of articulation disorders by incorporating recent AI convolutional neural networks (CNNs). Results show that LeNet-5, which achieved 94.56% Top-1 accuracy and an average F1-score of 0.995 with the smallest model size, is better suited to classifying articulation disorders on mobile devices.

    Table of Contents
    Recommendation Form
    Committee Form
    Chinese Abstract
    English Abstract
    Acknowledgements
    Table of Contents
    List of Tables
    List of Figures
    Introduction
    Related Works
    Proposed Method
    Experimental Results
    Conclusions


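    Full-text release date: 2024/08/22 (campus network)
    Full text not authorized for public release (off-campus network)
    Full text not authorized for public release (National Central Library: National Digital Library of Theses and Dissertations in Taiwan)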