
Graduate student: 陳麒文 (Chi-Wen Chen)
Thesis title: 使用條件對抗神經網路設定與實現智慧型助聽器
(The design and implementation of smart hearing aid using conditional adversarial neural networks)
Advisor: 陳郁堂 (Yie-Tarng Chen)
Oral defense committee: 方文賢 (Wen-Hsien Fang), 呂政修 (Jenq-Shiou Leu), 林銘波 (Ming-Bo Lin), 陳郁堂 (Yie-Tarng Chen), 陳省隆 (Hsing-Lung Chen)
Degree: Master
Department: College of Electrical Engineering and Computer Science, Department of Electronic and Computer Engineering
Year of publication: 2020
Academic year of graduation: 108 (ROC calendar, 2019-2020)
Language: English
Number of pages: 135
Chinese keywords: 助聽器 (hearing aid), 語音增強 (speech enhancement), 條件對抗神經網路 (conditional adversarial neural network)
Foreign keywords: Hearing aid, Speech enhancement, Conditional Generative Adversarial Network

Conventional speech enhancement techniques, which typically operate on spectral analysis or higher-level features, can mitigate noise to a considerable degree. Because deep networks excel at learning complex functions, neural network models are increasingly applied in this field. In this thesis, we propose a hearing aid application that performs speech enhancement with a conditional adversarial neural network and further refines the output with equalizer gain compensation. Using a conditional adversarial neural network as the de-noising method allows the model to be trained and personalized for each user, and the conventional TensorFlow model is converted to a TensorFlow Lite model so that it can run inside a mobile application with reduced hardware requirements. A standard hearing loss test procedure measures the user's actual hearing loss, and the recorded results are used to configure the equalizer, so that the output better reflects the user's real listening condition. To address the latency problem, the microphone sampling rate is set to 16394 Hz and the input/output frame length to 4096 samples; in addition, aptX Low Latency is chosen as the Bluetooth audio codec. Together, these two settings keep the overall processing delay within 0.3 seconds, effectively reducing latency while balancing perceived audio quality and model processing performance. The application is designed to reduce environmental noise with the conditional adversarial neural network and to apply gain compensation through an equalizer configured from the hearing loss test results. Moreover, the hearing aid application runs in real time, and users can select models trained for different environments to adapt to various surroundings. The proposed application therefore effectively compensates for hearing loss and suppresses environmental noise, achieving good speech enhancement.
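As a rough check on the latency figures quoted above, the sketch below computes the buffering delay implied by the chosen sampling rate and frame length, plus an assumed Bluetooth codec delay. The 40 ms figure for aptX Low Latency is the codec vendor's advertised value, not a measurement from this thesis, and model inference and jitter-buffer time are not included.

```python
# Back-of-the-envelope latency estimate for the settings quoted in the abstract.
# Assumptions: one full frame must be captured before processing, and the
# aptX Low Latency codec adds roughly 40 ms (vendor-advertised figure).

SAMPLE_RATE_HZ = 16394      # microphone sampling rate from the abstract
FRAME_SAMPLES = 4096        # input/output frame length
BLUETOOTH_DELAY_S = 0.040   # assumed aptX Low Latency codec delay

frame_delay_s = FRAME_SAMPLES / SAMPLE_RATE_HZ   # time to fill one frame
total_delay_s = frame_delay_s + BLUETOOTH_DELAY_S

print(f"frame buffering delay : {frame_delay_s * 1000:.0f} ms")  # ~250 ms
print(f"frame + Bluetooth     : {total_delay_s * 1000:.0f} ms")  # ~290 ms
```

Under these assumptions the buffering and codec delay alone come to roughly 0.29 seconds, consistent with the 0.3 second overall delay reported above once model inference time is added.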


As of 2018, 466 million people worldwide have a hearing impairment, including 34 million children. Conventional hearing aids, however, do not work well in noisy environments, because they can neither block out background noise nor separate speech from noise. To address these issues, in this work we design and implement a novel smart hearing aid application on a mobile phone that leverages deep learning to provide customized speech enhancement, adaptively de-noise in different environments, and operate with low latency. Specifically, conditional adversarial networks are employed for speech enhancement; they can be trained on datasets recorded from different speakers in different environments to provide customized speech enhancement and adaptive de-noising. To run the deep speech enhancement model on mobile devices, TensorFlow Lite is used to build our system. Furthermore, we integrate a hearing loss test and an equalizer into our app. To build a low-delay audio app, we carefully select the frame size and leverage the Bluetooth aptX Low Latency codec. Experiments with both objective and subjective evaluations show that our smart hearing aid app yields competitive speech enhancement performance on three well-known datasets. Additionally, the developed hearing aid app achieves a latency as low as 0.3 seconds.
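The abstract states that the trained TensorFlow speech enhancement model is converted to TensorFlow Lite so that it can run on the phone. A minimal conversion sketch is shown below; the model directory and output file name are placeholders, and whether the thesis applied quantization or other converter options is not stated in the abstract.

```python
import tensorflow as tf

# Minimal sketch: convert a trained speech-enhancement generator saved in
# TensorFlow SavedModel format into a TensorFlow Lite model for on-device
# inference. "generator_saved_model" is a placeholder path.
saved_model_dir = "generator_saved_model"

converter = tf.lite.TFLiteConverter.from_saved_model(saved_model_dir)
# Optional: default optimizations shrink the model for mobile deployment;
# the thesis's actual converter settings are not specified here.
converter.optimizations = [tf.lite.Optimize.DEFAULT]

tflite_model = converter.convert()

with open("generator.tflite", "wb") as f:
    f.write(tflite_model)
```

On the Android side, the resulting .tflite file would typically be loaded with the TensorFlow Lite Interpreter and fed one audio frame at a time.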

Recommendation Letter
Approval Letter
Abstract in Chinese
Abstract in English
Acknowledgements
Contents
List of Figures
List of Tables
1 Introduction
2 Related Work
2.1 Deep Learning for Speech Enhancement
2.2 Hearing Threshold
2.3 Hearing Loss Test
3 Proposed Architecture
3.1 Speech Enhancement by Deep Learning
3.2 Tensorflow-Lite Speech Enhancement Implementation
3.3 Hearing Aided APP System Architecture
3.3.1 Customized Equalizer from Output of Hearing Test
3.3.2 Hearing Test Modules
3.3.3 DNN-based Speech Enhancement
3.3.4 Jitter Buffer
3.3.5 Low Latency Design in Bluetooth
4 Experimental Result
4.1 Performance Metrics
4.2 Training Database for Speech Enhancement
4.3 The WGAN-gp Training Setup and Result
4.4 The Correction of Hearing Loss
4.5 The Comparison of Performance by Different Corpus
4.6 Actual Improvement of Sound Latency
4.6.1 Audio Latency by Sample Length
4.6.2 Audio Latency by Bluetooth
4.7 Result
5 Conclusions and Future Work
5.1 Conclusions
5.2 Future Work
References
Appendix
6.1 Different Parameters for Training
6.1.1 Gradient Penalty Weight
6.1.2 Batch Size
6.2 Different Surroundings for Training
6.3 The Different Corpus in 4096 Sample Length
6.4 User Interface
6.5 Application Unified Modeling Language
6.6 Research Environment
Letter of Authority


Full-text release date: 2030/08/24 (off-campus network)
Full-text release date: 2030/08/24 (National Central Library: Taiwan NDLTD system)