
Graduate Student: Wei-Hsiang Huang (黃威翔)
Thesis Title: Endophasia: Utilizing Acoustic-Based Images for Dropping Contact-Free Silent Speech Commands (Chinese title: 以超聲波下達無聲指令)
Advisors: Chih-Yuan Yao (姚智原), Da-Yuan Huang (黃大源)
Oral Defense Committee: Chih-Yuan Yao (姚智原), Da-Yuan Huang (黃大源), Bing-Yu Chen (陳炳宇), Neng-Hao Yu (余能豪)
Degree: Master
Department: College of Electrical Engineering and Computer Science - Department of Computer Science and Information Engineering
Publication Year: 2019
Graduation Academic Year: 107 (ROC calendar)
Language: English
Number of Pages: 38
Chinese Keywords: 無聲指令, 聲學成像, 移動設備 (silent commands, acoustic-based imaging, mobile devices)
English Keywords: Silent command, acoustic-based imaging, mobile devices
In recent years, silent speech input has received growing attention, because users can give input in an intuitive way without drawing other people's attention and without having to worry about privacy. However, current approaches to this kind of input mainly rely on image-based recognition or on a microphone attached to the user's throat; besides consuming a large amount of power, these approaches also raise privacy concerns. In this paper, we propose a sensing technique that requires only a speaker and a microphone, which not only has low power consumption but also raises fewer privacy concerns.
We selected 10 words for our experiments and used the phone's built-in speaker and microphone to detect the mouth shapes of a user speaking in front of the phone. Using CNN-based deep learning on word data collected from 15 users, we trained a within-user model and a cross-user model, which reached accuracies of 81.41% and 81.80%, respectively, verifying the feasibility of silently issuing commands to a phone with ultrasound.
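This record does not show the thesis's actual signal-processing pipeline, so the following is only a minimal sketch of how a phone's speaker and microphone could yield a two-dimensional "acoustic image" of mouth movement: play a near-ultrasonic tone and keep a narrow spectrogram band around it in the microphone recording. The 48 kHz sampling rate, 20 kHz carrier, and STFT settings are illustrative assumptions, not values taken from the thesis.

```python
import numpy as np
from scipy import signal

# --- Illustrative parameters (assumptions, not taken from the thesis) ---
FS = 48_000          # sampling rate (Hz) supported by most phone audio stacks
CARRIER = 20_000     # near-ultrasonic carrier played by the speaker (Hz)
DURATION = 1.0       # length of one command recording (s)

def make_carrier(fs=FS, f0=CARRIER, duration=DURATION):
    """Continuous tone that the phone speaker would play during sensing."""
    t = np.arange(int(fs * duration)) / fs
    return np.sin(2 * np.pi * f0 * t)

def acoustic_image(recording, fs=FS, f0=CARRIER, band=500):
    """Turn a microphone recording into a 2-D time-frequency 'image'.

    Mouth movements modulate and Doppler-shift the reflected carrier, so a
    narrow spectrogram band around the carrier frequency captures them.
    """
    f, t, Zxx = signal.stft(recording, fs=fs, nperseg=1024, noverlap=768)
    keep = (f >= f0 - band) & (f <= f0 + band)
    return np.abs(Zxx[keep, :])          # shape: (freq_bins, time_frames)

if __name__ == "__main__":
    # Stand-in for a real microphone capture: the carrier plus noise, so the
    # sketch runs on its own without audio hardware.
    fake_recording = make_carrier() + 0.01 * np.random.randn(int(FS * DURATION))
    img = acoustic_image(fake_recording)
    print("acoustic image shape:", img.shape)
```

In the actual system the recording would come from the device microphone while the tone is playing; the synthetic signal here only stands in so the example is self-contained.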


Using silent speech to issue commands has received growing attention, as users can reuse the existing command set of a voice-based interface without drawing other people's attention. Such interaction preserves privacy and social acceptance. However, current solutions for recognizing silent speech mainly rely on vision-based data or on a microphone attached to the throat. These solutions are either power-consuming or raise potential privacy issues. In this paper, we propose a sensing technique that needs only a microphone and a speaker, which not only consumes little power but also raises fewer privacy concerns.
We chose 10 commands for experimentation and used the phone's built-in speaker and microphone to detect the mouth movements of the user in front of the phone. Using a CNN trained on command data collected from 15 users, we built a within-user model and a cross-user model, which reached accuracies of 81.41% and 81.80%, respectively, verifying the feasibility of using ultrasound to silently issue commands to a mobile phone.
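The abstract only states that a CNN was trained on data from 15 users; it does not publish the architecture here. As a hedged sketch of what a classifier over such acoustic images could look like, the PyTorch module below maps one image to 10 command classes. The layer sizes, input shape, and pooling choices are assumptions for illustration, not the thesis's actual model; the within-user versus cross-user distinction would come from how the 15 users' data are split into training and test sets, not from the network itself.

```python
import torch
import torch.nn as nn

class CommandCNN(nn.Module):
    """Small CNN that maps one acoustic image to one of 10 command classes.

    Illustrative sketch only; the thesis's real architecture is not shown on
    this record page.
    """
    def __init__(self, n_commands=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d((4, 4)),   # tolerate variable image sizes
        )
        self.classifier = nn.Linear(32 * 4 * 4, n_commands)

    def forward(self, x):                   # x: (batch, 1, freq_bins, frames)
        return self.classifier(self.features(x).flatten(1))

if __name__ == "__main__":
    model = CommandCNN()
    batch = torch.randn(8, 1, 21, 128)      # 8 fake acoustic images
    logits = model(batch)
    print(logits.shape)                      # torch.Size([8, 10])
```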

1. Introduction
2. Related Works
3. Method
4. Evaluation
5. Discussion
6. Conclusion


Full-text release date: 2024/07/22 (campus network)
Full-text release date: not authorized for public access (off-campus network)
Full-text release date: not authorized for public access (National Central Library: Taiwan master's and doctoral thesis system)