
Graduate student: 陳兆捷
Chao-chieh CHEN
Thesis title: 暫穩態特徵線性比對法之機器人中文語音辨識系統設計與實現
Design and Implementation of Mandarin Robot Speech Recognition System using Linear Transient/Steady-State Feature Coefficient Alignment
Advisor: 施慶隆
Ching-Long Shih
Oral defense committee: 許新添
Hsin-Teng Hsu
古鴻炎
Hung-yan Gu
李文猶
Wen-Yo Lee
Degree: Master
Department: College of Electrical Engineering and Computer Science - Department of Electrical Engineering
Year of publication: 2009
Academic year of graduation: 97
Language: Chinese
Number of pages: 114
Keywords (Chinese): 語音辨識, 暫穩態特徵線性比對法, 間格邊界函數法
Keywords (English): Speech recognition, LTSFCA, Interval Boundary Function model (IBF)

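  This thesis designs and implements a Mandarin speech recognition system for robots. Dynamic Time Warping (DTW), which has been widely used for robot speech recognition, requires too much computation, so the thesis proposes a novel recognition design, Linear Transient/Steady-State Feature Coefficient Alignment (LTSFCA). Based on how Mandarin is pronounced, the method divides each Chinese syllable into a transient part and a steady-state part and recognizes the two parts separately. The principle is simple and the computational load is small. Unlike DTW, the method does not need a fast processor, so it can run on ordinary or older-generation computers, and the computation saved can be spent on concurrent tasks such as visual recognition, wheel control, and arm control. For the linear alignment, an Interval Boundary Function model (IBF) is designed to perform the time correction and to reduce the computation needed during matching. Because the system recognizes isolated characters rather than isolated words, it can recognize multi-character commands (the specification in this thesis covers up to 5 characters). Classifying commands by character count enlarges the set of samples that can be recognized and also substantially raises the recognition rate. In experiments with a database of 35 recognition samples, the speaker-dependent recognition rate was 98.86%. With training on the author's voice alone, the overall speaker-independent recognition rate (tested on 2 men and 2 women) reached 96.43%. To support more varied voice commands for practical robot control, a "quantifier rule" for command sentences enables advanced commands such as "前進一四九" ("move forward 149"), and a "conjunction rule" enables advanced commands such as "前進後右轉" ("move forward, then turn right") and "啟動並加速" ("start and accelerate"). The recognition rate for these advanced functions is only 90%. A trainer is also designed that accepts multiple repetitions of the same utterance in one training session and converts them directly into Mel-frequency cepstral coefficients (MFCC), which are stored or used to update existing data, so that the user completes the whole training procedure in a single input session.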
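The quantifier and conjunction rules mentioned in the abstract can be pictured with a small parser over a string of already recognized characters. This is only a minimal sketch: the abstract does not give the thesis's actual grammar, so the vocabulary (`ACTIONS`, `CONJUNCTIONS`), the function name `parse_command`, and the digit-by-digit reading of numbers are all assumptions chosen to match the three example commands.

```python
# Hypothetical vocabulary, chosen only to cover the examples in the abstract.
DIGITS = {"零": 0, "一": 1, "二": 2, "三": 3, "四": 4,
          "五": 5, "六": 6, "七": 7, "八": 8, "九": 9}
ACTIONS = ["前進", "右轉", "啟動", "加速"]
CONJUNCTIONS = ["後", "並"]


def parse_command(text):
    """Parse recognized characters into a list of (action, quantity) steps.

    quantity is an int read digit by digit, or None when no digits follow
    the action. Returns None when the text does not fit this toy grammar.
    """
    steps = []
    pos = 0
    while pos < len(text):
        # Each step must start with a known action word.
        action = next((a for a in ACTIONS if text.startswith(a, pos)), None)
        if action is None:
            return None
        pos += len(action)
        # Quantifier rule (assumed): optional digits spoken one at a time.
        digits = ""
        while pos < len(text) and text[pos] in DIGITS:
            digits += str(DIGITS[text[pos]])
            pos += 1
        steps.append((action, int(digits) if digits else None))
        # Conjunction rule (assumed): a connective must be followed by a step.
        if pos < len(text):
            if text[pos] not in CONJUNCTIONS:
                return None
            pos += 1
            if pos == len(text):
                return None
    return steps


print(parse_command("前進一四九"))
print(parse_command("前進後右轉"))
```

In this sketch a command that cannot be parsed is rejected as a whole rather than partially executed, which is one possible design choice for a robot controller and not necessarily the one made in the thesis.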


This thesis designs and implements a Mandarin speech recognition system for robots. Because Dynamic Time Warping (DTW), the method conventionally used in robot speech recognition, requires extensive computation, we propose and test a novel speech recognition design, Linear Transient/Steady-State Feature Coefficient Alignment (LTSFCA). Building on the way Mandarin syllables are pronounced, the method splits speech into transient and steady-state segments and recognizes the segments separately. In contrast to DTW, the method rests on simple principles and needs far fewer processing resources, so it can run on simpler or older devices and computers. The resources saved can be used for tasks that run concurrently, such as machine vision, motion planning, and manipulator control. We constructed the Interval Boundary Function model (IBF) to reduce the computation required by the aligning step of the linear alignment. Multiple-word recognition was achieved. Categorizing commands by word count enlarges the set of commands that can be recognized and greatly raises the recognition rate. In our experiments, the speaker-dependent recognition rate was 98.86% on a database of 35 samples, and the recognition rate in speaker-independent tests (2 males and 2 females) was 96.43% using training data from the author alone. We designed a "quantization rule" for commands such as "move forward 149", and a "conjunction rule" for advanced commands such as "move forward then turn to the right" or "start up and accelerate". However, the recognition rate for these commands was only about 90%. The trainer we designed accepts multiple inputs of the same utterance in a single training session and transforms them into Mel-frequency cepstral coefficients (MFCC) for storage or modification.
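The computational argument can be illustrated by comparing textbook DTW with a plain proportional frame mapping. The sketch below is not the thesis's LTSFCA or IBF, whose details the abstract does not give. It only assumes that each utterance is a list of feature vectors (for example MFCC frames), and the function names are hypothetical. DTW fills a table of len(a) × len(b) local distances, while the linear mapping evaluates only len(a) of them.

```python
import math


def dtw_distance(a, b):
    """Textbook DTW: accumulated cost over a len(a) x len(b) table."""
    n, m = len(a), len(b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(a[i - 1], b[j - 1])  # local frame distance
            cost[i][j] = d + min(cost[i - 1][j],      # step in a only
                                 cost[i][j - 1],      # step in b only
                                 cost[i - 1][j - 1])  # step in both
    return cost[n][m]


def linear_distance(a, b):
    """Proportional mapping (assumed stand-in for a linear alignment):
    frame i of a is compared with the frame at the same relative
    position in b, so only len(a) local distances are computed."""
    n, m = len(a), len(b)
    total = 0.0
    for i in range(n):
        j = (i * (m - 1)) // (n - 1) if n > 1 else 0
        total += math.dist(a[i], b[j])
    return total


# A 5-frame template and the same utterance spoken twice as slowly.
template = [[float(i), float(i)] for i in range(5)]
slow = [frame for frame in template for _ in range(2)]
print(dtw_distance(template, slow), linear_distance(template, slow))
```

A uniform change in speaking rate, as in this example, is the case a linear mapping handles well. Non-uniform stretching within an utterance is where DTW's extra cost pays off, which is presumably why the thesis treats transient and steady-state segments separately.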

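Abstract (Chinese)
Abstract (English)
Acknowledgments
Table of Contents
List of Figures and Tables
Chapter 1 Introduction
  1.1 Preface
  1.2 Research Background
  1.3 Recognition Goals
  1.4 Thesis Organization
Chapter 2 Recognition Preprocessing
  2.1 Characteristics of Chinese
  2.2 Spectrogram
  2.3 Obtaining Feature Parameters
    2.3.1 Sampling the Analog Audio Signal
    2.3.2 Endpoint Detection
    2.3.3 Framing
    2.3.4 Pre-emphasis
    2.3.5 Window Function
    2.3.6 Frequency-Domain Transformation
    2.3.7 Simplified Acoustic Model of Speech
    2.3.8 Cepstral Analysis
    2.3.9 Mel Scale
    2.3.10 Computing Mel-Frequency Cepstral Coefficients
  2.4 Dynamic Feature Parameters
  2.5 Normalization
Chapter 3 Speech Recognition Methods
  3.1 Dynamic Time Warping
    3.1.1 The Problem of Commands with Different Time Proportions
    3.1.2 Principle of Dynamic Time Warping
    3.1.3 Isolated-Word Recognition with Dynamic Time Warping
    3.1.4 Simplifying the Computation
  3.2 Linear Transient/Steady-State Feature Coefficient Alignment
    3.2.1 Introduction
    3.2.2 A Waveform View of Transient and Steady States
    3.2.3 Characteristics of Chinese Pronunciation
    3.2.4 Linear Transient/Steady-State Feature Coefficient Alignment
  3.3 Linear Alignment
    3.3.1 Least Common Multiple Method
    3.3.2 Interval Boundary Function Method
    3.3.3 Matching
  3.4 Alternatives to Steady-State Matching
    3.4.1 Further Savings in Computation
    3.4.2 Missing Steady-State Features
Chapter 4 Enhanced Functions
  4.1 Multi-Character Recognition
    4.1.1 Voice Command Table
    4.1.2 Character Combinations
  4.2 Building a Grammar for Simple Voice Commands
    4.2.1 Grammar Recognition for Quantified Actions
    4.2.2 Grammar Recognition for Compound Action Commands
Chapter 5 Speech Training
  5.1 Motivation for Training
  5.2 Correcting Relative Frame Positions
  5.3 Training Principle
  5.4 Training with Many Inputs in a Single Session
Chapter 6 Experimental Results
  6.1 Specifications of the Speech Recognition System
  6.2 Speaker-Dependent Experiment
  6.3 Speaker-Independent Experiment
    6.3.1 Recognition Results for 曉惠 (female)
    6.3.2 Recognition Results for 為中 (female)
    6.3.3 Recognition Results for 奕男 (male)
    6.3.4 Recognition Results for 仲欽 (male)
  6.4 Simple Command Grammar Recognition Experiment
  6.5 Fault-Tolerance Experiment
  6.6 Noise-Robustness Experiment
  6.7 Discussion
    6.7.1 Speaker-Dependent Experiment
    6.7.2 Speaker-Independent Experiment
    6.7.3 Simple Command Grammar Recognition Experiment
    6.7.4 Fault-Tolerance Experiment
    6.7.5 Noise-Robustness Experiment
Chapter 7 Conclusions and Future Work
  7.1 Conclusions
  7.2 Future Work
    7.2.1 More Complete Training
    7.2.2 Connected-Speech Recognition
    7.2.3 Fast Fourier Transform
    7.2.4 Converting Floating-Point to Fixed-Point Arithmetic
    7.2.5 Weight Adjustment for Two-Character Commands
    7.2.6 Speech Synthesis
References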

