
Graduate Student: Ssu-Yao Huang (黃泗堯)
Thesis Title: Application of End-to-End Speech Recognition and Object Detection Based on Deep Learning in a Dual-Arm Robot (基於深度學習之端到端語音辨識與物件偵測於雙臂機器人之整合應用)
Advisor: Shiuh-Jer Huang (黃緒哲)
Committee Members: Chung-Hsien Kuo (郭重顯), Liang-Kuang Chen (陳亮光)
Degree: Master
Department: College of Engineering - Department of Mechanical Engineering
Publication Year: 2019
Academic Year of Graduation: 107
Language: Chinese
Pages: 92
Chinese Keywords: dual-arm robot, deep learning, YOLOv3, human-machine cooperation, fuzzy sliding mode control
English Keywords: dual-arm robot, deep learning, YOLOv3, human-machine cooperation, FSMC

This study integrates a dual-arm robot's Field Programmable Gate Array (FPGA) motion control system with the NVIDIA® Jetson™ TX2 embedded artificial intelligence computing platform to perform short-sentence speech recognition and stereo-vision object recognition, enabling the robot to grasp objects on command. A microphone array (ReSpeaker Mic Array v2.0) mounted on the robot's chest acquires the audio signal and parses it into external commands for the robot, while a binocular stereo camera (ZED Stereo Camera) on the robot's head captures image information so the robot can locate target objects in the environment, completing an integrated dual-arm interaction application.
The speech system runs on the NVIDIA® Jetson™ TX2 embedded platform with libraries such as Kaldi and TensorFlow for data preprocessing and model training. The audio signal acquired through the microphone array is fed into self-trained speech models (RNN, LSTM); a feed-forward neural network processes the model output to obtain a probability distribution over characters, the CTC loss function is computed, the phonemes are matched against the text corpus, and the short-sentence recognition result is output.
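The last step of the pipeline above turns per-frame character probabilities into text. A minimal sketch of greedy CTC decoding (collapse consecutive repeated labels, then drop blanks), using a hypothetical three-label alphabet rather than the thesis's actual corpus, might look like:

```python
import numpy as np

BLANK = 0  # CTC blank label index (assumption: index 0 is reserved for blank)

def ctc_greedy_decode(log_probs, id_to_char):
    """Greedy CTC decoding: pick the most likely label per frame,
    collapse consecutive repeats, then remove blanks."""
    best_path = np.argmax(log_probs, axis=1)     # most likely label per frame
    collapsed = [best_path[0]] if len(best_path) else []
    for label in best_path[1:]:
        if label != collapsed[-1]:               # collapse repeated labels
            collapsed.append(label)
    return "".join(id_to_char[i] for i in collapsed if i != BLANK)

# Hypothetical 6-frame network output over labels {0: blank, 1: 'g', 2: 'o'}
frames = np.log(np.array([
    [0.1, 0.8, 0.1],    # 'g'
    [0.1, 0.7, 0.2],    # 'g' (repeat, collapsed)
    [0.8, 0.1, 0.1],    # blank
    [0.1, 0.1, 0.8],    # 'o'
    [0.2, 0.1, 0.7],    # 'o' (repeat, collapsed)
    [0.9, 0.05, 0.05],  # blank
]))
print(ctc_greedy_decode(frames, {1: "g", 2: "o"}))  # -> go
```

Greedy decoding is the simplest option; the phoneme-to-corpus matching described above would refine this raw output against known command sentences.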
Machine vision also runs on the NVIDIA® Jetson™ TX2 embedded platform, using the stereo camera's SDK and libraries such as OpenCV for image rectification, disparity computation, 3D coordinate transformation, image contour search, and convolutional neural network model training. This study adopts the YOLOv3 architecture to classify target objects and predicts each object's center point as the target point for dual-arm manipulation. The camera coordinates are converted to robot-arm coordinates through a coordinate translation transform and transmitted via a universal asynchronous receiver-transmitter (UART) to the FPGA dual-arm motion control system, completing the grasping action after object recognition. Combining the deep-learning recognition results with UART transmission achieves human-machine cooperation in dual-arm robot control that couples end-to-end speech recognition with object recognition.
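The camera-to-arm conversion described above is a rigid-body transform. A minimal sketch, with a hypothetical rotation matrix and translation offset standing in for the calibrated extrinsics between the camera frame and the arm base frame:

```python
import numpy as np

# Hypothetical extrinsics: rotation R and translation t (in mm) from the
# camera frame to the robot-arm base frame; real values come from calibration.
R = np.array([[0.0, 0.0, 1.0],     # camera z (depth) -> arm x (forward)
              [-1.0, 0.0, 0.0],    # camera x (right) -> arm -y
              [0.0, -1.0, 0.0]])   # camera y (down)  -> arm -z
t = np.array([120.0, 0.0, 350.0])  # camera mounted above and behind the base

def camera_to_arm(p_cam):
    """Map a 3D point from camera coordinates to arm-base coordinates."""
    return R @ np.asarray(p_cam) + t

# An object 500 mm in front of the camera, centered in the image
print(camera_to_arm([0.0, 0.0, 500.0]))  # -> [620.   0. 350.]
```

The resulting arm-frame point is what would be serialized and sent over UART to the FPGA motion controller as the grasp target.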


In this research, a Field Programmable Gate Array (FPGA) is used to design the dual-arm robotic controller, and the embedded artificial intelligence platform NVIDIA® Jetson™ TX2 performs short-sentence speech-to-text (STT) and 3D object recognition. A ReSpeaker microphone array installed on the chest of the robot captures the sound signal and parses it into commands, while a ZED Stereo Camera captures image information so the robotic system can find the target object in the environment.
The voice recognition system is constructed on the embedded platform NVIDIA® Jetson™ TX2 with Kaldi, TensorFlow, and other libraries for data preprocessing and model training. The sound signal is extracted through the microphone array and sent to the self-trained speech models (RNN, LSTM). A feed-forward neural network processes the model output to obtain the probability distribution of each character and compute the CTC loss function. Then, the phonemes are compared against the text corpus to produce the short-sentence recognition result.
Machine vision is also constructed on the embedded platform NVIDIA® Jetson™ TX2. The SDK of the ZED camera, OpenCV, and TensorFlow are used to rectify images, calculate disparity, build three-dimensional coordinates, capture 3D point clouds, and train the neural network model. In this research, the YOLOv3 model structure is employed to determine the object classification and the object's 3D central position. The image position and orientation data are then sent to the dual-arm robotic FPGA controller through UART as the target point, accomplishing human-robot interaction by integrating end-to-end speech-to-text commands with the object recognition system.
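The disparity-to-3D step mentioned above follows standard stereo triangulation (Z = f·B/d). A small sketch with hypothetical focal length, baseline, and principal-point values (the real parameters would come from the ZED calibration):

```python
import numpy as np

# Hypothetical calibration values (real ones come from the stereo SDK):
FOCAL_PX = 700.0     # focal length in pixels
BASELINE_MM = 120.0  # distance between the two camera centers, in mm

def disparity_to_xyz(u, v, disparity, cx=640.0, cy=360.0):
    """Triangulate pixel (u, v) with the given disparity into camera-frame
    3D coordinates (mm): Z = f*B/d, then back-project X and Y."""
    z = FOCAL_PX * BASELINE_MM / disparity
    x = (u - cx) * z / FOCAL_PX
    y = (v - cy) * z / FOCAL_PX
    return np.array([x, y, z])

# A pixel at the image center with 84 px of disparity sits 1000 mm away
print(disparity_to_xyz(640.0, 360.0, 84.0))  # -> [   0.    0. 1000.]
```

Applying this to the center pixel of a YOLOv3 bounding box yields the 3D target point that is transformed into arm coordinates and sent to the FPGA controller.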

Table of Contents
Abstract (Chinese); Abstract (English); Acknowledgements; Table of Contents; List of Figures; List of Tables
1.1 Literature Review
1.2 Research Motivation and Objectives
1.3 Thesis Organization
Chapter 2 System Architecture
2.1 System Overview
2.2 Speech and Vision Systems
2.2.1 NVIDIA Jetson TX2 Embedded Computer
2.2.2 Speech Signal System
2.2.3 Image Acquisition System
2.3 Motion Control System
2.3.1 Nios II Development Board
2.3.2 Motor Driver Circuit
2.4 Dual-Arm Robot
Chapter 3 Kinematic Analysis
3.1 Dual-Arm Robot Kinematic Analysis
3.1.1 Link Coordinate Systems and Robot Arm Parameters
3.1.2 Forward Kinematics
3.1.3 Inverse Kinematics
3.1.4 Trapezoidal Velocity Planning
Chapter 4 Speech Recognition System
4.1 Raw Audio Preprocessing
4.1.1 Signal Enhancement
4.1.2 MFCC Feature Extraction
4.2 Recurrent Neural Network Architecture
4.2.1 Recurrent Neural Networks
4.2.2 Vanishing Gradients and Long-Term Dependencies
4.3 Long Short-Term Memory Networks (LSTM)
Chapter 5 Stereo Vision System
5.1 YOLOv2 and YOLOv3 Methods
5.1.1 Network Architecture
5.1.2 Multi-Scale Prediction
5.1.3 Small-Object Detection
5.1.4 Changes to the Loss Function
5.1.5 Multi-Label Classification
5.2 Stereo Vision Computation
5.2.1 Coordinate Transformation
Chapter 6 Dual-Arm Robot Motion Control and Strategy
6.1 Fuzzy Sliding Mode Control (FSMC)
6.2 Robot Control Strategy
Chapter 7 Experimental Results and Discussion
7.1 Analysis of Short-Sentence Speech Recognition Performance
7.2 Confidence Analysis of Object Detection Accuracy
7.3 Accuracy Evaluation of Static-Object 3D Coordinates
7.4 Robot Arm Motion Control
7.5 Overall Experimental Results
Chapter 8 Conclusions and Future Work
8.1 Conclusions
8.2 Future Work


Full text release date: 2024/08/13 (campus network, off-campus network, and National Central Library: Taiwan NDLTD system)