| Field | Value |
|---|---|
| Graduate Student | 陳緯仁 (Wei-Jen Chen) |
| Thesis Title | 應用於卷積神經網路物件辨識之互動系統 (An Interactive System for Convolutional Neural Network Based Object Detection Applications) |
| Advisor | 郭重顯 (Chung-Hsien Kuo) |
| Committee Members | 傅立成 (Li-Chen Fu), 宋開泰 (Kai-Tai Song), 蘇順豐 (Shun-Feng Su) |
| Degree | Master |
| Department | Department of Electrical Engineering, College of Electrical Engineering and Computer Science |
| Publication Year | 2018 |
| Academic Year of Graduation | 106 (ROC calendar) |
| Language | Chinese |
| Pages | 94 |
| Keywords (Chinese) | 深度學習、卷積神經網路、YOLO、物件辨識、自動影像標籤 |
| Keywords (English) | Deep Learning, Convolutional Neural Network, YOLO, Object Detection, Auto-labeling |
| Access Counts | Views: 466, Downloads: 2 |
Chinese Abstract (translated):

This thesis proposes an interactive learning system for robot visual cognitive learning. The system emulates the cognitive stage of toddlers: a teacher holds an object and shakes it slightly to attract the toddler's attention while speaking the correct object name, thereby teaching the toddler to associate spoken commands with object appearances. The system is implemented on Linux and integrates the Google speech recognition service and the OpenCV image processing library to associate the image region of the slightly shaken object with its spoken label. Applying optical flow and hand-position recognition to the input frames, the system locates the object's image region and automatically generates object label information, achieving auto-labeling. In addition, a pixel mean value curve discrimination method is proposed to automatically filter out blurred or incompletely framed images, improving the correctness of the object bounding boxes. The large set of collected image labels is then fed into a YOLO-based convolutional neural network deep learning model; training updates the network weights, after which the model can display predicted bounding boxes and object information during online operation. In tests on 11 objects, applying the pixel mean value curve method to the automatically labeled data raised the mean average precision (mAP) from 78.69% to 86.6%, demonstrating the feasibility of this interactive learning system for robot visual cognition.
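The auto-labeling step above locates the slightly shaken object by combining optical flow with hand-position detection. A minimal sketch of the idea, using plain frame differencing as a simplified stand-in for the thesis's optical-flow computation (the function name and threshold below are illustrative assumptions, not the thesis's actual implementation):

```python
import numpy as np

def moving_region_bbox(prev_frame, curr_frame, threshold=25):
    """Return (x, y, w, h) bounding the pixels that changed between frames.

    Simplified stand-in for an optical-flow motion mask: frame
    differencing also highlights a slightly shaken object against a
    static background.
    """
    diff = np.abs(curr_frame.astype(np.int16) - prev_frame.astype(np.int16))
    mask = diff > threshold
    if not mask.any():
        return None  # no motion detected
    ys, xs = np.nonzero(mask)
    x, y = int(xs.min()), int(ys.min())
    return (x, y, int(xs.max()) - x + 1, int(ys.max()) - y + 1)

# Synthetic example: a bright 20x20 object shifts by 3 pixels.
prev = np.zeros((120, 160), dtype=np.uint8)
curr = np.zeros((120, 160), dtype=np.uint8)
prev[40:60, 50:70] = 200
curr[43:63, 53:73] = 200
print(moving_region_bbox(prev, curr))  # -> (50, 40, 23, 23)
```

A box produced this way covers the union of the object's two positions; in the thesis the region is further refined with hand-position recognition before being written out as a training label.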
English Abstract:

This thesis presents an image-based interactive cognitive learning system for robots. The idea comes from emulating the cognitive learning of toddlers: parents usually shake an object slightly to attract a toddler's attention and simultaneously speak the corresponding word, helping the toddler associate the object with the spoken word. The proposed system was realized in a Linux environment. The Google speech recognition service and the open-source OpenCV library were used to correlate the detected object image area with the spoken words. The input images were processed with optical flow and hand-area recognition to find the image area of the shaken object of interest. The automatically retrieved object image areas formed the labeled dataset for a deep learning network, realizing an auto-labeling technique. Moreover, a pixel averaging algorithm was used to reject imperfect labels caused by blurring or incomplete object coverage, which significantly improves the quality of the auto-labeled dataset. The collected dataset was then used to train a convolutional neural network (CNN) deep learning model based on You Only Look Once (YOLO), and the converged model was applied to real-time object detection. Finally, 11 different objects were used in interactive cognitive learning experiments. The proposed auto-labeling technique achieved a mean average precision (mAP) of 78.69%, which improved to 86.6% when the pixel averaging algorithm was applied. Hence, the proposed interactive cognitive learning system is feasible and can serve intelligent service robots in the future.
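The pixel averaging step rejects blurred or incompletely framed crops before they enter the training set. A hedged reconstruction of that idea: track the mean pixel value of each candidate crop and discard crops whose mean strays from the curve traced by well-framed ones (the median-based rule and tolerance below are assumptions; the thesis's exact decision criterion may differ):

```python
import numpy as np

def filter_by_mean_curve(crops, tolerance=0.15):
    """Keep crops whose mean pixel value stays near the sequence median.

    Blurred or incompletely framed crops tend to shift the per-crop mean
    away from the curve formed by well-framed crops of the same object.
    """
    means = np.array([c.mean() for c in crops])
    center = np.median(means)
    keep = np.abs(means - center) <= tolerance * center
    return [c for c, k in zip(crops, keep) if k]

# Synthetic example: nine consistent crops plus one washed-out outlier.
rng = np.random.default_rng(0)
good = [rng.integers(90, 110, (32, 32)).astype(np.uint8) for _ in range(9)]
bad = np.full((32, 32), 250, dtype=np.uint8)  # e.g. over-exposed or smeared
kept = filter_by_mean_curve(good + [bad])
print(len(kept))  # -> 9: the outlier crop is rejected
```

Filtering of this kind is what lifted the reported mAP from 78.69% to 86.6%: fewer corrupted labels reach the YOLO training stage, so the learned bounding boxes are cleaner.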