
Student: 陳緯仁 (WEI-JEN CHEN)
Thesis Title: 應用於卷積神經網路物件辨識之互動系統
(An Interactive System for Convolutional Neural Network Based Object Detection Applications)
Advisor: 郭重顯 (Chung-Hsien Kuo)
Committee Members: 傅立成 (Li-Chen Fu), 宋開泰 (Kai-Tai Song), 蘇順豐 (Shun-Feng Su)
Degree: Master
Department: College of Electrical Engineering and Computer Science - Department of Electrical Engineering
Publication Year: 2018
Graduation Academic Year: 106 (ROC calendar)
Language: Chinese
Pages: 94
Chinese Keywords: 深度學習, 卷積神經網路, YOLO, 物件辨識, 自動影像標籤
English Keywords: Deep Learning, Convolutional Neural Network, YOLO, Object Detection, Auto-labeling
Views: 466; Downloads: 2

This thesis proposes an interactive learning system for robot visual cognitive learning. The system emulates a toddler's cognitive stage: a teacher holds an object and shakes it slightly to attract the toddler's attention, while speaking the object's correct name, thereby teaching the toddler to recognize and associate the spoken command with the object's appearance. The system is implemented in a Linux environment and integrates Google speech recognition and the OpenCV image processing library to make an initial association between the image region of the slightly shaken object and its spoken label. From the input frames, an optical-flow algorithm combined with hand-position recognition locates the object's image region, and the corresponding object label information is generated automatically, achieving auto-labeling of images. A pixel mean-value curve discrimination method is also proposed to automatically screen out blurred or incompletely framed images, improving the correctness of the auto-labeled bounding boxes. The large set of labeled images collected in this way is then fed into a YOLO-based convolutional neural network deep learning model; training updates the network weights, and the resulting model is applied online to display predicted bounding boxes and object information in real time. In tests on 11 objects, introducing the pixel mean-value curve method into the automatic object labeling raised the mean average precision (mAP) from 78.69% to 86.6%, demonstrating the feasibility of this interactive learning system for robot visual cognitive learning.
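The pipeline above first isolates the slightly shaken object by its motion. Below is a minimal sketch of how that step could look with OpenCV's dense (Farneback) optical flow; the function name, magnitude threshold, and morphological clean-up are illustrative assumptions, not the thesis's actual code, which additionally uses hand-position recognition:

```python
# Sketch only: locate the dominant moving region between two frames with
# dense optical flow. Threshold and kernel sizes are illustrative choices.
import cv2
import numpy as np

def find_swinging_region(prev_bgr, curr_bgr, mag_thresh=2.0):
    """Return an (x, y, w, h) bounding box around the dominant moving area,
    or None if nothing moves enough between the two frames."""
    prev_gray = cv2.cvtColor(prev_bgr, cv2.COLOR_BGR2GRAY)
    curr_gray = cv2.cvtColor(curr_bgr, cv2.COLOR_BGR2GRAY)

    # Dense optical flow: one (dx, dy) vector per pixel.
    flow = cv2.calcOpticalFlowFarneback(prev_gray, curr_gray, None,
                                        pyr_scale=0.5, levels=3, winsize=15,
                                        iterations=3, poly_n=5,
                                        poly_sigma=1.2, flags=0)
    mag = np.linalg.norm(flow, axis=2)

    # Keep pixels whose motion exceeds the threshold, clean up speckle,
    # then box the largest connected moving blob.
    mask = (mag > mag_thresh).astype(np.uint8) * 255
    mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, np.ones((5, 5), np.uint8))
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)  # OpenCV 4.x
    if not contours:
        return None
    largest = max(contours, key=cv2.contourArea)
    return cv2.boundingRect(largest)  # candidate auto-label box
```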


This thesis presents an image-based interactive cognitive learning system for robots. The idea comes from emulating the cognitive learning of toddlers: parents usually swing an object slightly to attract a toddler's attention, and at the same time speak the object's name so that the toddler correlates the object with the spoken words. The proposed system was realized in a Linux environment. The Google speech recognition service and the OpenCV library were first used to correlate the detected object image region with the spoken words. The input images were processed with optical flow and hand-area recognition to find the image region of the swung object of interest. These automatically retrieved object regions formed the labeled dataset of a deep learning network; hence, auto-labeling was realized. Moreover, a pixel-averaging algorithm was used to reject imperfect labels caused by blurring or incomplete object coverage, which significantly improves the quality of the auto-labeled dataset. The collected auto-labeled dataset was then used to train a convolutional neural network (CNN) deep learning model based on You Only Look Once (YOLO), and the converged CNN model was applied to real-time object detection. Finally, this thesis used 11 different objects for interactive cognitive learning experiments. The experiments showed that the proposed auto-labeling technique achieved 78.69% mean average precision (mAP), which improved to 86.6% when the pixel-averaging algorithm was applied. Hence, the proposed interactive cognitive learning system is feasible and can serve intelligent service robots in the future.
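The record does not spell out the pixel mean-value curve criterion, so the following is only a hedged sketch of how such a filter could work, assuming OpenCV: it drops crops whose mean gray level jumps away from the local curve (a proxy for incomplete framing) and rejects blurred crops with the common variance-of-Laplacian measure (a standard blur metric that may differ from the thesis's own blur detector). All names and thresholds are illustrative:

```python
# Sketch only: filter auto-labeled crops before training. Neither the jump
# test nor the Laplacian blur test is claimed to be the thesis's method.
import cv2
import numpy as np

def filter_auto_labels(crops, jump_thresh=20.0, blur_thresh=100.0):
    """crops: list of BGR images cropped by the auto-labeler, in capture
    order. Returns indices of crops judged clean enough to train on."""
    means = [float(np.mean(cv2.cvtColor(c, cv2.COLOR_BGR2GRAY)))
             for c in crops]
    keep = []
    for i, crop in enumerate(crops):
        # (a) Compare this crop's mean to the median of its neighbours
        # on the mean-value curve; an abrupt jump suggests a mis-framed
        # or incomplete bounding box.
        lo, hi = max(0, i - 2), min(len(means), i + 3)
        if abs(means[i] - float(np.median(means[lo:hi]))) > jump_thresh:
            continue
        # (b) Blur check: low Laplacian variance means few sharp edges.
        gray = cv2.cvtColor(crop, cv2.COLOR_BGR2GRAY)
        if cv2.Laplacian(gray, cv2.CV_64F).var() < blur_thresh:
            continue
        keep.append(i)
    return keep
```

Crops that pass both checks would then be written out, together with their bounding boxes, in YOLO's training label format.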

Advisor's Recommendation Letter
Oral Examination Committee Approval
Acknowledgments
Abstract (Chinese)
Abstract (English)
Table of Contents
List of Tables
List of Figures
Chapter 1 Introduction
  1.1 Research Background and Motivation
  1.2 Research Objectives
  1.3 Research Contributions
  1.4 Literature Review
Chapter 2 System Architecture
  2.1 System Overview
  2.2 System Scenario
  2.3 Image Preprocessing
    2.3.1 Optical-Flow Image Processing
    2.3.2 Hand Recognition
    2.3.3 Left/Right-Side Object Discrimination
    2.3.4 Object Bounding-Box Computation and Judgment
  2.4 Speech Recognition and Text Labels
    2.4.1 Speech Recognition
    2.4.2 Speech-to-Text Labeling in Teaching Mode
    2.4.3 Spoken Feedback of Prediction Results in Query Mode
    2.4.4 String Passing and Control
  2.5 File Database Construction
    2.5.1 Object Categorization
    2.5.2 Automated Image Collection
    2.5.3 Image Stitching
    2.5.4 Training Dataset Generation
Chapter 3 Image Processing Algorithms
  3.1 Object Detection Approaches
    3.1.1 Background Subtraction
    3.1.2 Optical Flow
  3.2 Image Filtering System
    3.2.1 Pixel Mean-Value Curve Discrimination Method
    3.2.2 Blur Detection
Chapter 4 Deep Learning for Images
  4.1 Evolution of Machine Learning and Overview of Convolutional Neural Networks
    4.1.1 Artificial Neural Networks
    4.1.2 Training Deep Learning Models
    4.1.3 Convolutional Layers
    4.1.4 Activation Layers
    4.1.5 Pooling Layers
    4.1.6 Multilayer Feedforward Neural Networks
  4.2 The You Only Look Once (YOLO) Network
    4.2.1 Unified Detection
    4.2.2 YOLO v1 Network Training
    4.2.3 Improvements in YOLO v2
  4.3 Model Evaluation Metrics: Precision, Recall, IoU, mAP (see the IoU sketch after this outline)
Chapter 5 Experimental Results and Analysis
  5.1 Experimental Environment Setup
  5.2 Mean Average Precision Analysis of Image Filtering
  5.3 Long-Term Tracking and Analysis
Chapter 6 Conclusions and Future Work
  6.1 Conclusions
  6.2 Future Research Directions
References
Appendix 1
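Section 4.3 of the outline lists the metrics (Precision, Recall, IoU, mAP) behind the 78.69% to 86.6% mAP figures quoted in the abstract. For reference, a minimal version of the textbook IoU computation on which those metrics rest (not code from the thesis):

```python
# Standard intersection-over-union (IoU) between two axis-aligned boxes.
def iou(box_a, box_b):
    """Boxes are (x1, y1, x2, y2) with x1 < x2 and y1 < y2. Returns the
    overlap ratio in [0, 1]; a detection is typically counted as a true
    positive when its IoU with ground truth exceeds a threshold like 0.5."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0
```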


Full-text release date: 2023/08/27 (campus network)
Full text not authorized for public release (off-campus network)
Full text not authorized for public release (National Central Library: Taiwan Theses and Dissertations System)