
Graduate student: 譚馳澔 (Chih-Haw Tan)
Thesis title: 供室內導航基於RGB圖像的機器人姿態估算 (RGB Image-Based Robot Pose Estimation for Indoor Navigation)
Advisor: 林其禹 (Chyi-Yeu Lin)
Committee members: 林柏廷 (Po-Ting Lin), 范欽雄 (Chin-Shyurng Fahn)
Degree: Master
Department: Department of Mechanical Engineering, College of Engineering
Year of publication: 2019
Graduation academic year: 107 (ROC calendar, i.e., 2018-2019)
Language: Chinese
Number of pages: 84
Keywords (Chinese): 雷射SLAM、RGB圖像定位、迴歸問題、姿態估算、深度學習、卷積神經網路、PoseNet、MapNet、機器人導航
Keywords (English): laser SLAM, RGB image localization, regression problems, pose estimation, deep learning, convolutional neural networks, PoseNet, MapNet, robot navigation
    Localization is an important issue for navigation, including autonomous driving and indoor navigation for service robots. Simultaneous Localization and Mapping (SLAM), which has a long history of development, performs well for indoor localization, and the sensors commonly used are mainly lasers or cameras. Laser SLAM offers high localization accuracy, but it lacks RGB images, which limits follow-on applications such as object search. Visual SLAM relies on RGB images and depth maps; although less mature than laser SLAM, it also achieves good localization accuracy, and having RGB images allows more applications to be developed later. Its drawbacks are the heavy computation caused by large-scale feature extraction and matching, and its susceptibility to missing features, dynamic light sources, and people moving through the scene. This study therefore focuses on localization from RGB images alone, formulated as a regression problem: the robot pose is regressed directly from an RGB image, without large-scale feature extraction and matching and without additional hand-crafted features or graph optimization, in order to support indoor navigation.
    In recent years, deep learning and convolutional neural networks (CNNs) have achieved good results in many computer vision studies; their advantage is that the entire network can be trained end-to-end and learn features directly from data. Studies such as PoseNet and MapNet have also shown that deep learning can localize effectively from RGB images. In this study, laser SLAM is used to collect data in the environment to be navigated, and the collected RGB images and robot poses serve as the training pairs required by PoseNet and MapNet, so that the robot pose can be regressed from the current RGB image. Finally, the system is deployed on a real robot platform, the Turtlebot3 Waffle Pi, and combined with our self-developed path planning and speed control to achieve navigation.
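    As a rough sketch of what this kind of direct pose regression looks like in code (illustrative only, assuming PyTorch and a ResNet-18 backbone; the thesis itself builds on the PoseNet and MapNet architectures, and the loss weighting below follows the original PoseNet formulation rather than this work):

        import torch
        import torch.nn as nn
        import torchvision.models as models

        class PoseRegressor(nn.Module):
            """Regress a 6-DoF pose (xyz position + unit quaternion) from a single RGB image."""
            def __init__(self):
                super().__init__()
                backbone = models.resnet18(pretrained=True)   # CNN feature extractor
                backbone.fc = nn.Identity()                    # drop the ImageNet classification head
                self.backbone = backbone
                self.fc_xyz = nn.Linear(512, 3)                # position head
                self.fc_quat = nn.Linear(512, 4)               # orientation head (quaternion)

            def forward(self, img):                            # img: (B, 3, H, W)
                feat = self.backbone(img)                      # (B, 512) feature vector
                xyz = self.fc_xyz(feat)
                quat = self.fc_quat(feat)
                quat = quat / quat.norm(dim=1, keepdim=True)   # normalize to a unit quaternion
                return xyz, quat

        def pose_loss(pred_xyz, pred_quat, gt_xyz, gt_quat, beta=50.0):
            """PoseNet-style weighted loss: position error plus beta times orientation error."""
            loss_xyz = torch.norm(pred_xyz - gt_xyz, dim=1).mean()
            loss_quat = torch.norm(pred_quat - gt_quat, dim=1).mean()
            return loss_xyz + beta * loss_quat

    Training then iterates over (RGB image, pose) pairs such as those collected with laser SLAM, minimizing this loss with a standard optimizer.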
    The system was tested in an indoor environment covering roughly 12 m x 3.5 m; the average position error is about 37 cm and the average orientation error is about 4.69 degrees. The results show that the localization obtained from MapNet, combined with the self-developed path-planning and speed-control system, allows the robot to navigate to a specified position.
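    For reference, the two error measures quoted above can be computed roughly as follows (a minimal sketch assuming poses are stored as an (x, y, z) position plus a unit quaternion; the function names are illustrative, not taken from the thesis):

        import numpy as np

        def position_error(pred_xyz, gt_xyz):
            """Euclidean distance between predicted and ground-truth positions (same unit as the input)."""
            return np.linalg.norm(np.asarray(pred_xyz) - np.asarray(gt_xyz))

        def orientation_error_deg(pred_quat, gt_quat):
            """Angular difference in degrees between two unit quaternions."""
            d = abs(np.dot(pred_quat, gt_quat))   # |<q1, q2>| handles the q / -q ambiguity
            d = min(1.0, d)                       # guard against values slightly above 1 from rounding
            return np.degrees(2.0 * np.arccos(d))

        # Mean errors over a test set would then be, e.g.:
        # mean_pos = np.mean([position_error(p, g) for p, g in xyz_pairs])
        # mean_ori = np.mean([orientation_error_deg(p, g) for p, g in quat_pairs])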


    Localization is an important issue for navigation, including self-driving cars and indoor navigation for service robots. SLAM performs well for indoor localization, and the sensors commonly used are mainly lasers or cameras. The advantage of laser SLAM is its high localization accuracy; however, the lack of image information restricts some applications, such as finding objects. Visual SLAM relies on RGB images and depth maps. It also achieves good localization performance, and having RGB images makes it possible to develop more applications in the future. Its disadvantages are the heavy computation caused by extracting and matching a large number of features, and its sensitivity to missing features, dynamic light sources, and human disturbance. Therefore, this research focuses on pose estimation from RGB images alone, without feature extraction and matching: the robot pose is regressed directly from the RGB image to achieve indoor navigation.
    In recent years, deep learning and convolutional neural networks (CNNs) have achieved good results in many computer vision studies. They allow the entire network to be trained end-to-end and to learn features from the data. Studies such as PoseNet and MapNet have shown that it is possible to use deep learning to estimate pose from RGB images. In this study, we use laser SLAM to collect the data, namely RGB images and robot poses, which are used as the training pairs required by PoseNet and MapNet. Our goal is to regress the robot pose from the current RGB image. Finally, we apply this system to the real robot Turtlebot3 Waffle Pi and combine it with our self-developed path-planning and speed-control system to achieve navigation.
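    The abstract does not spell out the path-planning and speed-control layer; as one hypothetical illustration of how the regressed pose could drive a differential-drive base such as the Turtlebot3 (a sketch only, with assumed gains and limits, not the controller developed in the thesis):

        import math

        def velocity_toward_waypoint(pose_xy, pose_yaw, goal_xy,
                                     k_lin=0.3, k_ang=1.0, max_lin=0.2, max_ang=1.0):
            """Simple proportional controller: turn toward the goal, then drive forward.
            pose_xy and goal_xy are (x, y) in meters; pose_yaw is in radians."""
            dx = goal_xy[0] - pose_xy[0]
            dy = goal_xy[1] - pose_xy[1]
            dist = math.hypot(dx, dy)
            heading_err = math.atan2(dy, dx) - pose_yaw
            heading_err = math.atan2(math.sin(heading_err), math.cos(heading_err))  # wrap to [-pi, pi]
            lin = min(max_lin, k_lin * dist) if abs(heading_err) < 0.5 else 0.0     # slow down while turning
            ang = max(-max_ang, min(max_ang, k_ang * heading_err))
            return lin, ang  # forward speed (m/s) and yaw rate (rad/s), e.g. for a ROS Twist command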
    We tested the system in an indoor environment (an area of about 12 m x 3.5 m); the average position error is about 37 cm and the average orientation error is about 4.69 degrees. This demonstrates that the pose predicted by the neural network allows the robot to navigate to the target.

    Table of contents:
    Abstract (Chinese)
    Abstract (English)
    Acknowledgements
    Chapter 1  Introduction
      1-1 Foreword
      1-2 Research Motivation
      1-3 Research Objectives
      1-4 Thesis Organization
    Chapter 2  Theoretical Background
      2-1 Machine Learning
        2-1-1 Common Problems and Practice
          2-1-1-1 Model Interpretability
          2-1-1-2 Underfitting and Overfitting
          2-1-1-3 Hyperparameter Optimization
      2-2 Deep Learning
        2-2-1 Multilayer Perceptron
          2-2-1-1 Loss Functions
          2-2-1-2 Backpropagation
          2-2-1-3 Activation Functions
          2-2-1-4 Regularization
        2-2-2 Convolutional Neural Networks
          2-2-2-1 Convolutional Layers
          2-2-2-2 Pooling Layers
        2-2-3 Classic Convolutional Neural Network Models
    Chapter 3  Robot Pose Estimation Using Deep Learning
      3-1 Problem Statement
      3-2 Literature Review
      3-3 Overall System Architecture
        3-3-1 Data Collection
        3-3-2 Real-Time Localization
          3-3-2-1 Model Architecture
          3-3-2-2 Loss Function
          3-3-2-3 Data Preprocessing and Training Method
          3-3-2-4 Localization Denoising
          3-3-2-5 Parameter Configuration
        3-3-3 Navigation System
      3-4 Sensor Components and Experimental Equipment
    Chapter 4  Experimental Procedure and Results
      4-1 Experimental Procedure
      4-2 Dataset
      4-3 Analysis and Discussion of Experimental Results
      4-4 Analysis and Discussion of Real-Time Localization Performance
      4-5 Performance of RGB-Image-Based Localization and Navigation
    Chapter 5  Conclusions and Future Work
      5-1 Conclusions
      5-2 Future Work


    Full-text release date: 2024/08/24 (campus network)
    Full-text release date: 2024/08/24 (off-campus network)
    Full-text release date: 2024/08/24 (National Central Library: Taiwan Doctoral and Master's Theses system)