
Author: Trung-Son Le (黎中山)
Thesis Title: Object 6D Pose Estimation Methodology in a Transition from Classic Computer Vision to Deep Learning-based Approach in Robot Guiding and Manipulations (物件6D姿勢估計方法應用在機器人引導和操作-從傳統電腦視覺到基於深度學習的方法)
Advisor: Chyi-Yeu Lin (林其禹)
Committee Members: Chung-Hsien Kuo (郭重顯), Hsien-I Lin (林顯易), Chyi-Yeu Lin (林其禹), Wei-Chen Lee (李維楨), Po Ting Lin (林柏廷)
Degree: Doctor
Department: Department of Mechanical Engineering, College of Engineering
Year of Publication: 2023
Academic Year of Graduation: 111
Language: English
Pages: 55
Keywords: pose estimation, keypoint-based, robot guiding, Faster R-CNN
Access Count: 229 views / 5 downloads
Abstract:

    To perform an autonomous operation on a randomly positioned object with a robot arm, the object's pose, comprising both location and orientation, must first be estimated or determined. This perspective emphasizes the importance of pose estimation as the front-end phase of most autonomous robot operations. In this thesis, the pose estimation problem is explored extensively, from classic computer vision methods to contemporary deep-learning-based methods, using both 2D and 3D vision modalities. Robot trajectory planning is also a crucial function in facilitating robot operation, and it occasionally demands high accuracy and human manufacturing expertise. To leverage human hand dexterity for this task, a handheld pen-like robot guiding device is researched and formulated from the pose estimation problem. The system requires only a 2D camera and printed ArUco markers, hand-glued onto 31 surfaces of the designed 3D-printed guiding pen. To estimate the 6D pose of the pen accurately, image processing is combined with numerical optimization methods to ensure both the accuracy of the pen calibration and its runtime. Dynamic experiments with a ChArUco board demonstrate the pen's robustness within its working range, with a minimum axis accuracy of approximately 0.8 mm.
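    As a rough illustration of the runtime stage of such a marker-based tracker, the sketch below detects ArUco markers with OpenCV's aruco module (the ArucoDetector API of OpenCV 4.7+) and solves a single PnP over all visible faces. It is a minimal sketch under stated assumptions, not the thesis implementation: the dictionary choice, camera intrinsics, and the face_transforms mapping (calibrated 3D marker-corner coordinates in the pen frame, which the thesis obtains through bundle adjustment) are placeholders, and estimate_pen_pose is a hypothetical helper.

```python
# Hedged sketch of marker-based 6D pose estimation with OpenCV's aruco
# module (OpenCV 4.7+ API). Dictionary, intrinsics, and the calibrated
# face_transforms values are illustrative assumptions, not thesis data.
import cv2
import numpy as np

camera_matrix = np.array([[800.0, 0.0, 320.0],
                          [0.0, 800.0, 240.0],
                          [0.0, 0.0, 1.0]])  # assumed intrinsics
dist_coeffs = np.zeros(5)                    # assume negligible distortion

dictionary = cv2.aruco.getPredefinedDictionary(cv2.aruco.DICT_4X4_50)
detector = cv2.aruco.ArucoDetector(dictionary,
                                   cv2.aruco.DetectorParameters())

def estimate_pen_pose(frame, face_transforms):
    """Estimate one rigid 6D pen pose from all visible markers at once.

    face_transforms: dict mapping marker id -> (4, 3) array of that
    marker's corner coordinates in the pen frame, produced beforehand
    by the bundle-adjustment calibration step described in the thesis.
    """
    corners, ids, _rejected = detector.detectMarkers(frame)
    if ids is None:
        return None
    obj_pts, img_pts = [], []
    for marker_corners, marker_id in zip(corners, ids.flatten()):
        if int(marker_id) in face_transforms:
            obj_pts.append(face_transforms[int(marker_id)])  # pen frame
            img_pts.append(marker_corners.reshape(4, 2))     # pixels
    if not obj_pts:
        return None
    obj_pts = np.concatenate(obj_pts).astype(np.float64)
    img_pts = np.concatenate(img_pts).astype(np.float64)
    # One PnP over every visible face couples all marker observations
    # into a single rigid pose, which is steadier than averaging
    # per-marker poses.
    ok, rvec, tvec = cv2.solvePnP(obj_pts, img_pts,
                                  camera_matrix, dist_coeffs)
    return (rvec, tvec) if ok else None
```

    Feeding every visible face into one PnP, rather than averaging per-marker poses, is a common way to stabilize multi-face marker trackers; the numerical-optimization refinement mentioned in the abstract would hook in after this initial estimate.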

    Recent advances in deep learning have unveiled new possibilities in robot visual perception. However, performing a robust robot task that relies solely on this technique remains an open challenge. This thesis proposes a keypoint-based object detection and pose estimation method for robot grasping, demonstrated in a stationary tabletop decluttering-and-arranging scenario. The Faster R-CNN framework, with its keypoint-based human pose estimation architecture, is heuristically adapted to detect multiple object classes and estimate their 3D/6D poses. Experiments confirm a reasonable task success rate.
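    One plausible reading of this adaptation, offered here only as a hedged sketch, is torchvision's Keypoint R-CNN, which is a Faster R-CNN with the keypoint head used for human pose estimation, re-headed for several object classes and a different keypoint count. The class count, keypoint count, model_points (3D keypoint locations on each object model), intrinsics, and the grasp_pose_from_image helper are illustrative assumptions, not the author's verified architecture.

```python
# Hedged sketch: re-heading torchvision's keypoint-based Faster R-CNN
# (Keypoint R-CNN) for multiple object classes, then lifting predicted
# 2D keypoints to a 6D pose with PnP. Counts, model points, and
# intrinsics below are illustrative assumptions.
import cv2
import numpy as np
import torch
from torchvision.models.detection import keypointrcnn_resnet50_fpn

NUM_CLASSES = 4    # background + 3 object classes (assumed)
NUM_KEYPOINTS = 8  # semantic keypoints per object (assumed)

# weights=None: the person-keypoint checkpoint has a different head
# shape, so the re-headed network must be trained on object data.
model = keypointrcnn_resnet50_fpn(weights=None, weights_backbone=None,
                                  num_classes=NUM_CLASSES,
                                  num_keypoints=NUM_KEYPOINTS)
model.eval()

def grasp_pose_from_image(image_bgr, model_points, camera_matrix):
    """Detect the best-scoring object and solve its 6D pose.

    model_points: (NUM_KEYPOINTS, 3) keypoint coordinates on the CAD
    model, expressed in the object frame (assumed known per class).
    """
    rgb = image_bgr[..., ::-1].copy()  # BGR -> RGB for torchvision
    x = torch.from_numpy(rgb).permute(2, 0, 1).float() / 255.0
    with torch.no_grad():
        pred = model([x])[0]
    if len(pred["scores"]) == 0:
        return None
    best = int(pred["scores"].argmax())
    kps_2d = pred["keypoints"][best][:, :2].numpy().astype(np.float64)
    # Lift 2D keypoints to a 6D pose via PnP against the model points.
    ok, rvec, tvec = cv2.solvePnP(model_points.astype(np.float64),
                                  kps_2d, camera_matrix, np.zeros(5))
    return (int(pred["labels"][best]), rvec, tvec) if ok else None
```

    Solving PnP against known per-class model keypoints is one way to reach a full 6D pose; with an RGB-D sensor the predicted keypoints could instead be lifted through measured depth, which may correspond to the 3D option the abstract alludes to.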

Table of Contents:

    1 Introduction
      1.1 Background
      1.2 Literature Review
        1.2.1 Concurrent Methods for 3D Object Recognition
        1.2.2 Robot vision-based teaching methodology
        1.2.3 Robot household tasking
      1.3 Objective and Scope of Study
      1.4 Structure of the Thesis
    2 Extending 2D Object Detection with CNN-feature Regions to 6D Pose Estimation using Point-Pair Features
      2.1 Introduction
        2.1.1 Object detection
        2.1.2 6D pose estimation
        2.1.3 6D pose estimation integration for 3D object recognition
      2.2 Experiments
        2.2.1 Deep learning
        2.2.2 Pose Estimation
        2.2.3 Robot grasping
      2.3 Results and Discussions
    3 An ArUco-Attached Icosahedron Bundle Adjustment Calibration and Pose Estimation for Accurate 6D Guidance of a Robot Arm
      3.1 Introduction
      3.2 Icosahedron Design
      3.3 Define ArUco Marker Poses in Penta- and Hexa-polygon
      3.4 Pen Calibration
        3.4.1 Pen geometry calibration with bundle adjustment
        3.4.2 Pen-tip calibration
      3.5 Experiment Results
        3.5.1 Calibration accuracy
        3.5.2 Inference accuracy
        3.5.3 Image processing ablation
    4 Multi-Object Class Key Point-Based Detection and Pose Estimation with CNN-Feature Regions for Robot Grasping
      4.1 Introduction
      4.2 Methods
        4.2.1 Residual learning unit
        4.2.2 Keypoint prediction network
        4.2.3 Visibility branch
      4.3 Experiments
    5 Conclusions and Future Work
    References

