Graduate Student: 黎中山 Trung-Son Le
Thesis Title: 物件6D姿勢估計方法應用在機器人引導和操作 / Object 6D Pose Estimation Methodology in a Transition from Classic Computer Vision to Deep Learning-based Approach in Robot Guiding and Manipulations
Advisor: 林其禹 Chyi-Yeu Lin
Committee Members: 郭重顯 Chung-Hsien Kuo, 林顯易 Hsien-I Lin, 林其禹 Chyi-Yeu Lin, 李維楨 Wei-Chen Lee, 林柏廷 Po Ting Lin
Degree: Doctor (博士)
Department: College of Engineering, Department of Mechanical Engineering (工程學院 機械工程系)
Year of Publication: 2023
Academic Year of Graduation: 111
Language: English
Number of Pages: 55
Keywords (Chinese): 姿勢估計, 基於關鍵點, 機器人引導, Faster R-CNN
Keywords (English): pose estimation, keypoint-based, robot guiding, Faster R-CNN
To perform an autonomous operation on a randomly positioned object with a robot arm, the object's pose, comprising its location and orientation, must first be estimated or determined. This perspective emphasizes the importance of pose estimation as a forefront phase in most autonomous robot operations. In this thesis, the pose estimation problem is explored extensively, from classic computer vision methods to contemporary deep learning approaches, using both 2D and 3D vision modalities. Robot trajectory planning is also a crucial function in facilitating robot operations, one that occasionally demands high accuracy and human expertise in manufacturing. To leverage human hand dexterity for this task, a handheld pen-like robot-guiding device is researched and formulated as a pose estimation problem. The system requires only a 2D camera and printed ArUco markers, which are glued by hand onto 31 surfaces of the designed 3D-printed guiding pen. To estimate the 6D pose of the pen accurately, image processing is combined with numerical optimization methods to ensure both the accuracy of the pen calibration and its runtime performance. Dynamic experiments with a ChArUco board have shown the pen's robustness within its working range, with a minimum per-axis accuracy of approximately 0.8 mm.
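Since several of the pen's 31 markers are typically visible at once, each detection yields an independent estimate of the pen's body pose, and the estimates must be fused into a single 6D pose. The sketch below shows one standard way to do this, as a minimal illustration rather than the thesis's actual optimization pipeline: translations are averaged, and the chordal mean of the rotations is projected back onto SO(3) with an SVD. The function name and interface are illustrative assumptions.

```python
import numpy as np

def fuse_marker_poses(rotations, translations):
    """Fuse per-marker pose estimates into one body pose.

    rotations: list of 3x3 rotation matrices (body -> camera)
    translations: list of 3-vectors in the camera frame
    Returns (R_fused, t_fused).
    """
    t_fused = np.mean(translations, axis=0)
    # Chordal mean of rotations: average the matrices, then
    # project back onto SO(3) via SVD (orthogonal Procrustes).
    M = np.mean(rotations, axis=0)
    U, _, Vt = np.linalg.svd(M)
    R_fused = U @ Vt
    # Guard against a reflection (det = -1).
    if np.linalg.det(R_fused) < 0:
        U[:, -1] *= -1.0
        R_fused = U @ Vt
    return R_fused, t_fused
```

A real system would weight each marker by its detection quality (e.g. reprojection error) before averaging; the unweighted mean above is the simplest consistent choice.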
Recent advances in deep learning have unveiled new possibilities in robot visual perception. However, performing a robust robot task relying solely on this technique remains an open challenge. A keypoint-based object detection and pose estimation method is proposed in this thesis for robot grasping. The method is demonstrated in a tabletop stationary decluttering and arranging scenario. The Faster R-CNN framework, with its keypoint-based human pose estimation architecture, is adapted heuristically to estimate multiple classes of objects and their 3D/6D poses. Experiments have confirmed a reasonable task success rate.
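Once the network has detected an object's keypoints, a common final step in keypoint-based pipelines is to recover the 6D pose by rigidly aligning the object's known model keypoints to the observed ones. The sketch below assumes the detected 2D keypoints have already been back-projected to 3D (e.g. with a depth map) and solves for the rotation and translation in closed form with the Kabsch algorithm; this is a generic illustration of the recovery step, not necessarily the exact procedure used in the thesis, and the function name is an assumption.

```python
import numpy as np

def kabsch_pose(model_pts, observed_pts):
    """Closed-form rigid transform (R, t) aligning model keypoints
    to observed 3D keypoints, so that observed ~= R @ model + t.

    model_pts, observed_pts: (N, 3) arrays of corresponding points.
    """
    model_pts = np.asarray(model_pts, dtype=float)
    observed_pts = np.asarray(observed_pts, dtype=float)
    cm = model_pts.mean(axis=0)        # model centroid
    co = observed_pts.mean(axis=0)     # observed centroid
    # Cross-covariance of the centered point sets.
    H = (model_pts - cm).T @ (observed_pts - co)
    U, _, Vt = np.linalg.svd(H)
    # Correct a possible reflection so that det(R) = +1.
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    D = np.diag([1.0, 1.0, d])
    R = Vt.T @ D @ U.T
    t = co - R @ cm
    return R, t
```

With RGB only (no depth), the equivalent step would instead be a Perspective-n-Point solve from 2D keypoints and the 3D model, but the closed-form 3D-to-3D alignment above is the simplest self-contained variant.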