
Graduate Student: Zi-Xuan Zheng (鄭子軒)
Thesis Title: Development of an Autonomous Grasping System for the Mobile Robot with a Manipulator
Advisor: Chung-Hsien Kuo (郭重顯)
Oral Defense Committee: Chin-Sheng Chen (陳金聖), Meng-Kun Liu (劉孟昆), Sheng-Luen Chung (鍾聖倫)
Degree: Master
Department: College of Electrical Engineering and Computer Science - Department of Electrical Engineering
Year of Publication: 2021
Graduation Academic Year: 109 (AY 2020-2021)
Language: English
Number of Pages: 81
Keywords: Deep neural network, autonomous grasping, binocular stereo vision, feature extraction, epipolar constraint

The purpose of this study is to develop an autonomous grasping system for the manipulator of a mobile robot, enabling the robot to grasp objects in different complex environments and thus provide services at various locations in a factory. The autonomous grasping process is divided into four stages: object classification and grasping-position detection, image feature point detection and matching, three-dimensional positioning, and shape fitting. With the advancement of new deep learning technology, object classification and grasping-position detection can easily be developed from open-source code. For feature point detection and matching, this thesis uses the SURF algorithm, which performs well in detecting feature points; however, overly similar feature points can produce incorrect matches, and these mismatches seriously affect the subsequent stages of the system. Since this thesis uses a binocular stereo vision system, in which the epipolar constraint relates the two views, the epipolar constraint is introduced into the matching step, raising the matching accuracy by 25% and halving the matching time.
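As a concrete illustration, the following is a minimal Python/OpenCV sketch of epipolar-constrained matching: for each left-image SURF keypoint, only right-image keypoints lying near its epipolar line are considered as match candidates, which is what yields both the accuracy and the speed gains described above. The fundamental matrix F is assumed to come from the rig calibration; the pixel tolerance tol and the SURF threshold are illustrative values, not the thesis's.

import cv2
import numpy as np

def surf_features(img):
    # SURF is provided by opencv-contrib (cv2.xfeatures2d); threshold illustrative
    surf = cv2.xfeatures2d.SURF_create(hessianThreshold=400)
    return surf.detectAndCompute(img, None)

def match_with_epipolar(kp_l, des_l, kp_r, des_r, F, tol=1.5):
    """Nearest-descriptor matching restricted to candidates near the epipolar line."""
    pts_r = np.array([[*kp.pt, 1.0] for kp in kp_r])      # homogeneous right points
    matches = []
    for i, kp in enumerate(kp_l):
        line = F @ np.array([*kp.pt, 1.0])                # epipolar line in the right image
        d = np.abs(pts_r @ line) / np.hypot(line[0], line[1])
        cand = np.flatnonzero(d < tol)                    # search only near the line
        if cand.size == 0:
            continue
        dists = np.linalg.norm(des_r[cand] - des_l[i], axis=1)
        matches.append((i, int(cand[np.argmin(dists)])))  # (left index, right index)
    return matches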
For the binocular vision system, this system uses a non-parallel configuration, whose positioning accuracy and overlapping field of view are superior to those of a parallel configuration. Many derivations of the positioning equations exist in the literature, but they all assume an ideal mechanism and neglect the errors introduced by mechanism fabrication and installation. This thesis therefore derives a binocular positioning model that accounts for mechanism errors, reducing the positioning error of the system by 2 mm. In the fitting stage, the object shape is known from the deep-learning classification result. A circular object can be fitted from at least three scattered points, so this thesis fits circles with the least-squares method; because a few incorrect matching pairs remain, the RANSAC algorithm is added to remove outliers and improve accuracy. For cuboid objects, SURF finds feature points effectively on planar surfaces but misses the object contour. To improve accuracy, before fitting, the object is extracted with an HSV-based method and its contour is found with connected-component labeling; the convex hull then yields the object's corner points. The corner points from the two images are converted into three-dimensional coordinates and added to the scattered points for fitting. Since a cuboid presents two orthogonal planes, the point cloud is compressed onto the X-Z plane, and two orthogonal straight lines are fitted with the least-squares method and the RANSAC algorithm.
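To make the circle-fitting step concrete, the following minimal Python sketch combines an algebraic (Kasa) least-squares circle fit with a RANSAC loop, under the assumption that the matched 3-D points have already been projected onto the plane of the circular cross-section; the iteration count and inlier tolerance are illustrative, not values from the thesis.

import numpy as np

def fit_circle_lsq(pts):
    """Kasa fit: solve x^2 + y^2 + D*x + E*y + F = 0 in the least-squares sense."""
    A = np.column_stack([pts[:, 0], pts[:, 1], np.ones(len(pts))])
    b = -(pts[:, 0] ** 2 + pts[:, 1] ** 2)
    (D, E, F), *_ = np.linalg.lstsq(A, b, rcond=None)
    center = np.array([-D / 2.0, -E / 2.0])
    radius = np.sqrt(center @ center - F)
    return center, radius

def fit_circle_ransac(pts, n_iter=200, tol=1.0, seed=0):
    """Sample 3 points per iteration, keep the largest inlier set, refit on it."""
    rng = np.random.default_rng(seed)
    best = None
    for _ in range(n_iter):
        center, radius = fit_circle_lsq(pts[rng.choice(len(pts), 3, replace=False)])
        inliers = np.abs(np.linalg.norm(pts - center, axis=1) - radius) < tol
        if best is None or inliers.sum() > best.sum():
            best = inliers
    return fit_circle_lsq(pts[best])   # the grasping point is the fitted centre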
Finally, this study uses simultaneous localization and mapping (SLAM) for localization and gives the manipulator an approximate initial posture, so that the autonomous grasping system can see the target object before execution. The experimental results show that the system takes 0.8 seconds to grasp a target, with a grasping error of less than 5 mm; since the gripper's minimum allowable error is 10 mm, the autonomous grasping system can grasp successfully.


The purpose of this study is to develop an autonomous grasping system for the manipulator of a mobile robot, which enables the mobile robot to grasp textured cylinders, cones, and cubes that can be detected by YOLO in different complex environments. The autonomous grasping in this study is divided into four stages: object classification and grasping-position detection, feature detection and matching between the two images, three-dimensional positioning, and shape fitting. With new deep learning technology advancement, object classification and grasping-position detection can easily be developed from open-source code. In this study, the SURF algorithm was used to detect and match the feature points of the left and right images. The algorithm performed well in detecting feature points; however, matching pairs could be wrong because of the similarity of the feature points, and wrong matching pairs have a serious impact on the rest of the system. Since this study used a binocular stereo vision (BSV) system, in which the epipolar constraint relates the left and right images, the epipolar constraint was added to feature matching, increasing the accuracy rate by 25% and reducing the matching time by half. For the BSV system, a non-parallel configuration is used; its visual positioning accuracy and the overlapping field of view of the two cameras are both superior to the parallel configuration. There are many derivations of the positioning equation in the literature; however, they were all under the premise of an ideal mechanism, without considering mechanism construction and installation errors. Therefore, this study derived a BSV positioning model with mechanism errors, which reduces the positioning error of the system by 2 mm.
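The thesis derives its own closed-form positioning model with mechanism-error terms; as a reference point, the following Python/OpenCV sketch shows the standard projective equivalent, where the non-parallel geometry (including fabrication and installation errors) is absorbed into the calibrated rotation R and translation t of the right camera rather than assumed ideal.

import cv2
import numpy as np

def triangulate(K_l, K_r, R, t, pts_l, pts_r):
    """Triangulate matched pixel points from a calibrated, non-parallel rig.
    K_l, K_r: 3x3 intrinsics; R, t: right-camera pose w.r.t. the left camera,
    as measured by calibration (so mounting errors are absorbed into R and t).
    pts_l, pts_r: Nx2 arrays of matched pixel coordinates."""
    P_l = K_l @ np.hstack([np.eye(3), np.zeros((3, 1))])       # left camera at the origin
    P_r = K_r @ np.hstack([R, t.reshape(3, 1)])                # measured non-parallel pose
    X_h = cv2.triangulatePoints(P_l, P_r,
                                pts_l.T.astype(float), pts_r.T.astype(float))
    return (X_h[:3] / X_h[3]).T                                # Nx3 points, left-camera frame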
In the fitting part, the shape of the object is known from the deep-learning classification. If the target is a cylindrical or conical object, a circle can be fitted from at least three points; therefore, the least-squares method was applied in this study to fit the circle. Nevertheless, a few wrong matching pairs remain, so the RANSAC algorithm was added to eliminate the erroneous points and increase the accuracy; the grasping point is then the circle center. If the target is a cubic object, SURF easily finds feature points on the planar faces but misses the corners of the object. To increase the accuracy, the HSV method extracts the object and the object contour is found through connected-component labeling before fitting. Further, the convex hull of each image is used to find the corner points of the object. Next, the three-dimensional coordinates of the corner points in the two images are calculated and added to the scattered points for fitting. The cubic object presents two orthogonal planes; hence, after compressing the point cloud onto the X-Z plane, the least-squares method and the RANSAC algorithm are used to fit two orthogonal straight lines, and the grasping point is obtained as the intersection of the two mid-perpendicular lines. Since the grasping point is expressed in the image coordinate system, it must be transformed into the manipulator coordinate system; the transformation matrix is calculated from the relationship between the camera and the end of the manipulator. Finally, simultaneous localization and mapping (SLAM) was used for localization, and an approximate initial posture of the manipulator was given, which allows the autonomous grasping system to see the target object before execution. The experimental results show that the system takes 0.8 seconds to grasp a target, and the grasping error is less than 5 mm, within the 10 mm acceptable error of the gripper, so the system can grasp successfully.
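A minimal Python sketch of the cubic-object step is given below, assuming the corner and feature points have already been compressed onto the X-Z plane: RANSAC selects each edge's inliers, a total-least-squares refit gives its direction, and the grasping point is the intersection of the two mid-perpendiculars. The sketch fits the two edges sequentially and does not explicitly enforce their orthogonality, so it may differ in detail from the thesis's procedure; tolerances are illustrative.

import numpy as np

def ransac_line(pts, n_iter=200, tol=2.0, seed=0):
    """RANSAC + total-least-squares 2-D line fit.
    Returns (unit direction, centroid of inliers, inlier mask)."""
    rng = np.random.default_rng(seed)
    best_inl = None
    for _ in range(n_iter):
        a, b = pts[rng.choice(len(pts), 2, replace=False)]
        d = b - a
        n = np.array([-d[1], d[0]]) / (np.linalg.norm(d) + 1e-12)  # line normal
        inl = np.abs((pts - a) @ n) < tol
        if best_inl is None or inl.sum() > best_inl.sum():
            best_inl = inl
    c = pts[best_inl].mean(axis=0)
    _, _, vt = np.linalg.svd(pts[best_inl] - c)       # least-squares direction
    return vt[0], c, best_inl

def cube_grasp_point(pts_xz):
    """Intersect the mid-perpendiculars of two orthogonal cube edges."""
    d1, m1, inl1 = ransac_line(pts_xz)
    d2, m2, _ = ransac_line(pts_xz[~inl1])            # second edge from what is left
    n1 = np.array([-d1[1], d1[0]])                    # mid-perpendicular directions
    n2 = np.array([-d2[1], d2[0]])
    # solve m1 + s0*n1 = m2 + s1*n2 for the intersection point
    s = np.linalg.solve(np.column_stack([n1, -n2]), m2 - m1)
    return m1 + s[0] * n1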

Advisor Recommendation Letter
Oral Defense Committee Certification
Acknowledgments
Abstract (Chinese)
Abstract
List of Tables
List of Figures
Nomenclature
Chapter 1 Introduction
  1.1 Background and Motivation
  1.2 Research Purposes
  1.3 Literature Review
    1.3.1 Related Studies on Binocular Stereo Vision
    1.3.2 Related Studies on Camera Calibration
    1.3.3 Related Studies on Feature Point Algorithm
    1.3.4 Related Studies on Stereo Vision and Reconstruction
  1.4 Thesis Structure
Chapter 2 System Architecture and Research Methods
  2.1 System Architecture
  2.2 Hardware Architecture
    2.2.1 Mobile Platform Hardware Design
    2.2.2 Hardware Equipment Introduction
  2.3 Robot Operating System
  2.4 Simultaneous Localization and Mapping Technology
Chapter 3 Binocular Stereo Vision (BSV) System
  3.1 Point Location Model
  3.2 Camera Calibration
Chapter 4 Autonomous Grasping System
  4.1 Automatic Sample Training System
    4.1.1 Automatic Extraction of Object and Background Fusion Algorithm
    4.1.2 Neural Network Dataset Construction
    4.1.3 Double-layer YOLO Recognition Network Architecture
  4.2 Speeded Up Robust Features (SURF)
    4.2.1 Feature Point Detection
    4.2.2 Feature Point Description
    4.2.3 Feature Point Matching
    4.2.4 Feature Point Matching Based on Epipolar Constraint
  4.3 Estimation of Grasping Position
    4.3.1 Least Square Method
    4.3.2 Grasping of Circular Object
    4.3.3 Grasping of Square Cylinder Object
Chapter 5 Experimental Results and Analysis
  5.1 BSV System Location
  5.2 Feature Point Matching Method Based on Epipolar Constraint
  5.3 Autonomous Grasping Experiment
  5.4 Pick-and-placement Experiment
Chapter 6 Conclusions and Future Work
  6.1 Conclusion
  6.2 Future Work
References

[1] H. C. Longuet-Higgins, “A computer algorithm for reconstructing a scene from two projections,” Nature, vol. 293, no. 5828, pp. 133-135, 1981.
[2] L. Qiong, Q. Xiansheng, Y. Shenshun and H. Feng, “Structural Parameters Optimal Design and Accuracy Analysis for Binocular Vision Measure System,” IEEE/ASME International Conference on Advanced Intelligent Mechatronics, pp. 156-161, 2008.
[3] L. Yang, B. Wang, R. Zhang, H. Zhou and R. Wang, “Analysis on Location Accuracy for the Binocular Stereo Vision System,” IEEE Photonics Journal, vol. 10, no. 1, pp. 1-16, Feb. 2018.
[4] A. Kapadia, D. Braganza, D. M. Dawson and M. L. McIntyre, “Adaptive camera calibration with measurable position of fixed features,” American Control Conference, pp. 3869-3874, 2008.
[5] A. Fetić, D. Jurić and D. Osmanković, “The procedure of a camera calibration using camera calibration toolbox for MATLAB,” Proceedings of the 35th International Convention MIPRO, pp. 1752-1757, 2012.
[6] L. Song, W. Wu, J. Guo and X. Li, “Survey on camera calibration technique,” 5th International Conference on Intelligent Human-Machine Systems and Cybernetics, vol. 2, pp. 389-392, 2013.
[7] F. Pirahansiah, S. N. H. S. Abdullah and S. Sahran, “Camera calibration for multi-modal robot vision based on image quality assessment,” 10th Asian Control Conference (ASCC), pp. 1-6, 2015.
[8] F. Jin and X. Wang, “An autonomous camera calibration system based on the theory of minimum convex hull,” Fifth International Conference on Instrumentation and Measurement, Computer, Communication and Control (IMCCC), pp. 857-860, 2015.
[9] D. G. Lowe, “Object recognition from local scale-invariant features,” Proceedings of the Seventh IEEE International Conference on Computer Vision, vol. 2, pp. 1150-1157, 1999.
[10] H. Bay, A. Ess, T. Tuytelaars and L. Van Gool, “SURF: Speeded Up Robust Features,” Computer Vision and Image Understanding, vol. 110, no. 3, pp. 346-359, 2008.
[11] H. Sheng, S. Wei, X. Yu and L. Tang, “Research on Binocular Visual System of Robotic Arm Based on Improved SURF Algorithm,” IEEE Sensors Journal, vol. 20, no. 20, pp. 11849-11855, 2019.
[12] Y. M. Mustafah, R. Noor, H. Hasbi and A. W. Azma, “Stereo vision images processing for real-time object distance and size measurements,” International Conference on Computer and Communication Engineering (ICCCE), pp. 659-663, 2012.
[13] P. Ondrúška, P. Kohli and S. Izadi, “MobileFusion: real-time volumetric surface reconstruction and dense tracking on mobile phones,” IEEE Transactions on Visualization and Computer Graphics, vol. 21, no. 11, pp. 1251-1258, 2015.
[14] M. Yuda, Z. Xiangjun, S. Weiming and L. Shaofeng, “Target accurate positioning based on the point cloud created by stereo vision,” 23rd International Conference on Mechatronics and Machine Vision in Practice (M2VIP), pp. 1-5, 2016.
[15] S. B. Mane and S. Vhanale, “Real time obstacle detection for mobile robot navigation using stereo vision,” International Conference on Computing, Analytics and Security Trends (CAST), pp. 637-642, 2016.
[16] M. Liu, C. Shan, H. Zhang and Q. Xia, “Stereo vision based road free space detection,” 9th International Symposium on Computational Intelligence and Design (ISCID), vol. 2, pp. 272-276, 2016.
[17] D. Chung, S. Hong and J. Kim, “Underwater pose estimation relative to planar hull surface using stereo vision,” IEEE Underwater Technology (UT), pp. 1-4, 2017.
[18] K. A. Yu, “Development of a Deep Learning Classification Approach for Indoor Personal Image Servo Manipulation Objects,” Master's thesis, National Taiwan University of Science and Technology, July 2017.
[19] Y. J. Lin, “Emulating Human Grasp Strategies for Identifying Manipulator Grasp Pose and Position with RGB-D Images,” Master's thesis, National Taiwan University of Science and Technology, July 2020.
[20] J. Redmon and A. Farhadi, “YOLOv3: An Incremental Improvement,” arXiv preprint arXiv:1804.02767, 2018.

Full-text release date: 2024/07/10 (campus network)
Full text not authorized for release (off-campus network)
Full text not authorized for release (National Central Library: Taiwan NDLTD system)