Graduate Student: Zih-Fong Huang (黃子峰)
Thesis Title: 3D K-mer-based Object Recognition and Reconstruction for Robotic Grasping of Spherical and Cylindrical Objects
Advisor: Po-Ting Lin (林柏廷)
Committee Members: Yu-Wei Wu (吳育瑋), Chi-Ying Lin (林紀穎), Yung-Yao Chen (陳永耀)
Degree: Master
Department: College of Engineering - Department of Mechanical Engineering
Year of Publication: 2023
Graduation Academic Year: 111 (ROC calendar; 2022-2023)
Language: Chinese
Number of Pages: 117
Keywords: 3D Point Cloud Reconstruction, Deep Learning, Object Recognition, Artificial Intelligence, Fuzzy Clustering


Abstract:

    In recent years, the rapid development of high-tech industries has confronted the automation industry with growing demand for diverse, customized products. Under the traditional mass-production model, goods were manufactured in large quantities, held in inventory, and then sold; today, however, new products are often introduced before existing inventory sells out, reducing the value of stored goods and causing losses, while demand for customization raises manufacturers' production costs. Manufacturing approaches that can handle diverse products are therefore the trend in the automation industry. Diverse products come in varied shapes, such as curved, irregular, and flat surfaces, and choosing a gripper suited to each shape improves operational efficiency.
    Robotic arms are among the most common equipment on automated production lines and are often paired with vision systems that assist positioning and enable more precise operations. Integrating a robotic arm with three-dimensional point cloud technology provides not only the object's appearance but also the position of its surface shell, allowing finer manipulation. However, point cloud data acquired by a scanning device from a single (monocular) viewpoint cannot accurately represent the spatial relationships of objects.
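    As a concrete illustration of the single-view acquisition step, the sketch below grabs one depth frame from a D455 and deprojects it into an (N, 3) point cloud using Intel's pyrealsense2 Python bindings. It is a minimal sketch assuming the librealsense SDK and a connected camera, not the exact acquisition pipeline used in the thesis.

        # Minimal single-view point cloud capture from a RealSense D455.
        # Illustrative only; the thesis's own acquisition pipeline is not shown here.
        import numpy as np
        import pyrealsense2 as rs

        pipeline = rs.pipeline()
        config = rs.config()
        config.enable_stream(rs.stream.depth, 640, 480, rs.format.z16, 30)
        pipeline.start(config)
        try:
            frames = pipeline.wait_for_frames()
            depth_frame = frames.get_depth_frame()

            # Deproject the depth image into XYZ vertices (meters, camera frame).
            pc = rs.pointcloud()
            points = pc.calculate(depth_frame)
            xyz = np.asanyarray(points.get_vertices()).view(np.float32).reshape(-1, 3)

            # Discard invalid returns: zero depth deprojects to the origin.
            xyz = xyz[np.any(xyz != 0.0, axis=1)]
            print(f"captured {xyz.shape[0]} points from one view")
        finally:
            pipeline.stop()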
    To address these challenges, this study uses an Intel RealSense Depth Camera D455 [1] to capture single-view point clouds of an object's surface shell. It combines three-dimensional K-mer encoding with deep learning to distinguish two shape classes, spheres and cylinders, and the recognized shape is used to select the gripper mechanism best suited to it. Recognition performance is evaluated with 10-fold cross-validation and four commonly used metrics (Accuracy, Precision, Recall, and F1-Score). The highest accuracy achieved is 99.67% (±0.67%), outperforming PointNet [2], VoxNet [3], LeNet [4], and AlexNet [5].
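    For reference, the 10-fold protocol and the four metrics above can be reproduced with scikit-learn. The sketch below is a hedged stand-in: X and y are hypothetical feature and label arrays, and LogisticRegression is only a placeholder for the thesis's 3D K-mer classifier.

        # 10-fold cross-validation reporting Accuracy, Precision, Recall, and F1.
        # X/y and the classifier are placeholders, not the thesis's K-mer network.
        import numpy as np
        from sklearn.datasets import make_classification
        from sklearn.linear_model import LogisticRegression
        from sklearn.metrics import (accuracy_score, f1_score,
                                     precision_score, recall_score)
        from sklearn.model_selection import StratifiedKFold

        X, y = make_classification(n_samples=600, n_features=32, random_state=0)

        scores = {"Accuracy": [], "Precision": [], "Recall": [], "F1-Score": []}
        for train_idx, test_idx in StratifiedKFold(
                n_splits=10, shuffle=True, random_state=0).split(X, y):
            clf = LogisticRegression(max_iter=1000).fit(X[train_idx], y[train_idx])
            y_pred = clf.predict(X[test_idx])
            scores["Accuracy"].append(accuracy_score(y[test_idx], y_pred))
            scores["Precision"].append(precision_score(y[test_idx], y_pred))
            scores["Recall"].append(recall_score(y[test_idx], y_pred))
            scores["F1-Score"].append(f1_score(y[test_idx], y_pred))

        for name, vals in scores.items():
            print(f"{name}: {np.mean(vals):.4f} (+/- {np.std(vals):.4f})")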
    The recognized shape then determines the corresponding reconstruction method, applied to the point cloud data obtained from the single view. The Fuzzy Clustering Lagrange Minimization formulation proposed in this study searches for the object's optimal geometric parameters, so that the reconstructed geometry depicts the object's spatial relationships more accurately. Reconstruction quality is quantified by a size error and an overlap error: the size error is the difference between the measured dimensions of the captured object and the computed geometric parameters, and the overlap error is derived from the difference between the captured object's volume and the volume of its overlap with the reconstructed object. For spheres, the minimum reconstruction error is 0.14% with a processing time of 0.348 s, a minimum centroid distance of 11.53 mm, and a minimum overlap error of 12.1%. For cylinders, the minimum average reconstruction error is 15.59% with a processing time of 949 s, a minimum centroid distance of 17.21 mm, and a minimum overlap error of 37%. Deploying the proposed shape recognition and reconstruction methods on production lines enables more effective adaptation to diverse product tasks.
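    By way of comparison, the sketch below fits a sphere to a partial point cloud with an ordinary linear least-squares fit and then computes the size-error and centroid-distance quantities reported above. This is a generic baseline under hypothetical ground-truth values, not the Fuzzy Clustering Lagrange Minimization method developed in Chapter 3.

        # Generic algebraic sphere fit plus size error and centroid distance.
        # NOT the thesis's Fuzzy Clustering Lagrange Minimization method.
        import numpy as np

        def fit_sphere(points):
            """Fit center c and radius r to (N, 3) points by solving
            2 p.c + (r^2 - |c|^2) = |p|^2 in the least-squares sense."""
            A = np.hstack([2.0 * points, np.ones((points.shape[0], 1))])
            b = np.sum(points**2, axis=1)
            sol, *_ = np.linalg.lstsq(A, b, rcond=None)
            center = sol[:3]
            radius = np.sqrt(sol[3] + center @ center)
            return center, radius

        # Hypothetical ground truth and noisy single-view (hemispherical) samples.
        true_center, true_radius = np.array([0.10, -0.05, 0.60]), 0.035  # meters
        rng = np.random.default_rng(0)
        theta = rng.uniform(0.0, np.pi / 2.0, 2000)   # only the visible half
        phi = rng.uniform(0.0, 2.0 * np.pi, 2000)
        pts = true_center + true_radius * np.stack(
            [np.sin(theta) * np.cos(phi),
             np.sin(theta) * np.sin(phi),
             -np.cos(theta)], axis=1)
        pts += rng.normal(scale=5e-4, size=pts.shape)  # simulated sensor noise

        center, radius = fit_sphere(pts)
        size_error = abs(radius - true_radius) / true_radius * 100.0       # percent
        centroid_distance = np.linalg.norm(center - true_center) * 1000.0  # mm
        print(f"size error: {size_error:.2f}%, centroid distance: {centroid_distance:.2f} mm")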

    Table of Contents:

    Abstract (Chinese)
    Abstract (English)
    Acknowledgments
    Table of Contents
    List of Figures
    List of Tables
    Nomenclature
    Chapter 1: Introduction
      1.1 Background
      1.2 Research Objectives
      1.3 Thesis Organization
    Chapter 2: Literature Review
      2.1 K-mer
      2.2 Grippers
      2.3 3D Point Clouds
        2.3.1 3D Point Cloud Segmentation
        2.3.2 3D Point Cloud Recognition
        2.3.3 3D Point Cloud Reconstruction
      2.4 Deep Learning Networks
        2.4.1 Neural Network Architecture
          2.4.1.1 Activation Functions
          2.4.1.2 Loss Functions
        2.4.2 Evaluation Metrics
        2.4.3 K-Fold Cross-Validation
        2.4.4 Convolutional Neural Networks (CNN)
        2.4.5 Octree-Based Convolutional Neural Networks (O-CNN)
    Chapter 3: Research Methods
      3.1 3D Point Cloud Preprocessing
        3.1.1 Database Point Cloud Processing
        3.1.2 Captured-Object Point Cloud Processing
          3.1.2.1 2D Image Processing
          3.1.2.2 3D Point Cloud Filtering
      3.2 3D K-mer Sampling
        3.2.1 3D K-mer Conversion
        3.2.2 Newton's Method
        3.2.3 Optimization Objective
        3.2.4 K-mer Encoding Parameters
      3.3 Deep Learning
      3.4 3D Point Cloud Reconstruction (Fuzzy Clustering Lagrange Minimization)
        3.4.1 Single-Attribute Geometric Object Reconstruction
        3.4.2 Multi-Attribute Geometric Object Reconstruction
    Chapter 4: Experimental Results
      4.1 Classifier Experimental Results
        4.1.1 Search Angle Comparison
        4.1.2 Kpoints Comparison
        4.1.3 Segmentation and Layering Parameter Comparison
        4.1.4 Model Comparison
          4.1.4.1 PointNet [2]
          4.1.4.2 VoxNet [3]
          4.1.4.3 LeNet [4]
          4.1.4.4 AlexNet [5]
          4.1.4.5 Model Results Comparison
      4.2 3D Reconstruction Experiments
        4.2.1 Spheres
        4.2.2 Cylinders
      4.3 Grasping Simulation Results
      4.4 Summary of Experimental Results
    Chapter 5: Conclusions and Future Work
      5.1 Conclusions
      5.2 Future Work
    References
    Appendix A

    References:

    [1] Intel, Intel® RealSense™ Depth Camera D455, URL: https://www.intelrealsense.com/depth-camera-d455/.
    [2] C. R. Qi, H. Su, K. Mo, L. J. Guibas, PointNet: Deep learning on point sets for 3D classification and segmentation, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 652-660, 2017.
    [3] D. Maturana, S. Scherer, VoxNet: A 3D convolutional neural network for real-time object recognition, 2015 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 922-928, 2015.
    [4] Y. LeCun, LeNet-5, convolutional neural networks, URL: http://yann.lecun.com/exdb/lenet, 20(5), 14, 2015.
    [5] F. N. Iandola, S. Han, M. W. Moskewicz, K. Ashraf, W. J. Dally, K. Keutzer, SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5 MB model size, arXiv preprint arXiv:1602.07360, 2016.
    [6] P. T. Lin, B. R. Lin, Fuzzy automatic contrast enhancement based on fuzzy C-means clustering in CIELAB color space, 2016 12th IEEE/ASME International Conference on Mechatronic and Embedded Systems and Applications (MESA), 1-10, 2016.
    [7] 黃正傑, Image Restoration Using an Enhancement Matrix Based on Convolutional Neural Networks, National Taiwan University of Science and Technology, 2022 (in Chinese).
    [8] J. Qiu, Q. Wu, G. Ding, Y. Xu, S. Feng, A survey of machine learning for big data processing, EURASIP Journal on Advances in Signal Processing, 2016, 1-16, 2016.
    [9] S. Karlin, C. Burge, Dinucleotide relative abundance extremes: a genomic signature, Trends in Genetics, 11(7), 283-290, 1995.
    [10] S. Karlin, Z.-Y. Zhu, K. D. Karlin, The extended environment of mononuclear metal centers in protein structures, Proceedings of the National Academy of Sciences, 94(26), 14225-14230, 1997.
    [11] J. Mrazek, S. Karlin, Detecting alien genes in bacterial genomes, Annals of the New York Academy of Sciences, 870(1), 314-329, 1999.
    [12] S. Kurtz, A. Narechania, J. C. Stein, D. Ware, A new method to compute K-mer frequencies and its application to annotate large repetitive plant genomes, BMC genomics, 9(1), 1-18, 2008.
    [13] C. Zhu, T. O. Delmont, T. M. Vogel, Y. Bromberg, Functional basis of microorganism classification, PLoS computational biology, 11(8), e1004472, 2015.
    [14] O. Ciferri, Spirulina, the edible microorganism, Microbiological reviews, 47(4), 551-578, 1983.
    [15] 姚佑達, An Intelligent Image Recognition Method Based on Two-Stage Multi-Fidelity Optimization, National Taiwan University of Science and Technology, 2020 (in Chinese).
    [16] 林新翔, A K-mer Deep Learning-Based Image Recognition Method for Rotated Images, National Taiwan University of Science and Technology, 2021 (in Chinese).
    [17] 張皓崴, Image Data Augmentation Based on K-mer Image Feature Generation, National Taiwan University of Science and Technology, 2021 (in Chinese).
    [18] 葉致和, A Study of Multi-Dimensional K-mer Recognition Techniques: 1D K-mer Signal Recognition and 2.5D K-mer Object Recognition, National Taiwan University of Science and Technology, 2022 (in Chinese).
    [19] P. V. P. Reddy, V. Suresh, A review on importance of universal gripper in industrial robot applications, Int. J. Mech. Eng. Robot. Res, 2(2), 255-264, 2013.
    [20] A. Milojević, S. Linß, Ž. Ćojbašić, H. Handroos, A novel simple, adaptive, and versatile soft-robotic compliant two-finger gripper with an inherently gentle touch, Journal of Mechanisms and Robotics, 13(1), 011015, 2021.
    [21] 洪揚, Design of an Underactuated Passively Adaptive Gripper with Two Grasping Modes and Three Postures, 2016 (in Chinese).
    [22] A. Nguyen, B. Le, 3D point cloud segmentation: A survey, 2013 6th IEEE conference on robotics, automation and mechatronics (RAM), 225-230, 2013.
    [23] A. D. Sappa, M. Devy, Fast range image segmentation by an edge detection strategy, Proceedings Third International Conference on 3-D Digital Imaging and Modeling, 292-299, 2001.
    [24] T. Rabbani, F. Van Den Heuvel, G. Vosselmann, Segmentation of point clouds using smoothness constraint, International archives of photogrammetry, remote sensing and spatial information sciences, 36(5), 248-253, 2006.
    [25] G. Vosselman, S. Dijkman, 3D building model reconstruction from point clouds and ground plans, International archives of photogrammetry remote sensing and spatial information sciences, 34(3/W4), 37-44, 2001.
    [26] R. Schnabel, R. Wahl, R. Klein, Efficient RANSAC for point-cloud shape detection, Computer Graphics Forum, 214-226, 2007.
    [27] A. Golovinskiy, T. Funkhouser, Min-cut based segmentation of point clouds, 2009 IEEE 12th International Conference on Computer Vision Workshops, ICCV Workshops, 39-46, 2009.
    [28] P. F. Felzenszwalb, D. P. Huttenlocher, Efficient graph-based image segmentation, International journal of computer vision, 59, 167-181, 2004.
    [29] D. Griffiths, J. Boehm, A review on deep learning techniques for 3D sensed data classification, Remote Sensing, 11(12), 1499, 2019.
    [30] Y. Zhang, M. Rabbat, A graph-cnn for 3d point cloud classification, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 6279-6283, 2018.
    [31] L. Alzubaidi, J. Zhang, A. J. Humaidi, A. Al-Dujaili, Y. Duan, O. Al-Shamma, J. Santamaría, M. A. Fadhel, M. Al-Amidie, L. Farhan, Review of deep learning: Concepts, CNN architectures, challenges, applications, future directions, Journal of big Data, 8, 1-74, 2021.
    [32] Z. Wu, S. Song, A. Khosla, F. Yu, L. Zhang, X. Tang, J. Xiao, 3d shapenets: A deep representation for volumetric shapes, Proceedings of the IEEE conference on computer vision and pattern recognition, 1912-1920, 2015.
    [33] B. Graham, L. Van der Maaten, Submanifold sparse convolutional networks, arXiv preprint arXiv:1706.01307, 2017.
    [34] H. Su, S. Maji, E. Kalogerakis, E. Learned-Miller, Multi-view convolutional neural networks for 3d shape recognition, Proceedings of the IEEE international conference on computer vision, 945-953, 2015.
    [35] M. Simonovsky, N. Komodakis, Dynamic edge-conditioned filters in convolutional neural networks on graphs, Proceedings of the IEEE conference on computer vision and pattern recognition, 3693-3702, 2017.
    [36] M. Zaheer, S. Kottur, S. Ravanbakhsh, B. Poczos, R. R. Salakhutdinov, A. J. Smola, Deep sets, Advances in neural information processing systems, 30, 2017.
    [37] Z. Ma, S. Liu, A review of 3D reconstruction techniques in civil engineering and their applications, Advanced Engineering Informatics, 37, 163-174, 2018.
    [38] H. Fan, H. Su, L. J. Guibas, A point set generation network for 3d object reconstruction from a single image, Proceedings of the IEEE conference on computer vision and pattern recognition, 605-613, 2017.
    [39] L. Mescheder, M. Oechsle, M. Niemeyer, S. Nowozin, A. Geiger, Occupancy networks: Learning 3d reconstruction in function space, Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 4460-4470, 2019.
    [40] M. A. Nielsen, Neural networks and deep learning, Determination Press, San Francisco, CA, USA, 2015.
    [41] B. Mahesh, Machine learning algorithms: a review, International Journal of Science and Research (IJSR), 9, 381-386, 2020.
    [42] C. M. Bishop, Neural networks and their applications, Review of scientific instruments, 65(6), 1803-1832, 1994.
    [43] P.-S. Wang, Y. Liu, Y.-X. Guo, C.-Y. Sun, X. Tong, O-cnn: Octree-based convolutional neural networks for 3d shape analysis, ACM Transactions On Graphics (TOG), 36(4), 1-11, 2017.
    [44] D. Meagher, Geometric modeling using octree encoding, Computer graphics and image processing, 19(2), 129-147, 1982.
    [45] Wikipedia, Octree, URL: https://zh.wikipedia.org/zh-tw/%E5%85%AB%E5%8F%89%E6%A0%91.
    [46] SolidWorks, SolidWorks, 2021.
    [47] Ansys, Ansys Workbench, 2022, URL: https://www.ansys.com/products/ansys-workbench.
    [48] L. Ding, A. Goshtasby, On the Canny edge detector, Pattern recognition, 34(3), 721-725, 2001.
