
Graduate Student: 邱庭威 (Ting-Wei Chiu)
Thesis Title: 基於特徵金字塔與轉譯器於高解析度影像之微小瑕疵偵測
Study on Feature Pyramid Network with the Use of Transformer for Tiny Defect Detection in High Resolution Images
Advisor: 蘇順豐 (Shun-Feng Su)
Committee Members: 郭重顯 (Chung-Hsien Kuo), 黃有評 (Yo-Ping Huang), 陳美勇 (Mei-Yung Chen), 莊鎮嘉 (Chen-Chia Chuang)
Degree: Master
Department: College of Electrical Engineering and Computer Science - Department of Electrical Engineering
Publication Year: 2021
Academic Year: 109
Language: English
Pages: 72
Chinese Keywords: 視網膜網路 (RetinaNet), 轉譯器 (Transformer), 自注意力機制 (Self-attention), 瑕疵檢測 (Defect Detection), 小物件偵測 (Tiny Object Detection), 網格分類 (Grid-based Classification)
Foreign Keywords: RetinaNet, Transformer, Self-attention, Defect Detection, Tiny Object Detection, Grid-based Classification
Hits: 193; Downloads: 0
  • This study applies image classification and object detection methods to tiny defect detection in high-resolution images of consumer electronics, and uses a Feature Pyramid Network (FPN) together with a Transformer to improve detection performance. The product image dataset was provided by Garmin Ltd.; the images show the brushed surfaces of high-end smart watches and were captured on the product inspection line. In these high-resolution images of 4608×3288 pixels, the defect locations were annotated by YOLOv4 rather than by hand, so the dataset contains a certain degree of labeling error. To ensure that the reported results are correct, the images in the testing set were relabeled for verification. Because the defects in the images are tiny, it is difficult to resize the images for experiments. This study therefore first adopted a grid-based classification approach with a Residual Neural Network (ResNet) to avoid resizing. However, the large variation among defect-free grids prevented the grid-based approach from performing well. An object-detection approach, RetinaNet, then served as the baseline model. To further improve detection, the proposed method employs dual attention mechanisms drawn from the Bottleneck Transformer (BoTNet) and the Residual Split-Attention Network (ResNeSt). After adding Multi-Head Self-Attention and Split-Attention on top of the pre-trained weights, the proposed model achieves 0.2101 mAP and 0.9167 recall on the relabeled testing set, showing that the proposed architecture outperforms the baseline model on this dataset.
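The grid-based pre-processing described in the abstract can be sketched as follows. This is a minimal illustration rather than the thesis's code: the 224×224 cell size and the `tile_image` helper are assumptions introduced here for demonstration.

```python
import numpy as np

def tile_image(image, grid=(224, 224)):
    """Split a large image into fixed-size, non-overlapping grid cells.

    Sketch of the grid-based classification pre-processing: each cell
    would be fed to a ResNet classifier instead of resizing the whole
    image. The 224x224 cell size is an assumed value, not from the thesis.
    """
    h, w = image.shape[:2]
    gh, gw = grid
    tiles = []
    for y in range(0, h - gh + 1, gh):          # full rows of cells only
        for x in range(0, w - gw + 1, gw):      # full columns of cells only
            tiles.append(((y, x), image[y:y + gh, x:x + gw]))
    return tiles

# A 4608x3288 image yields 20 x 14 = 280 full 224x224 cells.
img = np.zeros((3288, 4608, 3), dtype=np.uint8)
tiles = tile_image(img)
print(len(tiles))  # 280
```

Note that non-overlapping cells discard the border pixels that do not fill a full cell; in practice an overlap or padding scheme would be needed so defects on cell boundaries are not missed.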


    This study considers image classification and object detection methods for tiny defect detection in high-resolution images of consumer electronics, and aims to use a Feature Pyramid Network (FPN) together with a Transformer to improve detection performance. The dataset of product images was provided by Garmin Ltd. The images show high-end smart watches with brushed surfaces and were all taken on the assembly line. They are high-resolution images of 4608×3288 pixels, with the defect positions marked by YOLOv4 instead of manual labeling; for this reason, labeling errors occur in this dataset. To make sure the results obtained are correct, the testing dataset was relabeled for evaluation. Moreover, the defects in these images are so tiny that it is hard to resize the images for experiments. In this study, a grid-based classification technique with a Residual Neural Network (ResNet) was first adopted to avoid resizing images. Due to the massive disparity among normal (defect-free) grids, however, the grid-based classification technique did not work well. As an object-detection-based approach, RetinaNet is then used as the baseline model for defect detection. In addition, dual attention mechanisms adopted from the Bottleneck Transformer (BoTNet) and the Residual Split-Attention Network (ResNeSt) are employed in the object detection method to improve detection performance. After adding Multi-Head Self-Attention and Split-Attention into the pre-trained weights, the proposed model achieves 0.2101 mAP and 0.9167 recall on the relabeled testing dataset. This shows that the proposed architecture outperforms the baseline model on this dataset.
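The Multi-Head Self-Attention mechanism that the abstract adds to the backbone can be illustrated with a small NumPy sketch. This is not the thesis's implementation: the projection weights below are random stand-ins for learned parameters, and BoTNet's relative position encoding is omitted for brevity.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def mhsa_2d(feat, num_heads=4, rng=None):
    """Multi-head self-attention over a flattened 2D feature map.

    feat: (H, W, C) feature map; each spatial position becomes a token,
    as in BoTNet-style attention. Random weights replace learned ones,
    and relative position encodings are omitted (illustration only).
    """
    rng = np.random.default_rng(0) if rng is None else rng
    H, W, C = feat.shape
    assert C % num_heads == 0
    d = C // num_heads
    x = feat.reshape(H * W, C)                    # one token per position
    Wq, Wk, Wv = (rng.standard_normal((C, C)) / np.sqrt(C) for _ in range(3))
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    # split channels into heads: (heads, tokens, d)
    split = lambda t: t.reshape(H * W, num_heads, d).transpose(1, 0, 2)
    q, k, v = split(q), split(k), split(v)
    attn = softmax(q @ k.transpose(0, 2, 1) / np.sqrt(d))  # (heads, N, N)
    out = (attn @ v).transpose(1, 0, 2).reshape(H, W, C)   # merge heads
    return out

out = mhsa_2d(np.ones((8, 8, 16)))
print(out.shape)  # (8, 8, 16)
```

Because every token attends to every other, the cost grows quadratically with H×W, which is why such attention is typically inserted only in the low-resolution, late stages of the backbone rather than applied to the full 4608×3288 input.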

    中文摘要
    Abstract
    致謝 (Acknowledgments)
    Table of Contents
    List of Figures
    List of Tables
    Chapter 1 Introduction
      1.1 Background
      1.2 Motivation and Data Introduction
      1.3 Contributions
      1.4 Thesis Organization
    Chapter 2 Related Work
      2.1 Object Segmentation
      2.2 Grid-based Classification
      2.3 Object Detection
    Chapter 3 Methodology
      3.1 Background
      3.2 Structure
        3.2.1 Grid-based Classification
        3.2.2 Object Detection
      3.3 Data Pre-processing
        3.3.1 Grid-based Classification
        3.3.2 Object Detection
      3.4 Network and Backbone Architecture
        3.4.1 ResNet
        3.4.2 ResNeXt
        3.4.3 ResNeSt
        3.4.4 BotNet
        3.4.5 BotNeSt
      3.5 Modified and Redesigned Architecture
        3.5.1 Primitive Design
        3.5.2 RetinaNet with P2 Level
        3.5.3 RetinaNet with Different Backbone
      3.6 Loss Functions
    Chapter 4 Experiments
      4.1 Dataset
      4.2 Evaluation Metric
      4.3 Experiments
        4.3.1 Grid-based Classification
        4.3.2 Object Detection
      4.4 Implementation Details
      4.5 Environment
      4.6 Results and Analysis
        4.6.1 Grid-based Classification
        4.6.2 Object Detection
    Chapter 5 Conclusions and Future Work
      5.1 Conclusions
      5.2 Future Work
        5.2.1 Grid-based Classification
        5.2.2 Object Detection
    References

    [1] Y. Cheng, D. HongGui, and F. YuXin, "Effects of faster region-based convolutional neural network on the detection efficiency of rail defects under machine vision," IEEE 5th information technology and mechatronics engineering conference (ITOEC), pp. 1377-1380, 2020.
    [2] L. Shang, Q. Yang, J. Wang, S. Li, and W. Lei, "Detection of rail surface defects based on CNN image recognition and classification," IEEE 20th international conference on advanced communication technology (ICACT), pp. 45-51, 2018.
    [3] D. Amin and S. Akhter, "Deep learning-based defect detection system in steel sheet surfaces," IEEE region 10 symposium (TENSYMP), pp. 444-448, 2020.
    [4] S. Cheon, H. Lee, C. O. Kim, and S. H. Lee, "Convolutional neural network for wafer surface defect classification and the detection of unknown defect class," IEEE transactions on semiconductor manufacturing, vol. 32, no. 2, pp. 163-170, 2019.
    [5] T.-Y. Lin, P. Dollár, R. Girshick, K. He, B. Hariharan, and S. Belongie, "Feature pyramid networks for object detection," Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 2117-2125, 2017.
    [6] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser, and I. Polosukhin, "Attention is all you need," Advances in neural information processing systems, pp. 5998-6008, 2017.
    [7] A. Bochkovskiy, C.-Y. Wang, and H.-Y. M. Liao, "YOLOv4: Optimal speed and accuracy of object detection," arXiv preprint arXiv:2004.10934, 2020.
    [8] T.-Y. Lin, P. Goyal, R. Girshick, K. He, and P. Dollár, "Focal loss for dense object detection," Proceedings of the IEEE international conference on computer vision, pp. 2980-2988, 2017.
    [9] X. Wang and Z. Hu, "Grid-based pavement crack analysis using deep learning," IEEE international conference on transportation information and safety (ICTIS), pp. 917-924, 2017.
    [10] A. Srinivas, T.-Y. Lin, N. Parmar, J. Shlens, P. Abbeel, and A. Vaswani, "Bottleneck transformers for visual recognition," Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp. 16519-16529, 2021.
    [11] H. Zhang, C. Wu, Z. Zhang, Y. Zhu, Z. Zhang, H. Lin, Y. Sun, T. He, J. Mueller, and R. Manmatha, "ResNeSt: Split-attention networks," arXiv preprint arXiv:2004.08955, 2020.
    [12] O. Ronneberger, P. Fischer, and T. Brox, "U-net: convolutional networks for biomedical image segmentation," Medical image computing and computer-assisted intervention (MICCAI), pp. 234-241, 2015.
    [13] L. Breiman, "Random forests," Machine learning, vol. 45, no. 1, pp. 5-32, 2001.
    [14] D. Racki, D. Tomazevic, and D. Skocaj, "A compact convolutional neural network for textured surface anomaly detection," IEEE winter conference on applications of computer vision (WACV), pp. 1331-1339, 2018.
    [15] D. Tabernik, S. Šela, J. Skvarč, and D. Skočaj, "Segmentation-based deep-learning approach for surface-defect detection," Journal of intelligent manufacturing, vol. 31, no. 3, pp. 759-776, 2020.
    [16] Z. Li, J. Li, and W. Dai, "A two-stage multiscale residual attention network for light guide plate defect detection," IEEE access, vol. 9, pp. 2780-2792, 2020.
    [17] S. Woo, J. Park, J.-Y. Lee, and I. S. Kweon, "CBAM: convolutional block attention module," Proceedings of the european conference on computer vision (ECCV), pp. 3-19, 2018.
    [18] J. Park, S. Woo, J.-Y. Lee, and I. S. Kweon, "BAM: bottleneck attention module," arXiv preprint arXiv:1807.06514, 2018.
    [19] J. Fu, J. Liu, H. Tian, Y. Li, Y. Bao, Z. Fang, and H. Lu, "Dual attention network for scene segmentation," Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp. 3146-3154, 2019.
    [20] J. Shlens, "A tutorial on principal component analysis," arXiv preprint arXiv:1404.1100, 2014.
    [21] K. He, X. Zhang, S. Ren, and J. Sun, "Deep residual learning for image recognition," Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 770-778, 2016.
    [22] K. Simonyan and A. Zisserman, "Very deep convolutional networks for large-scale image recognition," arXiv preprint arXiv:1409.1556, 2014.
    [23] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich, "Going deeper with convolutions," Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 1-9, 2015.
    [24] S. Xie, R. Girshick, P. Dollár, Z. Tu, and K. He, "Aggregated residual transformations for deep neural networks," Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 1492-1500, 2017.
    [25] J. Redmon, S. Divvala, R. Girshick, and A. Farhadi, "You only look once: unified, real-time object detection," Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 779-788, 2016.
    [26] W. Liu, D. Anguelov, D. Erhan, C. Szegedy, S. Reed, C.-Y. Fu, and A. C. Berg, "SSD: Single shot multibox detector," European conference on computer vision, pp. 21-37, 2016.
    [27] S. Ren, K. He, R. Girshick, and J. Sun, "Faster R-CNN: towards real-time object detection with region proposal networks," IEEE transactions on pattern analysis and machine intelligence, vol. 39, no. 6, pp. 1137-1149, 2016.
    [28] E. H. Adelson, C. H. Anderson, J. R. Bergen, P. J. Burt, and J. M. Ogden, "Pyramid methods in image processing," RCA engineer, vol. 29, no. 6, pp. 33-41, 1984.
    [29] L. Xiao, B. Wu, and Y. Hu, "Surface defect detection using image pyramid," IEEE sensors journal, vol. 20, no. 13, pp. 7181-7188, 2020.
    [30] K. Li, X. Wang, and L. Ji, "Application of multi-scale feature fusion and deep learning in detection of steel strip surface defect," International conference on artificial intelligence and advanced manufacturing (AIAM), pp. 656-661, 2019.
    [31] X. Cheng and J. Yu, "RetinaNet with difference channel attention and adaptively spatial feature fusion for steel surface defect detection," IEEE transactions on instrumentation and measurement, vol. 70, pp. 1-11, 2020.
    [32] A. Neubeck and L. Van Gool, "Efficient non-maximum suppression," 18th International conference on pattern recognition (ICPR'06), vol. 3, pp. 850-855, 2006.
    [33] X. Qin, Z. Zhang, C. Huang, M. Dehghan, O. R. Zaiane, and M. Jagersand, "U2-Net: Going deeper with nested U-structure for salient object detection," Pattern recognition, vol. 106, p. 107404, 2020.
    [34] T. He, Z. Zhang, H. Zhang, Z. Zhang, J. Xie, and M. Li, "Bag of tricks for image classification with convolutional neural networks," Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp. 558-567, 2019.
    [35] J. Hu, L. Shen, and G. Sun, "Squeeze-and-excitation networks," Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 7132-7141, 2018.
    [36] X. Li, W. Wang, X. Hu, and J. Yang, "Selective kernel networks," Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp. 510-519, 2019.
    [37] X. Wang, R. Girshick, A. Gupta, and K. He, "Non-local neural networks," Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 7794-7803, 2018.
    [38] P. J. Huber, "Robust estimation of a location parameter," Breakthroughs in statistics, pp. 492-518, 1992.
    [39] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei, "Imagenet: A large-scale hierarchical image database," IEEE conference on computer vision and pattern recognition, pp. 248-255, 2009.
    [40] D. P. Kingma and J. Ba, "Adam: A method for stochastic optimization," arXiv preprint arXiv:1412.6980, 2014.
    [41] R. R. Selvaraju, M. Cogswell, A. Das, R. Vedantam, D. Parikh, and D. Batra, "Grad-CAM: visual explanations from deep networks via gradient-based localization," Proceedings of the IEEE international conference on computer vision, pp. 618-626, 2017.
    [42] O. Chapelle, B. Scholkopf, and A. Zien, "Semi-supervised learning," IEEE transactions on neural networks, vol. 20, no. 3, pp. 542-542, 2009.

    Full Text Available From: 2026/09/06 (campus network)
    Full Text Available From: 2028/09/06 (off-campus network)
    Full Text Available From: 2028/09/06 (National Central Library: Taiwan NDLTD system)