Graduate Student: 邱庭威 Ting-Wei Chiu
Thesis Title: 基於特徵金字塔與轉譯器於高解析度影像之微小瑕疵偵測 Study on Feature Pyramid Network with the Use of Transformer for Tiny Defect Detection in High Resolution Images
Advisor: 蘇順豐 Shun-Feng Su
Committee Members: 郭重顯 Chung-Hsien Kuo, 黃有評 Yo-Ping Huang, 陳美勇 Mei-Yung Chen, 莊鎮嘉 Chen-Chia Chuang
Degree: Master
Department: College of Electrical Engineering and Computer Science, Department of Electrical Engineering
Year of Publication: 2021
Academic Year of Graduation: 109
Language: English
Number of Pages: 72
Keywords: RetinaNet, Transformer, Self-attention, Defect Detection, Tiny Object Detection, Grid-based Classification
This study considers image classification and object detection methods for tiny defect detection in high-resolution images of consumer electronics, and aims to improve detection performance by using a Feature Pyramid Network (FPN) together with a Transformer. The dataset of product images was provided by Garmin Ltd.; the images show high-unit-price smart watches with brushed surfaces and were all taken on the product inspection line. These are high-resolution images of 4608×3288 pixels whose defect positions were marked by YOLOv4 rather than by human annotators, so the dataset contains a certain amount of labeling error. To make sure the results obtained are correct, the testing dataset was relabeled for evaluation. Moreover, the defects in these images are tiny, which makes it difficult to resize the images for the experiments. In this study, a grid-based classification technique with a Residual Neural Network (ResNet) was first adopted to avoid resizing the images; however, the large disparity among normal (defect-free) grids prevented this technique from working well. Among object-detection-based approaches, RetinaNet is used as the baseline model for defect detection. In addition, dual attention mechanisms, adopted from the Bottleneck Transformer and the Residual Split-Attention Network (ResNeSt), are employed in the object detection method to improve detection performance. After adding Multi-Head Self-Attention and Split-Attention to the pre-trained weights, the proposed model achieves 0.2101 mAP and 0.9167 Recall on the relabeled testing dataset, showing that the proposed architecture outperforms the baseline model on this dataset.
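The grid-based classification step described above splits each high-resolution image into grids so that every grid can be classified as defective or defect-free without resizing the original image. The sketch below illustrates the idea on a 4608×3288 image; the 512×512 grid size and zero-padding at the borders are illustrative assumptions, not parameters stated in the abstract.

```python
import numpy as np

def tile_image(image, grid_h, grid_w):
    """Split a high-resolution image into non-overlapping grids so each
    grid can be classified (defect / no-defect) without resizing.
    Borders that do not fill a full grid are zero-padded."""
    h, w = image.shape[:2]
    pad_h = (-h) % grid_h                       # rows needed to reach a multiple of grid_h
    pad_w = (-w) % grid_w                       # cols needed to reach a multiple of grid_w
    padding = [(0, pad_h), (0, pad_w)] + [(0, 0)] * (image.ndim - 2)
    padded = np.pad(image, padding)
    tiles = []
    for y in range(0, padded.shape[0], grid_h):
        for x in range(0, padded.shape[1], grid_w):
            tiles.append(padded[y:y + grid_h, x:x + grid_w])
    return tiles

# A 3288x4608 image with 512x512 grids: 7 grid rows x 9 grid columns.
img = np.zeros((3288, 4608), dtype=np.uint8)
tiles = tile_image(img, 512, 512)
print(len(tiles))  # 63
```

Each tile would then be fed to the ResNet classifier; as the abstract notes, the large appearance disparity among defect-free grids is what limited this approach on the brushed-surface images.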
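In a Bottleneck Transformer block, the 3×3 convolution of a ResNet bottleneck is replaced by multi-head self-attention over the spatial positions of the feature map. The following is a minimal numpy sketch of that attention step only, with random matrices standing in for learned projection weights and the relative position encoding omitted; it is an illustration of the mechanism, not the thesis's implementation.

```python
import numpy as np

def mhsa_2d(x, num_heads, rng):
    """Multi-head self-attention over the H*W spatial positions of a
    feature map x of shape (H, W, C), as used in place of the 3x3
    convolution in a Bottleneck Transformer block."""
    H, W, C = x.shape
    d = C // num_heads                                # per-head channel width
    tokens = x.reshape(H * W, C)                      # flatten the spatial grid
    Wq, Wk, Wv = (rng.standard_normal((C, C)) / np.sqrt(C) for _ in range(3))
    q, k, v = tokens @ Wq, tokens @ Wk, tokens @ Wv
    out = np.empty_like(tokens)
    for h in range(num_heads):                        # attention within each head
        qh, kh, vh = (m[:, h * d:(h + 1) * d] for m in (q, k, v))
        scores = qh @ kh.T / np.sqrt(d)               # scaled dot-product scores
        scores -= scores.max(axis=-1, keepdims=True)  # numerically stable softmax
        attn = np.exp(scores)
        attn /= attn.sum(axis=-1, keepdims=True)
        out[:, h * d:(h + 1) * d] = attn @ vh
    return out.reshape(H, W, C)

feat = np.random.default_rng(0).standard_normal((16, 16, 64))
y = mhsa_2d(feat, num_heads=4, rng=np.random.default_rng(1))
print(y.shape)  # (16, 16, 64)
```

Because attention cost grows quadratically with H*W, BoTNet applies this replacement only in the lowest-resolution ResNet stage, which also suits FPN-style backbones like the one used here.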