
Author: 莊涵宇 (Han-Yu Chuang)
Thesis Title: Multi-Stage Data Augmentation and Dilated Coordinate Attention Neural Network
Advisors: 花凱龍 (Kai-Lung Hua), 沈上翔 (Shan-Hsiang Shen)
Committee Members: 陳宜惠 (Yi-Hui Chen), 陳永耀 (Yung-Yao Chen), 花凱龍 (Kai-Lung Hua), 楊朝龍 (Chao-Lung Yang), 沈上翔 (Shan-Hsiang Shen)
Degree: Master
Department: College of Electrical Engineering and Computer Science - Department of Computer Science and Information Engineering
Publication Year: 2023
Academic Year of Graduation: 111
Language: English
Pages: 35
Chinese Keywords: Attention mechanism, Data augmentation, Multi-stage data augmentation, Anomaly detection, Dilated convolution, Supervised neural network
Keywords: Attention mechanism, Data augmentation, CutMix method, Anomaly detection, Dilated convolution, Supervised learning
Deep-learning-based anomaly detection plays an important role in many industrial applications, and using advanced defect-detection techniques to ensure production quality is a key task in manufacturing. However, because large amounts of labeled data covering the full range of defect conditions are scarce, existing defect-detection models are usually trained only on normal samples, which can lead to poor detection of anomalous samples. To address this problem, this thesis introduces two additional data augmentation techniques that simulate realistic defects, letting the model learn features closer to the real environment and improving its detection accuracy on anomalous samples. Specifically, this thesis improves the CutMix data augmentation technique so that image mixing can be performed within a single image, producing a new image more precisely; this is applied to defect detection and increases the generation of defect samples. The other augmentation method uses Perlin noise to encode knowledge of specific anomaly types: Perlin noise is an algorithm for generating random values, and it can synthesize anomalous samples close to those that may appear in the real world. In addition, this thesis adopts an attention mechanism combined with convolutional layers of different dilation rates, which lets the model focus on key regions and obtain a wider field of view while avoiding possible gradient-vanishing problems, thereby improving defect-detection accuracy. By combining these methods, this thesis achieves competitive results on a dataset in this field (VisA).
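As a rough illustration of the single-image mixing idea described above, the following is a minimal sketch: a patch of an image is pasted onto another random location of the same image, together with a mask marking the synthetic defect region. The function name and the patch-size fractions are illustrative assumptions, not the thesis's actual implementation.

```python
import numpy as np

def single_image_cutmix(image, rng=None, min_frac=0.1, max_frac=0.3):
    """Paste a randomly chosen patch of `image` onto another random
    location within the same image, simulating a local surface defect.

    A minimal single-image CutMix-style sketch; patch-size fractions
    are illustrative, not the thesis's values.
    """
    rng = np.random.default_rng() if rng is None else rng
    h, w = image.shape[:2]
    # Sample patch height/width as a fraction of the image size.
    ph = int(h * rng.uniform(min_frac, max_frac))
    pw = int(w * rng.uniform(min_frac, max_frac))
    # Source and destination top-left corners.
    sy, sx = rng.integers(0, h - ph), rng.integers(0, w - pw)
    dy, dx = rng.integers(0, h - ph), rng.integers(0, w - pw)
    out = image.copy()
    out[dy:dy + ph, dx:dx + pw] = image[sy:sy + ph, sx:sx + pw]
    # Binary mask marking the synthetic "defect" region for supervision.
    mask = np.zeros((h, w), dtype=np.uint8)
    mask[dy:dy + ph, dx:dx + pw] = 1
    return out, mask
```

Because the patch comes from the same image, the pasted region shares the product's texture and lighting, which is what makes the synthetic defect look plausible; the returned mask can serve as a pixel-level label for supervised training.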


Ensuring product quality is crucial in industrial production, and using advanced defect-detection technology to assure it is a vital task in manufacturing. However, due to the lack of large annotated datasets covering the full range of defect scenarios, current defect-detection models often rely solely on normal samples for training, which can lower detection performance on anomalous samples. To address this issue, this thesis introduces two data augmentation techniques capable of simulating real defect conditions, allowing the model to learn features that more closely match real-world anomalies and enhancing its anomaly-detection accuracy. Specifically, this thesis improves upon the CutMix data augmentation technique so that patches within a single image can be blended more precisely into a new image, applying it to defect detection and enriching the generation of defect samples. The other data augmentation method generates knowledge about specific types of anomalies using Perlin noise, an algorithm for producing structured random values that can synthesize anomalous samples resembling those encountered in the real world. The thesis also employs attention mechanisms combined with convolutional layers of varying dilation rates to obtain more comprehensive feature extraction. By adjusting the dilation rates of the convolutional layers, the receptive field of the model can be controlled, allowing it to focus on different scales of information in the input data. This significantly enhances the model's versatility and robustness, helping it adapt better to different anomalies. By combining these methods, the thesis achieves competitive anomaly-detection results on the VisA dataset.
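The receptive-field claim above can be made concrete with the standard receptive-field recurrence for stacked convolutions. This is a generic sketch under the usual definitions (kernel size, dilation, stride per layer), not code from the thesis:

```python
def effective_kernel(kernel, dilation):
    """Spatial extent of a dilated kernel: dilation * (kernel - 1) + 1."""
    return dilation * (kernel - 1) + 1

def receptive_field(layers):
    """Receptive field of stacked convs given as (kernel, dilation, stride).

    Generic illustration of how raising the dilation rate widens the
    model's view without adding parameters; the layer configurations
    below are examples, not the thesis's actual architecture.
    """
    rf, jump = 1, 1
    for kernel, dilation, stride in layers:
        rf += (effective_kernel(kernel, dilation) - 1) * jump
        jump *= stride
    return rf

# Three plain 3x3 layers see a 7-pixel-wide region...
plain = receptive_field([(3, 1, 1), (3, 1, 1), (3, 1, 1)])   # -> 7
# ...while the same three layers with dilations 1, 2, 4 see 15 pixels.
dilated = receptive_field([(3, 1, 1), (3, 2, 1), (3, 4, 1)])  # -> 15
```

Running branches with different dilation rates in parallel, as the abstract describes, therefore lets the model attend to fine local texture and wider context at the same time, at the same parameter cost per branch.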

Recommendation Letter
Approval Letter
Abstract in Chinese
Abstract in English
Acknowledgements
Contents
List of Figures
List of Tables
1 Introduction
2 Related Work
2.1 Anomaly Detection Approach
2.2 Data Augmentation
2.3 Attention Mechanism
3 Methodology
3.1 Overview of Our Approach
3.2 Learning Disentangled Abnormalities
3.2.1 Loss Function
4 Experiment
4.1 Datasets
4.1.1 VisA
4.2 Implementation Details
4.3 Ablation Studies
4.3.1 Improved CutMix
4.3.2 Multi-Synthetic Anomaly
4.3.3 DCA Module
4.4 Comparisons with Other Models
5 Conclusions
References
Letter of Authority


Full text available from 2025/08/07 (campus network, off-campus network, and National Central Library: Taiwan NDLTD system).