
Author: 莊涵宇 (Han-Yu Chuang)
Thesis Title: Multi-Stage Data Augmentation and Dilated Coordinate Attention Neural Network
Advisors: 花凱龍 (Kai-Lung Hua), 沈上翔 (Shan-Hsiang Shen)
Committee Members: 陳宜惠 (Yi-Hui Chen), 陳永耀 (Yung-Yao Chen), 花凱龍 (Kai-Lung Hua), 楊朝龍 (Chao-Lung Yang), 沈上翔 (Shan-Hsiang Shen)
Degree: Master
Department: College of Electrical Engineering and Computer Science - Department of Computer Science and Information Engineering
Publication Year: 2023
Academic Year of Graduation: 111
Language: English
Pages: 35
Chinese Keywords: Attention mechanism, Data augmentation, Multi-stage data augmentation, Anomaly detection, Dilated convolution, Supervised neural network
Keywords: Attention mechanism, Data augmentation, CutMix method, Anomaly detection, Dilated convolution, Supervised learning
Deep-learning-based anomaly detection plays an important role in many industrial applications, and using advanced defect-detection techniques to ensure production quality is a key task in manufacturing. However, because large amounts of labeled data covering the full range of defect conditions are scarce, existing defect-detection models are usually trained only on normal samples, which can lead to poor detection of anomalous samples. To address this problem, this thesis introduces two additional data augmentation techniques that simulate realistic defects, letting the model learn features closer to the real environment and improving its detection accuracy on anomalous samples. Specifically, this thesis improves the CutMix data augmentation technique so that image mixing can be performed within a single image, producing a new image more precisely; this is applied to defect detection and increases the generation of defect samples. The other augmentation method uses Perlin noise to encode knowledge of specific anomaly types: Perlin noise is an algorithm for generating random values, and it can synthesize anomalous samples close to those that may appear in the real world. In addition, this thesis adopts an attention mechanism combined with convolutional layers of different dilation rates, which lets the model focus on key regions and obtain a wider field of view while avoiding possible gradient-vanishing problems, thereby improving defect-detection accuracy. By combining these methods, this thesis achieves competitive results on a dataset in this field (VisA).
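As a rough illustration of the single-image mixing idea described above, the following is a minimal sketch: a patch of an image is pasted onto another random location of the same image, together with a mask marking the synthetic defect region. The function name and the patch-size fractions are illustrative assumptions, not the thesis's actual implementation.

```python
import numpy as np

def single_image_cutmix(image, rng=None, min_frac=0.1, max_frac=0.3):
    """Paste a randomly chosen patch of `image` onto another random
    location within the same image, simulating a local surface defect.

    A minimal single-image CutMix-style sketch; patch-size fractions
    are illustrative, not the thesis's values.
    """
    rng = np.random.default_rng() if rng is None else rng
    h, w = image.shape[:2]
    # Sample patch height/width as a fraction of the image size.
    ph = int(h * rng.uniform(min_frac, max_frac))
    pw = int(w * rng.uniform(min_frac, max_frac))
    # Source and destination top-left corners.
    sy, sx = rng.integers(0, h - ph), rng.integers(0, w - pw)
    dy, dx = rng.integers(0, h - ph), rng.integers(0, w - pw)
    out = image.copy()
    out[dy:dy + ph, dx:dx + pw] = image[sy:sy + ph, sx:sx + pw]
    # Binary mask marking the synthetic "defect" region for supervision.
    mask = np.zeros((h, w), dtype=np.uint8)
    mask[dy:dy + ph, dx:dx + pw] = 1
    return out, mask
```

Because the patch comes from the same image, the pasted region shares the product's texture and lighting, which is what makes the synthetic defect look plausible; the returned mask can serve as a pixel-level label for supervised training.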


Ensuring product quality is crucial in industrial production, and using advanced defect-detection technology to assure it is a vital task in manufacturing. However, due to the lack of large annotated datasets covering the full range of defect scenarios, current defect-detection models often rely solely on normal samples for training, which can lower detection performance on anomalous samples. To address this issue, this thesis introduces two data augmentation techniques capable of simulating real defect conditions, allowing the model to learn features that more closely match real-world anomalies and enhancing its anomaly-detection accuracy. Specifically, this thesis improves upon the CutMix data augmentation technique so that patches within a single image can be blended more precisely into a new image, applying it to defect detection and enriching the generation of defect samples. The other data augmentation method generates knowledge about specific types of anomalies using Perlin noise, an algorithm for producing structured random values that can synthesize anomalous samples resembling those encountered in the real world. The thesis also employs attention mechanisms combined with convolutional layers of varying dilation rates to obtain more comprehensive feature extraction. By adjusting the dilation rates of the convolutional layers, the receptive field of the model can be controlled, allowing it to focus on different scales of information in the input data. This significantly enhances the model's versatility and robustness, helping it adapt better to different anomalies. By combining these methods, the thesis achieves competitive anomaly-detection results on the VisA dataset.
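The receptive-field claim above can be made concrete with the standard receptive-field recurrence for stacked convolutions. This is a generic sketch under the usual definitions (kernel size, dilation, stride per layer), not code from the thesis:

```python
def effective_kernel(kernel, dilation):
    """Spatial extent of a dilated kernel: dilation * (kernel - 1) + 1."""
    return dilation * (kernel - 1) + 1

def receptive_field(layers):
    """Receptive field of stacked convs given as (kernel, dilation, stride).

    Generic illustration of how raising the dilation rate widens the
    model's view without adding parameters; the layer configurations
    below are examples, not the thesis's actual architecture.
    """
    rf, jump = 1, 1
    for kernel, dilation, stride in layers:
        rf += (effective_kernel(kernel, dilation) - 1) * jump
        jump *= stride
    return rf

# Three plain 3x3 layers see a 7-pixel-wide region...
plain = receptive_field([(3, 1, 1), (3, 1, 1), (3, 1, 1)])   # -> 7
# ...while the same three layers with dilations 1, 2, 4 see 15 pixels.
dilated = receptive_field([(3, 1, 1), (3, 2, 1), (3, 4, 1)])  # -> 15
```

Running branches with different dilation rates in parallel, as the abstract describes, therefore lets the model attend to fine local texture and wider context at the same time, at the same parameter cost per branch.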

Recommendation Letter
Approval Letter
Abstract in Chinese
Abstract in English
Acknowledgements
Contents
List of Figures
List of Tables
1 Introduction
2 Related Work
2.1 Anomaly Detection Approach
2.2 Data Augmentation
2.3 Attention Mechanism
3 Methodology
3.1 Overview of Our Approach
3.2 Learning Disentangled Abnormalities
3.2.1 Loss Function
4 Experiment
4.1 Datasets
4.1.1 VisA
4.2 Implementation Details
4.3 Ablation Studies
4.3.1 Improved CutMix
4.3.2 Multi-Synthetic Anomaly
4.3.3 DCA Module
4.4 Comparisons with Other Models
5 Conclusions
References
Letter of Authority


Full text available from 2025/08/07 (campus network, off-campus network, and National Central Library: Taiwan NDLTD system).