
Author: Rong-Hsuan Chang (張容瑄)
Thesis Title: Saliency Guidance and Expansion Suppression on PuzzleCAM for Weakly Supervised Semantic Segmentation (基於顯著特徵指引與抑制擴散於影像拼接類別活化之弱監督語意分割任務)
Advisor: Jing-Ming Guo (郭景明)
Committee: Jing-Ming Guo (郭景明), Jar-Ferr Yang (楊家輝), Shih-Hsuan Yang (楊士萱), Wen-Nung Lie (賴文能), Nai-Jian Wang (王乃堅)
Degree: Master
Department: Department of Electrical Engineering, College of Electrical Engineering and Computer Science
Publication Year: 2022
Graduation Academic Year: 110
Language: Chinese
Pages: 95
Keywords: Weakly Supervised Learning, Semantic Segmentation, Class Activation Map, Pseudo Mask, Deep Learning
Hits: 286; Downloads: 0
    Semantic segmentation provides per-pixel category predictions for an image. Conventionally, training a segmentation network requires pixel-level class annotations so that the network can learn the features that distinguish each semantic class; however, per-pixel labeling is highly time-consuming and labor-intensive. Current research therefore aims to achieve the same goal with less annotation, and this thesis does so using only image-level classification labels.
    Among existing methods that use image-level labels, the first step is usually to obtain rough object locations through class activation mapping: training an image classifier yields activation maps that highlight the regions the classifier attends to. Because classification tends to focus on the most distinctive and recurring features, and images of the same class often share similar scenes, three problems arise: (1) excessive focus on specific regions causes the remaining object regions to be ignored, (2) the activated region extends beyond the object boundary, and (3) smooth, color-homogeneous regions inside objects are difficult to recognize.
    To address the over-concentration problem, in which the remaining regions of an object are neglected during class activation mapping, this thesis applies a suppression mechanism that reduces strong activation peaks so that attention spreads to the surrounding areas. In addition, to address scene regions misjudged as objects, this thesis introduces saliency guidance, which uses foreground and background cues to help learn the object extent. This mitigates over-expansion of the activated region, effectively confines it to the main object area, and simultaneously resolves the ambiguity of color-homogeneous object interiors.
    For the experiments, this thesis evaluates on the public segmentation benchmark PASCAL VOC 2012 and compares against prior methods. The results show that the pseudo masks generated by the proposed architecture reach 76.0% mean intersection-over-union (mIoU) on the training set, and a semantic segmentation network trained on these pseudo masks reaches 73.3% and 73.5% mIoU on the validation and test sets, respectively.


    Semantic segmentation provides pixel-wise category predictions for images, which conventionally requires pixel-level annotation to learn the features that distinguish each class. Yet pixel-wise annotation is time-consuming and labor-intensive. Current research therefore aims to reduce annotation while accomplishing the same goal, which this thesis achieves using only image-level categorical annotation.
    The first step of weakly supervised semantic segmentation is in general to generate pseudo masks with class activation mapping. Because classification networks tend to focus on the most explicit features, and images of the same class often share similar appearances such as the surrounding scene, the produced pseudo masks commonly exhibit three issues: (1) excessive focus on a specific area causes the remaining object regions to be neglected, (2) the activated region extends beyond the object boundary, and (3) smooth, color-homogeneous object interiors are difficult to recognize.
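The CAM-based pseudo-mask step described above can be sketched as follows. This is a minimal illustration of the classic class-activation-mapping formulation (weighting the last convolutional feature maps by one class's classifier weights), not the thesis's exact pipeline; the array shapes and the threshold value are assumptions for illustration.

```python
import numpy as np

def class_activation_map(features, fc_weights, class_idx):
    """Compute a class activation map (CAM).

    features:   (C, H, W) feature maps from the last conv layer
    fc_weights: (num_classes, C) weights of the final linear classifier
    class_idx:  index of the target class
    Returns an (H, W) map, min-max normalized to [0, 1].
    """
    w = fc_weights[class_idx]                         # (C,)
    cam = np.tensordot(w, features, axes=([0], [0]))  # weighted sum -> (H, W)
    cam = np.maximum(cam, 0)                          # keep positive evidence only
    if cam.max() > 0:
        cam = (cam - cam.min()) / (cam.max() - cam.min())
    return cam

def pseudo_mask(cam, threshold=0.3):
    """Threshold the CAM to obtain a rough binary pseudo mask."""
    return (cam >= threshold).astype(np.uint8)
```

Because the classifier weights emphasize the most discriminative channels, the resulting map concentrates on the most distinctive object parts, which is exactly the over-concentration issue (1) described above.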
    This thesis addresses the three issues above. To relieve over-concentration on significant features, a suppression-expansion module diminishes the intensely centralized activations so that attention spreads outward. In addition, to tackle scene regions misclassified as objects, a saliency guidance module assists in learning the object extent from foreground and background cues. It effectively constrains activation to the main object area while simultaneously resolving the ambiguity of color-homogeneous object interiors.
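The suppression idea can be illustrated with a minimal sketch: clamping the strongest activations before re-normalizing raises the relative weight of the surrounding, less-activated object regions. The percentile-clipping rule below is an assumed stand-in for illustration, not the actual formulation of the thesis's suppression-expansion module.

```python
import numpy as np

def suppress_peaks(cam, percentile=80):
    """Clamp the strongest activations in a CAM so attention can spread.

    cam: (H, W) activation map in [0, 1]
    Values above the given percentile are clipped to that percentile's
    value; dividing by the new maximum then raises the relative weight
    of the surrounding, less-activated regions.
    """
    cap = np.percentile(cam, percentile)
    suppressed = np.minimum(cam, cap)
    if suppressed.max() > 0:
        suppressed = suppressed / suppressed.max()
    return suppressed
```

After this operation several formerly distinct peaks share the maximum value, so a subsequent threshold selects a broader region of the object instead of only its most discriminative part.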
    Experimental results show that the pseudo masks generated by the proposed network achieve 76.0% mIoU on the PASCAL VOC 2012 training set. A segmentation network trained with these pseudo masks reaches 73.3% and 73.5% mIoU on the validation and test sets of PASCAL VOC 2012, respectively.
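For reference, the mIoU figures reported above average the per-class intersection-over-union between predicted and ground-truth label maps. A minimal sketch follows; skipping classes absent from both maps is one common convention, not necessarily the exact evaluation code used for these numbers.

```python
import numpy as np

def mean_iou(pred, target, num_classes):
    """Mean intersection-over-union over classes present in pred or target.

    pred, target: integer label maps of identical shape
    """
    ious = []
    for c in range(num_classes):
        p, t = pred == c, target == c
        union = np.logical_or(p, t).sum()
        if union == 0:            # class absent from both maps; skip it
            continue
        inter = np.logical_and(p, t).sum()
        ious.append(inter / union)
    return float(np.mean(ious))
```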

    Abstract (Chinese)
    Abstract (English)
    Acknowledgments
    Table of Contents
    List of Figures
    List of Tables
    Chapter 1 Introduction
      1.1 Background
      1.2 Motivation and Objectives
      1.3 Thesis Organization
    Chapter 2 Literature Review
      2.1 Deep Learning Architectures and Feature Extraction
        2.1.1 Artificial Neural Network (ANN)
        2.1.2 Convolutional Neural Network (CNN)
      2.2 Network Learning Paradigms
        2.2.1 Supervised Learning
        2.2.2 Unsupervised Learning
        2.2.3 Weakly Supervised Learning
      2.3 Supervised Image Segmentation
        2.3.1 Fully Convolutional Networks
        2.3.2 Symmetric Encoder-Decoder Architectures
        2.3.3 Multi-Scale Models
      2.4 Weakly Supervised Image Segmentation
        2.4.1 Pseudo Mask Generation Methods
        2.4.2 Saliency-Assisted Methods
        2.4.3 Pseudo Mask Refinement Methods
      2.5 Multi-Task Learning
    Chapter 3 Methodology
      3.1 Overall Architecture
      3.2 Class Activation Pipeline and Architecture
      3.3 Expansion Suppression Module
      3.4 Saliency Guidance Architecture
      3.5 Loss Functions
        3.5.1 Classification loss
        3.5.2 Reconstruction loss
        3.5.3 Saliency loss
    Chapter 4 Experimental Results
      4.1 Public Datasets
        4.1.1 PASCAL VOC 2012
        4.1.2 Augmented Dataset
        4.1.3 Saliency Dataset
      4.2 Test Environment
      4.3 Results and Analysis
        4.3.1 Quantitative Evaluation Metrics
        4.3.2 Network Training Parameter Settings
        4.3.3 Self-Evaluation Comparison
        4.3.4 Comparison with Prior Methods
    Chapter 5 Conclusions and Future Work
    References


    Full-text release date: 2024/08/29 (campus network)
    Full-text release date: 2024/08/29 (off-campus network)
    Full-text release date: 2024/08/29 (National Central Library: Taiwan NDLTD system)