Graduate Student: 張容瑄 Rong-Hsuan Chang
Thesis Title: Saliency Guidance and Expansion Suppression on PuzzleCAM for Weakly Supervised Semantic Segmentation (基於顯著特徵指引與抑制擴散於影像拼接類別活化之弱監督語意分割任務)
Advisor: 郭景明 Jing-Ming Guo
Committee Members: 郭景明 Jing-Ming Guo, 楊家輝 Jar-Ferr Yang, 楊士萱 Shih-Hsuan Yang, 賴文能 Wen-Nung Lie, 王乃堅 Nai-Jian Wang
Degree: Master
Department: College of Electrical Engineering and Computer Science - Department of Electrical Engineering (電資學院 - 電機工程系)
Year of Publication: 2022
Academic Year of Graduation: 110
Language: Chinese
Number of Pages: 95
Keywords (Chinese, translated): Weakly Supervised Learning, Semantic Segmentation, Class Activation Map, Pseudo Mask, Deep Learning
Keywords (English): Weakly Supervised, Semantic Segmentation, Class Activation Map, Pseudo Mask, Deep Learning
Abstract (translated from the Chinese):

Semantic segmentation assigns a class prediction to every pixel of an image. Conventional semantic segmentation requires pixel-level annotations to train the network and learn the features that distinguish each class, but per-pixel labeling is highly time- and labor-intensive. Current research therefore aims to achieve the same goal with less annotation; this thesis does so using only image-level class labels.

Existing methods based on image-level labels usually begin by locating coarse object regions with class activation mapping, which derives the regions an image classifier attends to. However, because classification tends to focus on the most distinctive and consistent features, and images of the same class often share similar scenes, three problems arise: (1) over-attention to specific regions causes other object regions to be ignored, (2) the activated area extends beyond the object, and (3) color-smooth regions inside objects are hard to recognize.

To prevent over-attention within objects from leaving the remaining regions ignored, this thesis applies a suppression mechanism that weakens the strongest activation peaks so that attention diffuses to the surrounding area. In addition, to prevent scenes from being misclassified as objects, saliency guidance is introduced: foreground and background cues assist in learning object extents, curbing over-expansion of the attended region, effectively restricting activations to the main object area, and at the same time resolving the ambiguity of color-uniform object interiors.

For the experiments, the public PASCAL VOC 2012 segmentation benchmark is used and the results are compared with previous methods. The pseudo masks generated by the proposed architecture reach 76.0% mean intersection-over-union (mIoU) on the training set of this public dataset, and a semantic segmentation network trained with these pseudo masks reaches 73.3% and 73.5% mIoU on the validation and test sets, respectively.
Abstract (English):

Semantic segmentation provides pixel-wise category predictions for images, which traditionally requires pixel-level annotation so the network can learn the features that distinguish each class. However, pixel-wise annotation is time-consuming and labor-intensive. Current research therefore aims to reduce the annotation effort while accomplishing the same goal, which this thesis achieves using only image-level categorical annotation.
The first step of weakly supervised semantic segmentation is generally to generate pseudo masks from class activation maps. Because classification networks tend to focus on the most discriminative features, and images of the same class often share similar scenes, the produced pseudo masks commonly exhibit three issues: (1) excessive focus on a specific area causes other object regions to be neglected, (2) the activated region extends beyond the object boundary, and (3) color-smooth regions inside objects are difficult to recognize.
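The class activation mapping step described above can be sketched as a weighted sum of the final convolutional feature maps, following the original CAM formulation (Zhou et al., 2016); the function and array names here are illustrative, not code from the thesis.

```python
import numpy as np

def class_activation_map(features, weights, class_idx):
    """Compute a CAM for one class.

    features : (C, H, W) feature maps from the last conv layer
    weights  : (num_classes, C) classifier weights after global average pooling
    Returns a (H, W) map normalized to [0, 1].
    """
    # Weighted sum over channels: sum_c w[class_idx, c] * features[c]
    cam = np.tensordot(weights[class_idx], features, axes=([0], [0]))
    cam = np.maximum(cam, 0)          # keep only positive evidence (ReLU)
    if cam.max() > 0:
        cam = cam / cam.max()         # normalize so thresholds are comparable
    return cam
```

Thresholding such a map yields the coarse object regions used as pseudo-mask seeds; the three failure modes above are properties of this map, inherited from the classifier.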
This thesis addresses the above three issues. To mitigate over-focus on the most significant features, an expansion suppression module diminishes intensely concentrated activations so that attention spreads to the surrounding regions. In addition, to tackle scene misclassification, a saliency guidance module assists in learning object extents from foreground and background cues. It effectively constrains activations to the main object area while simultaneously resolving the ambiguity of color-smooth object interiors.
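The thesis's exact suppression module is not reproduced here, but the general idea of weakening dominant activation peaks can be sketched as clamping values above a fraction of the per-map maximum, in the spirit of discriminative region suppression (Kim et al., AAAI 2021); the function name and the `delta` value are illustrative assumptions.

```python
import numpy as np

def suppress_peaks(cam, delta=0.7):
    """Clamp activations above delta * max(cam).

    Flattening the strongest peaks removes the classifier's incentive to
    rely on one discriminative region, so during training the attention
    must expand toward the rest of the object.
    """
    ceiling = delta * cam.max()
    return np.minimum(cam, ceiling)
```

On its own this expansion can overshoot the object boundary, which is why a saliency cue that separates foreground from background is used as a counterweight.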
Experimental results show that the pseudo masks generated by the proposed network achieve 76.0% mIoU on the PASCAL VOC 2012 training set. A segmentation network trained with these pseudo masks reaches 73.3% and 73.5% mIoU on the validation and test sets of PASCAL VOC 2012, respectively.
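The mIoU metric reported above is the per-class intersection-over-union averaged over classes; a minimal sketch of the standard computation (variable names are illustrative):

```python
import numpy as np

def mean_iou(pred, gt, num_classes):
    """Mean IoU over all classes that appear in either prediction or
    ground truth. `pred` and `gt` are integer label maps of equal shape."""
    ious = []
    for c in range(num_classes):
        inter = np.logical_and(pred == c, gt == c).sum()
        union = np.logical_or(pred == c, gt == c).sum()
        if union > 0:                    # skip classes absent from both maps
            ious.append(inter / union)
    return float(np.mean(ious))
```

The official PASCAL VOC evaluation accumulates the intersections and unions over the whole dataset before dividing, rather than averaging per-image scores, so this per-pair sketch is only meant to show the definition of the metric.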