
Author: Ting-Chen Hsu (許庭禎)
Title: Improving Multi-Scale and Large-Kernel Models with a Comparative Framework for Semantic Image Segmentation (基於比較框架對多階層廣域模型之改進－應用於語義圖像分割)
Advisor: Bor-Shen Lin (林伯慎)
Committee Members: Nai-Wei Lo (羅乃維), Chuan-Kai Yang (楊傳凱)
Degree: Master
Department: Department of Information Management, School of Management
Year of Publication: 2021
Graduation Academic Year: 110
Language: Chinese
Number of Pages: 52
Keywords: semantic segmentation, global convolutional network (GCN), atrous spatial pyramid pooling (ASPP), self-attention

Mainstream semantic segmentation models of recent years include the global convolutional network (GCN), atrous spatial pyramid pooling (ASPP), and self-attention. These models fuse features at multiple scales, combining local features with globally correlated features to improve classification accuracy. Their common idea is to enlarge the receptive field at different scales without greatly increasing the number of parameters. However, because each model has a complex network structure, it is hard to compare fairly which mechanism performs better, and attempts to combine different mechanisms are constrained by the complexity of the individual architectures. To address this problem, this study proposes a general comparative and integration framework for semantic segmentation, under which different large-kernel segmentation models can be tested, compared, improved, and integrated in parallel. For GCN, ASPP, and self-attention, three improved transformation modules are proposed: a GCN with 1024 output channels, a simplified self-attention module, and SPP+GCN. Experimental results show that these modules deliver more precise segmentation and perform well on the Pascal VOC 2012 image segmentation dataset, reaching accuracies of 73.97%, 71.75%, and 74.02%, respectively, all better than the original models. In addition, the methods were combined with one another: when the three improved transformation modules are trained in parallel and their output feature maps are fused for classification by a weighted sum and by concatenation, the accuracies reach 74.08% and 75.29%, respectively.
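The kind of parallel framework and fusion described above can be sketched in code. The following is a minimal, illustrative PyTorch-style sketch; the class names, channel sizes, and the two example branches are hypothetical simplifications and not the thesis's actual 1024-channel GCN, simplified self-attention, or SPP+GCN modules. Several transformation branches operate on a shared backbone feature map, and their outputs are fused either by a learned weighted sum or by channel concatenation before a 1x1 classifier.

```python
# Illustrative sketch only (PyTorch assumed); names and sizes are hypothetical.
import torch
import torch.nn as nn
import torch.nn.functional as F


class LargeKernelBranch(nn.Module):
    """GCN-style branch: a large k x k receptive field factored into k x 1 and 1 x k convolutions."""
    def __init__(self, in_ch, out_ch, k=7):
        super().__init__()
        p = k // 2
        self.path_a = nn.Sequential(nn.Conv2d(in_ch, out_ch, (k, 1), padding=(p, 0)),
                                    nn.Conv2d(out_ch, out_ch, (1, k), padding=(0, p)))
        self.path_b = nn.Sequential(nn.Conv2d(in_ch, out_ch, (1, k), padding=(0, p)),
                                    nn.Conv2d(out_ch, out_ch, (k, 1), padding=(p, 0)))

    def forward(self, x):
        return self.path_a(x) + self.path_b(x)


class SlimSelfAttention(nn.Module):
    """Reduced self-attention branch: low-dimensional queries/keys keep the cost down."""
    def __init__(self, in_ch, out_ch, qk_ch=64):
        super().__init__()
        self.q = nn.Conv2d(in_ch, qk_ch, 1)
        self.k = nn.Conv2d(in_ch, qk_ch, 1)
        self.v = nn.Conv2d(in_ch, out_ch, 1)

    def forward(self, x):
        b, _, h, w = x.shape
        q = self.q(x).flatten(2).transpose(1, 2)   # (B, HW, qk_ch)
        k = self.k(x).flatten(2)                   # (B, qk_ch, HW)
        v = self.v(x).flatten(2).transpose(1, 2)   # (B, HW, out_ch)
        attn = torch.softmax(q @ k / q.shape[-1] ** 0.5, dim=-1)
        return (attn @ v).transpose(1, 2).reshape(b, -1, h, w)


class ParallelFusionHead(nn.Module):
    """Run several transformation branches in parallel; fuse by weighted sum or concatenation."""
    def __init__(self, branches, branch_ch, num_classes, fuse="weighted"):
        super().__init__()
        self.branches = nn.ModuleList(branches)
        self.fuse = fuse
        if fuse == "weighted":
            self.weights = nn.Parameter(torch.ones(len(branches)))  # learned fusion weights
            cls_in = branch_ch
        else:                                                        # "concat"
            cls_in = branch_ch * len(branches)
        self.classifier = nn.Conv2d(cls_in, num_classes, 1)

    def forward(self, feat, out_size):
        outs = [branch(feat) for branch in self.branches]
        if self.fuse == "weighted":
            w = torch.softmax(self.weights, dim=0)
            fused = sum(wi * oi for wi, oi in zip(w, outs))
        else:
            fused = torch.cat(outs, dim=1)
        logits = self.classifier(fused)
        return F.interpolate(logits, size=out_size, mode="bilinear", align_corners=False)


# Example: a ResNet-like stage-5 feature map, concatenation fusion, 21 VOC classes.
feat = torch.randn(1, 2048, 32, 32)
head = ParallelFusionHead([LargeKernelBranch(2048, 256), SlimSelfAttention(2048, 256)],
                          branch_ch=256, num_classes=21, fuse="concat")
print(head(feat, out_size=(512, 512)).shape)  # torch.Size([1, 21, 512, 512])
```

Because every branch consumes the same backbone feature map and produces a feature map of the same spatial size, branches can be swapped, compared, or combined without touching the rest of the network, which is the point of a comparative framework of this kind.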
This study further verifies the effect of a single module at different levels of the multi-scale architecture. The operation at the upper level can be viewed as a high-pass filter: it mainly extracts local details but is less sensitive for classification. The operation at the lower level can be viewed as a low-pass filter: it mainly extracts the overall contour and classifies each pixel correctly, but is less sensitive for refining object boundaries. Combining the levels helps compensate for each other's deficiencies, providing more information during classification and thereby improving the overall accuracy.
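As a concrete illustration of this high-pass/low-pass complementarity, the short sketch below (again PyTorch assumed; the stage choices, channel counts, and additive fusion are illustrative assumptions rather than the thesis's exact multi-level design) scores a shallow, high-resolution feature map and a deep, low-resolution feature map separately, upsamples the coarse score map, and adds the two before the final upsampling to the input resolution.

```python
# Illustrative sketch only (PyTorch assumed); stages, channels, and fusion are hypothetical.
import torch
import torch.nn as nn
import torch.nn.functional as F


class TwoLevelHead(nn.Module):
    """Fuse per-pixel class scores from a shallow (fine) and a deep (coarse) backbone stage."""
    def __init__(self, fine_ch, coarse_ch, num_classes):
        super().__init__()
        self.fine_score = nn.Conv2d(fine_ch, num_classes, 1)      # boundary detail ("high-pass")
        self.coarse_score = nn.Conv2d(coarse_ch, num_classes, 1)  # overall contour ("low-pass")

    def forward(self, fine_feat, coarse_feat, out_size):
        fine = self.fine_score(fine_feat)
        coarse = F.interpolate(self.coarse_score(coarse_feat), size=fine.shape[-2:],
                               mode="bilinear", align_corners=False)
        fused = fine + coarse  # the two levels supply complementary evidence
        return F.interpolate(fused, size=out_size, mode="bilinear", align_corners=False)


# Example: ResNet-like stage-3 (1/8 resolution) and stage-5 (1/32 resolution) features.
fine_feat = torch.randn(1, 512, 64, 64)
coarse_feat = torch.randn(1, 2048, 16, 16)
head = TwoLevelHead(fine_ch=512, coarse_ch=2048, num_classes=21)
print(head(fine_feat, coarse_feat, out_size=(512, 512)).shape)  # torch.Size([1, 21, 512, 512])
```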


State-of-the-art models of semantic segmentation, based on the global convolutional network (GCN), atrous spatial pyramid pooling (ASPP), self-attention, and so on, focus on capturing spatial context by multi-scale feature fusion and on integrating local features with global dependencies. However, these models have not been compared in parallel to interpret their relative efficacies, and their complicated network structures make them difficult to combine or improve further. In this thesis, a common framework is proposed for investigating the network structures of these models in parallel. In addition, three alternative modules are proposed to improve these methods; experiments show that the modules give more precise segmentation results and achieve outstanding performance on the Pascal VOC 2012 image segmentation dataset. We also verified the efficacy of the multi-scale architecture. Experimental results show that the upper-level operation can be regarded as a high-pass filter that extracts local details, while the lower-level operation can be regarded as a low-pass filter that extracts the overall contour. The multi-scale architecture helps the levels complement each other's deficiencies.

Abstract (Chinese)
Abstract (English)
Acknowledgments
Table of Contents
List of Figures
List of Tables
Chapter 1  Introduction
  1.1  Research Background and Motivation
  1.2  Research Contributions
  1.3  Thesis Organization
Chapter 2  Literature Review
  2.1  Convolutional Neural Networks
  2.2  The ResNet Model
    2.2.1  Residual Learning
    2.2.2  Network Structure
  2.3  Semantic Segmentation
    2.3.1  Fully Convolutional Networks
    2.3.2  Global Convolutional Network
    2.3.3  DeepLabv3+
    2.3.4  Dual Attention Network
  2.4  Semantic Segmentation Metrics
Chapter 3  Analysis and Improvement of Semantic Segmentation Methods
  3.1  Model Architecture
    3.1.1  Model Design
    3.1.2  Loss Computation
  3.2  Experimental Setup
    3.2.1  Dataset and Preprocessing
    3.2.2  Training Settings
  3.3  Baseline Experiments
  3.4  Module Analysis
    3.4.1  Module Design and Experiments
  3.5  Combination Analysis of the Transformation Modules
    3.5.1  Combining Transformation Modules by Weighting
    3.5.2  Combining Transformation Modules by Concatenation
  3.6  Chapter Summary
Chapter 4  Investigation of the Multi-Scale Architecture
  4.1  Model Architecture
  4.2  Baseline Experiments
  4.3  Combination Analysis Across Scales
  4.4  Chapter Summary
Chapter 5  Conclusion
References

[1] J. Long, E. Shelhamer, and T. Darrell, "Fully Convolutional Networks for Semantic Segmentation," IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015, pp. 3431-3440, doi: 10.1109/CVPR.2015.7298965.
[2] C. Peng, X. Zhang, G. Yu, G. Luo, and J. Sun, "Large Kernel Matters -- Improve Semantic Segmentation by Global Convolutional Network," 2017, arXiv:1703.02719.
[3] L.-C. Chen, Y. Zhu, G. Papandreou, F. Schroff, and H. Adam, "Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation," 2018, arXiv:1802.02611.
[4] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser, and I. Polosukhin, "Attention Is All You Need," 2017, arXiv:1706.03762.
[5] J. Fu, J. Liu, H. Tian, Y. Li, Y. Bao, Z. Fang, and H. Lu, "Dual Attention Network for Scene Segmentation," 2018, arXiv:1809.02983.
[6] K. He, X. Zhang, S. Ren, and J. Sun, "Deep Residual Learning for Image Recognition," 2015, arXiv:1512.03385.
[7] K. Simonyan and A. Zisserman, "Very Deep Convolutional Networks for Large-Scale Image Recognition," 2014, arXiv:1409.1556.
[8] K. He, X. Zhang, S. Ren, and J. Sun, "Deep Residual Learning for Image Recognition," 2015, arXiv:1512.03385, p. 5.
[9] H. Noh, S. Hong, and B. Han, "Learning Deconvolution Network for Semantic Segmentation," 2015, arXiv:1505.04366, p. 3.
[10] L.-C. Chen, G. Papandreou, I. Kokkinos, K. Murphy, and A. L. Yuille, "DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs," 2016, arXiv:1606.00915.
[11] K. He, X. Zhang, S. Ren, and J. Sun, "Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification," 2015, arXiv:1502.01852.
