Author: Hao-Hsuan Lee (李皓軒)
Thesis title: A Deep Document Semantic Segmentation Network with Edge Supervision and Multi-Task Learning Mechanism (基於多任務學習與邊緣監督機制於文件頁面語義分割網路之應用)
Advisor: Jing-Ming Guo (郭景明)
Committee members: Nai-Jian Wang (王乃堅), Gee-Sern Hsu (徐繼聖), Jian-Jiun Ding (丁建均), Chih-Hsien Hsia (夏至賢)
Degree: Master
Department: College of Electrical Engineering and Computer Science, Department of Electrical Engineering
Year of publication: 2019
Academic year of graduation: 107 (ROC calendar)
Language: Chinese
Pages: 106
Keywords: Semantic Segmentation, Document Segmentation, Layout Analysis, Deep Learning
  • Semantic segmentation is a common topic in computer vision and deep learning. Unlike image classification and object detection, semantic segmentation assigns a single class label to every pixel, giving the machine a finer and deeper understanding of the image. It is applied widely, for example in autonomous driving, medical imaging, satellite imagery, and document page segmentation.
    Semantic segmentation networks have achieved stable and excellent performance on natural images in recent years. However, document images differ greatly from natural images in feature structure and object categories, so existing networks still leave room for improvement in document analysis. Because document images contain fewer object classes than natural images and their object regions are mostly large, edge features can be expected to be useful information for assisting network learning.
    Building on an existing stable and efficient semantic segmentation network, this thesis introduces edge feature information to assist feature learning, adopting different features and a new loss function to strengthen predictions near object edges. It also proposes the Densely Joint Pyramid Module, which aims to reduce the loss of contextual information and edge detail caused when earlier segmentation networks downsample the image to obtain denser, multi-scale features, and thereby to improve overall performance on document page segmentation.
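
    The edge-supervision idea in the abstract can be illustrated with a small sketch. This record does not give the thesis's actual loss function, so everything below is an assumption made for illustration: the function names `edge_map_from_labels` and `multitask_loss`, the 4-neighbour definition of an edge pixel, and the weight `edge_weight` are all hypothetical. The sketch derives a binary edge target from a label map and adds a binary cross-entropy edge term to the per-pixel segmentation cross-entropy.

```python
import math

def edge_map_from_labels(labels):
    """Mark a pixel as edge (1) if any 4-neighbour has a different class label.

    Assumed edge definition for illustration; the thesis may define edges differently.
    """
    h, w = len(labels), len(labels[0])
    edges = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                ny, nx = y + dy, x + dx
                if 0 <= ny < h and 0 <= nx < w and labels[ny][nx] != labels[y][x]:
                    edges[y][x] = 1
                    break
    return edges

def multitask_loss(seg_probs, labels, edge_probs, edge_weight=1.0, eps=1e-7):
    """Mean segmentation cross-entropy plus a weighted mean edge BCE term.

    seg_probs[y][x] is a list of class probabilities for that pixel;
    edge_probs[y][x] is the predicted probability that the pixel is an edge.
    edge_weight is a hypothetical balancing factor between the two tasks.
    """
    edges = edge_map_from_labels(labels)
    h, w = len(labels), len(labels[0])
    seg_ce = 0.0
    edge_bce = 0.0
    for y in range(h):
        for x in range(w):
            # Segmentation term: negative log-probability of the true class.
            seg_ce -= math.log(max(seg_probs[y][x][labels[y][x]], eps))
            # Edge term: binary cross-entropy against the derived edge target.
            p = min(max(edge_probs[y][x], eps), 1.0 - eps)
            t = edges[y][x]
            edge_bce -= t * math.log(p) + (1 - t) * math.log(1.0 - p)
    n = h * w
    return seg_ce / n + edge_weight * edge_bce / n
```

    In a setup like this, raising `edge_weight` would push the network to spend more capacity on pixels near region boundaries, which is the kind of effect the abstract describes. In a real network the two terms would be computed on tensors by the training framework rather than with Python loops.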

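    Abstract (Chinese)
    Abstract (English)
    Acknowledgments
    Table of Contents
    List of Figures
    List of Tables
    Chapter 1 Introduction
        Background
        Research Motivation and Objectives
        Thesis Organization
    Chapter 2 Literature Review of Semantic Segmentation Based on Convolutional Neural Networks
        2.1 How Neural Networks Operate
            2.1.1 Forward Propagation
            2.1.2 Backward Propagation
        2.2 Factors Affecting Neural Network Performance
        2.3 Convolutional Neural Networks
            2.3.1 Convolution Operation
            2.3.2 Nonlinear Activation Functions
            2.3.3 Image Dimensionality Reduction
            2.3.4 Training Methods for Convolutional Neural Networks
            2.3.5 Development of Convolutional Neural Network Architectures
        2.4 Semantic Segmentation Networks
            2.4.1 Fully Convolutional Network (FCN)
            2.4.2 U-Net
            2.4.3 SegNet
            2.4.4 Pyramid Scene Parsing Network (PSPNet)
            2.4.5 Deeplab Series
        2.5 Page Segmentation Techniques
        2.6 Multi-Task Learning Mechanism
    Chapter 3 A Deep Document Semantic Segmentation Network with Edge Supervision and Multi-Task Learning Mechanism
        3.1 Feature Extraction Backbone Network
        3.2 Document Page Semantic Segmentation Network Architecture
        3.3 Densely Joint Pyramid Module
        3.4 Edge Detection Network Branch
    Chapter 4 Experimental Data and Results
        4.1 Public Datasets
            4.1.1 RDCL2017 Dataset
            4.1.2 Marmot Database
            4.1.3 PASCAL-Context Dataset
        4.2 PPSD2019 Database
            4.2.1 IEEE5000 Database
            4.2.2 Finacial3000 Database
            4.2.3 TIME1000 Database
        4.3 Experimental Results
            4.3.1 Quantitative Evaluation Metrics
            4.3.2 Network and Training Parameter Settings
            4.3.3 Analysis of Experimental Results
    Chapter 5 Conclusions and Future Work
    Chapter 6 References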


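    Full-text release date: 2024/08/21 (campus network)
    Full-text release date: 2024/08/21 (off-campus network)
    Full-text release date: 2024/08/21 (National Central Library: National Digital Library of Theses and Dissertations in Taiwan)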