
Graduate Student: Yu-Bang Chang (張育邦)
Thesis Title: Real-Time Semantic Segmentation with Dual Encoder and Self-Attention Mechanism for Autonomous Driving (基於雙編碼和自注意力機制之即時圖像語義分割)
Advisors: Poki Chen (陳伯奇), Chang-Hong Lin (林昌鴻)
Oral Defense Committee: Poki Chen (陳伯奇), Chang-Hong Lin (林昌鴻), Jenq-Shiou Leu (呂政修), Wei-Mei Chen (陳維美)
Degree: Master
Department: Department of Electronic and Computer Engineering, College of Electrical Engineering and Computer Science
Year of Publication: 2021
Academic Year: 109 (2020-2021)
Language: English
Number of Pages: 80
Keywords: Real-time semantic segmentation, Deep learning, Autonomous driving, Image recognition, Convolutional Neural Network, Edge devices

As autonomous driving technology becomes increasingly important and widespread, real-time semantic segmentation has emerged in recent years as a popular and challenging topic in deep learning and computer vision. The purpose of semantic segmentation is to classify every pixel of a road-scene image into categories such as vehicle, pedestrian, traffic light, and traffic sign. To deploy a deep learning model on edge devices, however, the architecture must strike a good balance between accuracy and inference time. Among previous works, some methods sacrifice accuracy to obtain faster inference, while others pursue the best accuracy under a real-time constraint; nevertheless, the accuracy of real-time semantic segmentation methods still lags far behind that of general semantic segmentation methods. To close this gap, this thesis proposes a network architecture based on a dual encoder and a self-attention mechanism, trained on a dataset of real driving scenes. Unlike most real-time semantic segmentation methods, which refine the U-Net structure with a single encoder and a single decoder, the proposed framework uses two encoders and one decoder. The two encoder paths consist of a spatial path, which captures spatial detail, and a context path, which captures contextual information, together yielding better segmentation. Compared with preceding works, the proposed method achieves better results both quantitatively and qualitatively, and the predicted per-pixel labels are more complete and intact.
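To make the dual-encoder idea above concrete, the following is a minimal PyTorch sketch, not the thesis's actual implementation: the module and parameter names (conv_bn_relu, SelfAttention2d, DualEncoderSegNet) and the plain convolutional branches are illustrative assumptions, whereas the thesis builds on ResNet-18 [29] or HarDNet-68ds [46] backbones. The sketch shows a shallow spatial path that stops at 1/8 resolution, a deeper context path that applies dot-product self-attention at 1/32 resolution, and a single decoder that fuses the two.

import torch
import torch.nn as nn
import torch.nn.functional as F

def conv_bn_relu(in_ch, out_ch, stride=1):
    # 3x3 convolution followed by batch normalization [47] and ReLU [48]
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, 3, stride=stride, padding=1, bias=False),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True),
    )

class SelfAttention2d(nn.Module):
    # Plain dot-product self-attention over spatial positions; an
    # illustrative stand-in for the thesis's self-attention mechanism.
    def __init__(self, ch):
        super().__init__()
        self.q = nn.Conv2d(ch, ch // 8, 1)
        self.k = nn.Conv2d(ch, ch // 8, 1)
        self.v = nn.Conv2d(ch, ch, 1)
        self.gamma = nn.Parameter(torch.zeros(1))  # learned blend weight

    def forward(self, x):
        b, c, h, w = x.shape
        q = self.q(x).flatten(2).transpose(1, 2)   # (B, HW, C/8)
        k = self.k(x).flatten(2)                   # (B, C/8, HW)
        v = self.v(x).flatten(2).transpose(1, 2)   # (B, HW, C)
        attn = torch.softmax(q @ k, dim=-1)        # pairwise position affinities
        out = (attn @ v).transpose(1, 2).reshape(b, c, h, w)
        return x + self.gamma * out                # residual connection

class DualEncoderSegNet(nn.Module):
    def __init__(self, num_classes=19):            # 19 classes in Cityscapes [23]
        super().__init__()
        # Spatial path: shallow and wide, stops at 1/8 resolution to keep detail.
        self.spatial = nn.Sequential(
            conv_bn_relu(3, 64, 2), conv_bn_relu(64, 64, 2),
            conv_bn_relu(64, 128, 2))
        # Context path: keeps downsampling to 1/32 for a large receptive field,
        # then applies self-attention on the small (cheap) feature map.
        self.context = nn.Sequential(
            conv_bn_relu(3, 64, 2), conv_bn_relu(64, 128, 2),
            conv_bn_relu(128, 128, 2), conv_bn_relu(128, 128, 2),
            conv_bn_relu(128, 128, 2), SelfAttention2d(128))
        # Decoder: fuse both paths at 1/8 resolution, then classify per pixel.
        self.fuse = conv_bn_relu(256, 128)
        self.classifier = nn.Conv2d(128, num_classes, 1)

    def forward(self, x):
        detail = self.spatial(x)                   # (B, 128, H/8, W/8)
        ctx = self.context(x)                      # (B, 128, H/32, W/32)
        ctx = F.interpolate(ctx, size=detail.shape[2:],
                            mode="bilinear", align_corners=False)
        logits = self.classifier(self.fuse(torch.cat([detail, ctx], dim=1)))
        return F.interpolate(logits, size=x.shape[2:],
                             mode="bilinear", align_corners=False)

For example, a 512x1024 crop yields per-pixel class scores of the same size: out = DualEncoderSegNet()(torch.randn(1, 3, 512, 1024)).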

LIST OF CONTENTS
Abstract (Chinese) I
ABSTRACT II
Acknowledgements III
LIST OF CONTENTS IV
LIST OF FIGURES VI
LIST OF TABLES VIII
CHAPTER 1 INTRODUCTION 1
1.1 Motivation 1
1.2 Contributions 3
1.3 Thesis Organization 4
CHAPTER 2 RELATED WORKS 5
2.1 General Semantic Segmentation Network 5
2.2 Real-time Semantic Segmentation Network 6
CHAPTER 3 PROPOSED METHODS 9
3.1 Data Augmentation 11
3.1.1 Random Scale & Random Crop 11
3.1.2 Random Horizontal Flip 15
3.1.3 Random Color Jitter 16
3.1.4 GridMask 19
3.2 Network Architecture 23
3.2.1 The Overall Model 23
3.2.2 Harmonic Densely Connected Network (HarDNet) [46] 27
3.2.3 ResNet [29] 31
3.2.4 Deformable Convolution [49] 33
3.2.5 Concurrent Spatial and Channel Squeeze & Excitation [44, 45] 35
3.2.6 Refinement Module 38
3.2.7 Factorized Atrous Spatial Pyramid Pooling Module (FASPPM) 40
3.2.8 Skip Connection 44
3.3 Loss Function 46
CHAPTER 4 EXPERIMENTAL RESULTS 49
4.1 Experimental Environment 49
4.2 Cityscapes Dataset [23] 50
4.3 Evaluation and Results 52
4.3.1 Training Details 52
4.3.2 Quantitative Results 52
4.3.3 Qualitative Results 56
CHAPTER 5 CONCLUSIONS AND FUTURE WORKS 62
5.1 Conclusions 62
5.2 Future Works 63
REFERENCES 64

LIST OF FIGURES
Figure 2.1 U-Net [33] structure 7
Figure 2.2 Bilateral Segmentation Network [6] 8
Figure 3.1 Flowchart of the training process 10
Figure 3.2 Padding on the scaled image 13
Figure 3.3 An example of cropped images with different scales 14
Figure 3.4 The results of random horizontal flip 15
Figure 3.5 An example of different factors of color jitter. The first row shows results for different brightness factors, the second row for different contrast factors, the third row for different saturation factors, and the last row for different combinations of factors 18
Figure 3.6 (a) The height masking. (b) The width masking 21
Figure 3.7 An example of each unit of the mask 21
Figure 3.8 (a) Original images. (b) Output images after GridMask 22
Figure 3.9 The proposed overall network architecture 24
Figure 3.10 The Dense Block in DenseNet 28
Figure 3.11 The Harmonic Dense Block in HarDNet 29
Figure 3.12 (a) Standard convolution filters. (b) Point-wise convolution filters are the same as … 30
Figure 3.13 Residual learning framework 31
Figure 3.14 Architecture of ResNet-18 [29] 32
Figure 3.15 (a) Regular sampling grid (green points) in 3x3 standard convolution. (b)(c)(d) Examples of sampling locations (blue points) with augmented offsets (blue arrows) in 3x3 deformable convolution 34
Figure 3.16 An example of 3x3 deformable convolution [49] 34
Figure 3.17 Spatial Squeeze and Channel Excitation (cSE) 36
Figure 3.18 Channel Squeeze and Spatial Excitation (sSE) 37
Figure 3.19 Concurrent Spatial and Channel Squeeze & Excitation (scSE) 37
Figure 3.20 Refinement module 38
Figure 3.21 Strip pooling module 39
Figure 3.22 Atrous Spatial Pyramid Pooling [25] 40
Figure 3.23 An example of atrous convolution [1] with a dilation rate of (a) 1 and (b) 12 41
Figure 3.24 Factorized Atrous Spatial Pyramid Pooling Module 43
Figure 3.25 Skip connection with three atrous convolutions [1, 61] 44
Figure 4.1 Examples of original images and their corresponding ground-truth images in the training data 51
Figure 4.2 Qualitative results of the ResNet-18 [29] backbone 57
Figure 4.3 Qualitative results of the HarDNet-68ds [46] backbone 58
Figure 4.4 (a) Input image and results of the (b) ResNet-18 [29] backbone and (c) HarDNet-68ds [46] backbone 59
Figure 4.5 (a) Input image and results of (b) BiSeNet [6] and (c) the proposed method 60
Figure 4.6 (a) Input image and results of (b) BiSeNet [6] and (c) the proposed method 61

LIST OF TABLES
Table 4.1 Hardware and software information of the training and testing environment 49
Table 4.2 The class labels and definitions in the Cityscapes Dataset [23] 50
Table 4.3 Quantitative results on the test set of the Cityscapes Dataset [23] 54
Table 4.4 FPS measurement of the ResNet-18 [29] backbone on the Nvidia Jetson Nano 55
Table 4.5 FPS measurement of the HarDNet-68ds [46] backbone on the Nvidia Jetson Nano 55

[1] L. Chen, G. Papandreou, I. Kokkinos, K. Murphy, and A. L. Yuille, "DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 40, no. 4, pp. 834-848, 2018.
[2] Z. Zhu, M. Xu, S. Bai, T. Huang, and X. Bai, "Asymmetric Non-Local Neural Networks for Semantic Segmentation," in Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2019, pp. 593-602.
[3] Z. Huang, X. Wang, L. Huang, C. Huang, Y. Wei, and W. Liu, "CCNet: Criss-Cross Attention for Semantic Segmentation," in 2019 IEEE/CVF International Conference on Computer Vision (ICCV), 27 Oct.-2 Nov. 2019, pp. 603-612.
[4] Y. Zhu, K. Sapra, F. A. Reda, K. J. Shih, S. Newsam, A. Tao, and B. Catanzaro, "Improving Semantic Segmentation via Video Propagation and Label Relaxation," in 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 15-20 June 2019, pp. 8848-8857.
[5] A. Tao, K. Sapra, and B. Catanzaro, "Hierarchical multi-scale attention for semantic segmentation," arXiv preprint arXiv:2005.10821, 2020.
[6] C. Yu, J. Wang, C. Peng, C. Gao, G. Yu, and N. Sang, "BiSeNet: Bilateral segmentation network for real-time semantic segmentation," in Proceedings of the European conference on computer vision (ECCV), 2018, pp. 325-341.
[7] H. Li, P. Xiong, H. Fan, and J. Sun, "DFANet: Deep Feature Aggregation for Real-Time Semantic Segmentation," in 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 15-20 June 2019, pp. 9514-9523.
[8] M. Oršić, I. Krešo, P. Bevandić, and S. Šegvić, "In Defense of Pre-Trained ImageNet Architectures for Real-Time Semantic Segmentation of Road-Driving Images," in 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 15-20 June 2019, pp. 12599-12608.
[9] P. Hu, F. Perazzi, F. C. Heilbron, O. Wang, Z. Lin, K. Saenko, and S. Sclaroff, "Real-Time Semantic Segmentation With Fast Attention," IEEE Robotics and Automation Letters, vol. 6, no. 1, pp. 263-270, 2021.
[10] P. Lin, P. Sun, G. Cheng, S. Xie, X. Li, and J. Shi, "Graph-Guided Architecture Search for Real-Time Semantic Segmentation," in 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 13-19 June 2020, pp. 4202-4211.
[11] M. Gruosso, N. Capece, and U. Erra, "Human segmentation in surveillance video with deep learning," Multimedia Tools and Applications, vol. 80, no. 1, pp. 1175-1199, 2021.
[12] T. Cane and J. Ferryman, "Evaluating deep semantic segmentation networks for object detection in maritime surveillance," in 2018 15th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS), 27-30 Nov. 2018, pp. 1-6.
[13] E. Maggiori, Y. Tarabalka, G. Charpiat, and P. Alliez, "Convolutional Neural Networks for Large-Scale Remote-Sensing Image Classification," IEEE Transactions on Geoscience and Remote Sensing, vol. 55, no. 2, pp. 645-657, 2017.
[14] Q. Wang, S. Liu, J. Chanussot, and X. Li, "Scene Classification With Recurrent Attention of VHR Remote Sensing Images," IEEE Transactions on Geoscience and Remote Sensing, vol. 57, no. 2, pp. 1155-1167, 2019.
[15] U. Côté-Allard, C. L. Fall, A. Drouin, A. Campeau-Lecours, C. Gosselin, K. Glette, F. Laviolette, and B. Gosselin, "Deep Learning for Electromyographic Hand Gesture Signal Classification Using Transfer Learning," IEEE Transactions on Neural Systems and Rehabilitation Engineering, vol. 27, no. 4, pp. 760-771, 2019.
[16] Q. Wang, X. He, and X. Li, "Locality and Structure Regularized Low Rank Representation for Hyperspectral Image Classification," IEEE Transactions on Geoscience and Remote Sensing, vol. 57, no. 2, pp. 911-923, 2019.
[17] J. Redmon, S. Divvala, R. Girshick, and A. Farhadi, "You Only Look Once: Unified, Real-Time Object Detection," in 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 27-30 June 2016, pp. 779-788.
[18] S. Ren, K. He, R. Girshick, and J. Sun, "Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 39, no. 6, pp. 1137-1149, 2017.
[19] T. Lin, P. Dollár, R. Girshick, K. He, B. Hariharan, and S. Belongie, "Feature Pyramid Networks for Object Detection," in 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 21-26 July 2017, pp. 936-944.
[20] M. Yang, K. Yu, C. Zhang, Z. Li, and K. Yang, "DenseASPP for Semantic Segmentation in Street Scenes," in 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 18-23 June 2018, pp. 3684-3692.
[21] S. Mehta, M. Rastegari, A. Caspi, L. Shapiro, and H. Hajishirzi, "ESPNet: Efficient spatial pyramid of dilated convolutions for semantic segmentation," in Proceedings of the European conference on computer vision (ECCV), 2018, pp. 552-568.
[22] S. Mehta, M. Rastegari, L. Shapiro, and H. Hajishirzi, "ESPNetv2: A Light-Weight, Power Efficient, and General Purpose Convolutional Neural Network," in 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 15-20 June 2019, pp. 9182-9192.
[23] M. Cordts, M. Omran, S. Ramos, T. Rehfeld, M. Enzweiler, R. Benenson, U. Franke, S. Roth, and B. Schiele, "The Cityscapes Dataset for Semantic Urban Scene Understanding," in 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 27-30 June 2016, pp. 3213-3223.
[24] P. Chen, S. Liu, H. Zhao, and J. Jia, "GridMask Data Augmentation," arXiv preprint arXiv:2001.04086, 2020.
[25] L.-C. Chen, Y. Zhu, G. Papandreou, F. Schroff, and H. Adam, "Encoder-decoder with atrous separable convolution for semantic image segmentation," in Proceedings of the European conference on computer vision (ECCV), 2018, pp. 801-818.
[26] J. Long, E. Shelhamer, and T. Darrell, "Fully convolutional networks for semantic segmentation," in 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 7-12 June 2015, pp. 3431-3440.
[27] H. Zhao, J. Shi, X. Qi, X. Wang, and J. Jia, "Pyramid Scene Parsing Network," in 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 21-26 July 2017, pp. 6230-6239.
[28] Y. Yuan, X. Chen, and J. Wang, "Object-Contextual Representations for Semantic Segmentation," in Computer Vision–ECCV 2020: 16th European Conference, 2020, pp. 173-190.
[29] K. He, X. Zhang, S. Ren, and J. Sun, "Deep Residual Learning for Image Recognition," in 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 27-30 June 2016, pp. 770-778.
[30] K. Sun, B. Xiao, D. Liu, and J. Wang, "Deep High-Resolution Representation Learning for Human Pose Estimation," in 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 15-20 June 2019, pp. 5686-5696.
[31] E. Romera, J. M. Álvarez, L. M. Bergasa, and R. Arroyo, "ERFNet: Efficient Residual Factorized ConvNet for Real-Time Semantic Segmentation," IEEE Transactions on Intelligent Transportation Systems, vol. 19, no. 1, pp. 263-272, 2018.
[32] H. Zhao, X. Qi, X. Shen, J. Shi, and J. Jia, "ICNet for real-time semantic segmentation on high-resolution images," in Proceedings of the European conference on computer vision (ECCV), 2018, pp. 405-420.
[33] O. Ronneberger, P. Fischer, and T. Brox, "U-Net: Convolutional Networks for Biomedical Image Segmentation," in Medical Image Computing and Computer-Assisted Intervention (MICCAI), 2015, pp. 234-241.
[34] B. Xu, F. Yang, J. Yang, S. Wu, and Y. Shan, "SPNet: Superpixel Pyramid Network for Scene Parsing," in 2018 Chinese Automation Congress (CAC), 30 Nov.-2 Dec. 2018, pp. 3690-3695.
[35] T. Hu, Y. Wang, Y. Chen, P. Lu, H. Wang, and G. Wang, "Sobel Heuristic Kernel for Aerial Semantic Segmentation," in 2018 25th IEEE International Conference on Image Processing (ICIP), 7-10 Oct. 2018, pp. 3074-3078.
[36] Y. Nakayama, H. Lu, Y. Li, and H. Kim, "Wide Residual Networks for Semantic Segmentation," in 2018 18th International Conference on Control, Automation and Systems (ICCAS), 17-20 Oct. 2018, pp. 1476-1480.
[37] Y. Zhou, X. Sun, Z. Zha, and W. Zeng, "Context-Reinforced Semantic Segmentation," in 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 15-20 June 2019, pp. 4041-4050.
[38] X. Peng, Z. Tang, F. Yang, R. S. Feris, and D. Metaxas, "Jointly Optimize Data Augmentation and Network Training: Adversarial Data Augmentation in Human Pose Estimation," in 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 18-23 June 2018, pp. 2226-2234.
[39] M. Bateriwala and P. Bourgeat, "Enforcing temporal consistency in Deep Learning segmentation of brain MR images," arXiv preprint arXiv:1906.07160, 2019.
[40] S. Yun, D. Han, S. Chun, S. J. Oh, Y. Yoo, and J. Choe, "CutMix: Regularization Strategy to Train Strong Classifiers With Localizable Features," in 2019 IEEE/CVF International Conference on Computer Vision (ICCV), 27 Oct.-2 Nov. 2019, pp. 6022-6031.
[41] S. Lim, I. Kim, T. Kim, C. Kim, and S. Kim, "Fast autoaugment," Advances in Neural Information Processing Systems, vol. 32, pp. 6665-6675, 2019.
[42] T. DeVries and G. W. Taylor, "Improved regularization of convolutional neural networks with cutout," arXiv preprint arXiv:1708.04552, 2017.
[43] K. K. Singh and Y. J. Lee, "Hide-and-seek: Forcing a network to be meticulous for weakly-supervised object and action localization," in 2017 IEEE international conference on computer vision (ICCV), 2017, pp. 3544-3553.
[44] A. G. Roy, N. Navab, and C. Wachinger, "Concurrent spatial and channel 'squeeze & excitation' in fully convolutional networks," in International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI), 2018, pp. 421-429.
[45] A. G. Roy, N. Navab, and C. Wachinger, "Recalibrating Fully Convolutional Networks With Spatial and Channel “Squeeze and Excitation” Blocks," IEEE Transactions on Medical Imaging, vol. 38, no. 2, pp. 540-549, 2019.
[46] P. Chao, C. Kao, Y. Ruan, C. Huang, and Y. Lin, "HarDNet: A Low Memory Traffic Network," in 2019 IEEE/CVF International Conference on Computer Vision (ICCV), 27 Oct.-2 Nov. 2019, pp. 3551-3560.
[47] S. Ioffe and C. Szegedy, "Batch normalization: Accelerating deep network training by reducing internal covariate shift," in International conference on machine learning, 2015, pp. 448-456.
[48] X. Glorot, A. Bordes, and Y. Bengio, "Deep Sparse Rectifier Neural Networks," in International conference on artificial intelligence and statistics, 2011, vol. 15, pp. 315-323.
[49] J. Dai, H. Qi, Y. Xiong, Y. Li, G. Zhang, H. Hu, and Y. Wei, "Deformable Convolutional Networks," in 2017 IEEE International Conference on Computer Vision (ICCV), 22-29 Oct. 2017, pp. 764-773.
[50] G. Lin, A. Milan, C. Shen, and I. Reid, "RefineNet: Multi-path refinement networks for high-resolution semantic segmentation," in Proceedings of the IEEE conference on computer vision and pattern recognition, 2017, pp. 1925-1934.
[51] P. Wang, P. Chen, Y. Yuan, D. Liu, Z. Huang, X. Hou, and G. Cottrell, "Understanding convolution for semantic segmentation," in 2018 IEEE Winter Conference on Applications of Computer Vision (WACV), 2018, pp. 1451-1460.
[52] O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, Z. Huang, A. Karpathy, A. Khosla, and M. Bernstein, "ImageNet large scale visual recognition challenge," International Journal of Computer Vision, vol. 115, no. 3, pp. 211-252, 2015.
[53] Q. Hou, L. Zhang, M. M. Cheng, and J. Feng, "Strip Pooling: Rethinking Spatial Pooling for Scene Parsing," in 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 13-19 June 2020, pp. 4002-4011.
[54] C. Yu, C. Gao, J. Wang, G. Yu, C. Shen, and N. Sang, "BiSeNet V2: Bilateral network with guided aggregation for real-time semantic segmentation," arXiv preprint arXiv:2004.02147, 2020.
[55] K. Simonyan and A. Zisserman, "Very deep convolutional networks for large-scale image recognition," arXiv preprint arXiv:1409.1556, 2014.
[56] C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens, and Z. Wojna, "Rethinking the Inception Architecture for Computer Vision," in 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 27-30 June 2016, pp. 2818-2826.
[57] G. Huang, Z. Liu, L. V. D. Maaten, and K. Q. Weinberger, "Densely Connected Convolutional Networks," in 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 21-26 July 2017, pp. 2261-2269.
[58] S. Jégou, M. Drozdzal, D. Vazquez, A. Romero, and Y. Bengio, "The one hundred layers tiramisu: Fully convolutional densenets for semantic segmentation," in Proceedings of the IEEE conference on computer vision and pattern recognition workshops, 2017, pp. 11-19.
[59] A. G. Howard, M. Zhu, B. Chen, D. Kalenichenko, W. Wang, T. Weyand, M. Andreetto, and H. Adam, "MobileNets: Efficient convolutional neural networks for mobile vision applications," arXiv preprint arXiv:1704.04861, 2017.
[60] J. Hu, L. Shen, and G. Sun, "Squeeze-and-Excitation Networks," in 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 18-23 June 2018, pp. 7132-7141.
[61] L. Jing, Y. Chen, and Y. Tian, "Coarse-to-Fine Semantic Segmentation From Image-Level Labels," IEEE Transactions on Image Processing, vol. 29, pp. 225-236, 2020.
[62] S. Zhao, Y. Wang, Z. Yang, and D. Cai, "Region mutual information loss for semantic segmentation," arXiv preprint arXiv:1910.12037, 2019.
[63] A. Shrivastava, A. Gupta, and R. Girshick, "Training Region-Based Object Detectors with Online Hard Example Mining," in 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 27-30 June 2016, pp. 761-769.
[64] L. Zhang, X. Li, A. Arnab, K. Yang, Y. Tong, and P. H. Torr, "Dual graph convolutional network for semantic segmentation," in British Machine Vision Conference (BMVC), 2019; arXiv preprint arXiv:1909.06121.
[65] H. Zha, R. Liu, D. Zhou, X. Yang, Q. Zhang, and X. Wei, "Efficient Attention Calibration Network for Real-Time Semantic Segmentation," in Asian Conference on Machine Learning, 2020, pp. 337-352.
[66] X. Hu and Y. Gong, "Lightweight Asymmetric Dilation Network for Real-Time Semantic Segmentation," IEEE Access, vol. 9, pp. 55630-55643, 2021.
[67] M. Everingham, S. A. Eslami, L. Van Gool, C. K. Williams, J. Winn, and A. Zisserman, "The pascal visual object classes challenge: A retrospective," International journal of computer vision, vol. 111, no. 1, pp. 98-136, 2015.
[68] "NVIDIA (2019). TensorRT. Accessed: Jun. 10, 2019. [Online]. Available:
https://developer.nvidia.com/tensorrt."
[69] V. Badrinarayanan, A. Kendall, and R. Cipolla, "SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 39, no. 12, pp. 2481-2495, 2017.

Full-text release date: 2026/07/21 (campus network)
Full text not authorized for public release (off-campus network)
Full text not authorized for public release (National Central Library: Taiwan NDLTD system)