
Author: Yu-Hsuan Chen (陳又瑄)
Thesis Title: Multi-scale vision transformer UNet for water segmentation in remote sensing image (多尺度視覺轉換器於UNet進行遙測影像之水域分割)
Advisor: Pei-Jun Lee (李佩君)
Committee Members: Meng-Lieh Sheu (許孟烈), Chia-Jui Chen (陳嘉瑞), Jyh-Chin Juang (莊智清)
Degree: Master
Department: Department of Electronic and Computer Engineering, College of Electrical Engineering and Computer Science
Year of Publication: 2023
Academic Year of Graduation: 111 (2022-2023)
Language: English
Pages: 81
Keywords (Chinese): remote sensing imagery; water segmentation; vision transformer
Keywords (English): water segmentation; vision transformer
Views: 136; Downloads: 0
    The goal of this research is to fulfill the social responsibility of disaster prevention in Taiwan, a country exposed to extreme weather, by building a real-time flood-event identification system on remote sensing satellites through machine learning and providing flood disaster prediction for immediate early warning. The study brings deep learning onto remote sensing satellites, using multi-spectral images (R, G, B, NIR) and feature extraction techniques to strengthen object recognition. River regions are identified from the multi-spectral imagery, and an optimized Transformer learns their features; this is then combined with UNet to form a CNN-plus-Transformer architecture that observes local and global characteristics simultaneously. A composed loss function is proposed to supervise learning on both regions and edges. Based on the database and stored image data, dynamic water-flow events are interpreted to predict flood disasters, further helping to keep the disaster area from expanding.
    To resolve the imprecise segmentation in water-region prediction, the model introduces multi-scale views of the image, rich and powerful feature learning, different image augmentations to improve the model's understanding, and learning with supervision of the edges between classes in the image. Compared with state-of-the-art models, the proposed method delivers better and more stable performance.
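    As a concrete illustration of the four-band input mentioned above, the following is a minimal sketch, assuming a standard convolutional stem whose only change from an RGB encoder is the 4-channel input; the layer sizes are illustrative, not the thesis's configuration.

        import torch
        import torch.nn as nn

        # Hypothetical encoder stem for 4-band (R, G, B, NIR) tiles; only
        # in_channels=4 differs from a standard RGB stem.
        stem = nn.Sequential(
            nn.Conv2d(4, 64, kernel_size=3, padding=1),
            nn.BatchNorm2d(64),
            nn.ReLU(inplace=True),
        )

        x = torch.randn(1, 4, 256, 256)   # one 256x256 multi-spectral tile
        features = stem(x)                # -> (1, 64, 256, 256)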


    The research aims to fulfill the social responsibility of disaster prevention in Taiwan, which is prone to extreme weather conditions, by utilizing remote sensing satellites and implementing a real-time flood event identification system through machine learning. This system also provides flood disaster prediction for immediate warning purposes. The study incorporates deep learning techniques into remote sensing satellites to enhance object recognition using multi-spectral images (R, G, B, NIR) and feature extraction. By utilizing multi-spectral imagery for river region identification, an optimized Transformer is employed to learn features. Subsequently, combining this with UNet forms a CNN-plus-Transformer architecture, enabling the simultaneous observation of both local and global characteristics. This research introduces a combined loss function for supervised learning of both regions and edges. Leveraging a database and stored image data, dynamic water flow events are interpreted for flood disaster prediction, further preventing the expansion of disaster areas.
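    As a rough sketch of the combined region-and-edge supervision described above (the thesis's composed loss in Section 3.3.2 may weight or define its terms differently), the following assumes a Dice term for region overlap and a boundary-weighted cross-entropy term, with boundaries approximated by morphological erosion:

        import torch
        import torch.nn.functional as F

        def dice_loss(pred, target, eps=1e-6):
            # Region term: 1 - Dice overlap between sigmoid prediction and mask.
            prob = torch.sigmoid(pred)
            inter = (prob * target).sum(dim=(2, 3))
            union = prob.sum(dim=(2, 3)) + target.sum(dim=(2, 3))
            return (1 - (2 * inter + eps) / (union + eps)).mean()

        def edge_map(mask, k=3):
            # Approximate class boundaries as the mask minus its erosion
            # (erosion realized as max-pooling the inverted mask).
            eroded = -F.max_pool2d(-mask, kernel_size=k, stride=1, padding=k // 2)
            return (mask - eroded).clamp(0, 1)

        def combined_loss(pred, target, region_w=1.0, edge_w=0.5):
            # Weighted sum of the region term and a boundary-weighted BCE term.
            boundary = edge_map(target)
            bce = F.binary_cross_entropy_with_logits(pred, target, reduction="none")
            edge = (bce * boundary).sum() / (boundary.sum() + 1e-6)
            return region_w * dice_loss(pred, target) + edge_w * edge

    For logits pred and binary masks target, both shaped (N, 1, H, W), combined_loss returns a single scalar, so region accuracy and edge sharpness are optimized jointly.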
    To address the issue of inaccurate segmentation in water area prediction, several improvements are proposed in the model. These include incorporating multi-scale representations, enriching and strengthening feature learning, applying different image augmentations to improve model understanding, and learning and supervising the edges between different classes in the images. The proposed approach demonstrates superior performance and more stable results compared with state-of-the-art models.
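    The chapter titles below ("Multi-scale projection", "Spatial Reduction and Attention") point to an attention block whose keys and values are spatially downsampled before attention, in the spirit of pyramid vision transformers; the sketch below illustrates that idea under those assumptions and is not the thesis's exact module:

        import torch
        import torch.nn as nn

        class SpatialReductionAttention(nn.Module):
            # Self-attention whose keys/values come from a spatially reduced
            # copy of the feature map, cutting cost by roughly sr_ratio^2.
            def __init__(self, dim, num_heads=4, sr_ratio=2):
                super().__init__()
                self.sr = nn.Conv2d(dim, dim, kernel_size=sr_ratio, stride=sr_ratio)
                self.norm = nn.LayerNorm(dim)
                self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

            def forward(self, x, h, w):
                # x: (B, N, C) tokens of an h-by-w feature map, N = h * w.
                b, n, c = x.shape
                assert n == h * w
                feat = x.transpose(1, 2).reshape(b, c, h, w)
                kv = self.sr(feat).flatten(2).transpose(1, 2)  # (B, N/sr^2, C)
                kv = self.norm(kv)
                out, _ = self.attn(x, kv, kv)  # queries keep full resolution
                return out

        # e.g. tokens from a 32x32, 64-channel feature map:
        # y = SpatialReductionAttention(64)(torch.randn(2, 1024, 64), 32, 32)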

    Abstract (Chinese) I
    Abstract II
    Acknowledgements III
    LIST OF CONTENTS IV
    LIST OF FIGURES VI
    LIST OF TABLES VIII
    CHAPTER 1 INTRODUCTION 1
    1.1 Introduction 1
    1.2 Motivation 2
    1.3 Organization 9
    CHAPTER 2 RELATED WORKS 10
    2.1 The deep learning model for water segmentation 10
    2.2 Vision Transformer 13
    2.3 Segmentation Model 17
    2.4 The loss function for segmentation 22
    CHAPTER 3 MULTI-SCALE VISION TRANSFORMER UNET 27
    3.1 Multi-scale vision transformer 27
    3.1.1 Multi-scale projection 27
    3.1.2 Spatial Reduction and Attention 31
    3.1.3 Vision transformer integration 35
    3.2 Optimized UNet 36
    3.2.1 Multi-Scale Vision Transformer in UNet 36
    3.2.2 Multi-spectral channel 37
    3.2.3 Integration with the hue image 39
    3.3 Loss Function 44
    3.3.1 Selective water region learning 44
    3.3.2 Composed Loss Function 46
    CHAPTER 4 EXPERIMENTAL RESULTS 53
    4.1 WorldFloods Dataset [24] 53
    4.2 Performance Evaluation 55
    4.2.1 Evaluation metrics 55
    4.2.2 Validation Results 57
    4.2.3 Inference Results 58
    CHAPTER 5 CONCLUSIONS AND FUTURE WORKS 65
    5.1 Conclusions 65
    5.2 Future Works 66
    REFERENCES 67

    [1] O. Ronneberger, P. Fischer, and T. Brox, “U-net: Convolutional networks for biomedical image segmentation,” Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 9351, pp. 234–241, 2015, doi: 10.1007/978-3-319-24574-4_28.
    [2] J. Chen et al., “TransUNet: Transformers Make Strong Encoders for Medical Image Segmentation,” Feb. 2021, Accessed: Jun. 24, 2023. [Online]. Available: https://arxiv.org/abs/2102.04306v1
    [3] M. Courbariaux, I. Hubara, D. Soudry, R. El-Yaniv, and Y. Bengio, “Binarized Neural Networks: Training Deep Neural Networks with Weights and Activations Constrained to +1 or -1,” Feb. 2016, Accessed: Jun. 03, 2023. [Online]. Available: https://arxiv.org/abs/1602.02830v3
    [4] T. A. Bui, P. J. Lee, K. Y. Chen, C. R. Chen, C. S. J. Liu, and H. C. Lin, “Edge Computing-Based SAT-Video Coding for Remote Sensing,” IEEE Access, vol. 10, pp. 52840–52852, 2022, doi: 10.1109/ACCESS.2022.3174553.
    [5] T. Y. Lin et al., “Microsoft COCO: Common Objects in Context,” Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 8693 LNCS, no. PART 5, pp. 740–755, May 2014, doi: 10.1007/978-3-319-10602-1_48.
    [6] A. Vaswani et al., “Attention Is All You Need,” Adv Neural Inf Process Syst, vol. 2017-December, pp. 5999–6009, Jun. 2017, Accessed: Jun. 13, 2023. [Online]. Available: https://arxiv.org/abs/1706.03762v5
    [7] A. Dosovitskiy et al., “An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale,” Oct. 2020, Accessed: Jun. 13, 2023. [Online]. Available: https://arxiv.org/abs/2010.11929v2
    [8] H. Wu et al., “CvT: Introducing Convolutions to Vision Transformers,” Proceedings of the IEEE International Conference on Computer Vision, pp. 22–31, Mar. 2021, doi: 10.1109/ICCV48922.2021.00009.
    [9] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “ImageNet classification with deep convolutional neural networks,” Commun ACM, vol. 60, no. 6, pp. 84–90, May 2017, doi: 10.1145/3065386.
    [10] E. Shelhamer, J. Long, and T. Darrell, “Fully Convolutional Networks for Semantic Segmentation,” IEEE Trans Pattern Anal Mach Intell, vol. 39, no. 4, pp. 640–651, Nov. 2014, doi: 10.1109/TPAMI.2016.2572683.
    [11] K. Simonyan and A. Zisserman, “Very Deep Convolutional Networks for Large-Scale Image Recognition,” 3rd International Conference on Learning Representations, ICLR 2015 - Conference Track Proceedings, Sep. 2014, Accessed: Jun. 10, 2023. [Online]. Available: https://arxiv.org/abs/1409.1556v6
    [12] L. C. Chen, Y. Zhu, G. Papandreou, F. Schroff, and H. Adam, “Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation,” Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 11211 LNCS, pp. 833–851, Feb. 2018, doi: 10.1007/978-3-030-01234-2_49.
    [13] L.-C. Chen, G. Papandreou, F. Schroff, and H. Adam, “Rethinking Atrous Convolution for Semantic Image Segmentation,” Jun. 2017, Accessed: Aug. 16, 2023. [Online]. Available: https://arxiv.org/abs/1706.05587v3
    [14] I. Ahmed, M. Ahmad, F. A. Khan, and M. Asif, “Comparison of deep-learning-based segmentation models: Using top view person images,” IEEE Access, vol. 8, pp. 136361–136373, 2020, doi: 10.1109/ACCESS.2020.3011406.
    [15] N. Otsu, “A Threshold Selection Method from Gray-Level Histograms,” IEEE Trans Syst Man Cybern, vol. SMC-9, no. 1, pp. 62–66, 1979, doi: 10.1109/TSMC.1979.4310076.
    [16] A. Bleau and L. J. Leon, “Watershed-Based Segmentation and Region Merging,” Computer Vision and Image Understanding, vol. 77, no. 3, pp. 317–370, Mar. 2000, doi: 10.1006/CVIU.1999.0822.
    [17] J. Ma, “Segmentation Loss Odyssey,” May 2020, Accessed: Jun. 19, 2023. [Online]. Available: https://arxiv.org/abs/2005.13449v1
    [18] T. Y. Lin, P. Goyal, R. Girshick, K. He, and P. Dollar, “Focal Loss for Dense Object Detection,” IEEE Trans Pattern Anal Mach Intell, vol. 42, no. 2, pp. 318–327, Aug. 2017, doi: 10.1109/TPAMI.2018.2858826.
    [19] F. Milletari, N. Navab, and S. A. Ahmadi, “V-Net: Fully Convolutional Neural Networks for Volumetric Medical Image Segmentation,” Proceedings - 2016 4th International Conference on 3D Vision, 3DV 2016, pp. 565–571, Jun. 2016, doi: 10.1109/3DV.2016.79.
    [20] M. A. Rahman and Y. Wang, “Optimizing Intersection-Over-Union in Deep Neural Networks for Image Segmentation,” Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 10072 LNCS, pp. 234–244, 2016, doi: 10.1007/978-3-319-50835-1_22.
    [21] D. Karimi and S. E. Salcudean, “Reducing the Hausdorff Distance in Medical Image Segmentation with Convolutional Neural Networks,” IEEE Trans Med Imaging, vol. 39, no. 2, pp. 499–513, Apr. 2019, doi: 10.1109/TMI.2019.2930068.
    [22] S.-H. Gao, M.-M. Cheng, K. Zhao, X.-Y. Zhang, M.-H. Yang, and P. Torr, “Res2Net: A New Multi-scale Backbone Architecture,” IEEE Trans Pattern Anal Mach Intell, vol. 43, no. 2, pp. 652–662, Apr. 2019, doi: 10.1109/TPAMI.2019.2938758.
    [23] F. Chollet, “Xception: Deep Learning with Depthwise Separable Convolutions,” Proceedings - 30th IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, vol. 2017-January, pp. 1800–1807, Oct. 2016, doi: 10.1109/CVPR.2017.195.
    [24] G. Mateo-Garcia et al., “Towards global flood mapping onboard low cost satellites with machine learning,” Scientific Reports 2021 11:1, vol. 11, no. 1, pp. 1–12, Mar. 2021, doi: 10.1038/s41598-021-86650-z.
    [25] B. Cheng, R. Girshick, P. Dollár, A. C. Berg, and A. Kirillov, “Boundary IoU: Improving Object-Centric Image Segmentation Evaluation,” Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 15329–15337, Mar. 2021, doi: 10.1109/CVPR46437.2021.01508.
    [26] A. Kirillov, Y. Wu, K. He, and R. Girshick, “PointRend: Image Segmentation as Rendering,” Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 9796–9805, Dec. 2019, doi: 10.1109/CVPR42600.2020.00982.
    [27] O. Oktay et al., “Attention U-Net: Learning Where to Look for the Pancreas,” Apr. 2018, Accessed: Jun. 24, 2023. [Online]. Available: https://arxiv.org/abs/1804.03999v3
    [28] Y. Gao, M. Zhou, and D. N. Metaxas, “UTNet: A Hybrid Transformer Architecture for Medical Image Segmentation,” Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 12903 LNCS, pp. 61–71, Jul. 2021, doi: 10.1007/978-3-030-87199-4_6.

    Full text available from 2025/08/21 (campus network)
    Full text available from 2028/08/21 (off-campus network)
    Full text available from 2028/08/21 (National Central Library: National Digital Library of Theses and Dissertations in Taiwan)