
Graduate Student: 羅柏智 (Po-Chih Lo)
Thesis Title: Learning Motion Representation for UAV Dynamic Obstacle Avoidance Perception (學習無人機動態避障感知的運動表示)
Advisor: 項天瑞 (Tien-Ruey Hsiang)
Committee Members: 楊傳凱 (Chuan-Kai Yang), 吳怡樂 (Yi-Leh Wu)
Degree: Master
Department: College of Electrical Engineering and Computer Science - Department of Computer Science and Information Engineering
Year of Publication: 2023
Graduation Academic Year: 111
Language: Chinese
Number of Pages: 41
Keywords: obstacle avoidance, optical flow, deep learning
This paper presents a lightweight vision framework based on motion representation for UAV dynamic obstacle detection and avoidance decision generation. The framework consists of two core components. First, we propose Diffnet, a lightweight motion estimation network based on element-wise subtraction, which encodes multi-scale difference features between adjacent frames and maps them into optical flow features oriented toward the avoidance task; this improves tolerance to lighting changes and motion blur and serves as a fast computational alternative to mainstream optical flow estimation networks. Second, on top of the motion features we build a Transformer network to predict two avoidance subtasks, collision probability and escape-direction action, extracting global contextual features with self-attention to compensate for the shortcomings of CNNs. Finally, we introduce a two-stage motion feature training scheme that combines unsupervised optical flow pretraining with optical flow knowledge distillation to fully learn task-oriented features. To verify the feasibility of this scheme, we test it on a real-world dynamic obstacle dataset and on the AirSim simulation platform to evaluate obstacle perception performance. Experiments show that, compared with other CNN architectures applied in the UAV domain, the proposed scheme achieves significant improvements in low-light environments and in avoidance decision-making, and the overall framework is fast enough to meet the real-time inference requirements of low-compute onboard UAV devices.
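A minimal PyTorch sketch of the frame-difference idea described above: two adjacent frames pass through a shared multi-scale encoder, the per-scale features are subtracted element-wise, and a small head maps the coarsest difference to a flow-like motion map. The module names, layer sizes, and head design are illustrative assumptions, not the actual Diffnet implementation from the thesis.

```python
# Illustrative sketch only; names and sizes are assumptions, not the thesis code.
import torch
import torch.nn as nn


class FrameDifferenceEncoder(nn.Module):
    """Encodes element-wise differences of multi-scale features from two frames."""

    def __init__(self, channels=(16, 32, 64)):
        super().__init__()
        layers, in_ch = [], 3
        for out_ch in channels:
            layers.append(nn.Sequential(
                nn.Conv2d(in_ch, out_ch, 3, stride=2, padding=1),
                nn.ReLU(inplace=True)))
            in_ch = out_ch
        self.stages = nn.ModuleList(layers)
        # Head mapping the coarsest difference feature to a 2-channel,
        # flow-like motion map (a task-oriented optical flow surrogate).
        self.motion_head = nn.Conv2d(channels[-1], 2, 3, padding=1)

    def forward(self, frame_prev, frame_curr):
        diffs = []
        f_prev, f_curr = frame_prev, frame_curr
        for stage in self.stages:               # shared weights for both frames
            f_prev, f_curr = stage(f_prev), stage(f_curr)
            diffs.append(f_curr - f_prev)       # element-wise subtraction per scale
        return self.motion_head(diffs[-1]), diffs


# Usage: two adjacent RGB frames -> coarse motion map + multi-scale differences.
prev = torch.randn(1, 3, 224, 224)
curr = torch.randn(1, 3, 224, 224)
motion, _ = FrameDifferenceEncoder()(prev, curr)
print(motion.shape)  # (1, 2, 28, 28)
```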


In this paper, we present a lightweight visual framework based on motion representation for UAV dynamic obstacle perception and avoidance decision-making. The framework consists of two components. First, we propose a motion estimation network called Diffnet based on element-wise subtraction. It encodes multi-scale frame-difference features and decodes them into an obstacle-oriented optical flow representation. This improves tolerance to lighting variations and motion blur while providing a fast computational alternative to mainstream optical flow estimation networks. Second, we construct a Transformer on the motion features to predict two obstacle avoidance subtasks: collision probability and escape-direction action. Self-attention is used to extract global context and address the limitations of CNNs. Finally, we introduce a two-stage motion feature training scheme that combines unsupervised optical flow pretraining and optical flow knowledge distillation to further guide task-specific flow features. To validate the feasibility of this approach, we test it on a real-world dataset of dynamic obstacle collisions and on the AirSim simulation platform to evaluate perception performance. The experiments demonstrate significant improvements in low-light environments and obstacle avoidance decision-making compared to other CNN architectures used in the UAV domain. The overall framework is computationally efficient and meets the real-time inference requirements of UAVs with low-compute onboard devices.
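To make the two-stage training scheme concrete, here is a hedged sketch of the two loss terms it combines: an unsupervised photometric loss (brightness constancy plus smoothness) for the pretraining stage, and an optical flow knowledge-distillation loss against a frozen teacher for the second stage. The function names, the warping helper, and the loss weight are assumptions for illustration, not the thesis's exact formulation.

```python
# Hedged sketch of the two training signals; details are illustrative assumptions.
import torch
import torch.nn.functional as F


def warp(img, flow):
    """Backward-warp img with flow (B,2,H,W) using a normalized sampling grid."""
    b, _, h, w = img.shape
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    grid = torch.stack((xs, ys), dim=0).float().to(img.device)   # (2,H,W), x then y
    coords = grid.unsqueeze(0) + flow                            # displaced pixel coords
    coords_x = 2.0 * coords[:, 0] / (w - 1) - 1.0                # normalize to [-1, 1]
    coords_y = 2.0 * coords[:, 1] / (h - 1) - 1.0
    grid_norm = torch.stack((coords_x, coords_y), dim=-1)        # (B,H,W,2)
    return F.grid_sample(img, grid_norm, align_corners=True)


def unsupervised_flow_loss(frame1, frame2, flow, smooth_weight=0.1):
    """Stage 1: photometric error between frame1 and warped frame2, plus smoothness."""
    photometric = (frame1 - warp(frame2, flow)).abs().mean()
    smooth = (flow[:, :, :, 1:] - flow[:, :, :, :-1]).abs().mean() + \
             (flow[:, :, 1:, :] - flow[:, :, :-1, :]).abs().mean()
    return photometric + smooth_weight * smooth


def distillation_loss(student_flow, teacher_flow):
    """Stage 2: regress the lightweight network's flow toward a frozen teacher's flow."""
    return F.mse_loss(student_flow, teacher_flow.detach())
```

In such a setup, a pretrained flow estimator (for example a RAFT- or PWC-Net-style network) would supply `teacher_flow`, while the lightweight frame-difference network produces `student_flow`.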

Table of Contents:
Abstract (Chinese)
Abstract (English)
Acknowledgements
Table of Contents
List of Figures
List of Tables
1 Introduction
  1.1 Motivation and Objectives
  1.2 Thesis Organization
2 Related Work
  2.1 Machine-Vision-Based Control Frameworks
  2.2 Motion Representation of Moving Objects
  2.3 Overview of Vision Transformers
3 Method and Architecture
  3.1 Motivation of the Method
  3.2 Overall Architecture
  3.3 Motion Estimation Network Diffnet
    3.3.1 Feature Extraction
    3.3.2 Difference Feature Encoder
    3.3.3 Motion Decoder
  3.4 Action Transformer
    3.4.1 Patching and Position Embedding
    3.4.2 Class Token and Transformer Encoder
    3.4.3 FFN for Predicting Avoidance Tasks
  3.5 Two-Stage Motion Training
    3.5.1 Unsupervised Optical Flow Pretraining
    3.5.2 Optical Flow Knowledge Distillation
4 Experiments and Evaluation
  4.1 Datasets
  4.2 Data Augmentation and Training Setup
  4.3 Evaluation Metrics
  4.4 Experimental Results and Validation
    4.4.1 Overall Accuracy Evaluation
    4.4.2 Scenario Evaluation
    4.4.3 Motion Representation Visualization
    4.4.4 Optical Flow Network Comparison
    4.4.5 Perception Distance Test
5 Conclusions and Future Work
References

