
Author: 羅柏智 (Po-Chih Lo)
Thesis Title: 學習無人機動態避障感知的運動表示 (Learning Motion Representation for UAV Dynamic Obstacle Avoidance Perception)
Advisor: 項天瑞 (Tien-Ruey Hsiang)
Committee: 楊傳凱 (Chuan-Kai Yang), 吳怡樂 (Yi-Leh Wu)
Degree: 碩士 (Master)
Department: 電資學院 - 資訊工程系 (Department of Computer Science and Information Engineering)
Thesis Publication Year: 2023
Graduation Academic Year: 111
Language: 中文 (Chinese)
Pages: 41
Keywords (in Chinese): 障礙物避開、光流、深度學習
Keywords (in other languages): obstacle avoidance, optical flow, deep learning

本文展示了基於運動表示的輕量視覺框架,用於 UAV 動態障礙物檢測與避障決策生成。框架主要由兩個核心組件構成。首先,我們提出基於 Element-Wise Subtraction 的輕量運動估計網路 Diffnet,藉由編碼多尺度相鄰幀差異特徵,映射為避障任務導向的光流特徵,用以提升對光照變化、運動模糊的容錯性,並作為主流光流估計網路的快速計算替代。其次,我們在運動特徵基礎上建構 Transformer 網路來預測兩個避障子任務:避障機率和逃逸方向策略,以自注意力方式提取全域上下文特徵,改善 CNN 的不足。最後,我們引入了二階段運動特徵訓練方案,結合無監督光流預訓練和光流知識蒸餾來充分學習任務導向特徵。為了驗證該方案的可行性,我們在真實環境的動態障礙物數據集以及 AirSim 模擬平台進行測試,評估障礙物感知性能。實驗表明,相比於其他應用於 UAV 領域的 CNN 架構,提議的方案在低光源環境及避障決策上具有顯著提升;整體框架計算快速,能滿足 UAV 機載低運算設備的即時推理需求。


In this paper, we present a lightweight visual framework based on motion representation for UAV dynamic obstacle perception and avoidance decision-making. The framework consists of two core components. First, we propose a motion estimation network called Diffnet based on element-wise subtraction. It encodes multi-scale differences between adjacent frames and decodes them into an obstacle-avoidance-oriented optical flow representation, which improves tolerance to lighting variations and motion blur while serving as a fast alternative to mainstream optical flow estimation networks. Second, we build a Transformer on top of the motion features to predict two obstacle avoidance subtasks: collision probability and escape-direction action. Self-attention extracts global context and addresses the limitations of CNNs. Finally, we introduce a two-stage motion feature training scheme that combines unsupervised optical flow pretraining with optical flow knowledge distillation to fully learn task-oriented features. To validate the feasibility of this approach, we test it on a real-world dynamic obstacle dataset and on the AirSim simulation platform to evaluate obstacle perception performance. The experiments demonstrate significant improvements in low-light environments and in avoidance decision-making compared with other CNN architectures used in the UAV domain, and the overall framework is computationally efficient enough to meet the real-time inference requirements of low-compute onboard UAV devices.
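The abstract describes the pipeline only at a high level; the thesis's actual layer configurations are not given here. The following is a minimal PyTorch sketch of the stated ideas, where all class names (`DiffNet`, `ActionTransformer`), layer sizes, patch size, number of escape directions, and the distillation loss form are illustrative assumptions rather than the thesis implementation: multi-scale element-wise frame-feature subtraction decoded into a flow-like map, a Transformer with a class token predicting collision probability and an escape-direction action, and an L2-style flow distillation term.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DiffNet(nn.Module):
    """Sketch: encode element-wise differences of adjacent frames at two
    scales, then decode them into a 2-channel flow-like motion map."""
    def __init__(self, ch=32):
        super().__init__()
        self.enc1 = nn.Conv2d(3, ch, 3, stride=2, padding=1)
        self.enc2 = nn.Conv2d(ch, ch * 2, 3, stride=2, padding=1)
        # Fused input = scale-1 diff (ch) + upsampled scale-2 diff (2*ch).
        self.dec = nn.Conv2d(ch * 3, 2, 3, padding=1)

    def forward(self, frame_t, frame_t1):
        # Element-wise subtraction of per-frame features replaces an
        # explicit correlation / cost-volume step.
        a1, b1 = F.relu(self.enc1(frame_t)), F.relu(self.enc1(frame_t1))
        d1 = b1 - a1                                  # scale-1 difference
        a2, b2 = F.relu(self.enc2(a1)), F.relu(self.enc2(b1))
        d2 = b2 - a2                                  # scale-2 difference
        d2_up = F.interpolate(d2, size=d1.shape[-2:],
                              mode="bilinear", align_corners=False)
        return self.dec(torch.cat([d1, d2_up], dim=1))  # (B, 2, H/2, W/2)

class ActionTransformer(nn.Module):
    """Sketch: patchify the motion map, prepend a class token, and predict
    the two avoidance subtasks from the encoded class token.
    (Positional embeddings are omitted here for brevity.)"""
    def __init__(self, in_ch=2, dim=64, patch=8, n_dirs=5):
        super().__init__()
        self.patch = nn.Conv2d(in_ch, dim, patch, stride=patch)
        self.cls = nn.Parameter(torch.zeros(1, 1, dim))
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=4,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.collision_head = nn.Linear(dim, 1)      # collision probability
        self.action_head = nn.Linear(dim, n_dirs)    # escape-direction logits

    def forward(self, motion):
        x = self.patch(motion).flatten(2).transpose(1, 2)   # (B, N, dim)
        cls = self.cls.expand(x.size(0), -1, -1)
        x = self.encoder(torch.cat([cls, x], dim=1))
        cls_out = x[:, 0]
        return torch.sigmoid(self.collision_head(cls_out)), \
               self.action_head(cls_out)

def distill_loss(student_flow, teacher_flow):
    """Illustrative stage-2 term: regress the student's motion map toward a
    frozen teacher flow estimate (after unsupervised photometric pretraining)."""
    return F.mse_loss(student_flow, teacher_flow)
```

For two 64x64 RGB frames, `DiffNet` yields a 32x32 motion map, which `ActionTransformer` reduces to a collision probability in [0, 1] and a vector of direction logits; the `distill_loss` term would only be active in the second training stage.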

Table of Contents

Chinese Abstract
English Abstract
Acknowledgments
Table of Contents
List of Figures
List of Tables
1 Introduction
  1.1 Motivation and Objectives
  1.2 Thesis Organization
2 Related Work
  2.1 Machine-Vision-Based Control Frameworks
  2.2 Motion Representation of Moving Objects
  2.3 Overview of Vision Transformers
3 Method and Architecture
  3.1 Motivation of the Method
  3.2 Overall Architecture
  3.3 Motion Estimation Network Diffnet
    3.3.1 Feature Extraction
    3.3.2 Difference Feature Encoder
    3.3.3 Motion Decoder
  3.4 Action Transformer
    3.4.1 Patching and Position Embedding
    3.4.2 Class Token and Transformer Encoder
    3.4.3 FFN for Predicting Avoidance Tasks
  3.5 Two-Stage Motion Training
    3.5.1 Unsupervised Optical Flow Pretraining
    3.5.2 Optical Flow Knowledge Distillation
4 Experiments and Evaluation
  4.1 Datasets
  4.2 Data Augmentation and Training Setup
  4.3 Evaluation Metrics
  4.4 Experimental Results and Validation
    4.4.1 Overall Accuracy Evaluation
    4.4.2 Scenario Evaluation
    4.4.3 Motion Representation Visualization
    4.4.4 Optical Flow Network Comparison
    4.4.5 Perception Distance Test
5 Conclusion and Future Work
References

