| Student | 劉子齊 TZU-CHI LIU |
|---|---|
| Thesis Title | 多層次時序的事件與可見光融合物件偵測技術 (Multi-Level Temporal-Based Event and Visible Fusion for Object Detection) |
| Advisor | 陳永耀 Yung-Yao Chen |
| Committee Members | 花凱龍 Kai-Lung Hua, 林淵翔 Yuan-Hsiang Lin, 呂政修 Jenq-Shiou Leu, 陳永耀 Yung-Yao Chen |
| Degree | Master |
| Department | College of Electrical Engineering and Computer Science, Department of Electronic and Computer Engineering |
| Year of Publication | 2024 |
| Academic Year | 112 |
| Language | Chinese |
| Pages | 44 |
| Keywords (Chinese) | 事件相機、時序特徵融合、跨模態物件偵測 |
| Keywords (English) | Event-based Camera, Temporal feature fusion, Cross-modality object detection |
Traditional visible-light cameras are susceptible to performance degradation under varying weather and lighting conditions. To address this issue, we introduce a novel sensor, the event-based camera, and propose a multi-level temporal-based event and visible fusion technique for object detection, aiming to improve detection performance when event-based and visible-light cameras are combined. We design three key modules: a Gated Event Accumulation Representation Module, a Temporal Feature Selection Module, and an Adaptive Fusion Module. The first two strengthen temporal feature fusion of events at the image level and the feature level respectively, while the Adaptive Fusion Module effectively integrates features from the two modalities while keeping computational complexity low. Our method is trained and validated on the publicly available DSEC-Detection dataset, achieving 67.2% mAP50 and 45.6% mAP50-95, demonstrating excellent detection performance and validating its effectiveness.
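The record does not specify how the Gated Event Accumulation Representation Module builds its image-level input. As a rough, illustrative sketch only (not the thesis's actual module), the standard preprocessing such a module would consume is a time-binned polarity histogram: the asynchronous event stream is accumulated into a dense count tensor that a convolutional detector can process. All names and the binning scheme below are assumptions.

```python
import numpy as np

def event_histogram(events, H, W, n_bins):
    """Accumulate an event stream into a (n_bins, 2, H, W) count tensor.

    events: array of (t, x, y, p) rows, with timestamp t normalized to
    [0, 1) and polarity p in {0, 1}. Each event increments the count at
    its temporal bin, polarity channel, and pixel location.
    """
    hist = np.zeros((n_bins, 2, H, W), dtype=np.float32)
    for t, x, y, p in events:
        b = min(int(t * n_bins), n_bins - 1)  # clamp t == 1.0 edge case
        hist[b, int(p), int(y), int(x)] += 1.0
    return hist

# Three toy events: (t, x, y, polarity)
events = np.array([
    [0.05, 3, 2, 1],   # early positive event at pixel (x=3, y=2)
    [0.40, 3, 2, 0],   # later negative event at the same pixel
    [0.95, 7, 5, 1],   # late positive event elsewhere
])
hist = event_histogram(events, H=8, W=8, n_bins=4)
```

A learned gate (as the module's name suggests) would then weight these temporal bins before they are fused with the visible-light frame; the histogram itself is the modality-agnostic starting point.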