Graduate student: 蔡孟軒
Meng-Hsuan Tsai
Thesis title: 利用深度學習估測 YouTube 車禍影片中三維車輛軌跡之研究
3D Vehicle Trajectory Estimation from YouTube Car Accident Videos Using Deep Neural Networks
Advisor: 陳郁堂
Yie-Tarng Chen
Oral defense committee: 陳郁堂
Yie-Tarng Chen
Ming-Bo Lin
Wen-Hsien Fang
Jenq-Shiou Leu
Hsing-Lung Chen
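Degree: Master's
Department: College of Electrical Engineering and Computer Science - Department of Electronic and Computer Engineering
Year of publication: 2019
Academic year of graduation: 107
Language: English
Number of pages: 51
Chinese keywords: 視覺測距, 即時定位與地圖構建, 物件偵測, 透視變換, 前後向錯誤校正
English keywords: Visual Odometry, Simultaneous Localization and Mapping, Object Detection, Perspective Transformation, Forward and Backward Error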
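With attention-grabbing applications such as self-driving cars in recent years, generating 3D vehicle trajectories from dash-cam videos has become a topic of interest. To prevent car accidents in advance, we must first understand the main causes of accidents and, based on these 3D vehicle trajectories, the driving patterns that are prone to accidents. In this thesis, we propose a 3D vehicle trajectory estimation scheme that generates vehicle trajectories from dash-cam videos of car accidents on YouTube. First, we estimate ego-motion using a monocular visual odometry method, which yields feature point pairs from two adjacent frames, together with the recently developed SfMLearner neural network. Because ego-motion estimated from image pairs suffers from scale ambiguity, only relative speed can be obtained. To solve this problem, we first project the image onto a bird's-eye view using a perspective transformation and detect the lane lines; we then scale the lane lines to find the ratio between feature point displacement and real-world distance. Finally, we compute the distance between each vehicle and the ego-motion camera and add the camera ego-motion to obtain the 3D trajectories of other vehicles in world coordinates. Compared with other ego-motion estimation methods for monocular cameras, our method has two advantages. First, it obtains vehicle trajectories at absolute scale. Second, it runs not only on the KITTI dataset but also on lower-quality dash-cam videos of car accidents from YouTube.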

Generating 3D vehicle trajectories from dash-cam videos has recently attracted attention because of compelling applications in self-driving cars. To prevent car accidents in advance, we must first understand, based on these 3D vehicle trajectories, the main causes of car accidents and the driving patterns that are prone to them. In this thesis, we explore an intriguing scenario for 3D vehicle trajectory estimation: generating trajectories of vehicles involved in accidents from online YouTube videos captured by dash cams. First, we estimate camera ego-motion using both structure from motion, which relies on feature point pairs from two adjacent frames, and the recently developed SfMLearner neural network. Ego-motion estimation from image pairs suffers from the scale ambiguity problem: only a relative speed can be obtained. To address this problem, we use a novel vehicle depth estimation method that combines an inverse perspective transform with a dashed-line heuristic, and we sequentially calculate a motion vector between adjacent frames in the bird's-eye view using matched feature pairs. Finally, we obtain 3D vehicle trajectories in world coordinates by adding the estimated camera ego-motion to the vehicle motion in car coordinates. Compared with other monocular ego-motion estimation approaches, our approach has two advantages. First, it yields vehicle trajectories at absolute scale. Second, it operates not only on the KITTI dataset but also on dash-cam accident videos from YouTube.
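The last two steps of the abstract (recovering absolute scale from a dashed lane line, then moving a vehicle position from car coordinates to world coordinates) can be illustrated with a minimal sketch. It is not the thesis implementation: it assumes a planar, ground-plane simplification of the 3D problem, and the function names, the 4 m dash length, and all pixel values are hypothetical choices made for illustration only.

```python
import math

def scale_ratio(dash_length_m, dash_length_px):
    """Metres per bird's-eye-view pixel, from a lane dash of assumed known length."""
    return dash_length_m / dash_length_px

def accumulate_ego_pose(pose, yaw_delta, forward_px, metres_per_px):
    """Advance the camera pose (x, y, heading) by one frame-to-frame motion.

    forward_px is the feature displacement measured in the bird's-eye view;
    multiplying by metres_per_px converts it to an absolute distance.
    """
    x, y, heading = pose
    heading += yaw_delta
    step_m = forward_px * metres_per_px
    return (x + step_m * math.cos(heading),
            y + step_m * math.sin(heading),
            heading)

def car_to_world(pose, offset_car):
    """Rotate a car-frame offset (forward, left) by the heading, then add the camera position."""
    x, y, heading = pose
    forward, left = offset_car
    world_x = x + forward * math.cos(heading) - left * math.sin(heading)
    world_y = y + forward * math.sin(heading) + left * math.cos(heading)
    return (world_x, world_y)

# Hypothetical numbers: a dash assumed to be 4 m long spans 80 pixels.
metres_per_px = scale_ratio(4.0, 80.0)
pose = accumulate_ego_pose((0.0, 0.0, 0.0), 0.0, 20.0, metres_per_px)
# Another vehicle observed 10 m ahead and 2 m to the left of the camera.
vehicle_world = car_to_world(pose, (10.0, 2.0))
```

Repeating `accumulate_ego_pose` and `car_to_world` for every frame would give one world-coordinate point per frame for each tracked vehicle, which is the trajectory in this simplified setting.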

Abstract
Acknowledgment
Table of contents
List of Figures
1 Introduction
1.1 Vehicle Trajectories Estimation
1.2 Motivation
1.3 Summary of the Proposed Approach
1.4 Contributions
1.5 Thesis Outline
2 Related Work
2.1 Traditional Scene Geometry
2.2 View Synthesis
2.3 Learning-Based Motion Estimation
2.4 Object Detection
2.4.1 One-Stage Detector
2.4.2 Two-Stage Detector
2.5 Visual Odometry Benchmark [1]
3 The Proposed Approach for 3D Ego-Motion Estimation
3.1 Overall Methodology
3.2 Problem Description
3.3 Region-of-Interest Selection
3.4 Structure from Motion
3.4.1 Monocular Visual Odometry
3.4.2 Feature Extraction
3.4.3 Feature Matching
3.4.4 Essential Matrix Estimation
3.4.5 Computing R, t from the Essential Matrix
3.4.6 Ego-Motion Estimation in World Coordinate
3.5 Ego-Motion Estimation Based on Deep Learning
3.5.1 Overview
3.5.2 Supervision by View Synthesis
3.5.3 Differentiable Depth Image-Based Rendering Module
3.5.4 Mask for the Model Limitation
3.5.5 Depth Smoothness
3.5.6 Loss Function
3.5.7 Network Architecture
4 The Proposed 3D Trajectory Generation
4.1 Absolute Scale Estimation for Monocular Visual Odometry
4.1.1 Feature Point Pairs Selection
4.1.2 Inverse Perspective Transformation
4.1.3 Distance Estimation by Dotted Line Heuristic
4.1.4 Scale Ratio Calculation
4.2 Coordinate Transformation
4.2.1 Coordinate Rotation
4.2.2 Car Coordinate to World Coordinate Transformation
5 Experimental Results
5.1 Datasets
5.1.1 KITTI Visual Odometry Dataset [1]
5.1.2 YouTube Accident Dataset
5.2 Experimental Results on Monocular Visual Odometry
5.3 Experimental Results on Absolute Scale Estimation
5.4 Experimental Results on Car Accident Trajectory Generation
5.5 Summary
6 Conclusion
References

