Student: Po-Hsun Chen (陳柏勳)
Thesis title: Video Classification on Edge Devices
Advisors: Wen-Hsien Fang (方文賢), Yie-Tarng Chen
Oral defense committee: Wen-Hsien Fang (方文賢), Yie-Tarng Chen, Kuen-Tsair Lay, Chien-Ching Chiu, Shanq-Jang Ruan
Degree: Master's
Department: Department of Electronic and Computer Engineering, College of Electrical Engineering and Computer Science
Year of publication: 2022
Academic year of graduation: 110 (ROC calendar)
Language: English
Number of pages: 57
Keywords: Video classification, Weakly supervised learning, Model optimization, Edge device

Video classification, with applications such as driver behavior monitoring, human action recognition, and fall detection, is an important topic in computer vision. However, due to a variety of factors such as camera angle, lighting, and weather, video classification remains a difficult task. In addition, implementation on edge devices is another challenge because of the massive computation required. In this thesis, we aim to implement a video classification network on the widely used edge device Jetson Nano that achieves high accuracy with low latency. Toward this end, we consider a weakly supervised video classification network. First, we choose MobileNet-V2 as our backbone to obtain a good trade-off between computational cost and accuracy. Since MobileNet-V2 is a 2D convolutional neural network (CNN), it cannot extract temporal relationships. Therefore, we add the temporal shift module (TSM) to our network. TSM shifts some of the channels along the temporal dimension so that neighboring frames can exchange information with each other. To further enhance the accuracy, we also add the non-local operation to the network, which captures long-range dependencies directly by computing the interactions between positions. Afterwards, we consider implementing the resulting network on Jetson Nano. For this, we adopt the tensor virtual machine (TVM). We first use its auto-tuning module (AutoTVM or AutoScheduler) with remote procedure call (RPC) to tune the model and search for the best-performing schedule. Finally, we import the obtained schedule into TVM to optimize the model. Simulations on a variety of datasets demonstrate the efficacy of this method on Jetson Nano.
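To make the two temporal-modeling operations in the abstract concrete, here is a minimal pure-Python sketch; it is an illustration, not the thesis's implementation. The function names `temporal_shift` and `non_local` are hypothetical, spatial dimensions are omitted, and `fold_div=8` (shifting 1/8 of the channels toward earlier frames and 1/8 toward later frames) is assumed from the default of the original TSM code rather than taken from this thesis. The non-local sketch uses the Gaussian pairwise function with identity embeddings, dropping the learned projections of the full block.

```python
import math


def temporal_shift(frames, fold_div=8):
    """Bidirectional temporal shift (TSM-style) on a T x C feature list.

    frames: list of T frames, each a list of C channel values
    (spatial dimensions omitted). fold_div=8 is an assumed default.
    """
    T, C = len(frames), len(frames[0])
    fold = C // fold_div
    out = [[0.0] * C for _ in range(T)]
    for t in range(T):
        for c in range(C):
            if c < fold:
                # First fold: frame t takes channel c from frame t+1 (zero-padded at the end).
                out[t][c] = frames[t + 1][c] if t + 1 < T else 0.0
            elif c < 2 * fold:
                # Second fold: frame t takes channel c from frame t-1 (zero-padded at the start).
                out[t][c] = frames[t - 1][c] if t > 0 else 0.0
            else:
                # Remaining channels stay in place.
                out[t][c] = frames[t][c]
    return out


def non_local(positions):
    """Simplified non-local operation: Gaussian pairwise function, identity embeddings.

    positions: list of N feature vectors (e.g., all space-time locations flattened).
    Returns z_i = x_i + sum_j softmax_j(x_i . x_j) * x_j.
    """
    out = []
    for xi in positions:
        # Pairwise similarity between position i and every position j.
        scores = [sum(a * b for a, b in zip(xi, xj)) for xj in positions]
        m = max(scores)  # subtract the max for numerical stability
        weights = [math.exp(s - m) for s in scores]
        total = sum(weights)
        # Normalized weighted sum of all positions.
        y = [sum(w * xj[k] for w, xj in zip(weights, positions)) / total
             for k in range(len(xi))]
        out.append([a + b for a, b in zip(xi, y)])  # residual connection
    return out
```

For example, with four frames of eight channels each, `temporal_shift` gives each frame channel 0 from the next frame and channel 1 from the previous frame while leaving channels 2 to 7 untouched, so every frame mixes in information from its neighbors without adding parameters or multiply-adds. In a real network both operations act on batched N×T×C×H×W tensors inside the backbone.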

Abstract (Chinese)
Abstract
Acknowledgment
Table of Contents
List of Figures
List of Tables
List of Acronyms
1 Introduction
2 Related Work
2.1 Video Classification
2.2 Weakly Supervised Learning
2.3 Temporal Modeling
2.4 Edge Computation
2.5 Model Optimization
2.6 Summary
3 Proposed Method
3.1 Proposed Architecture
3.2 TSM Architecture
3.2.1 Temporal Shift Module
3.2.2 Non-Local Operation
3.2.3 Segmental Consensus
3.2.4 Loss Function
3.3 Implementation on Edge Devices
3.3.1 Tensor Virtual Machine
3.3.2 Auto-tuning Module
3.3.3 Remote Procedure Call
3.4 Summary
4 Experimental Results and Discussions
4.1 Datasets
4.1.1 Driver Monitoring Dataset
4.1.2 UP Fall Dataset
4.1.3 Our Driver Monitoring Dataset
4.1.4 Our Fall Dataset
4.2 Experimental Setup
4.2.1 Model Parameters
4.2.2 Data Augmentation
4.2.3 Evaluation Metrics
4.3 Experimental Results
4.3.1 DMD Dataset Results
4.3.2 UP Fall Dataset Results
4.3.3 Our Driver Monitoring Dataset
4.3.4 Our Fall Dataset
4.3.5 Implementation on Jetson Nano
4.4 Failure Cases and Error Analysis
4.4.1 Imbalanced Numbers of Each Class
4.4.2 Detailed Action of Fall Event
4.5 Summary
5 Conclusion and Future Works
5.1 Conclusion
5.2 Future Works
References

