
Author: 呂小龍
Herleeyandi Markoni
Thesis title: 混和式卷積類神經網路及長短期記憶模型駕駛瞌睡偵測
Driver Drowsiness Detection Using Hybrid Convolutional Neural Network and Long Short-Term Memory
Advisor: 郭景明
Jing-Ming Guo
Oral examination committee: 郭景明
Jing-Ming Guo
王乃堅
Nai-Jian Wang
賴坤財
Kuen-Tsair Lay
王靖維
Ching-Wei Wang
Degree: Master
Department: College of Electrical Engineering and Computer Science - Department of Electrical Engineering
Year of publication: 2018
Academic year of graduation: 106 (ROC calendar)
Language: English
Number of pages: 104
Chinese keywords: 疲勞偵測, 臉部偵測, 卷積神經網路, 長短期記憶, 時間濃縮長短期記憶
English keywords: drowsiness detection, face detection, convolutional neural networks, long short-term memory, time skip combination long short-term memory
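  • Drowsy driving is one of the major causes of traffic accidents, and the number of deaths it causes increases every year. To mitigate this problem, this study proposes a drowsy-driving detection system.

    The main challenge is the variation of the human face, and the system's accuracy is constrained by processing time and the real-time requirement. Conventional image processing and machine vision algorithms already handle facial variation well, but factors such as facial expression, lighting, intra-class variation, and pose remain key problems that conventional algorithms have not solved. Deep learning is therefore an alternative solution, offering better performance by learning features automatically. Motivated by this, the thesis proposes a new system architecture that combines a Convolutional Neural Network (CNN) with Long Short-Term Memory (LSTM) to address driver drowsiness. The system was tested on the public driving dataset from the ACCV 2016 competition and outperforms the results of previously proposed techniques.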


    Driver drowsiness and fatigue are among the significant causes of traffic accidents, and the number of resulting deaths increases every year. To reduce the impact of this problem, a driver drowsiness detection system is proposed and examined in this study.
    The challenges of this problem are the variation of the human face and the accuracy of the system relative to the time it needs for analysis under the real-time requirement. The first challenge, facial variation, has been handled well by conventional image processing and hand-crafted computer vision features. Yet variations such as facial expression, lighting condition, intra-class variation, and pose variation are additional critical issues that conventional methods fail to address. Deep learning is an alternative solution that provides better performance by learning features automatically. Thus, this thesis proposes a new approach to real-time driver drowsiness detection using a hybrid of a Convolutional Neural Network (CNN) and Long Short-Term Memory (LSTM). The system was tested on the public drowsy driver dataset from the ACCV 2016 competition. The results show that it outperforms former schemes in the literature.

    Master’s Thesis Recommendation Form
    Qualification Form by Master’s Degree Examination Committee
    Acknowledgement
    摘要 (Chinese Abstract)
    Abstract
    Table of Contents
    List of Figures
    List of Tables
    Chapter 1 Introduction
      1.1 Introduction
      1.2 Motivation
      1.3 Objective
      1.4 Main Contribution
      1.5 Thesis Organization
    Chapter 2 Literature Review and Basic Theory
      2.1 Driver Drowsiness Problem
        2.1.1 Related Works
      2.2 Convolutional Neural Network
        2.2.1 Convolution Layer
        2.2.2 Pooling Layer
        2.2.3 Activation Function
        2.2.4 Fully Connected Layer
        2.2.5 Batch Normalization Layer
        2.2.6 Dropout Layer
      2.3 Optimizer and Loss Function
        2.3.1 Batch Gradient Descent
        2.3.2 Stochastic Gradient Descent
        2.3.3 Softmax
        2.3.4 Cross Entropy Loss Function
      2.4 Designing CNN Architecture
      2.5 VGG 11 Architecture
      2.6 Face Detection
      2.7 Long Short-Term Memory (LSTM)
    Chapter 3 System Design and Algorithm Implementation
      3.1 Proposed Method
      3.2 ACCV Drowsy Driver Dataset
        3.2.1 Acquisition and details
        3.2.2 Data statistic
        3.2.3 Data Augmentation
        3.2.4 Handling Poor Vision
      3.3 Eyes and Mouth Architecture Design
        3.3.1 Stage Pruning
        3.3.2 Add Layers
        3.3.3 Replacing Filters
      3.4 Face Feature Extraction
      3.5 Temporal Feature
      3.6 Time Skip Combination Long Short-Term Memory (TSC-LSTM)
    Chapter 4 Experiment Setup
      4.1 Hardware and Software
      4.2 Training CNN for Eyes Feature
      4.3 Training CNN for Mouth Feature
      4.4 Training Time Skip Combination LSTM
      4.5 Training Refinement LSTM
    Chapter 5 Experimental Results
      5.1 Best CNN Experiment
        5.1.1 Eyes CNN Depth Effect
        5.1.2 Eyes CNN Adding Layer
        5.1.3 Eyes CNN Change Filter Size
        5.1.4 Mouth CNN Depth Effect
        5.1.5 Mouth CNN Adding Layer
        5.1.6 Mouth CNN Change Filter Size
      5.2 Time Skip Combination Long Short-Term Memory (TSC-LSTM) Experiment
        5.2.1 Number of Hidden Size
        5.2.2 Number of Layer
        5.2.3 Sequences Length
        5.2.4 Number of Time Skip
        5.2.5 Time Skip 2 Step Combination
        5.2.6 Time Skip 3 Step Combination
        5.2.7 Last Fully Connected
        5.2.8 Different Scenario
      5.3 Refinement
        5.3.1 Median Filter Refinement
        5.3.2 LSTM Refinement
        5.3.3 Final Discussion and Comparison
    Chapter 6 Conclusion and Future Work
      6.1 Conclusion
      6.2 Future Works
    References

