Driver Drowsiness Detection Using Hybrid Convolutional Neural Network and Long Short-Term Memory

Graduate Student:	呂小龍 Herleeyandi Markoni
Thesis Title:	混和式卷積類神經網路及長短期記憶模型駕駛瞌睡偵測 Driver Drowsiness Detection Using Hybrid Convolutional Neural Network and Long Short-Term Memory
Advisor:	郭景明 Jing-Ming Guo
Committee Members:	郭景明 Jing-Ming Guo, 王乃堅 Nai-Jian Wang, 賴坤財 Kuen-Tsair Lay, 王靖維 Ching-Wei Wang
Degree:	Master
Department:	College of Electrical Engineering and Computer Science - Department of Electrical Engineering
Year of Publication:	2018
Academic Year of Graduation:	106 (ROC calendar)
Language:	English
Number of Pages:	104
Keywords:	drowsiness detection, face detection, convolutional neural networks, long short-term memory, time skip combination long short-term memory

Driver drowsiness and fatigue are among the significant causes of traffic accidents, and the number of deaths they cause increases every year. To mitigate this problem, a driver drowsiness detection system is proposed and examined in this study.
The main challenges are the variation of the human face and the trade-off between system accuracy and the processing time permitted by the real-time requirement. Facial variation has been handled reasonably well by conventional image processing and hand-crafted computer vision features. Yet variations in facial expression, lighting conditions, and pose, as well as intra-class variation, remain critical issues that conventional methods fail to address. Deep learning is an alternative that provides better performance by learning features automatically. This thesis therefore proposes a new system architecture for real-time driver drowsiness detection that combines a Convolutional Neural Network (CNN) with a Long Short-Term Memory (LSTM) network. The system was tested on the public drowsy driver dataset from the ACCV 2016 competition, and the results show that it outperforms previously proposed schemes in the literature.
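
The abstract describes a hybrid design in which per-frame features feed a recurrent model that classifies a frame sequence. The sketch below illustrates only that general idea in plain Python and is not the thesis's implementation: `frame_features` is a hypothetical stand-in for the CNN, the weights are random rather than trained, and the two output classes (for example alert and drowsy) and all sizes are assumptions chosen for illustration.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def softmax(scores):
    # Numerically stable softmax over a list of scores.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def frame_features(frame):
    # Hypothetical stand-in for the CNN: maps one frame (a flat list of
    # pixel values) to a 2-dimensional feature vector (mean and range).
    mean = sum(frame) / len(frame)
    return [mean, max(frame) - min(frame)]

class LSTMCell:
    # Standard LSTM cell with input (i), forget (f), candidate (g)
    # and output (o) gates; weights are random, not trained.
    def __init__(self, input_size, hidden_size, rng):
        self.hidden_size = hidden_size
        n = input_size + hidden_size
        self.w = {g: [[rng.uniform(-0.5, 0.5) for _ in range(n)]
                      for _ in range(hidden_size)] for g in "ifgo"}
        self.b = {g: [0.0] * hidden_size for g in "ifgo"}

    def _gate(self, name, xh, act):
        return [act(sum(w * v for w, v in zip(row, xh)) + b)
                for row, b in zip(self.w[name], self.b[name])]

    def step(self, x, h, c):
        xh = x + h  # concatenate the input and the previous hidden state
        i = self._gate("i", xh, sigmoid)
        f = self._gate("f", xh, sigmoid)
        g = self._gate("g", xh, math.tanh)
        o = self._gate("o", xh, sigmoid)
        c_new = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
        h_new = [oj * math.tanh(cj) for oj, cj in zip(o, c_new)]
        return h_new, c_new

def classify_sequence(frames, cell, w_out):
    # Run the LSTM over per-frame features, then map the final hidden
    # state to class probabilities with a linear layer and softmax.
    h = [0.0] * cell.hidden_size
    c = [0.0] * cell.hidden_size
    for frame in frames:
        h, c = cell.step(frame_features(frame), h, c)
    scores = [sum(w * v for w, v in zip(row, h)) for row in w_out]
    return softmax(scores)

rng = random.Random(0)
cell = LSTMCell(input_size=2, hidden_size=4, rng=rng)
w_out = [[rng.uniform(-0.5, 0.5) for _ in range(4)] for _ in range(2)]
frames = [[rng.random() for _ in range(16)] for _ in range(5)]  # 5 toy frames
probs = classify_sequence(frames, cell, w_out)
print(probs)
```

In a real system the feature extractor would be a trained CNN applied to face, eye, or mouth crops, and the recurrent weights would be learned from labeled video; the sketch only shows how the two stages connect.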

Master’s Thesis Recommendation Form
Qualification Form by Master’s Degree Examination Committee
Acknowledgement
Abstract (Chinese)
Abstract
Table of Contents
List of Figures
List of Tables
Chapter 1    Introduction
1    Introduction
2    Motivation
3    Objective
4    Main Contribution
5    Thesis Organization
Chapter 2    Literature Review and Basic Theory
1    Driver Drowsiness Problem
1.1    Related Works
2    Convolutional Neural Network
2.1    Convolution Layer
2.2    Pooling Layer
2.3    Activation Function
2.4    Fully Connected Layer
2.5    Batch Normalization Layer
2.6    Dropout Layer
3    Optimizer and Loss Function
3.1    Batch Gradient Descent
3.2    Stochastic Gradient Descent
3.3    Softmax
3.4    Cross Entropy Loss Function
4    Designing CNN Architecture
5    VGG 11 Architecture
6    Face Detection
7    Long Short-Term Memory (LSTM)
Chapter 3    System Design and Algorithm Implementation
1    Proposed Method
2    ACCV Drowsy Driver Dataset
2.1    Acquisition and details
2.2    Data statistic
2.3    Data Augmentation
2.4    Handling Poor Vision
3    Eyes and Mouth Architecture Design
3.1    Stage Pruning
3.2    Add Layers
3.3    Replacing Filters
4    Face Feature Extraction
5    Temporal Feature
6    Time Skip Combination Long Short-Term Memory (TSC-LSTM)
Chapter 4    Experiment Setup
1    Hardware and Software
2    Training CNN for Eyes Feature
3    Training CNN for Mouth Feature
4    Training Time Skip Combination LSTM
5    Training Refinement LSTM
Chapter 5    Experimental Results
1    Best CNN Experiment
1.1    Eyes CNN Depth Effect
1.2    Eyes CNN Adding Layer
1.3    Eyes CNN Change Filter Size
1.4    Mouth CNN Depth Effect
1.5    Mouth CNN Adding Layer
1.6    Mouth CNN Change Filter Size
2    Time Skip Combination Long Short-Term Memory (TSC-LSTM) Experiment
2.1    Number of Hidden Size
2.2    Number of Layer
2.3    Sequences Length
2.4    Number of Time Skip
2.5    Time Skip 2 Step Combination
2.6    Time Skip 3 Step Combination
2.7    Last Fully Connected
2.8    Different Scenario
3    Refinement
3.1    Median Filter Refinement
3.2    LSTM Refinement
3.3    Final Discussion and Comparison
Chapter 6    Conclusion and Future Work
1    Conclusion
2    Future Works
References


                                
