
Student: Chao-Hsun Chen (陳昭勳)
Thesis Title: Human Fall Detection Based on Video Analysis and LRCN Model (基於視訊分析及LRCN模型之人體跌倒偵測)
Advisor: Chen-Hsiung Yang (楊振雄)
Committee Members: Chang-Shi Wu (吳常熙), Hung-Fei Kuo (郭鴻飛), Yong-Lin Kuo (郭永麟), Chen-Hsiung Yang (楊振雄)
Degree: Master
Department: Graduate Institute of Automation and Control, College of Engineering
Year of Publication: 2019
Graduation Academic Year: 107
Language: English
Number of Pages: 72
Keywords: 2D image processing, Optical flow, Deep Learning, LRCN Model, Fall Detection


In recent years, medical advances have steadily lengthened the human lifespan, and together with the declining domestic birth rate this has produced an aging population. Long-term-care issues frequently make the news, and falls are a common source of injury for the elderly. We therefore apply computer vision and deep learning techniques to monitor the activity of elderly or mobility-impaired people, addressing the manpower shortage in nursing centers.
Whereas previous studies often relied on manually labeled parameters to define a fall, this thesis combines 2D image-processing techniques with deep learning algorithms to propose a system that automatically identifies and detects falls of the elderly.
The system architecture can be divided into three parts. First, dense optical flow is used to obtain the motion information of moving objects in the video. Next, a sliding-window technique divides each long video into many short clips of 10 frames each.
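As a rough illustration of these first two stages, the sketch below (a minimal example of ours, not the thesis's code) computes a dense flow field between consecutive grayscale frames with OpenCV, encodes it as an HSV image in which hue represents flow direction and value represents flow magnitude, and cuts a frame sequence into overlapping 10-frame clips. Farneback flow is used here only because it ships with core OpenCV; the TV-L1 flow the thesis favors comes from the opencv-contrib `cv2.optflow` module.

```python
import cv2
import numpy as np

def flow_to_hsv_image(prev_gray, next_gray):
    """Encode the dense optical flow between two gray frames as a color image:
    hue = flow direction, value = flow magnitude."""
    # Farneback dense flow (core OpenCV); the thesis's TV-L1 variant would come
    # from opencv-contrib, e.g. cv2.optflow.DualTVL1OpticalFlow_create().
    flow = cv2.calcOpticalFlowFarneback(prev_gray, next_gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    mag, ang = cv2.cartToPolar(flow[..., 0], flow[..., 1])
    hsv = np.zeros((*prev_gray.shape, 3), dtype=np.uint8)
    hsv[..., 0] = ang * 180 / np.pi / 2                              # direction -> hue
    hsv[..., 1] = 255                                                # full saturation
    hsv[..., 2] = cv2.normalize(mag, None, 0, 255, cv2.NORM_MINMAX)  # magnitude -> value
    return cv2.cvtColor(hsv, cv2.COLOR_HSV2BGR)

def sliding_windows(frames, size=10, stride=1):
    """Split a long frame sequence into overlapping clips of `size` frames."""
    return [frames[i:i + size] for i in range(0, len(frames) - size + 1, stride)]
```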
Finally, an LRCN (CNN+RNN) hybrid deep-learning model (sketched below) analyzes the optical-flow images and classifies whether a fall event has occurred: the CNN feature extractor uses the VGG16 architecture, while the RNN temporal classifier uses LSTM cells as its hidden layer.

Lastly, we tune the hyperparameters to obtain the best model and use five-fold cross-validation to check whether the model overfits. The experimental results show that fall-event recognition achieves a sensitivity of 99.76693% and an accuracy of 97.67701%.
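A minimal Keras sketch of such an LRCN classifier follows; the layer sizes, dropout rate, and the use of VGG16's 4096-D fc2 features are illustrative assumptions rather than the thesis's tuned hyperparameters.

```python
from tensorflow.keras.applications import VGG16
from tensorflow.keras.layers import LSTM, Dense, Dropout
from tensorflow.keras.models import Model, Sequential

# Spatial stage: a pretrained VGG16 reduces each 224x224 flow image to a
# 4096-D feature vector (output of the fc2 layer).
vgg = VGG16(weights="imagenet", include_top=True)
extractor = Model(vgg.input, vgg.get_layer("fc2").output)

# Temporal stage: an LSTM reads the 10 per-frame feature vectors of a clip
# and a sigmoid unit outputs the fall probability.
SEQ_LEN, FEAT_DIM = 10, 4096        # clip length and feature size (assumed)
classifier = Sequential([
    LSTM(256, input_shape=(SEQ_LEN, FEAT_DIM)),
    Dropout(0.5),                   # guards against overfitting
    Dense(1, activation="sigmoid"),
])
classifier.compile(optimizer="adam", loss="binary_crossentropy",
                   metrics=["accuracy"])
```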

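For reference, the two reported metrics reduce to simple confusion-matrix ratios; a sketch is given below, assuming the binary label convention 1 = fall. The five-fold split itself can be produced with, for example, scikit-learn's `KFold`.

```python
import numpy as np

def sensitivity_and_accuracy(y_true, y_pred):
    """Sensitivity = TP / (TP + FN); Accuracy = (TP + TN) / total."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = np.sum((y_true == 1) & (y_pred == 1))   # falls correctly detected
    tn = np.sum((y_true == 0) & (y_pred == 0))   # non-falls correctly rejected
    fn = np.sum((y_true == 1) & (y_pred == 0))   # falls that were missed
    return tp / (tp + fn), (tp + tn) / len(y_true)
```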
    Chinese Abstract i
    Abstract ii
    Acknowledgements iii
    CONTENTS iv
    List of Figures vi
    List of Tables ix
    Chapter 1 Introduction 1
      1.1 Introduction 1
      1.2 Literature Review 3
        1.2.1 Two-dimension Human Feature Extraction 3
        1.2.2 Motion Detection 4
        1.2.3 Deep Learning Method 5
      1.3 Motivation and Purpose 6
      1.4 Outline 7
    Chapter 2 Optical Flow and Image Pre-processing 9
      2.1 Optical Flow 9
        2.1.1 Basic Theory 10
        2.1.2 OpenCV TV-L1 11
      2.2 Generate the Flow Images 13
        2.2.1 RGB to Gray-Scale 14
        2.2.2 Transfer the Flow Data 14
        2.2.3 Store the Information into HSV Images 17
      2.3 Sliding Windows 22
      2.4 Feature Data Storage 23
    Chapter 3 Fall Detection System Based on Deep Learning 25
      3.1 CNN Feature Extraction 25
        3.1.1 Convolutional Neural Network 25
        3.1.2 VGG16 Model 28
        3.1.3 Feature Extraction 31
      3.2 Recurrent Neural Network (LSTM Cell) 31
      3.3 Network Weights Optimization 37
      3.4 Deep Learning Model 40
        3.4.1 LRCN Model 40
        3.4.2 Modify the Model 41
    Chapter 4 Experimental Results and Discussion 46
      4.1 Experimental Environment 46
      4.2 Le2i Dataset 48
      4.3 Compare TV-L1 and Farneback Flow Images 50
      4.4 LRCN Model Results 53
        4.4.1 Evaluation Methodology 53
        4.4.2 Hyperparameter Tuning 54
        4.4.3 Early Stopping 55
        4.4.4 K-Fold Cross-Validation Analysis 56
        4.4.5 Compare the Accuracy 64
      4.5 Online Test 65
    Chapter 5 Conclusions and Future Work 71
      5.1 Conclusions 71
      5.2 Future Work 72
    References 73


    Full-text release date: 2024/07/20 (campus network)
    Full text not authorized for release (off-campus network)
    Full text not authorized for release (National Central Library: Taiwan NDLTD system)