
Student: Chao-Hsun Chen (陳昭勳)
Thesis Title: Human Fall Detection Based on Video Analysis and LRCN Model (基於視訊分析及LRCN模型之人體跌倒偵測)
Advisor: Chen-Hsiung Yang (楊振雄)
Committee Members: Chang-Shi Wu (吳常熙), Hung-Fei Kuo (郭鴻飛), Yong-Lin Kuo (郭永麟), Chen-Hsiung Yang (楊振雄)
Degree: Master
Department: Graduate Institute of Automation and Control, College of Engineering
Year of Publication: 2019
Graduation Academic Year: 107
Language: English
Number of Pages: 72
Keywords: 2D image processing, Optical flow, Deep Learning, LRCN Model, Fall Detection


In recent years, medical advances have steadily lengthened the human lifespan, and together with the declining domestic birth rate this has produced an aging population. Long-term-care issues frequently make the news, and falls are a common source of injury for the elderly. We therefore apply computer vision and deep learning techniques to monitor the activity of elderly or mobility-impaired people, addressing the manpower shortage in nursing centers.
Whereas previous studies often relied on manually labeled parameters to define a fall, this thesis combines 2D image-processing techniques with deep learning algorithms to propose a system that automatically identifies and detects falls of the elderly.
The system architecture can be divided into three parts. First, dense optical flow is used to obtain the motion information of moving objects in the video. Next, a sliding-window technique divides each long video into many short clips of 10 frames each.
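As a rough illustration of these first two stages, the sketch below (a minimal example of ours, not the thesis's code) computes a dense flow field between consecutive grayscale frames with OpenCV, encodes it as an HSV image in which hue represents flow direction and value represents flow magnitude, and cuts a frame sequence into overlapping 10-frame clips. Farneback flow is used here only because it ships with core OpenCV; the TV-L1 flow the thesis favors comes from the opencv-contrib `cv2.optflow` module.

```python
import cv2
import numpy as np

def flow_to_hsv_image(prev_gray, next_gray):
    """Encode the dense optical flow between two gray frames as a color image:
    hue = flow direction, value = flow magnitude."""
    # Farneback dense flow (core OpenCV); the thesis's TV-L1 variant would come
    # from opencv-contrib, e.g. cv2.optflow.DualTVL1OpticalFlow_create().
    flow = cv2.calcOpticalFlowFarneback(prev_gray, next_gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    mag, ang = cv2.cartToPolar(flow[..., 0], flow[..., 1])
    hsv = np.zeros((*prev_gray.shape, 3), dtype=np.uint8)
    hsv[..., 0] = ang * 180 / np.pi / 2                              # direction -> hue
    hsv[..., 1] = 255                                                # full saturation
    hsv[..., 2] = cv2.normalize(mag, None, 0, 255, cv2.NORM_MINMAX)  # magnitude -> value
    return cv2.cvtColor(hsv, cv2.COLOR_HSV2BGR)

def sliding_windows(frames, size=10, stride=1):
    """Split a long frame sequence into overlapping clips of `size` frames."""
    return [frames[i:i + size] for i in range(0, len(frames) - size + 1, stride)]
```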
Finally, an LRCN (CNN+RNN) hybrid deep-learning model (sketched below) analyzes the optical-flow images and classifies whether a fall event has occurred: the CNN feature extractor uses the VGG16 architecture, while the RNN temporal classifier uses LSTM cells as its hidden layer.

Lastly, we tune the hyperparameters to obtain the best model and use five-fold cross-validation to check whether the model overfits. The experimental results show that fall-event recognition achieves a sensitivity of 99.76693% and an accuracy of 97.67701%.
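A minimal Keras sketch of such an LRCN classifier follows; the layer sizes, dropout rate, and the use of VGG16's 4096-D fc2 features are illustrative assumptions rather than the thesis's tuned hyperparameters.

```python
from tensorflow.keras.applications import VGG16
from tensorflow.keras.layers import LSTM, Dense, Dropout
from tensorflow.keras.models import Model, Sequential

# Spatial stage: a pretrained VGG16 reduces each 224x224 flow image to a
# 4096-D feature vector (output of the fc2 layer).
vgg = VGG16(weights="imagenet", include_top=True)
extractor = Model(vgg.input, vgg.get_layer("fc2").output)

# Temporal stage: an LSTM reads the 10 per-frame feature vectors of a clip
# and a sigmoid unit outputs the fall probability.
SEQ_LEN, FEAT_DIM = 10, 4096        # clip length and feature size (assumed)
classifier = Sequential([
    LSTM(256, input_shape=(SEQ_LEN, FEAT_DIM)),
    Dropout(0.5),                   # guards against overfitting
    Dense(1, activation="sigmoid"),
])
classifier.compile(optimizer="adam", loss="binary_crossentropy",
                   metrics=["accuracy"])
```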

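For reference, the two reported metrics reduce to simple confusion-matrix ratios; a sketch is given below, assuming the binary label convention 1 = fall. The five-fold split itself can be produced with, for example, scikit-learn's `KFold`.

```python
import numpy as np

def sensitivity_and_accuracy(y_true, y_pred):
    """Sensitivity = TP / (TP + FN); Accuracy = (TP + TN) / total."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = np.sum((y_true == 1) & (y_pred == 1))   # falls correctly detected
    tn = np.sum((y_true == 0) & (y_pred == 0))   # non-falls correctly rejected
    fn = np.sum((y_true == 1) & (y_pred == 0))   # falls that were missed
    return tp / (tp + fn), (tp + tn) / len(y_true)
```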
    Chinese Abstract i
    Abstract ii
    Acknowledgements iii
    CONTENTS iv
    List of Figures vi
    List of Tables ix
    Chapter 1 Introduction 1
      1.1 Introduction 1
      1.2 Literature Review 3
        1.2.1 Two-dimension Human Feature Extraction 3
        1.2.2 Motion Detection 4
        1.2.3 Deep Learning Method 5
      1.3 Motivation and Purpose 6
      1.4 Outline 7
    Chapter 2 Optical Flow and Image Pre-processing 9
      2.1 Optical Flow 9
        2.1.1 Basic Theory 10
        2.1.2 OpenCV TV-L1 11
      2.2 Generate the Flow Images 13
        2.2.1 RGB to Gray-Scale 14
        2.2.2 Transfer the Flow Data 14
        2.2.3 Store the Information into HSV Images 17
      2.3 Sliding Windows 22
      2.4 Feature Data Storage 23
    Chapter 3 Fall Detection System Based on Deep Learning 25
      3.1 CNN Feature Extraction 25
        3.1.1 Convolutional Neural Network 25
        3.1.2 VGG16 Model 28
        3.1.3 Feature Extraction 31
      3.2 Recurrent Neural Network (LSTM Cell) 31
      3.3 Network Weights Optimization 37
      3.4 Deep Learning Model 40
        3.4.1 LRCN Model 40
        3.4.2 Modify the Model 41
    Chapter 4 Experimental Results and Discussion 46
      4.1 Experimental Environment 46
      4.2 Le2i Dataset 48
      4.3 Compare TV-L1 and Farneback Flow Images 50
      4.4 LRCN Model Results 53
        4.4.1 Evaluation Methodology 53
        4.4.2 Hyperparameter Tuning 54
        4.4.3 Early Stopping 55
        4.4.4 K-Fold Cross-Validation Analysis 56
        4.4.5 Compare the Accuracy 64
      4.5 Online Test 65
    Chapter 5 Conclusions and Future Work 71
      5.1 Conclusions 71
      5.2 Future Work 72
    References 73


    Full-text release date: 2024/07/20 (campus network)
    Full text not authorized for release (off-campus network)
    Full text not authorized for release (National Central Library: Taiwan NDLTD system)