Graduate Student: 陳建祐 Chien-Yu Chen
Thesis Title: Fall Detection with Spatial-Temporal Correlation Encoded by the Sequence-to-Sequence Denoised GAN (基於序列幀輸入的生成對抗網路架構結合降噪機制於影片時空相關性分析之跌倒偵測技術)
Advisor: 郭景明 Jing-Ming Guo
Committee Members: 楊士萱 Shin-Hsuan Yang, 王乃堅 Nai-Jian Wang, 鍾國亮 Kuo-Liang Chung, 夏至賢 Chih-Hsien Hsia
Degree: Master
Department: Department of Electrical Engineering, College of Electrical Engineering and Computer Science
Year of Publication: 2020
Academic Year: 108
Language: Chinese
Pages: 105
Keywords: GAN, fall detection, time-of-flight (ToF) camera, IR-depth images
Falls among the elderly often lead to elevated risks of hospitalization and death, as well as subsequent complications. Enabling an alarm system to report a fall the moment it occurs, so that immediate medical assistance can follow, is therefore a crucial part of elderly long-term care. Most previous vision-based fall detection methods lack practical considerations, such as the camera mounting angle, lighting changes between day and night, and the privacy concerns raised by where the camera is installed. In this study, depth maps and thermal images are used as the detection inputs; they protect the user's privacy and are unaffected by external light sources and day/night changes, enabling long-term continuous monitoring and alerting. Under a deep learning framework, the occasional nature of falls leads to an imbalance between positive and negative samples, so an unsupervised learning architecture is adopted: the model is trained only on ordinary normal behaviors, and fall accidents are defined as anomalous events. This study proposes a generative adversarial network that predicts future frames from sequence-frame inputs, combined with a denoising mechanism (S2SdGAN); it integrates a generator network, a discriminator network, an optical flow network, and a denoising mechanism based on encoded feature comparison. The generator adopts an encoder-decoder architecture that predicts the future frame corresponding to a sequence of normal-behavior frames, extracting the subject's continuous motion features during normal activities, while the discriminator judges whether the generated future frame is correct in appearance. The optical flow network further helps the generator learn the temporal dynamics of consecutive frames. Finally, the encoded feature comparison denoising mechanism reduces the influence of jitter noise in depth maps and determines whether the temporal dynamics of the future frame are normal, thereby achieving fall detection. Compared with state-of-the-art techniques, the proposed architecture achieves (1) multi-person fall detection, (2) real-time judgment of fall accidents, (3) a solution for subjects occluded after falling, and (4) filtering of depth-image noise. A depth-map dataset containing various daily activities and fall accidents was collected to verify the stability and effectiveness of the system. The proposed fall detection framework better matches the circumstances of real falls and effectively achieves long-term continuous detection of fall accidents.
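The intuition behind the encoded feature comparison can be sketched numerically: comparing compact encodings of successive depth frames suppresses zero-mean sensor jitter while still separating genuine motion. In the sketch below, `encode` is a hypothetical stand-in (simple average pooling) for the thesis's learned encoder, so this illustrates only the principle, not the actual network.

```python
import numpy as np

rng = np.random.default_rng(0)

def encode(frame, pool=8):
    # Hypothetical stand-in for the learned encoder: average pooling
    # compresses the frame into a compact feature map.
    h, w = frame.shape
    return frame.reshape(h // pool, pool, w // pool, pool).mean(axis=(1, 3))

def dist(a, b):
    # Mean absolute difference between two frames or feature maps.
    return float(np.abs(a - b).mean())

base = rng.random((64, 64))                          # a "clean" depth frame
jittered = base + rng.normal(0, 0.05, base.shape)    # depth-sensor jitter
moved = np.roll(base, 16, axis=0)                    # a large motion change

# Averaging cancels the zero-mean jitter in feature space, while a real
# motion change still produces a large feature distance.
print(dist(encode(base), encode(jittered)))  # small: jitter is suppressed
print(dist(encode(base), encode(moved)))     # large: real motion survives
```

The design point is that per-pixel differences cannot distinguish sensor jitter from motion reliably, whereas distances in a pooled/encoded space are far less sensitive to the jitter.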
Falling is a major cause of personal injury and accidental death worldwide, especially among the elderly. For aged care, a fall alarm system is in high demand so that medical aid can be obtained immediately when a falling accident happens. Previous studies on fall detection lack practical considerations for real-world situations, including the camera's mounting angle, lighting differences between day and night, and privacy protection for users. In our experiments, IR-depth images and thermal images are used as the input sources for fall detection; as a result, detailed facial information is not captured by the system, and detection is invariant to lighting conditions. Owing to the occasional nature of falling, data imbalance between falling samples and normal samples may occur, which is the major drawback of supervised learning approaches. Therefore, in this study, anomaly detection is performed with an unsupervised learning approach: the models are trained only on normal cases, and falling accidents are defined as anomalous events. In this thesis, the Sequence-to-Sequence Denoised GAN (S2SdGAN) is proposed for fall detection through spatial-temporal correlation analysis. The proposed network comprises a future-frame generator, a frame discriminator, FlowNet, and a denoising scheme based on encoded feature comparison. The designed framework provides (1) multi-subject detection, (2) a real-time fall alarm triggered by motion, (3) a solution for situations in which subjects become unseen after falling, and (4) a denoising scheme for depth images. Experimental results show that the proposed system achieves state-of-the-art performance on public datasets. In addition, a dataset that includes real-world falling accidents and other regular activities was further collected to verify the validity and robustness of the framework, and the results suggest that the proposed system can successfully achieve fall detection in real-world cases.
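The abstracts do not spell out how the prediction error is turned into a decision. A common choice in future-frame-prediction anomaly detection, assumed in the sketch below, is to compute the PSNR between the predicted and observed frame and normalize it over a clip into an anomaly score; `psnr` and `anomaly_score` are illustrative names, not functions from the thesis.

```python
import numpy as np

def psnr(pred, actual, max_val=1.0):
    # Peak signal-to-noise ratio between predicted and observed frames:
    # high PSNR means the frame was predicted well (normal motion).
    mse = float(np.mean((pred - actual) ** 2))
    return 10 * np.log10(max_val ** 2 / max(mse, 1e-12))

def anomaly_score(psnr_values):
    # Min-max normalize PSNR over the clip and invert it, so poorly
    # predicted frames (motion unseen during normal-only training)
    # receive scores near 1.
    p = np.asarray(psnr_values, dtype=float)
    return 1.0 - (p - p.min()) / (p.max() - p.min() + 1e-12)

# Toy clip: four frames are predicted well, frame 3 deviates strongly.
rng = np.random.default_rng(1)
actual = [rng.random((32, 32)) for _ in range(5)]
preds = [a + rng.normal(0, 0.01, a.shape) for a in actual]
preds[3] = rng.random((32, 32))  # prediction fails on the anomalous frame

scores = anomaly_score([psnr(p, a) for p, a in zip(preds, actual)])
print(scores.argmax())  # the anomalous frame gets the highest score
```

In a deployed system, flagging a fall would then amount to thresholding this score (or a smoothed version of it) over consecutive frames; the threshold itself has to be tuned on validation data.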