
Graduate student: 游騰凱
Teng-Kai You
Thesis title: 使用軌跡萃取特徵在第一人稱視角下動作辨識之研究
The Study of Trajectory Aligned Features for Action Recognition in First-Person Videos
Advisor: 陳郁堂
Yie-Tarng Chen
Oral defense committee: 方文賢
Wen-Hsien Fang
吳乾彌
Chen-Mie Wu
陳省隆
Hsing-Lung Chen
林銘波
Ming-Bo Lin
Degree: Master
Department: College of Electrical Engineering and Computer Science - Department of Electronic and Computer Engineering
Year of publication: 2017
Academic year of graduation: 105 (ROC calendar)
Language: English
Number of pages: 35
Chinese keywords: 動作辨識, 第一人稱視角, 動量分解, 池化排序, 軌跡描述與代表
English keywords: action recognition, first-person video, motion decomposition, rank pooling, trajectory representation

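As technology advances and new products keep appearing, wearable cameras such as GoPro, Google Glass, and HTC Re are becoming increasingly common; they are small and record high-resolution video. With so many kinds of cameras in use, tens of thousands of videos are posted on the Internet for people to browse. When the camera is not fixed in a stable position but instead records while moving with the wearer, conventional action recognition may fail to identify the current behavior accurately. Our goal is to classify actions accurately from this first-person viewpoint as well. First, we stabilize the shaky video to reduce global motion. Next, we find the trajectories in the video that contain motion and extract HOG, HOF, and MBH features along them. We then build representations of these features along the spatial and temporal axes (Fisher vector and VideoDarwin, respectively), and finally use an SVM as the classifier to determine the action category.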


With the popularity of portable devices, the number of egocentric videos is increasing. Unstructured camera movement caused by the wearer's natural head motion produces sharp changes in the visual field of the egocentric camera, so many traditional action recognition methods perform poorly in this scenario. This paper proposes a new approach to action recognition in first-person videos that decomposes the dense trajectories and finds a better representation of spatial and temporal information using Fisher vectors and a ranking machine. Simulation results show the robustness of the proposed method, which outperforms the other handcrafted-feature approaches.
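
The temporal encoding step (VideoDarwin, i.e. rank pooling) can be illustrated with a minimal sketch. This is not the thesis implementation: it assumes per-frame feature vectors are already available, and it swaps the ranking machine for a closed-form ridge regression onto the frame index. The names `time_varying_mean` and `rank_pool` and the `ridge` parameter are hypothetical.

```python
import numpy as np


def time_varying_mean(frames):
    """Return v_t = mean of frames 1..t for a (T, D) feature sequence."""
    frames = np.asarray(frames, dtype=float)
    counts = np.arange(1, frames.shape[0] + 1)[:, None]
    return np.cumsum(frames, axis=0) / counts


def rank_pool(frames, ridge=1e-3):
    """Summarize a (T, D) sequence as one D-vector w.

    Assumption: instead of a ranking machine, fit w by ridge regression
    so that the score v_t . w approximates the frame index t. The learned
    w then encodes how the features evolve over time.
    """
    v = time_varying_mean(frames)
    num_frames, dim = v.shape
    t = np.arange(1, num_frames + 1, dtype=float)
    # Closed-form ridge solution: (V^T V + ridge * I) w = V^T t
    return np.linalg.solve(v.T @ v + ridge * np.eye(dim), v.T @ t)
```

In a pipeline like the one described above, one such vector `w` per video would serve as the input to the final classifier (for example, an SVM).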

Chinese abstract
Abstract
Acknowledgment
Table of contents
List of tables
List of figures
1. Introduction
2. Related work
3. Approach
3.1 Trajectory extraction by optical flow
3.2 Motion decomposition
3.3 Trajectory-aligned descriptors
3.4 Spatial encoding by Fisher Vector
3.5 Temporal encoding by Video Evolution
4. Experimental results
4.1 Dataset and parameter settings
4.2 Parameters setting and procedure Result Comparison
4.3 Comparison With Improved Dense Trajectory
4.4 Comparison With Baseline Approaches
4.5 Fail Cases Discussion
5. Conclusion
References

