
Graduate Student: Zi-Wen Gui (桂茲文)
Thesis Title: Bag-of-Words for Time Series Classification via Sparse Representation (Chinese title: 基於詞袋模型與稀疏表示之時間序列資料分類)
Advisor: Yuh-Jye Lee (李育杰)
Oral Defense Committee: Yi-Ren Yeh (葉倚任), Hsing-Kuo Kenneth Pao, Tien-Ruey Hsiang
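Degree: Master's
Department: College of Electrical Engineering and Computer Science, Department of Computer Science and Information Engineering
Year of Publication: 2015
Academic Year of Graduation: 103 (ROC calendar, i.e., 2014–2015)
Language: English
Number of Pages: 45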
Keywords: time series analysis, data mining, machine learning, sparse coding (sparse representation), feature learning

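Time series classification has become an increasingly important topic in machine learning and data mining, and the most important step in analyzing time series data is deciding how to extract features and re-represent this type of data. In general, each timestamp can be treated as a feature; however, this approach suffers from sequence shifting, distortion, and high dimensionality. Given how these problems affect time series data, this thesis proposes a general feature extraction framework. The Bag-of-Words concept, which originated in document (text) mining, has also achieved good results in computer vision, so we apply it to represent time series data. Local features are extracted from the raw time series to learn a dictionary, and this dictionary, which describes local features, is used to re-represent the time series data. In our experiments, we evaluate our time series feature representation on the datasets provided by the UCR time series collection and compare it with other known methods. Finally, we apply the method to a door-opening experiment: an accelerometer mounted on the door measures how each person opens it, on the expectation that each person opens the door with a different behavioral pattern, and the proposed method identifies the person opening the door well.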

Time series classification has attracted increasing attention in machine learning and data mining. In time series analysis, how the data are represented is critical to classification performance. A common choice is to treat each value of a time series instance as a feature dimension. However, this naive representation may be unsuitable, because time series are often shifted, distorted, and scaled in time. To address these problems, we propose a general feature extraction framework for time series. Since the Bag-of-Words concept from text mining has shown promising performance in computer vision, we apply it to represent time series data. Subsequences extracted from the raw series serve as local patterns for learning a codebook, and each time series instance is then encoded with this codebook, which describes the different local patterns in the data. Our experiments on the UCR time series collection show that the proposed method achieves better results than competitive methods. Finally, we apply the method to a door-opening experiment, in which an accelerometer attached to the door records each user's door-opening trajectory. Assuming that each person opens the door with a distinctive trajectory pattern, the method also performs well at identifying users.
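The shift problem mentioned in the abstract can be made concrete with a toy example (hypothetical values, not data from the thesis). If each value is one feature dimension and series are compared with point-wise Euclidean distance, a bump and the same bump delayed by three steps come out farther apart than the bump and a flat series:

```python
import math


def euclidean(x, y):
    """Point-wise Euclidean distance: each timestamp is one feature dimension."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


# Toy values chosen for illustration only (hypothetical).
bump = [0, 0, 1, 2, 1, 0, 0, 0, 0, 0]
shifted = [0, 0, 0, 0, 0, 1, 2, 1, 0, 0]  # the same bump, three steps later
flat = [0] * 10                          # no bump at all

print(euclidean(bump, shifted))  # sqrt(12), about 3.464
print(euclidean(bump, flat))     # sqrt(6), about 2.449
```

Under this representation the delayed copy looks less similar than a series with no bump at all, which is why describing a series by its local patterns, rather than by aligned timestamps, is attractive.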

Table of Contents:
1 Introduction
  1.1 Motivation
  1.2 Organization
2 Background and Related Work
  2.1 Bag-of-Words
  2.2 Sparse Coding
  2.3 Coordinate Descent
  2.4 Word in Time Series
  2.5 Bag-of-Patterns
  2.6 Latent Patterns
3 Framework
  3.1 Extracting Subsequences
  3.2 Codebook Learning
  3.3 Feature Encoding with Codebook
4 Experiments
  4.1 Datasets
    4.1.1 UCR collection
    4.1.2 Door opening/closing trajectory
  4.2 Experimental Results
    4.2.1 Kernel Function
    4.2.2 Window Size
    4.2.3 Codebook Size
    4.2.4 Results of UCR collection
    4.2.5 Results of door opening/closing trajectory
5 Conclusion and Future Work
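The outline lists three framework steps (3.1 Extracting Subsequences, 3.2 Codebook Learning, 3.3 Feature Encoding with Codebook) and names sparse coding and coordinate descent as background. The pure-Python sketch below shows one way to chain these steps; it is not the thesis's implementation. Its specific choices are assumptions: z-normalized sliding windows, a lasso sparse-coding objective solved by cyclic coordinate descent, a batch form of the block-coordinate dictionary update from Mairal et al.'s online dictionary learning, max pooling of absolute codes, the hypothetical function names `extract_subsequences`, `sparse_code`, `learn_codebook`, and `encode`, and the toy sine and bump data.

```python
import math
import random

# Sketch only: window normalization, lasso coding, the atom update, and
# max pooling are illustrative assumptions, not the thesis's exact method.


def extract_subsequences(series, window, step=1):
    """Step 1: sliding-window subsequences, each z-normalized.
    Flat windows (zero standard deviation) carry no shape and become all zeros."""
    subs = []
    for start in range(0, len(series) - window + 1, step):
        seg = series[start:start + window]
        mean = sum(seg) / window
        std = math.sqrt(sum((v - mean) ** 2 for v in seg) / window)
        subs.append([(v - mean) / std if std > 1e-8 else 0.0 for v in seg])
    return subs


def sparse_code(x, D, lam, n_iter=50):
    """Lasso by cyclic coordinate descent:
    minimize 0.5 * ||x - sum_j a[j] * D[j]||^2 + lam * sum_j |a[j]|."""
    a = [0.0] * len(D)
    r = list(x)  # residual x - sum_j a[j] * D[j]
    for _ in range(n_iter):
        for j, d in enumerate(D):
            nsq = sum(v * v for v in d)
            if nsq == 0.0:
                continue  # an all-zero atom can never explain anything
            # Correlation of atom j with the residual that excludes atom j itself.
            rho = sum(dv * rv for dv, rv in zip(d, r)) + a[j] * nsq
            new = math.copysign(max(abs(rho) - lam, 0.0), rho) / nsq  # soft-threshold
            delta = new - a[j]
            if delta != 0.0:
                r = [rv - delta * dv for rv, dv in zip(r, d)]
                a[j] = new
    return a


def learn_codebook(subs, k, lam, n_epochs=5, seed=0):
    """Step 2: alternate sparse coding with a block-coordinate atom update
    (a batch form of the dictionary update in Mairal et al., 2009)."""
    w = len(subs[0])
    init = random.Random(seed).sample([s for s in subs if any(s)], k)
    D = [[v / math.sqrt(sum(u * u for u in s)) for v in s] for s in init]
    for _ in range(n_epochs):
        codes = [sparse_code(x, D, lam) for x in subs]
        # A = sum_i a_i a_i^T (k x k); B[j] = sum_i a_i[j] * x_i (length w).
        A = [[sum(c[i] * c[j] for c in codes) for j in range(k)] for i in range(k)]
        B = [[sum(c[j] * x[t] for c, x in zip(codes, subs)) for t in range(w)]
             for j in range(k)]
        for j in range(k):
            if A[j][j] < 1e-12:
                continue  # atom j was not used this epoch; keep it as is
            DA = [sum(D[i][t] * A[i][j] for i in range(k)) for t in range(w)]
            u = [(B[j][t] - DA[t]) / A[j][j] + D[j][t] for t in range(w)]
            norm = math.sqrt(sum(v * v for v in u))
            D[j] = [v / max(norm, 1.0) for v in u]  # project onto the unit ball
    return D


def encode(series, D, window, lam):
    """Step 3: max-pool absolute sparse codes over all windows."""
    codes = [sparse_code(x, D, lam) for x in extract_subsequences(series, window)]
    return [max(abs(c[j]) for c in codes) for j in range(len(D))]


# Hypothetical toy data: three phase-shifted sine waves.
train = [[math.sin(0.5 * t + p) for t in range(40)] for p in (0.0, 1.0, 2.0)]
subs = [s for seq in train for s in extract_subsequences(seq, 8, step=2)]
D = learn_codebook(subs, k=4, lam=0.1)

bump = [0.0, 1.0, 3.0, 1.0, 0.0, -1.0, -2.0, -1.0]
x1 = [0.0] * 10 + bump + [0.0] * 12
x2 = [0.0] * 13 + bump + [0.0] * 9  # the same bump, three steps later
f1, f2 = encode(x1, D, 8, 0.1), encode(x2, D, 8, 0.1)
print(f1 == f2)  # True
```

Because the encoding takes a maximum over every window, delaying the bump only changes which windows see it, so `f1` and `f2` come out identical in this toy case. In a full pipeline these fixed-length vectors would then feed a classifier; the outline lists experiments on the kernel function, window size, and codebook size, and the values used here (`window=8`, `k=4`, `lam=0.1`) are arbitrary.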

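Full-text release date (campus network): 2020/04/24
Full-text release date (off-campus network): full text not authorized for public release
Full-text release date (National Central Library: National Digital Library of Theses and Dissertations in Taiwan): full text not authorized for public release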