
Graduate student: 林宥樺 (You-Hua Lin)
Thesis title: 時間序列資料中單元樣式之提取與分類 (Time Series Unit Pattern Extraction and Classification)
Advisors: 鮑興國 (Hsing-Kuo Pao), 李育杰 (Yuh-Jye Lee)
Oral examination committee: 邱舉明 (Ge-Ming Chiu), 陳瑞彬 (Ray-Bing Chen), 蘇黎 (Li Su)
Degree: Master
Department: 電資學院 - 資訊工程系 (Department of Computer Science and Information Engineering)
Year of publication: 2017
Academic year of graduation: 105
Language: English
Number of pages: 47
Keywords (Chinese): 時間序列, 單元樣式, 分類
Keywords (English): time series, unit pattern, classification

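Time series classification has been studied for many years. In time series data, it is important to identify which parts of a sequence carry the most useful information and to classify the series on that basis. The main problem is how to extract important features from a long time series, which not only reduces transmission and storage costs but also captures the characteristics of the sequence. We therefore propose a method that efficiently extracts unit patterns from time series. A unit pattern is a subsequence that occurs repeatedly in a time series, and these subsequences share the same shape. Once extracted, the unit patterns can be treated as a set of features and used for time series classification. Broadly, time series classification methods fall into three categories: distance-based, model-based, and feature-based. We focus on the feature-based approach and process the extracted features with the envelope method. Because the envelope expresses the shape of the unit pattern, extracting unit patterns well from the time series data is essential for building it. Our unit pattern extraction method captures the shape of the unit patterns effectively, and interpolation scaling brings the extracted segments to the same length, so this set of aligned, equal-length time series can be used to build an envelope representation. The envelope is the profile of that set of time series, and we can use it to convert the extracted time series into a sparse representation. The converted data are sparse, which is an important property for compressed sensing applications, so the data can be compressed further through compressed sensing to lower storage and computation costs. For time series classification, the key is to find discriminative features that represent the original data. To use envelopes to express the profiles of different time series and thereby classify them, extracting unit patterns as features is an important step. Finally, we show that unit pattern extraction lets the envelope method perform well on IoT datasets collected in practice.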


The problem of time series classification has been studied for decades. In time series data, it is important to determine which parts of a sequence contain the most significant information for classification. The main issue is whether features that represent a time series as a set of characteristic values can be extracted from the data. We propose an efficient procedure that extracts unit patterns from a time series as a set of segmented time sequences. Unit patterns are repeated subsequences that appear frequently in time series data; they have different lengths but share the same shape. This procedure is faster and has a lower computational cost than subsequence matching. After extraction, the unit patterns can be regarded as features of the time series data for further classification. Fundamentally, approaches to time series classification fall into three categories: distance-based, model-based, and feature-based. In this thesis, we focus on the feature-based approach, which represents a time series as a set of characteristic values. We apply a time series representation, the envelope, to process the extraction result. Because the envelope represents the pattern shape, extracting the proper unit patterns from the time series data is a critical task. The proposed unit pattern extraction method captures the pattern shape well. We elastically scale each segmented time series to the same length by interpolation scaling, so that a set of well-synchronized, equal-length time series can be represented as an envelope. The envelope is the profile of this set of time series data. We can use the envelope to obtain a sparse representation of each time series in the segmented data. Moreover, the result of the envelope transformation is sparse, which is an essential property for applying compressed sensing.
To achieve good time series classification performance, we have to build the envelope that best preserves the shape of the unit pattern. Finally, we demonstrate that the unit pattern extraction method enables the envelope to work reasonably well on real-world datasets.
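
As a rough illustration of the rescaling and envelope steps described in the abstract, the following Python sketch may help. The abstract does not give the actual algorithm, so everything here is an assumption made for illustration: linear interpolation stands in for "interpolation scaling", the envelope is taken to be a per-time-step band of mean ± k standard deviations (the table of contents mentions choosing "k times standard deviation"), and the ternary coding (0 inside the band, +1 above, -1 below) is one plausible way to obtain a sparse representation. The function names `rescale`, `build_envelope`, and `encode` and the sample data are hypothetical.

```python
from statistics import mean, pstdev


def rescale(segment, length):
    """Linearly interpolate `segment` onto `length` evenly spaced points."""
    if length < 2 or len(segment) < 2:
        raise ValueError("need at least two points on both sides")
    step = (len(segment) - 1) / (length - 1)
    out = []
    for i in range(length):
        pos = i * step
        lo = min(int(pos), len(segment) - 2)  # index of the left neighbour
        frac = pos - lo
        out.append(segment[lo] * (1 - frac) + segment[lo + 1] * frac)
    return out


def build_envelope(segments, k):
    """Assumed envelope: mean - k*std .. mean + k*std at each time step."""
    lower, upper = [], []
    for column in zip(*segments):  # one column per time step
        m, s = mean(column), pstdev(column)
        lower.append(m - k * s)
        upper.append(m + k * s)
    return lower, upper


def encode(segment, lower, upper):
    """Assumed coding: 0 inside the band, +1 above it, -1 below it."""
    return [1 if x > hi else -1 if x < lo else 0
            for x, lo, hi in zip(segment, lower, upper)]


# Three hypothetical unit patterns: different lengths, similar shape.
patterns = [
    [0.0, 1.0, 2.0, 1.0, 0.0],
    [0.0, 0.5, 1.0, 1.5, 2.0, 1.5, 1.0, 0.5, 0.0],
    [0.0, 3.0, 0.0],
]
aligned = [rescale(p, 5) for p in patterns]        # equal-length segments
lower, upper = build_envelope(aligned, k=1.0)      # narrow band
codes = [encode(a, lower, upper) for a in aligned]

wide_lower, wide_upper = build_envelope(aligned, k=2.0)  # wider band
wide_codes = [encode(a, wide_lower, wide_upper) for a in aligned]
```

In this toy run, the taller third pattern leaves the narrow band (k = 1) at its three middle points, while with k = 2 every point falls inside the band and all codes are zero. A larger k therefore gives a sparser code but less ability to tell patterns apart, which is presumably why the choice of k gets its own section in the thesis.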

1 Introduction
2 Related Work
  2.1 Background
  2.2 Time series motifs discovery
  2.3 Time series classification
3 Compressed Sensing
4 Sparse Envelope Representation for Time Series
5 Unit Pattern Extraction and Rescaling
  5.1 Time series unit pattern extraction
  5.2 Scaling unit pattern to specific length
6 Research Framework
  6.1 Workflow
  6.2 Preprocessing
  6.3 Determine best of k times standard deviation for envelope
7 Experimental Results
  7.1 Dataset
  7.2 Finding unit pattern shape
  7.3 Influence of compression ratio
  7.4 Discussion
8 Conclusion

