簡易檢索 / 詳目顯示

研究生: 陳信融
Xin-Rong Chen
論文名稱: 萃取正規化單元圖案之包絡稀疏表示法應用於時間序列資料
Extracting and scaling the unit pattern for time series sparse representation with an envelope
指導教授: 鲍興國
Hsing-Kuo Pao
李育杰
Yuh-Jye Lee
口試委員: 項天瑞
Tien-Ruey Hsiang
蘇黎
Li Su
學位類別: 碩士
Master
系所名稱: 電資學院 - 資訊工程系
Department of Computer Science and Information Engineering
論文出版年: 2016
畢業學年度: 104
語文別: 英文
論文頁數: 45
中文關鍵詞: 時間序列分類感知壓縮單元圖案
外文關鍵詞: time series, classification, compressed sensing, unit pattern
相關次數: 點閱:204下載:2
分享至:
查詢本校圖書館目錄 查詢臺灣博碩士論文知識加值系統 勘誤回報

時間序列分類問題是個已研究多年的領域。基本上,時間序列分類問題可以簡單地分為三類: 基於距離、基於模型、基於特徵之方法。在此我們所關注的方向是基於特徵的時間序列分類之方法,這類方法從時間序列中萃取出特徵。然而,大多數已存在的時間序列表示法並非都能夠解釋。我們提出了一種新的時間序列表達法 - 包絡法。這種監督式的特徵萃取演算法將一條時間序列轉換成簡單的1/0/-1之型式,而且萃取出的特徵具有稀疏性,這是在感知壓縮應用中非常重要的一個性質。對於大部分的資料探勘、機器學習之課題,最重要的關鍵點是找出具有鑑別度的特徵來表示原始資料。因此,我們提出了一種啟發式的方法來決定最佳的包絡表示範圍。
然而,包絡法必須滿足使用的時間序列具備已經被對齊且等長的特性。有鑒於此,我們使用了簡單有效的方法從時間序列中切割出單元圖案並縮放至等長。首先掃描特定長度的子序列提取出類似的子序列,然後從提取出的類似子序列中中決定代表整個序列的圖案輪廓。提取單元圖案的方法 - 追跡輪廓法,使用得到的圖案輪廓來在其餘時間序列找到最佳的切割範圍。當我們掃描不同長度的序列時,使用內插縮放來讓不等長序列變為等長序列,之後再與圖案輪廓能計算相似度。最後,我們展示了追跡輪廓法適當地讓包絡法能在實際收取的資料集中表現良好。


The time sequence classification problem has been studied over a decade. Basically, time series classification can be categorized into three types, including distance-based, model-based, and feature-based. We focus on feature-based methods, which represent time series into a set of characterized values. However, features generated by most of existing representation techniques are not completely interpretable. A novel time series representation, envelope, is proposed. This supervised feature extraction method transforms time series into simple 1/0/-1 values, and the characteristic of sparsity which is essential property for applying compressed sensing. It is always important to find the most discriminating features for data mining task. Hence, a heuristic is introduced for determining the best area of representation.
However, the envelope have the prerequisite which is the time series must be well-synchronized with same length. Due to this fact, we use the simple but effective methods to cut the unit patterns from time sequence, and scale to equal length. We first scan subsequences of a specific length and extract similar subsequences, then select the pattern profile of the whole sequence from the similar subsequences. The method of cutting unit patterns, tracing profile method, use the pattern profile to find the best cutting range. While we scan the sequence with different length, use the interpolation scaling to let unequal length sequence become equal length time sequence. After that, can calculate the similarity between pattern profile and equal length time sequence. Finally, we demonstrate that tracing profile method and interpolation scaling appropriately to let envelope work well on real world dataset.

1 Introduction 1 2 Related Work 3 2.1 Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3 2.2 Enumeration of time series motifs . . . . . . . . . . . . . . . . . . . . . . . 4 2.3 Time series classification . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6 3 Compressed Sensing 9 4 Sparse Envelope Representation for Time Series 12 5 Extracting Unit Pattern and Scaling to Specific Length 16 5.1 Extracting the unit patterns from time series . . . . . . . . . . . . . . . . . 16 5.2 Scaling unit pattern to specific length . . . . . . . . . . . . . . . . . . . . . 24 6 Research Framework 25 6.1 Workflow . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25 II 6.2 Preprocessing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29 6.3 Decide best envelope of k times standard deviation . . . . . . . . . . . . . 30 7 Experimental Results 33 7.1 Dataset . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33 7.2 Scanning the around different length . . . . . . . . . . . . . . . . . . . . . 36 7.3 Influence of compression ratio . . . . . . . . . . . . . . . . . . . . . . . . . 39 7.4 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39 8 Conclusion 42

[1] Robert Calderbank, Sina Jafarpour, and Robert Schapire. Compressed learning:
Universal sparse dimensionality reduction and learning in the measurement domain.
[2] E.J. Cand es and M.B.Wakin. An introduction to compressive sampling. IEEE Signal
Processing Magazine, 25(2):21{30, 2008.
[3] Hui Ding, Goce Trajcevski, Peter Scheuermann, Xiaoyue Wang, and Eamonn Keogh.
Querying and mining of time series data: Experimental comparison of representations
and distance measures. Proc. VLDB Endow., 1(2):1542{1552, August 2008.
[4] Antonio Fratini, Mario Sansone, Paolo Bifulco, and Mario Cesarelli. Individual identi-
cation via electrocardiogram analysis. BioMedical Engineering OnLine, 14(1):1{23,
2015.
[5] Pierre Geurts. Pattern extraction for time series classi cation. In Luc De Raedt and
Arno Siebes, editors, Principles of Data Mining and Knowledge Discovery, volume
2168 of Lecture Notes in Computer Science, pages 115{127. Springer Berlin Heidel-
berg, 2001.
[6] A. L. Goldberger, L. A. N. Amaral, L. Glass, J. M. Hausdor , P. Ch. Ivanov,
R. G. Mark, J. E. Mietus, G. B. Moody, C.-K. Peng, and H. E. Stanley. Phys-
ioBank, PhysioToolkit, and PhysioNet: Components of a new research resource for
complex physiologic signals. Circulation, 101(23):e215{e220, 2000 (June 13). Cir-
culation Electronic Pages: http://circ.ahajournals.org/cgi/content/full/101/23/e215
PMID:1085218; doi: 10.1161/01.CIR.101.23.e215.
[7] Jessica Lin, Eamonn Keogh, Stefano Lonardi, and Bill Chiu. A symbolic representa-
tion of time series, with implications for streaming algorithms. In Proceedings of the
8th ACM SIGMOD Workshop on Research Issues in Data Mining and Knowledge
Discovery, DMKD '03, pages 2{11, New York, NY, USA, 2003. ACM.
[8] Jessica Lin, Eamonn Keogh, Stefano Lonardi, and Pranav Patel. Finding motifs in
time series. pages 53{68, 2002.
[9] Jason Lines, Luke M. Davis, Jon Hills, and Anthony Bagnall. A shapelet transform
for time series classi cation. In Proceedings of the 18th ACM SIGKDD International
Conference on Knowledge Discovery and Data Mining, KDD '12, pages 289{297, New
York, NY, USA, 2012. ACM.
44
[10] Abdullah Mueen and Nikan Chavoshi. Enumeration of time series motifs of all
lengths. Knowl. Inf. Syst., 45(1):105{132, October 2015.
[11] Chotirat (Ann) Ratanamahatana, Eamonn J. Keogh, Anthony J. Bagnall, and Ste-
fano Lonardi. A novel bit level time series representation with implication of similarity
search and clustering. In Advances in Knowledge Discovery and Data Mining, 9th
Paci c-Asia Conference, PAKDD 2005, Hanoi, Vietnam, May 18-20, 2005, Proceed-
ings, pages 771{777, 2005.
[12] Zhengzheng Xing, Jian Pei, and Eamonn Keogh. A brief survey on sequence classi-
cation. SIGKDD Explor. Newsl., 12(1):40{48, November 2010.
[13] Lexiang Ye and Eamonn Keogh. Time series shapelets: A new primitive for data min-
ing. In Proceedings of the 15th ACM SIGKDD International Conference on Knowl-
edge Discovery and Data Mining, KDD '09, pages 947{956, New York, NY, USA,
2009. ACM.

QR CODE