簡易檢索 / 詳目顯示

研究生: 廖韋茹
Wei-Ju Liao
論文名稱: 應用鄰近差值法於動態資料區間之時間序列搜尋
Adjacent Difference Value (ADV) Method for Dynamic Segmentation in Time Series Data Search
指導教授: 楊朝龍
Chao-Lung Yang
林承哲
Cheng-Jhe Lin
口試委員: 楊朝龍
Chao-Lung Yang
林承哲
Cheng-Jhe Lin
鄭辰仰
Chen-Yang Cheng
學位類別: 碩士
Master
系所名稱: 管理學院 - 工業管理系
Department of Industrial Management
論文出版年: 2018
畢業學年度: 106
語文別: 英文
論文頁數: 73
中文關鍵詞: 時間序列資料分割動態分割異常檢測維度縮減
外文關鍵詞: Time Series Segmentation, Dynamic Segmentation, Anomaly Detection, Dimensionality Reduction
相關次數: 點閱:183下載:1
分享至:
查詢本校圖書館目錄 查詢臺灣博碩士論文知識加值系統 勘誤回報
  • 啟發式時間序列符號聚合近似法(HOT SAX)為一應用於檢測出現在時間序列資料中之異常區間的符號代表法。時間序列資料中的異常為不遵循資料走向趨勢之不尋常的區段,換句話說,時間序列資料中之異常即為一個與所有其他子序列相似程度最低的子序列。然而,時間序列資料通常涵蓋了較長的時間幅度,如何處理大量的時間序列資料為第一個需要被克服的難題。由於HOT SAX法採用維度縮減及使用兩種啟發式演算法對維度縮減後的資料進行異常搜尋,使HOT SAX法能夠有效率地檢測出異常區段。儘管如此,由於HOT SAX法透過設定一個固定長度的區間全面地對整個資料進行異常搜尋,當設定的搜尋區間長度改變時,異常檢測的結果也會隨著改變,且搜尋區間的最佳長度難以被決定。因此,搜尋區間之最佳長度的判斷為HOT SAX法最主要的疑慮。故為了解決HOT SAX法的主要疑慮,本研究提出一個能夠在不設定任何參數的前提之下,對時間序列資料進行動態的切割方法,稱為鄰近差值法(ADV),ADV法利用資料點之間的變動將資料動態地切分成多個不同長度的子序列。此外,為了完成異常的檢測,本研究應用了名為FastDTW之方法來比較不同子序列之間的相似程度。本研究之實驗結果顯示,ADV法可簡單且有效的對時間序列資料進行動態切割,且透過與HOT SAX法的比較證明,ADV法確實能夠應用於異常檢測且具有較高的計算效率。


    Heuristic Ordered Time series Symbolic Aggregate approXimation (HOT SAX) is a well-known symbolic representation approach used to detect the anomalies existing in time series data. Time series anomalies are the unusual patterns which does not follow the tendency of time series data. In other words, time series anomalies are the subsequences which have the less level of similarity with other subsequence. However, time series data usually covers a long interval of time, how to compute the large amount of data is the first difficulty to be overcome. Since HOT SAX reduces the dimensionality of data and then searches anomalies in the reduced data with two heuristic algorithms, HOT SAX can detect the anomalies efficiently. Nevertheless, because HOT SAX searches anomalies through sliding a fixed-length window on the entire data comprehensively, the results of anomalies detected would change when setting a different length of window and the optimal length of the window is hard to determine. Hence, the determination of optimal length of the sliding window is the main concern of HOT SAX. Therefore, to solve the main concern of HOT SAX, Adjacent Mean Difference (ADV) segmentation method, which can segment the time series data dynamically without setting any parameter, was proposed in this research. Essentially, ADV partitions data into multiple subsequences with different lengths based on the transitions between data points. To complete the detection of anomalies, FastDTW was used to compare the level of similarity between every subsequence. The experiments demonstrate that ADV is an easy and efficient method. And the comparison with HOT SAX shows that ADV is really useful and can be used to detect anomalies with better computational efficiency.

    摘要 I ABSTRACT II 誌謝 III TABLE OF CONTNETS IV LIST OF TABLES VI LIST OF FIGURES VII Chapter 1 INTRODUCTION 1 1.1 Research Background 1 1.2 Research Objective 2 1.3 Research Framework 4 Chapter 2 LITERATURE REVIEW 5 2.1 Basic Definitions 5 2.2 Time Series Representation 6 2.3 Similarity Search 9 2.4 Anomaly Detection 14 Chapter 3 METHODOLOGY 21 3.1 Problem Definition 21 3.2 Framework of Methodology 22 Chapter 4 EXPERIMENTAL EVALUATION 44 4.1 Anomaly Detection on Periodic Data 44 4.2 Anomaly Detection on Electrocardiograms 45 Chapter 5 CONCLUSION AND DISCUSSION 48 REFERENCE 53 APPENDIX 58 A. Main Code: Normalization 58 B. Main Code: Difference Value Calculation 58 C. Main Code: Segmentation 59 D. Main Code: Anomaly Detection 63

    [1] K. Zhou, L. Taigang, and Z. Lifeng, "Industry 4.0: Towards future industrial opportunities and challenges," in 2015 12th International Conference on Fuzzy Systems and Knowledge Discovery (FSKD), 2015, pp. 2147-2152.
    [2] G. Shao, S. J. Shin, and S. Jain, "Data analytics using simulation for smart manufacturing," in Proceedings of the Winter Simulation Conference 2014, 2014, pp. 2192-2203.
    [3] C.-F. Tsai and Z.-Y. Quan, "Stock Prediction by Searching for Similarities in Candlestick Charts," ACM Trans. Manage. Inf. Syst., vol. 5, no. 2, pp. 1-21, 2014.
    [4] H. Ziegler, M. Jenny, T. Gruse, and D. A. Keim, "Visual market sector analysis for financial time series data," in 2010 IEEE Symposium on Visual Analytics Science and Technology, 2010, pp. 83-90.
    [5] H. M. Rai, A. Trivedi, and S. Shukla, "ECG signal processing for abnormalities detection using multi-resolution wavelet transform and Artificial Neural Network classifier," Measurement, vol. 46, no. 9, pp. 3238-3246, 2013/11/01/ 2013.
    [6] J. Jun and Y. Peng, "A Study on Improving the Abnormal Signal Detection Ability of Digital Storage Oscilloscope," in 2013 IEEE 11th International Conference on Dependable, Autonomic and Secure Computing, 2013, pp. 244-247.
    [7] D. Pandit, L. Zhang, N. Aslam, C. Liu, and S. Chattopadhyay, "Improved abnormality detection from raw ECG signals using feature enhancement," in 2016 12th International Conference on Natural Computation, Fuzzy Systems and Knowledge Discovery (ICNC-FSKD), 2016, pp. 1402-1406.
    [8] A. Apostolico, M. E. Bock, and S. Lonardi, "Monotony of surprise and large-scale quest for unusual words," presented at the Proceedings of the sixth annual international conference on Computational biology, Washington, DC, USA, 2002.
    [9] G. D. Stormo, "Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids. Richard Durbin , Sean R. Eddy , Anders Krogh , Graeme Mitchison," The Quarterly Review of Biology, vol. 75, no. 3, pp. 313-314, 2000.
    [10] A. Gionis and H. Mannila, "Finding recurrent sources in sequences," presented at the Proceedings of the seventh annual international conference on Research in computational molecular biology, Berlin, Germany, 2003.
    [11] T. L. Bailey, "Discovering Sequence Motifs," in Bioinformatics: Data, Sequence Analysis and Evolution, J. M. Keith, Ed. Totowa, NJ: Humana Press, 2008, pp. 231-251.
    [12] J. Buhler and M. Tompa, "Finding motifs using random projections," presented at the Proceedings of the fifth annual international conference on Computational biology, Montreal, Quebec, Canada, 2001.
    [13] E. Keogh, J. Lin, and A. Fu, "HOT SAX: Efficiently Finding the Most Unusual Time Series Subsequence," presented at the Proceedings of the Fifth IEEE International Conference on Data Mining, 2005.
    [14] J. Lin, E. Keogh, S. Lonardi, and B. Chiu, "A symbolic representation of time series, with implications for streaming algorithms," presented at the Proceedings of the 8th ACM SIGMOD workshop on Research issues in data mining and knowledge discovery, San Diego, California, 2003.
    [15] S. Salvador and P. Chan, "Toward accurate dynamic time warping in linear time and space," Intell. Data Anal., vol. 11, no. 5, pp. 561-580, 2007.
    [16] V. Chandola, A. Banerjee, and V. Kumar, "Anomaly detection: A survey," ACM Comput. Surv., vol. 41, no. 3, pp. 1-58, 2009.
    [17] S. T. Roweis and L. K. Saul, "Nonlinear Dimensionality Reduction by Locally Linear Embedding," Science, vol. 290, no. 5500, pp. 2323-2326, 2000.
    [18] L. Van Der Maaten, E. Postma, and J. Van den Herik, "Dimensionality reduction: a comparative review," J Mach Learn Res, vol. 10, pp. 66-71, 2009.
    [19] H. Ding, G. Trajcevski, P. Scheuermann, X. Wang, and E. Keogh, "Querying and mining of time series data: experimental comparison of representations and distance measures," Proc. VLDB Endow., vol. 1, no. 2, pp. 1542-1552, 2008.
    [20] E. Keogh, K. Chakrabarti, M. Pazzani, and S. Mehrotra, "Dimensionality Reduction for Fast Similarity Search in Large Time Series Databases," Knowledge and Information Systems, journal article vol. 3, no. 3, pp. 263-286, 2001.
    [21] E. Keogh, "Fast similarity search in the presence of longitudinal scaling in time series databases," in Proceedings Ninth IEEE International Conference on Tools with Artificial Intelligence, 1997, pp. 578-584.
    [22] E. Keogh, S. Chu, D. Hart, and M. Pazzani, "An online algorithm for segmenting time series," in Proceedings 2001 IEEE International Conference on Data Mining, 2001, pp. 289-296.
    [23] F. L. Chung, Fu,T.C.,Luk,R.,Ng,V., "Flexible time series pattern matching based on perceptually important points.," in Workshop on Learning from Temporal and Spatial Data in International Joint Conference on Artificial Intelligence (IJCAI'01), Seattle, Washington, USA, 2001, pp. 1-7.
    [24] T.-c. Fu, "A review on time series data mining," Engineering Applications of Artificial Intelligence, vol. 24, no. 1, pp. 164-181, 2011/02/01/ 2011.
    [25] J. Lin, E. Keogh, L. Wei, and S. Lonardi, "Experiencing SAX: a novel symbolic representation of time series," Data Min. Knowl. Discov., vol. 15, no. 2, pp. 107-144, 2007.
    [26] M. Vlachos, M. Hadjieleftheriou, D. Gunopulos, and E. Keogh, "Indexing multi-dimensional time-series with support for multiple distance measures," presented at the Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining, Washington, D.C., 2003.
    [27] T. Rakthanmanon et al., "Searching and mining trillions of time series subsequences under dynamic time warping," presented at the Proceedings of the 18th ACM SIGKDD international conference on Knowledge discovery and data mining, Beijing, China, 2012.
    [28] F. K. P. Chan, A. W. C. Fu, and C. Yu, "Haar wavelets for efficient similarity search of time-series: with and without time warping," IEEE Transactions on Knowledge and Data Engineering, vol. 15, no. 3, pp. 686-705, 2003.
    [29] J. Shen, D. Zhu, W. Huang, and J. Liang, "A novel similarity measure approach for time series based on PLA and DTW," in 2016 35th Chinese Control Conference (CCC), 2016, pp. 7159-7163.
    [30] I. Assent, M. Wichterich, R. Krieger, H. Kremer, and T. Seidl, "Anticipatory DTW for efficient similarity search in time series databases," Proc. VLDB Endow., vol. 2, no. 1, pp. 826-837, 2009.
    [31] B.-K. Yi and C. Faloutsos, "Fast Time Sequence Indexing for Arbitrary Lp Norms," presented at the Proceedings of the 26th International Conference on Very Large Data Bases, 2000.
    [32] N. D. Pham, Q. L. Le, and T. K. Dang, "HOT aSAX: A Novel Adaptive Symbolic Representation for Time Series Discords Discovery," in Intelligent Information and Database Systems: Second International Conference, ACIIDS, Hue City, Vietnam, March 24-26, 2010. Proceedings, Part I, N. T. Nguyen, M. T. Le, and J. Świątek, Eds. Berlin, Heidelberg: Springer Berlin Heidelberg, 2010, pp. 113-121.
    [33] G. Li, Y. Wang, L. Zhang, and X. Zhu, "Similarity measure for time series based on piecewise linear approximation," in 2009 International Conference on Wireless Communications & Signal Processing, 2009, pp. 1-4.
    [34] H. Li, C. Guo, and W. Qiu, "Similarity measure based on piecewise linear approximation and derivative dynamic time warping for time series mining," Expert Systems with Applications, vol. 38, no. 12, pp. 14732-14743, 2011/11/01/ 2011.
    [35] H. Lee and R. Singh, "Symbolic representation and clustering of bio-medical time-series data using non-parametric segmentation and cluster ensemble," in 2012 25th IEEE International Symposium on Computer-Based Medical Systems (CBMS), 2012, pp. 1-6.
    [36] L. Jinyang, Z. Chuanlei, Z. Shanwen, and F. Weidong, "An efficient and accurate optimization method of sliding window size for PAA," in 2014 IEEE Computers, Communications and IT Applications Conference, 2014, pp. 283-286.
    [37] Z. Bankó and J. Abonyi, "Mixed dissimilarity measure for piecewise linear approximation based time series applications," Expert Systems with Applications, vol. 42, no. 21, pp. 7664-7675, 2015/11/30/ 2015.
    [38] H. Sakoe and S. Chiba, "Dynamic programming algorithm optimization for spoken word recognition," IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 26, no. 1, pp. 43-49, 1978.
    [39] N. Q. V. Hung and D. T. Anh, "Combining SAX and Piecewise Linear Approximation to Improve Similarity Search on Financial Time Series," in 2007 International Symposium on Information Technology Convergence (ISITC 2007), 2007, pp. 58-62.
    [40] Y. Sun, J. Li, J. Liu, B. Sun, and C. Chow, "An improvement of symbolic aggregate approximation distance measure for time series," Neurocomputing, vol. 138, pp. 189-198, 8/22/ 2014.
    [41] Z. Zhu et al., "Time series mining based on multilayer piecewise aggregate approximation," in 2016 International Conference on Audio, Language and Image Processing (ICALIP), 2016, pp. 174-179.
    [42] P. S. Mann and C. J. Lacke, Introductory statistics, 8th ed. Singapore : John Wiley & Sons (Singapore) 2013.
    [43] K. Chakrabarti, E. Keogh, S. Mehrotra, and M. Pazzani, "Locally adaptive dimensionality reduction for indexing large time series databases," ACM Trans. Database Syst., vol. 27, no. 2, pp. 188-228, 2002.
    [44] M. Shevchenko, "Space Research Institute," R. Moscow, Ed., ed, 2000.
    [45] G. Moody, "MIT-BIH Database Distribution," M. Cambridge, Ed., ed, 2000.

    QR CODE