
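Graduate student: 鄭明修
Ming-shiou Cheng
Thesis title: 從時間序列資料庫中搜尋相似性形樣之研究
The Study of Patterns Searching in Time Series Databases
Advisor: 楊鍵樵
Chen-Chau Yang
Committee members: 鮑興國
Hsing-kuo Pao
朱雨其
Yu-chi Chu
Degree: Master
Department: College of Engineering - Graduate Institute of Automation and Control
Year of publication: 2006
Academic year of graduation: 94 (ROC calendar, 2005-2006)
Language: Chinese
Pages: 76
Keywords (translated from Chinese): time series data, similar pattern search, data mining
Keywords (English): Time Series, Patterns, Similarity, Data Mining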
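  • Abstract (Chinese version, translated)

    Similar-pattern search in time series has become an important research topic in data mining in recent years. It combines a similarity measure with a search strategy to find patterns in a time series database. This thesis proposes a similar-pattern search method that uses regression analysis from statistics as the similarity measure. The method handles amplitude scaling, amplitude shifting, and time scaling, and it compensates for a weakness of the correlation coefficient, which cannot show how large an amplitude scaling is. For efficiency, the algorithm processes some of the values required by the computation while it searches, which removes many unnecessary operations and speeds up execution.
    The experimental results show that the method runs much faster than both the correlation coefficient and dynamic time warping (DTW), and the speedup over DTW is especially pronounced. In terms of accuracy, its search results agree completely with those of the correlation coefficient, and agree with those of DTW at a rate ranging from just over 70% to just over 80%.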
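
To make the idea of a regression-based similarity measure concrete, here is a minimal Python sketch. It is not the thesis's implementation: the function name `regression_match` and the sample data are illustrative assumptions. It fits a candidate window against the query by ordinary least squares and returns the slope, the intercept, and the correlation coefficient r. The value of r is the same whether the window is the query scaled by 2 or by 10, while the fitted slope exposes the amplitude scaling and the intercept exposes the amplitude shift.

```python
import math


def regression_match(query, window):
    """Fit window ~ slope * query + intercept by ordinary least squares.

    Returns (slope, intercept, r), where r is the Pearson correlation
    coefficient of the two equal-length sequences.
    Illustrative helper only; not taken from the thesis.
    """
    n = len(query)
    if n != len(window) or n < 2:
        raise ValueError("sequences must have the same length (>= 2)")
    mean_q = sum(query) / n
    mean_w = sum(window) / n
    # Centered sums of squares and cross-products.
    sqq = sum((q - mean_q) ** 2 for q in query)
    sww = sum((w - mean_w) ** 2 for w in window)
    sqw = sum((q - mean_q) * (w - mean_w) for q, w in zip(query, window))
    if sqq == 0 or sww == 0:
        raise ValueError("a constant sequence has no defined correlation")
    slope = sqw / sqq                      # amplitude scaling factor
    intercept = mean_w - slope * mean_q    # amplitude shift
    r = sqw / math.sqrt(sqq * sww)         # shape agreement, scale-blind
    return slope, intercept, r


# Made-up data: the same shape at two different amplitudes and offsets.
query = [1.0, 3.0, 2.0, 5.0, 4.0]
doubled = [2 * q + 10 for q in query]
tenfold = [10 * q - 3 for q in query]

print(regression_match(query, doubled))
print(regression_match(query, tenfold))
```

Both calls report r close to 1, so the correlation coefficient alone cannot tell the two windows apart. The slopes (about 2 and about 10) are what separate them.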

    Keywords: data mining, time series data, similar pattern search


    Abstract

    Searching for similar patterns in time series is an important research issue in data mining. It combines a similarity metric with a search strategy to find patterns in time series.

    This thesis proposes a method that uses linear regression as the similarity metric for similar-pattern searching in time series. Our method not only handles transformations such as amplitude scaling, amplitude shifting, and time scaling, but also remedies a shortcoming of the coefficient of correlation (C.C.), which does not reveal the magnitude of amplitude scaling. To improve execution efficiency, the algorithm eliminates many unnecessary operations during pattern searching.
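
The abstract does not say how the unnecessary operations are avoided, so the following Python sketch shows only one common way to cut redundant work in a sliding-window regression search. It should not be read as the thesis's FastTSPS algorithm. The function name `sliding_regression_search`, the threshold `min_r`, and the sample series are assumptions made for illustration. Query statistics are computed once, and prefix sums give each window's sum and sum of squares in constant time, so only the cross-product term needs a fresh pass per window.

```python
import math


def sliding_regression_search(series, query, min_r=0.95):
    """Return (start, slope, intercept, r) for every window with r >= min_r.

    Illustrative sketch with hypothetical names; not the thesis's FastTSPS.
    """
    m = len(query)
    n = len(series)
    if m < 2 or n < m:
        return []
    # Query statistics are computed once, outside the scan.
    sum_q = sum(query)
    sqq = sum(q * q for q in query) - sum_q * sum_q / m
    if sqq <= 0:
        raise ValueError("query must not be constant")
    # Prefix sums of the series and of its squares.
    pre = [0.0]
    pre2 = [0.0]
    for x in series:
        pre.append(pre[-1] + x)
        pre2.append(pre2[-1] + x * x)
    matches = []
    for start in range(n - m + 1):
        sum_w = pre[start + m] - pre[start]
        sww = (pre2[start + m] - pre2[start]) - sum_w * sum_w / m
        if sww <= 1e-12:
            continue  # flat window: correlation is undefined, skip it
        # Only the cross term still costs O(m) per window.
        dot = sum(q * series[start + i] for i, q in enumerate(query))
        sqw = dot - sum_q * sum_w / m
        r = sqw / math.sqrt(sqq * sww)
        if r >= min_r:
            slope = sqw / sqq
            intercept = (sum_w - slope * sum_q) / m
            matches.append((start, slope, intercept, r))
    return matches


# Made-up series with the query shape embedded at index 3, scaled by 2.
query = [1.0, 3.0, 2.0, 5.0, 4.0]
series = [5.0, 5.0, 4.0, 2.0, 6.0, 4.0, 10.0, 8.0, 7.0, 7.0]
for start, slope, intercept, r in sliding_regression_search(series, query):
    print(start, round(slope, 3), round(intercept, 3), round(r, 3))
```

The scan reports the window starting at index 3 together with its fitted slope, so a caller can filter matches by scaling factor as well as by shape.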

    According to the experimental results, our method is faster than both C.C. and dynamic time warping (DTW), and much faster than DTW in particular. In terms of accuracy, the search results of our method match those of C.C. entirely, and they match those of DTW at a rate ranging from just over 70% to just over 80%.
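
For context on the DTW baseline, the sketch below is the textbook dynamic-programming form of dynamic time warping, with absolute difference as the local cost. It is a generic illustration, not the thesis's experimental code, and the function name `dtw_distance` is an assumption. Each comparison fills a table of size len(a) × len(b), which is one reason a DTW-based search tends to cost more per window than a regression fit.

```python
def dtw_distance(a, b):
    """Classic DTW distance with absolute-difference local cost.

    Generic textbook formulation; illustrative only.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = cheapest cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(
                cost[i - 1][j],      # a[i-1] also matches an earlier b
                cost[i][j - 1],      # b[j-1] also matches an earlier a
                cost[i - 1][j - 1],  # advance both sequences
            )
    return cost[n][m]


# A time-stretched copy aligns at zero cost ...
print(dtw_distance([1, 2, 3], [1, 2, 2, 3]))
# ... but an amplitude-scaled copy does not, unless it is normalized first.
print(dtw_distance([1, 2, 3], [2, 4, 6]))
```

Plain DTW absorbs stretching along the time axis but not amplitude scaling, whereas the regression fit captures amplitude scaling directly in its slope.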

    Keywords: Data Mining, Time Series, Patterns, Similarity

    Table of Contents
    Chapter 1 Introduction
      1.1 Research Motivation
      1.2 Research Objectives
      1.3 Thesis Organization
    Chapter 2 Related Work
      2.1 Research on Data Mining
      2.2 Research on Time Series Similarity
        2.2.1 Euclidean Distance
        2.2.2 Coefficient of Correlation (C.C., r)
        2.2.3 Dynamic Time Warping (DTW)
      2.3 Research on Similar-Pattern Search in Time Series
      2.4 Summary
    Chapter 3 Research Method and System Design
      3.1 Overview of the System Architecture
      3.2 Similarity Measure
      3.3 Time Series Patterns Searching (TSPS) Algorithm
        3.3.1 Fast Time Series Patterns Searching (FastTSPS): An Algorithm for Faster Search
      3.4 Comparison of FastTSPS and C.C.
      3.5 Comparison of FastTSPS and DTW
    Chapter 4 Experiments
      4.1 Experimental Environment
      4.2 Datasets
      4.3 Experimental Results
        4.3.1 Experiment 1: Comparison of Average Accuracy
        4.3.2 Experiment 2: Comparison of Average CPU Execution Time
    Chapter 5 Conclusions and Future Work
      5.1 Conclusions
      5.2 Future Work
    References
    Appendix A Experimental System Interface
    Appendix B Detailed Experimental Results
      B.1 Accuracy Comparison
      B.2 Execution Efficiency Comparison

