簡易檢索 / 詳目顯示

研究生: 劉騏豪
Chi-hao Liu
論文名稱: 應用基因演算法於時間序列之分群
Applying Genetic Algorithm to Cluster Time Series Data
指導教授: 楊鍵樵
Chen-Chau Yang
口試委員: 朱雨其
Yu-chi Chu
呂永和
Yung-Ho Leu
學位類別: 碩士
Master
系所名稱: 電資學院 - 電子工程系
Department of Electronic and Computer Engineering
論文出版年: 2005
畢業學年度: 93
語文別: 中文
論文頁數: 56
中文關鍵詞: 時間序列滑動式視窗K-Means基因演算法資料探勘
外文關鍵詞: Sliding Windows, K-Means, Genetic Algorithm, Data Mining, Time Series
相關次數: 點閱:865下載:12
分享至:
查詢本校圖書館目錄 查詢臺灣博碩士論文知識加值系統 勘誤回報

時間序列分析可以讓我們瞭解實際事件變化與行為,而針對時間序列進行相似性群集有助於未來行為的預測以及規劃。然而,大部分的群集化演算法都先要求給定參數(例如分群數),但是使用者卻沒有足夠的知識背景來決定所要求的參數,因此,本論文提出一個以基因演算法及密度為基礎的群集演算法,有效的找出適當的群組參數,以此來幫助使用者省略群組參數的設定。
其次,在實驗方面我們使用環保署空氣污染監測資料當作資料來源,並運用我們提出之方法將其分群找出數個不同之型樣,而這些型樣若能進一步運用資料探勘技術,從中發掘有用的知識,例如各污染物型樣間是否存有未知的關聯性等這類訊息,則對環保署擬訂空氣污染防治政策與措施,當有正面助益。


Time series analysis can let us understand changes and behavior of the events, and clustering of time series will be helpful to the prediction of the behavior and plans in the future. However, most of the clustering algorithms need to input some parameters, such as the number of clusters but the user do not have enough domain knowledge to determine these parameters. Thus, in this thesis we propose a time series clustering algorithm base on the genetic algorithm and the concept of density, to effectively find out the proper parameters of clusters without user setting.
Further more, we apply our method to the Environmental Protection Administration data, and find some different kinds of patterns in the result. These patterns are useful knowledge when the Environmental Protection Administration need to find some relationship between pollution or to plan some air pollution control policy.

中文摘要.........................................................I Abstract........................................................II 誌謝...........................................................III 目錄............................................................IV 圖目錄.........................................................VII 表目錄..........................................................IX 第一章 緒論.....................................................1 1.1 研究背景.....................................................1 1.2 研究動機.....................................................2 1.3 研究方法.....................................................2 1.4 研究目的.....................................................3 1.5 論文架構.....................................................3 第二章 相關文獻.................................................5 2.1 時間序列分析................................................5 2.1.1 時間序列相似度............................................5 2.1.2 序列正規化( Normalization of Sequences )..................7 2.2 群集技術(Clustering Techniques)...........................8 2.2.1階層式(Hierarchical)演算法............................8 2.2.2切割式(Partitional)演算法.............................9 2.2.3K-Means演算法..........................................10 2.2.4密度基礎的叢集方法.....................................12 2.3 基因演算法(Genetic Algorithms)...........................13 2.3.1 選擇(Selection)........................................15 2.3.2 交配(Selection)........................................16 2.3.3 突變(Selection)........................................17 2.3.4 基因演算法的特性與應用...................................17 2.4 基因演算法與分群...........................................18 2.5 GA-clustering 演算法.......................................21 第三章 研究方法................................................25 3.1 基因演算法在分群之研究.....................................25 3.2 以基因及K-Means演算法為基礎的分群方法......................26 3.2.1 以滑動式視窗切割時間序列.................................27 3.2.2 序列正規化...............................................28 3.2.3 染色體編碼與族群初始化...................................28 3.2.4 適應函數.................................................30 3.2.5 演化流程.................................................31 第四章 實驗方式................................................34 4.1 實驗流程....................................................34 4.2 實驗環境....................................................34 4.3 實驗資料....................................................35 4.4 效能測試....................................................54 4.5 實驗結果....................................................55 第五章 結論與未來研究方向......................................59 5.1 結論.......................................................59 5.2 未來研究方向...............................................60 參考書目........................................................62

[1]A.Hinneburg and D. A. Keim, “An Efficient Approach to Clustering in Multimedia Databases with Noise.” Proc. 4th Int. Conf. on Knowledge Discovery and Data Mining, New York,AAAI
Press, 1998.

[2]Amir Ben-Dor and Zohar Yakhini, “Clustering gene expression patterns.” Proceedings of the 3rd Annual International Conference
on Computational Molecular BiologyRECOMB ‘99), 1999.

[3]Anil K. Jain and Richard C. Dubes, “Algorithms for Clustering
Data.”Prentice Hall, 1988.

[4]Cole R. M.,“Clustering with genetic algorithms.” University of
Western Australia, Master thesis, pp.2-3,1998.

[5]Dumitrescu D., Lazzerini B., Jain L. C., Dumitrescu A., Evolutionary Computation, CRC Press Boca Raton,
pp.223-224,2000.

[6]Eamonn Keogh, “A Tutorial on Indexing and Mining Time Series
Data,”IEEE ICDM, 2001.

[7]Eamonn Keogh,Jessica Lin, and Wagner Truppel, “Clustering of Time Series Subsequences is Meaningless:Implications for
Previous and Future Research,”2003.

[8]Eamonn Keogh , Kaushik Chakrabarti , Sharad Mehrotra, and Michael Pazzani,” Locally Adaptive Dimensionality Reduction for
Indexing Large Time Series Databases”,2001.

[9]Goldin, D. & Kanellakis, P. “On similarity queries for time-series data:constraint specification and implementation.” In
proceedings of the 1st,1995.

[10]Gholamhosein Sheikholeslami, Surojit Chatterjee, and Aidong Zhang,“WaveCluster: A multi-resolution clustering approach for very large spatial databases.” Proceedings of the 24 th Very Large Databases Conference (VLDB 98), pages 428— 439, New York,
Aug. 1998.

[11]Krishna K., Murty M. N., “Genetic K-Means algorithm”, IEEE Transactionson Systems, Man, and Cybernetics, Vol. 29, No. 3,
pp.433-439,1999.

[12]Herrera, F. and J, L, Verdegay, Genetic Algorithms and Soft
Computing, Springer-Verlag, Heidelberg, pp. 8, 1996.

[13]M. Ankerst, M. M. Breunig, H. P. Kriegel, and J. Sander, “OPTICS: ordering points to identify the clustering structure.” Proceedings of the 1999 ACM SIGMOD International Conference on Management of Data, Philadephia, Pennsylvania, USA, pages 49— 60, June
1999.

[14]Martin Ester, Hans-Peter Kriegel, Jorg Sander and Xiaowei Xu, “A density-based algorithm for discovering clusters in large spatial databases with noise.” Proceedings of the 2nd International Conference on Knowledge Discovery and Data Mining, pages
226-231, Portland, Orgon, 1996.

[15]Maulik U., Bandyopadhyay S., “Genetic algorithm-based clusteringtechnique”, Pattern Recognition, Vol. 33, pp.1455-1465,
2000.

[16]McQueen, J. B., “Some Methods of Classification and Analysis of Multivariate Observations,” Proceedings of the 5th Berkeley Symposium on Mathematical Statistics and Probability, pp.
281-297, 1967.

[17]Murthy C. A., Chowdhury N., “In search of optimal clusters using genetic algorithms”, Pattern Recognition Letters, Vol. 17,
pp.825-832,1996.

[18]Ng, R. T. and H. Jiawei, “Efficient and Effective Clustering Methods for Spatial Data Mining,” Proceedings of the 20th VLDB
Conference, pp. 144-155, 1994.

[19]Periklis, A., “Data Clustering Techniques”, March 2002.
URL: http://www.cs.toronto.edu/~periklis/pubs/depth.pdf

[20]Rakesh Agrawal, Johannes Gehrke, Dimitrios Gunopulos, Prabhakar Raghavan, “Automatic Subspace Clustering of High Dimensional Data for Data Mining Applications.” Proc. of the ACM SIGMOD Int'l Conference on Management of Data, Seattle,
Washington, June 1998.

[21]Sarkar M., Yegnanarayana B., Khemani D.,“A clustering algorithm using anevolutionary programming-based approach”,
Pattern Recognition Letters, Vol. 18,pp.975-986,1997.

[22]Sudipto, G., R. Rajeev and S. Kyuseok, “CURE: An Efficient Clustering Algorithm for Large Database,” Proceedings of ACM- SIGMOD International Conference on Management of Data, pp.
73- 84, 1998.

[23]Sudipto, G., R. Rajeev and S. Kyuseok, “ROCK: A Robust Clustering Algorithm for Categorical Attributes,” Proceedings of
the 15th International Conference on Data Eng., 1999.

[24]Tou J. T., Gonzalez R. C., Pattern Recognition Principles,
Addison-Wesley,Reading, pp.94-97,1974.

[25]Zhang, T., R. Ramakrishnan and M. Livny, “BIRCH: An New Data Clustering Algorithm and Its Applications,” Data Mining and
Knowledge Discovery, vol. 1, no. 2, 1997.

[26]行政院環境保護署,「空氣污染指標的定義」,URL:
http://taqm.epa.gov.tw/emc/default.aspx?pid=b0201&cid=b0201

[27]行政院環境保護署, 空氣品質長期趨勢分析與年報編撰, 2002.

[28]行政院環境保護署, 空氣品質監測報告, 2002.

[29]江巧雯, “長時間序列叢集化之研究,” 私立元智大學資訊管理
系碩士論文, 2002.

[30]陳延洛, “基因表現時間序列的叢集分析方法與系統實作,” 國
立成功大學資訊工程系碩士論文, 2002.

[31]葉承銓, “應用適應性基因演算法於資料分群的問題,” 私立樹德
科技大學資訊管理所碩士論文, 2002.

[32]楊東麟, “探討基植於時間序列上的群組化技術,” 私立逢甲大
學資訊工程系碩士論文,2001

[33]蘇木春,張孝德,2000,機器學習:類神經網路、模糊系統以及基因演算法則,二版,全華科技圖書股份有限公司,頁9-2。

QR CODE