Graduate student: 陳彥煊 (Yen-Hsuan Chen)
Thesis title: 高維度頻繁型樣分群法 (A High Dimensional Frequent Pattern Clustering Approach)
Advisor: 徐俊傑 (Chiun-Chieh Hsu)
Oral examination committee: 王有禮 (Yue-Li Wang), 黃世禎 (Sun-Jen Huang)
Degree: Master
Department: College of Management - Department of Information Management
Year of publication: 2018
Academic year of graduation: 106
Language: Chinese
Number of pages: 77
Keywords (Chinese): 分群, 頻繁型樣, 高維度, 完全圖
Keywords (English): Clustering, Frequent pattern, High dimension, Clique
    In recent years, the volume of data generated on the Internet each day has grown steadily, and extracting valuable information from such massive data has drawn increasing attention, making data mining an important research topic. Clustering is a commonly used data mining method. Its advantage is that the data need no class labels in advance; after clustering, data within a cluster are highly homogeneous, while data in different clusters are highly dissimilar. For non-numeric data, a common approach is to first apply association analysis to establish relationships among the data, then build a graph from the frequent patterns and cluster the graph. However, most graph clustering methods consider only the pairwise distance between nodes and ignore the frequent patterns of each dimension produced by association analysis. As a result, high dimensional frequent patterns, which carry more information value, are easily split apart during clustering, and important association information is lost.

    This study proposes High Dimensional Frequent Pattern Clustering (HDFPC), a clustering method that takes high dimensional frequent patterns into account, to address this problem and improve clustering performance. HDFPC first uses high dimensional frequent patterns as cluster centers, applying a merge threshold during this stage to decide whether frequent patterns should be merged. It then uses two-dimensional frequent patterns to expand the clusters. If a two-dimensional frequent pattern cannot join any existing cluster, it becomes the center of a new secondary cluster. Finally, each cluster is pruned and adjusted. Experimental results show that HDFPC effectively alleviates the problem described above and performs better than the other methods compared.
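
    The three stages named in the abstract can be pictured with a toy Python sketch. This is not the thesis's algorithm: the abstract does not define the merge threshold or the pruning rule, so the Jaccard-overlap merge test, the "first cluster wins" pruning rule, and the names `hdfpc_sketch` and `merge_threshold` are all assumptions made only for illustration.

```python
def hdfpc_sketch(patterns, merge_threshold=0.5):
    """Toy illustration of the stages described in the abstract.

    patterns: iterable of frozensets of items (frequent patterns, size >= 2).
    merge_threshold: ASSUMED to be a Jaccard-overlap cutoff; the actual
    merge criterion of the thesis is not stated in the abstract.
    """
    # Patterns with 3 or more items are treated as "high dimensional" here.
    high = sorted((p for p in patterns if len(p) >= 3),
                  key=lambda p: (-len(p), sorted(p)))
    pairs = sorted((p for p in patterns if len(p) == 2), key=sorted)

    # Stage 1: high dimensional patterns seed cluster centers; a pattern is
    # merged into the most-overlapping cluster if the overlap is large enough.
    clusters = []
    for p in high:
        best_cluster, best_score = None, 0.0
        for c in clusters:
            score = len(p & c) / len(p | c)  # Jaccard overlap (assumption)
            if score >= merge_threshold and score > best_score:
                best_cluster, best_score = c, score
        if best_cluster is not None:
            best_cluster.update(p)
        else:
            clusters.append(set(p))

    def find(item):
        # Return the first cluster that already contains the item, if any.
        return next((c for c in clusters if item in c), None)

    # Stage 2: two-dimensional patterns expand clusters; a pair that touches
    # no existing cluster becomes the center of a new secondary cluster.
    for pair in pairs:
        a, b = sorted(pair)
        ca, cb = find(a), find(b)
        if ca is None and cb is None:
            clusters.append({a, b})
        elif ca is None:
            cb.add(a)
        elif cb is None:
            ca.add(b)

    # Stage 3: pruning (ASSUMED rule): an item that ended up in several
    # clusters is kept only in the first one; empty clusters are dropped.
    seen, pruned = set(), []
    for c in clusters:
        kept = c - seen
        seen |= kept
        if kept:
            pruned.append(kept)
    return pruned


example = [frozenset("ABC"), frozenset("BCD"), frozenset("EFG"),
           frozenset("AB"), frozenset("DH"), frozenset("XY")]
clusters = hdfpc_sketch(example)
```

    In this toy input, the three-item patterns {A, B, C} and {B, C, D} overlap enough to merge into one center, the pair {D, H} extends that cluster, and the pair {X, Y} touches no existing cluster and so starts a secondary one.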

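    Abstract (Chinese)
    Abstract
    Table of Contents
    List of Figures
    List of Tables
    Chapter 1 Introduction
      1.1 Research Background
      1.2 Research Motivation
      1.3 Research Objectives
      1.4 Research Scope and Limitations
      1.5 Thesis Organization
    Chapter 2 Literature Review
      2.1 Clustering Techniques
      2.2 Graph Clustering
      2.3 Partition-Based Graph Clustering
      2.4 Instance-Based Graph Clustering
      2.5 Density-Based Graph Clustering
      2.6 Hybrid Graph Clustering
      2.7 Clustering Accuracy
    Chapter 3 Research Method
      3.1 Problem Description
      3.2 HDFPC Definitions
      3.3 HDFPC Algorithm
        3.3.1 Cluster Center Construction Phase
        3.3.2 Cluster Expansion Phase
        3.3.3 Cluster Pruning Phase
      3.4 Complete Example
      3.5 Time Complexity
    Chapter 4 Experimental Analysis and Results
      4.1 Experimental Data and Environment
      4.2 Experimental Method
      4.3 Experimental Results and Analysis
        4.3.1 Frequent Pattern Completeness Rate
        4.3.2 Time
        4.3.3 Space
      4.4 Experimental Summary
    Chapter 5 Conclusions and Future Research Directions
    References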

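    Full-text release date: 2023/08/08 (campus network)
    Full-text release date: full text not authorized for public release (off-campus network)
    Full-text release date: full text not authorized for public release (National Central Library: Taiwan theses and dissertations system)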