
Graduate Student: Yi-Chun Hou (侯宜均)
Thesis Title: 高維度混合共享最近鄰居譜分群法 (High Dimensional Spectral Clustering Approach Using Shared Nearest Neighbors Based on Mixing Selection)
Advisor: Chiun-Chieh Hsu (徐俊傑)
Committee Members: Yue-Li Wang (王有禮), Sun-Jen Huang (黃世禎)
Degree: Master
Department: Department of Information Management, College of Management
Year of Publication: 2020
Academic Year of Graduation: 108 (2019-2020)
Language: Chinese
Number of Pages: 58
Keywords (Chinese): 高維度、譜分群、奇異值分解、特徵向量
Keywords (English): High dimension, Spectral clustering, Singular value decomposition, Eigenvector
As modern technology continues to advance, the Internet produces enormous amounts of data every day. Hidden within these data is potential value that many wish to uncover, and data mining is currently a very useful approach for helping managers extract useful information through feature rules. Clustering is a particularly useful data mining technique: it does not require data to be classified according to predefined category attributes, and it groups data so that points within the same cluster are similar to one another while points in different clusters are dissimilar. Euclidean distance has traditionally been used to measure the similarity between data points, but it cannot correctly reflect the distribution of high-dimensional datasets, so researchers have proposed graph clustering methods. Spectral clustering is one of the most popular graph clustering methods and performs better than traditional graph clustering methods; however, computing the relationships among all data points consumes a large amount of time and space. Some methods therefore use a set of representative points to capture the overall structure of the dataset, selecting them either with k-means or at random. Random selection, however, has a high probability of choosing a set of representatives that performs poorly, while k-means selection requires extra time to choose the representatives.
This study proposes a High Dimensional Spectral Clustering Approach Using Shared Nearest Neighbors Based on Mixing Selection (HDSC-SM) to address these problems and improve clustering performance. HDSC-SM first selects representative points with a mixed selection strategy, constructs a similarity matrix from the number of shared neighbors while taking the mutual relationships among data points into account, then applies singular value decomposition to obtain the eigenvectors, and finally clusters the data points with k-means based on those eigenvectors. The experimental results show that the proposed method effectively resolves the problems described above and achieves better performance than the compared methods.


In recent years, a huge amount of data has been produced on the Internet every day. These data hold potential value, and data mining is a very useful way to help managers use feature rules to obtain useful information. Clustering is a very useful technique in data mining. It does not need to classify data according to predefined category attributes; data in the same cluster are similar to each other, while data in different clusters are dissimilar. In the past, Euclidean distance was used to measure the similarity between data points. Because it cannot accurately reflect the distribution of high-dimensional datasets, some scholars proposed graph clustering methods. Spectral clustering is one of the most popular graph clustering methods, and it performs better than traditional graph clustering methods. However, in order to know the relationships among all data points, it consumes a large amount of time and space. Some methods use a set of representative points to represent the whole structure of the dataset, selecting the representative points with either the k-means selection method or the random selection method. However, the random selection method may, with high probability, select a set of poorly performing representative points, and the k-means selection method requires additional time to select representative points.
This study proposes a method named High Dimensional Spectral Clustering Approach Using Shared Nearest Neighbors Based on Mixing Selection (HDSC-SM) to solve these problems and improve clustering performance. HDSC-SM first selects representative points with a mixed selection method and constructs a similarity matrix based on the number of shared neighbors while considering the relationships between data points. It then uses singular value decomposition to obtain the eigenvectors, and finally applies the k-means method to the eigenvectors to cluster the data points. According to the experimental results, HDSC-SM effectively solves the previously mentioned problems and obtains better performance than the other methods.
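The abstract describes a representative-point (landmark) spectral clustering pipeline. The sketch below illustrates that pipeline in Python with NumPy and scikit-learn; it is a minimal illustration under stated assumptions, not the thesis's actual implementation. The function names (select_representatives, snn_affinity, hdsc_sm_like), the concrete "mixing" rule (random sampling refined by a few k-means-style passes), and the parameters p (number of representatives) and r (nearest representative neighbors) are assumptions made for illustration; only the overall steps (select representatives, build a shared-nearest-neighbor affinity, decompose it with SVD, cluster the embedding with k-means) come from the abstract.

```python
# Hypothetical sketch of the pipeline summarized in the abstract, using
# NumPy and scikit-learn. Names and the concrete selection rule are
# illustrative assumptions, not the thesis's code.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.neighbors import NearestNeighbors


def select_representatives(X, p, refine_iters=2, seed=0):
    """Pick p representative points: random sampling followed by a few
    k-means-style refinement passes (an assumed form of mixed selection)."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=p, replace=False)].astype(float)
    for _ in range(refine_iters):
        # Assign each point to its closest representative, then move each
        # representative to the mean of its assigned points.
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        for j in range(p):
            members = X[labels == j]
            if len(members):
                centers[j] = members.mean(axis=0)
    return centers


def snn_affinity(X, reps, r):
    """Affinity between points and representatives, weighted by how many of
    their r nearest representatives the two sides share."""
    nn = NearestNeighbors(n_neighbors=r).fit(reps)
    point_nbrs = nn.kneighbors(X, return_distance=False)    # shape (n, r)
    rep_nbrs = nn.kneighbors(reps, return_distance=False)   # shape (p, r)
    A = np.zeros((len(X), len(reps)))
    for i, nbrs in enumerate(point_nbrs):
        for j in nbrs:
            A[i, j] = len(set(nbrs) & set(rep_nbrs[j])) / r  # shared count
    return A


def hdsc_sm_like(X, k, p=100, r=10):
    """Cluster X into k groups: representatives -> SNN affinity -> SVD -> k-means."""
    reps = select_representatives(X, p)
    A = snn_affinity(X, reps, r)
    A = A / (A.sum(axis=1, keepdims=True) + 1e-12)     # row-normalize
    U, _, _ = np.linalg.svd(A, full_matrices=False)    # left singular vectors
    return KMeans(n_clusters=k, n_init=10).fit_predict(U[:, :k])
```

A call such as hdsc_sm_like(X, k=5) on an (n, d) array would return n cluster labels. Note that in HDSC-SM the similarity construction also considers the mutual relationships between data points, which this sketch only approximates through the shared-neighbor count.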

Abstract (Chinese) I
ABSTRACT II
Acknowledgements III
Table of Contents IV
List of Figures VI
List of Tables VII
Chapter 1 Introduction 1
  1.1 Research Background 1
  1.2 Research Motivation 2
  1.3 Research Objectives 2
  1.4 Research Scope and Limitations 3
  1.5 Thesis Organization 3
Chapter 2 Review of Related Literature 5
  2.1 Clustering Methods 5
  2.2 Graph Clustering 5
    2.2.1 Cut-based Graph Clustering 6
    2.2.2 Case-based Graph Clustering 7
    2.2.3 Hierarchical Graph Clustering 9
    2.2.4 Density-based Graph Clustering 10
    2.2.5 Hybrid Graph Clustering 12
Chapter 3 Research Method 14
  3.1 Problem Description 14
  3.2 HDSC-SM Definitions 15
  3.3 HDSC-SM Algorithm 17
    3.3.1 Mixed Selection of Representative Points 19
    3.3.2 Constructing the Similarity Matrix 20
    3.3.3 Generating Clusters 22
  3.4 Time Complexity Analysis 23
Chapter 4 Experimental Analysis and Results 24
  4.1 Experimental Environment and Datasets 24
  4.2 Related Methods and Evaluation Metrics 25
    4.2.1 Related Methods 26
    4.2.2 Evaluation Metrics 26
  4.3 Experimental Analysis and Results 28
    4.3.1 Experimental Results 28
    4.3.2 Execution Time Comparison 33
    4.3.3 Nearest Representative Neighbor Parameter r 35
    4.3.4 Representative Point Parameter p 37
  4.4 Summary of Experiments 39
Chapter 5 Conclusion and Future Work 42
References 44


Full-text release date: 2025/08/18 (campus network)
Full text not authorized for public release (off-campus network)
Full text not authorized for public release (National Central Library: Taiwan thesis and dissertation system)