
Author: Qiao Yu (于喬)
Title: Accelerated K-means Algorithm Based on Efficient Filtering Method
Advisor: Bi-Ru Dai (戴碧如)
Committee: 蔡曉萍, 戴碧如, 徐國偉, 鮑興國
Degree: Master
Department: Department of Computer Science and Information Engineering, College of Electrical Engineering and Computer Science
Publication Year: 2019
Academic Year: 107
Language: English
Pages: 35
Keywords (Chinese): clustering, filtering method, accelerated k-means
Keywords (English): filtering method, accelerating k-means

K-means is a well-known clustering algorithm in data mining and machine learning. It is widely applied in various domains such as computer vision, market segmentation, and social network analysis. However, k-means wastes a large amount of time on unnecessary distance calculations, so accelerating k-means has become a worthwhile and important research topic. An accelerated exact k-means algorithm achieves the same result as standard k-means, only faster. In this thesis, we present a novel accelerated exact k-means algorithm, named Fission-Fusion k-means, that is significantly faster than the state-of-the-art accelerated k-means algorithms. Its additional memory consumption is also much smaller than that of other accelerated k-means algorithms. Fission-Fusion k-means accelerates k-means through an efficient filtering method applied during the iterations, balancing the cost of distance calculations against the cost of filtering. We conduct extensive experiments on real-world datasets, which verify that Fission-Fusion k-means considerably outperforms the state-of-the-art accelerated k-means algorithms in most cases. In addition, for more separated and naturally clustered datasets, our algorithm is also relatively faster than other accelerated k-means algorithms.
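The full text is embargoed, but the general principle the abstract describes — skipping distance computations that cannot change a point's assignment — can be illustrated with a standard triangle-inequality filter in the style of Hamerly [8]. This is a generic sketch of filtering-based exact k-means, not the thesis's Fission-Fusion method; all function and variable names are illustrative.

```python
# A minimal sketch of filtering-based exact k-means acceleration using
# triangle-inequality bounds (Elkan [7], Hamerly [8]). NOT the thesis's
# Fission-Fusion method; names are illustrative.
import numpy as np

def filtered_kmeans(X, k, n_iter=50, seed=0):
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), k, replace=False)].copy()
    labels = np.zeros(len(X), dtype=int)
    upper = np.full(len(X), np.inf)   # upper bound on d(x, assigned center)
    lower = np.zeros(len(X))          # lower bound on d(x, 2nd-closest center)
    for _ in range(n_iter):
        # s[j]: half the distance from center j to its nearest other center
        cc = np.linalg.norm(centers[:, None] - centers[None, :], axis=2)
        np.fill_diagonal(cc, np.inf)
        s = cc.min(axis=1) / 2.0
        for i, x in enumerate(X):
            # Filter: if the upper bound is already within max(s, lower),
            # no other center can be closer -- skip all k distances.
            if upper[i] <= max(s[labels[i]], lower[i]):
                continue
            d = np.linalg.norm(centers - x, axis=1)  # full recomputation
            labels[i] = int(np.argmin(d))
            upper[i] = d[labels[i]]
            lower[i] = np.partition(d, 1)[1]  # second-smallest distance
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        shift = np.linalg.norm(new_centers - centers, axis=1)
        # Loosen bounds by how far the centers moved (triangle inequality),
        # which keeps the filter exact: results match plain Lloyd's k-means.
        upper += shift[labels]
        lower -= shift.max()
        centers = new_centers
        if shift.max() == 0:
            break
    return centers, labels
```

Because the bounds are only loosened by the true center movement, the filter never skips a point whose assignment could change, so the output is identical to standard Lloyd's k-means — the property the abstract calls an "exact" accelerated algorithm.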

Table of Contents
Advisor's Recommendation Letter; Committee Approval Certificate; Abstract; Chinese Abstract (論文摘要); Acknowledgements (致謝); Table of Contents; List of Tables; List of Figures
1. Introduction
   1.1 Background
   1.2 Motivation and Contribution
   1.3 Thesis Organization
2. Related Works
3. Proposed Method
   3.1 The Framework of Our Algorithm
   3.2 Filtering 1 (FUC): Filtering Based on Unchanging Centers
   3.3 Filtering 2 (FC): Filtering for Clusters of Points
   3.4 Filtering 3 (FG): Filtering for Groups of Points
       3.4.1 Fission Step: Grouping Points Automatically
       3.4.2 Fusion Step: Limiting the Increasing Number of Groups
       3.4.3 Filtering for Groups of Points
   3.5 Filtering 4 (FRP): Filtering Based on Representative Point
   3.6 Summary of Filtering Methods
4. Experiment Study
   4.1 Experiment Design
   4.2 Separability
   4.3 Avoided Distance Calculations
   4.4 Effectiveness of Filtering Methods
   4.5 Memory Cost Comparison and Relative Speedup
5. Conclusion and Future Work
6. References

1. Lloyd, S.: Least squares quantization in PCM. In: IEEE Trans. Information Theory, 28(2):129–137, 1982.
2. Philbin, J., Chum, O., Isard, M., Sivic, J., and Zisserman, A.: Object retrieval with large vocabularies and fast spatial matching. In: IEEE Conf. Comp. Vis. Patt. Recogn. (CVPR), 2007.
3. Sculley, D.: Web-scale k-means clustering. In: Proc. 19th Int. Conf. World Wide Web, pp. 1177–1178, 2010.
4. Wang, J., Wang, J., Ke, Q., Zeng, G., and Li, S.: Fast approximate k-means via cluster closures. In: IEEE Conf. Comp. Vis. Patt. Recogn. (CVPR), pp. 3037–3044, 2012.
5. Pelleg, D. and Moore, A.: Accelerating exact k-means algorithms with geometric reasoning. In: Proc. 5th ACM SIGKDD Int. Conf. Knowl. Disc. Data Mining (KDD), pp. 277–281, 1999.
6. Kanungo, T., Mount, D.M., Netanyahu, N.S., Piatko, C.D., Silverman, R., and Wu, A.Y.: An efficient k-means clustering algorithm: Analysis and implementation. In: IEEE Trans. Patt. Anal. Mach. Intell., 24:881–892, 2002.
7. Elkan, C.: Using the triangle inequality to accelerate k-means. In: Proc. 20th Int. Conf. Mach. Learn. (ICML), pp. 147–153, 2003.
8. Hamerly, G.: Making k-means even faster. In: SIAM Int. Conf. Data Mining (SDM), pp. 130–140, 2010.
9. Drake, J. and Hamerly, G.: Accelerated k-means with adaptive distance bounds. In: 5th NIPS Work. Optim. Mach. Learn., pp. 579–587, 2012.
10. Drake, J.: Faster k-means clustering, 2013. Accessed online 19 August 2015.
11. Ding, Y., Zhao, Y., Shen, X., Musuvathi, M., and Mytkowicz, T.: Yinyang k-means: A drop-in replacement of the classic k-means with consistent speedup. In: Proc. 32nd Int. Conf. Mach. Learn. (ICML), pp. 579–587, 2015.
12. Ryšavý, P. and Hamerly, G.: Geometric methods to accelerate k-means algorithms. In: SIAM Int. Conf. Data Mining (SDM), pp. 324–332, 2016.
13. Bottesch, T., Bühler, T., and Kächele, M.: Speeding up k-means by approximating Euclidean distances via block vectors. In: Proc. 33rd Int. Conf. Mach. Learn. (ICML), New York, NY, USA, 2016.
14. Newling, J. and Fleuret, F.: Fast k-means with accurate bounds. In: Proc. 33rd Int. Conf. Mach. Learn. (ICML), New York, NY, USA, 2016.
15. Bache, K. and Lichman, M.: UCI machine learning repository, 2013. URL: http://archive.ics.uci.edu/ml/.
16. Joensuu: Clustering datasets homepage. URL: https://cs.joensuu.fi/sipu/datasets/
17. Rong-En, F.: LIBSVM homepage. URL: https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/
18. Arthur, D. and Vassilvitskii, S.: k-means++: The advantages of careful seeding. In: Proc. 18th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pp. 1027–1035. Society for Industrial and Applied Mathematics, 2007.

Full-text release date: 2024/02/13 (campus network)
Full text not authorized for public release (off-campus network)
Full text not authorized for public release (National Central Library: Taiwan NDLTD system)