
Author: Qiao Yu (于喬)
Title: Accelerated K-means Algorithm Based on Efficient Filtering Method
Advisor: Bi-Ru Dai (戴碧如)
Committee: 蔡曉萍, 戴碧如, 徐國偉, 鮑興國
Degree: Master
Department: Department of Computer Science and Information Engineering, College of Electrical Engineering and Computer Science
Publication Year: 2019
Academic Year: 107
Language: English
Pages: 35
Keywords (Chinese): clustering, filtering method, accelerated k-means
Keywords (English): filtering method, accelerating k-means

K-means is a well-known clustering algorithm in data mining and machine learning. It is widely applied in various domains such as computer vision, market segmentation, and social network analysis. However, k-means wastes a large amount of time on unnecessary distance calculations, so accelerating k-means has become a worthwhile and important research topic. An accelerated exact k-means algorithm achieves the same result as standard k-means, only faster. In this thesis, we present a novel accelerated exact k-means algorithm, named Fission-Fusion k-means, that is significantly faster than the state-of-the-art accelerated k-means algorithms. Its additional memory consumption is also much smaller than that of other accelerated k-means algorithms. Fission-Fusion k-means accelerates k-means through an efficient filtering method applied during the iterations, balancing the cost of distance calculations against the cost of filtering. We conduct extensive experiments on real-world datasets, which verify that Fission-Fusion k-means considerably outperforms the state-of-the-art accelerated k-means algorithms in most cases. In addition, for more separated and naturally clustered datasets, our algorithm is also relatively faster than other accelerated k-means algorithms.
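The full text is embargoed, but the general principle the abstract describes — skipping distance computations that cannot change a point's assignment — can be illustrated with a standard triangle-inequality filter in the style of Hamerly [8]. This is a generic sketch of filtering-based exact k-means, not the thesis's Fission-Fusion method; all function and variable names are illustrative.

```python
# A minimal sketch of filtering-based exact k-means acceleration using
# triangle-inequality bounds (Elkan [7], Hamerly [8]). NOT the thesis's
# Fission-Fusion method; names are illustrative.
import numpy as np

def filtered_kmeans(X, k, n_iter=50, seed=0):
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), k, replace=False)].copy()
    labels = np.zeros(len(X), dtype=int)
    upper = np.full(len(X), np.inf)   # upper bound on d(x, assigned center)
    lower = np.zeros(len(X))          # lower bound on d(x, 2nd-closest center)
    for _ in range(n_iter):
        # s[j]: half the distance from center j to its nearest other center
        cc = np.linalg.norm(centers[:, None] - centers[None, :], axis=2)
        np.fill_diagonal(cc, np.inf)
        s = cc.min(axis=1) / 2.0
        for i, x in enumerate(X):
            # Filter: if the upper bound is already within max(s, lower),
            # no other center can be closer -- skip all k distances.
            if upper[i] <= max(s[labels[i]], lower[i]):
                continue
            d = np.linalg.norm(centers - x, axis=1)  # full recomputation
            labels[i] = int(np.argmin(d))
            upper[i] = d[labels[i]]
            lower[i] = np.partition(d, 1)[1]  # second-smallest distance
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        shift = np.linalg.norm(new_centers - centers, axis=1)
        # Loosen bounds by how far the centers moved (triangle inequality),
        # which keeps the filter exact: results match plain Lloyd's k-means.
        upper += shift[labels]
        lower -= shift.max()
        centers = new_centers
        if shift.max() == 0:
            break
    return centers, labels
```

Because the bounds are only loosened by the true center movement, the filter never skips a point whose assignment could change, so the output is identical to standard Lloyd's k-means — the property the abstract calls an "exact" accelerated algorithm.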

Table of Contents
Advisor's Recommendation Letter; Committee Approval Certificate; Abstract; Chinese Abstract (論文摘要); Acknowledgements (致謝); Table of Contents; List of Tables; List of Figures
1. Introduction
   1.1 Background
   1.2 Motivation and Contribution
   1.3 Thesis Organization
2. Related Works
3. Proposed Method
   3.1 The Framework of Our Algorithm
   3.2 Filtering 1 (FUC): Filtering Based on Unchanging Centers
   3.3 Filtering 2 (FC): Filtering for Clusters of Points
   3.4 Filtering 3 (FG): Filtering for Groups of Points
       3.4.1 Fission Step: Grouping Points Automatically
       3.4.2 Fusion Step: Limiting the Increasing Number of Groups
       3.4.3 Filtering for Groups of Points
   3.5 Filtering 4 (FRP): Filtering Based on Representative Point
   3.6 Summary of Filtering Methods
4. Experiment Study
   4.1 Experiment Design
   4.2 Separability
   4.3 Avoided Distance Calculations
   4.4 Effectiveness of Filtering Methods
   4.5 Memory Cost Comparison and Relative Speedup
5. Conclusion and Future Work
6. References

1. Lloyd, S.: Least squares quantization in PCM. In: IEEE Trans. Information Theory, 28(2):129–137, 1982.
2. Philbin, J., Chum, O., Isard, M., Sivic, J., and Zisserman, A.: Object retrieval with large vocabularies and fast spatial matching. In: IEEE Conf. Comp. Vis. Patt. Recogn. (CVPR), 2007.
3. Sculley, D.: Web-scale k-means clustering. In: Proc. 19th Int. Conf. World Wide Web, pp. 1177–1178, 2010.
4. Wang, J., Wang, J., Ke, Q., Zeng, G., and Li, S.: Fast approximate k-means via cluster closures. In: IEEE Conf. Comp. Vis. Patt. Recogn. (CVPR), pp. 3037–3044, 2012.
5. Pelleg, D. and Moore, A.: Accelerating exact k-means algorithms with geometric reasoning. In: Proc. 5th ACM SIGKDD Int. Conf. Knowl. Disc. Data Mining (KDD), pp. 277–281, 1999.
6. Kanungo, T., Mount, D.M., Netanyahu, N.S., Piatko, C.D., Silverman, R., and Wu, A.Y.: An efficient k-means clustering algorithm: Analysis and implementation. In: IEEE Trans. Patt. Anal. Mach. Intell., 24:881–892, 2002.
7. Elkan, C.: Using the triangle inequality to accelerate k-means. In: Proc. 20th Int. Conf. Mach. Learn. (ICML), pp. 147–153, 2003.
8. Hamerly, G.: Making k-means even faster. In: SIAM Int. Conf. Data Mining (SDM), pp. 130–140, 2010.
9. Drake, J. and Hamerly, G.: Accelerated k-means with adaptive distance bounds. In: 5th NIPS Work. Optim. Mach. Learn., pp. 579–587, 2012.
10. Drake, J.: Faster k-means clustering, 2013. Accessed online 19 August 2015.
11. Ding, Y., Zhao, Y., Shen, X., Musuvathi, M., and Mytkowicz, T.: Yinyang k-means: A drop-in replacement of the classic k-means with consistent speedup. In: Proc. 32nd Int. Conf. Mach. Learn. (ICML), pp. 579–587, 2015.
12. Ryšavý, P. and Hamerly, G.: Geometric methods to accelerate k-means algorithms. In: SIAM Int. Conf. Data Mining (SDM), pp. 324–332, 2016.
13. Bottesch, T., Bühler, T., and Kächele, M.: Speeding up k-means by approximating Euclidean distances via block vectors. In: Proc. 33rd Int. Conf. Mach. Learn. (ICML), New York, NY, USA, 2016.
14. Newling, J. and Fleuret, F.: Fast k-means with accurate bounds. In: Proc. 33rd Int. Conf. Mach. Learn. (ICML), New York, NY, USA, 2016.
15. Bache, K. and Lichman, M.: UCI machine learning repository, 2013. URL: http://archive.ics.uci.edu/ml/.
16. Joensuu: Clustering datasets homepage. URL: https://cs.joensuu.fi/sipu/datasets/
17. Rong-En, F.: LIBSVM homepage. URL: https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/
18. Arthur, D. and Vassilvitskii, S.: k-means++: The advantages of careful seeding. In: Proc. 18th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pp. 1027–1035. Society for Industrial and Applied Mathematics, 2007.

Full-text release date: 2024/02/13 (campus network)
Full text not authorized for public release (off-campus network)
Full text not authorized for public release (National Central Library: Taiwan NDLTD system)