
Graduate Student: 吳祚旭 (Tzuoh-shiuh Wu)
Thesis Title: 整合加權距離判斷於K-means分群演算法
The Integrated K-Means Clustering Algorithm with Weighting Distance Judgement
Advisor: 洪西進 (Shi-Jinn Horng)
Committee Members: 古鴻炎, 蔡鴻旭, 江季翰
Degree: Master
Department: College of Electrical Engineering and Computer Science - Department of Computer Science and Information Engineering
Publication Year: 2012
Graduation Academic Year: 100
Language: Chinese
Pages: 113
Chinese Keywords: algorithm, clustering algorithm, K-means algorithm, vector quantization (VQ), weighting distance judgment (WDJ), LBG algorithm
Foreign Keywords: clustering algorithm, K-means algorithm, vector quantization, weighting distance judgment, enhanced LBG algorithm, fast global K-means algorithm
    Clustering algorithms are applied in many fields of science and engineering. In computer science alone, their power and contribution to solving domain problems can be seen in machine learning, data compression, data mining, image analysis, speech coding, speech recognition, bioinformatics, and artificial intelligence.
    Among clustering algorithms, K-means is noted for its simple design, low time complexity, and long history of wide application. However, when the traditional K-means algorithm assigns data points to cluster centroids, the distance measure it uses is only the squared distance between a data point and a centroid; the number of data points already belonging to each cluster is not taken into account.
    This thesis therefore proposes an improvement: the number of data points belonging to each cluster is added as an extra weighting reference value for clustering. This weighting value revises the distance measure into a weighted distance measure, which is then incorporated appropriately into the cluster-assignment rule to raise clustering accuracy.
    We call this method Weighting Distance Judgment, abbreviated WDJ. Implementation results show that the traditional K-means algorithm integrated with WDJ performs better than the plain traditional K-means algorithm.
    Since K-means was first introduced, many researchers have proposed a variety of improved K-means algorithms. As long as such an improvement does not itself modify the distance measure, the WDJ proposed in this thesis should be able to be integrated with it.
    Experimental results show that the improved K-means algorithms selected in this thesis, once integrated with WDJ, likewise gain clustering accuracy. Hence the proposed WDJ not only improves the original K-means clustering algorithm but also raises the clustering accuracy of several improved K-means algorithms. Adding WDJ neither diminishes the strengths of the underlying K-means algorithm nor adds to its computational burden.
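    The idea described above can be sketched in code. The thesis defines its actual WD/WDJ formulas in Chapter 2, which are not reproduced in this abstract; the weighting used below (scaling the squared distance by a cluster-size factor), along with the function name, the `init` parameter, and all other specifics, are purely illustrative assumptions, not the author's method.

```python
import numpy as np

def weighted_kmeans(X, k, iters=100, init=None, seed=0):
    """K-means variant sketching the idea of weighting the distance
    measure by cluster membership counts. The weighting form here is
    hypothetical; the thesis defines its own WD/WDJ rule."""
    rng = np.random.default_rng(seed)
    if init is None:
        centroids = X[rng.choice(len(X), k, replace=False)].astype(float)
    else:
        centroids = init.astype(float)
    counts = np.ones(k)  # start with uniform, non-zero membership counts
    for _ in range(iters):
        # Squared Euclidean distance from every point to every centroid
        d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        # Hypothetical weighting: the squared distance is scaled by a
        # factor derived from each cluster's current point count
        wd = d2 * (counts / counts.sum() + 1.0)
        labels = wd.argmin(axis=1)
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        counts = np.bincount(labels, minlength=k).astype(float)
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids
```

    Because the weighting factor enters only the assignment step, any improved K-means that keeps the plain squared-distance assignment could swap in such a weighted measure, which is the integration property the abstract claims for WDJ.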


    Clustering algorithms are extensively applied in many fields of science and engineering. Taking computer science as an example, clustering shows its power and contribution in solving problems in machine learning, data compression, data mining, image analysis, speech coding, speech recognition, bioinformatics, and artificial intelligence.
    Among clustering algorithms, the K-means algorithm features a simple design and low time complexity, so it has been used widely and for a long time. In the traditional K-means algorithm, however, the distance measure used to assign data points to cluster centroids is simply the squared Euclidean distance between a data point and a centroid, without considering the data points that already belong to each cluster.
    Therefore, this thesis proposes an improved method. The number of data points belonging to each cluster is added as a weighting reference value during clustering. This weighting reference value revises the distance measure into a weighted distance measure, which is then incorporated into the cluster-assignment rule to improve accuracy. The method is named Weighting Distance Judgment (WDJ).
    Experimental results show that the original K-means algorithm integrated with WDJ performs better than the plain original K-means. In addition, other improved K-means algorithms integrated with WDJ also gain clustering accuracy. Hence the WDJ presented in this thesis can improve not only the original K-means algorithm but also other improved K-means algorithms.

    Chapter 1 Introduction ... 1
      1.1 Flow of the original K-means clustering algorithm ... 2
      1.2 Behavior of data points in the original K-means clustering algorithm ... 9
    Chapter 2 Problems of the K-means clustering algorithm and improvements ... 12
      2.1 Poor clustering of data points that already belong to a cluster ... 13
      2.2 Poor clustering of data points not yet belonging to a cluster ... 16
      2.3 Effect of inter-cluster distance and data-point counts on WD ... 21
      2.4 Simplification of WD (Weighting Distance) ... 29
      2.5 Deriving WDJ (Weighting Distance Judgment) from WD ... 33
        2.5.1 Using WD in the BPD (Bottom Up Design) algorithm ... 33
        2.5.2 Using WD in the TDD (Top Down Design) algorithm ... 34
        2.5.3 Problems of using WD in K-means ... 37
        2.5.4 Revising WD into WDJ in the TDD algorithm ... 38
    Chapter 3 Comparison and analysis ... 44
      3.1 Vector quantization ... 44
      3.2 LBG as the method for experimental comparison ... 47
      3.3 WDJ applied to the traditional K-means algorithm ... 50
      3.4 Improvements of the K-means algorithm ... 64
      3.5 WDJ applied to improved K-means algorithms ... 73
      3.6 Overall comparison ... 87
    Chapter 4 Conclusion ... 93
    References ... 99


    Full-text release date: 2017/07/31 (campus network)
    Full text not authorized for public release (off-campus network)
    Full text not authorized for public release (National Central Library: Taiwan NDLTD system)