簡易檢索 / 詳目顯示

研究生: 劉原呈
Yuan-Cheng Liu
論文名稱: 自適應並行近鄰傳遞演算法研究
A Study of Adaptive Map/Reduce Affinity Propagation Algorithm
指導教授: 吳怡樂
Yi-Leh Wu
口試委員: 唐政元
none
陳建中
none
閻立剛
none
學位類別: 碩士
Master
系所名稱: 電資學院 - 資訊工程系
Department of Computer Science and Information Engineering
論文出版年: 2014
畢業學年度: 102
語文別: 英文
論文頁數: 37
中文關鍵詞: Affinity PropagationMap/ReduceHadoop分群演算法
外文關鍵詞: Affinity propagation, Map/Reduce, Hadoop, clustering algorithm
相關次數: 點閱:240下載:0
分享至:
查詢本校圖書館目錄 查詢臺灣博碩士論文知識加值系統 勘誤回報
  • Affinity Propagation演算法是一個基於資料點與資料點間互相進行消息傳遞的演算法。其特點是不需要在分群前知道分群的數量。演算法會利用資料給予的資訊進行分群並找出適當的群數。在較早之前的研究有利用Hadoop,一個雲端運算的環境來實現Map/Reduce Affinity Propagation。但其有一個限制,即是很難知道其中一個參數值preference,這個參數跟分群結果的好壞息息相關。另外還有一篇研究利用Adaptive Affinity Propagation Clustering來解決此問題。本篇論文中,我們利用這兩種方法來改善參數值的問題,藉由Adaptive Map/Reduce Affinity Propagation,在多台電腦上同時運算,並且找到適當的preference值來得到較好的分群結果。

    在實驗方面,我們用Adaptive Map/Reduce Affinity Propagation的架構,進行資料分群的實驗,並將實驗結果與先前提到的分群方法比較。實驗結果在資料的分群上,Adaptive Map/Reduce Affinity Propagation於準確度、Davies–Bouldin index(DBI)以及Dunn index(DI)上都有不錯的結果。


    The Affinity Propagation (AP) is a clustering algorithm based on the concept of “message passing” between data points. Unlike most clustering algorithms such as k-means, the AP does not require the number of clusters to be determined or estimated before running the algorithm. There are implementation of AP on Hadoop, a distribute cloud environment, called the Map/Reduce Affinity Propagation (MRAP). But the MRAP has a limitation: it is hard to know what value of parameter “preference” can yield an optimal clustering solution. The Adaptive Affinity Propagation Clustering (AAP) algorithm was proposed to overcome this limitation to decide the preference value in AP. In this study, we propose to combine these two methods as the Adaptive Map/Reduce Affinity Propagation (AMRAP), which divides the clustering task to multiple mappers and one reducer in Hadoop, and decides suitable preference values individually for each mapper.
    In the experiments, we compare the clustering results of the proposed AMRAP with the original MRAP method. The experiment results support that the proposed AMRAP method outperforms the original MRAP method in terms of accuracy, Davies–Bouldin index and Dunn Index.
    In the experiments, we compare the clustering result of the proposed AMRAP with the MRAP method. The experiment results support that the proposed AMRAP method has good performance at accuracy, Davies–Bouldin index and Dunn Index.

    論文摘要 I ABSTRACT II CONTENTS III LIST OF FIGURES IV LIST OF TABLES V CHAPTER 1. INTRODUCTION 1 CHAPTER 2. APACHE HADOOP 4 2.1 HADOOP DISTRIBUTED FILE SYSTEM 5 2.2 JOBTRACKER AND TASKTRACKER: THE MAP/REDUCE ENGINE 5 CHAPTER 3. ALGORITHM OF AFFINITY PROPAGATION 6 3.1 AFFINITY PROPAGATION 6 3.2 ADAPTIVE AFFINITY PROPAGATION 8 3.3 MAP/REDUCE AFFINITY PROPAGATION 10 3.4 ADAPTIVE MAP/REDUCE AFFINITY PROPAGATION 12 CHAPTER 4. EVALUATION AND EXPERIMENTS 16 4.1 EVALUATION 17 4.2 RUNNING TIME TEST 11 4.3 DAVIES-BOULDIN INDEX RESULT 22 4.4 ACCURACY RESULT 23 4.5 DUNN INDEX RESULT 28 CHAPTER 5. CONCLUSIONS 29 REFERENCES 30

    [1] B. J. Frey and D. Dueck, “Clustering by Passing Messages Between Data Points”, Science 315, Pages 972–976, February 2007.
    [2] Y.C. He, Q.C. Chen, X.L. Wang, R.F. Xu, X.H. Bai, and X.J. Meng, “An adaptive affinity propagation document clustering”, Informatics and Systems (INFOS), 2010 The 7th International Conference, Pages 1-7, March 2010.
    [3] J. Dean and S. Ghemawat, “MapReduce: simplified data processing on large clusters”, Communications of the ACM - 50th anniversary issue: 1958 - 2008, Volume 51 Issue 1, Pages 107-113, January 2008.
    [4] Hadoop Releases, http://hHadoop.apache.org, referenced on March 1st, 2013.
    [5] M. A. Bhandarkar, “MapReduce programming with apache Hadoop”, International Parallel and Distributed Processing Symposium/International Parallel Processing Symposium - IPDPS(IPPS), Pages 19-23, April 2010.
    [6] M. Maurya and S. Mahajan, “Performance analysis of MapReduce programs on Hadoop cluster”, Information and Communication Technologies (WICT), Pages 505-510, 2012.
    [7] N. Lynch, “Distributed Algorithms” San Francisco, CA: Morgan Kaufmann Publishers. ISBN 978-1-55860-348-6, 1996.
    [8] K. Wang, J. Zhang, Dan Li, X. Zhang, T. Guo, “Adaptive Affinity Propagation Clustering”, Acta Automatica Sinica, 33(12):1242-1246, 2007.
    [9] C-Y Chu, “An improved affinity propagation clustering algorithm for map/reduce environment”, Master Thesis, National Taiwan University of Science and Technology, Taiwan, June 2013.
    [10] S. Babu, “Towards automatic optimization of MapReduce programs”, SoCC Proceedings of the 1st ACM symposium on Cloud computing, Pages 137-142. 2010.
    [11] Y. Zhu, J. Yu, C.Y. Jia, “Initializing K-means Clustering Using Affinity Propagation”, Ninth International Conference on Hybrid Intelligent Systems, August 2010.
    [12] “UCI Machine Learning Repository”, http://archive.ics.uci.edu/ml/, referenced on March 1st, 2013.
    [13] “Statlog Datasets: comparison of results”, http://www.is.umk.pl/projects/datasets-stat.html, referenced on March 1st, 2013.
    [14] “The Yale Face Database”, http://cvc.yale.edu/projects/yalefaces/yalefaces.html , referenced on March 1st, 2013.
    [15] D. L. Davies and D. W. Bouldin, “A Cluster Separation Measure”, IEEE Transactions on Pattern Analysis and Machine Intelligence. PAMI-1 (2): 224–227, 1979.
    [16] D.M.W. Powers, "Evaluation: From Precision, Recall and F-Factor to ROC, Informedness, Markedness & Correlation," vol. 2, no. 1, 2011, pp. 37-63, 2011.
    [17] J. C. Dunn, “A Fuzzy Relative of the ISODATA Process and Its Use in Detecting Compact Well-Separated Clusters”, Journal of Cybernetics 3 (3): 32–57, 1973.
    [18] P.J. Rousseeuw, Silhouettes: "a graphical aid to the interpretation and validation of cluster analysis", Computational and Applied Mathematics, (20), 53-65, 1987.
    [19] S. Dudoit, J. Fridlyand. "A prediction-based resampling method for estimating the number of clusters in a dataset". Genome Biology, 3(7): 0036.1-0036.21, 2002.

    無法下載圖示 全文公開日期 2019/06/30 (校內網路)
    全文公開日期 本全文未授權公開 (校外網路)
    全文公開日期 本全文未授權公開 (國家圖書館:臺灣博碩士論文系統)
    QR CODE