
Author: 董柏均 (Po-Chun Tung)
Thesis title: 以無監督式學習為核心的分散式發佈訂閱系統設計 (Exploiting Unsupervised Learning in Publish/Subscribe System Design)
Advisors: 陳秋華 (Chyouhwa Chen), 鄧惟中 (Wei-Chung Teng)
Committee members: 李育杰 (Yuh-Jye Lee), 鮑興國 (Hsing-Kuo Pao)
Degree: Master
Department: College of Electrical Engineering and Computer Science, Department of Computer Science and Information Engineering
Year of publication: 2013
Academic year of graduation: 101
Language: Chinese
Pages: 52
Keywords: publish/subscribe services, subscription, published event, structured P2P Networks, unsupervised machine learning

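Existing subscription clustering techniques in semantic publish/subscribe systems group subscriptions by analyzing the linguistic and syntactic similarities and differences among them. However, syntactic similarity between subscriptions is hard to define, and the relationships among subscriptions are complex and variable, so the resulting clusters are of poor quality. In this project we propose to use unsupervised learning as the design core of a publish/subscribe system, and finally to examine the issues that must be considered when such a system runs in a distributed environment. The project addresses three topics.
First, unsupervised learning is used as the subscription clustering method. We study the ability of unsupervised learning techniques to discover implicit community relationships among users, and use it to cluster subscriptions. Clustering subscriptions this way avoids the problem of analyzing syntactic similarity between subscriptions and uncovers implicit relationships among users more effectively.
Second, we study strategies for event clustering and batch processing. Published events are collected into a batch, clustered with unsupervised learning techniques, and then delivered by data broadcast. Because popular events in social phenomena follow a Zipf distribution, the events destined for these subscribers can be delivered together in a single packet, which can substantially reduce network resource usage.
Third, we study unsupervised learning techniques in a distributed publish/subscribe architecture. We build the results of the first two topics on a distributed unsupervised learning environment and examine how subscription clustering and batch event processing affect performance in a distributed architecture.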
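The batching idea in the second topic can be illustrated with a minimal sketch. It is not the thesis implementation: the topic names, the `cluster_of` mapping, and the function `batch_events` are hypothetical, and the event-to-cluster mapping is assumed to have been produced already by some clustering step.

```python
from collections import defaultdict
from typing import Callable, Dict, Hashable, Iterable, List


def batch_events(
    events: Iterable[str],
    cluster_of: Callable[[str], Hashable],
) -> Dict[Hashable, List[str]]:
    """Group a batch of published events by the cluster they belong to.

    Each resulting group stands for one packet sent to that cluster.
    """
    batches: Dict[Hashable, List[str]] = defaultdict(list)
    for event in events:
        batches[cluster_of(event)].append(event)
    return dict(batches)


# Hypothetical batch with skewed popularity: a few topics dominate.
events = ["sports", "sports", "sports", "sports", "news", "news", "weather"]
# Hypothetical event-to-cluster mapping (assumed output of a clustering step).
cluster_of = {"sports": "A", "news": "A", "weather": "B"}.get

packets = batch_events(events, cluster_of)
unbatched_sends = len(events)  # one send per event
batched_sends = len(packets)   # one send per cluster
```

In this toy batch, seven individual sends collapse into two packets. The saving grows with the share of events that fall into a few popular clusters, which is the situation a Zipf-like popularity distribution produces.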


Existing publish/subscribe systems make extensive use of subscription clustering as a core technology to reduce operating costs in subscription storage, network transmission, and event matching. However, current subscription clustering is based on syntactic analysis of the similarity between subscriptions, which leads to an exponential explosion of cases and makes reasoning difficult.
In this project, we propose to study the use of unsupervised learning as the core of publish/subscribe system design. Specifically, the technology is used for subscription clustering, multiple event clustering, and batch delivery of clustered events. In this manner, the storage and delivery costs of the system are greatly reduced. The specific topics we propose to study in this project are as follows:
1. We propose to study issues related to the use of unsupervised learning techniques for subscription clustering. Using a set of training events, the subscriptions are partitioned into clusters with similar interests. Crucial to the success of the approach are the selection of training events and the summary representation of the partitioned clusters, since a summary of all the subscriptions must be distributed to all the nodes in the network.
2. The unsupervised learning techniques are similarly applied to the partitioning of published events. We also propose to construct, in a distributed manner, a multicast tree for each cluster, so that future events destined for each cluster can be delivered efficiently. The clustered events are then broadcast over the per-cluster multicast trees to reduce network overhead.
3. Finally, as publish/subscribe systems operate in a distributed setting, the unsupervised learning component must work in a distributed manner as well. Fortunately, several previous works have studied distributed approaches to clustering. We intend to investigate the suitability of those approaches for our distributed publish/subscribe architecture.
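As an illustration of topic 1, the following minimal sketch clusters subscriptions by which training events they match, rather than by their syntax. It is an assumption-laden example, not the thesis's algorithm: subscriptions are modelled as hypothetical predicates over a `price` attribute, the distance is Hamming distance between match vectors, and the clustering is a plain centralized k-medoids iteration with a naive deterministic initialisation.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Event = Dict[str, float]
Subscription = Callable[[Event], bool]


def match_vector(sub: Subscription, training: Sequence[Event]) -> Tuple[int, ...]:
    """Describe a subscription by which training events it matches."""
    return tuple(1 if sub(e) else 0 for e in training)


def hamming(a: Sequence[int], b: Sequence[int]) -> int:
    return sum(x != y for x, y in zip(a, b))


def assign(vectors: Sequence[Sequence[int]], medoids: List[int]) -> List[int]:
    """Assign every vector to the index of its nearest medoid."""
    return [
        min(range(len(medoids)), key=lambda c: hamming(v, vectors[medoids[c]]))
        for v in vectors
    ]


def k_medoids(vectors: Sequence[Sequence[int]], k: int, max_iter: int = 20):
    medoids = list(range(k))  # naive initialisation: the first k vectors
    for _ in range(max_iter):
        labels = assign(vectors, medoids)
        new_medoids = []
        for c in range(k):
            members = [i for i, lab in enumerate(labels) if lab == c]
            if not members:  # keep the old medoid for an empty cluster
                new_medoids.append(medoids[c])
                continue
            # New medoid: the member with the smallest total distance to the rest.
            new_medoids.append(
                min(members, key=lambda i: sum(hamming(vectors[i], vectors[j]) for j in members))
            )
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return medoids, assign(vectors, medoids)


# Hypothetical training events and subscriptions.
training = [{"price": p} for p in (5, 11, 50, 95, 150)]
subs: List[Subscription] = [
    lambda e: e["price"] < 10,
    lambda e: e["price"] < 12,
    lambda e: e["price"] > 100,
    lambda e: e["price"] > 90,
]
vectors = [match_vector(s, training) for s in subs]
medoids, labels = k_medoids(vectors, k=2)
```

Here the two low-price subscriptions end up in one cluster and the two high-price subscriptions in the other, without any comparison of the predicates themselves. The medoids double as a compact cluster summary, since each is an actual subscription that could be distributed to other nodes.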

Acknowledgments
Abstract (Chinese)
Abstract
Table of Contents
List of Figures and Tables
1 Introduction
1.1 Publish/Subscribe services
1.2 Chord Structured Overlay Network
1.3 The Ferry Publish/Subscribe System
2 A Pub/Sub Design that Exploits Unsupervised Learning
2.1 Cluster Similar subscriptions into clusters
2.2 Broadcast to find global top-k medoids
2.3 Create multicast tree for each cluster
2.4 Batch Event Delivery
3 Performance Evaluation
4 Conclusion and Future Work
5 Reference


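Full-text release date: 2018/08/05 (campus network)
Full-text release date: full text not authorized for public release (off-campus network)
Full-text release date: full text not authorized for public release (National Central Library: National Digital Library of Theses and Dissertations in Taiwan)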