
Graduate student: 廖建維
Jian-Wei Liao
Thesis title: 在資料串流上針對概念漂移與回復型類別問題的集成式學習方法
An Ensemble Learning Approach for Data Stream with Concept Drift and Recurring Class Problem
Advisor: 戴碧如
Bi-Ru Dai
Oral defense committee: 鮑興國
Hsing-Kuo Pao
蔡曉萍
Hsiao-Ping Tsai
戴志華
Chih-Hua Tai
Degree: Master
Department: College of Electrical Engineering and Computer Science - Department of Computer Science and Information Engineering
Year of publication: 2014
Academic year of graduation: 102 (ROC calendar, i.e. 2013-2014)
Language: English
Number of pages: 40
Keywords (Chinese): 概念飄移, 資料串流探勘, 回復型類別, 集成式分類器
Keywords (English): Concept drift, data stream mining, recurring class problem, ensemble classifier
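  • In recent years, concept drift has become an important topic in data mining for the analysis of non-stationary distributions. It arises especially in data streams, whose defining characteristic is that the data distribution changes over time. Concept drift is broadly divided into sudden and gradual drift. Because the type of drift is hard to determine, existing methods usually address only a single type and cannot handle all of them at once. In addition, data in a stream may give rise to the recurring class problem as the concept drifts. These are the issues we need to understand and overcome. We therefore propose a new weighting mechanism that adjusts the base classification models and the newly arriving training data, with the aim of adapting to new concepts more quickly. The results show that the proposed method not only adapts quickly to gradual concept drift but also effectively handles sudden concept drift and recurring classes.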


    Concept drift has recently become an important issue in data mining when analyzing data with non-stationary distributions. Data streams, for example, are characterized by data that vary over time, so concept drift is likely to occur in them. Concept drifts can be broadly categorized as sudden or gradual. Most existing research concentrates on a single type of concept drift, since the type of drift is usually difficult to identify. Moreover, concept drift can cause another important issue in data streams, the recurring class problem, which often occurs in non-stationary environments. For these reasons, we propose two new weighting mechanisms, one for the base models in the ensemble and one for newly arriving data, so that the ensemble adapts quickly to the current concept. The experimental results show that our method is not only stable on datasets with gradual concept drift but also flexible enough to adapt to sudden concept drift and the recurring class problem.

    Advisor's Recommendation Letter
    Oral Defense Committee Approval
    Abstract
    Abstract (in Chinese)
    Acknowledgments
    Table of Contents
    List of Figures
    List of Tables
    1. Introduction
        1.1 Background
        1.2 Motivation and Contribution
        1.3 Thesis Organization
    2. Related Works
        2.1 Single Classifier Methods
        2.2 Ensemble Methods
        2.3 Hybrid Ensemble Methods
    3. Proposed Method
        3.1 System Architecture
        3.2 Weighting Ensemble with Accuracy and Growth Rate
        3.3 Weighting Training instances with Mean Absolute Error
    4. Experiment Study
        4.1 Datasets
        4.2 Experimental Setup
        4.3 Experimental Results
            4.3.1 Varying the Chunk Size
            4.3.2 Varying the Ensemble Size
            4.3.3 Comparisons of Memory Usage and Processing Time
    5. Conclusion and Future Works
    Reference
    Authorization Letter


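    Full-text release date: 2019/07/28 (campus network)
    Full-text release date: full text not authorized for public release (off-campus network)
    Full-text release date: full text not authorized for public release (National Central Library: National Digital Library of Theses and Dissertations in Taiwan)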