
Graduate student: 王信凱 (Sin-Kai Wang)
Thesis title: Classification on the Imbalanced Data Stream with Concept Drifts Using a G-means Update Ensemble Approach
Advisor: 戴碧如 (Bi-Ru Dai)
Oral defense committee: 徐國偉, 戴志華, 吳怡樂
Degree: Master's
Department: College of Electrical Engineering and Computer Science - Department of Computer Science and Information Engineering
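Year of publication: 2016
Academic year of graduation: 104 (ROC calendar, i.e., the 2015-2016 academic year)
Language: English
Number of pages: 41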
Keywords: Data stream mining, Concept drift, Ensemble classifier, Class imbalance problem



In recent years, concept drift has become an important issue in analyzing data streams, whose distributions are non-stationary. Data streams can also have skewed class distributions, a problem known as class imbalance. In real-world applications, a data stream may exhibit multiple concept drifts and an imbalanced class distribution at the same time. However, most existing approaches do not address class imbalance and concept drift together. As a result, they may achieve good overall average accuracy while their accuracy on the minority class is very poor. To address these challenges, this thesis proposes a new weighting method for ensemble classifiers that further improves minority-class accuracy on imbalanced data streams with concept drifts. Experimental results show that the method not only achieves good average accuracy but also improves minority-class accuracy on imbalanced data streams.
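A small Python sketch can make concrete the point that high overall accuracy can hide poor minority-class performance. It computes accuracy and the geometric mean (G-mean) of per-class recalls, a common imbalance-aware metric. It then combines two toy ensemble members, weighting each by one of the two metrics as measured on the latest labelled chunk. The weighting scheme, the function names (`weighted_vote`, `always_majority`, `threshold_85`) and the toy data are hypothetical illustrations. They are not the thesis's G-means update method, whose details this record does not give.

```python
import math
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def per_class_recall(y_true, y_pred):
    """Recall (true-positive rate) of every class that occurs in y_true."""
    support = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {c: hits[c] / n for c, n in support.items()}


def g_mean(y_true, y_pred):
    """Geometric mean of per-class recalls; it drops to 0 as soon as
    any class is never predicted correctly."""
    recalls = list(per_class_recall(y_true, y_pred).values())
    return math.prod(recalls) ** (1 / len(recalls))


def weighted_vote(members, x, chunk_X, chunk_y, metric):
    """Hypothetical combiner: each member votes for its prediction on x,
    weighted by metric() measured on the latest labelled chunk."""
    scores = Counter()
    for predict in members:
        weight = metric(chunk_y, [predict(xi) for xi in chunk_X])
        scores[predict(x)] += weight
    return scores.most_common(1)[0][0]


# Toy chunk: 95 majority examples (class 0) and 5 minority examples (class 1).
chunk_X = list(range(100))
chunk_y = [1 if x >= 95 else 0 for x in chunk_X]


def always_majority(x):
    """Toy member that ignores the minority class entirely."""
    return 0


def threshold_85(x):
    """Toy member that catches every minority example but has false alarms."""
    return 1 if x >= 85 else 0


members = [always_majority, threshold_85]
for clf in members:
    preds = [clf(x) for x in chunk_X]
    print(clf.__name__, accuracy(chunk_y, preds), round(g_mean(chunk_y, preds), 3))
# always_majority 0.95 0.0
# threshold_85 0.9 0.946

print(weighted_vote(members, 97, chunk_X, chunk_y, accuracy))  # 0 (majority class)
print(weighted_vote(members, 97, chunk_X, chunk_y, g_mean))    # 1 (minority class)
```

With accuracy as the weight, the always-majority member dominates and the minority example at `x = 97` is misclassified. With G-mean as the weight, that member gets zero weight and the minority prediction wins. A real stream ensemble would also train, update or replace members as new chunks arrive and drifts occur, which this sketch omits.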

Table of Contents:
Advisor's Recommendation Letter
Oral Defense Committee Approval
Abstract
Abstract (Chinese)
Acknowledgements
Table of Contents
List of Tables
List of Figures
1. Introduction
  1.1 Background
  1.2 Motivation and Contribution
  1.3 Thesis Organization
2. Related Works
3. Proposed Method
  3.1 System Architecture
  3.2 Weighting mechanism of the ensemble
  3.3 Dealing with the class imbalance in the data streams
4. Experiment Study
  4.1 Experimental setup and datasets
  4.2 Experimental Results
    4.2.1 Sensitivity Analysis
    4.2.2 Discussions on the classification accuracy
    4.2.3 Discussions on the F-measure
    4.2.4 Discussions on the imbalanced multiclass datasets
    4.2.5 Discussions on the execution time
5. Conclusion and Future Works
6. Reference


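Full-text release date (campus network): 2021/08/24
Full-text release date (off-campus network): not authorized for public release
Full-text release date (National Central Library: National Digital Library of Theses and Dissertations in Taiwan): not authorized for public release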