Graduate student: | 王信凱 Sin-Kai Wang |
---|---|
Thesis title: | Classification on the Imbalanced Data Stream with Concept Drifts Using a G-means Update Ensemble Approach (在類別不平衡的資料串流上針對概念漂移問題的幾何平均更新集成式學習方法) |
Advisor: | 戴碧如 Bi-Ru Dai |
Oral defense committee: | 徐國偉, 戴志華, 吳怡樂 |
Degree: | Master's |
Department: | College of Electrical Engineering and Computer Science, Department of Computer Science and Information Engineering |
Publication year: | 2016 |
Graduation academic year: | 104 (ROC calendar) |
Language: | English |
Pages: | 41 |
Keywords: | Data stream mining, concept drift, ensemble classifier, class imbalance problem |
In recent years, concept drift has become an important issue in data mining when analyzing data streams with non-stationary distributions. Data streams can also have skewed class distributions, a problem known as class imbalance. In real-world applications, a data stream may exhibit multiple concept drifts and an imbalanced class distribution at the same time. However, most existing approaches do not address class imbalance and concept drift simultaneously. As a result, they may achieve good overall average accuracy while performing very poorly on the minority class. To address these challenges, this thesis proposes a new weighting method that further improves minority-class accuracy on imbalanced data streams with concept drifts. Experimental results show that the method achieves good average accuracy and also improves minority-class accuracy on imbalanced data streams.
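The abstract contrasts overall average accuracy with minority-class accuracy, and the title refers to a G-means-based update. The Python sketch below is not the thesis's algorithm. It illustrates, under stated assumptions, two points:

- G-mean, the geometric mean of per-class recalls, exposes a classifier that ignores the minority class even when its accuracy looks high.
- An ensemble could weight its members by G-mean on the most recently labeled chunk.

The chunk-based setup, the helpers `gmean_weights` and `weighted_vote`, the `eps` floor, and the toy classifiers are all hypothetical.

```python
import math
from collections import Counter
from typing import Callable, List, Sequence

# A classifier here is just a function from a feature vector to a class label.
Classifier = Callable[[Sequence[float]], int]


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Overall accuracy: fraction of examples whose label is predicted correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def g_mean(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Geometric mean of per-class recalls over the classes present in y_true.

    For two classes this is sqrt(sensitivity * specificity); it drops to 0
    whenever any class present in the chunk is never predicted correctly.
    """
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    recalls = [hits[c] / totals[c] for c in totals]
    return math.prod(recalls) ** (1.0 / len(recalls))


def gmean_weights(members: List[Classifier], X_chunk, y_chunk,
                  eps: float = 1e-6) -> List[float]:
    """Hypothetical scheme: weight each member by its G-mean on the latest chunk.

    eps keeps every weight positive so a member is down-weighted, not silenced.
    """
    return [g_mean(y_chunk, [clf(x) for x in X_chunk]) + eps for clf in members]


def weighted_vote(members: List[Classifier], weights: List[float],
                  x: Sequence[float]) -> int:
    """Return the label with the largest summed weight among member predictions."""
    scores: Counter = Counter()
    for clf, w in zip(members, weights):
        scores[clf(x)] += w
    return scores.most_common(1)[0][0]


# Toy chunk: 95 majority-class (0) examples and 5 minority-class (1) examples.
X = [[0.1]] * 95 + [[0.9]] * 5
y = [0] * 95 + [1] * 5


def always_majority(x: Sequence[float]) -> int:
    return 0


def threshold(x: Sequence[float]) -> int:
    return 1 if x[0] > 0.5 else 0


members = [always_majority, threshold]
for clf in members:
    preds = [clf(x) for x in X]
    print(clf.__name__, accuracy(y, preds), g_mean(y, preds))
# always_majority 0.95 0.0
# threshold 1.0 1.0
```

The member that always predicts the majority class reaches 0.95 accuracy but has a G-mean of 0. Under this hypothetical weighting it therefore contributes almost nothing to the vote. The update rule actually used in the thesis may differ from this sketch.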