
Graduate Student: Yi-Ren Yeh (葉倚任)
Thesis Title: Ranking Rules and Instances of Decision Trees (決策樹規則及資料點之排序)
Advisor: Yuh-Jye Lee (李育杰)
Committee Members: Yuan-chin Chang (張源俊), Chun-Nan Hsu (許鈞南), Kuo Kenneth Pao (鮑興國), Tien-Ruey Hsiang (項天瑞)
Degree: Master
Department: Department of Computer Science and Information Engineering, College of Electrical Engineering and Computer Science
Year of Publication: 2006
Graduation Academic Year: 94 (ROC calendar)
Language: English
Number of Pages: 36
Chinese Keywords: 決策樹、單層貝氏網路 (decision tree, naive Bayes)
Keywords: Decision Tree, Naive Bayes

Chinese Abstract (translated):
Traditionally, a decision tree estimates the probability that an instance belongs to a class from the class proportions at the leaf node it falls into, so every instance in the same leaf node receives an identical probability estimate. In this thesis, we propose a hierarchical framework that combines decision trees with Naïve Bayes to improve these instance-level probability estimates. The first stage of the framework ranks the rules (leaf nodes); the second stage uses Naïve Bayes to rank the instances within each leaf node. On the Naïve Bayes side, we also propose several techniques to improve its probability estimates. The experimental results show that our proposed method performs better than the other methods compared.
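The two-stage idea described above can be pictured with a short sketch. The snippet below is only a rough illustration under stated assumptions, not the thesis's actual algorithm: it ranks the leaves of a scikit-learn decision tree by a Laplace-corrected positive-class rate and breaks ties inside each leaf with a per-leaf Gaussian Naïve Bayes posterior. The dataset, the Gaussian model (standing in for the thesis's leaf-weighted Naïve Bayes), and the tie-breaking constant are all assumptions made for the example.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import roc_auc_score

# Hypothetical example data; the thesis evaluates on UCI benchmark datasets.
X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

tree = DecisionTreeClassifier(max_depth=4, random_state=0).fit(X_tr, y_tr)

# Stage 1: rank the rules (leaves) by a Laplace-corrected positive-class rate.
leaf_tr = tree.apply(X_tr)
leaf_score = {}
for leaf in np.unique(leaf_tr):
    mask = leaf_tr == leaf
    leaf_score[leaf] = (y_tr[mask].sum() + 1.0) / (mask.sum() + 2.0)

# Stage 2: within each leaf, fit a Naive Bayes model on that leaf's training
# instances (a Gaussian NB stands in for the thesis's leaf-weighted Naive Bayes).
leaf_nb = {}
for leaf in np.unique(leaf_tr):
    mask = leaf_tr == leaf
    if len(np.unique(y_tr[mask])) == 2:  # need both classes present to fit
        leaf_nb[leaf] = GaussianNB().fit(X_tr[mask], y_tr[mask])

def hierarchical_score(X_new):
    """Primary key: leaf rank; tie-breaker: within-leaf NB posterior."""
    leaves = tree.apply(X_new)
    scores = np.zeros(len(X_new))
    for i, leaf in enumerate(leaves):
        base = leaf_score[leaf]
        nb = leaf_nb.get(leaf)
        p = nb.predict_proba(X_new[i:i + 1])[0, 1] if nb is not None else base
        # Simple additive combination, purely for illustration: the small factor
        # keeps the leaf order as the primary key whenever leaf scores differ
        # by more than the tie-breaking term.
        scores[i] = base + 1e-3 * p
    return scores

print("AUC (hierarchical ranking):", roc_auc_score(y_te, hierarchical_score(X_te)))
print("AUC (plain decision tree) :", roc_auc_score(y_te, tree.predict_proba(X_te)[:, 1]))
```

The point of the sketch is the ordering scheme: leaves are compared first, and the Naïve Bayes score only reorders instances that share a leaf, so instances in the same leaf no longer receive identical ranks.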


English Abstract:
Traditionally, decision trees rank instances using the local probability estimates of their leaf nodes, so instances in the same leaf node receive equal probability estimates. In this thesis, we propose a hierarchical ranking strategy that combines decision trees with leaf-weighted Naïve Bayes to improve the local probability estimate for each leaf node. We first rank the rules (leaf nodes) by their importance, and then rank the instances that satisfy each rule. Because the probability estimates produced by Naïve Bayes can be poor, we also investigate several techniques that have been proposed to modify Naïve Bayes. Experiments show that our proposed method performs significantly better than the other methods according to a paired t-test. All results are evaluated with AUC (area under the ROC curve) rather than classification accuracy.
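The "leaf-weighted Naïve Bayes" mentioned above can be read as a Naïve Bayes score whose per-attribute log-likelihood terms carry weights. The sketch below is illustrative only and does not reproduce the thesis's weighting scheme: the uniform weights, the Laplace smoothing constant, and the toy data are placeholders.

```python
import numpy as np

def fit_weighted_nb(X, y, n_values, alpha=1.0):
    """Fit class priors and Laplace-smoothed Pr(a_j = v | c) for
    integer-coded categorical attributes. X: (n, d) ints, y: class labels."""
    classes = np.unique(y)
    priors = {c: (np.sum(y == c) + alpha) / (len(y) + alpha * len(classes))
              for c in classes}
    cond = {}
    for c in classes:
        Xc = X[y == c]
        for j in range(X.shape[1]):
            counts = np.bincount(Xc[:, j], minlength=n_values[j])
            cond[(j, c)] = (counts + alpha) / (len(Xc) + alpha * n_values[j])
    return classes, priors, cond

def weighted_nb_score(x, c, priors, cond, weights):
    """log Pr(c) + sum_j w_j * log Pr(a_j = x_j | c)."""
    s = np.log(priors[c])
    for j, v in enumerate(x):
        s += weights[j] * np.log(cond[(j, c)][v])
    return s

# Toy usage with made-up data: two binary attributes, uniform weights.
X = np.array([[0, 1], [1, 1], [0, 0], [1, 0], [0, 1]])
y = np.array([1, 1, 0, 0, 1])
classes, priors, cond = fit_weighted_nb(X, y, n_values=[2, 2])
w = np.ones(X.shape[1])  # placeholder attribute weights; not the thesis's scheme
print({int(c): weighted_nb_score(np.array([0, 1]), c, priors, cond, w) for c in classes})
```

With uniform weights this reduces to the conventional Naïve Bayes score; non-uniform weights let some attributes influence the ranking more than others, which is the general direction the weighted model takes.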

Table of Contents:
1 Introduction
2 Related Work
3 Decision Trees and Naïve Bayes Models
  3.1 Decision Trees Model
    3.1.1 Generating Decision Trees
    3.1.2 Incorporating Numeric Attributes
    3.1.3 Pruning Strategies
  3.2 Naïve Bayes Model
    3.2.1 Conventional Naïve Bayes Model
    3.2.2 Weighted Naïve Bayes Model
4 Hierarchical Ranking of Decision Trees
  4.1 Ranking Rules and Ranking Instances in a Leaf Node
  4.2 Leaf Naïve Bayes Classifier
  4.3 Estimating Pr(a_i | c)
  4.4 Using Weighted Naïve Bayes
5 Numerical Results and Comparisons
  5.1 Comparing Our Proposed Method with Other Methods
  5.2 Parameters Tuning
6 Conclusion and Future Work

