Author: Hsin-Chih Chung (鍾興志)
Thesis Title: MDL-based Model Trees for Classification of Hybrid Type Data
Advisor: Hsing-Kuo Pao (鮑興國)
Committee Members: Tyng-Luh Liu (劉庭祿), Yuan-Chin Chang (張源俊), Yuh-Jye Lee (李育杰), Bi-Ru Dai (戴碧如)
Degree: Master
Department: Department of Computer Science and Information Engineering, College of Electrical Engineering and Computer Science
Year of Publication: 2007
Graduation Academic Year: 95
Language: English
Pages: 55
Keywords: model tree, minimum description length, decision tree, support vector machine

We propose a model-selection method for datasets of hybrid types, that is, datasets that contain both nominal and numeric attributes. Motivated by the effectiveness of decision trees on nominal data and the success of support vector machines on numeric data, we propose a model tree that combines both models. We derive a synthesized Boolean attribute from the classification produced by an SVM applied only to the numeric attributes. The SVM-synthesized attribute, together with all of the nominal attributes, is then used for decision tree induction, specifically the ID3 algorithm, which selects the "best" attribute according to some goodness criterion. The concept of a model tree is not new. Unlike the model tree proposed by Chang et al. in 2004, we aim to improve performance through a Minimum Description Length (MDL) approach. The MDL principle is adopted to balance the choice between the SVM-synthesized attribute and a discrete attribute by also taking their model complexity into account. That is, an SVM is considered a more complex model than a simple discrete classifier (such as "education = Master or Ph.D."), so a larger penalty should be charged to an SVM classifier than to a discrete classifier when selecting the best attribute during tree induction. The penalty takes a form whose one-dimensional case coincides with the one proposed by Quinlan in 1996 for a simple numeric classifier (such as "age >= 22"). Our experiments show that this modification improves prediction accuracy on many real-world datasets.
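
For concreteness, the attribute-selection step described in the abstract can be sketched as follows. This is a minimal illustration, not the thesis's implementation: it assumes scikit-learn's SVC in place of the smooth SVM toolbox [26] used in the thesis, and the flat svm_penalty_bits parameter is a hypothetical stand-in for the SVM coding-length estimate derived in Chapter 4.

# Sketch of MDL-penalized attribute selection in a hybrid model tree.
# Assumes scikit-learn; `svm_penalty_bits` is an illustrative stand-in
# for the actual coding length of the SVM model.
import numpy as np
from collections import Counter
from sklearn.svm import SVC

def entropy(y):
    # Empirical entropy of the labels, in bits.
    p = np.array(list(Counter(y).values()), dtype=float)
    p /= p.sum()
    return float(-(p * np.log2(p)).sum())

def info_gain(y, attr):
    # Information gain of splitting the labels by a discrete attribute.
    gain = entropy(y)
    for v in np.unique(attr):
        mask = attr == v
        gain -= mask.mean() * entropy(y[mask])
    return gain

def select_attribute(y, nominal, X_numeric, svm_penalty_bits=8.0):
    y = np.asarray(y)
    # Score every nominal attribute; a simple discrete test is assumed
    # to cost (almost) nothing to encode, so no penalty is subtracted.
    best_name = max(nominal, key=lambda n: info_gain(y, np.asarray(nominal[n])))
    best_score = info_gain(y, np.asarray(nominal[best_name]))

    # Synthesize a Boolean attribute from an SVM trained on the numeric
    # attributes only, then charge it a per-example MDL penalty for its
    # greater model complexity.
    synth = SVC(kernel="rbf").fit(X_numeric, y).predict(X_numeric)
    svm_score = info_gain(y, synth) - svm_penalty_bits / len(y)

    return "svm_synthesized" if svm_score > best_score else best_name

In the one-dimensional case, the penalty the abstract refers to reduces to roughly Quinlan's 1996 correction [43], which subtracts log2(N-1)/|D| from the gain of a threshold test on a numeric attribute, where N is the number of candidate thresholds and |D| the number of training cases.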

1 Introduction
1.1 Motivation
1.2 Proposed Method
1.3 Thesis Outline
2 Classification of Hybrid Type Data
2.1 Decision Trees
2.1.1 Constructing a Decision Tree
2.1.2 Incorporating Continuous-Valued Attributes
2.1.3 Univariate and Multivariate Decision Trees
2.1.4 Pruning Strategies
2.2 Support Vector Machines
2.2.1 Conventional Support Vector Machines
2.2.2 Incorporating Nominal-Valued Attributes
3 Model Trees
3.1 The Concept of Model Trees
3.2 Building Model Trees
4 MDL-based Model Trees
4.1 Minimum Description Length Principle
4.2 Estimate the coding length of Support Vector Machines
5 Experiments
5.1 Dataset Descriptions
5.2 Numerical Results and Comparisons
6 Conclusion and future work
6.1 Conclusions
6.2 Future work

[1] K. Bennett and J. Blue. A support vector machine approach to decision trees. 1997.
[2] D.P. Bertsekas. Nonlinear Programming. Athena Scientific, 1999.
[3] L. Breiman. Bias-variance, regularization, instability and stabilization. In C.M. Bishop, editor, Neural Networks and Machine Learning, pages 27–56. Springer.
[4] L. Breiman, J.H. Friedman, R.A. Olshen, and C.J. Stone. Classification and Regression Trees. Wadsworth, 1984.
[5] C.E. Brodley and P.E. Utgoff. Multivariate decision trees. Machine Learning, 19(1):45–77, 1995.
[6] C.J.C. Burges. A tutorial on support vector machines for pattern recognition. Data Mining and Knowledge Discovery, 2(2):121–167, 1998.
[7] J. Catlett. Megainduction: Machine learning on very large databases. PhD thesis, University of Sydney, 1991.
[8] V. Cherkassky and F. Mulier. Learning From Data: Concepts, Theory and Methods. John Wiley & Sons, 1998.
[9] R. Courant and D. Hilbert. Methods of Mathematical Physics. Interscience Publishers, 1953.
[10] N. Cristianini and J. Shawe-Taylor. An Introduction to Support Vector Machines. Cambridge University Press, 2000.
[11] J. Dougherty, R. Kohavi, and M. Sahami. Supervised and unsupervised discretization of continuous features. In Proceedings of the 12th International Conference on Machine Learning, pages 194–202, 1995.
[12] M.H. Dunham. Data Mining: Introductory and Advanced Topics. Prentice Hall, 2003.
[13] E. Alpaydin. Introduction to Machine Learning. MIT Press, 2004.
[14] U.M. Fayyad and K.B. Irani. Multi-interval discretization of continuous-valued attributes for classification learning. In Proceedings of the 13th International Joint Conference on Artificial Intelligence, pages 1022–1029, 1993.
[15] J. Gama. Functional trees for classification. In Proceedings of the 2001 IEEE International Conference on Data Mining, pages 147–154, 2001.
[16] J. Gama, L. Torgo, and C. Soares. Dynamic discretization of continuous attributes. In Proceedings of the Ibero-American Conference on AI, pages 160–169, 1998.
[17] P. Grunwald. A tutorial introduction to the minimum description length principle. In P. Grunwald, J. Myung, and M.A. Pitt, editors, Advances in Minimum Description Length: Theory and Applications, 2004.
[18] T. Hastie, R. Tibshirani, and J. Friedman. The Elements of Statistical Learning. Springer-Verlag, 2001.
[19] D. Heath, S. Kasif, and S. Salzberg. Induction of oblique decision trees. In Proceedings of the 13th International Joint Conference on Artificial Intelligence, pages 1002–1007, 1993.
[20] E.B. Hunt, J. Marin, and P.J. Stone. Experiments in Induction. Academic Press, 1966.
[21] L. Hyafil and R.L. Rivest. Constructing optimal binary decision trees is NP-complete. Information Processing Letters, 5(1):15–17, 1976.
[22] T. Joachims. Learning to Classify Text Using Support Vector Machines: Methods, Theory, and Algorithms. Kluwer Academic Publishers, 2002.
[23] M. Kantardzic. Data Mining: Concepts, Models, Methods, and Algorithms. Wiley-IEEE Press, 2002.
[24] R. Kohavi and M. Sahami. Error-based and entropy-based discretization of continuous features. In Proceedings of the Second International Conference on Knowledge Discovery and Data Mining, pages 114–119, 1996.
[25] I. Kononenko, I. Bratko, and E. Roskar. ASSISTANT 86: A knowledge-elicitation tool for sophisticated users. In I. Bratko and N. Lavrac, editors, Progress in Machine Learning. Sigma Press, 1987.
[26] Y.-J. Lee. Smooth support vector machine toolbox. http://dmlab1.csie.ntust.edu.tw/downloads/SSVM Code.htm.
[27] Y.-J. Lee, O.L. Mangasarian, and W.H. Wolberg. Survival-time classification of breast cancer patients. Technical Report 01-03, Data Mining Institute, Computer Sciences Department, University of Wisconsin, Madison, Wisconsin, March 2001. Computational Optimization and Applications, to appear. ftp://ftp.cs.wisc.edu/pub/dmi/tech-reports/01-03.ps.
[28] X.B. Li, J.R. Sweigart, J.T.C. Teng, J.M. Donohue, L.A. Thombs, and S.M. Wang. Multivariate decision trees using linear discriminants and tabu search. IEEE Transactions on Systems, Man and Cybernetics, Part A, 33(2):194–205, 2003.
[29] O.L. Mangasarian. Nonlinear Programming. SIAM, 1994.
[30] M. Mehta, J. Rissanen, and R. Agrawal. MDL-based decision tree pruning. In Proceedings of the First International Conference on Knowledge Discovery and Data Mining (KDD-95), pages 216–221, 1995.
[31] J. Mingers. An empirical comparison of pruning methods for decision tree induction. Machine Learning, 4(3):227–243, 1989.
[32] J. Mingers. An empirical comparison of selection measures for decision tree induction. Machine Learning, 3(4):319–342, 1989.
[33] T.M. Mitchell. Machine Learning. McGraw-Hill, 1997.
[34] S.K. Murthy, S. Kasif, and S. Salzberg. A system for induction of oblique decision trees. Journal of Artificial Intelligence Research, 2(1):1–33, 1994.
[35] S.K. Murthy, S. Kasif, S. Salzberg, and R. Beigel. OC1: Randomized induction of oblique decision trees. In Proceedings of the Eleventh National Conference on Artificial Intelligence, pages 322–327, 1993.
[36] E. Osuna, R. Freund, and F. Girosi. Training support vector machines: An application to face detection. In IEEE Conference on Computer Vision and Pattern Recognition, pages 130–135, 1997.
[37] H.-K. Pao, S.-C. Chang, and Y.-J. Lee. Model trees for classification of hybrid data types. In The 6th International Conference on Intelligent Data Engineering and Automated Learning, 2005.
[38] J.R. Quinlan and R.L. Rivest. Inferring decision trees using the minimum description length principle. Information and Computation, 80(3):227–248, 1989.
[39] J.R. Quinlan. Learning efficient classification procedures. In R.S. Michalski, J.G. Carbonell, and T.M. Mitchell, editors, Machine Learning: An Artificial Intelligence Approach. Tioga Press, 1983.
[40] J.R. Quinlan. Induction of decision trees. Machine Learning, 1(1):81–106, 1986.
[41] J.R. Quinlan. Simplifying decision trees. International Journal of Man-Machine Studies, 27:221–234, 1987.
[42] J.R. Quinlan. C4.5: Programs for Machine Learning. Morgan Kaufmann, 1993.
[43] J.R. Quinlan. Improved use of continuous attributes in C4.5. Journal of Artificial Intelligence Research, 4:77–90, 1996.
[44] J. Rissanen. Modeling by shortest data description. Automatica, 14:465–471, 1978.
[45] J. Rissanen. Stochastic complexity and modeling. Annals of Statistics, 14(3):1080–1100, 1986.
[46] J. Rissanen and G.G. Langdon. Universal modeling and coding. IEEE Transactions on Information Theory, 27(1):12–23, 1981.
[47] T. Evgeniou, M. Pontil, and T. Poggio. Regularization networks and support vector machines. In A. Smola, P. Bartlett, B. Scholkopf, and D. Schuurmans, editors, Advances in Large Margin Classifiers, pages 171–203, 2000.
[48] P.E. Utgoff and C.E. Brodley. Linear machine decision trees. Technical report, 1991.
[49] V.N. Vapnik. The Nature of Statistical Learning Theory. Springer-Verlag, 1995.
[50] V.N. Vapnik. Statistical Learning Theory. John Wiley & Sons, 1998.
[51] I.H. Witten and E. Frank. Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations. Morgan Kaufmann, 2000.

Full-text release date: 2009/07/31 (campus network)
Full text not authorized for public access (off-campus network)
Full text not authorized for public access (National Central Library: Taiwan theses and dissertations system)