
Graduate student: 宋玉玲 (SUNG - EVI YULIATI)
Thesis title: Application of Genetic Algorithm-Based Decision Tree to Product Quality Classification: A Case Study on Milk Process Control
Advisor: 郭人介 (REN-JIEH KUO)
Oral defense committee: 王孔政 (KUNG-JENG WANG), 許鉅秉 (JIUH-BING SHEU)
Degree: Master
Department: College of Management - Department of Industrial Management
Year of publication: 2010
Academic year of graduation: 98
Language: English
Number of pages: 117
Keywords: Process control validation, 5-fold validation, Bayesian conditional probability, C4.5

  • Process control validation is part of food safety system validation, which is a fundamental requirement in implementing a Food Safety Management System. It requires extensive data collection, processing, and analysis. As the complexity of the process, organization, and product increases, the validation process demands more time and resources. Data mining and soft computing can help with this validation. In this research, process control validation is performed using a product quality classification approach that classifies products into two classes: acceptable and unacceptable. The main idea is to find the combinations of process control parameter settings required to produce products of acceptable quality, since an organization may have several parameter setting options, especially during trial and commissioning.
    Because real-world problems are complex and involve various types of data, a robust and flexible algorithm is required to process the data. An algorithm that can process binary and non-binary attribute values, as well as discrete and continuous data, would be very advantageous in data mining, since many previous studies have limitations in this respect. Thus, the objective of this research is to create an algorithm that can deal with this problem. In this research, a decision tree is applied to perform the classification task for a binary-class problem. A soft computing method, the Genetic Algorithm (GA), is used to generate the decision tree. As the GA explores the search space to find tree structures, Bayesian conditional probability and information gain are applied to optimize the tree structures found by the GA. Five-fold validation is performed to test the validity of the proposed algorithm. Four benchmark datasets, SPECT, House Vote, Hepatitis, and Tic-Tac-Toe Endgame, are used to test the algorithm's performance. The algorithm is applied alone (denoted as GA) and in combination with C4.5 initial solutions (denoted as C4.5+GA). The predictive accuracy of the proposed algorithm is compared with that of conventional heuristic algorithms such as C4.5, CART, CHAID, and One R. When applied alone, GA performs better than the conventional heuristic algorithms on SPECT and Hepatitis. In combination with initial solutions from C4.5, C4.5+GA outperforms GA and the conventional algorithms on SPECT, House Vote, and Hepatitis. Since the proposed algorithm shows good performance in the binary attribute case, an experiment on real-world data (milk processing) is carried out using the proposed algorithm. The result of the experiment on real-world data is consistent with the earlier validation results.
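    The abstract names two standard ingredients, information gain and five-fold validation, without giving formulas. The sketch below illustrates the textbook versions of both on toy data. It is not the thesis's implementation: the function names, the toy attribute values, and the round-robin fold assignment are all assumptions made for illustration.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(values, labels):
    """Entropy reduction obtained by splitting `labels` on an attribute."""
    n = len(labels)
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    # Weighted entropy of the subsets produced by the split.
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def five_fold_indices(n, k=5):
    """Return k (train, test) index pairs; round-robin assignment (assumed)."""
    folds = [list(range(i, n, k)) for i in range(k)]
    return [(sorted(set(range(n)) - set(f)), f) for f in folds]


# Toy binary-class data (hypothetical): "a" = acceptable, "u" = unacceptable.
labels = ["a", "a", "u", "u"]
perfect_attribute = [1, 1, 0, 0]        # separates the two classes exactly
uninformative_attribute = [1, 0, 1, 0]  # tells nothing about the class

gain_perfect = information_gain(perfect_attribute, labels)
gain_none = information_gain(uninformative_attribute, labels)
splits = five_fold_indices(10)
```

    In a GA-based tree search, a quantity such as the mean test-fold accuracy of a candidate tree could serve as its fitness value; whether the thesis defines fitness this way is not stated in the abstract.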

    Abstract
    Acknowledgement
    List of Contents
    List of Tables
    List of Figures
    Chapter 1 Introduction
      1.1 Research Background
      1.2 Research Objectives
      1.3 Research Scope and Constraint
      1.4 Research Framework
    Chapter 2 Literature Review
      2.1 Introduction to Food Safety
        2.1.1 Hazard Analysis and Critical Control Point (HACCP)
        2.1.2 Decision Support System Application in Food Quality
      2.2 Introduction to Data Mining
        2.2.1 Data Mining Concept
        2.2.2 Data Mining Methods and Applications
      2.3 Decision Tree Classifier
        2.3.1 Heuristic Approach on Decision Tree Induction
        2.3.2 Decision Tree Evaluation
      2.4 Genetic Algorithm
        2.4.1 Introduction to Genetic Algorithm
        2.4.2 Genetic Algorithm Implementation
          2.4.2.1 Scheme Encoding
          2.4.2.2 Initialization/Population Generation
          2.4.2.3 Fitness Evaluation
          2.4.2.4 Parent Selection
          2.4.2.5 Crossover
          2.4.2.6 Mutation
        2.4.3 Genetic Algorithm Application to Food Quality
      2.5 Application of Soft Computing to Decision Tree
        2.5.1 Artificial Neural Networks (ANNs)
        2.5.2 Tabu Search
        2.5.3 Ant Colony
        2.5.4 Fuzzy Logic
        2.5.5 Genetic Algorithm
          2.5.5.1 Genetic Algorithm Application for Feature Selection of Decision Tree
          2.5.5.2 Genetic Algorithm Application for Evolving Heuristic Decision Tree
          2.5.5.3 Genetic Algorithm Application for Inducing and Evolving Decision Tree Structure
    Chapter 3 Methodology
      3.1 Validation Data Preparation
        3.1.1 SPECT Heart Data Set
        3.1.2 Congressional Voting Records Data Set
        3.1.3 Hepatitis Data Set
        3.1.4 Tic-Tac-Toe Endgame Data Set
      3.2 Proposed Algorithm for Breeding Decision Tree Using Genetic Algorithm
        3.2.1 Encoding Scheme
        3.2.2 Chromosome Length Determination
        3.2.3 Initialization of Population
          3.2.3.1 Generating Tree Structure
          3.2.3.2 Assigning Edge Value
        3.2.4 Fitness Value Evaluation
        3.2.5 Parent Selection
        3.2.6 Genetic Operator
          3.2.6.1 Crossover
          3.2.6.2 Mutation
        3.2.7 Feasibility Check
    Chapter 4 Decision Tree Validation Result
      4.1 Decision Tree Validation
      4.2 Validation Computational Result
        4.2.1 SPECT Dataset
        4.2.2 House Vote Dataset
        4.2.3 Hepatitis Dataset
        4.2.4 Tic-Tac-Toe Endgame Dataset
    Chapter 5 Decision Tree Application to Milk Processing Control
      5.1 Milk Processing Data Preparation
      5.2 Milk Processing Computational Result
    Chapter 6 Conclusions and Recommendations
      6.1 Conclusions
      6.2 Contributions
      6.3 Suggestions for Future Studies
    References
    Appendix A Computing Time of the Best Decision Tree of Each Fold
    Appendix B Decision Tree Models for Milk Processing Data
    Appendix C Convergence History of Validation Dataset
    Appendix D Convergence History of Milk Processing Dataset
    Appendix E Result of Principal Component Analysis for Milk Processing Attribute Selection
    Appendix F Result of Parameter Tuning Experiments

