
Graduate Student: 倪艾莎 (Nisya Kintan Qumari)
Thesis Title: Study on Using Correlation Matrix for Feature Selection in Data Mining (資料探勘中以特徵相關矩陣做特徵選擇探討)
Advisor: 蘇順豐 (Shun-Feng Su)
Committee Members: 蘇順豐 (Shun-Feng Su), 郭重顯 (Chung-Hsien Kuo), 莊鎮嘉 (Chen-Chia Chuang), 陳美勇 (Mei-Yung Chen), 梁書豪 (Shu-Hao Liang)
Degree: Master
Department: Department of Electrical Engineering, College of Electrical Engineering and Computer Science
Year of Publication: 2023
Graduation Academic Year: 112
Language: English
Pages: 52
Keywords: Correlation Matrix, CNC Machines, Feature Selection, Machine Learning, Data Mining

  • Abstract: This thesis explores the use of a correlation matrix for feature selection in data mining, with a particular focus on CNC machine data. The main areas of concentration are classification and predictive modeling, with various scenarios used to optimize feature selection and model performance. For classification, a thorough investigation of multiple cases demonstrates the effectiveness of a hybrid strategy that employs correlation-heatmap-guided feature selection and bases the procedure on power factor values. The Random Forest model stands out as the top performer, with an accuracy of 96.8% at the base-learner stage and 97.2% after tuning, in both the original and normalized data settings. Notably, the better performance achieved with fewer features supports the adage that less is more, as highlighted by the decision to use a reduced feature set. In predictive modeling, the K-Nearest Neighbors model consistently outperforms the other models, showing the lowest error values across circumstances; it maintains low prediction errors whether it uses all features without the power factor or only the non-redundant features without the power factor. This work lays the groundwork for developing feature selection techniques, investigating ensemble-learning opportunities, refining prediction models, and tailoring strategies to the complexities of CNC machine datasets. These results contribute to the ongoing discussion in data mining and provide useful guidance for improving classification and predictive modeling in the complex field of CNC machines.
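The correlation-heatmap-guided selection the abstract describes can be sketched as follows. This is a minimal illustration, not the thesis's exact procedure: the data is synthetic (generated with scikit-learn's `make_classification` as a stand-in for the CNC power-meter dataset, which is not reproduced here), and the 0.9 redundancy threshold is an assumed illustrative choice.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the CNC dataset (illustrative only).
X, y = make_classification(n_samples=500, n_features=12, n_informative=5,
                           n_redundant=4, random_state=0)
df = pd.DataFrame(X, columns=[f"f{i}" for i in range(12)])

# Correlation-matrix filter: from every pair of features whose absolute
# Pearson correlation exceeds the threshold, drop one feature.
corr = df.corr().abs()
upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
to_drop = [col for col in upper.columns if (upper[col] > 0.9).any()]
reduced = df.drop(columns=to_drop)

# Train a Random Forest on the reduced (non-redundant) feature set.
X_tr, X_te, y_tr, y_te = train_test_split(reduced, y, test_size=0.2,
                                          random_state=0)
clf = RandomForestClassifier(random_state=0).fit(X_tr, y_tr)
acc = accuracy_score(y_te, clf.predict(X_te))
print(f"{len(to_drop)} redundant features dropped; accuracy: {acc:.3f}")
```

Keeping only the upper triangle of the correlation matrix ensures each feature pair is inspected once, so only one member of a highly correlated pair is discarded.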

    Contents:
    Abstract
    Acknowledgement
    Contents
    Figures
    Tables
    1. Introduction
        1.1 Background
        1.2 Aim and Contributions
        1.3 Thesis Outline
    2. Literature Review
    3. Overview of Feature Selection with Correlation Matrix
    4. Proposed Method
        4.1 Methodology
            4.1.1 Data Collection
            4.1.2 Data Pre-processing
            4.1.3 Case Study
        4.2 Feature Selection
        4.3 Learning Scenario on Classification
        4.4 Learning Scenario on Prediction
            4.4.1 All Features without PF
            4.4.2 Non-Redundant Features without PF
        4.5 Developed Models
            4.5.1 K-Nearest Neighbor
            4.5.2 Decision Tree
            4.5.3 Random Forest
            4.5.4 Naïve Bayes
            4.5.5 Support Vector Machine
    5. Results and Discussion
        5.1 Hardware Installation
            5.1.1 CNC Machine
            5.1.2 Power Meter
            5.1.3 Switch
            5.1.4 Access Point (AP)
            5.1.5 Laptop
        5.2 Dataset
        5.3 Data Acquisition
        5.4 Data Exploration
            5.4.1 Import Libraries
            5.4.2 Load Data
            5.4.3 Count Number of Rows and Columns
            5.4.4 Dropping Unnamed / NaN Values
            5.4.5 Dropping Irrelevant Features
            5.4.6 Data Preprocessing
        5.5 Feature Selection
            5.5.1 Feature Selection in Classification
            5.5.2 Feature Selection in Prediction
        5.6 Normalized Dataset
        5.7 Data Splitting
        5.8 Performance Assessment on Classification
            5.8.1 Accuracy
            5.8.2 Confusion Matrix
        5.9 Performance Assessment on Prediction
            5.9.1 Root Mean Square Error (RMSE), All Features without PF
            5.9.2 Root Mean Square Error (RMSE), Non-Redundant Features without PF
    6. Conclusions and Future Direction
        6.1 Conclusions
        6.2 Future Direction
    References
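The prediction assessment in Section 5.9 uses Root Mean Square Error. As a hedged illustration of how a K-Nearest Neighbors regressor's RMSE might be computed (on synthetic data from scikit-learn's `make_regression`, since the CNC dataset is not reproduced here; `n_neighbors=5` is simply scikit-learn's default, not a value from the thesis):

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor

# Synthetic stand-in for a continuous CNC target (illustrative only).
X, y = make_regression(n_samples=400, n_features=8, noise=5.0, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

# Fit KNN and report RMSE on the held-out split.
knn = KNeighborsRegressor(n_neighbors=5).fit(X_tr, y_tr)
rmse = np.sqrt(mean_squared_error(y_te, knn.predict(X_te)))
print(f"RMSE: {rmse:.2f}")
```

Taking the square root of the mean squared error keeps the metric version-portable across scikit-learn releases, and reports the error in the same units as the target.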

