
Graduate Student: 倪艾莎 (Nisya Kintan Qumari)
Thesis Title: Study on Using Correlation Matrix for Feature Selection in Data Mining (資料探勘中以特徵相關矩陣做特徵選擇探討)
Advisor: 蘇順豐 (Shun-Feng Su)
Committee Members: 蘇順豐 (Shun-Feng Su), 郭重顯 (Chung-Hsien Kuo), 莊鎮嘉 (Chen-Chia Chuang), 陳美勇 (Mei-Yung Chen), 梁書豪 (Shu-Hao Liang)
Degree: Master
Department: Department of Electrical Engineering, College of Electrical Engineering and Computer Science
Year of Publication: 2023
Graduation Academic Year: 112
Language: English
Pages: 52
Keywords: Correlation Matrix, CNC Machines, Feature Selection, Machine Learning, Data Mining

  • Abstract: This thesis explores the use of a correlation matrix for feature selection in data mining, with a particular focus on CNC machine data. The main areas of concentration are classification and predictive modeling, with various scenarios used to optimize feature selection and model performance. For classification, a thorough investigation of multiple cases demonstrates the effectiveness of a hybrid strategy that employs correlation-heatmap-guided feature selection and bases the procedure on power factor values. The Random Forest model stands out as the top performer, with an accuracy of 96.8% at the base-learner stage and 97.2% after tuning, in both the original and normalized data settings. Notably, the better performance achieved with fewer features supports the adage that less is more, as highlighted by the decision to use a reduced feature set. In predictive modeling, the K-Nearest Neighbors model consistently outperforms the other models, showing the lowest error values across circumstances; it maintains low prediction errors whether it uses all features without the power factor or only the non-redundant features without the power factor. This work lays the groundwork for developing feature selection techniques, investigating ensemble-learning opportunities, refining prediction models, and tailoring strategies to the complexities of CNC machine datasets. These results contribute to the ongoing discussion in data mining and provide useful guidance for improving classification and predictive modeling in the complex field of CNC machines.
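The correlation-heatmap-guided selection the abstract describes can be sketched as follows. This is a minimal illustration, not the thesis's exact procedure: the data is synthetic (generated with scikit-learn's `make_classification` as a stand-in for the CNC power-meter dataset, which is not reproduced here), and the 0.9 redundancy threshold is an assumed illustrative choice.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the CNC dataset (illustrative only).
X, y = make_classification(n_samples=500, n_features=12, n_informative=5,
                           n_redundant=4, random_state=0)
df = pd.DataFrame(X, columns=[f"f{i}" for i in range(12)])

# Correlation-matrix filter: from every pair of features whose absolute
# Pearson correlation exceeds the threshold, drop one feature.
corr = df.corr().abs()
upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
to_drop = [col for col in upper.columns if (upper[col] > 0.9).any()]
reduced = df.drop(columns=to_drop)

# Train a Random Forest on the reduced (non-redundant) feature set.
X_tr, X_te, y_tr, y_te = train_test_split(reduced, y, test_size=0.2,
                                          random_state=0)
clf = RandomForestClassifier(random_state=0).fit(X_tr, y_tr)
acc = accuracy_score(y_te, clf.predict(X_te))
print(f"{len(to_drop)} redundant features dropped; accuracy: {acc:.3f}")
```

Keeping only the upper triangle of the correlation matrix ensures each feature pair is inspected once, so only one member of a highly correlated pair is discarded.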

    Contents:
    Abstract
    Acknowledgement
    Contents
    Figures
    Tables
    1. Introduction
        1.1 Background
        1.2 Aim and Contributions
        1.3 Thesis Outline
    2. Literature Review
    3. Overview of Feature Selection with Correlation Matrix
    4. Proposed Method
        4.1 Methodology
            4.1.1 Data Collection
            4.1.2 Data Pre-processing
            4.1.3 Case Study
        4.2 Feature Selection
        4.3 Learning Scenario on Classification
        4.4 Learning Scenario on Prediction
            4.4.1 All Features without PF
            4.4.2 Non-Redundant Features without PF
        4.5 Developed Models
            4.5.1 K-Nearest Neighbor
            4.5.2 Decision Tree
            4.5.3 Random Forest
            4.5.4 Naïve Bayes
            4.5.5 Support Vector Machine
    5. Results and Discussion
        5.1 Hardware Installation
            5.1.1 CNC Machine
            5.1.2 Power Meter
            5.1.3 Switch
            5.1.4 Access Point (AP)
            5.1.5 Laptop
        5.2 Dataset
        5.3 Data Acquisition
        5.4 Data Exploration
            5.4.1 Import Libraries
            5.4.2 Load Data
            5.4.3 Count Number of Rows and Columns
            5.4.4 Dropping Unnamed / NaN Values
            5.4.5 Dropping Irrelevant Features
            5.4.6 Data Preprocessing
        5.5 Feature Selection
            5.5.1 Feature Selection in Classification
            5.5.2 Feature Selection in Prediction
        5.6 Normalized Dataset
        5.7 Data Splitting
        5.8 Performance Assessment on Classification
            5.8.1 Accuracy
            5.8.2 Confusion Matrix
        5.9 Performance Assessment on Prediction
            5.9.1 Root Mean Square Error (RMSE), All Features without PF
            5.9.2 Root Mean Square Error (RMSE), Non-Redundant Features without PF
    6. Conclusions and Future Direction
        6.1 Conclusions
        6.2 Future Direction
    References
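The prediction assessment in Section 5.9 uses Root Mean Square Error. As a hedged illustration of how a K-Nearest Neighbors regressor's RMSE might be computed (on synthetic data from scikit-learn's `make_regression`, since the CNC dataset is not reproduced here; `n_neighbors=5` is simply scikit-learn's default, not a value from the thesis):

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor

# Synthetic stand-in for a continuous CNC target (illustrative only).
X, y = make_regression(n_samples=400, n_features=8, noise=5.0, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

# Fit KNN and report RMSE on the held-out split.
knn = KNeighborsRegressor(n_neighbors=5).fit(X_tr, y_tr)
rmse = np.sqrt(mean_squared_error(y_te, knn.predict(X_te)))
print(f"RMSE: {rmse:.2f}")
```

Taking the square root of the mean squared error keeps the metric version-portable across scikit-learn releases, and reports the error in the same units as the target.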

