
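Author (graduate student): 莊惠鈞
Chuang-Hui Chun
Thesis title: 植基於反應變數與主成份相關性之二元分類器
A binary classification procedure based on the correlation with response variable and principal components
Advisor: 楊維寧
Wei-Ning Yang
Oral defense committee: 呂永和
陳雲岫
Degree: Master
Department: College of Management - Department of Information Management
Year of publication: 2018
Academic year of graduation: 106 (ROC calendar)
Language: Chinese
Number of pages: 22
Keywords (translated from Chinese): binary classification, principal component analysis, genetic algorithm, linear discriminant analysis
Keywords (English, as recorded): binary classification, Principal Components Analysis, Genetic Algorithm, Linear Discrimination Analysis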
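  • This study proposes a binary classification method based on an integrated feature. Principal component analysis (PCA) is first applied to transform the feature vector, whose feature variables are mutually correlated, into a principal component vector in which any two principal components are uncorrelated. The aim is to eliminate the multicollinearity problem that arises when the feature variables are too highly correlated.
    After the feature vector is transformed into the principal component vector, the principal components are ranked by the square of their correlation coefficient with the class variable (label), and the components ranked top 1, top 2, top 3, ... are added to the dataset step by step. The method automatically explores and exploits the relationship between the integrated feature and the class variable to improve classification accuracy. The integrated feature is a linear combination of the principal components, weighted by a weight vector. Given the training samples, the weight of each principal component is chosen so that the accuracy of classification based on the integrated feature is maximized. Because computing the optimal weight vector is difficult, linear discriminant analysis (LDA) is first used to obtain a good initial weight vector, and a genetic algorithm is then used to approximate the optimal weight vector and the classification threshold. Experiments on a medical research dataset show that the proposed binary classifier reaches a classification accuracy of 99.29% on the training samples and 97.18% on the test samples.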
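    The PCA and ranking step described in this abstract can be illustrated with a minimal sketch. It is not the thesis code: the function name `rank_principal_components`, the synthetic data, and the use of NumPy's eigendecomposition are all assumptions made for illustration. It computes principal component scores, checks that they are mutually uncorrelated, and orders them by squared correlation with the label.

```python
import numpy as np

def rank_principal_components(X, y):
    """Return PC scores ordered by squared correlation with the label y.

    Hypothetical helper for illustration; not taken from the thesis.
    """
    Xc = X - X.mean(axis=0)                    # center each feature
    cov = np.cov(Xc, rowvar=False)             # feature covariance matrix
    _, eigvecs = np.linalg.eigh(cov)           # eigenvectors of a symmetric matrix
    scores = Xc @ eigvecs                      # principal component scores
    # Squared Pearson correlation between each component and the label.
    r2 = np.array([np.corrcoef(scores[:, j], y)[0, 1] ** 2
                   for j in range(scores.shape[1])])
    order = np.argsort(r2)[::-1]               # most label-correlated first
    return scores[:, order], r2[order]

# Synthetic data (an assumption): two nearly collinear features plus one
# feature that carries the class signal.
rng = np.random.default_rng(0)
n = 200
y = rng.integers(0, 2, size=n)
base = rng.normal(size=(n, 1))
X = np.hstack([
    base + 0.1 * rng.normal(size=(n, 1)),
    base + 0.1 * rng.normal(size=(n, 1)),
    2.0 * y[:, None] + rng.normal(size=(n, 1)),
])

pcs, r2 = rank_principal_components(X, y)
pc_corr = np.corrcoef(pcs, rowvar=False)       # should be close to the identity
```

    Adding components in the order returned here corresponds to the "top 1, top 2, top 3, ..." procedure; how many components to keep is a choice this sketch leaves open.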


    A binary classification method predicts the class of an object from its associated feature vector. The main idea behind the classification is to regress the class variable on the feature vector. Because the feature variables are highly correlated, we apply principal component analysis to obtain uncorrelated transformed features and thereby mitigate the multicollinearity problem. To keep the model of the relationship between the feature vector and the class variable both simple and flexible, we propose to predict the class of an object from an integrated feature, which is a linear combination of the features and the powers of each individual feature. The weights on the original features and their powers, together with the corresponding classification threshold, are determined by a genetic algorithm so as to maximize the classification accuracy on the training dataset. To speed up the convergence of the genetic algorithm, the solution obtained from linear discriminant analysis is used as its starting point. We apply the proposed procedure to benchmark datasets, and the experimental results demonstrate the effectiveness of our algorithm.
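    As a rough illustration of the optimization described in this abstract, the sketch below builds an integrated feature from features and their powers, takes a Fisher-style LDA solution as the starting point, and refines the weights and threshold with a simple elitist genetic algorithm. Every name (`expand_powers`, `lda_start`, `genetic_search`), the ridge term, the population settings, and the synthetic data are assumptions; the thesis may use different operators and parameters.

```python
import numpy as np

def expand_powers(X, degree=2):
    """Stack each feature raised to powers 1..degree (illustrative choice)."""
    return np.hstack([X ** p for p in range(1, degree + 1)])

def accuracy(params, Z, y):
    """Accuracy of thresholding the integrated feature Z @ w at t."""
    w, t = params[:-1], params[-1]
    return np.mean((Z @ w > t).astype(int) == y)

def lda_start(Z, y, ridge=1e-6):
    """Fisher discriminant direction and midpoint threshold (ridge is an assumption)."""
    Z0, Z1 = Z[y == 0], Z[y == 1]
    Sw = (np.cov(Z0, rowvar=False) * (len(Z0) - 1)
          + np.cov(Z1, rowvar=False) * (len(Z1) - 1))
    Sw = Sw + ridge * np.eye(Z.shape[1])       # guard against a singular scatter matrix
    w = np.linalg.solve(Sw, Z1.mean(axis=0) - Z0.mean(axis=0))
    t = 0.5 * (Z0.mean(axis=0) @ w + Z1.mean(axis=0) @ w)
    return np.append(w, t)

def genetic_search(Z, y, start, pop_size=40, generations=60, sigma=0.1, seed=0):
    """Elitist GA over (weights, threshold), seeded with the LDA solution."""
    rng = np.random.default_rng(seed)
    scale = sigma * (np.abs(start) + 1e-3)     # per-gene mutation scale
    pop = start + scale * rng.normal(size=(pop_size, start.size))
    pop[0] = start                             # keep the unperturbed LDA solution
    for _ in range(generations):
        fitness = np.array([accuracy(p, Z, y) for p in pop])
        elite = pop[np.argsort(fitness)[::-1][: pop_size // 2]]
        n_children = pop_size - len(elite)
        pa = elite[rng.integers(0, len(elite), size=n_children)]
        pb = elite[rng.integers(0, len(elite), size=n_children)]
        mask = rng.random(pa.shape) < 0.5      # uniform crossover
        children = np.where(mask, pa, pb)
        children = children + scale * rng.normal(size=children.shape)  # mutation
        pop = np.vstack([elite, children])     # elites survive unchanged
    fitness = np.array([accuracy(p, Z, y) for p in pop])
    return pop[np.argmax(fitness)]

# Synthetic two-class data (an assumption, not the thesis dataset).
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 1.0, size=(100, 2)),
               rng.normal(1.5, 1.0, size=(100, 2))])
y = np.array([0] * 100 + [1] * 100)

Z = expand_powers(X, degree=2)
start = lda_start(Z, y)
best = genetic_search(Z, y, start)
lda_acc = accuracy(start, Z, y)
ga_acc = accuracy(best, Z, y)
```

    Because the elite half of each generation is carried over unchanged and the LDA solution is in the initial population, the training accuracy of the returned solution cannot fall below that of the LDA start. Accuracy on held-out data is a separate question that this sketch does not address.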

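    Abstract (Chinese)
    Abstract (English)
    Acknowledgements
    Table of Contents
    List of Figures
    Chapter 1 Introduction
        1.1 Research Motivation
        1.2 Research Objectives
        1.3 Thesis Organization
    Chapter 2 Dataset and Research Methods
        2.1 Dataset Overview
        2.2 Data Processing Methods
        2.3 Classification Algorithm
    Chapter 3 Experimental Procedure and Results
        3.1 Experimental Procedure
        3.2 Experimental Results
    Chapter 4 Conclusions and Future Work
        4.1 Conclusions
        4.2 Future Work
    References
    Appendix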


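    Full-text release date: 2023/08/06 (campus network)
    Full-text release date: full text not authorized for public release (off-campus network)
    Full-text release date: full text not authorized for public release (National Central Library: National Digital Library of Theses and Dissertations in Taiwan)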