
Graduate Student: 陳俊霖 (Jun-Lin Chen)
Thesis Title: 基於選擇域強化策略優化高維度連續型數據之離散化特徵選擇問題 (Optimizing Discretized Feature Selection Problems in High-Dimensional Continuous Data Based on Selection Domain Enhancement Strategy)
Advisors: 花凱龍 (Kai-Lung Hua), 楊朝龍 (Chao-Lung Yang)
Committee Member: 沈上翔 (Shan-Hsiang Shen)
Degree: Master
Department: College of Electrical Engineering and Computer Science — Department of Computer Science and Information Engineering
Publication Year: 2023
Graduation Academic Year: 111
Language: English
Pages: 56
Chinese Keywords: 特徵選擇 (feature selection), 特徵離散 (feature discretization), 粒子群演算法 (particle swarm optimization), 分類 (classification), 高維數據 (high-dimensional data)
English Keywords: Feature selection, Feature discretization, Particle swarm optimization, Classification, High-dimensional data
Chinese Abstract: This study proposes a new method, the Selection Domain Enhancement Strategy for Discretized Feature Selection (SD-DFS), based on a selection-domain optimization strategy, to solve the multivariate discretized feature selection problem and the cut-point dependency problem simultaneously under a limited computational budget. To avoid losing the interaction information between discretized features that is lost when feature discretization and feature selection are performed separately, recent studies have considered the two jointly: a subset of potential cut-points is built via entropy, and an evolutionary algorithm selects features and cut-points at the same time. However, the number of potential cut-points differs for each feature in a dataset. Previous methods resolved the resulting cut-point dependency problem through bi-level optimization, but this requires high computational cost, including more iterations and GPU-based computation. This study proposes an independent uniform-partition strategy for the cut-point selection domain to encode cut-points, further optimized by a selection-domain scaling strategy and a mutual-information ranking strategy. We evaluated performance on 10 high-dimensional datasets, comparing SD-DFS against several major previous methods; it shows advantages on key metrics such as mean accuracy, best accuracy, and the number of selected features. The main contributions of this study are solving the cut-point dependency problem without substantially increasing computational cost and achieving comparatively better classification accuracy with fewer discretized features.
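The mutual-information ranking strategy mentioned in the abstract can be sketched as follows. This is a minimal illustration of ranking already-discretized feature columns by empirical mutual information with the class label; the function names and the plug-in (frequency-count) estimator are assumptions for illustration, not the thesis's actual implementation:

```python
from collections import Counter
from math import log2

def mutual_information(xs, ys):
    """Empirical mutual information I(X;Y), in bits, between two
    equal-length sequences of discrete values."""
    n = len(xs)
    px = Counter(xs)            # marginal counts of X
    py = Counter(ys)            # marginal counts of Y
    pxy = Counter(zip(xs, ys))  # joint counts of (X, Y)
    mi = 0.0
    for (xv, yv), c in pxy.items():
        # (c/n) * log2( p(x,y) / (p(x) * p(y)) ), rearranged to use counts
        mi += (c / n) * log2(c * n / (px[xv] * py[yv]))
    return mi

def mi_rank(columns, labels):
    """Rank discretized feature columns by their MI with the class label,
    highest first; returns (ordering of column indices, MI scores)."""
    scores = [mutual_information(col, labels) for col in columns]
    order = sorted(range(len(columns)), key=lambda i: -scores[i])
    return order, scores
```

For example, a feature column identical to a balanced binary label scores 1 bit, while a constant column scores 0 and is ranked last.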


Abstract:
The aim of this study is to propose a new method, the Selection Domain Enhancement Strategy for Discretized Feature Selection (SD-DFS), based on a selection-domain optimization strategy, to simultaneously solve the multivariate discretized feature selection problem and the cut-point dependency problem with limited computational cost. To avoid losing interaction information between discretized features when feature discretization and feature selection are performed separately, recent studies have merged the two: a subset of potential cut-points is established through entropy, and evolutionary algorithms select features and cut-points simultaneously. However, the number of potential cut-points differs for each feature in the dataset. Past methods have used a bi-level optimization approach to solve the cut-point dependency problem caused by these differing numbers of cut-points, but that approach requires high computational cost, including more iterations and GPU computation. This study proposes an independent uniform-partition strategy for the cut-point selection domain to encode cut-points, further optimized using selection-domain scaling and mutual-information ranking strategies. We tested the performance of SD-DFS on ten high-dimensional datasets and compared it with several major previous methods on key indicators such as mean accuracy, maximum accuracy, and number of selected features; SD-DFS demonstrated advantages on these indicators. The main contributions of this study are solving the cut-point dependency problem without significantly increasing computational cost and achieving comparatively superior classification accuracy with fewer discretized features.
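To make the encoding idea concrete, here is a hedged sketch of how a continuous particle position in [0, 1) per feature could be decoded into per-feature cut-point choices even though each feature has a different number of candidate cut-points. The even-slot partition, the `None`-means-dropped convention, and all names are illustrative assumptions, not the exact SD-DFS formulation:

```python
def decode_particle(position, cutpoints_per_feature):
    """Decode a PSO particle (one value in [0, 1) per feature) into a
    per-feature decision: None = feature dropped, otherwise the chosen
    cut-point.  Each feature's [0, 1) interval is split evenly into one
    'drop' slot plus one slot per candidate cut-point, so the encoding
    works independently of how many cut-points each feature has."""
    decisions = []
    for p, cuts in zip(position, cutpoints_per_feature):
        slots = len(cuts) + 1               # slot 0 = drop the feature
        slot = min(int(p * slots), slots - 1)
        decisions.append(None if slot == 0 else cuts[slot - 1])
    return decisions
```

With this kind of fixed-length real-valued representation, a standard PSO update can be applied directly, and each feature's decoding depends only on its own candidate cut-points, which is one plausible way to sidestep the cut-point dependency described above.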

Table of Contents:
Chinese Abstract
Abstract
Table of Contents
List of Tables
List of Illustrations
1 Introduction
2 Background & Related Work
2.1 Particle Swarm Optimization
2.2 Feature Selection
2.3 Feature Discretization
2.4 Feature Selection via Discretization
3 Methodology
3.1 Model Design
3.2 Cut-Points Generated by MDLP
3.3 Selected Domain Strategy Based on Discrete Features
3.4 Selected Domain Dynamic Scaling Mechanism
3.5 MI Ranking Strategy
3.5.1 Mutual Information
3.5.2 Method Description
3.6 Fitness Function
4 Experiment & Result
4.1 Experiment Design
4.1.1 Datasets
4.1.2 Benchmark Methods
4.1.3 Experimental Parameters
4.2 Results
4.2.1 SD-DFS vs Full
4.2.2 SD-DFS vs PSO-FS
4.2.3 SD-DFS vs EPSO
4.2.4 SD-DFS vs PPSO
4.2.5 SD-DFS vs CC-DFS
4.3 Ablation Experiment
4.4 Computation Time
5 Conclusion
References


Full Text Release Date: 2028/06/27 (campus network)
Full Text Release Date: 2033/06/27 (off-campus network)
Full Text Release Date: 2033/06/27 (National Central Library: Taiwan NDLTD system)