Graduate student: Areti Nagendra Soma Charan
Thesis title: Intelligent Architecture for Detecting Phishing Websites
Advisor: Jiann-Liang Chen (陳俊良)
Oral defense committee: Jiann-Liang Chen (陳俊良)
林宗男
鄧惟中
馬奕葳
Degree: Master
Department: College of Electrical Engineering and Computer Science - Department of Electrical Engineering
Year of publication: 2022
Academic year of graduation: 111 (ROC calendar)
Language: English
Number of pages: 90
Keywords: Phishing, Multi-Layer Ensemble model, Security, Machine Learning, Deep Learning
  • Phishing has become a major concern for security researchers because of the anonymous structure of the Internet. Phishing is a scam that combines technical and social engineering deception to steal a customer's bank account information and personal identification information. Phishing attacks are becoming more prevalent, and rapid technological advances are letting hackers develop increasingly sophisticated phishing kits. Attackers frequently duplicate the targeted site, which gives them complete control of the victim's computer and enables them to breach any further security measures. Phishing protection is therefore necessary in the current environment, and more attention must be paid to preventing phishing scams.
    This study proposes an intelligent architecture for phishing website detection. The system architecture has eight parts: data collection, pre-processing, feature extraction, data transformation, optimal feature selection, feature scaling, model evaluation, and prediction as legitimate or phishing. The study uses URLs as the dataset for detecting phishing websites. To prevent data imbalance, equal quantities of benign and phishing URLs were used. From the URL dataset, 76 features are extracted and divided into four groups: URL-based, path-based, domain-based, and query-based. A data transformation step converts features of object data type to integer data type, and an optimal feature selection technique then selects the best 20 features. To improve training stability and model performance, feature scaling was applied so that the features have comparable magnitudes. In the model evaluation, machine learning, deep learning, and multi-layer ensemble models were studied. Each model's performance is measured with metrics derived from the confusion matrix (accuracy, precision, recall, and F1 score), and training times are also compared. The performance analysis shows that the multi-layer ensemble model achieved the highest accuracy of all the models, 98.79%.
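    Two of the steps named in the abstract, per-component URL feature extraction and confusion-matrix metrics, can be illustrated with a minimal sketch. The abstract does not list the 76 features or describe the implementation, so the four counts per component, the function names (component_features, extract_features, classification_metrics), and the example URL are all assumptions made for illustration, not the thesis's method.

```python
from urllib.parse import urlsplit


def component_features(text: str, prefix: str) -> dict:
    """Return a few simple counts for one URL component (illustrative only)."""
    return {
        f"{prefix}_length": len(text),
        f"{prefix}_digits": sum(ch.isdigit() for ch in text),
        f"{prefix}_dots": text.count("."),
        f"{prefix}_hyphens": text.count("-"),
    }


def extract_features(url: str) -> dict:
    """Compute the same counts for the whole URL, its domain, path, and query."""
    parts = urlsplit(url)
    features = {}
    features.update(component_features(url, "url"))
    features.update(component_features(parts.hostname or "", "domain"))
    features.update(component_features(parts.path, "path"))
    features.update(component_features(parts.query, "query"))
    return features


def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Derive accuracy, precision, recall, and F1 from confusion-matrix counts.

    Here "positive" is assumed to mean "phishing". Undefined ratios
    (zero denominators) are reported as 0.0.
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


if __name__ == "__main__":
    # Hypothetical URL, used only to show the shape of the feature dictionary.
    example = "http://login-secure.example.com/a1/b.php?id=42&x=1"
    for name, value in extract_features(example).items():
        print(name, value)
    # Hypothetical confusion-matrix counts.
    print(classification_metrics(tp=90, fp=10, fn=5, tn=95))
```

    Because every feature produced here is already an integer, no object-to-integer transformation is needed in this sketch. In a fuller pipeline, feature selection and scaling would sit between extract_features and the classifier.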

    Abstract (Chinese)
    Abstract
    List of Figures
    List of Tables
    Chapter 1 Introduction
      1.1 Motivation
      1.2 Contributions
      1.3 Organization
    Chapter 2 Related Works
      2.1 List-based Technique
      2.2 Heuristic-based Technique
      2.3 Visual Similarity-based Technique
      2.4 Machine/Deep Learning Technique
    Chapter 3 Proposed Phishing Detection System
      3.1 System Architecture
      3.2 Data Collection
      3.3 Data Preprocessing
      3.4 Feature Extraction
        3.4.1 URL-based Features
        3.4.2 Domain-based Features
        3.4.3 Path-based Features
        3.4.4 Query-based Features
      3.5 Data Transformation
      3.6 Optimal Feature Selection
      3.7 Feature Scaling
      3.8 Models Evaluation
        3.8.1 Machine Learning models
        3.8.2 Deep Learning models
          3.8.2.1 Artificial Neural Network
          3.8.2.2 Convolutional Neural Network
          3.8.2.3 Long-Short Term Memory
        3.8.3 Multi-Layer Ensemble model
    Chapter 4 System Environment and Performance Analysis
      4.1 System Environment
      4.2 Experimental Results
        4.2.1 Performance Analysis of Machine Learning Algorithms
          4.2.1.1 Performance Analysis of Decision Tree
          4.2.1.2 Performance Analysis of Random Forest
          4.2.1.3 Performance Analysis of Logistic Regression
          4.2.1.4 Performance Analysis of XGBoost
          4.2.1.5 Performance Analysis of Support Vector Machine
          4.2.1.6 Performance Analysis of K-Nearest Neighbors
          4.2.1.7 Performance Analysis of AdaBoost
          4.2.1.8 Performance Analysis of Extra Tree
          4.2.1.9 Performance Comparison of Machine Learning Algorithms
        4.2.2 Performance Analysis of Deep Learning Algorithms
          4.2.2.1 Performance Analysis of Multi-Layer Perceptron
          4.2.2.2 Performance Analysis of ANN
          4.2.2.3 Performance Analysis of CNN
          4.2.2.4 Performance Analysis of LSTM
          4.2.2.5 Performance Comparison of Deep Learning Algorithms
        4.2.3 Performance Analysis of Multi-Layer Ensemble model
      4.3 Comparing with Different Study
      4.4 Summary
    Chapter 5 Conclusions and Future Works
      5.1 Conclusions
      5.2 Future Works
    References

