
Graduate student: 黃冠龍
Kuan-Lung Huang
Thesis title: 特定企業之視覺化釣魚網站偵測
Visual Mechanisms Approach to Detect Phishing Website for Specific Enterprise
Advisor: 陳俊良
Jiann-Liang Chen
Committee members: 楊竹星
Chu-Sing Yang
黎碧煌
Bih-Hwang Lee
馬奕葳
Yi-Wei Ma
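Degree: Master
Department: College of Electrical Engineering and Computer Science - Department of Electrical Engineering
Year of publication: 2019
Academic year of graduation: 107
Language: English
Number of pages: 63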
Keywords (Chinese): 釣魚網站, wHash, SIFT, 視覺化機制, 圖像相似度
Keywords (English): phishing website, wHash, SIFT, visualization mechanism, image similarity
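  • Over the past 20 years, phishing on the Internet has never stopped. Heuristic-based machine learning methods have achieved major breakthroughs in recent years, but phishing techniques keep evolving, so some features that were once discriminative have become less effective for classification. Features that do not easily change over time are therefore needed to detect phishing websites.
    This study proposes a visual mechanism for detecting phishing websites that target a specific enterprise's website; for classifying phishing websites, visual features do not easily change over time. For very similar cases, Wavelet hashing (wHash) combined with color histogram comparison is used to evaluate the similarity of webpage screenshots: wHash tolerates noise in image information better than Perceptual hashing (pHash), and the color histogram compensates for the weakness that wHash relies only on contour similarity. For locally similar cases, the approach exploits the fact that phishing websites use well-known brand logos and detects the logos with the Scale-Invariant Feature Transform (SIFT), further improving on the accuracy of contour and color similarity, and a cache is used to improve overall speed.
    Experiments were conducted separately on very similar and locally similar cases. The results show that, in the very similar experiments, wHash combined with a color histogram is more stable than pHash when setting the threshold. This study also exploits the fact that many phishing websites share the same front-end framework, comparing the website under test against the blacklist in the cache, which further increases the utilization of global matching. In the locally similar experiments, SIFT outperforms Speeded Up Robust Features (SURF) at detecting logos, with accuracy reaching 97.93%. In the overall performance test on phishing websites, accuracy reaches 98.14% on unbalanced data and 96.33% on balanced data. In addition, the cache further shortens detection time, improving detection speed by 4.6 times.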


    Over the past 20 years, phishing attacks on the web have never stopped. In recent years, heuristic-based machine learning methods have made major breakthroughs. However, phishing methods are constantly updated, which makes some previously significant features less effective for classification. Therefore, it is necessary to use features that do not easily change over time to detect phishing sites.
    This study proposes a visual mechanism to detect phishing websites that target a specific enterprise's website. For the classification of phishing websites, visual features do not easily change over time. Visually, phishing sites can be classified into three categories: very similar, locally similar, and non-imitation.
    For very similar cases, Wavelet hashing (wHash) and color histogram comparison are used to evaluate the similarity of webpage screenshots. wHash has a higher tolerance for image noise than Perceptual hashing (pHash), and the color histogram compensates for the weakness that wHash is based only on contour similarity. For locally similar cases, the mechanism exploits the fact that phishing websites use well-known brand logos: the logos are detected with the Scale-Invariant Feature Transform (SIFT), which further improves on the accuracy of contour and color similarity. A cache is used to improve overall speed.
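    The combination of a contour hash and a color check described above can be illustrated with a minimal pure-Python sketch. It is not the thesis implementation: the hash here is a simplified Haar low-pass hash standing in for wHash, the histogram comparison uses histogram intersection, and all function names, the 8x8 hash size, the bin count, and both thresholds are assumptions chosen only for illustration.

```python
from statistics import median

def haar_ll(pixels):
    """One Haar low-pass step: average every 2x2 block (even-sized input assumed)."""
    h, w = len(pixels), len(pixels[0])
    return [
        [(pixels[y][x] + pixels[y][x + 1]
          + pixels[y + 1][x] + pixels[y + 1][x + 1]) / 4.0
         for x in range(0, w, 2)]
        for y in range(0, h, 2)
    ]

def wavelet_hash(pixels, hash_size=8):
    """Hash a square grayscale image whose side is hash_size * 2**k.

    The image is reduced to hash_size x hash_size low-pass coefficients,
    and each coefficient becomes one bit: 1 if above the median, else 0.
    """
    band = pixels
    while len(band) > hash_size:
        band = haar_ll(band)
    flat = [v for row in band for v in row]
    med = median(flat)
    return [1 if v > med else 0 for v in flat]

def hamming(a, b):
    """Number of differing bits between two equal-length hashes."""
    return sum(x != y for x, y in zip(a, b))

def color_histogram(rgb_pixels, bins=4):
    """Normalized joint RGB histogram with bins**3 buckets."""
    step = 256 // bins
    hist = [0.0] * (bins ** 3)
    for r, g, b in rgb_pixels:
        ri, gi, bi = (min(c // step, bins - 1) for c in (r, g, b))
        hist[ri * bins * bins + gi * bins + bi] += 1
    total = len(rgb_pixels)
    return [count / total for count in hist]

def histogram_intersection(h1, h2):
    """1.0 for identical normalized histograms, 0.0 for disjoint ones."""
    return sum(min(a, b) for a, b in zip(h1, h2))

def looks_very_similar(gray_a, gray_b, rgb_a, rgb_b, max_bits=10, min_color=0.8):
    """Both checks must pass; the thresholds are illustrative assumptions."""
    contour_ok = hamming(wavelet_hash(gray_a), wavelet_hash(gray_b)) <= max_bits
    color_ok = histogram_intersection(
        color_histogram(rgb_a), color_histogram(rgb_b)) >= min_color
    return contour_ok and color_ok
```

    Requiring both checks reflects the reasoning in the abstract: two pages with the same layout but different color schemes produce near-identical contour hashes, and the color comparison is what separates them.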
    This study ran experiments on very similar and locally similar cases. In the very similar experiments, wHash with a color histogram is more stable than pHash in threshold setting. This study also exploits the fact that many phishing websites use the same front-end framework: the website under test is compared with the blacklist in the cache, which further improves the utilization of the screenshot matcher. In the locally similar experiment, SIFT is better than Speeded Up Robust Features (SURF) at detecting logos, with an accuracy of up to 97.93%. In the overall performance test, accuracy reaches 98.14% on unbalanced data and 96.33% on balanced data. In addition, adding the cache further shortens detection time, improving detection speed by 4.6 times.
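    The role of the cache can be sketched as a cache-first lookup. The abstract only states that the site under test is compared with a blacklist held in the cache; the ordering of the stages, the class and function names, the bit threshold, and the choice to add newly flagged pages to the cache are all assumptions made for this illustration, not the thesis design.

```python
class BlacklistCache:
    """Holds screenshot hashes of pages already flagged as phishing (hypothetical)."""

    def __init__(self, max_bits=6):
        self.max_bits = max_bits  # assumed Hamming-distance tolerance
        self._hashes = []

    def add(self, bits):
        self._hashes.append(tuple(bits))

    def contains_similar(self, bits):
        """True if any cached hash differs from bits in at most max_bits positions."""
        return any(
            sum(a != b for a, b in zip(bits, known)) <= self.max_bits
            for known in self._hashes
        )


def detect(page, page_hash, cache, screenshot_matcher, logo_finder):
    """Check the cheap cache first, then fall back to the slower stages.

    screenshot_matcher and logo_finder are caller-supplied callables that
    take the page and return True when it imitates the protected site.
    """
    if cache.contains_similar(page_hash):
        return "phishing (cache hit)"
    if screenshot_matcher(page) or logo_finder(page):
        cache.add(page_hash)  # later clones of this page hit the cache
        return "phishing"
    return "not flagged"
```

    Under these assumptions, a phishing kit reused across many URLs pays the full matching cost once, and every later copy is caught by the hash comparison alone, which is the kind of saving that would show up as a detection speedup.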

    Abstract (Chinese)
    Abstract
    Acknowledgments
    Contents
    List of Figures
    List of Tables
    Chapter 1 Introduction
    1.1 Motivation
    1.2 Contributions
    1.3 Organization
    Chapter 2 Background Knowledge
    2.1 Phishing site appearance
    2.2 Very Similar
    2.2.1 Locality-Sensitive Hashing (LSH)
    2.2.2 Wavelet Hashing (wHash)
    2.2.3 Color Histogram
    2.3 Local Similar
    2.3.1 Scale-Invariant Feature Transform (SIFT)
    Chapter 3 Phishing Detect System based on Visual Mechanisms
    3.1 System Overview
    3.2 Detection Mechanisms
    3.2.1 Offline Phase
    3.2.2 Online Phase
    3.3 Modules
    3.3.1 Cache
    3.3.2 Screenshot Matcher
    3.3.2.1 Contour Matcher
    3.3.2.2 Color Matcher
    3.3.3 Logo Finder
    Chapter 4 System Environment and Performance Analysis
    4.1 System Environment
    4.2 Experiment Settings
    4.3 Performance Analysis
    4.3.1 Very Similar Cases
    4.3.2 Local Similar Cases
    4.3.3 Complete Test
    4.4 Summary
    Chapter 5 Conclusion and Future Work
    5.1 Conclusion
    5.2 Future Work
    References

