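Visual Mechanisms Approach to Detect Phishing Website for Specific Enterprise (特定企業之視覺化釣魚網站偵測) | National Taiwan University of Science and Technology Thesis and Dissertation System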

Graduate student:	黃冠龍 Kuan-Lung Huang
Thesis title:	特定企業之視覺化釣魚網站偵測 Visual Mechanisms Approach to Detect Phishing Website for Specific Enterprise
Advisor:	陳俊良 Jiann-Liang Chen
Oral defense committee:	楊竹星 Chu-Sing Yang, 黎碧煌 Bih-Hwang Lee, 馬奕葳 Yi-Wei Ma
Degree:	Master
Department:	College of Electrical Engineering and Computer Science - Department of Electrical Engineering
Year of publication:	2019
Academic year of graduation:	107 (ROC calendar, 2018-2019)
Language:	English
Number of pages:	63
Keywords:	phishing website, wHash, SIFT, visualization mechanism, image similarity

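Phishing on the Internet has continued unabated for the past 20 years. Heuristic-based machine learning methods have achieved major breakthroughs in recent years, but phishing techniques keep evolving, so some features that used to be highly discriminative now classify less effectively. Detecting phishing websites therefore calls for features that do not change easily over time.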
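This study proposes a visual mechanism for detecting phishing websites that target a specific enterprise's website. For classifying phishing websites, visual features are features that do not change easily over time. For very similar cases, Wavelet hashing (wHash) is combined with color histogram comparison to assess the similarity of webpage screenshots: wHash tolerates noise in image information better than Perceptual hashing (pHash), and the color histogram compensates for wHash's weakness of relying on contour similarity alone. Locally similar cases exploit the fact that phishing websites use well-known brand logos: Scale-Invariant Feature Transform (SIFT) detects the logo, which further improves on the accuracy obtained from contour and color similarity, and a cache is used to raise overall speed.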
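Experiments were run separately on very similar and locally similar cases. The results show that, in the very similar experiments, wHash with a color histogram is more stable than pHash when setting the threshold. The study also exploits the fact that many phishing websites share the same front-end framework: the website under test is compared against the blacklist in the cache, which further increases the utilization of global matching. In the locally similar experiments, SIFT outperforms Speeded Up Robust Features (SURF) at logo detection, with accuracy of up to 97.93%. In the overall performance test on phishing websites, accuracy reaches 98.14% on unbalanced data and 96.33% on balanced data. In addition, adding the cache shortens detection time, and detection speed can increase by 4.6 times.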
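
The logo-detection step rests on matching local feature descriptors between a brand logo and a page screenshot. The following is a minimal sketch of the descriptor-matching stage only, using the ratio test from Lowe's SIFT paper. It is not the thesis implementation: descriptor extraction is omitted, the toy 4-dimensional vectors stand in for real 128-dimensional SIFT descriptors, and the function names, the `ratio=0.75` value, and the `min_matches` threshold are all assumptions made for illustration.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two descriptor vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def ratio_test_matches(query_desc, target_desc, ratio=0.75):
    """Return (query_index, target_index) pairs that pass the ratio test.

    A query descriptor is matched only when its nearest target descriptor
    is clearly closer than the second nearest one.
    """
    matches = []
    for qi, q in enumerate(query_desc):
        dists = sorted((euclidean(q, t), ti) for ti, t in enumerate(target_desc))
        if len(dists) < 2:
            continue  # the ratio test needs two candidate neighbours
        (d1, best), (d2, _) = dists[0], dists[1]
        if d1 < ratio * d2:
            matches.append((qi, best))
    return matches


def logo_present(logo_desc, page_desc, min_matches=2, ratio=0.75):
    """Hypothetical decision rule: enough unambiguous matches means a logo hit."""
    return len(ratio_test_matches(logo_desc, page_desc, ratio)) >= min_matches


if __name__ == "__main__":
    logo = [[0, 0, 0, 0], [10, 10, 10, 10]]
    page = [[0.1, 0, 0, 0], [10, 10, 10, 9.9], [50, 50, 50, 50]]
    print(ratio_test_matches(logo, page), logo_present(logo, page))
```

The ratio test discards ambiguous matches, which matters on webpages where repeated text or icons produce many near-identical descriptors.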

Phishing attacks on the web have continued without interruption for the past 20 years. Heuristic-based machine learning methods have made major breakthroughs in recent years. However, phishing methods are constantly updated, which makes some formerly significant features less effective for classification. It is therefore necessary to detect phishing sites using features that do not change easily over time.
This study proposes a visual mechanism for detecting phishing websites that target a specific enterprise's website. For classifying phishing websites, visual features do not change easily over time. From a visual standpoint, phishing sites fall into three categories: very similar, locally similar, and non-imitation.
For very similar cases, Wavelet hashing (wHash) and color histogram comparison are used to evaluate the similarity of webpage screenshots. wHash has a higher tolerance for image noise than Perceptual hashing (pHash), and the color histogram compensates for the weakness of wHash, which relies on contour similarity alone. For locally similar cases, phishing websites use well-known brand logos, which Scale-Invariant Feature Transform (SIFT) can detect; this further improves on the accuracy of contour and color similarity. A cache is used to improve overall speed.
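
To make the contour-plus-color idea concrete, here is a minimal sketch of a wavelet-style hash combined with a color histogram comparison, in pure Python. It is an illustration under stated assumptions, not the thesis code: the hash keeps only the Haar approximation band (up to a constant scale factor) and thresholds it at the median, which is simpler than library implementations of wHash, and the names `wavelet_hash`, `color_histogram`, `similar` and the thresholds `max_hamming` and `min_overlap` are hypothetical.

```python
from statistics import median


def haar_ll(img):
    """One level of the 2D Haar approximation band: average each 2x2 block."""
    h, w = len(img), len(img[0])
    return [[(img[r][c] + img[r][c + 1] + img[r + 1][c] + img[r + 1][c + 1]) / 4.0
             for c in range(0, w, 2)]
            for r in range(0, h, 2)]


def wavelet_hash(img, hash_size=8):
    """Hash a square grayscale image whose side is hash_size * 2**k.

    The image is reduced to hash_size x hash_size and each coefficient
    becomes one bit: 1 if it is above the median, else 0.
    """
    band = img
    while len(band) > hash_size:
        band = haar_ll(band)
    flat = [v for row in band for v in row]
    med = median(flat)
    return [1 if v > med else 0 for v in flat]


def hamming(a, b):
    """Number of differing bits between two equal-length hashes."""
    return sum(x != y for x, y in zip(a, b))


def color_histogram(pixels, bins=4):
    """Normalized joint RGB histogram with `bins` levels per channel."""
    counts = [0] * (bins ** 3)
    step = 256 // bins
    for r, g, b in pixels:
        counts[(r // step) * bins * bins + (g // step) * bins + (b // step)] += 1
    return [c / len(pixels) for c in counts]


def histogram_intersection(h1, h2):
    """Overlap of two normalized histograms, from 0.0 (disjoint) to 1.0 (equal)."""
    return sum(min(a, b) for a, b in zip(h1, h2))


def similar(gray_a, gray_b, rgb_a, rgb_b, max_hamming=10, min_overlap=0.8):
    """Hypothetical rule: contour hash AND color distribution must both agree."""
    contour_ok = hamming(wavelet_hash(gray_a), wavelet_hash(gray_b)) <= max_hamming
    color_ok = histogram_intersection(color_histogram(rgb_a),
                                      color_histogram(rgb_b)) >= min_overlap
    return contour_ok and color_ok
```

Requiring both checks reflects the point made above: two pages can share a layout (close hashes) yet differ in color scheme, and the histogram term is what separates them.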
This study ran experiments on very similar and locally similar cases. In the very similar experiments, wHash with a color histogram is more stable than pHash in threshold setting. This study also exploits the fact that phishing websites often use the same front-end framework: the website under test is compared with the blacklist in the cache, which further improves the utilization of the screenshot matcher. In the locally similar experiments, SIFT is better than Speeded Up Robust Features (SURF) at detecting logos, with accuracy of up to 97.93%. In the overall performance test, accuracy reaches 98.14% on unbalanced data and 96.33% on balanced data. In addition, adding the cache further shortens detection time: detection speed increases by 4.6 times.
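
The cache comparison described above can be pictured as a lookup of the incoming page's screenshot hash against hashes of previously confirmed phishing pages, falling back to the slower detectors only on a miss. The sketch below is an assumption about how such a cache might be organized, not the design documented in the thesis: the class `BlacklistCache`, the function `classify`, the integer hash representation, and the `max_distance` threshold are all hypothetical.

```python
class BlacklistCache:
    """Stores integer hashes of pages already confirmed as phishing."""

    def __init__(self, max_distance=4):
        self.max_distance = max_distance  # tolerated Hamming distance for a hit
        self._entries = []                # list of (page_hash, label) pairs

    def add(self, page_hash, label):
        self._entries.append((page_hash, label))

    def lookup(self, page_hash):
        """Return the cached label of the first near-duplicate hash, else None."""
        for cached_hash, label in self._entries:
            if bin(cached_hash ^ page_hash).count("1") <= self.max_distance:
                return label
        return None


def classify(page_hash, cache, slow_detector):
    """Return (label, served_from_cache).

    `slow_detector` stands in for the full screenshot and logo analysis;
    it is called only when the cache has no near-duplicate entry.
    """
    hit = cache.lookup(page_hash)
    if hit is not None:
        return hit, True
    label = slow_detector(page_hash)
    if label == "phishing":
        cache.add(page_hash, label)  # later copies of this page become hits
    return label, False


if __name__ == "__main__":
    cache = BlacklistCache()
    detector = lambda h: "phishing"  # placeholder for the expensive path
    print(classify(0b1111000011110000, cache, detector))
    print(classify(0b1111000011110001, cache, detector))
```

Because phishing pages built from the same front-end framework tend to produce near-identical hashes, repeated lookups of this kind would skip the expensive path, which is consistent with the speed-up the abstract attributes to the cache.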

Abstract (Chinese)
Abstract
Acknowledgements
Contents
List of Figures
List of Tables
Chapter 1    Introduction
1    Motivation
2    Contributions
3    Organization
Chapter 2    Background Knowledge
1    Phishing site appearance
2    Very Similar
2.1    Locality-Sensitive Hashing (LSH)
2.2    Wavelet Hashing (wHash)
2.3    Color Histogram
3    Local Similar
3.1    Scale-Invariant Feature Transform (SIFT)
Chapter 3    Phishing Detect System based on Visual Mechanisms
1    System Overview
2    Detection Mechanisms
2.1    Offline Phase
2.2    Online Phase
3    Modules
3.1    Cache
3.2    Screenshot Matcher
3.2.1    Contour Matcher
3.2.2    Color Matcher
3.3    Logo Finder
Chapter 4    System Environment and Performance Analysis
1    System Environment
2    Experiment Settings
3    Performance Analysis
3.1    Very Similar Cases
3.2    Local Similar Cases
3.3    Complete Test
4    Summary
Chapter 5    Conclusion and Future Work
1    Conclusion
2    Future Work
References

                                

