Graduate student: Eric Dao (姚昭宇)
Thesis title: Application of Network Domain Location Features in Supervised Machine Learning for Ecommerce Scam Domain Detection (應用網路域名位置特徵於監督式機器學習的詐騙域名偵測)
Advisor: Wei-Chung Teng (鄧惟中)
Oral defense committee: Kenny Huang (黃勝雄)
Hsing-Kuo Pao (鮑興國)
Jiann-Liang Chen (陳俊良)
Degree: Master
Department: College of Electrical Engineering and Computer Science - Department of Computer Science and Information Engineering
Year of publication: 2021
Academic year of graduation: 109 (ROC calendar, 2020-2021)
Language: English
Number of pages: 68
Keywords: domain, location, feature, ecommerce, scam, network

  • Ecommerce scams are a form of cybercrime that affects online shoppers in nearly every country. Criminal groups set up deceptive ecommerce websites that lure consumers into paying for products that are never delivered. Researchers have used a variety of domain features, from a website's HTML source code to a domain's DNS records, to build frameworks that identify ecommerce scam websites. However, much of the previous literature has neglected a domain's location data as a means of differentiating ecommerce scam websites from benign ecommerce websites. This thesis investigates the use of a domain's location data as novel features for detecting ecommerce scam websites.
    First, testing with supervised machine learning models showed that our novel domain location features, in the form of domain location co-occurrences and geographical distances, are effective for detecting ecommerce scam domains. Second, to our knowledge, we are the first researchers to perform a detailed analysis of domain location features across benign and scam ecommerce domains. This analysis revealed that ecommerce scam domains, compared with benign ecommerce domains, tended to have much lower location co-occurrence with, and larger location distances from, the country they were marketing to. Third, we analyzed the location features in our dataset at the level of a single country and, to our knowledge, are the first researchers to report current trends in domain location data for ecommerce scam and benign websites in Taiwan. Ecommerce scam domains in Taiwan had more location associations with China and few or none with Taiwan, whereas benign ecommerce domains in Taiwan had more location associations with Taiwan and few or none with China. This is strong evidence to suspect that foreign scam groups targeting a specific country find it difficult, risky, or costly to place their scam domain's various location data in the target country. The novel domain location features introduced in this thesis therefore appear to be viable for detecting ecommerce scam domains, since scam groups likely cannot change them at will to evade detection.
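To make the two feature families in the abstract concrete, the following is a minimal sketch, not the thesis's actual method. It assumes a co-occurrence feature counts how many of a domain's location fields match the target country, and a distance feature averages the great-circle distance from the target country to each field's country. The function names, the field names (`ip`, `whois`, `name_server`), the approximate centroid coordinates, and the use of the haversine formula are all illustrative assumptions; this record does not give the thesis's exact definitions.

```python
from math import asin, cos, radians, sin, sqrt

# Hypothetical, approximate country centroids (latitude, longitude in degrees).
CENTROIDS = {
    "TW": (23.7, 121.0),
    "CN": (35.9, 104.2),
    "US": (37.1, -95.7),
}

EARTH_RADIUS_KM = 6371.0


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(radians, (*a, *b))
    h = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(h))


def location_features(target, observed):
    """Compute illustrative location features for one domain.

    target:   country code of the audience the site markets to.
    observed: mapping of data-source name -> country code (None if unknown).
    """
    # Keep only sources whose country is known to the centroid table.
    known = [c for c in observed.values() if c in CENTROIDS]
    # Assumed co-occurrence definition: number of sources located in the target country.
    co_occurrence = sum(1 for c in known if c == target)
    # Assumed distance definition: mean distance from target country to each source.
    distances = [haversine_km(CENTROIDS[target], CENTROIDS[c]) for c in known]
    mean_distance = sum(distances) / len(distances) if distances else None
    return {"co_occurrence": co_occurrence, "mean_distance_km": mean_distance}


# Two made-up domains that both market to Taiwan.
benign = location_features("TW", {"ip": "TW", "whois": "TW", "name_server": "TW"})
scam = location_features("TW", {"ip": "US", "whois": "CN", "name_server": "CN"})
```

Under these assumptions, the made-up benign domain has high co-occurrence and zero mean distance, while the made-up scam domain has no co-occurrence and a large mean distance, which mirrors the contrast the abstract reports. Feature vectors of this kind could then be passed to any supervised classifier.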

    Table of Contents
    Abstract
    Acknowledgements
    Table of Contents
    List of Figures & List of Tables
    Chapter 1 Introduction
      1.1 Usage of the term “Country” Disclaimer
      1.2 Motivation
      1.3 Research Goals
      1.4 Contributions
      1.5 Outline of Thesis
    Chapter 2 Background and Related Work
      2.1 Background
        2.1.1 Context Features
          DNS Resolution Features
          IP Address Geolocation Features
          WHOIS Features
          SSL certificate Features
        2.1.2 Content Features
          HTML, CSS, JavaScript Features
          Actual Content Features
      2.2 Related Work
    Chapter 3 Research Methodology
      3.1 Data Collection
        3.1.1 Ecommerce Scam Websites in Taiwan
        3.1.2 Benign Ecommerce Websites in Taiwan
      3.2 Location Data Feature Collection
        3.2.1 Target Audience
        3.2.2 Presence of ccTLD
        3.2.3 IP address Geolocation Data Collection
        3.2.4 DNS Resolution Location-Relevant Data Collection
        3.2.5 WHOIS Registration Location-Relevant Data Collection
        3.2.6 Other DNS and WHOIS Non-Location Relevant Collected Data
        3.2.7 Website SSL Certificate Data Collection
      3.3 Location Co-occurrence and Location Distance Calculation
        3.3.1 Collected Location Data
        3.3.2 Location Co-Occurrence Calculation
        3.3.3 Location Distance Calculation
      3.4 Machine Learning for Scam Classification to Determine Feature Significance
        3.4.1 Supervised Machine Learning
        3.4.2 Features Used
        3.4.3 Data Preprocessing
      3.5 Results
        3.5.1 Model Results
        3.5.2 Feature Importance
      3.6 Conclusion About Location Feature Significance
    Chapter 4 Analysis of Broad Location Feature Differences Between Scam and Benign Ecommerce Domains
        4.1.1 Location Co-occurrence Features
        4.1.2 Location Distance Features
        4.1.3 Conclusion of Analysis of Location Feature Differences Between Scam and Benign Ecommerce Domains
    Chapter 5 Analysis of the Finer Location Feature Differences Between Scam and Benign Ecommerce Domains in Taiwan
      5.1 Location Feature Differences Between Taiwan Ecommerce Scam and Benign Websites
      5.2 Location Feature Differences Between Taiwan Ecommerce Scam and Benign .tw Websites
      5.3 Domain Services Organization Location Feature Differences Between Taiwan Ecommerce Scam and Benign Websites
      5.4 Conclusion of Analysis of the Finer Location and Other Domain Feature Differences Between Scam and Benign Ecommerce Domains in Taiwan
    Chapter 6 Overall Conclusion
    References

