
Author: 林秉賢
Ping-Hsien Lin
Thesis Title: 一個偵測行動裝置即時通訊訊息的反詐騙系統-以臉書即時通為例
A Fraud Detection System for Real-time Messaging Communication on Android Facebook Messenger
Advisor: 羅乃維
Nai-Wei Lo
Committee: 吳宗成
Tzong-Chen Wu
葉國暉
Kuo-Hui Yeh
Degree: 碩士
Master
Department: 管理學院 - 資訊管理系
Department of Information Management
Thesis Publication Year: 2015
Graduation Academic Year: 103
Language: English
Pages: 42
Keywords (in Chinese): 詐騙偵測, 潛在語意模型, 餘弦相似度
Keywords (in other languages): Fraud Detection, Latent Semantic Analysis, Cosine Similarity
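  • With the spread of smartphones and the rise of mobile messaging applications (such as Facebook, Line, and WeChat), the distance between people has shrunk and the cost of communication has fallen. But the convenience that information technology brings also carries risks: besides high-risk application permissions that leak personal private information, these applications are used by fraud rings as tools for fraud. In recent years, many fraud cases have been committed through mobile messaging applications, with fraudsters exploiting human weaknesses in chat conversations to swindle money.
    In this thesis, we design an anti-fraud system that detects fraud in instant messages on mobile devices, taking Facebook Messenger as the example, to address the fraud problem described above. The system uses natural language processing, matrix processing, latent semantic analysis, and cosine similarity to process the input data. We collect many fraud-related news reports and cases to verify the feasibility of detecting fraud events with the system. Finally, the mobile application that accompanies the system warns the user whether a chat log is a fraud event.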


    In recent years, smartphone use has risen rapidly, and a wide variety of mobile applications has been developed, such as “Facebook”, “Line”, and “WeChat”. These applications not only make it easier for people to communicate with each other, but also reduce the cost of phone calls and short messages. However, the convenience of the smartphone comes with potential risks. For example, some high-risk permissions can expose personal private information. In Taiwan, fraudsters also use these applications as tools to commit fraud.
    In this thesis, we develop a fraud detection system for real-time messaging communication to address this problem. The system processes input data with natural language processing, matrix processing, latent semantic analysis, and cosine similarity. We collect news reports and cases about fraud events as training data, and intercept real-time chat logs from “Facebook Messenger” as testing data to verify the feasibility of the system. Finally, we develop a mobile application that warns the user whether a real-time chat log is a fraud event.
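
    The processing chain named in the abstract (term weighting, latent semantic analysis, cosine similarity) can be sketched roughly as follows. This is a minimal illustration in Python with NumPy, not the thesis's implementation: the toy English tokens, the smoothed IDF formula, the rank k = 2, every function name, and the 0.5 threshold are assumptions made for the example. The thesis itself segments Chinese text with CKIP and defines its own classification rules.

```python
import numpy as np

def build_tfidf(docs):
    """Build a term-by-document TF-IDF matrix from tokenized documents."""
    vocab = sorted({tok for doc in docs for tok in doc})
    index = {tok: i for i, tok in enumerate(vocab)}
    counts = np.zeros((len(vocab), len(docs)))
    for j, doc in enumerate(docs):
        for tok in doc:
            counts[index[tok], j] += 1
    df = np.count_nonzero(counts, axis=1)            # document frequency per term
    idf = np.log((1 + len(docs)) / (1 + df)) + 1.0   # smoothed IDF (an assumption)
    return counts * idf[:, None], index, idf

def vectorize(tokens, index, idf):
    """Map a new tokenized message into the same TF-IDF term space."""
    vec = np.zeros(len(index))
    for tok in tokens:
        if tok in index:                             # unseen terms are ignored
            vec[index[tok]] += 1
    return vec * idf

def lsa_fit(A, k):
    """Truncated SVD: A ~ U_k diag(s_k) V_k^T. Rows of V_k are document vectors."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U[:, :k], s[:k], Vt[:k, :].T

def fold_in(q, U_k, s_k):
    """Project a term-space vector into the latent space: q^T U_k S_k^{-1}."""
    return (q @ U_k) / s_k

def cosine(a, b):
    """Cosine similarity, defined as 0.0 when either vector is all zeros."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom else 0.0

# Hypothetical training data: already-tokenized fraud case descriptions.
fraud_docs = [
    ["urgent", "transfer", "money", "account"],
    ["lend", "money", "urgent", "hospital"],
    ["buy", "gift", "card", "code", "urgent"],
]
A, index, idf = build_tfidf(fraud_docs)
U_k, s_k, doc_vecs = lsa_fit(A, k=2)

# Hypothetical incoming chat message, compared against every training case.
message = ["please", "transfer", "money", "urgent"]
msg_vec = fold_in(vectorize(message, index, idf), U_k, s_k)
score = max(cosine(msg_vec, d) for d in doc_vecs)

THRESHOLD = 0.5  # hypothetical cut-off, not taken from the thesis
is_suspicious = score >= THRESHOLD
```

    Folding a training document back in with fold_in reproduces its row of doc_vecs, which is a quick sanity check that new messages and training cases are compared in the same latent space.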

    Chinese Abstract
    Abstract
    Acknowledgements
    Contents
    List of Figures
    List of Tables
    Chapter 1 Introduction
    Chapter 2 Preliminaries
        2.1 Semantic Models
            2.1.1 Latent Semantic Analysis
            2.1.2 Probabilistic Latent Semantic Analysis
            2.1.3 Latent Dirichlet Allocation
        2.2 Decision Models
            2.2.1 Cosine Similarity
            2.2.2 Jaccard Similarity
            2.2.3 Dice Similarity
    Chapter 3 The Proposed Fraud Detection System
        3.1 System Architecture
        3.2 Data Flow of the Fraud Detection System
        3.3 Data Collection
        3.4 Natural Language Processing
            3.4.1 CKIP Word Segmentation
            3.4.2 Stop Word
            3.4.3 Special Symbol
        3.5 Matrix Processing
            3.5.1 Vector Space Model (VSM)
            3.5.2 Term Frequency-Inverse Document Frequency Matrix
        3.6 Latent Semantic Analysis
        3.7 Classification Rules
    Chapter 4 System Implementation, Testing Scenarios and Discussion
        4.1 System Implementation
        4.2 Testing Scenarios
        4.3 Discussion
    Chapter 5 Conclusion
    References

