
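Graduate student: 吳建興
Jain-Shing Wu
Thesis title: 用於資料防護之多面向使用者行為剖析與身份學習辨識
Multi-Vector User Behavior Profiling and Learning to Classify Identity for Data Protection
Advisors: 李漢銘
Hahn-Ming Lee
李育杰
Yuh-Jye Lee
Oral defense committee: 黃彥男
Yen-Nun Huang
邱舉明
Ge-Ming Chiu
林永松
Yeong-Sung Lin
鮑興國
Hsing-Kuo Pao
鄧惟中
Wei-Chung Teng
Degree: Doctoral (博士)
Department: College of Electrical Engineering and Computer Science (電資學院) - 資訊工程系
Department of Computer Science and Information Engineering
Year of publication: 2017
Academic year of graduation: 105 (ROC calendar)
Language: English
Number of pages: 81
Keywords (translated from Chinese): user identity classification, keystroke behavior profiling, user behavior analysis, machine learning, mouse movement, touch-screen gestures, AD activity logs
Keywords (English): classify user identity, keystroke profiling, user behavior analysis, machine learning, mouse movement, touch-screen gestures, AD logs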
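  • In recent years, corporate data leakage incidents have become frequent and their impact
    more severe. In the Internet of Things era, ever more computer hosts, mobile devices, and
    cloud systems are exposed to attacks from hackers around the world, multiplying concerns
    about leaks of private and sensitive data. Existing techniques for protecting sensitive
    data mostly scan and filter file contents, so they rely heavily on the ability to parse
    many file formats, and sensitive files that cannot be parsed often go unprotected.
    Moreover, when a document is created or edited on an endpoint, its author is often not
    reliably identified, which makes it hard to judge the document's sensitivity and apply
    controls at the same time. If an identity is stolen or a malicious insider steals
    documents, detection is like finding a needle in a haystack, and confidential documents
    continue to be stolen.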

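    This thesis proposes a data security identification and control mechanism based on
    multiple facets of user behavior, consisting mainly of (a) recognition of physical
    operating behavior on endpoint devices and (b) analysis of intention behavior in the
    system environment. Physical behavior recognition captures each user's distinctive
    operating habits on a computer as continuous behavioral feature values, including
    keystroke typing rate, mouse movement direction and speed, and, on smartphone touch
    screens, press position, swipe speed, press area, and pressure. An SVM classification
    learning method produces an individual behavior model for each user, on which a
    continuous identity recognition method is built to detect impersonation, tampering, and
    unauthorized access. For intention behavior analysis, application-layer activity logs,
    including AD (Active Directory) and proxy logs, are analyzed to detect abnormal data
    access by users. A Markov chain models each user's normal behavior pattern, which serves
    as the baseline for detecting anomalies, so that a malicious user's intent can be
    discovered early.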
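
The SVM-based identification step can be illustrated with a minimal sketch. It is not the thesis implementation: the two users, their feature values (keystrokes per second and mean mouse speed in px/s), and all function names are hypothetical, and the classifier is a plain linear SVM trained by batch subgradient descent on the regularized hinge loss. The bias term is omitted as a simplification, which is reasonable here only because the features are standardized and the two classes are balanced.

```python
from statistics import mean, pstdev


def standardize(rows):
    """Z-score every feature column; return scaled rows and the (mean, std) pairs."""
    stats = [(mean(col), pstdev(col)) for col in zip(*rows)]
    scaled = [[(v - m) / s for v, (m, s) in zip(row, stats)] for row in rows]
    return scaled, stats


def train_linear_svm(xs, ys, lam=0.01, lr=0.1, epochs=200):
    """Batch subgradient descent on the L2-regularized hinge loss (no bias term)."""
    w = [0.0] * len(xs[0])
    n = len(xs)
    for _ in range(epochs):
        grad = [lam * wj for wj in w]          # gradient of the regularizer
        for x, y in zip(xs, ys):
            score = sum(wj * xj for wj, xj in zip(w, x))
            if y * score < 1:                  # sample violates the margin
                for j, xj in enumerate(x):
                    grad[j] -= y * xj / n
        w = [wj - lr * gj for wj, gj in zip(w, grad)]
    return w


def predict(w, stats, sample):
    """Return +1 (user A) or -1 (user B) for a raw, unscaled feature vector."""
    z = [(v - m) / s for v, (m, s) in zip(sample, stats)]
    return 1 if sum(wj * zj for wj, zj in zip(w, z)) >= 0 else -1


# Hypothetical training data: (keystrokes per second, mean mouse speed in px/s).
user_a = [(5.1, 410), (4.8, 395), (5.3, 420), (5.0, 405)]
user_b = [(3.0, 250), (3.4, 265), (2.9, 240), (3.2, 255)]
samples = user_a + user_b
labels = [1] * len(user_a) + [-1] * len(user_b)

scaled, stats = standardize(samples)
weights = train_linear_svm(scaled, labels)

print(predict(weights, stats, (4.9, 400)))   # resembles user A
print(predict(weights, stats, (3.1, 245)))   # resembles user B
```

Continuous identification would apply `predict` to each new window of input events; distinguishing more than two users would need a multiclass scheme such as one-versus-rest, which this sketch does not cover.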

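    Identifying users through physical and intention behavior in this way overcomes the
    limitation that DLP cannot parse files of unknown format. For physical operating
    behavior recognition, experiments on keyboard and mouse behavior show an identification
    accuracy of 92.64%, and experiments on smartphone touch behavior reach 98.60% accuracy.
    For intention behavior analysis in the system environment, experiments on AD logs reach
    an anomaly detection rate of 85.66%. Overall, analysis of human behavior does support
    secure identification and control and effectively reduces the risk of data leakage.
    Combined with analysis of more kinds of logs, the approach could be extended to
    detecting financial fraud and attacks on infrastructure.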


    In recent years, data leakage has become an increasingly serious problem for companies and organizations. As the Internet of Things continues to expand, more computer software, connected devices, and cloud systems are vulnerable to attacks from anywhere in the world. Any data leakage incident can cause severe financial loss or reputational damage for individuals and corporations. The problem is becoming more pronounced as people increasingly depend on smartphones in daily life and business. To protect sensitive data, traditional solutions such as content scanning and file filtering are commonly used, but they depend on the ability to parse various file formats. For unsupported file formats, this type of approach is ineffective and the risk of a data breach persists. Furthermore, these methods cannot discover malicious tampering by unauthorized users during data creation and modification. Risks exist even with authorized users: a privileged account may be compromised, or a database may be accessed by a malicious employee. Detecting this type of malicious behavior among normal privileged activities is like finding a needle in a haystack.

    This thesis proposes a user-behavior-based approach with multi-vector profiling, which can be regarded as a complementary solution to the state-of-the-art content-based data loss prevention (DLP) model. The proposed approach includes two types of behavior analysis to actively identify data creators and malicious activity. The first concerns the physical behavior of the user on the endpoint device, such as keystrokes, mouse movement, and touch-screen gestures. Support vector machine (SVM) classification is used to learn a behavior model for each user and to identify the user. The second analyzes user intent through user behavior in the system environment, including account access, privilege escalation, and web browsing. Using a Markov chain built from AD (Active Directory) and proxy logs, this analysis models user behavior patterns to detect malicious activity.

    By using physical and intention behavior analysis to recognize user identity, the proposed approach overcomes the limitation of having to parse unknown file formats. For physical behavior analysis, the experiment on keystroke and mouse movement operations achieves 92.64% accuracy, while the experiment on touch-screen gestures achieves 98.60% accuracy. For intention behavior analysis in the system environment, the experiment with AD logs yields an anomaly detection rate of up to 85.66%. In summary, the proposed approach based on user behavior analysis can help protect data by effectively identifying data creators and malicious activity, and thereby helps mitigate the risk of data leakage. In the future, the approach could be applied to financial fraud detection, infrastructure threat detection, and other areas of security.
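
The Markov-chain idea described above (model a user's normal log behavior, then measure how far new activity deviates from it) can be sketched as follows. This is an illustration under stated assumptions, not the thesis's ChainSpot method: states are taken to be Windows security event IDs (4768, 4769, 4771, 4624), the sequences are made up, and the deviation measure (half the L1 difference between corresponding transition rows, averaged over states) is a simple stand-in chosen for this example.

```python
from collections import Counter, defaultdict


def build_chain(sequences):
    """Estimate first-order transition probabilities from event sequences."""
    counts = defaultdict(Counter)
    for seq in sequences:
        for cur, nxt in zip(seq, seq[1:]):
            counts[cur][nxt] += 1
    return {
        state: {nxt: c / sum(nxts.values()) for nxt, c in nxts.items()}
        for state, nxts in counts.items()
    }


def deviation(base, other):
    """Average, over all source states, of half the L1 difference between rows.

    A state that has no outgoing transitions in one chain is treated as an
    all-zero row there, so it contributes 0.5.
    """
    states = set(base) | set(other)
    if not states:
        return 0.0
    total = 0.0
    for s in states:
        row_a, row_b = base.get(s, {}), other.get(s, {})
        targets = set(row_a) | set(row_b)
        total += 0.5 * sum(abs(row_a.get(t, 0.0) - row_b.get(t, 0.0)) for t in targets)
    return total / len(states)


# Hypothetical AD log sequences for one user (Windows security event IDs).
baseline_logs = [
    [4768, 4769, 4624],
    [4768, 4769, 4624],
    [4768, 4769, 4769, 4624],
]
normal_day = [[4768, 4769, 4624]]
# Repeated Kerberos pre-authentication failures (4771) before a ticket request.
suspicious_day = [[4771, 4771, 4771, 4768, 4769]]

baseline = build_chain(baseline_logs)
normal_score = deviation(baseline, build_chain(normal_day))
suspicious_score = deviation(baseline, build_chain(suspicious_day))

print(normal_score, suspicious_score)
```

A detector built this way would flag a user when the score exceeds a threshold tuned on historical data; how that threshold is chosen, and how states are actually constituted, is left open here.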

    1. INTRODUCTION
    1.1 Background and Motivation
    1.2 Challenges
    1.2.1 File Format Parsing
    1.2.2 Data Creator Identification at an Endpoint Device
    1.2.3 Stealthy Activity Discovery in a System Environment
    1.3 The Proposed Approach
    1.3.1 Physical Behavior Analysis on an Endpoint Device
    1.3.2 Intention Behavior Analysis in System Environment
    2. RELATED WORK
    2.1 Data Loss Prevention Techniques
    2.2 Input Behavior Profiling of Touch Panel
    2.3 Abnormal Behavior Detection Techniques
    3. METHODOLOGY
    3.1 Overview
    3.2 Keystroke Behavior Learning
    3.2.1 Recording and Extracting
    3.2.2 Training and Testing
    3.3 Mouse Movement Behavior Learning
    3.4 Smartdevice Touch-Panel Behavior Learning
    3.5 Abnormal User Behavior Analysis
    3.5.1 Windows Active Directory Domain Service
    3.5.2 Proxy Service
    3.5.3 The Real Dataset
    3.5.4 The Proposed Method - ChainSpot
    3.5.5 State Constitution of Used Markov Chain
    3.5.6 Build a Markov Chain Given a Log Sequences Dataset
    3.5.7 Deviation Estimation Given Different Markov Chains
    4. EXPERIMENTS
    4.1 Framework Implementation
    4.1.1 Secure Keystream Analyzer (SKA)
    4.1.2 DLP System
    4.2 Keystroke Behavior Experiments
    4.3 Mouse Movement Behavior Experiments
    4.4 Practical Keystroke and Mouse Behavior Experiments
    4.4.1 Diverse Style Keystroke Experiment
    4.4.2 Combining Keystroke and Mouse Movement Profiling Findings
    4.5 Smartdevice Touch Panel Behavior Experiments
    4.6 Discussion on Content-Based and Behavior-Based DLP Models
    4.7 Abnormal User Behavior Experiments
    4.7.1 Effectiveness of Measuring Behavior Deviation Using ChainSpot
    4.7.2 Performance of Anomaly Detection Using ChainSpot
    4.7.3 Representative Demonstration and Case Study of ChainSpot
    5. CONCLUSIONS
    REFERENCES
    Appendix A: Example for Behavior Log File

