
Graduate student: 陳佳琤 (Chia-Cheng Chen)
Thesis title: Re-consider Time Series Analysis for Insider Threat Detection (Chinese title: 時間序列分析的重新檢視於內部威脅偵測)
Advisor: 鮑興國 (Hsing-Kuo Kenneth Pao)
Oral examination committee: 鄧惟中, 項天瑞, 曾俊元
Degree: Master
Department: College of Electrical Engineering and Computer Science, Department of Computer Science and Information Engineering
Publication year: 2024
Academic year of graduation: 112 (ROC calendar)
Language: English
Number of pages: 42
Keywords: Insider threat detection, Temporal analysis, Sentence embedding, Time series contrastive learning, Anomaly detection


    Insider threat detection (ITD) is a significant challenge in cybersecurity, particularly within large and complex organizations. Because research has traditionally focused on external threats, ITD has received comparatively little attention and development. Conventional ITD approaches are largely event-driven, and researchers have developed various rule-based methods for the task. These methods, however, tend to ignore the intrinsic temporal relationships between events that occur at different moments. For instance, we readily recognize causally linked events, such as one anomalous event followed by another specific event that completes a malicious action. We may still overlook events that occur at particular times, such as around 9 a.m. every morning during working hours.
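The 9 a.m. example describes a periodic pattern. One common way to expose such patterns to a model is to map hour-of-day and day-of-week onto unit circles using sine and cosine. This is a generic sketch, not the thesis's own encoding. The 24-hour and 7-day periods and the names `cyclical_time_features` and `euclidean` are illustrative assumptions.

```python
import math
from datetime import datetime


def cyclical_time_features(ts: datetime) -> list[float]:
    """Encode hour-of-day and day-of-week as points on two unit circles.

    Assumed periods: 24 hours and 7 days (illustrative, not from the thesis).
    Sine/cosine pairs make 23:00 and 00:00 neighbours, which a raw integer
    hour would not; the two pairs capture two periodicities.
    """
    # Fractional hour, so 09:30 sits between 09:00 and 10:00.
    hour = ts.hour + ts.minute / 60.0
    day = ts.weekday()  # Monday = 0 ... Sunday = 6
    return [
        math.sin(2 * math.pi * hour / 24.0),
        math.cos(2 * math.pi * hour / 24.0),
        math.sin(2 * math.pi * day / 7.0),
        math.cos(2 * math.pi * day / 7.0),
    ]


def euclidean(a: list[float], b: list[float]) -> float:
    """Plain Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


if __name__ == "__main__":
    mon_9am = cyclical_time_features(datetime(2010, 1, 4, 9, 0))  # a Monday
    tue_9am = cyclical_time_features(datetime(2010, 1, 5, 9, 0))  # a Tuesday
    mon_3am = cyclical_time_features(datetime(2010, 1, 4, 3, 0))
    # Compare only the hour-of-day pair (first two features).
    print(euclidean(mon_9am[:2], tue_9am[:2]))
    print(euclidean(mon_9am[:2], mon_3am[:2]))
```

Under this encoding, 9 a.m. on any day maps to the same hour-of-day point. A 3 a.m. event lands a quarter-circle away, at distance √2.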
    In this work, we reconsider temporal behavior in order to extract the information hidden in cyberspace activities. Specifically, effective sentence embeddings can provide informative internal representations that summarize the temporal behaviors in activity sequences. These representations enable more accurate insider threat detection.
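Treating a day of activity as a "sentence" is one way to make activity logs usable by a sentence encoder. The abstract does not give the thesis's actual input format, so this sketch makes several assumptions:

- It uses a hypothetical `Event` record.
- The activity types are loosely modeled on the CERT log types (logon, device, file, email, http).
- It serializes one user's day into time-ordered text and keeps the clock time, so temporal context survives.

Feeding the string to a sentence encoder such as BERT is not shown.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Event:
    """Hypothetical activity record; field names are illustrative only."""
    user: str
    timestamp: datetime
    activity: str  # e.g. "logon", "device", "file", "email", "http"
    detail: str    # free-text detail, e.g. "connect", "copy"


def day_to_sentence(events: list[Event]) -> str:
    """Serialize one user's events for one day into a single text sequence.

    Events are sorted by time and each becomes '<HH:MM> <activity> <detail>',
    so an encoder sees both the order of actions and when they happened.
    """
    ordered = sorted(events, key=lambda e: e.timestamp)
    tokens = [f"{e.timestamp:%H:%M} {e.activity} {e.detail}" for e in ordered]
    return " ; ".join(tokens)


if __name__ == "__main__":
    day = [
        Event("U001", datetime(2010, 1, 4, 17, 40), "device", "connect"),
        Event("U001", datetime(2010, 1, 4, 8, 55), "logon", "pc-17"),
        Event("U001", datetime(2010, 1, 4, 17, 42), "file", "copy"),
    ]
    print(day_to_sentence(day))
    # -> 08:55 logon pc-17 ; 17:40 device connect ; 17:42 file copy
```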
    In this thesis, we propose a novel insider threat detection methodology that adds temporal relationship modeling on top of mature event sequence analysis in order to catch insider threats effectively. The approach uses contrastive sentence embeddings to learn users' intentions from their activity sequences. It then applies a two-level Hierarchical Contrastive Learning (HCL) model that combines temporal behavior with user behavior embeddings. To further improve performance, it also accounts for multiple periodicities and time granularity.
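The abstract names contrastive sentence embeddings and an HCL model with user-level and event-level components (Sections 3.3, 3.5, and 3.6 in the contents). The sketch below illustrates the general technique, not the thesis's exact objective. It implements a SimCSE-style InfoNCE loss with in-batch negatives, then combines a user-level term and an event-level term using a weight `alpha`.

Several details are assumptions:

- The weighted sum, `alpha`, and the function names are illustrative.
- The temperature default of 0.05 follows SimCSE.
- How the two views of each item are produced (dropout noise as in SimCSE, or data augmentation) is outside the sketch.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)


def info_nce(anchors, positives, temperature: float = 0.05) -> float:
    """Mean InfoNCE loss with in-batch negatives (SimCSE-style).

    anchors[i] and positives[i] are two views of the same item; every
    positives[j] with j != i serves as a negative for anchors[i].
    """
    n = len(anchors)
    total = 0.0
    for i in range(n):
        logits = [cosine(anchors[i], positives[j]) / temperature for j in range(n)]
        m = max(logits)  # subtract the max for numerical stability
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_denom - logits[i]  # -log softmax of the positive pair
    return total / n


def hierarchical_loss(user_pairs, event_pairs, alpha: float = 0.5,
                      temperature: float = 0.05) -> float:
    """Hypothetical weighted sum of a user-level and an event-level term.

    Each *_pairs argument is an (anchors, positives) tuple.
    """
    user_term = info_nce(*user_pairs, temperature=temperature)
    event_term = info_nce(*event_pairs, temperature=temperature)
    return alpha * user_term + (1 - alpha) * event_term
```

The loss is low when each anchor is most similar to its own positive and high when the pairs are mismatched. This is what pushes the two views of the same user or event together.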
    To validate the methodology, we conduct extensive analyses and experiments on the publicly available CERT dataset released by Carnegie Mellon University. The results show that the method is effective and robust in detecting insider threats and identifying malicious scenarios. This highlights its potential for strengthening cybersecurity in complex organizational environments.
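The abstract lists time granularity as a design issue, and Sections 4.4 and 4.5 in the contents compare models at different granularities. The abstract does not say how the time windows are defined, so this is a hypothetical illustration. It groups timestamped events into fixed windows aligned to midnight, using naive datetimes with no time zones. A smaller `window` gives a finer granularity. The name `bucket_events` is illustrative and does not come from the thesis.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def bucket_events(events, window: timedelta) -> dict:
    """Group (timestamp, label) pairs into fixed-size time windows.

    Returns a dict mapping each window's start time to the labels that fall
    in it, in time order. Windows are aligned to the Unix epoch, so daily and
    hourly windows start on midnight and on the hour respectively.
    """
    epoch = datetime(1970, 1, 1)
    buckets = defaultdict(list)
    for ts, label in sorted(events):
        index = (ts - epoch) // window  # integer window number
        buckets[epoch + index * window].append(label)
    return dict(buckets)


if __name__ == "__main__":
    events = [
        (datetime(2010, 1, 4, 9, 5), "logon"),
        (datetime(2010, 1, 4, 9, 50), "http"),
        (datetime(2010, 1, 4, 17, 40), "device"),
    ]
    print(bucket_events(events, timedelta(days=1)))   # one daily window
    print(bucket_events(events, timedelta(hours=1)))  # two hourly windows
```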

    Table of contents:
    Recommendation Letter
    Approval Letter
    Abstract
    Acknowledgments
    Contents
    List of Figures
    List of Tables
    List of Algorithms
    1 Introduction
    2 Related Work
      2.1 Insider Threat Detection
      2.2 Sentence Embedding
      2.3 Contrastive Sentence Embedding
      2.4 Time Series Contrastive Learning
    3 Methodology
      3.1 Data Augmentation
      3.2 Contrastive Learning
      3.3 User-level Contrastive Learning
      3.4 Temporal Information Projection
      3.5 Event-level Contrastive Learning
      3.6 Hierarchical Contrastive Learning
      3.7 Fine-tuning and Evaluation
    4 Experiments
      4.1 Dataset and Experimental Setting
        4.1.1 Data Preprocessing
      4.2 Evaluation Metric
      4.3 Feasibility Study
      4.4 Model Performance for Different Granularity
        4.4.1 BERT-CL (User-level)
        4.4.2 Hierarchical Contrastive Learning (User-level + Event-level)
      4.5 Fine-Grained Granularity Analysis
    5 Conclusion
    References
    Letter of Authority

    Full text release dates:
    2026/08/05 (campus network)
    2029/08/05 (off-campus network)
    2029/08/05 (National Central Library: Taiwan Thesis and Dissertation System)