
Graduate Student: Jing-You Tang (湯京祐)
Thesis Title: A Study on Using Pre-trained Embedding Models for Finance Risk Prediction (使用預訓練表示法於財務風險指標預測之研究)
Advisor: Kuan-Yu Chen (陳冠宇)
Committee Members: Yi-Ling Chen (陳怡伶), Bi-Ru Dai (戴碧如)
Degree: Master
Department: Department of Computer Science and Information Engineering, College of Electrical Engineering and Computer Science
Year of Publication: 2019
Graduation Academic Year: 107 (ROC calendar)
Language: Chinese
Number of Pages: 93
Keywords (Chinese): 文字探勘, 財務風險, 機器學習, 語言模型
Keywords (English): Text mining, Financial risk, Machine learning, Language model
Abstract (Chinese, translated):
The purpose of this thesis is to use novel machine-learning model architectures and word embedding representations to effectively improve the accuracy of risk assessment. The research focuses on comparative experiments over a variety of features and methods: the features comprise log-scaled word-frequency features and a range of word embedding representations, from earlier to more recent models, while the training architectures are a basic deep neural network and a support vector regression model.
This thesis proposes three improvements. First, features from BERT, the newest pre-trained model, are used to effectively raise the accuracy of risk estimation. Second, a novel training scheme is proposed: instead of predicting an indicator's value itself, the model predicts each indicator's year-over-year difference, which improves the accuracy of risk estimation. Third, on the observation that word-embedding features and word-frequency features capture different information, the two feature types are combined by cosine distance so that they can compensate for each other and yield better accuracy. The experiments show that BERT features substantially raise the accuracy of risk prediction, and that the proposed difference model outperforms the original model when predicting volatility and post-event volatility; however, in the experiments where the two feature types were meant to compensate for each other, the results, although positive, brought little improvement.
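To make the baseline in the abstract concrete, below is a minimal sketch, assuming a toy corpus, hypothetical volatility targets, and a scikit-learn pipeline; it illustrates log-scaled (log(1 + tf)) word-frequency features fed to a support vector regression model, not the thesis's exact implementation.

```python
# A minimal sketch of the baseline described above; the toy corpus and targets
# are hypothetical. The thesis's actual data are financial reports paired with
# per-company risk-indicator labels.
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.svm import SVR

reports = [
    "the company faces significant litigation and currency risk",
    "revenue grew steadily and operating risk remained low",
]
volatility = np.array([0.42, 0.17])  # hypothetical risk-indicator targets

# Log-scaled word-frequency features: log(1 + tf) dampens raw term counts.
tf = CountVectorizer().fit_transform(reports).toarray().astype(float)
features = np.log1p(tf)

# Support vector regression over the LOG1P features, as in the baseline.
svr = SVR(kernel="linear")
svr.fit(features, volatility)
print(svr.predict(features))
```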


Abstract (English):
    The purpose of this thesis is to use state-of-the-art machine-learning model architectures and word vector representations to effectively increase the accuracy of risk assessment. The research focuses on comparative experiments with various features and methods. The features include word-frequency features on a logarithmic scale and various word vector representations; the training models are a basic deep neural network architecture and support vector regression.
    Three new methods are proposed. First, features from the state-of-the-art pre-trained model BERT are used to improve the accuracy of risk estimation. Second, a new training scheme is proposed: instead of predicting an indicator's value directly, we predict its year-over-year difference. Third, the word vector features and the word-frequency features are combined, using cosine distance, so that the two can compensate for each other and yield better precision. The experiments show that BERT features effectively increase the accuracy of risk prediction, and that the proposed difference model outperforms the original model when predicting volatility and post-event volatility; in the experiments combining the two feature types, however, the gains were modest.
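The three proposed improvements can be sketched in one short pipeline. The sketch below assumes the Hugging Face transformers library, the bert-base-uncased checkpoint, mean pooling over the last hidden layer, and toy year-over-year targets; the pooling choice, model name, and the cosine step are illustrative assumptions rather than the thesis's exact implementation.

```python
# A hedged sketch of the three proposed improvements; model name, mean pooling,
# toy data, and the cosine step are all illustrative assumptions.
import numpy as np
import torch
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.svm import SVR
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
bert = BertModel.from_pretrained("bert-base-uncased").eval()

def bert_doc_vector(text: str) -> np.ndarray:
    """Mean-pool BERT's last hidden layer into one fixed document vector."""
    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
    with torch.no_grad():
        hidden = bert(**inputs).last_hidden_state  # shape: (1, seq_len, 768)
    return hidden.mean(dim=1).squeeze(0).numpy()

reports = [
    "the company faces significant litigation and currency risk",
    "revenue grew steadily and operating risk remained low",
]
volatility_last_year = np.array([0.35, 0.20])  # hypothetical indicator values
volatility_this_year = np.array([0.42, 0.17])

# (1) BERT document embeddings replace the older word-vector representations.
X = np.stack([bert_doc_vector(r) for r in reports])

# (2) Difference model: regress on the year-over-year change of the indicator,
# then add last year's value back to recover an absolute prediction.
svr = SVR(kernel="linear")
svr.fit(X, volatility_this_year - volatility_last_year)
predicted_volatility = volatility_last_year + svr.predict(X)

# (3) The combination step compares word-frequency and word-vector views by
# cosine distance so they can compensate for each other; pairwise cosine
# similarity between document vectors is the basic building block.
similarity = cosine_similarity(X)
```

Note that the difference model in step (2) only changes the regression target, so it drops into the same SVR or deep neural network training loop as the baseline.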

Table of Contents:
Abstract (Chinese)
Abstract (English)
Acknowledgements
Table of Contents
Chapter 1 Introduction
  1.1 Research Motivation and Purpose
  1.2 Thesis Outline
Chapter 2 Related Work
  2.1 Previous Methods
    2.1.1 Research on Financial Risk Prediction
      2.1.1.1 Text-based Risk Prediction
      2.1.1.2 Research on Finance-specific Lexicons
      2.1.1.3 Risk Prediction with an Expanded Financial Lexicon
    2.1.2 Research on Text Preprocessing
      2.1.2.1 Predecessors of the Transformer: Encoders and Decoders
      2.1.2.2 Transformer & Attention
  2.2 Text Feature Extraction
    2.2.1 TF-IDF & LOG1P
  2.3 Related Research on Financial Risk Prediction
    2.3.1 Risk Estimation Methods
      2.3.1.1 Post-event Volatility
      2.3.1.2 Volatility
      2.3.1.3 Abnormal Trading Volume
      2.3.1.4 Excess Return
Chapter 3 Regression Methods and Baseline Systems
  3.1 Word Vector Representations
    3.1.1 N-gram
    3.1.2 CBOW & Skip-gram
    3.1.3 GloVe
    3.1.4 Doc2Vec
    3.1.5 ELMo (Embeddings from Language Models)
    3.1.6 Generative Pre-training (OpenAI GPT)
    3.1.7 BERT (Bidirectional Encoder Representations from Transformers)
  3.2 Deep Neural Networks
  3.3 Convolutional Neural Networks
  3.4 Recurrent Neural Networks and Long Short-Term Memory Networks
  3.5 Methods
    3.5.1 Improving Word Vector Representations with BERT
    3.5.2 Making Different Training Data Complement Each Other
    3.5.3 Predicting Differences of Risk Assessment Indicators
  3.6 Evaluation Methods and Baseline Systems
    3.6.1 Evaluation Methods
    3.6.2 Performance Baselines
Chapter 4 Experimental Setup
  4.1 Training Datasets
Chapter 5 Experimental Results
  5.1 Baseline Experiments
    5.1.1 Preprocessing
    5.1.2 Support Vector Regression
    5.1.3 Deep Neural Network Methods
    5.1.4 Baseline Results
  5.2 CNN and LSTM Experiments
    5.2.1 Methods
  5.3 Doc2Vec Experiments
    5.3.1 Preprocessing
    5.3.2 Doc2Vec Results
  5.4 Pre-trained Model Experiments
    5.4.1 Preprocessing
    5.4.2 ELMo Method and Results
    5.4.3 GPT Method and Results
Chapter 6 Comparative Experiments
  6.1 Results of Replacing Word Vector Representations with BERT
  6.2 Results of Predicting Indicator Differences
    6.2.1 Training Dataset Preprocessing
    6.2.2 Baseline Systems for Predicting Indicator Differences
    6.2.3 Results of Predicting Indicator Differences
  6.3 Results of Complementary Training Data
  6.4 Differences Between Word Vector Representations
    6.4.1 Volatility & Volatility Difference
    6.4.2 Post-event Volatility & Post-event Volatility Difference
    6.4.3 Abnormal Trading Volume & Abnormal Trading Volume Difference
    6.4.4 Excess Return & Excess Return Difference
Chapter 7 Discussion and Analysis
Chapter 8 Conclusion
References


Full-Text Release Date: 2024/08/29 (campus network, off-campus network, and the National Central Library Taiwan thesis system)