| Field | Value |
|---|---|
| Graduate Student | 湯京祐 Jing-You Tang |
| Thesis Title | 使用預訓練表示法於財務風險指標預測之研究 (A Study on Using Pre-trained Embedding Models for Finance Risk Prediction) |
| Advisor | 陳冠宇 Kuan-Yu Chen |
| Committee Members | 陳怡伶 Yi-Ling Chen, 戴碧如 Bi-Ru Dai |
| Degree | 碩士 Master |
| Department | 電資學院 資訊工程系 Department of Computer Science and Information Engineering |
| Publication Year | 2019 |
| Academic Year | 107 |
| Language | 中文 Chinese |
| Pages | 93 |
| Keywords (Chinese) | 文字探勘, 財務風險, 機器學習, 語言模型 |
| Keywords (English) | Text mining, Financial risk, Machine learning, Language model |
The purpose of this thesis is to improve the accuracy of financial risk assessment by using state-of-the-art machine learning architectures and word-embedding representations. The research focuses on comparative experiments across different features and methods: the features include log-scaled term-frequency features and a range of word-embedding representations, from earlier to more recent models, while the training architectures are a basic deep neural network and a support vector regression (SVR) model.

Three improvements are proposed. First, features from BERT, the most recent pre-trained language model, are used to raise the accuracy of risk estimation. Second, a new training scheme is proposed: instead of predicting the indicator value itself, the model predicts the year-over-year difference of the indicator, which improves prediction accuracy. Third, since word-embedding features and term-frequency features capture different information, the two are combined by cosine distance so that they can complement each other and yield better precision. The experiments show that BERT features substantially improve the accuracy of risk prediction, and that the proposed difference model outperforms the original model in predicting volatility and post-event volatility. However, the combination of the two feature types, although it produces some good results, brings only limited gains.
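The feature pipeline sketched in the abstract can be illustrated as follows. This is a minimal, hypothetical sketch (all function and variable names are my own, not from the thesis): log-scaled term frequencies for a filing's tokens, cosine similarity for combining feature views, and a difference ("margin") target that predicts the change in a risk indicator rather than its absolute value.

```python
import math
from collections import Counter

def log_tf_features(tokens):
    """Log-scaled term-frequency features: weight(w) = 1 + log(tf_w)."""
    counts = Counter(tokens)
    return {w: 1.0 + math.log(c) for w, c in counts.items()}

def cosine_similarity(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(x * v.get(w, 0.0) for w, x in u.items())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def difference_target(volatility_this_year, volatility_last_year):
    """Difference target: predict the year-over-year change in the
    risk indicator instead of its absolute value."""
    return volatility_this_year - volatility_last_year

# Example: features for one (toy) financial report.
feats = log_tf_features(["risk", "risk", "loss", "hedge"])
y = difference_target(0.30, 0.25)  # model regresses on the change
```

In practice the resulting feature vectors would be fed to an SVR or DNN regressor; the predicted difference is then added back to last year's indicator to recover the absolute estimate.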