
Graduate student: 孫熙辰 (Hsi-Chen Sun)
Thesis title: Text Mining on Time Series Forecasting: Take Stock Price for Example (文字探勘於時間序列預測之應用-以股價為例)
Advisor: 呂志豪 (Shih-Hao Lu)
Oral defense committee: 鄭仁偉 (Jen-Wei Cheng), 黃美慈 (Mei-Tzu Huang)
Degree: Master
Department: College of Management, Department of Business Administration
Year of publication: 2021
Academic year of graduation: 109
Language: Chinese
Number of pages: 54
Keywords: Financial News, Stock Price Forecasting, Sentiment Analysis, Sentiment Lexicon, Time Series, MAPE, Machine Learning, Random Forest (Algorithm)
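      With the digitization of information, vast amounts of text data are available at any time. Text mining and sentiment analysis have risen accordingly, as researchers in every field seek useful information in complex text and try to uncover the insights and value hidden within it.
      This study attempts to build a correction model for the error of time-series stock price forecasts. It collects 68,775 financial news articles on 11 companies published by chinatimes.com (中時新聞網) between 2015/6/26 and 2020/7/30, quantifies each day's news sentiment through sentiment analysis, and then uses random forest regression to predict the error that an ARIMA model makes when forecasting stock prices, thereby correcting the ARIMA model.
      The results show that the correction success rate ranges from 36% to 71%, with an average of 54%. Overall, the higher the correction success rate, the larger the reduction in MAPE. A follow-up review of the companies' news content and fundamentals found that the study considers only a single company's own news when assessing the effect on its stock price, which makes the correction model's predictive performance unstable. Adding news about related firms or articles on industry trends might better capture the sentiment the public forms when reading financial news, making the correction model reliable enough to serve as a reference for investment decisions.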
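
      As a rough illustration of how daily news sentiment might be quantified with a sentiment lexicon, the following Python sketch averages article-level polarity per day. It is not the thesis's actual procedure: the toy `LEXICON`, the scoring rule, and the function names are all assumptions made for the example, and the tokens are assumed to be already segmented.

```python
from collections import defaultdict

# Hypothetical toy lexicon (assumption): word -> polarity, +1 positive, -1 negative.
LEXICON = {"growth": 1, "profit": 1, "record": 1,
           "loss": -1, "decline": -1, "lawsuit": -1}

def article_score(tokens):
    """Net polarity of one article, normalised by the number of lexicon hits."""
    hits = [LEXICON[t] for t in tokens if t in LEXICON]
    return sum(hits) / len(hits) if hits else 0.0

def daily_sentiment(articles):
    """Average article score per date; `articles` yields (date, tokens) pairs."""
    by_day = defaultdict(list)
    for date, tokens in articles:
        by_day[date].append(article_score(tokens))
    return {day: sum(s) / len(s) for day, s in by_day.items()}

# Made-up, pre-tokenised articles for illustration only.
news = [
    ("2020-07-29", ["record", "profit", "growth"]),
    ("2020-07-29", ["lawsuit", "loss", "profit"]),
    ("2020-07-30", ["decline", "loss"]),
]
scores = daily_sentiment(news)
```

      Each day ends up with one number in the range [-1, 1], which could then serve as an input feature for a regression model.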


      With the digitization of information, a huge amount of text data can be accessed at any time. Text mining and sentiment analysis have become increasingly popular in recent years, as researchers in various fields apply them to find useful information and insights in complex text.
      This research aimed to establish an error correction model for time series forecasts of stock prices. Using 68,755 financial news articles about 11 companies, published between 26 June 2015 and 30 July 2020 and collected from “chinatimes.com”, the model quantifies daily news sentiment and uses random forest regression to predict the stock price forecasting error of an ARIMA model, thereby correcting the ARIMA forecast.
      The results show that the success rate of model correction ranges from 36% to 71%, with an average of 54%. Overall, the higher the success rate of model correction, the greater the decrease in MAPE. The results also show that the model considers only the impact of a single company's news on its stock price, so the success rate of model correction is unstable. Adding more related news and industry trend articles might better fit public sentiment, so that the error correction model could serve as a guide for investors making investment decisions.
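
      The evaluation described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the thesis's code: the error is assumed to be defined as actual minus ARIMA forecast, a correction is assumed to "succeed" on a day when the corrected forecast is closer to the actual price than the ARIMA forecast, and the numbers (including `pred_err`, a stand-in for random forest output) are invented.

```python
def mape(actual, forecast):
    """Mean absolute percentage error, in percent."""
    return 100.0 * sum(abs((a - f) / a) for a, f in zip(actual, forecast)) / len(actual)

def evaluate_correction(actual, arima_forecast, predicted_error):
    """Compare ARIMA forecasts with error-corrected forecasts.

    Assumes error = actual - forecast, so corrected = forecast + predicted error.
    """
    corrected = [f + e for f, e in zip(arima_forecast, predicted_error)]
    # Assumed success criterion: the corrected forecast is strictly closer to the actual price.
    wins = sum(abs(a - c) < abs(a - f)
               for a, c, f in zip(actual, corrected, arima_forecast))
    base = mape(actual, arima_forecast)
    corr = mape(actual, corrected)
    return {
        "success_rate": wins / len(actual),
        "mape_arima": base,
        "mape_corrected": corr,
        "mape_decrease_rate": (base - corr) / base if base else 0.0,
    }

# Invented example data for four trading days.
actual = [100.0, 102.0, 101.0, 105.0]
arima = [98.0, 103.0, 104.0, 104.0]
pred_err = [1.0, 1.0, -2.0, 1.0]  # stand-in for the regression model's predicted errors

result = evaluate_correction(actual, arima, pred_err)
```

      In this toy case the correction helps on three of the four days, and the MAPE of the corrected forecast is lower than that of the ARIMA forecast.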

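    Abstract (Chinese)
    Abstract
    Acknowledgements
    Table of Contents
    List of Figures
    List of Tables
    Chapter 1 Introduction
      Section 1 Research Background and Motivation
      Section 2 Research Purpose and Significance
      Section 3 Research Process
    Chapter 2 Literature Review
      Section 1 Text Mining
      Section 2 Sentiment Analysis
      Section 3 Time Series
      Section 4 Machine Learning
    Chapter 3 Research Method
      Section 1 Stage One: News Preprocessing
      Section 2 Stage Two: Stock Price Forecast Correction Model
    Chapter 4 Results
      Section 1 Data Description
      Section 2 Performance of the Correction Model
      Section 3 Summary
    Chapter 5 Conclusions and Recommendations
      Section 1 Research Conclusions
      Section 2 Research Limitations
      Section 3 Suggestions for Future Research
    References
    Appendix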


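    Full text release date: 2024/08/24 (campus network)
    Full text release date: 2024/08/24 (off-campus network)
    Full text release date: 2024/08/24 (National Central Library: National Digital Library of Theses and Dissertations in Taiwan)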