
Student: Tzu-Yuan Yang
Thesis title: Predicting Stock Price with Social Media Sentiment: PTT Text Mining
Advisor: Chun-nan Chen
Oral defense committee: Chun-nan Chen, Jen-Wei Cheng, Jian-Ping Shieh, Hsuan-Chu Lin
Degree: Master
Department: College of Management, Graduate Institute of Finance
Year of publication: 2023
Academic year of graduation: 112 (ROC calendar)
Language: Chinese
Number of pages: 44
Keywords: text mining, social media, sentiment analysis, SVR, stock price prediction
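    In an era of widespread internet access, social media lets investors obtain real-time information, exchange views directly with others, and decide their next move. This study aims to quantify such unstructured text, extract the sentiment and intentions embedded in it, and test whether it can make stock market analysis more efficient. Prior studies have shown that sentiment extracted from news text improves stock price prediction and that support vector regression (SVR) yields better results. This study instead uses only community discussion posts, in order to capture investors' most direct reactions to information.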

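    The corpus consists of the push comments under the "Daily After-Market Discussion" threads on the PTT Stock board from June 2022 to May 2023, 231 days of data in total. After word segmentation, the sentiment of each word is determined with the supervised Sentiment Extraction via Screening and Topic Modeling (SESTM) model, and a social media sentiment index is constructed. Finally, the Taiwan stock market index is predicted with multiple regression and two SVR models (polynomial and RBF kernels) to test whether adding the sentiment index improves predictive performance. Explanatory power is measured by adjusted R² and accuracy by mean squared error (MSE).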
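The screening step behind this kind of word classification can be sketched as follows. This is a minimal illustration in the spirit of the SESTM screening stage (Ke, Kelly & Xiu), not the thesis's code. It assumes each document is one day's tokenized comments paired with the sign of that day's index return, and the thresholds `alpha` and `kappa` are hypothetical. Words that pass the frequency filter but fall inside the band around 1/2 are labeled neutral here, to mirror the three categories reported in the abstract. SESTM proper simply drops such words, and its topic-modeling and scoring stages are omitted.

```python
from collections import Counter


def screen_words(docs, labels, alpha=0.1, kappa=2):
    """Classify words as positive, negative, or neutral (SESTM-style screening sketch).

    docs   -- list of token lists; hypothetically one list per day of comments
    labels -- +1 / -1 return signs aligned with docs (an assumption of this sketch)
    alpha  -- hypothetical margin around 1/2 for the positive-share threshold
    kappa  -- hypothetical minimum number of documents a word must appear in
    """
    doc_count = Counter()  # k_j: number of documents containing word j
    pos_count = Counter()  # documents containing word j whose label is +1
    for tokens, label in zip(docs, labels):
        for word in set(tokens):  # count presence per document, not raw frequency
            doc_count[word] += 1
            if label > 0:
                pos_count[word] += 1

    result = {}
    for word, k in doc_count.items():
        if k < kappa:
            continue  # too rare to screen reliably
        f = pos_count[word] / k  # f_j: share of the word's documents with positive returns
        if f >= 0.5 + alpha:
            result[word] = "positive"
        elif f <= 0.5 - alpha:
            result[word] = "negative"
        else:
            result[word] = "neutral"  # this sketch's choice; SESTM would discard it
    return result
```

With four days labeled +1, -1, +1, -1, a word that appears only on up days is classified positive, one that appears only on down days is negative, and one split evenly is neutral. In practice the tokens would be Chinese words produced by a segmenter.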

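    About 2.55 million words were extracted from the text. After duplicates were removed, there were 2,063 positive sentiment words, 2,000 neutral words, and 3,457 negative sentiment words. Without the sentiment index, the RBF SVR outperformed both multiple regression and the polynomial SVR in explanatory power and accuracy. Only the RBF model improved after the sentiment index was added, and the index from day t-1 gave the best results, raising adjusted R² by 2.49% and lowering MSE by 2.56%. These results indicate that the text-mining method and RBF support vector regression model used in this study can extract investor sentiment from social media and improve the efficiency of stock market analysis.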
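The two evaluation criteria and the reported changes can be made concrete with a short sketch. It assumes the usual definitions: MSE is the mean of squared prediction errors, and adjusted R² = 1 - (1 - R²)(n - 1)/(n - p - 1) with p regressors. It also assumes that the reported 2.49% and 2.56% are relative changes against the model without the sentiment index. The abstract does not say whether they are relative changes or percentage points. All function names are illustrative.

```python
def mse(y_true, y_pred):
    """Mean squared error of predictions."""
    n = len(y_true)
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n


def adjusted_r2(y_true, y_pred, n_features):
    """Adjusted R^2 = 1 - (1 - R^2)(n - 1)/(n - p - 1), where p = n_features."""
    n = len(y_true)
    mean = sum(y_true) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # residual sum of squares
    ss_tot = sum((t - mean) ** 2 for t in y_true)  # total sum of squares
    r2 = 1 - ss_res / ss_tot
    return 1 - (1 - r2) * (n - 1) / (n - n_features - 1)


def relative_change(baseline, augmented):
    """Percentage change from the baseline model to the model with the sentiment index
    (assumes the reported figures are relative changes)."""
    return (augmented - baseline) / baseline * 100
```

Adding the sentiment index raises p by one, so adjusted R² only improves if the new regressor reduces the residual error by more than the penalty for the extra parameter.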


    Social media enables investors to access real-time news, engage in discussions, and make informed decisions. This study aims to improve the efficiency of stock market analysis by quantifying unstructured textual data and extracting the sentiments and intentions hidden in it. Unlike previous research that relied on news texts, this study uses non-news community discussions to capture investors' direct reactions to daily information.

    Comments were collected from the "Daily After-Market Discussion" threads on the PTT Stock board from June 2022 to May 2023, yielding 231 days of data. The text was tokenized, and word sentiment was classified with Sentiment Extraction via Screening and Topic Modeling (SESTM) to construct a social media sentiment index.
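Turning classified words into a daily index could look like the sketch below. The net-sentiment ratio (positive - negative)/(positive + negative) is a hypothetical construction chosen for illustration. The abstract does not give the thesis's index formula, and SESTM itself scores text with a penalized likelihood rather than a simple count. Tokens are assumed to come from an upstream Chinese word segmenter, and `lexicon` (a hypothetical name) maps each word to "positive", "neutral", or "negative".

```python
def daily_sentiment_index(tokens, lexicon):
    """Hypothetical net-sentiment score for one day's tokenized comments.

    Computes (positive - negative) / (positive + negative) over tokens found in
    the lexicon; neutral and unknown words are ignored. Returns 0.0 when no
    sentiment-charged word appears.
    """
    pos = sum(1 for w in tokens if lexicon.get(w) == "positive")
    neg = sum(1 for w in tokens if lexicon.get(w) == "negative")
    total = pos + neg
    return (pos - neg) / total if total else 0.0


def build_index(days, lexicon):
    """Map each date (ISO string) to its score, in chronological order.

    days -- dict of date string -> list of tokens for that day (assumed layout)
    """
    return {date: daily_sentiment_index(toks, lexicon) for date, toks in sorted(days.items())}
```

The ratio is bounded in [-1, 1], which keeps days with very different comment volumes on a comparable scale.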

    To predict the Taiwan Weighted Index (TWII), multiple regression and two Support Vector Regression (SVR) models, with polynomial and RBF kernels, were employed, using US stock indices and exchange rates as independent variables. The objective was to evaluate whether including the social media sentiment index improves prediction accuracy. Adjusted R² (Adj. R²) was used to measure explanatory power, and mean squared error (MSE) was used to assess accuracy.
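The two SVR variants differ in the kernel that measures similarity between feature vectors. For example, a feature vector could hold a day's US index change, exchange rate change, and sentiment score. The sketch below writes out the standard polynomial and RBF kernels using the parameter names of scikit-learn's `SVR` (`degree`, `gamma`, `coef0`). The default values here are arbitrary and are not the thesis's tuned hyperparameters.

```python
import math


def polynomial_kernel(x, z, degree=3, gamma=1.0, coef0=1.0):
    """Polynomial kernel: K(x, z) = (gamma * <x, z> + coef0) ** degree."""
    dot = sum(a * b for a, b in zip(x, z))
    return (gamma * dot + coef0) ** degree


def rbf_kernel(x, z, gamma=0.5):
    """RBF (Gaussian) kernel: K(x, z) = exp(-gamma * ||x - z||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)
```

A polynomial kernel's value grows with the dot product of the inputs. The RBF kernel equals 1 for identical inputs and decays toward 0 as they move apart, which lets an RBF model fit local, nonlinear patterns.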

    The empirical analysis covered approximately 2.55 million words, from which positive, neutral, and negative sentiment words were identified. Without the sentiment index, the RBF SVR model outperformed multiple regression and the polynomial SVR model in prediction accuracy. Only the RBF model improved when the social media sentiment index was added. The sentiment index from day t-1 yielded the best results, increasing Adj. R² by 2.49% and decreasing MSE by 2.56%.
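Using the index "from day t-1" means pairing each trading day's target with the previous trading day's sentiment. The following is a minimal alignment sketch. It assumes both series are lists already aligned on the same trading-day calendar. The other regressors (US indices, exchange rates) would be joined as additional columns and are omitted here. `lagged_pairs` is a hypothetical helper name.

```python
def lagged_pairs(sentiment, target, lag=1):
    """Pair each day's target with the sentiment index from `lag` trading days earlier.

    sentiment, target -- equal-length lists on the same trading-day calendar
    Returns a list of (sentiment[t - lag], target[t]) for t >= lag; the first
    `lag` days are dropped because they have no earlier sentiment value.
    """
    if lag < 0:
        raise ValueError("lag must be non-negative")
    if len(sentiment) != len(target):
        raise ValueError("series must be aligned to the same length")
    return [(sentiment[t - lag], target[t]) for t in range(lag, len(target))]
```

Lag 0 would pair same-day values. Comparing lags 0, 1, 2, and so on is one way to check which day's sentiment carries the most predictive information.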

    Overall, the empirical results indicate that integrating social media sentiment can enhance stock price prediction.

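    Contents
    Chinese Abstract; Abstract; Acknowledgements; Table of Contents; List of Figures; List of Tables
    Chapter 1 Introduction
        Section 1 Research Background and Motivation
        Section 2 Research Objectives
        Section 3 Research Framework
    Chapter 2 Literature Review
        Section 1 Social Media Sentiment and Investor Behavior
        Section 2 Text Mining and Chinese Word Segmentation
        Section 3 Sentiment Analysis Models
        Section 4 SVM and SVR in Machine Learning
    Chapter 3 Research Methods
        Section 1 Collecting Community Opinions with Text Mining
        Section 2 Building the Social Media Sentiment Model
        Section 3 Prediction with Multiple Regression and SVR
    Chapter 4 Empirical Results
        Section 1 Text Mining and the Sentiment Index
        Section 2 Simple Regression Analysis
        Section 3 Multiple Regression Model
        Section 4 SVR Polynomial Model
        Section 5 SVR RBF Model
        Section 6 Chapter Summary
    Chapter 5 Conclusions and Recommendations
        Section 1 Research Conclusions
        Section 2 Research Contributions
        Section 3 Suggestions for Future Research
    References
        I. English References
        II. Chinese References
        III. Online Resources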

