
Graduate student: 李祐任 (Yu-Jen Lee)
Thesis title: 一個基於深度神經網路 用以預測美國職業棒球大聯盟球隊戰績的方法- 以是否晉級季後賽為例
A Deep-Neural-Network-Based Approach to Predicting the Team Standings of Major League Baseball - Taking Whether to Qualify for the Playoffs as An Example
Advisor: 范欽雄 (Chin-Shyurng Fahn)
Oral defense committee: 黃榮堂 (Jung-Tang Huang), 李建德 (Jiann-Der Lee), 吳怡樂 (Yi-Leh Wu)
Degree: Master
Department: College of Electrical Engineering and Computer Science - Department of Computer Science and Information Engineering
Year of publication: 2021
Academic year of graduation: 109
Language: English
Number of pages: 52
Keywords (translated from Chinese): deep learning network, baseball game, Major League Baseball, team standing, win prediction, playoff prediction
Keywords (English): deep neural network, baseball game, Major League Baseball, team standing, the prediction of wins, the prediction of playoff teams


    Data has always surrounded us and is inseparable from human life. In recent years, data has played a growing role in many fields, especially in professional sports. Among professional sports, baseball has been a pioneer in the statistical analysis of games; “Sabermetrics” is the best-known example of this data-driven approach.
    Baseball data is abundant and relatively easy to obtain, and Major League Baseball (MLB) is the top-level and most famous professional baseball league in the world. This thesis uses deep learning to predict the range of each MLB team's full-season record. Because predicting a team's record is relatively complicated and difficult, and the raw data contains a lot of noise, feature selection becomes especially important. We use Weka for feature selection and then use two models to predict the number of wins, evaluating the predictions against the actual number of wins with the root mean square error (RMSE). In addition, we build a standings table from the predicted wins, derive the list of playoff teams from it, and compare that list with the actual list of playoff teams.
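    The evaluation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the thesis's implementation: the function names, the team labels, and the win totals are hypothetical, the playoff rule assumed here is "each division winner plus the best remaining teams in each league" (the 2019 MLB format used two wild cards per league), and the accuracy is assumed to be the share of teams whose qualified/not-qualified status is predicted correctly, since the abstract does not define it.

```python
import math
from collections import defaultdict


def rmse(predicted, actual):
    """Root mean square error between two equal-length lists of win totals."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def playoff_teams(predicted_wins, divisions, wild_cards=2):
    """Derive a playoff list from predicted wins (assumed selection rule).

    predicted_wins: dict mapping team -> predicted win total
    divisions:      dict mapping team -> (league, division)
    The top team of each division qualifies, followed by the best
    `wild_cards` remaining teams in each league. Ties are broken by
    team name only to keep this sketch deterministic.
    """
    def rank(team):
        return (-predicted_wins[team], team)

    by_division = defaultdict(list)
    for team, key in divisions.items():
        by_division[key].append(team)

    qualified = {min(teams, key=rank) for teams in by_division.values()}

    leagues = {league for league, _ in divisions.values()}
    for league in leagues:
        remaining = [
            team for team, (lg, _) in divisions.items()
            if lg == league and team not in qualified
        ]
        qualified.update(sorted(remaining, key=rank)[:wild_cards])
    return qualified


def playoff_accuracy(predicted, actual, all_teams):
    """Share of teams whose qualified/not-qualified status matches (assumed metric)."""
    correct = sum((team in predicted) == (team in actual) for team in all_teams)
    return correct / len(all_teams)


# Made-up toy league with four hypothetical teams, for illustration only.
toy_wins = {"A": 95, "B": 88, "C": 91, "D": 70}
toy_divisions = {
    "A": ("AL", "East"), "B": ("AL", "East"),
    "C": ("AL", "West"), "D": ("AL", "West"),
}
toy_playoffs = playoff_teams(toy_wins, toy_divisions, wild_cards=1)
```

    With real data, `predicted_wins` would hold the model's output for all 30 MLB teams, and the same functions would be applied once per test set.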
    This thesis proposes two models to predict the number of wins: an Artificial Neural Network (ANN) and a Gated Recurrent Unit (GRU) network. The training data comprises the records from the 2000 season through the 2018 season, and the record of the 2019 season serves as the final test data. In addition, to increase the reliability of these models, we combine the 2019 ZIPS player projections with the team standings estimated by the 2019 ZIPS to form a second test set. The 2019 ZIPS team win projections also serve as the baseline against which we compare our results.
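    For orientation, the update of a single GRU cell can be written as below. This is the commonly cited formulation from Cho et al. (2014) with bias terms omitted, not necessarily the exact configuration used in the thesis; the abstract does not state how seasons or features are mapped to the time steps. Here x_t is the input at step t, h_{t-1} is the previous hidden state, r_t and z_t are the reset and update gates, \sigma is the logistic sigmoid, and \odot is element-wise multiplication.

```latex
\begin{aligned}
r_t &= \sigma\left(W_r x_t + U_r h_{t-1}\right) \\
z_t &= \sigma\left(W_z x_t + U_z h_{t-1}\right) \\
\tilde{h}_t &= \tanh\left(W x_t + U\left(r_t \odot h_{t-1}\right)\right) \\
h_t &= z_t \odot h_{t-1} + \left(1 - z_t\right) \odot \tilde{h}_t
\end{aligned}
```

    Some libraries and papers swap the roles of z_t and 1 - z_t in the last line; the two conventions are equivalent up to relabeling the gate.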
    The final experimental results lead to several conclusions. First, the ANN outperforms the GRU. Second, treating the task as a regression problem yields slightly better results than treating it as a classification problem. Third, of the four feature selection methods compared, the correlation method performs best. Taken together, the best configuration is the ANN model with correlation-based feature selection applied to the regression problem. This model achieves RMSEs of 4.55 and 9.04 on the 2019 actual test data and the ZIPS projection data, respectively. In the playoff team prediction, it achieves accuracy rates of 0.93 and 0.73 on the two test sets, respectively.
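    As a rough illustration of what correlation-based feature selection does, the sketch below ranks candidate features by the absolute Pearson correlation between each feature and the season win total and keeps the top k. It is an assumption-laden stand-in written in plain Python: the thesis performs feature selection in Weka, the abstract does not say which evaluator or cutoff was used, and the feature names and numbers here are made up.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / math.sqrt(var_x * var_y)


def rank_features(columns, target, top_k):
    """Return the top_k feature names by |correlation| with the target."""
    scores = {name: abs(pearson(values, target)) for name, values in columns.items()}
    return sorted(scores, key=lambda name: (-scores[name], name))[:top_k]


# Hypothetical team-season features and win totals (made-up numbers).
toy_columns = {
    "run_diff": [120, -50, 30, -100],
    "stolen_bases": [80, 95, 60, 90],
    "constant": [1, 1, 1, 1],
}
toy_wins = [98, 75, 86, 64]
selected = rank_features(toy_columns, toy_wins, top_k=2)
```

    The selected columns would then be the inputs to the regression model; a filter of this kind is cheap to compute but ignores interactions between features.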

    Abstract (Chinese)
    Abstract
    Acknowledgments
    List of Figures
    List of Tables
    Chapter 1 Introduction
      1.1 Overview
      1.2 Motivation
      1.3 Introduction to Major League Baseball and Basic Baseball Stats
      1.4 Thesis Organization
    Chapter 2 Related Work
      2.1 Football (Soccer) Game Result Prediction
      2.2 Other Sports Result Prediction
      2.3 Baseball Team’s Win-Loss Prediction (Game-Level)
      2.4 Baseball Player’s Performance Stats Prediction
      2.5 Playoff Team Prediction
    Chapter 3 Data Preprocessing and Feature Selection
      3.1 Dataset
      3.2 Data Preprocessing
      3.3 Feature Selection
    Chapter 4 Our Proposed Prediction Method
      4.1 Artificial Neural Network
      4.2 Gated Recurrent Unit
    Chapter 5 Experimental Results and Discussion
      5.1 Experimental Environment Setup
      5.2 Abbreviation in Experimental Result Table
      5.3 Team Standing Prediction
      5.4 Playoff Prediction
      5.5 Discussion of Experimental Results
    Chapter 6 Conclusions and Future Work
      6.1 Conclusions
      6.2 Future Work
    References


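    Full-text release date: 2026/08/30 (campus network)
    Full-text release date: 2031/08/30 (off-campus network)
    Full-text release date: 2031/08/30 (National Central Library: Taiwan theses and dissertations system)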