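A Deep-Neural-Network-Based Approach to Predicting the Team Standings of Major League Baseball: Taking Whether to Qualify for the Playoffs as an Example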

Graduate student:	李祐任 Yu-Jen Lee
Thesis title:	A Deep-Neural-Network-Based Approach to Predicting the Team Standings of Major League Baseball: Taking Whether to Qualify for the Playoffs as an Example (original Chinese title: 一個基於深度神經網路用以預測美國職業棒球大聯盟球隊戰績的方法－以是否晉級季後賽為例)
Advisor:	范欽雄 Chin-Shyurng Fahn
Oral defense committee:	黃榮堂 Jung-Tang Huang, 李建德 Jiann-Der Lee, 吳怡樂 Yi-Leh Wu
Degree:	Master
Department:	College of Electrical Engineering and Computer Science - Department of Computer Science and Information Engineering
Year of publication:	2021
Academic year of graduation:	109
Language:	English
Number of pages:	52
Keywords (Chinese):	深度學習網路、棒球比賽、美國職棒大聯盟、球隊戰績、勝場預測、季後賽預測
Keywords (English):	deep neural network, baseball game, Major League Baseball, team standing, the prediction of wins, the prediction of playoff teams

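Data has always surrounded us and is inseparable from human life. In recent years its importance has grown in every field, most visibly in professional sports. Among professional sports, baseball has been a pioneer in the statistical treatment of the game, with "Sabermetrics" as the best-known example of data-driven analysis.
Baseball data is plentiful and relatively easy to obtain, and Major League Baseball (MLB) is the top-level and best-known professional baseball league in the world. This thesis uses deep learning to predict the full-season record range of each MLB team. Because predicting team records is relatively complex and difficult, and the raw data contains a great deal of noise, feature selection becomes considerably more important. We use Weka for feature selection and then use two models to predict the number of wins, evaluating the predictions against the actual win totals with the root mean square error (RMSE). In addition, we build a standings table from the predicted wins, derive a list of playoff teams from it, and compare that list with the actual one.
This thesis proposes two models for predicting the number of wins: the first uses an artificial neural network (ANN), and the second uses a gated recurrent unit (GRU) network. Data from 2000 through 2018 serve as the training basis, and the 2019 records serve as the final test data. To increase confidence in these models, we also combine the 2019 ZIPS player projections with the 2019 ZIPS projected team results to form a second test set. The 2019 ZIPS projected team win totals additionally serve as the baseline against which we compare our results.
In the final results, the ANN model outperforms the GRU network. Comparing the task framed as a classification problem with the task framed as a regression problem, the regression framing does slightly better. Finally, among the four feature selection methods compared, the correlation-based method performs best. Taken together, the best model is an ANN with correlation-based feature selection applied to the regression problem: it achieves an RMSE of 4.55 when tested on the actual 2019 data and 9.04 when tested on the ZIPS projected team data. For playoff prediction, it attains accuracies of 0.93 and 0.73 on the two test sets, respectively.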

Data has always been part of everyday life and is inseparable from human activity. In recent years, its importance has grown in many fields, especially in professional sports. Among professional sports, baseball has been a pioneer in statistical analysis; "Sabermetrics" is the best-known example of this data-driven approach.
Baseball data is plentiful and relatively easy to obtain, and Major League Baseball (MLB) is the top and best-known professional baseball league in the world. This thesis uses deep learning to predict the season record range of each MLB team. Because record prediction is relatively complicated and difficult, and the raw data contains a lot of noise, feature selection becomes considerably more important. We use Weka for feature selection and then use two models to predict the number of wins, with the root mean square error (RMSE) as the criterion for comparing predicted wins with actual wins. In addition, we derive a list of playoff teams from the predicted wins and compare it with the actual list of playoff teams.
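The RMSE criterion can be illustrated with a minimal Python sketch. The function name `rmse` and the win totals below are made up for illustration; they are not taken from the thesis.

```python
import math


def rmse(predicted, actual):
    """Root mean square error between two equal-length sequences of win totals."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical predicted vs. actual season wins for four teams.
predicted_wins = [95, 88, 70, 101]
actual_wins = [92, 90, 74, 103]

# Errors are 3, -2, -4, -2, so the mean squared error is 33 / 4 = 8.25.
error = rmse(predicted_wins, actual_wins)
print(round(error, 2))  # 2.87
```

Because the errors are squared before averaging, a single team that is badly mispredicted raises the RMSE more than several small misses do.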
This thesis proposes two models to predict the number of wins: one is an artificial neural network (ANN), and the other is a gated recurrent unit (GRU) network. The training data comprise the records from the 2000 season through the 2018 season, and the records of the 2019 season are used as the final test data. In addition, to increase the reliability of these models, we combine the 2019 ZIPS player projections with the team standings estimated by the 2019 ZIPS as another test set. The 2019 ZIPS team win projections also serve as the baseline against which we compare our results.
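For readers unfamiliar with the GRU, the following Python sketch shows one update step of a single scalar unit in a commonly used formulation. It is an illustration only, not the architecture used in the thesis: the parameter names (`w_z`, `u_z`, and so on) and the input values are hypothetical, and some references swap the roles of `z` and `1 - z` in the final line.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def gru_step(x, h_prev, p):
    """One scalar GRU update; `p` maps hypothetical parameter names to floats."""
    z = sigmoid(p["w_z"] * x + p["u_z"] * h_prev + p["b_z"])  # update gate
    r = sigmoid(p["w_r"] * x + p["u_r"] * h_prev + p["b_r"])  # reset gate
    # Candidate state: the reset gate scales how much of the old state is used.
    h_cand = math.tanh(p["w_h"] * x + p["u_h"] * (r * h_prev) + p["b_h"])
    # The update gate blends the previous state with the candidate state.
    return z * h_prev + (1.0 - z) * h_cand


names = ("w_z", "u_z", "b_z", "w_r", "u_r", "b_r", "w_h", "u_h", "b_h")
zero_params = {name: 0.0 for name in names}

# With all parameters zero, both gates equal 0.5 and the candidate is 0,
# so the new state is half of the previous one.
h_single = gru_step(1.0, 0.8, zero_params)

# Running the unit over a short, made-up input sequence.
params = dict(zero_params, w_h=1.0, b_z=-2.0)
h_seq = 0.0
for x in [0.2, -0.1, 0.4]:
    h_seq = gru_step(x, h_seq, params)
```

A practical model would use vector-valued states and learned weight matrices from a deep learning library; the scalar form only makes the gating arithmetic visible.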
The final experimental results support several conclusions. First, the ANN performs better than the GRU. Second, treating the task as a regression problem gives slightly better results than treating it as a classification problem. Third, the best of the four feature selection methods is the correlation method. Combining these findings, the ANN model with the correlation method for the regression problem is our best option. The RMSEs of this model are 4.55 and 9.04 on the 2019 actual test data and the ZIPS projection data, respectively. In playoff team prediction, this model attains accuracy rates of 0.93 and 0.73 on the two test sets, respectively.
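As a rough illustration of how a playoff accuracy rate could be computed, the Python sketch below scores a predicted playoff list against the actual one. It assumes that accuracy is the fraction of all teams whose playoff or non-playoff label is correct; the abstract does not state the exact definition, and the team labels and the helper `playoff_accuracy` are hypothetical.

```python
def playoff_accuracy(predicted, actual, teams):
    """Fraction of teams whose playoff/non-playoff label is predicted correctly."""
    predicted, actual = set(predicted), set(actual)
    correct = sum((team in predicted) == (team in actual) for team in teams)
    return correct / len(teams)


# Hypothetical league of 30 teams, 10 of which reach the playoffs.
teams = [f"T{i:02d}" for i in range(1, 31)]
actual_playoffs = teams[:10]

# One wrong pick: T11 is predicted in place of T10, which mislabels two teams.
predicted_playoffs = teams[:9] + ["T11"]

accuracy = playoff_accuracy(predicted_playoffs, actual_playoffs, teams)
print(f"{accuracy:.2f}")  # 28 of 30 labels correct
```

Under this assumed definition, a rate of 0.93 over 30 teams would correspond to 28 correct labels and 0.73 to 22 correct labels; this reading is an inference, not something the abstract confirms.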

Chinese Abstract
Abstract
Acknowledgements
List of Figures
List of Tables
Chapter 1    Introduction
1.1    Overview
1.2    Motivation
1.3    Introduction to Major League Baseball and Basic Baseball Stats
1.4    Thesis Organization
Chapter 2    Related Work
2.1    Football (Soccer) Game Result Prediction
2.2    Other Sports Result Prediction
2.3    Baseball Team’s Win-Loss Prediction (Game-Level)
2.4    Baseball Player’s Performance Stats Prediction
2.5    Playoff Team Prediction
Chapter 3    Data Preprocessing and Feature Selection
3.1    Dataset
3.2    Data Preprocessing
3.3    Feature Selection
Chapter 4    Our Proposed Prediction Method
4.1    Artificial Neural Network
4.2    Gated Recurrent Unit
Chapter 5    Experimental Results and Discussion
5.1    Experimental Environment Setup
5.2    Abbreviation in Experimental Result Table
5.3    Team Standing Prediction
5.4    Playoff Prediction
5.5    Discussion of Experimental Results
Chapter 6    Conclusions and Future Work
6.1    Conclusions
6.2    Future Work
References

                                


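Full-text release date: 2026/08/30 (campus network)
Full-text release date: 2031/08/30 (off-campus network)
Full-text release date: 2031/08/30 (National Central Library: National Digital Library of Theses and Dissertations in Taiwan)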
