Graduate Student: 簡世育
Shih-Yu Chien
Thesis Title: 利用機器學習預測部落格人氣 - 以Medium為例
Using Machine Learning Techniques to Predict Blog Popularity – Evidence of Medium
Advisor: 林孟彥
Meng-Yen Lin
Chii-Shyan Kuo
Oral Defense Committee: 林孟彥
Meng-Yen Lin
Chii-Shyan Kuo
Yi-Tai Shih
Degree: Master's
Department: School of Management - Department of Business Administration
Year of Publication: 2019
Academic Year of Graduation: 107
Language: Chinese
Number of Pages: 33
Chinese Keywords: 部落格人氣, 結構化資料, 非結構化資料, 機器學習, 極限梯度提升
English Keywords: Blog Popularity, Structured Data, Unstructured Data, Machine Learning, XGBoost
Usage Statistics: Views: 436; Downloads: 6
With the advent of e-commerce, blogs have become an important channel for word-of-mouth communication between consumers and businesses. As the number of blogs on the Internet has grown, improving blog popularity has become a central concern for bloggers around the world. At the same time, with the rise of the information society, both structured and unstructured data on the Internet continue to grow. Most prior studies relied on small samples, so sample diversity was limited and predictive accuracy was difficult to ensure. This study takes the Medium blogging platform as its research object and uses XGBoost to compare whether structured or unstructured data yields better accuracy in predicting blog popularity. Finally, feature selection techniques are used to identify the important variables, content keywords, and keyword tags that affect blog popularity.
The results show, first, that predicting blog popularity with both structured and unstructured data gives the prediction model its best predictive accuracy. Second, average response count is an important variable affecting blog popularity; in addition, blog content that keeps up with current technology trends can help improve blog popularity.

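Table of Contents
Abstract (Chinese) I
Abstract II
Table of Contents III
List of Figures IV
List of Tables V
1. Introduction 1
2. Literature Review 2
2.1. Definition and Purpose of Blog Popularity 2
2.2. Factors Affecting Blog Popularity 2
2.3. Extreme Gradient Boosting Algorithm 3
3. Research Method 3
3.1. Research Subject 3
3.2. Variable Selection 3
3.3. Data Preprocessing 6
3.3.1. Stop Word Removal 6
3.3.2. N-gram 7
3.3.3. TF-IDF 7
3.3.4. One-Hot Encoding 8
3.4. Prediction Model 8
4. Results 8
4.1. Descriptive Statistics 8
4.2. Prediction Results Before Integration 9
4.3. Prediction Results After Integration 12
5. Conclusions and Recommendations 14
5.1. Conclusions 14
5.2. Managerial Implications 14
5.3. Research Limitations 15
References 16
Appendix 1: Web Crawler Code 19
Appendix 2: Preprocessing Code 21
Appendix 3: Prediction Model Code 23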
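The preprocessing steps named in Section 3.3 of the outline (stop word removal, N-gram, TF-IDF, One-Hot Encoding) and the integration of structured and unstructured data can be illustrated with a minimal sketch. This is not the thesis's code, which is in its appendices and is not reproduced in this record. The stop word list, the toy posts, the two structured columns, the tags, and all function names below are hypothetical, and the TF-IDF weighting used is the plain tf × log(N/df) variant, which may differ from the one the author applied.

```python
import math
from collections import Counter

# Hypothetical stop word list; the thesis's actual list is not given in this record.
STOP_WORDS = {"the", "a", "of", "to", "and", "in", "is", "for"}


def tokenize(text, n_max=2):
    """Lowercase, drop stop words, then emit all 1-grams up to n_max-grams."""
    words = [w for w in text.lower().split() if w not in STOP_WORDS]
    grams = []
    for n in range(1, n_max + 1):
        grams += [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]
    return grams


def tfidf_matrix(docs):
    """Return (vocabulary, rows); each row holds tf * log(N / df) weights."""
    tokenized = [tokenize(d) for d in docs]
    vocab = sorted({g for toks in tokenized for g in toks})
    # Document frequency: number of documents containing each term.
    df = Counter(g for toks in tokenized for g in set(toks))
    n_docs = len(docs)
    rows = []
    for toks in tokenized:
        tf = Counter(toks)
        total = len(toks) or 1  # guard against empty documents
        rows.append([(tf[g] / total) * math.log(n_docs / df[g]) for g in vocab])
    return vocab, rows


def one_hot(values):
    """Return (categories, rows) with one 0/1 column per distinct category."""
    cats = sorted(set(values))
    return cats, [[1.0 if v == c else 0.0 for c in cats] for v in values]


def combine(structured, text_rows, tag_rows):
    """Concatenate structured, TF-IDF, and one-hot features row by row."""
    return [s + t + g for s, t, g in zip(structured, text_rows, tag_rows)]


# Toy data (all hypothetical): three posts, two numeric columns, one tag each.
docs = [
    "The future of machine learning",
    "Machine learning for startups",
    "A guide to travel in Japan",
]
structured = [[120.0, 3.5], [80.0, 1.0], [15.0, 0.2]]
tags = ["tech", "tech", "travel"]

vocab, text_rows = tfidf_matrix(docs)
tag_cats, tag_rows = one_hot(tags)
features = combine(structured, text_rows, tag_rows)
```

Each row of `features` is one post's integrated feature vector. In a setup like the one the abstract describes, such rows would be passed, together with a popularity target, to a gradient-boosted tree model such as XGBoost, and the fitted model's feature importances could then be inspected to rank variables, keywords, and tags.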

