
Graduate student: Shih-Yu Chien (簡世育)
Thesis title: Using Machine Learning Techniques to Predict Blog Popularity – Evidence of Medium (利用機器學習預測部落格人氣 - 以Medium為例)
Advisors: Meng-Yen Lin (林孟彥), Chii-Shyan Kuo (郭啟賢)
Oral examination committee: Meng-Yen Lin (林孟彥), Chii-Shyan Kuo (郭啟賢), Yi-Tai Shih (謝亦泰)
Degree: Master's
Department: College of Management, Department of Business Administration
Year of publication: 2019
Academic year of graduation: 107 (ROC calendar, i.e. 2018-2019)
Language: Chinese
Number of pages: 33
Keywords: Blog Popularity, Structured Data, Unstructured Data, Machine Learning, XGBoost (Extreme Gradient Boosting)
    With the advent of e-commerce, blogs have become an important channel for word-of-mouth communication among consumers and businesses. As more and more blogs appear on the Internet, increasing blog popularity has become a major topic of discussion among bloggers worldwide. The information society has also brought steady growth in both structured and unstructured data on the Internet. However, most previous studies analyzed small samples, which lack diversity and make it difficult to ensure prediction accuracy. This study uses the Medium blogging platform as its research subject. It applies the extreme gradient boosting algorithm (XGBoost) to determine which data type, structured or unstructured, predicts blog popularity better. Finally, it uses feature selection techniques to identify the important variables, content keywords, and keyword tags that affect blog popularity.
    The results show that combining structured and unstructured data gives the best predictive performance for blog popularity. Feature selection further shows that the average response count is an important variable affecting blog popularity. Blog content that keeps up with current technology trends can also help increase a blog's popularity.
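The abstract describes two steps: XGBoost compares feature sets, and feature selection then ranks the variables. The following is a minimal, dependency-free sketch of the core XGBoost mechanics, not the author's code. It assumes squared-error loss, depth-1 trees (stumps), toy hyperparameters, and made-up data. Each round does three things:

* It computes per-post gradients G and Hessians H.
* It picks the split that maximizes the regularized gain, ½[G_L²/(H_L+λ) + G_R²/(H_R+λ) − G²/(H+λ)] − γ.
* It sets each leaf weight to −G/(H+λ).

Summing the gain per feature gives one common importance score that could drive feature selection. Gain-based importance, the function names, and the toy columns are all assumptions made for illustration.

```python
from collections import defaultdict


def best_stump(X, grad, hess, lam, gamma):
    """Return (feature, threshold, w_left, w_right, gain) for the split with the
    largest regularized gain, or None if no split has positive gain."""
    G, H = sum(grad), sum(hess)
    parent_score = G * G / (H + lam)
    best = None
    for j in range(len(X[0])):
        order = sorted(range(len(X)), key=lambda i: X[i][j])
        g_left = h_left = 0.0
        for k in range(len(order) - 1):
            i, nxt = order[k], order[k + 1]
            g_left += grad[i]
            h_left += hess[i]
            if X[i][j] == X[nxt][j]:
                continue  # cannot split between equal feature values
            g_right, h_right = G - g_left, H - h_left
            gain = 0.5 * (g_left ** 2 / (h_left + lam)
                          + g_right ** 2 / (h_right + lam)
                          - parent_score) - gamma
            if best is None or gain > best[4]:
                threshold = (X[i][j] + X[nxt][j]) / 2
                best = (j, threshold,
                        -g_left / (h_left + lam),    # optimal leaf weight -G/(H+lambda)
                        -g_right / (h_right + lam),
                        gain)
    return best if best is not None and best[4] > 0 else None


def boost(X, y, n_rounds=20, eta=0.3, lam=1.0, gamma=0.0):
    """Fit an additive model of stumps and accumulate total split gain per feature."""
    pred = [0.0] * len(y)
    trees, importance = [], defaultdict(float)
    for _ in range(n_rounds):
        # Squared loss 0.5 * (pred - y)^2 has gradient (pred - y) and Hessian 1.
        grad = [p - t for p, t in zip(pred, y)]
        hess = [1.0] * len(y)
        stump = best_stump(X, grad, hess, lam, gamma)
        if stump is None:
            break
        j, thr, w_left, w_right, gain = stump
        trees.append((j, thr, eta * w_left, eta * w_right))  # shrink each tree by eta
        importance[j] += gain
        pred = [p + (eta * w_left if x[j] < thr else eta * w_right)
                for p, x in zip(pred, X)]
    return trees, dict(importance)


def predict(trees, x):
    """Sum the (already shrunk) leaf values of every stump for one sample."""
    return sum(wl if x[j] < thr else wr for j, thr, wl, wr in trees)


# Hypothetical toy data: column 0 = average response count, column 1 = word count (x100).
X = [[0, 5], [1, 3], [2, 8], [3, 1], [4, 7], [5, 2]]
y = [1, 2, 3, 10, 11, 12]  # hypothetical popularity scores
trees, importance = boost(X, y)
print(sorted(importance.items(), key=lambda kv: -kv[1]))
```

In practice you would use the `xgboost` package itself, for example `xgboost.XGBRegressor`. There, `get_booster().get_score(importance_type="total_gain")` reports a comparable per-feature total gain. Comparing structured and unstructured inputs then amounts to fitting the same model on each feature matrix and comparing the error on held-out data.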

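    Table of contents:
    Abstract (Chinese)
    Abstract (English)
    Table of Contents
    List of Figures
    List of Tables
    1. Introduction
    2. Literature Review
      2.1. Definition and Purpose of Blog Popularity
      2.2. Factors Affecting Blog Popularity
      2.3. Extreme Gradient Boosting (XGBoost)
    3. Research Methods
      3.1. Research Subject
      3.2. Variable Selection
      3.3. Data Preprocessing
        3.3.1. Stop-Word Removal
        3.3.2. N-gram
        3.3.3. TF-IDF
        3.3.4. One-Hot Encoding
      3.4. Prediction Model
    4. Results
      4.1. Descriptive Statistics
      4.2. Prediction Results Before Integration
      4.3. Prediction Results After Integration
    5. Conclusions and Recommendations
      5.1. Conclusions
      5.2. Managerial Implications
      5.3. Research Limitations
    References
    Appendix 1. Web Crawler Code
    Appendix 2. Preprocessing Code
    Appendix 3. Prediction Model Code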
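Section 3.3 lists the preprocessing steps: stop-word removal, N-grams, TF-IDF, and one-hot encoding. Below is a minimal plain-Python sketch of those steps. It is not the preprocessing code from Appendix 2, and it rests on several assumptions:

* A tiny hypothetical stop-word list.
* Word n-grams up to bigrams.
* TF-IDF weighted as length-normalized term frequency times log(N/df), which is one common variant; the thesis's exact weighting is not stated here.
* One-hot encoding of a hypothetical single tag per post.

The final step concatenates the text and tag features. This is one plausible reading of the "integrated" feature set in Section 4.3.

```python
import math
from collections import Counter

# Hypothetical, deliberately tiny stop-word list; real work would use a fuller,
# possibly domain-specific, list.
STOP_WORDS = {"a", "an", "the", "of", "to", "and", "in", "is", "for", "with"}


def tokenize(text):
    """Split on whitespace, strip edge punctuation, lower-case, drop stop words."""
    tokens = [t.strip(".,!?;:\"'()").lower() for t in text.split()]
    return [t for t in tokens if t and t not in STOP_WORDS]


def ngrams(tokens, n_max=2):
    """All contiguous n-grams for n = 1..n_max, joined with spaces."""
    return [" ".join(tokens[i:i + n])
            for n in range(1, n_max + 1)
            for i in range(len(tokens) - n + 1)]


def tfidf(docs, n_max=2):
    """Return (vocab, rows) where rows[d][k] = tf(term k in doc d) * log(N / df)."""
    grams = [Counter(ngrams(tokenize(d), n_max)) for d in docs]
    df = Counter(g for doc in grams for g in doc)  # number of docs containing each term
    vocab = sorted(df)
    n_docs = len(docs)
    rows = []
    for doc in grams:
        total = sum(doc.values()) or 1  # avoid division by zero for empty docs
        rows.append([doc[g] / total * math.log(n_docs / df[g]) for g in vocab])
    return vocab, rows


def one_hot(values):
    """Map each categorical value to a 0/1 indicator vector over sorted categories."""
    categories = sorted(set(values))
    return categories, [[1 if v == c else 0 for c in categories] for v in values]


# Hypothetical example posts and tags.
docs = ["The future of machine learning",
        "Machine learning in the browser",
        "A guide to blogging"]
tags = ["AI", "JavaScript", "Writing"]
vocab, text_features = tfidf(docs)
tag_names, tag_features = one_hot(tags)
# Concatenate unstructured (TF-IDF) and categorical (one-hot) features per post.
combined = [t + g for t, g in zip(text_features, tag_features)]
```

Library implementations produce different numbers from this sketch. scikit-learn's `TfidfVectorizer` (with `ngram_range=(1, 2)` and `stop_words`) applies smoothed IDF and L2 normalization by default, and `OneHotEncoder` returns sparse matrices by default. These defaults matter at the scale of a real Medium crawl.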

