
Graduate student: 蘇新佳 (Monica Cynthia Soputro)
Thesis title: Using Machine Learning to identify key features to increase number of clicks for Facebook post
Chinese title: 利用機器學習選出Facebook粉絲專頁貼文關鍵特徵以增進貼文點擊次數
Advisor: 林孟彥 (Meng-Yen Lin)
Oral defense committee: 蔡瑤昇 (Yao-Sheng Tsai), 葉穎蓉 (Ying-Jung Yeh)
Degree: Master
Department: School of Management International (MBA)
Year of publication: 2018
Graduation academic year: 106
Language: English
Number of pages: 28
Keywords: machine learning, feature selection, clicks prediction, Facebook post
    As the number of social media users continues to grow, more businesses are keen to use social networks such as Facebook (FB) to conduct commercial activities, a practice known as social commerce. One way is to create an FB page, which lets a company communicate with consumers by publishing posts with direct links to e-commerce websites. Besides the benefit of interacting with consumers, companies also hope to increase brand awareness and consumer purchase intention. Recognizing the importance of getting consumers to click on direct links, this paper has two objectives: to develop a predictive model for the number of clicks on a link in an FB post, and to use machine learning techniques to discover the key features of an FB post that drive clicks. I propose two methods for building the predictive model and compare their performance using the Root Mean Squared Logarithmic Error (RMSLE). Extreme Gradient Boosting Trees (XGBoost) achieves an RMSLE of 0.835, outperforming Ridge Regression, which achieves an RMSLE of 0.981. I also use XGBoost to perform feature selection, shrinking my independent variables to a subset of predictors that are important for promoting the number of clicks.
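The abstract compares the two models by RMSLE. As a minimal sketch of how that metric is usually computed (assuming the standard definition, the square root of the mean squared difference between log(1 + predicted) and log(1 + actual)), the following uses only the Python standard library. The function name `rmsle` and the click counts are hypothetical illustrations, not the thesis's code or data.

```python
import math


def rmsle(actual, predicted):
    """Root Mean Squared Logarithmic Error for non-negative values."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    # log1p(x) = log(1 + x), so posts with zero clicks are handled safely.
    squared_log_errors = [
        (math.log1p(p) - math.log1p(a)) ** 2
        for a, p in zip(actual, predicted)
    ]
    return math.sqrt(sum(squared_log_errors) / len(squared_log_errors))


# Hypothetical click counts, invented for illustration only.
actual_clicks = [0, 12, 150, 40]
model_a = [1, 10, 120, 55]   # predictions close to the actual counts
model_b = [5, 30, 60, 10]    # predictions further from the actual counts

print("model A RMSLE:", rmsle(actual_clicks, model_a))
print("model B RMSLE:", rmsle(actual_clicks, model_b))
```

A lower value indicates predictions closer to the observed counts on a log scale, which is the sense in which 0.835 is better than 0.981. Because the error is taken on logarithms, the metric penalizes relative rather than absolute differences, which suits heavily skewed count data such as clicks.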

    Table of Contents
    Abstract (Chinese)
    ABSTRACT
    ACKNOWLEDGMENT
    Table of Contents
    List of Tables
    List of Figures
    1. Introduction
    2. Literature Review
    3. The Data
        3.1. Data Collection
        3.2. Variables
    4. Methodology
        4.1. Eliminating Stop-words
        4.2. Term Frequency – Inverse Document Frequency (TF-IDF)
        4.3. Predictive Modeling Methodology
        4.4. Feature Selection
        4.5. Word Scoring
    5. Results
        5.1. Predictive Model
        5.2. Feature Selection
        5.3. Word Scoring
    6. Conclusion
    7. Reference
    APPENDIX

