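Graduate student: Ping Tu (杜平)
Thesis title: Data Analysis Framework of Searching the Association Between Posts and Comments on Social Network
Chinese title: 探索社群媒體之中文章與留言關聯性之數據分析框架
Advisors: Chao-Lung Yang (楊朝龍), Cheng-Jhe Lin (林承哲)
Oral defense committee: Shi-Woei Lin (林希偉), Yi-Ling Chen (陳怡伶)
Degree: Master
Department: College of Management, Department of Industrial Management
Publication year: 2021
Academic year of graduation: 109 (Republic of China calendar, i.e., 2020-2021)
Language: English
Number of pages: 64
Chinese keywords: content clustering, multi-objective optimization, NSGAIII, fuzzy clustering, FCM, social media analysis
English keywords: content-based analysis, multi-objective optimization, NSGAIII, fuzzy clustering method, FCM, social network analysis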
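  • With the spread of social media, what people post on Facebook and Twitter is drawing more and more attention; such platforms let users freely create content, share information, and leave comments online. With further analysis, this information can be applied in different domains, such as ad targeting and decision support. This makes social media text an important object of analysis: Do posts with particular patterns bring comments with corresponding patterns? Are audiences more easily attracted to posts built around particular keywords? Some studies focus on sentiment analysis of content, for example judging whether text is positive or negative, optimistic or pessimistic; whether such content has an underlying purpose, however, remains an important research question. This study proposes a data analysis framework for exploring the association between posts and comments on social media, using the fuzzy c-means (FCM) clustering method to analyze post-comment associations. To keep one-sided decisions from skewing the experimental results, the study applies the multi-objective optimization method Non-dominated Sorting Genetic Algorithm III (NSGAIII) to find the most influential keywords and the optimal number of clusters with respect to three objective functions: 1) the average variance of post membership degrees, 2) the average variance of comment membership degrees, and 3) the average root mean square error of the predicted comments. The algorithm is validated on a public social discussion dataset published on Kaggle, and its results are compared with those of a sequential clustering method. The experimental results show that the proposed framework performs better than the other methods compared.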
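The abstract describes clustering posts and comments with fuzzy c-means (FCM) and scoring the result by the average variance of membership degrees. A minimal pure-Python sketch of standard FCM follows, under stated assumptions: documents are already numeric vectors (for example, keyword-weight vectors), the fuzzifier is m = 2, and `avg_membership_variance` is a hypothetical reading of the membership-variance objective (the variance of each document's membership vector, averaged over documents); the thesis's exact definition may differ. The toy data are illustrative only.

```python
import math
import random
from typing import List, Tuple

Vector = List[float]


def fuzzy_c_means(points: List[Vector], c: int, m: float = 2.0,
                  iters: int = 100, seed: int = 0,
                  eps: float = 1e-9) -> Tuple[List[Vector], List[Vector]]:
    """Standard fuzzy c-means: alternate centroid and membership updates."""
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    # Random initial membership matrix U (n x c); each row sums to 1.
    u = []
    for _ in range(n):
        row = [rng.random() for _ in range(c)]
        total = sum(row)
        u.append([v / total for v in row])
    centers: List[Vector] = []
    for _ in range(iters):
        # Centroid update: v_j = sum_i u_ij^m x_i / sum_i u_ij^m
        centers = []
        for j in range(c):
            w = [u[i][j] ** m for i in range(n)]
            wsum = sum(w)
            centers.append([sum(w[i] * points[i][d] for i in range(n)) / wsum
                            for d in range(dim)])
        # Membership update: u_ij = 1 / sum_k (d_ij / d_ik)^(2 / (m - 1))
        for i in range(n):
            # eps guards against division by zero when a point sits on a center
            dist = [max(math.dist(points[i], v), eps) for v in centers]
            u[i] = [1.0 / sum((dist[j] / dist[k]) ** (2.0 / (m - 1.0))
                              for k in range(c))
                    for j in range(c)]
    return centers, u


def avg_membership_variance(u: List[Vector]) -> float:
    """Hypothetical objective: variance of each membership row, averaged over rows."""
    total = 0.0
    for row in u:
        mean = sum(row) / len(row)
        total += sum((x - mean) ** 2 for x in row) / len(row)
    return total / len(u)


if __name__ == "__main__":
    # Toy 2-D vectors forming two obvious groups (illustrative data only).
    docs = [[0.0, 0.1], [0.1, 0.0], [0.2, 0.1], [5.0, 5.1], [5.1, 4.9], [4.9, 5.0]]
    centers, u = fuzzy_c_means(docs, c=2)
    labels = [max(range(2), key=lambda j: row[j]) for row in u]
    print(labels, round(avg_membership_variance(u), 3))
```

Because each membership row sums to 1, its variance is 0 when a document is split evenly across clusters and largest when it belongs entirely to one cluster, which is one reason such a statistic could serve as a clustering objective.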


    Nowadays, online social networks such as Facebook and Twitter, which allow people to create, share, and comment on anything related to their real lives, are becoming increasingly popular. Such information is useful in various application domains, e.g., decision support systems and online advertising. Therefore, analyzing content on social networks has become an important research area. What characteristics of post content lead users to leave specific comments? What kinds of keywords in the content attract users? Previous studies mainly focused on analyzing the sentiment of the content, such as whether it is positive or negative. In this study, we investigate the correlation between posts and their associated comments on a social network platform. This research proposes a data analysis framework, based on the fuzzy c-means clustering method, for searching the association between posts and comments on social networks. The Non-dominated Sorting Genetic Algorithm III (NSGAIII) is used to search for the most influential keywords and the optimal number of clusters by considering three objective functions: 1) the average variance of post membership, 2) the average variance of comment membership, and 3) the average root mean square error in predicting comment types. A social network dataset from Kaggle was used to compare the proposed framework with a sequential clustering method and the canonical correlation analysis (CCA) method. The experimental results reveal that the proposed framework outperforms the sequential method and performs similarly to CCA.
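NSGAIII, the optimizer named in the abstract, first ranks candidate solutions into Pareto fronts by non-dominated sorting and then uses reference points to preserve diversity within the last front admitted to the next population. The sketch below covers only the non-dominated sorting step, assuming all three objectives are framed as minimization (an objective to be maximized would be negated first); the objective vectors in the example are made-up numbers, and the reference-point niching step is omitted.

```python
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a Pareto-dominates b under minimization:
    no worse in every objective and strictly better in at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def non_dominated_fronts(objs: List[Sequence[float]]) -> List[List[int]]:
    """Fast non-dominated sorting: returns fronts as lists of indices, best first."""
    n = len(objs)
    better_than: List[List[int]] = [[] for _ in range(n)]  # solutions i dominates
    dominated_count = [0] * n                              # how many dominate i
    fronts: List[List[int]] = [[]]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            if dominates(objs[i], objs[j]):
                better_than[i].append(j)
            elif dominates(objs[j], objs[i]):
                dominated_count[i] += 1
        if dominated_count[i] == 0:
            fronts[0].append(i)
    k = 0
    while fronts[k]:
        nxt: List[int] = []
        for i in fronts[k]:
            for j in better_than[i]:
                # j joins the next front once all its dominators are ranked
                dominated_count[j] -= 1
                if dominated_count[j] == 0:
                    nxt.append(j)
        fronts.append(nxt)
        k += 1
    return fronts[:-1]  # drop the trailing empty front


if __name__ == "__main__":
    # Hypothetical (objective 1, objective 2, objective 3) values for four candidates.
    objs = [(0.2, 0.3, 1.0), (0.1, 0.4, 1.2), (0.3, 0.3, 1.1), (0.25, 0.35, 1.3)]
    print(non_dominated_fronts(objs))  # → [[0, 1], [2, 3]]
```

The pairwise comparison costs O(MN^2) for N solutions and M objectives, the same complexity as the fast non-dominated sort used by NSGA-II and NSGA-III.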

    CHAPTER 1. INTRODUCTION
    CHAPTER 2. LITERATURE REVIEW
        2.1 Content-based clustering on social network
        2.2 Term frequency and inverse document frequency (TFIDF)
        2.3 Fuzzy c-means algorithm (FCM)
        2.4 Non-dominated sorting genetic algorithm
            2.4.1 Difference from NSGAII
            2.4.2 Recent research regarding NSGAIII
    CHAPTER 3. METHODOLOGY
        3.1 LSIK based on TFIDF scores
        3.2 The generation of MPK and MCK
        3.3 Fuzzy c-means formulation
        3.4 The framework using NSGAIII multi-objective optimization
            3.4.1 Crossover and mutation
            3.4.2 Evaluation
            3.4.3 The Pareto front method based on NSGAIII
            3.4.4 Niche operation
    CHAPTER 4. EXPERIMENTS AND RESULTS
        4.1 Data preprocessing and evaluation
        4.2 Selection of decision variable
        4.3 Optimal solutions
            4.3.1 Solutions in different criteria
            4.3.2 Cluster analysis
        4.4 Compare with other experiments
            4.4.1 Compare with sequential method
            4.4.2 Compare with canonical correlation analysis
    CHAPTER 5. CONCLUSION
    REFERENCES
    APPENDIX

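    Full-text release date: 2024/09/04 (campus network)
    Full-text release date: 2026/09/04 (off-campus network)
    Full-text release date: 2026/09/04 (National Central Library: Taiwan theses and dissertations system)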