Author: | 簡子婷 Tzu-Ting Chien |
---|---|
Thesis Title: | 跨領域情感分類 Cross-domain Sentiment Classification |
Advisor: | 徐俊傑 Chiun-Chieh Hsu |
Committee: | 賴源正 Yuan-Cheng Lai, 洪政煌 Cheng-Huang Hung |
Degree: | 碩士 Master's |
Department: | 管理學院 - 資訊管理系 College of Management, Department of Information Management |
Thesis Publication Year: | 2018 |
Graduation Academic Year: | 106 (ROC calendar, i.e. 2017-2018) |
Language: | Chinese |
Pages: | 49 |
Keywords (in Chinese): | 跨領域情感分類、文字探勘、特徵遷移學習、資料探勘 |
Keywords (in other languages): | cross-domain sentiment classification, text mining, feature transfer learning, data mining |
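In recent years, with the rise of social platforms and e-commerce, large volumes of review data are generated every day. These reviews implicitly carry users' sentiment toward particular things; extracting this information makes it possible to build user interest models, recommender systems, and similar applications. As a result, sentiment classification techniques have received increasing attention.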
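Supervised learning algorithms are the most widely used sentiment classification technique, but reaching high accuracy requires training on a large amount of labeled data, and producing that much labeled data is very costly. In addition, current sentiment classification methods are highly domain-dependent: a classification model trained on one domain does not perform well on other domains. Researchers have therefore proposed cross-domain sentiment classification, in which a classifier trained on labeled data from one domain is used to predict the sentiment polarity of data from another domain. The Spectral Feature Alignment algorithm is one of the earlier cross-domain sentiment classification methods. It uses the co-occurrence of words across the two domains as a bridge to build a feature space shared by both domains, in which the feature words are then classified by sentiment. However, if the gap between the two domains is too large, that is, if words from the two domains co-occur too rarely, classification accuracy drops accordingly.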
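To make the co-occurrence "bridge" concrete, here is a minimal sketch that counts, for each domain-specific word, how many reviews it shares with each domain-independent word. The resulting matrix M is the kind of bipartite word graph that SFA-style methods start from. The toy reviews, the word lists, and the helpers `cooccurrence` and `density` are hypothetical illustrations, not data or code from the thesis, and the domain-independent words are assumed to have been chosen already.

```python
# Hypothetical sketch: document-level co-occurrence between domain-specific
# and domain-independent words (not the thesis's code).

def cooccurrence(docs, specific, independent):
    """Return M where M[i][j] is the number of reviews containing both
    specific[i] (domain-specific) and independent[j] (domain-independent)."""
    row = {w: i for i, w in enumerate(specific)}
    col = {w: j for j, w in enumerate(independent)}
    M = [[0] * len(independent) for _ in specific]
    for doc in docs:
        words = set(doc)
        for a in words.intersection(row):
            for b in words.intersection(col):
                M[row[a]][col[b]] += 1
    return M


def density(M):
    """Fraction of non-zero cells: a rough measure of how well the
    domain-independent words connect the domain-specific ones."""
    cells = sum(len(r) for r in M)
    nonzero = sum(1 for r in M for v in r if v)
    return nonzero / cells if cells else 0.0


# Toy tokenized reviews (made up): a "books" source and a "kitchen" target.
books = [["great", "plot", "boring"], ["great", "author"], ["boring", "plot"]]
kitchen = [["great", "blender"], ["boring", "leaks"]]
independent = ["great", "boring"]
specific = ["plot", "author", "blender", "leaks"]

M = cooccurrence(books + kitchen, specific, independent)
print(M)           # [[1, 2], [1, 0], [1, 0], [0, 1]]
print(density(M))  # 0.625

# A target domain whose reviews never use the shared words: M is all zeros.
unrelated = [["sturdy", "leaks"], ["dull", "noisy"]]
print(density(cooccurrence(unrelated, ["sturdy", "leaks", "noisy"], independent)))  # 0.0
```

When the target reviews share no domain-independent words with the source, as in the `unrelated` case, every cell of M is zero and there is nothing for an alignment step to propagate, which mirrors the accuracy drop described above.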
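This thesis therefore proposes a new cross-domain sentiment classification method based on feature transfer learning. It expands the domain vocabulary to increase word-to-word co-occurrence, thereby narrowing the gap between the two domains, and uses spectral clustering to map feature words into a new feature space. Finally, a support vector machine (SVM) model is trained to perform sentiment classification on the features. Experimental results show that the proposed method overcomes the shortcomings of the earlier method and improves its accuracy.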
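The mapping step can be sketched in the style of SFA, under stated assumptions: place the co-occurrence matrix M (m domain-specific words by l domain-independent words) in a bipartite affinity matrix A = [[0, M], [Mᵀ, 0]], normalize it as D^(-1/2) A D^(-1/2) where D holds the node degrees, keep the k eigenvectors with the largest eigenvalues, and use their first m rows U to map a domain-specific count vector x to xU. This is only the spectral-embedding part of spectral clustering (no k-means step) and is not claimed to be the thesis's exact variant. The function names, the small Jacobi eigensolver (used to stay within the standard library), and the toy matrix are assumptions.

```python
import math


def jacobi_eigh(S, sweeps=100, tol=1e-20):
    """Eigen-decomposition of a small symmetric matrix by cyclic Jacobi rotations.
    Returns (eigenvalues, V) with the eigenvectors stored as the columns of V."""
    n = len(S)
    A = [row[:] for row in S]
    V = [[float(i == j) for j in range(n)] for i in range(n)]
    for _ in range(sweeps):
        off = sum(A[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(A[p][q]) < 1e-15:
                    continue
                # Rotation angle that zeroes A[p][q].
                theta = (A[q][q] - A[p][p]) / (2.0 * A[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # A <- A J (columns p and q)
                    akp, akq = A[k][p], A[k][q]
                    A[k][p], A[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # A <- J^T A (rows p and q)
                    apk, aqk = A[p][k], A[q][k]
                    A[p][k], A[q][k] = c * apk - s * aqk, s * apk + c * aqk
                for k in range(n):  # accumulate eigenvectors: V <- V J
                    vkp, vkq = V[k][p], V[k][q]
                    V[k][p], V[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
    return [A[i][i] for i in range(n)], V


def spectral_mapping(M, k):
    """Return U (m x k): the rows of the top-k eigenvectors of
    D^-1/2 A D^-1/2 that belong to the m domain-specific words."""
    m, l = len(M), len(M[0])
    n = m + l
    A = [[0.0] * n for _ in range(n)]
    for i in range(m):
        for j in range(l):
            A[i][m + j] = A[m + j][i] = float(M[i][j])
    inv_sqrt = [1.0 / math.sqrt(d) if d > 0 else 0.0 for d in map(sum, A)]
    L = [[inv_sqrt[i] * A[i][j] * inv_sqrt[j] for j in range(n)] for i in range(n)]
    vals, V = jacobi_eigh(L)
    top = sorted(range(n), key=lambda c: vals[c], reverse=True)[:k]
    return [[V[i][c] for c in top] for i in range(m)]


def project(x, U):
    """phi(x) = x U for a domain-specific count vector x of length m."""
    return [sum(x[i] * U[i][c] for i in range(len(x))) for c in range(len(U[0]))]


# Toy M: rows = domain-specific "engaging", "dull" (books) and
# "sturdy", "leaky" (kitchen); columns = domain-independent "great", "boring".
M = [[2, 0], [0, 2], [2, 0], [0, 2]]
U = spectral_mapping(M, k=2)
print(project([1, 0, 0, 0], U))  # review mentioning only "engaging"
print(project([0, 0, 1, 0], U))  # review mentioning only "sturdy": same coordinates
```

Because "engaging" and "sturdy" co-occur with the same domain-independent word, they receive the same coordinates (up to rounding), so a classifier trained on book reviews can carry what it learned about "engaging" over to kitchen reviews that say "sturdy". SFA, as usually described, then appends these coordinates, scaled by a trade-off weight, to the original features.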
As social media and e-commerce have flourished in recent years, a tremendous amount of review data is produced every day. Some of these reviews express a sentiment polarity toward certain objects or events. If we can mine the information hidden in these reviews, we can apply it to tasks such as building recommender systems or user interest models. Sentiment classification has therefore been receiving more and more attention.
Supervised learning is the most commonly used sentiment classification technique. However, obtaining high accuracy requires a model trained on a large amount of labeled text, and producing such labeled data is extremely time-consuming and expensive. In addition, existing sentiment classification methods are highly domain-dependent: a model trained in one domain does not perform well in others. For these reasons, researchers have proposed cross-domain sentiment classification, which uses a classifier trained on the labeled text of one domain to predict sentiment polarity in another domain. The Spectral Feature Alignment (SFA) algorithm was one of the first cross-domain sentiment classification methods. It uses domain-independent words, which appear in both domains, as a bridge to build a common feature space for the two domains, and this space can be used to train accurate sentiment classifiers for the target domain. However, if the two domains are barely related, domain-independent and domain-specific words rarely co-occur, and classification accuracy drops as a result.
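The abstract does not say how domain-independent words are chosen. A minimal sketch, assuming the simplest criterion: keep words that occur in at least `min_df` reviews of both domains, and treat the rest of the vocabulary as domain-specific. Other criteria, such as mutual information with the domain label, are also possible. The helper names, the threshold, and the toy data are hypothetical.

```python
from collections import Counter


def doc_freq(docs):
    """Number of reviews (documents) in which each word appears."""
    df = Counter()
    for doc in docs:
        df.update(set(doc))
    return df


def split_features(source_docs, target_docs, min_df=2):
    """Assumed criterion: a word is domain-independent if it appears in at least
    min_df reviews of the source AND of the target domain; every other word
    in the vocabulary is treated as domain-specific."""
    df_s, df_t = doc_freq(source_docs), doc_freq(target_docs)
    vocab = set(df_s) | set(df_t)
    independent = sorted(w for w in vocab if df_s[w] >= min_df and df_t[w] >= min_df)
    specific = sorted(vocab - set(independent))
    return independent, specific


# Toy tokenized reviews (made up): books (source) vs. kitchen appliances (target).
books = [["great", "plot", "boring"], ["great", "author"], ["boring", "plot"]]
kitchen = [["great", "blender"], ["boring", "design"], ["great", "sharp"], ["boring", "leaks"]]

independent, specific = split_features(books, kitchen)
print(independent)  # ['boring', 'great']
print(specific)     # ['author', 'blender', 'design', 'leaks', 'plot', 'sharp']
```

"plot" is frequent in book reviews but never appears in kitchen reviews, so it lands on the domain-specific side. A strict threshold keeps only strongly shared words and therefore a thinner bridge, which is the weakness that expanding the set of domain-independent words aims to relieve.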
We therefore propose a new cross-domain sentiment classification method based on feature transfer learning. The method expands the set of domain-independent words and uses spectral clustering to find a new representation of domain-specific words. Finally, a support vector machine (SVM) is trained on this representation and used to perform sentiment classification. Experimental results show that the proposed method addresses the weaknesses of previous methods and improves classification accuracy.
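The final step trains an SVM on the mapped features. The thesis does not say which SVM implementation or kernel it used; as a dependency-free stand-in, this sketch trains a linear SVM with the Pegasos stochastic sub-gradient method on the hinge-loss objective. The toy feature vectors (think of them as 2-dimensional coordinates produced by a spectral mapping), the regularization weight `lam`, and the epoch count are illustrative assumptions.

```python
import random


def train_linear_svm(X, y, lam=0.1, epochs=200, seed=0):
    """Pegasos: stochastic sub-gradient descent on
    lam/2 * ||w||^2 + mean_i max(0, 1 - y_i * <w, x_i>), with labels in {+1, -1}.
    A bias is modelled by appending a constant 1 to every x (so it is also
    regularized, a common simplification)."""
    rng = random.Random(seed)
    data = [(list(xi) + [1.0], yi) for xi, yi in zip(X, y)]
    w = [0.0] * len(data[0][0])
    t = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for xi, yi in data:
            t += 1
            eta = 1.0 / (lam * t)  # Pegasos step size
            margin = yi * sum(wj * xj for wj, xj in zip(w, xi))
            w = [(1.0 - eta * lam) * wj for wj in w]  # shrink: regularizer gradient
            if margin < 1.0:  # hinge loss active: step toward y_i * x_i
                w = [wj + eta * yi * xj for wj, xj in zip(w, xi)]
    return w


def predict(w, x):
    """Sign of the linear score, using the same appended bias feature."""
    score = sum(wj * xj for wj, xj in zip(w, list(x) + [1.0]))
    return 1 if score >= 0 else -1


# Toy mapped features with sentiment labels (+1 positive, -1 negative).
X = [[1.0, 0.0], [0.8, 0.2], [0.0, 1.0], [0.2, 0.8]]
y = [1, 1, -1, -1]
w = train_linear_svm(X, y)
print([predict(w, x) for x in X])
```

On this linearly separable toy set the predictions should reproduce `y`. In an SFA-style pipeline the inputs would instead be the original features augmented with the projected coordinates, and an off-the-shelf solver such as scikit-learn's `LinearSVC` would normally replace this hand-written trainer.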