
Graduate Student: 林純正 (Zhun-Zheng Lin)
Thesis Title: 使用循環估計之動態加權決策森林方法改善極大量多標籤分類問題
A Dynamic Reweighting Forest for Extreme Multi-Label Classification with Rotation Estimation
Advisor: 戴碧如 (Bi-Ru Dai)
Oral Defense Committee: 徐國偉 (Kuo-wei Hsu), 蔡曉萍 (Hsiao-Ping Tsai), 鮑興國 (Hsing-kuo Pao)
Degree: Master
Department: College of Electrical Engineering and Computer Science - Department of Computer Science and Information Engineering
Year of Publication: 2017
Academic Year of Graduation: 105
Language: English
Number of Pages: 38
Chinese Keywords: 多標籤分類 (multi-label classification), 隨機森林 (random forest), 極大分類 (extreme classification)
Foreign Keywords: Multi-label classification, Random forest, Extreme classification

Owing to the rapid development of Internet technologies in recent years, datasets of enormous size have emerged. These datasets contain vast numbers of data points, features, and labels, so traditional classification algorithms cannot process them within acceptable time. Extreme multi-label classification algorithms are designed for exactly this kind of problem. Unlike traditional multi-label classification algorithms, extreme multi-label classification algorithms are required to run in less time and must be able to handle millions of data points, features, and labels. In this thesis, to increase the practical value of extreme multi-label classification, we focus on enabling extreme multi-label classification algorithms to run on a single personal computer. We therefore design a two-phase algorithm to address the problems mentioned above. In the reweighting phase, by strengthening the learning of hard-to-classify data points, both the precision and the diversity of the prediction model are improved. In the pretesting phase, by removing lower-quality trees from the prediction model, the overall precision of the model is also improved. Finally, validation on real-world datasets shows that the proposed method indeed achieves better prediction precision while the storage size of the prediction model is reduced.


In recent years, data volumes have grown rapidly along with the fast development of Internet technologies. Some datasets contain a huge number of labels, dimensions, and data points; as a result, some of them cannot be loaded by typical classifiers, and others require unacceptably long execution times. Extreme multi-label classification is designed for these challenges. It differs from traditional multi-label classification in a number of ways, including the need for lower execution time and the ability to train at an extreme scale with millions of data points, features, and labels. In order to enhance practicality, in this paper we focus on designing an extreme multi-label classification approach that can be performed on a single personal computer. We devise a two-phase framework to deal with these issues. In the reweighting phase, prediction precision is improved by paying more attention to hard-to-classify instances and by increasing the diversity of the model. In the pretesting phase, trees of lower quality are removed from the prediction model, reducing the model size and increasing prediction precision. Experiments on real-world datasets verify that the proposed method generates better prediction results while successfully shrinking the model size.
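The abstract describes a two-phase framework: a reweighting phase that emphasizes hard-to-classify instances during training, and a pretesting phase that prunes low-quality trees from the ensemble. The toy sketch below is not the thesis's actual algorithm; it is a heavily simplified illustration of that two-phase idea under assumed choices (binary labels instead of extreme multi-label, weighted decision stumps instead of full trees, a 1.5x upweighting factor, and a 0.6 pretest accuracy cutoff, all hypothetical):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy binary data: the label depends on one informative feature plus noise.
X = rng.normal(size=(200, 5))
y = (X[:, 0] + 0.5 * rng.normal(size=200) > 0).astype(int)

# Hold out a "pretesting" part, used only to score finished base learners.
X_train, y_train = X[:150], y[:150]
X_pre, y_pre = X[150:], y[150:]

def fit_stump(X, y, w):
    """Pick the (feature, threshold) stump with lowest weighted error."""
    best = None
    for f in range(X.shape[1]):
        for t in np.quantile(X[:, f], [0.25, 0.5, 0.75]):
            pred = (X[:, f] > t).astype(int)
            err = np.sum(w * (pred != y)) / np.sum(w)
            if best is None or err < best[0]:
                best = (err, f, t)
    return best[1], best[2]

def predict_stump(stump, X):
    f, t = stump
    return (X[:, f] > t).astype(int)

# Phase 1 (reweighting): after each round, boost the weights of the
# training instances the newest learner still misclassifies.
w = np.ones(len(y_train))
stumps = []
for _ in range(10):
    s = fit_stump(X_train, y_train, w)
    stumps.append(s)
    wrong = predict_stump(s, X_train) != y_train
    w[wrong] *= 1.5          # hard-to-classify points get more attention
    w /= w.sum()

# Phase 2 (pretesting): drop learners whose held-out accuracy is poor.
kept = [s for s in stumps
        if np.mean(predict_stump(s, X_pre) == y_pre) > 0.6]
if not kept:                 # fall back if pretesting filtered everything
    kept = stumps

votes = np.mean([predict_stump(s, X_pre) for s in kept], axis=0)
acc = np.mean((votes > 0.5).astype(int) == y_pre)
print(f"kept {len(kept)}/{len(stumps)} stumps, pretest accuracy {acc:.2f}")
```

The sketch keeps the two design points the abstract highlights: reweighting changes what later ensemble members learn, and pretesting shrinks the ensemble (and hence the model size) while discarding its weakest members.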

Advisor's Recommendation Letter
Oral Defense Committee Approval Form
Abstract
Chinese Abstract (論文摘要)
Acknowledgements
Table of Contents
List of Tables
List of Figures
1. Introduction
2. Related Works
3. Proposed Method
  3.1 Problem Definition and Proposed Framework
  3.2 The Reweighting Phase
  3.3 The Pretesting Phase
  3.4 Advanced Improvements
    3.4.1 Cross-validation
    3.4.2 Dynamic Pretesting Part Size Adaptation
4. Experiments
  4.1 Experimental Setup and Datasets
  4.2 Experimental Results
5. Conclusion
6. Reference

1. Jain, H., Prabhu, Y., Varma, M.: Extreme Multi-label Loss Functions for Recommendation, Tagging, Ranking & Other Missing Label Applications. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 935–944. ACM (2016)
2. Prabhu, Y., Varma, M.: FastXML: A Fast, Accurate and Stable Tree-classifier for eXtreme Multi-label Learning. In: Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 263–272. ACM (2014)
3. Bhatia, K., Jain, H., Kar, P., Varma, M., Jain, P.: Sparse Local Embeddings for Extreme Multi-label Classification. In: Advances in Neural Information Processing Systems, pp. 730–738 (2015)
4. Babbar, R., Schölkopf, B.: DiSMEC: Distributed Sparse Machines for Extreme Multi-label Classification. In: Proceedings of the Tenth ACM International Conference on Web Search and Data Mining, pp. 721–729. ACM (2017)
5. Yen, I. E., Huang, X., Zhong, K., Ravikumar, P., Dhillon, I. S.: PD-Sparse: A Primal and Dual Sparse Approach to Extreme Multiclass and Multilabel Classification. In: Proceedings of the 33rd International Conference on Machine Learning, pp. 3069–3077 (2016)
6. Xu, C., Tao, D., Xu, C.: Robust Extreme Multi-label Learning. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1275–1284. ACM (2016)
7. Yu, H. F., Jain, P., Kar, P., Dhillon, I. S.: Large-scale Multi-label Learning with Missing Labels. In: Proceedings of the 31st International Conference on Machine Learning, pp. 593–601 (2014)
8. Adnan, M. N., Islam, M. Z.: Forest CERN: A New Decision Forest Building Technique. In: Pacific-Asia Conference on Knowledge Discovery and Data Mining, pp. 304–315. Springer International Publishing (2016)
9. Adnan, M. N., Islam, M. Z.: On Improving Random Forest for Hard-to-Classify Records. In: Advanced Data Mining and Applications: 12th International Conference, pp. 558–566. Springer International Publishing (2016)
10. Snoek, C.G., Worring, M., Van Gemert, J.C., Geusebroek, J.M., Smeulders, A.W.: The challenge problem for automated detection of 101 semantic concepts in multimedia. In: Proceedings of the 14th Annual ACM International Conference on Multimedia, pp. 421–430. ACM (2006)
11. Katakis, I., Tsoumakas, G., Vlahavas, I.: Multilabel text classification for automated tag suggestion. In: ECML/PKDD Discovery Challenge (2008)
12. Tsoumakas, G., Katakis, I., Vlahavas, I.: Effective and efficient multilabel classification in domains with large number of labels. In: ECML/PKDD 2008 Workshop on Mining Multi-dimensional Data, pp. 30–44 (2008)
13. Mencia, E. L., Fürnkranz, J.: Efficient pairwise multilabel classification for large-scale problems in the legal domain. In: Joint European Conference on Machine Learning and Knowledge Discovery in Databases, pp. 50–65 (2008)
14. McAuley, J., Leskovec, J.: Hidden factors and hidden topics: understanding rating dimensions with review text. In: Proceedings of the 7th ACM Conference on Recommender Systems, pp. 165–172. ACM (2013)
15. Zubiaga, A.: Enhancing navigation on Wikipedia with social tags. Preprint (2012)
16. Partalas, I., Kosmopoulos, A., Baskiotis, N., Artieres, T., Paliouras, G., Gaussier, E., Androutsopoulos, I., Amini, M.-R., Gallinari, P.: LSHTC: A benchmark for large-scale text classification. Preprint (2015)
17. The Extreme Classification Repository. http://research.microsoft.com/en-us/um/people/manik/downloads/XC/XMLRepository.html
18. John, G. H., Langley, P.: Estimating continuous distributions in Bayesian classifiers. In: Proceedings of the Eleventh Conference on Uncertainty in Artificial Intelligence, pp. 338–345 (1995)
19. Safavian, S. R., Landgrebe, D.: A survey of decision tree classifier methodology. IEEE Transactions on Systems, Man, and Cybernetics, pp. 660–674 (1991)
20. Weston, J., Makadia, A., Yee, H.: Label Partitioning For Sublinear Ranking. In: Proceedings of ICML, pp. 181–189 (2013)
21. Krizhevsky, A., Sutskever, I., Hinton, G. E.: ImageNet classification with deep convolutional neural networks. In: Advances in Neural Information Processing Systems, pp. 1097–1105 (2012)
22. Suykens, J. A., Vandewalle, J.: Least squares support vector machine classifiers. Neural Processing Letters, pp. 293–300 (1999)
23. Al Bataineh, M., Al-qudah, Z.: A novel gene identification algorithm with Bayesian classification. Biomedical Signal Processing and Control, pp. 6–15 (2017)
24. Goodman, K. E., Lessler, J., Cosgrove, S. E., Harris, A. D., Lautenbach, E., Han, J. H., Tamma, P. D.: A Clinical Decision Tree to Predict Whether a Bacteremic Patient Is Infected With an Extended-Spectrum β-Lactamase–Producing Organism. Clinical Infectious Diseases, pp. 896–903 (2016)
25. Esteva, A., Kuprel, B., Novoa, R. A., Ko, J., Swetter, S. M., Blau, H. M., Thrun, S.: Dermatologist-level classification of skin cancer with deep neural networks. Nature, pp. 115–118 (2017)
26. Geng, Y., Chen, J., Fu, R., Bao, G., Pahlavan, K.: Enlighten wearable physiological monitoring systems: On-body RF characteristics based human motion classification using a support vector machine. IEEE Transactions on Mobile Computing, pp. 656–671 (2016)

Full-text release date: 2022/08/23 (campus network)
Full text not authorized for public release (off-campus network)
Full text not authorized for public release (National Central Library: Taiwan thesis system)