
Student: Wei-Ting Guo
Title: Optimization of Named Entity Recognition (NER) in E-commerce: Applications and Contributions of Pre-trained Models, Question Answering Architecture, and Uncertainty-oriented Training Data Selection
Advisor: Sheng-Luen Chung
Committee: Sheng-Luen Chung, Shun-Feng Su, Gee-Sern Hsu, Kuan-Yu Chen, Ching-Hu Lu
Degree: Master
Department: College of Electrical Engineering and Computer Science - Department of Electrical Engineering
Year of publication: 2023
Graduation academic year: 111 (ROC calendar)
Language: Chinese
Pages: 80
Keywords (Chinese): 電子商務, 命名實體識別, 深度學習, BERT 編碼器, 主動學習
Keywords (English): E-commerce, Named Entity Recognition, Deep Learning, BERT Encoder, Active Learning


    NER is a core task in natural language processing whose objective is to identify specific entities in text, such as person names and locations. Within the e-commerce domain, however, NER still leaves room for improvement. This study therefore examines the applications and challenges of Named Entity Recognition (NER) in e-commerce and presents several methods for improvement, along with their specific contributions.
    First, existing BERT feature encoders are not specifically pretrained on e-commerce text, so their semantic understanding of such text can still be improved. To address this, the thesis continues pretraining BERT on an e-commerce corpus, producing a domain-specific model, ceComBERT, which performs better across a variety of downstream tasks in the e-commerce setting.
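    The domain-adaptive pretraining step can be pictured with a short sketch. The following is a minimal, hypothetical example of continuing BERT's masked-language-model pretraining on a corpus of product text using the Hugging Face transformers library; the corpus file name, base checkpoint, and hyperparameters are illustrative assumptions, not the thesis's actual ceComBERT configuration.

```python
# Minimal sketch: domain-adaptive MLM pretraining on e-commerce text.
# "ecom_titles.txt" (one product title per line), the base checkpoint,
# and all hyperparameters are placeholder assumptions.
from transformers import (AutoModelForMaskedLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)
from datasets import load_dataset

tokenizer = AutoTokenizer.from_pretrained("bert-base-chinese")
model = AutoModelForMaskedLM.from_pretrained("bert-base-chinese")

corpus = load_dataset("text", data_files={"train": "ecom_titles.txt"})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=128)

tokenized = corpus["train"].map(tokenize, batched=True,
                                remove_columns=["text"])

# Randomly mask 15% of tokens: the standard BERT MLM objective.
collator = DataCollatorForLanguageModeling(tokenizer, mlm_probability=0.15)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="ceComBERT", num_train_epochs=3,
                           per_device_train_batch_size=32),
    train_dataset=tokenized,
    data_collator=collator,
)
trainer.train()
trainer.save_model("ceComBERT")  # the continued-pretraining checkpoint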
    Second, e-commerce text often contains overlapping attribute information, or several attribute values that must be predicted at once, which conventional NER architectures handle poorly. To overcome this, the thesis trains the NER model with a Question Answering (QA) formulation combined with sequence-labeling prediction. This approach satisfies the e-commerce requirements above while effectively reducing the number of label classes the model must distinguish and improving its performance.
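    A sketch may help make the QA-style reformulation concrete: the attribute name is encoded as the "question" segment and the product title as the "context", and a token-classification head predicts generic B/I/O value tags over the title tokens only, so the label set no longer grows with the number of attributes. The model name, label set, and example below are assumptions for illustration; the classification head here is untrained and would in practice be fine-tuned on BIO-annotated data.

```python
# Minimal sketch: QA-style NER as sequence labeling.
# Input is a sentence pair "[CLS] attribute [SEP] title [SEP]";
# B/I/O tags are read off the title segment only.
import torch
from transformers import AutoTokenizer, AutoModelForTokenClassification

LABELS = ["O", "B-VALUE", "I-VALUE"]  # one generic value class
tokenizer = AutoTokenizer.from_pretrained("bert-base-chinese")
model = AutoModelForTokenClassification.from_pretrained(
    "bert-base-chinese", num_labels=len(LABELS))

question = "品牌"  # the attribute to extract, e.g. brand
title = "Apple iPhone 14 Pro 256G 太空黑"

enc = tokenizer(question, title, return_tensors="pt")
with torch.no_grad():
    logits = model(**enc).logits          # (1, seq_len, num_labels)
pred = logits.argmax(-1).squeeze(0)

# token_type_ids marks the question segment (0) vs. title segment (1);
# keep predictions over the title tokens only.
for tok, seg, lab in zip(enc["input_ids"][0],
                         enc["token_type_ids"][0], pred):
    if seg == 1 and int(tok) != tokenizer.sep_token_id:
        print(tokenizer.convert_ids_to_tokens(int(tok)), LABELS[int(lab)])
```

    Because the attribute is supplied as the question, the same three-label head covers every attribute, and overlapping values are handled by asking one question per attribute over the same title.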
    Finally, manual annotation is expensive, and randomly sampling data for annotation may pick examples that are too easy or too similar in their features, which helps the model little and wastes annotation effort. The thesis therefore adopts an active-learning data-collection strategy that uses the model's uncertainty to select harder examples for training. The contribution of this approach is that, under a limited annotation budget, labeling these harder training examples improves the model in a targeted way and effectively raises its accuracy.
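    The uncertainty-oriented selection loop might look like the following sketch, which scores each unlabeled text by mean least confidence (one minus the maximum softmax probability, averaged over tokens) and returns the k most uncertain examples for annotation. The scoring function and budget k are illustrative assumptions; the thesis may use a different uncertainty measure.

```python
# Minimal sketch: uncertainty-based sample selection for active learning,
# assuming a trained token-classification model and its tokenizer.
import torch

def select_uncertain(model, tokenizer, texts, k=100):
    scores = []
    model.eval()
    for text in texts:
        enc = tokenizer(text, return_tensors="pt", truncation=True)
        with torch.no_grad():
            probs = model(**enc).logits.softmax(-1)  # (1, seq_len, labels)
        # Least confidence per token, averaged over the sequence.
        lc = (1.0 - probs.max(-1).values).mean().item()
        scores.append(lc)
    # Indices of the k most uncertain texts -> sent out for annotation.
    ranked = sorted(range(len(texts)), key=lambda i: scores[i], reverse=True)
    return ranked[:k]
```

    Each active-learning round would annotate the selected examples, add them to the training set, retrain, and re-score the remaining pool.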

    Table of Contents

    Front matter: Abstract (Chinese and English), Acknowledgments, Contents, List of Figures, List of Tables
    Chapter 1: Introduction
      1.1 NER Task Definition
      1.2 Research Background
        1.2.1 Model Pretraining
        1.2.2 Model Architecture
        1.2.3 Data Collection Strategy
      1.3 Research Motivation and Contributions
      1.4 Thesis Structure
    Chapter 2: Literature Review
      2.1 NER Notation
      2.2 Non-Deep-Learning NER Methods
        2.2.1 Rule-based NER
        2.2.2 Feature-Engineering-based NER
      2.3 Deep Learning Techniques for NER
        2.3.1 Model Output Strategies
        2.3.2 Prior Model Architectures
    Chapter 3: Training and Test Data
      3.1 Overview of the Data-Processing Pipeline
        3.1.1 Processing Order of NER Task Data
        3.1.2 Model Pretraining Data
      3.2 Data Sources and Acquisition
        3.2.1 Web-Crawling Difficulties and Solutions
        3.2.2 Existing Data
      3.3 Data Annotation for NER
        3.3.1 Product Attribute Categories
        3.3.2 Annotation Scheme
        3.3.3 Annotation Tools
        3.3.4 Annotation Difficulties
        3.3.5 Auxiliary Methods for Accelerating Annotation
    Chapter 4: Research Methods
      4.1 A Model Tailored to the E-commerce Setting: ceComBERT
      4.2 NER Model Architecture
        4.2.1 Model Architecture and Inference
        4.2.2 Loss Function
      4.3 Data Processing and Sampling
        4.3.1 Data Balancing
        4.3.2 Active Learning
    Chapter 5: Experiments and Discussion
      5.1 Evaluation Methods and Metrics
        5.1.1 Model Evaluation Methods
        5.1.2 Model Scoring Metrics
      5.2 Model Architecture Experiments
      5.3 In-domain Pretraining Experiments
      5.4 Training-Data Experiments
      5.5 Per-Category Experimental Results
    Chapter 6: NER Applications, Conclusions, and Future Work
      6.1 Product Filtering
      6.2 Product Matching
      6.3 Building a Test Set for Search Tasks
      6.4 Conclusions
      6.5 Future Work
    References
    Appendix A
    Committee Comments and Responses


    Full-text release date: 2028/08/25 (campus network)
    Full-text release date: 2028/08/25 (off-campus network)
    Full-text release date: 2028/08/25 (National Central Library: Taiwan thesis and dissertation system)