
Student: 吳澤鑫 (Tse-Hsin Wu)
Thesis Title: 深度神經網路於中文斷詞之研究 (Research on Chinese Word Segmentation with Deep Neural Network)
Advisor: 陳冠宇 (Kuan-Yu Chen)
Committee Members: 曾厚強 (Hou-Chiang Tseng), 蘇明祥 (Ming-Hsiang Su)
Degree: Master
Department: Department of Computer Science and Information Engineering, College of Electrical Engineering and Computer Science
Publication Year: 2023
Graduation Academic Year: 111 (2022-2023)
Language: Chinese
Number of Pages: 75
Keywords: Chinese Word Segmentation, Deep Learning, Supervised Learning, Unsupervised Learning, Transformer, BERT
Abstract: With the introduction of the Transformer architecture, pre-trained models such as BERT and GPT-2, trained on large amounts of text and fine-tuned on downstream tasks, have flourished in natural language processing. In Chinese word segmentation, they have pushed benchmark results up to F1 scores of about 97, yet the unknown-word (out-of-vocabulary) problem remains poorly handled. This thesis proposes a Chinese word segmentation model that combines supervised and unsupervised methods, providing a training framework for reference in which the unsupervised model assists the supervised one, with the goal of making the model more adaptable to unknown words. The experiments compare traditional Word2Vec pre-trained character embeddings with the currently mainstream Transformer pre-trained models, discuss the shortcomings of Word2Vec for Chinese word segmentation and possible architectural improvements to the unsupervised model, and finally compare performance against commonly used segmentation toolkits.
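As background to the F1 figures quoted above, the minimal Python sketch below shows the standard word-level precision/recall/F1 computation used for Chinese word segmentation benchmarks such as the SIGHAN bakeoff: a predicted word counts as correct only when its character span exactly matches a gold-standard word. The function names and the toy example are illustrative assumptions, not code from the thesis.

# Word-level precision/recall/F1 for Chinese word segmentation.
# Illustrative sketch only; not taken from the thesis.

def words_to_spans(words):
    """Convert a word list into a set of (start, end) character spans."""
    spans, pos = set(), 0
    for w in words:
        spans.add((pos, pos + len(w)))
        pos += len(w)
    return spans

def segmentation_prf(gold_words, pred_words):
    """Precision, recall, and F1 over exactly matching word spans."""
    gold = words_to_spans(gold_words)
    pred = words_to_spans(pred_words)
    correct = len(gold & pred)
    precision = correct / len(pred) if pred else 0.0
    recall = correct / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

if __name__ == "__main__":
    gold = ["深度", "神經", "網路", "於", "中文", "斷詞"]
    pred = ["深度", "神經網路", "於", "中文", "斷詞"]
    print(segmentation_prf(gold, pred))  # (0.8, 0.666..., 0.727...)

Both the gold and predicted sequences cover the same character string; only the boundary placement differs, which is why the score is computed over matching character spans rather than by comparing tokens position by position.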

Table of Contents
Chapter 1 Introduction
  1.1 Research Motivation and Purpose
  1.2 Thesis Outline
Chapter 2 Related Work
  2.1 Basic Neural Networks
    2.1.1 Convolutional Neural Networks
    2.1.2 Long Short-Term Memory
    2.1.3 Transformer
  2.2 Pre-trained Word Embeddings
    2.2.1 Word Vector Representations
    2.2.2 Continuous Bag-of-Words and Skip-gram Models
    2.2.3 Global Vectors (GloVe)
    2.2.4 ELMo
    2.2.5 OpenAI GPT
    2.2.6 BERT
  2.3 Tagging Schemes for Supervised Word Segmentation
    2.3.1 Four-Tag Scheme
    2.3.2 Six-Tag Scheme
  2.4 Word Segmentation Models Related to This Thesis
    2.4.1 Longest-Word-First Segmentation
    2.4.2 Supervised Learning Methods
      2.4.2.1 Hidden Markov Models
      2.4.2.2 Maximum Entropy Markov Models
      2.4.2.3 Conditional Random Fields
      2.4.2.4 Bidirectional LSTM Models
      2.4.2.5 Stacked Convolutional Neural Networks with Conditional Random Fields
    2.4.3 Unsupervised Learning Methods
      2.4.3.1 Unsupervised Segmentation Based on Segmental Language Models
      2.4.3.2 Unsupervised Segmentation Using Bidirectional Language Models
Chapter 3 A Word Segmentation Method Combining Unsupervised and Supervised Learning
  3.1 Motivation and Purpose
  3.2 A Segmentation Framework Combining Unsupervised and Supervised Learning
Chapter 4 Experimental Setup
  4.1 Datasets
  4.2 Evaluation Metrics
  4.3 Experimental Settings
    4.3.1 Pre-trained Character Embeddings
    4.3.2 Baseline Model Settings
Chapter 5 Experimental Results and Discussion
  5.1 Baseline Systems
  5.2 Experiments on the Segmentation Model Combining Unsupervised and Supervised Learning
  5.3 Discussion of the CNN and BiLSTM Models
  5.4 Discussion of the Bidirectional Language Model Segmenter
  5.5 Comparison with Commonly Used Toolkits
  5.6 Strengths and Weaknesses of the Baseline Models
Chapter 6 Conclusion and Future Work
References

