
Graduate Student: Li-Phen Yen (顏苙峯)
Thesis Title: Neural Retrieval Models using Contextualized Word Embedding Model for Spoken Document Retrieval (基於上下文詞嵌入模型之神經檢索模型於語音文件檢索)
Advisor: Kuan-Yu Chen (陳冠宇)
Committee Members: 王新民 (Hsin-Min Wang), 陳柏琳 (Berlin Chen), 林伯慎 (Bor-Shen Lin)
Degree: Master
Department: Department of Computer Science and Information Engineering, College of Electrical Engineering and Computer Science
Year of Publication: 2020
Graduation Academic Year: 108 (2019-2020)
Language: Chinese
Pages: 99
Keywords: Information Retrieval, Spoken Document Retrieval, Machine Learning, Pseudo-Relevance Feedback, Generative Adversarial Network, Language Model, Bidirectional Encoder Representations from Transformers (BERT)
Abstract (translated from Chinese): With the advent of multimedia data and many speech applications, such as the voice assistants Alexa and Siri, these applications now permeate our daily lives, and the spoken document retrieval task of retrieving multimedia content to satisfy user queries has received growing attention. Most current retrieval models based on Bidirectional Encoder Representations from Transformers (BERT) fine-tune the pre-trained model and then compute relevance scores from the resulting query and document representations. In this thesis, we propose two methods within this pre-trained retrieval framework: one based on supervised learning and one based on unsupervised learning. First, we propose a model that uses a generative adversarial network to generate pseudo-vectors that enhance the representations. Second, we propose a novel supervised model based on BERT that achieves excellent results in both effectiveness and efficiency. This thesis also uses syllable features of speech to mitigate problems that may arise in spoken document retrieval, and a series of experiments confirms the effectiveness of the proposed methods.
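To make the pseudo-vector idea above concrete, here is a minimal Python/PyTorch sketch of a conditional GAN whose generator maps a query representation plus noise to a pseudo-relevant vector, and whose discriminator judges candidate vectors against the query. Every module name, dimension, and loss choice below is a hypothetical illustration, not the thesis's actual architecture.

    # Hypothetical sketch; dimensions and modules are illustrative only.
    import torch
    import torch.nn as nn

    EMB_DIM, NOISE_DIM = 768, 64  # assumed sizes (768 = BERT-base hidden size)

    class Generator(nn.Module):
        def __init__(self):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(EMB_DIM + NOISE_DIM, 512), nn.ReLU(),
                nn.Linear(512, EMB_DIM),
            )

        def forward(self, query_vec, noise):
            # Condition generation on the query representation.
            return self.net(torch.cat([query_vec, noise], dim=-1))

    class Discriminator(nn.Module):
        def __init__(self):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(EMB_DIM * 2, 512), nn.ReLU(),
                nn.Linear(512, 1),
            )

        def forward(self, query_vec, candidate_vec):
            # Logit: does candidate_vec look like a relevant document vector?
            return self.net(torch.cat([query_vec, candidate_vec], dim=-1))

    G, D = Generator(), Discriminator()
    q = torch.randn(8, EMB_DIM)       # a batch of query vectors
    z = torch.randn(8, NOISE_DIM)     # noise
    pseudo = G(q, z)                  # generated pseudo-relevant vectors
    d_loss_fake = nn.functional.binary_cross_entropy_with_logits(
        D(q, pseudo.detach()), torch.zeros(8, 1))  # discriminator labels fakes as 0

In a full pseudo-relevance-feedback setting, such pseudo-vectors could be combined with the original query representation before scoring; the exact formulation used in the thesis is the subject of its Chapter 3.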


    Due to the advent of multimedia data and speech applications such as Alexa and Siri, speech-enabled applications have flooded into our daily lives. Spoken document retrieval (SDR), which retrieves multimedia content to satisfy users' queries, has gradually become an important task. At present, retrieval models based on bidirectional encoder representations from transformers (BERT) mostly obtain their final models by fine-tuning the pre-trained model; the fine-tuned models are then used to infer representations for queries and documents. In this thesis, we propose an unsupervised and a supervised retrieval method. First, we propose a model that uses generative adversarial networks (GAN) to generate pseudo-vector-enhanced representations. Second, we propose a novel supervised retrieval model based on BERT, which achieves excellent results in both efficiency and performance. Moreover, we explore several ways to leverage syllable features to enhance SDR performance. A series of experiments demonstrates the effectiveness and efficiency of the proposed retrieval methods.
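    As a rough illustration of the fine-tune-then-score pipeline described in the abstracts, the following Python sketch uses the Hugging Face transformers library: BERT encodes the query and the document separately, and cosine similarity over mean-pooled hidden states serves as the relevance score. The checkpoint name, pooling strategy, and similarity function are assumptions made for illustration, not the thesis's actual configuration; in practice the model would first be fine-tuned on retrieval data.

    # Hypothetical sketch; pooling and scoring choices are illustrative only.
    import torch
    from transformers import BertModel, BertTokenizer

    tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")
    model = BertModel.from_pretrained("bert-base-chinese")  # fine-tuned in practice
    model.eval()

    def encode(text: str) -> torch.Tensor:
        """Mean-pool the last hidden layer into a single fixed-size vector."""
        inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
        with torch.no_grad():
            hidden = model(**inputs).last_hidden_state  # (1, seq_len, 768)
        return hidden.mean(dim=1).squeeze(0)            # (768,)

    query_vec = encode("語音助理")  # an example query
    doc_vec = encode("Alexa 與 Siri 等語音助理已廣泛應用於日常生活。")
    score = torch.cosine_similarity(query_vec, doc_vec, dim=0)
    print(f"relevance score: {score.item():.4f}")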

    Table of Contents: Acknowledgments; Abstract (Chinese); Abstract (English); Notation; Chapter 1: Introduction; Chapter 2: Related Work; Chapter 3: Generating Pseudo-relevant Representations with Generative Adversarial Networks; Chapter 4: Neural Document Language Model Framework (NDLM); Chapter 5: Experiments; Chapter 6: Conclusion; Chapter 7: References; Appendix


    Full-text release date: 2025/08/26 (campus network)
    Full-text release date: 2025/08/26 (off-campus network)
    Full-text release date: 2025/08/26 (National Central Library: Taiwan Dissertations and Theses System)