
Graduate Student: 羅上堡 (Shang-Bao Luo)
Thesis Title: 探究預訓練模型於口語問答任務之研究 (A Study on Exploring the Pre-trained Language Model for Spoken Question Answering)
Advisor: 陳冠宇 (Kuan-Yu Chen)
Committee Members: 王新民 (Hsin-Min Wang), 陳柏琳 (Berlin Chen), 林伯慎 (Bor-Shen Lin)
Degree: Master
Department: College of Electrical Engineering and Computer Science - Department of Computer Science and Information Engineering
Year of Publication: 2021
Graduation Academic Year: 109
Language: Chinese
Number of Pages: 127
Chinese Keyword: 口語問答系統 (spoken question answering system)
English Keyword: Spoken Question Answering

To meet the needs of spoken question answering systems, this thesis builds on the BERT (Bidirectional Encoder Representations from Transformers) architecture to learn textual and audio features jointly, and names the resulting model the Multi-turn Audio-extraction Convolutional Neural Networks (MA-CNNs). On a Chinese spoken multiple-choice question answering dataset, it yields performance improvements of 1.99%, 2.20%, and 1.30% on the development set, test set 1, and test set 2, respectively. In addition, by examining Venn diagrams, we investigate how different models distribute their answers and, based on this analysis, propose a multi-model fusion architecture built on a variational autoencoder. To save overall training time and memory, this thesis also proposes a simple scheme that records the important features of the individual models for reuse by subsequent models. Using text alone, the fused system achieves improvements of 3.03%, 3.34%, and 1.10% on the development set, test set 1, and test set 2, respectively; fusing textual and acoustic features, it achieves improvements of 4.05%, 4.96%, and 4.10% on the development set, test set 1, and test set 2, respectively.
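A minimal sketch may make the core idea concrete: a BERT encoder produces a text representation for each (passage, question, choice) sequence, a small convolutional stack summarizes the matching audio frames, and the two vectors are fused to score the choice. Everything below (layer sizes, the 39-dimensional frame features, concatenation-based fusion, and the class and function names) is an illustrative assumption, not the MA-CNNs implementation described in the thesis.

```python
import torch
import torch.nn as nn
from transformers import BertModel

class TextAudioChoiceScorer(nn.Module):
    """Hypothetical scorer: one (passage, question, choice) text sequence plus
    the audio frames of the corresponding passage are mapped to a single score."""
    def __init__(self, bert_name="bert-base-chinese", audio_dim=39):
        super().__init__()
        self.bert = BertModel.from_pretrained(bert_name)
        # A small 1-D CNN stands in for the thesis's multi-turn audio extraction.
        self.audio_cnn = nn.Sequential(
            nn.Conv1d(audio_dim, 128, kernel_size=5, padding=2), nn.ReLU(),
            nn.Conv1d(128, 128, kernel_size=5, padding=2), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),                    # pool over time frames
        )
        self.scorer = nn.Linear(self.bert.config.hidden_size + 128, 1)

    def forward(self, input_ids, attention_mask, audio_feats):
        # input_ids, attention_mask: (batch * n_choices, seq_len)
        # audio_feats: (batch * n_choices, audio_dim, n_frames), e.g. MFCC-like frames
        text_vec = self.bert(input_ids=input_ids,
                             attention_mask=attention_mask).pooler_output
        audio_vec = self.audio_cnn(audio_feats).squeeze(-1)
        fused = torch.cat([text_vec, audio_vec], dim=-1)
        return self.scorer(fused)
```

In such a setup, the per-choice scores would be reshaped to (batch, n_choices) and trained with a cross-entropy loss over the choices.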


In recent years, spoken question answering has become an emerging research task. This study concentrated on this subject and explored a BERT-based method, named Multi-turn Audio-extraction Convolutional Neural Networks (MA-CNNs), that considers text and audio features simultaneously. Experimental results revealed that the proposed method achieved relative improvements of 1.99%, 2.20%, and 1.30% on the development set, test1 set, and test2 set, respectively. Moreover, using Venn diagrams, we summarized the pros and cons of several models, and then proposed a multi-model fusion architecture based on a variational autoencoder. To reduce training time and memory requirements, this study also proposed a simple method that builds the final model by reusing features already produced by the individual models. Consequently, compared with the baseline system, the proposed method with only text features obtained relative improvements of 3.03%, 3.34%, and 1.10% on the development, test1, and test2 sets, respectively. Furthermore, with both text and acoustic features, the proposed framework achieved relative improvements of 4.05%, 4.96%, and 4.10% over the baseline system on the development set, test set 1, and test set 2, respectively.
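The fusion and feature-caching ideas can be sketched in the same hedged spirit: answer-related vectors from several already-trained models are stored once, concatenated into a single input, and a small variational autoencoder compresses them into a latent code from which the answer is predicted. The dimensions, the single-layer encoder/decoder, and the loss weighting below are assumptions for illustration only, not the thesis's implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FusionVAE(nn.Module):
    """Hypothetical fusion model: cached feature vectors from several trained
    models are concatenated, compressed by a VAE, and classified from the latent."""
    def __init__(self, in_dim, latent_dim=64, n_choices=4):
        super().__init__()
        self.enc = nn.Linear(in_dim, 256)
        self.mu = nn.Linear(256, latent_dim)
        self.logvar = nn.Linear(256, latent_dim)
        self.dec = nn.Linear(latent_dim, in_dim)      # reconstruction head
        self.cls = nn.Linear(latent_dim, n_choices)   # answer-prediction head

    def forward(self, x):
        h = torch.relu(self.enc(x))
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)   # reparameterization
        return self.dec(z), self.cls(z), mu, logvar

def fusion_loss(recon, x, logits, answer, mu, logvar, beta=0.1):
    # Standard VAE terms (reconstruction + KL) plus cross-entropy on the answer.
    kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
    return F.mse_loss(recon, x) + beta * kl + F.cross_entropy(logits, answer)
```

Because each base model's features are computed and cached only once, the fusion stage never re-runs the large pre-trained models, which is where the training-time and memory savings described above would come from.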

Chapter 1 Introduction
  1.1 Research Purpose and Motivation
  1.2 Thesis Outline
Chapter 2 Related Work
  2.1 Word Embedding Representations
    2.1.1 Continuous Bag-of-Words and Skip-gram Models
    2.1.2 Global Vectors (GloVe)
    2.1.3 fastText Vectors
    2.1.4 Contextualized Vectors (CoVe)
    2.1.5 Transformer
    2.1.6 ELMo
    2.1.7 OpenAI GPT
    2.1.8 BERT
  2.2 Traditional Question Answering Models
    2.2.1 QACNN
    2.2.2 Co-Matching
  2.3 Spoken Question Answering Models
    2.3.1 SpeechBERT
    2.3.2 Hierarchical Multi-modal Multi-prediction Framework Based on Convolutional Neural Networks for Spoken Multiple-Choice Question Answering
    2.3.3 Feature-Granularity Training Strategy
Chapter 3 BERT-based and Multi-model Fusion Prediction Models
  3.1 Task Objective of the Spoken Question Answering System
  3.2 Combining the Baseline Model with the HMM Framework
  3.3 Audio-BERT
    3.3.1 Basic BERT Multiple-Choice Architecture
    3.3.2 Multi-turn Audio-extraction Convolutional Neural Network Architecture
    3.3.3 Recurrent Neural Network Prediction
    3.3.4 MA-CNNs Fusion Enhancement Architectures
      3.3.4.1 Fusion Enhancement Architecture 1 (AeBERT-1)
      3.3.4.2 Fusion Enhancement Architecture 2 (AeBERT-2)
      3.3.4.3 Fusion Enhancement Architecture 3 (AeBERT-3)
      3.3.4.4 Fusion Enhancement Architecture 4 (AeBERT-4)
      3.3.4.5 Fusion Enhancement Architecture 5 (AeBERT-5)
  3.4 Multi-model Fusion Prediction Models
    3.4.1 Reading Vector Reproduction Model
    3.4.2 Variational Autoencoder-based Multi-model Fusion Prediction Model
Chapter 4 Experimental Setup
  4.1 Word Embedding Training
  4.2 Automatic Speech Recognition System
  4.3 HMM Framework Model Fusion Training Settings
  4.4 BERT-related Model Settings
  4.5 Multi-model Fusion Model Settings
    4.5.1 Reading Vector Reproduction Model
    4.5.2 Variational Autoencoder-based Multi-model Fusion Prediction Model
  4.6 Feature-Granularity Training Strategy Settings
  4.7 Experimental Corpora
Chapter 5 Experimental Results and Discussion
  5.1 Baseline Systems
  5.2 Systems Combined with the HMM Framework
  5.3 MA-CNNs Fusion Enhancement Architecture Systems
  5.4 Variational Autoencoder-based Multi-model Fusion Prediction Model Systems
  5.5 Discussion
    5.5.1 On MA-CNNs
    5.5.2 On the Reading Vector Reproduction Model
    5.5.3 On the Variational Autoencoder-based Multi-model Fusion Prediction Model
Chapter 6 Conclusion
References
Appendix

