| Student | 羅上堡 Shang-Bao Luo |
|---|---|
| Thesis Title | 探究預訓練模型於口語問答任務之研究 (A Study on Exploring the Pre-trained Language Model for Spoken Question Answering) |
| Advisor | 陳冠宇 Kuan-Yu Chen |
| Committee Members | 王新民 Hsin-Min Wang, 陳柏琳 Berlin Chen, 林伯慎 Bor-Shen Lin |
| Degree | Master |
| Department | Department of Computer Science and Information Engineering, College of Electrical Engineering and Computer Science |
| Year of Publication | 2021 |
| Academic Year | 109 (2020-2021) |
| Language | Chinese |
| Pages | 127 |
| Keywords (Chinese) | 口語問答系統 (spoken question answering system) |
| Keywords (English) | Spoken Question Answering |
Chinese Abstract:

本論文針對口語問答系統之需求,在BERT(Bidirectional Encoder Representations from Transformers)的架構上,將文本與音訊特徵一起學習,並將此架構命名為多輪音訊萃取之捲積神經網路架構(MA-CNNs)。最終在中文口語選擇題問答任務資料集上,於發展集、測試集1與測試集2分別獲得1.99%、2.20%與1.30%的效能改善。此外,藉由文氏圖(Venn Diagram)的觀察,我們探究各模型的答題分布情況,進而提出以變分自動編碼器(Variational Autoencoder)進行多模型融合的架構。為了節省整體訓練時間與記憶體需求,本論文也提出一套簡單的作法,將不同模型的重要特徵記錄下來,供後續模型使用。最終在僅使用文字特徵的情況下,於發展集、測試集1與測試集2分別得到3.03%、3.34%與1.10%的效能改善;在融合文字與聲學特徵的情況下,則分別獲得4.05%、4.96%與4.10%的效能改善。
Abstract:

In recent years, spoken question answering has become an emergent research topic. This study concentrated on this subject and explored a BERT-based method, named Multi-turn Audio-extraction Convolutional Neural Networks (MA-CNNs), which learns text and audio features simultaneously. Experimental results revealed that the proposed method achieved relative improvements of 1.99%, 2.20%, and 1.30% on the development, test1, and test2 sets, respectively. Moreover, by inspecting Venn diagrams of the answers produced by several models, we summarized their complementary strengths and weaknesses, and then proposed a multi-model fusion architecture based on a variational autoencoder. To reduce the overall training time and memory requirements, the study also proposed a simple method that records the salient features of different models so that subsequent models can reuse them. Consequently, compared with the baseline system, the proposed method with only text features obtained relative improvements of 3.03%, 3.34%, and 1.10% on the development, test1, and test2 sets, respectively. Furthermore, with both text and acoustic features, the proposed framework achieved relative improvements of 4.05%, 4.96%, and 4.10% over the baseline system on the same three sets.
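The abstract describes two mechanisms: recording the salient features each upstream model produces so they can be reused without rerunning the models, and fusing those cached features with a variational autoencoder whose latent code drives the answer classifier. The following is a minimal PyTorch sketch of that fusion idea only, not the thesis implementation; the class name `VAEFusion`, all layer sizes, the concatenation-based fusion, the MSE reconstruction loss, and the four-choice output are illustrative assumptions.

```python
# A minimal sketch, assuming PyTorch: cached features from several upstream
# models are concatenated, encoded by a VAE, and the latent code both
# reconstructs the input and predicts the answer option.
import torch
import torch.nn as nn
import torch.nn.functional as F

class VAEFusion(nn.Module):
    def __init__(self, feat_dim=768, n_models=2, latent_dim=64, n_choices=4):
        super().__init__()
        fused_dim = feat_dim * n_models               # concatenated cached features
        self.enc = nn.Linear(fused_dim, 256)
        self.mu = nn.Linear(256, latent_dim)
        self.logvar = nn.Linear(256, latent_dim)
        self.dec = nn.Linear(latent_dim, fused_dim)   # reconstruction head
        self.cls = nn.Linear(latent_dim, n_choices)   # answer-selection head

    def forward(self, cached_feats):
        # cached_feats: list of (batch, feat_dim) tensors loaded from disk,
        # so the upstream models never need to be rerun during fusion training.
        x = torch.cat(cached_feats, dim=-1)
        h = F.relu(self.enc(x))
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterization
        recon_loss = F.mse_loss(self.dec(z), x)
        kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
        return self.cls(z), recon_loss + kl

# Toy usage with random stand-ins for features cached from, e.g., a text-only
# BERT and an MA-CNN; real inputs would be tensors saved during a single
# forward pass of each upstream model.
model = VAEFusion()
logits, aux_loss = model([torch.randn(8, 768), torch.randn(8, 768)])
```

Under this sketch, `aux_loss` would simply be added to the cross-entropy over `logits` during training, and only `logits` would be used at answer time to pick an option.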