System ID | U0002-3006202015341100 |
---|---|
DOI | 10.6846/TKU.2020.00901 |
Title (Chinese) | 基於BERT語言模型回答是非題之研究 |
Title (English) | Answering Yes/No Questions by the BERT Language Model |
Title (third language) | |
University | 淡江大學 (Tamkang University) |
Department (Chinese) | 大數據分析與商業智慧碩士學位學程 |
Department (English) | Master's Program in Big Data Analytics and Business Intelligence |
Foreign degree school | |
Foreign degree college | |
Foreign degree institute | |
Academic year | 108 |
Semester | 2 |
Publication year | 109 |
Author (Chinese) | 吳進益 |
Author (English) | Chin-Yi Wu |
Student ID | 607890083 |
Degree | Master's |
Language | English |
Second language | |
Defense date | 2020-06-04 |
Pages | 69 |
Oral defense committee | Advisor: 魏世杰; Member: 戴敏育; Member: 古倫維 |
Keywords (Chinese) | 是非題, 自然語言處理, BERT, 深度學習 |
Keywords (English) | Yes/no questions, NLP, BERT, Deep Learning |
Keywords (third language) | |
Subject classification | |
Abstract (Chinese) | 隨著圖形處理單元(GPU)技術的發展,深度學習在機器學習和自然語言處理(NLP)任務中得到了廣泛應用。最近,NLP領域出現大量有關問答(QA)任務採用深度學習的文獻。這些文獻大部分集中在英語的簡短回答和對話任務上。本文將以回答中文的是/否問題當作研究主題,動機是對此主題採取深度學習作法的文獻較少。本文將基於公開的中文資料集,利用預訓練的中文BERT(Bidirectional Encoder Representations from Transformers)語言模型進行微調和評估。結果顯示,在本文擴展的五份資料集中,本文採用的二分類及三分類模型,皆可在十次交叉驗證下得到良好的準確率。 |
Abstract (English) | As Graphics Processing Unit (GPU) technology advances, there has been a boom in the use of deep learning for Machine Learning and Natural Language Processing (NLP) tasks. Recently, NLP has seen a considerable amount of literature on Question Answering (QA) using deep learning approaches. Most of these works concentrate on short-answer and dialogue tasks in English. In this paper, answering yes/no questions in Chinese with a deep learning approach is our topic of research, as it is less studied in the literature. Based on a public Chinese QA dataset, which we expand into five datasets, a pre-trained Chinese BERT (Bidirectional Encoder Representations from Transformers) language model is fine-tuned and evaluated on both a two-class and a three-class task. The results show that on all five expanded datasets, our two-class and three-class models obtain good accuracy under 10-fold cross validation. |
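The abstract describes fine-tuning a Chinese BERT sentence-pair classifier and scoring it with 10-fold cross validation. A minimal stdlib-only sketch of that pipeline's data flow is shown below: the `[CLS] question [SEP] passage [SEP]` pair encoding BERT consumes (characters stand in for WordPiece tokens, since Chinese BERT tokenizes per character), a k-fold split, and the per-fold accuracy loop. The majority-class dummy classifier is a hypothetical placeholder standing in for the thesis's actual fine-tuned BERT model, and all function names here are illustrative, not taken from the thesis.

```python
import random

def make_bert_input(question, passage, max_len=128):
    """Build the [CLS] question [SEP] passage [SEP] token sequence and
    segment ids that a BERT sentence-pair classifier consumes."""
    q = list(question)
    # Truncate the passage so the whole pair fits in max_len
    # (3 slots are reserved for [CLS] and the two [SEP] tokens).
    p = list(passage)[: max_len - len(q) - 3]
    tokens = ["[CLS]"] + q + ["[SEP]"] + p + ["[SEP]"]
    segment_ids = [0] * (len(q) + 2) + [1] * (len(p) + 1)
    return tokens, segment_ids

def k_fold_indices(n, k=10, seed=0):
    """Shuffle n sample indices and deal them into k folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def cross_validate(labels, classes=("yes", "no"), k=10):
    """10-fold CV skeleton: each fold is held out once as the test set.
    A majority-class dummy replaces the fine-tuned BERT classifier;
    returns the mean test-fold accuracy."""
    folds = k_fold_indices(len(labels), k)
    accs = []
    for i in range(k):
        test = folds[i]
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        majority = max(classes, key=lambda c: sum(labels[j] == c for j in train))
        accs.append(sum(labels[j] == majority for j in test) / len(test))
    return sum(accs) / len(accs)
```

For the three-class task described in the abstract, `classes` would become `("yes", "no", "depends")`; the pair encoding and the fold loop are unchanged, only the classification head's output size differs.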
Abstract (third language) | |
Table of contents |
I. INTRODUCTION 1
1.1 Background and Motivation 1
1.2 Research Purpose 1
1.3 Overview of the Paper 1
II. RELATED WORKS 3
2.1 Past Works on Yes/No Questions 3
2.2 Artificial Intelligence 4
2.2.1 Deep Learning 6
2.2.2 Artificial Neural Networks 6
2.3 The BERT Language Model 7
III. METHODOLOGY 11
3.1 Research Methodology 11
3.2 System Architecture 13
3.3 Corpus 13
3.4 Preprocessing 14
3.5 BERT Fine-Tuning 16
3.6 Composite Word Attention Weights 17
3.7 Evaluation Metrics 19
IV. EXPERIMENT 22
4.1 The Environments 22
4.2 The Datasets 22
4.2.1 The Consistent Two-Class Dataset 24
4.2.2 The Expanded Two-Class Dataset 25
4.2.3 The Fact and Opinion Datasets of the Expanded Two-Class Dataset 26
4.2.4 The Expanded Three-Class Dataset 28
4.3 Experimental Setup and Results 31
4.3.1 The Result of the Consistent Two-Class Dataset 32
4.3.2 The Result of the Expanded Two-Class Dataset 35
4.3.3 The Result of the Fact and Opinion Subsets of the Expanded Two-Class Dataset 38
4.3.3.1 The Result of the Fact Dataset 38
4.3.3.2 The Result of the Opinion Dataset 41
4.3.3.3 The Comparison between the Fact and Opinion Datasets 43
4.3.4 The Result of the Expanded Three-Class Dataset 44
4.4 Discussion 46
V. CONCLUSIONS AND FUTURE WORK 56
5.1 Findings of the Study 56
5.2 Limitations of the Study 56
5.3 Research Contribution 56
5.4 Future Work 57
References 58
Appendix 61

List of Figures
Figure 1. The relationship between AI, ML, DL, and NLP 6
Figure 2. Pre-training model architectures of BERT, ELMo, and OpenAI GPT 7
Figure 3. Illustrations of fine-tuning BERT for different downstream tasks 10
Figure 4. Research development procedure 11
Figure 5. System development research process 12
Figure 6. System architecture 13
Figure 7. Preprocessing 14
Figure 8. An example of structure adjustment and encoding conversion 15
Figure 9. Another example of structure adjustment and encoding conversion 16
Figure 10. The fine-tuned BERT model for answering a yes/no question about a passage 17
Figure 11. A sample attention diagram for one aspect of attention on the input tokens 18
Figure 12. A sample composite attention diagram for all aspects of attention 19
Figure 13. The percentages of the DuReader 2.0 zhidao and search sets 23
Figure 14. The percentages of the consistent two-class train and dev sets 25
Figure 15. The percentages of the expanded two-class train and dev sets 26
Figure 16. The percentages of the fact train and dev sets 27
Figure 17. The percentages of the opinion train and dev sets 28
Figure 18. The percentages of the expanded three-class train and dev sets 29
Figure 19. Composite word attention diagram for yes predicted as yes 47
Figure 20. Composite word attention diagram for yes predicted as no 48
Figure 21. Composite word attention diagram for yes predicted as depends 49
Figure 22. Composite word attention diagram for no predicted as no 50
Figure 23. Composite word attention diagram for no predicted as yes 51
Figure 24. Composite word attention diagram for no predicted as depends 52
Figure 25. Composite word attention diagram for depends predicted as depends 53
Figure 26. Composite word attention diagram for depends predicted as yes 54
Figure 27. Composite word attention diagram for depends predicted as no 55
Figure 28. Test set results on BoolQ 57
Figure 29. Attention diagram for the case of yes predicted as yes (9L_11H) 61
Figure 30. Attention diagram for the case of yes predicted as no (9L_11H) 62
Figure 31. Attention diagram for the case of yes predicted as depends (9L_1H) 63
Figure 32. Attention diagram for the case of no predicted as no (9L_11H) 64
Figure 33. Attention diagram for the case of no predicted as yes (8L_6H) 65
Figure 34. Attention diagram for the case of no predicted as depends (8L_10H) 66
Figure 35. Attention diagram for the case of depends predicted as depends (9L_10H) 67
Figure 36. Attention diagram for the case of depends predicted as yes (9L_11H) 68
Figure 37. Attention diagram for the case of depends predicted as no (9L_12H) 69

List of Tables
Table 1. The definition of four categories of Artificial Intelligence 5
Table 2. The confusion matrix of two-class prediction results 20
Table 3. The confusion matrix of three-class prediction results 21
Table 4. Dataset profile 22
Table 5. The profile of the expanded three-class dataset 24
Table 6. Consistent two-class dataset profile 25
Table 7. Expanded two-class dataset profile 26
Table 8. Fact dataset profile 27
Table 9. Opinion dataset profile 28
Table 10. Expanded three-class dataset profile 29
Table 11. Dataset summary 31
Table 12. The hyperparameters for model training 32
Table 13. The prediction result for the dev set of the consistent dataset 33
Table 14. The prediction result for the train set of the consistent dataset 34
Table 15. The 10-fold C.V. result for the consistent dataset with combined sets 34
Table 16. Summary of the result for the consistent two-class dataset 35
Table 17. The prediction result for the dev set of the expanded dataset 36
Table 18. The prediction result for the train set of the expanded dataset 36
Table 19. The 10-fold C.V. result for the expanded two-class dataset with combined sets 37
Table 20. Summary of the result for the expanded two-class dataset 38
Table 21. The prediction result for the dev set of the expanded Fact dataset 39
Table 22. The prediction result for the train set of the expanded Fact dataset 39
Table 23. The 10-fold cross validation result for the Fact dataset with combined sets 40
Table 24. The prediction result for the dev set of the expanded Opinion dataset 41
Table 25. The prediction result for the train set of the expanded Opinion dataset 42
Table 26. The 10-fold cross validation result for the Opinion dataset with combined sets 42
Table 27. Summary for the Fact dataset 43
Table 28. Summary for the Opinion dataset 44
Table 29. The prediction result of the fold 1 test set of the expanded three-class dataset 45
Table 30. The result for the 10 folds of the expanded three-class dataset with combined sets 45
Table 31. Summary for the expanded three-class dataset 46
Table 32. The illustration of each sample 46
Table 33. The case of correct prediction for Yes 47
Table 34. The case of wrong prediction of No for Yes 48
Table 35. The case of wrong prediction of Depends for Yes 49
Table 36. The case of correct prediction for No 50
Table 37. The case of wrong prediction of Yes for No 51
Table 38. The case of wrong prediction of Depends for No 52
Table 39. The case of correct prediction for Depends 53
Table 40. The case of wrong prediction of Yes for Depends 54
Table 41. The case of wrong prediction of No for Depends 55 |
References |
Bentivogli, L., Dagan, I., & Magnini, B. (2017). The Recognizing Textual Entailment challenges: Datasets and methodologies. In N. Ide & J. Pustejovsky (Eds.), Handbook of Linguistic Annotation. Springer, Dordrecht.
Choi, E., He, H., Iyyer, M., Yatskar, M., Yih, W. T., Choi, Y., ... & Zettlemoyer, L. (2018). QuAC: Question answering in context. arXiv preprint arXiv:1808.07036.
Clark, C., Lee, K., Chang, M. W., Kwiatkowski, T., Collins, M., & Toutanova, K. (2019). BoolQ: Exploring the surprising difficulty of natural yes/no questions. arXiv preprint arXiv:1905.10044.
Dai, A. M., & Le, Q. V. (2015). Semi-supervised sequence learning. In Advances in Neural Information Processing Systems (pp. 3079-3087).
Devlin, J., Chang, M. W., Lee, K., & Toutanova, K. (2018). BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805.
Fukushima, K. (1980). Neocognitron: A self-organizing neural network model for a mechanism of pattern recognition unaffected by shift in position. Biological Cybernetics, 36(4), 193-202.
Harabagiu, S., & Hickl, A. (2006, July). Methods for using textual entailment in open-domain question answering. In Proceedings of the 21st International Conference on Computational Linguistics and the 44th Annual Meeting of the Association for Computational Linguistics (pp. 905-912). Association for Computational Linguistics.
He, W., Liu, K., Liu, J., Lyu, Y., Zhao, S., Xiao, X., ... & Liu, X. (2017). DuReader: A Chinese machine reading comprehension dataset from real-world applications. arXiv preprint arXiv:1711.05073.
Hochreiter, S., & Schmidhuber, J. (1997). Long short-term memory. Neural Computation, 9(8), 1735-1780.
Joshi, M., Choi, E., Weld, D. S., & Zettlemoyer, L. (2017). TriviaQA: A large scale distantly supervised challenge dataset for reading comprehension. arXiv preprint arXiv:1705.03551.
LeCun, Y., Bengio, Y., & Hinton, G. (2015). Deep learning. Nature, 521(7553), 436-444.
Malakasiotis, P., & Androutsopoulos, I. (2007, June). Learning textual entailment using SVMs and string similarity measures. In Proceedings of the ACL-PASCAL Workshop on Textual Entailment and Paraphrasing (pp. 42-47).
McCulloch, W. S., & Pitts, W. (1943). A logical calculus of the ideas immanent in nervous activity. The Bulletin of Mathematical Biophysics, 5(4), 115-133.
Mihaylov, T., Clark, P., Khot, T., & Sabharwal, A. (2018). Can a suit of armor conduct electricity? A new dataset for open book question answering. arXiv preprint arXiv:1809.02789.
Nguyen, T., Rosenberg, M., Song, X., Gao, J., Tiwary, S., Majumder, R., & Deng, L. (2016). MS MARCO: A human-generated machine reading comprehension dataset. arXiv preprint arXiv:1611.09268.
Nunamaker Jr., J. F., Chen, M., & Purdin, T. D. (1990). Systems development in information systems research. Journal of Management Information Systems, 7(3), 89-106.
Pearson, K. (1904). Mathematical contributions to the theory of evolution. XII. On a generalised theory of alternative inheritance, with special reference to Mendel's laws. Philosophical Transactions of the Royal Society of London, Series A, 203(359-371), 53-86.
Provost, F., & Kohavi, R. (1998). On applied research in machine learning. Machine Learning, 30, 127-132.
Quinlan, J. R. (1986). Induction of decision trees. Machine Learning, 1(1), 81-106.
Rajpurkar, P., Jia, R., & Liang, P. (2018). Know what you don't know: Unanswerable questions for SQuAD. arXiv preprint arXiv:1806.03822.
Reddy, S., Chen, D., & Manning, C. D. (2019). CoQA: A conversational question answering challenge. Transactions of the Association for Computational Linguistics, 7, 249-266.
Rumelhart, D. E., Hinton, G. E., & Williams, R. J. (1986). Learning representations by back-propagating errors. Nature, 323(6088), 533-536.
Russell, S. J., & Norvig, P. (2016). Artificial Intelligence: A Modern Approach. Pearson Education Limited.
Sondak, N. E., & Sondak, V. K. (1989, February). Neural networks and artificial intelligence. In Proceedings of the Twentieth SIGCSE Technical Symposium on Computer Science Education (pp. 241-245).
Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., & Salakhutdinov, R. (2014). Dropout: A simple way to prevent neural networks from overfitting. The Journal of Machine Learning Research, 15(1), 1929-1958.
Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, Ł., & Polosukhin, I. (2017). Attention is all you need. In Advances in Neural Information Processing Systems (pp. 5998-6008).
Welbl, J., Stenetorp, P., & Riedel, S. (2018). Constructing datasets for multi-hop reading comprehension across documents. Transactions of the Association for Computational Linguistics, 6, 287-302.
Williams, A., Nangia, N., & Bowman, S. R. (2017). A broad-coverage challenge corpus for sentence understanding through inference. arXiv preprint arXiv:1704.05426.
Yang, Z., Qi, P., Zhang, S., Bengio, Y., Cohen, W. W., Salakhutdinov, R., & Manning, C. D. (2018). HotpotQA: A dataset for diverse, explainable multi-hop question answering. arXiv preprint arXiv:1809.09600.
Zellers, R., Bisk, Y., Schwartz, R., & Choi, Y. (2018). SWAG: A large-scale adversarial dataset for grounded commonsense inference. arXiv preprint arXiv:1808.05326.
Zhang, S., Liu, X., Liu, J., Gao, J., Duh, K., & Van Durme, B. (2018). ReCoRD: Bridging the gap between human and machine commonsense reading comprehension. arXiv preprint arXiv:1810.12885. |
Full-text access rights | |