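§ Browse Thesis Bibliographic Record
System ID U0002-3006202015341100
DOI 10.6846/TKU.2020.00901
Title (Chinese) 基於BERT語言模型回答是非題之研究
Title (English) Answering Yes/No Questions by the BERT Language Model
Institution Tamkang University
Department (Chinese) 大數據分析與商業智慧碩士學位學程
Department (English) Master's Program in Big Data Analytics and Business Intelligence
Academic year 108 (ROC calendar, 2019-2020)
Semester 2
Publication year 109 (ROC calendar, 2020)
Author (Chinese) 吳進益
Author (English) Chin-Yi Wu
Student ID 607890083
Degree Master's
Language English
Oral defense date 2020-06-04
Pages 69
Committee Advisor - 魏世杰
Member - 戴敏育
Member - 古倫維
Keywords (Chinese) 是非題
Keywords (English) Yes/no questions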
Deep Learning
隨著圖形處理單元(GPU)技術的發展,深度學習在機器學習和自然語言處理(NLP)任務中得到了廣泛應用。最近,NLP領域出現大量有關問答(QA)任務採用深度學習的文獻。這些文獻大部分集中在英語的簡短回答和對話任務上。本文將以回答中文的是/否問題當作研究主題,動機是對此主題採取深度學習作法的文獻較少。本文將基於公開的中文資料集,利用預訓練的中文BERT(Bidirectional Encoder Representations from Transformers)語言模型進行微調和評估。結果顯示,在本文擴展的五份資料集中,本文採用的二分類及三分類模型,皆可在十次交叉驗證下得到良好的準確率。
As Graphics Processing Unit (GPU) technology advances, deep learning has been widely adopted for tasks in Machine Learning and Natural Language Processing (NLP). Recently, NLP has seen a considerable body of literature on Question Answering (QA) using deep learning approaches. Most of these works concentrate on short-answer and dialogue tasks in English. This paper studies answering yes/no questions in Chinese with a deep learning approach, a topic that is less studied in the literature. Based on a public Chinese QA dataset, a pre-trained Chinese BERT (Bidirectional Encoder Representations from Transformers) language model is fine-tuned and evaluated. The public dataset is expanded into five datasets, and the results show that both the two-class and the three-class models obtain good accuracy under 10-fold cross-validation.
Table of Contents
1.1 Background and Motivation	1
1.2 Research Purpose	1
1.3 Overview of the Paper	1
2.1 Past Works on Yes/No Questions	3
2.2 Artificial Intelligence	4
2.2.1 Deep Learning	6
2.2.2 Artificial Neural Networks	6
2.3 The BERT Language Model	7
3.1 Research Methodology	11
3.2 System Architecture	13
3.3 Corpus	13
3.4 Preprocessing	14
3.5 BERT Fine-Tuning	16
3.6 Composite Word Attention Weights	17
3.7 Evaluation Metrics	19
4.1 The Environments	22
4.2 The Datasets	22
4.2.1 The Consistent Two-Class Dataset	24
4.2.2 The Expanded Two-Class Dataset	25
4.2.3 The Fact and Opinion Datasets of the Expanded Two-Class Dataset	26
4.2.4 The Expanded Three-Class Dataset	28
4.3 Experimental Setup and Results	31
4.3.1 The Result of the Consistent Two-class Dataset	32
4.3.2 The Result of the Expanded Two-class Dataset	35
4.3.3 The Result of the Fact and Opinion Subsets of the Expanded Two-class Dataset	38
The Result of the Fact Dataset	38
The Result of the Opinion Dataset	41
The Comparison between the Fact and Opinion Dataset	43
4.3.4 The Result of the Expanded Three-class Dataset	44
4.4 Discussion	46
5.1 Findings of the Study	56
5.2 Limitation of the Study	56
5.3 Research Contribution	56
5.4 Future Work	57
References	58
Appendix	61

List of Figures
Figure 1. The relationship between AI, ML, DL, and NLP	6
Figure 2. Pre-training model architectures between BERT, ELMo, and OpenAI GPT	7
Figure 3. Illustrations of fine-tuning BERT for different downstream tasks	10
Figure 4. Research Development Procedure	11
Figure 5. System development research process	12
Figure 6. System Architecture	13
Figure 7. Preprocessing	14
Figure 8. An example for structure adjustment and encoding conversion	15
Figure 9. Another example for structure adjustment and encoding conversion	16
Figure 10. The fine-tuned BERT model for answering a yes/no question about a passage	17
Figure 11. A sample attention diagram for an aspect of attention on the input token	18
Figure 12. A sample composite attention diagram for all aspects of attention	19
Figure 13. The percentages of the DuReader 2.0 zhido and search sets	23
Figure 14. The percentages of the consistent two-class train and dev sets	25
Figure 15. The percentages of the expanded two-class train and dev sets	26
Figure 16. The percentages of the fact train and dev sets	27
Figure 17. The percentages of the opinion train and dev sets	28
Figure 18. The percentages of the expanded three-class train and dev sets	29
Figure 19. Composite word attention diagram for yes predicted as yes	47
Figure 20. Composite word attention diagram for yes predicted as no	48
Figure 21. Composite word attention diagram for yes predicted as depends	49
Figure 22. Composite word attention diagram for no predicted as no	50
Figure 23. Composite word attention diagram for no predicted as yes	51
Figure 24. Composite word attention diagram for no predicted as depends	52
Figure 25. Composite word attention diagram for depends predicted as depends	53
Figure 26. Composite word attention diagram for depends predicted as yes	54
Figure 27. Composite word attention diagram for depends predicted as no	55
Figure 28. Test set results on BoolQ	57
Figure 29. Attention diagram for the case of yes predicted as yes (9L_11H)	61
Figure 30. Attention diagram for the case of yes predicted as no (9L_11H)	62
Figure 31. Attention diagram for the case of yes predicted as depends (9L_1H)	63
Figure 32. Attention diagram for the case of no predicted as no (9L_11H)	64
Figure 33. Attention diagram for the case of no predicted as yes (8L_6H)	65
Figure 34. Attention diagram for the case of no predicted as depends (8L_10H)	66
Figure 35. Attention diagram for the case of depends predicted as depends (9L_10H)	67
Figure 36. Attention diagram for the case of depends predicted as yes (9L_11H)	68
Figure 37. Attention diagram for the case of depends predicted as no (9L_12H)	69
List of Tables
Table 1. The definition of four categories of Artificial Intelligence	5
Table 2. The confusion matrix of two-class prediction results	20
Table 3. The confusion matrix of three-class prediction results	21
Table 4. Dataset profile	22
Table 5. The profile of the expanded three-class dataset	24
Table 6. Consistent two-class dataset profile	25
Table 7. Expanded two-class dataset profile	26
Table 8. Fact dataset profile	27
Table 9. Opinion dataset profile	28
Table 10. Expanded three-class dataset profile	29
Table 11. Dataset summary	31
Table 12. The hyperparameters for model training	32
Table 13. The prediction result for the dev set of the consistent dataset	33
Table 14. The prediction result for the train set of the consistent dataset	34
Table 15. The 10-fold C.V. result for the consistent dataset with combined sets	34
Table 16. Summary of the result for consistent two-class dataset	35
Table 17. The prediction result for the dev set of the expanded dataset	36
Table 18. The prediction result for the train set of the expanded dataset	36
Table 19. The 10-fold C.V. result for the expanded two-class dataset with combined sets	37
Table 20. Summary of the result for the expanded two-class dataset	38
Table 21. The prediction result for the dev set of the expanded Fact dataset	39
Table 22. The prediction result for the train set of the expanded Fact dataset	39
Table 23. The 10-fold cross validation result for the Fact dataset with combined sets	40
Table 24. The prediction result for the dev set of the expanded Opinion dataset	41
Table 25. The prediction result for the train set of the expanded Opinion dataset	42
Table 26. The 10-fold cross validation result for the Opinion dataset with combined sets	42
Table 27. Summary for the Fact dataset	43
Table 28. Summary for the Opinion dataset	44
Table 29. The prediction result of the fold1 test set of the expanded three-class dataset	45
Table 30. The result for the 10 folds of the expanded three-class with combined sets	45
Table 31. Summary for the expanded three-class dataset	46
Table 32. The illustration of each sample	46
Table 33. The case of correct prediction for Yes	47
Table 34. The case of wrong prediction of No for Yes	48
Table 35. The case of wrong prediction of Depends for Yes	49
Table 36. The case of correct prediction for No	50
Table 37. The case of wrong prediction of Yes for No	51
Table 38. The case of wrong prediction of Depends for No	52
Table 39. The case of correct prediction for Depends	53
Table 40. The case of wrong prediction of Yes for Depends	54
Table 41. The case of wrong prediction of No for Depends	55
Bentivogli, L., Dagan, I., & Magnini, B. (2017). The recognizing textual entailment challenges: Datasets and methodologies. In N. Ide & J. Pustejovsky (Eds.), Handbook of Linguistic Annotation. Springer, Dordrecht.
Choi, E., He, H., Iyyer, M., Yatskar, M., Yih, W. T., Choi, Y., ... & Zettlemoyer, L. (2018). Quac: Question answering in context. arXiv preprint arXiv:1808.07036.
Clark, C., Lee, K., Chang, M. W., Kwiatkowski, T., Collins, M., & Toutanova, K. (2019). BoolQ: Exploring the surprising difficulty of natural yes/no questions. arXiv preprint arXiv:1905.10044.
Dai, A.M. & Le, Q.V. (2015). Semi-supervised sequence learning. In Advances in Neural Information Processing Systems, pages 3079–3087.
Devlin, J., Chang, M. W., Lee, K., & Toutanova, K. (2018). BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805. 
Fukushima, K. (1980). Neocognitron: A self-organizing neural network model for a mechanism of pattern recognition unaffected by shift in position. Biological Cybernetics, 36(4), 193-202.
Harabagiu, S., & Hickl, A. (2006, July). Methods for using textual entailment in open-domain question answering. In Proceedings of the 21st International Conference on Computational Linguistics and the 44th annual meeting of the Association for Computational Linguistics (pp. 905-912). Association for Computational Linguistics.
He, W., Liu, K., Liu, J., Lyu, Y., Zhao, S., Xiao, X., ... & Liu, X. (2017). DuReader: a Chinese machine reading comprehension dataset from real-world applications. arXiv preprint arXiv:1711.05073.
Hochreiter, S., & Schmidhuber, J. (1997). Long short-term memory. Neural Computation, 9(8), 1735-1780.
Joshi, M., Choi, E., Weld, D. S., & Zettlemoyer, L. (2017). Triviaqa: A large scale distantly supervised challenge dataset for reading comprehension. arXiv preprint arXiv:1705.03551.
LeCun, Y., Bengio, Y., & Hinton, G. (2015). Deep learning. Nature, 521(7553), 436-444.
Malakasiotis, P., & Androutsopoulos, I. (2007, June). Learning textual entailment using SVMs and string similarity measures. In Proceedings of the ACL-PASCAL workshop on textual entailment and paraphrasing (pp. 42-47).
McCulloch, W. S., & Pitts, W. (1943). A logical calculus of the ideas immanent in nervous activity. The bulletin of mathematical biophysics, 5(4), 115-133.
Mihaylov, T., Clark, P., Khot, T., & Sabharwal, A. (2018). Can a suit of armor conduct electricity? a new dataset for open book question answering. arXiv preprint arXiv:1809.02789.
Nguyen, T., Rosenberg, M., Song, X., Gao, J., Tiwary, S., Majumder, R., & Deng, L. (2016). MS MARCO: a human-generated machine reading comprehension dataset. Computing Research Repository, arXiv:1611.09268. Version 3.
Nunamaker Jr, J. F., Chen, M., & Purdin, T. D. (1990). Systems development in information systems research. Journal of management information systems, 7(3), 89-106.
Pearson, K. (1904). III. Mathematical contributions to the theory of evolution. —XII. On a generalised theory of alternative inheritance, with special reference to Mendel's laws. Philosophical Transactions of the Royal Society of London. Series A, Containing Papers of a Mathematical or Physical Character, 203(359-371), 53-86.
Provost, F., & Kohavi, R. (1998). On applied research in machine learning. Machine Learning, 30, 127-132.
Quinlan, J. R. (1986). Induction of decision trees. Machine Learning, 1(1), 81-106.
Rajpurkar, P., Jia, R., & Liang, P. (2018). Know what you don't know: Unanswerable questions for SQuAD. arXiv preprint arXiv:1806.03822.
Reddy, S., Chen, D., & Manning, C. D. (2019). Coqa: A conversational question answering challenge. Transactions of the Association for Computational Linguistics, 7, 249-266.
Rumelhart, D. E., Hinton, G. E., & Williams, R. J. (1986). Learning representations by back-propagating errors. Nature, 323(6088), 533-536.
Russell, S. J., & Norvig, P. (2016). Artificial intelligence: A modern approach. Malaysia: Pearson Education Limited.
Sondak, N. E., & Sondak, V. K. (1989, February). Neural networks and artificial intelligence. In Proceedings of the twentieth SIGCSE technical symposium on Computer science education (pp. 241-245).
Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., & Salakhutdinov, R. (2014). Dropout: a simple way to prevent neural networks from overfitting. The Journal of Machine Learning Research, 15(1), 1929-1958.
Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., & Polosukhin, I. (2017). Attention is all you need. In Advances in Neural Information Processing Systems (pp. 5998-6008).
Welbl, J., Stenetorp, P., & Riedel, S. (2018). Constructing datasets for multi-hop reading comprehension across documents. Transactions of the Association for Computational Linguistics, 6, 287-302.
Williams, A., Nangia, N., & Bowman, S. R. (2017). A broad-coverage challenge corpus for sentence understanding through inference. arXiv preprint arXiv:1704.05426.
Yang, Z., Qi, P., Zhang, S., Bengio, Y., Cohen, W. W., Salakhutdinov, R., & Manning, C. D. (2018). Hotpotqa: A dataset for diverse, explainable multi-hop question answering. arXiv preprint arXiv:1809.09600.
Zellers, R., Bisk, Y., Schwartz, R., & Choi, Y. (2018). Swag: A large-scale adversarial dataset for grounded commonsense inference. arXiv preprint arXiv:1808.05326.
Zhang, S., Liu, X., Liu, J., Gao, J., Duh, K., & Van Durme, B. (2018). Record: Bridging the gap between human and machine commonsense reading comprehension. arXiv preprint arXiv:1810.12885.
