Tamkang University Chueh Sheng Memorial Library (TKU Library)
System ID U0002-3006202015341100
Thesis title (Chinese) 基於BERT語言模型回答是非題之研究
Thesis title (English) Answering Yes/No Questions by the BERT Language Model
University Tamkang University (淡江大學)
Department (Chinese) 大數據分析與商業智慧碩士學位學程
Department (English) Master's Program in Big Data Analytics and Business Intelligence
Academic year 108
Semester 2
Year of publication 109
Student name (Chinese) 吳進益
Student name (English) Chin-Yi Wu
Student ID 607890083
Degree Master
Language English
Oral defense date 2020-06-04
Number of pages 69
Committee Advisor: 魏世杰
Member: 戴敏育
Member: 古倫維
Keywords (Chinese) 是非題、自然語言處理、BERT、深度學習
Keywords (English) Yes/no questions, NLP, BERT, Deep Learning
Subject classification
Chinese abstract 隨著圖形處理單元(GPU)技術的發展,深度學習在機器學習和自然語言處理(NLP)任務中得到了廣泛應用。最近,NLP領域出現大量有關問答(QA)任務採用深度學習的文獻。這些文獻大部分集中在英語的簡短回答和對話任務上。本文將以回答中文的是/否問題當作研究主題,動機是對此主題採取深度學習作法的文獻較少。本文將基於公開的中文資料集,利用預訓練的中文BERT(Bidirectional Encoder Representations from Transformers)語言模型進行微調和評估。結果顯示,在本文擴展的五份資料集中,本文採用的二分類及三分類模型,皆可在十次交叉驗證下得到良好的準確率。
English abstract As Graphics Processing Unit (GPU) technology advances, deep learning has come into widespread use for Machine Learning and Natural Language Processing (NLP) tasks. Recently, NLP has seen a considerable amount of literature on Question Answering (QA) using deep learning approaches, most of which concentrates on short-answer and dialogue tasks in English. This thesis takes answering yes/no questions in Chinese with a deep learning approach as its topic of research, as it is less studied in the literature. Based on a public Chinese QA dataset, which we expand into five datasets, a pre-trained Chinese BERT (Bidirectional Encoder Representations from Transformers) language model is fine-tuned and evaluated on both a two-class and a three-class answer-prediction task. The results show that on all five expanded datasets, the two-class and three-class models obtain good accuracy under 10-fold cross-validation.
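The fine-tuning itself follows the standard BERT sequence-classification recipe; the evaluation protocol named in the abstract, 10-fold cross-validation over a combined dataset, can be sketched in plain Python. This is a minimal illustrative sketch, not the thesis code: `train_and_score`, `majority_baseline`, and the toy data are hypothetical stand-ins for the actual BERT fine-tuning and scoring run.

```python
import random

def ten_fold_cv(samples, train_and_score, k=10, seed=42):
    """Shuffle samples into k folds; train on k-1 folds, test on the
    held-out fold, and return the mean test accuracy over all folds."""
    data = samples[:]
    random.Random(seed).shuffle(data)
    folds = [data[i::k] for i in range(k)]
    scores = []
    for i in range(k):
        test = folds[i]
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        scores.append(train_and_score(train, test))
    return sum(scores) / k

def majority_baseline(train, test):
    """Hypothetical stand-in for the BERT fine-tuning run: predict the
    majority training label and return accuracy on the held-out fold."""
    labels = [label for _, label in train]
    majority = max(set(labels), key=labels.count)
    return sum(1 for _, label in test if label == majority) / len(test)

# Toy (question, answer) pairs; the real input is (question, passage) text.
toy = [(f"q{i}", "yes" if i % 3 else "no") for i in range(100)]
print(round(ten_fold_cv(toy, majority_baseline), 2))  # 0.66, the overall "yes" rate
```

For the two-class task the labels are yes/no; the three-class task adds a "depends" label, and the same cross-validation loop applies unchanged.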
Table of Contents
I. INTRODUCTION 1
1.1 Background and Motivation 1
1.2 Research Purpose 1
1.3 Overview of the Paper 1
II. RELATED WORKS 3
2.1 Past Works on Yes/No Questions 3
2.2 Artificial Intelligence 4
2.2.1 Deep Learning 6
2.2.2 Artificial Neural Networks 6
2.3 The BERT Language Model 7
III. METHODOLOGY 11
3.1 Research Methodology 11
3.2 System Architecture 13
3.3 Corpus 13
3.4 Preprocessing 14
3.5 BERT Fine-Tuning 16
3.6 Composite Word Attention Weights 17
3.7 Evaluation Metrics 19
IV. EXPERIMENT 22
4.1 The Environments 22
4.2 The Datasets 22
4.2.1 The Consistent Two-Class Dataset 24
4.2.2 The Expanded Two-Class Dataset 25
4.2.3 The Fact and Opinion Datasets of the Expanded Two-Class Dataset 26
4.2.4 The Expanded Three-Class Dataset 28
4.3 Experimental Setup and Results 31
4.3.1 The Result of the Consistent Two-Class Dataset 32
4.3.2 The Result of the Expanded Two-Class Dataset 35
4.3.3 The Result of the Fact and Opinion Subsets of the Expanded Two-Class Dataset 38
4.3.3.1 The Result of the Fact Dataset 38
4.3.3.2 The Result of the Opinion Dataset 41
4.3.3.3 The Comparison between the Fact and Opinion Datasets 43
4.3.4 The Result of the Expanded Three-Class Dataset 44
4.4 Discussion 46
V. CONCLUSIONS AND FUTURE WORK 56
5.1 Findings of the Study 56
5.2 Limitation of the Study 56
5.3 Research Contribution 56
5.4 Future Work 57
References 58
Appendix 61


List of Figures
Figure 1. The relationship between AI, ML, DL, and NLP 6
Figure 2. Pre-training model architectures of BERT, ELMo, and OpenAI GPT 7
Figure 3. Illustrations of fine-tuning BERT for different downstream tasks 10
Figure 4. Research Development Procedure 11
Figure 5. System development research process 12
Figure 6. System Architecture 13
Figure 7. Preprocessing 14
Figure 8. An example for structure adjustment and encoding conversion 15
Figure 9. Another example for structure adjustment and encoding conversion 16
Figure 10. The fine-tuned BERT model for answering the yes/no question about a passage 17
Figure 11. A sample attention diagram for an aspect of attention on the input token 18
Figure 12. A sample composite attention diagram for all aspects of attention 19
Figure 13. The percentages of the DuReader 2.0 zhido and search sets 23
Figure 14. The percentages of the consistent two-class train and dev sets 25
Figure 15. The percentages of the expanded two-class train and dev sets 26
Figure 16. The percentages of the fact train and dev sets 27
Figure 17. The percentages of the opinion train and dev sets 28
Figure 18. The percentages of the expanded three-class train and dev sets 29
Figure 19. Composite word attention diagram for yes predicted as yes 47
Figure 20. Composite word attention diagram for yes predicted as no 48
Figure 21. Composite word attention diagram for yes predicted as depends 49
Figure 22. Composite word attention diagram for no predicted as no 50
Figure 23. Composite word attention diagram for no predicted as yes 51
Figure 24. Composite word attention diagram for no predicted as depends 52
Figure 25. Composite word attention diagram for depends predicted as depends 53
Figure 26. Composite word attention diagram for depends predicted as yes 54
Figure 27. Composite word attention diagram for depends predicted as no 55
Figure 28. Test set results on BoolQ 57
Figure 29. Attention diagram for the case of yes predicted as yes (9L_11H) 61
Figure 30. Attention diagram for the case of yes predicted as no (9L_11H) 62
Figure 31. Attention diagram for the case of yes predicted as depends (9L_1H) 63
Figure 32. Attention diagram for the case of no predicted as no (9L_11H) 64
Figure 33. Attention diagram for the case of no predicted as yes (8L_6H) 65
Figure 34. Attention diagram for the case of no predicted as depends (8L_10H) 66
Figure 35. Attention diagram for the case of depends predicted as depends (9L_10H) 67
Figure 36. Attention diagram for the case of depends predicted as yes (9L_11H) 68
Figure 37. Attention diagram for the case of depends predicted as no (9L_12H) 69

List of Tables
Table 1. The definition of four categories of Artificial Intelligence 5
Table 2. The confusion matrix of two-class prediction results 20
Table 3. The confusion matrix of three-class prediction results 21
Table 4. Dataset profile 22
Table 5. The profile of the expanded three-class dataset 24
Table 6. Consistent two-class dataset profile 25
Table 7. Expanded two-class dataset profile 26
Table 8. Fact dataset profile 27
Table 9. Opinion dataset profile 28
Table 10. Expanded three-class dataset profile 29
Table 11. Dataset summary 31
Table 12. The hyperparameters for model training 32
Table 13. The prediction result for the dev set of the consistent dataset 33
Table 14. The prediction result for the train set of the consistent dataset 34
Table 15. The 10-fold C.V. result for the consistent dataset with combined sets 34
Table 16. Summary of the result for consistent two-class dataset 35
Table 17. The prediction result for the dev set of the expanded dataset 36
Table 18. The prediction result for the train set of the expanded dataset 36
Table 19. The 10-fold C.V. result for the expanded two-class dataset with combined sets 37
Table 20. Summary of the result for the expanded two-class dataset 38
Table 21. The prediction result for the dev set of the expanded Fact dataset 39
Table 22. The prediction result for the train set of the expanded Fact dataset 39
Table 23. The 10-fold cross validation result for the Fact dataset with combined sets 40
Table 24. The prediction result for the dev set of the expanded Opinion dataset 41
Table 25. The prediction result for the train set of the expanded Opinion Dataset 42
Table 26. The 10-fold cross validation result for the Opinion dataset with combined sets 42
Table 27. Summary for the Fact dataset 43
Table 28. Summary for the Opinion dataset 44
Table 29. The prediction result of the fold 1 test set of the expanded three-class dataset 45
Table 30. The result for the 10 folds of the expanded three-class dataset with combined sets 45
Table 31. Summary for the expanded three-class dataset 46
Table 32. The illustration of each sample 46
Table 33. The case of correct prediction for Yes 47
Table 34. The case of wrong prediction of No for Yes 48
Table 35. The case of wrong prediction of Depends for Yes 49
Table 36. The case of correct prediction for No 50
Table 37. The case of wrong prediction of Yes for No 51
Table 38. The case of wrong prediction of Depends for No 52
Table 39. The case of correct prediction for Depends 53
Table 40. The case of wrong prediction of Yes for Depends 54
Table 41. The case of wrong prediction of No for Depends 55
REFERENCES
Bentivogli, L., Dagan, I., & Magnini, B. (2017). The recognizing textual entailment challenges: Datasets and methodologies. In N. Ide & J. Pustejovsky (Eds.), Handbook of Linguistic Annotation. Springer, Dordrecht.
Choi, E., He, H., Iyyer, M., Yatskar, M., Yih, W. T., Choi, Y., ... & Zettlemoyer, L. (2018). QuAC: Question answering in context. arXiv preprint arXiv:1808.07036.
Clark, C., Lee, K., Chang, M. W., Kwiatkowski, T., Collins, M., & Toutanova, K. (2019). BoolQ: Exploring the surprising difficulty of natural yes/no questions. arXiv preprint arXiv:1905.10044.
Dai, A.M. & Le, Q.V. (2015). Semi-supervised sequence learning. In Advances in Neural Information Processing Systems, pages 3079–3087.
Devlin, J., Chang, M. W., Lee, K., & Toutanova, K. (2018). BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805.
Fukushima, K. (1980). Neocognitron: A self-organizing neural network model for a mechanism of pattern recognition unaffected by shift in position. Biological Cybernetics, 36(4), 193-202.
Harabagiu, S., & Hickl, A. (2006, July). Methods for using textual entailment in open-domain question answering. In Proceedings of the 21st International Conference on Computational Linguistics and the 44th annual meeting of the Association for Computational Linguistics (pp. 905-912). Association for Computational Linguistics.
He, W., Liu, K., Liu, J., Lyu, Y., Zhao, S., Xiao, X., ... & Liu, X. (2017). DuReader: a Chinese machine reading comprehension dataset from real-world applications. arXiv preprint arXiv:1711.05073.
Hochreiter, S., & Schmidhuber, J. (1997). Long short-term memory. Neural Computation, 9(8), 1735-1780.
Joshi, M., Choi, E., Weld, D. S., & Zettlemoyer, L. (2017). TriviaQA: A large scale distantly supervised challenge dataset for reading comprehension. arXiv preprint arXiv:1705.03551.
LeCun, Y., Bengio, Y., & Hinton, G. (2015). Deep learning. Nature, 521(7553), 436-444.
Malakasiotis, P., & Androutsopoulos, I. (2007, June). Learning textual entailment using SVMs and string similarity measures. In Proceedings of the ACL-PASCAL workshop on textual entailment and paraphrasing (pp. 42-47).
McCulloch, W. S., & Pitts, W. (1943). A logical calculus of the ideas immanent in nervous activity. The bulletin of mathematical biophysics, 5(4), 115-133.
Mihaylov, T., Clark, P., Khot, T., & Sabharwal, A. (2018). Can a suit of armor conduct electricity? a new dataset for open book question answering. arXiv preprint arXiv:1809.02789.
Nguyen, T., Rosenberg, M., Song, X., Gao, J., Tiwary, S., Majumder, R., & Deng, L. (2016). MS MARCO: a human-generated machine reading comprehension dataset. Computing Research Repository, arXiv:1611.09268. Version 3.
Nunamaker Jr, J. F., Chen, M., & Purdin, T. D. (1990). Systems development in information systems research. Journal of management information systems, 7(3), 89-106.
Pearson, K. (1904). III. Mathematical contributions to the theory of evolution. —XII. On a generalised theory of alternative inheritance, with special reference to Mendel's laws. Philosophical Transactions of the Royal Society of London. Series A, Containing Papers of a Mathematical or Physical Character, 203(359-371), 53-86.
Provost, F., & Kohavi, R. (1998). On applied research in machine learning. Machine Learning, 30, 127-132.
Quinlan, J. R. (1986). Induction of decision trees. Machine Learning, 1(1), 81-106.
Rajpurkar, P., Jia, R., & Liang, P. (2018). Know what you don't know: Unanswerable questions for SQuAD. arXiv preprint arXiv:1806.03822.
Reddy, S., Chen, D., & Manning, C. D. (2019). CoQA: A conversational question answering challenge. Transactions of the Association for Computational Linguistics, 7, 249-266.
Rumelhart, D. E., Hinton, G. E., & Williams, R. J. (1986). Learning representations by back-propagating errors. Nature, 323(6088), 533-536.
Russell, S. J., & Norvig, P. (2016). Artificial intelligence: A modern approach. Pearson Education Limited.
Sondak, N. E., & Sondak, V. K. (1989, February). Neural networks and artificial intelligence. In Proceedings of the twentieth SIGCSE technical symposium on Computer science education (pp. 241-245).
Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., & Salakhutdinov, R. (2014). Dropout: a simple way to prevent neural networks from overfitting. The Journal of Machine Learning Research, 15(1), 1929-1958.
Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., & Polosukhin, I. (2017). Attention is all you need. In Advances in Neural Information Processing Systems (pp. 5998-6008).
Welbl, J., Stenetorp, P., & Riedel, S. (2018). Constructing datasets for multi-hop reading comprehension across documents. Transactions of the Association for Computational Linguistics, 6, 287-302.
Williams, A., Nangia, N., & Bowman, S. R. (2017). A broad-coverage challenge corpus for sentence understanding through inference. arXiv preprint arXiv:1704.05426.
Yang, Z., Qi, P., Zhang, S., Bengio, Y., Cohen, W. W., Salakhutdinov, R., & Manning, C. D. (2018). HotpotQA: A dataset for diverse, explainable multi-hop question answering. arXiv preprint arXiv:1809.09600.
Zellers, R., Bisk, Y., Schwartz, R., & Choi, Y. (2018). SWAG: A large-scale adversarial dataset for grounded commonsense inference. arXiv preprint arXiv:1808.05326.
Zhang, S., Liu, X., Liu, J., Gao, J., Duh, K., & Van Durme, B. (2018). ReCoRD: Bridging the gap between human and machine commonsense reading comprehension. arXiv preprint arXiv:1810.12885.
Thesis use permissions
  • The author agrees to grant in-library readers a royalty-free license to reproduce the print copy for academic purposes, open to the public from 2020-07-01.
  • The author agrees to authorize the browse/print electronic full-text service, open to the public from 2020-07-01.

