System ID | U0002-3006202015341100 |
---|---|
DOI | 10.6846/TKU.2020.00901 |
Title (Chinese) | 基於BERT語言模型回答是非題之研究 |
Title (English) | Answering Yes/No Questions by the BERT Language Model |
Title (third language) | |
University | 淡江大學 (Tamkang University) |
Department (Chinese) | 大數據分析與商業智慧碩士學位學程 |
Department (English) | Master's Program in Big Data Analytics and Business Intelligence |
Foreign degree school | |
Foreign degree college | |
Foreign degree institute | |
Academic year | 108 |
Semester | 2 |
Publication year | 109 |
Author (Chinese) | 吳進益 |
Author (English) | Chin-Yi Wu |
Student ID | 607890083 |
Degree | Master's |
Language | English |
Second language | |
Defense date | 2020-06-04 |
Pages | 69 |
Oral defense committee | Advisor: 魏世杰; Member: 戴敏育; Member: 古倫維 |
Keywords (Chinese) | 是非題, 自然語言處理, BERT, 深度學習 |
Keywords (English) | Yes/no questions, NLP, BERT, Deep Learning |
Keywords (third language) | |
Subject classification | |
Abstract (Chinese) | 隨著圖形處理單元(GPU)技術的發展,深度學習在機器學習和自然語言處理(NLP)任務中得到了廣泛應用。最近,NLP領域出現大量有關問答(QA)任務採用深度學習的文獻。這些文獻大部分集中在英語的簡短回答和對話任務上。本文將以回答中文的是/否問題當作研究主題,動機是對此主題採取深度學習作法的文獻較少。本文將基於公開的中文資料集,利用預訓練的中文BERT(Bidirectional Encoder Representations from Transformers)語言模型進行微調和評估。結果顯示,在本文擴展的五份資料集中,本文採用的二分類及三分類模型,皆可在十次交叉驗證下得到良好的準確率。 |
Abstract (English) | As Graphics Processing Unit (GPU) technology advances, there has been a boom in the use of deep learning for Machine Learning and Natural Language Processing (NLP) tasks. Recently, NLP has seen a considerable amount of literature on Question Answering (QA) using deep learning approaches. Most of these works concentrate on short-answer and dialogue tasks in English. In this paper, answering yes/no questions in Chinese with a deep learning approach is our topic of research, as it is less studied in the literature. Based on a public Chinese QA dataset, which we expand into five datasets, a pre-trained Chinese BERT (Bidirectional Encoder Representations from Transformers) language model is fine-tuned and evaluated on both a two-class and a three-class task. The results show that on all five expanded datasets, our two-class and three-class models obtain good accuracy under 10-fold cross validation. |
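The abstract describes fine-tuning a Chinese BERT sentence-pair classifier and scoring it with 10-fold cross validation. A minimal stdlib-only sketch of that pipeline's data flow is shown below: the `[CLS] question [SEP] passage [SEP]` pair encoding BERT consumes (characters stand in for WordPiece tokens, since Chinese BERT tokenizes per character), a k-fold split, and the per-fold accuracy loop. The majority-class dummy classifier is a hypothetical placeholder standing in for the thesis's actual fine-tuned BERT model, and all function names here are illustrative, not taken from the thesis.

```python
import random

def make_bert_input(question, passage, max_len=128):
    """Build the [CLS] question [SEP] passage [SEP] token sequence and
    segment ids that a BERT sentence-pair classifier consumes."""
    q = list(question)
    # Truncate the passage so the whole pair fits in max_len
    # (3 slots are reserved for [CLS] and the two [SEP] tokens).
    p = list(passage)[: max_len - len(q) - 3]
    tokens = ["[CLS]"] + q + ["[SEP]"] + p + ["[SEP]"]
    segment_ids = [0] * (len(q) + 2) + [1] * (len(p) + 1)
    return tokens, segment_ids

def k_fold_indices(n, k=10, seed=0):
    """Shuffle n sample indices and deal them into k folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def cross_validate(labels, classes=("yes", "no"), k=10):
    """10-fold CV skeleton: each fold is held out once as the test set.
    A majority-class dummy replaces the fine-tuned BERT classifier;
    returns the mean test-fold accuracy."""
    folds = k_fold_indices(len(labels), k)
    accs = []
    for i in range(k):
        test = folds[i]
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        majority = max(classes, key=lambda c: sum(labels[j] == c for j in train))
        accs.append(sum(labels[j] == majority for j in test) / len(test))
    return sum(accs) / len(accs)
```

For the three-class task described in the abstract, `classes` would become `("yes", "no", "depends")`; the pair encoding and the fold loop are unchanged, only the classification head's output size differs.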
Abstract (third language) | |
Table of contents |
I. INTRODUCTION 1
1.1 Background and Motivation 1
1.2 Research Purpose 1
1.3 Overview of the Paper 1
II. RELATED WORKS 3
2.1 Past Works on Yes/No Questions 3
2.2 Artificial Intelligence 4
2.2.1 Deep Learning 6
2.2.2 Artificial Neural Networks 6
2.3 The BERT Language Model 7
III. METHODOLOGY 11
3.1 Research Methodology 11
3.2 System Architecture 13
3.3 Corpus 13
3.4 Preprocessing 14
3.5 BERT Fine-Tuning 16
3.6 Composite Word Attention Weights 17
3.7 Evaluation Metrics 19
IV. EXPERIMENT 22
4.1 The Environments 22
4.2 The Datasets 22
4.2.1 The Consistent Two-Class Dataset 24
4.2.2 The Expanded Two-Class Dataset 25
4.2.3 The Fact and Opinion Datasets of the Expanded Two-Class Dataset 26
4.2.4 The Expanded Three-Class Dataset 28
4.3 Experimental Setup and Results 31
4.3.1 The Result of the Consistent Two-Class Dataset 32
4.3.2 The Result of the Expanded Two-Class Dataset 35
4.3.3 The Result of the Fact and Opinion Subsets of the Expanded Two-Class Dataset 38
4.3.3.1 The Result of the Fact Dataset 38
4.3.3.2 The Result of the Opinion Dataset 41
4.3.3.3 The Comparison between the Fact and Opinion Datasets 43
4.3.4 The Result of the Expanded Three-Class Dataset 44
4.4 Discussion 46
V. CONCLUSIONS AND FUTURE WORK 56
5.1 Findings of the Study 56
5.2 Limitations of the Study 56
5.3 Research Contribution 56
5.4 Future Work 57
References 58
Appendix 61

List of Figures
Figure 1. The relationship between AI, ML, DL, and NLP 6
Figure 2. Pre-training model architectures of BERT, ELMo, and OpenAI GPT 7
Figure 3. Illustrations of fine-tuning BERT for different downstream tasks 10
Figure 4. Research development procedure 11
Figure 5. System development research process 12
Figure 6. System architecture 13
Figure 7. Preprocessing 14
Figure 8. An example of structure adjustment and encoding conversion 15
Figure 9. Another example of structure adjustment and encoding conversion 16
Figure 10. The fine-tuned BERT model for answering a yes/no question about a passage 17
Figure 11. A sample attention diagram for one aspect of attention on the input tokens 18
Figure 12. A sample composite attention diagram for all aspects of attention 19
Figure 13. The percentages of the DuReader 2.0 zhidao and search sets 23
Figure 14. The percentages of the consistent two-class train and dev sets 25
Figure 15. The percentages of the expanded two-class train and dev sets 26
Figure 16. The percentages of the fact train and dev sets 27
Figure 17. The percentages of the opinion train and dev sets 28
Figure 18. The percentages of the expanded three-class train and dev sets 29
Figure 19. Composite word attention diagram for yes predicted as yes 47
Figure 20. Composite word attention diagram for yes predicted as no 48
Figure 21. Composite word attention diagram for yes predicted as depends 49
Figure 22. Composite word attention diagram for no predicted as no 50
Figure 23. Composite word attention diagram for no predicted as yes 51
Figure 24. Composite word attention diagram for no predicted as depends 52
Figure 25. Composite word attention diagram for depends predicted as depends 53
Figure 26. Composite word attention diagram for depends predicted as yes 54
Figure 27. Composite word attention diagram for depends predicted as no 55
Figure 28. Test set results on BoolQ 57
Figure 29. Attention diagram for the case of yes predicted as yes (9L_11H) 61
Figure 30. Attention diagram for the case of yes predicted as no (9L_11H) 62
Figure 31. Attention diagram for the case of yes predicted as depends (9L_1H) 63
Figure 32. Attention diagram for the case of no predicted as no (9L_11H) 64
Figure 33. Attention diagram for the case of no predicted as yes (8L_6H) 65
Figure 34. Attention diagram for the case of no predicted as depends (8L_10H) 66
Figure 35. Attention diagram for the case of depends predicted as depends (9L_10H) 67
Figure 36. Attention diagram for the case of depends predicted as yes (9L_11H) 68
Figure 37. Attention diagram for the case of depends predicted as no (9L_12H) 69

List of Tables
Table 1. The definition of four categories of Artificial Intelligence 5
Table 2. The confusion matrix of two-class prediction results 20
Table 3. The confusion matrix of three-class prediction results 21
Table 4. Dataset profile 22
Table 5. The profile of the expanded three-class dataset 24
Table 6. Consistent two-class dataset profile 25
Table 7. Expanded two-class dataset profile 26
Table 8. Fact dataset profile 27
Table 9. Opinion dataset profile 28
Table 10. Expanded three-class dataset profile 29
Table 11. Dataset summary 31
Table 12. The hyperparameters for model training 32
Table 13. The prediction result for the dev set of the consistent dataset 33
Table 14. The prediction result for the train set of the consistent dataset 34
Table 15. The 10-fold C.V. result for the consistent dataset with combined sets 34
Table 16. Summary of the result for the consistent two-class dataset 35
Table 17. The prediction result for the dev set of the expanded dataset 36
Table 18. The prediction result for the train set of the expanded dataset 36
Table 19. The 10-fold C.V. result for the expanded two-class dataset with combined sets 37
Table 20. Summary of the result for the expanded two-class dataset 38
Table 21. The prediction result for the dev set of the expanded Fact dataset 39
Table 22. The prediction result for the train set of the expanded Fact dataset 39
Table 23. The 10-fold cross validation result for the Fact dataset with combined sets 40
Table 24. The prediction result for the dev set of the expanded Opinion dataset 41
Table 25. The prediction result for the train set of the expanded Opinion dataset 42
Table 26. The 10-fold cross validation result for the Opinion dataset with combined sets 42
Table 27. Summary for the Fact dataset 43
Table 28. Summary for the Opinion dataset 44
Table 29. The prediction result of the fold 1 test set of the expanded three-class dataset 45
Table 30. The result for the 10 folds of the expanded three-class dataset with combined sets 45
Table 31. Summary for the expanded three-class dataset 46
Table 32. The illustration of each sample 46
Table 33. The case of correct prediction for Yes 47
Table 34. The case of wrong prediction of No for Yes 48
Table 35. The case of wrong prediction of Depends for Yes 49
Table 36. The case of correct prediction for No 50
Table 37. The case of wrong prediction of Yes for No 51
Table 38. The case of wrong prediction of Depends for No 52
Table 39. The case of correct prediction for Depends 53
Table 40. The case of wrong prediction of Yes for Depends 54
Table 41. The case of wrong prediction of No for Depends 55 |
References |
Bentivogli, L., Dagan, I., & Magnini, B. (2017). The Recognizing Textual Entailment challenges: Datasets and methodologies. In N. Ide & J. Pustejovsky (Eds.), Handbook of Linguistic Annotation. Springer, Dordrecht.
Choi, E., He, H., Iyyer, M., Yatskar, M., Yih, W. T., Choi, Y., ... & Zettlemoyer, L. (2018). QuAC: Question answering in context. arXiv preprint arXiv:1808.07036.
Clark, C., Lee, K., Chang, M. W., Kwiatkowski, T., Collins, M., & Toutanova, K. (2019). BoolQ: Exploring the surprising difficulty of natural yes/no questions. arXiv preprint arXiv:1905.10044.
Dai, A. M., & Le, Q. V. (2015). Semi-supervised sequence learning. In Advances in Neural Information Processing Systems (pp. 3079-3087).
Devlin, J., Chang, M. W., Lee, K., & Toutanova, K. (2018). BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805.
Fukushima, K. (1980). Neocognitron: A self-organizing neural network model for a mechanism of pattern recognition unaffected by shift in position. Biological Cybernetics, 36(4), 193-202.
Harabagiu, S., & Hickl, A. (2006, July). Methods for using textual entailment in open-domain question answering. In Proceedings of the 21st International Conference on Computational Linguistics and the 44th Annual Meeting of the Association for Computational Linguistics (pp. 905-912). Association for Computational Linguistics.
He, W., Liu, K., Liu, J., Lyu, Y., Zhao, S., Xiao, X., ... & Liu, X. (2017). DuReader: A Chinese machine reading comprehension dataset from real-world applications. arXiv preprint arXiv:1711.05073.
Hochreiter, S., & Schmidhuber, J. (1997). Long short-term memory. Neural Computation, 9(8), 1735-1780.
Joshi, M., Choi, E., Weld, D. S., & Zettlemoyer, L. (2017). TriviaQA: A large scale distantly supervised challenge dataset for reading comprehension. arXiv preprint arXiv:1705.03551.
LeCun, Y., Bengio, Y., & Hinton, G. (2015). Deep learning. Nature, 521(7553), 436-444.
Malakasiotis, P., & Androutsopoulos, I. (2007, June). Learning textual entailment using SVMs and string similarity measures. In Proceedings of the ACL-PASCAL Workshop on Textual Entailment and Paraphrasing (pp. 42-47).
McCulloch, W. S., & Pitts, W. (1943). A logical calculus of the ideas immanent in nervous activity. The Bulletin of Mathematical Biophysics, 5(4), 115-133.
Mihaylov, T., Clark, P., Khot, T., & Sabharwal, A. (2018). Can a suit of armor conduct electricity? A new dataset for open book question answering. arXiv preprint arXiv:1809.02789.
Nguyen, T., Rosenberg, M., Song, X., Gao, J., Tiwary, S., Majumder, R., & Deng, L. (2016). MS MARCO: A human-generated machine reading comprehension dataset. arXiv preprint arXiv:1611.09268.
Nunamaker Jr., J. F., Chen, M., & Purdin, T. D. (1990). Systems development in information systems research. Journal of Management Information Systems, 7(3), 89-106.
Pearson, K. (1904). Mathematical contributions to the theory of evolution. XII. On a generalised theory of alternative inheritance, with special reference to Mendel's laws. Philosophical Transactions of the Royal Society of London, Series A, 203(359-371), 53-86.
Provost, F., & Kohavi, R. (1998). On applied research in machine learning. Machine Learning, 30, 127-132.
Quinlan, J. R. (1986). Induction of decision trees. Machine Learning, 1(1), 81-106.
Rajpurkar, P., Jia, R., & Liang, P. (2018). Know what you don't know: Unanswerable questions for SQuAD. arXiv preprint arXiv:1806.03822.
Reddy, S., Chen, D., & Manning, C. D. (2019). CoQA: A conversational question answering challenge. Transactions of the Association for Computational Linguistics, 7, 249-266.
Rumelhart, D. E., Hinton, G. E., & Williams, R. J. (1986). Learning representations by back-propagating errors. Nature, 323(6088), 533-536.
Russell, S. J., & Norvig, P. (2016). Artificial Intelligence: A Modern Approach. Pearson Education Limited.
Sondak, N. E., & Sondak, V. K. (1989, February). Neural networks and artificial intelligence. In Proceedings of the Twentieth SIGCSE Technical Symposium on Computer Science Education (pp. 241-245).
Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., & Salakhutdinov, R. (2014). Dropout: A simple way to prevent neural networks from overfitting. The Journal of Machine Learning Research, 15(1), 1929-1958.
Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, Ł., & Polosukhin, I. (2017). Attention is all you need. In Advances in Neural Information Processing Systems (pp. 5998-6008).
Welbl, J., Stenetorp, P., & Riedel, S. (2018). Constructing datasets for multi-hop reading comprehension across documents. Transactions of the Association for Computational Linguistics, 6, 287-302.
Williams, A., Nangia, N., & Bowman, S. R. (2017). A broad-coverage challenge corpus for sentence understanding through inference. arXiv preprint arXiv:1704.05426.
Yang, Z., Qi, P., Zhang, S., Bengio, Y., Cohen, W. W., Salakhutdinov, R., & Manning, C. D. (2018). HotpotQA: A dataset for diverse, explainable multi-hop question answering. arXiv preprint arXiv:1809.09600.
Zellers, R., Bisk, Y., Schwartz, R., & Choi, Y. (2018). SWAG: A large-scale adversarial dataset for grounded commonsense inference. arXiv preprint arXiv:1808.05326.
Zhang, S., Liu, X., Liu, J., Gao, J., Duh, K., & Van Durme, B. (2018). ReCoRD: Bridging the gap between human and machine commonsense reading comprehension. arXiv preprint arXiv:1810.12885. |
Full-text access rights | |