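§ Thesis Bibliographic Record
System ID U0002-0408202014341400
DOI 10.6846/TKU.2020.00086
Title (Chinese) 基於微調BERT模型的增強式中文問答系統
Title (English) An Enhanced Chinese Question Answering System Based on a Fine-Tuned BERT Model
Institution Tamkang University (淡江大學)
Department (Chinese) 電機工程學系碩士班
Department (English) Department of Electrical and Computer Engineering
Academic year 108 (ROC calendar)
Semester 2
Publication year 109 (ROC calendar, 2020 CE)
Author (Chinese) 李柏誼
Author (English) Bo-Yi Li
Student ID 607450177
Degree Master's
Language Traditional Chinese
Oral defense date 2020-07-11
Pages 71
Oral defense committee Advisor - 衛信文
Member - 李維聰
Member - 朱國志
Keywords (Chinese) 中文問答系統
Keywords (English) BERT
Chinese Question Answering System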
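The current mainstream architecture for natural language processing is the BERT (Bidirectional Encoder Representations from Transformers) language model released by Google in 2018. Google used two-stage training and modified the input and output representations so that the BERT model can handle most natural language tasks, and it achieved the best results on 11 tasks at the time. Since then, how to improve the performance of the BERT architecture has been a major focus of natural language research.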
Natural language processing (NLP) is an important field of computer science and artificial intelligence. As hardware computing power has improved, deep learning networks can process more data than ever before, allowing computers to process and analyze large amounts of human language data and improving their ability to handle natural language. Common applications in NLP include speech recognition, automatic summarization, machine translation, natural language generation, and sentiment analysis. Question answering (QA) is one of the most important NLP tasks because it involves semantic understanding, semantic inference, and related capabilities. A well-trained question answering model can be applied in many fields, which greatly helps the development of natural language processing.
The current mainstream architecture for natural language processing is BERT (Bidirectional Encoder Representations from Transformers), released by Google in 2018. Google used transfer learning and modified the input and output representations so that the BERT model can handle most natural language tasks, and it achieved the best results on 11 tasks at that time. Since then, improving the performance of BERT has been a major focus of natural language research.
However, the BERT model is difficult to analyze because of its size: the standard version, with 12 encoder layers, has about 110 million parameters. Previous research found that BERT's encoder layers can be divided into three stages (shallow, middle, and deep), each of which contributes differently to a task. The shallow layers are mainly responsible for encoding surface features, the middle layers for syntactic features, and the deep layers for semantic features. Building on that research, we found that the shallow and deep layers emphasize different things and that similar work is carried out in different encoding processes. Motivated by these properties, this thesis proposes a method to improve the training performance of the BERT model for Chinese.
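The figure of roughly 110 million parameters can be checked with simple arithmetic. The sketch below is illustrative only: it assumes the commonly published BERT-base configuration (hidden size 768, feed-forward size 3072, 512 positions, 2 segment types, a 30,522-token English vocabulary, and a pooler layer), none of which is stated in this record. The function name `bert_param_count` is hypothetical.

```python
def bert_param_count(vocab_size=30522, hidden=768, layers=12,
                     ffn=3072, max_positions=512, type_vocab=2):
    """Count weights and biases of a BERT-style encoder (assumed layout)."""
    layer_norm = 2 * hidden  # scale and shift vectors

    # Token, position, and segment embeddings, followed by one LayerNorm.
    embeddings = (vocab_size + max_positions + type_vocab) * hidden + layer_norm

    # Self-attention: query, key, value, and output projections plus LayerNorm.
    attention = 4 * (hidden * hidden + hidden) + layer_norm

    # Feed-forward block: expand to ffn, project back, plus LayerNorm.
    feed_forward = (hidden * ffn + ffn) + (ffn * hidden + hidden) + layer_norm

    # Pooler: one dense layer applied to the first token's representation.
    pooler = hidden * hidden + hidden

    return embeddings + layers * (attention + feed_forward) + pooler


total = bert_param_count()
print(f"{total:,}")  # 109,482,240
```

Under these assumptions the 12 encoder layers account for about 85 million of the parameters and the embeddings for most of the rest. A Chinese model with a smaller vocabulary (for example an assumed 21,128 tokens) would have fewer embedding parameters, so its total would be somewhat lower.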
This thesis fine-tunes two BERT models on two different data sets and then combines them to achieve better results on the question answering task. The first step of the proposed method is to fine-tune a base BERT model on each of two data sets: the DRCD-master data set and the MSRA data set. The model fine-tuned on DRCD-master is called the DRCD-BERT model and serves as the backbone of the question answering model. The model fine-tuned on the MSRA data set, called the MSRA-BERT model, can better determine the syntactic properties of each word. Then, provided that the question answering model is not adversely affected, some encoder layers from the MSRA-BERT model are used to replace some encoder layers in the DRCD-BERT model to achieve better performance on the Chinese question answering task.
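The layer replacement step can be sketched as an operation on two parameter dictionaries. This is a minimal illustration, not the thesis's implementation. It assumes that parameters are keyed in the style `...encoder.layer.<index>.<name>` (the naming used by Hugging Face Transformers BERT checkpoints), that a layer is replaced by the donor layer with the same index, and that layer 1 is the one swapped. The abstract does not say which layers are exchanged, and the function name `swap_encoder_layers` is hypothetical.

```python
import re
from typing import Dict, Iterable, Mapping, TypeVar

T = TypeVar("T")

# Assumed key pattern: "<prefix>encoder.layer.<index>.<parameter name>"
_LAYER_KEY = re.compile(r"encoder\.layer\.(\d+)\.")


def swap_encoder_layers(base_state: Mapping[str, T],
                        donor_state: Mapping[str, T],
                        layers: Iterable[int]) -> Dict[str, T]:
    """Copy base_state, taking the chosen encoder layers from donor_state."""
    chosen = set(layers)
    merged = dict(base_state)  # shallow copy; the inputs are not modified
    for key in base_state:
        match = _LAYER_KEY.search(key)
        if match and int(match.group(1)) in chosen:
            if key not in donor_state:
                raise KeyError(f"donor model has no parameter {key!r}")
            merged[key] = donor_state[key]
    return merged


# Toy stand-ins for the two fine-tuned models (strings instead of tensors).
drcd_bert = {
    "bert.encoder.layer.0.output.dense.weight": "drcd-L0",
    "bert.encoder.layer.1.output.dense.weight": "drcd-L1",
    "qa_outputs.weight": "drcd-head",
}
msra_bert = {
    "bert.encoder.layer.0.output.dense.weight": "msra-L0",
    "bert.encoder.layer.1.output.dense.weight": "msra-L1",
    "classifier.weight": "msra-head",
}

# Layer 1 now comes from msra_bert; layer 0 and the QA head are unchanged.
merged = swap_encoder_layers(drcd_bert, msra_bert, layers=[1])
print(merged)
```

Because only keys matching the encoder-layer pattern are copied, the question answering output head of the base model is left untouched. With real PyTorch models, the same function could be applied to the dictionaries returned by `state_dict()` and the result loaded back with `load_state_dict()`.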
The experimental results show that the proposed method can successfully train a sequence tagging model and a question answering model and combine them by exchanging specific encoder layers. The method does not require more computing power and can improve the capability of the fine-tuned model with limited hardware resources.
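Acknowledgements
Chinese Abstract
English Abstract
Table of Contents
List of Figures
List of Tables
Chapter 1 Introduction
1.1	Preface
1.2	Motivation and Objectives
1.3	Thesis Organization
Chapter 2 Background and Related Work
2.1	Natural Language Processing
2.2	Encoder-Decoder
2.3	Transformer
2.3.1	Attention
2.3.2	Self-Attention
2.3.3	Scaled Dot-Product Attention
2.3.4	Multi-Head Attention
Chapter 3 An Enhanced Chinese Question Answering System Based on a Fine-Tuned BERT Model
3.1	BERT
3.1.1	Input Representation
3.1.2	Fine-Tuning
3.2	Prior Research on BERT
3.2.1	Relationship Between Tasks and Transformer Layers
3.2.2	BERT's Information Pipeline in Question Answering
3.3	Layer Replacement Method
3.4	Flowchart
Chapter 4 Training and Results
4.1	Data Sets
4.2	Experimental Environment
4.3	Experimental Results
Chapter 5 Contributions and Future Work
5.1	Main Contributions
5.2	Future Work
References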

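Figure 2.1	Sequence-to-sequence model
Figure 2.2	UPNN sentiment analysis model [11]
Figure 2.3	Text-CNN text classification model
Figure 2.4	PEGASUS summarization model
Figure 2.5	Question answering language model
Figure 2.6	Encoder-Decoder model
Figure 2.7	Neural network mechanism model
Figure 2.8	Attention computation diagram [15]
Figure 2.9	Attention mechanism visualization
Figure 2.10	Attention computation
Figure 2.11	Multi-head attention mechanism
Figure 3.1	BERT input representation
Figure 3.2	Single-sentence classification fine-tuning
Figure 3.3	Sentence-pair classification fine-tuning
Figure 3.4	Single-sentence tagging fine-tuning
Figure 3.5	NER task data set
Figure 3.6	Question answering fine-tuning
Figure 3.7	Contribution ratio of each layer to specific tasks [19]
Figure 3.8	Analysis of each encoder layer
Figure 3.9	Layer 2 label distribution
Figure 3.10	Layer 5 label distribution
Figure 3.11	Layer 7 label distribution
Figure 3.12	Layer 11 label distribution
Figure 3.13	Replacement method diagram
Figure 3.14	Flowchart
Figure 4.1	MSRA task data set
Figure 4.2	EM score line chart of the pre-trained BERT model after fine-tuning
Figure 4.3	F1 score line chart of the pre-trained BERT model after fine-tuning
Figure 4.4	Attention visualization of the pre-trained model
Figure 4.5	Attention visualization at 18,000 iterations
Figure 4.6	Attention visualization at 108,000 iterations
Figure 4.7	Token labeling plot of the original model
Figure 4.8	Attention map of the original pre-trained model (full)
Figure 4.9	Token labeling plot at 18,000 iterations
Figure 4.10	Attention map at 18,000 iterations (full)
Figure 4.11	Token labeling plot at 108,000 iterations
Figure 4.12	Attention map at 108,000 iterations (full)
Figure 4.13	EM scores of each model
Figure 4.14	F1 scores of each model
Figure 4.15	EM score chart
Figure 4.16	F1 score chart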

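Table 3.1	Classification task data set
Table 3.2	Sentence-pair task data set
Table 3.3	SQuAD question answering data set
Table 4.1	DRCD question count distribution
Table 4.2	DRCD data set contents
Table 4.3	MSRA label distribution
Table 4.4	MSRA scores
Table 4.5	Hardware specifications
Table 4.6	Parameter settings 1
Table 4.7	Questions used for attention visualization
Table 4.8	Parameter settings 2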
[1] Ian H. Witten, Eibe Frank, Mark A. Hall, Christopher J. Pal. “Data Mining”. Elsevier Inc., 2017.
[2] Alex Krizhevsky, Ilya Sutskever, Geoffrey E. Hinton. “ImageNet Classification with Deep Convolutional Neural Networks”. NIPS, 2012.
[3] Jeffrey L. Elman. “Finding Structure in Time”. Cognitive Science, Volume 14, Issue 2, pp. 179-211 (April-June 1990).
[4] Zhilin Yang, Zihang Dai, Yiming Yang, Jaime Carbonell, Ruslan Salakhutdinov, Quoc V. Le. “XLNet: Generalized Autoregressive Pretraining for Language Understanding”. (2019, Jun 19).
[5] Zhenzhong Lan, Mingda Chen, Sebastian Goodman, Kevin Gimpel, Piyush Sharma, Radu Soricut. “ALBERT: A Lite BERT for Self-supervised Learning of Language Representations”. (2019, Sep 26). Published as a conference paper at ICLR 2020.
[6] Elizabeth D. Liddy. “Natural language processing”. Encyclopedia of Library and Information Science, 2nd edn., Marcel Dekker, 2001.
[7] Sepp Hochreiter, Jürgen Schmidhuber. “Long Short-Term Memory”. Neural Computation 9(8):1735-1780, 1997.
[8] Volodymyr Mnih, Nicolas Heess, Alex Graves, Koray Kavukcuoglu. “Recurrent Models of Visual Attention”. NIPS. (2014, Jun 24).
[9] Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, Illia Polosukhin. “Attention Is All You Need”. NIPS. (2017, Jun 12).
[10] Doug Laney. “3-D Data Management: Controlling Data Volume, Velocity and Variety”. META Group Inc., 2001.
[11] Duyu Tang, Bing Qin, Ting Liu. “Learning Semantic Representations of Users and Products for Document Level Sentiment Classification”. Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing.
[12] Yoon Kim. “Convolutional Neural Networks for Sentence Classification”. Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing.
[13] Ross Girshick, Jeff Donahue, Trevor Darrell, Jitendra Malik. “Rich feature hierarchies for accurate object detection and semantic segmentation”. (2013, Nov 11).
[14] Jingqing Zhang, Yao Zhao, Mohammad Saleh, Peter J. Liu. “PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization”. (2019, Dec 18).
[15] Minh-Thang Luong, Hieu Pham, Christopher D. Manning. “Effective Approaches to Attention-based Neural Machine Translation”. (2015, Sep 20).
[16] Jacob Devlin, Ming-Wei Chang, Kenton Lee, Kristina Toutanova. “BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding”. (2018, Oct 11). arXiv preprint arXiv:1810.04805.
[17] Pranav Rajpurkar, Jian Zhang, Konstantin Lopyrev, Percy Liang. “SQuAD: 100,000+ Questions for Machine Comprehension of Text”. (2016, Jun 16).
[18] Matthew E. Peters, Mark Neumann, Mohit Iyyer, Matt Gardner, Christopher Clark, Kenton Lee, Luke Zettlemoyer. “Deep contextualized word representations”. (2018, Feb 15).
[19] Ian Tenney, Dipanjan Das, Ellie Pavlick. “BERT Rediscovers the Classical NLP Pipeline”. (2019, Aug 9).
[20] Betty van Aken, Benjamin Winter, Alexander Löser, Felix A. Gers. “How Does BERT Answer Questions? A Layer-Wise Analysis of Transformer Representations”. (2019, Sep 11).
[21] Wold S, Esbensen K, Geladi P. “Principal component analysis”. Chemom. Intell. Lab. Syst., 2 (1987), pp. 37-52.
[22] Chih Chieh Shao, Trois Liu, Yuting Lai, Yiying Tseng, Sam Tsai. “DRCD: a Chinese Machine Reading Comprehension Dataset”. (2018).
[23] Jesse Vig. “A Multiscale Visualization of Attention in the Transformer Model”. (2019, Jun 12).
[24] Yiming Cui, Wanxiang Che, Ting Liu, Bing Qin, Ziqing Yang, Shijin Wang, Guoping Hu. “Pre-Training with Whole Word Masking for Chinese BERT”. (2019, Jun 19).
