§ Browse Thesis Bibliographic Record
  
Record ID	U0002-0408202014341400
DOI	10.6846/TKU.2020.00086
Title (Chinese)	基於微調BERT模型的增強式中文問答系統
Title (English)	An Enhanced Chinese Question Answering System Based on a Fine-Tuned BERT Model
Title (third language)
University	Tamkang University
Department (Chinese)	電機工程學系碩士班
Department (English)	Department of Electrical and Computer Engineering
Foreign degree school
Foreign degree college
Foreign degree institute
Academic year	108
Semester	2
Year of publication	109 (2020)
Author (Chinese)	李柏誼
Author (English)	Bo-Yi Li
Student ID	607450177
Degree	Master's
Language	Traditional Chinese
Second language
Date of oral defense	2020-07-11
Number of pages	71
Committee	Advisor - 衛信文
	Member - 李維聰
	Member - 朱國志
Keywords (Chinese)	中文問答系統
	自然語言處理
	BERT
Keywords (English)	BERT
	NLP
	Chinese Question Answering System
Keywords (third language)
Subject classification
Chinese Abstract
Natural language processing (NLP) is a major focus of computer science and artificial intelligence today. Thanks to improvements in hardware computing power, deep learning networks can now be trained on far larger amounts of data than before, allowing computers to process and analyze the vast quantities of language data produced by humans and making NLP methods and techniques much more mature. Common applications include speech recognition, text summarization, machine translation, natural language generation, and sentiment analysis. Among NLP tasks, question answering (QA) is the task type that carries the most textual and semantic information; a well-trained question answering model can therefore be applied in many fields and is of great help to the development of natural language processing.
The current mainstream architecture for natural language processing is BERT (Bidirectional Encoder Representations from Transformers), released by Google in 2018. Google adopted two-stage training and modified the input and output representations, so that a single BERT model can handle most natural language tasks; at the time of its release it achieved the best results on eleven tasks. Since then, improving the performance of the BERT architecture has been a major focus of natural language research.
Because the BERT architecture is large (the standard 12-layer encoder alone has 110 million parameters), analyzing the model is itself a major challenge. Previous studies have found that the encoder layers of BERT can be divided into three stages, shallow, middle, and deep, and that each stage contributes differently to a task: the shallow layers mainly encode surface features, the middle layers encode syntactic features, and the deep layers encode semantic features. These studies also show that the shallow and deep layers emphasize different aspects, while layers in different stages can perform similar work during encoding. Motivated by this interesting property, this thesis proposes a method, suited to the Chinese language, that improves the training effectiveness of the model.
This thesis fine-tunes two BERT models on different datasets and then combines them to achieve better results on the question answering task. The approach has two parts. First, two base BERT models are fine-tuned separately on two different types of datasets, the DRCD-master dataset and the MSRA dataset. The model fine-tuned on DRCD-master matches the main direction of the question answering task, while the model fine-tuned on the MSRA dataset is better at determining the part of speech of a word. Then, without disturbing the question answering model, selected encoder layers are exchanged between the two models so that the Chinese BERT model achieves better performance.
The experimental results show that the proposed method, which fine-tunes a sequence tagging task and a question answering task and then exchanges specific encoder layers, does not require higher hardware specifications than before and can improve the language ability of the model within limited hardware resources.
English Abstract
Natural language processing (NLP) is an important field of computer science and artificial intelligence. Due to the improvement of hardware computing power, deep learning networks can process more data than ever before, allowing computers to process and analyze large amounts of human language data and increasing their capability to handle natural language. Common applications in the natural language processing field include speech recognition, automatic summarization, machine translation, natural language generation, and sentiment analysis. The question answering (QA) task is one of the most important tasks in the field, since it involves semantic understanding, semantic inference, and more. If we can train a good question answering model, it can be applied to many fields, which is of great help to the development of natural language processing.
Currently, the mainstream architecture for natural language processing is BERT (Bidirectional Encoder Representations from Transformers), released by Google in 2018. Google used two-stage transfer learning (pre-training followed by fine-tuning) and modified the input and output representations so that the BERT model can handle most natural language tasks; at the time of its release it obtained the best results on 11 tasks. Since then, how to improve the performance of BERT has been a major focus of natural language research.
However, analyzing the BERT model is a difficult challenge because of its huge architecture: the standard 12-encoder-layer version already has 110 million parameters. One previous study found that the encoder layers of BERT can be divided into three stages: shallow, middle, and deep. Each stage contributes differently to a task: the shallow layers are mainly responsible for surface feature encoding, the middle layers for syntactic feature encoding, and the deep layers for semantic feature encoding. According to previous studies, the shallow and deep layers emphasize different aspects, and layers in different stages of the encoding process perform similar work. Based on these interesting properties, this thesis proposes a method to improve the training performance of the BERT model for Chinese.
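As a rough illustration of this layer-wise view, the following is a minimal sketch, assuming the HuggingFace transformers library and the public bert-base-chinese checkpoint (the abstract does not state which toolkit the thesis used), of how the hidden state of every encoder layer can be exposed and grouped into shallow, middle, and deep stages:

# Minimal sketch (not the author's code): inspect the 12 encoder layers of a
# standard Chinese BERT model, assuming the HuggingFace "transformers" library
# and the public "bert-base-chinese" checkpoint.
import torch
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")
model = BertModel.from_pretrained("bert-base-chinese", output_hidden_states=True)
model.eval()

# Total parameter count of this 12-layer checkpoint (roughly 100 million).
print(sum(p.numel() for p in model.parameters()))

inputs = tokenizer("淡江大學在哪裡?", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# outputs.hidden_states holds 13 tensors: the embedding output plus one output
# per encoder layer, which can be grouped into shallow (1-4), middle (5-8),
# and deep (9-12) stages for the kind of analysis described above.
for i, hidden in enumerate(outputs.hidden_states):
    print(i, hidden.shape)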
This thesis fine-tunes two BERT models on two kinds of datasets and then combines them to achieve better results on the question answering task. The first step of the proposed method is to fine-tune a base BERT model on each of two different types of datasets, the DRCD-master dataset and the MSRA dataset. The model fine-tuned on DRCD-master, called the DRCD-BERT model, is taken as the main stem of the question answering model. The model fine-tuned on the MSRA dataset, called the MSRA-BERT model, can better determine the part of speech of a word. Then, on the premise of not affecting the question answering model, selected encoder layers of the MSRA-BERT model replace the corresponding encoder layers of the DRCD-BERT model to achieve better performance on question answering in Chinese.
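The layer replacement itself can be pictured with a short sketch. This is not the author's code: it assumes the HuggingFace transformers library, hypothetical local checkpoint names drcd-bert and msra-bert for the two fine-tuned models, and illustrative layer indices rather than the ones actually selected in the thesis.

# Minimal sketch of swapping encoder layers between two fine-tuned BERT models.
# Assumptions: HuggingFace "transformers"; the fine-tuned checkpoints were saved
# locally as "drcd-bert" and "msra-bert" (hypothetical names); the swapped layer
# indices are illustrative only.
import copy
from transformers import BertForQuestionAnswering, BertForTokenClassification

# DRCD-BERT: the question answering model that serves as the main stem.
qa_model = BertForQuestionAnswering.from_pretrained("drcd-bert")
# MSRA-BERT: the sequence tagging model whose layers capture part-of-speech-like cues.
tag_model = BertForTokenClassification.from_pretrained("msra-bert")

# Replace selected encoder layers of DRCD-BERT with the corresponding layers
# of MSRA-BERT; both models share the same 12-layer BERT-base backbone,
# so the swapped layers are shape-compatible.
for i in [3, 4]:  # hypothetical choice of layers
    qa_model.bert.encoder.layer[i] = copy.deepcopy(tag_model.bert.encoder.layer[i])

# The combined model is then evaluated on the question answering data in the
# same way as the unmodified model (EM and F1 scores).

Because both fine-tuned models start from the same pre-trained backbone, the exchanged layers are dimensionally compatible, which is what makes this kind of combination possible without changing the architecture.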
The experimental results show that the proposed method, which fine-tunes a sequence tagging model and a question answering model and then exchanges specific encoder layers, does not require more computing power than before and can improve the processing ability of the fine-tuned model with limited hardware resources.
Third-Language Abstract
Table of Contents
Acknowledgements	I
Chinese Abstract	II
English Abstract	IV
Table of Contents	VI
List of Figures	VIII
List of Tables	XI
Chapter 1  Introduction	1
1.1	Overview	1
1.2	Motivation and Objectives	2
1.3	Thesis Organization	3
Chapter 2  Background and Related Work	4
2.1	Natural Language Processing	4
2.2	Encoder-Decoder	12
2.3	Transformer	14
2.3.1	Attention	15
2.3.2	Self-Attention	17
2.3.3	Scaled Dot-Product Attention	18
2.3.4	Multi-Head Attention	20
Chapter 3  An Enhanced Chinese Question Answering System Based on a Fine-Tuned BERT Model	21
3.1	BERT	21
3.1.1	Input Representation	23
3.1.2	Fine-Tuning	24
3.2	Prior Studies on BERT	32
3.2.1	Relation between Tasks and Transformer Layers	32
3.2.2	Information Flow of BERT in Question Answering Tasks	34
3.3	Layer Replacement Method	39
3.4	Flowchart	41
Chapter 4  Training and Results	43
4.1	Datasets	43
4.2	Experimental Environment	46
4.3	Experimental Results	47
Chapter 5  Contributions and Future Work	68
5.1	Main Contributions	68
5.2	Future Work	68
References	69

List of Figures
Figure 2.1	Sequence-to-sequence model	5
Figure 2.2	UPNN sentiment analysis model [11]	7
Figure 2.3	Text-CNN text classification model	9
Figure 2.4	PEGASUS summarization model	10
Figure 2.5	Question answering language model	12
Figure 2.6	Encoder-Decoder model	13
Figure 2.7	Neural network mechanism model	15
Figure 2.8	Attention computation diagram [15]	16
Figure 2.9	Attention mechanism visualization	18
Figure 2.10	Attention computation	19
Figure 2.11	Multi-head attention mechanism	20
Figure 3.1	BERT input representation	23
Figure 3.2	Single-sentence classification fine-tuning	25
Figure 3.3	Sentence-pair classification task fine-tuning	26
Figure 3.4	Single-sentence tagging task fine-tuning	28
Figure 3.5	NER task dataset	29
Figure 3.6	Question answering task fine-tuning	30
Figure 3.7	Contribution ratio of each layer to specific tasks [19]	33
Figure 3.8	Analysis of each encoder layer	34
Figure 3.9	Label distribution of Layer 2	36
Figure 3.10	Label distribution of Layer 5	37
Figure 3.11	Label distribution of Layer 7	38
Figure 3.12	Label distribution of Layer 11	39
Figure 3.13	Diagram of the layer replacement method	41
Figure 3.14	Flowchart	42
Figure 4.1	MSRA task dataset	45
Figure 4.2	EM score curve of the fine-tuned pre-trained BERT model	48
Figure 4.3	F1 score curve of the fine-tuned pre-trained BERT model	49
Figure 4.4	Attention visualization of the pre-trained model	51
Figure 4.5	Attention visualization at 18,000 iterations	52
Figure 4.6	Attention visualization at 108,000 iterations	53
Figure 4.7	Tagging diagram of the original model	55
Figure 4.8	Attention map of the original pre-trained model (full)	56
Figure 4.9	Tagging diagram at 18,000 iterations	58
Figure 4.10	Attention map at 18,000 iterations (full)	59
Figure 4.11	Tagging diagram at 108,000 iterations	61
Figure 4.12	Attention map at 108,000 iterations (full)	62
Figure 4.13	EM scores of each model	63
Figure 4.14	F1 scores of each model	64
Figure 4.15	EM score chart	66
Figure 4.16	F1 score chart	67

List of Tables
Table 3.1	Classification task dataset	26
Table 3.2	Sentence-pair task dataset	27
Table 3.3	SQuAD question answering dataset	31
Table 4.1	Distribution of DRCD question counts	44
Table 4.2	Contents of the DRCD dataset	44
Table 4.3	MSRA label distribution	45
Table 4.4	MSRA scores	46
Table 4.5	Hardware specifications	46
Table 4.6	Parameter settings 1	47
Table 4.7	Questions used for attention visualization	50
Table 4.8	Parameter settings 2	65
References
[1] Ian H. Witten, Eibe Frank, Mark A. Hall, Christopher J. Pal. "Data Mining". Elsevier, 2017.
[2] Alex Krizhevsky, Ilya Sutskever, Geoffrey E. Hinton. "ImageNet Classification with Deep Convolutional Neural Networks". NIPS, 2012.
[3] Jeffrey L. Elman. "Finding Structure in Time". Cognitive Science, Volume 14, Issue 2, Pages 179-211, 1990.
[4] Zhilin Yang, Zihang Dai, Yiming Yang, Jaime Carbonell, Ruslan Salakhutdinov, Quoc V. Le. "XLNet: Generalized Autoregressive Pretraining for Language Understanding". 2019.
[5] Zhenzhong Lan, Mingda Chen, Sebastian Goodman, Kevin Gimpel, Piyush Sharma, Radu Soricut. "ALBERT: A Lite BERT for Self-supervised Learning of Language Representations". ICLR, 2020.
[6] Elizabeth D. Liddy. "Natural Language Processing". Encyclopedia of Library and Information Science, 2nd edn., Marcel Dekker, 2001.
[7] Sepp Hochreiter, Jürgen Schmidhuber. "Long Short-Term Memory". Neural Computation, 9(8):1735-1780, 1997.
[8] Volodymyr Mnih, Nicolas Heess, Alex Graves, Koray Kavukcuoglu. "Recurrent Models of Visual Attention". NIPS, 2014.
[9] Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, Illia Polosukhin. "Attention Is All You Need". NIPS, 2017.
[10] Doug Laney. "3-D Data Management: Controlling Data Volume, Velocity and Variety". META Group Inc., 2001.
[11] Duyu Tang, Bing Qin, Ting Liu. "Learning Semantic Representations of Users and Products for Document Level Sentiment Classification". Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing, 2015.
[12] Yoon Kim. "Convolutional Neural Networks for Sentence Classification". Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing, 2014.
[13] Ross Girshick, Jeff Donahue, Trevor Darrell, Jitendra Malik. "Rich feature hierarchies for accurate object detection and semantic segmentation". 2013.
[14] Jingqing Zhang, Yao Zhao, Mohammad Saleh, Peter J. Liu. "PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization". 2019.
[15] Minh-Thang Luong, Hieu Pham, Christopher D. Manning. "Effective Approaches to Attention-based Neural Machine Translation". 2015.
[16] Jacob Devlin, Ming-Wei Chang, Kenton Lee, Kristina Toutanova. "BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding". arXiv preprint arXiv:1810.04805, 2018.
[17] Pranav Rajpurkar, Jian Zhang, Konstantin Lopyrev, Percy Liang. "SQuAD: 100,000+ Questions for Machine Comprehension of Text". 2016.
[18] Matthew E. Peters, Mark Neumann, Mohit Iyyer, Matt Gardner, Christopher Clark, Kenton Lee, Luke Zettlemoyer. "Deep contextualized word representations". 2018.
[19] Ian Tenney, Dipanjan Das, Ellie Pavlick. "BERT Rediscovers the Classical NLP Pipeline". 2019.
[20] Betty van Aken, Benjamin Winter, Alexander Löser, Felix A. Gers. "How Does BERT Answer Questions? A Layer-Wise Analysis of Transformer Representations". 2019.
[21] S. Wold, K. Esbensen, P. Geladi. "Principal component analysis". Chemometrics and Intelligent Laboratory Systems, 2, pp. 37-52, 1987.
[22] Chih Chieh Shao, Trois Liu, Yuting Lai, Yiying Tseng, Sam Tsai. "DRCD: a Chinese Machine Reading Comprehension Dataset". 2018.
[23] Jesse Vig. "A Multiscale Visualization of Attention in the Transformer Model". 2019.
[24] Yiming Cui, Wanxiang Che, Ting Liu, Bing Qin, Ziqing Yang, Shijin Wang, Guoping Hu. "Pre-Training with Whole Word Masking for Chinese BERT". 2019.
Full-Text Usage Permissions
On campus
Print copy available on campus immediately
Full-text electronic thesis authorized for public access on campus
On-campus electronic thesis available immediately
Off campus
Authorization granted
Off-campus electronic thesis available immediately
