§ Thesis Bibliographic Record
  
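System ID U0002-2409202007541500
DOI 10.6846/TKU.2020.00713
Title (Chinese) 基於深度學習之Q&A機器人自動生成問題技術
Title (English) Question Generation Technology based on Deep Learning for Q&A Robots
Title (third language)
Institution Tamkang University (淡江大學)
Department (Chinese) 資訊工程學系碩士班
Department (English) Department of Computer Science and Information Engineering
Foreign degree university
Foreign degree college
Foreign degree institute
Academic year 108 (ROC calendar)
Semester 2
Year of publication 109 (ROC calendar; 2020)
Author (Chinese) 陳俊廷
Author (English) Jun-Ting Chen
Student ID 607410221
Degree Master's
Language Traditional Chinese
Second language English
Oral defense date 2020-07-10
Pages 45
Defense committee Advisor - 張世豪
Member - 游國忠
Member - 張世豪
Member - 張志勇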
Keywords (Chinese) 深度學習
GPT-2
序列比對
問題生成
Keywords (English) Deep Learning
GPT-2
Sequence Alignment
Question Generation
Keywords (third language)
Subject classification
Abstract
In recent years, question-answering (Q&A) robots have developed rapidly, and many products and platforms now use them to answer users' questions automatically. Q&A robots fall into two types: utility service robots, which respond to customers' questions with appropriate answers, and chat robots, which engage customers with entertaining replies. Answering a user's question correctly is not easy, because people phrase the same question in many different ways and the robot may fail to understand what is being asked. A Q&A robot therefore needs to recognize diverse phrasings, which means its training question set must contain varied questions worded the way people actually ask them. In practice, however, the question set is built by employees who write questions by hand and map them to answers. The number of questions they can produce is limited, the wording may not match how ordinary users ask, and questions that are too similar to one another add little to training, so the robot's answers can fail to match users' questions. Existing Q&A robots thus face two major problems: manually written question data is not diverse enough, and automatically generated question sets contain sentences that are not complete enough.
Given the maturity of deep learning techniques in recent years, this thesis designs and implements a deep-learning-based system that intelligently generates question sets in order to address these two problems.
The proposed "Design and Implementation of Deep-Learning-Based Automatic Question Generation Technology for Q&A Robots" consists of three parts: collecting diverse questions, generating questions automatically, and refining the training of the Q&A robot. First, because people phrase questions in many ways, the thesis uses a web crawler to collect diversely phrased questions. Second, to generate questions that are both varied and worded the way people ask, the thesis analyzes the crawled questions and classifies their content into "people, things, time, places, objects". This classification makes it possible to extract similar questions and to substitute the content of the user's question into them. Because substitution can leave a question incomplete, the thesis uses GPT-2 to complete it into a well-formed question sentence. The system thereby helps employees extend the question set automatically and intelligently, reducing personnel cost and workload, improving question diversity and the accuracy of the Q&A robot, and letting enterprises manage operating costs more efficiently.
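The extract-and-substitute step described in the abstract can be illustrated with a minimal Python sketch. This is not the thesis's implementation: the toy `LEXICON` stands in for named-entity recognition, whitespace splitting on English text stands in for Jieba segmentation of Chinese, and the standard library's `difflib.SequenceMatcher` stands in for the sequence alignment method. The function names, the sample questions, and the `threshold` value are all hypothetical. The GPT-2 completion step is omitted.

```python
from difflib import SequenceMatcher

# Hypothetical toy lexicon mapping words to the five categories named in
# the abstract. A real system would use named-entity recognition instead.
LEXICON = {
    "alice": "PERSON", "bob": "PERSON",
    "meeting": "THING", "concert": "THING",
    "tomorrow": "TIME", "tonight": "TIME",
    "taipei": "PLACE", "tamsui": "PLACE",
    "ticket": "OBJECT", "laptop": "OBJECT",
}
CATEGORIES = set(LEXICON.values())


def to_template(question):
    """Replace known entities with category placeholders.

    Returns (template_tokens, slots), where slots maps each category
    found in the question to the original word.
    """
    template, slots = [], {}
    for token in question.lower().rstrip("?").split():
        category = LEXICON.get(token)
        if category is None:
            template.append(token)
        else:
            template.append(category)
            slots[category] = token
    return template, slots


def similarity(a, b):
    """Similarity of two token lists in [0, 1] (difflib ratio)."""
    return SequenceMatcher(None, a, b).ratio()


def generate_variants(seed, crawled, threshold=0.5):
    """Fill the seed's entities into sufficiently similar crawled questions."""
    seed_template, seed_slots = to_template(seed)
    variants = []
    for candidate in crawled:
        cand_template, _ = to_template(candidate)
        needed = {t for t in cand_template if t in CATEGORIES}
        # Skip candidates that need an entity category the seed lacks.
        if not needed.issubset(seed_slots):
            continue
        if similarity(seed_template, cand_template) < threshold:
            continue
        filled = [seed_slots.get(t, t) for t in cand_template]
        variants.append(" ".join(filled) + "?")
    return variants


SEED = "When is the concert in Taipei?"
CRAWLED = [
    "What time is the meeting in Tamsui?",
    "Where can Bob buy a laptop?",
    "Is the meeting held in Tamsui tonight?",
    "Is the meeting in Tamsui cancelled?",
]

if __name__ == "__main__":
    for variant in generate_variants(SEED, CRAWLED):
        print(variant)
```

Comparing placeholder templates rather than raw sentences means two questions about different entities can still count as similar. Candidates that would need an entity the seed question does not supply are skipped, which is one simple way to limit the incomplete sentences that the thesis instead repairs with GPT-2.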
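Abstract (third language)
Table of contents
Contents
List of Figures
List of Tables
Chapter 1. Introduction
Chapter 2. Related Work
Chapter 3. Background
3.1 Jieba
3.2 Sequence Alignment
3.3 GPT-2 (Generative Pre-Training)
Chapter 4. System Architecture
4.1 Environment and Problem Description
4.2 System Architecture
Chapter 5. System Implementation
Chapter 6. Experimental Analysis
Chapter 7. Conclusion
References
Appendix: English Paper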
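List of Figures
Figure 1: Background knowledge and component technologies
Figure 2: Using GPT-2 to predict the next word of an input sentence
Figure 3: The Transformer model
Figure 4: Design and implementation of deep-learning-based automatic question generation technology for Q&A robots
Figure 5: System architecture
Figure 6: Architecture of the training-data stage
Figure 7: Illustration of data on diverse question phrasings
Figure 8: Project-name computation stage
Figure 9: Preprocessing the training-set data
Figure 10: Tagging the important words in a sentence
Figure 11: Named-entity recognition on sentences
Figure 12: Sequence alignment
Figure 13: Repairing sentence completeness
Figure 14: Refining the training of the Q&A model
Figure 15: Named-entity recognition
Figure 16: Entering a sentence for which similar sentences are to be identified
Figure 17: Substituting important words into similar questions
Figure 18: GPT-2 model repairing incomplete sentences
Figure 19: Precision of similar sentence patterns across different named-entity categories
Figure 20: Precision/Recall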
List of Tables
Table 1: Feature comparison of related work
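Full-text access permissions
On campus
Print copy available on campus immediately
Author consents to on-campus release of the electronic full text
Electronic copy available on campus immediately
Off campus
Authorization granted
Electronic copy available off campus immediately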
