§ Thesis Bibliographic Record
  
System ID U0002-0309202011590200
DOI 10.6846/TKU.2020.00069
Thesis title (Chinese) 設計與實作基於深度學習技術之QA機器人
Thesis title (English) Design and implementation of QA robots based on deep learning techniques
Thesis title (third language)
University Tamkang University
Department (Chinese) 資訊工程學系全英語碩士班
Department (English) Master's Program, Department of Computer Science and Information Engineering (English-taught program)
Foreign degree university
Foreign degree college
Foreign degree institute
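Academic year 108 (ROC calendar; 2019-2020)
Semester 2
Year of publication 109 (ROC calendar; 2020)
Author (Chinese) 單以瑨
Author (English) Yi-Ching Shan
Student ID 607780029
Degree Master's
Language English
Second language
Oral defense date 2020-07-10
Number of pages 52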
Oral defense committee Advisor - 黃連進 (micro@mail.tku.edu.tw)
Member - 張志勇 (cychang@mail.tku.edu.tw)
Member - 游國忠 (133742@mail.tku.edu.tw)
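Keywords (Chinese) 人工智慧 (artificial intelligence)
QA機器人 (QA robot)
BERT
深度學習 (deep learning)
語意比對 (semantic matching)
關鍵字提取 (keyword extraction)
Keywords (English) Artificial Intelligence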
FAQ
BERT
Deep Learning
Semantic Comparison
Keyword Extraction
Keywords (third language)
Subject classification
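Abstract (Chinese)
When people encounter something they do not understand, they usually either ask someone who knows the subject or search online. In specialized fields, where the material is deeper and more detailed, many questions are inevitably needed to close the knowledge gap. Question-answering (QA) robots were created to meet this need and to answer questions more efficiently.
Taking questions and answers about campus affairs at Tamkang University as an example, this thesis designs an AI model for a QA robot. The goal is to use AI to improve the efficiency of campus QA, automate it, and reduce personnel costs. A user enters a campus-related question, and the system automatically outputs the answer.
The four main groups on campus (students, parents, professors, and staff) usually have campus matters they want to learn about, and the usual channel is to telephone the individual campus offices for the relevant information. Because the campus has many offices, a user may not know which office to ask, which makes resolving questions inefficient. Moreover, if a question is too complex and the office staff are not experienced enough, the question goes unanswered. The offices, for their part, spend a great deal of time replying to users' questions, and this time cost directly affects the school's operations and each office's productivity. The school therefore needs to reduce the time cost effectively while giving users accurate answers.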
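Traditional QA robots run into several difficulties. The traditional approach segments the input into words and sentences, extracts keywords, and outputs the most likely answer. It can only recognize keywords already present in the existing data and cannot interpret new words. More recent deep learning approaches classify first and then compare word vectors, but they still fail to understand the real meaning of a question and to cope with the variety of ways users phrase questions.
Building on the recent maturity of AI technology and breakthroughs in natural language processing, this thesis uses three deep learning neural networks to address two long-standing problems of QA systems: (1) traditional approaches often match the user's question against the questions in the dataset and output the answer directly, without considering that the question may belong to a different domain category, which produces wrong answers; (2) only the words in the user's question that are most likely to be keywords are extracted for comparison.
To address these two critical problems, this thesis implements a campus QA system with several layers of detailed processing. The system has a training phase and a usage phase: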
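In the training phase, the existing campus question-and-answer set is first preprocessed. Once an imbalance among the question categories is confirmed, extended questions are generated to supplement the categories with fewer examples, making the data more balanced. With the data balanced, the campus data is used as input to train a BERT classifier that can identify the scope of a user's question; the classifier addresses problem (1). Next, a BERT keyword tagger is trained to find the user's core intent. Finally, a BERT semantic model is trained to compare the user's question with the most likely questions in the campus QA set. The semantic model addresses problem (2): because it takes the whole question as input, it removes the risk of reading words out of context.
In the usage phase, BERT classification and BERT keyword tagging analyze the question to narrow its scope. Cosine similarity, FuzzyWuzzy matching, and the BERT semantic model then help identify the user's intent and find the answer. By providing additional candidate questions, the system can also output related questions the user may want to ask next, completing campus QA in an automated, intelligent way.
In the experiments, real user questions are used to verify the system's ability to respond, and the results are compared with the traditional approach to show the system's strengths and weaknesses.
Besides improving how the school manages its time costs, the system also helps train new office staff, so the school can handle campus QA work more efficiently.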
Abstract (English)
Whenever people encounter something they don't understand, their first thought is usually to ask someone who knows about it or to search the Internet. Especially when they encounter something deep and intricate in a professional field, they unavoidably need to ask many questions to close the knowledge gap. QA robots are designed to solve this problem and to answer questions more efficiently.
This thesis takes QA on campus affairs at Tamkang University as an example and designs an AI QA robot. It aims to use AI technology to improve the efficiency of campus QA, automate it, and reduce personnel costs. Users input campus-related questions, and the system automatically outputs the answers.
There are four main groups on campus: students, parents, professors, and staff. They usually have things they want to know about the campus, and the usual channel is to phone individual campus offices to get the relevant information. However, there are so many offices on campus that users may not know which one to ask, which makes problem solving inefficient. Moreover, if a question is too complex and the staff are not experienced enough, the user's question will not be answered. The offices also spend a lot of time answering users' questions. Such a large time cost directly affects the operation of the school and the efficiency of every office. The school therefore needs to reduce the time cost effectively and give users accurate answers.
Traditional QA robots often encounter difficulties. The traditional approach uses word segmentation, extracts keywords, and then outputs the most probable answer. This method can only recognize keywords that already exist in the data, not new words. Modern deep learning approaches classify first and then compare word vectors, but they are still unable to understand the true meaning of a question or to handle the diverse ways users phrase questions.
Recently, artificial intelligence technology has matured and natural language processing has seen breakthroughs. Based on these technologies, this thesis uses three deep learning neural networks to solve two long-standing problems in QA systems: (1) traditionally, answers are output directly by comparing the user's question only with the dataset questions, without considering that the question may belong to a different domain category, which results in incorrect answers; (2) only the words in the user's question that are most likely to be keywords are extracted for comparison.
To address these two critical problems, this thesis implements a campus QA system through multi-level detailed processing. The system is divided into a training period and a usage period:
During the training period, the existing campus question-and-answer sets are preprocessed. After an imbalance among question categories is identified, extended questions are generated to supplement the categories with less data and make the data more balanced. Once the data is balanced, the campus data is used as input to train a BERT classifier to distinguish the scope of users' questions. The classifier solves problem (1) above. After that, we train a BERT keyword extraction model to find the user's core intent. Finally, we train a BERT semantic model to compare the user's question with the most likely questions in the campus question-and-answer set. The semantic model handles problem (2) above: because it takes the entire question as input, it avoids taking the user's question out of context.
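The abstract does not say how the extended questions are generated. As one possible illustration, the minimal Python sketch below assumes synonym substitution from a small thesaurus and pads each under-represented category toward the size of the largest one. The thesaurus, the sample questions, and the helper names `extend_question` and `balance` are all hypothetical and are not taken from the thesis.

```python
# Hypothetical thesaurus: each key may be swapped for any listed synonym.
THESAURUS = {
    "pay": ["settle"],
    "tuition": ["fees", "school fees"],
}


def extend_question(question, thesaurus):
    """Return variants of `question`, each with one word replaced by a synonym."""
    words = question.split()
    variants = []
    for i, word in enumerate(words):
        for synonym in thesaurus.get(word.lower(), []):
            variants.append(" ".join(words[:i] + [synonym] + words[i + 1:]))
    return variants


def balance(dataset, thesaurus):
    """Pad every category toward the size of the largest category.

    A category stays below the target if the thesaurus cannot
    produce enough distinct variants of its questions.
    """
    target = max(len(questions) for questions in dataset.values())
    balanced = {}
    for category, questions in dataset.items():
        pool = list(questions)
        seen = set(pool)
        for question in questions:
            for variant in extend_question(question, thesaurus):
                if len(pool) >= target:
                    break
                if variant not in seen:
                    seen.add(variant)
                    pool.append(variant)
        balanced[category] = pool
    return balanced


# Hypothetical, deliberately imbalanced sample data (3 questions vs. 1).
dataset = {
    "library": [
        "When does the library open",
        "How do I borrow a book",
        "Where is the library",
    ],
    "finance": ["How do I pay tuition"],
}
balanced = balance(dataset, THESAURUS)
for category, questions in balanced.items():
    print(category, len(questions))
```

Generating variants only from the original questions keeps the sketch simple; a real system would also need to check that each variant still means the same thing.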
During the usage period, BERT classification and BERT keyword extraction analyze the user's question to narrow down its scope. Cosine similarity, FuzzyWuzzy comparison, and the BERT semantic model then help discover the user's intent and find the answer. By providing additional candidate questions, the system can also output related questions the user may want to ask next, thus completing campus QA in an automated and intelligent way.
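The usage-period ranking step can be sketched in plain Python under some loud assumptions: the category is taken as given (standing in for the BERT classifier), cosine similarity is computed over bag-of-words counts rather than BERT embeddings, and `difflib.SequenceMatcher` from the standard library stands in for FuzzyWuzzy. The FAQ entries, the equal weighting, and the function names are hypothetical and are not the thesis's implementation.

```python
import math
from collections import Counter
from difflib import SequenceMatcher

# Hypothetical FAQ entries, grouped by the category a classifier would predict.
FAQ = {
    "library": [
        ("What are the library opening hours?", "8:00 to 22:00 on weekdays."),
        ("How do I renew a borrowed book?", "Renew it online or at the desk."),
    ],
    "finance": [
        ("How do I pay tuition?", "Pay at the bank or online."),
    ],
}


def cosine(a, b):
    """Cosine similarity between bag-of-words count vectors of two strings."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[token] * vb[token] for token in va)
    norm_a = math.sqrt(sum(count * count for count in va.values()))
    norm_b = math.sqrt(sum(count * count for count in vb.values()))
    norm = norm_a * norm_b
    return dot / norm if norm else 0.0


def fuzzy(a, b):
    """Character-level similarity in [0, 1]; a stand-in for FuzzyWuzzy."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def rank(user_question, category, faq, weight=0.5):
    """Score every FAQ question in `category`, best match first.

    Each result is a (score, question, answer) tuple, where the score
    mixes the two similarities with an assumed weighting.
    """
    scored = [
        (weight * cosine(user_question, question)
         + (1 - weight) * fuzzy(user_question, question),
         question,
         answer)
        for question, answer in faq[category]
    ]
    return sorted(scored, reverse=True)


ranked = rank("when is the library open", "library", FAQ)
best_score, best_question, best_answer = ranked[0]
print(best_question, "->", best_answer)
```

The entries after the first in `ranked` play the role of the additional candidate questions: they can be shown to the user as related questions.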
In the experimental phase, real user questions are used to confirm the responsiveness of the system, and the results are compared with traditional methods to show the system's advantages and disadvantages.
In addition to improving the management of the school's time costs, this system also helps train new staff, so that the school can handle campus QA work more effectively.
Abstract (third language)
Table of contents
TABLE OF CONTENTS
LIST OF FIGURES
LIST OF TABLES
1. INTRODUCTION
2. RELATED WORKS
3. BACKGROUND KNOWLEDGE
3.1 BERT technology
3.2 Cosine similarity
3.3 Chat Bot
4. SYSTEM STRUCTURE
4.1 Environment and problem description
4.2 System Architecture
5. SYSTEM DISPLAY
6. EXPERIMENT ANALYSIS
7. CONCLUSION
REFERENCES


LIST OF FIGURES
Figure 1: The main structure of this system
Figure 2: Classification of single sentences and classification of each word in a sentence
Figure 3: Users may ask questions about things they want to know
Figure 4: The question is not known to which organ it is addressed and whether it belongs to that administrative organ or not
Figure 5: Design and implementation of a QA robot based on deep learning technology
Figure 6: System architecture diagram
Figure 7: Training material architecture diagram
Figure 8: Campus data set
Figure 9: Use of Thesaurus
Figure 10: Data processing architecture
Figure 11: Keyword Labeling Module
Figure 12: Extended Problem Manufacturing System
Figure 13: Problem identification module
Figure 14: Design and implementation of a deep learning neural network for identifying types of campus problems via BERT
Figure 15: Design and implementation of a deep learning neural network extracted by BERT keywords
Figure 16: Does the BERT classification system complement the BERT keyword system?
Figure 17: Candidate problem module
Figure 18: Design and implementation of a semantics model comparing a user problem with a real problem
Figure 19: User interface
Figure 20: Campus Data Collection
Figure 21: Administrative system with the highest probability of output for BERT problem class identification model
Figure 22: Keyword obtained through a keyword module
Figure 23: Probability of a real problem being calculated by comparison at the end of the problem
Figure 24: Listed questions and answers
Figure 25: Training BERT Classified Neural Network
Figure 26: Prediction of most likely administrative categories by BERT classification
Figure 27: Classification accuracy of three questions with different problem lengths
Figure 28: Problem test results
Figure 29: Precision and Recall

LIST OF TABLES
Table 1: Comparison of related research functions
Table 2: Main chat robot tasks
Table 3: A comparison of the results of the confusion matrix
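Full-text access permissions
On campus
Print copy of the thesis available immediately on campus
Electronic full text authorized for release on campus
Electronic thesis available immediately on campus
Off campus
Authorization granted
Electronic thesis available immediately off campus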
