||Design and implementation of QA robots based on deep learning techniques
||Master’s Program, Department of Computer Science and Information Engineering (English-taught program)
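Recently, with the maturing of AI-related technologies and breakthroughs in natural language processing, this thesis builds on the latest AI techniques and uses three deep learning neural networks to address two long-standing problems of QA systems: (1) traditional approaches often output an answer directly after matching the user's question against the questions in the data set, without considering that questions may belong to different domain categories, which produces wrong answers; (2) only the words in the user's question that are most likely to be keywords are extracted for matching.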
||When people encounter something they do not understand, their first thought is usually to ask someone who knows the subject or to search the Internet. For deeper, more specialized topics in a professional field, many questions are often needed to fill the knowledge gap. QA robots are designed to address this problem and answer questions more efficiently.
This thesis takes QA on campus affairs at Tamkang University as an example and designs an AI QA robot that aims to make campus QA more efficient, automate it, and reduce personnel costs. Users input campus-related questions, and the system automatically outputs the answers.
There are four main groups on campus: students, parents, professors, and staff. All of them have things they want to know about the campus, and the usual channel is to phone individual campus offices for the relevant information. However, the campus has so many offices that users may not know where to direct a specific question, which makes problem solving inefficient. Moreover, if a question is too complex or the staff member is not experienced enough, the question may go unanswered. Answering users' questions therefore takes a lot of time, and this time cost directly affects the operation of the school and the efficiency of work in every office. The school needs to reduce this time cost while still giving users accurate answers.
Traditional QA robots often run into difficulties. The traditional approach uses a word segmenter to extract keywords and then outputs the most probable answer. This method can only judge keywords that already exist in the data, not new words. Modern deep learning approaches classify the question first and then compare word vectors, but they still cannot grasp the true meaning of a question or handle the diverse ways in which users phrase it.
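The limitation of the traditional keyword approach described above can be illustrated with a minimal sketch. It is not the thesis implementation: the `FAQ` entries, the whitespace `tokenize` function (a stand-in for a real word segmenter), and the overlap score are all assumptions made for illustration.

```python
# Hypothetical mini FAQ; the entries are illustrative, not from the thesis data set.
FAQ = {
    "how do i apply for a scholarship": "Contact the student affairs office.",
    "where can i pay tuition fees": "Pay at the cashier section.",
}


def tokenize(text):
    """Lowercase and split on whitespace (a stand-in for a word segmenter)."""
    return set(text.lower().replace("?", "").split())


def keyword_match(user_question):
    """Return (answer, overlap) for the stored question sharing the most words.

    Returns (None, 0) when no word of the user question appears in the data.
    """
    user_words = tokenize(user_question)
    best_question, best_overlap = None, 0
    for question in FAQ:
        overlap = len(user_words & tokenize(question))
        if overlap > best_overlap:
            best_question, best_overlap = question, overlap
    answer = FAQ[best_question] if best_question else None
    return answer, best_overlap
```

A question phrased with stored words, such as "Where can I pay tuition fees?", finds its answer, whereas a question that uses only unseen words, such as "Bursary eligibility?", shares no keyword with the data and gets no answer even though it is close in meaning to the scholarship entry.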
Recently, owing to the maturity of AI-related technology and breakthroughs in natural language processing, this thesis builds on the latest AI technology and uses three deep learning neural networks to solve two long-standing problems in QA systems: (1) traditionally, an answer is output directly after comparing the user's question with the questions in the data set, without considering that questions may belong to different domain categories, which results in incorrect answers; (2) only the words in the user's question that are most likely to be keywords are extracted for comparison.
To address these two critical problems, this thesis implements a campus QA system through multi-level, detailed processing, divided into a training period and a usage period:
During the training period, the existing campus question-and-answer sets are preprocessed. Once an imbalance among the question categories is identified, extended questions are manufactured and added to the categories with fewer samples so that the data becomes more balanced. After the data is confirmed to be balanced, the campus data is used as input to train a BERT classifier that can distinguish the category of a user's question. This classifier solves problem (1) above. Next, a BERT keyword extraction model is trained to find the user's core intention. Finally, a BERT semantics model is trained to compare the user's question with the most likely questions in the campus question-and-answer set. The semantics model handles problem (2) above and avoids taking the user's question out of context.
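The balancing step in the training period can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not say how extended questions are manufactured, so single-word synonym substitution from a small thesaurus is assumed here, and the names `extend_question` and `balance` are hypothetical.

```python
from collections import defaultdict


def extend_question(question, thesaurus):
    """Yield variants of `question` made by swapping one word for a synonym."""
    words = question.split()
    for i, word in enumerate(words):
        for synonym in thesaurus.get(word, []):
            yield " ".join(words[:i] + [synonym] + words[i + 1:])


def balance(samples, thesaurus):
    """Add extended questions to the smaller categories.

    `samples` is a list of (question, label) pairs. Each category is grown
    until it matches the largest one or no new variants can be produced.
    """
    by_label = defaultdict(list)
    for question, label in samples:
        by_label[label].append(question)
    target = max(len(questions) for questions in by_label.values())

    balanced = list(samples)
    for label, questions in by_label.items():
        seen = set(questions)
        needed = target - len(questions)
        for question in questions:
            for variant in extend_question(question, thesaurus):
                if needed <= 0:
                    break
                if variant not in seen:
                    seen.add(variant)
                    balanced.append((variant, label))
                    needed -= 1
    return balanced
```

For example, with three questions labeled "finance", one question "how to borrow books" labeled "library", and a thesaurus mapping "borrow" to "loan" and "check out", the sketch adds two library variants so that both categories contain three questions before the classifier is trained.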
During the usage period, the user's question is analyzed by BERT classification and BERT keyword extraction to narrow down the scope. Cosine similarity, Fuzzy-Wuzzy comparison, and the BERT semantics model are then used to discover the user's intention and find the answer. By offering additional candidate questions, the system can also output related questions that the user may want to ask next, thus completing campus question answering in an automated and intelligent way.
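The candidate comparison in the usage period can be sketched with two of the scores named above. Several assumptions apply: bag-of-words counts stand in for the vectors the system would compare, `difflib.SequenceMatcher` from the Python standard library stands in for the Fuzzy-Wuzzy comparison, the BERT semantics model is omitted, and the equal weighting in the hypothetical `rank_candidates` function is an illustrative choice that the abstract does not specify.

```python
import math
from collections import Counter
from difflib import SequenceMatcher


def cosine_similarity(a, b):
    """Cosine of the angle between two bag-of-words count vectors."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[word] * vb[word] for word in va)
    norm_a = math.sqrt(sum(count * count for count in va.values()))
    norm_b = math.sqrt(sum(count * count for count in vb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def fuzzy_ratio(a, b):
    """Character-level similarity in [0, 1]; difflib stands in for Fuzzy-Wuzzy."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def rank_candidates(user_question, candidates, weight=0.5):
    """Sort candidate questions by a weighted mix of both scores, best first."""
    scored = [
        (
            weight * cosine_similarity(user_question, candidate)
            + (1 - weight) * fuzzy_ratio(user_question, candidate),
            candidate,
        )
        for candidate in candidates
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [candidate for _, candidate in scored]
```

Returning the whole ranked list rather than only the best match mirrors the idea of candidate questions: the entries after the first can be shown as related questions the user may want to ask next.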
In the experimental phase, users' real questions are used to verify the responsiveness of the system, and a comparison with traditional methods makes the system's advantages and disadvantages easy to see.
In addition to reducing the school's time costs, the system also helps train new staff, so that the school can handle campus QA work more effectively.
||TABLE OF CONTENTS
LIST OF FIGURES
LIST OF TABLES
1. INTRODUCTION
2. RELATED WORKS
3. BACKGROUND KNOWLEDGE
3.1 BERT technology
3.2 Cosine similarity
3.3 Chat Bot
4. SYSTEM STRUCTURE
4.1 Environment and problem description
4.2 System Architecture
5. SYSTEM DISPLAY
6. EXPERIMENT ANALYSIS
7. CONCLUSION
LIST OF FIGURES
Figure 1: The main structure of this system
Figure 2: Classification of single sentences and classification of each word in a sentence
Figure 3: Users may ask questions about things they want to know
Figure 4: The question is not known to which organ it is addressed and whether it belongs to that administrative organ or not
Figure 5: Design and implementation of a QA robot based on deep learning technology
Figure 6: System architecture diagram
Figure 7: Training material architecture diagram
Figure 8: Campus data set
Figure 9: Use of Thesaurus
Figure 10: Data processing architecture
Figure 11: Keyword Labeling Module
Figure 12: Extended Problem Manufacturing System
Figure 13: Problem identification module
Figure 14: Design and implementation of a deep learning neural network for identifying types of campus problems via BERT
Figure 15: Design and implementation of a deep learning neural network extracted by BERT keywords
Figure 16: Does the BERT classification system complement the BERT keyword system?
Figure 17: Candidate problem module
Figure 18: Design and implementation of a semantics model comparing a user problem with a real problem
Figure 19: User interface
Figure 20: Campus Data Collection
Figure 21: Administrative system with the highest probability of output for BERT problem class identification model
Figure 22: Keyword obtained through a keyword module
Figure 23: Probability of a real problem being calculated by comparison at the end of the problem
Figure 24: Listed questions and answers
Figure 25: Training BERT Classified Neural Network
Figure 26: Prediction of most likely administrative categories by BERT classification
Figure 27: Classification accuracy of three questions with different problem lengths
Figure 28: Problem test results
Figure 29: Precision and Recall
LIST OF TABLES
Table 1: Comparison of related research functions
Table 2: Main chat robot tasks
Table 3: A comparison of the results of the confusion matrix