System ID | U0002-1009202310590300 |
---|---|
DOI | 10.6846/tku202300653 |
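Thesis title (Chinese) | 基於知識圖譜、圖神經網路與ChatGPT設計與實作非結構性文件之QA問答系統 |
Thesis title (English) | Design and Implementation of a Question-Answering System for Unstructured Documents based on Knowledge Graph, Graph Neural Networks and ChatGPT |
Thesis title (third language) | |
University | Tamkang University (淡江大學) |
Department (Chinese) | 資訊工程學系碩士班 |
Department (English) | Department of Computer Science and Information Engineering |
Foreign degree school name | |
Foreign degree college name | |
Foreign degree institute name | |
Academic year | 111 (ROC calendar) |
Semester | 2 |
Publication year | 112 (ROC calendar) |
Graduate student (Chinese) | 楊家宇 |
Graduate student (English) | Jia-Yu Yang |
Student ID | 610414012 |
Degree | Master's |
Language | Traditional Chinese |
Second language | |
Oral defense date | 2023-06-12 |
Number of pages | 40 |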
Oral defense committee |
Committee member - 張志勇 (cychang@mail.tku.edu.tw)
Advisor - 黃仁俊 (victor@gms.tku.edu.tw)
Committee member - 廖文華 (whliao@ntub.edu.tw)
Committee member - 蒯思齊 (sckuai@ntub.edu.tw) |
Keywords (Chinese) |
非結構性文件; ChatGPT; 知識圖譜; 自然語言處理; 問答系統; 人工智慧; 圖神經網路 |
Keywords (English) |
Unstructured Documents; Question-Answering System; Artificial Intelligence; ChatGPT; Knowledge Graph; Graph Neural Networks; Natural Language Processing |
Keywords (third language) | |
Subject classification | |
Abstract (Chinese) |
隨著數據爆炸性增長和信息碎片化,處理非結構性文件以提取有價值信息的挑戰變得日益迫切。傳統的問答(QA)系統通常局限於結構化數據,而在處理非結構性文件數據時表現不佳。這是因為這些系統難以理解文本的語境,也難以應對複雜的自然語言問題。因此,開發非結構性文件問答系統一直是一個具有挑戰性但極具價值的研究方向。ChatGPT技術的盛行和知識圖譜技術的發展,讓問答系統的開發有了新的突破方向。ChatGPT可以幫助系統很好地理解語境,並能更自然地回答用戶的問題。知識圖譜則可以幫助系統擁有更廣泛的知識基礎,回答各個領域的專業問題。這兩種技術的結合使得處理非結構性文件的問答系統更具潛力,能夠應對日益複雜的信息挑戰。爲了更好地讓系統回答非結構性文件中内含的知識,本論文主要透過自然語言處理技術分析文章的内容,找出文章(非結構性文件)的實體和關係,並建立文章的知識圖譜,訓練圖神經網路讓系統可以從使用者的問題中找到更多的相關要素,以此完善系統的資訊檢索功能,最後運用ChatGPT的語言組織能力,實現一個高效且靈活的非結構性文件問答(QA)系統。 |
Abstract (English) |
With data growing explosively and information becoming increasingly fragmented, extracting valuable information from unstructured documents has become an urgent challenge. Traditional question-answering (QA) systems are often limited to structured data and perform poorly on unstructured documents, because they struggle to understand the context of the text and to handle complex natural-language questions. Developing QA systems for unstructured documents has therefore been a challenging yet highly valuable research direction. The rise of ChatGPT and advances in knowledge graph technology open new directions for QA system development. ChatGPT can help a system understand context and answer user questions more naturally, while knowledge graphs can give the system a broader knowledge base for answering specialized questions in various domains. Combining the two makes QA systems for unstructured documents more promising and better able to meet increasingly complex information challenges. To help the system answer questions about the knowledge contained in unstructured documents, this thesis uses natural language processing techniques to analyze article content, identify the entities and relations in each article (unstructured document), and build a knowledge graph of the articles. A graph neural network is then trained so that the system can find more relevant elements from a user's question, which improves its information retrieval. Finally, ChatGPT's ability to organize language is used to realize an efficient and flexible QA system for unstructured documents. |
Abstract (third language) | |
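Table of contents |
Table of Contents
List of Figures
List of Tables
Chapter 1. Introduction
Chapter 2. Related Work
2-1 Retrieval-Based Question-Answering Systems
2-2 Generative Question-Answering Systems
Chapter 3. Background
3-1 Knowledge Graphs
3-2 Graph Neural Networks
3-3 ChatGPT
Chapter 4. System Architecture
4-1 Environment and Problem Description
4-1-1 Problem to Be Solved
4-1-2 Goals
4-2 System Architecture
4-2-1 Preprocessing
A. Data Collection
B. Data Preprocessing
C. Entities and Relations + Question Generation
D. Data Tables
4-2-2 Knowledge Graph Construction
E. Building the Knowledge Graph with Neo4j
F. Knowledge Graph Queries
4-2-3 Model Construction
4-2-4 ChatGPT Integration
Chapter 5. Experimental Analysis
5-1 Environment Setup
5-2 Experimental Data
5-3 Experimental Results
Chapter 6. Conclusion
References
List of Figures
Figure 1. Research Objectives
Figure 2. Research Framework
Figure 3. Data Preprocessing
Figure 4. Unstructured Data Processing Flow
Figure 5. Finding Entities and Relations with ChatGPT
Figure 6. Generating Questions and Answers with ChatGPT
Figure 7. Constructing the Knowledge Graph
Figure 8. Entity and Relation Tables
Figure 9. Neo4j Knowledge Graph
Figure 10. Knowledge Graph Query
Figure 11. Graph Neural Network Model Architecture
Figure 12. Link Prediction
Figure 13. Confusion Matrix for the Question-Answer Set
Figure 14. Confusion Matrix for the Activity Articles
Figure 15. Correctness Scores
Figure 16. Completeness Scores
Figure 17. Readability Scores
Figure 18. F1-Score on the Question-Answer Set for Different Data Sizes and Models
Figure 19. F1-Score on the Activity Story Set for Different Data Sizes and Models
List of Tables
Table 1. Comparison of Related Work
Table 2. Model Training Environment
Table 3. Experimental Parameters
Table 4. Example Results of the QA Model in Open Testing
Table 5. Confusion Matrix Example |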