§ Thesis Bibliographic Record
System ID U0002-1009202310590300
DOI 10.6846/tku202300653
Title (Chinese) 基於知識圖譜、圖神經網路與ChatGPT設計與實作非結構性文件之QA問答系統
Title (English) Design and Implementation of a Question-Answering System for Unstructured Documents Based on Knowledge Graph, Graph Neural Networks and ChatGPT
Institution Tamkang University (淡江大學)
Department Department of Computer Science and Information Engineering, Master's Program
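Academic year 111 (ROC calendar; 2022-2023)
Semester 2
Publication year 112 (ROC calendar; 2023)
Author Jia-Yu Yang (楊家宇)
Student ID 610414012
Degree Master's
Language Traditional Chinese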
Oral defense date 2023-06-12
Pages 40
Oral defense committee Member - 張志勇 (cychang@mail.tku.edu.tw)
Advisor - 黃仁俊 (victor@gms.tku.edu.tw)
Member - 廖文華 (whliao@ntub.edu.tw)
Member - 蒯思齊 (sckuai@ntub.edu.tw)
Keywords Unstructured Documents
Question-Answering System
Artificial Intelligence
ChatGPT
Knowledge Graph
Graph Neural Networks
Natural Language Processing
Abstract
With the explosive growth of data and the fragmentation of information, extracting valuable information from unstructured documents has become an increasingly urgent challenge. Traditional question-answering (QA) systems are often limited to structured data and perform poorly on unstructured documents, because they struggle to understand the context of the text and to handle complex natural language questions. Developing QA systems for unstructured documents has therefore been a challenging yet highly valuable research direction. The rise of ChatGPT and the development of knowledge graph techniques have opened new directions for QA system development. ChatGPT can help a system understand context well and answer user questions more naturally. Knowledge graphs can give a system a broader knowledge base so that it can answer specialized questions in various domains. Combining the two makes QA systems for unstructured documents more promising and better able to address increasingly complex information challenges. To help the system answer questions about the knowledge contained in unstructured documents, this thesis uses natural language processing techniques to analyze the content of articles (unstructured documents), identify the entities and relationships in them, and build a knowledge graph of the articles. A graph neural network is then trained so that the system can find more relevant elements from a user's question, thereby improving its information retrieval. Finally, ChatGPT's language generation capability is used to realize an efficient and flexible QA system for unstructured documents.
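The retrieval idea described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis implementation: the triples, the function names (`mentioned_entities`, `retrieve_facts`, `build_prompt`), and the prompt layout are all hypothetical. A plain neighbor lookup over the triples stands in for the trained graph neural network, and the final call to ChatGPT is omitted, so the sketch stops at assembling the text that would be sent to the language model.

```python
# Hypothetical (head, relation, tail) triples. In the thesis these would come
# from entity/relation extraction over the documents, not a hand-written list.
TRIPLES = [
    ("spring fair", "held_at", "student center"),
    ("spring fair", "organized_by", "student association"),
    ("student association", "contact", "room 101"),
]


def mentioned_entities(question, triples):
    """Return the entities whose names occur verbatim in the question."""
    entities = {e for h, _, t in triples for e in (h, t)}
    q = question.lower()
    return {e for e in entities if e.lower() in q}


def retrieve_facts(question, triples, hops=1):
    """Collect triples within `hops` edges of any entity named in the question.

    Assumption: simple neighbor expansion replaces the GNN-based step that
    the abstract describes for finding additional relevant elements.
    """
    frontier = mentioned_entities(question, triples)
    seen = set(frontier)
    facts = []
    for _ in range(hops):
        next_frontier = set()
        for h, r, t in triples:
            if (h in frontier or t in frontier) and (h, r, t) not in facts:
                facts.append((h, r, t))
                next_frontier.update({h, t} - seen)
        seen |= next_frontier
        frontier = next_frontier
    return facts


def build_prompt(question, facts):
    """Format retrieved facts and the question as context for a language model."""
    lines = [f"{h} --{r}--> {t}" for h, r, t in facts]
    return "Known facts:\n" + "\n".join(lines) + f"\n\nQuestion: {question}"


if __name__ == "__main__":
    question = "Where is the spring fair held?"
    print(build_prompt(question, retrieve_facts(question, TRIPLES, hops=2)))
```

Raising `hops` widens the context handed to the language model at the cost of pulling in less relevant facts; a learned model would instead rank which neighbors are worth including.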
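Table of Contents
Contents
List of Figures
List of Tables
Chapter 1. Introduction
Chapter 2. Related Work
2-1 Retrieval-Based Question-Answering Systems
2-2 Generative Question-Answering Systems
Chapter 3. Background
3-1 Knowledge Graphs
3-2 Graph Neural Networks
3-3 ChatGPT
Chapter 4. System Architecture
4-1 Environment and Problem Description
4-1-1 Problem to Be Solved
4-1-2 Objectives
4-2 System Architecture
4-2-1 Preprocessing
A. Data Collection
B. Data Preprocessing
C. Entities and Relations + Question Generation
D. Data Tables
4-2-2 Knowledge Graph Construction
E. Building the Knowledge Graph with Neo4j
F. Knowledge Graph Queries
4-2-3 Model Construction
4-2-4 ChatGPT Integration
Chapter 5. Experimental Analysis
5-1 Environment Setup
5-2 Experimental Data
5-3 Experimental Results
Chapter 6. Conclusion
References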
 
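List of Figures
Figure 1. Research Objectives
Figure 2. Research Framework
Figure 3. Data Preprocessing
Figure 4. Unstructured Data Processing Flow
Figure 5. Finding Entities and Relations with ChatGPT
Figure 6. Generating Questions and Answers with ChatGPT
Figure 7. Knowledge Graph Construction
Figure 8. Entity and Relation Tables
Figure 9. Neo4j Knowledge Graph
Figure 10. Knowledge Graph Query
Figure 11. Graph Neural Network Model Architecture
Figure 12. Link Prediction
Figure 13. Confusion Matrix for the Q&A Set
Figure 14. Confusion Matrix for the Activity Articles
Figure 15. Correctness Scores
Figure 16. Completeness Scores
Figure 17. Readability Scores
Figure 18. F1-Score on the Q&A Set across Data Sizes and Models
Figure 19. F1-Score on the Activity Story Set across Data Sizes and Models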

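List of Tables
Table 1. Comparison of Related Work
Table 2. Model Training Environment
Table 3. Experimental Parameters
Table 4. Example Results of the QA Model in Open Testing
Table 5. Confusion Matrix Example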
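Full-Text Access Rights
National Central Library
Royalty-free licensing to the National Central Library not granted
On campus
Print thesis available on campus immediately
Electronic full text: authorization not granted
On-campus bibliographic record embargoed until 2024-09-12, including the Chinese and English abstracts
Off campus
Licensing to database vendors not granted
Off-campus bibliographic record embargoed until 2024-09-12, including the Chinese and English abstracts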
