Electronic Theses and Dissertations Service

§ Browse Thesis Bibliographic Record

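The electronic full text of this thesis is available for off-campus access from 2019-08-01.
The print copy of this thesis is publicly available from 2019-08-01.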

System ID	U0002-0108201909111000
DOI	10.6846/TKU.2019.00029
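Title (Chinese)	文章摘要的方法比較：以台灣網路新聞資料為例
Title (English)	Comparisons of Text Summarization Methods: A Case Study of Online News in Taiwan
Title (third language)
Institution	Tamkang University (淡江大學)
Department (Chinese)	大數據分析與商業智慧碩士學位學程
Department (English)	Master's Program in Big Data Analytics and Business Intelligence
Foreign degree: university name
Foreign degree: college name
Foreign degree: graduate institute name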
Academic year	107 (ROC calendar)
Semester	2
Year of publication	108 (ROC calendar; 2019)
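Author (Chinese)	莊浩偉
Author (English)	Hao-Wei Chuang
Student ID	606890068
Degree	Master's
Language	Traditional Chinese
Second language
Oral defense date	2019-07-01
Number of pages	35
Oral defense committee	Advisor - 陳景祥; Co-advisor - 張雅梅; Committee member - 陳麗菁; Committee member - 吳牧恩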
Keywords (Chinese)	TextRank; LexRank; 詞向量; 長短期記憶神經網路
Keywords (English)	TextRank; LexRank; Word to Vector; Deep Learning
Keywords (third language)
Subject classification
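Abstract (Chinese, translated)	This study compares four automatic text summarization methods, drawn from graph-based and deep learning approaches: TextRank, LexRank, Word to Vector + TextRank, and Bi-LSTM. Summaries are generated for online news articles and compared using two summary evaluation metrics. The analysis finds that, when no large or high-quality dataset is available for building a model, graph-based methods are an effective way to produce summaries, and the summaries they produce are better than those built by the deep learning network model. Among the graph-based methods, the classic TextRank and TextRank combined with Word to Vector each win on some comparisons and lose on others, while LexRank has lower evaluation scores on average overall but still yields summaries that are semantically complete and more readable. Summaries produced by deep learning contain words that repeat continually and may not be the target words.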
Abstract (English)	This study compares four automatic text summarization methods, based either on graph theory or on deep learning: TextRank, LexRank, Word to Vector + TextRank, and Bi-LSTM. Each method generates summaries of online news articles, and the results are compared using two evaluation metrics. The analysis shows that, when a large or high-quality dataset is not available for building a model, graph-based methods are an effective way to produce summaries, and the summaries they produce are better than those produced by the deep learning network model. Among the graph-based methods, the classic TextRank and TextRank combined with Word to Vector perform roughly the same, while LexRank scores lower on average on the evaluation metrics but still yields semantically complete and more readable summaries. The summaries generated by deep learning contain words that are repeated and may not be the intended terms.
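As a rough illustration of the graph-based approach named in the abstract, the following is a minimal TextRank-style sketch in Python. It is not the thesis's implementation. The function names (`similarity`, `textrank`, `summarize`), the whitespace tokenizer, the damping factor of 0.85, and the fixed iteration count are all assumptions made for illustration. Chinese news text would first need a word segmenter, which is omitted here. The edge weight is the word-overlap similarity normalized by log sentence lengths, as described in the TextRank paper.

```python
import math


def similarity(a, b):
    """Word-overlap similarity between two token lists, normalized by log lengths."""
    if not a or not b:
        return 0.0
    overlap = len(set(a) & set(b))
    denom = math.log(len(a)) + math.log(len(b))
    # Two one-word sentences give log(1) + log(1) = 0; treat them as unconnected.
    return overlap / denom if denom > 0 else 0.0


def textrank(sentences, damping=0.85, iterations=50):
    """Return one PageRank-style score per sentence (assumed parameters)."""
    # Assumption: whitespace tokenization; real Chinese text needs segmentation.
    tokens = [s.lower().split() for s in sentences]
    n = len(tokens)
    # Weighted, undirected sentence graph without self-loops.
    w = [[similarity(tokens[i], tokens[j]) if i != j else 0.0 for j in range(n)]
         for i in range(n)]
    out = [sum(row) for row in w]  # total edge weight leaving each sentence
    scores = [1.0] * n
    for _ in range(iterations):
        scores = [
            (1 - damping) + damping * sum(
                w[j][i] / out[j] * scores[j] for j in range(n) if out[j] > 0
            )
            for i in range(n)
        ]
    return scores


def summarize(sentences, k=1):
    """Pick the k highest-scoring sentences, returned in their original order."""
    scores = textrank(sentences)
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]
```

In this sketch a sentence that shares no words with any other sentence receives only the baseline score `1 - damping`, so it is the last to be selected for an extractive summary.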
Abstract (third language)
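Table of contents	
Contents
Chapter 1 Introduction
  Section 1 Research Background
  Section 2 Research Motivation and Objectives
  Section 3 Thesis Structure
Chapter 2 Literature Review
  Section 1 Automatic Text Summarization
    1. Automatic text summarization techniques and main methods
    2. Purpose of automatic text summarization
    3. Single-document and multi-document summarization
    4. Extractive and abstractive automatic summarization
    5. Evaluation methods for automatic summarization
  Section 2 Graph Theory
    1. TextRank
    2. LexRank
  Section 3 Deep Learning
    1. Word to Vector
    2. Recurrent Neural Network
    3. LSTM
Chapter 3 Research Methods
  Section 1 Data Collection
    1. Reference summaries for online news
    2. Data collection
    3. Data inspection
  Section 2 Methods
    1. TextRank
    2. LexRank
    3. Word2Vec + TextRank
    4. Bi-LSTM
  Section 3 Summary Evaluation Metrics
    1. ROUGE
    2. Cosine similarity
Chapter 4 Analysis Results and Evaluation
  Section 1 Comparing generated summaries with article titles
  Section 2 Comparing generated summaries with the first paragraph of the article
Chapter 5 Conclusions and Suggestions
  Section 1 Conclusions
  Section 2 Suggestions for Future Work
References

List of Tables
Table 1 Training data
Table 2 Test data
Table 3 word2vec parameter settings
Table 4 Neural network model parameter settings
Table 5 ROUGE evaluation results for summaries generated by the four methods (reference summary: article title)
Table 6 Cosine evaluation results for titles generated by the four methods
Table 7 Deep Learning generated summaries: high recall
Table 8 Deep Learning generated summaries: relatively high precision
Table 9 Deep Learning generated summaries: high precision
Table 10 ROUGE evaluation results for summaries generated by the four methods (reference summary: first paragraph of the article)
Table 11 Cosine evaluation results for titles generated by the four methods
Table 12 Deep Learning generated summaries: high recall
Table 13 Deep Learning generated summaries: high precision

List of Figures
Figure 1 Research framework
Figure 2 LexRank sentence graph (句子標量圖)
Figure 3 CBOW
Figure 4 Skip-gram
Figure 5 Recurrent neural network (RNN) architecture
Figure 6 LSTM architecture
Figure 7 LSTM cell state
Figure 8 Bidirectional LSTM architecture
Figure 9 Cosine1 similarity of generated summaries (reference summary: title)
Figure 10 Cosine2 similarity of generated summaries (reference summary: title)
Figure 11 Cosine1 similarity of generated summaries (reference summary: first paragraph of the article)
Figure 12 Cosine2 similarity of generated summaries (reference summary: first paragraph of the article)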
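Full-text access rights	On campus: print thesis available immediately; electronic full text authorized for on-campus access; on-campus electronic thesis available immediately. Off campus: licensed to database vendors; off-campus electronic thesis available immediately.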
