System ID | U0002-2301201412135600 |
---|---|
DOI | 10.6846/TKU.2014.00887 |
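Title (Chinese) | 使用語意詞彙網路及語法相依性分析於中文文本蘊涵關係之研究 |
Title (English) | Chinese Textual Entailment with WordNet Semantic and Dependency Syntactic Analysis |
Title (third language) | |
Institution | Tamkang University (淡江大學) |
Department (Chinese) | 資訊管理學系碩士班 |
Department (English) | Department of Information Management |
Foreign degree: school name | |
Foreign degree: college name | |
Foreign degree: institute name | |
Academic year (ROC calendar) | 102 |
Semester | 1 |
Publication year (ROC calendar) | 103 |
Graduate student (Chinese) | 杜駿 |
Graduate student (English) | Chun Tu |
Student ID | 601630014 |
Degree | Master's |
Language | English |
Second language | |
Oral defense date | 2014-01-13 |
Pages | 72 |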
Oral defense committee |
Advisor - 戴敏育(myday@mail.tku.edu.tw)
Member - 翁頌舜(wengss@ntut.edu.tw)
Member - 周清江(cjou@mail.im.tku.edu.tw)
Member - 戴敏育(myday@mail.tku.edu.tw) |
Keywords (Chinese) |
文本蘊涵; 語意特徵; 相依性分析; WordNet; 語法特徵; 機器學習; 支持向量機(SVM) |
Keywords (English) |
Textual Entailment; Semantic Features; Dependency Analysis; WordNet; Syntactic Features; Machine Learning; Support Vector Machine (SVM) |
Keywords (third language) | |
Subject classification | |
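Abstract (translated from the Chinese) |
Recognizing Inference in TExt (RITE) is a performance evaluation task that assesses a system's ability to automatically detect "inference relations" between sentences, such as entailment, paraphrase, and contradiction. This study proposes features based on a semantic lexical network (WordNet) and dependency syntactic analysis to handle textual entailment recognition in the NTCIR-10 RITE-2 subtasks. WordNet is typically used to recognize entailment relations at the lexical level; the dependency syntactic method converts the two texts into dependency trees and computes the edit distance between the two subtrees. The experimental results show that, building on the semantic features added to our system, using machine learning to classify the features, and applying feature selection to obtain an optimized feature combination, the overall accuracy of Chinese textual entailment recognition on NTCIR-10 RITE-2 reaches 73.28% on the Traditional Chinese BC subtask and 74.57% on the Simplified Chinese BC subtask. The main contribution of this study is that adding semantic features in our experiments substantially improves the accuracy of Chinese textual entailment recognition. |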
Abstract (English) |
Recognizing Inference in TExt (RITE) is a task for automatically detecting entailment, paraphrase, and contradiction in texts, which addresses a major text-understanding need in information access research. In this paper, we propose a Chinese textual entailment system that uses WordNet semantic and dependency syntactic approaches for RITE, using the NTCIR-10 RITE-2 subtask datasets. WordNet is used to recognize entailment at the lexical level. The dependency syntactic approach applies a tree edit distance algorithm to the dependency trees of both the text and the hypothesis. We thoroughly evaluate our approach using the NTCIR-10 RITE-2 subtask datasets. Our system achieved 73.28% accuracy on the Traditional Chinese Binary-Class (BC) subtask and 74.57% on the Simplified Chinese Binary-Class subtask with the NTCIR-10 RITE-2 development datasets. Experiments with the text fragments provided by the NTCIR-10 RITE-2 subtask showed that the proposed approach can improve the system's overall accuracy. |
Abstract (third language) | |
Table of contents |
1. INTRODUCTION
1.1 Research Background
1.2 Research Motivation
1.3 Research Objective
2. LITERATURE REVIEW
2.1 Recognizing Textual Entailment
2.2 Recognizing Inference in Text
2.3 Machine Learning
2.4 WordNet
2.5 Dependency Parser
2.6 Research Gap
3. SYSTEM ARCHITECTURE
3.1 Preprocessing
3.2 Feature Generation
3.3 Machine Learning
4. EXPERIMENTAL RESULTS AND ANALYSIS
4.1 Ablation Test
4.2 Experiment 1: Baseline
4.3 Experiment 2: Baseline with WordNet Evaluation
4.4 Experiment 3: Baseline with Antonym and Negation Evaluation
4.5 Experiment 4: Baseline with Dependency Parser Evaluation
4.6 Overall Evaluation
5. CONCLUSIONS
5.1 Managerial Implication
5.2 Research Contribution
5.3 Research Limitation
5.4 Future Works
6. REFERENCES
Appendix I. Negation words list
Appendix II. Antonym pairs list

List of Tables
Table 1. Pairs In The BC Dataset From NTCIR-9 RITE
Table 2. Pairs In The MC Dataset From NTCIR-9 RITE
Table 3. Pairs In The RITE4QA Dataset From NTCIR-9 RITE
Table 4. The Mapping Of The Original RITE Entailment Direction
Table 5. Example Of Numerical Expressions Identified By Patterns In The RITE-2 Datasets
Table 6. Syntactic And Semantic Features
Table 7. Pairs In The BC Dataset From NTCIR-10 RITE-2
Table 8. Pairs In The MC Dataset From NTCIR-10 RITE-2
Table 9. Ablation Test On 20 Features
Table 10. Baseline Configurations
Table 11. Baseline Configurations With WordNet Similarity
Table 12. Baseline Configurations With WordNet Similarity Ratio
Table 13. Baseline Configurations With WordNet Similarity Minimum
Table 14. Baseline Configurations With WordNet Similarity + WordNet Similarity Ratio
Table 15. Baseline Configurations With WordNet Similarity Ratio + WordNet Similarity Minimum
Table 16. Baseline Configurations With WordNet Similarity + WordNet Similarity Minimum
Table 17. Baseline Configurations With All WordNet Similarity Features
Table 18. Baseline Configurations With Negation
Table 19. Baseline Configurations With Antonym
Table 20. Baseline Configurations With Antonym And Negation
Table 21. Baseline Configurations With Dependency Parser
Table 22. Cross Validation And Open Test Results After Features Selection (CT BC Subtask)
Table 23. Cross Validation And Open Test Results After Features Selection (CT MC Subtask)
Table 24. Cross Validation And Open Test Results After Features Selection (CS BC Subtask)
Table 25. Cross Validation And Open Test Results After Features Selection (CS MC Subtask)
Table 26. IMTKU Experiments For NTCIR-10 RITE-2 Datasets

List of Figures
Figure 1. Example Of DCS Tree
Figure 2. Illustration Of Alignment For RTE
Figure 3. E-HowNet Lexical Sense Expression
Figure 4. E-HowNet Lexical Expression And Tree Structure
Figure 5. An Example Of Dependency Tree
Figure 6. System Architecture Of IMTKU Text Entailment System In NTCIR-10 RITE-2
Figure 7. NTCIR-10 RITE-2 Traditional Chinese BC Subtask Raw Datasets
Figure 8. CKIP Autotag Results On A Text Pair
Figure 9. Longest Common Substring (LCS) Algorithm
Figure 10. A Dependency Tree Is A Set Of Links Connecting Heads To Dependents
Figure 11. The Representation Of A Dependency Tree
Figure 12. Model Cross-Validation Results |
Full-text access rights |
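
The abstract describes the dependency-syntactic feature as a tree edit distance computed between the dependency trees of the text and the hypothesis. The following is a minimal sketch of that idea, not the thesis's implementation. It assumes ordered, labeled trees, unit costs for insert, delete, and relabel, and a simple memoized recursion that is only practical for small trees. The tree encoding, the function name `tree_edit_distance`, and the English toy sentences are all hypothetical choices made for illustration.

```python
from functools import lru_cache

# Assumed encoding (hypothetical): a tree is (label, children), where
# children is a tuple of trees. A forest is a tuple of trees.


def tree_edit_distance(t1, t2):
    """Unit-cost edit distance between two ordered, labeled trees."""

    @lru_cache(maxsize=None)
    def forest_dist(f1, f2):
        if not f1 and not f2:
            return 0
        if not f1:
            # Insert the rightmost root of f2; its children join the forest.
            _, kids = f2[-1]
            return forest_dist(f1, f2[:-1] + kids) + 1
        if not f2:
            # Delete the rightmost root of f1; its children join the forest.
            _, kids = f1[-1]
            return forest_dist(f1[:-1] + kids, f2) + 1
        (label1, kids1), (label2, kids2) = f1[-1], f2[-1]
        delete = forest_dist(f1[:-1] + kids1, f2) + 1
        insert = forest_dist(f1, f2[:-1] + kids2) + 1
        # Map the two rightmost roots onto each other (relabel if they differ),
        # then solve the remaining forests and the two child forests separately.
        match = (
            forest_dist(f1[:-1], f2[:-1])
            + forest_dist(kids1, kids2)
            + (0 if label1 == label2 else 1)
        )
        return min(delete, insert, match)

    return forest_dist((t1,), (t2,))


# Toy dependency trees (hypothetical): "Mary bought (a) red car" vs "Mary bought (a) car".
text_tree = ("bought", (("Mary", ()), ("car", (("red", ()),))))
hypothesis_tree = ("bought", (("Mary", ()), ("car", ())))

distance = tree_edit_distance(text_tree, hypothesis_tree)
print(distance)  # prints 1: deleting the node "red" turns the text tree into the hypothesis tree
```

A small distance suggests that the hypothesis is structurally close to the text. In a setup like the one the abstract outlines, such a value would be one numeric feature among others passed to the classifier; how the thesis normalizes or weights it is not stated in this record.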