§ Browse Thesis Bibliographic Record
  
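System ID: U0002-2301201412135600
DOI: 10.6846/TKU.2014.00887
Title (Chinese): 使用語意詞彙網路及語法相依性分析於中文文本蘊涵關係之研究
Title (English): Chinese Textual Entailment with WordNet Semantic and Dependency Syntactic Analysis
Title (third language):
University: 淡江大學 (Tamkang University)
Department (Chinese): 資訊管理學系碩士班 (Master's Program, Department of Information Management)
Department (English): Department of Information Management
Foreign degree university:
Foreign degree college:
Foreign degree institute:
Academic year: 102 (ROC calendar; 2013-2014)
Semester: 1
Publication year: 103 (ROC calendar; 2014)
Author (Chinese): 杜駿
Author (English): Chun Tu
Student ID: 601630014
Degree: Master's
Language: English
Second language:
Oral defense date: 2014-01-13
Number of pages: 72
Committee: Advisor - 戴敏育 (myday@mail.tku.edu.tw)
Member - 翁頌舜 (wengss@ntut.edu.tw)
Member - 周清江 (cjou@mail.im.tku.edu.tw)
Member - 戴敏育 (myday@mail.tku.edu.tw)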
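Keywords (Chinese): 文本蘊涵 (textual entailment)
語意特徵 (semantic features)
相依性分析 (dependency analysis)
WordNet
語法特徵 (syntactic features)
機器學習 (machine learning)
支持向量機(SVM) (support vector machine)
Keywords (English): Textual Entailment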
Semantic Features
Dependency Analysis
WordNet
Syntactic Features
Machine Learning
Support Vector Machine (SVM)
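Keywords (third language):
Subject classification:
Abstract (Chinese)
Recognizing Inference in TExt (RITE) is an evaluation task that assesses a system's ability to automatically detect "inference relations" between sentences, such as entailment, paraphrase, and contradiction.

This study proposes features based on a semantic lexical network (WordNet) and dependency syntactic analysis for recognizing textual entailment in the NTCIR-10 RITE-2 subtasks. WordNet is typically used to recognize entailment relations at the lexical level. The dependency syntactic approach converts the two texts into dependency trees and computes the edit distance between the two trees.

The experimental results show that, by building on the semantic features added to our system, classifying the features with machine learning, and applying feature selection to obtain an optimized feature combination, the overall accuracy of Chinese textual entailment recognition on NTCIR-10 RITE-2 reaches 73.28% on the Traditional Chinese BC subtask and 74.57% on the Simplified Chinese BC subtask. The main contribution of this study is showing that adding semantic features in our experiments substantially improves the accuracy of Chinese textual entailment recognition.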
Abstract (English)
Recognizing Inference in TExt (RITE) is a task for automatically detecting entailment, paraphrase, and contradiction in texts, which addresses a major text-understanding problem in information access research.

In this paper, we propose a Chinese textual entailment system that uses WordNet semantic and dependency syntactic approaches for Recognizing Inference in TExt (RITE), using the NTCIR-10 RITE-2 subtask datasets. WordNet is used to recognize entailment at the lexical level. The dependency syntactic approach applies a tree edit distance algorithm to the dependency trees of the text and the hypothesis.
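The tree edit distance idea mentioned above can be illustrated with a minimal sketch. This is not the thesis's implementation: the unit edit costs, the nested-tuple tree encoding, the function names, and the toy English tokens are all assumptions made for illustration, and the record does not state which parser or cost model the system actually used.

```python
from functools import lru_cache

# Hypothetical encoding: a tree is (label, children), where children is a
# tuple of trees. A forest is a tuple of trees.

def size(forest):
    """Count the nodes in a forest."""
    return sum(1 + size(children) for _, children in forest)

@lru_cache(maxsize=None)
def forest_distance(f, g):
    """Unit-cost edit distance between two ordered forests."""
    if not f and not g:
        return 0
    if not f:
        return size(g)  # insert every node of g
    if not g:
        return size(f)  # delete every node of f
    (v_label, v_children), (w_label, w_children) = f[-1], g[-1]
    # Delete the root of the rightmost tree in f; its children move up.
    delete = forest_distance(f[:-1] + v_children, g) + 1
    # Insert the root of the rightmost tree in g.
    insert = forest_distance(f, g[:-1] + w_children) + 1
    # Match the two rightmost roots, relabeling if the labels differ.
    match = (forest_distance(f[:-1], g[:-1])
             + forest_distance(v_children, w_children)
             + (0 if v_label == w_label else 1))
    return min(delete, insert, match)

def tree_edit_distance(t1, t2):
    """Edit distance between two ordered labeled trees."""
    return forest_distance((t1,), (t2,))

# Toy dependency trees (head word with its dependents), assumed for illustration.
text_tree = ("ate", (("cat", ()), ("fish", ())))
hypothesis_tree = ("ate", (("cat", ()),))
print(tree_edit_distance(text_tree, hypothesis_tree))  # 1
```

A small distance between the text tree and the hypothesis tree suggests that the hypothesis is structurally close to the text; in a feature-based system such a value would be one input among several to the classifier.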

We thoroughly evaluated our approach using the NTCIR-10 RITE-2 subtask datasets. Our system achieved an accuracy of 73.28% on the Traditional Chinese Binary-Class (BC) subtask and 74.57% on the Simplified Chinese Binary-Class subtask with the NTCIR-10 RITE-2 development datasets. Experiments with the text fragments provided by the NTCIR-10 RITE-2 subtask showed that the proposed approach can improve the system's overall accuracy.
Abstract (third language):
Table of Contents
1. INTRODUCTION
1.1 Research Background
1.2 Research Motivation
1.3 Research Objective
2. LITERATURE REVIEW
2.1 Recognizing Textual Entailment
2.2 Recognizing Inference in Text
2.3 Machine Learning
2.4 WordNet
2.5 Dependency Parser
2.6 Research Gap
3. SYSTEM ARCHITECTURE
3.1 Preprocessing
3.2 Feature Generation
3.3 Machine Learning
4. EXPERIMENTAL RESULTS AND ANALYSIS
4.1 Ablation Test
4.2 Experiment 1: Baseline
4.3 Experiment 2: Baseline with WordNet Evaluation
4.4 Experiment 3: Baseline with Antonym and Negation Evaluation
4.5 Experiment 4: Baseline with Dependency Parser Evaluation
4.6 Overall Evaluation
5. CONCLUSIONS
5.1 Managerial Implication
5.2 Research Contribution
5.3 Research Limitation
5.4 Future Works
6. REFERENCES
Appendix I. Negation words list
Appendix II. Antonym pairs list

List of Tables
========================================
Table 1. Pairs In The BC Dataset From NTCIR-9 RITE
Table 2. Pairs In The MC Dataset From NTCIR-9 RITE
Table 3. Pairs In The RITE4QA Dataset From NTCIR-9 RITE
Table 4. The Mapping Of The Original RITE Entailment Direction
Table 5. Example Of Numerical Expressions Identified By Patterns In The RITE-2 Datasets
Table 6. Syntactic And Semantic Features
Table 7. Pairs In The BC Dataset From NTCIR-10 RITE-2
Table 8. Pairs In The MC Dataset From NTCIR-10 RITE-2
Table 9. Ablation Test On 20 Features
Table 10. Baseline Configurations
Table 11. Baseline Configurations With WordNet Similarity
Table 12. Baseline Configurations With WordNet Similarity Ratio
Table 13. Baseline Configurations With WordNet Similarity Minimum
Table 14. Baseline Configurations With WordNet Similarity + WordNet Similarity Ratio
Table 15. Baseline Configurations With WordNet Similarity Ratio + WordNet Similarity Minimum
Table 16. Baseline Configurations With WordNet Similarity + WordNet Similarity Minimum
Table 17. Baseline Configurations With All WordNet Similarity Features
Table 18. Baseline Configurations With Negation
Table 19. Baseline Configurations With Antonym
Table 20. Baseline Configurations With Antonym And Negation
Table 21. Baseline Configurations With Dependency Parser
Table 22. Cross Validation And Open Test Results After Features Selection (CT BC Subtask)
Table 23. Cross Validation And Open Test Results After Features Selection (CT MC Subtask)
Table 24. Cross Validation And Open Test Results After Features Selection (CS BC Subtask)
Table 25. Cross Validation And Open Test Results After Features Selection (CS MC Subtask)
Table 26. IMTKU Experiments For NTCIR-10 RITE-2 Datasets

List of Figures
==================================
Figure 1. Example Of DCS Tree
Figure 2. Illustration Of Alignment For RTE
Figure 3. E-HowNet Lexical Sense Expression
Figure 4. E-HowNet Lexical Expression And Tree Structure
Figure 5. An Example Of Dependency Tree
Figure 6. System Architecture Of IMTKU Text Entailment System In NTCIR-10 RITE-2
Figure 7. NTCIR-10 RITE-2 Traditional Chinese BC Subtask Raw Datasets
Figure 8. CKIP Autotag Results On A Text Pair
Figure 9. Longest Common Substring (LCS) Algorithm
Figure 10. A Dependency Tree Is A Set Of Links Connecting Heads To Dependents
Figure 11. The Representation Of A Dependency Tree
Figure 12. Model Cross-Validation Results
References
[1](2011.10.27). CKIP AutoTag. Available: http://ckipsvr.iis.sinica.edu.tw/
[2](2011.10.27). Text Analysis Conference. Available: http://www.nist.gov/tac/2010/RTE/RTE6_Main_NoveltyDetection_Task_Guidelines.pdf
[3]Banerjee S. and Pedersen T., "An Adapted Lesk Algorithm for Word Sense Disambiguation Using WordNet," in Computational Linguistics and Intelligent Text Processing, vol. 2276, A. Gelbukh, Ed. Springer Berlin Heidelberg, 2002, pp. 136-145.
[4]Barzilay R. and McKeown K. R., "Extracting paraphrases from a parallel corpus," in Proceedings of the 39th Annual Meeting on Association for Computational Linguistics, 2001, pp. 50-57.
[5]Burchardt A., Reiter N., Thater S., and Frank A., "A semantic approach to textual entailment: System evaluation and task analysis," Proceedings of the ACL-PASCAL Workshop on Textual Entailment and Paraphrasing, pp. 10-15, 2007.
[6]Castillo J. J., "A Machine Learning Approach for Recognizing Textual Entailment in Spanish," 2010.
[7]Chang C.-C. and Lin C.-J., "LIBSVM: A library for support vector machines," ACM Trans. Intell. Syst. Technol., vol. 2, pp. 1-27, 2011.
[8]Chang P.-C., Tseng H., Jurafsky D., and Manning C. D., "Discriminative reordering with Chinese grammatical relations features," presented at the Proceedings of the Third Workshop on Syntax and Structure in Statistical Translation, Boulder, Colorado, 2009.
[9]Giampiccolo D., Magnini B., Dagan I., and Dolan B., "The third pascal recognizing textual entailment challenge," presented at the ACL-PASCAL Workshop on Textual Entailment and Paraphrasing, 2007.
[10]Hattori S. and Sato S., "Team SKL's Strategy and Experience in RITE2," in Proceedings of the 10th NTCIR Conference, Tokyo, Japan, 2013.
[11]Hirschberg D. S., "Algorithms for the Longest Common Subsequence Problem," Journal of the Association for Computing Machinery, vol. 24, no. 4, pp. 664-675, 1977.
[12]Huang W.-C. and Wu S.-H., "Feature Analysis of Chinese Textural Entailment Systems," presented at the Proceedings of the 23rd Conference on Computational Linguistics and Speech Processing, 2011.
[13]Huang W.-J. and Liu C.-L., "NCCU-MIG at NTCIR-10: Using Lexical, Syntactic, and Semantic Features for the RITE Tasks," in Proceedings of the 10th NTCIR Conference, Tokyo, Japan, 2013.
[14]Ide N. and Veronis J., "Introduction to the special issue on word sense disambiguation: the state of the art," Computational linguistics, vol. 24, pp. 2-40, 1998.
[15]Ito D., Tanaka M., and Yamana H., "WSD Team's Approaches for Textual Entailment Recognition at the NTCIR10 (RITE2)," in Proceedings of the 10th NTCIR Conference, Tokyo, Japan, 2013.
[16]Kouylekov M. and Magnini B., "Recognizing textual entailment with tree edit distance algorithms," presented at the Proceedings of the PASCAL Recognizing Textual Entailment Challenge, 2005.
[17]Malakasiotis P. and Androutsopoulos I., "Learning textual entailment using SVMs and string similarity measures," presented at the Proceedings of the ACL-PASCAL Workshop on Textual Entailment and Paraphrasing, Prague, 2007.
[18]McCallum A., Freitag D., and Pereira F. C., "Maximum Entropy Markov Models for Information Extraction and Segmentation," in ICML, 2000, pp. 591-598.
[19]McDonald R., Pereira F., Ribarov K., and Hajič J., "Non-projective dependency parsing using spanning tree algorithms," presented at the Proceedings of the conference on Human Language Technology and Empirical Methods in Natural Language Processing, Vancouver, British Columbia, Canada, 2005.
[20]Mei J.-J., Zhu Y.-M., Gao Y.-Q., and Yin H.-X., TongYiCi CiLin (Chinese Synonym Forest): Shanghai Press of Lexicon and Books, 1983.
[21]Mihalcea R. and Moldovan D., "Semantic indexing using WordNet senses," presented at the Proceedings of the ACL-2000 workshop on Recent advances in natural language processing and information retrieval: held in conjunction with the 38th Annual Meeting of the Association for Computational Linguistics - Volume 11, Hong Kong, 2000.
[22]Miller G. A., "WordNet: A lexical database for English," Communications of the ACM, 1995.
[23]Missen M. and Boughanem M., "Using WordNet’s Semantic Relations for Opinion Detection in Blogs," in Advances in Information Retrieval, vol. 5478, M. Boughanem, C. Berrut, J. Mothe, and C. Soule-Dupuy, Eds. Springer Berlin Heidelberg, 2009, pp. 729-733.
[24]Nivre J. and Scholz M., "Deterministic dependency parsing of English text," presented at the Proceedings of the 20th international conference on Computational Linguistics, Geneva, Switzerland, 2004.
[25]Pang B. and Lee L., "A sentimental education: Sentiment analysis using subjectivity summarization based on minimum cuts," in Proceedings of the 42nd annual meeting on Association for Computational Linguistics, 2004, p. 271.
[26]Papineni K., Roukos S., Ward T., and Zhu W.-J., "BLEU: a method for automatic evaluation of machine translation," in Proceedings of the 40th annual meeting on association for computational linguistics, 2002, pp. 311-318.
[27]Ravichandran D. and Hovy E., "Learning surface text patterns for a question answering system," in Proceedings of the 40th Annual Meeting on Association for Computational Linguistics, 2002, pp. 41-47.
[28]Shih C.-W., Liu C., Lee C.-W., and Hsu W.-L., "IASL RITE System at NTCIR-10," in Proceedings of the 10th NTCIR Conference, Tokyo, Japan, 2013.
[29]Shima H., Kanayama H., Lee C.-W., Lin C.-J., Mitamura T., Miyao Y., et al., "Overview of NTCIR-9 RITE: Recognizing Inference in TExt," presented at the Proceedings of NTCIR-9 Workshop Meeting, Tokyo, Japan, 2011.
[30]Siblini R. and Kosseim L., "Using Ontology Alignment for TAC RTE Challenge," presented at the Proceedings of the Text Analysis Conference, Gaithersburg, MD, 2008.
[31]Tian R., Miyao Y., Matsuzaki T., and Komatsu H., "BnO at NTCIR-10 RITE: A Strong Shallow Approach and an Inference-based Textual Entailment Recognition System," in Proceedings of the 10th NTCIR Conference, Tokyo, Japan, 2013.
[32]Tu C. and Day M.-Y., "A statistical approach with syntactic and semantic features for Chinese Textual Entailment," in Proceedings of Information Reuse and Integration (IRI), 2012.
[33]Vanderwende L., Coughlin D., and Dolan B., "What Syntax can Contribute in Entailment Task," Microsoft Research, 2006.
[34]Wang X.-L., Zhao H., and Lu B.-L., "BCMI-NLP Labeled-Alignment-Based Entailment System for NTCIR-10 RITE-2 Task," in Proceedings of the 10th NTCIR Conference, Tokyo, Japan, 2013.
[35]Watanabe Y., Miyao Y., Mizuno J., Shibata T., Kanayama H., Lee C.-W., et al., "Overview of the Recognizing Inference in Text (RITE-2) at NTCIR-10," in Proceedings of the 10th NTCIR Conference, Tokyo, Japan, 2013.
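Full-text access rights
On campus
Print thesis available on campus immediately
Author consents to on-campus release of the electronic full text
Electronic thesis available on campus immediately
Off campus
Authorization granted
Electronic thesis available off campus immediately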
