Tamkang University Chueh Sheng Memorial Library (TKU Library)


(Electronic full text may be downloaded only from Tamkang University IP addresses)
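System ID U0002-0408201023581000
Title (Chinese) 英文介係詞的錯誤偵測與更正
Title (English) Detection and Correction for English Preposition Error
Institution Tamkang University
Department (Chinese) 資訊工程學系碩士班
Department (English) Department of Computer Science and Information Engineering
Academic year 98 (ROC calendar)
Semester 2
Publication year 99 (ROC calendar; 2010)
Student name (Chinese) 張愷達
Student name (English) Kai-Da Chang
Student ID 697410909
Degree Master's
Language Chinese
Second language English
Oral defense date 2010-06-29
Pages 63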
Defense committee Advisor - 郭經華
Member - 郭經華
Member - 陳孟彰
Member - 楊接期
Member - 蔡憶佳
Keywords (Chinese) 貝氏理論  content word  function word  機器學習  統計模型  加權
Keywords (English) Bayesian theory  content word  function word  machine learning  statistical model  weighting
Subject classification Applied Sciences, Computer Science and Information Engineering
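Abstract (translated from Chinese) This thesis proposes a system that detects and corrects English preposition errors, with the aim of helping English learners in ESL (English as a second language) settings understand how English prepositions are used.
The system has several distinguishing features. First, because it applies Bayesian theory, it performs well. Second, it applies a concept from English grammar, the distinction between content words and function words, so we can observe whether the amount of semantic meaning each word carries affects the choice of preposition.
The system comprises three types of algorithm: a baseline algorithm that uses a statistical model and Bayesian theory; a weighting algorithm that applies weights according to different occurrence frequencies; and an algorithm that applies the content word and function word concepts from English grammar. In the discussion of results, we compare the latter two algorithms with the baseline algorithm and find improvements in precision and recall.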
Abstract (English) This thesis presents a system for detecting and correcting English preposition errors. The objective is to help ESL (English as a second language) learners learn how prepositions are used.
The system has two notable features. First, it applies Bayesian theory, so it performs well. Second, it applies a concept from English grammar, the distinction between content words and function words, so that we can observe how each word's semantic content influences the choice of preposition.
The system contains three types of algorithm. The first is a baseline algorithm built on a statistical model using Bayesian theory. The second is a weighting algorithm whose weights depend on the kind of frequency it uses. The third is an algorithm that applies the concepts of content word and function word. In the discussion, we compare the latter two algorithms with the baseline algorithm and find some improvements in precision and recall.
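As a rough illustration of the three algorithm types named in the abstract, the Python sketch below ranks candidate prepositions with a naive Bayes style score, optionally weights content words and function words differently, and flags a preposition as a likely error when it is not among the top k candidates. This record does not give the thesis's formulas, so everything here is an assumption: the toy training data, the function word list, the add-one smoothing, the log-weighting scheme, and the names train, rank, and check are all hypothetical.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy training data: (preposition, context words).
# The thesis trains on the BNC; these five samples are invented for illustration.
TRAINING = [
    ("on", ["depend", "the", "weather"]),
    ("on", ["rely", "the", "team"]),
    ("in", ["live", "the", "city"]),
    ("in", ["interested", "the", "music"]),
    ("at", ["arrive", "the", "station"]),
]

# Hypothetical function word list; every other word is treated as a content word.
FUNCTION_WORDS = {"the", "a", "an", "of", "to"}


def train(samples):
    """Count preposition frequencies and word/preposition co-occurrences."""
    prep_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for prep, context in samples:
        prep_counts[prep] += 1
        for word in context:
            word_counts[prep][word] += 1
            vocab.add(word)
    return prep_counts, word_counts, vocab


def rank(context, model, k=1, content_weight=1.0, function_weight=1.0):
    """Return the k prepositions with the highest weighted log score."""
    prep_counts, word_counts, vocab = model
    total = sum(prep_counts.values())
    scores = {}
    for prep, n in prep_counts.items():
        score = math.log(n / total)  # log prior P(prep)
        denom = sum(word_counts[prep].values()) + len(vocab)
        for word in context:
            weight = function_weight if word in FUNCTION_WORDS else content_weight
            # Add-one smoothed log P(word | prep), scaled by the word-class weight.
            score += weight * math.log((word_counts[prep][word] + 1) / denom)
        scores[prep] = score
    return sorted(scores, key=scores.get, reverse=True)[:k]


def check(used_prep, context, model, k=1, **weights):
    """Accept used_prep if it is among the top-k candidates; also return them."""
    candidates = rank(context, model, k=k, **weights)
    return used_prep in candidates, candidates


model = train(TRAINING)
print(check("in", ["depend", "the", "weather"], model, k=1))
```

Setting function_weight=0.0 scores on content words only, and content_weight=0.0 scores on function words only, which loosely mirrors the experiments listed in sections 4.2.5 and 4.2.6 of the table of contents. Raising k makes the checker more permissive, in the spirit of the 1best, 2best, and 3best results.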
Table of contents Chapter 1 Introduction
1.1 Research motivation
1.2 Research objectives
1.3 Thesis outline
Chapter 2 Related work
Chapter 3 Algorithms
3.1 Preprocessing
3.1.1 Training data: BNC corpus
3.1.2 term_info table
3.1.3 collocation table
3.1.4 Test data
3.1.5 System architecture diagram
3.2 Core algorithms
3.2.1 Baseline algorithm
3.2.2 Weighting algorithm
3.2.3 Algorithm applying the content word & function word concepts
Chapter 4 Results and discussion
4.1 Postprocessing: Kbest algorithm
4.2 Results and discussion
4.2.1 Explanation of the data tables
4.2.2 Experimental results: baseline algorithm
4.2.3 Weighting algorithm: computed from word occurrence counts
4.2.4 Weighting algorithm: computed from word-preposition co-occurrence counts
4.2.5 Keeping only content word terms, discarding function word terms
4.2.6 Keeping only function word terms, discarding content word terms
4.2.7 Algorithm with variable content word and function word weights
Chapter 5 Future work
References
Appendix: English paper

List of figures
Figure 3.1.5 System architecture diagram
Figure 4.2.2-1 1best experimental results
Figure 4.2.2-2 2best experimental results
Figure 4.2.2-3 3best experimental results
Figure 4.2.2-4 Precision comparison across the 1best, 2best, and 3best results
Figure 4.2.2-5 Recall comparison across the 1best, 2best, and 3best results
Figure 4.2.3-1 1best experimental results
Figure 4.2.3-2 2best experimental results
Figure 4.2.3-3 3best experimental results
Figure 4.2.3-4 Recall comparison of 1best with the baseline algorithm
Figure 4.2.4-1 1best experimental results
Figure 4.2.4-2 2best experimental results
Figure 4.2.4-3 3best experimental results
Figure 4.2.4-4 Recall comparison of 1best with the baseline algorithm
Figure 4.2.5-1 1best experimental results
Figure 4.2.5-2 2best experimental results
Figure 4.2.5-3 3best experimental results
Figure 4.2.5-4 Precision comparison of 1best with the baseline algorithm
Figure 4.2.6-1 1best experimental results
Figure 4.2.6-2 2best experimental results
Figure 4.2.6-3 3best experimental results
Figure 4.2.7-1 1best experimental results
Figure 4.2.7-2 2best experimental results
Figure 4.2.7-3 3best experimental results
Figure 4.2.7-4 Recall comparison of 1best with the baseline algorithm

List of tables
Table 3.1.2 term_info table
Table 3.1.3 collocation table
Table 3.1.4 testing_data table
Table 3.2.3 function word list table
Table 4.2.2-1 1best experimental results
Table 4.2.2-2 2best experimental results
Table 4.2.2-3 3best experimental results
Table 4.2.3-1 1best experimental results
Table 4.2.3-2 2best experimental results
Table 4.2.3-3 3best experimental results
Table 4.2.4-1 1best experimental results
Table 4.2.4-2 2best experimental results
Table 4.2.4-3 3best experimental results
Table 4.2.5-1 1best experimental results
Table 4.2.5-2 2best experimental results
Table 4.2.5-3 3best experimental results
Table 4.2.6-1 1best experimental results
Table 4.2.6-2 2best experimental results
Table 4.2.6-3 3best experimental results
Table 4.2.7-1 1best experimental results
Table 4.2.7-2 2best experimental results
Table 4.2.7-3 3best experimental results
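Thesis access rights
  • The author grants a royalty-free license for library patrons to reproduce the print copy for academic purposes; publicly available from 2010-08-16.
  • The author authorizes browsing and printing of the electronic full text; publicly available from 2010-08-16.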