§ Browse Thesis Bibliographic Record
System ID U0002-2706202414333400
DOI 10.6846/tku202400362
Title (Chinese) 英語適性學習:透過BERT和自然語言處理技術協助英語學習
Title (English) Adaptive English Learning: Facilitating English Learning with BERT and NLP Technologies
Title (third language)
University Tamkang University (淡江大學)
Department (Chinese) 資訊工程學系博士班
Department (English) Department of Computer Science and Information Engineering
Foreign degree school name
Foreign degree college name
Foreign degree institute name
Academic year 112
Semester 2
Year of publication 113
Graduate student (Chinese) 楊喻婷
Graduate student (English) Yu-Ting Yang
ORCID 0009-0005-2139-2319
Student ID 810410042
Degree Doctoral
Language English
Second language
Defense date 2024-06-06
Number of pages 54
Thesis committee Advisor - 張志勇 (cychang@mail.tku.edu.tw) (0000-0002-0672-5593)
Co-advisor - 武士戎 (wushihjung@mail.tku.edu.tw)
Committee member - 廖文華 (whliao@ntub.edu.tw)
Committee member - 趙榮耀 (007753@mail.tku.edu.tw)
Committee member - 林怡弟 (ytlin@mail.tku.edu.tw)
Committee member - 蕭顯勝 (hssiu@ntnu.edu.tw)
Keywords (Chinese) 深度學習
自然語言處理
適性學習
以英語為外語
英語學習
BERT模型
詞彙網路
詞彙簡化
Keywords (English) Adaptive Learning
English as a Foreign Language (EFL)
English Language Learning
Deep Learning
Natural Language Processing
BERT model
WordNet
Lexical Simplification
Keywords (third language)
Subject classification
Abstract (Chinese)
English learning is important, yet most students lose motivation when a text contains too many unfamiliar words, and much learning material is not lively enough, so outside the classroom students have little motivation to study English on their own. Classic literature and novels usually draw students into the storyline and spark their interest and motivation to read. However, the extensive vocabulary in such works often exceeds the learner's language level and becomes the main obstacle to reading comprehension. This thesis introduces an innovative method that applies BERT (Bidirectional Encoder Representations from Transformers) and natural language processing techniques to address this problem: Adaptive Language Learning by Leveraging BERT and Semantic Technologies (ALBS). The method is designed to improve the lexical proficiency of EFL (English as a Foreign Language) learners by replacing difficult words with words better suited to the learner's vocabulary skills while preserving the original meaning and grammatical correctness. It recommends classic literature to EFL learners at different vocabulary levels, providing texts that match their lexical proficiency to strengthen the personalized learning experience and, in turn, their motivation. To achieve this goal, the approach consists of three main phases. The first phase uses natural language processing (NLP) techniques to train an L-BERT model that determines the proficiency level of each word in a sentence. In the second phase, L-BERT is used to map, in percentages, the vocabulary-level distributions of the classic literature and of the learner. Finally, in the word-replacement phase, the criteria of fluency, semantics, and the Common European Framework of Reference for Languages (CEFR) word level are combined to select the best replacement word. The results show that the proposed ALBS method outperforms existing methods in precision, recall, and F1-score.
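
To make the substitute-generation idea in the word-replacement phase concrete, the sketch below masks a target word and asks a pretrained masked language model for in-context replacement candidates. It is a minimal illustration assuming the generic bert-base-uncased checkpoint and the Hugging Face transformers library; the thesis trains its own L-BERT model, which is not reproduced here, and the example sentence is invented.

# Minimal sketch: masked-LM substitute generation for a target word.
# Assumes the generic "bert-base-uncased" checkpoint (Hugging Face transformers);
# the thesis's own L-BERT model is not reproduced here.
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")
model.eval()

sentence = "The old sailor was exceedingly frugal with his provisions."
target = "frugal"

# Replace the target word with [MASK] and score the whole vocabulary for that slot.
masked = sentence.replace(target, tokenizer.mask_token, 1)
inputs = tokenizer(masked, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

# Locate the mask position and keep the highest-scoring word-like candidates.
mask_pos = (inputs["input_ids"][0] == tokenizer.mask_token_id).nonzero(as_tuple=True)[0].item()
top_ids = logits[0, mask_pos].topk(15).indices
candidates = [tokenizer.decode([int(i)]).strip() for i in top_ids]
candidates = [c for c in candidates if c.isalpha() and c.lower() != target]

print(candidates)  # in-context substitute candidates for "frugal"

In ALBS these candidates would then be filtered and ranked; a sketch of a ranking step that combines the fluency, semantics, and CEFR-level criteria follows the English abstract below.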
Abstract (English)
Integrating classic literature into language learning is both captivating and beneficial for English as a Foreign Language (EFL) students. It draws them into the story, igniting their interest and motivation to continue reading. However, the extensive vocabulary in these classics can present a major challenge, often exceeding the learners’ proficiency levels. This thesis introduces an innovative method called Adaptive Language Learning by Leveraging BERT and Semantic Technologies (ALBS) to tackle this problem. ALBS is designed to improve the lexical proficiency of EFL learners by substituting difficult words with ones better suited to the learners’ vocabulary skills, while maintaining the original meaning and grammatical correctness. This approach supports the recommendation of classic literature to EFL learners at different vocabulary levels, providing personalized learning experiences that match their lexical proficiency, thereby boosting their motivation. ALBS is divided into three main phases to achieve this objective. The first phase involves training an L-BERT model with Natural Language Processing (NLP) techniques to determine the difficulty level of each word in a sentence. In the second phase, the vocabulary proficiency levels of both the classic literature and the learner are mapped out in percentages using L-BERT. Finally, in the word replacement phase, the criteria of fluency, semantics, and the Common European Framework of Reference for Languages (CEFR) word level are combined to select the best replacement word for the target word. The findings show that ALBS surpasses existing methods in terms of precision, recall, and F1-score.
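
The word-replacement phase selects a substitute by combining fluency, semantics, and CEFR word level (the weights W_fluency, W_semantics, W_level evaluated in Chapter 6). The following sketch shows one way such a weighted selection could be written; the weight values, the level_fit helper, and the candidate scores are illustrative assumptions, not figures taken from the thesis.

# Minimal sketch of weighted candidate selection combining fluency, semantics,
# and CEFR level. All numeric values and helper definitions are illustrative
# assumptions, not the thesis's implementation.
from dataclasses import dataclass

CEFR_ORDER = {"A1": 1, "A2": 2, "B1": 3, "B2": 4, "C1": 5, "C2": 6}

@dataclass
class Candidate:
    word: str
    fluency: float    # e.g. masked-LM probability of the candidate in context
    semantics: float  # e.g. embedding similarity to the original word
    cefr: str         # CEFR level looked up in a word bank such as the English Vocabulary Profile

def level_fit(cefr: str, learner_level: str) -> float:
    """Return 1.0 when the candidate is at or below the learner's level; decay above it."""
    gap = CEFR_ORDER[cefr] - CEFR_ORDER[learner_level]
    return 1.0 if gap <= 0 else 1.0 / (1.0 + gap)

def score(c: Candidate, learner_level: str,
          w_fluency: float = 0.4, w_semantics: float = 0.4, w_level: float = 0.2) -> float:
    """Weighted combination of the three criteria named in the abstract."""
    return (w_fluency * c.fluency
            + w_semantics * c.semantics
            + w_level * level_fit(c.cefr, learner_level))

candidates = [
    Candidate("thrifty", fluency=0.31, semantics=0.82, cefr="C1"),
    Candidate("careful", fluency=0.44, semantics=0.61, cefr="A2"),
]
best = max(candidates, key=lambda c: score(c, learner_level="B1"))
print(best.word)  # "careful" wins for a B1 learner under these example weights

A single weighted sum keeps the three criteria explicit and makes the weights directly comparable to the W_fluency, W_semantics, W_level settings compared in Fig. 6.4.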
Abstract (third language)
Thesis contents
Table of Contents
Acknowledgment	II
Table of Contents	VI
List of Figures	VIII
List of Tables	IX
Chapter 1. Introduction	1
1.1 Background	1
1.2 Research Goals	2
1.3 Organization of the Thesis	4
Chapter 2. Related Work	5
2.1 Vocabulary Simplification Techniques	5
2.2 Text Rewriting	6
Chapter 3. Preliminary Research	9
3.1 Introduction to NLP	9
3.2 Traditional NLP Techniques	10
3.2.1 TF-IDF (Term Frequency-Inverse Document Frequency)	10
3.2.2 Keyword Extraction with TF-IDF	11
3.3 From N-Grams to Word Vectors	11
3.3.1 N-grams	12
3.3.2 CBOW (Continuous Bag of Words) and Word2Vec	14
3.4 BERT: An In-Depth Look at Encoding, Processing, and Applications	19
3.4.1 Encoding and Positional Processing of a Sentence for BERT Input	19
3.4.2 Transformation of Each Word into q, k, v Vectors	22
3.4.3 Self-Attention Using q, k, v	22
3.4.4 Multi-Head Attention	23
3.4.5 Multi-Layer Attention	26
3.4.6 Other Layers in BERT	28
3.5 Downstream Applications of BERT	29
3.5.1 Text Classification	29
3.5.2 Named Entity Recognition (NER)	30
3.5.3 Review Classification	30
3.5.4 Question Answering (QA)	31
3.5.5 Next Sentence Prediction (NSP)	31
3.5.6 Language Translation	32
3.6 Conclusion of Preliminary Research	32
Chapter 4. Network Environment and Problem Formulation	33
4.1 Problem Statement	33
4.2 Objective	34
Chapter 5. The Proposed Mechanism	35
5.1 The BERT Model Training Phase	35
5.1.1 Building a Standardized CEFR Word Bank	36
5.1.2 Data Augmentation for Training	37
5.1.3 The L-BERT Model Construction and Training	38
5.2 The Word-Level Identification Phase	39
5.2.1 The learner’s vocabulary level distribution	39
5.2.2 The recommended text vocabulary level distribution	40
5.3 The Word Replacement Phase	42
5.3.1 Determining which words to replace	42
5.3.2 Word Replacement Policies	43
Chapter 6. Performance Evaluation	46
6.1 Datasets	46
6.2 Simulation Results	46
Chapter 7. Conclusion and Future Work	52
References	53

List of Figures
Fig. 3.1 CBOW Model     15
Fig. 3.2 CBOW Model (Expanded)      17
Fig. 3.3 Tokenization Process in BERT     20
Fig. 3.4 Self-attention Mechanism in NLP     25
Fig. 3.5 Self-attention Mechanism Forming the Multi-layer Structure of BERT     27
Fig. 6.1 The correlation between level of words and the number of replaced words in different book categories     47
Fig. 6.2 Distribution of words adjustment by the proposed ALBS     48
Fig. 6.3 Distribution of words between before and after replacement     48
Fig. 6.4 Accuracy of different W_fluency, W_semantics, W_level     50
Fig. 6.5 Accuracy of different methods     50
Fig. 6.6 The proposed ALBS comparing Level, Fluency, Semantic in terms of Precision, Recall, and F1-Score     51

List of Tables
Table 2.1 Comparison of related work     7
Table 6.1 The evaluation results using precision (PR) and accuracy (ACC) on three datasets     49
Full-text access permissions
National Central Library
Does not agree to grant free-of-charge authorization to the National Central Library
On campus
Print copy available on campus immediately
Authorization for the electronic full text not granted
Bibliographic record available on campus immediately
Off campus
Agrees to grant authorization to database vendors
Electronic full text available off campus immediately
