§ Browse Thesis Bibliographic Record
System ID U0002-2706202414333400
DOI 10.6846/tku202400362
Title (Chinese) 英語適性學習:透過BERT和自然語言處理技術協助英語學習
Title (English) Adaptive English Learning: Facilitating English Learning with BERT and NLP Technologies
Title (third language)
University Tamkang University (淡江大學)
Department (Chinese) 資訊工程學系博士班
Department (English) Department of Computer Science and Information Engineering
Foreign degree school name
Foreign degree college name
Foreign degree institute name
Academic year 112
Semester 2
Year of publication 113
Graduate student (Chinese) 楊喻婷
Graduate student (English) Yu-Ting Yang
ORCID 0009-0005-2139-2319
Student ID 810410042
Degree Doctoral
Language English
Second language
Defense date 2024-06-06
Number of pages 54
Thesis committee Advisor - 張志勇 (cychang@mail.tku.edu.tw) (0000-0002-0672-5593)
Co-advisor - 武士戎 (wushihjung@mail.tku.edu.tw)
Committee member - 廖文華 (whliao@ntub.edu.tw)
Committee member - 趙榮耀 (007753@mail.tku.edu.tw)
Committee member - 林怡弟 (ytlin@mail.tku.edu.tw)
Committee member - 蕭顯勝 (hssiu@ntnu.edu.tw)
Keywords (Chinese) 深度學習
自然語言處理
適性學習
以英語為外語
英語學習
BERT模型
詞彙網路
詞彙簡化
Keywords (English) Adaptive Learning
English as a Foreign Language (EFL)
English Language Learning
Deep Learning
Natural Language Processing
BERT model
WordNet
Lexical Simplification
Keywords (third language)
Subject classification
Abstract (Chinese)
English learning is important, yet most students lose motivation when a text contains too many unfamiliar words, and much learning material is not lively enough, so outside the classroom students have little motivation to study English on their own. Classic literature and novels usually draw students into the storyline and spark their interest and motivation to read. However, the extensive vocabulary in such works often exceeds the learner's language level and becomes the main obstacle to reading comprehension. This thesis introduces an innovative method that applies BERT (Bidirectional Encoder Representations from Transformers) and natural language processing techniques to address this problem: Adaptive Language Learning by Leveraging BERT and Semantic Technologies (ALBS). The method is designed to improve the lexical proficiency of EFL (English as a Foreign Language) learners by replacing difficult words with words better suited to the learner's vocabulary skills while preserving the original meaning and grammatical correctness. It recommends classic literature to EFL learners at different vocabulary levels, providing texts that match their lexical proficiency to strengthen the personalized learning experience and, in turn, their motivation. To achieve this goal, the approach consists of three main phases. The first phase uses natural language processing (NLP) techniques to train an L-BERT model that determines the proficiency level of each word in a sentence. In the second phase, L-BERT is used to map, in percentages, the vocabulary-level distributions of the classic literature and of the learner. Finally, in the word-replacement phase, the criteria of fluency, semantics, and the Common European Framework of Reference for Languages (CEFR) word level are combined to select the best replacement word. The results show that the proposed ALBS method outperforms existing methods in precision, recall, and F1-score.
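
To make the substitute-generation idea in the word-replacement phase concrete, the sketch below masks a target word and asks a pretrained masked language model for in-context replacement candidates. It is a minimal illustration assuming the generic bert-base-uncased checkpoint and the Hugging Face transformers library; the thesis trains its own L-BERT model, which is not reproduced here, and the example sentence is invented.

# Minimal sketch: masked-LM substitute generation for a target word.
# Assumes the generic "bert-base-uncased" checkpoint (Hugging Face transformers);
# the thesis's own L-BERT model is not reproduced here.
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")
model.eval()

sentence = "The old sailor was exceedingly frugal with his provisions."
target = "frugal"

# Replace the target word with [MASK] and score the whole vocabulary for that slot.
masked = sentence.replace(target, tokenizer.mask_token, 1)
inputs = tokenizer(masked, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

# Locate the mask position and keep the highest-scoring word-like candidates.
mask_pos = (inputs["input_ids"][0] == tokenizer.mask_token_id).nonzero(as_tuple=True)[0].item()
top_ids = logits[0, mask_pos].topk(15).indices
candidates = [tokenizer.decode([int(i)]).strip() for i in top_ids]
candidates = [c for c in candidates if c.isalpha() and c.lower() != target]

print(candidates)  # in-context substitute candidates for "frugal"

In ALBS these candidates would then be filtered and ranked; a sketch of a ranking step that combines the fluency, semantics, and CEFR-level criteria follows the English abstract below.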
Abstract (English)
Integrating classic literature into language learning is both captivating and beneficial for English as a Foreign Language (EFL) students. It draws them into the story, igniting their interest and motivation to continue reading. However, the extensive vocabulary in these classics can present a major challenge, often exceeding the learners’ proficiency levels. This thesis introduces an innovative method called Adaptive Language Learning by Leveraging BERT and Semantic Technologies (ALBS) to tackle this problem. ALBS is designed to improve the lexical proficiency of EFL learners by substituting difficult words with ones better suited to the learners’ vocabulary skills, while maintaining the original meaning and grammatical correctness. This approach supports the recommendation of classic literature to EFL learners at different vocabulary levels, providing personalized learning experiences that match their lexical proficiency, thereby boosting their motivation. ALBS is divided into three main phases to achieve this objective. The first phase involves training an L-BERT model with Natural Language Processing (NLP) techniques to determine the difficulty level of each word in a sentence. In the second phase, the vocabulary proficiency levels of both the classic literature and the learner are mapped out in percentages using L-BERT. Finally, in the word replacement phase, the criteria of fluency, semantics, and the Common European Framework of Reference for Languages (CEFR) word level are combined to select the best replacement word for the target word. The findings show that ALBS surpasses existing methods in terms of precision, recall, and F1-score.
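
The word-replacement phase selects a substitute by combining fluency, semantics, and CEFR word level (the weights W_fluency, W_semantics, W_level evaluated in Chapter 6). The following sketch shows one way such a weighted selection could be written; the weight values, the level_fit helper, and the candidate scores are illustrative assumptions, not figures taken from the thesis.

# Minimal sketch of weighted candidate selection combining fluency, semantics,
# and CEFR level. All numeric values and helper definitions are illustrative
# assumptions, not the thesis's implementation.
from dataclasses import dataclass

CEFR_ORDER = {"A1": 1, "A2": 2, "B1": 3, "B2": 4, "C1": 5, "C2": 6}

@dataclass
class Candidate:
    word: str
    fluency: float    # e.g. masked-LM probability of the candidate in context
    semantics: float  # e.g. embedding similarity to the original word
    cefr: str         # CEFR level looked up in a word bank such as the English Vocabulary Profile

def level_fit(cefr: str, learner_level: str) -> float:
    """Return 1.0 when the candidate is at or below the learner's level; decay above it."""
    gap = CEFR_ORDER[cefr] - CEFR_ORDER[learner_level]
    return 1.0 if gap <= 0 else 1.0 / (1.0 + gap)

def score(c: Candidate, learner_level: str,
          w_fluency: float = 0.4, w_semantics: float = 0.4, w_level: float = 0.2) -> float:
    """Weighted combination of the three criteria named in the abstract."""
    return (w_fluency * c.fluency
            + w_semantics * c.semantics
            + w_level * level_fit(c.cefr, learner_level))

candidates = [
    Candidate("thrifty", fluency=0.31, semantics=0.82, cefr="C1"),
    Candidate("careful", fluency=0.44, semantics=0.61, cefr="A2"),
]
best = max(candidates, key=lambda c: score(c, learner_level="B1"))
print(best.word)  # "careful" wins for a B1 learner under these example weights

A single weighted sum keeps the three criteria explicit and makes the weights directly comparable to the W_fluency, W_semantics, W_level settings compared in Fig. 6.4.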
Abstract (third language)
Thesis contents
Table of Contents
Acknowledgment	II
Table of Contents	VI
List of Figures	VIII
List of Tables	IX
Chapter 1. Introduction	1
1.1 Background	1
1.2 Research Goals	2
1.3 Organization of the Thesis	4
Chapter 2. Related Work	5
2.1 Vocabulary Simplification Techniques	5
2.2 Text Rewriting	6
Chapter 3. Preliminary Research	9
3.1 Introduction to NLP	9
3.2 Traditional NLP Techniques	10
3.2.1 TF-IDF (Term Frequency-Inverse Document Frequency)	10
3.2.2 Keyword Extraction with TF-IDF	11
3.3 From N-Grams to Word Vectors	11
3.3.1 N-grams	12
3.3.2 CBOW (Continuous Bag of Words) and Word2Vec	14
3.4 BERT: An In-Depth Look at Encoding, Processing, and Applications	19
3.4.1 Encoding and Positional Processing of a Sentence for BERT Input	19
3.4.2 Transformation of Each Word into q, k, v Vectors	22
3.4.3 Self-Attention Using q, k, v	22
3.4.4 Multi-Head Attention	23
3.4.5 Multi-Layer Attention	26
3.4.6 Other Layers in BERT	28
3.5 Downstream Applications of BERT	29
3.5.1 Text Classification	29
3.5.2 Named Entity Recognition (NER)	30
3.5.3 Review Classification	30
3.5.4 Question Answering (QA)	31
3.5.5 Next Sentence Prediction (NSP)	31
3.5.6 Language Translation	32
3.6 Conclusion of Preliminary Research	32
Chapter 4. Network Environment and Problem Formulation	33
4.1 Problem Statement	33
4.2 Objective	34
Chapter 5. The Proposed Mechanism	35
5.1 The BERT Model Training Phase	35
5.1.1 Building a Standardized CEFR Word Bank	36
5.1.2 Data Augmentation for Training	37
5.1.3 The L-BERT Model Construction and Training	38
5.2 The Word-Level Identification Phase	39
5.2.1 The learner’s vocabulary level distribution	39
5.2.2 The recommended text vocabulary level distribution	40
5.3 The Word Replacement Phase	42
5.3.1 Determining which words to replace	42
5.3.2 Word Replacement Policies	43
Chapter 6. Performance Evaluation	46
6.1 Datasets	46
6.2 Simulation Results	46
Chapter 7. Conclusion and Future Work	52
References	53

List of Figures
Fig. 3.1 CBOW Model     15
Fig. 3.2 CBOW Model (Expanded)      17
Fig. 3.3 Tokenization Process in BERT     20
Fig. 3.4 Self-attention Mechanism in NLP     25
Fig. 3.5 Self-attention Mechanism Forming the Multi-layer Structure of BERT     27
Fig. 6.1 The correlation between level of words and the number of replaced words in different book categories     47
Fig. 6.2 Distribution of words adjustment by the proposed ALBS     48
Fig. 6.3 Distribution of words between before and after replacement     48
Fig. 6.4 Accuracy of different W_fluency, W_semantics, W_level     50
Fig. 6.5 Accuracy of different methods     50
Fig. 6.6 The proposed ALBS comparing Level, Fluency, Semantic in terms of Precision, Recall, and F1-Score     51

List of Tables
Table 2.1 Comparison of related work     7
Table 6.1 The evaluation results using precision (PR) and accuracy (ACC) on three datasets     49
Full-text access permissions
National Central Library
Does not agree to grant free-of-charge authorization to the National Central Library
On campus
Print copy available on campus immediately
Authorization for the electronic full text not granted
Bibliographic record available on campus immediately
Off campus
Agrees to grant authorization to database vendors
Electronic full text available off campus immediately
