System ID | U0002-2601201513405900 |
---|---|
DOI | 10.6846/TKU.2015.00872 |
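Thesis title (Chinese) | 英語學習標的推薦機制設計 |
Thesis title (English) | A design of English learning target recommender mechanism |
Thesis title (third language) | |
University | Tamkang University (淡江大學) |
Department (Chinese) | 資訊工程學系博士班 |
Department (English) | Department of Computer Science and Information Engineering |
Foreign degree: university | |
Foreign degree: college | |
Foreign degree: graduate institute | |
Academic year (ROC calendar) | 103 |
Semester | 1 |
Publication year (ROC calendar) | 104 |
Author (Chinese) | 季振忠 |
Author (English) | Chen-Chung Chi |
Student ID | 895410115 |
Degree | Doctoral |
Language | English |
Second language | |
Oral defense date | 2015-01-26 |
Number of pages | 83 |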
Oral defense committee | Advisor: 郭經華; Members: 陳孟彰, 郭經華, 石貴平, 張志勇, 楊接期 |
Keywords (Chinese) | 餘弦相似度; 文章推薦系統; 文件可讀性 |
Keywords (English) | Cosine similarity; Article recommender system; Document readability |
Keywords (third language) | |
Subject classification | |
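Abstract (translated from Chinese) | For many learners of English as a second language, reading English articles or listening to English songs is an effective way to master English reading and listening. In practice, however, learners usually need many attempts before they find learning targets, such as songs or English articles, that match both their interests and their English vocabulary ability. Today's "digital native" learners, compared with "digital immigrant" learners, make heavier use of 3C (computer, communication, and consumer electronics) products and social network services. To meet the learning needs of these digital natives, the learning target recommender system designed in this study integrates powerful social network services such as YouTube and Facebook and imports learner data from IWill, an English learning platform used mainly by senior high school students. Using a clustering mechanism over this learner corpus, it attempts to analyze the vocabulary ability of this learner population in order to provide more accurate recommendations. This thesis has two main parts. The first designs a mechanism that, based on the vocabulary ability of senior high school students, identifies the vocabulary difficulty of English learning targets and uses it to select and recommend suitable targets for learners. The experiments in this study confirm that the vocabulary difficulty filtering mechanism clearly distinguishes the differences in vocabulary difficulty among learners' written essays, Taiwanese senior high school English textbook texts, and English news articles. The second part applies the vocabulary difficulty identification mechanism to recommend songs of suitable vocabulary difficulty, delivered through mobile learning or as web pages. While listening to a song, learners read the lyrics and choose the correct collocate, with a focus on verb-noun collocations; mutual information is computed to distinguish correct from incorrect collocations in the corpus. The system thus provides, on mobile devices or computers, a channel for practicing listening and collocation use anytime and anywhere. |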
Abstract (English) | For many EFL (English as a Foreign Language) learners, reading English articles or listening to music has long been a good way to improve English proficiency. However, it is not easy to find learning targets that match both a learner's interests and vocabulary ability. Most learners today are "digital natives." Compared with "digital immigrants," digital natives are accustomed to using 3C (computer, communication, and consumer electronics) products. This study therefore develops a learning target recommender system that integrates two powerful social network services, YouTube and Facebook, and imports a learner corpus from IWill (an English learning platform). It clusters the vocabulary characteristics of this corpus to identify learners' vocabulary usage trends and, on that basis, recommends appropriate learning targets. The work has two parts. The first describes a vocabulary difficulty filtering mechanism and uses it to select and recommend appropriate learning targets for English learners. The second applies this mechanism to recommend music videos via mobile device or web browser. The design of the recommender system focuses on verb-noun collocations: learners watch the lyrics displayed dynamically, sentence by sentence, while listening to the audio. The system extracts collocations worth learning by calculating mutual information over the corpus and provides a practice tool for improving both reading and listening skills. |
Abstract (third language) | |
Table of contents |
Table of Contents
List of Figures
List of Tables
Chapter 1. Introduction
  1.1 Related work
  1.2 The purpose and contribution of this study
  1.3 Introduce of the remain section
Chapter 2. Text databases and research tools
  2.1 Text databases
    2.1.1 Vocabulary Sets
    2.1.2 Text Databases
  2.2 Research Tools
    2.2.1 Natural Language Processing Toolkit: NLTK
    2.2.2 Data Mining Toolkit: Orange
    2.2.3 Open ID Toolkit: Facebook Graph API
  2.3 Document readability estimation formula
    2.3.1 Automated Readability Index (ARI)
    2.3.2 Coleman-Liau Index (CLI)
    2.3.3 Flesch-Kincaid readability test / Flesch Reading Ease
    2.3.4 Gunning Fog Index
    2.3.5 SMOG and SMOG Index
  2.4 Natural language processing
    2.4.1 Document preprocessing
    2.4.2 POS Tagging
    2.4.3 Lemmatizing
  2.5 Mutual Information and Collocation database
    2.5.1 Mutual Information
    2.5.2 Collocation and Mis-Collocation
Chapter 3. Mechanism overview
  3.1 Design a text-reading recommendation system
    3.1.1 Natural Language Process Preprocessing
    3.1.2 Feature Extraction Mechanism by IR Models
    3.1.3 Find document patterns in Stage i+1 and make recommendation
  3.2 Design a Music Video Recommendation System
    3.2.1 Login process for keeping personal information
    3.2.2 Web-based music video recommendation list and playing interface
    3.2.3 Collocation Databases
    3.2.4 Applied Text database
    3.2.5 Lyrics’ vocabulary difficulty evaluation
Chapter 4. Mechanism Performance Evaluation
  4.1 The evaluation of the text-reading recommendation system
    4.1.1 The system’s ability to identify stage i + 1 level articles appropriate for learners
    4.1.2 The article source differentiation accuracy of the system
    4.1.3 The rationality of system-recommended online English news articles for certain learner populations
    4.1.4 Analyze essays in the iWill learners’ corpus using document difficulty formula
    4.1.5 System extension application in identifying articles written with superior abilities
  4.2 The implement of the music video recommendation system
    4.2.1 User interface
    4.2.2 Quiz in lyrics
    4.2.3 Lyrics’ clustering based on vocabulary difficulty estimation
    4.2.4 Correlation relationship estimation between document difficulty and learners’ feedback
Chapter 5. Conclusion and Future Work
Bibliography
List of Figures
  Figure 1 Screenshot of Search Interface for Finding Appropriate Readings
  Figure 2 Search results and analysis of readability (E. Miltsakaki et al., 2009)
  Figure 3 Proportions of examinees from different age groups for GEPT in 2010
  Figure 4 Data mining tools: Orange interface
  Figure 5 Login process – connected to the Facebook Graph API
  Figure 6 User authentication – using the FaceGraph API
  Figure 7 Gunning Fog Index algorithm
  Figure 8 Data collect flowchart in proposed collocation database
  Figure 9 A data fragment in collocation database
  Figure 10 Average MI of the collocation database
  Figure 11 The recommendation making procedure in the reading target recommendation system
  Figure 12 Similarity between words from GEPT level 6 and SHSETs
  Figure 13 A classifier accuracy estimation flowchart
  Figure 14 The system ask learner to answer a quiz
  Figure 15 The system leaves feedback message to learner
  Figure 16 The system architecture: music video recommender
  Figure 17 Document classification accuracy compared
  Figure 18 Difficulty score distribution from three text databases
  Figure 19 Document clustering result (fetched from iWill learners’ corpus)
  Figure 20 Average cosine similarity comparison between the 100 well-written articles and the nonselected articles (word frequency model)
  Figure 21 Average cosine similarity comparison between the best 100 well-written articles and the nonselected articles (Boolean model)
  Figure 22 An example for candidate lists of word-pair options
  Figure 23 Clustering result by analyze lyrics’ vocabulary difficulty
  Figure 24 Correlation relationship Between document difficulty and learners’ feedback
  Figure 25 Correlation relationship between right-ratio of quiz and feedback from learner (N=28, r=-0.69)
List of Tables
  Table 1 Categories of Orange widgets and functions
  Table 2 Index score & description of Flesch Reading Ease Score method
  Table 3 GEPT level vocabulary covered ratio (before lemmatizing)
  Table 4 GEPT level vocabulary covered ratio (after lemmatizing)
  Table 5 Two example documents
  Table 6 Similarity scores between documents and vocabulary sets
  Table 7 Document features (obtained by applying the Boolean IR model)
  Table 8 Raw data
  Table 9 Evaluation results for the accuracy of classifiers
  Table 10 The confusion matrix
  Table 11 Measurement methods in educational criteria for RS in TEL
  Table 12 An evaluation framework for Recommendation System |
References |
Apostol, T., Calculus, Vol. 2: "Multi-Variable Calculus and Linear Algebra with Applications", John Wiley and Sons, ISBN 978-0471000075. (1969)
BBC, available from http://www.bbc.co.uk/, retrieved (2011).
Bird, S., Klein, E., Loper, E. and Baldridge, J., Multidisciplinary instruction with the Natural Language Toolkit, Proceedings of the Third Workshop on Issues in Teaching Computational Linguistics, Association for Computational Linguistics, Stroudsburg, PA, USA, pp. 62-70. (2008)
Brants, T., "TnT: a statistical part-of-speech tagger", Proceedings of the Sixth Conference on Applied Natural Language Processing, Association for Computational Linguistics. (2000)
Brett, P., "Using multimedia: a descriptive investigation of incidental language learning", Computer Assisted Language Learning, Vol. 11, No. 2, pp. 179-200. (1998)
Chall, J. S., Readability: An appraisal of research and publication, Bureau of Educational Research Monographs, Columbus: Ohio State University Press, Epping, England: Bowker. (1958)
Chall, J. S. and Dale, E., “Readability revisited: The New Dale-Chall Readability Formula”, Cambridge, MA: Brookline Books. (1995)
Chen, C. M., Hsu, S. H., Li, Y. L. and Peng, C. J., "Personalized Intelligent M-learning System for Supporting Effective English Learning", Systems, Man and Cybernetics, 2006. SMC '06. IEEE International Conference on, Vol. 6, pp. 4898-4903, 8-11. (2006)
Chi, C. C. and Kuo, C. H., “The Design of English Article Recommender Mechanism for Senior High School Students”, Proc. of 2012 International Conference on Advanced Learning Technologies, Rome, Italy, July 4-6, pp. 541-545. (2012)
Choi, H. J. and Johnson, S. D., “The effect of problem-based video instruction on learner satisfaction, comprehension and retention in college course”, British Journal of Educational Technology, Vol. 38, No. 5, pp. 885-895. (2007)
Church, K. W. and Hanks, P., “Word association norms, mutual information, and lexicography”, Computational Linguistics, Vol. 16, No. 1, pp. 22-29. (1990)
CNN, available from http://edition.cnn.com/, retrieved (2011).
Coleman, M. and Liau, T. L., "A computer readability formula designed for machine scoring", Journal of Applied Psychology, Vol. 60, pp. 283–284. (1975)
Cooley, R., Mobasher, B. and Srivastava, J., "Web mining: information and pattern discovery on the World Wide Web", Tools with Artificial Intelligence, 1997. Proceedings., Ninth IEEE International Conference on, pp. 558-567, 3-8. (1997)
Csikszentmihalyi, M., “Beyond boredom and anxiety”, Jossey-Bass Publishers, pp. 10-, 2000, Original work published. (1975)
Csikszentmihalyi, M., “Finding flow”, New York: Basic. (1997)
Danielson, W. A. and Bryan, S. D., “Computer automation of two readability formulas”, Journalism Quarterly, pp. 201-206. (1963)
Demšar, J., Zupan, B., Leban, G. and Curk, T., Orange: From experimental machine learning to interactive data mining, Springer Berlin Heidelberg. (2004)
Drachsler, H., Hummel, H. G. K. and Koper, R., Identifying the Goal, User model and Conditions of Recommender Systems for Formal and Informal Learning, Journal of Digital Information, Vol. 10, No. 2, pp. 4-24. (2009)
Facebook Graph API, Information on https://developers.facebook.com/docs/graph-api?locale=zh_TW. (2012)
GEPT, General English Proficiency Test, available from http://www.gept.org.tw, retrieved (2011).
Gunning, R., "The technique of clear writing", New York, NY: McGraw-Hill International Book Co. (1952)
Heilman, M., Zhao, L., Pino, J. and Eskenazi, M., “Retrieval of Reading Materials for Vocabulary and Reading Practice”, Proc. of the Third ACL Workshop on Innovative Use of NLP for Building Educational Applications, Ohio, U. S. A., pp. 80–88. (2008)
Hsu, C. C., Chen, H. C., Huang, K. K. and Huang, Y. M., “A personalized auxiliary material recommendation system based on learning style on Facebook applying an artificial bee colony algorithm”, Computers & Mathematics with Applications, Vol. 64, No. 5, pp. 1506-1513. (2012)
Hsu, C. K., Hwang, G. J. and Chang, C. K., “Development of a reading material recommendation system based on a knowledge engineering approach”, Computers & Education, Vol. 55, No. 1, pp. 76-83. (2010)
Huang, J., "Voices from Chinese student: Professors' use of English affects academic listening", College Student Journal, pp. 212-224. (2004)
Hung, T. F., Chiou, Y. S., Kuo, C. H. and Tsao, N. L., "A personalized movies system for English learning", Proc. of International Computer Symposiums (ICS), Taiwan, R. O. C., Nov 13-15. (2008)
IWILL, Intelligent Web-based Interactive Language Learning, Information on http://cube.iwillnow.org/iwill/ (2012)
Ito, K., Encyclopedic Dictionary of Mathematics, 2nd ed., MIT Press, ISBN 978-0-262-59020-4, pp. 82, 113, 144-145. (1993)
Kincaid, J. P., Braby, R. and Mears, J., "Electronic authoring and delivery of technical information", Journal of Instructional Development, Vol. 11, No. 2, pp. 8–13. (1988)
Kincaid, J. P., Fishburne Jr, R. P., Rogers, R. L. and Chissom, B. S., Derivation of new readability formulas (Automated Readability Index, Fog Count and Flesch Reading Ease Formula) for Navy enlisted personnel, Research Branch Report, pp. 8-75, Millington, TN: Naval Technical Training, U. S. Naval Air Station, Memphis, TN. (1975)
Klare, G. R., The measurement of readability, Ames: Iowa State University Press. (1963)
Knutsson, O., Pargman, T. C., Eklundh, K. S. and Westlund, S., Designing and developing a language environment for second language writers, Computers & Education, Vol. 49, No. 4, pp. 1122-1146. (2007)
Kohavi, R., A study of cross-validation and bootstrap for accuracy estimation and model selection, Proceedings of the Fourteenth International Joint Conference on Artificial Intelligence 2 (12), pp. 1137–1143, Morgan Kaufmann, San Mateo, CA. (1995)
Konchady, M., Text mining application programming, Boston, Mass.: Charles River Media. (2006)
Krashen, S. D., “Principles and practice in second language acquisition”, New York: Pergamon Press. (1995)
Kuo, C. H., Wible, D., Tsao, N. L. and Chang, C. F., "A video retrieval system for Computer Assisted Language Learning", Proc. of the 12th International Conference on Artificial Intelligence in Education, July 18-22, Amsterdam, Netherlands, pp. 378-385. (2005)
Leacock, C., et al., "Automated Grammatical Error Detection for Language Learners", Synthesis Lectures on Human Language Technologies 7.1, pp. 1-170. (2014)
Liu, B., Information Retrieval and Web Search, Web Data Mining: Exploring Hyperlinks, Contents, and Usage Data, Springer. (2007)
LTTC, The Language Training & Testing Center, available from http://www.lttc.ntu.edu.tw, retrieved (2012).
Martin, L. and Gottron, T., Readability and the Web, Future Internet, Vol. 4, No. 1, pp. 238-252. (2012)
McLaughlin, G. H., SMOG Grading - a New Readability Formula, Journal of Reading, Vol. 12, No. 8, pp. 639–646, Information on http://www.articlearchives.com/education-training/literacy-illiteracy/880189-1.html (PDF). (1969)
Miltsakaki, E., “Matching Readers’ Preferences and Reading Skills with Appropriate Web Texts”, Proceedings of the European Chapter of the Association for Computational Linguistics 2009 Demonstrations Session, Athens, Greece, 3 April, pp. 49–52. (2009)
Najjar, L. J., "Multimedia information and learning", Journal of Educational Multimedia and Hypermedia, Vol. 5, No. 2, pp. 129-150. (1996)
Nakamura, J. and Csikszentmihalyi, M., The concept of flow, Handbook of positive psychology, pp. 89-105. (2002)
Ng'ambi, D. and Lombe, A., Using Podcasting to Facilitate Student Learning: A Constructivist Perspective, Educational Technology & Society, 15 (4), pp. 181–192. (2012)
Ono, Y. and Ishihara, M., "Examination of the podcasting system in Second Language Acquisition", Proc. of the 9th International Conference on Computer and Information Science, Yamagata, Japan, pp. 540-545. (2010)
Orange, Analyze process through visual programming, available from http://orange.biolab.si/features.html, retrieved (2013).
Perkins, J., Python text processing with NLTK 2.0 cookbook, Packt Publishing Ltd. (2010)
Petersen, S. E. and Ostendorf, M., A machine learning approach to reading level assessment, Computer Speech and Language, Vol. 23, pp. 89-106. (2009)
Pitler, E. and Nenkova, A. A., Revisiting Readability: A Unified Framework for Predicting Text Quality, Proceedings of the 2008 Conference on Empirical Methods in Natural Language Processing, pp. 186-195, Honolulu. (2008)
Kosala, R. and Blockeel, H., Web mining research: a survey, SIGKDD Explor. Newsl. 2, 1, pp. 1-15. (2000)
Recordon, D. and Reed, D., OpenID 2.0: a platform for user-centric identity management, Proceedings of the second ACM workshop on Digital identity management (DIM '06), ACM, New York, NY, USA, pp. 11-16. (2006)
Sanmin, available from http://www.sanmin.com.tw/page-history.asp, retrieved (2011).
Senter, R. J. and Smith, E. A., "Automated Readability Index", Wright-Patterson Air Force Base, p. iii, AMRL-TR-6620. (1967)
Shea, P., “Leveling the playing field: A study of captioned interactive video for second language learning”, Journal of Educational Computing Research, Vol. 22, No. 3, pp. 243-263. (2000)
Tetreault, J., Chodorow, M. and Madnani, N., Bucking the trend: improved evaluation and annotation practices for ESL error detection systems, Language Resources and Evaluation, pp. 1-27. (2013)
The China Post, available from http://chinapost.com.tw/, retrieved (2011).
Thompson, K. C. and Callan, J., Predicting Reading Difficulty With Statistical Language Models, Journal of the American Society for Information Science and Technology, Vol. 56, pp. 1448-1462. (2005)
Tsao, N. L., Kuo, C. H., Liu, Anne L. E., Wible, D. and Lu, Y. T., "Error-driven incidental language learning: learning Collocation from movies", Proceedings of the 17th International Conference on Computers in Education [CDROM], pp. 136-162. (2009)
Vygotsky, L. S., “Mind in Society: The development of Higher Psychological Processes”, Harvard University Press. (1978)
Williams, C. B., A note on the statistical analysis of sentence length as a criterion of literary style, Biometrika Trust, Vol. 31, No. 3, pp. 356-361. (1940)
Youtube API, Information on https://developers.google.com/youtube/getting_started?hl=zh-TW. (2012)
Zhang, L., Liu, Z. and Ni, J., Feature-Based Assessment of Text Readability, Seventh International Conference on Internet Computing for Engineering and Science, pp. 51-54. (2013) |
Full-text access rights |