Electronic Theses and Dissertations Service

§ Browse thesis bibliographic record

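The electronic full text of this dissertation is publicly available off campus from 2015-01-30.
The print copy of this dissertation is publicly available from 2015-01-30.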

System ID	U0002-2601201513405900
DOI	10.6846/TKU.2015.00872
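Title (Chinese)	英語學習標的推薦機制設計
Title (English)	A design of English learning target recommender mechanism
Title (third language)
Institution	Tamkang University (淡江大學)
Department (Chinese)	資訊工程學系博士班
Department (English)	Department of Computer Science and Information Engineering
Foreign degree: university name
Foreign degree: college name
Foreign degree: graduate institute name
Academic year	103 (ROC calendar; 2014-2015)
Semester	1
Year of publication	104 (ROC calendar; 2015)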
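Student name (Chinese)	季振忠
Student name (English)	Chen-Chung Chi
Student ID	895410115
Degree	Doctoral
Language	English
Second language
Oral defense date	2015-01-26
Number of pages	83
Oral defense committee	Advisor: 郭經華; Member: 陳孟彰; Member: 郭經華; Member: 石貴平; Member: 張志勇; Member: 楊接期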
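Keywords (Chinese)	餘弦相似度 (cosine similarity); 文章推薦系統 (article recommender system); 文件可讀性 (document readability)
Keywords (English)	Cosine similarity; Article recommender system; Document readability
Keywords (third language)
Subject classification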
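Abstract (Chinese, translated)	For many learners of English as a second language, reading English articles or listening to English songs is a good way to build proficiency in English reading and listening. During the learning process, however, learners usually need many attempts before they find learning targets, such as songs or English articles, that match both their interests and their English vocabulary ability. Today's "digital native" learners, compared with "digital immigrant" learners, are more inclined to use 3C (computer, communication, and consumer electronics) products and social network services. To meet the learning needs of these digital natives, the learning target recommender system designed in this study integrates powerful social network services such as YouTube and Facebook, and imports learner data from IWill, an English learning platform used mainly by senior high school students. It uses a learner-corpus clustering mechanism to analyze the vocabulary ability of this learner population and thereby provide more accurate recommendations. This dissertation has two main parts. The first designs a mechanism that, based on the vocabulary ability of senior high school students, identifies the vocabulary difficulty of English learning targets, and then selects and recommends suitable targets for learners. The experiments in this study confirm that the vocabulary difficulty filtering mechanism can clearly distinguish the vocabulary difficulty of learner-written essays, Taiwanese senior high school English textbook texts, and English news articles. The second part applies the vocabulary difficulty identification mechanism to recommend songs of suitable vocabulary difficulty, delivered through mobile learning or as web pages. While listening to a song, learners read the lyrics and choose the correct collocation, with a focus on verb-noun collocations. Mutual information is computed to distinguish correct from incorrect collocations in the corpus. The system thus offers a channel for practicing listening and collocation use anytime and anywhere, on a mobile device or a computer.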
Abstract (English)	For many EFL (English as a Foreign Language) learners, reading English articles or listening to music has always been a good way to improve their English proficiency. However, it is not always easy to find appropriate learning targets that match learners' interests and vocabulary ability. Nowadays most learners are "digital natives." Compared with "digital immigrants", digital natives are accustomed to using 3C (computer, communication, and consumer electronics) products. This study therefore develops a learning target recommender system that combines powerful social network services, YouTube and Facebook, with a learner corpus imported from IWill (an English learning platform). It clusters the vocabulary characteristics of this corpus to identify learners' vocabulary usage trends, and on that basis recommends appropriate learning targets. The work has two parts. The first part describes the vocabulary difficulty filtering mechanism and uses it to choose and recommend appropriate learning targets for English learners. The second part applies this mechanism to recommend music videos via mobile device or web browser. The design of the proposed recommender system focuses on the study of verb-noun collocations: learners watch lyrics displayed dynamically, sentence by sentence, while listening to the music. The proposed system extracts collocations worth learning by calculating mutual information over a corpus, and provides a practice tool for improving both reading and listening skills.
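The abstract states that collocations are selected by calculating mutual information over a corpus, without giving the formula. As a rough illustration only, the sketch below scores verb-noun pairs with pointwise mutual information, PMI(v, n) = log2(P(v, n) / (P(v) P(n))). The function name `pmi_scores`, the toy pair list, and the choice of pointwise MI estimated from raw pair counts are all assumptions made here; the dissertation's actual corpus, counting scheme, and thresholds are not described in this record.

```python
import math
from collections import Counter


def pmi_scores(pairs):
    """Return the pointwise mutual information of each distinct (verb, noun) pair.

    `pairs` is a list of (verb, noun) tuples, one per observed co-occurrence.
    Probabilities are estimated from raw counts over this list (an assumption
    made for illustration; no smoothing is applied).
    """
    pair_counts = Counter(pairs)
    verb_counts = Counter(verb for verb, _ in pairs)
    noun_counts = Counter(noun for _, noun in pairs)
    total = len(pairs)

    scores = {}
    for (verb, noun), count in pair_counts.items():
        p_pair = count / total
        p_verb = verb_counts[verb] / total
        p_noun = noun_counts[noun] / total
        scores[(verb, noun)] = math.log2(p_pair / (p_verb * p_noun))
    return scores


# Hypothetical toy observations, not data from the dissertation.
pairs = [
    ("make", "decision"), ("make", "decision"), ("make", "mistake"),
    ("take", "decision"), ("take", "photo"), ("take", "photo"),
    ("do", "homework"), ("do", "homework"),
]
scores = pmi_scores(pairs)

# Print pairs from strongest to weakest association.
for pair, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(pair, round(score, 2))
```

In this toy data, a pair that co-occurs more often than its parts would by chance gets a positive score, while a rarer pairing such as ("take", "decision") scores lower than ("make", "decision"). A quiz generator could use such a ranking to pick a correct collocation and lower-scoring distractors, although whether the dissertation does this is not stated here.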
Abstract (third language)
Table of contents
Table of Contents
List of Figures
List of Tables
Chapter 1. Introduction
  1.1 Related work
  1.2 The purpose and contribution of this study
  1.3 Introduce of the remain section
Chapter 2. Text databases and research tools
  2.1 Text databases
    2.1.1 Vocabulary Sets
    2.1.2 Text Databases
  2.2 Research Tools
    2.2.1 Natural Language Processing Toolkit: NLTK
    2.2.2 Data Mining Toolkit: Orange
    2.2.3 Open ID Toolkit: Facebook Graph API
  2.3 Document readability estimation formula
    2.3.1 Automated Readability Index (ARI)
    2.3.2 Coleman-Liau Index (CLI)
    2.3.3 Flesch-Kincaid readability test / Flesch Reading Ease
    2.3.4 Gunning Fog Index
    2.3.5 SMOG and SMOG Index
  2.4 Natural language processing
    2.4.1 Document preprocessing
    2.4.2 POS Tagging
    2.4.3 Lemmatizing
  2.5 Mutual Information and Collocation database
    2.5.1 Mutual Information
    2.5.2 Collocation and Mis-Collocation
Chapter 3. Mechanism overview
  3.1 Design a text-reading recommendation system
    3.1.1 Natural Language Process Preprocessing
    3.1.2 Feature Extraction Mechanism by IR Models
    3.1.3 Find document patterns in Stage i+1 and make recommendation
  3.2 Design a Music Video Recommendation System
    3.2.1 Login process for keeping personal information
    3.2.2 Web-based music video recommendation list and playing interface
    3.2.3 Collocation Databases
    3.2.4 Applied Text database
    3.2.5 Lyrics’ vocabulary difficulty evaluation
Chapter 4. Mechanism Performance Evaluation
  4.1 The evaluation of the text-reading recommendation system
    4.1.1 The system’s ability to identify stage i + 1 level articles appropriate for learners
    4.1.2 The article source differentiation accuracy of the system
    4.1.3 The rationality of system-recommended online English news articles for certain learner populations
    4.1.4 Analyze essays in the iWill learners’ corpus using document difficulty formula
    4.1.5 System extension application in identifying articles written with superior abilities
  4.2 The implement of the music video recommendation system
    4.2.1 User interface
    4.2.2 Quiz in lyrics
    4.2.3 Lyrics’ clustering based on vocabulary difficulty estimation
    4.2.4 Correlation relationship estimation between document difficulty and learners’ feedback
Chapter 5. Conclusion and Future Work
Bibliography

List of Figures
Figure 1 Screenshot of Search Interface for Finding Appropriate Readings.
Figure 2 Search results and analysis of readability (E. Miltsakaki et al., 2009)
Figure 3 Proportions of examinees from different age groups for GEPT in 2010.
Figure 4 Data mining tools: Orange interface
Figure 5 Login process – connected to the Facebook Graph API
Figure 6 User authentication – using the FaceGraph API
Figure 7 Gunning Fog Index algorithm.
Figure 8 Data collect flowchart in proposed collocation database.
Figure 9 A data fragment in collocation database.
Figure 10 Average MI of the collocation database
Figure 11 The recommendation making procedure in the reading target recommendation system.
Figure 12 Similarity between words from GEPT level 6 and SHSETs
Figure 13 A classifier accuracy estimation flowchart
Figure 14 The system ask learner to answer a quiz.
Figure 15 The system leaves feedback message to learner.
Figure 16 The system architecture: music video recommender.
Figure 17 Document classification accuracy compared
Figure 18 Difficulty score distribution from three text databases
Figure 19 Document clustering result (fetched from iWill learners’ corpus)
Figure 20 Average cosine similarity comparison between the 100 well-written articles and the nonselected articles (word frequency model)
Figure 21 Average cosine similarity comparison between the best 100 well-written articles and the nonselected articles (Boolean model)
Figure 22 An example for candidate lists of word-pair options.
Figure 23 Clustering result by analyze lyrics’ vocabulary difficulty.
Figure 24 Correlation relationship Between document difficulty and learners’ feedback
Figure 25 Correlation relationship between right-ratio of quiz and feedback from learner (N=28, r=-0.69)

List of Tables
Table 1 Categories of Orange widgets and functions
Table 2 Index score & description of Flesch Reading Ease Score method.
Table 3 GEPT level vocabulary covered ratio (before lemmatizing)
Table 4 GEPT level vocabulary covered ratio (after lemmatizing)
Table 5 Two example documents
Table 6 Similarity scores between documents and vocabulary sets
Table 7 Document features (obtained by applying the Boolean IR model)
Table 8 Raw data
Table 9 Evaluation results for the accuracy of classifiers.
Table 10 The confusion matrix
Table 11 Measurement methods in educational criteria for RS in TEL
Table 12 An evaluation framework for Recommendation System
References
Apostol, T., Calculus, Vol. 2: "Multi-Variable Calculus and Linear Algebra with Applications", John Wiley and Sons, ISBN 978-0471000075. (1969)
BBC, available from http://www.bbc.co.uk/, retrieved (2011).
Bird, S., Klein, E., Loper, E. and Baldridge, J., Multidisciplinary instruction with the Natural Language Toolkit. Proceedings of the Third Workshop on Issues in Teaching Computational Linguistics, Association for Computational Linguistics, Stroudsburg, PA, USA, pp. 62-70. (2008)
Brants, T., "TnT: a statistical part-of-speech tagger", Proceedings of the sixth conference on Applied natural language processing, Association for Computational Linguistics. (2000)
Brett, P., "Using multimedia: a descriptive investigation of incidental language learning", Computer Assisted Language Learning, Vol. 11, No. 2, pp. 179-200. (1998)
Chall, J. S., Readability: An appraisal of research and publication, Bureau of Educational Research Monographs, Columbus: Ohio State University Press, Epping, England: Bowker. (1958)
Chall, J. S. and Dale, E., "Readability revisited: The New Dale-Chall Readability Formula", Cambridge, MA: Brookline Books. (1995)
Chi, C. C. and Kuo, C. H., "The Design of English Article Recommender Mechanism for Senior High School Students", Proc. of 2012 International Conference on Advanced Learning Technologies, Rome, Italy, July 4-6, pp. 541-545. (2012)
Chen, C. M., Hsu, S. H., Li, Y. L. and Peng, C. J., "Personalized Intelligent M-learning System for Supporting Effective English Learning", Systems, Man and Cybernetics, 2006. SMC '06. IEEE International Conference on, Vol. 6, pp. 4898-4903, 8-11. (2006)
Choi, H. J. and Johnson, S. D., "The effect of problem-based video instruction on learner satisfaction, comprehension and retention in college course", British Journal of Educational Technologies, Vol. 38, No. 5, pp. 885-895. (2007)
Church, K. W. and Hanks, P., "Word association norms, mutual information, and lexicography", Computational Linguistics, Vol. 16, No. 1, pp. 22-29. (1990)
CNN, available from http://edition.cnn.com/, retrieved (2011).
Coleman, M. and Liau, T. L., "A computer readability formula designed for machine scoring", Journal of Applied Psychology, Vol. 60, pp. 283-284. (1975)
Cooley, R., Mobasher, B. and Srivastava, J., "Web mining: information and pattern discovery on the World Wide Web", Tools with Artificial Intelligence, 1997. Proceedings., Ninth IEEE International Conference on, pp. 558-567, 3-8. (1997)
Csikszentmihalyi, M., "Beyond boredom and anxiety", Jossey-Bass Publishers, pp. 10-, 2000, Original work published. (1975)
Csikszentmihalyi, M., "Finding flow", New York: Basic. (1997)
Danielson, W. A. and Bryan, S. D., "Computer automation of two readability formulas", Journalism Quarterly, pp. 201-206. (1963)
Demšar, J., Zupan, B., Leban, G. and Curk, T., Orange: From experimental machine learning to interactive data mining. Springer Berlin Heidelberg. (2004)
Drachsler, H., Hummel, H. G. K. and Koper, R., Identifying the Goal, User model and Conditions of Recommender Systems for Formal and Informal Learning, Journal of Digital Information, Vol. 10, No. 2, pp. 4-24. (2009)
Facebook Graph API, Information on https://developers.facebook.com/docs/graph-api?locale=zh_TW. (2012)
GEPT, General English Proficiency Test, available from http://www.gept.org.tw, retrieved (2011).
Gunning, R., "The technique of clear writing", New York, NY: McGraw-Hill International Book Co. (1952)
Heilman, M., Zhao, L., Pino, J. and Eskenazi, M., "Retrieval of Reading Materials for Vocabulary and Reading Practice", Proc. of the Third ACL Workshop on Innovative Use of NLP for Building Educational Applications, Ohio, U. S. A., pp. 80-88. (2008)
Hsu, C. C., Chen, H. C., Huang, K. K. and Huang, Y. M., "A personalized auxiliary material recommendation system based on learning style on Facebook applying an artificial bee colony algorithm", Computers & Mathematics with Applications, Vol. 64, No. 5, pp. 1506-1513. (2012)
Hsu, C. K., Hwang, G. J. and Chang, C. K., "Development of a reading material recommendation system based on a knowledge engineering approach", Computers & Education, Vol. 55, No. 1, pp. 76-83. (2010)
Huang, J., "Voices from Chinese student: Professors' use of English affects academic listening", College Student Journal, pp. 212-224. (2004)
Hung, T. F., Chiou, Y. S., Kuo, C. H. and Tsao, N. L., "A personalized movies system for English learning", Proc. of International Computer Symposiums (ICS), Taiwan, R. O. C., Nov 13-15. (2008)
IWILL, Intelligent Web-based Interactive Language Learning, Information on http://cube.iwillnow.org/iwill/ (2012)
Ito, K., Encyclopedic Dictionary of Mathematics 2nd ed, MIT Press, ISBN 978-0-262-59020-4, pp. 82, 113, 144-145. (1993)
Kincaid, J. P., Braby, R. and Mears, J., "Electronic authoring and delivery of technical information", Journal of Instructional Development, Vol. 11, No. 2, pp. 8-13. (1988)
Kincaid, J. P., Fishburne Jr, R. P., Rogers, R. L. and Chissom, B. S., Derivation of new readability formulas (Automated Readability Index, Fog Count and Flesch Reading Ease Formula) for Navy enlisted personnel, Research Branch Report, pp. 8-75, Millington, TN: Naval Technical Training, U. S. Naval Air Station, Memphis, TN. (1975)
Klare, G. R., The measurement of readability, Ames: Iowa State University Press. (1963)
Knutsson, O., Pargman, T. C., Eklundh, K. S. and Westlund, S., Designing and developing a language environment for second language writers. Computer Education, Vol. 49, No. 4, pp. 1122-1146. (2007)
Kohavi, R., A study of cross-validation and bootstrap for accuracy estimation and model selection. Proceedings of the Fourteenth International Joint Conference on Artificial Intelligence 2 (12), pp. 1137-1143, Morgan Kaufmann, San Mateo, CA. (1995)
Konchady, M., Text mining application programming. Boston, Mass.: Charles River Media. (2006)
Krashen, S. D., "Principles and practice in second language acquisition", New York: Pergamon Press. (1995)
Kuo, C. H., Wible, D., Tsao, N. L. and Chang, C. F., "A video retrieval system for Computer Assisted Language Learning", Proc. of the 12th International Conference on Artificial Intelligence in Education, July 18-22, Amsterdam, Netherlands, pp. 378-385. (2005)
Leacock, C., et al., "Automated Grammatical Error Detection for Language Learners", Synthesis Lectures on Human Language Technologies 7.1, pp. 1-170. (2014)
Liu, B., Information Retrieval and Web Search, Web Data Mining: Exploring Hyperlinks, Contents, and Usage Data, Springer. (2007)
LTTC, The Language Training & Testing Center, available from http://www.lttc.ntu.edu.tw, retrieved (2012).
Martin, L. and Gottron, T., Readability and the Web. Future Internet, Vol. 4, No. 1, pp. 238-252. (2012)
McLaughlin, G. H., SMOG Grading - a New Readability Formula, Journal of Reading, Vol. 12, No. 8, pp. 639-646, information on http://www.articlearchives.com/education-training/literacy-illiteracy/880189-1.html (PDF). (1969)
Miltsakaki, E., "Matching Readers' Preferences and Reading Skills with Appropriate Web Texts", Proceedings of the European Chapter of the Association for Computational Linguistics 2009 Demonstrations Session, Athens, Greece, 3 April, pp. 49-52. (2009)
Najjar, L. J., "Multimedia information and learning", Journal of Educational Multimedia and Hypermedia, Vol. 5, No. 2, pp. 129-150. (1996)
Nakamura, J. and Csikszentmihalyi, M., The concept of flow, Handbook of positive psychology, pp. 89-105. (2002)
Ng'ambi, D. and Lombe, A., Using Podcasting to Facilitate Student Learning: A Constructivist Perspective. Educational Technology & Society, Vol. 15, No. 4, pp. 181-192. (2012)
Ono, Y. and Ishihara, M., "Examination of the podcasting system in Second Language Acquisition", Proc. of the 9th International Conference on Computer and Information Science, Yamagata, Japan, pp. 540-545. (2010)
Orange, Analyze process through visual programming, available from http://orange.biolab.si/features.html, retrieved (2013).
Perkins, J., Python text processing with NLTK 2.0 cookbook. Packt Publishing Ltd. (2010)
Petersen, S. E. and Ostendorf, M., A machine learning approach to reading level assessment, Computer Speech and Language, Vol. 23, pp. 89-106. (2009)
Pitler, E. and Nenkova, A. A., Revisiting Readability: A Unified Framework for Predicting Text Quality, Proceedings of the 2008 Conference on Empirical Methods in Natural Language Processing, pp. 186-195, Honolulu. (2008)
Kosala, R. and Blockeel, H., Web mining research: a survey. SIGKDD Explor. Newsl., Vol. 2, No. 1, pp. 1-15. (2000)
Recordon, D. and Reed, D., OpenID 2.0: a platform for user-centric identity management. In Proceedings of the second ACM workshop on Digital identity management (DIM '06). ACM, New York, NY, USA, pp. 11-16. (2006)
Senter, R. J. and Smith, E. A., "Automated Readability Index", Wright-Patterson Air Force Base, p. iii, AMRL-TR-6620. (1967)
Shea, P., "Leveling the playing field: A study of captioned interactive video for second language learning", Journal of Educational Computing Research, Vol. 22, No. 3, pp. 243-263. (2000)
Sanmin, available from http://www.sanmin.com.tw/page-history.asp, retrieved (2011).
Tetreault, J., Chodorow, M. and Madnani, N., Bucking the trend: improved evaluation and annotation practices for ESL error detection systems. Language Resources and Evaluation, pp. 1-27. (2013)
The China Post, available from http://chinapost.com.tw/, retrieved (2011).
Thompson, K. C. and Callan, J., Predicting Reading Difficulty With Statistical Language Models, Journal Of The American Society For Information Science And Technology, Vol. 56, pp. 1448-1462. (2005)
Tsao, N. L., Kuo, C. H., Liu, Anne L. E., Wible, D. and Lu, Y. T., "Error-driven incidental language learning: learning Collocation from movies", Proceedings of the 17th International Conference on Computers in Education [CDROM], pp. 136-162. (2009)
Vygotsky, L. S., "Mind in Society: The development of Higher Psychological Processes", Harvard University Press. (1978)
Williams, C. B., A note on the statistical analysis of sentence length as a criterion of literary style, Biometrika Trust, Vol. 31, No. 3, pp. 356-361. (1940)
Youtube API, Information on https://developers.google.com/youtube/getting_started?hl=zh-TW. (2012)
Zhang, L., Liu, Z. and Ni, J., Feature-Based Assessment of Text Readability, Seventh International Conference on Internet Computing for Engineering and Science, pp. 51-54. (2013)
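Full-text access rights	On campus: print copy available immediately; electronic full text authorized for on-campus access; electronic copy available on campus immediately. Off campus: authorized for release to database vendors; electronic copy available off campus immediately.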
