§ Browse thesis bibliographic record
  
System ID U0002-2601201513405900
DOI 10.6846/TKU.2015.00872
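Thesis title (Chinese) 英語學習標的推薦機制設計
Thesis title (English) A design of English learning target recommender mechanism
Thesis title (third language)
University Tamkang University
Department (Chinese) 資訊工程學系博士班
Department (English) Department of Computer Science and Information Engineering
Foreign degree: university
Foreign degree: college
Foreign degree: graduate institute
Academic year 103 (ROC calendar; 2014-2015)
Semester 1
Publication year 104 (ROC calendar; 2015)
Graduate student (Chinese) 季振忠
Graduate student (English) Chen-Chung Chi
Student ID 895410115
Degree Doctoral
Language English
Second language
Oral defense date 2015-01-26
Number of pages 83
Committee Advisor - 郭經華
Member - 陳孟彰
Member - 郭經華
Member - 石貴平
Member - 張志勇
Member - 楊接期
Keywords (Chinese) 餘弦相似度
文章推薦系統
文件可讀性
Keywords (English) Cosine similarity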
Article recommender system
Document readability
Keywords (third language)
Subject classification
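Abstract (Chinese, translated)
For many learners of English as a second language, reading English articles or listening to English songs is a workable way to master English reading and listening. In practice, however, it usually takes many attempts to find learning targets, such as songs or English articles, that match both the learner's interests and the learner's English vocabulary ability.
Today's "digital native" learners, compared with "digital immigrant" learners, are more inclined to use 3C (computer, communication, and consumer electronics) products and social networking services. To meet their learning needs, the learning target recommender system designed in this study integrates powerful social networking services such as YouTube and Facebook. It also draws on learner data from IWill, an English learning platform used mainly by senior high school students, and clusters this learner corpus to analyze the vocabulary ability of this learner group so that recommendations can be more precise.
This dissertation has two main parts. The first designs a mechanism that, based on the vocabulary ability of senior high school students, identifies the vocabulary difficulty of English learning targets and uses it to select and recommend suitable targets for learners. Experiments in this study confirm that the vocabulary difficulty filter in this mechanism clearly distinguishes the vocabulary difficulty of learner-written essays, Taiwanese senior high school English textbook texts, and English news articles. The second part applies the vocabulary difficulty identification mechanism to recommend songs of suitable vocabulary difficulty, delivered through mobile learning or web pages. While listening to a song, learners read the lyrics and choose the correct collocation, with a focus on verb-noun collocations; mutual information is computed to distinguish correct from incorrect collocations in the corpus. The system thus offers practice in listening and collocation use anytime and anywhere, on mobile devices or computers.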
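As a rough illustration of the vocabulary difficulty idea summarized above, the sketch below scores a text against graded vocabulary sets using cosine similarity over Boolean word vectors (cosine similarity and a Boolean IR model both appear in the keywords and contents listed in this record). The vocabulary sets, level names, and function names are hypothetical placeholders, not the thesis's data or implementation.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two sparse term -> weight mappings."""
    dot = sum(w * b.get(term, 0.0) for term, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def boolean_vector(words):
    """Boolean model: weight 1.0 for every distinct word, regardless of count."""
    return {w: 1.0 for w in set(words)}


# Hypothetical toy vocabulary sets standing in for graded word lists.
LEVEL_SETS = {
    "elementary": {"i", "like", "music", "song", "the"},
    "advanced": {"melancholy", "resonate", "the"},
}


def level_features(text):
    """Return one similarity score per vocabulary level for the given text."""
    doc = boolean_vector(text.lower().split())
    return {
        level: cosine_similarity(doc, boolean_vector(vocab))
        for level, vocab in LEVEL_SETS.items()
    }


if __name__ == "__main__":
    print(level_features("I like the song"))
```

The per-level scores could then serve as features for classifying or clustering documents by difficulty; a real system would also need tokenization and lemmatization rather than the whitespace split assumed here.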
Abstract (English)
For many EFL (English as a Foreign Language) learners, reading English articles or listening to music has always been a good way to improve their English proficiency. However, it is not always easy to find appropriate learning targets that match learners’ interests and vocabulary ability.
Nowadays most learners are “digital natives.” Compared with “digital immigrants,” “digital natives” are accustomed to using 3C (computer, communication, and consumer electronics) products. This study therefore develops a learning target recommender system that combines two powerful social network services, YouTube and Facebook, and imports a learner corpus from IWill (an English learning platform). It clusters the vocabulary characteristics of this corpus to identify learners’ vocabulary usage trends and thereby recommend appropriate learning targets.
This study has two parts. The first describes a vocabulary difficulty filtering mechanism and uses it to select and recommend appropriate learning targets for English learners. The second applies this mechanism to recommend music videos via mobile device or web browser. The design of the proposed recommender system focuses on verb-noun collocations: learners watch the lyrics displayed dynamically, sentence by sentence, while listening to the music. The system extracts collocations worth learning by calculating mutual information over a corpus, and provides a practice tool for improving both reading and listening skills.
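To make the mutual information step more concrete, here is a minimal sketch of pointwise mutual information (PMI) in the sense of Church and Hanks (1990), computed over verb-noun pairs. The toy pair list and the function name are assumptions for illustration only; the thesis's corpus, pair extraction, and any decision threshold are not given in this record.

```python
import math
from collections import Counter


def pmi_scores(pairs):
    """Compute PMI = log2(P(v, n) / (P(v) * P(n))) for each (verb, noun) pair.

    Probabilities are simple relative frequencies over the observed pairs.
    """
    pair_counts = Counter(pairs)
    verb_counts = Counter(verb for verb, _ in pairs)
    noun_counts = Counter(noun for _, noun in pairs)
    total = len(pairs)

    scores = {}
    for (verb, noun), count in pair_counts.items():
        p_pair = count / total
        p_verb = verb_counts[verb] / total
        p_noun = noun_counts[noun] / total
        scores[(verb, noun)] = math.log2(p_pair / (p_verb * p_noun))
    return scores


# Hypothetical toy corpus of verb-noun pairs (not data from the thesis).
TOY_PAIRS = (
    [("make", "decision")] * 3
    + [("do", "homework")] * 3
    + [("make", "homework")]
    + [("do", "decision")]
)

if __name__ == "__main__":
    for pair, score in sorted(pmi_scores(TOY_PAIRS).items()):
        print(pair, round(score, 3))
```

Pairs that co-occur more often than chance get a positive score and rarer pairings get a negative one, which is one plausible way to separate likely collocations from likely mis-collocations when building quiz options.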
Abstract (third language)
Thesis table of contents
Table of Contents
List of Figures
List of Tables
Chapter 1. Introduction
1.1 Related work
1.2 The purpose and contribution of this study
1.3 Introduce of the remain section
Chapter 2. Text databases and research tools
2.1 Text databases
2.1.1 Vocabulary Sets
2.1.2 Text Databases
2.2 Research Tools
2.2.1 Natural Language Processing Toolkit: NLTK
2.2.2 Data Mining Toolkit: Orange
2.2.3 Open ID Toolkit: Facebook Graph API
2.3 Document readability estimation formula
2.3.1 Automated Readability Index (ARI)
2.3.2 Coleman-Liau Index (CLI)
2.3.3 Flesch-Kincaid readability test / Flesch Reading Ease
2.3.4 Gunning Fog Index
2.3.5 SMOG and SMOG Index
2.4 Natural language processing
2.4.1 Document preprocessing
2.4.2 POS Tagging
2.4.3 Lemmatizing
2.5 Mutual Information and Collocation database
2.5.1 Mutual Information
2.5.2 Collocation and Mis-Collocation
Chapter 3. Mechanism overview
3.1 Design a text-reading recommendation system
3.1.1 Natural Language Process Preprocessing
3.1.2 Feature Extraction Mechanism by IR Models
3.1.3 Find document patterns in Stage i+1 and make recommendation
3.2 Design a Music Video Recommendation System
3.2.1 Login process for keeping personal information
3.2.2 Web-based music video recommendation list and playing interface
3.2.3 Collocation Databases
3.2.4 Applied Text database
3.2.5 Lyrics’ vocabulary difficulty evaluation
Chapter 4. Mechanism Performance Evaluation
4.1 The evaluation of the text-reading recommendation system
4.1.1 The system’s ability to identify stage i + 1 level articles appropriate for learners
4.1.2 The article source differentiation accuracy of the system
4.1.3 The rationality of system-recommended online English news articles for certain learner populations
4.1.4 Analyze essays in the iWill learners’ corpus using document difficulty formula
4.1.5 System extension application in identifying articles written with superior abilities
4.2 The implement of the music video recommendation system
4.2.1 User interface
4.2.2 Quiz in lyrics
4.2.3 Lyrics’ clustering based on vocabulary difficulty estimation
4.2.4 Correlation relationship estimation between document difficulty and learners’ feedback
Chapter 5. Conclusion and Future Work
Bibliography

List of Figures
Figure 1 Screenshot of Search Interface for Finding Appropriate Readings.
Figure 2 Search results and analysis of readability (E. Miltsakaki et al., 2009)
Figure 3 Proportions of examinees from different age groups for GEPT in 2010.
Figure 4 Data mining tools: Orange interface
Figure 5 Login process – connected to the Facebook Graph API
Figure 6 User authentication – using the FaceGraph API
Figure 7 Gunning Fog Index algorithm.
Figure 8 Data collect flowchart in proposed collocation database.
Figure 9 A data fragment in collocation database.
Figure 10 Average MI of the collocation database
Figure 11 The recommendation making procedure in the reading target recommendation system.
Figure 12 Similarity between words from GEPT level 6 and SHSETs
Figure 13 A classifier accuracy estimation flowchart
Figure 14 The system ask learner to answer a quiz.
Figure 15 The system leaves feedback message to learner.
Figure 16 The system architecture: music video recommender.
Figure 17 Document classification accuracy compared
Figure 18 Difficulty score distribution from three text databases
Figure 19 Document clustering result (fetched from iWill learners’ corpus)
Figure 20 Average cosine similarity comparison between the 100 well-written articles and the nonselected articles (word frequency model)
Figure 21 Average cosine similarity comparison between the best 100 well-written articles and the nonselected articles (Boolean model)
Figure 22 An example for candidate lists of word-pair options.
Figure 23 Clustering result by analyze lyrics’ vocabulary difficulty.
Figure 24 Correlation relationship Between document difficulty and learners’ feedback
Figure 25 Correlation relationship between right-ratio of quiz and feedback from learner (N=28, r=-0.69)

List of Tables
Table 1 Categories of Orange widgets and functions
Table 2 Index score & description of Flesch Reading Ease Score method.
Table 3 GEPT level vocabulary covered ratio (before lemmatizing)
Table 4 GEPT level vocabulary covered ratio (after lemmatizing)
Table 5 Two example documents
Table 6 Similarity scores between documents and vocabulary sets
Table 7 Document features (obtained by applying the Boolean IR model)
Table 8 Raw data
Table 9 Evaluation results for the accuracy of classifiers.
Table 10 The confusion matrix
Table 11 Measurement methods in educational criteria for RS in TEL
Table 12 An evaluation framework for Recommendation System
References
Apostol, T., Calculus, Vol. 2: "Multi-Variable Calculus and Linear Algebra with Applications", John Wiley and Sons, ISBN 978-0471000075. (1969)
BBC, available from http://www.bbc.co.uk/, retrieved (2011).
Bird, S., Klein, E., Loper, E. and Baldridge, J., Multidisciplinary instruction with the Natural Language Toolkit. Proceedings of the Third Workshop on Issues in Teaching Computational Linguistics, Association for Computational Linguistics, Stroudsburg, PA, USA, pp. 62-70. (2008)
Brants, Thorsten., "TnT: a statistical part-of-speech tagger." Proceedings of the sixth conference on Applied natural language processing. Association for Computational Linguistics. (2000)
Brett, P., "Using multimedia: a descriptive investigation of incidental language learning", Computer Assisted Language Learning, Vol. 11, No. 2, pp. 179-200. (1998)
Chall, J. S., Readability: An appraisal of research and publication, Bureau of Educational Research Monographs, Columbus: Ohio State University Press, Epping, England: Bowker. (1958)
Chall, J. S. and Dale, E., “Readability revisited: The New Dale-Chall Readability Formula,” Cambridge, MA: Brookline Books. (1995)
Chi, C. C. and Kuo, C. H., “The Design of English Article Recommender Mechanism for Senior High School Students”, Proc. of 2012 International Conference on Advanced Learning Technologies, Rome, Italy, July 4-6, pp. 541-545. (2012)
Chen, C. M., Hsu, S. H., Li, Y. L. and Peng, C. J., "Personalized Intelligent M-learning System for Supporting Effective English Learning," Systems, Man and Cybernetics, 2006. SMC '06. IEEE International Conference on, vol. 6, pp. 4898-4903, 8-11. (2006)
Choi, H. J. and Johnson, S. D., “The effect of problem-based video instruction on learner satisfaction, comprehension and retention in college course”, British Journal of Educational Technology, Vol. 38, No. 5, pp. 885-895. (2007)
Church, K. W. and Hanks, P., “Word association norms, mutual information, and lexicography”, Computational Linguistics, Vol. 16, No.1, pp. 22-29. (1990)
CNN, available from http://edition.cnn.com/, retrieved (2011).
Coleman, M., Liau, T. L., "A computer readability formula designed for machine scoring", Journal of Applied Psychology, Vol. 60, pp. 283–284. (1975)
Cooley, R., Mobasher, B. and Srivastava, J., "Web mining: information and pattern discovery on the World Wide Web," Tools with Artificial Intelligence, 1997. Proceedings., Ninth IEEE International Conference on, pp. 558-567, 3-8. (1997)
Csikszentmihalyi, M., “Beyond boredom and anxiety,” Jossey-Bass Publishers, pp.10-,2000, Original work published. (1975)
Csikszentmihalyi, M., “Finding flow,” New York: Basic. (1997)
Danielson, W. A. and Bryan, S. D., “Computer automation of two readability formulas,” Journalism Quarterly, pp. 201-206. (1963)
Demšar, J., Zupan, B., Leban, G. and Curk, T., Orange: From experimental machine learning to interactive data mining. Springer Berlin Heidelberg. (2004)
Drachsler, H., Hummel, H. G. K. and Koper, R., Identifying the Goal, User model and Conditions of Recommender Systems for Formal and Informal Learning, Journal of Digital Information, Vol. 10, No. 2, pp. 4-24. (2009)
Facebook Graph API, Information on https://developers.facebook.com/docs/graph-api?locale=zh_TW. (2012)
GEPT, General English Proficiency Test, available from http://www.gept.org.tw, retrieved (2011).
Gunning, R., "The technique of clear writing", New York, NY: McGraw-Hill International Book Co. (1952)
Heilman, M., Zhao, L., Pino, J.,  and Eskenazi, M., “Retrieval of Reading Materials for Vocabulary and Reading Practice”, Proc. of the Third ACL Workshop on Innovative Use of NLP for Building Educational Applications, Ohio, U. S. A., pp. 80–88. (2008)
Hsu, C. C., Chen, H. C., Huang, K. K., Huang, Y. M., “A personalized auxiliary material recommendation system based on learning style on Facebook applying an artificial bee colony algorithm”, Computers & Mathematics with Applications, Vol. 64, No. 5, pp. 1506-1513. (2012)
Hsu, C. K., Hwang, G. J., Chang, C. K., “Development of a reading material recommendation system based on a knowledge engineering approach”, Computers & Education, Vol. 55, No. 1, pp. 76-83. (2010)
Huang, J., "Voices from Chinese student: Professors' use of English affects academic listening", College Student Journal, pp. 212-224. (2004)
Hung, T. F., Chiou, Y. S., Kuo, C. H., Tsao, N. L., "A personalized movies system for English learning", Proc. of International Computer Symposiums (ICS), Taiwan, R. O. C., Nov 13-15. (2008)
IWILL, Intelligent Web-based Interactive Language Learning, Information on http://cube.iwillnow.org/iwill/ (2012)
Ito, K., Encyclopedic Dictionary of Mathematics 2nd ed, MIT Press, ISBN 978-0-262-59020-4, pp. 82, 113, 144-145. (1993)
Kincaid, J. P., Braby, R., Mears, J., "Electronic authoring and delivery of technical information", Journal of Instructional Development, Vol. 11, No. 2, pp.8–13. (1988)
Kincaid, J. P., Fishburne Jr, R. P., Rogers, R. L. and Chissom, B. S., Derivation of new readability formulas (Automated Readability Index, Fog Count and Flesch Reading Ease Formula) for Navy enlisted personnel, Research Branch Report, pp.8-75, Millington, TN: Naval Technical Training, U. S. Naval Air Station, Memphis, TN. (1975)
Klare, G. R., The measurement of readability, Ames: Iowa State University Press. (1963)
Knutsson, O., Pargman, T. C., Eklundh, K. S. and Westlund, S., Designing and developing a language environment for second language writers, Computers & Education, Vol. 49, No. 4, pp. 1122-1146. (2007)
Kohavi, Ron., A study of cross-validation and bootstrap for accuracy estimation and model selection. Proceedings of the Fourteenth International Joint Conference on Artificial Intelligence 2 (12): pp.1137–1143. (Morgan Kaufmann, San Mateo, CA) (1995).
Konchady, M., Text mining application programming. Boston, Mass.: Charles River Media. (2006)
Krashen, S. D., “Principles and practice in second language acquisition”, New York: Pergamon Press. (1995)
Kuo, C. H., Wible, D., Tsao, N. L., Chang, C. F., "A video retrieval system for Computer Assisted Language Learning", Proc. of the 12th International Conference on Artificial Intelligence in Education, July 18-22, Amsterdam, Netherlands, pp. 378-385. (2005)
Leacock, Claudia, et al. "Automated Grammatical Error Detection for Language Learners." Synthesis Lectures on Human Language Technologies 7.1, pp.1-170. (2014)
Liu, B., Information Retrieval and Web Search, Web Data Mining: Exploring Hyperlinks, Contents, and Usage Data, Springer. (2007)
LTTC, The Language Training & Testing Center, available from http://www.lttc.ntu.edu.tw, retrieved (2012).
Martin, L. and Gottron, T., Readability and the Web. Future Internet, Vol. 4, No.1, pp. 238-252. (2012)
McLaughlin, G. H., SMOG Grading - a New Readability Formula, Journal of Reading, Vol. 12, No. 8, pp. 639–646, information on http://www.articlearchives.com/education-training/literacy-illiteracy/880189-1.html (PDF). (1969)
Miltsakaki, E., “Matching Readers’ Preferences and Reading Skills with Appropriate Web Texts”, Proceedings of the European Chapter of the Association for Computational Linguistics 2009 Demonstrations Session, Athens, Greece, 3 April, pp. 49–52. (2009)
Najjar, L. J., "Multimedia information and learning", Journal of Educational Multimedia and Hypermedia Vol. 5 No. 2, pp. 129-150. (1996)
Nakamura J. and Csikszentmihalyi, M., The concept of flow, Handbook of positive psychology, pp.89-105. (2002)
Ng'ambi, D., & Lombe, A., Using Podcasting to Facilitate Student Learning: A Constructivist Perspective. Educational Technology & Society, 15 (4), pp. 181–192. (2012)
Ono, Y. and Ishihara, M., "Examination of the podcasting system in Second Language Acquisition", Proc. of the 9th International Conference on Computer and Information Science, Yamagata, Japan, pp. 540-545. (2010)
Orange, Analyze process through visual programming, available from http://orange.biolab.si/features.html, retrieved (2013).
Perkins, Jacob., Python text processing with NLTK 2.0 cookbook. Packt Publishing Ltd. (2010)
Petersen, S. E. and Ostendorf, M., A machine learning approach to reading level assessment, Computer Speech and Language, Vol. 23, pp. 89-106. (2009)
Pitler, E. and Nenkova, A. A., Revisiting Readability: A Unified Framework for Predicting Text Quality, Proceedings of the 2008 Conference on Empirical Methods in Natural Language Processing, pp. 186~195, Honolulu. (2008)
Kosala, R. and Blockeel, H., Web mining research: a survey, SIGKDD Explorations Newsletter, Vol. 2, No. 1, pp. 1-15. (2000)
Recordon, D. and Reed, D., OpenID 2.0: a platform for user-centric identity management. In Proceedings of the second ACM workshop on Digital identity management (DIM '06). ACM, New York, NY, USA, pp. 11-16. (2006)
Senter, R.J., Smith, E.A., "Automated Readability Index", Wright-Patterson Air Force Base. p. iii. AMRL-TR-6620. (1967)
Shea, P., “Leveling the playing field: A study of captioned interactive video for second language learning”, Journal of Educational Computing Research, Vol. 22, No.3, pp. 243-263. (2000)
Sanmin, available from http://www.sanmin.com.tw/page-history.asp., retrieved (2011).
Tetreault, J., Chodorow, M. and Madnani, N. Bucking the trend: improved evaluation and annotation practices for ESL error detection systems. Language Resources and Evaluation, pp. 1-27. (2013)
The China Post, available from http://chinapost.com.tw/, retrieved (2011).
Thompson, K. C. and Callan, J., Predicting Reading Difficulty With Statistical Language Models, Journal Of The American Society For Information Science And Technology, Vol. 56, pp. 1448-1462. (2005)
Tsao, N. L., Kuo, C. H., Liu, Anne L. E., Wible, D. Lu, Y. T., "Error-driven incidental language learning: learning Collocation from movies", Proceedings of the 17th International Conference on Computers in Education [CDROM], pp. 136-162. (2009)
Vygotsky, L. S., “Mind in Society: The development of Higher Psychological Processes”, Harvard University Press. (1978)
Williams, C. B., A note on the statistical analysis of sentence length as a criterion of literary style, Biometrika, Vol. 31, No. 3, pp. 356-361. (1940)
Youtube API, Information on https://developers.google.com/youtube/getting_started?hl=zh-TW. (2012)
Zhang, L., Liu Z., and Ni, J., Feature-Based Assessment of Text Readability, Seventh International Conference on Internet Computing for Engineering and Science, pp. 51-54. (2013)
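Full-text access permissions
On campus
Print copy of the thesis available on campus immediately
Author consents to releasing the electronic full text on campus
Electronic thesis available on campus immediately
Off campus
Authorization granted
Electronic thesis available off campus immediately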
