Fuzzy Data Mining Techniques for the Learning and Study Strategies Inventory
Department of Computer Science and Information Engineering
As higher education has become more widespread in recent years, universities and colleges have conducted more and more research on assessments that can enhance students' learning performance. In practice, however, most schools have only a limited number of counselors. In addition, traditional assessments are mostly pen-and-paper tests, so the use that can be made of their results is limited. Take the Learning and Study Strategies Inventory (LASSI) as an example: participants have to answer 87 assessment items drawn from more than ten scales of study strategy, which is time-consuming and can easily lead to resistance, fatigue, and unwillingness to complete the assessment. As a result, it is difficult to achieve the expected effect.
To improve this situation, this dissertation proposes a fuzzy data mining technique for the LASSI. The technique proceeds in two major steps. The first step uses the classification charts produced by decision tree analysis to extract the valuable or critical questions from the questionnaires, which directly reduces the number of LASSI assessment questions. The second step uses association rule analysis to find related scales of study strategy, which indirectly reduces the number of correlated scales that must be assessed and, with them, the number of LASSI assessment questions. Moreover, by integrating concepts from fuzzy set theory, the rules discovered by the data mining techniques are assembled into a tree structure. Instead of answering every question from first to last, as in the past, students are given questions that depend on their earlier answers, and the results are then evaluated to decide whether further assessment is required.
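The two quantities this approach relies on can be illustrated with a minimal Python sketch. It is not the dissertation's implementation: the function names, the sample records, and the trapezoid breakpoints are all hypothetical. The sketch only shows how the support and confidence of an association rule between two scales are computed, and how a trapezoidal membership function grades a score as partially "low".

```python
def rule_stats(records, antecedent, consequent):
    """Support and confidence of the rule antecedent -> consequent.

    Each record is the set of scales on which one student scored low
    (illustrative data layout, assumed for this sketch).
    """
    n_a = sum(1 for r in records if antecedent in r)
    n_ab = sum(1 for r in records if antecedent in r and consequent in r)
    support = n_ab / len(records)
    confidence = n_ab / n_a if n_a else 0.0
    return support, confidence


def trapezoid(x, a, b, c, d):
    """Trapezoidal membership: 1 on [b, c], 0 outside (a, d), linear between."""
    if b <= x <= c:
        return 1.0
    if x <= a or x >= d:
        return 0.0
    if x < b:
        return (x - a) / (b - a)
    return (d - x) / (d - c)


# Made-up records: scales on which four students scored low.
records = [{"MOT", "ATT"}, {"MOT", "ATT"}, {"MOT"}, {"ANX"}]
support, confidence = rule_stats(records, "MOT", "ATT")

# Hypothetical "low" fuzzy set over percentile ranks: fully low up to 20,
# fading out linearly until 40.
low_degree = trapezoid(25, 0, 0, 20, 40)
```

If a rule such as MOT -> ATT has high confidence, a low score on the first scale already suggests a low score on the second, which is the sense in which related scales allow questions to be skipped.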
A web-based Learning and Study Strategy self-assessment system (Web-LSA) is developed in this dissertation. It is not intended to replace the original LASSI assessment but to minimize the number of assessment questions needed for an approximate result. Fewer questions and a web-based system should make students more willing to complete the self-assessment, and to complete it more quickly. The self-assessment results are provided to counselors to help them find high-risk students with study difficulties. Counselors can then focus their attention on these students, which not only cuts human resource and counseling costs but also helps improve students' learning performance more efficiently. Furthermore, fuzzy data mining techniques can also be applied to social science research, where they can be especially efficient and practical for simplifying assessment questions.
List of Figures
List of Tables
Chapter 1 Introduction
1.1 Research Motivation of This Dissertation
1.2 Research Objectives of This Dissertation
1.3 Organization of This Dissertation
Chapter 2 Background Knowledge
2.1 Learning and Study Strategy Scale Inventory for University Students
2.1.1 Introduction to the LASSI Scales
2.1.2 Scoring of the LASSI
2.2 Data Mining Technology
2.2.1 Decision Tree
2.2.1.1 CART
2.2.1.2 CHAID
2.2.1.3 ID3 and C4.5
2.2.2 Association Rule
2.2.2.1 Hash-based algorithm
2.2.2.2 Partition-based algorithm
2.2.2.3 FP-growth algorithm
2.3 Fuzzy set concepts
2.3.1 Crisp set theory
2.3.2 Fuzzy set theory
Chapter 3 Fuzzy Data Mining Method
3.1 Overview
3.2 Preprocessing of survey data
3.3 Decision tree analysis
3.4 Selecting candidate items for LASSI Scales
3.5 Association analysis
3.6 Prioritizing the LASSI scales
Chapter 4 Implementation of Web-LSA
4.1 Web-based self-assessment for the university students
4.2 Guidance-support system for the counselors
Chapter 5 Performance Evaluations
5.1 Reduction of items
5.1.1 Applying decision tree analysis results
5.1.2 Applying fuzzy method analysis results
5.2 Experimental results analysis
5.2.1 Efficacy of applying decision tree
5.2.2 Efficacy of applying fuzzy method
Chapter 6 Conclusions and Future Directions
6.1 Conclusions
6.2 Future Directions
Figure 2.1 An example of decision tree
Figure 2.2 A comparison of fuzzy set and crisp set
Figure 2.3 The trapezoidal membership function
Figure 3.1 The LASSI data mining processing
Figure 3.2 The results from decision tree analysis for motivation scale
Figure 3.3 The tree graph of motivation scale
Figure 3.4 Parts of association rules produced by association analysis
Figure 3.5 The association rules tree
Figure 4.1 The framework of the Web-LSA system
Figure 4.2 The system flowchart of Web-LSA
Figure 4.3 LASSI self-assessment system start menu
Figure 4.4 Sample of the assessing item
Figure 4.5 Sample of the assessing results
Figure 4.6 Sample of the assessing reference
Figure 4.7 Interfaces of the Web-LSA Database Query
Figure 4.8 Interfaces of the Web-LSA Guidance-support system
Table 2.1 The LASSI Scales
Table 2.2 The percentage rank norms for LASSI
Table 3.1 Original questionnaire answer table
Table 3.2 Original LASSI scales scoring table
Table 3.3 LASSI Scales Self-assessment converted table
Table 3.4 The processed rules from association rule analysis
Table 3.5 The correlation of all study strategies for poor association
Table 3.6 The correlation of eight scales (second prioritizing)
Table 3.7 The correlation of six scales (third prioritizing)
Table 3.8 The correlation of three scales (fourth prioritizing)
Table 5.1 Necessary items of LASSI based on decision tree analysis
Table 5.2 Necessary items of LASSI based on Fuzzy Method
Table 5.3 Necessary items for decision tree prediction
Table 5.4 Accuracy of decision tree prediction
Table 5.5 Necessary items for Fuzzy Method prediction
Table 5.6 Accuracy of Fuzzy Method prediction