系統識別號 | U0002-0308201716365400 |
---|---|
DOI | 10.6846/TKU.2017.00082 |
論文名稱(中文) | 具隸屬函數調整機制的模糊時序關聯規則萃取技術 |
論文名稱(英文) | Fuzzy Temporal Association Rule Acquisition Technique with Membership Function Tuning Mechanism |
第三語言論文名稱 | |
校院名稱 | 淡江大學 |
系所名稱(中文) | 資訊工程學系全英語碩士班 |
系所名稱(英文) | Master's Program, Department of Computer Science and Information Engineering (English-taught program) |
外國學位學校名稱 | |
外國學位學院名稱 | |
外國學位研究所名稱 | |
學年度 | 105 |
學期 | 2 |
出版年 | 106 |
研究生(中文) | 周翔 |
研究生(英文) | Hsiang Chou |
學號 | 604785021 |
學位類別 | 碩士 |
語言別 | 英文 |
第二語言別 | |
口試日期 | 2017-07-13 |
論文頁數 | 126頁 |
口試委員 |
指導教授
-
陳俊豪
委員 - 洪智傑 委員 - 陳朝鈞 |
關鍵字(中) |
分群技術 模糊時序關聯規則 遺傳演算法 商品上架時間 隸屬函數 |
關鍵字(英) |
Clustering fuzzy temporal association rule genetic algorithm item lifespan membership functions |
第三語言關鍵字 | |
學科別分類 | |
中文摘要 |
在實際應用上,相較於二元交易資料,數值型交易資料是更常見的。為處理數值型交易資料,模糊關聯規則探勘演算法因應而生。此外,交易資料中的商品是有存活期間或上架的區間的,故許多演算法亦被提出並用以探勘模糊時序關聯規則。在模糊規則萃取過程中,其關鍵成功因素是如何選擇適合的隸屬函數。雖然文獻中已有不少方法可用於產生適當的隸屬函數,但目前並無文獻闡述如何產生探勘模糊時序關聯規則所需的隸屬函數。故本論文提出兩個具有隸屬函數調整機制的探勘方法用來挖掘模糊時序關聯規則。 在第一個方法,它首先利用分群技術與交易資料集產生每個商品的專屬隸屬函數。隸屬函數的產生主要是根據商品的數值值域,故每個商品的專屬隸屬函數在數值區間與函數數量上不盡相同。具體來說是透過兩因子制定商品的隸屬函數,即區間密度相似與區間資訊相似因子。兩因子分別用於評估區間密度與資料量是否相似。所採用的分群技術使用theta參數值決定兩因子的重要程度。最後,所產生之商品專屬隸屬函數則用於探勘模糊時序關聯規則。此外,本論文亦開發Fuzzy FP-growth方法用提升探勘效率。 因不同的theta參數值對商品的專屬隸屬函數產生過程有很大的影響,為自動找出theta參數值,第二個方法結合遺傳演算法旨在找出可產生最多且多樣的規則的theta值。首先,它利用位元字串表示每一可能的theta值。適合度函數則由規則數量與規則多樣性組成,其中規則多樣性是由商品的平均隸屬函數個數進行評估。如商品有較多的隸屬函數則表示每個隸屬函數的值域是較小的亦表示可挖掘出較特殊的規則。 實驗部分透過模擬與真實交易資料驗證所提方法的有效性。模擬交易資料集的結果顯示第一個方法在規則數量與規則多樣性較現存的方法優異。在真實交易資料集中,因商品購買數量差異相當大,故所提方法可探勘出的規則數量雖較現存方法少,但仍可得到較多樣且有用的規則。在第二個方法中,實驗結果顯示它可找出合適的theta值用以產生數量較多且多樣的規則並可用於揭露交易資料中令人感興趣的資訊。 |
英文摘要 |
In real world applications, transactions are far more common to be presented with quantitative data as opposed to binary data. Fuzzy association rule mining algorithms have thus been proposed to handle quantitative transactions. In addition, items have certain life spans or temporal periods in which they exist in a database. Approaches have also been presented to mine fuzzy temporal association rules (FTARs). A key factor in the acquisition of fuzzy rules is the selection of appropriate membership functions. Although many approaches have been designed to generate membership functions, there is currently no existing approach which deals with the problem of generating membership functions for mining FTARs for market basket analysis. In this thesis, we propose two approaches with membership function tuning mechanism to discover FTARs. In the first approach, it utilizes a clustering method to generate unique membership functions specifically tailored to each item in a data set. Each membership function is based on each individual items’ quantitative range, and the generated membership functions differ not only in terms the values of each interval but also in terms of the number of intervals. Two factors are instrumental in deciding each item's membership functions; density-similarity among intervals, which corresponds to the similarity in density of intervals, and information closeness within an interval, which corresponds to the similarity in the number of data points between intervals. A parameter θ is used to indicate the importance of the two factors. At last, the derived membership functions are employed in a fuzzy temporal rule mining algorithm to generate association rules. Besides, to speed up the mining process, the Fuzzy FP-growth approach is also utilized in two methods. Because different θ values will affect greatly the set of membership functions produced, to automatically obtain a suitable parameter θ, the second approach incorporates a genetic algorithm to decide on the optimum value for θ which can produce the largest number of diverse rules. It first uses bit string to encode a possible theta value. The fitness function is made up of a combination of two factors, the number of rules generated and also the diversity of the rules, where diversity of the rules is evaluated by average number of membership functions for items. If a membership function has a larger number of intervals, each interval is smaller so the rules generated are more specific. Experiments were carried out on one simulated dataset and one real dataset to show the effectiveness of the proposed approaches. For the simulated dataset, the first proposed approach could greatly outperform the previous approach using predefined membership functions in terms of number of rules and diversity of rules. Since the real data set was made up of data with largely differing quantitative values, it can generate a smaller number of rules but the rules related to much more specific fuzzy regions, making them more useful. In relation to the second approach, the genetic algorithm could successfully discover the optimum value for θ in terms of producing the largest number of diverse rules. These rules were used to uncover interesting information from within the datasets. |
第三語言摘要 | |
論文目次 |
Table of Contents CHAPTER 1 INTRODUCTION 1 1.1 Problem Definition and Motivation 1 1.2 Contributions 5 1.3 Reader’s Guide 6 CHAPTER 2 RELATED WORK 7 2.1 Association Rule Mining Approaches 7 2.1.1 Binary Association Rule Mining Approaches 8 2.1.2 Fuzzy Association Rule Mining Approaches 8 2.1.3 Temporal Fuzzy Association Rule Mining Approaches 10 2.2 Membership Function Generation Approaches 11 2.2.1 Membership Function Generation Approaches using Evolutionary Algorithms 11 2.2.2 Membership Function Generation Approaches not using Evolutionary Algorithms 12 CHAPTER 3 MEMBERSHIP FUNCTION GENERATION FOR FUZZY TEMPORAL ASSOCIATION RULE ACQUISITION 13 3.1 Motivation 13 3.2 Notation 15 3.2.1 Membership Function Generation Notation 15 3.2.2 Fuzzy Temporal Association Rule Mining Notation 18 3.3 Components of Proposed Approach 22 3.3.1 Membership Function Generation 22 3.3.1.1 Assigning the Initial Intervals 23 3.3.1.2 Information Closeness (IC) 27 3.3.1.3 Information Density-Similarity (DS) 28 3.3.1.4 Threshold Value 29 3.3.1.5 Density-Similarity Among Intervals 30 3.3.1.6 Information Closeness Within an Interval 30 3.3.1.7 Selecting the Optimum Interval 32 3.3.1.8 Generating Membership Functions from Intervals 32 3.3.2 Fuzzy Temporal Association Rule Mining 34 3.3.2.1 Converting to Fuzzy Set 34 3.3.2.2 Temporal Information Table 35 3.3.2.3 Calculating Temporal Fuzzy Support 35 3.3.2.4 Generating Large Itemsets 36 3.3.2.4 Generating Fuzzy Temporal Association Rules 37 3.4 Algorithm for Approach I 38 3.4.1 Algorithm for Membership Function Generation 38 3.4.2 Algorithm for Fuzzy Temporal Association Rule Mining 45 3.5 Example of Approach I 49 3.5.1 Example of GNAAR Approach 50 3.5.2 Example of GNAAROQ Approach 63 3.5.3 Example of Temporal Fuzzy Rule Mining Approach 76 CHAPTER 4 OPTIMIZING PARAMETER FOR MEMBERSHIP FUNCTION GENERATION THROUGH A GENETIC ALGORITHM 85 4.1 Motivation 85 4.2 Notation 88 4.3 Components of Proposed Approach 89 4.3.1 FP-Growth 89 4.3.2 Encoding Scheme 90 4.3.3 Genetic Operators 91 4.3.4 Fitness Evaluation 92 4.3.5 Fitness Lookup Table 94 4.4 Algorithm for Approach II 95 4.5 Example of Approach II 98 CHAPTER 5 EXPERIMENTAL RESULTS 104 5.1 Experimental Results for Approach I 104 5.1.1 Experimental Datasets 105 5.1.2 Approach used for Comparison 106 5.1.3 Comparison of Approaches with Different Minimum Supports 108 5.1.4 Comparison of Approaches with Different Minimum Confidences 111 5.2 Experimental Results for Approach II 113 5.2.1 Comparison of Efficiency Between Apriori Algorithm and FP-Growth 114 5.2.2 Optimum Theta Values Discovered by GA 115 5.2.3 Example of a Rule with Large Number of Items 116 5.2.4 Example of a Rule with Diverse Fuzzy Regions 117 5.2.3 Example of an Unusual Rule with High Confidence 118 CHAPTER 6 CONCLUSION AND FUTURE WORKS 120 REFERENCES 123 List of Figures Figure 1: Initial intervals generated using GNAAR approach 23 Figure 2: Initial intervals generated using GNAAROH approach 24 Figure 3: Initial intervals generated using GNAAROQ approach 25 Figure 4: Interval C20 contains no data points when using GNAAR approach 25 Figure 5: Interval C20 contains data points when using GNAAROH approach 26 Figure 6: Membership functions assigned based on centroids of intervals 33 Figure 7: Membership function for Item X 34 Figure 8: The initial 4 intervals 51 Figure 9: Intervals C00 and C10 merged to become C11 52 Figure 10: The initial 4 intervals 57 Figure 11: Interval C00 and C10 merged 57 Figure 12: Interval C10 and C20 merged 57 Figure 13: Interval C20 and C30 merged 57 Figure 14: Level 0 58 Figure 15: Level 1 58 Figure 16: Level 2 58 Figure 17: Level 3 58 Figure 18: Centroid of intervals used to form membership functions 62 Figure 19: The resulting membership function 62 Figure 20: GNAAROQ interval extension 65 Figure 21: The initial 4 intervals 71 Figure 22: Interval C00 and C10 merged 71 Figure 23: Interval C20 and C30 merged 71 Figure 24: Interval C10 and C20 merged 71 Figure 25: Level 0 71 Figure 26: Level 1 71 Figure 27: Level 2 71 Figure 28: Level 3 71 Figure 29: The centroids of each fuzzy region 75 Figure 30: The resulting membership function 75 Figure 31: The membership functions used for each item 78 Figure 32: Two possible different membership functions for the same item 86 Figure 33: 3 different values of theta generated from chromosome 91 Figure 34: The effect of different θ values on fitness value for GNAAR algorithm 93 Figure 35: Chromosomes in the initial population 99 Figure 36: Crossover operator 102 Figure 37: Mutation operator 103 Figure 38: Dispersion of quantitative values 106 Figure 39: Membership functions used for Generated Dataset 107 Figure 40: Membership functions used for Online Retail Dataset 107 Figure 41: Minimum support and number of rules for Generated Dataset 108 Figure 42: Minimum support and number of rules for Online Retail Dataset 109 Figure 43: Confidence and number of rules for Generated Dataset 111 Figure 44: Confidence and number of rules for Online Retail Dataset 112 Figure 45: Running time of Apriori algorithm and the FP-Growth algorithm 114 Figure 46 Example of a large 9-itemset rule 116 Figure 47 Example of a rule with diverse fuzzy regions 117 Figure 48: Fuzzy region C70 for both items 117 Figure 49 Example of an unusual rule with high confidence 118 Table 1: Input dataset ................................ ................................ ................................ . 50 Table 2: Quantitative values for item A ................................ ................................ ..... 50 Table 3: Threshold values of each interval ................................ ................................ 57 Table 4: The formula is used to decide on the most suitable level 60 Table 5: Data set of quantitative transactions ................................ ............................ 63 Table 6: Quantitative values for item A ................................ ................................ ..... 63 Table 7: The Threshold Values of each interval ................................ ......................... 70 Table 8: Formula used to decide on the most suitable level ........... 74 Table 9: The six temporal quantitative transactions ................................ ................... 77 Table 10:The fuzzy set for each item in each transaction ................................ .......... 77 Table 11: Temporal Information Table ................................ ................................ ....... 78 Table 12: The count values of the fuzzy regions ................................ ....................... 79 Table 13: Fuzzy temporal supports of fuzzy regions ................................ ................. 80 Table 14: Membership values for A.1∩E.1 ................................ .............................. 81 Table 15: Large 2-itemsets ................................ ................................ ......................... 82 Table 16: TS table ................................ ................................ ................................ ...... 82 Table 17: Diversity of each chromosome ................................ ................................ 100 Table 18: Number of rules for each chromosome ................................ .................... 100 Table 19: Fitness value for each chromosome ................................ ......................... 101 Table 20: Confidence and number of rules for Generated Dataset .......................... 111 Table 21: Confidence and number of rules for Online Retail Dataset ..................... 112 Table 22: Optimum theta values for each approach ................................ ................. 115 |
參考文獻 |
[1]R. Agrawal and R. Srikant, "Fast algorithm for mining association rules," The International Conference on Very Large Data Bases, pp. 487–499, 1994. [2]M. Delgado, N. Marin, D. Sanchez and M. Vila, "Fuzzy association rules: general model and applications," IEEE Transactions on Fuzzy Systems, Vol. 11, No. 2, pp. 214-225, 2003. [3]J. Alcala-Fdez, R. Alcala and F. Herrera, "A fuzzy association rule-based classification model for high-dimensional problems with genetic rule selection and lateral tuning," IEEE Transactions on Fuzzy Systems, Vol. 19, No. 5, pp. 857-872, 2011. [4]T. Hong, C. Chen, Y. Wu and Y. Lee, "A GA-based fuzzy mining approach to achieve a trade-off between number of rules and suitability of membership functions," Soft Computing, Vol. 10, No. 11, pp. 1091-1101, 2006. [5]W. Lee and S. Lee, "Discovery of fuzzy temporal association rules," IEEE Transactions on Systems, Man and Cybernetics, Part B, Vol. 34, No. 6, pp. 2330-2342, 2004. [6]C. Chen, G. Lan, T. Hong and S. Lin, "Mining fuzzy temporal association rules by item lifespans," Applied Soft Computing, Vol. 41, pp. 265-274, 2016. [7]H. Verlinde, M. De Cock and R. Boute, "Fuzzy versus quantitative association rules: a fair data-driven comparison," IEEE Transactions on Systems, Man and Cybernetics, Part B, Vol. 36, No. 3, pp. 679-684, 2005. [8]C. Chen, V. Tseng and T. Hong, "Cluster-based evaluation in fuzzy-genetic data mining," IEEE Transactions on Fuzzy Systems, Vol. 16, No. 1, pp. 249-262, 2008. [9]W. Au and K. Chan, "Mining fuzzy association rules in a bank-account database", IEEE Transactions on Fuzzy Systems, vol. 11, no. 2, pp. 238-248, 2003. [10]G. Lan, C. Chen, T. Hong and S. Lin, "A fuzzy approach for mining general temporal association rules in a publication database," 11th International Conference on Hybrid Intelligent Systems (HIS), pp. 611-615, 2011. [11]B. Chien, Z. Lin and Y. Chen, "An algorithm of granulation on numeric attributes for association rules mining," Polish Academy of Sciences, Committee of Automation and Robotics, Vol. 12, No. 4, pp 379-395, 2002 [12]A. Paul, P. Shill, R. Rabin, A. Kundu and A. Akhand, "Fuzzy membership function generation using DMS-PSO for the diagnosis of heart disease," 18th International Conference on Computer and Information Technology (ICCIT), pp. 456 – 461, 2015 [13]B. Choi and F. Rhee, "Interval type-2 fuzzy membership function generation methods for pattern recognition," Information Sciences, Vol. 179, No. 13, pp. 2102-2122, 2009. [14]R. Srikant and R. Agrawal, "Mining quantitative association rules in large relational tables,"ACM SIGMOD Record, Vol. 25, No. 2, pp. 1-12, 1996. [15]W. Lee, J. Jiang and S. Lee, "Mining fuzzy periodic association rules," Data & Knowledge Engineering, Vol. 65, No. 3, pp. 442-462, 2008. [16]Y. Woon, W. Ng and E. Lim, "A support-ordered trie for fast frequent itemset discovery," IEEE Transactions on Knowledge and Data Engineering, Vol. 16, No. 07, pp. 875-879, 2004. [17]J. Han, J. Pei, Y. Yin and R. Mao, "Mining frequent patterns without candidate generation: a frequent-pattern tree approach," Data Mining and Knowledge Discovery, Vol. 8, No. 1, pp. 53-87, 2004. [18]T. Zhang, R. Ramakrishnan and M. Livny, "BIRCH: An efficient data clustering method for very large databases," ACM SIGMOD Record, Vol. 25, No. 2, pp. 103-114, 1996. [19]T. Hong, C. Chen, Y. Lee and Y. Wu, "Genetic-fuzzy data mining with divide-and-conquer strategy," IEEE Transactions on Evolutionary Computation, Vol. 12, No. 2, pp. 252-265, 2008. [20]E. Mansoori, "FRBC: A fuzzy rule-based clustering algorithm," IEEE Transactions on Fuzzy Systems, Vol. 19, No. 5, pp. 960-971, 2011. [21] Y. Hui, K. Choy, G. Ho, H. Lam, "A fuzzy association rule mining framework for variables selection concerning the storage time of packaged food,"IEEE International Conference on Fuzzy Systems,pp. 671-677, 2016. [22] M. del Jesus, P. Gonzalez, F. Herrera and M. Mesonero, "Evolutionary fuzzy rule induction process for subgroup discovery: a case study in marketing," IEEE Transactions on Fuzzy Systems, Vol. 15, No. 4, pp. 578-592, 2007. [23] R. Agrawal, T. Imielinksi and A. Swami, "Mining association rules between sets ofitems in large database,"The 1993 ACM SIGMOD Conference, pp. 207-216, 1993. [24] R. Agrawal, T. Imielinksi and A. Swami, "Database mining: a performance perspective, "EEE Transactions on Knowledge and Data Engineering archive, Vol. 5 No. 6, pp. 914–925,1993. [25] R. Agrawal, R. Srikant and Q. Vu, "Mining association rules with item constraints, "The Third International Conference on Knowledge Discovery in Databases andData Mining, pp. 67-73, 1997 [26]S. Matthews, M. Gongora, A. Hopgood and S. Ahmadi, "Web usage mining with evolutionary extraction of temporal fuzzy association rules," Knowledge-Based Systems, Vol. 54, pp. 66-72, 2013. [27]T. Hong, G. Lan, J. Su, P. Wu and S. Wang, "Discovery of temporal association rules with hierarchical granular framework,"Applied Computing and Informatics, Vol. 12, No. 2, pp. 134-141, 2016. [28] C. Wang, W. Lee and C. Pang, "Applying fuzzy FP-Growth to mine fuzzy association rules,"International Journal of Computer, Electrical, Automation, Control and Information Engineering, Vol 4, No. 5, pp. 986–992, 2010 [29] C. Lin, T. Hong and W. Lu, "An efficient tree-based fuzzy data mining approach,"International Journal of Fuzzy Systems, Vol. 12, No. 2, pp. 150-157, 2010 [30] D. Chen, S. Sain and K. Guo, "Data mining for the online retail industry: A case study of RFM model-based customer segmentation using data mining," Journal of Database Marketing and Customer Strategy Management, Vol. 19, No. 3, pp. 197-208, 2012 |
論文全文使用權限 |
如有問題,歡迎洽詢!
圖書館數位資訊組 (02)2621-5656 轉 2487 或 來信