淡江大學覺生紀念圖書館 (TKU Library)
進階搜尋


系統識別號 U0002-0308201716365400
中文論文名稱 具隸屬函數調整機制的模糊時序關聯規則萃取技術
英文論文名稱 Fuzzy Temporal Association Rule Acquisition Technique with Membership Function Tuning Mechanism
校院名稱 淡江大學
系所名稱(中) 資訊工程學系全英語碩士班
系所名稱(英) Master’s Program, Department of Computer Science and Information Engineering (English-taught program
學年度 105
學期 2
出版年 106
研究生中文姓名 周翔
研究生英文姓名 Hsiang Chou
學號 604785021
學位類別 碩士
語文別 英文
口試日期 2017-07-13
論文頁數 126頁
口試委員 指導教授-陳俊豪
委員-洪智傑
委員-陳朝鈞
中文關鍵字 分群技術  模糊時序關聯規則  遺傳演算法  商品上架時間  隸屬函數 
英文關鍵字 Clustering  fuzzy temporal association rule  genetic algorithm  item lifespan  membership functions 
學科別分類 學科別應用科學資訊工程
中文摘要 在實際應用上,相較於二元交易資料,數值型交易資料是更常見的。為處理數值型交易資料,模糊關聯規則探勘演算法因應而生。此外,交易資料中的商品是有存活期間或上架的區間的,故許多演算法亦被提出並用以探勘模糊時序關聯規則。在模糊規則萃取過程中,其關鍵成功因素是如何選擇適合的隸屬函數。雖然文獻中已有不少方法可用於產生適當的隸屬函數,但目前並無文獻闡述如何產生探勘模糊時序關聯規則所需的隸屬函數。故本論文提出兩個具有隸屬函數調整機制的探勘方法用來挖掘模糊時序關聯規則。

在第一個方法,它首先利用分群技術與交易資料集產生每個商品的專屬隸屬函數。隸屬函數的產生主要是根據商品的數值值域,故每個商品的專屬隸屬函數在數值區間與函數數量上不盡相同。具體來說是透過兩因子制定商品的隸屬函數,即區間密度相似與區間資訊相似因子。兩因子分別用於評估區間密度與資料量是否相似。所採用的分群技術使用theta參數值決定兩因子的重要程度。最後,所產生之商品專屬隸屬函數則用於探勘模糊時序關聯規則。此外,本論文亦開發Fuzzy FP-growth方法用提升探勘效率。

因不同的theta參數值對商品的專屬隸屬函數產生過程有很大的影響,為自動找出theta參數值,第二個方法結合遺傳演算法旨在找出可產生最多且多樣的規則的theta值。首先,它利用位元字串表示每一可能的theta值。適合度函數則由規則數量與規則多樣性組成,其中規則多樣性是由商品的平均隸屬函數個數進行評估。如商品有較多的隸屬函數則表示每個隸屬函數的值域是較小的亦表示可挖掘出較特殊的規則。
實驗部分透過模擬與真實交易資料驗證所提方法的有效性。模擬交易資料集的結果顯示第一個方法在規則數量與規則多樣性較現存的方法優異。在真實交易資料集中,因商品購買數量差異相當大,故所提方法可探勘出的規則數量雖較現存方法少,但仍可得到較多樣且有用的規則。在第二個方法中,實驗結果顯示它可找出合適的theta值用以產生數量較多且多樣的規則並可用於揭露交易資料中令人感興趣的資訊。
英文摘要 In real world applications, transactions are far more common to be presented with quantitative data as opposed to binary data. Fuzzy association rule mining algorithms have thus been proposed to handle quantitative transactions. In addition, items have certain life spans or temporal periods in which they exist in a database. Approaches have also been presented to mine fuzzy temporal association rules (FTARs). A key factor in the acquisition of fuzzy rules is the selection of appropriate membership functions. Although many approaches have been designed to generate membership functions, there is currently no existing approach which deals with the problem of generating membership functions for mining FTARs for market basket analysis. In this thesis, we propose two approaches with membership function tuning mechanism to discover FTARs.
In the first approach, it utilizes a clustering method to generate unique membership functions specifically tailored to each item in a data set. Each membership function is based on each individual items’ quantitative range, and the generated membership functions differ not only in terms the values of each interval but also in terms of the number of intervals. Two factors are instrumental in deciding each item's membership functions; density-similarity among intervals, which corresponds to the similarity in density of intervals, and information closeness within an interval, which corresponds to the similarity in the number of data points between intervals. A parameter θ is used to indicate the importance of the two factors. At last, the derived membership functions are employed in a fuzzy temporal rule mining algorithm to generate association rules. Besides, to speed up the mining process, the Fuzzy FP-growth approach is also utilized in two methods.
Because different θ values will affect greatly the set of membership functions produced, to automatically obtain a suitable parameter θ, the second approach incorporates a genetic algorithm to decide on the optimum value for θ which can produce the largest number of diverse rules. It first uses bit string to encode a possible theta value. The fitness function is made up of a combination of two factors, the number of rules generated and also the diversity of the rules, where diversity of the rules is evaluated by average number of membership functions for items. If a membership function has a larger number of intervals, each interval is smaller so the rules generated are more specific.
Experiments were carried out on one simulated dataset and one real dataset to show the effectiveness of the proposed approaches. For the simulated dataset, the first proposed approach could greatly outperform the previous approach using predefined membership functions in terms of number of rules and diversity of rules. Since the real data set was made up of data with largely differing quantitative values, it can generate a smaller number of rules but the rules related to much more specific fuzzy regions, making them more useful. In relation to the second approach, the genetic algorithm could successfully discover the optimum value for θ in terms of producing the largest number of diverse rules. These rules were used to uncover interesting information from within the datasets.
論文目次 Table of Contents

CHAPTER 1 INTRODUCTION 1
1.1 Problem Definition and Motivation 1
1.2 Contributions 5
1.3 Reader’s Guide 6
CHAPTER 2 RELATED WORK 7
2.1 Association Rule Mining Approaches 7
2.1.1 Binary Association Rule Mining Approaches 8
2.1.2 Fuzzy Association Rule Mining Approaches 8
2.1.3 Temporal Fuzzy Association Rule Mining Approaches 10
2.2 Membership Function Generation Approaches 11
2.2.1 Membership Function Generation Approaches using Evolutionary Algorithms 11
2.2.2 Membership Function Generation Approaches not using Evolutionary Algorithms 12
CHAPTER 3 MEMBERSHIP FUNCTION GENERATION FOR FUZZY TEMPORAL ASSOCIATION RULE ACQUISITION 13
3.1 Motivation 13
3.2 Notation 15
3.2.1 Membership Function Generation Notation 15
3.2.2 Fuzzy Temporal Association Rule Mining Notation 18
3.3 Components of Proposed Approach 22
3.3.1 Membership Function Generation 22
3.3.1.1 Assigning the Initial Intervals 23
3.3.1.2 Information Closeness (IC) 27
3.3.1.3 Information Density-Similarity (DS) 28
3.3.1.4 Threshold Value 29
3.3.1.5 Density-Similarity Among Intervals 30
3.3.1.6 Information Closeness Within an Interval 30
3.3.1.7 Selecting the Optimum Interval 32
3.3.1.8 Generating Membership Functions from Intervals 32
3.3.2 Fuzzy Temporal Association Rule Mining 34
3.3.2.1 Converting to Fuzzy Set 34
3.3.2.2 Temporal Information Table 35
3.3.2.3 Calculating Temporal Fuzzy Support 35
3.3.2.4 Generating Large Itemsets 36
3.3.2.4 Generating Fuzzy Temporal Association Rules 37
3.4 Algorithm for Approach I 38
3.4.1 Algorithm for Membership Function Generation 38
3.4.2 Algorithm for Fuzzy Temporal Association Rule Mining 45
3.5 Example of Approach I 49
3.5.1 Example of GNAAR Approach 50
3.5.2 Example of GNAAROQ Approach 63
3.5.3 Example of Temporal Fuzzy Rule Mining Approach 76
CHAPTER 4 OPTIMIZING PARAMETER FOR MEMBERSHIP FUNCTION GENERATION THROUGH A GENETIC ALGORITHM 85
4.1 Motivation 85
4.2 Notation 88
4.3 Components of Proposed Approach 89
4.3.1 FP-Growth 89
4.3.2 Encoding Scheme 90
4.3.3 Genetic Operators 91
4.3.4 Fitness Evaluation 92
4.3.5 Fitness Lookup Table 94
4.4 Algorithm for Approach II 95
4.5 Example of Approach II 98
CHAPTER 5 EXPERIMENTAL RESULTS 104
5.1 Experimental Results for Approach I 104
5.1.1 Experimental Datasets 105
5.1.2 Approach used for Comparison 106
5.1.3 Comparison of Approaches with Different Minimum Supports 108
5.1.4 Comparison of Approaches with Different Minimum Confidences 111
5.2 Experimental Results for Approach II 113
5.2.1 Comparison of Efficiency Between Apriori Algorithm and FP-Growth 114
5.2.2 Optimum Theta Values Discovered by GA 115
5.2.3 Example of a Rule with Large Number of Items 116
5.2.4 Example of a Rule with Diverse Fuzzy Regions 117
5.2.3 Example of an Unusual Rule with High Confidence 118
CHAPTER 6 CONCLUSION AND FUTURE WORKS 120
REFERENCES 123


List of Figures

Figure 1: Initial intervals generated using GNAAR approach 23
Figure 2: Initial intervals generated using GNAAROH approach 24
Figure 3: Initial intervals generated using GNAAROQ approach 25
Figure 4: Interval C20 contains no data points when using GNAAR approach 25
Figure 5: Interval C20 contains data points when using GNAAROH approach 26
Figure 6: Membership functions assigned based on centroids of intervals 33
Figure 7: Membership function for Item X 34
Figure 8: The initial 4 intervals 51
Figure 9: Intervals C00 and C10 merged to become C11 52
Figure 10: The initial 4 intervals 57
Figure 11: Interval C00 and C10 merged 57
Figure 12: Interval C10 and C20 merged 57
Figure 13: Interval C20 and C30 merged 57
Figure 14: Level 0 58
Figure 15: Level 1 58
Figure 16: Level 2 58
Figure 17: Level 3 58
Figure 18: Centroid of intervals used to form membership functions 62
Figure 19: The resulting membership function 62
Figure 20: GNAAROQ interval extension 65
Figure 21: The initial 4 intervals 71
Figure 22: Interval C00 and C10 merged 71
Figure 23: Interval C20 and C30 merged 71
Figure 24: Interval C10 and C20 merged 71
Figure 25: Level 0 71
Figure 26: Level 1 71
Figure 27: Level 2 71
Figure 28: Level 3 71
Figure 29: The centroids of each fuzzy region 75
Figure 30: The resulting membership function 75
Figure 31: The membership functions used for each item 78
Figure 32: Two possible different membership functions for the same item 86
Figure 33: 3 different values of theta generated from chromosome 91
Figure 34: The effect of different θ values on fitness value for GNAAR algorithm 93
Figure 35: Chromosomes in the initial population 99
Figure 36: Crossover operator 102
Figure 37: Mutation operator 103
Figure 38: Dispersion of quantitative values 106
Figure 39: Membership functions used for Generated Dataset 107
Figure 40: Membership functions used for Online Retail Dataset 107
Figure 41: Minimum support and number of rules for Generated Dataset 108
Figure 42: Minimum support and number of rules for Online Retail Dataset 109
Figure 43: Confidence and number of rules for Generated Dataset 111
Figure 44: Confidence and number of rules for Online Retail Dataset 112
Figure 45: Running time of Apriori algorithm and the FP-Growth algorithm 114
Figure 46 Example of a large 9-itemset rule 116
Figure 47 Example of a rule with diverse fuzzy regions 117
Figure 48: Fuzzy region C70 for both items 117
Figure 49 Example of an unusual rule with high confidence 118


Table 1: Input dataset ................................ ................................ ................................ . 50
Table 2: Quantitative values for item A ................................ ................................ ..... 50
Table 3: Threshold values of each interval ................................ ................................ 57
Table 4: The formula is used to decide on the most suitable level 60
Table 5: Data set of quantitative transactions ................................ ............................ 63
Table 6: Quantitative values for item A ................................ ................................ ..... 63
Table 7: The Threshold Values of each interval ................................ ......................... 70
Table 8: Formula used to decide on the most suitable level ........... 74
Table 9: The six temporal quantitative transactions ................................ ................... 77
Table 10:The fuzzy set for each item in each transaction ................................ .......... 77
Table 11: Temporal Information Table ................................ ................................ ....... 78
Table 12: The count values of the fuzzy regions ................................ ....................... 79
Table 13: Fuzzy temporal supports of fuzzy regions ................................ ................. 80
Table 14: Membership values for A.1∩E.1 ................................ .............................. 81
Table 15: Large 2-itemsets ................................ ................................ ......................... 82
Table 16: TS table ................................ ................................ ................................ ...... 82
Table 17: Diversity of each chromosome ................................ ................................ 100
Table 18: Number of rules for each chromosome ................................ .................... 100
Table 19: Fitness value for each chromosome ................................ ......................... 101
Table 20: Confidence and number of rules for Generated Dataset .......................... 111
Table 21: Confidence and number of rules for Online Retail Dataset ..................... 112
Table 22: Optimum theta values for each approach ................................ ................. 115
參考文獻 [1]R. Agrawal and R. Srikant, "Fast algorithm for mining association rules," The International Conference on Very Large Data Bases, pp. 487–499, 1994.

[2]M. Delgado, N. Marin, D. Sanchez and M. Vila, "Fuzzy association rules: general model and applications," IEEE Transactions on Fuzzy Systems, Vol. 11, No. 2, pp. 214-225, 2003.

[3]J. Alcala-Fdez, R. Alcala and F. Herrera, "A fuzzy association rule-based classification model for high-dimensional problems with genetic rule selection and lateral tuning," IEEE Transactions on Fuzzy Systems, Vol. 19, No. 5, pp. 857-872, 2011.

[4]T. Hong, C. Chen, Y. Wu and Y. Lee, "A GA-based fuzzy mining approach to achieve a trade-off between number of rules and suitability of membership functions," Soft Computing, Vol. 10, No. 11, pp. 1091-1101, 2006.

[5]W. Lee and S. Lee, "Discovery of fuzzy temporal association rules," IEEE Transactions on Systems, Man and Cybernetics, Part B, Vol. 34, No. 6, pp. 2330-2342, 2004.

[6]C. Chen, G. Lan, T. Hong and S. Lin, "Mining fuzzy temporal association rules by item lifespans," Applied Soft Computing, Vol. 41, pp. 265-274, 2016.

[7]H. Verlinde, M. De Cock and R. Boute, "Fuzzy versus quantitative association rules: a fair data-driven comparison," IEEE Transactions on Systems, Man and Cybernetics, Part B, Vol. 36, No. 3, pp. 679-684, 2005.

[8]C. Chen, V. Tseng and T. Hong, "Cluster-based evaluation in fuzzy-genetic data mining," IEEE Transactions on Fuzzy Systems, Vol. 16, No. 1, pp. 249-262, 2008.


[9]W. Au and K. Chan, "Mining fuzzy association rules in a bank-account database", IEEE Transactions on Fuzzy Systems, vol. 11, no. 2, pp. 238-248, 2003.

[10]G. Lan, C. Chen, T. Hong and S. Lin, "A fuzzy approach for mining general temporal association rules in a publication database," 11th International Conference on Hybrid Intelligent Systems (HIS), pp. 611-615, 2011.

[11]B. Chien, Z. Lin and Y. Chen, "An algorithm of granulation on numeric attributes for association rules mining," Polish Academy of Sciences, Committee of Automation and Robotics, Vol. 12, No. 4, pp 379-395, 2002

[12]A. Paul, P. Shill, R. Rabin, A. Kundu and A. Akhand, "Fuzzy membership function generation using DMS-PSO for the diagnosis of heart disease," 18th International Conference on Computer and Information Technology (ICCIT), pp. 456 – 461, 2015

[13]B. Choi and F. Rhee, "Interval type-2 fuzzy membership function generation methods for pattern recognition," Information Sciences, Vol. 179, No. 13, pp. 2102-2122, 2009.

[14]R. Srikant and R. Agrawal, "Mining quantitative association rules in large relational tables,"ACM SIGMOD Record, Vol. 25, No. 2, pp. 1-12, 1996.

[15]W. Lee, J. Jiang and S. Lee, "Mining fuzzy periodic association rules," Data & Knowledge Engineering, Vol. 65, No. 3, pp. 442-462, 2008.

[16]Y. Woon, W. Ng and E. Lim, "A support-ordered trie for fast frequent itemset discovery," IEEE Transactions on Knowledge and Data Engineering, Vol. 16, No. 07, pp. 875-879, 2004.

[17]J. Han, J. Pei, Y. Yin and R. Mao, "Mining frequent patterns without candidate generation: a frequent-pattern tree approach," Data Mining and Knowledge Discovery, Vol. 8, No. 1, pp. 53-87, 2004.

[18]T. Zhang, R. Ramakrishnan and M. Livny, "BIRCH: An efficient data clustering method for very large databases," ACM SIGMOD Record, Vol. 25, No. 2, pp. 103-114, 1996.

[19]T. Hong, C. Chen, Y. Lee and Y. Wu, "Genetic-fuzzy data mining with divide-and-conquer strategy," IEEE Transactions on Evolutionary Computation, Vol. 12, No. 2, pp. 252-265, 2008.
[20]E. Mansoori, "FRBC: A fuzzy rule-based clustering algorithm," IEEE Transactions on Fuzzy Systems, Vol. 19, No. 5, pp. 960-971, 2011.

[21] Y. Hui, K. Choy, G. Ho, H. Lam, "A fuzzy association rule mining framework for variables selection concerning the storage time of packaged food,"IEEE International Conference on Fuzzy Systems,pp. 671-677, 2016.

[22] M. del Jesus, P. Gonzalez, F. Herrera and M. Mesonero, "Evolutionary fuzzy rule induction process for subgroup discovery: a case study in marketing," IEEE Transactions on Fuzzy Systems, Vol. 15, No. 4, pp. 578-592, 2007.

[23] R. Agrawal, T. Imielinksi and A. Swami, "Mining association rules between sets ofitems in large database,"The 1993 ACM SIGMOD Conference, pp. 207-216, 1993.

[24] R. Agrawal, T. Imielinksi and A. Swami, "Database mining: a performance perspective, "EEE Transactions on Knowledge and Data Engineering archive, Vol. 5 No. 6, pp. 914–925,1993.

[25] R. Agrawal, R. Srikant and Q. Vu, "Mining association rules with item constraints, "The Third International Conference on Knowledge Discovery in Databases andData Mining, pp. 67-73, 1997

[26]S. Matthews, M. Gongora, A. Hopgood and S. Ahmadi, "Web usage mining with evolutionary extraction of temporal fuzzy association rules," Knowledge-Based Systems, Vol. 54, pp. 66-72, 2013.

[27]T. Hong, G. Lan, J. Su, P. Wu and S. Wang, "Discovery of temporal association rules with hierarchical granular framework,"Applied Computing and Informatics, Vol. 12, No. 2, pp. 134-141, 2016.

[28] C. Wang, W. Lee and C. Pang, "Applying fuzzy FP-Growth to mine fuzzy association rules,"International Journal of Computer, Electrical, Automation, Control and Information Engineering, Vol 4, No. 5, pp. 986–992, 2010

[29] C. Lin, T. Hong and W. Lu, "An efficient tree-based fuzzy data mining approach,"International Journal of Fuzzy Systems, Vol. 12, No. 2, pp. 150-157, 2010

[30] D. Chen, S. Sain and K. Guo, "Data mining for the online retail industry: A case study of RFM model-based customer segmentation using data mining," Journal of Database Marketing and Customer Strategy Management, Vol. 19, No. 3, pp. 197-208, 2012
論文使用權限
  • 同意紙本無償授權給館內讀者為學術之目的重製使用,於2022-08-08公開。
  • 同意授權瀏覽/列印電子全文服務,於2022-08-08起公開。


  • 若您有任何疑問,請與我們聯絡!
    圖書館: 請來電 (02)2621-5656 轉 2486 或 來信