§ 瀏覽學位論文書目資料
  
系統識別號 U0002-0308201716365400
DOI 10.6846/TKU.2017.00082
論文名稱(中文) 具隸屬函數調整機制的模糊時序關聯規則萃取技術
論文名稱(英文) Fuzzy Temporal Association Rule Acquisition Technique with Membership Function Tuning Mechanism
第三語言論文名稱
校院名稱 淡江大學
系所名稱(中文) 資訊工程學系全英語碩士班
系所名稱(英文) Master's Program, Department of Computer Science and Information Engineering (English-taught program)
外國學位學校名稱
外國學位學院名稱
外國學位研究所名稱
學年度 105
學期 2
出版年 106
研究生(中文) 周翔
研究生(英文) Hsiang Chou
學號 604785021
學位類別 碩士
語言別 英文
第二語言別
口試日期 2017-07-13
論文頁數 126頁
口試委員 指導教授 - 陳俊豪
委員 - 洪智傑
委員 - 陳朝鈞
關鍵字(中) 分群技術
模糊時序關聯規則
遺傳演算法
商品上架時間
隸屬函數
關鍵字(英) Clustering
fuzzy temporal association rule
genetic algorithm
item lifespan
membership functions
第三語言關鍵字
學科別分類
中文摘要
在實際應用上,相較於二元交易資料,數值型交易資料是更常見的。為處理數值型交易資料,模糊關聯規則探勘演算法因應而生。此外,交易資料中的商品是有存活期間或上架的區間的,故許多演算法亦被提出並用以探勘模糊時序關聯規則。在模糊規則萃取過程中,其關鍵成功因素是如何選擇適合的隸屬函數。雖然文獻中已有不少方法可用於產生適當的隸屬函數,但目前並無文獻闡述如何產生探勘模糊時序關聯規則所需的隸屬函數。故本論文提出兩個具有隸屬函數調整機制的探勘方法用來挖掘模糊時序關聯規則。

在第一個方法,它首先利用分群技術與交易資料集產生每個商品的專屬隸屬函數。隸屬函數的產生主要是根據商品的數值值域,故每個商品的專屬隸屬函數在數值區間與函數數量上不盡相同。具體來說是透過兩因子制定商品的隸屬函數,即區間密度相似與區間資訊相似因子。兩因子分別用於評估區間密度與資料量是否相似。所採用的分群技術使用theta參數值決定兩因子的重要程度。最後,所產生之商品專屬隸屬函數則用於探勘模糊時序關聯規則。此外,本論文亦開發Fuzzy FP-growth方法用提升探勘效率。

因不同的theta參數值對商品的專屬隸屬函數產生過程有很大的影響,為自動找出theta參數值,第二個方法結合遺傳演算法旨在找出可產生最多且多樣的規則的theta值。首先,它利用位元字串表示每一可能的theta值。適合度函數則由規則數量與規則多樣性組成,其中規則多樣性是由商品的平均隸屬函數個數進行評估。如商品有較多的隸屬函數則表示每個隸屬函數的值域是較小的亦表示可挖掘出較特殊的規則。
實驗部分透過模擬與真實交易資料驗證所提方法的有效性。模擬交易資料集的結果顯示第一個方法在規則數量與規則多樣性較現存的方法優異。在真實交易資料集中,因商品購買數量差異相當大,故所提方法可探勘出的規則數量雖較現存方法少,但仍可得到較多樣且有用的規則。在第二個方法中,實驗結果顯示它可找出合適的theta值用以產生數量較多且多樣的規則並可用於揭露交易資料中令人感興趣的資訊。
英文摘要
In real world applications, transactions are far more common to be presented with quantitative data as opposed to binary data. Fuzzy association rule mining algorithms have thus been proposed to handle quantitative transactions. In addition, items have certain life spans or temporal periods in which they exist in a database. Approaches have also been presented to mine fuzzy temporal association rules (FTARs). A key factor in the acquisition of fuzzy rules is the selection of appropriate membership functions. Although many approaches have been designed to generate membership functions, there is currently no existing approach which deals with the problem of generating membership functions for mining FTARs for market basket analysis. In this thesis, we propose two approaches with membership function tuning mechanism to discover FTARs.
In the first approach, it utilizes a clustering method to generate unique membership functions specifically tailored to each item in a data set. Each membership function is based on each individual items’ quantitative range, and the generated membership functions differ not only in terms the values of each interval but also in terms of the number of intervals. Two factors are instrumental in deciding each item's membership functions; density-similarity among intervals, which corresponds to the similarity in density of intervals, and information closeness within an interval, which corresponds to the similarity in the number of data points between intervals. A parameter θ is used to indicate the importance of the two factors. At last, the derived membership functions are employed in a fuzzy temporal rule mining algorithm to generate association rules. Besides, to speed up the mining process, the Fuzzy FP-growth approach is also utilized in two methods.
Because different θ values will affect greatly the set of membership functions produced, to automatically obtain a suitable parameter θ, the second approach incorporates a genetic algorithm to decide on the optimum value for θ which can produce the largest number of diverse rules. It first uses bit string to encode a possible theta value. The fitness function is made up of a combination of two factors, the number of rules generated and also the diversity of the rules, where diversity of the rules is evaluated by average number of membership functions for items. If a membership function has a larger number of intervals, each interval is smaller so the rules generated are more specific. 
Experiments were carried out on one simulated dataset and one real dataset to show the effectiveness of the proposed approaches. For the simulated dataset, the first proposed approach could greatly outperform the previous approach using predefined membership functions in terms of number of rules and diversity of rules. Since the real data set was made up of data with largely differing quantitative values, it can generate a smaller number of rules but the rules related to much more specific fuzzy regions, making them more useful. In relation to the second approach, the genetic algorithm could successfully discover the optimum value for θ in terms of producing the largest number of diverse rules. These rules were used to uncover interesting information from within the datasets.
第三語言摘要
論文目次
Table of Contents

CHAPTER 1 INTRODUCTION	1
1.1 Problem Definition and Motivation	1
1.2 Contributions	5
1.3 Reader’s Guide	6
CHAPTER 2 RELATED WORK	7
2.1 Association Rule Mining Approaches	7
2.1.1 Binary Association Rule Mining Approaches	8
2.1.2 Fuzzy Association Rule Mining Approaches	8
2.1.3 Temporal Fuzzy Association Rule Mining Approaches	10
2.2 Membership Function Generation Approaches	11
2.2.1 Membership Function Generation Approaches using Evolutionary Algorithms	11
2.2.2 Membership Function Generation Approaches not using Evolutionary Algorithms	12
CHAPTER 3 MEMBERSHIP FUNCTION GENERATION FOR FUZZY TEMPORAL ASSOCIATION RULE ACQUISITION	13
3.1 Motivation	13
3.2 Notation	15
3.2.1 Membership Function Generation Notation	15
3.2.2 Fuzzy Temporal Association Rule Mining Notation	18
3.3 Components of Proposed Approach	22
3.3.1 Membership Function Generation	22
3.3.1.1 Assigning the Initial Intervals	23
3.3.1.2 Information Closeness (IC)	27
3.3.1.3 Information Density-Similarity (DS)	28
3.3.1.4 Threshold Value	29
3.3.1.5 Density-Similarity Among Intervals	30
3.3.1.6 Information Closeness Within an Interval	30
3.3.1.7 Selecting the Optimum Interval	32
3.3.1.8 Generating Membership Functions from Intervals	32
3.3.2 Fuzzy Temporal Association Rule Mining	34
3.3.2.1 Converting to Fuzzy Set	34
3.3.2.2 Temporal Information Table	35
3.3.2.3 Calculating Temporal Fuzzy Support	35
3.3.2.4 Generating Large Itemsets	36
3.3.2.4 Generating Fuzzy Temporal Association Rules	37
3.4 Algorithm for Approach I	38
3.4.1 Algorithm for Membership Function Generation	38
3.4.2 Algorithm for Fuzzy Temporal Association Rule Mining	45
3.5 Example of Approach I	49
3.5.1 Example of GNAAR Approach	50
3.5.2 Example of GNAAROQ Approach	63
3.5.3 Example of Temporal Fuzzy Rule Mining Approach	76
CHAPTER 4 OPTIMIZING PARAMETER FOR MEMBERSHIP FUNCTION GENERATION THROUGH A GENETIC ALGORITHM	85
4.1 Motivation	85
4.2 Notation	88
4.3 Components of Proposed Approach	89
4.3.1 FP-Growth	89
4.3.2 Encoding Scheme	90
4.3.3 Genetic Operators	91
4.3.4 Fitness Evaluation	92
4.3.5 Fitness Lookup Table	94
4.4 Algorithm for Approach II	95
4.5 Example of Approach II	98
CHAPTER 5 EXPERIMENTAL RESULTS	104
5.1 Experimental Results for Approach I	104
5.1.1 Experimental Datasets	105
5.1.2 Approach used for Comparison	106
5.1.3 Comparison of Approaches with Different Minimum Supports	108
5.1.4 Comparison of Approaches with Different Minimum Confidences	111
5.2 Experimental Results for Approach II	113
5.2.1 Comparison of Efficiency Between Apriori Algorithm and FP-Growth	114
5.2.2 Optimum Theta Values Discovered by GA	115
5.2.3 Example of a Rule with Large Number of Items	116
5.2.4 Example of a Rule with Diverse Fuzzy Regions	117
5.2.3 Example of an Unusual Rule with High Confidence	118
CHAPTER 6 CONCLUSION AND FUTURE WORKS	120
REFERENCES	123

 
List of Figures

Figure 1: Initial intervals generated using GNAAR approach	23
Figure 2: Initial intervals generated using GNAAROH approach	24
Figure 3: Initial intervals generated using GNAAROQ approach	25
Figure 4: Interval C20 contains no data points when using GNAAR approach	25
Figure 5: Interval C20 contains data points when using GNAAROH approach	26
Figure 6: Membership functions assigned based on centroids of intervals	33
Figure 7: Membership function for Item X	34
Figure 8: The initial 4 intervals	51
Figure 9: Intervals C00 and C10 merged to become C11	52
Figure 10: The initial 4 intervals	57
Figure 11: Interval C00 and C10 merged	57
Figure 12: Interval C10 and C20 merged	57
Figure 13: Interval C20 and C30 merged	57
Figure 14: Level 0	58
Figure 15: Level 1	58
Figure 16: Level 2	58
Figure 17: Level 3	58
Figure 18: Centroid of intervals used to form membership functions	62
Figure 19: The resulting membership function	62
Figure 20: GNAAROQ interval extension	65
Figure 21: The initial 4 intervals	71
Figure 22: Interval C00 and C10 merged	71
Figure 23: Interval C20 and C30 merged	71
Figure 24: Interval C10 and C20 merged	71
Figure 25: Level 0	71
Figure 26: Level 1	71
Figure 27: Level 2	71
Figure 28: Level 3	71
Figure 29: The centroids of each fuzzy region	75
Figure 30: The resulting membership function	75
Figure 31: The membership functions used for each item	78
Figure 32: Two possible different membership functions for the same item	86
Figure 33: 3 different values of theta generated from chromosome	91
Figure 34: The effect of different θ values on fitness value for GNAAR algorithm	93
Figure 35: Chromosomes in the initial population	99
Figure 36: Crossover operator	102
Figure 37: Mutation operator	103
Figure 38: Dispersion of quantitative values	106
Figure 39: Membership functions used for Generated Dataset	107
Figure 40: Membership functions used for Online Retail Dataset	107
Figure 41:  Minimum support and number of rules for Generated Dataset	108
Figure 42: Minimum support and number of rules for Online Retail Dataset	109
Figure 43: Confidence and number of rules for Generated Dataset	111
Figure 44: Confidence and number of rules for Online Retail Dataset	112
Figure 45: Running time of Apriori algorithm and the FP-Growth algorithm	114
Figure 46 Example of a large 9-itemset rule	116
Figure 47 Example of a rule with diverse fuzzy regions	117
Figure 48: Fuzzy region C70  for both items	117
Figure 49 Example of an unusual rule with high confidence	118

 
Table 1: Input dataset ................................ ................................ ................................ . 50
Table 2: Quantitative values for item A ................................ ................................ ..... 50
Table 3: Threshold values of each interval ................................ ................................ 57
Table 4: The formula is used to decide on the most suitable level 60
Table 5: Data set of quantitative transactions ................................ ............................ 63
Table 6: Quantitative values for item A ................................ ................................ ..... 63
Table 7: The Threshold Values of each interval ................................ ......................... 70
Table 8: Formula used to decide on the most suitable level ........... 74
Table 9: The six temporal quantitative transactions ................................ ................... 77
Table 10:The fuzzy set for each item in each transaction ................................ .......... 77
Table 11: Temporal Information Table ................................ ................................ ....... 78
Table 12: The count values of the fuzzy regions ................................ ....................... 79
Table 13: Fuzzy temporal supports of fuzzy regions ................................ ................. 80
Table 14: Membership values for A.1∩E.1 ................................ .............................. 81
Table 15: Large 2-itemsets ................................ ................................ ......................... 82
Table 16: TS table ................................ ................................ ................................ ...... 82
Table 17: Diversity of each chromosome ................................ ................................ 100
Table 18: Number of rules for each chromosome ................................ .................... 100
Table 19: Fitness value for each chromosome ................................ ......................... 101
Table 20: Confidence and number of rules for Generated Dataset .......................... 111
Table 21: Confidence and number of rules for Online Retail Dataset ..................... 112
Table 22: Optimum theta values for each approach ................................ ................. 115
參考文獻
[1]R. Agrawal and R. Srikant, "Fast algorithm for mining association rules," The International Conference on Very Large Data Bases, pp. 487–499, 1994.

[2]M. Delgado, N. Marin, D. Sanchez and M. Vila, "Fuzzy association rules: general model and applications," IEEE Transactions on Fuzzy Systems, Vol. 11, No. 2, pp. 214-225, 2003.

[3]J. Alcala-Fdez, R. Alcala and F. Herrera, "A fuzzy association rule-based classification model for high-dimensional problems with genetic rule selection and lateral tuning," IEEE Transactions on Fuzzy Systems, Vol. 19, No. 5, pp. 857-872, 2011.

[4]T. Hong, C. Chen, Y. Wu and Y. Lee, "A GA-based fuzzy mining approach to achieve a trade-off between number of rules and suitability of membership functions," Soft Computing, Vol. 10, No. 11, pp. 1091-1101, 2006.

[5]W. Lee and S. Lee, "Discovery of fuzzy temporal association rules," IEEE Transactions on Systems, Man and Cybernetics, Part B, Vol. 34, No. 6, pp. 2330-2342, 2004.

[6]C. Chen, G. Lan, T. Hong and S. Lin, "Mining fuzzy temporal association rules by item lifespans," Applied Soft Computing, Vol. 41, pp. 265-274, 2016.

[7]H. Verlinde, M. De Cock and R. Boute, "Fuzzy versus quantitative association rules: a fair data-driven comparison," IEEE Transactions on Systems, Man and Cybernetics, Part B, Vol. 36, No. 3, pp. 679-684, 2005.

[8]C. Chen, V. Tseng and T. Hong, "Cluster-based evaluation in fuzzy-genetic data mining," IEEE Transactions on Fuzzy Systems, Vol. 16, No. 1, pp. 249-262, 2008.


[9]W. Au and K. Chan, "Mining fuzzy association rules in a bank-account database", IEEE Transactions on Fuzzy Systems, vol. 11, no. 2, pp. 238-248, 2003.

[10]G. Lan, C. Chen, T. Hong and S. Lin, "A fuzzy approach for mining general temporal association rules in a publication database," 11th International Conference on Hybrid Intelligent Systems (HIS), pp. 611-615, 2011.

[11]B. Chien, Z. Lin and Y. Chen, "An algorithm of granulation on numeric attributes for association rules mining," Polish Academy of Sciences, Committee of Automation and Robotics, Vol. 12, No. 4, pp 379-395, 2002

[12]A. Paul, P. Shill, R. Rabin, A. Kundu and A. Akhand, "Fuzzy membership function generation using DMS-PSO for the diagnosis of heart disease," 18th International Conference on Computer and Information Technology (ICCIT), pp. 456 – 461, 2015

[13]B. Choi and F. Rhee, "Interval type-2 fuzzy membership function generation methods for pattern recognition," Information Sciences, Vol. 179, No. 13, pp. 2102-2122, 2009.

[14]R. Srikant and R. Agrawal, "Mining quantitative association rules in large relational tables,"ACM SIGMOD Record, Vol. 25, No. 2, pp. 1-12, 1996.

[15]W. Lee, J. Jiang and S. Lee, "Mining fuzzy periodic association rules," Data & Knowledge Engineering, Vol. 65, No. 3, pp. 442-462, 2008.

[16]Y. Woon, W. Ng and E. Lim, "A support-ordered trie for fast frequent itemset discovery," IEEE Transactions on Knowledge and Data Engineering, Vol. 16, No. 07, pp. 875-879, 2004.

[17]J. Han, J. Pei, Y. Yin and R. Mao, "Mining frequent patterns without candidate generation: a frequent-pattern tree approach," Data Mining and Knowledge Discovery, Vol. 8, No. 1, pp. 53-87, 2004.

[18]T. Zhang, R. Ramakrishnan and M. Livny, "BIRCH: An efficient data clustering method for very large databases," ACM SIGMOD Record, Vol. 25, No. 2, pp. 103-114, 1996.

[19]T. Hong, C. Chen, Y. Lee and Y. Wu, "Genetic-fuzzy data mining with divide-and-conquer strategy," IEEE Transactions on Evolutionary Computation, Vol. 12, No. 2, pp. 252-265, 2008.
[20]E. Mansoori, "FRBC: A fuzzy rule-based clustering algorithm," IEEE Transactions on Fuzzy Systems, Vol. 19, No. 5, pp. 960-971, 2011.

[21] Y. Hui, K. Choy, G. Ho, H. Lam, "A fuzzy association rule mining framework for variables selection concerning the storage time of packaged food,"IEEE International Conference on Fuzzy Systems,pp. 671-677, 2016.

[22] M. del Jesus, P. Gonzalez, F. Herrera and M. Mesonero, "Evolutionary fuzzy rule induction process for subgroup discovery: a case study in marketing," IEEE Transactions on Fuzzy Systems, Vol. 15, No. 4, pp. 578-592, 2007.

[23] R. Agrawal, T. Imielinksi and A. Swami, "Mining association rules between sets ofitems in large database,"The 1993 ACM SIGMOD Conference, pp. 207-216, 1993.

[24] R. Agrawal, T. Imielinksi and A. Swami, "Database mining: a performance perspective, "EEE Transactions on Knowledge and Data Engineering archive, Vol. 5 No. 6, pp. 914–925,1993.

[25] R. Agrawal, R. Srikant and Q. Vu, "Mining association rules with item constraints, "The Third International Conference on Knowledge Discovery in Databases andData Mining, pp. 67-73, 1997

[26]S. Matthews, M. Gongora, A. Hopgood and S. Ahmadi, "Web usage mining with evolutionary extraction of temporal fuzzy association rules," Knowledge-Based Systems, Vol. 54, pp. 66-72, 2013.

[27]T. Hong, G. Lan, J. Su, P. Wu and S. Wang, "Discovery of temporal association rules with hierarchical granular framework,"Applied Computing and Informatics, Vol. 12, No. 2, pp. 134-141, 2016.

[28] C. Wang, W. Lee and C. Pang, "Applying fuzzy FP-Growth to mine fuzzy association rules,"International Journal of Computer, Electrical, Automation, Control and Information Engineering, Vol 4, No. 5, pp. 986–992, 2010

[29] C. Lin, T. Hong and W. Lu, "An efficient tree-based fuzzy data mining approach,"International Journal of Fuzzy Systems, Vol. 12, No. 2, pp. 150-157, 2010

[30] D. Chen, S. Sain and K. Guo, "Data mining for the online retail industry: A case study of RFM model-based customer segmentation using data mining," Journal of Database Marketing and Customer Strategy Management, Vol. 19, No. 3, pp. 197-208, 2012
論文全文使用權限
校內
紙本論文於授權書繳交後5年公開
同意電子論文全文授權校園內公開
校內電子論文於授權書繳交後5年公開
校外
同意授權
校外電子論文於授權書繳交後5年公開

如有問題,歡迎洽詢!
圖書館數位資訊組 (02)2621-5656 轉 2487 或 來信