Tamkang University Chueh Sheng Memorial Library (TKU Library)
System ID U0002-2906201210470800
Thesis Title (Chinese) 有效率的探勘客戶消費行為方法的研究
Thesis Title (English) Efficient Approaches for Mining Customer Behaviors
University Tamkang University
Department (Chinese) 管理科學學系博士班
Department (English) Doctoral Program, Department of Management Sciences
Academic Year 100
Semester 2
Year of Publication 101 (2012)
Author (Chinese) 王秋光
Author (English) Chiu-Kuang Wang
Student ID 895620093
Degree Doctoral
Language English
Oral Defense Date 2012-06-06
Pages 131
Committee Advisor: 顏秀珍
Advisor: 歐陽良裕
Member: 莊忠柱
Member: 曹銳勤
Member: 陳景祥
Member: 李御璽
Member: 謝邦昌
Member: 鄭宇庭
Chinese Keywords 資料探勘, 頻繁項目集, 頻繁封閉項目集, 資料流, 消費行為模式
English Keywords Data mining, Frequent itemset, Frequent closed itemset, Data stream, Consumption behaviors
Subject Classification
Abstract (translated from Chinese) With digitization, data-rich databases have become ubiquitous, and extracting important information from them is the main task of data mining. In commercial applications, we can analyze a transaction database to find products that are often purchased together, and the association behavior whereby customers who buy certain products are likely to buy others as well; that is, we can mine frequent itemsets. In this dissertation, we propose a highly efficient algorithm for mining frequent itemsets that outperforms previous work in both execution time and memory usage.
However, new transactions are continually generated while old ones must be removed, and re-mining the original data wastes time rediscovering information that is already known. An environment in which new data keep arriving and old data are removed over time is called a data stream, and finding all frequent itemsets in data streams has accordingly become a research topic. In addition, a transaction database may contain a very large number of frequent itemsets, and this flood of information makes decisions difficult. Researchers therefore proposed closed itemsets: an itemset is closed if its support is greater than the support of every one of its supersets. All frequent itemsets can be derived from the frequent closed itemsets. We also propose an efficient algorithm for this setting: when data are added or removed, the original database need not be re-read; operating only on the added or removed data together with the previous closed itemsets yields the updated closed itemsets.
Consumable products are purchased very frequently compared with other goods. Although the profit on a single sale may not be as high as that of appliances or electronics, the accumulated profit is by no means small, so promoting consumables at the right moment greatly helps capture substantial profit; frequent itemsets, however, provide no information about promotion timing. We therefore propose a novel data mining approach that, for a given consumable product, finds how customers with different characteristics consume it. Based on a customer's background attributes and the quantity of the product purchased this time, the mined consumption behavior lets us accurately predict when the customer will need the product again, so the timing for marketing the product to that customer can be seized.
English Abstract Mining frequent itemsets is the task of discovering groups of items that appear together in a number of transactions exceeding a user-specified threshold. Many approaches apply the FP-tree structure to improve on the FP-Growth algorithm, which needs to recursively construct sub-trees. Although these approaches avoid recursively constructing many sub-trees, they still suffer from a large search space, so their performance degrades when the database is massive or the threshold for mining frequent itemsets is low. In order to reduce the search space and speed up the mining process, we propose an efficient algorithm for mining frequent itemsets based on the frequent pattern tree. Our algorithm generates a sub-tree for each frequent item and then generates candidates in batches from this sub-tree. For each candidate generation, it produces only a small set of candidates, which significantly reduces the search space.
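The sub-tree and batch-candidate method above is specific to the dissertation; as a rough, hypothetical sketch of the shared underlying notion — itemsets whose occurrence count meets a user-specified support threshold — a minimal level-wise (Apriori-style) miner could look as follows. All names are illustrative, and this is not the FP-tree-based algorithm described here:

```python
from itertools import combinations

def mine_frequent_itemsets(transactions, min_support):
    """Level-wise (Apriori-style) frequent-itemset miner.

    A hypothetical illustration of the support threshold only -- not the
    FP-tree-based algorithm the dissertation proposes.
    """
    n = len(transactions)
    frequent = {}
    # Level 1: candidate itemsets are the single items.
    candidates = [frozenset([i]) for i in {i for t in transactions for i in t}]
    k = 1
    while candidates:
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        level = {c: cnt for c, cnt in counts.items() if cnt / n >= min_support}
        frequent.update(level)
        k += 1
        # Join step: unite pairs of frequent (k-1)-itemsets into k-itemsets.
        prev = list(level)
        joined = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = [c for c in joined
                      if all(frozenset(s) in level for s in combinations(c, k - 1))]
    return frequent
```

A level-wise sketch like this can suffer the candidate explosion that motivates FP-tree approaches; it only pins down what "frequent" means.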
However, a transaction database may contain so many frequent itemsets that it is difficult for a decision maker to reach a decision. Mining frequent closed itemsets has therefore become a major research issue, since the set of frequent closed itemsets is a condensed yet complete representation of the frequent itemsets: all frequent itemsets can be derived from the frequent closed itemsets. Moreover, the transactions in a transaction database grow rapidly in a short time, and some of them become obsolete, so the frequent closed itemsets may change as new transactions are added to or old transactions are deleted from the database. Updating the previously mined closed itemsets under such additions and deletions is a challenge. We propose an efficient algorithm for incrementally mining closed itemsets without scanning the original database: it updates the closed itemsets by performing a few operations on the previous closed itemsets and the added or deleted transactions, without exhaustively searching the previous closed itemsets.
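As a concrete, hypothetical illustration of the definition used above — an itemset is closed exactly when no proper superset has the same support — a brute-force check over a tiny database can be written as follows. This only restates the definition; the dissertation's MFCI algorithm instead maintains the closed itemsets incrementally rather than recomputing them:

```python
from itertools import combinations

def closed_itemsets(transactions):
    """Brute-force closed-itemset computation (definition sketch only).

    An itemset is closed when no proper superset has the same support;
    all frequent itemsets and their supports can be derived from the
    frequent closed ones. Exponential in the number of items, so this
    is usable only on toy databases.
    """
    items = sorted({i for t in transactions for i in t})
    support = {}
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            s = frozenset(combo)
            cnt = sum(1 for t in transactions if s <= t)
            if cnt:
                support[s] = cnt
    # Keep an itemset only if no proper superset matches its support.
    return {s: cnt for s, cnt in support.items()
            if not any(s < sup and support[sup] == cnt for sup in support)}
```

For example, in a database {AC, AC, A}, the itemset {C} is not closed because its superset {A, C} has the same support of 2, while {A} (support 3) and {A, C} are closed.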
Compared with other commodities, consumable products are purchased with high frequency. Although the gain on a single sale of a consumable may be lower than that of an appliance or an electronic product, the accumulated gains are considerable. Grasping the right timing for sales promotions on consumable products is therefore an important task. Sequential pattern mining only captures the order in which most customers purchase items; it cannot predict when a customer will need a product in the future. For consumable products, the time of the next purchase is usually related to the quantity purchased this time. We propose a novel data mining algorithm to find the consumption behaviors of most customers. From this information, we can predict the next purchase time of an item based on the quantity of that item purchased this time.
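The core idea — the time until the next purchase of a consumable is tied to the quantity bought this time — can be sketched as a per-customer consumption-rate estimate. This is a deliberate simplification for illustration, not the dissertation's consumption-association-rule mining (which also uses customer attributes); all names here are hypothetical:

```python
def consumption_rate(history):
    """Days of use per purchased unit, from (day, quantity) purchases.

    Assumes each purchased quantity is fully consumed by the next
    purchase -- a simplifying assumption for this sketch.
    """
    intervals = [(history[i + 1][0] - history[i][0], history[i][1])
                 for i in range(len(history) - 1)]
    total_days = sum(days for days, _ in intervals)
    total_units = sum(qty for _, qty in intervals)
    return total_days / total_units

def predict_next_purchase(history):
    """Predict the next purchase day from the latest purchase quantity."""
    rate = consumption_rate(history)
    last_day, last_qty = history[-1]
    return last_day + last_qty * rate
```

For a customer who bought 2 units on day 0, 4 on day 10, and 2 on day 30, the estimated rate is 5 days per unit, so the next purchase is predicted around day 40 — the kind of timing signal a promotion could target.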
Table of Contents
Content
List of Tables
List of Figures
Chapter 1 Introduction
Chapter 2 Related Work
2.1 Mining Frequent Itemsets
2.2 Frequent Closed Itemsets in Data Streams
2.3 Mining Consumption Behaviors
Chapter 3 Mining Frequent Itemsets
3.1 Preliminary
3.2 Search Space Reduced Algorithm
3.2.1 Item-prefix Tree Construction
3.2.2 Frequent Itemset Generation
3.3 Experimental Results
Chapter 4 Mining Closed Frequent Itemsets
4.1 Preliminary
4.2 MFCI Algorithm
4.2.1 Algorithm MFCI-add
4.2.1.1 Closed Itemset Generator
4.2.1.2 The Maintenance of Closed Itemsets after Adding a Transaction
4.2.2 Algorithm MFCI-del
4.3 Experimental Results
4.3.1 Experimental Results on Synthetic Datasets
4.3.2 Experimental Results on Real Datasets
Chapter 5 Mining Consumption Behaviors
5.1 Preliminary
5.2 Mining Consumption Association Rules
5.2.1 Mining Consumption Patterns
5.2.2 Mining Consumption Association Rules
5.3 Experimental Results
5.3.1 Synthetic Data Generation
5.3.2 Experimental Results
Chapter 6 Conclusions and Future Work
REFERENCES
List of Tables
Table 3-1 A Transaction Database TDB
Table 3-2 Frequent items and their support counts
Table 3-3 A sorted transaction database with minimum support 25%
Table 3-4 The memory usages and execution times for the five algorithms on the datasets with average transaction size T = 10
Table 3-5 The memory usages and execution times for the five algorithms on the datasets with average transaction size T = 20
Table 4-1 A Transaction Database
Table 4-2 Content Table for the transaction database in Table 4-1
Table 4-3 The Item Table for the transaction database in Table 4-1
Table 4-4 The Temp Table after adding T = {ACT}
Table 4-5 Transaction database D
Table 4-6 Content Table for the transaction database in Table 4-5
Table 4-7 Item Table for the transaction database in Table 4-5
Table 4-8 The Temp Table after adding T5 = {ACDTW}
Table 4-9 The Item Table after adding T5 = {ACDTW}
Table 4-10 The Content Table after adding T5 = {ACDTW}
Table 4-11 The Temp Table after adding T6 = {CDT}
Table 4-12 The Item Table after adding T6 = {CDT}
Table 4-13 The Content Table after adding T6 = {CDT}
Table 4-14 The Item Table after deleting T1 and T2
Table 4-15 The Content Table after deleting T1 and T2
Table 4-16 Parameters for IBM data generator
Table 5-1 A Transaction Table
Table 5-2 Transaction table for item A
Table 5-3 Sorted transaction table for item A
Table 5-4 Consumption table
Table 5-5 The status for 3 consumption intervals
Table 5-6 Customer Table
Table 5-7 Customer-Consumption Table
Table 5-8 Frequent 2-attribute value set
Table 5-9 The confidence for the frequent attribute value set
Table 5-10 Consumption association rules
Table 5-11 The parameters of the synthetic datasets
Table 5-12 The parameters for the first experiment
Table 5-13 The parameter settings for the second experiment
Table 5-14 The comparison of experimental results

List of Figures
Figure 3-1 The FP-tree for Table 3-1 with minimum support 25%
Figure 3-2 The item-prefix tree for item D
Figure 3-3 Execution times for the five algorithms on dataset T10I2D100K
Figure 3-4 Execution times for the five algorithms on dataset T10I4D100K
Figure 3-5 Execution times for the five algorithms on dataset T10I6D100K
Figure 3-6 Execution times for the five algorithms on dataset T20I4D100K
Figure 3-7 Execution times for the five algorithms on dataset T20I6D100K
Figure 3-8 Execution times for the five algorithms on dataset T20I8D100K
Figure 3-9 Execution times for the five algorithms
Figure 3-10 Execution times for the algorithms on dataset T10I4D100K (item 10K)
Figure 3-11 Execution times for the five algorithms on the real dataset Chess
Figure 3-12 Execution times for the five algorithms on the real dataset Mushroom
Figure 4-1 Execution times for the three algorithms
Figure 4-2 Execution times for the three algorithms
Figure 4-3 Execution times for the three algorithms when 1K transactions are added and another 1K transactions are deleted
Figure 4-4 Execution times for the two algorithms
Figure 4-5 Execution times for the two algorithms
Figure 4-6 Number of closed itemsets deleted after
Figure 4-7 Execution times for the two algorithms after
Figure 4-8 Execution times for CFI-Stream-Modify and MFCI
Figure 4-9 Execution times of MFCI for adding
Figure 4-10 The total number of closed itemsets after adding
Figure 4-11 Execution times of MFCI for deleting 406 transactions each time
Figure 4-12 The total number of the closed itemsets
Figure 4-13 Execution times of MFCI for adding and
Figure 4-14 The total number of closed itemsets for each addition and deletion
Figure 5-1 Consumption intervals in Table 5-4
Figure 5-2 The MinRSup for consumption interval I
Figure 5-3 Numeric attribute 2-D projection
Figure 5-4 Algorithm CAR
Figure 5-5 The experimental results for Series A-D
References
[1]Agrawal, R., Gehrke, J., Gunopulos, D. and Raghavan, P. (1998), Automatic subspace clustering of high dimensional data for data mining applications, International conference on management of data, 94-105.
[2]Agrawal, R. and Srikant, R. (1994), Fast algorithm for mining association rules, International conference on very large data bases, 487-499.
[3]Agrawal, R. and Srikant, R. (1995), Mining sequential patterns, International conference on data engineering, 3-14.
[4]Agrawal, R. and Srikant, R. (1996a), Mining sequential patterns: Generalizations and performance improvements, International conference on extending database technology, 3-17.
[5]Agrawal, R. and Srikant, R. (1996b), Mining quantitative association rules in large relational tables, International conference on management of data, 1-12.
[6]Bastide, Y., Lakhal, L., Pasquier, N., Stumme, G. and Taouil, R. (2000), Mining frequent patterns with counting inference, International conference on knowledge discovery and data mining, 2(2), 66-75.
[7]Brin, S., Motwani, R., Ullman, J. and Tsur, S. (1997), Dynamic itemset counting and implication rules for market basket data, International conference on management of data, 255-264.
[8]Cheng, J., Ke, Y. and Ng, W. (2008), A survey on algorithms for mining frequent itemsets over data streams, Knowledge and Information Systems, 16(1), 1-27.
[9]Chi, Y., Wang, H., Yu, P. S. and Muntz, R. R. (2004), Moment: maintaining closed itemsets over a stream sliding window, International conference on data mining, 59-66.
[10]Dong, J. and Han, M. (2007), BitTableFI: An efficient mining frequent itemsets algorithm, Knowledge-Based Systems, 20(4), 329-335.
[11]El-Hajj, M. and Zaiane, O. R. (2003), Non recursive generation of frequent k-itemsets from frequent pattern tree representation, International conference on data warehousing and knowledge discovery, 371-380.
[12]Ester, M., Kriegel, H. P., Sander, J. and Xu, X. (1996), A density-based algorithm for discovering clusters in large spatial database with noise, International conference on knowledge discovery in data, 226-231.
[13]Fiesler, E. (1993), Neural network classification and formalization, Computer standards and interfaces, 16(3), 231-239.
[14]Giannella, C., Han, J. and Pei, J. (2003), Mining frequent patterns in data streams at multiple time granularities, Next generation data mining, AAAI/MIT, 191-212.
[15]Han, J., Cheng, H., Xin, D. and Yan, X. (2007), Frequent pattern mining: current status and future directions, Data mining and knowledge discovery, 15(1), 55-86.
[16]Han, J., Pei, J., and Yin, Y. (2000), Mining frequent patterns without candidate generation, ACM International conference on management of data, 29(2), 1-12.
[17]Han, J., Pei, J., Yin, Y. and Mao, R. (2004), Mining frequent patterns without candidate generation: A frequent-pattern tree approach, Data mining and knowledge discovery, 8, 53-87.
[18]Hou, W., Yang, B., Zhou, Z. and Wu, C. S. (2008), An adaptive frequent itemset mining algorithm for data stream with concept drifts, International conference on computer science and software engineering, 382-385.
[19]Houtsma, M. and Swami, A. (1995), Set-oriented mining for association rules in relational databases, International conference on data engineering, 25-33.
[20]Jiang, N. and Gruenwald, L. (2006), CFI-Stream: Mining closed frequent itemsets in data streams, International conference on knowledge discovery and data mining, 592-597.
[21]Jin, R., and Agrawal, G. (2005), An algorithm for in-core frequent itemset mining on streaming data, International conference on data mining, 210-217.
[22]Koh, J. L. and Shieh, S. F. (2004), An efficient approach for maintaining association rules based on adjusting fp-tree structures, International conference on database systems for advanced applications, 417-424.
[23]Li, Hua-Fu and Lee, Suh-Yin (2009), Mining frequent itemsets over data streams using efficient window sliding techniques, Expert systems with applications, 36(2), Part 1, 1466-1477.
[24]Lee, C. H., Lin, C. R. and Chen, M. S. (2001), On mining general temporal association rules in a publication database, International conference on data mining.
[25]Liu, B., Hsu, W. and Ma, Y. (1999), Mining association rules with multiple minimum supports, International conference on knowledge discovery in data , 337-341.
[26]Lucchese, C., Orlando, S. and Perego, R. (2006), Fast and memory efficient mining of frequent closed itemsets, Transactions on knowledge and data engineering, 18(1), 21-36.
[27]Manku, G. S. and Motwani, R. (2002), Approximate frequency counts over data streams, International conference on very large data bases, 346-357.
[28]Mobasher, B., Dai, H., Luo, T. and Nakagawa, M. (2001), Efficient personalization based on association rule discovery from web usage data, Workshop on web information and data management.
[29]Park, J. S., Chen, M. S. and Yu, P. S. (1995), An effective hash-based algorithm for mining association rules, Association for computing machinery special interest group on management of data, 24(2), 175-186.
[30]Pasquier, N., Bastide, Y., Taouil, R. and Lakhal, L. (1999), Discovering frequent closed itemsets for association rules, International conference on database theory, 398-416.
[31]Pei, J., Han, J., Mortazavi-Asl, B., Pinto, H., Chen, Q., Dayal, U. and Hsu, M. C. (2001), PrefixSpan: Mining sequential patterns efficiently by prefix-projected pattern growth, International conference on data engineering, 215-224.
[32]Quinlan, J. R. (1986), Induction of decision trees, Machine learning, 1(1), 81-106.
[33]Quinlan, J. R. (1996), Improved use of continuous attributes in C4.5, Journal of artificial intelligence research, 4, 77-90.
[34]Raïssi, C., Poncelet, P. and Teisseire, M. (2007), Towards a new approach for mining frequent itemsets on data stream, Journal of intelligent information systems, 28(1), 23-36.
[35]Wang, J., Han, J., and Pei, J. (2003), CLOSET+: searching for the best strategies for mining frequent closed itemsets, International conference on knowledge discovery and data mining, 236-245.
[36]Xu, Y., Yu, J. X., Liu, G. and Lu, H. (2002), From path tree to frequent patterns: A framework for mining frequent patterns, International conference on data mining, 514-521.
[37]Yen, S. J. and Chen, A. L. P. (2001), A graph-based approach for discovering various types of association rules, Transactions on knowledge and data engineering, 13(5), 839-845.
[38]Yen, S. J. and Lee, Y. S. (2002a), Mining time-gap sequential patterns from transaction databases, Journal of computers, 14(2), 30-46.
[39]Yen, S. J., Lee, Y. S. and Chen, S. W. (2002b), Mining quantitative association rules from transaction databases, National conference on fuzzy theory and its applications, 520-525.
[40]Yen, S. J., Lee, Y. S., Wang, C. K., Wu, J. W. and Ouyang, L. Y. (2009), The studies of mining frequent patterns based on frequent pattern tree, Pacific-Asia conference on knowledge discovery and data mining, Lecture notes in artificial intelligence, LNAI 5476, 232-241.
[41]Yu, J. X., Chong, Z., Lu, H. and Zhou, A. (2004), False positive or false negative: Mining frequent itemsets from high speed transactional data streams, International conference on very large data bases, 204-215.
[42]Zaiane, O. R., El-Hajj, M. and Lu, P. (2001), Fast parallel association rule mining without candidacy generation, International conference on data mining, 665-668.
[43]Zaki, M. J. (2000a), Generating non-redundant association rules, International conference on knowledge discovery in data, 34-43.
[44]Zaki, M. J. (2000b), Scalable algorithms for association mining, Transactions on knowledge and data engineering, 12(3), 372-390.
[45]Zaki, M. J. and Hsiao, C. J. (2002), CHARM: An efficient algorithm for closed itemset mining, International conference on data mining, 99-104.
[46]Zhou, Z. H., Jiang, Y. and Chen, S. F. (2000), A General Neural Framework for Classification Rule Mining, International Journal of Computers, System and Signals, 154-168.
Thesis Usage Permissions
  • The author grants the library royalty-free permission to reproduce the print copy for on-site readers' academic use, publicly available from 2017-07-12.
  • The author authorizes the electronic full-text browse/print service, publicly available from 2017-07-12.

