§ Browse Thesis Bibliographic Record
  
System ID U0002-2906201210470800
DOI 10.6846/TKU.2012.01273
Title (Chinese) 有效率的探勘客戶消費行為方法的研究
Title (English) Efficient Approaches for Mining Customer Behaviors
Title (third language)
University 淡江大學 (Tamkang University)
Department (Chinese) 管理科學學系博士班
Department (English) Doctoral Program, Department of Management Sciences
Foreign degree university name
Foreign degree college name
Foreign degree institute name
Academic year 100
Semester 2
Year of publication 101 (2012)
Author (Chinese) 王秋光
Author (English) Chiu-Kuang Wang
Student ID 895620093
Degree Doctoral
Language English
Second language
Date of oral defense 2012-06-06
Number of pages 131
Oral defense committee Advisor - 顏秀珍
Advisor - 歐陽良裕
Committee member - 莊忠柱
Committee member - 曹銳勤
Committee member - 陳景祥
Committee member - 李御璽
Committee member - 謝邦昌
Committee member - 鄭宇庭
Keywords (Chinese) 資料探勘
頻繁項目集
頻繁封閉項目集
資料流
消費行為模式
Keywords (English) Data mining
Frequent itemset
Frequent closed itemset
Data stream
Consumption behaviors
Keywords (third language)
Subject classification
Abstract (Chinese)
Data-rich databases have become commonplace with digitization, and extracting important information from them is the central task of data mining. In commercial applications, we can analyze a transaction database to find products that are frequently purchased together and the association behavior whereby customers who buy certain products are likely to buy others as well; that is, to mine frequent itemsets. In this dissertation we propose a highly efficient algorithm for mining frequent itemsets that outperforms previous work in both execution time and memory usage.
However, new transactions are generated continuously and old transactions must be removed, and re-mining the original data wastes time rediscovering information that is already known. An environment in which new data keep arriving and old data keep being removed over time is called a data stream, and finding all frequent itemsets over data streams has therefore become an active research topic. In addition, a transaction database may contain a very large number of frequent itemsets, and this flood of information can make decision-making difficult. Researchers have therefore proposed closed itemsets: an itemset is closed if its support is greater than the support of every proper superset, and all frequent itemsets can be derived from the frequent closed itemsets. We also propose an efficient algorithm that, as data are added or removed, never re-reads the original database; it obtains the updated closed itemsets by operating only on the added or deleted transactions and the previously mined closed itemsets.
Consumable products are purchased very frequently compared with other merchandise. Although the profit on a single consumable item may not be as high as that on appliances or electronics, the accumulated profit is far from small, so seizing the right moment to promote consumable products contributes substantially to overall profit; frequent itemsets, however, provide no information about promotion timing. We therefore propose a novel data mining approach that, for a given consumable product, discovers the consumption behaviors of customers with different characteristics. Based on a customer's background attribute values and the quantity of the product purchased this time, this consumption behavior lets us correctly predict when the customer will need the product again, so that the product can be marketed to that customer at the right moment.
Abstract (English)
Mining frequent itemsets discovers groups of items that appear together more often than a user-specified threshold. Many approaches apply the FP-tree structure to improve on the FP-Growth algorithm, which must recursively construct sub-trees. Although these approaches avoid recursively constructing many sub-trees, they still suffer from a large search space, so their performance degrades when the database is massive or the minimum support threshold is low. To reduce the search space and speed up the mining process, we propose an efficient algorithm for mining frequent itemsets based on the frequent pattern tree. Our algorithm generates a sub-tree for each frequent item and then generates candidates in batches from this sub-tree. Each candidate generation produces only a small set of candidates, which significantly reduces the search space.
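As a point of reference for the mining task described above, the following Python sketch is a minimal level-wise (Apriori-style) frequent itemset miner. It only illustrates the problem the dissertation addresses, not the item-prefix-tree algorithm proposed in it; the toy database and the 50% threshold in the example are made up for illustration.

    from collections import defaultdict

    def frequent_itemsets(transactions, min_support):
        """Naive level-wise miner: count 1-itemsets, then repeatedly join
        frequent k-itemsets and re-count.  The dissertation's algorithm
        instead builds an item-prefix tree per frequent item to prune
        candidates far more aggressively."""
        n = len(transactions)
        transactions = [frozenset(t) for t in transactions]
        counts = defaultdict(int)
        for t in transactions:
            for item in t:
                counts[frozenset([item])] += 1
        current = {s for s, c in counts.items() if c / n >= min_support}
        result, k = {}, 1
        while current:
            for s in current:
                result[s] = counts[s]
            # join frequent k-itemsets to form (k+1)-itemset candidates
            k += 1
            candidates = {a | b for a in current for b in current if len(a | b) == k}
            counts = defaultdict(int)
            for t in transactions:
                for cand in candidates:
                    if cand <= t:
                        counts[cand] += 1
            current = {c for c in candidates if counts[c] / n >= min_support}
        return result

    # Toy example: 4 transactions, minimum support 50%
    db = [{'A', 'C', 'D'}, {'B', 'C', 'E'}, {'A', 'B', 'C', 'E'}, {'B', 'E'}]
    for s, c in frequent_itemsets(db, 0.5).items():
        print(sorted(s), c)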
However, a transaction database may contain so many frequent itemsets that it is difficult for a decision maker to act on them. Mining frequent closed itemsets has therefore become a major research issue, since the set of frequent closed itemsets is a condensed and complete representation from which all the frequent itemsets can be derived. Transactions accumulate rapidly and some become outdated, so the frequent closed itemsets change as new transactions are added to, or old transactions are deleted from, the transaction database. Updating the previously discovered closed itemsets under such additions and deletions is a challenge. We propose an efficient algorithm for incrementally mining closed itemsets without scanning the original database: it updates the closed itemsets by performing a small set of operations on the previous closed itemsets and the added or deleted transactions, without searching through the previous closed itemsets.
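To make the incremental idea concrete, here is a small Python sketch of one standard way to maintain closed itemsets under insertions: the closed itemsets of a database after inserting a transaction t are the old closed itemsets together with c ∩ t for every old closed itemset c, plus t itself. This is a generic illustration of that textbook property under made-up data, not the MFCI algorithm, which keeps its own Item and Content tables and also handles deletions.

    def add_transaction(closed, new_tx):
        """Incrementally maintain closed itemsets and their supports after
        inserting one transaction, without rescanning the database.
        `closed` maps frozenset -> support."""
        t = frozenset(new_tx)
        updates = {t: 1}                      # t itself is always closed after the insert
        for c, sup in closed.items():
            inter = c & t
            if inter:
                # c & t is closed in the new database; its support is the largest
                # support among old closed sets whose intersection with t equals
                # c & t, plus one for the new transaction
                updates[inter] = max(updates.get(inter, 0), sup + 1)
        result = dict(closed)                 # old closed itemsets stay closed
        result.update(updates)                # itemsets contained in t gain support
        return result

    # Example: start from one transaction {C,D,W}, then insert {A,C,T,W}
    closed = {frozenset('CDW'): 1}
    closed = add_transaction(closed, 'ACTW')
    for iset, sup in sorted(closed.items(), key=lambda x: -x[1]):
        print(''.join(sorted(iset)), sup)
    # prints: CW 2, CDW 1, ACTW 1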
Compared with other commodities, consumable products are purchased very frequently. Although the profit on a single consumable product may be lower than that on appliances or electronics, the accumulated profit is substantial, so choosing the right timing for sales promotions of consumable products is an important task. Sequential pattern mining only captures the order in which most customers purchase products; it cannot predict when a customer will need a product again. For consumable products, the time of the next purchase is usually related to the quantity purchased in the current transaction. We propose a novel data mining algorithm to discover the consumption behaviors of most customers, from which the next purchase time of an item can be predicted based on the quantity of the item purchased this time.
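The prediction step can be illustrated with a deliberately simple Python sketch: estimate how many days one unit of an item lasts a customer from past purchase intervals, then project the next purchase date from the quantity just bought. The function names and the toy history below are hypothetical; the dissertation's CAR algorithm additionally groups customers by their attribute values rather than treating each customer in isolation.

    from datetime import date, timedelta
    from statistics import median

    def consumption_rate(history):
        """Estimate days consumed per purchased unit from one customer's
        purchase history of one item: a list of (purchase_date, quantity),
        oldest first."""
        rates = []
        for (d1, q1), (d2, _) in zip(history, history[1:]):
            days = (d2 - d1).days
            if q1 > 0 and days > 0:
                rates.append(days / q1)        # days one unit lasted
        return median(rates) if rates else None

    def predict_next_purchase(history):
        """Predict when the customer will need the item again, based on the
        quantity bought in the most recent transaction."""
        rate = consumption_rate(history)
        if rate is None:
            return None
        last_date, last_qty = history[-1]
        return last_date + timedelta(days=round(rate * last_qty))

    # Example: 2 units lasted ~14 days and 3 units ~21 days, i.e. ~7 days/unit
    history = [(date(2012, 1, 1), 2), (date(2012, 1, 15), 3), (date(2012, 2, 5), 4)]
    print(predict_next_purchase(history))      # 4 units * 7 days after 2012-02-05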
Abstract (third language)
Table of contents
Contents	I
List of Tables	III
List of Figures	V
Chapter 1 Introduction	6
Chapter 2 Related Work	12
2.1	Mining Frequent Itemsets	12
2.2	Frequent Closed Itemsets in the Data Streams	13
2.3	Mining Consumption Behaviors	15
Chapter 3 Mining Frequent Itemsets	19
3.1	Preliminary	19
3.2	Search Space Reduced Algorithm	24
3.2.1	Item-prefix Tree Construction	25
3.2.2	Frequent Itemset Generation	28
3.3	Experimental Results	31
Chapter 4 Mining Closed Frequent Itemsets	44
4.1	Preliminary	44
4.2	MFCI Algorithm	46
4.2.1	Algorithm MFCI-add	50
4.2.1.1	Closed Itemset Generator	52
4.2.1.2	The Maintenance of Closed Itemsets after Adding a Transaction	57
4.2.2	Algorithm MFCI-del	68
4.3	Experimental Results	75
4.3.1	Experimental Results on Synthetic Datasets	76
4.3.2	Experimental Results on Real Datasets	85
Chapter 5 Mining Consumption Behaviors	91
5.1	Preliminary	91
5.2	Mining Consumption Association Rules	92
5.2.1  Mining Consumption Patterns	97
5.2.2  Mining Consumption Association Rules	102
5.3  Experimental Results	116
5.3.1  Synthetic Data Generations	116
5.3.2  Experimental Results	118
Chapter 6  Conclusions and Future Work	123
REFERENCES	127
List of Tables
Table 3-1 A Transaction Database TDB	21
Table 3-2 Frequent items and their support counts	21
Table 3-3 A sorted transaction database with minimum support 25%	21
Table 3-4 The memory usages and execution times for the five algorithms on the datasets with average transaction size T = 10	38
Table 3-5 The memory usages and execution times for the five algorithms on the datasets with average transaction size T = 20	38
Table 4-1 A Transaction Database	45
Table 4-2 Content Table for the transaction database in Table 4-1	46
Table 4-3 The Item Table for the transaction database in Table 4-1	47
Table 4-4 The Temp Table after adding T = {ACT}	55
Table 4-5 Transaction database D	63
Table 4-6 Content Table for the transaction database in Table 4-5	63
Table 4-7 Item Table for the transaction database in Table 4-5	63
Table 4-8 The Temp Table after adding T5 = {ACDTW}	64
Table 4-9 The Item Table after adding T5 = {ACDTW}	64
Table 4-10 The Content Table after adding T5 = {ACDTW}	65
Table 4-11 The Temp Table after adding T6 = {CDT}	66
Table 4-12 The Item Table after adding T6 = {CDT}	66
Table 4-13 The Content Table after adding T6 = {CDT}	67
Table 4-14 The Item Table after deleting T1 and T2	72
Table 4-15 The Content Table after deleting T1 and T2	73
Table 4-16 Parameters for IBM data generator	76
Table 5-1 A Transaction Table	94
Table 5-2 Transaction table for item A	95
Table 5-3 Sorted transaction table for item A	96
Table 5-4 Consumption table	97
Table 5-5 The status for 3 consumption intervals	101
Table 5-6 Customer Table	103
Table 5-7 Customer-Consumption Table	104
Table 5-8 Frequent 2-attribute value set	111
Table 5-9 The confidence for the frequent attribute value set	113
Table 5-10 Consumption association rules	114
Table 5-11 The parameters of the synthetic datasets	117
Table 5-12 The parameters for the first experiment	119
Table 5-13 The parameter settings for the second experiment	121
Table 5-14 The comparison of experimental results	122
 
List of Figures
Figure 3-1 The FP-tree for Table 3-1 with minimum support 25%	21
Figure 3-2 The item-prefix tree for item D	28
Figure 3-3 Execution times for the five algorithms on dataset T10I2D100K	32
Figure 3-4 Execution times for the five algorithms on dataset T10I4D100K	33
Figure 3-5 Execution times for the five algorithms on dataset T10I6D100K	33
Figure 3-6 Execution times for the five algorithms on dataset T20I4D100K	36
Figure 3-7 Execution times for five algorithms on dataset T20I6D100K	36
Figure 3-8 Execution times for the five algorithms on dataset T20I8D100K	37
Figure 3-9 Execution times for the five algorithms	40
Figure 3-10 Execution times for algorithms on dataset T10I4D100K (item 10K)	40
Figure 3-11 Execution times for five algorithms on real dataset chess	42
Figure 3-12 Execution times for five algorithms on the real dataset Mushroom	42
Figure 4-1 Execution times for the three algorithms	78
Figure 4-2 Execution times for the three algorithms	79
Figure 4-3 Execution times for the three algorithms when 1K transactions are added and another 1K transactions are deleted	80
Figure 4-4 Execution times for the two algorithms	81
Figure 4-5 Execution times for the two algorithms	82
Figure 4-6 Number of closed itemsets deleted after	83
Figure 4-7 Execution times for the two algorithms after	84
Figure 4-8 Execution times for CFI-Stream-Modify and MFCI	85
Figure 4-9 Execution times of MFCI for adding	86
Figure 4-10 The total number of closed itemsets after adding	87
Figure 4-11 Execution times of MFCI for deleting 406 transactions each time	88
Figure 4-12 The total number of the closed itemsets	88
Figure 4-13 Execution times of MFCI for adding and	89
Figure 4-14 The total number of closed itemsets for each addition and deletion	90
Figure 5-1 Consumption intervals in Table 5-4	100
Figure 5-2 The MinRSup for consumption interval I	101
Figure 5-3 Numeric attribute 2-D projection	108
Figure 5-4 Algorithm CAR	116
Figure 5-5 The experimental results for Series A-D	120
References
[1]Agrawal, R., Gehrke, J., Gunopulos, D. and Raghavan, P. (1998), Automatic subspace clustering of high dimensional data for data mining applications, International conference on management of data, 94-105.
[2]Agrawal, R. and Srikant, R. (1994), Fast algorithms for mining association rules, International conference on very large data bases, 487-499.
[3]Agrawal, R. and Srikant, R. (1995), Mining sequential patterns, International conference on data engineering, 3-14.
[4]Agrawal, R. and Srikant, R. (1996a), Mining sequential patterns: Generalizations and performance improvements, International conference on extending database technology, 3-17.
[5]Agrawal, R. and Srikant, R. (1996b), Mining quantitative association rules in large relational tables, International conference on management of data, 1-12.
[6]Bastide, Y., Lakhal, L., Pasquier, N., Stumme, G. and Taouil, R. (2000), Mining frequent patterns with counting inference, International conference on knowledge discovery and data mining, 2(2), 66-75.
[7]Brin, S., Motwani, R., Ullman, J. and Tsur, S. (1997), Dynamic itemset counting and implication rules for market basket data, International conference on management of data, 255-264.
[8]Cheng, J., Ke Y. and Ng, W. (2008), A survey on algorithms for mining frequent itemsets over data streams, Knowledge and Information Systems, 16(1), 1-27.
[9]Chi, Y., Wang, H., Yu, P. S. and Muntz, R. R. (2004), Moment: maintaining closed itemsets over a stream sliding window, International conference on data mining, 59-66.
[10]Dong, Jie and Han, Min (2007), BitTableFI: An efficient mining frequent itemsets algorithm, Knowledge-Based Systems, 20(4), 329-335.
[11]El-Hajj, M. and Zaiane, O. R. (2003), Non recursive generation of frequent k-itemsets from frequent pattern tree representation, International conference on data warehousing and knowledge discovery, 371-380.
[12]Ester, M., Kriegel, H. P., Sander, J. and Xu, X. (1996), A density-based algorithm for discovering clusters in large spatial database with noise, International conference on knowledge discovery in data, 226-231.
[13]Fiesler, E. (1993), Neural network classification and formalization, Computer standards and interfaces, 16(3), 231-239.
[14]Giannella, C., Han, J. and Pei, J. (2003), Mining frequent patterns in data streams at multiple time granularities, Next generation data mining, AAAI/MIT, 91-212.
[15]Han, J., Cheng, H., Xin, D. and Yan, X.  (2007), Frequent pattern mining: current status and future directions, Data mining and knowledge discovery, 15(1), 55-86.
[16]Han, J., Pei, J., and Yin, Y. (2000), Mining frequent patterns without candidate generation, ACM International conference on management of data, 29(2), 1-12.
[17]Han, J., Pei, J., Yin, Y. and Mao, R. (2004), Mining frequent patterns without candidate generation: A frequent-pattern tree approach, Data mining and knowledge discovery, 8, 53-87.
[18]Hou, W., Yang, B., Zhou, Z. and Wu, C. S. (2008), An adaptive frequent itemset mining algorithm for data stream with concept drifts, International conference on computer science and software engineering, 382-385.
[19]Houtsma, M. and Swami, A. (1995), Set-oriented mining for association rules in relational databases, International conference on data engineering, 25-33.
[20]Jiang, N. and Gruenwald, L. (2006), CFI-Stream: Mining closed frequent itemsets in data streams, International conference on knowledge discovery and data mining, 592-597.
[21]Jin, R., and Agrawal, G. (2005), An algorithm for in-core frequent itemset mining on streaming data, International conference on data mining, 210-217.
[22]Koh, J. L. and Shieh, S. F. (2004), An efficient approach for maintaining association rules based on adjusting fp-tree structures, International conference on database systems for advanced applications, 417-424.
[23]Li, Hua-Fu and Lee, Suh-Yin (2009), Mining frequent itemsets over data streams using efficient window sliding techniques, Expert systems with applications, 36(2), Part 1, 1466-1477.
[24]Lee, C. H., Lin, C. R. and Chen, M. S. (2001), On mining general temporal association rules in a publication database, International conference on data mining.
[25]Liu, B., Hsu, W. and Ma, Y. (1999), Mining association rules with multiple minimum supports, International conference on knowledge discovery in data, 337-341.
[26]Lucchese, C., Orlando, S. and Perego, R. (2006), Fast and memory efficient mining of frequent closed itemsets, Transaction on knowledge and data engineering, 18(1), 21-36.
[27]Manku, G. S. and Motwani, R. (2002), Approximate frequency counts over data streams, International conference on very large data bases, 346-357.
[28]Mobasher, B., Dai, H., Luo, T. and Nakagawa, M. (2001), Efficient personalization based on association rule discovery from web usage data, Workshop on web information and data management.
[29]Park, J. S., Chen, M. S. and Yu, P. S. (1995), An effective hash-based algorithm for mining association rules, Association for computing machinery special interest group on management of data, 24(2), 175-186.
[30]Pasquier, N., Bastide, Y., Taouil, R. and Lakhal, L. (1999), Discovering frequent closed itemsets for association rules, International conference on database theory, 398-416.
[31]Pei, J., Han, J., Mortazavi-Asl, B., Pinto, H., Chen, Q., Dayal, U. and Hsu, M. C. (2001), PrefixSpan: Mining sequential patterns efficiently by prefix-projected pattern growth, International conference on data engineering, 215-224.
[32]Quinlan, J. R. (1986), Induction of decision trees, Machine learning, Vol. 1, 81-106.
[33]Quinlan, J.R. (1996), Improved Use of Continuous Attributes in C4.5, Journal of Artificial Intelligence Research, 77-90.
[34]Raïssi, C., Poncelet, P.  and Teisseire, M.  (2007), Towards a new approach for mining frequent itemsets on data stream, Journal of intelligent information systems, 28(1), 23-36.
[35]Wang, J., Han, J., and Pei, J. (2003), CLOSET+: searching for the best strategies for mining frequent closed itemsets, International conference on knowledge discovery and data mining, 236-245.
[36]Xu, Y., Yu, J. X., Liu, G. and Lu, H. (2002), From path tree to frequent patterns: A framework for mining frequent patterns,  International conference on data mining, 514-521.
[37]Yen, S. J. and Chen, A. L. P. (2001), A graph-based approach for discovering various types of association rules, Transactions on knowledge and data engineering, 13(5), 839-845.
[38]Yen, S. J. and Lee, Y. S. (2002a), Mining time-gap sequential patterns from transaction databases, Journal of computers, 14(2), 30-46.
[39]Yen, S. J., Lee, Y. S. and Chen, S. W. (2002b), Mining quantitative association rules from transaction database, National conference on fuzzy theory and its applications, 520-525.
[40]Yen, S. J., Lee, Y. S., Wang, C. K., Wu, J. W. and Ouyang, L. Y. (2009), The studies of mining frequent patterns based on frequent pattern tree, Pacific-Asia conference on knowledge discovery and data mining, Lecture Notes in Artificial Intelligence, LNAI 5476, 232-241.
[41]Yu, J. X., Chong, Z., Lu, H. and Zhou, A. (2004), False positive or false negative: Mining frequent itemsets from high speed transactional data streams, International conference on very large data bases, 204-215.
[42]Zaiane, O. R., El-Hajj, M. and Lu, P. (2001), Fast parallel association rule mining without candidacy generation, International conference on data mining, 665-668.
[43]Zaki, M. J. (2000a), Generating non-redundant association rules, International conference on knowledge discovery in data, 34-43.
[44]Zaki, M. J. (2000b), Scalable algorithms for association mining, Transactions on knowledge and data engineering, 12(3), 372-390.
[45]Zaki, M. J. and Hsiao, C. J. (2002), CHARM: An efficient algorithm for closed itemset mining, International conference on data mining, 99-104.
[46]Zhou, Z. H., Jiang, Y. and Chen, S. F. (2000), A General Neural Framework for Classification Rule Mining, International Journal of Computers, Systems and Signals, 154-168.
Full-text availability
On campus
The printed thesis will be made available 5 years after submission of the authorization form.
The author agrees to make the electronic full text publicly available on campus.
The on-campus electronic full text will be made available 5 years after submission of the authorization form.
Off campus
Authorization granted.
The off-campus electronic full text will be made available 5 years after submission of the authorization form.
