電子學位論文服務

§ 瀏覽學位論文書目資料

本論文電子全文於2005-06-15起於校外公開使用
本論文紙本於2005-06-15起公開使用

系統識別號	U0002-1406200514222000
DOI	10.6846/TKU.2005.00251
論文名稱(中文)	以動態任務分配為基礎之分散式循序樣本探勘系統
論文名稱(英文)	A Distributed Sequential Pattern Mining System based on Dynamic Task Partition
第三語言論文名稱
校院名稱	淡江大學
系所名稱(中文)	資訊管理學系碩士班
系所名稱(英文)	Department of Information Management
外國學位學校名稱
外國學位學院名稱
外國學位研究所名稱
學年度	93
學期	2
出版年	94
研究生(中文)	周定賢
研究生(英文)	Ting-Hsien Chou
學號	692520454
學位類別	碩士
語言別	繁體中文
第二語言別
口試日期	2005-05-21
論文頁數	41頁
口試委員	指導教授 - 張昭憲委員 - 周清江委員 - 陳安斌委員 - 陳彥良
關鍵字(中)	循序樣本探勘分散式架構關聯規則資料探勘
關鍵字(英)	sequential pattern mining distributed architecture association rules data mining
第三語言關鍵字
學科別分類
中文摘要	循序樣本探勘(sequential pattern mining)可從資料庫找出經常出現的樣本，而且指出樣本中各項目出現的時序，其複雜度遠高於關聯規則式的菜籃分析(Market Basket Analysis)。針對循序樣本探勘目前已有許多方法被提出[1,10-16]，然而，面對日益膨脹的資料庫，這些方法的效能再次受到挑戰。為有效改善大型資料庫的探勘效率，利用網路結合多部電腦的分散式探勘(distributed mining)便開始受到重視[2][4]。為加速大型資料庫的循序樣本探勘，本研究以分散式架構為基礎研製有效的探勘演算法，並據以發展實用的探勘系統。首先，本研究提出任務佇列(task queue)的概念，有效結合靜態與動態任務分配之優點，不但可減輕靜態分配的任務歪斜問題，亦能降低動態分配頻繁的通訊負擔。其次，為使探勘完成後之結果彙整更有效率，本研究也充分利用閒置節點來進行探勘結果整合。此外，我們特別採用PrefixSpan[1]做為基礎演算法，以便有效控制任務間的獨立性。為評估系統效能，我們分別使用2、4、8、16及32部電腦進行分散式探勘實驗，數據顯示本系統不但能有效降低探勘時間，同時具有良好的加速比(speedup ratio)。此結果驗證了提出方法之有效性，也顯示本系統處理大型資料庫之潛能。
英文摘要	Sequential pattern mining can discover frequent patterns in database and point out the sequence of items in patterns. It is more complex than traditional association rule mining. In the past few years, there are many efficient sequential pattern mining methods have been proposed. However, their efficiency are being challenged because the size of real-life database is drastically increased. In order to alleviate the problem of mining large database, the researchers begin paying attention to perform mining under a distributed architecture. In this thesis, a distributed sequential pattern mining system are developed to speed up the mining process. The proposed system is based on a novel concept of “task queue”, which effectively abates task askew of static task partition and communication overhead of dynamic task partition. For the purpose of collecting the mining result efficiently, the first idle node is assigned to collect the resultant patterns. Besides, to maintain the independency of dispatched tasks, we adopt PrefixSpan as the outline algorithm. Finally , we performed a serious of experiments on 1, 2, 4, 8, 16 and 32 processors respectively. A fine speedup ratio is obtained according to the experimental results. It clearly demonstrate that the system has potential to deal with large database .
第三語言摘要
論文目次	目錄第一章緒論 1 第二章循序樣本探勘方法簡介 6 2.1 PREFIXSPAN演算法 7 2.2 分散式資料探勘演算法 10 2.3 演算法評析 12 第三章分散式資料探勘系統 14 3.1系統架構 14 3.2系統運作模式 15 3.3探勘過程之任務分配方式 17 第四章實驗結果 27 4.1實驗環境 27 4.2實驗結果 27 4.3實驗結果比較 31 第五章結論 34 參考文獻 35 附錄 37 附錄1 37 附錄2 39 表目錄表一範例交易資料庫 8 表二一階頻繁項目之投影資料庫 9 表三表一中的範例資料庫產生的循序樣本 9 表四探勘任務執行時間 28 表五產生一階頻繁項目集與二階頻繁項目集之比較 31 表六動態任務分配資料庫特性 32 表七動態任務分配加速比 33 圖目錄圖一系統運作模式 15 圖二樣本收集步驟 17 圖三靜態任務分配 18 圖四動態任務分配 19 圖五 Z字型靜態任務分配 25 圖六五萬筆資料加速比趨勢圖 29 圖七七萬五千筆資料加速比趨勢圖 29 圖八十萬筆資料加速比趨勢圖 30
參考文獻	[1] J. Pei, J. Han, B. Mortazavi-Asl, H. Pinto, Q. Chen, U. Dayal, and M.-C. Hsu,“PrefixSpan: Mining sequential patterns efficiently by prefix-projected pattern growth”,In Proc. Int. Conf. Data Engineering (ICDE ’01), Heidelberg, Germany, April 2001, pp.215-224. [2] Valerie Gualnik , George Karypis ,“ Paralel tree-projection-based sequence mining algorithms ”,Parallel Computing ,30 (2004) 443-472 . (ELSEVIER) [3] Agrawal R, Srikant R.“Fast algorithm for mining association rules”.In: Proceedings of the 20th International Conference on VLDB. Santiago, 1994. 487~499. [4] Cheung, D.W. , Ng , V.T; Fu , A.W. , Yongjian Fu , “Efficient Mining of Association Rule in Distributed Databases ” , IEEE Transaction on Knowledge and Data Emgomeeromg , Vp; 8 ,No.6 .Dec 1996 [5] T.H. Cormen, C.E. Leiserson, R.L. Rivest, Introduction to Algorithms, MIT Press, McGraw-Hill,New York, NY, 1990. [6] A. Grama, A. Gupta, G. Karypis, V. Kumar, “Introduction to Parallel Computing: Design and Analysis of Algorithms, 2nd ed.”, Addison Wesley Publishing Company, Redwood City, CA, 2003. [7] J. S. Park; Ming-S yan Chen and Philip S. Yu , “An Effective Hash-Based Algorithm for Mining Association Rules,” Proceedins Management of Data ,May 1995 ACM SIGMOD international conference on Management of Data , May 1995,pp.175-186. [8] J. Han, J. Pei, and Y. Yin, “Mining frequent patterns without candidate generation”, In Proc. ACM-SIGMOD Int. Conf. Management of Data (SIGMOD’00), Dallas, TX, May 2000, pp.1-12. [9] J. Pei, J. Han, B. Mortazavi-Asl, H. Zhu, “Mining access pattern efficiently from web logs”, in proceedings of Pacific-Asia Conference on Knowledge Discovery and Data Mining, 2000, pp. 396-407. [10] R. Srikant and R. Agrawal. Mining sequential patterns. In 11th Int. Conf. Data Engineering, 1995. [11] R. Agrawal and R. Srikant. “Mining sequential patterns: Generalizations and per- formance improvements”. Proc. of the Fifth Int'l Conference on Extending Database Technology, 1995. [12] M. J. Zaki, “SPADE: An efficient algorithm for mining frequent sequences”, Machine Learning, vol. 1, no. 1~2, 2001, pp. 31-60. [13] http://www-sal.cs.uiuc.edu/~hanj/ [14] J. Pei, J. Han and W. Wang, “Mining Sequential Patterns with Constraints in Large Databases,” CIKM’02, Nov. 4-9, 2002, McLean, Virginia, USA. [15] Y. L. Chen, S. S. Chen, and P. Y. Hsu, “Mining hybrid sequential patterns and sequential rules”, Information Systems, Vol. 27, No. 5, 2004, pp. 345-362. [16] D. E. Brown, and R. B. Oxford, “Data Mining Time Series with Applications to Crime Analysis,” Proceeding of the 2001 IEEE conference, pp. 1453-1458. [17] P. Bertone and M. Gerstein, “Integrative Data Mining: The New Direction in Bioinformatics,” IEEE Engineering in Medicine and Biology, July/August 2001, pp. 33-40. [18] R.Agrawal and J.C. Shafer, “Parallel mining of association rules”, IEEE Transactions on Knowledge and Data Engineering, 8(6), pp962-969, 1996.
論文全文使用權限	校內：校內紙本論文立即公開同意電子論文全文授權校園內公開校內電子論文立即公開校外：同意授權校外電子論文立即公開

返回頁首

如有問題，歡迎洽詢！
圖書館數位資訊組　(02)2621-5656 轉 2487 或來信