電子學位論文服務

§ 瀏覽學位論文書目資料

本論文電子全文於2009-01-19起於校外公開使用
本論文紙本於2010-01-19起公開使用

系統識別號	U0002-1501200910360100
DOI	10.6846/TKU.2009.00462
論文名稱(中文)	以塔布基因為基礎的兩階段螞蟻分群方法
論文名稱(英文)	A Tabu search based two-stage ant system for automatic clustering
第三語言論文名稱
校院名稱	淡江大學
系所名稱(中文)	資訊管理學系碩士班
系所名稱(英文)	Department of Information Management
外國學位學校名稱
外國學位學院名稱
外國學位研究所名稱
學年度	97
學期	1
出版年	98
研究生(中文)	曾偉哲
研究生(英文)	Wei-Che Tseng
學號	695630656
學位類別	碩士
語言別	繁體中文
第二語言別
口試日期	2009-01-07
論文頁數	58頁
口試委員	指導教授 - 李鴻璋委員 - 梁德昭委員 - 陸承志
關鍵字(中)	資料分群螞蟻演算法基因演算法塔布搜尋法分群效度指標
關鍵字(英)	Clustering Ant Algorithms Genetic Algorithm Tabu Search Cluster Validity Index
第三語言關鍵字
學科別分類
中文摘要	分群就是將一組資料依照相似程度將其分成數個群集（cluster），相同群集內的資料會有較高的相似性，而不同群集內的資料則會有較大的異質性。傳統上常用的方法有k平均值法（k-means）、凝聚法（agglomerative）、分裂法（divisive）等統計應用的方法，近年來則還應用了基因演算法（genetic algorithms, GA）、螞蟻族群系統（ant colony system, ACS）等啟發式的分群方法，其中以k平均值法最被廣泛應用。不過分群最困難的地方就是在於分群數上的選擇，過去常會訂立一個分群效度指標（cluster validity index），再一個一個去計算出擁有最佳分群效度的分群數，但這卻是個非常耗費時間的方式。為了解決使用者必須預先決定分群數的困擾，本研究提出一個兩階段的分群方法，簡稱TAC，第一階段為可自動決定群數的初步螞蟻分群法，此階段主要目的在於建構一個高密集度聚合的分群結構；第二階段則為針對第一階段分群的結果，搭配基因遮罩及分群效度指標的螞蟻分群法，此階段主要目的在於調整第一階段分群的結果，找出最佳的分群數及分群結構。實驗方面，利用UCI Machine Learning Repository及Gerrild and Lantz所提供的資料集，並與六個分群方法進行比較。實驗過程中，wine資料集的分群數與實際群數不符，經過試驗發現PBM指標某項因子的影響力過大，因此我們將此指標稍加修改，結果顯示本研究修改後的指標不但能找出正確的分群數，所提出的方法在分群的建構上也有比較好的結果。
英文摘要	Clustering is a process that organizes a set of datum into a number of clusters according to their similarity, and datum within a cluster are more similar to each other than they are belong to different clusters. There are many common algorithm, like K-means, Agglomerative, and Divisive, which are already perform various practical cases. Recently, numerous heuristic algorithms such as Genetic Algorithm (GA) and Ant Colony System (ACS) have been proposed to the clustering problems. K-means is the most popular technique among above. However, the most difficult problem in clustering is deciding a number of clusters. In the past, the general solution to the problem is that one performs a clustering technique with a cluster validity index which is selected in advance for all possible numbers of clusters in turn to find out a best number of clusters. In order to solve the troublesome problem which user have to determine a number of clusters beforehand, we propose a two-phased clustering algorithm, named TAC. The first phrase is an ant clustering algorithm which is able to automatically decide a number of clusters, and the purpose of this phrase is to construct a high similarity clustering structure. The second phrase is an ant clustering algorithm which is mixed genetic mask and cluster validity index, and the purpose of this phrase is to continuously adjust the clustering structure of the first phrase according to the result of the first one, so that we can find out the best number of clusters and the clustering structure finally. In our experiment, we use four data sets to perform the clustering problem and compare with six clustering techniques. In the process of the experiment, the number of clusters in wine data set do not conform the actual number of clusters in it. In turns of this, we modify one factor of PBM index slightly, and the result reveals that not only can we obtain the correct number of clusters but also can construct a better clustering structure with the modified index in this research.
第三語言摘要
論文目次	目錄III 圖目錄V 表目錄VII 第一章緒論 1 1.1 研究背景與動機 1 1.2 研究目的 2 第二章文獻探討 3 2.1 k平均值法 3 2.2 凝聚法 7 2.3 分裂法 7 2.4 DBSCAN 8 2.5 螞蟻演算法 10 2.6 基因演算法 15 2.7 塔布搜尋法 21 第三章研究方法 24 3.1 參數標記與定義 24 3.2 Ant System-based Clustering Algorithm（ASCA） 26 3.3 Ant Cluster Algorithm（ACA） 30 3.4 基因遮罩 33 3.5 分群效度指標 33 3.5.1 Dunn's index 33 3.5.2 Davies-Bouldin index 34 3.5.3 PBM index 34 3.6 研究架構 36 第四章實驗結果 39 第五章結論 53 參考文獻 54 附錄A 57 iris data set 57 wisconsin breast cancer data set 57 crude oil data set 58 wine data set 58 圖目錄圖1 k平均值法步驟一 4 圖2 k平均值法步驟二 5 圖3 k平均值法步驟三 5 圖4 k平均值法步驟四 6 圖5 k平均值法步驟五 6 圖6凝聚法產生的樹狀結構圖 8 圖7 DBSCAN分群法圖例說明 10 圖8螞蟻覓食概念圖 11 圖9螞蟻族群系統流程圖 12 圖10基因演算法流程圖 16 圖11基因編碼的型式 17 圖12單點交配 18 圖13雙點交配 18 圖14字罩交配 19 圖15突變 20 圖16塔布搜尋法流程圖 23 圖17 ASCA分群演算法 27 圖18 ASCA分群演算法 28 圖19 ASCA流程圖 29 圖20 ACA流程圖 32 圖21基因遮罩 33 圖22研究架構圖 37 圖23本研究分群的流程圖 38 圖24母代遮罩數5的分群效度指標 41 圖25母代遮罩數10的分群效度指標 42 圖26兩母代遮罩數比較圖 43 表目錄表1資料集的分群數、維度及資料點 39 表2第一階段參數ξ及初始分群數 40 表3母代遮罩數5的詳細資訊 41 表4母代遮罩數10的詳細資訊 42 表5不同更新門檻所獲得的分群效度 44 表6各項參數設定 45 表7 iris資料集的分群結果 46 表8 wisconsin breast cancer資料集的分群結果 46 表9 crude oil資料集的分群結果 47 表10 wine資料集的分群結果 47 表11加入常數值與所獲得的分群數 49 表12加入常數值後的分群效度指標 49 表13 iris資料集各群蒐集的資料數 50 表14 wisconsin breast cancer資料集各群蒐集的資料數 50 表15 crude oil資料集各群蒐集的資料數 50 表16 wine資料集各群蒐集的資料數 51 表17各資料集分群的正確率 51 表18 crude oil資料集標準化後的實驗數據 52 表19 crude oil資料集標準化後各群蒐集的資料數 52
參考文獻	[1] 柯景文，禁制搜尋法於動態車輛巡迴路線問題之研究，逢甲大學交通工程與管理研究所碩士論文，台中市，2002。 [2] 周仕雄，「應用螞蟻系統於資料挖礦之集群分析」，國立臺北科技大學生產系統工程與管理研究所碩士論文，台北市，2003。 [3] 周世章，「應用螞蟻族群系統構建群落分析演算法」，逢甲大學交通工程與管理研究所碩士論文，台中市，2004。 [4] MacQueen, J.B. “Some Methods for Classification and Analysis of Multivariate Observations”, Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, 1: 281-297, 1967. [5] Leonard. Kaufman and Peter J. Rousseeuw, Finding Groups in Data: An Introduction to Cluster Analysis, New York: John Wiley & Sons, 1990. [6] Ester M., Kriegel H.P., Sander J. and Xu X., “Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise,” In Proc. 1996 Int. Conf. Knowledge Discovery and Data Mining (KDD’96), pp. 226-231, Portland, OR, Aug. 1996. [7] Dorigo M., Maniezzo V., Colorni, A. “Ant system: optimization by a colony of cooperating agents,” IEEE Transactions on Systems, Man, and Cybernetics-Part B, Vol. 26, pp. 29-41, 1996. [8] Dorigo M., Gambardella L.M. “Ant colony system: a cooperative learning approach to the traveling salesman problem,” IEEE Transactions on Evolutionary Computation, Vol. 1, pp. 53-66, 1997. [9] Holland, J.H., “Adaptation in Natural and Artificial Systems”, Ann Arbor, MI：University of Michigan Press, 1975. [10] Glover, F., “Tabu Search-partⅠ,” ORSA Journal On Computing, Vol. 1, No. 3, pp. 190-206, 1989. [11] Dunn, J.C., “Well separated clusters and optimal fuzzy partitions”, J. Cybern., Vol. 4, pp. 95-104, 1974. [12] Davies, D.L. and Bouldin, D.W., “A cluster separation measure,” IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 1, No. 4, pp.224-227, 1979. [13] Pakhira, M.K., S. Bandyopadhyay, S. and Maulik, U., “Validity index for crisp and fuzzy clusters,” Pattern Recognition, Vol. 37, No. 3, pp. 487-501, 2004. [14] Pan, Shih-Ming and Cheng, Kuo-Sheng, "Evolution-based tabu search approach to automatic clustering, "IEEE Transactions on Systems, Man and Cybernetics. Part C, Applications and Reviews, Vol. 37, No. 5, pp. 827-838, 2007. [15] UCI Machine Learning Repository：http://archive.ics.uci.edu/ml/ [16] Johnson, R.A. and Wichern, D.W., “Applied Multivariate Statistical analysis,” Englewood Cliffs, New Jersey： Prentice-Hall, 1982.
論文全文使用權限	校內：紙本論文於授權書繳交後1年公開同意電子論文全文授權校園內公開校內電子論文立即公開校外：同意授權校外電子論文立即公開

返回頁首

如有問題，歡迎洽詢！
圖書館數位資訊組　(02)2621-5656 轉 2487 或來信