Electronic Theses and Dissertations Service

§ Browse Thesis Bibliographic Record

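The electronic full text of this thesis has been publicly available off campus since 2013-07-18.
The print copy of this thesis has been publicly available since 2013-07-18.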

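System ID	U0002-0407201116435900
DOI	10.6846/TKU.2011.00113
Thesis title (Chinese)	基於不同相似尺度之多元整合式分群法於基因表現資料的群集分析
Thesis title (English)	Multiple Ensemble Clustering Based on Different Similarity Measures for Gene Expression Data
Thesis title (third language)
University	Tamkang University (淡江大學)
Department (Chinese)	數學學系碩士班
Department (English)	Department of Mathematics
Foreign degree: university
Foreign degree: college
Foreign degree: graduate institute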
Academic year	99 (ROC calendar; 2010-2011)
Semester	2
Publication year	100 (ROC calendar; 2011)
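Author (Chinese)	李牧學
Author (English)	Mu-Hsueh Li
Student ID	696190437
Degree	Master's
Language	Traditional Chinese
Second language
Oral defense date	2011-06-17
Number of pages	42
Committee	Advisor - 吳漢銘; Committee member - 陳怡如; Committee member - 蘇家玉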
Keywords (Chinese)	群集分析; 相關係數; 整合式分群; 相似尺度; 階層式分群法; K 均值法; 分割環繞物件法; 一致性分群法
Keywords (English)	clustering; consensus clustering; ensemble clustering; gene expression; hierarchical clustering tree; K-means; partitioning around medoids; similarity measures
Keywords (third language)
Subject classification
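Chinese abstract	The purpose of cluster analysis of microarray data is to identify genes whose expression indicates similar function under different experimental conditions. Different similarity measures, as well as different clustering methods, can lead to different clustering results. In this study we use three correlation coefficients (Pearson, Kendall, and Spearman) together with the Euclidean distance, and with each of them we run the hierarchical clustering tree (HCT), K-means, partitioning around medoids (PAM), consensus clustering, and ensemble clustering. We integrate these clustering results to obtain the final clustering of the data, with the aim of a more stable result. We illustrate and discuss the proposed method with one simulated data set and one gene microarray data set.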
English abstract	Unsupervised clustering methods have been widely applied to the analysis of gene expression data to identify biologically relevant groups of genes. Using different clustering algorithms with various similarity measures usually results in quite different gene clusters. To lessen these effects, we propose a new clustering method that integrates various clustering algorithms based on three similarity measures. The proposed method, which we call multiple ensemble clustering, averages the consensus results from hierarchical clustering, K-means, and partitioning around medoids based on the Pearson rho, Kendall tau, and Spearman rank correlations. We use a simulated and a real data set to illustrate the proposed method. The validity indices indicate that multiple ensemble clustering provides a much more stable clustering result.
Abstract (third language)
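Table of contents
Contents
Chinese abstract, p. i
English abstract, p. ii
Acknowledgements, p. iii
1 Introduction, p. 1
2 Distance measures between genes, p. 1
3 Clustering methods, p. 4
3.1 Hierarchical clustering tree (HCT), p. 4
3.2 K-means, p. 5
3.3 Partitioning around medoids (PAM), p. 6
3.4 Consensus clustering (CC), p. 7
3.5 Ensemble clustering (EC), p. 8
4 Multiple ensemble clustering (MEC), p. 9
5 Cluster validation indices, p. 10
6 Data, p. 12
6.1 Simulated data, p. 12
6.2 Gene microarray data, p. 13
7 Conclusions and discussion, p. 14
References, p. 15

List of tables
1 Table of method abbreviations. p. 12
2 Estimating the number of clusters in the simulated data with the Connectivity, Silhouette, Dunn, Davies-Bouldin, and Gap validation indices. p. 22
3 Cross-tabulation of the clustering results of MEC and CC_HCT(p) on the simulated data. The labels 1, 2, 3 on the horizontal and vertical axes denote the three clusters obtained, and each entry is the number of items shared by the corresponding pair of clusters. p. 28
4 Cross-tabulation of the clustering results of MEC and CC_HCT(s) on the simulated data. p. 28
5 Cross-tabulation of the clustering results of MEC and CC_HCT(k) on the simulated data. p. 28
6 Estimating the optimal number of clusters in the mouse brain cell microarray data with the Connectivity, Silhouette, Dunn, Davies-Bouldin, and Gap validation indices. p. 32
7 Cross-tabulation of the clustering results of MEC and CC_HCT(d) on the mouse brain cell microarray data. The labels 1, 2 on the horizontal and vertical axes denote the two clusters obtained, and each entry is the number of items shared by the corresponding pair of clusters. p. 38
8 Cross-tabulation of the clustering results of MEC and CC_K-means(d) on the mouse brain cell microarray data. p. 38
9 Cross-tabulation of the clustering results of MEC and CC_PAM(d) on the mouse brain cell microarray data. p. 38
10 Cross-tabulation of the clustering results of MEC and EC_all(d) on the mouse brain cell microarray data. p. 38

List of figures
1 The original data are at the upper left. M items are drawn at random and clustered with the chosen clustering method, and the results are recorded in M(i, j) and I(i, j). After b repetitions, the consensus matrix is computed with the consensus clustering formula, and a clustering method applied to it gives the final clustering. p. 18
2 At the upper left are the consensus matrices obtained with HCT, K-means, and PAM. Summing the three matrices and averaging them gives an ensemble matrix, and a clustering method applied to it gives the final clustering. p. 19
3 Flowchart. p. 20
4 Profile plot of the simulated data with three clusters. p. 21
5 Clustering of the simulated data with Euclidean distance as the similarity measure. In each of panels (a), (b), (c), (d), the left side is the sorted data matrix plot and the right side is the corresponding distance matrix plot. p. 23
6 Clustering results for the simulated data with Pearson correlation as the similarity measure. p. 24
7 Clustering results for the simulated data with Spearman correlation as the similarity measure. p. 25
8 Clustering results for the simulated data with Kendall correlation as the similarity measure. p. 26
9 Clustering results for the simulated data. Panels (a), (b), (c) integrate the different similarity measures using HCT, PAM, and K-means individually; panel (d) is multiple ensemble clustering. p. 27
10 Cross plot of the clustering results of MEC and CC_HCT(p) on the simulated data. The horizontal axis shows the six variables in the data and the vertical axis shows the expression values of the observations. p. 29
11 Cross plot of the clustering results of MEC and CC_HCT(s) on the simulated data. p. 30
12 Cross plot of the clustering results of MEC and CC_HCT(k) on the simulated data. p. 31
13 Clustering of the mouse brain cell microarray data with Euclidean distance as the similarity measure. In each of panels (a), (b), (c), (d), the left side is the sorted data matrix plot and the right side is the corresponding distance matrix. p. 33
14 Clustering results for the mouse brain cell microarray data with Pearson correlation as the similarity measure. p. 34
15 Clustering results for the mouse brain cell microarray data with Spearman correlation as the similarity measure. p. 35
16 Clustering results for the mouse brain cell microarray data with Kendall correlation as the similarity measure. p. 36
17 Clustering results for the mouse brain cell microarray data. Panels (a), (b), (c) integrate the different similarity measures using HCT, PAM, and K-means individually; panel (d) is multiple ensemble clustering. p. 37
18 Cross plot of the clustering results of MEC and CC_HCT(d) on the mouse brain cell microarray data. The horizontal axis shows the six cell samples in the data (neural crest cells and mesoderm cells) and the vertical axis shows the gene expression level. p. 39
19 Cross plot of the clustering results of MEC and CC_K-means(d) on the mouse brain cell microarray data. p. 40
20 Cross plot of the clustering results of MEC and CC_PAM(d) on the mouse brain cell microarray data. p. 41
21 Cross plot of the clustering results of MEC and EC_all(d) on the mouse brain cell microarray data. p. 42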
References
Ashlock, D.A., Kim, E.Y., Guo, L., 2005. Multi-clustering: avoiding the natural shape of underlying metrics. In ANNIE, Vol. 15, 453-461.
Balasubramaniyan, R., Hullermeier, E., Weskamp, N., Kamper, J., 2005. Clustering of gene expression data using a local shape-based similarity measure. Bioinformatics, 21(7), 1069-1077.
Brock, G., Pihur, V., Datta, S., Datta, S., 2008. clValid: an R package for cluster validation. Journal of Statistical Software, Vol. 25, Issue 4.
Chen, B., Tai, P.C., Harrison, R., Pan, Y., 2005. Novel hybrid hierarchical-K-means clustering method (H-K-means) for microarray analysis. CSB Workshops, 105-108.
Dunn, J.C., 1974. Well separated clusters and fuzzy partitions. Journal on Cybernetics, 4, 95-104.
Eisen, M.B. et al., 1998. Cluster analysis and display of genome-wide expression pattern. Proc. Natl. Acad. Sci. USA, 95, 14863-14868.
Handl, J., Knowles, J., Kell, D.B., 2005. Computational cluster validation in post-genomic data analysis. Bioinformatics, 21(15), 3201-3212.
Hardin, J., Mitani, A., Hicks, L., VanKoten, B., 2007. A robust measure of correlation between two genes on a microarray. BMC Bioinformatics, 8:220.
Hartigan, J.A., Wong, M.A., 1979. A k-means clustering algorithm. Applied Statistics, 28, 100-108.
Hornik, K., 2005. A CLUE for CLUster Ensembles. Journal of Statistical Software, Vol. 14, Issue 12.
Kasturi, J., Acharya, R., Ramanathan, M., 2003. An information theoretic approach for analyzing temporal patterns of gene expression. Bioinformatics, 19, 449-458.
Kaufman, L., Rousseeuw, P.J., 1990. Finding Groups in Data: An Introduction to Cluster Analysis. John Wiley and Sons.
Kerr, MA., Churchill, GA., 2001. Statistical design and the analysis of gene expression microarray data. Genetical Research, 77, 123-128.
Kim, E.Y., Kim, S.Y., Ashlock, D., Nam, D., 2009. MULTI-K: accurate classification of microarray subtypes using ensemble k-means clustering. BMC Bioinformatics, 10:260.
Kim, S., Lee, J., 2007. Ensemble clustering method based on the resampling similarity measure for gene expression data. Statistical Methods in Medical Research, 16, 539-564.
Monti, S., Tamayo, P., Mesirov, J., Golub, T., 2003. Consensus clustering: a resampling based method for class discovery and visualization of gene expression microarray data. Machine Learning, 52, 91-118.
Rousseeuw, P.J., 1987. Silhouettes: a graphical aid to the interpretation and validation of cluster analysis. Journal of Computational and Applied Mathematics, 20, 53-65.
Savage, R., Heller, K., Xu, Y., Ghahramani, Z., Truman, W.M., Grant, M., Denby, K.J., Wild, D.L., 2009. R/BHC: fast Bayesian hierarchical clustering for microarray data. BMC Bioinformatics, 10:242.
Yu, Z., Wong, H.S., Wang, H., 2007. Graph-based consensus clustering for class discovery from gene expression data. Bioinformatics, 23(21), 2888-2896.
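Full-text usage rights	On campus: the print thesis is made public 2 years after the authorization form is submitted; the author agrees to make the electronic full text publicly available on campus; the on-campus electronic thesis is made public 2 years after the authorization form is submitted. Off campus: the author agrees to license the thesis to database vendors; the off-campus electronic thesis is made public 2 years after the authorization form is submitted.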
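
The abstract describes averaging consensus results from several clustering algorithms and similarity measures. The following is a minimal sketch of that averaging step only, not the thesis's own code. It assumes the base clusterings (for example HCT, K-means, or PAM on resampled data) have already been run and are supplied as label dictionaries; the function names `consensus_matrix` and `average_matrices` and the toy labels are hypothetical. The consensus matrix follows the usual definition from Monti et al. (2003): the number of runs in which two items share a cluster, divided by the number of runs in which both were sampled.

```python
from itertools import combinations


def consensus_matrix(runs, n):
    """Build an n x n consensus matrix from resampled clustering runs.

    runs: list of dicts, each mapping a sampled item index to its cluster
          label in that run (items not sampled in a run are simply absent).
    """
    together = [[0] * n for _ in range(n)]  # times i and j shared a cluster
    sampled = [[0] * n for _ in range(n)]   # times i and j were both sampled
    for labels in runs:
        for i, j in combinations(sorted(labels), 2):
            sampled[i][j] += 1
            sampled[j][i] += 1
            if labels[i] == labels[j]:
                together[i][j] += 1
                together[j][i] += 1
    cons = [[0.0] * n for _ in range(n)]
    for i in range(n):
        cons[i][i] = 1.0  # an item always co-clusters with itself
        for j in range(n):
            if i != j and sampled[i][j] > 0:
                cons[i][j] = together[i][j] / sampled[i][j]
    return cons


def average_matrices(mats):
    """Element-wise mean of equally sized square matrices."""
    n = len(mats[0])
    return [[sum(m[i][j] for m in mats) / len(mats) for j in range(n)]
            for i in range(n)]


# Toy input (assumed): labels from two hypothetical base clusterings of 4 items.
runs_a = [{0: 0, 1: 0, 2: 1}, {0: 0, 1: 1, 3: 1}, {1: 0, 2: 1, 3: 1}]
runs_b = [{0: 0, 1: 0, 2: 1, 3: 1}]

cons_a = consensus_matrix(runs_a, 4)
cons_b = consensus_matrix(runs_b, 4)
ensemble = average_matrices([cons_a, cons_b])
print(ensemble[0][1])  # 0.75
```

In a full pipeline, one consensus matrix would be computed per combination of algorithm and similarity measure, and a final clustering would be run on the averaged matrix (for instance using one minus the averaged value as a dissimilarity). That final step is omitted here.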