淡江大學覺生紀念圖書館 (TKU Library)
進階搜尋


下載電子全文限經由淡江IP使用) 
系統識別號 U0002-2406200518490100
中文論文名稱 演化樹建立方法之分析與比較
英文論文名稱 Analysis and Comparison of the Construction Methods for Phylogenetic Trees
校院名稱 淡江大學
系所名稱(中) 資訊工程學系碩士班
系所名稱(英) Department of Computer Science and Information Engineering
學年度 93
學期 2
出版年 94
研究生中文姓名 連淑美
研究生英文姓名 Shu-Mei Lian
學號 792190091
學位類別 碩士
語文別 中文
口試日期 2005-06-06
論文頁數 85頁
口試委員 指導教授-許輝煌
委員-施國琛
委員-王慶生
委員-許輝煌
中文關鍵字 演化樹  無權重群組算數平均法  鄰點加入法  最大吝嗇度法  最大可能性法 
英文關鍵字 phylogenetic trees  ML  MP  FM  NJ  UPGMA  bootstrapping 
學科別分類 學科別應用科學資訊工程
中文摘要 在過去,生物分類學家以生物的形態及生理特徵做為物種分類的依據,再佐以解剖學或化石來驗證物種之間的關係,並依照物種的相互關係畫出演化樹。在發現生物分子序列後,科學家嘗試利用這些分子序列來分析物種間的相互關係,並陸續發展出各種不同的方法來建立演化樹。而利用演化樹分析基因間的關係,在生物科技研究方面也扮演重要的角色。本論文將研究5種演化樹建立方法:無權重群組算數平均法 (UPGMA) 、Fitch-Margoliash (FM) 法、鄰點加入法 (NJ) 、最大吝嗇度法 (MP) 及最大可能性法 (ML) ,以及一些評估方法的基本理論﹔然後再利用現有之軟體,對不同的分子序列資料來推導演化樹,比較各種建立演化樹方法所需之執行時間,以bootstrapping統計方法來評估所建立演化樹可能出現的機率。另外再依據模型樹模擬操作分類單元 (OTU,operational taxonomic unit) 的序列資料,用各種方法推導出演化樹後再與模型樹比較正確度,以此比較各種建立演化樹方法之差異。實際的實驗利用seqgen程式及PHYLIP現有之軟體進行,在改變OTU數目比較執行時間的實驗後發現,徹底搜尋法dnaml (ML) 、dnapars (MP) 、fitch (FM) 三者以MP的執行速度較快,FM法在OTU個數少時較ML快,但超過30個OTU後就比ML慢。逐步群集法NJ與UPGMA二者中以UPGMA較快。在準確度分析方面發現5種演化樹建立方法各有優缺點,整體而言在OTU序列長度越小時,ML表現比其他方法好,OTU序列長度越長時,FM與NJ表現較好。
英文摘要 In the past, the biological taxonomist classified the organisms according to the morphological and physiological characters, and confirmed the relationship of species by anatomy or fossils. Then, an evolutionary tree was drawn to show the relationship of species. After discovering the biological molecular sequences, the scientists have attempted to analyze the relationship between taxa by their molecular sequences, and continually developed different methods for constructing phylogenetic trees. And using the phylogenetic trees to analyze the relationship between genes plays an important role in the biotechnology area.
This thesis studies the elementary theories of five construction methods (ML, MP, FM, NJ and UPGMA) and evaluation methods for phylogenetic trees. Then, the software on the Web is used to construct the phylogenetic trees with different molecular sequence data. The execution time of different methods is compared. Bootstrapping is used to estimate the appearance probability of inferred trees from the construction methods. Moreover, a model tree is used to simulate the sequence data of operational taxonomic units (OTUs). Phylogenetic trees are reconstructed with the generated data by the five construction methods, and are compared with the model (correct) tree. The correctness of the phylogenetic tree by the five methods is then analyzed.
The ‘seqgen’ program and the software on the PHYLIP are used to perform the experiments. From the results of the experiment of execution time with different OTU numbers, the MP method is the fastest one in the three exhaustive methods. The FM method is faster than ML with smaller OTU numbers but slower than ML when the OTU number is over thirty. The UPGMA method is faster than NJ in the two stepwise clustering methods. In the aspect of accuracy analysis, the five methods have respective advantages and disadvantages. In general, the ML method is better than others with shorter OTU sequence lengths, the FM and NJ methods are better than others with longer OTU sequence lengths.
論文目次 第一章 簡介 1
1-1 研究動機與目的 1
1-2 相關術語及名詞 3
1-3 分子序列之距離 14
1-4 演化樹建立方法之分類 17
第二章 演化樹建立方法 21
2-1 無權重群組算數平均法 21
2-2 FM (Fitch and Margoliash) 法 24
2-3 鄰點加入法 29
2-4 最大吝嗇度 37
2-5 最大可能性 40
第三章 網路資源 45
3-1 演化樹相關軟體 45
3-2 國家衛生研究院 46
3-3 線上資料庫 47
3-4 Seq-Gen:序列產生軟體 49
第四章 演化樹之評估方法 52
4-1 樹的距離 (tree distance) 52
4-2 統計方法bootstrapping 56
第五章 實驗及結果 60
5-1 執行時間之比較 60
5-1-1 不同分子序列長度 61
5-1-2 不同OTU數目 66
5-2 統計分析 68
5-3 準確度分析 72
第六章 結論與未來研究方向 78
6-1 結論 78
6-2 未來研究方向 79
參考文獻 80
附錄 英文論文86

表目錄
表1.1 N個OTU可建立的無根樹與有根樹數目 7
表1.2 序列1至序列4間之距離矩陣 16
表1.3 Parsimony之演化樹軟體 17
表1.4 距離矩陣之演化樹軟體 18
表1.5 ML與貝氏定理之演化樹軟體 18
表1.6 Quartet之演化樹軟體 19
表1.7 Compatibility之演化樹軟體 19
表1.8 AI及GA之演化樹軟體 19
表2.1 A、B、C及D之距離矩陣 22
表2.2 (AC)、B及D之距離矩陣 23
表2.3 A、B、C、D之距離矩陣 25
表2.4 A、B與CD之距離矩陣 25
表2.5 AB群組與C、D之距離矩陣 26
表2.6 A、C、BD之距離矩陣 27
表2.7 AC群組後與B、D之距離矩陣 28
表2.8 A、B、C、D、E之距離矩陣 33
表2.9 A、B與 (CDE) 之距離矩陣 34
表2.10 AB互為鄰點與C、D、E之距離矩陣 34
表2.11 (AB)、C、(DE)之距離矩陣 35
表2.12 (AB)、C為鄰點與D、E之距離矩陣 36
表2.13 4個OTU之分子序列 37
表5.1 TreeBASE之M884原始名稱與修改後名稱對照表 62
表5.2 ML法與MP法在不同序列長度之執行時間 65
表5.3 不同方法在不同OTU數之執行時間 67
表5.4 NJ法與 UPGMA法在不同OTU數之執行時間 68
表5.5 5種方法其一致樹之距離關係 72
表5.6 各演化樹建立方法在高演化速率之準確度分析(%) 76
表5.7 各演化樹建立方法在低演化速率之準確度分析(%) 77

圖目錄
圖1.1 有根演化樹 4
圖1.2 無根演化樹 5
圖1.3 刻度樹與非刻度樹 5
圖1.4 3 OTUs 之無根演化樹與有根演化樹 7
圖1.5 基因樹與物種樹之關係 9
圖1.6 Jukes與 Cantor所提出之核苷酸置換模型 10
圖1.7 Kimura之兩參數核苷酸置換模型 13
圖1.8 含3個邊長為a,b,c之演化樹 16
圖2.1 以UPGMA所建立之演化樹 23
圖2.2 A、B與 (CD) 之關係 25
圖2.3 (AB) 與C、D之關係 26
圖2.4 (AB) 與C、D所構成之演化樹 27
圖2.5 A、C與 (BD) 之關係 27
圖2.6 (AC) 與B、D之關係 28
圖2.7 (AC) 與B、D所構成之演化樹 29
圖2.8 A、B互為鄰點,C、D互為鄰點 30
圖2.9 星狀樹 31
圖2.10 挑出兩點互為鄰點 31
圖2.11 A、B、C、D、E之星狀樹 33
圖2.12 A、B為鄰點與其他點之關係 34
圖2.13 (AB) 群組與C為鄰點與其他點之關係 35
圖2.14 (ABC) 群組與其他點之關係 36
圖2.15 4個OTU可決定3棵演化樹 37
圖2.16 3棵演化樹在位置1核苷酸變化情形 38
圖2.17 3棵演化樹在位置2核苷酸變化情形 38
圖2.18 3棵演化樹在位置3核苷酸變化情形 38
圖2.19 3棵演化樹在位置4核苷酸變化情形 39
圖2.20 3棵演化樹在位置5核苷酸變化情形 39
圖2.21 3棵演化樹在位置6核苷酸變化情形 39
圖2.22 3個OTU之演化樹及演化路徑 41
圖2.23 由z演化至Xu1 、Xu2的演化樹 42
圖4.1 有3個內部邊的演化樹 53
圖4.2 兩棵由5個OTU所組成的演化樹 54
圖4.3 比較兩棵樹之子樹集合 55
圖4.4 原始資料、取樣資料與推導樹 57
圖4.5 統計次數最高的演化樹 58
圖4.6 執行 consense程式輸出檔之部份資料 59
圖5.1 在Dos下執行Seqgen部份畫面 64
圖5.2 比較ML與MP法在不同序列長度之執行時間 66
圖5.3 各演化樹建立方法在不同OTU個數之執行時間 68
圖5.4 ML法之一致樹 69
圖5.5 MP法之一致樹 70
圖5.6 FM法之一致樹 70
圖5.7 NJ法之一致樹 71
圖5.8 UPGMA法之一致樹 71
圖5.9 A、B、C、D模型樹 73


參考文獻 Adachi, J., and Hasegawa, M. (1996). Model of amino acid substitution in proteins encoded by mitochondrial DNA. J. Mol. Evol., 42, 459-468.

Adachi, J., Waddell, P.J., Martin, W., and Hasegawa, M. (2000). Plastid genome phylogeny and a model of amino acid substitution for proteins encoded by chloroplast DNA. J. Mol. Evol., 50, 348-358.

Bleidorn, C., Vogt, L., and Bartolomaeus, T. (2005). Molecular Phylogeny of Lugworms (Annelida, Arenicolidae) inferred from three genes. Molecular Phylogenetics and Evolution 34, 673-679.

Bluis, J., Nori, R., Hsin-Wei Wang, Pinglei Zhou, Gogarten, J.P. and Dong-Guk Shin, (2001). Comparing Trees in a Phylogenetic Relationship Repository. Bioinformatics and Bioengineering Conference, 2001. Proceedings of the IEEE 2nd International Symposium on IEEE International Symposium on Bioinformatics and Bioengineering (BIBE 2001), 166-173

Bush, R. M., C.A. Bender, K. Subbaro, N.J. Cox and W.M. Fitch. (1999). Predicting the evolution of Human influenza A. Science, 286, 1921-1925.

Cavalli-Sforza, L. L. and A. W. F. Edwards. (1967). Phylogenetic analysis: Models and estimation procedures. Am. J. Hum. Genet., 19, 233-257.

Chang, B. S. W. and M. J. Donoghue. (2000). Recreating ancestral proteins. Trends Ecol. Evol., 15, 109-114.

Cruickshank, R. H. and Thomas, R. H., (1999). Evolution of haplodiploidy in dermanyssine mites (Acari: Mesostigmata). Evolution, 53, 1796-1803.

Dayhoff, M.O., Schwartz, R.M., Orcutt, B.C. (1978). A model of evolutionary change in proteins. In Dayhoff, M.O. (ed.) Atlas of Protein Sequence Structur. , Vol5, Suppl. 3, National Biomedical Research Foundation, Washington DC, pp. 345-352.

Durbin, R., Eddy, S., Krogh, A. and Mitchison, G. (1998). Biological sequence analysis, Cambridge University Press

Efron, B. (1979). Bootstrap methods: another look at the jackknife. Ann. Stat. 7, 1-26.

Eisen, J. A. and Wu, M. (2002). Phylogenetic Analysis and Gene Functional Predictions: Phylogenomics in Action. Theoretical Population Biology. 61, 481-487.

Felsenstein, J. (1973). Maximum likelihood and minimum-steps methods for estimating evolutionary trees from data on discrete characters. Systematic Zoology 22, 240-249.

Felsenstein, J. (1981). Evolutionary trees from DNA sequences: a maximum likelihood approach. J. Mol. Evol., 17, 368-376.

Felsenstein, J. (1982). Numerical methods for inferring evolutionary trees. Q. Rev. Biol. 57, 379-404.

Felsenstein, J. (1983). Statistical inference of phylogenies. J. Roy. Statist. Soc. Ser. A 146, 246–272.

Felsenstein, J. (1985). Confidence limits on phylogenies: an approach using the bootstrap. Evolution 39, 783-791.

Felsenstein, J. (1988). Phylogenies from molecular sequences: inference and reliability. Annu. Rev. Genet. 22, 521-565.

Felsenstein, J. and Churchill, G. (1996). A hidden markov model approach to variation among sites in rate of evolution. Molecular Biology and Evolution, 13, 93-104.

Ferguson, N.M., A.P. Galvani and R.M. Bush. (2003). Ecological and immunological determinants of influenza evolution. Nature 422, 428-433.

Fitch, W. M. (1977). On the problem of discovering the most parsimonious tree. The American naturalist, 111, 223-257.

Fitch, W.M.and Margoliash, E. (1967). Construction of phylogenetic trees. Science 155, 279-284.

Fitch, W.M., R.M. Bush, C.A. Bender and N.J. Cox. (1997). Long term trends in the evolution of H(3) HA1 human influenza type A. Proceedings of the National Academy of Sciences 94, 7712-7718.

Fuellen, G. and Wagele, J.W. and Giegerich, R.(2001). Minimum Conflict - A Divide-And-Conquer Approach to Phylogeny Estimation. Bioinformatics, 17, Pages, 1168-1178

Gascuel, O. (2000). On the Optimization Principle in Phylogenetic Analysis and the Minimum-Evolution Criterion. Mol. Biol. Evol., 17(3), 401-405.

Graur, Dan and Li, Wen-Hsiung. (2000). FUNDAMENTALS OF MOLECULAR EVOLUTION, Sinauer Associates, Inc.

Hasegawa, M., Kishino, H. and Yano, T. (1985). Dating the human-ape splitting by a molecular clock of mitochondrial DNA. J. Mol. Evol., 22, 160-174.

Holmes, S. (2003). Bootstrapping Phylogenetic Trees, Theory and methods. Statistical Science, 18(2), 241–255

Jones, D.T., Taylor, W.R. and Thornton, J.M. (1992). The rapid generation of mutation data matrices from protein sequences. CABIOS. 8, 275-282

Jukes, T. H. and Cantor, C. R. (1969). Evolution of protein molecules. In H. N. Munro, ed., Mammalian Protein Metabolism, pp. 21-132, Academic Press, New York.

Kimura, M. (1980). A simple method for estimating evolutionary rates of base substitutions through comparative studies of nucleotide sequences. J. Mol. Evol., 16, 111-120.

Krane, D. E. and Raymer, M. L. (2003). FUNDAMENTAL CONCEPTS OF BIOINFORMATICS, Person Education, Inc.

Kumar, S. (1996). A Stepwise Algorithm for Finding Minimum Evolution Trees. Mol. Biol. Evol., 13(4), 584-593.

Li, Wen-Hsiung, (1997). Molecular Evolution, Sinauer Associates, Inc.

Michener,C.D. and Sokal,R.R. (1957). A quantitative approach to a problem in classification. Evolution, 11, 130-162.

Misener, S. and Krawetz, S.A. (2000). Bioinformatics Methods and Protocols. Humana Press

Mount, D. W. (2001). Bioinformatics Sequence and Genome Analysis. Cold Spring Harbor Laboratory Press

Nakhleh, L., Roshan, ., St. John, K., Sun, J. And Warnow, T., (2001). Designing Fast Converging Phylogenetic Methods. proceedings of the Ningh International Conference on Intelligent systems for Molecular Biology (ISMB 2001), 190-198.

Pearson, W. R., Robins, G., and Zhang, T. (1999). Generalized neighbor-joining: more reliable phylogenetic tree reconstruction. Mol. Biol. Evol., 16, 806-816.

Penny, D. and Hendy, M.D. (1985). The use of tree comparison metrics. Syst. Zool. 34, 75-82.

Phillips, M. J., Delsuc, F. and Penny, D. (2004). Genome-Scale Phylogeny and the Detection of Systematic Biases. Mol. Biol. Evol., 21(7), 1455–1458.

Rambaut, A. and Grassly, N. C. (1997). Seq-Gen: An application for the Monte Carlo simulation of DNA sequence evolution along phylogenetic trees. Comput. Appl. Biosci. 13, 235-238.

Rest, J. S., and Mindell, D. P. (2003). SARS associated coronavirus has a recombinant polymerase and coronaviruses have a history of host-shifting. Infection, Genetics and Evolution 3 (2003) 219-225.

Robinson, D.F. and Foulds, L.R. (1981). Comparison of phylogenetic trees. Math. Biosci. 53, 131-147.

Rokas, A., Williams, B.L., King, N. and Carroll, S.B. (2003). Genome-scale approaches to resolving incongruence in molecular phylogenies. NATURE, 425, 798-804 OCTOBER 2003

Saitou, N. and Imanishi, T. (1989). Relative efficiencies of the Fitch-Margoliash, maximum-parsimony, maximum-likelihood, minimum-evolution, and neighbor-joining methods of phylogenetic tree construction in obtaining the correct tree. Mol. Biol. Evol., 6, 514-525.

Saitou, N. and Nei, M. (1987). The neighbor-joining method: a new method for reconstructing phylogenetic trees. Mol. Biol. Evol., 4(4), 406-25.

Sankoff, D. (1975). Minimal mutation trees of sequences. SIAM.I. Appl. Math., 28, 35-42.

Sitnikova, T., A. Rzhetsky, and M. Nei. (1995). Interior-branch and bootstrap tests of phylogenetic trees. Mol. Biol. Evol., 12(2), 319-333

Waterman, M.S., and Smith, T.F. (1978). On the similarity of dendrograms. J. Theor. Biol. 73, 789-800.

Whelan, S. and Goldman, N. (2001). A general empirical model of protein evolution derived from multiple protein families using a maximum-likelihood approach. Mol. Biol. Evol., 18, 691-699.

Wiens, J. J. (2004). The role of morphological data in phylogeny reconstruction. Systematic Biology 53(4), 653–661.

Williams, T.L., and Moret, B.M.E. (2003). An investigation of phylogenetic likelihood methods. Proc. 3rd IEEE Symp. on Bioinformatics and Bioengineering (BIBE'03), IEEE Press, 2003, 79-86.

Yang, Z. (1994). Estimating the pattern of nucleotide substitution. J. Mol. Evol., 39, 105-111
論文使用權限
  • 同意紙本無償授權給館內讀者為學術之目的重製使用,於2005-07-19公開。
  • 同意授權瀏覽/列印電子全文服務,於2005-07-19起公開。


  • 若您有任何疑問,請與我們聯絡!
    圖書館: 請來電 (02)2621-5656 轉 2281 或 來信