Tamkang University Chueh Sheng Memorial Library (TKU Library)
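System ID: U0002-0606200614065200
Title (Chinese): 應用類神經網路於蛋白質二級結構預測
Title (English): Protein secondary structure prediction by artificial neural networks
University: Tamkang University (淡江大學)
Department (Chinese): 資訊工程學系碩士班
Department (English): Department of Computer Science and Information Engineering
Academic year: 94 (ROC calendar)
Semester: 2
Year of publication: 95 (ROC calendar)
Student name (Chinese): 張建蒼
Student name (English): Jian-Tsang Chang
Student ID: 693190562
Degree: Master's
Language: Chinese
Oral defense date: 2006-06-06
Number of pages: 77
Oral defense committee: Advisor: 許輝煌
Member: 許輝煌
Member: 王俊嘉
Member: 鄭建中
Keywords (Chinese): 類神經網路 (neural network); 蛋白質二級結構 (protein secondary structure)
Keywords (English): Gamma neural network; Protein secondary structure prediction
Subject classification: Applied Science; Computer Science and Information Engineering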
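Abstract (Chinese, translated): The composition of a protein governs its function, and the human body contains hundreds or even thousands of proteins. We therefore want to know what each protein does and how proteins interact with one another. This field is called proteomics, and its main aim is to investigate the functions of the proteins in an organism.
A protein's function is determined by its structure. X-ray crystallography and nuclear magnetic resonance (NMR) can both visualize a protein's three-dimensional structure. However, they are time-consuming and expensive, taking from several weeks to several months, and they also have resolution problems: detailed information may be lost in the experiment. By contrast, with the progress of biotechnology over the past decade, amino acid sequences can now be produced in large quantities, and this technology determines a protein's amino acid sequence quickly and cheaply. We would therefore like to obtain a protein's structure directly from its sequence.
A protein's primary structure determines its secondary structure, the secondary structure determines the tertiary structure, and the quaternary structure is determined in turn. A protein's function depends on its tertiary and quaternary structures, but predicting the tertiary and quaternary structures is not easy. Various methods have been applied to this problem; here we focus on neural network techniques that use amino acid sequence information for structure prediction. Because the primary structure determines the secondary structure and the secondary structure determines the tertiary structure, obtaining the secondary structure from the sequence is the first step toward obtaining the protein's structure.
Proteins have three main secondary structures: alpha helices, beta sheets, and coils, all of which are substructures of the tertiary structure. A protein sequence is composed of 20 kinds of amino acids, each usually denoted by a single letter, and a terminus code marks the two ends of the sequence, so each amino acid is usually encoded as a 21-bit binary number. The time-delay neural network (TDNN) has been widely used for secondary structure prediction, but choosing an appropriate window size is not easy.
In this thesis, we use the gamma neural model, which incorporates the concept of memory depth, to further improve secondary structure prediction. In addition, to obtain more input information, we take the chemical properties of the amino acids into account and derive a new amino acid encoding.
In our experiments, we found that the gamma neural network achieves almost the same prediction performance as the time-delay neural network while taking considerably less time, and that the new encoding does improve the beta-sheet prediction rate. Both techniques can be applied to current neural networks for secondary structure prediction that rely on the window concept and the typical encoding.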
Abstract (English): The composition of proteins in an organism controls its functioning. An organism contains hundreds or even thousands of proteins, so it is desirable to understand the function of each protein and the interactions between proteins. This field is called proteomics; it examines the functioning of proteins in an organism.
A protein's function is determined by its structure. X-ray crystallography and nuclear magnetic resonance (NMR) are two techniques for visualizing the three-dimensional structure of a protein. However, both are expensive and time-consuming: determining a protein's structure with either technique takes weeks or even months. Another problem is resolution, since details may be missing from the results. In contrast, with the advances in biotechnology over the past decade, the primary structure, i.e., the amino acid sequence of the protein, can be found by high-throughput methods, so determining a protein's amino acid sequence is fast and cheap. It is therefore desirable to infer the protein structure directly from the protein sequence.
In nature, the primary structure determines the secondary structure, the secondary structure determines the tertiary structure, and the tertiary structure in turn determines the quaternary structure. Across the four levels, each higher level of structure arises when one more force or interaction acts on the protein. The function of a protein resides in its tertiary and quaternary structures. However, predicting the tertiary or quaternary structure of a protein is a nontrivial task, and different methods have been tried. Here we focus only on neural network techniques that use the sequence information of the protein. Ideally, the secondary structure can be predicted from the sequence, and the tertiary structure can then be determined from the secondary structure, so determining the secondary structure from the sequence is the first step.
There are three major secondary structures: alpha helices, beta sheets, and coils. They are substructures of the tertiary structure. A protein sequence is composed of 20 amino acids, each denoted by a unique one-letter code. A terminus code is added to indicate the two ends of the sequence, so each amino acid in the sequence can be encoded as a 21-bit binary number. The time-delay neural network (TDNN) is commonly used to classify each position of the amino acid sequence into one of the three substructures. However, it is hard to choose a proper window size for the TDNN.
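The 21-bit encoding and the window idea can be illustrated with a short Python sketch. This is not the thesis's implementation: the alphabet ordering, the `-` terminus symbol, the function names, and the example sequence are all assumptions made for illustration. Each residue becomes a one-hot vector over the 20 amino acids plus the terminus code, and the vectors inside a centered window are concatenated to form the network input for the middle residue.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard one-letter codes
TERMINUS = "-"                        # assumed symbol for positions past either end
ALPHABET = AMINO_ACIDS + TERMINUS     # 21 symbols, so a 21-bit one-hot code


def one_hot(symbol):
    """Return a 21-element list with a single 1 at the symbol's index."""
    vec = [0] * len(ALPHABET)
    vec[ALPHABET.index(symbol)] = 1
    return vec


def window_inputs(sequence, window_size):
    """Build one flattened input vector per residue from a centered window."""
    if window_size % 2 == 0:
        raise ValueError("window_size must be odd so the window has a center")
    half = window_size // 2
    # Pad both ends with the terminus code so edge residues get full windows.
    padded = TERMINUS * half + sequence + TERMINUS * half
    inputs = []
    for i in range(len(sequence)):
        vec = []
        for symbol in padded[i:i + window_size]:
            vec.extend(one_hot(symbol))
        inputs.append(vec)
    return inputs


# Hypothetical 8-residue sequence, window of 5 residues.
inputs = window_inputs("MKTAYIAK", window_size=5)
print(len(inputs), len(inputs[0]))
```

With a window of 5, every residue yields 5 × 21 = 105 inputs, which shows how quickly the input layer grows with the window size and why choosing the window is a trade-off.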
In this thesis, the gamma neural model, which has an adaptable memory depth, is tested as a way to further improve prediction accuracy. In addition, to give the neural network more information at its input, the chemical properties of the amino acids are taken into account when encoding the sequence.
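As a rough illustration of what an adaptable memory depth means, the sketch below implements the gamma memory recursion of de Vries and Principe's gamma model for a scalar signal. It is a simplified sketch, not the thesis's network: the function name and parameters are hypothetical, and a real predictor would presumably pass the 21-bit residue vectors through such a memory and train `mu` together with the network weights.

```python
def gamma_memory(signal, order, mu):
    """Run a scalar signal through a gamma memory with `order` taps.

    Tap 0 holds the current input; tap k (k >= 1) follows
        x_k(t) = (1 - mu) * x_k(t - 1) + mu * x_{k-1}(t - 1)
    Returns one list of tap values per time step.
    """
    if not 0.0 < mu <= 1.0:
        raise ValueError("mu must lie in (0, 1]")
    taps = [0.0] * (order + 1)
    history = []
    for u in signal:
        previous = taps
        taps = [u] + [
            (1.0 - mu) * previous[k] + mu * previous[k - 1]
            for k in range(1, order + 1)
        ]
        history.append(taps)
    return history


# mu = 1.0: each tap simply holds the input from k steps earlier.
delay_line = gamma_memory([1, 2, 3, 4], order=2, mu=1.0)

# mu = 0.5: an impulse decays gradually instead of dropping out of the memory.
leaky = gamma_memory([1, 0, 0, 0], order=1, mu=0.5)
print(delay_line[-1], [step[1] for step in leaky])
```

Setting `mu = 1.0` reduces the memory to a plain tapped delay line, which is the fixed window a TDNN uses. Smaller values of `mu` let the same number of taps reach further back, with a mean depth of roughly `order / mu`, at the cost of coarser resolution.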
Our experiments show that the gamma neural network (GNN) takes much less time than the time-delay neural network (TDNN) while producing similar results. The new encoding also yields a higher beta-sheet prediction rate. Both techniques can be applied to existing artificial neural networks that predict protein secondary structure using the window concept and the traditional encoding method.
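The record does not say how the prediction rates were computed. One common convention, assumed here, is the fraction of residues of each true class that are predicted correctly, together with the overall three-state accuracy usually called Q3. The class letters `H`, `E`, `C`, the function name, and the example labels in this sketch are illustrative assumptions, not data from the thesis.

```python
from collections import Counter

CLASSES = "HEC"  # assumed labels: helix, sheet (strand), coil


def per_class_rates(true_labels, predicted_labels):
    """Return the per-class prediction rate and the overall accuracy (Q3)."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label sequences must have the same length")
    if not true_labels:
        raise ValueError("label sequences must not be empty")
    totals = Counter(true_labels)
    correct = Counter(
        t for t, p in zip(true_labels, predicted_labels) if t == p
    )
    # Skip classes that never occur in the true labels to avoid dividing by 0.
    rates = {c: correct[c] / totals[c] for c in CLASSES if totals[c]}
    rates["Q3"] = sum(correct.values()) / len(true_labels)
    return rates


# Made-up labels for an 8-residue example.
rates = per_class_rates("HHHEECCC", "HHCEECCH")
print(rates)
```

Reporting the sheet rate separately matters because a predictor can reach a reasonable overall accuracy while still doing poorly on the less frequent sheet class.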
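Table of contents
Chapter 1 Introduction
1.1 Research motivation and objectives
1.2 Organization of the thesis
Chapter 2 Literature review
2.1 Amino acids and proteins
2.2 Protein secondary structure prediction
2.2.1 Origin and motivation
2.2.2 Overview of prediction methods
2.2.3 Sources of training and test data
2.2.4 Testing methods
2.2.5 Evaluating prediction accuracy
2.3 Neural networks
2.3.1 Introduction
2.3.2 Time-delay neural network
2.3.3 Gamma neural network
Chapter 3 Classification encoding
3.1 Background and motivation
3.2 Related methods
3.3 Traditional encoding
3.4 Classification encoding
Chapter 4 Neural networks for secondary structure prediction
4.1 Time-delay neural network
4.1.1 Network architecture
4.1.2 Operation
4.1.3 Training method
4.2 Gamma neural network
4.2.1 Network architecture
4.2.2 Operation
4.2.3 Training method
Chapter 5 System implementation and experimental results
5.1 System overview
5.1.1 System architecture
5.1.2 System interface
5.2 Time-delay neural network
5.2.1 Window size
5.2.2 Hidden-layer nodes
5.3 Classification encoding
5.4 Gamma neural network
5.4.1 Order parameter
5.4.2 Number of residues covered on the right
5.4.3 Hidden-layer nodes
5.5 Combining classification encoding with the gamma network
Chapter 6 Conclusions and future work
6.1 Conclusions
6.2 Future work
References
English paper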
Figure 2.1: Illustration of protein structure prediction
Figure 2.2: Splitting a temporal sequence for input to a neural network
Figure 3.1: Conceptual diagram of the typical encoding and input method
Figure 3.2: Conceptual diagram of the new encoding
Figure 4.1: Architecture of the TDNN used to predict secondary structure
Figure 4.2: The GNN applied to protein secondary structure prediction
Figure 4.3: Placing data into the gamma memory
Figure 4.4: Architecture of the memory store
Figure 5.1: System architecture
Figure 5.2: Main program screen
Figure 5.3: Network settings screen
Figure 5.4: Data generation screen
Figure 5.5: Training screen (1)
Figure 5.6: Training screen (2)
Figure 5.7: Graphical test results (1)
Figure 5.8: Graphical test results (2)
Figure 5.9: Confusion matrix
Table 2.1: Amino acids and their corresponding codes
Table 3.1: Amino acid classification information
Table 3.2: Encoding of the 20 amino acids
Table 3.3: Classification encoding of the 20 amino acids
Table 5.1: Window size versus prediction rate
Table 5.2: Hidden-layer nodes versus prediction rate
Table 5.3: Comparison of results from the typical and classification encodings
Table 5.4: Comparison of results from the TDNN and the GNN
Table 5.5: Order parameter versus prediction rate
Table 5.6: Number of residues covered on the right versus prediction rate
Table 5.7: Hidden-layer nodes versus prediction rate in the GNN
Table 5.8: Results of combining the two methods

References [1] Akkaladevi, S., Katangur, A. K., Belkasim, S., & Pan, Y. (2004). Protein secondary structure prediction using neural network and simulated annealing algorithm. Conference Proceedings, 26th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC 2004), Vol. 2, pp. 2987-2990 (Vol. 4).

[2] Rost, B., & Sander, C. (1993). Prediction of protein secondary structure at better than 70% accuracy. J. Mol. Biol. 232, 584-599.

[3] Jones, D. T. (1999). Protein secondary structure prediction based on position-specific scoring matrices. J. Mol. Biol. 292, 195-202.

[4] Cuff, J. A., & Barton, G. J. (1999). Evaluation and improvement of multiple sequence methods for protein secondary structure prediction. Proteins 34, 508-519.

[5] Qian, N., & Sejnowski, T. (1988). Predicting the secondary structure of globular proteins using neural network models. J. Mol. Biol. 202, 865-884.

[6] Hsu, H.-H., Fu, L., & Principe, J. C. (1996). Context analysis by the gamma neural network. IEEE International Conference on Neural Networks, 3-6 June 1996, Vol. 2, pp. 682-687.

[7] de Vries, B., & Principe, J. C. (1992). The gamma model - A new neural model for temporal processing. Neural Networks, 5(4), 565-576.

[8] Lamont, O., Hiew Hong Liang, & Bellgard, M. (2001). Data representation influences protein secondary structure prediction using artificial neural networks. The Seventh Australian and New Zealand Intelligent Information Systems Conference, 18-21 Nov. 2001, pp. 411-415.

[9] Zhang, G.-Z., Huang, D.-S., & Wang, H.-Q. (2004). Protein secondary structure prediction based on the amino acids conformational classification and neural network technique. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '04), 17-21 May 2004, Vol. 5, pp. V-573-6.

[10] Hebei University Bioinformatics Center protein database (河北大學生物信息中心蛋白質數據庫), http://hpdb.hbu.cn/thesis/2005/general.asp

[11] Schalkoff, R. J. (1997). Artificial neural networks. McGraw-Hill.

[12] Lesk, A. M. (2002). Introduction to bioinformatics. Oxford University Press.
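Thesis usage permissions
  • Royalty-free authorization granted for library readers to reproduce the print copy for academic purposes; publicly available from 2006-06-09.
  • Authorization granted for the electronic full-text browsing and printing service; publicly available from 2006-06-09.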

