Tamkang University Chueh Sheng Memorial Library (TKU Library)

(Downloading the electronic full text is restricted to Tamkang University IP addresses)
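System ID U0002-3006201015371400
Title (Chinese) 等軸距切片逆迴歸法之非線性流形學習
Title (English) Isometric sliced inverse regression for nonlinear manifolds learning
Institution Tamkang University
Department (Chinese) 數學學系碩士班 (Master's Program, Department of Mathematics)
Department (English) Department of Mathematics
Academic year 98 (ROC calendar)
Semester 2
Year of publication 99 (ROC calendar; 2010)
Author name (Chinese) 姚威廷
Author name (English) Wei-Ting Yao
Student ID 696190452
Degree Master's
Language English
Oral defense date 2010-06-25
Number of pages 29
Oral defense committee Advisor: 吳漢銘
Committee member: 陳君厚
Committee member: 李百靈
Committee member: 吳漢銘
Keywords (Chinese) 階層式群集分析  等軸距特徵映射  非線性維度縮減  非線性流形  秩二橢圓排序  切片逆迴歸法
Keywords (English) Hierarchical clustering  Isometric feature mapping (ISOMAP)  Nonlinear dimension reduction  Nonlinear manifold  Rank-two ellipse seriation  Sliced inverse regression
Subject classification Natural sciences > Mathematics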
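Abstract (translated from Chinese) Sliced inverse regression (SIR) can find effective dimension-reduction directions for exploring the intrinsic structure of high-dimensional data. In this thesis, we address the nonlinear dimension-reduction problem by proposing a hybrid SIR method that uses a geodesic distance approximation; we call this method isometric SIR. The proposed method first computes the pairwise isometric distances between data points. The grouping obtained by applying a clustering method (e.g., hierarchical clustering) or a seriation method (e.g., rank-two ellipse seriation) to this distance matrix then serves as the basis for slicing, so that the classical SIR algorithm can be applied.
We show that isometric SIR can recover the embedded dimensionality and geometric structure of nonlinear manifold data (e.g., the Swiss roll data). We further apply the extracted features to classification problems.
The illustrative examples include general real-world datasets and microarray gene expression datasets. The proposed method is also compared with several existing dimension-reduction methods.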
Abstract (English) Sliced inverse regression (SIR) was introduced to find effective linear dimension-reduction directions for exploring the intrinsic structure of high-dimensional data. In this study, we present isometric SIR for nonlinear dimension reduction, a hybrid of the SIR method that uses a geodesic distance approximation. First, the proposed method computes the isometric distance between data points; the resulting distance matrix is then sliced according to hierarchical clustering results with rank-two ellipse seriation, and the classical SIR algorithm is applied. We show that isometric SIR can recover the embedded dimensionality and geometric structure of a nonlinear manifold dataset (e.g., the Swiss roll). We illustrate how isometric SIR features can further be used for classification problems. Finally, we report and discuss this novel method in comparison to several existing dimension-reduction techniques.
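The three steps named in the abstract (geodesic distances, slicing by clustering, classical SIR) can be illustrated with a minimal Python sketch. It is not the thesis's implementation and rests on several assumptions that the record does not confirm: the rows of the geodesic distance matrix are used as the feature vectors passed to SIR, slices come from average-linkage hierarchical clustering alone (rank-two ellipse seriation is not implemented), and a small ridge term stabilizes the eigenproblem. All function names and parameters (`geodesic_distances`, `hct_slices`, `sir_directions`, `isosir`, `ridge`) are hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.linalg import eigh
from scipy.sparse.csgraph import shortest_path
from scipy.spatial.distance import cdist, squareform


def geodesic_distances(X, n_neighbors=8):
    """Approximate geodesic distances by shortest paths on a k-NN graph."""
    D = cdist(X, X)
    n = D.shape[0]
    # Keep only the edges from each point to its nearest neighbours;
    # zeros in a dense csgraph input are treated as "no edge".
    nearest = np.argsort(D, axis=1)[:, 1:n_neighbors + 1]
    rows = np.repeat(np.arange(n), n_neighbors)
    cols = nearest.ravel()
    graph = np.zeros_like(D)
    graph[rows, cols] = D[rows, cols]
    G = shortest_path(graph, method="D", directed=False)
    if not np.isfinite(G).all():
        raise ValueError("k-NN graph is disconnected; increase n_neighbors")
    return np.minimum(G, G.T)  # enforce exact symmetry


def hct_slices(G, n_slices=8):
    """Cut an average-linkage tree built on G into at most n_slices groups."""
    tree = linkage(squareform(G, checks=False), method="average")
    return fcluster(tree, t=n_slices, criterion="maxclust")


def sir_directions(F, labels, n_dirs=2, ridge=1e-3):
    """Classical SIR on features F, with slices given by integer labels."""
    Fc = F - F.mean(axis=0)
    n, p = Fc.shape
    cov = Fc.T @ Fc / n
    # Covariance of the slice means, weighted by slice proportions.
    between = np.zeros((p, p))
    for s in np.unique(labels):
        in_slice = labels == s
        m = Fc[in_slice].mean(axis=0)
        between += in_slice.mean() * np.outer(m, m)
    # Assumption: a ridge term keeps the covariance positive definite
    # when there are as many features as observations.
    reg = ridge * np.trace(cov) / p
    evals, evecs = eigh(between, cov + reg * np.eye(p))
    top = np.argsort(evals)[::-1][:n_dirs]
    return evals[top], evecs[:, top]


def isosir(X, n_neighbors=8, n_slices=8, n_dirs=2):
    """Geodesic distances -> clustering-based slices -> SIR projections."""
    G = geodesic_distances(X, n_neighbors)
    labels = hct_slices(G, n_slices)
    evals, B = sir_directions(G, labels, n_dirs)
    return evals, (G - G.mean(axis=0)) @ B, labels
```

Calling `isosir(X)` on an `(n, p)` array returns the leading eigenvalues, the `(n, n_dirs)` projected coordinates, and the slice labels. If the neighbourhood graph is disconnected, the sketch raises an error rather than returning infinite distances.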
Table of contents 1 Introduction

2 Isometric sliced inverse regression
2.1 The classical SIR
2.2 Geodesic distance approximation
2.3 SIR for nonlinear manifold learning

3 Slicing strategies for nonlinear manifolds when the response is unavailable
3.1 K-means
3.2 The agglomerative hierarchical clustering tree (HCT)
3.3 Rank-two ellipse seriation (R2E)
3.4 The hierarchical clustering tree with rank-two ellipse seriation (HCTR2E)

4 Some practical issues
4.1 The pinch and short-circuit problem
4.2 Eigen-decomposition for high-dimensional data

5 ISOSIR for nonlinear dimension reduction and data visualization

6 Applications to classification problems
6.1 UCI datasets
6.2 Microarray datasets

7 Conclusion and discussion

References

Figures

List of Tables

1 Characteristics of the selected UCI data sets.
2 Six publicly available microarray datasets.

List of Figures

1 The first two ISOSIR projections of the Swiss roll dataset (right column) using three different slicing schemes (left column): random slicing, K-means slicing and HCTR2E slicing. h = 8 slices were used. The data points within the same slices are color-coded for each slicing scheme.
2 From top to bottom, constant value contour lines of the first three eigenvectors with the corresponding eigenvalues are shown. Note that only two eigenvectors are available in linear SIR.
3 From top to bottom, constant value contour lines of the first three eigenvectors with the corresponding eigenvalues are shown. Note that only two eigenvectors are available in linear SIR.
4 The 2D projection plot of the Swiss roll data achieved by various dimension-reduction methods. A Gaussian kernel with a scale of 0.05 is used in KPCA and KSIR. ISOSIR uses HCTR2E slicing with eight slices.
5 The 2D projection plot of the Swiss roll data with 10 noise dimensions using various dimension-reduction methods. A Gaussian kernel with a scale of 0.05 is used in KPCA and KSIR. ISOSIR uses HCTR2E slicing with eight slices.
6 The projections for the wine data on the estimated first 2D subspace. The colors represent the three different classes.
7 The projections for the lung cancer microarray data on the estimated first 2D subspace.
8 Classification error rates with ten-fold cross-validation against a 1-to-10 dimensionality based on the dimension-reduction variates and the full-dimensional space vector x for nine UCI datasets. A Gaussian kernel with a scale of 0.05 is used for KSIR.
9 Classification error rates with a leave-one-out cross-validation against a 1-to-10 dimensionality based on dimension-reduction variates and the full-dimensional space vector x for six public microarray datasets. A Gaussian kernel with a scale of 0.05 is used for KSIR.
References Aizerman, M., Braverman, E., Rozonoer, L.: Theoretical foundations of the potential function method in pattern recognition learning. Automation and Remote Control 25, 821-837 (1964)

Balasubramanian, M., Schwartz, E.L.: The isomap algorithm and topological stability. Science 295(5552), 7-7 (2002)

Belkin, M., Niyogi, P.: Laplacian eigenmaps for dimensionality reduction and data representation. Neural Computation 15(6), 1373-1396 (2003)

Bian, W., Tao, D.: Manifold regularization for SIR with rate root-n convergence. Advances in Neural Information Processing Systems 22, 117-125 (2009)

Bura, E., Pfeiffer, R.M.: Graphical methods for class prediction using dimension reduction techniques on DNA microarray data. Bioinformatics 19(10), 1252-1258 (2003)

Chen, C.H.: Generalized association plots: information visualization via iteratively generated correlation matrices. Statistica Sinica 12, 7-29 (2002)

Chen, C.H., Li, K.C.: Can SIR be as popular as multiple linear regression? Statistica Sinica 8, 289-316 (1998)

Coifman, R.R., Lafon, S., Lee, A.B., Maggioni, M., Nadler, B., Warner, F., Zucker, S.W.: Geometric diffusions as a tool for harmonic analysis and structure definition of data: diffusion maps. Proc. Natl. Acad. Sci. USA 102, 7426-7431 (2005)

Cook, R.D.: On the interpretation of regression plots. Journal of the American Statistical Association 89, 177-190 (1994)

Cook, R.D.: Graphics for regressions with a binary response. Journal of the American Statistical Association 91, 983-992 (1996)

Cook, R.D.: SAVE: a method for dimension reduction and graphics in regression. Communications in Statistics: Theory and Methods 29, 2109-2121 (2000)

Cook, R.D., Critchley, F.: Identifying regression outliers and mixtures graphically. Journal of the American Statistical Association 95, 781-794 (2000)

Cook, R.D., Ni, L.: Sufficient dimension reduction via inverse regression: a minimum discrepancy approach. Journal of the American Statistical Association 100(470), 410-428 (2005)

Cook, R.D., Ni, L.: Using intraslice covariances for improved estimation of the central subspace in regression. Biometrika 93(1), 65-74 (2006)

Dettling, M., Bühlmann, P.: Supervised clustering of genes. Genome Biology 3(12), research0069.1-0069.15 (2002)

Donoho, D.L., Grimes, C.: Hessian eigenmaps: locally linear embedding techniques for high-dimensional data. Proc. Natl. Acad. Sci. USA 100(10), 5591-5596 (2003)

Fukumizu, K., Bach, F.R., Jordan, M.I.: Kernel dimension reduction in regression. Ann. Statist. 37(4), 1871-1905 (2009)

Garber, M. et al.: Diversity of gene expression in adenocarcinoma of the lung. Proc. Natl. Acad. Sci. USA 98(24), 13784-13789 (2001)

Gather, U., Hilker, T., Becker, C.: A note on outlier sensitivity of sliced inverse regression. Statistics 36(4), 271-281 (2002)

Geng, X., Zhan, D.C., Zhou, Z.H.: Supervised nonlinear dimensionality reduction for visualization and classification. IEEE Trans Syst Man Cybern B Cybern 35(6), 1098-1107 (2005)

Ham, J., Lee, D.D., Mika, S., Schölkopf, B.: A kernel view of the dimensionality reduction of manifolds. ACM International Conference Proceeding Series 69, p. 47, Proceedings of the Twenty-First International Conference on Machine Learning (2004)

Hartigan, J.A., Wong, M.A.: A k-means clustering algorithm. Applied Statistics 28, 100-108 (1979)

Hsing, T.: Nearest neighbor inverse regression. The Annals of Statistics 27(2), 697-731 (1999)

Lee, Y.J., Huang, S.Y.: Reduced support vector machines: a statistical theory. IEEE Transactions on Neural Networks 18, 1-13 (2007)

Li, K.C.: Sliced inverse regression for dimension reduction. Journal of the American Statistical Association 86, 316-342 (1991)

Li, L.: Sparse sufficient dimension reduction. Biometrika 94(3), 603-613 (2007)

Li, C.G., Guo, J.: Supervised isomap with explicit mapping. Proceedings of the First International Conference on Innovative Computing, Information and Control - Volume 3, 345-348 (2006)

Li, L., Yin, X.: Sliced inverse regression with regularizations. Biometrics 64(1), 124-131 (2007)

Murphy, P.M., Aha, D.W.: UCI Repository of Machine Learning Databases. University of California, Department of Information and Computer Science, Irvine, CA (1993)

Ni, L., Cook, R.D.: A robust inverse regression estimator. Statistics & Probability Letters 77(3), 343-349 (2007)

Nilsson, J., Fioretos, T., Höglund, M., Fontes, M.: Approximate geodesic distances reveal biologically relevant structures in microarray data. Bioinformatics 20(6), 874-880 (2004)

Roweis, S., Saul, L.: Nonlinear dimensionality reduction by locally linear embedding. Science 290, 2323-2326 (2000)

Setodji, C.M., Cook, R.D.: K-means inverse regression. Technometrics 46(4), 421-429 (2004)

Smola, A.J., Schölkopf, B.: Sparse greedy matrix approximation for machine learning. In: Proceedings of the 17th International Conference on Machine Learning, 911-918, Stanford University, CA, Morgan Kaufmann Publishers (2000)

Schölkopf, B., Smola, A.J.: Learning With Kernels: Support Vector Machines, Regularization, Optimization, and Beyond. MIT Press, Cambridge, MA (2002)

Tenenbaum, J.B., de Silva, V., Langford, J.C.: A global geometric framework for nonlinear dimensionality reduction. Science 290, 2319-2323 (2000)

Tien, Y.J., Lee, Y.S., Wu, H.M., Chen, C.H.: Methods for simultaneously identifying coherent local clusters with smooth global patterns in gene expression profiles. BMC Bioinformatics 9:155 (2008)

Vlachos, M., Domeniconi, C., Gunopulos, D., Kollios, G., Koudas, N.: Nonlinear dimensionality reduction techniques for classification and visualization. In: Proceedings of the 8th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 645-651 (2002)

Williams, C., Seeger, M.: Using the Nyström method to speed up kernel machines. In: Leen, T.K., Dietterich, T.G., Tresp, V. (eds.) Advances in Neural Information Processing Systems 13, 682-688. MIT Press (2001)

Wu, H.M.: Kernel sliced inverse regression with applications on classification. Journal of Computational and Graphical Statistics 17(3), 590-610 (2008)

Wu, H.M., Lu, H.H.-S.: Supervised motion segmentation by spatial-frequential analysis and dynamic sliced inverse regression. Statistica Sinica 14, 413-430 (2004)

Wu, H.M., Lu, H.H.-S.: Iterative sliced inverse regression for segmentation of ultrasound and MR images. Pattern Recognition 40(12), 3492-3502 (2007)

Wu, H.M., Tien, Y.J., Chen, C.H.: GAP: a graphical environment for matrix visualization and cluster analysis. Computational Statistics and Data Analysis 54, 767-778 (2010)

Wu, Q., Mukherjee, S., Liang, F.: Localized sliced inverse regression. Advances in Neural Information Processing Systems 20, Cambridge, MA: MIT Press (2008)

Yeh, Y.R., Huang, S.Y., Lee, Y.J.: Nonlinear dimension reduction with kernel sliced inverse regression. IEEE Transactions on Knowledge and Data Engineering 21(11), 1590-1603 (2009)

Zhong, W., Zeng, P., Ma, P., Liu, J.S., Zhu, Y.: RSIR: regularized sliced inverse regression for motif discovery. Bioinformatics 21(22), 4169-4175 (2005)
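Thesis usage rights
  • The author agrees to grant a royalty-free license for the print copy, allowing library patrons to reproduce it for academic purposes; publicly available from 2010-07-21.
  • The author agrees to authorize the electronic full-text browsing/printing service; publicly available from 2010-07-21.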

