§ Thesis Bibliographic Record
  
System ID U0002-3006201015371400
DOI 10.6846/TKU.2010.01122
Title (Chinese) 等軸距切片逆迴歸法之非線性流形學習
Title (English) Isometric sliced inverse regression for nonlinear manifolds learning
Institution Tamkang University
Department (Chinese) 數學學系碩士班
Department (English) Department of Mathematics
Academic Year 98 (ROC)
Semester 2
Year of Publication 99 (ROC calendar; 2010)
Author (Chinese) 姚威廷
Author (English) Wei-Ting Yao
Student ID 696190452
Degree Master
Language English
Oral Defense Date 2010-06-25
Number of Pages 29
Examination Committee Advisor - 吳漢銘
Member - 陳君厚
Member - 李百靈
Member - 吳漢銘
Keywords (Chinese) 階層式群集分析
等軸距特徵映射
非線性維度縮減
非線性流形
秩二橢圓排序
切片逆迴歸法
Keywords (English) Hierarchical clustering
Isometric feature mapping (ISOMAP)
Nonlinear dimension reduction
Nonlinear manifold
Rank-two ellipse seriation
Sliced inverse regression
Abstract (Chinese)
Sliced inverse regression (SIR) can find effective dimension-reduction directions for exploring the intrinsic structure of high-dimensional data. In this thesis, we address the nonlinear dimension-reduction problem by proposing a hybrid SIR method based on the geodesic distance approximation, which we call isometric sliced inverse regression (ISOSIR). The proposed method first computes the pairwise isometric distances between data points; the grouping of this distance matrix produced by a clustering method (e.g., hierarchical clustering) or a seriation method (e.g., rank-two ellipse seriation) then serves as the basis for slicing, so that the classical SIR algorithm can be applied.
    We show that ISOSIR can recover the embedded dimensionality and geometric structure of nonlinear manifold data such as the Swiss roll. Furthermore, we apply the extracted eigenvector features to classification problems.
    Illustrative examples include general real-world datasets and microarray gene expression data. The proposed method is also compared with several existing dimension-reduction methods.
Abstract (English)
Sliced inverse regression (SIR) was introduced to find effective linear dimension-reduction directions for exploring the intrinsic structure of high-dimensional data. In this study, we present isometric SIR (ISOSIR) for nonlinear dimension reduction: a hybrid of the SIR method and the geodesic distance approximation. First, the proposed method computes the isometric distances between data points; the resulting distance matrix is then sliced according to hierarchical clustering results with rank-two ellipse seriation, and the classical SIR algorithm is applied. We show that ISOSIR can recover the embedded dimensionality and geometric structure of a nonlinear manifold dataset (e.g., the Swiss roll). We illustrate how ISOSIR features can further be used for classification problems. Finally, we report and discuss this novel method in comparison to several existing dimension-reduction techniques.
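Both abstracts describe the same three-step pipeline: approximate the geodesic distances, form slices by clustering the distance matrix, then run classical SIR on those slices. The following is a minimal Python sketch of that pipeline, offered for concreteness only; it is not the author's implementation. It assumes NumPy, SciPy, and scikit-learn, substitutes plain average-linkage hierarchical clustering for the HCTR2E slicing scheme, and the function names isometric_distances and isosir are hypothetical.

import numpy as np
from sklearn.neighbors import kneighbors_graph
from scipy.sparse.csgraph import shortest_path
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

def isometric_distances(X, n_neighbors=10):
    # ISOMAP-style geodesic approximation: shortest paths on a k-NN graph.
    # Assumes the neighborhood graph is connected (see the pinch and
    # short-circuit discussion referenced in Section 4.1).
    G = kneighbors_graph(X, n_neighbors=n_neighbors, mode="distance")
    return shortest_path(G, directed=False)

def isosir(X, n_slices=8, n_neighbors=10, n_components=2):
    # Step 1: pairwise isometric (geodesic) distances.
    D = isometric_distances(X, n_neighbors)
    # Step 2: slice by clustering the distance matrix (a stand-in for
    # the K-means / HCT / HCTR2E schemes of Section 3).
    Z = linkage(squareform(D, checks=False), method="average")
    labels = fcluster(Z, t=n_slices, criterion="maxclust")
    # Step 3: classical SIR with the cluster labels as slices.
    Xc = X - X.mean(axis=0)
    Sigma = np.cov(Xc, rowvar=False)             # total covariance
    M = np.zeros_like(Sigma)                     # between-slice covariance
    for h in np.unique(labels):
        idx = labels == h
        m = Xc[idx].mean(axis=0)                 # slice mean
        M += idx.mean() * np.outer(m, m)         # weighted by slice proportion
    # SIR directions: leading eigenvectors of Sigma^{-1} M.
    evals, evecs = np.linalg.eig(np.linalg.pinv(Sigma) @ M)
    order = np.argsort(evals.real)[::-1][:n_components]
    return Xc @ evecs.real[:, order]

For a Swiss-roll-type input X (an n-by-p array), isosir(X, n_slices=8) would return two-dimensional projections analogous to the HCTR2E panel of Figure 1.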
Table of Contents
1 Introduction ... 1

2 Isometric sliced inverse regression ... 4
2.1 The classical SIR ... 4
2.2 Geodesic distance approximation ... 5
2.3 SIR for nonlinear manifold learning ... 6

3 Slicing strategies for nonlinear manifolds when the response is unavailable ... 8
3.1 K-means ... 8
3.2 The agglomerative hierarchical clustering tree (HCT) ... 9
3.3 Rank-two ellipse seriation (R2E) ... 9
3.4 The hierarchical clustering tree with rank-two ellipse seriation (HCTR2E) ... 10

4 Some practical issues ... 10
4.1 The pinch and short-circuit problem ... 10
4.2 Eigen-decomposition for high-dimensional data ... 11

5 ISOSIR for nonlinear dimension reduction and data visualization ... 12

6 Applications to classification problems ... 15
6.1 UCI datasets ... 16
6.2 Microarray datasets ... 17

7 Conclusion and discussion ... 17

References ... 18

Figures ... 24

List of Tables

1 Characteristics of the selected UCI data sets ... 16
2 Six publicly available microarray datasets ... 18

List of Figures

1 The first two ISOSIR projections of the Swiss roll dataset (right column) using three different slicing schemes (left column): random slicing, K-means slicing, and HCTR2E slicing. h = 8 slices were used. The data points within the same slice are color-coded for each slicing scheme. ... 24
2 From top to bottom, constant-value contour lines of the first three eigenvectors with the corresponding eigenvalues. Note that only two eigenvectors are available in linear SIR. ... 25
3 From top to bottom, constant-value contour lines of the first three eigenvectors with the corresponding eigenvalues. Note that only two eigenvectors are available in linear SIR. ... 25
4 The 2D projection plot of the Swiss roll data achieved by various dimension-reduction methods. A Gaussian kernel with a scale of 0.05 is used in KPCA and KSIR. ISOSIR uses HCTR2E slicing with eight slices. ... 26
5 The 2D projection plot of the Swiss roll data with 10 noise dimensions using various dimension-reduction methods. A Gaussian kernel with a scale of 0.05 is used in KPCA and KSIR. ISOSIR uses HCTR2E slicing with eight slices. ... 26
6 The projections of the wine data onto the estimated first 2D subspace. The colors represent the three different classes. ... 27
7 The projections of the lung cancer microarray data onto the estimated first 2D subspace. ... 27
8 Classification error rates with ten-fold cross-validation against a 1-to-10 dimensionality, based on the dimension-reduction variates and the full-dimensional space vector x, for nine UCI datasets. A Gaussian kernel with a scale of 0.05 is used for KSIR. ... 28
9 Classification error rates with leave-one-out cross-validation against a 1-to-10 dimensionality, based on the dimension-reduction variates and the full-dimensional space vector x, for six public microarray datasets. A Gaussian kernel with a scale of 0.05 is used for KSIR. ... 29
Full-Text Usage Authorization
On campus
Print copy: released immediately
Electronic full text: authorized for on-campus public access
On-campus electronic full text: released immediately
Off campus
Authorization granted
Off-campus electronic full text: released immediately
