System ID | U0002-2007201709245400 |
---|---|
DOI | 10.6846/TKU.2017.00699 |
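Thesis title (Chinese) | 基於累積切片平均估計的非線性維度縮減法 |
Thesis title (English) | Nonlinear dimension reduction based on the cumulative slicing mean estimation |
Thesis title (third language) | |
University | Tamkang University |
Department (Chinese) | 數學學系碩士班 |
Department (English) | Department of Mathematics |
Foreign degree university | |
Foreign degree college | |
Foreign degree institute | |
Academic year | 105 (ROC calendar) |
Semester | 2 |
Year of publication | 106 (ROC calendar) |
Graduate student (Chinese) | 王子豪 |
Graduate student (English) | Tzu-Hao Wang |
Student ID | 604190206 |
Degree | Master's |
Language | English |
Second language | |
Oral defense date | 2017-07-14 |
Number of pages | 57 |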
Oral defense committee |
Advisor - 吳漢銘 (hmwu@gm.ntpu.edu.tw)
Co-advisor - 黃逸輝 (yhhuang@mail.tku.edu.tw)
Committee member - 蘇家玉 (emilysu@tmu.edu.tw)
Committee member - 陳怡如 (viviyjchen@stat.tku.edu.tw) |
Keywords (Chinese) |
累積切片估計; 等距特徵映射; 流形學習; 非線性維度縮減; 切片逆迴歸 |
Keywords (English) |
cumulative slicing estimation; isometric feature mapping; manifold learning; nonlinear dimension reduction; sliced inverse regression |
Keywords (third language) | |
Subject classification | |
Abstract (Chinese) |
文獻中,對於流形學習的非線性維度縮減已有不少研究。其中,迴歸等軸距切片逆迴歸法(ISOSIR),是屬於一種半監督式的學習演算法,已被提出並証明它可以有效地探索非線性流形資料隱含的幾何結構,例如瑞士捲資料。ISOSIR 是採用 K-均值法做為一個基礎的群集分析,應用到預先計算好的資料集等距距離矩陣。然而,反應變數在群內及群間的順序訊息在群集之後會被忽略,而順序結構是非線性資料很重要的特徵之一。另一方面,假設資料具有類別資訊,等距離矩陣的計算並沒有考慮到這個資訊。在本研究中,我們擴展 ISOSIR 和等軸距累積切片平均估計法,提出一監督式演算法,用以解決上述這兩個問題。我們進行了模擬研究和實際資料分析,結果顯示所提出的方法可以揭示非線性流形資料的幾何結構,同時與監督式的 ISOSIR 表現相當。我們更進一步研究,應用所找出的低維度資料特徵於實際資料的分類及迴歸問題。 |
Abstract (English) |
A number of studies in the literature have addressed nonlinear dimension reduction for manifold learning. Among them, isometric sliced inverse regression (ISOSIR), a semi-supervised learning algorithm, has been proposed and shown to be useful for exploring the embedded geometric structure of nonlinear manifold data sets such as the Swiss roll. ISOSIR applies K-means as a base clustering method to the pre-calculated isometric distance matrix of the data set. However, the ordering information of the response, both within and between the resulting clusters, is ignored after clustering, even though the ordering structure is one of the most important characteristics of a nonlinear manifold data set. In addition, the construction of the isometric distance matrix does not take the class labels of the data into account when they are available. In this study, we address these two defects by proposing supervised extensions of ISOSIR and of isometric cumulative slicing mean estimation. Simulation studies and real data analyses show that the proposed method can reveal the geometric structure of a nonlinear manifold data set, with results comparable to those of the supervised ISOSIR. We further investigate the use of the extracted features in classification and regression problems on real-world data sets. |
Abstract (third language) | |
Table of contents |
1 Introduction
2 Brief review of dimension reduction techniques
2.1 The classical SIR
2.2 The cumulative slicing estimation (CUME)
2.3 The geodesic distance approximation and ISOMAP
2.4 The isometric SIR (ISOSIR)
3 Extensions of ISOSIR and CUME
3.1 The geodesic distance approximation revisited
3.2 The extensions of ISOSIR and ISOCUME
4 The slicing and the seriation strategies
4.1 The slicing strategy for SIR-based methods
4.2 The seriation strategy for CUME-based methods
5 Simulation studies
6 Applications
6.1 Data visualization
6.2 Classification problems
6.3 Regression problems
7 Conclusion and discussion |
Full-text access rights |
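
The abstract refers to a pre-calculated isometric distance matrix but this record does not define it. As a rough illustration only, the sketch below assumes the usual ISOMAP-style approximation (the table of contents pairs the geodesic distance approximation with ISOMAP): each point is linked to its k nearest neighbours, and shortest-path lengths through that graph stand in for geodesic distances. The function name `geodesic_distances`, the parameter `k`, and the semicircle toy data are hypothetical and are not taken from the thesis.

```python
import math


def geodesic_distances(points, k):
    """Approximate geodesic distances with a k-nearest-neighbour graph
    followed by Floyd-Warshall all-pairs shortest paths (assumed
    ISOMAP-style construction, not the thesis's own code)."""
    n = len(points)
    euclid = [[math.dist(p, q) for q in points] for p in points]
    inf = float("inf")
    # Start with no edges; a point is at distance 0 from itself.
    graph = [[0.0 if i == j else inf for j in range(n)] for i in range(n)]
    for i in range(n):
        # Indices of the k nearest neighbours of point i (excluding i).
        neighbours = sorted(
            (j for j in range(n) if j != i), key=lambda j: euclid[i][j]
        )[:k]
        for j in neighbours:
            # Add the edge in both directions so the graph stays symmetric.
            graph[i][j] = graph[j][i] = euclid[i][j]
    # Floyd-Warshall: allow each point m in turn as an intermediate stop.
    for m in range(n):
        for i in range(n):
            for j in range(n):
                via = graph[i][m] + graph[m][j]
                if via < graph[i][j]:
                    graph[i][j] = via
    return graph


# Toy data: 21 points on a unit semicircle (a one-dimensional manifold in 2-D).
points = [(math.cos(i * math.pi / 20), math.sin(i * math.pi / 20)) for i in range(21)]
dist = geodesic_distances(points, k=2)

straight = math.dist(points[0], points[-1])  # chord through the ambient space
along = dist[0][-1]                          # path that follows the curve
print(f"Euclidean: {straight:.3f}, approximate geodesic: {along:.3f}")
```

On this toy curve the Euclidean distance between the two end points is the diameter, while the graph-based distance follows the arc and is therefore close to its length. A matrix of this kind is what a clustering or slicing step would then operate on; the cubic cost of Floyd-Warshall makes this sketch suitable only for small samples.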