System ID | U0002-2208201610481900 |
---|---|
DOI | 10.6846/TKU.2016.00715 |
Title (Chinese) | 應用切片逆迴歸法於直方圖資料之維度縮減與視覺化 |
Title (English) | Dimension Reduction and Visualization of the Histogram Data Using Sliced Inverse Regression |
University | Tamkang University (淡江大學) |
Department (Chinese) | 數學學系碩士班 (Master's Program, Department of Mathematics) |
Department (English) | Department of Mathematics |
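Academic year | 104 (ROC calendar; 2015-16) |
Semester | 2 |
Publication year | 105 (ROC calendar; 2016) |
Student (Chinese) | 蕭敬翰 |
Student (English) | Jing-Han Xiao |
Student ID | 604190180 |
Degree | Master's |
Language | English |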
Oral defense date | 2016-07-21 |
Number of pages | 25 |
Oral defense committee | Advisor: 吳漢銘; committee members: 陳君厚, 蘇家玉 |
Keywords (Chinese) |
資料視覺化; 直方圖資料; 維度縮減; 主成分分析; 切片逆迴歸; 象徵性資料分析 |
Keywords (English) |
Data visualization; histogram data; principal component analysis; sliced inverse regression; symbolic data analysis |
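Abstract (Chinese, translated) |
Histogram data are an important research topic in symbolic data analysis (SDA), and the main line of development so far has relied on principal component analysis (PCA). In this study, we use an alternative dimension reduction method, sliced inverse regression (SIR), to reduce the dimension of histogram data. SIR is a slice-based sufficient dimension reduction technique that lets us observe, in a low-dimensional space, the structure and information hidden in high-dimensional data. We first use the empirical distributions of the histogram variables to compute the symbolic weighted covariance matrix. We then use the linear combination of histograms and matrix visualization techniques to visualize the dimension-reduced histogram data. We evaluate the discriminative power of the reduced data, and the visualization methods, on several real data sets. |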
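The abstracts mention computing a symbolic covariance matrix from the empirical distributions of the histogram variables. The exact weighted formula used in the thesis is not given here, so the following is only a minimal Python sketch under stated assumptions:

- Each histogram's density is uniform within its bins, which is the empirical-density convention popularized by Billard and Diday.
- Every observation has the same weight 1/n.
- Within one observation, different variables are treated as independent, so their joint density is the product of the marginals.

The names `hist_moments` and `symbolic_covariance` are hypothetical.

```python
import numpy as np

def hist_moments(edges, probs):
    """Return E[Y] and E[Y^2] for a histogram with the given bin edges
    (length K+1) and relative frequencies (length K, summing to 1),
    assuming the density is uniform within each bin."""
    edges = np.asarray(edges, dtype=float)
    probs = np.asarray(probs, dtype=float)
    a, b = edges[:-1], edges[1:]
    m1 = np.sum(probs * (a + b) / 2.0)
    m2 = np.sum(probs * (a**2 + a * b + b**2) / 3.0)
    return m1, m2

def symbolic_covariance(data):
    """data[u][j] = (edges, probs) for observation u and variable j.
    Assumes equal weights 1/n and, within each observation, independence
    between variables (joint density = product of the marginals)."""
    n, p = len(data), len(data[0])
    first = np.zeros((n, p))   # E_u[Y_j] for each observation u
    second = np.zeros((n, p))  # E_u[Y_j^2] for each observation u
    for u in range(n):
        for j in range(p):
            first[u, j], second[u, j] = hist_moments(*data[u][j])
    mean = first.mean(axis=0)
    # off-diagonal: under within-observation independence,
    # E_u[Y_j Y_k] = E_u[Y_j] * E_u[Y_k]
    cov = first.T @ first / n - np.outer(mean, mean)
    # diagonal: use E[Y^2] of the uniform-within-bin density, not (E[Y])^2
    np.fill_diagonal(cov, second.mean(axis=0) - mean**2)
    return mean, cov
```

Here is a quick check. Suppose two observations have a first variable that is uniform on [0, 2] and on [2, 4]. The sketch then gives that variable a symbolic variance of 4/3, which is the variance of the pooled uniform distribution on [0, 4].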
Abstract (English) |
The dimension reduction of histogram-valued data (histogram data hereafter) is one of the active research topics in symbolic data analysis (SDA), but work so far has focused mainly on extensions of principal component analysis (PCA). In this study, we extend classical sliced inverse regression (SIR), an alternative dimension reduction method, to histogram data. SIR is a popular slice-based sufficient dimension reduction technique for exploring the intrinsic structure of high-dimensional data. We first use the empirical (joint) density of the histogram variables to compute the symbolic weighted variance-covariance matrix. A linear-combination-of-histograms rule and the matrix visualization technique are then used to visualize the projections of the histograms in the low-dimensional subspace. We evaluate the method's low-dimensional discriminative power and its visualizations through applications to several real data sets, and we report a comparison with PCA for histogram data. |
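For reference, classical SIR on ordinary point data (Li, 1991) works in three steps:

1. Standardize the predictors.
2. Slice the observations by the response.
3. Eigen-decompose the weighted covariance of the slice means.

The sketch below implements that classical algorithm only, not the thesis's histogram version. In the histogram extension described in the abstract, the covariance and slice means would presumably be replaced by their symbolic counterparts; that is an assumption here. The function `sir` and its parameters are hypothetical.

```python
import numpy as np

def sir(X, y, n_slices=5, n_dirs=2):
    """Classical sliced inverse regression on an n x p matrix X.
    Slices follow the ordering of y; pass n_slices=None to use one slice
    per distinct value of y (e.g. class labels for discrimination)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    n, p = X.shape
    mu = X.mean(axis=0)
    cov = np.cov(X, rowvar=False, bias=True)
    # symmetric inverse square root of the covariance matrix
    w, V = np.linalg.eigh(cov)
    cov_inv_sqrt = V @ np.diag(1.0 / np.sqrt(w)) @ V.T
    Z = (X - mu) @ cov_inv_sqrt  # standardized predictors
    if n_slices is None:
        slices = [np.flatnonzero(y == c) for c in np.unique(y)]
    else:
        slices = np.array_split(np.argsort(y, kind="stable"), n_slices)
    # weighted covariance of the slice means of Z
    M = np.zeros((p, p))
    for idx in slices:
        m = Z[idx].mean(axis=0)
        M += len(idx) / n * np.outer(m, m)
    evals, evecs = np.linalg.eigh(M)
    order = np.argsort(evals)[::-1][:n_dirs]
    # map the leading eigenvectors back to the original X scale
    B = cov_inv_sqrt @ evecs[:, order]
    return evals[order], B, (X - mu) @ B
```

The function returns three things:

- The leading eigenvalues, which measure how much of the slice-mean variation each direction captures.
- The directions `B`.
- The low-dimensional projections, which can then be plotted.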
Table of contents |
1 Introduction
2 Dimension reduction for histogram-valued data
2.1 PCA and SIR
2.2 The distributional approach
2.3 The linear combination of p histograms
3 Visualization of histograms
3.1 Presentation of histogram data matrix
3.2 A histogram of histograms
3.3 The 2D joint histogram
3.4 The 2D scatterplot
4 An example
5 The comparison study
6 Conclusion and discussion |