Tamkang University Chueh Sheng Memorial Library (TKU Library)
(Electronic full text may be downloaded only from Tamkang University IP addresses.)
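System ID U0002-2208201610481900
Title (Chinese) 應用切片逆迴歸法於直方圖資料之維度縮減與視覺化
Title (English) Dimension Reduction and Visualization of the Histogram Data Using Sliced Inverse Regression
Institution Tamkang University
Department (Chinese) 數學學系碩士班 (Master's Program, Department of Mathematics)
Department (English) Department of Mathematics
Academic year 104 (ROC calendar)
Semester 2
Year of publication 105 (ROC calendar; 2016)
Author (Chinese name) 蕭敬翰
Author (English name) Jing-Han Xiao
Student ID 604190180
Degree Master's
Language English
Oral defense date 2016-07-21
Number of pages 25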
Examination committee Advisor: 吳漢銘
Member: 陳君厚
Member: 蘇家玉
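Keywords (translated from Chinese) Data visualization  histogram data  dimension reduction  principal component analysis  sliced inverse regression  symbolic data analysis
Keywords (English) Data visualization  histogram data  principal component analysis  sliced inverse regression  symbolic data analysis
Subject classification Natural sciences / Mathematics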
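Abstract (translated from Chinese) In symbolic data analysis (SDA), histogram data is an important research topic, and most work to date has relied on principal component analysis (PCA). In this study, we use an alternative dimension reduction method, sliced inverse regression (SIR), to reduce the dimension of histogram data. SIR is a slice-based sufficient dimension reduction technique that lets us observe, in a low-dimensional space, the structure and information hidden in high-dimensional data. We first use the empirical distribution of the histogram variables to compute the symbolic weighted covariance matrix, and then use the linear combination of histograms method together with matrix visualization techniques to visualize the dimension-reduced histogram data. We use several real data sets to evaluate the method's discriminative ability after dimension reduction and the visualization methods.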
Abstract (English) Dimension reduction of histogram-valued data (hereafter, histogram data) is an active research topic in symbolic data analysis (SDA). Most work so far has focused on extensions of principal component analysis (PCA). In this study, we extend classical sliced inverse regression (SIR), an alternative dimension reduction method, to histogram data. SIR is a popular slice-based sufficient dimension reduction technique for exploring the intrinsic structure of high-dimensional data. We first use the empirical (joint) density of the histogram variables to compute the symbolic weighted variance-covariance matrix. A rule for linear combinations of histograms and the matrix visualization technique are then employed to visualize the projections of the histograms in the low-dimensional subspace. We evaluate the method's low-dimensional discriminative ability and its visualizations on several real data sets, and report a comparison with PCA for histogram data.
Table of contents 1 Introduction
2 Dimension reduction for histogram-valued data
2.1 PCA and SIR
2.2 The distributional approach
2.3 The linear combination of p histograms
3 Visualization of histograms
3.1 Presentation of histogram data matrix
3.2 A histogram of histograms
3.3 The 2D joint histogram
3.4 The 2D scatterplot
4 An example
5 The comparison study
6 Conclusion and discussion
References
[1] Billard, L. and Diday, E., (2003). From the statistics of data to the
statistics of knowledge: symbolic data analysis, Journal of the American
Statistical Association, 98(462), 470-487.
[2] Chen, M., Wang, H., and Qin, Z., (2014). Principal component analysis
for probabilistic symbolic data: a more generic and accurate algorithm.
Advances in Data Analysis and Classification, 9(1), 59-79.
[3] Ichino, M., (2008). Symbolic PCA for histogram-valued data, in Proceedings
IASC. December 5-8, Yokohama, Japan.
[4] Ichino, M. (2011). The quantile method for symbolic principal component
analysis, Statistical Analysis and Data Mining, 4(2), 184-198.
[5] Makosso-Kallyth, S. and Diday, E., (2012). Adaptation of interval PCA
to symbolic histogram variables, Advances in Data Analysis and Classification, 6, 147-159.
[6] Massy, W.F., (1965). Principal components regression in exploratory
statistical research, Journal of the American Statistical Association,
60(309), 234-256.
[7] Nagabhushan, P. and Kumar, P., (2007). Histogram PCA, Advances in
Neural Networks - ISNN 2007 Lecture Notes in Computer Science, 4492,
pp.1012-1021, Springer, Berlin.
[8] Rodriguez, O., Diday, E., and Winsberg, S., (2000). Generalization of
the principal component analysis to histogram data, PKDD2000, Lyon,
2000.
[9] Verde, R. and Irpino, A., (2013). Dimension reduction techniques for
distributional symbolic data, SIS 2013 Statistical Conference, Advances
in Latent Variables - Methods, Models and Applications, University of
Brescia, June, 19-21, 2013.
[10] Wang, H., Chen, M., Li, N., and Wang, L., (2011). Principal component
analysis of modal interval-valued data with constant numerical characteristics,
Int. Statistical Inst.: Proc. 58th World Statistical Congress
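Thesis usage permissions
  • The author grants library patrons a royalty-free license to reproduce the print copy for academic purposes; publicly available from 2016-08-22.
  • The author authorizes the electronic full-text browsing and printing service; publicly available from 2016-08-22.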

