§ Browse thesis bibliographic record
  
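System ID U0002-2208201610481900
DOI 10.6846/TKU.2016.00715
Thesis title (Chinese) 應用切片逆迴歸法於直方圖資料之維度縮減與視覺化
Thesis title (English) Dimension Reduction and Visualization of the Histogram Data Using Sliced Inverse Regression
Thesis title (third language)
University 淡江大學 (Tamkang University)
Department (Chinese) 數學學系碩士班
Department (English) Department of Mathematics
Foreign degree: university
Foreign degree: college
Foreign degree: graduate institute
Academic year (ROC calendar) 104
Semester 2
Publication year (ROC calendar) 105
Graduate student (Chinese) 蕭敬翰
Graduate student (English) Jing-Han Xiao
Student ID 604190180
Degree Master's
Language English
Second language
Oral defense date 2016-07-21
Number of pages 25
Oral defense committee Advisor - 吳漢銘
Committee member - 陳君厚
Committee member - 蘇家玉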
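Keywords (Chinese) 資料視覺化 (data visualization)
直方圖資料 (histogram data)
維度縮減 (dimension reduction)
主成分分析 (principal component analysis)
切片逆迴歸 (sliced inverse regression)
象徵性資料分析 (symbolic data analysis)
Keywords (English) Data visualization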
histogram data
principal component analysis
sliced inverse regression
symbolic data analysis
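Keywords (third language)
Subject classification
Abstract (Chinese)
In symbolic data analysis (SDA), histogram data is an important research topic, and most existing work relies on principal component analysis (PCA). In this study, we use an alternative dimension reduction method, sliced inverse regression (SIR), to reduce the dimension of histogram data. SIR is a slice-based sufficient dimension reduction technique that lets us observe, in a low-dimensional space, the structure and information hidden in high-dimensional data. We first consider the empirical distribution of the histogram variables to compute the symbolic weighted covariance matrix, and then use the linear combination of histograms method together with matrix visualization to visualize the dimension-reduced histogram data. We use several real data sets to evaluate the discriminative power of the reduced data and the visualization method.
Abstract (English)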
The dimension reduction of histogram-valued data (histogram data hereafter) is an active research topic in symbolic data analysis (SDA). Most existing work, however, has focused on extensions of principal component analysis (PCA). In this study, we extend the classical sliced inverse regression (SIR), an alternative dimension reduction method, to histogram data. SIR is a popular slice-based sufficient dimension reduction technique for exploring the intrinsic structure of high-dimensional data. We first consider the empirical (joint) density of the histogram variables to compute the symbolic weighted variance-covariance matrix. Then a linear combination of histograms rule and the matrix visualization technique are employed to visualize the projections of the histograms in the low-dimensional subspace. We evaluate the method for low-dimensional discrimination and visualization with applications to real data sets. A comparison with PCA for histogram data is also reported.
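For orientation, the classical SIR that the abstract takes as its starting point can be sketched in a few lines of NumPy. This is a minimal sketch for ordinary numeric data only. It does not implement the thesis's extension to histogram data, whose symbolic weighted covariance matrix is not specified in this record. The function name `sir_directions` and its parameters `n_slices` and `n_dirs` are hypothetical, chosen for illustration.

```python
import numpy as np


def sir_directions(X, y, n_slices=5, n_dirs=2):
    """Classical SIR for numeric data (illustrative sketch, not the thesis's method).

    Returns a (p, n_dirs) matrix of unit-length directions and the
    corresponding eigenvalues, largest first.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape

    # Standardize the predictors: Z = (X - mean) @ Sigma^{-1/2}.
    mean = X.mean(axis=0)
    sigma = np.cov(X, rowvar=False)
    evals, evecs = np.linalg.eigh(sigma)
    sigma_inv_sqrt = evecs @ np.diag(evals ** -0.5) @ evecs.T
    Z = (X - mean) @ sigma_inv_sqrt

    # Slice the observations into groups of similar response values.
    order = np.argsort(y)
    slices = np.array_split(order, n_slices)

    # Weighted covariance matrix of the slice means of Z.
    M = np.zeros((p, p))
    for idx in slices:
        m = Z[idx].mean(axis=0)
        M += (len(idx) / n) * np.outer(m, m)

    # Leading eigenvectors of M, mapped back to the original X scale.
    w, v = np.linalg.eigh(M)
    top = np.argsort(w)[::-1][:n_dirs]
    B = sigma_inv_sqrt @ v[:, top]
    return B / np.linalg.norm(B, axis=0), w[top]
```

Projecting the centered data onto the returned columns, `(X - X.mean(axis=0)) @ B`, gives low-dimensional coordinates that can be drawn as a 2D scatterplot.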
Abstract (third language)
Table of contents
1 Introduction
2 Dimension reduction for histogram-valued data
2.1 PCA and SIR
2.2 The distributional approach
2.3 The linear combination of p histograms
3 Visualization of histograms
3.1 Presentation of histogram data matrix
3.2 A histogram of histograms
3.3 The 2D joint histogram
3.4 The 2D scatterplot
4 An example
5 The comparison study
6 Conclusion and discussion
References
[1] Billard, L. and Diday, E. (2003). From the statistics of data to the statistics of knowledge: symbolic data analysis. Journal of the American Statistical Association, 98(462), 470-487.
[2] Chen, M., Wang, H., and Qin, Z. (2014). Principal component analysis for probabilistic symbolic data: a more generic and accurate algorithm. Advances in Data Analysis and Classification, 9(1), 59-79.
[3] Ichino, M. (2008). Symbolic PCA for histogram-valued data. In Proceedings IASC, December 5-8, Yokohama, Japan.
[4] Ichino, M. (2011). The quantile method for symbolic principal component analysis. Statistical Analysis and Data Mining, 4(2), 184-198.
[5] Makosso-Kallyth, S. and Diday, E. (2012). Adaptation of interval PCA to symbolic histogram variables. Advances in Data Analysis and Classification, 6, 147-159.
[6] Massy, W.F. (1965). Principal components regression in exploratory statistical research. Journal of the American Statistical Association, 60(309), 234-256.
[7] Nagabhushan, P. and Kumar, P. (2007). Histogram PCA. Advances in Neural Networks - ISNN 2007, Lecture Notes in Computer Science, 4492, pp. 1012-1021, Springer, Berlin.
[8] Rodriguez, O., Diday, E., and Winsberg, S. (2000). Generalization of the principal component analysis to histogram data. PKDD2000, Lyon, 2000.
[9] Verde, R. and Irpino, A. (2013). Dimension reduction techniques for distributional symbolic data. SIS 2013 Statistical Conference, Advances in Latent Variables - Methods, Models and Applications, University of Brescia, June 19-21, 2013.
[10] Wang, H., Chen, M., Li, N., and Wang, L. (2011). Principal component analysis of modal interval-valued data with constant numerical characteristics. Int. Statistical Inst.: Proc. 58th World Statistical Congress
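Full-text access rights
On campus
Print thesis available on campus immediately
Consent to release the electronic full text on campus
Electronic thesis available on campus immediately
Off campus
Authorization granted
Electronic thesis available off campus immediately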
