§ Thesis Bibliographic Record
  
System ID U0002-0307202522181900
DOI 10.6846/tku202500495
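Title (Chinese) 不同距離測度與分群方法於空間資料之比較
Title (English) A Comparative Study of Different Distance Measures and Clustering Methods in Spatial Data Analysis
Title (third language)
Institution 淡江大學 (Tamkang University)
Department (Chinese) 統計學系應用統計學碩士班
Department (English) Department of Statistics
Foreign degree university
Foreign degree college
Foreign degree institute
Academic year 113 (ROC calendar; 2024-2025)
Semester 2
Year of publication 114 (ROC calendar; 2025)
Graduate student (Chinese) 李琍絹
Graduate student (English) LI-CHUAN LEE
Student ID 613650026
Degree Master's
Language Traditional Chinese
Second language
Oral defense date 2025-06-25
Number of pages 41
Committee Advisor - 張雅梅(yameic@ccu.edu.tw)
Committee member - 張春桃(chuntao@mail.tku.edu.tw)
Committee member - 張育瑋(ychang@nccu.edu.tw)
Co-advisor - 吳碩傑(shuo@mail.tku.edu.tw)
Keywords (Chinese) 核密度估計
主成分分析
分群分析
距離度量
Keywords (English) Kernel density estimation
Principal component analysis
Clustering analysis
Distance metrics
Keywords (third language)
Subject classification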
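Abstract (Chinese)
The spatial distribution of species is an important topic in ecology and biogeography, reflecting species' demands for environmental resources and their interactions. As ecological spatial data accumulate and analytical methods develop, effectively revealing the clustering patterns in multi-species spatial distributions has become a key challenge in ecological data mining. This study proposes a systematic analysis workflow that combines kernel density estimation, principal component analysis, distance measures, and multiple clustering methods (k-means, k-medoids, and several hierarchical clustering methods). Kernel density estimation first converts discrete species occurrence points into continuous spatial intensity functions; principal component analysis then reduces the dimensionality of the data while retaining the main structure of variation. Next, different clustering algorithms are combined with different distance metrics, including the Euclidean and Canberra distances, to compare how well each method identifies species clustering structure. To verify the robustness and applicability of the approach, the methods are compared on simulated data and applied to real ecological spatial data. The results show that the proposed workflow effectively captures the characteristics of species distributions and offers good interpretability and practical value. The study aims to provide a concrete, feasible, and flexible analytical framework for multi-species analysis of ecological spatial data and to support the development of ecological data mining and conservation decision-making.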
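The distance measures named in this record (Euclidean, Canberra, and the maximum distance that appears in the figure list) and the Rand index listed in the table of contents have standard textbook definitions. The following is a minimal sketch in plain Python under those textbook definitions, not the thesis's own code; in particular, skipping 0/0 terms in the Canberra distance is an assumption, since conventions differ between software packages.

```python
from itertools import combinations
from math import sqrt


def euclidean(x, y):
    """Euclidean distance: sqrt(sum((x_i - y_i)^2))."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def canberra(x, y):
    """Canberra distance: sum(|x_i - y_i| / (|x_i| + |y_i|)).

    Terms where both coordinates are zero are skipped (an assumption;
    other conventions exist).
    """
    total = 0.0
    for a, b in zip(x, y):
        denom = abs(a) + abs(b)
        if denom > 0:
            total += abs(a - b) / denom
    return total


def maximum(x, y):
    """Maximum (Chebyshev) distance: max(|x_i - y_i|)."""
    return max(abs(a - b) for a, b in zip(x, y))


def rand_index(labels_a, labels_b):
    """Fraction of item pairs on which two partitions agree.

    A pair counts as an agreement when both partitions put the two items
    in the same cluster, or both put them in different clusters.
    """
    pairs = list(combinations(range(len(labels_a)), 2))
    agree = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agree / len(pairs)
```

Because each Canberra term is scaled by the magnitude of the coordinates, it weights differences near zero more heavily than the Euclidean distance does, which is one reason the choice of metric can change the clustering result.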
Abstract (English)
The spatial distribution of species is a key issue in ecology and biogeography, reflecting species' environmental needs and interactions. As ecological spatial data accumulate, uncovering clustering patterns among multiple species has become a major challenge. This study proposes a systematic framework combining kernel density estimation (KDE), principal component analysis (PCA), distance measures, and multiple clustering methods, including k-means, k-medoids, and hierarchical clustering. KDE transforms species occurrence points into spatial intensity functions, and PCA reduces dimensionality while preserving the major variation. Various distance metrics and clustering algorithms are compared to assess their ability to identify spatial clusters. Results on simulated and real ecological data show that the framework effectively captures species distribution patterns, offering strong interpretability and practical value. This work provides a flexible tool for multi-species spatial analysis and supports ecological research and conservation planning.
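The workflow described in the abstract (KDE, then PCA, then clustering) can be sketched roughly as follows. This is an illustrative sketch only, assuming NumPy is available; the toy data, the grid size, the bandwidth, the farthest-point initialisation, and all function names are hypothetical choices made here and are not taken from the thesis.

```python
import numpy as np


def kde_intensity(points, grid, bandwidth):
    """Gaussian-kernel intensity of 2-D points evaluated at grid locations."""
    # Squared distances between every grid location and every point.
    d2 = ((grid[:, None, :] - points[None, :, :]) ** 2).sum(axis=2)
    kernel = np.exp(-d2 / (2 * bandwidth**2)) / (2 * np.pi * bandwidth**2)
    return kernel.sum(axis=1)


def pca_scores(X, n_components):
    """Project the rows of X onto the leading principal components (via SVD)."""
    Xc = X - X.mean(axis=0)
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ vt[:n_components].T


def farthest_point_init(X, k):
    """Deterministic start (an assumption): repeatedly pick the farthest row."""
    centers = [X[0]]
    for _ in range(1, k):
        d = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[d.argmax()])
    return np.array(centers)


def kmeans(X, k, n_iter=100):
    """Lloyd's algorithm with Euclidean distance."""
    centers = farthest_point_init(X, k)
    for _ in range(n_iter):
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


rng = np.random.default_rng(42)
# Hypothetical toy data: six "species", three concentrated in the lower-left
# of the unit square and three in the upper-right, 200 points each.
centres = [(0.25, 0.25)] * 3 + [(0.75, 0.75)] * 3
species_points = [rng.normal(c, 0.08, size=(200, 2)) for c in centres]

# Evaluation grid: centres of a 20 x 20 partition of the unit square.
axis = (np.arange(20) + 0.5) / 20
grid = np.array([(x, y) for x in axis for y in axis])

# One row per species: its estimated intensity surface, flattened.
intensity = np.array(
    [kde_intensity(p, grid, bandwidth=0.05) for p in species_points]
)
scores = pca_scores(intensity, n_components=2)
labels = kmeans(scores, k=2)
print(labels)
```

Each species becomes one row of grid intensities, so clustering groups species with similar spatial patterns. Swapping the Euclidean step for another metric, such as the Canberra distance with k-medoids, would change only the clustering stage.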
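Abstract (third language)
Table of contents
Contents
List of Figures
List of Tables
Chapter 1 Introduction
Chapter 2 Research Methods
Section 1 Principal Component Analysis (PCA)
Section 2 Distance Measures
Section 3 Cluster Analysis
Subsection 1 Hierarchical Clustering
Subsection 2 Non-hierarchical Clustering
Section 4 Rand Index
Chapter 3 Simulation Study
Chapter 4 Empirical Analysis
Chapter 5 Conclusion
References
Appendix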


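List of Figures
3.1 Simulated spatial data with 2000 points: (a) Case 1: growth toward the upper right (b) Case 2: growth toward the upper left (c) Case 3: growth upward (d) Case 4: growth toward low elevation (e) Case 5: growth toward high elevation
4.1 Contour map of the Fushan Botanical Garden, Yilan (blue lines represent streams)
4.2 Point pattern plot of the Fushan Botanical Garden, Yilan
4.3 Intensity map of the Fushan Botanical Garden, Yilan
4.4 Intensity map of the Fushan Botanical Garden, Yilan (with values)
4.5 k-means, 3 clusters
4.6 k-medoids with maximum distance, 3 clusters
4.7 k-medoids with Canberra distance, 3 clusters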


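List of Tables
3.1 Rand index computed for different distance measures and clustering methods under different numbers of points
4.1 Comparison of indices across clustering methods and numbers of clusters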
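Full-text access permissions
National Central Library
Agrees to grant the National Central Library a royalty-free license; the bibliographic record and full-text electronic file are made publicly available on the Internet immediately upon submission of the authorization form
On campus
Print copy of the thesis publicly available on campus immediately
Agrees to authorize worldwide public access to the full text of the electronic thesis
Electronic thesis publicly available on campus immediately
Off campus
Agrees to authorize database vendors
Electronic thesis publicly available off campus immediately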
