| System ID | U0002-0307202522181900 |
|---|---|
| DOI | 10.6846/tku202500495 |
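| Thesis title (Chinese) | 不同距離測度與分群方法於空間資料之比較 |
| Thesis title (English) | A Comparative Study of Different Distance Measures and Clustering Methods in Spatial Data Analysis |
| Thesis title (third language) | |
| University | Tamkang University (淡江大學) |
| Department (Chinese) | 統計學系應用統計學碩士班 |
| Department (English) | Department of Statistics |
| Foreign degree: university | |
| Foreign degree: college | |
| Foreign degree: graduate institute | |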
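| Academic year | 113 (ROC calendar) |
| Semester | 2 |
| Year of publication | 114 (ROC calendar) |
| Author (Chinese) | 李琍絹 |
| Author (English) | LI-CHUAN LEE |
| Student ID | 613650026 |
| Degree | Master's |
| Language | Traditional Chinese |
| Second language | |
| Oral defense date | 2025-06-25 |
| Number of pages | 41 |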
| Oral defense committee | Advisor - 張雅梅 (yameic@ccu.edu.tw); Committee member - 張春桃 (chuntao@mail.tku.edu.tw); Committee member - 張育瑋 (ychang@nccu.edu.tw); Co-advisor - 吳碩傑 (shuo@mail.tku.edu.tw) |
| Keywords (Chinese) | 核密度估計; 主成分分析; 分群分析; 距離度量 |
| Keywords (English) | Kernel density estimation; Principal component analysis; Clustering analysis; Distance metrics |
| Keywords (third language) | |
| Subject classification | |
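| Abstract (translated from Chinese) | The spatial distribution of species is a central topic in ecology and biogeography, reflecting species' demands on environmental resources and their interactions. As ecological spatial data accumulate and analytical methods develop, effectively revealing the clustering patterns in the spatial distributions of multiple species has become an important challenge in ecological data mining. This study proposes a systematic analysis workflow that combines kernel density estimation, principal component analysis, distance measures, and several clustering methods (k-means, k-medoids, and several forms of hierarchical clustering). First, kernel density estimation converts discrete species occurrence points into continuous spatial intensity functions; principal component analysis then reduces the dimensionality of the data while retaining the main structure of variation. Next, different clustering algorithms and distance measures, including Euclidean distance and Canberra distance, are applied to compare how well each method identifies species clustering structure. To verify the robustness and applicability of the approach, the methods are compared on simulated data and applied to real ecological spatial data. The results show that the proposed workflow effectively captures the characteristics of species distributions and offers good interpretability and practical value. The study aims to provide a concrete, feasible, and flexible analytical framework for multi-species analysis of ecological spatial data, and to support the development of ecological data mining and conservation decision-making. |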
| Abstract (English) | The spatial distribution of species is a key issue in ecology and biogeography, reflecting species' environmental needs and interactions. As ecological spatial data accumulate, uncovering clustering patterns among multiple species has become a major challenge. This study proposes a systematic framework combining kernel density estimation (KDE), principal component analysis (PCA), distance measures, and multiple clustering methods, including k-means, k-medoids, and hierarchical clustering. KDE transforms species occurrence points into spatial intensity functions, and PCA reduces dimensionality while preserving the major variation. Various distance metrics and clustering algorithms are compared to assess their ability to identify spatial clusters. Results on simulated and real ecological data show that the framework effectively captures species distribution patterns, offering strong interpretability and practical value. This work provides a flexible tool for multi-species spatial analysis and supports ecological research and conservation planning. |
| Abstract (third language) | |
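| Table of contents | Contents: List of Figures; List of Tables; Chapter 1 Introduction; Chapter 2 Research Methods (Section 1 Principal Component Analysis (PCA); Section 2 Distance Measures; Section 3 Cluster Analysis, with Subsection 1 Hierarchical Clustering and Subsection 2 Non-hierarchical Clustering; Section 4 Rand Index); Chapter 3 Simulation Study; Chapter 4 Case Study; Chapter 5 Conclusion; References; Appendix. List of Figures: 3.1 Simulated spatial data with 2000 points: (a) Case 1: growth toward the upper right, (b) Case 2: growth toward the upper left, (c) Case 3: upward growth, (d) Case 4: growth toward low elevation, (e) Case 5: growth toward high elevation; 4.1 Contour map of the Fushan Botanical Garden, Yilan (blue lines indicate streams); 4.2 Point process plot of the Fushan Botanical Garden, Yilan; 4.3 Intensity map of the Fushan Botanical Garden, Yilan; 4.4 Intensity map of the Fushan Botanical Garden, Yilan (with values); 4.5 k-means with 3 clusters; 4.6 k-medoids with maximum distance, 3 clusters; 4.7 k-medoids with Canberra distance, 3 clusters. List of Tables: 3.1 Rand index computed for different distance measures and clustering methods under different numbers of points; 4.1 Comparison of indices across clustering methods and numbers of clusters. |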
| Full-text access rights | |
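
The record names Euclidean, Canberra, and maximum distances as the measures compared, and the Rand index as the criterion for comparing clusterings. The following is a minimal sketch of those four quantities in plain Python, not the thesis's own code. The function names and the example vectors are hypothetical, and the Canberra function assumes that a coordinate whose denominator is zero contributes 0 to the sum (implementations differ on this point).

```python
from itertools import combinations
from math import sqrt


def euclidean(x, y):
    """Euclidean distance: square root of the summed squared differences."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def maximum(x, y):
    """Maximum (Chebyshev) distance: largest coordinate-wise difference."""
    return max(abs(a - b) for a, b in zip(x, y))


def canberra(x, y):
    """Canberra distance: sum of |a - b| / (|a| + |b|) over coordinates.

    Assumption: a term whose denominator is zero contributes 0.
    """
    total = 0.0
    for a, b in zip(x, y):
        denom = abs(a) + abs(b)
        if denom > 0:
            total += abs(a - b) / denom
    return total


def rand_index(labels_a, labels_b):
    """Fraction of point pairs on which two partitions agree.

    A pair agrees when both partitions put the two points in the same
    cluster, or both put them in different clusters.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("label sequences must have the same length")
    pairs = list(combinations(range(len(labels_a)), 2))
    if not pairs:
        raise ValueError("need at least two points")
    agree = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agree / len(pairs)


if __name__ == "__main__":
    # Hypothetical score vectors, e.g. two species' leading PCA scores.
    u, v = (1.0, 2.0, 0.0), (4.0, 6.0, 0.0)
    print(euclidean(u, v), maximum(u, v), canberra(u, v))
    # Hypothetical true labels versus labels from a clustering run.
    print(rand_index([0, 0, 1, 1], [1, 1, 0, 2]))
```

Because the Rand index only looks at whether pairs are grouped together, it does not depend on how clusters are numbered: two partitions that differ only by relabeling score 1.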