§ Thesis bibliographic record
System ID U0002-2707201921493600
DOI 10.6846/TKU.2019.00927
Title (Chinese) 空間對數常態模式於俄亥俄州肺癌資料之應用
Title (English) A spatial log-normal model for lung cancer data in Ohio
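Institution Tamkang University
Department (Chinese) 統計學系應用統計學碩士班 (Department of Statistics, Master's Program in Applied Statistics)
Department (English) Department of Statistics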
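Academic year 107 (ROC calendar; 2018-19)
Semester 2
Publication year 108 (ROC calendar; 2019)
Author (Chinese) 蔡宜欣
Author (English) Yi-Shin Tsai
Student ID 606650066
Degree Master's
Language Traditional Chinese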
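Oral defense date 2019-07-16
Pages 30
Committee Advisor - 張雅梅
Member - 吳碩傑
Member - 張育瑋
Keywords (Chinese) 對數常態 (log-normal)
非平穩空間模型 (non-stationary spatial model)
最小絕對壓縮與篩選運算法 (least absolute shrinkage and selection operator)
最小角度迴歸法 (least angle regression)
空間分佈 (spatial distribution)
Keywords (English) lognormal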
non-stationary spatial model
lasso
lars
spatial distribution
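Abstract (translated from Chinese)
This study builds a non-stationary spatial model to analyze the spatial distribution of lung cancer deaths among the white population of each county in Ohio, USA. The model comprises basis functions and stationary processes. To estimate the large number of parameters in the model, we use the least absolute shrinkage and selection operator (lasso) proposed by Tibshirani (1996), which selects variables while estimating parameters and so achieves model selection and model fitting at the same time. Combining it with the least angle regression (lars) algorithm proposed by Efron et al. (2004) speeds up the computation; this study uses the lars package in R for the calculations. We also use cross-validation (cv) to choose the best tuning parameter for the lasso. We plot the estimates as spatial distribution maps and use the maps to discuss the spatial relationships in the number of lung cancer deaths. The analysis shows that the mean number of deaths and its variability are both higher in Mahoning, Cuyahoga, Lucas, Franklin, Montgomery, and Hamilton counties. For men, lung cancer deaths show lower correlation around Clinton and Highland counties in southwestern Ohio; for women, the correlation is lower around Coshocton County in the east-central part of the state.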
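For orientation, the lasso of Tibshirani (1996) is commonly written in the penalized form below. The symbols $y$, $X$, $\beta$, and $\lambda$ are generic regression notation assumed here for illustration; they are not taken from the thesis, which applies the idea to the parameters of its spatial model.

```latex
\hat{\beta}(\lambda)
  = \arg\min_{\beta \in \mathbb{R}^{p}}
    \left\{ \frac{1}{2} \lVert y - X\beta \rVert_2^{2}
            + \lambda \lVert \beta \rVert_1 \right\},
  \qquad \lambda \ge 0 .
```

A larger $\lambda$ sets more coefficients exactly to zero, which is why estimation and variable selection happen in one step; cross-validation is one way to pick $\lambda$.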
Abstract (English)
In this research, we use a non-stationary spatial model to describe the spatial distribution of lung cancer mortality data in Ohio. The model consists of basis functions and stationary processes. The least absolute shrinkage and selection operator (lasso) is used to estimate the parameters; it performs model selection and parameter estimation simultaneously. We use the lars package in R to compute the lasso estimates efficiently, and cross-validation (cv) to choose the tuning parameter of the lasso. Maps of the estimates show the spatial distribution of the lung cancer data. Mahoning, Cuyahoga, Lucas, Franklin, Montgomery, and Hamilton counties have both higher mean numbers of deaths and higher variances. For white males, the correlation of the number of deaths is lower around Clinton and Highland counties. For white females, the correlation is lower around Coshocton County.
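The combination of lasso estimation and cross-validated choice of the tuning parameter can be sketched in a few lines. This is a minimal illustration, not the thesis's code: the thesis uses the lars package in R on a spatial model, whereas this sketch uses coordinate descent (a different algorithm from lars) on a plain linear regression with simulated data. All function names, the penalty grid, and the toy data are assumptions made for the example.

```python
import random


def soft_threshold(z, t):
    """Soft-thresholding operator used in the lasso coordinate update."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_cd(X, y, lam, n_iter=100):
    """Minimize (1/(2n))*||y - X b||^2 + lam*||b||_1 by coordinate descent.

    Assumes no intercept; X is a list of rows, y a list of responses.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # residual y - X beta, starting from beta = 0
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Covariance of column j with the residual that excludes beta_j.
            rho = sum(X[i][j] * (resid[i] + X[i][j] * beta[j]) for i in range(n)) / n
            new_bj = soft_threshold(rho, lam) / col_sq[j]
            delta = new_bj - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                beta[j] = new_bj
    return beta


def cv_error(X, y, lam, k=5, seed=0):
    """Squared prediction error at lam, averaged over all held-out points."""
    n = len(X)
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[f::k] for f in range(k)]
    total = 0.0
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        beta = lasso_cd([X[i] for i in train], [y[i] for i in train], lam)
        for i in fold:
            pred = sum(b * x for b, x in zip(beta, X[i]))
            total += (y[i] - pred) ** 2
    return total / n


def choose_lambda(X, y, grid, k=5):
    """Return the grid value with the smallest cross-validated error."""
    return min(grid, key=lambda lam: cv_error(X, y, lam, k))


# Simulated data (hypothetical): only the first and fourth predictors matter.
rng = random.Random(1)
n, p = 60, 5
X = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
true_beta = [2.0, 0.0, 0.0, -1.5, 0.0]
y = [sum(b * x for b, x in zip(true_beta, row)) + rng.gauss(0, 0.1) for row in X]

grid = [0.01, 0.05, 0.1, 0.5, 1.0]
best_lam = choose_lambda(X, y, grid)
beta_hat = lasso_cd(X, y, best_lam)
```

The same seed is used for every penalty value so that all candidates are compared on identical folds. In R, the lars package provides a cross-validation routine that plays the role of `choose_lambda` here.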
Table of contents
Contents
Chapter 1 Introduction
Chapter 2 Model
Chapter 3 Estimation method
Section 1 Least absolute shrinkage and selection operator (lasso)
Section 2 Cross-validation (cv)
Section 3 Estimation method used in this study
Chapter 4 Application to Ohio lung cancer data
Chapter 5 Conclusion
References
List of figures
3.1 Schematic of cross-validation
4.1 Map of Ohio counties
4.2 Mean total population by county
4.3 Cross-validation results
4.4 Estimated mean number of lung cancer deaths
4.5 Estimated standard deviation of the number of lung cancer deaths
4.6 Correlation coefficients of male lung cancer deaths for Mahoning, Cuyahoga, Lucas, Franklin, Montgomery, and Hamilton counties
4.7 Correlation coefficients of female lung cancer deaths for Mahoning, Cuyahoga, Lucas, Franklin, Montgomery, and Hamilton counties
List of tables
4.1 Parameter estimates for the mean number of male lung cancer deaths
4.2 Parameter estimates for the mean number of female lung cancer deaths
4.3 Covariance parameter estimates for male lung cancer deaths
4.4 Covariance parameter estimates for female lung cancer deaths
References
[1] D. K. Agarwal, A. E. Gelfand, and S. Citron-Pousty, “Zero-inflated models with application to spatial count data,” Environmental and Ecological Statistics, vol. 9, no. 4, pp. 341–355, 2002.
[2] J. M. Albert, W. Wang, and S. Nelson, “Estimating overall exposure effects for zero-inflated regression models with application to dental caries,” Statistical Methods in Medical Research, vol. 23, no. 3, pp. 257–278, 2014.
[3] M. Alfò and A. Maruotti, “Two-part regression models for longitudinal zero-inflated count data,” Canadian Journal of Statistics, vol. 38, no. 2, pp. 197–216, 2010.
[4] G. Baetschmann and R. Winkelmann, “A dynamic hurdle model for zero-inflated count data,” Communications in Statistics-Theory and Methods, vol. 46, no. 14, pp. 7174–7187, 2017.
[5] A. Buu, R. Li, X. Tan, and R. A. Zucker, “Statistical models for longitudinal zero-inflated count data with applications to the substance abuse field,” Statistics in Medicine, vol. 31, no. 29, pp. 4074–4086, 2012.
[6] B. P. Carlin, A. E. Gelfand, and S. Banerjee, Hierarchical modeling and analysis for spatial data. Chapman and Hall/CRC, 2014.
[7] Y.-M. Chang, N.-J. Hsu, and H.-C. Huang, “Semiparametric estimation and selection for nonstationary spatial covariance functions,” Journal of Computational and Graphical Statistics, vol. 19, no. 1, pp. 117–139, 2010.
[8] B. Efron, T. Hastie, I. Johnstone, and R. Tibshirani, “Least angle regression,” The Annals of Statistics, vol. 32, no. 2, pp. 407–499, 2004.
[9] D. B. Hall, “Zero-inflated Poisson and binomial regression with random effects: a case study,” Biometrics, vol. 56, no. 4, pp. 1030–1039, 2000.
[10] N.-J. Hsu, Y.-M. Chang, and H.-C. Huang, “A group lasso approach for nonstationary spatial–temporal covariance estimation,” Environmetrics, vol. 23, no. 1, pp. 12–23, 2012.
[11] G. James, D. Witten, T. Hastie, and R. Tibshirani, An introduction to statistical learning. Springer, 2013, vol. 112.
[12] J. Kelsall and J. Wakefield, “Modeling spatial variation in disease risk: a geostatistical approach,” Journal of the American Statistical Association, vol. 97, no. 459, pp. 692–701, 2002.
[13] H. Kim, D. Sun, and R. K. Tsutakawa, “Lognormal vs. gamma: extra variations,” Biometrical Journal: Journal of Mathematical Methods in Biosciences, vol. 44, no. 3, pp. 305–323, 2002.
[14] N. Klein, T. Kneib, and S. Lang, “Bayesian generalized additive models for location, scale, and shape for zero-inflated and overdispersed count data,” Journal of the American Statistical Association, vol. 110, no. 509, pp. 405–419, 2015.
[15] A. Kottas, J. A. Duan, and A. E. Gelfand, “Modeling disease incidence data with spatial and spatio-temporal Dirichlet process mixtures,” Biometrical Journal: Journal of Mathematical Methods in Biosciences, vol. 50, no. 1, pp. 29–42, 2008.
[16] D. Lambert, “Zero-inflated Poisson regression, with an application to defects in manufacturing,” Technometrics, vol. 34, no. 1, pp. 1–14, 1992.
[17] H. K. Lim, W. K. Li, and P. L. H. Yu, “Zero-inflated Poisson regression mixture model,” Computational Statistics & Data Analysis, vol. 71, pp. 151–158, 2014.
[18] H. Liu and K.-S. Chan, “Introducing COZIGAM: an R package for unconstrained and constrained zero-inflated generalized additive model analysis,” Journal of Statistical Software, vol. 35, no. 11, pp. 1–26, 2010.
[19] T. Nakaya, A. S. Fotheringham, C. Brunsdon, and M. Charlton, “Geographically weighted Poisson regression for disease association mapping,” Statistics in Medicine, vol. 24, no. 17, pp. 2695–2717, 2005.
[20] M. Ridout, J. Hinde, and C. G. B. Demétrio, “A score test for testing a zero-inflated Poisson regression model against zero-inflated negative binomial alternatives,” Biometrics, vol. 57, no. 1, pp. 219–223, 2001.
[21] D. Sun, R. K. Tsutakawa, H. Kim, and Z. He, “Spatio-temporal interaction with disease mapping,” Statistics in Medicine, vol. 19, no. 15, pp. 2015–2035, 2000.
[22] R. Tibshirani, “Regression shrinkage and selection via the lasso,” Journal of the Royal Statistical Society. Series B (Methodological), pp. 267–288, 1996.
[23] J. Van den Broek, “A score test for zero inflation in a Poisson distribution,” Biometrics, pp. 738–743, 1995.
[24] L. A. Waller, B. P. Carlin, H. Xia, and A. E. Gelfand, “Hierarchical spatio-temporal mapping of disease rates,” Journal of the American Statistical Association, vol. 92, no. 438, pp. 607–617, 1997.
[25] H. Wang and C. Leng, “A note on adaptive group lasso,” Computational Statistics & Data Analysis, vol. 52, no. 12, pp. 5277–5286, 2008.
[26] H. Xia and B. P. Carlin, “Spatio-temporal models with errors in covariates: mapping Ohio lung cancer mortality,” Statistics in Medicine, vol. 17, no. 18, pp. 2025–2043, 1998.
[27] H. Xia, B. P. Carlin, and L. A. Waller, “Hierarchical models for mapping Ohio lung cancer rates,” Environmetrics: The official journal of the International Environmetrics Society, vol. 8, no. 2, pp. 107–120, 1997.
[28] F.-C. Xie, B.-C. Wei, and J.-G. Lin, “Score tests for zero-inflated generalized Poisson mixed regression models,” Computational Statistics & Data Analysis, vol. 53, no. 9, pp. 3478–3489, 2009.
[29] H. Zhu, S. Luo, and S. M. DeSantis, “Zero-inflated count models for longitudinal measurements with heterogeneous random effects,” Statistical Methods in Medical Research, vol. 26, no. 4, pp. 1774–1786, 2017.
[30] H. Zhu, S. M. DeSantis, and S. Luo, “Joint modeling of longitudinal zero-inflated count and time-to-event data: A Bayesian perspective,” Statistical Methods in Medical Research, vol. 27, no. 4, pp. 1258–1270, 2018.
[31] H. Zou and T. Hastie, “Regularization and variable selection via the elastic net,” Journal of the Royal Statistical Society: Series B (Statistical Methodology), vol. 67, no. 2, pp. 301–320, 2005.
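Full-text access rights
On campus
Print copy released 5 years after submission of the authorization form
Author agrees to on-campus release of the electronic full text
On-campus electronic copy released 5 years after submission of the authorization form
Off campus
Authorization granted
Off-campus electronic copy released 5 years after submission of the authorization form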
