System ID | U0002-0607201121284600 |
---|---|
DOI | 10.6846/TKU.2011.00202 |
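Thesis title (Chinese) | 不完整長期追蹤二元資料之插補策略 |
Thesis title (English) | Imputation Strategies for Incomplete Longitudinal Binary Data |
Thesis title (third language) | |
University | Tamkang University (淡江大學) |
Department (Chinese) | 統計學系碩士班 (Master's Program, Department of Statistics) |
Department (English) | Department of Statistics |
Foreign degree university | |
Foreign degree college | |
Foreign degree graduate institute | |
Academic year | 99 (ROC calendar; 2010-2011) |
Semester | 2 |
Publication year | 100 (ROC calendar; 2011) |
Graduate student (Chinese) | 李紫熒 |
Graduate student (English) | Tzu-Ying Li |
Student ID | 698650180 |
Degree | Master's |
Language | English |
Second language | |
Oral defense date | 2011-06-17 |
Number of pages | 38 |
Oral defense committee | Advisor - 陳怡如; Committee member - 林國欽; Committee member - 鄧文舜 |
Keywords (Chinese) | 長期追蹤資料; 遺失值; 多重插補法 |
Keywords (English) | Longitudinal data; Missing data; Multiple imputation |
Keywords (third language) | |
Subject classification | |
Abstract (Chinese) |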
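Missing values commonly arise over the course of longitudinal studies. Among the many ways of handling them, imputation is one effective approach. Demirtas and Hedeker (2007) combined multiple imputation, which has a well-developed framework under multivariate normality, with an algorithm for randomly generating binary response variables in order to transform the binary data, and thereby proposed an imputation strategy for incomplete longitudinal binary data. The Demirtas and Hedeker (2007) method, however, cannot guarantee that the correlation matrix is positive definite, and the correlations must satisfy range constraints to have a unique solution. To address the difficulties that may arise when using the Demirtas-Hedeker method, we propose modified imputation procedures based on it. Using standardized bias, coverage percentage, and root-mean-squared error as benchmark measures, we compare the proposed imputation methods with the Demirtas-Hedeker method under different missingness mechanisms and missing rates. In addition, a real data set is used in a simulation study to illustrate how the proposed methods are applied. |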
Abstract (English) |
Missing data are very common in longitudinal studies, and imputation is one effective way of handling them. Building on the well-developed multiple imputation for normal responses and a random number generation algorithm for binary outcomes, Demirtas and Hedeker (2007) introduced a quasi-imputation strategy for incomplete longitudinal binary data. The Demirtas-Hedeker approach has two shortcomings: positive-definiteness of the correlation matrix cannot be guaranteed, and the correlations must satisfy a range constraint for a unique solution to exist. To address these shortcomings, we propose methods that modify the Demirtas-Hedeker method and use simpler procedures. The performance of the Demirtas-Hedeker method and the proposed procedures is compared in terms of standardized bias, coverage percentage, and root-mean-squared error under various configurations of missing rates and missingness mechanisms. A real data set is used to illustrate the application of the proposed methods. |
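The three evaluation criteria named in the abstract can be sketched in a few lines of Python. This is a minimal illustration under assumed, commonly used definitions (standardized bias as 100 times the absolute bias divided by the empirical standard deviation of the estimates; coverage as the percentage of nominal 95% Wald intervals containing the target; RMSE as the root of the mean squared deviation from the target). The record does not give the thesis's exact formulas, and the function name `evaluate_estimates`, its parameters, and the simulated inputs are hypothetical.

```python
import numpy as np


def evaluate_estimates(estimates, std_errors, true_value, z=1.96):
    """Summarize replicated estimates of one parameter (assumed definitions).

    estimates, std_errors: one entry per simulation replicate.
    true_value: the targeted parameter value.
    z: normal quantile for the Wald interval (1.96 gives a nominal 95% interval).
    """
    estimates = np.asarray(estimates, dtype=float)
    std_errors = np.asarray(std_errors, dtype=float)

    # Standardized bias: absolute bias as a percentage of the
    # empirical standard deviation of the estimates (assumption).
    bias = abs(estimates.mean() - true_value)
    standardized_bias = 100.0 * bias / estimates.std(ddof=1)

    # Coverage percentage: share of Wald intervals that contain the target.
    lower = estimates - z * std_errors
    upper = estimates + z * std_errors
    coverage = 100.0 * np.mean((lower <= true_value) & (true_value <= upper))

    # Root-mean-squared error around the target.
    rmse = np.sqrt(np.mean((estimates - true_value) ** 2))

    return {
        "standardized_bias": float(standardized_bias),
        "coverage_percentage": float(coverage),
        "rmse": float(rmse),
    }


if __name__ == "__main__":
    # Synthetic replicates for illustration only; these are not thesis results.
    rng = np.random.default_rng(0)
    target = -0.0656  # targeted value quoted in the captions of Tables 5 and 6
    fake_estimates = rng.normal(loc=target, scale=0.05, size=1000)
    fake_std_errors = np.full(1000, 0.05)
    print(evaluate_estimates(fake_estimates, fake_std_errors, target))
```

In a simulation study of this kind, such a function would be called once per method, missing rate, and missingness mechanism, so that the resulting rows can be compared side by side.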
Abstract (third language) | |
Table of contents |
Contents
1 Introduction
2 Description of Methodology
2.1 Imputation Method
2.2 Demirtas and Hedeker Approach
2.3 Proposed Imputation Strategies
3 Simulation Study
3.1 Missingness Mechanisms
3.2 GEE Model
3.3 Evaluation Criteria
3.4 Results
4 Conclusion and Discussion
List of Tables
Table 1. The first ten patients for each center in a trial of respiratory disease.
Table 2. The parameter estimates of GEE model with independent working correlation and their standard errors, confidence intervals, test statistics as well as p-values for respiratory disease data.
Table 3. The parameter estimates of GEE model with exchangeable working correlation and their standard errors, confidence intervals, test statistics as well as p-values for respiratory disease data.
Table 4. The parameter estimates of GEE model with unstructured working correlation and their standard errors, confidence intervals, test statistics as well as p-values for respiratory disease data.
Table 5. The performance measures of DH, M1 and M2 approaches using GEE model with independent working correlation under various missing rates of MCAR. The targeted value is -0.0656.
Table 6. The performance measures of DH, M1 and M2 approaches using GEE model with independent working correlation under various missing rates of MAR. The targeted value is -0.0656.
Table 7. The performance measures of DH, M1 and M2 approaches using GEE model with exchangeable working correlation under various missing rates of MCAR. The targeted value is -0.0685.
Table 8. The performance measures of DH, M1 and M2 approaches using GEE model with exchangeable working correlation under various missing rates of MAR. The targeted value is -0.0685.
Table 9. The performance measures of DH, M1 and M2 approaches using GEE model with unstructured working correlation under various missing rates of MCAR. The targeted value is -0.0954.
Table 10. The performance measures of DH, M1 and M2 approaches using GEE model with unstructured working correlation under various missing rates of MAR. The targeted value is -0.0954. |
References |
Bibliography
Agresti, A. (2002). Categorical Data Analysis, 2nd edition, Wiley: New York.
Demirtas, H. and Hedeker, D. (2007). Gaussianization-based quasi-imputation and expansion strategies for incomplete correlated binary responses, Statistics in Medicine, 26, 782-799.
Diggle, P.J., Heagerty, P.J., Liang, K.Y. and Zeger, S.L. (1994). Analysis of Longitudinal Data, 2nd edition, Oxford University Press.
Emrich, L.J. and Piedmonte, R.P. (1991). A method for generating high-dimensional multivariate binary outcomes, American Statistician, 45, 302-304.
Fitzmaurice, G.M. and Lipsitz, S.R. (1995). A model for binary time series data with serial odds ratio patterns, Applied Statistics, 44, 51-61.
Hedeker, D. (2007). On imputing continuous data when the eventual interest pertains to ordinalized outcomes via threshold concept, Computational Statistics & Data Analysis, 52, 2261-2271.
Hedeker, D. and Gibbons, R.D. (1997). Application of random-effects pattern-mixture models for missing data in longitudinal studies, Psychological Methods, 2, 64-78.
Kenward, M.G. and Carpenter, J. (2007). Multiple imputation: current perspectives, Statistical Methods in Medical Research, 16, 199-218.
Koch, G.G., Carr, G.J., Amara, I.A., Stokes, M.E. and Uryniak, T.J. (1990). Categorical data analysis. In Statistical Methodology in the Pharmaceutical Sciences, Berry, D.A. (ed.). Marcel Dekker: New York, 389-473.
Lavori, P.W., Dawson, R. and Shera, D. (1995). A multiple imputation strategy for clinical trials with truncation of patient data, Statistics in Medicine, 14, 1913-1925.
Lee, A.J. (1993). Generating random binary deviates having fixed marginal distributions and specified degrees of association, Statistical Computing, 47, 209-215.
Liang, K.Y. and Zeger, S.L. (1986). Longitudinal data analysis using generalized linear models, Biometrika, 73, 13-22.
Little, R.J.A. and Rubin, D.B. (2002). Statistical Analysis with Missing Data, 2nd edition, Wiley: New York.
Rubin, D.B. (1976). Inference and missing data (with discussion), Biometrika, 63, 581-592.
Rubin, D.B. (1978). Multiple imputation in sample surveys, Proc. Survey Res. Meth. Sec., Am. Statist. Assoc., 20-34.
Rubin, D.B. (1987). Multiple Imputation for Nonresponse in Surveys, Wiley: New York.
Schafer, J.L. (1997). Analysis of Incomplete Multivariate Data, Chapman & Hall: London.
Schafer, J.L. (1999). Multiple imputation: a primer, Statistical Methods in Medical Research, 8, 3-15.
Stiratelli, R., Laird, N. and Ware, J.H. (1984). Random-effects models for serial observations with binary response, Biometrics, 40, 961-971.
Verbeke, G. and Molenberghs, G. (2000). Linear Mixed Models for Longitudinal Data, Springer: New York. |
Full-text access permission |