§ Thesis Bibliographic Record
  
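System ID U0002-0607201121284600
DOI 10.6846/TKU.2011.00202
Title (Chinese) 不完整長期追蹤二元資料之插補策略
Title (English) Imputation Strategies for Incomplete Longitudinal Binary Data
Title (third language)
University Tamkang University (淡江大學)
Department (Chinese) 統計學系碩士班
Department (English) Department of Statistics
Foreign degree: university
Foreign degree: college
Foreign degree: graduate institute
Academic year 99 (ROC calendar)
Semester 2
Publication year 100 (ROC calendar)
Student (Chinese) 李紫熒
Student (English) Tzu-Ying Li
Student ID 698650180
Degree Master's
Language English
Second language
Oral defense date 2011-06-17
Pages 38
Committee Advisor - 陳怡如
Member - 林國欽
Member - 鄧文舜
Keywords (Chinese) 長期追蹤資料
遺失值
多重插補法
Keywords (English) Longitudinal data
Missing data
Multiple imputation
Keywords (third language)
Subject classification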
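Abstract (translated from Chinese)
Missing values commonly arise over the course of longitudinal studies. Of the many ways to handle them, imputation is one effective approach. Demirtas and Hedeker (2007) combined multiple imputation, which has a well-developed framework under multivariate normality, with an algorithm for randomly generating binary response variables. By transforming the binary data in this way, they proposed an imputation strategy for incomplete longitudinal binary data. However, the Demirtas and Hedeker (2007) method cannot guarantee that the correlation matrix is positive definite, and the correlations must satisfy range constraints in order to have a unique solution. To address the difficulties that may arise when using the Demirtas-Hedeker method, we propose modified imputation procedures. Using standardized bias, coverage percentage, and root-mean-squared error as benchmark measures, we compare the proposed procedures with the Demirtas-Hedeker method under different missingness mechanisms and missing rates. A real data example is also used to illustrate how the proposed methods are applied.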
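The two difficulties named in the abstract can be checked numerically. The sketch below is not taken from the thesis. It assumes the standard bounds on the correlation between two Bernoulli variables with success probabilities p_i and p_j, and it tests positive definiteness through eigenvalues. The function names `binary_corr_bounds` and `is_positive_definite` are hypothetical.

```python
import numpy as np

def binary_corr_bounds(p_i, p_j):
    """Feasible range of the correlation between two Bernoulli variables.

    Assumes 0 < p_i, p_j < 1. These are the usual bounds implied by the
    marginal probabilities; they are an assumption of this sketch, not a
    formula quoted from the thesis.
    """
    q_i, q_j = 1.0 - p_i, 1.0 - p_j
    lower = max(-np.sqrt(p_i * p_j / (q_i * q_j)),
                -np.sqrt(q_i * q_j / (p_i * p_j)))
    upper = min(np.sqrt(p_i * q_j / (p_j * q_i)),
                np.sqrt(p_j * q_i / (p_i * q_j)))
    return lower, upper

def is_positive_definite(corr, tol=1e-10):
    """Return True if every eigenvalue of the symmetric matrix exceeds tol."""
    eigenvalues = np.linalg.eigvalsh(np.asarray(corr, dtype=float))
    return bool(eigenvalues.min() > tol)

# Marginals 0.2 and 0.5 restrict the correlation to roughly [-0.5, 0.5].
print(binary_corr_bounds(0.2, 0.5))

# Pairwise-plausible correlations can still form an invalid matrix.
candidate = np.array([[1.0, 0.9, -0.9],
                      [0.9, 1.0, 0.9],
                      [-0.9, 0.9, 1.0]])
print(is_positive_definite(candidate))
```

A correlation matrix assembled entry by entry can fail the second check even when every pair passes the first, which is one way the positive-definiteness problem can surface.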
Abstract (English)
It is very common for longitudinal studies to involve missing data. Imputation is one of the effective procedures for handling missing data. Based on the well-developed multiple imputation for normal responses and a random number generation algorithm for binary outcomes, Demirtas and Hedeker (2007) introduced a quasi-imputation strategy for incomplete longitudinal binary data. The shortcomings of the Demirtas-Hedeker approach are that positive-definiteness of the correlation matrix cannot be guaranteed and that the correlations need to satisfy a range constraint for a unique solution. To address these shortcomings, we propose methods that can be regarded as modifications of the Demirtas-Hedeker method with simpler procedures. The performance of the Demirtas-Hedeker method and the proposed procedures is compared in terms of standardized bias, coverage percentage, and root-mean-squared error under various configurations of missing rates and missingness mechanisms. A real data set is used to illustrate the application of the proposed methods.
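The three performance measures named in the abstract can be sketched as follows. The definitions used here are common ones in the imputation literature and are an assumption of this sketch, since the record does not reproduce the formulas from the thesis: standardized bias as 100 times the absolute bias divided by the empirical standard deviation of the estimates, coverage as the percentage of nominal 95% intervals that contain the target value, and root-mean-squared error around the target. The function name `performance_measures` and the sample inputs are hypothetical.

```python
import numpy as np

def performance_measures(estimates, std_errors, true_value, z=1.96):
    """Summarise replicated estimates of one parameter.

    estimates  : point estimates, one per simulation replicate
    std_errors : matching standard errors, used for the interval check
    true_value : the targeted value of the parameter
    z          : normal quantile for the nominal interval (1.96 for 95%)
    """
    estimates = np.asarray(estimates, dtype=float)
    std_errors = np.asarray(std_errors, dtype=float)

    # Standardized bias: absolute bias relative to the spread of estimates.
    bias = estimates.mean() - true_value
    standardized_bias = 100.0 * abs(bias) / estimates.std(ddof=1)

    # Coverage: share of intervals estimate +/- z * SE containing the target.
    lower = estimates - z * std_errors
    upper = estimates + z * std_errors
    covered = (lower <= true_value) & (true_value <= upper)
    coverage_percentage = 100.0 * covered.mean()

    # Root-mean-squared error around the target value.
    rmse = float(np.sqrt(np.mean((estimates - true_value) ** 2)))

    return {
        "standardized_bias": float(standardized_bias),
        "coverage_percentage": float(coverage_percentage),
        "rmse": rmse,
    }

# Illustrative inputs only; these numbers are not results from the thesis.
print(performance_measures([1.0, 3.0], [1.0, 1.0], true_value=2.0))
```

In a simulation study of this kind, the function would be called once per method and per combination of missing rate and missingness mechanism, with the complete-data estimate serving as the target value.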
Abstract (third language)
Table of Contents
Contents
1 Introduction
2 Description of Methodology
2.1 Imputation Method
2.2 Demirtas and Hedeker Approach
2.3 Proposed Imputation Strategies
3 Simulation Study
3.1 Missingness Mechanisms
3.2 GEE Model
3.3 Evaluation Criteria
3.4 Results
4 Conclusion and Discussion
List of Tables
1 The first ten patients for each center in a trial of respiratory disease.
2 The parameter estimates of GEE model with independent working correlation and their standard errors, confidence intervals, test statistics as well as p-values for respiratory disease data.
3 The parameter estimates of GEE model with exchangeable working correlation and their standard errors, confidence intervals, test statistics as well as p-values for respiratory disease data.
4 The parameter estimates of GEE model with unstructured working correlation and their standard errors, confidence intervals, test statistics as well as p-values for respiratory disease data.
5 The performance measures of DH, M1 and M2 approaches using GEE model with independent working correlation under various missing rates of MCAR. The targeted value is -0.0656.
6 The performance measures of DH, M1 and M2 approaches using GEE model with independent working correlation under various missing rates of MAR. The targeted value is -0.0656.
7 The performance measures of DH, M1 and M2 approaches using GEE model with exchangeable working correlation under various missing rates of MCAR. The targeted value is -0.0685.
8 The performance measures of DH, M1 and M2 approaches using GEE model with exchangeable working correlation under various missing rates of MAR. The targeted value is -0.0685.
9 The performance measures of DH, M1 and M2 approaches using GEE model with unstructured working correlation under various missing rates of MCAR. The targeted value is -0.0954.
10 The performance measures of DH, M1 and M2 approaches using GEE model with unstructured working correlation under various missing rates of MAR. The targeted value is -0.0954.
References
Bibliography
Agresti, A. (2002). Categorical Data Analysis, 2nd edition, Wiley: New York.
Demirtas, H. and Hedeker, D. (2007). Gaussianization-based quasi-imputation
and expansion strategies for incomplete correlated binary responses,
Statistics in Medicine, 26, 782-799.
Diggle, P.J., Heagerty, P.J., Liang, K.Y. and Zeger, S.L. (1994). Analysis of
Longitudinal Data, 2nd edition, Oxford University Press.
Emrich, L.J. and Piedmonte, R.P. (1991). A method for generating high-dimensional
multivariate binary outcomes, American Statistician, 45, 302-304.
Fitzmaurice, G.M. and Lipsitz, S.R. (1995). A model for binary time series
data with serial odds ratio patterns, Applied Statistics, 44, 51-61.
Hedeker, D. (2007). On imputing continuous data when the eventual interest
pertains to ordinalized outcomes via threshold concept, Computational
Statistics & Data Analysis, 52, 2261-2271.
Hedeker, D. and Gibbons, R.D. (1997). Application of Random-Effects
Pattern-Mixture Models for Missing Data in Longitudinal Studies,
Psychological Methods, 2, 64-78.
Koch, G.G., Carr, G.J., Amara, I.A., Stokes, M.E. and Uryniak, T.J. (1990).
Categorical data analysis. In Statistical Methodology in the Pharmaceutical
Sciences, Berry DA (ed.). Marcel Dekker: New York, 389-473.
Lavori, P.W., Dawson, R., Shera, D. (1995). A multiple imputation strategy
for clinical trials with truncation of patient data, Statistics in Medicine, 14,
1913-1925.
Lee, A.J. (1993). Generating random binary deviates having fixed marginal
distributions and specified degrees of association, Statistical Computing, 47,
209-215.
Liang, K.Y., and Zeger, S.L. (1986). Longitudinal data analysis using generalized
linear models, Biometrika, 73, 13-22.
Little, R.J.A. and Rubin, D.B. (2002). Statistical Analysis with Missing Data,
2nd edition, Wiley: New York.
Kenward, M.G. and Carpenter, J. (2007). Multiple imputation: current perspectives,
Statistical Methods in Medical Research, 16, 199-218.
Rubin, D.B. (1976). Inference and missing data (with discussion), Biometrika,
63, 581-592.
Rubin, D.B. (1978). Multiple Imputation in Sample Surveys, Proc. Survey Res.
Meth. Sec., Am. Statist. Assoc., 20-34.
Rubin, D.B. (1987). Multiple Imputation for Nonresponse in Surveys, Wiley:
New York.
Schafer, J.L. (1997). Analysis of Incomplete Multivariate Data, Chapman &
Hall: London.
Schafer, J.L. (1999). Multiple imputation: a primer, Statistical Methods in
Medical Research, 8, 3-15.
Stiratelli, R., Laird, N. and Ware, J.H. (1984). Random-effects models for
serial observations with binary response, Biometrics, 40, 961-971.
Verbeke, G. and Molenberghs, G. (2000). Linear Mixed Models for Longitudinal
Data, Springer: New York.
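Full-Text Access Permissions
On campus
Print copy available on campus immediately
Author consents to on-campus release of the electronic full text
Electronic copy available on campus immediately
Off campus
Authorization granted
Electronic copy available off campus immediately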
