Tamkang University Chueh Sheng Memorial Library (TKU Library)


(Electronic full-text download is available only from Tamkang University IP addresses.)
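System ID: U0002-0607201121284600
Title (Chinese): 不完整長期追蹤二元資料之插補策略
Title (English): Imputation Strategies for Incomplete Longitudinal Binary Data
University: Tamkang University
Department (Chinese): 統計學系碩士班 (Master's Program, Department of Statistics)
Department (English): Department of Statistics
Academic year: 99 (ROC calendar, i.e., 2010-2011)
Semester: 2
Publication year: 100 (ROC calendar, i.e., 2011)
Student name (Chinese): 李紫熒
Student name (English): Tzu-Ying Li
E-mail: maolido@yahoo.com.tw
Student ID: 698650180
Degree: Master's
Language: English
Oral defense date: 2011-06-17
Number of pages: 38
Oral examination committee: Advisor: 陳怡如
Member: 林國欽
Member: 鄧文舜
Keywords (Chinese): 長期追蹤資料 (longitudinal data), 遺失值 (missing values), 多重插補法 (multiple imputation)
Keywords (English): Longitudinal data, Missing data, Multiple imputation
Subject classification: Natural Sciences: Statistics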
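Abstract (Chinese, translated): Missing values frequently arise during longitudinal studies. Many methods exist for handling them, and imputation is one effective approach. Demirtas and Hedeker (2007) combined multiple imputation, which has a fully developed framework under the multivariate normal model, with an algorithm for randomly generating binary responses in order to transform binary data, and on this basis proposed an imputation strategy for incomplete longitudinal binary data. However, the Demirtas and Hedeker (2007) method cannot guarantee that the correlation matrix is positive definite, and the correlations must satisfy range constraints for a unique solution to exist. To overcome the difficulties that can arise with the Demirtas-Hedeker method, we propose modified imputation procedures for it. Using the benchmark measures standardized bias, coverage percentage, and root-mean-squared error, we compare the performance of the proposed imputation methods with that of the Demirtas-Hedeker method under different missingness mechanisms and missing rates. In addition, a real data set is used in a simulation study to illustrate how the proposed methods are applied.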
Abstract (English): It is very common for longitudinal studies to involve missing data, and imputation is one effective procedure for handling this problem. Based on the well-developed multiple imputation framework for normal responses and a random number generation algorithm for binary outcomes, Demirtas and Hedeker (2007) introduced a quasi-imputation strategy for incomplete longitudinal binary data. The shortcomings of the Demirtas-Hedeker approach are that positive-definiteness of the correlation matrix cannot be guaranteed and that the correlations must satisfy a range constraint for a unique solution. To overcome these shortcomings, the proposed methods modify the Demirtas-Hedeker method using simpler procedures. The performance of the Demirtas-Hedeker method and the proposed procedures is compared in terms of standardized bias, coverage percentage, and root-mean-squared error under various configurations of missing rates and missingness mechanisms. A real data set is used to illustrate the application of the proposed methods.
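The two Demirtas-Hedeker shortcomings named in the abstract can be made concrete with a small sketch. It assumes that the "random number generation algorithm for binary outcomes" is a thresholding scheme in the spirit of Emrich and Piedmonte (1991), listed in the bibliography: Y_j = 1 exactly when a latent standard normal Z_j <= Phi^{-1}(p_j), and the latent correlation rho solves Phi_2(z_1, z_2; rho) = p_1 p_2 + delta * sqrt(p_1 q_1 p_2 q_2) for a given binary correlation delta. A solution exists (and is then unique, since Phi_2 increases strictly in rho) only when delta lies strictly inside bounds fixed by the marginals; and pairwise solutions need not assemble into a positive definite matrix. The function names are hypothetical, the bivariate normal CDF is a stdlib-only numerical approximation (Plackett's identity plus Simpson's rule), and none of this is the thesis's own modified procedure.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal N(0, 1)


def binary_corr_bounds(p1, p2):
    """Feasible range of corr(Y1, Y2) for Bernoulli(p1) and Bernoulli(p2)."""
    q1, q2 = 1.0 - p1, 1.0 - p2
    lower = max(-math.sqrt(p1 * p2 / (q1 * q2)), -math.sqrt(q1 * q2 / (p1 * p2)))
    upper = min(math.sqrt(p1 * q2 / (q1 * p2)), math.sqrt(q1 * p2 / (p1 * q2)))
    return lower, upper


def bvn_cdf(h, k, rho, n=200):
    """Approximate P(Z1 <= h, Z2 <= k) for a standard bivariate normal.

    Plackett's identity: d/dr Phi2(h, k; r) = phi2(h, k; r), so Phi2 equals
    Phi(h) * Phi(k) plus the integral of the density from 0 to rho, here
    computed with Simpson's rule (n must be even). Accuracy degrades as
    |rho| approaches 1.
    """
    def density(r):
        s = 1.0 - r * r
        return math.exp(-(h * h - 2.0 * r * h * k + k * k) / (2.0 * s)) / (
            2.0 * math.pi * math.sqrt(s))

    step = rho / n
    total = density(0.0) + density(rho)
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) * density(i * step)
    return STD_NORMAL.cdf(h) * STD_NORMAL.cdf(k) + total * step / 3.0


def latent_normal_corr(p1, p2, delta, tol=1e-8):
    """Latent normal correlation that reproduces binary correlation delta.

    Raises ValueError when delta violates the range constraint implied by
    the marginal probabilities, i.e. when no solution exists.
    """
    lower, upper = binary_corr_bounds(p1, p2)
    if not lower < delta < upper:
        raise ValueError(f"delta={delta} outside ({lower:.4f}, {upper:.4f})")
    z1, z2 = STD_NORMAL.inv_cdf(p1), STD_NORMAL.inv_cdf(p2)
    # Required joint probability P(Y1 = 1, Y2 = 1).
    target = p1 * p2 + delta * math.sqrt(p1 * (1 - p1) * p2 * (1 - p2))
    lo, hi = -0.999, 0.999  # search interval kept away from the singular ends
    while hi - lo > tol:  # bisection: Phi2 is increasing in rho
        mid = 0.5 * (lo + hi)
        if bvn_cdf(z1, z2, mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def is_positive_definite(R):
    """Cholesky test: True when the symmetric matrix R is positive definite."""
    n = len(R)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = R[i][j] - sum(L[i][m] * L[j][m] for m in range(j))
            if i == j:
                if s <= 0.0:
                    return False  # Cholesky fails: not positive definite
                L[i][i] = math.sqrt(s)
            else:
                L[i][j] = s / L[j][j]
    return True
```

For example, `is_positive_definite([[1, 0.9, -0.9], [0.9, 1, 0.9], [-0.9, 0.9, 1]])` returns `False` even though every entry is a valid correlation on its own, which is the kind of failure that can occur when latent correlations are solved one pair at a time.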
Table of Contents
1 Introduction
2 Description of Methodology
2.1 Imputation Method
2.2 Demirtas and Hedeker Approach
2.3 Proposed Imputation Strategies
3 Simulation Study
3.1 Missingness Mechanisms
3.2 GEE Model
3.3 Evaluation Criteria
3.4 Results
4 Conclusion and Discussion
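Section 3.2 fits a GEE model (Liang and Zeger, 1986, in the bibliography), and the tables compare independent, exchangeable and unstructured working correlations. A minimal sketch of what these three structures look like for t repeated measurements per subject, assuming the correlation values are supplied directly; in an actual GEE fit they are estimated from residuals, which this sketch omits. The function name and its arguments are hypothetical.

```python
def working_correlation(kind, t, alpha=0.0, pairs=None):
    """Return a t x t working correlation matrix as a list of lists.

    kind  : "independent", "exchangeable" or "unstructured"
    alpha : common correlation for the exchangeable structure
    pairs : dict {(j, k): r} with j < k, one free correlation per pair,
            used by the unstructured structure
    """
    R = [[1.0 if j == k else 0.0 for k in range(t)] for j in range(t)]
    if kind == "independent":
        return R  # identity: repeated responses treated as uncorrelated
    if kind == "exchangeable":
        # -1/(t-1) < alpha < 1 is exactly the range that keeps R positive definite.
        if t < 2 or not -1.0 / (t - 1) < alpha < 1.0:
            raise ValueError("exchangeable needs t >= 2 and -1/(t-1) < alpha < 1")
        for j in range(t):
            for k in range(t):
                if j != k:
                    R[j][k] = alpha  # same correlation for every pair of times
        return R
    if kind == "unstructured":
        expected = {(j, k) for j in range(t) for k in range(j + 1, t)}
        if pairs is None or set(pairs) != expected:
            raise ValueError("unstructured needs one correlation per pair (j, k), j < k")
        for (j, k), r in pairs.items():
            R[j][k] = R[k][j] = r  # each pair of times gets its own parameter
        return R
    raise ValueError(f"unknown working correlation: {kind}")
```

The unstructured form has t(t-1)/2 free parameters against one for the exchangeable form, which is the usual trade-off between flexibility and stability when choosing among them.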
List of Tables
1 The first ten patients for each center in a trial of respiratory disease.
2 The parameter estimates of GEE model with independent working correlation and their standard errors, confidence intervals, test statistics as well as p-values for respiratory disease data.
3 The parameter estimates of GEE model with exchangeable working correlation and their standard errors, confidence intervals, test statistics as well as p-values for respiratory disease data.
4 The parameter estimates of GEE model with unstructured working correlation and their standard errors, confidence intervals, test statistics as well as p-values for respiratory disease data.
5 The performance measures of DH, M1 and M2 approaches using GEE model with independent working correlation under various missing rates of MCAR. The targeted value is -0.0656.
6 The performance measures of DH, M1 and M2 approaches using GEE model with independent working correlation under various missing rates of MAR. The targeted value is -0.0656.
7 The performance measures of DH, M1 and M2 approaches using GEE model with exchangeable working correlation under various missing rates of MCAR. The targeted value is -0.0685.
8 The performance measures of DH, M1 and M2 approaches using GEE model with exchangeable working correlation under various missing rates of MAR. The targeted value is -0.0685.
9 The performance measures of DH, M1 and M2 approaches using GEE model with unstructured working correlation under various missing rates of MCAR. The targeted value is -0.0954.
10 The performance measures of DH, M1 and M2 approaches using GEE model with unstructured working correlation under various missing rates of MAR. The targeted value is -0.0954.
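Tables 5 to 10 score the DH, M1 and M2 approaches against a target value (for example, -0.0656 in Tables 5 and 6) using standardized bias, coverage percentage and root-mean-squared error. This record does not give the formulas, so the sketch below assumes common definitions: standardized bias as 100 times the bias divided by the empirical standard deviation of the estimates across replicates, coverage as the percentage of nominal 95% intervals containing the target, and RMSE over replicates. Each replicate's estimate and standard error are assumed to come from pooling the m imputed-data fits with Rubin's rules (Rubin, 1987). Function names are hypothetical, and the normal-quantile interval is a simplification of Rubin's t reference distribution.

```python
import math
from statistics import NormalDist, fmean, stdev, variance


def pool_rubin(estimates, variances):
    """Rubin's rules for m >= 2 imputations of one scalar parameter.

    Returns the pooled estimate and total variance T = W + (1 + 1/m) B.
    """
    m = len(estimates)
    q_bar = fmean(estimates)   # pooled point estimate
    w = fmean(variances)       # average within-imputation variance
    b = variance(estimates)    # between-imputation variance (divisor m - 1)
    return q_bar, w + (1.0 + 1.0 / m) * b


def performance_measures(estimates, std_errors, true_value, level=0.95):
    """Standardized bias, coverage percentage and RMSE over simulation replicates.

    estimates / std_errors: one pooled estimate and standard error per replicate.
    """
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    bias = fmean(estimates) - true_value
    sb = 100.0 * bias / stdev(estimates)  # bias relative to the empirical SD
    covered = sum(abs(est - true_value) <= z * se
                  for est, se in zip(estimates, std_errors))
    cp = 100.0 * covered / len(estimates)
    rmse = math.sqrt(fmean([(est - true_value) ** 2 for est in estimates]))
    return {"standardized_bias": sb, "coverage_percentage": cp, "rmse": rmse}
```

In a simulation loop, `pool_rubin` would be called once per replicate (taking the square root of the returned total variance as the standard error), and `performance_measures` once per method and missing rate.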
Bibliography
Agresti, A. (2002). Categorical Data Analysis, 2nd edition, Wiley: New York.
Demirtas, H. and Hedeker, D. (2007). Gaussianization-based quasi-imputation and expansion strategies for incomplete correlated binary responses, Statistics in Medicine, 26, 782-799.
Diggle, P.J., Heagerty, P.J., Liang, K.Y. and Zeger, S.L. (2002). Analysis of Longitudinal Data, 2nd edition, Oxford University Press.
Emrich, L.J. and Piedmonte, M.R. (1991). A method for generating high-dimensional multivariate binary variates, American Statistician, 45, 302-304.
Fitzmaurice, G.M. and Lipsitz, S.R. (1995). A model for binary time series data with serial odds ratio patterns, Applied Statistics, 44, 51-61.
Hedeker, D. (2007). On imputing continuous data when the eventual interest pertains to ordinalized outcomes via threshold concept, Computational Statistics & Data Analysis, 52, 2261-2271.
Hedeker, D. and Gibbons, R.D. (1997). Application of random-effects pattern-mixture models for missing data in longitudinal studies, Psychological Methods, 2, 64-78.
Kenward, M.G. and Carpenter, J. (2007). Multiple imputation: current perspectives, Statistical Methods in Medical Research, 16, 199-218.
Koch, G.G., Carr, G.J., Amara, I.A., Stokes, M.E. and Uryniak, T.J. (1990). Categorical data analysis. In Statistical Methodology in the Pharmaceutical Sciences, Berry, D.A. (ed.), Marcel Dekker: New York, 389-473.
Lavori, P.W., Dawson, R. and Shera, D. (1995). A multiple imputation strategy for clinical trials with truncation of patient data, Statistics in Medicine, 14, 1913-1925.
Lee, A.J. (1993). Generating random binary deviates having fixed marginal distributions and specified degrees of association, American Statistician, 47, 209-215.
Liang, K.Y. and Zeger, S.L. (1986). Longitudinal data analysis using generalized linear models, Biometrika, 73, 13-22.
Little, R.J.A. and Rubin, D.B. (2002). Statistical Analysis with Missing Data, 2nd edition, Wiley: New York.
Rubin, D.B. (1976). Inference and missing data (with discussion), Biometrika, 63, 581-592.
Rubin, D.B. (1978). Multiple imputation in sample surveys, Proc. Survey Res. Meth. Sec., Am. Statist. Assoc., 20-34.
Rubin, D.B. (1987). Multiple Imputation for Nonresponse in Surveys, Wiley: New York.
Schafer, J.L. (1997). Analysis of Incomplete Multivariate Data, Chapman & Hall: London.
Schafer, J.L. (1999). Multiple imputation: a primer, Statistical Methods in Medical Research, 8, 3-15.
Stiratelli, R., Laird, N. and Ware, J.H. (1984). Random-effects models for serial observations with binary response, Biometrics, 40, 961-971.
Verbeke, G. and Molenberghs, G. (2000). Linear Mixed Models for Longitudinal Data, Springer: New York.
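Usage rights
  • Print copy: authorized free of charge for in-library readers to reproduce for academic purposes; publicly available from 2011-07-19.
  • Electronic full text: authorized for online viewing and printing; publicly available from 2011-07-19.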

