Tamkang University Chueh Sheng Memorial Library (TKU Library)
(Electronic full text may be downloaded only from Tamkang University IP addresses)
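System ID U0002-2506201201451700
Title (Chinese) 抽樣調查敏感性問題比例估計之研究
Title (English) ON ESTIMATION OF PROPORTION OF A SENSITIVE CHARACTERISTIC IN SAMPLING SURVEYS
University Tamkang University
Department (Chinese) 管理科學學系博士班
Department (English) Doctoral Program, Department of Management Sciences
Academic year 100 (ROC calendar)
Semester 2
Year of publication 101 (ROC calendar)
Author (Chinese name) 郭美貝
Author (English name) Mei-Pei Kuo
Student ID 895620101
Degree Doctoral
Language Chinese
Oral defense date 2012-06-03
Number of pages 63
Oral defense committee Advisor: 張紘炬
Committee member: 張紘炬
Committee member: 莊忠柱
Committee member: 歐陽良裕
Committee member: 林進財
Committee member: 黃建森
Committee member: 陳淼勝
Committee member: 陳耀竹
Keywords (Chinese) 信賴區間 (confidence interval)  涵蓋機率 (coverage probability)  直接作答 (direct response)  估計效率 (estimation efficiency)  隨機作答 (randomized response)
Keywords (English) confidence interval  coverage probability  direct response  estimation efficiency  randomized response
Subject classification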
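Chinese abstract This study examines the estimation of a dichotomous population proportion in sampling surveys. Some survey questions involve personal privacy or illegal behavior. Asking such sensitive questions directly often leads to refusals, and even when respondents are willing to answer, the truthfulness of their answers cannot be guaranteed. To uncover the truth about sensitive questions and to reduce response bias so that accurate data can be obtained, researchers have proposed randomized response models. Because the randomized response model can be regarded as a generalization of the direct response model, and because most studies address only point estimators, this study adopts the Wilson (1927) method of constructing confidence intervals and mathematically derives generalized point and interval estimators of the dichotomous population proportion under randomized response models. It also derives the statistical properties of each estimator, and then compares the estimation efficiency of the point estimators by mean square error and of the interval estimators by coverage probability.
Under the direct response model, each of the two competing point estimators has its own range over which it is relatively more efficient. Under the randomized response model, this range depends on the values of the design parameters, so it must be determined according to whether a condition holds. Under both the direct response and the randomized response models, the confidence interval constructed by the Wilson method outperforms the Wald confidence interval. The study also finds that, for both competing interval estimators, the larger the sample size, the more likely it is that both the upper and lower limits of the interval are unreasonable.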
English abstract This study considers the estimation of binomial proportions of sensitive attributes in a population of interest. Randomized response models have been proposed to protect respondents' privacy and to reduce response bias when eliciting information on sensitive attributes. Applying the Wilson (1927) approach to constructing confidence intervals, the study proposes point estimators and confidence interval estimators for the common structures of randomized response models. The results also cover the direct response model. Efficiency comparisons are carried out, separately for point estimators and for confidence intervals, under both the direct response and the randomized response models. The efficiency of the proposed point estimators is assessed by the mean square error criterion, and the performance of the confidence intervals by coverage probability. The circumstances under which each proposed estimator is preferable are identified, and the effects of the design parameters are discussed.
Under the direct response model, either of the two competing point estimators can be more efficient than the other, depending on the circumstances. Under the randomized response model, the circumstances under which one point estimator is superior to the other depend on the design parameters, so one must check whether the relevant condition holds. Under both models, the Wilson confidence interval performs better than the Wald confidence interval. Both competing confidence intervals also suffer from the undesirable feature that a larger sample size makes it more likely that both the upper and lower limits of the interval fall outside the parameter space.
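As a rough illustration of the comparison described in the abstract, the following Python sketch computes Wald and Wilson (1927) intervals and their exact coverage probability. It is not the thesis's generalized estimator, which this record does not reproduce. The randomized response part assumes Warner's (1965) design, in which a respondent answers the sensitive statement with probability p and its complement otherwise, and all function names are hypothetical.

```python
import math
from statistics import NormalDist


def wald_interval(x, n, level=0.95):
    """Wald interval for a binomial proportion from x successes in n trials."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    phat = x / n
    half = z * math.sqrt(phat * (1 - phat) / n)
    return phat - half, phat + half


def wilson_interval(x, n, level=0.95):
    """Wilson (1927) score interval for a binomial proportion."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    phat = x / n
    denom = 1 + z * z / n
    centre = (phat + z * z / (2 * n)) / denom
    half = z * math.sqrt(phat * (1 - phat) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half


def warner_transform(limits, p):
    """Map limits for the 'yes' probability lam to limits for pi.

    Assumption (Warner's design, p > 0.5): lam = p*pi + (1-p)*(1-pi),
    hence pi = (lam - (1-p)) / (2p - 1). With p = 1.0 this is the identity.
    """
    lo, hi = limits
    scale = 2 * p - 1
    return (lo - (1 - p)) / scale, (hi - (1 - p)) / scale


def coverage(interval_fn, n, pi, p=1.0, level=0.95):
    """Exact coverage: total binomial probability of the counts x whose
    transformed interval contains the true proportion pi."""
    lam = p * pi + (1 - p) * (1 - pi)
    total = 0.0
    for x in range(n + 1):
        lo, hi = warner_transform(interval_fn(x, n, level), p)
        if lo <= pi <= hi:
            total += math.comb(n, x) * lam**x * (1 - lam) ** (n - x)
    return total


if __name__ == "__main__":
    for fn in (wald_interval, wilson_interval):
        print(fn.__name__, "direct:", coverage(fn, 20, 0.1))
        print(fn.__name__, "Warner p=0.7:", coverage(fn, 20, 0.1, p=0.7))
```

With p = 1.0 the transform is the identity, which corresponds to direct response. Under the assumed Warner design, a very small observed "yes" count can push both transformed limits below zero, which is the kind of out-of-range interval the abstract mentions.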
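Table of contents
Contents
Chinese abstract
English abstract
Contents
List of tables
List of figures
List of common symbols
Chapter 1 Introduction
1.1 Research background and motivation
1.2 Research objectives
1.3 Thesis structure
Chapter 2 Literature review
2.1 Proportion estimation
2.2 Randomized response models
Chapter 3 Generalized proportion estimation for randomized response models
3.1 Generalized proportion estimation
3.2 Inference on the proportion estimators
Chapter 4 Comparative analysis of estimation efficiency
4.1 Direct response model
4.1.1 Comparison of point estimators
4.1.2 Comparison of interval estimators
4.2 Randomized response model
4.2.1 Comparison of point estimators
4.2.2 Comparison of interval estimators
Chapter 5 Conclusions
5.1 Main findings
5.2 Directions for future research
References

List of tables
Table 4.1 Summary of relative efficiency ranges of point estimators under the direct response model
Table 4.2 Summary of confidence interval coverage probabilities under the direct response model
Table 4.3 Summary of relative frequencies of unreasonable interval limits under the direct response model
Table 4.4 Summary of confidence interval coverage probabilities under the randomized response model (p=0.7)
Table 4.5 Summary of confidence interval coverage probabilities under the randomized response model (p=0.8)
Table 4.6 Summary of relative frequencies of unreasonable interval limits under the randomized response model (p=0.7)
Table 4.7 Summary of relative frequencies of unreasonable interval limits under the randomized response model (p=0.8)

List of figures
Figure 1.1 Research framework of this thesis
Figure 2.1 Illustration of the Wald confidence interval under the direct response model
Figure 2.2 Illustration of the coverage probability of the Wald confidence interval under the direct response model
Figure 2.3 Illustration of the Wilson confidence interval under the direct response model
Figure 2.4 Illustration of the coverage probability of the Wilson confidence interval under the direct response model
Figure 4.1 Comparison of the relative efficiency of point estimators under the direct response model
Figure 4.2 Comparison of confidence interval coverage probabilities under the direct response model
Figure 4.3 Comparison of the relative efficiency of point estimators under the randomized response model
Figure 4.4 Comparison of confidence interval coverage probabilities under the randomized response model (p=0.7)
Figure 4.5 Comparison of confidence interval coverage probabilities under the randomized response model (p=0.8)
Figure 4.6 Illustration of the Wald confidence interval under the randomized response model
Figure 4.7 Illustration of the Wilson confidence interval under the randomized response model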
References [1] Abul-Ela ALA, Greenberg BG, Horvitz DG (1967), A multi-proportions randomized response model. J Am Stat Assoc 62: 990–1008
[2] Adhikary AK, Chaudhuri A, Vijayan K (1984), Optimum sampling strategies for RR trials. Int Stat Rev 52: 115-125
[3] Agresti A, Caffo B (2000), Simple and effective confidence intervals for proportions and differences of proportions result from adding two successes and two failures. Am Stat 54: 280-288
[4] Agresti A, Coull BA (1998), Approximate is better than “exact” for interval estimation of binomial proportions. Am Stat 52: 119–126
[5] Antonak RF, Livneh H (1995), Randomized response technique: A review and proposed extension to disability attitude research. Genet Soc Gen Psychol Monogr 121: 97-145
[6] Arnab R (2004), Optional randomized response techniques for complex survey designs. Biometrical J 46: 114-124
[7] Arnab R, Dorffner G (2007), Randomized response techniques for complex survey designs. Stat Pap 48: 131–141
[8] Barabesi L (2008), A design-based randomized response procedure for the estimation of population proportion and sensitivity level. J Stat Plan Infer 138: 2398-2408
[9] Barabesi L, Marcheselli M (2006), A practical implementation and Bayesian estimation in Franklin's randomized response procedure. Commun Stat Simul Comput 35: 563-573
[10] Barabesi L, Marcheselli M (2010), Bayesian estimation of proportion and sensitivity level in randomized response procedures. Metrika 72: 75-88
[11] Bar-Lev SK, Bobovitch E, Boukai B (2005), A note on randomized response models for quantitative data. Metrika 60: 255-260
[12] Bhargava M, Singh R (2002), On the efficiency comparison of certain randomized response strategies. Metrika 55: 191-197
[13] Bohning D (1988), Confidence interval estimation of a rate and the choice of sample size. Stat Med 7: 865–875
[14] Bohning D, Viwatwongkasem C (2005), Revisiting proportion estimators. Stat Methods Med Res 14: 147-169
[15] Bouza CN (2009), Ranked set sampling and randomized response procedures for estimating the mean of a sensitive quantitative character. Metrika 70: 267–277
[16] Brown LD, Cai TT, DasGupta A (2001), Interval estimation for a binomial proportion. Stat Sci 16: 101-117
[17] Brown LD, Cai TT, DasGupta A (2002), Confidence intervals for a binomial proportion and asymptotic expansions. Ann Stat 30: 160-201
[18] Casella G, Berger RL (1990), Statistical inference. Wadsworth and Brooks/Cole, CA
[19] Chang HJ, Huang KC (2001), Estimation of proportion and sensitivity of a qualitative character. Metrika 53: 269-280
[20] Chaudhuri A (2005), Christofides’ randomized response technique in complex sample surveys. Metrika 60: 223-228
[21] Chaudhuri A, Mukerjee R (1988), Randomized response: Theory and techniques. Marcel Dekker, New York
[22] Chaudhuri A, Pal S (2008), Estimating sensitive proportions from Warner’s randomized responses in alternative ways restricting to only distinct units sampled. Metrika 68: 147–156
[23] Chen H (1990), The accuracy of approximate intervals for the binomial parameter. J Am Stat Assoc 85: 514–518
[24] Christofides TC (2005), Randomized response in stratified sampling. J Stat Plan Infer 128: 303-310
[25] Clopper CJ, Pearson ES (1934), The use of confidence or fiducial limits illustrated in the case of the binomial. Biometrika 26: 404–413
[26] Devore JL (1977), A note on the RR techniques. Commun Stat Theory Methods 6: 1525–1529
[27] Diana G, Perri PF (2009), Estimating a sensitive proportion through randomized response procedures based on auxiliary information. Stat Pap 50: 661–672
[28] Dowling TA, Shachtman R (1975), On the relative efficiency of RR models. J Am Stat Assoc 70: 84–87
[29] Duffy JC, Waterton JJ (1984), RR models for estimating the distribution function of a quantitative character. Int Stat Rev 52: 165-171
[30] Fisher M, Kupferman LB, Lesser M (1992), Substance use in a school-based clinic population: Use of the randomized response technique to estimate prevalence. J Adolesc Health 13: 281-285
[31] Folsom RE, Greenberg BG, Horvitz DG, Abernathy JR (1973), The two alternative questions randomized response model for human surveys. J Am Stat Assoc 68: 525–530
[32] Franklin LA (1989), Randomized response sampling from dichotomous populations with continuous randomization. Survey Method 15: 225-235
[33] Gjestvang CR, Singh S (2006), A new randomized response model. J Roy Stat Soc B 68: 523-530
[34] Godambe VP (1980), Estimation in RR trials. Int Stat Rev 48: 29-32
[35] Ghosh BK (1979), A comparison of approximate interval estimators for the binomial parameter. J Am Stat Assoc 74: 894–900
[36] Greenberg BG, Abul-Ela ALA, Simmons WR, Horvitz DG (1969), The unrelated question RR model: Theoretical framework. J Am Stat Assoc 64: 520–539
[37] Greenberg BG, Kuebler RR, Abernathy JR, Horvitz DG (1971), Application of randomized response technique in obtaining quantitative data. J Am Stat Assoc 66: 243–250
[38] Gupta S, Gupta B, Singh S (2002), Estimation of sensitivity level of personal interview survey questions. J Stat Plan Infer 100: 239-247
[39] Gupta S, Shabbir J, Sehra S (2010), Mean and sensitivity estimation in optional randomized response model. J Stat Plan Infer 140: 2870–2874
[40] Hanson S, Schuermann T (2006), Confidence intervals for probabilities of default. J Banking Finance 30: 2281-2301
[41] Hedayat AS, Sinha BK (1991), Design and inference in finite population sampling. Wiley, New York
[42] Heijden PGMvd (2000), A comparison of randomized response, computer-assisted self-interview, and face-to-face direct questioning. Sociol Methods Res 28: 505-537
[43] Heijden PGMvd, Gils Gv (1996), Some logistic regression models for randomized response data. In: Forcina A, Marchetti GM, Hatzinger R, Galmacci G (eds.) Statistical modeling. Proc 11th Int Workshop Stat Model. Orvieto, Italy, 341-348
[44] Heijden PGMvd, Gils Gv, Bouts J, Hox J (1998), A comparison of randomized response, CASAQ, and direct questioning; eliciting sensitive information in the context of fraud. Kwant Method 19: 15-34
[45] Horvitz DG, Shah BV, Simmons WR (1967), The unrelated question RR model. Proc ASA Soc Stat Sec 65-72
[46] Hosseini JC, Armacost RL (1993), Gathering sensitive information in organization. Am Behav Sci 36: 443-471
[47] Huang KC (2004), A survey technique for estimating the proportion and sensitivity in a dichotomous finite population. Stat Neer 58: 75-82
[48] Huang KC (2006), Estimation of sensitive data from a dichotomous population. Stat Pap 47: 149-156
[49] Huang KC (2007), Constructing optimal randomized response designs with consideration for the level of privacy protection. Stat Neer 61: 284-291
[50] Huang KC (2008), Estimation for sensitive characteristics using optional randomized response technique. Qual Quant 42: 679–686
[51] Huang KC (2010), Unbiased estimators of mean, variance and sensitivity level for quantitative characteristics in finite population sampling. Metrika 71: 341–352
[52] Hussain Z, Shabbir J (2009a), Bayesian estimation of population proportion of a sensitive characteristic using simple beta prior. Pakistan J Stat 25: 27-35
[53] Hussain Z, Shabbir J (2009b), On estimation of mean of a sensitive quantitative variable in complex surveys. Pakistan J Stat 25: 127-134
[54] Hussain Z, Shabbir J (2009c), Improved estimation procedures for the mean of sensitive variable using randomized response model. Pakistan J Stat 25: 205-220
[55] Jovanovic BD, Levy PS (1997), A look at the rule of three. Am Stat 51: 137-139
[56] Kerkvliet J (1994), Estimating a logit model with randomized data: The case of cocaine use. Austral J Stat 36: 9-20
[57] Kim JM, Elam ME (2005), A two-stage stratified Warner’s randomized response model using optimal allocation. Metrika 61: 1-7
[58] Kim JM, Elam ME (2007), A stratified unrelated question randomized response model. Stat Pap 48: 215–233
[59] Kim JM, Tebbs JM, An SW (2006), Extensions of Mangat’s randomized response model. J Stat Plan Infer 136: 1554-1567
[60] Kim JM, Warde WD (2005), A mixed randomized response model. J Stat Plan Infer 133: 211-221
[61] Koopman PAR (1984), Confidence intervals for the ratio of two binomial proportions. Biometrics 40: 513-517
[62] Kuk AYC (1990), Asking sensitive questions indirectly. Biometrika 77: 436-438
[63] Lanke J (1975), On the choice of unrelated question in Simmons' version of RR. J Am Stat Assoc 70: 80–83
[64] Leysieffer RW, Warner SL (1976), Respondent jeopardy and optimal designs in RR models. J Am Stat Assoc 71: 649–656
[65] Lipsitz SR, Dear KBG, Laird NM, Molenberghs G (1998), Tests for homogeneity of the risk difference when data are sparse. Biometrics 54: 148-160
[66] Louis TA (1981), Confidence intervals for a binomial parameter after observing no successes. Am Stat 35: 154
[67] Marcheselli M, Barabesi L (2006), A generalization of Huang's randomized response procedure for the estimation of population proportion and sensitivity level. Metron 64: 145-159
[68] Mangat NS (1994), An improved randomized response strategy. J Roy Stat Soc B 56: 93-95
[69] Mangat NS, Singh R (1990), An alternative randomized response procedure. Biometrika 77: 439-442
[70] McClave JT, Sincich T (2000), Statistics. Prentice Hall, Englewood Cliffs
[71] Moors JJA (1971), Optimization of the unrelated question randomized response model. J Am Stat Assoc 66: 627–629
[72] Newcombe R (1998a), Two-sided confidence intervals for the single proportion: Comparison of seven methods. Stat Med 17: 857–872
[73] Newcombe R (1998b), Interval estimation for the difference between independent proportion: Comparison of seven methods. Stat Med 17: 873–890
[74] Olivier J, May WL (2006), Weighted confidence interval construction for binomial parameters. Stat Methods Med Res 15: 37–46
[75] Pal S (2008), Unbiasedly estimating the total of a stigmatizing variable from a complex survey on permitting options for direct or randomized responses. Stat Pap 49: 157–164
[76] Pan W (2002), Approximate confidence intervals for one proportion and difference of two proportions. Comput Stat Data Anal 40: 143-157
[77] Pollock KH, Bek Y (1976), A comparison of three RR models for quantitative data. J Am Stat Assoc 71: 884–886
[78] Poole WK (1974), Estimation of the distribution function of a continuous type random variable through RR. J Am Stat Assoc 69: 1002–1005
[79] Poole WK, Clayton AC (1982), Generalizations of a contamination model for continuous type random variables. Commun Stat Theory Methods 11: 1733–1742
[80] Price RM, Bonett DG (2004), An improved confidence interval for a linear function of binomial proportions. Comput Stat Data Anal 45: 449-456
[81] Raghavarao D (1978), On an estimation problem in Warner’s randomized response technique. Biometrics 34: 87–90
[82] Saha A (2010), A modified unrelated question randomized response device for complex surveys. Stat Pap 51: 349-355
[83] Sanchez-Meca J, Marin-Martinez F (2000), Testing significance of a common risk difference in meta-analysis. Comput Stat Data Anal 33: 299-313
[84] Sankey SS, Weissfeld LA, Fine MJ, Kapoor W (1996), An assessment of the use of the continuity correction for sparse data in meta-analysis. Commun Stat Simul Comput 25: 1031-1056
[85] Scheers NJ (1992), A review of randomized response techniques. Meas Eval Couns Dev 25: 27-41
[86] Singh R, Mangat NS (1996), Elements of survey sampling. Kluwer, Dordrecht
[87] Singh HP, Mathur N (2005), Estimation of population mean when coefficient of variation is known using scrambled response technique. J Stat Plan Infer 131: 135-144
[88] Singh S, Singh R (1992), Improved Franklin’s model for randomized response sampling. J Indian Stat Assoc 30: 109–122
[89] Singh S, Singh R (1993), Generalized Franklin’s model for randomized response sampling. Commun Stat Theory Methods 22: 741–755
[90] Soeken KL, Macready GB (1982), Respondents’ perceived protection when using randomized response. Psychol Bull 92: 487–498
[91] Tian GL, Yu JW, Tang ML, Geng Z (2007), A new non-randomized model for analyzing sensitive questions with binary outcomes. Stat Med 26: 4238-4252
[92] Tan MT, Tian GL, Tang ML (2009), Sample surveys with sensitive questions: A non-randomized response approach. Am Stat 63: 1-9
[93] Tracy DS, Mangat NS (1995), Respondent’s privacy hazards in Moors’ randomized response model – A remedial strategy. Int J Math Stat Sci 4: 1-10
[94] Tukey JW (1977), Exploratory data analysis. Addison-Wesley, Reading
[95] Umesh UN, Peterson RA (1991), A critical evaluation of the randomized response method: Application, validation, and research agenda. Sociol Methods Res 20: 104-138
[96] Vollset SE (1993), Confidence intervals for a binomial proportion. Stat Med 12: 809-824
[97] Wang CL, Shih KY (2009), A study on estimating proportion and sensitivity for sensitive questions. J Chinese Stat Assoc 47: 174-193
[98] Warner SL (1965), Randomized response: A survey technique for eliminating evasive answer bias. J Am Stat Assoc 60: 63–69
[99] Warner SL (1971), The linear RR model. J Am Stat Assoc 66: 884–888
[100] Whitehead A, Whitehead J (1991), A general parametric approach to the meta-analysis of randomized clinical trials. Stat Med 10: 1665-1677
[101] Williams BL, Suen H (1994), A methodological comparison of survey techniques in obtaining self-reports of condom-related behaviors. Psychol Rep 75: 1531-1537
[102] Wilson EB (1927), Probable inference, the law of succession, and statistical inference. J Am Stat Assoc 22: 209–212
[103] Yu JW, Tian GL, Tang ML (2008), Two new models for survey sampling with sensitive characteristic: Design and analysis. Metrika 67: 251–263
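Thesis usage rights
  • The author grants a royalty-free license for library patrons to reproduce the print copy for academic purposes; available from 2012-07-04.
  • The author authorizes the electronic full-text browsing and printing service; available from 2012-07-04.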

