§ Thesis Bibliographic Record
  
System ID U0002-2207202223402200
DOI 10.6846/TKU.2022.00614
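Thesis title (Chinese) 婚外性行為調查資料之機器提升學習
Thesis title (English) Machine Boost Learning on Extramarital Sex Survey Data
Thesis title (third language)
Institution Tamkang University
Department (Chinese) 數學學系數學與數據科學碩士班
Department (English) Master's Program, Department of Mathematics
Foreign degree: university
Foreign degree: college
Foreign degree: graduate institute
Academic year 110 (ROC calendar; 2021–2022)
Semester 2
Publication year 111 (ROC calendar; 2022)
Graduate student (Chinese) 董沛瑄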
研究生(英文) Pei-Hsuan Tung
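Student ID 610190059
Degree Master's
Language Traditional Chinese
Second language
Oral defense date 2022-06-27
Number of pages 26
Committee Advisor - 温啟仲 (chichung.wen@gmail.com)
Committee Member - 程毅豪
Committee Member - 黃逸輝
Keywords (Chinese) 逐一分量函數梯度下降
提升法
隨機作答技巧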
Keywords (English) Component-wise functional gradient descent
Mboost
Randomized response technique
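Keywords (third language)
Subject classification
Abstract (Chinese)
In this thesis, for high-dimensional survey data collected with the unrelated-question randomized response technique, we derive the loss function from the negative log-likelihood under the logistic regression model and use a machine boosting learning method to build the model and select variables. The computation is carried out with the R package "mboost". We propose three methods for determining the importance of the selected variables and one index for evaluating the predictive performance of the final model. We conduct simulation studies to demonstrate the numerical performance of the method, and we analyze unrelated-question randomized response survey data on extramarital affairs in Taiwan as a real-data application.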
Abstract (English)
For high-dimensional survey data collected with the unrelated-question randomized response technique, this thesis proposes a machine boosting learning method for model building and variable selection, based on a loss function constructed from the negative log-likelihood under the logistic model. The method is implemented with a modified version of the R package 'mboost'. We propose three methods to determine the importance of the selected variables and one index to evaluate the predictive power of the final model. The proposed method is evaluated by simulation studies and illustrated by the analysis of an extramarital sex survey dataset of Taiwan residents.
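As an illustration of the kind of loss the abstract describes, the Python sketch below writes out one common form of the unrelated-question design: a respondent answers the sensitive question with probability p and otherwise answers an unrelated question whose "yes" probability q is known. This record does not state the thesis's exact parameterization, and the thesis itself works in R with mboost, so the function names, the design parameters p and q, and the formula for the observed-answer probability are assumptions made for illustration, not the author's implementation.

```python
import math


def expit(f):
    """Logistic function: maps a linear predictor f to a probability."""
    return 1.0 / (1.0 + math.exp(-f))


def rrt_prob(f, p, q):
    """Probability of an observed "yes" under the assumed design.

    p: probability of being directed to the sensitive question (assumed known)
    q: "yes" probability of the unrelated question (assumed known)
    """
    return p * expit(f) + (1.0 - p) * q


def rrt_loss(y, f, p, q):
    """Negative log-likelihood of one observed answer y in {0, 1}."""
    mu = rrt_prob(f, p, q)
    return -(y * math.log(mu) + (1 - y) * math.log(1.0 - mu))


def rrt_negative_gradient(y, f, p, q):
    """Negative derivative of rrt_loss with respect to f.

    In gradient boosting, base learners are fitted to this quantity.
    """
    pi = expit(f)
    mu = rrt_prob(f, p, q)
    return (y - mu) / (mu * (1.0 - mu)) * p * pi * (1.0 - pi)
```

With p = 1 the design reduces to direct questioning and the loss becomes the ordinary logistic negative log-likelihood, which gives a quick sanity check on both functions.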
Abstract (third language)
Table of contents
1. Introduction
2. Data and Model
3. Gradient Boosting
4. Simulation
5. Real Data Analysis
6. Conclusion
7. References
References
1. Breiman L (1998) Arcing classifiers (with discussion). Ann Stat 26:801–849
2. Breiman L (2001) Random forests. Mach Learn 45:5–32
3. Bühlmann P, Yu B (2003) Boosting with the L2 loss: regression and classification. J Am Stat Assoc 98:324–338
4. Bühlmann P (2006) Boosting for high-dimensional linear models. Ann Stat 34:559–583
5. Bühlmann P, Hothorn T (2007) Model-based boosting in R: a hands-on tutorial using the R package mboost. Springer-Verlag Berlin Heidelberg 2012
6. Chang H, Wang C, Huang K (2004) On estimating the proportion of a qualitative sensitive character using randomized response sampling. Qual Quant 38:675–680
7. Fan J, Lv J (2010) A selective overview of variable selection in high dimensional feature space. Statistica Sinica 20:101–148
8. Friedman JH, Hastie T, Tibshirani R (2000) Additive logistic regression: a statistical view of boosting (with discussion). Ann Stat 28:337–407
9. Friedman JH (2001) Greedy function approximation: a gradient boosting machine. Ann Stat 29:1189–1232
10. Gjestvang CR, Singh S (2006) A new randomized response model. J R Stat Soc Ser B 68:523–530
11. Greenberg BG, Abul-Ela A, Simmons WR, Horvitz DG (1969) The unrelated question randomized response model: theoretical framework. J Am Stat Assoc 64:520–539
12. Hastie T, Tibshirani R, Friedman J (2009) The elements of statistical learning: data mining, inference, and prediction, 2nd edn. Springer, New York
13. Huang K (2004) A survey technique for estimating the proportion and sensitivity in a dichotomous finite population. Stat Neerlandica 58:75–82
14. Horvitz DG, Shah BV, Simmons WR (1967) The unrelated question randomised response model. In: Proceedings of the social statistics section, American Statistical Association, pp 65–72
15. Hothorn T, Bühlmann P, Kneib T, Schmid M, Hofner B (2012) mboost: model-based boosting. http://CRAN.R-project.org/package=mboost, R package version 2.1-3
16. Kim JM, Warde WD (2004) A stratified Warner’s randomized response model. J Stat Plann Inference 120:155–165
17. Kneib T, Hothorn T, Tutz G (2009) Variable selection and model choice in geoadditive regression models. Biometrics 65:626–634. Web appendix accessed at http://www.biometrics.tibs.org/datasets/071127P.htm on 16 Apr 2012
18. Kuk AYC (1990) Asking sensitive questions indirectly. Biometrika 77:436–438
19. Mangat NS (1994) An improved randomized response strategy. J R Stat Soc Ser B 56:93–95
20. Mangat NS, Singh R (1990) An alternative randomized response procedure. Biometrika 77:439–442
21. Mayr A, Hofner B, Schmid M (2012) The importance of knowing when to stop. A sequential stopping rule for component-wise gradient boosting. Methods of Information in Medicine 51:178–186
22. Moors JJA (1971) Optimization of the unrelated question randomized response model. J Am Stat Assoc 66:627–629
23. Raghavarao D (1978) On an estimation problem in Warner’s randomized response technique. Biometrics 34:87–90
24. Schmid M, Hothorn T (2008a) Boosting additive models using component-wise P-splines. Comput Stat Data Anal 53:298–311
25. Singh S, Singh R, Mangat NS (2000) Some alternative strategies to Moors’ model in randomized response sampling. J Stat Plan Inference 83:243–255
26. van der Laan MJ, Dudoit S (2003) Unified cross-validation methodology for selection among estimators: finite sample results, asymptotic optimality, and applications. Technical Report 130, Division of Biostatistics, University of California, Berkeley, California
27. van der Laan MJ, Robins JM (2003) Unified Methods for Censored Longitudinal Data and Causality. Springer, New York
28. van der Laan MJ, Dudoit S, van der Vaart AW (2004) The cross-validated adaptive epsilon-net estimator. Technical Report 142, Division of Biostatistics, University of California, Berkeley, California
29. Warner SL (1965) Randomized response: a survey technique for eliminating evasive answer bias. J Am Stat Assoc 60:63–69
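Full-text access permissions
National Central Library
Agrees to grant the National Central Library a royalty-free license; the bibliographic record and full-text electronic file are made publicly available on the Internet immediately after the authorization form is submitted
On campus
Print thesis available on campus immediately
Agrees to authorize worldwide public access to the full-text electronic thesis
Electronic thesis available on campus immediately
Off campus
Agrees to license the thesis to database vendors
Electronic thesis available off campus immediately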
