System ID | U0002-2207202223402200 |
---|---|
DOI | 10.6846/TKU.2022.00614 |
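Thesis title (Chinese) | 婚外性行為調查資料之機器提升學習 (Machine boosting learning on extramarital sexual behavior survey data) |
Thesis title (English) | Machine Boost Learning on Extramarital Sex Survey Data |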
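University | Tamkang University (淡江大學) |
Department (Chinese) | 數學學系數學與數據科學碩士班 (Master's Program in Mathematics and Data Science, Department of Mathematics) |
Department (English) | Master's Program, Department of Mathematics |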
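Academic year | 110 (ROC calendar; 2021–2022) |
Semester | 2 |
Publication year | 111 (ROC calendar; 2022) |
Student name (Chinese) | 董沛瑄 |
Student name (English) | Pei-Hsuan Tung |
Student ID | 610190059 |
Degree | Master's |
Language | Traditional Chinese |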
Oral defense date | 2022-06-27 |
Number of pages | 26 |
Oral defense committee | Advisor: 温啟仲 (chichung.wen@gmail.com); committee members: 程毅豪, 黃逸輝 |
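Keywords (Chinese) | 逐一分量函數梯度下降 (component-wise functional gradient descent); 提升法 (boosting); 隨機作答技巧 (randomized response technique) |
Keywords (English) | Component-wise functional gradient descent; Mboost; Randomized response technique |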
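Abstract (Chinese) |
In this thesis, for high-dimensional questionnaire data collected with the unrelated-question randomized response technique, we obtain the loss function from the negative log-likelihood under a logistic regression model and use machine boosting learning to build the model and select variables; the computation is carried out with the "mboost" package in R. We propose three methods for determining the importance of the selected variables and one index for evaluating the predictive performance of the final model. We conduct simulation studies to demonstrate the numerical performance of the method and, as an application, analyze unrelated-question randomized response survey data on extramarital affairs in Taiwan. |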
Abstract (English) |
In this thesis, we propose a machine boosting learning method for model building and variable selection in high-dimensional survey data collected with the unrelated-question randomized response technique. The method is based on a loss function constructed from the negative log-likelihood under the logistic model, and its computation is implemented with a modified version of the R package ‘mboost’. We propose three methods to determine the importance of the selected variables and one index to evaluate the predictive power of the final model. The proposed method is evaluated by simulation studies and illustrated by the analysis of an extramarital sex survey dataset of Taiwan residents. |
Table of contents |
1. Introduction
2. Data and model
3. Gradient boosting
4. Simulation
5. Real data analysis
6. Conclusion
7. References |
References |
1. Breiman L (1998) Arcing classifiers (with discussion). Ann Stat 26:801–849
2. Breiman L (2001) Random forests. Mach Learn 45:5–32
3. Bühlmann P, Yu B (2003) Boosting with the L2 loss: regression and classification. J Am Stat Assoc 98:324–338
4. Bühlmann P (2006) Boosting for high-dimensional linear models. Ann Stat 34:559–583
5. Bühlmann P, Hothorn T (2007) Model-based boosting in R: a hands-on tutorial using the R package mboost. Springer-Verlag, Berlin Heidelberg 2012
6. Chang H, Wang C, Huang K (2004) On estimating the proportion of a qualitative sensitive character using randomized response sampling. Qual Quant 38:675–680
7. Fan J, Lv J (2010) A selective overview of variable selection in high dimensional feature space. Stat Sin 20:101–148
8. Friedman JH, Hastie T, Tibshirani R (2000) Additive logistic regression: a statistical view of boosting (with discussion). Ann Stat 28:337–407
9. Friedman JH (2001) Greedy function approximation: a gradient boosting machine. Ann Stat 29:1189–1232
10. Gjestvang CR, Singh S (2006) A new randomized response model. J R Stat Soc Ser B 68:523–530
11. Greenberg BG, Abul-Ela A, Simmons WR, Horvitz DG (1969) The unrelated question randomized response model: theoretical framework. J Am Stat Assoc 64:520–539
12. Hastie T, Tibshirani R, Friedman J (2009) The elements of statistical learning: data mining, inference, and prediction, 2nd edn. Springer, New York
13. Huang K (2004) A survey technique for estimating the proportion and sensitivity in a dichotomous finite population. Stat Neerl 58:75–82
14. Horvitz DG, Shah BV, Simmons WR (1967) The unrelated question randomized response model. In: Proceedings of the social statistics section, American Statistical Association, pp 65–72
15. Hothorn T, Bühlmann P, Kneib T, Schmid M, Hofner B (2012) mboost: model-based boosting. R package version 2.1-3. http://CRAN.R-project.org/package=mboost
16. Kim JM, Warde WD (2004) A stratified Warner’s randomized response model. J Stat Plan Inference 120:155–165
17. Kneib T, Hothorn T, Tutz G (2009) Variable selection and model choice in geoadditive regression models. Biometrics 65:626–634. Web appendix accessed at http://www.biometrics.tibs.org/datasets/071127P.htm on 16 Apr 2012
18. Kuk AYC (1990) Asking sensitive questions indirectly. Biometrika 77:436–438
19. Mangat NS (1994) An improved randomized response strategy. J R Stat Soc Ser B 56:93–95
20. Mangat NS, Singh R (1990) An alternative randomized response procedure. Biometrika 77:439–442
21. Mayr A, Hofner B, Schmid M (2012) The importance of knowing when to stop: a sequential stopping rule for component-wise gradient boosting. Methods Inf Med 51:178–186
22. Moors JJA (1971) Optimization of the unrelated question randomized response model. J Am Stat Assoc 66:627–629
23. Raghavarao D (1978) On an estimation problem in Warner’s randomized response technique. Biometrics 34:87–90
24. Schmid M, Hothorn T (2008) Boosting additive models using component-wise P-splines. Comput Stat Data Anal 53:298–311
25. Singh S, Singh R, Mangat NS (2000) Some alternative strategies to Moor’s model in randomized response sampling. J Stat Plan Inference 83:243–255
26. van der Laan MJ, Dudoit S (2003) Unified cross-validation methodology for selection among estimators: finite sample results, asymptotic optimality, and applications. Technical Report 130, Division of Biostatistics, University of California, Berkeley, California
27. van der Laan MJ, Robins JM (2003) Unified methods for censored longitudinal data and causality. Springer, New York
28. van der Laan MJ, Dudoit S, van der Vaart AW (2004) The cross-validated adaptive epsilon-net estimator. Technical Report 142, Division of Biostatistics, University of California, Berkeley, California
29. Warner SL (1965) Randomized response: a survey technique for eliminating evasive answer bias. J Am Stat Assoc 60:63–69 |
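The abstracts describe boosting with a loss built from the negative log-likelihood of a logistic model for unrelated-question randomized response data. As a rough illustration only, the Python sketch below assumes the standard unrelated-question design: each respondent answers the sensitive question with known probability p and otherwise answers an unrelated question whose "yes" prevalence pi_u is known, so that P(yes | x) = p·sigmoid(f(x)) + (1 - p)·pi_u. It then runs component-wise linear least-squares boosting on the negative gradient of that loss. This is not the thesis's modified mboost implementation; the function names, the moment-based offset, the step size nu, the fixed number of iterations (no stopping rule), and the simulated data are all hypothetical choices made for the example.

```python
import math
import random


def sigmoid(f):
    """Numerically stable logistic function."""
    if f >= 0:
        return 1.0 / (1.0 + math.exp(-f))
    z = math.exp(f)
    return z / (1.0 + z)


def rrt_yes_prob(f, p, pi_u):
    """P(answer = yes) under the assumed unrelated-question design: with
    probability p the sensitive question is answered (prevalence sigmoid(f)),
    otherwise an unrelated question with known prevalence pi_u."""
    return p * sigmoid(f) + (1.0 - p) * pi_u


def rrt_neg_loglik(y, f, p, pi_u):
    """Bernoulli negative log-likelihood of one observed answer y in {0, 1}."""
    lam = rrt_yes_prob(f, p, pi_u)
    return -(y * math.log(lam) + (1 - y) * math.log(1.0 - lam))


def rrt_neg_gradient(y, f, p, pi_u):
    """Negative derivative of rrt_neg_loglik with respect to f."""
    s = sigmoid(f)
    lam = p * s + (1.0 - p) * pi_u
    return (y - lam) / (lam * (1.0 - lam)) * p * s * (1.0 - s)


def componentwise_boost(X, y, p, pi_u, n_iter=300, nu=0.1):
    """Component-wise linear boosting: each step fits the negative gradient on
    every single component and updates only the best-fitting one."""
    n, d = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(d)]
    # Component 0 is an intercept (all ones); components 1..d are centered covariates.
    cols = [[1.0] * n] + [[row[j] - means[j] for row in X] for j in range(d)]
    sq = [sum(v * v for v in c) for c in cols]
    # Offset: logit of the clipped moment estimate of the sensitive prevalence.
    pi_hat = (sum(y) / n - (1.0 - p) * pi_u) / p
    pi_hat = min(max(pi_hat, 0.01), 0.99)
    coef = [0.0] * (d + 1)
    coef[0] = math.log(pi_hat / (1.0 - pi_hat))
    f = [coef[0]] * n
    counts = [0] * (d + 1)
    for _ in range(n_iter):
        u = [rrt_neg_gradient(y[i], f[i], p, pi_u) for i in range(n)]
        # Least squares of u on one column x (no intercept) gives b = x'u / x'x
        # and reduces the residual sum of squares by (x'u)^2 / x'x.
        best, best_gain, best_b = 0, -1.0, 0.0
        for j, c in enumerate(cols):
            if sq[j] == 0.0:
                continue
            xu = sum(c[i] * u[i] for i in range(n))
            if xu * xu / sq[j] > best_gain:
                best, best_gain, best_b = j, xu * xu / sq[j], xu / sq[j]
        coef[best] += nu * best_b
        counts[best] += 1
        f = [f[i] + nu * best_b * cols[best][i] for i in range(n)]
    return coef, means, counts


def simulate(n=600, d=10, p=0.7, pi_u=0.3, seed=1):
    """Hypothetical data: only the first two covariates affect the sensitive trait."""
    rng = random.Random(seed)
    beta = [1.5, -1.0] + [0.0] * (d - 2)
    X, y = [], []
    for _ in range(n):
        x = [rng.gauss(0.0, 1.0) for _ in range(d)]
        sensitive = rng.random() < sigmoid(-0.5 + sum(b * v for b, v in zip(beta, x)))
        answer = sensitive if rng.random() < p else rng.random() < pi_u
        X.append(x)
        y.append(int(answer))
    return X, y


if __name__ == "__main__":
    X, y = simulate()
    coef, means, counts = componentwise_boost(X, y, p=0.7, pi_u=0.3)
    print(f"intercept: {coef[0]:+.3f}")
    for j in range(1, len(coef)):
        print(f"x{j}: coef={coef[j]:+.3f}, selected {counts[j]} times")
```

Counting how often each covariate is selected is one simple way to summarize variable selection in a sketch like this; it is not claimed to be one of the three importance measures proposed in the thesis. In practice the number of boosting iterations would be tuned rather than fixed, for example by cross-validation.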