Electronic Theses and Dissertations Service

§ Browse Thesis Bibliographic Record

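The electronic full text of this thesis has been publicly available off campus since 2022-07-27.
The print copy of this thesis has been publicly available since 2022-07-27.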

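System ID	U0002-2207202223402200
DOI	10.6846/TKU.2022.00614
Title (Chinese)	婚外性行為調查資料之機器提升學習
Title (English)	Machine Boost Learning on Extramarital Sex Survey Data
Title (third language)
University	Tamkang University
Department (Chinese)	數學學系數學與數據科學碩士班
Department (English)	Master's Program, Department of Mathematics
Foreign degree university
Foreign degree college
Foreign degree institute
Academic year	110
Semester	2
Publication year	111
Author (Chinese)	董沛瑄
Author (English)	Pei-Hsuan Tung
Student ID	610190059
Degree	Master's
Language	Traditional Chinese
Second language
Oral defense date	2022-06-27
Pages	26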
Oral defense committee	Advisor: 温啟仲 (chichung.wen@gmail.com); Committee member: 程毅豪; Committee member: 黃逸輝
Keywords (Chinese)	逐一分量函數梯度下降; 提升法; 隨機作答技巧
Keywords (English)	Component-wise functional gradient descent; mboost; Randomized response technique
Keywords (third language)
Subject classification
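Abstract (translated from Chinese)	In this thesis, for high-dimensional survey data collected with the unrelated-question randomized response technique, we use a loss function obtained from the negative log-likelihood under the logistic regression model, and apply machine boosting learning to build the model and select variables. The computation is carried out with the "mboost" package in R. We propose three methods for determining the importance of the selected variables and one index for evaluating the predictive performance of the final model. We conduct simulation studies to demonstrate the numerical performance of the method, and analyze unrelated-question randomized response survey data on extramarital affairs in Taiwan as a real-data application.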
Abstract (English)	For high-dimensional survey data collected with the unrelated-question randomized response technique, this thesis proposes a machine boosting learning method for model building and variable selection, based on a loss function constructed from the negative log-likelihood under the logistic model. The method is implemented with a modified version of the R package 'mboost'. We propose three methods to determine the importance of the selected variables and one index to evaluate the predictive power of the final model. The proposed method is evaluated by simulation studies and illustrated by the analysis of an extramarital sex survey dataset of Taiwan residents.
Abstract (third language)
Table of contents	1. Introduction (p. 1); 2. Data and Model (p. 5); 3. Gradient Boosting (p. 7); 4. Simulation (p. 12); 5. Real Data Analysis (p. 17); 6. Conclusion (p. 24); 7. References (p. 25)
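Full-text access rights	National Central Library: the author grants the National Central Library a royalty-free license; the bibliographic record and the electronic full text are made public on the Internet immediately after the authorization form is submitted. On campus: the print thesis is made public immediately; the author agrees to license the electronic full text for worldwide public access; the on-campus electronic thesis is made public immediately. Off campus: the author agrees to license the thesis to database vendors; the off-campus electronic thesis is made public immediately.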
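
The approach summarized in the abstract can be illustrated with a minimal Python sketch (the thesis itself works with the R package mboost). It assumes the standard unrelated-question design: with probability p a respondent answers the sensitive question, and otherwise answers an unrelated question whose "yes" probability c is known, so the observed "yes" probability is q(x) = p * expit(f(x)) + (1 - p) * c. The function names (rrt_loss, rrt_neg_gradient, componentwise_boost, simulate_rrt), the values of p and c, the step size, and the use of simple linear base learners are all assumptions made for illustration, not the thesis's implementation.

```python
import math
import random


def expit(t):
    """Numerically stable logistic function."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def rrt_loss(y, f, p, c):
    """Negative log-likelihood of one observed answer y in {0, 1}."""
    q = p * expit(f) + (1.0 - p) * c  # P(observed answer is "yes")
    return -(y * math.log(q) + (1 - y) * math.log(1.0 - q))


def rrt_neg_gradient(y, f, p, c):
    """Negative derivative of rrt_loss with respect to the predictor f."""
    pi = expit(f)
    q = p * pi + (1.0 - p) * c
    return p * pi * (1.0 - pi) * (y - q) / (q * (1.0 - q))


def componentwise_boost(X, y, p, c, n_iter=200, nu=0.1):
    """Component-wise gradient boosting with linear base learners.

    Each iteration fits every candidate (the intercept or one centered
    covariate) to the negative gradient by least squares and updates only
    the candidate that reduces the residual sum of squares the most.
    The returned intercept refers to the centered covariates.
    """
    n, d = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(d)]
    cols = [[row[j] - means[j] for row in X] for j in range(d)]
    ss = [sum(v * v for v in col) for col in cols]
    intercept, beta, f = 0.0, [0.0] * d, [0.0] * n
    for _ in range(n_iter):
        u = [rrt_neg_gradient(yi, fi, p, c) for yi, fi in zip(y, f)]
        u_bar = sum(u) / n
        # Candidate -1 is the intercept; its RSS reduction is n * u_bar^2.
        best_j, best_coef, best_gain = -1, u_bar, n * u_bar * u_bar
        for j in range(d):
            if ss[j] == 0.0:
                continue  # constant covariate carries no information
            coef = sum(x * ui for x, ui in zip(cols[j], u)) / ss[j]
            gain = coef * coef * ss[j]
            if gain > best_gain:
                best_j, best_coef, best_gain = j, coef, gain
        if best_j < 0:
            intercept += nu * best_coef
            f = [fi + nu * best_coef for fi in f]
        else:
            beta[best_j] += nu * best_coef
            f = [fi + nu * best_coef * x for fi, x in zip(f, cols[best_j])]
    return intercept, beta, f


def simulate_rrt(n, coefs, intercept, p, c, rng):
    """Simulate unrelated-question answers (hypothetical data generator)."""
    X, y = [], []
    for _ in range(n):
        x = [rng.gauss(0.0, 1.0) for _ in coefs]
        pi = expit(intercept + sum(b * v for b, v in zip(coefs, x)))
        if rng.random() < p:   # respondent answers the sensitive question
            answer = rng.random() < pi
        else:                  # respondent answers the unrelated question
            answer = rng.random() < c
        X.append(x)
        y.append(1 if answer else 0)
    return X, y


if __name__ == "__main__":
    rng = random.Random(1)
    X, y = simulate_rrt(500, [1.5, 0.0, 0.0], -0.5, p=0.7, c=0.3, rng=rng)
    b0, beta, _ = componentwise_boost(X, y, p=0.7, c=0.3, n_iter=100)
    print("intercept:", round(b0, 3), "coefficients:", [round(b, 3) for b in beta])
```

Covariates that are never selected keep a coefficient of exactly zero, which is how this kind of boosting performs variable selection. The number of iterations acts as the tuning parameter and would normally be chosen by cross-validation rather than fixed as it is here.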
