| System ID | U0002-2207202222242700 |
|---|---|
| DOI | 10.6846/TKU.2022.00612 |
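| Thesis title (Chinese) | 現狀設限白內障資料之機器提升學習 |
| Thesis title (English) | Machine Boost Learning on Current Status Censored Cataract Data |
| Thesis title (third language) | |
| University | Tamkang University |
| Department (Chinese) | 數學學系數學與數據科學碩士班 (Department of Mathematics, Master's Program in Mathematics and Data Science) |
| Department (English) | Master's Program, Department of Mathematics |
| Foreign degree: university | |
| Foreign degree: college | |
| Foreign degree: graduate institute | |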
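| Academic year (ROC calendar) | 110 (2021-2022) |
| Semester | 2 |
| Publication year (ROC calendar) | 111 (2022) |
| Student (Chinese) | 盧杰愷 |
| Student (English) | Chieh-Kai Lu |
| Student ID | 610190026 |
| Degree | Master's |
| Language | Traditional Chinese |
| Second language | |
| Oral defense date | 2022-06-27 |
| Number of pages | 24 |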
| Oral examination committee | Advisor: 温啟仲 (chichung.wen@gmail.com); committee members: 蔡志群 (141400@mail.tku.edu.tw), 吳裕振 (yuhjenn@cycu.edu.tw) |
| Keywords (Chinese) | component-wise functional gradient descent; boosting; survival analysis |
| Keywords (English) | Component-wise functional gradient descent; mboost; survival analysis |
| Keywords (third language) | |
| Subject classification | |
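| Abstract (Chinese) | In this thesis, for high-dimensional current status questionnaire data, we use the negative log-likelihood under the proportional odds model as the loss function and apply machine boosting learning to build the model and select variables; the computation is built on the R package "mboost". We propose three methods for determining the importance of the selected variables and one index for evaluating the predictive performance of the final model. We conduct simulation studies to assess the numerical performance of the proposed methods and illustrate them with an analysis of questionnaire data on whether Taiwanese adults aged 65 and over have cataracts. |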
| Abstract (English) | For current status censored data with high-dimensional covariates, we use the negative log-likelihood under the proportional odds model as the loss function and propose a machine boosting learning procedure for model building and variable selection. The computation is based on the R package 'mboost'. We propose three methods to determine the importance of the selected variables and one index to evaluate the predictive power of the final model. We conduct simulations to evaluate the proposed procedure and analyze a cataract dataset of Taiwanese residents aged 65 and over to illustrate the method. |
| Abstract (third language) | |
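| Table of contents | 1. Introduction (p. 1); 2. Data and model (p. 4); 3. Gradient boosting (p. 5); 4. Simulation (p. 10); 5. Real data analysis (p. 15); 6. Conclusion (p. 22); 7. References (p. 23) |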
| References | 1. Breiman L (1998) Arcing classifiers (with discussion). Ann Stat 26:801–849. 2. Breiman L (2001) Random forests. Mach Learn 45:5–32. 3. Bühlmann P, Yu B (2003) Boosting with the L2 loss: regression and classification. J Am Stat Assoc 98:324–338. 4. Bühlmann P (2006) Boosting for high-dimensional linear models. Ann Stat 34:559–583. 5. Bühlmann P, Hothorn T (2007) Model-based boosting in R: a hands-on tutorial using the R package mboost. Springer-Verlag Berlin Heidelberg 2012. 6. Fan J, Lv J (2010) A selective overview of variable selection in high dimensional feature space. Stat Sin 20:101–148. 7. Friedman JH, Hastie T, Tibshirani R (2000) Additive logistic regression: a statistical view of boosting (with discussion). Ann Stat 28:337–407. 8. Friedman JH (2001) Greedy function approximation: a gradient boosting machine. Ann Stat 29:1189–1232. 9. Hastie T, Tibshirani R, Friedman J (2009) The elements of statistical learning: data mining, inference, and prediction, 2nd edn. Springer, New York. 10. Hothorn T, Bühlmann P, Kneib T, Schmid M, Hofner B (2012) mboost: model-based boosting. R package version 2.1-3. http://CRAN.R-project.org/package=mboost. 11. Huang J (1996) Efficient estimation for the Cox model with interval censoring. Ann Stat 24:540–568. 12. Kneib T, Hothorn T, Tutz G (2009) Variable selection and model choice in geoadditive regression models. Biometrics 65:626–634. Web appendix accessed at http://www.biometrics.tibs.org/datasets/071127P.htm on 16 Apr 2012. 13. Lin DY, Oakes D, Ying Z (1998) Additive hazards regression with current status data. Biometrika 85:289–298. 14. Mayr A, Hofner B, Schmid M (2012) The importance of knowing when to stop: a sequential stopping rule for component-wise gradient boosting. Methods Inf Med 51:178–186. 15. Rossini AJ, Tsiatis AA (1996) A semiparametric proportional odds regression model for the analysis of current status data. J Am Stat Assoc 91:713–721. 16. Schmid M, Hothorn T (2008) Boosting additive models using component-wise P-splines. Comput Stat Data Anal 53:298–311. 17. Sun J, Sun L (2005) Semiparametric linear transformation models for current status data. Can J Stat 33:85–96. 18. Tian L, Cai T (2006) On the accelerated failure time model for current status and interval censored data. Biometrika 93:329–342. 19. Turnbull B (1976) The empirical distribution with arbitrarily grouped and censored data. J R Stat Soc B 38:290–295. 20. van der Laan MJ, Dudoit S (2003) Unified cross-validation methodology for selection among estimators: finite sample results, asymptotic optimality, and applications. Technical Report 130, Division of Biostatistics, University of California, Berkeley, California. 21. van der Laan MJ, Robins JM (2003) Unified methods for censored longitudinal data and causality. Springer, New York. 22. van der Laan MJ, Dudoit S, van der Vaart AW (2004) The cross-validated adaptive epsilon-net estimator. Technical Report 142, Division of Biostatistics, University of California, Berkeley, California. |
| Full-text access rights | |
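The abstract states the loss only in words. The following is a minimal sketch of the idea in Python rather than R; it is not the thesis code and does not call mboost. It assumes current status data in which subject $i$ is inspected once at time $C_i$, with $\delta_i = 1$ if the event (here, cataract onset) has occurred by $C_i$, and a proportional odds model $\operatorname{logit} F(t \mid z) = \alpha(t) + \beta^\top z$. Under that model the mean negative log-likelihood is a binomial loss,

$$\ell(\eta) = -\frac{1}{n}\sum_{i=1}^{n}\left\{\delta_i \eta_i - \log\left(1 + e^{\eta_i}\right)\right\}, \qquad \eta_i = \alpha(C_i) + \beta^\top z_i,$$

whose per-observation negative gradient is $\delta_i - \operatorname{expit}(\eta_i)$. As simplifying assumptions, the baseline $\alpha(t)$ is taken to be linear in $\log t$ (so $\log C_i$ enters as one more candidate column), the intercept is held fixed at the logit of the observed event rate, and each base-learner is a no-intercept least-squares fit on one centered column. The helper names (`po_negloglik`, `componentwise_boost`, `simulate`) and the simulation design are hypothetical.

```python
import math
import random


def expit(x):
    """Numerically stable logistic function 1 / (1 + exp(-x))."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def po_negloglik(delta, eta):
    """Mean negative log-likelihood of current status data when
    F(C_i | z_i) = expit(eta_i), as under a proportional odds model."""
    total = 0.0
    for d, e in zip(delta, eta):
        # log(1 + exp(e)) - d * e, rearranged to avoid overflow
        total += max(e, 0.0) + math.log1p(math.exp(-abs(e))) - d * e
    return total / len(delta)


def componentwise_boost(X, delta, n_iter=300, nu=0.1):
    """Component-wise functional gradient descent with one-column
    least-squares base-learners (no intercept, so the columns of X
    should be centered). Returns (offset, coefficients)."""
    n, p = len(X), len(X[0])
    cols = [[row[j] for row in X] for j in range(p)]
    sq = [sum(v * v for v in col) for col in cols]
    rate = sum(delta) / n  # assumes both outcomes occur in the data
    offset = math.log(rate / (1.0 - rate))  # fixed intercept: logit of event rate
    eta = [offset] * n
    coef = [0.0] * p
    for _ in range(n_iter):
        # negative gradient of the loss with respect to each eta_i
        u = [d - expit(e) for d, e in zip(delta, eta)]
        best_j, best_b, best_rss = 0, 0.0, float("inf")
        for j in range(p):
            b = sum(x * r for x, r in zip(cols[j], u)) / sq[j]
            rss = sum((r - b * x) ** 2 for x, r in zip(cols[j], u))
            if rss < best_rss:
                best_j, best_b, best_rss = j, b, rss
        # move only along the best-fitting column, shrunk by nu
        coef[best_j] += nu * best_b
        eta = [e + nu * best_b * x for e, x in zip(eta, cols[best_j])]
    return offset, coef


def simulate(n=500, seed=1):
    """Hypothetical design: logit F(t | z) = log t + z1 - z2 with three
    noise covariates and inspection times C ~ Uniform(0.2, 5)."""
    rng = random.Random(seed)
    rows, delta = [], []
    for _ in range(n):
        z = [rng.gauss(0.0, 1.0) for _ in range(5)]
        v = rng.uniform(1e-6, 1.0 - 1e-6)
        # invert F: log T = logit(V) - (z1 - z2) with V ~ Uniform(0, 1)
        log_t = math.log(v / (1.0 - v)) - (z[0] - z[1])
        c = rng.uniform(0.2, 5.0)
        delta.append(1 if log_t <= math.log(c) else 0)
        rows.append([math.log(c)] + z)  # column 0 stands in for alpha(C)
    means = [sum(r[j] for r in rows) / n for j in range(6)]
    X = [[r[j] - means[j] for j in range(6)] for r in rows]
    return X, delta


X, delta = simulate()
offset, coef = componentwise_boost(X, delta)
names = ["log C", "z1", "z2", "z3", "z4", "z5"]
print({name: round(b, 3) for name, b in zip(names, coef)})
```

Covariates whose coefficient is still zero after the last iteration are the ones not selected. The number of iterations and the step length `nu` act as the tuning parameters; here they are fixed for illustration, whereas in practice the stopping iteration would typically be chosen by cross-validation or a similar data-driven rule.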