Electronic Theses and Dissertations Service

§ Browse Thesis Bibliographic Record

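The electronic full text of this thesis has been publicly available off campus since 2022-08-15.
The print copy of this thesis has been publicly available since 2022-08-15.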

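System ID	U0002-1508202202092900
DOI	10.6846/TKU.2022.00364
Title (Chinese)	比較隨機森林和XGBoost的預測強韌性 (Comparing the Forecasting Robustness of Random Forest and XGBoost)
Title (English)	Comparison of RandomForest and XGBoost in Forecasting Robustness
Title (third language)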
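University	Tamkang University (淡江大學)
Department (Chinese)	Master's Program in Applied Statistics, Department of Statistics (統計學系應用統計學碩士班)
Department (English)	Department of Statistics
Foreign degree: university
Foreign degree: college
Foreign degree: graduate institute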
Academic year	110 (ROC calendar; academic year 2021–2022)
Semester	2
Publication year	111 (ROC calendar; 2022)
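Student (Chinese name)	陳伯杰
Student (English name)	Po-Chieh Chen
Student ID	605650059
Degree	Master's
Language	Traditional Chinese
Second language
Oral defense date	2022-07-12
Number of pages	68
Examination committee	Advisor: 陳景祥 (steve@stat.tku.edu.tw); Committee members: 何宗武, 李百靈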
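Keywords (Chinese)	隨機森林; 極限梯度提升法; 強韌性 (Random Forest; eXtreme Gradient Boosting; robustness)
Keywords (English)	RandomForest; XGBoost; Robustness
Keywords (third language)
Subject classification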
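Abstract (Chinese, translated)	As new data are continually added, a model's predictive ability may fall short of expectations, or the stability of its prediction level may fluctuate substantially. A model whose prediction accuracy stays stable has the advantage of robustness. Using time series data with trend and periodic patterns as well as non-time-series data, this study examines the prediction accuracy and prediction accuracy change rate of Random Forest and XGBoost models and compares the robustness of the two. The results show that for trending time series, the Random Forest model is more robust in prediction than the XGBoost model; for periodic time series, the two models differ little in robustness; and for non-time-series data, the Random Forest model is slightly more robust than the XGBoost model.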
Abstract (English)	As data are continuously added, a model's predictive ability may fall below expectations, or the level of its predictions may change substantially. A stable prediction accuracy gives a model the advantage of robustness. Using time series data with trend and periodicity as well as non-time-series data, this study examined the prediction accuracy and prediction accuracy change rate of the RandomForest and XGBoost models and compared the robustness of the two models. The study found that when the data are a trending time series, the RandomForest model outperforms the XGBoost model in prediction robustness. When the data are a periodic time series, the RandomForest model differs little from the XGBoost model in prediction robustness. When the data are not a time series, the RandomForest model is slightly better than the XGBoost model in prediction robustness.
Abstract (third language)
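The record does not define how "prediction accuracy" or "prediction accuracy change rate" are computed, so the following Python sketch shows only one plausible setup, under explicit assumptions: accuracy is taken as 1 − MAPE, the training window grows by a fixed number of records at a time (echoing the cumulative 5-, 10-, and 30-record additions listed in the table of contents), and the change rate is the relative difference between consecutive accuracies. The names `fit_predict`, `accuracy_from_mape`, `expanding_window_accuracies`, and the naive last-value forecaster are hypothetical; the thesis itself lists R packages (Table 3), and Python is used here purely for illustration.

```python
from statistics import mean
from typing import Callable, Dict, List, Sequence

# Hypothetical signature: given a training series and a horizon h, return h
# point forecasts. A Random Forest or XGBoost model would sit behind this.
FitPredict = Callable[[Sequence[float], int], List[float]]


def accuracy_from_mape(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Assumed accuracy metric: 1 - MAPE. Requires non-zero actual values."""
    errors = [abs(a - p) / abs(a) for a, p in zip(actual, predicted)]
    return 1.0 - mean(errors)


def expanding_window_accuracies(
    y: Sequence[float], fit_predict: FitPredict, initial: int, step: int
) -> List[float]:
    """Grow the training window by `step` records and score the next `step` records."""
    accuracies = []
    end = initial
    while end + step <= len(y):
        train, test = y[:end], y[end:end + step]
        preds = fit_predict(train, len(test))
        accuracies.append(accuracy_from_mape(test, preds))
        end += step  # cumulative addition: the scored block joins the training set
    return accuracies


def change_rates(accuracies: Sequence[float]) -> List[float]:
    """Assumed change rate: relative difference between consecutive accuracies."""
    return [(curr - prev) / prev for prev, curr in zip(accuracies, accuracies[1:])]


def robustness_summary(accuracies: Sequence[float]) -> Dict[str, float]:
    """Smaller absolute change rates are read as a more stable (robust) model."""
    if len(accuracies) < 2:
        raise ValueError("need at least two evaluation steps")
    abs_rates = [abs(r) for r in change_rates(accuracies)]
    return {
        "mean_accuracy": mean(accuracies),
        "mean_abs_change": mean(abs_rates),
        "max_abs_change": max(abs_rates),
    }


def naive_last_value(train: Sequence[float], h: int) -> List[float]:
    """Hypothetical stand-in model: repeat the last observed value h times."""
    return [train[-1]] * h


if __name__ == "__main__":
    series = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
    accs = expanding_window_accuracies(series, naive_last_value, initial=4, step=2)
    print(accs)
    print(robustness_summary(accs))
```

To mirror the comparison described in the abstract, one would run the same loop with two `fit_predict` wrappers (one per model) on the same series and compare their summaries; the model with smaller change rates at similar accuracy would be judged more robust under these assumed definitions.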
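Table of contents
  Chapter 1 Introduction
    1.1 Research Background
    1.2 Research Motivation and Objectives
    1.3 Thesis Structure
  Chapter 2 Literature Review
    2.1 Decision Trees
    2.2 CART Decision Trees
    2.3 Random Forest
    2.4 XGBoost
  Chapter 3 Research Methods
    3.1 Data Sources
    3.2 Evaluation Metrics
    3.3 Research Methods
  Chapter 4 Empirical Analysis
    4.1 Data Background
    4.2 Analysis
      4.2.1 Apple Stock Price Analysis
      4.2.2 Tokyo Weather Data Analysis
      4.2.3 Combined Cycle Power Plant Analysis
  Chapter 5 Conclusions and Recommendations
    5.1 Conclusions
    5.2 Recommendations for Future Research
  References
    Chinese References
    English References
List of tables
  Table 1 Research datasets
  Table 2 Summary of hardware and software specifications
  Table 3 Summary of R packages
  Table 4 Variables of Apple Stock Price – All Time
  Table 5 Variables of Tokyo Weather Data
  Table 6 Variables of Combined Cycle Power Plant
  Table 7 Prediction accuracy with cumulative 5-record additions
  Table 8 Prediction accuracy change rate with cumulative 5-record additions
  Table 9 Prediction accuracy with cumulative 10-record additions
  Table 10 Prediction accuracy change rate with cumulative 10-record additions
  Table 11 Prediction accuracy with cumulative 30-record additions
  Table 12 Prediction accuracy change rate with cumulative 30-record additions
  Table 13 Apple Stock Price: first 50% vs. last 50%
  Table 14 Apple Stock Price: first 25% vs. last 25%
  Table 15 Prediction accuracy with cumulative 5-record additions
  Table 16 Prediction accuracy change rate with cumulative 5-record additions
  Table 17 Prediction accuracy with cumulative 10-record additions
  Table 18 Prediction accuracy change rate with cumulative 10-record additions
  Table 19 Prediction accuracy with cumulative 30-record additions
  Table 20 Prediction accuracy change rate with cumulative 30-record additions
  Table 21 Tokyo Weather Data: first 50% vs. last 50%
  Table 22 Tokyo Weather Data: first 25% vs. last 25%
  Table 23 Period-by-period prediction accuracy
  Table 24 Period-by-period prediction accuracy change rate
  Table 25 Combined Cycle Power Plant: first 50% vs. last 50%
  Table 26 Combined Cycle Power Plant: first 25% vs. last 25%
  Table 27 Corn, Soy, Wheat, Crude Oil, and S&P 500 prices: first 50% vs. last 50%
  Table 28 Corn, Soy, Wheat, Crude Oil, and S&P 500 prices: first 25% vs. last 25%
  Table 29 Closing price of Top Indexes: first 50% vs. last 50%
  Table 30 Closing price of Top Indexes: first 25% vs. last 25%
  Table 31 Air Quality: first 50% vs. last 50%
  Table 32 Air Quality: first 25% vs. last 25%
  Table 33 Daily Coffee Price: first 50% vs. last 50%
  Table 34 Daily Coffee Price: first 25% vs. last 25%
  Table 35 Concrete Compressive Strength: first 50% vs. last 50%
  Table 36 Concrete Compressive Strength: first 25% vs. last 25%
  Table 37 Dry Bean Dataset: first 50% vs. last 50%
  Table 38 Dry Bean Dataset: first 25% vs. last 25%
  Table 39 HTRU2: first 50% vs. last 50%
  Table 40 HTRU2: first 25% vs. last 25%
  Table 41 Summary comparison of Random Forest and XGBoost across all datasets
List of figures
  Figure 1 Research flowchart
  Figure 2 Decision tree diagram
  Figure 3 Random forest diagram
  Figure 4 Trend of prediction accuracy with cumulative 5-record additions
  Figure 5 Trend of prediction accuracy change rate with cumulative 5-record additions
  Figure 6 Trend of prediction accuracy with cumulative 10-record additions
  Figure 7 Trend of prediction accuracy change rate with cumulative 10-record additions
  Figure 8 Trend of prediction accuracy with cumulative 30-record additions
  Figure 9 Trend of prediction accuracy change rate with cumulative 30-record additions
  Figure 10 Trend of prediction accuracy with cumulative 5-record additions
  Figure 11 Trend of prediction accuracy change rate with cumulative 5-record additions
  Figure 12 Trend of prediction accuracy with cumulative 10-record additions
  Figure 13 Trend of prediction accuracy change rate with cumulative 10-record additions
  Figure 14 Trend of prediction accuracy with cumulative 30-record additions
  Figure 15 Trend of prediction accuracy change rate with cumulative 30-record additions
  Figure 16 Trend of step-by-step prediction accuracy
  Figure 17 Trend of step-by-step prediction accuracy change rate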
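Full-text access rights
  National Central Library: royalty-free license granted to the National Central Library; the bibliographic record and electronic full text are made publicly available on the Internet immediately after the authorization form is submitted.
  On campus: the print thesis is publicly available on campus immediately; the electronic full text is licensed for worldwide public access; the electronic thesis is publicly available on campus immediately.
  Off campus: license granted to database vendors; the electronic thesis is publicly available off campus immediately.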


For questions, please contact us.
Library Digital Information Section: (02) 2621-5656 ext. 2487, or by email