Electronic Theses and Dissertations Service

§ Browse Thesis Bibliographic Record

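The electronic full text of this thesis has been publicly available off campus since 2013-07-24.
The print copy of this thesis has been publicly available since 2013-07-24.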

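System ID	U0002-2207201311235900
DOI	10.6846/TKU.2013.00868
Thesis title (Chinese)	銀行信用風險評分應用資料探勘技術之比較研究
Thesis title (English)	A Comparative Study of Data Mining Techniques for Credit Scoring in Banking
Thesis title (third language)
Institution	Tamkang University (淡江大學)
Department (Chinese)	資訊管理學系碩士班
Department (English)	Department of Information Management
Foreign degree: university
Foreign degree: college
Foreign degree: graduate institute
Academic year	101 (ROC calendar; 2012-2013)
Semester	2
Publication year	102 (ROC calendar; 2013)
Graduate student (Chinese)	黃世禎
Graduate student (English)	Shih-Chen Huang
Student ID	600630445
Degree	Master's
Language	Traditional Chinese
Second language
Oral defense date	2013-06-14
Pages	86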
Oral defense committee	Advisor - 戴敏育 (myday@mail.tku.edu.tw); Member - 戴敏育 (myday@mail.tku.edu.tw); Member - 侯永昌; Member - 翁頌舜
Keywords (Chinese)	分類方法; 資料探勘; 信用風險評分; 支持向量機; 賽仕企業採礦工具
Keywords (English)	Classification Method; Data Mining; Credit Risk Score; Support Vector Machine (SVM); SAS Enterprise Miner (SAS EM)
Keywords (third language)
Subject classification
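Abstract (translated from Chinese)	Lending is an important source of income for banking institutions. Prior studies have indicated that, among credit risk scoring models, the logistic regression and neural network classification methods perform best. The main purpose of this study is to propose more suitable credit risk scoring models that reduce credit risk, and to analyze and compare the accuracy of each classification model. The study uses enterprise data mining software to build four kinds of credit risk scoring models: decision tree, logistic regression, neural network, and support vector machine (SVM). It then compares in detail the accuracy of 17 classification models. The experimental results show that the support vector machine classification models achieve higher accuracy. The main contribution of this study is to build a range of classification models for bank credit risk scoring using data mining techniques, compare their accuracy, and confirm that the support vector machine classification method outperforms the traditional classification methods.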
Abstract (English)	Credit is becoming one of the most important sources of income for banking institutions. Prior studies indicated that logistic regression and neural networks performed better on credit risk scoring. The major purpose of the present study is to propose appropriate credit risk scoring models to reduce credit risk and to compare the accuracy of various classification models. The study used enterprise data mining software to construct four types of predictive classification models (decision tree, logistic regression, neural network, and support vector machine) and further compared the accuracy of 17 classification models. The experimental results show that the support vector machine classification models achieve higher accuracy. The main contribution of this study is the use of data mining techniques to construct various classification models for credit scoring in banking and to compare their accuracy; the evidence shows that the support vector machine outperforms traditional classification methods.
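The accuracy comparison described in the abstract can be illustrated with a minimal sketch. It assumes accuracy means the fraction of cases classified correctly, and it ranks models by that value. The thesis itself used SAS Enterprise Miner and LibSVM, not Python; the function names, labels, and predictions below are hypothetical and are not thesis data or results.

```python
from typing import Dict, List, Sequence, Tuple


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of cases whose predicted class equals the true class."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("at least one case is required")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def rank_by_accuracy(
    y_true: Sequence[int], predictions: Dict[str, Sequence[int]]
) -> List[Tuple[str, float]]:
    """Return (model name, accuracy) pairs, highest accuracy first."""
    scores = [(name, accuracy(y_true, y_pred)) for name, y_pred in predictions.items()]
    return sorted(scores, key=lambda item: item[1], reverse=True)


# Hypothetical toy labels (1 = good credit, 0 = bad credit); NOT thesis data.
Y_TRUE = [1, 0, 1, 1, 0, 0, 1, 0]

# Hypothetical predictions from two placeholder models; NOT thesis results.
PREDICTIONS = {
    "model_a": [1, 0, 1, 0, 0, 1, 1, 0],
    "model_b": [1, 0, 1, 1, 0, 1, 1, 0],
}

if __name__ == "__main__":
    for name, score in rank_by_accuracy(Y_TRUE, PREDICTIONS):
        print(f"{name}: {score:.3f}")
```

In practice the predictions would come from models scored on a held-out partition of the credit dataset, so that the comparison reflects performance on unseen applicants.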
Abstract (third language)
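Table of contents
Chapter 1 Introduction
  1.1 Research Background and Motivation
  1.2 Research Objectives
  1.3 Research Process
Chapter 2 Literature Review
  2.1 Credit Risk
    2.1.1 Credit Risk Scoring
    2.1.2 Applications of Credit Risk Scoring
  2.2 Data Mining
    2.2.1 Definition and Concepts of Data Mining
    2.2.2 Functions of Data Mining
    2.2.3 Meaning of Classification
    2.2.4 Classification Concepts
    2.2.5 Related Work on Classification
    2.2.6 Feature Selection
  2.3 Classification Methods
    2.3.1 Decision Tree
    2.3.2 Logistic Regression
    2.3.3 Neural Network
    2.3.4 Support Vector Machine
  2.4 Enterprise Data Mining Software: SAS Enterprise Miner (SAS EM)
    2.4.1 The SEMMA Data Mining Process
  2.5 Chapter Summary
Chapter 3 Research Method
  3.1 Research Framework
  3.2 Data Collection and Preprocessing
  3.3 Model Building with SAS EM (Decision Tree)
  3.4 Model Building with SAS EM (Logistic Regression)
  3.5 Model Building with SAS EM (Neural Network)
  3.6 Model Building with SAS EM (Support Vector Machine)
  3.7 Model Building with LIB-SVM
  3.8 Chapter Summary
Chapter 4 Empirical Study
  4.1 Data Preprocessing
  4.2 SAS EM Model-Building Process
    4.2.1 SAS EM Data Preprocessing
    4.2.2 Building the Data Mining Process Flow Diagram
    4.2.3 Decision Tree Model Using SEMMA
    4.2.4 Logistic Regression Model Using SEMMA
    4.2.5 Neural Network Model Using SEMMA
    4.2.6 Support Vector Machine Model Using SEMMA
    4.2.7 Comparison Results of the 12 SAS EM Support Vector Machine Models
  4.3 LibSVM Model-Building Process
  4.4 Analysis Results and Discussion
  4.5 Chapter Summary
Chapter 5 Conclusion
  5.1 Research Conclusions
  5.2 Research Contributions
  5.3 Suggestions for Future Research
Chapter 6 References
Appendix A

List of figures
Figure 1-1 Research process flowchart
Figure 3-1 Research method flowchart
Figure 4-1 Dataset file conversion
Figure 4-2 ROC chart
Figure 4-3 Gains table
Figure 4-4 Accuracy comparison of the 12 support vector machine classification models
Figure 4-5 Accuracy comparison of the 17 classification models
Figure 4-6 Comparison of the top three accuracies on the Australian and German datasets

List of tables
Table 2-1 Related research on credit scoring
Table 2-2 Definitions of data mining
Table 2-3 Related research on classification
Table 2-4 Decision tree classification rules
Table 3-1 Kernel functions
Table 4-1 Dataset variables
Table 4-2 Model numbers and model names
Table 4-3 The 12 SAS EM support vector machine prediction models
Table 4-4 Accuracy comparison of the 17 classification models

Appendix A figures
Figure A-1 Dataset file conversion
Figure A-2 Creating a new project (1)
Figure A-3 Creating a new project (2)
Figure A-4 Opening a library
Figure A-5 Creating a new library
Figure A-6 Entering the library name and path
Figure A-7 Selecting the data file to connect
Figure A-8 Library setup complete
Figure A-9 Creating a dataset
Figure A-10 Selecting the metadata source
Figure A-11 Selecting the data source (1)
Figure A-12 Selecting the data source (2)
Figure A-13 Confirming the table information
Figure A-14 Metadata advisor options
Figure A-15 Metadata settings (1)
Figure A-16 Metadata settings (2)
Figure A-17 Metadata settings (3)
Figure A-18 Viewing summary statistics (1)
Figure A-19 Viewing summary statistics (2)
Figure A-20 Decision processing settings
Figure A-21 Choosing whether to create a sample dataset
Figure A-22 Selecting the type to create
Figure A-23 Selecting the data source role
Figure A-24 Confirming the completed data
Figure A-25 Creating a process flow diagram
Figure A-26 Entering the diagram name
Figure A-27 Diagram creation complete
Figure A-28 Dragging the dataset into the diagram
Figure A-29 Locating the Data Partition node
Figure A-30 Dragging the Data Partition node into the diagram and connecting it
Figure A-31 Changing the dataset allocation
Figure A-32 Locating the Decision Tree node
Figure A-33 Dragging out the Decision Tree node and connecting it to the Data Partition node
Figure A-34 Running the model first
Figure A-35 Confirming the run
Figure A-36 Viewing the results
Figure A-37 Using the interactive option
Figure A-38 Browsing the tree view
Figure A-39 Using the split node
Figure A-40 Observing the -LOG(P) value
Figure A-41 Larger -LOG(P) as the splitting criterion (1)
Figure A-42 Larger -LOG(P) as the splitting criterion (2)
Figure A-43 Splitting stops when -LOG(P) is below 5 and the count is below 5
Figure A-44 Result of manual tree branching
Figure A-45 Locating the Logistic Regression node
Figure A-46 Setting the model properties
Figure A-47 Locating the Neural Network node
Figure A-48 Connecting the Neural Network node after the Logistic Regression node
Figure A-49 Optimization settings (1)
Figure A-50 Optimization settings (2)
Figure A-51 Running the model
Figure A-52 Support vector machine model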
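Full-text access rights	On campus: the print thesis is made public immediately; the author consents to campus-wide access to the electronic full text; the electronic thesis is made public on campus immediately. Off campus: the author consents to off-campus access to the electronic thesis, made public immediately.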
