§ Browse Thesis Bibliographic Record
  
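System ID U0002-1306201021175700
DOI 10.6846/TKU.2010.00360
Title (Chinese) 個人信貸信用風險評分卡模型之探討
Title (English) A Comparison of Different Credit Risk Scorecards for Personal Loans
Title (third language)
University Tamkang University (淡江大學)
Department (Chinese) 統計學系碩士班 (Master's Program, Department of Statistics)
Department (English) Department of Statistics
Foreign degree: university
Foreign degree: college
Foreign degree: graduate institute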
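Academic year 98 (ROC calendar; 2009-10)
Semester 2
Year of publication 99 (ROC calendar; 2010)
Author (Chinese) 范維真
Author (English) Wei-Jan Fan
Student ID 697650082
Degree Master's
Language Traditional Chinese
Second language
Oral defense date 2010-05-21
Pages 88
Committee Advisor - 林志娟
Member - 張慶暉
Member - 林志鴻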
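Keywords (Chinese) 信用風險評分卡模型 (credit risk scorecard model)
邏輯斯迴歸 (logistic regression)
支持向量機 (support vector machine)
核函數 (kernel function)
Keywords (English) credit scoring model
logistic regression
support vector machines
kernel function
Keywords (third language)
Subject classification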
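Abstract (translated from Chinese)
This study uses support vector machines (SVM), a data mining technique, to build credit risk scorecard models for personal loans. Logistic regression is currently the method most often used to build credit risk scorecards. Although data mining methods are convenient to apply and impose few restrictions, they are seldom used in practice for scorecards, mainly because the economic meaning of the variables selected by an SVM model is often hard to interpret. To examine whether SVM offers a better alternative, the study first builds models using all the variables provided by the bank, and then selects variables by four further methods: weight of evidence (WOE)/information value (IV), stepwise selection, deletion of abnormal variables, and correlation coefficients. The resulting five variable combinations are each applied to logistic regression and to SVM. For SVM, four kernel functions are used: linear, polynomial, radial basis function (RBF), and sigmoid. The aim is to find, among these twenty-five models, a scorecard model that is both suitable and reasonably interpretable. Models are evaluated by accuracy rate, AUROC (area under the receiver operating characteristic curve), Gini coefficient, population stability index (PSI), and cross-validation. The empirical results show that SVM with the RBF kernel performs best: its accuracy rate is the highest, and although its AUROC and Gini coefficient are not the highest, they differ little from those of the best logistic regression model. The study therefore recommends it as the first choice for classification.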
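The abstract names weight of evidence (WOE) and information value (IV) as one of the variable-selection criteria but does not spell out the formulas. The following is a minimal Python sketch under the common scorecard convention, WOE_i = ln(share of goods in bin i / share of bads in bin i) and IV = sum over bins of (share of goods - share of bads) x WOE_i. That convention, the function name `woe_iv`, and the age-bin counts are assumptions for illustration; none of them is taken from the thesis data.

```python
import math


def woe_iv(bins):
    """Compute WOE per bin and the total IV for one binned variable.

    bins: dict mapping a bin label to a (n_good, n_bad) tuple.
    Assumes every bin has at least one good and one bad customer.
    """
    total_good = sum(good for good, _ in bins.values())
    total_bad = sum(bad for _, bad in bins.values())
    woe = {}
    iv = 0.0
    for label, (good, bad) in bins.items():
        dist_good = good / total_good   # share of all goods falling in this bin
        dist_bad = bad / total_bad      # share of all bads falling in this bin
        w = math.log(dist_good / dist_bad)
        woe[label] = w
        iv += (dist_good - dist_bad) * w
    return woe, iv


# Hypothetical counts of (good, bad) customers per age bin.
age_bins = {"<30": (200, 60), "30-45": (500, 30), ">45": (300, 10)}
woe, iv = woe_iv(age_bins)
for label, value in woe.items():
    print(f"{label}: WOE = {value:.4f}")
print(f"IV = {iv:.4f}")
```

Under this convention a negative WOE marks a bin where bad customers are over-represented. The sketch assumes no empty cells; a bin with zero goods or zero bads would need smoothing or merging before the logarithm is taken.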
Abstract (English)
The main purpose of this research is to build a credit scoring model for personal loans with a data mining approach based on support vector machines (SVM). Although logistic regression is more commonly adopted by the credit card industry because its results are easier to explain, recent literature reports that SVM classifies applicants more accurately. This research therefore applies SVM to five feature sets and suggests a better model for credit scoring problems. The feature sets are the original variables provided by the credit card department of a Taiwan financial holding company, together with those selected by four criteria: the stepwise procedure in logistic regression, weight of evidence/information value, deletion of abnormal variables, and correlation coefficients. In addition, four kernel functions (linear, polynomial, radial basis function, and sigmoid) are adopted in SVM to find the optimal hyperplane. To evaluate the performance of SVM, we compare the SVM models with naïve logistic regression on the same five feature sets. The population stability index and cross-validation are used to check the model fit of the five naïve logistic regression models and the 20 SVM models, respectively. The empirical results show that SVM with the radial basis function kernel performs about the same as the naïve logistic regression models in terms of the area under the receiver operating characteristic curve (AUROC) and, equivalently, the Gini coefficient. However, it outperforms the other 24 models in terms of accuracy rate. SVM with the radial basis function kernel is therefore recommended.
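The abstract treats AUROC and the Gini coefficient as equivalent criteria and uses the population stability index (PSI) as a stability check. The sketch below assumes the standard definitions: AUROC as the probability that a randomly chosen good customer scores above a randomly chosen bad one, Gini = 2 x AUROC - 1, and PSI = sum over score bands of (actual - expected) x ln(actual / expected). The function names, scores, and band shares are hypothetical and are not the thesis's data.

```python
import math


def auroc(scores_good, scores_bad):
    """Pairwise AUROC: share of (good, bad) pairs ranked correctly, ties count 1/2."""
    wins = 0.0
    for g in scores_good:
        for b in scores_bad:
            if g > b:
                wins += 1.0
            elif g == b:
                wins += 0.5
    return wins / (len(scores_good) * len(scores_bad))


def gini(auc):
    """Gini coefficient as a linear rescaling of AUROC."""
    return 2.0 * auc - 1.0


def psi(expected_pct, actual_pct):
    """Population stability index over matching score bands (all shares > 0)."""
    return sum((a - e) * math.log(a / e) for e, a in zip(expected_pct, actual_pct))


# Hypothetical credit scores (higher = safer) for good and bad customers.
good_scores = [720, 680, 650, 700]
bad_scores = [600, 660, 640]
auc = auroc(good_scores, bad_scores)
print(f"AUROC = {auc:.4f}, Gini = {gini(auc):.4f}")

# Hypothetical share of customers per score band: development vs. recent sample.
development = [0.20, 0.30, 0.50]
recent = [0.25, 0.30, 0.45]
stability = psi(development, recent)
print(f"PSI = {stability:.4f}")
```

Because the Gini coefficient is a linear function of AUROC, the two always rank models in the same order, which is why the abstract can cite them together. A PSI of zero means the two score distributions are identical, and larger values indicate a greater shift.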
Abstract (third language)
Table of Contents
Contents
Chapter 1 Introduction
1.1 Research Background and Motivation
1.2 Research Motivation and Objectives
1.3 Research Framework and Process
1.4 Research Limitations
Chapter 2 Literature Review
2.1 Credit Risk Scorecard Models
2.2 Logistic Regression Model
2.3 Support Vector Machine Model
Chapter 3 Research Methods
3.1 Definition of Good and Bad Customers
3.2 Sample Segmentation and Sampling
3.3 Variable Construction, Grouping, and Selection
3.3.1 Weight of Evidence
3.3.2 Information Value
3.4 Credit Risk Scorecard Models
3.4.1 Logistic Regression Model
3.4.2 Support Vector Machine Model
3.5 Credit Scores
3.6 Model Validation
3.6.1 Accuracy Rate
3.6.2 AUROC
3.6.3 Gini Coefficient
3.6.4 Population Stability Index
3.6.5 Cross-Validation
Chapter 4 Empirical Analysis
4.1 Data Source and Description
4.2 Credit Risk Scorecard Modeling Process
4.3 Empirical Results
Chapter 5 Conclusions and Recommendations
References

List of Tables
Table 2.1 Summary of Related Literature
Table 2.2 Summary of Related Literature
Table 2.3 Summary of Related Literature
Table 3.1 Rules of Thumb for Information Value
Table 3.2 Example Attributes and Their Credit Scores
Table 3.3 Frequency Classification Table
Table 3.4 Four Possible Classification Outcomes of a Scoring Model
Table 3.5 Discriminatory Power of Models
Table 3.6 Discriminatory Power of Models
Table 4.1 Default and Non-Default Distribution by Card Age
Table 4.2 Default and Non-Default Distribution by Revolving-Account Status in 9803 (March 2009)
Table 4.3 Default and Non-Default Distribution by Active-Account Status in 9803
Table 4.4 Default and Non-Default Distribution by Credit Card Limit
Table 4.5 Default and Non-Default Distribution by Balance in 9803
Table 4.6 Default and Non-Default Distribution by Credit Utilization Rate in 9803
Table 4.7 Default and Non-Default Distribution by Gender
Table 4.8 Default and Non-Default Distribution by Age
Table 4.9 Default and Non-Default Distribution by Marital Status
Table 4.10 Default and Non-Default Distribution by Education Level
Table 4.11 Default and Non-Default Distribution by Residential Status
Table 4.12 Default and Non-Default Distribution by Annual Income
Table 4.13 Default and Non-Default Distribution by Occupation
Table 4.14 Information Values of Variables and Their Correlation Coefficients with the Number of Delinquencies in the Past Year
Table 4.15 Odds Ratio Estimates
Table 4.16 Summary of the Five Variable Combinations
Table 4.17 Parameter Settings for the Four SVM Methods
Table 4.18 Comparison of Results for Five Modeling Methods and Five Variable Combinations
Table 4.19 Population Stability Index Values for the Five Variable Combinations under Logistic Regression
Table 4.20 Results of Five Rounds of Cross-Validation

List of Figures
Figure 1.1 Research Framework of This Study
Figure 3.1 Logistic Function Curve
Figure 3.2 Optimal Separating Hyperplane
Figure 3.3 Maximum Margin
Figure 3.4 Mapping between the Input Space and the High-Dimensional Feature Space
Figure 3.5 Probability Distributions of Good and Bad Customers in Relation to the Cutoff Point C
Figure 3.6 ROC Curve
Figure 3.7 Illustration of the Population Stability Index
Figure 4.1 Default and Non-Default Distribution by Card Age
Figure 4.2 Default and Non-Default Distribution by Revolving-Account Status in 9803 (March 2009)
Figure 4.3 Default and Non-Default Distribution by Active-Account Status in 9803
Figure 4.4 Default and Non-Default Distribution by Credit Card Limit
Figure 4.5 Default and Non-Default Distribution by Balance in 9803
Figure 4.6 Default and Non-Default Distribution by Credit Utilization Rate in 9803
Figure 4.7 Default and Non-Default Distribution by Gender
Figure 4.8 Default and Non-Default Distribution by Age
Figure 4.9 Default and Non-Default Distribution by Marital Status
Figure 4.10 Default and Non-Default Distribution by Education Level
Figure 4.11 Default and Non-Default Distribution by Residential Status
Figure 4.12 Default and Non-Default Distribution by Annual Income
Figure 4.13 Default and Non-Default Distribution by Occupation
Figure 4.14 Default and Non-Default Distribution by Occupation
Figure 4.15 Comparison of AUROC across the Five Variable Combinations
Figure 4.16 Comparison of Gini Coefficient across the Five Variable Combinations
Figure 4.17 Comparison of Accuracy Rate across the Five Variable Combinations
Figure 4.18 Comparison of Average Accuracy Rate across the Five Variable Combinations
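Full-Text Access Permissions
On campus
Print copy available on campus immediately
Author consents to on-campus public access to the electronic full text
Electronic full text available on campus immediately
Off campus
Authorization granted
Electronic full text available off campus immediately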
