
Graduate student: Kun-lin Chiang (江坤林)
Thesis title: A Data Mining Application for Personal Loan (資料探勘於個人信用貸款審核之應用)
Advisor: Yuh-Jye Lee (李育杰)
Oral examination committee: Hsing-Kuo Kenneth (鮑興國), Tien-Ruey Hsiang (項天瑞), Wen-Han Hwang (黃文瀚), 余尚武 (no English name listed)
Degree: Master's
Department: College of Electrical Engineering and Computer Science, Department of Computer Science and Information Engineering
Year of publication: 2006
Academic year of graduation: 94 (ROC calendar, i.e., 2005-2006)
Language: Chinese
Number of pages: 67
Keywords: balance transfer, data mining, support vector machines, feature selection, over-sampling, non-performing loan
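    The consumer finance market in Taiwan is highly competitive. According to statistics from the Financial Supervisory Commission, Executive Yuan, as of the end of November 2005 the outstanding credit card revolving balances and cash card loan balances of all financial institutions together approached NT$800 billion. Including other personal consumer loans, the total outstanding balance of unsecured loans reached NT$1.4 trillion, which is expected to bring banks NT$140 billion to NT$160 billion in interest income each year. This large business opportunity has led many banks to launch balance transfer programs to attract more high-quality, profitable customers. Credit review for balance transfer programs is comparatively difficult: besides the professional judgment of loan officers, banks need more reliable credit data from the Joint Credit Information Center (JCIC) to decide whether to lend. In practice, loan officers rely to a considerable degree on JCIC credit reports.
    In this study, we propose a system that uses data mining techniques to automatically convert the text credit reports from JCIC into feature data for each applicant, and uses support vector machines as the core of a credit scoring model that identifies whether an applicant is likely to become a non-performing loan. We also use feature selection and over-sampling techniques to improve the system's accuracy in predicting non-performing loans, and finally apply the system at a financial institution in northern Taiwan. A series of experiments confirms that the system has good predictive ability and can effectively reduce the likelihood of non-performing loans.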
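The over-sampling step mentioned above addresses the fact that non-performing loans are much rarer than performing ones. One common way to rebalance such data is plain random over-sampling: minority-class records are duplicated at random until each class is as large as the largest one. This is a minimal sketch under stated assumptions: the exact over-sampling scheme used in the thesis is not described here, the function name `random_oversample` is hypothetical, and the toy data and label convention (1 = non-performing, -1 = performing) are invented for illustration.

```python
import random
from collections import Counter


def random_oversample(samples, labels, seed=0):
    """Plain random over-sampling (hypothetical helper, not the thesis code).

    Duplicates randomly chosen samples of each smaller class until every
    class has as many samples as the largest class. The original samples
    are kept first, in their original order.
    """
    counts = Counter(labels)
    target = max(counts.values())
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    out_x, out_y = list(samples), list(labels)
    for cls, n in counts.items():
        members = [x for x, y in zip(samples, labels) if y == cls]
        for _ in range(target - n):
            out_x.append(rng.choice(members))  # duplicate a random member
            out_y.append(cls)
    return out_x, out_y


# Hypothetical toy data: label 1 = non-performing (minority), -1 = performing.
X = [[0.2, 1.0], [0.1, 0.9], [0.3, 1.1], [0.2, 0.8], [0.9, 0.1]]
y = [-1, -1, -1, -1, 1]
X_bal, y_bal = random_oversample(X, y)
print(Counter(y_bal))
```

Over-sampling of this kind should be applied to the training split only; duplicating records before splitting would let copies of the same applicant appear in both the training and test data and inflate the measured accuracy.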


    Personal banking in Taiwan faces increasingly intense competition. According to data from Taiwan's Financial Supervisory Commission, the outstanding balances of credit card revolving credit and cash card loans have approached NT$800 billion. Including other personal consumer loans, total unsecured consumer lending reaches NT$1.4 trillion, and banks are estimated to earn NT$140 billion to NT$160 billion a year in interest income from it. This large market has led banks to promote balance transfer programs, which have become a popular way to attract more valuable customers. Evaluating the credit of balance transfer applicants is more difficult than for other consumer loans. In practice, loan officers rely heavily on heuristic rules and on credit reports acquired from the Joint Credit Information Center (JCIC) to decide whether to approve a loan.
    In this study, we propose a credit scoring system that uses data mining techniques to automatically extract applicant features from the text credit reports acquired from JCIC, and employs Support Vector Machines (SVMs) as the core of the system to assess applicants. We also employ feature selection and over-sampling techniques to improve the system's prediction accuracy. Finally, we apply the proposed system at a financial institution in northern Taiwan. The experimental results show that the system has good predictive ability and can effectively reduce non-performing loans.
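The abstract names SVMs as the core classifier, and the thesis outline lists a Smooth Support Vector Machine (SSVM) section. The sketch below shows a linear classifier in the SSVM style of Lee and Mangasarian: the hinge loss is replaced by the smooth function p(x, α) = x + log(1 + e^(-αx))/α, and the objective ν/2 · Σ p(1 - y_i(w·x_i - γ), α)² + (‖w‖² + γ²)/2 is minimized. Assumptions, stated plainly: it is linear only (no kernel), it uses plain gradient descent rather than the Newton-Armijo method of the original SSVM work, and the function names, parameter values (`nu`, `alpha`, `lr`, `iters`) and toy data are hypothetical, not the thesis's settings.

```python
import math


def smooth_plus(x, alpha):
    """Smooth approximation of max(x, 0): x + log(1 + exp(-alpha*x)) / alpha,
    rewritten as max(x, 0) + log1p(exp(-alpha*|x|)) / alpha to avoid overflow."""
    return max(x, 0.0) + math.log1p(math.exp(-alpha * abs(x))) / alpha


def sigmoid(x, alpha):
    """Derivative of smooth_plus with respect to x: 1 / (1 + exp(-alpha*x))."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-alpha * x))
    z = math.exp(alpha * x)
    return z / (1.0 + z)


def train_ssvm(X, y, nu=1.0, alpha=5.0, lr=0.01, iters=1000):
    """Hypothetical sketch: minimise
        nu/2 * sum_i p(1 - y_i (w.x_i - gamma))^2 + (|w|^2 + gamma^2) / 2
    by plain gradient descent. Labels y_i must be +1 or -1."""
    d = len(X[0])
    w, gamma = [0.0] * d, 0.0
    for _ in range(iters):
        grad_w, grad_g = list(w), gamma  # gradient of the regulariser
        for xi, yi in zip(X, y):
            r = 1.0 - yi * (sum(wj * xj for wj, xj in zip(w, xi)) - gamma)
            coef = nu * smooth_plus(r, alpha) * sigmoid(r, alpha)
            for j in range(d):
                grad_w[j] -= coef * yi * xi[j]  # dr/dw = -y_i x_i
            grad_g += coef * yi                 # dr/dgamma = y_i
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        gamma -= lr * grad_g
    return w, gamma


def predict(w, gamma, x):
    """Classify with sign(w.x - gamma), returning +1 or -1."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) - gamma >= 0 else -1


# Hypothetical toy data: two well-separated clusters in 2-D.
X = [[2, 2], [3, 3], [2, 3], [-2, -2], [-3, -3], [-2, -3]]
y = [1, 1, 1, -1, -1, -1]
w, gamma = train_ssvm(X, y)
print([predict(w, gamma, x) for x in X])
```

Because the smoothed objective is differentiable and strongly convex, plain gradient descent with a small step size converges here; the Newton-Armijo solver of the SSVM paper reaches the same minimizer in far fewer iterations, which matters for real credit data with many features.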

    Advisor's Recommendation Letter
    Examination Committee Approval Form
    Abstract (Chinese)
    Abstract (English)
    Acknowledgements
    Table of Contents
    List of Figures
    List of Tables
    Chapter 1 Introduction
        1.1 Research Background
        1.2 Research Motivation
        1.3 Research Objectives
    Chapter 2 Literature Review
        2.1 Credit Review of Consumer Loans
        2.2 Applications of Data Mining in Finance
        2.3 The Data Mining Process
    Chapter 3 Data Mining Techniques and Methods
        3.1 Classification Methods
            3.1.1 Support Vector Machines
            3.1.2 Smooth Support Vector Machine
        3.2 Feature Selection
            3.2.1 Weight Score Approach
            3.2.2 Information Theory
            3.2.3 1-norm SVM
        3.3 Evaluation
        3.4 Imbalanced Datasets
    Chapter 4 Credit Scoring Model for Balance Transfer
        4.1 Information Acquisition and Connection to the JCIC Database
        4.2 Credit Scoring System
            4.2.1 JCIC Report Parser (RP)
            4.2.2 Model Builder (MB)
            4.2.3 Grade Evaluator (GE)
            4.2.4 Issues in Applying the System
    Chapter 5 Experimental Evaluation
        5.1 Dataset
            5.1.1 Data Source
            5.1.2 Compiling Sample Feature Data
            5.1.3 Data Preprocessing
            5.1.4 Data Cleaning
        5.2 Evaluation Method
        5.3 Feature Selection
            5.3.1 Experiment a
            5.3.2 Experiment b
            5.3.3 Experiment c
            5.3.4 Overall Analysis of Feature Selection Results
        5.4 Over-sampling
        5.5 Performance Comparison with Other Algorithms
    Chapter 6 Conclusions and Future Work
        6.1 Conclusions
        6.2 Future Work
    References
        Chinese-Language References
        English-Language References
    About the Author
