System ID | U0002-0103201023361400 |
---|---|
DOI | 10.6846/TKU.2010.00006 |
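Thesis title (Chinese) | 以決策支援系統架構協助資料採礦流程之實際應用 |
Thesis title (English) | The Framework of Decision Support System in Aiding Data Mining Process for Real Applications |
Thesis title (third language) | |
University | Tamkang University |
Department (Chinese) | 資訊工程學系博士班 |
Department (English) | Department of Computer Science and Information Engineering |
Foreign degree university | |
Foreign degree college | |
Foreign degree institute | |
Academic year | 98 (ROC calendar; 2009-2010) |
Semester | 1 |
Year of publication | 99 (ROC calendar; 2010) |
Student (Chinese) | 詹念怡 |
Student (English) | Nien-Yi Jan |
Student ID | 892190314 |
Degree | Doctorate |
Language | English |
Second language | |
Oral defense date | 2010-01-15 |
Number of pages | 97 |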
Oral defense committee |
Advisor - 林丕靜 (nancylin@mail.tku.edu.tw)
Member - 洪宗貝 (tphong@nuk.edu.tw)
Member - 謝楠楨 (nchsieh@ntcn.edu.tw)
Member - 蔣定安 (chiang@cs.tku.edu.tw)
Member - 陳伯榮 (pozung@cs.tku.edu.tw)
Member - 林丕靜 (nancylin@mail.tku.edu.tw) |
Keywords (Chinese) |
決策支援系統; 資料採礦; 客戶流失模型; 入侵偵測; 客戶關係管理 |
Keywords (English) |
Decision Support System; Data Mining; Churn Model; Network Intrusion; CRM |
Keywords (third language) | |
Subject classification | |
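Chinese abstract (English translation) |
In this dissertation we propose a decision support system (DSS) framework to assist in applying data mining models to real applications. The DSS brings together experts with business knowledge of the domain and experts with data mining knowledge, so that data mining models can be applied to a given use case as effectively as possible. In the initial training stage, knowledge about marketing design, product information, data mining models, and performance evaluation is built into the knowledge base. The inference engine coordinates the assistance of business experts and data mining experts to carry out the modeling work and stores the resulting models in the knowledge base. As the models are applied repeatedly, they can be refined by feeding back to the knowledge base the measured attainment of business goals and the accumulated results of each application. We design three goal-attainment measures to help evaluate how well a data mining model performs in a real application; the results of these three measures serve as a feedback mechanism through which the knowledge base adjusts the marketing plan or the model. The framework can store every relevant parameter of applying a model in practice, including the modeling process, the marketing application process, and the feedback mechanism, and these parameters help accumulate execution experience in the knowledge base. We also design an experimental architecture addressing customer churn in the telecom industry to evaluate the effectiveness and feasibility of the proposed framework. Churn management is an important issue in customer relationship management (CRM) systems and a typical application of data mining techniques. However, mining techniques and marketing have seldom been discussed together, even though not only the data mining model itself but also the accompanying retention activities are important factors in the effectiveness of churn management. The proposed experimental architecture can also serve as a pilot-project process for applying data mining to a new application: after each run, the three proposed goal-attainment measures (the modification measure, the execution effectiveness measure, and the cost-benefit measure) indicate how the model or the marketing plan should be modified, helping to optimize the model's performance step by step. We also apply the DSS framework to intrusion detection. The three classes of alert classification rules it builds help experts discover suspicious patterns or intrusion patterns quickly and accurately, and markedly reduce the experts' workload in an online alert analysis system. In future work, we will continue to focus on integrating more business knowledge, business-perspective parameters, and data mining model knowledge into the system to help address important business issues in customer relationship management. |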
English abstract |
In this dissertation, we propose a novel decision support system (DSS) framework that makes it easier to adopt data mining models in real-world applications. Through the DSS, domain experts provide business knowledge and data mining experts provide technical knowledge. In the training stage, the knowledge base is constructed first; it consists of marketing plans, promotion plans, data mining models, and model evaluation knowledge. The inference engine is responsible for constructing data mining models with the assistance of data mining experts and domain experts, and the resulting models are stored in the knowledge base. These models can be refined over several iterative cycles according to profit considerations and different business goals. We also design three effectiveness measures that help evaluate execution effectiveness and feed the results back to the knowledge base, so that the data mining models can be adjusted or the marketing activities redesigned. The DSS framework stores the relevant parameters of each adoption of a data mining model in the knowledge base, which helps accumulate execution experience. To evaluate the effectiveness of the proposed framework, we design an experimental architecture for a telecom churn management application. Churn management is a critical issue in customer relationship management (CRM). Not only the data mining models but also the retention activities influence the results of churn management. The proposed experimental architecture, based on our DSS framework, can serve as the design of a pilot project for a real application. Three measures (the modification measure, the execution effectiveness measure, and cost-benefit analysis) are proposed to aid in adjusting the data mining models.
For models that need to be adjusted, the effectiveness measures indicate how to further reduce the churn rate by adjusting either the data mining models or the marketing activities. We also adapt the DSS framework to a network intrusion detection application: we construct three kinds of alert classification rule classes that help experts discover suspicious or intrusion patterns quickly and precisely, and that noticeably lighten the experts' load in an on-line alert analysis system. In the future, we will focus on integrating more domain knowledge, business considerations, and mining models into this system to solve critical customer relationship management problems in real business applications. |
Abstract (third language) | |
Table of contents |
Table of Contents
Acknowledgements
Chinese Abstract
Abstract
Table of Contents
List of Figures
List of Tables
List of Algorithms
CHAPTER 1 INTRODUCTION
CHAPTER 2 BACKGROUND
2.1 Data Mining Process
2.2 Data Mining Technologies and Its Applications
2.3 Difficulties in Data Mining
2.4 Integration of Data Mining and Decision Support
2.5 Telecom Churn Model Application
2.5.1 Churn Problem in Telecom Industry
2.5.2 Data Mining Churn Model
Chapter 3 The Integration of Decision Support System with Data Mining Methods
3.1 Framework of Decision Support System for Data Mining
3.2 Knowledge Base and Trigger Mechanism
3.3 Decision Support Inference Engine
3.4 Feedback Engine
CHAPTER 4 Applications for Adopting Data Mining Churn Model in CRM
4.1 DSS Framework of Adopting Data Mining Model in CRM
4.2 System Architecture of Data Mining Churn Model Construction
4.2.1 Data Preprocessing Stage
4.2.2 Model Constructing Stage
4.2.3 Refining Stage
4.3 System Architecture of Feedback Engine on Churn Model
4.3.1 Modification Measure Stage
4.3.2 Execution Effectiveness Measure Stage
4.3.3 Economic Cost Benefits Analysis Stage
4.4 Experiments
4.4.1 Data Mining Modeling and Predicting
4.4.2 Design Retention Activity
4.4.3 Design Comparison Groups
4.4.4 Feedback Evaluation
Chapter 5 Applications for Constructing An Alert Classification Model
5.1 System Architecture of alert classification
5.2 Alert Preprocessing Phase
5.3 Model Constructing Phase
5.4 Rule Refining Phase
5.5 Rule Class Construction Algorithms of Model Constructing Phase
5.6 Normal Behavior Rule Class Construction
5.7 Intrusion/Suspicious Behavior Rule Class Construction
5.8 Experiments
5.8.1 The Overview of Related Tools
5.8.2 The Design of Experimental Environment
5.8.3 The Experimental Results
CHAPTER 6 CONCLUSIONS
REFERENCES

List of Figures
Figure 2-1: The CRISP-DM Process
Figure 2-2: Integration of data mining and decision support
Figure 2-3: Churn types in telecom
Figure 2-4: Applying Data Mining in Churn Problem
Figure 2-5: Cumulative gain chart
Figure 2-6: The Lift Chart Example
Figure 3-1: Framework of Decision Support System for Data Mining
Figure 3-2: The Flow of Decision Support Inference Engine
Figure 3-3: Data Preprocess of Sequential Data Format
Figure 3-4: The Components of Feedback Engine
Figure 4-1: DSS Framework of Adopting Data Mining Model in CRM
Figure 4-2: Architecture for Data Mining Churn Model Construction
Figure 4-3: Timeline of adopting churn model in CRM
Figure 4-4: The Architecture of Feedback Engine on Churn Model
Figure 4-5: Concept Map of the design of Feedback Engine on Churn Model
Figure 4-6: Decision Tree based Conclusions
Figure 4-7: Experimental Architecture
Figure 4-8: Timeline of Experiments
Figure 4-9: Lift Chart of Data Mining Model, (M6)
Figure 4-10: Lift Chart of Data Mining Model, (M7)
Figure 4-11: Lift Chart of Data Mining Model, (M8)
Figure 4-12: Design Comparison Groups of M6
Figure 5-1: Decision Support System on Constructing Alert Classification Models
Figure 5-2: An Attack Tool Being Run Against Three Targets
Figure 5-3: Meta Rules of Classification Rule Classes for On-Line Monitoring
Figure 5-4: Three Types of Alert Behavior Classification Rule Classes
Figure 5-5: The Procedure of Normal Behavior Classification Rule Class Construction
Figure 5-6: The Procedure of Suspicious/Intrusion Classification Rule Class Construction
Figure 5-7: The Procedure of Identifying Intrusion Rule Classification
Figure 5-8: System Prototype in Experiments
Figure 5-9: Alert Reduction Rate of Normal Behavior Classification Model
Figure 5-10: Observations of Percentages of Different Suspicious Flags

List of Tables
Table 2-1: Data Mining Technologies
Table 2-2: Confusion Matrix
Table 2-3: Performance Measures from Confusion Matrix
Table 3-1: Data Formats and Objectives
Table 3-2: The Data Mining Technologies and Data Format
Table 4-1: Example of Patterns in DB
Table 4-2: ANOVA Test of Control & Test Group by Attribute "Sex"
Table 4-3: Results of Evaluation Scales

List of Algorithms
Algorithm 5-1: Normal Behavior Classification Rule Class Construction Algorithm
Algorithm 5-2: The Suspicious/Intrusion Behavior Classification Rule Class Construction Algorithm |
Full-text access rights |