Tamkang University Chueh Sheng Memorial Library (TKU Library)


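System ID: U0002-1607201315182300
Title (Chinese): 應用領域導向方法探勘電信與醫療資料
Title (English): Mining Telecom and Medical Data by Domain Driven Approaches
University: Tamkang University
Department (Chinese): 資訊工程學系博士班
Department (English): Department of Computer Science and Information Engineering
Academic year: 101 (ROC calendar)
Semester: 2
Publication year: 102 (ROC calendar; 2013)
Author (Chinese name): 李卓銘
Author (English name): Cho-Ming Lee
Student ID: 896410049
Degree: Doctoral
Language: Chinese
Oral defense date: 2013-06-28
Pages: 80
Committee: Advisor - 蔣璿東
Member - 謝楠楨
Member - 許輝煌
Member - 葛煥昭
Member - 王亦凡
Member - 蔣璿東
Keywords (Chinese): 領域導向資料探勘; 封閉式迴圈探勘; 組合探勘; 可行動知識挖掘
Keywords (English): Domain Driven Data Mining; Closed-Loop Mining; Combined Mining; Actionable Knowledge Mining
Subject classification: Applied Sciences, Information Engineering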
Abstract: This study applies domain-driven data mining techniques to two industry case studies, one in health care and one in telecommunications. For the medical data, it relies mainly on closed-loop mining: findings from the academic literature are discussed with, and judged by, physicians with clinical experience, and the confirmed results become knowledge that helps physicians decide on treatment according to each patient's condition. For the telecommunications data, it uses closed-loop mining together with an actionable knowledge discovery (AKD) framework based on combined mining, which narrows the gap between academic research and business practice and brings greater benefit to commercial organizations.
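Table of contents
Chapter 1 Introduction 1
1.1 Research Motivation and Objectives 1
1.2 Research Structure 4
Chapter 2 Literature Review and Related Work 5
2.1 Domain-Driven Data Mining (D3M) 5
2.1.1 Actionable Knowledge Discovery and Delivery (AKD) 7
2.2 Combined Mining 10
2.3 Association Rules 14
2.3.1 The Apriori Algorithm 18
2.4 Decision Trees 21
2.4.1 Decision Tree Algorithms 21
2.4.2 Decision Tree Algorithm Procedure 23
2.4.3 Classification and Regression Trees 26
2.5 Clustering 30
2.5.1 Data Types 30
2.5.2 Clustering Algorithms 32
Chapter 3 Case Study: Medical Data 38
3.1 Background 38
3.2 Research Method and Procedure 41
3.3 Data Preparation and Description 46
3.4 Experimental Results and Discussion 48
Chapter 4 Case Study: Telecom Data 56
4.1 Background 56
4.2 Research Method and Procedure 57
4.3 Experimental Results and Discussion 59
Chapter 5 Conclusions and Future Work 74
References 77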

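List of figures
Figure 2-1 Main concepts of domain-driven data mining 7
Figure 2-2 Combined Mining 12
Figure 2-3 Closed-loop multi-method 14
Figure 2-4 Generating candidate itemsets 19
Figure 2-5 Counting candidate itemsets 20
Figure 2-6 The Apriori algorithm process 21
Figure 2-7 Decision tree predicting whether a customer will buy a computer 22
Figure 2-8 Basic algorithm for constructing a decision tree 24
Figure 2-9 Three methods of measuring the distance between clusters 37
Figure 3-1 Domain-driven mining system architecture 45
Figure 3-2 Decision tree for endometriosis patient data 49
Figure 3-3 Decision tree for endometriosis patient data (7 minutes or longer) 50
Figure 3-4 Decision tree for patient data under the Node 6 rule 54
Figure 4-1 Combined Mining-based Customer Payment Behavior Prediction Framework (CM-CoP) 58
Figure 4-2 Cluster diagram 67
Figure 4-3 Cluster [6]6 69
Figure 4-4 Cluster [4]3 71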


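List of tables
Table 2-1 Basic attributes and biological classification of organisms 28
Table 3-1 Description of the analyzed data fields 46
Table 3-2 Patient distribution statistics and t-tests for each decision tree node 51
Table 3-3 Patient distribution statistics for Nodes 1, 5, and 6 of the decision tree 52
Table 3-4 Patient statistical analysis for each decision tree node 54
Table 4-1 Original CDR payment status data format 60
Table 4-2 New payment statuses 62
Table 4-3 Relevant key fields 65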
References [1] L. Cao, P.S. Yu, C. Zhang, Y. Zhao, Challenges and Trends, in: Domain Driven Data Mining, Springer US, 2010, pp. 1-26.
[2] W.J. Frawley, G. Piatetsky-Shapiro, C.J. Matheus, Knowledge discovery in databases: an overview, AI Magazine, 13 (1992) 57-70.
[3] G. Piatetsky-Shapiro, C.J. Matheus, The interestingness of deviations, Proceedings of KDD-94: AAAI-94 Workshop on Knowledge Discovery in Databases, Seattle, Washington, 1994, pp. 25-36.
[4] A. Silberschatz, A. Tuzhilin, On subjective measures of interestingness in knowledge discovery, Proceedings of the 1st International Conference on Knowledge Discovery and Data Mining (KDD' 95), AAAI Press, 1995, pp. 275-281.
[5] B. Liu, W. Hsu, S. Chen, Using General Impressions to Analyze Discovered Classification Rules, Proceedings of the 3rd International Conference on Knowledge Discovery and Data Mining (KDD' 97), AAAI Press, 1997, pp. 31-36.
[6] D. Gamberger, N. Lavrac, Generating Actionable Knowledge by Expert-Guided Subgroup Discovery, Proceedings of the 6th European Conference on Principles of Data Mining and Knowledge Discovery(PKDD' 2002), Springer-Verlag, 2002, pp. 163-174.
[7] L. Geng, H.J. Hamilton, Interestingness measures for data mining: A survey, ACM Comput. Surv., 38 (2006) 1-32.
[8] Z. Zhu, J. Gu, L. Zhang, W. Song, R. Gao, Research on Domain-Driven Actionable Knowledge Discovery, Proceedings of the 20th International Conference on Multiple Criteria Decision Making, Springer Berlin Heidelberg, Chengdu, China, 2009, pp. 176-183.
[9] L. Cao, Actionable knowledge discovery and delivery, WIREs Data Mining Knowl Discov, 2 (2012) 149-163.
[10] L. Cao, C. Zhang, P. Yu, Y. Zhao, Challenges and Trends, in: Domain Driven Data Mining, Springer US, 2010, pp. 1-25.
[11] Z.W. Ras, A. Wieczorkowska, Action-Rules: How to Increase Profit of a Company, Proceedings of the 4th European Conference on Principles of Data Mining and Knowledge Discovery (PKDD' 2000), Springer-Verlag, 2000, pp. 587-592.
[12] S. Im, Z. Raś, H. Wasyluk, Action rule discovery from incomplete data, Knowl. Inf. Syst., 25 (2010) 21-33.
[13] L. Cao, C. Zhang, Domain-Driven actionable knowledge discovery in the real world, Proceedings of the 10th Pacific-Asia Conference on Knowledge Discovery and Data Mining(PAKDD' 2006), Springer-Verlag, Singapore, 2006, pp. 821-830.
[14] A.A. Freitas, On rule interestingness measures, Knowledge-Based Syst., 12 (1999) 309-315.
[15] B. Liu, W. Hsu, L.-F. Mun, H.-Y. Lee, Finding interesting patterns using user expectations, IEEE Trans Knowl Data Eng, 11 (1999) 817-832.
[16] B. Padmanabhan, A. Tuzhilin, Unexpectedness as a measure of interestingness in knowledge discovery, Decis Support Syst, 27 (1999) 303-318.
[17] K. McGarry, A survey of interestingness measures for knowledge discovery, Knowl. Eng. Rev., 20 (2005) 39-61.
[18] L.-S. Tsay, Z.W. Raś, Action rules discovery: system DEAR2, method and experiments, J Exp Theor Artif Intell, 17 (2005) 119-128.
[19] Y. Yao, Y. Chen, X. Yang, A Measurement-Theoretic Foundation of Rule Interestingness Evaluation, in: T. Young Lin, S. Ohsuga, C.-J. Liau, X. Hu (Eds.) Foundations and Novel Approaches in Data Mining, Springer Berlin / Heidelberg, 2006, pp. 41-59.
[20] L. Cao, D. Luo, C. Zhang, Knowledge actionability: satisfying technical and business interestingness, Int. J. Business Intelligence and Data Mining, 2 (2007) 496-514.
[21] K.-N. Kontonasios, E. Spyropoulou, T. De Bie, Knowledge discovery interestingness measures based on unexpectedness, Wiley Interdiscip Rev Data Min Knowl Discov, 2 (2012) 386-399.
[22] I.N.M. Shaharanee, F. Hadzic, T.S. Dillon, Interestingness measures for association rules based on statistical validity, Knowledge-Based Systems, 24 (2011) 386-392.
[23] L. Cao, C. Zhang, Q. Yang, D. Bell, M. Vlachos, B. Taneri, E. Keogh, P.S. Yu, N. Zhong, M.Z. Ashrafi, D. Taniar, E. Dubossarsky, W. Graco, Domain-Driven, Actionable Knowledge Discovery, IEEE Intelligent Systems, 22 (2007) 78-88, c73.
[24] H. Zhang, Y. Zhao, L. Cao, C. Zhang, Combined Association Rule Mining, Proceedings of the 12th Pacific-Asia Conference on Knowledge Discovery and Data Mining(PAKDD' 2008), Springer Berlin / Heidelberg, Osaka, Japan, 2008, pp. 1069-1074.
[25] L. Cao, P.S. Yu, C. Zhang, Y. Zhao, Combined Mining, in: Domain Driven Data Mining, Springer US, 2010, pp. 113-143.
[26] L. Cao, H. Zhang, Y. Zhao, D. Luo, C. Zhang, Combined mining: discovering informative knowledge in complex data, IEEE Trans Syst Man Cybern B Cybern, 41 (2011) 699-712.
[27] Y. Zhao, H. Zhang, L. Cao, C. Zhang, H. Bohlscheid, Combined Pattern Mining: From Learned Rules to Actionable Knowledge, Proceedings of the 21st Australasian Joint Conference on Artificial Intelligence: Advances in Artificial Intelligence, Springer-Verlag, Auckland, New Zealand, 2008, pp. 393-403.
[28] J. Han, M. Kamber, Data mining : concepts and techniques, Morgan Kaufmann Publishers, San Francisco, 2001.
[29] M.E. Maron, Automatic Indexing - an Experimental Inquiry, J Acm, 8 (1961) 404-417.
[30] K.L. Kwok, The use of title and cited titles as document representation for automatic classification, Information Processing & Management, 11 (1975) 201-206.
[31] K.A. Hamill, A. Zamora, The Use of Titles for Automatic Document Classification, J Am Soc Inform Sci, 31 (1980) 396-402.
[32] R.R. Larson, Experiments in Automatic Library-of-Congress Classification, J Am Soc Inform Sci, 43 (1992) 130-148.
[33] K.R. Muller, A.J. Smola, G. Ratsch, B. Scholkopf, J. Kohlmorgen, V. Vapnik, Predicting time series with support vector machines, 1327 (1997) 999-1004.
[34] V. Vapnik, S.E. Golowich, A. Smola, Support Vector Method for Function Approximation, Regression Estimation, and Signal Processing, in: M.I. Jordan, M.C. Mozer, T. Petsche (Eds.) Neural information processing systems, MIT Press, 1997, pp. 281-287.
[35] R. Bayardo, R. Agrawal, Mining the Most Interesting Rules, in: S. Chaudhuri, D. Madigan (Eds.) ACM SIGKDD international conference on knowledge discovery and data mining; KDD, Association for Computing Machinery, 1999, pp. 145-154.
[36] J. Quinlan, Discovering rules from large collections of examples: a case study, Edinburgh University Press, Edinburgh, 1979.
[37] J.R. Quinlan, Learning efficient classification procedures and their application to chess end games, Machine Learning. An Artificial Intelligence Approach, 1983, pp. 463-482.
[38] J.R. Quinlan, Induction of decision trees, Machine Learning, 1 (1986) 81-106.
[39] 丁一賢, 陳牧言, 資料探勘 [Data Mining], 滄海圖書出版股份有限公司, Taichung, Taiwan, 2005.
[40] L. Breiman, Classification and regression trees, Wadsworth International Group, Belmont, Calif., 1984.
[41] J.B. MacQueen, Some Methods for Classification and Analysis of MultiVariate Observations, in: L.M.L. Cam, J. Neyman (Eds.) Proc. of the fifth Berkeley Symposium on Mathematical Statistics and Probability, University of California Press, 1967, pp. 281-297.
[42] A.K. Jain, R.C. Dubes, Algorithms for clustering data, Prentice-Hall, Inc., 1988.
[43] L. Kaufman, P.J. Rousseeuw, Partitioning Around Medoids (Program PAM), in: Finding Groups in Data, John Wiley & Sons, Inc., 2008, pp. 68-125.
[44] C.-L. Hsieh, C.-S. Shiau, L.-M. Lo, T.-T. Hsieh, M.-Y. Chang, Effectiveness of ultrasound-guided aspiration and sclerotherapy with 95% ethanol for treatment of recurrent ovarian endometriomas, Fertil. Steril., 91 (2009) 2709-2713.
[45] J. Noma, N. Yoshida, Efficacy of ethanol sclerotherapy for ovarian endometriomas, Int. J. Gynecol. Obstet., 72 (2001) 35-39.
[46] A. Ikuta, Y. Tanaka, T. Mizokami, A. Tsutsumi, M. Sato, M. Tanaka, H. Kajihara, H. Kanzaki, Management of transvaginal ultrasound-guided absolute ethanol sclerotherapy for ovarian endometriotic cysts, J Med Ultrason (2001) 33 (2006) 99-103.
[47] W. Zhu, Z. Tan, Z. Fu, X. Li, X. Chen, Y. Zhou, Repeat transvaginal ultrasound-guided aspiration of ovarian endometrioma in infertile women with endometriosis, American journal of obstetrics and gynecology, 204 (2011) 61.e61-61.e66.
[48] H. Kafali, S. Yurtseven, F. Atmaca, I. Ozardali, Management of non-neoplastic ovarian cysts with sclerotherapy, International journal of gynaecology and obstetrics: the official organ of the International Federation of Gynaecology and Obstetrics, 81 (2003) 41-45.
[49] M. Taniguchi, M. Haft, J. Hollmen, V. Tresp, Fraud detection in communication networks using neural and probabilistic methods, Acoustics, Speech and Signal Processing, 1998. Proceedings of the 1998 IEEE International Conference on, 1998, pp. 1241-1244 vol.1242.
[50] T. Fawcett, F. Provost, Adaptive fraud detection, Data Min Knowl Disc, 1 (1997) 291-316.
[51] R. Wheeler, S. Aitken, Multiple algorithms for fraud detection, Knowledge-Based Systems, 13 (2000) 93-99.
[52] M.H. Cahill, D. Lambert, José C. Pinheiro, D.X. Sun, Detecting fraud in the real world, in: A. James, M.P. Panos, G.C.R. Mauricio (Eds.) Handbook of massive data sets, Kluwer Academic Publishers, 2002, pp. 911-929.
[53] L. Yan, R.H. Wolniewicz, R. Dodier, Predicting customer behavior in telecommunications, Ieee Intell Syst, 19 (2004) 50-58.
[54] S.-Y. Hung, D.C. Yen, H.-Y. Wang, Applying data mining to telecom churn management, Expert Systems with Applications, 31 (2006) 515-524.
[55] C. Schommer, Discovering Fraud Behaviour in Call Detailed Records, 2010 Grande Region Security and Reliability Day, Saarbrucken, Germany, 2010.
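Access rights
  • Print copy: licensed free of charge for reproduction by in-library readers for academic purposes; publicly available from 2018-07-29.
  • Electronic full text: licensed for browsing and printing; publicly available from 2018-07-29.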

