§ Thesis Bibliographic Record
  
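System ID U0002-1607201504231000
DOI 10.6846/TKU.2015.00441
Title (Chinese) 以決策樹結合領域導向方法挖掘不可預期模式
Title (English) Mining Unexpected Patterns Using Decision Tree with Domain Driven Approach
Title (third language)
Institution Tamkang University (淡江大學)
Department (Chinese) 資訊工程學系博士班
Department (English) Department of Computer Science and Information Engineering
Foreign degree: university
Foreign degree: college
Foreign degree: graduate institute
Academic year 103 (ROC calendar)
Semester 2
Publication year 104 (ROC calendar; 2015)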
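Author (Chinese) 詹千慧
Author (English) Chien-Hui Chan
Student ID 899410012
Degree Doctoral
Language Traditional Chinese
Second language
Oral defense date 2015-07-01
Pages 76
Defense committee Advisor - 蔣璿東
Member - 王鄭慈
Member - 王亦凡
Member - 葛煥昭
Member - 許輝煌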
Keywords (Chinese) 不可預期模式
領域導向資料探勘
決策樹
治療比較
Keywords (English) Unexpected pattern
Domain-driven data mining
Decision tree
Treatment comparison
Keywords (third language)
Subject classification
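Abstract (translated from the Chinese)
Unexpected patterns are interesting because they contradict, or fall outside, what people already know. They may therefore offer researchers a different perspective and suggest content and directions for future research. This study proposes an unexpected pattern mining model that finds patterns contradicting the prior knowledge of domain experts. The traditional data mining process emphasizes data-centered pattern mining: factors such as the environment and human experience are often filtered out or heavily simplified, and the needs of individual users and domain-related knowledge receive little consideration. This study analyzes follow-up data from transvaginal ultrasound-guided aspiration. Because the environmental factors in clinical research are relatively complex, it is important to develop a model that interacts with the user and brings prior domain knowledge, domain constraints, and expert knowledge into the mining process. In addition, medical data often contain many numeric variables, and decision trees can handle numeric and categorical data together. The proposed model therefore combines decision trees with the closed-loop, in-depth mining concepts of domain-driven data mining to compare the cure rates of different treatments and to mine unexpected patterns.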
Abstract (English)
Unexpected patterns are interesting because they contradict prior knowledge or run counter to expectation, so they may offer researchers a different perspective for future research. In this study, we propose an unexpected pattern mining model that finds patterns contradicting the prior knowledge of domain users. Traditional data mining emphasizes data-centered mining for interesting patterns: environmental factors are usually filtered out or simplified during the mining process, and individual user requirements and domain-related knowledge receive little consideration. We conduct our analysis on retrospective data from transvaginal ultrasound-guided aspirations. Because clinical studies are conducted in complex environments, we believe it is important to develop an interactive mining model that incorporates prior domain knowledge, domain constraints, and expert knowledge. Medical data also usually contain many continuous variables, and decision tree algorithms can handle continuous and categorical variables at the same time. The proposed model therefore uses decision trees to compare the recovery rates of two different treatments. Applying the concepts of domain-driven data mining, we repeatedly use decision trees in a closed-loop, in-depth mining process to find unexpected and interesting patterns.
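The idea of comparing two treatments within subgroups and flagging results that contradict prior knowledge can be illustrated with a minimal sketch. This is not the thesis's algorithm: it assumes the subgroups have already been obtained (for example, as decision tree leaves), and the names `Record`, `cure_rate`, `unexpected_subgroups`, the `size` feature, the `min_n` and `margin` thresholds, and all toy values are hypothetical and invented for illustration only.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class Record:
    features: Dict[str, float]  # hypothetical numeric attributes of a case
    treatment: str              # "A" or "B"
    cured: bool


def cure_rate(records: List[Record], treatment: str) -> Tuple[Optional[float], int]:
    """Return (cure rate, group size) for one treatment; rate is None if empty."""
    group = [r for r in records if r.treatment == treatment]
    if not group:
        return None, 0
    return sum(r.cured for r in group) / len(group), len(group)


def unexpected_subgroups(
    records: List[Record],
    subgroups: Dict[str, Callable[[Record], bool]],
    expected_better: str,
    other: str,
    min_n: int = 3,        # assumed minimum cases per treatment in a subgroup
    margin: float = 0.2,   # assumed minimum gap in cure rate to count as a contradiction
) -> List[Tuple[str, float, float]]:
    """Flag subgroups where `other` beats the treatment the expert expects to be better."""
    flagged = []
    for name, predicate in subgroups.items():
        members = [r for r in records if predicate(r)]
        rate_exp, n_exp = cure_rate(members, expected_better)
        rate_oth, n_oth = cure_rate(members, other)
        if n_exp < min_n or n_oth < min_n:
            continue  # too few cases to compare the two treatments
        if rate_oth - rate_exp >= margin:
            flagged.append((name, rate_exp, rate_oth))
    return flagged


def make(size: float, treatment: str, outcomes: List[bool]) -> List[Record]:
    """Build toy records that share one feature value and one treatment."""
    return [Record({"size": size}, treatment, cured) for cured in outcomes]


# Toy data, invented for illustration; not taken from the thesis.
toy = (
    make(3.0, "A", [True, True, True, False])
    + make(3.0, "B", [True, False, False, False])
    + make(7.0, "A", [True, False, False, False])
    + make(7.0, "B", [True, True, True, False])
)

# Subgroup conditions as they might come from the leaves of a decision tree.
subgroups = {
    "size < 5": lambda r: r.features["size"] < 5,
    "size >= 5": lambda r: r.features["size"] >= 5,
}

# Assumed prior knowledge: treatment A cures at least as often as treatment B.
flagged = unexpected_subgroups(toy, subgroups, expected_better="A", other="B")
print(flagged)
```

In a closed-loop setting, a domain expert would review each flagged subgroup, and the records matching its condition could be selected and mined again to look for finer-grained patterns.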
Abstract (third language)
Table of Contents
Contents
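Chapter 1	Introduction
1.1	Research Background and Motivation
1.2	Research Objectives
Chapter 2	Related Literature and Research
2.1	Domain-Driven Data Mining
2.2	Interestingness Measures
2.3	Decision Tree Algorithms
2.4	Endometriosis and Its Treatment
Chapter 3	Unexpected Knowledge Mining Model
3.1	System Architecture for Unexpected Knowledge Mining
3.2	Unexpected Nodes and Interestingness Measures
3.3	Unexpected Pattern Detection Algorithm
Chapter 4	Experiments and Discussion
4.1	Experimental Materials
4.2	Experiment 1: Comparing Cure Rates by Duration of Ethanol Injection
4.3	Experiment 2: Comparing Cure Rates With and Without Postoperative Hormone Therapy
Chapter 5	Conclusion
References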
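List of Figures
Figure 1–1  Decision tree for treatment method
Figure 1–2  Decision tree for the outcome of treatment A
Figure 1–3  Decision tree for the outcome of treatment B
Figure 2–1  Knowledge discovery in databases process
Figure 2–2  Domain-driven data mining process
Figure 2–3  Interestingness measures in the data mining process
Figure 2–4  Classification of interestingness measures
Figure 2–5  Decision tree for medication use
Figure 2–6  Basic algorithm for constructing a decision tree
Figure 2–7  Binary splitting in a decision tree
Figure 2–8  Menstrual cycle and endometrial changes
Figure 3–1  System architecture for unexpected knowledge mining
Figure 4–1  Initial decision tree comparing ethanol injection durations
Figure 4–2  Decision tree built from data selected by the condition of node 2 in Figure 4–1
Figure 4–3  Decision tree built from data selected by the condition of node 1 in Figure 4–2
Figure 4–4  Decision tree built from data selected by the condition of node 2 in Figure 4–2
Figure 4–5  Decision tree comparing treatment with and without postoperative hormone therapy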
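List of Tables
Table 1–1  Endometriosis treatment records
Table 2–1  Patient data and medication records
Table 2–2  Comparison of decision tree algorithms
Table 3–1  Output field layout
Table 4–1  Fields used in the decision tree analysis
Table 4–2  Interestingness measures of the potentially unexpected rules in Figure 4–1
Table 4–3  Interestingness measures of the potentially unexpected rules in Figure 4–4
Table 4–4  Interestingness measures of the potentially unexpected rules in Figure 4–5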
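Full-Text Access Rights
On campus
Print copy available on campus immediately
Author consents to on-campus release of the electronic full text
Electronic copy available on campus immediately
Off campus
Authorization granted
Electronic copy available off campus immediately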
