Thesis Bibliographic Record
  
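System ID: U0002-1906200714345200
DOI: 10.6846/TKU.2007.00563
Title (Chinese): 決策樹中移除不相關值問題在醫療研究的運用 (Applying the Removal of Irrelevant Values in Decision Trees to Medical Research)
Title (English): THE IRRELEVANT VALUES PROBLEM IN THE DECISION TREE FOR MEDICAL EXAMINATIONS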
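University: Tamkang University
Department (Chinese): 資訊工程學系碩士班 (Master's Program, Department of Computer Science and Information Engineering)
Department (English): Department of Computer Science and Information Engineering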
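Academic year: 95 (ROC calendar; 2006-2007)
Semester: 2
Publication year: 96 (ROC calendar; 2007)
Student (Chinese name): 黃南競
Student (English name): Nan-Ching Huang
Student ID: 694190124
Degree: Master's
Language: Traditional Chinese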
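Oral defense date: 2007-06-14
Number of pages: 103
Oral defense committee: Advisor: 葛煥昭 (087173@mail.tku.edu.tw)
Member: 王鄭慈
Member: 蔣定安
Keywords (Chinese): 決策樹 (decision tree)
不相關值問題 (irrelevant values problem)
移除分支問題 (missing branches problem)
醫學檢驗 (medical examination)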
Keywords (English): Decision tree
classification
the irrelevant values problem
the missing branches problem
medical examination
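Chinese Abstract
With the widespread use of medical information systems, the volume of data in medical databases has grown substantially. If we can analyze existing medical records to find how various symptoms correlate with a particular disease, and from this induce the necessary relationships among them, we can assist physicians in diagnosis and thereby improve the quality of medical care.
As technology has advanced, medical records that were once handwritten are now stored on computers. In recent years, advances in hardware and software have further extended records that were originally text-only to include multimedia data such as images and digital signals, turning them into multimedia medical databases. Whether the information is the stored medical record itself or the content of medical images and physiological signals, it helps physicians understand their patients more effectively. This is of considerable value to both clinical and basic medical research and can further improve the quality of care that patients receive. For these reasons, the United States, European countries, Japan, and other developed countries have all carried out extensive research on integrated medical information systems, and most medical institutions in Taiwan and abroad have built dedicated database management systems to speed the flow of information among patients, physicians, and hospitals.
Among data mining techniques, this thesis focuses on the irrelevant values problem in decision trees. When a decision tree is represented as a set of rules, the antecedents of individual rules may contain irrelevant conditions. When these rules are applied to medical examinations, the irrelevant conditions can impose unnecessary burdens on patients and society. To avoid generating rules with irrelevant conditions, we propose a new algorithm that, based on information in the decision tree, removes the irrelevant conditions of rules while the tree is being converted into rules. Our algorithm handles not only discrete values but also continuous values.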
English Abstract
The decision tree is one of the key data mining techniques and has been used in medical applications. A decision tree is built by selecting the best test attribute as the root of the tree. The same procedure is then applied to each branch to induce the remaining levels of the tree until all examples in a leaf belong to the same class. However, because the algorithm creates a branch for each value of the test attribute that appears in the training data, without considering whether that value is relevant to the classification, the resulting tree may be over-specialized. Without loss of generality, we consider only ID3-like algorithms in this thesis.
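The top-down induction described above can be sketched in Python. This is a minimal illustration, not the thesis's code: it assumes discrete attributes, uses Quinlan's information gain as the selection measure, and represents a tree with a hypothetical nested tuple, where a leaf is a class label and an internal node is `(attribute, {value: subtree})`. As the abstract notes, it creates a branch only for the values that occur in the current training subset.

```python
from collections import Counter
from math import log2

# Hypothetical tree representation (an assumption for illustration):
# a leaf is a class label (str); an internal node is (attribute, {value: subtree}).


def entropy(labels):
    """Shannon entropy, in bits, of a non-empty list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def info_gain(rows, labels, attr):
    """ID3 information gain of splitting (rows, labels) on attribute attr."""
    n = len(rows)
    remainder = 0.0
    for value in {row[attr] for row in rows}:
        subset = [lab for row, lab in zip(rows, labels) if row[attr] == value]
        remainder += len(subset) / n * entropy(subset)
    return entropy(labels) - remainder


def id3(rows, labels, attrs):
    """Grow a tree top-down until every leaf is pure or attributes run out."""
    if len(set(labels)) == 1:
        return labels[0]
    if not attrs:
        # No test left: fall back to the majority class.
        return Counter(labels).most_common(1)[0][0]
    best = max(attrs, key=lambda a: info_gain(rows, labels, a))
    branches = {}
    # One branch per value that occurs in THIS subset; values never seen
    # here get no branch at all (the "missing branches" situation).
    for value in sorted({row[best] for row in rows}):
        idx = [i for i, row in enumerate(rows) if row[best] == value]
        branches[value] = id3([rows[i] for i in idx],
                              [labels[i] for i in idx],
                              [a for a in attrs if a != best])
    return (best, branches)


if __name__ == "__main__":
    rows = [{"A": "a1", "B": "b1"}, {"A": "a1", "B": "b2"},
            {"A": "a2", "B": "b1"}, {"A": "a2", "B": "b2"}]
    labels = ["yes", "no", "yes", "no"]
    print(id3(rows, labels, ["A", "B"]))  # ('B', {'b1': 'yes', 'b2': 'no'})
```

With this made-up data, attribute `B` has an information gain of 1 bit and `A` has 0, so `B` becomes the root and `A` is never tested. The attribute names and examples are hypothetical.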
As pointed out by J. Cheng, the irrelevant values problem and the missing branches problem are two causes of over-specialization in decision trees. The missing branches problem arises because the reduced subsets at some non-leaf nodes do not necessarily contain examples of every possible value of the branching attribute; consequently, the decision tree may fail to classify some instances. The irrelevant values problem arises because some values of the branching attribute may not be relevant to the classification, so the rules derived from the tree may contain irrelevant conditions that demand extra information. In a medical setting, extra information means extra examinations for the patient, and extra examinations add expense and burden for the patient and society. Therefore, when decision trees are used in medical applications, irrelevant conditions must be dealt with to save medical resources and avoid unnecessary examinations.
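A small classifier over a hypothetical nested-tuple tree (leaf = class label, node = `(attribute, {value: subtree})`) shows how the missing branches problem surfaces at prediction time. This is an illustrative sketch under that assumed representation, not the thesis's data structure; returning `None` is one arbitrary way to signal that no branch matches.

```python
from typing import Dict, Optional, Tuple, Union

# Hypothetical representation (assumption): leaf = class label,
# internal node = (attribute, {value: subtree}).
Tree = Union[str, Tuple[str, Dict[str, "Tree"]]]


def classify(tree: Tree, instance: Dict[str, str]) -> Optional[str]:
    """Follow the instance down the tree; return None if no branch matches."""
    while not isinstance(tree, str):
        attr, branches = tree
        value = instance.get(attr)
        if value not in branches:
            # Missing branch: this value never reached this node during training.
            return None
        tree = branches[value]
    return tree


if __name__ == "__main__":
    # Toy tree: in the subset under A = a2, only B = b1 was ever observed.
    tree = ("A", {"a1": "yes", "a2": ("B", {"b1": "no"})})
    print(classify(tree, {"A": "a2", "B": "b1"}))  # no
    print(classify(tree, {"A": "a2", "B": "b2"}))  # None (missing branch)
```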
When a decision tree is represented as a collection of rules, the antecedents of individual rules may contain irrelevant conditions. When these rules are applied to medical examinations, the irrelevant conditions may impose unnecessary burdens on the patient and society. To avoid generating rules with irrelevant conditions, we propose a new algorithm that, using information in the decision tree, removes irrelevant conditions from rules while the tree is being converted into rules. Our algorithm handles not only discrete values but also continuous values.
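The following sketch shows one simple way to convert a tree into rules and drop irrelevant conditions using only the tree itself. It is a stand-in for illustration, not the algorithm proposed in the thesis, whose details are not given in this abstract. Assumptions: discrete attributes only, a hypothetical nested-tuple tree (leaf = class label, node = `(attribute, {value: subtree})`), and a greedy test that removes a condition whenever every leaf reachable by the shortened rule predicts the same class as the original rule. Regions left uncovered by missing branches are ignored, and continuous values are not handled.

```python
from typing import Dict, List, Optional, Tuple, Union

# Hypothetical representation (assumption): leaf = class label,
# internal node = (attribute, {value: subtree}).
Tree = Union[str, Tuple[str, Dict[str, "Tree"]]]
Rule = Tuple[Dict[str, str], str]  # (conditions: attribute -> value, class)


def tree_to_rules(tree: Tree, path: Optional[Dict[str, str]] = None) -> List[Rule]:
    """Return one rule per root-to-leaf path."""
    path = {} if path is None else path
    if isinstance(tree, str):
        return [(dict(path), tree)]
    attr, branches = tree
    rules: List[Rule] = []
    for value, subtree in branches.items():
        rules.extend(tree_to_rules(subtree, {**path, attr: value}))
    return rules


def overlaps(conds: Dict[str, str], path: Dict[str, str]) -> bool:
    """True if some instance can satisfy both condition sets (no conflicting values)."""
    return all(path.get(attr, value) == value for attr, value in conds.items())


def simplify_rules(tree: Tree) -> List[Rule]:
    """Greedily drop conditions whose removal never changes the tree's verdict."""
    leaves = tree_to_rules(tree)
    result: List[Rule] = []
    for conds, label in leaves:
        conds = dict(conds)
        for attr in list(conds):
            trial = {a: v for a, v in conds.items() if a != attr}
            # Accept the shorter rule only if every leaf it can reach agrees with it.
            if all(cls == label for path, cls in leaves if overlaps(trial, path)):
                conds = trial
        if (conds, label) not in result:  # merge rules that became identical
            result.append((conds, label))
    return result


if __name__ == "__main__":
    tree = ("A", {"a1": "yes", "a2": ("B", {"b1": "yes", "b2": "no"})})
    for conds, label in simplify_rules(tree):
        print(conds, "->", label)
    # {'A': 'a1'} -> yes
    # {'B': 'b1'} -> yes
    # {'A': 'a2', 'B': 'b2'} -> no
```

On the toy tree, the rule A = a2 AND B = b1 -> yes shrinks to B = b1 -> yes, so under this sketch a case with B = b1 would not need test A. Because the removal is greedy, the result can depend on the order in which conditions are tried; the attribute names are hypothetical.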
Table of Contents
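Chapter 1  Introduction	1
     1.1 Preface	1
     1.2 Research Motivation	2
     1.3 Research Objectives	3
     1.4 Thesis Organization	4
Chapter 2  Literature Review	5
     2.1 Overview of the Thyroid	5
     2.2 Clinical Manifestations and Treatment of Thyroid Cancer	8
     2.3 Data Mining and Related Work	16
Chapter 3  Decision Tree Algorithms	27
     3.1 Survey of Algorithms	28
     3.2 Tree-Based Induction and Rule-Based Induction	37
     3.3 Incremental or Non-Incremental Induction	39
     3.4 Top-Down and Bottom-Up Induction	41
     3.5 Measures for Selecting the Best Feature	42
Chapter 4  Decision Tree Operations	50
     4.1 The Irrelevant Values Problem in ID3	50
     4.2 Algorithm for Discrete Values	57
     4.3 Algorithm for Identifying Irrelevant Continuous Values in a Decision Tree	68
     4.4 Proposed Algorithm for Medical Examinations	76
Chapter 5  Conclusions and Future Work	78
     5.1 Conclusions	78
     5.2 Future Work	79
References	80
     English-Language Journal Papers	80
     Chinese-Language Journal Papers	87
     English Manuscript	90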
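List of Figures
Figure 2-1  Bayesian network topology for diagnosing breast cancer	20
Figure 4-1  A decision tree containing irrelevant values	51
Figure 4-2  Decision tree generated by ID3	56
Figure 4-3  Decision tree generated by GID3	56
Figure 4-4  A decision tree containing irrelevant values	61
Figure 4-5  An efficient algorithm for solving the irrelevant values problem	66
Figure 4-6  A decision tree containing irrelevant values	67
Figure 4-7  A decision tree containing irrelevant values	68
Figure 4-8  Algorithm for converting a decision tree into a set of rules without irrelevant conditions	71
Figure 4-9  ID3 decision tree for diagnosing hypothyroidism	72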
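List of Tables
Table 2-1  Current research in Taiwan	25
Table 3-1  Classification of various algorithms	27
Table 3-2  General form of a contingency table	44
Table 3-3  Contingency tables for features A, B, and C	44
Table 4-1  Rules converted from the decision tree in Figure 4-1	51
Table 4-2  Rules of Table 4-1 after removing irrelevant values	52
Table 4-3  Data table for Example 2	54
Table 4-4  Some characteristics of the investigation types used in diagnosing hypothyroidism	73
Table 4-5  Recursive rules of the decision tree	74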
References
English-Language Journal Papers
1. Ding-An Chiang, Wei Chen, Yi-Fan Wang, Chen-Fang Hsu, "The Irrelevant Values Problem in the ID3 Tree," Computers and Artificial Intelligence, Vol. 19, 2000, pp. 169-182.
2. Adjeroh and K. C. Nwosu, "Multimedia database management: requirements and issues," IEEE Multimedia, July-September 1997, pp. 24-33.
3. Demšar, J., Zupan, B., Aoki, N., "Feature Mining and Predictive Model Construction from Severe Trauma Patient Data," Journal of Medical Informatics, Vol. 63, pp. 41-50, 2001.
4. Roskar, P. Abrams, I. Bratko, I. Kononenko, and A. Varsek, "CUDS: An expert system for the diagnostics of lower urinary tract disorders," Journal of Biomedical Measurements, Informatics and Control, Vol. 1, No. 4, pp. 201-204, 1986.
5. Ragavan, L. Rendell, M. Shaw, and A. Tessmer, "Lookahead feature construction for learning hard concepts," Proc. 10th Intern. Conf. on Machine Learning, pp. 252-259, June 1993.
6. Kononenko, I. Bratko, and E. Roskar, "Expert system in automatic learning of medical diagnostic rules," International School for the Synthesis of Expert's Knowledge Workshop, Bled, Slovenia, August 1984.
7. M. McDonald, S. Brossette, and S. M. Moser, "Pathology information systems: data mining leads to knowledge discovery," Pathology & Laboratory Medicine, Vol. 122, Iss. 5, pp. 409-411, 1998.
8. J. R. Quinlan, C4.5: Programs for Machine Learning, Morgan Kaufmann, 1993.
9. J. R. Quinlan, "Induction of decision trees," Machine Learning, Vol. 1, 1986, pp. 81-106.
10. J. R. Quinlan, "Learning efficient classification procedures and their application to chess end games," in Machine Learning: An Artificial Intelligence Approach, Michalski, Carbonell and Mitchell, eds., Morgan Kaufmann, 1983, pp. 463-482.
11. J. R. Quinlan, "Simplifying decision trees," Int. J. Man-Machine Studies, Vol. 27, 1987, pp. 221-234.
12. Jiawei Han, Yandong Cai, and Nick Cercone, "Knowledge discovery in databases: An attribute-oriented approach," Proceedings of the 18th VLDB Conference, Vancouver, British Columbia, Canada, 1992.
13. John Mingers, "An Empirical Comparison of Pruning Methods for Decision Tree Induction," Machine Learning, 4, pp. 227-243.
14. John Mingers, "An Empirical Comparison of Selection Measures for Decision-Tree Induction," Machine Learning, 3, pp. 319-342.
15. W. Buntine and T. Niblett, "A further comparison of splitting rules for decision-tree induction," Machine Learning, 8, pp. 75-85, 1992.
16. K. A. Horn, P. Compton, L. Lazarus, and J. R. Quinlan, "An expert system for the interpretation of thyroid assays in a clinical laboratory," The Australian Computer Journal, Vol. 17, No. 1, pp. 7-11, 1985.
17. K. E. Burn-Thornton and L. Edenbrandt, "Myocardial infarction: pinpoint the key indicators in the 12-lead ECG using data mining," Computers and Biomedical Research, Vol. 31, Iss. 4, pp. 293-303, 1998.
18. L. Breiman, J. H. Friedman, R. A. Olshen, and C. J. Stone, Classification and Regression Trees, Wadsworth, 1984.
19. Lim, T. S., Loh, W. Y., Shih, Y. S., "A Comparison of Prediction Accuracy, Complexity, and Training Time of Thirty-Three Old and New Classification Algorithms," Machine Learning Journal, Vol. 40, 2000, pp. 203-228.
20. M. Kukar, I. Kononenko, and T. Silvester, "Machine learning in prognosis of the femoral neck fracture recovery," Artificial Intelligence in Medicine, Vol. 8, pp. 431-451, 1996.
21. Makino Kazuhisa, Suda Takashi, Ono Hirotaka, Ibaraki Toshihide, "Data Analysis by Positive Decision Trees," IEICE Trans. Inf. & Syst., Vol. E82-D, No. 1, 1999, pp. 76-99.
22. Milan Zorman, Peter Kokol, "Decision tree and automatic learning in medical decision making," Faculty of Electrical Engineering and Computer Science, Smetanova 17, 2000 Maribor, Slovenia.
23. N. Lavrač, "Selected techniques for data mining in medicine," Artificial Intelligence in Medicine, Vol. 16, pp. 3-23, 1999.
24. Ordonez, C., Omiecinski, E., De Braal, L., Santana, C. A., Ezquerra, N., Taboada, J. A., Cooke, D., Krawczynska, E., Garcia, E. V., "Mining Constrained Association Rules to Predict Heart Disease," Proceedings IEEE International Conference, 2001, pp. 433-440.
25. P. E. Utgoff, "An Incremental ID3," Proceedings of the Fifth International Conference on Machine Learning, pp. 107-120.
26. P. E. Utgoff, "Incremental Induction of Decision Trees," Machine Learning, 4, pp. 161-181.
27. Phei-Lang Chang, Yu-Chuan Li, "The use of a medical database support system to improve the preoperative diagnosis of prostate cancer with lymph node metastases," Proceedings of the 1999 International Medical Informatics Conference (1999國際醫療資訊研討會論文集), p. 23.
28. Pilih, I. A., Mladenic, D., Lavrač, N., Prevec, T. S., "Using Machine Learning for Outcome Prediction of Patients with Severe Head Injury," Tenth IEEE Symposium on Computer-Based Medical Systems, 1997, pp. 200-204.
29. Po Shun Ngan, Man Leung Wong, Wai Lam, Kwong Sak Leung, and Jack C. Y. Cheng, "Medical data mining using evolutionary computation," Artificial Intelligence in Medicine, Vol. 16, pp. 73-96, 1999.
30. Puuronen, S., Tsymbal, A., Skrypnyk, I., "Advanced Local Feature Selection in Medical Diagnostics," Proceedings 13th IEEE Symposium on Computer-Based Medical Systems, 2000, pp. 25-30.
31. Ramirez, J. C. G., Cook, D. J., Peterson, L. L., Peterson, D. M., "Temporal Pattern Discovery in Course-of-Disease Data," IEEE Engineering in Medicine and Biology Magazine, Vol. 19, Issue 4, 2000, pp. 63-71.
32. S. E. Brossette, A. P. Sprague, J. M. Hardin, K. B. Waites, W. T. Jones, and S. A. Moser, "Association rules and data mining in hospital infection control and public health surveillance," Journal of the American Medical Informatics Association, Vol. 5, pp. 373-381, 1998.
33. S. Hojker, I. Kononenko, A. Jauk, V. Filder, and M. Porenta, "Expert system's development in the management of thyroid diseases," Proc. European Congress for Nuclear Medicine, Milano, Sept. 1988.
34. Suzuki Einoshin, "Hypothesis-Driven Exception-Rule Discovery from Common Data Sets," JSAI, Vol. 15, No. 5, 2000, pp. 782-789.
35. Tsumoto Shusaku, "The Common Medical Data Sets to Compare and Evaluate KDD Methods," JSAI, Vol. 15, No. 5, 2000, pp. 751-758.
36. U. M. Fayyad and K. B. Irani, "A machine learning algorithm (GID3*) for automated knowledge acquisition: improvements and extensions," General Motors Research Report CS-634, Warren, MI: GM Research Labs, 1991.
37. U. M. Fayyad and Keki B. Irani, "The attribute selection problem in decision tree generation," Proc. Tenth National Conference on Artificial Intelligence, AAAI-92, San Jose, California, 1992, pp. 104-110.
38. U. M. Fayyad, "Branching on attribute values in decision tree generation," Proc. Twelfth National Conference on Artificial Intelligence, AAAI-94, Seattle, Washington, 1994, pp. 104-110.
39. U. M. Fayyad, J. Cheng, K. B. Irani, and Z. Qian, "Improved decision trees: a generalized version of ID3," Proc. of the Fifth Int. Conf. on Machine Learning, 1988, pp. 100-108.
40. U. M. Fayyad, Keki B. Irani, "What Should Be Minimized in a Decision Tree?"
41. Walter Van de Velde, "Incremental Induction of Topologically Minimal Trees," 1990.
42. Walter Van de Velde, "IDL, or Taming the Multiplexer."
43. William H. Wolberg, W. Nick Street, O. L. Mangasarian, "Machine learning techniques to diagnose breast cancer from image-processed nuclear features of fine needle aspirates," Cancer Letters 77, pp. 163-171, 1994.
44. Xiao Hui Wang, Bin Zheng, Walter F. Good, Jill L. King, and Yuan Hsiang Chang, "Computer-assisted diagnosis of breast cancer using a data-driven Bayesian belief network," International Journal of Medical Informatics, Vol. 54, pp. 115-126, 1999.
45. Zorman, M., Gou Masuda, Kokol, P., Yamamoto, R., Stiglic, "Mining Diabetes Database with Decision Trees and Association Rules," Proceedings of the 15th IEEE Symposium on Computer-Based Medical Systems, 2002, pp. 134-139.
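Chinese-Language Journal Papers
1. 吳國禎, "Applications of Data Exploration in Medical Databases" (資料探索在醫學資料庫之應用), Master's thesis, Department of Biomedical Engineering, Chung Yuan Christian University, June 1999.
2. 李姿儀, "Data Mining of Hospital Outpatient Records: A Case Study of St. Joseph's Hospital, Huwei" (醫院門診資料探勘-以虎尾若瑟醫院為例), Master's thesis, Department of Information Management, Nanhua University, June 2000.
3. 李建明, "Applying Quantitative Association Rules to Disease Databases" (數量相關法則技術在疾病資料庫之應用), Master's thesis, Graduate Institute of Electrical Engineering, National Taiwan University, June 1999.
4. 李博智, "Building Chronic Disease Prediction Models with Data Mining" (資料探勘在慢性病預測模式之建構), Master's thesis, Graduate Institute of Information Management, Yuan Ze University, July 2002.
5. 林伊蓉, "Design and Construction of a Cross-Platform Data Mining Tool: Medical Applications" (跨平台資料探勘工具之設計與建立:應用於醫學面), Master's thesis, Institute of Public Health, National Yang-Ming University, June 1998.
6. 張文忠, "Using National Health Insurance and Environmental Data to Study the Relationship between Environment and Health: The Case of Water-Quality-Related Diseases" (應用健保與環境資訊探討環境與健康之關係-以水質相關疾病為例), Master's thesis, Graduate Institute of Public Health, Kaohsiung Medical University, June 2001.
7. 許懷仁, "Architecture Design and Implementation of a Biomedical Literature Mining System" (生物醫學文件探勘系統之架構設計與實作), Master's thesis, Department of Computer Science and Information Engineering, National Cheng Kung University, June 2001.
8. 郭振宗, "A Decision Support System for Microbial Classification Diagnosis and Antibiotic Prescription" (微生物類別診斷與抗生素用藥決策支援系統), Master's thesis, Department of Information Management, National Pingtung University of Science and Technology, June 1999.
9. 陳永耀, 謝銘鈞, 嚴家鈺, 陳啟鴻, 林文澧, "Applying Neural Network Models to Estimate Ultrasonic Transducer and Tissue Parameters" (類神經網模型應用於超音波換能器與組織參數之估測), 中華醫學工程期刊, Vol. 18, No. 2, pp. 129-138, June 1998.
10. 陳益良, "Applying Data Mining to Explore the Medical-Care-Seeking Characteristics of the Elderly: A Case Study of Sanmin District, Kaohsiung City" (應用資料探勘法探討老人就醫特性-以高雄市三民區為例), Master's thesis, In-Service Master's Program, Graduate Institute of Public Health, Kaohsiung Medical University, June 2001.
11. 曾君俊, 朱唯勤, 詹寶珠, 鍾文裕, 潘宏基, "A Preliminary Study of Neural-Network-Assisted Treatment Planning for Gamma Knife Stereotactic Radiosurgery" (以類神經網路輔助加馬刀立體定位放射手術治療計畫之初步研究), 中華醫學工程期刊, Vol. 18, No. 2, pp. 96-97, June 1998.
12. 黃勝宗, "A Study of Applying Data Mining to Patient Visit Guidance in Medical Institutions" (資料探勘應用於醫療院所輔助病患看診指引之研究), Master's thesis, Department of Information Management, Nanhua University, June 2000.
13. 楊銘耀, 徐良育, 胡威志, 張恆雄, 高材, "ECG Feature Extraction and Disease Classification Using Wavelet Transforms and Neural Networks" (利用小波轉換與類神經網路進行心電圖特徵擷取與病症分類), 中華醫學工程期刊, Vol. 17, No. 4, pp. 265-266, December 1997.
14. 廖雅郁, "A Study of Applying Data Mining to Western Pharmaceutical Marketing in Taiwan" (應用資料探勘於我國西藥行銷之研究), Master's thesis, Institute of Business and Management, National Chiao Tung University, June 2001.
15. 劉漢男, 李友專, 薛宏昇, 林瑞宜, "A dermatopathological diagnostic decision support system for non-infectious generalized blistering diseases," Proceedings of the 1999 International Medical Informatics Conference (1999國際醫療資訊研討會論文集), p. 23, Taipei, 1999.
16. 蔣定安, "Database Fundamentals: Theory and Practice" (資料庫基本理論與實作), 東華書局, 2nd ed., August 2004.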
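Full-Text Access Rights
On campus
Print thesis available 5 years after submission of the authorization form
Electronic full text authorized for on-campus access
On-campus electronic full text available 3 years after submission of the authorization form
Off campus
Authorization granted
Off-campus electronic full text available 3 years after submission of the authorization form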
