Tamkang University Chueh Sheng Memorial Library (TKU Library)


(Download of the electronic full text is limited to access via Tamkang University IP addresses)
System ID: U0002-1608200923555900
Title (Chinese): 使用關聯式法則解決決策樹不相關值問題
Title (English): Remove The Irrelevant Values Problem in the Decision Tree by Using Association Rule
University: Tamkang University
Department (Chinese): 資訊工程學系碩士在職專班
Department (English): Department of Computer Science and Information Engineering
Academic Year: 97 (ROC calendar)
Semester: 2
Year of Publication: 98 (ROC calendar; 2009)
Student Name (Chinese): 林靜宜
Student Name (English): Ching-Yi Lin
Student ID: 796410032
Degree: Master's
Language: Chinese
Date of Oral Defense: 2009-06-17
Number of Pages: 83
Committee: Advisor - 蔣定安 (Ding-An Chiang); Member - 葛煥昭; Member - 王鄭慈; Member - 蔣定安 (Ding-An Chiang)
Keywords (Chinese): 決策樹; 不相關值問題; 離散化; 關聯式法則
Keywords (English): Decision Tree; Irrelevant Value; Discretization; Association Rule
Subject Classification: Applied Sciences - Information Engineering
Abstract (Chinese): A decision tree is built recursively, following a top-down, divide-and-conquer strategy. Constructing a decision tree, however, may give rise to the "irrelevant values problem," and because of the tree structure, the choice of cut points may change after the irrelevant values are removed. In addition, decision trees discretize continuous numeric attributes with local discretization, which presupposes dependency relationships among the attributes; when no such dependencies exist, global discretization performs better than local discretization.
This thesis therefore proposes a method for removing irrelevant values and uses association rules to integrate the classification rule templates with the attribute cut points obtained from global discretization; it also proposes a classifier construction method that integrates all of the classification rules. The experimental results show that the proposed method not only removes the irrelevant values but also achieves higher accuracy after integrating the global cut points through association rules.
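The abstract hinges on the contrast between local and global discretization. As a minimal Python sketch of that distinction only (equal-frequency binning is an assumed stand-in, not necessarily the method used in the thesis), global cut points are computed once from the entire attribute column, whereas a local method such as C4.5's recomputes cut points from the subset of records reaching each tree node:

    # Minimal illustration of *global* discretization: cut points are computed
    # once from the whole attribute column. A local method (e.g., C4.5's) would
    # instead recompute cut points from the records reaching each tree node.
    # Equal-frequency binning here is an assumption made for this example.

    def global_cut_points(values, bins=3):
        """Return bins-1 cut points splitting `values` into equal-frequency bins."""
        ordered = sorted(values)
        n = len(ordered)
        return [ordered[(i * n) // bins] for i in range(1, bins)]

    alcohol = [0.0, 0.5, 1.2, 1.8, 2.4, 3.0, 4.1, 5.5, 7.0]  # hypothetical data
    print(global_cut_points(alcohol))  # -> [1.8, 4.1]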
Abstract (English): The decision tree is built recursively, following a top-down, divide-and-conquer strategy. However, decision tree construction may suffer from the irrelevant values problem, and because of the tree structure, removing irrelevant values also changes the choice of cut points. Furthermore, the discretization technique used within decision trees is a local method, which presupposes dependency relationships among the attributes; when no such dependencies exist, global discretization methods outperform local ones.

This thesis therefore proposes a method for removing the irrelevant values problem. It uses association rules to integrate the classification rule templates with the attribute cut points obtained from global discretization, and it proposes a classifier construction method that integrates all of the resulting classification rules. Experimental results show that the proposed method not only removes the irrelevant values but also achieves higher accuracy by using association rules to integrate the global cut points.
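Chapter 3 of the thesis defines the actual procedures (DTMGC, DTMGC_MC, DTMGC_STCR). Purely as an illustration of the general idea of removing irrelevant values, the Python sketch below drops a condition from a decision-tree rule whenever removing it does not lower the rule's confidence on the training data; the data layout, rule encoding, and function names are assumptions made for this example, not the thesis's algorithm:

    # Hypothetical sketch: prune conditions from a decision-tree rule when
    # removing them does not reduce the rule's confidence on training data.

    def confidence(rule, label, rows):
        """Confidence = P(label | rule): the fraction of rows matching every
        condition in `rule` that also carry `label`."""
        matched = [r for r in rows if all(cond(r) for cond in rule)]
        if not matched:
            return 0.0
        return sum(r["class"] == label for r in matched) / len(matched)

    def prune_irrelevant(rule, label, rows):
        """Greedily drop any condition whose removal keeps the rule's
        confidence at least as high (i.e., the condition is irrelevant)."""
        kept = list(rule)
        base = confidence(kept, label, rows)
        for cond in list(kept):
            trial = [c for c in kept if c is not cond]
            if confidence(trial, label, rows) >= base:
                kept = trial
                base = confidence(kept, label, rows)
        return kept

    # Toy usage with two numeric attributes A and B:
    rows = [
        {"A": 1.0, "B": 5.0, "class": "c1"},
        {"A": 2.0, "B": 6.0, "class": "c1"},
        {"A": 2.5, "B": 1.0, "class": "c2"},
    ]
    rule = [lambda r: r["A"] > 0.5, lambda r: r["B"] > 4.0]
    print(len(prune_irrelevant(rule, "c1", rows)))  # -> 1 (the A condition is pruned)

Pruning on a tied confidence reflects the intuition that a condition contributing nothing to the rule's predictive power is irrelevant; the thesis's own removal criterion may differ.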
Table of Contents
Chapter 1  Introduction 1
1.1 Research Background 1
1.2 Research Motivation and Objectives 2
1.3 Thesis Organization 5
Chapter 2  Literature Review 6
2.1 Discretization of Continuous Attributes 7
2.1.1 Global and Local Discretization 7
2.1.2 Supervised and Unsupervised Discretization 8
2.1.3 Static and Dynamic Discretization 8
2.1.4 Direct and Incremental Discretization 9
2.1.5 Top-Down and Bottom-Up Discretization 9
2.2 Classification Methods 10
2.2.1 Decision Tree Classification Algorithms 10
2.2.1.1 The C4.5 Algorithm 12
2.2.1.2 Chi-Square Automatic Interaction Detector (CHAID) 13
2.2.1.3 Classification and Regression Tree (CART) 14
2.3 Problems Arising from Decision Tree Algorithms 16
2.3.1 The Irrelevant Values Problem 16
2.3.2 The Missing Branches Problem 19
2.4 Algorithms for Solving the Irrelevant Values Problem 21
2.4.1 The Chiang Algorithm 21
2.5 Association Rules 24
2.5.1 The Apriori Algorithm 28
2.6 Associative Classifiers 31
2.6.1 Classification Based on Association (CBA) 31
Chapter 3  Research Method 34
3.1 Discussion of Decision Tree Classification Problems 34
3.2 Flowchart of the DTMGC (Decision Tree Merge Global Classifier) Algorithm 40
3.3 Removing Irrelevant Values from Classification Rules 43
3.4 Merging Discretized Attributes with Classification Rules 47
3.4.1 The DTMGC_MC (Decision Tree Merge Global Classifier - Merge Condition) Algorithm 49
3.4.2 The DTMGC_STCR (Decision Tree Merge Global Classifier - Select Top Confidence Rule) Algorithm 53
3.5 Example 58
Chapter 4  Experimental Method and Procedure 64
4.1 Data Sources and Experimental Procedure 64
4.2 Experimental Results and Validation 65
Chapter 5  Conclusions and Future Work 69
References 70
Appendix: English Version of the Thesis 76

List of Figures
Figure 2.1 A decision tree 10
Figure 2.2 Absolute partition diagram 12
Figure 2.3 A decision tree containing irrelevant values 17
Figure 2.4 A decision tree containing missing branches 19
Figure 2.5 The decision tree after removing irrelevant nodes 20
Figure 2.6 Itemset derivation process of the Apriori algorithm 30
Figure 3.1 Decision tree constructed from Table 3.1 36
Figure 3.2 Decision tree constructed from Table 3.2 38
Figure 3.3 Flowchart of the DTMGC algorithm 40
Figure 3.4 The DTMGC_RIR algorithm 44
Figure 3.5 The DTMGC_RRIR algorithm 45
Figure 3.6 The BuilderSubRule algorithm 46
Figure 3.7 Internal operation of Merge Condition 48
Figure 3.8 A>a1 and A<=a3 and B>b1 and B<=b3 → c1, confidence = 80% 54
Figure 3.9 A<=a1 and B>b1 and B<=b3 → c2, confidence = 85% 54
Figure 3.10 A>a1 and A<=a2 and B>b1 and B<=b2 → c1, confidence = 90% 55
Figure 3.11 Decision tree for the Blood dataset 58

List of Tables
Table 2.1 Comparison of decision tree algorithms 15
Table 2.2 Rules converted from the decision tree in Figure 2.3 17
Table 2.3 Rules of Table 2.2 after removing irrelevant values 18
Table 3.1 Hypertension control status 36
Table 3.2 Adult height and weight data 38
Table 3.3 Global discretization cut points for the alcohol-consumption attribute 50
Table 3.4 Global discretization cut points for the sodium-intake attribute 50
Table 3.5 Input transactions for association analysis 51
Table 3.6 Candidate set of classification rules 52
Table 3.7 Execution order of the classification rules 55
Table 3.8 Global discretization cut points for the alcohol-consumption attribute 56
Table 3.9 Global discretization cut points for the sodium-intake attribute 56
Table 3.10 Input transactions for association analysis 57
Table 3.11 Rules converted from the Blood decision tree 59
Table 3.12 Rules of Table 3.11 after removing irrelevant values 60
Table 3.13 Global discretization cut points for the continuous attributes in Table 3.11 60
Table 3.14 Global discretization cut points of rule R3 under DTMGC_MC 61
Table 3.15 Global discretization cut points of rule R2 under DTMGC_STCR 62
Table 3.16 Candidate classification rules of rule R2 under DTMGC_STCR 62
Table 3.17 Classification system validity table 63
Table 3.18 Classifier constructed from Table 3.11 63
Table 4.1 Average accuracy 67
Table 4.2 Cross-validation accuracy 67
Table 4.3 Data for rule 3 under methods (II) and (V) 68
Table 4.4 Average number of items 68
References
[1] J.R. Quinlan, "Induction of decision trees," Machine Learning, vol. 1, pp. 81-106, 1986.
[2] G.V. Kass, "An exploratory technique for investigating large quantities of categorical data," Applied Statistics, vol. 29, pp. 119-127, 1980.
[3] J. Cheng, U.M. Fayyad, K.B. Irani, and Z. Qian, "Improved decision trees: a generalized version of ID3," in Proc. of the Fifth Int'l Conf. on Machine Learning, pp. 100-108, 1988.
[4] U.M. Fayyad and K.B. Irani, "A Machine Learning Algorithm (GID3*) for Automated Knowledge Acquisition: Improvements and Extensions," General Motors Research Report CS-634, Warren, MI: GM Research Labs, 1991.
[5] L. Breiman, J.H. Friedman, R.A. Olshen, and C.J. Stone, Classification and Regression Trees, Wadsworth, Belmont, CA, 1984.
[6] J.R. Quinlan, C4.5: Programs for Machine Learning, Morgan Kaufmann, 1993.
[7] B. Liu, W. Hsu, and Y. Ma, "Integrating classification and association rule mining," in Proc. KDD, 1998.
[8] K. Wang, S.Q. Zhou, and Y. He, "Growing decision trees on support-less association rules," in Proc. ACM SIGKDD, 2000.
[9] G. Kundu, M.M. Islam, S. Munir, and M.F. Bari, "ACN: An associative classifier with negative rules," IEEE, 2008.
[10] U.M. Fayyad and K.B. Irani, "Multi-interval discretization of continuous-valued attributes for classification learning," in Proc. Thirteenth International Joint Conference on Artificial Intelligence, 1993.
[11] S. Kotsiantis and D. Kanellopoulos, "Discretization techniques: A recent survey," GESTS International Transactions on Computer Science and Engineering, vol. 32, no. 1, pp. 47-58, 2006.
[12] T. Hastie, R. Tibshirani, and J. Friedman, The Elements of Statistical Learning: Data Mining, Inference, and Prediction, Springer Series in Statistics, 2001.
[13] T. Elomaa and J. Rousu, "Efficient multisplitting revisited: Optima-preserving elimination of partition candidates," Data Mining and Knowledge Discovery, vol. 8, no. 2, pp. 131-138, 2002.
[14] J. Cerquides and R. Lopez de Mantaras, "Proposal and empirical comparison of a parallelizable distance-based discretization method," in Proc. Third International Conference on Knowledge Discovery and Data Mining (KDD-97), Newport Beach, pp. 139-142, 1997.
[15] R. Kerber, "ChiMerge: Discretization of numeric attributes," in Proc. Tenth National Conference on Artificial Intelligence (AAAI-92), pp. 123-128, 1992.
[16] H. Steck and R. Setiono, "Feature selection and discretization," IEEE Transactions on Knowledge and Data Engineering, 9:1-4, 1997.
[17] R.C. Holte, "Very simple classification rules perform well on most commonly used datasets," Machine Learning, vol. 11, pp. 63-91, 1993.
[18] J. Dougherty, R. Kohavi, and M. Sahami, "Supervised and unsupervised discretization of continuous features," in Proc. 12th International Conference on Machine Learning, pp. 194-202, 1995.
[19] B. Liu, W. Hsu, and Y. Ma, "Integrating classification and association rule mining," in Proc. 1998 International Conference on Knowledge Discovery and Data Mining, New York, NY, pp. 80-86, 1998.
[20] J. Han and M. Kamber, Data Mining: Concepts and Techniques, Morgan Kaufmann Publishers, 2001.
[21] J.R. Quinlan, "Learning efficient classification procedures and their application to chess end games," in Machine Learning: An Artificial Intelligence Approach, Michalski, Carbonell, and Mitchell, Eds., Morgan Kaufmann, pp. 463-482, 1983.
[22] J.R. Quinlan, "Simplifying decision trees," Int. J. Man-Machine Studies, vol. 27, pp. 221-234, 1987.
[23] K.A. Hamill and A. Zamora, "The use of titles for automatic document classification," JASIS, vol. 31, no. 6, pp. 396-402, 1980.
[24] K.L. Kwok, "The use of title and cited titles as document representation for automatic classification," Information Processing and Management, vol. 11, pp. 201-206, 1975.
[25] R.R. Larson, "Experiments in automatic Library of Congress classification," JASIS, vol. 43, no. 2, pp. 130-148, 1992.
[26] M.E. Maron, "Automatic indexing: an experimental inquiry," Journal of the ACM, vol. 8, pp. 404-417, 1961.
[27] K.-R. Müller, A.J. Smola, G. Rätsch, et al., "Predicting time series with support vector machines," in Proc. ICANN'97, Springer Lecture Notes in Computer Science, 1997.
[28] V. Vapnik, S. Golowich, and A. Smola, "Support vector method for function approximation, regression estimation, and signal processing," in Neural Information Processing Systems 9, pp. 281-287, 1997.
[29] R. Srikant and R. Agrawal, "Mining quantitative association rules in large relational tables," in Proc. ACM SIGMOD, 1996.
[30] J. Han, J. Pei, and Y. Yin, "Mining frequent patterns without candidate generation," in Proc. ACM SIGMOD Conference on Management of Data, pp. 1-12, 2000.
[31] F. Bodon and L. Rónyai, "Trie: An alternative data structure for data mining algorithms," Mathematical and Computer Modelling, vol. 38, pp. 739-751, 2003.
[32] 吳育儒, "A Study on Removing the Irrelevant Values Problem in Decision Trees," Master's thesis, Tamkang University, 1998.
[33] 葉介山, 廖偉敦, and 卓政儒, "Automatic Classification of Official Electronic Documents Using Decision Tree Algorithms," Master's thesis, Providence University, 2006.
[34] 林文燦, 柯美珠, and 王文杉, "Applying Data Mining Techniques to the Planning of Education and Training Courses: A Case Study of Automobile Company A," in Proc. 43rd Annual Meeting of the Chinese Society for Quality and 13th National Quality Management Symposium, 2007.
[35] http://lovehome.kcg.gov.tw/service/word/110.pdf
[36] 顏博文, "Applying Data Mining Techniques to Analyze Students' Course-Selection Characteristics and Academic Performance," Master's thesis, Chung Yuan Christian University, 2003.
[37] Ding-An Chiang, Cheng-Tzu Wang, Yi-Hsin Wang, and Chun-Chi Chen, "The Irrelevant Values Problem of Decision Tree for Improving a Glass Sputtering Process," 2008.
[38] A. Veloso, W. Meira Jr., and M.J. Zaki, "Lazy associative classification," in Proc. Sixth IEEE International Conference on Data Mining (ICDM 2006), 2006.
[39] 王瀞誼, "A New Method for Measuring Association Rules," Master's thesis, National University of Kaohsiung, 2007.
Thesis Usage Permissions
  • Royalty-free permission is granted for in-library readers to reproduce the printed copy for academic purposes, available to the public from 2009-08-20.
  • Permission is granted for browsing/printing of the electronic full text, available to the public from 2009-08-20.

