§ Thesis Bibliographic Record
  
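System ID U0002-0607201100332800
DOI 10.6846/TKU.2011.00175
Thesis Title (Chinese) 關聯式分類演算法結合規則優先權以改善分類之準確度 (Combining Associative Classification Algorithms with Rule Priority to Improve Classification Accuracy)
Thesis Title (English) Improving the Performance of Associative Classification Algorithms with Rule Priorities
Thesis Title (Third Language)
University Tamkang University
Department (Chinese) 資訊工程學系碩士在職專班 (In-Service Master's Program, Department of Computer Science and Information Engineering)
Department (English) Department of Computer Science and Information Engineering
Foreign Degree: University
Foreign Degree: College
Foreign Degree: Graduate Institute
Academic Year 99 (ROC calendar, i.e., 2010-2011)
Semester 2
Publication Year 100 (ROC calendar, i.e., 2011)
Student (Chinese) 王務本
Student (English) Wu-Pen Wang
Student ID 798410170
Degree Master's
Language Traditional Chinese
Second Language English
Oral Defense Date 2011-06-19
Number of Pages 68
Oral Examination Committee Advisor - 黃連進
Member - 黃連進
Member - 蔣定安
Member - 葛煥昭
Member - 王鄭慈
Keywords (Chinese) 規則排序 (rule ranking)
規則相依性 (rule dependency)
關聯式法則 (association rule)
關聯式分類演算法 (associative classification algorithms)
Keywords (English) Associative Classification Algorithms
Association Rule
Ranking
Rule Dependency
Keywords (Third Language)
Subject Classification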
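Chinese Abstract (English translation)
Although many associative classification algorithms have been published, none of them takes the rule dependency problem into account. Rule dependency can change a rule's confidence, and can even change the rules themselves and their classes, which in turn affects the classification result. Solving the rule dependency problem exactly, that is, finding the optimal rule execution order, is very time-consuming. This thesis therefore proposes the Rule Priority algorithm, which ranks the rules to obtain a better execution order, reducing the impact of rule dependency on the classification result and thereby improving the final classification. Because the proposed algorithm runs in polynomial time, it can easily be combined with any associative classification algorithm. In this thesis, we add the concept of rule priority to the Lazy algorithm and compare it with the Lazy algorithm alone; the experimental results confirm that accounting for rule dependency does improve classification accuracy.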
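The rule dependency effect described in the abstract can be illustrated with a minimal Python sketch. It assumes, as an illustration rather than the thesis's own definition, that class association rules are applied in order and that each rule removes the training transactions it covers, in the style of CBA's database-coverage step; a rule's confidence is then measured only on the transactions left by the rules before it. The `Rule` class, the helper functions, and the five-transaction toy data are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """Hypothetical class association rule: antecedent itemset -> class label."""
    antecedent: frozenset
    label: str


def covers(rule, items):
    # A rule covers a transaction when every antecedent item is present.
    return rule.antecedent <= items


def sequential_confidences(rules, data):
    """Apply rules in order; each confidence is computed only on transactions
    not covered by earlier rules, and covered transactions are then removed."""
    remaining = list(data)
    confs = []
    for rule in rules:
        matched = [label for items, label in remaining if covers(rule, items)]
        correct = sum(1 for label in matched if label == rule.label)
        confs.append(correct / len(matched) if matched else 0.0)
        remaining = [(items, label) for items, label in remaining
                     if not covers(rule, items)]
    return confs


def classify(rules, items, default):
    # First-match classification: the earliest rule that covers `items` wins.
    for rule in rules:
        if covers(rule, items):
            return rule.label
    return default


# Toy training data: (itemset, class) pairs.
data = [
    (frozenset({"a", "b"}), "X"),
    (frozenset({"a", "b"}), "X"),
    (frozenset({"a"}), "Y"),
    (frozenset({"b"}), "X"),
    (frozenset({"a", "c"}), "Y"),
]
r1 = Rule(frozenset({"a"}), "Y")
r2 = Rule(frozenset({"b"}), "X")

print(sequential_confidences([r1, r2], data))  # [0.5, 1.0]
print(sequential_confidences([r2, r1], data))  # [1.0, 1.0]
print(classify([r1, r2], frozenset({"a", "b"}), "X"))  # Y
print(classify([r2, r1], frozenset({"a", "b"}), "X"))  # X
```

Under these assumptions, r1 has confidence 0.5 when it runs first but 1.0 when it runs after r2, and a test transaction containing both a and b receives a different class depending on the order.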
English Abstract
Although many associative classification algorithms have been proposed, none of them considers the rule dependence problem, which directly influences their classification accuracy. Since finding the optimal execution order of class association rules (CARs) is a combinatorial problem, this thesis does not search for the optimal order; instead, it proposes polynomial-time algorithms that re-rank the execution order of CARs by rule priority. This reduces the influence of rule dependence. Consequently, the performance (classification accuracy and recall) of associative classification algorithms can be improved. The experimental results show that LAZY combined with the proposed method achieves better classification results than the LAZY associative classifier without considering the rule dependence problem.
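The abstract contrasts searching all execution orders of CARs, which grows as n! for n rules, with polynomial-time re-ranking. The Python sketch below illustrates that contrast with a generic greedy heuristic chosen for illustration; it is not the thesis's Rule Priority algorithm, whose details are in Chapter 3 and are not reproduced here. At each step the heuristic places the unplaced rule with the highest confidence on the training transactions not yet covered, then removes what that rule covers, which costs O(n² · m) subset checks for m transactions. The rule representation, all function names, and the toy data are hypothetical.

```python
from dataclasses import dataclass
from itertools import permutations


@dataclass(frozen=True)
class Rule:
    """Hypothetical class association rule: antecedent itemset -> class label."""
    antecedent: frozenset
    label: str


def confidence(rule, remaining):
    # Confidence of `rule` measured only on the still-uncovered transactions.
    matched = [label for items, label in remaining if rule.antecedent <= items]
    if not matched:
        return 0.0
    return sum(label == rule.label for label in matched) / len(matched)


def greedy_rerank(rules, data):
    """Hypothetical polynomial-time heuristic: repeatedly place the rule with the
    highest confidence on uncovered transactions (ties keep input order)."""
    remaining = list(data)
    pending = list(rules)
    order = []
    while pending:
        best = max(pending, key=lambda r: confidence(r, remaining))
        order.append(best)
        pending.remove(best)
        remaining = [(items, label) for items, label in remaining
                     if not best.antecedent <= items]
    return order


def training_accuracy(order, data, default):
    # Fraction of transactions whose first matching rule (or default) is correct.
    hits = 0
    for items, label in data:
        predicted = next((r.label for r in order if r.antecedent <= items), default)
        hits += predicted == label
    return hits / len(data)


def brute_force_best(rules, data, default):
    # Exhaustive search over all n! orders; only feasible for tiny rule sets.
    return max(permutations(rules),
               key=lambda p: training_accuracy(p, data, default))


data = [
    (frozenset({"a", "b"}), "X"),
    (frozenset({"a", "b"}), "X"),
    (frozenset({"a"}), "Y"),
    (frozenset({"b"}), "X"),
    (frozenset({"a", "c"}), "Y"),
]
r1 = Rule(frozenset({"a"}), "Y")
r2 = Rule(frozenset({"b"}), "X")

print([r.label for r in greedy_rerank([r1, r2], data)])  # ['X', 'Y']
print(training_accuracy([r1, r2], data, "X"))  # 0.6
print(training_accuracy(greedy_rerank([r1, r2], data), data, "X"))  # 1.0
print([r.label for r in brute_force_best([r1, r2], data, "X")])  # ['X', 'Y']
```

On this toy data the greedy order matches the exhaustive optimum; in general a greedy heuristic of this kind carries no optimality guarantee, which is the trade-off the abstract accepts in exchange for polynomial running time.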
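Abstract (Third Language)
Table of Contents
Contents IV
List of Figures VI
List of Tables VII
Chapter 1 Introduction 1
1.1 Research Motivation 1
1.2 Thesis Organization 4
Chapter 2 Related Work 5
2.1 The Apriori Algorithm 5
2.2 Associative Classification 8
2.2.1 The CBA and CMAR Algorithms 8
2.2.2 The CPAR and PRM Algorithms 14
2.2.3 The Lazy Algorithm 17
2.3 Document Classification 20
2.4 Evaluation Measures 24
Chapter 3 Methodology 26
3.1 Problem Analysis 26
3.2 The Rule Priority Algorithm 34
Chapter 4 Experimental Results 40
Chapter 5 Conclusion 45
References 46
Appendix: English Paper 54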

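List of Figures
Figure 2.1 The CBA-RG Algorithm 9
Figure 2.2 The CBA-CB Naive (M1) Algorithm 10
Figure 2.3 The PRM Algorithm 15
Figure 2.4 The L3 Rule Pruning Algorithm 19
Figure 2.5 Document Classification Flowchart 20
Figure 2.6 Classification Process of an Associative Classifier 23
Figure 3.1 Flowchart of the Rule Priority Algorithm 34
Figure 3.2 Algorithm for Finding Rule Priorities 39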

List of Tables
Table 2.1 Number of Documents per Category 24
Table 3.1 Rule Dependence Example (a) 27
Table 3.2 Rule Dependence Example (b) 28
Table 3.3 Rule Dependence Example (c) 28
Table 3.4 Rule Dependence Example (d) 29
Table 3.5 Rule Dependence Example (e) 30
Table 3.6 Rule Dependence Example (f) 31
Table 3.7 Rule Dependence Example (g) 31
Table 3.8 Rule Dependence Example (h) 32
Table 4.1 Number of Training and Test Documents 41
Table 4.2 Classification Results of LAZY without Rule Priority (a) 42
Table 4.3 Classification Results of LAZY without Rule Priority (b) 42
Table 4.4 Classification Results of LAZY with Rule Priority (a) 42
Table 4.5 Classification Results of LAZY with Rule Priority (b) 42
Table 4.6 Comparison of the Lazy Algorithm with and without Rule Priority 44
References
[1] Alipio M. Jorge and Paulo J. Azevedo, "An Experiment with Association Rules and Classification: Post-Bagging and Conviction," Lecture Notes in Computer Science, vol. 3735, pp. 137-149, Oct. 2005.
[2] Bingheng Yan and Depei Qian, "Building a Simple and Effective Text Categorization System Using Relative Importance in Category," Proc. Third International Conference on Natural Computation (ICNC 2007), vol. 1, pp. 108-114, 2007.
[3] I. Díaz, J. Ranilla, E. Montañés, J. Fernández, and E. F. Combarro, "Improving Performance of Text Categorization by Combining Filtering and Support Vector Machines," J. Am. Soc. Information Science and Technology (JASIST), vol. 55, no. 7, pp. 579-592, 2004.
[4] Karen A. Hamill and Antonio Zamora, "The Use of Titles for Automatic Document Classification," JASIS, vol. 31, pp. 396-402, 1980.
[5] Hisham Al-Mubaid and Syed A. Umair, "A New Text Categorization Technique Using Distributional Clustering and Learning Logic," IEEE Transactions on Knowledge and Data Engineering, vol. 18, no. 9, pp. 1156-1165, 2006.
[6] Dunja Mladenic et al., "Feature Selection for Unbalanced Class Distribution and Naive Bayes," Proc. International Conference on Machine Learning (ICML'98), 1998, http://www.cs.cmu.edu/~textlearning/pww/yplanet.Html
[7] F. Thabtah, "A Review of Associative Classification Mining," Knowledge Engineering Review, vol. 22, pp. 37-65, 2007.
[8] David L. Banks and Yasmin H. Said, "Data Mining in Electronic Commerce," Statistical Science, vol. 21, no. 2, pp. 234-246, 2006.
[9] Elías F. Combarro, Elena Montañés, Irene Díaz, José Ranilla, and Ricardo Mones, "Introducing a Family of Linear Measures for Feature Selection in Text Categorization," IEEE Transactions on Knowledge and Data Engineering, vol. 17, no. 9, pp. 1223-1232, Sept. 2005.
[10] Fang Yuan, Yu-Qin Guo, Liu Yang, and Fan Yang, "Chinese Text Categorization Based on Fuzzy Association Rules," Proc. 2006 International Conference on Machine Learning and Cybernetics, vol. 19, pp. 1030-1035, Aug. 2006.
[11] G. H. John, R. Kohavi, and K. Pfleger, "Irrelevant Features and the Subset Selection Problem," Proc. 11th Int'l Conf. Machine Learning, pp. 121-129, 1994.
[12] G. Salton and C. Buckley, "Term Weighting Approaches in Automatic Text Retrieval," Information Processing and Management, vol. 24, no. 5, pp. 513-523, 1988.
[13] Hwee Tou Ng, Wei Boon Goh, and Kok Leong Low, "Feature Selection, Perceptron Learning, and a Usability Case Study for Text Categorization," Proc. 20th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, vol. 13, pp. 67-73, 1997.
[14] J. Platt, "Fast Training of SVMs Using Sequential Minimal Optimization," in B. Schölkopf, C. Burges, and A. Smola (eds.), Advances in Kernel Methods: Support Vector Learning, MIT Press, 1998.
[15] M. Jiang, L. Wang, Y. Lu, and S. Liao, "A RBF Network for Chinese Text Classification Based on Concept Feature Extraction," Lecture Notes in Computer Science, no. 4234, pp. 285-294, 2006.
[16] Jing Chen, Zhigang Zhang, Qing Li, and Xiaoming Li, "A Pattern-Based Voting Approach for Concept Discovery on the Web," Web Technologies Research and Development: APWeb 2005, vol. 3399, p. 77, 2005.
[17] J. R. Quinlan, "Induction of Decision Trees," Machine Learning, vol. 1, pp. 81-106, 1986.
[18] D. A. Karras, "An Improved Text Categorization Methodology Based on Second and Third Order Probabilistic Feature Extraction and Neural Network Classifiers," Lecture Notes in Computer Science, no. 4251, pp. 9-20, 2006.
[19] Ken Lang, "NewsWeeder: Learning to Filter Netnews," Proc. Twelfth International Conference on Machine Learning, pp. 331-339, 1995.
[20] Khalid Al-Kofahi, Alex Tyrrell, Arun Vachher, Tim Travers, and Peter Jackson, "Combining Multiple Classifiers for Text Categorization," Proc. Tenth International Conference on Information and Knowledge Management, Atlanta, Georgia, USA, pp. 97-104, 2001.
[21] K. L. Kwok, "The Use of Title and Cited Titles as Document Representation for Automatic Classification," Information Processing and Management, pp. 201-206, 1975.
[22] K. Aas and L. Eikvil, "Text Categorization: A Survey," Technical Report, Norwegian Computing Center, 1999.
[23] M. E. Maron, "Automatic Indexing: An Experimental Inquiry," J. ACM, vol. 8, pp. 404-417, 1961.
[24] A. Montejo-Ráez and L. A. Ureña-López, "Selection Strategies for Multi-Label Text Categorization," Lecture Notes in Computer Science, no. 4139, pp. 585-592, 2006.
[25] K.-R. Müller, A. J. Smola, G. Rätsch, et al., "Predicting Time Series with Support Vector Machines," Proc. ICANN'97, Springer Lecture Notes in Computer Science, pp. 999-1005, 1997.
[26] M. F. Porter, "An Algorithm for Suffix Stripping," Program, vol. 14, no. 3, pp. 130-137, 1980.
[27] Osmar R. Zaiane and Maria-Luiza Antonie, "Classifying Text Documents by Associating Terms with Text Categories," Proc. 13th Australasian Database Conference, vol. 5, pp. 215-222, 2002.
[28] P. Soucy and G. W. Mineau, "A Simple KNN Algorithm for Text Categorization," Proc. 2001 IEEE International Conference on Data Mining (ICDM 2001), pp. 64-68, 29 Nov.-2 Dec. 2001.
[29] Thorsten Joachims, "A Probabilistic Analysis of the Rocchio Algorithm with TFIDF for Text Categorization," Proc. ICML-97, 14th International Conference on Machine Learning, pp. 143-151, 1997.
[30] Tom M. Mitchell, "Machine Learning," The McGraw-Hill Companies, Inc., 1997.
[31] Tong Xiao-Jun, Cui Ming-Gen, and Song Guo-Long, "Research on Chinese Text Automatic Categorization Based on VSM," Proc. International Conference on Wireless Communications, Networking and Mobile Computing (WiCOM 2007), vol. 10, pp. 3863-3866, 21-25 Sept. 2007.
[32] V. Vapnik, S. Golowich, and A. Smola, "Support Vector Method for Function Approximation, Regression Estimation, and Signal Processing," Neural Information Processing Systems 9, pp. 281-287, 1997.
[33] R. Agrawal and R. Srikant, "Fast Algorithms for Mining Association Rules," Proc. 20th Int. Conf. Very Large Data Bases (VLDB), J. B. Bocca, M. Jarke, and C. Zaniolo, eds., Morgan Kaufmann, pp. 487-499, 1994.
[34] B. Liu, W. Hsu, and Y. Ma, "Integrating Classification and Association Rule Mining," Proc. Knowledge Discovery and Data Mining, pp. 80-86, 1998.
[35] W. Li, J. Han, and J. Pei, "CMAR: Accurate and Efficient Classification Based on Multiple Class-Association Rules," Proc. ICDM-01, San Jose, CA, pp. 369-376, Nov. 2001.
[36] Xiaoxin Yin and Jiawei Han, "CPAR: Classification Based on Predictive Association Rules," University of Illinois at Urbana-Champaign, 2003.
[37] Adriano Veloso, Wagner Meira Jr., and Mohammed J. Zaki, "Lazy Associative Classification," Proc. Sixth IEEE International Conference on Data Mining, 2006.
[38] E. Baralis and P. Garza, "A Lazy Approach to Pruning Classification Rules," Proc. IEEE International Conference on Data Mining, p. 35, Dec. 2002.
[39] M. F. Porter, "An Algorithm for Suffix Stripping," Readings in Information Retrieval, Morgan Kaufmann Publishers Inc., pp. 313-316, 1997.
[40] Yongwook Yoon, Gary G. Lee, and Tseng, "Text Categorization Based on Boosting Association Rules," Proc. 2008 IEEE International Conference on Semantic Computing, pp. 136-143, 2008.
[41] Yuen-Hsien Tseng, "Effectiveness Issues in Automatic Text Categorization," Bulletin of the Library Association of China, vol. 68, pp. 62-83, Jun. 2002.
[42] http://www.daviddlewis.com/resources/testcollections/reuters21578/
[43] http://rocling.iis.sinica.edu.tw/CKIP/
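Full-Text Access Permissions
On Campus
Print thesis: publicly available on campus immediately
Author consents to on-campus release of the electronic full text
Electronic thesis: publicly available on campus immediately
Off Campus
Authorization granted
Electronic thesis: publicly available off campus immediately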
