§ Browse Thesis Bibliographic Record
  
System ID U0002-0707201012335300
DOI 10.6846/TKU.2010.00211
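Title (Chinese) 應用多層次架構之類別優先度與多重分類器改善文件分類準確率
Title (English) Adopting the framework of Multi-level Class Priority with Multiple Classifiers to improve the Accuracy of Text Classification
Title (third language)
University Tamkang University
Department (Chinese) 資訊工程學系碩士在職專班 (In-service Master's Program, Department of Computer Science and Information Engineering)
Department (English) Department of Computer Science and Information Engineering
Foreign degree university
Foreign degree college
Foreign degree institute
Academic year 98 (ROC calendar; 2009-2010)
Semester 2
Publication year 99 (ROC calendar; 2010)
Author (Chinese) 董純賢
Author (English) Chun-Hsien Tung
Student ID 797410031
Degree Master's
Language Traditional Chinese
Second language English
Oral defense date 2010-06-15
Pages 72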
Committee Advisor - 蔣定安 (chiang@cs.tku.edu.tw)
Member - 蔣定安 (chiang@cs.tku.edu.tw)
Member - 葛煥昭 (keh@cs.tku.edu.tw)
Member - 王鄭慈 (ctwang@tea.ntue.edu.tw)
Keywords (Chinese) 關聯式分類法
規則排序
規則相依性
多層次類別優先
Keywords (English) Associative Classification
Ranking
Rule Dependency
Multi-level Class Priority
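Keywords (third language)
Subject classification
Abstract (Chinese)
General associative classification (AC) methods usually order rules by fixed criteria. However, rules are subject to rule dependency: even when rules have identical confidence, support, and length, the order in which they are applied still affects the classification result.
    This thesis focuses on the rule-ranking problem. In addition to adopting the Lazy principle as the general ranking rule and classifying documents at the 100% confidence level, it removes the documents that have already been classified and recomputes the confidence ranking, and it applies the concept of multi-level class priority, in order to examine the effect on classification performance. The lowest per-class accuracy obtained from an initial classification with TFIDF weights and a Naïve Bayes classifier is set as a single static threshold, and documents that AC cannot classify are classified by the Naïve Bayes classifier. This addresses the problem that the default class of an associative classifier lowers classification accuracy.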
Abstract (English)
Associative classification (AC) [1][2] methods normally rank rules according to prescribed criteria. However, because of the rule dependency that exists between rules, the order in which rules are executed can still affect the classification results even when the rules have identical confidence, support, and length.
    This thesis focuses on the rule-ranking problem. Beyond adopting the Lazy [3] method as the general ranking principle for classifying documents at the 100% confidence level, it prunes the documents that have already been classified and recalculates the confidence ranking, and it applies a multi-level class priority concept, in order to examine how these steps affect classification performance. The lowest per-class accuracy obtained from a preliminary classification using TFIDF [4] weighting and the Naïve Bayes [5] classifier is used as a single static threshold, and documents that the associative classifier cannot classify are classified by the Naïve Bayes classifier. This aims to resolve the loss of classification accuracy caused by the default class in associative classifiers.
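The classification flow summarized in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation: the `Rule` type, the `rank_rules` and `classify` functions, the sort key (confidence, then support, then antecedent length), the reading of the static threshold as a minimum rule confidence, and the `naive_bayes_stub` fallback are all hypothetical names and choices introduced here. The sketch omits the re-ranking after classified documents are removed and the multi-level class priority step.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, Iterable, List, Set


@dataclass(frozen=True)
class Rule:
    """A class association rule: if all `terms` occur in a document, predict `label`."""
    terms: FrozenSet[str]
    label: str
    confidence: float
    support: float


def rank_rules(rules: Iterable[Rule]) -> List[Rule]:
    # Assumed ordering: higher confidence, then higher support, then shorter
    # antecedent. sorted() is stable, so rules that tie on all three keys keep
    # their input order; that leftover order can still change the outcome.
    return sorted(rules, key=lambda r: (-r.confidence, -r.support, len(r.terms)))


def classify(doc_terms: Set[str],
             ranked_rules: List[Rule],
             fallback: Callable[[Set[str]], str],
             min_confidence: float) -> str:
    """Apply the first matching rule at or above the threshold, else fall back."""
    for rule in ranked_rules:
        if rule.confidence < min_confidence:
            break  # rules are sorted by confidence, so no later rule can qualify
        if rule.terms <= doc_terms:  # every antecedent term occurs in the document
            return rule.label
    # No usable rule matched: hand the document to the second classifier
    # instead of assigning a default class.
    return fallback(doc_terms)


def naive_bayes_stub(doc_terms: Set[str]) -> str:
    # Placeholder standing in for a trained Naive Bayes classifier.
    return "earn"


rules = [
    Rule(frozenset({"bank"}), "money-fx", confidence=0.6, support=0.30),
    Rule(frozenset({"oil", "price"}), "crude", confidence=0.9, support=0.20),
    Rule(frozenset({"wheat"}), "grain", confidence=1.0, support=0.10),
]
ranked = rank_rules(rules)
print(classify({"wheat", "export"}, ranked, naive_bayes_stub, min_confidence=0.8))
print(classify({"bank", "rate"}, ranked, naive_bayes_stub, min_confidence=0.8))
```

In this sketch the second document matches only a rule below the threshold, so it is routed to the fallback classifier rather than to a default class, which is the behavior the abstract attributes to the multiple-classifier design.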
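Abstract (third language)
Thesis table of contents
Contents
Contents	IV
List of Figures	VI
List of Tables	VII
Chapter 1	Introduction	1
1.1	Preface	1
1.2	Research Motivation and Objectives	2
1.3	Thesis Organization	6
Chapter 2	Related Literature and Research	7
2.1	Associative Classification	7
2.1.1	Pre-processing	12
2.1.2	Rule Generation	12
2.1.3	Ranking	15
2.1.4	Pruning	16
2.1.5	Association Rule Classifier	19
2.1.6	Multiple Classifiers	20
2.2	TFIDF (Term Frequency Inverse Document Frequency)	22
2.3	Naïve Bayes Classification	23
2.4	Evaluation Measures	25
Chapter 3	Research Method	27
3.1	Problem Discussion	27
3.2	Threshold Setting and Multiple Classifiers	32
3.3	Classification Procedure	34
Chapter 4	Experimental Results	36
4.1	Data Sources	36
4.2	Experimental Results	40
4.3	Analysis of Experimental Results	44
Chapter 5	Conclusions and Future Work	46
5.1	Conclusions	46
5.2	Future Work	47
References	48
Appendix 1: English Paper	51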

 
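List of Figures
Figure 2-1 Schematic of the associative classifier classification process	9
Figure 2-2 CBA ranking method	15
Figure 2-3 Lazy ranking method	16
Figure 2-4 Database coverage algorithm	17
Figure 2-5 Lazy algorithm	18
Figure 3-1 Multi-level class priority flowchart	30
Figure 3-2 Test classification flowchart	35
Figure 4-1 Example Reuters document	37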

 
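List of Tables
Table 2-1 Experimental results of a multiple classifier combining AC with KNN classification	21
Table 2-2 Distribution of document counts	25
Table 3-1 Initial classification results of the Naïve Bayes classifier	34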
Table 4-1 Number of documents per category in Reuters-21578	38
Table 4-2 Numbers of training and test documents in Reuters-21578	39
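Table 4-3 Lazy classification results on Reuters-21578	42
Table 4-4 Naïve Bayes classifier classification results on Reuters-21578	42
Table 4-5 Classification results on Reuters-21578 using multiple classifiers and a single static threshold	43
Table 4-6 Best experimental results on Reuters-21578	44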
References
[1]	F. Thabtah, “A review of associative classification mining,” Knowl. Eng. Rev., vol. 22, 2007, pp. 37-65.
[2]	Hsin Yuan Chiou, “Improving the performance of Associative Classification by using the Multi-level Class Priority of Rule Ranking,” Master thesis of Tamkang University, Jun. 2010, pp. 1-52.
[3]	Mao-Sheng Hung, “Improve Document Classify Accuracy by Rule – Static threshold and Dynamic threshold Research,” Master thesis of Tamkang University, Jun. 2009, pp. 1-49.
[4]	T.M. Mitchell, Machine Learning, McGraw-Hill Science/Engineering/Math, 1997.
[5]	Y.M. Chen, “Using Association Rule to Improve The Accuracy of Text Categorization - The Combination with other Classifiers,” Master thesis of Tamkang University, Jun. 2009, pp. 1-57.
[6]	G. Salton and C. Buckley, Term Weighting Approaches in Automatic Text Retrieval, Cornell University, 1987.
[7]	B. Liu, W. Hsu, and Y. Ma, “Integrating Classification and Association Rule Mining,” Knowledge Discovery and Data Mining, 1998, pp. 80-86.
[8]	U.M. Fayyad, G. Piatetsky-Shapiro, P. Smyth, and R. Uthurusamy, eds., Advances in knowledge discovery and data mining, American Association for Artificial Intelligence, 1996.
[9]	K. Wang, S. Zhou, and Y. He, “Growing decision trees on support-less association rules,” Proceedings of the sixth ACM SIGKDD international conference on Knowledge discovery and data mining, Boston, Massachusetts, United States: ACM, 2000, pp. 265-269.
[10]	K. Wang, Y. He, and D.W. Cheung, “Mining confident rules without support requirement,” Proceedings of the tenth international conference on Information and knowledge management, Atlanta, Georgia, USA: ACM, 2001, pp. 89-96.
[11]	E. Baralis and P. Garza, “A Lazy Approach to Pruning Classification Rules,” Dec. 2002.
[12]	W. Li, J. Han, and J. Pei, “CMAR: accurate and efficient classification based on multiple class-association rules,” Data Mining, 2001. ICDM 2001, Proceedings IEEE International Conference on, 2001, pp. 369-376.
[13]	Yongwook Yoon, Gary G. Lee, Tseng, “Text Categorization Based on Boosting Association Rules,” Semantic Computing 2008 IEEE International Conference on, 2008, pp. 136-143.
[14]	M.F. Porter, “An algorithm for suffix stripping,” Readings in information retrieval, Morgan Kaufmann Publishers Inc., 1997, pp. 313-316.
[15]	Jing Chen, Zhigang Zhang, Qing Li and Xiaoming Li, 2005, “A Pattern-Based Voting Approach for Concept Discovery on the Web,” Web Technologies Research and Development-APWeb 2005, Volume 3399/2005.
[16]	http://rocling.iis.sinica.edu.tw/CKIP/
[17]	Karras, DA, 2006, “An Improved Text Categorization Methodology Based on Second and Third Order Probabilistic Feature Extraction and Neural Network Classifiers,” Lecture Notes in Computer Science, 2006, pp. 9-20.
[18]	J.R. Quinlan and R.M. Cameron-Jones, “FOIL: A Midterm Report,” in Proceedings of the European Conference on Machine Learning, vol. 667, 1993, pp. 3-20.
[19]	E. Baralis, S. Chiusano, and P. Garza, “On support thresholds in associative classification,” Proceedings of the 2004 ACM symposium on Applied computing, Nicosia, Cyprus: ACM, 2004, pp. 553-558.
[20]	R. Agrawal and R. Srikant, “Fast Algorithms for Mining Association Rules,” Proc. 20th Int. Conf. Very Large Data Bases, VLDB, J.B. Bocca, M. Jarke, and C. Zaniolo, eds., Morgan Kaufmann, 1994, pp. 487-499.
[21]	P. Soucy and G. Mineau, “A simple KNN algorithm for text categorization,” Data Mining, 2001. ICDM 2001, Proceedings IEEE International Conference on, 2001, pp. 647-648.
[22]	Y. Yang and X. Liu, “A re-examination of text categorization methods,” Proceedings of the 22nd annual international ACM SIGIR conference on Research and development in information retrieval, Berkeley, California, United States: ACM, 1999, pp. 42-49.
[23]	T. Joachims, “A Probabilistic Analysis of the Rocchio Algorithm with TFIDF for Text Categorization,” Proceedings of the Fourteenth International Conference on Machine Learning, Morgan Kaufmann Publishers Inc., 1997, pp. 143-151.
[24]	P. Bickel and E. Levina, “Some theory for Fisher's linear discriminant function, `naive Bayes', and some alternatives when there are many more variables than observations,” Bernoulli, vol. 10, 2004, pp. 989-1010.
[25]	Tseng, Yuen-Hsien, “Effectiveness Issues in Automatic Text Categorization,” Bulletin of the Library Association of China, vol. 68, Jun. 2002, pp. 62-83.
[26]	Cho-Ming Lee, “Classifying Chinese Text Documents by Association Rule,” Master thesis of Tamkang University, Jun. 2006, pp. 1-66.
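Full-text access rights
On campus
Print thesis made public 1 year after submission of the authorization form
Consent to make the electronic full text publicly available on campus
On-campus electronic thesis made public 1 year after submission of the authorization form
Off campus
Authorization granted
Off-campus electronic thesis made public 1 year after submission of the authorization form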
