§ Thesis Bibliographic Record
  
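System ID U0002-1608200920232500
DOI 10.6846/TKU.2009.00569
Title (Chinese) 利用關聯法則改善文件分類準確度-類別優先問題之探討
Title (English) Improving the Accuracy of Text Categorization by Using Association Rule with Class Priority
Title (third language)
Institution 淡江大學 (Tamkang University)
Department (Chinese) 資訊工程學系碩士在職專班
Department (English) Department of Computer Science and Information Engineering
Foreign degree: university
Foreign degree: college
Foreign degree: graduate institute
Academic year 97 (ROC calendar)
Semester 2
Year of publication 98 (ROC calendar; 2009)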
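Author (Chinese) 熊瀚升
Author (English) Han-Sheng Hsiung
Student ID 796410222
Degree Master's
Language Traditional Chinese
Second language English
Oral defense date 2009-06-17
Pages 54
Oral defense committee Advisor - 蔣定安 (chiang@cs.tku.edu.tw)
Member - 蔣定安 (chiang@cs.tku.edu.tw)
Member - 葛煥昭 (keh@cs.tku.edu.tw)
Member - 王鄭慈 (ctwang@tea.ntue.edu.tw)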
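Keywords (Chinese) 關聯式分類法 (associative classification)
規則排序 (rule ranking)
規則產生 (rule generation)
候選詞彙項目 (candidate term items)
Keywords (English) Associative Classification
Ranking
Rule Generation
Candidate Frequent Ruleitem
Keywords (third language)
Subject classification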
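Abstract (translated from the Chinese)
In general associative classification (AC), rule ranking [1][2] sorts rules first by confidence from high to low, then by support from high to low, and then by rule length from short to long. Because short rules are more general and allow more documents to be classified, they are ranked ahead of long rules.
The core of this thesis is the rule-ordering problem. Besides adopting the ranking scheme proposed by the Lazy method [3] as the general ordering principle, it adds the class priority proposed in this thesis and examines how that priority affects classification performance. TFIDF [4] and a Bayesian classifier [5] are then combined to perform a first-pass classification, and its precision and F1 values are computed. These figures are used to set a single threshold; to avoid gaps between classes, multiple thresholds are also set, one for each class. Rules are applied and classification is carried out in two ways: with static, unchanging thresholds and with dynamically adjusted thresholds.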
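The general ordering described in the first paragraph of the abstract (confidence descending, then support descending, then shorter rules first) can be sketched as a composite sort key. This is a minimal illustration, not the thesis's implementation: the `Rule` type, the `rank_rules` function, and the use of class priority as a final tie-breaker are all assumptions made here, since this record does not say how class priority enters the ordering.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """Hypothetical class-association rule: terms -> label."""
    terms: frozenset      # antecedent terms of the rule
    label: str            # predicted class
    confidence: float
    support: float


def rank_rules(rules, class_priority=None):
    """Sort rules by confidence (desc), support (desc), length (asc).

    class_priority maps a class label to a rank (0 = highest priority).
    Using it as the last tie-breaker is an assumption of this sketch.
    """
    class_priority = class_priority or {}
    lowest = len(class_priority)  # rank given to labels without a priority

    def key(rule):
        return (
            -rule.confidence,                       # higher confidence first
            -rule.support,                          # then higher support
            len(rule.terms),                        # then shorter rules
            class_priority.get(rule.label, lowest), # then class priority
        )

    return sorted(rules, key=key)
```

Calling `rank_rules(rules)` without a priority map gives the plain three-level ordering; passing a map such as `{"X": 0, "Y": 1}` only changes the order of rules that tie on all three earlier criteria.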
Abstract (English)
In general associative classification (AC), rules are ranked [1][2] first by confidence in descending order, then by support in descending order, and finally by length from short to long. Short rules are more general, so they are ranked ahead of long rules, usually so that more documents can be classified.
    This thesis focuses on the rule-ranking problem. In addition to using the ranking scheme of the Lazy method [3] as the general ordering principle, it introduces a class priority and discusses its impact on classification performance. TFIDF [4] and a Bayesian classifier [5] are combined to classify the documents a first time, and the resulting precision and F1 values are calculated. These values are used to set a single threshold; to avoid disparities between categories, multiple thresholds are also set, one per category. Rules are then applied and classification is performed in two ways: with static thresholds and with dynamically revised thresholds.
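The first-pass measurements mentioned in the abstract, per-class precision and F1, can be computed as in the sketch below. How the thesis turns these figures into thresholds is not stated in this record, so the `thresholds` helper is purely an assumption for illustration: it uses the unweighted mean as the single threshold and each class's own score as the per-class threshold. All function names are hypothetical, and single-label classification is assumed.

```python
from collections import Counter


def per_class_scores(y_true, y_pred):
    """Return precision, recall and F1 for each class (single-label case)."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative

    scores = {}
    for c in sorted(set(y_true) | set(y_pred)):
        predicted = tp[c] + fp[c]
        actual = tp[c] + fn[c]
        precision = tp[c] / predicted if predicted else 0.0
        recall = tp[c] / actual if actual else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        scores[c] = {"precision": precision, "recall": recall, "f1": f1}
    return scores


def thresholds(scores, metric="precision"):
    """Illustrative assumption: mean score as the single threshold,
    each class's own score as its per-class threshold."""
    per_class = {c: s[metric] for c, s in scores.items()}
    single = sum(per_class.values()) / len(per_class)
    return single, per_class
```

Passing `metric="f1"` instead of the default gives the F1-based variant; a dynamic scheme would recompute these values as classification proceeds, which this sketch does not attempt.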
Abstract (third language)
Table of contents
Contents
Chapter 1 Introduction
1.1 Preface
1.2 Research Motivation and Objectives
1.3 Thesis Organization
Chapter 2 Related Work
2.1 Associative Classification
2.2 Pre-processing
2.3 Rule Generation
2.4 Ranking
2.5 Pruning
2.6 Association Rule Classifier
2.7 Multiple Classifiers
2.8 TFIDF Feature Selection
2.9 Naive Bayes Classification
2.10 Evaluation Measures
Chapter 3 Research Method
3.1 Class Priority
3.2 Threshold Setting
3.3 Static Threshold
3.4 Dynamic Threshold
3.5 Execution Strategy
Chapter 4 Experimental Results
4.1 Data Source
4.2 Experimental Procedure
4.3 Experimental Results
4.3.1 Precision-based Classification Results
4.3.2 F1-based Classification Results
4.4 Analysis of Experimental Results
Chapter 5 Conclusions and Future Work
5.1 Conclusions
5.2 Future Work
References
Appendix: English Paper

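List of Figures

Figure 2-1 Classification workflow of an associative classifier
Figure 2-2 CBA ranking method
Figure 2-3 Lazy ranking method
Figure 2-4 Database coverage algorithm
Figure 2-5 Lazy algorithm
Figure 3-1 Static threshold flowchart
Figure 3-2 Dynamic threshold flowchart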
 
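List of Tables

Table 2-1 Differences between association rule mining and associative classification
Table 2-2 Experimental results of the Lazy classifier
Table 2-3 Comparison of classification precision with precision-based static and dynamic thresholds
Table 2-4 Distribution of document counts
Table 4-1 Number of articles selected from each department
Table 4-2 Format of document descriptions
Table 4-3 Documents after word segmentation
Table 4-4 Single threshold set by precision
Table 4-5 Multiple thresholds set by precision
Table 4-6 Comparison of classification precision with precision-based static and dynamic thresholds
Table 4-7 Comparison of correctly classified document counts with precision-based static and dynamic thresholds
Table 4-8 Single threshold set by F1
Table 4-9 Multiple thresholds set by F1
Table 4-10 Comparison of classification precision with F1-based static and dynamic thresholds
Table 4-11 Comparison of correctly classified document counts with F1-based static and dynamic thresholds
Table 4-12 Best experimental results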
References
[1] B. Liu, W. Hsu, and Y. Ma, "Integrating Classification and Association Rule Mining," Knowledge Discovery and Data Mining, 1998, pp. 80-86.
[2] F. Thabtah, "A review of associative classification mining," Knowl. Eng. Rev., vol. 22, 2007, pp. 37-65.
[3] E. Baralis and P. Garza, "A Lazy Approach to Pruning Classification Rules," Dec. 2002.
[4] G. Salton and C. Buckley, Term Weighting Approaches in Automatic Text Retrieval, Cornell University, 1987.
[5] T.M. Mitchell, Machine Learning, McGraw-Hill Science/Engineering/Math, 1997.
[6] Y. Yoon and G. Lee, "Text Categorization Based on Boosting Association Rules," Semantic Computing, 2008 IEEE International Conference on, 2008, pp. 136-143.
[7] M.F. Porter, "An algorithm for suffix stripping," Readings in information retrieval, Morgan Kaufmann Publishers Inc., 1997, pp. 313-316.
[8] R. Agrawal and R. Srikant, "Fast Algorithms for Mining Association Rules," Proc. 20th Int. Conf. Very Large Data Bases, VLDB, J.B. Bocca, M. Jarke, and C. Zaniolo, eds., Morgan Kaufmann, 1994, pp. 487-499.
[9] J.R. Quinlan and R.M. Cameron-Jones, "FOIL: A Midterm Report," Proceedings of the European Conference on Machine Learning, vol. 667, 1993, pp. 3-20.
[10] W. Li, J. Han, and J. Pei, "CMAR: accurate and efficient classification based on multiple class-association rules," Data Mining, 2001. ICDM 2001, Proceedings IEEE International Conference on, 2001, pp. 369-376.
[11] Y.M. Chen, "Using Association Rule to Improve The Accuracy of Text Categorization - The Combination with other Classifiers," Master thesis of Tamkang University, Jun. 2009, pp. 1-57.
[12] M. Hung, "Improve document classify accuracy by association rule-static threshold and dynamic threshold research," Master thesis of Tamkang University, Jun. 2009, pp. 1-40.
[13] Y. Yang and X. Liu, "A re-examination of text categorization methods," Proceedings of the 22nd annual international ACM SIGIR conference on Research and development in information retrieval, Berkeley, California, United States: ACM, 1999, pp. 42-49.
[14] T. Joachims, "A Probabilistic Analysis of the Rocchio Algorithm with TFIDF for Text Categorization," Proceedings of the Fourteenth International Conference on Machine Learning, Morgan Kaufmann Publishers Inc., 1997, pp. 143-151.
[15] P. Bickel and E. Levina, "Some theory for Fisher's linear discriminant function, 'naive Bayes', and some alternatives when there are many more variables than observations," Bernoulli, vol. 10, 2004, pp. 989-1010.
[16] Y.-H. Tseng, "Effectiveness Issues in Automatic Text Categorization," Bulletin of the Library Association of China, vol. 68, Jun. 2002, pp. 62-83.
[17] 國家圖書館 (National Central Library), "全國博碩士論文資訊網," http://etds.ncl.edu.tw/theabs/index.html.
[18] 中央研究院 (Academia Sinica), "中文斷詞系統," http://ckipsvr.iis.sinica.edu.tw/.
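Full-text access rights
On campus
Print thesis to be made public 1 year after submission of the authorization form
Consent granted for on-campus public access to the full electronic thesis
On-campus electronic thesis to be made public 1 year after submission of the authorization form
Off campus
Authorization granted
Off-campus electronic thesis to be made public 1 year after submission of the authorization form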
