Tamkang University Chueh Sheng Memorial Library (TKU Library)


(Electronic full text may be downloaded only from Tamkang University IP addresses)
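System ID: U0002-1508200923010800
Chinese thesis title: 利用關聯式法則改善文件分類準確度-靜態與動態門檻值問題之探討
English thesis title: Improve Document Classify Accuracy by Association Rule - Static threshold and Dynamic threshold Research
University: Tamkang University
Department (Chinese): In-service Master's Program, Department of Computer Science and Information Engineering
Department (English): Department of Computer Science and Information Engineering
Academic year: 97 (ROC calendar)
Semester: 2
Publication year: 98 (ROC calendar)
Student name (Chinese): 洪茂盛
Student name (English): Mao-Sheng Hung
Student ID: 796410172
Degree: Master's
Language: Chinese
Oral defense date: 2009-06-17
Pages: 49
Oral defense committee: Advisor - 蔣定安
Member - 王鄭慈
Member - 葛煥昭
Member - 蔣定安
Chinese keywords: 文件分類  關聯式法則  靜態門檻值  動態門檻值
English keywords: document classification  association rule  static threshold  dynamic threshold
Subject classification: Applied Sciences, Information Engineering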
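Chinese abstract (translated): When association rules are used for classification, the confidence threshold in association rule classification is usually a single, fixed confidence value chosen by rule of thumb, so the setting is rather subjective. To raise classification accuracy, a relatively high confidence is usually chosen as the threshold. If the threshold is set too high, however, some documents cannot be assigned a class for lack of applicable rules and must be placed in a default class by the default rule; if the threshold is lowered, documents may be misclassified and classification performance drops. This thesis therefore investigates the threshold problem.
The thesis examines the confidence threshold problem along two dimensions: static versus dynamic thresholds, and single versus multiple thresholds. With a dynamic threshold, the accuracy after each round of classification is checked for improvement to decide whether to revise the original threshold upward; with multiple thresholds, a different threshold can be set for each class.
The experiments set thresholds according to the different combinations, with the aim of using the results to find an objective way to set the confidence threshold and to improve classification performance.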
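The single-versus-multiple threshold idea and the default rule described above can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the `Rule` structure, the `classify` function, the sample rules, the class labels, and the threshold values are all hypothetical and chosen only to show how a per-class threshold can let a rule fire that a single high threshold would block.

```python
from typing import NamedTuple


class Rule(NamedTuple):
    terms: frozenset   # antecedent: terms that must all appear in the document
    label: str         # consequent: the predicted class
    confidence: float  # confidence of the rule on the training data


def classify(doc_terms, rules, thresholds, default_label):
    """Apply the highest-confidence matching rule that meets its class threshold.

    thresholds maps each class label to its minimum confidence. A single
    threshold is the special case where every class maps to the same value.
    """
    doc_terms = set(doc_terms)
    for rule in sorted(rules, key=lambda r: r.confidence, reverse=True):
        # Assumption: a class without an entry is never predicted by a rule
        # unless the rule has confidence 1.0.
        min_conf = thresholds.get(rule.label, 1.0)
        if rule.confidence >= min_conf and rule.terms <= doc_terms:
            return rule.label
    # No usable rule: fall back to the default rule.
    return default_label


# Hypothetical rules and thresholds, for illustration only.
rules = [
    Rule(frozenset({"network", "protocol"}), "CS", 0.9),
    Rule(frozenset({"market"}), "Business", 0.6),
]
single = {"CS": 0.8, "Business": 0.8}     # one threshold for every class
per_class = {"CS": 0.8, "Business": 0.5}  # a separate threshold per class

doc = {"market", "price"}
label_single = classify(doc, rules, single, "Other")        # rule blocked, default used
label_per_class = classify(doc, rules, per_class, "Other")  # rule passes its class threshold
label_cs = classify({"network", "protocol", "tcp"}, rules, single, "Other")
```

With the single threshold of 0.8 the only matching rule (confidence 0.6) is rejected and the document falls to the default class; lowering the threshold for that one class lets the rule apply without loosening the threshold for the others.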
English abstract: When association rules are used for classification, the confidence threshold is usually set to a single, fixed value chosen by experience, so the setting is comparatively subjective. To increase classification accuracy, a high confidence threshold is usually chosen. If the threshold is set too high, however, some documents cannot be assigned a class for lack of applicable rules; if it is set too low, classification performance may decrease.
This thesis focuses on the threshold value problem, which it discusses in two parts. The first is the static threshold: although training is quicker and simpler, accuracy that has already been improved during classification can be degraded by subsequent lower-confidence rules, that is, rules whose accuracy is lower than the original threshold setting, which may decrease classification performance. The thesis therefore proposes a dynamic threshold, which decides whether to revise the threshold upward by checking after each classification whether accuracy has improved. It also proposes an objective way to set the confidence threshold in order to improve classification performance. Experiments show that the dynamic threshold achieves better classification performance than the static threshold.
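One possible reading of the dynamic threshold is sketched below in Python. The abstract only says that the threshold is revised upward when accuracy improves after a classification pass; everything else here is an assumption made for illustration, including the rule format, the labeled evaluation set `docs`, the step size, the stopping condition, and the names `classify`, `accuracy`, and `raise_threshold`.

```python
def classify(doc_terms, rules, threshold, default_label):
    """Return the label of the best matching rule at or above the threshold."""
    doc_terms = set(doc_terms)
    # Each rule is a (terms, label, confidence) tuple (hypothetical format).
    for terms, label, conf in sorted(rules, key=lambda r: r[2], reverse=True):
        if conf >= threshold and set(terms) <= doc_terms:
            return label
    return default_label  # default rule when no rule is usable


def accuracy(docs, rules, threshold, default_label):
    """Fraction of labeled documents classified correctly."""
    hits = sum(
        classify(terms, rules, threshold, default_label) == label
        for terms, label in docs
    )
    return hits / len(docs)


def raise_threshold(docs, rules, start, step=0.1, default_label="Other"):
    """Raise the threshold step by step while accuracy keeps improving."""
    best_t = start
    best_acc = accuracy(docs, rules, start, default_label)
    t = round(start + step, 10)  # rounding avoids floating-point drift
    while t <= 1.0:
        acc = accuracy(docs, rules, t, default_label)
        if acc <= best_acc:  # assumed stopping rule: no improvement, stop
            break
        best_t, best_acc = t, acc
        t = round(t + step, 10)
    return best_t, best_acc


# Hypothetical rules and labeled documents, for illustration only.
rules = [
    ({"network"}, "CS", 0.9),
    ({"data"}, "Business", 0.55),  # low-confidence rule that misfires
    ({"market"}, "Business", 0.8),
]
docs = [
    ({"network", "data"}, "CS"),
    ({"data", "report"}, "Other"),
    ({"market", "price"}, "Business"),
]

start_acc = accuracy(docs, rules, 0.5, "Other")
best_t, best_acc = raise_threshold(docs, rules, start=0.5)
```

In this toy data the low-confidence rule misclassifies one document at the starting threshold of 0.5; raising the threshold one step removes that rule and improves accuracy, and the next step brings no further gain, so the search stops.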
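Table of contents: Chapter 1 Introduction 1
1.1 Preface 1
1.2 Research Motivation and Objectives 2
1.3 Thesis Organization 4
Chapter 2 Literature Review and Related Research 5
2.1 Associative Classification 5
2.1.1 Rule Generation 9
2.1.2 Ranking 11
2.1.3 Pruning 12
2.1.4 Association Rule Classifier 15
2.1.5 Multiple Classifiers 16
2.2 TFIDF (Term Frequency Inverse Document Frequency) 18
2.3 Naïve Bayes Classifier 19
2.4 Evaluation Measures 21
Chapter 3 Research Method 23
3.1 Threshold Setting 23
3.2 Static Threshold 24
3.3 Dynamic Threshold 26
3.4 Experimental Procedure 27
3.5 Execution Strategy 28
Chapter 4 Experimental Results 30
4.1 Data Source 30
4.2 Experimental Results 32
4.2.1 Precision-based Classification Results 32
4.2.2 F1-based Classification Results 34
4.3 Analysis of Experimental Results 36
Chapter 5 Conclusions and Future Work 37
5.1 Conclusions 37
5.2 Future Work 38
References 39
English Paper 41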


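List of Figures
Figure 2-1 Associative classification flowchart 6
Figure 2-2 CBA ranking method 11
Figure 2-3 Lazy ranking method 12
Figure 2-4 Database coverage algorithm 13
Figure 2-5 Lazy algorithm 14
Figure 3-1 Static threshold flowchart 24
Figure 3-2 Static threshold flowchart 26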

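List of Tables
Table 2-1 Differences between association rule mining and associative classifiers 6
Table 2-2 Experimental results using a single AC classifier 16
Table 2-3 Experimental results of a multiple classifier combining AC with KNN classification 17
Table 2-4 Document count distribution 21
Table 4-1 Number of articles selected from each department 30
Table 4-2 Document description format 31
Table 4-3 Single threshold set by precision 32
Table 4-4 Multiple thresholds set by precision 32
Table 4-5 Comparison of classification accuracy for static and dynamic thresholds set by precision 33
Table 4-6 Comparison of the number of correctly classified documents for static and dynamic thresholds set by precision 34
Table 4-7 Single threshold set by F1 34
Table 4-8 Multiple thresholds set by F1 34
Table 4-9 Comparison of classification accuracy for static and dynamic thresholds set by F1 35
Table 4-10 Comparison of the number of correctly classified documents for static and dynamic thresholds set by F1 35
Table 4-11 Best experimental results 36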

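Thesis usage rights
  • Agrees to grant library patrons a royalty-free license to reproduce the print copy for academic purposes; public from 2009-08-20.
  • Agrees to authorize browsing and printing of the electronic full text; public from 2009-08-20.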

