Tamkang University Chueh Sheng Memorial Library (TKU Library)
(Electronic full text can be downloaded only from Tamkang University IP addresses)
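System ID U0002-0207200715370200
Thesis title (Chinese) 以模糊相關分析為基礎之文件多重分類方法
Thesis title (English) Text Multi-Categorization Method based on Fuzzy Correlation Analysis
University Tamkang University
Department (Chinese) 資訊工程學系博士班
Department (English) Department of Computer Science and Information Engineering
Academic year 95 (ROC calendar; 2006-2007)
Semester 2
Year of publication 96 (ROC calendar; 2007)
Author name (Chinese) 闕豪恩
Author name (English) Hao-En Chueh
Student ID 890190134
Degree Doctoral
Language English
Oral defense date 2007-06-08
Number of pages 63
Oral defense committee Advisor: 林丕靜
Member: 趙榮耀
Member: 蔣定安
Member: 謝楠楨
Member: 王亦凡
Keywords (Chinese) 文件多重分類  模糊簡單相關分析  模糊半淨相關分析 
Keywords (English) Text Multi-Categorization  Fuzzy Simple Correlation Analysis  Fuzzy Semi-Partial Correlation Analysis 
Subject classification Applied Sciences, Information Engineering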
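Abstract (translated from Chinese) Text multi-categorization is the process of assigning an unlabeled document, according to its content, to one or more predefined categories. Because the content of an unlabeled document may touch on several different topics, assigning it to multiple categories is reasonable.
To assign unlabeled documents to multiple categories in a principled way, this thesis proposes a text multi-categorization method based on fuzzy correlation analysis. Fuzzy simple correlation analysis is an effective tool for determining whether a linear relationship exists between two fuzzy attributes, and the proposed method uses it to help understand how an unlabeled document relates to each predefined category. In most text categorization tasks, however, the predefined categories may themselves be correlated, so analyzing the relationship between an unlabeled document and a given predefined category must take into account the possible influence of the other categories related to that category. This thesis therefore derives another important form of fuzzy correlation analysis, called fuzzy semi-partial correlation analysis, and applies it to this situation.
The core of the proposed method is to identify, step by step during categorization, the predefined categories that offer significant explanatory power for the content of the unlabeled document. The category chosen at each step is the one with the largest fuzzy semi-partial correlation coefficient with the unlabeled document. Based on the properties of the fuzzy simple and fuzzy semi-partial correlation coefficients, together with a test of significance, the categories with the most significant positive relationship to the unlabeled document can be extracted from all predefined categories, and the document can thereby be assigned to one or more predefined categories.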
Abstract (English) Text multi-categorization is the procedure by which each unlabeled text document can be assigned to more than one appropriate category according to its content. Because the content of an unlabeled text document may involve several different issues, this kind of text categorization procedure seems reasonable.
To assign an unlabeled text document to more than one appropriate category, a novel text multi-categorization method based on fuzzy correlation analysis is proposed in this thesis. Fuzzy simple correlation analysis shows the strength and direction of the linear relationship between two fuzzy attributes, which is useful for analyzing the relationships between unlabeled text documents and the predefined categories. In a text categorization procedure, however, the predefined categories may be related to one another, and the effects of other predefined categories may influence the relationship between the observed text document and the target predefined category. Thus, fuzzy semi-partial correlation analysis, which examines the relationship between two fuzzy attributes after the influences of other fuzzy attributes are removed, is used together with fuzzy simple correlation analysis to construct a new text multi-categorization method in this dissertation.
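As a hedged illustration of the two coefficients named above, the formulas below assume that fuzzy simple correlation is computed as the sample correlation of membership grades over a sample x_1, ..., x_n, and that the first-order semi-partial coefficient takes the same form as in classical statistics. The symbols \mu_A, s_{A,B}, r_{A,B}, and r_{A(B \cdot C)} are notation chosen here for illustration; the thesis's own definitions may differ in detail.

```latex
\bar{\mu}_A = \frac{1}{n}\sum_{i=1}^{n}\mu_A(x_i), \qquad
s_{A,B} = \frac{1}{n-1}\sum_{i=1}^{n}
          \bigl(\mu_A(x_i)-\bar{\mu}_A\bigr)\bigl(\mu_B(x_i)-\bar{\mu}_B\bigr)

r_{A,B} = \frac{s_{A,B}}{\sqrt{s_{A,A}\, s_{B,B}}}

r_{A(B\cdot C)} = \frac{r_{A,B} - r_{A,C}\, r_{B,C}}{\sqrt{1 - r_{B,C}^{2}}}
```

Under these assumptions, r_{A(B \cdot C)} is the correlation between A and the part of B that C does not linearly explain. Read with A as the document, B as a candidate category, and C as an already assigned category, it measures what B contributes beyond C.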
The main concept of our proposed method is to find, step by step, the predefined categories that can significantly describe (explain) the content of an unlabeled text document. The category chosen at each step is the one with the largest fuzzy semi-partial correlation with the unlabeled text document after removing the influences of the already assigned categories. According to the properties of these fuzzy correlation coefficients, and by using the test of significance, we can find the categories with the most significant positive relationships to an unlabeled text document, and thus assign these appropriate categories to the document.
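The stepwise procedure described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the thesis's implementation: it assumes the document and each category are given as membership-grade vectors over the same term list, it computes the semi-partial correlation by correlating the document with the part of a candidate category left over after removing the already assigned categories, and it uses a fixed cutoff `min_r` as a stand-in for the test of significance. The names `assign_categories` and `min_r`, and the choice of 0.3, are hypothetical.

```python
import math


def _center(v):
    """Subtract the mean so that dot products behave like covariances."""
    m = sum(v) / len(v)
    return [x - m for x in v]


def _dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def _residual(v, basis):
    """Remove from v its projection onto each (mutually orthogonal) basis vector."""
    r = list(v)
    for b in basis:
        bb = _dot(b, b)
        if bb > 1e-12:
            coef = _dot(r, b) / bb
            r = [ri - coef * bi for ri, bi in zip(r, b)]
    return r


def _corr(u, v):
    """Correlation of two centered vectors; 0.0 if either has no variance."""
    den = math.sqrt(_dot(u, u) * _dot(v, v))
    return _dot(u, v) / den if den > 1e-12 else 0.0


def assign_categories(doc, categories, min_r=0.3):
    """Greedy stepwise selection of categories for one document.

    doc        -- membership grades of the document over the term list
    categories -- dict mapping category name to membership grades (same length)
    min_r      -- hypothetical cutoff standing in for the significance test
    Returns the chosen category names in the order they were selected.
    """
    y = _center(doc)
    remaining = {name: _center(v) for name, v in categories.items()}
    basis, chosen = [], []
    while remaining:
        best_name, best_r, best_res = None, min_r, None
        for name, x in remaining.items():
            # Part of the candidate not explained by the chosen categories.
            res = _residual(x, basis)
            # Semi-partial correlation of the document with that part.
            r = _corr(y, res)
            if r > best_r:
                best_name, best_r, best_res = name, r, res
        if best_name is None:
            break  # no remaining category clears the cutoff
        chosen.append(best_name)
        basis.append(best_res)
        del remaining[best_name]
    return chosen
```

At the first step no category has been assigned, so the semi-partial correlation reduces to the simple correlation. In a fuller version the cutoff would be replaced by a proper significance test at a chosen level α.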
Table of contents Contents Ⅰ
List of Figures Ⅲ
List of Tables Ⅳ
1 Introduction 1
1.1 Motivation of this Dissertation 1
1.2 Research Objectives of this Dissertation 4
1.3 Organization of this Dissertation 6
2 Background Knowledge of Text Categorization 7
2.1 Text Mining and Information Retrieval 7
2.2 Text Categorization 10
2.2.1 Data selection 11
2.2.2 Characteristic extraction and analysis 13
2.2.3 Similarity measurement 17
2.2.4 Categories determination 19
2.2.5 Accuracy measurement 22
3 Fuzzy Correlation Analysis 23
3.1 Simple Correlation Analysis of Fuzzy Sets 24
3.2 Partial Correlation Analysis of Fuzzy Sets 28
3.3 Semi-Partial Correlation Analysis of Fuzzy Sets 30
3.3.1 First-order semi-partial correlation analysis of fuzzy sets 31
3.3.2 Generalized semi-partial correlation analysis of fuzzy sets 34
3.3.3 Fuzzy semi-partial correlation analysis used in fuzzy prediction 38
3.4 Summary 41
4 Text Multi-Categorization Method based on Fuzzy Correlation Analysis 42
4.1 Data Preprocesses 43
4.2 Fuzzy Simple Correlation used in Text Multi-Categorization 44
4.3 Fuzzy Semi-Partial Correlation used in Text Multi-Categorization 47
5 Experiment and Results 51
6 Conclusions 57
Appendix A 58
Appendix B 59
References 60

List of Figures
Figure 2.1 Relationship between the set of relevant documents and the set of retrieved documents. 9

List of Tables
Table 2.1 Profile selections of related literatures. 11
Table 2.2 Distances between T and categories A, B, C. 19
Table 2.3 Probabilities of T belongs to categories A, B, C. 20
Table 2.4 Result of the categorization of category Ci. 22
Table 5.1 The procedure of text multi-categorization. 52
Table 5.2 Results of the experiment by using fuzzy correlation analysis. (α= 0.10.) 56
Table 5.3 Results of the experiment by using Euclidean distance. 56

References [1] T. W. Anderson, An Introduction to Multivariate Statistical Analysis, 2nd Ed. John Wiley & Sons, 1984.
[2] S. F. Arnold, Mathematical Statistics, Prentice-Hall, New Jersey, 1990.
[3] L. Douglas Baker, Andrew K. McCallum, “Distributional clustering of words for text categorization”, Proceedings of the 21st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 96-103, 1998.
[4] H. Bustince, P. Burillo, “Correlation of interval-valued intuitionistic fuzzy sets”, Fuzzy Sets and Systems, Vol. 74, 1995, pp.237-244.
[5] D. -A. Chiang, N. P. Lin, “Correlation of Fuzzy Sets”, Fuzzy Sets and Systems, Vol. 102, 1999, pp. 221-226.
[6] D. -A. Chiang, N. P. Lin, “Partial Correlation of Fuzzy Sets”, Fuzzy Sets and Systems, Vol. 110, 2000, pp. 209-215.
[7] H. -E. Chueh, Nancy P. Lin, “Fuzzy Correlation used in Text Multi-Categorization Problem”, Proceedings of Artificial Neural Network in Engineering, 2001, pp. 319-324.
[8] H. -E. Chueh, “Fuzzy Correlation used in Text Multi-Categorization Problem”, Master Degree Thesis, Department of Computer Science and Information Engineering, Tamkang University, Taipei, 2001.
[9] S. Dowdy, S. Wearden, Statistics for Research, John Wiley & Sons, 1983.
[10] M. H. Dunham, Data mining: Introductory and Advanced Topics, Pearson Education, Inc., 2003.
[11] T. Gerstenkorn, J. Manko, “Correlation of intuitionistic fuzzy sets”, Fuzzy Sets and Systems, Vol. 44, 1991, pp.39-43.
[12] J. Han, M. Kamber, Data mining: Concepts and Techniques, Academic Press, 2001.
[13] D. H. Hong, S. Y. Hwang, “Correlation of intuitionistic fuzzy sets in probability spaces”, Fuzzy Sets and Systems, Vol. 75, 1995, pp.77-81.
[14] Http://etds.ncl.edu.tw/theabs/english_site/search_simple_eng.jsp
[15] M. Iwayama, T. Tokunaga, “Cluster-based text categorization: a comparison of category search strategies”, Proceedings of the 18th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 273-281, 1995.
[16] T. C. Jo, “Text categorization with the concept of fuzzy set of informative keywords”, IEEE International Fuzzy Systems Conference Proceedings, Vol. 2, 1999, pp.609-614.
[17] T. Joachims, “Text Categorization with Support Vector Machines: Learning with Many Relevant Features”, Proceedings of the European Conference on Machine Learning, 1998.
[18] G. J. Klir, T. A. Folger, Fuzzy Sets, Uncertainty, and Information, Prentice-Hall International, Inc., 1988.
[19] G. J. Klir, B. Yuan, Fuzzy Sets and Fuzzy Logic, Theory and Applications, Prentice-Hall International, Inc., 1995.
[20] D. Koller, M. Sahami, “Hierarchically classifying documents using very few words”, Proceedings of the Fourteenth International Conference on Machine Learning, pp. 170-178, 1997.
[21] W. Lam, C.Y. Ho, “Using a generalized instance set for automatic text categorization”, Proceedings of the 21st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp.81-89, 1998.
[22] D.D. Lewis, M. Ringuette, “Comparison of two learning algorithms for text categorization”, Proceedings of the Third Annual Symposium on Document Analysis and Information Retrieval, 1994.
[23] N. P. Lin, H. -E. Chueh, “Fuzzy Semi-Partial Correlation Analysis”, WSEAS TRANSACTIONS on COMPUTERS, Issue 12, Vol. 5, 2006, pp. 2970-2976.
[24] N. P. Lin, H. -E. Chueh, “Text Multi-Categorization Based on Fuzzy Correlation Analysis”, WSEAS TRANSACTIONS on SYSTEMS, Issue 2, Vol. 6, 2007, pp. 273-278.
[25] B. Masand, G. Linoff, D. Waltz, “Classifying news stories using memory based reasoning”, Proceedings of the 15th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 59-64, 1992.
[26] A. McCallum, K. Nigam, “A comparison of event models for Naive Bayes text classification”, Proceedings of the AAAI-98 Workshop on Learning for Text Categorization, 1998.
[27] T. Mitchell, Machine Learning, McGraw Hill, 1996.
[28] N. -L. Taso, “The investigation of fuzzy document classification on Internet,” Master Degree Thesis, Department of Computer Science and Information Engineering, Tamkang University, Taipei, 2000.
[29] C. J. van Rijsbergen, Information Retrieval, Butterworths, London, 1979.
[30] Y. Yang, “Expert network: Effective and efficient learning from human decisions in text categorization and retrieval”, Proceedings of the 17th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp.13-22, 1994.
[31] Y. Yang, J.P. Pedersen, “Feature selection in statistical learning of text categorization”, Proceedings of the Fourteenth International Conference on Machine Learning, pp.412-420, 1997.
[32] Y. Yang, Xin Liu, “A re-examination of text categorization methods”, Proceedings of the 22nd Annual International ACM SIGIR conference, 1999, pp.42-49.
[33] Y. Yang, “An evaluation of statistical approaches to text categorization”, Journal of Information Retrieval, 1999.
[34] C. Yu, “Correlation of fuzzy numbers”, Fuzzy Sets and Systems, Vol. 55, 1993, pp.303-307.
[35] L. A. Zadeh, “Fuzzy sets”, Information and Control, Vol. 8, 1965, pp. 338-353.
[36] H. -J. Zimmermann, Fuzzy Set Theory and Its Applications, 2nd Edition, Kluwer Academic Publishers, 1991.
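Thesis usage permissions
  • The author grants library patrons a royalty-free license to reproduce the print copy for academic purposes, effective 2008-07-06.
  • The author authorizes browsing and printing of the electronic full text, available from 2008-07-06.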