Electronic Theses and Dissertations Service

§ Browse Thesis Bibliographic Record

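The electronic full text of this thesis has been available for public use off campus since 2010-03-01
The print copy of this thesis has been available for public use since 2011-03-01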

System ID	U0002-1902201011191500
DOI	10.6846/TKU.2010.00505
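Title (Chinese)	以關鍵字序列型樣探勘為基礎之文件檢索方法
Title (English)	Document retrieval based on mining keywords sequential patterns
Title (third language)
Institution	Tamkang University
Department (Chinese)	資訊工程學系碩士班
Department (English)	Department of Computer Science and Information Engineering
Foreign degree school
Foreign degree college
Foreign degree institute
Academic year	98 (ROC calendar)
Semester	1
Publication year	99 (ROC calendar, i.e. 2010)
Author (Chinese)	吳建興
Author (English)	Chien-Hsin Wu
Student ID	696411742
Degree	Master's
Language	Traditional Chinese
Second language	English
Oral defense date	2010-01-15
Pages	67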
Committee	Advisor: 林丕靜 (nancylin@mail.tku.edu.tw); Member: 謝楠楨 (nchsieh@ntcn.edu.tw); Member: 蔣定安 (chiang@cs.tku.edu.tw); Member: 林丕靜 (nancylin@mail.tku.edu.tw)
Keywords (Chinese)	資訊檢索; 資料探勘; 序列型樣
Keywords (English)	Information Retrieval; Data Mining; Sequential Patterns
Keywords (third language)
Subject classification
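Abstract (translated from Chinese)	In this age of information explosion, internet users can usually retrieve a large amount of related information quickly with today's search tools. However, because of poor precision and various factors that impair the retrieval system's judgment, users often retrieve far too much information, much of which is far from what they expected. A large volume of related information can be gathered in a short time, yet the search does not achieve the efficiency it should. To address this problem, this study takes "a document retrieval method based on keyword sequential pattern mining" as its purpose and topic. The main reason users so often retrieve too much information is that the retrieval system misjudges where the keywords should be segmented, which leads it to return wrong information. The proposed method first filters by the order in which the keywords were entered, reducing the number of websites that do not match the user's keywords. It then applies sequential pattern mining to strengthen and infer which documents the user needs, and uses that result to compute a ranking, making it easier for the user to find documents. Experiments show that, compared with the results returned by most current search sites, the method does retrieve more concise and more accurate information.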
Abstract (English)	In this age of information explosion, internet users can quickly retrieve a large amount of relevant information with today's search tools, but poor precision and various factors that affect the retrieval system's judgment often cause them to retrieve too much information. To address this problem, this study proposes “Document retrieval based on mining keywords sequential patterns” as its purpose and topic. The retrieved information is often far from the user's expectations: a great deal of information can be gathered in a short time, yet the search still does not reach the efficiency it should. The method first uses the order in which the user enters the keywords to filter out websites that do not match, which reduces the result set. It then uses sequential pattern mining to strengthen and infer which documents the user wants, and ranks documents according to that result, which speeds up the user's search. Experiments show that, compared with the results of most search sites, the method retrieves more concise and more accurate information.
Abstract (third language)
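Table of contents
  Chapter 1 Introduction
    1.1 Background
    1.2 Research Motivation
    1.3 Research Methods and Steps
    1.4 Thesis Outline
  Chapter 2 Literature Review and Related Work
    2.1 Information Retrieval
      2.1.1 Boolean Model
      2.1.2 Vector Space Model
      2.1.3 Document Retrieval
    2.2 Algorithms Used in Current Web Page Ranking
      2.2.1 PageRank Algorithm
      2.2.2 Advantages and Disadvantages of PageRank
      2.2.3 Timed PageRank Algorithm
    2.3 Data Mining
      2.3.1 Data Mining
      2.3.2 Sequential Pattern Mining
  Chapter 3 Document Retrieval Method Based on Keyword Sequential Pattern Mining
    3.1 Model Architecture
    3.2 Data Preprocessing
    3.3 Filtering Documents by Keyword Order
    3.4 Sequential Pattern Mining
    3.5 Ranking Documents by Score
    3.6 Selecting the Keywords the User Needs
  Chapter 4 Experimental Results
    4.1 Source of Experimental Data
    4.2 Experimental Procedure
    4.3 Evaluation of Retrieval Performance
    4.4 Experimental Results
  Chapter 5 Conclusions and Future Work
    5.1 Conclusions
    5.2 Future Work
  References
  Appendix: English Paper
  List of Figures
    Figure 2-1 Vector space model
    Figure 2-2 Data mining
    Figure 3-1 Flowchart of the article search for keywords entered by the user
    Figure 3-2 Flowchart for filtering documents with the correct keyword order
    Figure 3-3 Flowchart of keyword sequential pattern mining
  List of Tables
    Table 2-1 Comparison of similarity formulas
    Table 4-1 Data preprocessing
    Table 4-2 Partial data of C1, support count=15
    Table 4-3 Partial data of C2, support count=6
    Table 4-4 Data of C3
    Table 4-5 n, m, SL, Total.gaps, and score values for some articles
    Table 4-6 Priority value (Score) and sequence length of each article
    Table 4-7 Filtering at each stage of a query request
    Table 4-8 Experimental results
  List of Equations
    Equation (1)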
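Full-text access rights	On campus: print thesis made public 1 year after the authorization form is submitted; electronic full text authorized for public access on campus; on-campus electronic thesis public immediately. Off campus: authorization granted; off-campus electronic thesis public immediately.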
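
Illustrative sketch (not taken from the thesis): the abstract above describes a first step that filters documents by the order in which the user entered the keywords, followed by sequential pattern mining over the remaining documents. The Python below shows one way such a pipeline could begin, under stated assumptions: documents are already tokenized into word lists, "in order" is read as an ordered (not necessarily contiguous) subsequence, and support counting is limited to ordered keyword pairs. The function names, the sample data, and these interpretations are hypothetical. The thesis's actual Score formula is not given in this record and is not reproduced here.

```python
from collections import Counter


def keyword_positions_in_order(tokens, keywords):
    """Return the positions of the keywords if they occur in the given
    order (as a subsequence of tokens); return None otherwise."""
    positions = []
    start = 0
    for keyword in keywords:
        try:
            index = tokens.index(keyword, start)  # earliest match at or after start
        except ValueError:
            return None
        positions.append(index)
        start = index + 1
    return positions


def filter_by_keyword_order(docs, keywords):
    """Keep only documents that contain the keywords in the entered order."""
    return {
        doc_id: tokens
        for doc_id, tokens in docs.items()
        if keyword_positions_in_order(tokens, keywords) is not None
    }


def ordered_pair_support(docs, vocabulary):
    """For each ordered pair (a, b) of vocabulary terms, count how many
    documents contain a somewhere before b (assumed notion of support)."""
    support = Counter()
    for tokens in docs.values():
        pairs_in_doc = set()
        for i, first in enumerate(tokens):
            if first not in vocabulary:
                continue
            for second in tokens[i + 1:]:
                if second in vocabulary and second != first:
                    pairs_in_doc.add((first, second))
        support.update(pairs_in_doc)  # each document counts once per pair
    return support


# Hypothetical sample data, already tokenized.
docs = {
    "d1": ["data", "mining", "for", "document", "retrieval"],
    "d2": ["retrieval", "of", "mining", "data"],
    "d3": ["data", "about", "mining", "sequential", "patterns"],
}

kept = filter_by_keyword_order(docs, ["data", "mining"])
support = ordered_pair_support(kept, {"data", "mining", "retrieval", "patterns"})
print(sorted(kept))                 # ['d1', 'd3']
print(support[("data", "mining")])  # 2
```

With the sample data, only d1 and d3 contain "data" before "mining", so d2 is dropped before any support is counted. A ranking step would then need a scoring rule, which this sketch deliberately leaves out.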
