Tamkang University Chueh Sheng Memorial Library (TKU Library)
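System ID U0002-1702200817324100
Title (Chinese) 利用派翠網路來協助網站使用者習性探勘
Title (English) Using Petri Nets to Enhance Web Usage Mining
University Tamkang University (淡江大學)
Department (Chinese) 資訊工程學系博士班 (Ph.D. Program, Department of Computer Science and Information Engineering)
Department (English) Department of Computer Science and Information Engineering
Academic year 96 (ROC calendar; 2007-2008)
Semester 1
Publication year 97 (ROC calendar; 2008)
Author (Chinese name) 楊士央
Author (English name) Shih-Yang Yang
Student ID 888190021
Degree Ph.D.
Language English
Oral defense date 2008-01-17
Pages 73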
Oral examination committee Advisor: 陳伯榮
Member: 趙景明
Member: 陳省隆
Member: 莊博任
Member: 伍麗樵
Member: 施國琛
Member: 陳伯榮
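Keywords (Chinese) 網站使用者習性探勘 (Web usage mining), 派翠網路 (Petri nets), 資料前置處理 (data preprocessing)
Keywords (English) Web Usage Mining, Petri Nets, Data Preprocessing
Subject classification Applied Sciences: Information Engineering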
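Abstract (Chinese, translated) In web usage mining, a correct analysis of the website structure not only assists data preprocessing but also improves the accuracy of the mining results.
Petri nets are a widely used high-level graphical model. A Petri net can store the results and properties of model analysis in an incidence matrix for further analysis. In addition, some of its thoroughly verified and well-known properties can help researchers solve the problems they face.
In this thesis, we propose using Petri nets as a model for analyzing the page structure of a website. Places in the Petri net model represent the website's webpages, and transitions represent its hyperlinks. We discuss how the incidence matrix obtained (or generated) by analyzing the website structure can assist pageview identification during data preprocessing, and how the reachability property of the Petri net model can assist path completion during data preprocessing. Furthermore, we apply the Markov property of the Petri net model, using the pageview incidence matrix produced during website structure analysis, to analyze users' browsing behavior.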
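The Markov analysis of browsing behavior can be illustrated with a first-order Markov chain over pageviews. This is a minimal sketch under stated assumptions, not the thesis's algorithm: the thesis derives its pageview state matrix from the PN structure analysis, whereas here transition probabilities are estimated directly from counts of consecutive pageviews in already-preprocessed sessions, and pageviews with no observed outgoing move are treated as absorbing. The function names `transition_probabilities` and `step` and the sample sessions are hypothetical.

```python
from collections import defaultdict


def transition_probabilities(sessions):
    """Estimate first-order Markov transition probabilities between pageviews.

    sessions: lists of pageview IDs in request order (assumed already cleaned,
    sessionized, and pageview-identified). Returns {src: {dst: P(dst | src)}}.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for session in sessions:
        for src, dst in zip(session, session[1:]):
            counts[src][dst] += 1
    probs = {}
    for src, row in counts.items():
        total = sum(row.values())
        probs[src] = {dst: n / total for dst, n in row.items()}
    return probs


def step(distribution, probs):
    """Advance a probability distribution over pageviews by one navigation step.

    Pageviews with no observed outgoing move keep their probability mass,
    i.e. they are treated as absorbing (an assumption of this sketch).
    """
    nxt = defaultdict(float)
    for page, p in distribution.items():
        row = probs.get(page)
        if not row:
            nxt[page] += p
            continue
        for dst, q in row.items():
            nxt[dst] += p * q
    return dict(nxt)


sessions = [["A", "B", "C"], ["A", "B", "D"], ["A", "C"]]
P = transition_probabilities(sessions)
d1 = step({"A": 1.0}, P)  # where a visitor starting at A is after one click
d2 = step(d1, P)          # and after two clicks
```

Here `d2` gives C about 2/3 and D about 1/3: two thirds of visitors starting at A reach C either directly or via B. Iterating `step` further converges once all mass sits in absorbing pageviews.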
Abstract (English) In web usage mining, precise analysis of the website structure can facilitate data preprocessing and improve the accuracy of the mining results.
Petri nets (PN) are a high-level graphical model widely used to model system activities that involve concurrency. A PN model can store its analysis results in an incidence matrix for follow-up analyses, and well-established PN properties, such as reachability, can also be used to solve open problems in the modeled system.
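As a concrete illustration of storing web structure in an incidence matrix, here is a minimal sketch. It assumes the mapping this thesis adopts (places stand for webpages, transitions for hyperlinks) and Murata's convention that rows index transitions and columns index places, so firing transition t changes a marking M by the state equation M' = M + A^T u, where u is the unit vector for t. The helper names `incidence_matrix` and `fire` and the three-page site are hypothetical. The negative-entry test in `fire` is an exact enabling check only for nets without self-loops; a self-link produces an all-zero row that the incidence matrix cannot distinguish from "no link".

```python
def incidence_matrix(pages, links):
    """Build a PN incidence matrix for a website.

    Places are webpages and transitions are hyperlinks (src, dst).
    Rows are transitions and columns are places:
    A[t][p] = +1 if hyperlink t leads into page p, -1 if it leaves page p.
    A self-link (src == dst) yields an all-zero row.
    """
    column = {page: j for j, page in enumerate(pages)}
    matrix = []
    for src, dst in links:
        row = [0] * len(pages)
        row[column[src]] -= 1  # input arc: place src -> transition
        row[column[dst]] += 1  # output arc: transition -> place dst
        matrix.append(row)
    return matrix


def fire(marking, matrix, t):
    """Apply the state equation M' = M + A^T u for one firing of transition t.

    Since u is the unit vector for t, A^T u is simply row t of A.
    A negative entry means t was not enabled (its source page had no token).
    """
    new_marking = [m + a for m, a in zip(marking, matrix[t])]
    if any(v < 0 for v in new_marking):
        raise ValueError(f"transition {t} is not enabled in marking {marking}")
    return new_marking


pages = ["index", "news", "about"]
links = [("index", "news"), ("index", "about"), ("news", "index")]
A = incidence_matrix(pages, links)
# A == [[-1, 1, 0], [-1, 0, 1], [1, -1, 0]]
m0 = [1, 0, 0]       # one token: the visitor is on "index"
m1 = fire(m0, A, 0)  # follow the hyperlink index -> news
```

Reading a column of `A` lists every hyperlink that enters (+1) or leaves (-1) a page. This kind of per-page structural information is what the abstract says is used to support pageview identification.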
In this study, we propose PN as a model of Web structure. Places in the PN model represent the webpages of a website, and transitions represent hyperlinks. Through this model, we conduct Web structure analysis. We then use the structural information held in the incidence matrix, together with the reachability property of the PN model, to support pageview identification and path completion in the data preprocessing phase. In addition, the Web structure analysis generates a pageview state matrix, to which we apply Markov analysis to study user browsing behavior in the pattern discovery phase.
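Path completion can be sketched with a graph search. Assume every transition has exactly one input place and one output place (one hyperlink from one page to another) and a single token marks the visitor's current page. Under that assumption, a marking is reachable exactly when the corresponding page is reachable in the hyperlink graph, so breadth-first search answers the reachability question. Inserting the pages on the shortest hyperlink path is a simplification of ours, not necessarily the thesis's path completion algorithm. A common alternative heuristic assumes the user navigated back through pages already in the session. The names `shortest_link_path` and `complete_path` and the sample site are hypothetical.

```python
from collections import deque


def shortest_link_path(links, start, goal):
    """Breadth-first search over the hyperlink graph.

    Returns the shortest list of pages from start to goal (inclusive),
    or None if goal is not reachable from start.
    """
    graph = {}
    for src, dst in links:
        graph.setdefault(src, []).append(dst)
    previous = {start: None}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        if page == goal:
            path = []
            while page is not None:
                path.append(page)
                page = previous[page]
            return path[::-1]
        for nxt in graph.get(page, []):
            if nxt not in previous:
                previous[nxt] = page
                queue.append(nxt)
    return None


def complete_path(session, links):
    """Fill gaps between consecutive requests that are not directly hyperlinked.

    Pages on the shortest hyperlink path are inserted (a simplifying
    assumption); if the next page is unreachable, the gap is left as-is.
    """
    if not session:
        return []
    linked = set(links)
    completed = [session[0]]
    for nxt in session[1:]:
        cur = completed[-1]
        if (cur, nxt) not in linked:
            bridge = shortest_link_path(links, cur, nxt)
            if bridge:
                completed.extend(bridge[1:-1])  # only the missing middle pages
        completed.append(nxt)
    return completed


links = [("A", "B"), ("B", "C"), ("C", "D"), ("A", "E")]
print(complete_path(["A", "C", "D"], links))  # ['A', 'B', 'C', 'D']
```

In this example, the log shows A followed by C, but the site has no direct link from A to C. The sketch therefore infers the missing request for B, because B is the only page through which C is reachable from A.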
Table of Contents
List of Figures
List of Tables
Chapter 1 Introduction
Chapter 2 Background Knowledge
2.1 Web Usage Mining
2.1.1 Input
2.1.2 Preprocessing
2.1.3 Pattern Discovery
2.1.4 Pattern Analysis
2.2 Petri Nets
2.2.1 The Definition of PN
2.2.2 Reachability
2.2.3 Markov Chains
2.2.4 Related Research on PN in Web Services
Chapter 3 Modeling a Website Structure with PN
3.1 Using the PN Model to Represent a Website
3.2 Parsing Algorithm
3.3 Example of Parsing a Website
Chapter 4 Using PN Model to Enhance Data Preprocessing
4.1 Data Preprocessing
4.2 Pageview Identification
4.2.1 Using Incidence Matrix to Assist Pageview Identification
4.2.2 Pageview Identification Algorithm
4.2.3 Example of Pageview Identification
4.3 Path Completion
4.3.1 Using Reachability to Assist Path Completion
4.3.2 Path Completion Algorithm
4.3.3 Example of Path Completion
Chapter 5 Markov Analysis for PN Web Structure Model
5.1 Using Markov Chains to Analyze User Behavior
5.2 Markov Analysis Algorithm
5.3 Example of Markov Analysis
Chapter 6 Design and Implementation for System Architecture
6.1 Use Case
6.2 Class Diagram
6.3 Sequence Diagram
Chapter 7 Conclusions and Future Research
References

List of Figures
Figure 1-1 PN Based Web Usage Mining Structure
Figure 2-1 Web Usage Mining Process
Figure 3-1 The Parsing Algorithm
Figure 3-2 The incidence matrix representing the webpage structure shown in Table 3.1
Figure 3-3 The Petri net corresponding to the website of Table 3.1
Figure 3-4 Pageview State Matrix
Figure 4-1 The Component Diagram of Pageview Identification and Path Completion
Figure 4-2 Algorithm of Pageview Identification
Figure 4-3 Algorithm of Path Completion
Figure 4-4 The State Equation of Path Completion
Figure 5-1 Markov Chain Analysis
Figure 5-2 Pageview State Matrix after Adding Pageview Z
Figure 5-3 Pageview State Matrix with Frequency of each Pageview
Figure 6-1 Use Case Diagram of System
Figure 6-2 Use Case Diagram of Data Preprocessing
Figure 6-3 Class Diagram of Modeling Website Structure
Figure 6-4 Class Diagram of Data Preprocessing
Figure 6-5 Sequence Diagram of Data Preprocessing

List of Tables
3.1 A Website Example
3.2 The Execution of Main Loop in the Parsing Algorithm
3.3 The Place Number and its Corresponding Webpage
3.4 The Transition Number and its Corresponding Hyperlink
3.5 Corresponding Pageview ID Table of Figure 7-3
4-1 A User Session before Pageview Identification
4-2 A User Session after Pageview Identification
5-1 Probability of each Pageview
6-1 Use Case Description of Modeling Website Structure
6-2 Use Case Description of Data Preprocessing
6-3 Use Case Description of Analyzing Website Structure
6-4 Use Case Description of Data Cleaning
6-5 Use Case Description of Session Identification
6-6 Use Case Description of Pageview Identification
6-7 Use Case Description of Path Completion
6-8 Data Format of User Log File
6-9 Data Dictionary of User Log File
6-10 Operation Description of Data Cleaning
6-11 Operation Description of Session Identification
6-12 Operation Description of Pageview Identification
6-13 Operation Description of Path Completion


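Thesis access rights
  • Print copy: royalty-free permission granted to in-library readers to reproduce it for academic purposes; publicly available from 2013-02-25.
  • Electronic full text: the author does not authorize the online browsing/printing service.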

