淡江大學覺生紀念圖書館 (TKU Library)
進階搜尋


  查詢圖書館館藏目錄
系統識別號 U0002-0104201113455200
中文論文名稱 使用隨機派翠網路之線上網頁推薦系統
英文論文名稱 Online Web Recommendation System by Using Stochastic Timed Petri Nets
校院名稱 淡江大學
系所名稱(中) 資訊工程學系博士班
系所名稱(英) Department of Computer Science and Information Engineering
學年度 99
學期 1
出版年 100
研究生中文姓名 孫初豪
研究生英文姓名 Chu-Hao Sun
學號 894190320
學位類別 博士
語文別 英文
口試日期 2011-01-13
論文頁數 118頁
口試委員 指導教授-陳伯榮
委員-陳良弼
委員-陳省隆
委員-趙景明
委員-莊博任
委員-陳伯榮
中文關鍵字 網頁推薦系統  網頁使用習性探勘  資料串流  隨機過程時間派翠網路  網頁結構特性  衡量網站的基準 
英文關鍵字 Web usage mining  Data stream mining  Stochastic Timed Petri Nets  Markov model  Web graph properties  Web metrics 
學科別分類 學科別應用科學資訊工程
中文摘要 網頁推薦系統為網頁使用習性探勘典型的應用,而網頁推薦系統在結構上可分成離線(off-line)與線上(online)兩個元件,離線元件主要根據分析過去歷史使用者習性資料(usage profile)來建立知識,並提供給線上元件使用。由於一般網頁推薦系統大多使用離線之資料前置處理,而執行資料探勘時也沒有時間限制,故此方法並不適合即時的動態環境,因此,我們需要高效能之線上網站使用習性探勘技術來解決這些問題。
我們在本文中提出以隨機過程時間派翠網路為基礎之線上網頁推薦系統,網頁推薦處理程序可分為資料準備、模式發掘和推薦三個階段。在資料準備階段,我們利用隨機過程時間派翠網路作為分析網站網頁架構的模型,並加入衡量網站基準分析網頁結構特性來增進網頁資訊的擷取。而分析網站架構後所得到的關聯矩陣與派翠網路模型之可到達行為特性可協助進行資料前置處理中的網頁內容範圍辨識和路徑填補。在模式發掘階段,我們應用流覽圖(navigational graph)和可到達圖(reachability graph)來作為使用者習性資料的模型。我們利用資料串流探勘技術維護在滑動視窗(sliding-window)上的流覽圖,並應用關聯矩陣與派翠網路模型之可到達行為特性進行可到達圖的建立。在推薦階段,我們使用馬可夫移轉機率和穩定狀態機率來協助預測使用者網頁瀏覽的行為,推薦引擎可藉由模式發掘階段所產生之流覽圖和可到達圖來協助完成動態網頁的推薦。而我們結合以代理人為基礎和事件驅動非同步通知的架構來達到線上即時的資料準備、模式發掘和推薦三個階段。
英文摘要 Web recommendation systems are typical applications of Web usage mining. The Web recommendation system is structured according to an online and an off-line component. The off-line component is aimed at building the knowledge base by analyzing past usage profiles that is then used in the online component. The general Web recommendation system mainly uses offline data preprocessing and the mining process is not time-limited. However, this approach is not suitable in real-time dynamic environments. Therefore, we need high-performance online Web usage mining techniques to provide solutions to these problems.
In this paper, we propose an online Web recommendation system using STPN. The Web recommendation process consists of the data preparation phase, pattern discovery phase, and recommendation phase. In data preparation phase, we use STPN to model the Web structure, and also apply the Web metrics of Web graph properties to analyze the Web structure in improving the Web access. We simultaneously employ the Web structure analysis information in the incidence matrix and the reachability properties, obtained from the STPN model, to help proceed with pageview identification and path completion at the data preprocessing. In pattern discovery phase, the navigational graph and reachability graph are employed to model the usage profiles. We use the data stream mining technique to incrementally maintain the navigational graph over a sliding-window. The STPN’s reachability behavior characteristic and incidence matrix are applied to construct the reachability graph. In recommendation phase, we use the transition probability and steady-state probability in Markov model to predict the user’s navigational behavior. The navigational graph and reachability graph are used by the Web recommendation engine to generate the online dynamic Web recommendation. We combine the features of both the agent-based architecture and event-driven asynchronous notification architecture to achieve the online data preparation, pattern discovery, and recommendation.
論文目次 Contents
Chapter 1 Introduction 1
Chapter 2 Background and Related Works 6
2.1 Stochastic Timed Petri Nets (STPN) 6
2.1.1 STPN Reachability Behavior Characteristic 7
2.2 Markov Chains 9
2.3 Web Graph 10
2.3.1 Centrality 11
2.3.2 Global Metrics 12
2.3.3 Local Metrics 14
2.4 WS-Notification 15
2.5 Related Works 17
Chapter 3 System Overview 19
3.1 System Architecture 19
3.2 Component Description 21
Chapter 4 Online Change Detection and processing 24
4.1 Web Server Log File 24
4.2 File Monitoring Agent 26
4.2.1 Data cleaning 30
4.3 Notification broker 31
Chapter 5 Using STPN to Model and Analysis the Web Structure 34
5.1 Modeling a Website Structure with STPN 34
5.2 Web Structure Metrics 39
5.2.1 Centrality 42
5.2.2 Global Metrics 48
5.2.3 Local Metrics 50
5.3 Incremental Web Structure Adapting 51
5.3.1 Adding a Link 51
5.3.2 Adding a Webpage 53
5.3.3 Deleting a link 56
5.3.4 Deleting a Webpage 58
Chapter 6 Using STPN to Enhance Web Usage Data Preprocessing 62
6.1 User Identification 66
6.2 Session Identification 67
6.3 Pageview Identification 68
6.4 Path Completion 74
6.5 Data Transformation 78
Chapter 7 Online Web Recommendation 80
7.1 User Profiles 80
7.1.1 Navigational Graph 81
7.1.2 Reachability Graph 82
7.2 Probabilistic Model 83
7.3 Incremental Maintaining Navigational Graph and Reachability Graph 89
7.4 Web Recommendation 103
Chapter 8 Conclusions and Future Research 108
Reference 111

List of Figures
Figure 1: The Web usage mining process 1
Figure 2: General architecture of a typical Web recommendation system 2
Figure 3: The compactness of global metrics 13
Figure 4: The stratum of global metrics 14
Figure 5: A typical WS-Notification interaction 16
Figure 6: A typical brokered WS-Notification interaction 16
Figure 7: Sample topic space 16
Figure 8: Sample WS-topic topic space definition 17
Figure 9: The system architecture and major components 20
Figure 10: Example Web log file in the common log format 25
Figure 11: The file monitoring agent component 26
Figure 12: Example of the event message for Web program 27
Figure 13: Example of the event message for Web log 28
Figure 14: The file monitoring agent algorithm 29
Figure 15: The data cleaning algorithm 31
Figure 16: The notification broker component 32
Figure 17: The interaction diagram of the message exchange 33
Figure 18: The Web structure of Table 1 37
Figure 19: The Stochastic Timed Petri Nets corresponding to the Web structure of Table 2 38
Figure 20: The flow chart of calculating Web metrics algorithm 40
Figure 21: Convert the Incidence matrix to the converted distance matrix algorithm 41
Figure 22: The Web centrality metrics algorithm 43
Figure 23: The Web structure of Table 3 45
Figure 24: The cross-linked tree structure after adjusting 48
Figure 25: The cross-linked tree structure and depth vector, child vector after adjusting 50
Figure 26: The adding a link algorithm 53
Figure 27: The adding a Webpage algorithm 55
Figure 28: The deleting a link algorithm 57
Figure 29: The deleting a Webpage algorithm 60
Figure 30: The Web usage data preprocessing process 63
Figure 31: The component diagram of pageview identification and path completion 64
Figure 32: The Web usage data preprocessing using STPN algorithm 65
Figure 33: The data schema of the User, Session, and Pageviews 66
Figure 34: The user identification algorithm 67
Figure 35: The session identification algorithm 68
Figure 36: Web structure and STPN structure 70
Figure 37: The STPN pageview identification algorithm 71
Figure 38: The STPN path completion algorithm 76
Figure 39: The state equation of path complete 78
Figure 40: The navigational graph 82
Figure 41: The reachability graph 83
Figure 42: The calculate stead-state probability matrix algorithm 87
Figure 43: The flow chart of adding session algorithm 89
Figure 44: The adding session algorithm 90
Figure 45: The add child node algorithm 92
Figure 46: The navigational graph over a sliding-window (size=6) 94
Figure 47: The remove session algorithm 96
Figure 48: The navigational graph over a sliding-window (size=6) after adding the new session S7 98
Figure 49: The adding session to reachability graph algorithm 100
Figure 50: The reachability graph after adding the session S7 101
Figure 51: The remove Web page algorithm 103
Figure 52: The flow chart of Web recommendation algorithm 104
Figure 53: The Web recommendation algorithm 105
Figure 54: The Web recommendation of the active session: b 106
Figure 55: The Web recommendation of the active session: b->f 107

List of Tables
Table 1: The hyperlink content of a Web structure 36
Table 2: The incidence matrix represents a Web structure shown in Figure 15 37
Table 3: The Web structure example 44
Table 4: The places and the corresponding Web pages 45
Table 5: The transitions and the corresponding hyperlinks’ labels 45
Table 6: Incidence matrix [Aij]7x8 which denotes the Web structure 46
Table 7: The initial matrix [Cij]7x7 which denotes the Web structure 46
Table 8: The converted distance matrix [Cij]7x7 & the value of COD, CID, ROC, RIC 46
Table 9: The incidence matrix [Aij]7x9 after adding g link to e 47
Table 10: The converted distance matrix [Cij]7x7 & the value of COD, CID, ROC, RIC after adjusting 48
Table 11: The converted distance matrix [Dij]7x7 & stratum related metrics after adjusting 49
Table 12: The transitions and the corresponding hyperlinks’ labels after adding the link: g to e 53
Table 13: The incidence matrix after adding a new page h 53
Table 14: The places and the corresponding Web pages after adding a new page h 55
Table 15: The transitions and the corresponding hyperlinks’ labels after adding the link: h to e 55
Table 16: The incidence matrix after adding Web page h and the link: h to e 55
Table 17: The transitions and the corresponding hyperlinks’ labels after deleting the link: e to a 58
Table 18: The incidence matrix after deleting the link: e to a 58
Table 19: The places and the corresponding Web pages 60
Table 20: The transitions and the corresponding hyperlinks’ labels 61
Table 21: The associated matrix after deleting the link: e to d and the link: d to a 61
Table 22: A user session before pageview identification 73
Table 23: A user session after pageview identification 73
Table 24: The activity of Web usage metrics 79
Table 25: The 6 pieces of user’s session data 82
Table 26: The co-occurrence matrix of the two Web pages 85
Table 27: The initial transition probability matrix 85
Table 28: The steady-state probability matrix 88
Table 29: The co-occurrence matrix of the navigational graph 94
Table 30: The initial transition probability matrix of the navigational graph 95
Table 31: The steady-state probability matrix of the navigational graph 95
Table 32: The standardized steady-state probability matrix of the navigational graph 95
Table 33: The co-occurrence matrix after deleting session S1 and adding the new session S7 98
Table 34: The initial transition probability matrix after deleting session S1 and adding the new session S7 99
Table 35: The standardized steady-state probability matrix after deleting session S1 and adding the new session S7 99
Table 36: The co-occurrence matrix of the reachability graph 101
Table 37: The initial transition probability matrix of the reachability graph 102
Table 38: The standardized steady-state probability matrix of the reachability graph 102
參考文獻 [1] J. Srivastava, R. Cooley, M. Deshpande, and Pan-Ning Tan, “Web Usage Mining: Discovery and Applications of Usage Patterns from Web Data,” SIGKDD Explorations, Vol.1, Issue 2, pp12-23, Jan. 2000.
[2] R. Cooley, B. Mobasher, and J. Srivastava, “Data Preparation for Mining World Wide Web Browsing Patterns,” Journal of Knowledge and Information System, 1(1), pp. 5-32, 1999.
[3] R. Cooley, Pang-Ning Tan, J. Srivastava, “Discovery of Interesting Usage Patterns from Web Data,” Lecture Notes in Computer Science, 2000.
[4] R. Cooley, B. Mobasher, and J. Srivastava, “Web Mining: Information and Pattern Discovery on the World Wide Web,” Proc. Ninth IEEE Int’l Conf. Tools with AI (ICTAI ’97), pp. 558-567, 1997.
[5] O. Nasraoui, R. Krishnapuram, and A. Joshi, “Mining Web AccessLogs Using a Relational Clustering Algorithm Based on a RobustEstimator,” Proc. Eighth Int’l World Wide Web Conf. (WWW ’99),pp. 40-41, 1999.
[6] O. Nasraoui, R. Krishnapuram, H. Frigui, and A. Joshi, “Extracting Web User Profiles Using Relational Competitive Fuzzy Clustering,” Int’l J. Artificial Intelligence Tools, vol. 9, no. 4, pp. 509-526, 2000.
[7] M. Spiliopoulou and L.C. Faulstich, “WUM: A Web Utilization Miner,” Proc. First Int’l Workshop Web and Databases (WebDB ’98), 1998.
[8] M. Eirinaki and M. Vazirgiannis, “Web Mining for Web personalization,” ACM Transactions on Internet Technology, Vol. 3, No. 1, pp.1-27, Feb. 2003.
[9] B. Mobasher, R. Cooley, and J. Srivastava, “Automatic Personalization Based on Web Usage Mining,” Communications of the ACM, Vol. 43, pp. 142-151, 2000.
[10]R. Baraglia and P. Palmerini. “Suggest: a Web Usage Mining System,” In Proc. of IEEE Int’l Conf. on Information Technology: Coding and Computing, April 2002.
[11] S.S. Anand, B. Mobasher, “Intelligent Techniques for Web Personalization,” Vol. 3169, Springer, Heidelberg, pp.1–36, 2005.
[12]M. Jalali, N. Mustapha, A. Mamat, Md N. Sulaiman, “OPWUMP An Architecture for Online Predicting in WUM-based Personalization System,” In 13th International CSI Computer Science, Springer Verlag, 2008.
[13]M. Jalali, N. Mustapha, N. B. Sulaiman, and A. Mamat, “A Web Usage Mining Approach Based on LCS Algorithm in Online Predicting Recommendation Systems,” in 12th International on Information Visualisation, London, UK, pp. 302-307, 2008.
[14]G. Hulten, L. Spencer, P. Domingos, “Mining Time-Changing Data Streams,” In: Proceedings of the Seventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 97–106, 2001.
[15]B. Babcock, S. Babu, M. Datar, R. Motwani, and J. Widom, “Models and Issues in Data Stream Systems,” The 21st ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems. Madison, Wisconsin, pp. 1-16, June 2002.
[16]L. Golab, M. Ozsu, “Issues in Data Stream Management,” ACM SIGMOD Record, Vol. 32, pp. 5-14, 2003.
[17]M.M. Gaber, A. Zaslavsky, S. Krishnaswamy, “Mining Data Streams: a Review,” SIGMOD Rec. 34(2), pp. 18–26, 2005.
[18]M. Stonebraker, U. Cetintemel, S. Zdonik, “The 8 Requirements of Real-Time Stream Processing,” SIGMOD Rec. 34(4), pp. 42–47, 2005.
[19]C. Aggarwal, “Data Streams: Models and Algorithms,” Advances in Database Systems, Springer, Heidelberg, 2007.
[20]W. Reisig, “Correctness Proofs of Distributed Algorithms,” Lecture Notes in Computer Science, Vol. 938: Theory and Practice in Distributed Systems, pp. 164-177, 1995.
[21]R. Sarukkai, “Link Prediction and Path Analysis Using Markov Chains,” 9th International WWW Conference, Amsterdam, pp. 377–386, 2000.
[22]Devanshu Dhyani, Sourav S Bhowmick, Wee-Keong Ng, “Modelling and Predicting Web Page Accesses Using Markov Processes,” IEEE,Computer Society, pp. 1529-4188, 2003.
[23]M. Deshpande and G. Karypis, “Selective Models for Predicting Web Page Accesses,” Transactions on Internet Technology, Volume 4, Issue 2, pp. 163–184, 2004.
[24]M. Eirinaki, M. Vazirgiannis, and D. Kapogiannis, “Web Path Recommendations Based on Page Ranking and Markov Models,” Proceedings of the 7th annual ACM international workshop on Web information and data management, New York, NY, USA, ACM Press, pp. 2–9, 2005.
[25]Faten Khalil, Jiuyong Li, Hua Wang, “Integrating Markov Model with Clustering for Predicting Web Page Accesses,” Australian Conference, pp. 1-26, Mar. 2007.
[26]Devanshu Dhyani, Wee Keong Ng, and Sourav S. Bhowmick, “A Survey of Web Metrics,” ACM Computing Surveys, Vol. 34, No. 4, pp. 469-503, Dec. 2002
[27]Tadao Murata, “Petri Nets: Properties, Analysis and Applications,” Proceedings of the IEEE, Vol. 77, No. 4, 1989.
[28]M. Ajmone Marsan, “Stochastic Petri Nets:An Elementary Introduction,” Lecture Notes in Computer Science, Vol. 424, pp. 1-29, 1990.
[29]Wolfgang Reisig, “Correctness Proofs of Distributed Algorithms,” Lecture Notes in Computer Science, Vol. 938: Theory and Practice in Distributed Systems, pp. 164-177, 1995.
[30]James. R. Norris, “Markov Chains,” Cambridge University Press, 1997.
[31]Pierre Bremaud, “Markov Chains,” Springer, 1999.
[32]R. Botafogo, E. Rivlin, and B. Shneiderman, “Structural Analysis of Hypertexts: Identifying Hierarchies and Useful Metrics,” ACM Transaction on Information System, 10, pp.142-180, Apr. 1992.
[33]S. Graham et al., “Publish-Subscribe Notification for Web Services,” V. 1.0, joint specification by BEA Systems, IBM, and Microsoft, Mar. 2004.
[34]S. Graham et al., “Web Services Base Notification (WS-Base Notification),” V.1.0, joint specification by BEA Systems, IBM, and Microsoft, Mar. 2004.
[35]S. Graham et al., “Web Services Brokered Notification (WS-Brokered Notification),” V.1.0, joint specification by BEA Systems, IBM, and Microsoft, Mar. 2004.
[36]S. Graham et al., “Web Services Topics (WSTopics),” V. 1.0, joint specification by BEA Systems, IBM, and Microsoft, Mar. 2004.
[37]T. Yan, M. Jacobsen, H. Garcia-Molina and U. Dayal, “From User Access Patterns to Dynamic Hypertext Linking,” Fifth International World Wide Web Conference, May 1996.
[38]M. Perkowitz, and O.Etzioni, “Adaptive web sites: Automatically synthesizing web pages,” In Proceedings of the Fifteenth National Conference on Artificial Intelligence, 1998.
[39]M. Perkowitz, and O.Etzioni, “Towards adaptive web sites: Conceptual framework and case study,” In Proceedings of the eighth World Wide Web Conference, 1999.
[40]R. Baraglia and F. Silvestri, “Dynamic personalization of web sites without user intervention,” Communications of the ACM, vol. 50, pp. 63-67, 2007.
[41]R. Baraglia and F. Silvestri, “An Online Recommender System for Large Web Sites,” pp.199-205, 2004.
[42]Myra Spiliopoulou, Lukas C. Faulstich, “WUM: A Tool for Web Utilization Analysis,” Proceeding of EDBT workshop WebDB’98, LNCS 1590, Springer, Berlin, Germany, pp. 184-203, 1998.
[43]Murat Ali Bayir, Ismail H. Toroslu, Ahmet Cosar, “A New Approach for Reactive Web Usage Data Processing,” Pro. Of ICDEW’06, pp. 91-100, 2006.
[44]World Wide Web Consortium, Common Logfile Formate, http://www.w3.org/Daemon/User/Config/Logging.html
[45]Miriam Baglioni, U. Ferrara, Andrea Romei, Salvatore Ruggieri, Franco Turini, “Preprocessing and Mining Web Log Data for Web Personalization,” Proc. of 8th Natl' Conf. of the Italian Association for Artificial Intelligence., LNCS 2829, Springer, Berlin, Germany, pp. 237-249, 2003.
[46]M. Wooldridge and P. Ciancarini, “Agent-Oriented Software Engineering: The State of the Art,” First International Workshop on Agent-Oriented Software Engineering, pp. 1–28, 2001.
[47]R. Cooley “The Use of Web Structure and Content to Identify Subjectively Interesting Web Usage Patterns,” ACM Transactions on Internet Technoloey, Vol.3, No.2, pp 93-116, May 2003.
[48]A. Buchner, M. Mulvenna, “Discovering Internet Marketing Intelligence Through Online Analytical Web Usage Mining,” SIGMOD Record, Vol.27, No.4, pp.54-61, Dec. 1998.
[49]Peter Pirolli, James Pitkow, Ramana Rao, “Silk from a Sow’s Ear: Extracting Usable Structures from the Web,” Conference on Human Factors in Computing Systems, CHI-96, 1996.
[50]Myra Spiliopoulou, Carsten Pohle, Lukas C. Faulstich, “Improving the Effectiveness of a Web Site with Web Usage Mining,” WEBKDD, 1999.
[51]Jeffrey Heer, Ed H. Chi, “Identification of Web User Traffic Composition using Multi-Modal Clustering and Information,” In Proceedings of the 1st SIAM International Conference on Data Mining Workshop on Web Mining, pp.51-58, 2001.
[52]Chen, Po-Zung; Yang, Shih-Yang; Sun, Chu-Hao, “Modeling and Analysis the Web Structure Using Stochastic Timed Petri Nets,” Journal of Software, Vol 3, No 8 pp. 19-26, 2008
[53]M. Spiliopoulou, B. Mobasher, B. Berendt, and M. Nakagawa, “A Framework for the Evaluation of Session Reconstruction Heuristics in Web Usage Analysis,” INFORMS Journal on Computing, 2003.
[54]Magdalini Eirinaki and Michalis Vazirgiannis, “Web Mining for Web Personalization,” ACM Transactions on Internet Technology, Vol. 3, No. 1, pp.1-27, Feb. 2003.
[55]Chen, Po-Zung; Yang, Shih-Yang; Sun, Chu-Hao, “Using Petri Nets to Enhance Web Usage Mining,” Acta Polytechnica Hungarica, Vol. 4, No. 3, pp. 113-125, 2007.
[56]Yan Wang, “Web Mining and Knowledge Discovery of Usage Patterns.”
[57]Lara Catledge and James Pitkow, “Characterizing Browsing Behaviors on the World Wide Web”, Computer Networks and ISDN Systems, 27(6), 1995.
[58]P.I. Hofgesang, “Web Personalisation Through Incremental Individual Profiling and Support-based User Segmentation,” In: WI 2007: Proceedings of the 2007 IEEE/WIC/ACM International Conference on Web Intelligence, IEEE Computer Society, Washington, pp. 213–220, 2007.
[59]P.I. Hofgesang, J.P. Patist, “Online Change Detection in Individual Web User Behaviour,” In: WWW 2008: Proceedings of the 17th International Conference on World Wide Web, ACM, New York , pp. 1157–1158, 2008.
[60]P.I. Hofgesang, “Online Mining of Web Usage Data: An Overview,” Web Mining Applications in E-Commerce and E-Services, Springer, pp. 1-24, 2009.

論文使用權限
  • 同意紙本無償授權給館內讀者為學術之目的重製使用,於2016-04-12公開。
  • 不同意授權瀏覽/列印電子全文服務。


  • 若您有任何疑問,請與我們聯絡!
    圖書館: 請來電 (02)2621-5656 轉 2281 或 來信