系統識別號 | U0002-0104201113455200 |
---|---|
DOI | 10.6846/TKU.2011.01087 |
論文名稱(中文) | 使用隨機派翠網路之線上網頁推薦系統 |
論文名稱(英文) | Online Web Recommendation System by Using Stochastic Timed Petri Nets |
第三語言論文名稱 | |
校院名稱 | 淡江大學 |
系所名稱(中文) | 資訊工程學系博士班 |
系所名稱(英文) | Department of Computer Science and Information Engineering |
外國學位學校名稱 | |
外國學位學院名稱 | |
外國學位研究所名稱 | |
學年度 | 99 |
學期 | 1 |
出版年 | 100 |
研究生(中文) | 孫初豪 |
研究生(英文) | Chu-Hao Sun |
學號 | 894190320 |
學位類別 | 博士 |
語言別 | 英文 |
第二語言別 | |
口試日期 | 2011-01-13 |
論文頁數 | 118頁 |
口試委員 |
指導教授
-
陳伯榮
委員 - 陳良弼 委員 - 陳省隆 委員 - 趙景明 委員 - 莊博任 委員 - 陳伯榮 |
關鍵字(中) |
網頁推薦系統 網頁使用習性探勘 資料串流 隨機過程時間派翠網路 網頁結構特性 衡量網站的基準 |
關鍵字(英) |
Web usage mining Data stream mining Stochastic Timed Petri Nets Markov model Web graph properties Web metrics |
第三語言關鍵字 | |
學科別分類 | |
中文摘要 |
網頁推薦系統為網頁使用習性探勘典型的應用,而網頁推薦系統在結構上可分成離線(off-line)與線上(online)兩個元件,離線元件主要根據分析過去歷史使用者習性資料(usage profile)來建立知識,並提供給線上元件使用。由於一般網頁推薦系統大多使用離線之資料前置處理,而執行資料探勘時也沒有時間限制,故此方法並不適合即時的動態環境,因此,我們需要高效能之線上網站使用習性探勘技術來解決這些問題。 我們在本文中提出以隨機過程時間派翠網路為基礎之線上網頁推薦系統,網頁推薦處理程序可分為資料準備、模式發掘和推薦三個階段。在資料準備階段,我們利用隨機過程時間派翠網路作為分析網站網頁架構的模型,並加入衡量網站基準分析網頁結構特性來增進網頁資訊的擷取。而分析網站架構後所得到的關聯矩陣與派翠網路模型之可到達行為特性可協助進行資料前置處理中的網頁內容範圍辨識和路徑填補。在模式發掘階段,我們應用流覽圖(navigational graph)和可到達圖(reachability graph)來作為使用者習性資料的模型。我們利用資料串流探勘技術維護在滑動視窗(sliding-window)上的流覽圖,並應用關聯矩陣與派翠網路模型之可到達行為特性進行可到達圖的建立。在推薦階段,我們使用馬可夫移轉機率和穩定狀態機率來協助預測使用者網頁瀏覽的行為,推薦引擎可藉由模式發掘階段所產生之流覽圖和可到達圖來協助完成動態網頁的推薦。而我們結合以代理人為基礎和事件驅動非同步通知的架構來達到線上即時的資料準備、模式發掘和推薦三個階段。 |
英文摘要 |
Web recommendation systems are typical applications of Web usage mining. The Web recommendation system is structured according to an online and an off-line component. The off-line component is aimed at building the knowledge base by analyzing past usage profiles that is then used in the online component. The general Web recommendation system mainly uses offline data preprocessing and the mining process is not time-limited. However, this approach is not suitable in real-time dynamic environments. Therefore, we need high-performance online Web usage mining techniques to provide solutions to these problems. In this paper, we propose an online Web recommendation system using STPN. The Web recommendation process consists of the data preparation phase, pattern discovery phase, and recommendation phase. In data preparation phase, we use STPN to model the Web structure, and also apply the Web metrics of Web graph properties to analyze the Web structure in improving the Web access. We simultaneously employ the Web structure analysis information in the incidence matrix and the reachability properties, obtained from the STPN model, to help proceed with pageview identification and path completion at the data preprocessing. In pattern discovery phase, the navigational graph and reachability graph are employed to model the usage profiles. We use the data stream mining technique to incrementally maintain the navigational graph over a sliding-window. The STPN’s reachability behavior characteristic and incidence matrix are applied to construct the reachability graph. In recommendation phase, we use the transition probability and steady-state probability in Markov model to predict the user’s navigational behavior. The navigational graph and reachability graph are used by the Web recommendation engine to generate the online dynamic Web recommendation. We combine the features of both the agent-based architecture and event-driven asynchronous notification architecture to achieve the online data preparation, pattern discovery, and recommendation. |
第三語言摘要 | |
論文目次 |
Contents Chapter 1 Introduction 1 Chapter 2 Background and Related Works 6 2.1 Stochastic Timed Petri Nets (STPN) 6 2.1.1 STPN Reachability Behavior Characteristic 7 2.2 Markov Chains 9 2.3 Web Graph 10 2.3.1 Centrality 11 2.3.2 Global Metrics 12 2.3.3 Local Metrics 14 2.4 WS-Notification 15 2.5 Related Works 17 Chapter 3 System Overview 19 3.1 System Architecture 19 3.2 Component Description 21 Chapter 4 Online Change Detection and processing 24 4.1 Web Server Log File 24 4.2 File Monitoring Agent 26 4.2.1 Data cleaning 30 4.3 Notification broker 31 Chapter 5 Using STPN to Model and Analysis the Web Structure 34 5.1 Modeling a Website Structure with STPN 34 5.2 Web Structure Metrics 39 5.2.1 Centrality 42 5.2.2 Global Metrics 48 5.2.3 Local Metrics 50 5.3 Incremental Web Structure Adapting 51 5.3.1 Adding a Link 51 5.3.2 Adding a Webpage 53 5.3.3 Deleting a link 56 5.3.4 Deleting a Webpage 58 Chapter 6 Using STPN to Enhance Web Usage Data Preprocessing 62 6.1 User Identification 66 6.2 Session Identification 67 6.3 Pageview Identification 68 6.4 Path Completion 74 6.5 Data Transformation 78 Chapter 7 Online Web Recommendation 80 7.1 User Profiles 80 7.1.1 Navigational Graph 81 7.1.2 Reachability Graph 82 7.2 Probabilistic Model 83 7.3 Incremental Maintaining Navigational Graph and Reachability Graph 89 7.4 Web Recommendation 103 Chapter 8 Conclusions and Future Research 108 Reference 111 List of Figures Figure 1: The Web usage mining process 1 Figure 2: General architecture of a typical Web recommendation system 2 Figure 3: The compactness of global metrics 13 Figure 4: The stratum of global metrics 14 Figure 5: A typical WS-Notification interaction 16 Figure 6: A typical brokered WS-Notification interaction 16 Figure 7: Sample topic space 16 Figure 8: Sample WS-topic topic space definition 17 Figure 9: The system architecture and major components 20 Figure 10: Example Web log file in the common log format 25 Figure 11: The file monitoring agent component 26 Figure 12: Example of the event message for Web program 27 Figure 13: Example of the event message for Web log 28 Figure 14: The file monitoring agent algorithm 29 Figure 15: The data cleaning algorithm 31 Figure 16: The notification broker component 32 Figure 17: The interaction diagram of the message exchange 33 Figure 18: The Web structure of Table 1 37 Figure 19: The Stochastic Timed Petri Nets corresponding to the Web structure of Table 2 38 Figure 20: The flow chart of calculating Web metrics algorithm 40 Figure 21: Convert the Incidence matrix to the converted distance matrix algorithm 41 Figure 22: The Web centrality metrics algorithm 43 Figure 23: The Web structure of Table 3 45 Figure 24: The cross-linked tree structure after adjusting 48 Figure 25: The cross-linked tree structure and depth vector, child vector after adjusting 50 Figure 26: The adding a link algorithm 53 Figure 27: The adding a Webpage algorithm 55 Figure 28: The deleting a link algorithm 57 Figure 29: The deleting a Webpage algorithm 60 Figure 30: The Web usage data preprocessing process 63 Figure 31: The component diagram of pageview identification and path completion 64 Figure 32: The Web usage data preprocessing using STPN algorithm 65 Figure 33: The data schema of the User, Session, and Pageviews 66 Figure 34: The user identification algorithm 67 Figure 35: The session identification algorithm 68 Figure 36: Web structure and STPN structure 70 Figure 37: The STPN pageview identification algorithm 71 Figure 38: The STPN path completion algorithm 76 Figure 39: The state equation of path complete 78 Figure 40: The navigational graph 82 Figure 41: The reachability graph 83 Figure 42: The calculate stead-state probability matrix algorithm 87 Figure 43: The flow chart of adding session algorithm 89 Figure 44: The adding session algorithm 90 Figure 45: The add child node algorithm 92 Figure 46: The navigational graph over a sliding-window (size=6) 94 Figure 47: The remove session algorithm 96 Figure 48: The navigational graph over a sliding-window (size=6) after adding the new session S7 98 Figure 49: The adding session to reachability graph algorithm 100 Figure 50: The reachability graph after adding the session S7 101 Figure 51: The remove Web page algorithm 103 Figure 52: The flow chart of Web recommendation algorithm 104 Figure 53: The Web recommendation algorithm 105 Figure 54: The Web recommendation of the active session: b 106 Figure 55: The Web recommendation of the active session: b->f 107 List of Tables Table 1: The hyperlink content of a Web structure 36 Table 2: The incidence matrix represents a Web structure shown in Figure 15 37 Table 3: The Web structure example 44 Table 4: The places and the corresponding Web pages 45 Table 5: The transitions and the corresponding hyperlinks’ labels 45 Table 6: Incidence matrix [Aij]7x8 which denotes the Web structure 46 Table 7: The initial matrix [Cij]7x7 which denotes the Web structure 46 Table 8: The converted distance matrix [Cij]7x7 & the value of COD, CID, ROC, RIC 46 Table 9: The incidence matrix [Aij]7x9 after adding g link to e 47 Table 10: The converted distance matrix [Cij]7x7 & the value of COD, CID, ROC, RIC after adjusting 48 Table 11: The converted distance matrix [Dij]7x7 & stratum related metrics after adjusting 49 Table 12: The transitions and the corresponding hyperlinks’ labels after adding the link: g to e 53 Table 13: The incidence matrix after adding a new page h 53 Table 14: The places and the corresponding Web pages after adding a new page h 55 Table 15: The transitions and the corresponding hyperlinks’ labels after adding the link: h to e 55 Table 16: The incidence matrix after adding Web page h and the link: h to e 55 Table 17: The transitions and the corresponding hyperlinks’ labels after deleting the link: e to a 58 Table 18: The incidence matrix after deleting the link: e to a 58 Table 19: The places and the corresponding Web pages 60 Table 20: The transitions and the corresponding hyperlinks’ labels 61 Table 21: The associated matrix after deleting the link: e to d and the link: d to a 61 Table 22: A user session before pageview identification 73 Table 23: A user session after pageview identification 73 Table 24: The activity of Web usage metrics 79 Table 25: The 6 pieces of user’s session data 82 Table 26: The co-occurrence matrix of the two Web pages 85 Table 27: The initial transition probability matrix 85 Table 28: The steady-state probability matrix 88 Table 29: The co-occurrence matrix of the navigational graph 94 Table 30: The initial transition probability matrix of the navigational graph 95 Table 31: The steady-state probability matrix of the navigational graph 95 Table 32: The standardized steady-state probability matrix of the navigational graph 95 Table 33: The co-occurrence matrix after deleting session S1 and adding the new session S7 98 Table 34: The initial transition probability matrix after deleting session S1 and adding the new session S7 99 Table 35: The standardized steady-state probability matrix after deleting session S1 and adding the new session S7 99 Table 36: The co-occurrence matrix of the reachability graph 101 Table 37: The initial transition probability matrix of the reachability graph 102 Table 38: The standardized steady-state probability matrix of the reachability graph 102 |
參考文獻 |
[1] J. Srivastava, R. Cooley, M. Deshpande, and Pan-Ning Tan, “Web Usage Mining: Discovery and Applications of Usage Patterns from Web Data,” SIGKDD Explorations, Vol.1, Issue 2, pp12-23, Jan. 2000. [2] R. Cooley, B. Mobasher, and J. Srivastava, “Data Preparation for Mining World Wide Web Browsing Patterns,” Journal of Knowledge and Information System, 1(1), pp. 5-32, 1999. [3] R. Cooley, Pang-Ning Tan, J. Srivastava, “Discovery of Interesting Usage Patterns from Web Data,” Lecture Notes in Computer Science, 2000. [4] R. Cooley, B. Mobasher, and J. Srivastava, “Web Mining: Information and Pattern Discovery on the World Wide Web,” Proc. Ninth IEEE Int’l Conf. Tools with AI (ICTAI ’97), pp. 558-567, 1997. [5] O. Nasraoui, R. Krishnapuram, and A. Joshi, “Mining Web AccessLogs Using a Relational Clustering Algorithm Based on a RobustEstimator,” Proc. Eighth Int’l World Wide Web Conf. (WWW ’99),pp. 40-41, 1999. [6] O. Nasraoui, R. Krishnapuram, H. Frigui, and A. Joshi, “Extracting Web User Profiles Using Relational Competitive Fuzzy Clustering,” Int’l J. Artificial Intelligence Tools, vol. 9, no. 4, pp. 509-526, 2000. [7] M. Spiliopoulou and L.C. Faulstich, “WUM: A Web Utilization Miner,” Proc. First Int’l Workshop Web and Databases (WebDB ’98), 1998. [8] M. Eirinaki and M. Vazirgiannis, “Web Mining for Web personalization,” ACM Transactions on Internet Technology, Vol. 3, No. 1, pp.1-27, Feb. 2003. [9] B. Mobasher, R. Cooley, and J. Srivastava, “Automatic Personalization Based on Web Usage Mining,” Communications of the ACM, Vol. 43, pp. 142-151, 2000. [10]R. Baraglia and P. Palmerini. “Suggest: a Web Usage Mining System,” In Proc. of IEEE Int’l Conf. on Information Technology: Coding and Computing, April 2002. [11] S.S. Anand, B. Mobasher, “Intelligent Techniques for Web Personalization,” Vol. 3169, Springer, Heidelberg, pp.1–36, 2005. [12]M. Jalali, N. Mustapha, A. Mamat, Md N. Sulaiman, “OPWUMP An Architecture for Online Predicting in WUM-based Personalization System,” In 13th International CSI Computer Science, Springer Verlag, 2008. [13]M. Jalali, N. Mustapha, N. B. Sulaiman, and A. Mamat, “A Web Usage Mining Approach Based on LCS Algorithm in Online Predicting Recommendation Systems,” in 12th International on Information Visualisation, London, UK, pp. 302-307, 2008. [14]G. Hulten, L. Spencer, P. Domingos, “Mining Time-Changing Data Streams,” In: Proceedings of the Seventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 97–106, 2001. [15]B. Babcock, S. Babu, M. Datar, R. Motwani, and J. Widom, “Models and Issues in Data Stream Systems,” The 21st ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems. Madison, Wisconsin, pp. 1-16, June 2002. [16]L. Golab, M. Ozsu, “Issues in Data Stream Management,” ACM SIGMOD Record, Vol. 32, pp. 5-14, 2003. [17]M.M. Gaber, A. Zaslavsky, S. Krishnaswamy, “Mining Data Streams: a Review,” SIGMOD Rec. 34(2), pp. 18–26, 2005. [18]M. Stonebraker, U. Cetintemel, S. Zdonik, “The 8 Requirements of Real-Time Stream Processing,” SIGMOD Rec. 34(4), pp. 42–47, 2005. [19]C. Aggarwal, “Data Streams: Models and Algorithms,” Advances in Database Systems, Springer, Heidelberg, 2007. [20]W. Reisig, “Correctness Proofs of Distributed Algorithms,” Lecture Notes in Computer Science, Vol. 938: Theory and Practice in Distributed Systems, pp. 164-177, 1995. [21]R. Sarukkai, “Link Prediction and Path Analysis Using Markov Chains,” 9th International WWW Conference, Amsterdam, pp. 377–386, 2000. [22]Devanshu Dhyani, Sourav S Bhowmick, Wee-Keong Ng, “Modelling and Predicting Web Page Accesses Using Markov Processes,” IEEE,Computer Society, pp. 1529-4188, 2003. [23]M. Deshpande and G. Karypis, “Selective Models for Predicting Web Page Accesses,” Transactions on Internet Technology, Volume 4, Issue 2, pp. 163–184, 2004. [24]M. Eirinaki, M. Vazirgiannis, and D. Kapogiannis, “Web Path Recommendations Based on Page Ranking and Markov Models,” Proceedings of the 7th annual ACM international workshop on Web information and data management, New York, NY, USA, ACM Press, pp. 2–9, 2005. [25]Faten Khalil, Jiuyong Li, Hua Wang, “Integrating Markov Model with Clustering for Predicting Web Page Accesses,” Australian Conference, pp. 1-26, Mar. 2007. [26]Devanshu Dhyani, Wee Keong Ng, and Sourav S. Bhowmick, “A Survey of Web Metrics,” ACM Computing Surveys, Vol. 34, No. 4, pp. 469-503, Dec. 2002 [27]Tadao Murata, “Petri Nets: Properties, Analysis and Applications,” Proceedings of the IEEE, Vol. 77, No. 4, 1989. [28]M. Ajmone Marsan, “Stochastic Petri Nets:An Elementary Introduction,” Lecture Notes in Computer Science, Vol. 424, pp. 1-29, 1990. [29]Wolfgang Reisig, “Correctness Proofs of Distributed Algorithms,” Lecture Notes in Computer Science, Vol. 938: Theory and Practice in Distributed Systems, pp. 164-177, 1995. [30]James. R. Norris, “Markov Chains,” Cambridge University Press, 1997. [31]Pierre Bremaud, “Markov Chains,” Springer, 1999. [32]R. Botafogo, E. Rivlin, and B. Shneiderman, “Structural Analysis of Hypertexts: Identifying Hierarchies and Useful Metrics,” ACM Transaction on Information System, 10, pp.142-180, Apr. 1992. [33]S. Graham et al., “Publish-Subscribe Notification for Web Services,” V. 1.0, joint specification by BEA Systems, IBM, and Microsoft, Mar. 2004. [34]S. Graham et al., “Web Services Base Notification (WS-Base Notification),” V.1.0, joint specification by BEA Systems, IBM, and Microsoft, Mar. 2004. [35]S. Graham et al., “Web Services Brokered Notification (WS-Brokered Notification),” V.1.0, joint specification by BEA Systems, IBM, and Microsoft, Mar. 2004. [36]S. Graham et al., “Web Services Topics (WSTopics),” V. 1.0, joint specification by BEA Systems, IBM, and Microsoft, Mar. 2004. [37]T. Yan, M. Jacobsen, H. Garcia-Molina and U. Dayal, “From User Access Patterns to Dynamic Hypertext Linking,” Fifth International World Wide Web Conference, May 1996. [38]M. Perkowitz, and O.Etzioni, “Adaptive web sites: Automatically synthesizing web pages,” In Proceedings of the Fifteenth National Conference on Artificial Intelligence, 1998. [39]M. Perkowitz, and O.Etzioni, “Towards adaptive web sites: Conceptual framework and case study,” In Proceedings of the eighth World Wide Web Conference, 1999. [40]R. Baraglia and F. Silvestri, “Dynamic personalization of web sites without user intervention,” Communications of the ACM, vol. 50, pp. 63-67, 2007. [41]R. Baraglia and F. Silvestri, “An Online Recommender System for Large Web Sites,” pp.199-205, 2004. [42]Myra Spiliopoulou, Lukas C. Faulstich, “WUM: A Tool for Web Utilization Analysis,” Proceeding of EDBT workshop WebDB’98, LNCS 1590, Springer, Berlin, Germany, pp. 184-203, 1998. [43]Murat Ali Bayir, Ismail H. Toroslu, Ahmet Cosar, “A New Approach for Reactive Web Usage Data Processing,” Pro. Of ICDEW’06, pp. 91-100, 2006. [44]World Wide Web Consortium, Common Logfile Formate, http://www.w3.org/Daemon/User/Config/Logging.html [45]Miriam Baglioni, U. Ferrara, Andrea Romei, Salvatore Ruggieri, Franco Turini, “Preprocessing and Mining Web Log Data for Web Personalization,” Proc. of 8th Natl' Conf. of the Italian Association for Artificial Intelligence., LNCS 2829, Springer, Berlin, Germany, pp. 237-249, 2003. [46]M. Wooldridge and P. Ciancarini, “Agent-Oriented Software Engineering: The State of the Art,” First International Workshop on Agent-Oriented Software Engineering, pp. 1–28, 2001. [47]R. Cooley “The Use of Web Structure and Content to Identify Subjectively Interesting Web Usage Patterns,” ACM Transactions on Internet Technoloey, Vol.3, No.2, pp 93-116, May 2003. [48]A. Buchner, M. Mulvenna, “Discovering Internet Marketing Intelligence Through Online Analytical Web Usage Mining,” SIGMOD Record, Vol.27, No.4, pp.54-61, Dec. 1998. [49]Peter Pirolli, James Pitkow, Ramana Rao, “Silk from a Sow’s Ear: Extracting Usable Structures from the Web,” Conference on Human Factors in Computing Systems, CHI-96, 1996. [50]Myra Spiliopoulou, Carsten Pohle, Lukas C. Faulstich, “Improving the Effectiveness of a Web Site with Web Usage Mining,” WEBKDD, 1999. [51]Jeffrey Heer, Ed H. Chi, “Identification of Web User Traffic Composition using Multi-Modal Clustering and Information,” In Proceedings of the 1st SIAM International Conference on Data Mining Workshop on Web Mining, pp.51-58, 2001. [52]Chen, Po-Zung; Yang, Shih-Yang; Sun, Chu-Hao, “Modeling and Analysis the Web Structure Using Stochastic Timed Petri Nets,” Journal of Software, Vol 3, No 8 pp. 19-26, 2008 [53]M. Spiliopoulou, B. Mobasher, B. Berendt, and M. Nakagawa, “A Framework for the Evaluation of Session Reconstruction Heuristics in Web Usage Analysis,” INFORMS Journal on Computing, 2003. [54]Magdalini Eirinaki and Michalis Vazirgiannis, “Web Mining for Web Personalization,” ACM Transactions on Internet Technology, Vol. 3, No. 1, pp.1-27, Feb. 2003. [55]Chen, Po-Zung; Yang, Shih-Yang; Sun, Chu-Hao, “Using Petri Nets to Enhance Web Usage Mining,” Acta Polytechnica Hungarica, Vol. 4, No. 3, pp. 113-125, 2007. [56]Yan Wang, “Web Mining and Knowledge Discovery of Usage Patterns.” [57]Lara Catledge and James Pitkow, “Characterizing Browsing Behaviors on the World Wide Web”, Computer Networks and ISDN Systems, 27(6), 1995. [58]P.I. Hofgesang, “Web Personalisation Through Incremental Individual Profiling and Support-based User Segmentation,” In: WI 2007: Proceedings of the 2007 IEEE/WIC/ACM International Conference on Web Intelligence, IEEE Computer Society, Washington, pp. 213–220, 2007. [59]P.I. Hofgesang, J.P. Patist, “Online Change Detection in Individual Web User Behaviour,” In: WWW 2008: Proceedings of the 17th International Conference on World Wide Web, ACM, New York , pp. 1157–1158, 2008. [60]P.I. Hofgesang, “Online Mining of Web Usage Data: An Overview,” Web Mining Applications in E-Commerce and E-Services, Springer, pp. 1-24, 2009. |
論文全文使用權限 |
如有問題,歡迎洽詢!
圖書館數位資訊組 (02)2621-5656 轉 2487 或 來信