§ 瀏覽學位論文書目資料
系統識別號 U0002-0104201113455200
DOI 10.6846/TKU.2011.01087
論文名稱(中文) 使用隨機派翠網路之線上網頁推薦系統
論文名稱(英文) Online Web Recommendation System by Using Stochastic Timed Petri Nets
第三語言論文名稱
校院名稱 淡江大學
系所名稱(中文) 資訊工程學系博士班
系所名稱(英文) Department of Computer Science and Information Engineering
外國學位學校名稱
外國學位學院名稱
外國學位研究所名稱
學年度 99
學期 1
出版年 100
研究生(中文) 孫初豪
研究生(英文) Chu-Hao Sun
學號 894190320
學位類別 博士
語言別 英文
第二語言別
口試日期 2011-01-13
論文頁數 118頁
口試委員 指導教授 - 陳伯榮
委員 - 陳良弼
委員 - 陳省隆
委員 - 趙景明
委員 - 莊博任
委員 - 陳伯榮
關鍵字(中) 網頁推薦系統
網頁使用習性探勘
資料串流
隨機過程時間派翠網路
網頁結構特性
衡量網站的基準
關鍵字(英) Web usage mining
Data stream mining
Stochastic Timed Petri Nets
Markov model
Web graph properties
Web metrics
第三語言關鍵字
學科別分類
中文摘要
網頁推薦系統為網頁使用習性探勘典型的應用,而網頁推薦系統在結構上可分成離線(off-line)與線上(online)兩個元件,離線元件主要根據分析過去歷史使用者習性資料(usage profile)來建立知識,並提供給線上元件使用。由於一般網頁推薦系統大多使用離線之資料前置處理,而執行資料探勘時也沒有時間限制,故此方法並不適合即時的動態環境,因此,我們需要高效能之線上網站使用習性探勘技術來解決這些問題。
我們在本文中提出以隨機過程時間派翠網路為基礎之線上網頁推薦系統,網頁推薦處理程序可分為資料準備、模式發掘和推薦三個階段。在資料準備階段,我們利用隨機過程時間派翠網路作為分析網站網頁架構的模型,並加入衡量網站基準分析網頁結構特性來增進網頁資訊的擷取。而分析網站架構後所得到的關聯矩陣與派翠網路模型之可到達行為特性可協助進行資料前置處理中的網頁內容範圍辨識和路徑填補。在模式發掘階段,我們應用流覽圖(navigational graph)和可到達圖(reachability graph)來作為使用者習性資料的模型。我們利用資料串流探勘技術維護在滑動視窗(sliding-window)上的流覽圖,並應用關聯矩陣與派翠網路模型之可到達行為特性進行可到達圖的建立。在推薦階段,我們使用馬可夫移轉機率和穩定狀態機率來協助預測使用者網頁瀏覽的行為,推薦引擎可藉由模式發掘階段所產生之流覽圖和可到達圖來協助完成動態網頁的推薦。而我們結合以代理人為基礎和事件驅動非同步通知的架構來達到線上即時的資料準備、模式發掘和推薦三個階段。
英文摘要
Web recommendation systems are typical applications of Web usage mining. The Web recommendation system is structured according to an online and an off-line component. The off-line component is aimed at building the knowledge base by analyzing past usage profiles that is then used in the online component. The general Web recommendation system mainly uses offline data preprocessing and the mining process is not time-limited. However, this approach is not suitable in real-time dynamic environments. Therefore, we need high-performance online Web usage mining techniques to provide solutions to these problems.
 In this paper, we propose an online Web recommendation system using STPN. The Web recommendation process consists of the data preparation phase, pattern discovery phase, and recommendation phase. In data preparation phase, we use STPN to model the Web structure, and also apply the Web metrics of Web graph properties to analyze the Web structure in improving the Web access. We simultaneously employ the Web structure analysis information in the incidence matrix and the reachability properties, obtained from the STPN model, to help proceed with pageview identification and path completion at the data preprocessing. In pattern discovery phase, the navigational graph and reachability graph are employed to model the usage profiles. We use the data stream mining technique to incrementally maintain the navigational graph over a sliding-window. The STPN’s reachability behavior characteristic and incidence matrix are applied to construct the reachability graph. In recommendation phase, we use the transition probability and steady-state probability in Markov model to predict the user’s navigational behavior. The navigational graph and reachability graph are used by the Web recommendation engine to generate the online dynamic Web recommendation. We combine the features of both the agent-based architecture and event-driven asynchronous notification architecture to achieve the online data preparation, pattern discovery, and recommendation.
第三語言摘要
論文目次
Contents
Chapter 1  Introduction	1
Chapter 2  Background and Related Works	6
2.1  Stochastic Timed Petri Nets (STPN)	6
2.1.1  STPN Reachability Behavior Characteristic	7
2.2  Markov Chains	9
2.3  Web Graph	10
2.3.1  Centrality	11
2.3.2  Global Metrics	12
2.3.3  Local Metrics	14
2.4  WS-Notification	15
2.5  Related Works	17
Chapter 3  System Overview	19
3.1  System Architecture	19
3.2  Component Description	21
Chapter 4  Online Change Detection and processing	24
4.1  Web Server Log File	24
4.2  File Monitoring Agent	26
4.2.1  Data cleaning	30
4.3  Notification broker	31
Chapter 5  Using STPN to Model and Analysis the Web Structure	34
5.1  Modeling a Website Structure with STPN	34
5.2  Web Structure Metrics	39
5.2.1  Centrality	42
5.2.2  Global Metrics	48
5.2.3  Local Metrics	50
5.3  Incremental Web Structure Adapting	51
5.3.1  Adding a Link	51
5.3.2  Adding a Webpage	53
5.3.3  Deleting a link	56
5.3.4  Deleting a Webpage	58
Chapter 6  Using STPN to Enhance Web Usage Data Preprocessing	62
6.1  User Identification	66
6.2  Session Identification	67
6.3  Pageview Identification	68
6.4  Path Completion	74
6.5  Data Transformation	78
Chapter 7  Online Web Recommendation	80
7.1  User Profiles	80
7.1.1  Navigational Graph	81
7.1.2  Reachability Graph	82
7.2  Probabilistic Model	83
7.3  Incremental Maintaining Navigational Graph and Reachability   Graph	89
7.4  Web Recommendation	103
Chapter 8  Conclusions and Future Research	108
Reference	111
 
List of Figures
Figure 1:   The Web usage mining process	1
Figure 2:   General architecture of a typical Web recommendation system	2
Figure 3:   The compactness of global metrics	13
Figure 4:   The stratum of global metrics	14
Figure 5:   A typical WS-Notification interaction	16
Figure 6:   A typical brokered WS-Notification interaction	16
Figure 7:   Sample topic space	16
Figure 8:   Sample WS-topic topic space definition	17
Figure 9:   The system architecture and major components	20
Figure 10:  Example Web log file in the common log format	25
Figure 11:  The file monitoring agent component	26
Figure 12:  Example of the event message for Web program	27
Figure 13:  Example of the event message for Web log	28
Figure 14:  The file monitoring agent algorithm	29
Figure 15:  The data cleaning algorithm	31
Figure 16:  The notification broker component	32
Figure 17:  The interaction diagram of the message exchange	33
Figure 18:  The Web structure of Table 1	37
Figure 19:  The Stochastic Timed Petri Nets corresponding to the Web  structure of Table 2	38
Figure 20:  The flow chart of calculating Web metrics algorithm	40
Figure 21:  Convert the Incidence matrix to the converted distance matrix algorithm	41
Figure 22:  The Web centrality metrics algorithm	43
Figure 23:  The Web structure of Table 3	45
Figure 24:  The cross-linked tree structure after adjusting	48
Figure 25:  The cross-linked tree structure and depth vector, child vector   after adjusting	50
Figure 26:  The adding a link algorithm	53
Figure 27:  The adding a Webpage algorithm	55
Figure 28:  The deleting a link algorithm	57
Figure 29:  The deleting a Webpage algorithm	60
Figure 30:  The Web usage data preprocessing process	63
Figure 31:  The component diagram of pageview identification and path completion	64
Figure 32:  The Web usage data preprocessing using STPN algorithm	65
Figure 33:  The data schema of the User, Session, and Pageviews	66
Figure 34:  The user identification algorithm	67
Figure 35:  The session identification algorithm	68
Figure 36:  Web structure and STPN structure	70
Figure 37:  The STPN pageview identification algorithm	71
Figure 38:  The STPN path completion algorithm	76
Figure 39:  The state equation of path complete	78
Figure 40:  The navigational graph	82
Figure 41:  The reachability graph	83
Figure 42:  The calculate stead-state probability matrix algorithm	87
Figure 43:  The flow chart of adding session algorithm	89
Figure 44:  The adding session algorithm	90
Figure 45:  The add child node algorithm	92
Figure 46:  The navigational graph over a sliding-window (size=6)	94
Figure 47:  The remove session algorithm	96
Figure 48:  The navigational graph over a sliding-window (size=6) after adding the new session S7	98
Figure 49:  The adding session to reachability graph algorithm	100
Figure 50:  The reachability graph after adding the session S7	101
Figure 51:  The remove Web page algorithm	103
Figure 52:  The flow chart of Web recommendation algorithm	104
Figure 53:  The Web recommendation algorithm	105
Figure 54:  The Web recommendation of the active session: b	106
Figure 55:  The Web recommendation of the active session: b->f	107
 
List of Tables
Table 1:   The hyperlink content of a Web structure	36
Table 2:   The incidence matrix represents a Web structure shown in    Figure 15	37
Table 3:   The Web structure example	44
Table 4:   The places and the corresponding Web pages	45
Table 5:   The transitions and the corresponding hyperlinks’ labels	45
Table 6:   Incidence matrix [Aij]7x8 which denotes the Web structure	46
Table 7:   The initial matrix [Cij]7x7 which denotes the Web structure	46
Table 8:   The converted distance matrix [Cij]7x7 & the value of COD, CID,  ROC, RIC	46
Table 9:   The incidence matrix [Aij]7x9 after adding g link to e	47
Table 10:  The converted distance matrix [Cij]7x7 & the value of COD, CID,  ROC, RIC after adjusting	48
Table 11:  The converted distance matrix [Dij]7x7 & stratum related metrics  after adjusting	49
Table 12:  The transitions and the corresponding hyperlinks’ labels after  adding the link: g to e	53
Table 13:  The incidence matrix after adding a new page h	53
Table 14:  The places and the corresponding Web pages after adding a    new page h	55
Table 15:  The transitions and the corresponding hyperlinks’ labels after  adding the link: h to e	55
Table 16:  The incidence matrix after adding Web page h and the link:        h to e	55
Table 17:  The transitions and the corresponding hyperlinks’ labels after deleting the link: e to a	58
Table 18:  The incidence matrix after deleting the link: e to a	58
Table 19:  The places and the corresponding Web pages	60
Table 20:  The transitions and the corresponding hyperlinks’ labels	61
Table 21:  The associated matrix after deleting the link: e to d and the link:     d to a	61
Table 22:  A user session before pageview identification	73
Table 23:  A user session after pageview identification	73
Table 24:  The activity of Web usage metrics	79
Table 25:  The 6 pieces of user’s session data	82
Table 26:  The co-occurrence matrix of the two Web pages	85
Table 27:  The initial transition probability matrix	85
Table 28:  The steady-state probability matrix	88
Table 29:  The co-occurrence matrix of the navigational graph	94
Table 30:  The initial transition probability matrix of the navigational    graph	95
Table 31:  The steady-state probability matrix of the navigational graph	95
Table 32:  The standardized steady-state probability matrix of the  navigational graph	95
Table 33:  The co-occurrence matrix after deleting session S1 and    adding the new session S7	98
Table 34:  The initial transition probability matrix after deleting session    S1 and adding the new session S7	99
Table 35:  The standardized steady-state probability matrix after deleting session S1 and adding the new session S7	99
Table 36:  The co-occurrence matrix of the reachability graph	101
Table 37:  The initial transition probability matrix of the reachability     graph	102
Table 38:  The standardized steady-state probability matrix of the  reachability graph	102
參考文獻
[1]	J. Srivastava, R. Cooley, M. Deshpande, and Pan-Ning Tan, “Web Usage Mining: Discovery and Applications of Usage Patterns from Web Data,” SIGKDD Explorations, Vol.1, Issue 2, pp12-23, Jan. 2000.
[2]	R. Cooley, B. Mobasher, and J. Srivastava, “Data Preparation for Mining World Wide Web Browsing Patterns,” Journal of Knowledge and Information System, 1(1), pp. 5-32, 1999.
[3]	R. Cooley, Pang-Ning Tan, J. Srivastava, “Discovery of Interesting Usage Patterns from Web Data,” Lecture Notes in Computer Science, 2000.
[4] R. Cooley, B. Mobasher, and J. Srivastava, “Web Mining: Information and Pattern Discovery on the World Wide Web,” Proc. Ninth IEEE Int’l Conf. Tools with AI (ICTAI ’97), pp. 558-567, 1997.
[5] O. Nasraoui, R. Krishnapuram, and A. Joshi, “Mining Web AccessLogs Using a Relational Clustering Algorithm Based on a RobustEstimator,” Proc. Eighth Int’l World Wide Web Conf. (WWW ’99),pp. 40-41, 1999.
[6] O. Nasraoui, R. Krishnapuram, H. Frigui, and A. Joshi, “Extracting Web User Profiles Using Relational Competitive Fuzzy Clustering,” Int’l J. Artificial Intelligence Tools, vol. 9, no. 4, pp. 509-526, 2000.
[7] M. Spiliopoulou and L.C. Faulstich, “WUM: A Web Utilization Miner,” Proc. First Int’l Workshop Web and Databases (WebDB ’98), 1998.
[8]	M. Eirinaki and M. Vazirgiannis, “Web Mining for Web personalization,” ACM Transactions on Internet Technology, Vol. 3, No. 1, pp.1-27, Feb. 2003.
[9]	B. Mobasher, R. Cooley, and J. Srivastava, “Automatic Personalization Based on Web Usage Mining,” Communications of the ACM, Vol. 43, pp. 142-151, 2000.
[10]R. Baraglia and P. Palmerini. “Suggest: a Web Usage Mining System,” In Proc. of IEEE Int’l Conf. on Information Technology: Coding and Computing, April 2002.
[11] S.S. Anand, B. Mobasher, “Intelligent Techniques for Web Personalization,” Vol. 3169, Springer, Heidelberg, pp.1–36, 2005.
[12]M. Jalali, N. Mustapha, A. Mamat, Md N. Sulaiman, “OPWUMP An Architecture for Online Predicting in WUM-based Personalization System,” In 13th International CSI Computer Science, Springer Verlag, 2008.
[13]M. Jalali, N. Mustapha, N. B. Sulaiman, and A. Mamat, “A Web Usage Mining Approach Based on LCS Algorithm in Online Predicting Recommendation Systems,” in 12th International on Information Visualisation, London, UK, pp. 302-307, 2008.
[14]G. Hulten, L. Spencer, P. Domingos, “Mining Time-Changing Data Streams,” In: Proceedings of the Seventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 97–106, 2001.
[15]B. Babcock, S. Babu, M. Datar, R. Motwani, and J. Widom, “Models and Issues in Data Stream Systems,” The 21st ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems. Madison, Wisconsin, pp. 1-16, June 2002.
[16]L. Golab, M. Ozsu, “Issues in Data Stream Management,” ACM SIGMOD Record, Vol. 32, pp. 5-14, 2003.
[17]M.M. Gaber, A. Zaslavsky, S. Krishnaswamy, “Mining Data Streams: a Review,” SIGMOD Rec. 34(2), pp. 18–26, 2005.
[18]M. Stonebraker, U. Cetintemel, S. Zdonik, “The 8 Requirements of Real-Time Stream Processing,” SIGMOD Rec. 34(4), pp. 42–47, 2005.
[19]C. Aggarwal, “Data Streams: Models and Algorithms,” Advances in Database Systems, Springer, Heidelberg, 2007.
[20]W. Reisig, “Correctness Proofs of Distributed Algorithms,” Lecture Notes in Computer Science, Vol. 938: Theory and Practice in Distributed Systems, pp. 164-177, 1995.
[21]R. Sarukkai, “Link Prediction and Path Analysis Using Markov Chains,” 9th International WWW Conference, Amsterdam, pp. 377–386, 2000. 
[22]Devanshu Dhyani, Sourav S Bhowmick, Wee-Keong Ng, “Modelling and Predicting Web Page Accesses Using Markov Processes,” IEEE,Computer Society, pp. 1529-4188, 2003.
[23]M. Deshpande and G. Karypis, “Selective Models for Predicting Web Page Accesses,” Transactions on Internet Technology, Volume 4, Issue 2, pp. 163–184, 2004. 
[24]M. Eirinaki, M. Vazirgiannis, and D. Kapogiannis, “Web Path Recommendations Based on Page Ranking and Markov Models,” Proceedings of the 7th annual ACM international workshop on Web information and data management, New York, NY, USA, ACM Press, pp. 2–9, 2005.
[25]Faten Khalil, Jiuyong Li, Hua Wang, “Integrating Markov Model with Clustering for Predicting Web Page Accesses,” Australian Conference, pp. 1-26, Mar. 2007.
[26]Devanshu Dhyani, Wee Keong Ng, and Sourav S. Bhowmick, “A Survey of Web Metrics,” ACM Computing Surveys, Vol. 34, No. 4, pp. 469-503, Dec. 2002
[27]Tadao Murata, “Petri Nets: Properties, Analysis and Applications,” Proceedings of the IEEE, Vol. 77, No. 4, 1989. 
[28]M. Ajmone Marsan, “Stochastic Petri Nets:An Elementary Introduction,” Lecture Notes in Computer Science, Vol. 424, pp. 1-29, 1990.
[29]Wolfgang Reisig, “Correctness Proofs of Distributed Algorithms,” Lecture Notes in Computer Science, Vol. 938: Theory and Practice in Distributed Systems, pp. 164-177, 1995.
[30]James. R. Norris, “Markov Chains,” Cambridge University Press, 1997.
[31]Pierre Bremaud, “Markov Chains,” Springer, 1999.
[32]R. Botafogo, E. Rivlin, and B. Shneiderman, “Structural Analysis of Hypertexts: Identifying Hierarchies and Useful Metrics,” ACM Transaction on Information System, 10, pp.142-180, Apr. 1992.
[33]S. Graham et al., “Publish-Subscribe Notification for Web Services,” V. 1.0, joint specification by BEA Systems, IBM, and Microsoft, Mar. 2004.
[34]S. Graham et al., “Web Services Base Notification (WS-Base Notification),” V.1.0, joint specification by BEA Systems, IBM, and Microsoft, Mar. 2004.
[35]S. Graham et al., “Web Services Brokered Notification (WS-Brokered Notification),” V.1.0, joint specification by BEA Systems, IBM, and Microsoft, Mar. 2004.
[36]S. Graham et al., “Web Services Topics (WSTopics),” V. 1.0, joint specification by BEA Systems, IBM, and Microsoft, Mar. 2004.
[37]T. Yan, M. Jacobsen, H. Garcia-Molina and U. Dayal, “From User Access Patterns to Dynamic Hypertext Linking,” Fifth International World Wide Web Conference, May 1996.
[38]M. Perkowitz, and O.Etzioni, “Adaptive web sites: Automatically synthesizing web pages,” In Proceedings of the Fifteenth National Conference on Artificial Intelligence, 1998.
[39]M. Perkowitz, and O.Etzioni, “Towards adaptive web sites: Conceptual framework and case study,” In Proceedings of the eighth World Wide Web Conference, 1999.
[40]R. Baraglia and F. Silvestri, “Dynamic personalization of web sites without user intervention,” Communications of the ACM, vol. 50, pp. 63-67, 2007.
[41]R. Baraglia and F. Silvestri, “An Online Recommender System for Large Web Sites,” pp.199-205, 2004.
[42]Myra Spiliopoulou, Lukas C. Faulstich, “WUM: A Tool for Web Utilization Analysis,” Proceeding of EDBT workshop WebDB’98, LNCS 1590, Springer, Berlin, Germany, pp. 184-203, 1998.
[43]Murat Ali Bayir, Ismail H. Toroslu, Ahmet Cosar, “A New Approach for Reactive Web Usage Data Processing,” Pro. Of ICDEW’06, pp. 91-100, 2006.
[44]World Wide Web Consortium, Common Logfile Formate, http://www.w3.org/Daemon/User/Config/Logging.html
[45]Miriam Baglioni, U. Ferrara, Andrea Romei, Salvatore Ruggieri, Franco Turini, “Preprocessing and Mining Web Log Data for Web Personalization,” Proc. of 8th Natl' Conf. of the Italian Association for Artificial Intelligence., LNCS 2829, Springer, Berlin, Germany, pp. 237-249, 2003.
[46]M. Wooldridge and P. Ciancarini, “Agent-Oriented Software Engineering: The State of the Art,” First International Workshop on Agent-Oriented Software Engineering, pp. 1–28, 2001.
[47]R. Cooley “The Use of Web Structure and Content to Identify Subjectively Interesting Web Usage Patterns,” ACM Transactions on Internet Technoloey, Vol.3, No.2, pp 93-116, May 2003.
[48]A. Buchner, M. Mulvenna, “Discovering Internet Marketing Intelligence Through Online Analytical Web Usage Mining,” SIGMOD Record, Vol.27, No.4, pp.54-61, Dec. 1998.
[49]Peter Pirolli, James Pitkow, Ramana Rao, “Silk from a Sow’s Ear: Extracting Usable Structures from the Web,” Conference on Human Factors in Computing Systems, CHI-96, 1996.
[50]Myra Spiliopoulou, Carsten Pohle, Lukas C. Faulstich, “Improving the Effectiveness of a Web Site with Web Usage Mining,” WEBKDD, 1999.
[51]Jeffrey Heer, Ed H. Chi, “Identification of Web User Traffic Composition using Multi-Modal Clustering and Information,” In Proceedings of the 1st SIAM International Conference on Data Mining Workshop on Web Mining, pp.51-58, 2001.
[52]Chen, Po-Zung; Yang, Shih-Yang; Sun, Chu-Hao, “Modeling and Analysis the Web Structure Using Stochastic Timed Petri Nets,” Journal of Software, Vol 3, No 8 pp. 19-26, 2008
[53]M. Spiliopoulou, B. Mobasher, B. Berendt, and M. Nakagawa, “A Framework for the Evaluation of Session Reconstruction Heuristics in Web Usage Analysis,” INFORMS Journal on Computing, 2003.
[54]Magdalini Eirinaki and Michalis Vazirgiannis, “Web Mining for Web Personalization,” ACM Transactions on Internet Technology, Vol. 3, No. 1, pp.1-27, Feb. 2003.
[55]Chen, Po-Zung; Yang, Shih-Yang; Sun, Chu-Hao, “Using Petri Nets to Enhance Web Usage Mining,” Acta Polytechnica Hungarica, Vol. 4, No. 3, pp. 113-125, 2007.
[56]Yan Wang, “Web Mining and Knowledge Discovery of Usage Patterns.”
[57]Lara Catledge and James Pitkow, “Characterizing Browsing Behaviors on the World Wide Web”, Computer Networks and ISDN Systems, 27(6), 1995.
[58]P.I. Hofgesang, “Web Personalisation Through Incremental Individual Profiling and Support-based User Segmentation,” In: WI 2007: Proceedings of the 2007 IEEE/WIC/ACM International Conference on Web Intelligence, IEEE Computer Society, Washington, pp. 213–220, 2007.
[59]P.I. Hofgesang, J.P. Patist, “Online Change Detection in Individual Web User Behaviour,” In: WWW 2008: Proceedings of the 17th International Conference on World Wide Web, ACM, New York , pp. 1157–1158, 2008.
[60]P.I. Hofgesang, “Online Mining of Web Usage Data: An Overview,” Web Mining Applications in E-Commerce and E-Services, Springer, pp. 1-24, 2009.
論文全文使用權限
校內
紙本論文於授權書繳交後5年公開
校內書目立即公開
校外
不同意授權

如有問題,歡迎洽詢!
圖書館數位資訊組 (02)2621-5656 轉 2487 或 來信