§ 瀏覽學位論文書目資料
  
系統識別號 U0002-2507201723312000
DOI 10.6846/TKU.2017.00905
論文名稱(中文) 在社群網路串流中之樣式探勘
論文名稱(英文) Mining Patterns in Social Network Streaming
第三語言論文名稱
校院名稱 淡江大學
系所名稱(中文) 資訊工程學系碩士在職專班
系所名稱(英文) Department of Computer Science and Information Engineering
外國學位學校名稱
外國學位學院名稱
外國學位研究所名稱
學年度 105
學期 2
出版年 106
研究生(中文) 莊敦欽
研究生(英文) Tun-Chin Chuang
學號 704410074
學位類別 碩士
語言別 英文
第二語言別
口試日期 2017-07-19
論文頁數 38頁
口試委員 指導教授 - 陳以錚(ejen0831@gmail.com)
委員 - 施國琛(timothykshih@gmail.com)
委員 - 惠霖(amar0627@gmail.com)
關鍵字(中) 社群網路分析
資料串流探勘
循序樣式探勘
關鍵字(英) Social Network Analysis
Data Stream Mining
Sequential Pattern Mining
第三語言關鍵字
學科別分類
中文摘要
近年來,由於社群網站的蓬勃發展,許多的研究都集中在從社群網路中探勘出有用的知識,目前所有的方法大都是以圖論的方法為主,即從龐大的社群資料中,找出一些頻繁或具有代表性的子圖(sub-graph)。但以圖論為基礎的方法,在找尋所有頻繁子圖時的效率非常的差,就算有利用一些加速編碼(encoding)的方式或是一些修剪策略(pruning strategy),效能方面還是依舊無法大幅改善。

現今的社群網站資料都是非常的龐大,以圖論為基礎的子圖探勘方法通常只能適用於中小型的圖形資料,無法處理大型的社群網路圖資。社群資料通常會隨著時間有所變化,會有新的節點或新的鏈結加入,相對地也會有舊的節點或鏈結被移除,圖論的方法只能從一個時間點中的靜態社群網路粹取出循序樣式,對於社群網路的串流資料,並無法從中間找出時間演進所帶來的變化,這些都是以圖論為基礎的方法為人所詬病的地方。

在本論文中,我們設計一個利用傳統的循序樣式探勘技術,從社群網路串流資料中,能有效率地探勘出具有時間演進代表性循序樣式。此外,提出一個新的演算法Streaming Evolution Pattern Miner (SEPM),有效的提升探勘效率與維護頻繁序列。SEPM還採用了一個修剪策略,有效地減少記憶體使用,最後使用虛構資料集的實驗結果表示SEPM於社群網路串流中有高效地執行效率。
英文摘要
In recent years, due to the growth of social website, many studies have focused on discovery useful knowledge from the social networks. Currently, nearly all methods are based heavily on graph theory method. To find some frequent or representative subgraphs from the large social data. However, the graph-based approach is very inefficient identifying frequent subgraphs. Even if some methods of accelerating encoding or pruning strategies are implemented, performance is not improved significantly.

Today's social networking community information is very large. Based on graph theory the sub-graph exploration methods are usually only applicable to small and medium-sized graphics data sets. They cannot handle large-scale social network graph data. Community information continually changes over time. There will be new nodes or new edges joining the set. Similarly, old nodes or edges will need to removed. This method can only be used to represent a snapshot of the static social network of streaming data, and cannot represent the evolution of the changes brought about by these are based on the graph theory methods criticized area.

In this paper, we devised a method of using traditional sequential pattern mining technology from the social network streaming data to effectively discovery the evolution of representative sequential pattern over time. In addition, a new algorithm, Streaming Evolution Pattern Miner (SEPM), is proposed to effectively improve the efficiency of exploration and to maintain frequent sequences. Additionally, SEPM uses a pruning strategy to effectively reduce the use of memory. Finally, we utilize the synthetic dataset of experimental results to display improved SEPM social network streaming efficiency.
第三語言摘要
論文目次
Table of Contents
Chinese Abstract I
Abstract II
Table of Contents IV
List of Figures VI
Chapter 1  Introduction	1
Chapter 2  Related work	9
2.1 Pattern Mining in Dynamic Networks	9
2.2 Pattern Mining in Interval database	11
Chapter 3  Problem Definition	13
3.1 Preliminary	13
Definition 1 (Social Network Streaming)	13
Definition 2 (Interaction Interval and Frequent Sequence)	14
3.2 Interaction Expression and Evolution Pattern	14
Definition 3 (Interaction Expression).	15
Definition 4 (Evolution Database)	16
Definition 5 (Evolution Pattern)	17
Chapter 4  Algorithm: Streaming Evolution Pattern Miner (SEPM)	18
4.1 Streaming Evolution Pattern Mining	18
Definition 6 (Streaming Evolution Pattern Tree)	21
4.2 Streaming Evolution Pattern Maintenance	24
Definition 7 (Updated Network and Database)	24
Chapter 5  Experiments	30
Chapter 6  Conclusion	34
Reference	35
 
List of Figures 
Fig. 1: An illustration of the significant subgraph evolution in a social network streaming	5
Fig. 2: Overview of the SEPM Framework.	6
Fig. 3: An example of the social network streaming and transformed evolution database.	16
Fig. 4: The streaming evolution pattern tree of DB in Fig. 3 with min_sup = 0.4.	21
Fig. 5: Concept of network and database update	25
Fig. 6: An example of the updated network and database.	26
Fig. 7: Execution Time	31
Fig. 8: Maximum Memory Usage	31
Fig. 9: Execution Time of varying min_sup	32
Fig. 10: Maximum Memory Usage of varying min_sup	33
參考文獻
[1] 	Mendes, L., Ding, B., and Han, J. 2008. Stream sequential pattern mining with precise error bounds. In Proceeding of the 8th IEEE International Conference on Data Mining (ICDM 2008). 941–946
[2] 	Allen, J. 1983. Maintaining Knowledge about Temporal Intervals. Communications of ACM 26(11) 832-843.
[3] 	Asur, S., Parthasarathy, S., and Ucar, D. 2007. An event-based framework for characterizing the evolutionary behavior of interaction graphs. In: Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD’07). 913-921.
[4] 	Berlingerio, M., Bonchi, F., Bringmann, B., and Gionis, A. 2009. Mining Graph Evolution Rules. In Proceedings of the European Conference on Machine Learning and Knowledge Discovery in Databases (ECML PKDD’09). 115-130.
[5] 	Borgwardt, K., Kriegel, H., and Wackersreuther, P. 2006. Pattern mining in frequent dynamic subgraphs. In Proceedings of the 6th International Conference on Data Mining (ICDM’06). 818-822.
[6] 	Chakrabarti, D., Kumar, R., and Tomkins, A. 2006. Evolutionary clustering. In Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD’06). 554-560.
[7] 	Chen, Y., Jiang, J., Peng, W., and Lee, S. 2010. An Efficient Algorithm for Mining Time Interval-based Patterns in Large Databases. In Proceedings of the 19th ACM International Conference on Information and Knowledge Management (CIKM’10). 49-58.
[8] 	Chi, Y., Song, X., Zhou, D., Hino, K., and Tseng, B. 2007. Evolutionary Spectral Clustering by Incorporating Temporal Smoothness. In Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD’07). 153-162.
[9] 	Desikan, P. and Srivastava, J. 2004. Mining Temporally Evolving Graphs. In Proceedings of the 6th International Workshop on Knowledge Discovery from the Web (WEBKDD’04).
[10] 	Hopper, F. 2002. Finding informative rules in interval sequences. Intelligent Data Analysis 6(3). 237-255.
[11] 	Inokuchi, A., and Washio, T. 2008. A fast method to mine frequent subsequences from graph sequence data. In Proceeding of the 8th IEEE International Conference on Data Mining (ICDM’08). 303-312.
[12] 	Kam, P., and Fu, W. 2000. Discovering Temporal Patterns for Interval-based Events. In International Conference on Data Warehousing and Knowledge Discovery (DaWaK’00). 317-326.
[13] 	Kim, M., and Han, J. 2009. A Particle-and-Density based Evolutionary Clustering Method for Dynamic Networks. In Proceedings of the 35th International Conference on Very Large Data Bases (VLDB’09).
[14] 	Lahiri, M., and Berger-Wolf, T. 2007. Structure Prediction in Temporal Networks using Frequent Subgraphs. In Proceedings of the 2007 IEEE Symposium on Computational Intelligence and Data Mining (CIDM’07). 35-42.
[15] 	Lin, Y., Chi, Y., Zhu, S., Sundaram, H., and Tseng, B. 2008. Facetnet: A Framework for Analyzing Communities and Their Evolutions in Dynamic Networks. In Proceedings of the 17th International Conference on the World Wide Web (WWW’08). 685-694.
[16] 	Liu, Z., Yu, J., Ke, Y., Lin, X., and Chen, L. 2008. Spotting significant changing subgraphs in evolving graphs. In Proceeding of the 8th IEEE International Conference on Data Mining (ICDM 2008). 917-922.
[17] 	Morchen, F., and Ultsch, A. 2007. Efficient Mining of Understandable Patterns from Multivariate Interval Time Series. Data Mining Knowledge Discovery 15(2) 181-215.
[18] 	Morchen, F., and Fradkin, D. 2010. Robust mining of time intervals with semi-interval partial order patterns. In: Proceedings of the SIAM International Conference on Data Mining (SDM’10). 315-326.
[19] 	Palla, G., Barabási, A., and Vicsek, T. 2007. Quantifying social group evolution. Nature 446, 664-667.
[20] 	Papapetrou, P., Kollios, G., Sclaroff, S., and Gunopulos, D. 2005. Disovering Frequent Arrangements of Temporal Intervals. In International Conference on Data Mining (ICDM’05). 354-361.
[21] 	Patel, D., Hsu, W., and Lee, M. 2008. Mining Relationships Among Interval-based Events for Classification. In Proceedings of the 2008 ACM SIGMOD International Conference on Management of Data, 393-404.
[22] 	Qin, G., Gao, L., Yang, J., and Li, J. 2011. Evolution Pattern Discovery in Dynamic Networks. In: IEEE International Conference on Signal Processing, Communications and Computing (ICSPCC’11).
[23] 	Sadasivam, R., and Duraiswamy, K. 2013. Efficient Approach to Discover Interval-based Sequential Patterns. Journal of Computer Science 9, 2, 225-234.
[24] 	Wackersreuther, B., Wackersreuther, P., Oswald, A., Bohm, C., and Borgwardt, K. 2010. Frequent Subgraph Discovery in Dynamic Networks. In: Proceedings of the 8th Workshop on Mining and Learning with Graphs (MLG’10), 155-162.
[25] 	Winarko, E., and Roddick, J. 2007. ARMADA-An algorithm for discovering richer relative temporal association rules from interval-based data. Data and Knowledge Engineering 63, 1. 76-90.
[26] 	Wu, S., and Chen, Y. 2007. Mining Nonambiguous Temporal Patterns for Interval-based Events. IEEE Transactions on Knowledge and Data Engineering 19, 6, 742-758.
[27] 	Wu, S., and Chen, Y. 2009. Discovering hybrid temporal patterns from sequences consisting of point- and interval-based events. Data & Knowledge Engineering 68, 11, 1309-1330.
[28] 	You, C., Holder, L., Cook, D. 2009. Learning Patterns in the Dynamics of Biological Networks. In: Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD’09).
[29] 	Manku, G. and Motwani, R. 2002. Approximate frequency counts over data streams. In Proceedings of the 28th international conference on Very Large Data Bases (VLDB'02), 346-357.
論文全文使用權限
校內
校內紙本論文立即公開
同意電子論文全文授權校園內公開
校內電子論文立即公開
校外
同意授權
校外電子論文立即公開

如有問題,歡迎洽詢!
圖書館數位資訊組 (02)2621-5656 轉 2487 或 來信