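System ID | U0002-2308202112005700 |
---|---|
DOI | 10.6846/TKU.2021.00612 |
Thesis title (Chinese) | 基於PrefixSpan演算法的時間序列特徵模式社群網路探勘 |
Thesis title (English) | Social network mining of time sequence pattern base on PrefixSpan algorithm |
Thesis title (third language) | |
Institution | 淡江大學 (Tamkang University) |
Department (Chinese) | 資訊工程學系碩士班 |
Department (English) | Department of Computer Science and Information Engineering |
Foreign degree: school name | |
Foreign degree: college name | |
Foreign degree: graduate institute name | |
Academic year | 109 (ROC calendar) |
Semester | 2 |
Publication year | 110 (ROC calendar; 2021) |
Student (Chinese) | 翁愷澤 |
Student (English) | Kai-Ze Weng |
Student ID | 607410205 |
Degree | Master's |
Language | Traditional Chinese |
Second language | |
Oral defense date | 2021-07-20 |
Number of pages | 32 |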
Oral defense committee |
Advisor - 王英宏 (inhon@mail.tku.edu.tw); Committee member - 陳以錚 (ycchen@mgt.ncu.edu.tw); Committee member - 惠霖 (121678@mail.tku.edu.tw) |
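Keywords (Chinese) |
資料探勘 (data mining); 時序性資料 (time-series data); 社群網路 (social network); 特徵模式探勘 (feature pattern mining) |
Keywords (English) |
Data Mining; Time Sequence Data; Social Network; Sequence Pattern Mining |
Keywords (third language) | |
Subject classification | |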
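Abstract (translated from Chinese) |
Time-series data is a common and massive type of data; new data is generated and recorded at every moment. Social networks are online virtual communities that have arisen in recent years with the rapid development of the Internet. As a result, they have become an almost indispensable part of our lives, and the network of relationships among people has become another major research topic. The links and interactions among the users of a community form a large and complex social network, which records every user's activity status, posting history, interactions with other users, and the times at which activities occur. How to filter and process these large, disorderly collections of social network time-series data effectively and quickly is the focus of this study. The concept proposed here combines data analysis with social network datasets: a large volume of time-series data is filtered to remove redundant information, start and end time points are tagged, and the PrefixSpan algorithm is then used to find frequently occurring user temporal sequences, whose output is recorded. |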
Abstract (English) |
Time-sequence data is a common and voluminous data type: new data is generated and recorded every second. Social networks are virtual online communities that have emerged in recent decades, and they have become part of our daily life. The connections between people have in turn become an important research topic. The interactions and connections among users form a large and complicated social network, which records each user's activity status, posts, interactions with other users, and the time of each interaction. Given such large and messy social network sequence datasets, how can we filter and process them efficiently? We propose an approach in which a large volume of time-sequence data is first filtered to remove superfluous information, then tagged with start and end time points. The PrefixSpan algorithm is then used to find frequently occurring user time sequences, which are recorded for further use. |
Abstract (third language) | |
Table of contents |
Chinese abstract; English abstract; List of figures; List of tables; Chapter 1. Introduction: 1.1. Foreword, 1.2. Thesis organization; Chapter 2. Literature review: 2.1. Data mining, 2.2. Temporal patterns, 2.3. Social networks; Chapter 3. System architecture; Chapter 4. Experimental design and results: 4.1. Experimental data settings, 4.2. Experimental results; Chapter 5. Conclusion; References; Appendix: English paper. List of figures: Figure 1. Allen’s 13 Temporal Pattern; Figure 2. System architectures; Figure 3. Program Describe; Figure 4. Algorithm of Pre-Processing; Figure 5. Algorithm of Add Time Tag; Figure 6. Algorithm of PrefixSpan; Figure 7. User Data After Filter; Figure 8. Pre-PrefixSpan result; Figure 9. Result After PrefixSpan; Figure 10. Final Result of Frequent User List. List of tables: Table 1. Example of PrefixSpan; Table 2. Program Describe; Table 3. User sample Datasets |
Full-text access rights |
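
The pipeline outlined in the abstract (tag each activity with its start and end time points, then mine frequent sequences with PrefixSpan) can be illustrated with a minimal sketch. It assumes each user's activity is a list of `(label, start, end)` intervals and that sequences contain single items rather than itemsets. The function names `tag_endpoints` and `prefixspan`, the `+`/`-` token suffixes, and the sample data are hypothetical and are not taken from the thesis; the thesis's own filtering and tagging steps may differ.

```python
from collections import defaultdict


def tag_endpoints(intervals):
    """Turn (label, start, end) intervals into a time-ordered list of
    endpoint tokens, e.g. 'post+' for a start and 'post-' for an end.
    The '+'/'-' encoding is an assumption made for this sketch."""
    points = []
    for label, start, end in intervals:
        points.append((start, 0, label + "+"))  # 0 sorts starts before ends
        points.append((end, 1, label + "-"))
    return [token for _, _, token in sorted(points)]


def prefixspan(sequences, min_support):
    """Return {pattern tuple: support} for every subsequence that occurs
    in at least min_support of the input sequences."""
    results = {}

    def mine(prefix, projected):
        # projected holds (sequence index, start position) pairs: the
        # suffix of each sequence that follows the current prefix.
        counts = defaultdict(int)
        for seq_idx, pos in projected:
            for item in set(sequences[seq_idx][pos:]):
                counts[item] += 1  # count each sequence at most once
        for item, support in sorted(counts.items()):
            if support < min_support:
                continue
            pattern = prefix + (item,)
            results[pattern] = support
            # Project on the first occurrence of item in each suffix.
            next_projected = []
            for seq_idx, pos in projected:
                try:
                    hit = sequences[seq_idx].index(item, pos)
                except ValueError:
                    continue
                next_projected.append((seq_idx, hit + 1))
            mine(pattern, next_projected)

    mine((), [(i, 0) for i in range(len(sequences))])
    return results


# Hypothetical activity logs for three users: (label, start, end).
user_a = [("post", 1, 3), ("chat", 2, 5)]
user_b = [("post", 0, 2), ("chat", 1, 4)]
user_c = [("chat", 0, 1), ("post", 2, 3)]

sequences = [tag_endpoints(u) for u in (user_a, user_b, user_c)]
patterns = prefixspan(sequences, min_support=3)
for pattern, support in sorted(patterns.items()):
    print(pattern, support)
```

Encoding each interval as two endpoint tokens lets an ordinary sequential-pattern miner see whether activities overlap or follow one another, which is why the tagging step comes before PrefixSpan in this sketch.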