Chueh Sheng Memorial Library, Tamkang University (TKU Library)

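System ID U0002-1908201413162700
Title (Chinese) 基於時間序列探勘之適性化數位學習元件管理暨檢索機制
Title (English) An Adaptive Learning Object Management and Search Mechanism based on Time-Series Mining
Institution Tamkang University
Department (Chinese) 資訊工程學系博士班
Department (English) Department of Computer Science and Information Engineering
Academic year 102 (ROC calendar)
Semester 2
Year of publication 103 (ROC calendar; 2014)
Author (Chinese name) 嚴昱文
Author (English name) Yu-Wen Yen
Student ID 897410105
Degree Doctoral
Language English
Oral defense date 2014-06-23
Number of pages 57
Committee Advisor: 趙榮耀
Keywords (Chinese) 使用者生成資料  資料探勘  資訊檢索  時間序列  社群網路分析  數位學習
Keywords (English) User-generated data  Data mining  Information retrieval  Time-series  Social network analysis  E-learning
Subject classification Applied Science; Information Engineering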
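Abstract (translated from Chinese) In recent years, the rapid development of information technology has turned the World Wide Web into a platform for interaction. Although the participants in these interactions, in particular users and the events associated with them, differ from one another in every respect, a large and complex volume of information can be expected with certainty. This phenomenon makes information harder to manage, obtain, and reuse, and it also reduces the value of the information itself. In this thesis, we attempt to propose effective methods for managing user-generated data and the information derived from it, and to draw on that experience to implement user-centric services.
This thesis focuses on the meaningful management and reuse of user-generated data, especially in support of e-learning activities. First, we propose a state machine for managing user-generated data, whose main purpose is to record explicitly the relations among such data and among the information derived from it. To improve the accuracy of the data model, we then propose, on top of the state machine design, a time-series mining algorithm that processes the interactions among data within a specific time interval. Finally, on this basis, we implement a database management system and a data retrieval service to reduce the complexity users face when searching for e-learning resources. We collected the data that 500 users created over the past five years on the social media they use (e.g., Facebook and Twitter) and used it to evaluate performance and feasibility. The experimental results confirm that the proposed data processing method and retrieval service effectively reduce the complexity of information retrieval in e-learning activities.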
Abstract (English) Recent advances in information technology have turned the World Wide Web into the main platform for interactions that involve participants, namely users and their corresponding events. Although the participants vary with the scenario, a considerable volume of data is generated. This phenomenon complicates information retrieval, management, and reuse, and at the same time diminishes the value of the data. In this thesis, we attempt to achieve efficient management of user-generated data and its derivative contexts in support of human users.
This thesis concentrates on the meaningful reuse of user-generated data, especially for learning purposes, through an efficient, purpose-built data management process. First, an intelligent state machine, which is central to the user-generated data processing scenario, was developed to identify relations among data and their derivative contexts, especially those that are accessed frequently and in a timely manner. To improve the accuracy of data correlation modeling, a temporal mining algorithm is then defined. This algorithm highlights the event in which a data item is accessed and further examines its attributes relative to other correlated items. Last but not least, we present a conceptual scenario of human-centric search to demonstrate the proposed approach. Performance and feasibility are demonstrated by experiments on data collected from open social networks (e.g., Facebook and Twitter) over the past few years, covering around 500 users and 8,000,000 items of content they shared.
CHAPTER I. Introduction 1
1.1 Background 2
1.2 Motivation and Contributions 4
1.3 Thesis Organization 5
CHAPTER II. Literature Review 7
2.1 Design and Applications of State Machine 8
2.2 Social Data Analysis and Extraction 10
2.3 Temporal Information Mining 12
2.4 Summary 14
CHAPTER III. Intelligent State Machine 15
3.1 Definition 16
3.2 Formulation of Intelligent State Machine 17
3.3 Execution of ISM 22
3.4 Exception Control in ISM 25
3.5 Quantification of Connections 32
3.5.1 Adding Temporal Information 32
3.5.2 Considering Usage (Co-usage) Information 33
CHAPTER IV. ISM-based Search 35
4.1 Facilitating the Search Process 36
4.1.1 The Weight Function 36
4.1.2 The Rank Function 38
4.2 Query Revision and Suggestion 39
CHAPTER V. The Experiments 43
5.1 The Data Set 44
5.2 Accuracy of ISM-based Data Management System 45
5.3 Reuse Rate of ISM-based System 48
5.4 Performance of ISM-based Search Support 50
CHAPTER VI. Conclusions 52
6.1 Summary of Thesis 53
6.2 Future Work 54
Bibliography 55

Figure 1. Concept of data management 16
Figure 2. Basic Elements of ISM 18
Figure 3. A transition with empty event 25
Figure 4. A decision transition with non-empty event 25
Figure 5. Illustration of search scenario 39
Figure 6. A P-R performance comparison between ISM-empowered search system and Google Customized-based search system 51

Table 1. Algorithm of query revision 40
Table 2. Average and standard deviation on the accuracy of implemented classifiers (raw dataset) 45
Table 3. Average and standard deviation on the accuracy of implemented classifiers (pre-processed dataset) 46
Table 4. Average accuracy of classifier-in-parallel in an ISM-based system (pre-processed dataset) 46
Table 5. User feedbacks of applied search service 48
References Aarts, F.; Jonsson, B.; Uijen, J. (2010) “Generating Models of Infinite-State Communication Protocols Using Regular Inference with Abstraction,” Testing Software and Systems, 6435, 188-204
Bose, I.; Mahapatra, R.K. (2001) “Business data mining – a machine learning perspective,” Information and Management, 39, 3, 211-225
Carpineto, C.; Osinski, S.; Romano, G.; Weiss, D. (2009) “A survey of Web clustering engines,” ACM Computing Surveys, 41, 3, 17
Cavalli, A.; Gervy, C.; Prokopenko, S. (2003) “New approaches for passive testing using an Extended Finite State Machine specification,” Information and Software Technology, 45, 12, 837-852
Chen, Y.; Dong, G.; Han, J.; Wah, B.W.; Wang, J. (2002) “Multi-dimensional regression analysis of time-series data streams,” Proceedings of the 28th international conference on Very Large Data Bases, 323-334
Cheng, K.T.; Krishnakumar, A.S. (1996) “Automatic generation of functional vectors using the extended finite state machine model,” ACM Transactions on Design Automation of Electronic Systems, 1, 1, 57-79
Culotta, A.; Bekkerman, R.; McCallum, A. (2004) “Extracting social networks and contact information from email and the Web,” In Proceedings of CEAS-1
Erickson, T.; Kellogg, W.A. (2000) “Social translucence: an approach to designing systems that support social processes,” ACM Transactions on Computer-Human Interaction, 7, 1, 59-83
Esling, P.; Agon, C. (2012) “Time-series data mining,” ACM Computing Surveys, 45, 1, 12
Fagni, T.; Perego, R.; Silvestri, F.; Orlando, S. (2006) “Boosting the performance of Web search engines: Caching and prefetching query results by exploiting historical usage data,” ACM Transactions on Information Systems, 24, 1, 51-78
Faloutsos, C.; McCurley, K.S.; Tomkins, A. (2004) “Fast discovery of connection subgraphs,” In Proc. ACM SIGKDD 2004
Gaber, M.M.; Zaslavsky, A.; Krishnaswamy, S. (2005) “Mining data streams: a review,” ACM SIGMOD, 34, 2, 18-26
Mangold, W.G.; Faulds, D.J. (2009) “Social media: The new hybrid element of the promotion mix,” Business Horizons, 52, 4, 357-365
Guralnik, V.; Srivastava, J. (1999) “Event detection from time series data,” Proceedings of the fifth ACM SIGKDD International conference on Knowledge discovery and data mining, 33-42
Harada, M.; Sato, S.; Kazama, K. (2004) “Finding authoritative people from the web,” In Proc. Joint Conference on Digital Libraries
Hoheisel, A.; Alt, M. (2007) “Petri Nets,” Workflows for e-Science, 190-207
Hong, J.E.; Bae, D.H. (2000) “Software modeling and analysis using a hierarchical object-oriented Petri net,” Information Sciences, 130, 1-4, 133-164
Jensen, K.; Kristensen, L.M.; Wells, L. (2007) “Coloured Petri Nets and CPN Tools for modelling and validation of concurrent systems,” International Journal on Software Tools for Technology Transfer, 9, 3, 213-254
Joachims, T.; Granka, L.; Pan, B.; Hembrooke, H.; Gay, G. (2005) “Accurately interpreting clickthrough data as implicit feedback,” Proceeding of the 28th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, 154-161
Karsai, M.; Kivela, M.; Pan, R.K.; Kaski, K.; Kertesz, J.; Barabasi, A.-L.; Saramaki, J. (2011) “Small but slow world: How network topology and burstiness slow down spreading,” Physical Review E, 83, 2
Keogh, E.; Kasetty, S. (2003) “On the Need for Time Series Data Mining Benchmarks: A Survey and Empirical Demonstration,” Data Mining and Knowledge Discovery, 7, 4, 349-371
Kozłowski, T.; Dagless, E.; Saul, J.; Adamski, M.; Szajna, J. (1995) “Parallel controller synthesis using Petri nets,” IEE Proceedings – Computers and Digital Techniques, 142, 4, 263-271
Lee, K.; Agrawal, A.; Choudhary, A. (2013) “Real-time disease surveillance using Twitter data: demonstration on flu and cancer,” Proceeding of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 1474-1477
Li, L.; Hadjicostis, C.N.; Sreenivas, R.S. (2008) “Designs of Bisimilar Petri Net Controllers With Fault Tolerance Capabilities,” IEEE Transactions on Systems, Man and Cybernetics, Part A: Systems and Humans, 38, 1, 207-217
Liu, B.; Liu, Y.K. (2002) “Expected value of fuzzy variable and fuzzy expected value models,” IEEE Transactions on Fuzzy Systems, 10, 4, 445-450
Mandal, S.N.; Choudhury, J.P.; Chaudhuri, S.R.B.; De, D. (2008) “Soft Computing Approach in Prediction of A Time Series Data,” Journal of Theoretical & Applied Information Technology, 4, 12, 1131-1141
Mika, P. (2005) “Ontologies are us: A unified model of social networks and semantics,” In Proc. ISWC2005
Milanovic, N.; Malek, M. (2004) “Current solutions for Web service composition,” IEEE Internet Computing, 8, 6, 51-59
Mitra, S.; Pal, S.K.; Mitra, P. (2002) “Data mining in soft computing framework: a survey,” IEEE Transactions on Neural Networks, 13, 1, 3-14
Pais, R.; Gomes, L.; Paulo Barros, J. (2011) “From UML State Machines to Petri nets: History Attribute Translation Strategies,” The 37th Annual Conference on IEEE Industrial Electronics Society, 3776-3781
Rocchio, J. (1971) “Relevance Feedback in Information Retrieval,” The SMART Retrieval System: Experiments in Automatic Document Processing, 313-323
Roya, M.; Chang, R.; Qi, X. (2007) “Learning From Relevance Feedback Sessions Using A K-Nearest-Neighbor-Based Semantic Repository,” Proc. of IEEE International Conference on Multimedia and Expo, 1994-1997
Salimifard, K.; Wright, M. (2001) “Petri net-based modeling of workflow systems: An overview,” European Journal of Operational Research, 134, 3, 664-676
Schadt, E.E.; Linderman, M.D.; Sorenson, J.; Lee, L.; Nolan, G.P. (2010) “Computational solutions to large-scale data management and analysis,” Nature Reviews Genetics, 11, 647-657
Shtykh, R.Y.; Jin, Q. (2011) “A human-centric integrated approach to web information search and sharing,” Human-centric Computing and Information Sciences, 1:2
Steyvers, M.; Tenenbaum, J.B. (2005) “The Large-Scale Structure of Semantic Networks: Statistical Analyses and a Model of Semantic Growth,” Cognitive Science, 29, 1, 41-78
Tay, F.E.H.; Cao, L. (2001) “Application of support vector machines in financial time series forecasting,” Omega, 29, 4, 309-317
Thelwall, M. (2001) “A web crawler design for data mining,” Journal of Information Science, 27, 5, 319-325
van der Aalst, W.M.P.; Song, M. (2004) “Mining Social Networks: Uncovering Interaction Patterns in Business Processes,” Business Process Management, LNCS 3080, 244-260
Yen, N.Y.; Shih, T.K.; Jin, Q. (2013) “LONET: An Interactive Search Network for Intelligent Path Generation,” ACM Transactions on Intelligent Systems and Technology, 4, 2, 30
Zhang, J.; Chang, C.K.; Chung, J.Y.; Kim, S.W. (2004) “WS-Net: a Petri-net based specification model for Web services,” IEEE International Conference on Web Services, 420-427
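  • The author consents to a royalty-free license allowing library patrons to reproduce the print copy for academic purposes; available from 2014-08-25.
  • The author consents to the electronic full-text browsing and printing service; available from 2014-08-25.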