System ID | U0002-1908201413162700 |
---|---|
DOI | 10.6846/TKU.2014.00741 |
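Thesis title (Chinese) | 基於時間序列探勘之適性化數位學習元件管理暨檢索機制 |
Thesis title (English) | An Adaptive Learning Object Management and Search Mechanism based on Time-Series Mining |
Thesis title (third language) | |
Institution | Tamkang University |
Department (Chinese) | 資訊工程學系博士班 |
Department (English) | Department of Computer Science and Information Engineering |
Foreign degree: university | |
Foreign degree: college | |
Foreign degree: graduate institute | |
Academic year | 102 |
Semester | 2 |
Year of publication | 103 |
Author (Chinese) | 嚴昱文 |
Author (English) | Yu-Wen Yen |
Student ID | 897410105 |
Degree | Doctorate |
Language | English |
Second language | |
Oral defense date | 2014-06-23 |
Pages | 57 |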
Oral defense committee | Advisor - 趙榮耀; Committee member - 洪啟舜; Committee member - 施國琛; Committee member - 許輝煌; Committee member - 顏淑惠; Committee member - 趙榮耀 |
Keywords (Chinese) | 使用者生成資料; 資料探勘; 資訊檢索; 時間序列; 社群網路分析; 數位學習 |
Keywords (English) | User-generated data; Data mining; Information retrieval; Time-series; Social network analysis; E-learning |
Keywords (third language) | |
Subject classification | |
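Abstract (translated from Chinese) |
In recent years, the rapid development of information technology has turned the World Wide Web into a platform for interaction. Although the participants in these interactions, namely users and their associated events, differ from one another in every respect, a large and complex volume of information can be expected with certainty. This phenomenon makes information management, retrieval, and reuse difficult, and also lowers the value of the information itself. In this thesis, we attempt to propose effective methods for managing user-generated data and its derivative information, and further try to draw on this experience to implement user-centric services. This thesis focuses on the meaningful management and reuse of user-generated data, especially in support of e-learning activities. First, we propose a state machine for managing user-generated data, used mainly to record explicitly the relations among such data and among its derivative information. To improve the accuracy of the data model, we further propose, on top of the state machine design, a time-series mining algorithm that processes the interactions among data within specific time intervals. Finally, on this basis, we implemented a database management system and a data retrieval service to reduce the complexity users face when searching for e-learning resources. We collected data created by 500 users over the past five years on the social media they use (e.g., Facebook and Twitter) and used it to evaluate performance and feasibility. The experimental results confirm that the proposed data processing method and retrieval service can effectively cope with the complexity of information retrieval in e-learning activities. |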
Abstract (English) |
Recent advances in information technology have turned the World Wide Web into the main platform for interactions that involve participants, namely users and their corresponding events. Although the participants vary from scenario to scenario, a considerable volume of data is generated. This complicates information retrieval, management, and reuse, and at the same time reduces the value of the data. In this thesis, we attempt to achieve efficient management of user-generated data and its derivative contexts in support of human users. This thesis concentrates on the meaningful reuse of user-generated data, especially its use for learning purposes, through an efficient and purpose-built data management process. First, an intelligent state machine, which is central to the processing of user-generated data, was developed to identify relations among data items and their derivative contexts, especially those that are frequently accessed and timely. To improve the accuracy of data correlation modeling, a temporal mining algorithm is then defined. This algorithm highlights each event in which a data item is accessed and examines the item's attributes relative to other correlated items. Last but not least, we present a conceptual scenario of human-centric search to demonstrate the proposed approach. Performance and feasibility are demonstrated by experiments on data collected from open social networks (e.g., Facebook and Twitter) over the past few years, comprising about 500 users and 8,000,000 items of content they shared. |
Abstract (third language) | |
Table of contents |
TABLE OF CONTENTS
CHAPTER I. Introduction 1
1.1 Background 2
1.2 Motivation and Contributions 4
1.3 Thesis Organization 5
CHAPTER II. Literature Review 7
2.1 Design and Applications of State Machine 8
2.2 Social Data Analysis and Extraction 10
2.3 Temporal Information Mining 12
2.4 Summary 14
CHAPTER III. Intelligent State Machine 15
3.1 Definition 16
3.2 Formulation of Intelligent State Machine 17
3.3 Execution of ISM 22
3.4 Exception Control in ISM 25
3.5 Quantification of Connections 32
3.5.1 Adding Temporal Information 32
3.5.2 Considering Usage (Co-usage) Information 33
CHAPTER IV. ISM-based Search 35
4.1 Facilitating the Search Process 36
4.1.1 The Weight Function 36
4.1.2 The Rank Function 38
4.2 Query Revision and Suggestion 39
CHAPTER V. The Experiments 43
5.1 The Data Set 44
5.2 Accuracy of ISM-based Data Management System 45
5.3 Reuse Rate of ISM-based System 48
5.4 Performance of ISM-based Search Support 50
CHAPTER VI. Conclusions 52
6.1 Summary of Thesis 53
6.2 Future Work 54
Bibliography 55
LIST OF FIGURES
Figure 1. Concept of data management 16
Figure 2. Basic Elements of ISM 18
Figure 3. A transition with empty event 25
Figure 4. A decision transition with non-empty event 25
Figure 5. Illustration of search scenario 39
Figure 6. A P-R performance comparison between ISM-empowered search system and Google Customized-based search system 51
LIST OF TABLES
Table 1. Algorithm of query revision 40
Table 2. Average and standard deviation on the accuracy of implemented classifiers (raw dataset) 45
Table 3. Average and standard deviation on the accuracy of implemented classifiers (pre-processed dataset) 46
Table 4. Average accuracy of classifier-in-parallel in an ISM-based system (pre-processed dataset) 46
Table 5. User feedbacks of applied search service 48 |
Full-text access rights |