§ Browse Thesis Bibliographic Record
  
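System ID U0002-1706200517431900
DOI 10.6846/TKU.2005.00350
Thesis title (Chinese) 應用語法搜尋於電影採礦之設計
Thesis title (English) The Designing of a Syntax-based Retrieval System for Mining Movies
Thesis title (third language)
University Tamkang University (淡江大學)
Department (Chinese) 資訊工程學系碩士班
Department (English) Department of Computer Science and Information Engineering
Foreign degree: university
Foreign degree: college
Foreign degree: graduate institute
Academic year (ROC calendar) 93
Semester 2
Publication year (ROC calendar) 94
Student (Chinese) 張振富
Student (English) Chen-Fu Chang
Student ID 692191520
Degree Master's
Language Traditional Chinese
Second language
Oral defense date 2005-06-16
Number of pages 73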
Oral defense committee Advisor - 郭經華 (chkuo@mail.tku.edu.tw)
Member - 陳孟彰
Member - 劉遠楨
Member - 郭經華 (chkuo@mail.tku.edu.tw)
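Keywords (Chinese) 詞性加註 (POS tagging)
詞性還原 (lemmatization)
電影場景偵測 (movie scene detection)
索引建置 (index construction)
Keywords (English) POS tagging
Lemmatization
Movie scene change
Index construction
Keywords (third language)
Subject classification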
Abstract (Chinese, translated)
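The main purpose of this system is to provide a movie retrieval system that supports syntax queries. English teachers can use it to prepare teaching materials that show students grammatical constructions common in everyday speech. To support syntax queries, the movie subtitles must first be preprocessed: each subtitle is POS-tagged and lemmatized, and the resulting annotations are stored in XML (Extensible Markup Language) so that regular expressions can be matched against them. To return a complete movie clip containing each syntax search result, the system also implements scene detection with a simple image-similarity method.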
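The preprocessing and matching steps described above can be illustrated with a minimal sketch. It assumes a hypothetical XML layout (`<sentence>` and `<w>` elements with `pos` and `lemma` attributes), Penn Treebank style tags, and a word/lemma/POS serialization; none of these details are given in the abstract, so they should be read as assumptions rather than the thesis's actual format.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical annotated-subtitle format: element and attribute names are
# illustrative, not taken from the thesis. Tags assume the Penn Treebank set.
SUBTITLE_XML = """
<subtitles>
  <sentence start="00:01:05" end="00:01:07">
    <w pos="PRP" lemma="i">I</w>
    <w pos="VBP" lemma="have">have</w>
    <w pos="VBN" lemma="see">seen</w>
    <w pos="DT" lemma="that">that</w>
    <w pos="NN" lemma="movie">movie</w>
  </sentence>
  <sentence start="00:01:09" end="00:01:10">
    <w pos="PRP" lemma="she">She</w>
    <w pos="VBZ" lemma="like">likes</w>
    <w pos="NN" lemma="popcorn">popcorn</w>
  </sentence>
</subtitles>
"""

def flatten(sentence):
    """Serialize a sentence as space-separated word/lemma/POS tokens."""
    return " ".join(
        f"{w.text}/{w.get('lemma')}/{w.get('pos')}" for w in sentence.iter("w")
    )

def search(xml_text, pattern):
    """Return (start, end, text) for each sentence whose flattened form matches."""
    query = re.compile(pattern)
    hits = []
    for sentence in ET.fromstring(xml_text).iter("sentence"):
        if query.search(flatten(sentence)):
            text = " ".join(w.text for w in sentence.iter("w"))
            hits.append((sentence.get("start"), sentence.get("end"), text))
    return hits

# Present perfect: a present-tense form of "have" followed by a past participle.
PRESENT_PERFECT = r"\S+/have/VB[PZ] \S+/\S+/VBN"

print(search(SUBTITLE_XML, PRESENT_PERFECT))
```

Matching on the lemma rather than the surface word lets one pattern cover "have", "has", and similar inflected forms, which is presumably why lemmatization is part of the preprocessing.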

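When regular expressions serve as the query language, matching them takes considerable time. We therefore build an index over the movie subtitles to reduce the number of sentences the regular expression must be matched against. For index construction we use k-gram indexing, which mainly consists of k-gram segmentation, the useful index, and the presuf-free set. For movie scene detection, we use the similarity between two consecutive frames to decide whether a scene change has occurred.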
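The idea of using a k-gram index to cut down the number of sentences handed to the regular expression engine can be sketched as follows. This is a simplified illustration, not the thesis's implementation: it uses a fixed k, a plain selectivity cutoff as a stand-in for the "useful index" step, and omits the multi-gram and presuf-free refinements. The function names, the `max_selectivity` parameter, and the requirement that the caller supply a literal substring of the query are all assumptions made for the example.

```python
import re
from collections import defaultdict

def kgrams(text, k):
    """All distinct substrings of length k."""
    return {text[i:i + k] for i in range(len(text) - k + 1)}

def build_index(sentences, k=3, max_selectivity=0.5):
    """Map each kept k-gram to the ids of the sentences containing it.

    A gram is kept only when it occurs in at most max_selectivity of the
    sentences; more common grams filter too little to be worth storing.
    """
    postings = defaultdict(set)
    for sid, text in enumerate(sentences):
        for gram in kgrams(text, k):
            postings[gram].add(sid)
    limit = max_selectivity * len(sentences)
    return {g: ids for g, ids in postings.items() if len(ids) <= limit}

def candidates(index, n_sentences, literal, k=3):
    """Sentence ids that may contain `literal`, judged from indexed grams only."""
    result = set(range(n_sentences))
    for gram in kgrams(literal, k):
        # Pruned or absent grams are skipped; the later regex check
        # still guarantees that no false positives are returned.
        if gram in index:
            result &= index[gram]
    return result

def search(sentences, index, pattern, literal, k=3):
    """Run the regex only on sentences that survive the index lookup.

    Returns (matching ids, number of sentences actually scanned).
    """
    query = re.compile(pattern)
    ids = sorted(candidates(index, len(sentences), literal, k))
    return [sid for sid in ids if query.search(sentences[sid])], len(ids)

SENTENCES = [
    "I/i/PRP have/have/VBP seen/see/VBN it/it/PRP",
    "She/she/PRP likes/like/VBZ popcorn/popcorn/NN",
    "We/we/PRP have/have/VBP been/be/VBN here/here/RB",
    "He/he/PRP runs/run/VBZ fast/fast/RB",
]
INDEX = build_index(SENTENCES)
print(search(SENTENCES, INDEX, r"/have/VB[PZ] \S+/\S+/VBN", "/have/VB"))
```

In this toy corpus the lookup narrows four sentences to two before any regular expression is run. Dropping very common grams keeps the index small at the cost of a slightly larger candidate set.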

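During implementation, we compared the number of index entries and the search time for three cases: no index, the index obtained after k-gram segmentation, and the presuf-free index. Analysis of the experimental data confirms that the presuf-free index greatly reduces the number of index entries. As the number of index entries falls, the time spent on regular expression matching falls accordingly. K-gram index construction is therefore very important in this movie retrieval system, and it is the main contribution of this thesis to index construction for searching large volumes of data.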
Abstract (English)
This paper discusses how to build a movie retrieval system that can search for English grammatical constructions. English teachers can use the system to design teaching materials that give students examples of grammar used in daily life. To support grammar search in movies, the movie subtitles are processed before any user query: they are POS-tagged and lemmatized, and the resulting POS and lemma information is saved in XML format. To return a movie clip along with each syntax result, the system also detects movie scene changes using image similarity.

When regular expressions are used as the query language, pattern matching takes a long time. We therefore build an index over the movie subtitles to reduce the search time. For index construction we use k-gram indexing, which comprises multi-gram indexing, the useful index, and the presuf-free set. In addition, we use the similarity of two consecutive frames to detect scene changes.
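Scene change detection from the similarity of two consecutive frames can be sketched as below. The abstract does not say which similarity measure is used, so this example assumes histogram intersection over grayscale intensities. The bin count, the threshold of 0.5, and the representation of a frame as rows of 0-255 pixel values are likewise assumptions chosen for illustration.

```python
def histogram(frame, bins=8, max_value=256):
    """Normalized intensity histogram of a frame given as rows of pixel values."""
    counts = [0] * bins
    total = 0
    for row in frame:
        for pixel in row:
            counts[pixel * bins // max_value] += 1
            total += 1
    return [c / total for c in counts]

def similarity(frame_a, frame_b, bins=8):
    """Histogram intersection: 1.0 for identical histograms, 0.0 for disjoint ones."""
    ha, hb = histogram(frame_a, bins), histogram(frame_b, bins)
    return sum(min(a, b) for a, b in zip(ha, hb))

def scene_changes(frames, threshold=0.5):
    """Indices i where frame i is dissimilar enough from frame i-1 to start a new scene."""
    return [
        i for i in range(1, len(frames))
        if similarity(frames[i - 1], frames[i]) < threshold
    ]

# Tiny 2x2 grayscale frames: two dark frames followed by two bright ones.
dark = [[10, 12], [11, 13]]
dark2 = [[12, 14], [10, 15]]
bright = [[240, 250], [245, 235]]
print(scene_changes([dark, dark2, bright, bright]))
```

Given the scene boundaries, a subtitle hit can be expanded to the enclosing scene so that the returned clip starts and ends at natural cut points.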

To test the actual system, we compare the search time and the number of syntax results obtained with the full, complete, and presuf-free indices. After examining and analyzing the results, we conclude that constructing the k-gram index reduces both the number of index entries and the search time. In this paper, we show that constructing the k-gram index before users search makes a concrete contribution to the area of large database systems.
Abstract (third language)
Table of contents
Chapter 1	Introduction
1.1	Research Motivation and Objectives
1.2	Research Content
1.3	Outline of the Research Content

Chapter 2	Background and Related Work
2.1	XML Document Markup Language
2.2	XML Document Indexing Mechanisms
2.3	Part-of-Speech Tagging
2.4	Scene Detection
2.4.1	Shot Detection
2.4.2	Detection in Uncompressed Video Formats
2.4.3	Detection in Compressed Video Formats

Chapter 3	System Architecture Diagram and System Design
3.1	System Architecture
3.2	Design of K-gram Indexing
3.2.1	Multi-gram Indexing
3.2.2	Useful Index
3.2.3	Presuf-free Set
3.2.4	Retrieval Subsystem
3.3	Syntax Search
3.4	Movie Clip Detection
3.4.1	Image Similarity Computation

Chapter 4	Implementation and Discussion
4.1	System Features
4.2	Discussion of Combining Syntax Search with Scene Detection
4.3	Experimental Evaluation of K-gram Indexing

Chapter 5	Conclusions and Future Work
5.1	Conclusions
5.2	Future Work
References
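Full-text access rights
On campus
Print copy available on campus immediately
Electronic full text authorized for on-campus access
Electronic copy available on campus immediately
Off campus
Authorization granted
Electronic copy available off campus immediately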