淡江大學覺生紀念圖書館 (TKU Library)
進階搜尋


下載電子全文限經由淡江IP使用) 
系統識別號 U0002-0107200908560900
中文論文名稱 音樂資訊擷取之架構
英文論文名稱 A Framework for Music Information Retrieval
校院名稱 淡江大學
系所名稱(中) 資訊工程學系博士班
系所名稱(英) Department of Computer Science and Information Engineering
學年度 97
學期 2
出版年 98
研究生中文姓名 吳宏宣
研究生英文姓名 Hung-Hsuan Wu
學號 892190041
學位類別 博士
語文別 英文
口試日期 2009-06-23
論文頁數 53頁
口試委員 指導教授-林慧珍
委員-陳朝欽
委員-徐道義
委員-黃心嘉
委員-顏淑惠
委員-林慧珍
中文關鍵字 以內容為基礎的音樂擷取系統  最長子序列  幾何比對  反對稱性  二元搜尋樹  音樂相似度 
英文關鍵字 content-based music retrieval  longest common subsequence  geometric matching  antisymmetry  binary search tree  musical similarity 
學科別分類 學科別應用科學資訊工程
中文摘要 將一個短的查詢樂段 (query) 與一個較長的參考樂段 (reference) 做比對,並依照查詢與參考樂段的相關性,建立出相關參考樂段之排名,即為以內容為基礎的音樂擷取系統的目的。這種系統的效能取決於其採用的音樂比對 (matching) 法。以內容為基礎的音樂擷取系統中,最常使用的比對技術為字串比對 (string-based matching) 與幾何比對 (geometric matching) 兩種,此兩種比對方式在最近幾年被廣泛的探討。前者比對方式較具彈性且快速,然而後者則具較高之正確率。
在本論文中,我們研究上述兩種比對技術,分別改善其問題並提出一字串比對的改良版本,稱之為 RLCS法,與幾何比對的改良版本。
最長共同子序列 (longest common subsequence: LCS) 為以字串為基礎的比對中最常見的一種相似度量測方式,因為它極具彈性且有效率,但它僅反應出兩序列的全域相似度,並不適合前述的內容為基礎的音樂擷取系統。因此我們提出一個 LCS 的變化版本,rough longest common subsequence (RLCS),並利用RLCS以及我們所定義的參考樂段跨越寬度(width-across-reference,WAR) 與查詢樂段跨越寬度 (width-across-query,WAQ) 來計算兩序列的區域相似度。
我們亦提出改進D. Ó. Maidín [20] 所提的幾何比對方法。首先使用音程 (pitch interval) 取代音高 (pitch) 以達成音調不變性 (transposition invariance) ,同時避免在幾何比對時,搜尋最佳位置所需的垂直方向移動。除此我們還提出一個「分支與剪裁」(branch-and-prune)的機制用以加速搜尋。之後我們將音長 (duration) 特徵以音長比 (duration ratio) 取代之,以達成速度不變性 (tempo invariance)。無論如何,使用音程與音長比為特徵時,當一個音符的音高或音長改變時,會影響兩個連續音符之音程或音長比值,我們稱此問題為「反對稱性」(antisymmetry)。我們亦提出偵測與降低此反對稱性問題的方法。最後我們提出以平衡二元搜尋樹 (balanced binary search tree)來改進幾何比對演算法之時間複雜度。
英文摘要 The aim of a content-based music retrieval system is to match a short query melody against a longer reference melody to establish and rank the relevant references according to the similarity measurement. The performance of such a system is heavily dependent on the matching method adopted. Techniques of string-based matching and geometric matching, the most commonly used in content-based music retrieval systems, have been explored in the past few years. The former is more flexible and more efficient than the latter, and the latter has higher accuracy rate than the former.
In this dissertation, we investigate the improvement of these two matching techniques and propose an improved version of the string-based matching method, called RLCS, and an improved version of the geometric matching method.
The longest common subsequence (LCS) is commonly used for similarity measure in string-based matching techniques due to its flexibility and efficiency, but it reflects only the global similarity between two musical sequences. We propose a variant of the LCS, called the Rough LCS (RLCS), with which and the values of width-across-reference (WAR) and width-across-query (WAQ) we define and evaluate the local similarity between two musical sequences. We also improve the geometric matching proposed by D. Ó. Maidín [20] by the use of pitch interval instead of absolute pitch, to achieve transposition (key) invariance and avoid the vertical shifting required in the search of the best matching, required by D. Ó. Maidín’s method. We then further speed up the proposed method with the aid of a branch-and-prune mechanism. To achieve tempo invariance, we replace the feature of duration with duration ratio. However, the use of pitch interval and duration ratio might encounter antisymmetry problems caused by the change of pitch and the change of duration, respectively. Thus, we also propose to detect and reduce such problems. Finally, an efficient matching mechanism with the aid of a balanced binary search tree is proposed.
論文目次 Table of Contents
List of Figures VI
List of Tables VIII
Chapter 1 Introduction 1
Chapter 2 Related Works 3
2.1 String-Based Matching 3
2.2 Geometric Matching 4
Chapter 3 Rough Longest Common Subsequence 7
3.1 LCS and RLCS 7
3.2 RLCS for Music Matching 9
Chapter 4 Geometric Matching 14
4.1 Geometric Matching 14
4.1.1 Pitch-Duration Geometric Matching by Aloupis et al. 15
4.1.2 Pitch Interval-Duration Geometric Matching 17
4.1.3 Branch and Prune 23
4.2 Solving Antisymmetry Problems 27
4.2.1 Solving Antisymmetry Problem for Pitch Interval (SAP-PI) 27
4.2.2 Solving Antisymmetry Problem for Duration Ratio (SAP-DR) 29
4.2.3 Simultaneously Solving the Two Antisymmetry Problems (SAP-PIDR) 30
4.3 Geometric Matching Using A Binary Search Tree 32
4.3.1 Binary Search Tree Construction 32
4.3.2 Tree Update 34
Chapter 5 Experimental Results 38
5.1 Results of RLCS 38
5.2 Results of Geometric Matching 41
5.2.1 Results of Pitch Interval-Duration Geometric Matching 41
5.2.2 Results of Solution of Antisymmetry Problems 44
5.3 Comparison of Rough LCS and Geometric Matching 47
Chapter 6 Conclusions and Future Works 49
References 51

List of Figures
Figure 1. Examples of two matches: (a). The LCS of X and Y is < M, A, T, H, T >, with the minimum ranges across X and Y: < M, A, T, H, E, M, A, T > and < M, A, T, H, T >, respectively, (b). the LCS of X and Z is < M, T, H, E, S >, with the minimum ranges across X and Z: < M, A, T, H, E, M, A, T, I, C, S > and < M, O, T, H, E, R, S >, respectively. 9
Figure 2. A matching example: R = < M, A, T, H, E, M, A, T, I, C, S > and Q = < M, O, T, H, E, R >. 12
Figure 3. (a). A fragment of Mozart’s Variations on 'Ah, vous dirai–je, maman', K. 265, (b). Orthogonal chain representation of (a) 15
Figure 4. The region between two melodies is partitioned into 7 rectangles 17
Figure 5. The PID sequence of query Q is moved from left to right to search for the minimum pitch interval area 19
Figure 6. (a). The initial state of two PID sequences with step[1] = 2, step[2] = step[3] = step[4] = step[5] = 1, shift-step = 1, (b). The query in (a) is shifted by 1 unit 22
Figure 7. (a). Query of a folk song, (b). A reference to be matched, (c). The sketch of A(δ) for the query against the reference 24
Figure 8. (a). The PID sequences of an original query, a variant query and a reference, (b). the antisymmetry effect occurs at the second and third rectangles; If δ = 0.5, the total area would be reduced from A = 4.5 to A’ = 2.25. 29
Figure 9. (a). Query Q, (b). query Q’ with duration replacement in the third note, (c). reference R. 31
Figure 10. (a). The area of the region enclosed by query Q and reference R is A = 29.5, (b). the area of the region enclosed by query Q’ and reference R is A’ = 0. 31
Figure 11. The region between the query and the reference was partitioned into 7 rectangles. 33
Figure 12. (a) A shrinking rectangle Ci. (b) The rectangle Ci in (a) is vanishing and the rectangle Cj with zero area is created after the query makes a move of step size . (c) The rectangle Cj in (b) grows into a new rectangle Ck after another move of step size . 35
Figure 13. (a) The region between the query and the reference is partitioned into 7 rectangles. (b) After a move, the rectangle C3 and C5 are vanishing, and their neighboring rectangles change types. 36
Figure 14. The average retrieval rank for the noisy query set over different settings for α = 0.8. 39
Figure 15. Retrieval rates on top-n lists for RLCS and Local-Alignment tested on (a) the noisy query set and (b) the rearranged query set. 41
Figure 16. Time cost required by the four algorithms tested on the original query set 42
Figure 17. Time cost required by the four algorithms tested on the variant query set 42
Figure 18. Retrieval rates on top-n lists for pitch/pitch-interval geometric matching algorithms with/without branch-and prune mechanism tested on the variant query set 43
Figure 19. The average retrieval rank for the first variant version of the query set over various settings for δ. 44
Figure 20. The retrieval rates for top-n lists by two algorithms. 45
Figure 21. The average retrieval rank for the second variant version of the query set over various settings for λ. 45
Figure 22. The retrieval rates for top-n lists of two algorithms. 46
Figure 23. The retrieval rates for top-n lists of three algorithms. 46
Figure 24. The retrieval rates for top-n lists of SAP-PIDR and RLCS-PI (on the rearranged query set). 48
Figure 25. The retrieval rates for top-n lists of SAP-PIDR and RLCS-PI (on the noisy query set). 48

List of Tables
Table 1. Shifting step δ and corresponding area A(δ) for each step of coarse alignment 27
Table 2. Average rank yielded by two algorithms over two query sets 40
Table 3. Comparison of average time costs (in seconds) required by the four algorithms on two query sets 42
Table 4. Comparison of average time costs (in seconds) required by the two algorithms on two query sets 47
參考文獻 [1] L. Allison and T. I. Dix, “A bit-string longest common subsequence algorithm,” Information Processing Letters, Vol. 23(5), 1986, pp. 305–310.
[2] G. Aloupis, T. Fevens, S. Langerman, T. Matsui, A. Mesa, Y. Nunez, D. Rappaport, and G. Toussaint, “Algorithms for computing geometric measures of melodic similarity,” Computer Music Journal, Vol. 30(3), 2006, pp. 67-76.
[3] D. Bainbridge, M. Dewsnip, and I. H. Witten, “Searching digital music libraries,” Information Processing and Management , Vol. 41(1), 2005, pp. 41-56.
[4] H. Barlow and S. Morgenstern, A dictionary of musical themes, Crown Publishers, Inc., New York, 1975.
[5] D. Byrd and T. Crawford, “Problems of music information retrieval in the real world,” Information Processing and Management, Vol. 38(2), 2002, pp. 249-272.
[6] M. A. Casey, R. Veltkamp, M. Goto, M. Leman, C. Rhodes, and M. Slaney, “Content-Based Music Information Retrieval: Current Directions and Future Challenges,” in Proceedings of the IEEE, Vol. 96, Issue 4, 2008, pp. 668-696.
[7] R. Clifford, M. Christodoulakis, T. Crawford, D. Meredith, and G. Wiggins, “A fast, randomised, maximal subset matching algorithm for document-level music retrieval,” in Proceedings of the 7th International Conference on Music Information Retrieval, 2006, pp. 150-155.
[8] T. Cormen, C. Leiserson, R. Rivest, and C. Stein, Introduction to Algorithms, 2nd Ed., the MIT Press and MacGraw-Hill Book Company, 2001.
[9] M. Crochemore, C. S. Iliopoulos, Y. J. Pinzon, and J. F. Reid, “A fast and practical bit-vector algorithm for the Longest Common Subsequence problem,” Information Processing Letters, Vol. 80(6), 2001, pp. 279–285.

[10] S. Doraisamy, and S. Rüger, “A Polyphonic Music Retrieval System Using N-grams,” in Proceedings of the Fourth International Conference on Music Information Retrieval, 2004, pp. 204–209.
[11] S. Downe and M. Nelson, “Evaluation of a simple and effective music information retrieval method,” in Proceedings of the 23rd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, 2000, pp. 73–80.
[12] C. Francu and C. G. Nevill-Manning, “Distance metrics and indexing strategies for a digital library of popular music,” in Proceedings of the IEEE International Conference on Multimedia and EXPO, 2000, pp. 889–892.
[13] A. Guo and H. Siegelmann, “Time-warped longest common subsequence algorithm for music retrieval,” in: Proceedings of the Fifth International Conference on Music Information Retrieval, 2004, pp. 10–15.
[14] J.-S. Roger Jang and Hong-Ru Lee, “Hierarchical filtering method for content-based music retrieval via acoustic input,” in Proceedings of the 9th ACM Multimedia Conference, 2001, pp. 401–410.
[15] Joseph Kerman, Listen, Worth Publishers, 1976.
[16] K. Lemström and E. Ukkonen, “Including interval encoding into edit distance based music comparison and retrieval,” in Proceedings of AISB, 2000, pp. 53–60.
[17] H. J. Lin, H. H. Wu, and Y. T. Kao, “Geometric Measures of Distance Between Two Pitch Contour Sequences,” Journal of Computers, Vol. 19(2), 2008, pp. 55–66.
[18] H. J. Lin and H. H. Wu, “Efficient Geometric Measure of Music Similarity,” Information Processing Letters, Vol. 109(2), 2008, pp. 116–120.
[19] A. Lubiw and L. Tanur, “Pattern matching in polyphonic music as a weighted geometric translation problem,” in Proceedings of the 5th International Conference on Music Information Retrieval, 2004, pp. 289–296.
[20] D. Ó. Maidín, “A geometrical algorithm for melodic difference,” Computing in Musicology, Vol. 11, 1998, pp. 65–72.
[21] M. Mongeau and D. Sankoff, “Comparison of musical sequences,” Computers and the Humanities, Vol. 24, 1990, pp. 161–175.
[22] S. Rho and E. Hwang, “FMF: Query adaptive melody retrieval system,” Journal of Systems and Software, Vol. 79(1), 2006, pp. 43–56.
[23] M. Robine, P. Hanna, and P. Ferraro, “Music Similarity: Improvements of Edit-Based Algorithms by Considering Music Theory,” in Proceedings of the ACM SIGMM International Workshop on Multimedia Information Retrieval, 2007, pp. 135–141.
[24] L. A. Smith, R. J. McNab, and I. H. Witten, “Sequence-based melodic comparison: A dynamic programming approach,” Computing in Musicology, Vol. 11, 1998, pp. 101–117.
[25] T. F. Smith and M. S. Waterman, “Identification of common molecular subsequences,” Journal of Molecular Biology, Vol. 147, 1981, pp. 195–197.
[26] I. S. H. Suyoto and A. L. Uitdenbogerd, “Effectiveness of Note Duration Information for Music Retrieval,” in Proceedings of the 10th Database Systems for Advanced Applications Conference, 2005, pp. 265–275.
[27] R. Typke, F. Wiering, and R. C. Veltkamp, “Transportation distances and human perception of melodic similarity,” ESCOM Musicae Scientiae, Similarity Perception in Listening to Music, 2007, pp. 153–181.
[28] A. Uitdenbogerd and J. Zobel, “Manipulation of music for melody matching,” in Proceedings of the ACM Multimedia Conference, Bristol, UK, September 1998, pp. 235–240.
[29] F. Wiering, R. Typke, and R.C. Veltkamp, “Transportation distances and their application in music-notation retrieval,” Computing in Musicology, Vol. 13, 2004, pp. 113-128.
論文使用權限
  • 同意紙本無償授權給館內讀者為學術之目的重製使用,於2011-07-13公開。
  • 同意授權瀏覽/列印電子全文服務,於2011-07-13起公開。


  • 若您有任何疑問,請與我們聯絡!
    圖書館: 請來電 (02)2621-5656 轉 2281 或 來信