
系統識別號 
U00020107200908560900 
中文論文名稱

音樂資訊擷取之架構 
英文論文名稱

A Framework for Music Information Retrieval 
校院名稱 
淡江大學 
系所名稱(中) 
資訊工程學系博士班 
系所名稱(英) 
Department of Computer Science and Information Engineering 
學年度 
97 
學期 
2 
出版年 
98 
研究生中文姓名 
吳宏宣 
研究生英文姓名 
HungHsuan Wu 
學號 
892190041 
學位類別 
博士 
語文別 
英文 
口試日期 
20090623 
論文頁數 
53頁 
口試委員 
指導教授林慧珍 委員陳朝欽 委員徐道義 委員黃心嘉 委員顏淑惠 委員林慧珍

中文關鍵字 
以內容為基礎的音樂擷取系統
最長子序列
幾何比對
反對稱性
二元搜尋樹
音樂相似度

英文關鍵字 
contentbased music retrieval
longest common subsequence
geometric matching
antisymmetry
binary search tree
musical similarity

學科別分類 
學科別＞應用科學＞資訊工程

中文摘要 
將一個短的查詢樂段 (query) 與一個較長的參考樂段 (reference) 做比對，並依照查詢與參考樂段的相關性，建立出相關參考樂段之排名，即為以內容為基礎的音樂擷取系統的目的。這種系統的效能取決於其採用的音樂比對 (matching) 法。以內容為基礎的音樂擷取系統中，最常使用的比對技術為字串比對 (stringbased matching) 與幾何比對 (geometric matching) 兩種，此兩種比對方式在最近幾年被廣泛的探討。前者比對方式較具彈性且快速，然而後者則具較高之正確率。
在本論文中，我們研究上述兩種比對技術，分別改善其問題並提出一字串比對的改良版本，稱之為 RLCS法，與幾何比對的改良版本。
最長共同子序列 (longest common subsequence: LCS) 為以字串為基礎的比對中最常見的一種相似度量測方式，因為它極具彈性且有效率，但它僅反應出兩序列的全域相似度，並不適合前述的內容為基礎的音樂擷取系統。因此我們提出一個 LCS 的變化版本，rough longest common subsequence (RLCS)，並利用RLCS以及我們所定義的參考樂段跨越寬度(widthacrossreference，WAR) 與查詢樂段跨越寬度 (widthacrossquery，WAQ) 來計算兩序列的區域相似度。
我們亦提出改進D. Ó. Maidín [20] 所提的幾何比對方法。首先使用音程 (pitch interval) 取代音高 (pitch) 以達成音調不變性 (transposition invariance) ，同時避免在幾何比對時，搜尋最佳位置所需的垂直方向移動。除此我們還提出一個「分支與剪裁」(branchandprune)的機制用以加速搜尋。之後我們將音長 (duration) 特徵以音長比 (duration ratio) 取代之，以達成速度不變性 (tempo invariance)。無論如何，使用音程與音長比為特徵時，當一個音符的音高或音長改變時，會影響兩個連續音符之音程或音長比值，我們稱此問題為「反對稱性」(antisymmetry)。我們亦提出偵測與降低此反對稱性問題的方法。最後我們提出以平衡二元搜尋樹 (balanced binary search tree)來改進幾何比對演算法之時間複雜度。 
英文摘要 
The aim of a contentbased music retrieval system is to match a short query melody against a longer reference melody to establish and rank the relevant references according to the similarity measurement. The performance of such a system is heavily dependent on the matching method adopted. Techniques of stringbased matching and geometric matching, the most commonly used in contentbased music retrieval systems, have been explored in the past few years. The former is more flexible and more efficient than the latter, and the latter has higher accuracy rate than the former.
In this dissertation, we investigate the improvement of these two matching techniques and propose an improved version of the stringbased matching method, called RLCS, and an improved version of the geometric matching method.
The longest common subsequence (LCS) is commonly used for similarity measure in stringbased matching techniques due to its flexibility and efficiency, but it reflects only the global similarity between two musical sequences. We propose a variant of the LCS, called the Rough LCS (RLCS), with which and the values of widthacrossreference (WAR) and widthacrossquery (WAQ) we define and evaluate the local similarity between two musical sequences. We also improve the geometric matching proposed by D. Ó. Maidín [20] by the use of pitch interval instead of absolute pitch, to achieve transposition (key) invariance and avoid the vertical shifting required in the search of the best matching, required by D. Ó. Maidín’s method. We then further speed up the proposed method with the aid of a branchandprune mechanism. To achieve tempo invariance, we replace the feature of duration with duration ratio. However, the use of pitch interval and duration ratio might encounter antisymmetry problems caused by the change of pitch and the change of duration, respectively. Thus, we also propose to detect and reduce such problems. Finally, an efficient matching mechanism with the aid of a balanced binary search tree is proposed.

論文目次 
Table of Contents
List of Figures VI
List of Tables VIII
Chapter 1 Introduction 1
Chapter 2 Related Works 3
2.1 StringBased Matching 3
2.2 Geometric Matching 4
Chapter 3 Rough Longest Common Subsequence 7
3.1 LCS and RLCS 7
3.2 RLCS for Music Matching 9
Chapter 4 Geometric Matching 14
4.1 Geometric Matching 14
4.1.1 PitchDuration Geometric Matching by Aloupis et al. 15
4.1.2 Pitch IntervalDuration Geometric Matching 17
4.1.3 Branch and Prune 23
4.2 Solving Antisymmetry Problems 27
4.2.1 Solving Antisymmetry Problem for Pitch Interval (SAPPI) 27
4.2.2 Solving Antisymmetry Problem for Duration Ratio (SAPDR) 29
4.2.3 Simultaneously Solving the Two Antisymmetry Problems (SAPPIDR) 30
4.3 Geometric Matching Using A Binary Search Tree 32
4.3.1 Binary Search Tree Construction 32
4.3.2 Tree Update 34
Chapter 5 Experimental Results 38
5.1 Results of RLCS 38
5.2 Results of Geometric Matching 41
5.2.1 Results of Pitch IntervalDuration Geometric Matching 41
5.2.2 Results of Solution of Antisymmetry Problems 44
5.3 Comparison of Rough LCS and Geometric Matching 47
Chapter 6 Conclusions and Future Works 49
References 51
List of Figures
Figure 1. Examples of two matches: (a). The LCS of X and Y is < M, A, T, H, T >, with the minimum ranges across X and Y: < M, A, T, H, E, M, A, T > and < M, A, T, H, T >, respectively, (b). the LCS of X and Z is < M, T, H, E, S >, with the minimum ranges across X and Z: < M, A, T, H, E, M, A, T, I, C, S > and < M, O, T, H, E, R, S >, respectively. 9
Figure 2. A matching example: R = < M, A, T, H, E, M, A, T, I, C, S > and Q = < M, O, T, H, E, R >. 12
Figure 3. (a). A fragment of Mozart’s Variations on 'Ah, vous dirai–je, maman', K. 265, (b). Orthogonal chain representation of (a) 15
Figure 4. The region between two melodies is partitioned into 7 rectangles 17
Figure 5. The PID sequence of query Q is moved from left to right to search for the minimum pitch interval area 19
Figure 6. (a). The initial state of two PID sequences with step[1] = 2, step[2] = step[3] = step[4] = step[5] = 1, shiftstep = 1, (b). The query in (a) is shifted by 1 unit 22
Figure 7. (a). Query of a folk song, (b). A reference to be matched, (c). The sketch of A(δ) for the query against the reference 24
Figure 8. (a). The PID sequences of an original query, a variant query and a reference, (b). the antisymmetry effect occurs at the second and third rectangles; If δ = 0.5, the total area would be reduced from A = 4.5 to A’ = 2.25. 29
Figure 9. (a). Query Q, (b). query Q’ with duration replacement in the third note, (c). reference R. 31
Figure 10. (a). The area of the region enclosed by query Q and reference R is A = 29.5, (b). the area of the region enclosed by query Q’ and reference R is A’ = 0. 31
Figure 11. The region between the query and the reference was partitioned into 7 rectangles. 33
Figure 12. (a) A shrinking rectangle Ci. (b) The rectangle Ci in (a) is vanishing and the rectangle Cj with zero area is created after the query makes a move of step size . (c) The rectangle Cj in (b) grows into a new rectangle Ck after another move of step size . 35
Figure 13. (a) The region between the query and the reference is partitioned into 7 rectangles. (b) After a move, the rectangle C3 and C5 are vanishing, and their neighboring rectangles change types. 36
Figure 14. The average retrieval rank for the noisy query set over different settings for α = 0.8. 39
Figure 15. Retrieval rates on topn lists for RLCS and LocalAlignment tested on (a) the noisy query set and (b) the rearranged query set. 41
Figure 16. Time cost required by the four algorithms tested on the original query set 42
Figure 17. Time cost required by the four algorithms tested on the variant query set 42
Figure 18. Retrieval rates on topn lists for pitch/pitchinterval geometric matching algorithms with/without branchand prune mechanism tested on the variant query set 43
Figure 19. The average retrieval rank for the first variant version of the query set over various settings for δ. 44
Figure 20. The retrieval rates for topn lists by two algorithms. 45
Figure 21. The average retrieval rank for the second variant version of the query set over various settings for λ. 45
Figure 22. The retrieval rates for topn lists of two algorithms. 46
Figure 23. The retrieval rates for topn lists of three algorithms. 46
Figure 24. The retrieval rates for topn lists of SAPPIDR and RLCSPI (on the rearranged query set). 48
Figure 25. The retrieval rates for topn lists of SAPPIDR and RLCSPI (on the noisy query set). 48
List of Tables
Table 1. Shifting step δ and corresponding area A(δ) for each step of coarse alignment 27
Table 2. Average rank yielded by two algorithms over two query sets 40
Table 3. Comparison of average time costs (in seconds) required by the four algorithms on two query sets 42
Table 4. Comparison of average time costs (in seconds) required by the two algorithms on two query sets 47 
參考文獻 
[1] L. Allison and T. I. Dix, “A bitstring longest common subsequence algorithm,” Information Processing Letters, Vol. 23(5), 1986, pp. 305–310.
[2] G. Aloupis, T. Fevens, S. Langerman, T. Matsui, A. Mesa, Y. Nunez, D. Rappaport, and G. Toussaint, “Algorithms for computing geometric measures of melodic similarity,” Computer Music Journal, Vol. 30(3), 2006, pp. 6776.
[3] D. Bainbridge, M. Dewsnip, and I. H. Witten, “Searching digital music libraries,” Information Processing and Management , Vol. 41(1), 2005, pp. 4156.
[4] H. Barlow and S. Morgenstern, A dictionary of musical themes, Crown Publishers, Inc., New York, 1975.
[5] D. Byrd and T. Crawford, “Problems of music information retrieval in the real world,” Information Processing and Management, Vol. 38(2), 2002, pp. 249272.
[6] M. A. Casey, R. Veltkamp, M. Goto, M. Leman, C. Rhodes, and M. Slaney, “ContentBased Music Information Retrieval: Current Directions and Future Challenges,” in Proceedings of the IEEE, Vol. 96, Issue 4, 2008, pp. 668696.
[7] R. Clifford, M. Christodoulakis, T. Crawford, D. Meredith, and G. Wiggins, “A fast, randomised, maximal subset matching algorithm for documentlevel music retrieval,” in Proceedings of the 7th International Conference on Music Information Retrieval, 2006, pp. 150155.
[8] T. Cormen, C. Leiserson, R. Rivest, and C. Stein, Introduction to Algorithms, 2nd Ed., the MIT Press and MacGrawHill Book Company, 2001.
[9] M. Crochemore, C. S. Iliopoulos, Y. J. Pinzon, and J. F. Reid, “A fast and practical bitvector algorithm for the Longest Common Subsequence problem,” Information Processing Letters, Vol. 80(6), 2001, pp. 279–285.
[10] S. Doraisamy, and S. Rüger, “A Polyphonic Music Retrieval System Using Ngrams,” in Proceedings of the Fourth International Conference on Music Information Retrieval, 2004, pp. 204–209.
[11] S. Downe and M. Nelson, “Evaluation of a simple and effective music information retrieval method,” in Proceedings of the 23rd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, 2000, pp. 73–80.
[12] C. Francu and C. G. NevillManning, “Distance metrics and indexing strategies for a digital library of popular music,” in Proceedings of the IEEE International Conference on Multimedia and EXPO, 2000, pp. 889–892.
[13] A. Guo and H. Siegelmann, “Timewarped longest common subsequence algorithm for music retrieval,” in: Proceedings of the Fifth International Conference on Music Information Retrieval, 2004, pp. 10–15.
[14] J.S. Roger Jang and HongRu Lee, “Hierarchical filtering method for contentbased music retrieval via acoustic input,” in Proceedings of the 9th ACM Multimedia Conference, 2001, pp. 401–410.
[15] Joseph Kerman, Listen, Worth Publishers, 1976.
[16] K. Lemström and E. Ukkonen, “Including interval encoding into edit distance based music comparison and retrieval,” in Proceedings of AISB, 2000, pp. 53–60.
[17] H. J. Lin, H. H. Wu, and Y. T. Kao, “Geometric Measures of Distance Between Two Pitch Contour Sequences,” Journal of Computers, Vol. 19(2), 2008, pp. 55–66.
[18] H. J. Lin and H. H. Wu, “Efficient Geometric Measure of Music Similarity,” Information Processing Letters, Vol. 109(2), 2008, pp. 116–120.
[19] A. Lubiw and L. Tanur, “Pattern matching in polyphonic music as a weighted geometric translation problem,” in Proceedings of the 5th International Conference on Music Information Retrieval, 2004, pp. 289–296.
[20] D. Ó. Maidín, “A geometrical algorithm for melodic difference,” Computing in Musicology, Vol. 11, 1998, pp. 65–72.
[21] M. Mongeau and D. Sankoff, “Comparison of musical sequences,” Computers and the Humanities, Vol. 24, 1990, pp. 161–175.
[22] S. Rho and E. Hwang, “FMF: Query adaptive melody retrieval system,” Journal of Systems and Software, Vol. 79(1), 2006, pp. 43–56.
[23] M. Robine, P. Hanna, and P. Ferraro, “Music Similarity: Improvements of EditBased Algorithms by Considering Music Theory,” in Proceedings of the ACM SIGMM International Workshop on Multimedia Information Retrieval, 2007, pp. 135–141.
[24] L. A. Smith, R. J. McNab, and I. H. Witten, “Sequencebased melodic comparison: A dynamic programming approach,” Computing in Musicology, Vol. 11, 1998, pp. 101–117.
[25] T. F. Smith and M. S. Waterman, “Identification of common molecular subsequences,” Journal of Molecular Biology, Vol. 147, 1981, pp. 195–197.
[26] I. S. H. Suyoto and A. L. Uitdenbogerd, “Effectiveness of Note Duration Information for Music Retrieval,” in Proceedings of the 10th Database Systems for Advanced Applications Conference, 2005, pp. 265–275.
[27] R. Typke, F. Wiering, and R. C. Veltkamp, “Transportation distances and human perception of melodic similarity,” ESCOM Musicae Scientiae, Similarity Perception in Listening to Music, 2007, pp. 153–181.
[28] A. Uitdenbogerd and J. Zobel, “Manipulation of music for melody matching,” in Proceedings of the ACM Multimedia Conference, Bristol, UK, September 1998, pp. 235–240.
[29] F. Wiering, R. Typke, and R.C. Veltkamp, “Transportation distances and their application in musicnotation retrieval,” Computing in Musicology, Vol. 13, 2004, pp. 113128. 
論文使用權限 
同意紙本無償授權給館內讀者為學術之目的重製使用，於20110713公開。同意授權瀏覽/列印電子全文服務，於20110713起公開。 


