System ID | U0002-2501200721004000 |
---|---|
DOI | 10.6846/TKU.2007.00755 |
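Thesis title (Chinese) | 以多階層向量量化為基礎之語者辨識 |
Thesis title (English) | Speaker Identification Based on Multistage Vector Quantization |
Thesis title (third language) | |
Institution | Tamkang University (淡江大學) |
Department (Chinese) | 電機工程學系碩士班 |
Department (English) | Department of Electrical and Computer Engineering |
Foreign degree: university | |
Foreign degree: college | |
Foreign degree: graduate institute | |
Academic year | 95 (ROC calendar; 2006-2007) |
Semester | 1 |
Year of publication | 96 (ROC calendar; 2007) |
Student (Chinese) | 鄭竹勝 |
Student (English) | Chu-Sheng Cheng |
Student ID | 692351140 |
Degree | Master's |
Language | Traditional Chinese |
Second language | |
Oral defense date | 2007-01-04 |
Number of pages | 71 |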
Oral defense committee | Advisor - 謝景棠; Member - 許志旭; Member - 陳慶逸 |
Keywords (Chinese) | 語者辨識; 語音切割; 向量量化; 多階層向量量化 |
Keywords (English) | Speaker Identification; Speech Segmentation; Vector Quantization; Multistage VQ |
Keywords (third language) | |
Subject classification | |
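Chinese abstract (translated) |
To improve noise robustness, we propose a new speaker identification system. A preprocessing stage ahead of the recognizer separates periodic from aperiodic speech segments. Because aperiodic segments are easily corrupted by noise and carry less energy, we use only the periodic segments for speaker identification. Because linear predictive cepstral coefficients (LPCC), which belong to the pole-zero model, describe periodic segments better than mel-frequency cepstral coefficients (MFCC), we use LPCC as the speaker identification feature. And because multistage VQ offers better coding capability and needs less storage than single-stage VQ, we use multistage VQ as the speaker identification classifier to raise the identification rate. |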
English abstract |
We present an effective speaker recognition system that improves performance in noisy environments and under various recording conditions, including microphone and ordinary telephone speech. In our previous work, we manually segmented speech into aperiodic-consonant regions and all other regions, because we found that the LPCC characteristics of aperiodic consonants affect speaker identification performance in noisy environments. For speech feature extraction, we use LPCC and MFCC. The experimental results show that LPCC is more effective than MFCC, particularly when extracted from periodic speech. For classifiers, we tested VQ (vector quantization) and 2-stage VQ. The experimental results show that 2-stage VQ is both more effective and computationally more efficient than VQ. To evaluate the combinations of speech features and speaker classifiers, we used two speech corpora in this study: the TIMIT and NTIMIT databases. |
Abstract (third language) | |
Table of contents |
Contents
Chapter 1 Introduction
1.1 Research Objectives and Related Work
1.2 Thesis Organization
Chapter 2 Speech Feature Parameters
2.1 Speech Classification Parameters
2.2 Multivariate Gaussian Density Function
2.3 Linear Predictive Cepstral Coefficients
2.4 Multiband Feature Parameters
Chapter 3 Recognizer
3.1 Vector Quantization Coding
3.2 Multistage Vector Quantization Coding
3.3 Two-Stage VQ Recognizer with Multiband Features
Chapter 4 Speaker Identification System
4.1 Speech Classification Analysis
4.2 Identification System Analysis
4.3 System Flow
Chapter 5 Experiments and Discussion
5.1 Overview of the Speech Databases
5.2 TIMIT Speaker Identification Experiments
5.2.1 Analysis of Feature Order
5.2.2 Analysis of the Number of VQ Stages
5.2.3 Multiband Analysis
5.2.4 Noise Analysis
5.3 TIMIT Speaker Identification Experiments with Different Numbers of Training Utterances
Chapter 6 Conclusions and Future Work
6.1 Conclusions
6.2 Outlook
References
List of Figures
Figure 2.1 Waveforms of the entropy and energy parameters
Figure 2.2 Speech signal and zero-crossing rate waveform
Figure 2.3 Speech signal and energy waveform
Figure 2.4 Speech signal and autocorrelation coefficient waveform
Figure 2.5 Flowchart of sound production
Figure 2.6 Flowchart of the sound classification system
Figure 2.7 Feature extraction algorithm
Figure 3.1 Flowchart of two-stage vector quantization
Figure 3.2 Two-stage vector quantization with multiband features
Figure 4.1 Architecture of the speaker identification system
Figure 4.2 System diagram of two-stage VQ with multiband features
Figure 5.1 Multiband-feature identification rates of LC1VQ and FC1VQ
Figure 5.2 Multiband-feature identification rates of LC2VQ and FC2VQ
Figure 5.3 Multiband-feature identification rates of LC1VQ and FC1VQ
Figure 5.4 Identification rates of LC2VQ and FC2VQ
Figure 5.5 Multiband-feature identification rates of LC2VQ and FC2VQ
Figure 5.6 Full-band-feature identification rate of LC1VQ
Figure 5.7 Identification rates of LC1VQ with different speech material
Figure 5.8 Identification rate of LC2VQ with added Gaussian noise
Figure 5.9 Identification rate of LC2VQ with added Gaussian noise
List of Tables
Table 4.1 Classification results for periodic speech segments
Table 4.2 Classification results for periodic speech segments
Table 4.3 Classification results for periodic speech segments
Table 4.4 Classification results for silence
Table 5.1 Multiband-feature identification rate of LC2VQ
Table 5.2 Multiband-feature identification rate of LC1VQ
Table 5.3 Multiband-feature identification rates of LC1VQ and FC1VQ
Table 5.4 Multiband-feature identification rates of LC2VQ and FC2VQ
Table 5.5 Multiband-feature identification rates of LC1VQ and FC1VQ
Table 5.6 Multiband-feature identification rates of LC1VQ and FC1VQ
Table 5.7 Identification rates of LC2VQ and FC2VQ
Table 5.8 Multiband-feature identification rates of LC2VQ and FC2VQ
Table 5.9 Full-band-feature identification rate of LC1VQ
Table 5.10 Identification rate of LC2VQ with added Gaussian noise
Table 5.11 Multiband-feature identification rate of LC2VQ
Table 5.12 Multiband-feature identification rate of LC2VQ |
Full-text access rights |