Tamkang University Chueh Sheng Memorial Library (TKU Library)
(Electronic full-text download is available only from Tamkang University IP addresses)
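System ID: U0002-1608201613531000
Title (Chinese): 基於RM距離量測之語料獨立語者辨識系統
Title (English): Speaker Recognition with Independent Corpus Based on RM Distance Measure
University: Tamkang University
Department (Chinese): 電機工程學系碩士班
Department (English): Department of Electrical Engineering
Academic year: 104 (ROC calendar)
Semester: 2
Publication year: 105 (ROC calendar; 2016)
Student name (Chinese): 江正元
Student name (English): Cheng-Yuan Chiang
Student ID: 603440263
Degree: Master's
Language: Chinese
Oral defense date: 2016-07-15
Number of pages: 38
Oral defense committee: Advisor: 謝景棠
Committee member: 蘇木春
Committee member: 謝君偉
Keywords (Chinese): 語音增強; 稀疏表示; K-SVD; Label Consistent K-SVD (LC K-SVD); 黎曼距離; MFCC; 歐基理德距離
Keywords (English): Speech Enhancement; Sparse Representation; K-SVD; Label Consistent K-SVD (LC K-SVD); Riemannian Distance; MFCC; Euclidean Distance
Subject classification: Applied Sciences, Electrical and Electronic Engineering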
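Chinese abstract (translated): Speech has long been a popular research direction for identifying people. In recent years, researchers have proposed studies of speaker recognition in white-noise and colored-noise environments. To improve quality scores such as LLR, PESQ, SNR, and SNRseg, sparse representation algorithms have been introduced into noise removal, but they are time-consuming. We therefore propose Label Consistent K-SVD (LC-KSVD) sparse coding to shorten the processing time. Most current speaker recognition systems use the Euclidean distance to compute the distance between features. Our goal is recognition with short utterances and corpus independence, which makes high recognition accuracy harder to achieve. We propose replacing the Euclidean distance with the Riemannian distance, but our experimental results show that the Euclidean distance far outperforms the Riemannian distance. The experiments in this thesis use waveform, MFCC, and MFCC-smoothed spectrum features together with the Riemannian distance (RD) and the Euclidean distance (ED) for speaker identification.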
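The abstract names SNR and segmental SNR (SNRseg) among the speech-quality measures. As a rough illustration only, the sketch below computes both under common textbook definitions. The non-overlapping 256-sample frames and the clamping of per-frame values to [-10, 35] dB are assumptions, not settings taken from the thesis, and the function names are hypothetical.

```python
import numpy as np


def snr_db(clean, enhanced):
    """Global SNR in dB: clean-signal energy over residual-error energy."""
    clean = np.asarray(clean, dtype=float)
    noise = clean - np.asarray(enhanced, dtype=float)
    return float(10.0 * np.log10(np.sum(clean ** 2) / np.sum(noise ** 2)))


def segmental_snr_db(clean, enhanced, frame_len=256, lo=-10.0, hi=35.0):
    """Mean of per-frame SNRs in dB, each clamped to [lo, hi].

    Frame length and clamping limits are assumed values for illustration.
    """
    clean = np.asarray(clean, dtype=float)
    enhanced = np.asarray(enhanced, dtype=float)
    eps = np.finfo(float).eps  # guards against log(0) and division by zero
    n_frames = len(clean) // frame_len  # trailing partial frame is ignored
    values = []
    for k in range(n_frames):
        s = clean[k * frame_len:(k + 1) * frame_len]
        e = s - enhanced[k * frame_len:(k + 1) * frame_len]
        frame_snr = 10.0 * np.log10((np.sum(s ** 2) + eps) / (np.sum(e ** 2) + eps))
        values.append(min(max(frame_snr, lo), hi))
    return float(np.mean(values))
```

Clamping keeps silent or perfectly reconstructed frames from dominating the average, which is the usual reason segmental SNR is reported alongside the global figure.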
English abstract: Speaker recognition has long been an active research topic. In recent years, techniques for speaker recognition in white-noise and colored-noise environments have been proposed. Sparse representation algorithms have been introduced into noise filtering to improve speech-quality measures such as SNR, SNRseg, LLR, and PESQ, but they are time-consuming. We therefore employ Label Consistent K-SVD (LC-KSVD) sparse coding to denoise speech data and reduce processing time. Most current speaker recognition systems use the Euclidean distance to compute the distance between features. Our goal is recognition from a short, corpus-independent corpus, which makes high recognition accuracy harder to achieve. We propose replacing the Euclidean distance with the Riemannian distance, but our experimental results show that the Euclidean distance is superior to the Riemannian distance. The experiments in this thesis use waveform, MFCC, and MFCC-smoothed spectrum features with the Riemannian distance (RD) and the Euclidean distance (ED) for speaker recognition.
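The abstract contrasts the Euclidean and Riemannian distances between features. The record does not state which Riemannian metric the thesis uses, so the sketch below assumes the widely used affine-invariant distance between symmetric (or Hermitian) positive-definite matrices, d(A, B) = sqrt(sum_i log^2 lambda_i), where lambda_i are the eigenvalues of A^(-1/2) B A^(-1/2). The function names are hypothetical, and this is an illustration rather than the author's method.

```python
import numpy as np


def euclidean_distance(a, b):
    """Frobenius (Euclidean) distance between two matrices of equal shape."""
    return float(np.linalg.norm(np.asarray(a) - np.asarray(b)))


def riemannian_distance(a, b):
    """Affine-invariant Riemannian distance between positive-definite matrices.

    Assumed metric for illustration; the thesis may define RD differently.
    """
    a = np.asarray(a)
    b = np.asarray(b)
    # Build the whitening transform a^(-1/2) from the eigendecomposition of a.
    w, v = np.linalg.eigh(a)
    a_inv_sqrt = (v / np.sqrt(w)) @ v.conj().T
    # The whitened matrix is positive definite, so its eigenvalues are positive.
    lam = np.linalg.eigvalsh(a_inv_sqrt @ b @ a_inv_sqrt)
    return float(np.sqrt(np.sum(np.log(lam) ** 2)))


# Example: with A = I and B = diag(e, e^2), the log-eigenvalues are 1 and 2,
# so the Riemannian distance is analytically sqrt(5).
a = np.eye(2)
b = np.diag([np.e, np.e ** 2])
print(euclidean_distance(a, b), riemannian_distance(a, b))
```

Unlike the Euclidean distance, this metric is unchanged when both matrices undergo the same congruence transform G A G^T, which is one reason it is considered for covariance-like features such as power spectral density matrices.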
Table of contents
Acknowledgments
Chinese Abstract
English Abstract
Table of Contents
List of Figures
List of Tables
Chapter 1 Introduction
1.1 Research Motivation
1.2 Research Methods
1.3 Thesis Organization
Chapter 2 Related Work and Fundamental Techniques
2.1 Label Consistent K-SVD Sparse Coding (LC-KSVD)
2.1.1 K-SVD Denoising
2.1.2 LC K-SVD
2.2 Riemannian Distance
2.3 Power Spectral Density (PSD) Matrix
Chapter 3 Speaker Recognition System
3.1 System Architecture
3.2 System Flow
3.2.1 LC K-SVD Denoising
3.2.2 Speaker Recognition
Chapter 4 System Evaluation
4.1 Experimental Environment
4.2 Experimental Databases
4.2.1 CHiME Database
4.2.2 Self-Built Database
4.3 Speech Quality Evaluation
4.3.1 Log-Likelihood Ratio (LLR)
4.3.2 Perceptual Evaluation of Speech Quality (PESQ)
4.3.3 Signal-to-Noise Ratio (SNR)
4.3.4 Segmental SNR (SNRseg)
4.4 Speech Quality Evaluation and Comparison of K-SVD and LC K-SVD
4.5 Speaker Recognition Rate
Chapter 5 Conclusions and Future Work
5.1 Conclusions
5.2 Future Work
References

List of Figures
Figure 2.1 Flowchart of the sparse-theory denoising system
Figure 2.2 Waveform comparison of the noisy signal, the noise signal, and the denoised signal
Figure 2.3 Flowchart of K-SVD training of the clean-speech dictionary
Figure 2.4 Riemannian manifold
Figure 3.1 Flowchart of the speaker recognition system
Figure 3.2 Flowchart of LC K-SVD denoising
Figure 4.1 Average results under the LLR quality measure
Figure 4.2 Average results under the PESQ quality measure
Figure 4.3 Average results under the SNR quality measure
Figure 4.4 Average results under the segSNR quality measure
Figure 4.5 True acceptance rate regardless of training method
Figure 4.6 True acceptance rate regardless of speech content
Figure 4.7 False acceptance rate regardless of training method
Figure 4.8 False acceptance rate regardless of speech content
Figure 4.9 False acceptance rate with lengthened speech
Figure 4.10 True acceptance rate with lengthened speech

List of Tables
Table 4.1 Mapping of PESQ scores to MOS scores
Table 4.2 Computation times of LC K-SVD and K-SVD
Table 4.3 Speaker recognition results for Step A
Table 4.4 Speaker recognition results for Step B
Table 4.5 Speaker recognition results for Step C
Table 4.6 Speaker recognition results for Steps B and C
Table 4.7 Speaker recognition results for Step D (absolute value)
Table 4.8 Speaker recognition results for Step D (real part)
Table 4.9 Speaker recognition results for Step E
References
[1] M. A. Abd El-Fattah, M. I. Dessouky, A. M. Abbas, S. M. Diab, S. M. El-Rabaie, and F. E. Abd El-Samie, "Speech enhancement with an adaptive Wiener filter," International Journal of Speech Technology, vol. 17, no. 1, 2014, pp. 53-64.
[2] M. Bahoura and J. Rouat, "Wavelet Speech Enhancement Based on the Teager Energy Operator," IEEE Signal Processing Letters, vol. 8, no. 1, 2001, pp. 10-12.
[3] J. F. Wang, S. H. Chen, and J. J. Lee, "Speech Signal Denoising Based on Multi-Type Wavelet Transforms," Asia Pacific Conference on Multimedia Technology and Applications, 2000, pp. 287-291.
[4] C. T. Hsieh, P. Y. Huang, T. W. Chen, and Y. H. Chen, "Speech Enhancement based on Sparse Representation under Color Noisy Environment," International Symposium on Intelligent Signal Processing and Communication Systems, November 2015.
[5] J. Jost, Riemannian Geometry and Geometric Analysis, Springer-Verlag, 1998.
[6] Y. Li and K. M. Wong, "Riemannian Distances for Signal Classification by Power Spectral Density," IEEE Journal of Selected Topics in Signal Processing, vol. 7, no. 4, August 2013.
[7] Z. Jiang, Z. Lin, and L. S. Davis, "Label Consistent K-SVD: Learning a Discriminative Dictionary for Recognition," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 35, no. 11, November 2013.
[8] Z. Zhang, Y. Xu, J. Yang, X. Li, and D. Zhang, "A Survey of Sparse Representation: Algorithms and Applications," IEEE Access, vol. 3, 2015, pp. 490-530.
[9] M. Aharon, M. Elad, and A. Bruckstein, "K-SVD: An Algorithm for Designing Overcomplete Dictionaries for Sparse Representation," IEEE Transactions on Signal Processing, vol. 54, no. 11, 2006, pp. 4311-4322.
[10] D. Pham and S. Venkatesh, "Joint Learning and Dictionary Construction for Pattern Recognition," Proc. IEEE Conf. Computer Vision and Pattern Recognition, 2008.
[11] Q. Zhang and B. Li, "Discriminative K-SVD for Dictionary Learning in Face Recognition," Proc. IEEE Conf. Computer Vision and Pattern Recognition, 2010.
[12] J. Yang, K. Yu, and T. Huang, "Supervised Translation-Invariant Sparse Coding," Proc. IEEE Conf. Computer Vision and Pattern Recognition, 2010.
[13] J. Mairal, F. Bach, J. Ponce, G. Sapiro, and A. Zisserman, "Supervised Dictionary Learning," Proc. Conf. Neural Information Processing Systems, 2009.
[14] Y. Li, K. M. Wong, and H. DeBruin, "EEG signal classification for sleep-stage decision – A Riemannian geometry approach," IET Signal Processing, vol. 6, no. 4, June 2012, pp. 288-299.
[15] J. Barker, E. Vincent, N. Ma, C. Christensen, and P. Green, "The PASCAL CHiME speech separation and recognition challenge," Computer Speech and Language, vol. 27, no. 3, 2013, pp. 621-633.
[16] Y. Hu and P. Loizou, "Evaluation of objective quality measures for speech enhancement," IEEE Transactions on Audio, Speech, and Language Processing, vol. 16, no. 1, 2008, pp. 229-238.
[17] J. Ma, Y. Hu, and P. Loizou, "Objective measures for predicting speech intelligibility in noisy conditions based on new band-importance functions," Journal of the Acoustical Society of America, vol. 125, no. 5, 2009, pp. 3387-3405.
[18] N. A. Fox, B. A. O'Mullane, and R. B. Reilly, "The Realistic Multi-modal VALID database and Visual Speaker Identification Comparison Experiments," in Proc. of the 5th International Conference on Audio- and Video-Based Biometric Person Authentication (AVBPA-2005), New York, 2005.
[19] J.-S. R. Jang (張智星), "12-2 MFCC," in Audio Signal Processing and Recognition (音訊處理與辨識). Retrieved from: http://mirlab.org/jang/books/audiosignalprocessing/speechFeatureMfcc.asp?title=12-2%20MFCC
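Thesis usage rights
  • Print copy: royalty-free license granted to in-library readers to reproduce the thesis for academic purposes; publicly available from 2019-08-17.
  • Electronic full text: browsing and printing authorized; publicly available from 2019-08-17.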

