System ID | U0002-0408201515193900 |
---|---|
DOI | 10.6846/TKU.2015.00124 |
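Thesis title (Chinese) | 高雜訊環境下基於稀疏表示法的語音強化 |
Thesis title (English) | Speech Enhancement based on Sparse Theory under Noisy Environment |
Thesis title (third language) | |
Institution | Tamkang University |
Department (Chinese) | 電機工程學系碩士班 |
Department (English) | Department of Electrical and Computer Engineering |
Foreign degree: school name | |
Foreign degree: college name | |
Foreign degree: institute name | |
Academic year (ROC calendar) | 103 |
Semester | 2 |
Publication year (ROC calendar) | 104 |
Author (Chinese) | 陳彥亨 |
Author (English) | Yan-Heng Chen |
Student ID | 602440280 |
Degree | Master's |
Language | Traditional Chinese |
Second language | |
Oral defense date | 2015-07-15 |
Number of pages | 51 |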
Oral defense committee | Advisor - 謝景棠 (hsieh@ee.tku.edu.tw); Committee member - 蘇木春 (muchun@csie.ncu.edu.tw); Committee member - 謝君偉 (shieh@ntou.edu.tw) |
Keywords (Chinese) | 語音增強; 稀疏表示; K-SVD; Discrete cosine transform (DCT); Orthogonal matching pursuit (OMP) |
Keywords (English) | Speech enhancement; sparse representations; K-SVD; discrete cosine transform (DCT); orthogonal matching pursuit (OMP) |
Keywords (third language) | |
Subject classification | |
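Abstract (Chinese, translated) |
In recent years, signal enhancement based on sparse algorithms has become an increasingly popular topic. In this thesis, we apply a sparse algorithm to enhance speech signals. We divide the sparse process into two parts: dictionary training and speech signal reconstruction. The clean-speech dictionary is obtained by training on noisy speech data with the K-SVD algorithm, and the sparse coefficients X of the clean-speech dictionary are optimized with the Orthogonal Matching Pursuit (OMP) algorithm. The speech signal is then reconstructed by multiplying the clean-speech dictionary matrix by the sparse coefficient matrix. The system is evaluated in high-noise environments, covering both white Gaussian noise and colored noise, and four objective speech measures (SNR, LLR, SNRseg, and PESQ) are used to assess denoising performance. Finally, the system is compared with other speech enhancement methods; the experiments confirm that the proposed method outperforms them. |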
Abstract (English) |
Recently, sparse algorithms for signal enhancement have become an increasingly popular topic. In this thesis, we apply sparse representation to enhance speech signals. The sparse process is divided into two parts: dictionary training and signal reconstruction. We consider denoising under both white Gaussian noise and colored noise. The orthogonal matching pursuit (OMP) algorithm is used to optimize the sparse coefficients X of the clean-speech dictionary, which is trained with the K-SVD algorithm. We then multiply the two matrices D' and X to reconstruct the clean speech signal. Experiments show that the proposed method outperforms other state-of-the-art methods on four objective quality measures: SNR, LLR, SNRseg, and PESQ. |
Abstract (third language) | |
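Table of contents |
Acknowledgments
Chinese abstract
English abstract
Table of contents
List of figures
List of tables
Chapter 1 Introduction
1.1 Research motivation
1.2 Research method
1.3 Thesis organization
Chapter 2 Related work and background techniques
2.1 Related work
2.1.1 Image denoising based on sparse representation
2.1.2 Conventional speech enhancement
2.2 Background techniques
2.2.1 Spectral subtraction
2.2.2 Wavelet coefficient thresholding
2.2.2.1 Hard thresholding
2.2.2.2 Soft thresholding
2.2.3 Noise absolute mean subtraction
2.2.4 Wiener filter
2.2.5 Adaptive Wiener filtering
2.2.6 lp-norm
2.2.7 K-SVD
Chapter 3 Sparse denoising system
3.1 System architecture
3.2 System flow
3.2.1 Clean-speech dictionary training
3.2.2 Clean speech signal reconstruction
Chapter 4 System evaluation
4.1 Experimental environment
4.2 Experimental databases
4.2.1 CHIME database
4.2.2 NOIZEUS database
4.3 Speech quality evaluation
4.3.1 Signal-to-noise ratio (SNR)
4.3.2 Log-likelihood ratio (LLR)
4.3.3 Segmental SNR (SNRseg)
4.3.4 Perceptual Evaluation of Speech Quality (PESQ)
4.4 Experimental comparison
4.4.1 White Gaussian noise environment
4.4.2 Colored noise environment
4.4.3 Time comparison
Chapter 5 Conclusions and future work
5.1 Conclusions
5.2 Future work
References
List of figures
Figure 2.1 Method proposed by M. Aharon [2]
Figure 2.2 Method proposed by M. Elad [3]
Figure 2.3 Flowchart of adaptive Wiener filtering [4]
Figure 2.4 Geometric interpretation of different norms [1]
Figure 3.1 Flowchart of the sparse denoising system
Figure 3.2 Waveform of the noisy input signal
Figure 3.3 Flowchart of clean-speech dictionary training
Figure 3.4 Dictionary update
Figure 3.5 Flowchart of clean speech signal reconstruction
Figure 4.1 Comparison of denoised waveforms and spectrograms at SNR = 5 dB (continued)
Figure 4.2 Comparison of denoised waveforms and spectrograms at SNR = 5 dB
Figure 4.3 SNR evaluation of a single utterance under white Gaussian noise
Figure 4.4 LLR evaluation of a single utterance under white Gaussian noise
Figure 4.5 SNRseg evaluation of a single utterance under white Gaussian noise
Figure 4.6 PESQ evaluation of a single utterance under white Gaussian noise
Figure 4.7 Average SNR evaluation under white Gaussian noise
Figure 4.8 Average LLR evaluation under white Gaussian noise
Figure 4.9 Average SNRseg evaluation under white Gaussian noise
Figure 4.10 Average PESQ evaluation under white Gaussian noise
Figure 4.11 Comparison of denoised waveforms and spectrograms under colored noise at SNR = 5 dB (continued)
Figure 4.12 Comparison of denoised waveforms and spectrograms under colored noise at SNR = 5 dB
List of tables
Table 4.1 Mapping of PESQ scores to MOS scores
Table 4.2 Denoising performance evaluation at SNR = 5 dB
Table 4.3 SNR performance of different methods on multiple speech samples at three noise levels in four environments
Table 4.4 SNR performance of different methods on multiple speech samples at three noise levels in four environments
Table 4.5 LLR performance of different methods on multiple speech samples at three noise levels in four environments
Table 4.6 LLR performance of different methods on multiple speech samples at three noise levels in four environments
Table 4.7 SNRseg performance of different methods on multiple speech samples at three noise levels in four environments
Table 4.8 SNRseg performance of different methods on multiple speech samples at three noise levels in four environments
Table 4.9 PESQ performance of different methods on multiple speech samples at three noise levels in four environments
Table 4.10 PESQ performance of different methods on multiple speech samples at three noise levels in four environments
Table 4.11 Average computation time under white Gaussian noise
Table 4.12 Average computation time under colored noise |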
References |
[1] Z. Zhang, Y. Xu, J. Yang, X. Li and D. Zhang, "A Survey of Sparse Representation: Algorithms and Applications," IEEE Access, vol. 3, 2015, pp. 490-530.
[2] M. Aharon, M. Elad and A. Bruckstein, "K-SVD: An Algorithm for Designing Overcomplete Dictionaries for Sparse Representation," IEEE Transactions on Signal Processing, vol. 54, no. 11, 2006, pp. 4311-4322.
[3] M. Elad and M. Aharon, "Image Denoising Via Sparse and Redundant Representations Over Learned Dictionaries," IEEE Transactions on Image Processing, vol. 15, no. 12, 2006, pp. 3736-3745.
[4] M. A. Abd El-Fattah, M. I. Dessouky, A. M. Abbas, S. M. Diab, S. M. El-Rabaie and F. E. Abd El-samie, "Speech enhancement with an adaptive Wiener filter," International Journal of Speech Technology, vol. 17, no. 1, 2014, pp. 53-64.
[5] M. Bahoura and J. Rouat, "Wavelet Speech Enhancement Based on the Teager Energy Operator," IEEE Signal Processing Letters, vol. 8, no. 1, 2001, pp. 10-12.
[6] J. F. Wang, S. H. Chen and J. J. Lee, "Speech Signal Denoising Based on Multi-Type Wavelet Transforms," Asia Pacific Conference on Multimedia Technology and Applications, 2000, pp. 287-291.
[7] J. Barker, E. Vincent, N. Ma, C. Christensen and P. Green, "The PASCAL CHiME speech separation and recognition challenge," Computer Speech and Language, vol. 27, no. 3, 2013, pp. 621-633.
[8] Y. Hu and P. Loizou, "Subjective evaluation and comparison of speech enhancement algorithms," Speech Communication, vol. 49, no. 7, 2007, pp. 588-601.
[9] Y. Hu and P. Loizou, "Evaluation of objective quality measures for speech enhancement," IEEE Transactions on Audio, Speech, and Language Processing, vol. 16, no. 1, 2008, pp. 229-238.
[10] J. Ma, Y. Hu and P. Loizou, "Objective measures for predicting speech intelligibility in noisy conditions based on new band-importance functions," Journal of the Acoustical Society of America, vol. 125, no. 5, 2009, pp. 3387-3405.
[11] Z. Zhang, L. Wang, Q. Zhu, Z. Liu and Y. Chen, "Noise modeling and representation based classification methods for face recognition," Neurocomputing, vol. 148, no. 19, 2015, pp. 420-429.
[12] N. Tanabe, T. Furukawa, H. Matsue and S. Tsujii, "Kalman Filter for Robust Noise Suppression in White and Colored Noises," IEEE International Symposium on Circuits and Systems (ISCAS), 2008, pp. 1172-1175.
[13] M. S. Lewicki and T. J. Sejnowski, "Learning overcomplete representations," Neural Computation, vol. 12, no. 2, 2000, pp. 337-365.
[14] R. Gribonval and K. Schnass, "Some recovery conditions for basis learning by L1-minimization," in Proc. Int. Symp. Commun., Control, Signal Process. (ISCCSP), vol. 12, no. 14, 2008, pp. 768-773.
[15] X. Zhimin and G. Yuantao, "Adaptive Speech Enhancement using Sparse Prior Information," IEEE International Conference on Acoustics, Speech and Signal Processing, 2013, pp. 7025-7029.
[16] M. G. Jafari and M. D. Plumbley, "Fast Dictionary Learning for Sparse Representations of Speech Signals," IEEE Journal of Selected Topics in Signal Processing, vol. 5, no. 5, 2011, pp. 1025-1031. |
Full-text access rights |
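
The reconstruction step described in the abstract (sparse coefficients X found by OMP, then the signal rebuilt as the dictionary times X) can be illustrated with a minimal sketch. This is not the thesis implementation: the fixed overcomplete DCT dictionary stands in for the K-SVD-trained clean-speech dictionary, and the frame length, atom count, sparsity level, noise level, and the helper names `dct_dictionary` and `omp` are all assumptions chosen for illustration.

```python
import numpy as np


def dct_dictionary(n, n_atoms):
    """Overcomplete DCT-style dictionary with unit-norm columns (n x n_atoms).

    Assumption: a fixed stand-in for the K-SVD-trained dictionary.
    """
    t = np.arange(n)[:, None]        # sample index within a frame
    k = np.arange(n_atoms)[None, :]  # atom (frequency) index
    D = np.cos(np.pi * (t + 0.5) * k / n_atoms)
    return D / np.linalg.norm(D, axis=0)


def omp(D, y, n_nonzero):
    """Greedy OMP: approximate y using at most n_nonzero columns of D."""
    residual = y.astype(float).copy()
    support = []
    coef = np.zeros(0)
    for _ in range(n_nonzero):
        # Select the atom most correlated with the current residual.
        j = int(np.argmax(np.abs(D.T @ residual)))
        if j in support:
            break  # no new atom would improve the fit
        support.append(j)
        # Re-fit all selected atoms jointly by least squares.
        coef, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ coef
    x = np.zeros(D.shape[1])
    x[support] = coef
    return x


# Hypothetical frame: three active atoms plus white Gaussian noise.
rng = np.random.default_rng(0)
D = dct_dictionary(64, 128)
x_true = np.zeros(128)
x_true[[5, 40, 90]] = [1.0, -0.7, 0.5]
clean = D @ x_true
noisy = clean + 0.05 * rng.standard_normal(64)

x_hat = omp(D, noisy, n_nonzero=3)  # sparse coefficients for this frame
enhanced = D @ x_hat                # reconstruction: dictionary times coefficients
```

In a full system each column of X would hold the coefficients of one speech frame, so the matrix product of the dictionary and X reconstructs all frames at once; the sparsity limit controls how much of the noise is left unrepresented.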