§ Browse thesis bibliographic record
  
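System ID U0002-0408201515193900
DOI 10.6846/TKU.2015.00124
Thesis title (Chinese) 高雜訊環境下基於稀疏表示法的語音強化
Thesis title (English) Speech Enhancement based on Sparse Theory under Noisy Environment
Thesis title (third language)
University Tamkang University (淡江大學)
Department (Chinese) 電機工程學系碩士班
Department (English) Department of Electrical and Computer Engineering
Foreign degree school name
Foreign degree college name
Foreign degree institute name
Academic year 103 (ROC calendar)
Semester 2
Publication year 104 (ROC calendar; 2015)
Graduate student (Chinese) 陳彥亨
Graduate student (English) Yan-Heng Chen
Student ID 602440280
Degree Master's
Language Traditional Chinese
Second language
Oral defense date 2015-07-15
Pages 51
Oral defense committee Advisor - 謝景棠(hsieh@ee.tku.edu.tw)
Committee member - 蘇木春(muchun@csie.ncu.edu.tw)
Committee member - 謝君偉(shieh@ntou.edu.tw)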
Keywords (Chinese) 語音增強
稀疏表示
K-SVD
Discrete cosine transform (DCT)
Orthogonal matching pursuit (OMP)
Keywords (English) Speech enhancement
sparse representations
K-SVD
discrete cosine transform (DCT)
orthogonal matching pursuit (OMP)
Keywords (third language)
Subject classification
Abstract (Chinese)
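In recent years, signal enhancement based on sparse algorithms has become an increasingly popular topic. In this thesis, we apply a sparse algorithm to enhance speech signals. We divide the sparse process into two parts: dictionary training and speech signal reconstruction. The clean speech dictionary is obtained by training on noisy speech data with the K-SVD algorithm, and the sparse coefficients X over the clean speech dictionary are optimized with the Orthogonal Matching Pursuit (OMP) algorithm. The speech signal is then reconstructed by multiplying the clean speech dictionary matrix by the sparse coefficient matrix. The system is evaluated in high-noise environments, covering both white Gaussian noise and colored noise, and four objective speech quality measures (SNR, LLR, SNRseg, and PESQ) are used to assess denoising performance. Finally, we compare against other speech enhancement methods; the experiments show that the proposed method outperforms them.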
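Two of the objective measures named above, SNR and SNRseg, can be sketched as follows. This is only an illustrative sketch, not the thesis's evaluation code: the function names, the 256-sample frame length, and the clamping of per-frame values to [-10, 35] dB are assumptions (the clamp is a common convention in the speech enhancement literature, not something this record states). LLR and PESQ are not covered here.

```python
import numpy as np

def snr_db(clean, enhanced, eps=1e-12):
    """Global SNR in dB between a clean reference and an enhanced signal."""
    clean = np.asarray(clean, dtype=float)
    enhanced = np.asarray(enhanced, dtype=float)
    noise = clean - enhanced  # residual error left after enhancement
    # eps guards against division by zero when the two signals are identical
    return 10.0 * np.log10((np.sum(clean**2) + eps) / (np.sum(noise**2) + eps))

def segmental_snr_db(clean, enhanced, frame_len=256, lo=-10.0, hi=35.0):
    """Mean of per-frame SNR values, each clamped to [lo, hi] dB (assumed range)."""
    clean = np.asarray(clean, dtype=float)
    enhanced = np.asarray(enhanced, dtype=float)
    n_frames = len(clean) // frame_len  # trailing partial frame is ignored
    if n_frames == 0:
        raise ValueError("signal is shorter than one frame")
    values = []
    for i in range(n_frames):
        seg = slice(i * frame_len, (i + 1) * frame_len)
        values.append(np.clip(snr_db(clean[seg], enhanced[seg]), lo, hi))
    return float(np.mean(values))
```

Both functions expect time-aligned signals of equal length; a higher value means the enhanced signal is closer to the clean reference.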
Abstract (English)
Recently, sparse algorithms for signal enhancement have become an increasingly popular topic. In this paper, we apply one to enhance speech signals. The sparse process is divided into two parts: dictionary training and signal reconstruction. We consider the filtering of both white Gaussian noise and colored noise based on sparse representation. The orthogonal matching pursuit (OMP) algorithm is used to optimize the sparse coefficients X over the clean speech dictionary D', which is trained by the K-SVD algorithm. We then multiply the two matrices D' and X to reconstruct the clean speech signal. Experiments show that the proposed method outperforms other state-of-the-art methods on four objective quality measures: SNR, LLR, SNRseg, and PESQ.
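The reconstruction step described in the abstract (sparse coefficients X found by OMP, then the product of dictionary and X) can be sketched roughly as below. This is a minimal sketch under stated assumptions, not the thesis's implementation: K-SVD training is omitted, and a fixed overcomplete DCT dictionary stands in for the trained dictionary D'. The function names, the frame layout (one frame per column), and the sparsity level are all hypothetical.

```python
import numpy as np

def dct_dictionary(n, k):
    """Overcomplete DCT dictionary: k unit-norm cosine atoms of length n.
    Assumption: used here only as a stand-in for a K-SVD-trained dictionary."""
    t = np.arange(n)[:, None]
    f = np.arange(k)[None, :]
    D = np.cos(np.pi * (t + 0.5) * f / k)
    return D / np.linalg.norm(D, axis=0)

def omp(D, y, n_nonzero):
    """Greedy orthogonal matching pursuit: approximate y with at most
    n_nonzero columns of D, refitting coefficients by least squares."""
    y = np.asarray(y, dtype=float)
    residual = y.copy()
    support = []
    coef = np.zeros(0)
    for _ in range(n_nonzero):
        # pick the atom most correlated with the current residual
        idx = int(np.argmax(np.abs(D.T @ residual)))
        if idx in support:
            break
        support.append(idx)
        # orthogonal step: least-squares fit on all selected atoms
        coef, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ coef
        if np.linalg.norm(residual) < 1e-10:
            break
    x = np.zeros(D.shape[1])
    x[support] = coef
    return x

def sparse_reconstruct(D, frames, n_nonzero):
    """Code each column of `frames` with OMP and rebuild it as D @ X."""
    X = np.column_stack([omp(D, y, n_nonzero) for y in frames.T])
    return D @ X, X
```

A typical call would split the noisy signal into frames of length n, stack them as columns, and pass them to `sparse_reconstruct`; keeping `n_nonzero` small is what discards components that the dictionary does not represent well.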
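Abstract (third language)
Table of contents
Contents
Acknowledgements	I
Chinese abstract	II
English abstract	III
Contents	IV
List of figures	VII
List of tables	IX
Chapter 1 Introduction	1
1.1 Research motivation	1
1.2 Research method	2
1.3 Thesis organization	2
Chapter 2 Related work and background techniques	3
2.1 Related work	3
2.1.1 Image denoising based on sparse representation	3
2.1.2 Conventional speech enhancement	5
2.2 Background techniques	6
2.2.1 Spectral subtraction	6
2.2.2 Wavelet coefficient thresholding	7
2.2.2.1 Hard thresholding	7
2.2.2.2 Soft thresholding	7
2.2.3 Noise absolute mean subtraction	8
2.2.4 Wiener filter	9
2.2.5 Adaptive Wiener filtering	10
2.2.6 lp-norm	12
2.2.7 K-SVD	13
Chapter 3 Sparse denoising system	14
3.1 System architecture	14
3.2 System flow	15
3.2.1 Clean speech dictionary training	15
3.2.2 Clean speech signal reconstruction	18
Chapter 4 System evaluation	20
4.1 Experimental environment	20
4.2 Experimental databases	20
4.2.1 CHiME database	21
4.2.2 NOIZEUS database	21
4.3 Speech quality evaluation	22
4.3.1 Signal-to-noise ratio (SNR)	22
4.3.2 Log-likelihood ratio (LLR)	22
4.3.3 Segmental SNR (SNRseg)	23
4.3.4 Perceptual Evaluation of Speech Quality (PESQ)	23
4.4 Experimental comparison	25
4.4.1 White Gaussian noise environment	25
4.4.2 Colored noise environment	34
4.4.3 Time comparison	46
Chapter 5 Conclusions and future work	48
5.1 Conclusions	48
5.2 Future work	49
References	50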
 
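List of figures
Figure 2.1 Method proposed by M. Aharon [2]	4
Figure 2.2 Method proposed by M. Elad [3]	4
Figure 2.3 Adaptive Wiener filtering flowchart [4]	10
Figure 2.4 Geometric interpretation of different norms [1]	12
Figure 3.1 Flowchart of the sparse denoising system	14
Figure 3.2 Waveform of the noisy input signal	14
Figure 3.3 Flowchart of clean speech dictionary training	15
Figure 3.4 Dictionary update	16
Figure 3.5 Flowchart of clean speech signal reconstruction	18
Figure 4.1 Comparison of denoised waveforms and spectrograms at SNR = 5 dB (continued)	26
Figure 4.2 Comparison of denoised waveforms and spectrograms at SNR = 5 dB	27
Figure 4.3 SNR evaluation of a single utterance under white Gaussian noise	29
Figure 4.4 LLR evaluation of a single utterance under white Gaussian noise	29
Figure 4.5 SNRseg evaluation of a single utterance under white Gaussian noise	30
Figure 4.6 PESQ evaluation of a single utterance under white Gaussian noise	30
Figure 4.7 Average SNR evaluation under white Gaussian noise	31
Figure 4.8 Average LLR evaluation under white Gaussian noise	32
Figure 4.9 Average SNRseg evaluation under white Gaussian noise	32
Figure 4.10 Average PESQ evaluation under white Gaussian noise	33
Figure 4.11 Comparison of denoised waveforms and spectrograms for colored noise at SNR = 5 dB (continued)	35
Figure 4.12 Comparison of denoised waveforms and spectrograms for colored noise at SNR = 5 dB	36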
 
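List of tables
Table 4.1 Index mapping PESQ scores to MOS scores	24
Table 4.2 Denoising performance evaluation at SNR = 5 dB	28
Table 4.3 SNR performance of different methods on multiple utterances at three noise levels in four environments	38
Table 4.4 SNR performance of different methods on multiple utterances at three noise levels in four environments	39
Table 4.5 LLR performance of different methods on multiple utterances at three noise levels in four environments	40
Table 4.6 LLR performance of different methods on multiple utterances at three noise levels in four environments	41
Table 4.7 SNRseg performance of different methods on multiple utterances at three noise levels in four environments	42
Table 4.8 SNRseg performance of different methods on multiple utterances at three noise levels in four environments	43
Table 4.9 PESQ performance of different methods on multiple utterances at three noise levels in four environments	44
Table 4.10 PESQ performance of different methods on multiple utterances at three noise levels in four environments	45
Table 4.11 Average computation time under white Gaussian noise	46
Table 4.12 Average computation time under colored noise	47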
References
[1]	Z. Zhang, Y. Xu, J. Yang, X. Li and D. Zhang, "A Survey of Sparse Representation: Algorithms and Applications," IEEE Access, vol. 3, 2015, pp. 490-530.
[2]	M. Aharon, M. Elad and A. Bruckstein, "K-SVD: An Algorithm for Designing Overcomplete Dictionaries for Sparse Representation," IEEE Transactions on Signal Processing, vol. 54, no. 11, 2006, pp. 4311-4322.
[3]	M. Elad and M. Aharon, "Image Denoising Via Sparse and Redundant Representations Over Learning Dictionaries," IEEE Transactions on Image Processing, vol. 15, no. 12, 2006, pp. 3736-3745.
[4]	M. A. Abd El-Fattah, M. I. Dessouky, A. M. Abbas, S. M. Diab, S. M. El-Rabaie and F. E. Abd El-Samie, "Speech enhancement with an adaptive Wiener filter," International Journal of Speech Technology, vol. 17, no. 1, 2014, pp. 53-64.
[5]	M. Bahoura and J. Rouat, "Wavelet Speech Enhancement Based on the Teager Energy Operator," IEEE Signal Processing Letters, vol. 8, no. 1, 2001, pp. 10-12.
[6]	J. F. Wang, S. H. Chen and J. J. Lee, "Speech Signal Denoising Based on Multi-Type Wavelet Transforms," Asia Pacific Conference on Multimedia Technology and Applications, 2000, pp. 287-291.
[7]	J. Barker, E. Vincent, N. Ma, C. Christensen and P. Green, "The PASCAL CHiME speech separation and recognition challenge," Computer Speech and Language, vol. 27, no. 3, 2013, pp. 621-633.
[8]	Y. Hu and P. Loizou, "Subjective evaluation and comparison of speech enhancement algorithms," Speech Communication, vol. 49, no. 7, 2007, pp. 588-601.
[9]	Y. Hu and P. Loizou, "Evaluation of objective quality measures for speech enhancement," IEEE Transactions on Audio, Speech, and Language Processing, vol. 16, no. 1, 2008, pp. 229-238.
[10]	J. Ma, Y. Hu and P. Loizou, "Objective measures for predicting speech intelligibility in noisy conditions based on new band-importance functions," Journal of the Acoustical Society of America, vol. 125, no. 5, 2009, pp. 3387-3405.
[11]	Z. Zhang, L. Wang, Q. Zhu, Z. Liu, and Y. Chen, "Noise modeling and representation based classification methods for face recognition," Neurocomputing, vol. 148, no. 19, 2015, pp. 420-429.
[12]	N. Tanabe, T. Furukawa, H. Matsue and S. Tsujii, "Kalman Filter for Robust Noise Suppression in White and Colored Noises," IEEE International Symposium on Circuits and Systems, ISCAS, 2008, pp. 1172-1175.
[13]	M. S. Lewicki and T. J. Sejnowski, "Learning overcomplete representations," Neural Computation, vol. 12, no. 2, 2000, pp. 337-365.
[14]	R. Gribonval and K. Schnass, "Some recovery conditions for basis learning by L1-minimization," in Proc. Int. Symp. Commun., Control, Signal Process. (ISCCSP), vol. 12, no. 14, 2008, pp. 768-773.
[15]	X. Zhimin and G. Yuantao, "Adaptive Speech Enhancement using Sparse Prior Information," IEEE International Conference on Acoustics, Speech and Signal Processing, 2013, pp. 7025-7029.
[16]	M. G. Jafari and M. D. Plumbley, "Fast Dictionary Learning for Sparse Representations of Speech Signals," IEEE Journal of Selected Topics in Signal Processing, vol. 5, no. 5, 2011, pp. 1025-1031.
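Full-text usage rights
On campus
Print thesis to be made public 3 years after submission of the authorization form
Electronic full text authorized for on-campus access
On-campus electronic thesis to be made public 3 years after submission of the authorization form
Off campus
Authorization granted
Off-campus electronic thesis to be made public 3 years after submission of the authorization form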
