§ Thesis Bibliographic Record
System ID U0002-1102200902022900
DOI 10.6846/TKU.2009.01254
Title (Chinese) 雜訊環境下強健性語者辨認的新方法
Title (English) Novel Approaches for Robust Speaker Identification under Noisy Environments
Title (third language)
University Tamkang University
Department (Chinese) 電機工程學系博士班
Department (English) Department of Electrical and Computer Engineering
Foreign degree school
Foreign degree college
Foreign degree institute
Academic year 97
Semester 1
Year of publication 98 (2009)
Author (Chinese) 陳萬城
Author (English) Wan-Chen Chen
Student ID 889350061
Degree type Doctorate
Language English
Second language
Defense date 2009-01-12
Number of pages 70
Committee Advisor - 謝景棠 (hsieh@ee.tku.edu.tw)
Member - 蘇木春 (muchun@csie.ncu.edu.tw)
Member - 邱榮輝 (jhchiu@mail.cgu.edu.tw)
Member - 簡福榮 (frjean@ntut.edu.tw)
Member - 許志旭 (hsuch@ems.cku.edu.tw)
Keywords (Chinese) 語者辨認
小波轉換
多層解析
特徵抽取
重要成份分析
高斯混和式模型
多階層向量量化
Keywords (English) speaker identification
wavelet transform
multi-resolution
feature extraction
principal component analysis (PCA)
Gaussian mixture model (GMM)
multi-stage vector quantization (MSVQ)
Keywords (third language)
Subject classification
Abstract (in Chinese)
The recognition performance of a speaker identification system degrades severely when the training and application environments do not match. This dissertation proposes several techniques to improve robustness against such environmental mismatch. On the feature side, a multi-band feature extraction technique is proposed: the speech signal is decomposed into several frequency bands with the discrete wavelet transform, linear predictive cepstral coefficients are extracted from the signal in each band, and feature vector normalization is applied to the resulting features so that similar features are obtained across different environments. To exploit the multi-band features effectively, several improved recognition models are proposed. First, multi-band feature recombination and multi-band likelihood recombination are applied to the Gaussian mixture model; experiments show that both outperform Gaussian mixture models using linear predictive cepstral coefficient and mel-scale cepstral coefficient features. Second, a multi-band two-stage vector quantization model is proposed, whose quantization error is the sum of the errors of the two-stage vector quantizers in all bands; experiments show that it outperforms vector quantization and Gaussian mixture models using linear predictive cepstral coefficient and mel-scale cepstral coefficient features. Third, an improved multi-band vector quantization model is proposed, which uses a layered structure to eliminate interference among the speech coefficients of different bands and principal component analysis to characterize the codebook of each band, so that the constructed codebooks describe phoneme characteristics more effectively. Experimental results show that this method outperforms all the recognition models proposed earlier.
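To make the multi-band front end described above concrete, the following Python sketch extracts per-frame LPCC from the full band and from each successive lower-frequency DWT approximation, then applies cepstral mean/variance normalization. It is an illustration under assumed settings, not the dissertation's implementation: the wavelet ('db4'), decomposition depth, LPC order, and frame sizes are placeholders, and PyWavelets (pywt) stands in for whatever DWT code the thesis used.

# Sketch of a multi-band LPCC (MBLPCC) front end; all settings are illustrative.
import numpy as np
import pywt  # PyWavelets, assumed here for the discrete wavelet transform

def levinson_durbin(r, order):
    """LPC coefficients a[0..order] (a[0] = 1) from autocorrelation r."""
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k = -acc / err
        prev = a[1:i].copy()
        a[1:i] = prev + k * prev[::-1]
        a[i] = k
        err *= 1.0 - k * k
    return a

def lpcc(frame, order):
    """LPC cepstral coefficients of one frame via the standard recursion."""
    n = len(frame)
    r = np.correlate(frame, frame, mode="full")[n - 1:n + order]
    if r[0] <= 0:  # silent frame: no spectral envelope to model
        return np.zeros(order)
    a = levinson_durbin(r, order)
    c = np.zeros(order + 1)
    for m in range(1, order + 1):
        c[m] = -a[m] - sum((k / m) * c[k] * a[m - k] for k in range(1, m))
    return c[1:]

def mblpcc(signal, levels=2, order=20, frame=256, hop=128, wavelet="db4"):
    """Concatenate per-frame LPCC of the full band and each successive
    lower-frequency DWT approximation, then normalize the features."""
    bands = [np.asarray(signal, dtype=float)]
    for _ in range(levels):
        approx, _detail = pywt.dwt(bands[-1], wavelet)  # keep the low half
        bands.append(approx)
    per_band = []
    for lev, band in enumerate(bands):
        f, h = frame // 2 ** lev, hop // 2 ** lev  # same time span per frame
        count = 1 + (len(band) - f) // h
        per_band.append(
            np.array([lpcc(band[i * h:i * h + f], order) for i in range(count)])
        )
    count = min(len(b) for b in per_band)
    feats = np.hstack([b[:count] for b in per_band])
    # cepstral-domain mean/variance normalization, as in the abstract
    return (feats - feats.mean(axis=0)) / (feats.std(axis=0) + 1e-8)

# Example: features for one second of 16 kHz audio (random stand-in signal)
features = mblpcc(np.random.randn(16000))  # shape: (frames, order * (levels+1))

Keeping only the approximation coefficients at each level mirrors the abstract's focus on the lower-frequency subbands, and halving the frame length per level keeps each band's frames aligned to the same time span so the per-frame features can be concatenated.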
Abstract (in English)
The performance of a speaker recognition system degrades seriously under mismatched training and testing environments. This dissertation addresses several robustness issues in speaker identification. First, a multi-band linear predictive cepstral coefficient (MBLPCC) speech feature is presented: based on the discrete wavelet transform (DWT), the input speech signal is decomposed into frequency subbands, and the LPCC of the lower-frequency subband are computed at each decomposition level. Cepstral-domain feature vector normalization is then applied to all computed features to provide similar parameter statistics in all acoustic environments. With MBLPCC features as the front end, three approaches are proposed for text-independent speaker identification. First, feature recombination and likelihood recombination methods are applied to the Gaussian mixture model (GMM); experimental results show that both methods outperform GMMs using full-band LPCC and mel-scale frequency cepstral coefficients (MFCC) in noisy environments. Second, a multi-band two-stage vector quantization (VQ) recognition model is proposed: a two-stage VQ classifier is applied independently to each band, and the errors of all classifiers are combined into a total error. The proposed method is shown to be more effective and robust than conventional VQ and GMM models using full-band LPCC and MFCC features. Third, a modified VQ identifier is proposed that uses a multi-layer structure to eliminate interference among multi-band speech features and principal component analysis (PCA) to build the codebooks in all bands, capturing a more detailed distribution of each speaker's phoneme characteristics. Evaluations show that this model outperforms the previously proposed recognition models in both clean and noisy environments and achieves satisfactory performance at low signal-to-noise ratios (SNR).
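The likelihood-recombination GMM and the multi-band two-stage VQ scoring described above can likewise be sketched compactly. The snippet below is a hedged illustration, not the thesis code: it assumes per-band feature matrices (e.g., from a front end like the one above), uses scikit-learn's GaussianMixture and KMeans as stand-ins for the thesis's GMM training and LBG-style codebook design [79], and assumes the mixture count; the codebook sizes (64 and 32) follow the figure captions below.

# Sketch of multi-band likelihood recombination (LCGMM) and multi-band
# two-stage VQ scoring; model sizes and training tools are assumptions.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.mixture import GaussianMixture

def train_lcgmm(band_feats, n_mix=32):
    """One GMM per band for one enrolled speaker (likelihood recombination)."""
    return [GaussianMixture(n_components=n_mix, covariance_type="diag",
                            random_state=0).fit(f) for f in band_feats]

def lcgmm_score(models, test_band_feats):
    """Sum per-frame log-likelihoods over frames and over bands."""
    return sum(g.score_samples(f).sum() for g, f in zip(models, test_band_feats))

def train_two_stage_vq(feats, k1=64, k2=32):
    """First-stage codebook on the features, second-stage on the residuals."""
    stage1 = KMeans(n_clusters=k1, n_init=3, random_state=0).fit(feats)
    residual = feats - stage1.cluster_centers_[stage1.labels_]
    stage2 = KMeans(n_clusters=k2, n_init=3, random_state=0).fit(residual)
    return stage1.cluster_centers_, stage2.cluster_centers_

def two_stage_vq_error(codebooks, feats):
    """Total quantization error of a two-stage VQ on test features."""
    cb1, cb2 = codebooks
    d1 = ((feats[:, None, :] - cb1[None]) ** 2).sum(-1)     # frames x k1
    residual = feats - cb1[d1.argmin(axis=1)]
    d2 = ((residual[:, None, :] - cb2[None]) ** 2).sum(-1)  # frames x k2
    return d2.min(axis=1).sum()

def identify(speaker_models, test_band_feats, kind="lcgmm"):
    """Return the index of the best-matching enrolled speaker."""
    if kind == "lcgmm":  # highest summed log-likelihood wins
        scores = [lcgmm_score(m, test_band_feats) for m in speaker_models]
        return int(np.argmax(scores))
    # multi-band two-stage VQ: lowest total error over all bands wins
    errors = [sum(two_stage_vq_error(cb, f)
                  for cb, f in zip(m, test_band_feats)) for m in speaker_models]
    return int(np.argmin(errors))

For the two-stage VQ path, each speaker model would be a list of per-band (stage-1, stage-2) codebook pairs from train_two_stage_vq; summing the per-band errors mirrors the total-error combination described in the abstract, just as summing per-band log-likelihoods mirrors likelihood recombination.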
Abstract (in third language)
Table of Contents
Contents

Abstract (in Chinese)	i
Abstract (in English)	ii
Contents	iv
List of Figures	vii
List of Tables	viii
	
	
Chapter 1 Introduction	1
1.1 Motivation	1
1.2 Review of Robust Speaker Recognition in Noisy Environment	3
1.3 Summary and Outline of This Dissertation	7
1.3.1 Multi-Band Speech Features Based on Wavelet Transform	8
1.3.2 Multi-Band Recognition Models Using Feature Recombination and Likelihood Recombination for Speaker Identification	8
1.3.3 Robust Speaker Identification System Based on Multi-Band Two-Stage Vector Quantization	9
1.3.4 Robust Speaker Identification System Based on Multi-Layer Eigen-Codebook Vector Quantization	9
	
Chapter 2 Multi-Band Speech Features Based on Wavelet Transform	10
2.1 Review of Wavelet Transform	10
2.2 Linear Predictive Cepstral Coefficients (LPCC)	14
2.3 Multi-Band Linear Predictive Cepstral Coefficients (MBLPCC)	17
	
	
Chapter 3 Multi-Band Recognition Models Using Feature Recombination and Likelihood Recombination for Speaker Identification	20
3.1 Gaussian Mixture Model (GMM)	20
3.2 Multi-Band Speaker Recognition Models	24
3.3 Experimental Results	27
3.3.1 Database Description and Parameter Setting	27
3.3.2 Effect of Decomposition Level	28
3.3.3 Comparison with Conventional GMM Models	30
3.4 Concluding Remarks of This Chapter	31
	
Chapter 4 Robust Speaker Identification System Based on Multi-Band Two-Stage Vector Quantization 	32
4.1 Two-Stage Vector Quantization	32
4.2 Multi-Band Two-Stage VQ Recognition Model	34
4.3 Experimental Results	36
4.3.1 Database Description and Parameter Setting	36
4.3.2 Contribution of Multi-Band	37
4.3.3 Effect of Number of Code Vectors in First and Second Stage Codebooks	38
4.3.4 Comparison with Other Existing Models	39
4.4 Concluding Remarks of This Chapter	41
	
Chapter 5 Robust Speaker Identification System Based on Multi-Layer Eigen-Codebook Vector Quantization	43
5.1 Vector Quantization	43
5.2 Principal Component Analysis	45
5.3 Multi-Layer Eigen-Codebook Vector Quantization (MLECVQ)	46
5.4 Experimental Results	49
5.4.1 Database Description and Parameter Setting	50
5.4.2 Contribution of Multi-Band	51
5.4.3 Comparison of the Performance between Eigen-Codebook VQ and Conventional VQ	52
5.4.4 Comparison with Other Existing Models	54
5.5 Concluding Remarks of This Chapter	57
	
Chapter 6 Conclusions and Future Work	59
	
	
References	61
Publications	70

List of Figures

Figure 1.1 Structure of speaker recognition system.	5
Figure 2.1 (a) Standard time domain basis.	11
Figure 2.1 (b) Standard frequency domain basis.	11
Figure 2.2 Three-scale of wavelet basis.	12
Figure 2.3 Filter-bank structure of discrete wavelet transform.	14
Figure 2.4 Two-band analysis tree for discrete wavelet transform.	19
Figure 2.5 Feature extraction algorithm of MBLPCC.	19
Figure 3.1 Depiction of an M-component Gaussian mixture density.	21
Figure 3.2 Structure of FCGMM.	25
Figure 3.3 Structure of LCGMM.	26
Figure 4.1 Structure of two-stage VQ.	34
Figure 4.2 Structure of multi-band two-stage VQ model.	35
Figure 4.3 Effect of number of bands on identification performance of the multi-band two-stage VQ model with 64 code vectors in first stage and 32 code vectors in second stage in clean environment.	38
Figure 4.4 Effect of number of bands on identification performances of MBVQ with 96 code vectors and the multi-band two-stage VQ model with 64 code vectors in first stage and 32 code vectors in second stage in clean environment.	40
Figure 5.1 Structure of MLECVQ.	47
Figure 5.2 Effect of number of bands on identification performance of MLECVQ model with 96 code words and three projection basis vectors in clean environment.	52
Figure 5.3 Effect of number of projection basis vectors on identification performance of MLECVQ model with 96 code words and features of three bands in clean environment.	53
Figure 5.4 Performances of MBVQ and MLECVQ model with 96 code words and three projection basis vectors in clean environment.	54
 
List of Tables

Table 3.1 Effect of number of bands on identification rates for FCGMM and LCGMM models in clean and noisy environments.	29
Table 3.2 Identification rates for GMM+LPCC, GMM+MFCC, FCGMM and LCGMM models under white noise corruption.	31
Table 4.1 Effect of number of code vectors in first and second stage codebooks on identification rates for 3-band two-stage VQ in clean and noisy environments.	39
Table 4.2 Identification rates for VQ+LPCC, VQ+MFCC, GMM+LPCC, GMM+MFCC and 3-band two-stage VQ under white noise corruption.	41
Table 5.1 Identification rates, recognition time per test utterance, and number of floating-point model parameters of VQ+MFCC, GMM+MFCC, 3-band FCGMM, 4-band LCGMM, 3-band two-stage VQ and 3-band MLECVQ under white noise corruption.	56
Table 5.2 Identification rates of the auditory model [82], 3-band FCGMM, 4-band LCGMM, 3-band two-stage VQ and 3-band MLECVQ under white noise corruption, using 90-second training utterances and 6-second test segments from 49 speakers of the KING speech database. For the auditory model [82], the test segments are 6.4 seconds long.	58
References

[1]	  B. H. Juang and T. H. Chen, “The past, present, and future of speech processing,” IEEE Signal Processing Magazine, vol. 15, no. 3, May 1998, pp. 24-28.
[2]	  A. E. Rosenberg and F. K. Soong, “Recent research in automatic speaker recognition,” in S. Furui and M. M. Sondhi, editors, Advances in Speech Signal Processing, Marcel Dekker, 1991, pp. 701-738.
[3]	  D. A. Reynolds, “Experimental evaluation of features for robust speaker identification,” IEEE Trans. Speech and Audio Processing, vol. 2, no. 4, Oct. 1994, pp. 639-643.
[4]	  R. J. Mammone, X. Zhang, and R. P. Ramachandran, “Robust speaker recognition, a feature-based approach,” IEEE Signal Processing Magazine, vol. 13, no. 5, 1996, pp. 58-71.
[5]	  B. Atal, “Effectiveness of linear prediction characteristics of the speech wave for automatic speaker identification and verification,” Journal of Acoustical Society America, vol. 55, June 1974, pp. 1304-1312.
[6]	  G. M. White and R. B. Neely, “Speech recognition experiments with linear prediction, bandpass filtering, and dynamic programming,” IEEE Trans. Acoustics, Speech, Signal Processing, vol. 24, no. 2, 1976, pp. 183-188.
[7]	  R. Vergin, D. O’Shaughnessy, and A. Farhat, “Generalized mel frequency cepstral coefficients for large-vocabulary speaker-independent continuous-speech recognition,” IEEE Trans. Speech and Audio Processing, vol. 7, no. 5, Sept. 1999, pp. 525-532.
[8]	  S. Tibrewala and H. Hermansky, “Sub-band based recognition of noisy speech,” in Proc. of IEEE Int. Conf. on Acoustics, Speech, and Signal Processing (ICASSP 1997), vol. 2, Apr. 1997, pp. 1255–1258.
[9]	  H. Bourlard and S. Dupont, “A new ASR approach based on independent processing and recombination of partial frequency bands,” in Proc. Int. Conf. Spoken Language Processing, vol. 1, 1996, pp. 426–429.
[10]	N. Mirghafori and N. Morgan, “Combining connectionist multiband and full-band probability streams for speech recognition of natural numbers,” in Proc. Int. Conf. Spoken Language Processing, vol. 3, 1998, pp. 743–747.
[11]	S. Okawa, E. Bocchieri, and A. Potamianos, “Multi-band speech recognition in noisy environments,” in Proc. Int. Conf. Acoustics, Speech, Signal Processing (ICASSP 1998), vol. 2, May 1998, pp. 641–644.
[12]	J. Ming, P. Hanna, D. Stewart, M. Owens, and F. J. Smith, “Improving speech recognition performance by using multi-model approaches,” in Proc. Int. Conf. Acoustics, Speech, Signal Processing (ICASSP 1999), vol. 1, March 1999, pp. 161-164.
[13]	R. Hariharan, I. Kiss, and O. Viikki, “Noise robust speech parameterization using multiresolution feature extraction,” IEEE Trans. Speech and Audio Processing, vol. 9, no. 8, Nov. 2001, pp. 856-865.
[14]	C. T. Hsieh and Y. C. Wang, “A robust speaker identification system based on wavelet transform,” IEICE Trans. Information and Systems, vol. E84-D, no. 7, July 2001, pp. 839-846.
[15]	C. T. Hsieh, E. Lai, and Y. C. Wang, “Robust speech features based on wavelet transform with application to speaker identification,” IEE Proceedings–Vision, Image and Signal Processing, vol. 149, no. 2, April 2002, pp. 108-114.
[16]	R. Gemello, F. Mana, D. Albesano, and R. D. Mori, “Multiple resolution analysis for robust automatic speech recognition,” Computer Speech and Language, vol. 20, no. 1, Jan. 2006, pp. 2-21.
[17]	O. Farooq and S. Datta, “Mel Filter-Like Admissible Wavelet Packet Structure for Speech Recognition,” IEEE Signal Processing Letters, vol. 8, no. 7, July 2001, pp. 196-198.
[18]	O. Farooq and S. Datta, “Robust features for speech recognition based on admissible wavelet packets,” Electronics Letters 6th, vol. 37, no. 25, Dec. 2001, pp. 1554-1556.
[19]	S. Y. Lung, “Applied multi-wavelet feature to text independent speaker identification,” IEICE Trans. Fundamentals of Electronics, Communications and Computer Sciences, vol. E87-A, no. 4, April 2004, pp. 944-945.
[20]	J. S. Lim and A. V. Oppenheim, “Enhancement and bandwidth compression of noisy speech,” Proceedings of IEEE, vol. 67, no. 12, 1979, pp. 1586-1604.
[21]	D. Van Compernolle, “Noise adaptation in a hidden Markov model speech recognition system,” Computer Speech and Language, vol. 3, 1989, pp. 151-167.
[22]	P. Lockwood and J. Boudy, “Experiments with a nonlinear spectral subtractor (NSS), hidden Markov models and the projection, for robust speech recognition in cars,” Speech Communication, vol. 11, no. 2-3, 1992, pp. 215–228.
[23]	S. Furui, “Cepstral analysis technique for automatic speaker verification,” IEEE Trans. Acoust., Speech, Signal Processing, vol. 29, no. 2, Apr. 1981, pp. 254-272.
[24]	A. Rosenberg, C.-H. Lee, and F. Soong, “Cepstral channel normalization techniques for HMM-based speaker verification,” in Proc. Int. Conf. on Spoken Language Processing, 1994, pp. 1835–1838.
[25]	A. Acero, “Environmental robustness in automatic speech recognition,” in Proc. of IEEE Int. Conf. on Acoustics, Speech, and Signal Processing (ICASSP 1990), vol. 2, 1990, pp. 849-852.
[26]	A. Sankar and C.-H. Lee, “A maximum-likelihood approach to stochastic matching for robust speech recognition,” IEEE Trans. Speech and Audio Processing, vol. 4, no. 3, May 1996, pp. 190-202.
[27]	C.-H. Lee, “On stochastic feature model compensation approaches to robust speech recognition,” Speech Communication, vol. 25, 1998, pp. 29-49.
[28]	P. J. Moreno, B. Raj, and R. M. Stern, “Data-driven environmental compensation for speech recognition: A unified approach,” Speech Communication, vol. 24, 1998, pp. 267-285.
[29]	S. Furui, “Speaker-independent isolated word recognition using dynamic features of speech spectrum,” IEEE Trans. Acoustics, Speech, Signal Processing, vol. 34, no. 1, Feb. 1986, pp. 52-59.
[30]	F. K. Soong and A. E. Rosenberg, “On the use of instantaneous and transitional spectral information in speaker recognition,” IEEE Trans. Acoustics, Speech, Signal Processing, vol. 36, no. 6, June 1988, pp. 871-879.
[31]	S. Furui, “Comparison of speaker recognition methods using statistical features and dynamic features,” IEEE Trans. Acoustics, Speech, and Signal Processing, vol. 29, no. 3, 1981, pp. 342-350.
[32]	H. Sakoe and S. Chiba, “Dynamic programming algorithm optimization for spoken word recognition,” IEEE Trans. Acoustics, Speech, and Signal Processing, vol. 26, no. 1, Feb. 1978, pp. 43-49.
[33]	A. L. Higgins and R. E. Wohlford, “A new method of text-independent speaker recognition,” in Proc. Int. Conf. Acoustics, Speech, Signal Processing (ICASSP 1986), vol. 11, Apr. 1986, pp. 869-872.
[34]	B. Yegnanarayana, S. R. M. Prasanna, J. M. Zachariah, and C. S. Gupta, “Combining evidence from source, suprasegmental and spectral features for a fixed-text speaker verification system,” IEEE Trans. Speech and Audio Processing, vol. 13, no. 4, July 2005, pp. 575-582.
[35]	K. Yu, J. Mason, and J. Oglesby, “Speaker recognition using hidden Markov models, dynamic time warping and vector quantization,” IEE Proceedings–Vision, Image and Signal Processing, vol. 142, no. 5, Oct. 1995, pp. 313–318. 
[36]	N. Z. Tisby, “On the application of mixture AR hidden Markov models to text independent speaker recognition,” IEEE Trans. Signal Processing, vol. 39, no. 3, March 1991, pp. 563–570.
[37]	A. Poritz, “Linear predictive hidden Markov models and the speech signal,” in Proc. of IEEE Int. Conf. on Acoustics, Speech, and Signal Processing (ICASSP 1982), vol. 7, May 1982, pp. 1291-1294.
[38]	M. Inman, et al., “Speaker identification using hidden Markov models,” in Proc. of Int. Conf. on Signal Processing (ICSP 1998), vol. 1, Oct. 1998, pp. 609-612.
[39]	T. Matsui and S. Furui, “Comparison of text-independent speaker recognition methods using VQ-distortion and discrete/continuous HMM's,” IEEE Trans. Speech and Audio Processing, vol. 2, no. 3, July 1994, pp. 456-459.
[40]	N. B. Yoma and M. Villar, “Speaker verification in noise using a stochastic version of the weighted Viterbi algorithm,” IEEE Trans. Speech and Audio Processing, vol. 10, no. 3, March 2002, pp. 158-166.
[41]	D. A. Reynolds and R. C. Rose, “Robust text-independent speaker identification using Gaussian mixture speaker models,” IEEE Trans. Speech and Audio Processing, vol. 3, no. 1, Jan. 1995, pp. 72–83.
[42]	D. A. Reynolds, “Speaker identification and verification using Gaussian mixture speaker models,” Speech Communication, vol. 17, Issues 1-2, 1995, pp. 91-108.
[43]	D. A. Reynolds, T. F. Quatieri, and R. B. Dunn, “Speaker verification using adapted Gaussian mixture models,” Digital Signal Processing, vol. 10, no. 1, 2000, pp. 19-41.
[44]	C. Miyajima, Y. Hattori, K. Tokuda, T. Masuko, T. Kobayashi, and T. Kitamura, “Text-independent speaker identification using Gaussian mixture models based on multi-space probability distribution,” IEICE Trans. Information and Systems, vol. E84-D, no. 7, 2001, pp. 847–855.
[45]	C. M. Alamo, F. J. C. Gil, C. T. Munilla, and L. H. Gomez, “Discriminative training of GMM for speaker identification,” in Proc. of IEEE Int. Conf. on Acoustics, Speech, and Signal Processing (ICASSP 1996), vol. 1, May 1996, pp. 89–92.
[46]	B. L. Pellom and J. H. L. Hansen, “An efficient scoring algorithm for Gaussian mixture model based speaker identification,” IEEE Signal Processing Letters, vol. 5, no. 11, Nov. 1998, pp. 281–284.
[47]	U. V. Chaudhari, J. Navratil, and S. H. Maes, “Multigrained modeling with pattern specific maximum likelihood transformations for text-independent speaker recognition,” IEEE Trans. Speech and Audio Processing, vol. 11, no. 1, Jan. 2003, pp. 61-69.
[48]	C. C. T. Chen, C. T. Chen, and P. W. Cheng, “Hybrid KLT/GMM approach for robust speaker identification,” Electronics Letters, vol. 39, no. 21, Oct. 2003, pp. 1552-1554.
[49]	C. Seo, K. Y. Lee, and J. Lee, “GMM based on local PCA for speaker identification,” Electronics Letters, vol. 37, no. 24, Nov. 2001, pp. 1486-1488.
[50]	T. Kinnunen, E. Karpov, and P. Franti, “Real-time speaker identification and verification,” IEEE Trans. Audio, Speech, Language Processing, vol. 14, no. 1, Jan. 2006, pp. 277-288.
[51]	M. Grimaldi and F. Cummins, “Speaker Identification Using Instantaneous Frequencies,” IEEE Trans. Audio, Speech, Language Processing, vol. 16, no. 6, Aug. 2008, pp. 1097-1111.
[52]	S.-Y. Lung, “Distributed genetic algorithm for Gaussian mixture model based speaker identification,” Pattern Recognition, vol. 36, no. 10, Oct. 2003, pp. 2479-2481.
[53]	F. K. Soong, A. E. Rosenberg, L. R. Rabiner, and B. H. Juang, “A vector quantization approach to speaker recognition,” in Proc. of IEEE Int. Conf. on Acoustics, Speech, and Signal Processing (ICASSP 1985), vol. 10, Apr. 1985, pp. 387-390.
[54]	D. K. Burton, “Text-dependent speaker verification using vector quantization source coding,” IEEE Trans. Acoustics, Speech, Signal Processing, vol. 35, no. 2, Feb. 1987, pp. 133-143.
[55]	J. He, L. Liu, and G. Palm, “A discriminative training algorithm for VQ-based speaker identification,” IEEE Trans. Speech and Audio Processing, vol. 7, no. 3, May 1999, pp. 353-356.
[56]	Y. J. Kyung and H. S. Lee, “Bootstrap and aggregating VQ classifier for speaker recognition,” Electronics Letters, vol. 35, no. 12, June 1999, pp. 973-974.
[57]	Z. X. Yuan, B. L. Xu, and C. Z. Yu, “Binary quantization of feature vectors for robust text-independent speaker identification,” IEEE Trans. Speech and Audio Processing, vol. 7, no. 1, Jan. 1999, pp. 70-78.
[58]	S.-X. Zhang, M.-W. Mak, and H. M. Meng, “Speaker Verification via High-Level Feature Based Phonetic-Class Pronunciation Modeling,” IEEE Trans. Computers, vol. 56, no. 9, Sept. 2007, pp. 1189-1198.
[59]	V. Hautamaki, T. Kinnunen, I. Karkkainen, J. Saastamoinen, M. Tuononen, and P. Franti, “Maximum a Posteriori Adaptation of the Centroid Model for Speaker Verification,” IEEE Signal Processing Letters, vol. 15, 2008, pp. 162-165.
[60]	G. Zhou and W. B. Mikhael, “Speaker identification based on adaptive discriminative vector quantisation,” IEE Proceedings–Vision, Image and Signal Processing, vol. 153, no. 6, 2006, pp. 754-760.
[61]	B. H. Juang and A. H. Gray, “Multiple stage vector quantization for speech coding,” in Proc. of IEEE Int. Conf. on Acoustics, Speech, and Signal Processing (ICASSP 1982), vol. 7, 1982, pp. 597-600.
[62]	W.-Y. Chan, S. Gupta, and A. Gersho, “Enhanced multistage vector quantization by joint codebook design,” IEEE Trans. Communications, vol. 40, no. 11, Nov. 1992, pp. 1693-1697.
[63]	N. Phamdo, N. Farvardin, and T. Moriya, “A unified approach to tree-structured and multistage vector quantization for noisy channels,” IEEE Trans. Information Theory, vol. 39, no. 3, May 1993, pp. 835-850.
[64]	V. Krishnan, D.V. Anderson, and K.K. Truong, “Optimal multistage vector quantization of LPC parameters over noisy channels,” IEEE Trans. Speech and Audio Processing, vol. 12, no. 1, Jan. 2004, pp. 1-8.
[65]	P. N. Belhumeur, J. P. Hespanha, and D. J. Kriegman, “Eigenfaces vs. Fisherfaces: recognition using class specific linear projection,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 19, no. 7, July 1997, pp. 711-720.
[66]	A. M. Martinez, and A. C. Kak, “PCA versus LDA,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 23, no. 2, Feb. 2001, pp. 228-233.
[67]	C. T. Hsieh, E. Lai, W. C. Chen, and Y. C. Wang, “Compact speech features based on wavelet transform and PCA with application to speaker identification,” Proc. of the International Symposium on Chinese Spoken Language Processing (ISCSLP 2002), Taipei, Aug. 2002, pp. 165-168.
[68]	C. T. Hsieh, E. Lai, and W. C. Chen, “Robust speaker identification system based on multilayer eigen-codebook vector quantization,” IEICE Transactions on Information and Systems, vol. E87-D, no. 5, May 2004, pp. 1185-1193.
[69]	W. C. Chen, C. T. Hsieh, and E. Lai, “Robust speaker identification system based on wavelet transform and Gaussian mixture model,” Proc. of the First International Joint Conference on Natural Language Processing (IJCNLP-04), Hainan, March 2004, pp. 129-134.
[70]	W. C. Chen, C. T. Hsieh, and E. Lai, “Multiband approach to robust text-independent speaker identification,” Journal of Computational Linguistics and Chinese Language Processing, vol. 9, no. 2, August 2004, pp. 63-76.
[71]	W. C. Chen, C. T. Hsieh, and C. H. Hsu, “Two-Stage Vector Quantization Based Multi-band Models for Speaker Identification,” Proceedings of 2007 International Conference on Convergence Information Technology (ICCIT07), Gyeongju, Korea, 21-23 Nov. 2007, pp. 2336-2341.
[72]	W. C. Chen, C. T. Hsieh, and C. H. Hsu, “Robust Speaker Identification System Based on Two-Stage Vector Quantization,” Tamkang Journal of Science and Engineering, accepted March 2008.
[73]	C. S. Burrus, R. A. Gopinath, and H. Guo, Introduction to Wavelets and Wavelet Transforms, Prentice-Hall, 1998.
[74]	G. Strang and T. Nguyen, Wavelets and Filter Banks, Wellesley-Cambridge Press, 1997.
[75]	L. R. Rabiner and B. H. Juang, Fundamentals of Speech Recognition, Prentice Hall, Englewood Cliffs, NJ, 1993.
[76]	J. Makhoul, “Linear prediction: A tutorial review,” Proc. IEEE, vol. 63, 1975, pp. 561-580.
[77]	J. B. Allen, “How do humans process and recognize speech?,” IEEE Trans. Speech and Audio Processing, vol. 2, no. 4, Oct. 1994, pp. 567–577.
[78]	J. Godfrey, D. Graff, and A. Martin, “Public databases for speaker recognition and verification,” in Proc. ESCA Workshop Automat. Speaker Recognition, Identification, Verification, Apr. 1994, pp. 39-42.
[79]	Y. Linde, A. Buzo, and R. M. Gray, “An algorithm for vector quantizer design,” IEEE Trans. Commun., vol. 28, no. 1, Jan. 1980, pp. 84-95.
[80]	G. McLachlan, Mixture Models, New York: Marcel Dekker, 1988.
[81]	A. Dempster, N. Laird, and D. Rubin, “Maximum likelihood from incomplete data via the EM algorithm,” J. Royal Stat. Soc., vol. 39, 1977, pp. 1-38.
[82]	X. Wu, D. Luo, H. Chi, and S. H., “Biomimetics speaker identification systems for network security gatekeepers,” Proc. of Int. Joint Conf. on Neural Networks, vol. 4, 2003, pp. 3189-3194.
Full-Text Use Authorization
On campus
Print copy publicly available 2 years after submission of the authorization form
On-campus bibliographic record publicly available immediately
Off campus
Authorization not granted
