§ Browse Thesis Bibliographic Record
System ID U0002-1307200920590300
DOI 10.6846/TKU.2009.01288
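Thesis title (Chinese) 基於隱藏式馬可夫模型之唇語辨識系統
Thesis title (English) A Lipreading System Based on Hidden Markov Model
Thesis title (third language)
University Tamkang University (淡江大學)
Department (Chinese) 電機工程學系碩士班
Department (English) Department of Electrical and Computer Engineering
Foreign degree university
Foreign degree college
Foreign degree institute
Academic year 97 (ROC calendar)
Semester 2
Publication year 98 (ROC calendar)
Graduate student (Chinese) 張志瑜
Graduate student (English) Chih-Yu Chang
Student ID 696440352
Degree Master's
Language Traditional Chinese
Second language
Oral defense date 2009-06-19
Pages 48
Oral defense committee Advisor - 謝景棠(hsieh@ee.tku.edu.tw)
Member - 蘇木春(muchun@csie.ncu.edu.tw)
Member - 林慧珍(hjlin@cs.tku.edu.tw)
Member - 許志旭(hsuch@ems.cku.edu.tw)
Member - 陳慶逸(chingyi@mail.mcu.edu.tw)
Keywords (Chinese) 唇語辨識
隱藏式馬可夫模型
彩度色彩模型
K-means演算法
Keywords (English) Lipreading
HMM
chromaticity color space
K-means algorithm
Keywords (third language)
Subject classification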
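Abstract (Chinese)
Conventional speech recognition systems that rely on audio information are already common in everyday applications, such as voice-controlled switches. Their greatest weakness, however, is susceptibility to noise. Improved recording equipment, such as directional microphones, can reduce noise interference, but high cost is the price such a design must pay. Many researchers have therefore proposed improvements to address this problem, including speech recognition based on image information, namely lipreading systems. A lipreading system is immune to noise interference and can even be combined with an audio-based speech recognition system to effectively raise its recognition rate. The aim of this study is to design a lipreading system that combines the chromaticity color space with the K-means algorithm for lip image segmentation, extracts lip-shape features from the result, and uses a hidden Markov model to raise the recognition rate of the lipreading system. The experiments compare lip segmentation techniques in different color spaces and the recognition rates of different features.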
Abstract (English)
Conventional speech recognition systems are now used in many applications. However, they are susceptible to acoustic noise, and their recognition rate drops in noisy conditions. Researchers have therefore proposed speech recognition based solely on visual features, that is, lipreading systems, which are unaffected by acoustic noise. A lipreading system can also assist a conventional speech recognition system and raise its recognition rate. In this research, we propose a lipreading system whose lip image segmentation stage combines the chromaticity color space with the K-means algorithm, and whose recognition stage uses a Hidden Markov Model to improve the recognition rate. In the experiments, we compare our method with other color-based lip segmentation methods and compare the recognition rates obtained with different features.
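The segmentation idea named in the abstract (chromaticity color space combined with K-means) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and not the thesis's implementation: the normalized-RGB definition of chromaticity, the use of the red chromaticity channel alone, the two-cluster setup, and the function names `chromaticity`, `kmeans_threshold`, and `segment_lips` are all hypothetical.

```python
def chromaticity(pixel):
    """Map an (R, G, B) pixel to normalized (r, g) chromaticity coordinates.

    Assumption: chromaticity means normalized RGB, r = R/(R+G+B), g = G/(R+G+B).
    """
    red, green, blue = pixel
    total = red + green + blue
    if total == 0:
        return (1 / 3, 1 / 3)  # assumption: treat pure black as neutral
    return (red / total, green / total)


def kmeans_threshold(values, iterations=50):
    """Run two-cluster 1-D K-means and return the midpoint of the centroids.

    In one dimension, assigning a value to its nearest centroid is the same
    as comparing it with the midpoint between the two centroids.
    Assumes `values` is non-empty.
    """
    low_center, high_center = min(values), max(values)
    for _ in range(iterations):
        midpoint = (low_center + high_center) / 2
        low = [v for v in values if v <= midpoint]
        high = [v for v in values if v > midpoint]
        if not low or not high:
            break  # all values fell into one cluster
        new_low = sum(low) / len(low)
        new_high = sum(high) / len(high)
        if new_low == low_center and new_high == high_center:
            break  # centroids stopped moving
        low_center, high_center = new_low, new_high
    return (low_center + high_center) / 2


def segment_lips(pixels):
    """Return a binary mask: 1 where red chromaticity exceeds the threshold."""
    reds = [chromaticity(p)[0] for p in pixels]
    threshold = kmeans_threshold(reds)
    return [1 if r > threshold else 0 for r in reds]


# Hypothetical sample: two skin-like pixels followed by two lip-like pixels.
sample = [(200, 120, 100), (205, 125, 105), (180, 60, 70), (190, 55, 65)]
mask = segment_lips(sample)
```

Normalizing by R+G+B removes most of the dependence on brightness, which is one common reason to cluster in a chromaticity space; whether the thesis uses this exact definition is not stated in this record.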
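Abstract (third language)
Table of contents
Contents
Chinese Abstract	I
English Abstract	II
Contents	III
List of Figures	VI
List of Tables	VIII
Chapter 1  Introduction	1
1.1 Research Background	1
1.2 Research Motivation	2
1.3 Chapter Overview	3
Chapter 2  Related Work	4
2.1 Lip Segmentation	4
2.1.1 Model-Based Lip Segmentation	4
2.1.2 Color-Based Lip Segmentation	6
2.1.3 Clustering-Based Lip Segmentation	9
2.2 Lip Features	11
2.3 Conclusion	12
Chapter 3  System Architecture	14
3.1 System Flow	14
3.2 Image Preprocessing	15
3.2.1 Chromaticity Color Space Conversion	15
3.2.2 Image Smoothing	17
3.3 Lip Segmentation	18
3.3.1 Lip Segmentation Using the K-means Algorithm	18
3.3.2 Morphological Processing	19
3.4 Lip Feature Extraction	20
Chapter 4  Recognition System	24
4.1 Hidden Markov Model	25
4.2 Viterbi Algorithm	27
4.3 EM (Expectation-Maximization) Algorithm	28
Chapter 5  Experimental Results	30
5.1 Experimental Environment	30
5.2 Lip Segmentation Experiments	30
5.3 Lipreading Recognition Experiments	38
5.4 Results and Discussion	41
Chapter 6  Conclusions and Future Work	44
References	46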

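List of Figures
Figure 2.1  RGB color space	7
Figure 2.2  HSI color space	8
Figure 2.3  Illustration of the {H1,V1,V2,V3} and {S,A} lip features	11
Figure 2.4  Radius lip features	12
Figure 3.1  System flowchart	14
Figure 3.2  Color space conversion of lip images	16
Figure 3.3  Gaussian low-pass filtering result	17
Figure 3.4  Binarization result after thresholding with the K-means algorithm	19
Figure 3.5  Morphological operations	20
Figure 3.6  Histogram projection	21
Figure 3.7  All lip feature points	21
Figure 3.8  Lip feature points	22
Figure 3.9  Lip feature points	22
Figure 3.10 {H1,V1,V2,V3} feature extraction result	23
Figure 4.1  Hidden Markov model state transition diagram	26
Figure 4.2  Viterbi procedure diagram	27
Figure 5.1  Comparison of lip segmentation results for Frame 1 of the utterance 「零」 (zero)	31
Figure 5.2  Comparison of lip segmentation results for Frame 27 of the utterance 「一」 (one)	32
Figure 5.3  Comparison of lip segmentation results for Frame 14 of the utterance 「二」 (two)	33
Figure 5.4  Comparison of lip segmentation results for Frame 18 of the utterance 「四」 (four)	34
Figure 5.5  Comparison of lip segmentation results for Frame 25 of the utterance 「五」 (five)	35
Figure 5.6  Comparison of lip segmentation results for Frame 25 of the utterance 「八」 (eight)	36
Figure 5.7  Comparison of lip segmentation results for Frame 30 of the utterance 「九」 (nine)	37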

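List of Tables
Table 5.1  Recognition rate for feature A	39
Table 5.2  Recognition rate for feature B	39
Table 5.3  Recognition rate for feature C	40
Table 5.4  Recognition rate for feature D	40
References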
[1]N. Deshmukh, A. Ganapathiraju and J. Picone, “Hierarchical search for large-vocabulary conversational speech recognition,” IEEE Signal Processing Magazine, vol. 16, Sept. 1999, pp. 84-107.

[2]D. Nguyen, D. Halupka, P. Aarabi and A. Sheikholeslami, “Real-time face detection and lip feature extraction using field-programmable gate arrays,” IEEE Trans. on Systems, Man, and Cybernetics, Part B: Cybernetics, vol. 36, Aug. 2006, pp. 902-912.

[3]T. Chen and R. R. Rao, “Audio-Visual Integration in Multimodal Communication,” Proc. of the IEEE, vol. 86, May 1998, pp. 837-852.

[4]A. S. M. Sohail, and P. Bhattacharya, “Automated lip contour detection using the level set segmentation method,” in Proc. Int. Image Analysis and Processing Conf. (ICIAP’07), Sept. 2007, pp.425-430.

[5]M. Kass, A. Witkin, and D. Terzopoulos, “Snakes: Active Contour Models,” Int. Journal of Computer Vision, vol. 1, 1988, pp. 321-331.

[6]R. C. Gonzalez, and R. E. Woods, Digital Image Processing, 2nd ed., Prentice-Hall, 2002.

[7]X. Zhang and R. M. Mersereau, “Lip Feature Extraction Towards an Automatic Speechreading System,” in Proc. of IEEE Int. Image Processing Conf., Sept. 2000, pp. 226-229.

[8]A. Hurlbert and T. Poggio, “Synthesizing a color algorithm from examples,” Science, New Series, vol. 239, Jan. 1988, pp. 482-485.

[9]H. J. Trussell, M. J. Vrhel and E. Saber, “Color Image Processing [basics and special issue overview],” IEEE Signal Processing Mag., vol. 22, Jan. 2005, pp. 14-22.

[10]M. Sadeghi, J. Kittler and K. Messer, “Segmentation of lip pixels for lip tracker initialization,” in Proc. of Int. Image Processing Conf., Oct. 2001, pp. 7-10.

[11]M. N. Q. Kaynak, A. D. Cheok, K. Sengupta, Z. Jian and K. C. Chung, “Analysis of lip geometric features for audio-visual speech recognition,” IEEE Trans. on Systems, Man, and Cybernetics Part A: Systems and Humans., vol. 34, Jul. 2004, pp. 564-570.

[12]L. G. Silveira, J. Facon, and D. L. Borges, “Visual Speech Recognition: a Solution from Feature Extraction to Words Classification,” in Proc. of Int. Computer Graphics and Image Processing Conf., Oct. 2003, pp. 399-405.

[13]M. J. Lyons, C. H. Chan, and N. Tetsutani, “Mouth Type: text entry by hand and mouth,” in Proc. of Human Factors in Computing Systems Conf., Apr. 2004, pp. 1383-1386.

[14]T. Saitoh and R. Konishi, “Lip reading based on sampled active contour model,” Image Analysis and Recognition Conf. (ICIAR’05), Sept. 2005, pp. 507-515.

[15]T. Saitoh and R. Konishi, “Word recognition based on two dimensional lip motion trajectory,” Int. Symposium on Intelligent Signal Processing and Communications (ISPACS’06), Dec. 2006, pp. 287-290.

[16]N. Otsu, “A threshold selection method from gray-level histograms,” IEEE Trans. on Systems, Man, and Cybernetics, vol. 9, Jan. 1979, pp. 62-66.

[17]H. S. Hippert, C. E. Pedreira, and R. C. Souza, “Neural Networks for Short-Term Load Forecasting: A Review and Evaluation,” IEEE Trans. on Power Systems, vol. 16, Feb. 2001, pp. 44-45.

[18]J. Huang, X. Shao, and H. Wechsler, “Face pose discrimination using support vector machines (SVM),” in Proc. of Int. Pattern Recognition Conf.(ICPR’98), Aug. 1998, pp. 155-156.

[19]L. R. Rabiner, “A tutorial on hidden Markov models and selected applications in speech recognition,” Proceedings of the IEEE, vol. 77, Feb. 1989, pp. 257-286.

[20]S. L. Wang, A. W. C. Liew, W. H. Lau, and H. S. Leung, “An Automatic Lipreading System for Spoken Digits With Limited Training Data,” IEEE Trans. on Circuits and Systems for Video Technology, vol. 18, Dec. 2008, pp. 1760-1765.

[21]L. R. Rabiner and B. H. Juang, Fundamentals of Speech Recognition. Englewood Cliffs, NJ: Prentice-Hall, 1993.
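Full-text access rights
On campus
Print copy of the thesis to be made public 5 years after submission of the authorization form
On-campus bibliographic record made public immediately
Off campus
Authorization not granted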
