Tamkang University Chueh Sheng Memorial Library (TKU Library)
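System ID: U0002-1307200920590300
Thesis title (Chinese): 基於隱藏式馬可夫模型之唇語辨識系統
Thesis title (English): A Lipreading System Based on Hidden Markov Model
Institution: Tamkang University (淡江大學)
Department (Chinese): 電機工程學系碩士班 (Master's Program, Department of Electrical Engineering)
Department (English): Department of Electrical Engineering
Academic year: 97 (ROC calendar)
Semester: 2
Year of publication: 98 (ROC calendar)
Student name (Chinese): 張志瑜
Student name (English): Chih-Yu Chang
Student ID: 696440352
Degree: Master's
Language: Chinese
Oral defense date: 2009-06-19
Thesis length: 48 pages
Oral defense committee: Advisor: 謝景棠
Member: 蘇木春
Member: 林慧珍
Member: 許志旭
Member: 陳慶逸
Keywords (Chinese): 唇語辨識  隱藏式馬可夫模型  彩度色彩模型  K-means演算法
Keywords (English): Lipreading  HMM  chromaticity color space  K-means algorithm
Subject classification: Applied Sciences, Electrical and Electronic Engineering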
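Chinese abstract (translated): Speech recognition systems based on audio information are already common in everyday applications, for example voice-activated switches. Their greatest weakness, however, is susceptibility to noise. Improved audio capture equipment, such as directional microphones, can reduce noise interference, but high cost is the price of designing such a system. Many researchers have therefore proposed improvements, including speech recognition based on image information, that is, lipreading systems. A lipreading system is immune to acoustic noise and can even be combined with an audio-based speech recognition system to raise its recognition rate effectively. The goal of this research is to design a lipreading system that combines the chromaticity color space with the K-means algorithm for lip image segmentation, extracts lip-shape features from the segmented region, and uses a Hidden Markov Model to improve the recognition rate. The experiments compare lip segmentation techniques in different color spaces and the recognition rates obtained with different features.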
English abstract: Conventional speech recognition systems are now used in many applications. However, they are susceptible to acoustic noise, and their recognition rate drops in noisy conditions. Researchers have therefore proposed speech recognition based solely on visual features, that is, lipreading, to avoid the effect of acoustic noise. A lipreading system can also assist a conventional speech recognition system and raise its recognition rate. In this research, we propose a lipreading system whose lip image segmentation stage combines the chromaticity color space with the K-means algorithm, and whose recognition stage uses a Hidden Markov Model to improve the recognition rate. In the experiments, we compare our method with other color-based lip segmentation methods and compare the recognition rates of different features.
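The segmentation idea named in the abstract (chromaticity color space combined with K-means) can be illustrated with a minimal sketch. This record does not give the thesis's actual formulas or parameters, so everything here is an assumption made for illustration: the normalized chromaticity r = R/(R+G+B), the choice of the r channel, K = 2, the midpoint threshold, and the hypothetical function names `chromaticity`, `two_means_threshold`, and `segment_lips`.

```python
from typing import List, Sequence, Tuple

Pixel = Tuple[int, int, int]  # (R, G, B), each 0..255


def chromaticity(pixel: Pixel) -> Tuple[float, float]:
    """Map an (R, G, B) pixel to normalized (r, g) chromaticity coordinates."""
    red, green, blue = pixel
    total = red + green + blue
    if total == 0:
        # Black pixel: chromaticity is undefined, so assume a neutral value.
        return (1.0 / 3.0, 1.0 / 3.0)
    return (red / total, green / total)


def two_means_threshold(values: Sequence[float], iterations: int = 50) -> float:
    """Run 1-D K-means with K=2 and return the midpoint of the two centroids."""
    if not values:
        raise ValueError("values must not be empty")
    low, high = min(values), max(values)  # initial centroids (an assumption)
    for _ in range(iterations):
        # Assign each value to the nearer centroid.
        low_group = [v for v in values if abs(v - low) <= abs(v - high)]
        high_group = [v for v in values if abs(v - low) > abs(v - high)]
        if not high_group:
            break  # all values identical: nothing to separate
        new_low = sum(low_group) / len(low_group)
        new_high = sum(high_group) / len(high_group)
        if new_low == low and new_high == high:
            break  # centroids stopped moving
        low, high = new_low, new_high
    return (low + high) / 2.0


def segment_lips(pixels: Sequence[Pixel]) -> List[int]:
    """Return a binary mask: 1 where the pixel is redder than the threshold."""
    reds = [chromaticity(p)[0] for p in pixels]
    threshold = two_means_threshold(reds)
    return [1 if r > threshold else 0 for r in reds]


if __name__ == "__main__":
    # Two reddish (lip-like) pixels followed by two skin-like pixels.
    sample = [(200, 80, 80), (210, 90, 85), (180, 140, 120), (175, 135, 125)]
    print(segment_lips(sample))
```

Dividing by R+G+B removes most of the brightness variation, which is why a simple two-cluster split on a chromaticity channel can separate reddish lip pixels from skin under uneven lighting. A real pipeline would also smooth the image first and clean the mask with morphological operations, as the table of contents indicates the thesis does.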
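Table of contents
Chinese abstract I
English abstract II
Table of contents III
List of figures VI
List of tables VIII
Chapter 1 Introduction 1
1.1 Research background 1
1.2 Research motivation 2
1.3 Chapter overview 3
Chapter 2 Related work 4
2.1 Lip segmentation 4
2.1.1 Model-based lip segmentation 4
2.1.2 Color-based lip segmentation 6
2.1.3 Clustering-based lip segmentation 9
2.2 Lip features 11
2.3 Conclusion 12
Chapter 3 System architecture 14
3.1 System flow 14
3.2 Image preprocessing 15
3.2.1 Chromaticity color space conversion 15
3.2.2 Image smoothing 17
3.3 Lip segmentation 18
3.3.1 Lip segmentation using the K-means algorithm 18
3.3.2 Morphological processing 19
3.4 Lip feature extraction 20
Chapter 4 Recognition system 24
4.1 Hidden Markov Model 25
4.2 Viterbi algorithm 27
4.3 EM (Expectation-Maximization) algorithm 28
Chapter 5 Experimental results 30
5.1 Experimental environment 30
5.2 Lip segmentation experiments 30
5.3 Lipreading recognition experiments 38
5.4 Results and discussion 41
Chapter 6 Conclusion and future work 44
References 46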

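List of figures
Figure 2.1 RGB color space 7
Figure 2.2 HSI color space 8
Figure 2.3 Illustration of the {H1,V1,V2,V3} and {S,A} lip features 11
Figure 2.4 Radius-based lip features 12
Figure 3.1 System flowchart 14
Figure 3.2 Color space conversion of a lip image 16
Figure 3.3 Result of Gaussian low-pass filtering 17
Figure 3.4 Binarization result using a threshold obtained with the K-means algorithm 19
Figure 3.5 Morphological operations 20
Figure 3.6 Histogram projection 21
Figure 3.7 All lip feature points 21
Figure 3.8 Lip feature points 22
Figure 3.9 Lip feature points 22
Figure 3.10 {H1,V1,V2,V3} feature extraction result 23
Figure 4.1 State transition diagram of a Hidden Markov Model 26
Figure 4.2 Viterbi flow diagram 27
Figure 5.1 Comparison of lip segmentation results, utterance 「零」 (zero), frame 1 31
Figure 5.2 Comparison of lip segmentation results, utterance 「一」 (one), frame 27 32
Figure 5.3 Comparison of lip segmentation results, utterance 「二」 (two), frame 14 33
Figure 5.4 Comparison of lip segmentation results, utterance 「四」 (four), frame 18 34
Figure 5.5 Comparison of lip segmentation results, utterance 「五」 (five), frame 25 35
Figure 5.6 Comparison of lip segmentation results, utterance 「八」 (eight), frame 25 36
Figure 5.7 Comparison of lip segmentation results, utterance 「九」 (nine), frame 30 37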

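List of tables
Table 5.1 Recognition rate for feature A 39
Table 5.2 Recognition rate for feature B 39
Table 5.3 Recognition rate for feature C 40
Table 5.4 Recognition rate for feature D 40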



References
[1]N. Deshmukh, A. Ganapathiraju and J. Picone, “Hierarchical search for large-vocabulary conversational speech recognition,” IEEE Signal Processing Magazine, vol. 16, Sept. 1999, pp. 84-107.

[2]D. Nguyen, D. Halupka, P. Aarabi and A. Sheikholeslami, “Real-time face detection and lip feature extraction using field-programmable gate arrays,” IEEE Trans. on Systems, Man, and Cybernetics, Part B: Cybernetics, vol. 36, Aug. 2006, pp. 902-912.

[3]T. Chen, and R. R. Rao, “Audio-Visual Integration in Multimodal Communication,” Proc. of the IEEE, vol. 86, May. 1998, pp. 837-852.

[4]A. S. M. Sohail, and P. Bhattacharya, “Automated lip contour detection using the level set segmentation method,” in Proc. Int. Image Analysis and Processing Conf. (ICIAP’07), Sept. 2007, pp.425-430.

[5]M. Kass, A. Witkin, and D. Terzopoulos, “Snakes: Active Contour Models,” Int. Journal of Computer Vision, vol. 1, 1988, pp. 321-331.

[6]R. C. Gonzalez, and R. E. Woods, Digital Image Processing, 2nd ed., Prentice-Hall, 2002.

[7]X. Zhang, and R.M. Mersereau, “Lip Feature Extraction Towards an Automatic Speechreading System,” in Proc. of IEEE Int. Image Processing Conf., Sept. 2000, pp. 226-229.

[8]A. Hurlbert and T. Poggio, “Synthesizing a color algorithm from examples,” Science, New Series, vol. 239, Jan. 1988, pp. 482-485.

[9]H. J. Trussell, M. J. Vrhel and E. Saber, “Color Image Processing [basics and special issue overview],” IEEE Signal Processing Mag., vol. 22, Jan. 2005, pp. 14-22.

[10]M. Sadeghi, J. Kittler and K. Messer, “Segmentation of lip pixels for lip tracker initialization,” in Proc. of Int. Image Processing Conf., Oct. 2001, pp. 7-10.

[11]M. N. Q. Kaynak, A. D. Cheok, K. Sengupta, Z. Jian and K. C. Chung, “Analysis of lip geometric features for audio-visual speech recognition,” IEEE Trans. on Systems, Man, and Cybernetics Part A: Systems and Humans., vol. 34, Jul. 2004, pp. 564-570.

[12]L. G. Silveira, J. Facon, and D. L. Borges, “Visual Speech Recognition: a Solution from Feature Extraction to Words Classification,” in Proc. of Int. Computer Graphics and Image Processing Conf., Oct. 2003, pp. 399-405.

[13]M. J. Lyons, C. H. Chan, and N. Tetsutani, “Mouth Type: text entry by hand and mouth,” in Proc. of Human Factors in Computing Systems Conf., Apr. 2004, pp. 1383-1386.

[14]T. Saitoh, and R. Konishi, “Lip reading based on sampled active contour model,” in Proc. of Image Analysis and Recognition Conf. (ICIAR’05), Sept. 2005, pp. 507-515.

[15]T. Saitoh, and R. Konishi, “Word recognition based on two dimensional lip motion trajectory,” in Proc. of Int. Symposium on Intelligent Signal Processing and Communications (ISPACS’06), Dec. 2006, pp. 287-290.

[16]N. Otsu, “A threshold selection method from gray-level histograms,” IEEE Trans. on Sys., Man., Cyber, vol. 9, Jan. 1979, pp. 62-66.

[17]H. S. Hippert, C. E. Pedreira, and R. C. Souza, “Neural Networks for Short-Term Load Forecasting: A Review and Evaluation,” IEEE Trans. on Power Systems, vol. 16, Feb. 2001, pp. 44-45.

[18]J. Huang, X. Shao, and H. Wechsler, “Face pose discrimination using support vector machines (SVM),” in Proc. of Int. Pattern Recognition Conf.(ICPR’98), Aug. 1998, pp. 155-156.

[19]L. R. Rabiner, “A tutorial on hidden Markov models and selected applications in speech recognition,” Proceedings of the IEEE, vol. 77, Feb. 1989, pp. 257-286.

[20]S. L. Wang, A. W. C. Liew, W. H. Lau, and H. S. Leung, “An Automatic Lipreading System for Spoken Digits With Limited Training Data,” IEEE Trans. on Circuits and Systems for Video Technology, vol. 18, Dec. 2008, pp. 1760-1765.

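[21]L. R. Rabiner, and B. H. Juang, Fundamentals of Speech Recognition. Englewood Cliffs, NJ: Prentice-Hall, 1993.
Thesis usage permissions
  • Print copy: royalty-free authorization granted for in-library readers to reproduce for academic purposes; publicly available from 2014-07-16.
  • Electronic full text: browsing and printing service not authorized.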

