Electronic Theses and Dissertations Service

§ Browse Thesis Bibliographic Record

The print copy of this thesis has been available for public use since 2014-07-16.

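System ID	U0002-1307200920590300
DOI	10.6846/TKU.2009.01288
Thesis title (Chinese)	基於隱藏式馬可夫模型之唇語辨識系統
Thesis title (English)	A Lipreading System Based on Hidden Markov Model
Thesis title (third language)
Institution	Tamkang University (淡江大學)
Department (Chinese)	電機工程學系碩士班 (Master's Program, Department of Electrical Engineering)
Department (English)	Department of Electrical and Computer Engineering
Foreign degree university
Foreign degree college
Foreign degree institute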
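Academic year	97 (ROC calendar; 2008-2009)
Semester	2
Publication year	98 (ROC calendar; 2009)
Graduate student (Chinese)	張志瑜
Graduate student (English)	Chih-Yu Chang
Student ID	696440352
Degree	Master's
Language	Traditional Chinese
Second language
Oral defense date	2009-06-19
Pages	48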
Oral defense committee	Advisor - 謝景棠 (hsieh@ee.tku.edu.tw); Committee member - 蘇木春 (muchun@csie.ncu.edu.tw); Committee member - 林慧珍 (hjlin@cs.tku.edu.tw); Committee member - 許志旭 (hsuch@ems.cku.edu.tw); Committee member - 陳慶逸 (chingyi@mail.mcu.edu.tw)
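Keywords (Chinese)	lipreading (唇語辨識); hidden Markov model (隱藏式馬可夫模型); chromaticity color model (彩度色彩模型); K-means algorithm (K-means演算法)
Keywords (English)	Lipreading; HMM; chromaticity color space; K-means algorithm
Keywords (third language)
Subject classification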
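Abstract (Chinese)	Speech recognition systems that rely on audio information are already common in everyday applications such as voice-activated switches. Their greatest weakness is susceptibility to noise. Better audio capture equipment, such as directional microphones, can reduce the interference, but only at a high cost. To address this problem, many researchers have proposed improvements, including speech recognition based on visual information, that is, lipreading. A lipreading system is unaffected by acoustic noise and can also be combined with an audio-based speech recognition system to effectively raise its recognition rate. The aim of this research is to design a lipreading system that combines the chromaticity color space with the K-means algorithm to segment lip images, extracts lip-shape features from the segmented result, and uses a hidden Markov model to improve the recognition rate. The experiments compare lip segmentation techniques in different color spaces and the recognition rates of different features.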
Abstract (English)	Conventional speech recognition systems are now used in many applications. However, they are susceptible to acoustic noise, and their recognition rate drops in noisy conditions. Researchers have therefore proposed speech recognition based solely on visual features, that is, lipreading, to avoid the effect of acoustic noise. A lipreading system can also assist a conventional speech recognition system and raise its recognition rate. In this research, we propose a lipreading system in which lip image segmentation combines the chromaticity color space with the K-means algorithm, and a Hidden Markov Model serves as the recognizer to improve the recognition rate. In the experiments, we compare our method with other color-based lip segmentation methods and compare the recognition rates of different features.
Abstract (third language)
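The abstracts above describe lip segmentation that combines the chromaticity color space with the K-means algorithm. This record does not give the thesis's formulas, so the following is only a minimal sketch under stated assumptions: it takes the normalized red coordinate r = R/(R+G+B) as the per-pixel feature, clusters it into two groups with 1-D K-means, and thresholds at the midpoint of the two centroids. The function names and the sample pixel values are hypothetical and do not come from the thesis.

```python
def chromaticity(pixel):
    """Map an (R, G, B) pixel to normalized (r, g) chromaticity coordinates."""
    r, g, b = pixel
    total = r + g + b
    if total == 0:
        # Pure black has no defined chromaticity; assume a neutral value.
        return (1 / 3, 1 / 3)
    return (r / total, g / total)


def kmeans_1d_two_clusters(values, iterations=50):
    """Run 2-cluster K-means on scalars and return the (low, high) centroids."""
    low, high = min(values), max(values)  # initialize centroids at the extremes
    for _ in range(iterations):
        low_group = [v for v in values if abs(v - low) <= abs(v - high)]
        high_group = [v for v in values if abs(v - low) > abs(v - high)]
        if not high_group:
            break  # all values are identical, nothing to split
        new_low = sum(low_group) / len(low_group)
        new_high = sum(high_group) / len(high_group)
        if new_low == low and new_high == high:
            break  # converged
        low, high = new_low, new_high
    return low, high


def segment_lips(image):
    """Return a binary mask (1 = lip candidate) for a 2-D list of RGB pixels."""
    # Assumption: lips are redder than skin, so the r coordinate separates them.
    red_chroma = [[chromaticity(p)[0] for p in row] for row in image]
    flat = [v for row in red_chroma for v in row]
    low, high = kmeans_1d_two_clusters(flat)
    threshold = (low + high) / 2
    return [[1 if v > threshold else 0 for v in row] for row in red_chroma]


if __name__ == "__main__":
    skin, lip = (200, 150, 130), (180, 60, 70)  # hypothetical sample colors
    image = [[skin, skin, skin], [skin, lip, lip], [skin, skin, skin]]
    for row in segment_lips(image):
        print(row)
```

Normalizing by R+G+B removes most of the brightness variation, which is one common reason for segmenting in a chromaticity space rather than in raw RGB. A real pipeline would also smooth the image and clean the mask afterwards.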
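Table of contents
Abstract (Chinese)
Abstract (English)
Table of Contents
List of Figures
List of Tables
Chapter 1 Introduction
1.1 Research Background
1.2 Research Motivation
1.3 Chapter Overview
Chapter 2 Related Work
2.1 Lip Segmentation
2.1.1 Model-Based Lip Segmentation
2.1.2 Color-Based Lip Segmentation
2.1.3 Clustering-Based Lip Segmentation
2.2 Lip Features
2.3 Conclusion
Chapter 3 System Architecture
3.1 System Flow
3.2 Image Preprocessing
3.2.1 Chromaticity Color Space Conversion
3.2.2 Image Smoothing
3.3 Lip Segmentation
3.3.1 Lip Segmentation Using the K-means Algorithm
3.3.2 Morphological Processing
3.4 Lip Feature Extraction
Chapter 4 Recognition System
4.1 Hidden Markov Model
4.2 Viterbi Algorithm
4.3 EM (Expectation-Maximization) Algorithm
Chapter 5 Experimental Results
5.1 Experimental Environment
5.2 Lip Segmentation Experiments
5.3 Lipreading Experiments
5.4 Results and Discussion
Chapter 6 Conclusion and Future Work
References

List of Figures
Figure 2.1 RGB color space
Figure 2.2 HSI color space
Figure 2.3 Illustration of the {H1,V1,V2,V3} and {S,A} lip features
Figure 2.4 Radius lip features
Figure 3.1 System flowchart
Figure 3.2 Color space conversion of a lip image
Figure 3.3 Gaussian low-pass filtering result
Figure 3.4 Binarization result after thresholding with the K-means algorithm
Figure 3.5 Morphological operations
Figure 3.6 Histogram projection
Figure 3.7 All lip feature points
Figure 3.8 Lip feature points
Figure 3.9 Lip feature points
Figure 3.10 {H1,V1,V2,V3} feature extraction result
Figure 4.1 State transition diagram of the hidden Markov model
Figure 4.2 Viterbi procedure diagram
Figure 5.1 Comparison of lip segmentation results for Frame 1 of the utterance "零" (zero)
Figure 5.2 Comparison of lip segmentation results for Frame 27 of the utterance "一" (one)
Figure 5.3 Comparison of lip segmentation results for Frame 14 of the utterance "二" (two)
Figure 5.4 Comparison of lip segmentation results for Frame 18 of the utterance "四" (four)
Figure 5.5 Comparison of lip segmentation results for Frame 25 of the utterance "五" (five)
Figure 5.6 Comparison of lip segmentation results for Frame 25 of the utterance "八" (eight)
Figure 5.7 Comparison of lip segmentation results for Frame 30 of the utterance "九" (nine)

List of Tables
Table 5.1 Recognition rate of feature A
Table 5.2 Recognition rate of feature B
Table 5.3 Recognition rate of feature C
Table 5.4 Recognition rate of feature D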
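The outline lists the Viterbi algorithm (Section 4.2) as part of the HMM-based recognizer. The record gives no detail of the thesis's model, so what follows is a generic sketch of Viterbi decoding for a discrete-observation HMM, not the author's implementation. It assumes the lip features have already been quantized into integer symbols. The two-state left-to-right model and its probabilities are made up for illustration.

```python
import math


def safe_log(p):
    """Natural log that maps probability 0 to negative infinity."""
    return math.log(p) if p > 0 else float("-inf")


def viterbi(observations, start_prob, trans_prob, emit_prob):
    """Return (best_log_probability, best_state_path) for a discrete HMM.

    start_prob[s]     : probability of starting in state s
    trans_prob[p][s]  : probability of moving from state p to state s
    emit_prob[s][o]   : probability of emitting symbol o in state s
    """
    n_states = len(start_prob)
    # score[s] = best log-probability of any state path ending in state s
    score = [safe_log(start_prob[s]) + safe_log(emit_prob[s][observations[0]])
             for s in range(n_states)]
    backpointers = []
    for obs in observations[1:]:
        new_score, pointers = [], []
        for s in range(n_states):
            # Pick the predecessor that maximizes the path score into state s.
            candidates = [score[p] + safe_log(trans_prob[p][s])
                          for p in range(n_states)]
            best_prev = max(range(n_states), key=lambda p: candidates[p])
            new_score.append(candidates[best_prev] + safe_log(emit_prob[s][obs]))
            pointers.append(best_prev)
        score = new_score
        backpointers.append(pointers)
    # Trace the best path backwards from the best final state.
    last = max(range(n_states), key=lambda s: score[s])
    path = [last]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return score[last], path


if __name__ == "__main__":
    start = [1.0, 0.0]                 # hypothetical left-to-right model
    trans = [[0.5, 0.5], [0.0, 1.0]]
    emit = [[0.9, 0.1], [0.2, 0.8]]
    log_prob, path = viterbi([0, 0, 1, 1], start, trans, emit)
    print(path, math.exp(log_prob))
```

For isolated-word recognition, one common arrangement is to train one HMM per word and choose the word whose model gives the highest score for the observed sequence. Working in log probabilities avoids numeric underflow on long sequences.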
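Full-text access rights	On campus: the print thesis is made public 5 years after the authorization form is submitted; the on-campus bibliographic record is public immediately. Off campus: authorization to database vendors is not granted.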
