§ Thesis Bibliographic Record
  
System ID	U0002-0509201716171500
DOI 10.6846/TKU.2017.00165
Title (Chinese)	使用深度學習於靜態手勢辨識之研究
Title (English)	A Study of Using Deep Learning in Static Hand Gesture Recognition
Title (third language)
University	Tamkang University
Department (Chinese)	資訊工程學系碩士班
Department (English)	Department of Computer Science and Information Engineering
Foreign degree: university
Foreign degree: college
Foreign degree: graduate institute
Academic year	105
Semester	2
Year of publication	106 (2017)
Author (Chinese)	林清鴻
Author (English)	Ching-Hung Lin
Student ID	604410927
Degree	Master's
Language	Traditional Chinese
Second language
Date of oral defense	2017-06-22
Number of pages	83
Oral defense committee	Advisor - 洪文斌 (horng@mail.tku.edu.tw)
Member - 范俊海
Member - 彭建文
Keywords (Chinese)	深度學習
靜態手勢辨識
堆疊降噪自動編碼器
Keywords (English)	Deep Learning
Static Hand Gesture Recognition
Stacked Denoising AutoEncoders
Keywords (third language)
Subject classification
Abstract (Chinese)
This thesis attempts to use deep-learning neural networks to improve the recognition rate of static hand gestures. The overall system is divided into two parts: image preprocessing and neural network training. First, the gray-scale images provided by Moeslund's hand gesture database are processed to locate the gesture, crop the image, and produce training and test samples. The network trained is a Stacked Denoising AutoEncoder (SDAE), a deep-learning architecture that can learn better features; once trained, the SDAE recognizes which letter a test gesture represents. During training we also use momentum gradient descent, which can escape local minima and accelerate convergence, and dropout, which randomly deactivates neurons to reduce the chance of overfitting. The experiments use 1,440 images for training and 600 images for testing, and the best recognition accuracy reaches 99.96%. The proposed system recognizes static hand gestures quickly and accurately and has high practical value.
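The preprocessing pipeline summarized above (median filtering, projection-based localization, cropping, and rescaling) corresponds to the steps detailed in Chapter 4. The MATLAB lines below are a minimal sketch of that flow, assuming the Image Processing Toolbox; the file names, the foreground threshold, and the 40x32 (rows x columns) target size are illustrative placeholders, not the thesis's actual ImageProcess.m code.

% Illustrative sketch only, not the thesis's ImageProcess.m.
I = imread('gesture.png');                         % gray-scale image from Moeslund's database (placeholder file name)
I = medfilt2(I, [7 7]);                            % 7x7 median filter to suppress noise
mask = I > 50;                                     % rough foreground mask (threshold chosen for illustration)
cols = find(sum(mask, 1) > 0);                     % vertical projection: columns containing the hand
rows = find(sum(mask, 2) > 0);                     % horizontal projection: rows containing the hand
hand = I(rows(1):rows(end), cols(1):cols(end));    % crop the bounding box around the gesture
sample = imresize(hand, [40 32]);                  % rescale the cropped gesture
imwrite(sample, 'sample.png');                     % store as a training/test sample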
Abstract (English)
This study uses deep-learning neural networks to improve the recognition rate of static hand gestures. The recognition system is divided into two parts: image preprocessing and network training. First, the gray-scale images provided by Moeslund's static hand gesture database are processed to locate and crop the hand gesture subimages. These gesture images are then partitioned into a training dataset and a test dataset. The network trained is a Stacked Denoising AutoEncoder (SDAE), a deep-learning architecture that can learn better features. During training, momentum gradient descent is used to escape local minima and accelerate convergence, and dropout randomly deactivates neurons to reduce the probability of overfitting. In the experiments, 1,440 gesture images are used to train the network and 600 gesture images are used to test it; the best recognition rate reaches 99.96%. The recognition system proposed in this study identifies static gestures quickly and accurately, which makes it highly practical.
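Since the appendix lists the DeepLearnToolbox routines used (saesetup.m, saetrain.m, nnsetup.m, nntrain.m, nntest.m; see reference [36]), the training flow described in the abstract can be sketched with that toolbox's documented SAE workflow. The sketch below is illustrative only: the hidden-layer size, epoch count, batch size, momentum value, and dropout fraction are assumptions rather than the settings reported in the thesis, and train_x/train_y/test_x/test_y are assumed to hold one 32x40 gesture image (1,280 pixels, scaled to [0,1]) per row with one-hot labels for the 24 letters.

% Minimal sketch using DeepLearnToolbox's documented SAE workflow (reference [36]);
% all hyperparameter values here are illustrative assumptions.
sae = saesetup([1280 200]);                  % one denoising autoencoder layer
sae.ae{1}.activation_function     = 'sigm';
sae.ae{1}.inputZeroMaskedFraction = 0.5;     % corrupt inputs for the denoising criterion
opts.numepochs = 10;
opts.batchsize = 60;                         % 60 divides the 1,440 training images evenly
sae = saetrain(sae, train_x, opts);          % unsupervised pre-training

nn = nnsetup([1280 200 24]);                 % classifier for the 24 gesture classes
nn.activation_function = 'sigm';
nn.W{1}            = sae.ae{1}.W{1};         % initialize with the pre-trained SDAE weights
nn.momentum        = 0.9;                    % momentum gradient descent
nn.dropoutFraction = 0.5;                    % dropout to reduce overfitting
nn = nntrain(nn, train_x, train_y, opts);    % supervised fine-tuning
[er, bad] = nntest(nn, test_x, test_y);      % er = error rate on the 600 test images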
Abstract (third language)
Thesis contents
Table of Contents
Chapter 1	Introduction	1
1.1	Research Motivation	1
1.2	Research Direction and Objectives	2
1.3	Thesis Organization	3
Chapter 2	Deep Learning	4
2.1	History of Neural Networks	4
2.2	Basic Neural Network Architecture	6
2.2.1	Neurons	7
2.2.2	Layers	8
2.2.3	Weights	9
2.3	Backpropagation Algorithm	10
2.4	Convolutional Neural Networks	12
2.4.1	Local Connectivity	12
2.4.2	Weight Sharing	13
2.4.3	Pooling	13
2.4.4	Network Training	14
2.5	Stacked Denoising AutoEncoders	15
2.5.1	AutoEncoders	15
2.5.2	Stacked AutoEncoders	16
2.5.3	Denoising AutoEncoders	17
Chapter 3	Literature Review	18
3.1	The Study by Oyedotun and Khashman	18
Chapter 4	Hand Gesture Recognition System	22
4.1	Image Preprocessing	22
4.1.1	Median Filter	22
4.1.2	Gesture Localization and Cropping	25
4.1.3	Modified Scaling Method	27
4.1.4	Producing Gray-Scale Images	29
4.2	Preparing Training and Test Samples	31
4.3	Training the Neural Network	33
4.4	Testing the Neural Network	35
Chapter 5	Experiments and Results	37
5.1	Experiments	37
5.1.1	Experiment 1	37
5.1.2	Experiment 2	41
5.1.3	Experiment 3	44
5.1.4	Experiment 4	49
5.1.5	Experiment 5	54
5.2	Results	55
5.3	Misclassified Gestures	56
Chapter 6	Conclusions and Future Work	58
6.1	Evaluation and Discussion	58
6.2	Future Work	59
References	61
Appendix	66
Main.m	67
ImageProcess.m	68
Image_Data.m	70
saesetup.m	70
nnsetup.m	70
saetrain.m	71
nntrain.m	71
nnff.m	73
nnbp.m	74
nnapplygrads.m	75
nneval.m	75
nntest.m	76
nnpredict.m	76
A Study of Using Deep Learning in Static Hand Gesture Recognition	77


List of Figures
Figure 1: Operation of a single neuron	7
Figure 2: Plot of the sigmoid function	8
Figure 3: Network layers	9
Figure 4: Difference between full and partial connectivity	12
Figure 5: A convolution kernel scanning the input to produce a feature map	13
Figure 6: Difference between average pooling and max pooling	14
Figure 7: Structure of an autoencoder	15
Figure 8: Stacked autoencoder	16
Figure 9: Denoising autoencoder	17
Figure 10: Images of the 24 ASL hand gestures	18
Figure 11: Gesture images after preprocessing	19
Figure 12: Relationship between filter window size and network recognition performance	23
Figure 13: Effect of different median filter sizes	24
Figure 14: Locating the gesture with horizontal and vertical projections	26
Figure 15: Difference between gesture images scaled to 32x32 and to 32x40	28
Figure 16: Gesture samples cropped from the original images	29
Figure 17: New gray-scale images with the dark gray background removed	30
Figure 18: Schematic of sample-set construction	32
Figure 19: Overall steps of the gesture recognition system	35
Figure 20: 32x32 sample images produced by Oyedotun and Khashman with a (15,10) filter	38
Figure 21: 32x32 sample images produced in this thesis with a (7,7) filter	39
Figure 22: Kernels learned by SDAE3 in Oyedotun and Khashman's paper	40
Figure 23: Kernels learned in this thesis with the filter changed to (7,7)	40
Figure 24: Original gesture images at 32x32 pixels	41
Figure 25: Gesture images cropped and scaled to 32x40 pixels with the improved scaling method	41
Figure 26: Kernels learned from 32x32-pixel gesture images	42
Figure 27: Kernels learned from 32x40-pixel gesture images	43
Figure 28: 32x32-pixel black-and-white gesture images	44
Figure 29: 32x32-pixel gray-scale gesture images	44
Figure 30: 32x40-pixel black-and-white gesture images	45
Figure 31: 32x40-pixel gray-scale gesture images	45
Figure 32: Kernels learned from 32x32-pixel black-and-white gesture images	46
Figure 33: Kernels learned from 32x32-pixel gray-scale gesture images	47
Figure 34: Kernels learned from 32x40-pixel black-and-white gesture images	47
Figure 35: Kernels learned from 32x40-pixel gray-scale gesture images	48
Figure 36: Letter E samples numbered 1 to 100 (left to right, top to bottom)	49
Figure 37: Letter O samples numbered 1 to 100 (left to right, top to bottom)	50
Figure 38: Letter O gesture images in consecutive order	51
Figure 39: Letter O gesture images with shuffled numbering	52
Figure 40: Kernels learned from 32x32-pixel gray-scale gesture images after shuffling	53
Figure 41: Kernels learned from 32x40-pixel gray-scale gesture images after shuffling	53
Figure 42: Samples scaled to 32x32 pixels show more similarity	56
Figure 43: Samples scaled to 32x40 pixels show less similarity	56


List of Tables
Table 1: CNN training parameters	20
Table 2: SDAE training parameters	20
Table 3: Recall performance of the network on training samples	21
Table 4: Recall performance of the network on test samples	21
Table 5: Comparison of results in Experiment 1	38
Table 6: Comparison of results in Experiment 2	42
Table 7: Comparison of results in Experiment 3	46
Table 8: Comparison of results in Experiment 4	52
Table 9: Comparison of results in Experiment 5 (part 1)	55
Table 10: Comparison of results in Experiment 5 (part 2)	55

List of Code Listings
Listing 1: Median filter	23
Listing 2: Gesture localization	25
Listing 3: Cropping, scaling, and saving gesture images	26
Listing 4: Cropping and scaling gesture images to 40x32	27
Listing 5: Producing gray-scale gesture images	29
Listing 6: Removing the background from gray-scale gesture images	30
Listing 7: Building the sample sets	31
Listing 8: Neural network training program	33
Listing 9: Neural network testing program	35
References
[1]	Nguyen, T. N., Huynh, H. H., and Meunier, J. “Static hand gesture recognition using artificial neural network.” Journal of Image and Graphics, vol. 1, no. 1, pp. 34-38, 2013.

[2]	Nagi, J., Ducatelle, F., Di Caro, G. A., Cireşan, D., Meier, U., Giusti, A., Nagi, F., Schmidhuber, J., and Gambardella, L. M. “Max-pooling convolutional neural networks for vision-based hand gesture recognition.” In Proceedings of the IEEE International Conference on Signal and Image Processing Applications (ICSIPA), pp. 342-347, 2011, November.

[3]	Rahman, M. H. and Afrin, J. “Hand gesture recognition using multiclass support vector machine.” International Journal of Computer Applications, vol. 74, no. 1, 2013.

[4]	Sultana, A. and Rajapuspha, T. “Vision based gesture recognition for alphabetical hand gestures using the svm classifier.” International Journal of Computer Science and Engineering Technology, vol. 3, no. 7, 2012.

[5]	Oyedotun, O. K., Olaniyi, E. O., Helwan, A., and Khashman, A. “Decision support models for iris nevus diagnosis considering potential malignancy.” International Journal of Scientific and Engineering Research, vol. 5, no. 12, pp. 419-426, 2014.

[6]	Yewale, S. K. and Bharne, P. K. “Hand gesture recognition using different algorithms based on artificial neural network.” In 2011 International Conference on Emerging Trends in Networks and Computer Communications (ETNCC), pp. 287-292, 2011, April.

[7]	Phu, J. J., and Tay, Y. H. “Computer vision based hand gesture recognition using artificial neural network.” Faculty of Information and Communication Technology, University Tunku Abdul Rahman, Malaysia, pp. 1-6, 2006.

[8]	Ibraheem, N. A. and Khan, R. Z. “Vision based gesture recognition using neural networks approaches: a review.” International Journal of human Computer Interaction, vol. 3, no. 1, pp. 1-14, 2012.

[9]	Triesch, J., and Von Der Malsburg, C. “A system for person-independent hand posture recognition against complex backgrounds.” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 23, no. 12, pp. 1449-1453, 2001.

[10]	Murakami, K. and Taguchi, H. “Gesture recognition using recurrent neural networks.” In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, ACM Press, pp. 237-242, 1991, April.

[11]	Ahmed, T. “A neural network based real time hand gesture recognition system.” International Journal of Computer Applications, vol. 59, no. 4, 2012.

[12]	Khashman, A. “Application of an emotional neural network to facial recognition.” Neural Computing and Applications, vol. 18, no. 4, pp. 309-320, 2009.

[13]	Wang, W., Yang, J., Xiao, J., Li, S., and Zhou, D. “Face recognition based on deep learning.” In International Conference on Human Centered Computing, pp. 812-820, 2014, November.

[14]	Khashman, A. “Investigation of different neural models for blood cell type identification.” Neural Computing and Applications, vol. 21, no. 6, pp. 1177-1183, 2012.

[15]	Oyedotun, O. K., Tackie, S. N., Olaniyi, E. O., and Khashman, A. “Data Mining of Students' Performance: Turkish Students as A Case Study.” International Journal of Intelligent Systems and Applications, vol. 7, no. 9, pp. 20-27, 2015.

[16]	Oyedotun, O. K. and Khashman, A. “Deep learning in vision-based static hand gesture recognition.” Neural Computing and Applications, pp. 1-11, 2016.

[17]	Thomas Moeslund’s gesture recognition database—PRIMA. http://www-prima.inrialpes.fr/FGnet/data/12-MoeslundGesture/database.html (2017/06/13 visited.)

[18]	Noda, K., Yamaguchi, Y., Nakadai, K., Okuno, H. G., and Ogata, T. “Audio-visual speech recognition using deep learning.” Applied Intelligence, vol. 42, no. 4, pp. 722-737, 2015.

[19]	Collobert, R., Weston, J., Bottou, L., Karlen, M., Kavukcuoglu, K., and Kuksa, P. “Natural language processing (almost) from scratch.” Journal of Machine Learning Research, vol. 12, no. 8, pp. 2493-2537, 2011.

[20]	Kruger, N., Janssen, P., Kalkan, S., Lappe, M., Leonardis, A., Piater, J., Rodriguez-Sanchez, A., and Wiskott, L. “Deep hierarchies in the primate visual cortex: What can we learn for computer vision?” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 35, no. 8, pp. 1847-1871, 2013.

[21]	McCulloch, W. S. and Pitts, W. “A logical calculus of the ideas immanent in nervous activity.” The Bulletin of Mathematical Biophysics, vol. 5, no. 4, pp. 115-133, 1943.

[22]	Hebb, Donald. The Organization of Behavior, Wiley and Sons, 1949.

[23]	Rosenblatt, F. “The perceptron: A probabilistic model for information storage and organization in the brain.” Psychological Review, vol. 65, no. 6, pp. 386-408, 1958.

[24]	Rosenblatt, F. Principles of Neurodynamics: Perceptrons and the Theory of Brain Mechanisms. Cornell Aeronautical Laboratory, Report no. VG-1196-G-8, Spartan Books, 1962.

[25]	Minsky, M. and Papert, S. Perceptrons: An Introduction to Computational Geometry, MIT Press, Cambridge MA, 1972 (2nd edition with corrections, first edition 1969).

[26]	Hopfield, J. J. “Neural networks and physical systems with emergent collective computational abilities.” Proceedings of the National Academy of Sciences, vol. 79, no. 8, pp. 2554-2558, 1982.

[27]	Kelley, H. J. “Gradient theory of optimal flight paths.” Ars Journal, vol. 30, no. 10, pp. 947-954, 1960.

[28]	Rumelhart, D. E., Hinton, G. E., and Williams, R. J. “Learning representations by back-propagating errors.” Nature, vol. 323, no. 6088, pp. 533-538, 1986.

[29]	Hochreiter, S., Bengio, Y., Frasconi, P., and Schmidhuber, J. Gradient flow in recurrent nets: the difficulty of learning long-term dependencies, 2001.

[30]	Hinton, G. E., Osindero, S., and Teh, Y. W. “A fast learning algorithm for deep belief nets.” Neural Computation, vol. 18, no. 7, pp. 1527-1554, 2006.

[31]	Hinton, G. E. “Learning multiple layers of representation.” Trends in Cognitive Sciences, vol. 11, no. 10, pp. 428-434, 2007.

[32]	Russell, S. and Norvig, P. “The most popular method for learning in multilayer networks is called back-propagation.” Artificial Intelligence: A Modern Approach, pp. 578, 1995.

[33]	Ciresan, D. C., Meier, U., Masci, J., Gambardella, L. M., and Schmidhuber, J. “Flexible, high performance convolutional neural networks for image classification.” In Twenty-Second International Joint Conference on Artificial Intelligence, 2011, June.

[34]	Vincent, P., Larochelle, H., Lajoie, I., Bengio, Y., and Manzagol, P. A. “Stacked denoising autoencoders: Learning useful representations in a deep network with a local denoising criterion.” Journal of Machine Learning Research, vol. 11, no. 12, pp. 3371-3408, 2010.

[35]	Bengio, Y., Lamblin, P., Popovici, D., and Larochelle, H. “Greedy layer-wise training of deep networks.” Advances in Neural Information Processing Systems, vol. 19, pp. 153-160, 2007.

[36]	DeepLearnToolbox, https://github.com/rasmusbergpalm/DeepLearnToolbox (2017/06/13 visited.)

[37]	Srivastava, N., Hinton, G. E., Krizhevsky, A., Sutskever, I., and Salakhutdinov, R. “Dropout: a simple way to prevent neural networks from overfitting.” Journal of Machine Learning Research, vol. 15, no. 1, pp. 1929-1958, 2014.

[38]	Sutskever, I., Martens, J., Dahl, G. E., and Hinton, G. E. “On the importance of initialization and momentum in deep learning.” In International Conference on Machine Learning, pp. 1139-1147, 2013, February.
Full-Text Availability
On campus
The print copy will be made available 2 years after the authorization form is submitted.
Full-text electronic access on campus has been authorized.
The on-campus electronic copy will be made available 2 years after the authorization form is submitted.
Off campus
Authorization granted.
The off-campus electronic copy will be made available 2 years after the authorization form is submitted.
