System ID | U0002-2507201803071500 |
---|---|
DOI | 10.6846/TKU.2018.00779 |
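Thesis title (Chinese) | 應用深度學習於中文文字識別之研究 |
Thesis title (English) | The Study of Chinese Optical Character Recognition by using Deep Learning Methods |
Thesis title (third language) | |
Institution | Tamkang University (淡江大學) |
Department (Chinese) | 電機工程學系碩士班 |
Department (English) | Department of Electrical and Computer Engineering |
Foreign degree: university | |
Foreign degree: college | |
Foreign degree: graduate institute | |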
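Academic year | 106 (ROC calendar, 2017-2018) |
Semester | 2 |
Year of publication | 107 (ROC calendar, 2018) |
Graduate student (Chinese) | 楊詔羽 |
Graduate student (English) | Zhao-Yu Yang |
Student ID | 605470334 |
Degree | Master's |
Language | Traditional Chinese |
Second language | |
Oral defense date | 2018-06-04 |
Number of pages | 54 |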
Oral defense committee | Advisor - 周建興 (chchou@mail.tku.edu.tw); Member - 趙于翔 (yxzhao@nqu.edu.tw); Member - 夏至賢 (chhsia625@gmail.com) |
Keywords (Chinese) | 文字識別 (character recognition); 深度學習 (deep learning); 卷積神經網路 (convolutional neural network) |
Keywords (English) | Optical Character Recognition; Deep Learning; Convolution Neural Network |
Keywords (third language) | |
Subject classification | |
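Abstract (translated from the Chinese) | Optical character recognition (OCR) is mainly used to recognize text in existing paper documents and plays an important role in computer vision. Traditional OCR applications, however, mostly deal with scanned documents: the user scans the document with a scanner, and after a series of preprocessing steps the OCR engine recognizes the text. In an era when everyone carries a mobile device with a camera, recognizing text directly with the camera in hand would be both convenient and economical. Character images captured by a camera, however, may be skewed, rotated, or corrupted by noise, which makes OCR classification difficult. The goal of this thesis is therefore to design an OCR system suited to camera-captured images of Traditional Chinese characters. A series of experiments produced two Chinese character recognition systems that can recognize skewed Chinese characters: the Augmented CNN and the Hierarchical OCR. The former tolerates a smaller range of skew angles but has a higher recognition rate; the latter has a slightly lower recognition rate but tolerates a larger range of skew angles. For skew angles of 0° to 20° the two systems achieve similar accuracy, both around 95%; for skew angles of 20° to 40°, the Hierarchical OCR is clearly more accurate than the Augmented CNN. |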
Abstract (English) | Optical character recognition (OCR), which is mainly used to recognize text in existing written documents, plays an important role in the field of computer vision. Traditional OCR applications, however, are mainly scanner-based: a scanner captures an image of the document and, after a series of preprocessing steps, the OCR engine recognizes the text. In an age of mobile devices and personal cameras, it would be both convenient and affordable to recognize text directly with a camera. However, character images captured by a camera may be skewed, rotated, or corrupted by noise, which makes OCR classification difficult. The research objective of this thesis is therefore to design a camera-based OCR system for images of Traditional Chinese characters. After a series of experiments, we arrive at two recognition systems that can recognize skewed Chinese characters: the Augmented CNN and the Hierarchical OCR. The former tolerates only a smaller skew angle but achieves higher accuracy, while the latter tolerates a larger skew angle at a slightly lower recognition rate. For skew angles of 0° to 20°, the two systems achieve nearly the same accuracy, about 95%; for skew angles of 20° to 40°, however, the Hierarchical OCR is significantly more accurate than the Augmented CNN. |
Abstract (third language) | |
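Table of contents |
Abstract
Table of Contents
List of Figures
List of Tables
Chapter 1 Introduction
1.1 Preface
1.2 Research Background
1.3 Research Objectives
Chapter 2 Fundamental Theory and Background Knowledge
2.1 Optical Character Recognition (OCR)
2.2 Artificial Intelligence (AI)
2.3 Deep Learning
2.4 Convolution Neural Network (CNN)
2.5 OpenCV (Open Source Computer Vision Library)
2.6 TensorFlow
Chapter 3 Applying Deep Learning to Chinese Character Recognition
3.1 Neural Network Model
3.2 Dataset
3.3 Training Results
3.4 Testing with a Camera-Captured Dataset (Camera-based Testing Dataset)
3.4.1 Dataset Construction Method
3.4.2 Test Results
Chapter 4 Recognition of Skewed Chinese Characters
4.1 Scanning Methods
4.1.1 Method 1: Vertical Scanning
4.1.2 Method 2: Horizontal Scanning
4.1.3 Analysis and Comparison of Vertical and Horizontal Scanning
4.2 Hybrid Scanning Method Using a Neural Network
4.3 Augmented Deep Learning Training Method
4.3.1 Dataset
4.3.2 Training Results
4.3.3 Real-Data Testing
Chapter 5 Hierarchical Algorithm Combining the Scanning Method and the Augmented CNN
Chapter 6 Conclusion
References
List of Figures
Figure 2.1 Deep learning architecture
Figure 2.2 Biological neuron vs. artificial neuron
Figure 2.3 Artificial neural network architecture
Figure 2.4 Convolutional neural network (CNN) architecture
Figure 3.1 Convolutional neural network model and parameters
Figure 3.2 Part of the dataset
Figure 3.3 Photographed Chinese character dataset
Figure 4.1 Skewed Chinese character image set
Figure 4.2 Vertical scanning of a Chinese character image (14 valid columns)
Figure 4.3 Scanning of rotated images (left 20°, middle 10°, right 0°)
Figure 4.4 Images deskewed by vertical scanning
Figure 4.5 Horizontal scanning of a Chinese character image (14 valid rows)
Figure 4.6 Images deskewed by horizontal scanning
Figure 4.7 Comparison of output images from horizontal and vertical scanning
Figure 4.8 Neural network model for Test 1
Figure 4.9 Neural network model for Test 2
Figure 4.10 Images deskewed by the hybrid scanning method
Figure 4.11 Neural network model for Test 3
Figure 4.12 Part of the rotated dataset
Figure 5.1 Augmented CNN angle test chart
Figure 5.2 Hierarchical OCR system flowchart
Figure 5.3 Hierarchical OCR angle test chart and comparison
List of Tables
Table 3.1 Dataset sizes and classes
Table 3.2 Computing environment specifications
Table 3.3 Computing environment software and versions
Table 3.4 Training results
Table 3.5 Real-data test results
Table 4.1 Test results on skewed real data
Table 4.2 Vertical scanning accuracy and comparison
Table 4.3 Horizontal scanning accuracy and comparison
Table 4.4 Hybrid scanning method training results
Table 4.5 Hybrid scanning method accuracy and comparison
Table 4.6 Dataset sizes and classes after augmentation
Table 4.7 Augmented CNN training results
Table 4.8 Augmented CNN accuracy and comparison
Table 5.1 Augmented CNN angle test table
Table 5.2 Hierarchical OCR angle test table
Table 5.3 Comparison of the Augmented CNN and the Hierarchical OCR |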
Full-text access rights |