§ Browse Thesis Bibliographic Record
  
System ID  U0002-2206201212034300
DOI  10.6846/TKU.2012.00925
Title (Chinese)  一種應用鄰近關係的特徵擷取演算法
Title (English)  A Novel Feature Selection Method based on the Neighborhood Relation
Title (third language)  
Institution  淡江大學 (Tamkang University)
Department (Chinese)  電機工程學系碩士班
Department (English)  Department of Electrical and Computer Engineering
Foreign degree institution  
Foreign degree college  
Foreign degree graduate institute  
Academic year (ROC)  100
Semester  2
Year of publication (ROC)  101
Researcher (Chinese)  戴賢榜
Researcher (English)  Hsien-Pang Tai
Student ID  699470406
Degree  Master's
Language  Traditional Chinese
Second language  
Oral defense date  2012-06-21
Number of pages  37
Defense committee  Advisor - 周建興
  Member - 蘇木春
  Member - 李揚漢
  Member - 江正雄
  Member - 許志旭
Keywords (Chinese)  循序向前特徵選取法 (sequential forward feature selection)
  循序向後特徵選取法 (sequential backward feature selection)
  文字辨識 (character recognition)
  腦波分析 (EEG analysis)
Keywords (English)  weight value
  SFS
  SBS
  text categorization
Third-language keywords  
Subject classification  
Abstract (Chinese)
In recent years, feature selection has increasingly been applied in fields where the important features must be extracted from datasets containing thousands of features, including character recognition, gene microarrays, and biomedical signal analysis.
Feature selection plays a crucial role in pattern recognition and machine learning. Among the many feature selection methods, sequential forward selection (SFS) and sequential backward selection (SBS) are the most widely used. The method proposed in this thesis combines a neighborhood relation with SFS and SBS: a feature selection scheme is developed around the concept of neighborhood relations, in which each feature is assigned a distance weight according to its relation to the features before and after it (for one-dimensional data) or above, below, to the left of, and to the right of it (for two-dimensional data). The features are then ranked by these weights, the features that improve the recognition rate are selected in rank order, and the remaining unimportant features are removed. In this way an important feature subset is obtained that raises the recognition rate, and the selected features cluster in the key regions.
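The abstract does not reproduce the thesis's exact weighting formula, so the following Python sketch is only an illustration of the stated idea for two-dimensional data, assuming a feature grid such as the 16×16 blocks of Figure 4.7 and a simple rule in which a candidate feature's weight is the count of already-selected features among its four neighbors; the function names and this particular weight definition are assumptions, not the thesis's definition.

```python
import numpy as np

def neighborhood_weights(selected_mask: np.ndarray) -> np.ndarray:
    """Assumed neighborhood-relation weights on a 2D feature grid: each
    unselected feature gets one point per already-selected feature among
    its up/down/left/right neighbors."""
    rows, cols = selected_mask.shape
    weights = np.zeros((rows, cols))
    for r in range(rows):
        for c in range(cols):
            if selected_mask[r, c]:
                continue  # already-selected features are not re-ranked
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                nr, nc = r + dr, c + dc
                if 0 <= nr < rows and 0 <= nc < cols and selected_mask[nr, nc]:
                    weights[r, c] += 1.0
    return weights

def ranked_candidates(selected_mask: np.ndarray) -> list:
    """Return unselected feature positions, highest neighborhood weight first."""
    w = neighborhood_weights(selected_mask)
    cands = [(r, c) for r in range(w.shape[0]) for c in range(w.shape[1])
             if not selected_mask[r, c]]
    return sorted(cands, key=lambda rc: w[rc], reverse=True)
```

For one-dimensional data (such as an EEG spectrum), the same idea would use only the left and right neighbors.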
In the experimental part of this thesis, the proposed method is tested on character recognition and on rat EEG. For the rat EEG, feature selection is performed on the brain-wave signals of three behavioral states: the awake state (AW), slow-wave sleep (SWS), and rapid-eye-movement sleep (REM). For character recognition, the Chinese characters 「太」, 「大」, and 「犬」 are analyzed in three groups: 「太」 versus 「大」, 「大」 versus 「犬」, and 「太」 versus 「大」 versus 「犬」. The proposed method raises the recognition rate in all of these experiments, and the selected feature subsets fall in the key regions.
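As a companion illustration of the two kinds of data mentioned above, and of the preprocessing hinted at in Figures 4.1 and 4.7, here is a minimal sketch of how such feature vectors might be prepared; the sampling rate, the 30 Hz cut-off, and the helper names are assumed for illustration and are not taken from the thesis.

```python
import numpy as np

def eeg_spectrum_features(epoch: np.ndarray, fs: float = 250.0,
                          fmax: float = 30.0) -> np.ndarray:
    """One-dimensional feature vector: FFT magnitude spectrum of one EEG
    epoch up to fmax Hz (fs and fmax are assumed values)."""
    spectrum = np.abs(np.fft.rfft(epoch))
    freqs = np.fft.rfftfreq(len(epoch), d=1.0 / fs)
    return spectrum[freqs <= fmax]

def block_features(char_image: np.ndarray, blocks: int = 16) -> np.ndarray:
    """Two-dimensional feature grid: mean ink density of each cell when the
    character image is divided into blocks x blocks cells (border pixels are
    ignored if the image size is not a multiple of blocks)."""
    h, w = char_image.shape
    bh, bw = h // blocks, w // blocks
    feats = np.zeros((blocks, blocks))
    for r in range(blocks):
        for c in range(blocks):
            feats[r, c] = char_image[r * bh:(r + 1) * bh,
                                     c * bw:(c + 1) * bw].mean()
    return feats
```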
Abstract (English)
Recently, feature selection has been applied in many areas that involve datasets with thousands of features, such as text categorization, microarrays, and biomedical signal analysis.
Feature selection plays a crucial role in pattern recognition and machine learning. Among the many feature selection methods, sequential forward selection (SFS) and sequential backward selection (SBS) are the most widely used. The proposed method combines a neighborhood relation with SFS and SBS: we develop a feature selection scheme based on the concept of neighborhood relations, in which each feature is given a distance weight according to its relation to the preceding and following features (for one-dimensional data) or to the features above, below, to the left, and to the right (for two-dimensional data). The features are ranked by these weights, the features that improve the recognition rate are selected in rank order, and the remaining unimportant features are then removed. This filters out an important feature subset that raises the recognition rate, and the selected features gather in the critical regions.
In the experiments, the method is evaluated on character recognition and on rat EEG signals. For the rat EEG, feature selection is performed on the brain-wave signals of three behavioral states: the awake state (AW), slow-wave sleep (SWS), and rapid-eye-movement sleep (REM). For character recognition, the Chinese characters 太, 大, and 犬 are analyzed in three groups: 太 versus 大, 大 versus 犬, and 太 versus 大 versus 犬. The proposed method improves the recognition rate in all of the above experiments, and the selected feature subsets fall in the critical regions.
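To make the selection and deletion procedure described in both abstracts concrete, the following hedged sketch shows a wrapper loop that takes candidates in neighborhood-weight order (reusing the illustrative `ranked_candidates` helper above), keeps a candidate only if it raises the recognition rate of a nearest-neighbor classifier (the classifier listed in Section 2.2.1), and then makes a backward pass that drops features whose removal does not lower the rate. The leave-one-out evaluation and the stopping rule are assumptions, not the thesis's exact protocol.

```python
import numpy as np

def nn_accuracy(X, y, feat_idx):
    """Leave-one-out recognition rate of a 1-NN classifier using only the
    columns in feat_idx (X: samples x features, row-major grid order)."""
    if not feat_idx:
        return 0.0
    Xs = X[:, feat_idx]
    correct = 0
    for i in range(len(y)):
        d = np.linalg.norm(Xs - Xs[i], axis=1)
        d[i] = np.inf  # exclude the test sample itself
        correct += int(y[int(np.argmin(d))] == y[i])
    return correct / len(y)

def neighborhood_select_delete(X, y, grid_shape, rank_fn):
    """Forward pass: add features in neighborhood-weight order when they
    improve the recognition rate.  Backward pass: delete features whose
    removal does not lower it."""
    selected = np.zeros(grid_shape, dtype=bool)
    best = 0.0
    improved = True
    while improved:
        improved = False
        for r, c in rank_fn(selected):
            trial = selected.copy()
            trial[r, c] = True
            acc = nn_accuracy(X, y, list(np.flatnonzero(trial.ravel())))
            if acc > best:
                selected, best, improved = trial, acc, True
                break  # re-rank the remaining candidates after each acceptance
    for r, c in list(zip(*np.nonzero(selected))):
        trial = selected.copy()
        trial[r, c] = False
        acc = nn_accuracy(X, y, list(np.flatnonzero(trial.ravel())))
        if acc >= best:
            selected, best = trial, acc
    return selected, best
```

With the earlier sketches, one might call `neighborhood_select_delete(X, y, (16, 16), ranked_candidates)` on row-major flattened block features; this illustrates the general wrapper idea rather than the thesis's exact algorithm.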
Abstract (third language)
Table of contents
Chapter 1  Introduction	- 1 -
1.1	Foreword	- 1 -
1.2	Motivation and objectives	- 2 -
1.3	Thesis organization	- 3 -
Chapter 2  Background and related work	- 5 -
2.1	Feature selection	- 5 -
2.1.1	Sequential forward selection (SFS)	- 6 -
2.1.2	Sequential backward selection (SBS)	- 6 -
2.1.3	Bidirectional search (BDS)	- 7 -
2.2	Classifiers	- 7 -
2.2.1	Nearest-neighbor classification (NN)	- 7 -
Chapter 3  A feature selection algorithm based on the neighborhood relation	- 9 -
3.1	Selecting features	- 11 -
3.2	Deleting features	- 15 -
Chapter 4  Simulation results and discussion	- 17 -
4.1	Rat EEG simulation experiments	- 18 -
4.1.1	Dataset 1: EEG	- 20 -
4.2	Character recognition simulation experiments	- 23 -
4.2.1	Dataset 2: 大太	- 24 -
4.2.2	Dataset 3: 大犬	- 27 -
4.2.3	Dataset 4: 大太犬	- 29 -
Chapter 5  Conclusions and future work	- 32 -
References	- 36 -

List of figures
Figure 1.1  Differences between the Chinese characters 犬, 太, 天, 木 and 大	- 2 -
Figure 1.2  Difference between the Chinese characters 太 and 大	- 3 -
Figure 2.1  The four main steps of feature selection	- 5 -
Figure 3.1  Flowchart of the neighborhood-relation feature selection algorithm	- 10 -
Figure 3.2  Illustration of the neighborhood-relation weight computation	- 12 -
Figure 3.3  Example of selecting one feature	- 12 -
Figure 3.4  Example of selecting two features at a time	- 13 -
Figure 3.5  Example of deleting one feature at a time	- 15 -
Figure 4.1  EEG signal converted into a spectrum by the FFT	- 19 -
Figure 4.2  The three states of the rat EEG signal	- 20 -
Figure 4.3  Feature subset of the noise-free EEG	- 22 -
Figure 4.4  Feature subset of the noisy EEG	- 22 -
Figure 4.5  Comparison of the EEG feature subsets selected by NR	- 22 -
Figure 4.6  Examples of three Chinese characters from the ETL9B database	- 23 -
Figure 4.7  Character image divided into 16×16 blocks	- 24 -
Figure 4.8  Feature subset of the noise-free characters 「大」 and 「太」	- 26 -
Figure 4.9  Feature subset of the noisy characters 「大」 and 「太」	- 26 -
Figure 4.10  Feature subset of the noise-free characters 「大」 and 「犬」	- 28 -
Figure 4.11  Feature subset of the noisy characters 「大」 and 「犬」	- 28 -
Figure 4.12  Feature subset of the noise-free characters 「大」, 「太」 and 「犬」	- 30 -
Figure 4.13  Feature subset of the noisy characters 「大」, 「太」 and 「犬」	- 31 -
Figure 5.1  EEG measurement system based on a modified Mind Flex brain-wave game console	- 33 -
Figure 5.2  Packet-decoding circuit and the Mind Flex game-console measurement apparatus	- 34 -
Figure 5.3  Screen capture of the EEG measurement program	- 34 -
Figure 5.4  Screen capture of an actual EEG measurement	- 35 -

List of tables
Table 3.1  Weight-value ranking for the example of selecting one feature	- 13 -
Table 3.2  Weight-value ranking for the example of selecting two features at a time	- 14 -
Table 3.3  Selection order for the example of selecting two features at a time	- 14 -
Table 3.4  Weight-value ranking for the example of deleting one feature at a time	- 15 -
Table 4.1  The four datasets used in the simulations	- 18 -
Table 4.2  Simulation results for Dataset 1	- 21 -
Table 4.3  Simulation results for Dataset 2	- 25 -
Table 4.4  Simulation results for Dataset 3	- 27 -
Table 4.5  Simulation results for Dataset 4	- 30 -
References
[1]	H. Liu, J. Li and L. Wong, “A Comparative study on feature selection and classification methods using gene expression profiles and proteomic patterns,” Genome Informatics, vol. 13, pp. 51-60, 2002.
[2]	T. Li, C. Zhang and M. Ogihara, “A comparative study of feature selection and multiclass classification methods for tissue classification based on gene expression,” Bioinformatics, vol. 20, no. 15, pp. 2429-2437, 2004.
[3]	E. P. Xing, M. I. Jordan and R. M. Karp, “Feature selection for high-dimensional genomic microarray data,” International Conference on Machine Learning, pp. 601-608, 2001.
[4]	Y. Yang and J. O. Pedersen, “A comparative study on feature selection in text categorization,” International Conference on Machine Learning, pp. 412-420, 1997.
[5]	T. Liu, S. Liu, Z. Chen and W. Y. Ma, “An evaluation of feature selection for text categorization,” International Conference on Machine Learning, 2003.
[6]	G. Forman, “An extensive empirical study of feature selection metrics for text classification,” Journal of Machine Learning Research, vol. 3, pp. 1289-1305, 2003.
[7]	T. Joachims, “Text categorization with support vector machines: learning with many relevant features,” Proceedings of the European Conference on Machine Learning (ECML), Springer, 1998.
[8]	Y. X. Zhao, Y. Z. Hsieh, H. P. Tai, C. H. Chou, “A Novel Feature Selection Algorithm by Using False Feature,” International Conference on Electrical, Computer, Electronics and Communication Engineering, Paris, 2011.
[9]	Z. E. Yu, C. C. Kuo, C. H. Chou, C. T. Yen, F. Chang, “A Machine Learning Approach to Classify Vigilance States in Rats,” Expert Systems and Applications, vol. 38, no. 8, pp. 10153-10160, 2011.
[10]	H. Liu, L. Yu, "Toward Integrating Feature Selection Algorithms for Classification and Clustering," IEEE Transactions on Knowledge and Data Engineering, vol. 17, no. 4, pp. 491-502, 2005.
[11]	A. W. Whitney, “A direct method of nonparametric measurement selection,” IEEE Transactions on Computers, pp. 1100-1103, 1971.
[12]	T. Marill and D. M. Green, “On the effectiveness of receptors in recognition systems,” IEEE Transactions on Information Theory, vol. IT-9, pp. 11-17, Jan. 1963.
[13]	I. Pohl, “Bi-directional search,” Machine Intelligence, pp. 127-140, 1971.
[14]	D. DeChampeaux and L. Sint, “An improved bidirectional heuristic search algorithm,” Journal of the ACM, vol. 24, no. 2, pp. 177-191, 1977.
[15]	T. M. Cover and P. E. Hart, “Nearest neighbor pattern classification,” IEEE Transactions on Information Theory, vol. IT-13, pp. 21-27, Jan. 1967.
[16]	R. Collobert, S. Bengio, and J. Mariethoz, “Torch: a modular machine learning software library,” Technical Report IDIAP-RR 02-46, IDIAP, 2002.
[17]	E. R. Kandel et al., Principles of Neural Science, 4th ed., McGraw-Hill Medical, 2000.
[18]	C. Robert, et al., "Automated sleep staging systems in rats," Journal of Neuroscience Methods, vol. 88, pp. 111-122, May 1999.
[19]	R. P. Louis, et al., "Design and validation of a computer-based sleep-scoring algorithm," Journal of Neuroscience Methods, vol. 133, pp. 71-80, Feb 2004.
[20]	H. Yamada, K. Yamamoto, and T. Saito, “A nonlinear normalization method for handprinted Kanji character recognition – line density equalization,” Pattern Recognition, vol. 23, no. 9, pp. 1023-1029, 1990.
[21]	C. H. Chou, C. Y. Kuo, and F. Chang, “Recognition of Fragmented Characters Using Multiple Feature-Subset Classifiers,” 9th International Conference on Document Analysis and Recognition (ICDAR 2007), vol. 1, pp. 198-202, 2007.
[22]	Feature selection, http://mirlab.org/jang/books/dcpr/fsMethod.asp
[23]	K-nearest-neighbor classifier, http://neural.cs.nthu.edu.tw/jang/books/dcpr/prKnnc.asp
[24]	R. Gutierrez-Osuna, “Introduction to Pattern Analysis,” http://research.cs.tamu.edu/prism/lectures/pr/pr_l11.pdf, retrieved May 30, 2009.
Full-text access rights
On campus
The print copy will be made available 5 years after submission of the authorization form
Electronic full text authorized for on-campus access
The on-campus electronic thesis will be made available 5 years after submission of the authorization form
Off campus
Authorization granted
The off-campus electronic thesis will be made available 5 years after submission of the authorization form
