§ Thesis Bibliographic Record
  
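System ID U0002-0407201308244700
DOI 10.6846/TKU.2013.00140
Title (Chinese) 基於中文語法規則的意見單元抽取方法之研究
Title (English) A study of opinion unit extraction based on Chinese syntactic rules
Title (third language)
Institution Tamkang University (淡江大學)
Department (Chinese) 資訊管理學系碩士班
Department (English) Department of Information Management
Foreign degree: university
Foreign degree: college
Foreign degree: graduate institute
Academic year 101 (ROC calendar; 2012-2013)
Semester 2
Year of publication 102 (ROC calendar; 2013)
Graduate student (Chinese) 陳柏翰
Graduate student (English) Po-Han Chen
Student ID 600630122
Degree Master's
Language Traditional Chinese
Second language
Oral defense date 2013-06-14
Number of pages 76
Oral defense committee Advisor - 蕭瑞祥 (rsshaw@mail.tku.edu.tw)
Committee member - 翁頌舜
Committee member - 戴敏育 (myday@mail.tku.edu.tw)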
Keywords (Chinese) 情感分析
意見單元
句法路徑
資料探勘
Keywords (English) Sentiment analysis
Opinion unit
Syntactic path
Data mining
Keywords (third language)
Subject classification
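Abstract (Chinese)
An opinion unit is the combination of an evaluation target and its corresponding opinion word. Opinion unit extraction is one of the fundamental tasks in sentiment analysis. This study proposes an automatic opinion unit extraction method, based on sentence-level Chinese syntactic rules, for smartphone product review articles from Chinese blogs and forums.
This study adopts the systems development research methodology and builds a prototype system that implements the process of constructing an opinion unit extraction model. Data mining classification techniques are used for training and testing, so that opinion unit extraction rules are induced automatically and the extraction model is built from them. The prototype system takes review articles about smartphone products from Chinese blogs and forums as its source. We compare the extraction results of the model built in this study with manual extraction results and with related research methods, using F-Measure as the main evaluation metric for the prototype system.
The evaluation of the prototype system shows that, on Chinese smartphone product review articles, the extraction model built in this study improves F-Measure over related opinion unit extraction methods that use word distance or that match against a library of syntactic path patterns. We also find that using sentence structure and syntactic path structure together as feature attributes helps improve the quality of the extraction model, and that sentence structure has more influence on opinion unit extraction than syntactic path structure.
Finally, this study summarizes the combinations of data mining classification technique and input feature attribute type that yield better results in opinion unit extraction, as recommendations for practical use. It also verifies that the model-building process implemented in this study helps extract opinion units correctly.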
Abstract (English)
An opinion unit is a combination of an evaluation object and its corresponding opinion word. Opinion unit extraction is one of the basic tasks in the field of sentiment analysis. This study proposes an opinion unit extraction method based on Chinese syntactic rules.
This study follows the systems development approach in information systems research to build a prototype system. The prototype system uses data mining classification techniques for training and testing, in order to induce opinion unit extraction rules and establish an opinion unit extraction model. We evaluate the prototype system on Chinese smartphone product review articles from blogs and forums, comparing the opinion units extracted by the model with manually extracted results. Finally, we calculate Precision, Recall, and F-Measure to validate the prototype system and to compare it with related research methods.
The evaluation of the prototype system shows that, on Chinese smartphone product review articles, our opinion unit extraction model improves accuracy over both the extraction method based on word distance and the extraction method based on syntactic paths. In addition, we found that using sentence structure and syntactic path structure together as features improves the extraction model, and that sentence structure is the more influential of the two. Finally, the study summarizes the combinations of data mining classification technique and feature attributes that give better results in opinion unit extraction, as recommendations for implementation. This study also confirms that the process established for building the opinion unit extraction model helps in extracting opinion units.
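The evaluation described in the abstract compares automatically extracted opinion units with manually extracted ones using Precision, Recall, and F-Measure. The following is a minimal sketch of that comparison, not the thesis's implementation: it assumes an opinion unit is represented as a (target, opinion word) pair and that a pair counts as correct only on an exact match. The function name `precision_recall_f_measure` and the sample pairs are hypothetical.

```python
def precision_recall_f_measure(extracted, gold):
    """Score extracted opinion units against manually annotated ones.

    Both arguments are iterables of (target, opinion_word) pairs.
    Exact-match scoring is an assumption made for this sketch.
    """
    extracted, gold = set(extracted), set(gold)
    true_positives = len(extracted & gold)

    # Guard against empty sets to avoid division by zero.
    precision = true_positives / len(extracted) if extracted else 0.0
    recall = true_positives / len(gold) if gold else 0.0

    # F-Measure (F1) is the harmonic mean of precision and recall.
    if precision + recall == 0:
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Hypothetical data: three manually annotated units, four extracted units.
gold_units = [("screen", "clear"), ("battery", "durable"), ("price", "expensive")]
extracted_units = [
    ("screen", "clear"),
    ("battery", "durable"),
    ("camera", "expensive"),
    ("price", "cheap"),
]

p, r, f = precision_recall_f_measure(extracted_units, gold_units)
print(f"precision={p:.3f} recall={r:.3f} f_measure={f:.3f}")
```

With these sample pairs, two of the four extracted units match the manual annotation, so precision is 2/4 and recall is 2/3.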
Abstract (third language)
Thesis table of contents
Table of Contents

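Table of Contents
List of Figures
List of Tables
Chapter 1 Introduction
1.1 Research Background and Motivation
1.2 Research Objectives
Chapter 2 Literature Review
2.1 Opinion Units
2.1.1 Definition of Opinion Units
2.1.2 Extraction of Opinion Units
2.2 Syntactic Paths
2.3 Chinese Parsing Systems
2.4 Data Mining
2.5 Data Mining Classification Techniques
2.5.1 Artificial Neural Network (ANN)
2.5.1.1 Introduction to Artificial Neural Networks
2.5.1.2 Back-propagation Neural Network (BNN)
2.5.2 Decision Tree
2.5.3 Support Vector Machine (SVM)
2.5.4 K-nearest Neighbor (KNN)
2.5.5 Naive Bayesian Classifier
Chapter 3 Research Method
Chapter 4 Prototype System
4.1 Collecting Review Articles with a Web Crawler
4.2 Manually Extracting Evaluative Sentences
4.3 Extracting Opinion Units and Generating Syntactic Paths
4.4 Manually Annotating Opinion Units
4.5 Generating Sentence and Syntactic Feature Values
4.6 Feature Selection
4.7 Building the Opinion Unit Extraction Model
4.8 Applying the Model
Chapter 5 System Evaluation
5.1 Data Sources
5.2 Feature Selection Results
5.3 System Evaluation Method
5.4 System Evaluation Results and Discussion
5.4.1 Evaluation of Sentence Structure and Syntactic Path Structure Feature Attributes
5.4.2 Comparison of Data Mining Classification Techniques
5.4.2.1 Artificial Neural Network
5.4.2.2 Decision Tree
5.4.2.3 Support Vector Machine
5.4.2.4 K-nearest Neighbor
5.4.2.5 Naive Bayesian Classifier
5.4.2.6 Summary
5.4.3 Comparison with the Word Distance Method
5.4.4 Comparison with the Syntactic Path Method
Chapter 6 Conclusion
6.1 Conclusions
6.2 Research Limitations
6.3 Future Work
References
Chinese-language references
English-language references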

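List of Figures

Figure 1.1 Sentiment analysis architecture
Figure 2.1 Syntactic path illustration
Figure 2.2 Syntax tree illustration
Figure 2.3 Back-propagation neural network model illustration
Figure 2.4 Decision tree illustration
Figure 2.5 SVM hyperplane illustration
Figure 2.6 Nearest-neighbor (NN) classification illustration
Figure 3.1 Systems development research process
Figure 4.1 Prototype system architecture
Figure 4.2 Model application workflow
Figure 5.1 Weka parameter settings screen for the artificial neural network
Figure 5.2 Number of feature attributes versus model-building time
Figure 5.3 Weka parameter settings screen for the decision tree
Figure 5.4 Weka parameter settings screen for the support vector machine
Figure 5.5 Number of feature attributes versus model-building time
Figure 5.6 Weka parameter settings screen for K-nearest neighbor
Figure 5.7 Weka parameter settings screen for the naive Bayesian classifier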

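List of Tables

Table 4.1 Part-of-speech statistics for NTUSD opinion words
Table 4.2 Interpretation of Kappa agreement values
Table 4.3 Candidate opinion unit pairs as annotated by Annotator A and Annotator B
Table 4.4 Sentence structure feature attributes
Table 4.5 Part-of-speech code reference
Table 4.6 Syntactic path (Path) structure feature attributes
Table 4.7 Rules for candidate opinion unit identification results (Result)
Table 4.8 Statistics for the training and test data
Table 4.9 Candidate opinion units and their corresponding syntactic paths
Table 4.10 Feature attributes of candidate opinion units
Table 4.11 Candidate opinion unit pairs for the example
Table 5.1 Weka feature selection results
Table 5.2 Feature variable combinations from the Weka feature selection results
Table 5.3 Pearson correlation analysis results
Table 5.4 Feature attribute combinations from the Pearson correlation analysis results
Table 5.5 Confusion Matrix
Table 5.6 Feature attributes ranked by importance
Table 5.7 Evaluation results for extracted opinion unit pairs
Table 5.8 Evaluation results for extracted opinion unit pairs
Table 5.9 Evaluation results for extracted opinion unit pairs
Table 5.10 Evaluation results for extracted opinion unit pairs
Table 5.11 Evaluation results for extracted opinion unit pairs
Table 5.12 Overall evaluation results by data mining classification technique
Table 5.13 Comparison with the word distance method
Table 5.14 Comparison with the syntactic path method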
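Full-text access permissions
On campus
Print copy made public 5 years after submission of the authorization form
Author consents to on-campus release of the electronic full text
On-campus electronic copy made public 5 years after submission of the authorization form
Off campus
Authorization granted
Off-campus electronic copy made public 5 years after submission of the authorization form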
