Electronic Theses and Dissertations Service

§ Browse Thesis Bibliographic Record

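The electronic full text of this thesis has been publicly available off campus since 2015-07-21.
The print copy of this thesis has been publicly available since 2015-07-21.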

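System ID	U0002-1407201523095500
DOI	10.6846/TKU.2015.00393
Thesis title (Chinese)	領域響應詞典之中文意見分析研究
Thesis title (English)	A Study of Domain Responsive Dictionary on Chinese Sentiment Analysis
Thesis title (third language)
University	Tamkang University (淡江大學)
Department (Chinese)	資訊管理學系碩士在職專班
Department (English)	On-the-Job Graduate Program in Advanced Information Management
Foreign degree: university
Foreign degree: college
Foreign degree: graduate institute
Academic year	103 (ROC calendar; 2014–2015)
Semester	2
Publication year	104 (ROC calendar; 2015)
Student (Chinese)	郭紹德
Student (English)	Shao-Te Kuo
Student ID	701630195
Degree	Master's
Language	Traditional Chinese
Second language
Oral defense date	2015-05-30
Number of pages	81
Oral defense committee	Advisor: 戴敏育 (myday@mail.tku.edu.tw); Member: 徐煥智 (shyur@mail.im.tku.edu.tw); Member: 翁頌舜 (wengss@ntut.edu.tw)
Keywords (Chinese)	情感分析; 機器學習; 領域詞典; 意見單元; 網路探勘 (Sentiment Analysis; Machine Learning; Domain Dictionary; Opinion Unit; Web Mining)
Keywords (English)	Sentiment Analysis; Machine Learning; Domain Dictionary; Opinion Unit; Web Mining
Keywords (third language)
Subject classification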
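Abstract (Chinese, in English translation)	In online word-of-mouth and reviews, evaluative vocabulary varies from domain to domain. Because people express their opinions with different evaluative phrasing, the vocabulary used in domain-specific topics matters, even though sentiment words in different domains may be very similar. As online information grows, the attribute words and evaluative words used in many specific domains have increased sharply and come into wide use, so traditional evaluation lexicons are becoming inadequate. Using the prototype system and classification model built in this study, we examine how article domain classification affects results and how it affects opinion-unit extraction for the target domain, in order to extract the combinations of opinion units relevant to that domain. This study proposes a prototype system and a classification model for selecting domain dictionaries. Experiments show a clear effect on predicting which domain dictionary to select: cross-validation accuracy reaches 83.35% and open-test accuracy reaches 84.8%, while extraction of domain-specific positive opinion units improves by 24.2% and extraction of domain-specific negative opinion units improves by 22.9%.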
Abstract (English)	The rapid growth of Internet social media produces a huge volume of opinions and comments. Analyzing this text therefore requires handling far more complex, domain-oriented sentiment wording. Categorizing the domain-specific meanings of sentiment words and building meaningful domain dictionaries are thus important for raising the accuracy of extracting and evaluating opinion units from text. In this paper, we propose a prototype system and a classification model to examine how text depends on domain classification and how efficiently opinion units are extracted from a specific target domain. In experiments evaluating this domain responsive dictionary classification prototype, the proposed system achieved 83.35% cross-validation accuracy and 84.8% open-test accuracy. Furthermore, with the domain responsive dictionary, correct extraction of positive opinion units improved by 24.2% and correct extraction of negative opinion units improved by 22.9%.
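The abstracts describe a two-step idea: choose a domain dictionary for a text, then extract opinion units (attribute word plus evaluation word) with that dictionary. The Python sketch below illustrates the idea under loudly stated assumptions, and is not the thesis's implementation. The dictionaries in `DOMAIN_DICTS` are tiny hypothetical stand-ins for the DoReDic word lists in the appendices; input is assumed to be already word-segmented (for example by a segmenter such as CKIP); `select_domain` uses a simple most-dictionary-hits rule in place of the trained SVM (LibSVM) classifier listed in the table of contents; and the fixed pairing window in `extract_opinion_units` is an illustrative assumption. `to_libsvm_line` shows how such hit-count features could be written in LibSVM's sparse `<label> <index>:<value>` training format, with `DOMAIN_LABELS` being hypothetical class labels.

```python
from collections import Counter

# Hypothetical toy dictionaries (NOT the real DoReDic lists).
# Each domain has attribute words and evaluation words with polarity (+1 / -1).
DOMAIN_DICTS = {
    "photography": {
        "attributes": {"鏡頭", "畫質", "對焦"},  # lens, image quality, focusing
        "evaluations": {"清晰": 1, "銳利": 1, "模糊": -1, "緩慢": -1},
    },
    "travel_food": {
        "attributes": {"餐點", "服務", "景色"},  # meals, service, scenery
        "evaluations": {"好吃": 1, "美麗": 1, "難吃": -1, "冷淡": -1},
    },
}
# Hypothetical numeric class labels for LibSVM-style output.
DOMAIN_LABELS = {"photography": 1, "travel_food": 2, "other": 3}


def domain_features(tokens):
    """Count attribute-word and evaluation-word hits for each domain (sorted by name)."""
    counts = Counter(tokens)
    features = []
    for name in sorted(DOMAIN_DICTS):
        d = DOMAIN_DICTS[name]
        features.append(sum(counts[w] for w in d["attributes"]))
        features.append(sum(counts[w] for w in d["evaluations"]))
    return features


def to_libsvm_line(label, features):
    """Format one instance as '<label> <index>:<value> ...' with 1-based indices, zeros omitted."""
    pairs = [f"{i}:{v}" for i, v in enumerate(features, start=1) if v != 0]
    return " ".join([str(label)] + pairs)


def select_domain(tokens):
    """Stand-in for a trained classifier: pick the domain with the most dictionary hits."""
    feats = domain_features(tokens)
    names = sorted(DOMAIN_DICTS)
    scores = {name: feats[2 * i] + feats[2 * i + 1] for i, name in enumerate(names)}
    best = max(names, key=lambda n: scores[n])
    return best if scores[best] > 0 else "other"


def extract_opinion_units(tokens, domain, window=3):
    """Pair each attribute word with the first evaluation word within `window` tokens after it."""
    if domain not in DOMAIN_DICTS:
        return []
    d = DOMAIN_DICTS[domain]
    units = []
    for i, tok in enumerate(tokens):
        if tok in d["attributes"]:
            for nxt in tokens[i + 1 : i + 1 + window]:
                if nxt in d["evaluations"]:
                    units.append((tok, nxt, d["evaluations"][nxt]))
                    break
    return units
```

For the segmented sentence ["這台", "相機", "的", "鏡頭", "很", "銳利", "但", "對焦", "緩慢"] ("this camera's lens is sharp but focusing is slow"), `select_domain` returns "photography" and `extract_opinion_units` returns [("鏡頭", "銳利", 1), ("對焦", "緩慢", -1)]. In a fuller system, hit counts like these would be features fed to a trained classifier rather than the decision rule itself.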
Abstract (third language)
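Table of contents
  Contents
  List of Tables
  List of Figures
  Chapter 1 Introduction
    1.1 Research Background
    1.2 Research Motivation
    1.3 Problem Definition
    1.4 Research Objectives
    1.5 Thesis Structure
  Chapter 2 Literature Review
    2.1 Opinion Mining
    2.2 Sentence Segmentation and Word Segmentation
      2.2.1 Sentence Segmentation
      2.2.2 Word Segmentation
    2.3 Opinion Dictionaries
    2.4 Machine Learning
      2.4.1 Support Vector Machines
    2.5 Opinion Units
      2.5.1 Defining Opinion Units from Evaluative Sentences
      2.5.2 Opinion Unit Extraction
  Chapter 3 Research Method
    3.1 Research Process
    3.2 Prototype System Construction
    3.3 Experimental Procedure
      3.3.1 Automated Retrieval
      3.3.2 Corpus Text Preprocessing
      3.3.3 Lexicon Processing and Construction
      3.3.4 Feature Attribute Processing
      3.3.5 Machine Learning
      3.3.6 Opinion Unit Extraction
      3.3.7 System Data Flow
  Chapter 4 Experimental Evaluation
    4.1 Experimental Data Allocation and Evaluation Method
    4.2 Group Experiments
    4.3 Opinion Unit Extraction
  Chapter 5 Conclusions and Implications
    5.1 Conclusions
    5.2 Research Contributions
    5.3 Managerial Implications
    5.4 Future Work
  References
  Appendix 1 DoReDic Photography Attribute Words
  Appendix 2 DoReDic Travel and Food Attribute Words
  Appendix 3 DoReDic Photography Evaluation Words
  Appendix 4 DoReDic Travel and Food Evaluation Words
List of tables
  Table 2.1 Common regular expression parameters in programming languages
  Table 2.2 Examples of HowNet Chinese sentiment words
  Table 2.3 Examples of NTUSD Chinese sentiment words
  Table 3.1 Domain lexicon list
  Table 3.2 Common line-break characters
  Table 3.3 Word-frequency statistics and partial excerpts
  Table 3.4 Comparison of NTUSD and DoReDic
  Table 3.5 Feature attributes for the common domain (Common Domain)
  Table 3.6 Feature attributes for the photography domain
  Table 3.7 Feature attributes for the travel and food domain
  Table 3.8 Weka feature selection results
  Table 3.9 Feature variable combinations from the Weka feature selection results
  Table 4.1 Example Weka test-set predictions
  Table 4.2 Experimental evaluation results for the photography domain
  Table 4.3 Experimental evaluation results for the travel domain
  Table 4.4 Experimental evaluation results for other (unclassified) domains
  Table 4.5 Weighted averages and accuracy of domain classification
  Table 4.6 10-fold cross-validation results for each feature group
  Table 4.7 Opinion unit extraction counts by domain classification
  Table 4.8 Opinion unit extraction counts and improvement by domain classification
List of figures
  Figure 1.1 Text processing levels and processing flow
  Figure 1.2 Architecture of the study on domain responsive dictionaries for Chinese opinion analysis
  Figure 2.1 Web opinion mining and opinion analysis framework
  Figure 2.3 Sentiment orientation types and methods
  Figure 2.4 System flowchart of the Academia Sinica CKIP word segmentation system
  Figure 2.5 Chinese lexical analysis flow based on hierarchical hidden Markov models
  Figure 2.6 Illustration of an SVM hyperplane
  Figure 2.7 Example of running LibSVM in the Weka data mining software
  Figure 2.8 Matching order using the SVM distance classification algorithm
  Figure 3.1 Research and life cycle of the systems development method
  Figure 3.2 Nunamaker Jr & Chen systems development research methodology process
  Figure 3.3 System architecture of the domain responsive dictionary study on Chinese opinion orientation
  Figure 3.4 Domain responsive dictionary integration process
  Figure 3.5 Web crawler flowchart of the prototype system
  Figure 3.6 Corpus text preprocessing flow
  Figure 3.7 Lexicon processing and construction flow
  Figure 3.8 Feature transformation flow and LibSVM training format
  Figure 3.9 LibSVM testing and model construction flow
  Figure 3.10 System data flow diagram
  Figure 4.1 Illustration of text feature transformation
  Figure 4.2 Comparison of domain classification weighted averages and accuracy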
References
[1] ACNielsen. (2013). 第三方背書的免費廣告最受全球消費者信賴 [Free advertising endorsed by third parties is most trusted by consumers worldwide]. Retrieved from http://www.nielsen.com/tw/zh/press-room/2013/newsTWTrustInAd20130917.html
[2] Ameur, H., & Jamoussi, S. (2013). Dynamic construction of dictionaries for sentiment classification. Paper presented at the 2013 IEEE 13th International Conference on Data Mining Workshops (ICDMW), 896-903.
[3] Aue, A., & Gamon, M. (2005). Customizing sentiment classifiers to new domains: A case study. Paper presented at Recent Advances in Natural Language Processing (RANLP).
[4] Brown, P. F., Desouza, P. V., Mercer, R. L., Pietra, V. J. D., & Lai, J. C. (1992). Class-based n-gram models of natural language. Computational Linguistics, 18(4), 467-479.
[5] Chang, C., & Lin, C. (2011). LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology (TIST), 2(3), 27.
[6] Cortes, C., & Vapnik, V. (1995). Support-vector networks. Machine Learning, 20(3), 273-297.
[7] Ding, X., Liu, B., & Yu, P. S. (2008). A holistic lexicon-based approach to opinion mining. Paper presented at the Proceedings of the 2008 International Conference on Web Search and Data Mining, 231-240.
[8] Dong, Z., & Dong, Q. (2006). HowNet and the computation of meaning. World Scientific.
[9] Hall, M., Frank, E., Holmes, G., Pfahringer, B., Reutemann, P., & Witten, I. H. (2009). The WEKA data mining software: An update. ACM SIGKDD Explorations Newsletter, 11(1), 10-18.
[10] Ku, L., & Chen, H. (2007). Mining opinions from the web: Beyond relevance retrieval. Journal of the American Society for Information Science and Technology, 58(12), 1838-1850.
[11] Larsen, B., & Aone, C. (1999). Fast and effective text mining using linear-time document clustering. Paper presented at the Proceedings of the Fifth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 16-22.
[12] Liu, B. (2010a). Sentiment analysis and subjectivity. Handbook of Natural Language Processing, 2, 627-666.
[13] Liu, B. (2010b). Sentiment analysis: A multi-faceted problem. IEEE Intelligent Systems, 25(3), 76-80.
[14] Liu, B. (2012). Sentiment analysis and opinion mining. Synthesis Lectures on Human Language Technologies, 5(1), 1-167.
[15] Liu, B., Hu, M., & Cheng, J. (2005). Opinion observer: Analyzing and comparing opinions on the web. Paper presented at the Proceedings of the 14th International Conference on World Wide Web, 342-351.
[16] Ma, W., & Chen, K. (2003). Introduction to CKIP Chinese word segmentation system for the first international Chinese word segmentation bakeoff. Paper presented at the Proceedings of the Second SIGHAN Workshop on Chinese Language Processing, Volume 17, 168-171.
[17] McNaughton, R., & Yamada, H. (1960). Regular expressions and state graphs for automata.
[18] Nguyen, H. N., Van Le, T., Le, H. S., & Pham, T. V. (2014). Domain specific sentiment dictionary for opinion mining of Vietnamese text. In Multi-disciplinary trends in artificial intelligence (pp. 136-148). Springer.
[19] Nunamaker Jr, J. F., & Chen, M. (1990). Systems development in information systems research. Paper presented at the Proceedings of the Twenty-Third Annual Hawaii International Conference on System Sciences, Vol. 3, 631-640.
[20] O'Reilly, T. (2007). What is Web 2.0: Design patterns and business models for the next generation of software. Communications & Strategies, (1), 17.
[21] Pang, B., Lee, L., & Vaithyanathan, S. (2002). Thumbs up?: Sentiment classification using machine learning techniques. Paper presented at the Proceedings of the ACL-02 Conference on Empirical Methods in Natural Language Processing, Volume 10, 79-86.
[22] Shelke, N. M., Deshpande, S., & Thakre, V. (2012). Survey of techniques for opinion mining. International Journal of Computer Applications (0975–8887), 57.
[23] Wang, J., & Lee, C. (2011). Unsupervised opinion phrase extraction and rating in Chinese blog posts. Paper presented at the 2011 IEEE Third International Conference on Privacy, Security, Risk and Trust (PASSAT) and 2011 IEEE Third International Conference on Social Computing (SocialCom), 820-823.
[24] Wiebe, J. (2000). Learning subjective adjectives from corpora. Paper presented at AAAI/IAAI, 735-740.
[25] Yu, H., & Hatzivassiloglou, V. (2003). Towards answering opinion questions: Separating facts from opinions and identifying the polarity of opinion sentences. Paper presented at the Proceedings of the 2003 Conference on Empirical Methods in Natural Language Processing, 129-136.
[26] Zhang, H., Yu, H., Xiong, D., & Liu, Q. (2003). HHMM-based Chinese lexical analyzer ICTCLAS. Paper presented at the Proceedings of the Second SIGHAN Workshop on Chinese Language Processing, Volume 17, 184-187.
[27] Zhao, L., & Li, C. (2009). Ontology based opinion mining for movie reviews. Springer.
[28] 楊盛帆. (2009). 以整合式規則來做網路論壇上的 3C 產品口碑分析 [Word-of-mouth analysis of 3C products on online forums using integrated rules]. Thesis, Department of Information Management, Yuan Ze University, 1-60.
[29] 王卫平, & 孟翠翠. (2011). 基于句法分析与依存分析的评价对象抽取 [Evaluation target extraction based on syntactic and dependency parsing]. 计算机系统应用 [Computer Systems and Applications], 20(8), 52-57.
[30] 簡之文. (2012). 部落格文章情感分析之研究 [A study of sentiment analysis of blog posts]. Master's thesis, Department of Information Management, Tamkang University, 1-52.
[31] 謝衫蒂. (2014). 應用機器學習與多辭典的中英雙語意見分析之研究 [A study of Chinese-English bilingual opinion analysis using machine learning and multiple dictionaries]. Master's thesis, On-the-Job Graduate Program in Advanced Information Management, Tamkang University, 1-89.
[32] 陈强, 宋俊德, & 鄂海红. (2013). 基于动态词库的中文分词模块的设计与实现 [Design and implementation of a Chinese word segmentation module based on a dynamic lexicon].
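Full-text access rights	On campus: print thesis available immediately; electronic full text authorized for on-campus access; on-campus electronic thesis available immediately. Off campus: electronic thesis authorized for immediate off-campus access.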


For questions, please contact us.
Library Digital Information Section, (02)2621-5656 ext. 2487, or by email