Tamkang University Chueh Sheng Memorial Library (TKU Library)


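System ID U0002-1507201320205700
Chinese title 中文意見探勘系統之新增意見詞演算法 (An algorithm for adding new opinion words to a Chinese opinion mining system)
English title The algorithm for pick out new opinion word of Chinese opinion mining system
Institution Tamkang University
Department (Chinese) 資訊工程學系碩士在職專班 (In-service Master's Program, Department of Computer Science and Information Engineering)
Department (English) Department of Computer Science and Information Engineering
Academic year 101 (ROC calendar, i.e. 2012-2013)
Semester 2
Year of publication 102 (ROC calendar, i.e. 2013)
Student name (Chinese) 王天煜
Student name (English) Tien-Yu Wang
Student ID 700410136
Degree Master's
Language Chinese
Second language English
Oral defense date 2013-06-21
Number of pages 116
Committee Advisor: 陳俊豪
Member: 蔣璿東
Member: 王鄭慈
Member: 陳俊豪
Chinese keywords 意見探勘 (opinion mining), 中文 (Chinese), 意見詞 (opinion word)
English keywords Opinion Mining, Chinese, Opinion Word
Subject classification Applied Sciences: Information Engineering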
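Chinese abstract (translated) Building or maintaining a lexicon-based opinion mining system takes a great deal of manpower and time, so this study attempts to simplify that labor-intensive work. Previously, building or maintaining the lexicon meant reading many articles, picking out the opinion words useful for the domain, and adding them to the lexicon. Now one only needs to run the algorithm to pick out the new opinion words in the articles, then judge whether each new opinion word it picks out is useful for the domain by comparing the word with its sentence, and add the useful ones to the lexicon. Because checking the new opinion words picked out by the algorithm takes less time and effort than reading the articles directly, the cost of building or maintaining the lexicon is reduced.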
English abstract Establishing and maintaining a lexicon-based opinion mining system consumes a great deal of manpower and time, and this study attempts to simplify that work. Originally, establishing or maintaining the lexicon required reading many articles, picking out the opinion words useful to the domain, and adding them to the lexicon. With the process proposed in this study, one only needs to execute the algorithms to pick out new opinion words in the articles, determine whether each picked-out opinion word is useful to the domain by comparing it with its sentence, and then add it to the lexicon. Because checking the new opinion words picked out by the algorithms saves much time and manpower compared with reading the articles directly, the cost of establishing and maintaining the lexicon can be reduced.
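The abstract describes the workflow only at a high level: an algorithm proposes candidate opinion words, and a person checks each one against its sentence before adding it to the lexicon. The Python sketch below is a hypothetical illustration of that idea, not the thesis's algorithm. It assumes character n-grams as a stand-in for the segmentation step, a minimum frequency (min_freq=2, echoing the "frequency 2 or above" setting named in the figure list), and removal of short candidates contained in longer ones (the idea named in Figure 9). The function name extract_candidates and all of its parameters are hypothetical.

```python
from collections import Counter


def extract_candidates(sentences, lexicon, min_freq=2, min_len=2, max_len=4):
    """Return {candidate: first sentence containing it} for manual review.

    Hypothetical sketch: character n-grams stand in for real word
    segmentation, and counting each n-gram once per sentence is an
    assumption, not the thesis's rule.
    """
    counts = Counter()
    first_seen = {}
    for sent in sentences:
        # Collect the distinct n-grams of this sentence.
        grams = {
            sent[i:i + n]
            for n in range(min_len, max_len + 1)
            for i in range(len(sent) - n + 1)
        }
        for gram in grams:
            counts[gram] += 1
            first_seen.setdefault(gram, sent)  # remember a context sentence
    # Keep frequent n-grams that the lexicon does not already contain.
    kept = {g for g, c in counts.items() if c >= min_freq and g not in lexicon}
    # Drop short candidates contained in a longer kept candidate.
    maximal = {g for g in kept if not any(g != o and g in o for o in kept)}
    return {g: first_seen[g] for g in sorted(maximal)}


if __name__ == "__main__":
    sentences = ["網速很慢", "今天網速很慢", "訊號很好"]
    for word, sentence in extract_candidates(sentences, {"很好"}).items():
        print(word, "|", sentence)
```

Returning each candidate with a sentence mirrors the manual step in the abstract: a reviewer judges the word in context, and only candidates judged useful for the domain would be added to the lexicon. Raising min_freq shrinks the review list at the risk of missing rare opinion words.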
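Table of contents
Chapter 1 Introduction
1.1 Research Motivation and Objectives
1.2 Thesis Structure
Chapter 2 Literature Review
2.1 Definition of Opinion Units
2.2 Extraction and Identification of Feature Words
2.3 Opinion Word Expansion
2.4 Opinion Polarity Determination
Chapter 3 Algorithms
3.1 Algorithm: Word and Character Segmentation
3.2 Handling Polarity Shifts of Opinion Words
Chapter 4 Experiments and Discussion
4.1 Data Sources and Background
4.2 Results and Discussion: Telecommunications Domain
4.3 Results and Discussion: Internet Domain
Chapter 5 Conclusions and Future Work
Appendix Table A
Appendix Table B
Appendix Table C
Appendix Table D
References
Appendix: English Version of the Thesis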

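List of Figures
Figure 1 Eight types of co-occurrence patterns
Figure 2 Feature-word/opinion-word pairing matrix
Figure 3 Illustration of opinion word expansion
Figure 4 Feature-Opinion mapping diagram
Figure 5 Algorithm: Word and Character Segmentation, steps
Figure 6 A passage containing opinion words that the lexicon cannot recognize
Figure 7 A passage with missing opinion elements
Figure 8 A passage with missing opinion elements
Figure 9 Removing short words contained in longer words
Figure 10 Algorithm: Opinion Word + Opinion Word, steps
Figure 11 Algorithm: Opinion Word + 不 + Opinion Word, steps
Figure 12 Algorithm: Opinion Word + 了, steps
Figure 13 Algorithm execution flow
Figure 14 Number of records requiring manual judgment for opinion word tagging, by month
Figure 15 Algorithm: Word and Character Segmentation, time required (minutes) at frequency ≥ 1 vs. frequency ≥ 2
Figure 16 Algorithm: Word and Character Segmentation, number of records requiring manual judgment per month at frequency ≥ 2
Figure 17 Algorithm: Opinion Word + Opinion Word, number of records requiring manual judgment per month
Figure 18 Algorithm: Opinion Word + 不 + Opinion Word, number of records produced per month
Figure 19 Algorithm: Opinion Word + 了, number of records requiring manual judgment per month
Figure 20 Number of records requiring manual judgment for opinion word tagging, by month
Figure 21 Algorithm: Word and Character Segmentation, time required (minutes) at frequency ≥ 1 vs. frequency ≥ 2
Figure 22 Algorithm: Word and Character Segmentation, number of records requiring manual judgment per month at frequency ≥ 2
Figure 23 Algorithm: Opinion Word + Opinion Word, number of records requiring manual judgment per month
Figure 24 Algorithm: Opinion Word + 了, number of records requiring manual judgment per month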

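List of Tables
Table 1 Opinion elements
Table 2 Feature table for movie elements
Table 3 Parts of speech of feature words
Table 4 Definitions of relations between opinion words and feature words
Table 5 Propagation rules
Table 6 Result of decomposing 「哈啦飆網包」
Table 7 Monthly data volume, telecommunications domain
Table 8 Experimental data for opinion word tagging, by month
Table 9 Algorithm: Word and Character Segmentation, experimental data
Table 10 Algorithm: Opinion Word + Opinion Word, experimental data
Table 11 Algorithm: Opinion Word + 不 + Opinion Word, experimental data
Table 12 Algorithm: Opinion Word + 了, experimental data
Table 13 Monthly data volume, Internet domain
Table 14 Experimental data for opinion word tagging, by month
Table 15 Algorithm: Word and Character Segmentation, experimental data
Table 16 Algorithm: Opinion Word + Opinion Word, experimental data
Table 17 Algorithm: Opinion Word + 不 + Opinion Word, experimental data
Table 18 Algorithm: Opinion Word + 了, experimental data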
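Usage rights
  • Consent granted for in-library readers to reproduce the print copy free of charge for academic purposes; publicly available from 2018-08-19.
  • Consent granted for the electronic full-text browsing/printing service; publicly available from 2018-08-19.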

