§ Thesis Bibliographic Record

System ID U0002-1909201913060800
DOI 10.6846/TKU.2019.00585
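Title (Chinese) Aspect-level中文意見探勘系統之研究與實作
Title (English) Research and Implementation of Aspect-level Chinese Opinion Mining System
Title (third language)
University Tamkang University (淡江大學)
Department (Chinese) 資訊工程學系博士班
Department (English) Department of Computer Science and Information Engineering
Foreign degree university
Foreign degree college
Foreign degree institute
Academic year 107 (ROC calendar; 2018-2019)
Semester 2
Year of publication 108 (ROC calendar; 2019)
Graduate student (Chinese) 張漢琦
Graduate student (English) Han-Chi Chang
Student ID 805410049
Degree Doctoral
Language Traditional Chinese
Second language
Oral defense date 2019-06-21
Number of pages 151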
Oral defense committee Advisor - 蔣璿東
Member - 葛煥昭
Member - 張世豪
Member - 王亦凡
Member - 王鄭慈
Keywords (Chinese) 中文意見探勘系統
中文意見層級探勘系統
資料探勘
Keywords (English) Chinese Opinion Mining System
Chinese Aspect-Level Mining System
Data Mining
Keywords (third language)
Subject classification
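Chinese abstract (English translation)
The Internet has by now become a new medium. Audiences no longer receive information passively; they can choose the information they want and can even publish their own opinions and comments about subjects such as companies, groups, people, and products. These comments considerably influence how other people evaluate a subject, so understanding the polarity and aspects of what is posted helps reveal what the public likes and dislikes, and why.
Online comments, however, are produced so quickly and in such volume that they can no longer be analyzed manually. Today's opinion mining systems mainly work at the document level, tallying positive and negative ratings against a subject's discussion volume. The results have some reference value, but without aspect analysis they cannot reveal the details of the opinions. We therefore develop a Chinese aspect-level opinion mining system that obtains aspect-level opinions through a complete-sentence algorithm.
An opinion mining system consists mainly of a crawling module, an analysis module, and a report module. The author's research improves and refines these three modules, chiefly as follows. The crawling module uses "exclusion keywords" to raise crawling accuracy and reduce the noise introduced by irrelevant articles. The analysis module mainly uses our proposed "evaluation scoring algorithm" to balance special cases in counting positive and negative ratings within posts, bringing the mining results closer to the truth. The report module improves the user interface and report presentation so that users can more easily understand each day's positive and negative ratings and their aspects. In addition, we developed a poster-tracking function, which contributes considerably to judging whether a poster's opinions have reference value (for example, opinions from paid writers) or whether a poster genuinely needs special assistance from the subject (for example, a problem that has long gone unresolved).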
English abstract
The Internet has developed into what is effectively a new medium: audiences no longer receive information passively but can choose the information they want, and can even express personal opinions and comments on subjects such as companies, organizations, individuals, and products.
These comments considerably influence how others view a given subject; understanding the polarity and aspects of the posted content therefore helps reveal public attitudes toward the subject and the reasons behind them.
However, comments on the Internet are produced too quickly and in too large a quantity to be analyzed manually. Today's opinion mining systems mainly adopt a document-level approach, which tallies positive and negative ratings against a subject's volume of discussion. Although the results have some reference value, the lack of aspect analysis means no further details about the comments can be identified. We therefore develop a Chinese aspect-level opinion mining system that obtains aspect-level opinions from the comments through a complete-sentence algorithm.
The opinion mining system consists of a crawling module, an analysis module, and a report module. The author's research aims to improve and refine these three modules.
The improvements are as follows: the crawling module uses "exclusion keywords" to improve crawling accuracy and reduce the noise from irrelevant articles; the analysis module mainly uses our "evaluation scoring algorithm" to balance special cases in counting positive and negative ratings in the comments, so that the results are closer to reality; and the report module improves the user interface and report presentation, so that users can more easily understand the daily positive and negative ratings and their aspects.
In addition, we have developed a poster-tracking feature, which contributes significantly to identifying whether a poster's opinions have reference value (e.g., those of a paid writer) or whether a poster actually requires special assistance from the subject (e.g., a problem that has long gone unresolved).
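The "exclusion keywords" idea mentioned in the abstract can be illustrated with a minimal sketch. The record does not describe the thesis's actual crawling algorithm, so everything below is an assumption made for illustration: the `Article` type, the function names, the article-level (rather than sentence-level) matching, and the sample keywords are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Article:
    """Hypothetical container for one crawled article."""
    title: str
    body: str


def is_relevant(article, include_keywords, exclude_keywords):
    """Keep an article only if it mentions at least one target keyword
    and none of the exclusion keywords (case-sensitive substring match)."""
    text = article.title + "\n" + article.body
    if not any(kw in text for kw in include_keywords):
        return False
    return not any(kw in text for kw in exclude_keywords)


def filter_articles(articles, include_keywords, exclude_keywords):
    """Return the articles that pass the keyword filter, in input order."""
    return [a for a in articles
            if is_relevant(a, include_keywords, exclude_keywords)]


if __name__ == "__main__":
    # Invented sample data: "MOD" is the target keyword, and "game" is an
    # exclusion keyword that removes articles about game mods.
    articles = [
        Article("MOD channel lineup changes",
                "Subscribers discuss the new channel list."),
        Article("New MOD released for a popular game",
                "Players can download it today."),
        Article("Weather forecast", "Rain is expected this weekend."),
    ]
    for article in filter_articles(articles, ["MOD"], ["game"]):
        print(article.title)
```

In this sketch an exclusion keyword trades a little recall for precision: any article that happens to mention an excluded term is dropped even if it is otherwise on topic, so the exclusion list would need to be chosen with care.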
Abstract (third language)
Table of contents
Contents
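Chapter 1 Introduction	1
1.1	Research Motivation and Objectives	1
1.1.1	Background	1
1.1.2	Research Motivation and Objectives	3
1.2	Thesis Organization	5
Chapter 2 Literature Review	7
2.1	Definition of Opinion Elements	7
2.2	Extraction and Identification of Feature Words	10
2.2.1	Manually Building a Feature-Word Lexicon	10
2.2.2	Extracting Feature Words with Natural Language Techniques	13
2.3	Expansion of Opinion Words	19
2.3.1	Expanding Opinion Words Using Lexicons	20
2.3.2	Expanding Opinion Words Using Corpora	24
2.4	Opinion Polarity Determination	29
2.4.1	Determining the Orientation of Opinion Words	30
2.4.2	Handling Negation Words and Conjunctions	35
Chapter 3 Research Method	36
3.1	Crawling Module	37
3.1.1	Problem Statement	39
3.1.2	Solution to the Problem	45
3.2	Analysis Module	52
3.2.1	Contextual Relationships among Opinion Elements	53
3.2.2	Default Topic and Default Feature	56
3.2.3	Sentence-Pattern Pairing	60
3.3	Report Module	66
3.3.1	Improving Article Accuracy and Readability	67
3.3.2	Tracking Analysis of Netizen and Journalist Orientation	70
3.3.3	Special-Event Tracking Analysis	72
Chapter 4 Analysis Experiments on the Mining System	76
4.1	Precision Experiments	76
4.1.1	Evaluation Method	76
4.1.2	Evaluation Results	77
4.2	Aspect Analysis	83
4.2.1	Calculation of Positive and Negative Ratings	83
4.2.2	Presentation of Scoring Results	88
Chapter 5 Experiments on the Crawling and Daily-Report Modules	96
5.1	Analysis of Keyword Search Results in Crawling	97
5.1.1	Data Sources and Keyword Training	97
5.1.2	Analysis of Crawling Results	100
5.2	Processing and Analysis in the Daily-Report System	104
5.2.1	Presentation Format of the Daily Report	104
5.2.2	Event Tracking Analysis	109
Chapter 6 Conclusion	117
References	119
Attachment: Daily Report Examples	125
EBTI Forum Analysis Daily Report	125
EBTI News Analysis Daily Report	135
Appendix: Related Journal Publications	141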

 
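List of Tables
Table 1 Opinion element definitions	5
Table 2 Opinion elements	9
Table 3 Features of movie elements	12
Table 4 Parts of speech of feature words	15
Table 5 Definitions relating opinion words and feature words	26
Table 6 Keyword-to-title mapping	44
Table 7 Relationship between Feature and Opinion Word (OP)	54
Table 8 Relationship between Topic and Feature	55
Table 9 Conjunctions	64
Table 10 Evaluation method	77
Table 11 Broadband data volume, 2011-11 to 2012-06	78
Table 12 Broadband data volume, 2012-07 to 2013-02	78
Table 13 Broadband precision, recall, and F1, 2011-11 to 2012-06	78
Table 14 Broadband precision, recall, and F1, 2012-07 to 2013-02	78
Table 15 Average precision, recall, and F1	82
Table 16 Number of discussion articles for the five major Internet service providers (ISPs)	88
Table 17 Descriptions of the six major broadband aspects	91
Table 18 Totals of crawled and filtered articles (by week)	100
Table 19 Efficiency of keyword filtering, illustrated with news data collected on a single day	102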

 
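List of Figures
Figure 1 Eight types of co-occurrence (共生) patterns	12
Figure 2 Feature-word and opinion-word pairing matrix	16
Figure 3 Illustration of opinion-word expansion	21
Figure 4 Feature-Opinion mapping	34
Figure 5 Architecture of the Chinese opinion mining system	37
Figure 6 Basic crawling flowchart	38
Figure 7 中嘉寬頻 and other unrelated news, Example 3-1	42
Figure 8 Examples of irrelevant articles from the data sources	43
Figure 9 Interface for adding exclusion-sentence keywords	48
Figure 10 Exclusion-sentence keyword settings	48
Figure 11 Keyword search algorithm for news sites	49
Figure 12 Keyword search algorithm for forums	51
Figure 13 Example of a default Topic for a short article	58
Figure 14 PTT forum title example (4)	60
Figure 15 Pairing flowchart	62
Figure 16 Example of pairing in coordinate sentences	64
Figure 17 Example of pairing in comparative sentences	66
Figure 18 Manual review interface	69
Figure 19 Manual review interface: single news article	69
Figure 20 Manual review interface: example of 新竹市社會處	70
Figure 21 Aspect analysis of poster ratings	72
Figure 22 Chunghwa rating distribution (201406~201502)	74
Figure 23 Chunghwa rating distribution (201503~201506)	75
Figure 24 Broadband precision, 2011-11 to 2012-06	79
Figure 25 Broadband recall, 2011-11 to 2012-06	79
Figure 26 Broadband F1, 2011-11 to 2012-06	80
Figure 27 Broadband precision, 2011-07 to 2013-02	81
Figure 28 Broadband recall, 2011-07 to 2013-02	81
Figure 29 Broadband F1, 2011-07 to 2013-02	82
Figure 30 Evaluation scoring algorithm	84
Figure 31 Number and share of discussion posts for the five major Internet service providers (ISPs)	89
Figure 32 Number of complete sentences with positive and negative ratings for the Internet service providers (ISPs)	90
Figure 33 Radar chart of the six major aspects for the five ISPs	92
Figure 34 Radar chart of the six major aspects for Chunghwa Telecom	93
Figure 35 Radar charts of the five major aspects for SeedNet, Taiwan Mobile, Kbro, and So-Net	93
Figure 36 Line chart of ratings on the Internet access fee aspect for Chunghwa Telecom, SeedNet, and Kbro	94
Figure 37 Share of news items collected by crawling, by news channel	99
Figure 38 Daily average of relevant articles after filtering (line chart, by week)	101
Figure 39 Example of an irrelevant article being collected	103
Figure 40 Summary table of the forum analysis daily report	105
Figure 41 Post details in the forum analysis daily report	106
Figure 42 Positive and negative rating results for a single article	107
Figure 43 Content of the first post in a forum thread	108
Figure 44 Poster tracking in forums	109
Figure 45 Discussion volume of the MOD event	111
Figure 46 Positive and negative rating trends for the MOD event	112
Figure 47 Rating trend and number of news articles for the MOD event	113
Figure 48 Positive and negative rating trends for Chunghwa Telecom	114
Figure 49 Rating trends for Chunghwa Telecom and MOD	115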
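Full-text access permissions
On campus
Print thesis available on campus immediately
Electronic full text authorized for on-campus access
Electronic thesis available on campus immediately
Off campus
Authorization granted
Electronic thesis available off campus immediately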
