| System ID | U0002-0607202316593800 |
|---|---|
| DOI | 10.6846/tku202300326 |
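| Thesis title (Chinese) | 中文意見探勘系統之意見詞為動詞之研究 |
| Thesis title (English) | Research in Verbs as Opinion Words in Chinese Opinion Mining System |
| Thesis title (third language) | |
| University | Tamkang University (淡江大學) |
| Department (Chinese) | 資訊工程學系博士班 |
| Department (English) | Department of Computer Science and Information Engineering |
| Foreign degree: university | |
| Foreign degree: college | |
| Foreign degree: graduate institute | |
| Academic year (ROC calendar) | 111 |
| Semester | 2 |
| Year of publication (ROC calendar) | 112 |
| Author (Chinese) | 洪文斌 |
| Author (English) | Wen-Pin Hung |
| Student ID | 808410012 |
| Degree | Doctoral |
| Language | Traditional Chinese |
| Second language | |
| Oral defense date | 2023-06-30 |
| Number of pages | 145 pages |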
| Oral defense committee | Committee member: 王鄭慈 (ctwang@tea.ntue.edu.tw)<br>Committee member: 張世豪 (sh.chang@ntut.edu.tw)<br>Committee member: 王英宏 (inhon@mail.tku.edu.tw)<br>Committee member: 陳瑞發 (alpha@mail.tku.edu.tw)<br>Advisor: 蔣璿東 (081863@mail.tku.edu.tw) |
| Keywords (Chinese) | 動詞; 中文意見探勘系統; 上下文關聯性; 關聯式分類 |
| Keywords (English) | Verb; Chinese Opinion Mining System; Context Dependent; Associative Classification |
| Keywords (third language) | |
| Subject classification | |
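| Abstract (translated from Chinese) | With the rapid development of Internet technology, most consumers browse related reviews on websites before buying a product or a company's service. Related studies indicate that review articles influence consumers' purchase decisions about a product or a company's service. For a company, using customer review articles to perform word-of-mouth analysis across aspects and to see what netizens think, so that it can reply to and counterbalance negative evaluations related to the company in the shortest time, is therefore very important work. Companies in Taiwan must accordingly use a Chinese opinion mining system to perform word-of-mouth analysis across aspects. Our laboratory has preliminarily developed an aspect-level Chinese opinion mining system; although this system already uses a default topic and a default feature to increase the recall of opinions and of the features under discussion, there is still room for improvement. This research therefore uses the context dependency among opinion elements so that the system can not only infer the topic, feature, and item that some netizens want to discuss in their reply posts, but also correct some of the errors caused by the default topic and default feature; this added function lets users obtain more detailed and correct information when performing word-of-mouth analysis. In addition, for articles containing certain specific verbs, we use an associative classification algorithm to further process opinions containing these verbs, extracting additional opinion elements to form new complete sentences that strengthen what the complete sentences express, so as to raise the accuracy of the opinion mining results. Experiments confirm that the improvements proposed in this research are effective. Through these improvements, users performing word-of-mouth analysis can obtain more detailed and correct information while reducing the time they spend reading irrelevant articles, which helps improve the efficiency of replying to related questions and thereby reduces customer loss and increases opportunities to gain new customers. |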
| Abstract (English) | With the rapid development of Internet technology, most consumers browse relevant online reviews before purchasing a product or a company's service. Studies have shown that reviews affect consumers' purchase decisions. It is therefore very important for a company to use customer reviews to conduct word-of-mouth (WOM) analysis across aspects and to monitor netizens' opinions, so that it can respond to and counterbalance negative comments about the company in the shortest possible time. Companies in Taiwan thus need a Chinese opinion mining system for WOM analysis across aspects. Our laboratory has developed a preliminary aspect-level Chinese opinion mining system. Although this system already uses a default topic and a default feature to increase the recall of opinions and of the features under discussion, there is still room for improvement. This research uses the context dependency among opinion elements so that the system can not only infer the topic, feature, and item that some netizens discuss in reply posts, but also correct some of the errors caused by the default topic and default feature. This added capability lets users obtain more detailed and accurate information when performing WOM analysis. Moreover, for articles containing certain specific verbs, we use an associative classification algorithm to further process the opinions containing these verbs, extracting additional opinion elements to form new complete sentences that strengthen what the complete sentences express, thereby improving the accuracy of the opinion mining results. Experiments confirm that the proposed improvements are effective. With these improvements, users performing WOM analysis obtain more detailed and accurate information and spend less time reading irrelevant articles. This helps improve the efficiency of responding to relevant issues, thereby reducing customer churn and increasing opportunities to gain new customers. |
| Abstract (third language) | |
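| Table of contents | **Contents**<br>Chapter 1 Introduction<br>1.1 Research Motivation and Objectives<br>1.1.1 Background<br>1.1.2 Research Motivation and Objectives<br>1.2 Thesis Organization<br>Chapter 2 Literature Review<br>2.1 Related Work on Chinese Opinion Mining Systems<br>2.1.1 Opinion Element Pairing<br>2.1.2 Opinion Polarity Determination<br>2.2 Overview of the Chinese Opinion Mining System Developed by Our Laboratory<br>2.2.1 Crawler Module<br>2.2.2 Analysis Module<br>2.2.3 Report Module<br>2.3 Association Rules<br>2.4 Associative Classification<br>Chapter 3 Problem Description and Research Methods<br>3.1 Problem Description<br>3.1.1 Problems When the Opinion Word Is a "Verb"<br>3.1.2 Problems of Missing Topic or Feature<br>3.2 Research Methods<br>3.2.1 Addressing Missing Topic or Feature Information in Reply Posts<br>3.2.2 Enriching Complete Sentences to Address Their Poor Expressiveness When the Opinion Word Is a Verb<br>Chapter 4 CDA Experimental Results<br>4.1 Data Sources<br>4.2 Analysis of Single-Month CDA Experimental Results<br>4.2.1 Effect of CDA on Missing Features or Items<br>4.2.2 Minor Corrections by CDA to Default Topic and Default Feature<br>Chapter 5 Associative Classification Experimental Results<br>5.1 Evaluation Method for AC<br>5.2 Comparison of Ranking Methods<br>5.2.1 Experimental Results of "CS" Ranking<br>5.2.2 Experimental Results of the Newly Proposed "YCSNCS" Ranking<br>5.2.3 Comparison of "CS" and "YCSNCS" Ranking<br>5.3 Comparison of Experimental Results for 4-Tuple and 7-Tuple Representations Before Back-Inference (回推)<br>5.4 Comparison of Experimental Results for the 7-Tuple Representation Before and After Back-Inference<br>5.5 Discussion of AC Experimental Results<br>Chapter 6 Conclusion<br>References<br>Appendix 1 Experimental Data Tables<br><br>**List of Figures**<br>Figure 1: Rule ranking in a general associative classifier<br>Figure 2: Title and content of the PTT forum post in Example 7<br>Figure 3: Title of the PTT forum post in Example 8<br>Figure 4: Title and content of the PTT forum post in Example 11<br>Figure 5: CDA execution steps<br>Figure 6: Feature information statistics before running CDA<br>Figure 7: Feature information statistics after running CDA<br>Figure 8: 4-Tuple representation with CS ranking, analysis of training data at rule confidence 1% to 100%<br>Figure 9: 4-Tuple representation with CS ranking, analysis of test data at rule confidence 1% to 100%<br><br>**List of Tables**<br>Table 1: Definitions of opinion elements<br>Table 2: Conjunctions<br>Table 3: New seven-element 7-Tuple representation of Example 6 "before back-inference"<br>Table 4: New seven-element 7-Tuple representation of Example 6 "after back-inference"<br>Table 5: Source channels of the Chinese opinion mining system<br>Table 6: Number of articles retrieved in the ISP domain<br>Table 7: Statistics of complete sentences with an added "feature"<br>Table 8: Statistics of complete sentences with an added "item"<br>Table 9: CDA performance on default topic complete sentences<br>Table 10: Cause analysis of erroneous default topic complete sentences before running CDA<br>Table 11: CDA performance on default feature complete sentences<br>Table 12: Manually labeled results of system mining<br>Table 13: Notation for experimental data<br>Table 14: Best classification results on test data for the 4-Tuple representation with CS and YCSNCS ranking<br>Table 15: Best classification results on test data for the 4-Tuple and 7-Tuple representations before back-inference<br>Table 16: Best classification results on test data for the 7-Tuple representation before and after back-inference |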
| Thesis full-text access rights | |