Electronic Theses and Dissertations Service

§ Browse thesis bibliographic record

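The electronic full text of this thesis has been publicly available off campus since 2015-09-04.
The print copy of this thesis has been publicly available since 2015-09-04.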

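System ID	U0002-2408201517004100
DOI	10.6846/TKU.2015.00802
Title (Chinese)	以N-gram為基礎之網路新聞讀者情緒預測方法
Title (English)	Prediction of News readers’ Emotion by N-gram
Title (third language)
University	Tamkang University (淡江大學)
Department (Chinese)	資訊管理學系碩士在職專班
Department (English)	On-the-Job Graduate Program in Advanced Information Management
Foreign degree: university
Foreign degree: college
Foreign degree: graduate institute
Academic year	103 (ROC calendar)
Semester	2
Year of publication	104 (ROC calendar; 2015)
Author (Chinese)	沈育信
Author (English)	Yu-Hsinh Shen
Student ID	701630161
Degree	Master's
Language	Traditional Chinese
Second language
Oral defense date	2015-05-30
Number of pages	46
Oral defense committee	Advisor - 張昭憲; Committee member - 趙景明; Committee member - 衛信文
Keywords (Chinese)	文章情感分析 (article sentiment analysis); 文字探勘 (text mining); N-gram; 斷詞 (word segmentation); 資料探勘 (data mining)
Keywords (English)	Emotion analysis; Text mining; N-gram; Word Segmentation; Data mining; Performance
Keywords (third language)
Subject classification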
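Abstract (Chinese; translated)	With the rise of social networks, people have become accustomed to posting opinions and comments online. These activities leave behind large amounts of public data which, if analyzed carefully, yield valuable information about people's preferences and needs. Because of this practical value, industry, government, and academia have all taken up public opinion mining. This study aims to predict the emotions of online news readers, so that likely reactions to newly published news can be anticipated and serve as an important reference for authorities when releasing news and making decisions. To this end, the study collected a large volume of online news over a long period, segmented it with N-gram techniques, counted the occurrences of frequently used words, and combined these counts with readers' emotion votes to build a model relating news to reader emotion. When predicting for a test news item, the study also tried several similarity measures to improve accuracy. Experiments used 193,489 news items collected from December 8, 2013 to November 12, 2014. The results show that the proposed method achieves good accuracy in specific news categories. We also found that prediction accuracy improves markedly as the collection period lengthens, and that when major news breaks, postponing the point at which the model is built yields better predictions.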
Abstract (English)	With the rise of social networks, people have become used to sharing their opinions and comments online. Their activity leaves behind a large amount of publicly available data. Careful analysis of this data can yield useful information about people's needs and preferences. Because emotion analysis is highly practical, industry, academia, and government have all joined research on public opinion mining. This study focuses on predicting news readers' emotions, so that governments or companies can refer to those emotions when making decisions. We collect a large volume of online news over a long period and segment each article with N-grams. We then count keyword frequencies and build an emotion model from readers' emotion votes. When predicting readers' emotions for a news item, this study tries three methods to improve accuracy. The data set covers online news from December 8, 2013 to November 12, 2014, 193,489 items in total. The proposed method achieves high accuracy in some specific news categories. Accuracy improves noticeably as the news collection period grows. When major news occurs, postponing the time at which the model is built gives better accuracy.
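The pipeline summarized in the abstract (N-gram segmentation, frequency counting, emotion-labelled profiles, similarity-based prediction) can be illustrated with a minimal sketch. This is not the thesis's implementation: the function names, the choice of 2- and 3-character grams, the toy English headlines, and the use of cosine similarity alone are all assumptions made for illustration, and the single label per article stands in for the readers' emotion votes.

```python
import re
from collections import Counter
from math import sqrt


def char_ngrams(text, n_values=(2, 3)):
    """Count overlapping character n-grams (hypothetical helper)."""
    counts = Counter()
    # Split on non-word characters so no n-gram spans punctuation or spaces.
    for segment in re.split(r"\W+", text):
        for n in n_values:
            for i in range(len(segment) - n + 1):
                counts[segment[i:i + n]] += 1
    return counts


def build_profiles(labelled_news):
    """Sum n-gram counts per emotion label to form one profile per emotion."""
    profiles = {}
    for text, emotion in labelled_news:
        profiles.setdefault(emotion, Counter()).update(char_ngrams(text))
    return profiles


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[g] * b[g] for g in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def predict_emotion(text, profiles):
    """Return the emotion whose profile is most similar to the text."""
    grams = char_ngrams(text)
    return max(profiles, key=lambda emotion: cosine(grams, profiles[emotion]))


# Toy, made-up training data; real input would be news text plus reader votes.
training = [("taxes rise again", "angry"), ("team wins final", "happy")]
profiles = build_profiles(training)
print(predict_emotion("taxes rise sharply", profiles))
```

Because the grams are character-based, the same functions apply to Chinese text without a separate word segmenter, which is the usual motivation for N-gram segmentation in this setting.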
Abstract (third language)
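Table of contents
	List of Tables
	List of Figures
	Chapter 1 Introduction
		Section 1 Research Background
		Section 2 Research Motivation and Objectives
	Chapter 2 Literature Review
		Section 1 Related Work
		Section 2 N-gram Techniques and Their Applications
		Section 3 Zipf's Law
		Section 4 Classification and Clustering Algorithms
	Chapter 3 Architecture and Design of the News Reader Emotion Prediction System
	Chapter 4 Experimental Results
		Section 1 Accuracy Statistics
		Section 2 Analysis of Factors Affecting Accuracy
	Chapter 5 Conclusions and Future Work
		Section 1 Conclusions
		Section 2 Limitations and Suggestions for Future Research
	References
List of tables
	Table 1: Format for storing news body text in the database
	Table 2: N-gram analysis of a Chinese article
	Table 3: Improved N-gram frequency statistics
	Table 4: Most frequent prepositions, conjunctions, and subjects under different emotions
	Table 5: Decomposing news with the improved N-gram
	Table 6: Consolidated N-gram frequency table in the database
	Table 7: Predicting readers' emotion toward news using single words
	Table 8: News emotion prediction with the NNC algorithm
	Table 9: News emotion prediction with the K-means algorithm
	Table 10: News emotion prediction with cosine similarity
	Table 11: News emotion prediction results
	Table 12: Prediction accuracy by emotion category
	Table 13: Prediction accuracy by news category
	Table 14: Number of articles used for the same news category and emotion category within the queried time point
	Table 15: Prediction accuracy by number of articles
	Table 16: Prediction accuracy of emotion classification models built from different news collection periods
	Table 17: Identifying "major news" from the number of news items of the same type that produce the same emotional reaction
	Table 18: At the time a major news event occurs
	Table 19: "Angry" (火大) emotion model for the "Politics" category on 2014/3/28 and 2014/3/29
	Table 20: Average number of repeated words
List of figures
	Figure 1: Relationship between word frequency and rank
	Figure 2: Typical data classification workflow
	Figure 3: Online news reader emotion analysis workflow proposed in this study
	Figure 4: Classifying news by body text and extracting timestamps
	Figure 5: Prediction accuracy by emotion
	Figure 6: Prediction accuracy by news category
	Figure 7: Prediction accuracy by number of articles
	Figure 8: Prediction accuracy by news collection period
	Figure 9: News volume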
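Full-text access permissions	On campus: print copy available immediately; electronic full text authorized for on-campus access; electronic copy available on campus immediately. Off campus: electronic copy authorized for immediate off-campus access.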
