§ Thesis Bibliographic Record
  
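System ID: U0002-0909202010574200
DOI: 10.6846/TKU.2020.00220
Thesis Title (Chinese): 基於BERT技術挖掘自適應偏好之新聞推薦系統
Thesis Title (English): Recommendation System by Adaptively Exploring User Preferences based on BERT
Thesis Title (Third Language):
University: Tamkang University (淡江大學)
Department (Chinese): 資訊工程學系碩士班
Department (English): Department of Computer Science and Information Engineering
Foreign Degree University:
Foreign Degree College:
Foreign Degree Institute:
Academic Year: 108 (ROC calendar, 2019-2020)
Semester: 2
Year of Publication: 109 (ROC calendar, 2020)
Graduate Student (Chinese): 蕭憲鴻
Graduate Student (English): Hsien-Hong Hsiao
Student ID: 608410105
Degree: Master's
Language: Traditional Chinese
Second Language: English
Oral Defense Date: 2020-06-12
Pages: 71
Oral Defense Committee:
Advisor - 張志勇 (cychang@mail.tku.edu.tw)
Co-advisor - 郭經華 (chkuo@mail.tku.edu.tw)
Member - 廖文華
Member - 游國忠
Keywords (Chinese): 人工智慧
推薦系統
BERT
關鍵字詞彙
Keywords (English): Artificial Intelligence
Recommendation System
BERT
Keyword Vocabulary
Keywords (Third Language):
Subject Classification: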
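Abstract (translated from Chinese)
In an era of rapidly growing online information, news and media platforms carry an enormous volume of content, and finding useful information in it is not easy. Information overload is one of the difficulties these platforms face, so using a recommendation system to proactively filter and analyze the news that users need and consider important, and then recommend it to them, is an important research topic. Such a system raises the value of the news platform and the exposure of merchants' messages, and it improves the user experience by letting users obtain interesting and important information through the platform.
Building on natural language processing and artificial intelligence techniques, this thesis develops a news recommendation system that adapts to user preferences. It dynamically analyzes a user's preference tendencies from the news the user has read and recommends news that the user has not yet read but may find interesting. The proposed system, "Recommendation System by Adaptively Exploring User Preferences based on BERT", consists of roughly three parts: (1) data collection and cleaning, (2) data analysis and feature extraction, and (3) user preference analysis and recommendation prediction.
The first part collects news from various news platforms. Because the platforms classify news inconsistently, unnecessary information must be removed from the news before the data can be reclassified. The second part reclassifies the news. To do so, we extract feature keywords from the news: a natural language model extracts keywords that carry semantic features, and the news is then reclassified according to topic popularity. The third part analyzes user preferences and makes recommendation predictions using several methods. The first is content-based: it extracts the user's preference features from the user's reading history and looks for news matching those features. The second is collaborative filtering: it finds other users on the platform whose preference features match those of the target user and recommends the other news those users have read. An online news platform, however, needs recommendations that are immediate and flexible, so the thesis uses ONDA, a rule model defined in this thesis, to make dynamic news recommendations.
The contributions of this thesis are as follows:
(1) Semantic features of news keywords
An online news platform using the system designed in this thesis can extract news keywords that carry semantic features. Previous work judges a word important when it appears more often in the current article than in other news. In modern news, however, many newly coined terms are short and forceful yet do not appear often. The deep-learning approach in this thesis compares word frequencies and also learns the relationships between the sentences before and after a word, using this feature to express the word's meaning in the article. The model therefore extracts keywords by comparing the semantic features of the sentences surrounding each word, yielding news keywords rich in semantic features.
(2) Relationship between temporal popularity and users
When news is hard to recommend, previous work falls back on popularity-based recommendation, using the click-through rate of news among users to find the most popular news for the target user. This thesis instead uses the search popularity score of news keywords at the current moment to observe the relationship between the target user and news popularity. For users with unconventional tastes, this approach effectively improves recommendation accuracy.
(3) User preferences decay over time
One of the hard problems in recommendation systems is shifting user preferences: when a user's long-standing interest in a news category changes, earlier recommendation systems struggle to respond promptly. This thesis proposes a preference decay mechanism: when a user's reading preferences shift, the influence of past preferences drops sharply and the influence of current preferences rises correspondingly. The system can therefore handle preference changes that happen within a short time by changing its recommendation strategy, improving prediction accuracy.
(4) ONDA mechanism
Predicting a target user's preferences often requires heavy offline computation, which cannot deliver immediate and effective recommendations on a real-time online news platform. The ONDA mechanism proposed in this thesis changes recommendations more flexibly: it defines rules over users' news browsing and reacts immediately to their reading behavior, reducing the system's computational load and improving the user experience on the platform.
According to the experimental data, with the four contributions above the proposed system recommends news of likely interest to users more promptly and effectively than other online news recommendation systems, adapts its recommendations more effectively to changes in the target user's preferences, and, through its real-time recommendation mechanism, reduces the complexity of computing preferences for a large number of users. It thereby achieves adaptive user preference prediction and helps users quickly obtain useful news on the platform.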
English Abstract
In an era of rapidly developing Internet information, news and media platforms contain huge amounts of information, and it is not easy to obtain helpful information from them. Information explosion is one of the dilemmas these platforms face. Using a recommender system to filter and analyze important news and recommend it to users is therefore an important research topic. It can improve the value of the news platform and the exposure of merchants' information, and it also enhances the user experience by helping users obtain interesting and important information through the platform.
Based on natural language and artificial intelligence technology, this thesis develops an adaptive news recommendation system. It dynamically analyzes a user's preferences from the news the user has read and recommends news that the user has not seen and may be interested in, so as to improve the user's experience. The "Recommendation System by Adaptively Exploring User Preferences based on BERT" proposed in this thesis can be roughly divided into three parts: the first is data collection and cleaning, the second is data analysis and feature extraction, and the third is user preference analysis and recommendation prediction.
The first part collects news from various news platforms. Because the platforms classify news differently, unnecessary information in the news must be cleaned up before the news data can be reorganized. The second part reclassifies the news. To achieve this, we extract the feature keywords of the news: a natural language model extracts keywords with semantic features, and the news is then reclassified according to topic popularity. The third part analyzes user preferences and makes recommendation predictions. First, content-based analysis extracts the user's preference features from the news the user has read and finds news that matches those features. Second, collaborative filtering finds other users on the platform who share the target user's preferences and recommends the news those users have read. However, an online news platform needs instant and flexible recommendations. Therefore ONDA, a rule model defined in this thesis, is used to make dynamic news recommendations.
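The two analysis methods named above, content-based recommendation and collaborative filtering, can be illustrated with a minimal Python sketch. This is not the thesis's implementation: the record does not give its formulas, so the sketch assumes that each news item already has a feature vector (for example one derived from its keywords), that the user profile is the mean of the vectors of read items, and that user similarity is Jaccard overlap of reading histories. All function names and the sample data are hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def recommend_content_based(read_ids, vectors, top_n=2):
    """Rank unread news by similarity to the mean vector of read news.

    Assumption: `vectors` maps a news id to a precomputed feature vector.
    """
    if not read_ids:
        return []
    dims = len(next(iter(vectors.values())))
    profile = [0.0] * dims
    for news_id in read_ids:
        for i, value in enumerate(vectors[news_id]):
            profile[i] += value / len(read_ids)
    unread = [news_id for news_id in vectors if news_id not in read_ids]
    unread.sort(key=lambda news_id: cosine(profile, vectors[news_id]), reverse=True)
    return unread[:top_n]

def recommend_collaborative(target, histories, top_n=2):
    """User-based collaborative filtering with Jaccard similarity.

    Assumption: `histories` maps a user id to the set of news ids read.
    Each unread item is scored by the summed similarity of its readers.
    """
    seen = histories[target]
    scores = {}
    for user, items in histories.items():
        if user == target:
            continue
        union = seen | items
        similarity = len(seen & items) / len(union) if union else 0.0
        for news_id in items - seen:
            scores[news_id] = scores.get(news_id, 0.0) + similarity
    # Sort by descending score, breaking ties by id for determinism.
    return sorted(scores, key=lambda news_id: (-scores[news_id], news_id))[:top_n]

# Hypothetical sample data.
vectors = {"a": [1.0, 0.0], "b": [0.9, 0.1], "c": [0.0, 1.0], "d": [0.8, 0.2]}
histories = {"u1": {"a", "b"}, "u2": {"a", "b", "c"}, "u3": {"d"}}

content_picks = recommend_content_based({"a"}, vectors)
collab_picks = recommend_collaborative("u1", histories, top_n=1)
```

With this sample data, a user who has read only item "a" is offered the items whose vectors point in nearly the same direction, and user "u1" is offered the item read by the most similar other user.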
The contributions of this thesis are as follows:
(1) News keywords carry the meaning of the article
An online news platform that uses the system designed in this thesis can extract news keywords that carry the meaning of the article. In previous papers, a word is judged important when it appears more often in the current article than in other news. However, in modern news many new words are short and powerful yet do not appear many times. Based on deep learning, this thesis not only compares word frequencies but also learns the relationship between the sentences before and after each word, so as to express the meaning of the word in the text. The model thus extracts keywords by comparing the semantic features of the sentences around each word, and the extracted keywords are rich in semantic features.
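The frequency-based baseline that this contribution contrasts itself with (a word counts as important when it is frequent in the current article and rare in other news) is commonly realized as TF-IDF. The sketch below shows that baseline only, not the BERT-based extraction proposed in the thesis. It assumes the articles are already tokenized; the function name and the sample documents are hypothetical.

```python
import math
from collections import Counter

def tfidf_keywords(docs, doc_index, top_k=3):
    """Return the top_k tokens of docs[doc_index] ranked by TF-IDF.

    Assumption: `docs` is a list of token lists (word segmentation is
    done beforehand). IDF here is the plain log(N / df) variant.
    """
    n_docs = len(docs)
    doc_freq = Counter()
    for tokens in docs:
        doc_freq.update(set(tokens))  # count each token once per document
    term_freq = Counter(docs[doc_index])
    total = len(docs[doc_index])
    scores = {
        token: (count / total) * math.log(n_docs / doc_freq[token])
        for token, count in term_freq.items()
    }
    # Sort by descending score, breaking ties alphabetically.
    return sorted(scores, key=lambda token: (-scores[token], token))[:top_k]

# Hypothetical tokenized articles.
docs = [
    ["bert", "news", "bert", "model"],
    ["news", "sports"],
    ["news", "weather", "model"],
]
keywords = tfidf_keywords(docs, 0, top_k=2)
```

A token such as "news" that occurs in every document scores zero under this scheme, while a rare new term that appears only once in a long article also scores low. That second case is the weakness the thesis addresses with context-aware features.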
(2) The relationship between temporal popularity and users
When news is difficult to recommend, previous papers use popularity-based recommendation: the click rate of news among users is used to find the most popular news for the target user. This thesis instead uses the search popularity score of news keywords at the current time to observe the relationship between the target user and news popularity. For users with unusual tastes, this method can effectively improve recommendation accuracy.
(3) User preferences weaken over time
One of the difficult problems in recommendation systems is the change of user preferences. When a user's long-term interest in a news category changes, it is difficult for previous recommendation systems to adjust immediately. This thesis proposes a user preference decay mechanism: when the user's reading preferences change, the influence of past preferences is sharply reduced, while the influence of current preferences is relatively enhanced. The system can therefore handle preference changes within a short time by changing its recommendation strategy, which improves prediction accuracy.
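One common way to make past preferences lose influence is exponential time decay, sketched below. The record does not state the decay function used in the thesis, so the half-life form, the function name, the `half_life_days` parameter, and the sample history are all assumptions for illustration.

```python
from collections import defaultdict

def decayed_preferences(history, now, half_life_days=7.0):
    """Weight each read event by 0.5 ** (age / half_life) and normalize.

    Assumption: `history` is a list of (category, day_read) pairs and
    `now` is the current day on the same scale.
    """
    weights = defaultdict(float)
    for category, day_read in history:
        age = now - day_read
        weights[category] += 0.5 ** (age / half_life_days)
    total = sum(weights.values())
    return {c: w / total for c, w in weights.items()} if total else {}

# Hypothetical history: three older sports reads, two recent politics reads.
history = [("sports", 0), ("sports", 1), ("sports", 2),
           ("politics", 28), ("politics", 29)]
prefs = decayed_preferences(history, now=30)
```

Although the user read more sports items overall, the recent politics reads dominate the decayed profile, which matches the behavior described above: current preferences gain influence as older ones fade.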

(4) ONDA mechanism
Recommendation systems often require a large amount of offline computation to predict the preferences of target users. On a real-time online news platform, such offline computation cannot give users instant and effective recommendations. The ONDA mechanism proposed in this thesis changes recommendations more flexibly: it defines rules for users' news browsing and reacts to users' reading behavior instantly. It can reduce the system's computational burden and improve users' experience on the platform.
According to the experimental data, with the four contributions above, the proposed system recommends news that may interest users more immediately and effectively than other online news recommendation systems and adapts its recommendations more effectively to changes in the target user's preferences. The instant recommendation mechanism also reduces the complexity of computing preferences for a large number of users. The system thus achieves adaptive user preference prediction and helps users quickly get useful news on the platform.
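Third-Language Abstract
Table of Contents
Contents
List of Figures
List of Tables
Chapter 1. Introduction
Chapter 2. Related Work
Chapter 3. Background
3-1. Recommendation Systems
3-2. Analysis and Prediction Techniques
Chapter 4. System Architecture
4-1. Environment and Problem Description
4-2. System Architecture
Chapter 5. Experimental Analysis
Chapter 6. Conclusion
References
Appendix - English Paper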
 
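List of Figures
Figure 1. Content-based recommendation
Figure 2. User-based collaborative filtering
Figure 3. Item-based collaborative filtering
Figure 4. News analysis module flowchart
Figure 5. CKIP word segmentation example
Figure 6. Topic Modeling training: keyword clustering
Figure 7. Topic Modeling usage phase
Figure 8. Design and implementation of the Recommendation System by Adaptively Exploring User Preferences based on BERT
Figure 9. Architecture of the Recommendation System by Adaptively Exploring User Preferences based on BERT
Figure 10. News analysis module flowchart
Figure 11. Web-crawled news data
Figure 12. News analysis module flowchart
Figure 13. News headline preprocessing
Figure 14. News keyword extraction with BERT
Figure 15. Topic Modeling training results
Figure 16. Topic Modeling analysis results
Figure 17. Topic Modeling usage phase: BERT-based keyword vectors
Figure 18. News prediction module
Figure 19. C-BERT flow diagram
Figure 20. C-BERT example of topic preferences in a user's reading history
Figure 21. CF-BERT flow diagram
Figure 22. Hybrid module flow diagram
Figure 23. TOP-N submodule flow diagram
Figure 24. Live website using the techniques of this thesis
Figure 25. News list
Figure 26. News content
Figure 27. Title and description output
Figure 28. Extracting user features
Figure 29. User history database
Figure 30. News database
Figure 31. Comparison of prediction precision across system models
Figure 32. Comparison of prediction recall across system models
Figure 33. Analysis of model predictions under different densities of user reading history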
 
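List of Tables
Table 1. Comparison of related work
Table 2. Confusion matrix for a single user
Table 3. Confusion matrix for the system's predictions over the user set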
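Full-Text Access Permissions
On campus
Print copy available on campus immediately
Author consents to on-campus release of the electronic full text
Electronic full text available on campus immediately
Off campus
Authorization granted
Electronic full text available off campus immediately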