System ID | U0002-1207202010153100 |
---|---|
DOI | 10.6846/TKU.2020.00298 |
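Thesis title (Chinese) | 基於異質網路表示法之跨媒體使用者輪廓描繪 |
Thesis title (English) | Cross-media User Profiling with Heterogeneous Network Embedding |
Thesis title (third language) | |
University | Tamkang University |
Department (Chinese) | 管理科學學系企業經營碩士班 |
Department (English) | Master's Program in Business and Management, Department of Management Sciences |
Foreign degree: university | |
Foreign degree: college | |
Foreign degree: graduate institute | |
Academic year (ROC calendar) | 108 |
Semester | 2 |
Year of publication (ROC calendar) | 109 |
Author (Chinese) | 陳律安 |
Author (English) | Lu-An Chen |
Student ID | 608620034 |
Degree | Master's |
Language | Traditional Chinese |
Second language | |
Oral defense date | 2020-07-08 |
Number of pages | 77 |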
Oral defense committee | Advisor - 吳家齊; Committee member - 魏志平; Committee member - 陳怡妃 |
Keywords (Chinese) | 社群媒體探勘; 機器學習; 轉移學習; 使用者輪廓描繪; 網路表示法 |
Keywords (English) | Social Media Mining; Machine Learning; Transfer Learning; User Profiling; Network Embedding |
Keywords (third language) | |
Subject classification | |
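Abstract (Chinese, translated into English) |
With the rapid development of technology, the Internet has become an indispensable part of daily life. People use it for online shopping, live streaming, leisure and entertainment, and social media, and it is now widespread in everyday life. As social media grows more popular and its services more diverse, users leave ever more records on these platforms. These data may reveal users' behavioral intentions and consumption preferences, or personal information such as gender, age, and political orientation. Collecting and applying this information can yield substantial commercial profit or public benefit. User profiling is commonly approached with supervised learning. Supervised learning requires a large amount of data, and when the data in a single domain are incomplete, the shortage makes user profiles hard to determine. This study therefore uses transfer learning to transfer knowledge across domains and compensate for the lack of information in a single domain. The study centers on Instagram and uses images generated by Instagram users as classification attributes. We build a heterogeneous network that connects Instagram with another domain; this network serves as the medium between the two domains for transfer learning. Network embedding is applied to the heterogeneous network, and machine learning classifiers then predict categorical attributes of Instagram users such as gender. The results show that the proposed approach outperforms the benchmark experiment, indicating that a heterogeneous network can link knowledge and support feature transfer between domains. When the actual gender of Instagram users is predicted through transfer learning, the highest accuracy is 71.05% with an artificial neural network, followed by 67.98% with random forests. |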
Abstract (English) |
With the rapid development of technology, the Internet has become an indispensable part of people's lives. People use it for online shopping, live streaming, leisure and entertainment, and social media. As social media becomes more popular and its services more diverse, the records and content that users publish on these platforms keep growing. These data may reveal users' behavioral intentions and consumption preferences, or personal information such as gender, age, and political orientation. Collecting and applying this information can bring substantial commercial profit or public benefit. User profiling is commonly approached with supervised learning. Supervised learning requires training data and test data, but if the data in a single domain are incomplete, the data become insufficient and the user profile is difficult to discern. Therefore, this study uses transfer learning to transfer knowledge between domains and make up for the lack of information in a single domain. The research focuses on Instagram and uses the pictures generated by Instagram users as classification attributes. We create a heterogeneous network for Instagram and other domains; this network serves as the medium between two different domains for transfer learning. Network embedding computes a vector for each node of the heterogeneous network, and machine learning then classifies Instagram users by gender and other categorical attributes. The results show that the proposed approach outperforms the benchmark experiment, indicating that the heterogeneous network can indeed connect knowledge and can be used for feature transfer between domains. When transfer learning is used to predict the true gender of Instagram users, the highest accuracy is 71.05% with an artificial neural network, followed by 67.98% with random forests. |
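Abstract (third language) | |
Table of contents |
Contents
Abstract (Chinese) I
Abstract (English) III
Contents V
List of Figures VIII
List of Tables IX
Chapter 1 Introduction 1
Chapter 2 Literature Review 8
2.1 Social Media 8
2.2 User Profiling 10
2.3 Transfer Learning 16
2.4 Network Embedding 20
Chapter 3 Research Method 28
3.1 Data Collection and Pre-processing 28
3.2 Building the Heterogeneous Network 30
3.3 Extracting Vectors with Network Embedding 33
3.4 Feeding the Classifier 35
3.5 Classification and Validation of Results 36
Chapter 4 Empirical Results 39
4.1 Database 39
4.1.1 Instagram 39
4.1.2 Rotten Tomatoes 42
4.2 Heterogeneous Network and Descriptive Statistics 44
4.2.1 Heterogeneous Network 44
4.2.2 Descriptive Statistics 45
4.3 Experimental Results 55
4.3.1 Classification on the Source Domain Alone 55
4.3.2 Classification on the Target Domain via Transfer Learning 57
4.3.3 Benchmark 60
4.3.4 This Study versus the Benchmark 64
4.3.5 Summary 68
Chapter 5 Conclusions and Suggestions 70
5.1 Conclusions 70
5.2 Future Work and Suggestions 71
References 72
Chinese-language references 72
English-language references 72
List of Figures
Figure 1-1. Social media users publishing movie-related posts and being linked to the movie domain 4
Figure 1-2. Social media users publishing posts and being linked to the movie domain 5
Figure 2-1. Differences between traditional machine learning and transfer learning 18
Figure 3-1. Research flowchart of this study 29
Figure 3-2. Google Cloud Vision assigning labels to a photo 30
Figure 3-3. Heterogeneous network built from Instagram and the source domain 31
Figure 3-4. Illustration of extracting network vectors with network embedding 34
Figure 3-5. Illustration of outlier similarity 37
Figure 4-1. Page of a post published by an Instagram user 41
Figure 4-2. Rotten Tomatoes website page (2020) 43
Figure 4-3. Heterogeneous network built from Instagram and Rotten Tomatoes 44
Figure 4-4. Male-to-female ratio for the twenty most frequent labels in the Instagram database 47
Figure 4-5. Male-to-female ratio for the twenty most frequent movies in the Rotten Tomatoes database 49
Figure 4-6. Pie chart of the share of each movie genre among the 5,873 movies in the Rotten Tomatoes database 50
Figure 4-7. Male-to-female ratio by movie genre in the Rotten Tomatoes database 51
Figure 4-8. Classification on the source domain alone and on the target domain via transfer learning 69
List of Tables
Table 2-1. Related literature on user profiling 13
Table 2-2. Comparison of the strengths and weaknesses of classifier algorithms 15
Table 2-3. Relationship between traditional machine learning and various types of transfer learning 19
Table 2-4. Definitions of nodes and edges and the different uses of network embedding in previous studies 21
Table 2-5. Related literature on network embedding for user profiling 27
Table 3-1. Confusion matrix 37
Table 4-1. Instagram database totals 41
Table 4-2. Rotten Tomatoes database totals 42
Table 4-3. Counts of the eight node types in this study's heterogeneous network 45
Table 4-4. Ten most frequent poster labels in the Rotten Tomatoes database 53
Table 4-5. Ten most frequent keywords in the Rotten Tomatoes database 54
Table 4-6. Comparison of classifier results for predicting the gender of Rotten Tomatoes critics 56
Table 4-7. Comparison of classifier results for predicting gender from single posts of Instagram users 58
Table 4-8. Comparison of classifier results for predicting the gender of Instagram users 59
Table 4-9. Illustration of converting data into vectors with bag of words 60
Table 4-10. Comparison of classifier results for bag-of-words prediction of the gender of Rotten Tomatoes critics 61
Table 4-11. Bag-of-words prediction of gender from single posts of Instagram users 62
Table 4-12. Comparison of classifier results for bag-of-words prediction of the gender of Instagram users 63
Table 4-13. This study versus the benchmark: predicting the gender of Rotten Tomatoes critics 65
Table 4-14. This study versus the benchmark: predicting gender from single posts of Instagram users 66
Table 4-15. This study versus the benchmark: predicting the gender of Instagram users 67 |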
Full-text access rights |
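
The pipeline summarized in the abstract (build a heterogeneous network linking two domains, derive a vector per node, then classify target-domain users from source-domain labels) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the thesis: the four node types, the toy edges, the placeholder classes `A` and `B`, and the helper names. Random-walk co-occurrence counts stand in for a learned network embedding, and a nearest-centroid rule stands in for the neural network and random forest classifiers reported in the abstract.

```python
import math
import random
from collections import Counter, defaultdict

# Hypothetical toy data: critics and movies form the source domain,
# Instagram-like users form the target domain, and image tags bridge both.
EDGES = [
    ("critic:c1", "movie:m1"), ("critic:c2", "movie:m1"),
    ("critic:c3", "movie:m2"), ("critic:c4", "movie:m2"),
    ("movie:m1", "tag:beach"), ("movie:m1", "tag:surf"),
    ("movie:m2", "tag:city"), ("movie:m2", "tag:night"),
    ("user:u1", "tag:beach"), ("user:u1", "tag:surf"),
    ("user:u2", "tag:city"), ("user:u2", "tag:night"),
]
# Placeholder classes for the profiled attribute, known only in the source domain.
SOURCE_LABELS = {"critic:c1": "A", "critic:c2": "A",
                 "critic:c3": "B", "critic:c4": "B"}
TARGET_NODES = ["user:u1", "user:u2"]


def build_graph(edges):
    """Return an undirected adjacency map with sorted neighbour lists."""
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    return {node: sorted(neighbours) for node, neighbours in graph.items()}


def random_walks(graph, walks_per_node=20, walk_length=8, seed=0):
    """Generate uniform random walks starting from every node."""
    rng = random.Random(seed)
    walks = []
    for _ in range(walks_per_node):
        for start in sorted(graph):
            walk = [start]
            while len(walk) < walk_length:
                walk.append(rng.choice(graph[walk[-1]]))
            walks.append(walk)
    return walks


def cooccurrence_vectors(walks, window=3):
    """Sparse node vectors: counts of nodes seen within `window` steps."""
    vectors = defaultdict(Counter)
    for walk in walks:
        for i, node in enumerate(walk):
            lo, hi = max(0, i - window), min(len(walk), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[node][walk[j]] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm = (math.sqrt(sum(x * x for x in u.values()))
            * math.sqrt(sum(x * x for x in v.values())))
    return dot / norm if norm else 0.0


def predict(vectors, labelled, targets):
    """Assign each target node the class whose centroid is most similar."""
    centroids = defaultdict(Counter)
    for node, label in labelled.items():
        centroids[label].update(vectors[node])
    return {t: max(sorted(centroids),
                   key=lambda c: cosine(vectors[t], centroids[c]))
            for t in targets}


graph = build_graph(EDGES)
vectors = cooccurrence_vectors(random_walks(graph))
predictions = predict(vectors, SOURCE_LABELS, TARGET_NODES)
print(predictions)
```

The toy graph is deliberately split into two clusters that share no nodes, so each user can only resemble the critics reachable through its own tags. On real data the two domains would overlap, and a learned embedding with a trained classifier would replace the co-occurrence counts and the centroid rule.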