Tamkang University Chueh Sheng Memorial Library (TKU Library)

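System ID: U0002-1407202014205800
Thesis title (Chinese): 基於照片共享社群媒體的文章圖片和內文分類使用者性別:以Instagram為例
Thesis title (English): Gender classification on photo-sharing social media based on images and content analysis: a case study of Instagram
Institution: Tamkang University
Department (Chinese): 管理科學學系企業經營碩士班
Department (English): Master's Program In Business And Management, Department Of Management Sciences
Academic year: 108 (ROC calendar)
Semester: 2
Publication year: 109 (ROC calendar)
Author (Chinese name): 王怡蓁
Author (English name): Yi-Chen Wang
Student ID: 608620059
Degree: Master's
Language: Chinese
Oral defense date: 2020-07-08
Pages: 47
Oral defense committee: Advisor - 吳家齊
Keywords (Chinese): 使用者特徵 (user profile), 機器學習 (machine learning), 文字探勘 (text mining), 圖片分類 (image classification), 社群媒體探勘 (social media mining)
Keywords (English): User Profile, Machine Learning, Text Mining, Picture classification, Social Media Mining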
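Chinese abstract: Social media has grown steadily in recent years. If companies can effectively capture user profiles from it, they can identify their target customers more quickly in marketing and promotion. This study analyzes Instagram, a social media platform built around photo sharing whose user engagement has multiplied year over year. Most previous studies examined only the text of posts, an approach that cannot effectively analyze user profiles on this kind of photo-sharing platform.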
English abstract: With the rise of social media in recent years, companies that can capture user profiles effectively can also reach their target customers more effectively in marketing and promotion. This study analyzes a social media platform built mainly around photo sharing. Most previous studies, however, examined only the text content of posts when analyzing user profiles.
In this study, we collected data from public Instagram accounts to classify users' gender. Using text alone, gender inference reached an accuracy of only 73.68%. In contrast, classifying users' gender by image analysis achieved an accuracy of 92.11%. When user information, image, and text attributes were combined in a neural network classifier, the accuracy was 89.47%.
We found that a classification model built on image analysis alone is more effective than text or other structured information for inferring the gender of users on photo-sharing social media.
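Table of contents: Contents
Chinese abstract
English abstract
Contents
List of figures
List of tables
Chapter 1 Introduction
1.1 Research background
1.2 Research motivation
1.3 Research objectives
Chapter 2 Literature review
2.1 User profile
2.2 Image analysis
2.3 Summary
Chapter 3 Research method
3.1 Research design
3.2 Raw data
3.3 Attribute selection and processing
3.3.1 User information
3.3.2 Images
3.3.3 Text
Chapter 4 Experimental results
4.1 Data description
4.2 Training data and test data
4.3 Attribute classification
4.4 Evaluation of classification results
4.4.1 User information, images, and text
4.4.2 Evaluation of the image results
4.4.3 Text and images before and after ensembling
4.4.4 Inferring user gender
Chapter 5 Conclusions and recommendations
5.1 Research findings
5.2 Future directions and recommendations
References
Online resources
English literature
Figure 1-1 Most-used social platforms worldwide
Figure 1-2 Concern about misuse of personal data among people aged 16-64, by country
Figure 1-3 Relationships among social platforms, companies, and users
Figure 1-4 Resource distribution across social media platforms
Figure 1-5 An Instagram post without any text
Figure 1-6 Data collection from public Instagram accounts
Figure 1-7 Example of an Instagram profile picture that does not show the account owner, so gender cannot be determined
Figure 1-8 Research flowchart
Figure 2-1 Google Vision API Objects example
Figure 2-2 Google Vision API Labels example
Figure 2-3 Google Vision API Properties example
Figure 3-1 Research framework
Figure 3-2 Public and private Instagram account pages
Figure 3-3 Research framework for the image part
Figure 3-4 Research framework for the text part
Figure 3-5 Comparison of the proportion of punctuation marks in post text
Figure 4-1 Split of training and test data
Figure 4-2 Classification results for the three parts
Figure 4-3 Ranking of Labels in the 20 post images most confidently classified as male
Figure 4-4 Ranking of Labels in the 20 post images most confidently classified as female
Figure 4-5 Comparison of Objects in the 20 post images most confidently classified as female and as male
Figure 4-6 Final classification results
Table 2-1 Related studies on user profiles
Table 3-1 Instagram account data structure and units
Table 3-2 List of attributes
Table 3-3 Reasons for selecting the image attributes
Table 4-1 Summary of attributes
Table 4-2 Text results before and after ensembling
Table 4-3 Image results before and after ensembling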
References: Online resources
Data Reportal. (Jan, 2020). DIGITAL 2020: GLOBAL DIGITAL OVERVIEW. Retrieved from: https://datareportal.com/reports/digital-2020-global-digital-overview
Data Reportal. (Apr, 2020). DIGITAL 2020: APRIL GLOBAL STATSHOT. Retrieved from: https://datareportal.com/reports/digital-2020-april-global-statshot?rq=Digital%202020%20April%20Global%20Statshot%20Report
Socialinsider. (Mar, 2020). [Survey]Social Media Bottlenecks: Getting More Traffic and Engagement Is An Issue For 55% Of Professionals. Retrieved from: https://www.socialinsider.io/blog/social-media-marketing-bottlenecks/

Argamon, S., Koppel, M., Pennebaker, J. W., & Schler, J. (2009). Automatically profiling the author of an anonymous text. Communications of the ACM, 52(2), 119-123.
Bamman, D., Eisenstein, J., & Schnoebelen, T. (2014). Gender identity and lexical variation in social media. Journal of Sociolinguistics, 18(2), 135-160.
Chen, H., Sun, M., Tu, C., Lin, Y., & Liu, Z. (2016, November). Neural sentiment classification with user and product attention. In Proceedings of the 2016 conference on empirical methods in natural language processing (pp. 1650-1659).
Cheng, A. J., Chen, Y. Y., Huang, Y. T., Hsu, W. H., & Liao, H. Y. M. (2011, November). Personalized travel recommendation by mining people attributes from community-contributed photos. In Proceedings of the 19th ACM international conference on Multimedia (pp. 83-92).
Dhir, A., Pallesen, S., Torsheim, T., & Andreassen, C. S. (2016). Do age and gender differences exist in selfie-related behaviours?. Computers in Human Behavior, 63, 549-555.
Estruch, C. P., Palacios, R. P., & Rosso, P. (2017, September). Learning Multimodal Gender Profile using Neural Networks. In RANLP (pp. 577-582).
Goenawana, R. N., Chanrico, W., Suhartono, D., & Purnomo, F. (2019). Gender Demography Classification on Instagram based on User’s Comments Section. Procedia Computer Science, 157, 64-71.
He, T., Zhang, Z., Zhang, H., Zhang, Z., Xie, J., & Li, M. (2019). Bag of tricks for image classification with convolutional neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 558-567).
Li, S., Song, W., Fang, L., Chen, Y., Ghamisi, P., & Benediktsson, J. A. (2019). Deep learning for hyperspectral image classification: An overview. IEEE Transactions on Geoscience and Remote Sensing, 57(9), 6690-6709.
Lin, X., Featherman, M., Brooks, S. L., & Hajli, N. (2019). Exploring gender differences in online consumer purchase decision making: An online product presentation perspective. Information Systems Frontiers, 21(5), 1187-1201.
Liu, P., Qiu, X., & Huang, X. (2016). Recurrent neural network for text classification with multi-task learning. arXiv preprint arXiv:1605.05101.
Miller, Z., Dickinson, B., & Hu, W. (2012). Gender prediction on twitter using stream algorithms with n-gram character features.
Mitchell, V. W., & Walsh, G. (2004). Gender differences in German consumer decision‐making styles. Journal of Consumer Behaviour: An International Research Review, 3(4), 331-346.
Otterbacher, J. (2010, October). Inferring gender of movie reviewers: exploiting writing style, content and metadata. In Proceedings of the 19th ACM international conference on Information and knowledge management (pp. 369-378).
Peersman, C., Daelemans, W., & Van Vaerenbergh, L. (2011, October). Predicting age and gender in online social networks. In Proceedings of the 3rd international workshop on Search and mining user-generated contents (pp. 37-44).
Pennacchiotti, M., & Popescu, A. M. (2011, July). A machine learning approach to twitter user classification. In Fifth international AAAI conference on weblogs and social media.
Rao, D., Yarowsky, D., Shreevats, A., & Gupta, M. (2010, October). Classifying latent user attributes in twitter. In Proceedings of the 2nd international workshop on Search and mining user-generated contents (pp. 37-44).
Schler, J., Koppel, M., Argamon, S., & Pennebaker, J. W. (2006, March). Effects of age and gender on blogging. In AAAI spring symposium: Computational approaches to analyzing weblogs (Vol. 6, pp. 199-205).
Schwartz, H. A., Eichstaedt, J. C., Kern, M. L., Dziurzynski, L., Ramones, S. M., Agrawal, M., Shah, A., Kosinski, M., Stillwell, D., Seligman, M. E. P., & Ungar, L. H. (2013). Personality, gender, and age in the language of social media: The open-vocabulary approach. PloS one, 8(9), e73791.
Shih-Yu Shu, Chih-Ping Wei (2017). A Semi-supervised Approach for Profiling Online Reviewers. College of Management National Taiwan University, Taiwan.
Socher, R., Perelygin, A., Wu, J., Chuang, J., Manning, C. D., Ng, A. Y., & Potts, C. (2013, October). Recursive deep models for semantic compositionality over a sentiment treebank. In Proceedings of the 2013 conference on empirical methods in natural language processing (pp. 1631-1642).
Zhang, C., & Zhang, P. (2010). Predicting gender from blog posts. University of Massachusetts Amherst, USA.
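  • The author agrees to license the print copy, free of charge, for reproduction by library patrons for academic purposes; available from 2025-07-31.
  • The author agrees to authorize the electronic full-text browsing/printing service; available from 2025-07-31.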
