§ Browse Thesis Bibliographic Record
  
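System ID U0002-2401202118453200
DOI 10.6846/TKU.2021.00621
Title (Chinese) 考量不平衡資料集之線上拍賣詐騙偵測方法
Title (English) Online Auction Fraud Detection for Imbalanced Datasets
Title (third language)
University Tamkang University (淡江大學)
Department (Chinese) 資訊管理學系碩士班
Department (English) Department of Information Management
Foreign degree: university
Foreign degree: college
Foreign degree: graduate institute
Academic year 109 (ROC calendar)
Semester 1
Year of publication 110 (ROC calendar)
Author (Chinese) 鄭悦彤
Author (English) Yue-Tong Zheng
Student ID 607630307
Degree Master's
Language Traditional Chinese
Second language
Oral defense date 2021-01-13
Pages 43
Committee Advisor - 張昭憲
Member - 壽大衛
Member - 魏世杰
Member - 張昭憲
Keywords (Chinese) 異常偵測
不平衡資料集
機器學習
線上拍賣詐騙
電子商務
Keywords (English) Anomaly Detection
Imbalanced Datasets
Machine Learning
Online Auction Fraud
E-commerce
Keywords (third language)
Subject classification
Chinese Abstract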
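The boom in e-commerce is plain to see, with sales projected to exceed 4 trillion US dollars by 2020. Such a huge transaction volume has also given rise to many Internet crimes. Shielded by the network, fraudsters use virtual identities and complex, layered schemes to victimize people without their noticing. In the United States, for example, 467,361 Internet crime complaints were filed in 2019, with losses exceeding 3.5 billion US dollars, which shows how serious the problem is. Although the authorities take online fraud very seriously and regularly teach the public how to protect themselves, fraud schemes keep changing, and more proactive measures are clearly needed to prevent harm before it happens. For this reason, researchers have proposed various fraud detection methods to help identify fraud cases correctly and give the authorities early warning. However, fraud cases are rare relative to normal transactions, which creates the imbalanced dataset problem and seriously degrades the performance and practicality of these methods. This study therefore develops effective fraud detection methods for imbalanced datasets. First, to overcome the performance limits of a single model, we adopt a multi-model detection process and evaluate it on imbalanced datasets with different class ratios to understand the actual differences. Second, this study clusters the fraudsters, both to analyze their types and to understand the performance bottlenecks of the detection methods. In addition, this study attempts to develop an effective detection method with LSTM deep learning: transaction histories are partitioned to produce a time-series dataset, yielding a detection model that takes temporal characteristics into account. To verify the proposed methods, experiments are run on a real auction dataset. The results show that, for imbalanced datasets, the multi-model detection architecture based on successive filtering obtains the best results. The experiments also show that a single model cannot provide stable and effective detection accuracy on datasets whose class ratio is unknown. These results demonstrate the importance of the multi-model detection architecture for anomaly detection. Moreover, detection performance does differ markedly across fraudster types, and the analysis can serve as a basis for developing new methods. The detection model built with LSTM does not match the multi-model method, but it could be incorporated into the multi-model architecture in the future to further improve detection accuracy through data fusion.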
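The abstract mentions partitioning transaction histories to produce a time-series dataset for the LSTM model, but this record does not say how the partitioning is done. The following is only a minimal sketch of one common approach, fixed-length sliding windows, under stated assumptions: the function names, the `window` and `stride` parameters, and the idea that every window inherits its account's label are all hypothetical and are not taken from the thesis.

```python
from typing import List, Sequence, Tuple

# Hypothetical representation: one feature vector per transaction, oldest first.
Transaction = Sequence[float]


def slice_history(
    history: List[Transaction], window: int, stride: int = 1
) -> List[List[Transaction]]:
    """Cut a time-ordered transaction history into fixed-length windows."""
    if window <= 0 or stride <= 0:
        raise ValueError("window and stride must be positive")
    # Histories shorter than the window yield no sequences.
    return [
        list(history[start:start + window])
        for start in range(0, len(history) - window + 1, stride)
    ]


def build_sequence_dataset(
    accounts: List[Tuple[List[Transaction], int]], window: int, stride: int = 1
) -> Tuple[List[List[Transaction]], List[int]]:
    """Turn (history, label) pairs into windowed sequences.

    Assumption: each window inherits the label of the account it came from
    (1 = fraudster, 0 = normal user).
    """
    sequences: List[List[Transaction]] = []
    labels: List[int] = []
    for history, label in accounts:
        for seq in slice_history(history, window, stride):
            sequences.append(seq)
            labels.append(label)
    return sequences, labels
```

Each resulting sequence has shape (window, features), which is the usual input layout for a recurrent model; the window length would need to be tuned against the real data.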
English Abstract
The rapid growth of e-commerce is evident to all, with sales expected to exceed 4 trillion US dollars by 2020. With such a huge transaction volume, many Internet crimes have also emerged. Under the shield of the Internet, fraudsters use virtual identities and complicated tactics to deceive people without their noticing. In the United States, for example, there were 467,361 cybercrime complaints in 2019, and losses exceeded 3.5 billion US dollars, which shows how severe Internet fraud has become. Although the authorities concerned attach great importance to online fraud and often teach people how to protect themselves, fraud methods change constantly, and more proactive measures are clearly needed to keep innocent people from being defrauded. In view of this, researchers have proposed various fraud detection methods to help identify fraud cases and provide early warning to the authorities. However, fraud cases are rare compared with normal transactions, which leads to the imbalanced dataset problem and seriously affects the effectiveness of these methods. To this end, this research aims to develop effective fraud detection methods for imbalanced datasets. First, to overcome the performance limitation of a single model, we adopt a multi-model detection process and evaluate it on imbalanced datasets with different class ratios to understand the actual differences. In addition, this research attempts to develop an effective detection method using LSTM deep learning. Transaction histories are partitioned to generate time-dependent datasets, from which detection models that take time factors into account are built. To verify the effectiveness of the proposed methods, this study runs experiments on actual auction datasets.
The results show that, for imbalanced datasets, a multi-model detection architecture based on successive filtering obtains the best results. The experiments also show that a single model cannot provide stable and effective detection accuracy for a dataset with an unknown class ratio. These results show the importance of a multi-model detection architecture for anomaly detection. The detection model built with LSTM is clearly not as good as the multi-model method, but in the future it could be incorporated into a multi-model architecture to further improve detection accuracy through data fusion.
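The abstract names a multi-model architecture based on successive filtering but does not define it in this record. The sketch below shows one possible reading, offered as an assumption rather than the thesis's actual procedure: samples pass through a chain of models, any stage may filter a sample out as normal, and only samples that every stage keeps are reported as fraud. The `Model` type, the stand-in threshold rules in the usage note, and the helper names are hypothetical.

```python
from typing import Callable, Dict, List, Sequence

Features = Sequence[float]
# Hypothetical stage interface: returns True when the stage suspects fraud.
Model = Callable[[Features], bool]


def successive_filter(samples: List[Features], stages: List[Model]) -> List[bool]:
    """Flag a sample as fraud only if every stage, in order, keeps it as a suspect."""
    flagged = [True] * len(samples)
    remaining = list(range(len(samples)))
    for model in stages:
        survivors = []
        for idx in remaining:
            if model(samples[idx]):
                survivors.append(idx)
            else:
                flagged[idx] = False  # filtered out as normal at this stage
        remaining = survivors  # later stages only see the remaining suspects
    return flagged


def confusion_counts(predicted: List[bool], actual: List[bool]) -> Dict[str, int]:
    """Count TP/FP/FN/TN, treating fraud (True) as the positive class."""
    counts = {"TP": 0, "FP": 0, "FN": 0, "TN": 0}
    for p, a in zip(predicted, actual):
        key = ("T" if p == a else "F") + ("P" if p else "N")
        counts[key] += 1
    return counts
```

As a usage sketch, two toy stages such as `lambda x: x[0] > 0.5` and `lambda x: x[1] > 0.5` can stand in for trained classifiers. In practice each stage might be a model trained on a different fraud-to-normal ratio, and the confusion counts would be computed separately for each test-set ratio (for example F:NF = 1:8, 1:4, 1:1) to see how stable the cascade is.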
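Abstract (third language)
Table of Contents
Contents
Chapter 1 Introduction
Chapter 2 Related Techniques and Background
2.1	Online Shopping Fraud
2.2	Modeling Methods
2.2.1	Ensemble Learning: Random Forest and AdaBoost
2.2.2	Deep Learning
2.3	Detection with Multiple Models Combined through Model Fusion
2.4	Modeling and Detection on Imbalanced Datasets
Chapter 3 Online Auction Fraud Detection for Imbalanced Datasets
3.1	Detection of Imbalanced Test Set
3.2	Fraud Detection through Model Fusion
3.3	Detection Attribute Set
3.4	Dataset Clustering
3.5	Fraud Detection with Deep Learning
Chapter 4 Experimental Results
4.1	Evaluation Metrics
4.2	Detection Results of the Multi-Model Detection Method
4.3	Experimental Results of Fraud Detection with Deep Learning
4.4	Clustering Analysis for the M5 Model of Successive Filtering
4.5	Experimental Results on the Data Ratios of Individual Models in the Multi-Model Detection Method
4.6	Generating Detection Rules with Classification Trees
Chapter 5 Conclusions and Future Work
References
Appendix A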

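List of Tables
Table 2-1: Common online shopping fraud schemes
Table 3-1: Detection results on the test data for detection models built with different data ratios
Table 3-2: Building detection models with imbalanced datasets
Table 3-3: Fraud detection attribute set used in this study (劉祐宏, 2012)
Table 3-4: Clustering fraudsters (500 records) with X-means
Table 3-5: Clustering normal users (1000 records) with X-means
Table 4-1: Confusion matrix
Table 4-2: Confusion matrix example
Table 4-3: Performance comparison of the single model, successive filtering, and balanced detection methods, test set (F:NF=1:8)
Table 4-4: Performance comparison of the single model, successive filtering, and balanced detection methods, test set (F:NF=1:4)
Table 4-5: Performance comparison of the single model, successive filtering, and balanced detection methods, test set (F:NF=1:1)
Table 4-6: Performance comparison of LSTM, successive filtering, and balanced detection methods, test set (F:NF=1:8)
Table 4-7: Performance comparison of LSTM, successive filtering, and balanced detection methods, test set (F:NF=1:4)
Table 4-8: Fraudster detection accuracy by cluster (for the M5 model of successive filtering)
Table 4-9: Normal-user detection accuracy by cluster (for the M5 model of successive filtering)
Table 4-10: Detection performance comparison of the models in successive filtering under different data ratios
Table 4-11: Detection performance comparison of the models in successive filtering under different data ratios
Table 4-12: J48 classification rules generated with different data ratios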

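List of Figures
Figure 2-1: Hidden-layer units used in deep learning with RNN and LSTM (image source: Greff et al., 2017)
Figure 3-1: Detection results on the imbalanced test set (F:NF=1:8) for models built with different data ratios
Figure 3-2: Detection results on the imbalanced test set (F:NF=1:4) for models built with different data ratios
Figure 3-3: Detection results on the test set (F:NF=1:1) for models built with different data ratios
Figure 3-4: Combining multiple fraud detection models through model fusion (陳世軒, 2019)
Figure 3-5: Transaction history data of auction site members
Figure 3-6: Fraud detection with a deep learning model
Figure 3-7: Partitioning transaction history data to generate the time-series dataset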
References
Benchaji, I., Douzi, S., & El Ouahidi, B. (2018). Using genetic algorithm to improve classification of imbalanced datasets for credit card fraud detection. International Conference on Advanced Information Technology, Services and Systems.	
Chang, J.-S., & Chang, W.-H. (2014). Analysis of fraudulent behavior strategies in online auctions for detecting latent fraudsters. Electronic Commerce Research and Applications, 13(2), 79-97.
Chang, W.-H., & Chang, J.-S. (2012). An effective early fraud detection method for online auctions. Electronic Commerce Research and Applications, 11(4), 346-360.
Chau, D. H., & Faloutsos, C. (2005). Fraud detection in electronic auction. European Web Mining Forum at ECML/PKDD.	
Chau, D. H., Pandit, S., & Faloutsos, C. (2006). Detecting fraudulent personalities in networks of online auctioneers. European Conference on Principles of Data Mining and Knowledge Discovery.	
Chen, C., Zhu, Q., Lin, L., & Shyu, M.-L. (2013). Web media semantic concept retrieval via tag removal and model fusion. ACM Transactions on Intelligent Systems and Technology, 4(4), 1-22.
Chen, J., Tao, Y., Wang, H., & Chen, T. (2015). Big data based fraud risk management at Alibaba. The Journal of Finance and Data Science, 1(1), 1-10.
eMarketer. Retail & Ecommerce report. Retrieved on Mar. 1, 2020, from https://www.emarketer.com/topics/topic/retail-ecommerce
Gavish, B., & Tucci, C. L. (2008). Reducing internet auction fraud. Communications of the ACM, 51(5), 89-97.
Greff, K., Srivastava, R. K., Koutník, J., Steunebrink, B. R., & Schmidhuber, J. (2017). LSTM: A search space odyssey. IEEE Transactions on Neural Networks and Learning Systems, 28(10), 2222-2232.
Huang, D., Mu, D., Yang, L., & Cai, X. (2018). CoDetect: Financial fraud detection with anomaly feature detection. IEEE Access, 6, 19161-19174.
Huang, S., Ma, J., Cheng, P., & Wang, S. (2015). A hybrid multigroup coclustering recommendation framework based on information fusion. ACM Transactions on Intelligent Systems and Technology, 6(2), 1-22.
Kim, K., Choi, Y., & Park, J. (2013). Pricing fraud detection in online shopping malls using a finite mixture model. Electronic Commerce Research and Applications, 12(3), 195-207.
Kingston, J. K. (2017). Representing, reasoning and predicting fraud using fraud plans. 2017 11th International Conference on Research Challenges in Information Science (RCIS).	
Kumar, M. S., Soundarya, V., Kavitha, S., Keerthika, E., & Aswini, E. (2019). Credit card fraud detection using random forest algorithm. 2019 3rd International Conference on Computing and Communications Technologies (ICCCT). 	
Kunlin, Y. (2018). A Memory-Enhanced Framework for Financial Fraud Detection. 2018 17th IEEE International Conference on Machine Learning and Applications (ICMLA).
Makki, S., Assaghir, Z., Taher, Y., Haque, R., Hacid, M.-S., & Zeineddine, H. (2019). An experimental study with imbalanced classification approaches for credit card fraud detection. IEEE Access, 7, 93010-93022.
Mishra, A., & Ghorpade, C. (2018). Credit card fraud detection on the skewed data using various classification and ensemble techniques. 2018 IEEE International Students' Conference on Electrical, Electronics and Computer Science (SCEECS).	
Mitchell, T. (1997). Machine Learning (pp. 52-81). McGraw-Hill.
National White Collar Crime Center (NW3C). 2019 Internet Crime Report. Retrieved on Mar. 1, 2020, from Internet Crime Complaint Center: https://pdf.ic3.gov/2019_IC3Report.pdf
Pandit, S., Chau, D. H., Wang, S., & Faloutsos, C. (2007). Netprobe: a fast and scalable system for fraud detection in online auction networks. Proceedings of the 16th international conference on World Wide Web. 	
Tsang, S., Koh, Y. S., Dobbie, G., & Alam, S. (2014). SPAN: Finding collaborative frauds in online auctions. Knowledge-Based Systems, 71, 389-408.
Xie, S., & Yu, P. S. (2018). Next Generation Trustworthy Fraud Detection. 2018 IEEE 4th International Conference on Collaboration and Internet Computing (CIC).
Xuan, S., Liu, G., Li, Z., Zheng, L., Wang, S., & Jiang, C. (2018). Random forest for credit card fraud detection. 2018 IEEE 15th International Conference on Networking, Sensing and Control (ICNSC). 	
Zamini, M., & Montazer, G. (2018). Credit card fraud detection using autoencoder based clustering. 2018 9th International Symposium on Telecommunications (IST). 	
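鄭孝儒 (2011). Effective detection of latent-period fraudsters in online auctions [線上拍賣潛伏期詐騙者之有效偵測]. Master's thesis, Department of Information Management, Tamkang University.
劉祐宏 (2012). Attribute selection and process design for online auction fraud detection [線上拍賣詐騙偵測之屬性挑選與流程設計]. Master's thesis, Department of Information Management, Tamkang University.
陳世軒 (2019). Online auction fraud detection based on model fusion [以模型融合為基礎之線上拍賣詐騙偵測]. Master's thesis, Department of Information Management, Tamkang University.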
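Full-Text Access Rights
On campus
Print copy available on campus immediately
Consent given for on-campus release of the electronic full text
Electronic copy available on campus immediately
Off campus
Authorization granted
Electronic copy available off campus immediately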
