Tamkang University Chueh Sheng Memorial Library (TKU Library)


(Electronic full text may be downloaded only from Tamkang University IP addresses)
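System ID U0002-1907201719162700
Thesis title (Chinese) 基於RFpS的集成學習於惡意程式分類之研究
Thesis title (English) Base on RFpS of Ensemble learning in Malware Family Classification
Institution Tamkang University
Department (Chinese) 資訊管理學系碩士在職專班
Department (English) On-the-Job Graduate Program in Advanced Information Management
Academic year 105 (ROC calendar)
Semester 2
Year of publication 106 (ROC calendar; 2017)
Author (Chinese name) 趙偉傑
Author (English name) Wei-Chieh Chao
Student ID 704630200
Degree Master's
Language Chinese
Second language English
Oral defense date 2017-07-03
Number of pages 33
Committee Advisor: 李鴻璋
Member: 張昭憲
Member: 壽大衛
Chinese keywords 惡意程式分類  機器學習  集成學習 
English keywords malware classification  machine learning  ensemble learning 
Subject classification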
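Chinese abstract Machine learning and artificial intelligence have produced notable results in malware analysis in recent years. However, when general machine-learning classification methods face a large number of features, training takes too long and consumes substantial resources.
This thesis proposes RFpS (Random Forest predicated Svm), a fast two-stage supervised ensemble-learning classification technique. It overcomes the model overfitting and prediction noise previously caused by excessive redundant feature information. RFpS combines Random Forest feature extraction with the learning and prediction ability of SVM as a strong classifier to classify malware quickly and accurately. The validation results show that, compared with using SVM alone, RFpS increases average model training speed by about 4.5 times and prediction speed by about 2.5 times, and raises average accuracy by about 20%, to 98.4%.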
English abstract Some fundamental issues of data mining applications become much more critical and severe in malware analysis, and unfortunately they are still not well addressed.
This thesis proposes a method that uses supervised feature projection for redundant-feature reduction and noise filtering. Named RFpS (Random Forest Predicated Svm), it combines Random Forest with SVM to reduce features and classify quickly.
The results show that, compared with SVM alone, training is about 4.5 times faster, prediction is about 2.5 times faster, and accuracy improves by about 20%, to 98.4%.
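The two-stage idea in the abstract (Random Forest for feature reduction, then SVM for classification) can be illustrated with a minimal sketch. This is not the thesis's implementation: the use of scikit-learn, the synthetic data from `make_classification`, the mean-importance threshold, the RBF kernel, and every size and parameter below are assumptions chosen only for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for a malware feature matrix (hypothetical sizes).
X, y = make_classification(
    n_samples=600, n_features=200, n_informative=15,
    n_redundant=20, n_classes=3, random_state=0,
)

# Stage 1 (assumed form): a Random Forest ranks the features, and only
# those whose importance is at least the mean importance are kept.
selector = SelectFromModel(
    RandomForestClassifier(n_estimators=100, random_state=0),
    threshold="mean",
)

# Stage 2: an SVM is trained on the reduced feature set only.
rf_then_svm = make_pipeline(selector, StandardScaler(), SVC(kernel="rbf"))
svm_only = make_pipeline(StandardScaler(), SVC(kernel="rbf"))

# 5-fold cross-validation for both variants.
scores_rf_svm = cross_val_score(rf_then_svm, X, y, cv=5)
scores_svm = cross_val_score(svm_only, X, y, cv=5)

# Fit the selector on all data just to report how many features survive.
selector.fit(X, y)
n_kept = int(selector.get_support().sum())

print(f"features kept: {n_kept} of {X.shape[1]}")
print(f"RF -> SVM mean accuracy: {scores_rf_svm.mean():.3f}")
print(f"SVM only mean accuracy:  {scores_svm.mean():.3f}")
```

Because the SVM sees fewer columns after selection, both fitting and prediction do less work per sample, which is the kind of speed-up the abstract reports. The figures printed by this sketch come from synthetic data and say nothing about the thesis's results.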
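Table of contents Chapter 1 Introduction
1.1 Research background and motivation
1.2 Contributions
1.3 Thesis organization
Chapter 2 Related work
2.1 Recent malware classification methods
2.2 Static and dynamic analysis
2.3 Machine learning techniques for classification
2.4 Ensemble learning classification concepts
2.5 Malware classification models
2.6 Machine learning techniques for classification
Chapter 3 RFpS system and architecture
3.1 Experimental procedure
Chapter 4 Experimental results
4.1 Validation of classification results
4.2 Comparison with other classification methods
Chapter 5 Conclusion
Chapter 6 References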
===================================================
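List of figures
Figure 1. Static analysis techniques for malware on Windows systems
Figure 2. Random evolution of the random forest
Figure 3. Main architecture of the proposed RFpS system
Figure 4. Validation steps
Figure 5. 21,651 malware samples in 9 classes
Figure 6. .bytes file
Figure 7. .asm file
Figure 8. Examples of important APIs
Figure 9. Section information
Figure 10. Disassembled code DB information
Figure 11. Disassembled code DD information
Figure 12. Data Define information
Figure 13. Hexadecimal machine code
Figure 14. PE file
Figure 15. Binary code disassembled into hexadecimal for use as feature values
Figure 16. k-fold (k=5) cross-validation framework
Figure 17. Cross-validation results (average 98.44%)
Figure 18. Confusion matrix analysis results
Figure 19. Comparison of classification accuracy with different classifiers following RF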
=====================================================
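List of tables
Table 1. Malware behaviors and types (compiled by this study)
Table 2. Summary of algorithm categories (compiled by this study)
Table 3. The 13 subcategories of the .bytes type and their feature counts
Table 4. Comparison of accuracy for two different data types
Table 5. Feature selection results
Table 6. Performance comparison with and without RFpS feature selection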


References [1] https://www.kaggle.com/c/malware-classification
[2] Sami, A., Yadegari, B., Rahimi, H., Peiravian, N., Hashemi, S., Hamze, A.: Malware detection based on mining api calls. In: Proceedings of the 2010 ACM symposium on applied computing, ACM (2010) 1020–1025
[3] Ye, Y., Wang, D., Li, T., Ye, D., Jiang, Q.: An intelligent pe-malware detection system based on association mining. Journal in computer virology 4 (2008) 323–334
[4] Narouei, M., Ahmadi, M., Giacinto, G., Takabi, H., Sami, A.: Dllminer: structural mining for malware detection. Security and Communication Networks 8 (2015)3311–3322
[5] Willems, C., Holz, T., Freiling, F.: Toward automated dynamic malware analysis using cwsandbox. IEEE Security & Privacy 5 (2007)
[6] Rieck, K., Holz, T., Willems, C., Dussel, P., Laskov, P.: Learning and classification of malware behavior. In: International Conference on Detection of Intrusions and Malware, and Vulnerability Assessment, Springer (2008) 108–125
[7] Ahmadi, M., Sami, A., Rahimi, H., Yadegari, B.: Malware detection by behavioural sequential patterns. Computer Fraud & Security 2013 (2013) 11–19
[8] Wuchner, T., Ochoa, M., Pretschner, A.: Malware detection with quantitative data flow graphs. In: Proceedings of the 9th ACM symposium on Information, computer and communications security, ACM (2014) 271–282
[9] Kirat, D., Vigna, G.: Malgene: Automatic extraction of malware analysis evasion signature. In: Proceedings of the 22nd ACM SIGSAC Conference on Computer and Communications Security, ACM (2015) 769–780
[10] Drew, J., Moore, T., Hahsler, M.: Polymorphic malware detection using sequence classification methods. In: Security and Privacy Workshops (SPW), 2016 IEEE, IEEE (2016) 81–87
[11] Santos, I., Brezo, F., Ugarte-Pedrero, X., Bringas, P.G.: Opcode sequences as representation of executables for data-mining-based unknown malware detection. Information Sciences 231 (2013) 64–82
[12] Hu, X., Chiueh, T.c., Shin, K.G.: Large-scale malware indexing using function-call graphs. In: Proceedings of the 16th ACM conference on Computer and communications security, ACM (2009) 611–620
[13] Griffin, K., Schneider, S., Hu, X., Chiueh, T.C.: Automatic generation of string signatures for malware detection. In: International Workshop on Recent Advances in Intrusion Detection, Springer (2009) 101–120
[14] Kirat, D., Vigna, G., Kruegel, C.: Barecloud: Bare-metal analysis-based evasive malware detection. In: USENIX Security. Volume 2014. (2014) 287–301
[15] Kolter, J. Z., & Maloof, M. A. (2006). Learning to detect and classify malicious executables in the wild. Journal of Machine Learning Research, 7, 2721-2744.
[16] Santos, I., Brezo, F., Nieves, J., Penya, Y. K., Sanz, B., Laorden, C., & Bringas, P. G. (2010). Idea: Opcode-sequence-based malware detection. Proceedings of the 2nd International Conference on Engineering Secure Software and Systems, Pisa, Italy.
[17] Conti, G., Bratus, S., & Shubina, A. (2010). A visual study of primitive binary fragment types. Black Hat USA. Retrieved July 4, 2010, from http://www.rumit.org/gregconti/publications/taxonomy-bh.pdf
[18] Schultz, M., Eskin, E., Zadok, E. and Stolfo, S. (2001) Data Mining Methods for Detection of New Malicious Executables. Proceedings of 2001 IEEE Symposium on Security and Privacy, Oakland, 14-16 May 2001, 38-49

[19] Islam, R., Tian, R., Batten, L. and Versteeg, S. (2013) Classification of Malware Based on Integrated Static and Dynamic Features. Journal of Network and Computer Applications, 36, 646-656. http://dx.doi.org/10.1016/j.jnca.2012.10.004

[20] Breiman L. Random Forests [J]. Machine Learning, 2001, 45(1):5-32.
[21] R. Lyda and J. Hamrock. Using entropy analysis to find encrypted and packed malware. IEEE Security and Privacy, 5(2):40–45, Mar. 2007.
[22] M. Christodorescu, S. Jha, S. Seshia, D. Song, and R. Bryant. Semantics-aware malware detection. In Security and Privacy, 2005 IEEE Symposium on, pages 32–46, May 2005.
D. Bilar. Statistical structures: Fingerprinting malware for classification and analysis. In Blackhat, 2006.

[23] Top maliciously used apis. https://www.bnxnet.com/top-maliciously-used-apis/, 2015

[24] M. Fernández-Delgado, E. Cernadas, S. Barro, and D. Amorim. Do we need hundreds of classifiers to solve real world classification problems? J. Mach. Learn. Res., 15(1):3133–3181, Jan. 2014.


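Thesis usage rights
  • The author grants a royalty-free license for the print copy to be reproduced by library patrons for academic purposes; available from 2017-08-24.
  • The author authorizes browsing and printing of the electronic full text; available from 2017-08-24.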

