System ID | U0002-2307201918004900 |
---|---|
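DOI | 10.6846/TKU.2019.00736 |
Thesis title (Chinese) | 利用深度混合模型辨識Android惡意應用程式 |
Thesis title (English) | A Deep Learning Hybrid Model Detecting Android Malwares |
Thesis title (third language) | |
University | Tamkang University (淡江大學) |
Department (Chinese) | 資訊工程學系碩士班 |
Department (English) | Department of Computer Science and Information Engineering |
Foreign degree university | |
Foreign degree college | |
Foreign degree institute | |
Academic year | 107 (ROC calendar, 2018-2019) |
Semester | 2 |
Publication year | 108 (ROC calendar, 2019) |
Student (Chinese) | 鍾豪 |
Student (English) | Hao Chung |
Student ID | 606410420 |
Degree | Master's |
Language | Traditional Chinese |
Second language | |
Oral defense date | 2019-06-11 |
Number of pages | 25 |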
Oral defense committee | Advisor - 黃心嘉; Committee member - 顏嵩銘; Committee member - 黃仁俊; Committee member - 黃心嘉 |
Keywords (Chinese) | Android 惡意應用程式; Android惡意應用程式辨識; 深度學習 |
Keywords (English) | Android malware apps; malware app detection; deep learning |
Keywords (third language) | |
Subject classification | |
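Abstract (translated from Chinese) | As the number of Android malware apps keeps growing, it is important to develop deep learning methods that detect malware apps through static analysis. Compared with dynamic analysis, static analysis requires fewer computing resources and less time. Because malware apps evolve and new Android versions keep being released, new features must be added to maintain a good detection rate. For many existing deep learning detection methods based on static analysis, however, adding features means retraining the whole model, which is very time-consuming. To solve this problem, this study proposes a flexible, adaptable, and efficient deep neural network model. The model consists of two main neural networks: an initial network and an integration network. The initial network extracts each type of feature separately, so it can be adjusted in part, while the integration network detects malware apps effectively. Flexibility means that the network can be tuned efficiently to add new features; adaptability means that the detection rate is maintained by updating the weights periodically; efficiency means that only the feature-extraction sub-models need to be adjusted. Our hybrid model, which uses two feature sets (API method calls and permissions), has both research value and practical use: experiments show that its accuracy reaches 98.15%. |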
Abstract (English) | Because the number of Android malware apps keeps growing, a deep learning approach that detects malware apps through static analysis is necessary. Although some malware detection methods use dynamic analysis, detection based on static analysis needs fewer computational resources and less time. As malware apps evolve and new versions of the Android operating system are released, new features must be added to keep accuracy high. To add those features, however, most proposed deep learning detection methods must be retrained from scratch. To overcome this problem, a flexible, adaptable, and efficient deep neural network hybrid model is proposed. The hybrid model contains two neural networks: an initial network and a final network. The initial network flexibly extracts multiple feature sets, while the final network detects malware apps efficiently and accurately. Flexibility means that the initial network can be adjusted to add new features. Adaptability means that the network's weights can easily be updated periodically to maintain the detection rate. Efficiency means that the detection rate can be maintained by retraining only part of the neural networks rather than all of them. Our hybrid model, which uses API method call and permission feature sets, is valuable for research and practical in use: its accuracy is 98.15%. |
Abstract (third language) | |
Table of contents | Chapter 1 Introduction; Chapter 2 Review; Chapter 3 A Hybrid Model Scheme (3-1 Feature Extraction Stage; 3-2 Model Training Stage; 3-3 Detection Stage); Chapter 4 Experiment Results (4-1 Datasets; 4-2 Experimental Environment; 4-3 Hybrid Model Malware Detection Performance; 4-4 Effectiveness of Hybrid Model; 4-5 Discussions); Chapter 5 Conclusions; References. List of Figures: Fig. 1. Allix et al.’s Experiment Result; Fig. 2. Deep Learning Methodology; Fig. 3. Multimodal Architecture; Fig. 4. Hybrid Model Architecture; Fig. 5. The Architecture of CNN Sub-model. List of Tables: Table 1: Hyper-parameter of Hybrid Model Sub-model; Table 2: Hybrid Model Evaluation Metrics; Table 3: Performance Metrics for the Model Using Single Feature Set; Table 4: Performance Metrics for Our Hybrid Model Using Different Feature Sets |
Full-text access permissions |
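The two-network design described in the abstract can be illustrated with a minimal sketch. It is not the thesis's implementation: plain dense layers stand in for the actual sub-model architecture, no training is shown, and the class name `HybridModel`, the embedding size `EMBED`, the feature-vector lengths, and the third feature set `intents` are all hypothetical. The sketch only shows the structural idea that each feature set has its own sub-model in the initial network, and that attaching a new feature set leaves the existing sub-models untouched.

```python
import math
import random


def make_dense(n_in, n_out, rng):
    """Return a weight matrix: n_out rows of n_in small random weights."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]


def dense(weights, x, activation):
    """Apply one fully connected layer (no bias) followed by an activation."""
    return [activation(sum(w * v for w, v in zip(row, x))) for row in weights]


def relu(z):
    return max(0.0, z)


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class HybridModel:
    """Initial network: one sub-model per feature set. Final network: one output unit."""

    EMBED = 4  # hypothetical output size of every sub-model

    def __init__(self, seed=0):
        self.rng = random.Random(seed)
        self.sub_models = {}  # feature-set name -> weight matrix
        self.final = []       # output unit weights, one per embedding value

    def add_feature_set(self, name, n_features):
        """Attach a new sub-model; existing sub-model weights are not modified."""
        self.sub_models[name] = make_dense(n_features, self.EMBED, self.rng)
        # Widen the final layer so it can read the new embedding.
        self.final.extend(self.rng.uniform(-0.1, 0.1) for _ in range(self.EMBED))

    def predict(self, features):
        """Return a score in (0, 1) from a dict of binary feature vectors."""
        merged = []
        for name, weights in self.sub_models.items():
            merged.extend(dense(weights, features[name], relu))
        return dense([self.final], merged, sigmoid)[0]


model = HybridModel()
model.add_feature_set("permissions", 6)
model.add_feature_set("api_calls", 8)
before = [row[:] for row in model.sub_models["permissions"]]

app = {"permissions": [1, 0, 1, 0, 0, 1], "api_calls": [0, 1, 1, 0, 0, 0, 1, 0]}
score = model.predict(app)

# Adding a third (hypothetical) feature set keeps the earlier sub-models as they were.
model.add_feature_set("intents", 5)
app["intents"] = [1, 0, 0, 1, 0]
new_score = model.predict(app)
```

In a real setting the new sub-model and the final layer would then be trained while the existing sub-models stay frozen; with untrained random weights, as here, the scores carry no meaning beyond showing that the forward pass works.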