§ Browse thesis bibliographic record
  
System ID U0002-2502201914400900
DOI 10.6846/TKU.2019.00815
Title (Chinese) 運用深度學習於網路入侵檢測之探討
Title (English) Conducting Network Intrusion Detection with Enhanced Deep Learning
Institution Tamkang University (淡江大學)
Department (Chinese) 電機工程學系碩士班
Department (English) Department of Electrical and Computer Engineering
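Academic year 107 (ROC calendar)
Semester 1
Year of publication 108 (ROC calendar)
Author (Chinese) 吳東燁
Author (English) Dong-Ye Wu
Student ID 605450195
Degree Master's
Language Traditional Chinese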
Oral defense date 2019-01-04
Pages 70
Oral defense committee Advisor - 莊博任
Member - 陳省隆
Member - 許獻聰
Keywords (Chinese) 入侵檢測系統
深度學習
機器學習
異常檢測
特徵選擇
攻擊檢測
電腦網路安全
Keywords (English) Intrusion Detection System
Deep Learning
Machine Learning
Anomaly Detection
Feature Selection
Attack Detection
Computer Network Security
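Abstract (translated from Chinese)
With the rapid development of information and communication technology in recent years, the volume of data that people transmit has risen accordingly. In addition, large numbers of Internet of Things (IoT) devices are entering the market, which generates still more traffic. This growing traffic poses a challenge to intrusion detection systems.
Recent research finds that the challenges facing intrusion detection systems fall into three main categories: (1) the large volume of data generated in networks; (2) the depth of inspection performed by the intrusion detection system; and (3) the diversity of protocols and data. On the first point, the volume of data has grown because the information and communication industry has developed rapidly and IoT devices have become increasingly diverse, so large numbers of devices have entered the market and far more information is transmitted over networks. This burdens intrusion detection systems, which must process large volumes of data more intensively; even with improved computing performance, they cannot keep up with the growing traffic. On the second point, to improve effectiveness and accuracy, an intrusion detection system can no longer rely on a few simple or obvious features to decide whether traffic is an attack. It must observe and inspect traffic in greater depth, which means it needs to examine more features.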
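This thesis proposes a deep learning approach to the class imbalance found in popular intrusion detection datasets. We use a deep variational autoencoder to generate new data so that the imbalanced dataset becomes balanced; training on the balanced data reduces the classification bias that class imbalance would otherwise cause. In addition, we use the balanced dataset to train a deep autoencoder. Because a deep autoencoder compresses its input into the most essential features, it lets us remove the redundant parts of the features, which allows us to classify the data more accurately.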
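The balancing step described in this paragraph can be sketched roughly as follows. This is a minimal illustration, not the thesis's implementation. It assumes every minority class is topped up to the size of the largest class (the record does not state the target size), and `jitter_generator` is a hypothetical placeholder that perturbs real samples with Gaussian noise where the thesis uses a trained deep variational autoencoder. All function names and the toy data are assumptions.

```python
import random
from collections import Counter


def balance_with_generator(samples, labels, generate, seed=0):
    """Top up every minority class to the size of the largest class.

    `generate(class_samples, rng)` stands in for a trained per-class
    generative model (assumption: the thesis uses a deep VAE here).
    """
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())  # assumed target: majority-class size
    out_x, out_y = list(samples), list(labels)
    for cls, n in counts.items():
        class_samples = [x for x, y in zip(samples, labels) if y == cls]
        for _ in range(target - n):  # number of synthetic samples needed
            out_x.append(generate(class_samples, rng))
            out_y.append(cls)
    return out_x, out_y


def jitter_generator(class_samples, rng, scale=0.05):
    """Placeholder generator: a real sample plus small Gaussian noise.

    NOT a variational autoencoder; it only keeps the sketch runnable.
    """
    base = rng.choice(class_samples)
    return [v + rng.gauss(0.0, scale) for v in base]


# Toy, already-normalized feature vectors with a 3:1 class imbalance.
X = [[0.1, 0.2], [0.2, 0.1], [0.15, 0.25], [0.9, 0.8]]
y = ["normal", "u2r", "normal", "normal"]

bx, by = balance_with_generator(X, y, jitter_generator)
balanced_counts = Counter(by)  # both classes now have three samples
```

The original samples are kept unchanged and synthetic ones are only appended, so a classifier trained on `bx`, `by` sees every class equally often without losing real data.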
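Experimental results confirm that classification accuracy improves when a balanced dataset is used, and improves further when the feature compression model is also trained on the balanced dataset. Compared with training on the unbalanced dataset, our approach is more robust against unknown attacks. It also addresses the overfitting that class imbalance causes during training, so that the intrusion detection model does not misjudge new types of data merely because they never appeared in the training dataset.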
Abstract (English)
With the rapid development of information and communication technology in recent years, the volume of data that people transmit has increased. In addition, a large number of Internet of Things (IoT) devices are entering the market, which also generates a large amount of data traffic. This growing traffic challenges intrusion detection systems.

Recent research has found that the challenges facing intrusion detection systems fall into three categories: (1) the volume of data both stored in and passing through networks continues to increase; (2) the depth of inspection that intrusion detection systems must perform; and (3) the variety of protocols and data. The first challenge, the large amount of data generated in networks, stems from the rapid development of the information and communication industry in recent years. Internet of Things devices are also increasingly diverse, so a large number of devices are entering the market. As a result, a large amount of information is transmitted over networks, and the data in networks grows ever larger. This burdens the intrusion detection system, which must process a large volume of traffic more intensively. Even with improvements in computing performance, it still cannot cope with the increasing traffic. As for detection depth, to improve its effectiveness and accuracy, an intrusion detection system can no longer rely on a few simple or obvious features to identify whether traffic is an attack. It must observe and inspect traffic in greater depth, which means that it needs to examine more features.

This thesis proposes a deep learning method to address class imbalance in intrusion detection datasets. We use a deep variational autoencoder to generate new data that balances the unbalanced dataset. Training on the balanced data reduces the classifier bias caused by class imbalance. In addition, we use the balanced dataset to train a deep autoencoder. Because the deep autoencoder compresses the input into its essential features, we can remove the redundant parts of the features, which enables us to classify data more accurately. Experimental results show that classification accuracy is better when balanced datasets are used, and better still when the feature compression model is trained on the balanced dataset. Compared with training on unbalanced datasets, our approach is more robust against unknown attacks. It also addresses the overfitting that class imbalance causes during model training, so that our intrusion detection model does not misjudge new types of data merely because they are absent from the training dataset.
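For readers unfamiliar with the variational autoencoder named in the abstract, the following is a minimal sketch of the two ingredients that let such a model generate new samples: the reparameterized draw z = mu + sigma * eps, and the closed-form KL term against a standard normal prior. These are the standard textbook formulas for a diagonal-Gaussian encoder. The record does not give the thesis's exact architecture or loss, so the function names and example values are assumptions.

```python
import math
import random


def reparameterize(mu, log_var, rng):
    """Draw z = mu + sigma * eps with eps ~ N(0, 1), element-wise."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, log_var)]


def kl_to_standard_normal(mu, log_var):
    """KL( N(mu, diag(sigma^2)) || N(0, I) ) in closed form."""
    return -0.5 * sum(1.0 + lv - m * m - math.exp(lv)
                      for m, lv in zip(mu, log_var))


rng = random.Random(0)
mu = [1.0, 0.0]        # hypothetical encoder mean for one input
log_var = [0.0, 0.0]   # hypothetical encoder log-variance (sigma = 1)

z = reparameterize(mu, log_var, rng)     # one latent sample for a decoder
kl = kl_to_standard_normal(mu, log_var)  # regularization term of the loss
```

In a trained model, decoding many such draws for a minority class would yield synthetic records of the kind the abstract describes using for balancing.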
Table of contents
Contents

Chapter 1. Introduction
1.1 Research background
1.2 Research motivation
1.3 Thesis organization
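Chapter 2. Related background
2.1 Intrusion detection systems
2.1.1 Host-based intrusion detection systems
2.1.2 Network-based intrusion detection systems
2.2 Comparison of host-based and network-based intrusion detection systems
2.3 Challenges for intrusion detection systems
2.4 Intrusion detection techniques
2.4.1 Statistical anomaly methods
2.4.2 Rule-based methods
2.4.3 Artificial neural networks
2.4.4 Autoencoder
2.4.5 Variational Autoencoder
2.5 Autoencoders combined with deep learning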
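Chapter 3. Proposed method
3.1 NSL-KDD Dataset
3.2 Data preprocessing
3.2.1 One-hot encoding [25]
3.2.2 Feature standardization
3.3 Balanced-data generation model
3.4 Balanced-data collection
3.5 Training the feature compression model with balanced data
3.6 Balanced-data optimization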
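Chapter 4. Simulation and evaluation
4.1 Experimental environment
4.2 Training and test data
4.3 Encoding evaluation on unseen data
4.4 Evaluation metrics for intrusion detection systems
4.4.1 Confusion matrix
4.4.2 Evaluation metrics
4.5 Comparison of feature compression models
4.5.1 Classification with and without the feature compression model and balanced data
4.5.2 Discussion of the accuracy gains from balanced data
4.5.3 Effect of the degree of feature compression on the model
4.6 Evaluation of the amount of generated data
4.7 Effect of the number of neural network layers on accuracy
4.8 Discussion
Chapter 5. Conclusion and future work
References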

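List of figures
Figure 2.1. Traditional network architecture
Figure 2.2. Host-based intrusion detection system
Figure 2.3. Network-based intrusion detection system
Figure 2.4. Traditional autoencoder
Figure 2.5. Variational autoencoder
Figure 2.6. Deep learning model
Figure 2.7. Deep autoencoder
Figure 3.1. Autoencoder architecture
Figure 3.2. Variational autoencoder architecture
Figure 3.3. Variational autoencoder shape
Figure 3.4. Generating the balanced dataset
Figure 3.5. Autoencoder trained with balanced data
Figure 3.6. Intrusion detection system classification model
Figure 4.1. Variational autoencoder architecture
Figure 4.2. Comparison of the training, balanced, and test datasets
Figure 4.3. Comparison of autoencoders trained with balanced and unbalanced data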


 
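List of tables
Table 2.1. Comparison of host-based and network-based intrusion detection systems [5]
Table 3.1. Features of the NSL-KDD dataset
Table 3.2. Data features before conversion
Table 3.3. Data features after conversion
Table 3.4. Mapping of attack types to attack categories
Table 3.5. Sample counts of the balanced and original training datasets
Table 4.1. Computing environment specifications
Table 4.2. Per-class sample counts in the training and test datasets
Table 4.3. Number of generated sets per class
Table 4.4. Intrusion detection system with the feature compression model trained on balanced data
Table 4.5. Intrusion detection system with the feature compression model trained on unbalanced data
Table 4.6. Classification metrics for the five classes
Table 4.7. Effect of the feature compression model and balanced dataset on traditional machine learning classifiers
Table 4.8. Effect on the accuracy improvement of each class
Table 4.9. Effect of the degree of feature compression on intrusion detection accuracy
Table 4.10. Effect of the degree of training-data balance on the model
Table 4.11. Effect of training-data balancing on model training time
Table 4.12. Effect of the training-data scaling ratio on the model
Table 4.13. Effect of different neuron counts on accuracy when reducing the number of neural network layers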
References
[1]	Y. LeCun, Y. Bengio, and G. Hinton, “Deep learning,” Nature, vol. 521, no. 7553, pp. 436–444, 2015.
[2]	K. G. Kim, “Book Review: Deep Learning,” Healthc. Inform. Res., vol. 22, no. 4, p. 351, 2016.
[3]	M. Roesch et al., “Snort: Lightweight intrusion detection for networks,” in LISA, 1999, vol. 99, no. 1, pp. 229–238.
[4]	T. Bajtoš, A. Gajdoš, L. Kleinová, K. Lučivjanská, and P. Sokol, “Network Intrusion Detection with Threat Agent Profiling,” Secur. Commun. Networks, vol. 2018, 2018.
[5]	H. Kozushko, “Intrusion Detection: Host-Based and Network-Based Intrusion Detection Systems,” vol. 11, 2003.
[6]	N. Moustafa and J. Slay, “UNSW-NB15: A comprehensive data set for network intrusion detection systems (UNSW-NB15 network data set),” 2015 Mil. Commun. Inf. Syst. Conf. MilCIS 2015 - Proc., pp. 1–6, 2015.
[7]	A. Gupta, B. Singh Bhati, and V. Jain, “Artificial Intrusion Detection Techniques: A Survey,” Int. J. Comput. Netw. Inf. Secur., vol. 6, no. 9, pp. 51–57, 2014.
[8]	N. Moustafa and J. Slay, “The evaluation of Network Anomaly Detection Systems: Statistical analysis of the UNSW-NB15 data set and the comparison with the KDD99 data set,” Inf. Secur. J., vol. 25, no. 1–3, pp. 18–31, 2016.
[9]	S. Rodda, “Network intrusion detection systems using neural networks,” Adv. Intell. Syst. Comput., vol. 672, pp. 903–908, 2018.
[10]	X. J. A. Bellekens, C. Tachtatzis, R. C. Atkinson, C. Renfrew, and T. Kirkham, “A Highly-Efficient Memory-Compression Scheme for GPU-Accelerated Intrusion Detection Systems,” Proc. 7th Int. Conf. Secur. Inf. Networks - SIN ’14, pp. 302–309, 2014.
[11]	M. A. Alsheikh, S. Lin, D. Niyato, and H. P. Tan, “Machine learning in wireless sensor networks: Algorithms, strategies, and applications,” IEEE Commun. Surv. Tutorials, vol. 16, no. 4, pp. 1996–2018, 2014.
[12]	A. Saied, R. E. Overill, and T. Radzik, “Detection of known and unknown DDoS attacks using Artificial Neural Networks,” Neurocomputing, vol. 172, pp. 385–393, 2016.
[13]	M. E. Aminanto, H. C. Tanuwidjaja, and K. Kim, “Wi-Fi Intrusion Detection Using Weighted-Feature Selection for Neural Networks Classifier,” pp. 99–104, 2017.
[14]	P. V. Dinh and T. N. Ngoc, “Deep Learning Combined with De-noising Data for Network Intrusion Detection,” pp. 55–60, 2017.
[15]	S. Potluri and C. Diedrich, “Accelerated deep neural networks for enhanced Intrusion Detection System,” IEEE Int. Conf. Emerg. Technol. Fact. Autom. ETFA, vol. 2016–Novem, pp. 1–8, 2016.
[16]	M. Tavallaee, E. Bagheri, W. Lu, and A. A. Ghorbani, “A Detailed Analysis of the KDD CUP 99 Data Set,” in Proceedings of the Second IEEE International Conference on Computational Intelligence for Security and Defense Applications, 2009, pp. 53–58.
[17]	C. Doersch, “Tutorial on Variational Autoencoders,” pp. 1–23, 2016.
[18]	T. A. Tang, L. Mhamdi, D. McLernon, S. A. R. Zaidi, and M. Ghogho, “Deep learning approach for Network Intrusion Detection in Software Defined Networking,” Proc. - 2016 Int. Conf. Wirel. Networks Mob. Commun. WINCOM 2016 Green Commun. Netw., pp. 258–263, 2016.
[19]	G. E. Hinton and R. R. Salakhutdinov, “Reducing the dimensionality of data with neural networks,” Science, vol. 313, no. 5786, pp. 504–507, 2006.
[20]	A. Javaid, Q. Niyaz, W. Sun, and M. Alam, “A Deep Learning Approach for Network Intrusion Detection System,” Proc. 9th EAI Int. Conf. Bio-inspired Inf. Commun. Technol. (formerly BIONETICS), 2016.
[21]	Y. Chuan-long, Z. Yue-fei, F. Jin-long, and H. Xin-zheng, “A Deep Learning Approach for Intrusion Detection using Recurrent Neural Networks,” IEEE Access, vol. 5, pp. 1–1, 2017.
[22]	N. Gao, L. Gao, Q. Gao, and H. Wang, “An intrusion detection model based on deep belief networks,” in Advanced Cloud and Big Data (CBD), 2014 Second International Conference on, 2014, pp. 247–252.
[23]	N. Shone, T. N. Ngoc, V. D. Phai, and Q. Shi, “A Deep Learning Approach to Network Intrusion Detection,” IEEE Trans. Emerg. Top. Comput. Intell., vol. 2, no. 1, pp. 41–50, 2018.
[24]	L. Dhanabal and S. P. Shantharajah, “A Study on NSL-KDD Dataset for Intrusion Detection System Based on Classification Algorithms,” Int. J. Adv. Res. Comput. Commun. Eng., vol. 4, no. 6, pp. 446–452, 2015.
[25]	S. K. Knapp, “Accelerate FPGA macros with one-hot approach,” 1990.
[26]	H. Yang, R. C. Qiu, X. Shi, and X. He, “Deep Learning Architecture for Voltage Stability Evaluation in Smart Grid based on Variational Autoencoders,” 2018.
[27]	M. Frasca, A. Bertoni, M. Re, and G. Valentini, “A neural network algorithm for semi-supervised node label learning from unbalanced data,” Neural Networks, vol. 43, pp. 84–98, Jul. 2013.
[28]	H. Pant, S. Soman, M. Sharma, et al., “Scalable Twin Neural Networks for Classification of Unbalanced Data,” arXiv preprint arXiv:1705.00347, 2017.
[29]	A. S. Eesa, Z. Orman, and A. M. A. Brifcani, “A novel feature-selection approach based on the cuttlefish optimization algorithm for intrusion detection systems,” Expert Syst. Appl., vol. 42, no. 5, pp. 2670–2679, 2015.
[30]	Y. Wang, H. Yao, and S. Zhao, “Auto-encoder based dimensionality reduction,” Neurocomputing, vol. 184, pp. 232–242, 2016.
[31]	U. Bhowan, M. Johnston, M. Zhang, and X. Yao, “Evolving diverse ensembles using genetic programming for classification with unbalanced data,” IEEE Trans. Evol. Comput., vol. 17, no. 3, pp. 368–386, 2013.
[32]	H. Yin and K. Gai, “An Empirical Study on Preprocessing High-Dimensional Class-Imbalanced Data for Classification,” in 2015 IEEE 17th International Conference on High Performance Computing and Communications, 2015 IEEE 7th International Symposium on Cyberspace Safety and Security, and 2015 IEEE 12th International Conference on Embedded Software and Systems, 2015, pp. 1314–1319.
[33]	S. Choudhury and A. Bhowal, “Comparative analysis of machine learning algorithms along with classifiers for network intrusion detection,” in Smart technologies and management for computing, communication, controls, energy and materials (ICSTM), 2015 International conference on, 2015, pp. 89–95.
[34]	F. Chollet et al., “Keras,” 2015.
[35]	M. Abadi et al., “TensorFlow: a system for large-scale machine learning,” in OSDI, 2016, vol. 16, pp. 265–283.
[36]	F. Seide and A. Agarwal, “CNTK: Microsoft’s open-source deep-learning toolkit,” in Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2016, p. 2135.
[37]	Theano Development Team, “Theano: A Python framework for fast computation of mathematical expressions,” arXiv e-prints, vol. abs/1605.0, 2016.
[38]	S. Karsoliya, “Approximating number of hidden layer neurons in multiple hidden layer BPNN architecture,” Int. J. Eng. Trends Technol., vol. 3, no. 6, pp. 714–717, 2012.
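Full-text access rights
On campus
Print copy released 1 year after submission of the authorization form
Campus-wide release of the electronic full text authorized
On-campus electronic full text released 1 year after submission of the authorization form
Off campus
Authorization granted
Off-campus electronic full text released 1 year after submission of the authorization form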
