§ Thesis Bibliographic Record
  
System ID U0002-2308201705113600
DOI 10.6846/TKU.2017.00822
Title (Chinese) 攻擊程式出現預測:社群媒體(Twitter)情資分析應用
Title (English) Prediction of Real-World Exploits: The Use of Social Media (Twitter) Analytics
Institution Tamkang University (淡江大學)
Department (Chinese) 資訊管理學系碩士班
Department (English) Department of Information Management
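Academic year 105 (ROC calendar, 2016-2017)
Semester 2
Year of publication 106 (ROC calendar, 2017)
Graduate student (Chinese) 王妤平
Graduate student (English) Yu-Ping Wang
Student ID 605630069
Degree Master's
Language Traditional Chinese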
Oral defense date 2017-06-04
Pages 63
Oral defense committee Advisor - 鄭啟斌 (cbcheng@mail.tku.edu.tw)
Committee member - 張應華 (yhchang@mail.tku.edu.tw)
Committee member - 趙景明 (chao@csim.scu.edu.tw)
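Keywords (Chinese) 漏洞 (vulnerability)
資料不平衡 (data imbalance)
機器學習 (machine learning)
分類 (classification)
支持向量機 (support vector machine)
決策樹 (decision tree)
貝氏機率 (Bayes' probability)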
Keywords (English) Vulnerability
Data imbalance
Machine learning
Classification
Support vector machine
Decision tree
Bayes’ probability
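Abstract (translated from Chinese)
As network infrastructure becomes ubiquitous and information systems are widely adopted, enterprises and organizations are increasingly likely to be exposed to information security risks. Software and hardware vulnerabilities, which are disclosed from time to time, give cybercriminals a channel for developing exploits that harm enterprises and organizations. Vulnerability information and related discussion are often exchanged on Internet forums, and with the rise of social media these have also become platforms for exchanging security information. The purpose of this study is to use vulnerability messages posted and discussed on Twitter to identify in advance the vulnerabilities that cybercriminals are likely to develop exploits for and use in attacks.
In addition to collecting vulnerability information from Twitter, this study draws on other security resources to enrich the description of vulnerability characteristics. These resources are the U.S. National Vulnerability Database, third-party vulnerability platforms (CVE Details and VulDB), ExploitDB, and Microsoft TechNet. The study proposes a three-stage classification method to predict the probability that a vulnerability will be exploited, and uses k-means clustering to adjust the ratio of positive to negative cases in the sample, reducing the effect of data (class) imbalance on prediction accuracy. The three stages are: (1) in the first stage, a support vector machine (SVM) classifier is trained; (2) cases that the SVM test results label as having exploit code implemented are used in the second stage to train a decision tree classifier; (3) for cases that the decision tree test results label as having exploit code implemented, the third stage computes a Bayes' probability, which serves as a basis for enterprise defense or for vendors developing patches.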
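The abstract does not say how k-means is used to adjust the ratio of positive to negative cases. One common reading is cluster-based undersampling of the majority class, sketched below with scikit-learn. The function name `kmeans_undersample`, its `ratio` and `n_clusters` parameters, and the synthetic data are illustrative assumptions, not details taken from the thesis.

```python
import numpy as np
from sklearn.cluster import KMeans


def kmeans_undersample(X, y, majority_label=0, ratio=1.0, n_clusters=5, seed=0):
    """Shrink the majority class by sampling from each of its k-means clusters.

    `ratio` is the desired majority:minority count after resampling (assumed knob).
    """
    rng = np.random.default_rng(seed)
    maj_idx = np.flatnonzero(y == majority_label)
    min_idx = np.flatnonzero(y != majority_label)
    target = min(len(maj_idx), int(round(ratio * len(min_idx))))

    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed)
    cluster_of = km.fit_predict(X[maj_idx])

    kept = []
    for c in range(n_clusters):
        members = maj_idx[cluster_of == c]
        if len(members) == 0:
            continue
        # Each cluster contributes in proportion to its size, and at least one case.
        share = max(1, int(round(target * len(members) / len(maj_idx))))
        kept.append(rng.choice(members, size=min(share, len(members)), replace=False))

    # Keep every minority case plus the sampled majority cases.
    keep = np.sort(np.concatenate(kept + [min_idx]))
    return X[keep], y[keep]


# Synthetic stand-in data: 200 negative cases, 20 positive cases (not thesis data).
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(0.0, 1.0, size=(200, 4)), rng.normal(2.0, 1.0, size=(20, 4))])
y = np.array([0] * 200 + [1] * 20)
X_bal, y_bal = kmeans_undersample(X, y)
print(np.bincount(y_bal))
```

Sampling per cluster rather than uniformly keeps negative cases from every region of feature space, which is the usual motivation for clustering before undersampling.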
Abstract (English)
With the growth and maturity of networking infrastructure and the widespread use of information systems, enterprises and organizations are increasingly exposed to information security risks. Frequently disclosed software and hardware vulnerabilities give cyber criminals a convenient way to develop exploits and attack enterprises or organizations. Vulnerabilities are often published and discussed on Internet forums, and as social media have grown in popularity they have become major platforms for this kind of information exchange. The goal of this study is to use Twitter messages about vulnerabilities to assess the probability that a vulnerability will be exploited in the real world.
Besides messages on Twitter, information security resources are used to extract the features of a vulnerability; these resources are the National Vulnerability Database, CVE Details, VulDB, ExploitDB, and Microsoft TechNet. The study proposes a three-stage classification model to predict the probability that a vulnerability will be exploited, and employs k-means clustering to adjust the ratio between positive and negative instances in the sample, alleviating the data (class) imbalance problem during training. The three stages are: (1) in the first stage, a support vector machine (SVM) is trained; (2) in the second stage, the instances in the testing sample that the SVM classifies as exploited are used as the training sample for a decision tree classifier; (3) in the third stage, Bayes’ probabilities are computed for the instances that the decision tree classifies as exploited in its testing results. The resulting Bayes’ probabilities serve as a reference for enterprises or vendors in taking appropriate action on a vulnerability.
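The three stages can be pictured as a cascade in which each classifier sees only the cases the previous one flagged. The sketch below uses scikit-learn on synthetic data. The abstract does not say how the Bayes' probability is computed, so Gaussian naive Bayes (`GaussianNB.predict_proba`) is swapped in here as a stand-in. The split sizes, hyperparameters, and variable names are assumptions for illustration only.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic, imbalanced stand-in data (label 1 = "exploit appears"); not thesis data.
X, y = make_classification(n_samples=2000, n_features=10, n_informative=5,
                           weights=[0.8, 0.2], random_state=0)
X_train, X_rest, y_train, y_rest = train_test_split(
    X, y, test_size=0.6, stratify=y, random_state=0)

# Stage 1: an SVM screens the held-out sample.
svm = SVC(kernel="rbf", class_weight="balanced", random_state=0).fit(X_train, y_train)
flag1 = svm.predict(X_rest) == 1
X1, y1 = X_rest[flag1], y_rest[flag1]

# Stage 2: a decision tree is trained and tested only on the SVM-flagged cases.
X1_train, X1_test, y1_train, y1_test = train_test_split(
    X1, y1, test_size=0.5, random_state=0)
tree = DecisionTreeClassifier(max_depth=4, random_state=0).fit(X1_train, y1_train)
flag2 = tree.predict(X1_test) == 1
X2 = X1_test[flag2]

# Stage 3 (assumed): naive Bayes posterior P(exploit | features) for the survivors.
nb = GaussianNB().fit(X1_train, y1_train)
pos_col = list(nb.classes_).index(1)
posterior = nb.predict_proba(X2)[:, pos_col] if len(X2) else np.array([])

print(f"flagged by SVM: {len(X1)}, flagged by tree: {len(X2)}")
```

Sorting the surviving cases by `posterior` would give the kind of ranked list the abstract describes as a reference for defenders and vendors.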
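Table of Contents
Chapter 1 Introduction
1.1	Research Background and Motivation
1.2	Research Objectives
1.3	Research Limitations
Chapter 2 Literature Review
2.1	Data Sources for the Model
2.1.1	Predicting Trends with Social Media
2.1.2	Shortcomings of Vulnerability Scoring Systems
2.1.3	Proposed Improvements to Vulnerability Scoring Systems
2.1.4	Updates to Vulnerability Scoring Systems
2.1.5	Classification Models for Vulnerability Prediction
2.2	Machine Learning
2.2.1	Data Imbalance
2.2.2	K-means
2.2.3	SVM
2.2.4	Performance Evaluation of Classification Models
Chapter 3 Research Framework and Methods
3.1	Research Framework
3.2	Feature Extraction
3.2.1	Social Media: Twitter
3.2.2	National Vulnerability Database
3.2.3	Third-Party Vulnerability Platforms
3.2.4	Keyword Term Frequency
3.3	Ground Truth Data
3.3.1	Exploit Database
3.3.2	National Vulnerability Database
3.3.3	Microsoft Security Bulletins
3.4	Data Preprocessing
3.5	Model for Predicting the Appearance of Exploit Code
3.5.1	Characteristics of Imbalanced Data
3.5.2	Research Method
Chapter 4 Analysis of Vulnerability Data Characteristics
4.1	Trends in Vulnerability Discussion
4.2	Analysis by Vendor
4.3	Analysis of Zero-Day Prices
4.4	Analysis by Vulnerability Type
Chapter 5 Empirical Evaluation of the Method
5.1	Data Description
5.2	Implementation Tools
5.3	Model Results
5.3.1	Description of the Prediction Model
5.3.2	Description of the Vulnerability Probability Values
5.3.3	Results for the Vulnerability Probability Values
5.4	Model Selection for Enterprises
5.5	Relationships among Models
Chapter 6 Conclusion
References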


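List of Figures
Figure 1-1: Vulnerability trend over the years
Figure 1-2: Growth trend of exploit code over the years
Figure 1-3: Exploit kit trend from 2006 to the first half of 2016
Figure 1-4: Monthly active Twitter users from 2010 to 2016 (millions)
Figure 1-5: Trend of information security discussion on Twitter
Figure 1-6: Timeline of vulnerability development
Figure 3-1: Research framework
Figure 3-2: Distribution of Twitter data
Figure 3-3: Twitter features
Figure 3-4: CVSS v2 features
Figure 3-5: CVSS v3 features
Figure 3-6: Exploit price structure on www.vuldb.com
Figure 3-7: Architecture of the predictive classification model
Figure 3-8: Prediction model results
Figure 4-1: Monthly vulnerability counts
Figure 4-2: Coverage of vulnerabilities on NVD and Twitter
Figure 4-3: Comparison of the top 5 most discussed vulnerabilities on Twitter, May to November
Figure 4-4: Comparison of vulnerabilities by vendor
Figure 4-5: Comparison of vendors whose vulnerabilities have exploit code
Figure 4-6: Box plot of zero-day prices for vendors with exploit code
Figure 4-7: Bar chart of vulnerability types
Figure 4-8: Pie chart of vulnerability types
Figure 5-1: Prediction model
Figure 5-2: Elbow method plot for the number of k-means clusters
Figure 5-3: Silhouette coefficient plot for the number of k-means clusters
Figure 5-4: Vulnerability probability values
Figure 5-5: Feature correlations, top 36
Figure 5-6: Feature correlations, bottom 40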

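List of Tables
Table 2-1: Confusion matrix
Table 3-1: Keywords
Table 4-1: CVE IDs of the top 5 most discussed vulnerabilities on Twitter, May to August
Table 4-2: CVE IDs of the top 5 most discussed vulnerabilities on Twitter, September to November
Table 5-1: Prediction model results
Table 5-2: Silhouette coefficients by number of k-means clusters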
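Full-Text Access Permissions
On campus
Print copy available on campus immediately
Author consents to on-campus release of the electronic full text
Electronic copy available on campus immediately
Off campus
Authorization granted
Electronic copy available off campus immediately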
