Security pros have to deal with thousands of vulnerabilities a year, so how do they prioritise them? Michael Roytman, Chief Data Scientist at Kenna Security, talks to CBR about applying machine learning to vulnerability management.
Why is vulnerability management such a big problem in the enterprise?
The average enterprise has over 60 thousand assets and more than 24 million vulnerabilities, with dozens of new ones discovered each day. Manually analysing, correlating, and prioritising each vulnerability isn’t humanly possible, even for large security teams. Their time is limited, they’re under immense pressure from executive teams to fix vulnerabilities that are in the headlines (whether or not those vulnerabilities are likely to ever become credible threats), and there’s simply too much data coming in too fast for them to ever gain the upper hand. The Internet of Things (IoT) and big data solutions are adding even more to this data deluge, making what was already a losing proposition that much worse.
Yet despite all this vulnerability data, attackers exploit only 1% of all vulnerabilities. Without a powerful, intelligent model, powered by data science and machine learning, security and vulnerability teams are forced to ‘guess’ which vulnerabilities they should patch first. Many will use the Common Vulnerability Scoring System (CVSS) to help them narrow the list, but CVSS is of relatively limited value, primarily because it’s static. It also assesses only the vulnerability itself, in isolation: it doesn’t consider other critical information such as the value of the asset, the current threat environment, active breaches, and what attackers are doing in real time. Only by gathering, correlating, and analysing all of this data together can security and vulnerability teams understand their true risk and prioritise which actions to take first to remediate it.
How do security teams typically prioritise vulnerabilities to remediate?
Many teams attempt to prioritise manually using CVSS scores and Excel spreadsheets, but this method simply can’t scale. As mentioned, the average enterprise has upwards of 24 million vulnerabilities, yet spreadsheets are limited to just over one million rows. As a result, organisations end up with dozens of spreadsheets to track all of their vulnerabilities, and keeping them under control rapidly becomes more than any human can manage.
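To put the row limit in numbers: a current Excel worksheet holds 1,048,576 rows, so 24 million vulnerability records need roughly two dozen sheets before any analysis even starts. A minimal Python sketch of that arithmetic, assuming one header row per sheet and one vulnerability per row (both assumptions for illustration):

```python
import math

# Row limit of a single Excel worksheet (Excel 2007 and later).
EXCEL_MAX_ROWS = 1_048_576


def sheets_needed(n_vulns: int, header_rows: int = 1) -> int:
    """Return how many worksheets are needed to hold n_vulns rows,
    reserving header_rows rows on each sheet for column headings."""
    rows_per_sheet = EXCEL_MAX_ROWS - header_rows
    return math.ceil(n_vulns / rows_per_sheet)


print(sheets_needed(24_000_000))  # → 23
```

Each extra column of context (asset value, threat data, scan source) has to be kept consistent across all of those sheets by hand, which is where the approach breaks down.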
Most mature security and vulnerability teams will also use multiple scanning solutions to continuously assess the assets throughout their environment. Whilst there’s no question about the value of this data, and more is always better, the downside of having lots of scan data is that it’s even more data to parse manually, which further exacerbates the problem.
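One small first step in taming multi-scanner output is collapsing findings that several scanners report for the same asset. The sketch below is a minimal Python illustration, assuming each finding has already been normalised to a hypothetical (scanner, asset, vulnerability ID) tuple; the scanner names, hosts, and IDs are invented, and this is not a description of how Kenna or any particular scanner handles it:

```python
from collections import defaultdict

# Hypothetical raw findings already normalised to (scanner, asset, vulnerability ID).
# Real scanner exports differ in format; these names and IDs are invented.
findings = [
    ("scanner_a", "host-01", "VULN-1"),
    ("scanner_b", "host-01", "VULN-1"),  # same issue reported by a second scanner
    ("scanner_a", "host-02", "VULN-2"),
    ("scanner_b", "host-03", "VULN-3"),
]


def merge_findings(raw):
    """Collapse duplicate (asset, vulnerability) pairs reported by different
    scanners, recording which scanners saw each one."""
    merged = defaultdict(set)
    for scanner, asset, vuln_id in raw:
        merged[(asset, vuln_id)].add(scanner)
    return dict(merged)


merged = merge_findings(findings)
print(len(findings), "raw findings ->", len(merged), "unique")  # 4 raw findings -> 3 unique
```

Deduplication shrinks the pile but does not prioritise it: every remaining finding still needs a judgement about risk.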
You mentioned predicting which vulnerabilities are most likely to be weaponised – how do you do this?
We are applying machine learning, crunching lots of data, and correlating different data sets. We’ve been capturing vulnerability data to understand organisations’ risk scores for a number of years; our data warehouse already holds over two billion historic and active vulnerability instances. We also have data on over nine billion successful exploits globally and use more than 50 other data sources. We correlate all of this to understand hacker behaviour and to distil the vulnerability ‘traits’ that currently indicate an exploit is likely to be discovered or developed for a vulnerability. In other words, we determine what currently drives hackers to spend time developing an exploit for one vulnerability rather than another. This assessment is then fed into a predictive model to accurately evaluate whether weaponisation is likely, and thus whether the vulnerability will pose a threat in the foreseeable future.
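A toy version of ‘distilling traits’: given historical vulnerabilities labelled by whether they were later exploited, measure how often each trait co-occurs with exploitation. This is a minimal Python sketch, assuming invented trait names and data; the interview does not describe Kenna’s actual features or correlation method:

```python
from collections import Counter

# Hypothetical historical records: the traits each vulnerability had, and whether
# an exploit was later observed for it. Trait names and data are invented.
history = [
    ({"remote", "no_auth", "popular_vendor"}, True),
    ({"remote", "no_auth"}, True),
    ({"remote", "requires_auth"}, False),
    ({"local", "requires_auth"}, False),
    ({"remote", "popular_vendor"}, True),
    ({"local", "popular_vendor"}, False),
]


def exploit_rate_by_trait(records):
    """For each trait, return the fraction of vulnerabilities with that trait
    that went on to be exploited."""
    seen, exploited = Counter(), Counter()
    for traits, was_exploited in records:
        for trait in traits:
            seen[trait] += 1
            if was_exploited:
                exploited[trait] += 1
    return {t: exploited[t] / seen[t] for t in seen}


# Print traits from most to least associated with exploitation.
rates = exploit_rate_by_trait(history)
for trait, rate in sorted(rates.items(), key=lambda kv: -kv[1]):
    print(f"{trait:15} {rate:.2f}")
```

Per-trait rates like these only hint at which characteristics matter; a predictive model has to weigh traits in combination, which is what the training step is for.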
It’s also important to note that this is not a point-in-time analysis: Kenna performs this assessment continuously and in real time, so as attacker behaviour changes, the predictive risk assessment immediately updates to reflect the change in risk.
How does the machine learning work?
In its simplest form, we have a very large amount of both current and historical data, and we know from the historical data whether a particular vulnerability was exploited or not. Our system predicts which combination of characteristics makes a vulnerability most likely to be actually used by hackers. Since we know which vulnerabilities were or were not actually exploited by hackers in the wild, we can check the outcome to see whether the prediction was correct. If the prediction was incorrect, the system adjusts the combination of characteristics it looks for. If it was correct, the system becomes more confident in the combination of characteristics it used to reach that determination.
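The predict, check, adjust loop described above is, in its simplest form, what fitting a logistic regression model by stochastic gradient descent does. The following is a minimal stdlib-only Python illustration, assuming three invented binary traits and toy labels; it is not Kenna’s model, which, per the interview, uses support vector machines:

```python
import math

# Hypothetical training data: each row is a vulnerability's binary traits,
# with columns (invented for illustration): [remote, no_auth, popular_vendor].
X = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
    [1, 0, 1],
    [0, 0, 1],
]
# 1 = an exploit was later observed in the wild, 0 = it was not (toy labels).
y = [1, 1, 0, 0, 1, 0]


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(weights, bias, x):
    """Estimated probability that a vulnerability with traits x is exploited."""
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))


def train(X, y, epochs=2000, lr=0.5):
    """Fit logistic regression with stochastic gradient descent.

    For each example the model predicts, compares the prediction with the
    known outcome, and moves the weights by (outcome - prediction): a large
    correction when it was badly wrong, a small nudge towards more
    confidence when it was already right.
    """
    weights = [0.0] * len(X[0])
    bias = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            error = yi - predict_proba(weights, bias, xi)
            weights = [w + lr * error * v for w, v in zip(weights, xi)]
            bias += lr * error
    return weights, bias


weights, bias = train(X, y)
for xi, yi in zip(X, y):
    print(xi, "exploited" if yi else "not exploited",
          round(predict_proba(weights, bias, xi), 3))
```

Logistic regression was chosen here only because its update rule maps directly onto the description; any supervised classifier follows the same train-on-known-outcomes pattern.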
Here are the technical details for all of the data scientists out there:
Machine learning generally takes one of two forms: learning from past, labelled datasets in order to predict labels for new data (supervised learning), or finding structure, such as clusters, in data that has no labels (unsupervised learning). Since we have a robust dataset of vulnerabilities, scan data, and exploits going back six years, we use this already labelled data to answer one question at the moment a vulnerability is released: “Will this vulnerability have an exploit published for it?”
This is a binary classification problem, so it belongs to the supervised learning family of machine learning. The ‘label’ is the past existence of an exploit. The data we use to train the model consists of the 105,000 vulnerabilities that have been published to date, 55+ exploit sources amounting to about 23,000 exploits, and our customer dataset of 2 billion ‘scanned in the wild’ vulnerabilities.
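Framed this way, building the training labels reduces to a lookup: a vulnerability is a positive example if any exploit source lists an exploit for it. A minimal Python sketch, assuming hypothetical source names and vulnerability IDs; the interview does not say how records are matched across sources (aliases, dates, exploit quality):

```python
# Hypothetical exploit feeds: each maps a source name to the vulnerability IDs
# it holds exploits for. Source names and IDs are invented for illustration.
exploit_sources = {
    "source_a": {"VULN-1", "VULN-2"},
    "source_b": {"VULN-1"},
    "source_c": {"VULN-3"},
}

published = ["VULN-1", "VULN-2", "VULN-3", "VULN-4"]


def build_labels(vuln_ids, sources):
    """Label each vulnerability 1 if any exploit source lists it, else 0."""
    known_exploited = set().union(*sources.values())
    return {vid: int(vid in known_exploited) for vid in vuln_ids}


labels = build_labels(published, exploit_sources)
print(labels)  # {'VULN-1': 1, 'VULN-2': 1, 'VULN-3': 1, 'VULN-4': 0}
```

Taking the union means one exploit listed in several feeds still yields a single positive label, so the label counts vulnerabilities with exploits rather than exploits.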
Our methodology is unique, but it uses common algorithms. We use support vector machines in production, but we calibrate the model with random forests. Essentially, we want to ensure that when we issue a prediction, we are confident it is correct (efficiency), and that the set of all predictions we issue covers most exploits (coverage). We consider these measures more important than accuracy (which is around 95% for this model). ‘Efficiency’ and ‘Coverage’ are essential to making sure the predictions are not just accurate but useful to an enterprise; our current model’s efficiency ranges between 80% and 95% (we tell the customer what we expect for each prediction), whilst maintaining over 50% coverage.
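The two properties described above, being right when a prediction is issued and catching most exploits overall, correspond to what are usually called precision and recall. The Python sketch below uses an invented population of 1,000 vulnerabilities, 20 of them exploited, to show why accuracy alone can mislead when exploited vulnerabilities are a small minority. The figures are illustrative, not Kenna’s results:

```python
import math


def evaluate(predicted: set, actual: set, universe_size: int):
    """Score a set of 'will be exploited' predictions against what happened.

    efficiency: share of flagged vulnerabilities that were exploited (precision)
    coverage:   share of exploited vulnerabilities that were flagged (recall)
    accuracy:   share of all vulnerabilities classified correctly
    """
    tp = len(predicted & actual)       # flagged and exploited
    fp = len(predicted - actual)       # flagged but never exploited
    fn = len(actual - predicted)       # exploited but not flagged
    tn = universe_size - tp - fp - fn  # neither flagged nor exploited
    efficiency = tp / (tp + fp) if predicted else 0.0
    coverage = tp / (tp + fn) if actual else 0.0
    accuracy = (tp + tn) / universe_size
    return efficiency, coverage, accuracy


# Hypothetical population: 1,000 vulnerabilities (IDs 0..999), 20 exploited.
actual = set(range(20))
model = set(range(9)) | {500}  # flags 10: 9 exploited, 1 not
do_nothing = set()             # never predicts an exploit

print(evaluate(model, actual, 1000))       # (0.9, 0.45, 0.988)
print(evaluate(do_nothing, actual, 1000))  # (0.0, 0.0, 0.98)
```

A model that never predicts an exploit still scores 98% accuracy here while catching nothing, which is why efficiency and coverage say more about usefulness.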