Four questions to ask to ensure you’re getting real ML

As threats become more sophisticated, nearly every technology vendor in the cybersecurity space is incorporating, or claiming to incorporate, machine learning (ML) into its solutions. In fact, machine learning has become such an important cybersecurity feature that vendors who don’t have it will have trouble competing against those who do. In response, vendors’ marketing teams are making claims that stretch the limits of what ML really means, leaving customers to determine which vendors truly provide ML capabilities and which are just blowing hot air.

Machine learning isn’t a static field — it’s constantly changing and developing, so ML capabilities will differ widely among vendors’ products. To ensure you deploy a security solution that’s not just a bunch of ML marketing fluff, here are a few questions to ask before you buy.

How old is the data you’re using to train your algorithm?

When an ML algorithm makes a prediction, it’s basing that prediction on historical data. Even if a vendor is feeding its ML engine current data, that data will be from the past by the time it has been processed and a decision has been made. True, I might be stretching the definition of historical a bit if I’m applying it to information that’s a half-hour old, but the point is that ML can never be completely current, nor can it account for new variables.

Keeping this in mind, it’s important to ask vendors about their “aging process” for data inputs. How frequently are algorithms adjusted to account for past mistakes? How often is the engine fed new data and new requirements? ML isn’t so much “thinking” as it is identifying patterns, and its ability to find patterns accurately depends on the quality and currency of the information used to train it.
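
To make that question concrete, here is a minimal sketch, in Python, of one way a team might handle data aging: each training sample is down-weighted as it gets older, and the model is periodically refit so recent behavior dominates. This is not any particular vendor’s pipeline; the half-life, field names, and model choice are illustrative assumptions.

# Sketch only: recency-weighted retraining under assumed data shapes.
from datetime import datetime, timezone

import numpy as np
from sklearn.linear_model import LogisticRegression

def recency_weights(timestamps, half_life_days=30.0):
    """Weight each sample by age: the weight halves every `half_life_days`."""
    now = datetime.now(timezone.utc)
    ages_days = np.array([(now - ts).total_seconds() / 86400.0 for ts in timestamps])
    return 0.5 ** (ages_days / half_life_days)

def retrain(features, labels, timestamps):
    """Refit the classifier on the full history, emphasizing newer samples."""
    weights = recency_weights(timestamps)
    model = LogisticRegression(max_iter=1000)
    model.fit(features, labels, sample_weight=weights)
    return model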

Who are their data scientists?

All ML systems require human expertise, usually in the form of a data scientist, who selects and develops the algorithms, defines the goals and outputs, and trains the ML engine with data. The old software adage, “Garbage in, garbage out,” definitely applies to ML. It takes human intelligence to evaluate varying use cases and determine which data is appropriate for each project.

So make sure to ask vendors who their data science subject matter experts are and for information on their backgrounds. Any reputable organization with real ML functionality should be happy to share its experts’ backgrounds and experience.


How has the vendor defined the goal of ML?

Closely related to the question of data science expertise, a vendor incorporating ML into its systems needs to establish goal functions. This is absolutely critical, because if the data scientists don’t define the problem the ML system is solving with sufficient accuracy and detail, you’ll get unreliable results, no matter how good the algorithm and supporting data sets may be.

For example, if the goal is defined as “determine secure connections to MySQL,” but the inputs don’t account for usability requirements, a reasonable outcome for an ML algorithm could be that no connections are secure. Strictly speaking, this might be true, but a completely secure database that’s also completely unusable isn’t exactly the outcome anyone wants.

To balance security and usability, the vendor’s data science team should carefully characterize:

  • the problem;
  • the goal; and
  • the learning process, including what margin of error is acceptable.

It’s important for everyone to agree from the beginning on acceptable levels of accuracy and what kinds of mistakes can be tolerated.
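
One way to make that agreement explicit is sketched below: the tolerated false-positive and false-negative rates are written down as acceptance criteria the trained model must meet. The threshold values and the framing of “blocking legitimate connections” are illustrative assumptions, not any real product’s policy.

# Sketch only: encode the agreed error budget as an explicit acceptance check.
from sklearn.metrics import confusion_matrix

# Assumed agreement: blocking a legitimate connection (false positive) hurts
# usability, so it is tolerated less often than missing a risky one.
MAX_FALSE_POSITIVE_RATE = 0.01   # at most 1% of legitimate connections blocked
MAX_FALSE_NEGATIVE_RATE = 0.05   # at most 5% of risky connections missed

def meets_agreed_goals(y_true, y_pred):
    """Return True only if the model stays within the agreed error budget."""
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    fnr = fn / (fn + tp) if (fn + tp) else 0.0
    return fpr <= MAX_FALSE_POSITIVE_RATE and fnr <= MAX_FALSE_NEGATIVE_RATE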

What is the vendor’s testing process?

Just like any other technology, ML requires a lot of testing and validation to make sure it’s effective. You need to understand the logic flow behind the data, and you need to test the system regularly to make sure you’re weeding out false positives. So, if you have a vendor touting their ML capabilities, make sure to discuss their testing process: how they adjust logic flows, data inputs, and the algorithms they use.
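
As a rough illustration of what such a process might look like, the sketch below re-scores a hold-out set of recent, labeled traffic on a schedule and flags when the false-positive rate drifts past an agreed budget. The data source, budget value, and alerting mechanism are assumptions for illustration only.

# Sketch only: recurring validation against a hold-out set of recent traffic.
import numpy as np

def false_positive_rate(y_true, y_pred):
    """Share of benign samples (label 0) the model flagged as threats (label 1)."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    benign = y_true == 0
    return float(np.mean(y_pred[benign] == 1)) if benign.any() else 0.0

def periodic_validation(model, recent_features, recent_labels, fpr_budget=0.01):
    """Run on a schedule; return True if the model still meets its error budget."""
    predictions = model.predict(recent_features)
    fpr = false_positive_rate(recent_labels, predictions)
    if fpr > fpr_budget:
        print(f"ALERT: false-positive rate {fpr:.2%} exceeds budget {fpr_budget:.2%}")
        return False
    return True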

We’ve only just begun to scratch the surface of ML’s potential, and, as with any emerging technology, the quality of cybersecurity vendors’ ML capabilities varies a great deal. Savvy buyers need to dig deep into potential vendors’ offerings, asking the tough questions that separate what’s real from what’s just advertising copy.

 

Written by Dan Perkins, Director of Products & Solutions

Dan Perkins is Director of Products and Solutions for Edgewise Networks, where he oversees the direction and development of Edgewise’s zero trust platform. Prior to Edgewise, Dan was Director of Product Management at Infinio, where he was responsible for product vision and the ongoing quality and applicability of Infinio’s solution. He also previously served in several software engineering and quality assurance roles for Citrix. Dan holds a B.S. in computer engineering from Northeastern University.