Machine Learning on the Edge(wise)

By John O'Neil — Jun 13, 2019

This week, as we introduce the concept of 1-click auto-segmentation to the public, it’s important for anyone interested in microsegmenting their networks with Edgewise to understand how our technology accomplishes what we say it does. It’s easy for a cybersecurity solution provider to claim seemingly magical capabilities; it’s another thing to back up those claims with hard data.

To achieve application-centric, zero trust microsegmentation for our customers’ environments, the following three things are central to Edgewise’s success:

  1. Edgewise collects data from agents installed on a customer’s hosts. This data is sent to Edgewise’s backend where we put together a view of what's happening on the customer’s networks.
  2. We create an enforcement plane where policies can be sent to the agents and enforced there, allowing for maximal security.
  3. We use the collected information to automatically create recommended application and host segmentation policies. These policies are made available to the customer, who can apply or modify them as they want.

While all three of the above processes are necessary for preventing unauthorized access to our customers’ applications and services, this blog post is about number 3: how Edgewise uses machine learning to continuously build and recommend the most hardened microsegmentation policies, based on customers’ unique network environments.

One major problem with traditional microsegmentation is that it uses network information (that is, IP addresses, ports, and protocols) to determine policies for what’s allowed to communicate into, out of, and between segments. This method creates immense complexity and requires continual manual management of thousands of policies, mainly because network information changes constantly in dynamic cloud and container networks, and because application deployment is rapid and continuous. Traditional microsegmentation was built for traditional, on-premises networks, but most organizations today operate in ever-changing, auto-scaling environments. Edgewise’s method for analyzing an environment and building policies replaces these practices with application-centric segmentation policies that are simple and easy to use. This is possible because of our patented machine learning and automation.
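To make the contrast concrete, here is a minimal sketch of why an address-based rule is brittle while an application-centric policy is not. The class names, field names, IP addresses, and application names are all invented for illustration; this is not Edgewise's data model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Flow:
    """One observed connection. All field names here are illustrative."""
    src_ip: str
    dst_ip: str
    dst_port: int
    src_app: str  # hypothetical application identity (e.g., a verified binary name)
    dst_app: str


@dataclass(frozen=True)
class NetworkRule:
    """Traditional rule keyed on addresses and ports."""
    src_ip: str
    dst_ip: str
    dst_port: int

    def allows(self, flow: Flow) -> bool:
        return (flow.src_ip, flow.dst_ip, flow.dst_port) == (
            self.src_ip, self.dst_ip, self.dst_port)


@dataclass(frozen=True)
class AppPolicy:
    """Application-centric policy keyed on who is talking, not where."""
    src_app: str
    dst_app: str

    def allows(self, flow: Flow) -> bool:
        return (flow.src_app, flow.dst_app) == (self.src_app, self.dst_app)


# The same application-to-database connection before and after an
# auto-scaling event gives the client a new IP address.
before = Flow("10.0.0.5", "10.0.0.9", 5432, "billing-api", "postgres")
after = Flow("10.0.3.17", "10.0.0.9", 5432, "billing-api", "postgres")

net_rule = NetworkRule("10.0.0.5", "10.0.0.9", 5432)
app_policy = AppPolicy("billing-api", "postgres")

print(net_rule.allows(before), net_rule.allows(after))      # True False
print(app_policy.allows(before), app_policy.allows(after))  # True True
```

The address-based rule has to be rewritten every time the client moves, while the application-centric policy keeps matching as long as the same two applications are talking.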

Collecting and analyzing the data

The agents installed in our customers’ environments allow us to see what’s happening on their networks. We can see what applications, services, and hosts are present; how they’re communicating; and when they’re communicating. The data we collect allows us to baseline normal communication patterns and assess the level of network overexposure. Using the observed traffic and a statistical analysis of what’s necessary for ideal-but-secure workload communication, we create “nearly optimal” policies. For the generated policies to count as optimal, they need to meet several goals, including:

  1. We want the smallest possible number of policies. This is to make policy management less onerous.
  2. We want policies that account for as much of the observed data as possible. This is for accuracy.
  3. We want policies to allow as little unobserved traffic as possible. In other words, we don’t want to guess at what’s happening on a customer’s network. However, this is complicated because, by definition, we have no record of traffic that never occurred. That is, observed traffic tells us what should be allowed to communicate and with which other entities, but it gives no explicit information about what shouldn't be allowed.
  4. We want relatively "trusted" applications to talk relatively freely, more suspicious or potentially dangerous applications to be more constrained, and malicious applications not to be allowed to communicate at all.
  5. We want interpretable policies that users can understand, which rules out a "black box" model. There are two reasons for this:
    1. First, most cybersecurity professionals don't blindly trust computers; if they did, they probably wouldn't be working in cybersecurity in the first place.
    2. Second, although we use computers to process huge amounts of data into suggested policies, we want users to be able to integrate their own knowledge of their networks into those policies and to gain insight about their networks from them.

Clearly, these goals cannot all be optimized simultaneously because they're in conflict with each other. However, we can define all these goals, and others, mathematically. Then, we use Bayesian techniques to look for sets of policies that satisfy all of these goals as well as possible.
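As a rough illustration of how conflicting goals like these can be folded into a single number, the sketch below scores a policy set with three penalties: one per policy (goal 1), one per observed flow the set fails to allow (goal 2), and one per allowed pair that was never observed (goal 3). This is a simple penalized score with made-up weights, not the Bayesian model described above; the function names, the weights, and the representation of a policy as a (sources, destinations) pair are all assumptions.

```python
from itertools import product


def allowed_pairs(policies):
    """Expand each (sources, destinations) policy into the pairs it allows."""
    pairs = set()
    for sources, dests in policies:
        pairs.update(product(sources, dests))
    return pairs


def score(policies, observed, size_weight=1.0, miss_weight=5.0, excess_weight=0.5):
    """Higher is better. The weights are arbitrary illustrative values."""
    allowed = allowed_pairs(policies)
    missed = len(observed - allowed)   # observed traffic the policies would block
    excess = len(allowed - observed)   # traffic allowed but never observed
    return -(size_weight * len(policies)
             + miss_weight * missed
             + excess_weight * excess)


# Three web hosts were each seen talking to one database.
observed = {("web1", "db"), ("web2", "db"), ("web3", "db")}

# One policy per observed flow: accurate, but more policies to manage.
fine_grained = [
    (frozenset({"web1"}), frozenset({"db"})),
    (frozenset({"web2"}), frozenset({"db"})),
    (frozenset({"web3"}), frozenset({"db"})),
]
# One policy for the whole group: just as accurate, and smaller.
grouped = [(frozenset({"web1", "web2", "web3"}), frozenset({"db"}))]
# One very broad policy: small, but it allows five pairs nobody observed.
too_broad = [(frozenset({"web1", "web2", "web3", "batch"}),
              frozenset({"db", "cache"}))]

for name, policy_set in [("fine_grained", fine_grained),
                         ("grouped", grouped),
                         ("too_broad", too_broad)]:
    print(name, score(policy_set, observed))
```

With these weights the grouped set scores best (-1.0), ahead of the fine-grained set (-3.0) and the overly broad one (-3.5). Changing the weights changes the trade-off, which is exactly why the goals have to be stated mathematically before they can be balanced.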

Building optimal policy sets through machine learning

The tricky part is creating the set of candidate policies from which we choose the final policy set. In the candidate set, we need both "low-level" policies (like "application X on host 1 can talk to application Y on host 2") and "higher-level" policies based on groups of hosts and groups of applications. If we see hosts that act alike in some meaningful ways (e.g., communicating regularly with similar entities, or having similar identity profiles), we create a "host segment" with those hosts, and then protect them with an overarching segment policy. Likewise, if we see a group of applications that all act in similar ways, we create an "application collection" with those applications, and then build a segmentation policy around that collection. These abstractions provide the raw material for the higher-level candidate policies. Once we have built the candidate set, we search for the most "nearly optimal" subset of policies we can find, and we suggest those policies to the customer.
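The grouping step can be pictured with a small sketch. Here hosts are grouped when the sets of peers they talk to overlap strongly (Jaccard similarity above a threshold), and each resulting group contributes one higher-level candidate alongside the low-level ones. The similarity measure, the 0.5 threshold, the greedy grouping, and all host names are assumptions chosen for illustration; the post does not say which clustering method Edgewise uses.

```python
def jaccard(a, b):
    """Overlap between two sets, from 0.0 (disjoint) to 1.0 (identical)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def group_hosts(peers_by_host, threshold=0.5):
    """Greedily place each host in the first segment whose first member
    talks to a similar set of peers; otherwise start a new segment."""
    segments = []
    for host in sorted(peers_by_host):
        for segment in segments:
            representative = segment[0]
            similarity = jaccard(peers_by_host[host], peers_by_host[representative])
            if similarity >= threshold:
                segment.append(host)
                break
        else:
            segments.append([host])
    return segments


def candidate_policies(peers_by_host, segments):
    """Low-level (host, peer) candidates plus one candidate per multi-host segment."""
    low_level = [((host,), (peer,))
                 for host in sorted(peers_by_host)
                 for peer in sorted(peers_by_host[host])]
    high_level = []
    for segment in segments:
        if len(segment) > 1:
            dests = set().union(*(peers_by_host[h] for h in segment))
            high_level.append((tuple(segment), tuple(sorted(dests))))
    return low_level + high_level


# Hypothetical observations: which peers each host was seen talking to.
peers_by_host = {
    "web1": {"db", "cache", "auth"},
    "web2": {"db", "cache", "auth"},
    "web3": {"db", "cache"},
    "batch1": {"warehouse", "queue"},
    "batch2": {"warehouse", "queue"},
}

segments = group_hosts(peers_by_host)
candidates = candidate_policies(peers_by_host, segments)
print(segments)
print(len(candidates))
```

On this toy data the three web hosts land in one segment and the two batch hosts in another, giving 12 low-level candidates and 2 segment-level ones. A selection step would then pick a subset of these candidates that balances the goals listed earlier.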

Of course, networks are always changing, so we can't generate policies once and never again. We regenerate policies continuously, using the data gathered about the environment and our machine learning, and we take into account the policies the customer has written or adopted to ensure that newly proposed policies won't conflict with the customer's actions.
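One way to picture the reconciliation with customer decisions is a filter that runs after every regeneration. In the sketch below, a freshly proposed policy is dropped if the customer has explicitly denied that pair, or if the customer has already adopted it. The function name, the reduction of a policy to a (source, destination) pair, and the idea of an explicit deny list are simplifying assumptions, not a description of the Edgewise product.

```python
def reconcile(proposed, adopted, denied):
    """Keep only proposals that neither contradict nor duplicate
    what the customer has already decided."""
    kept = []
    for pair in proposed:
        if pair in denied:
            continue  # would contradict an explicit customer decision
        if pair in adopted:
            continue  # already in force, so there is nothing new to suggest
        kept.append(pair)
    return kept


# Hypothetical state after one regeneration cycle.
proposed = [
    ("billing-api", "postgres"),
    ("billing-api", "cache"),
    ("report-job", "postgres"),
]
adopted = {("billing-api", "postgres")}
denied = {("report-job", "postgres")}

suggestions = reconcile(proposed, adopted, denied)
print(suggestions)  # [('billing-api', 'cache')]
```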

This is how Edgewise uses and applies machine learning. We want to discover and suggest an accurate and interpretable set of policies so our customers can gain insight into their networks, and use that insight to secure their applications and services—easily, quickly, flexibly, and as transparently as possible.


Written by John O'Neil

John O’Neil is the Data Scientist at Edgewise Networks. He writes and designs software for data analysis and analytics, search engines, natural language processing, and machine learning. He has a PhD in linguistics from Harvard University, is the author of more than twenty papers in computer science, linguistics, and associated fields, and has given talks at numerous professional and academic conferences.