As data center environments march inexorably towards greater complexity (thanks in large part to virtualized workloads), the ability of any one person to understand the scope of the system is shrinking...rapidly. Not long ago, at least one infrastructure engineer could answer common questions about the environment such as “which applications talk to app X?” or, conversely, “what does app X talk to?”
Today, even the most technically savvy IT staff in your organization can probably only describe the interactions between a mere handful of the applications in your tech stack. If anyone in your organization maintains a map of how everything interacts, it’s probably built manually and is obsolete the minute it’s published. When it comes to building security policies, it’s no wonder the best anyone can hope for is to define broad firewall rules between subnets and cross their fingers. However, as threats become increasingly sophisticated, we all know we’re reaching the end of that road. Something has to change.
A shared burden
The first step forward is to accept that security must share the responsibility for designing policy with the engineers who write and deploy solutions. Even a few years ago, systems were too complex to reasonably expect an engineer or security practitioner, while bombarded with the day-to-day, to stay heads-down for the hours or days necessary to map out how all the components within an application stack communicate. Today, it’s virtually impossible to ask for good, specific policy, even from rockstar engineers immersed full-time in the application. There are too many details to look up, and they change too frequently.
What we’re left with is a conundrum: we can’t afford the exposure of coarse-grained policies, but there isn’t enough time in the day to write explicit policies. (And even if there were enough time, nobody has the requisite understanding of how everything interacts.) This is compounded by the explosion of complicated, third-party components, in particular distributed databases and distributed computing frameworks. Consider: short-lived containers and services expand elastically, and policy complexity explodes beyond the point of comprehension when they do. We’re entering the realm of “policy by algorithm.”
So we’ve determined that engineer-y/devops-y folks have to be involved in policy creation and maintenance, but we’ve also admitted that they alone don’t know enough.
The key prerequisite, which no one person can satisfy, is deep behavioral insight:
- Which apps talk to which other apps?
- On which hosts do the communicating apps reside?
- Which user accounts are the apps running as?
- Are the apps containerized? If so, what container images are they inheriting from?
- Are any of these apps running on an auto-scaling infrastructure?
- What can we say about the network topology between the communicating apps? To wit, are there any NATs, firewalls, load balancers, etc. in those communication paths?
- What are the versions of the apps, and of their various software dependencies?
- What OS versions and patch levels are in play?
- How often do these details change?
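To make those questions a little more concrete, here is a minimal sketch of what a single record of behavioral insight might look like, and how the two classic questions ("which apps talk to app X?" and "what does app X talk to?") could be answered from a pile of such records. Every name in it (`AppInstance`, `Flow`, `talks_to`, `talked_to_by`, and the sample apps and hosts) is hypothetical and made up for illustration; it doesn't describe any particular product or data source.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AppInstance:
    """One running app, with the attributes the questions above ask about."""
    app: str                                # application name
    host: str                               # host the app resides on
    user: str                               # account the app runs as
    version: str                            # app version
    os_version: str                         # OS version / patch level
    container_image: Optional[str] = None   # None if not containerized
    auto_scaled: bool = False               # on auto-scaling infrastructure?


@dataclass(frozen=True)
class Flow:
    """One observed communication from one app instance to another."""
    src: AppInstance
    dst: AppInstance
    dst_port: int
    via: tuple = ()  # NATs, firewalls, load balancers in the path, if known


def talks_to(flows, app):
    """What does `app` talk to?"""
    return sorted({f.dst.app for f in flows if f.src.app == app})


def talked_to_by(flows, app):
    """Which apps talk to `app`?"""
    return sorted({f.src.app for f in flows if f.dst.app == app})


# Invented sample data, for illustration only.
web = AppInstance("web", "host-a", "www", "1.4.2", "os-1",
                  container_image="web:1.4.2", auto_scaled=True)
api = AppInstance("api", "host-b", "svc", "2.0.0", "os-1")
db = AppInstance("db", "host-c", "dbuser", "9.6", "os-2")

flows = [
    Flow(web, api, 8443, via=("load-balancer",)),
    Flow(api, db, 5432),
]

print(talks_to(flows, "api"))      # ['db']
print(talked_to_by(flows, "api"))  # ['web']
```

Answering the questions is the easy part once the records exist. The hard part is collecting them, and keeping them current when the answer to the last question is "constantly."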
The sad truth is that almost no one can answer these questions with enough confidence to be useful. Assume for a second we had an engineering team that could gather everything I just listed. We’d be left dazed, staring at a veritable mountain of data. Nobody would be able to make any sense of it, so we’d have to hire another team of engineers and data scientists to figure out what all those network interfaces, routing tables, flows, apps and whatnot actually mean.
Practically no one has that kind of budget, so we’re left with plan B: read some product documentation, gather some flows in a test environment, and draw some inferences about how the system will behave in production. Realistically, that only happens for critical apps. More often, it’s: deploy the app into a subnet, figure out what resources it needs in other subnets, then add firewall rules where appropriate. Did it break? Capture some traffic, request more firewall rules, rinse, repeat. It’s crude, it turns every subnet into a free-for-all for hackers, but it’s the best we can do with limited time.
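For a rough feel of how permissive a subnet-level rule is, the sketch below compares what a single subnet-to-subnet allow rule permits against what an app was actually observed to need. The subnets, addresses and observed flows are invented for illustration (they use documentation address ranges), and the helper name `permitted` is hypothetical.

```python
import ipaddress

# Hypothetical subnets, for illustration only.
app_subnet = ipaddress.ip_network("192.0.2.0/25")
db_subnet = ipaddress.ip_network("198.51.100.0/25")

# A coarse rule "app_subnet -> db_subnet" allows every host pair.
app_hosts = list(app_subnet.hosts())
db_hosts = list(db_subnet.hosts())
allowed_pairs = len(app_hosts) * len(db_hosts)

# Flows the app was actually observed to need (made-up values).
observed = {
    (ipaddress.ip_address("192.0.2.10"), ipaddress.ip_address("198.51.100.20")),
    (ipaddress.ip_address("192.0.2.11"), ipaddress.ip_address("198.51.100.20")),
}


def permitted(src, dst):
    """Would the coarse subnet-to-subnet rule let this pair through?"""
    return src in app_subnet and dst in db_subnet


# The coarse rule does cover everything the app needs...
covers_needs = all(permitted(s, d) for s, d in observed)

# ...along with every other pair nobody asked for.
unneeded = allowed_pairs - len(observed)
print(f"covers needs: {covers_needs}")
print(f"{allowed_pairs} host pairs allowed, {len(observed)} needed, "
      f"{unneeded} left open")
```

Even with two modest subnets, the gap between what is allowed and what is needed runs to thousands of host pairs, and that is before ports enter the picture.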
More importantly, our tried-and-true security process is demotivating: it typically involves several teams, none of whom are empowered to complete the project. Life in our siloed IT world is a maze of submitting change request tickets, waiting for ticket resolution, hurriedly debugging once we get our changes, opening more tickets, waiting some more, finger-pointing when management gets impatient, etc.
There has to be a way to bring the security and ops teams together for the good of the environment. In my next blog post, I’ll dig into the how.