Fitness Validation For Your Kubernetes Apps: Policy As Code

A hands-on coding journey to implement ‘Policy As Code’ and validate the fitness of your Kubernetes applications against cluster policies.

Arun Ramakani
8 min read · Jun 21, 2020

Policy as Code is the idea of keeping policy generation and validation under source control by expressing policies as code.

Why Policy As Code?

The policies we use to manage and organize our apps evolve continuously with the changing business and technology landscape. The landscape shifts in response to growth opportunities, competition, team maturity, technology disruptions, new engineering practices, compliance, and security threats.

Generally, when an application or its infrastructure is created as greenfield work, we plan and implement the policies well. But with manual policy governance, they tend to decay over time.

These policies are a mix of authorization rules, network policies, architecture characteristics, best practices, and operational concerns. They are contributed and validated by different stakeholders: developers, operators, security engineers, architects, and product owners. Multidimensional policies owned by many stakeholders make it challenging to keep the production ecosystem in sync with evolving policies.

With this high rate of change and need for collaboration, policies inherit many of the requirements we have for managing application code. And that triggers the need to manage policies as code.

Once we start managing policy as code, it opens up a lot of advantages. Some of them are listed below.

  • Easy to Collaborate
  • Sharable and Reusable
  • Auditable
  • Repeatable and Reliable
  • Revert & Rollback
  • Easy Debugging
  • Automated Governance

This article concentrates on policy validation as code for applications deployed in Kubernetes using “Open Policy Agent”. If you are interested in policy generation as code, have a look at the article “Helm Is Not Enough, You Also Need Kustomize”.

What is “Open Policy Agent”?

Open Policy Agent (OPA) is a general-purpose open-source policy engine that helps enforce policies across the stack. We can use OPA for:

  • Enforcing authorization in a microservice API
  • Validating the integrity of Kubernetes YAML
  • Policy verification with Infrastructure as Code (Terraform, Crossplane)
  • Applying access control for Docker
  • Applying access control on Linux machines

OPA fits into different stacks because of its architectural simplicity. OPA exposes a JSON API that takes a policy query as input and returns a policy decision as the response. In OPA, policies are written in Rego, a domain-specific language custom-built for the purpose. OPA also has an in-memory data store holding the data necessary to make decisions.

All we have to do is 1) write Rego policies, 2) populate the decision-support store, and 3) start making policy queries.
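As a sketch of step 1, a minimal standalone Rego policy could look like the following (the package name and input fields here are illustrative assumptions, not from any particular integration):

```rego
package httpapi.authz

# Deny by default; the decision is the value of "allow".
default allow = false

# Allow a user to read their own profile.
allow {
    input.method == "GET"
    input.path == ["profile", input.user]
}
```

A policy query (step 3) is then a JSON POST to OPA's Data API, e.g. `/v1/data/httpapi/authz/allow`, carrying the `input` document; the decision comes back as a JSON response.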

OPA can integrate with many more ecosystems, read more about these integrations here.

Gatekeeper: A Kubernetes OPA Integration

Gatekeeper is integrated into Kubernetes as an admission controller webhook. It executes Rego policies whenever a resource is created, updated, or deleted, to validate and authorize the change.

Admission controllers intercept all requests to the Kubernetes API server after the request is authenticated and authorized, but before the object is persisted to etcd. There are two types of admission controller: 1) validating admission controllers and 2) mutating admission controllers. While validating admission controllers can be used for policy validation, mutating admission controllers can be used for policy generation.

Gatekeeper currently implements only the validating admission controller and audit functionality. A mutating admission controller is planned for a future release.

The audit functionality gives a list of policy violations in existing Kubernetes resources that were created in the absence of Gatekeeper (either before the policy constraint was created or while Gatekeeper was down). This article focuses on Gatekeeper v3.0. It's important to note that v3.0 works differently from older versions. Let's have a look at how Gatekeeper fits into the Kubernetes architecture.

At first look, this might look complicated. Understanding a few concepts covered below will make it clear.

Policy Template & Policy Constraint CRDs

A Policy Constraint is a Custom Resource Definition (CRD) used by the cluster operator to define a set of rules that must be met before a Kubernetes resource is added, deleted, or modified. If you don't know what a CRD is: CRDs are a way to extend Kubernetes with your own custom objects. When you need more than the Kubernetes native objects like Pods, Deployments, Secrets, Ingresses, etc., you go for CRDs. Read more about CRDs here. Below is an example of a Policy Constraint where we declaratively specify that every Kubernetes Namespace resource must have a label named “owner”.

apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sRequiredLabels
metadata:
  name: ns-must-have-owner
spec:
  match:
    kinds:
      - apiGroups: [""]
        kinds: ["Namespace"]
  parameters:
    labels: ["owner"]
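Assuming this constraint (and its template, shown below) are installed in the cluster, a Namespace manifest like the first sketch here would be rejected at admission time, while the second would be admitted (names and label values are made up for illustration):

```yaml
# Denied: no "owner" label, so the constraint's deny rule fires.
apiVersion: v1
kind: Namespace
metadata:
  name: team-a
---
# Admitted: the required "owner" label is present.
apiVersion: v1
kind: Namespace
metadata:
  name: team-b
  labels:
    owner: platform-team
```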

A Policy Constraint is not a standalone unit; it depends on a Policy Template CRD. The Policy Template defines the Rego rules to be executed for the Policy Constraint, as well as the schema of the Policy Constraint CRD.

Policy Templates can be shared and reused across the organization and with the wider ecosystem. Gatekeeper ships a set of commonly used Policy Templates in its library folder; look there for a template that suits your need before writing a new one. Once the template CRDs are installed in the cluster, cluster operators can start adding constraints matching the schema defined in the template. Here is the Policy Template for the constraint given above.

apiVersion: templates.gatekeeper.sh/v1beta1
kind: ConstraintTemplate
metadata:
  name: k8srequiredlabels
spec:
  crd:
    spec:
      names:
        kind: K8sRequiredLabels
      validation:
        openAPIV3Schema:
          properties:
            labels:
              type: array
              items:
                type: string
  targets:
    - target: admission.k8s.gatekeeper.sh
      rego: |
        package k8srequiredlabels

        deny[{"msg": msg, "details": {"missing_labels": missing}}] {
          provided := {label | input.review.object.metadata.labels[label]}
          required := {label | label := input.parameters.labels[_]}
          missing := required - provided
          count(missing) > 0
          msg := sprintf("you must provide labels: %v", [missing])
        }

Some of the important parts of the above policy template and policy constraint YAMLs are explained here.

  • “spec.crd.spec.names.kind” in the policy template YAML defines the kind name of the policy constraint CRD.
  • “spec.crd.spec.validation.openAPIV3Schema.properties” in the policy template YAML is the input-parameter schema for the policy constraint CRD.
  • In the policy template YAML, “labels” is the only property defined in the policy constraint CRD spec. In the policy constraint YAML, the string array “["owner"]” is the input for “spec.parameters.labels”.
  • “spec.match.kinds” in the policy constraint YAML defines the list of Kubernetes resources against which the policy constraint is applied.
  • Finally, “spec.targets[].rego” in the policy template YAML holds the actual Rego code that is executed when the policy constraint runs.

Some Basics of Rego

Learning Rego is a big topic by itself. Here we will cover just the basics needed for the job.

Tip 1: We can have many rules defined in the rego section. The syntax for the rule block looks like this

rule_header[return_values] {
  rule_body
}

In the above example

  • deny — is the rule header
  • {“msg”: msg, “details”: {“missing_labels”: missing}} — is return values
  • Everything else within “{}” is the rule body

Tip 2: Whenever you see “input.”, it refers to the input passed to the Rego block. In the example, “input.review.object.metadata.labels” refers to the labels of the incoming YAML under review: “input.review.object” is the incoming YAML itself, and “metadata.labels” traverses to the labels object inside it. “input.parameters” in the Rego block refers to the input parameters we defined in the policy constraint.
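To make that concrete, here is a heavily trimmed sketch of what the input document could look like for the namespace example (only the paths used by the rule are shown; treat the exact shape Gatekeeper passes as an assumption):

```yaml
# input, as seen by the Rego rule (sketch)
review:
  object:
    apiVersion: v1
    kind: Namespace
    metadata:
      name: team-a
      labels:
        owner: platform-team
parameters:
  labels: ["owner"]
```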

Tip 3: Whenever we refer to “data.”, we are accessing the data stored in OPA's in-memory store for decision support. To access some data via “data.”, you must first sync that data into OPA. Why and how to do that is covered towards the end of the blog.

Tip 4: All statements inside the rule body are combined with a logical “and”. This means that as soon as one of the expressions evaluates to false, execution breaks, indicating that the whole block failed.

expression-1 AND expression-2 AND ... AND expression-N
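As an illustrative sketch (the rule and fields are made up for this example), both conditions below must hold for the rule to produce a result:

```rego
package example

# Fires only when the resource is a Pod AND it runs in the default namespace.
deny[msg] {
    input.review.object.kind == "Pod"                    # expression-1
    input.review.object.metadata.namespace == "default"  # expression-2
    msg := "pods are not allowed in the default namespace"
}
```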

Tip 5: We should treat rule writing as defensive programming. When execution breaks in the middle on a “false”, the rule produces no return value, and hence there is no violation information available to deny the resource admission.

# When the count of missing labels is greater than zero, execution continues as the output is true
# When the count of missing labels is not greater than zero, execution breaks, and the resource is admitted into the cluster
count(missing) > 0

Tip 6: When you need a logical “or”, use multiple rule blocks. Between rule blocks, a logical “or” is applied. This means that when we have multiple rule blocks, all of them are evaluated, and the admission fails if any one of the blocks executes to the end and returns values.

rule_header1[return_values1] {
  rule_body1
}

rule_header2[return_values2] {
  rule_body2
}

Tip 7: There are many built-in functions that help us write rule blocks quickly; “count” in the above code is an example. Look for the functions you need here. You can also build your own functions with the syntax below.

function_name(parameters) = return {
  function_body
}
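For instance, a small helper function (a sketch for illustration, not taken from the Gatekeeper library) that checks whether an object carries a given label could look like this:

```rego
package example

# has_label(obj, name) is true when obj has a label with the given name.
has_label(obj, name) {
    obj.metadata.labels[name]
}

deny[msg] {
    not has_label(input.review.object, "owner")
    msg := "resource is missing the owner label"
}
```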

There is more to Rego, as in other programming languages (arrays, imports, loops, dictionaries, virtual documents, regular expressions, etc.). If you are willing to deep dive, a detailed walkthrough is here.

Watch & Replicate

Sometimes we need access to the details of existing resources to make policy decisions. A typical example is enforcing a unique Ingress endpoint, for which we must be able to access the existing Ingress resources in the cluster. Technically, this is implemented with a sync resource that selectively watches and replicates data from etcd into OPA's in-memory store. Once replicated, this data can be accessed inside the policy Rego via “data.”. An example replication object, replicating Ingress and Pod resources, is described in the YAML below.

apiVersion: config.gatekeeper.sh/v1alpha1
kind: Config
metadata:
  name: config
  namespace: "gatekeeper-system"
spec:
  sync:
    syncOnly:
      - group: "networking.k8s.io"
        version: "v1beta1"
        kind: "Ingress"
      - group: ""
        version: "v1"
        kind: "Pod"
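Once synced, Gatekeeper exposes the replicated data under `data.inventory`. As a sketch of the unique-Ingress-host idea (the inventory paths follow Gatekeeper's documented layout, but treat the details as an assumption and compare with the `uniqueingresshost` template in the Gatekeeper library):

```rego
package k8suniqueingresshost

deny[{"msg": msg}] {
    # Host requested by the incoming Ingress
    host := input.review.object.spec.rules[_].host

    # Any already-synced Ingress, in any namespace, with the same host
    other := data.inventory.namespace[ns][apiVersion]["Ingress"][name]
    other.spec.rules[_].host == host

    # Ignore the object itself (relevant on updates)
    not same_object(other, input.review.object)

    msg := sprintf("ingress host %v conflicts with %v/%v", [host, ns, name])
}

same_object(a, b) {
    a.metadata.namespace == b.metadata.namespace
    a.metadata.name == b.metadata.name
}
```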

The Audit

The Gatekeeper admission controller is a webhook and might be down when we add/edit/delete resources, which can let non-standard resources make their way into the cluster. We may also have many existing non-standard resources that entered the cluster before the constraints were applied. We need a mechanism to identify these non-standard resources, and that's Gatekeeper's audit function. The audit functionality periodically verifies existing resources in the cluster and brings violations to our notice by appending them to the status field of the Policy Constraint CRD.

apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sRequiredLabels
metadata:
  name: ns-must-have-owner
spec:
  match:
    kinds:
      - apiGroups: [""]
        kinds: ["Namespace"]
  parameters:
    labels: ["owner"]
status:
  auditTimestamp: "2019-08-06T01:46:13Z"
  byPod:
    - enforced: true
      id: gatekeeper-controller-manager-0
  violations:
    - enforcementAction: deny
      kind: Namespace
      message: 'you must provide labels: {"owner"}'
      name: default

Conclusion

We are beginning to see Policy as Code getting more popular; it has grabbed a place in the innovators' quadrant of the InfoQ 2020 Architecture Trends report. With OPA being a CNCF-recognized general-purpose policy engine, learning OPA is a nice skill in the arsenal of a DevOps engineer 🏄.

See you in an upcoming article ✋.


#ContinuousDevOps #Kubernetes #Microservices #CloudNativeApps #DevOps #Agile