Creating custom policies in Terrascan (Argo Workflows example)

Creating a new policy

Creating a new policy for Terrascan is a pretty straightforward process. Terrascan itself abstracts away the details of parsing the IaC files, resolving variables, and more. It normalizes the configuration, then passes that configuration to the Open Policy Agent (OPA) runtime for evaluation. Different forms of IaC leverage specialized policies that understand the conventions of that IaC.

Terrascan stores all of its policies in the pkg/policies/opa/rego directory. There are subdirectories for different providers, which helps organize the policies.

When Terrascan is first run, or run with the init command, it will download the default set of policies from the rego directory described above in the Terrascan GitHub repo, into the local Terrascan configuration directory (usually $HOME/.terrascan). When adding new policies, it is usually convenient to use Terrascan's -p option to explicitly specify the policy directory from a local source rather than downloading the policies from the Terrascan repo.

A policy for Argo Workflows

As an example of how to create a new policy, let's say we want to create a policy to enforce image requirements for Argo Workflows. Argo workflows leverage containers, and we want to ensure that the containers used in our workflows use immutable image specifiers--meaning that we use specific image tags rather than depending on the default or latest image. Workflows that use mutable or unstable image specs run the risk of breaking due to changes in the upstream image, even when nothing has changed in the workflow itself.

Terrascan includes support for Kubernetes, but the built-in policies are mostly focused on standard Kubernetes objects rather than the CRDs used by Argo. Fortunately, Kubernetes configurations follow well-defined conventions, and Terrascan's Kubernetes support can easily handle CRDs.

An Argo workflow is defined in a configuration that might look something like this:

$ cat workflow.yml
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: hello-world-    # name of the workflow spec
spec:
  entrypoint: whalesay          # invoke the whalesay template
  templates:
  - name: whalesay              # name of the template
    container:
      image: docker/whalesay
      command: [cowsay]
      args: ["hello world"]
      resources:
        limits:
          memory: 32Mi
          cpu: 100m

Our policy will examine the configuration for containers that use mutable or unstable image specifiers, meaning that they reference no tag or they reference a mutable tag such as "latest".

Terrascan policies work through OPA, and OPA works with JSON data. Terrascan converts the configuration into a normalized JSON format before passing it to OPA for evaluation.

Building Terrascan policies is typically done in two phases:

  1. Building the Rego code for OPA in the Rego Playground
  2. Creating a policy in Terrascan that uses the Rego code

Building input data for OPA

In order to use the Rego Playground, we need to have the input data for OPA. Terrascan makes this easy with the --config-only option. In config-only mode, Terrascan will parse the input files as usual. But instead of scanning for violations, it will output the configuration which would be passed to OPA. In the directory where our Argo Workflows are defined, we can run the following to get the OPA input:

$ terrascan scan --config-only -o json -i k8s
{
  "kubernetes_workflow": [
    {
      "id": "kubernetes_workflow.hello-world-.default",
      "name": "hello-world-",
      "source": "/iac/workflow.yml",
      "line": 1,
      "type": "kubernetes_workflow",
      "config": {
        "apiVersion": "argoproj.io/v1alpha1",
        "kind": "Workflow",
        "metadata": {
          "generateName": "hello-world-"
        },
        "spec": {
          "entrypoint": "whalesay",
          "templates": [
            {
              "container": {
                "args": [
                  "hello world"
                ],
                "command": [
                  "cowsay"
                ],
                "image": "docker/whalesay",
                "resources": {
                  "limits": {
                    "cpu": "100m",
                    "memory": "32Mi"
                  }
                }
              },
              "name": "whalesay"
            }
          ]
        }
      },
      "skip_rules": null
    }
  ]
}
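
To get a feel for the shape of this data before moving to the playground, it can help to walk it in a few lines of Python. The sketch below is illustrative only: the function name iter_workflow_images is hypothetical (not part of Terrascan), and SAMPLE is a trimmed copy of the output above.

```python
def iter_workflow_images(opa_input):
    """Yield (resource_id, image) pairs from Terrascan's normalized k8s output.

    Hypothetical helper for illustration; not part of Terrascan.
    """
    for resource in opa_input.get("kubernetes_workflow", []):
        spec = resource.get("config", {}).get("spec", {})
        for template in spec.get("templates", []):
            # Not every template has to define a container.
            container = template.get("container") or {}
            if "image" in container:
                yield resource["id"], container["image"]


# Trimmed copy of the --config-only output shown above.
SAMPLE = {
    "kubernetes_workflow": [
        {
            "id": "kubernetes_workflow.hello-world-.default",
            "name": "hello-world-",
            "config": {
                "apiVersion": "argoproj.io/v1alpha1",
                "kind": "Workflow",
                "spec": {
                    "entrypoint": "whalesay",
                    "templates": [
                        {
                            "name": "whalesay",
                            "container": {"image": "docker/whalesay"},
                        }
                    ],
                },
            },
        }
    ]
}

if __name__ == "__main__":
    for resource_id, image in iter_workflow_images(SAMPLE):
        print(resource_id, image)
```

The path the loop follows (kubernetes_workflow, then config.spec.templates, then container.image) is the same path a Rego rule has to traverse to reach the image.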

Now that we have the input, we can open the Rego Playground in a browser and paste the OPA JSON into the "INPUT" pane toward the upper right of the page. We'll work on the Rego code in the left-hand pane, and can use the "Evaluate" button toward the upper right to evaluate the policy.

Build the Rego code

With the input data in place, we can focus on the Rego code. I won't go into the details of Rego here; the OPA docs provide a great introduction to OPA and include detailed documentation for Rego as well.

There are two important conventions relevant to Terrascan's use of the Rego code:

  1. The variable "input" will contain the JSON passed to the OPA engine.
  2. The Rego output should contain a list of resource ids that violate the policy.

First, let's clear the Rego pane and add the following code:

package accurics

workflowShouldUseImmutableImageRef[violations.id] {
  violations := input.kubernetes_workflow[_]
  violations.config.apiVersion == "argoproj.io/v1alpha1"
  hasUnstableImageRef( violations.config.spec.templates[_].container.image )
}

hasUnstableImageRef(image) {
  indexof(image, ":") < 0
}

This code defines a rule workflowShouldUseImmutableImageRef which returns the value of violations.id. Returning the .id member is important, because that tells Terrascan which resources violate the rule. Based on that id, Terrascan can provide more context about the problem, such as the source file name, line number, etc.

Note that the name of the rule was chosen to help explain the goal of the rule. In this case, we want our workflow images to use immutable refs so we chose this particular name for the rule. If we see a violation of the rule, the rule name itself gives us a pretty good clue about how to fix it.

violations starts with all of the elements in the input.kubernetes_workflow array (remember, input refers to our input JSON data from Terrascan). The rest of the rule establishes constraints that identify which elements represent violations of the policy that we are writing.

Since we're specifically looking for Argo workflows, we'll add a constraint that the elements need to have their config.apiVersion value set to "argoproj.io/v1alpha1". Any elements that don't satisfy this condition will be ignored. Finally, we want only the elements that contain a config.spec.templates.container.image member that satisfies the function hasUnstableImageRef.

The hasUnstableImageRef function simply checks whether the referenced image includes a : denoting a specific reference or tag. If no tag is present, the workflow will use the default tag for that image, which we consider unstable or mutable. Any resources still bound to violations at that point use such an unstable reference, and their ids are returned as violations.

Our input data uses the image docker/whalesay which is indeed unstable, so let's test out our policy by clicking the "Evaluate" button. Sure enough, our policy identifies a match:

Found 1 result in 199.772 µs.
{
   "workflowShouldUseImmutableImageRef": [
      "kubernetes_workflow.hello-world-.default"
   ]
}

The only problem is that this policy checks only whether a tag (any tag) is specified. If the image explicitly references an undesirable tag, we should flag that as a violation as well. To do so, we can add an alternative implementation of hasUnstableImageRef which looks for the undesirable tags. That might look something like this:

hasUnstableImageRef(image) {
  unstableRefs := [ "latest", "default" ]
  tagStart := indexof(image, ":") + 1
  tagStart > 0
  ref := substring(image, tagStart, -1)
  unstableRefs[_] == ref
}

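This focuses on images that include a tag, as indicated by the presence of the : separator. If a tag is present and it is either "latest" or "default", then the function returns true. Rego treats multiple definitions of the same function as alternatives, so hasUnstableImageRef now succeeds if either body matches. If we put it all together, our final Rego file looks like this: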

package accurics

workflowShouldUseImmutableImageRef[violations.id] {
  violations := input.kubernetes_workflow[_]
  violations.config.apiVersion == "argoproj.io/v1alpha1"
  hasUnstableImageRef( violations.config.spec.templates[_].container.image )
}

hasUnstableImageRef(image) {
  indexof(image, ":") < 0
}

hasUnstableImageRef(image) {
  unstableRefs := [ "latest", "default" ]
  tagStart := indexof(image, ":") + 1
  tagStart > 0
  ref := substring(image, tagStart, -1)
  unstableRefs[_] == ref
}

If we re-run the policy, we still find the violation, as expected. Try adding a tag to the image in the INPUT pane (around line 26), like "docker/whalesay:latest" or "docker/whalesay:v1". If the tag is "latest" or "default", you should continue to see the violation. For other tags, the violation should go away:

Found 1 result in 200.295 µs.
{
    "workflowShouldUseImmutableImageRef": []
}
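
The same two-branch check can also be sanity-tested outside the playground. The sketch below mirrors the Rego logic in Python, assuming the input shape shown earlier; the names has_unstable_image_ref, violating_ids, and sample_input are hypothetical and exist only for this illustration.

```python
UNSTABLE_REFS = ("latest", "default")


def has_unstable_image_ref(image):
    """Mirror of the two hasUnstableImageRef bodies in the Rego policy."""
    colon = image.find(":")        # like indexof: -1 when there is no ":"
    if colon < 0:
        return True                # first body: no tag at all
    ref = image[colon + 1:]        # like substring(image, tagStart, -1)
    return ref in UNSTABLE_REFS    # second body: tag is a known mutable one


def violating_ids(opa_input):
    """Return ids of Argo workflows with at least one unstable image ref."""
    ids = []
    for resource in opa_input.get("kubernetes_workflow", []):
        config = resource.get("config", {})
        if config.get("apiVersion") != "argoproj.io/v1alpha1":
            continue
        templates = config.get("spec", {}).get("templates", [])
        images = [
            t["container"]["image"]
            for t in templates
            if "image" in t.get("container", {})
        ]
        if any(has_unstable_image_ref(image) for image in images):
            ids.append(resource["id"])
    return ids


def sample_input(image):
    """Build a minimal input document around a single container image."""
    return {
        "kubernetes_workflow": [
            {
                "id": "kubernetes_workflow.hello-world-.default",
                "config": {
                    "apiVersion": "argoproj.io/v1alpha1",
                    "spec": {"templates": [{"container": {"image": image}}]},
                },
            }
        ]
    }


if __name__ == "__main__":
    for image in ("docker/whalesay", "docker/whalesay:latest", "docker/whalesay:v1"):
        print(image, violating_ids(sample_input(image)))
```

Like the Rego, this sketch splits on the first colon, so an image reference that includes a registry port would have the port read as part of the tag. That is a limitation of this simple check, not something either version handles.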

With our Rego operating properly, it is time to add it to Terrascan.

Adding a policy to Terrascan

Terrascan policies consist of two files:

  1. A JSON file which contains metadata about the policy such as severity, description, etc.
  2. A Rego file which contains the Rego code for the policy itself.

Get sample JSON metadata

For simplicity, and to ensure I define all appropriate fields, I'll start by downloading an existing policy (JSON) from the Terrascan repo into policy.json. We can modify that file to create our new policy. Note that it doesn't really matter which policy we choose. Since our Argo policy will be similar to a Kubernetes policy, I'll start with an arbitrary K8s policy.

$ curl -so policy.json https://raw.githubusercontent.com/accurics/terrascan/master/pkg/policies/opa/rego/k8s/kubernetes_pod/allow_privilege_escalation/accurics.kubernetes.IAM.1.json

Built-in Terrascan policies are typically built from templates and include a template_args member which is not relevant to us. We'll remove that from the JSON file for clarity, and edit the remaining fields as appropriate. We might end up with something like this:

{
    "name": "workflowShouldUseImmutableImageRef",
    "file": "policy.rego",
    "severity": "High",
    "description": "Images used in workflow templates should specify an immutable reference, not default or latest",
    "reference_id": "my.argo.policy.1",
    "category": "Argo best practices",
    "version": 1
}

The name member does double duty as the name of the rule in the Rego file whose value will be checked after evaluation, and the name used in the Terrascan output. We'll use workflowShouldUseImmutableImageRef, the same rule name that we used in the Rego code. With the metadata defined, now we just need to paste the Rego code from the Rego playground into the file policy.rego defined in the metadata.
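
Because the name in the metadata has to match a rule in the Rego file, a quick consistency check can catch a mismatch before running a scan. The following is a rough sketch, not a Terrascan feature: rule_names and metadata_matches_rego are hypothetical helpers, and the regular expression is a simple heuristic that only looks at identifiers starting in the first column, not a real Rego parser.

```python
import re

# Identifier at column 0 followed by "[", "(", "{", "=" or ":".
RULE_HEAD = re.compile(r"^([A-Za-z_][A-Za-z0-9_]*)\s*[\[\({=:]", re.MULTILINE)


def rule_names(rego_source):
    """Collect names that appear to start a rule or function definition."""
    return {match.group(1) for match in RULE_HEAD.finditer(rego_source)}


def metadata_matches_rego(metadata, rego_source):
    """True when the metadata's name refers to a rule found in the Rego text."""
    return metadata.get("name") in rule_names(rego_source)


METADATA = {
    "name": "workflowShouldUseImmutableImageRef",
    "file": "policy.rego",
}

REGO = """package accurics

workflowShouldUseImmutableImageRef[violations.id] {
  violations := input.kubernetes_workflow[_]
  violations.config.apiVersion == "argoproj.io/v1alpha1"
  hasUnstableImageRef( violations.config.spec.templates[_].container.image )
}

hasUnstableImageRef(image) {
  indexof(image, ":") < 0
}
"""

if __name__ == "__main__":
    print(sorted(rule_names(REGO)))
    print(metadata_matches_rego(METADATA, REGO))
```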

The files policy.json and policy.rego can be placed into the Terrascan policy directory populated by terrascan init, where they will be used automatically, or you can use Terrascan's -p option to point it at a specific policy directory.

If we add the JSON and Rego files to ./my-policy-dir/, we could use our new custom policy like this:

$ terrascan scan -o yaml -i k8s -p ./my-policy-dir/
results:
    violations:
        - rule_name: workflowShouldUseImmutableImageRef
          description: Images used in workflow templates should specify an immutable reference, not default or latest
          rule_id: my.argo.policy.1
          severity: High
          category: Argo best practices
          resource_name: hello-world-
          resource_type: kubernetes_workflow
          file: /iac/workflow.yml
          line: 1
    scan_summary:
        file/folder: /iac
        iac_type: k8s
        scanned_at: 2021-01-09 10:15:04.121056704 +0000 UTC
        policies_validated: 1
        violated_policies: 1
        low: 0
        medium: 0
        high: 1

Congratulations! You've created your first Terrascan policy. As you build a library of policies, you will probably want to keep them organized and applied to the right projects. You can leverage Terrascan options like -p or -c to specify which policies to use. -p specifies a local directory as shown above, and -c specifies a config file, which can in turn name the remote repository from which the policies will be cloned.

The Terrascan usage page includes more information about configuration options, notifications, server mode, and more.