Reflections on Declarative Configuration

Brian Grant · Published in ITNEXT · 8 min read · Mar 27, 2024

Are declarative configuration and Infrastructure as Code all they’re cracked up to be?

I have spent a lot of time working on declarative configuration in Kubernetes: early thoughts, kubectl apply, KRM, kustomize, Google Cloud Config Sync, kpt, porch, ... During that same time, declarative automation evolved in parallel inside Google, which has used declarative configuration extensively for many years, and Terraform emerged outside, as well as many other tools in this fragmented space.

What is declarative configuration, what is it good for, and how would I approach it?

Declarative configuration vs. Infrastructure as Code

First, note that I didn’t say Infrastructure as Code (IaC). Infrastructure as Code generally has two implications that get mixed together:

  1. Infrastructure configuration should be represented as code or a code-like format, including template markup syntax, configuration languages (e.g., HCL, CUE), general-purpose languages (e.g., TypeScript), and sometimes scripts. These formats typically support inline parameter substitution, field references, expressions, conditionals, and sometimes loops and functions.
  2. Configuration should be managed using the same tools and practices as code, including text files, source control, code reviews, builds, and CI/CD.

Neither is necessary nor sufficient for declarative configuration. Infrastructure as Code is not even necessarily declarative. That said, popular Infrastructure as Code tools, such as Terraform and Helm, are mostly declarative (except for provisioners, hooks, etc.).

Also note that I’m talking mostly about configuration of resources in Kubernetes and resources in cloud-like infrastructure systems, though it also largely applies to application configuration. Such resources are data structures, often with many attributes and substructures (lists, maps, unions).

Here’s a Terraform example:

resource "google_compute_global_forwarding_rule" "http" {
  project               = var.project
  count                 = local.create_http_forward ? 1 : 0
  name                  = var.name
  target                = google_compute_target_http_proxy.default[0].self_link
  ip_address            = local.address
  port_range            = var.http_port
  labels                = var.labels
  load_balancing_scheme = var.load_balancing_scheme
  network               = local.internal_network
}

Kubernetes supports standard operations (Create, Read, Update, Delete, List) and metadata (names, labels, annotations) on its resources. This enables (mostly) general-purpose read, modify, write operations on groups of them, which kubectl leveraged early on. The resource types and their schemas can be defined declaratively. In Terraform, the operations and schemas are defined in code on the client side through the provider framework. This framework enables fat clients to paper over quirky APIs with custom methods, unfortunate behaviors, inconsistent structure, and so on. Such a client can be very time consuming to write, however. More on that later.
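To make the point concrete, here is a minimal sketch (in Python, with dicts standing in for resource manifests) of what uniform verbs plus standard metadata buy you: the same read and write operations work across heterogeneous resource types, the way kubectl operates on any Kubernetes kind. The function names and store structure are hypothetical illustrations, not a real client library.

```python
# Minimal sketch: uniform CRUD-style verbs over heterogeneous resource types,
# enabled by standard metadata (names, labels) on every resource.

def select_by_label(store, key, value):
    """Generic read across ALL resource types, keyed on standard metadata."""
    return [r for r in store.values()
            if r["metadata"].get("labels", {}).get(key) == value]

def apply(store, resource):
    """Generic create-or-update: the same verb works for every type."""
    ref = (resource["kind"], resource["metadata"]["name"])
    store[ref] = resource
    return resource

store = {}
apply(store, {"kind": "Deployment",
              "metadata": {"name": "web", "labels": {"app": "shop"}}})
apply(store, {"kind": "Service",
              "metadata": {"name": "web", "labels": {"app": "shop"}}})

# One generic query spans both types, with no type-specific client code.
matched = select_by_label(store, "app", "shop")
```

In a Terraform-style model, by contrast, the equivalent of `apply` and `select_by_label` would be implemented per resource type inside each provider.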

Common use cases require multiple heterogeneous resources, often with common properties and cross references among them and to other resources.

So configuring resources with Infrastructure as Code entails writing, editing, and maintaining a large set of interlinked struct constructors. These constructors are sometimes executed at build time, but often at deploy time.

There are often two reasonably popular tools in a particular category at any given time, such as Chef and Puppet or Helm and Kustomize, and a long tail of others that never reach escape velocity. My impression is that the choice of format is frequently driven by ecosystems (e.g., Terraform, Helm), familiarity, and preference.

New configuration languages (AKA DSLs) generally have steep learning curves and initially smaller content, tool, and library ecosystems. I’ve also observed that many people responsible for authoring configuration don’t want to write, test, maintain, build, operate, upgrade, etc. configuration generation programs written in general-purpose languages. I’d be interested in data regarding whether that is changing with Pulumi and the CDKs (e.g., cdk8s.io). I’d guess it partly depends on how reusable the generation code is and how often it needs to be changed. It seems like a good fit where it can be embedded in a tool or framework. I was going to use sst.dev as an example, but it appears to be moving away from the AWS CDK and is rebuilding on top of Terraform providers, which are used in Pulumi and Crossplane as well as Terraform. I agree that targeting the resource layer is the way to go.

What is declarative configuration?

Declarative configuration expresses the desired state as opposed to the steps required to achieve it. It represents the goal. Declarative configuration enables focusing on the WHAT rather than the HOW. Fundamentally, the desired state is representable as serialized data. Since this data can be voluminous, it is often generated, such as using Infrastructure as Code tools.

Sometimes declarative configuration is called intent-based configuration, but I view intent as a higher level of abstraction or another form of simplification, rather than as inherently declarative versus imperative, and natural-language-based intent will likely soon be feasible.

Declarative configuration does not necessarily imply a higher-level abstraction is used. At the other extreme, it could represent all the desired attributes of the resources under management. This is sometimes called WET configuration, for Write Every Time, to contrast with DRY, or Don’t Repeat Yourself, which implies factoring out common attributes somehow. WET configuration can be thought of as the assembly-code-like output of Infrastructure as Code tools, or as raw Kubernetes YAML.
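The DRY-to-WET relationship can be sketched in a few lines. Here the DRY source factors out the attributes shared by all environments, and a render step expands it into the fully spelled-out WET output; the field names and environments are invented for illustration.

```python
# Sketch of DRY vs. WET: common attributes factored out (DRY), then
# expanded into the fully spelled-out per-environment output (WET).

COMMON = {"image": "registry.example.com/shop:v1.4", "replicas": 3}

def render(env, overrides=None):
    """Expand the DRY source into a WET, write-every-field manifest."""
    return {"name": f"shop-{env}", **COMMON, **(overrides or {})}

# The WET output repeats every attribute for every variant:
staging = render("staging", {"replicas": 1})
prod = render("prod")
```

The WET output is what tools like `helm template` or `terraform plan` ultimately operate on; the DRY source is what humans maintain.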

What are the capabilities enabled by declarative configuration?

Bulk operations

Declarative representations of resources describe groups of objects, nouns rather than verbs. The operations supported are standard across all of the types. This enables bulk operations, such as create or delete.

Some tools determine the most appropriate actions based on a comparison between the desired state and the current state. This capability is extremely useful in particular for reconciling updates to the desired state. No matter how the changes to the desired state were made, there is just one mutation operation, apply.
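The core of that comparison can be sketched as a simple diff between two maps of resources. Real tools (kubectl apply's three-way merge, terraform plan's dependency graph) are far more involved; this only shows how create, update, and delete operations fall out of a single declarative input.

```python
# Sketch: derive operations by comparing desired state with live state.

def plan(desired, live):
    """Given name -> resource maps, return the operations needed to converge."""
    ops = []
    for name, res in desired.items():
        if name not in live:
            ops.append(("create", name))      # declared but absent
        elif live[name] != res:
            ops.append(("update", name))      # declared but different
    for name in live:
        if name not in desired:
            ops.append(("delete", name))      # present but no longer declared
    return ops

desired = {"web": {"replicas": 3}, "db": {"replicas": 1}}
live = {"web": {"replicas": 2}, "cache": {"replicas": 1}}

# No matter how `desired` was edited, the tool derives the verbs itself.
operations = plan(desired, live)
```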

That model enables operation selection, orchestration, sequencing, error handling, retries, recovery from partial failure, and so on to be implemented in a reusable way by the tool rather than being manually scripted for each change.

Serialized representation of the desired state

A serialized representation of the desired state, even the simple, low-level WET variety, already enables a number of useful capabilities, such as:

  • automated repair of the live state from the desired state
  • state export/import
  • pre-deployment validation and policy enforcement
  • change diff review and approval
  • versioning and undo/rollback
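Pre-deployment validation and policy enforcement, for instance, become trivial once the desired state is plain serialized data: a check is just a function over that data, run before anything touches the live system. The specific policy below (a required ownership label) is an invented example.

```python
# Sketch: policy enforcement over serialized desired state, before deployment.

def check_required_label(resources, label):
    """Return the names of resources missing a required label."""
    return [r["metadata"]["name"] for r in resources
            if label not in r["metadata"].get("labels", {})]

resources = [
    {"metadata": {"name": "web", "labels": {"owner": "shop-team"}}},
    {"metadata": {"name": "db", "labels": {}}},
]

# Violations are caught while the change is still just data under review.
violations = check_required_label(resources, "owner")
```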

Sharing and reuse

A serialized representation also enables publication, distribution, and discovery of aggregations of resources for off-the-shelf solutions. However, in addition to packaging, the ability to easily adapt, or customize, the resources for a specific use case is also necessary.

Many tools, such as Helm and Terraform, address both packaging and customization, and also some amounts of orchestration and runtime state management (e.g., keeping track of the inventory of what was deployed). Infrastructure as Code packages generally have the following characteristics:

  • Packages are generally human readable and authorable
  • Package sources are typically managed as text files in version control
  • Dependencies are often statically bundled
  • Packages may be pushed to an artifact repository or object store for deployment or read directly from version control
  • Packages often include descriptions and other metadata for discovery

Such packages make it possible to encapsulate and reuse knowledge, and to embed required settings, recommended defaults, best practices, and organizational conventions, or even just to create distinct variants, such as for multiple environments or multiple teams.

How is declarative configuration beneficial?

Declarative configuration can be especially helpful to larger organizations using Kubernetes and/or cloud resources. Provisioning infrastructure is one of the main time sinks of platform teams. Declarative configuration enables desirable capabilities that aren’t built into most infrastructure systems, such as versioning, and can provide a common tooling layer across multiple heterogeneous systems, similar to unified APIs but without creating common abstractions across providers. Bulk operations enable management of aggregations of resources, which is likely attractive to users who need to manage large numbers of resources. Sharing and reuse make it easier to get started on similar scenarios, and variant construction makes it easier to repeat operations across multiple environments, regions, or teams.

Declarative configuration can also provide a foundation for other systems and tools, such as provisioning services.

Infrastructure as Code pitfalls

Infrastructure as Code has some common pitfalls in current practice.

Most Infrastructure as Code tools implement a one-way generation approach that assumes changes are only made through those tools (AKA exclusive actuation), and other changes to the live state are considered to be undesirable configuration drift. Drift can be hard to prevent in practice. Sometimes it can be caused by humans and sometimes by systems. GitOps can detect such drift quickly, but automated remediation of the drift can generally be only in one direction, to change the live state.
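One-way remediation under exclusive actuation can be sketched as follows: detected differences are resolved by overwriting live state from the source of truth, never by updating the source of truth from live state. The data structures are hypothetical, not a real GitOps controller.

```python
# Sketch of one-directional drift remediation under exclusive actuation.

def detect_drift(desired, live):
    """Names of resources whose live state diverges from the declared state."""
    return {name for name, res in desired.items() if live.get(name) != res}

def remediate(desired, live):
    """Push declared state back out; live-side edits are always discarded."""
    drifted = detect_drift(desired, live)
    for name in drifted:
        live[name] = dict(desired[name])  # overwrite the out-of-band change
    return drifted

desired = {"web": {"replicas": 3}}   # the declared source of truth
live = {"web": {"replicas": 5}}      # someone scaled up by hand
drifted = remediate(desired, live)
```

Note what is lost: the manual scale-up may have been a deliberate emergency fix, but the tool has no way to fold it back into the source of truth.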

The combination of off-the-shelf reuse, variant generation, the desire to factor out common attributes, and the desire to create higher-level abstractions drive configuration generator complexity. This can cause the configuration generators to be more difficult to write and maintain, while falling short of reusability and understandability goals. This has been called “artisanal automation”. It also obstructs most automated changes.

The inherent trade-off between flexibility and usability/simplicity often causes abstractions to erode, resulting in excessive parameterization (Terraform example) and reducing some of the desired benefits of abstraction.

Managing configuration in git incurs non-negligible toil (clone, branch, add, commit, tag, push, push tags, review, merge) when it is stored separately from application source code, and is coupled to application changes otherwise. Either way, it generally assumes humans in the loop. I am not alone in seeing disadvantages to storing configuration in source control.

Why do we have management-plane APIs, and how are they used? A bunch of tools and systems call them interoperably. The Kubernetes API surface is a good example of this, and supports multi-system interaction via watch. When we exclusively move management to an unspecified source of truth (e.g., some files in some directory in some git repo) in an inscrutable format rendered through an irreversible, opaque process, we’ve basically converted the management APIs to read-only APIs rather than read/write APIs.

Can declarative configuration be improved?

First, when declarative configuration is directly exposed to users, it is a user interface surface. Authoring and modifying configurations are thus UX problems, while generating configuration is an engineering challenge, so we need to combine UX and engineering approaches rather than applying engineering alone. For example, there are UX techniques for mitigating complexity other than abstraction, such as progressive disclosure. The IDP community is catching on to that.

Second, a key puzzle that needs to be solved is how to improve the UX economically for users, service providers, tool creators, plugin authors, and package authors. Most Infrastructure as Code tools so far have been relatively straightforward to build using decades-old concepts: a templating engine (e.g., jinja, envsubst) or simple language transpiler, a package manager, a plugin mechanism, a task graph, a build and deployment system (or an existing CI/CD system), a service catalog. These are the resource-type-independent parts. The surface area of the resource schemas across popular providers is very large, which makes the cost large for resource-type-specific code (e.g., provider plugins) and configurations (e.g., reusable Terraform modules). These are generally crowd-sourced to some degree and not directly monetizable.

Third, we need to tackle the areas where users spend the most time, which I’d guess is writing and changing configurations. I like that Firefly is focused on this part of the problem, starting with resources users already provisioned. There are also a number of recent infrastructure from code frameworks and tools, graphical design tools, and so on. I am encouraged to see uses of declarative configuration that aren’t just invocations of hand-written templates.

We sometimes still write assembly code or HTML, but usually we don’t. If declarative configuration were rarely handwritten, how would we change it?

Feel free to reply here, or send me a message on LinkedIn or X/Twitter, where I plan to crosspost this.

If you found this interesting, you may be interested in other posts in my Infrastructure as Code and Declarative Configuration series.


Written by Brian Grant

CTO of ConfigHub. Original lead architect of Kubernetes and its declarative model. Former Tech Lead of Google Cloud's API standards, SDK, CLI, and IaC.
