Terraform and ArgoCD in beautiful harmony

24 November, 2021 lmg

Challenges when combining multiple provisioning tools

Today, many teams are looking into Infrastructure-as-Code and other ways of improving the way they provision resources in an automatic and reproducible way. In our previous blog post, we discussed modern provisioning tools and concepts. Follow along as we dive into a specific use case where we combine Terraform and ArgoCD.

At Irori, we have recently explored a specific provisioning problem that we think is representative of the complexities of using multiple tools: how to propagate facts generated when provisioning cloud infrastructure with Terraform, to GitOps style Helm deployments managed by ArgoCD. On a more general level, such problems are likely to occur when you want to mix and match multiple provisioning tools, where the outputs of some of them are required as the input to others.

Even if you are operating within a single tool, there can be a need to divide the scope into multiple sections, perhaps managed by teams with different responsibilities (e.g. Network vs Kubernetes infrastructure). For Terraform, you could use the Remote State feature to read the outputs of a Terraform configuration with its own independent state, separate from the one you are currently managing. However, the documentation for that feature hints that such usage can be problematic: “…, we recommend explicitly publishing data for external consumption to a separate location instead of accessing it via remote state”. Things certainly don’t get easier when we mix tools altogether.
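For reference, reading another configuration’s outputs with the terraform_remote_state data source might look like the sketch below (the backend, bucket name, and output name are hypothetical placeholders, not taken from our actual setup):

```hcl
# Read the published state of a separate "network" configuration
# (assuming a GCS backend; bucket and prefix are hypothetical)
data "terraform_remote_state" "network" {
  backend = "gcs"
  config = {
    bucket = "my-tf-state-bucket"
    prefix = "network"
  }
}

# The other configuration's outputs are then available as attributes, e.g.:
# data.terraform_remote_state.network.outputs.kafka_broker_ips
```

This is exactly the pattern the documentation cautions against overusing, which is part of why a cleaner hand-off between tools is worth looking for.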

A concrete provisioning scenario

We work with many clients that use Kubernetes and Kafka, so we are highly interested in finding effective and stable ways of setting them up together. Terraform and ArgoCD are two very competent tools when it comes to provisioning cloud infrastructure and Kubernetes resources, but there can sometimes be an overlap in their capabilities.

Our specific motivating use case is:

  • We want to provision a Strimzi/AMQ Streams managed Kafka cluster on OpenShift running on GCP
  • We want to manage as much as possible with Infrastructure-as-Code and GitOps, to provide a clear, declarative view of the desired environment state in Git, managed with PR/code review flows
  • We want to use Terraform to manage Cloud Provider services (Network, DNS, Compute, etc)
  • We want to use ArgoCD to manage Kubernetes deployments, typically packaged as different Helm charts, with dynamic properties defined as Helm values
  • The Strimzi/AMQ Streams Helm charts need to know the specific LoadBalancer IP addresses to use for the Kafka cluster, or else we will be assigned dynamic ones, which could change during certain maintenance operations or if we reprovision from scratch
  • Terraform will provision static IP addresses from GCP that are preserved over time, and also configure the relevant Cloud DNS records
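To make the Terraform side of this concrete, reserving a static address and wiring up DNS might look roughly like the sketch below (resource names, region, and the managed zone reference are hypothetical):

```hcl
# Reserve a regional static external IP for the Kafka bootstrap address
resource "google_compute_address" "kafka_bootstrap" {
  name   = "kafka-bootstrap"
  region = "europe-north1" # hypothetical region
}

# Point a Cloud DNS A record at the reserved address
resource "google_dns_record_set" "kafka_bootstrap" {
  name         = "bootstrap.kafka.example.com."
  type         = "A"
  ttl          = 300
  managed_zone = google_dns_managed_zone.kafka.name # hypothetical zone
  rrdatas      = [google_compute_address.kafka_bootstrap.address]
}

# Expose the address so other parts of the provisioning can consume it
output "kafka_bootstrap_ip" {
  value = google_compute_address.kafka_bootstrap.address
}
```

The output at the end is the fact we need to somehow get into the Helm values consumed by ArgoCD.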

Provisioning scenario overview

Investigating solutions

The quick and dirty solution is to run Terraform once, inspect the generated IPs, and then update the Helm values file in Git before running ArgoCD. Clearly this is a dirty hack! It requires manual intervention and a two-step provisioning cycle. While we could opt to manipulate Git from a Terraform module, that is arguably even dirtier. From a more philosophical point of view, we would like the Git config to contain only the details that specify properties we care about in the environment. Imagine, for example, that you have two modules in Terraform where one depends on some output of the other. In that case, we don’t want a middle step where we save such values to Git – we want them automatically propagated by our provisioning tooling.

If we digress for a moment, it is worth considering that ArgoCD is a fairly new project, and that GitOps as a concept is quite new, which means that best practices on how tools should be best used are still developing. It seems people have different philosophies on what it means to do GitOps and what requirements this puts on the tools used. As an example, the currently most commented ticket in the ArgoCD project repo is about a feature to be able to separately manage Helm charts and values files. From our point of view, such a feature would be very valuable in order to achieve a clear distinction between versioned Helm charts as a kind of artifact, and the specific application of a chart in an environment (values file).

Anyway, back to the problem at hand. We did a deep dive on Google and brainstormed different alternatives:

  • A) Terraform updates the Git Repo
  • B) Using ArgoCD Parameter overrides, which apparently can also be written back to Git
  • C) Creating some kind of ArgoCD preprocessing plugin, that could potentially look up data in some external data store, or get data injected via the ArgoCD pod

None of the above alternatives seemed really clean, and we were about to try basic implementations of one or two of them when we discovered some deep wisdom of the ancients among the ArgoCD GitHub tickets. Not only had someone run into a similar problem – they also had a solution! Big thanks to @jcrsilva for the core of the approach below.

Final Terraform-ArgoCD integration solution

In short, the solution is to inject an extra Helm values file in ArgoCD, which is mounted via a ConfigMap into the ArgoCD pod:

resource "kubernetes_config_map" "environment_metadata" {
  metadata {
    name      = "environment-metadata"
    namespace = kubernetes_namespace.argo.metadata[0].name
  }

  data = var.environment_metadata
}

...

# Provide the values to the ArgoCD Terraform module like so:
environment_metadata = {
  "kafka-cluster-tf-environment.yaml" = yamlencode({
    # needs to map to the Helm chart value structure
    "kafka" = {
      "bootstrap_external_ip" = module.my_network_tf_module.kafka_bootstrap_ip
      "broker_external_ips"   = module.my_network_tf_module.kafka_broker_ips
    }
  })
}

And the corresponding values added for the ArgoCD Helm chart:

repoServer:
  volumes:
    - name: environment-metadata
      configMap:
        name: environment-metadata
  volumeMounts:
    - name: environment-metadata
      mountPath: /environment-metadata

Actually using this in your ArgoCD Application definition:

helm:
  valueFiles:
    - /environment-metadata/kafka-cluster-tf-environment.yaml
  values: |
    # your other Helm values
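To close the loop, the injected values then need to be consumed somewhere in the chart templates. A hedged sketch of what this could look like in a templated Strimzi Kafka resource is shown below – the value names match the ConfigMap above, but the listener layout is an assumption and depends on your Strimzi version:

```yaml
# Excerpt from a hypothetical templated Strimzi Kafka resource
listeners:
  - name: external
    port: 9094
    type: loadbalancer
    tls: true
    configuration:
      bootstrap:
        loadBalancerIP: {{ .Values.kafka.bootstrap_external_ip | quote }}
      brokers:
        {{- range $i, $ip := .Values.kafka.broker_external_ips }}
        - broker: {{ $i }}
          loadBalancerIP: {{ $ip | quote }}
        {{- end }}
```

This is where the tight coupling discussed below becomes visible: the ConfigMap written by Terraform must mirror exactly this value structure.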

So what did we actually achieve here?

  • Terraform Network output facts are automatically propagated and available to the ArgoCD Application deployment specifications
  • No manual intermediate step necessary
  • No two phase application of provisioning configuration
  • Git declarative description of the environment is free of “spam” details that we don’t really care about

But have we really achieved provisioning nirvana here? Are we strictly following the tenets of GitOps (whatever we imagine them to be)? Not really – some issues remain:

  • The Terraform configmap definition is tightly coupled with the specific Helm chart values structure where the facts are used
  • The propagation behavior in case of updated values is a bit unclear. It seems that we might at least need to restart the ArgoCD pod after any value is updated, and then we are not sure exactly how and when an ArgoCD sync might trigger to apply the new values.
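If a manual nudge does turn out to be necessary after updating the ConfigMap, restarting the repo server is at least a one-liner (this assumes the deployment name and namespace of a default ArgoCD installation):

```
kubectl -n argocd rollout restart deployment argocd-repo-server
```

After the restart, the remounted ConfigMap contents will be picked up the next time the affected Applications are rendered and synced.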

The second point here is probably not a big issue if we don’t expect these values to change a lot. The first point is however not great. We can imagine creating some extra abstraction layer that maps Terraform outputs to a specific Helm values structure. Perhaps an ArgoCD plugin can do some preprocessing. Maybe even just an init container with some templating engine. But that is an exercise for another day 🙂

Final notes

In conclusion, implementing GitOps strictly can mean some extra work and a little extra complexity, but we believe it is worth it to enjoy the benefits of a well-defined desired state in Git, managed by your typical review/dry-run/merge workflows, at scale within larger teams.

Thanks for reading! Hopefully you found some of this interesting. Stay tuned for more deep dives into other interesting tech topics.

Author:

Elias Norrby
Solution Architect

 

Björn Löfroth
Infrastructure Architect