Tools and practices for cloud infrastructure provisioning

Introduction to provisioning

This article is the result of the Irori learning track, where we gather co-workers to explore new patterns and technologies, continuously sharpen our skills and expand our horizons. This is the first of two posts on the topic of provisioning; the next one will focus on a multi-tool provisioning scenario.

Today’s development teams have an ever-increasing need for automated infrastructure provisioning to support the application development process. They need an effective, repeatable and unified process for deploying cloud infrastructure and services reliably and with a predictable outcome. Such a process requires tools that support multiple cloud providers and kinds of service deployment. In addition, many different development teams must be able to collaborate on the infrastructure while sharing a common view of the state of their environments.

Along the way, you are likely to need multiple tools to manage the different stages and aspects of your deployments. For infrastructure, for example, you might turn to Terraform, while other aspects of a deployment may call for other tools. However, it’s not always obvious how these tools should interact with each other, and their capabilities sometimes overlap.

Terraform and GitOps

While many tools can help you provision infrastructure as code, such as Ansible or cloud-provider-specific libraries, Terraform has emerged as a popular choice. From our point of view, its main strengths are:

  • A large ecosystem of provider plugins covering a wide range of cloud and on-premises infrastructure provisioning needs
  • A consistent declarative way of specifying the intended state, with tooling to propagate information about provisioned resources as inputs for other resources
  • Thoughtful features that make it easier to work on the same infrastructure as a team, such as shared remote state
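To illustrate the second point, here is a minimal Python sketch (not Terraform’s implementation) of how a declarative tool can order resources so that each one is created after the resources whose outputs it consumes. The resource names and the `apply_order` and `provision` helpers are hypothetical, and the example needs Python 3.9+ for `graphlib`.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set

# Hypothetical resources, each mapped to the resources whose outputs it consumes:
# the subnet needs the VPC id, the cluster needs the subnet id, and so on.
DEPENDENCIES: Dict[str, Set[str]] = {
    "vpc": set(),
    "subnet": {"vpc"},
    "cluster": {"subnet"},
    "dns_record": {"cluster"},
}


def apply_order(dependencies: Dict[str, Set[str]]) -> List[str]:
    """Return an order in which every resource comes after its dependencies."""
    # TopologicalSorter expects {node: predecessors} and raises CycleError on cycles.
    return list(TopologicalSorter(dependencies).static_order())


def provision(dependencies: Dict[str, Set[str]]) -> Dict[str, Dict[str, str]]:
    """Simulate creating each resource, passing upstream outputs in as inputs.

    Returns the inputs each resource received, keyed by resource name.
    """
    outputs: Dict[str, str] = {}
    received: Dict[str, Dict[str, str]] = {}
    for name in apply_order(dependencies):
        # Every dependency is already in `outputs` thanks to the topological order.
        received[name] = {dep: outputs[dep] for dep in sorted(dependencies[name])}
        # A real provider would call a cloud API here; we fabricate an id instead.
        outputs[name] = f"{name}-id"
    return received
```

With these dependencies, `provision(DEPENDENCIES)["cluster"]` is `{"subnet": "subnet-id"}`: the cluster only sees the subnet’s output because the subnet was "created" first.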

Once you start using Terraform in a team, you need a version control system (VCS) to share code between colleagues and teams, and with that, an agreed way of working with the code. Adopting the mindset that the code is the single source of truth is usually helpful, as it lets you decide whether a given set of resources should be created or destroyed.
It also means that all changes are made in the code, not through a GUI or a CLI.

The next hurdle you’ll likely encounter is that someone has checked out the code, changed it locally and applied it to the infrastructure. If they then forget to check in their changes, Terraform will report differences the next time you run it from your machine, because your copy of the code no longer matches what was applied.
One option is simply to revert your colleague’s changes by applying your version of the code, assuming you have agreed that the VCS is the source of truth. You will probably be less quick to do that if the changes affect production infrastructure.
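The scenario above boils down to comparing desired state with recorded state. The following Python sketch is a toy model, not how Terraform actually computes a plan: `YOUR_CHECKOUT` stands for the code on your machine and `SHARED_STATE` for what the colleague’s unpushed apply left behind. Both dictionaries, their contents and the `plan` helper are hypothetical.

```python
from typing import Dict, List, Tuple

# Resource address -> attributes. All names and values are illustrative.
State = Dict[str, Dict[str, str]]

# What your (outdated) checkout declares.
YOUR_CHECKOUT: State = {"bucket.logs": {"versioning": "off"}}

# What the shared state records after a colleague applied local, unpushed changes.
SHARED_STATE: State = {
    "bucket.logs": {"versioning": "on"},
    "bucket.tmp": {"versioning": "off"},
}


def plan(desired: State, current: State) -> List[Tuple[str, str]]:
    """Return the (action, address) pairs that would turn `current` into `desired`."""
    actions: List[Tuple[str, str]] = []
    for address in sorted(desired):
        if address not in current:
            actions.append(("create", address))
        elif desired[address] != current[address]:
            actions.append(("update", address))
    for address in sorted(current):
        if address not in desired:
            actions.append(("delete", address))
    return actions


if __name__ == "__main__":
    # Applying "your" plan would silently undo the colleague's work.
    for action, address in plan(YOUR_CHECKOUT, SHARED_STATE):
        print(action, address)
```

Here the plan is to update `bucket.logs` and delete `bucket.tmp`, which is exactly the colleague’s work being rolled back.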

So how can we avoid getting into this situation?

Another helpful principle to adopt is GitOps. A pipeline that automatically applies code from a repository ensures that only Terraform code added to the codebase through pull requests gets applied to the infrastructure, provided the pipeline is the only place where changes are applied.
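A minimal sketch of such a pipeline step in Python, assuming (hypothetically) that pull requests are merged into a branch called `main` and that the CI system passes the branch name in. Feature branches only produce a plan; the saved plan is applied only on `main`. The `terraform init`, `plan -out=...` and `apply <planfile>` invocations are standard Terraform CLI usage; the function names and branch convention are illustrative.

```python
import subprocess
from typing import List

MAIN_BRANCH = "main"  # assumption: pull requests are merged into "main"


def pipeline_commands(branch: str) -> List[List[str]]:
    """Return the Terraform commands a CI job would run for a push to `branch`."""
    commands = [
        ["terraform", "init", "-input=false"],
        # Save the plan so the apply step executes exactly the planned changes.
        ["terraform", "plan", "-input=false", "-out=tfplan"],
    ]
    if branch == MAIN_BRANCH:
        # Applying a saved plan file does not prompt for interactive approval.
        commands.append(["terraform", "apply", "-input=false", "tfplan"])
    return commands


def run_pipeline(branch: str) -> None:
    """Execute the commands, stopping the job at the first failing one."""
    for command in pipeline_commands(branch):
        subprocess.run(command, check=True)
```

Keeping the command list in a pure function makes the gating logic easy to test without invoking Terraform. The gate only helps if people and other systems lack the credentials to apply changes themselves.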

While embracing GitOps solves a lot of issues, it comes with its own set of challenges. There are cases where, although an end goal is technically achievable, the means to do so would violate one or more core GitOps principles. These issues often crop up in the interface between different tools or services, such as when an output generated by Terraform is needed as an input to another system, when an Ansible inventory is dynamically provisioned by another tool, or when a Helm chart deployed by ArgoCD must have some values overridden.
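To make the first of these cases concrete: `terraform output -json` prints each root module output as an object with `sensitive`, `type` and `value` fields. The Python sketch below flattens that JSON and maps selected outputs onto nested Helm value keys. The output names, the dotted-key mapping and the helper functions are hypothetical, and feeding the result into a deployment outside Git is precisely the kind of shortcut that can conflict with GitOps principles.

```python
import json
import subprocess
from typing import Any, Dict


def read_outputs(workdir: str) -> str:
    """Run `terraform output -json` in `workdir` and return the raw JSON text."""
    result = subprocess.run(
        ["terraform", "output", "-json"],
        cwd=workdir, capture_output=True, text=True, check=True,
    )
    return result.stdout


def terraform_outputs(raw_json: str) -> Dict[str, Any]:
    """Flatten {name: {"sensitive", "type", "value"}} into {name: value}."""
    return {name: entry["value"] for name, entry in json.loads(raw_json).items()}


def helm_overrides(outputs: Dict[str, Any], mapping: Dict[str, str]) -> Dict[str, Any]:
    """Place each mapped output under a dotted Helm key such as "database.host"."""
    values: Dict[str, Any] = {}
    for output_name, dotted_key in mapping.items():
        *parents, leaf = dotted_key.split(".")
        node = values
        for part in parents:
            node = node.setdefault(part, {})  # create intermediate levels as needed
        node[leaf] = outputs[output_name]
    return values
```

For example, a hypothetical `db_endpoint` output mapped with `{"db_endpoint": "database.host"}` ends up as `{"database": {"host": ...}}`, ready to be written to a values file.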

ArgoCD for Kubernetes GitOps deployments

Doing everything with Terraform is a path easily taken, but one you will probably regret once you move beyond what Terraform is built to do (provision infrastructure). Multiple posts have been written about the pain of managing a wide spectrum of resources, such as Kubernetes resources, with Terraform (see recent example). An alternative is to find a set of tools that each do one thing well, in line with the Unix philosophy. Trying to stay true to GitOps principles while still solving real problems can lead to almost philosophical discussions. In an upcoming post, we’ll dive into a specific problem we encountered and how we solved it.

While Terraform technically has plenty of tooling for Kubernetes and Helm resources, we believe ArgoCD is far better tailored to the specific needs of GitOps for Kubernetes deployments. Check out our previous blog posts about GitOps and ArgoCD.

Apart from its focus on Kubernetes, the main difference between the ArgoCD approach and the Terraform usage pattern is that ArgoCD is an operator: it continuously monitors the state of the Kubernetes cluster and detects, and can automatically correct, deviations from the desired state specified in Git. Terraform, by contrast, reconciles state only when an “apply” is triggered, typically from a CI/CD pipeline.
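The difference between the two models can be sketched in a few lines of Python. This is a deliberately simplified toy, not how ArgoCD or Terraform are implemented: `apply_once` stands for a pipeline-triggered apply, while `reconcile` keeps comparing the desired state (read through a hypothetical `read_desired` callback standing in for Git) with the live cluster and corrects any drift it finds. The `max_cycles` bound only exists to keep the example finite.

```python
import time
from typing import Callable, Dict

State = Dict[str, str]


def apply_once(desired: State, cluster: State) -> None:
    """Pipeline-style: make the cluster match `desired`, once, when triggered."""
    cluster.clear()
    cluster.update(desired)


def reconcile(read_desired: Callable[[], State], cluster: State,
              interval_s: float, max_cycles: int) -> int:
    """Operator-style: repeatedly compare desired and live state, fixing drift.

    Returns how many corrections were made during `max_cycles` iterations.
    """
    corrections = 0
    for _ in range(max_cycles):
        desired = read_desired()  # e.g. the manifests in the Git repository
        if cluster != desired:
            apply_once(desired, cluster)
            corrections += 1
        time.sleep(interval_s)
    return corrections


if __name__ == "__main__":
    git_state: State = {"replicas": "3"}
    live_cluster: State = {"replicas": "5"}  # someone scaled the deployment by hand
    print(reconcile(lambda: git_state, live_cluster, interval_s=0.0, max_cycles=3))
    print(live_cluster)
```

Running it prints `1` and then `{'replicas': '3'}`: the manual change is reverted on the first cycle without anyone triggering a pipeline, whereas with `apply_once` the drift would persist until the next apply.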

You can read more about how Terraform and ArgoCD work together in the upcoming post.

Jonas Boström – Cloud Architect
Jonathan Kyrklund – Infrastructure Engineer