Dhall-terraform


(Ollie Charles) #1

Hi all,

I’ve recently been learning about Terraform as an alternative to managing resources with NixOps. My rough idea is to change our ops at work to use Terraform to manage raw resources, and then have NixOS manage what is on the machines. There will be some glue to stick all of this together, but that’s further down the line. This presents two problems:

  1. How do I write the Terraform configuration files?
  2. How do the machine configurations learn about the data produced by Terraform? E.g., telling our API server the IP of our PostgreSQL server.

Point 2 is solvable but not the focus of this post; this is really just about the first point. Being a Dhall enthusiast, my preference is to write the Terraform configuration in Dhall. As HCL is compatible with JSON, the obvious solution is to just use dhall-json to compile Dhall to JSON. This isn’t all that bad, but the one problem I ran into was the classic “well-typed graph” problem.
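For example (a minimal sketch; the resource body matches the example further down), a record like

  { resource =
      { aws_instance =
          { ec2-machine = { ami = "ami-2757f631", instance_type = "t2.micro" } }
      }
  }

compiles via dhall-to-json directly to the nested resource JSON that Terraform accepts.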

Terraform represents resources as nodes of a graph, and allows arbitrary edges between them to encode dependencies. But interestingly, the data that you can reference is not the same as the data that you define. For example, an AWS Elastic IP produces a public IP, but that’s not something you can define… yet you are most certainly interested in using this value in other configurations!

The approach I immediately explored takes us out of pure Dhall and requires a special dhall-terraform executable. The idea is that a Terraform configuration is actually a function from its resulting resources to their configuration. Here’s a small example:

  λ(resources : { elastic-ip : { id : Text }, ec2-machine : { id : Text } })
→ { elastic-ip =
      { instance = resources.ec2-machine.id }
  , ec2-machine =
      { ami = "ami-2757f631", instance_type = "t2.micro" }
  }

This is the configuration for an EC2 machine with an Elastic IP. Note the type of this expression:

  ∀(resources : { elastic-ip : { id : Text }, ec2-machine : { id : Text } })
→ { elastic-ip :
      { instance : Text }
  , ec2-machine :
      { ami : Text, instance_type : Text }
  }

So a configuration can depend on the ultimate result of itself! This is where dhall-terraform comes in - to provide this recursive definition.
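Concretely, the executable would tie the knot by applying the configuration function not to real values but to a record of Terraform interpolation strings that it synthesizes. Something like this (the file name is hypothetical):

  ./config.dhall
    { elastic-ip = { id = "\${aws_eip.elastic-ip.id}" }
    , ec2-machine = { id = "\${aws_instance.ec2-machine.id}" }
    }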

I almost have a working proof of concept of this, but before sharing that I wanted to take a little step back and ask:

  • What are your pain-points with Terraform? My work here is meant to solve the annoyance of naming things by having name resolution “in” Dhall - but it just moves a typing phase out of terraform apply and into Dhall. That is to say, either way the name checking happens, so have I really solved a problem? I think I have, because going forward we can actually type these final data definitions (e.g., ec2-machine.id could have the opaque type EC2ID)

(Alex Humphreys) #2

Hey there,

I was looking at this problem recently as well. That id value of resources.ec2-machine.id is only known to Terraform after it runs, and can’t be specified in the JSON configuration that is passed to Terraform. So I’m not sure resources.ec2-machine.id can be resolved to the id, unless the dhall-terraform executable is checking the value with terraform show or something and then passing the value back to the Dhall configuration? At least I think that’s the case; maybe I’ve misunderstood your approach?

I’ve been playing around with how to solve this, using dhall to create json that is passed to terraform. I came up with something like this for handling references:

let ref
    : Text -> Text -> Text -> Text
    = \(prefix : Text) -> \(suffix : Text) -> \(name : Text)
      -> "\${${prefix}.${name}.${suffix}}"

let AwsInstance = { arn : Text -> Text }

let awsInstance : AwsInstance = { arn = ref "aws_instance" "arn" }

in  awsInstance.arn "my-instance-name"

which would evaluate to the Text '${aws_instance.my-instance-name.arn}', which Terraform would then be able to look up (with the proper escaping, of course). Granted, this isn’t particularly type safe, as there’s nothing stopping one from writing the expression let awsInstance = { arn = ref "aws_instance" "foo" }, which would obviously be invalid. I’m hoping that could be mitigated when https://github.com/dhall-lang/dhall-lang/issues/224 is standardised; then that ref function could be constrained to only use valid values.

Also, my approach would fail for references where Terraform expects something that isn’t a string, as that ref function only returns Text, but a reference could point at an exported value that’s a list of strings, a bool, etc. I could maybe make every field accept a union of “the type of value Terraform wants” or “a reference as a string”, but that’d get super verbose for the 90% of cases where you’re not passing a reference. I haven’t had any good ideas here yet.
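One shape that union could take (a sketch with made-up names; dhall-to-json encodes a union literal as whatever value it wraps):

  let StringsOrRef = < Value : List Text | Ref : Text >

  -- the common case: a literal value of the type Terraform wants
  let literal = StringsOrRef.Value [ "sg-12345" ]

  -- the reference case, rendered as an interpolation string
  let reference = StringsOrRef.Ref "\${aws_security_group.mine.id}"

  in  [ literal, reference ]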

As for the pain points you asked about with Terraform: most of mine already have open issues filed about them, so I’m hopeful this should improve over time.

I like your idea of using Terraform to provision cloud resources and then using Nix to configure them. I was hoping to play around with that soon too, but haven’t had a chance yet, so I’d be interested to hear how you get on. Just wondering, have you seen https://github.com/dhall-lang/dhall-nix? Maybe it can help with sharing values between the dhall-terraform stuff and the Nix configs?


(Ollie Charles) #3

Ah, I should have been clearer about what dhall-terraform is doing. It is actually only trying to produce an HCL file to give to Terraform, much like generating a JSON configuration with dhall-json. So in the given example, the “output” of the whole process is just to generate HCL like:

resource "aws_eip" "elastic-ip" { instance = "${aws_instance.ec2-machine.id" }
resource "aws_instance" "ec2-machine" { ... }

So I don’t need to know what id is, just that someone was trying to refer to it. This means I then compile the Dhall expression to a Terraform variable reference.

This is only if you want to produce multiple files. My approach so far has been to split everything into separate Dhall files and then compose them into a single Dhall expression with imports. Then I pass this to dhall-json, which resolves all the imports and gives me a single JSON file as a result.
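For example (file names hypothetical), the top-level file is then just:

  { resource =
      { aws_eip = { elastic-ip = ./elastic-ip.dhall }
      , aws_instance = { ec2-machine = ./ec2-machine.dhall }
      }
  }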


(Ari Becker) #4

Our current approach is not to replace the HCL with Dhall at all, but to replace the tfvars files with ones that are generated by dhall-to-text. For example, generating our state_backend.tfvars, which is then used by terraform init -backend-config=state_backend.tfvars:

let Environment = ./redacted : Type

in    λ(env : Environment)
    → λ(project : Text)
    → ''
      bucket = "${env.deploy.state_bucket.tf_name}"
      key    = "state-files/${project}/terraform.tfstate"
      region = "${env.deploy.state_bucket.region}"
      ''

and before the terraform init invocation, calling dhall-to-text <<< './state_backend.tfvars.dhall ./redacted/SomeEnvironment.dhall "some-project"' > state_backend.tfvars

This approach achieves most of the benefits of integration with our Dhall-based configuration, preserves developer/editor tooling that exists for mainline Terraform HCL (since main.tf, variables.tf, and outputs.tf are all written in normal HCL), and does not require generating types for any Terraform provider. To me, it seems like win-win.

What I’d be more interested in seeing is something that takes an outputs.tf file, generates a type for it, and takes the output of terraform output -json and generates a Dhall record that matches the type of outputs.tf. For now, we’re maintaining a separate TerraformOutput.dhall : Type which has to be updated by hand whenever the real Terraform outputs change, plus a script which tediously generates such a Dhall expression in a non-generic way.
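For what it’s worth, I suspect the conversion half is close to a one-liner, assuming json-to-dhall from the dhall-json package (the field names here are hypothetical). Given the hand-maintained type

  -- TerraformOutput.dhall
  { api_ip : Text, db_hosts : List Text }

something like terraform output -json | jq 'map_values(.value)' | json-to-dhall ./TerraformOutput.dhall should produce the matching record; the genuinely missing piece is deriving that type from outputs.tf itself.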


(Gabriel Gonzalez) #5

My view is that a dhall-to-terraform integration should be “idiomatic”, meaning that it translates Dhall idioms into their analogous Terraform idioms.

For example, the Terraform analog of a Dhall function is a module: the module’s variables are the function’s inputs, the module’s outputs are the function’s outputs, and its resources are the function’s “effects” (i.e. analogous to IO or Writer [Resource]). So I would expect a Dhall function like this:

  λ(inputs : { x : Natural, y : Natural })
→ { resources = [] : List Resource, outputs = { z = inputs.x + inputs.y } }

… to translate to this Terraform module:

variable "x" {
  type = number
}

variable "y" {
  type = number
}

output "z" {
  value = "${x + y}"
}

I’m oversimplifying a bit, but that’s intentional. Mapping a functional programming language onto Terraform means that some Terraform idioms might not be encodable in Dhall, similar to how compiling a structured programming language to assembly means that you can no longer program using goto. For example, in Terraform variables are global, meaning that a variable declared in one module can be accessed within another module, whereas if you restrict yourself to Dhall compiled to terraform you might need to thread variable values explicitly between modules via their inputs/outputs (i.e. pure functional programming embedded within Terraform).

However, once you view a Terraform module as a function, then terraform apply becomes exactly analogous to function application in Dhall: you are applying the module (function) to its variables (function arguments). Similarly, calling a parametrized child module and accessing its outputs is function application again.
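Spelled out (file names hypothetical), that child-module call might look like:

  let network = ./network.dhall   -- a "module": inputs → { resources, outputs }

  let net = network { cidr = "10.0.0.0/16" }

  in  ./app.dhall { subnet_id = net.outputs.subnet_id }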

That might be a bit of an extreme way to approach modeling this integration, but I’m throwing the utopian idea out there to get people to view this problem as doing more than just trying to reduce Terraform configuration boilerplate.


(Arian Van Putten) #6

Note that HCL2 is going to add some more Dhall-like features, such as being able to map over lists.

Anyhow, personally I’ve tried to avoid as much Terraform-specific functionality as possible.
So instead of count parameters I use Dhall’s map function, instead of variables I use Dhall env imports, and instead of modules I use Dhall functions. Honestly I think HCL the way Terraform uses it is not well designed (and even Terraform agrees, and is working on a new version of HCL because of it). Modules are not intuitive to me at all. However, they did so much work for us by writing all these providers that it would be stupid not to use them.
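For example, here is roughly how a count becomes a map and a variable becomes an env import (a sketch: the names are made up, and it leans on dhall-to-json turning an association list into a JSON object):

  let Prelude = https://prelude.dhall-lang.org/package.dhall

  let Instance = { ami : Text, instance_type : Text }

  -- an env import instead of a Terraform variable
  let ami = env:BASE_AMI as Text

  -- instead of count = 3, map over a list of names
  let mkInstance =
        \(name : Text) ->
          { mapKey = name, mapValue = { ami = ami, instance_type = "t2.micro" } }

  in  { resource =
          { aws_instance =
              Prelude.List.map
                Text
                { mapKey : Text, mapValue : Instance }
                mkInstance
                [ "web-1", "web-2", "web-3" ]
          }
      }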

HCL has a JSON backend for a reason, and that reason is specifically to be a code-generation target for people who want something other than HCL. Hence I try to use as many Dhall idioms as possible to reduce boilerplate, instead of Terraform idioms.

One downside is that other Terraform modules cannot call your Dhall module… but at least you can still call existing Terraform modules from your Dhall one, so there’s that.

An interesting side note, by the way: Terraform providers are just standalone binaries that implement a local RPC interface, so in theory they can be used without Terraform. I’ve been dabbling with the idea of reusing the providers in NixOps or a similar tool, without the Terraform frontend.


(Ollie Charles) #7

This makes sense, but to me it seems like a big part of Terraform is about building the dependency graph and sharing the created resource data with other resource definitions. That’s why my dhall-terraform idea tries to “close the loop”, and gives you the tools to build that graph directly in Dhall. Where do you see that fitting in? I’d be interested in seeing an expansion of your example that covers the Terraform tutorial of creating an Elastic IP and assigning it to an EC2 instance, for example.

I will have to read up on Terraform modules; all I’ve really used so far is resource definitions. It seemed like anything beyond that was about making Terraform easier to use, but I don’t really care about that if I’m going through Dhall. I consider just resources to be the primitives - variables can come from either function arguments to my Dhall expression, or env imports.


(Arian Van Putten) #8

FYI: in my experiments last time I stumbled upon a Gist that converts a Terraform provider schema into a JSON Schema describing exactly how the JSON values should look. A Terraform provider schema also details what the resulting attributes are, so that’s useful.
https://github.com/dhall-lang/dhall-kubernetes is basically a glorified jsonschema-to-dhall converter. I think if we rip out that core functionality, we can quite easily build something without too much manual code. That would mean we don’t have to painstakingly write Dhall types for each provider, but can instead generate them with a code generator.

To answer your question @ocharles: I simply used a dirty, stringly-typed approach that wasn’t checked whatsoever. I think a dhall-terraform executable that feeds in the results is a neat idea. Actually, NixOps works the same way: first the module is evaluated, then resources are created, and then the module is evaluated again with the resources’ IPs etc. injected into it.


(Alex Humphreys) #9

I wrote a ruby script that makes terraform aws resources in dhall here: https://github.com/advancedtelematic/dhall-terraform-aws-provider

It’s using the schema provided by https://github.com/minamijoyo/tfschema, but if you could reuse the “jsonschema-to-dhall”-ness of dhall-kubernetes, that would be pretty sweet.


(Ari Becker) #10

  What I’d be more interested in seeing is something that takes an outputs.tf file, generates a type for it, and takes the output of terraform output -json and generates a Dhall record that matches the type of outputs.tf. For now, we’re maintaining a separate TerraformOutput.dhall : Type which has to be updated by hand whenever the real Terraform outputs change, plus a script which tediously generates such a Dhall expression in a non-generic way.

I went ahead and wrote something to address this. It turned out to be simpler than I thought it would be. It currently works for string and list outputs, and supports generating the Text, Natural, and List Text types.

@ocharles fyi