Execute local files at import?

lisael · May 8, 2020, 1:42pm

There’s a feature I find handy in Ansible, it’s the way it manages the executable files : if a configuration file has the exec bit, ansible would run the file and use the stdout result as if it’s the content of the configuration. It’s often use to generate dynamic inventories, and I, for myself, use it to store secrets in GPG encrypted files (long story short: this whole idea popped while I was trying to figure the best way to securely add secrets in a Dhall configuration).

So the proposal here is to make Dhall run local executable files at import. I understand that this proposal defeats the whole not-turing-complete thing, but I’d argue that it pushes the impure, turing-complete stuff at the boundaries, much like the JSON import solution in #121. Speaking of JSON import, let me show a code example that solves #121 using the executable import feature:

in myconfig.yaml (this requires that your /usr/bin/env support the -S option):

#! /usr/bin/env -S yaml-to-dhall '{a: Natural, b: Bool}' --file
a: 42
b: true

then in my dhall files:

let a = assert : ./myconfig.yaml === { a = 42, b = True }

Tadam! the user defined parsers are just executable files that output valid Dhall.

For security purpose, there could be limitations such as not interpreting *.dhall files or freezed imports.

Any thoughts ? Is it irrevocably deemed way too impure to even think about it ?

Gabriel439 · May 8, 2020, 3:55pm

@lisael: I can’t speak for others since the Haskell implementation is only one vote on the standard, but I view interpreting arbitrary executables as too unsafe.

One of the tensions I see in the language evolution right now is that Dhall is currently getting pulled in two different directions:

An ergonomic programming language with a lightweight import system
A configuration language with an emphasis on language security

I place a higher priority on language security because Dhall will never be best-in-class for the first use case (an ergonomic language with lightweight imports). Nix is going to be the best in that category for the foreseeable future and will only grow in popularity as time goes on. This issue is compounded by the fact that Dhall does not have bidirectional type inference; maybe we will in the feature, but by then it will likely be too late to be the leader in this category.

However, Dhall is far and away the leading language in the language security category; nothing else even comes close. This is why I’m reluctant to sacrifice safety guarantees, because we’d be compromising our position of strength.

In my experience, you only gain mainstream adoption if you are best in class in for some use case, so I wouldn’t want to give up on the language security use case unless I was reasonably confident that we could trade it for being best-in-class for an even broader use case.

lisael · May 8, 2020, 9:42pm

To be honest, I wasn’t expecting another answer. Thank you for the detailed rational behind your opinion (you lost me at “bidirectional type inference”. Given that it pops a lot in GH issues and here, it looks like a big deal. I think I have to read about it a bit.)

However, it’s fairly easy to hack this feature right now. I could start a local web server that run executable files and then import from http://localhost/myconfig.yaml or mount a FUSE fs that runs myconfig.yaml on read, allowing to import this file directly, and so on.

These solutions, count on the infinite imagination of users, will exist in the wild, defeating the total-ness of Dhall. I’m not convinced that a forked process is really more unsafe in terms of security. An implementation could, for instance, fork as early as possible a dedicated worker for unsafe stuff and send messages via a socket to alleviate the risk of secrets leaks. This solution looks a lot like a buitlin version of the web server hack I mentioned, isn’t it?

I’m not trying to convince anyone, I genuinely want to understand the theory behind this kind of radical no-go.

ari-becker · May 9, 2020, 9:06am

@lisael For what it’s worth, the idiomatic way to represent secrets in a Dhall configuration is to parameterize them and provide them at runtime.

For example, given config.dhall:

\(secret: Text) -> { field = secret }

to get a final, sensitive config.json:

export FIELD='super-secret-password'
dhall-to-json <<< "./config.dhall (env:FIELD as Text)" > config.json # produces { "field" : "super-secret-password" }

Secrets should be easy to rotate, but configurations should stay the same while the secrets change over time. In most configuration languages, that’s pretty difficult to pull off safely, but Dhall makes it easy - the hash of config.dhall stays the same the whole time.

ari-becker · May 9, 2020, 9:43am

lisael:

However, it’s fairly easy to hack this feature right now. I could start a local web server that run executable files and then import from http://localhost/myconfig.yaml or mount a FUSE fs that runs myconfig.yaml on read, allowing to import this file directly, and so on.

These solutions, count on the infinite imagination of users, will exist in the wild, defeating the total-ness of Dhall. I’m not convinced that a forked process is really more unsafe in terms of security. An implementation could, for instance, fork as early as possible a dedicated worker for unsafe stuff and send messages via a socket to alleviate the risk of secrets leaks. This solution looks a lot like a buitlin version of the web server hack I mentioned, isn’t it?

I’m not trying to convince anyone, I genuinely want to understand the theory behind this kind of radical no-go.

I remember when I tried to hack around the limitations when I was new to Dhall - you may be interested in @Gabriel439’s reply: https://github.com/dhall-lang/dhall-lang/issues/370#issuecomment-462809687

The safety guarantees of a language like Dhall are everything to its heavy users. Choosing a language with strong safety guarantees, for us, means that we pay an additional upfront safety cost for each new feature we build, in exchange for drastically easier and more confident maintainability over the long term. It makes our maintenance confident. That has a direct business impact in reducing the number of engineers who need to be employed to develop and maintain the system over time.

At work, this sprint we’re going to be upgrading upgrading Kubernetes underneath hundreds of servers, most of them databases, upgrading Kafka, switching the host OS, in multiple environments/VPCs… with three engineers. With more automation in non-configuration-related parts, it could be done with one. It would be completely impossible without Dhall and its safety guarantees.

If the language were to lose its safety, the language would make it easier (not possible, because it’s not currently impossible, see the link above, but easier) for us to shoot us in the foot. Murphy’s Law promises that we would shoot ourselves in the foot. If we shot ourselves in the foot, then we’re going to have to build in more verification and safety mechanisms - ideally code-based, but Murphy promises it’ll be human approval. Everything would slow down, and I’d wonder why I’m risking my career on off-the-beaten-path tooling that’s slowing me down instead of making my life easier.

What’s the theory behind this kind of radical no-go? Just because something can be built, doesn’t mean that it should. I value Dhall’s focus, sheer discipline, and uncompromising quality, because it translates into huge benefits for us.