Supporting transitive imports from private repositories?

This is a continuation of “Deep imports and environment variables” and “Import chaining headers”.

tl;dr let’s say I have two private GitHub repositories, timbertson/base and timbertson/app. Base is… some shared stuff, and app builds upon it.

When I import timbertson/base/package.dhall from timbertson/app/package.dhall, I will need to pass auth headers including my $GITHUB_AUTH environment variable.

That all works fine when I’m importing locally, but comes unstuck when I want to import app/package.dhall from GitHub directly: I can’t evaluate the header expression, since a remotely-imported file can’t access any environment variables.
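
For reference, ./github-headers.dhall (used in the fallback below) is just a Dhall expression that builds the auth header from that environment variable. A rough sketch, assuming the mapKey/mapValue headers schema and GitHub’s token authorization scheme:

-- github-headers.dhall (sketch; the header name and token format are assumptions)
[ { mapKey = "Authorization"
  , mapValue = "token ${env:GITHUB_AUTH as Text}"
  }
]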

One fallback which works in both local and remote contexts (locally the first import succeeds; when imported from GitHub, the relative fallback resolves against the importing URL instead) is:

-- base-impl.dhall
  (https://raw.githubusercontent.com/timbertson/base/COMMIT_ID/package.dhall using ./github-headers.dhall)
  ? ../../../base/COMMIT_ID/package.dhall

and then importing that (to freeze it):

-- base.dhall
./base-impl.dhall sha256:(...)

But it’s got quite a few usability issues:

  • I can’t test the (local) path is right without actually pushing it and importing the result
  • I need to repeat the COMMIT_ID in both parts, and keep the paths in sync
  • it confuses tooling (e.g. I have a script to bump a commit for GitHub imports, and it’ll only understand the first import)
  • I need two different files for a single import
  • the error messages for fallbacks tend to be more confusing, since one of the two errors is “expected” but users aren’t always aware of which one or why. And even when they are it’s still a screenful of error message to wade through.

So I’m wondering if there’s a modification to the “same origin” policy which could support a single import working in both contexts.

One idea would be that for a remote file (imported from the web), any transitive imports with the same origin implicitly use the same headers (even if it’s not a relative import). I think it would also need to ignore explicit headers in the case where a remote import happens to contain a fully-qualified import for the same origin, which is definitely a little surprising.

I imagine headers are only really used for auth in practice though, so this slightly surprising behaviour is likely to be the most useful option. I don’t believe it weakens Dhall’s security posture, since it only affects behaviour which is already within the same origin (and could already be achieved with relative imports).

I was a little confused by the 2nd-to-last paragraph. This part I agree with:

One idea would be that for a remote file (imported from the web), any transitive imports with the same origin implicitly use the same headers (even if it’s not a relative import).

Specifically, I agree with the idea that we should permit forwarding headers even for non-relative imports as long as we know it shares the same origin (i.e. by comparing the domain).

However, I didn’t understand this part because it sounded like it was saying the opposite:

I think it would also need to ignore explicit headers in the case where a remote import happens to contain a fully-qualified import for the same origin, which is definitely a little surprising.

Yeah that is a bit confusing. What I meant is, say you have this file:

-- package.dhall
https://example.com/dependency.dhall using ./headers.dhall

Generally, the value of ./headers.dhall is used for the request for dependency.dhall. And that would remain true when this file is imported from the filesystem, or from any other origin.

But I’m proposing that in the case where the current file was imported from the same origin (https://example.com), the ./headers.dhall expression would not be evaluated at all. It would be ignored, and instead the headers that were used to import this file (https://example.com/package.dhall) would be used to fetch dependency.dhall.

The rationale being that ./headers.dhall references $GITHUB_TOKEN, and that won’t evaluate if it’s a remote import.
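
Concretely, ./headers.dhall is something along these lines (a sketch, using toMap for the headers list):

-- headers.dhall (sketch): it reads the token from the environment, so it can only
-- be evaluated locally; once package.dhall is fetched from https://example.com
-- this expression can no longer be evaluated
toMap { Authorization = "Bearer ${env:GITHUB_TOKEN as Text}" }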

i.e. it’s not enough to say “propagate the same headers for a request to the same origin”. We also need to say “don’t even attempt to evaluate a using expression for an import with the same origin (because it will likely fail)”.

This makes me wonder if we’re operating at the wrong abstraction layer. It seems like what might work better is some Dhall mapping from domain names to custom headers, rather than specifying that information in-band within the configuration files. In other words, something like a Dhall analog of a ~/.netrc file:

toMap
  { `raw.githubusercontent.com` = ./github-headers.dhall
  }

I believe @philandstuff proposed an idea like this a while back as a way to simplify the standardization logic around custom headers.

Yeah, I agree with that - the information really relates to the host far more than any particular import.

If we’re spitballing, I think the dotfile makes sense, but it would also be great to be able to override it with an environment variable. We have workflows that require shuffling auth tokens over e.g. docker invocations, and mounting a file at ~/.config/dhall/netrc or something is definitely more awkward than being able to export a DHALL_NETRC environment variable.


Yeah, I might suggest something like this pseudo-Dhall code for resolving the custom headers file:

  env:DHALL_HEADERS
? "${env:XDG_CONFIG_HOME as Text}/dhall/headers.dhall"  -- Not valid Dhall
? ~/.config/dhall/headers.dhall

@Gabriel439 / @philandstuff, do you folks think this is worth supporting? What needs to happen to progress this? I assume it’d need to be a standard change, not just a change to the Haskell implementation?

Open questions (with my assumptions nested):

  • Do we keep the inline `expression using headers` formulation? How do the two interact?
    • we do, and they are merged, with the inline version taking precedence on key conflicts (see the sketch after this list)
  • How do we specify headers? AFAIK this is the first Dhall config file; presumably it’s written in Dhall?
  • What is the type, and do we need to consider future changes to the type?
    • it’s a map of origin to headers, i.e. Map Text (Map Text Text), and we don’t need to support type evolution
  • Are there any special rules for interpretation?
    • it’s treated as any other dhall file on disk
  • Are we happy with the web’s origin rules (e.g. you can leave off the port number if it’s the default)? Or should we make the port mandatory to make it easier to implement?
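
To illustrate the assumption in the first question (just my expectation, not something that’s standardised anywhere): given a config-file entry for an origin plus an inline using expression on an import from that origin, I’d expect the headers to merge like this:

-- from the headers config file, for this origin (hypothetical values):
toMap { Authorization = "token from-config", Accept = "application/dhall" }

-- from an inline `using` on the import itself:
toMap { Authorization = "token from-inline" }

-- expected headers on the actual request:
--   Authorization: token from-inline   (inline wins on conflict)
--   Accept: application/dhall          (config-only keys are kept)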

@timbertson: My guess is that we will eventually want to get rid of the using headers feature, but not immediately since we want to have a long enough grace period for people to migrate to the new way of doing things. Part of the reason I suggest eventually getting rid of the old using headers feature is that it’s pretty difficult to implement correctly and is one of the most complicated parts of the import-related standard logic.

I do think that the headers file should be a Dhall file. The main issue I can foresee with making it a Dhall file is that we’d have to standardize that the configuration file does not itself have any remote imports (otherwise it would trigger an infinite loop of the headers file trying to use itself to resolve its own remote imports). It’s doable, though, and not that hard to standardize.

I think the type should be what you suggested (i.e. Map Text (Map Text Text)). Ideally the type should be fixed, which is why I’d prefer a Map over a record.
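
For concreteness, a value of that type would look something like this (the hostname and header contents are illustrative, and whether the key should also include the scheme or port is discussed below):

-- headers.dhall (sketch), of type Map Text (Map Text Text)
toMap
  { `raw.githubusercontent.com` =
      toMap { Authorization = "token ${env:GITHUB_AUTH as Text}" }
  }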

The main special rule for interpretation is that it should forbid remote imports.

Also (and the standard is not very clear in this regard), when interpreting a Dhall expression you always need to specify a root import that transitive imports are resolved relative to. For example, if you interpret a Dhall file then the root import is the file itself, and if you interpret an expression that is a string on stdin then the Haskell implementation sets a root import of the current directory. In the case of resolving the headers file, the root import would be wherever the headers configuration was originally found (e.g. env:DHALL_HEADERS, "${env:XDG_CONFIG_HOME as Text}/dhall/headers.dhall", or ~/.config/dhall/headers.dhall).
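
For example, if the headers configuration is found at ~/.config/dhall/headers.dhall then that file is the root import, so a relative import inside it resolves against that directory:

-- ~/.config/dhall/headers.dhall (sketch): ./github-headers.dhall here resolves to
-- ~/.config/dhall/github-headers.dhall
toMap { `raw.githubusercontent.com` = ./github-headers.dhall }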

Could you clarify what you mean by the origin rules? It’s not clear how the headers file could infer which port to use from the scheme since it only operates in terms of hostnames.

Looking at your earlier example, it seems like you are proposing a map of headers per hostname. I was assuming we’d use web browsers’ notion of an “origin”, which is protocol + hostname + optional port.

So my question was, do we support both of these as equivalent:

-- explicit port:
toMap { `https://github.com:443` = toMap { Authorization = " ... " } }

-- implicit port 443 due to https protocol:
toMap { `https://github.com` = toMap { Authorization = " ... " } }

Or, do we:

  • require a port explicitly in each key (simple but a little tedious)
  • disallow ports (and protocol?) entirely, and instead treat the hostname as the origin, as I think you originally suggested.
    • I’d rather not, since that doesn’t match a web browser’s notion of an origin, and could cause bad things like sending auth tokens over http.

Oh, I think the simple thing to do would be to require the port to be explicit, at least initially. If we decide to support an implicit port we can always add that on later as a non-breaking change.

For anyone following along, this is being standardised in https://github.com/dhall-lang/dhall-lang/pull/1192 and I’m working on an implementation in dhall-haskell here: https://github.com/dhall-lang/dhall-haskell/pull/2236