Proposal: only canonicalize imports when chaining

Currently, the standard requires you to canonicalize imports at import resolution time. This means that, for example, a remote import of https://example.com/foo/./bar should result in a web request to https://example.com/foo/bar, and this behaviour is exercised by the tests/import/success/unit/asLocation/RemoteCanonicalize* tests.
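To make the required behaviour concrete, here is a minimal Go sketch of segment-wise canonicalization of a URL path. `canonicalizePath` is a hypothetical helper, not the standard's normative definition and not dhall-golang code. It drops `.` segments, lets `..` cancel the ordinary segment before it, and keeps unmatched leading `..` segments, which matches how the current standard is described later in this post.

```go
package main

import (
	"fmt"
	"strings"
)

// canonicalizePath is a hypothetical helper sketching segment-wise
// canonicalization of an absolute URL path.
func canonicalizePath(path string) string {
	segments := strings.Split(strings.TrimPrefix(path, "/"), "/")
	out := make([]string, 0, len(segments))
	for _, seg := range segments {
		switch {
		case seg == ".":
			// "." contributes nothing, so it is dropped.
		case seg == ".." && len(out) > 0 && out[len(out)-1] != "..":
			// ".." cancels the previous ordinary segment.
			out = out[:len(out)-1]
		default:
			// Ordinary segments, and ".." with nothing to cancel, are kept.
			out = append(out, seg)
		}
	}
	return "/" + strings.Join(out, "/")
}

func main() {
	fmt.Println(canonicalizePath("/foo/./bar"))     // /foo/bar
	fmt.Println(canonicalizePath("/foo/../../bar")) // /../bar
}
```

Under the current rule, an implementation would fetch the path produced by a function like this rather than the path exactly as the user wrote it.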

I would like to change this behaviour: I think we should only canonicalize when chaining imports, not for all import resolution. I have two reasons for this:

  1. it is much easier for me to implement in dhall-golang
  2. it feels more correct

Easier to implement

Currently, dhall-golang only canonicalizes at chaining time, not at resolution time. This is easier because Go’s net/url package only exposes what we call “canonicalization” through its URL.ResolveReference method. There is no URL.Canonicalize() method.

More correct

Now, I’m in danger of motivated reasoning here (see previous section), but I think it is more correct to only canonicalize when chaining, not at all times when resolving.

I’d argue that my proposal is more correct from two points of view:

  • it’d be unusual for a user to write a URL containing dot segments in their Dhall source, but if they do, I’d expect them to know what they’re doing, and the URL should be respected as written
  • RFC3986, which defines the process of removing dot segments in URLs, only does so in the context of resolving a relative reference (i.e. what we call “import chaining”). This also explains the behaviour of the Go standard library described above.
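For reference, here is a Go transcription of the remove_dot_segments procedure from RFC3986 section 5.2.4, which the RFC applies to the merged path while resolving a relative reference. `removeDotSegments` is a hypothetical name for illustration, not a library API. The output buffer is kept as a list of segments (each with its leading "/", if any), so "remove the last segment and its preceding /" becomes a single pop.

```go
package main

import (
	"fmt"
	"strings"
)

// removeDotSegments follows the input/output buffer loop of RFC3986 5.2.4.
func removeDotSegments(input string) string {
	var out []string // output buffer, one entry per moved segment
	pop := func() {
		if len(out) > 0 {
			out = out[:len(out)-1]
		}
	}
	for input != "" {
		switch {
		case strings.HasPrefix(input, "../"): // rule A
			input = input[3:]
		case strings.HasPrefix(input, "./"): // rule A
			input = input[2:]
		case strings.HasPrefix(input, "/./"): // rule B: "/./x" -> "/x"
			input = input[2:]
		case input == "/.": // rule B
			input = "/"
		case strings.HasPrefix(input, "/../"): // rule C: "/../x" -> "/x"
			input = input[3:]
			pop()
		case input == "/..": // rule C
			input = "/"
			pop()
		case input == "." || input == "..": // rule D
			input = ""
		default: // rule E: move the first segment to the output
			start := 0
			if input[0] == '/' {
				start = 1
			}
			end := strings.IndexByte(input[start:], '/')
			if end == -1 {
				end = len(input)
			} else {
				end += start
			}
			out = append(out, input[:end])
			input = input[end:]
		}
	}
	return strings.Join(out, "")
}

func main() {
	fmt.Println(removeDotSegments("/a/b/c/./../../g"))   // /a/g
	fmt.Println(removeDotSegments("mid/content=5/../6")) // mid/6
	fmt.Println(removeDotSegments("/../../g"))           // /g
}
```

The first two inputs are the worked examples from the RFC. The third shows that excess leading `..` segments are simply discarded.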

A minor related point is that our import chaining doesn’t quite match RFC3986 reference resolution, and the test RemoteCanonicalize4A.dhall demonstrates how: we don’t strip leading .. segments from URLs, but we should.
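For comparison, Go's net/url follows RFC3986 here: when a relative reference climbs above the root of the base URL, `URL.ResolveReference` drops the excess `..` segments. The sketch uses a hypothetical `chain` helper and illustrative URLs, not the contents of RemoteCanonicalize4A.dhall.

```go
package main

import (
	"fmt"
	"net/url"
)

// chain is a hypothetical helper that resolves a relative import against
// the URL of the importing file using net/url.
func chain(base, relative string) (string, error) {
	b, err := url.Parse(base)
	if err != nil {
		return "", err
	}
	r, err := url.Parse(relative)
	if err != nil {
		return "", err
	}
	return b.ResolveReference(r).String(), nil
}

func main() {
	// The base path /a.dhall has no directory to climb out of, so both
	// ".." segments are discarded rather than kept.
	got, err := chain("https://example.com/a.dhall", "../../b.dhall")
	if err != nil {
		panic(err)
	}
	fmt.Println(got) // https://example.com/b.dhall
}
```

An implementation that keeps unmatched leading `..` segments would presumably request a path like /../../b.dhall here instead.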

Thoughts?

I don’t have strong opinions on this, but my general rule of thumb is to prefer whatever solution is easier for people to implement. It sounds like what you propose is easier for your implementation and it doesn’t really change much for the Haskell implementation (since it’s not relying on a URI package to handle this resolution). I’m basically waiting for other implementations to weigh in on whether this is more or less complex for them.

Well, the Rust uri library automatically canonicalizes everything at construction time, so I cannot implement your proposal (and I was planning on altering the standard to fix RemoteCanonicalize4A.dhall).
That said, does it make any observable difference? Local caching requires canonicalization, but we could easily drop that; I don’t expect it to make a noticeable difference. The whole point of RFC3986 is that a web server shouldn’t care whether we use a canonicalized URL or not, right? Can we leave it to the implementation to decide? Since the Rust uri library canonicalizes everything, I think it’s fair to assume that other libraries in the wild do the same, and thus it shouldn’t matter.

Thanks, that’s useful info. I’ve also realised I can force the Go library to canonicalise by resolving the empty relative reference, so I think after all we can leave the standard as is.
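The empty-reference trick can be sketched as follows. `canonicalize` is a hypothetical wrapper name; the only library call doing the work is net/url's `URL.ResolveReference`, given an empty `&url.URL{}` as the reference. This relies on a detail of Go's implementation, which runs the base path through dot-segment removal even when the reference is empty.

```go
package main

import (
	"fmt"
	"net/url"
)

// canonicalize parses raw and resolves the empty relative reference
// against it, which leaves scheme and host alone but cleans the path.
func canonicalize(raw string) (string, error) {
	u, err := url.Parse(raw)
	if err != nil {
		return "", err
	}
	return u.ResolveReference(&url.URL{}).String(), nil
}

func main() {
	for _, raw := range []string{
		"https://example.com/foo/./bar",      // -> https://example.com/foo/bar
		"https://example.com/foo/../bar?x=1", // -> https://example.com/bar?x=1
	} {
		c, err := canonicalize(raw)
		if err != nil {
			panic(err)
		}
		fmt.Println(c)
	}
}
```

Because the empty reference has no query of its own, the original query string is carried over and only the path changes.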
