Platform-specific import types

TimWSpence · April 14, 2020, 11:34am

A discussion we’re currently having around our JVM implementation https://github.com/travisbrown/dhallj is how to handle imports from the JVM classpath. When running on the JVM you basically have two parallel filesystems - the local filesystem and the classpath and imports are potentially ambiguous between the two.

We’ve thought of a few options for solving this and thought we’d seek feedback here.

Option 1: a (very small) extension to the spec

We would allow the user to write let import = cp:/absolute/path/to/import in ... This uses very similar syntax to env imports and requires the user to specify an absolute path to avoid classloader-related ambiguity. I believe the semantics would be pretty much the same as an absolute path local import (including referential sanity), just with a different mechanism for resolving the content of the file.

Option 2

No extension to the spec. We would require the user to pass some kind of discriminating function
prefix : Path -> FileSystem Obvious advantage is that it doesn’t require extending the spec. Disadvantages are the user experience and the fact that it doesn’t really help you if the prefixes are the same on both local and classpath filesystems (unlikely though this may be!)

Option 3

Something else we have’t thought of! Suggestions?

philandstuff · April 14, 2020, 12:26pm

My gut feeling is that option 1 is better here. It doesn’t sit well with me having the same syntax for two different kinds of import, and then using some hack later on to actually decide what kind of import it really is.

Note that you don’t need to change the spec to allow this: I think a conforming implementation can have extensions, such as this one, which would otherwise be a syntax error. But it doesn’t make sense for non-JVM implementations to support classpath imports.

Gabriel439 · April 14, 2020, 3:14pm

I’d also lean towards Option 1 (except with the minor suggestion to use classpath for the scheme). The main reason why is that code written to use the classpath import can be made compatible with non-JVM implementations by resolving the imports (i.e. like running the code through dhall resolve)

Another thing that will help is to treat the imports as “local” (for the purpose of the referential sanity check), that way people won’t use them in shared packages that are imported remotely. My mental model is “local imports = application code” and “remote imports = library code”, so classifying classpath imports as local will help restrict them to application code, which will limit potential compatibility issues.

philandstuff · April 14, 2020, 3:19pm

For reference, there is some prior art in Spring for classpath: as a URI scheme.

TimWSpence · April 15, 2020, 1:16pm

Many thanks for your input @Gabriel439 and @philandstuff! classpath:/... it is

lisael · June 20, 2020, 11:23pm

Shouldn’t we change the spec so that an implementation that doesn’t implement a scheme doesn’t fail but simply ignore the import?

Something like

let MyLib = classpath:/com/acme/MyApp//MyLib.dhall
            ? /usr/share/lib/dhall/MyApp/MyLib.dhall
            ? http://acme.com/MyApp/MyLib.dhall

should be possible, and parsable. (I can’t think about a use-case, though)

lisael · June 21, 2020, 11:58am

I’m implementing a custom import scheme for pydhall. One issue is the cbor encoding of the scheme.

We don’t want a custom scheme encoding to clash with dhall standard schemes.

Option 1

add a requirement in the spec that any implementation-specific scheme must be encoded as a number >1000

Pros:

trivial to implement

Cons:

what if pydhall, that already uses 1001 for pydhall+classpath:, wants to support dhallj’s classpath: that happens to be encoded as 1001 too?

Option 2

Encode custom schemes as CBOR strings

Pros:

no name clashes (especially if we enforce the namespacing of custom scheme like in pydhall+)

Cons:

It’s easy in Python to have a dynamic type for the CBOR-enoded scheme (integer for standard schemes, string for a custom scheme). It may be more complex for statically typed langages

Option 3

Use a standardized (string -> integer) hash. This combines Option 1 and Option 2. Note that we don’t even need to compute the hash at runtime. It can be hard-coded in implementations.

@TimWSpence how do you encode classpath imports ?

As a side note, shouldn’t we require that custom scheme must be namespaced by <implem_code_name>+ as in pydhall+classpath ?

TimWSpence · June 22, 2020, 8:12am

@lisael our implementation is fairly adhoc for the moment and comes with a warning that the semantics are not finalized. For the moment, I’ve encoded classpath imports as 23 as (from what I remember) that is the highest integer that can be CBOR tiny field encoded. Obviously in theory that could cause problems as Dhall expands but it would have to add a lot of import types to do so.

Compatibility between extensions for different languages is a very good question. I have no idea right now what the best solution is!

lisael · June 22, 2020, 8:54am

Option 4

Attribute a new binary id to custom imports so they are not decoded as imports and store the scheme as a string. They must follow the same chaining, caching rules as regular imports.

cbor_encode(i: custom_import) = [29, hash(i), import_type(i), "scheme", ...]

Gabriel439 · June 25, 2020, 2:46pm

@lisael: I believe all of the import schemes are valid URIs, so what if we only require that a custom import is a valid URI? That would allow us to preserve more structure in the encoding than just a string

lisael · June 25, 2020, 4:28pm

@Gabriel439 if I understand correctly, your suggested encoding is

[24, null, 0, 8, "my+scheme://a/path"]

or (more consistent with http imports as the paths is broken down into components)

[24, null, 0, 8, "my+scheme", "a", "path"]

It’s clean, but it changes the decoding logic. With standard imports we rely on the first and fourth elements to find the implementation of an import (python classes in pydhall’s case). It’s not a big deal to add another dispatching routine based on the scheme element.

I’ll open an specification issue for this.

Gabriel439 · June 26, 2020, 4:07am

@lisael: Yeah, the idea was that existing standard import schemes would still be encoded in the same way and custom schemes would use a different more weakly-structured encoding.

lisael · June 26, 2020, 5:10pm

github.com/dhall-lang/dhall-lang

Support platform specific import schemes

opened 05:07PM - 26 Jun 20 UTC

lisael

Standardize the way platform-specific imports are parsed, resolved and CBOR-enco…ded. ## Use cases ### Integration with the platform's packaging standards A Dhall implementation may want to rely on language-specific semantics to locate dhall files. This permits users to use the language ecosystem's packaging standards to ship dhall files with an application or a library. It's the use-case raised by Dhallj implementers: ``` let pkg = classpath://com.acme.myApp/package.dhall ``` With pydhall I plan to support `pkg_resources` data loading mechanism too. ### Custom resolution routine I'll show a pydhall example here, as it's actually the use case that triggered my reflexions. I don't want the discussion here to forget about this use-case :). The pydhall API way to define a configuration schema is to define python classes: ```py from pydhall.schema import Schema, NaturalField class MyConfig(Schema): a = NaturalField(default=42) ``` Then the end-user can import this schema in their dhall configuration: ```dhall let myApp = pydhall+schema://myApp.settings/MyConfig -- load MyConfig class from myApp.settings python module ``` This import resolves as: ```dhall {MyConfig = { Type = { a : Natural }, default = { a = 42 }}} ``` ## Proposed design 0. Custom imports are permitted by the standard 1. A custom imports MUST be a valid URI 2. Custom imports URI schemes SHOULD be namespaced with the name of the platform that first introduced them (`pydhall+`, `dhallj+` ..) 3. The parser is modified to parse non-http URIs as custom import. The parser SHOULD split path components. 4. Custom imports are binary-encoded as imports, using the first unused scheme code (8 at time of writing): `[24, <hash>, <import_type>, 8, <scheme_as_string>, <uri_authority>, <uri_path_element> ...]` 5. Custom imports MAY be hash-protected 6. Resolution 1. The implementation knows the scheme 1. It's up to the implementation to resolve the import (including the handling of the import type (`as Text`,...) and of the chaining semantics) 2. caching rules stay the same as those of standard import - same URI resolves as the same Dhall term durring an import pass - if hash protected, try to load the term from the cache and check the hash of the result 2. The implementation doesn't know the scheme - the resolution fails - the implementation SHOULD raise a warning to the user ## Notes An implementation SHOULD provide a way to show the user the hash of an import. This permits the user to use standard Dhall tooling on dhall files that use custom imports: ``` let pkg = acme+loader://my.pkg/package.dhall ? missing sha256:deadbeef... in ... ``` Is a correct input for `dhall format`, `dhall normalize`, `dhall-to-json` tools without modifying those, as long as the result of the import was cached beforehand.