@sjakobi: Yeah, that’s a good point
So assuming that we agree with @blamario that we should focus on high-level primitives (i.e. ones that are not based on individual characters), there are basically three possible options to choose from:
- Make Text “transparent”

  e.g. add a Text/split : Text → Text → List Text primitive and then we could implement the original request as: Prelude.Text.concatSep "_" (Text/split "-" name)

- Add Text transformations that don’t enable introspection

  i.e. instead of Text/split we add built-ins like Text/replace : Text → Text → Text → Text, which does not permit introspection

  Some other hypothetical primitives that might fall into this category are:

  - Text/take : Natural → Text → Text
  - Text/escapeXML : Text → Text

- Continue to keep Text opaque
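To make the second option concrete, here is a minimal sketch of the original request (turning "-" into "_" in a name) written against Text/replace. It assumes the built-in has the signature given above and takes its arguments in the order needle, replacement, haystack. The argument order and the name value are assumptions for illustration only.

```dhall
-- Assumed signature: Text/replace : Text → Text → Text → Text
-- Assumed argument order: needle, replacement, haystack
let name = "my-package-name"

in  Text/replace "-" "_" name
```

Under those assumptions the result is the single Text value "my_package_name". At no point does the program get to inspect the pieces of name or branch on them, which is what separates this option from Text/split.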
The best choice depends on the ultimate use cases. That being said, the first option does not really make Text transparent or enable introspection in the same way that Haskell’s accursed type String = [Char] does. I can’t think of any serious harm it could cause in the long term. It could perhaps become obsoleted by more primitive functions in future, but that doesn’t seem like a heavy burden. If nobody else can think of a more serious problem, that would be my default choice.
Again it depends on the desired end goal, but one possible design is to gradually add primitives that both
- keep Text opaque but decompose it into smaller Text values, and
- can be used to eventually build a parser combinator library.
The first property ensures that no door to an alternative approach is closed, the second that every possible use case is eventually covered. The only primitives you really need to build a decently-performing, general-purpose combinator library are
- stripPrefix :: Text -> Text -> Maybe Text
- splitAt :: Natural -> Text -> (Text, Text)
- span :: (Text -> Bool) -> Text -> (Text, Text)
- A family of Text -> Bool functions, such as startsWithLetter, startsWithNumber, etc.
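A rough sketch of how a combinator could sit on top of those primitives in Dhall follows. None of the primitives exist as built-ins, so the sketch takes them as a record argument. The names Pair, Primitives, Parser and literal are hypothetical, and the Pair record stands in for the Haskell tuples because Dhall has no tuple type.

```dhall
-- Stand-in for (Text, Text), since Dhall has no tuples
let Pair = { _1 : Text, _2 : Text }

-- The proposed primitives, passed in as a record because none of them
-- are real built-ins (hypothetical signatures taken from the list above)
let Primitives =
      { stripPrefix : Text → Text → Optional Text
      , splitAt : Natural → Text → Pair
      , span : (Text → Bool) → Text → Pair
      , startsWithLetter : Text → Bool
      }

-- A parser consumes a prefix of its input and returns a result plus the rest
let Parser = λ(A : Type) → Text → Optional { result : A, rest : Text }

-- literal p expected: succeeds only when the input starts with expected
let literal
    : Primitives → Text → Parser {}
    = λ(p : Primitives) →
      λ(expected : Text) →
      λ(input : Text) →
        merge
          { None = None { result : {}, rest : Text }
          , Some = λ(rest : Text) → Some { result = {=}, rest = rest }
          }
          (p.stripPrefix expected input)

in  literal
```

Passing the primitives as a parameter lets the sketch type-check today. It also shows the cost mentioned below: without operators or do-notation, every combinator has to thread its input and result by hand.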
The main downsides stem from Dhall not being Haskell. Without user-defined operators and do-notation the parsers would not look as nice. I’m not sure if the type system is up to the task either. And finally, while the result would be completely general-purpose and pretty fast, for any specific use case it would still be slower than a dedicated function like split.
I think an important question (which I think you were hinting at @Gabriel439) is whether we want to support text equality or even substring checking (like Text/hasSubString : Text -> Text -> Bool).
I think that once we have one of these features (enabled for example via Text/split : Text -> Text -> List Text), users may use that instead of properly modelling their domain with unions, thereby reducing clarity and ultimately maintainability of their configurations.
So I think we ought to be cautious about allowing this kind of Text introspection. From this perspective it would be safer to go with “non-introspective” operations like Text/replace or Text/take.
(These considerations are obviously inspired by one of your blogposts: http://www.haskellforall.com/2016/04/worst-practices-should-be-hard.html)
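As a small sketch of what modelling the domain with a union looks like in place of comparing Text values: the Environment union and the replica counts below are made-up names and numbers, not taken from any real configuration.

```dhall
-- Hypothetical domain: a closed set of deployment environments
let Environment = < Dev | Staging | Prod >

-- Branching happens on the union with merge, not on Text equality,
-- so a misspelled environment is a type error rather than a silent mismatch
let replicas
    : Environment → Natural
    = λ(env : Environment) →
        merge { Dev = 1, Staging = 2, Prod = 5 } env

in  replicas Environment.Prod
```

The type of replicas tells the user exactly which values are accepted, which a Text argument compared against string literals would not.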
@sjakobi: Yeah, that’s where I was going with that distinction about introspection. As a simple example, if we provided a Text/split built-in then users might choose to model lists as comma-separated unquoted values. In other words instead of this:
[ "foo", "bar", "baz" ]
… they might try to create a Text DSL where they ask the user to instead supply:
"foo, bar, baz"
… and that DSL would be vulnerable to mistakes like elements containing commas in their names.
I forgot to mention that besides being error-prone it would deteriorate discoverability due to being weakly typed, as a user would not be able to infer what to supply for the Text value since the type does not suggest that it expects an internal structure of comma-separated values.
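For illustration, here is how the typed version of such a setting might look. The Service record and its field names are hypothetical.

```dhall
-- Hypothetical record: the type states that tags is a list of Text values
let Service = { name : Text, tags : List Text }

-- An element containing a comma is unambiguous here,
-- whereas a comma-separated Text DSL would split it in two
let example
    : Service
    = { name = "api", tags = [ "foo", "bar, baz" ] }

in  example
```

Here the second tag stays a single element, and a user reading the Service type can see what shape of value is expected without consulting any documentation for a Text format.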
That is almost a philosophical choice. As long as you’re aware of its consequences, it’s hard to argue against. My own design philosophy is to give the tools to the developer, even if they can be used to construct a gun and shoot their foot. Mind you, if there is a way to make that bad outcome less likely, and good outcomes more, of course I’ll take it. The choice is rarely that clear.
Now about those consequences. Your comma-separated list is an easy example to disallow, but what are you going to do about the established structured strings that are not a developer’s whim? The prime examples are file paths, dates and times. If a user wishes to get a parent directory for a given path, or the year of a given date, you have four options:
1. provide text-splitting primitives,
2. add FilePath and Date types to the language with the appropriate operations,
3. tell the user to provide the directory and year as separate inputs, or
4. send them away.
You seem to be arguing for option #3, but that’s going to feel like #4 for many users, if not most of them. Option #2 is technically the safest and most correct one, but – please correct me if I’m wrong – it’s way too complex for Dhall. So really the choice is #1 or #4, adding the text-splitting primitives or refusing to support a large subset of potential users.
I had an idea. Allow me to add another option to my list:
1.5. Add the ability to declare structured string types, such as Date or FilePath, at the I/O boundary only
Here’s an example:
let Date = {year : Natural,
month : Natural,
day : Natural}
let DateYMD : Type = Date as Text separated with "-"
let today : DateYMD = "2020-03-06"
in today.year
This way all text introspection happens at input time, and there’s no way it can be abused within the program. It’s in keeping with another good blog post.
@blamario: Yeah, I like that idea. I had a similar idea here (in the context of importing JSON): https://github.com/dhall-lang/dhall-lang/issues/121#issuecomment-511955678
@blamario’s proposal reminds me more of the user-defined grammars issue, which may be the issue that @Gabriel439 was thinking about in the #121 comment.
My idea was really only about minimal support for records and lists represented as separated strings. What @Gabriel439 was hinting at seems more like full text-parsing support that’s constrained to I/O. I like his idea even better in principle, but I’d like to see more detail.
Starting from the ./someImport.lang as ./someGrammar.dhall syntax, I’d like to see clarified:
1. What is the language available inside someGrammar.dhall? How does it specify a grammar?

   - Does it have some text-parsing primitives available, like stripPrefix etc. I outlined above? If so, how are they made available there but not in regular Dhall? Is that reflected in the type of someGrammar?
   - Following on the last thought, there could be a built-in Grammar type that’s basically an applicative functor or even a monad. It would come with a number of primitive constructors and combinators that can appear anywhere. The only way to apply a Grammar, however, would be the as keyword. In this design there would be no stripPrefix : Text -> Text -> Maybe Text, only matchPrefix : Text -> Grammar ().

2. Note that my weaker idea of record as Text separated by can be easily extended to its inverse text as Record separated by. Would a grammar specification also be bi-directional? In other words, would there be a way to serialize a Dhall value into a string according to a grammar, such as syntax value as text of ./someGrammar.dhall? If value is constant, what would be the normal form of this?

3. Would a string literal be allowed on the left-hand side of as? For example, would "2020-03-07" as Date be legal? How about arbitrary Text expressions?

4. The right-hand side of as, ignoring the design-imposed constraints for a moment, is really nothing more than a function of type Text -> a. Could these functions be composed? For example, ./myFile.json.gz as MyNormalizer . JSON . UTF8 . GZIP?
1. I haven’t really thought this through, but the rough idea I had in mind was that the ./grammar.dhall expression would be an ordinary Dhall expression of type Text → Optional A or Text → < Error : Text | Result : A > with access to additional Text introspection built-ins.

2. The grammar does not need to be bidirectional. Nobody has requested this that I know of.

3. You could permit arbitrary expressions instead of restricting this to just imports, but this wouldn’t change anything. The reason why is that imports are type-checked with an empty context, so they can’t refer to values in scope. So, for example, an expression like λ(x : Text) → x as Date would be a type error because the subexpression x would be type-checked with an empty context where the bound variable x was no longer in scope. That prevents the as ./grammar.dhall mechanism from being used as a Text introspection backdoor.

4. Presumably the right-hand side could be an arbitrary Dhall expression, so grammars are composable insofar as Dhall expressions are composable.
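The second result type mentioned in the first answer can already be written down as an ordinary Dhall type. The sketch below does that. Date, ParseResult, Grammar and alwaysFail are hypothetical names, and because no Text introspection built-ins exist, the placeholder grammar can only reject its input.

```dhall
let Date = { year : Natural, month : Natural, day : Natural }

-- Text → < Error : Text | Result : A >, as described in the first answer
let ParseResult = λ(A : Type) → < Error : Text | Result : A >

let Grammar = λ(A : Type) → Text → ParseResult A

-- Placeholder only: a real grammar would need the introspection built-ins
let alwaysFail
    : Grammar Date
    = λ(input : Text) → (ParseResult Date).Error "cannot parse: ${input}"

in  alwaysFail "2020-03-07"
```

This only pins down the shape of the right-hand side of as. How the extra built-ins become visible inside ./grammar.dhall but nowhere else is the part left open.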
That would probably be the shortest way to get something in working order, but how do you distinguish between Dhall expressions that have access to the Text introspection built-ins and those that don’t? I mean, the ./grammar.dhall file by itself is not on the right-hand side of any as. Would it be considered legal by itself? What would be the output of dhall <<< ./grammar.dhall?
Perhaps not yet, but for any json-to-dhall there is a dhall-to-json. Any format important enough to be imported in its native form will probably be important enough to be exported as well. However that can be accomplished with a separate pretty-printer, and your answer to #1 precludes any more unified solution.
What about choosing an actual grammar as the grammar type? Maybe start with something like WSN or BNF? Is that too meta?
For one thing, BNF and WSN by themselves wouldn’t specify the mapping between the text input and the Dhall value. We could extend them with appropriate constructs, but that would be a new grammar formalism. You’d probably want to design it from scratch to make it as close to Dhall as possible.
Instead of text equality, I would really like something like an open sum type instead.
Because I think most people arguing for Text/equal (also lol what does that even mean) really want open sum types with an equality on the constructors, like symbols in lisp.
Then there’s the Text/split and Text/lowercase camp, but
- Text/split is going to make people implement string parsing algorithms again, which leads to “oh no, this language totally not made for this is slow, I need more primitives to make it faster”.

  I will refer to https://github.com/mozilla/nixpkgs-mozilla/blob/master/lib/parseTOML.nix as an example (Fun fact: to enable that, a tokenizer builtin was added to nix, but the parser was still horribly slow obviously, so in the end they fixed it by adding the builtins.fromTOML builtin).

- Text/lowercase has the same problems as Text/equal: lol, what does that even mean
Glad you asked: https://www.unicode.org/reports/tr15/
For Text/compare there’s https://www.unicode.org/reports/tr10/
That would be http://www.unicode.org/L2/L1999/99190.htm
Mind you, what people usually want to do with Text/lowercase and the like is case-insensitive comparison, and that’s better done directly.
I do hope you are joking.