Text manipulation functions

timbertson · May 19, 2020, 12:16pm

As a big fan of dhall, I just wanted to chime in in favour of text introspection as an escape hatch (and not just at the import level). I’m trying to evangelize dhall for some promising use cases at work. For the most part, dhall is a great fit, and I can imagine it working quite well. But I have a growing belief that some small parts will require text introspection. It’s a long story, but the tl;dr is that one part will likely need to manipulate kubernetes manifests, so that’d be a choice between “use text manipulation” and “rewrite the entire dhall-kubernetes type tree with custom unions in most places where there are strings”.

So, I’m a devoted user who’s motivated to do things the right way, because I understand the benefit. But even I’m not sure I can wholeheartedly recommend dhall for this before knowing the entire solution, since there’s no escape hatch.

Gabriel439 · May 19, 2020, 2:53pm

@timbertson: For your use case, do you mainly need introspection in order to convert a Text field into an enum or are there other Text introspection facilities that you require?

The reason I ask is that we could add support for a toUnion keyword that, given an enum and an “immediate” (i.e. non-abstract) Text value, converts that Text value to an alternative of the same name, and fails at type-checking time otherwise (if the Text literal does not match any alternative name). This would be in roughly the same spirit as the fromMap keyword which was discussed here:

github.com/dhall-lang/dhall-lang

Builtins for `Map`

opened 03:09PM - 26 Apr 20 UTC

f-f

Usecase/context: we're designing a [Package Registry for PureScript](https://git…hub.com/purescript/registry) in Dhall, and in the current iteration we're storing things like "dependencies" and "build targets" in a Dhall `Map`, because they are key-value mappings and we can't fix the keys beforehand so it's not possible to use a record.

However, we would need to "override" values inside the `Map` sometimes (see [this issue](https://github.com/purescript/registry/issues/17) for details), and this is not possible at all in Dhall right now. So I was wondering if we'd be open to add some kind of manipulation builtin for `Map`? In our case having some kind of `Map/update` would fix the current concerns.

I think a good candidate for a builtin could be:

```dhall
Map/alter : ∀(a : Type) → ∀(k : Type) → (Optional a → Optional a) → k → Map k a → Map k a
```

..modeled after [`Data.Map.alter`](https://www.stackage.org/haddock/lts-15.10/containers-0.6.2.1/Data-Map-Lazy.html#v:alter), on top of which we could implement `insert`/`update`/`delete` in the Prelude.

If we had string comparison we could implement this with `List` operations, but adding such a builtin might be valuable anyways for performance reasons. Of course if we add this then it could be exploited to implement string comparison, but IMO it's a fairly acceptable risk. Thoughts?

To answer your broader question, we’ll generally do our best to work with you to find a solution that balances user ergonomics with language security guarantees.

The user experience I would like to preserve is that once import resolution and type-checking succeed the user can then have supreme confidence that the configuration will “just work”. This is why I keep brainstorming ways to move Text introspection checks into either the import resolution phase or the type-checking phase.

timbertson · May 19, 2020, 11:51pm

I don’t think that would help, if I understand you correctly. The problem is we would be doing this on Text values passed in (as part of a kubernetes manifest), so it wouldn’t be a Text literal.

Simple example: imagine writing a function which adds a given annotation to a kubernetes deployment, but only if it doesn’t already exist. You’re given an instance of the Deployment type, and it’s impractical to modify that type so that all annotations are represented as a union (since that would mean forking the whole type hierarchy of dhall-kubernetes).

If the operation couldn’t fail, then could it be relaxed to all Text values, not just literals? What if toUnion always required a fallback case?

e.g. toUnion <OptionA | OptionB | Unknown: Text>

That would ensure everything’s complete, and puts the burden of “what to do when it isn’t a known value” back on the user.

(It should probably be an Either type though, rather than embedding an Unknown branch in each enum)
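To make the fallback idea concrete, here is a minimal sketch of the consumer side only. toUnion itself is a proposal and does not exist in the language, so the conversion step is not shown; the union name Option and the function describe are hypothetical names chosen for illustration.

```dhall
-- Hypothetical result type of a total Text-to-union conversion:
-- known alternatives plus a fallback that carries the raw Text.
let Option = < OptionA | OptionB | Unknown : Text >

-- merge forces every consumer to handle the fallback explicitly.
let describe =
      λ(o : Option) →
        merge
          { OptionA = "matched OptionA"
          , OptionB = "matched OptionB"
          , Unknown = λ(raw : Text) → "unrecognised value: ${raw}"
          }
          o

in  [ describe Option.OptionA, describe (Option.Unknown "OptionC") ]
```

Because merge rejects a handler record with a missing alternative, the “what to do when it isn’t a known value” decision cannot be skipped by the caller.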

singpolyma · May 21, 2020, 12:58am

I’ve always thought that Text being opaque was a core feature of Dhall.

That said, if we do add some Text operations, we should be sure to not offer any that use the accursed “codepoints” in any way. Codepoints barely deserve to exist, and shouldn’t be exposed in anything as high-level as Dhall.

timbertson · May 22, 2020, 11:13am

I guess I don’t know what you mean exactly by “just work”. Presumably any text manipulation would be total (i.e. returning Optional instead of failing, etc)? So the overall expression would be just as safe / correct, you’d just be allowing dramatically more branching logic.

For the contexts where I’m using dhall, the end consumers of this config (kubernetes, github actions, etc) are already great big piles of runtime checks, so pushing any of that logic into dhall (where at least it’s pure) improves the “just works” likelihood.

Gabriel439 · May 22, 2020, 3:39pm

@timbertson: I’m using the phrase “just work” as a shorthand for restating the idea that errors are caught at import resolution and type-checking time.

There are some built-ins within the language that return an Optional value (e.g. List/head) so we’re not strongly opposed to using Optional for programming logic. That said, enabling Text introspection, even total Text introspection, is different because it is prone to users creating weakly-typed DSLs embedded within Dhall (see my comment about comma-separated strings).
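As a small illustration of a total built-in that returns Optional, the following sketch applies List/head to a non-empty and an empty list (the field names nonEmpty and empty are just labels for this example):

```dhall
-- List/head is total: it returns an Optional instead of failing.
{ nonEmpty = List/head Natural [ 1, 2, 3 ]
, empty = List/head Natural ([] : List Natural)
}
```

The first field normalizes to Some 1 and the second to None Natural, so the empty case has to be handled by whoever consumes the result.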

I understand that there are external tools and formats that might have some sort of weakly-typed structure, which is why we strive to provide adapter tools that convert from weakly-typed representations to Dhall (e.g. yaml-to-dhall).

That brings me to my next train of thought: I would like to lean a bit more on getting yaml-to-dhall to work for your use case. If I understood your original comment: the issue is that the default dhall-kubernetes schemas use Text in a few places where you would prefer a more precise union type. Would that issue be resolved if there were an easy way to update the deeply nested types?

In other words, what if there were a type-level analog of the current with keyword, so that you could do something like this:

let UpdateType = < RollingUpdate | OnDelete >

let MyDeployment = k8s.Deployment.Type with spec.?.strategy.?.type = UpdateType

… then you could use that customized type for your yaml-to-dhall conversion.
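For comparison, this is roughly what the existing value-level with keyword does. The record below is a made-up stand-in for a deployment, not the real dhall-kubernetes type, and it deliberately avoids Optional fields; the type-level form shown above remains a proposal.

```dhall
-- A hand-written stand-in record (assumption: not the real k8s schema).
let deployment =
      { metadata = { name = "web" }
      , spec = { strategy = { type = "RollingUpdate" } }
      }

-- Value-level `with` rewrites one deeply nested field
-- and leaves the rest of the record untouched.
in  deployment with spec.strategy.type = "OnDelete"
```

The suggestion in the post is to allow the same kind of path-based update on a type rather than on a value.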

timbertson · May 27, 2020, 11:01am

Hmm, it’s hard to be sure. First thing to clarify is that it probably wouldn’t involve yaml-to-dhall, the intent is for everyone to be writing dhall. Which means it’s (relatively) easy - people would be writing the union directly, rather than trying to get it through yaml-to-dhall.

That construct feels useful in general (so if it’s something you’re considering anyway you have my vote), but then you’ve still got the problem that any function built to work with the vanilla dhall-kubernetes types wouldn’t work on your modified type. And in particular it feels like it might be very difficult to get an ergonomic “clone” of the dhall-kubernetes package, in particular having all the various default records target the modified type. I haven’t looked too deep into it, but it feels like you could run into problems other than just the raw types.

PierreR · August 27, 2020, 10:22pm

I am sorry if this is the wrong place to ask, but is there a way in dhall to assert that a user-supplied value contains only lowercase characters?
We encountered this problem in k8s, where namespaces have to be lowercase, and we received external user input containing a capital letter (e.g. name = “Foo”).

Somehow I wish for a type Text that only allows lowercase.

Of course if a function Text/lowerCase : Text → Text exists I could just do the transformation before converting in yaml with dhall-to-yaml (but that is not as nice in this case)

Gabriel439 · October 21, 2020, 6:07am

I just wanted to note here that Text/replace was standardized, so now the original request can be resolved using Text/replace "-" "_"
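A minimal usage sketch, with a made-up input string: the built-in takes the needle, the replacement, and then the text to search.

```dhall
-- Replace every "-" in the input with "_".
Text/replace "-" "_" "my-kubernetes-name"
```

This normalizes to "my_kubernetes_name".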

PierreR · October 29, 2020, 1:29pm

@Gabriel439 is there a way to enforce that Text is lowercase only (as is the case for k8s namespaces) using the new Text/replace?

Gabriel439 · October 29, 2020, 2:26pm

@PierreR: You can approximate lowercase behavior using something like Text/replace "A" "a" (Text/replace "B" "b" …), but in general lowercasing a string in a Unicode-aware way requires a special-purpose built-in because the real rules are complicated. See:

http://www.unicode.org/versions/Unicode13.0.0/ch05.pdf#G21180
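A sketch of the nested-replace approximation, limited to three letters to keep it short. The function name lowerABC and the input are hypothetical; a full ASCII version would need one Text/replace per letter.

```dhall
-- Lowercases only A, B and C; every other character is left alone.
let lowerABC =
      λ(t : Text) →
        Text/replace "A" "a" (Text/replace "B" "b" (Text/replace "C" "c" t))

in  lowerABC "CAB-Stand"
```

This normalizes to "cab-Stand": the "S" stays uppercase because it is not among the handled letters, which shows why this approach only approximates real lowercasing.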

vmchale · October 30, 2020, 2:13pm

Perhaps those can be the next couple of builtins!

Uppercasing/lowercasing is a fun corner of text.

Gabriel439 · July 9, 2021, 4:31am

Following up yet again to mention that the Prelude does in fact have those suggested utilities for uppercasing and lowercasing ASCII Text.
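A hedged sketch of how these utilities might be used for the lowercase-namespace question above. It assumes a Prelude release recent enough to expose Text.lowerASCII and Text.upperASCII, and it imports the unversioned package URL without an integrity hash purely for brevity; the bindings name and check are hypothetical.

```dhall
-- Assumption: unversioned import; pin a version and hash in real use.
let Prelude = https://prelude.dhall-lang.org/package.dhall

let name = "foo"

-- Fails at type-checking time if `name` contains an uppercase ASCII letter.
let check = assert : Prelude.Text.lowerASCII name ≡ name

in  { namespace = name, shout = Prelude.Text.upperASCII name }
```

The assert only helps when name is a concrete value at type-checking time; for an abstract Text argument the two sides cannot be compared and the assertion is rejected.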

StructSeeker · December 1, 2023, 4:34pm

I would love for dhall to have rich support for text validation, and I immediately realize that many facilities are required for an expressive validation system. In my view, text support should be added gradually, in several stages:

1. Provide a List UnicodeScalar primitive (or something similar) and other common unicode/ascii utilities for advanced users and library writers. Note that UnicodeScalar is preferred over codepoint to avoid the issue of surrogate pairs.
2. Add minimal text support for ordinary users, including span, splitAt, stripPrefix and Natural/fromText, Float/fromText etc.

The design and implementation of the first two stages should be easy and already useful for many users. The design and implementation of the following stage would be complicated.

3. Support checking text against, and extracting information according to, a particular grammar (regular grammar, ABNF, PEG, attribute grammar…).

Obviously, a more expressive grammar leads to more precise validation, but it also brings implementation complexity.
In my opinion, a good implementation of 3 requires inductive and user-defined data types. First, RG and CFG can be regarded as instances of an inductive data type (e.g. Grammar NonTerminal, with Grammar as a type constructor whose data constructors take production rules). Second, a parse tree for given grammar rules is naturally a family of mutually inductive data types, one for each nonterminal. It would not be very ergonomic to keep using the current roundabout approach. Third, certain validation/deserialization schemes can be associated with the corresponding data type.

Note that giving users access to the parse tree further enhances the validation ability. For example, it may allow a PEG grammar to be checked even if only ABNF is supported, or enforce that the opening and closing tags of an xml element have the same name (which can’t be done in a CFG).

kukimik · June 10, 2024, 12:21pm

It turns out that once Text/replace was included in the language, some simple Text validation became possible (which may be a good thing), as well as some very simple text-based DSLs (which probably isn’t so good). See kukimik/dhall-text-utils on GitHub (utilities for validation and modification of Text values in the Dhall configuration language) for a proof of concept. Admittedly, it’s a hackish workaround, but it is possible.

hellemithh · November 23, 2024, 5:33am

Hi,
These functions are very helpful.