Proposal: "Do notation" syntax

jle · October 31, 2019, 12:49am

I’ve been experimenting with a few things that would benefit from do notation – mostly to organize how type annotations work. I’ve also been working through a rough implementation on the haskell backend and things don’t seem to be too breaking The main idea is this – introduce “do notation” syntax that comprises, syntactically, of three parts:

do_block = do <functor> <bind> <result type>
            <do_lines>
            <final_line>

do_lines = <ident> : <type> <- <expr>
         | <expr>
final_line = expr

In general, we’d have:

do M bind A
    x : B <- q
    r
    z : D <- s
    k

turn into:

bind B A q (\(x : B) ->
  bind {} A r (\(_ : {}) ->
    bind D A s (\(z : D) ->
      k
    )
  )
)

M (in do M bind A) would have to be an expression of type Type -> Type.

bind (in do M bind A) would have to be an expression of type

forall (a : Type)
  -> forall (b : Type)
  -> M a
  -> (a -> M b)
  -> M b

And the type of the entire expression would be M A, and k would also have type M A.

Specifically, the rules for within do M bind A would be:

do M bind A
  x : B <- q
  rest

=>

bind B A q (\(x : B) -> desugar-rest)

and

do M bind A
  r
  rest

=>

bind {} A r (\(_ : {}) -> desugar-rest)

where any “non-binding” line is required to have type M {}.

So for example, for Optional, if we had ./bind:

\(a : Type)
  -> \(b : Type)
  -> \(x : Optional a)
  -> \(f : a -> Optional b)
  -> Optional/fold a x (Optional b) f (None b)

and ./subtractMaybe

\(x : Natural) ->
    if Natural/isZero x
      then None Natural
      else Some (Natural/subtract 1 x)

Then we could do:

\(x : Natural) -> do Optional ./bind Natural
     y : Natural <- ./subtractMaybe x
     z : Natural <- ./subtractMaybe y
     ./subtractMaybe z

This would be desugared as:

\(x : Natural) -> ./bind Natural Natural (./subtractMaybe x) (\(y : Natural) ->
    ./bind Natural Natural (./subtractMaybe y) (\(z : Natural) ->
      ./subtractMaybe z
    )
  )

The requirements for type annotations for all bindings, and also that any “non-binding” line has to have type M {}, requires no extra logic for type inference: the types are all present as they are.

The advantages of being able to write

bind B A q (\(x : B) ->
  bind {} A r (\(_ : {}) ->
    bind D A s (\(z : D) ->
      k
    )
  )
)

as

do M bind A
    x : B <- q
    r
    z : D <- s
    k

are:

The type applications are not duplicated, and closer to their main meaningful binding.

In the case of q and x, for instance, B would normally appear twice: as an argument for bind, and again as the lambda annotation. In the do-notation, B would only appear once, and next to x, which is arguably where it is the most important to visually see.

Most egregiously, in the normal case, A appears three times – O(n) on the number of lines. That’s because it is the type of the “final binds” for all of the lines. Now here, A only appears once: at the head of the block, to indicate the type of the entire expression as being M A.
Formatting is clearer: we don’t need the nested indentation creep of the nested lambdas. Automatic formatting also fails to show the intended structure that the original represents. For do notation, automatic formatting could be made to be more clear.
Dataflow is visually clearer.

The “need” for a do notation arose from my experiments with using dhall as a cross-platform safe scripting language, and in the end I’d like to be able to write things like

do ./IO ./bindIO {}
    x : Text <- ./getLine
    y : Text <- ./readFile "myfile.txt"
    ./putStrLn x
    ./writeFile "destination.txt" y

and have them be interpretable by an interpreter.

This works at the moment with explicit binds, but is made very unweildy by the formatting and, probably most significantly, the duplication of necessary type annotations and type arguments for things to work out properly.

I…don’t really have a plan as for how indentation/layouting would work. Right now my examples are a bit ad-hoc with how layouting, newlines, etc. work and I’m open to suggestions.

Let me know if you think this would be something that would fit into the overall vision of dhall, or any tweaks (if possible) that would bring it more in line! If it’s way too out-there to warrant being a part of the standard language, I’d understand too.

sjakobi · October 31, 2019, 12:15pm

Wow, that’s pretty fascinating!

Do you think you could make this non-indentation-sensitive, e.g. with semicolons and braces?

ocharles · November 1, 2019, 11:57am

It’s worth thinking about why you want do notation. Haskell mostly introduced do notation as a way to build IO values - you use the monad interface to build a big IO computation, and the runtime system is then capable of actually interpreting that and doing IO. This makes sense in the context of Haskell - Haskell is one language with (mostly) one interpretation. But does this hold for Dhall? I don’t think so!

In Dhall, we have one language, one normalisation strategy, but then many different ways to interpret the result. Sometimes we interpret the normal form in terms of JSON, sometimes YAML, sometimes as a Cabal package description, and so on. What I’m hinting at here is that normalisation isn’t the end, the user of the normal form can add extra interpretation.

Once you notice that, you start to realise that you don’t need syntactic constructs as often as you would in Haskell - we’re often using those because we can’t provide any extra interpretation beyond just writing Haskell. We’re stuck inside just one language. However, in Dhall we can build custom interpreters though for particular domains.

do ./IO ./bindIO {}
    x : Text <- ./getLine
    y : Text <- ./readFile "myfile.txt"
    ./putStrLn x
    ./writeFile "destination.txt" y

Taking your IO example, I think a more Dhall-y formulation might be

getLine
  (   λ(line : Text)
    → readFile
        "myfile.txt"
        (λ(contents : Text) → putStrLn line (writeFile "destination.txt" y))
  )

This is basically continuation passing. Maybe there are other ways to formulate this though:

let x : Text = getLine
let y : Text = readFile "myfile.txt"
in sequence [ putStrLn x, writeFile y ]

This isn’t quite the same program though - it reorders when the file is read and when x is printed.

I’m not suggesting any of the examples above are better, but more trying to encourage a bit more lateral thinking. We have a Haskell-like language, but that doesn’t mean it is Haskell - what works there might not necessarily be best for Dhall.

Gabriel439 · November 2, 2019, 3:48pm

Yeah, I agree with @ocharles that Dhall tries to push effects into the language runtime so that they don’t need to be modeled in the language. For example:

IO - For input, the language has built-in support for importing values by path or URL. For output, effects are mediated through the host program being configured
Either - The language strives to push errors into type-checking

… rather than run-time failures via Optional or something similar. Yes some utilities and built-ins return Optional, but I generally want to encourage users to push invariants upstream into their inputs rather than propagating failure downstream to their outputs. In other words, I’d prefer this:
```
head : NonEmpty a → a
```
… rather than this:
```
head : List a → Optional a
```
List - This is an interesting effect that is not handled by the language runtime, which I’m including as a counterpoint

Using the example for IO, the way I would restructure it is as a pure function:

-- ./example.dhall
λ(x : Text) → λ(y : Text) → { stdout = x, stderr = "", files = toMap { `destination.txt` = y } }

… and then I’d create a dhall-to-filesystem utility that converted that to stdout/stderr and created desired files based on the provided files record, like this:

dhall-to-filesystem <<< './example.dhall "Some input" (./myfile.txt as Text)'

I understand that wouldn’t handle more complex workflows like interactive output, though, but the general idea I’m going for is that the Dhall ecosystem (and denotative programming in general) should strive to have “thick” interpreters so that the user-facing code remains simpler and mostly effect-free.

jle · November 4, 2019, 9:33pm

My main motivation for a do sugar would be to be able to use dhall as a platform-independent extensible “scripting language”, similar to how lua etc. are. I suppose I wasn’t clear about “what kind of scripting” – i don’t mean IO scripting to replace bash, but rather extensible scripts for programs I write.

For example if I am programming some platform in a compiled language (like, programming a text editor), what could I use as a scripting language in the same place as vimscript or emacs lisp?

I want the user to be be able to write scripts that are:

Interpreted with a very light runtime (so ruling out languages with huge runtimes like haskell)
Typed and type-safe (so not lua, python, ruby, lisp, vimscript, etc.)

In this case I would allow the user to write a script using dhall:

do ./Vim.Type ./Vim.bind {}
    x : Natural <- ./getScreenWidth
    y : Natural <- ./getScreenHeight
    ./drawBox x y
    z : Text <- ./getUserInput
    ./log (z + Natural/show x)

Right now there isn’t really any nice typed solution for an extensible scripting language like lua. Dhall would be a great solution, I think, because it is a cross-platform language already with many backends. I would love to be able to use dhall as “the” scripting language for my text editor, or other sort of programs I have. Dhall seems to be the perfect scripting language (literally zero runtime, type-safe) except for ergonomic issues that do notation would help with.

Without do notation, writing scripts with interactivity, ordering of effects, interleaving of effects, etc. becomes very unweidly. With it, it becomes pretty simple.

Gabriel439 · November 5, 2019, 3:38am

I think at some point we may need to add do notation as Dhall’s use cases grow, but I would like to see Dhall provide a polished and mature experience for at one use case before adding complexity to the language to support other use cases.

I have one use case in mind that appears promising (Dhall as a Helm replacement) but that specific use case is not as important to me as the more general principle of agreeing on one use case and sticking to it instead of spreading ourselves thin. I could be persuaded to pursue another use case as long as it’s just one.

There are a few reasons why it’s beneficial to prioritize one use case early on in a tool’s adoption lifecycle:

Mainstream programmers won’t adopt if a tool is only 80% ready for their use case

These programmers are career-oriented when evaluating tools and they need to feel comfortable staking their careers on a choice of tool and convincing others to follow suit. They are harder to convince but once they switch they are powerful allies because they’re invested in ensuring that there is a viable career centered around their preferred tools wherever they go.
We’re going to butt heads and if we don’t agree on where we are all going

Once you start digging into and polishing each use case you will run into conflicting design goals and you have to start making hard choices. I’d prefer to say no to specific use cases earlier when it’s less painful to do so rather than later after people have invested a large amount of their time.
We need a critical mass of adopters for the project to become self-sustaining

If we dilute our efforts across several use cases then we never create a chain reaction where the project’s momentum sustains itself.

What that means for do notation is that I would like to see the motivation for it framed in those terms, because it’s adding a non-trivial amount of language complexity that all current and future implementations will have to bear.

Gabriel439 · May 30, 2020, 10:24pm

Just for reference, there was an issue opened requesting list comprehension support, which is a special case of do notation: https://github.com/dhall-lang/dhall-lang/issues/1017