Dhall-docs cli utility

Hello everyone!

I’m Germán and I was selected on GSoC program to contribute on dhall, specifically on dhall-haskell implementation. My project details can be found here. @Gabriel439, @sjakobi and @Profpatsch are my mentors

In summary, the goal of the project is to develop a dhall-docs cli utility that will take up some dhall files or packages and output an HTML containing actual documentation. Similar to other tools in other programming languages like haddock on haskell or rustdoc on rust. After generating HTML, I’ll implement markdown support as well.

I’d like to read your comments about what would you expect from the tool, like syntax and general usage.

The features that generated HTML will have are the following:

  • The actual documentation extracted from source code comments;
  • Browsing the original source code;
  • Type on hover;
  • Jump to definition

So first thing that I need to do is to design a grammar in order to successfully extract and manipulate source code comments. I was thinking to go with something markdown based. To bold a phrase, we can use markdown bold syntax. To italic a phrase, we can use markdown italic syntax and so on.

I don’t know if comments should have a marker like haddock’s one (-- | or {- |) or javadoc one (/** instead of /*) for example. I’d say yes since not all comments could be for documentation purposes. I’ll keep up with haddock’s one for the rest of the post.

I’ll start with file headers. For example,

{-| My *awesome* function. This does _everything_ you want.
You should also see `myOtherFun` to see what it does

**NOTE**: this is something really important that
you need to be careful about
-}
let f = 1 in f

Name references can be made using backtics. In the HTML output, it will show itself using a console style and as a link that you can click and go to that specific html page.

If the name reference is a link, then it can go to that link. The project should not make any assumption yet about where are these docs going to be deployed and it may be in different places.

If the name reference is a relative path to another location, it could redirect to that page.

Previous example put their docs in the header part of the file and scope of my project is aimed to start processing only that part, but as an enhancement I’d like to extend these doc annotations to other places, like record fields on types:

{-| Create a world citizen

You can customize their ages and stuff

-}
let Person
  : Type
  =  { name : Text -- | Name of the perseo
     , dob: Date -- | Date of birth
     }

on union types

let Strategy: Type =
  < Full -- | Do full analysis
  | Partial -- | Do partial analysis. As `Strategy.Full`, but without A*
  >

on function arguments

{-|
Concatenate a `List` of `List`s into a single `List`
-}
let concat
    : ∀(a : Type) → List (List a) → List a
    = λ(a : Type) →            -- | Element types
      λ(xss : List (List a)) → -- | List to concat into a single one
        ...

and any other places that you think it could be helpful to.

Please, let me know what you think about this and issues you see. I’m not very experienced on dhall and probably I’m missing something. Thanks in advance

13 Likes

First off, I’m very glad that we got this GSoC slot, and totally stoked to be involved as a mentor in your project, Germán! :slight_smile:

I have a few comments regarding these initial syntax ideas:

  1. Inline code and identifiers to be hyperlinked should have different syntax. Otherwise, docs will be prone to contain unintended links. For example, some documentation could contain

    {- | This is similar to the Haskell  `foo` function. -}
    

    At some later stage, the code base gets a foo function of its own, and the inline code unexpectedly turns into a link.

    Maybe we can go with rustdoc’s Markdown link style here, maybe we can find something better.

    Also see [`bla`].
    
    [`bla`]: ../bla.dhall
    
  2. We need some simple rules how doc comments are associated with definitions. In your examples I think it’s slightly surprising that, in the case of f, the doc comment refers to the following identifier, while in the later examples, the doc comment refers to the preceding identifier. Maybe we ought to have two “associativities” like Haddock does with -- | and -- ^. OTOH that might be a non-essential complication that we’d rather avoid. Haddock has a few weird edge cases (e.g. 1, 2) in these rules, that I hope we can avoid, although they aren’t super important in practice.

  3. In your concat example, I was slightly surprised to see that the doc comments were mixed with the function arguments instead of the function type, e.g.

    let concat
        : ∀(a : Type) → -- | Element types
          List (List a) → -- | List to concat into a single one
          List a
        = λ(a : Type) →
          λ(xss : List (List a)) →
            ...
    

    I think we’ll also want to support the latter style, particularly when the bound expression is not a lambda literal, e.g.

    let foo
      : X -> -- | bla
        Y
      = bar baz
    

    If we want to support both styles, we should probably warn or error out when users mix them, e.g.

    let foo
      : X -> -- | bla
        Y
      = \(x : X) -> -- | blarg
         ...
    

Eventually we should also think about “structural documentation” that belongs to a package or a collection of identifiers instead of one specific identifier. The relevant documentation for Haddock is here and here. I’m not sure whether this in the scope for this GSoC though.

Also, if we do end up copying a lot of things from rustdoc, it would be great to get some feedback from people who have more experience with that tool. Are there any flaws that we should avoid copying? Maybe @Nadrieril (the maintainer of dhall-rust) has some ideas?

Some random thoughts:

  1. I think most of the parsing logic should live in a separate library, so tools like dhall-lsp could reuse the code. Dhall-docs (may I suggest dhocs?) would only build the static HTML pages.
  2. The doc grammar should be formally specified in an optional section of dhall specs, and we should eventually define acceptance tests, so other implementations could use the grammar for their own doc utilities.
1 Like

I’ve been thinking a bit more about the frustrations of using Haddock, and how to avoid them in our tool:

With Haddock, it’s often tricky to find the right syntax that produces the desired rendered documentation. By going with a popular markup format like Markdown instead of a custom format like Haddock I think we’re already on a better track. Nevertheless, users will make syntax mistakes, and we should try to make it as simple and quick as possible to find and fix these mistakes.

One way might be to validate the input rather strictly and to warn or error out when anything looks amiss. If we go this way, users will expect the tool to produce high quality error messages, which will take quite a bit of effort.

I wonder whether it might be better to be a bit more lenient, and to enable users to render the documentation they’re currently editing very quickly for visual feedback. If the usual “batch mode” is too slow for that, maybe we ought to have some “single file mode” that could possibly skip resolving imports and type-checking…

Another idea would be have some kind of “edit mode” in the tool, where the tool highlights problems directly in the rendered documentation, instead of emitting warnings to stderr. I’m not sure how tricky it would be to implement something like that – maybe it shouldn’t be in scope for the GSoC.

Uh, and why not go all the way to an “editor mode” where users can edit the documentation in a WYSIWYG editor in the browser. :slight_smile: Again, that’s probably an idea for a another GSoC. :wink:

Some thoughts, in a priority order under headings:

Important ideas

dhall format currently strips all comments other than those at the top of the file. This will need to change so that dhall format does not remove useful dhall-doc comments. (This goes hand-in-hand with formalizing the grammar for comments that @lisael said above).

Bikesheddy ideas

I find Go’s tool (godoc) to be an interesting point in the design space. It has an aggressively minimal syntax: no bold or italic, no inline links (though URLs will be formatted as links), no special syntax for “godoc comments” versus “other comments” (instead there’s a convention that a comment above a definition of a top-level public identifier is a godoc comment; all other comments are not).

It also has strong conventions about comment content (eg the first word, modulo any “a” or “the”, should be the identifier being documented; write full sentences including full stops (periods)).

The lack of features can cause pain sometimes. It does not support bulleted lists, but it does support preformatted (monospace) text, so people sometimes emulate bullets in preformatted text like this:

- item one
- item two

but mostly I find its simplicity and total lack of syntax is a feature, not a bug.

The only other tool I have any experience with is javadoc, and I found that it was hard:

  • to know all the appropriate syntax
  • to use the syntax correctly
  • to write javadoc that was as readable in source form as in formatted form

For an example of this last point, inline links would make sentences hard to read in source form, because a URL is inserted into the middle of a sentence, breaking up reading flow. (I also find this true of Markdown inline links and generally prefer reference-style links in any Markdown where people can expect to read the source file.)

Another interesting feature of godoc is example tests: runnable bits of code which are rendered as preformatted code in the resulting godoc. This is similar to python doctests. I wonder if it could be possible to do something similar with Dhall assertions - ie have particular Dhall assertions which appear in the dhall-doc for a file (though this is definitely in the “future nice-to-have” category).

dhall's parser does preserve comments in a few more spots these days. We’ll most likely need to enhance that further for the documentation generator though. The relevant issue is https://github.com/dhall-lang/dhall-haskell/issues/145.

Thanks for your input! I’ll try to reply to each message you sent:

I agree with this syntax and with your concern.

Actually I didn’t mind about that concern you say in the moment I wrote my post and you’re right. We could stick with using two different comment markers to let our parser know what object user is pointing to: one commonly used for header and let-binding descriptions and other one for record fields and funciton types, restricting the place that user puts them on the source code (before or after desired documented element), though that will complicated dhall format a little bit.

Yes, it is better to keep them on the type definition, name argument bindings are kind of ephimeral after all. A case that I’m thinking about is that the user doesn’t provide a type for that function ant the bound expression isn’t a lambda, like:

let myFun = Integer/show

I guess that in that case he shouldn’t try to add docs to their args and if they wanted to then add a type definition!

@lisael I love that name dhocs you suggest! Regarding doc grammar one of the reasons that we (mentors and myself) though of using markdown is because majority of dhall packages already kind of stick with it for their documentation headers.

I like the idea of having a shared grammar definition heavily based on markdown, specifically on commonmark definition, where there is actually a haskell implementation, mmark. That could be somehow used for dhall-lsp-server and dhall format commands, though last one doesn’t care about doc comments.

That feature was suggested on the summer of haskell ideas list and although I didn’t include it in my proposal it could be really nice to have.

1 Like

I think I was misunderstood. I don’t have a strong opinion about the documentation format itself, commonmark is fine, and I agree we shouldn’t blow a brand new grammar.

What I suggest is that we formally specify what is a doc comment, how to associate the doc comment to a definition, and what is the format of the documentation (not the format itself). And this spec should be an optional part of the dhall spec, a nice-to-have for any dhall implementation.

Imagine I write go app, configured with dhall, I want to document the configuration format right into the generated godoc of my package. If dhall-golang (or a third party go package) implements the dhall-docs specs and a dhall-docs-to-godoc translator, it’s a non-issue. We should encourage the implementation of such doc generators and translators by providing the formal specs in the standard along with acceptance tests.

Sorry if I was not clear at first. Ah, and congrats for the GSOC, BTW :slight_smile:

2 Likes

I totally understand your concern and I agree. dhall-docs grammar should be specified so we don’t have several deviated implementations.

Since currently there is not a defined standard yet, I think that we can use this post to discuss about it (its primary purpose) and to publish final decision to the common repository.

My doubt now is if we publish that spec for dhocs, how are we going to define acceptance tests? Are they gonna be the expected HTML/markdown/whatsover output by the tool if a parse was succesfull? An idea that I have (not so elaborated) is to:

  1. Have a .dhall file with the doc format test
  2. Have a .docs file with some machine-readable format (maybe dhall?) that represents the extracted doc information with its definitiond and so one
  3. Compare the actual output with the one defined in (2)

But again, we should focus more primarily about things like docs syntax, markers, documented definitions first and later tackle the standard specs issue.

2 Likes

That utility looks very promising!

About using inline comments to document record fields, is this going to be somehow added to the Dhall.Core.Expr haskell data so that the documentation can easily be generated from openapi swagger definitions, for example: here ?

Yes. The tool will be built on top of the Haskell dhall library. Supporting annotations like those in Germán’s Person example will require storing comments on record fields in Expr.

1 Like

Instead of manually linking, the user should be able to just

[`bla`][]

and it would be linked to the definition of bla from the current scope. We are going to have type-on hover, so the same information can be used.

This overlaps a bit with plain Markdown links, so we should pin down a good ergonomic semantics.

Rustdoc is very primitive in that they have no automatic crosslinking at the moment, which we definitely want here.

I agree, but I don’t think this is in the Scope of this GSoC, since it requires a refactor and a rather deep understanding of the code I’m afraid.

We can use what we got for now, module-level comments and comments in let bindings.

To be clear, at the moment the parser strips everything but the comment at the very beginning of the file and comments appearing directly after the let keyword:

let
  -- test12
  foo = …

Maybe some new preserved locations have been added and I missed it.

You may be already aware of the previous attempts to support Markdown in Haskell’s Haddock. It didn’t go well:

https://web.archive.org/web/20180109024021/http://fuuzetsu.co.uk/blog/posts/2013-08-30-why-Markdown-in-Haddock-can’t-happen.html

Some of the problems we due to backward compatibility requirements, so you’re in a better position. It’s good to learn from others’ mistakes.

1 Like

Having given some thought to this, I’m not sure that Haddock or Javadoc are a good model after all. You see, one thing common to Haskell and Java is that they have a module syntax that includes a static list of top-level declarations. That makes it easy to associate a comment with each declaration. Furthermore, imports and (re-)exports are also static and easy to keep track of for a simple syntax-driven tool.

Dhall on the other hand has no concept of module, import, export, or top-level declaration. It’s just expressions all the way down. This means that any documentation tool that wants to support re-exports, for example, must be able to not just parse but normalize. Say you have a well-documented file base.dhall:

{-- | user name must contain letters and digit only
 userName : Text,
 -- | sha256 please
 passwordHash : Text,
 -- | optional full name
 realName : Optional Text}

That’s a decent equivalent of a Haskell/Java module with everything exported. Now what happens when we re-export it from reexport.dhall?

let base = ./base.dhall
in {
  userName : base.userName, -- reexport
  hash : base.passwordHash, -- reexport renamed
  -- | age in years, for legal purposes
  age : Natural}

It would be pretty disappointing if the generated reexport.html documentation contained only the age comment and not the comment for the re-exported userName and hash. But for that to happen, it’s clear that the documentation tool (I do like dhocs @lisael) can’t be just a dumb parser – unless it’s a dumb parser of a normalized expression with meticulously preserved comments.

I agree that we should eventually support this, but I believe it’s a bit tricky to get right. Both the link representation (should it be bla, ./bla, or Prelude.List.bla?) and generating the the right link target seem like fairly subtle problems to me. It might be easier to add this once we know what the structure of the generated documentation looks like.

To support comments on record fields would be a bit of work, but I don’t think it’s very hard. Updating the prettyprinter is probably the trickiest bit. Given the demand for that feature, I think it would be effort well spent.

A few more spots in a let-binding are preserved too: https://github.com/dhall-lang/dhall-haskell/blob/93313dc99fe9e1179158e2b4232cbd6346b171f2/dhall/src/Dhall/Syntax.hs#L182-L194

(BTW, this is a nice example of a doc comment that isn’t rendered as expected by the author :wink: )

FYI, that plan isn’t quite dead, but it’s not clear when we’ll see more progress on the implementation.

The question is who should do it. imho it’s out-of-scope for the GSoC, but of course somebody else could jump in and try to fix it.

Why would it be out-of-scope? It’s simple groundwork for an important feature of the tool. I don’t believe that it would take Germán more than 2 or 3 days to address, and he’d get a good tour of the code base too! :slight_smile:

It’s simple groundwork for an important feature of the tool. I don’t believe that it would take Germán more than 2 or 3 days to address, and he’d get a good tour of the code base too! :slight_smile:

That’s good news, because it’s probably inescapable – unless we don’t care for re-exports as in my previous example.