Dhall-docs cli utility

sjakobi · May 22, 2020, 3:12pm

dhall's parser does preserve comments in a few more spots these days. We’ll most likely need to enhance that further for the documentation generator though. The relevant issue is https://github.com/dhall-lang/dhall-haskell/issues/145.

german1608 · May 22, 2020, 6:05pm

Thanks for your input! I’ll try to reply to each message you sent:

I agree with this syntax and with your concern.

Actually, I hadn’t considered that concern when I wrote my post, and you’re right. We could stick with two different comment markers to let our parser know which object the user is pointing to: one commonly used for header and let-binding descriptions and another for record fields and function types, restricting where the user may put them in the source code (before or after the documented element), though that will complicate dhall format a little bit.

Yes, it is better to keep them on the type definition; the argument names in bindings are kind of ephemeral after all. A case that I’m thinking about is when the user doesn’t provide a type for the function and the bound expression isn’t a lambda, like:

let myFun = Integer/show

I guess that in that case they shouldn’t try to add docs to the arguments, and if they want to, they should add a type definition!

@lisael I love the name dhocs you suggest! Regarding the doc grammar, one of the reasons that we (the mentors and myself) thought of using markdown is that the majority of dhall packages already kind of stick with it for their documentation headers.

I like the idea of having a shared grammar definition heavily based on markdown, specifically on the commonmark definition, for which there is actually a haskell implementation, mmark. That could somehow be used by the dhall-lsp-server and dhall format commands, though the latter doesn’t care about doc comments.

That feature was suggested on the Summer of Haskell ideas list and, although I didn’t include it in my proposal, it could be really nice to have.

lisael · May 22, 2020, 11:06pm

I think I was misunderstood. I don’t have a strong opinion about the documentation format itself; commonmark is fine, and I agree we shouldn’t invent a brand new grammar.

What I suggest is that we formally specify what a doc comment is, how to associate a doc comment with a definition, and which format the documentation uses (not the format itself). This spec should be an optional part of the dhall spec, a nice-to-have for any dhall implementation.

Imagine I write a Go app, configured with dhall, and I want to document the configuration format right in the generated godoc of my package. If dhall-golang (or a third-party Go package) implements the dhall-docs specs and a dhall-docs-to-godoc translator, it’s a non-issue. We should encourage the implementation of such doc generators and translators by providing the formal specs in the standard along with acceptance tests.

Sorry if I was not clear at first. Ah, and congrats on the GSoC, BTW

german1608 · May 22, 2020, 11:21pm

I totally understand your concern and I agree. The dhall-docs grammar should be specified so we don’t end up with several diverging implementations.

Since there is no defined standard yet, I think we can use this thread to discuss it (its primary purpose) and then publish the final decision to the common repository.

My doubt now is: if we publish that spec for dhocs, how are we going to define acceptance tests? Are they going to be the expected HTML/markdown/whatever output by the tool if a parse was successful? An idea that I have (not so elaborated) is to:

1. Have a .dhall file with the doc format test
2. Have a .docs file in some machine-readable format (maybe dhall?) that represents the extracted doc information with its definitions and so on
3. Compare the actual output with the one defined in (2)
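
A rough Python sketch of the three steps above, for illustration only. Everything in it is an assumption rather than part of any Dhall tooling: the `extract_docs` stand-in (a real tool would use the Dhall parser), the choice of JSON for the `.docs` file (the post suggests it might be dhall instead), and all function names.

```python
import json
from pathlib import Path


def extract_docs(source):
    """Hypothetical stand-in for the real extractor.

    Collects `-- |` line comments and attaches each one to the next
    non-empty, non-comment line.
    """
    docs = []
    pending = None
    for line in source.splitlines():
        stripped = line.strip()
        if stripped.startswith("-- |"):
            pending = stripped[len("-- |"):].strip()
        elif stripped and pending is not None:
            docs.append({"doc": pending, "target": stripped})
            pending = None
    return docs


def matches_expected(dhall_source, expected_docs_json):
    """Step 3: compare the extracted docs with the expected representation."""
    return extract_docs(dhall_source) == json.loads(expected_docs_json)


def run_acceptance_test(dhall_file, docs_file):
    """Steps 1 and 2: read the .dhall input and its .docs expectation."""
    return matches_expected(
        Path(dhall_file).read_text(), Path(docs_file).read_text()
    )
```

Comparing a structured representation rather than rendered HTML keeps such tests independent of any particular output format.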

But again, we should focus primarily on things like docs syntax, markers, and documented definitions first, and tackle the standard spec issue later.

tristanC · May 24, 2020, 2:13am

That utility looks very promising!

About using inline comments to document record fields: is this going to be added to the Dhall.Core.Expr Haskell data type so that documentation can easily be generated from OpenAPI/Swagger definitions, for example: here?

sjakobi · May 24, 2020, 10:50am

Yes. The tool will be built on top of the Haskell dhall library. Supporting annotations like those in Germán’s Person example will require storing comments on record fields in Expr.

Profpatsch · May 25, 2020, 10:49am

Instead of manually linking, the user should be able to just

[`bla`][]

and it would be linked to the definition of bla from the current scope. We are going to have type-on hover, so the same information can be used.

This overlaps a bit with plain Markdown links, so we should pin down good, ergonomic semantics.

Rustdoc is very primitive in that they have no automatic crosslinking at the moment, which we definitely want here.

Profpatsch · May 25, 2020, 10:51am

I agree, but I don’t think this is in the scope of this GSoC, since I’m afraid it requires a refactor and a rather deep understanding of the code.

We can use what we’ve got for now: module-level comments and comments in let bindings.

Profpatsch · May 25, 2020, 10:53am

To be clear, at the moment the parser strips everything but the comment at the very beginning of the file and comments appearing directly after the let keyword:

let
  -- test12
  foo = …

Maybe some new preserved locations have been added and I missed it.

blamario · May 25, 2020, 1:14pm

You may be already aware of the previous attempts to support Markdown in Haskell’s Haddock. It didn’t go well:

https://web.archive.org/web/20180109024021/http://fuuzetsu.co.uk/blog/posts/2013-08-30-why-Markdown-in-Haddock-can’t-happen.html

Some of the problems were due to backward compatibility requirements, so you’re in a better position. It’s good to learn from others’ mistakes.

blamario · May 25, 2020, 2:19pm

Having given some thought to this, I’m not sure that Haddock or Javadoc are a good model after all. You see, one thing common to Haskell and Java is that they have a module syntax that includes a static list of top-level declarations. That makes it easy to associate a comment with each declaration. Furthermore, imports and (re-)exports are also static and easy to keep track of for a simple syntax-driven tool.

Dhall on the other hand has no concept of module, import, export, or top-level declaration. It’s just expressions all the way down. This means that any documentation tool that wants to support re-exports, for example, must be able to not just parse but normalize. Say you have a well-documented file base.dhall:

{ -- | user name must contain letters and digits only
  userName : Text,
  -- | sha256 please
  passwordHash : Text,
  -- | optional full name
  realName : Optional Text}

That’s a decent equivalent of a Haskell/Java module with everything exported. Now what happens when we re-export it from reexport.dhall?

let base = ./base.dhall
in {
  userName : base.userName, -- reexport
  hash : base.passwordHash, -- reexport renamed
  -- | age in years, for legal purposes
  age : Natural}

It would be pretty disappointing if the generated reexport.html documentation contained only the age comment and not the comment for the re-exported userName and hash. But for that to happen, it’s clear that the documentation tool (I do like dhocs @lisael) can’t be just a dumb parser – unless it’s a dumb parser of a normalized expression with meticulously preserved comments.
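
To make the re-export problem concrete, here is a toy Python model, not real Dhall tooling. The `Field`, `Ref`, and `normalize` names are hypothetical, and records are modelled as plain dictionaries. A field whose type is a reference into the base record inherits the base field's comment unless the re-exporting file supplies its own.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Field:
    """A record field with its type and optional doc comment."""
    type_: str
    doc: Optional[str] = None


@dataclass(frozen=True)
class Ref:
    """A field type written as `base.<field>` in the re-exporting file."""
    field: str
    doc: Optional[str] = None


def normalize(record, base):
    """Resolve references into `base`, carrying doc comments along."""
    out = {}
    for name, entry in record.items():
        if isinstance(entry, Ref):
            target = base[entry.field]
            # A local comment wins; otherwise keep the original one.
            out[name] = Field(target.type_, entry.doc or target.doc)
        else:
            out[name] = entry
    return out


base = {
    "userName": Field("Text", "user name must contain letters and digits only"),
    "passwordHash": Field("Text", "sha256 please"),
    "realName": Field("Optional Text", "optional full name"),
}
reexport = {
    "userName": Ref("userName"),
    "hash": Ref("passwordHash"),
    "age": Field("Natural", "age in years, for legal purposes"),
}
resolved = normalize(reexport, base)
```

In this model `resolved["hash"]` carries the "sha256 please" comment even though the re-exporting file never repeats it, which is the behaviour a purely syntactic tool cannot provide.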

sjakobi · May 25, 2020, 2:21pm

Profpatsch:

Instead of manually linking, the user should be able to just
[`bla`][]
and it would be linked to the definition of bla from the current scope. We are going to have type-on hover, so the same information can be used.

I agree that we should eventually support this, but I believe it’s a bit tricky to get right. Both the link representation (should it be bla, ./bla, or Prelude.List.bla?) and generating the right link target seem like fairly subtle problems to me. It might be easier to add this once we know what the structure of the generated documentation looks like.

To support comments on record fields would be a bit of work, but I don’t think it’s very hard. Updating the prettyprinter is probably the trickiest bit. Given the demand for that feature, I think it would be effort well spent.

A few more spots in a let-binding are preserved too: https://github.com/dhall-lang/dhall-haskell/blob/93313dc99fe9e1179158e2b4232cbd6346b171f2/dhall/src/Dhall/Syntax.hs#L182-L194

(BTW, this is a nice example of a doc comment that isn’t rendered as expected by the author)

FYI, that plan isn’t quite dead, but it’s not clear when we’ll see more progress on the implementation.

Profpatsch · May 25, 2020, 2:27pm

The question is who should do it. imho it’s out-of-scope for the GSoC, but of course somebody else could jump in and try to fix it.

sjakobi · May 25, 2020, 2:57pm

Why would it be out-of-scope? It’s simple groundwork for an important feature of the tool. I don’t believe that it would take Germán more than 2 or 3 days to address, and he’d get a good tour of the code base too!

blamario · May 25, 2020, 3:00pm

It’s simple groundwork for an important feature of the tool. I don’t believe that it would take Germán more than 2 or 3 days to address, and he’d get a good tour of the code base too!

That’s good news, because it’s probably inescapable – unless we don’t care for re-exports as in my previous example.

german1608 · May 26, 2020, 12:53am

Hi, just digesting all your input

That’s actually what I wrote in my proposal, I think. But I’d really like to add them on records. If that gets difficult, a workaround would be for people to put the relevant docs for each record field in their module header or let-binding.

Thank you for noticing that, and yes, I should try to make dhocs (henceforth I’m going to use that name, @lisael) in such a way that it doesn’t just take comments and output HTML, but also handles this kind of thing, so dhall users don’t repeat themselves, without creating a tool that is hard to use: a haddock without the bad parts of haddock.

And I don’t have any problem working on that after the GSoC project (of course, if I have time to do so). I mean, I don’t want to just work on the project for these 3 months and say goodbye afterwards. I’d like to keep contributing later; I’m liking this whole project you’ve built.

Now, I’d like to thank you all for your comments and ask you not to stop. I’m currently designing some simple mockups of what I think the front-end of the HTML output should look like, but I’m open to any other suggestion or idea. I’ll also post the resulting mockups here so you can give me your thoughts on them as well.

Profpatsch · May 26, 2020, 10:40am

Just a thought: how about you make it output a dhall expression which can then be converted to HTML or Markdown or manpages or whatever in a second step?
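
A minimal Python sketch of that two-step idea, for illustration only. In the suggestion the intermediate value would be a Dhall expression; here a plain data structure stands in for it, and the `DocEntry`, `render_markdown`, and `render_html` names are hypothetical.

```python
from dataclasses import dataclass
from html import escape


@dataclass(frozen=True)
class DocEntry:
    """Step 1 output: one documented definition, independent of any format."""
    name: str
    type_: str
    doc: str


def render_markdown(entries):
    """Step 2, variant A: turn the intermediate entries into a Markdown list."""
    return "\n".join(f"- `{e.name} : {e.type_}`: {e.doc}" for e in entries)


def render_html(entries):
    """Step 2, variant B: turn the same entries into an HTML definition list."""
    items = "".join(
        f"<dt><code>{escape(e.name)} : {escape(e.type_)}</code></dt>"
        f"<dd>{escape(e.doc)}</dd>"
        for e in entries
    )
    return f"<dl>{items}</dl>"
```

Both renderers consume the same entries, so supporting another output such as manpages would only mean adding another function over the intermediate value.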

blamario · May 26, 2020, 12:51pm

That’s an interesting idea. If you take it another step forward (or backward, against the data flow) you might ask why not make the original annotations Dhall expressions rather than comments:

the annotation would not change the type of the expression,
it would be stripped away by destructive operations such as +, *, or ==,
a lambda would preserve it,
record merges would preserve the field annotations.

As an analogy, think of the semi-standard <?> parser combinator.

german1608 · May 26, 2020, 1:41pm

@Profpatsch I like the idea and it could simplify a lot of things! I think it goes hand in hand with @sjakobi’s comment on extending Dhall.Core.Expr a little.

@blamario regarding your last post, I understand what you’re getting at; it looks really clever and it could simplify a lot of things. I’d like to give it a try and see how it goes.

german1608 · June 19, 2020, 4:45pm

Hi all! I have some news about the tool so far.

We ended up using dhall-docs instead of dhocs, the name @lisael suggested. The reason is described in #1833 (comment).
I’ve made some progress, but the tool is still far from usable. See this list of PRs to get a sense of what is done so far.

Now I’d like to share some comments and considerations we ended up with regarding the structure of the documentation:

(context #1863 (comment)) We’ll need to distinguish between documentation comments and internal comments. In that thread we agreed we’ll use {-| and -- |.
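
A small Python sketch of how a tool might tell the two kinds of comments apart, assuming only the markers named above. The `classify_comment` function is hypothetical, and treating `--|` (without a space) as an internal comment is a choice made for this sketch, not something decided in the thread.

```python
def classify_comment(comment):
    """Classify a raw comment as 'doc', 'internal', or 'not-a-comment'."""
    text = comment.lstrip()
    # Documentation markers must be checked first, since they are
    # special cases of the ordinary `{-` and `--` comment openers.
    if text.startswith("{-|") or text.startswith("-- |"):
        return "doc"
    if text.startswith("{-") or text.startswith("--"):
        return "internal"
    return "not-a-comment"
```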

(context #1868) To ease documentation writing, we’ll use commonmark as a base. mmark is a haskell package that lets us parse commonmark and produce html. One issue is that commonmark is really sensitive to indentation.
For example, if you write the following markdown:

foo
bar
baz

it will render 3 <p> html elements, but the following:

foo
bar
    baz

will render 2 <p> for foo and bar and render a <code> for baz. That feature is called indented code blocks.

When we write documentation in a header, one might write something like this:

{-| foo
    bar
    baz
-}

One would expect that it renders 3 <p>. Currently, dhall-docs just removes the {-| and -}, so the text ends up being:

foo
    bar
    baz

and it renders <code> blocks instead of what the user might have expected: 3 <p>.

We can’t just strip each line because there are some common markdown features (like nested list items) that need the indentation, so it looks like we need an explicit way to define it.

In that thread, we proposed using | to solve it.

{-| foo
  | bar
  | baz
-}

Now it is clear that the indentation is the same, and it’s easier to process. This is similar to javadoc, where each line starts with *, or rustdoc, where each line starts with ///.
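
A Python sketch of how the `|` margin could be processed, assuming the layout shown above. The `strip_doc_margin` name and the choice to reject lines without a margin are hypothetical; the thread does not specify them. Only the margin and one following space are removed, so any deeper indentation (for nested list items, say) survives.

```python
def _drop_one_space(text):
    """Remove a single leading space, if there is one."""
    return text[1:] if text.startswith(" ") else text


def strip_doc_margin(comment):
    """Extract the markdown body from a `{-| ... -}` comment with `|` margins."""
    text = comment.strip()
    if not (text.startswith("{-|") and text.endswith("-}")):
        raise ValueError("not a {-| ... -} doc comment")
    body = text[len("{-|"):-len("-}")]
    lines = body.split("\n")
    result = [_drop_one_space(lines[0])]
    for line in lines[1:]:
        candidate = line.lstrip()
        if candidate.startswith("|"):
            # Keep everything after the margin except one separating space.
            result.append(_drop_one_space(candidate[1:]))
        elif candidate == "":
            result.append("")
        else:
            raise ValueError(f"line lacks the '|' margin: {line!r}")
    # Drop the empty line left over in front of the closing -}.
    while result and result[-1].strip() == "":
        result.pop()
    return "\n".join(line.rstrip() for line in result)
```

Applied to the three-line example above, this yields the lines foo, bar, and baz with no leading indentation.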

I’d like to read your thoughts about this so far and your experiences with other language doc generators.