Change `dhall format` to use ASCII by default?

I personally use only ascii and enforce ascii formatting in CI in every project.

I do this because ascii is easier to type, and newcomers that read the file don’t have to figure out the mental map between unicode and ascii symbols when typing (which is cognitive overhead).

Also, some unicode symbols are problematic to read, for example ⫽ with the wrong font almost looks like a /.

I’d personally prefer if the default was ascii, and if that were to happen, as said above, for consistency everything should be ascii then, including Prelude and dhall console output.

I’d like to also chime in and say that I’d be very happy if we’d switch the default formatting to ASCII, not because I don’t personally like the Unicode syntax (I love it), but because of practical issues when it comes to the adoption of the language.

Some of you might know that I run a project called Spago, which uses Dhall for its configuration. There I got so many complaints about the Unicode syntax that we had to switch all the formatting to ASCII.
The reasons seem to be mostly that:

  1. The Unicode syntax generates in people approaching the language this dread of “and now how do I type this character?”
    I’ve been there myself and I’d go as far as to say that this is hurting adoption. Of course we could try to make this knowledge more prominent and introduce people to the fact that there is an ASCII variant that they can type and format with etc etc, but why not just avoid the hassle altogether?
  2. Roughly half of the world’s population is using “non-latin based fonts” (to support e.g. logographic writing systems: Chinese, Japanese, Korean, etc). It turns out the support for Unicode in these fonts is generally painful. For as much as I support Unicode and I’d like to scream “let’s fix font systems then”, the reality is that this is changing slowly and having Unicode as default unnecessarily ruins UX for these users.

Based on the feedback from @SiriusStarr and @philandstuff, I’d like to propose a new plan for consideration:

  • Make --ascii the default but still accept the flag (which becomes a no-op)
  • Eventually deprecate and remove the --ascii flag
  • Preserve Unicode support in the standard grammar indefinitely (mainly for backwards compatibility even if effectively nobody will use it anymore)
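
The first step of the plan above can be sketched in a few lines. This is only a toy illustration in Python, not the actual dhall command line: the program name `toy-format` and the function `parse_format_args` are hypothetical, and the only point is that `--ascii` is still accepted while no longer changing the result.

```python
import argparse


def parse_format_args(argv):
    """Toy argument parser (hypothetical, not dhall's real CLI).

    ASCII is the default output style. The --ascii flag is still accepted
    so that existing scripts keep working, but it changes nothing.
    """
    parser = argparse.ArgumentParser(prog="toy-format")
    parser.add_argument(
        "--ascii",
        action="store_true",
        help="accepted for backwards compatibility; has no effect",
    )
    parser.parse_args(argv)  # rejects unknown flags, tolerates --ascii
    return "ascii"           # the same style with or without the flag


print(parse_format_args([]))
print(parse_format_args(["--ascii"]))
```

Both calls select the same style, which is what makes it safe to deprecate and later remove the flag in a second step.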

I’m probably being stupid here, but I’m slightly confused by this. Are you saying there’s no --unicode flag, that --ascii behavior is default, and that the --ascii flag would eventually just go away?

In that case, I think it’d honestly be better just to deprecate and then ditch Unicode support entirely from the standard. If you can’t format to it and it’s a PITA to type, what’s the point of it existing? The alternative being anyone who wants it having to maintain a fork of dhall-haskell, with all the issues that entails. Losing the nicer syntax doesn’t exactly thrill me, but I’d rather have one somewhat inferior way of doing things than have 95% of the community do things the “bad” way and 5% the “good” way (which on top of it requires me to fork all my tooling just to do it that way).


I hope we’ll get a --unicode or --unicode-symbols flag then?!

Completely removing support for formatting Dhall with Unicode symbols seems disproportionate for a discoverability issue to me.

Making --ascii the default seems understandable to me. But I think we can do just that and then wait and see whether that fixes the issue or whether we need to take further steps.

Also note that Unicode symbols help save line space which is not an unimportant aspect when using dhall format. (See https://github.com/dhall-lang/dhall-haskell/issues/1496 for a related discussion).


See my previous comment - I think having a single standard format used by everyone is preferable to having an ascii format for casual users / users with poor Unicode operator font support, and a different format used by cognoscenti.

I personally like the Unicode format, but if it can’t be used by everyone, I’d prefer we all standardise on something that can be used by everyone.

Hm, I understand the appeal of something like go fmt – I used it happily when I was playing with Go.

I’m not convinced that removing Unicode support will bring dhall format closer to universal adoption though. There’s already quite a bit of demand for a formatter that works very differently – see https://github.com/dhall-lang/dhall-haskell/issues/1496, but also some other issues with the formatting label.

IMHO removing Unicode support would inconvenience existing users (by reducing readability and formatting compactness) for the benefit of beginners and non-users where we don’t know whether they’re going to use Dhall anyway. In fact, I wonder if switching to ASCII wouldn’t make the language less attractive to non-users by reducing its visual appeal.

Ultimately it’s obviously easier to add things than to take them away, and I think we ought to be cautious about removing well established features.


I guess I jumped the gun a bit by going straight to proposing a design without discussing the requirements.

Based on people’s comments, here is my understanding of the various (possibly competing) requirements people have in mind:

  1. Avoid fragmentation so that people don’t have to read multiple coding styles

  2. Preserve backwards compatibility with existing code in the wild

    e.g. importing code written using Unicode symbols

  3. Remove unused language features for simplicity and ease of porting new implementations

    e.g. dropping support for Unicode symbols if the standard formatter doesn’t use them

Let me know if I missed a requirement that you feel is relevant to this discussion.

@SiriusStarr @sjakobi @philandstuff @f-f @amarrella @tristanC: Out of the above requirements, what relative priority do you all place on them?

@sjakobi made a good point: switching to ASCII may not help adoption after all. The main question I hear is ‘how to make this character λ?’, and perhaps how to use Unicode needs to be better documented up-front (e.g. add the dhall format command to a save hook or something…).

On the other hand, switching to ASCII removes this early step, and that’s one less possible argument against Dhall. To that effect, 1 > 2 > 3 sounds right to me.

Yep, this was essentially what I was trying to get at, namely that if the problem is (essentially) lack of familiarity with Dhall tooling, the better solution might be to improve onboarding with the tooling/do more to promote it, rather than change the behavior of the tooling/language to need the tooling less.

And @sjakobi’s point that removing Unicode syntax might decrease Dhall’s attractiveness (by decreasing the shiny factor) is a good one; I certainly would much rather glance at something and see ∀(a : Type) → List a → Bool than the much longer and uglier forall (a : Type) -> List a -> Bool. Which is why I think the examples on dhall-lang.org should use Unicode syntax and offer dhall format-style conversion so people can immediately get a taste for that. (Or at least make one of the tabs about Unicode syntax.)
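
For anyone following along, the correspondence between the two spellings is small enough to write down. Below is a rough sketch in Python, not anything dhall format itself does: the table lists the operator pairs, and the hypothetical helper `to_ascii` does plain text substitution without parsing Dhall, so it would also rewrite symbols inside string literals and comments.

```python
# Dhall operators that have both a Unicode and an ASCII spelling.
UNICODE_TO_ASCII = {
    "λ": "\\",        # lambda
    "→": "->",        # function arrow
    "∀": "forall",    # universal quantification
    "∧": "/\\",       # recursive record merge
    "⩓": "//\\\\",    # recursive record type merge
    "⫽": "//",        # right-biased record merge
    "≡": "===",       # equivalence
}


def to_ascii(source: str) -> str:
    """Naively replace each Unicode operator with its ASCII spelling.

    Hypothetical helper for illustration only: it works on raw text and
    knows nothing about string literals, comments, or layout.
    """
    for unicode_symbol, ascii_symbol in UNICODE_TO_ASCII.items():
        source = source.replace(unicode_symbol, ascii_symbol)
    return source


print(to_ascii("∀(a : Type) → List a → Bool"))
```

The real formatter works on the parsed expression and also adjusts spacing, so its output will not match this character-for-character.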

As far as priorities, I would probably rank 1 > 3 > 2, with the following caveats:

  • Moving to a Vonderhaar-style formatter, as has been discussed, doesn’t “break” 1 to me (i.e. doesn’t count as multiple coding styles). I certainly don’t feel like I’m reading multiple coding styles with Elm.

  • 2 (preserving backwards compatibility) is ranked so low here by me because migration is essentially trivial in this case. If existing code can be brought into compatibility by just running dhall format on it, that’s a non-issue to me personally. If it’s instead a change that required manually updating files, that’s a very different priority.


I’d say the requirement (1) here should include “read and write multiple styles”: another problem with having Unicode formatting by default is that people will have to learn both ASCII and Unicode syntax, as they’ll most likely type their files in ASCII syntax, and see them formatted in Unicode syntax.
Yes, the Unicode looks nicer, but I can totally see how it gets in the way of picking up the language.
Also some of the above comments worried about “optimizing for people that are not going to be users anyways”, but this leaves out a big part of the demographic: users that cannot choose and have to interact with Dhall because of other tools - e.g. casual Spago users.

So if the above makes sense the priority for me is 1 > 2 > 3

Let me add one more requirement inspired by last year’s survey:

  1. Don’t use Unicode in beginner-facing examples

This was one of the consistent pieces of feedback I got and the reason I changed all of the examples on dhall-lang.org and the tutorials on the wiki to use ASCII.

As one of the survey respondents said:

Also, having the example with unicode syntax, while cool, make them hard/impossible to type along

… which supports what @f-f is saying about how it increases the difficulty of picking up the language if the beginner has to learn both the Unicode and ASCII syntax and the correspondence between them to follow along with a tutorial.

Sorry in advance for the brief response. I notice that I have some kind of an emotional budget that can only fit so many difficult or divisive discussions at once. :slight_smile:

The following things are important to me:

  1. Dhall should be accessible to newcomers, and IMHO that means that

    1. Code targeted at newcomers should use ASCII style, and it should be easy to find out that Unicode symbols are not necessary for writing Dhall.

    2. We need to spread the word that when modifying code in Unicode style, you can write ASCII symbols and then let dhall format take care of changing these symbols to Unicode. We should either extend the cheatsheet to show the ASCII symbol variants or have an extra document that shows the symbols in both variants.

    3. If the measures above are insufficient, we should change the default style for dhall format and other dhall commands to ASCII. I’m less sure about type errors, where I think readability and compactness are also very important.

  2. Readability and visual appeal are important, and I think it would be a loss for many existing Dhall users if we removed support for Unicode style.

    In consequence, I think Unicode style should remain supported, for example via a --unicode or maybe --unicode-symbols flag.


Regarding style uniformity across the ecosystem, I think that is ultimately desirable, but it’s too early to standardize: There are quite a few people who don’t like the current output of dhall format, and I think we should give a Vonderhaar-style formatter a chance, although I’m not convinced that dhall format is the right place for that experiment.

Never have I more strongly identified with a post. :heart:

Regardless, I do think it’s worth trying better onboarding first before taking the additional measure of switching the default if it proves insufficient.


@sjakobi @SiriusStarr: On the bright side, having several people who care very deeply about the future of the language is a good problem to have! :slight_smile:

Regarding introductory material, what I can do is just defer this discussion until after the upcoming survey that I create at the beginning of each year. That should give us a better read of whether people still complain about Unicode or not. The introductory material has been ASCII-only for a while so updated survey results should help us understand whether people only disapproved of Unicode tutorials or Unicode in general.


Maybe we should convert the cheatsheet to use ASCII style?

@sjakobi: Good catch! Yeah, I will do that

Also, I’m thinking of adding questions to this year’s survey to help us gather data on this subject. The questions I’m considering are:

When typing Dhall symbols in my editor (before formatting the code) I use:

  • [ ] ASCII
  • [ ] Unicode

When formatting my code, I use:

  • [ ] Nothing (I don’t format my code)
  • [ ] dhall format
  • [ ] dhall format --ascii

I prefer to read Dhall code written using:

  • [ ] ASCII
  • [ ] Unicode

Yeah, that sounds good! :+1:

I actually find this a little irksome with Spago: I end up with git diffs because the formatter’s default is Unicode, running spago upgrade-set doesn’t follow that default, and I have to make a manual change later. Short of passing something like a spago --use-unicode upgrade flag and keeping a fork in all projects consuming Dhall, is there any interest in some sort of rc file that records what the user or project prefers as their default, so your team can be on the same page?

I agree with all the above arguments on approachability and personally found the unicode support a shiny, cool thing that made me more likely to use the language. I like that Gabriel said indefinite support. It reminds me of Sass: I always loved and preferred the indented syntax, but the SCSS syntax won out because of approachability and similarity with CSS; and as such the tooling chose SCSS to be the default shown, but the support of Sass’s indented syntax was slated to be supported indefinitely (though I moved to SugarSS for different reasons later).

Starting in the next dhall release, dhall format will detect the pre-existing character set of a file and stick to it when formatting the output: https://github.com/dhall-lang/dhall-haskell/pull/2108
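
To give a rough idea of what “detect and stick to it” could mean, here is a heuristic sketch in Python. It is an assumption for illustration, not the logic in the linked pull request: the hypothetical function `detect_character_set` treats a file as Unicode if any Unicode operator appears anywhere in its text, and as ASCII otherwise.

```python
# Unicode operators that have an ASCII equivalent in Dhall.
UNICODE_SYMBOLS = frozenset("λ→∀∧⩓⫽≡")


def detect_character_set(source: str) -> str:
    """Guess the character set a file was written in (illustrative heuristic).

    Assumption: a single Unicode operator anywhere in the text marks the
    file as Unicode. This scans raw text, so a symbol inside a string
    literal or comment counts too, which a real implementation working on
    the parsed expression would avoid.
    """
    if any(character in UNICODE_SYMBOLS for character in source):
        return "unicode"
    return "ascii"


print(detect_character_set("λ(x : Bool) → x"))
print(detect_character_set("\\(x : Bool) -> x"))
```

With something along these lines, a team that already standardised on one style keeps it without passing any flag.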
