Change `dhall format` to use ASCII by default?


#2

As I posted on GitHub, I don’t think it’s a horrible change as long as the language server/IDE plugins get an easy flag for Unicode.

But I’m not convinced this solves the issue; it might even compound it. Changing the behavior of dhall format presupposes that the newcomer has discovered that dhall format exists, at which point the Unicode syntax would be a non-issue. And if dhall format uses ASCII by default and a newcomer comes across a file with Unicode syntax, then copy-pasting is actually the only way for them to edit it (barring a potentially obscure flag). Not to mention that this would require changing literally every file in the Prelude (and a lot of files in the standard), though thankfully those changes shouldn’t affect semantic hashes.

Honestly, I see this as potentially fracturing the community (not in the sense of an argument, haha), simply in that we’d then run into half of all files formatted one way and half the other, which invariably becomes a pain to work with. (See also line endings, spaces vs. tabs, braces on the same line or the next, etc. etc. etc.)

At the end of the day, I guess I come down on the Unicode side for the following reason: It’s hard to write Unicode syntax without dhall format, but it’s not hard to write ASCII syntax without it. If it’s going to be horribly annoying to use Unicode syntax, then it’s not really clear why it should be supported at all. (Yes, of course, there’d be a flag for it, but at that point it doesn’t seem to be anything but an afterthought.)

As I said on the GitHub issue, I do wonder if it’d be better in the long run to focus instead on introducing newcomers to dhall format and the Unicode syntax. Honestly, dhall format could be made a selling point on the home page (never worry about formatting again!). Anyway, I talked a bit about that there, so I won’t repeat myself here.


#3

I agree with SiriusStarr, I don’t think we should add this option to dhall format.

More generally, I think code formatters should not offer options at all. I think that offering options in formatters causes a fragmentation of code styles, and I much prefer more recent efforts such as standardjs and go fmt, which have much more of a “my way or the highway” ethos. I think consistency has proved more valuable than configurability. My “aha” moment came when I realised that, although I hate tabs for indentation, I’d rather use go fmt (which uses tabs) than break with community consistency.

So, if we accept that dhall format should not be configurable, we have two choices:

  1. Keep the status quo - Unicode operators
  2. Switch to ASCII

and I think if we choose option 2, then effectively the Unicode operators will be dead language features which should eventually be removed.

I slightly prefer option 1, but I can see the argument for option 2 based on discoverability for new users. If we went with option 2, people could still get their Unicode fix via IDE display modes such as Emacs pretty-mode.


#4

For option 2 to be effective, we would also have to update the Prelude code and the --explain message to stop using Unicode. I personally don’t mind either way, but it seems like newcomers are most likely more comfortable with ASCII…


#5

I personally use only ASCII and enforce ASCII formatting in CI in every project.

I do this because ASCII is easier to type, and newcomers who read the file don’t have to figure out the mental mapping between Unicode and ASCII symbols when typing (which is cognitive overhead).

Also, some Unicode symbols are hard to read; for example, ⫽ (the Unicode form of //) almost looks like a / in the wrong font.

I’d personally prefer if the default were ASCII, and if that were to happen, then as said above, for consistency everything should be ASCII, including the Prelude and dhall console output.


#6

I’d like to also chime in and say that I’d be very happy if we’d switch the default formatting to ASCII, not because I don’t personally like the Unicode syntax (I love it), but because of practical issues when it comes to the adoption of the language.

Some of you might know that I run a project called Spago, which uses Dhall for its configuration. There I got so many complaints about the Unicode syntax that we had to switch all the formatting to ASCII.
The reasons seem to be mostly that:

  1. The Unicode syntax fills people approaching the language with the dread of “and now how do I type this character?”
    I’ve been there myself and I’d go as far as to say that this is hurting adoption. Of course we could try to make this knowledge more prominent and introduce people to the fact that there is an ASCII variant that they can type and format with, etc., but why not just avoid the hassle altogether?
  2. Roughly half of the world’s population uses “non-Latin-based fonts” (to support, e.g., logographic writing systems: Chinese, Japanese, Korean, etc.). It turns out that Unicode support in these fonts is generally painful. As much as I support Unicode and would like to scream “let’s fix font systems then”, the reality is that this is changing slowly, and having Unicode as the default unnecessarily ruins the UX for these users.

#7

Based on the feedback from @SiriusStarr and @philandstuff, I’d like to propose a new plan for consideration:

  • Make --ascii the default but still accept the flag (which becomes a no-op)
  • Eventually deprecate and remove the --ascii flag
  • Preserve Unicode support in the standard grammar indefinitely (mainly for backwards compatibility even if effectively nobody will use it anymore)

#8

I’m probably being stupid here, but I’m slightly confused by this. Are you saying there’s no --unicode flag, that --ascii behavior is default, and that the --ascii flag would eventually just go away?

In that case, I think it’d honestly be better just to deprecate and then drop Unicode support from the standard entirely. If you can’t format to it and it’s a PITA to type, what’s the point of it existing? The alternative is that anyone who wants it has to maintain a fork of dhall-haskell, with all the issues that entails. Losing the nicer syntax doesn’t exactly thrill me, but I’d rather have one somewhat inferior way of doing things than have 95% of the community do things the “bad” way and 5% the “good” way (which on top of that would require me to fork all my tooling just to do it that way).


#9

I hope we’ll get a --unicode or --unicode-symbols flag then?!

Completely removing support for formatting Dhall with Unicode symbols seems disproportionate for a discoverability issue to me.

Making --ascii the default seems understandable to me. But I think we can do just that and then wait and see whether that fixes the issue or whether we need to take further steps.

Also note that Unicode symbols help save line space which is not an unimportant aspect when using dhall format. (See https://github.com/dhall-lang/dhall-haskell/issues/1496 for a related discussion).


#10

See my previous comment: I think having a single standard format used by everyone is preferable to having an ASCII format for casual users and users with poor font support for Unicode operators, and a different format used by the cognoscenti.

I personally like the Unicode format, but if it can’t be used by everyone, I’d prefer we all standardise on something that can be used by everyone.


#11

Hm, I understand the appeal of something like go fmt – I used it happily when I was playing with Go.

I’m not convinced that removing Unicode support will bring dhall format closer to universal adoption though. There’s already quite a bit of demand for a formatter that works very differently – see https://github.com/dhall-lang/dhall-haskell/issues/1496, but also some other issues with the formatting label.

IMHO removing Unicode support would inconvenience existing users (by reducing readability and formatting compactness) for the benefit of beginners and non-users who may not end up using Dhall anyway. In fact, I wonder if switching to ASCII wouldn’t make the language less attractive to non-users by reducing its visual appeal.

Ultimately it’s obviously easier to add things than to take them away, and I think we ought to be cautious about removing well established features.


#12

I guess I jumped the gun a bit by going straight to proposing a design without discussing the requirements.

Based on people’s comments, here is my understanding of the various (possibly competing) requirements people have in mind:

  1. Avoid fragmentation so that people don’t have to read multiple coding styles

  2. Preserve backwards compatibility with existing code in the wild

    e.g. importing code written using Unicode symbols

  3. Remove unused language features for simplicity and ease of porting new implementations

    e.g. dropping support for Unicode symbols if the standard formatter doesn’t use them

Let me know if I missed a requirement that you feel is relevant to this discussion.

@SiriusStarr @sjakobi @philandstuff @f-f @amarrella @tristanC: Out of the above requirements, what relative priority do you all place on them?


#13

@sjakobi made a good point: switching to ASCII may not help adoption after all. The main question I hear is ‘how do I type this character λ?’, and perhaps how to use Unicode needs to be better documented up front (e.g. add the dhall format command to an editor save hook or something…).

On the other hand, switching to ASCII removes this early step, and that’s one less possible argument against Dhall. To that effect, 1 > 2 > 3 sounds right to me.


#14

Yep, this was essentially what I was trying to get at, namely that if the problem is (essentially) lack of familiarity with Dhall tooling, the better solution might be to improve onboarding with the tooling/do more to promote it, rather than change the behavior of the tooling/language to need the tooling less.

And @sjakobi’s point that removing Unicode syntax might decrease Dhall’s attractiveness (by decreasing the shiny factor) is a good one; I certainly would much rather glance at something and see ∀(a : Type) → List a → Bool than the much longer and uglier forall (a : Type) -> List a -> Bool. Which is why I think the examples on dhall-lang.org should use Unicode syntax and offer dhall format-style conversion so people can immediately get a taste for that. (Or at least make one of the tabs about Unicode syntax.)

As far as priorities, I would probably rank 1 > 3 > 2, with the following caveats:

  • Moving to a Vonderhaar-style formatter, as has been discussed, doesn’t “break” 1 to me (i.e. doesn’t count as multiple coding styles). I certainly don’t feel like I’m reading multiple coding styles with Elm.

  • I rank 2 (preserving backwards compatibility) so low because migration is essentially trivial in this case. If existing code can be brought into compatibility just by running dhall format on it, that’s a non-issue to me personally. If it instead required manually updating files, it would be a very different priority.


#15

I’d say requirement (1) here should include “reading and writing multiple styles”: another problem with having Unicode formatting by default is that people will have to learn both the ASCII and the Unicode syntax, as they’ll most likely type their files in ASCII syntax and see them formatted in Unicode syntax.
Yes, the Unicode looks nicer, but I can totally see how it gets in the way of picking up the language.
Also some of the above comments worried about “optimizing for people that are not going to be users anyways”, but this leaves out a big part of the demographic: users that cannot choose and have to interact with Dhall because of other tools - e.g. casual Spago users.

So if the above makes sense, my priority is 1 > 2 > 3


#16

Let me add one more requirement inspired by last year’s survey:

  4. Don’t use Unicode in beginner-facing examples

This was one of the consistent pieces of feedback I got and the reason I changed all of the examples on dhall-lang.org and the tutorials on the wiki to use ASCII.

As one of the survey respondents said:

Also, having the example with unicode syntax, while cool, make them hard/impossible to type along

… which supports what @f-f is saying about how it makes the language harder to pick up if the beginner has to learn both the Unicode and ASCII syntax, and the correspondence between them, to follow along with a tutorial.


#17

Sorry in advance for the brief response. I notice that I have some kind of an emotional budget that can only fit so many difficult or divisive discussions at once. :slight_smile:

The following things are important to me:

  1. Dhall should be accessible to newcomers, and IMHO that means that

    1. Code targeted at newcomers should use ASCII style, and it should be easy to find out that Unicode symbols are not necessary for writing Dhall.

    2. We need to spread the word that, when modifying code in Unicode style, you can write ASCII symbols and then let dhall format change these symbols to Unicode. We should either extend the cheatsheet to show the ASCII symbol variants or have a separate document that shows the symbols in both variants.

    3. If the measures above are insufficient, we should change the default style for dhall format and other dhall commands to ASCII. I’m less sure about type errors, where I think readability and compactness are also very important.

  2. Readability and visual appeal are important, and I think it would be a loss for many existing Dhall users if we removed support for Unicode style.

    Consequently, I think Unicode style should remain supported, for example via a --unicode or maybe --unicode-symbols flag.


Regarding style uniformity across the ecosystem, I think that is ultimately desirable, but it’s too early to standardize: There are quite a few people who don’t like the current output of dhall format, and I think we should give a Vonderhaar-style formatter a chance, although I’m not convinced that dhall format is the right place for that experiment.
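As a sketch of the side-by-side cheatsheet suggested in point 1.2, the following Dhall file spells the same small function in both styles. The function names (`isEmptyAscii`, `isEmptyUnicode`) and the example itself are hypothetical, chosen only for illustration. The symbol pairs in the comment table are the ASCII and Unicode spellings accepted by the standard grammar, and the note that plain `dhall format` emits the Unicode forms reflects the default being discussed in this thread.

```dhall
-- ASCII and Unicode spellings of Dhall's symbolic tokens:
--
--   ASCII    Unicode   meaning
--   \        λ         lambda
--   ->       →         function arrow
--   forall   ∀         dependent function type
--   /\       ∧         recursive record merge
--   //       ⫽         right-biased record merge
--   //\\     ⩓         recursive record type merge
--   ===      ≡         equivalence (used with `assert`)

-- Written in ASCII: everything here can be typed on a standard keyboard.
let isEmptyAscii
    : forall (a : Type) -> List a -> Bool
    = \(a : Type) -> \(xs : List a) -> Natural/isZero (List/length a xs)

-- The same (hypothetical) function spelled with Unicode symbols, the style
-- that `dhall format` produces unless `--ascii` is passed.
let isEmptyUnicode
    : ∀(a : Type) → List a → Bool
    = λ(a : Type) → λ(xs : List a) → Natural/isZero (List/length a xs)

-- Both spellings parse to the same expression, so this assertion type-checks.
let example = assert : isEmptyAscii === isEmptyUnicode

in  isEmptyAscii Natural ([] : List Natural)
```

Evaluating the file should yield `True`; the `assert` line only type-checks because the two spellings are the same expression once parsed, which is what lets a newcomer type the ASCII form and leave the conversion to the formatter.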


#18

Never have I more strongly identified with a post. :heart:

Regardless, I do think it’s worth trying better onboarding first before taking the additional measure of switching the default if it proves insufficient.


#19

@sjakobi @SiriusStarr: On the bright side, having several people who care very deeply about the future of the language is a good problem to have! :slight_smile:

Regarding introductory material, what I can do is just defer this discussion until after the upcoming survey that I create at the beginning of each year. That should give us a better read of whether people still complain about Unicode or not. The introductory material has been ASCII-only for a while so updated survey results should help us understand whether people only disapproved of Unicode tutorials or Unicode in general.


#20

Maybe we should convert the cheatsheet to use ASCII style?


#21

@sjakobi: Good catch! Yeah, I will do that

Also, I’m thinking of adding questions to this year’s survey to help us gather data on this subject. The questions I’m considering are:

When typing Dhall symbols in my editor (before formatting the code) I use:

  • [ ] ASCII
  • [ ] Unicode

When formatting my code, I use:

  • [ ] Nothing (I don’t format my code)
  • [ ] dhall format
  • [ ] dhall format --ascii

I prefer to read Dhall code written using:

  • [ ] ASCII
  • [ ] Unicode