This is merely for my own edification, but hopefully someone can shed some light on it.
The ABNF in the language standard makes it pretty clear that certain Unicode escape sequences are not valid:
; The parser must also reject Unicode escape sequences that are either:
;
; * Surrogate pairs (i.e. `%xD800-DFFF`)
; * Non-characters (i.e. `%xNFFFE-xNFFFF` for each `N` in `{ 0 .. F }`)
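To make my reading of that comment concrete, here is a minimal sketch of the check I would expect, written in Python purely for illustration. The function name is hypothetical, it is not how dhall implements anything, and it covers only the two rules quoted above rather than the full grammar. One assumption beyond the quoted text: it also treats the last two code points of plane 16 (U+10FFFE and U+10FFFF) as non-characters, as Unicode does, even though the comment only lists `N` in `{ 0 .. F }`.

```python
def is_valid_escape_code_point(cp: int) -> bool:
    """Return True if `cp` passes the two rejection rules quoted above.

    Hypothetical helper for illustration only; not taken from dhall.
    """
    # Anything outside the Unicode code space cannot be a valid escape.
    if not 0 <= cp <= 0x10FFFF:
        return False
    # Rule 1: the surrogate range %xD800-DFFF.
    if 0xD800 <= cp <= 0xDFFF:
        return False
    # Rule 2: the last two code points of each plane (xNFFFE and xNFFFF).
    # Assumption: plane 16 is treated the same way as planes 0 through F.
    if cp & 0xFFFF in (0xFFFE, 0xFFFF):
        return False
    return True
```

Under this reading, 0xD800 and 0xFFFE would be rejected, while 0x1FFF0 would be accepted, since only 0x1FFFE and 0x1FFFF are excluded from that plane.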
However, dhall seems perfectly content to accept text that violates these rules, e.g.
$ dhall <<< '"\uD800"'
"�"
$ dhall <<< '"\uFFFE"'
""
(Note that there is a character not rendering in the second example above.)
Additionally, it seems to reject sequences that are supposedly accepted per the ABNF, e.g.
$ dhall <<< '"\u{1FFF0}"'
dhall:
Error: Invalid input
(stdin):1:10:
  |
1 | "\u{1FFF0}"
  |          ^
Invalid Unicode code point
Finally, writing the first two cases with curly braces causes them to be rejected:
$ dhall <<< '"\u{D800}"'
dhall:
Error: Invalid input
(stdin):1:9:
  |
1 | "\u{D800}"
  |         ^
Invalid Unicode code point
$ dhall <<< '"\u{FFFE}"'
dhall:
Error: Invalid input
(stdin):1:9:
  |
1 | "\u{FFFE}"
  |         ^
Invalid Unicode code point
Just trying to understand what is going on here.