Normalized form in dhall - limits and roadmap


#1

Hi! I’ve been having a chat with Fabrizio about dhall and he recommended I post some of the questions here to the broader community. I’m looking forward to getting to know y’all!

I am wondering what the limits of dhall’s normalized form are for functions and if there is a roadmap for future development. For example, normalized form on arithmetic is not yet supported in some cases:

PS C:\Users\MikeSolomon\devel\random-scripts\dhall> "\(n : Natural) -> [ n, n + 3 ]" | .\dhall.exe hash    
sha256:8b215570802fe99fe3ba60a9bc9a81015a5c831e662d1fccf6521ad2b9831a3f      
PS C:\Users\MikeSolomon\devel\random-scripts\dhall> "\(n : Natural) -> [ n, n + 1 + 2 ]" | .\dhall.exe hash
sha256:9250fbfabd94c822ad2e530b857b04c097ee0a5c6f7a64a8c29d7103f867619e

While it is supported in other cases:

PS C:\Users\MikeSolomon\devel\random-scripts\dhall> "\(n : Natural) -> 1 + 1" | .\dhall.exe hash
sha256:ea410b604b0eb8f0c9f9c77c30c2c40222829f6ccdc8ca600a7a2fb51bedebbd      
PS C:\Users\MikeSolomon\devel\random-scripts\dhall> "\(n : Natural) -> 2" | .\dhall.exe hash
sha256:ea410b604b0eb8f0c9f9c77c30c2c40222829f6ccdc8ca600a7a2fb51bedebbd

The background is that I’m doing an experiment using dhall to compare pure functions, and I would like any functions that have the same behavior to have the same hash (extensionality). This is of course quite hard to do and impossible for a Turing complete language, but as dhall is not Turing complete I thought that this sort of thing may be worth discussing and knowing more about.

Thanks in advance for any thoughts/info about this!


Function hash changing even though function outcome does not
#2

Hi Mike!

Dhall’s hashes are derived like this: For an input expression e, the hash is just

sha256 (binary-encode (α-normalize (β-normalize e))))

In this case you’re already running into the limits of β-normalization:

$ echo '\(n : Natural) -> n + 1 + 1' | dhall normalize
λ(n : Natural) → n + 1 + 1

n + 1 + 1 is parsed as ((n + 1) + 1) and the standard doesn’t provide any rules to simplify that expression further.

Gabriel once proposed additional simplifications for associative operations, but that was ultimately rejected: https://github.com/dhall-lang/dhall-lang/pull/274 (I’ll have to reread that thread to understand why though…)


#3

Super useful! Classic problem - I did a little bit of work with/on the z3 solver and these are exactly the types of questions that come up there. Ultimately, unless you’re using something like coq, everything will come down to heuristics, and even with coq there needs to be a human in the loop for all but the most trivial of proofs.

The most useful thing IMO is a statement in the docs about what is and isn’t doable for hashing and normalization. I skimmed the standard and, unfortunately, it’s too hard for me to come up with a mind map of the limits of hashing and normalization from my initial skim. I’d need to spend a lot more time familiarizing myself with the standard to grok those limits.


Function hash changing even though function outcome does not
#4

It looks like I might have contributed to the closure of that PR by pointing out that getting into this is a slippery slope since the amount of rules that we might have to add is not unbounded (as it’s the case with Turing-complete languages) but it’s rather large (combinatorial with the amount of primitives we have I guess?), and standardizing them with the current way we standardize things could add a big burden on bootstrapping implementations.

I suspect this is also why we didn’t move forward with your #688. As I noted there though, I think having very powerful extensionality is very well worth pursuing, but we should first find a way to encode rules (possibly in Dhall itself?) in such a way so that implementations can just generate code to handle them