High memory use when decoding Dhall expressions


#1

Hi all,

When using cpkg, I get better time performance via hashes/caching; however, memory use seems unreasonably high.

Running cpkg install clzip I get:

That is, it seems that deserializing takes 75 MB, which is possibly a fault in the CBOR library, or possibly a fault in Dhall. Any pointers to what may be going wrong?

Cheers,
Vanessa McHale


#2

This deserializes the cache for pkg-set.dhall, right?

While I could be wrong, 75MB heap honestly doesn’t seem all that big to me. The binary cache file alone is 4.8MB already. Remember that it’s normalized, so it contains a lot of redundancies.


#3

Fair enough; I’d have to dig deeper and look at how long it “should” take to decode. The “source” is a few hundred kilobytes.

(As an aside, it might be cool to compress the cache with lz4 or zstd or something…)
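
To get a rough feel for how much compression could save, here is a minimal sketch that compresses a cache file in memory and reports the sizes. It uses gzip from the zlib package purely as a stand-in for lz4 or zstd, and the names compressCache and decompressCache are made up for illustration; nothing here reflects how Dhall or cpkg actually store their caches.

```haskell
module Main (main) where

import qualified Codec.Compression.GZip as GZip
import qualified Data.ByteString.Lazy as BL
import System.Environment (getArgs)

-- Hypothetical helpers: gzip stands in for lz4/zstd in this sketch.
compressCache :: BL.ByteString -> BL.ByteString
compressCache = GZip.compress

decompressCache :: BL.ByteString -> BL.ByteString
decompressCache = GZip.decompress

main :: IO ()
main = do
  args <- getArgs
  case args of
    [path] -> do
      raw <- BL.readFile path
      let packed = compressCache raw
      putStrLn ("raw bytes:        " ++ show (BL.length raw))
      putStrLn ("compressed bytes: " ++ show (BL.length packed))
      -- Sanity check that decompression restores the original bytes.
      putStrLn ("round trip ok:    " ++ show (decompressCache packed == raw))
    _ -> putStrLn "usage: cache-compress FILE"
```

Note that this would only shrink the file on disk; the decoder would still see the full uncompressed bytes, so by itself it would not reduce heap use during decoding.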


#4

@vmchale: I made an attempt to improve encoding/decoding performance recently and right now the bottleneck appears to be in the cborg package.

I also noticed the same problem you did: decoding is much slower than I’d expect. According to benchmarks, the decoding speed is about 10 MB/s, whereas I’d expect an efficient binary decoder to reach at least 100 MB/s.
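
To put those two figures in perspective, here is a small worked calculation. It assumes the input is the 4.8 MB binary cache file mentioned earlier in the thread; secondsToDecode is just an illustrative helper.

```haskell
module Main (main) where

import Text.Printf (printf)

-- Seconds needed to decode a payload of the given size (in MB)
-- at the given throughput (in MB/s).
secondsToDecode :: Double -> Double -> Double
secondsToDecode sizeMB speedMBps = sizeMB / speedMBps

main :: IO ()
main = do
  -- Assumption: the 4.8 MB cache file quoted earlier in the thread.
  let cacheMB = 4.8
  printf "at  10 MB/s: %.3f s\n" (secondsToDecode cacheMB 10)
  printf "at 100 MB/s: %.3f s\n" (secondsToDecode cacheMB 100)
```

So for a file of that size, the gap is roughly half a second versus about 50 ms per decode.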

There is also the possibility of not storing αβ-normalized forms. One idea I floated was providing a cache option for integrity checks that stores expressions without normalization and ignores hash mismatches:

In other words, something that reifies within the language what the --cache flag to dhall freeze currently does.


#5

For reference, I’ve opened an issue to investigate this further: https://github.com/dhall-lang/dhall-haskell/issues/1804.

Do you have any ideas on how to debug the slow performance, @vmchale? I’m also hoping for some help from the cborg folks.
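
One way to narrow this down might be to time cborg on its own, without any Dhall-specific decoding on top. The sketch below decodes a file into cborg's generic Term type and reports the throughput. It assumes the file passed on the command line is plain CBOR (which may or may not hold for a given cache file), and throughputMBps is an illustrative helper, not part of any library.

```haskell
module Main (main) where

import qualified Codec.CBOR.Read as CBORRead
import qualified Codec.CBOR.Term as CBORTerm
import Control.Exception (evaluate)
import qualified Data.ByteString as BS
import qualified Data.ByteString.Lazy as BL
import GHC.Clock (getMonotonicTime)
import System.Environment (getArgs)
import Text.Printf (printf)

-- Throughput in MB/s (1 MB = 10^6 bytes) for a byte count and elapsed seconds.
throughputMBps :: Int -> Double -> Double
throughputMBps byteCount seconds = fromIntegral byteCount / 1.0e6 / seconds

main :: IO ()
main = do
  args <- getArgs
  case args of
    [path] -> do
      -- Read strictly so that file I/O is not included in the timing.
      strict <- BS.readFile path
      let bytes = BL.fromStrict strict
      start <- getMonotonicTime
      -- Forcing the Either makes the decoder run over the input;
      -- parts of the resulting Term may still be unevaluated.
      result <- evaluate (CBORRead.deserialiseFromBytes CBORTerm.decodeTerm bytes)
      end <- getMonotonicTime
      case result of
        Left err -> putStrLn ("decode failed: " ++ show err)
        Right (rest, _term) -> do
          let n = BS.length strict
              secs = end - start
          printf "decoded %d bytes in %.3f s (%.1f MB/s), %d bytes left over\n"
            n secs (throughputMBps n secs) (BL.length rest)
    _ -> putStrLn "usage: cbor-bench FILE"
```

If the generic Term decoder is already near 10 MB/s on the same input, that would point upstream; if it is much faster, the overhead is more likely in the Dhall decoding layer. A single wall-clock measurement is noisy, so a proper benchmark harness would give more trustworthy numbers.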


#6

So far all I’ve done is heap profiling! I think it might be upstream, but it’s hard to confirm…