Figuring out performance bottlenecks

We have an internal dhall library, which is pretty new and not that big (~1400 lines, not that heavily commented so that’s mostly code). It’s a “boilerplate” library, where you import it and (using dhall to-directory-tree) you can use a bunch of types and functions to generate various common CI files.

Over time it feels like it’s gotten slower than it should be, so I was hoping to dig in and see if there was an obvious culprit. For reference, it’s taking ~5s simply to load + typecheck, not counting any actual work. I’m testing this by re-exporting the Prelude and running:

time dhall <<< '(./package.dhall).Prelude.Bool.not True'
False
real 0m5.140s
user 0m4.748s

The only external dependency is the prelude, which is frozen, so nothing’s hitting the network. And if I import the prelude directly, it only takes a fraction of that time:

time dhall <<< '(./dependencies/prelude.dhall).Bool.not True'
False
real 0m0.144s
user 0m0.119s

At a gut level, this feels wrong. We haven’t written that much code, and the prelude is something like 3-4x the number of lines. Admittedly ours is probably denser (fewer comments, more complex types) than the Prelude, but it still feels excessive.

I’ve tried a few things, following hunches:

  • frozen all imports in all dhall files: no noticeable difference
  • tried to surgically comment out imports / functions

I’ve seen some effect getting the evaluation time down by commenting out various parts, but it’s very hard for me to tell whether that’s because I’ve found a suspiciously expensive branch, or because that code just happens to bring in a bunch of transitive expressions that you’d expect to be expensive anyway.

So, any suggestions as to how to continue? I’ve been trying to find a reproduction case I can share, but no real luck so far.

One remaining hunch I don’t know how to check is my suspicion around the JSON type. Since we’re using dhall to-directory-tree we’ve got a lot of functions mapping custom types into JSON.Type, and perhaps, given the way that type encodes its recursion, it’s having an outsize effect on type checking?
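
For reference, the Prelude’s JSON.Type is a Boehm-Berarducci encoding of the recursive JSON tree, so every expression mentioning it carries the whole type around. Roughly (abbreviated, not the verbatim Prelude definition):

let JSON/Type
    : Type
    =     forall (JSON : Type)
      ->  forall  ( json
                  : { array : List JSON -> JSON
                    , null : JSON
                    , object : List { mapKey : Text, mapValue : JSON } -> JSON
                    , string : Text -> JSON
                    -- ...plus handlers for booleans and numbers
                    }
                  )
      ->  JSON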

1 Like

Interesting!

Could you time dhall type --quiet on your package.dhall?

A profiled run would also be very interesting. I believe the Nix tooling also produces profiled binaries (or at least libraries), but I’m not sure how to access them…

1 Like

Similar time for just type --quiet. Interestingly, I noticed there’s a whole extra second taken when I interpret “an expression referencing the file” vs passing in the --file directly. That’s surprising to me…

time dhall type --quiet <<< ./package.dhall

real	0m4.804s
user	0m4.480s
time dhall type --quiet --file ./package.dhall

real	0m3.849s
user	0m3.561s

(I tried each variant 3 times, timings are consistent)

So there seems to be a general “wrapping” tax somewhere:

$ echo './package.dhall' > package2.dhall
$ time dhall type --quiet --file ./package2.dhall

real	0m4.807s
user	0m4.458s

$ echo './package2.dhall' > package3.dhall
$ time dhall type --quiet --file ./package3.dhall

real	0m5.793s
user	0m5.414s

1 Like

Do you already use dhall freeze --cache --all like the Prelude does?
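
For reference, the --cache flag rewrites each import into the cache-first fallback idiom, so the interpreter consults the content-addressable cache before touching the import itself. The result looks along these lines (hash made up for illustration):

  missing sha256:0000000000000000000000000000000000000000000000000000000000000000
? ./dependencies/prelude.dhall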

I can provide some context that might help.

First off, local imports (even ones without integrity checks) are implicitly cached by the Haskell implementation. This is a non-standard feature specific to the Haskell implementation, but it has no effect on the observable behavior; it only affects performance. This implicit cache is stored underneath ~/.cache/dhall-haskell.
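
If you want to compare against a cold cache while measuring, you can just clear that directory between runs:

$ rm -r ~/.cache/dhall-haskell   # forces the next run to resolve imports from source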

This is likely why trivially wrapping the expression in another file that imports it changes performance: the wrapper file triggers this implicit cache.

This is also why I don’t believe dhall freeze --cache --all will improve things: the Haskell implementation is already implicitly caching local imports, so it would likely not make a difference.

Usually when something like this happens it is because the interpreter is decoding a large normal form. One thing you should do is check how large the normal form is, like this:

$ dhall --file "${FILE}" | dhall encode | wc -c

For example, the Prelude’s normal form is ~182 KB when stored in the cache

$ dhall <<< 'https://prelude.dhall-lang.org/v16.0.0/package.dhall' | dhall encode | wc -c
  181982

The reason this matters is that the Haskell implementation appears to be slow when decoding CBOR (~5-10 MB/s decoding speed), so I suspect that your normal form is tens of megabytes in size and it’s taking the interpreter a few seconds to decode all of that. For more context, see:

2 Likes

$ dhall --file "${FILE}" | dhall encode | wc -c
4727750

So, 25x the size of the prelude, apparently :grimacing:
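
Back-of-the-envelope: 4,727,750 bytes at the ~5-10 MB/s decoding speed mentioned above is roughly 0.5-1s per decode, which would also fit the earlier observation that each extra layer of wrapping (each layer getting its own implicitly cached copy of essentially the same normal form) added about a second.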

Is there likely to be anything I can do to get this normal form down? Or is it strongly tied to the complexity / number of things exported?

@timbertson: Probably the easiest thing to do is to interpret the file and visually inspect the normal form and see if anything stands out.

Other than that, the main solution here is to improve decoding (5-10 MB/s for decoding CBOR was never fast enough in my opinion)

1 Like

It’s difficult to advise you without looking at your specific configuration, but one thing that made massive improvements in the size of our normal forms was to reduce our use of deeply nested unions. Remember that every union literal needs to include the types of all of the alternatives it isn’t using at the time. If the types of those alternatives include unions themselves, and those unions include alternatives which are themselves unions, and so forth, then the top-level union literal ends up spelling out every possible combination of the deeply nested union types.

We had a case where the unions were six levels deep… the size of the normal form suffered a combinatorial explosion.
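
A contrived sketch of the shape (made-up types, not our actual configuration):

let Leaf = < A | B | C >

let Mid = < L : Leaf | Ls : List Leaf >

let Top = < M : Mid | Ms : List Mid >

-- Top.M's normal form spells out both of Mid's alternatives, and each of
-- those spells out all of Leaf's alternatives, so every extra level of
-- nesting multiplies the size of every literal.
in  Top.M (Mid.L Leaf.A)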

Thanks for the context, folks. It’s obviously hard to eyeball the whole thing, but I did do it on some submodules and peeked at the most surprisingly large ones.

So I guess it’s not type checking after all, but this does seem to be a likely culprit. Basically, if I jump to random locations in the normalized form of a heavy module, most of the time the screen is full of the JSON type.

As an illustrative example, I have a simple JSON helper:

let mapListToNull =
    -- map a List T into a JSON.array (or JSON.null if empty)
          \(T : Type)
      ->  \(fn : T -> JSON.Type)
      ->  \(values : List T)
      ->  nullIfEmpty (mapList T fn values)

Obviously this will include the definitions of nullIfEmpty and mapList, which you can probably guess at, but that comes out to an encoded form of over 6 KB. Meanwhile, a whole module full of bash-script-related types and templates is only 2 KB.
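
For completeness, the helpers are essentially this (reconstructed for illustration, not our exact code):

let mapList =
          \(T : Type)
      ->  \(fn : T -> JSON.Type)
      ->  Prelude.List.map T JSON.Type fn

let nullIfEmpty =
          \(values : List JSON.Type)
      ->  if    Prelude.List.null JSON.Type values
          then  JSON.null
          else  JSON.array values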

The JSON type is pervasive elsewhere, too - due to the use of dhall to-directory-tree, basically every type has a toJSON companion function, and the repetitions of the JSON type take up the vast majority of the normalized form.
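
To illustrate the pattern (a made-up type, not one from our library):

let Step = { name : Text, run : Text }

let Step/toJSON =
          \(step : Step)
      ->  JSON.object
            (toMap { name = JSON.string step.name, run = JSON.string step.run })

The normal form of every function like this repeats the full JSON type for each handler it touches.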

@timbertson: The proliferation of */toJSON functions can be mitigated by standardizing this:

This is something that will probably keep coming up because normal forms duplicate things a lot. Would it make sense to imagine a binary encoding that includes hash-consing? That would take care of all of this redundancy in one go. But that’s of course quite complex to specify.

2 Likes

Yes, it feels like there are three separate issues here:

  1. normal forms have a lot of redundancy
  2. CBOR preserves certain kinds of strings (which is problematic for things like JSON/Type)
  3. dhall-haskell’s CBOR decoding is slow

I think that we need to fix both 1 and 3 (with 2 a nice-to-have but lower priority), in order to properly address the performance issues with fetching from cache.

2 Likes

@philandstuff: What CBOR decoding performance do you get for the Go implementation (i.e. in MB/s)?

I’ve just thrown together some naive-ish benchmarks and I get ~10 MB/s (on the Prelude) and ~18 MB/s (on tests/parser/success/largeExpression.dhallb).

I haven’t put any effort into optimizing (or even measuring) this before, so no doubt there’s room for improvement.

1 Like

For reference: I’ve opened https://github.com/dhall-lang/dhall-haskell/issues/1804 to track the decoding performance of the Haskell implementation.

2 Likes

@timbertson Do you happen to use Prelude.JSON.render a lot? Its normalized form, and therefore its binary encoding, is absolutely massive:

$ echo 'https://prelude.dhall-lang.org/v16.0.0/JSON/render' | dhall --alpha | dhall encode | wc -c
148899

It’s 82% of the whole encoded Prelude!

$ echo 'https://prelude.dhall-lang.org/v16.0.0/package.dhall' | dhall --alpha | dhall encode | wc -c
181982

Curiously, renderYAML is much smaller:

$ echo 'https://prelude.dhall-lang.org/v16.0.0/JSON/renderYAML' | dhall --alpha | dhall encode | wc -c
12511

Maybe we can figure out what makes render so huge and try to get rid of that for some easy wins…

I already managed to shave 37% off the encoded Prelude size with a small simplification in Prelude.JSON.renderAs: https://github.com/dhall-lang/dhall-lang/pull/1015

The irony of course is that it was me who made JSON.render a bunch bigger (with no idea it’d have this effect) in https://github.com/dhall-lang/dhall-lang/pull/936 :man_facepalming:

I wouldn’t have expected it to make that big a difference; I’m using it in only a handful of places (at the edges of each file format), meanwhile the (smaller) JSON helpers are used all over the place.

But I ran your commit through the top-level import and it went from ~7s -> ~4.5s, so that’s quite an improvement, thank you!

Yeah, I was surprised too by how much cache sizes are affected by how often an argument appears in a function body. Although it makes perfect sense when you think about how normalization works. This reminds me of the linear types proposal for GHC, which seems like a really curious connection.

If you normalize your mapListToNull function you’ll probably also find that the values argument is used at least twice: once for a List/null check, once to produce the output JSON.array.
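
Schematically, assuming the obvious definition of nullIfEmpty and inlining it, the normalized body looks something like:

if Natural/isZero (List/length JSON.Type (mapList T fn values))
then JSON.null
else JSON.array (mapList T fn values)

i.e. the whole mapList T fn values subexpression, types and all, appears twice.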

You can make that function “linear” in multiple ways: One way would be to replace the List/null check with List/unsnoc, somewhat like this:

merge
  { None = JSON.null
  , Some = \(ny : NonEmpty JSON.Type) -> JSON.array (ny.init # [ ny.last ])
  }
  (List/unsnoc JSON.Type (mapList T fn values))

Or you could try fusing List/unsnoc with mapList directly…

None of this is very obvious though, and it doesn’t generally improve the readability of the code, so I don’t think this should become standard practice.

I also don’t believe that you can get around duplicating type arguments in this way, so those will still cause a lot of bloat in the caches.

I think Gabriel has mentioned a possible new caching mechanism that doesn’t normalize expressions. That seems like a promising idea to me.

I’m also curious about @philandstuff’s idea to make normal forms less redundant. What did you have in mind there, @philandstuff?

As for redundancy, one could use compression. That would solve some but not all problems.
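
E.g. a quick way to estimate how much of the encoding is redundancy, using nothing Dhall-specific:

$ dhall --file ./package.dhall | dhall encode | gzip -c | wc -c

If that comes out at a small fraction of the uncompressed 4.7 MB, most of the decoding work is being spent on duplicated subexpressions.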