Figuring out performance bottlenecks


#21

I don’t think compression would help much. I expect that most of the performance issues come from having to allocate a gigantic expression. Could be wrong though, we’d need to measure.


#22

I am also trying to see how I could improve performance while working with dhall-openshift:

→ time dhall type --quiet <<< ./openshift/package.dhall
86.64s user 1.32s system 99% cpu 1:28.03 total

I am totally puzzled because importing dhall-openshift is really quick:

→ time dhall type --quiet <<< ./openshift/packages/openshift.dhall
2.96s user 0.06s system 99% cpu 3.026 total

If I freeze all imports in package.yaml I have got a nice improvment (but it is still quite slow):

→ time dhall type --quiet <<< ./openshift/package.dhall
51.44s user 0.77s system 99% cpu 52.277 total

Mainly a single cpu is used during this time.

Is there any trick I could apply, some bad practice I should be aware of ?


#23

@PierreR: The reason ./openshift/package.dhall is much more expensive is because it’s also importing ./openshift/func/func.dhall, which is where most of the time is being taken. If you don’t re-export the functions from ./openshift/package.dhall then it’s much faster again.


#24

My ideas are very half-baked at this stage, but here are some not-thought-through thoughts:

  1. Weaker normal form

Right now, we fully normalise every expression. I suspect this causes a lot of the redundancy we see. Each time a function is applied, a single expression gets copied to every variable site.

I wonder if there’s a way to reduce to a weaker normal form (eg head normal form, weak normal form, or weak had normal form) in order to reduce to a smaller size in cache.

Things we’d have to consider:

  • changing the normal form changes semantic hashes, so this would be a very disruptive change
  • the particular choice of normal form affects evaluation order (eg weak head normal form is basically Haskell evaluation order), while currently implementations can choose any evaluation order
  • this might not even work? If we want to pursue it, we should try it on some real libraries and measure the results first before committing to it
  • if we choose weak (head?) normal form, we won’t normalise function bodies any more. What are the consequences of that?
  1. Delay reducing lets

This is the same idea as above but focused on let. Is there a way to delay substituting let bodies until we need to? I’m not sure how to define this formally in a consistent manner, so this idea is even less baked than the one above.


#25

BTW I recommend ghc-prof-flamegraph for .prof files


#26

@philandstuff: I think the main thing that would improve performance is having a different hash “mode” (e.g. cache:sha256:… or some similar syntax) that would have the following behavior:

  • The import would be fully resolved, but not α-normalized or β-normalized upon import

  • Therefore, the cache key would be the hash of the non-normalized import

    … so that the cached import is still content-addressable

  • If there is a cache hit and the cached import doesn’t match the hash, it is rejected

  • If there is a cache miss and the fully-resolved import doesn’t match the hash, it is still accepted, but not cached

    … similar to how the current import sha256:… ? import idiom behaves

  • If there is a cache miss and the fully-resolved import matches the hash, it is cached

There is also a separate question of whether we should also do the same for imports without an integrity check (i.e. don’t β-normalize them by default).

I believe @AndrasKovacs proposed something similar to this when investigating how to improve Dhall performance.


#27

Not knowing many internals, that sounds pretty good to me @Gabriel439

I’m curious, what would the downsides of this be if it just replaced the existing strategy? It sounds like it would always result in smaller files.

Obviously changes that don’t affect the semantic hash would be one case where it’s a worse strategy, although personally I’d be fine with that - if I’m using a hash with no fallback, I’d always be using something where the input is fully immutable (like a git commit / tag).

Given the above I’m also confused by having it accept-but-not-cache if the import doesn’t match. What’s the reasoning for have it be more lenient? I guess the naming of cache: suggests this, but if it were just called resolve: is there a reason it should only be used for caching?


#28

@timbertson: The way I think about it is that there are two separate requirements for import hashes that partially overlap:

  • Security: Sometimes we want to verify that the import has not been tampered with
  • Caching: Sometimes we want to accelerate the import process by locally caching imports

My reasoning is that when a user adds a hash to an import the language should provide a way for the user to clarify their intent: are they adding a hash for security reasons or for caching reasons? The import system’s behavior should then reflect the user’s intent and the difference in behavior between the two hash “modes” reflects the differences in requirements based on the user’s intent.

For example, if the user’s intent is to secure the import, then a failure to match the hash is a showstopper and the interpreter should proceed no further. However, if the user’s intent is to simply cache the import, then a failure to match the hash is a warning at most and there’s no reason why the interpreter should not gracefully recover by continuing to resolve the import.

Similarly, the choice whether to αβ-normalize or not depends on the user’s intent. If the user is only caching the import, then performance matters and we have numerous experience reports that indicate that not pre-normalizing imports would improve performance (due to avoiding bloated normal forms). Also, @AndrasKovacs made a point in some other thread (which I can no longer find) that human-authored code tends to be reasonably optimal and compact and should be preserved as much as possible for efficiency. However, if the user’s intent is to secure the import then they’re not going to want to have to re-audit the import every time somebody refactors their code, in which case αβ-normalizing the import makes sense to make the integrity check insensitive to these sorts of refactors.


#29

To me that’s the existing fallback syntax, and if I’m not using that I would want cache failures to be fatal, even with the un-normalized hash. Not because I’ve audited it, but just so I notice when I forget to update a hash (instead of slowing down).


#30

FWIW the generation of this simple yaml takes about 1 min:

apiVersion: v1
items:
  - apiVersion: v1
    kind: Namespace
    metadata:
      annotations:
        openshift.io/display-name: cicd KB technical documentation site
        openshift.io/requester: cicd_cicd
      name: cicd-doc-dev
  - apiVersion: v1
    kind: ResourceQuota
    metadata:
      name: cicd-doc-dev-quota
      namespace: cicd-doc-dev
    spec:
      hard:
        limits.cpu: '2'
        limits.memory: "2Gi"
        requests.cpu: '1'
        requests.memory: "1Gi"
kind: List
real 0m51.929s
user 0m51.051s
sys 0m0.772s

I am using the following command:
dhall-to-yaml --documents --file sbx/operation/projects.dhall)" > sbx/operation/projects.yaml

The project.dhall source file is:

let oc = env:OC

let project =
      oc.Project::{
      , name = "cicd-doc-dev"
      , displayName = "cicd KB technical documentation site"
      , requester = "cicd_cicd"
      }

in  oc.makeProject project

In case you are wondering, the oc import is frozen. The freeze of that import makes a huge difference (from 3 minutes down to less than a minutes)

I am using this source: https://github.com/PierreR/dhall-packages/tree/master/openshift

I will test against 1.33.0 shortly and come back.


#31

It looks quite faster with 1.33.0.
Of course this is not real benchmark but I consistently got something around the line of

real    0m34.850s
user    0m33.950s
sys     0m0.763s

An impressive improvement !


#32

True but still 30 seconds!

I think what needs to be done next is write a CBOR package or figure out what’s going on upstream?


#33

Another option would be to never normalize under lambdas before hashing/caching. I think this should prevent most blowup, and still provide a good check that the file hasn’t changed in the case of a “simple” file, i.e. a functionless thing that could be passed to dhall-to-json.
The rationale is that when refactoring a function, we would often break hashing anyways, so using hashes to test preservation of intent already isn’t reliable for functions. However, for records of data we may want that let-abstraction doesn’t break hashes, and this would still be the case.
The main issue would be with the JSON type, and for that honestly I think it would make perfect sense to have it as a builtin type, with JSON/build and JSON/fold to convert to and from the current Church encoding. This would also make it easier for implementations to treat that type differently; that’s currently a pain point for me. See https://github.com/dhall-lang/dhall-lang/issues/1027 for a discussion of this idea.
Not normalizing under lambdas could also simplify implementations that use closures, since for those normalization under lambdas is extra work.


#34

I would like to reproduce your timings but I’m not familiar with your project and setup. Could you describe how to reproduce the issue, ideally as a series of bash commands or something?


#35

Sure I will be happy to do so. I will try to pack something up. I will be glad to know if I do something stupid :wink:


#36

@sjakobi You can use this repository https://github.com/PierreR/dhall-packages

I suggest to concentrate on ‘./openshift/examples/application1.dhall’:

→ dhall --file "./openshift/examples/application1.dhall" | dhall encode | wc -c
9398255

(about 50x the size of the prelude)

How long does it takes to get the type of the expression:

→ bench 'dhall type --file ./openshift/examples/application1.dhall --quiet'
benchmarking dhall type --file ./openshift/examples/application1.dhall --quiet
time                 32.02 s    (29.89 s .. 33.42 s)
                     1.000 R²   (0.998 R² .. 1.000 R²)
mean                 32.52 s    (32.18 s .. 32.70 s)
std dev              322.5 ms   (104.5 ms .. 419.9 ms)
variance introduced by outliers: 19% (moderately inflated)

How long does it takes to generate the yaml file:

→ bench 'dhall-to-yaml --file ./openshift/examples/application1.dhall --output ./openshift/examples/application1.yaml'
benchmarking dhall-to-yaml --file ./openshift/examples/application1.dhall --output ./openshift/examples/application1.yaml
time                 33.47 s    (31.26 s .. 36.47 s)
                     0.999 R²   (0.998 R² .. 1.000 R²)
mean                 34.66 s    (34.02 s .. 35.31 s)
std dev              711.2 ms   (413.9 ms .. 1.005 s)
variance introduced by outliers: 19% (moderately inflated)ed)

Results are similar for other example files such as ‘project1.dhall’.


#37

Thanks @PierreR, I’ll try to take a look soon!


#38

All the tests can be done without any internet connection (all remote import are froozen).

Also the import depends on a env variable that is defined in the shell.nix (in case you miss it).


#39

That sounds incredible but replacing the env variable in the import by its values:

- let oc = env:OC
+ let oc = ../package.dhall sha256:088.....

seems to be a significant performance boost.

I have witnessed with bench on a few examples a drop from ~35s to ~18s (nearly 50%)


#40

Only hash-protected imports are cached. In the first case, this happens:

  • try to resolve env:OC
    • not hashed ? read the env var
      • try to resolve ../package.dhall sha256:088....
        • hashed? load the fully resolved and type-checked expression
    • resolve and type-check the expression

The second form does this:

  • try to resolve ../package.dhall sha256:088....
    • hashed? load the fully resolved and type-checked expression

TLDR; only hashed imports can skip the type-check pass. env:OC is not hashed and will be type-checked even if it only imports a cached expression.