Performance improvements + interned identifiers

(Vanessa McHale) #1

I wanted performance improvements to Dhall for my own purposes (admittedly, a 3000-line file), but with the intention to create a Dhall language server I think this might become more important: being able to normalize expressions within the editor would be really cool.

One of the notable areas for improvement is interning identifiers before doing any typechecking or normalization. Basically, this would entail creating an IntMap storing the user's identifiers (currently stored as `Text`); comparisons between identifiers would then be much faster, and since compilers perform many such comparisons, this would hopefully improve the performance of normalization a good deal.
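To make the idea concrete, here is a minimal sketch of what interning could look like (all names here are hypothetical, not the actual dhall-haskell API): each distinct identifier is assigned a unique `Int` on first sight, so later equality checks are O(1) `Int` comparisons instead of `Text` comparisons. A reverse `IntMap Int Text` could be kept alongside for pretty-printing.

```haskell
{-# LANGUAGE OverloadedStrings #-}

-- Hypothetical interning sketch: a table mapping each identifier
-- Text to a unique Int; re-interning the same Text yields the
-- same Int, so identifier equality becomes Int equality.
import qualified Data.Map.Strict as Map
import Data.Text (Text)

data InternTable = InternTable
  { nextId :: !Int                  -- next fresh Int to hand out
  , table  :: !(Map.Map Text Int)   -- identifier -> interned Int
  }

emptyTable :: InternTable
emptyTable = InternTable 0 Map.empty

-- Look up an identifier, allocating a fresh Int if it is new.
intern :: Text -> InternTable -> (Int, InternTable)
intern ident t@(InternTable n m) =
  case Map.lookup ident m of
    Just i  -> (i, t)
    Nothing -> (n, InternTable (n + 1) (Map.insert ident n m))

main :: IO ()
main = do
  let (a, t1) = intern "foo" emptyTable
      (b, t2) = intern "bar" t1
      (c, _ ) = intern "foo" t2
  print (a, b, c)   -- (0,1,0): "foo" interns to the same Int both times
  print (a == c)    -- True, now a cheap Int comparison
```

In a real implementation the table would be threaded through parsing (or hidden behind a state monad), and the AST's variable nodes would carry the `Int` instead of the `Text`.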

I don’t quite have the time to implement this now (though I may be interested in implementing it in the future) but I figured I’d ask: is there any interest in interning identifiers in the Haskell implementation? It would be a big project so I would want buy-in from those who would be reviewing a future pull request.


In regard to a Dhall language server, the question that concerns me most is the size of the Dhall programs that people use. Since Dhall is mostly used for templating, the size of the resulting program with all imports inlined could be huge. What is worse, I think even the sheer byte size of the type information can be quite large. This might be "OK" in a command-line mode, but an LSP server must run continuously, can have multiple files open, and on top of that it is preferable to keep a large number of intermediate caches to speed up compilation (I think users would expect under one second of delay). And refactoring support, type hints, and advanced go-to-definition are totally dependent on the LSP server having the complete type-checked AST in memory.

So from my standpoint this issue is extremely relevant. I don't know, but maybe with quick normalization we could cache not only simple things like strings in the compiler but entire sub-expressions, which sounds like a pretty cool thing.

Note: I'm a Dhall newbie myself, so don't take my answer too seriously.

(Gabriel Gonzalez) #3

I believe NbE (normalization by evaluation) type checking + conversion checking will solve these performance issues when that work is completed.

(Vanessa McHale) #4

Yeah. I’m not even sure how LSP integration would work given that there’s such reliance on HTTP & whatnot. But it’s something I am interested in.

I think Dhall does rely quite heavily on large files… not sure how to solve that, though.


Well, HTTP is less of a problem, because we can rely on perfect caching: as long as the SHA hash hasn't changed, we can keep the file in the local cache forever. Even where that's not the case, the problem can be mitigated with some reasonable timeout.
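The "perfect caching" point can be sketched as a content-addressed cache (names here are hypothetical, not the actual dhall-haskell API): when an import is pinned by a semantic integrity hash, the hash itself is the cache key, so a hit never needs invalidation.

```haskell
{-# LANGUAGE OverloadedStrings #-}

-- Hypothetical content-addressed import cache: keyed by the
-- import's sha256 integrity hash. Because the hash pins the
-- content, a cached entry can be trusted forever -- no
-- invalidation logic is needed at all.
import qualified Data.Map.Strict as Map
import Data.Text (Text)

type Sha256      = Text                      -- hex digest from the integrity check
type ImportCache = Map.Map Sha256 Text       -- hash -> resolved expression text

cacheImport :: Sha256 -> Text -> ImportCache -> ImportCache
cacheImport = Map.insert

-- A hit means the pinned content is already known; no re-fetch needed.
fetchCached :: Sha256 -> ImportCache -> Maybe Text
fetchCached = Map.lookup

main :: IO ()
main = do
  let cache = cacheImport "abc123" "\\(x : Natural) -> x + 1" Map.empty
  print (fetchCached "abc123" cache)    -- a hit: Just the pinned expression
  print (fetchCached "ffffff" cache)    -- a miss: Nothing, must fetch
```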

(Gabriel Gonzalez) #6

I think @vmchale has a point that the LSP performance will deteriorate if the code has any remote imports that aren’t protected by a semantic integrity check. The main solutions I can think of are:

  • Make it easier to use semantic integrity checks within the IDE (which the Google Summer of Code project does propose)
  • Imperfectly cache imports for an IDE session with an option to invalidate the cache or automatically expire after some duration
  • Debounce requests so that imports aren’t fetched as often
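The second option — imperfect caching with automatic expiry — could be sketched like this (again, hypothetical names and a toy URL, not the actual dhall-haskell API): imports without an integrity check are cached with a fetch timestamp and only served while they are within a time-to-live.

```haskell
{-# LANGUAGE OverloadedStrings #-}

-- Hypothetical time-expiring import cache for an IDE session:
-- un-pinned imports are kept only for a fixed TTL, after which a
-- lookup misses and the import must be re-fetched.
import qualified Data.Map.Strict as Map
import Data.Text (Text)

type Seconds = Int   -- timestamps as plain seconds, for simplicity

data Entry = Entry
  { fetchedAt :: Seconds   -- when the import was fetched
  , contents  :: Text      -- the resolved expression text
  }

type ImportCache = Map.Map Text Entry   -- import URL -> cached entry

-- Serve the cached body only if it was fetched within the TTL.
lookupFresh :: Seconds -> Seconds -> Text -> ImportCache -> Maybe Text
lookupFresh ttl now url cache = do
  Entry t body <- Map.lookup url cache
  if now - t <= ttl then Just body else Nothing

main :: IO ()
main = do
  let cache = Map.fromList
        [("https://example.com/pkg.dhall", Entry 0 "some expression")]
  print (lookupFresh 60 30  "https://example.com/pkg.dhall" cache)  -- fresh: a hit
  print (lookupFresh 60 120 "https://example.com/pkg.dhall" cache)  -- expired: Nothing
```

An explicit "invalidate cache" command in the IDE would just delete the entry (or the whole map), forcing a re-fetch on the next lookup.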

My preference is probably to lean on semantic integrity checks and make them easier to use within an IDE because I think in most cases pinning a remote import is the right thing to do anyway.