Roadmap for improved Kubernetes support

I mentioned at the beginning of the year that I would be focusing on improving the dhall-kubernetes experience. More specifically, my goal is to enable the Dhall analog of Helm, which several contributors are independently converging on.

To give those contributors better visibility into what is going on, I thought I would use this thread to summarize my personal roadmap of changes I plan to propose and implement to improve the Kubernetes experience.

Performance

The first thing I’m focusing on is improving the performance of the language on large-scale examples. This is a good thing to do in general, but it’s essential for Kubernetes support since the Kubernetes schema is large.

The three things I plan to do are:

  • Propose a new caching mechanism that users can opt into (e.g. using cache:sha256:…) that hashes and caches expressions exactly the way they were written

    I expect this to greatly improve caching performance and also reduce the size of cached expressions. This will also have a bonus effect of improving the UX for importing cached expressions, since they won’t be α-normalized when fetched from the cache. (A sketch of how today’s hashing behaves follows this list.)

  • Propose more efficient support for with

    This will either be via https://github.com/dhall-lang/dhall-lang/pull/1050 or another proposal in the same spirit (depending on the discussion on that pull request). A small example of how with behaves today follows this list.

  • Complete store.dhall-lang.org to cache packages like Kubernetes

    I outlined this in RFC: proxy.dhall-lang.org and I think I’m pretty close to completing this with a few small tweaks on the Nixpkgs side of things. This should improve the first-time experience by speeding up the resolution of the dhall-kubernetes package.

    This will eventually be combined with the work on the documentation generator to create a package repository where the same URL can serve either the documentation or the cached expression depending on the Accept header
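
To give some context for the caching item: today’s integrity checks are “semantic” hashes, computed over the α/β-normalized encoding of the imported expression, which is why expressions come back normalized when fetched from the cache. Here is a minimal illustration of the current behaviour (the URL and hash below are placeholders, not a real package):

    -- Today: the sha256 is computed over the α/β-normalized binary encoding of
    -- the import, so resolving from the cache returns the normalized form.
    -- (Placeholder URL and hash, for illustration only.)
    let package =
          https://example.com/package.dhall
            sha256:0000000000000000000000000000000000000000000000000000000000000000

    in  package

The proposed cache:sha256:… imports would instead hash and cache the expression exactly as written, which is what keeps the cached form readable and smaller for large packages like dhall-kubernetes.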
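
And for the with item: at the moment with behaves like syntactic sugar for nested // updates, so the updated expression is duplicated and rebuilt at every level of nesting, which is where the cost shows up on large records. A small, runnable example of the current behaviour:

    -- `record with a.b = 2` behaves like:
    --   record // { a = record.a // { b = 2 } }
    -- so the outer expression is traversed and rebuilt for every nested update.
    let record = { a = { b = 1, c = True }, d = "unchanged" }

    in  record with a.b = 2
    --  ⇒ { a = { b = 2, c = True }, d = "unchanged" }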

Ergonomics

All of the ergonomic changes I plan are going to be related to the with keyword:

  • Add support for updating Optional/List values using ?/*, respectively

    In other words, let the user write something like: x with foo.?.bar.*.baz (a sketch of what the equivalent looks like today follows this list)

  • Add support for updating Maps using with if the keys are statically known

    See https://github.com/dhall-lang/dhall-lang/issues/979 for more details; the shape of a Map value is sketched after this list.

  • Change the :: operator to desugar to with expressions instead of //

    In other words, Example::{ foo.bar = 1, baz = True } would now desugar to (Example.default with foo.bar = 1 with baz = True) : Example.Type, as illustrated after this list.
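
To make the Optional/List item concrete, here is a hedged sketch of what something like x with foo.?.bar.*.baz = 3 would have to look like today, written against a made-up record shape using the Prelude’s Optional.map and List.map:

    let Prelude = https://prelude.dhall-lang.org/package.dhall

    let Element = { baz : Natural }

    let Inner = { bar : List Element }

    let x = { foo = Some { bar = [ { baz = 1 }, { baz = 2 } ] } }

    -- The verbose equivalent of the proposed `x with foo.?.bar.*.baz = 3`:
    in  x
      with foo =
          Prelude.Optional.map
            Inner
            Inner
            ( λ(inner : Inner) →
                inner
                with bar =
                    Prelude.List.map
                      Element
                      Element
                      (λ(element : Element) → element with baz = 3)
                      inner.bar
            )
            x.foo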
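
For the Map item, recall that a Dhall Map is just a list of mapKey/mapValue records, so today updating a single entry means traversing the whole list. The snippet below only shows the shape involved; the exact with-on-Maps syntax is still up for discussion in the linked issue:

    -- A Map is an ordinary list of key/value records:
    let labels = toMap { app = "nginx", tier = "frontend" }
    --  ⇒ [ { mapKey = "app", mapValue = "nginx" }
    --    , { mapKey = "tier", mapValue = "frontend" }
    --    ]
    --
    -- Today, changing the value stored under "app" means mapping over that list
    -- and matching on mapKey; the proposal would let `with` do it directly when
    -- the key is a statically known literal (exact syntax to be decided).
    in  labels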
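
And to illustrate why the :: change matters, here is a sketch with a made-up schema. With the current // desugaring, a nested assignment like foo.bar = 1 is just the record literal { foo = { bar = 1 } }, so // replaces the whole foo field and drops its other defaults; the proposed with desugaring only touches the named field:

    -- A made-up schema, for illustration only:
    let Example =
          { Type = { foo : { bar : Natural, qux : Text }, baz : Bool }
          , default = { foo = { bar = 0, qux = "kept" }, baz = False }
          }

    -- Current desugaring:
    --   (Example.default // { foo.bar = 1, baz = True }) : Example.Type
    -- fails to type-check, because `foo` is replaced wholesale and the default
    -- for `qux` is lost.
    --
    -- Proposed desugaring: only foo.bar changes and `qux` keeps its default.
    in  (Example.default with foo.bar = 1 with baz = True) : Example.Type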

Feedback

I hope that helps give other people some idea of what I have in mind. Also, feel free to share feedback or new ideas in this thread.

I’m also trying to spread out the language changes over time, so as not to exhaust maintainers of other bindings. There’s no rush on my end because these changes are mostly backwards-compatible and therefore don’t require switching over the ecosystem ASAP.

8 Likes

I’m very excited about the planned ergonomics improvements, which I’m sure will result in large line-count reductions in our code base.

For what it’s worth, we haven’t been bitten so badly by the performance issues. Sure, first-time caching sucks, but then we forget about it. In CI, we pay the caching cost once, when we build the container image we use for running our CI jobs; each CI job is then usually pretty fast because our scripts first copy the image’s pre-built cache into the CI’s cache folder. If we hit a performance snag, it’s because of the combination of a) the expression not being cached in the build image and b) it not having been run on this particular CI runner before (otherwise it would already be in that runner’s cache).

(caveat: relatively small team / number of human users)

That looks great; performance improvements can be quite important when iterating interactively.
May I suggest a couple more things:

  • Being able to extend the typesUnion with CRDs, as defined in this proposal. That would enable using CRDs like cert-manager alongside the regular Kubernetes types.
  • Define best practices for sharing configuration. I guess this is partly covered by store.dhall-lang.org, but could it be used like hub.helm.sh? For example, could I add my zuul-operator resources to it?

Performance is the main blocking point for us.
The second one is discoverability. Within the team, some technical members have been reluctant to use Dhall because they just don’t know how to make some changes. I feel the vscode-dhall-lsp-server could greatly help with that, but unfortunately it is not an option given that loading a simple Dhall file takes so much time.
I usually use https://github.com/dhall-lang/dhall-kubernetes/tree/master/1.18 when I need to build our Dhall client interface. The fact that I need to go so deep into the source code just to make some tiny progress is, I would say, a bad sign (I don’t mind too much myself, but the other members of the team who are not so committed to functional programming, static typing, … won’t feel the same way).

2 Likes

Thanks for posting this! I’m sharing it around with the rest of the team, and this is going to weigh heavily in our decision for whether or not we adopt Dhall for our use-case.

Of the points you listed, I’d say the number one concern for us is performance (so it’s great to see that it’s your priority). It has a lot of ramifications:

  • It negatively impacts how github.com/sourcegraph/deploy-sourcegraph-dhall is structured (we need to avoid importing dhall-kubernetes and instead clone / import the individual files directly)
  • The language server is unusably slow on those files
  • The slowness requires us to write a number of scripts/workarounds to “warm” the cache before a user runs our configuration pipeline. This also needs to happen whenever we make a change to the underlying implementation.

It harms first impressions of Dhall, which makes it much harder for me to sell this to our users or convince the rest of my team to adopt it.

2 Likes

Sorry for the previous deleted post.

Instead of reporting a problem in CI, I have figured out that it is better to report it from a local environment…

If I restrict the memory of my virtual Linux box to 2 GB, I can’t generate more than 4 or 5 YAML files with dhall-to-yaml. Could this be a hint that a space leak is at play somewhere?

It would also help to have an idea of the timeline.

Can we expect some improvements in the next few months?

I am only asking so that I can act on possible workarounds. For instance, we could (like @ggilmore) clone / import individual files instead of using dhall-kubernetes/package.dhall. Is it worth the effort, knowing the issue will go away sooner or later?
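
For example, the workaround might look something like this (the path is only illustrative; check the repository linked above for the exact file layout):

    -- Illustrative only: import a single schema instead of the whole package.
    let Deployment =
          https://raw.githubusercontent.com/dhall-lang/dhall-kubernetes/master/1.18/schemas/io.k8s.api.apps.v1.Deployment.dhall

    in  Deployment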

1 Like

I suggest doing the workarounds for now. In general with open source work I can’t guarantee any estimates.

@Gabriel439 I am always disappointed when I see the expression "In general with open source" used as an excuse to put aside user expectations.

I understand you can give no guarantee (that’s not what I asked). I also understand that given the limited number of contributors, the lack of sponsorship, the complexity of the problem, and the lack of time (you are doing this in your own free time), it is difficult to come up with estimates. I also know that as a user of dhall-kubernetes I am not entitled to make any kind of demands.

I have to confess that after reading this:

"That said, I’m fairly confident that with some attention Dhall can become the best-in-class solution in this space"

I had started to hope that by the end of the year, dhall-kubernetes could live up to that proposition.

Not only is Kubernetes a crowded (open-source) field, it is also a highly standardised one, given the existence of the CNCF TOC. Being "the best-in-class solution in this space" means a lot…

Fair enough, I understand why no rough estimates can be communicated. Given that a resolution date is completely unknown, may I suggest a small warning on the README page of dhall-kubernetes describing the current limitation?

I do not plan to add a warning because in my view our open source model is a strength, not a limitation. Open source works well when the people who want something improved are empowered to scratch their own itch.

Like you said: you hoped that dhall-kubernetes could meet the proposition I outlined at the beginning of the year. That’s great, because you can contribute to all the relevant projects, all of which have an open governance model. If you judge that you don’t have the time or resources to contribute, that’s fine, but that’s also nobody else’s fault.

I would prefer and welcome feedback about things that prevent others from contributing more effectively (such as poor contribution instructions, insufficient permissions/privileges, or a confusing code base). This is why I spend a significant proportion of my time improving things like CI/CD and documentation, to empower other people to contribute as effectively as possible.

@Gabriel439 to be honest, apart from sharing my experience with dhall-kubernetes (which is valuable enough given the current lack of user feedback), I don’t have the time to contribute much.

That’s why I have chosen to be a sponsor.

Expecting your end users to be able to help in a significant way by contributing to the open-source projects is a nice vision. But let’s be pragmatic… Most people who use Helm are not active Helm contributors.

This case is even worse. I would bet you can count on the fingers of one hand the number of people capable of solving the performance problem we are talking about here.

Even if I stopped working and put all my energy into solving it, I am pretty sure I wouldn’t manage it.

Most of the open-source projects out there (from Kubernetes to Ansible, Salt, Terraform, …) won’t count on their users to turn a proposition into a useful tool. Surely they welcome such contributions, but they won’t go as far as saying: "you don’t want to contribute, that’s fine, but it is nobody else’s fault".

The suggested warning would just be an honest statement and a recognition that the dhall-kubernetes project has run into some known limitations. I don’t see how that amounts to saying that the open-source model is limited.

To move the debate toward a more positive tone, here is an interesting article about open source: https://diff.substack.com/p/working-in-public-and-the-economics