Can't build a nixified Dhall package in a sandbox depending on the environment

I’m having a problem (or rather, our customer is, and I can’t debug or reproduce it locally). The following is what I hope is a minimal version of the problem, but I really can’t check it: GitHub - kenranunderscore/nix-dhall-sandbox-repro: Trying to build a minimal example to reproduce a problem with building Dhall packages in a Nix sandbox

The idea is that there is a Dhall package that defines GitLab CI pipelines, and I use dhall-to-yaml-ng to convert that to YAML inside of a Nix job. I’ve used dhall-nixpkgs to generate a Nix expression (using fixed-output derivations) for the dependencies. Now I am able to use nix build (in this example nix-build release.nix) without the --no-sandbox flag to build everything locally, with result being the generated YAML file. This all works just fine for me and in our own GitLab instance and is actually a pleasure to work with (thank you for that!) :slight_smile: I’m really happy I could get rid of --no-sandbox, so now I can properly use flakes and hydra for our project.

But: this seems to fail when running in our customer’s GitLab instance. I don’t have access there, but what I know is that they usually have all the proxy variables set. The Nix build above is triggered in GitLab CI, and even though I can run it in a sandbox locally, the error they’re seeing seems to suggest that “someone” tries to access the internet. Setting the correct proxy variables for the job should have fixed it IMHO, as long as Nix is the one doing the downloading, but it doesn’t seem to work with or without those. This is the relevant part of the error they sent me:

dhall> checking for references to /build/ in /nix/store/4lzqcvq0cz2sxm6p1477mycp96p5fbz6-dhall-1.40.2-doc...
building '/nix/store/68gqy3m9lpvrk3p77kz979gv0zcpy9ri-package.dhall.drv'...
package.dhall> Warning: Could not get or create the default cache directory:
package.dhall> ↳ /homeless-shelter/.cache/dhall
package.dhall> You can enable caching by creating it if needed and setting read,
package.dhall> write and search permissions on it or providing another cache base
package.dhall> directory by setting the $XDG_CACHE_HOME environment variable.
package.dhall> dhall:
package.dhall> Error: InternalException (HostCannotConnect "" [Network.Socket.connect: <socket: 3>: does not exist (Connection refused),Network.Socket.connect: <socket: 3>: does not exist (Connection refused),Network.Socket.connect: <socket: 3>: does not exist (Connection refused),Network.Socket.connect: <socket: 3>: does not exist (Connection refused)])
package.dhall> URL:
package.dhall> dhall:
package.dhall> Error: Invalid input
package.dhall> (input):1:1:
package.dhall>   |
package.dhall> 1 | <empty line>
package.dhall>   | ^
package.dhall> unexpected end of input
package.dhall> expecting #!, expression, or whitespace
error: builder for '/nix/store/68gqy3m9lpvrk3p77kz979gv0zcpy9ri-package.dhall.drv' failed with exit code 1

Sorry that I don’t have more information. I’m trying to grasp the problem myself. All the other builds that we’re doing that don’t require --no-sandbox actually run fine in their GitLab instance, so I’m a bit at a loss. Thanks for any pointers!

I’m still trying to reproduce your problem, but the error message you pasted shows that dhall is trying to resolve the gitlab-ci import locally. The nix builder doesn’t have access to the internet (for good reason!), hence dhall fails. What should be happening instead is that the nix-dhall tooling should prefetch and cache the gitlab-ci import, avoiding dhall network calls entirely.

Yeah, this is exactly what my goal was with making it sandboxable. I thought: if it builds on my machine without --no-sandbox and --impure, it should build in CI as well. At least if the proxy settings on the runner are present; then nix can fetch using those and nothing inside the nix builds needs internet access. (Before that I had to actually duplicate lots of impure flake targets, explicitly setting their proxy settings for when running on their GitLab. That was the reason why I switched to the Nix/Dhall integration in the first place.)

Just to be clear: I cannot reproduce this with my example locally either. Perhaps I could if I were behind a proxy myself. Because of that, I cannot be 100% sure that my repo is actually a working example. It’s very close to the actual code though. The dhall-gitlab-ci is its only external dependency, with exactly this hash, and everything else I’m doing in the codebase is just more Dhall code using it.

One hint: could this be due to a cache miss somehow? I don’t see it, but the reason I’m not using Ben Gamari’s original dhall-gitlab-ci repo is Prelude/package.dhall checksum incorrect · Issue #13 · bgamari/dhall-gitlab-ci · GitHub. When upgrading to Dhall 1.40 the hash changed, but I don’t see anything in the release notes that obviously explains it.

Brainstorming: could it be how Prelude is defined?

I wouldn’t expect that, as the error above is very specific about which URL it cannot reach, isn’t it? I’m not extremely good at Dhall: could you elaborate what’s weird about that inclusion of Prelude?

I think there’s some confusion here about how Nix’s sandboxing works with respect to the network, so let me clarify that first.

Sandboxing does not imply the complete absence of network calls in Nix builds. Rather, sandboxing only permits Nix builds to use the network if they are protected with a fixed output hash; this ensures that the build remains deterministic even if it relies on the network. For example, that’s why all of the fetch* derivations can download things that are not already cached, because they require a fixed output hash.

This implies that you do not need to run dhall-nixpkgs on the builder ahead of time to pre-cache the derivation. So long as the builder can access the relevant resource over the network and the derivation is protected by a fixed output hash the build should succeed even if the build is sandboxed.

From a quick scan of the repository you linked to, it seems like the derivation in question was likely protected by a fixed output hash, so if the build still fails it is most likely because the Nix user performing the build does not have network access due to issues with the network, and not because of sandbox-related restrictions.

An easy way to narrow down these sorts of network vs. sandbox issues is with a minimal reproducing derivation like this one:

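  let
    pkgs = import <nixpkgs> { };
  in
    pkgs.runCommand "networked"
      { nativeBuildInputs = [ pkgs.curl ];
        outputHashAlgo = "sha256";
        outputHashMode = "flat";
        # SHA-256 of the empty file that `touch $out` creates
        outputHash     = "47DEQpj8HBSa+/TImW+5JCeuQeRkm5NMpJWZG3hSuFU=";
      }
      ''
      # Placeholder URL (assumption): use any endpoint the builder should reach
      curl --fail http://example.com
      touch $out
      ''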

That build should succeed inside of a sandbox that has a correctly configured network (despite the curl request) because it has a fixed output hash. Additionally:

  • as a negative control you can comment out the outputHash* lines; the build will then fail inside of a sandbox
  • as a positive control you can further disable the sandbox and verify that it succeeds again

If you ask the customer to build the above Nix derivation (with the fixed output hash and with the sandbox enabled) in CI, the result will quickly show whether their network is correctly configured for the Nix build user.
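For the negative control mentioned in the list above, a minimal sketch could look like the following. It is the same kind of runCommand derivation with the outputHash* attributes left out, so it is no longer a fixed-output derivation. The name "networked-negative-control" and the URL are placeholders chosen for illustration, not something from the original repository.

```nix
let
  pkgs = import <nixpkgs> { };
in
  # No outputHash* attributes: this is an ordinary derivation, so a sandboxed
  # build gets no network access and the curl call is expected to fail.
  pkgs.runCommand "networked-negative-control"
    { nativeBuildInputs = [ pkgs.curl ]; }
    ''
    # Placeholder URL (assumption): use the same endpoint as in the positive test
    curl --fail http://example.com
    touch $out
    ''
```

If this variant fails in the sandbox while the fixed-output variant succeeds, the sandbox is behaving as intended and the network is reachable for fixed-output builds.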

Thanks for the detailed explanation! I think I had FODs somewhat correct in my mind, but I got very confused again when Dhall and its hashes came into play as well. It takes me quite long to build something that the customer then has to try, but this derivation above is very very similar to some Clojure builds I’ve done in their network, so I think I have some intuition and a better explanation of where I’ve gone wrong w.r.t. sandboxing.

  • In our Clojure(Script) builds, I used lein deps (download Maven dependencies) in a buildPhase, similar to the above, and it did NOT work. I’ve had to explicitly export the http_proxy variables in the build phase for it to work, so I’m reasonably certain that your example above does not build in their CI at all, no matter the sandbox.
  • I’ve thought with sandboxing enabled and outputHash set I wouldn’t need any special proxy settings in (my own) builds, but yes, that’s wrong! It only works in our other cases because the fetching is done by Nix itself, which has access to the environment settings. But of course dhall or lein or whatever should be allowed to access the network, too, when it’s an FOD; that’s the part I misunderstood I think.

What I don’t understand yet: Is this really a “network issue” on their end (that is, they’re doing something wrong), or rather just circumstance/configuration? Context: I’m using the nixpkgs/nix Docker image to run our builds in, and I know that the customer sets http_proxy and the other variables according to their proxy settings.

But this has left the realm of Dhall, so thank you very much already for your input! If you’d still like to help me, I’d be glad if you had ideas to point me to w.r.t “what I can do” in this scenario? The "source their environment variables during the buildPhase" hack that I used to do in the past doesn’t work here, because this time it’s not me writing the build script; but somehow impurely inheriting the network settings from the outside world doesn’t play nice with flakes, either, as far as I understand.

I think I know what’s wrong now. Even if the correct environment variables are set (e.g. http_proxy), the Nix build does not have access to them unless they are explicitly inherited from the environment as impure environment variables. See:

Nix’s fetchers (e.g. fetchurl) do this, and it’s again permitted for derivations with a fixed output hash. So I think the solution here is to extend the buildDhall* functions to inherit those same environment variables, and I put up a pull request to fix that:

I suspect if you apply the same change (either via an overlay or by patching Nixpkgs) it should fix the issue you’re running into. An equivalent overlay would be something like this:

pkgsNew: pkgsOld: {
  dhallPackages = pkgsOld.dhallPackages (old: {
    overrides =
      pkgsNew.lib.composeExtensions (old.overrides or (_: _: { }))
        (dhallPackagesNew: dhallPackagesOld: {
          buildDhallUrl = args:
            (dhallPackagesOld.buildDhallUrl args).overrideAttrs (old: {
              impureEnvVars = [ "http_proxy" "https_proxy" "ftp_proxy" "all_proxy" "no_proxy" ];
            });
        });
  });
}
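To see the mechanism in isolation, independent of Dhall, here is a rough sketch of a fixed-output test derivation that inherits the proxy variables via impureEnvVars. The derivation name and the URL are placeholders for illustration; the hash is the SHA-256 of an empty file, which is what touch $out produces.

```nix
let
  pkgs = import <nixpkgs> { };
in
  pkgs.runCommand "networked-via-proxy"
    { nativeBuildInputs = [ pkgs.curl ];
      outputHashAlgo = "sha256";
      outputHashMode = "flat";
      # SHA-256 of the empty file created by `touch $out`
      outputHash     = "47DEQpj8HBSa+/TImW+5JCeuQeRkm5NMpJWZG3hSuFU=";
      # Only honoured for fixed-output derivations: these variables are
      # passed through from the outside environment into the build.
      impureEnvVars = [ "http_proxy" "https_proxy" "ftp_proxy" "all_proxy" "no_proxy" ];
    }
    ''
    # Placeholder URL (assumption): curl picks up http_proxy if it is set
    curl --fail http://example.com
    touch $out
    ''
```

If this builds behind the proxy while the same derivation without the impureEnvVars line does not, that would point to the missing proxy variables rather than to the sandbox.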

I’ll try the overlay variant as soon as I’m back at work, so not before Monday, but I’m positive it’ll work :slight_smile:

I now remember having tried using impureEnvVars for another derivation at one point, but have since converted the derivation and forgotten about it…

Thanks for the quick reaction and fix!

You’re welcome!

(For posterity: I believe there’s an .override missing in the overlay sample code: pkgsOld.dhallPackages.override)

I merged the PR:

The problem was/is still happening. The reason is that the impureEnvVars were added to the “wrong” runCommand. I’ve opened #178544 which should fix the problem once and for all; I could reproduce it locally with the updated repo I mentioned at the start and confirm the fix on x86_64-linux.

I’ve still got one question left, though, since I cannot update nixpkgs in my project easily: Is there a better/smarter/shorter way of overlaying this fix into my project than adding basically the whole build-dhall-url.nix file? The previous version of the overlay does not work anymore, as the overrideAttrs call is in the wrong place and the intermediate derivation is already “consumed” at that point.

There is not an easy way to overlay the latter change from #178544, as far as I know. However, in the extreme case you can patch your Nixpkgs (within Nix itself) and then import the patched Nixpkgs (using import from derivation). This is what we do at work if we need to backport changes to older versions of Nixpkgs that we depend on.
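A rough sketch of that approach, assuming #178544 refers to a pull request in the NixOS/nixpkgs repository on GitHub and that your Nixpkgs provides applyPatches and fetchpatch. The patch URL is derived from that assumption, and the hash is a deliberate placeholder that has to be replaced with the value Nix reports on the first build.

```nix
let
  # The unpatched Nixpkgs the project currently depends on
  bootstrap = import <nixpkgs> { };

  # Copy the Nixpkgs source tree into the store with the patch applied
  patchedNixpkgs = bootstrap.applyPatches {
    name = "nixpkgs-patched";
    src = bootstrap.path;
    patches = [
      (bootstrap.fetchpatch {
        # Assumed URL for the pull request mentioned above
        url = "https://github.com/NixOS/nixpkgs/pull/178544.patch";
        # Placeholder: the first build fails and prints the real hash
        sha256 = bootstrap.lib.fakeSha256;
      })
    ];
  };
in
  # Import from derivation: the patched tree is built, then evaluated
  import patchedNixpkgs { }
```

The patch may not apply cleanly to an older Nixpkgs revision; in that case a locally adjusted patch file can be listed in patches instead of the fetchpatch call.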