Dhall as a Data Serialization Format

greyhillman · December 23, 2021, 6:50pm

I’ve been debating with myself whether Dhall should be used as a data-serialization language. Technically it can be, but doing so might go against the purpose of Dhall.

JSON and YAML are data-serialization languages and text formats that are “human readable”. A program can read/write data from/to a JSON or YAML file, which might be written by a human or machine. Configuring a program means passing data to the program to change how it works and is often done by humans. Therefore, it makes sense to use JSON or YAML as a configuration file format.

However, writing JSON or YAML by hand can be tedious and error-prone. That’s why Dhall was created. It provides some basic programming language features (like types and functions) to help write and maintain these configuration files by hand.

But should Dhall be used for the primary purpose of JSON and YAML: data-serialization?

No, it shouldn’t be.

Dhall was created to deal with writing configuration files by hand and the issues that come from it: duplication, errors, scaling, etc. It’s not supposed to store data like users’ emails or the list of cities you want to visit.

Yes, it can be.

A configuration file is data for a program, which is why people call it “configuration data”. Configuration data is about changing the behaviour of the program (like printing everything compact or pretty). But any data fed into the program can change its behaviour: if there’s one data point, do this; otherwise, do this other thing. From a program’s perspective, configuration data and normal data are the same: just data.

Dhall can be used to create data to be fed into a program or select the data to be fed. Features like importing a file as Text (./file/path.txt as Text), functions, etc. are used for that purpose. The website also has some examples that look like data and not just configuration data:

a list of license information (Core Language Features)
a company configuration file (main site under “Don’t repeat yourself”)
a list of test suites to run (the string matrix tutorial)
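
To make the “create data” idea concrete, here is a minimal sketch of a function generating plain data, using the list of cities to visit from earlier. The `City` type, the `mkCity` helper, and all the values are hypothetical and not taken from the Dhall website.

```dhall
-- Hypothetical data: a list of cities to visit.
let City = { name : Text, country : Text }

-- A small helper so the country is written once per group of cities.
let mkCity =
      \(country : Text) -> \(name : Text) -> { name = name, country = country }

let japan = mkCity "Japan"

in  [ japan "Tokyo", japan "Kyoto", mkCity "Italy" "Rome" ] : List City
```

Passing this through `dhall-to-json` should produce a JSON array of three objects, each with a `name` and a `country` field, which a program could then read like any other data file.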

I’m leaning towards yes, Dhall can be used as a data-serialization language, but I’d like to know the answer from a more authoritative source.

Gabriel439 · December 23, 2021, 10:16pm

I think for a data serialization format, the main advantage of a programmable format is that you can try to compress repetition in the data set. However, for Dhall specifically that’s less of an advantage because it tends to be more heavyweight due to the lack of type inference. You’d probably get better mileage out of using a non-programmable data serialization format (e.g. JSON or CBOR) and compressing that with a standard compression algorithm.
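
As a rough illustration of the overhead from the lack of type inference, the sketch below shows a few places where Dhall requires types to be spelled out. The field names and values are made up for the example.

```dhall
-- Illustrative only: every field name and value here is hypothetical.
{ emptyTags = [] : List Text
  -- an empty list must state its element type
, nickname = None Text
  -- an absent Optional value must state its type too
, count = List/length Natural [ 1, 2, 3 ]
  -- built-in polymorphic functions take the type as an explicit argument
, increment = \(n : Natural) -> n + 1
  -- lambda parameters always need a type annotation
}
```

In JSON the first two fields would just be `[]` and `null`, so for bulk data these annotations add up.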

Overall, I think Dhall wouldn’t be well-suited as a data serialization format, since data serialization formats tend to be optimized for being machine-readable/editable rather than being human-readable/editable.