Well, this is more about reproducible research and Guix than about DAT specifically.
The current thread about pipelines is very interesting, but I feel like there's a missing bit: the data, i.e. the sets of files or the datasets.
They are part of a pipeline and should be versioned too. And sometimes a pipeline produces a dataset, so there could be packages producing packages (see the sketch below).
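To make that concrete, here is a minimal sketch of what a dataset-as-package could look like in Guix. Everything here is hypothetical: the name, the URL, and my choice of copy-build-system are assumptions for illustration, not an existing package. The point is only that a dataset pinned by its content hash can be an input to, or an output of, a pipeline like any other package.

(use-modules (guix packages)
             (guix download)
             (guix build-system copy)
             ((guix licenses) #:select (cc0)))

;; Hypothetical dataset packaged like ordinary software: the source
;; hash pins the exact bytes, so every pipeline depending on it is
;; guaranteed to see the same data.
(define-public example-dataset
  (package
    (name "example-dataset")
    (version "1.0")
    (source (origin
              (method url-fetch)
              (uri "https://example.org/example-dataset-1.0.tar.gz")
              (sha256
               (base32
                ;; Placeholder; substitute the real hash of the tarball.
                "0000000000000000000000000000000000000000000000000000"))))
    ;; No build step needed: just copy the files into the output.
    (build-system copy-build-system)
    (synopsis "Example dataset, versioned like a package")
    (description "Files that a pipeline consumes or produces,
pinned by content hash.")
    (home-page "https://example.org/")
    (license cc0)))

A derived dataset would then simply list example-dataset (and the pipeline's software) among its inputs, which is what "packages producing packages" amounts to.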
There's this project, DAT, and it seems they are onto something in this domain:
http://dat-data.com/

Based on how they talk about the issue, I'd say they don't know about Guix, but they share the same concern: exactly the same software must run on a dataset in order to produce a comparable result.
It seems to me that Guix has something to offer to the DAT community. They want reproducible builds to run on their versioned, BitTorrent-distributed datasets, so they end up distributing code together with datasets (GNUnet-distributed substitutes, anyone?), and, because they don't know about Guix, they also end up resorting to containerization.
They even touch on the issue of the relationship between developers and users, which I think Guix and Guile are trying to blur.
But I didn't completely understand what she said about this, because English is not my native language.
As for distributing large amounts of data with DAGs, Merkle hashes, and BitTorrent-like swarms, IPFS is already doing these things (IPFS aims to distribute triples rather than raw files), and now DAT is too.
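For the curious, here is a toy sketch of the Merkle-hash idea both projects build on, written in Guile with the sha256 procedure from Guile-Gcrypt (my choice of library for the example; IPFS and DAT each define their own actual formats). Each chunk is hashed, adjacent hashes are hashed together, and the single root hash that remains identifies the whole dataset.

(use-modules (gcrypt hash)         ;sha256, from Guile-Gcrypt
             (rnrs bytevectors))   ;string->utf8

;; Hash each level's adjacent pairs; an odd hash out is carried up as-is.
;; bytevector-append is core Guile.
(define (pair-up hashes)
  (cond ((null? hashes) '())
        ((null? (cdr hashes)) (list (car hashes)))
        (else (cons (sha256 (bytevector-append (car hashes) (cadr hashes)))
                    (pair-up (cddr hashes))))))

;; Reduce the leaf hashes level by level down to a single root hash.
(define (merkle-root chunks)
  (let loop ((level (map (lambda (c) (sha256 (string->utf8 c))) chunks)))
    (if (null? (cdr level))
        (car level)
        (loop (pair-up level)))))

;; Changing one byte in any chunk changes the root, so peers in a
;; swarm can fetch pieces from anyone and still verify them locally.
(merkle-root '("chunk-1" "chunk-2" "chunk-3" "chunk-4"))

That verify-anywhere property is what lets a versioned dataset be pulled from an untrusted swarm, which is exactly the niche DAT and IPFS are after.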