* DAT
From: Catonano @ 2016-06-18 13:23 UTC (permalink / raw)
To: guix-devel
Well, this is more about reproducible research and Guix than about DAT
specifically.
The current thread about pipelines is very interesting, but I feel like
there's a missing bit.
The data: the sets of files, or the datasets.
They are part of a pipeline and they should be versioned too. And sometimes
a pipeline produces a dataset. So there could be packages producing
packages.
There's this project, DAT, and it seems they are onto something, in this
domain.
http://dat-data.com/
Based on how they talk about the issue, I'd say they don't know about Guix.
But they do have the same concern about the exact same software running on
a dataset in order to produce a comparable result.
It seems to me that Guix has something to offer to the DAT community. They
want reproducible builds to use on their versioned, BitTorrent-distributed
datasets, so they end up distributing code together with datasets
(GNUnet-distributed substitutes, anyone?) and, because they don't know about
Guix, they also end up resorting to containerization.
They even touch on the issue of the relationship between developers and
users, which I think Guix and Guile are trying to blur.
But I didn't understand completely what she said about this because English
is not my native language.
As for distributing large amounts of data with DAGs, Merkle hashes and
BitTorrent-like swarms, IPFS already does these things (IPFS aims to
distribute triples rather than raw files), and now DAT does too.
I was wondering whether GNUnet has something to say with regard to
reproducible research. These ideas about how to distribute research data in
a serverless fashion came from the "decentralize the web" arena, so maybe
GNUnet has something to say.
But this is off topic for the Guix list.
I just thought that you Guix people should be pointed to DAT and the issues
it raises and tries to solve.
It seems to me there's some overlap, and the respective communities
should be aware of each other.
I hope I didn't bother any of you with this message.
* Re: DAT
From: Ludovic Courtès @ 2016-06-20 8:21 UTC (permalink / raw)
To: Catonano; +Cc: guix-devel
Hi,
Catonano <catonano@gmail.com> writes:
> They are part of a pipeline and they should be versioned too. And sometimes
> a pipeline produces a dataset. So there could be packages producing
> packages.
>
> There's this project, DAT, and it seems they are onto something, in this
> domain.
>
> http://dat-data.com/
From a quick look it seems to me that DAT is primarily focusing on
efficient peer-to-peer data distribution, at least in its current form.
In that sense, I would say that DAT and Guix would be complementary
rather than overlapping in a reproducible science toolbox: Guix could be
used to describe data sources, build processes, and pipelines, while
DAT would take care of retrieving data sets (DAT data sets could be
described using ‘origin’ in Guix).
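To make the "data set described like a source" idea concrete, here is a
rough sketch in Python. It is not Guix code and not DAT's API: the
`DataOrigin` class, the `verify` helper and the `dat://example-dataset`
URI are all made up for illustration. The only assumption carried over is
that a data set, like a source tarball, can be pinned by a location plus
the hash its content is expected to have.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class DataOrigin:
    """Hypothetical description of a data set: where it comes from and
    the SHA-256 digest its content is expected to have."""
    uri: str
    sha256: str  # expected hex digest of the content


def verify(origin: DataOrigin, content: bytes) -> bool:
    """Return True if the retrieved bytes match the pinned digest."""
    return hashlib.sha256(content).hexdigest() == origin.sha256


# A tiny in-memory "data set"; a real one would be retrieved by a tool
# such as DAT rather than defined inline.
data = b"sample,value\na,1\nb,2\n"
origin = DataOrigin(
    uri="dat://example-dataset",  # made-up URI, for illustration only
    sha256=hashlib.sha256(data).hexdigest(),
)

print(verify(origin, data))             # True: content matches the pin
print(verify(origin, data + b"c,3\n"))  # False: content was modified
```

With the hash pinned, it would not matter which peer served the bytes: a
pipeline could refuse to run on any input that fails the check.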
Thanks for sharing!
Ludo’.