From: Ricardo Wurmus <rekado@elephly.net>
To: Amirouche Boubekki <amirouche@hypermove.net>
Cc: Guix Devel <guix-devel@gnu.org>, ludovic.courtes@inria.fr
Subject: Re: Use guix to distribute data & reproducible (data) science
Date: Mon, 19 Feb 2018 08:57:56 +0100
Message-ID: <87lgfpjtrv.fsf@elephly.net>
In-Reply-To: <24274adb01ba9c928a4701054b686a4a@hypermove.net>


Amirouche Boubekki <amirouche@hypermove.net> writes:

> Then, in a follow up mail, you reply to Konrad:
>
>>> Konrad Hinsen <konrad.hinsen@fastmail.net> skribis:
>>
>> [...]
>>
>>> It would be nice if big datasets could conceptually be handled in the
>>> same way while being stored elsewhere - a bit like git-annex does for
>>> git. And for parallel computing, we could have special build daemons.
>>
>> Exactly.  I think we need a git-annex/git-lfs-like tool for the store.
>> (It could also be useful for things like secrets, which we don’t want
>> to have in the store.)

In addition to the answers by Ludo and Roel, I’d like to add that for
data there is more that we’d like to know.  For any given dataset in
storage I’d like to know how it relates to previous versions of the same
dataset.  The hash alone is not sufficient: it identifies the content,
but I’d also need to know which dataset is the parent and which is the
child.

The store does not give me relations like that when given two or more
items.  The store retains information about links between items in one
generation (if they embed such references), but not across generations.
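
To make the parent/child relation concrete, here is a minimal Python
sketch of a lineage index kept next to content-addressed storage.  It
is only an illustration under assumptions: the names `DatasetVersion`
and `LineageIndex` are hypothetical and are not part of Guix, the GWL,
or Pachyderm, and SHA-256 is used merely as a stand-in for whatever
hash the storage layer would use.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class DatasetVersion:
    """One dataset version: a content hash plus an explicit parent link."""
    content_hash: str
    parent: Optional[str]  # hash of the previous version; None for the first


class LineageIndex:
    """Hypothetical side index recording which version derives from which."""

    def __init__(self) -> None:
        self._versions: Dict[str, DatasetVersion] = {}

    def add(self, data: bytes, parent: Optional[str] = None) -> str:
        """Register a dataset version and return its content hash."""
        if parent is not None and parent not in self._versions:
            raise KeyError(f"unknown parent version: {parent}")
        digest = hashlib.sha256(data).hexdigest()
        existing = self._versions.get(digest)
        if existing is not None:
            # Records are immutable; refusing to re-parent keeps the
            # lineage free of cycles.
            if existing.parent != parent:
                raise ValueError("version already recorded with another parent")
            return digest
        self._versions[digest] = DatasetVersion(digest, parent)
        return digest

    def ancestors(self, content_hash: str) -> List[str]:
        """Return the chain of parent hashes, nearest parent first."""
        chain: List[str] = []
        current = self._versions[content_hash].parent
        while current is not None:
            chain.append(current)
            current = self._versions[current].parent
        return chain

    def is_ancestor(self, older: str, newer: str) -> bool:
        """True if `older` appears somewhere in the lineage of `newer`."""
        return older in self.ancestors(newer)
```

Given two hashes, `is_ancestor` answers the question that the hashes by
themselves cannot: whether one dataset is an earlier version of the
other.  The parent link has to be supplied when a version is added,
because it cannot be derived from the content.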

I think the requirements for the storage and retrieval of (big) datasets
are very different to those of software packages.

There are projects dedicated to dataset storage, such as Pachyderm.io.
Since data storage is just a stepping stone to better workflows,
Pachyderm also includes support for application bundles, but it may be
better to let a dedicated workflow language take care of the application
side.

Maybe the GWL can be integrated with dedicated data storage solutions
like Pachyderm.

--
Ricardo

GPG: BCA6 89B6 3655 3801 C3C6  2150 197A 5888 235F ACAC
https://elephly.net
