From: Amirouche Boubekki <amirouche.boubekki@gmail.com>
To: Konrad Hinsen <konrad.hinsen@fastmail.net>
Cc: guix-devel@gnu.org
Subject: Re: Use guix to distribute data & reproducible (data) science
Date: Sat, 10 Feb 2018 09:51:41 +0000
Message-ID: <CAL7_Mo9kpmuec8krj8SyDK3NciKXj+46MLC==uVSC7a-5GZJAA@mail.gmail.com>
In-Reply-To: <1cb709d0-b282-192c-ce1d-20fbff43430e@fastmail.net>

On Fri, Feb 9, 2018 at 8:16 PM Konrad Hinsen <konrad.hinsen@fastmail.net>
wrote:

> Hi,
>
> On 09/02/2018 18:13, Ludovic Courtès wrote:
>
> > Amirouche Boubekki <amirouche@hypermove.net> skribis:
> >
> >> tl;dr: Distribution of data and software seems similar.
> >>         Data is more and more important in software and reproducible
> >>         science. The data science ecosystem lacks resource sharing.
> >>         I think guix can help.
> >
> > Now, whether Guix is the right tool to distribute data, I don’t know.
> > Distributing large amounts of data is a job in itself, and the store
> > isn’t designed for that.  It could quickly become a bottleneck.  That’s
> > one of the reasons why the Guix Workflow Language (GWL) does not store
> > scientific data in the store itself.
>
> and then distributed via standard channels (Zenodo, ...)


Thanks for the pointer!


> For big datasets, some other mechanism is required.
>

Big as in bigger than RAM?


> I think it's worth thinking carefully about how to exploit guix for
> reproducible computations. As Lispers know very well, code is data and
> data is code. Building a package is a computation like any other.
>

What I was thinking about is using Guix to distribute data packages, just
like we distribute software from PyPI. The advantage of using Guix seems
obvious, but apparently it is not desirable or possible, and I don't
understand why.

> Scientific workflows could be handled by a specific build system. In
> fact, as long as no big datasets or multiple processors are involved, we
> can do this right now, using standard package declarations.
>

Ok, good to know.


> It would be nice if big datasets could conceptually be handled in the
> same way while being stored elsewhere - a bit like git-annex does for
> git.


Thanks again for the pointer.
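
To make the git-annex analogy concrete, here is a minimal sketch in Python
(not Guix code) of the idea as I understand it: a small pointer record is
versioned and distributed, while the dataset itself is stored elsewhere and
checked against the pointer after it is fetched. The names DataPointer,
make_pointer and verify are hypothetical, and the choice of SHA-256 is an
assumption for illustration. The file is hashed in chunks, so it does not
need to fit in RAM.

```python
import hashlib
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class DataPointer:
    """Hypothetical record kept with the code; the data lives elsewhere."""
    name: str
    sha256: str  # hex digest of the dataset contents
    size: int    # size of the dataset in bytes


def sha256_of_file(path, chunk_size=1 << 20):
    """Hash a file in fixed-size chunks so it never has to fit in RAM."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def make_pointer(name, path):
    """Build the small pointer record for a dataset file on disk."""
    return DataPointer(name=name,
                       sha256=sha256_of_file(path),
                       size=os.path.getsize(path))


def verify(pointer, path):
    """Return True if the file at path matches the recorded size and hash."""
    return (os.path.getsize(path) == pointer.size
            and sha256_of_file(path) == pointer.sha256)
```

Only the pointer would need to travel with the package definition; any
mirror could serve the bytes, since verify() rejects a file that differs
from what was recorded.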

> And for parallel computing, we could have special build daemons.
>

That's where GWL comes in?
