* Paper preprint: Reproducible genomics analysis pipelines with GNU Guix
@ 2018-04-11 12:18 Ricardo Wurmus
2018-04-11 18:30 ` [rb-general] " Holger Levsen
` (3 more replies)
0 siblings, 4 replies; 13+ messages in thread
From: Ricardo Wurmus @ 2018-04-11 12:18 UTC
To: guix-devel; +Cc: rb-general, guix-hpc@gnu.org
Hey all,
I’m happy to announce that the group I’m working with has released a
preprint of a paper on reproducibility with the title:
Reproducible genomics analysis pipelines with GNU Guix
https://www.biorxiv.org/content/early/2018/04/11/298653
We built a collection of bioinformatics pipelines and packaged them with
GNU Guix, and then looked at the degree to which the software achieves
bit-reproducibility (spoiler: ~98%), analysed sources of non-determinism
(e.g. time stamps), discussed experimental reproducibility at runtime
(e.g. random number generators, kernel+glibc interface, etc) and
commented on the idea of using “containers” (or application bundles)
instead.
The middle section is a bit heavy on genomics to showcase the features
of the pipelines, but I think the introduction and the
discussion/conclusion may be of general interest.
--
Ricardo
GPG: BCA6 89B6 3655 3801 C3C6 2150 197A 5888 235F ACAC
https://elephly.net
* Re: [rb-general] Paper preprint: Reproducible genomics analysis pipelines with GNU Guix
2018-04-11 12:18 Paper preprint: Reproducible genomics analysis pipelines with GNU Guix Ricardo Wurmus
@ 2018-04-11 18:30 ` Holger Levsen
2018-04-11 18:40 ` Ricardo Wurmus
2018-04-11 18:31 ` Holger Levsen
` (2 subsequent siblings)
3 siblings, 1 reply; 13+ messages in thread
From: Holger Levsen @ 2018-04-11 18:30 UTC
To: General discussions about reproducible builds
Cc: guix-devel, guix-hpc@gnu.org
Hi Ricardo,
On Wed, Apr 11, 2018 at 02:18:38PM +0200, Ricardo Wurmus wrote:
> I’m happy to announce that the group I’m working with has released a
> preprint of a paper on reproducibility with the title:
>
> Reproducible genomics analysis pipelines with GNU Guix
> https://www.biorxiv.org/content/early/2018/04/11/298653
>
> We built a collection of bioinformatics pipelines and packaged them with
> GNU Guix, and then looked at the degree to which the software achieves
> bit-reproducibility (spoiler: ~98%), analysed sources of non-determinism
> (e.g. time stamps), discussed experimental reproducibility at runtime
> (e.g. random number generators, kernel+glibc interface, etc) and
> commented on the idea of using “containers” (or application bundles)
> instead.
wow, just wow. very very nice to see that!
> The middle section is a bit heavy on genomics to showcase the features
> of the pipelines, but I think the introduction and the
> discussion/conclusion may be of general interest.
As you might guess I have just skimmed over the text but it's really
super cool to see reproducible builds used in science! and diffoscope,
too!
just one thing/question: in the keywords you have "reproducible
software" but not "reproducible builds", which is kind of our "marketing
term". Do you think you could squeeze that in?
--
cheers,
Holger, once again wishing he could read more (and more...)
* Re: [rb-general] Paper preprint: Reproducible genomics analysis pipelines with GNU Guix
2018-04-11 12:18 Paper preprint: Reproducible genomics analysis pipelines with GNU Guix Ricardo Wurmus
2018-04-11 18:30 ` [rb-general] " Holger Levsen
@ 2018-04-11 18:31 ` Holger Levsen
2018-04-11 21:16 ` Roel Janssen
2018-04-23 8:20 ` [rb-general] " Ludovic Courtès
3 siblings, 0 replies; 13+ messages in thread
From: Holger Levsen @ 2018-04-11 18:31 UTC
To: General discussions about reproducible builds
Cc: guix-devel, guix-hpc@gnu.org
hi again,
and extra kudos and thanks for releasing this under a free licence! \o/
--
cheers,
Holger
* Re: [rb-general] Paper preprint: Reproducible genomics analysis pipelines with GNU Guix
2018-04-11 18:30 ` [rb-general] " Holger Levsen
@ 2018-04-11 18:40 ` Ricardo Wurmus
2018-04-11 19:00 ` Holger Levsen
0 siblings, 1 reply; 13+ messages in thread
From: Ricardo Wurmus @ 2018-04-11 18:40 UTC
To: Holger Levsen
Cc: General discussions about reproducible builds, guix-devel,
guix-hpc@gnu.org
Hi Holger,
thanks for your comments!
> just one thing/question: in the keywords you have "reproducible
> software" but not "reproducible builds", which is kind of our "marketing
> term". Do you think you could squeeze that in?
Heh, it used to be “reproducible builds”, but the term was deemed too
abstract for the audience of biologists, so it was decided to change it
to “reproducible software”…
Lots of small compromises need to be made when writing a paper together,
and that was one of them :)
--
Ricardo
GPG: BCA6 89B6 3655 3801 C3C6 2150 197A 5888 235F ACAC
https://elephly.net
* Re: [rb-general] Paper preprint: Reproducible genomics analysis pipelines with GNU Guix
2018-04-11 18:40 ` Ricardo Wurmus
@ 2018-04-11 19:00 ` Holger Levsen
0 siblings, 0 replies; 13+ messages in thread
From: Holger Levsen @ 2018-04-11 19:00 UTC
To: Ricardo Wurmus
Cc: General discussions about reproducible builds, guix-devel,
guix-hpc@gnu.org
On Wed, Apr 11, 2018 at 08:40:47PM +0200, Ricardo Wurmus wrote:
> > just one thing/question: in the keywords you have "reproducible
> Heh, it used to be “reproducible builds”, but the term was deemed too
> abstract for the audience of biologists, so it was decided to change it
> to “reproducible software”…
hehe.
> Lots of small compromises need to be made when writing a paper together,
> and that was one of them :)
I understand.
& thanks again, super cool!
--
cheers,
Holger
* Re: Paper preprint: Reproducible genomics analysis pipelines with GNU Guix
2018-04-11 12:18 Paper preprint: Reproducible genomics analysis pipelines with GNU Guix Ricardo Wurmus
2018-04-11 18:30 ` [rb-general] " Holger Levsen
2018-04-11 18:31 ` Holger Levsen
@ 2018-04-11 21:16 ` Roel Janssen
2018-04-15 7:50 ` Amirouche Boubekki
2018-04-23 8:20 ` [rb-general] " Ludovic Courtès
3 siblings, 1 reply; 13+ messages in thread
From: Roel Janssen @ 2018-04-11 21:16 UTC
To: Ricardo Wurmus; +Cc: guix-devel, rb-general, guix-hpc@gnu.org
Ricardo Wurmus <rekado@elephly.net> writes:
> Hey all,
>
> I’m happy to announce that the group I’m working with has released a
> preprint of a paper on reproducibility with the title:
>
> Reproducible genomics analysis pipelines with GNU Guix
> https://www.biorxiv.org/content/early/2018/04/11/298653
>
> We built a collection of bioinformatics pipelines and packaged them with
> GNU Guix, and then looked at the degree to which the software achieves
> bit-reproducibility (spoiler: ~98%), analysed sources of non-determinism
> (e.g. time stamps), discussed experimental reproducibility at runtime
> (e.g. random number generators, kernel+glibc interface, etc) and
> commented on the idea of using “containers” (or application bundles)
> instead.
>
> The middle section is a bit heavy on genomics to showcase the features
> of the pipelines, but I think the introduction and the
> discussion/conclusion may be of general interest.
This looks really great! I also like how you leverage GNU Autotools.
Finally there is a paper that uses GNU Guix as deployment tool for
scientific purposes. :)
Kind regards,
Roel Janssen
* Re: Paper preprint: Reproducible genomics analysis pipelines with GNU Guix
2018-04-11 21:16 ` Roel Janssen
@ 2018-04-15 7:50 ` Amirouche Boubekki
0 siblings, 0 replies; 13+ messages in thread
From: Amirouche Boubekki @ 2018-04-15 7:50 UTC
To: Roel Janssen; +Cc: guix-devel, rb-general, guix-hpc@gnu.org
Wow very great, thanks for sharing.
* Re: [rb-general] Paper preprint: Reproducible genomics analysis pipelines with GNU Guix
2018-04-11 12:18 Paper preprint: Reproducible genomics analysis pipelines with GNU Guix Ricardo Wurmus
` (2 preceding siblings ...)
2018-04-11 21:16 ` Roel Janssen
@ 2018-04-23 8:20 ` Ludovic Courtès
[not found] ` <87fu30fsra.fsf@elephly.net>
3 siblings, 1 reply; 13+ messages in thread
From: Ludovic Courtès @ 2018-04-23 8:20 UTC
To: Ricardo Wurmus; +Cc: guix-devel, guix-hpc@gnu.org
Hello Ricardo & all!
Ricardo Wurmus <rekado@elephly.net> skribis:
> I’m happy to announce that the group I’m working with has released a
> preprint of a paper on reproducibility with the title:
>
> Reproducible genomics analysis pipelines with GNU Guix
> https://www.biorxiv.org/content/early/2018/04/11/298653
>
> We built a collection of bioinformatics pipelines and packaged them with
> GNU Guix, and then looked at the degree to which the software achieves
> bit-reproducibility (spoiler: ~98%), analysed sources of non-determinism
> (e.g. time stamps), discussed experimental reproducibility at runtime
> (e.g. random number generators, kernel+glibc interface, etc) and
> commented on the idea of using “containers” (or application bundles)
> instead.
Very impressive piece of work! I think it’s important to stress that
reproducible builds is a crucial foundation for reproducible
computational experiments, and this paper does a great job at this.
Also nice that you show you can have these bit-reproducible pipelines
formalized in Guix *and* produce a ready-to-use “container image.”
Hopefully we can soon address the remaining sources of non-determinism
shown in Table 3 (I think you already addressed some of them in the
meantime, didn’t you?).
The bit I’m less comfortable with is Autotools. I do understand how it
helps capture configure-time dependencies, and how it generally helps
people package and use the software; I think it’s one of the best tools
for the job. However it’s also hard to learn and, whether it’s
justified or not, it’s considered “scary.”
Given the intended audience, I wonder how we could provide a simpler
path to achieve the same goal. It could be a set of Autoconf macros
leading to high-level ‘configure.ac’ files without any line of shell
code, or it could be Guix interpreting a top-level .scm or JSON file,
both of which would ideally be easier to write for bioinformaticians.
What are your thoughts on this?
Anyway, kudos on this, thank you!
Ludo’.
* Re: [rb-general] Paper preprint: Reproducible genomics analysis pipelines with GNU Guix
[not found] ` <87fu30fsra.fsf@elephly.net>
@ 2018-05-11 8:10 ` Ludovic Courtès
2018-05-11 8:19 ` Ricardo Wurmus
0 siblings, 1 reply; 13+ messages in thread
From: Ludovic Courtès @ 2018-05-11 8:10 UTC
To: Ricardo Wurmus; +Cc: guix-devel, guix-hpc@gnu.org
Hello!
Ricardo Wurmus <rekado@elephly.net> skribis:
> Ludovic Courtès <ludovic.courtes@inria.fr> writes:
[...]
>> Given the intended audience, I wonder how we could provide a simpler
>> path to achieve the same goal. It could be a set of Autoconf macros
>> leading to high-level ‘configure.ac’ files without any line of shell
>> code, or it could be Guix interpreting a top-level .scm or JSON file,
>> both of which would ideally be easier to write for bioinformaticians.
>
> I think a higher level “configure.ac” file would be of great help. In
> general, independent of this particular use case.
Perhaps we could add to Autoconf-Archive (if it doesn’t have such things
already) macros to deal with the R and Python stuff you had to deal
with? And then publish a simple template that people could use as a
starting point.
> There is a danger in pushing all of this work to Guix, though. One of
> the great features of the Autotools suite is that users don’t need to
> know about it. If we assume that users have Guix (which in our paper we
> only strongly encourage) we might as well have implemented the whole
> pipeline using the Guix Workflow Language. This is, of course, a valid
> option, but the goal of the paper was to demonstrate a more general
> claim and approach to designing pipelines. I wanted to encourage
> pipeline developers to treat their pipeline as a first-class package,
> not as some glue code that binds together tools in a specially crafted
> runtime environment.
Yes, that makes sense.
> I think that this alternative is worth exploring, though. Building a
> complex pipeline with the Guix Workflow Language that addresses both
> deployment and execution order would be an interesting project; it would
> also be good to look into ways to make such a workflow available to
> users who do not have the ability or intention to install Guix. An easy
> way is to bundle up the whole environment as one giant container blob,
> but I think we can do better. I’d love to collaborate with other users
> of the GWL to see how far we can push it.
Would be nice, indeed.
Thanks,
Ludo’.
* Re: [rb-general] Paper preprint: Reproducible genomics analysis pipelines with GNU Guix
2018-05-11 8:10 ` Ludovic Courtès
@ 2018-05-11 8:19 ` Ricardo Wurmus
2018-05-11 9:39 ` Catonano
0 siblings, 1 reply; 13+ messages in thread
From: Ricardo Wurmus @ 2018-05-11 8:19 UTC
To: Ludovic Courtès; +Cc: guix-devel, guix-hpc@gnu.org
Ludovic Courtès <ludovic.courtes@inria.fr> writes:
> Perhaps we could add to Autoconf-Archive (if it doesn’t have such things
> already) macros to deal with the R and Python stuff you had to deal
> with? And then publish a simple template that people could use as a
> starting point.
I submitted my macros for R packages (and they have been accepted), but
I actually don’t really like them because they are not as useful as it
may seem. While they do check for R packages in the environment at
configure time, nothing is done to record the environment necessary to
access these packages.
That’s a general problem for software that depends on search path
environment variables. I can’t just record the location of each
individual R package that was detected and use that to set up the
environment at runtime. The R packages have other runtime dependencies
that would also need to be recorded.
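To illustrate why recording only the detected packages falls short, here is a minimal Python sketch. The dependency map and the package names in it are invented for illustration; they are not taken from the paper or from any real R package. The point is only that the set needed at runtime is the transitive closure of what was detected at configure time.

```python
from collections import deque


def runtime_closure(detected, depends_on):
    """Return every package needed at runtime: the detected ones plus
    everything they depend on, directly or indirectly."""
    seen = set()
    queue = deque(detected)
    while queue:
        pkg = queue.popleft()
        if pkg in seen:
            continue
        seen.add(pkg)
        # Follow the dependencies of this package, if it has any.
        queue.extend(depends_on.get(pkg, ()))
    return seen


# Hypothetical dependency map with made-up package names.
DEPENDS_ON = {
    "r-analysis": ["r-stats-helper", "r-plotting"],
    "r-plotting": ["r-colours"],
}

detected = ["r-analysis"]
needed = runtime_closure(detected, DEPENDS_ON)
# Packages that a configure-time check of "r-analysis" alone never recorded.
print(sorted(needed - set(detected)))
```

A configure check that only confirms the top-level package leaves the rest of this set unrecorded, which is the gap described above.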
It’s not ideal.
--
Ricardo
* Re: [rb-general] Paper preprint: Reproducible genomics analysis pipelines with GNU Guix
2018-05-11 8:19 ` Ricardo Wurmus
@ 2018-05-11 9:39 ` Catonano
2018-05-13 5:07 ` Ricardo Wurmus
0 siblings, 1 reply; 13+ messages in thread
From: Catonano @ 2018-05-11 9:39 UTC
To: Ricardo Wurmus; +Cc: guix-devel, Ludovic Courtès, guix-hpc@gnu.org
2018-05-11 10:19 GMT+02:00 Ricardo Wurmus <rekado@elephly.net>:
>
> Ludovic Courtès <ludovic.courtes@inria.fr> writes:
>
> > Perhaps we could add to Autoconf-Archive (if it doesn’t have such things
> > already) macros to deal with the R and Python stuff you had to deal
> > with? And then publish a simple template that people could use as a
> > starting point.
>
> I submitted my macros for R packages (and they have been accepted), but
> I actually don’t really like them because they are not as useful as it
> may seem. While they do check for R packages in the environment at
> configure time, nothing is done to record the environment necessary to
> access these packages.
>
> That’s a general problem for software that depends on search path
> environment variables. I can’t just record the location of each
> individual R package that was detected and use that to set up the
> environment at runtime. The R packages have other runtime dependencies
> that would also need to be recorded.
>
> It’s not ideal.
>
> --
> Ricardo
>
>
>
Ricardo, I don't understand the problem you're raising here (I didn't read
the article yet, though)
Would you mind to elaborate on that ?
Why would you want to record the environment ?
I have this tiny prototype that checks for the availability of the Guile
module "sqlite3" at configure time and writes this csexp (
https://gitlab.com/dustyweb/guile-csexps ) in a file
(7:sqlite32:no)
(7:sqlite33:yes)
The first line is produced in an environment in which sqlite3 is not
available
The second one is produced in an environment in which sqlite3 is, well
guess what, available
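For readers unfamiliar with the format, the two records above can be reproduced with a short Python sketch. This is not the prototype itself (which checks Guile modules); the function names are hypothetical, and Python's own module lookup stands in for the Guile check purely for illustration.

```python
import importlib.util


def csexp(atoms):
    """Encode a flat list of strings as a canonical S-expression:
    each atom becomes <byte length>:<bytes>, wrapped in parentheses."""
    body = b"".join(
        str(len(a.encode("utf-8"))).encode("ascii") + b":" + a.encode("utf-8")
        for a in atoms
    )
    return b"(" + body + b")"


def availability_record(module_name):
    """Probe for a top-level module and record the answer as a csexp.
    (Assumption: a Python module lookup replaces the Guile module check.)"""
    found = importlib.util.find_spec(module_name) is not None
    return csexp([module_name, "yes" if found else "no"])


print(csexp(["sqlite3", "no"]).decode("ascii"))
print(csexp(["sqlite3", "yes"]).decode("ascii"))
```

The length prefix is what makes "sqlite3" followed by "no" appear as `7:sqlite32:no`: seven bytes for the module name, then two bytes for the answer.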
I produce such environments with the Guix "environment" command
I think csexps are cool because they are readable to humans
A user creating their pipeline can easily inspect the result of the
configuration phase
They could even paste excerpts of text on mailing lists, should they want
to ask for help
In my idea a build tool doesn't attempt to manage an environment
You could have sqlite3 because you set up a Guix environment, or because
you installed it with apt-get or dnf or manually
The build tool only worries about the availability, not how it's achieved
If every dependency is available (anyhow) it just builds
Because building and package management are supposed to be different
concerns.
If you have Guix, fine.
If you don't have Guix, then you're on your own; if you can manage, fine
This should address your concern to let people treat their pipelines as
packages
Doesn't it ?
Is this approach not enough for you ?
May I ask why ?
For now it only tests Guile modules but it could be obviously generalized
to test for more things (libs versions, data structures availability, along
the lines of what Autoconf does)
I'd love to be able to set up my (Guile) packages without having to deal
with the Autotools 😯
* Re: [rb-general] Paper preprint: Reproducible genomics analysis pipelines with GNU Guix
2018-05-11 9:39 ` Catonano
@ 2018-05-13 5:07 ` Ricardo Wurmus
2018-05-13 8:58 ` Catonano
0 siblings, 1 reply; 13+ messages in thread
From: Ricardo Wurmus @ 2018-05-13 5:07 UTC
To: Catonano; +Cc: guix-devel, Ludovic Courtès, guix-hpc@gnu.org
Catonano <catonano@gmail.com> writes:
> Ricardo, I don't understand the problem you're raising here (I didn't read
> the article yet, though)
>
> Would you mind to elaborate on that ?
>
> Why would you want to record the environment ?
I want to record the detected build environment so that I can restore it
at execution time. Autoconf provides macros that probe the environment
and record the full path to detected tools. For example, I’m looking
for Samtools, and the user may provide a particular variant of Samtools
at configure time. I record the full path to the executable at
configure time and embed that path in a configuration file that is read
when the pipeline is run.
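The record-then-restore idea can be sketched in a few lines of Python. This is only an illustration of the pattern, not how PiGx or Autoconf implement it: the function names, the JSON layout, and the file name in the usage note are all assumptions.

```python
import json
import os
import shutil


def detect_tool(name, override=None):
    """Configure time: resolve a tool to an absolute path, preferring a
    path supplied by the user over whatever is found on PATH."""
    path = override or shutil.which(name)
    if path is None:
        raise FileNotFoundError(f"{name} not found; pass an explicit path")
    return os.path.abspath(path)


def write_config(tools, config_path):
    """Configure time: embed the detected paths in a configuration file."""
    with open(config_path, "w", encoding="utf-8") as fh:
        json.dump({"tools": tools}, fh, indent=2)


def load_config(config_path):
    """Run time: read back exactly the paths chosen at configure time."""
    with open(config_path, encoding="utf-8") as fh:
        return json.load(fh)["tools"]
```

At configure time one would call something like `write_config({"samtools": detect_tool("samtools")}, "settings.json")`; the pipeline then calls `load_config("settings.json")` and invokes the recorded path rather than searching PATH again.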
This works fine for tools, but doesn’t work very well at all for modules
in language environments. Take R for example. I can detect and record
the location of the R and Rscript executables, but I cannot easily
record the location of build-time R packages (such as r-deseq2) in a way
that allows me to rebuild the environment at runtime.
Instead of writing an Autoconf macro that records the exact location of
each of the detected R packages and their dependencies I chose to solve
the problem in Guix by wrapping the pipeline executables in R_LIBS_SITE,
because I figured that on systems without Guix you aren’t likely to
install R packages into separate unique locations anyway — on most
systems R packages end up being installed to one and the same directory.
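As a rough picture of what such a wrapper does, the following Python sketch builds the environment an executable would be launched with. It assumes the R site-library variable `R_LIBS_SITE`; the function name and the paths in the usage note are hypothetical, and the real wrappers are generated by Guix rather than written like this.

```python
import os


def wrapped_environment(site_library, base_env=None):
    """Return a copy of the environment with the given site library
    placed in front of any existing R_LIBS_SITE entries."""
    env = dict(os.environ if base_env is None else base_env)
    existing = env.get("R_LIBS_SITE")
    env["R_LIBS_SITE"] = (
        site_library if not existing else site_library + os.pathsep + existing
    )
    return env
```

A launcher could then run, for example, `subprocess.run(["Rscript", "analysis.R"], env=wrapped_environment("/path/to/site-library"))`, so that the script sees the packages the packager assembled regardless of the caller's own settings.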
I think the desire to restore the configured environment at runtime is
valid and we do this all the time when we run binaries that have
embedded absolute paths (to libraries or other tools). It’s just that
it gets pretty awkward to do this for things like R packages or Python
modules (or Guile modules for that matter).
The Guix workflow language solves this problem by depending on Guix for
software deployment. For PiGx we picked Snakemake early on and it does
not have a software deployment solution (it expects to either run inside
a suitable environment that the user provides or to have access to
pre-built Singularity application bundles). I don’t like to treat
pipelines like some sort of collection of scripts that must be invoked
in a suitable environment. I like to see pipelines as big software
packages that should know about the environment they need, that can be
configured like regular tools, and thus only require the packager to
assemble the environment, not the end-user.
--
Ricardo
* Re: [rb-general] Paper preprint: Reproducible genomics analysis pipelines with GNU Guix
2018-05-13 5:07 ` Ricardo Wurmus
@ 2018-05-13 8:58 ` Catonano
0 siblings, 0 replies; 13+ messages in thread
From: Catonano @ 2018-05-13 8:58 UTC
To: Ricardo Wurmus; +Cc: guix-devel, Ludovic Courtès, guix-hpc@gnu.org
2018-05-13 7:07 GMT+02:00 Ricardo Wurmus <rekado@elephly.net>:
>
> Catonano <catonano@gmail.com> writes:
>
> > Ricardo, I don't understand the problem you're raising here (I didn't read
> > the article yet, though)
> >
> > Would you mind to elaborate on that ?
> >
> > Why would you want to record the environment ?
>
> I want to record the detected build environment so that I can restore it
> at execution time. Autoconf provides macros that probe the environment
> and record the full path to detected tools. For example, I’m looking
> for Samtools, and the user may provide a particular variant of Samtools
> at configure time.
Thanks for clarifying !
Let me vent some thoughts on the issue!
Under Guix, the way to provide a specific version of the Samtools would be
to run the configuration in an environment that offers a specific Samtools
package, so that the configuration tool can pick that up
Under a traditional distro, it'd be to feed file paths to the configuration
tool
So, how much of the traditional way of doing things do we want to support,
in our pipelines ?
> I record the full path to the executable at
> configure time and embed that path in a configuration file that is read
> when the pipeline is run.
>
> This works fine for tools, but doesn’t work very well at all for modules
> in language environments. Take R for example. I can detect and record
> the location of the R and Rscript executables, but I cannot easily
> record the location of build-time R packages (such as r-deseq2) in a way
> that allows me to rebuild the environment at runtime.
>
> Instead of writing an Autoconf macro that records the exact location of
> each of the detected R packages and their dependencies I chose to solve
> the problem in Guix by wrapping the pipeline executables in R_LIBS_SITE,
> because I figured that on systems without Guix you aren’t likely to
> install R packages into separate unique locations anyway — on most
> systems R packages end up being installed to one and the same directory.
>
> I think the desire to restore the configured environment at runtime is
> valid and we do this all the time when we run binaries that have
> embedded absolute paths (to libraries or other tools).
I didn't mean to imply it's not valid
I was just trying to understand what are the concerns on the ground and the
context
> It’s just that
> it gets pretty awkward to do this for things like R packages or Python
> modules (or Guile modules for that matter).
>
> The Guix workflow language solves this problem by depending on Guix for
> software deployment. For PiGx we picked Snakemake early on and it does
> not have a software deployment solution (it expects to either run inside
> a suitable environment that the user provides or to have access to
> pre-built Singularity application bundles). I don’t like to treat
> pipelines like some sort of collection of scripts that must be invoked
> in a suitable environment. I like to see pipelines as big software
> packages that should know about the environment they need, that can be
> configured like regular tools, and thus only require the packager to
> assemble the environment, not the end-user.
>
I understand your concern to consider pipelines as packages
But say, for example, that a pipeline gets distributed as a .deb package
with dependencies to R (or Guile) modules
Or, say, that a pipeline is distributed with a bundled guix.scm file
containing R modules (or Guile modules) as inputs
Would that break the idea of a pipeline as a package ?
I'm afraid that the idea of a pipeline as a package shouldn't be entrusted
to the configuration tool, but rather to the package management tool
And the pipeline author shouldn't be assumed to work in isolation,
confident that any package management environment will be able to run their
pipeline smoothly
The pipeline authors should be concerned with the placement of their
pipeline in the package graph; that shouldn't be a concern of the packager
only
Maybe the software authors should provide dependency information in a
standardized format (RDF?) and that should be leveraged by packagers in
order to prepare .deb packages or guix.scm files
And if you are a developer and you want to test the software with a
specific version of a dependency, then you should run the configuration
tool in an environment where that version of the dependency is available,
so that the configuration tool can pick that up
If you are on Guix, you will probably create that environment with the Guix
environment tool
If you are on Debian or Fedora, you will have to rely on those distros
development tools
On traditional distros, you can install packages in your user folder or in
/opt or in other positions
And then, you can feed those to the configuration tool
On Guix, the conditions are different
The idea of pipelines as packages will be treated differently by the
configuration tool under Guix and the configuration tool under Debian/Fedora
So, in my view a configuration tool should be quite dumb and assume that
the package management is smarter
You object that this implies the idea of the pipeline as an ugly hack
That is not necessarily so
It's just that I don't think that the pipelines authors can complete the
issue in their configuration management
Guix introduces the idea of the whole dependencies stack and that can't be
of concern to packagers only.
I don't think so
Maybe I'm too pessimistic, I don't know
Thanks for this discussion !