unofficial mirror of guix-devel@gnu.org 
* Bioconductor: Use SVN/git by default?
From: Ricardo Wurmus @ 2016-06-17  8:54 UTC
  To: guix-devel

Hi Guix,

Bioconductor makes me sad.  Bioconductor is a repository of R packages
for bioinformatics.  They have a bit of a weird release model.  There
are releases of all of Bioconductor (current version is 3.3), but within
releases R packages can be updated.  The packages are supposed to be
compatible with other packages in the same Bioconductor release.

For Bioconductor, synchronisation appears to be important but is also
elusive.  You usually want to have all R packages from the same
Bioconductor release, but since a Bioconductor release is a fluid thing
and individual packages get updated all the time, you probably want to
have the latest at all times.

Unfortunately, Bioconductor doesn’t have an archive of previous releases
of R packages.  They only keep the latest version of any particular R
package at a time.  All of Bioconductor is also kept in SVN and there
are git mirrors of the SVN repository.

Our Bioconductor importer (guix import cran -a bioconductor) fetches
DESCRIPTION files of individual R packages from SVN.  I found that the
tarballs offered for download are not always in sync with what is
offered on SVN, so the importer sometimes fails as it tries to fetch a
tarball version that doesn’t exist.
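
A minimal sketch of that failure mode, not the actual importer: it assumes
DESCRIPTION files use the usual "Field: value" layout and that tarballs are
named `<Package>_<Version>.tar.gz`.  The package name and the set of
published tarballs are hypothetical.

```python
def parse_description(text):
    """Parse the 'Field: value' lines of an R DESCRIPTION file into a dict."""
    fields = {}
    current = None
    for line in text.splitlines():
        if not line.strip():
            continue
        if line[0] in " \t" and current is not None:
            # Continuation lines start with whitespace and extend the previous field.
            fields[current] += " " + line.strip()
        elif ":" in line:
            key, _, value = line.partition(":")
            current = key.strip()
            fields[current] = value.strip()
    return fields


def tarball_name(fields):
    """Name of the source tarball implied by the DESCRIPTION metadata (assumed scheme)."""
    return "{}_{}.tar.gz".format(fields["Package"], fields["Version"])


def importer_would_fail(svn_description, published_tarballs):
    """True when SVN advertises a version whose tarball has not been published."""
    return tarball_name(parse_description(svn_description)) not in published_tarballs


# Hypothetical DESCRIPTION as it might appear in SVN.
description = """Package: examplepkg
Version: 1.4.2
Title: A hypothetical package
Depends: R (>= 3.3),
    methods
"""

print(tarball_name(parse_description(description)))
# SVN is already at 1.4.2 while only the 1.4.1 tarball is on offer.
print(importer_would_fail(description, {"examplepkg_1.4.1.tar.gz"}))
```

When the version in SVN is ahead of the download area, the derived file name
points at a tarball that does not exist, which is the mismatch described
above.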

The lack of an archive is also a problem for reproducibility.  You
simply cannot download an archive for an obsolete package version.

This makes me wonder if we shouldn’t ignore the tarballs and fetch
directly from SVN or the git mirror.  I would like to make this a little
more reliable, so that people can reproduce the state of Bioconductor at
a particular point in time if they have a manifest and a git hash of the
Guix repository.

Releases of individual packages are not tagged in the Bioconductor SVN
repository, however.  Do we still have to append the SVN revision to the
version strings of every Bioconductor package?  An increase in the SVN
revision does not necessarily mean that an individual package has been
updated.
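
To illustrate the difference between the repository-wide revision and the
last revision that actually touched a package, here is a small sketch.  The
commit history, the paths, and the "-r<revision>" version suffix are all
hypothetical; nothing here reflects a format Guix has settled on.

```python
def last_changed_revision(commits, package_path, at_revision):
    """Return the newest revision <= at_revision that touched package_path, or None."""
    prefix = package_path.rstrip("/") + "/"
    touching = [
        rev
        for rev, paths in commits
        if rev <= at_revision and any(p.startswith(prefix) for p in paths)
    ]
    return max(touching) if touching else None


def guix_version(upstream_version, svn_revision):
    """Compose a version string that records an SVN revision (hypothetical format)."""
    return "{}-r{}".format(upstream_version, svn_revision)


# Hypothetical repository history: (revision, paths changed in that commit).
commits = [
    (100, ["Rpacks/pkgA/DESCRIPTION"]),
    (101, ["Rpacks/pkgB/R/code.R"]),
    (102, ["Rpacks/pkgB/DESCRIPTION"]),
]

rev_a = last_changed_revision(commits, "Rpacks/pkgA", at_revision=102)
print(guix_version("1.0.0", rev_a))  # suffix from the last commit touching pkgA
print(guix_version("1.0.0", 102))    # suffix from the repository-wide revision
```

Using the last-changed revision keeps the version string of an untouched
package stable while the rest of the repository moves on; using the global
revision changes every version string on every commit.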

What do you think?  I see no way around using the sources from the
central Bioconductor SVN repository as tarballs simply don’t give us
what we need in terms of reproducibility.

~~ Ricardo

* Re: Bioconductor: Use SVN/git by default?
From: Ludovic Courtès @ 2016-06-17 15:13 UTC
  To: Ricardo Wurmus; +Cc: guix-devel

Hi!

Ricardo Wurmus <ricardo.wurmus@mdc-berlin.de> writes:

> The lack of an archive is also a problem for reproducibility.  You
> simply cannot download an archive for an obsolete package version.

[...]

> What do you think?  I see no way around using the sources from the
> central Bioconductor SVN repository as tarballs simply don’t give us
> what we need in terms of reproducibility.

Would it help if we had access to a universal content-addressed archive
that would include everything Bioconductor has ever published?

That could be another solution (with a big “if”, granted ;-)).

Ludo’.

* Re: Bioconductor: Use SVN/git by default?
From: Ricardo Wurmus @ 2016-06-29 14:51 UTC
  To: Ludovic Courtès; +Cc: guix-devel


Ludovic Courtès <ludo@gnu.org> writes:

> Hi!
>
> Ricardo Wurmus <ricardo.wurmus@mdc-berlin.de> writes:
>
>> The lack of an archive is also a problem for reproducibility.  You
>> simply cannot download an archive for an obsolete package version.
>
> [...]
>
>> What do you think?  I see no way around using the sources from the
>> central Bioconductor SVN repository as tarballs simply don’t give us
>> what we need in terms of reproducibility.
>
> Would it help if we had access to a universal content-addressed archive
> that would include everything Bioconductor has ever published?
>
> That could be another solution (with a big “if”, granted ;-)).

I guess this would work too, but it would have to be comprehensive to be
useful.

The advantage of using SVN is that a user could quite easily create
variants of a set of Bioconductor R packages for a particular version of
the Bioconductor SVN repository.  This gives them additional granularity
which makes the fluidity of the Bioconductor releases more manageable.
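
As a rough sketch of what pinning a set of packages to one repository
revision could look like, the helper below only builds `svn export` command
lines; it does not run them.  The repository URL and package names are
placeholders, not the real Bioconductor locations.

```python
import shlex


def svn_export_command(repository_url, package, revision, destination):
    """Build an 'svn export' argument list pinned to one revision.

    The '@revision' suffix is a peg revision, so the path is looked up as it
    existed at that revision even if it was later moved or deleted.
    """
    url = "{}/{}@{}".format(repository_url.rstrip("/"), package, revision)
    return ["svn", "export", "--revision", str(revision), url, destination]


def pinned_exports(repository_url, packages, revision):
    """One export command per package, all taken from the same revision."""
    return [
        svn_export_command(
            repository_url, name, revision, "{}-r{}".format(name, revision)
        )
        for name in packages
    ]


# Placeholder URL and package names, for illustration only.
for command in pinned_exports(
    "https://svn.example.org/bioconductor/Rpacks", ["pkgA", "pkgB"], 102
):
    print(" ".join(shlex.quote(part) for part in command))
```

Changing the single revision number yields a different, equally consistent
snapshot of the whole package set, which is the extra granularity mentioned
above.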

Another advantage is that SVN exists right now.  It already behaves like
a full-blown archive of all Bioconductor packages, even *between*
Bioconductor releases.  It is just a little more cumbersome to access.

In any case, I think this would be an improvement over what we have
now.  Right now Bioconductor packages in Guix simply are not
reproducible over time.  As this invalidates the method of fully
describing a software environment symbolically (using a git hash of the
Guix repository and a manifest), I think we should build Bioconductor
packages from SVN to fix this.

~~ Ricardo

* Re: Bioconductor: Use SVN/git by default?
From: Ben Woodcroft @ 2016-06-30  2:44 UTC
  To: Ricardo Wurmus, guix-devel



On 17/06/16 18:54, Ricardo Wurmus wrote:
> [...]
>
> Releases of individual packages are not tagged in the Bioconductor SVN
> repository, however.  Do we still have to append the SVN revision to the
> version strings of every Bioconductor package?  An increase in the SVN
> revision does not necessarily mean that an individual package has been
> updated.

Is the problem that the SVN revision number gets bumped each time any
Bioconductor package is updated?  I still think we should append it: to
me, a version increase indicates a possible change of source code
upstream, not a guarantee of one.

What do you intend 'refresh' to report?

Overall, FWIW, I'm supportive of moving to SVN for Bioconductor, because
otherwise there is too much onus on us to keep all tarballs in
perpetuity, or to make sure they are available somewhere.

ben

* Re: Bioconductor: Use SVN/git by default?
From: Ludovic Courtès @ 2016-07-01 12:44 UTC
  To: Ricardo Wurmus; +Cc: guix-devel

Hello!

Ricardo Wurmus <ricardo.wurmus@mdc-berlin.de> writes:

> Ludovic Courtès <ludo@gnu.org> writes:
>
>> Hi!
>>
>> Ricardo Wurmus <ricardo.wurmus@mdc-berlin.de> writes:
>>
>>> The lack of an archive is also a problem for reproducibility.  You
>>> simply cannot download an archive for an obsolete package version.
>>
>> [...]
>>
>>> What do you think?  I see no way around using the sources from the
>>> central Bioconductor SVN repository as tarballs simply don’t give us
>>> what we need in terms of reproducibility.
>>
>> Would it help if we had access to a universal content-addressed archive
>> that would include everything Bioconductor has ever published?
>>
>> That could be another solution (with a big “if”, granted ;-)).
>
> I guess this would work too, but it would have to be comprehensive to be
> useful.

For the record:

  https://sympa.inria.fr/sympa/arc/swh-devel/2016-06/msg00007.html

> The advantage of using SVN is that a user could quite easily create
> variants of a set of Bioconductor R packages for a particular version of
> the Bioconductor SVN repository.  This gives them additional granularity
> which makes the fluidity of the Bioconductor releases more manageable.
>
> Another advantage is that SVN exists right now.  It already behaves like
> a full-blown archive of all Bioconductor packages, even *between*
> Bioconductor releases.  It is just a little more cumbersome to access.
>
> In any case, I think this would be an improvement over what we have
> now.  Right now Bioconductor packages in Guix simply are not
> reproducible over time.  As this invalidates the method of fully
> describing a software environment symbolically (using a git hash of the
> Guix repository and a manifest), I think we should build Bioconductor
> packages from SVN to fix this.

Yeah, using SVN is a solution that would work right now, so if that
seems workable for you without too much work, go for it.

Thanks,
Ludo’.
